Concept Learning
Li M. Fu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Introduction
- Concept learning is concerned with
acquiring the definition of a general category (concept)
from a sample of positive and negative training examples of the
category.
- It can be formulated as a problem of searching through a predefined
space of potential hypotheses for one that best fits the
training examples.
- The search can be efficiently organized by general-to-specific ordering
of hypotheses.
- Algorithms: Find-S and Version Space.
- The main issue: inductive bias
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Topics
- A concept learning task
- Concept learning as search
- Find-S: Finding a maximally specific hypothesis
- Version spaces
- The candidate elimination algorithm
- The boundary set representation
- Inductive bias
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Machine Learning Basics
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Formal Definitions
- Concept learning: c: X -> {0, 1}. What are the task T, experience E,
and performance measure P? Is it supervised or unsupervised learning?
- Training instance representation: (feature vector, positive/negative).
What is a feature vector?
Given n binary attributes, how many possible instances are there?
- Hypothesis representation: feature vector that admits a don't-care value
(* or ?). Why use the same representation as for instances?
Given n binary attributes, how many possible hypotheses are there?
- Hypothesis function: h: X -> {0, 1}.
- Instance space (X) versus hypothesis space (H)
- Learning goal: Find a hypothesis h in H such that h(x) = c(x) for all
x in X.
- Inductive learning hypothesis
- Concept learning as search: Search for a hypothesis consistent with
the training instances. How should the hypothesis space be represented
to allow an efficient search?
- General-to-specific ordering of hypotheses
- A hypothesis viewed as a predicate defines a set of instances
satisfying the hypothesis
- More_general_than_or_equal_to:
hi >=g hj <=> for all x in X, hj(x) = 1 => hi(x) = 1.
- More_general_than: hi >g hj <=> hi >=g hj and hj not >=g hi.
- More_specific_than: hi >s hj <=> hj >g hi.
- Partial order versus complete order
- Does >=g define a partial or complete order?
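A minimal sketch of the definitions on this slide, assuming conjunctive
hypotheses over n = 3 binary attributes. The names ANY, satisfies, and
more_general_or_equal are illustrative choices, not part of the original
notes. The count of 3**n hypotheses leaves out the empty hypothesis that
rejects every instance.

```python
from itertools import product

ANY = "?"  # don't-care value; the constant name is an assumption

def satisfies(h, x):
    """Return True if instance x satisfies hypothesis h, i.e. h(x) = 1."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(hi, hj, instances):
    """hi >=g hj: every instance that satisfies hj also satisfies hi."""
    return all(satisfies(hi, x) for x in instances if satisfies(hj, x))

n = 3
X = list(product((0, 1), repeat=n))       # all 2**n instances
H = list(product((0, 1, ANY), repeat=n))  # 3**n conjunctive hypotheses

print(len(X), len(H))  # 8 27

h1 = (1, ANY, ANY)
h2 = (1, 0, ANY)
h3 = (ANY, 0, ANY)
print(more_general_or_equal(h1, h2, X))  # True
# h1 and h3 are incomparable: neither is more general than the other.
print(more_general_or_equal(h1, h3, X), more_general_or_equal(h3, h1, X))  # False False
```

The incomparable pair h1, h3 shows that some hypotheses cannot be ranked
against each other under >=g.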
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Generalization Operators
- Introducing variables
- Using property hierarchies (specific to general)
- Dropping conditions
- Introducing disjunctions
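As an illustration of one operator above, dropping a condition can be
sketched for attribute-vector hypotheses as replacing a constraint with the
don't-care value. The attribute values and the function names drop_condition
and covers are hypothetical.

```python
ANY = "?"  # don't-care value

def covers(h, x):
    """Return True if instance x satisfies hypothesis h."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def drop_condition(h, i):
    """Generalize h by replacing the constraint on attribute i with a don't-care."""
    return h[:i] + (ANY,) + h[i + 1:]

h = ("sunny", "warm", "high")
g = drop_condition(h, 2)
x = ("sunny", "warm", "normal")

print(g)                        # ('sunny', 'warm', '?')
print(covers(h, x), covers(g, x))  # False True
```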
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Specialization Operators
- Instantiating variables with specific values
- Using property hierarchies (general to specific)
- Adding conditions
- Introducing conjunctions
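As an illustration of one operator above, adding a condition can be sketched
as constraining a don't-care attribute to a specific value. The attribute
values and the function names add_condition and covers are hypothetical.

```python
ANY = "?"  # don't-care value

def covers(h, x):
    """Return True if instance x satisfies hypothesis h."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def add_condition(h, i, value):
    """Specialize h by constraining don't-care attribute i to a specific value."""
    if h[i] != ANY:
        raise ValueError("attribute is already constrained")
    return h[:i] + (value,) + h[i + 1:]

g = ("sunny", ANY, ANY)
s = add_condition(g, 1, "warm")
x = ("sunny", "cold", "high")

print(s)                        # ('sunny', 'warm', '?')
print(covers(g, x), covers(s, x))  # True False
```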
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Inductive Inference (Induction)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FIND-S
- The algorithm:
(1) Initialize h to the most specific hypothesis in H.
(2) For each positive instance x, if h is not satisfied by x,
then minimally generalize h so that it is satisfied by x;
else do nothing.
(3) Output hypothesis h.
- Issues:
(1) Convergence
(2) Finding the correct concept
(3) In favor of the most specific hypothesis
(4) Multiple MSHs
(5) Data inconsistency
(6) Backtracking?
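The three steps of Find-S can be sketched as follows for conjunctive
attribute-vector hypotheses. This is a minimal sketch: the most specific
hypothesis is represented as None, and the training data are made up for
illustration.

```python
ANY = "?"  # don't-care value

def find_s(examples):
    """Find-S over (instance, label) pairs; label is True for positives.

    Returns None (the most specific hypothesis, which rejects everything)
    if no positive example is seen.
    """
    h = None
    for x, positive in examples:
        if not positive:
            continue  # Find-S ignores negative examples
        if h is None:
            h = tuple(x)  # first positive: copy it exactly
        else:
            # Minimal generalization: keep matching values, relax the rest.
            h = tuple(hv if hv == xv else ANY for hv, xv in zip(h, x))
    return h

# Hypothetical training set with three attributes.
D = [
    (("sunny", "warm", "normal"), True),
    (("sunny", "warm", "high"), True),
    (("rainy", "cold", "high"), False),
    (("sunny", "warm", "high"), True),
]

print(find_s(D))  # ('sunny', 'warm', '?')
```

Because negative examples are skipped, this sketch never detects
inconsistent data, which relates to issue (5) above.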
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The Version Space Approach
- Consistent(h, D) <=> for all (x, c(x)) in D, h(x) = c(x),
where h is a hypothesis and D is a set of training examples.
- Satisfy(h, x) <=> h(x) = 1
- Version Space VS_{H,D} = {h in H | Consistent(h, D)}
- The List-Then-Eliminate algorithm
- The Candidate-Elimination algorithm
- Boundary set representation of the version space
- The general boundary (wrt H and D):
G = {g in H | Consistent(g, D) and there exists no g' in H such that
[g' >g g and Consistent(g', D)]} (where >g: more general than)
- The specific boundary (wrt H and D):
S = {s in H | Consistent(s, D) and there exists no s' in H such that
[s >g s' and Consistent(s', D)]}
- Version space representation theorem:
VS_{H,D} = {h in H | there exist s in S and g in G such that
(g >=g h >=g s)}
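The definitions above can be checked by brute force on a small hypothesis
space. The sketch below enumerates every conjunctive hypothesis over three
binary attributes, eliminates the inconsistent ones, and derives G and S
directly from their definitions. The training set D and all function names
are assumptions made for illustration.

```python
from itertools import product

ANY = "?"  # don't-care value

def satisfies(h, x):
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def consistent(h, D):
    """Consistent(h, D): h(x) = c(x) for every training example."""
    return all(satisfies(h, x) == label for x, label in D)

def geq(hi, hj, X):
    """hi >=g hj over instance space X."""
    return all(satisfies(hi, x) for x in X if satisfies(hj, x))

def gt(hi, hj, X):
    """hi >g hj: strictly more general."""
    return geq(hi, hj, X) and not geq(hj, hi, X)

n = 3
X = list(product((0, 1), repeat=n))
H = list(product((0, 1, ANY), repeat=n))

# Hypothetical training examples: (instance, label).
D = [((1, 1, 0), True), ((1, 0, 0), True), ((0, 1, 1), False)]

# Enumerate H and keep only the consistent hypotheses.
VS = [h for h in H if consistent(h, D)]
G = [g for g in VS if not any(gt(h, g, X) for h in VS)]
S = [s for s in VS if not any(gt(s, h, X) for h in VS)]

print(S)  # [(1, '?', 0)]
print(G)  # [(1, '?', '?'), ('?', '?', 0)]

# Representation theorem: VS is exactly what lies between S and G.
between = [h for h in H
           if any(geq(g, h, X) for g in G) and any(geq(h, s, X) for s in S)]
print(set(between) == set(VS))  # True
```

Enumeration is only feasible for tiny hypothesis spaces, which is why the
boundary sets are used as a compact representation.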
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The Candidate Elimination Algorithm
Version Spaces
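A minimal sketch of Candidate-Elimination for conjunctive attribute-vector
hypotheses. It assumes every attribute has at least two values, so the
specific boundary is a single hypothesis (None stands for the hypothesis that
rejects everything). The domains, training data, and function names are
hypothetical.

```python
ANY = "?"  # don't-care value

def satisfies(h, x):
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(hi, hj):
    """Syntactic >=g test; hj may be None (rejects every instance)."""
    if hj is None:
        return True
    return all(a == ANY or a == b for a, b in zip(hi, hj))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = None            # most specific hypothesis
    G = {(ANY,) * n}    # most general hypothesis
    for x, positive in examples:
        if positive:
            # Remove general hypotheses that reject x, then generalize S.
            G = {g for g in G if satisfies(g, x)}
            S = tuple(x) if S is None else tuple(
                sv if sv == xv else ANY for sv, xv in zip(S, x))
            if not any(more_general_or_equal(g, S) for g in G):
                raise ValueError("version space is empty")
        else:
            if S is not None and satisfies(S, x):
                raise ValueError("version space is empty")
            new_G = set()
            for g in G:
                if not satisfies(g, x):
                    new_G.add(g)  # already rejects x
                    continue
                # Minimal specializations: constrain one don't-care
                # attribute to a value that differs from x.
                for i in range(n):
                    if g[i] != ANY:
                        continue
                    for v in domains[i]:
                        spec = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and more_general_or_equal(spec, S):
                            new_G.add(spec)
            # Keep only the maximally general members.
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g)
                            for h in new_G)}
    return S, G

domains = [("sunny", "rainy"), ("warm", "cold"), ("normal", "high")]
D = [
    (("sunny", "warm", "normal"), True),
    (("sunny", "warm", "high"), True),
    (("rainy", "cold", "high"), False),
]
S, G = candidate_elimination(D, domains)
print(S)          # ('sunny', 'warm', '?')
print(sorted(G))  # [('?', 'warm', '?'), ('sunny', '?', '?')]
```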
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Version Space Issues:
- Convergence? Divergence? Note that the size of the version space
is monotonically non-increasing over time.
- Convergence to the correct hypothesis
- What training instances should be selected?
- How can a partially learned concept (a partially converged version space)
be used?
- Incremental learning
- Conjunctive versus disjunctive concepts
- Noise tolerance
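One common answer to how a partially converged version space can be used is
to classify a new instance only when all remaining hypotheses agree. A
minimal sketch using only the boundary sets follows; the boundary sets and
instances below are hypothetical, and S is assumed to be non-empty.

```python
ANY = "?"  # don't-care value

def satisfies(h, x):
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def classify(x, S, G):
    """Classify x by unanimous vote of the version space bounded by S and G."""
    if all(satisfies(s, x) for s in S):
        return "positive"   # every hypothesis in the version space accepts x
    if not any(satisfies(g, x) for g in G):
        return "negative"   # every hypothesis in the version space rejects x
    return "unknown"        # the hypotheses disagree

S = [("sunny", "warm", ANY)]
G = [("sunny", ANY, ANY), (ANY, "warm", ANY)]

print(classify(("sunny", "warm", "high"), S, G))    # positive
print(classify(("rainy", "cold", "normal"), S, G))  # negative
print(classify(("sunny", "cold", "high"), S, G))    # unknown
```

Instances classified as unknown are also natural candidates when choosing
which training instance to request next.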
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Inductive Bias
- The inductive bias of a concept learning algorithm L is
any minimal set of assertions B such that
for all x in X, (B and D and x) => L(x, D),
where L(x, D) is the classification of x by L after training on data D
and => denotes deductive entailment.
- Types of inductive bias:
- (Hypothesis) preference bias (or search bias)
- (Language) restriction bias
- An unbiased learner: Size of the hypothesis space? Why futile?
- Inductive bias of the Candidate Elimination algorithm:
B = {c in H}
A simple proof: L classifies x only when all hypotheses in VS_{H,D} agree.
Since c is in H and c is consistent with D, c is in VS_{H,D}, and
therefore L(x, D) = c(x).
- Inductive bias from the weakest to strongest:
- Rote learning: no bias
- Candidate-Elimination: c in H
- Find-S: "c in H" plus "all instances are negative unless the opposite
is entailed by the learner's other knowledge"
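The futility of an unbiased learner can be illustrated on a tiny instance
space. The sketch below takes H to be every subset of X (encoded as a
bitmask) and counts how the hypotheses consistent with a hypothetical
training set vote on the unseen instances. The data and names are
assumptions made for illustration.

```python
from itertools import product

n = 3
X = list(product((0, 1), repeat=n))  # 2**n = 8 instances
num_hypotheses = 2 ** len(X)         # unbiased H: all 2**(2**n) subsets of X

def h_value(mask, x):
    """Hypothesis 'mask' labels x positive iff the bit at x's index is set."""
    return bool((mask >> X.index(x)) & 1)

# Hypothetical training examples.
D = {(1, 1, 0): True, (1, 0, 0): True, (0, 1, 1): False}

VS = [m for m in range(num_hypotheses)
      if all(h_value(m, x) == label for x, label in D.items())]

print(num_hypotheses, len(VS))  # 256 32

# For every unseen instance, count the consistent hypotheses voting positive.
unseen = [x for x in X if x not in D]
votes = {x: sum(h_value(m, x) for m in VS) for x in unseen}
print(set(votes.values()))  # {16}
```

Exactly half of the consistent hypotheses vote positive on each unseen
instance, so the version space never reaches a unanimous classification
beyond the training examples.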
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Summary
- Concept learning can be cast as a search problem.
- General-to-specific ordering of hypotheses provides
a useful search structure.
- What are the limitations of the Find-S algorithm?
- The version space approach is good for single concept learning.
- What are the weaknesses of the version space approach?
- What is the inductive bias of the candidate elimination algorithm?