Learning Rule Sets
Li M. Fu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Introduction
-
If-then rules are one of
the most expressive representations of knowledge.
-
Learning rule sets is like learning disjunctive concepts.
-
Learning rules involving variables is challenging.
-
Learning first-order Horn clauses is called inductive
logic programming (ILP).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Topics
-
Sequential covering algorithms
-
Decision trees
-
The FOIL program
-
Induction as inverted deduction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Example
-
Attribute Values
-------------------------------
A a1, a2, a3
B b1, b2
C c1, c2, c3
-------------------------------
Concept: X
-
Data:
Instance Label
-------------------------------
A=a2, B=b1, C=c1 N
A=a1, B=b1, C=c3 P
A=a3, B=b2, C=c2 N
A=a3, B=b2, C=c3 P
A=a1, B=b1, C=c2 P
A=a2, B=b2, C=c1 N
-------------------------------
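A sequential covering learner can be sketched on this data set. The code below is a minimal illustration, not a specific published algorithm: it greedily grows one conjunctive rule at a time (choosing the attribute test with the highest precision, ties broken by positive coverage), removes the instances the rule covers, and repeats. The function names and the scoring choice are assumptions made for the example.

```python
# Training data from the example: (attribute values, label).
DATA = [
    ({"A": "a2", "B": "b1", "C": "c1"}, "N"),
    ({"A": "a1", "B": "b1", "C": "c3"}, "P"),
    ({"A": "a3", "B": "b2", "C": "c2"}, "N"),
    ({"A": "a3", "B": "b2", "C": "c3"}, "P"),
    ({"A": "a1", "B": "b1", "C": "c2"}, "P"),
    ({"A": "a2", "B": "b2", "C": "c1"}, "N"),
]


def covers(rule, instance):
    """A rule is a dict of attribute tests; it covers an instance if all match."""
    return all(instance[attr] == value for attr, value in rule.items())


def learn_one_rule(examples):
    """Greedy general-to-specific search for one rule covering only positives."""
    rule = {}
    covered = list(examples)
    while any(label == "N" for _, label in covered):
        best = None
        unused = sorted({a for inst, _ in covered for a in inst} - set(rule))
        for attr in unused:
            for value in sorted({inst[attr] for inst, _ in covered}):
                subset = [(i, lbl) for i, lbl in covered if i[attr] == value]
                pos = sum(lbl == "P" for _, lbl in subset)
                score = (pos / len(subset), pos)  # precision, then coverage
                if pos and (best is None or score > best[0]):
                    best = (score, attr, value, subset)
        if best is None:  # no test left to add (e.g. inconsistent data)
            return None
        _, attr, value, covered = best
        rule[attr] = value
    return rule


def sequential_covering(examples):
    """Learn rules one at a time until no positive instance remains uncovered."""
    rules = []
    remaining = list(examples)
    while any(label == "P" for _, label in remaining):
        rule = learn_one_rule(remaining)
        if rule is None:
            break
        rules.append(rule)
        remaining = [(i, lbl) for i, lbl in remaining if not covers(rule, i)]
    return rules


print(sequential_covering(DATA))
```

On the six instances above, this sketch returns two rules, "IF A=a1 THEN X" and "IF C=c3 THEN X", which together cover all P instances and no N instance.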
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Example
-
Predicates:
-------------------------------
Father
Female
Mother
Male
Equal
-------------------------------
Variables:
-------------------------------
x, y, z
-------------------------------
Concept: GrandDaughter(x, y)
-
Data:
Instance
---------------------------------------------
Father(John, Tom) Father(Tom, Mary)
Father(Tom, Peter) Mother(Linda, Mary)
Female(Mary) Daughter(Mary, Tom)
Son(Tom, John) Male(Tom)
Male(Peter) Male(John)
GrandDaughter(Mary, John)
Grandson(Peter, John)
----------------------------------------------
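To see what a first-order rule over these facts looks like, the sketch below tests one candidate Horn clause against the data. The clause GrandDaughter(x, y) <- Father(y, z), Father(z, x), Female(x) is a hypothetical candidate chosen for illustration (it is not given in the notes), and Father(a, b) is read as "a is the father of b".

```python
# Ground facts from the example, stored as tuples (predicate, arg1, ...).
FACTS = {
    ("Father", "John", "Tom"), ("Father", "Tom", "Mary"),
    ("Father", "Tom", "Peter"), ("Mother", "Linda", "Mary"),
    ("Female", "Mary"), ("Daughter", "Mary", "Tom"),
    ("Son", "Tom", "John"), ("Male", "Tom"),
    ("Male", "Peter"), ("Male", "John"),
}

# Every constant that appears as an argument of some fact.
PEOPLE = sorted({arg for fact in FACTS for arg in fact[1:]})


def granddaughter_bindings(facts):
    """Return all (x, y) for which the candidate clause body is satisfied:
    Father(y, z) and Father(z, x) and Female(x), for some z."""
    return {
        (x, y)
        for x in PEOPLE
        for y in PEOPLE
        for z in PEOPLE
        if ("Father", y, z) in facts
        and ("Father", z, x) in facts
        and ("Female", x) in facts
    }


print(granddaughter_bindings(FACTS))
```

With these facts the only binding is x=Mary, y=John (via z=Tom), which matches the positive example GrandDaughter(Mary, John); Peter is not covered because Female(Peter) is not among the facts.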
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Induction as Inverted Deduction
Given training data D and background knowledge B, find a hypothesis h
such that for every (x, f(x)) in D, "B and h and x" entails f(x).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Comparison of Rule Learning Methods
(Fu and Shortliffe 2000):
- Symbolic heuristic search:
This method commonly uses general
to specific beam search or hill-climbing search. A performance
criterion based on accuracy and coverage
needs to be defined for evaluating and selecting rules.
However, this criterion is often ill-defined, especially in the
presence of noise, inconsistency, and uncertainty,
and there is no good theoretical guidance for global optimization.
- Decision trees:
In this approach, classification knowledge is first represented
as a decision tree and then
the tree is translated into a set of rules.
The decision tree is constructed by sequentially selecting
attributes based on an information-theoretic measure.
This approach has the advantage of speed,
but it searches incompletely through a complete
hypothesis space and is also sensitive to noise in the data.
- Inverted logic deduction:
In this approach, learning proceeds
by generating a hypothesis that, together with some background
knowledge, explains the given data.
This approach does not naturally handle noise, inconsistency,
and uncertainty. The search through
the hypothesis space is intractable
in the general case and increasingly complex with the amount
of background knowledge.
So far, there is no good solution to all of these problems together.
- Neural networks:
In this approach, a neural network learns a function to fit
the given data, and then the function is decoded as
a set of rules.
There is good theoretical support in function approximation,
but what remains to be solved
is how to extract correct rules from a trained neural network.
- Genetic Algorithms:
In this approach, each rule set is encoded as a bit string and
genetic search operators are applied to explore the hypothesis space.
The stochastic nature of the algorithm provides a means for
alleviating the local minima effect,
but the element of randomness may also introduce some degree
of imprecision. Experience has shown that this approach fails to
learn true domain rules even in domains that are not very complex.
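For the decision-tree item above, a common information-theoretic measure for selecting attributes is information gain (the reduction in label entropy after splitting on an attribute). The notes do not say which measure is meant, so treat this as one plausible choice. The sketch computes it for the attribute example (A, B, C with labels P/N); the function names are assumptions.

```python
from collections import Counter
from math import log2

# The six labeled instances from the attribute example.
DATA = [
    ({"A": "a2", "B": "b1", "C": "c1"}, "N"),
    ({"A": "a1", "B": "b1", "C": "c3"}, "P"),
    ({"A": "a3", "B": "b2", "C": "c2"}, "N"),
    ({"A": "a3", "B": "b2", "C": "c3"}, "P"),
    ({"A": "a1", "B": "b1", "C": "c2"}, "P"),
    ({"A": "a2", "B": "b2", "C": "c1"}, "N"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {inst[attr] for inst, _ in rows}:
        subset = [label for inst, label in rows if inst[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


for attr in ("A", "B", "C"):
    print(attr, round(information_gain(DATA, attr), 3))
```

On this data, A and C each give a gain of about 0.667 bits and B about 0.082 bits, so a greedy tree builder would split on A or C first and never reconsider that choice, which is the incomplete search noted above.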
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Data Mining Process
-
Data selection/Data sampling
-
Database operations: Project, Join, Select, denormalization
-
Cleaning: domain consistency, de-duplication, and disambiguation.
-
Enrichment: Extra useful information
-
Data coding: Transformation of data into a form processable by
data mining algorithms.
For instance,
- Address to region
- Birth date to age
- Continuous attributes to discrete values
-
Data Mining
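Two of the data coding steps listed above (birth date to age, continuous attributes to discrete values) can be sketched as follows. The function names, cut points, and bin labels are hypothetical and chosen only for illustration; real cut points depend on the application.

```python
from datetime import date


def age_on(birth_date, today):
    """Age in whole years on the given day."""
    years = today.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def discretize(value, cut_points, labels):
    """Map a continuous value to a label.

    cut_points must be sorted ascending and labels must have exactly
    one more entry than cut_points; a value below cut_points[i] (and
    not below any earlier cut) gets labels[i].
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


# Hypothetical age bins for the example.
CUTS = [18, 40, 65]
BINS = ["under-18", "18-39", "40-64", "65-plus"]

age = age_on(date(1990, 6, 15), date(2024, 6, 14))
print(age, discretize(age, CUTS, BINS))
```

Coding steps like these are applied to every record before the mining step, so that rule learners which expect discrete attribute values can use the data directly.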
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Data Mining Tools
- WHIZWHY: rule-based reasoning
- HUGIN: Bayesian reasoning
- DATA LOGIC/R: rough set theory
- NICEL: fuzzy logic
- MINDSET: visualization
- DARWIN: high performance computing
- SAS: statistical reasoning
- DataMind: data warehousing
- RECON: top-down and bottom-up data mining
- Intelligent Miner: multi-strategy
- IDIS: natural language reports
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Instance-Based Learning
-
It does not construct general, explicit target functions or hypotheses.
-
It stores the training examples.
-
Generalization is postponed until a new instance must be classified.
-
It is a kind of delayed or lazy learning strategy.
-
It constructs a local target function for each new instance
to be classified.
-
Advantages: when the target function is very complex
but still can be described by a collection of less
complex local approximations.
-
Advantages: when data are scarce.
-
Disadvantages: cost of classifying can be high.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Instance-Based Learning Methods
- k-nearest neighbor algorithm
- distance-weighted nearest neighbor algorithm
- locally weighted regression
- radial basis function networks
- case-based reasoning
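The first two methods in the list can be sketched in a few lines. This is a minimal illustration assuming numeric feature vectors, Euclidean distance, and inverse-squared-distance weights for the weighted variant; the function name and the sample points are made up for the example.

```python
from collections import defaultdict
from math import dist


def knn_classify(train, query, k=3, weighted=False):
    """Classify query by a vote among its k nearest training examples.

    train is a list of (point, label) pairs. With weighted=True each
    neighbor votes with weight 1 / distance**2 instead of 1.
    """
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = dist(point, query)
        if weighted:
            if d == 0:  # exact match: use its label directly
                return label
            votes[label] += 1.0 / d ** 2
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


# Hypothetical training points: one P close to the query, two N farther away.
TRAIN = [((0.0, 0.0), "P"), ((3.0, 0.0), "N"), ((3.0, 1.0), "N")]
QUERY = (0.5, 0.0)

print(knn_classify(TRAIN, QUERY, k=3))                 # plain majority vote
print(knn_classify(TRAIN, QUERY, k=3, weighted=True))  # distance-weighted vote
```

Here the plain 3-nearest-neighbor vote returns N (two votes to one), while the distance-weighted vote returns P because the single P example is much closer to the query. All the work happens at classification time, which is the classification cost mentioned earlier for lazy learners.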