Learning by Building Identification Trees
An identification tree is a decision tree in which each set of possible conclusions is established implicitly by a list of samples of known class.
The problem is to find a way to build a "suitable" tree to classify a given set of data.
There may be more than one tree that classifies the data correctly. Occam's Razor (specialized to identification trees):
The world is inherently simple. Therefore the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.
Example:
(the sunburn data)
| Name | Hair | Height | Weight | Lotion | Result |
|---|---|---|---|---|---|
| Sarah | Blonde | Average | Light | No | Sunburned |
| Dana | Blonde | Tall | Average | Yes | None |
| Alex | Brown | Short | Average | Yes | None |
| Annie | Blonde | Short | Average | No | Sunburned |
| Emily | Red | Average | Heavy | No | Sunburned |
| Pete | Brown | Tall | Heavy | No | None |
| John | Brown | Average | Heavy | No | None |
| Katie | Blonde | Short | Light | Yes | None |
Two different decision trees can be constructed that classify these data correctly; by Occam's Razor, the simpler one should be used.
Criteria for finding a suitable tree:
Finding the smallest tree is computationally expensive.
Instead, try to divide the database of samples into subsets in which many samples have the same classification, i.e. minimize disorder.
The means to measure disorder: entropy.
Average disorder = weighted average of the entropy in each branch:

$$\text{Average disorder} = \sum_{k=1}^{n} \frac{n_k}{n_t} H_k$$

where $n$ = number of branches, $n_k$ = number of samples in branch $k$, $n_t$ = number of samples in all branches, and $H_k$ = entropy of branch $k$. Within each branch,

$$H_k = -\sum_{i=1}^{c} \frac{n_{ki}}{n_k} \log_2 \frac{n_{ki}}{n_k}$$

where $c$ = number of classes and $n_{ki}$ = number of samples in branch $k$ belonging to class $i$. Substituting,

$$\text{Average disorder} = -\sum_{k=1}^{n} \frac{n_k}{n_t} \sum_{i=1}^{c} \frac{n_{ki}}{n_k} \log_2 \frac{n_{ki}}{n_k}$$
Remark: entropy increases with disorder.
For 2 classes, let $p_1 = 0$, $p_2 = 1$; then $H = -1 \log_2 1 - 0 \log_2 0 = 0$ (taking $0 \log_2 0 = 0$).
If $p_1 = 0.5$, $p_2 = 0.5$, then $H = 2 \times (-0.5 \log_2 0.5) = 1$.
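The two boundary cases above can be checked with a minimal entropy helper (a sketch; the function name is my own, and the $0 \log_2 0 = 0$ convention is applied by skipping zero probabilities):

```python
from math import log2

def entropy(probs):
    """H = -sum(p * log2(p)), with the convention 0 * log2(0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.0, 1.0]))  # homogeneous branch -> 0.0 (no disorder)
print(entropy([0.5, 0.5]))  # evenly mixed branch -> 1.0 (maximal disorder)
```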
Choose the test with the least average disorder.
In the previous example, the first-level test should be on hair:
| Test | Hair | Height | Weight | Lotion |
|---|---|---|---|---|
| Disorder | 0.5 | 0.69 | 0.94 | 0.61 |
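The disorder values in this table can be reproduced directly from the sample data (a sketch; the tuple encoding and function names are my own choices, with the rows taken from the sunburn table above):

```python
from math import log2
from collections import Counter

# (hair, height, weight, lotion, result) rows of the sunburn table
samples = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),  # Sarah
    ("Blonde", "Tall",    "Average", "Yes", "None"),       # Dana
    ("Brown",  "Short",   "Average", "Yes", "None"),       # Alex
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),  # Annie
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),  # Emily
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),       # Pete
    ("Brown",  "Average", "Heavy",   "No",  "None"),       # John
    ("Blonde", "Short",   "Light",   "Yes", "None"),       # Katie
]
ATTRS = {"Hair": 0, "Height": 1, "Weight": 2, "Lotion": 3}

def entropy(counts):
    """H_k = -sum(p_i * log2 p_i) over the class counts in one branch."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def average_disorder(samples, attr):
    """Weighted average of per-branch entropy for a test on `attr`."""
    branches = {}
    for s in samples:
        branches.setdefault(s[ATTRS[attr]], []).append(s[-1])  # group results
    return sum(len(b) / len(samples) * entropy(Counter(b).values())
               for b in branches.values())

for attr in ATTRS:
    print(attr, round(average_disorder(samples, attr), 2))
# Hair 0.5, Height 0.69, Weight 0.94, Lotion 0.61 -> split on Hair first
```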
To generate an identification tree:
Until each leaf node is populated by as homogeneous a sample set as possible:
select a leaf node with an inhomogeneous sample set
Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.
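The loop above can be sketched recursively in the greedy ID3 style (a sketch; the nested (test, branches) tuple encoding and helper names are my own choices, with the rows taken from the sunburn table above):

```python
from math import log2
from collections import Counter

# (hair, height, weight, lotion, result) rows of the sunburn table
SAMPLES = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]
ATTRS = {"Hair": 0, "Height": 1, "Weight": 2, "Lotion": 3}

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def average_disorder(samples, attr):
    branches = {}
    for s in samples:
        branches.setdefault(s[ATTRS[attr]], []).append(s[-1])
    return sum(len(b) / len(samples) * entropy(Counter(b).values())
               for b in branches.values())

def build_tree(samples, attrs):
    """Greedily replace inhomogeneous leaves by the least-disorder test."""
    classes = Counter(s[-1] for s in samples)
    if len(classes) == 1 or not attrs:   # homogeneous leaf, or no tests left
        return classes.most_common(1)[0][0]
    best = min(attrs, key=lambda a: average_disorder(samples, a))
    branches = {}
    for value in sorted({s[ATTRS[best]] for s in samples}):
        subset = [s for s in samples if s[ATTRS[best]] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    return (best, branches)

tree = build_tree(SAMPLES, list(ATTRS))
print(tree)
# ('Hair', {'Blonde': ('Lotion', {'No': 'Sunburned', 'Yes': 'None'}),
#           'Brown': 'None', 'Red': 'Sunburned'})
```

Note that on the sunburn data the first split is on hair, and the blonde branch is then split on lotion use, reaching disorder 0 everywhere.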
From Trees to Rules
It is simple to convert an identification tree into a set of rules.
For the previous example, the following rules can be obtained:

IF the person's hair is blonde
   AND the person uses lotion
THEN nothing happens

IF the person's hair is blonde
   AND the person uses no lotion
THEN the person turns red

IF the person's hair is red
THEN the person turns red

IF the person's hair is brown
THEN nothing happens
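This conversion is just a walk over root-to-leaf paths (a sketch; the identification tree is written out by hand in a nested (test, branches) tuple form chosen for illustration):

```python
def tree_to_rules(node, antecedents=()):
    """Emit one (antecedents, consequent) rule per root-to-leaf path."""
    if not isinstance(node, tuple):              # a leaf holds a conclusion
        return [(list(antecedents), node)]
    test, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, antecedents + ((test, value),)))
    return rules

# The identification tree for the sunburn data, written out by hand.
tree = ("Hair", {
    "Blonde": ("Lotion", {"Yes": "None", "No": "Sunburned"}),
    "Red":    "Sunburned",
    "Brown":  "None",
})
for antecedents, result in tree_to_rules(tree):
    print(antecedents, "->", result)
# 4 rules, one per leaf, e.g. [('Hair', 'Red')] -> Sunburned
```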
Rules can be simplified:
Ask whether any of the antecedents can be eliminated without changing what the rule does, or construct a contingency table.
In the previous example, two rules have two antecedents; consider the first one.
Try to eliminate the first antecedent ("hair is blonde"). The conclusion still holds, so the rule can be simplified to:
IF the person uses lotion
THEN nothing happens
Alternatively, construct the contingency table (over the samples that use lotion):
|  | No change | Sunburned |
|---|---|---|
| Blonde hair | 2 | 0 |
| Not blonde hair | 1 | 0 |
From this table, it can be observed that blonde hair has no effect on the "nothing happens" outcome.
Consider the other antecedent. The corresponding contingency table is:
|  | No change | Sunburned |
|---|---|---|
| Use lotion | 2 | 0 |
| Use no lotion | 0 | 2 |

Lotion clearly affects the outcome, so this antecedent must be kept.
Consider the other rule (blonde hair and no lotion imply sunburn). The corresponding contingency tables are:
|  | No change | Sunburned |
|---|---|---|
| Blonde hair | 0 | 2 |
| Not blonde hair | 2 | 1 |
|  | No change | Sunburned |
|---|---|---|
| Use lotion | 2 | 0 |
| Use no lotion | 0 | 2 |
Hence neither antecedent can be deleted.
Applying the same approach to the other rules, nothing further can be eliminated.
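The contingency-table check can be automated (a sketch; the helper and predicate names are my own choices, "No change" corresponds to a Result of None, and the rows come from the sunburn table above):

```python
# (hair, height, weight, lotion, result) rows of the sunburn table
SAMPLES = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]

def contingency(samples, pred, given):
    """2x2 table [[P & no change,     P & sunburned],
                  [not-P & no change, not-P & sunburned]]
    restricted to samples satisfying every predicate in `given`."""
    table = [[0, 0], [0, 0]]
    for s in samples:
        if all(g(s) for g in given):
            row = 0 if pred(s) else 1
            col = 0 if s[-1] == "None" else 1
            table[row][col] += 1
    return table

blonde = lambda s: s[0] == "Blonde"
lotion = lambda s: s[3] == "Yes"

# Rule: IF hair is blonde AND uses lotion THEN nothing happens.
print(contingency(SAMPLES, blonde, [lotion]))  # [[2, 0], [1, 0]] -> blonde is irrelevant
print(contingency(SAMPLES, lotion, [blonde]))  # [[2, 0], [0, 2]] -> lotion must be kept
```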
The rule set can be further simplified by using a default rule, i.e. a rule that is used only if no other rule applies.
The default rule should eliminate as many other rules as possible.
Either:

IF the person's hair is blonde
   AND the person uses no lotion
THEN the person turns red

IF the person's hair is red
THEN the person turns red

IF no other rule applies
THEN nothing happens

or:

IF the person uses lotion
THEN nothing happens

IF the person's hair is brown
THEN nothing happens

IF no other rule applies
THEN the person turns red
In the previous example, two rules indicate that the person turns red, while two rules indicate that nothing happens; either pair can be replaced by a default rule.
To break ties, choose the default rule that covers the most common consequent in the sample set, or the one that produces the simplest rule set (measured, for example, by the total number of antecedents).
To summarize, to generate rules from an identification tree:
Create one rule for each root-to-leaf path in the identification tree.
Simplify each rule by discarding antecedents that have no effect on the conclusion reached by the rule.
Replace those rules that share the most common consequent by a default rule that is triggered when no other rule is triggered. In the event of a tie, use some heuristic tie breaker to choose a default rule.
Significance of the evidence
Consider the following contingency table:
|  | R1 | R2 |
|---|---|---|
| P1 | l | m |
| P2 | n | o |
where R1 = presence of result R, R2 = absence of result R, P1 = presence of property P, and P2 = absence of property P.
If all the numbers are small, get rid of the antecedent rather than treat it as though it were solidly supported, because the evidence is not strong enough.
If the ratio l/m is the same, or nearly the same, as n/o, then P does not provide useful information: get rid of the antecedent.
If the numbers are large and l/m is very different from n/o, keep the antecedent.
To judge significance statistically, use Fisher's exact test (consult the book).
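For a 2×2 table, the test can be computed from scratch (a sketch; the function name is my own, and in practice `scipy.stats.fisher_exact` provides the same computation):

```python
from math import comb

def fisher_exact_2x2(l, m, n, o):
    """Two-sided Fisher exact p-value for the table [[l, m], [n, o]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = l + m, n + o
    col1, total = l + n, l + m + n + o

    def prob(a):  # P(top-left cell = a) with all margins fixed
        return comb(row1, a) * comb(row2, col1 - a) / comb(total, col1)

    p_obs = prob(l)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs + 1e-12)

# The lotion table among blondes: [[2, 0], [0, 2]]
print(round(fisher_exact_2x2(2, 0, 0, 2), 3))  # 0.333: too few samples to be significant
```

Even though l/m and n/o differ as much as possible here, the counts are so small that the association is not statistically significant, which is exactly the "all the numbers are small" caveat above.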