Learning by Building Identification Trees
An identification tree is a decision tree in which each set of possible conclusions is established implicitly by a list of samples of known class.
The problem is to find a way to build a "suitable" tree to classify a given set of data.
There may be more than one tree that classifies the data correctly. Occam's Razor (specialized to identification trees):
The world is inherently simple. Therefore the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.
Example:
(the sunburn data)
| Name | Hair | Height | Weight | Lotion | Result |
|---|---|---|---|---|---|
| Sarah | Blonde | Average | Light | No | Sunburned |
| Dana | Blonde | Tall | Average | Yes | None |
| Alex | Brown | Short | Average | Yes | None |
| Annie | Blonde | Short | Average | No | Sunburned |
| Emily | Red | Average | Heavy | No | Sunburned |
| Pete | Brown | Tall | Heavy | No | None |
| John | Brown | Average | Heavy | No | None |
| Katie | Blonde | Short | Light | Yes | None |
Two different decision trees can be constructed that classify these data correctly; by Occam's Razor, the simpler one should be used.
Criteria for finding a suitable tree:
Finding the smallest tree is computationally expensive.
Instead, try to divide the database of samples into subsets in which many samples have the same classification, i.e. minimize disorder.
The means to measure disorder: entropy.
Average disorder = weighted average of the entropy in each branch:

$$\text{Average disorder} = \sum_{k=1}^{n} \frac{n_k}{n_t} H_k$$

where $n$ = number of branches, $n_k$ = number of samples in branch $k$, $n_t$ = number of samples in all branches, and $H_k$ = entropy of branch $k$. Within each branch,

$$H_k = -\sum_{i=1}^{c} \frac{n_{ki}}{n_k} \log_2 \frac{n_{ki}}{n_k}$$

where $c$ = number of classes and $n_{ki}$ = number of samples in branch $k$ belonging to class $i$. Substituting,

$$\text{Average disorder} = -\sum_{k=1}^{n} \frac{n_k}{n_t} \sum_{i=1}^{c} \frac{n_{ki}}{n_k} \log_2 \frac{n_{ki}}{n_k}$$
Remark: entropy increases with disorder.
For 2 classes, let $p_1 = 0$, $p_2 = 1$; then $H = -1 \log_2 1 - 0 \log_2 0 = 0$ (taking $0 \log_2 0 = 0$).
If $p_1 = 0.5$, $p_2 = 0.5$, then $H = 2 \times (-0.5 \log_2 0.5) = 1$.
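The two boundary cases above can be checked with a minimal entropy helper (a sketch; the function name is my own, and the $0 \log_2 0 = 0$ convention is applied by skipping zero probabilities):

```python
from math import log2

def entropy(probs):
    """H = -sum(p * log2(p)), with the convention 0 * log2(0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.0, 1.0]))  # homogeneous branch -> 0.0 (no disorder)
print(entropy([0.5, 0.5]))  # evenly mixed branch -> 1.0 (maximal disorder)
```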
Choose the test with the least average disorder.
In the previous example, the first-level test should be on hair:
| Test | Hair | Height | Weight | Lotion |
|---|---|---|---|---|
| Disorder | 0.5 | 0.69 | 0.94 | 0.61 |
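The disorder values in this table can be reproduced directly from the sample data (a sketch; the tuple encoding and function names are my own choices, with the rows taken from the sunburn table above):

```python
from math import log2
from collections import Counter

# (hair, height, weight, lotion, result) rows of the sunburn table
samples = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),  # Sarah
    ("Blonde", "Tall",    "Average", "Yes", "None"),       # Dana
    ("Brown",  "Short",   "Average", "Yes", "None"),       # Alex
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),  # Annie
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),  # Emily
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),       # Pete
    ("Brown",  "Average", "Heavy",   "No",  "None"),       # John
    ("Blonde", "Short",   "Light",   "Yes", "None"),       # Katie
]
ATTRS = {"Hair": 0, "Height": 1, "Weight": 2, "Lotion": 3}

def entropy(counts):
    """H_k = -sum(p_i * log2 p_i) over the class counts in one branch."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def average_disorder(samples, attr):
    """Weighted average of per-branch entropy for a test on `attr`."""
    branches = {}
    for s in samples:
        branches.setdefault(s[ATTRS[attr]], []).append(s[-1])  # group results
    return sum(len(b) / len(samples) * entropy(Counter(b).values())
               for b in branches.values())

for attr in ATTRS:
    print(attr, round(average_disorder(samples, attr), 2))
# Hair 0.5, Height 0.69, Weight 0.94, Lotion 0.61 -> split on Hair first
```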
To generate an identification tree:
Until each leaf node is populated by as homogeneous a sample set as possible:
select a leaf node with an inhomogeneous sample set
Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.
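The loop above can be sketched recursively in the greedy ID3 style (a sketch; the nested (test, branches) tuple encoding and helper names are my own choices, with the rows taken from the sunburn table above):

```python
from math import log2
from collections import Counter

# (hair, height, weight, lotion, result) rows of the sunburn table
SAMPLES = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]
ATTRS = {"Hair": 0, "Height": 1, "Weight": 2, "Lotion": 3}

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def average_disorder(samples, attr):
    branches = {}
    for s in samples:
        branches.setdefault(s[ATTRS[attr]], []).append(s[-1])
    return sum(len(b) / len(samples) * entropy(Counter(b).values())
               for b in branches.values())

def build_tree(samples, attrs):
    """Greedily replace inhomogeneous leaves by the least-disorder test."""
    classes = Counter(s[-1] for s in samples)
    if len(classes) == 1 or not attrs:   # homogeneous leaf, or no tests left
        return classes.most_common(1)[0][0]
    best = min(attrs, key=lambda a: average_disorder(samples, a))
    branches = {}
    for value in sorted({s[ATTRS[best]] for s in samples}):
        subset = [s for s in samples if s[ATTRS[best]] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    return (best, branches)

tree = build_tree(SAMPLES, list(ATTRS))
print(tree)
# ('Hair', {'Blonde': ('Lotion', {'No': 'Sunburned', 'Yes': 'None'}),
#           'Brown': 'None', 'Red': 'Sunburned'})
```

Note that on the sunburn data the first split is on hair, and the blonde branch is then split on lotion use, reaching disorder 0 everywhere.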
From Trees to Rules
It is simple to convert an identification tree into a set of rules.
For the previous example, the following rules can be obtained:

IF the person's hair is blonde
   AND the person uses lotion
THEN nothing happens

IF the person's hair is blonde
   AND the person uses no lotion
THEN the person turns red

IF the person's hair is red
THEN the person turns red

IF the person's hair is brown
THEN nothing happens
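This conversion is just a walk over root-to-leaf paths (a sketch; the identification tree is written out by hand in a nested (test, branches) tuple form chosen for illustration):

```python
def tree_to_rules(node, antecedents=()):
    """Emit one (antecedents, consequent) rule per root-to-leaf path."""
    if not isinstance(node, tuple):              # a leaf holds a conclusion
        return [(list(antecedents), node)]
    test, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, antecedents + ((test, value),)))
    return rules

# The identification tree for the sunburn data, written out by hand.
tree = ("Hair", {
    "Blonde": ("Lotion", {"Yes": "None", "No": "Sunburned"}),
    "Red":    "Sunburned",
    "Brown":  "None",
})
for antecedents, result in tree_to_rules(tree):
    print(antecedents, "->", result)
# 4 rules, one per leaf, e.g. [('Hair', 'Red')] -> Sunburned
```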
Rules can be simplified:
Ask whether any of the antecedents can be eliminated without changing what the rule does, or construct a contingency table.
In the previous example, two rules have two antecedents; consider the first one.
Try to eliminate the first antecedent ("hair is blonde"). The conclusion still holds, so the rule can be simplified to:
IF the person uses lotion
THEN nothing happens
Alternatively, construct the contingency table (over the samples that use lotion):
|  | No change | Sunburned |
|---|---|---|
| Blonde hair | 2 | 0 |
| Not blonde hair | 1 | 0 |
From this table, it can be observed that blonde hair has no effect on the "nothing happens" outcome.
Consider the other antecedent. The corresponding contingency table is:
|  | No change | Sunburned |
|---|---|---|
| Use lotion | 2 | 0 |
| Use no lotion | 0 | 2 |

Lotion clearly affects the outcome, so this antecedent must be kept.
Consider the other rule (blonde hair and no lotion imply sunburn). The corresponding contingency tables are:
|  | No change | Sunburned |
|---|---|---|
| Blonde hair | 0 | 2 |
| Not blonde hair | 2 | 1 |
|  | No change | Sunburned |
|---|---|---|
| Use lotion | 2 | 0 |
| Use no lotion | 0 | 2 |
Hence neither antecedent can be deleted.
Applying the same approach to the other rules, nothing further can be eliminated.
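The contingency-table check can be automated (a sketch; the helper and predicate names are my own choices, "No change" corresponds to a Result of None, and the rows come from the sunburn table above):

```python
# (hair, height, weight, lotion, result) rows of the sunburn table
SAMPLES = [
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]

def contingency(samples, pred, given):
    """2x2 table [[P & no change,     P & sunburned],
                  [not-P & no change, not-P & sunburned]]
    restricted to samples satisfying every predicate in `given`."""
    table = [[0, 0], [0, 0]]
    for s in samples:
        if all(g(s) for g in given):
            row = 0 if pred(s) else 1
            col = 0 if s[-1] == "None" else 1
            table[row][col] += 1
    return table

blonde = lambda s: s[0] == "Blonde"
lotion = lambda s: s[3] == "Yes"

# Rule: IF hair is blonde AND uses lotion THEN nothing happens.
print(contingency(SAMPLES, blonde, [lotion]))  # [[2, 0], [1, 0]] -> blonde is irrelevant
print(contingency(SAMPLES, lotion, [blonde]))  # [[2, 0], [0, 2]] -> lotion must be kept
```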
The rule set can be further simplified by using a default rule, i.e. a rule that is used only if no other rule applies.
The default rule should eliminate as many other rules as possible.
Either:

IF the person's hair is blonde
   AND the person uses no lotion
THEN the person turns red

IF the person's hair is red
THEN the person turns red

IF no other rule applies
THEN nothing happens

or:

IF the person uses lotion
THEN nothing happens

IF the person's hair is brown
THEN nothing happens

IF no other rule applies
THEN the person turns red
In the previous example, two rules indicate that the person turns red, while two rules indicate that nothing happens; either pair can be replaced by a default rule.
To break ties, choose the default rule that covers the most common consequent in the sample set, or the one that produces the simplest rule set (measured, for example, by the total number of antecedents).
To summarize, to generate rules from an identification tree:
Create one rule for each root-to-leaf path in the identification tree.
Simplify each rule by discarding antecedents that have no effect on the conclusion reached by the rule.
Replace those rules that share the most common consequent by a default rule that is triggered when no other rule is triggered. In the event of a tie, use some heuristic tie breaker to choose a default rule.
Significance of the evidence
Consider the following contingency table:
|  | R1 | R2 |
|---|---|---|
| P1 | l | m |
| P2 | n | o |
where R1 = presence of result R, R2 = absence of result R, P1 = presence of property P, and P2 = absence of property P.
If all the numbers are small, get rid of the antecedent rather than treat it as though it were solidly supported, because the evidence is not strong enough.
If the ratio l/m is the same, or nearly the same, as n/o, then P does not provide useful information: get rid of the antecedent.
If the numbers are large and l/m is very different from n/o, keep the antecedent.
To judge significance statistically, use Fisher's exact test (consult the book).
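For a 2×2 table, the test can be computed from scratch (a sketch; the function name is my own, and in practice `scipy.stats.fisher_exact` provides the same computation):

```python
from math import comb

def fisher_exact_2x2(l, m, n, o):
    """Two-sided Fisher exact p-value for the table [[l, m], [n, o]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = l + m, n + o
    col1, total = l + n, l + m + n + o

    def prob(a):  # P(top-left cell = a) with all margins fixed
        return comb(row1, a) * comb(row2, col1 - a) / comb(total, col1)

    p_obs = prob(l)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs + 1e-12)

# The lotion table among blondes: [[2, 0], [0, 2]]
print(round(fisher_exact_2x2(2, 0, 0, 2), 3))  # 0.333: too few samples to be significant
```

Even though l/m and n/o differ as much as possible here, the counts are so small that the association is not statistically significant, which is exactly the "all the numbers are small" caveat above.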