U-
Reasoning with Uncertainty
Fuzzy Approach
Probabilistic Approach
Dempster Shafer Theory
Fuzzy Logic
Problem with the conventional "either black or white" approach.
"All people are bald!"
Introduced by L. A. Zadeh in 1965.
user formal mathematical tools to investigate problems pertaining to uncertainty, ambiguity and vagueness.
B.
Russell:
"All traditional logic habitually assumes that
precise symbols are being employed. It is therefore not applicable
to this terrestrial life but only to an imagined celestial
existence.
Principle of Incompatibility (Zadeh)
"As the complexity of a system increases, our ability to make precise and yet significant statements about its behaviour diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive.
Conventional Set theory: membership {0,1}
Fuzzy set theory: membership [0,1]
D. Dubois and H. Prade:
"The grade of membersehip reflect an ordering of the object in the universe induced by the predicate associated with fuzzy set A; this ordering, when it exists, is more important than the membership value itself.
In fact, we can define a fuzzy set on a lattice where only partial order exists 0 L-fuzzy set.
Operations of Fuzzy Sets
The set
operations of fuzzy sets resembles that of an ordinary set. Note
that ordinary sets are just special cases of a fuzzy set.
{0,1}
Í [0,1]
We require
the basic set operations to be consistent with ordinary set
operations when the operands happens to be ordinary set,
i.e.
membership mA(x)
= either 1 or 0.
If mA(x)
denotes the membership of x in A, then
mAÈB(x)
= max (mA(x),
mB(x))
mAÃB(x)
= min (mA(x),
mB(x))
m~A(x)
= 1 - (mA(x)
It can be proved that the max and min operators are the only operators f and g that satisfy the following requirement:
The
membership value of x in a compound fuzzy set AÈB,
AÃB
depends on the membership value of x in the elementary fuzzy sets
A, B that forms it, but not on anything else.
I.e. "x
ÎX, mAÈB(x)
= f (mA(x), mB(x))
mAÃB(x)
= g (mA(x), mB(x))
f and g are commutative, associative and mutually distributive operators.
f and g are
continuous and non-decreasing wrt each of their
arguments.
[Intuitively, the membership of x in AÈB
or AÃB
cannot decrease when the membership of x in A or B
increases. A small increase of mA(x)
cannot induce a large increase of mAÈB(x)
or mAÃB(x).]
f(u,u) and
g(u,u) are strictly increasing.
[If mA(x1)
= mB(x1) >
mA(x2) =
mB(x2),
then the membership of x1 in AÈB
or AÃB
is certainly straightly greater than that of x2,
Membership
in AÈB
requires more, and membership in AÃB
less than the membership in one of A or B.
i.e. "x
ÎX, mAÈB(x)
ú max (mA(x),
mB(x))
mAÃB(x)
³ min (mA(x),
mB(x))
Complete
membership in A and B implies complete membership in
AÃB.
Complete lack of membership in A and B implies
complete lack of membership in AÈB.
g(1,1)=1, f(0,0)=0
The above assumptions are consistent and sufficient to ensure the uniqueness of the choice of union and intersection. (max/min)
Furthermore,
A=B iff mA(x) = mB(x) "x ÎX
AÍB iff mA(x) ú mB(x) "x ÎX
Fuzzy Logic
Both set and logic are boolean algebras.
There is
correspondence between conventional set theory and conventional
logic,
set union 0 logical or
set
intersection 0 logical and
set
complement 0 logical not
if T(S) is
the truth value of the logical expression S, (with value between 0
and 1), then
T(not S) = 1 -
T(S)
T(S1 and S2) = min (T(S1),
T(S2))
T(S1 or S2) = max
(T(S1), T(S2))
The truth
value can even be another fuzzy set, which takes value between 0 and
1 0 Type 2 fuzzy set.
E.g. The truth
value can be the following set:
{very-true, true, rather true,
more-or-less-true, not-true}
where each of them a fuzzy set on
[0,1].
Linguistic Variable and Linguistic Hedges
A variable can takes absolute values, such as temperature (in Celsius), height (in m) and speed (in kph).
However, human seldom do reasoning with exact values. Instead, we use linguistic concepts such as hot, cold, short, tall, slow and fast for the above concept.
E.g.
Rule
1
IF Speed is slow
THEN Make the acceleration high
Rule
2
IF Temperature is low
AND Pressure is medium
THEN Make
the speed very slow
The variables that takes these "linguistic" values are called linguistic variables.
We can define the fuzzy membership values in these linguistic values (e.g. tall, short) from the absolute values (1.73m, 1.5m) via a function. [Note that the ordering is more important than the values].
Human may add additional vagueness by using an adverb such as very, somewhat, more-or-less etc.
hedges û model the use of such additional adverb from the original fuzzy set, by applying a function on the membership value.
For example, from the membership function of the fuzzy set tall, we can obtain the corresponding membership function of very tall. [The same applies for medium and short].
Example of Hedges
Concentration
(very)
mCON(A) (x) =
(mA(x))2
tall
persons 0 very tall persons.
Dilation
(somewhat)
mDIL(A) (x) =
(mA(x))0.5
tall
persons 0 more or less tall
person
Intensification
(indeed)
mINT(A) (x) =
2(mA(x))2 for 0 ú
mA(x) ú
0.5
mDIL(A) (x) = 1 -
2(1-(mA(x))2 for
0.5 < mA(x)
ú 1
tall persons 0
indeed tall person
power (very
very)
mPOW(A) (x) =
(mA(x))n
tall
person 0 very very tall person
(e.g. use n=3)
We can use
hedge and fuzzy operations to form new hedges, e.g.
From tall
person 0 not very tall person
mB
(x) = 1 - (mA(x))2
Or even not
very tall persons and not very short persons:
mC
(x) = [1 - (mX(x))2]
Ù [1 -
(mY(x))2]
Fuzzy Inference
fuzzy proposition: a logical expression returns a value between 0 and 1.
A fuzzy rule
relates two fuzzy propositions:
IF X is A THEN Y is B
e.g. IF
Temperature is normal THEN Velocity is medium
fuzzy inference attempts to establish a belief in a rule's conclusion given available evidence on the rule's premise.
The relationship between the premise and the conclusion is represented in the form of a fuzzy relation. (can be represented in a matrix form, for discrete case).
Use max-min
inference, where implication operator used is:
truth(ai
« bj) = min(ai,
bj)
Other
possible forms of implication:
Lukasiewicz implication min(1,
1-ai+bj)
Max product aibj
Actually,
a list of 14 implication was compiled in the book:
G J Klir &
B Yuan, "Fuzzy Sets & Fuzzy Logic", Prentice Hall
Example:
Consider
Normal temperature={0/100,0.5/125,1/150, 0.5/175,0/200}
Medium
Velocity={0/10, 0.6/20, 1/30, 0.6/40, 0/50}
We can form the fuzzy relation matrix M=[mij]
|
|
100 |
125 |
150 |
175 |
200 |
|
---|---|---|---|---|---|---|---|
10 |
|
min(0,0) |
Min(0.5,0) |
Min(1,0) |
Min(0.5,0) |
min(0,0) |
|
20 |
|
Min(0,0.6) |
min(0.5,0.6) |
Min(1,0.6) |
min(0.5,0.6) |
Min(0,0.6) |
|
30 |
|
Min(0,1) |
Min(0.5,1) |
min(1,1) |
Min(0.5,1) |
Min(0,1) |
|
40 |
|
Min(0,0.6) |
Min(0.5,0.6) |
Min(1,0.6) |
min(0.5,0.6) |
Min(0,0.6) |
|
50 |
|
min(0,0) |
Min(0.5,0) |
Min(1,0) |
Min(0.5,0) |
min(0,0) |
|
or,
|
0 |
0 |
0 |
0 |
0 |
|
---|---|---|---|---|---|---|
|
0 |
0.5 |
0.6 |
0.5 |
0 |
|
|
0 |
0.5 |
1 |
0.5 |
0 |
|
|
0 |
0.5 |
0.6 |
0.5 |
0 |
|
|
0 |
0 |
0 |
0 |
0 |
|
Assume that
now the fuzzy input temperature is modelled by the following
fuzzy set A':
A'={0/100, 0.5/125, 0/150, 0/175,0/200}
The output fuzzy set for velocity can then be obtained by using max-min matrix multiplication:
|
0 |
0 |
0 |
0 |
0 |
|
|
|
0 |
|
|
|
0 |
|
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
0 |
0.5 |
0.6 |
0.5 |
0 |
|
|
|
0.5 |
|
|
|
0.5 |
|
|
0 |
0.5 |
1 |
0.5 |
0 |
|
|
|
0 |
|
= |
|
0.5 |
|
|
0 |
0.5 |
0.6 |
0.5 |
0 |
|
|
|
0 |
|
|
|
0.5 |
|
|
0 |
0 |
0 |
0 |
0 |
|
|
|
0 |
|
|
|
0 |
|
The max-min matrix multiplication is obtained by doing a normal matrix multiply but replacing multiply by min and addition by max.
The
resultant velocity fuzzy set will then be
{0/10, 0.5/20, 0.5/30,
0.5/40, 0/50}
Multiple-Premise Rules
Most rules
are of the form:
IF A AND B THEN C.
The following rule is proposed by Kosko (1992):
For each premise A & B, construct the corresponding matrix MAC and MBC.
Compute
independently by max-min matrix multiplication:
CA'
= A' . MAC,
CB' = B' . MBC
If the
premises are connected by "AND", then
C' = [A' . MAC]
Ù [B' . MBC] = CA'
Ù CB'
If the premises
are connected by "OR" then
C' = [A' . MAC]
Ú [B' . MBC] = CA'
Ú CB'
Defuzzification
The result of fuzzy inference is a fuzzy set. However, we need a discrete value so that the system knows what to do.
For example, we need a single velocity value, instead of a fuzzy set of velocity in the previous example.
The most popular defuzzification technique is the fuzzy centroid.
Multiple Fuzzy Rules
More than one rule may give value to the same attribute.
e.g.
IF temperature is normal
OR Pressure is low
THEN Velocity
is medium
IF temperature
is normal
AND Pressure is normal
THEN Velocity is low
Apply the previous inference process to each of the rules, and then compose the resultant fuzzy set together as shown below:
Building
a Fuzzy expert system
Define the system
Define the linguistic variables
Define the fuzzy sets
Define the fuzzy rules
Build the system.
Examples of Fuzzy Rules
Rules for car control:
Rule 1 - Brake lightly
IF error_angle
is large_positive
AND speed is fast
THEN make acceleration
brake_light
Rule 2 - Brake lightly
IF error_angle
is large_negative
AND speed is fast
THEN make acceleration brake_light
Rule 3 - Floor it
IF distance
is far
AND speed is NOT VERY fast
THEN make accleration floor_it
Rule 4 - Set acceleration
IF distance is far
AND speed is VERY fast
THEN make acceleration zero
Rule 5 - Slight acceleration
IF distance
is medium
AND speed is NOT fast
THEN make acceleration
slight_acceleration
Probabilistic Approach
Pearl's Probabilistic Reasoning in Bayesian Networks
Bayesian networks are directed acyclic graphs (i.e. graphs with no cycle) in which the nodes represent propositions and the arcs represent causal relations between the propositions.
The strength of these causal relations are quantified by means of conditional probabilities.
Bayesian Network
By
probability theory,
P(A,B,C,D,E) =
P(E|A,B,C,D).P(D|A,B,C).P(C|A,B).P(B|A).P(A)
according to
the explicitly shown dependency in the Bayesian
network,
P(A,B,C,D,E)=P(E|C) . P(D|A,C) . P(C|A) . P(B|A) . P(A)
independency
checking:
If all paths between nodes xi and xj
are blocked by a subset S of variables, then xi is
independent of xj given the values of the variables in S.
A Bayesian network can be constructed incrementally by the domain expert expressing the casual knowledge relating the objects in the domain.
Addition of new nodes: need to add both links and conditional probabilities.
Prospector's Subjective Bayesian Approach
used in the Prospector expert system by Duda et. al.
the uncertainty associated by the expert with a fact in the knowledge base is modeled by subjective probability.
expresses
the expert's degree of belief on the inference by an ordered
pair (LS, LN).
LS - degree of
sufficiency
LN - degree of
necessity
that condition Ì
conclusion.
First obtain the probability of the condition part by combining the probability of individual components
Rule E «
H (E: evidence, H: hypothesis)
By Bayes
Rule,
Similarly,
Combining (1) & (2):
Denote to
be the prior odd on H.
the posterior
odd on H given
that E is true, and
the
likelihood ratio
then,
Intuitively, LS can be said to express the sufficiency degree of the rules since a high value of LS will increase the probability of H provided that E is true.
In Prospector, the human expert is asked to provide the value of LS together with the rules.
Similarly,
We
obtain
where
LN is alos provided by the expert
intuitively it expresses the degree of necessity of the rule since a small value of LN will decrease the probability of H when E is false.
It can be shown that
\ if the truth of E increases the probability of H (LS > 1), then the falsity of E has a negative effect on H (LN < 1).
Management of uncertain evidence in Prospector
user: "I
am 80% sure that E is true"
i.e. P(E|relevant knowledge) =
0.8;
or P(E|E') = 0.8
then
if E is
already known, then E' does not provide additional information, i.e.
Hence
where is computed as in last section.
Mycin's Certainty Factor
the certainty factor (CF) represents a change in the belief of the hypothesis given some evidence.
CF lies
between -1 and +1.
+ : increase in
belief of the hypothesis
- : decrease
in belief of the hypothesis
Logical
combination of evidence in premise:
OR - max; AND - min; NOT -
1-x
Propagation of Cfs through the network:
parallel: several pieces of evidence supporting the same hypothesis.
sequential: hypothesis supporting other hypothesis (rule chaining).
Parallel
Combination
Sequential
Combination
Dempster-Shafer Theory of Evidence
A sample
space called "frame of discernment".
e.g. pneumonia
diagnosis: 5 organisms:
{Pneumococcus, Legionella, Klebsiella,
Chlamydia, Mycoplasma}
These 5 hypothesis are the frame of
discernment X.
Any subset
of X is itself a new hypothesis which is a disjunction (logical-or)
of the basic hypothesis in X.
\no. of
possible hypothesis = 2|X|
we can assign a probability mass, called basic probability assignment m(), to every subset of X, which satisfies:
m(H) indicates that portion of the total belief exactly commited to hypothesis H, given a piece of evidence.
Belief may
also be based on indirect support:
belief function
e.g. diagnosis of {Pneumococcus, Legionella} degree
0.4
Additional evidence support Pneumococcus degree 0.3
Legionella degree 0.2
Then BEL({Penumococcus, Legionella}) =
0.4 + 0.3 + 0.2 = 0.9
Obviously, BEL(A) + BEL(B) ¹ BEL(AÈB) when AÃB=ã.
This
non-additivity captures the fact that the knowledge of the degree of
belief in A does not necessarily give us information about the
degree of belief of .
The belief
in measures the doubt on A.
\
define
called the plausibility function.
this
represent the extent to which the evidence does not rule out A,
i.e.
It is easy to verify
The interval
[BEL(H), PL(H)] gives the range of degree of belief for H that are
consistent with the evidence.
(or the degree to which the
evidence fails to discriminate between H and ~H).
e.g. if
then total ignorance,
and
Evidence Combination
Two different sources provide 2 basic probability assignment m1 and m2.
To obtain a
combined basic prob. assignment m12 = m1 Å
m2.
where
The 1-K is for normalization
Example:
Suppose one source of evidence supports a bacteria pneumonia ({Pneumococcus, Legionella, Klebsiella}) to a degree 0.3, wheareas another disconfirms Pneumomococcus degree 0.6.
Frame of
discernment X = {Pneumococcus, Legionella, Klebsiella,
Chlamydia,
Mycoplasma}
m1(X) = 0.7
m1({Pneumococcus, Legionella, Klebsiella}) = 0.3
m2({Legionella, Kelbsiella, Chlamydia, Mycoplasma}) = 0.6
m2(X) = 0.4
then m12(X)
= 0.28 (K=0)
m12({Legionella, Klebsiella, Chlamydia,
Mycoplasma})=0.42
m12({Legionella, Klebsiella})=0.18
m12({Pneumococcus, Legionella, Klebsiella})=0.12
and m12=0 for the remaining subsets of X.
\ BEL12({Peumococcus, Legionella, Klebsiella})=0.18+0.12=0.3
BEL12({Legionella,
Klebsiella, Chlamydia, Mycoplasma})
= 0.42+0.18 = 0.6
BEL12({Legionella, Klebsiella}) = 0.18
BEL12({Legionella,
Klebsiella, Chlamydia}) =
BEL12({Legionella, Klebsiella,
Mycoplasma}) = 0.18
BEL12({Pneumococcus,
Legionella, Klebsiella, Chlamydia}) =
BEL12({Penumococcus,
Legionella, Klebsiella, Mycoplasma}) =
BEL12({Pneumococcus,
Legionella, Klebsiella}) = 0.30
BEL12({Pneumococcus,
Legionella, Klebsiella, Chlamydia,
Mycoplasma}) = 1.0
and the remaining subsets of X have a belief = 0
Now, assume a 3rd source of evidence confirms the diagnosis of Pneumococcus with degree 0.7.
\ m3({Pneumococcus}) = 0.7
m3(X) = 0.3
which when combined with m12's:
K=0.18 ´ 0.7 ´ + 0.42 ´ 0.7 = 0.42
\ 1-K = 0.58
m123({Legionella, Klebsiella}) = 0.3 ´ 0.18 / 0.58 = 0.093
m123({Pneumococcus}) = (0.7´0.12 + 0.7´0.28)/0.58=0.483
m123({Pneumococcus,
Legionella, Klebsiella})=0.12´0.3/0.58
= 0.062
m123({Legionella,
Klebsiella, Chlamydia, Mycoplasma}) =
0.42 ´
0.3 / 0.58 = 0.217
m123(X) = 0.28 ´ 0.3 / 0.58 = 0.145
m123 is 0 for the remaining subset of X.