%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Prob(cj | x) > Prob(ci | x)   for all i ≠ j

Here, the posterior probability is used as the discriminant function. An alternative criterion for minimum-error-rate classification is to choose class cj so that

Prob(x | cj)Prob(cj) > Prob(x | ci)Prob(ci)   for all i ≠ j

which follows from the well-known Bayes' theorem:

              Prob(x | c)Prob(c)
Prob(c | x) = -------------------
                   Prob(x)

Because Prob(x) is the same for every class, it can be dropped from the comparison. Note that a risk factor can also be incorporated into the discriminant function.
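The two criteria above can be compared numerically. The following is a minimal sketch, not a definitive implementation: the function name `classify`, the class labels, and all probability values are hypothetical and chosen only for illustration.

```python
def classify(priors, likelihoods):
    """Return the class maximizing Prob(x | c) * Prob(c), plus all posteriors.

    priors:      dict mapping class -> Prob(c)
    likelihoods: dict mapping class -> Prob(x | c) for the observed x
    """
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(scores.values())  # Prob(x), by total probability
    posteriors = {c: s / evidence for c, s in scores.items()}
    best = max(scores, key=scores.get)
    return best, posteriors


# Hypothetical two-class example (numbers are made up).
priors = {"c1": 0.7, "c2": 0.3}
likelihoods = {"c1": 0.2, "c2": 0.6}
best, posteriors = classify(priors, likelihoods)
print(best, posteriors)
```

Dividing every score by the same Prob(x) does not change which class is largest, so both criteria select the same class.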
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                    0.01 * 0.9 * 0.9 * (1 - 0.8)
Probability ratio = ---------------------------- = 0.08
                    0.05 * 0.5 * 0.9 * (1 - 0.1)

So flu is more than ten times as likely as pneumonia (1 / 0.08 = 12.5).
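The arithmetic in the ratio above can be checked with a few lines of Python. This is only a sketch: the list names are hypothetical, and the factors are copied from the ratio without assuming which symptom each one stands for.

```python
from math import prod

# Factors copied from the ratio above; the list names are illustrative only.
numerator_factors = [0.01, 0.9, 0.9, 1 - 0.8]
denominator_factors = [0.05, 0.5, 0.9, 1 - 0.1]

ratio = prod(numerator_factors) / prod(denominator_factors)
times_more_likely = 1 / ratio  # how much more likely the denominator hypothesis is

print(round(ratio, 4), round(times_more_likely, 2))
```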
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
P(ai1 = wj | ck) = P(ai2 = wj | ck)
nc + mp
-------
 n + m
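The expression above has the form of an m-estimate of probability. The sketch below assumes the usual reading of the symbols, which this excerpt does not define: nc is the number of observed occurrences, n the total number of observations, p a prior estimate of the probability, and m an equivalent sample size. The function name `m_estimate` is hypothetical.

```python
def m_estimate(n_c, n, p, m):
    """Smoothed probability estimate (n_c + m*p) / (n + m).

    Assumed meanings: n_c = occurrences, n = total observations,
    p = prior estimate, m = equivalent sample size.
    """
    if n + m <= 0:
        raise ValueError("n + m must be positive")
    return (n_c + m * p) / (n + m)


# With no data, the estimate falls back to the prior p.
print(m_estimate(0, 0, 0.25, 4))   # 0.25
# With m = 0, it reduces to the raw frequency n_c / n.
print(m_estimate(3, 10, 0.25, 0))  # 0.3
```

Larger values of m pull the estimate toward p, which keeps rarely seen events from being assigned a probability of zero.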
P(ck) ∏ P(wj | ck)
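The product above can be evaluated for each class and the largest value taken as the classification. The following is a minimal sketch that assumes the product runs over the observed words wj. It sums logarithms rather than multiplying probabilities, a common way to avoid numerical underflow on long inputs. The class labels, words, and probabilities are hypothetical.

```python
import math


def log_score(prior, word_probs, words):
    """Logarithm of P(ck) * product over j of P(wj | ck), for one class."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)


def classify(priors, cond_probs, words):
    """Pick the class with the largest P(ck) * product of P(wj | ck)."""
    return max(priors, key=lambda c: log_score(priors[c], cond_probs[c], words))


# Hypothetical classes and word probabilities, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
cond_probs = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.05, "meeting": 0.30},
}
label = classify(priors, cond_probs, ["free", "free", "meeting"])
print(label)
```

A zero conditional probability would make `math.log` fail, which is one reason the probabilities are usually smoothed before use.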
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
P(x1, ..., xl | y1, ..., ym, z1, ..., zn) = P(x1, ..., xl | z1, ..., zn)

[NOTE:] In the network, each node is asserted to be conditionally independent of its nondescendants, given its immediate parents.
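The conditional independence stated above can be verified numerically on a small case. The sketch below assumes a hypothetical three-node chain Y -> Z -> X with made-up probabilities. Z is the only parent of X and Y is a nondescendant of X, so P(x | y, z) should equal P(x | z).

```python
from itertools import product

# Hypothetical chain network Y -> Z -> X; all numbers are made up.
p_y = {True: 0.3, False: 0.7}
p_z_given_y = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_x_given_z = {True: {True: 0.6, False: 0.4}, False: {True: 0.1, False: 0.9}}

# Joint distribution implied by the network factorization.
joint = {
    (x, y, z): p_y[y] * p_z_given_y[y][z] * p_x_given_z[z][x]
    for x, y, z in product([True, False], repeat=3)
}


def p_x_given_yz(x, y, z):
    """P(x | y, z), computed from the joint distribution."""
    return joint[(x, y, z)] / sum(joint[(xx, y, z)] for xx in (True, False))


def p_x_given_z_only(x, z):
    """P(x | z), computed from the joint by summing out y."""
    num = sum(joint[(x, yy, z)] for yy in (True, False))
    den = sum(joint[(xx, yy, z)] for xx in (True, False) for yy in (True, False))
    return num / den


# Check that conditioning on y adds nothing once z is known.
for x, y, z in product([True, False], repeat=3):
    assert abs(p_x_given_yz(x, y, z) - p_x_given_z_only(x, z)) < 1e-12
```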
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
P(V|E) = a * P(E|V)P(V)

where P is the probability function and a is a normalizing factor. Normalization makes the probabilities of all exhaustive and mutually exclusive values sum to one. To see how information propagates, consider an example. In a simple network, suppose variable A is a causal variable connected to both variables B and C. The belief value of A can propagate to derive that of B by

P(B) = P(B|A)P(A) + P(B|not A)P(not A)

and

P(not B) = P(not B|A)P(A) + P(not B|not A)P(not A)

with subsequent normalization to ensure

P(B) + P(not B) = 1

The same is true for deriving the belief value of C. The link pointing from node A to node B is characterized by the conditional probabilities P(B|A), P(B|not A), P(not B|A), and P(not B|not A), which are often represented as a matrix. The message passed from a parent node to a child node is called a pi message. For example, the pi message sent from node A to node B consists of P(A) and P(not A) modulated by the conditional probability matrix. This illustrates forward information propagation.

Suppose information E1 arrives at node B. The probability of B is updated by Bayes' theorem:

P(B|E1) = a1 * P(E1|B)P(B)

and

P(not B|E1) = a1 * P(E1|not B)P(not B)

where a1 is the normalizing factor that makes the two values sum to one. This information can propagate to node A using the relation

P(E1|A) = P(E1|B)P(B|A) + P(E1|not B)P(not B|A)

and

P(E1|not A) = P(E1|B)P(B|not A) + P(E1|not B)P(not B|not A)

Then the probability of A is updated by Bayes' theorem. The message passed from a child node to a parent node is called a lambda message. For example, the lambda message received at node A from node B consists of P(E1|A) and P(E1|not A). Information propagates backwards this time.

The same evidence should not be used more than once in updating the belief value at the same node. For example, suppose new information E2 arrives at node C. This information is passed backwards to node A, and in turn passed forward to node B. To avoid counting information E1 twice at node B, the pi message sent from node A to node B at this point should be divided by the lambda message (on a term-by-term basis) sent from node B to node A earlier.
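The forward and backward propagation steps described above can be traced on a two-node fragment A -> B. The following is a minimal sketch with hypothetical numbers and variable names. It covers the pi message from A to B, the update of B on evidence E1, the lambda message back to A, and the term-by-term division that removes E1 from a later pi message.

```python
def normalize(values):
    """Scale a list of non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


# All numbers are hypothetical. Vectors are ordered [state, not state].
p_a = [0.3, 0.7]                 # P(A), P(not A)
p_b_given_a = [[0.8, 0.2],       # row A:     P(B|A),     P(not B|A)
               [0.1, 0.9]]       # row not A: P(B|not A), P(not B|not A)
p_e1_given_b = [0.9, 0.2]        # P(E1|B), P(E1|not B)

# Forward (pi) propagation: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = normalize([
    sum(p_b_given_a[i][j] * p_a[i] for i in range(2)) for j in range(2)
])

# Evidence E1 arrives at B: P(B|E1) is proportional to P(E1|B)P(B).
p_b_given_e1 = normalize([p_e1_given_b[j] * p_b[j] for j in range(2)])

# Backward (lambda) message: P(E1|A) = P(E1|B)P(B|A) + P(E1|not B)P(not B|A).
lambda_b_to_a = [
    sum(p_e1_given_b[j] * p_b_given_a[i][j] for j in range(2)) for i in range(2)
]

# Update A by Bayes' theorem: P(A|E1) is proportional to P(E1|A)P(A).
p_a_given_e1 = normalize([lambda_b_to_a[i] * p_a[i] for i in range(2)])

# Later pi message from A to B: divide out B's own lambda message term by
# term, so that E1 is not counted twice at B.
pi_a_to_b = normalize([p_a_given_e1[i] / lambda_b_to_a[i] for i in range(2)])

print(p_b, p_b_given_e1, lambda_b_to_a, p_a_given_e1, pi_a_to_b)
```

In this fragment no other evidence has reached A, so dividing by the lambda message recovers the original P(A) and P(not A). With evidence E2 arriving from node C, the same division would leave only the contribution of E2 in the message sent to B.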
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%