MACHINE LEARNING
1 Duality (40 points)
In class, we have talked about the maximum entropy model. For learning the posterior probabilities $\Pr(y|x) = p(y|x)$ for $y = 1, \dots, K$ given a set of training examples $(x_i, y_i)$, $i = 1, \dots, n$, we can maximize the entropy of the posterior probabilities subject to a set of constraints, i.e.,

$$
\begin{aligned}
\max_{p(y|x_i)} \quad & -\sum_{i=1}^{n} \sum_{y=1}^{K} p(y|x_i) \log p(y|x_i) \\
\text{s.t.} \quad & \sum_{y=1}^{K} p(y|x_i) = 1, \quad i = 1, \dots, n, \\
& \frac{1}{n} \sum_{i=1}^{n} p(y|x_i)\, f_j(x_i) = \frac{1}{n} \sum_{i=1}^{n} \delta(y, y_i)\, f_j(x_i), \quad j = 1, \dots, d, \; y = 1, \dots, K,
\end{aligned}
\tag{1}
$$

where $\delta(y, y_i) = 1$ if $y_i = y$ and $0$ otherwise, and $f_j(x)$ is a feature function. Let us consider $f_j(x_i) = [x_i]_j$, i.e., the $j$-th coordinate of $x_i$.
Derive the dual of the above Maximum Entropy Model. How is this dual problem related to logistic regression?
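A minimal numerical sketch for checking a derived dual, assuming the standard result that the optimal posterior has the softmax form $p(y|x) \propto \exp(\lambda_y^\top x)$ and that the dual amounts to maximizing the multinomial logistic-regression log-likelihood in the multipliers $\lambda_y \in \mathbb{R}^d$. There is no bias term because $f_j(x) = [x]_j$. The helper names `softmax_posteriors`, `fit_dual` and `check_duality`, the synthetic Gaussian features, the random labels (0-indexed here), the step size and the iteration count are all hypothetical choices made for illustration. The script fits $\lambda$ by plain gradient ascent. It then reports how far the fitted posteriors are from satisfying the constraints in (1), and it compares the primal entropy with the negative log-likelihood.

```python
import numpy as np


def softmax_posteriors(lam, X):
    """p(y | x_i) proportional to exp(lam_y . x_i); lam is (K, d), X is (n, d)."""
    scores = X @ lam.T                            # (n, K) scores lam_y . x_i
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)  # each row sums to 1


def fit_dual(X, y, K, step=0.5, iters=5000):
    """Gradient ascent on the average multinomial log-likelihood (assumed dual form)."""
    n, d = X.shape
    Y = np.eye(K)[y]              # one-hot labels: Y[i, k] = delta(k, y_i)
    lam = np.zeros((K, d))        # one multiplier vector lam_y per class
    for _ in range(iters):
        P = softmax_posteriors(lam, X)
        # Gradient w.r.t. lam_y: (1/n) sum_i (delta(y, y_i) - p(y|x_i)) x_i,
        # i.e. the (negated) residual of the feature constraints in (1).
        grad = (Y - P).T @ X / n
        lam += step * grad
    return lam


def check_duality(seed=0, n=60, d=2, K=3):
    """Fit the assumed dual on synthetic data and compare primal and dual quantities."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))       # hypothetical features; f_j(x_i) = [x_i]_j
    y = rng.integers(0, K, size=n)    # hypothetical labels in {0, ..., K-1}
    lam = fit_dual(X, y, K)
    P = softmax_posteriors(lam, X)
    Y = np.eye(K)[y]
    # Largest violation of (1/n) sum_i p(y|x_i) x_i = (1/n) sum_i delta(y, y_i) x_i
    residual = np.abs((P - Y).T @ X / n).max()
    entropy = -np.sum(P * np.log(P))                      # primal objective in (1)
    neg_log_lik = -np.sum(np.log(P[np.arange(n), y]))     # logistic-regression loss
    return residual, entropy, neg_log_lik


if __name__ == "__main__":
    residual, entropy, nll = check_duality()
    print(f"max constraint residual: {residual:.2e}")
    print(f"primal entropy: {entropy:.6f}   negative log-likelihood: {nll:.6f}")
```

The gradient of the log-likelihood with respect to $\lambda_y$ is exactly the constraint residual $\frac{1}{n}\sum_i \left(\delta(y, y_i) - p(y|x_i)\right) x_i$, so driving it to zero is the same as satisfying the feature constraints. The normalization constraint holds automatically because of the softmax form. If the dual has been derived correctly, the printed entropy and negative log-likelihood should agree up to floating-point error.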