Multiplicative Weight Update Method

The multiplicative weight update method is a meta-algorithm: an algorithmic technique which "maintains a distribution on a certain set of interest, and updates it iteratively by multiplying the probability mass of elements by suitably chosen factors based on feedback obtained by running another algorithm on the distribution".[1] It was discovered repeatedly in very diverse fields such as machine learning (AdaBoost, Winnow, Hedge), optimization (solving LPs), theoretical computer science (devising fast algorithms for LPs and SDPs), and game theory.

Name

"Multiplicative weights" implies the iterative rule used in algorithms derived from the Multiplicative Weight Update Method.[2] It is given with different names in the different fields where it was discovered or rediscovered.

Background

An algorithm named "fictitious play", similar to the multiplicative weight update method, was proposed in game theory in the early 1950s. Grigoriadis and Khachiyan[3] applied a randomized variant of fictitious play to solve two-player zero-sum games efficiently using the multiplicative weights algorithm: the player allocates higher weights to the actions that had a better outcome and chooses his strategy relying on these weights. In machine learning, Littlestone applied the earliest form of the multiplicative weights update rule in his famous Winnow algorithm, which is similar to Minsky and Papert's earlier perceptron learning algorithm.

Motivation

A binary decision needs to be made based on n experts' opinions to attain an associated payoff. In the first round, all experts' opinions have the same weight, and the decision maker makes the first decision based on the majority of the experts' predictions. Then, in each successive round, the decision maker updates the weight of each expert's opinion depending on the correctness of his prior predictions.

Algorithm Analysis

Weighted Majority Algorithm[4][5]

"Given the setting in prediction from expert advice. Denote the experts as . Initialize the set of experts who have not made a mistake to .

   Initialization: 
      Fix an ≤1/2. For each expert, associate the weight ≔1.
   For  = , ,…,
      1. Make the prediction that is the weighted majority of the experts’ predictions based on the weights . That is, making a binary choice depending on which prediction has a higher total weight of experts advising it (breaking ties arbitrarily). 
      2. For every expert i who predicts wrongly, decrease his weight for the next round by multiplying it by a factor of (1-η):
                                                             =            (update rule)
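
The update rule above can be implemented in a few lines. The following is a minimal Python sketch of the weighted majority algorithm under the assumption that predictions and outcomes are binary values in {0, 1}; the function name weighted_majority and its streaming interface are illustrative, not taken from the source.

    # Minimal sketch of the weighted majority algorithm (assumptions: binary
    # predictions/outcomes in {0, 1}; eta <= 1/2). Names are illustrative.
    def weighted_majority(expert_predictions, outcomes, eta=0.5):
        """expert_predictions: list of T rounds, each a list of n binary predictions.
        outcomes: list of T true binary outcomes. Returns the number of mistakes made."""
        n = len(expert_predictions[0])
        weights = [1.0] * n                    # w_i^1 := 1 for every expert
        mistakes = 0
        for preds, outcome in zip(expert_predictions, outcomes):
            # Weighted vote: total weight behind predicting 1 vs. predicting 0.
            weight_for_one = sum(w for w, p in zip(weights, preds) if p == 1)
            weight_for_zero = sum(weights) - weight_for_one
            decision = 1 if weight_for_one >= weight_for_zero else 0
            if decision != outcome:
                mistakes += 1
            # Multiplicative update: penalize every expert who predicted wrongly.
            weights = [w * (1 - eta) if p != outcome else w
                       for w, p in zip(weights, preds)]
        return mistakes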

Analysis

After $T$ steps, let $m_i^T$ be the number of mistakes of expert $i$ and $M^T$ be the number of mistakes our algorithm has made. Then we have the following bound for every expert $i$:

                             $M^T \le 2(1+\eta)\, m_i^T + \frac{2\ln n}{\eta}$.

In particular, this holds for the best expert $i$, that is, the one with the least $m_i^T$.

Multiplicative Weights Algorithm[2]

Initialization: Fix an $\eta \le 1/2$. For each decision $i$, associate the weight $w_i^1 := 1$.

  For $t = 1, 2, \ldots, T$:
   1. Choose decision $i$ with probability proportional to its weight $w_i^t$. I.e., use the distribution over decisions
                       $p^t = \{ w_1^t / \Phi^t, \ldots, w_n^t / \Phi^t \}$
      where $\Phi^t = \sum_i w_i^t$.
   2. Observe the costs of the decisions, $m^t$. 
   3. Penalize the costly decisions by updating their weights as follows: for every decision $i$, set 
                               $w_i^{t+1} = w_i^t (1 - \eta\, m_i^t)$.
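
As an illustration, here is a minimal Python sketch of the multiplicative weights algorithm under the assumption that each round's costs form a vector with entries in [-1, 1]; the function name multiplicative_weights and the interface taking all cost vectors up front are illustrative assumptions, not from the source.

    # Minimal sketch of the multiplicative weights algorithm (assumptions:
    # each round's costs m_i^t lie in [-1, 1] and eta <= 1/2).
    def multiplicative_weights(cost_vectors, eta=0.5):
        """cost_vectors: list of T cost vectors, each of length n.
        Returns the total expected cost sum_t p^t . m^t incurred by the algorithm."""
        n = len(cost_vectors[0])
        weights = [1.0] * n                    # w_i^1 := 1
        total_expected_cost = 0.0
        for costs in cost_vectors:
            phi = sum(weights)
            p = [w / phi for w in weights]     # distribution p^t over decisions
            total_expected_cost += sum(pi * ci for pi, ci in zip(p, costs))
            # Update rule: w_i^{t+1} = w_i^t * (1 - eta * m_i^t)
            weights = [w * (1 - eta * c) for w, c in zip(weights, costs)]
        return total_expected_cost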

Analysis

Similar to the analysis of the weighted majority algorithm, assume that all costs $m_i^t \in [-1, 1]$ and $\eta \le 1/2$. Then the multiplicative weights algorithm guarantees that after $T$ rounds, for any decision $i$, we have

                                           $\sum_{t=1}^{T} m^t \cdot p^t \;\le\; \sum_{t=1}^{T} m_i^t + \eta \sum_{t=1}^{T} |m_i^t| + \frac{\ln n}{\eta}$

Randomized Weighted Majority Algorithm[6]

In a situation where the total weights of the experts predicting positive and negative are both close to 1/2, the deterministic weighted majority vote can be forced to err on almost every round, so the predictions made by the algorithm are randomized instead. The algorithm computes the fractions of the total weight held by experts predicting positive and negative, and then makes a random decision based on the computed fraction: predict

               $f(x) = \begin{cases} 1 & \text{with probability } \frac{q_1}{W} \\ 0 & \text{otherwise} \end{cases}$

where

               $W = \sum_i w_i$, and $q_0$ and $q_1$ are the total weights of the experts predicting 0 and 1, respectively.

Analysis

The expected number of mistakes made by the randomized weighted majority algorithm is bounded as:

                $E[\#\text{mistakes of the learner}] \;\le\; \alpha_\beta \,(\#\text{mistakes of the best expert}) + c_\beta \ln n$

where $\alpha_\beta = \frac{\ln(1/\beta)}{1-\beta}$ and $c_\beta = \frac{1}{1-\beta}$, and $\beta \in [0, 1)$ is the factor by which a wrongly predicting expert's weight is multiplied.
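
For concreteness, a minimal Python sketch of one round of the randomized prediction and update rule follows; it assumes binary predictions and outcomes in {0, 1} and a penalty factor beta in [0, 1), and the function name randomized_weighted_majority_round is illustrative.

    import random

    # One round of the randomized weighted majority algorithm (assumptions:
    # binary predictions and outcome in {0, 1}; penalty factor 0 <= beta < 1).
    def randomized_weighted_majority_round(weights, preds, outcome, beta=0.5):
        W = sum(weights)
        q1 = sum(w for w, p in zip(weights, preds) if p == 1)
        prediction = 1 if random.random() < q1 / W else 0   # predict 1 with probability q1/W
        # Multiply the weight of every expert that predicted wrongly by beta.
        new_weights = [w * beta if p != outcome else w
                       for w, p in zip(weights, preds)]
        return prediction, new_weights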

Applications

The multiplicative weights method is usually used to solve constrained optimization problems. Each expert corresponds to a constraint in the problem, and the events represent points in the area of interest. The penalty of an expert corresponds to how well its constraint is satisfied by the point represented by an event.[1]

Solving Zero-Sum Games Approximately (Oracle Algorithm):[1][4][7]

Suppose we are given the distribution $P$ on experts. Let $A$ be the payoff matrix of a finite 2-player zero-sum game, with $n$ rows.

When the row player $p_r$ uses plan $i$ and the column player $p_c$ uses plan $j$, the payoff of player $p_c$ is $A(i, j) := A_{ij}$, assuming $A(i, j) \in [0, 1]$.

If player $p_r$ chooses action $i$ from a distribution $P$ over the rows, then the expected payoff for player $p_c$ selecting action $j$ is $A(P, j) = E_{i \sim P}[A(i, j)]$.

Hence, in order to maximize $A(P, j)$, player $p_c$ should choose the plan $j$ maximizing it. Similarly, when the column player plays from a distribution $Q$ over the columns, the expected payoff when the row player picks plan $i$ is $A(i, Q) = E_{j \sim Q}[A(i, j)]$; choosing the plan $i$ that minimizes it is best for the row player. By John von Neumann's min-max theorem, we obtain:

                                          $\min_P \max_j A(P, j) \;=\; \max_Q \min_i A(i, Q)$

where $P$ and $i$ range over the distributions over rows, and $Q$ and $j$ range over the columns.

Then, let $\lambda^*$ denote the common value of the above quantities, also called the "value of the game". Let $\delta > 0$ be an error parameter. To solve the zero-sum game up to an additive error of $\delta$, it suffices to find mixed strategies $p$ over the rows and $q$ over the columns such that

                                                 $\lambda^* - \delta \;\le\; \min_i A(i, q)$
                                                 $\max_j A(p, j) \;\le\; \lambda^* + \delta$
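
A minimal Python sketch of how the multiplicative weights algorithm can be used to approximately solve such a game is given below. It assumes payoffs $A(i, j) \in [0, 1]$ and uses, as the "oracle", the column player's best response to the current row distribution; the function name approx_zero_sum, the parameter choices, and returning the averaged row strategy are illustrative assumptions.

    # Minimal sketch: approximately solving a zero-sum game with payoffs in [0, 1]
    # via multiplicative weights. The row player maintains weights over the rows
    # and the column player best-responds each round (the "oracle").
    def approx_zero_sum(A, T=1000, eta=0.1):
        """A: payoff matrix (list of rows) with entries in [0, 1], payoffs to the
        column player. Returns an approximately optimal mixed row strategy."""
        n, m = len(A), len(A[0])
        weights = [1.0] * n
        avg_p = [0.0] * n
        for _ in range(T):
            phi = sum(weights)
            p = [w / phi for w in weights]
            # Oracle: the column player picks the column maximizing expected payoff.
            j = max(range(m), key=lambda c: sum(p[i] * A[i][c] for i in range(n)))
            # The cost to row i this round is its payoff A(i, j) to the column player.
            weights = [w * (1 - eta * A[i][j]) for i, w in enumerate(weights)]
            avg_p = [a + pi / T for a, pi in zip(avg_p, p)]
        return avg_p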

Machine Learning

In machine learning, Littlestone and Warmuth generalized the Winnow algorithm to the weighted majority algorithm.[8] Later, Freund and Schapire generalized it in the form of the Hedge algorithm.[9] The AdaBoost algorithm formulated by Yoav Freund and Robert Schapire also employs the multiplicative weight update method.[4]

Winnow Algorithm [4]

Based on current knowledge in algorithms, the multiplicative weight update method was first used in Littlestone's Winnow algorithm. In machine learning, it is used to solve a linear program.

Given $m$ labeled examples $(a_1, l_1), \ldots, (a_m, l_m)$ where $a_j \in \mathbb{R}^n$ are feature vectors and $l_j \in \{-1, +1\}$ are their labels.

The aim is to find non-negative weights $x$ such that, for all examples, the sign of the weighted combination of the features matches its label. That is, require that $l_j\,(a_j \cdot x) \ge 0$ for all $j$. Without loss of generality, assume the total weight is 1 so that the weights form a distribution. Thus, for notational convenience, redefine $a_j$ to be $l_j a_j$; the problem reduces to finding a solution to the following LP:

                     $a_j \cdot x \ge 0, \qquad 1 \le j \le m,$
                     $\textstyle\sum_{i=1}^{n} x_i = 1,$
                     $x_i \ge 0, \qquad 1 \le i \le n.$

This is the general form of an LP.
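
For reference, the classical Winnow update rule (in its standard textbook form) looks roughly as follows in Python; Boolean features in {0, 1}, a promotion/demotion factor alpha, and the threshold theta = n are assumptions of this sketch, not parameters given in the source.

    # Minimal sketch of Littlestone's Winnow update rule (standard textbook form;
    # assumptions: Boolean features x in {0,1}^n, labels in {0,1}, promotion /
    # demotion factor alpha > 1, threshold theta = n).
    def winnow(examples, n, alpha=2.0):
        theta = n
        w = [1.0] * n
        for x, label in examples:              # examples: iterable of (x, label)
            prediction = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if prediction == label:
                continue
            if label == 1:                     # mistake on a positive example: promote
                w = [wi * alpha if xi == 1 else wi for wi, xi in zip(w, x)]
            else:                              # mistake on a negative example: demote
                w = [wi / alpha if xi == 1 else wi for wi, xi in zip(w, x)]
        return w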

Hedge Algorithm [1]

The Hedge algorithm is similar to the weighted majority algorithm; however, it uses an exponential rather than a linear update rule.

Analysis

Assume $\eta > 0$ and that, for $t \in [T]$, the distribution $p^t$ is picked by Hedge. Then for all experts $i$,

                                $\sum_{t=1}^{T} p^t \cdot m^t \;\le\; \sum_{t=1}^{T} m_i^t + \frac{\ln n}{\eta} + \eta T$

Initialization: Fix an $\eta > 0$. For each expert, associate the weight $w_i^1 := 1$. For $t = 1, 2, \ldots, T$:

      1. Pick the distribution $p_i^t = \frac{w_i^t}{\Phi^t}$, where $\Phi^t = \sum_i w_i^t$.
      2. Observe the cost of the decision, $m^t$. 
      3. Set 
                               $w_i^{t+1} = w_i^t \exp(-\eta\, m_i^t)$.
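
A minimal Python sketch of Hedge, under the assumption that costs lie in [0, 1], follows; the function name hedge and its interface are illustrative.

    import math

    # Minimal sketch of the Hedge algorithm (assumption: costs m_i^t lie in [0, 1]).
    def hedge(cost_vectors, eta=0.1):
        """cost_vectors: list of T cost vectors of length n. Returns the total
        expected cost and the final distribution over experts."""
        n = len(cost_vectors[0])
        weights = [1.0] * n
        total_expected_cost = 0.0
        for costs in cost_vectors:
            phi = sum(weights)
            p = [w / phi for w in weights]                     # distribution p^t
            total_expected_cost += sum(pi * ci for pi, ci in zip(p, costs))
            # Exponential update: w_i^{t+1} = w_i^t * exp(-eta * m_i^t)
            weights = [w * math.exp(-eta * c) for w, c in zip(weights, costs)]
        phi = sum(weights)
        return total_expected_cost, [w / phi for w in weights]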

Solving Linear Programs Approximately[10]

Problem

Given an $m \times n$ matrix $A$ and $b \in \mathbb{R}^m$, is there an $x$ such that $Ax \ge b$?

                                    $\exists ?\, x : Ax \ge b$                    (1)

Assumption

Using the oracle algorithm from the zero-sum game setting, with an error parameter $\epsilon > 0$, the output is either a point $x$ such that $Ax \ge b - \epsilon \mathbf{1}$ or a proof that no such $x$ exists, i.e., that there is no solution to this linear system of inequalities.

Solution

Given a vector $p \in \Delta_m$ (a probability distribution over the $m$ constraints), solve the following relaxed problem:

                                                  $\exists ?\, x : p^{\mathsf{T}} A x \ge p^{\mathsf{T}} b$                    (2)

If there exists an $x$ satisfying (1), then that $x$ satisfies (2) for all $p \in \Delta_m$; the contrapositive of this statement is also true. Suppose that whenever the oracle returns a feasible solution for a given $p$, the solution $x$ it returns has bounded width, $\max_i |(Ax)_i - b_i| \le 1$. Then, if there is a solution to (1), there is an algorithm whose output $x$ satisfies the system (1) up to an additive error of $2\epsilon$ and that makes at most $O\!\left(\frac{\ln m}{\epsilon^2}\right)$ calls to a width-bounded oracle for problem (2). The contrapositive stands true as well. The multiplicative updates are applied in this algorithm to the distribution over the constraints.
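
A minimal Python sketch of this scheme follows. It assumes a caller-supplied oracle(p) that solves the relaxed problem (2) for a given distribution p (returning None when (2) is infeasible), that the widths |(Ax)_i - b_i| are at most 1 for the points returned, and a rough constant in the choice of the round count T; all names are illustrative.

    import math

    # Minimal sketch of LP feasibility via multiplicative weights (assumptions:
    # oracle(p) solves the relaxed problem (2) or returns None; widths are <= 1).
    def lp_feasibility(A, b, oracle, eps=0.1):
        m, n = len(A), len(A[0])
        T = int(math.ceil(4 * math.log(m) / eps ** 2))
        weights = [1.0] * m                      # one weight per constraint
        xs = []
        for _ in range(T):
            phi = sum(weights)
            p = [w / phi for w in weights]
            x = oracle(p)                        # solve the relaxed problem (2)
            if x is None:
                return None                      # certificate that (1) is infeasible
            xs.append(x)
            # The "cost" of constraint i is its slack (Ax)_i - b_i, so satisfied
            # constraints lose weight and violated ones gain weight.
            costs = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
            weights = [w * (1 - (eps / 2) * c) for w, c in zip(weights, costs)]
        # The average of the oracle's answers approximately satisfies (1).
        return [sum(x[j] for x in xs) / len(xs) for j in range(n)]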

Other Applications

Operations Research and On-line Statistical Decision Making [4]

In operations research and on-line statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.

Computational Geometry [4]

The multiplicative weights algorithm is also widely applied in computational geometry, for example in Clarkson's algorithm for linear programming (LP) with a bounded number of variables in linear time.[11][12] Later, Brönnimann and Goodrich employed analogous methods to find set covers for hypergraphs with small VC dimension.[13]

References

  1. "Efficient Algorithms Using The Multiplicative Weights Update Method" (PDF). November 2007.
  2. "The Multiplicative Weights Algorithm*" (PDF). Retrieved 2016-11-09.
  3. "A sublinear-time randomized approximation algorithm for matrix games" (PDF). 1995.
  4. "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications" (PDF). 2012.
  5. "Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm" (PDF). 2013.
  6. "COS 511: Foundations of Machine Learning" (PDF). 2006-03-20.
  7. "An Algorithmist's Toolkit" (PDF). 2009-12-08. Retrieved 2016-11-09.
  8. Dean P. Foster and Rakesh Vohra (1999). Regret in the on-line decision problem, p. 29. Games and Economic Behaviour.
  9. Yoav Freund and Robert E. Schapire (1996). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, p. 55. Journal of Computer and System Sciences.
  10. "Fundamentals of Convex Optimization" (PDF). Retrieved 2016-11-09.
  11. Kenneth L. Clarkson. A Las Vegas algorithm for linear programming when the dimension is small. In Proc. 29th FOCS, pp. 452–456. IEEE Comp. Soc. Press, 1988. doi:10.1109/SFCS.1988.21961.
  12. Kenneth L. Clarkson. A Las Vegas algorithm for linear and integer programming when the dimension is small. J. ACM, 42:488–499, 1995. doi:10.1145/201019.201036.
  13. H. Brönnimann and M. T. Goodrich. Almost optimal set covers in finite VC-dimension. Discrete Comput. Geom., 14:463–479, 1995. Preliminary version in 10th Ann. Symp. Comp. Geom. (SCG'94). doi:10.1007/BF02570718.