Partially-specified probabilities: decisions and games

Ehud Lehrer∗

July 2, 2006
First draft: December 20, 2005

Abstract: In the Ellsberg paradox, decision makers who are partially informed about the actual probability distribution violate the expected utility paradigm. This paper develops a theory of decision making with a partially specified probability. The paper takes an axiomatic approach using Anscombe-Aumann's (1963) setting, and is based on a concave integral for capacities (see Lehrer, 2005). The partially-specified decision making is then carried over to games in order to introduce partially-specified equilibrium.

Keywords: Fat-free act; strongly fat-free act; partially-specified probability; decision making; ambiguity aversion; partially-specified equilibrium; partially-specified correlated equilibrium



∗ School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel. e-mail: [email protected]
The author acknowledges Yaron Azrieli, David Schmeidler, Uzi Segal and Peter Wakker for their helpful suggestions. The author acknowledges the support of the Israel Science Foundation, Grant #762/045.

1 Introduction

The Ellsberg paradox demonstrates a situation where a decision maker violates expected utility theory. This violation stems from the partial information that the decision maker receives about the actual distribution. This paper develops a theory of decision making with a partially-specified probability. The paper takes an axiomatic approach using Anscombe-Aumann's setting (see Anscombe and Aumann, 1963), and is based on the concave integral for capacities (see Lehrer, 2005).

The orthodox Bayesian model assumes that a decision maker assigns a probability to every event. In certain variations of this model, including those that relax this assumption, the probability might be either distorted (Quiggin, 1982, Yaari, 1987 and Gul, 1991), non-additive (Schmeidler, 1989) or multiple (Gilboa and Schmeidler, 1989). In this paper, the decision maker obtains partial information about the underlying regular (i.e., additive) probability. This information might include the probabilities of some events, but not of all, or the expectations of some random variables, but not of all.

An act assigns a lottery to every state of nature. As in Savage (1954) and Anscombe and Aumann (1963), a complete preference order is defined over the set of acts. However, comparing acts entails using available information. The question arises as to how one should treat available information and how one should treat unavailable information. The model presented here takes an extreme approach to this issue. The decision maker uses available information to its full extent while completely ignoring any unavailable information.

Any act is evaluated in terms of the "expected" utility that it yields. The latter is calculated by expressing the act under examination in terms of the information available. Using this information, the decision maker decomposes this act into acts that can be defined solely in terms of the information at hand.
For instance, if the probability of event A is available, an act defined as 'the lottery assigned to any state in A is, say, ℓ, and to any other state the lottery assigned is k' is expressed only in terms of the information available. Further, when two acts are expressed in terms of the information available, then the act that results from choosing one of them at random is also expressed in terms of the information available. Acts defined this way are easy to evaluate: the expectation can be calculated since all the information needed to accomplish this task is available. The problem, though, concerns acts that cannot be expressed in terms of the information available, meaning

that one cannot evaluate them by calculating the expected utility they yield. In this case, the decision maker considers the best approximation possible that uses only what is known. The decision maker then evaluates an act according to the expected utility that the best approximation yields.

Two models of decision making under uncertainty are proposed here. The axiomatization of these models is essentially based on five axioms, all of a rather standard form: completeness, continuity, independence, monotonicity and ambiguity aversion. The first three are as in von Neumann and Morgenstern (1944), the fourth was added by Anscombe and Aumann (1963), and the fifth originates from Schmeidler (1989) and is used also by Gilboa and Schmeidler (1989). The main difference between the current axiomatization and previous ones is in the formulation of the independence axiom. This formulation takes two versions that, in turn, yield two versions of the decision making model with a partially-specified probability.

The main idea is to apply the independence axiom to acts that are "fat-free" and to those related to them. A fat-free act is characterized by the property that if a lottery assigned to a state by this act is replaced by a worse one, the resulting act is strictly inferior to the original one. In a fat-free act there is no fat that can be cut while maintaining the same quality of the act. By contrast, in an act that contains fat there is at least one state whose assigned lottery can be replaced by a worse one without affecting its quality: the modified act is equivalent to the original one.

An act g is derived from an act f if any two states associated with the same lottery under f are also associated with the same lottery under g. In other words, the act g does not distinguish between two different states (by assigning them two different lotteries) that f does not distinguish between.
Any act induces a partition of the state space into events: those subsets of states over which the act is constant. An act g is derived from an act f if the partition induced by f is finer than that induced by g. Technically speaking, this means that g is measurable with respect to f.

The first version of the independence axiom applies to the acts derived from the same fat-free act. This version suggests that, as far as independence is concerned, the significant factor of a fat-free act is the structure of events over which it is constant. The precise lotteries assigned to these events are, for this matter, insignificant.

It turns out that together with four widely used axioms (completeness, continuity, monotonicity and ambiguity aversion) the independence axiom, when restricted to a

fat-free act and to those derived from it, implies that the preference order is determined by evaluating acts using a regular (additive) probability specified only over a sub-algebra of events.

The first model suggests that a decision maker encounters a problem governed by a regular probability. This probability is not fully specified. Rather, the decision maker is informed only of the probability of events contained in a sub-algebra. When asked, the decision maker, based on the partial information she receives, ought to decide between a few alternative acts. Her decision deviates from a standard expected utility maximization model due to lack of information concerning the real probability distribution.

Obtaining information about the expectation of random variables is particularly relevant in dynamic situations. Suppose, for instance, that at the inception of the process there are 30 red balls in the Ellsberg urn and 60 white or black balls. However, the balls multiply at a known rate. After a while, the frequency of the red balls is no longer one third. As the process evolves, the only information available is about the expectation of some random variables (this is illustrated in Example 4). Subject to this restricted information, the decision maker ought to decide between several gambles whose prizes depend on drawing a random ball from the urn.

The second model proposed here captures the case where the information available consists of the probability of some events as well as of the expectation of some random variables. This model involves a second version of the independence axiom, which applies to strongly fat-free acts. A strongly fat-free act is a fat-free act which remains fat-free even if mixed with a constant act. While in von Neumann and Morgenstern (1944) the independence axiom applies to all acts, and in Gilboa and Schmeidler (1989) it applies only to constant acts, here it applies to strongly fat-free acts.
The axiom states that when one act is preferred to another, this preference stays intact also when the two acts are mixed with a strongly fat-free act. The second version, together with completeness, continuity, monotonicity and an additional axiom (which states that a constant act is strongly fat-free), implies that the preference order is determined by evaluating acts based on a regular probability specified over some events (that do not necessarily form a sub-algebra) and on some random variables. The latter means that the expectations of some random variables are given.

It is important to note that the general form of decision making with partially-specified probabilities (not that with a sub-algebra) is axiomatized without explicitly requiring ambiguity aversion. Ambiguity aversion is a consequence of the other axioms, and specifically of the independence axiom applied to strongly fat-free acts.

In a Savage model, Epstein and Zhang (2001) defined unambiguous and ambiguous events. They axiomatize the case where there is a probability specified on unambiguous events and a utility function that represents the preference order restricted to acts that are measurable w.r.t.¹ unambiguous events. The model of Epstein and Zhang (2001) remains silent on the analysis of choices among acts that involve ambiguous events. In this model the probability is specified on a λ-system² rather than on an algebra. In the model presented here, there are no explicit definitions of what is ambiguous and of what is unambiguous. Rather, everything that turns out to be unambiguous essentially results from the particular independence axiom employed. This axiom, together with the concave integral (Lehrer, 2005), enables a complete representation of the preference order. Further, in this model unambiguity goes beyond a well-structured set of events: the probability could be specified on any set of events that has no particular structure, or on any set of random variables.

One might confuse partial preference ordering as in Bewley (2001) with partially-specified probabilities. In order to avoid such confusion it should be stressed that, as opposed to Bewley (2001), the completeness assumption is fully kept here: any two acts are comparable in the preference order. In other words, either one act is strictly preferred to the other or they are equivalent. In the current context, this complete preference order is determined by the evaluation of acts that is based on a partially-specified probability.

Partially-specified decision making is carried over to strategic interactions in order to introduce partially-specified equilibrium.
In a partially-specified equilibrium, players do not have precise knowledge of the mixed strategy played by each of the other players. Players know only the probability of some subsets of strategies, without knowing the precise sub-division of probabilities within these subsets. In other words, the mixed strategies played (which are probability distributions over pure strategies) are partially-specified. Moreover, different players may know different specifications

¹ with respect to.
² A λ-system is a set of subsets which is closed under complement and under union of disjoint sets.


of the mixed strategy employed by any individual. When the information of all players is complete, the partially-specified equilibrium coincides with Nash equilibrium. The model of decision making under a partially-specified probability implies ambiguity aversion. This, in turn, implies that the best-response correspondence is convex-valued. Consequently, for any information structure there exists a partially-specified equilibrium.

In order to illustrate the idea behind the notion of partially-specified equilibrium, consider commuters who use a certain road system. A typical commuter might hear on the radio about traffic conditions on some particular main roads, but certainly not about the conditions in any tiny alley. Thus, unlike the traditional assumption that underlies Nash equilibrium, a commuter is unable to respond in the best way to all other commuters' actions, simply because these actions are not fully known to her. A typical commuter plans the optimal route that would minimize her traveling time based on the partial information she obtains about the traffic situation. The road system is in a partially-specified equilibrium if any commuter takes the best possible route, given the partial specification she obtains about the actual traffic conditions.

The information that a player obtains about other players' strategies is not restricted to product-sets. Therefore, the definition of partially-specified equilibrium can be extended to cases where, before playing the game, players may obtain correlated signals. This yields the partially-specified correlated equilibrium, which extends the notion of correlated equilibrium (Aumann, 1974).

The paper is organized as follows. Section 2 provides a motivating example: the Ellsberg paradox. Section 3 contains the model and the axioms. Section 4 introduces partially-specified probabilities and how to integrate w.r.t. them.
Section 5 provides the two main theorems: a decision making model with a probability specified on a sub-algebra and a decision making model with a partially-specified probability. The proofs of the main theorems are given in Section 6. The partially-specified equilibrium is introduced in Section 7. A discussion of the axioms and of various variations of the model is given in Section 8. These include discussions on limited use of information and framing effects (subsection 8.2), and on the difficulty of adopting an orthodox Bayesian view when obtaining only a partial specification of the probability (subsection 8.9). A paradoxical phenomenon related to time inconsistency that seems to be intrinsic to decision making with a partially-specified probability is demonstrated in subsection 8.8 (Example 7). The paper ends, in Section 9, with some final comments on the connection between partially-specified probabilities and cooperative game theory.

2 Ellsberg paradox - a motivating example

Suppose that an urn contains 30 red balls and 60 balls that are either white or black. A ball is randomly drawn from the urn and a decision maker is given a choice between two gambles.
Gamble X: to receive $100 if a red ball is drawn.
Gamble Y: to receive $100 if a white ball is drawn.
In addition, the decision maker is also given the choice between these two gambles:
Gamble Z: to receive $100 if a red or black ball is drawn.
Gamble T: to receive $100 if a white or black ball is drawn.
It is well documented that most people strongly prefer Gamble X to Gamble Y and Gamble T to Gamble Z. This is a violation of expected utility theory.
There are three states of nature in this scenario: R, W and B, one for each color. Denote by S the set containing these states. Each of the gambles corresponds to a real function (a random variable) defined over S. For instance, Gamble X corresponds to the random variable X, defined as X(R) = 100 and X(W) = X(B) = 0. Let Y, Z and T be the functions that correspond to Y, Z and T, respectively.
Denote A = {∅, S, {R}, {W, B}}. There is a probability of drawing a black ball, but this probability is unknown to the decision maker. Only the probabilities of the four events in the sub-algebra A are known to the decision maker: P(∅) = 0, P(S) = 1, P({R}) = 1/3 and P({W, B}) = 2/3. In other words, the probability P is partially-specified.
The random variable X can be expressed as a linear combination of characteristic functions of events whose probability is specified. Using only the four events in A, X can be decomposed as³ X = 100 · 1l{R}. This decomposition is used for evaluating X: X is evaluated at 100P(R) = 100 · 1/3. When one tries to do the same for Y, one cannot obtain a precise decomposition of Y. The maximal non-negative function which is below Y and can be written solely

³ 1lA is the indicator of a set A, known also as the characteristic function of A.


in terms of the events in A is 0 · 1lS. The function Y is therefore evaluated at 0. Since 100 · 1/3 > 0, X is preferred to Y.
A similar method applied to Z and T yields Z ≥ 100 · 1l{R}, and the right-hand side is the greatest of its kind. Thus, the evaluation of Z is 100 · 1/3, while T is decomposed as 100 · 1l{W,B}. Therefore, the evaluation of T is 100 · 2/3. Since 100 · 1/3 < 100 · 2/3, Gamble T is preferred to Gamble Z.
The intuition is that the decision maker bases her evaluation of random variables only on well-known figures, namely on the probabilities of the events whose probability is specified. The best estimate is then provided by the maximal function which is not larger than the random variable and can be expressed in terms of these events. The evaluation of a random variable is based on this estimate.
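The evaluations above can be reproduced numerically. When the specified events form a partition, the maximal decomposition assigns to each event the minimum of the act over that event. The following is a minimal sketch of this rule (the function and variable names are mine, not the paper's):

```python
# Events of the partition generating A, with their specified probabilities
partition = [("R",), ("W", "B")]
P = {("R",): 1/3, ("W", "B"): 2/3}

def evaluate(act):
    # Best decomposition over a partition: lambda_E = min of the act on E,
    # so the value is  sum_E P(E) * min_{s in E} act(s)
    return sum(P[E] * min(act[s] for s in E) for E in partition)

X = {"R": 100, "W": 0, "B": 0}    # $100 on red
Y = {"R": 0, "W": 100, "B": 0}    # $100 on white
Z = {"R": 100, "W": 0, "B": 100}  # $100 on red or black
T = {"R": 0, "W": 100, "B": 100}  # $100 on white or black

print(evaluate(X), evaluate(Y))   # 100/3 and 0 -> X preferred to Y
print(evaluate(Z), evaluate(T))   # 100/3 and 200/3 -> T preferred to Z
```

The closed form used here is the partition version of the concave integral defined in Section 4.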

3 The model and axioms

3.1 The model

Let N be a finite set of outcomes and ∆(N) be the set of distributions over N. S is a finite state space. Denote by L the set of all functions from S to ∆(N) and by Lc the set of all constant functions in L. An element of L is called an act. The constant function that attains the value y will also be denoted by y (i.e., y(s) = y for every s ∈ S). L is a convex set: if α ∈ [0, 1] and f, g ∈ L, then (αf + (1 − α)g)(s) = αf(s) + (1 − α)g(s).
A decision maker has a binary relation ≽ over L. We say that ≽ is complete if for every f and g in L either f ≽ g or g ≽ f. It is transitive if for every f, g and h in L, f ≽ g and g ≽ h imply f ≽ h. The order ≽ is non-trivial if there are two acts f and g such that f ≻ g.

3.2 Axioms

(i) Weak-Order: The relation ≽ is non-trivial, complete and transitive.
The relation ≽ defined over L induces a binary relation⁴, ≽, over ∆(N) as follows:

⁴ At the minor risk of some confusion, although ∆(N) differs from Lc, the same notation is used to denote the two preference orders defined over them.


y ≽ z iff y ≽ z, where y and z on the right-hand side denote the corresponding constant acts. The relation ≽ induces the binary relations ≻ and ∼: f ≻ g iff f ≽ g and not g ≽ f; f ∼ g iff f ≽ g and g ≽ f.
Let f and g be two acts. We denote f ≥ g when f(s) ≽ g(s) for every s ∈ S, and f > g when f ≥ g and f(s) ≻ g(s) for at least one s ∈ S. For every f ∈ L, denote W(f) = {g; f ≥ g}.
(ii) Monotonicity: For every f and g in L, if f ≥ g, then f ≽ g.
Axioms (i) and (ii) imply that there are two constant acts c1 and c2 such that c1 ≻ c2.
Definition 1 (1) An act f is fat-free (denoted, FaF) if f > g implies f ≻ g.
(2) We say that an act g derives from an act f if there is a function ϕ from ∆(N) to itself such that g = ϕ ◦ f. The set of all acts that are derived from f is denoted D(f).
(3) An act f is strongly fat-free (denoted, SFaF) if αf + (1 − α)c1 is FaF for every 0 < α ≤ 1.
Let f be an act. It is fat-free if, when a lottery assigned by f to a state is replaced by a worse one, the resulting act is strictly inferior to f. In a fat-free act there is no single lottery that can be reduced while maintaining the quality of the act. In contrast, in an act that is not fat-free there is at least one state whose assigned lottery can be replaced by a worse one without affecting its quality: the modified act is equivalent to the original one.
Any act induces a partition of the state space into events: those subsets of states over which the act is constant. An act g is derived from an act f if the partition induced by f is finer than that induced by g. Technically speaking, this means that g is measurable⁵ with respect to f.
An act f is strongly fat-free if any convex combination of f with a constant act is fat-free. It may occur that an act is fat-free because the lotteries it assigns to some states are the worst possible lotteries, and these cannot be pushed further down. The test of whether f is strongly fat-free or not is done by mixing it with some constant act: αf + (1 − α)c1. Now, all the lotteries are uniformly pushed up and are all strictly preferred to the worst lottery. If after such a mixing there is no way to reduce the lotteries without harming the quality of αf + (1 − α)c1, then f itself is strongly fat-free.

⁵ An act g is measurable w.r.t. f if for any two states s1 and s2, f(s1) = f(s2) implies g(s1) = g(s2).


Remark 1 Lemma 4 below shows that the definition of SFaF does not depend on the choice of c1, as long as there is c2 that satisfies c1 ≻ c2. The subject is further discussed in subsection 8.5.

Example 1 Consider Ellsberg's urn described in Section 2. Suppose that N is the set of integers between 0 and 100. The act X, which takes the value 100 on R and 0 on the rest, is FaF because any reduction in any prize results in a worse act. For instance, X′, which takes the value 99 on R and 0 on the rest, is worse than X. On the other hand, Y is not FaF: Y′, which coincides with Y on R and B and is equal to 99 on W, is equivalent to Y.
Any act that takes one value on R and another value on {W, B} is derived from X. For instance, the act that is constantly equal to 50 and the act that takes the value 11 on R and the value 12 otherwise are derived from X. Note that for every act f, Lc is a subset of D(f).

Example 2 An urn contains 100 balls of four colors: white, black, red and green. It is known that there are 90 white or black balls and that there are 90 white or red balls. Thus, S = {W, B, R, G}, P(W, B) = P(W, R) = .9.
Consider the act 1l{W,B}, which takes the value 1 on {W, B} and 0 otherwise. The expected value of this act is .9. Moreover, the expectation of any function of the type Z = α1l{W,B} + (1 − α)1lS is .9α + (1 − α), and the expectation of any act smaller than Z is strictly smaller than that of Z. Thus, 1l{W,B} is SFaF. The reason is that Z can be expressed by acts whose expectation is known.
Now consider the act 1lW, which takes the value 1 on W and 0 otherwise. The information available provides no information about the probability of W; it can be anything between .8 and .9. A decision maker who dislikes ambiguity would evaluate the expectation of 1lW at .8. Adding 1lS to 1lW would result in an act, say Y, whose evaluation is 1.8. Denote X = 1l{W,B} + 1l{W,R}. Note that the expectation of X is also 1.8. However, X coincides with Y in all states except for G. On G the act X takes the value 0 while Y takes the value 1. Thus, reducing the value that Y takes on G from 1 to 0 does not reduce the evaluation of Y. Thus, 1lW is not SFaF. The reason is that there is no way to get a precise evaluation of the probability of W. The evaluation of this probability is obtained by taking the lowest estimation that is still consistent with the information provided.
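The evaluations in Example 2 can be checked by treating the maximal-decomposition problem as a linear program: maximize Σ λ_Y E_P(Y) subject to Σ λ_Y Y ≤ ψ pointwise, with the λ's free to be negative (this anticipates eq. (3) of Section 4). A sketch with scipy, all names mine:

```python
import numpy as np
from scipy.optimize import linprog

# States ordered (W, B, R, G); specified data: E[1l{W,B}] = E[1l{W,R}] = .9, E[1lS] = 1
Y_funcs = np.array([
    [1, 1, 0, 0],   # 1l{W,B}
    [1, 0, 1, 0],   # 1l{W,R}
    [1, 1, 1, 1],   # 1lS
], dtype=float)
E_vals = np.array([0.9, 0.9, 1.0])

def evaluate(psi):
    # maximize sum_Y lambda_Y * E(Y)  s.t.  sum_Y lambda_Y * Y(s) <= psi(s) for every s;
    # linprog minimizes, hence the sign flip; the lambdas are unconstrained in sign
    res = linprog(-E_vals, A_ub=Y_funcs.T, b_ub=psi, bounds=[(None, None)] * 3)
    return -res.fun

print(evaluate(np.array([1., 1., 0., 0.])))  # 1l{W,B}: 0.9
print(evaluate(np.array([1., 0., 0., 0.])))  # 1lW: 0.8, via 1l{W,B} + 1l{W,R} - 1lS
```

The value .8 for 1lW is attained exactly by the decomposition 1l{W,B} + 1l{W,R} − 1lS discussed in the example.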

(iiid) Derived Fat-Free Independence: Let f, g and h be acts derived from the same fat-free act. Then, for every α ∈ (0, 1), f ≻ g implies that αf + (1 − α)h ≻ αg + (1 − α)h.
(iv) Continuity: For every f, g and h in L, (a) if f ≻ g and g ≽ h, then there is α in (0, 1) such that αf + (1 − α)h ≻ g; and (b) if f ≽ g and g ≻ h, then there is β in (0, 1) such that g ≻ βf + (1 − β)h.
Since Lc is a subset of D(f), one can apply the von Neumann-Morgenstern theorem to ∆(N): axioms (i), (iiid) and (iv) imply that there is an affine function defined on Lc that represents ≽ restricted to Lc, and therefore to ∆(N).
(iii) Strongly-Fat-Free Independence: Let f, g be acts and h be SFaF. Then, for every α ∈ (0, 1), f ≻ g implies that αf + (1 − α)h ≻ αg + (1 − α)h.
Axioms (i), (ii), (iii) and (iv) imply that there is an affine function defined on Lc that represents ≽ (note that, by definition, Lc ⊆ LFaF).
The following axiom originates from Schmeidler (1989) and is used also by Gilboa and Schmeidler (1989).
(v) Uncertainty Aversion: For every f, g and h in L, if f ≽ h and g ≽ h, then for every α in (0, 1), αf + (1 − α)g ≽ h.
For simplicity, I add the conciseness axiom, which states that every constant act is FaF. If a constant act is not FaF, then one state is known to have probability zero, in which case this state can be omitted from S.
(vi) Conciseness: Any constant act is FaF.
Axioms (i), (iii), (iv) and (vi) ensure the existence of a vN-M representation of ≽ on Lc. Denote by m and M the minimum and the maximum in Lc, respectively.


3.3 Some implications of the axioms

The following two lemmas state that every act has an equivalent FaF act. The first lemma deals with (iiid) and the second with (iii).⁶

Notation 1 Let f be an act. [f] denotes a FaF act which satisfies f ≥ [f] and f ∼ [f].

Lemma 1 Axioms (i), (iiid), (iv) and (v) imply that for every act f, there exists [f].

Lemma 2 Axioms (i)-(v) imply that for every act f , there exists [f ].

The next lemma states that the set of the SFaF acts is convex. Lemma 3 Axioms (i)-(vi) imply that if f and g are SFaF, then for every 0 ≤ α ≤ 1, αf + (1 − α)g is also SFaF. In other words, the set of SFaF acts is convex.

The definition of an SFaF act uses the constant act c1. The following lemma states that the definition is independent of the choice of c1, and it could be M as well.

Lemma 4 Axioms (i)-(vi) imply that f is SFaF if and only if for every α, αf + (1 − α)M is SFaF.

While every act has an equivalent FaF act, this is not the case with SFaF. However, as the next lemma states, every act has some mixture with the maximal constant act which is SFaF.

Lemma 5 Axioms (i)-(vi) imply that for every act f there is 0 < α ≤ 1 such that [αf + (1 − α)M] is SFaF.

⁶ The proofs of this lemma, as well as of all other lemmas and propositions, are deferred to the Appendix.


4 Partially specified probabilities

4.1 Probabilities specified on a sub-algebra

A probability specified on a sub-algebra over S is a pair (P, A) such that A is an algebra⁷ of subsets of S and P is a probability over S. Let ψ be a non-negative function defined over S and let (P, A) be a probability specified on a sub-algebra over S. Denote,

(1)  ∫ ψ dPA = max { Σ_{E∈A} λE P(E);  Σ_{E∈A} λE 1lE ≤ ψ and λE ∈ R for every E ∈ A },

where 1lE is the indicator of E.
The decision maker obtains the probability of every event in A. She can therefore calculate the expectation of any function of the form Σ_{E∈A} λE 1lE: the expectation is simply Σ_{E∈A} λE P(E). The integral of a function ψ is defined as the maximal expectation of those functions of the form Σ_{E∈A} λE 1lE that are below ψ.

Remark 2 (a) If ψ is measurable with respect to A, then this integral coincides with the regular expectation.
(b) Let (P, A) be a probability specified on a sub-algebra. Since S is finite, A is generated by a partition, say P, of S. Thus, the integral can be written also with a further restriction that all sets E are taken from the partition P and, moreover, that the coefficients λE are non-negative:

(2)  ∫ ψ dPA = max { Σ_{E∈P} λE P(E);  Σ_{E∈P} λE 1lE ≤ ψ and λE ≥ 0 for every E ∈ P }.

The integral was defined as in eq. (1) in order to keep uniformity with the integral w.r.t. a general partially-specified probability defined below (eq. (3)).

Example 3 Let S = {s1, s2, s3, s4} and suppose that the sub-algebra A is generated by the partition {{s1, s2}, {s3, s4}}. Furthermore, assume that P(s1, s2) = 1/3 and P(s3, s4) = 2/3.

⁷ A set A of subsets of S is called an algebra if S ∈ A and for every E1, E2 ∈ A the intersection E1 ∩ E2 and the complement S \ E1 are also in A.


Consider ψ = (1, 2, 3, 4), that is, ψ(si) = i, i = 1, 2, 3, 4. Then ψ ≥ 1l{s1,s2} + 3 · 1l{s3,s4}. This decomposition maximizes the right-hand side of eq. (2), and therefore ∫ ψ dPA = 1/3 + 3 · 2/3 = 7/3.

Definition 2 Let I be a real function from [0, 1]^S and X, Y ∈ [0, 1]^S.
(i) We say that X ≥ Y if X(s) ≥ Y(s) for every s ∈ S, and X > Y if X ≥ Y and X(s) > Y(s) for at least one s ∈ S.
(ii) A function X over S is fat-free (FaF) w.r.t. I if X > Y implies I(X) > I(Y).
(iii) Y is derived from X if there is a function ϕ : R → R such that Y = ϕ ◦ X.

Proposition 1 Let I be a real function from [0, 1]^S. There is a probability specified on a sub-algebra, (P, A), over S such that I(X) = ∫ X dPA for every X ∈ [0, 1]^S iff
(1) I is monotonic w.r.t. ≥;
(2) For every X, there is a FaF function w.r.t. I, [X], such that X ≥ [X] and I(X) = I([X]);
(3) If X and Y are derived from the same FaF function and α ∈ (0, 1), then I(αX) + I((1 − α)Y) = I(αX + (1 − α)Y);
(4) If X is FaF w.r.t. I, then for every positive c, if cX ∈ [0, 1]^S, then I(cX) = cI(X);
(5) For every X and Y such that I(X) = I(Y) and α ∈ (0, 1), I(αX) + I((1 − α)Y) ≤ I(αX + (1 − α)Y); and
(6) I(1lS) = 1.
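The maximization in eq. (2) is a finite linear program, so Example 3 can be verified mechanically. A sketch with scipy (the variable names are mine):

```python
from scipy.optimize import linprog

# Example 3: partition {{s1,s2}, {s3,s4}} with probabilities 1/3 and 2/3; psi = (1,2,3,4)
events = [[1, 1, 0, 0], [0, 0, 1, 1]]   # 1l{s1,s2}, 1l{s3,s4}
probs = [1/3, 2/3]
psi = [1, 2, 3, 4]

# eq. (2): maximize sum_E lambda_E P(E)  s.t.  sum_E lambda_E 1l_E <= psi, lambda_E >= 0
A_ub = [[e[s] for e in events] for s in range(4)]   # one constraint per state
res = linprog([-p for p in probs], A_ub=A_ub, b_ub=psi, bounds=[(0, None)] * 2)
print(-res.fun)   # 7/3; the optimal coefficients are lambda = (1, 3)
```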

4.2 Partially specified probability

Example 4 (Dynamic Petri plate) Suppose that, as in the Ellsberg urn, on day 1 a Petri plate contains 90 organisms of which 30 are red and the others white or black. However, each white organism splits into two once a day. On day 2 the decision maker is called upon to choose from among a few gambles.
The information available to the decision maker on day 2 does not contain the probability of any event; the probability of R is no longer 1/3. However, the decision maker may deduce, from the information available, the expectation of certain random variables. To illustrate this point, suppose that the numbers of red, white and black organisms on day 2 are denoted by nr, nw and nb, respectively. On day 1 there were nw/2 white organisms. It is known that nr/(nr + nw/2 + nb) = 1/3. Since (nw/6)/(nw/2) = 1/3, one obtains

(nr + nw/6)/(nr + nw/2 + nb + nw/2) = (nr + (1/6)nw)/(nr + nw + nb) = 1/3.

This is the expectation (on day 2) of the random

variable that attains the value 1 on Red, 1/6 on White and 0 on Black. In other words, according to the information available, the expectation of this random variable is 1/3.
The decision maker knows that a regular probability is underlying this decision problem: the actual distribution of organisms. However, the decision maker obtains only partial information about it. Beyond the obvious information concerning the probability of the whole space and the empty set, on day 2 the decision maker is informed only of the expectation of the random variable (1, 1/6, 0), from which he may deduce also the expectation of any random variable of the form c(1, 1/6, 0), c > 0. No further information is available.
The following model of decision making with partially-specified probabilities also captures the case where, in addition to obtaining the probability of some events (but not necessarily all of them), the decision maker obtains some data about the true expectation of some random variables.
A partially-specified probability over S is a pair (P, Y), where Y is a set of real functions over S, Y contains 1lS, and P is a probability over S. Note that any probability specified on a sub-algebra, (P, A), can be identified with a partially-specified probability, (P, Y): Y = {1lA; A ∈ A}.
Let ψ be a non-negative function defined over S and let (P, Y) be a partially-specified probability. Denote,

(3)  ∫ ψ dPY = max { Σ_{Y∈Y} λY EP(Y);  Σ_{Y∈Y} λY Y ≤ ψ and λY ∈ R for every Y ∈ Y },

where EP(Y) is the expectation of Y w.r.t. P. The decision maker knows EP(Y) for every Y ∈ Y and he can therefore calculate the expectation of any function of the form Σ_{Y∈Y} λY Y. The integral of ψ is the maximal Σ_{Y∈Y} λY EP(Y) among all those functions of the form Σ_{Y∈Y} λY Y that are below ψ.

Remark 3 Although the set Y need not be closed, the maximum (as opposed to supremum) on the right-hand side of eq. (3) is obtained. Indeed, Lemma 6 below

states that without loss of generality Y can be assumed to be finite. Therefore, writing 'maximum' is justified.

Example 5 Consider S = {s_1, s_2, s_3, s_4} and suppose that the real probability over S is P. However, the decision maker is informed only of the probability of A = {s_1, s_2} and of B = {s_2, s_3} (and, as usual, of S). Note that A, B and S do not form an algebra. In this case Y = {1l_A, 1l_B, 1l_S}. Let ψ = 1l_{S\A}. Since 1l_{S\A} = 1l_S − 1l_A, one obtains ∫ψ dP_Y = P(S) − P(A).

Now let ψ be a function over S defined as follows: ψ(s_1) = 1−α, ψ(s_2) = α, ψ(s_3) = 1−α and ψ(s_4) = 2−3α, where α ≤ 1/2 (and thus 2−3α ≥ 1−α ≥ α). Notice that (1−α)1l_{S\A} + (1−α)1l_{S\B} − α1l_S ≤ ψ. Therefore, ∫ψ dP_Y ≥ (1−α)P(S\A) + (1−α)P(S\B) − α. Also, α1l_A + (1−α)1l_{S\A} ≤ ψ and therefore ∫ψ dP_Y ≥ αP(A) + (1−α)P(S\A). It turns out that ∫ψ dP_Y = max((1−α)P(S\A) + (1−α)P(S\B) − α, αP(A) + (1−α)P(S\A)). If α = 0, for instance, then ∫ψ dP_Y = max(P(S\A) + P(S\B), P(S\A)) = P(S\A) + P(S\B). This example is revisited in the context of strategic interactions (see Example 6 (4) below).

Notation 2 Let A be a set of subsets of S (not necessarily an algebra) that contains S. We denote ∫ψ dP_A = ∫ψ dP_Y, where Y = {1l_A ; A ∈ A}.

The following lemma states that without loss of generality, the set Y can be assumed to be finite.

Lemma 6 Let (P, Y) be a partially-specified probability. Then, there is a finite subset of Y, say Y′, such that

∫ψ dP_Y = ∫ψ dP_{Y′}.
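A quick numeric sketch may help to see what eq. (3) computes. The helper below (the function and variable names are ours, not the paper's) checks that a proposed combination ∑λ_Y Y lies below ψ pointwise and, if feasible, returns its value ∑λ_Y E_P(Y), which is a lower bound on ∫ψ dP_Y. It replays the first computation of Example 5, where ψ = 1l_{S\A} = 1l_S − 1l_A, so the bound P(S) − P(A) is attained; the values P(A) = 1/2 and P(B) = 2/5 are assumed for illustration only.

```python
from fractions import Fraction as F

# States s1..s4 of Example 5; a function on S is a 4-tuple of values.
IND_A = (1, 1, 0, 0)   # 1l_A,  A = {s1, s2}
IND_B = (0, 1, 1, 0)   # 1l_B,  B = {s2, s3}
IND_S = (1, 1, 1, 1)   # 1l_S

# The specified expectations (assumed sample values): P(A) = 1/2, P(B) = 2/5.
EXPECT = {IND_A: F(1, 2), IND_B: F(2, 5), IND_S: F(1)}

def decomposition_value(lam, psi):
    """One candidate in the max of eq. (3): if sum_Y lam[Y]*Y <= psi pointwise,
    return sum_Y lam[Y]*E_P(Y) (a lower bound on the integral); else None."""
    combo = [sum(c * y[s] for y, c in lam.items()) for s in range(len(psi))]
    if any(v > p for v, p in zip(combo, psi)):
        return None
    return sum(c * EXPECT[y] for y, c in lam.items())

# psi = 1l_{S\A} = 1l_S - 1l_A, so the bound P(S) - P(A) is attained exactly.
psi = (0, 0, 1, 1)
val = decomposition_value({IND_S: F(1), IND_A: F(-1)}, psi)
```

Dropping the 1l_S term makes the combination infeasible (1l_S is not below 1l_{S\A}), so the helper returns None.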

An important feature of the integral w.r.t. a partially-specified probability is concavity: as a function defined on functions, the integral is concave.

Lemma 7 Let (P, Y) be a partially-specified probability. Then, for every two functions ψ and φ and every α ∈ [0, 1],

∫(αψ + (1−α)φ) dP_Y ≥ α∫ψ dP_Y + (1−α)∫φ dP_Y.

The following lemma makes the connection between the partially-specified probability model and the multiple-prior model (Gilboa and Schmeidler, 1989). It states that for every partially-specified probability (P, Y) one can find a finite number of probability distributions such that the integral with respect to (P, Y) is equal to the minimum of the respective (additive) integrals.

Lemma 8 Let (P, Y) be a partially-specified probability. Then, there is a finite set Q of probability distributions over S such that

∫ψ dP_Y = min_{Q∈Q} ∫ψ dQ.
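Lemma 8 can be checked numerically in the setting of Example 5, with the assumed values P(A) = 1/2 and P(B) = 2/5. By LP duality, the maximum in eq. (3) equals the minimum of E_Q(ψ) over the probabilities Q that agree with P on every Y ∈ Y, and this minimum is attained at a vertex of that polytope, i.e., on a finite set of priors, as the lemma asserts. The sketch below (function names are ours; exact rational arithmetic) enumerates those vertices as basic feasible solutions:

```python
from fractions import Fraction as F
from itertools import combinations

# Example 5 again, with assumed values P(A) = 1/2, P(B) = 2/5.
ROWS = [(1, 1, 0, 0),   # 1l_A
        (0, 1, 1, 0),   # 1l_B
        (1, 1, 1, 1)]   # 1l_S
B = [F(1, 2), F(2, 5), F(1)]

def solve(m, b):
    """Solve a square linear system in exact rationals; None if singular."""
    n = len(b)
    a = [[F(x) for x in m[i]] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return None
        a[col], a[piv] = a[piv], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                a[r] = [x - a[r][col] * y for x, y in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]

def consistent_vertices():
    """Vertices of {q >= 0 : E_q(Y) = E_P(Y) for every Y in Y}."""
    verts = []
    for support in combinations(range(4), 3):
        x = solve([[row[s] for s in support] for row in ROWS], B)
        if x is not None and all(v >= 0 for v in x):
            q = [F(0)] * 4
            for s, v in zip(support, x):
                q[s] = v
            if q not in verts:
                verts.append(q)
    return verts

def integral(psi):
    """The value in Lemma 8: the minimum expectation of psi over the priors."""
    return min(sum(q[s] * psi[s] for s in range(4)) for q in consistent_vertices())
```

Here ∫1l_{S\A} dP_Y comes out as 1 − P(A) = 1/2, matching the first computation of Example 5.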

Proposition 2 Let I be a real function on [0,1]^S. There is a partially-specified probability (P, Y) such that I(X) = ∫X dP_Y iff
(1) I is monotonic w.r.t. ≥;
(2) for every X, there is 0 ≤ α < 1 such that αX + (1−α)1l_S is an SFaF function w.r.t. I;
(3) if X is SFaF, Y is a function and α ∈ (0, 1), then I(αX) + I((1−α)Y) = I(αX + (1−α)Y);
(4) for every X and for every positive c, if cX ∈ [0,1]^S, then I(cX) = cI(X);
(5) c1l_S is SFaF for every c ∈ [0, 1]; and
(6) I(1l_S) = 1.

5 Theorems

5.1 Decision making with a probability specified on a sub-algebra

Theorem 1 A binary relation % over L satisfies (i), (ii), (iii_d), (iv) and (v) if and only if there is a probability specified on a sub-algebra (P, A) and an affine function u on ∆(N) such that for every f, g ∈ L,

(4)  f % g  iff  ∫u(f(s)) dP_A ≥ ∫u(g(s)) dP_A.

Theorem 1 suggests that the probability underlying the decision making problem is indeed additive, but the decision maker is only informed of the probabilities of some of the events (those forming a sub-algebra) and not of all of them. The decision maker

is informed of the probability of every event in the algebra A. An act f associates with the state s the lottery f(s). The utility of this lottery is u(f(s)). Thus, the act f, combined with the utility function u, induces a correspondence between states and numbers (utilities), u∘f. The decision maker evaluates the worth of the act f using the information available: he approximates the function u∘f by functions whose expectation can be calculated. Axioms (i), (ii), (iii_d), (iv) and (v) imply that there exists a probability P specified on a sub-algebra A such that the preference order % is determined by the evaluations of acts using (P, A). The act f is preferred to the act g (i.e., f % g) iff the evaluation of f, ∫u(f(s)) dP_A, is at least as high as the evaluation of g, ∫u(g(s)) dP_A.

A probability specified on a sub-algebra, (P, A), induces a capacity v_(P,A) defined over S: v_(P,A)(E) = ∫1l_E dP_A. This capacity is convex⁸. By Remark 2 (b), v_(P,A)(E) = max_{F⊆E, F∈A} P(F), and according to Lehrer (2005), the Choquet integral w.r.t. v_(P,A) coincides with the integral in eq. (1). Thus, a decision maker whose preference order satisfies axioms (i), (ii), (iii_d), (iv) and (v) is in particular a Choquet expected-utility maximizer. In other words, the model proposed by Theorem 1 is consistent with the model of Choquet expected-utility maximization introduced in Schmeidler (1989). Moreover, as a convex game, v_(P,A) has a large core⁹ and, by Azrieli and Lehrer (2005), axioms (i), (ii), (iii_d), (iv) and (v) imply that the preference over acts is determined by the minimum of the expectations over many probability distributions, as in Gilboa and Schmeidler (1989). The non-additivity of v_(P,A) stems from the incomplete information the decision maker obtains about the real probability. While the decision maker might know that the probability governing the decision problem is additive, his use of the partial information he has about it is equivalent to assigning non-additive probability to events whose probability is not explicitly specified (ambiguous).

It is important to note that not every capacity is of the form v_(P,A), and therefore the model of decision making with a probability specified on a sub-algebra is a strict sub-model of the Choquet expected-utility maximization model. There are a few conceptual differences between the models, which pertain also to the comparison between the model of decision making with partially-specified

P

A capacity v is convex if for every E, F ⊆ S, v(E) + v(F ) ≤ v(E ∪ F ) + v(E ∩ F ). A capacity v has a large core (Sharkey, 1982) if for every vector Q = (Qi )i∈N , the condition

i∈T

Qi ≥ v(T ) for every T ⊆ S, implies that there is R in the core of v such that Q ≥ R.

17

probability and the model of decision making with a multiple-prior. This issue is discussed in the next subsection.

5.2

Decision making with a partially-specified probability

Theorem 2 Let % be a binary relation over L. This satisfies (i)-(iv) and (vi) if and only if there is a partially-specified probability (P, Y), with Y being finite, and an affine function u on ∆(N ) such that for every f, g ∈ L, Z (5)

f %g

iff

Z u(f (s))dPY ≥

u(g(s))dPY , Z

and moreover, if c is a constant act and f ≤ c satisfies

Z u(f (s))dPY ≥

u(c)dPY ,

then10 f = c. In order to evaluate an act a decision maker uses the information captured in (P, Y). Anything that can be deduced from this information is employed to its full extent. For instance, if a probability of an event is provided, the probability of its complement can be deduced. Further, a decision maker can deduce the expectation of any linear combination of random variables whose expectations are known. The model of decision making described in Theorem 2 is consistent with the multiple-prior model (Gilboa and Schmeidler, 1989), and not with the model of Choquet expected-utility maximization (Schmeidler, 1989). There are two features of the current model that make it more structured than the multiple-prior model. First, due to Lemma 6, the decision maker uses only a finite number of priors to determine his preference order over acts. Second, there is one probability distribution (referred to as the ‘real’ one) and a finite set of random variables on which all the priors and the real distribution agree. Moreover, any other distribution that satisfies this last condition can be obtained as a convex combination of the priors. The Choquet expected-utility maximization model and the multiple-prior model are belief-based: it assumes that there is a (non-additive) belief or a set of priors (beliefs) that the decision maker uses to set up her preferences. In contrast, the partially-specified probability model is information-based: it assumes that the decision maker obtained partial information about the real distribution and based on 10

Throughout this paper we take equality between acts in a broad sense. Let f and g be two acts.

We say that f = g if for every s ∈ S, f (s) ∼ g(s).

18

this information he sets up his preferences. While the belief-based models hinge on a fixed belief or on a fixed set of priors, the information-based model allows the belief to change with the underlying (real) distribution. This difference between the models is crucial when the underlying (i.e., the real) distribution may vary. It is particularly important in a strategic interaction, when the states of nature are nothing else than other players’ actions, and the source of uncertainty is that other players are mixing or chosen by some scheme of random matching. The partially-specified probability model enables one to examine a strategic interaction with a given information structure (that delineates what each player knows about other players’ strategies) and to analyze the possible equilibria in such a situation. Different equilibria determine different beliefs that players hold about the interaction. That is, whereas the information available is fixed, beliefs may change. This issue is discussed also in the context of partially-specified equilibrium (see Section 7). Another important matter in this respect is updating. This is discussed later in subsection 8.8. Notice that Axiom (v) (ambiguity aversion) does not play a role in Theorem 2. This axiom is not assumes. Rather, it is implied, as stated in the following corollary. Corollary 1 Let % be a binary relation over L. Then, axioms (i)-(iv) and (vi) imply (v). The corollary is a consequence of Lemma 7, which states that the integral w.r.t. a partially specified probability is a concave operator defined on the set of functions.

6

Proofs of the theorems

Proof of Theorem 1. The proof of the ‘only if’ direction is omitted. For proving the ‘if’ direction assume that Axioms (i), (ii), (iiid ), (iv) and (v) are satisfied. Let f be FaF. Consider the set D(f ). The order % restricted to this set satisfies the axioms of von Neumann-Morgenstern. By von Neumann-Morgenstern theorem (1944), there is an affine function, say Uf over D(f ), that represents %. That is, for every g and h in D(f ), g % h iff Uf (g) ≥ Uf (h). Note that Lc ⊆ D(f ). As a von Neumann-Morgenstern utility function, Uf is unique up to a positive affine transformation. Therefore, there is a unique von Neumann-Morgenstern utility function over D(f ) that represents % and satisfies Uf (m) = 0 and Uf (M) = 1. Thus, for every k ∈ Lc and every two FaF acts g and f , Uw (k) = Uf (k). 19

According to (ii), 0 ≤ Uf (f ) ≤ 1. Moreover, there is k(f ) ∈ Lc such that Uf (k(f ))= Uf (f ), meaning that k(f ) ∼ f . Define for every act f , U (f ) = U[f ] ([f ]). f % g if and only if [f ] % [g] if and only if k([f ]) % k([g]) if and only if U[f ] (k([f ])) ≥ U[f ] (k([g])) if and only if U[f ] (k([f ])) ≥ U[g] (k([g])) if and only if U[f ] ([f ]) ≥ U[g] ([g]). Thus, U represents %. Define u(y) = U (y) for every y ∈ ∆(N ). For every X ∈ [0, 1]S there is an act fX such that for every s ∈ S, X(s) = u(fX (s)). Note that if X is FaF, so is fX . For every act f define Xf (s) = u(fX (s)). Note that XfX = X. Set, I(X) = U (fX ), for every X ∈ [0, 1]S . Thus, I(Xf ) = U (f ) for every f ∈ L. In order to use Proposition 1, it will be shown that I satisfies properties (1)-(6). (1) follows from (ii) and from the fact that if X ≥ Y , then fX ≥ fY and [fX ] ≥ [fY ]. (2) follows from Lemma 1. As for (3), assume that X and Y are derived from the same FaF Z. Then, fZ is FaF and furthermore, fX and fY are in D(fZ ). Therefore, (6)

UfZ (αfX + (1 − α)fY ) = αUfZ (fX ) + (1 − α)UfZ (fY ). However, for every h ∈ D(fZ ),

(7)

UfZ (h) = UfZ (k(h)) = U[h] (k(h)) = U[h] ([h]).

Thus, UfZ (αfX +(1−α)fY ) = U[αfX +(1−α)fY ] ([αfX +(1−α)fY ]) = I(αfX +(1−α)fY ). On the other hand, UfZ (fX ) = U[fX ] ([fX ]) = I(X), and UfZ (fY ) = U[fY ] ([fY ]) = I(Y ). Due to eq. (6) one obtains (3). Let X be FaF. Thus, fX is FaF and m, fcX ∈ D(fX ) whenever 0 ≤ c and cX ∈ [0, 1]S . Therefore, if c < 1, then fcX ∼ cfX + (1 − c)m and I(cX) = U[fcX ] ([fcX ]) = UfX (fcX ) = cUfX (fX ) + (1 − c)UfX (m) = cUfX (fX ) = cI(X). The second equality is due to eq. 7, and the third is by (iiiid ). For c > 1 the proof method is similar, and is therefore omitted. Thus, (4) is proven. (5) follows from (v) and (6) from construction. One can now use Proposition 1 to obtain I that satisfies properties (1)-(6) of Proposition 1 and therefore, there exists R a probability specified on a sub-algebra, (P, A) such that I(X) = XdPA for every R R X ∈ [0, 1]S . Thus, U (f ) = I(Xf ) = Xf dPA = u ◦ f dPA , as desired. Proof of Theorem 2. The proof of the ‘only if’ direction is omitted. For proving the ‘if’ direction assume that Axioms (i)-(iv) and (vi) are satisfied. Axioms (i), (iii), (iv) and (vi) guarantee that there is a vN-M representation over Lc . Moreover, U could be normalized to take the values 0 and 1 on m and M, respectively. Axioms 20

(ii) and (iv) ensure that every act f has an equivalent constant act, say k(f ). Define U (f ) = U (k(f )). The function U represents %: for every two acts f, g, U (f ) ≥ U (g) if and only if f % g. Denote by LSF aF the set of the SFaF acts. Lemma 3 states that LSF aF is convex. Due to (i), (iii) and (iv) U is affine over LSF aF . Like in the previous proof, define I on [0, 1]S as follows. For every X ∈ [0, 1]S and every act f define the act fX and the function Xf as before. Note that XfX = X. Moreover, if X is SFaF, so is fX . Define, I(X) = U (fX ). Thus, I(Xf ) = U (f ) for every f ∈ L. We show that I satisfies properties (1)-(6) of Proposition 2. (1) is ensured by (ii); (2) is guaranteed by Lemma 5; (4) is due to I(01lS ) = 0, 01lS is SFaF and that % has a vN-M representation over the set of acts αm + (1 − α)fX for every function X; (3) holds due to (4) and (iii); (5) is satisfied because of (vi) and due to (4); finally, (6) is satisfied since f1lS = M. Thus, I(1lS ) = U (fX ) = U (M) = 1. Proposition 2 ensures that there is a partially-specified probability (P, Y) such that R R R I(X) = XdPY for every X ∈ [0, 1]S . Thus, U (f ) = I(Xf ) = Xf dPY = u◦f dPY . By Lemma 6, Y can be chosen to be finite, as desired. Moreover, suppose that f ≤ c and f ∼ c. It means that Xf ≤ c1lS and that R I(Xf ) = Xf dPY = c. However, c1lS is SFaF and thus, Xf = c1lS , which means that f ∼ c, as required.

7

Equilibrium with partially-specified probabilities

7.1

Partially specified equilibrium

Commuters might hear, on the radio, about traffic conditions on some particular main routes, but certainly not about the condition on any tiny alley. In other words, a typical commuter is only partially informed about the distribution of commuters in the road system. Nevertheless, based on this information commuters plan optimal paths that would minimize their traveling time. The distribution of vehicles on the roads in not determined by nature. Rather, this distribution, to which an individual commuter responds, is determined by the joint decision of many decision makers. However, as opposed to the traditional assumption that underlies Nash equilibrium that players best respond to actions taken by all other 21

players, here, players respond best, to the best of their knowledge, subject to partial information they receive about what other players do. This section introduces an equilibrium notion where players obtain just a partial specification of other players’ actions. Let G = (M, {Bi }i∈M , {ui }i∈M ) be a game: M is a finite set of players, Bi is player i’s set of pure strategies (finite), and ui : B → R is player i’s payoff function, where B = ×i∈M Bi . Denote by ∆(Bi ) the set of mixed strategies of player i and for every i denote, B−i = ×j6=i Bj . B−i is the set of action profiles of all players excluding i. P

For every b−i ∈ B−i and player i’s mixed strategy pi ∈ ∆(Bi ), define ui (pi , b−i ) =

bi ∈Bi

pi (bi )ui (bi , b−i ). This is the linear extension of ui to ∆(Bi ) × B−i .

Suppose that player i knows about players in N \ {i} only the probabilities of some subsets of B−i but of all subsets. More specifically, suppose that player j plays the mixed strategy pj .When i 6= j, player i does not know pj in its entirety. Rather, player i knows only the probability of some subsets of Bj . The set of subsets of Bj whose probabilities are known to player i is denoted by Iij . That is, player i knows only the probabilities pj (C), for every C ∈ Iij . Denote by Ii the set of subsets of B−i whose probability are known to player i. The set Ii contains those subsets of B−i that can be written as a product of subsets of individual n players’ pure strategies whose probabilities o are known to player j i. Formally, Ii = ×j∈N \{i} Cj ; Cj ∈ Ii for every j ∈ N \ {i} . Note that Ii needs not be a sub-algebra. The mixed strategies pj , j ∈ N \ {i}, induce a product distribution over B−i , denoted p−i . To sum up, player i knows the probability p−i (C) for any C ∈ Ii . In other words, player i’s PSP is (p−i , Ii ). For the following definition recall Notation 2. Definition 3 Let Ai be the set of subsets of B−i known to player i that contains B−i itself. A profile {pi }i∈M ∈ ×i∈M ∆(Bi ) of mixed strategies is a partially-specified equilibrium w.r.t. A1 , ..., An , if for every player i the mixed strategy pi maximizes his payoff. In other words, for every p0i ∈ ∆(Bi ), Z (8)

Z ui (pi , b−i )dPA−ii



ui (p0i , b−i )dPA−ii .

The notion of partially-specified equilibrium is information-based. The information structure, namely, the information available to each player, determines the set 22

of equilibria. No player has a prior belief about other players’ strategies. The belief a player has about the actual strategies played by all others is determined by these strategies, as well as by the partial information a player obtains about them. When Ai is the discrete11 algebra for every i, then a partially-specified equilibrium w.r.t. A1 , ..., An coincides with Nash equilibrium. Unlike Nash equilibrium, in a partially-specified equilibrium when the algebra Ai is not discrete, the pure strategies in the support of player i’s strategy are not necessarily best responses. To illustrate this point consider the following example.

Example 6 (1) Consider the following two-player coordination game. L

R

T

1,0

0,2

B

0,3 2,-1

Suppose that each player knows nothing about his opponent’s strategy. The row player chooses a distribution (p, 1 − p) over his set of action {T, B}. When playing (p, 1 − p), the expected payoff of the row player is min(p, 2(1 − p)). The maximum over p is achieved when p = 32 . The same reasoning applies to player 2: she plays ( 12 , 12 ) and the only partially-specified equilibrium when both players’ information is trivial, is (( 23 , 13 ), ( 12 , 12 )). Note that L obtains a positive probability, although it is not a best response to ( 23 , 31 ) (neither when player 2 knows nothing about this strategy, nor when player 2 does know it). Furthermore, when plays 2’s knowledge about the strategy played by player 1 is trivial, R is the worst strategy that player 2 can play against ( 23 , 13 ). Nevertheless, R is played in equilibrium with a positive probability.

(2) Let a two-player game be given by

11

L

R

T

2,2

2,0

B

0,2

3,3

An algebra A of a set B−i is discrete if for every b ∈ B−i , {b} ∈ A.

23

Suppose, as in the previous example, that each player knows nothing about his opponent’s strategy. When playing (p, 1 − p), the expected payoff is min(2p, 2p + 3(1 − p)). The maximum is achieved when p = 1. The same applies to player 2, and the only partially-specified equilibrium when both players’ information is trivial, is (T, L). In case player 1 has full information about player 2’s strategy, still (T, L) is the only partially-specified equilibrium.

(3) Let a two-player game be given by L

M

R

T

3,3

0,0

0,0

C

0,0

2,2

0,0

B

0,0

0,0

1,1

Consider the case where player 1 knows the probability assigned by player 2 to R, and player 2 knows the probability assigned by player 1 to B. In other words, A1 is generated by the partition {{L, M }, {R}} and A2 is generated by the partition {{T, C}, {B}}. In equilibrium player 1 plays (p1 , p2 , p3 ) and player 2 plays (q1 , q2 , q3 ). When p1 + p2 > 0, as in example (1), pp21 = qq12 = 23 . There are three equilibria: a. p = ( 25 , 53 , 0), q = ( 25 , 35 , 0); b. p = (0, 0, 1), q = 2 3 6 2 3 6 (0, 0, 1); and c. p = ( 11 , 11 , 11 ), q = ( 11 , 11 , 11 ).

(4) In examples (1)-(3) each player knows the probability of some subsets of the other player’s set of strategies, and these subsets form an algebra. In the following example the set of subset whose probability is known to player 2 is not an algebra. Consider the following zero-sum game. s1

s2

s3

s4

T

0

1

0

-1

R

1

0

1

2

suppose that player 2 knows the mixed strategy played by player 1. Player 1, on the other hand, is informed of the probability of A = {s1 , s2 } and B = {s2 , s3 } (and, as usual, of S = {s1 , s2 , s3 , s4 }). There is no equilibrium where one of the player plays a pure strategy. Thus, player 1 must play ( 12 , 21 ), in which case any mixed strategy of player 2 is a best response. 24

If player 1 plays (α, 1 − α), then the payoff matrix reduces to s1 s2 s3 s4 1−α

α

1−α

2 − 3α

Suppose that player 2 plays the mixed strategy P . Player 1 evaluates his strategies using the partially-specified (P, Y), where12 Y = {A, B, S}. This is a situation similar R to that discussed in Example 5. The expected payoff of player 1 is precisely ψdY P , with ψ being the function analyzed in Example 5. When α ≤ 12 , this integral was found to be max((1−α)P (S \A)+(1−α)P (S \B)−α, αP (A)+(1−α)P (S \A)). If this maximum is attained at the left figure, max((1−α)P (S \A)+(1−α)P (S \B)−α then player 1, in order to maximize his payoff would choose α = 0, which is impossible. Thus, the maximum is attained at the right figure, αP (A) + (1 − α)P (S \ A). The maximum could be at α = payoff is 12 . R

1 2

only if P (S \ A) = P (A) =

1 2

in which case the expected

A similar calculation is conducted for α ≥ 12 . Now, α ≥ 1 − α ≥ 2 − 3α. Then, ψdY P = max((1−α)P (A)+(1−α)P (B)−(2−3α), (1−α)P (A)+(2−3α)P (S \A)).

The maximum is achieved at α = 12 . Thus, in a partially-specified equilibrium player 1 plays ( 12 , 12 ) and player 2 assigns probability 12 to the strategies in A. In other words, any mixed strategy of player 2 that assigns probability ( 21 , 12 )

1 2

to A forms (together with

of player 1) a partially-specified equilibrium.

The following theorem ensures the existence of a partially-specified equilibrium for every information structure. Theorem 3 Let Ai be the set of events known to player i, i ∈ M . Then, there exists a partially-specified equilibrium w.r.t. A1 , ..., An . This theorem can be proven by a standard fixed-point technique. It is based on the fact that the integral is concave (see Lemma 7) and therefore the best-response correspondence is convex valued. The details are omitted. In Nash equilibrium, each player plays a best response to the actual strategies of the other players. In other words, Nash equilibrium requires that beliefs are rationalized by actual behavior. In partially-specified equilibrium, players play their best response to the partial specification they obtain regarding the actual strategies of the other players. 12

Here an event is identified with its characteristic function.

25

Dow and Werlang (1994) defined equilibrium with non-additive probabilities. In this equilibrium, players play their best response to their beliefs. However, besides one restriction, namely that only strategies which appear in the set defined to be the support13 are allowed to be played, there is no relation between the strategies played and the beliefs. Such a latitude seems to allow for an excessive degree of freedom: beliefs of players are essentially unrelated to their actual behavior. Eichberger and Kelsey (1999) extended the notion of Dow and Werlang (1994) from two-player games to n-player games.

7.2

Partially-specified correlated equilibrium

Correlated equilibrium (Aumann 1972, 1987) refers to a case where the players get some (correlated) information before playing the game. This information need not, and is usually not, related to the game itself. Neither it is related to the state of nature, in case of a game with incomplete information, nor is it related to the payoffs. The actions taken by a player depend on her information. One way to define correlated equilibrium is to extend the game by adding a preplay getting-information-stage. As before, let G = (M, {Bi }i∈M , {ui }i∈M ) be a game. In addition to this game, let Ω, µ be a probability space, which will serve as the correlation means. Before the game starts, a point ω ∈ Ω is randomly selected according to µ. Player i obtains a signal in Hi that depends on ω. Formally, let hi : Ω → Hi . Upon getting a signal hi (ω), player i takes an action. A strategy of player i is a function σi : Hi → ∆(Bi ). A profile σ = (σ1 , ..., σ|M | ) of strategies induces a probability distribution over B, say Pσ . However, not only µ is partially specified, the distribution that (σj (rj ))j6=i induces over B−i (for rj ∈ Hj , j 6= i) is also partially specified to i. When getting the signal ri = hi (ωi ) player i has the partially specified probability (Pσ,µ (·|ri ), Biri ), where Pσ−i ,µ (·|ri ) is the probability distribution induced by σ−i and µ, given ri , and Biri is a set of subsets of B−i . Definition 4 The tuple (σ1 , ..., σ|M | ) is partially-specified-correlated equilibrium w.r.t. Biri , ri ∈ Hi , i ∈ M , if for every player i ∈ M , and for every ri ∈ Hi obtained with positive probability, the mixed strategy σi (ri ) maximizes player i’s payoff. In other 13

The support of a non-additive has a few different plausible interpretations.

26

words, for every player i, ri ∈ Hi and a mixed strategy σi0 , Z Z (9) ui (σi (ri ), σ−i (r−i ))d(Pσ ,µ (·|ri ),Biri ) ≥ ui (σi0 (ri ), σ−i (r−i ))d(Pσ

ri −i ,µ (·|ri ),Bi )

−i

.

Note that each player has a list of partially specified probabilities, one for each possible signal that he may receive from the mediator.

7.3

Partially-specified probabilities and games with incomplete information

A game with incomplete information is characterized by a set Ti of player i’s type and a probability distribution, q over T = ×i∈M Ti . A profile t = (t1 , ..., t|M | ∈ T is selected according to q, player i is informed of his type a the game Gt = (M, {Biti }i∈M , {utii }i∈M ) is played. The probability q, however, is not fully specified to the players. Denote for every player i, T−i = ×j6=i Tj . After obtaining the information ti about his own type, player i gets a partial information about q(·|ti ). More specifically, player i obtains the PSP (q(·|ti ), Iiti ). The set Iiti contains subsets of T−i . A mixed strategy of player i, say σi , assigns to every ti ∈ Ti a mixed strategy in ∆(B ti ). That is, σi (ti ) is a distribution over Biti . Definition 5 The tuple (σ1 , ..., σ|M | ) is an equilibrium with partially specified probabilities about the types that are given by Biti , i ∈ M , ti ∈ Ti , if for every player i ∈ M ,and for every ti ∈ Ti , the mixed strategy σi (ti ) maximizes his payoff. In other words, for every player i and a mixed strategy σi0 , Z Z ti ui (σi (ti ), σ−i (t−i ))dq(·|ti )I ti ≥ utii (σi0 (ti ), σ−i (t−i ))dq(·|ti )I ti . (10) i

i

In a more general setup a vector t is chosen according to a prior q and player receive (possibly correlated) signals that stochastically depend on t. Player i receives the signal si ∈ Si with probability π(si |t). However, player i does not know π. After getting the signal si he has a PSP over T × S−i , where S−i = ×j6=i Sj . This PSP is denoted (pi (·|si ), Ii (·|si )). In this setup, a strategy, σi , of player i assigns to every si ∈ Si a mixed strategy in ∆(Bi ) (Bi is player i’s set of pure actions). The tuple (σ1 , ..., σ|M | ) is an equilibrium with partially specified probabilities about the types, given by (pi (·|si ), Ii (·|si )), i ∈ M , si ∈ Si , if for every player i ∈ M , and 27

for every ti ∈ Ti , the mixed strategy σi (ti ) maximizes his payoff. In other words, for every player i and a mixed strategy σi0 , Z Z t (11) ui (σi (si ), σ−i (s−i ))d(pi (·|si ),Ii (·|si )) ≥ utii (σi0 (si ), σ−i (s−i ))d(pi (·|si ),Ii (·|si )) . Note that in the previous definition players have a partial knowledge about the types of others, while unlike Definition 3 the players know the strategy played by others. Obviously, one might combine the two and define an equilibrium in which players have partially specified probabilities on signals, states and strategies. The definitions seem to be clear and are therefore omitted. Ashlagi et al. (2005) Deal with games with incomplete information where the players are totally unaware of the prior according to which states and signals are chosen. Upon getting a signal a player considers the worst case among all those that are consistent with this signal. In the equilibrium called safety-level equilibrium, player i best responds to the strategies of the other players, while considering the worst case among all those that are consistent the signal he received, si . Thus, players are fully informed of all other players’ strategies, while completely uninformed of the prior. Under these circumstances the equilibrium with partially specified probabilities about the types coincides with safety-level equilibrium. Hyafil and Boutilier (2004) also analyzed non-Bayesian games with incomplete information. They introduced the notion of regret minimizing equilibrium.

8 8.1

Discussion On ambiguity aversion

This analysis is meant to introduce a first-order approximation to the way by which people take decisions in the presence of a partially-specified probability. In Ellsberg’s original decision problem, Gamble X is weakly dominated by gamble T. Nevertheless, the theory of decision making with a partially-specified probability would predict that X and W are equivalent. To make things even worse, suppose that X is modified a bit and instead of $100, the prize for drawing a red ball is $101. In this case, X is predicted to be strictly preferred to T. The reason for this difficulty is that the theory takes an extreme ambiguityaversion approach: any information provided is taken without any sense of skepticism and anything else is ignored. 28

Similar difficulties also arise within expected utility theory. It might happen that one act weakly dominates another while the two are actually equivalent. This happens when big prizes are assigned probability 0 and are therefore not counted in the expected utility. To improve upon the current theory, one might think of discriminating between different information sources according to their reliability. More reliable sources would get a greater weight than less reliable ones. In this case, wild guesses are also reliable to a certain extent, and should therefore be taken into account with a weight determined according to their reliability level. Recall eq. (3). To make the previous paragraph more formal, let Y be a set of random variables and suppose that v is a real function defined on Y. For every Y ∈ Y, v(Y) is interpreted as 'the expectation of Y is claimed to be v(Y)'. The function v may summarize the information received from various sources, whose reliability might vary. One may thus think of a reliability factor r_Y attached to every Y ∈ Y, indicating the extent to which the information about Y is reliable. This factor can then be taken into account when evaluating a function ψ, as follows:

    ∫ ψ dP_Y = max { Σ_{Y∈Y} r_Y λ_Y v(Y) ;  Σ_{Y∈Y} λ_Y Y ≤ ψ and λ_Y ≥ 0 for every Y ∈ Y }.

In this formula the number λ_Y v(Y) is discounted by the coefficient r_Y to obtain r_Y λ_Y v(Y).
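The reliability-weighted evaluation above can be approximated numerically. The brute-force sketch below is my own illustration (the function name, the grid discretization, and the example source are assumptions, not the paper's): it maximizes Σ r_Y λ_Y v(Y) over non-negative coefficients whose combination stays pointwise below ψ.

```python
from itertools import product

def weighted_integral(psi, sources, step=0.01, lam_max=1.0):
    """Grid-search sketch of max { sum r_Y * lam_Y * v(Y) :
    sum lam_Y * Y <= psi pointwise, lam_Y >= 0 }.
    `sources` is a list of triples (Y, v_Y, r_Y): the random variable Y
    (a tuple of values over states), its reported expectation v(Y),
    and its reliability factor r_Y."""
    grid = [i * step for i in range(int(lam_max / step) + 1)]
    best = 0.0
    for lams in product(grid, repeat=len(sources)):
        combo = [sum(l * Y[s] for l, (Y, _, _) in zip(lams, sources))
                 for s in range(len(psi))]
        if all(c <= p + 1e-9 for c, p in zip(combo, psi)):
            best = max(best, sum(l * r * v for l, (_, v, r) in zip(lams, sources)))
    return best

# one source: the constant act 1_S with reported expectation 1
full = weighted_integral((0.5, 0.5, 0.5), [((1, 1, 1), 1.0, 1.0)])
discounted = weighted_integral((0.5, 0.5, 0.5), [((1, 1, 1), 1.0, 0.8)])
```

With full reliability the evaluation of ψ = (1/2, 1/2, 1/2) is 1/2; discounting the single source by r = 0.8 scales the evaluation down to 0.4, exactly as the formula prescribes.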

8.2 A limited use of partially-specified probability and framing effects

In eq. (3) a function ψ is approximated from below by linear combinations of functions from Y. If only positive combinations are allowed, the decision maker makes only limited use of the information available to him. This might account for some aspects of irrationality that often occur, and for a few framing effects. Let (P, Y) be a partially-specified probability over S and let ψ be a non-negative function defined over S. Denote

(12)    ∫⁺ ψ dP_Y = max { Σ_{Y∈Y} λ_Y E_P(Y) ;  Σ_{Y∈Y} λ_Y Y ≤ ψ and λ_Y ≥ 0 for every Y ∈ Y }.

The following lemma states that, without loss of generality, the set Y can be assumed to be convex, or alternatively to be the set of extreme points of a convex set.¹⁴

Lemma 9 Let (P, Y) be a partially-specified probability.
(i) Denote the convex hull of Y by Y′. Then ∫⁺ ψ dP_Y = ∫⁺ ψ dP_{Y′} for every ψ.
(ii) Denote the set of extreme points of conv Y by Y″. Then ∫⁺ ψ dP_Y = ∫⁺ ψ dP_{Y″} for every ψ.

Notation 3 The set that contains all the acts that are mixtures of FaF acts is denoted L_FaF. That is, L_FaF = conv{[f]; f ∈ L}, where conv stands for 'the convex hull of'.

(iii⁺) Convex Fat-Free Independence: Let f, g and h be in L_FaF. Then, for every α ∈ (0, 1), f ≻ g implies that αf + (1 − α)h ≻ αg + (1 − α)h.

Theorem 4 The binary relation ≿ over L satisfies (i), (ii), (iii⁺), (iv) and (v) if and only if there is a partially-specified probability (P, Y) and an affine function u on ∆(N) such that for every f, g ∈ L,

(13)    f ≿ g    iff    ∫⁺ u(f(s)) dP_Y ≥ ∫⁺ u(g(s)) dP_Y.

In order to evaluate an act, a decision maker uses Y, which captures the available data. Anything in Y is considered to its full extent, while anything outside Y is totally ignored. Theorem 4 states that the evaluation of acts is done only by convex (as opposed to linear) combinations of functions whose expectation is known. This extra requirement imposes a limitation on the use of available information and creates a few anomalies.
A partially-specified probability (P, Y) induces a capacity v⁺ defined over S: v⁺(E) = ∫⁺ 1_E dP_Y for every E ⊆ S. This capacity is neither convex nor does it have a large core (see Sharkey, 1982). Thus, by Azrieli and Lehrer (2005), ∫⁺ (constant + X) dP_Y is typically not equal to constant + ∫⁺ X dP_Y. Equality holds only if X is either constant or FaF. This implies that a decision maker whose preferences are determined by a partially-specified probability (P, Y) and ∫⁺ • dP_Y is typically neither a Choquet expected-utility maximizer nor a follower of the model of decision making based on the minimum over a set of priors (Gilboa and Schmeidler, 1989).

¹⁴ The proofs of the assertions made in this sub-section can be found in the first version of this paper.

To illustrate this point, let X = (1/6, 2/6, 3/6) (that is, X is a random variable defined on S = {1, 2, 3}) and suppose that Y = {1_S, X}. Moreover, it is given that E(X) = 5/12. In this case ∫⁺ (0, 1, 1) dP_Y = 0, and there is no probability distribution Q that satisfies E_Q(X) ≥ 5/12 and the last equation. Thus, there is no set of priors such that the minimum of the respective expected values of any ψ coincides with ∫⁺ ψ dP_Y.
A sophisticated agent would be able to calculate the expected value of any random variable that can be expressed as a linear combination (not necessarily with positive coefficients) of elements of Y. This is precisely the model proposed above in Theorem 2. In the previous example, (0, 1/2, 1) and (1, 1/2, 0) are both in¹⁵ span(Y) (e.g., (0, 1/2, 1) = 3(1/6, 2/6, 3/6) − (1/2)1_S). Thus, ∫ (0, 1, 1) dP_Y = ∫ (0, 1/2, 1) dP_Y = 3 · 5/12 − 1/2 = 3/4. This illustrates the fact that ∫ • dP_span(Y) = ∫ • dP_Y.
Another way to demonstrate that the preference order induced by (P, Y) and ∫⁺ is not determined by the minimum over a set of probability distributions is to show that the integral is not co-variant with adding a constant. Consider the previous example. On the one hand, ∫⁺ (3/4)1_{3} dP_Y + 1/4 = 1/4; on the other hand, ∫⁺ ((3/4)1_{3} + (1/4)1_S) dP_Y = ∫⁺ (1/4, 1/4, 1) dP_Y. Since (1/4, 1/4, 1) ≥ (1/4, 0, 3/4) and (1/4, 0, 3/4) ∈ Y, one obtains ∫⁺ (1/4, 1/4, 1) dP_Y ≥ 1/3 > 1/4. Thus, ∫⁺ (3/4)1_{3} dP_Y + 1/4 ≠ ∫⁺ ((3/4)1_{3} + (1/4)1_S) dP_Y.
Consider a state space S = {1, 2, 3} and Y = {1_S, 1_{1,2}}. A decision maker who follows the model of Theorem 4 knows the probability of the event {1, 2}, but ignores the probability of the event {3}, which could easily be calculated. Such a decision maker trusts only the explicit information that he obtains and ignores anything else. While obtaining the probability of {1, 2} is equivalent to obtaining the probability of {3}, the specific framing of the information might therefore affect the entire decision-making process. To put it more formally, denote Y′ = {1_S, 1_{3}}. The partially-specified probabilities (P, Y) and (P, Y′) are not equivalent as far as ∫⁺ is concerned. That is, ∫⁺ • dP_Y ≠ ∫⁺ • dP_{Y′}.
Such framing effects do not exist when the partially-specified probability is used as described in Theorem 2.
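The contrast between eq. (12) and the full-span evaluation of Theorem 2 can be checked by brute force. The sketch below is my own (the function name and the grid discretization are assumptions); it evaluates ψ = (0, 1, 1) for Y = {1_S, X} with X = (1/6, 2/6, 3/6) and E_P(X) = 5/12, once with positive coefficients only (eq. (12)) and once with signed coefficients (eq. (3)).

```python
from itertools import product

def integral(psi, yset, exps, signed=False, step=0.05, lo=-2.0, hi=3.0):
    """Grid-search sketch of the concave integral: maximize
    sum_i lam_i * E_P(Y_i) subject to sum_i lam_i * Y_i <= psi pointwise,
    with lam_i >= 0 (eq. (12)) or lam_i free (eq. (3)) when signed=True."""
    pts = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    if not signed:
        pts = [p for p in pts if p >= 0]
    best = float('-inf')
    for lams in product(pts, repeat=len(yset)):
        combo = [sum(l * y[s] for l, y in zip(lams, yset))
                 for s in range(len(psi))]
        if all(c <= p + 1e-9 for c, p in zip(combo, psi)):
            best = max(best, sum(l * e for l, e in zip(lams, exps)))
    return best

# S = {1, 2, 3}, Y = {1_S, X} with X = (1/6, 2/6, 3/6), E_P(X) = 5/12
one_S, X = (1, 1, 1), (1 / 6, 2 / 6, 3 / 6)
psi = (0, 1, 1)
pos = integral(psi, [one_S, X], [1.0, 5 / 12])               # eq. (12)
sgn = integral(psi, [one_S, X], [1.0, 5 / 12], signed=True)  # eq. (3), full span
```

The positive-only integral is 0 (the first coordinate forces all coefficients to 0), while signed combinations recover 3/4 via (0, 1/2, 1) = 3X − (1/2)1_S, matching the computation in the text.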

8.3 On computational complexity and decision making

The decision maker is informed of the expectation of each random variable in Y. He then uses this data-base to evaluate acts by computing integrals of random variables.

¹⁵ span(Y) stands for the linear space spanned by Y.

The complexity of such a computation can be measured by the number of basic computational steps required. In eq. (3) the integral of the random variable ψ is defined as the maximum over all summations of the form Σ_{Y∈Y} λ_Y E_P(Y) whose corresponding combinations lie below ψ. The length of these summations is not limited a priori. A decision maker with a bounded computational capacity might be incapable of performing summations of unlimited length, and might only be able to compute summations of a bounded length. In this case, if he can deal with summations of any length less than or equal to L, the evaluation of a random variable ψ is

(14)    max { Σ_{ℓ=1}^L λ_ℓ E_P(Y_ℓ) ;  Σ_{ℓ=1}^L λ_ℓ Y_ℓ ≤ ψ, λ_ℓ ∈ ℝ, Y_ℓ ∈ Y for every ℓ = 1, ..., L }.
Using the available information in this way induces a preference order over acts. Clearly, two decision makers who have the same data-base but different computational capacities would have different preference orders. Limiting the summations involved to those of length at most L prevents the decision maker from fully exploiting the information available. Thus, the restricted calculation expressed in eq. (14) might also be interpreted as a kind of bounded rationality on the part of the decision maker. It would be interesting to find the properties of the preference orders of boundedly rational decision makers modeled this way.
For a given decision maker with a bounded computational capacity, different representations of the available information would result in different preference orders over acts. To illustrate this point, consider a decision maker who is informed only that the probability of the event A is 1/2. When the length of the computable summations is bounded by L = 1, the estimate of the probability of the complement of A, denoted A^c, is 0. However, if the same information is expressed differently, for instance as 'the probability of A^c is one half', then the estimate of the probability of A is 0. This is a framing effect similar to that discussed in subsection 8.2.
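A bounded-length evaluation in the spirit of eq. (14) can be sketched by brute force. The helper below and its discretization are my own illustration; it also assumes that 1_S (total mass one) is part of the data, which the paper's example leaves implicit.

```python
from itertools import combinations_with_replacement, product

def bounded_integral(psi, yset, exps, L, step=0.05, lo=-2.0, hi=2.0):
    """Sketch of eq. (14): the integral computed with summations of at
    most L terms, coefficients lam in R, searched over a coarse grid."""
    pts = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    best = 0.0  # the empty summation (all lam = 0) is always available
    for idx in combinations_with_replacement(range(len(yset)), L):
        for lams in product(pts, repeat=L):
            combo = [sum(l * yset[i][s] for l, i in zip(lams, idx))
                     for s in range(len(psi))]
            if all(c <= p + 1e-9 for c, p in zip(combo, psi)):
                best = max(best, sum(l * exps[i] for l, i in zip(lams, idx)))
    return best

# S = {s1, s2}, A = {s1}; the data: E(1_S) = 1 and E(1_A) = P(A) = 1/2
one_S, one_A = (1, 1), (1, 0)
psi_Ac = (0, 1)  # the indicator of the complement of A
res_L1 = bounded_integral(psi_Ac, [one_S, one_A], [1.0, 0.5], L=1)
res_L2 = bounded_integral(psi_Ac, [one_S, one_A], [1.0, 0.5], L=2)
```

With L = 1 the evaluation of 1_{A^c} is 0, as in the text; with L = 2 the combination 1_S − 1_A becomes expressible and the evaluation rises to 1/2.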

8.4 More on the fat-free independence axiom

The FaF independence axiom states that if f, g and h are in L_FaF, then for every α ∈ (0, 1), f ≻ g implies that αf + (1 − α)h ≻ αg + (1 − α)h. Theorem 2 could have been proven using a weaker version of this axiom: let f, g and h be acts of the form αz + (1 − α)y, with z being FaF, y being a constant and α ∈ (0, 1). Then, for every α ∈ (0, 1), f ≻ g implies that αf + (1 − α)h ≻ αg + (1 − α)h. In other words, the independence property is required to hold only within every L_z = conv(L_c ∪ {z}).

8.5 On the definition of SFaF

The definition of a strongly fat-free act uses the constant act c₁. Axiom (iii) is then formulated. This very axiom is needed to establish that there is a vN-M representation on L_c, which in turn guarantees that there exists a maximal element M in L_c. Lemma 4 states that the definition of SFaF could have used M rather than c₁. However, at that point the existence of M is not yet guaranteed. Instead of the current definition of SFaF, one could avoid using c₁ and adopt a more stringent definition: f is a strongly fat-free act if any combination of f with any constant act is FaF. In the presence of the other axioms, this definition would be equivalent to the current version.

8.6 On the continuity axiom

The continuity axiom used above is a bit stronger than that used by von Neumann and Morgenstern (1944). Part (iv)(a) (the first part of the continuity axiom) requires that if f ≻ g and g ≿ h, then there is α ∈ (0, 1) such that αf + (1 − α)h ≻ g. The vN-M axiom requires the same conclusion under a weaker condition: f ≻ g and g ≻ h. The stronger axiom is needed in the proofs of Lemmas 1 and 2. The independence axiom holds only for fat-free acts (and their derivatives). A priori, there is only one fat-free act: m (if it exists). Axioms (i), (iv) and a weaker version of (iii) could ensure only that there is a vN-M utility representation of ≿ over L_c. However, to ensure a broader scope for the independence axiom, it is essential that every act have an equivalent act that is fat free. A weaker version of the continuity axiom would not be sufficient for this purpose.

8.7 Additivity and fat-free acts

It turns out that when the probability is specified on a sub-algebra, the integral in eq. (1) is additive over the set of functions that are measurable with respect to this sub-algebra. This means that the implied expected utility is additive when restricted to the fat-free acts derived from them. This phenomenon is strongly related to that of Epstein and Zhang (2001), where the probability restricted to the set of unambiguous events is additive. There, this set is more general than an algebra: it might be a λ-system. When the probability is partially specified (not over a sub-algebra), additivity is preserved over the strongly fat-free acts, which, by Lemma 3, form a convex set. This does not mean that a convex combination of fat-free (but not strongly so) acts is necessarily fat free.
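This additivity can be checked numerically. A minimal sketch (the helper name and example are mine), under the assumption — consistent with the computations in this paper for a probability specified on a sub-algebra — that the integral of eq. (1) equals Σ_B P(B) · min_{s∈B} ψ(s) over the generating partition:

```python
def integral_on_subalgebra(psi, partition, prob):
    """Concave integral of psi w.r.t. a probability specified on the
    sub-algebra generated by `partition`: each block contributes its
    probability times the minimum of psi on the block."""
    return sum(p * min(psi[s] for s in block)
               for p, block in zip(prob, partition))

part, prob = [(0, 1), (2,)], (0.5, 0.5)   # S = {0,1,2}, P known on {{0,1},{2}}
f, g = (1, 1, 3), (2, 2, 1)               # both measurable w.r.t. the partition
lhs = integral_on_subalgebra(tuple(a + b for a, b in zip(f, g)), part, prob)
rhs = integral_on_subalgebra(f, part, prob) + integral_on_subalgebra(g, part, prob)
# for measurable functions the integral is additive: lhs == rhs
u, v = (1, 2, 3), (2, 1, 1)               # u and v are NOT measurable
lhs2 = integral_on_subalgebra(tuple(a + b for a, b in zip(u, v)), part, prob)
rhs2 = integral_on_subalgebra(u, part, prob) + integral_on_subalgebra(v, part, prob)
# without measurability only superadditivity is guaranteed: lhs2 > rhs2
```

Here lhs = rhs = 3.5, while for the non-measurable pair the integral is strictly superadditive (3.5 > 3.0).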

8.8 Updating partially-specified probabilities and time consistency

Updating non-additive probabilities in light of new information has attracted a lot of attention.¹⁶ When dealing with a partially-specified probability, there are two kinds of information arrival: learning the probability of an event whose probability was previously unknown, or the news that a certain event has occurred (in which case the uncertainty remains within this event). In the first case, there is no need to update any belief: the model takes into account any new piece of information that comes in. The second kind of information arrival produces some paradoxical outcomes. To illustrate this point, consider

Example 7 A paradox of time inconsistency: An urn contains 100 balls: 50 are either white or black, and 50 are either red or green. A random ball is drawn, the decision maker receives a signal that depends on the color drawn, and she then has to take an action. Her utility depends on the color selected and on her action. There are three actions: a, b and c. Essentially, there are four states, S = {W, B, R, G}; the decision maker knows that the probability of {W, B} and of {R, G} is 1/2 each. This is all the information available. Suppose that the utility function is given by the following table:

        W   B   R   G
    a   0   1   3   0
    b   1   0   0   3
    c   0   2   2   0

¹⁶ See, for instance, Jaffray (1992), Denneberg (1994, 2002), Lapied and Kast (2005) and Lehrer (2005).

The signalling structure is as follows: if either W or G is realized, the decision maker is informed of {W, G}, and she is informed of {B, R} if either B or R is realized. In other words, she is not able to distinguish between W and G, nor between B and R; but she is able to tell whether or not W or G occurred and whether or not B or R occurred. The signals in this case are {W, G} and {B, R}. Upon getting a signal about the realized state, the decision maker has to perform an action. A strategy of the decision maker is a map from signals to actions; that is, a strategy prescribes what action the decision maker takes on {W, G} and on {B, R}. She would like to maximize her utility. What would an optimal strategy be?
It is clear that on {W, G}, b is the dominant action, while on {B, R}, b is the worst action. The question is therefore whether to take action a or c when either B or R occurs. Taking action a results in the payoffs X = (1, 1, 3, 3) (a payoff per state), and taking action c results in the payoffs Y = (1, 2, 2, 3). The integral of X w.r.t. the partially-specified probability described above is (1/2)·1 + (1/2)·3 = 2, while the integral of Y is (1/2)·1 + (1/2)·2 = 1.5. Thus, playing b after hearing that {W, G} occurred and a after hearing that {B, R} occurred is an optimal strategy.
Now suppose that time passes and either B or R occurred. The decision maker knows it, but she knows nothing beyond that. Given that either B or R occurred, what is the best action? Now c is the best action, because by playing c she guarantees the utility 2, while the minimal utility when playing a is 1. While the (ex-ante) optimal strategy entails playing a on {B, R}, the (ex-post) best action when one of these states is realized is c. It seems that this lack of consistency is intrinsic to decision making with partially-specified probabilities (unless one arbitrary additive probability that is consistent with the data is used). Time inconsistencies in non-expected-utility models have long been recognized (see, e.g., Machina, 1989). The example above shows that this kind of phenomenon occurs even when the underlying probability is additive and the decision maker knows that this is the case, but obtains only partial information about it.
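The computations in Example 7 can be replayed in a few lines. This is a sketch (the dictionary encoding is my own); the integral for this simple information structure is computed block-by-block, as in the text.

```python
def concave_integral(payoff, partition, prob):
    """Integral w.r.t. the probability specified on {W,B} and {R,G}:
    each block contributes its probability times the worst payoff in it
    (the block-by-block computation used in the text)."""
    return sum(p * min(payoff[s] for s in block)
               for p, block in zip(prob, partition))

U = {'a': {'W': 0, 'B': 1, 'R': 3, 'G': 0},
     'b': {'W': 1, 'B': 0, 'R': 0, 'G': 3},
     'c': {'W': 0, 'B': 2, 'R': 2, 'G': 0}}
part, prob = [('W', 'B'), ('R', 'G')], (0.5, 0.5)

# strategy "b on {W,G}, a on {B,R}" yields X = (1,1,3,3); "b, c" yields Y
X = {'W': U['b']['W'], 'B': U['a']['B'], 'R': U['a']['R'], 'G': U['b']['G']}
Y = {'W': U['b']['W'], 'B': U['c']['B'], 'R': U['c']['R'], 'G': U['b']['G']}
ex_ante_a = concave_integral(X, part, prob)   # (1/2)*1 + (1/2)*3 = 2
ex_ante_c = concave_integral(Y, part, prob)   # (1/2)*1 + (1/2)*2 = 1.5
# ex post, on the signal {B,R} nothing is known about the conditional
# probability; the worst-case evaluations flip the ranking:
ex_post_a = min(U['a']['B'], U['a']['R'])     # 1
ex_post_c = min(U['c']['B'], U['c']['R'])     # 2
```

Ex ante the strategy using a is strictly better (2 vs. 1.5), yet ex post, conditional on {B, R}, action c guarantees more (2 vs. 1) — the time inconsistency described above.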

8.9 Bayesianism and partially-specified probabilities

An orthodox Bayesian approach would dictate that whenever only partial information about the probability is available, the decision maker should adopt an additive probability consistent with the data. For instance, in Ellsberg's urn, since only the probability of 'white or black' is known, while the probability of each color separately is unknown, the decision maker should assign a probability to white and to black. A uniform distribution, namely 1/3 to each of white and black, seems 'natural' in this relatively simple case. Suppose, however, that there are five states of nature: S = {s₁, s₂, s₃, s₄, s₅}, and the available information is that the probability of {s₁, s₂, s₃} and of {s₃, s₄} is .9 each. Obviously, a uniform distribution over {s₁, s₂, s₃} or over {s₃, s₄} is inconsistent with any distribution that assigns probability .9 to each of {s₁, s₂, s₃} and {s₃, s₄}. A plausible possibility is to adopt the distribution that maximizes entropy among all probability distributions consistent with the available information. This would be a natural way to replace the uniform distribution when the latter is impossible. The entropy has some desirable properties that become significant when applied to measuring the quantity of information. However, when it comes to decision problems, the entropy has no greater role than any other (symmetric) concave function. This assertion is bolstered by Blackwell (1953), who compared information structures in relation to utility maximization; one of his results refers to all concave functions rather than to any particular one.
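For the five-state illustration, the maximum-entropy member of the consistent set can be located numerically. The sketch below is my own; the parametrization follows from the constraints (p₁ + p₂ + p₃ = 0.9, p₃ + p₄ = 0.9 and total mass 1 force p₃ = t ∈ [0.8, 0.9], p₄ = 0.9 − t, p₅ = t − 0.8), and p₁ = p₂ is optimal by the symmetry of the entropy.

```python
from math import log

def entropy(p):
    # Shannon entropy with the convention 0 * log 0 = 0
    return -sum(x * log(x) for x in p if x > 0)

best_t, best_h, best_p = None, float('-inf'), None
for i in range(1001):
    t = 0.8 + 0.1 * i / 1000          # p3 = t ranges over [0.8, 0.9]
    p = [(0.9 - t) / 2, (0.9 - t) / 2, t, 0.9 - t, t - 0.8]
    h = entropy(p)
    if h > best_h:
        best_t, best_h, best_p = t, h, p
```

The maximizer puts almost all the mass on s₃ (it must, since p₃ ≥ 0.8 on the whole feasible set), illustrating how far the max-entropy surrogate can be from any 'uniform-like' distribution.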

9 Final comments

9.1 Convex capacities and partially-specified probabilities

Let v be a convex capacity (i.e., v is a real function defined on the power set of S such that v(∅) = 0, v(S) = 1, and v(E) + v(F) ≤ v(E ∪ F) + v(E ∩ F) for all E, F). An event E ⊆ S is called fat-free if F ⊊ E implies v(F) < v(E). Let (P, A) be a probability specified on a sub-algebra. Define the capacity v_{P,A} as follows: v_{P,A}(E) = ∫ 1_E dP_A = max_{B⊆E, B∈A} P(B). It is clear that v_{P,A} is convex (see footnote 8). The following proposition characterizes those convex capacities that are of the form v_{P,A}.

Proposition 3 Let v be a convex non-additive probability. There exists a probability specified on a sub-algebra, (P, A), such that v = v_{P,A} if and only if, for every fat-free event E and every event F, v(E) + v(F) = v(E ∪ F) + v(E ∩ F).

The proof appears in the Appendix.
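The objects of this sub-section are easy to experiment with. The following sketch (the helper names are mine) builds v_{P,A} from a generating partition and verifies the convexity inequality v(E) + v(F) ≤ v(E ∪ F) + v(E ∩ F) on a three-state example.

```python
from itertools import chain, combinations

def v_PA(E, blocks, prob):
    """v_{P,A}(E): the total probability of the partition blocks contained
    in E, i.e. P of the largest algebra event inside E."""
    return sum(p for p, blk in zip(prob, blocks) if set(blk) <= set(E))

def subsets(S):
    return list(chain.from_iterable(combinations(S, r) for r in range(len(S) + 1)))

# A generated by the partition {{1,2},{3}} of S = {1,2,3}, with P = (0.6, 0.4)
S, blocks, prob = (1, 2, 3), [(1, 2), (3,)], (0.6, 0.4)
is_convex = all(
    v_PA(E, blocks, prob) + v_PA(F, blocks, prob)
    <= v_PA(set(E) | set(F), blocks, prob)
    + v_PA(set(E) & set(F), blocks, prob) + 1e-12
    for E in subsets(S) for F in subsets(S))
```

On this example v_{P,A}({1, 2}) = 0.6 while v_{P,A}({1}) = 0 (no algebra event other than ∅ fits inside {1}), and the convexity check passes for all 64 pairs of events.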


Let T ⊆ S. The unanimity capacity u_T is defined by u_T(E) = 1 if T ⊆ E, and u_T(E) = 0 otherwise. It turns out that a unanimity capacity is also of the form v_{P,A}: take A to be the algebra generated by {T, S \ T}, with P(T) = 1 and P(S \ T) = 0. Moreover, a capacity v_{P,A} is a convex combination of unanimity games of a special kind, as demonstrated by the following lemma. The lemma uses this notation: the partition generating a sub-algebra A is denoted Q(A).

Lemma 10 Let v be a capacity. There is a probability specified on a sub-algebra, (P, A), such that v = v_{P,A} if and only if there is a partition Q of S such that v = Σ_{T∈Q} α_T u_T, where α_T ≥ 0 and Σ_{T∈Q} α_T = 1. Moreover, Q = Q(A).

The proof is simple and is therefore omitted.

9.2 Exact capacities and partially-specified probabilities

Let (P, Y) be a partially-specified probability and define the capacity v_{P,Y} in a manner similar to v_{P,A}. Lemma 8 shows that v_{P,Y} is exact.¹⁷
I do not know how to characterize those exact capacities that stem from a partially-specified probability, namely, for which there exists a partially-specified probability (P, Y) such that v(T) = ∫ 1_T dP_Y for every T ⊆ S. The following lemma (whose proof is omitted) characterizes the case where v = v⁺_{P,Y}, i.e., v(T) = ∫⁺ 1_T dP_Y (recall eq. (12)), and Y consists only of characteristic functions.

Lemma 11 Let v be a capacity. There is a partially-specified probability (P, Y), with Y consisting only of characteristic functions, such that v = v⁺_{P,Y} if and only if v = Σ_{T⊆S} α_T u_T, where α_T ≥ 0 and Σ_{T⊆S} α_T = 1.

¹⁷ A capacity v is exact if, for every subset R of S, there is a core element of v, say q, such that q(R) = v(R).

References

[1] Anscombe, F. J. and R. J. Aumann (1963) “A definition of subjective probability,” Annals of Mathematical Statistics, 34, 199-205.

[2] Ashlagi, I., D. Monderer and M. Tennenholtz (2005) “Resource selection games with unknown number of players,” mimeo.

[3] Aumann, R. J. (1974) “Subjectivity and correlation in randomized strategies,” Journal of Mathematical Economics, 1, 67-96.

[4] Aumann, R. J. (1987) “Correlated equilibrium as an expression of Bayesian rationality,” Econometrica, 55, 1-18.

[5] Azrieli, Y. and E. Lehrer (2005) “A note on the concavification of a cooperative game,” mimeo.

[6] Azrieli, Y. and E. Lehrer (2006) “On some families of cooperative fuzzy games,” mimeo.

[7] Bewley, T. F. (2002) “Knightian decision theory. Part I,” Decisions in Economics and Finance, 25, 79-110.

[8] Blackwell, D. (1953) “Equivalent comparisons of experiments,” Annals of Mathematical Statistics, 24, 265-272.

[9] Denneberg, D. (1994) “Conditioning (updating) non-additive measures,” Annals of Operations Research, 52, 21-42.

[10] Denneberg, D. (2002) “Conditional expectation for monotone measures, the discrete case,” Journal of Mathematical Economics, 37, 105-121.

[11] Dow, J. and S. Werlang (1994) “Nash equilibrium under Knightian uncertainty: breaking down backward induction,” Journal of Economic Theory, 64, 305-324.

[12] Eichberger, J., S. Grant and D. Kelsey (2005) “Updating Choquet beliefs,” mimeo.

[13] Eichberger, J. and D. Kelsey (1999) “Non-additive beliefs and strategic equilibria,” Games and Economic Behavior, 30, 183-215.

[14] Epstein, L. G. and J. Zhang (2001) “Subjective probabilities on subjectively unambiguous events,” Econometrica, 69, 265-306.

[15] Gilboa, I. and D. Schmeidler (1989) “Maxmin expected utility with non-unique prior,” Journal of Mathematical Economics, 18, 141-153.

[16] Gul, F. (1991) “A theory of disappointment aversion,” Econometrica, 59, 667-686.

[17] Hyafil, N. and C. Boutilier (2004) “Regret minimizing equilibria of games with strict type uncertainty,” mimeo.

[18] Jaffray, J.-Y. (1992) “Bayesian updating and belief functions,” IEEE Transactions on Systems, Man and Cybernetics, 22, 1144-1152.

[19] Lapied, A. and R. Kast (2005) “Updating Choquet valuation and discounting information arrivals,” mimeo, CNRS, Lameta et Idep.

[20] Lehrer, E. (2005) “A new integral for capacities,” mimeo.

[21] Lehrer, E. (2005a) “Updating non-additive probabilities: a geometric approach,” Games and Economic Behavior, 50, 42-57.

[22] Machina, M. J. (1989) “Dynamic consistency and non-expected utility models of choice under uncertainty,” Journal of Economic Literature, 27, 1622-1668.

[23] von Neumann, J. and O. Morgenstern (1944) Theory of Games and Economic Behavior, Princeton University Press.

[24] Quiggin, J. (1982) “A theory of anticipated utility,” Journal of Economic Behavior and Organization, 3, 323-343.

[25] Savage, L. J. (1954) The Foundations of Statistics. New York, NY: Wiley.

[26] Schmeidler, D. (1989) “Subjective probability and expected utility without additivity,” Econometrica, 57, 571-587.

[27] Sharkey, W. W. (1982) “Cooperative games with large cores,” International Journal of Game Theory, 11, 175-182.

[28] Yaari, M. E. (1987) “The dual theory of choice under risk,” Econometrica, 55, 95-115.

10 Appendix

Proof of Lemma 1. First, note that f itself is a member of the set A(f) = W(f) ∩ {g; f ∼ g}. The independence axiom (iii) applies in particular to L_c, and along with (i) and (vi) implies that ≿ over L_c is continuous. Using a diagonalization method, for instance, one can find an infimum w.r.t. ≥ in A(f), say h. The act h satisfies: (a) g ≥ h for every g ∈ A(f); and (b) if h′ satisfies h′ ≥ h, and h′(s) ≻ h(s) whenever f(s) ≻ h(s), then there exists g′ ∈ A(f) such that h′ > g′. We show that h = [f].
Suppose, on the contrary, that f ≻ h. By (ii), f > h and therefore βf + (1 − β)h > h for every β ∈ [0, 1). From (b) it follows that for every β ∈ [0, 1) there is g′ ∈ A(f) such that βf + (1 − β)h > g′. By (iv), there exists α ∈ (0, 1) such that f ≻ αf + (1 − α)h. This is a contradiction, because by (ii), αf + (1 − α)h ≿ g′ ∼ f. We obtain that f ∼ h, and h is FaF, as desired.

Proof of Lemma 2. A closer look at the previous proof reveals that it hinges on (ii), (iv) and the fact that ≿ is continuous over L_c. Since the independence axiom (iii) applies to L_c, the latter is true under the assumptions of this lemma, and thus the proof is complete.

Proof of Lemma 3. Assume that f and g are SFaF and let 0 < α < 1. Let 0 ≤ β < 1 and an act h be such that h < β(αf + (1 − α)g) + (1 − β)M. We show that h ≺ β(αf + (1 − α)g) + (1 − β)M. Define the act h′: for every s ∈ S, h′(s) = max((βαf + (1 − βα)m)(s), h(s)), where the maximum is taken w.r.t. ≻. By definition h ≤ h′, and by (ii) h ≼ h′. Moreover, h′ < βαf + (1 − βα)[β(1 − α)/(1 − βα) · g + (1 − β)/(1 − βα) · M]. Thus, there is an act, say h″, such that h′ = βαf + (1 − βα)h″. Therefore, h″
… that h ≺ αf + (1 − α)M, as desired.
As for the inverse direction, assume that αf + (1 − β)M is FaF for every β ∈ (0, 1) and that h < αf + (1 − α)c₁. We show that h ≺ αf + (1 − α)c₁. As before, βM + (1 − β)m ∼ c₁ for some 0 ≤ β ≤ 1. There is an act g that satisfies h(s) ∼ ((1 − (1 − α)(1 − β))g + (1 − α)(1 − β)m)(s) for every s ∈ S. This is so since ((1 − (1 − α)(1 − β))m + (1 − α)(1 − β)m)(s) ≤ h(s) ≤ ((1 − (1 − α)(1 − β))[α/(1 − (1 − α)(1 − β)) · f + (1 − α)β/(1 − (1 − α)(1 − β)) · M] + (1 − α)(1 − β)m)(s) ≤ ((1 − (1 − α)(1 − β))M + (1 − α)(1 − β)m)(s). Thus, (1 − (1 − α)(1 − β))g + (1 − α)(1 − β)m < αf + (1 − α)(βM + (1 − β)m) = αf + (1 − α)βM + (1 − α)(1 − β)m. Moreover, g < α/(1 − (1 − α)(1 − β)) · f + (1 − α)β/(1 − (1 − α)(1 − β)) · M. By assumption, g ≺ α/(1 − (1 − α)(1 − β)) · f + (1 − α)β/(1 − (1 − α)(1 − β)) · M. Since m is SFaF and due to (iii), (1 − (1 − α)(1 − β))g + (1 − α)(1 − β)m ≺ (1 − (1 − α)(1 − β))[α/(1 − (1 − α)(1 − β)) · f + (1 − α)β/(1 − (1 − α)(1 − β)) · M] + (1 − α)(1 − β)m = αf + (1 − α)c₁. The left-hand side is equal to h

and thus h ≺ αf + (1 − α)c₁, as desired.

Proof of Lemma 5. Suppose, contrary to the lemma, that there is a sequence αₙ converging to zero such that [αₙf + (1 − αₙ)M] is not SFaF. Thus, for every n there are 0 ≤ βₙ < 1 and an act gₙ such that gₙ < βₙ[αₙf + (1 − αₙ)M] + (1 − βₙ)M and gₙ ∼ βₙ[αₙf + (1 − αₙ)M] + (1 − βₙ)M.
There is no act hₙ such that gₙ = βₙhₙ + (1 − βₙ)M. Indeed, if such an hₙ existed, then βₙhₙ + (1 − βₙ)M < βₙ[αₙf + (1 − αₙ)M] + (1 − βₙ)M and βₙhₙ + (1 − βₙ)M ∼ βₙ[αₙf + (1 − αₙ)M] + (1 − βₙ)M. Since M is SFaF and by (iii), this implies that hₙ ∼ [αₙf + (1 − αₙ)M] and hₙ < [αₙf + (1 − αₙ)M], contrary to the definition of [αₙf + (1 − αₙ)M].
Let γₙ be the smallest constant such that there exists hₙ with gₙ = γₙhₙ + (1 − γₙ)M (such γₙ and hₙ exist since ≿ has a vN-M representation on the set {ℓ; gₙ is a convex combination of ℓ and M}). It must be that γₙ > βₙ and that, for at least one s ∈ S, gₙ(s) = m(s). We obtain that γₙhₙ + (1 − γₙ)M < βₙ[αₙf + (1 − αₙ)M] + (1 − βₙ)M ≤ βₙαₙf + (1 − βₙαₙ)M and γₙhₙ + (1 − γₙ)M ∼ βₙ[αₙf + (1 − αₙ)M] + (1 − βₙ)M ∼ βₙαₙf + (1 − βₙαₙ)M. The latter implies that hₙ ∼ (βₙαₙ/γₙ)f + ((γₙ − βₙαₙ)/γₙ)M.
Since there are finitely many states, there are infinitely many n's and a state s₀ such that hₙ(s₀) = m(s₀). Notice that the act h, which coincides with M on S \ {s₀} and with m on s₀, satisfies hₙ ≤ h for infinitely many n's. Thus, by (ii), h ≿ (βₙαₙ/γₙ)f + ((γₙ − βₙαₙ)/γₙ)M infinitely often. Since βₙαₙ/γₙ tends to 0, (γₙ − βₙαₙ)/γₙ tends to 1; hence h ≿ M, so h ∼ M, which

means that M is not SFaF, contradicting (vi).

Proof of Lemma 6. Since Y has a finite dimension, there is a finite set Y′ such that span(Y′) = span(Y). Since Y′ ⊆ Y, ∫ ψ dP_{Y′} ≤ ∫ ψ dP_Y for every ψ. By definition, ∫ ψ dP_Y = Σ_{Y∈Y} λ_Y E_P(Y), where Σ_{Y∈Y} λ_Y Y ≤ ψ. Since span(Y′) = span(Y), for every Y ∈ Y there are coefficients θ_Y^Z, Z ∈ Y′, such that Y = Σ_{Z∈Y′} θ_Y^Z Z. Since E_P is additive over span(Y′), Σ_{Y∈Y} λ_Y E_P(Y) = Σ_{Y∈Y} λ_Y E_P(Σ_{Z∈Y′} θ_Y^Z Z) = Σ_{Y∈Y} Σ_{Z∈Y′} λ_Y θ_Y^Z E_P(Z). The last summation is a linear combination of elements from Y′. This means that ∫ ψ dP_Y ≤ ∫ ψ dP_{Y′}, and equality is established.

Proof of Lemma 7. Let ∫ ψ dP_Y = Σ_{Y∈Y} λ_Y E_P(Y), where Σ_{Y∈Y} λ_Y Y ≤ ψ, and ∫ φ dP_Y = Σ_{Y∈Y} γ_Y E_P(Y), where Σ_{Y∈Y} γ_Y Y ≤ φ. Then α Σ_{Y∈Y} λ_Y Y + (1 − α) Σ_{Y∈Y} γ_Y Y ≤ αψ + (1 − α)φ. The left-hand side is one of the summations on the right side of eq. (3), which yields the desired claim.

Proof of Lemma 8. By Lemma 6 we can assume that Y is finite. Denote by ∆ the set of non-negative functions ψ over S such that Σ_{s∈S} ψ(s) = 1, and let D = conv(∆ ∪ −∆). Since ∫ • dP_Y is homogeneous (as a function defined on [0, 1]^S), we can assume that Y ⊆ ∆. Define the function T over D as the least concave function on D that satisfies T(ψ) ≥ E_P(ψ) and T(−ψ) ≥ −E_P(ψ) for every ψ ∈ Y. By Lemma 7, ∫ • dP_Y is a concave function over ∆; due to homogeneity it is a concave function over all of [0, 1]^S. Thus, T(ψ) = ∫ ψ dP_Y for every non-negative ψ ∈ D. As Y is finite, T is piecewise linear: D can be split into a finite number of closed sets on each of which T is linear. As a piecewise-linear concave function, T can also be expressed as a minimum of finitely many affine functions. That is, there are finitely many S-dimensional vectors Q_i and scalars b_i such that¹⁸ T(ψ) = min_i ψ · Q_i + b_i for every ψ ∈ D. Since Σ_{s∈S} ψ(s) = 1 for every ψ ∈ ∆, one can find S-dimensional vectors Q′_i such that min_i ψ · Q_i + b_i = min_i ψ · Q′_i. Therefore, T is a minimum of finitely many linear functions over ∆.
Since T is non-negative over ∆, these vectors are non-negative. It remains to show that the vectors Q′_i are all probability vectors. Suppose, on the contrary, that there is ψ ∈ ∆ for which there is no probability vector P such that P · ψ = T(ψ) and P · φ ≤ T(φ) for every other φ ∈ ∆. This means that

¹⁸ Here, ‘·’ denotes the inner product between two vectors.


there is 0 < α < 1 such that T(αψ + (1 − α)1_S) > αT(ψ) + (1 − α)T(1_S). However, T(αψ + (1 − α)1_S) = ∫ (αψ + (1 − α)1_S) dP_Y = ∫ αψ dP_Y + ∫ (1 − α)1_S dP_Y = α ∫ ψ dP_Y + (1 − α) ∫ 1_S dP_Y = αT(ψ) + (1 − α)T(1_S). This is a contradiction. We conclude that the integral of a partially-specified probability is a minimum of finitely many regular (additive) integrals.

Proof of Proposition 1.
Step 1: For every Y and c > 0, I(cY) = cI(Y). Fix Y. By (4), I(c[Y]) = cI([Y]). By the definition of [Y], cY ≥ c[Y]; thus, by (1), I(cY) ≥ I(c[Y]). It remains to show that it cannot be that I(cY) > I(c[Y]). If I(cY) = I([cY]) > I(c[Y]), then, since [Y] is FaF and due to (4), I([cY]) > cI([Y]). Thus, (1/c)I([cY]) > I([Y]), and by (4) again, I((1/c)[cY]) > I([Y]). However, Y ≥ (1/c)[cY]; thus, by (1), I(Y) ≥ I((1/c)[cY]) > I([Y]), which is a contradiction.
Step 2: For every X, Y and 0 < α < 1, I(αX + (1 − α)Y) ≥ αI(X) + (1 − α)I(Y). If I(X) = 0 or I(Y) = 0, the claim is implied by the previous step and (1). Otherwise, I(X) > 0 and I(Y) > 0. Note that by the previous step, I(X/I(X)) = I(Y/I(Y)) = 1. Denote d = αI(X) + (1 − α)I(Y). By (5), I( (αI(X)/d)(X/I(X)) + ((1 − α)I(Y)/d)(Y/I(Y)) ) ≥ (αI(X)/d) I(X/I(X)) + ((1 − α)I(Y)/d) I(Y/I(Y)). By Step 1, (1/d)I(αX + (1 − α)Y) ≥ (1/d)(αI(X) + (1 − α)I(Y)), as desired.
Step 3: If each Xᵢ is derived from a FaF, αᵢ ≥ 0, i = 1, ..., ℓ, and Σᵢ αᵢ = 1, then I(Σᵢ αᵢXᵢ) = Σᵢ αᵢI(Xᵢ). Note that if Xᵢ is derived from a FaF, then 1_S − Xᵢ is derived from the same FaF. By (3),

(15)    I(1_S − Xᵢ) + I(Xᵢ) = I(1_S).

P P P P By Step 2, I( αi Xi ) ≥ αi I(Xi ) and I( αi (1lS − Xi )) ≥ αi I(1lS − Xi ). SumP P ming these two inequality, one obtains from eq. (15), I( αi Xi ) + I( αi (1lS − Xi )) ≥ I(1lS ). However, by Step 2, the left-hand side is less than or equal to P P I( αi (Xi + 1lS − Xi )) = I( αi (1lS )) = I(1lS ). We obtained that all the inequalities are actually equalities, as desired. Step 4: If X >> Y (i.e., X(s) > Y (s) for every s ∈ S), then I(X) > I(Y ). There is C > 0 such that X − Y ≥ c1lS , or X ≥ c1lS + Y . Thus, X ≥ c[1lS ] + [Y ]. (1) implies I(X) ≥ I(c[1lS ] + [Y ]) and Step 3 implies I(c[1lS ] + [Y ]) ≥ I(c[1lS ]) + I([Y ]). (4) implies I(c[1lS ]) + I([Y ]) = cI([1lS ]) + I([Y ]). Thus, I(X) ≥ cI([1lS ]) + I([Y ]) = c + I([Y ]) > I(Y ). (The last equality is by (6) ). Step 5: Separation. Define D = conv{X − I(X)1lS ; X is derived from FaF}. D is a convex set in RS . D does not intersect the open negative orthant, because 43

if Σᵢ αᵢ(Xᵢ − I(Xᵢ)1_S) ≪ 0, then …

… Let A′ = {A₁, ..., A_ℓ} be the set of all minimal positive FaF events. The intersection of any two sets in A′ is empty. It has been shown that v is additive over the sets in A′. Denote by S′ the union of the sets in A′. It remains to show that v(S) = v(S′). Note that v(S \ S′) = 0, because otherwise a minimal positive FaF event would be included in S \ S′ (and would therefore be a member of A′). Since each of the Aᵢ's is FaF, v(A₁ ∪ (S \ S′)) = v(A₁) + v(S \ S′) = v(A₁); then v(A₂ ∪ A₁ ∪ (S \ S′)) = v(A₂) + v(A₁), and so forth, until one gets v(S) = v(S′), as desired.
Finally, let A be the sub-algebra generated by A′ ∪ {S \ S′}, define P(A) = v(A) for every A ∈ A′, extend P in a linear manner to A, and for every A ∉ A set P(A) = max_{B⊆A, B∈A} P(B). It is clear that for every A ⊆ S, P(A) ≤ v(A), with equality for every A ∈ A. To show universal equality one may apply the sequential argument used in the previous paragraph.
