Bayesian Programming Building Jaynes’ robot
Pierre Bessière CNRS - INRIA - Grenoble University Laboratoire LIG - E-Motion
Bayesian-Programming.org
Introducing the Robot
In order to direct attention to constructive things and away from controversial irrelevancies, we shall invent an imaginary being. Its brain is to be designed by us, so that it reasons according to certain definite rules.
These rules will be deduced from simple desiderata which, it appears to us, would be desirable in human brains; i.e., we think that a rational person, should he discover that he was violating one of these desiderata, would wish to revise his thinking. In principle, we are free to adopt any rules we please; that is our way of defining which robot we shall study. Comparing its reasoning with yours, if you find no resemblance you are in turn free to reject our robot and design a different one more to your liking. But if you find a very strong resemblance, and decide that you want and trust this robot to help you in your own problems of inference, then that will be an accomplishment of the theory, not a premise. Our robot is going to reason about propositions. As already indicated above, we shall denote various propositions by italicized capital letters, A, B, C, etc., and for the time being we must require that any proposition used must have, to the robot, an unambiguous meaning and must be of the simple, definite logical type that must be either true or false. That is, until otherwise stated we shall be concerned only with two valued logic, or Aristotelian logic.
We do not require that the truth or falsity of such an Aristotelian proposition be ascertainable by any feasible investigation; indeed, our inability to do this is usually just the reason why we need the robot's help. ...
E.T. Jaynes, Probability Theory: The Logic of Science, page 9
Overview
Incompleteness (1989)
Probability as an alternative to Logic (1991)
Formalism: Bayesian Programming (1994)
Inference: ProBT® (1995)
Cognitive models: BIBA, BACS (2000)
Industrial Applications: ProBAYES (2003)
Incompleteness: Beam-in-the-Bin experiment
[Figures: set-up, Result 1, Result 2, Result 3]
Incompleteness: Logical paradigm
[Diagram: an "Avoid Obstacle" program, AvoidObs(O1) begin ... end, faces the Environment, to which it is linked by P and A. The symbol O1 in the program and the obstacle O1 in the Environment are joined by "? =", and the gap between them is labelled "Incompleteness".]
Bayesian Approach: Principle
[Diagram: for the "Avoid Obstacle" task, the relation R(S, M) = P(M | S D C) is derived from the joint distribution P(M S | D C), which is built from Preliminary Knowledge (C) and Experimental Data (D). The robot is linked to the Environment through M and S.]
Bayesian Approach: An alternative to Logic
Incompleteness
Preliminary Knowledge + Experimental Data = Probabilistic Representation (Learning, Entropy Principles)
Uncertainty
Bayesian inference:
P(a) + P(¬a) = 1
P(a ∧ b) = P(a) × P(b | a) = P(b) × P(a | b)
Decision
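The two rules on this slide are enough to obtain Bayes' rule. A short derivation, given here as a worked illustration and not taken from the slides: dividing the product rule by P(b) gives the posterior, and P(b) follows from the normalization rule applied to a given b.

```latex
\begin{align*}
P(a \mid b) &= \frac{P(a)\, P(b \mid a)}{P(b)}, \\
P(b) &= P(a \wedge b) + P(\neg a \wedge b)
      = P(a)\, P(b \mid a) + P(\neg a)\, P(b \mid \neg a).
\end{align*}
```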
Bayesian Programming
Bayesian Program
    Description
        Specification
            • Variables
            • Decomposition
            • Parametric Forms
        Identification
            • Learning from instances
    Question
Bayesian Programming: Related formalisms
ProBT®: Bayesian Program

    int main () {
      // Description: Variables
      plFloat read_time;
      plIntegerType id_type(0,1);
      plFloat times[5] = {1,2,3,5,10};
      plSparseType time_type(5,times);
      plSymbol id("id",id_type);
      plSymbol time("time",time_type);

      // Description: Parametric forms
      // Construction of P(id)
      plProbValue id_dist[2] = {0.75,0.25};
      plProbTable P_id(id,id_dist);
      // Construction of P(time | id = john)
      plProbValue t_john_dist[5] = {20,30,10,5,2};
      plProbTable P_t_john(time,t_john_dist);
      // Construction of P(time | id = bill)
      plProbValue t_bill_dist[5] = {2,6,10,40,20};
      plProbTable P_t_bill(time,t_bill_dist);
      // Construction of P(time | id)
      plKernelTable Pt_id(time,id);
      plValues t_and_id(time^id);
      t_and_id[id] = 0;
      Pt_id.push(P_t_john,t_and_id);
      t_and_id[id] = 1;
      Pt_id.push(P_t_bill,t_and_id);

      // Description: Decomposition
      // P(time id) = P(id) P(time | id)
      plJointDistribution jd(time^id,P_id*Pt_id);

      // Question
      // Getting the question P(id | time)
      plCndKernel Pid_t;
      jd.ask(Pid_t,id,time);
      // Read a time from the keyboard
      cout ...
    }
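The same question, P(id | time), can be worked out by hand with Bayes' rule. The sketch below is not ProBT® code: it is a standard-library C++ illustration, and the function name `posteriorId` is hypothetical. It assumes that index 0 of id is john and index 1 is bill (as in the listing), and that the weight tables for P(time | id), which do not sum to 1 in the listing, are normalized separately for each value of id.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Posterior P(id | time = times[timeIndex]) by Bayes' rule:
//   P(id | time) = P(id) P(time | id) / sum over id' of P(id') P(time | id')
// timeIndex selects one of the five time values {1, 2, 3, 5, 10}.
std::vector<double> posteriorId(std::size_t timeIndex) {
    // Tables copied from the listing; index 0 = john, index 1 = bill.
    const std::vector<double> prior = {0.75, 0.25};
    const std::vector<std::vector<double>> weights = {
        {20.0, 30.0, 10.0, 5.0, 2.0},   // time | id = john
        {2.0, 6.0, 10.0, 40.0, 20.0}};  // time | id = bill
    assert(timeIndex < weights[0].size());

    std::vector<double> post(prior.size());
    double evidence = 0.0;
    for (std::size_t i = 0; i < prior.size(); ++i) {
        // ASSUMPTION: each weight row is normalized so that it sums to 1.
        double rowSum = 0.0;
        for (double w : weights[i]) rowSum += w;
        post[i] = prior[i] * weights[i][timeIndex] / rowSum;
        evidence += post[i];
    }
    // Divide by the evidence so that the posterior sums to 1.
    for (double& p : post) p /= evidence;
    return post;
}
```

For example, `posteriorId(0)` returns P(id | time = 1) and `posteriorId(4)` returns P(id | time = 10); under the assumptions above, a short time favours john and a long time favours bill.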
Variables:
    $Dir,\ Prox,\ \Theta,\ Vrot,\ H$
Decomposition:
    $P(Dir \wedge Prox \wedge \Theta \wedge Vrot \wedge H \mid \pi_{homing}) =$
    $\quad P(Dir \wedge Prox \wedge \Theta \mid \pi_{homing})$
    $\quad \times P(H \mid Prox \wedge \pi_{homing})$
    $\quad \times P(Vrot \mid Dir \wedge Prox \wedge \Theta \wedge H \wedge \pi_{homing})$
Parametric Forms:
    $P(Dir \wedge Prox \wedge \Theta \mid \pi_{homing}) \equiv \text{Uniform}$
    $P([H = avoidance] \mid Prox \wedge \pi_{homing}) = \text{Sigmoid}_{\alpha,\beta}(Prox)$
    $P(Vrot \mid Dir \wedge Prox \wedge \Theta \wedge [H = avoidance] \wedge \pi_{homing}) \equiv P(Vrot \mid Dir \wedge Prox \wedge \pi_{avoidance})$
    $P(Vrot \mid Dir \wedge Prox \wedge \Theta \wedge [H = phototaxy] \wedge \pi_{homing}) \equiv P(Vrot \mid \Theta \wedge Prox \wedge \pi_{phototaxy})$
Identification:
    No learning
Question:
    $P(Vrot \mid dir \wedge prox \wedge \theta \wedge \pi_{homing})$
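Because the prior on Dir, Prox and Θ is uniform, answering the homing question amounts to summing the decomposition over H, which gives a mixture of the two behaviours weighted by P(H | prox). The following is a minimal C++ sketch of that sum, under several assumptions that are not in the slides: H takes only the two values avoidance and phototaxy, the sigmoid has the form 1 / (1 + exp(-α (prox - β))), and the two behaviour distributions over Vrot have already been evaluated for the current sensor readings. The names `sigmoid` and `homingVrot` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// ASSUMPTION: Sigmoid_{alpha,beta}(prox) = 1 / (1 + exp(-alpha * (prox - beta))).
double sigmoid(double prox, double alpha, double beta) {
    return 1.0 / (1.0 + std::exp(-alpha * (prox - beta)));
}

// P(Vrot | dir, prox, theta, pi_homing), obtained by summing over H.
// pAvoid[k] stands for P(Vrot = v_k | dir, prox, pi_avoidance),
// pPhoto[k] stands for P(Vrot = v_k | theta, prox, pi_phototaxy);
// both are assumed to be evaluated already for the current readings.
std::vector<double> homingVrot(double prox, double alpha, double beta,
                               const std::vector<double>& pAvoid,
                               const std::vector<double>& pPhoto) {
    assert(pAvoid.size() == pPhoto.size());
    // Weight of the avoidance behaviour: P(H = avoidance | prox).
    const double wAvoid = sigmoid(prox, alpha, beta);
    std::vector<double> out(pAvoid.size());
    for (std::size_t k = 0; k < out.size(); ++k) {
        // Mixture of the two behaviours, weighted by P(H | prox).
        out[k] = wAvoid * pAvoid[k] + (1.0 - wAvoid) * pPhoto[k];
    }
    return out;
}
```

With a positive α, readings of prox well above β give almost all the weight to the avoidance distribution, and readings well below β give it to phototaxy.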
Variables:
    $B^{t},\ B^{t-1},\ H^{t},\ W^{t},\ OW^{t},\ N^{t},\ NO^{t},\ WP^{t}$ and $HP^{t}$
Decomposition:
    $P(B^{t} \wedge B^{t-1} \wedge H^{t} \wedge W^{t} \wedge OW^{t} \wedge N^{t} \wedge NO^{t} \wedge WP^{t} \wedge HP^{t}) =$
    $\quad P(B^{t-1}) \times P(B^{t} \mid B^{t-1})$
    $\quad \times P(H^{t} \mid B^{t}) \times P(W^{t} \mid B^{t}) \times P(OW^{t} \mid B^{t}) \times P(N^{t} \mid B^{t})$
    $\quad \times P(NO^{t} \mid B^{t}) \times P(WP^{t} \mid B^{t}) \times P(HP^{t} \mid B^{t})$
Parametric Forms:
    $P(B^{t-1})$: uniform;
    All other distributions are tables
Identification:
    None
Question:
    $P(B^{t} \mid b^{t-1} \wedge h^{t} \wedge w^{t} \wedge ow^{t} \wedge n^{t} \wedge no^{t} \wedge wp^{t} \wedge hp^{t})$
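With this decomposition, the question reduces to multiplying the transition table row for the known previous value by the table entry of each observed variable, then normalizing over the current value. A minimal C++ sketch of one such step follows; the function name `filterStep`, the `Table` alias and the table layout are assumptions made for illustration, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Table = std::vector<std::vector<double>>;

// One inference step for P(B^t | b^{t-1}, observations), which is
// proportional to P(B^t | b^{t-1}) times the product over k of P(obs_k | B^t).
// The uniform prior P(B^{t-1}) cancels out in the normalization.
//   transition[bPrev][b] = P(B^t = b | B^{t-1} = bPrev)
//   sensors[k][b][o]     = P(k-th observed variable = o | B^t = b)
//   obs[k]               = index of the observed value of the k-th variable
std::vector<double> filterStep(const Table& transition, std::size_t bPrev,
                               const std::vector<Table>& sensors,
                               const std::vector<std::size_t>& obs) {
    assert(bPrev < transition.size());
    assert(sensors.size() == obs.size());
    std::vector<double> post = transition[bPrev];
    double total = 0.0;
    for (std::size_t b = 0; b < post.size(); ++b) {
        for (std::size_t k = 0; k < sensors.size(); ++k) {
            post[b] *= sensors[k][b][obs[k]];
        }
        total += post[b];
    }
    assert(total > 0.0);
    // Normalize so that the result is a distribution over B^t.
    for (double& p : post) p /= total;
    return post;
}
```

In the program above there would be seven sensor tables, one for each of the observed variables conditioned on the current value of B.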
Variables:
    $S_i^{0:t},\ Z_i^{0:t},\ C^{0:t},\ \alpha_i^{0:t},\ B^{0:t},\ \beta_i^{0:t},\ M^{0:t},\ \lambda_i^{0:t}$
Decomposition:
    $P(S_i^{0:t} \wedge Z_i^{0:t} \wedge C^{0:t} \wedge \alpha_i^{0:t} \wedge B^{0:t} \wedge \beta_i^{0:t} \wedge M^{0:t} \wedge \lambda_i^{0:t} \mid \pi_i) =$
    $\quad \prod_{j=1}^{t} \big[\, P(S_i^{j} \mid S_i^{j-1}\, M^{j-1}\, \pi_i)$
    $\quad\quad \times P(Z_i^{j} \mid S_i^{j}\, C^{j}\, \pi_i)$
    $\quad\quad \times P(C^{j} \mid \pi_i) \times P(\alpha_i^{j} \mid C^{j}\, B^{j}\, S_i^{j}\, \pi_i)$
    $\quad\quad \times P(B^{j} \mid \pi_i) \times P(\beta_i^{j} \mid B^{j}\, S_i^{j}\, B^{j-1}\, \pi_i)$
    $\quad\quad \times P(M^{j} \mid \pi_i) \times P(\lambda_i^{j} \mid M^{j}\, S_i^{j}\, B^{j}\, M^{j-1}\, \pi_i) \,\big]$
    $\quad \times P(S_i^{0}\, Z_i^{0}\, \beta_i^{0}\, B^{0}\, \lambda_i^{0}\, M^{0} \mid \pi_i)$
Parametric Forms:
    $P(S_i^{j} \mid S_i^{j-1}\, M^{j-1}\, \pi_i)$ = Dynamic model
    $P(Z_i^{j} \mid S_i^{j}\, C^{j}\, \pi_i)$ = Sensor model
    $P(C^{j} \mid \pi_i)$ = A priori about attention variables
    $P(\alpha_i^{j} \mid C^{j}\, B^{j}\, S_i^{j}\, \pi_i)$ = Attention model in fusion with coherence form
    $P(B^{j} \mid \pi_i)$ = A priori about behaviour variables
    $P(\beta_i^{j} \mid B^{j}\, S_i^{j}\, B^{j-1}\, \pi_i)$ = Behaviour model in fusion with coherence form
    $P(M^{j} \mid \pi_i)$ = A priori about motor variables
    $P(\lambda_i^{j} \mid M^{j}\, S_i^{j}\, B^{j}\, M^{j-1}\, \pi_i)$ = Motor model in fusion with coherence form
    $P(S_i^{0}\, Z_i^{0}\, \beta_i^{0}\, B^{0}\, \lambda_i^{0}\, M^{0} \mid \pi_i)$ = Initial conditions
Identification:
    A priori or learning method
Question:
    $P(S_i^{j} \mid z_i^{0:j-1}\, m^{0:j-1}\, c^{0:j-1}\, \alpha_i^{0:j-1}\, \beta_i^{0:j-1}\, \lambda_i^{0:j-1}\, \pi_i)$: Prediction of states
    $P(C^{j} \mid z_i^{0:j-1}\, m^{0:j-1}\, c^{0:j-1}\, \alpha_i^{0:j}\, \beta_i^{0:j-1}\, \lambda_i^{0:j-1}\, \pi_i)$: Determination of attention
    $P(B^{j} \mid z_i^{0:j}\, m^{0:j-1}\, c^{0:j}\, \alpha_i^{0:j}\, \beta_i^{0:j}\, \lambda_i^{0:j-1}\, \pi_i)$: Determination of behaviour
    $P(S_i^{j} \mid z_i^{0:j}\, m^{0:j-1}\, c^{0:j}\, \alpha_i^{0:j}\, \beta_i^{0:j}\, \lambda_i^{0:j-1}\, \pi_i)$: Estimation of states
    $P(M^{j} \mid z_i^{0:j}\, m^{0:j-1}\, b^{0:j}\, c^{0:j}\, \alpha_i^{0:j}\, \beta_i^{0:j}\, \lambda_i^{0:j}\, \pi_i)$: Motor commands
Variables:
    $C$: an index that identifies each 2D cell of the grid
    $A$: an index that identifies each possible antecedent of the cell $c$ over all the cells in the 2D grid
    $Z$: sensor measurement relative to the cell $c$
    $V$: the set of velocities for the cell $c$, where $V$ is discretized in $n$ cases; $V \in \mathcal{V} = \{v_1, \ldots, v_n\}$
    $O,\ O^{-1}$: take values from the set $\mathcal{O} \equiv \{occ, emp\}$, indicating whether the cell $c$ is 'occupied' or 'empty'. $O^{-1}$ represents the random variable of the state of an antecedent cell of $c$ through the possible motion through $c$.
Decomposition:
    $P(C\, A\, Z\, O\, O^{-1}\, V) =$
    $\quad P(A)\, P(V \mid A)\, P(C \mid V, A)\, P(O^{-1} \mid A)\, P(O \mid O^{-1})\, P(Z \mid O, V, C)$
Parametric Forms:
    $P(A)$: uniform;
    $P(V \mid A)$: conditional velocity distribution of the antecedent cell;
    $P(C \mid V\, A)$: Dirac representing reachability (see 2.3);
    $P(O^{-1} \mid A)$: conditional occupancy distribution of the antecedent cell;
    $P(O \mid O^{-1})$: occupancy transition matrix (see 2.3);
    $P(Z \mid O\, V\, C)$: observation model;
Identification:
    None
Question:
    $P(O \mid Z\, C)$
    $P(V \mid Z\, C)$
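Both questions follow from the decomposition by summing the joint distribution over the variables that are neither searched nor known. This is written out below as a worked illustration, not taken from the slides; the proportionality constant is fixed by normalizing over $O$ and over $V$ respectively.

```latex
\begin{align*}
P(O \mid Z\, C) &\propto \sum_{A,\, O^{-1},\, V}
  P(A)\, P(V \mid A)\, P(C \mid V, A)\, P(O^{-1} \mid A)\, P(O \mid O^{-1})\, P(Z \mid O, V, C), \\
P(V \mid Z\, C) &\propto \sum_{A,\, O^{-1},\, O}
  P(A)\, P(V \mid A)\, P(C \mid V, A)\, P(O^{-1} \mid A)\, P(O \mid O^{-1})\, P(Z \mid O, V, C).
\end{align*}
```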
Relevant Variables:
    $Lh,\ Tb,\ Td,\ Xh,\ Yh,\ Al,\ F_1$ and $F_2$
Decomposition:
    $P(Lh \wedge Tb \wedge Td \wedge Xh \wedge Yh \wedge Al \wedge F_1 \wedge F_2) =$
    $\quad P(Xh) \times P(Yh) \times P(Al)$
    $\quad \times P(Lh \mid Al) \times P(Tb \mid Xh \wedge Yh) \times P(Td \mid Xh \wedge Yh \wedge Tb)$
    $\quad \times P(F_1 \mid Xh \wedge Yh \wedge Al) \times P(F_2 \mid Xh \wedge Yh \wedge Al)$
Parametric Forms:
    $P(Xh) \equiv \text{Uniform}$
    $P(Yh) \equiv \text{Uniform}$
    $P(Al) \equiv \text{Uniform}$
    $P(Lh \mid Al) \equiv G(\mu(Al), \sigma(Al))$
    $P(Tb \mid Xh \wedge Yh) \equiv G(\mu(Xh, Yh), \sigma(Xh, Yh))$
    $P(Td \mid Xh \wedge Yh \wedge Tb) \equiv G(\mu(Xh, Yh, Tb), \sigma(Xh, Yh, Tb))$
    $P(F_1 \mid Xh \wedge Yh \wedge Al) \equiv G(\mu(Xh, Yh, Al), \sigma(Xh, Yh, Al))$
    $P(F_2 \mid Xh \wedge Yh \wedge Al) \equiv G(\mu(Xh, Yh, Al), \sigma(Xh, Yh, Al))$
Identification:
    See text (Sections 4.3, 4.5 and 4.6)
Question:
    $P(Lh \wedge Tb \wedge Td \mid f_1 \wedge f_2)$ or $P(Lh \wedge Tb \wedge Td \mid f_1 \wedge f_2 \wedge al)$