In Teulings, H.L., Van Gemmert, A.W.A. (Eds.), Proceedings of the 11th Conference of the International Graphonomics Society (IGS2003), pp. 178-181. 2-5 November 2003, Scottsdale, Arizona, USA.

A Bayesian Network Model for Discovering Handwriting Strategies of Primary School Children

I. ZAAROUR (1,2), Ph. LERAY (1), L. HEUTTE (1), B. ETER (3), J. LABICHE (1), D. MELLIER (4)

1 Laboratoire PSI FRE CNRS 2645, Université et INSA de Rouen, France. [email protected], {Laurent.Heutte, Jacques Labiche}@univ-rouen.fr
2 Faculté de génie, branche I, Université Libanaise, Beyrouth, Liban. [email protected]
3 Laboratoire LPM, Université Libanaise, Beyrouth, Liban. b [email protected]
4 Laboratoire PSY.CO (EA 1780), Université de Rouen, France. [email protected]

Abstract. We present the results of a longitudinal study that follows the evolution of handwriting among typical pupils in primary education. We propose a method for discovering groups of pupils who share the same handwriting strategies during their primary education. From the on-line acquisition of writing and drawing tests, writing strategies are modelled by means of a Bayesian network. Expert knowledge partially determines the network structure, in which the writing strategy is represented by a latent variable. Assuming that each writing test has its own (local) strategy and that a global strategy governs the local strategies, we propose a Global Hierarchical Model. This model is an unsupervised classifier that links handwriting and drawing features to local and global strategies. We give preliminary results for a first phase of strategy discovery and labelling, and for a second phase in which we observe how pupils evolve among the discovered strategies.

1. Introduction

Mastering writing requires specific learning that is superimposed on the development of the child's motor control (Rémi et al., 2002). A fundamental hypothesis proposed in the literature states that "some writing features can be used like indicators of the development level". In the present paper, we illustrate the notion of handwriting strategy with respect to the measured variables, and we group pupils with similar features into classes, considering that the pupils of a class share the same writing strategy. Because we deal with latent variables (i.e. the non-measurable writing strategies), uncertain data, missing values and a priori knowledge validated by experts, we chose Bayesian networks to model our problem. Bayesian networks are knowledge representation and inference tools that graphically specify the probabilistic dependences between variables (Pearl, 1988). These probabilistic models help us learn the relationship between measured features (features of the graphical control activities, computed from drawings acquired with a graphics tablet) and unmeasured writing strategies. We then label the most frequent global strategies discovered by our model. Finally, we study how pupils evolve among the discovered strategies over three periods.

2. Experimental protocol

Building on the work of Rémi et al. (1997, 2002) and Amara et al. (1997), we developed software for the on-line acquisition of handwritten drawing and writing psychological tests and for the automatic extraction of drawing and writing features. This software computes seven features: average speed (vit), speed standard deviation (ec), stroke number (trai), pause duration (paus), average acceleration (ac), movement fluency (fl) and drawing duration (ex). Previous experiments show that fluency is a good indicator of motor control, that stroke number is related to spatialization and that pause duration reveals planning ability (Zesiger, 1995). Data acquisition was performed three times (T1, T2 and T3, at intervals of six months) with the same 102 pupils, under the same experimental conditions. We thus obtain a database of 306 measurements (102 * 3) of 42 features (i.e. 7 variables for each of 6 psychological tests). The psychological tests used in these experiments were proposed by Westzaan and Kosterman (1993) and Rémi et al. (2002) in order to test specific abilities: two drawing tests derived from Bender's work (cf. Zazzo, 1992), and four writing tests (writing of the pupil's first and last names, writing of a specific, isolated word, and writing of a specific sentence with and without the sentence model). The selected pupils, drawn from 5 different school levels, are considered "typical" students: they have never repeated a year and have no established difficulties with oral or written language.
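As an illustration of the database layout described above, the following minimal sketch enumerates the 42 feature columns and the 306 measurement rows. The feature abbreviations come from the text; the test names, the column naming scheme and the row ordering are assumptions made only for this example.

```python
from itertools import product

# Feature abbreviations as listed in the text.
FEATURES = ["vit", "ec", "trai", "paus", "ac", "fl", "ex"]
# Hypothetical short names for the six tests (two drawing, four writing).
TESTS = ["draw1", "draw2", "name", "word", "sent_model", "sent_free"]
SESSIONS = ["T1", "T2", "T3"]
N_PUPILS = 102

# One column per (test, feature) pair, one row per (pupil, session) pair.
columns = [f"{test}_{feat}" for test, feat in product(TESTS, FEATURES)]
rows = [(pupil, session) for session in SESSIONS for pupil in range(N_PUPILS)]

print(len(rows), len(columns))  # 306 42
```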

3. Methodology

We represent a writing strategy by a latent discrete variable whose values correspond to the unknown writing strategies. Determining the number of states of a latent variable is a difficult problem (Elidan & Friedman, 2001). It can be estimated by cross-validation or by exploiting expert knowledge. As a first approximation, we used previous results obtained with another clustering method, the k-means algorithm (cf. Duda et al., 2001). The network structure (i.e. the conditional dependences between variables) is partially defined by the experts. The network parameters (i.e. the conditional probabilities) are learned from incomplete data with the Expectation-Maximisation algorithm (Dempster et al., 1977). After this learning phase, we try to label the discovered writing strategies by displaying and interpreting the probability distribution of each measured variable conditional on the writing strategies. We then cluster pupils according to these strategies and study the temporal evolution of these clusters according to school level and to the three acquisition periods.
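To make the parameter-learning step more concrete, here is a minimal sketch of Expectation-Maximisation for a plain latent class model (a single hidden class with conditionally independent discrete features), where missing entries are simply skipped. It is a simplification, not the model of this paper: the function name, the discretised inputs, the random initialisation and the smoothing constant are all assumptions of the example.

```python
import random

def em_latent_class(data, n_states, n_values, n_iter=50, seed=0):
    """EM for a naive latent class model with discrete features.

    data: list of rows; each entry is an int in range(n_values) or None.
    Returns (prior, cpt) with cpt[k][j][v] = P(feature j = v | class k).
    """
    rng = random.Random(seed)
    n_feat = len(data[0])
    prior = [1.0 / n_states] * n_states
    # Random initial conditional tables, normalised per (class, feature).
    cpt = []
    for _ in range(n_states):
        tables = []
        for _ in range(n_feat):
            w = [rng.random() + 0.5 for _ in range(n_values)]
            s = sum(w)
            tables.append([x / s for x in w])
        cpt.append(tables)

    for _ in range(n_iter):
        # E-step: posterior over the hidden class for each row.
        # Missing entries are marginalised out (they contribute a factor 1).
        resp = []
        for row in data:
            joint = []
            for k in range(n_states):
                p = prior[k]
                for j, v in enumerate(row):
                    if v is not None:
                        p *= cpt[k][j][v]
                joint.append(p)
            z = sum(joint)
            resp.append([p / z for p in joint])
        # M-step: re-estimate prior and tables from expected counts,
        # using only the observed entries of each feature.
        prior = [sum(r[k] for r in resp) / len(data) for k in range(n_states)]
        for k in range(n_states):
            for j in range(n_feat):
                counts = [1e-3] * n_values  # small smoothing term (assumption)
                for r, row in zip(resp, data):
                    if row[j] is not None:
                        counts[row[j]] += r[k]
                s = sum(counts)
                cpt[k][j] = [c / s for c in counts]
    return prior, cpt

# Toy, made-up data with two missing entries (None).
toy = [[0, 0, None], [0, 0, 0], [1, 1, 1], [1, None, 1], [0, 0, 0], [1, 1, 1]]
prior, cpt = em_latent_class(toy, n_states=2, n_values=2)
```

The returned tables play the role of the conditional distributions that are displayed and interpreted when labelling strategies; the augmented and hierarchical structure used in the paper would add parents to each table.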

3.1. Bayesian network modelling for discovering strategies

Let us assume that each psychological test is represented by its own (local) strategy and that the relationship between the variables measured for that test and the local strategy is given by the "augmented naive" Bayesian network described in Figure 1 (each measured feature has the latent strategy variable as a parent but can also have other measured parents). Friedman et al. (1997) have shown that these models are a good compromise between the simplicity of the naive Bayes classifier and the ability of Bayesian networks to represent conditional dependence. Inspired by the hierarchical latent class models proposed by Zhang (2002) for clustering, we also postulate that the local strategies of all tests depend on a global strategy, which gives the hierarchical model described in Figure 2 (Global Hierarchical Model). From a cognitive point of view, this model suggests that (1) each of the psychological tests used in our experiments involves some particular motor abilities and (2) all these motor abilities result from the (global) motor control.

After the learning phase (determination of the local conditional probability tables with the EM algorithm), the Bayesian network can be used as an inference tool: we can compute and display the conditional probability distribution of any variable given the observation of any other variables. The a priori probability distribution of the global strategy shows that the most frequent strategies are Glob = 1 (56%) and Glob = 4 (28%). The interpretation of the most probable local strategies and writing features for these two global strategies leads us to label the Glob = 4 strategy as "normal-writers" (NW), and the Glob = 1 strategy as "more advanced normal-writers" (NW+).

Figure 1. Local model: relations between measured features (represented by white nodes) are given by expert knowledge, but they all depend on a (local) strategy (represented by a gray node).

Figure 2. Global hierarchical model: the local strategies of each of the 6 psychological tests depend on a global strategy (represented by a gray node).
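The structure described in Figures 1 and 2 corresponds to a joint distribution that can be written as follows. The symbols are notation introduced only for this sketch: G is the global strategy, S_i the local strategy of test i, X_{i,j} the j-th measured feature of test i, and pa(X_{i,j}) its measured parents chosen by the experts (possibly none).

```latex
\[
P(G, S_1,\dots,S_6, X)
  = P(G)\,\prod_{i=1}^{6} \Big[ P(S_i \mid G)
    \prod_{j=1}^{7} P\big(X_{i,j} \mid S_i, \mathrm{pa}(X_{i,j})\big) \Big]
\]
\[
P(G = g \mid x) \;\propto\; P(G = g)\,\prod_{i=1}^{6} \sum_{s_i}
    P(s_i \mid g) \prod_{j=1}^{7} P\big(x_{i,j} \mid s_i, \mathrm{pa}(x_{i,j})\big)
\]
```

The second line holds when all features are observed: each local strategy can then be summed out independently, which is what makes the global strategy usable as an unsupervised class label.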

3.2. Longitudinal Study

Table 1 describes how the strategies of our 102 pupils evolve over time. For example, 40 pupils are clustered in the NW strategy at time T1, and 18 of these 40 remain in this strategy at time T2. The table also reveals an interesting property: the probability of transition from NW to NW+ and the probability of staying in the NW or NW+ strategy are nearly constant over time. For example, P(NW at T2 | NW at T1) = 18/40 = 0.45 and P(NW at T3 | NW at T2) = 13/28 = 0.46. From this table, about 46% of the pupils stay in the NW strategy between two consecutive times, while about 40% move to the NW+ strategy (16/40 = 0.40 and 12/28 = 0.43). Once pupils have reached the NW+ strategy, their probability of staying in it is about 87% (38/44 = 0.86 and 52/59 = 0.88). Pupils thus progressively evolve from the NW to the NW+ strategy. Figure 3 represents the distribution of pupils in strategy NW (resp. NW+) over time and for the 5 different school levels. As we can see, the distribution by school level in these two strategies is constant over time. We can also notice that the probability of being in the NW strategy increases from the first school level to the middle one and then decreases. Logically, the probability of being in the NW+ strategy is quite low during the first school levels and then increases from the middle school level to the last one.

Global strategy at T1 (rows) versus global strategy at T2 (columns)

Strategy at T1    NW    Glob = 2    Glob = 3    NW+    Tot(T1)
NW                18    4           2           16     40
Glob = 2           2    0           0            1      3
Glob = 3           5    3           3            4     15
NW+                3    3           0           38     44
Tot(T2)           28    10          5           59    102

Global strategy at T2 (rows) versus global strategy at T3 (columns)

Strategy at T2    NW    Glob = 2    Glob = 3    NW+    Tot(T2)
NW                13    1           2           12     28
Glob = 2           4    3           0            3     10
Glob = 3           0    0           0            5      5
NW+                3    1           3           52     59
Tot(T3)           20    5           5           72    102

Table 1. Strategy evolution of our 102 pupils between T1 and T2 (top) and between T2 and T3 (bottom). Among the 40 pupils in the NW strategy at T1, 18 stay in the same strategy while 16 evolve to NW+. Over the same period, 38 of the 44 pupils in the NW+ strategy remain in that strategy. The evolution between T2 and T3 is proportionally the same.
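The transition probabilities quoted in this section can be recomputed directly from the counts of Table 1. The short sketch below does so for the NW and NW+ strategies only; the dictionary layout and the helper name `rate` are choices made for this example.

```python
# Counts read from Table 1: (pupils in the starting strategy, pupils per outcome).
counts = {
    "T1->T2": {"NW": (40, {"NW": 18, "NW+": 16}), "NW+": (44, {"NW+": 38})},
    "T2->T3": {"NW": (28, {"NW": 13, "NW+": 12}), "NW+": (59, {"NW+": 52})},
}

def rate(period, start, end):
    """Estimated P(end at next time | start at current time), rounded to 2 digits."""
    total, outcomes = counts[period][start]
    return round(outcomes[end] / total, 2)

for period in counts:
    print(period,
          "stay NW:", rate(period, "NW", "NW"),
          "NW to NW+:", rate(period, "NW", "NW+"),
          "stay NW+:", rate(period, "NW+", "NW+"))
```

The two periods give close values for each probability, which is the near-constancy of the transitions noted in the text.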

[Figure 3 panels, first row: NW (T1), NW (T2), NW (T3); second row: NW+ (T1), NW+ (T2), NW+ (T3)]

Figure 3. Probability distribution of the 5 different school levels (x axis) for pupils in the NW strategy (first row) and the NW+ strategy (second row) at times T1, T2 and T3 (in columns).

4. Conclusion and perspectives

We have described an application of Bayesian networks to model and discover the handwriting strategies of primary school children from features automatically extracted from psychological tests. The first results of our hierarchical model lead us to discover two global strategies that correspond to normal-writer pupils (NW) and more advanced normal-writers (NW+). A longitudinal study of the evolution of the pupils in these strategies also shows that the two strategies are consistent: the distribution of typical pupils by school level is constant over time, and the probability of transition between (or within) these strategies is also constant over time. The two global strategies Glob = 2 and Glob = 3 appear to be intermediate strategies and are not labelled yet. A specific study has to be performed, examining more thoroughly the local strategies and the writing features involved in these intermediate strategies.

The model we propose takes advantage of the Bayesian network framework: learning conditional probabilities from uncertain and incomplete data, taking expert knowledge into account to partially determine the network structure, and using an inference algorithm to display the probability distribution of any variable. As a direct application, given some test results, our model can estimate which strategy a pupil follows, or whether this pupil follows none of our strategies. With the inverse reasoning, given a specific strategy or a known problem, we could also study the ability of each psychological test to detect it.

Recall that the transition matrix given in Table 1 is almost constant over time. An interesting extension would be to consider our global hierarchical model (Figure 2) as the static part of the dynamic Bayesian network described in Figure 4 (whose principle is quite similar to that of Hidden Markov Models). Our preliminary results suggest that this type of dynamic Bayesian network could model the evolution of pupils and be used to infer some test results or the writing strategy at time t + 1 given some previous test results.
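To illustrate what a time-constant transition matrix would allow, the sketch below treats the global strategy alone as a homogeneous Markov chain. This is a much simpler object than the dynamic Bayesian network of Figure 4, and it is not the authors' model. The T1 to T2 counts are those read from Table 1, and reusing them unchanged for the next period is the assumption being illustrated.

```python
STATES = ["NW", "Glob=2", "Glob=3", "NW+"]

# T1 -> T2 counts as read from Table 1 (rows: strategy at T1).
counts = [
    [18, 4, 2, 16],
    [2, 0, 0, 1],
    [5, 3, 3, 4],
    [3, 3, 0, 38],
]

# Row-normalise the counts to estimate P(strategy at t+1 | strategy at t).
P = [[c / sum(row) for c in row] for row in counts]

def step(dist, matrix):
    """Propagate a vector of expected pupil counts through one transition."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

t1 = [sum(row) for row in counts]   # pupils per strategy at T1
t2 = step(t1, P)                    # reproduces the T2 column totals
t3_forecast = step(t2, P)           # forecast for T3 under constant transitions

for name, value in zip(STATES, t3_forecast):
    print(f"{name}: {value:.1f}")
```

Comparing such a forecast with the T3 totals of Table 1 is one simple way to check how far the constant-transition assumption holds for each strategy.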

Figure 4. Global hierarchical model improvement to model the temporal evolution between global strategies.

5. References

Amara, M., Courtellemont, P., de Brucq, D., & Devinoy, R. (1997). An analysis software tool for handwriting: Writing and drawing applications. In Proceedings of the 8th biennial conference of the International Graphonomics Society, IGS'97 (p. 107-108). Genoa, Italy.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B 39, 1-38.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. John Wiley and Sons Inc.
Elidan, G., & Friedman, N. (2001). Learning the dimensionality of hidden variables. In Uncertainty in artificial intelligence: Proceedings of the seventeenth conference (UAI-2001) (p. 144-151). San Francisco, CA: Morgan Kaufmann Publishers.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131-163.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
Rémi, C., Amara, M., Courtellemont, P., & de Brucq, D. (1997). An experimental program for the automatic study of handwriting learning at elementary school. In Proceedings of the 8th biennial conference of the International Graphonomics Society, IGS'97 (p. 75-76). Genoa, Italy.
Rémi, C., Frélicot, C., & Courtellemont, P. (2002). Automatic analysis of the structuring of children's drawing and writing. Pattern Recognition, 35(5), 1059-1069.
Westzaan, P., & Kosterman, J. (1993). Diagnosis and evaluation of fine motor behaviour in children. In Proceedings of ICOD'93.
Zazzo, R. (1992). Test moteur de structuration visuelle de Bender. In Manuel pour l'examen psychologique de l'enfant, vol. 1. Delachaux et Niestlé; Actualités pédagogiques et psychologiques.
Zesiger, P. (1995). Approches cognitive, neuropsychologique et développementale. Presses Universitaires de France.
Zhang, N. L. (2002). Hierarchical latent class models for cluster analysis. In Proceedings of AAAI'02 (p. 230-237).
