
Pattern Recognition 37 (2004) 665 – 674 www.elsevier.com/locate/patcog

A multiple agent architecture for handwritten text recognition

L. Heutte∗, A. Nosary, T. Paquet

Laboratoire PSI - FRE CNRS 2645, UFR des Sciences, Universite de Rouen, Place Emile Blondel, Mont-Saint-Aignan Cedex F-76821, France

Received 16 September 2003; accepted 23 September 2003

Abstract

This paper investigates the automatic reading of unconstrained omni-writer handwritten texts. It shows how to endow the reading system with the learning faculties needed to adapt recognition to each writer's handwriting. In the first part of this paper, we explain how the recognition system can be adapted to the current handwriting by exploiting the graphical context defined by the writer's invariants. This adaptation is ensured by activating interaction links over the whole text between the recognition procedures of word entities and those of letter entities. In the second part, we justify the need for an open multiple-agent architecture to support the implementation of such an adaptation principle. The proposed platform allows expert treatments dedicated to handwriting analysis to be plugged in. We show that this platform helps to implement specific collaboration or cooperation schemes between agents, which open new directions in the automatic reading of handwritten texts.
© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Handwriting recognition; Reading model; Multiple-agent architecture; Adaptation; Cooperation; Artificial intelligence

∗ Corresponding author. Tel.: +33-235-146877; fax: +33-235146618. E-mail address: [email protected] (L. Heutte).

0031-3203/$30.00 © 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2003.09.016

1. Introduction

Like a human reader, an automatic reading system should meet two different requirements. It should have omni-writer capabilities in order to recognise any handwriting, and mono-writer capabilities in order to take into account the particular whims of each writer. Teaching machines to read any handwritten text therefore requires not only sophisticated, highly adapted pattern recognition algorithms but also the joint management of the various interpretation levels (i.e. from the graphical level up to the lexical and syntactical levels). Human expertise in managing these interpretation levels relies on the ability to learn the current handwriting. For example, a human reader faced with a very distorted handwriting is able to delay decisions until more information has been gathered while reading other parts of the text.

Current recognition systems lack these learning abilities and treat recognition as a purely omni-writer problem: they try to recognise handwritten words or letters independently of one another, in a sequential manner [1,2]. Two main approaches are used to recognise handwritten cursive words. The first, called analytical, is a data-driven bottom-up approach in which letters are recognised before a lexical analysis is performed [3,4]. Because letters cannot be reliably segmented before (i.e. without) recognition, several segmentation hypotheses must be managed; this in turn makes the letter recognition module more complex, since it must be able to reject bad segmentation hypotheses. The final decision, however, can only be taken by the lexical verification module. This recognition scheme is also called segmentation/recognition. The second approach, called holistic, is a top-down approach with verification. It avoids segmentation into letters by recognising a word as a whole and selecting word candidates from a lexicon. It relies either on the detection of holistic features in the word [5,6] or on verifying that certain letters or parts of letters are present at certain positions in the word [7]. In short, the first approach is well suited to recognising words from a large lexicon, or even without a lexicon, whereas the second is better suited to limited-lexicon applications. These two approaches can also be combined to improve recognition [8]. Some recent studies try to cope with handwriting variability by clustering handwritings into families of handwriting styles [9,10]; recognisers are then trained for each specific family, but a choice between styles is needed before the recognition phase in order to select the appropriate recogniser. Put simply, these systems rely on problem-specific recognition schemes but have no on-line learning ability that would let them adapt to the current handwriting. For these reasons, conventional systems remain recognition systems rather than reading systems.

The authors introduced in Ref. [11] the concept of writer's invariants, defined as the set of similar patterns automatically extracted from the segmentation of a handwriting. They showed that this concept yields new contextual graphical knowledge that can be used to adapt the recognition task to a particular handwriting and to make robust decisions when neither simple lexical nor syntactical rules can be used, as in the case of free-lexicon unconstrained handwritten text recognition. In this paper, we explain (Section 2) how the recognition system can adapt itself to the handwriting being recognised by exploiting the graphical context defined by the writer's invariants. We show that this adaptation is ensured by activating interaction links over the whole text between the recognition procedures of word entities and those of letter entities. In Section 3, we justify the need for an open multiple-agent architecture to support the implementation of such an adaptation principle.
The proposed platform allows expert treatments dedicated to handwriting analysis to be plugged in. In Section 4, we show that this platform helps to implement a wide range of collaboration or cooperation schemes between agents, i.e. various reading strategies. In Section 5, we illustrate one particular adaptation strategy with experimental results. Finally, conclusions and future work are drawn in Section 6.

2. Adapting the reading task to the writer

Faced with a very distorted handwriting, a human reader is usually able to delay the reading of some words until more contextual knowledge (either symbolic or morphological) has been gathered to confirm the hypotheses made. This kind of on-line learning mechanism allows the human reader to adapt to each specific handwriting. Applying these principles to an automatic reading machine requires suspending treatments and reactivating them later, once more contextual knowledge disambiguates doubtful interpretations. This implies a non-sequential (interactive) recognition scheme in which treatments interact at different contextual levels (graphical, symbolic, lexical) so as to make decisions that are coherent with all of these constraints. We therefore introduce a knowledge model that takes into account the structure of handwriting and highlights the data and their associated types of knowledge. Interaction between the different levels of knowledge is then introduced by using knowledge sources to model the desired interactive architecture.

2.1. Specific knowledge modelling

Assuming that the whole text of the writer has been segmented into graphemes using well-known techniques from the literature [12], each grapheme (whether or not it corresponds to a letter) is characterised by:
• Intrinsic Morphological Knowledge (IMK): any knowledge that can be extracted from the grapheme pattern alone, for example a set of features detected on the grapheme image [13].
• Contextual Morphological Knowledge (CMK): any knowledge about the grapheme pattern that can be extracted from its environment, such as the invariant cluster the grapheme belongs to [11].

In addition, the following symbolic knowledge about each grapheme can be provided by different treatments:
• Intrinsic Symbolic Knowledge (ISK): any knowledge about the possible letter (label) that can be associated with the grapheme considered alone (e.g. obtained from IMK) using classical recognition schemes that exploit inter-writer invariants [13].
• Contextual Symbolic Knowledge (CSK): any knowledge about the possible letter that can be associated with the grapheme by referring to its context. For example, symbolic knowledge about a grapheme can be derived from the invariant cluster it belongs to, using the hypotheses made about its morphological neighbours in that cluster. Symbolic knowledge can also be derived from the lexical constraints applied to the word the grapheme belongs to.

2.2. Exploiting contextual knowledge

Fig. 1 illustrates how a recognition system can exploit this knowledge at the word and grapheme levels. Assume that the handwritten words have already been localised and segmented into graphemes, and that IMK and CMK have been extracted for each grapheme. The following links can then be activated at the grapheme level:
(a) a character recognition procedure can provide ISK for each grapheme;
(b) the ISK of each grapheme can activate the word-level procedure;
(c) CSK for each grapheme can be derived from the lexical constraints applied at word level;
(d) CSK for each grapheme can also be derived from its morphological neighbours (the invariant cluster Ci it belongs to);
(e) the global CSK of each grapheme can provide symbolic hypotheses for a writer's invariant;
(f) a coherent analysis of each invariant cluster can reinforce the same letter hypothesis for the similar patterns.

Assume, for example, that a lexical analysis cannot disambiguate between the letter hypotheses e and l for grapheme_jl. Thanks to the writer's invariants, it is possible to refer to the letter hypothesis made on grapheme_km, which belongs to the same cluster but occurs in a different lexical context, word_k. Since the lexical context of grapheme_km leaves no ambiguity about its letter hypothesis, the hypothesis on grapheme_jl can be disambiguated by means of the writer's invariants.

The activation links described above provide a general framework that can be used to implement various strategies in the reading system. Depending on the strategy used, a global coherence of the recognition hypotheses can be reached at each of the two interpretation levels (word, grapheme). The same principles of interaction could be applied between the text level and the word level by using syntactical constraints.

Fig. 1. Illustration of the role of the writer's invariants in the interactive recognition system.

3. Implementation within the multiple-agent paradigm

The previous model shows the interest of a new organisation of the treatments. While the data model can be considered relatively fixed during the whole resolution of the problem, the order in which the treatments are launched is an important parameter of the proposed model that directly influences the convergence of the system towards a satisfactory solution. The proposed model states that words sharing some common elementary patterns should not be recognised independently of each other. This approach is closely related to a classical problem of scene analysis [14], which is solved in the literature by constraint relaxation for object labelling. Recall that relaxation can be decomposed into two phases, one dedicated to the labelling of objects and the other to the determination of compatibility coefficients. In the present case, text recognition can be viewed as the labelling of graphemes under lexical and graphical constraints. Within this framework of scene analysis, the proposed approach is rather natural. However, because of the specificity and variability of handwriting, we believe that the convergence of the relaxation process depends on a global resolution strategy, which we may call a reading strategy. Depending on the current objective, various strategies could be applied to drive the relaxation process.

The paradigm of distributed artificial intelligence has already been proposed in the field of handwriting recognition [5,15] for the recognition of isolated handwritten words, using a blackboard architecture. A similar approach was developed long ago in the field of speech recognition [16]. Since the blackboard approach launches knowledge sources of increasing abstraction level as soon as new knowledge becomes available, choosing the appropriate knowledge source becomes the central problem in real applications. Some studies have therefore proposed using a second blackboard to solve the control problem [17]. More recently, new distributed architectures derived from the original approach of Hewitt [18] have led to the multiple-agent paradigm [19]. Briefly, agents are entities that can communicate with other agents in the same environment and that behave autonomously, acting according to their own goals and their knowledge of the environment. This model allows distributed control and is therefore better suited to implementing and testing various control strategies, i.e. various reading strategies when transposed to our context of constraint relaxation for handwritten text recognition. However, since cooperation between agents requires sharing the same common data about the problem, we found that the major drawback of the multiple-agent architecture is that agents must embed the data in their control messages. With this in mind, we built an open platform called EMAC [20] into which expert treatments dedicated to handwriting analysis can be plugged, with the ability to share a common distributed workspace. The platform also provides general tools for the experts to communicate, either with each other using the agent communication language KQML or by broadcasting messages to a group of experts through a broker. The broker can also notify agents as soon as new information becomes available in the workspace.
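To make the knowledge model of Section 2.1 and the e/l example of Section 2.2 concrete in the relaxation view described above, the following Python sketch attaches a cluster id (CMK) and letter scores to each grapheme, then repeatedly pulls each grapheme's letter belief towards the consensus of its invariant cluster (links (d) to (f)). The class and function names, the linear mixing weight `alpha` and the example scores are hypothetical; this is an illustrative relaxation-style update under our own assumptions, not the fusion rule of the paper's CSK agent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Grapheme:
    """A grapheme with the knowledge types of Section 2.1 (illustrative)."""
    name: str                                # e.g. "g_jl": grapheme l of word j
    cluster: int                             # CMK: writer's-invariant cluster id
    isk: dict                                # letter -> score before fusion
    csk: dict = field(default_factory=dict)  # letter -> score after fusion


def normalise(scores):
    """Rescale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def cluster_consensus(graphemes):
    """Link (e): pool the current letter beliefs of all members of each cluster."""
    pooled = defaultdict(lambda: defaultdict(float))
    for g in graphemes:
        for letter, score in g.csk.items():
            pooled[g.cluster][letter] += score
    return {c: normalise(dict(s)) for c, s in pooled.items()}


def relax(graphemes, alpha=0.5, iterations=2):
    """Links (d) and (f): pull each grapheme towards its cluster's consensus.

    alpha weights the cluster evidence against the grapheme's own belief;
    this linear update is an assumption made for illustration only.
    """
    for g in graphemes:
        g.csk = normalise(g.isk)
    for _ in range(iterations):
        consensus = cluster_consensus(graphemes)  # computed before any update
        for g in graphemes:
            shared = consensus[g.cluster]
            letters = set(g.csk) | set(shared)
            g.csk = normalise({
                letter: (1 - alpha) * g.csk.get(letter, 0.0)
                + alpha * shared.get(letter, 0.0)
                for letter in letters
            })
    return {g.name: max(g.csk, key=g.csk.get) for g in graphemes}


graphemes = [
    Grapheme("g_jl", cluster=0, isk={"e": 0.5, "l": 0.5}),    # lexically ambiguous
    Grapheme("g_km", cluster=0, isk={"e": 0.95, "l": 0.05}),  # resolved by word k
    Grapheme("g_x", cluster=1, isk={"a": 0.9, "o": 0.1}),
]
print(relax(graphemes))  # {'g_jl': 'e', 'g_km': 'e', 'g_x': 'a'}
```

Here the scores of g_km are assumed to already reflect its unambiguous lexical context (link (c)), so the pooled cluster evidence leans towards e and resolves g_jl, as in the example of Section 2.2.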

Fig. 2. The EMAC model of organisation (diagram labels: 1 to M groups; 1 to N agents per group; broker; shared workspace).

Fig. 3. Internal model of the EMAC agent (diagram labels: communication, send, receive, input buffer; control, analyse, execute, format; local memory, capacity, peers capacity, state; shared memory).

3.1. The EMAC model of organisation

In the EMAC model, agents are organised into groups of experts that can be notified of particular events through a broker present in each group. This allows each agent to broadcast messages to the whole group. Each agent can belong to several groups if necessary or, on the contrary, remain alone. Neither the number of groups nor the number of agents per group is restricted. An agent becomes a member of a group as soon as it has declared itself to the broker. Two agents (belonging either to the same group or to different groups) can also communicate directly by simple message passing, which is an efficient way for agents to collaborate when one of them knows of the existence of the other. Finally, one of the most notable aspects of the EMAC model of organisation and resources is the presence of common workspaces, which allow agents to reach a particular piece of information about the problem without resorting to communication links. Fig. 2 gives a global overview of the EMAC model of organisation.
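The organisation described above (one broker per group, agents joining a group by declaring themselves to its broker, group broadcast, direct messages between agents that know each other, and a common workspace) can be sketched in a few lines of Python. This is a single-process toy under stated assumptions: the class and method names are hypothetical and do not reproduce the C++ EMAC API, and messages are plain strings rather than KQML.

```python
from collections import deque


class Agent:
    """Toy agent: a name and an input buffer (not the C++ EMAC agent class)."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()

    def receive(self, sender, content):
        self.inbox.append((sender, content))


class Broker:
    """One broker per group; agents join by declaring themselves to it."""

    def __init__(self, group):
        self.group = group
        self.members = {}

    def declare(self, agent):
        self.members[agent.name] = agent

    def broadcast(self, sender, content):
        """Deliver a message to every other member of the group."""
        for name, agent in self.members.items():
            if name != sender.name:
                agent.receive(sender.name, content)


def send(sender, receiver, content):
    """Direct message passing, usable when the sender knows the receiver."""
    receiver.receive(sender.name, content)


workspace = {}  # common workspace: readable without any communication link

grapheme_group, word_group = Broker("grapheme"), Broker("word")
imk_g, isk_g, csk_g, isk_w = (Agent(n) for n in ("IMK_G", "ISK_G", "CSK_G", "ISK_W"))
for agent in (imk_g, isk_g, csk_g):
    grapheme_group.declare(agent)
for agent in (csk_g, isk_w):  # an agent may belong to several groups
    word_group.declare(agent)

workspace["grapheme/features"] = [[0.2, 0.7], [0.9, 0.1]]
grapheme_group.broadcast(imk_g, "features ready at grapheme/features")
send(csk_g, isk_w, "grapheme CSK updated")
print([(a.name, len(a.inbox)) for a in (imk_g, isk_g, csk_g, isk_w)])
# [('IMK_G', 0), ('ISK_G', 1), ('CSK_G', 1), ('ISK_W', 1)]
```

The broadcast only carries a pointer into the workspace, which reflects the design choice discussed in Section 4.2 of keeping bulky data out of messages.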

3.2. Internal organisation of the EMAC agent

An EMAC agent has a static description consisting of a set of constant characteristics, such as its name and a list of its abilities: to communicate and analyse messages, to access the common workspace and, finally, to perform a particular expert treatment depending on the application. The dynamic behaviour of an EMAC agent is driven either by external solicitations or by internal goals set by the application programmer. It also depends on how the capacities and knowledge of the agent are organised, as follows (Fig. 3):
Communication abilities: incoming messages are queued before being analysed by the agent using one of its abilities. Outgoing messages are also queued before being sent.
Control abilities: these capacities drive the dynamic behaviour of the agent, which consists of an infinite deliberation-action loop. This loop includes the analysis of the current received message as well as the formatting of new messages to be sent.
State of the agent: the set of current knowledge, either local to the agent or shared with others, together with the knowledge of its environment; in our case, the addresses of its peers.

3.3. Implementation of the EMAC multiple-agent model

The EMAC model has been implemented in C++ so as to provide the user with an EMAC agent class that integrates the set of basic capacities described previously. In a specific application, a particular agent class then inherits from the EMAC agent class as well as from particular classes of expert treatments, for example treatments dedicated to handwriting analysis. Since the EMAC model relies on dynamic behaviour, a suitable environment had to be chosen to manage communication links, execute the methods of each agent, ensure the sharing of a common workspace and, finally, provide agents with a standard communication language. All of these are well-known problems in the multiple-agent community and have been the subject of numerous proposals [21]. The EMAC platform is currently based on PVM [22], a powerful architecture dedicated to parallel computing in a virtual environment formed by connecting multiple machines; the platform is briefly described in Fig. 4. Each EMAC agent is implemented as a PVM task and can benefit from the communication tools provided by this environment to send and receive ASCII messages. The standard communication language between agents is KQML [23]; each agent can therefore analyse a KQML message and act according to the predefined performatives of this language. The last tool used in EMAC is the distributed shared memory tool DREAM [24], which ensures the sharing of the same virtual address space among a set of UNIX-like systems. It also provides programmable timed refreshing of the shared memory regions across all the systems.

Fig. 4. The EMAC current architecture (diagram labels: agent behaviour; application; shared workspace; DREAM; KQML parser; communication language; PVM; supporting environment).

4. Using EMAC for the recognition of handwritten texts

There are several software platforms based on the multiple-agent concept. However, one has to distinguish this innovative concept of distributed artificial intelligence from the practical implementation of complex resolution strategies. Indeed, despite their attractive advantages, some questions remain for the implementation of complex systems:
• How should tasks be broken up and allocated to the different agents?
• How should agent control and communications be coordinated?
• How can agents be made to interact coherently?
• How can an agent reason about the actions, plans and knowledge of other agents?

Taking these questions into account, we implemented our handwritten text reading system on EMAC in the following steps:
• identifying agents and groups of agents: this step consists in assigning tasks to an agent or a group of agents, i.e. specifying which agent handles which knowledge;
• identifying the possible interactions between agents;
• defining coordination rules for each type of interaction.
These steps are described in the following sections.

4.1. Identifying agents

According to the interaction model presented in Section 2 (Fig. 1), three levels are considered in our system: the text level, the word level and the grapheme level. A group of agents is associated with each of these levels in the EMAC platform: groups of experts in grapheme analysis and recognition, handwritten word recognition and text segmentation. Each group is made up of a set of expert agents and one broker (see Fig. 5). Each agent within a group is dedicated to extracting and processing one type of knowledge described in Section 2, and is named after the specific knowledge it extracts or uses. The three groups are the following:
• The group of agents attached to the text level includes the following agents: segmentation into lines of text, segmentation into graphemes and segmentation into words. These agents are mainly entrusted with extracting intrinsic morphological knowledge at text level (IMK_T).
• The group of agents attached to the grapheme level includes the agents for writer's invariant determination, feature extraction, letter recognition and contextual information fusion. The knowledge handled by these agents is IMK_G, CMK_G, ISK_G and CSK_G. For example, the IMK_G agent has to extract or provide morphological knowledge for each grapheme, i.e. raw images, the different features extracted from the grapheme image, etc. The CMK_G agent has to obtain contextual knowledge such as the writer's invariants. The ISK_G agent has to obtain the letter hypotheses using the recognition tools provided in the system. The CSK_G agent has to manage the interaction with the word level in order to obtain and update the CSK of each grapheme from the writer's invariants.
• The group of agents attached to the word level includes the agents for word recognition using the Viterbi algorithm, word verification in a dictionary, and derivation of new grapheme scores from a list of word candidates. The knowledge handled by these agents is IMK and ISK at word level (IMK_W and ISK_W). The IMK_W agent is entrusted with building the word image from elementary graphemes. The ISK_W agent is entrusted with symbolic knowledge at word level, such as word recognition using the Viterbi algorithm and word verification in a dictionary.
Within each group, one agent is in charge of control; its task is to manage the communication of asynchronous event data between agents within the group or with the other groups.

Fig. 5. Overall scheme of the system (diagram labels: text, grapheme and word brokers exchanging control messages; expert agents IMK_T at text level; IMK_G, CMK_G, ISK_G and CSK_G at grapheme level; IMK_W and ISK_W at word level; communications between expert agents through KQML messages and through shared memory).

4.2. Between and within group interactions

Since interactions are the core of a multiple-agent system, the EMAC platform provides mechanisms suited to image processing and pattern recognition by implementing communications within and between groups. EMAC agents are fully autonomous and independent and perform their tasks asynchronously. They do not know each other: they become aware of data availability only through the exchange of messages. In our system, communications through KQML messages fall into two modes:
• Within-group communication: an agent cooperates with the other experts to solve the problem assigned to the group. An example of within-group communication is sending messages to the agents of the grapheme group to inform them that the feature extraction process is finished and that they may access the feature vectors stored in the shared memory at a specified address.
• Between-group communication: an agent can communicate with the other agents and break its problem into sub-problems that it delegates to other agents. An example is the text broker broadcasting messages to the other brokers to inform them that the segmentation of the text is finished and that they may access the grapheme images in the shared memory at a specified address.
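The notification pattern in these examples, where a producer writes bulky data to shared memory and then sends a short message carrying only its address, can be sketched with Python threads. A `queue.Queue` stands in for an agent's input buffer and a dictionary stands in for the DREAM shared region; the `Message` fields only loosely imitate a KQML `tell`, and the agent functions and the toy "feature" (an ink-pixel count) are hypothetical placeholders. Blocking on the queue also mirrors the synchronisation rule in which agent A waits for agent B's message before reading B's results.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Message:
    """Loose imitation of a KQML message: performative plus a few parameters."""
    performative: str
    sender: str
    receiver: str
    content: str  # here: an address in the shared memory, never the data itself


shared_memory = {}                  # stands in for a DREAM shared region
inboxes = {"ISK_G": queue.Queue()}  # input buffer of each agent
results = {}


def imk_g(grapheme_images):
    """Feature extraction agent: write features, then notify by address."""
    features = [sum(map(sum, image)) for image in grapheme_images]  # toy feature
    address = "grapheme/features"
    shared_memory[address] = features  # bulky data goes to shared memory
    inboxes["ISK_G"].put(Message("tell", "IMK_G", "ISK_G", address))


def isk_g():
    """Letter recognition agent: wait for IMK_G, then read by address."""
    message = inboxes["ISK_G"].get()  # blocks until the message arrives
    features = shared_memory[message.content]
    results["features_seen"] = features  # letter recognition would run here


images = [[[1, 1], [0, 1]], [[0, 1], [0, 0]]]  # two tiny binary "graphemes"
consumer = threading.Thread(target=isk_g)
producer = threading.Thread(target=imk_g, args=(images,))
consumer.start()  # started first: it simply waits for the notification
producer.start()
producer.join()
consumer.join()
print(results["features_seen"])  # [3, 1]
```

Because the message is enqueued only after the write, the consumer never reads the address before the data are there, whatever the thread start order.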

Note that the two modes of communication cannot be distinguished by the agent that is asked for data: it need not know what will be done with them. Moreover, each agent can both cooperate with agents of the same group and delegate to other agents or execute tasks requested by others. In our system, these message-based communications are used to trigger, support and control interactions between agents. Different types of message-based communication have been implemented; one can distinguish, for example:
• communications for data sharing to solve the problem;
• communications used as stimuli to launch or stop agents, such as those sent by the brokers to the expert agents;
• communications used to provide results.
Note that, in our particular application, we have chosen to use the shared memory to store bulky data such as word and grapheme images. Indeed, as each group can work with a local view of the overall environment, it does not seem necessary or desirable to convey images within messages, which are instead reserved for passing symbolic knowledge.

4.3. Coordination rules

Coordination is the underlying property of a multiple-agent system for carrying out a specified task in a distributed environment. Coordination can be viewed as the individual behaviour of an agent that tries to carry out its own task while satisfying the intermediate or final goals of the system. Coordination rules can be specified in a multiple-agent system through communication mechanisms. In our system, coordinated behaviours have been specified in the following situations:
• When there are dependences between the actions of agents: for example, let agent A be in charge of extracting ISK from graphemes and agent B be in charge of feature extraction (IMK). If A needs knowledge that must first be provided by B, coordination through synchronisation consists in setting A in a waiting state until it receives a message from B informing it that B's treatments are finished and that the generated data are available, either in the message or in the shared memory.
• When an agent lacks the competences, resources or information to carry out its task alone: in this situation, an agent can call on one of the competences of another agent to achieve a given task.
• When redundancy must be avoided in the problem resolution: in this situation, an agent can check, either by sending a message or by searching the shared memory, whether some specific knowledge has already been generated by another agent.

4.4. Implementation of a control strategy

Within the EMAC platform, control strategies can easily be implemented thanks to a high-level communication language and the presence of an agent dedicated to control within each group. Various interaction strategies, i.e. reading strategies, can thus be evaluated. The following gives an overview of the parameters on which a particular interaction scheme is based. Recall that the proposed adaptation scheme relies on the interaction of lexical and graphical constraints. Interaction can take place according to the following scheme:

1. Select words for recognition.
2. Update current interpretation of each grapheme.
3. Evaluate best word candidates for the selected words.
4. Repeat steps 1–3 until all words have been processed.
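The four-step scheme above can be sketched in Python under simplifying assumptions that are ours, not the paper's: each grapheme maps to exactly one letter (so words are scored by a product of grapheme scores instead of the Viterbi-based word recognition), the lexical CSK of a grapheme is the candidate-weighted letter distribution at its position, and graphemes in the same invariant cluster share the average of their lexical CSK. The `select` argument plays the role of the word-selection rule of step 1; its default selects every word, i.e. the classical relaxation scheme. All names, the toy lexicon and the fixed iteration count are hypothetical.

```python
from collections import defaultdict
from math import prod

FLOOR = 1e-6  # score given to a letter a grapheme has no evidence for


def evaluate(word, beliefs, lexicon, top=3):
    """Step 3: rank same-length lexicon entries by the product of grapheme scores."""
    scored = [(prod(beliefs[g].get(ch, FLOOR) for g, ch in zip(word, entry)), entry)
              for entry in lexicon if len(entry) == len(word)]
    return sorted(scored, reverse=True)[:top]


def update(words, candidates, clusters):
    """Step 2: lexical CSK per grapheme, then averaged over its invariant cluster."""
    lexical = {}
    for w, word in enumerate(words):
        total = sum(score for score, _ in candidates[w]) or 1.0
        for pos, g in enumerate(word):
            dist = defaultdict(float)
            for score, entry in candidates[w]:
                dist[entry[pos]] += score / total
            lexical[g] = dict(dist)
    members = defaultdict(list)
    for g, c in clusters.items():
        members[c].append(g)
    beliefs = {}
    for g, c in clusters.items():
        group = members[c]
        letters = {letter for m in group for letter in lexical[m]}
        beliefs[g] = {letter: sum(lexical[m].get(letter, 0.0) for m in group) / len(group)
                      for letter in letters}
    return beliefs


def read_text(words, isk, clusters, lexicon, select=None, iterations=2):
    """Iteration 0 uses ISK only; each later iteration runs steps 1-3 once."""
    select = select or (lambda ws: range(len(ws)))  # default: every word
    candidates = {w: evaluate(word, isk, lexicon) for w, word in enumerate(words)}
    history = [[candidates[w][0][1] for w in range(len(words))]]
    for _ in range(iterations):  # step 4: repeat (fixed count here)
        chosen = list(select(words))                   # step 1
        beliefs = update(words, candidates, clusters)  # step 2
        for w in chosen:                               # step 3
            candidates[w] = evaluate(words[w], beliefs, lexicon)
        history.append([candidates[w][0][1] for w in range(len(words))])
    return history


# Toy text: the last grapheme of word 0 ("p q r") leans towards l, but it shares
# an invariant cluster with the last grapheme of word 1 ("x y z").
words = [["p", "q", "r"], ["x", "y", "z"]]
isk = {"p": {"t": 1.0}, "q": {"e": 1.0}, "r": {"e": 0.45, "l": 0.55},
       "x": {"b": 1.0}, "y": {"e": 1.0}, "z": {"e": 0.6, "l": 0.4}}
clusters = {"p": 0, "q": 1, "r": 2, "x": 3, "y": 1, "z": 2}
print(read_text(words, isk, clusters, ["tel", "tee", "bee"]))
# [['tel', 'bee'], ['tee', 'bee'], ['tee', 'bee']]
```

In this toy run, word 1 is lexically unambiguous ("bee"), so after one iteration its last grapheme pulls the shared cluster towards e and word 0 switches from "tel" to "tee"; a more selective `select` rule would only change which words are re-evaluated in step 3.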

Step 1, devoted to the selection of words, is the central point of a particular strategy. A good rule would consist in selecting the words for which either lexical or graphical constraints are known a priori to bring the largest amount of information for disambiguating between word candidates. However, the rule currently under evaluation consists in selecting every word in the text, which corresponds to a classical relaxation scheme. Following this strategy, Fig. 6 shows the system in operation. Groups of agents, expert agents within each group and interaction links between agents are highlighted, and the main task of each expert agent within its group is indicated. For example, the main task of the IMK_G agent within the grapheme group is to extract features from each grapheme of the text. Areas of shared data can be created, attached and updated by any agent in the system thanks to DREAM, which manages these operations.

Fig. 6. Operational description of the system (diagram labels: text, grapheme and word brokers; expert agents IMK_T (text segmentation), IMK_G (feature extraction), CMK_G (invariant determination), ISK_G (letter recognition), CSK_G (fusion), IMK_W and ISK_W (word recognition); KQML messages and triggers numbered 1 to 6, including the end-of-text-segmentation message; shared-memory attachments holding the text image, grapheme images, feature vectors, grapheme ISK and grapheme CSK, the last four on 3 levels).

The system works as follows:
1. The text, word and grapheme brokers are launched and set in a waiting state. The text broker informs the IMK_T expert agent that it can extract IMK at text level (segmentation of the text).
2. The text broker sends a message to the other brokers to inform them of the end of the treatment and of the addresses of the shared data in the shared memory.

3. At the same time, the grapheme broker informs the IMK_G agent that it can begin the feature extraction process on all the graphemes of the text, and the CMK_G agent that it can derive the writer's invariants.
4. Once the IMK of each grapheme is available in the shared memory, the ISK_G agent performs its letter recognition task on all the graphemes.
5. As soon as the grapheme ISK is available, the ISK_W agent begins the word recognition process.
6. At the end of the word recognition process, the CSK_G agent performs the CSK fusion task on all the graphemes.

Interaction between the CSK_G and ISK_W agents (i.e. the implementation of the adaptation process) is achieved through messages, by alternately performing word recognition tasks and CSK fusion tasks on the graphemes (loop on steps 5 and 6).

5. Experimental results

The handwritten texts used for these experiments have been scanned at 300 dpi and binarised. The overall database includes 66 handwritten texts, each written by a different writer. Each writer was asked to write the same text of 106 words, drawn from a lexicon of 71 words. The only constraints imposed on the writers were to keep the words aligned on each handwritten line and to space the lines out so as to avoid line segmentation problems as far as possible. Our omni-writer recognition system has been used to locate and label each word and each grapheme in the image. Of the 66 texts, 15 have been retained to test our word recognition system (samples of the same word extracted from the 15 texts are presented in Fig. 7), while the remaining 51 have been used for training.

Fig. 7. Samples of handwriting: the same word written by different writers.

Table 1
Average word recognition rates (%) for the 15 writers over three consecutive iterations

Iteration   TOP1    TOP5    TOP10   TOP20
0           80.63   90.69   92.64   94.03
1           83.02   95.22   96.16   96.79
2           83.33   95.35   96.23   96.86

Table 1 shows the mean contribution, at word level, of the adaptation strategy described in Section 4.2 on the 15 test texts. It gives the word recognition performance over three consecutive iterations, according to the length of the retained list of solutions (TOP1 to TOP20). The first iteration (iteration 0) is the output of the ascending phase of word recognition, without adaptation; the recognition rates obtained with adaptation are given by the two following iterations (1 and 2). Overall, word recognition performance improves at the end of the first iteration, whereas the following iteration contributes much less (TOP1 rises from 80.63% to 83.02% after iteration 1, but only to 83.33% after iteration 2). These preliminary results show the interest of the approach, which gains nearly 5 points in the TOP5 rate (from 90.69% to 95.35%). Even if the overall contribution remains modest, a per-writer analysis shows that adaptation improves the system performance by up to 9% for some writers.

To illustrate the behaviour of our adaptation strategy more clearly, Fig. 8 presents the improvement of the character recognition rate for each writer, i.e. the TOP1 recognition rate at character level, between iteration 0 (without adaptation) and iteration 1. These results show that, at the end of the first iteration, we get a significant improvement in recognition rates
at character level for all the writers, which means that our system does learn the handwriting to be recognised.

Fig. 8. Adaptation contribution at character level per writer, from iteration 0 to iteration 1.

6. Conclusion and future work

In the framework of a handwritten text recognition application, we have developed a multiple agent system able to manage the interaction between different contextual levels of handwriting interpretation. For this purpose, we have built the EMAC environment, which has been specified from the constraints imposed by our handwriting interpretation system. Several aspects of our approach are worth highlighting. The proposed model allows several levels of interpretation to cooperate within our reading system. Thanks to the shared memory, the proposed solution has been built to exploit data and agents as quickly as possible; moreover, the overall status of the system at a given moment is easier to determine. By introducing the notion of a group of agents, it is possible to specify treatments spanning several interpretation levels that better express the application domain. The distributed and asynchronous operation of the agents makes an implementation on several computers possible. Communication relies on a high-level communication language (KQML), which enables genuine cooperation between agents. Finally, our system architecture makes it easy to combine several complementary strategies by implementing specific collaboration schemes between agents.

Note also that, although the EMAC model has been specified for recognising handwritten documents, the underlying principles of our model can easily be extended to other computer vision applications.
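The coordination principle summarised in this conclusion (agents triggered as soon as their input data are available in the shared memory, results announced by messages, and the adaptation loop that alternates word recognition and CSK fusion in steps 5 and 6 of Fig. 6) can be illustrated with a small Python sketch. It is not the EMAC implementation: the `Agent` and `SharedMemory` classes, the `run_ready_agents` and `adapt` functions, the shared-memory key names, the inputs each agent waits for (in particular, that CSK_G waits for the word recognition output and the writer's invariants) and the toy tasks are all hypothetical, and KQML messaging is reduced to a logged `tell` tuple rather than a real KQML exchange.

```python
# Hypothetical sketch of data-driven agents around a shared memory; not the EMAC API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Simplified KQML-style message: (performative, sender, receiver, content).
Message = Tuple[str, str, str, str]


@dataclass
class Agent:
    name: str
    needs: List[str]  # shared-memory keys that must exist before the agent can run
    produces: str  # shared-memory key the agent writes
    task: Callable[[Dict[str, object]], object]


@dataclass
class SharedMemory:
    data: Dict[str, object] = field(default_factory=dict)
    messages: List[Message] = field(default_factory=list)

    def post(self, sender: str, key: str, value: object) -> None:
        """Store a result and announce its availability with a 'tell' message."""
        self.data[key] = value
        self.messages.append(("tell", sender, "broker", key))


def run_ready_agents(memory: SharedMemory, agents: List[Agent]) -> List[str]:
    """Data-driven pass: run each agent once, as soon as all its inputs are posted."""
    order: List[str] = []
    pending = list(agents)
    progress = True
    while pending and progress:
        progress = False
        for agent in list(pending):
            if all(key in memory.data for key in agent.needs):
                memory.post(agent.name, agent.produces, agent.task(memory.data))
                order.append(agent.name)
                pending.remove(agent)
                progress = True
    return order


def adapt(memory: SharedMemory, word_agent: Agent, fusion_agent: Agent,
          iterations: int) -> List[str]:
    """Adaptation loop: alternate word recognition and CSK fusion (steps 5 and 6)."""
    order: List[str] = []
    for _ in range(iterations):
        for agent in (word_agent, fusion_agent):
            memory.post(agent.name, agent.produces, agent.task(memory.data))
            order.append(agent.name)
    return order


# Toy tasks standing in for the real expert treatments (hypothetical).
isk_w = Agent("ISK_W", ["grapheme_ISK"], "word_ISK",
              lambda d: f"words using CSK v{d.get('grapheme_CSK', 0)}")
csk_g = Agent("CSK_G", ["word_ISK", "grapheme_CMK"], "grapheme_CSK",
              lambda d: d.get("grapheme_CSK", 0) + 1)
agents = [
    Agent("IMK_G", ["grapheme_images"], "grapheme_IMK",
          lambda d: [ord(g) for g in d["grapheme_images"]]),
    Agent("CMK_G", ["grapheme_images"], "grapheme_CMK",
          lambda d: sorted(set(d["grapheme_images"]))),
    Agent("ISK_G", ["grapheme_IMK"], "grapheme_ISK",
          lambda d: ["letter hypotheses"] * len(d["grapheme_IMK"])),
    isk_w,
    csk_g,
]

memory = SharedMemory({"grapheme_images": ["a", "n", "n", "e"]})
first_pass = run_ready_agents(memory, agents)  # steps 3 to 6, driven by data availability
loop = adapt(memory, isk_w, csk_g, iterations=2)  # repeated steps 5 and 6
print(first_pass + loop)
print(memory.data["word_ISK"], memory.data["grapheme_CSK"])
```

The sketch runs the agents sequentially in a single process, whereas the system described here runs them asynchronously and possibly on several computers. The fixed `iterations=2` is also an assumption, chosen only to mirror the two adaptation iterations reported in Table 1.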

About the Author: LAURENT HEUTTE received his Ph.D. degree in computer engineering from the University of Rouen, France, in 1994. He worked on handwritten character recognition for the automatic reading of French postal checks and US postal addresses at MATRA MS&I, France, from 1992 to 1994. He is currently an Associate Professor in computer engineering at the University of Rouen. Dr. Heutte's present research interests are multiple classifier systems, off-line cursive handwriting recognition, writer identification, handwritten document layout analysis and information extraction from handwritten documents.

About the Author: ALI NOSARY received his Ph.D. degree in computer engineering from the University of Rouen, France, in 2002. From 1997 to 2002, he worked on off-line handwritten text recognition through writer adaptation. Since 2002, Dr. Nosary has been an Associate Professor in computer engineering at the University of Sana'a, Yemen.

About the Author: THIERRY PAQUET received his Ph.D. degree in computer engineering from the University of Rouen, France, in 1992. He was then appointed Associate Professor at the University of Rouen, where his research concerns handwriting recognition using stochastic models. In 2002, Dr. Paquet was appointed full Professor at the University of Rouen. His current research interests include handwritten document analysis and querying through graphical and textual content by means of stochastic models. Prof. Paquet is president of the French association for written communication.
