Extending Lambek grammars: a logical account of minimalist grammars

Alain Lecomte† and Christian Retoré‡

†UFR "Sciences de l'Homme et de la Société", Université Pierre Mendès-France, BSHM, 1251 Avenue Centrale, Domaine Universitaire de St Martin d'Hères, BP 47, 38040 Grenoble cedex 9, France. [email protected]

‡IRIN, Université de Nantes, 2 rue de la Houssinière, BP 92208, 44322 Nantes cedex 03, France. [email protected]

Paper ID: ACL-2001-0075
Keywords: syntax, minimalist grammars, categorial grammars, resource logics, Montague semantics
Contact Author: author of record (for correspondence)
Under consideration for other conferences (specify)? Not submitted elsewhere. Partly presented to Formal Grammar 99 and Logic, Language and Computation (workshops without proceedings).

Abstract

We provide a logical definition of Minimalist grammars, that is, Stabler's formalization of Chomsky's minimalist program. Our logical definition, even simpler than the original one, leads to:

- a neat relation to categorial grammar, yielding a treatment of Montague semantics;
- parsing-as-deduction in a resource sensitive logic;
- a learning algorithm from structured data, based on a typing algorithm and type unification.

Our view of minimalist grammars is also an extension of Lambek grammars: we keep their radical lexicalism and logical view. The generative capacity is increased by using a mixed commutative / non-commutative logic due to de Groote, and this logic is not used as in Lambek grammars:

- product is essential, since it encodes movement;
- up to now hypothetical reasoning is not needed, i.e. we only have elimination rules, as in classical (AB) categorial grammars or combinatory categorial grammars;
- the proof determines the consumption of the valencies, but word order is computed from the proof by a simple device (the relation between word order and valency consumption is more flexible than in Lambek grammars).

This allows for a proper account of sophisticated syntactic constructions (expletives, long-distance dependencies, ...) and to compute Montague-like semantics from syntactic analyses. Here we emphasize the connection to Montague semantics, which can be viewed as a formal computation of the logical form.


1 Presentation

The connection between categorial grammars (especially in their logical setting) and minimalist grammars, which has already been observed and discussed (Retoré and Stabler, 1999), deserves further study: although both are lexicalized, and resource consumption (or feature checking) is their common base, they differ in various respects. On the one hand, traditional categorial grammars have no move operation, and usually have a poor generative capacity unless the good properties of a logical system are damaged; on the other hand, minimalist grammars, even though they were provided with a precise formal definition (Stabler, 1997), still lack some computational properties that are crucial both from a theoretical and a practical viewpoint. Regarding applications, one needs parsing, generation or learning algorithms; considering more conceptual aspects, such algorithms are also needed to confirm or infirm linguistic claims regarding economy or efficiency.

Our claim is that a logical treatment of these grammars leads to a simpler description and to well defined computational properties. Of course, among these aspects, the relation to semantics or logical form is quite important: it is claimed to be a central notion in minimalism, but logical forms are rather obscure, and no computational process from syntax to semantics is suggested. Our logical presentation of minimalist grammars is a first step in this direction: providing a description of minimalist grammars in a logical setting immediately sets up the computational framework regarding parsing, generation and even learning, but also yields some good hints on the computational connection with logical forms.

The logical system we use, a slight extension of (de Groote, 1996), is quite similar to the famous Lambek calculus (Lambek, 1958), which is known to be a neat logical system. This logic has recently been shown to have good logical properties, like the subformula property, which are relevant both to linguistics and to computing theory (e.g. for modelling concurrent processes). The logic under consideration is a superimposition of the Lambek calculus (a non-commutative logic) and of intuitionistic multiplicative logic (also known as the Lambek calculus with permutation). The context, that is, the set of current hypotheses, is endowed with an order, and this order allows for a distinction between unordered features (commutative product) and ordered features (non-commutative product). There is nevertheless a relation between the two products, or orders: a rule allows ordered formulae to become unordered, while the converse is not allowed.

Having this logical description of syntactic analyses allows us to reduce parsing (and production) to deduction, and to extract logical forms from the proofs, with a connection as close as the one between Lambek grammar analyses and Montague semantics.

2 The grammatical architecture

The general picture of these logical grammars is as follows. A lexicon maps words (or, more generally, items) onto a logical formula, called the (syntactic) type of the word. Types are defined from syntactic or formal features P (which are propositional variables from the logical viewpoint):

- categorial features (categories) involved in merge: base = {c, t, v, d, n, ...}
- functional features involved in move: fun = {k, K, wh, ...}

The connectives in the logic for constructing formulae are the Lambek implications (or slashes) \ and /, the (non-commutative) product ⊙, together with the commutative product ⊗ of linear logic.¹ Once an array of items has been selected, a sentence (or any phrase) is a deduction of ip (or of the relevant phrasal category) under the assumptions provided by the syntactic types of the involved items. This first step works exactly as in Lambek grammars, except that the logic and the formulae are richer.

Now, in order to compute word order, we proceed by labelling each formula in the proof. These labels, which are called phonological and semantic features in the transformational tradition, are computed from the proofs and consist of two parts that can be superimposed: a phonological label, denoted by /word/, and a semantic label² denoted by (word), the superimposition of both labels being denoted by word. The reason for having such a double labelling is that, as usual in minimalism, semantic and phonological features can move separately.

¹ The logical system also contains a commutative implication ⊸, but it does not appear in the lexicon and, because of the subformula property, it is not needed for the proofs we use.
² We prefer "semantic label" to "logical form", so as not to confuse logical forms with the logical formulae present at each node of the proof.
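To fix intuitions, the type language just described can be rendered as a small datatype. The following Python sketch is ours, not part of the paper: the constructor names and the sample entries are assumptions, chosen to match figure 1.

    from dataclasses import dataclass

    BASE = {"c", "t", "v", "d", "n", "ip", "vp"}   # categorial features (merge)
    FUN = {"k", "K", "wh"}                         # functional features (move)

    @dataclass(frozen=True)
    class At:        # atomic type, e.g. At("d")
        name: str

    @dataclass(frozen=True)
    class Over:      # A / B: looks for a B on its right
        a: object
        b: object

    @dataclass(frozen=True)
    class Under:     # B \ A: looks for a B on its left
        b: object
        a: object

    @dataclass(frozen=True)
    class Tensor:    # A (x) B: commutative product, used to encode movement
        a: object
        b: object

    # A toy lexicon in the style of figure 1:
    LEXICON = {
        "reads": Over(Under(At("k"), At("vp")), At("d")),   # (k\vp)/d
        "a":     Over(Tensor(At("d"), At("k")), At("n")),   # (d (x) k)/n
        "book":  At("n"),
    }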

It should be observed that the labels are not some extraneous information: indeed, the whole information is encoded in the proof, and the labelling is just a way to extract the phonological form and the logical form from the proof. We use chains, or copy theory, rather than movements and traces: once a label, or one aspect of it (semantic or phonological), has been met, it should be ignored when it is met again. For instance, a label Peter (Mary) loves Mary corresponds to the semantic label (Peter)(Mary)(love) and to the phonological form /Peter//loves//Mary/.
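This copy discipline can be spelled out as a single left-to-right scan over the label; the sketch below is ours (the triple representation of label items is an assumption): each item contributes its phonological and semantic sides at most once.

    # A label item is (word, has_phon, has_sem): word carries both sides,
    # (word) only the semantic side, /word/ only the phonological side.
    def extract(label):
        phon, sem, seen_p, seen_s = [], [], set(), set()
        for word, has_phon, has_sem in label:
            if has_phon and word not in seen_p:
                phon.append(f"/{word}/")
                seen_p.add(word)
            if has_sem and word not in seen_s:
                sem.append(f"({word})")
                seen_s.add(word)
        return "".join(phon), "".join(sem)

    # Peter (Mary) loves Mary: the second 'Mary' is a copy of the first.
    label = [("Peter", True, True), ("Mary", False, True),
             ("loves", True, True), ("Mary", True, True)]
    print(extract(label))  # ('/Peter//loves//Mary/', '(Peter)(Mary)(loves)')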

3 Logico-grammatical rules for merge and phrasal movement

Because of the subformula property we need not present all the rules of the system, but only the ones that can be used according to the types that appear in the lexicon. Furthermore, up to now there is no need to use introduction rules (called hypothetical reasoning in the Lambek calculus): so our system looks more like Combinatory Categorial Grammars or classical AB-grammars. Nevertheless some hypotheses can be cancelled during the derivation by the product-elimination rule. This is essential, since this rule is the one representing chains or movement. We also have to specify how the labels are carried along by the rules. At this point some non-logical properties can be taken into account, for instance the strength of the features, if we wish to take them into account. Hypothesis labels are denoted by lower-case variables. The rules of this system, in Natural Deduction format, are:

$$\frac{\Gamma \vdash x : A/B \qquad \Delta \vdash y : B}{\Gamma\,;\,\Delta \vdash xy : A}\;[/E] \qquad \frac{\Delta \vdash y : B \qquad \Gamma \vdash x : B\backslash A}{\Delta\,;\,\Gamma \vdash yx : A}\;[\backslash E]$$

$$\frac{\Gamma\langle \Delta_1\,;\,\Delta_2\rangle \vdash A}{\Gamma\langle \Delta_1\,,\,\Delta_2\rangle \vdash A}\;[\mathrm{entropy}] \qquad \frac{\Delta \vdash \alpha : A \otimes B \qquad \Gamma\langle x : A\,,\,y : B\rangle \vdash \gamma : C}{\Gamma\langle \Delta\rangle \vdash \gamma[\alpha/\{x,y\}] : C}\;[\otimes E]$$

This latter rule encodes movement and deserves special attention. The label γ[α/{x,y}] denotes the substitution of α for the unordered set {x, y}, that is, the simultaneous substitution of α for both x and y, whatever the order between x and y. Here some non-logical but linguistically motivated distinctions can be made. For instance, according to the strength of a feature (e.g. weak case k versus strong case K), it is possible to decide that only the semantic part, that is (α), is substituted at the case position.

In figure 1, the reader is provided with an example of a lexicon and of a derivation. The resulting label is (a book) reads a book; the phonological form is /reads//a book/, while the resulting logical form is (a book)(reads). Observe that language variation from SVO to SOV does not change the analysis. To obtain the SOV word order, one should simply use K instead of k in the lexicon, and use the same analysis. The resulting label would be a book reads a book, which yields the phonological form /a book//reads/, and the logical form remains the same: (a book)(reads).

Observe that although entropy, which suppresses some order, has been used, the labels consist in ordered sequences of phonological and logical forms. This is so because when using [/E] and [\E] we necessarily order the labels, and this order is then registered inside the label and never destroyed, even when the entropy rule is used: at that moment, it is only the order on the hypotheses which is relaxed.

In order to represent the minimalist grammars of (Stabler, 1997), the above subsystem of PdG is enough, and the types appearing in the lexicon also are a strict subset of all possible types:

Definition 1. MG-proofs contain only three kinds of steps:

- implication steps (elimination rules for / and \),
- tensor steps (elimination rule for ⊗),
- entropy steps (entropy rule).

Definition 2. A lexical entry consists in an axiom ⊢ w : T where T is a type

$$((F_2\backslash(F_3\backslash\cdots(F_n\backslash(G_1\otimes G_2\otimes\cdots\otimes G_m\otimes A))))/F_1)$$

where:

- m and n can be any number greater than or equal to 0,
- F1, ..., Fn are attractors,
- G1, ..., Gm are features,
- A is the resulting category type.
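Such types can be produced mechanically; the sketch below (ours, reusing the constructors of the earlier sketch) assembles the type of Definition 2 from a list of attractors, a list of features and a result category.

    def mg_type(attractors, features, result):
        """Build ((F2\(F3\...(Fn\(G1 (x) ... (x) Gm (x) A))))/F1)."""
        t = At(result)
        for g in reversed(features):            # G1 (x) (... (x) (Gm (x) A))
            t = Tensor(At(g), t)
        for f in reversed(attractors[1:]):      # wrap with Fn\, ..., F2\
            t = Under(At(f), t)
        if attractors:                          # finally /F1
            t = Over(t, At(attractors[0]))
        return t

    # The figure 1 entries are instances of this shape (one possible reading
    # of which symbols count as features and which as the result):
    assert mg_type(["d", "k"], [], "vp") == LEXICON["reads"]   # (k\vp)/d
    assert mg_type(["n"], ["d"], "k") == LEXICON["a"]          # (d (x) k)/n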

Derivations in this system can be seen as T-markers in the Chomskyan sense. [/E] and [\E] steps are merge steps. [⊗E] gives a coindexation of two nodes, which we can see as a move step. For instance, in a tree presentation of natural deduction, we shall only keep the coindexation corresponding to the cancellation of A and B: this is harmless, since the conclusion is not modified, and it makes our natural deductions T-markers. Such lexical entries, when processed with MG-rules, correspond to Stabler's minimalist grammars; this system nevertheless overgenerates, because some minimalist principles are not yet satisfied: they correspond to constraints on derivations.

3.1 Conditions on derivations

The restriction which is still lacking concerns the way the proofs are built. Observe that this is an algorithmic advantage, since it reduces the search space. The simplest of these restrictions is the following: the attractor F in the label of the target locates the closest F' in its domain. This simply corresponds to the following restriction.

Definition 3 (Shortest Move). An MG-proof is said to respect the shortest move condition if its hypotheses are discharged in a First In, First Out order.
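Read operationally, the condition is a queue discipline on hypotheses. A minimal sketch, assuming a trace of introduction and discharge events:

    from collections import deque

    def respects_shortest_move(events):
        """events: sequence of ('intro', h) / ('discharge', h) pairs.
        Hypotheses must be discharged first-in, first-out."""
        queue = deque()
        for kind, hyp in events:
            if kind == "intro":
                queue.append(hyp)
            elif kind == "discharge":
                if not queue or queue.popleft() != hyp:
                    return False
        return True

    # figure 1: x:d is introduced before y:k, and both are discharged
    # by the tensor step, oldest first:
    print(respects_shortest_move([("intro", "x"), ("intro", "y"),
                                  ("discharge", "x"), ("discharge", "y")]))  # True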

Figure 1: reads a book

Lexicon:
reads ::= ⊢ reads : ((k\vp)/d)
a ::= ⊢ a : ((d ⊗ k)/n)
book ::= ⊢ book : n

Derivation:

$$\frac{\vdash \mathit{reads} : (k\backslash vp)/d \qquad x : d \vdash x : d}{x : d \vdash \mathit{reads}\;x : k\backslash vp}\;[/E]$$

$$\frac{y : k \vdash y : k \qquad x : d \vdash \mathit{reads}\;x : k\backslash vp}{y : k\,;\,x : d \vdash y\;\mathit{reads}\;x : vp}\;[\backslash E] \qquad \frac{y : k\,;\,x : d \vdash y\;\mathit{reads}\;x : vp}{(y : k\,,\,x : d) \vdash y\;\mathit{reads}\;x : vp}\;[\mathrm{entropy}]$$

$$\frac{\vdash a : (d\otimes k)/n \qquad \vdash \mathit{book} : n}{\vdash a\;\mathit{book} : d\otimes k}\;[/E] \qquad \frac{\vdash a\;\mathit{book} : d\otimes k \qquad (y : k\,,\,x : d) \vdash y\;\mathit{reads}\;x : vp}{\vdash (a\;\mathit{book})\;\mathit{reads}\;a\;\mathit{book} : vp}\;[\otimes E]$$
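The label computation of the final [⊗E] step in figure 1 can be emulated directly. In this sketch (ours; the strong flag is the only non-logical ingredient), a weak case position receives only the semantic side of the substituted label, a strong one the full label; duplicated sides are filtered afterwards, as in the extract() sketch of section 2.

    def tensor_elim(alpha, gamma, strong=False):
        """Substitute label alpha for the coindexed hypotheses y (case
        position) and x (category position) inside gamma."""
        y_part = alpha if strong else f"({alpha})"
        return gamma.replace("y", y_part).replace("x", alpha)

    print(tensor_elim("a book", "y reads x"))               # (a book) reads a book
    print(tensor_elim("a book", "y reads x", strong=True))  # a book reads a book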

4 Extension to head-movement

We have seen above that we are able to account for SVO and SOV orders quite easily. Nevertheless we could not handle VSO languages this way. Indeed this order requires head-movement, and head-movement is also needed for the movement of the verb to the inflection node, which is required for verb-subject agreement. In order to handle head-movement, we use the non-commutative product ⊙, whose elimination rule is quite similar to the one of the commutative product:

$$\frac{\Delta \vdash \alpha : A \odot B \qquad \Gamma\langle (x : A\,;\,y : B)\rangle \vdash \gamma : C}{\Gamma\langle \Delta\rangle \vdash \gamma[\alpha/(x,y)] : C}\;[\odot E]$$

Accordingly, types will not only be of the shape given in Definition 2, but can also be non-commutative products of such types. The non-commutative product is needed because of the following linguistic constraint: a head-movement never crosses another head-movement. Nevertheless it is possible for a head-movement to cross a phrasal movement. Our logical system is well designed for this possibility: the possibility to relax the order among hypotheses, expressed by the following rule, exactly allows head-movements to cross phrasal ones, without allowing a head-movement to cross other head-movements.

$$\frac{\Omega\langle(\Gamma\,;\,\{\Delta\,,\,\Gamma'\})\rangle \vdash C}{\Omega\langle\{(\Gamma\,;\,\Delta)\,,\,\Gamma'\}\rangle \vdash C}\;[MA]$$

As a first example, let us take the very simple example of: peter loves mary.

Starting from the lexicon given in figure 2, we can build the tree given in the same figure; it represents a natural deduction in our system, hence a syntactic analysis. The resulting phonological form is /Peter//loves//Mary/, while the resulting logical form is (Peter)(Mary)(loves); the possibility to obtain the SOV word order with a K instead of a k also applies here.

5 The interface between syntax and semantics

In categorial grammar (Moortgat, 1996), the production of logical forms is essentially based on the association of pairs ⟨string, type⟩ with lambda terms representing the logical form of the items, and on the application of the Curry-Howard homomorphism: each (/ or \) elimination rule translates into an application, and each introduction step into an abstraction. Compositionality assumes that each step in a derivation is associated with a semantic operation. In generative grammar (Chomsky, 1995), the production of logical forms is in the last part of the derivation, performed after the so-called Spell Out point, and consists in movements of the semantic features only.

Figure 2: Peter loves Mary

Lexicon:
loves ::= ⊢ loves : ((k\ip)/vp) ⊙ ((k\(d\vp))/d)
peter ::= ⊢ peter : k ⊗ d
mary ::= ⊢ mary : k ⊗ d

Derivation tree (a natural deduction; coindexed nodes are discharged together):

ip
|- peter (peter) : k¹
`- (k\ip)
   |- loves³ : ((k\ip)/vp)
   `- vp
      |- d¹
      `- (d\vp)
         |- (mary)² : k
         `- (k\(d\vp))
            |- (to love)³ : ((k\(d\vp))/d)
            `- mary : d²

Once this is done, two forms can be extracted from the result of the derivation: a phonological form and a logical one. These two approaches are therefore very different, but we can try to bring them closer by replacing semantic features by lambda terms and using some canonical transformations on the derivation trees. Instead of directly converting the derivation tree obtained by composition of types, something which is not possible in our translation of minimalist grammars (we shall see why later on), we extract a logical tree from the previous one, and use the Curry-Howard operations on this extracted tree. Actually, this extracted tree is also a deduction tree: it represents the proof we could obtain in the semantic component, by combining the semantic types associated with the syntactic ones (by a homomorphism H to be specified). Such a proof is in fact a proof in implicational intuitionistic linear logic.


5.1 Logical form for the example of section 4

Coindexed nodes refer to former hypotheses which have been discharged simultaneously, thus resulting in phonological features and semantic ones at their right place³. By extracting the subtree whose leaves are full of semantic content, we obtain a structure that can easily be seen as a composition:

(peter)((mary)(to love))

If we replace these "semantic features" by terms, we have:

(λu.u(peter))((λu.u(mary))(λx.λy.love(y, x)))

This shows that constituents raised in the structure are necessarily not only "syntactically" raised but also "semantically" lifted, in the sense that λu.u(peter) is the higher-order representation of the individual peter.

³ For the time being, we abstract away from the representation of tense, mood, aspect, ... that would be supported by the inflection category.
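This beta-reduction can be replayed with Python lambdas standing in for the linear lambda terms (our sketch; tuples encode applied predicates):

    lift = lambda ind: lambda u: u(ind)         # lambda u. u(peter): type raising
    love = lambda x: lambda y: ("love", y, x)   # lambda x. lambda y. love(y, x)

    peter, mary = lift("peter"), lift("mary")
    # (peter)((mary)(to love)):
    print(peter(mary(love)))  # ('love', 'peter', 'mary'), i.e. love(peter, mary)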

5.2 Subject raising

Let us now look at the example: mary seems to work.

Figure 3: Mary seems to work

Lexicon:
seems ::= ⊢ seems : ((k\ip)/vp) ⊙ (vp/vp)
mary ::= ⊢ mary : d ⊗ k
to work ::= ⊢ to work : (d\vp)

Derivation tree:

ip
|- mary (mary) : k¹
`- (k\ip)
   |- seems² : ((k\ip)/vp)
   `- vp
      |- (to seem)² : (vp/vp)
      `- vp
         |- d¹
         `- to work (to work) : (d\vp)

From the lexicon in figure 3 we obtain the deduction tree given in the same figure. This time, it is not so easy to obtain the logical representation:

seem(to work(mary))

The best way of doing so consists in assuming that:

- first, the verbal infinitive head (here to work) applies to a variable x which occupies the d-position;
- then, the semantics of the main verb (here to seem) applies to the result, in order to obtain seem(to work(x));
- the variable x is abstracted, in order to obtain λx.seem(to work(x)), just before the semantic content of the specifier (here the nominative position, occupied by λu.u(mary)) applies.

This shows that the semantic tree we want to extract from the derivation tree in the type logic is not simply the subtree whose leaves are semantically full. We need in fact some transformation, which is simply the stretching of some nodes. These stretchings correspond to →-introduction steps in a Natural Deduction tree. They are allowed each time a variable has been used before which is not yet discharged, and they necessarily occur just before the semantically full content of a specifier node (that means, in fact, a node labelled by a functional feature) applies.

Actually, if we say that the tree so obtained represents a deduction in ND-format, we have to say what formulae it uses and what formula it demonstrates. We must therefore define a homomorphism between syntactic and semantic types. Let H be this homomorphism. We shall assume:

- H(ip) = t, H(vp) ∈ {t, (e → t)}, H(d) = e,
- H(a\b) = H(b/a) = (H(a) → H(b)),
- for every functional feature f, H(f) ∈ {((e → X) → X), (X → X)}⁴.

⁴ X is a variable over types, something that can be seen at first sight as a possible cause of undecidability; in fact we shall see that the instantiation of X is always straightforward. Moreover, when f is of type (X → X), it is in fact endowed with the identity function.
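Written as a function over the datatypes of the first sketch, H could look as follows (our sketch; since H is not single-valued on vp, the intended alternative is passed as a parameter, and X is instantiated to t, which footnote 4 claims is always straightforward):

    E, T = "e", "t"
    def arrow(a, b): return (a, "->", b)

    def H(syn, vp_as=arrow(E, T)):
        if isinstance(syn, At):
            if syn.name == "ip": return T
            if syn.name == "vp": return vp_as           # in {t, (e -> t)}
            if syn.name == "d":  return E
            if syn.name in FUN:                         # k, K, wh, ...
                return arrow(arrow(E, T), T)            # ((e -> X) -> X), X := t
        if isinstance(syn, (Under, Over)):              # H(a\b) = H(b/a) = H(a) -> H(b)
            return arrow(H(syn.b, vp_as), H(syn.a, vp_as))
        raise ValueError(f"no semantic type for {syn}")

    # 'to work' : (d\vp), reading H(vp) = t here:
    print(H(Under(At("d"), At("vp")), vp_as=T))  # ('e', '->', 't')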

With this homomorphism of labels, and the transformation of trees consisting in stretching "intermediary projection" nodes and erasing leaves without semantic content, we obtain, from the derivation tree of the second example, the following "semantic" tree:

Figure 4: A lexicon with phonological and semantic components

mary ::= {mary} : k ⊗ d
peter ::= {peter} : k ⊗ d
loves ::= [/loves/ : ((k\ip)/vp)] ⊙ [(love) : ((k\(d\vp))/d)]
seems ::= [/seems/ : ((k\ip)/vp)] ⊙ [(seem) : (vp/vp)]
to work ::= [to work : (d\vp)]

Unfortunately, such a rigid assignment cannot be made in all cases. For instance, for phrasal movement (say of a d to a k), the assignment depends of course on the particular k-node in the tree (for instance, the situation is not necessarily the same for the nominative and for the accusative case). In such cases, we may assume that multisets are associated with lexical entries instead of vectors. We can therefore assume phonological assignments like the five first ones in figure 4.

seem(to work(mary)) : t
|- λu.u(mary) : ((e → t) → t)
`- λx.seem(to work(x)) : (e → t)¹   [stretching: abstraction of x¹]
   `- seem(to work(x)) : t
      |- λv.seem(v) : (t → t)
      `- to work(x) : t
         |- λy.to work(y) : (e → t)
         `- x : e¹

where coindexed nodes are linked by the discharging relation.

Let us notice that the weak or strong character of the features may often be encoded in the lexical entries. For instance, head-movement from V to I is expressed by the fact that tensed verbs are such that:

- the full phonology is associated with the inflection component,
- the empty phonology and the semantics are associated with the second component,
- the empty semantics occupies the first one⁵⁶.

⁵ We must not confuse the "empty" semantics and the identity function. Empty semantics means that the node will be really empty, and therefore erased when passing from the syntactic tree to the semantic one. Nodes affected by the identity function are not erased; their semantic content is simply used in order to preserve the semantics obtained in the previous steps.
⁶ This is correct as long as we don't take a semantic representation of tense and aspect into consideration.

5.3 Reflexives

Let us now try to enrich this lexicon by considering other phenomena, like reflexive pronouns. The assignment for himself is given in figure 5, where the semantic type of himself is assumed to be ((e → (e → t)) → (e → t)). For paul shaves himself we obtain, as the syntactic tree, something similar to the tree obtained for our first little example (peter loves mary); the semantic tree is given in figure 5.
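The assumed term for himself can be checked the same way as in section 5.1 (our sketch):

    lift = lambda ind: lambda u: u(ind)
    shave = lambda x: lambda y: ("shave", y, x)   # lambda x. lambda y. shave(y, x)
    himself = lambda u: lambda z: u(z)(z)         # lambda u. lambda z. u(z, z)

    paul = lift("paul")
    # himself(shave) reduces to lambda z. shave(z, z):
    print(paul(himself(shave)))  # ('shave', 'paul', 'paul'), i.e. shave(paul, paul)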

6 Remarks on parsing and learning

In our setting, parsing is reduced to proof search; it is even optimized proof search: indeed, the restrictions on types and on the structure of proofs imposed by the shortest move principle and the absence of introduction rules considerably reduce the search space, and yield a polynomial algorithm. Nevertheless, this is so when traces are known: otherwise one has to explore the possible places of these traces. Here we did focus on the interface with semantics.

Another excellent property of categorial grammars is that they allow, especially when there are no introduction rules, for learning algorithms, which are quite efficient when

Figure 5: Computing a semantic recipe: shave himself

Lexicon:
shaves ::= [/shaves/ : ε : ((k\ip)/vp)] ⊙ [ε : λx.λy.shave(y, x) : ((k\(d\vp))/d)]
himself ::= [ε : λu.λz.u(z, z) : k] ⊗ [/himself/ : x : d]

Semantic tree:

shave(paul, paul) : t
|- λu.u(paul) : ((e → t) → t)
`- λz.shave(z, z) : (e → t)²   [stretching: abstraction of z²]
   `- shave(z, z) : t
      |- λz.shave(z, z) : (e → t)
      |  |- λu.λz.u(z, z) : ((e → (e → t)) → (e → t))
      |  `- λx.λy.shave(y, x) : (e → (e → t))¹
      `- z : e²

applied to structured data. This kind of algorithm applies here as well when the examples are derivations. Indeed, the algorithm consists in computing a most general typing for a derivation and then unifying the types of the same word in different examples or positions. Applied to our derivations this learning algorithm works just the same: there are also most general types for derivations, and unification works just the same. Nevertheless, because of movement, learning from strings, which is possible for usual categorial grammars by trying every possible derivation, is much more complicated.
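For the unification step, a toy first-order unifier over the type constructors of the first sketch suffices to convey the idea (our illustration of the general recipe, not the authors' algorithm; type variables are plain strings and the occurs-check is omitted):

    def unify(s, t, subst=None):
        """Most general unifier of two types, or None."""
        subst = dict(subst or {})
        def walk(u):
            while isinstance(u, str) and u in subst:
                u = subst[u]
            return u
        def go(u, v):
            u, v = walk(u), walk(v)
            if u == v:
                return True
            if isinstance(u, str):
                subst[u] = v
                return True
            if isinstance(v, str):
                subst[v] = u
                return True
            if isinstance(u, At) or isinstance(v, At):
                return False                    # distinct atoms never unify
            if type(u) is type(v):              # same connective: unify components
                return go(u.a, v.a) and go(u.b, v.b)
            return False
        return subst if go(s, t) else None

    # Two typings of the same word inferred from two structured examples:
    t1 = Over(Under(At("k"), At("vp")), "X1")
    t2 = Over("X2", At("d"))
    print(unify(t1, t2))  # X2 bound to Under(At('k'), At('vp')), X1 to At('d')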

7 Conclusion

In this paper, we have tried to bridge a gap between the minimalist program and the logical view of categorial grammar. We thus obtained a description of minimalist grammars which is quite formal and allows for a better interface with semantics, as well as some usual algorithms for parsing and learning.

References


Noam Chomsky. 1995. The Minimalist Program. MIT Press, Cambridge, MA.

Philippe de Groote. 1996. Partially commutative linear logic: sequent calculus and phase semantics. In Michele Abrusci and Claudia Casadio, editors, Third Roma Workshop: Proofs and Linguistic Categories: Applications of Logic to the Analysis and Implementation of Natural Language, pages 199-208. CLUEB, Bologna.

Joachim Lambek. 1958. The mathematics of sentence structure. American Mathematical Monthly, 65:154-169.

Michael Moortgat. 1996. Categorial type logics. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language, chapter 2, pages 93-177. North-Holland Elsevier, Amsterdam.

Christian Retoré and Edward Stabler. 1999. Resource logics and minimalist grammars: introduction. In Christian Retoré and Edward Stabler, editors, Resource Logics and Minimalist Grammars, European Summer School in Logic, Language and Information, Utrecht. FoLLI. RR-3780, http://www.inria.fr/RRRT/publicationseng.html.

Edward Stabler. 1997. Derivational minimalism. In Christian Retoré, editor, Logical Aspects of Computational Linguistics, LACL'96, volume 1328 of LNCS/LNAI, pages 68-95. Springer-Verlag.