Artificial Intelligence
Natural Language Processing
Hoá NGUYEN
College of Technology, Vietnam National University, Hanoi

6 May 2006


Agenda
- Introduction: From words to meaning
- Difficulties
- Understanding
- Generation
- Applications: Spoken Dialogue System

AI++ - Hoá NGUYEN @ 2006


Introduction
- What is NLP?
  - Systems using natural language as a modality to interact with users
- NLP concerns:
  - Understanding spoken and written language.
  - Generating written and spoken language.
[Figure: Language / Systems / Language]

Alternative Views on NLP
- Computational models of human language processing
  - Programs that operate internally the way humans do
- Computational models of human communication
  - Programs that interact like humans
- Computational systems that efficiently process text and speech

Language Engineering
[Figure: levels of linguistic analysis (Discourse, Pragmatic, Semantic, Syntactic, Lexical, Phonetic, Articulatory, Acoustic), with regions labelled Speech Applications and Core speech technologies]

Applications…
- Translation

Tout ce que vous produisez pour credit dans ce cours doit etre votre propre travail. Vous pouvez parler avec les autres etudiants (et les professeurs) de votre approche du probleme, mais ensuite vous devez resoudre le probleme par vous-meme. Ce n’est pas seulement la facon la plus ethique d’apprendre le contenu de cette classe, mais aussi la plus efficace.

Everything you do for credit in this subject is supposed to be your own work. You can talk to other students (and instructors) about approaches to problems, but then you should sit down and do the problem yourself. This is not only the ethical way but also the only effective way of learning the material.

All that you produce for credit in this course must be your own work. You can speak with the other students (and the professors) about your approach about the problem, but then you must solve the same problem by you. It is not only the most ethical way to learn the contents from this class, but also most effective.

Applications…
- Information Extraction: map a document collection to a structured database

Firm XYZ is a full service advertising agency specializing in direct and interactive marketing. Located in Bigtown CA, Firm XYZ is looking for an Assistant Account Manager to help manage and coordinate interactive marketing initiatives for a marquee automative account. Experience in online marketing, automative and/or the advertising field is a plus.
Assistant Account Manager Responsibilities
Ensures smooth implementation of programs and initiatives
Helps manage the delivery of projects and key client deliverables ...
Compensation: $50,000-$80,000
Hiring Organization: Firm XYZ

INDUSTRY   Advertising
POSITION   Assistant Account Manager
LOCATION   Bigtown, CA.
COMPANY    Firm XYZ
SALARY     $50,000-$80,000
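The mapping from the job advert to the database row can be illustrated with a few hand-written patterns. This is only a minimal sketch in Python: the regular expressions in `PATTERNS` and the function name `extract` are assumptions made for this one advert, not a method given in the slides, and the INDUSTRY column is left out because it cannot be read off a fixed phrase.

```python
import re

# Shortened copy of the advert shown on the slide.
AD = (
    "Firm XYZ is a full service advertising agency specializing in direct and "
    "interactive marketing. Located in Bigtown CA, Firm XYZ is looking for an "
    "Assistant Account Manager to help manage and coordinate interactive "
    "marketing initiatives. "
    "Compensation: $50,000-$80,000 Hiring Organization: Firm XYZ"
)

# Hypothetical patterns, one per database column.
PATTERNS = {
    "POSITION": r"looking for an? ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "LOCATION": r"Located in ([A-Z][a-z]+ [A-Z]{2})",
    "COMPANY": r"Hiring Organization: (.+?)$",
    "SALARY": r"Compensation: (\$[\d,]+-\$[\d,]+)",
}


def extract(text):
    """Return one database row (a dict) for a single advert."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


print(extract(AD))
```

Patterns like these break as soon as the wording changes, which is one reason information extraction is treated as an NLP problem rather than a string-matching one.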

Applications…
- Text summarization

Applications
- Conversational Agent
  S1: Hello. You’ve reached the [Communicator.] Tell me your name
  U2: Hi I’d like to fly to Seattle Tuesday morning
  S3: Travelling to Seattle on Tuesday, August 11th in the morning.
  U4: Yes.
  S5: Your full name?
  U6: John Doe
  …
- Other NLP applications
  - Grammar Checking
  - Sentiment Classification
  - Report Generation
  - …


Why is NLP hard?
- A lot of difficult problems
  - ambiguity
  - anaphora
  - indexicality
  - discourse structure
  - metonymy
  - metaphor
  - …
- Example: “At last, a computer that understands you like your mother”

Ambiguity
- Acoustic level (speech recognition)
  - “ . . . a computer that understands you like your mother”
  - “ . . . a computer that understands you lie cured mother”
- Syntactic level

Ambiguity
- Word sense ambiguity – Semantic meaning level
  - Two definitions of “mother”
    - a woman who has given birth to a child
    - a stringy slimy substance consisting of yeast cells and bacteria; is added to cider or wine to produce vinegar
- Discourse level
  - Alice says they’ve built a computer that understands you like your mother. But she doesn’t know any details
  - Anaphora problem!

Anaphora
- Using pronouns to refer back to entities already introduced in the text
  - After Mary proposed to John, they found a preacher and got married.
  - For the honeymoon, they went to Hawaii
  - Mary saw a ring through the window and asked John for it
  - Mary threw a rock at the window and broke it

Other problems
- Indexicality: Indexical sentences refer to the utterance situation (place, time, speaker/hearer, etc.)
  - I am over here
  - Why did you do that?
- Metonymy: Using one noun phrase to stand for another
  - I've read Shakespeare
  - Chrysler announced record profits
  - The ham sandwich on Table 4 wants another beer
- Metaphor: “Non-literal” usage of words and phrases, often systematic:
  - I've tried killing the process but it won't die. Its parent keeps it alive.


Knowledge Required
- What knowledge do we (as humans) use to make sense of language?
  - Knowledge of how words sound
    - “cat” == “c” “a” “t”
  - Knowledge of how words can be composed into sentences (grammar)
    - The cat sat on the mat OK
    - sat mat cat on the NO
  - Knowledge of people, events, the world, types of text.
    - Recognizing adverts for what they are.
    - Understanding indirect requests: “I don’t quite understand this” as a request for help.

Components of NL
- Sound structure: phonetics, phonology
- Word structure: morphology and morphophonemics
- Phrase structure – syntax: combinations of words
- Semantic structure: meaning of utterance/phrase
- Pragmatic and discourse structure: reasoning about the actions, beliefs, causes, intentions…

Stages of processing
To deal with complexity, we can process language in a series of stages:
- speech recognition
  - using knowledge of how sounds make up words.
- syntactic analysis
  - using grammar of language to get at sentence structure.
- semantic analysis
  - mapping this to meaning
- pragmatics
  - using world knowledge and context to fill in aspects of meaning.

Syntactic Analysis
- We will focus on syntax.
- How do we recognise that a sentence is grammatically correct?
  - The cat sat on the mat. OK
  - On the the sat cat mat. NO.
- More importantly, how do we use knowledge of language structures to assign structure to a sentence (helping in deriving its meaning)?
  - (The large green cat) (sat on (the small mat))
  - Bracketed bits are meaningful subparts.

Grammars
- Grammars define the legal structures of a language.
- We “parse” a sentence using a grammar to:
  - Determine whether it is grammatical.
  - Assign some useful structure/grouping to the sentence.
- We want the words denoting an object to be grouped together, and words denoting actions to be grouped together.

Syntactic Categories
- Grammars based on each word belonging to a particular category:
  - nouns
  - verbs
  - adjectives
  - adverbs
  - articles/determiners
- Example
  The      black      cat   jumps  quickly
  article  adjective  noun  verb   adverb

Larger groupings
- Noun phrase: sequence of words denoting an object.
  - E.g.: the black cat.
- Verb phrase: sequence of words denoting an action. E.g.,
  - jumps quickly
  - runs after the small dog
  - kicks the small boy with the funny teeth
- Note that verb phrases may contain noun phrases.

Simple NL Grammar
- We can write a simple NL grammar using phrase structure rules such as the following:
  - sentence --> nounPhrase, verbPhrase.
  - nounPhrase --> article, adjective, noun.
  - verbPhrase --> verb, nounPhrase.
- This means:
  - A sentence can consist of a noun phrase followed by a verb phrase.
  - A noun phrase can consist of an article, followed by an adjective, followed by a noun.
- Rules define constituent structure.

Parsing
- Using these rules we can determine whether a sentence is legal, and obtain its structure.
- Example: “The large cat eats the small rat” consists of:
  - Noun Phrase: The large cat
  - Verb Phrase: eats the small rat
- The verb phrase in turn consists of:
  - verb: eats
  - Noun Phrase: the small rat

Parse Tree
- This structure can be represented as a tree:

sentence
  noun phrase
    article: The
    adjective: large
    noun: cat
  verb phrase
    verb: eats
    noun phrase
      article: the
      adjective: small
      noun: rat
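The tree for “The large cat eats the small rat” can be built mechanically from the three rules of the simple grammar. The sketch below is one possible Python rendering, not the Prolog used in the course: the `LEXICON` dictionary, the tuple encoding of tree nodes and all function names are assumptions made for illustration.

```python
# Hypothetical lexicon: word -> syntactic category.
LEXICON = {
    "the": "article", "large": "adjective", "small": "adjective",
    "cat": "noun", "rat": "noun", "eats": "verb",
}


def parse_noun_phrase(words, i):
    """nounPhrase --> article, adjective, noun."""
    cats = [LEXICON.get(w.lower()) for w in words[i:i + 3]]
    if cats == ["article", "adjective", "noun"]:
        children = list(zip(cats, words[i:i + 3]))
        return ("nounPhrase", children), i + 3
    return None


def parse_verb_phrase(words, i):
    """verbPhrase --> verb, nounPhrase."""
    if i < len(words) and LEXICON.get(words[i].lower()) == "verb":
        result = parse_noun_phrase(words, i + 1)
        if result:
            np, j = result
            return ("verbPhrase", [("verb", words[i]), np]), j
    return None


def parse_sentence(text):
    """sentence --> nounPhrase, verbPhrase. Returns a tree, or None."""
    words = text.split()
    first = parse_noun_phrase(words, 0)
    if first:
        np, i = first
        second = parse_verb_phrase(words, i)
        if second and second[1] == len(words):  # all words consumed
            return ("sentence", [np, second[0]])
    return None


def show(tree, depth=0):
    """Print the tree as an indented outline."""
    label, rest = tree
    if isinstance(rest, str):
        print("  " * depth + f"{label}: {rest}")
    else:
        print("  " * depth + label)
        for child in rest:
            show(child, depth + 1)


show(parse_sentence("The large cat eats the small rat"))
```

Each function corresponds to one grammar rule, so the nesting of the returned tuples mirrors the constituent structure.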

Parse Tree
- This tree structure gives you groupings of words (e.g., the small rat).
- These are meaningful groupings: considering these together helps in working out what the sentence means.

Parsing
- Basic approach is based on rewriting.
- To parse a sentence you must be able to “rewrite” the “start” symbol (in this case sentence) to the sequence of syntactic categories corresponding to the sentence.
- You can rewrite a symbol using one of the grammar rules if it corresponds to the LHS of a rule. You then just replace it with the symbols in the RHS. e.g.,
  - sentence
  - nounPhrase verbPhrase
  - article adjective noun verbPhrase
  - Etc.
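The rewriting steps can be traced with a few lines of Python. This is a sketch under two assumptions: the grammar is the three-rule grammar given earlier (one rule per symbol, no recursion), and the leftmost rewritable symbol is always expanded first. The names `RULES` and `derive` are illustrative.

```python
# LHS symbol -> RHS symbols, as in the simple grammar.
RULES = {
    "sentence": ["nounPhrase", "verbPhrase"],
    "nounPhrase": ["article", "adjective", "noun"],
    "verbPhrase": ["verb", "nounPhrase"],
}


def derive(start="sentence"):
    """Yield each intermediate form of a leftmost derivation."""
    form = [start]
    yield list(form)
    while True:
        # Find the leftmost symbol that is the LHS of a rule
        # and replace it with that rule's RHS.
        for i, symbol in enumerate(form):
            if symbol in RULES:
                form[i:i + 1] = RULES[symbol]
                yield list(form)
                break
        else:
            return  # only syntactic categories remain


for step in derive():
    print(" ".join(step))
```

The last form produced is the category sequence that a sentence must match to be accepted by this grammar.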

A little more on grammars
- Example grammar will ONLY parse sentences of a very restricted form.
- What about:
  - “John jumps”
  - “The man jumps”.
  - “John jumps in the pond.”
- We need to add extra rules to cover some of these cases

Extended Grammar
  - sentence --> nounPhrase, verbPhrase.
  - nounPhrase --> article, adjective, noun.
  - nounPhrase --> article, noun.
  - nounPhrase --> properName.
  - verbPhrase --> verb, nounPhrase.
  - verbPhrase --> verb.
  (Think how you might handle “in the pond”..?)
- Grammar now parses:
  - John jumps the pond.
- And fails to parse ungrammatical ones like:
  - jumps pond John the
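With several alternatives per symbol, a recogniser has to try each rule in turn. The following Python sketch does this by collecting every word position that each alternative can reach; it is an illustration only, and the small `LEXICON` as well as the names `expand` and `accepts` are assumptions, not part of the slides.

```python
# Each symbol maps to a list of alternative right-hand sides.
GRAMMAR = {
    "sentence": [["nounPhrase", "verbPhrase"]],
    "nounPhrase": [["article", "adjective", "noun"],
                   ["article", "noun"],
                   ["properName"]],
    "verbPhrase": [["verb", "nounPhrase"], ["verb"]],
}

# Hypothetical lexicon: word -> syntactic category.
LEXICON = {"john": "properName", "the": "article", "pond": "noun",
           "man": "noun", "jumps": "verb"}


def expand(symbol, cats, i):
    """Yield every position reachable after matching `symbol` at cats[i:]."""
    if symbol not in GRAMMAR:  # a syntactic category: match one word
        if i < len(cats) and cats[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each alternative rule
        positions = [i]
        for part in rhs:
            positions = [k for j in positions for k in expand(part, cats, j)]
        yield from positions


def accepts(text):
    """True if some derivation of `sentence` consumes every word."""
    cats = [LEXICON.get(w.lower().strip(".")) for w in text.split()]
    return len(cats) in expand("sentence", cats, 0)


print(accepts("John jumps the pond."))   # grammatical under this grammar
print(accepts("jumps pond John the"))    # rejected
```

“John jumps in the pond” is still rejected here, because no rule covers the prepositional phrase.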

NL Grammars
- A good NL grammar should:
  - Cover a reasonable subset of natural language.
  - Avoid parsing ungrammatical sentences
    - (or at least, ones that are viewed as not acceptable in the target application).
  - Assign plausible structures to the sentence, where meaningful bits of the sentence are grouped together.
- But.. The role is NOT to check that a sentence is grammatical. By excluding dodgy sentences the grammar is more likely to get the right structure of a sentence.

More on grammars
- Consider following examples:
  - “John likes.” NOT OK
  - “John jumps.” OK
  - “John jumps in the water.” OK
  - “The small fluffy cat jumps.” OK
  - “John like the cat.” NOT OK
  - “The cats likes John.” NOT OK
  - “The cat on the table likes John.” OK

Better grammar
- Should deal with:
  - Intransitive/Transitive verbs. Former are ones that don’t need a following noun phrase.
  - Prepositional phrases (e.g., in the lake). Preposition followed by noun phrase.
  - Series of adjectives. Recursive rule can be used.
  - Subject-verb agreement. Can add arguments to grammar rules/dictionary entries.
    - sentence --> np(Num), vp(Num).
    - np(Num) --> art, noun(Num).
    - noun(sing) --> [cat].
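The effect of the shared `Num` argument can be imitated in Python by having each constituent return its number feature and requiring the two values to be equal. This is a sketch with assumptions stated plainly: the slide gives no rule for `vp(Num)`, so a single-verb verb phrase is assumed here, and the lexicon entries are invented for the example.

```python
# Hypothetical lexicon: nouns and verbs carry a number feature.
ARTICLES = {"the"}
NOUNS = {"cat": "sing", "cats": "plur"}
VERBS = {"jumps": "sing", "jump": "plur", "likes": "sing", "like": "plur"}


def np_number(words):
    """np(Num) --> art, noun(Num). Return Num, or None if not a noun phrase."""
    if len(words) == 2 and words[0] in ARTICLES:
        return NOUNS.get(words[1])
    return None


def vp_number(words):
    """Assumed rule vp(Num) --> verb(Num). Return Num, or None."""
    if len(words) == 1:
        return VERBS.get(words[0])
    return None


def grammatical(text):
    """sentence --> np(Num), vp(Num): both parts must share the same Num."""
    words = text.lower().split()
    subject, predicate = np_number(words[:2]), vp_number(words[2:])
    return subject is not None and subject == predicate


print(grammatical("The cat jumps"))
print(grammatical("The cats jumps"))
```

In the Prolog grammar the same check happens through unification of `Num`; here it is an explicit equality test.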

Semantics
- Syntax: Uses grammar to structure sentence.
- Semantics: Maps this to a structured representation that can be used in inference (often referred to as sentence meaning).
- Possible representations:
  - SQL. Map “Find me all the students who are taking AI3” to relevant SQL query.
  - Predicate Logic: Map “John loves anyone who is tall” onto relevant statement in predicate logic.
  - Other structured rep: e.g., “case frame”:
      action: loves
      subject: john
      object: mary

Semantics
- How do we get from the parsed sentence to this kind of representation?
- In general rather tricky, but to illustrate the idea we will show how it could be done for “John loves Mary” by adding extra arguments to a Prolog grammar.
- We want to map that sentence to
  - loves(john, mary).
- We will cheat by assuming that the functor of Prolog structured objects can be a variable.
  - Verb(Subject, Object)

Grammar with Semantics
  sentence(Verb(Subject, Object)) --> nounPhrase(Subject), verbPhrase(Verb, Object).
  nounPhrase(Subject) --> properName(Subject).
  verbPhrase(Verb, Object) --> verb(Verb), nounPhrase(Object).
- General idea is that we can “compose” the sentence meaning by working out the “meaning” of the syntactic constituents and sticking the results together somehow.
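The same composition can be sketched in Python, where each parsing function returns the meaning of its constituent and the sentence function sticks the pieces together. This is not the Prolog grammar itself: the two small dictionaries and the tuple `(verb, subject, object)` standing in for the term `Verb(Subject, Object)` are assumptions.

```python
# Hypothetical lexicon: surface word -> constant used in the meaning.
PROPER_NAMES = {"John": "john", "Mary": "mary"}
VERBS = {"loves": "loves"}


def noun_phrase(words):
    """nounPhrase(Subject) --> properName(Subject)."""
    if words and words[0] in PROPER_NAMES:
        return PROPER_NAMES[words[0]], words[1:]
    return None


def verb_phrase(words):
    """verbPhrase(Verb, Object) --> verb(Verb), nounPhrase(Object)."""
    if words and words[0] in VERBS:
        rest = noun_phrase(words[1:])
        if rest:
            obj, remaining = rest
            return VERBS[words[0]], obj, remaining
    return None


def sentence(text):
    """Return (verb, subject, object), or None if the parse fails."""
    subject_part = noun_phrase(text.split())
    if subject_part:
        subject, rest = subject_part
        predicate = verb_phrase(rest)
        if predicate and not predicate[2]:  # no words left over
            verb, obj, _ = predicate
            return (verb, subject, obj)
    return None


def to_term(meaning):
    """Render the meaning in predicate notation."""
    verb, subject, obj = meaning
    return f"{verb}({subject}, {obj})"


print(to_term(sentence("John loves Mary")))
```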

Pragmatics
- But can’t get very far without knowing something about the world, and the context in which a sentence is uttered.
- Pragmatics deals with this.
- Example. Determining referents of pronouns etc.
  - “John likes that blue car. He buys it.”
  - We need context to determine what is being referred to by “that blue car”, “he”, “it”. Then can create meaning: likes(john, car1) and buys(john, car1).

Pragmatics
- Pragmatics is also about what people DO with language.
- Making sense of, and generating language involves mapping language to goals.
  - “Do you have the time?” -> speaker wants to know the time.
  - “When is the last train to London?” -> speaker probably wants to go there.
- We can apply some of our planning ideas to this problem.

Pragmatics and Plans
- As an example of a plan-based approach to language, consider the actions of requesting, informing, asking.
- Referred to as “speech acts”.
- We can describe these as planning operators.
- The preconditions and effects refer to speaker and hearer’s beliefs and desires.
- We use a notation to describe these:
  - knows(Agent, Fact)
  - wants(Agent, State/Action)
- e.g.
  - wants(fred, kiss(fred, mary))
  - knows(fred, loves(mary, joe))
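A speech act written as a planning operator might look like the Python sketch below. The slides do not spell out any operator, so the preconditions and effects chosen for `inform` are purely an assumption for illustration (the speaker must know the fact; afterwards the hearer knows it), and beliefs are stored as plain strings in the `knows(Agent, Fact)` notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    effects: frozenset


def inform(speaker, hearer, fact):
    """Hypothetical 'inform' operator; not taken from the slides."""
    return Operator(
        name=f"inform({speaker}, {hearer}, {fact})",
        preconditions=frozenset({f"knows({speaker}, {fact})"}),
        effects=frozenset({f"knows({hearer}, {fact})"}),
    )


def apply(operator, state):
    """Return the new belief state, or None if a precondition is unmet."""
    if not operator.preconditions <= state:
        return None
    return state | operator.effects


state = frozenset({"knows(fred, loves(mary, joe))"})
act = inform("fred", "john", "loves(mary, joe)")
print(sorted(apply(act, state)))
```

A planner could chain such operators to find which utterances achieve a goal such as knows(john, Fact).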

Putting it all together
- Given sentences like these, spoken by John about Fred:
  - “What is the time?”
  - “He has missed the train.”
- Can now
  - parse the sentence
  - map that to a structured representation that is good for inference.
  - Use context and knowledge of goals/plans to obtain from that:
    - wants(john, know(john, time1)) (where time1 is the time at some instant)
    - believes(john, missed(fred, train2))


Language Generation
- Language processing is also about generation of language.
  - Structured representation --> NL text.
- Simplest generation method is using templates, mapping representation straight to text template (with variables/slots to fill in).
  - loves(X, Y) -> X “loves” Y
  - gives(X, Y, Z) -> X “gives the” Y “to” Z
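The two templates above translate almost directly into Python format strings. A minimal sketch, where the `TEMPLATES` table and the `generate` function are illustrative names:

```python
# One text template per predicate; {0}, {1}, {2} are the slots.
TEMPLATES = {
    "loves": "{0} loves {1}",
    "gives": "{0} gives the {1} to {2}",
}


def generate(predicate, *args):
    """Fill the template registered for `predicate` with its arguments."""
    return TEMPLATES[predicate].format(*args)


print(generate("loves", "John", "Mary"))
print(generate("gives", "John", "book", "Mary"))
```

Every fact yields exactly one sentence in a fixed shape, which is the rigidity that more serious generation methods try to overcome.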

Language Generation
- But much more to language generation in general. Templates are very rigid.
  - Consider “John eats the cheese. John eats the apple. John sneezes. John laughs.”
  - Better as “John eats the cheese and apple, then sneezes. He then laughs.”
- Getting good style involves working out how to map many facts to one sentence, when to use pronouns, when to use “connectives” like “then”.

Language Generation
- Serious language generation involves deciding:
  - What to say.
  - How to order and structure it.
  - How to break it up into sentences.
  - How to refer to objects (using pronouns, and expressions like “the cat” etc).
  - How to express things in terms of grammatically correct sentences.
- Often starting point is a communicative goal


Human Conversation
- Human data is used to inform design of conversational systems
  - scheduling assistant
  - cross-language information access
- Computational questions:
  - how to represent structural information in dialogue?
  - how to compute this representation?

Speech act
- Austin (1962): An utterance is a kind of action
- One utterance – three acts:
  - Locutionary act: the utterance of a sentence with a particular meaning
  - Illocutionary act: the act of asking, answering, promising, etc., in uttering a sentence
  - Perlocutionary act: the (often intentional) production of certain effects upon the thoughts, feelings, or actions of addressee in uttering a sentence

Speech act
- Example: “You can’t do that !”
  - Locutionary force:
    - Imperative
  - Illocutionary force:
    - Protesting
  - Perlocutionary act:
    - Intent to annoy addressee
    - Intent to stop addressee from doing something

Five classes of Speech Acts (Searle, 1975)
- Assertives: committing the speaker to something’s being the case (suggesting, putting forward, swearing, boasting)
- Directives: attempts by the speaker to get the addressee to do something (asking, ordering, requesting)
- Commissives: committing the speaker to a future course of action (promising, planning, vowing, betting, opposing)
- Expressives: expressing the psychological state of the speaker about a state of affairs (thanking, apologizing, welcoming, deploring)
- Declarations: bringing about a different state of the world via the utterance (I resign; You’re fired)

Dialogue act
- An act with associated structural information related to its dialogue function
- Multiple classification schemes have been developed in the past
- These schemes combine ideas from Searle, Austin and others, but details may change from one domain to another
- Ex: Meeting organizing task
  - Two-party scheduling dialogues
  - Speakers were asked to plan a meeting at some future date

Spoken dialogue system
[Figure: a spoken dialogue system, connected to back-end services, mediates a room booking between Mr. Dupont and Mr. Dupuis. Mr. Dupont asks to book the Lafayette room for tomorrow at 10h; the room has already been booked by Mr. Dupuis, so Mr. Dupont asks the system to tell him that he needs it. The system calls Mr. Dupuis, who agrees to let him have the room and asks to move his own reservation to the next day. The system then calls Mr. Dupont back with the result; he asks it to inform all members of the PVE project, and the system agrees to call them immediately.]

Dialogue management
Generic algorithm of speech turn treatment: Fp
Turn π:
- Detect and handle incomprehension,
- Calculate the adequate dialogue strategy,
- Invoke the task manager to perform necessary inference,
- Determine the theme, the goal to refine,
- Generate the acts of system.
Next turn (π+1)

Architecture
[Figure: architecture diagram. Components: Automatic Speech Recognizer, Parser, Pragmatic Interpreter, Dialogue Manager, Task Manager, Generator, TTS. Data labels: Utterance (input and output), Orthographical String, Semantic Schema, Acts, Actions.]

Multisession dialogue
Emergence session
S: Welcome to the meeting organization service of the CLIPS laboratory, I am Melina and you?
D: Good morning, Jean Caelen is speaking
S: Good morning Director! Can I help you?
D: Please book the Lafayette room for me on Thursday from 15 p.m. to 17 p.m.
S: I am sorry, but this room is already reserved! More precisely, Lafayette room was booked by Mr. Yannick Fouquet from 14 p.m. to 16 p.m., and by Mrs. Solange Hollard from 16 p.m. to 17 p.m. on Thursday 7 April 2005
D: Tell them I absolutely need this room and I ask them to change it…
S: OK, I will call them and I will give you the result later.

Multisession dialogue
Negotiation session with P1
S: Hello, here is Melina, the meeting organization service of the CLIPS laboratory! Are you Mr. Yannick Fouquet?
P1: Yes, it’s me…
S: Mr. Jean Caelen has an important 2 hours meeting from 15 p.m. Thursday 7 April 2005. He asks you to let him have the Lafayette room. Would this be possible for you?
P1: Let me see, um, OK, yes, it should be possible.
S: Do you want to put back your meeting to another day? For example, Friday 8 April 2005
P1: No no, book me the Aquarium room please. I also need a video-projector
S: OK, this room has 30 places, and you will have it as well as a videoprojector at 14 p.m. for 2 hours on Thursday 7 April 2005.
P1: Very good!
S: Do you want me to inform the participants?
P1: No, I will do it myself.
S: OK.
P1: Thanks and good bye
S: Thank you for your generosity! Good bye, Mr. Fouquet.

Multisession dialogue
Negotiation session with P2
S: Hello, here is Melina, the meeting organization service of the CLIPS laboratory! Are you Mrs. Solange Hollard?
P2: Yes
S: I am contacting you about your Lafayette room reservation. Mr. Jean Caelen has an important 2 hours meeting from 15 p.m. Thursday 7 April 2005. Could you let him have this room, please?
P2: I am sorry but this is not possible, I already fixed my agenda…
S: Could you move your meeting to another day, Friday 8 April 2005, for example?
P2: No, I don’t want to change my date…
S: I see, I will have to find another solution for him! Thank you, Mrs. Hollard and see you later.

Multisession dialogue
Notification session
S: Hello, here is Melina, the meeting organization service of the CLIPS laboratory! Are you Mr Jean Caelen?
D: Yes
S: Mr. Yannick Fouquet agreed to let you have the Lafayette room at 15 p.m. Thursday 7 April 2005. However, Mrs. Solange Hollard did not agree to modify her reservation! Could you therefore choose another day please?
D: That’s a pity! OK, move my meeting to this Friday from 10a.m. to midday
S: You will have the Lafayette room at your disposition. It has 45 places
D: Very good
S: Would you like me to inform the participants
D: Yes, this would be kind. Inform all members of CLIPS for me please
S: By e-mail or by telephone?
D: By phone, of course.
S: OK, I will announce your meeting to them immediately
D: Thank you and bye
S: Good bye, Mr. Caelen.

Summary
- Natural Language Processing covers understanding and generating spoken and written language, from sentences to large texts.
- Focus on understanding sentences.
  - First step is to parse the sentence to derive its structure.
  - Use grammar rules which define the constituency structure of the language.
  - Parse gives a tree structure which shows how words are grouped together.
  - Analyze pragmatics and discourse to recover as much of the meaning of the text as possible.