Multilingual Opinion Analysis at NTCIR-6

Japanese Organizer: Yohei Seki
Chinese Organizer: Hsin-Hsi Chen
English Organizer: David Kirk Evans

Outline

• Opinion Analysis Task introduction
• Tasks, Data, Annotator Agreement
• Description of my system
• Opinion Analysis Task results
• Analysis of Opinion Holders

Opinion Analysis

• Given a sentence:
• Does it express an opinion?
• Polarity? (Positive, Negative, Neutral)
• Who expresses the opinion?
• Is it relevant to the document set topic?

Opinion Analysis

"Aomori Gov. Morio Kimura on Tuesday banned a ship from carrying highly radioactive waste into a port here, voicing concern that Tokyo may try to turn this tiny fishing village into a permanent nuclear dumping site."

• 2/3 annotators agree: opinionated
• 2/3 annotators agree: relevant to topic "Give information regarding protests against nuclear power."


Corpus Annotation

• Three annotators per document
• ~20 docs per topic (EN, JA), 40 for CH
• 1998~2001 data
• CH annotators: students; JA: news-related; EN: translators & teachers

Feature          Value                                     Required?
Opinionated      YES, NO                                   Yes
Opinion Holder   String; multiple per sentence possible    Yes
Relevant         YES, NO                                   No
Polarity         Positive, Neutral, Negative               No

Corpus Sources

• Japanese: 1998-2001 Yomiuri, Mainichi newspapers
• Chinese: 1998-2001 United Daily News, China Times, China Times Express, Commercial Times, Central Daily News
• English: 1998-2001 Mainichi Daily News, Korea Times, Xinhua

Annotator Training

• JA: 1 topic for training, basic instructions for opinionated / relevant / polarity; 2 adjudication meetings. Checked 2 topics for disagreement and revised answers for higher agreement.
• EN: 1 topic for training, 1 adjudication meeting, same guidelines as JA.
• CH: 1-hour meeting with annotators; explained special cases. Annotators could ask questions about confusing cases, but organizers did not dictate the "answer".

Some Guidelines

• General beliefs and "common sense knowledge" are not opinions
• Expressions of future plans are not opinions
• JA: rules about how to write opinion holders (title, position, affiliation, etc.)

Some Guidelines

• Generally follow Janyce Wiebe, Theresa Wilson, and Claire Cardie (2005). "Annotating expressions of opinions and emotions in language". Language Resources and Evaluation, 39(2-3), pp. 165-210.

Training Data

• 4 sample topics for Chinese and Japanese
• 1 sample topic for English
• Reference to MPQA opinion corpus

Annotator Agreement (Cohen's Kappa)

Lang   Min     Max     Avg.
CH     .0537   .4065   .2328
EN     .1673   .4799   .2943
JA     .5997   .7681   .6740

Annotator Agreement

• EN, JA have consistent annotators
• CH uses 3 annotators from a pool of 7 (per-topic agreement)
• JA high agreement
• EN annotator #3 difficult!

Lang  Pair  Task         Kappa
E     1-2   Opinionated  0.4799
E     1-3   Opinionated  0.1673
E     2-3   Opinionated  0.2357
E     1-2   Relevant     0.2666
E     1-3   Relevant     0.4763
E     2-3   Relevant     0.4143
E     1-2   Polarity     0.4298
E     1-3   Polarity     0.1710
E     2-3   Polarity     0.2247
J     1-2   Opinionated  0.6499
J     1-3   Opinionated  0.6107
J     2-3   Opinionated  0.7919
J     1-2   Relevant     0.4130
J     1-3   Relevant     0.3676
J     2-3   Relevant     0.8576
J     1-2   Polarity     0.5736
J     1-3   Polarity     0.5341
J     2-3   Polarity     0.7734


[Chart: Average Chinese agreement per topic; values range 0.054 ~ 0.406]

[Chart: Average English agreement per topic; values range 0.094 ~ 0.438]

[Chart: English annotators 1-2 agreement per topic; values range 0.2825 ~ 0.7079]

[Chart: Average Japanese agreement per topic; values range 0.3795 ~ 0.9061]

Corpus

Lang  Topics  Docs  Sents   Opin. (Lenient/Strict)  Rel. (Lenient/Strict)
CH    28      843   8,546   62% / 25%               39% / 16%
EN    28      439   8,417   23% / 5%                27% / 11%
JA    30      490   15,279  29% / 22%               64% / 49%

Outline

• Opinion Analysis Task introduction
• Tasks, Data, Annotator Agreement
• Description of my system
• Opinion Analysis Task results
• Analysis of Opinion Holders

English System

• Java component for relevance, feature extraction
• Machine learning component for opinionated, polarity, opinion holders
• Relevant sentences use a simple vector-space model with Rocchio blind relevance feedback (see the sketch below)
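A minimal sketch of the Rocchio step, assuming TF-style sparse term vectors; the class name, weights, and data layout are illustrative assumptions, not the actual system code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of Rocchio blind relevance feedback: expand the topic vector with
// terms from the top-ranked sentences (treated as "blindly" relevant), then
// re-rank all sentences by cosine similarity against the expanded vector.
public class RocchioFeedback {
    // Standard Rocchio weights; the real system's values are not given.
    static final double ALPHA = 1.0, BETA = 0.75;

    static Map<String, Double> expand(Map<String, Double> query,
                                      List<Map<String, Double>> topRanked) {
        Map<String, Double> expanded = new HashMap<>();
        query.forEach((t, w) -> expanded.merge(t, ALPHA * w, Double::sum));
        for (Map<String, Double> sent : topRanked) {
            double norm = BETA / topRanked.size();
            sent.forEach((t, w) -> expanded.merge(t, norm * w, Double::sum));
        }
        return expanded;
    }

    // Cosine similarity between sparse term-weight vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
            na += e.getValue() * e.getValue();
        }
        for (double w : b.values()) nb += w * w;
        return dot == 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

Because the feedback is blind (pseudo-relevance), the top-ranked sentences are simply assumed relevant, so no negative (non-relevant) term appears in the expansion.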

Machine learning

• WEKA toolkit: http://www.cs.waikato.ac.nz/~ml/weka/ (usage sketch below)
• ~40 machine learning classification algorithms
• After the NTCIR-6 evaluation, added a Japanese implementation using the same framework
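A minimal WEKA usage sketch; the ARFF file name is a placeholder, and JRip is just one of the learners mentioned later in this talk:

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

// Load a feature file, train one of WEKA's classifiers, and report
// 10-fold cross-validation results (illustrative, not the system's code).
public class OpinionatedClassifier {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("opinionated-features.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1); // last attribute = Y/N label

        Classifier learner = new JRip();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(learner, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Swapping in another of WEKA's ~40 classifiers is a one-line change, which is what makes trying 25 learners (as described below) practical.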

Training Data

• MPQA corpus (http://www.cs.pitt.edu/mpqa/)
• ~530 documents
• holders, polarity annotated at the phrase level
• adapted this data to a sentence-level task

Pre-processing

• OpenNLP Tools: http://opennlp.sourceforge.net/ (usage sketch below)
• Tokenization, POS tagging, NE extraction for people, locations, organizations
• Gender names: http://www.census.gov/genealogy/names/names_files.html
• List of country names
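A sketch of this pipeline using the current opennlp.tools API; the SourceForge-era release named above had a different interface, so treat this as an approximation. Model file names are the stock OpenNLP downloads:

```java
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.FileInputStream;
import java.util.Arrays;

// Tokenize a sentence and extract PERSON named entities with OpenNLP.
public class Preprocess {
    public static void main(String[] args) throws Exception {
        TokenizerME tokenizer = new TokenizerME(
                new TokenizerModel(new FileInputStream("en-token.bin")));
        NameFinderME personFinder = new NameFinderME(
                new TokenNameFinderModel(new FileInputStream("en-ner-person.bin")));

        String[] tokens = tokenizer.tokenize(
                "Aomori Gov. Morio Kimura on Tuesday banned a ship.");
        for (Span span : personFinder.find(tokens)) {
            System.out.println("PERSON: " + String.join(" ",
                    Arrays.copyOfRange(tokens, span.getStart(), span.getEnd())));
        }
    }
}
```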

Word polarity list

• H. Yu and V. Hatzivassiloglou. "Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences." (EMNLP 2003)
• ~41,000 words with a co-occurrence-based polarity score

Opinionated Features

• 16 features: length, average_valence, max_valence, min_valence, in_quote, has_person, has_org, has_place, has_country, he_count, she_count, he_said, she_said, ne_said, said_ne, location
• A sentence is labeled opinionated if the MPQA sentence has a medium or strong opinion expressed that is not insubstantial

Opinionated learners

• Tried 25 learners for nominal data (Y or N)
• Baseline is majority class (55% accuracy)
• Tried 2 learners for numeric data (1.0 to 0.0)
• 11,134 training instances

Opinionated Learners

[Chart: Accuracy, Precision, and False Positives / 10 for Baseline, JRip, Logistic Model Trees, BayesNet, and Sequential Minimal Optimization (SMO)]

Opinion Holders

• For each sentence, propose that every previously occurring named entity or pronoun could be an opinion holder (candidate-generation sketch below)
• Compute features for each sentence-NE pair
• Match to MPQA opinion holders
• Assign probability of 1/(# MPQA holders)
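An illustrative sketch of the candidate-generation and probability-assignment steps described above; the types, fields, and method names are assumptions, not the actual system's:

```java
import java.util.ArrayList;
import java.util.List;

// Every named entity or pronoun seen up to (and including) the current
// sentence is proposed as a possible opinion holder for that sentence.
class HolderCandidates {
    record Mention(String text, int sentenceIndex, boolean isPronoun) {}

    static List<Mention> candidatesFor(int sentence, List<Mention> docMentions) {
        List<Mention> out = new ArrayList<>();
        for (Mention m : docMentions) {
            if (m.sentenceIndex() <= sentence) {
                out.add(m); // features for each sentence-mention pair computed downstream
            }
        }
        return out;
    }

    // Training labels: a candidate matched against the MPQA holders for a
    // sentence gets probability 1 / (number of MPQA holders).
    static double holderProbability(int numMpqaHolders) {
        return numMpqaHolders > 0 ? 1.0 / numMpqaHolders : 0.0;
    }
}
```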

Holder features

• ~20 features: closeness, isPronoun, totalAppearances, isPerson, isLocation, isCountry, isOrganization, inQuotes, isImplicit, neSaid, saidNE, sentNumPronouns, sentNumPeople, sentHeSaid, sentSheSaid, sentHeTokens, sentSheTokens, sentNESaid, sentSaidNE
• 4 classes: strong, medium, weak, not holder

Holder training data

• 180,777 training instances (features + not holder, weak, medium, or strong holder mapped to 0.0 to 1.0)
• Sampled down and trained on 64,054 instances
• Trained a logistic-regression-based classifier

Outline

• Opinion Analysis Task introduction
• Tasks, Data, Annotator Agreement
• Description of my system
• Opinion Analysis Task results
• Analysis of Opinion Holders

Evaluation Metrics

• Precision, Recall, F-Measure over opinionated, relevant, polarity
• Semi-automatic evaluation of opinion holders (precision, recall, f-measure)
• Multiple approaches developed
• Lenient: 2/3 annotators must agree
• Strict: 3/3 annotators must agree

P, R, F

$P = \frac{sys_{correct}}{sys_{retrieved}}$   $R = \frac{sys_{correct}}{all_{relevant}}$   $F = \frac{2 \times (P \times R)}{P + R}$
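A trivial helper transcribing these definitions directly, shown only to make the lenient/strict evaluation easy to reproduce in code (illustrative, not the evaluation toolkit):

```java
// Precision, recall, and F-measure as defined above.
class Metrics {
    static double precision(int sysCorrect, int sysRetrieved) {
        return (double) sysCorrect / sysRetrieved;
    }
    static double recall(int sysCorrect, int allRelevant) {
        return (double) sysCorrect / allRelevant;
    }
    static double fMeasure(double p, double r) {
        return (p + r == 0) ? 0 : 2 * (p * r) / (p + r); // harmonic mean
    }
}
```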

Results: DKE, Lenient (groups span the Chinese, English, and Japanese subtasks)

Group    Opinionated F
CUHK     .636
NTU      .762
UMCP     .777
ISCAS    .625
GATE     .660
IIT      .419
TUT      .403
Cornell  .427
NII      .427
GATE     .477
ICU-KR   .451
EHBN     .488
NICT     .429
TUT      .589

Relevance F (9 runs): .813, .777, .590, .473, .395, .393, .320, .632, .638
Polarity F (11 runs): .454, .374, .349, .209, .169, .125, .107, .110, .192, .199, .296

Results: DKE, Strict (groups span the Chinese, English, and Japanese subtasks)

Group    Opinionated F
CUHK     .428
NTU      .412
UMCP     .393
ISCAS    .331
GATE     .402
IIT      .125
TUT      .117
Cornell  .125
NII      .131
GATE     .130
ICU-KR   .175
EHBN     .444
NICT     .425
TUT      .497

Relevance F (9 runs): .616, .509, .473, .266, .287, .188, .213, .560, .580
Polarity F (11 runs): .296, .186, .150, .099, .049, .029, .018, .027, .061, .158, .218

Holder evaluation

• Semi-automatic evaluation
• Match system-extracted holders to the annotator holder list, automating the process in some way
• Time-consuming, so only first-priority runs were evaluated

Opinion Holder Results (groups span the Chinese, English, and Japanese subtasks)

Group    Lenient F  Strict F
CUHK     0.697      0.744
NTU      0.272      0.279
UMCP     0.303      0.351
ISCAS    0.430      0.436
GATE     0.227      0.227
IIT      0.266      0.097
TUT      0.153      0.051
Cornell  0.222      0.074
NII      0.094      0.032
GATE     0.180      0.055
ICU-KR   0.346      0.146
EHBN     0.105      0.086
NICT     0.143      0.120
TUT      0.225      0.172

What sorts of things have opinions?

• Look at all Opinion Holders in the human-annotated data
• Use a named entity identification system to tag opinion holders as Person, Organization, Location, etc.
• What is the distribution?

What sorts of things have opinions?

• 5,723 Opinion Holders
• 444 with multiple tags (uncounted)
• 1,850 with no tag
• All holders that occur 3 or more times verified by hand
• e.g. "70 percent of respondents to a telephone poll"

Tag           Count
notag         1850
person        1324
(-author-)    1208
(multiple)    444
location      399
organization  331
group         102
percentage    39
object        26

English Systems

[Table: Features of Opinion Holder detection (EN) for systems I, C, K, G, T, D — syntactic parsing, quote handling, interviews, opinion word lexicon, machine learning (SVM / CRF / regression), semantic features, patterns / rules (auto / hand), coreference resolution, named entity recognition]

Error Types

Error type                devans %  IIT %  TUT %  cornell %  gate %  icu_kr %
auth from prev sentence   33.0      13.8   20.3   11.5       19.3    7.5
missed sent ne holder     17.0      20.1   13.5   17.5       15.6    9.9
should be author          12.0      8.1    7.8    4.0        5.7     7.5
holder is quoted speaker  11.5      12.6   3.1    8.0        8.7     6.2
missed sent nn holder     11.5      11.5   21.4   14.5       13.3    15.5
holder not in sentence    8.0       13.8   5.7    12.5       11.9    18.6
holder is pronoun         3.5       3.5    4.7    7.5        8.7     12.7
transcript style          2.0       2.3    3.1    4.5        0       1.2
incorrectly said author   1.5       12.2   20.3   20.0       15.6    5.6
parse error?              0         1.2    0      0          0.9     3.7
actually correct          0         0      0      0          0       2.5
no answer                 0         0      0      0          0       4.4
incorrect coreference     0         0      0      0          0       3.7

Does content differ across languages?

• Who is expressing opinions?
• Are they positive or negative?
• What are important concepts expressed in opinions?
• Does the above differ by language?

Topic Examination

• Topic 010: "History Textbook Controversies, World War II"
• Examine polarity
• List top opinion holders, polarity
• Use mutual information / log likelihood measures to identify opinionated words

Topic 010 Information

Lang  Docs  Sents        POS         NEU         NEG          Rel.
CH    41    1,641 (547)  198 (12%)   199 (12%)   528 (32%)    966 (59%)
EN    20    774 (258)    8 (1.0%)    57 (7.3%)   224 (28.9%)  359 (46.4%)
JA    20    2,358 (786)  149 (6.3%)  148 (6.3%)  319 (13.5%)  1,269 (53.8%)

Annotated sentences and tags (not lenient or strict standard)

Opinion Holders

Top opinion holders by language (English, Chinese, Japanese), with total opinion count and (positive, neutral, negative) breakdown:

• Author: 60 (1, 32, 17)
• He: 41 (20, 7, 14)
• Author: 127 (28, 42, 54)
• S. Korea: 21 (0, 11, 10)
• Koizumi Junichiro: 24 (11, 4, 9)
• Korea: 37 (1, 19, 18)
• Zhu Bangzao: 20 (0, 18, 2)
• Ministry of Foreign Affairs: 13 (8, 1, 4)
• History book group: 14 (2, 5, 7)
• Hsieh, Chi-ta: 11 (1, 7, 3)
• Hata Ikuhiko: 30 (11, 4, 15)
• Takamori Akinori: 26 (13, 8, 5)
• S. Korean legislators: 7 (0, 6, 1)
• Chu, Te-Lan: 9 (1, 7, 1)
• Tanaka Toshiaki: 23 (3, 16, 4)
• Jp. Ministry of Education: 4 (0, 2, 2)
• Lo, Fu-chen: 8 (4, 0, 4)
• China: 11 (1, 4, 6)

Opinionated Terms

• Mutual Information:

$MI_{w,A} = \log_2 \left( \frac{|A_w|}{|A|} \times \frac{|A| + |B|}{|A_w| + |B_w|} \right)$

• Log Likelihood:

$G^2 = 2\big(a\log a + b\log b + c\log c + d\log d - (a+b)\log(a+b) - (a+c)\log(a+c) - (b+d)\log(b+d) - (c+d)\log(c+d) + (a+b+c+d)\log(a+b+c+d)\big)$
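A direct transcription of the two measures, assuming the usual 2x2 contingency counts (a = occurrences of the word in set A, b = in set B, c and d = all other words in A and B) and the standard convention 0·log 0 = 0; both assumptions are mine, not stated on the slide:

```java
// Term-scoring measures for finding words characteristic of opinionated text.
class TermScores {
    static double xlogx(double x) {
        return x > 0 ? x * Math.log(x) : 0; // define 0 * log(0) = 0
    }

    // Dunning-style log-likelihood ratio, exactly as in the G^2 formula above.
    static double logLikelihoodG2(double a, double b, double c, double d) {
        return 2 * (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
                - xlogx(a + b) - xlogx(a + c) - xlogx(b + d) - xlogx(c + d)
                + xlogx(a + b + c + d));
    }

    // MI_{w,A} = log2( (|Aw| / |A|) * ((|A| + |B|) / (|Aw| + |Bw|)) )
    static double mutualInformation(double aw, double a, double bw, double b) {
        return Math.log((aw / a) * ((a + b) / (aw + bw))) / Math.log(2);
    }
}
```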

Opinionated Terms

English Log Likelihood | English Mutual Information | Japanese Log Likelihood | Japanese Mutual Information
textbook | invaders | 教科書 textbooks | 忠実 faithful
history | denigration | 歴史 history | 不見識 rash
Japanese | blurs | 検定 offic. approval | おかしい strange
textbooks | biased | 修正 revision | いこ (unfair?)
facts | Stage | ない (negation) | 郁 culture progress
Japan | Rally | 韓国 Korea | 許容 permission
Asian | Netizens | 1 | 欺瞞 deception
draft | Cyber | 記述 description | 東京都立大 (Tokyo Metropolitan Univ.)
descriptions | militarists | つくる (to make) | 山住 Yamazumi (person name)
distorted | tragedies | 美化 glorification | 危惧 misgivings

Outline

• Opinion Analysis Task introduction
• Tasks, Data, Annotator Agreement
• Description of my system
• Opinion Analysis Task results
• Analysis of Opinion Holders

Future Evaluations

• Plan to continue the opinion analysis task
• Sub-sentence level (clause)
• Add opinion target
• Add opinion strength

Thank You

• For more information, please see http://research.nii.ac.jp/ntcir