Multilingual Opinion Analysis at NTCIR-6
Japanese Organizer: Yohei Seki
Chinese Organizer: Hsin-Hsi Chen
English Organizer: David Kirk Evans
Outline • Opinion Analysis Task introduction • Tasks, Data, Annotator Agreement • Description of my system • Opinion Analysis Task results • Analysis of Opinion Holders
Opinion Analysis • Given a sentence: • Does it express an opinion? • Polarity? (Positive, Negative, Neutral) • Who expresses the opinion? • Is it relevant to the document set topic?
Opinion Analysis
• "Aomori Gov. Morio Kimura on Tuesday banned a ship from carrying highly radioactive waste into a port here, voicing concern that Tokyo may try to turn this tiny fishing village into a permanent nuclear dumping site."
• 2/3 agree: opinionated
• 2/3 agree: relevant to topic ("Give information regarding protests against nuclear power.")
Corpus Annotation
• Three annotators per document
• ~20 docs per topic (EN, JA); 40 per topic (CH)
• 1998-2001 data
• Annotators: CH students, JA news-related, EN translators & teachers
Feature          Value                                     Req'd?
Opinionated      YES, NO                                   Yes
Opinion Holder   String; multiple per sentence possible    Yes
Relevant         YES, NO                                   No
Polarity         Positive, Neutral, Negative               No
Corpus Sources
• Japanese: 1998-2001 Yomiuri and Mainichi newspapers
• Chinese: 1998-2001 United Daily News, China Times, China Times Express, Commercial Times, Central Daily News
• English: 1998-2001 Mainichi Daily News, Korea Times, Xinhua
Annotator Training
• JA: 1 topic for training; basic instructions for opinionated / relevant / polarity; 2 adjudication meetings; checked 2 topics for disagreement and revised answers for higher agreement
• EN: 1 topic for training, 1 adjudication meeting, same guidelines as JA
• CH: 1-hour meeting with annotators; explained special cases; annotators could ask questions about confusing cases, but organizers did not dictate the "answer"
Some Guidelines
• General beliefs and "common-sense knowledge" are not opinions
• Expressions of future plans are not opinions
• JA: rules about how to write opinion holders (title, position, affiliation, etc.)
Some Guidelines
• Generally follow Janyce Wiebe, Theresa Wilson, and Claire Cardie (2005). "Annotating expressions of opinions and emotions in language". Language Resources and Evaluation, 39(2-3), pp. 165-210.
Training Data • 4 sample topics for Chinese and Japanese • 1 sample topic for English • Reference to MPQA opinion corpus
Annotator Agreement (Cohen's Kappa)

Lang   Min     Max     Avg.
CH     .0537   .4065   .2328
EN     .1673   .4799   .2943
JA     .5997   .7681   .6740
Annotator Agreement
• EN, JA have consistent annotators
• CH uses 3 annotators from a pool of 7 (per-topic agreement)
• JA: high agreement
• EN annotator #3: difficult!

Lang  Pair  Task         Kappa
E     1-2   Opinionated  0.4799
E     1-3   Opinionated  0.1673
E     2-3   Opinionated  0.2357
E     1-2   Relevant     0.2666
E     1-3   Relevant     0.4763
E     2-3   Relevant     0.4143
E     1-2   Polarity     0.4298
E     1-3   Polarity     0.1710
E     2-3   Polarity     0.2247
J     1-2   Opinionated  0.6499
J     1-3   Opinionated  0.6107
J     2-3   Opinionated  0.7919
J     1-2   Relevant     0.4130
J     1-3   Relevant     0.3676
J     2-3   Relevant     0.8576
J     1-2   Polarity     0.5736
J     1-3   Polarity     0.5341
J     2-3   Polarity     0.7734
[Chart: Average Chinese Agreement Per-topic — per-topic kappa values range from 0.054 to 0.406]
[Chart: Average English Agreement Per-topic — per-topic kappa values range from 0.094 to 0.438]
[Chart: Annotator 1-2 English Agreement Per-topic — per-topic kappa values range from 0.2825 to 0.7079]
[Chart: Average Japanese Agreement Per-topic — per-topic kappa values range from 0.3795 to 0.9061]
Corpus

Lang  Topics  Docs  Sents   Opin. (Lenient / Strict)  Rel. (Lenient / Strict)
CH    28      843   8,546   62% / 25%                 39% / 16%
EN    28      439   8,417   23% / 5%                  27% / 11%
JA    30      490   15,279  29% / 22%                 64% / 49%
Outline • Opinion Analysis Task introduction • Tasks, Data, Annotator Agreement • Description of my system • Opinion Analysis Task results • Analysis of Opinion Holders
English System
• Java component for relevance and feature extraction
• Machine learning component for opinionated, polarity, and opinion holders
• Relevant sentences use a simple vector-space model with Rocchio blind relevance feedback (see the sketch below)
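The deck does not show this component's code; the following is a minimal Java sketch of the general technique (cosine scoring over sparse term vectors plus one round of Rocchio blind relevance feedback). All class and method names, and the alpha/beta weights, are illustrative assumptions, not the actual system:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: cosine scoring over sparse term vectors with one Rocchio feedback round. */
public class RocchioRelevance {

    // Cosine similarity between two sparse term-weight vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
            normA += e.getValue() * e.getValue();
        }
        for (double w : b.values()) normB += w * w;
        return dot == 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Blind feedback: q' = alpha*q + beta*centroid(top-ranked docs assumed relevant).
    static Map<String, Double> rocchio(Map<String, Double> query,
                                       List<Map<String, Double>> pseudoRelevant,
                                       double alpha, double beta) {
        Map<String, Double> updated = new HashMap<>();
        for (Map.Entry<String, Double> e : query.entrySet())
            updated.put(e.getKey(), alpha * e.getValue());
        for (Map<String, Double> doc : pseudoRelevant)
            for (Map.Entry<String, Double> e : doc.entrySet())
                updated.merge(e.getKey(), beta * e.getValue() / pseudoRelevant.size(),
                              Double::sum);
        return updated;  // re-rank sentences by cosine(updated, sentenceVector)
    }
}
```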
Machine learning
• WEKA toolkit: http://www.cs.waikato.ac.nz/~ml/weka/
• ~40 machine learning classification algorithms
• After the NTCIR-6 evaluation, added a Japanese implementation using the same framework
Training Data
• MPQA corpus (http://www.cs.pitt.edu/mpqa/)
• ~530 documents
• Holders and polarity annotated at the phrase level
• Adapted this data to a sentence-level task
Pre-processing
• OpenNLP Tools: http://opennlp.sourceforge.net/
• Tokenization, POS tagging, NE extraction for people, locations, organizations (sketch below)
• Gender names: http://www.census.gov/genealogy/names/names_files.html
• List of country names
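As a rough illustration of these steps, here is a sketch using the current Apache OpenNLP API (the deck cites the older SourceForge-era toolkit, whose API differed); the model file names are placeholders for separately downloaded pre-trained models:

```java
import java.io.FileInputStream;
import java.util.Arrays;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

/** Sketch: tokenization, POS tagging, and person-NE extraction for one sentence. */
public class Preprocess {
    public static void main(String[] args) throws Exception {
        TokenizerME tokenizer =
            new TokenizerME(new TokenizerModel(new FileInputStream("en-token.bin")));
        POSTaggerME tagger =
            new POSTaggerME(new POSModel(new FileInputStream("en-pos-maxent.bin")));
        NameFinderME personFinder =
            new NameFinderME(new TokenNameFinderModel(new FileInputStream("en-ner-person.bin")));

        String[] tokens = tokenizer.tokenize("Aomori Gov. Morio Kimura banned a ship on Tuesday.");
        String[] tags = tagger.tag(tokens);          // POS tags, parallel to tokens
        Span[] people = personFinder.find(tokens);   // token spans marking person names
        for (Span s : people)
            System.out.println("PERSON: " + String.join(" ",
                Arrays.copyOfRange(tokens, s.getStart(), s.getEnd())));
    }
}
```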
Word polarity list
• H. Yu and V. Hatzivassiloglou. "Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences." (EMNLP 2003)
• ~41,000 words with a co-occurrence-based polarity score
Opinionated Features
• 16 features: length, average_valence, max_valence, min_valence, in_quote, has_person, has_org, has_place, has_country, he_count, she_count, he_said, she_said, ne_said, said_ne, location (valence feature sketch below)
• A sentence is opinionated if the MPQA sentence has a medium or strong opinion expressed that is not insubstantial
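A minimal sketch of how the valence features might be computed from the Yu & Hatzivassiloglou word-polarity list; the lexicon lookup details and the neutral default for sentences with no lexicon hit are assumptions:

```java
import java.util.Map;

/** Sketch: length and valence features for a tokenized sentence. */
public class ValenceFeatures {

    // polarity: word -> co-occurrence-based polarity score (the ~41,000-word list).
    static double[] compute(String[] tokens, Map<String, Double> polarity) {
        double sum = 0, max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;
        int hits = 0;
        for (String t : tokens) {
            Double v = polarity.get(t.toLowerCase());
            if (v == null) continue;                 // word not in the lexicon
            sum += v;
            hits++;
            if (v > max) max = v;
            if (v < min) min = v;
        }
        double avg = hits == 0 ? 0 : sum / hits;
        if (hits == 0) { max = 0; min = 0; }         // assumed neutral default
        // length, average_valence, max_valence, min_valence
        return new double[] { tokens.length, avg, max, min };
    }
}
```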
Opinionated learners
• Tried 25 learners for nominal data (Y or N)
• Baseline is the majority class (55% accuracy)
• Tried 2 learners for numeric data (0.0 to 1.0)
• 11,134 training instances
[Chart: Accuracy, Precision, and False Positives/10 for Baseline, JRip, Logistic Model Trees, BayesNet, and Sequential Minimal Optimization for SVM]
Opinion Holders
• For each sentence, propose every previously occurring named entity or pronoun as a possible opinion holder
• Compute features for each sentence-NE pair
• Match candidates to MPQA opinion holders
• Assign probability of 1/(# MPQA holders) (see the sketch below)
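A sketch of the candidate-generation and training-target step described above; the Mention class and method names are illustrative, not the actual system:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: every previously seen NE or pronoun is a holder candidate for a sentence. */
public class HolderCandidates {

    static class Mention {
        final String text;
        final int sentence;       // index of the sentence the mention appears in
        final boolean isPronoun;
        Mention(String text, int sentence, boolean isPronoun) {
            this.text = text; this.sentence = sentence; this.isPronoun = isPronoun;
        }
    }

    // All NE/pronoun mentions appearing up to (and including) this sentence qualify.
    static List<Mention> candidatesFor(int sentenceIndex, List<Mention> docMentions) {
        List<Mention> out = new ArrayList<>();
        for (Mention m : docMentions)
            if (m.sentence <= sentenceIndex) out.add(m);
        return out;
    }

    // Training target: candidates matching one of the k annotated MPQA holders
    // share the probability mass equally, 1/k each; non-matches get 0.
    static double targetProbability(boolean matchesAnnotatedHolder, int numAnnotatedHolders) {
        return matchesAnnotatedHolder ? 1.0 / numAnnotatedHolders : 0.0;
    }
}
```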
Holder features
• ~20 features: closeness, isPronoun, totalAppearances, isPerson, isLocation, isCountry, isOrganization, inQuotes, isImplicit, neSaid, saidNE, sentNumPronouns, sentNumPeople, sentHeSaid, sentSheSaid, sentHeTokens, sentSheTokens, sentNESaid, sentSaidNE
• 4 classes: strong, medium, weak, not holder
Holder training data
• 180,777 training instances (features + not holder / weak / medium / strong holder mapped to 0.0-1.0)
• Sampled down and trained on 64,054 instances
• Trained a logistic-regression-based classifier (WEKA sketch below)
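For reference, the WEKA training pattern the deck's toolkit implies looks roughly like the sketch below. The ARFF file name is hypothetical; note also that WEKA's Logistic classifier expects a nominal class, so a faithful reimplementation of the 0.0-1.0 numeric targets might instead discretize them or use a regression learner:

```java
import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

/** Sketch: training a logistic-regression classifier on holder instances with WEKA. */
public class TrainHolderModel {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("holder-instances.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);              // class = last attribute
        Logistic classifier = new Logistic();
        classifier.buildClassifier(data);
        // classifier.distributionForInstance(inst) then yields per-class probabilities
        System.out.println(classifier);
    }
}
```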
Outline • Opinion Analysis Task introduction • Tasks, Data, Annotator Agreement • Description of my system • Opinion Analysis Task results • Analysis of Opinion Holders
Evaluation Metrics
• Precision, Recall, F-Measure over opinionated, relevant, polarity
• Semi-automatic evaluation of opinion holders (precision, recall, F-measure)
• Multiple gold-standard approaches developed:
• Lenient: 2/3 annotators must agree
• Strict: 3/3 annotators must agree
P, R, F

P = syscorrect / sysretrieved
R = syscorrect / allrelevant
F = 2 × (P × R) / (P + R)
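Worked example (hypothetical numbers): a run that retrieves 100 opinionated sentences, 60 of them correct, against 200 gold opinionated sentences scores P = 60/100 = 0.60, R = 60/200 = 0.30, and F = 2 × (0.60 × 0.30) / (0.60 + 0.30) = 0.40.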
Results: DKE Lenient
(Groups span Chinese, English, and Japanese runs)

Group     Opinionated F
CUHK      .636
NTU       .762
UMCP      .777
ISCAS     .625
GATE      .660
IIT       .419
TUT       .403
Cornell   .427
NII       .427
GATE      .477
ICU-KR    .451
EHBN      .488
NICT      .429
TUT       .589

Relevance F (reported for 9 runs): .813, .777, .590, .473, .395, .393, .320, .632, .638
Polarity F (reported for 11 runs): .454, .374, .349, .209, .169, .125, .107, .110, .192, .199, .296
Results: DKE Strict
(Groups span Chinese, English, and Japanese runs)

Group     Opinionated F
CUHK      .428
NTU       .412
UMCP      .393
ISCAS     .331
GATE      .402
IIT       .125
TUT       .117
Cornell   .125
NII       .131
GATE      .130
ICU-KR    .175
EHBN      .444
NICT      .425
TUT       .497

Relevance F (reported for 9 runs): .616, .509, .473, .266, .287, .188, .213, .560, .580
Polarity F (reported for 11 runs): .296, .186, .150, .099, .049, .029, .018, .027, .061, .158, .218
Holder evaluation
• Semi-automatic evaluation
• Match system-extracted holders to the annotators' holder lists, with partial automation of the matching
• Time-consuming; only the first-priority run was evaluated
Opinion Holder Results
(Groups span Chinese, English, and Japanese runs)

Group    Lenient F  Strict F
CUHK     0.697      0.744
NTU      0.272      0.279
UMCP     0.303      0.351
ISCAS    0.430      0.436
GATE     0.227      0.227
IIT      0.266      0.097
TUT      0.153      0.051
Cornell  0.222      0.074
NII      0.094      0.032
GATE     0.180      0.055
ICU-KR   0.346      0.146
EHBN     0.105      0.086
NICT     0.143      0.120
TUT      0.225      0.172
What sorts of things have opinions?
• Look at all Opinion Holders in the human-annotated data
• Use a named entity identification system to tag opinion holders as Person, Organization, Location, etc.
• What is the distribution?
What sorts of things have opinions?
• 5,723 Opinion Holders
• 444 with multiple tags (uncounted)
• 1,850 with no tag
• All holders that occur 3 or more times were verified by hand
• Example holder: "70 percent of respondents to a telephone poll"
Tag            Count
no tag         1,850
person         1,324
(-author-)     1,208
(multiple)     444
location       399
organization   331
group          102
percentage     39
object         26
English Systems: Features of Opinion Holder detection (EN)
[Feature comparison matrix, garbled in extraction: six systems (I, C, K, G, T, D) compared on syntactic parsing, quote handling, interviews, opinion word lexicon, machine learning method (SVM, CRF, SVM, SVM, SVM, regression), semantics, patterns / rules (automatic vs. hand-built), coreference resolution, and named entity recognition]
Error Types (%)

Error Type                devans  IIT   TUT   cornell  gate  icu_kr
auth from prev sentence   33.0    13.8  20.3  11.5     19.3  7.5
missed sent ne holder     17.0    20.1  13.5  17.5     15.6  9.9
should be author          12.0    8.1   7.8   4.0      5.7   7.5
holder is quoted speaker  11.5    12.6  3.1   8.0      8.7   6.2
missed sent nn holder     11.5    11.5  21.4  14.5     13.3  15.5
holder not in sentence    8.0     13.8  5.7   12.5     11.9  18.6
holder is pronoun         3.5     3.5   4.7   7.5      8.7   12.7
transcript style          2.0     2.3   3.1   4.5      0     1.2
incorrectly said author   1.5     12.2  20.3  20.0     15.6  5.6
parse error?              0       1.2   0     0        0.9   3.7
actually correct          0       0     0     0        0     2.5
no answer                 0       0     0     0        0     4.4
incorrect coreference     0       0     0     0        0     3.7
Does content differ across languages?
• Who is expressing opinions?
• Are they positive or negative?
• What are important concepts expressed in opinions?
• Does the above differ by language?
Topic Examination
• Topic 010: "History Textbook Controversies, World War II"
• Examine polarity
• List top opinion holders and their polarity
• Use mutual information / log-likelihood measures to identify opinionated words
Topic 010 Information

Lang  Docs  Sents         POS          NEU          NEG          Rel.
CH    41    1,641 (547)   198 (12%)    199 (12%)    528 (32%)    966 (59%)
EN    20    774 (258)     8 (1.0%)     57 (7.3%)    224 (28.9%)  359 (46.4%)
JA    20    2,358 (786)   149 (6.3%)   148 (6.3%)   319 (13.5%)  1,269 (53.8%)

Annotated sentences and tags (not lenient or strict standard)
Opinion Holders (Topic 010)
[Original slide: three columns of top holders for English, Chinese, and Japanese; the column assignment was partially lost in extraction. Recoverable counts and (POS, NEU, NEG) breakdowns:]

Holder                       Count  (POS, NEU, NEG)
Author                       60     (1, 32, 17)
He                           41     (20, 7, 14)
Author                       127    (28, 42, 54)
S. Korea                     21     (0, 11, 10)
Koizumi Junichiro            24     (11, 4, 9)
Korea                        37     (1, 19, 18)
Ministry of Foreign Affairs  13     (8, 1, 4)
Zhu Bangzao                  20     (0, 18, 2)
History book group           14     (2, 5, 7)
Hsieh, Chi-ta                11     (1, 7, 3)
Hata Ikuhiko                 30     (11, 4, 15)
Takamori Akinori             26     (13, 8, 5)
S. Korean legislators        7      (0, 6, 1)
Chu, Te-Lan                  9      (1, 7, 1)
Tanaka Toshiaki              23     (3, 16, 4)
Jp. Ministry of Education    4      (0, 2, 2)
Lo, Fu-chen                  8      (4, 0, 4)
China                        11     (1, 4, 6)
Opinionated Terms
• Mutual Information:
  MI_{w,A} = log2( |A_w| × (|A| + |B|) / (|A| × (|A_w| + |B_w|)) )
• Log Likelihood:
  G² = 2( a·log(a) + b·log(b) + c·log(c) + d·log(d)
        − (a+b)·log(a+b) − (a+c)·log(a+c) − (b+d)·log(b+d) − (c+d)·log(c+d)
        + (a+b+c+d)·log(a+b+c+d) )
(computation sketch below)
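A small Java sketch computing both scores from a 2x2 contingency table. The cell layout (a = opinionated sentences containing w, b = non-opinionated containing w, c = opinionated without w, d = non-opinionated without w) is an assumption, since the slide does not define a-d:

```java
/** Sketch: MI and log-likelihood (G^2) term scores from a 2x2 contingency table. */
public class OpinionTermScores {

    static double xlogx(double x) { return x == 0 ? 0 : x * Math.log(x); } // 0*log0 := 0

    // MI_{w,A} = log2( |Aw| * (|A| + |B|) / (|A| * (|Aw| + |Bw|)) )
    static double mutualInformation(double a, double b, double c, double d) {
        double opinionated = a + c, nonOpinionated = b + d;   // |A|, |B|
        return Math.log((a * (opinionated + nonOpinionated))
                / (opinionated * (a + b))) / Math.log(2);
    }

    // Dunning's log-likelihood statistic, exactly as on the slide.
    static double logLikelihood(double a, double b, double c, double d) {
        return 2 * (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
                - xlogx(a + b) - xlogx(a + c) - xlogx(b + d) - xlogx(c + d)
                + xlogx(a + b + c + d));
    }
}
```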
Opinionated Terms

English                         Japanese
Log Likelihood  MI              Log Likelihood            MI
textbook        invaders        教科書 (textbooks)         忠実 (faithful)
history         denigration     歴史 (history)             不見識 (rash)
Japanese        blurs           検定 (official approval)   おかしい (strange)
textbooks       biased          修正 (revision)            いこ (unfair?)
facts           Stage           ない (negation)            郁 (culture progress)
Japan           Rally           韓国 (Korea)               許容 (permission)
Asian           Netizens        1                          欺瞞 (deception)
draft           Cyber           記述 (description)         東京都立大 (Tokyo Metropolitan Univ.)
descriptions    militarists     つくる (to make / create)   山住 (Yamazumi, person name)
distorted       tragedies       美化 (glorification)       危惧 (misgivings)
Outline • Opinion Analysis Task introduction • Tasks, Data, Annotator Agreement • Description of my system • Opinion Analysis Task results • Analysis of Opinion Holders
Future Evaluations • Plan to continue opinion analysis task • Sub-sentence level (clause) • Add opinion target • Add opinion strength
Thank You
• For more information please see http://research.nii.ac.jp/ntcir