Twitter has grown to become a very popular platform over the past
few years consisting of 190 million users and 65 million tweets publishing
everyday. Twitter sentiment analysis has been a very famous paradigm for the
past 10 years. This blog makes you acquainted with a new sentiment analysis
technique called “target specific sentiment analysis” which can be used to form
public opinion on elections.
Before going on to the technical aspects , you may ask a very
intuitive question : Why do we need a new sentiment analysis technique? Why
can’t the existing techniques suffice to the needs of forming public opinions?
Well the answer to this question lies in analysing the
state-of-the-art sentiment analysis techniques. Most of them use machine
learning based classifiers (info on classifiers) which do not consider the targets
being talked about in the tweet. But what is a target? A target can be any
entity that is being discussed in the tweet. For example in “Oh, Jon Stewart.
How I love you so.” , the author expresses love for “you” but he/she actually
mean to express love for “John stewart” (which is a target). The problem with
state-of-the-art techniques can become clearer with the following example : “I
will be voting for BJP mainly to remove Congress as a ruling party”. The
overall sentiment of the tweet is positive but there is a negative sentiment
about congress. These examples are very common among the tweets on topics like
politics (Wang
et al). It becomes vital to distinguish and differentiate sentiments
towards major issues and entities being discussed on the topics such as
politics.
Now, the question comes how can the so called “target specific
sentiment analysis” be achieved? Do you need to build a completely new system
or improve the existing state-of-the art techniques? A completely new system
can be made but it is certainly more instinctive to rather add on top of the
existing techniques. What relevant things you should add to the current
techniques? Assuming you know a bit of machine learning (or probably have gone
through the link) one obvious approach can be to add target specific features!
Before going on to the features, the prior and important
task is to identify the targets in a set of tweets. Target identification is
done by using a technique called extended targets(Jiang et al). Why you need extended targets ? It
is needed because people often express their sentiments about a target by
commenting on the things related to target rather than the target itself. For
instance in “I am passionate about Microsoft technologies especially
Silverlight.” , the author expresses a positive sentiment about Microsoft by
referring to Microsoft technologies. Hence the foremost job is to attribute all
the noun phrases as targets. You can identify related references to the targets
as follows -
- All the words
referring to the target can be regarded as extended targets. For example,
“All hail Jon Snow! He is king in the north.” he is a reference to
Jon snow.
- Finding top-k
noun and noun phrases which share a strong affiliation with the target and
adding them as extended targets.
- Identifying head
nouns from the extended targets. For instance, assume we have an extended
target for ‘ jon snow’ as “The night’s watch” we can further add ‘watch’
as the new extended target.
Having
explored the ways to find the targets, now target specific features have to
added. It is a robust and practical approach to take into account the figures
of speech used for the target. Hence the objective of adding features is not
only to be target specific but also figure of speech oriented. Consider a word
w and the target as T (or it’s extended target) , features are added in the
following fashion -
- w is a
transitive verb accompanied by T or any of its extended targets as object,
then feature w_arg2 is added. For instance, for the tweet say “I love
Macbook ” we generate Macbook_arg2 as the feature for the target Macbook.
- w is a
transitive verb and T is accounted as object, then feature w_arg1 is
added.
- w is an
intransitive verb and T is its subject, then feature w_it_arg1 is added.
- w is an
adjective or noun joined via connecting word to T then w_cp_arg1 is added
as feature for target.
There
are some more features mentioned in Jiang et al . After adding these features (related
to a target) one can train classifiers like SVM and analyse the sentiments
using the existing techniques.
Adhering to the title of the blog, consider a system of elections.
The target can be a political party. You want to find how likely is the chance
of this party winning the elections. Here, the target specific sentiment
analysis can come to the rescue. You can analyse the sentiments for that
political party on twitter which can prove as an aid for polling the elections.
Moreover this technique can also be used to find bots on twitter which keep on
creating a positive sentiment for a particular political party. Besides taking
the system to be elections there can be many more systems such as bollywood
(entities can be actors), sports and religion. Hence, target specific sentiment
analysis can have plethora of applications in varied domains and can be a very
worthwhile technique.
Comments
Post a Comment