Sentiment Analysis of Movie Reviews

Nowadays, a huge amount of information is generated by sources such as Facebook, Twitter, Quora, movie sites (where audiences give feedback about movies) and various discussion forums. To manage this bulk of information, we need to categorize it by content so that people can easily infer the overall sentiment on a subject and judge whether it is useful to them. For example, one can decide whether to buy a particular product after reading other customers' reviews on a shopping website, or decide whether to watch a movie after skimming other viewers' reviews of it.

Sentiment classification is a prominent research area because it is used in many fields, such as business analytics, recommender systems, shopping websites, surveys, feedback analysis and intelligent applications, and because improvements can be made based on the sentiments it uncovers.

Sentiment analysis can be classified into three different levels:
  • Document level
  • Sentence level
  • Entity-aspect level

First, the entity is analyzed to understand the sentiment at the micro level, followed by the sentence level, which captures the sentiment at an intermediate level. Finally, the document as a whole is considered to extract the overall sentiment. The sentiment analysis process is outlined below.

Audiences post their reviews or opinions in unstructured form. This unstructured data is converted into a structured form, and important features are extracted after some data preprocessing. Machine learning techniques are then applied to classify the sentiments according to polarity.
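As a rough illustration of the first step, the sketch below turns one unstructured review into a structured bag-of-words count. The stop-word list and the function names `preprocess` and `to_features` are assumptions made for this example, and real preprocessing pipelines usually do more (stemming, negation handling and so on).

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "was", "is", "and", "of", "to", "it"}

def preprocess(review: str) -> List[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def to_features(review: str) -> Counter:
    """Turn one unstructured review into a structured bag-of-words count."""
    return Counter(preprocess(review))

features = to_features("The acting was great, and the music was great too!")
print(features)
```

Each review becomes a mapping from word to count, which is the kind of structured input a classifier can work with.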

To understand the problem from a practical point of view, we consider the domain of movie reviews. We look at the problem of classification not by topic but by sentiment, and we classify the sentiment of movie reviews by applying machine learning techniques. Movie reviews affect everyone from the audience to movie critics and, finally, the movie makers. With accurate analysis, all of these groups can make better decisions and use the results effectively, for example to earn profits. However, classifying movie reviews by sentiment is much more challenging than text-based classification, because sentiment can be expressed in very subtle ways. The same topic can be discussed in different ways, and words used to express a positive sentiment in one sentence can express a negative sentiment in another. The meaning of a word may vary depending on the words that surround it in the sentence.

Sarcasm also contributes to the ambiguity of sentiment, and it is difficult to pin down the exact tone of a sentence. For example: “The movie was supposed to be hilarious!” Here the sentiment is positive in one way (the word “hilarious”) and negative in another (the implication that the movie fell short).

Consider the sentence “The movie Interstellar was visually a treat, but the storyline was terrible.” One can see how difficult it is to categorize this sentence as negative, positive or neutral. The phrases “visually a treat” and “storyline was terrible” can be considered positive and negative respectively, but the degree of their positiveness and negativeness is somewhat ambiguous. Hence, sentiment-based classification is more difficult than text-based classification, as it requires an understanding of the sentiments expressed.
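To make the ambiguity concrete, here is a minimal sketch that scores each clause of the example sentence against a word-polarity lexicon. The two-entry `LEXICON`, its weights of +1 and -1, and the function `clause_scores` are invented for illustration; they are not taken from any published lexicon.

```python
import re

# Hypothetical polarity lexicon, invented for illustration only.
LEXICON = {"treat": 1, "terrible": -1}

def clause_scores(sentence: str) -> list:
    """Split on the contrastive 'but' and sum lexicon polarities per clause."""
    clauses = re.split(r",?\s+but\s+", sentence.lower())
    scores = []
    for clause in clauses:
        words = re.findall(r"[a-z]+", clause)
        polarity = sum(LEXICON.get(w, 0) for w in words)
        scores.append((clause.strip(" ."), polarity))
    return scores

sentence = "The movie Interstellar was visually a treat, but the storyline was terrible."
per_clause = clause_scores(sentence)
overall = sum(score for _, score in per_clause)

for clause, score in per_clause:
    print(f"{score:+d}  {clause}")
print("overall:", overall)
```

With equal weights the two clauses cancel out, so a naive sum cannot tell a mixed review from a neutral one. The result depends entirely on how strongly each phrase is weighted.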

Classification Approaches for Sentiment Analysis:

Before performing the classification task, various feature extraction and selection methods, such as TF-IDF (Term Frequency–Inverse Document Frequency), IG (Information Gain), MI (Mutual Information), feature vectors, and unigram, bigram and n-gram representations, can be used to extract important features from the movie review dataset. Machine learning techniques can then be applied.
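As one possible illustration of TF-IDF over unigrams and bigrams, the sketch below uses the plain definitions tf = (term count) / (terms in the document) and idf = log(N / df). Libraries often apply smoothed or normalized variants, so their numbers will differ. The helper names `ngrams` and `tfidf` and the three toy reviews are assumptions made for this example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n_values=(1, 2)):
    """Compute unsmoothed TF-IDF weights per document over the given n-gram sizes."""
    term_lists = []
    for doc in docs:
        tokens = doc.lower().split()
        term_lists.append([g for n in n_values for g in ngrams(tokens, n)])

    # Document frequency: the number of documents that contain each term.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)

    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights

# Toy reviews, invented for illustration.
reviews = ["great movie", "terrible movie", "great acting in this movie"]
weights = tfidf(reviews)
for term, weight in sorted(weights[0].items()):
    print(f"{term!r}: {weight:.3f}")
```

In this toy corpus the word "movie" appears in every review, so its weight is zero, while the bigram "great movie" appears in only one review and receives the highest weight in that review.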

The following machine learning techniques can be used for sentiment classification of movie reviews:
  • Naïve Bayes classification
  • Maximum Entropy classification
  • Support Vector Machines
  • Random Forest
  • Gradient Boosting
  • Decision Trees
  • Neural Networks
  • Deep Learning
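
To illustrate the first technique in the list, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The class name `NaiveBayes`, the four training sentences and their labels are invented for this example. A real experiment would use a labeled review corpus and a tested library implementation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Score = log P(class) + sum of log P(word | class).
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for illustration; not a real dataset.
train_texts = [
    "a great and moving film",
    "great acting and a wonderful story",
    "a terrible and boring film",
    "boring plot and terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("wonderful film with great acting")
print(prediction)
```

The other techniques in the list could be tried on the same features and labels, which makes it straightforward to compare their results.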

The results obtained from these machine learning techniques are generally good and compare well with human-generated baselines. Beyond the techniques listed above, one can try other machine learning techniques in search of better results and then compare the results across techniques.

