Sentiment Analysis – Part 1

Sentiment Analysis is the use of natural language processing to determine the polarity of an opinion: whether it is negative, positive or neutral. Such analysis can help organizations gain insight into current trends and take action to improve their products or services. It also has applications outside of tech, such as predicting who will win an election from the balance of opinion on social media. Not everyone is on social media, but if we take a diverse corpus covering different age groups, ethnicities, sexes, religions, nationalities and social classes, we can make reasonably good predictions. A less obvious use case of Sentiment Analysis is identifying people who may be going through depression by looking at their posts, shares and comments, so that help can be offered before the person starts having suicidal thoughts.

Demand for Sentiment Analysis is growing rapidly because it is efficient and widely applicable. These are its most common applications:

1. Social media: Sentiment Analysis is used to determine the opinion of a group of people on a particular topic. Businesses also adopt it to monitor their brands across different social media platforms.

Example: Chevy’s “global positivity system,” which uses IBM Watson to evaluate how positive people are on social media.

2. Call centers: to evaluate customer service performance.

3. Political campaigns: to monitor public opinion on policies and campaigns.

This is how we classify texts in sentiment analysis:

  • Dogs are awesome! (Positive)
  • Cats are boring! (Negative)
  • I have a dog. (Neutral)

Three main approaches:

1. Lexicon-based approach: In this approach, sentiment is determined by analyzing individual words and/or phrases. Emotional dictionaries are often used: the text is searched for emotional lexical items from the dictionary, their sentiment weights are looked up, and an aggregate weight function (for example a sum or an average) is applied. This usually requires powerful linguistic resources (e.g. an emotional dictionary), which are not available in every language. In addition, it is difficult to take context into account.

2. Machine learning approach: Here the task of sentiment analysis is treated as an ordinary text classification problem. A classifier is trained on a labeled text collection and then used to assign positive/negative labels to unlabeled texts. No dictionary is required, and the algorithm learns from the data which features matter. In practice, these methods often achieve high classification accuracy, provided enough labeled data is available.

3. Hybrid approach: an entity-level method is usually used, in which training data is created automatically by assigning unlabeled texts to positive or negative categories based on the sentiment words they contain. The training data is then fed into a classifier.
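To make the lexicon-based and hybrid ideas above more concrete, here is a minimal sketch in plain Python. The `LEXICON` dictionary, its weights, and the helper names (`lexicon_score`, `lexicon_label`, `auto_label`) are all made up for illustration; a real system would use a proper sentiment dictionary and a more careful aggregation function.

```python
import re

# Tiny hypothetical lexicon: word -> sentiment weight.
# These values are assumptions for illustration, not taken from a real dictionary.
LEXICON = {
    "awesome": 2.0,
    "love": 2.0,
    "good": 1.0,
    "boring": -1.5,
    "horrible": -2.0,
    "tired": -1.0,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_score(text, lexicon=LEXICON):
    """Aggregate weight function: sum the weights of lexicon words in the text."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))

def lexicon_label(text, threshold=0.0):
    """Map the aggregated score to a polarity label."""
    score = lexicon_score(text)
    if score > threshold:
        return "pos"
    if score < -threshold:
        return "neg"
    return "neutral"

def auto_label(texts):
    """Hybrid-style step: keep only texts the lexicon labels as pos/neg,
    producing (text, label) pairs that could be fed to a classifier."""
    labeled = [(text, lexicon_label(text)) for text in texts]
    return [(text, label) for text, label in labeled if label != "neutral"]

examples = ["Dogs are awesome!", "Cats are boring!", "I have a dog."]
for sentence in examples:
    print(sentence, "->", lexicon_label(sentence))

# Automatically labeled training pairs for a downstream classifier
print(auto_label(examples))
```

Note that this scorer ignores context entirely: a sentence like "not good" would still come out positive, which is exactly the weakness of the lexicon approach mentioned above.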

Challenges:

1. Named Entity Recognition – deciding what kind of word we are looking at: is it a name, a location or a date? This can be tough because it relies on word features such as whether the first character is capitalized, whether it is a multi-word name, and whether it is a company name or a person's name. To help with this, the algorithm is given extra information, such as the part of speech (noun, verb, adjective, pronoun) of the words preceding and following it. We can also consider a longer window of words (the 3 words before and the 3 words after the word in question).

2. References – “I went to the concert and then to a café, I didn’t like it.” What does “it” refer to?

3. Sarcasm – Consider the following sentence: “My flight’s been delayed. Brilliant!” Most of us will quickly see that the person is being sarcastic (except Sheldon). By applying contextual understanding to the sentence, we can easily identify the sentiment as negative. Without contextual understanding, a machine looking at the sentence above might see the word “brilliant” and categorize it as positive.

4. Abbreviations, lack of capitals, poor spelling, poor punctuation and poor grammar on social media. (You know people invent new words, like covfefe.)
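As a rough illustration of the word features mentioned under Named Entity Recognition, the sketch below builds a feature dictionary for one word from its capitalization, its part of speech, and a window of 3 words on each side. The function name `token_features`, the feature names, the sample sentence and the hand-written part-of-speech tags are all assumptions for this example; in practice the tags would come from a tagger and the features would be fed into a trained model.

```python
def token_features(tokens, pos_tags, i, window=3):
    """Build a feature dict for tokens[i] from its shape and its neighbours."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # first character is capitalized
        "word.isupper": word.isupper(),   # all caps, e.g. acronyms
        "word.isdigit": word.isdigit(),   # plain numbers, useful for dates
        "pos": pos_tags[i],               # part of speech of the word itself
    }
    # Add the words and parts of speech in a window around the target word.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            features[f"{offset:+d}:word.lower"] = tokens[j].lower()
            features[f"{offset:+d}:pos"] = pos_tags[j]
        else:
            # Mark positions that fall outside the sentence.
            features[f"{offset:+d}:pad"] = True
    return features

# Hypothetical sentence with hand-written (assumed) part-of-speech tags.
tokens = ["Maria", "joined", "Acme", "Corp", "in", "March", "2017"]
pos_tags = ["NOUN", "VERB", "NOUN", "NOUN", "ADP", "NOUN", "NUM"]

feats = token_features(tokens, pos_tags, 2)  # features for "Acme"
for name, value in feats.items():
    print(name, "=", value)
```

The neighbouring words carry useful hints here: "Corp" right after "Acme" suggests a company name rather than a person.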

Best Supervised Machine Learning Algorithms

Machine Learning algorithms can be categorized into two groups: supervised and unsupervised methods. Sentiment Analysis is usually treated as a supervised problem: since machines do not understand the meaning of words, labeled data is crucial for an accurate analysis. There are many supervised algorithms out there; I have listed 4 of the most used ones below:

1. Naive Bayes Classifier: usually the baseline for other methods; it uses the least computing power compared to the others.

2. Maximum Entropy Classifier: an exponential (log-linear) classifier. It works by combining the features extracted from the text in a weighted linear fashion and passing the result through an exponential to obtain class probabilities.

3. Decision Tree: it consists of a root, branches and leaves. Each internal node tests a feature, and the decision (the predicted class) is read from the leaf node at the end of each branch.

4. Support Vector Machines: even though SVMs often achieve high accuracy, they come with their own challenges: computational complexity and sensitivity to skewed (imbalanced) datasets.
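To show why Naive Bayes makes such a cheap baseline, here is a small from-scratch sketch. It is a multinomial variant with add-one smoothing, written only for illustration; the class name `TinyNaiveBayes` and the sample sentences are made up, and this is not how any particular library implements it (textblob, for instance, works with "contains(word)" features instead of word counts).

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.label_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()
        for text, label in samples:
            self.label_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                numerator = self.word_counts[label][token] + 1
                score += math.log(numerator / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Small hypothetical training set in the (text, label) format.
train = [
    ("I love this sandwich", "pos"),
    ("this is an amazing place", "pos"),
    ("what an awesome view", "pos"),
    ("I do not like this restaurant", "neg"),
    ("I am tired of this stuff", "neg"),
    ("my boss is horrible", "neg"),
]

model = TinyNaiveBayes().fit(train)
print(model.classify("an amazing view"))
print(model.classify("my boss is tired"))
```

Training is just counting words per label, and classifying is a handful of additions per word, which is why this method needs so little computing power.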

Implementation

Here is an implementation of Sentiment Analysis using a Naive Bayes model (textblob library):

  1. Create the training set:

# Each training example is a (text, label) pair
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg')
]

  2. Create a Naive Bayes classifier, passing the training data to the constructor:

# If the NLTK data textblob relies on is missing, it can usually be
# fetched with: python -m textblob.download_corpora
from textblob.classifiers import NaiveBayesClassifier
cl = NaiveBayesClassifier(train)

  3. Finally, classify the text (positive or negative):

cl.classify("This is an amazing library!")
# 'pos'

  4. Textblob also gives the option to go deeper in sentiment analysis by looking at the most informative features:

cl.show_informative_features(5)
# Most Informative Features
#       contains(my) = True              neg : pos    =      1.7 : 1.0
#       contains(an) = False             neg : pos    =      1.6 : 1.0
#       contains(my) = False             pos : neg    =      1.3 : 1.0
#       contains(place) = False          neg : pos    =      1.2 : 1.0
#       contains(of) = False             pos : neg    =      1.2 : 1.0

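Each line shows a feature and how much more likely it is under one label than the other. This indicates that sentences containing the word “my” but not containing the word “place” tend to be negative. With only ten training sentences these ratios are weak, so treat them as an illustration of the output rather than a real finding.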

Check out this demo and try it yourself for a better understanding of Sentiment Analysis.

In the next part, we will take a deeper look at the various Sentiment Analysis APIs on the market, along with their advantages and disadvantages. Stay tuned!

Machine Learning Developer at TeraTarget and Tech Enthusiast. I am always learning and happy to help, say hi!