After collecting the verbatim (open-ended comment) from the survey or other data source:
- We process it with the sentiment tool.
- The tool uses industry-standard VADER sentiment analysis.
- The score assigned ranges from -5 to 5.
- A positive comment scores above 0.
- A negative comment scores below 0.
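As a minimal sketch of the sign rule above, the helper below maps a score to a label. The function name `label_sentiment` is hypothetical, and treating a score of exactly 0 as neutral is an assumption, since the rule above only covers scores above and below 0.

```python
def label_sentiment(score: float) -> str:
    """Map a sentiment score to a label using only its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Assumption: a score of exactly 0 is reported as neutral.
    return "neutral"


for example_score in (3.2, -1.5, 0.0):
    print(example_score, label_sentiment(example_score))
```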
Lexicon and rule-based sentiment analysis
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media and open text. It is fully open-sourced under the MIT License.
This is an overview of the technical aspects of VADER: https://github.com/cjhutto/vaderSentiment
And this is a more theoretical overview of VADER’s application: https://blog.quantinsti.com/vader-sentiment/
VADER is sensitive to both the
- Polarity of a word (whether the sentiment is positive or negative), and the
- Intensity of the emotions associated with the word (how positive or negative is the attributed sentiment)
Valence Scoring
VADER captures both by assigning each word a valence score.
Example valence scores for context-free text (that is, words taken literally 'out of context', with no surrounding text to amplify their face-value meaning) are:
- Positive valence: "okay" is 0.9, "good" is 1.9, and "great" is 3.1
- Negative valence: "horrible" is -2.5, the emoticon ' :( ' is -2.2, and "sucks" and its slang derivative "sux" are both -1.5
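To make the polarity and intensity split concrete, here is a small sketch that looks a token up in a lexicon and reports the sign and magnitude of its valence. The dictionary contains only the values quoted above, and the names `LEXICON` and `describe` are hypothetical and not part of VADER itself.

```python
# Valence values quoted in the text above; the real lexicon is far larger.
LEXICON = {
    "okay": 0.9,
    "good": 1.9,
    "great": 3.1,
    "horrible": -2.5,
    ":(": -2.2,
    "sucks": -1.5,
    "sux": -1.5,
}


def describe(token: str):
    """Return (polarity, intensity) for a known token, or None if unknown."""
    valence = LEXICON.get(token.lower())
    if valence is None:
        return None
    polarity = "positive" if valence > 0 else "negative"
    return polarity, abs(valence)


for token in ("great", "okay", "sux", ":(", "table"):
    print(token, describe(token))
```

Polarity comes from the sign of the valence and intensity from its magnitude, so "great" and "okay" share a polarity but differ in intensity.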
Note that positive, negative and neutral proportions represent the "raw categorization" of each lexical item (e.g., words, emoticons/emojis, or initialisms) into positive, negative, or neutral classes.
They do not account for the VADER rule-based enhancements such as:
- word-order sensitivity for sentiment-laden multi-word phrases
- degree modifiers
- word-shape amplifiers
- punctuation amplifiers
- negation polarity switches, or
- contrastive conjunction sensitivity.
How does VADER calculate the Valence score of an input sentence?
VADER makes use of certain rules to incorporate the impact of each sub-text on the perceived intensity of sentiment in sentence-level text.
Five Heuristics of VADER:
- Punctuation: the exclamation point (!) increases the magnitude of the intensity without modifying the semantic orientation. For example: “The weather is hot!!!” is more intense than “The weather is hot.”
- Capitalization: using ALL-CAPS to emphasize a sentiment-relevant word in the presence of other non-capitalized words increases the magnitude of the sentiment intensity without affecting the semantic orientation. For example: “The weather is HOT.” conveys more intensity than “The weather is hot.”
- Degree modifiers (also called intensifiers, booster words, or degree adverbs) impact sentiment intensity by either increasing or decreasing it. For example: “The weather is extremely hot.” is more intense than “The weather is hot.”, whereas “The weather is slightly hot.” reduces the intensity.
- Polarity shift due to conjunctions: the contrastive conjunction “but” signals a shift in sentiment polarity, with the sentiment of the text following the conjunction being dominant. For example: “The weather is hot, but it is bearable.” has mixed sentiment, with the latter half dictating the overall rating.
- Catching polarity negation: by examining the contiguous sequence of three items preceding a sentiment-laden lexical feature, VADER catches nearly 90% of cases where negation flips the polarity of the text.
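The five heuristics can be illustrated with a deliberately simplified toy scorer. This is a sketch and not the vaderSentiment implementation: the function `toy_score`, the tiny word lists, and the numeric constants are all assumptions chosen for illustration (the constants are loosely modelled on values in the reference implementation and should not be relied on), and it skips the final normalization step that the real tool applies.

```python
import string

# Illustrative data only; the real lexicon and word lists are much larger.
LEXICON = {"okay": 0.9, "good": 1.9, "great": 3.1, "horrible": -2.5}
BOOSTERS = {"extremely": 0.293, "very": 0.293, "slightly": -0.293}
NEGATIONS = {"not", "never", "no"}

# Assumed constants for this sketch.
CAPS_BOOST = 0.733      # extra intensity for an ALL-CAPS sentiment word
EXCLAIM_BOOST = 0.292   # extra intensity per "!", capped at four
NEGATION_SCALE = -0.74  # flips and dampens a negated word
BEFORE_BUT, AFTER_BUT = 0.5, 1.5


def toy_score(text: str) -> float:
    words = [w.strip(string.punctuation) for w in text.split()]
    lowered = [w.lower() for w in words]
    alpha = [w for w in words if w.isalpha()]
    # ALL-CAPS counts as emphasis only when other words are not capitalized.
    mixed_case = any(w.isupper() for w in alpha) and not all(
        w.isupper() for w in alpha
    )

    scored = []  # (position, adjusted valence) for each sentiment word
    for i, word in enumerate(lowered):
        valence = LEXICON.get(word)
        if valence is None:
            continue
        sign = 1 if valence > 0 else -1
        if mixed_case and words[i].isupper():
            valence += sign * CAPS_BOOST
        window = lowered[max(0, i - 3):i]  # the three preceding items
        for prev in window:
            valence += sign * BOOSTERS.get(prev, 0.0)
        if any(prev in NEGATIONS for prev in window):
            valence *= NEGATION_SCALE
        scored.append((i, valence))

    # Contrastive "but": text after the conjunction dominates.
    if "but" in lowered:
        pivot = lowered.index("but")
        scored = [
            (i, v * (BEFORE_BUT if i < pivot else AFTER_BUT)) for i, v in scored
        ]

    total = sum(v for _, v in scored)
    # Exclamation marks amplify magnitude without changing direction.
    if total != 0:
        bump = min(text.count("!"), 4) * EXCLAIM_BOOST
        total += bump if total > 0 else -bump
    return total


for sentence in (
    "The food is good",
    "The food is good!!!",
    "The food is GOOD",
    "The food is extremely good",
    "The food is slightly good",
    "The food is not good",
    "The food is good but the service is horrible",
):
    print(f"{toy_score(sentence):7.3f}  {sentence}")
```

Running the loop shows the direction of each effect: exclamation marks, capitals and "extremely" raise the score relative to the plain sentence, "slightly" lowers it, and "not" or a negative clause after "but" pushes the total below zero.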
In short, details matter:
- punctuation: how many exclamation marks are used, if any
- emoticons/emojis: the algorithm recognizes these, as well as common slang words (like ‘meh’ and ‘bleh’)
- capitals: words written in all caps are scored differently
So, for example, a group of sentences that appear identical at first glance, but in fact differ in punctuation and emphasis, can be scored quite differently, as in this example: