When I started looking around for a sentiment analysis package in R, I was at once impressed by sentimentr. According to its creator, Tyler Rinker:
sentimentr is a response to my own needs with sentiment detection that were not addressed by the current R tools. My own polarity function in the qdap package is slower on larger data sets. It is a dictionary lookup approach that tries to incorporate weighting for valence shifters (negation and amplifiers/deamplifiers). Matthew Jockers created the syuzhet package that utilizes dictionary lookups for the Bing, NRC, and Afinn methods as well as a custom dictionary. He also utilizes a wrapper for the Stanford coreNLP which uses much more sophisticated analysis. Jocker’s dictionary methods are fast but are more prone to error in the case of valence shifters. …
And he went on to explain what valence shifters are and why they are important. He gave an example:
library(sentimentr)
mytext <- c(
  'do you like it? But I hate really bad dogs',
  'I am the best friend.',
  'Do you really like it? I\'m not a fan'
)
mytext <- get_sentences(mytext)
sentiment(mytext)
## element_id sentence_id word_count sentiment
## 1: 1 1 4 0.2500000
## 2: 1 2 6 -1.8677359
## 3: 2 1 5 0.5813777
## 4: 3 1 5 0.4024922
## 5: 3 2 4 0.0000000
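To see the valence shifter machinery in isolation, here is a minimal pair of my own devising (not from the package documentation): the same polarized word with and without a negator.
pair <- get_sentences(c('I like it.', 'I do not like it.'))
sentiment(pair)
Judging from the 0.25 score of "do you like it?" above, "like" is positively polarized, so the negated version should come back with its sign flipped.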
Inspired, I also devised some sentences to see how sentiment analysis works:
Sentiment analysis is for the fools.
Sentiment analysis is not for dummies.
Sentiment analysis is not for the masses.
Sentiment analysis is mediocre science.
Sentiment analysis is not the right tool.
They were saved to a text file, "sentiSen.txt". Actually, I had tested a number of sentiment analysis packages on them before I tried the emotion detection software reported in my last post.
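For anyone who wants to follow along, the file can be created from R itself; a small sketch (I actually typed the file by hand):
sents <- c(
  "Sentiment analysis is for the fools.",
  "Sentiment analysis is not for dummies.",
  "Sentiment analysis is not for the masses.",
  "Sentiment analysis is mediocre science.",
  "Sentiment analysis is not the right tool."
)
writeLines(sents, "sentiSen.txt")
With the file in place, the sentences can be read back in and scored: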
txt <- readLines("sentiSen.txt")
# using sentimentr package
library(sentimentr)
s <- get_sentences(txt)
x <- sentiment(s)
Here I am also learning to improve the look of my posts. In the last post, I just used the default format of results (tables) given by R Markdown and tried to change the font size in the Blogger compose page. The results were disastrous, with missing columns and so on, and I discovered that too late to do anything about it. This time I tried the "xtable" and "stargazer" packages to format tables, but finally settled on the "kableExtra" package described here.
library(knitr)
library(kableExtra)
Here is the result from the sentimentr package. To better understand the formatting options of the kableExtra package, you may like to consult the document mentioned above.
kable(x) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), full_width=F,position = "left")
element_id | sentence_id | word_count | sentiment |
---|---|---|---|
1 | 1 | 6 | -0.2041241 |
2 | 1 | 6 | 0.0000000 |
3 | 1 | 7 | 0.0000000 |
4 | 1 | 5 | -0.3354102 |
5 | 1 | 7 | -0.3023716 |
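The scores come out per sentence; to get one number per input line, sentimentr also offers sentiment_by(). A quick sketch (the variable name is mine):
# average sentiment per input line rather than per sentence
sb <- sentiment_by(s)
kable(sb) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), full_width = F, position = "left")
With one sentence per line the averages would simply mirror the table above, but on longer texts the aggregation matters.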
It is really nice to be able to add scrolling for a wide table.
# using SentimentAnalysis package
library(SentimentAnalysis)
y <- analyzeSentiment(txt)
kable(y) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"),font_size = 11) %>%
scroll_box(width = "600px")
WordCount | SentimentGI | NegativityGI | PositivityGI | SentimentHE | NegativityHE | PositivityHE | SentimentLM | NegativityLM | PositivityLM | RatioUncertaintyLM | SentimentQDAP | NegativityQDAP | PositivityQDAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | -0.3333333 | 0.3333333 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.3333333 | 0.3333333 | 0.00 |
3 | 0.0000000 | 0.0000000 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000000 | 0.0000000 | 0.00 |
3 | 0.0000000 | 0.0000000 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000000 | 0.0000000 | 0.00 |
4 | -0.2500000 | 0.2500000 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.2500000 | 0.2500000 | 0.00 |
4 | 0.2500000 | 0.0000000 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2500000 | 0.0000000 | 0.25 |
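The package also ships a convertToDirection() helper that bins these continuous scores into positive/neutral/negative labels; for example, on the General Inquirer column:
# discretize the GI scores into direction labels
convertToDirection(y$SentimentGI)
Judging from the SentimentGI column above, this should label rows 1 and 4 negative, rows 2 and 3 neutral, and row 5 positive.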
# using syuzhet package
library(syuzhet)
nrc_data <- get_nrc_sentiment(txt)
kable(nrc_data) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), font_size = 11, full_width = F,position = "left")
anger | anticipation | disgust | fear | joy | sadness | surprise | trust | negative | positive |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
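Besides the NRC emotion categories, syuzhet exposes the plain polarity dictionaries Rinker mentioned through its get_sentiment() function; a quick comparison sketch:
# one polarity score per sentence, each on its dictionary's own scale
get_sentiment(txt, method = "syuzhet")
get_sentiment(txt, method = "bing")
get_sentiment(txt, method = "afinn")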
# using coreNLP package (needs a local Stanford CoreNLP installation)
library(coreNLP)
libLoc <- paste0("C:/Users/MTNN/Documents/R/win-library/3.5/",
"coreNLP/extdata/stanford-corenlp-full-2015-12-09")
initCoreNLP(libLoc)
Get sentiments with coreNLP:
atxt <- annotateString(txt)
kable(getSentiment(atxt)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width=F, position = "left")
id | sentimentValue | sentiment |
---|---|---|
1 | NA | NA |
2 | NA | NA |
3 | NA | NA |
4 | NA | NA |
5 | NA | NA |
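The NAs suggest that the default initialization did not load the sentiment annotator. If I read the coreNLP documentation correctly, initializing with type = "english_all" loads the full annotator suite, sentiment included; a sketch I have not re-run:
# reinitialize with all annotators, then annotate and score again
initCoreNLP(libLoc, type = "english_all")
atxt <- annotateString(txt)
getSentiment(atxt)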
View the annotation of the text by coreNLP:
print(atxt)
A CoreNLP Annotation: num. sentences: 5 num. tokens: 36
z <- getToken(atxt)
kable(z) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), font_size = 11) %>%
scroll_box(height = "230px")
sentence | id | token | lemma | CharacterOffsetBegin | CharacterOffsetEnd | POS | NER | Speaker |
---|---|---|---|---|---|---|---|---|
1 | 1 | Sentiment | sentiment | 0 | 9 | NN | O | PER0 |
1 | 2 | analysis | analysis | 10 | 18 | NN | O | PER0 |
1 | 3 | is | be | 19 | 21 | VBZ | O | PER0 |
1 | 4 | for | for | 22 | 25 | IN | O | PER0 |
1 | 5 | the | the | 26 | 29 | DT | O | PER0 |
1 | 6 | fools | fool | 30 | 35 | NNS | O | PER0 |
1 | 7 | . | . | 35 | 36 | . | O | PER0 |
2 | 1 | Sentiment | sentiment | 37 | 46 | NN | O | PER0 |
2 | 2 | analysis | analysis | 47 | 55 | NN | O | PER0 |
2 | 3 | is | be | 56 | 58 | VBZ | O | PER0 |
2 | 4 | not | not | 59 | 62 | RB | O | PER0 |
2 | 5 | for | for | 63 | 66 | IN | O | PER0 |
2 | 6 | dummies | dummy | 67 | 74 | NNS | O | PER0 |
2 | 7 | . | . | 74 | 75 | . | O | PER0 |
3 | 1 | Sentiment | sentiment | 76 | 85 | NN | O | PER0 |
3 | 2 | analysis | analysis | 86 | 94 | NN | O | PER0 |
3 | 3 | is | be | 95 | 97 | VBZ | O | PER0 |
3 | 4 | not | not | 98 | 101 | RB | O | PER0 |
3 | 5 | for | for | 102 | 105 | IN | O | PER0 |
3 | 6 | the | the | 106 | 109 | DT | O | PER0 |
3 | 7 | masses | mass | 110 | 116 | NNS | O | PER0 |
3 | 8 | . | . | 116 | 117 | . | O | PER0 |
4 | 1 | Sentiment | sentiment | 118 | 127 | NN | O | PER0 |
4 | 2 | analysis | analysis | 128 | 136 | NN | O | PER0 |
4 | 3 | is | be | 137 | 139 | VBZ | O | PER0 |
4 | 4 | mediocre | mediocre | 140 | 148 | JJ | O | PER0 |
4 | 5 | science | science | 149 | 156 | NN | O | PER0 |
4 | 6 | . | . | 156 | 157 | . | O | PER0 |
5 | 1 | Sentiment | sentiment | 158 | 167 | NN | O | PER0 |
5 | 2 | analysis | analysis | 168 | 176 | NN | O | PER0 |
5 | 3 | is | be | 177 | 179 | VBZ | O | PER0 |
5 | 4 | not | not | 180 | 183 | RB | O | PER0 |
5 | 5 | the | the | 184 | 187 | DT | O | PER0 |
5 | 6 | right | right | 188 | 193 | JJ | O | PER0 |
5 | 7 | tool | tool | 194 | 198 | NN | O | PER0 |
5 | 8 | . | . | 198 | 199 | . | O | PER0 |
I found the results interesting, but quite surprising. I had crafted all five of my sentences to express negative sentiment toward "sentiment analysis". Consider two of them:
Sentiment analysis is not for dummies.
Sentiment analysis is not for the masses.
I am for the dummies and the masses, because I identify myself with them. Therefore I would condemn sentiment analysis if it were not for the dummies or the masses; that was the sense I intended. But if it were a sentiment analysis guru uttering these two sentences, he would in fact be expressing positive sentiment for his favorite subject, am I correct? Well, I am enjoying all this.