When I started looking around for a sentiment analysis package in R, I was at once impressed by sentimentr. According to its creator, Tyler Rinker:
sentimentr is a response to my own needs with sentiment detection that were not addressed by the current R tools. My own polarity function in the qdap package is slower on larger data sets. It is a dictionary lookup approach that tries to incorporate weighting for valence shifters (negation and amplifiers/deamplifiers). Matthew Jockers created the syuzhet package that utilizes dictionary lookups for the Bing, NRC, and Afinn methods as well as a custom dictionary. He also utilizes a wrapper for the Stanford coreNLP which uses much more sophisticated analysis. Jocker’s dictionary methods are fast but are more prone to error in the case of valence shifters. …
And he went on to explain what valence shifters are and why they are important. He gave an example:
library(sentimentr)
mytext <- c(
  'do you like it? But I hate really bad dogs',
  'I am the best friend.',
  'Do you really like it? I\'m not a fan'
)
mytext <- get_sentences(mytext)
sentiment(mytext)
## element_id sentence_id word_count sentiment
## 1: 1 1 4 0.2500000
## 2: 1 2 6 -1.8677359
## 3: 2 1 5 0.5813777
## 4: 3 1 5 0.4024922
## 5: 3 2 4 0.0000000
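To see the valence shifter machinery in isolation, here is a minimal pair of my own devising (not from the package documentation): the same polarized word with and without a negator.
pair <- get_sentences(c('I like it.', 'I do not like it.'))
sentiment(pair)
Judging from the 0.25 score of "do you like it?" above, "like" is positively polarized, so the negated version should come back with its sign flipped.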
Inspired, I also devised some sentences to see how sentiment analysis works:
Sentiment analysis is for the fools.
Sentiment analysis is not for dummies.
Sentiment analysis is not for the masses.
Sentiment analysis is mediocre science.
Sentiment analysis is not the right tool.
They were saved to a text file, "sentiSen.txt". Actually, I had tested a number of sentiment analysis packages on them before I tried the emotion detection software reported in my last post.
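For anyone who wants to follow along, the file can be created from R itself; a small sketch (I actually typed the file by hand):
sents <- c(
  "Sentiment analysis is for the fools.",
  "Sentiment analysis is not for dummies.",
  "Sentiment analysis is not for the masses.",
  "Sentiment analysis is mediocre science.",
  "Sentiment analysis is not the right tool."
)
writeLines(sents, "sentiSen.txt")
With the file in place, the sentences can be read back in and scored: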
txt <- readLines("sentiSen.txt")
# using sentimentr package
library(sentimentr)
s <- get_sentences(txt)
x <- sentiment(s)
Here I am also learning to improve the look of my posts. In the last post, I just used the default format of results (tables) given by R Markdown and tried to change the font size in the Blogger compose page. The results were disastrous, with missing columns and so on, and I discovered that too late to do anything about it. This time I tried the "xtable" and "stargazer" packages to format tables, but finally settled on the "kableExtra" package described here.
library(knitr)
library(kableExtra)
Here is the result from the sentimentr package. To better understand the formatting options of the kableExtra package, you may like to consult the document mentioned above.
kable(x) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), full_width=F,position = "left")
element_id | sentence_id | word_count | sentiment |
---|---|---|---|
1 | 1 | 6 | -0.2041241 |
2 | 1 | 6 | 0.0000000 |
3 | 1 | 7 | 0.0000000 |
4 | 1 | 5 | -0.3354102 |
5 | 1 | 7 | -0.3023716 |
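The scores come out per sentence; to get one number per input line, sentimentr also offers sentiment_by(). A quick sketch (the variable name is mine):
# average sentiment per input line rather than per sentence
sb <- sentiment_by(s)
kable(sb) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), full_width = F, position = "left")
With one sentence per line the averages would simply mirror the table above, but on longer texts the aggregation matters.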
It is really nice to be able to add scrolling for a wide table.
# using SentimentAnalysis package
library(SentimentAnalysis)
y <- analyzeSentiment(txt)
kable(y) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"),font_size = 11) %>%
scroll_box(width = "600px")
WordCount | SentimentGI | NegativityGI | PositivityGI | SentimentHE | NegativityHE | PositivityHE | SentimentLM | NegativityLM | PositivityLM | RatioUncertaintyLM | SentimentQDAP | NegativityQDAP | PositivityQDAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | -0.3333333 | 0.3333333 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.3333333 | 0.3333333 | 0.00 |
3 | 0.0000000 | 0.0000000 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000000 | 0.0000000 | 0.00 |
3 | 0.0000000 | 0.0000000 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000000 | 0.0000000 | 0.00 |
4 | -0.2500000 | 0.2500000 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.2500000 | 0.2500000 | 0.00 |
4 | 0.2500000 | 0.0000000 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2500000 | 0.0000000 | 0.25 |
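The package also ships a convertToDirection() helper that bins these continuous scores into positive/neutral/negative labels; for example, on the General Inquirer column:
# discretize the GI scores into direction labels
convertToDirection(y$SentimentGI)
Judging from the SentimentGI column above, this should label rows 1 and 4 negative, rows 2 and 3 neutral, and row 5 positive.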
# using syuzhet package
library(syuzhet)
nrc_data <- get_nrc_sentiment(txt)
kable(nrc_data) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), font_size = 11, full_width = F,position = "left")
anger | anticipation | disgust | fear | joy | sadness | surprise | trust | negative | positive |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
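Besides the NRC emotion categories, syuzhet exposes the plain polarity dictionaries Rinker mentioned through its get_sentiment() function; a quick comparison sketch:
# one polarity score per sentence, each on its dictionary's own scale
get_sentiment(txt, method = "syuzhet")
get_sentiment(txt, method = "bing")
get_sentiment(txt, method = "afinn")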
# using coreNLP package (needs a local Stanford CoreNLP installation)
library(coreNLP)
libLoc <- paste0("C:/Users/MTNN/Documents/R/win-library/3.5/",
"coreNLP/extdata/stanford-corenlp-full-2015-12-09")
initCoreNLP(libLoc)
Get sentiments with coreNLP:
atxt <- annotateString(txt)
kable(getSentiment(atxt)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width=F, position = "left")
id | sentimentValue | sentiment |
---|---|---|
1 | NA | NA |
2 | NA | NA |
3 | NA | NA |
4 | NA | NA |
5 | NA | NA |
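The NAs suggest that the default initialization did not load the sentiment annotator. If I read the coreNLP documentation correctly, initializing with type = "english_all" loads the full annotator suite, sentiment included; a sketch I have not re-run:
# reinitialize with all annotators, then annotate and score again
initCoreNLP(libLoc, type = "english_all")
atxt <- annotateString(txt)
getSentiment(atxt)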
View the annotation of the text by coreNLP:
print(atxt)
A CoreNLP Annotation: num. sentences: 5 num. tokens: 36
z <- getToken(atxt)
kable(z) %>%
kable_styling(bootstrap_options = c("striped", "hover",
"condensed"), font_size = 11) %>%
scroll_box(height = "230px")
sentence | id | token | lemma | CharacterOffsetBegin | CharacterOffsetEnd | POS | NER | Speaker |
---|---|---|---|---|---|---|---|---|
1 | 1 | Sentiment | sentiment | 0 | 9 | NN | O | PER0 |
1 | 2 | analysis | analysis | 10 | 18 | NN | O | PER0 |
1 | 3 | is | be | 19 | 21 | VBZ | O | PER0 |
1 | 4 | for | for | 22 | 25 | IN | O | PER0 |
1 | 5 | the | the | 26 | 29 | DT | O | PER0 |
1 | 6 | fools | fool | 30 | 35 | NNS | O | PER0 |
1 | 7 | . | . | 35 | 36 | . | O | PER0 |
2 | 1 | Sentiment | sentiment | 37 | 46 | NN | O | PER0 |
2 | 2 | analysis | analysis | 47 | 55 | NN | O | PER0 |
2 | 3 | is | be | 56 | 58 | VBZ | O | PER0 |
2 | 4 | not | not | 59 | 62 | RB | O | PER0 |
2 | 5 | for | for | 63 | 66 | IN | O | PER0 |
2 | 6 | dummies | dummy | 67 | 74 | NNS | O | PER0 |
2 | 7 | . | . | 74 | 75 | . | O | PER0 |
3 | 1 | Sentiment | sentiment | 76 | 85 | NN | O | PER0 |
3 | 2 | analysis | analysis | 86 | 94 | NN | O | PER0 |
3 | 3 | is | be | 95 | 97 | VBZ | O | PER0 |
3 | 4 | not | not | 98 | 101 | RB | O | PER0 |
3 | 5 | for | for | 102 | 105 | IN | O | PER0 |
3 | 6 | the | the | 106 | 109 | DT | O | PER0 |
3 | 7 | masses | mass | 110 | 116 | NNS | O | PER0 |
3 | 8 | . | . | 116 | 117 | . | O | PER0 |
4 | 1 | Sentiment | sentiment | 118 | 127 | NN | O | PER0 |
4 | 2 | analysis | analysis | 128 | 136 | NN | O | PER0 |
4 | 3 | is | be | 137 | 139 | VBZ | O | PER0 |
4 | 4 | mediocre | mediocre | 140 | 148 | JJ | O | PER0 |
4 | 5 | science | science | 149 | 156 | NN | O | PER0 |
4 | 6 | . | . | 156 | 157 | . | O | PER0 |
5 | 1 | Sentiment | sentiment | 158 | 167 | NN | O | PER0 |
5 | 2 | analysis | analysis | 168 | 176 | NN | O | PER0 |
5 | 3 | is | be | 177 | 179 | VBZ | O | PER0 |
5 | 4 | not | not | 180 | 183 | RB | O | PER0 |
5 | 5 | the | the | 184 | 187 | DT | O | PER0 |
5 | 6 | right | right | 188 | 193 | JJ | O | PER0 |
5 | 7 | tool | tool | 194 | 198 | NN | O | PER0 |
5 | 8 | . | . | 198 | 199 | . | O | PER0 |
I found the results interesting, but quite surprising. I had crafted all five of my sentences to express negative sentiment toward "sentiment analysis". Consider two of them:
Sentiment analysis is not for dummies.
Sentiment analysis is not for the masses.
I am for the dummies and the masses, because I identify myself with them. Therefore I would condemn sentiment analysis if it were not for the dummies or the masses; that was the sense I intended. But if it were a sentiment analysis guru uttering these two sentences, he would in fact be expressing positive sentiment for his favorite subject, am I correct? Well, I am enjoying all this.