Monday, September 10, 2018

Shuffling into YouTube's comment space


Just three or four months back, it was as if I couldn't do much more than peer hard over the railings of YouTube to get some idea of what was in there.

Then a young friend of mine told me about an internet service provider near my place. So I signed up with them for unlimited internet access at a reasonable price and suddenly I was IN! Before, I was happy enough to get a taste of the good things on YouTube, like Myanmar oldies, educational materials, DIY clips, political debates, and sensational news, through my cellphone. But that was expensive. Now I can keep on enjoying such familiar topics for hours on end or just click away with abandon.

Since the day I read about the sweetness of the song of the exotic little bird called the nightingale, one of my schoolboy fantasies has been to listen to its song in the cool of a shady and beautiful garden somewhere beyond the sea. When the Internet age arrived, I was lucky enough to have unlimited access thanks to my humble employment in a regional institution. Nevertheless, I was reluctant to look up the nightingale and its song. Maybe I was scared that my untutored ears wouldn't receive its song well. Driven perhaps by my broadband access, this has changed. Now I am enjoying an hour's worth or more of nightingale song. Not only that, I would look for Yanni's nightingale song performed in his Tribute concerts at the Taj Mahal and the Forbidden City. Then I would go on to discover Deborah Henson-Conant singing and playing her "Nightingale" song on the harp, as well as a great many covers. And I won't miss watching a recital of Keats's Ode to a Nightingale either, and an animation of Andersen's Nightingale fairy tale, no less.

But then, I couldn't help looking for a recording of the song of our own little bird, which we call သပိတ်လွယ် (Oriental magpie-robin). To me, its song seems mellower and sweeter than a nightingale's. My apologies if that sounds like the words of some well-known western horticulturist or botanist that I read a long time ago. He said that he wouldn't care for all the cherimoyas of Peru; for him, a firm apple or two would be fine!

Whether YouTube's content is purely enjoyable or more serious, most of the video pages carry informative, interesting, or thought-provoking comments. I guess they would be most valuable for serious YouTubers. Since the day I discovered the magic of natural language processing via R, I've been itching to try my hand at analyzing the infamous comments in our own Myanmar language on Facebook. But NLP software, as far as I know, is presently based on English and English-like languages, where the word is the basic element for communicating meaning. Unfortunately, our language has no equivalent for this. So, being an old-timer, I have no better alternative than to wander into YouTube's English-only comment space (at a shuffling pace). Bear with me, because I am in a sort of alone-in-the-wilderness situation with R as the only equipment in my survival kit.

Looking for an interesting YouTube video to start with, I enjoyed discovering the whole series of Senate Hearings of April 2018 (lasting more than five hours) of Mark Zuckerberg, the Facebook boss. They were tremendously entertaining even if I couldn't understand their true significance. The exchanges between the Senators, Congressmen and Congresswomen, and Zuckerberg were really exciting, and there were a lot of intelligent (I guess) comments on these exchanges. However, I am not going to touch them here because I dare not mess with the Myanmar Facebook community. Even so, I couldn't help noticing one particular video page with the title "How does Facebook define hate speech? Zuckerberg dodges question". Its content would be highly informative, appropriate, and timely for us, and it is unlikely to provoke suspicion or anger from our folks. Unfortunately, this page didn't allow any comments!

Meanwhile, I was feeling uneasy about the downhearted bunch of young people among this year's batch of fresh high-school graduates. As usual, the majority of the graduates would not make the grade for medical or engineering college, for information technology studies, for business and management studies, or for other popular schools. And most of these young people, as well as their parents, look like they are feeling lost and hopeless. Maybe Andersen's Ugly Duckling is just the right fairy tale to comfort them. Then again, this direct pep-talk video (The Most Successful People Explain Why a College Degree is USELESS) at https://www.youtube.com/watch?v=e8QY0NDWqzk might be more appealing to the young people and their parents.

knitr::include_graphics("degreeUseless.jpg")
Here I am sharing my experience of playing with data from this YouTube video available through the YouTube API. For that I am using the “tuber” package of R. This post shows how I got comments, captions (or subtitles or transcripts) and downloaded the thumbnail of the video.

I am leaving out the usual step of installing an R package (here, tuber). You'll also need to obtain from Google an authorization known as “oauth” to use data from YouTube videos. You should read about it at the appropriate Google website.

#  Get comments, captions, and thumbnail from a youtube video using the tuber package
## myint thann, Sept 09, 2018
library(tuber)
yt_oauth()
Once you have followed Google's instructions to obtain the oauth, you'll have your “client id” and “client secret”. The first time, you run yt_oauth like this:
yt_oauth("client id", "client secret", token = "")
and R will respond with:
Use a local file ('.httr-oauth'), to cache OAuth access credentials between R sessions?
1: Yes 2: No
If you choose yes, then in your next session you only need to call:
yt_oauth()
Now we'll ask for some general information about our video. You get the id of the video from the URL of the video page: it is the string of characters following “v=”, for example e8QY0NDWqzk in https://www.youtube.com/watch?v=e8QY0NDWqzk.
get_stats(video_id="e8QY0NDWqzk")
## $id
## [1] "e8QY0NDWqzk"
## 
## $viewCount
## [1] "3467753"
## 
## $likeCount
## [1] "66742"
## 
## $dislikeCount
## [1] "3982"
## 
## $favoriteCount
## [1] "0"
## 
## $commentCount
## [1] "8342"
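Two small conveniences around this step, as a minimal sketch in base R only. The helper name extract_video_id is hypothetical (it is not part of tuber), and the stats list is typed in by hand from the output above rather than fetched, so the block runs without an API connection. It pulls the id out of a watch URL and turns the counts, which come back as character strings, into numbers.

```r
# Hypothetical helper (not part of tuber): return the value of the "v"
# parameter of a YouTube watch URL, or NA if the URL has none.
extract_video_id <- function(url) {
  # Capture everything after "v=" up to the next "&" or "#"
  m <- regmatches(url, regexec("[?&]v=([^&#]+)", url))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

vid <- extract_video_id("https://www.youtube.com/watch?v=e8QY0NDWqzk")
print(vid)

# The statistics, copied by hand from the get_stats() output shown above
stats <- list(id = "e8QY0NDWqzk", viewCount = "3467753",
              likeCount = "66742", dislikeCount = "3982",
              favoriteCount = "0", commentCount = "8342")

# Convert every field except the id from character to numeric
counts <- sapply(stats[names(stats) != "id"], as.numeric)

# Share of likes among all likes and dislikes
like_share <- counts[["likeCount"]] /
  (counts[["likeCount"]] + counts[["dislikeCount"]])
print(round(like_share, 3))
```

With a live connection, the result of get_stats() could presumably be used in place of the hand-typed list.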
Download the video thumbnail
To get the video thumbnail we need its URL, which is buried in the list returned by the request for video details. Here we take the high quality (480x360 pixels) thumbnail image.
x <- get_video_details(video_id = "e8QY0NDWqzk")
thq <- x[[4]][[1]][[4]][[5]][[3]][[1]]
We download the image to our working directory.
download.file(thq, destfile="DegreeUseless.jpg", mode="wb")
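The chain of numeric indexes used to reach the thumbnail URL is hard to read and would break if the order of the elements changed. Below is a sketch of the same lookup by name. It assumes that the list returned by get_video_details() carries the element names of the YouTube Data API response (items, snippet, thumbnails, high, url); I have not verified this against every version of tuber. The list x_mock is a hand-made stand-in with placeholder values, not real data.

```r
# Hand-made stand-in for the list returned by get_video_details().
# Element names follow the YouTube Data API; all values are placeholders.
x_mock <- list(
  kind = "youtube#videoListResponse",
  etag = "placeholder",
  pageInfo = list(totalResults = 1, resultsPerPage = 1),
  items = list(list(
    kind = "youtube#video",
    etag = "placeholder",
    id = "e8QY0NDWqzk",
    snippet = list(
      publishedAt = "placeholder",
      channelId = "placeholder",
      title = "placeholder",
      description = "placeholder",
      thumbnails = list(
        default = list(url = "https://example.com/default.jpg"),
        medium  = list(url = "https://example.com/medium.jpg"),
        high    = list(url = "https://example.com/high.jpg")
      )
    )
  ))
)

# Lookup by position, as in the post
thq_by_position <- x_mock[[4]][[1]][[4]][[5]][[3]][[1]]

# The same lookup by name
thq_by_name <- x_mock$items[[1]]$snippet$thumbnails$high$url

print(identical(thq_by_position, thq_by_name))
```

If the names are present in the real result, the by-name form also makes it easy to switch to another size, such as medium.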
Get the video caption
A YouTube video can have two kinds of caption track: ASR, a caption track generated using automatic speech recognition; and standard, a regular caption track. To retrieve the caption we need to get the id of the desired track and then use it to get the caption.
cctrack <- list_caption_tracks(part = "snippet", video_id = "e8QY0NDWqzk")
# get captions from the Standard track
cc.2 <- get_captions(id = cctrack$id[2])
The caption is received as a raw data stream. It is converted to text and saved to a text file with:
cat(rawToChar(cc.2), file = "caption.txt")
If you omit the file parameter, all the captions will be displayed on the console. To show just a few lines of the caption, I wrote it to a text file, read it back, and asked for the first 5 time slices with their captions (14 lines) to be shown on the console:
print(scan(file = "caption.txt", what = character(),sep = "\n", nlines = 14,  
           blank.lines.skip = FALSE), quote = FALSE )
##  [1] 0:00:07.510,0:00:08.590                                                    
##  [2] Well, often times                                                          
##  [3]                                                                            
##  [4] 0:00:08.590,0:00:10.980                                                    
##  [5] Business Education today, and I see it all the time                        
##  [6]                                                                            
##  [7] 0:00:10.980,0:00:13.164                                                    
##  [8] Kids come out of college, the best colleges                                
##  [9]                                                                            
## [10] 0:00:13.200,0:00:17.160                                                    
## [11] Wharton and Harvard and Stanford and some of the great business schools and
## [12]                                                                            
## [13] 0:00:17.160,0:00:20.000                                                    
## [14] they'll come out and they won't have practical experience.
Well, you can see on the video that they were the words of President Trump.
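For later analysis it may be handier to have the caption as a table with one row per time slice. Here is a minimal sketch in base R, assuming the layout seen in the output above: a timing line of the form start,end, then one or more lines of text, then a blank line. The vector caption_lines is typed in from the first lines of that output; with the real file it could be read with readLines("caption.txt").

```r
# First few caption lines, copied from the console output above
caption_lines <- c(
  "0:00:07.510,0:00:08.590",
  "Well, often times",
  "",
  "0:00:08.590,0:00:10.980",
  "Business Education today, and I see it all the time",
  "",
  "0:00:10.980,0:00:13.164",
  "Kids come out of college, the best colleges"
)

# A timing line looks like "h:mm:ss.mmm,h:mm:ss.mmm"
stamp <- "[0-9]+:[0-9]{2}:[0-9]{2}\\.[0-9]{3}"
is_time <- grepl(paste0("^", stamp, ",", stamp, "$"), caption_lines)

# Each timing line opens a new block; cumsum() numbers the blocks
block <- cumsum(is_time)
is_text <- !is_time & nzchar(caption_lines)

# Split each timing line into its start and end stamps
times <- strsplit(caption_lines[is_time], ",", fixed = TRUE)

# Join the text lines of each block; blocks without text become NA
block_f <- factor(block[is_text], levels = seq_len(sum(is_time)))
text <- as.vector(tapply(caption_lines[is_text], block_f,
                         paste, collapse = " "))

captions <- data.frame(
  start = vapply(times, `[`, character(1), 1),
  end   = vapply(times, `[`, character(1), 2),
  text  = text,
  stringsAsFactors = FALSE
)
print(captions)
```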

Get all comment threads on the video page
The get_comment_threads() function gives a data frame with the following 12 columns:
“authorDisplayName”, “authorProfileImageUrl”, “authorChannelUrl”, “authorChannelId.value”, “videoId”,
“textDisplay”, “textOriginal”, “canRate”, “viewerRating”, “likeCount”, “publishedAt”, “updatedAt”
cmmt <- get_comment_threads(c(video_id = "e8QY0NDWqzk"), max_results = 101)
nrow(cmmt)
## [1] 4000
Suppose we want to view the first 5 rows out of 4000 for authorDisplayName, publishedAt, and textOriginal. First we extract a subset of the cmmt data frame (columns 1, 7, and 11), then we format the text the way we want to see it using the paste() function.
cmmt5 <- cmmt[1:5, c(1,7,11)]
cmmt5.TMC <- paste('<< ', trimws(cmmt5$authorDisplayName),' >>', '[',  
                   cmmt5$publishedAt, '] ', trimws(cmmt5$textOriginal),
                   collapse = '\n\n')
Display the comments on the console using the cat() function.
cat(strwrap(cmmt5.TMC, width = 70), sep = '\n')
## << Motivation Madness >> [ 2017-10-12T16:40:51.000Z ] PLEASE READ -->
## Hi everyone, this is a completely different video than normal. Videos
## will resume back to normal on Monday with an EPIC video by Simon
## Sinek. I want to explain that College is a perfect solution to many,
## however to others it may not be a good fit. For myself, it was
## perfect, for one of my good friends, it wasn't a good fit. If you are
## currently in college, do not rely on that the piece of paper that you
## receive at the end to get you far, it is your own commitment and
## perseverance that will get you far. College is one of the best places
## on earth to develop networking connections with fellow students and
## professors, as well as create experiences that are extremely
## valuable. I want to emphasise that you don't need to go to the best
## and most expensive school to get the best education or be successful.
## My advice is to BECOME INVOLVED, make friends with as many people as
## possible, help others, and be true to yourself. I REPEAT, this is not
## a video saying that College is useless, but rather we put too much
## emphasis on a piece of paper, thinking that a degree is going to
## catapult us to great success. Make the most out of your time, go out
## there and make connections with other people, and take risks!
## 
## << Max Anguiano >> [ 2018-09-10T05:44:22.000Z ] According to ample
## research, most people with a college degree earn a higher income than
## those without one.
## 
## << HumbleWolf >> [ 2018-09-10T04:45:02.000Z ] The Reason Why You're
## Failing In All Aspect Of Life -
## https://www.youtube.com/watch?v=kPef2yhexAg
## 
## << fantamas06 >> [ 2018-09-10T04:17:13.000Z ] without a title of your
## education, no one will take you for a high tech /engineering job,
## even if you spend a lot of time educating self, and you know more
## than those, who completed colleges/ high-level schools.
## 
## << Justin Ajuogu >> [ 2018-09-10T00:37:08.000Z ] Well there's no harm
## in education, just do sum with it.
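Picking columns by position, as in c(1,7,11), depends on the column order staying the same. The sketch below selects the columns by name instead and formats each comment with a small helper. The helper format_comment and the data frame cmmt_mock are made up for illustration; the two rows are copied from the output above so that the block runs without an API connection.

```r
# Stand-in for the cmmt data frame, with two comments copied from above
cmmt_mock <- data.frame(
  authorDisplayName = c("Max Anguiano", "Justin Ajuogu"),
  textOriginal = c(
    "According to ample research, most people with a college degree earn a higher income than those without one.",
    " Well there's no harm in education, just do sum with it. "
  ),
  publishedAt = c("2018-09-10T05:44:22.000Z", "2018-09-10T00:37:08.000Z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one formatted string per comment
format_comment <- function(author, published, text) {
  sprintf("<< %s >> [ %s ] %s", trimws(author), published, trimws(text))
}

# Select by column name rather than by position, keeping at most 5 rows
top <- head(cmmt_mock[, c("authorDisplayName", "publishedAt", "textOriginal")], 5)
formatted <- format_comment(top$authorDisplayName, top$publishedAt,
                            top$textOriginal)

# Wrap each comment to 70 characters, with a blank line after it
for (one in formatted) {
  cat(strwrap(one, width = 70), sep = "\n")
  cat("\n")
}
```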
