Sunday, October 14, 2018

Shuffling into YouTube's comment space -II(3): Creating automatic-subtitles for a ripped DVD


After my last post on creating automatic subtitles for a YouTube video, I was thinking about playing with sentiment analysis of the comments. But then I happened to look up my post “Surreal misquotes” of October 31, 2014, and recalled that I had complained there about the lack of a transcript for Professor Hla Myint's talk, back in 2012, on Myanmar's development efforts and pace, and potential solutions in the context of insufficient administrative capacity. Professor Myint's talk was the main theme of that post, and I managed to quote him only by listening to the audio many times, at different speeds, using the Audacity software. Now, six years later, I suddenly got the idea that it might be possible to generate subtitles automatically from the video file itself. And it sidetracked me into looking for ways to do just that!

Before exploring that idea, someone more intelligent than me would have re-read the documentation to make sure that the youtube-dl software I would be using could indeed handle non-YouTube video files. Ignoring that issue, the plan seemed clear: (i) rip Professor Myint's part of the talk from the DVD, and (ii) process the resulting video file to create subtitles as in my last post. The following screenshot shows that it worked:


However, there were a few lessons my fellow dummies would benefit from.

Ripping Professor Myint's part of the talk from the DVD

The VLC media player could do the ripping, but I know that it is slow; as I understand it, ripping takes as long as playback, which would be a little over 27 minutes for this job. So I looked for free software and found WinX DVD Ripper Free among a host of others. It was fast and easy to use, as the reviews said, but it is just a trial version and would rip only 5 minutes of the DVD, as I found out too late! So I used HandBrake, and it was trouble-free. But it took 35 minutes, which may even be slower than ripping with the VLC player, though I haven't verified that.


Processing the ripped video file with youtube-dl
  1. Making the file accessible to youtube-dl
For youtube-dl to access the ripped video file, the address of the file needs to be given in URL form. So I opened the file in my Chrome browser and copied the address from the address bar. This gives the URL of a file on the local system in this format: file:///filepath/filename. But when I tried to access the video file this way, I got this warning and couldn't go on:

WARNING: Could not send HEAD request to file:///C:/Users/MTNN/profMyint.mp4: <urlopen error file:// scheme is explicitly disabled in youtube-dl for security reasons>

Then I realized that I could use cloud storage to get around this problem. So I uploaded my video file to Dropbox, and youtube-dl then had no problem reading the file.
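For anyone who would rather not copy the address out of a browser, here is a minimal Python sketch of how a Windows path maps to the file:/// form shown above, plus a simple pre-check on the URL scheme. The function names local_file_url and is_remote_url are made up for illustration, and the http/https check is only my assumption of a sensible test, not youtube-dl's actual rule.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse


def local_file_url(path: str) -> str:
    """Return the file:// URL for an absolute Windows path."""
    return PureWindowsPath(path).as_uri()


def is_remote_url(url: str) -> bool:
    """Hypothetical pre-check: accept only http(s) URLs.

    This is an illustrative assumption, not youtube-dl's own logic.
    """
    return urlparse(url).scheme in ("http", "https")


url = local_file_url(r"C:\Users\MTNN\profMyint.mp4")
print(url)                 # the same form Chrome shows in the address bar
print(is_remote_url(url))  # a file:// URL fails the check
```

A check like this only tells you in advance that a file:// address will not do; the file still has to be hosted somewhere reachable over http(s), as with the Dropbox upload.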
  2. There aren't any subtitles
But when I tried to generate subtitles automatically from this file using “--write-auto-sub --embed-subs”, the video was downloaded to my laptop along with this message: [ffmpeg] There aren't any subtitles to embed. That was because I had naively thought that youtube-dl itself writes automatic subtitles when a video file has none. Luckily, after some homework, I found out that it is in fact YouTube that automatically generates subtitles for videos uploaded to it; youtube-dl only downloads them. So my task now was to upload my video to YouTube.

  3. Uploading video file to YouTube
By default, YouTube accepts videos up to 15 minutes long. Mine was a little over 27 minutes, so it was rejected. Luckily, YouTube gives you the option to raise that limit by letting it verify your Google account, and I had no trouble doing that.

  4. Accessing and processing this YouTube video file with youtube-dl
Since this video file is strictly for private use, I set its sharing option to private. However, when I ran youtube-dl to access the file, I got this message:
WARNING: Unable to extract video title
ERROR: This video is unavailable.
Lucky again! I found the solution in Dave Parrish's post: HOW TO DOWNLOAD PRIVATE VIDEOS FROM YOUTUBE WITH YOUTUBE-DL. The problem, as he explained, was that youtube-dl couldn't handle YouTube's two-factor authentication. The workaround is to create a cookie file (newcookiefile.txt) following his example. I used the cookie file so created to access my private video and embed the automatically created subtitles like this:
d:\YT-DL\youtube-dl.exe --cookies=newcookiefile.txt --write-auto-sub --embed-subs https://youtu.be/Lxgpz2NGjus
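If you run this more than once, the command line can be assembled from a script instead of retyped. The following is a minimal Python sketch, not a tested wrapper: the function name build_command and its parameters are made up for illustration, and it assumes a youtube-dl executable is reachable under the name passed as exe. The flags themselves (--cookies, --write-auto-sub, --embed-subs, --skip-download) are youtube-dl options.

```python
from typing import List, Optional


def build_command(url: str,
                  cookies: Optional[str] = None,
                  subs_only: bool = False,
                  exe: str = "youtube-dl") -> List[str]:
    """Assemble a youtube-dl argument list for automatic subtitles.

    subs_only=True asks for the subtitle file alone (no video download);
    otherwise the subtitles are embedded into the downloaded video.
    """
    cmd = [exe]
    if cookies:
        # Cookie file needed for private videos
        cmd += ["--cookies", cookies]
    cmd.append("--write-auto-sub")
    if subs_only:
        cmd.append("--skip-download")
    else:
        cmd.append("--embed-subs")
    cmd.append(url)
    return cmd


cmd = build_command("https://youtu.be/Lxgpz2NGjus",
                    cookies="newcookiefile.txt")
print(" ".join(cmd))
# To actually run it (requires youtube-dl to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than one long string avoids quoting problems when a file name contains spaces.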

To get that cookie file, you need to go through two intermediate steps. First, to create the cookie file, you can use the EditThisCookie plugin for the Chrome browser, which you can get from the Chrome Web Store.
Next, the cookie file created with it needs to be converted, using the curl software, to a format that youtube-dl can use.


I downloaded curl from here. Then you can follow the steps given by Dave Parrish. However, there is one problem with his curl syntax:
      curl -b cookiefile.txt --cookie-jar newcookiefile.txt '/https://youtube.com'

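The problem was with the quotes around the URL, most likely because the Windows command prompt does not treat single quotes as quoting characters and so passes them on to curl as part of the URL. I used plain https://www.youtube.com.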
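Before handing the converted cookie file to youtube-dl, it may be worth checking that it really is in the Netscape/Mozilla cookie format, which is the format youtube-dl's documentation asks for. Here is a minimal Python sketch using the standard library's MozillaCookieJar; the function name count_cookies is made up, and the sample cookie below is invented purely for the demonstration.

```python
import http.cookiejar
import os
import tempfile


def count_cookies(path: str) -> int:
    """Load a Netscape-format cookie file and return how many cookies it holds.

    Raises http.cookiejar.LoadError if the file is not in that format.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session and expired cookies so we count everything in the file
    jar.load(ignore_discard=True, ignore_expires=True)
    return len(jar)


# Invented sample content, only to demonstrate the check
sample = (
    "# Netscape HTTP Cookie File\n"
    ".youtube.com\tTRUE\t/\tTRUE\t4102444800\tEXAMPLE_NAME\texample_value\n"
)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "newcookiefile.txt")
    with open(demo_path, "w") as f:
        f.write(sample)
    print(count_cookies(demo_path))
```

To check a real file, call count_cookies("newcookiefile.txt") instead; a LoadError at this point suggests the conversion step did not produce the expected format.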

The final result


I found that the automatic speech-to-text conversion was far from perfect. For example, it should be “administrative” instead of “atmospheric” in the screenshot above. In fact, I found a lot more funny renderings than that in the entire video.

But you can see that it would be vastly easier for someone to correct the flawed subtitles than to start from scratch. All you need to do is ask youtube-dl to retain the subtitle text file (the file with the vtt extension) in the process of embedding the subtitles (or ask it to create the subtitle file separately), then listen hard to the professor's speech and modify the text as required!
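Once the vtt file is on disk, a mis-hearing that recurs throughout can be patched in one pass before the careful listening starts. Below is a minimal Python sketch under stated assumptions: the function name fix_vtt_text is made up, the sample cue text is invented, and I have not tried it on a real auto-generated file, which may carry extra inline markup.

```python
import re
from typing import Dict


def fix_vtt_text(vtt: str, corrections: Dict[str, str]) -> str:
    """Apply whole-word corrections to cue text lines only.

    The WEBVTT header line and the timing lines (those containing '-->')
    are passed through unchanged.
    """
    out = []
    for line in vtt.splitlines():
        if "-->" in line or line.startswith("WEBVTT"):
            out.append(line)
            continue
        for wrong, right in corrections.items():
            line = re.sub(rf"\b{re.escape(wrong)}\b", right, line)
        out.append(line)
    return "\n".join(out)


# Invented sample cue, for illustration only
sample_vtt = (
    "WEBVTT\n"
    "\n"
    "00:00:01.000 --> 00:00:04.000\n"
    "insufficient atmospheric capacity\n"
)

fixed = fix_vtt_text(sample_vtt, {"atmospheric": "administrative"})
print(fixed)
```

A blanket replacement is only safe for words the speaker never actually uses, so it is a first pass rather than a substitute for listening.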

Credit: The talk was sponsored/DVD produced by UMFCCI and MIEGA, Myanmar.