Sunday, September 23, 2018

Shuffling into YouTube's comment space -II(2): Creating subtitles if there are none


After successfully downloading a video with embedded subtitles, as described in my last post, I tried exactly the same approach on another YouTube video: Text Mining (part 3) - Sentiment Analysis and Wordcloud in R (single document). The video is from the Jayalar Academy, and it is what motivated me to find a video downloader capable of embedding subtitles in the first place.

Here goes:

D:\yt-dl\youtube-dl.exe --write-sub --embed-subs -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]" "https://www.youtube.com/watch?v=JM_J7ufS-BU&t=889s"


So that video doesn't have subtitles? I had watched it play with subtitles on the YouTube page, so I checked with the R tuber package to see whether it had any captions. Sure enough, there were none! My first guess was that a video author can somehow prevent others from downloading the original captions. More likely, what I saw on the page were YouTube's automatically generated captions, which --write-sub does not fetch. Anyway, I tried --write-auto-sub instead of --write-sub, and it worked.


Playing the downloaded video in VLC media player shows that it works:


What happened, I guess, is that youtube-dl fetched YouTube's automatically generated captions rather than creating subtitles itself. The --write-auto-sub option wrote a subtitle file with the extension vtt, and --embed-subs put the subtitles into the video.
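
For anyone who wants to script this, here is a minimal Python sketch of the same command. Python is not what I used above, and the function name build_command and its parameters are made up for illustration; only the youtube-dl options themselves come from this post. Passing the arguments as a list avoids shell quoting problems with the & in the URL.

```python
import subprocess


def build_command(url, auto_subs=False, exe="youtube-dl"):
    """Return the youtube-dl argument list used in this post.

    build_command, auto_subs and exe are hypothetical names for illustration.
    """
    # --write-sub asks for uploaded subtitles; --write-auto-sub asks for
    # YouTube's automatically generated ones.
    sub_flag = "--write-auto-sub" if auto_subs else "--write-sub"
    return [
        exe,
        sub_flag,
        "--embed-subs",  # put the downloaded subtitles into the video file
        "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]",  # keep the result mp4
        url,
    ]


def download(url, auto_subs=False, exe="youtube-dl"):
    """Run youtube-dl; needs youtube-dl and ffmpeg to be installed."""
    # A list of arguments is passed without a shell, so "&" in the URL is safe.
    return subprocess.run(build_command(url, auto_subs, exe)).returncode


# Example (not executed here):
# download("https://www.youtube.com/watch?v=JM_J7ufS-BU&t=889s", auto_subs=True,
#          exe=r"D:\yt-dl\youtube-dl.exe")
```

Calling download with auto_subs=False first and retrying with auto_subs=True when no subtitles turn up would mirror what I did by hand.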

Actually, I didn't arrive at the right set of options as effortlessly as this post and the last make it appear. Two false leads were notable, both departures from the full command line:
D:\yt-dl\youtube-dl.exe --write-auto-sub --embed-subs -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]" "https://www.youtube.com/watch?v=JM_J7ufS-BU&t=889s"

(1) omitting --write-auto-sub (or, if the video has subtitles, --write-sub)
    The video is downloaded, but no subtitle file is produced, and the result is the message: [ffmpeg] There aren't any subtitles to embed

(2) omitting -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]"
    Produces a video file in a format other than mp4.

The resulting webm file could be opened with Internet Explorer, which shows the “cc” button for displaying subtitles. But the output is garbled:


Opening the file with VLC media player gives the same kind of result. The webm video format is described as newer than the mp4 format, but for now, when using youtube-dl, I'll stick to mp4 by specifying -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]". I arrived at this solution through the message WARNING: Requested formats are incompatible for merge and will be merged into mkv, which I got when I ran:
d:\yt-dl\youtube-dl.exe --embed-subs https://www.youtube.com/watch?v=e8QY0NDWqzk

Luckily I found the reason for that warning and the solution for it from the ffmpeg mailing list here:
"Most probably, youtube-dl defaulted to "bestvideo+bestaudio". That
could result in webm video and m4a audio. youtube-dl cannot merge webm
into mp4, therefore chooses mkv. That's all.

Actually, I think youtube-dl's warning message is confusing or wrong (I
can post a bug ticket): It says it "cannot merge", therefore it merges?
I believe it means "cannot merge to (default format) MP4, therefore
choosing MKV".
...
BTW, to force a "pure" MPEG video, use:
$ youtube-dl -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]"
(when actually downloading from YouTube and not one of the other 5000
sites the tool supports)."
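
To make the rule in that reply concrete, here is a small Python sketch of how the output container ends up being chosen. It is a simplified model of what the quote describes, not youtube-dl's actual source code, and the name merged_container is hypothetical.

```python
# Simplified model of the merge rule described in the quote above.
# This is NOT youtube-dl's real implementation; names are hypothetical.
MP4_FAMILY = {"mp4", "m4a"}


def merged_container(video_ext, audio_ext):
    """Guess the container for a merged video + audio download."""
    if video_ext == "webm" and audio_ext == "webm":
        return "webm"  # both streams already fit in a webm file
    if video_ext in MP4_FAMILY and audio_ext in MP4_FAMILY:
        return "mp4"   # the "pure" MPEG case forced by the -f option
    return "mkv"       # mixed streams: fall back to mkv, with the warning


print(merged_container("webm", "m4a"))  # the case behind the warning
print(merged_container("mp4", "m4a"))   # what the -f selector asks for
```

Under these assumptions, webm video with m4a audio lands in mkv, while restricting both streams with -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]" keeps the result in mp4.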
