Wednesday, December 6, 2017

Confessions of a dumb blogger - IV: reproducible science


When we were young civil servant recruits, one of us brought up the Communist Manifesto at least once. It was so long ago that I can't remember who introduced the topic and, as usual, we weren't that serious. In those days we must have been pretending to be the intelligentsia of the times, hotly debating politics, arts, poverty, and science over a few cups of tea offered, almost every afternoon, by an artist friend.

Forty-plus years later, I have come across another manifesto: A manifesto for reproducible science (NATURE HUMAN BEHAVIOUR 1, 0021 (2017) | DOI: 10.1038/s41562-016-0021). This time I was without company, peering hard at my laptop screen while sipping a cup of instant white coffee brought back from Viet Nam by a family friend. I was looking through the search results on reproducible research and open science, and this time too I was no smarter than the last time I was exposed to “big” ideas.

Anyway, true to my philosophy of easier done than said, I would skim the related topics starting with Wikipedia or whatever catches my eye first. Once I got a bit of a feeling for what the field looks like, I would start looking for what I could do hands-on. I would say that this is the second step of the celebrated dictum for acquiring knowledge in our Myanmar language: suu-tuu-pyuu. To quote myself from an earlier post:

Learning Machine-Learning (or for that matter Data Mining, or Statistics, or …) is obviously easier said than done. Yet we may approach any kind of learning through a three-step process: စု- တု-ပြု(suu-tuu-pyuu) or accumulate-imitate-create, as we Myanmars used to say. Turning our conventional wisdom upside down, I would now suggest that at least for the accumulation step, it could be easier done than said!

I won't fumble with my own words to say something about open science and reproducible research because that would be quite useless. But I think I could say that what reproducible research, open science, and the like are doing is essentially asking the travelers before us to leave their tracks and post their notes for a journey we have not made before, “Which to discover we must travel too” as the Rubaiyat puts it:


But shouldn't there be exceptions to “Not one returns to tell us of the Road” when it comes to the less sublime aspects of our lives? Within the vulgarities of our lives, or more importantly in science, to be really helpful should we not resist our selfish habit of sharing the fish only for the pot while hiding the spot where we cast our nets (ဟင်းစားဘဲပေးမယ် ကွန်ချက်တော့မပြဘူး)? And I know at least this example from Christopher Gandrud:


He went so far as to let you reproduce the book from its source code! Truly a technological “frying the carp with its own fat” (ငါးကြင်းဆီနဲ့ငါးကြင်းကြော်). The book carries this message:

      1. Reproduce this book
This book practices what it preaches. It can be reproduced. I wrote the
book using the programs and methods that I describe. Full documentation
and source files can be found at the book’s GitHub repository. Feel free to
read and even use (within reason and with attribution, of course) the book’s
source code. You can find it at: https://GitHub.com/christophergandrud/Rep-Res-Book.

If you can't go as far as Gandrud you could try to imitate him in a small way (I'm veering away from Gandrud because I haven't read him yet). Here's a new facility called R Notebooks added to RStudio in October 2016. The following excerpt from “Why I love R Notebooks” by R Views neatly summarizes what I want to know about the R Notebook and how I could use it:

So I tried to do data science with an R Notebook, in order to (i) tell you about the details of creating the parallel coordinates plots shown in my post “Playing with microdata”, (ii) show you the output produced by executing the chunks of code written for that purpose, and (iii) share all those chunks of code in a single, reproducible document.

It went well, and the resulting HTML file can be opened in any browser, from which you can download the Rmd file:


However, posting this HTML output to Blogger is not straightforward. On this problem Bart Rogiers, as recently as June of this year, wrote that you need to be an HTML/JavaScript/CSS expert to be able to publish an HTML document as a Blogger post without destroying the formatting. His workaround consisted of this workflow: (i) create your R script, (ii) compile your notebook, (iii) get the self-contained HTML body code (without images), (iv) modify the image URLs, (v) further modify the HTML if necessary, and (vi) publish.

Luckily, Kyle's workaround that I used for my previous post “Confessions of dumb blogger - II” still works, though not as well as Rogiers would like. All I need to do is (i) take the Rmd file from the R Notebook creation process, and (ii) run the following code chunk:

# create "clean" HTML from Rmd file
library(knitr)
library(markdown)
knit("parCoord_notebook.Rmd") # produces a .md file
markdownToHTML("parCoord_notebook.md", "parCoord_notebook.html",
               fragment.only = TRUE) # produces clean .html

The HTML file so produced isn't exactly like the R Notebook HTML file shown in the picture above. It comes complete with narrative, code, and images (plots), but lacks the heading of the post, namely “R Notebook version of my parallel coordinates plots”, and the button on the page that allows you to download the Rmd file. This Blogger-HTML version of my Notebook will be published as my next post. The only formatting I've done to this file before posting is to open it in Notepad++ and use its facility to remove blank lines.
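The blank-line cleanup could probably be done without leaving R. What follows is only a minimal sketch in base R, not what I actually did (I used Notepad++). The helper name strip_blank_lines is made up for illustration, and the demonstration runs on a throwaway temporary file rather than on the real parCoord_notebook.html.

```r
# Remove blank (empty or whitespace-only) lines from a text file.
# NOTE: `strip_blank_lines` is a hypothetical helper, not part of any package.
strip_blank_lines <- function(infile, outfile = infile) {
  lines <- readLines(infile, warn = FALSE, encoding = "UTF-8")
  kept  <- lines[!grepl("^[[:space:]]*$", lines)]  # drop empty/whitespace-only lines
  writeLines(kept, outfile, useBytes = TRUE)       # overwrite (or write a copy)
  invisible(length(lines) - length(kept))          # number of lines removed
}

# Demonstration on a temporary file, so the sketch runs on its own.
demo <- tempfile(fileext = ".html")
writeLines(c("<p>narrative</p>", "", "   ", "<pre><code>plot(x)</code></pre>"), demo)
removed <- strip_blank_lines(demo)
cleaned <- readLines(demo)
```

For the real file the call would be strip_blank_lines("parCoord_notebook.html"). Be aware that, as written, this also removes blank lines inside code listings and printed output, which may or may not be what you want.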

The Rmd file is what makes the R Notebook fully reproducible. When you run it in RStudio, the complete R Notebook is produced. RStudio provides services for you to share documents publicly or privately. You may as well use email or some cloud service to share them.

I've uploaded the full HTML file of the Notebook (parCoord_notebook.nb.html) to Google Drive. You can download it from here.

