When we were young civil servant recruits, one of
us brought up the Communist Manifesto at least once. It was so long
ago that I can't remember who introduced the topic, and as usual
we weren't that serious. In those days we must have been pretending
to be the intelligentsia of the times, hotly debating politics, arts,
poverty, and science over a few cups of tea offered, almost every
afternoon, by an artist friend.
Forty-plus years later, I have heard
about another manifesto: A manifesto for reproducible science
(Nature Human Behaviour 1, 0021 (2017), DOI: 10.1038/s41562-016-0021).
This time without company, peering hard at my laptop
screen while sipping a cup of instant white coffee brought
back from Viet Nam by a family friend. This time I was looking
through the search results on reproducible research and open
science, and this time too I was
no smarter than the last time I was exposed to “big” ideas.
Anyway, true to my philosophy of easier
done than said, I would skim the related topics, starting with
Wikipedia or whatever caught my eye first. Once I had a little bit
of a feeling for what the field looked like, I would start looking for
what I could do hands-on. I would say that this is the second step of
the celebrated dictum for acquiring knowledge in our Myanmar
language: suu-tuu-pyuu. To quote myself from an earlier post:
Learning Machine-Learning (or for that matter Data Mining, or
Statistics, or …) is obviously easier said than done.
Yet we may approach any kind of learning through a three-step
process: စု-တု-ပြု (suu-tuu-pyuu) or accumulate-imitate-create,
as we Myanmars used to say. Turning our conventional wisdom upside
down, I would now suggest that at least for the accumulation step, it
could be easier done than said!
I won't fumble with my own words to say
something about open science and reproducible research because that
would be quite useless. But I think I can say that what
reproducible research, open science, and
the like are doing is essentially asking the travelers before
us to leave their tracks and their notes for a journey we have
not made before, one “Which to discover we must travel too”,
as the Rubaiyat puts it.
But shouldn't there be exceptions to “Not
one returns to tell us of the Road” when it comes to the
less sublime aspects of our lives? Within the vulgarities of our
lives, or more importantly in science, to be really helpful should we
not resist our selfish habit of sharing
the fish only for the pot while hiding the spot where we cast our nets
(ဟင်းစားဘဲပေးမယ် ကွန်ချက်တော့မပြဘူး)?
And I know of at least one example, from Christopher Gandrud.
He went as far as to let you reproduce
the book from its source code! Truly a technological “frying the
carp with its own fat” (ငါးကြင်းဆီနဲ့ငါးကြင်းကြော်).
The book carries this message:
- Reproduce this book
This book practices what it preaches. It can be reproduced. I wrote the
book using the programs and methods that I describe. Full documentation
and source files can be found at the book’s GitHub repository. Feel
free to read and even use (within reason and with attribution, of
course) the book’s source code. You can find it at:
https://GitHub.com/christophergandrud/Rep-Res-Book.
If you can't go as far as Gandrud, you
could try to imitate him in a small way (I'm veering away from
Gandrud because I haven't read him yet). Here's a new facility called
R Notebooks, added to RStudio in October 2016. The
following excerpt from “Why I love R Notebooks”
by R Views neatly
summarizes what I wanted to know about the R Notebook and
how I could use it:
So I tried to do data science with R
Notebook, in order to (i) tell you about the details of creating the
parallel coordinates plots shown in my post “Playing with
microdata”, (ii) show you the output produced by executing the
chunks of code written for that purpose, and (iii) share all those
chunks of code in a single, reproducible document.
It went well, and the resulting HTML
file can be opened in any browser, from which you can download the
Rmd file:
However, posting this HTML output to
Blogger is not straightforward. On this problem Bart Rogiers, as
recently as June of this year, wrote that you need to be an
HTML/JavaScript/CSS expert to be able to publish an HTML document as
a Blogger post without destroying the formatting. His workaround
consisted of this workflow: (i) create your R script, (ii) compile
your notebook, (iii) get the self-contained HTML body code (without
images), (iv) modify the image URLs, (v) further modify the HTML if
necessary, and (vi) publish.
Luckily, Kyle's workaround that I used
for my previous post “Confessions of dumb blogger - II” still
works, though not as well as Rogiers would like. All I need to do is
(i) take the Rmd file from the R Notebook creation process,
and (ii) run the following code chunk:
# create "clean" HTML from Rmd file
library(knitr)
library(markdown)
knit("parCoord_notebook.Rmd") # produces a .md file
markdownToHTML("parCoord_notebook.md", "parCoord_notebook.html",
               fragment.only = TRUE) # produces clean .html
The HTML file so produced is not exactly like the R Notebook HTML file shown in the picture above. It comes complete with narrative, code and images (plots), but without the heading of the post, namely, “R Notebook version of my parallel coordinates plots”, and without the button on the page that allows you to download the Rmd file. This Blogger-HTML version of my Notebook will be published as my next post. The only formatting I've done to this file before posting is to open it in Notepad++ and use its facility to remove blank lines.
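The blank-line cleanup could probably be done in R as well, in the same session as the code chunk. What follows is only a minimal sketch under stated assumptions: strip_blank_lines is a name I made up for illustration (it is not part of knitr or markdown), and the file is assumed to be UTF-8 text. Like a blanket remove-blank-lines pass in an editor, it also drops blank lines inside displayed code.

```r
# Hypothetical helper (not from knitr/markdown): remove blank lines
# from a text file such as the HTML fragment produced above.
strip_blank_lines <- function(path_in, path_out = path_in) {
  # Read the file as UTF-8 text, one element per line
  lines <- readLines(path_in, warn = FALSE, encoding = "UTF-8")
  # Keep only lines that contain something other than whitespace
  kept <- lines[nzchar(trimws(lines))]
  # Write the bytes back unchanged so non-ASCII text is not re-encoded
  writeLines(kept, path_out, useBytes = TRUE)
  # Return (invisibly) how many lines were dropped
  invisible(length(lines) - length(kept))
}

# Usage, assuming the fragment from the chunk above exists in the working directory
if (file.exists("parCoord_notebook.html")) {
  strip_blank_lines("parCoord_notebook.html")
}
```

By default the file is overwritten in place; passing a second path keeps the original untouched.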
The Rmd file is what makes the R Notebook fully reproducible. When you run it in RStudio, the complete R Notebook is produced. RStudio provides services for you to share documents publicly or privately. You may as well use email or some cloud service to share them.
I've uploaded the full HTML file of the
Notebook (parCoord_notebook.nb.html) to Google Drive. You
can download it from here.