I once knew a bright young nuclear scientist, trained locally, who
went abroad a long time ago to learn about handling nuclear waste.
Coming from a rural family of farmers, he told me that he had quite
some trouble using a tea bag properly at his first breakfast in the
hotel. That reminded me of watching a gentleman from Sri Lanka make
tea during a conversation, some five or six years earlier, when I was
in the Pacific. There I learned that the secret of good tea is to add
one more spoonful of creamer (for the cup) to the two spoonfuls I had
added for myself! Yet despite the time I have spent outside Myanmar,
I still couldn't set a table myself, because I don't know the proper
places for knives, forks, or spoons, and I'm surely ignorant of which
brand of tea should go with which occasion. It's my karma: in no way
could I have attended a classy high school or had a proper
upbringing, if you like. But I have no regrets.
For that matter, I never owned a car or learned to drive one. I'm
just happy that I learned to use computers and smartphones a bit. I
am happy that I can share things with ordinary folks, and that
matters.
Then there is one website I would like to share. I've heard that it
could replace some technical journals that you can't get for free.
Welcome open access. Welcome arXiv.
According to ProgrammableWeb:
The Cornell University e-print arXiv, hosted at arXiv.org, is a document
submission and retrieval system used by the physics, mathematics and
computer science communities. It has become the primary means of
communicating manuscripts on current and ongoing research. The arXiv
repository is available worldwide. Manuscripts are often submitted to
the arXiv before they are published by more traditional means. In
some cases they may never be submitted or published elsewhere. The
purpose of the arXiv API is to allow programmatic access to the
arXiv's e-print content and metadata.
aRxiv is an R interface to the arXiv API, and the arXiv API does not
require an API key. You can install aRxiv from any CRAN mirror.
Below is my R script for finding papers on machine learning submitted
to arXiv since the beginning of 2010, and for viewing their abstracts
on the arXiv website. Once on an abstract page, you can download the
full paper by using the link provided. Alternatively, you can save
the search results to a text file and then view them conveniently in
a spreadsheet.
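The same idea can be sketched in a few lines of R. This is an
illustrative sketch rather than the script itself: the category
stat.ML, the limit of 50 results, and the helper names ml_query() and
search_and_save() are assumptions made for the example. It also
assumes the aRxiv package is already installed, for instance with
install.packages("aRxiv"), and that you are online.

```r
# Sketch only: the category, date window, result limit and helper names
# are assumptions for illustration, not taken from the post's own script.

# Build an arXiv API query string for one subject category and a
# submission-date window. Dates use the API's YYYYMMDDHHMM form (GMT).
ml_query <- function(category = "stat.ML",
                     from = "201001010000",
                     to = format(Sys.Date(), "%Y%m%d2359")) {
  sprintf("cat:%s AND submittedDate:[%s TO %s]", category, from, to)
}

# Run the search, save the results as a CSV file in the working
# directory, and return them as a data frame.
# Needs the aRxiv package and an internet connection.
search_and_save <- function(query = ml_query(),
                            file = "ML_aRxSrch.csv",
                            limit = 50) {
  # Report how many records match before fetching any of them.
  message("Matching records on arXiv: ", aRxiv::arxiv_count(query))
  res <- aRxiv::arxiv_search(query, limit = limit,
                             sort_by = "submitted", ascending = FALSE)
  write.csv(res, file, row.names = FALSE)
  res
}
```

In an interactive session, res <- search_and_save() would fetch the
most recently submitted matches and write the CSV file, and
aRxiv::arxiv_open(res[1, ]) would then open the abstract page of the
first result in your browser.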
When you run the script, you will see your commands, their output,
and any error messages displayed one after another on the R console.
I assume you know how to save them to a text file for reference. Near
the end of the script you'll see the arxiv_open() function. It opens
the abstract page shown below, where you can see the link for
downloading the PDF file of the full paper:
After the run, you should also find the ML_aRxSrch.csv file you saved
in your working directory. Part of it is shown below, opened in the
OpenOffice Calc spreadsheet. Some rows are hidden for convenience of
display:
My purpose here and in some earlier posts is to introduce (to myself
as well as others) some tools that take advantage of the web APIs
provided by publishers, Q&A sites, social media, and others. For
simplicity I've been leaving the contents of their results untouched.
However, I couldn't help feeling inspired by an instance of the
struggle for an ever greater understanding of older versus newer
ideas as I read this abstract:
Neuroimaging research has
predominantly drawn conclusions based on classical statistics,
including null-hypothesis testing, t-tests, and ANOVA. Throughout
recent years, statistical learning methods enjoy increasing
popularity, including cross-validation, pattern classification, and
sparsity-inducing regression. These two methodological families used
for neuroimaging data analysis can be viewed as two extremes of a
continuum. Yet, they originated from different historical contexts,
build on different theories, rest on different assumptions, evaluate
different outcome metrics, and permit different conclusions. This
paper portrays commonalities and differences between classical
statistics and statistical learning with their relation to
neuroimaging research. The conceptual implications are illustrated in
three common analysis scenarios. It is thus tried to resolve possible
confusion between classical hypothesis testing and data-guided model
estimation by discussing their ramifications for the neuroimaging
access to neurobiology.
Despite knowing nothing about neuroimaging or neurobiology, I sensed
the promise that such improvements in knowledge would somehow benefit
our well-being, collectively or individually. As an aside, my feeling
is that reading such abstracts could be something more than idle
recreation for the non-specialist. The bottom line is that it may
broaden our knowledge base or simply leave us with an appreciation of
good things done (provided, of course, that we can make some sense of
a given abstract).
Back to the basics: to play around with the aRxiv package, a good
start is to read the vignette “aRxiv tutorial” in the package's help
pages, which you can reach through “Html help” from your R console.
You could also download this tutorial here.