I had read about Jason and the Golden Fleece in my schoolboy days, and now I remembered it vaguely as a piece of Greek mythology and nothing more.
I guess JSON is pronounced the same, but the difference is that JSON is very real and becoming more and more visible on the Web.
The first time I heard of JSON was about two years ago, when I was looking for large data files of one terabyte or more so that I could try playing with Big Data. While looking for sources of big data, I vaguely came to understand that the big boys, Amazon or Google for example, could let me have such data in a format called JSON. It was the first time I had heard of that name, and I thought it must be terribly hard to learn and use. So I dropped the idea of getting data that way and went for more traditional statistical data formats, such as plain text, SPSS, or Stata, and ended up collecting a couple of sub-terabyte data files.
The official website (json.org) describes JSON as follows:
JSON
(JavaScript
Object Notation) is a lightweight data-interchange format. It is easy
for humans to read and write. It is easy for machines to parse and
generate. It is based on a subset of the JavaScript
Programming Language
...
JSON is a text format that is completely language independent but
uses conventions that are familiar to programmers of the C-family of
languages, including C, C++, C#, Java, JavaScript, Perl, Python, and
many others. These properties make JSON an ideal data-interchange
language.
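To get a feel for what that means, here is a small JSON object I made up for illustration; curly braces hold name/value pairs, square brackets hold ordered lists, and values can be strings, numbers, booleans, or nested structures:

{
  "name": "Jason",
  "quest": "golden fleece",
  "shipmates": ["Heracles", "Orpheus"],
  "ships": 1,
  "mythical": true
}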
Stat 545's "Getting data from the Web – part 2" gives the example of JSON: as of August 2014, 6,482 datasets were available across 22 federal agencies in data.json files in the U.S.
I'm still surprised at how many people are unaware that 22 of the top federal agencies have data inventories of their public data assets, available in the root of their domain as a data.json file. This means you can go to example.gov/data.json for many agencies and find a machine-readable list of that agency's current inventory of public datasets. I currently know of 22 federal agencies who have published data.json files.
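Just to illustrate the idea (using the jsonlite R package, which I come to below): assuming a catalog follows the standard data.json layout with a top-level dataset array, it can be pulled into R in a couple of lines. The URL here is a placeholder for any agency domain.

library(jsonlite)

# Placeholder URL; substitute a real agency domain
catalog <- fromJSON("https://www.example.gov/data.json")

# Titles of that agency's public datasets, assuming the standard
# data.json layout with a top-level "dataset" array
head(catalog$dataset$title)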
So JSON looks somewhat new, even in the U.S. Taking a quick look at the state of open data in various nations in July 2015, and focusing on Hong Kong and Singapore, I found that 20 of Hong Kong's datasets were available in JSON, while Singapore appeared to have none; its datasets were mostly in XML format.
According to Amandeep
Singh, September 17, 2015:
Ten years ago, XML was the primary data interchange format. When it came on the scene, it was a breath of fresh air and a vast improvement over the truly appalling SGML (Standard Generalized Markup Language). It enabled people to do previously unthinkable things, like exchange Microsoft Office documents across HTTP connections. With all the dissatisfaction surrounding XML, it’s easy to forget just how crucial it was in the evolution of the web in its capacity as a “Swiss Army Knife of the internet.”

But it’s no secret that in the last few years, a bold transformation has been afoot in the world of data interchange. The more lightweight, bandwidth-non-intensive JSON (JavaScript Object Notation) has emerged not just as an alternative to XML, but rather as a potential full-blown successor. A variety of historical forces are now converging and conspiring to render XML less and less relevant and to crown JSON as the privileged data format of the global digital architecture of the future. I think that the only question is how near that future is.
Well then, that inspired me to experiment with accessing JSON data and to play with it. I looked for R packages that would help me do it, and I found the jsonlite package. I installed it, and after some frustrating moments I was able to convert two or three JSON data files into standard R data frames.
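The basic idea turned out to be simple. Here is a tiny made-up example: a JSON array of records becomes an R data frame in a single call.

library(jsonlite)

# A made-up JSON array of two records
txt <- '[{"county": "Albany", "species": 120},
         {"county": "Bronx",  "species": 85}]'

df <- fromJSON(txt)  # simplifies the array into a data frame
df
#   county species
# 1 Albany     120
# 2  Bronx      85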
Looking for JSON data, I found one convenient source in the Data.gov site hosted by the U.S. government. From there I found the Biodiversity data by county for the State of New York.
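The R script I used for downloading the data and converting it into a data frame looked roughly like the sketch below; the URL is a placeholder, so substitute the actual link from the Data.gov catalog.

library(jsonlite)

# Placeholder URL for the New York biodiversity-by-county dataset;
# substitute the actual link from Data.gov
url <- "https://www.example.gov/biodiversity_by_county.json"

# Download and parse the JSON, timing the whole operation;
# fromJSON() simplifies an array of records into a data frame
elapsed <- system.time(
  biodata <- fromJSON(url)
)
print(elapsed)

# With some endpoints the records sit in a nested element rather
# than at the top level, so inspect the structure first
str(biodata, max.level = 1)
if (is.data.frame(biodata)) head(biodata)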
This exercise took 91.3 seconds on my i5 laptop with 8 GB of RAM, running Windows 7.
I dare you to mess with JSON. Who's afraid of JSON, anyway?