Wednesday, October 28, 2015

JSON, who?

I had read about Jason and the Golden Fleece in my schoolboy days, and now I remember it only vaguely as a piece of Greek mythology and nothing more. I had to look it up again to refresh my memory.

I guess JSON is pronounced the same, but the difference is that JSON is very real and is becoming more and more visible on the Web.

I first heard of JSON about two years ago, when I was looking for large data files of one terabyte or more so that I could try playing with Big Data. Looking for sources of big data, I came to understand, vaguely, that big boys such as Amazon or Google could let me get such data in a format called JSON. It was the first time I had heard the name, and I thought it must be terribly hard to learn and use. So I dropped the idea of getting data that way and went for more traditional statistical data formats, such as plain text, SPSS, or Stata, and ended up collecting a couple of sub-terabyte data files.

The official JSON website describes JSON as follows:

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language ... JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
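To see what "easy for machines to parse and generate" means in practice, here is a minimal sketch in Python (not R, and not part of my original experiment) using the standard `json` module. The document itself is made up for illustration; it simply shows how each kind of JSON value maps onto a native data type.

```python
import json

# A small, made-up JSON document: an object holding strings, a number,
# a boolean, a null and an array.
text = """
{
  "hero": "Jason",
  "quest": "Golden Fleece",
  "ship": "Argo",
  "crew": ["Heracles", "Orpheus"],
  "crew_listed": 2,
  "returned": true,
  "notes": null
}
"""

# Parsing: objects become dicts, arrays become lists,
# true/false become True/False and null becomes None.
record = json.loads(text)
print(record["crew"][0])  # Heracles

# Generating: dumps() turns the Python structure back into JSON text.
print(json.dumps(record, indent=2))
```

The same round trip (text to native structure and back) is what any JSON library, in R or elsewhere, does under the hood.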

The Stat 545 notes "Getting data from the Web – part 2" give examples of both JSON and XML.

I'm still surprised at how many people are unaware that 22 of the top federal agencies have data inventories of their public data assets, available in the root of their domain as a data.json file. This means you can go to many agency domains and find a machine-readable list of that agency's current inventory of public datasets.

I currently know of 22 federal agencies that have published data.json files.
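A data.json inventory is just a JSON file, so listing what an agency publishes takes only a few lines. The Python sketch below rests on an assumption about layout: the Project Open Data metadata schema, where (as I understand it) version 1.1 wraps the datasets in an object under a `"dataset"` key and the older version 1.0 is a bare array of datasets. The helper names `dataset_titles` and `fetch_catalog` are hypothetical, and the sample catalog is made up.

```python
import json
import urllib.request


def dataset_titles(catalog_text):
    """Return the dataset titles found in a data.json catalog.

    Assumption: version 1.1 catalogs keep datasets under a "dataset" key,
    while version 1.0 catalogs are a plain JSON array of datasets.
    """
    catalog = json.loads(catalog_text)
    datasets = catalog.get("dataset", []) if isinstance(catalog, dict) else catalog
    return [d.get("title", "(untitled)") for d in datasets]


def fetch_catalog(domain, timeout=30):
    """Download https://<domain>/data.json and return it as text.

    'domain' is a placeholder for whichever agency domain you choose;
    this hypothetical helper does no retrying or error handling.
    """
    url = f"https://{domain}/data.json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Offline example with a tiny, made-up catalog in the version 1.1 layout.
sample = """
{
  "dataset": [
    {"title": "Example dataset A", "modified": "2015-10-01"},
    {"title": "Example dataset B", "modified": "2015-09-15"}
  ]
}
"""
print(dataset_titles(sample))  # ['Example dataset A', 'Example dataset B']
```

Against a live agency you would call `dataset_titles(fetch_catalog(domain))`; real inventories can be several megabytes, so expect the download to take a while.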

It looks like JSON is still somewhat new, even in the U.S. In July 2015 I took a quick look at the state of open data in various nations. Looking at Hong Kong and Singapore, I found that 20 Hong Kong datasets were available in JSON, but there seemed to be none in Singapore, where the datasets were mostly in XML format.

Ten years ago, XML was the primary data interchange format. When it came on the scene, it was a breath of fresh air and a vast improvement over the truly appalling SGML (Standard Generalized Markup Language).

It enabled people to do previously unthinkable things, like exchange Microsoft Office documents across HTTP connections. With all the dissatisfaction surrounding XML, it’s easy to forget just how crucial it was in the evolution of the web in its capacity as a “Swiss Army Knife of the internet.”

But it’s no secret that in the last few years, a bold transformation has been afoot in the world of data interchange. The more lightweight, bandwidth-non-intensive JSON (JavaScript Object Notation) has emerged not just as an alternative to XML, but rather as a potential full-blown successor. A variety of historical forces are now converging and conspiring to render XML less and less relevant and to crown JSON as the privileged data format of the global digital architecture of the future. I think that the only question is how near that future is.
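One concrete reason JSON is called "lightweight" is that it has no closing tags. The Python sketch below encodes the same two made-up records both ways with the standard library and prints the sizes. It is a single toy example, not a benchmark; real size differences depend on the data and on how the XML is designed (attributes instead of child elements, for instance, would shrink it).

```python
import json
import xml.etree.ElementTree as ET

# Two made-up records, encoded once as JSON and once as XML.
records = [
    {"id": 1, "name": "Jason", "role": "captain"},
    {"id": 2, "name": "Orpheus", "role": "musician"},
]

# Compact JSON: no whitespace between tokens.
json_text = json.dumps(records, separators=(",", ":"))

# Equivalent XML: one <member> element per record, one child per field.
root = ET.Element("crew")
for rec in records:
    member = ET.SubElement(root, "member")
    for key, value in rec.items():
        ET.SubElement(member, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

print(len(json_text), json_text)
print(len(xml_text), xml_text)
```

Note also that JSON keeps `1` as a number, while XML stores every value as text, so reading the XML back gives the string `"1"` unless you convert it yourself.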

Well then, that inspired me to experiment with accessing JSON data and to play with it. I looked for R packages that would help me do it, and I found the jsonlite package. I installed it, and after some frustrating moments I was able to convert two or three JSON data files into standard R data frames.
Looking for JSON data, I found that one convenient source was the open data site hosted by the U.S. Government. From there I found the Biodiversity by County data for the State of New York. I wrote a short R script to download the data and convert it into a data frame.
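My R script is not reproduced here, but the core step, turning a JSON array of records into a table, can be sketched in Python with the standard library. In R, jsonlite's `fromJSON()` simplifies an array of records like this into a data frame by default. The field names below (`county`, `category`, `common_name`) and their values are hypothetical, not the actual columns of the New York dataset, and the helpers `records_to_columns` and `records_to_csv` are illustrative names.

```python
import csv
import io
import json


def records_to_columns(json_text):
    """Turn a JSON array of flat objects into a dict of equal-length columns."""
    records = json.loads(json_text)
    # Collect every key in first-seen order so ragged records still line up.
    names = list(dict.fromkeys(key for rec in records for key in rec))
    # Missing values become None, much like NA in an R data frame.
    return {name: [rec.get(name) for rec in records] for name in names}


def records_to_csv(json_text):
    """Write the same records as CSV text, one row per JSON object."""
    records = json.loads(json_text)
    names = list(dict.fromkeys(key for rec in records for key in rec))
    buffer = io.StringIO()
    # restval="" writes an empty cell where a record lacks a field.
    writer = csv.DictWriter(buffer, fieldnames=names, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample; note the second record has no "common_name".
sample = """
[
  {"county": "Albany", "category": "Animal", "common_name": "Example bird"},
  {"county": "Albany", "category": "Plant"}
]
"""
columns = records_to_columns(sample)
print(columns["common_name"])  # ['Example bird', None]
print(records_to_csv(sample))
```

For a real download, read the saved file's text first (for example with `open(path, encoding="utf-8").read()`) and pass it to either function. Nested objects would need flattening first; this sketch assumes every record is flat.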

The whole exercise took 91.3 seconds on my i5 laptop with 8 GB of RAM running Windows 7.
I dare you to mess with JSON. Who's afraid of JSON, anyway?
