Wednesday, November 4, 2015

Dummy meets JSON - I


It must have been a not-so-intelligent autodidact's typical approach to something new for him.

There's this website posting some data in JSON format. I stumbled upon it. More accurately it was a series of stumbling. I tried to find out more about open data on the web; stumbled upon APIs (application programming interface); couldn't understand much; looked for examples; found more about JSON; stumbled upon Data.gov website; looked for ways to access data in JSON format; found and installed jasonlite an R package; found some example scripts for accessing JSON with jasonlite; got a couple of successes with simple JSON data like the one on biodiversity of New York State demonstrated in my last post.

Well, that wasn't the end of my story. From the API Resources for Federal Agencies available here, I downloaded the individual_apis.json file. The following is a part of the file opened by notepad and you can read and understand everything written there.


With the following code I could access the JASON file from R.

The code head(x) gives the first six rows of x;


Moving the slider in the scroll bar to the right shows the remaining part.


All this seems fine. So I tried to write the whole of “x” to a comma separated value (csv) text file.

# write csv file
write.csv(x, file = "individual_apis.csv", row.names = FALSE)

It couldn't be done. The error message was:

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'

I tried looking for possible solution on the web and tried out many that seemed promising. But failed miserably. Finally I found a post by Mark Needham (R: write.csv – unimplemented type ‘list’ in ‘EncodeElement’) that showed the remedy for a problem with the error message that exactly matches mine. He advised that:

If we do have a list that we want to add to the data frame we need to convert it to a vector first so we don’t run into this type of problem”

In our case, as well as his, the data frame consisted of two columns and the second column happened to be a list instead of a vector. The difference between his data frame and mine was that his second column consisted of data elements when converted to a vector has the same number of elements as the first column, whereas my second column consisted of a list of data frames each with two columns and variable number of rows. And I don't have enough knowledge of R to be able to adapt his method to my more complex situation.
Then as my second and third screenshots above showed, the whole of my data frame could be displayed on the screen properly. So in a typical dummy way I realized that if I could transfer that output to a text file it would be the solution. The only way I know of for that is to use the “sink” function and after some struggle I succeeded. My script runs:

Then the saved text file is read in two parts, cleaned, combined into one data frame, and csv file written and is done (credit due to f3lix for the answer to How to trim leading and trailing whitespace in R? on Stack Overflow).
.

However, I couldn't be triumphant for long because when I compared contents of the csv file with the output on the console I found at once a discrepancy. The console displayed for the 8th row:

Cropland Data Layer, Quick Stats API, http://www.nass.usda.gov/research/Cropland/sarsfaqs2.html#_Cropscape1.2, http://quickstats.nass.usda.gov/api

while individual_apis.csv gives:

Cropland Data Layer, Quick Stats API, http://www.nass.usda.gov/research/Cropland/sarsfaqs2.html.


So it missed #_Cropscape1.2, http://quickstats.nass.usda.gov/apiand I couldn't get that right. Besides there could be lot more misses or some other errors which means I'll have to look for a better way.  

1 comment:

  1. Nov. 7: Found unreadable piece of code after the first screenshot (has to be replaced with an image); and missing third screenshot. Corrected. My apologies.

    ReplyDelete