Thursday, November 5, 2015

Dummy meets JSON – II


Before I tried out my dumb approach shown in my last post, I actually had a workable idea: flatten the second column, named “apis”, which is a list of 102 data frames, each with two variables, “name” and “url”, and a variable number of rows.

To do so, I needed to find a way to convert the list of 102 data frames into a single data frame and then combine it with the first column of the original data frame. For that I found this to work perfectly:
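The code was posted as a screenshot, which is lost here; a minimal sketch of the collapsing step, with a toy x standing in for the real data frame that jsonlite returns, would be:

```r
# Toy stand-in for the real data frame from jsonlite::fromJSON():
# a "name" column plus a list column "apis" of small data frames
x <- data.frame(name = c("Agency A", "Agency B"))
x$apis <- list(
  data.frame(name = c("api1", "api2"), url = c("http://a/1", "http://a/2")),
  data.frame(name = "api3", url = "http://b/3")
)

# Collapse the list of data frames into one data frame
apis_flat <- do.call(rbind, x$apis)
nrow(apis_flat)  # 3 here; 421 with the real file
```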
This single data frame consisted of 421 rows. To combine it with x$name, which has 102 rows, I could create an id column in the collapsed data frame and, using the row.names of x as integers, merge the two data frames. Then I could extract the appropriate columns of the merged data frame to write to a csv file. The problem was that I didn't know how to add the id to the collapsed data frame. That's why I resorted to my dumb approach given in my last post.

Now I found that I could add an id to the list of data frames x$apis (or x[[2]]) with:
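That code, too, was an image; a sketch of the step, again with a toy x in place of the real jsonlite output:

```r
# Toy stand-in for the real x (from jsonlite::fromJSON)
x <- data.frame(name = c("Agency A", "Agency B"))
x$apis <- list(
  data.frame(name = c("api1", "api2"), url = c("http://a/1", "http://a/2")),
  data.frame(name = "api3", url = "http://b/3")
)

# Tag every data frame in the list with the row of x it belongs to
for (i in seq_along(x$apis)) x$apis[[i]]$id <- i
x$apis[[1]]$id  # 1 1
```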
That let me get a not-so-dumb solution as follows:
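Since the script itself was posted as an image, here is a sketch of the whole pipeline. The fromJSON call is the real one, but a toy JSON string replaces the downloaded file so the sketch runs on its own:

```r
library(jsonlite)  # install.packages("jsonlite") if not yet installed

# A toy JSON string with the same shape as individual_apis.json
txt <- '[
  {"name": "Agency A",
   "apis": [{"name": "api1", "url": "http://a/1"},
            {"name": "api2", "url": "http://a/2"}]},
  {"name": "Agency B",
   "apis": [{"name": "api3", "url": "http://b/3"}]}
]'
x <- fromJSON(txt)  # the real run used fromJSON("individual_apis.json")

# Add an id to each small data frame (the loop the gurus would frown at)
for (i in seq_along(x$apis)) x$apis[[i]]$id <- i

apis_flat <- do.call(rbind, x$apis)  # collapse to one data frame
x$id <- as.integer(row.names(x))     # ids matching the rows of x
merged <- merge(x[, c("id", "name")], apis_flat, by = "id",
                suffixes = c(".agency", ".api"))

write.csv(merged[, c("name.agency", "name.api", "url")],
          file = "individual_apis.csv", row.names = FALSE)
```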

But wait, the R gurus would like to see something more elegant than this. Instead of the “for” loop in my script, they would like to see something that doesn't use a loop. Well, that is beyond my reach right now.
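For the record, one loop-free alternative might look like this; it is only a guess at what a guru would write, shown on the same toy x:

```r
# Toy stand-in for the real x
x <- data.frame(name = c("Agency A", "Agency B"))
x$apis <- list(
  data.frame(name = c("api1", "api2"), url = c("http://a/1", "http://a/2")),
  data.frame(name = "api3", url = "http://b/3")
)

# Map() cbinds an id onto each small data frame; no explicit loop
apis_flat <- do.call(rbind, Map(cbind, x$apis, id = seq_along(x$apis)))
apis_flat$id  # 1 1 2
```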


Wednesday, November 4, 2015

Dummy meets JSON – I


It must have been a not-so-intelligent autodidact's typical approach to something new to him.

There's this website posting some data in JSON format. I stumbled upon it. More accurately, it was a series of stumblings: I tried to find out more about open data on the web; stumbled upon APIs (application programming interfaces); couldn't understand much; looked for examples; found out more about JSON; stumbled upon the Data.gov website; looked for ways to access data in JSON format; found and installed jsonlite, an R package; found some example scripts for accessing JSON with jsonlite; and got a couple of successes with simple JSON data like the one on the biodiversity of New York State demonstrated in my last post.

Well, that wasn't the end of my story. From the API Resources for Federal Agencies available here, I downloaded the individual_apis.json file. The following is a part of the file opened in Notepad; you can read and understand everything written there.
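The screenshot is not reproduced here, but the shape of the file is an array of agency objects, each carrying its own array of apis; an invented fragment for illustration:

```json
[
  {
    "name": "Some Agency (invented example)",
    "apis": [
      { "name": "Some API",    "url": "http://example.gov/api"  },
      { "name": "Another API", "url": "http://example.gov/api2" }
    ]
  }
]
```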


With the following code I could access the JSON file from R.
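The code in the screenshot is lost; the essential call would be the one below, with a tiny stand-in file written first so the sketch runs on its own:

```r
library(jsonlite)  # install.packages("jsonlite") if needed

# Stand-in for the downloaded file, so the sketch is self-contained
writeLines('[{"name": "Agency A",
              "apis": [{"name": "api1", "url": "http://a/1"}]}]',
           "individual_apis.json")

x <- fromJSON("individual_apis.json")
str(x)  # a data frame: column "name" plus list column "apis"
```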

The code head(x) gives the first six rows of x:


Moving the slider in the scroll bar to the right shows the remaining part.


All this seems fine. So I tried to write the whole of “x” to a comma separated value (csv) text file.

# write csv file
write.csv(x, file = "individual_apis.csv", row.names = FALSE)

It couldn't be done. The error message was:

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'

I tried looking for a possible solution on the web and tried out many that seemed promising, but failed miserably. Finally I found a post by Mark Needham (R: write.csv – unimplemented type ‘list’ in ‘EncodeElement’) that showed the remedy for a problem whose error message exactly matched mine. He advised:

“If we do have a list that we want to add to the data frame we need to convert it to a vector first so we don’t run into this type of problem”

In my case, as in his, the data frame consisted of two columns, and the second column happened to be a list instead of a vector. The difference between his data frame and mine was that his second column consisted of data elements which, when converted to a vector, gave the same number of elements as the first column, whereas my second column consisted of a list of data frames, each with two columns and a variable number of rows. And I don't have enough knowledge of R to be able to adapt his method to my more complex situation.
Then, as my second and third screenshots above showed, the whole of my data frame could be displayed on the screen properly. So, in a typical dummy way, I realized that if I could transfer that output to a text file, it would be the solution. The only way I know of to do that is the “sink” function, and after some struggle I succeeded. My script runs:
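The script was posted as an image; the core of the sink trick, sketched with a toy x, is:

```r
# Toy stand-in for the real x from jsonlite
x <- data.frame(name = c("Agency A", "Agency B"))
x$apis <- list(
  data.frame(name = "api1", url = "http://a/1"),
  data.frame(name = "api2", url = "http://b/2")
)

sink("individual_apis.txt")  # divert console output to a text file
print(x)                     # the same display as on the console
sink()                       # back to normal console output
```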

Then the saved text file is read in two parts, cleaned, combined into one data frame, and the csv file is written (credit due to f3lix for the answer to How to trim leading and trailing whitespace in R? on Stack Overflow).
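The trimming step from f3lix's answer boils down to a gsub with anchored whitespace patterns:

```r
# Trim leading and trailing whitespace (f3lix's Stack Overflow recipe)
trim <- function(s) gsub("^\\s+|\\s+$", "", s)

trim("   Quick Stats API   ")  # "Quick Stats API"
```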

However, I couldn't be triumphant for long, because when I compared the contents of the csv file with the output on the console, I at once found a discrepancy. The console displayed, for the 8th row:

Cropland Data Layer, Quick Stats API, http://www.nass.usda.gov/research/Cropland/sarsfaqs2.html#_Cropscape1.2, http://quickstats.nass.usda.gov/api

while individual_apis.csv gives:

Cropland Data Layer, Quick Stats API, http://www.nass.usda.gov/research/Cropland/sarsfaqs2.html.


So it missed “#_Cropscape1.2, http://quickstats.nass.usda.gov/api” and I couldn't get that right. Besides, there could be a lot more misses or some other errors, which means I'll have to look for a better way.