Tuesday, December 18, 2018

Teashop PI, ERRATA!


Last Sunday, my wife and I hosted my younger co-workers, The 150-Gang, for breakfast at their den near the great Shwedagon pagoda. As usual, it was more about casual conversation than breakfast.
Back home, I thought of giving them links to the posts on PI that I published here in October 2014.

To my horror, when I opened my post “Teashop PI” of October 9, 2014 I found the first four links to be wrong!

How come? All four pointed to the preceding four posts on “Old Myanmar land area calculation”. Has nobody noticed that for more than four years? Or have I failed totally as a blogger because nobody cared to read them?

Anyway, here are the corrected links:




Tuesday, December 11, 2018

Playing with inequality analysis using the DAD software- III


It was indeed a happy family reunion of Gini, Lorenz and DAD. Now, uncle R is inviting them for a free lunch.

I am now computing the Gini coefficient and drawing the Lorenz curve with the IC2 package of R. I recently installed this package, which one document describes as follows:

Description 
The package IC2 implements the computation of some indices of inequality and concentration. For each index, it provides decomposition between subgroups. Plotting of Lorenz and concentration curves are also available.

Details 
Three family of inequality indices are available: extended Gini, Atkinson and Generalized Entropy. Except for GEI, two different forms of decomposition are available. Ordinary as well as generalized Lorenz curves can be drawn. Sampling weights can be used.

Compute Gini coefficient with IC2
While creating the ASCII data file for importing into the DAD software, I had also saved the data as an R data file, ineq_HHdata.RData. The following R script computed the coefficients:
# retrieve data
load(file="ineq_HHdata.RData")

# compute Gini coefficient
library(IC2)
Gincome <- calcSGini(xm$income, w = xm$hhweight)
Gconsump <- calcSGini(xm$consump, w = xm$hhweight)
cbind(Income = unlist(Gincome[[1]]), Consumption = unlist(Gconsump[[1]]))
##                    Income Consumption
## index.SGini     0.3192903   0.3066726
## parameter.param 2.0000000   2.0000000
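
For readers curious about what lies behind the number, here is a minimal base-R sketch of a weighted Gini coefficient, computed as one minus twice the area under the Lorenz curve (trapezoid rule). The helper weighted_gini() is a hypothetical name of my own and is not part of IC2; calcSGini may use a different estimator internally, so the two need not agree to the last decimal.

```r
# Weighted Gini coefficient from the area under the Lorenz curve.
# weighted_gini() is a hypothetical helper, not an IC2 function.
weighted_gini <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)                          # sort units from poorest to richest
  x <- x[ord]
  w <- w[ord]
  p <- c(0, cumsum(w) / sum(w))            # cumulative population share
  L <- c(0, cumsum(w * x) / sum(w * x))    # cumulative share of total x
  # trapezoid rule: 2 * area under the curve = sum(dp * (L_left + L_right))
  1 - sum(diff(p) * (head(L, -1) + tail(L, -1)))
}

weighted_gini(c(5, 5, 5, 5))             # everybody equal
weighted_gini(c(0, 0, 0, 1))             # one unit holds everything
weighted_gini(c(1, 3), w = c(3, 1))      # same as weighted_gini(c(1, 1, 1, 3))
```

With the data of this post, the call would presumably be weighted_gini(xm$income, w = xm$hhweight).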

Draw Lorenz curve with IC2
# draw Lorenz curves
curveLorenz(xm$income, w = xm$hhweight, xlab = "Percentiles (p)", ylab = "L(p)", col = "blue", lwd = 2)
curveLorenz(xm$consump, w = xm$hhweight, xlab = NA, ylab = NA, col = "red", lwd = 2, add = TRUE)
title(main = "Lorenz curve(s) by IC2")
legend("top",legend = c("Income", "Consumption"), cex = .8, lty = c(1,1), col = c("blue","red"), lwd = 2)
[Figure: Lorenz curves for income (blue) and consumption (red), drawn with IC2]
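
The Lorenz curve itself is just a set of cumulative shares. The sketch below builds those points in base R for a toy vector; lorenz_points() is a hypothetical helper of my own, not an IC2 function, and the toy incomes are made up.

```r
# Points of a weighted Lorenz curve: cumulative population share (p)
# against cumulative share of the variable (L).
# lorenz_points() is a hypothetical helper, not an IC2 function.
lorenz_points <- function(x, w = rep(1, length(x))) {
  ord <- order(x)                          # poorest first
  x <- x[ord]
  w <- w[ord]
  data.frame(
    p = c(0, cumsum(w) / sum(w)),
    L = c(0, cumsum(w * x) / sum(w * x))
  )
}

toy <- lorenz_points(c(10, 20, 30, 40))    # made-up incomes
toy

# To draw it with base graphics:
# plot(toy$p, toy$L, type = "l", xlab = "Percentiles (p)", ylab = "L(p)")
# abline(0, 1, lty = 2)                    # 45-degree line of perfect equality
```

The further the curve sags below the 45-degree line, the more unequal the distribution.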

Playing with inequality analysis using the DAD software- II


I will now compute the Gini coefficient and draw the Lorenz curve using the data saved in DAD format in the file ineq_HHdata.dad, created as shown in my last post.

Compute Gini coefficient with DAD
  1. I open the data file ineq_HHdata.dad. If you check the sample design in the Edit menu, you'll see that the design parameters Stratum, PSU, LSU, and Sampling Weight are there. When I select Gini/S Gini Index from the Inequality menu, the “Configuration of distributions” popup opens. I click OK, and the dialog box for configuring the computation opens:

When I select “income” for Variable of Interest and click the Compute button, I get the Gini coefficient for income. I do the same for consump(tion). Then, in the graph window, I click the Save button and choose to save as an HTML file. The HTML file shows:


Draw Lorenz curve with DAD
  1. I select the Curve menu, then Lorenz, and then OK. This brings up the Lorenz curve dialog box:

I select “income” for Variable of Interest and click the Graph button. At first, the graph window is empty.

When I click “Draw”, the Lorenz curve for income is drawn. To get the 45-degree line, I select Properties from the Tool menu, tick Draw 45 degree Line, and click OK. I repeat this for the consumption variable and draw both curves in the graph window. Then I add an appropriate legend and title using the Tool/Properties menu and save the graph as a jpg file. The final result:




Sunday, December 9, 2018

Playing with inequality analysis using the DAD software- I

Recently an old friend whom I hadn't met for many years asked me to find suitable software for him to play around with some income-expenditure data. It seems he had calculated the Gini coefficient and the like by hand from this data. Maybe that was forty years ago, perhaps while preparing his economics-diploma term paper.

A long time ago, I remember giving him the DAD (Distributive Analysis) software, but it seems he was too busy to do anything with it. Now, knowing that I dabble in R, he says he doesn't know R and wants something simpler. So I downloaded the latest version of DAD (version 4.6, 2010), tested it, and now I am ready to send him the software together with a few suggestions for getting started. At the same time, for the benefit of my fellow (old) dummies, and for our young people who would hopefully be curious about this business, I am posting my experience. As always, mostly for fun, and possibly for some use.

The DAD website says:
DAD is designed to facilitate the analysis and the comparisons of social welfare, inequality, poverty and equity across distributions of standard living. Its features include the estimation of a large number of indices and curves that are useful for distributive comparisons as well as the provision of asymptotic standard errors to enable statistical inference.
DAD: A Software for Distributive Analysis / Analyse Distributive. This programme is freely distributed and freely available. Please acknowledge its use by quoting it as Jean-Yves Duclos, Abdelkrim Araar and Carl Fortin, "DAD: a software for Distributive Analysis / Analyse Distributive", MIMAP programme, International Development Research Centre, Government of Canada, and CIRPÉE, Université Laval.


For importing data, DAD can read files with the extensions dat, txt, or prn, or more generally any ASCII file with any extension, provided you specify it. This post shares my experience of creating a data file that DAD can read and then importing it. The next post will show how the Gini coefficient is calculated and the Lorenz curve is drawn with DAD, and optionally with the IC2 package of R.

However, getting the DAD software is a story in itself. I still have DAD 4.4, but to get version 4.6, their website here asked me to register. The problem was that in the field where I needed to select my country, I couldn't find either “Myanmar” or “Burma”, so the registration failed. Fortunately, when I sent them an email about it, one of the authors of DAD, Mr. Abdelkrim Araar, promptly and kindly advised me to use this website, and it worked. Thanks.

Creating ASCII data file for importing data into DAD 
(1) You can type in data using DAD's own spreadsheet-like interface. Or, if the data is in Excel format, you can create an ASCII data file by exporting to csv or prn format with Excel's own export facility. Or you could convert a non-ASCII data file using your pet software, for example, R.

(2) For our exercise, we will try to do the same kind of inequality analysis using two different software tools: (i) DAD, and (ii) the IC2 R package. The World Bank's ADePT package can do inequality analysis and a lot more. I am not going to do our exercise with ADePT, but I will download an example dataset that comes with it. That will allow us to check our results against those given in ADePT's example report.

Getting and converting ADePT example data

(3) ADePT was downloaded from here. The download page includes example datasets “adept_example_data” for a number of countries from LSMS surveys. But the data are in dta (Stata) format.

(4) This dataset was imported into R and then exported as a tab-delimited ASCII file, ineq_HHdata.txt (to avoid the problems of a comma-delimited csv file, such as character data containing commas), and imported into DAD.

(5) Some variable names could be changed to English, e.g. “starost” = Age, “pol” = Gender, “srodstvo” = Relationship to Household head, “obrazovanje” = Education, and “aktivnost” = Economic status. But you could do it easily at database creation time in DAD, so I'll leave that as is.

(6) The dataset contains household as well as individual (household member) data. However, our purpose is to do some basic inequality analysis, so we create a file containing only the household/household head data. This was done in R and exported as tab-delimited ASCII data. DAD needs a PSU id in addition to the stratum id and hhweight, which were already in the data file. So I assumed that each unique hhweight equates to a distinct PSU and went on to create the PSU ids. This was the R script:

# import Stata data file into R 
library(foreign)
INEQdata <- read.dta("adept_2002.dta")

# sort the data by household ID, and filter the rows only for HH head 
# using piping for the first time!
library(magrittr)
x <- INEQdata[order(INEQdata$id),]%>%
subset(srodstvo == "Head of the household") 
head(x)

##      mesto urban   region hhweight    income starost    pol
## 6392     1 Urban Belgrade 364.8254  1099.873      40   Male
## 2        1 Urban Belgrade 364.8254  3655.284      53   Male
## 6397     1 Urban Belgrade 364.8254  3481.157      48 Female
## 6402     1 Urban Belgrade 364.8254  9023.255      53   Male
## 6404     1 Urban Belgrade 364.8254  5155.616      79 Female
## 6406     1 Urban Belgrade 364.8254 10029.057      46   Male
##                   srodstvo                        obrazovanje aktivnost
## 6392 Head of the household                     Primary school  Employed
## 2    Head of the household  Vocational schools from 1-3 years  Employed
## 6397 Head of the household  Vocational schools from 1-3 years  Employed
## 6402 Head of the household  Vocational schools from 1-3 years  Employed
## 6404 Head of the household                        High school  Inactive
## 6406 Head of the household                     Primary school  Employed
##       pline_u  pline_l id   consump stratum
## 6392 6281.115 4643.848  1  2019.232       1
## 2    6281.115 4643.848  2  9011.678       1
## 6397 6281.115 4643.848  3  2954.915       1
## 6402 6281.115 4643.848  4 30448.318       1
## 6404 6281.115 4643.848  5 13631.313       1
## 6406 6281.115 4643.848  6 10990.324       1
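
# add PSU id for unique hhweight
# keep mesto, urban, region, hhweight (columns 1:4), id (13) and stratum (15);
# retain the first row for each distinct hhweight, ordered by stratum and id
xx <- x[, c(1:4, 13, 15)] %>%
  extract(!duplicated(.$hhweight), ) %>%
  extract(order(.$stratum, .$id), )
# number the distinct weights 1, 2, ... and use that number as the PSU id
xx$PSU <- seq.int(nrow(xx))
# attach the PSU id to every household through its hhweight
xx.1 <- xx[, c(4, 7)]
xm <- merge(x, xx.1, by = "hhweight") %>%
  extract(order(.$PSU, .$id), )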
# preview the first six rows of data
head(xm)

##      hhweight mesto urban   region    income starost    pol
## 2896 364.8254     1 Urban Belgrade  1099.873      40   Male
## 2897 364.8254     1 Urban Belgrade  3655.284      53   Male
## 2898 364.8254     1 Urban Belgrade  3481.157      48 Female
## 2899 364.8254     1 Urban Belgrade  9023.255      53   Male
## 2900 364.8254     1 Urban Belgrade  5155.616      79 Female
## 2901 364.8254     1 Urban Belgrade 10029.057      46   Male
##                   srodstvo                        obrazovanje aktivnost
## 2896 Head of the household                     Primary school  Employed
## 2897 Head of the household  Vocational schools from 1-3 years  Employed
## 2898 Head of the household  Vocational schools from 1-3 years  Employed
## 2899 Head of the household  Vocational schools from 1-3 years  Employed
## 2900 Head of the household                        High school  Inactive
## 2901 Head of the household                     Primary school  Employed
##       pline_u  pline_l id   consump stratum PSU
## 2896 6281.115 4643.848  1  2019.232       1   1
## 2897 6281.115 4643.848  2  9011.678       1   1
## 2898 6281.115 4643.848  3  2954.915       1   1
## 2899 6281.115 4643.848  4 30448.318       1   1
## 2900 6281.115 4643.848  5 13631.313       1   1
## 2901 6281.115 4643.848  6 10990.324       1   1
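
The PSU numbering above rests on my assumption that each unique hhweight is one PSU. The same idea can be sketched in a couple of base-R lines on a toy data frame; the values below are made up, and this is a simplified variant rather than the exact script used for the post (it numbers the weights in order of first appearance after sorting by stratum and id).

```r
# Toy household data; the values are made up for illustration only.
hh <- data.frame(
  id       = 1:6,
  stratum  = c(1, 1, 1, 1, 2, 2),
  hhweight = c(364.8, 364.8, 410.2, 410.2, 298.5, 298.5)
)

# Sort by stratum and id, then number the distinct weights in order of
# first appearance. Treating each unique hhweight as one PSU is an assumption.
hh <- hh[order(hh$stratum, hh$id), ]
hh$PSU <- match(hh$hhweight, unique(hh$hhweight))
hh
```

If two different PSUs happened to share the same weight, this shortcut (like the script above) would merge them, so it is only a stand-in for a real PSU identifier.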

# write data file in tab delimited ASCII format
write.table(xm, "ineq_HHdata.txt", row.names = FALSE, sep ="\t")
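
A quick way to check an exported file before handing it to DAD is to read it back into R. The sketch below does this round trip on a toy data frame written to a temporary file; the region labels are made up, with a comma placed in one of them on purpose.

```r
# Toy data; the labels are made up, and one of them contains a comma.
toy <- data.frame(
  id     = 1:2,
  region = c("Belgrade, urban", "Vojvodina"),
  income = c(1099.873, 3655.284),
  stringsAsFactors = FALSE
)

# write a tab-delimited file, as in the script above, then read it back
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, row.names = FALSE, sep = "\t")
header_line <- readLines(tmp, n = 1)     # first line holds the variable names
back <- read.delim(tmp, stringsAsFactors = FALSE)
unlink(tmp)                              # remove the temporary file

header_line
identical(back$region, toy$region)       # the comma did not split the field
```

Because the fields are separated by tabs, a comma inside a character value does not start a new column when the file is read.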

Using the DAD software
(7) When the DAD software is run, a spreadsheet appears. When I click the file open button and select the saved “ineq_HHdata.txt” file, the Data Import Wizard opens. Here, when I unselect the Space delimiter and select the Tab delimiter and First row includes name of variables, the data is displayed correctly in the Data Import Wizard. Then I click Advanced and select dot for the delimiter:

[Screenshots: Data Import Wizard settings in DAD]

This way, I get the data into the DAD database correctly, and I can then save the file in DAD format by clicking the save button (saved as “ineq_HHdata.dad”):

[Screenshot: saving the data in DAD format]

Before saving I specify the sample design by selecting Set sample Design from the Edit menu:

[Screenshot: Set sample Design dialog in DAD]