Sunday, December 9, 2018

Playing with inequality analysis using the DAD software- I

Recently an old friend whom I haven't met for many years asked me to find suitable software for him to play around with some income-expenditure data. It seems he had calculated the Gini coefficient and the like by hand from this data. Maybe that was forty years ago, perhaps while preparing his economics-diploma term paper.

A long time ago, I remember giving him the DAD (Distributive Analysis) software, but it seems he was too busy to do anything with it. Now, knowing that I dabble in R, he says he doesn't know R and wants something simpler. So I downloaded the latest version of DAD (version 4.6, 2010), tested it, and now I am ready to send him the software together with a few suggestions for getting started. At the same time, for the benefit of my fellow (old) dummies, and for our young people who would hopefully be curious about this business, I am posting my experience. As always, mostly for fun, and possibly for some use.

The DAD website says:
DAD is designed to facilitate the analysis and the comparisons of social welfare, inequality, poverty and equity across distributions of standard living. Its features include the estimation of a large number of indices and curves that are useful for distributive comparisons as well as the provision of asymptotic standard errors to enable statistical inference.
DAD: A Software for Distributive Analysis / Analyse Distributive. This programme is freely distributed and freely available. Please acknowledge its use by quoting it as Jean-Yves Duclos, Abdelkrim Araar and Carl Fortin, "DAD: a software for Distributive Analysis / Analyse Distributive", MIMAP programme, International Development Research Centre, Government of Canada, and CIRPÉE, Université Laval.


For importing data, DAD can read files with the extensions dat, txt, or prn, or more generally an ASCII file with any extension, provided you specify it. This post shares my experience of creating a data file that DAD can read and then importing it. The next post will show how the Gini coefficient is calculated and the Lorenz curve is drawn with DAD, and optionally with the IC2 package for R.

However, getting the DAD software is a story in itself. I still have DAD 4.4, but to get version 4.6, their website here asked me to register. The problem was that in the field where I needed to select my country, I couldn't find either “Myanmar” or “Burma”, so the registration failed. Fortunately, when I sent them an email about it, one of the authors of DAD, Mr. Abdelkrim Araar, promptly and kindly advised me to use this website, and it worked. Thanks.

Creating an ASCII data file for importing data into DAD
(1) You can type in data using DAD's own spreadsheet-like interface. Or, if the data is in Excel format, you can create an ASCII data file by exporting to csv or prn format with Excel's own export facility. Or you could convert a non-ASCII data file using your pet software, for example, R.

(2) For our exercise, we will try to do the same kind of inequality analysis using two different pieces of software: (i) DAD, and (ii) the IC2 R package. The World Bank's ADePT package can do inequality analysis and a lot more besides. I am not going to do our exercise with ADePT, but I will download an example dataset that comes with it. That will allow us to check our results against those given in ADePT's example report.

Getting and converting ADePT example data

(3) ADePT was downloaded from here. The download page includes example datasets “adept_example_data” for a number of countries from LSMS surveys. But the data are in dta (Stata) format.

(4) This dataset was imported into R and then exported as a tab-delimited ASCII file, ineq_HHdata.txt (to avoid the problems of a comma-delimited csv file, such as character data containing commas), and then imported into DAD.
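As a small illustration of why a tab delimiter is convenient here, the sketch below writes a toy data frame (made-up rows, with one hypothetical text value that contains a comma) to a tab-delimited file and reads it back. The column names are borrowed from the dataset; everything else is an assumption for the sake of the example.

```r
# Toy data (hypothetical values): one character field contains a comma
toy <- data.frame(
  id = 1:2,
  obrazovanje = c("Primary school", "High school, unfinished"),
  consump = c(2019.232, 9011.678),
  stringsAsFactors = FALSE
)

# Write tab-delimited text to a temporary file
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, row.names = FALSE, sep = "\t")

# Every line (header included) should have the same number of tab-separated fields
n_fields <- count.fields(tmp, sep = "\t")
print(n_fields)

# Read the file back and compare the text column with the original
back <- read.delim(tmp, stringsAsFactors = FALSE)
print(identical(back$obrazovanje, toy$obrazovanje))
```

The comma inside the text value does not matter, because only tabs separate the fields.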

(5) Some variable names could be changed to English, e.g. “starost” = Age, “pol” = Gender, “srodstvo” = Relationship to Household head, “obrazovanje” = Education, and “aktivnost” = Economic status. But you could do it easily at database creation time in DAD, so I'll leave that as is.
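If you would rather rename the variables in R before exporting, a minimal sketch could look like this. The toy data frame and the short English names (age, gender, and so on) are my own choices for illustration, following the translations given above.

```r
# Toy data frame with a few of the original column names (values are made up)
hh <- data.frame(
  starost  = c(40, 53),
  pol      = c("Male", "Female"),
  srodstvo = "Head of the household",
  consump  = c(2019.2, 9011.7)
)

# Old name -> English name, following the translations in the text
new_names <- c(starost = "age", pol = "gender", srodstvo = "relationship",
               obrazovanje = "education", aktivnost = "econ_status")

# Rename only the columns that are actually present in the data frame
hit <- names(hh) %in% names(new_names)
names(hh)[hit] <- unname(new_names[names(hh)[hit]])
print(names(hh))
```

Columns that are not in the lookup vector (here consump) keep their original names.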

(6) The dataset contains household as well as individual (household member) data. However, our purpose is to do some basic inequality analysis, and therefore we create a file containing only the household/household head data. This was done in R and exported as tab-delimited ASCII data. Since DAD needs a PSU id in addition to the stratum id and hhweight, both of which were already in the data file, I assumed that each unique hhweight corresponds to a distinct PSU and went on to create the PSU ids. This was the R script:

# import Stata data file into R 
library(foreign)
INEQdata <- read.dta("adept_2002.dta")

# sort the data by household ID, and filter the rows only for HH head 
# using piping for the first time!
library(magrittr)
x <- INEQdata[order(INEQdata$id), ] %>%
  subset(srodstvo == "Head of the household")
head(x)

##      mesto urban   region hhweight    income starost    pol
## 6392     1 Urban Belgrade 364.8254  1099.873      40   Male
## 2        1 Urban Belgrade 364.8254  3655.284      53   Male
## 6397     1 Urban Belgrade 364.8254  3481.157      48 Female
## 6402     1 Urban Belgrade 364.8254  9023.255      53   Male
## 6404     1 Urban Belgrade 364.8254  5155.616      79 Female
## 6406     1 Urban Belgrade 364.8254 10029.057      46   Male
##                   srodstvo                        obrazovanje aktivnost
## 6392 Head of the household                     Primary school  Employed
## 2    Head of the household  Vocational schools from 1-3 years  Employed
## 6397 Head of the household  Vocational schools from 1-3 years  Employed
## 6402 Head of the household  Vocational schools from 1-3 years  Employed
## 6404 Head of the household                        High school  Inactive
## 6406 Head of the household                     Primary school  Employed
##       pline_u  pline_l id   consump stratum
## 6392 6281.115 4643.848  1  2019.232       1
## 2    6281.115 4643.848  2  9011.678       1
## 6397 6281.115 4643.848  3  2954.915       1
## 6402 6281.115 4643.848  4 30448.318       1
## 6404 6281.115 4643.848  5 13631.313       1
## 6406 6281.115 4643.848  6 10990.324       1
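
# add PSU id for unique hhweight
# columns 1:4, 13, 15 = mesto, urban, region, hhweight, id, stratum
xx <- x[, c(1:4, 13, 15)] %>%
  extract(!duplicated(.$hhweight), ) %>%   # keep the first row of each distinct hhweight
  extract(order(.$stratum, .$id), )        # order those rows by stratum, then id
xx$PSU <- seq.int(nrow(xx))                # number them 1, 2, 3, ...
xx.1 <- xx[, c(4, 7)]                      # lookup table: hhweight and PSU
# attach the PSU id to every household via its hhweight
xm <- merge(x, xx.1, by = "hhweight") %>%
  extract(order(.$PSU, .$id), )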
# preview the first six rows of data
head(xm)

##      hhweight mesto urban   region    income starost    pol
## 2896 364.8254     1 Urban Belgrade  1099.873      40   Male
## 2897 364.8254     1 Urban Belgrade  3655.284      53   Male
## 2898 364.8254     1 Urban Belgrade  3481.157      48 Female
## 2899 364.8254     1 Urban Belgrade  9023.255      53   Male
## 2900 364.8254     1 Urban Belgrade  5155.616      79 Female
## 2901 364.8254     1 Urban Belgrade 10029.057      46   Male
##                   srodstvo                        obrazovanje aktivnost
## 2896 Head of the household                     Primary school  Employed
## 2897 Head of the household  Vocational schools from 1-3 years  Employed
## 2898 Head of the household  Vocational schools from 1-3 years  Employed
## 2899 Head of the household  Vocational schools from 1-3 years  Employed
## 2900 Head of the household                        High school  Inactive
## 2901 Head of the household                     Primary school  Employed
##       pline_u  pline_l id   consump stratum PSU
## 2896 6281.115 4643.848  1  2019.232       1   1
## 2897 6281.115 4643.848  2  9011.678       1   1
## 2898 6281.115 4643.848  3  2954.915       1   1
## 2899 6281.115 4643.848  4 30448.318       1   1
## 2900 6281.115 4643.848  5 13631.313       1   1
## 2901 6281.115 4643.848  6 10990.324       1   1

# write data file in tab delimited ASCII format
write.table(xm, "ineq_HHdata.txt", row.names = FALSE, sep ="\t")
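
For readers who find the merge step above a bit roundabout, here is a more compact variation of the same idea on toy data (made-up ids, strata and weights). It is a sketch of an alternative, not the script I actually used: it sorts by stratum and id first and then numbers the distinct hhweight values in order of first appearance with match(). It relies on the same assumption, that one distinct hhweight means one PSU.

```r
# Toy household-head data (made-up values): ids, strata and household weights
x_toy <- data.frame(
  id       = 1:6,
  stratum  = c(1, 1, 1, 2, 2, 2),
  hhweight = c(364.8, 364.8, 410.2, 410.5, 298.1, 410.5)
)

# Order by stratum and id, so PSUs are numbered in order of first appearance
x_toy <- x_toy[order(x_toy$stratum, x_toy$id), ]

# Assumption carried over from the text: one distinct hhweight = one PSU
x_toy$PSU <- match(x_toy$hhweight, unique(x_toy$hhweight))
print(x_toy)
```

Note that match() compares the weights for exact equality, just as duplicated() does in the script above, so weights that differ only in a far decimal place would be treated as different PSUs.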

Using the DAD software
(7) When the DAD software is run, a spreadsheet appears. When I click the file-open button and select the saved “ineq_HHdata.txt” file, the Data Import Wizard opens. Here, when I unselect the Space delimiter and select the Tab delimiter and “First row includes name of variables”, the data is displayed correctly in the Data Import Wizard. Then I click Advanced and select dot for the delimiter:

[Screenshots: DAD Data Import Wizard]
This way, I get the data into the DAD database correctly, and then I can go on to save the file in DAD format by clicking the save button (saving it as “ineq_HHdata.dad”):

[Screenshot: saving the file in DAD format]
Before saving I specify the sample design by selecting Set sample Design from the Edit menu:

[Screenshot: Set sample Design]
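
Since the PSU ids were derived from hhweight on an assumption, it may be worth a quick check in R before setting the sample design. In a stratified design a PSU normally belongs to a single stratum, so the sketch below looks for PSU ids that appear in more than one stratum. The toy values are made up; in practice the stratum and PSU columns of the exported data would be used.

```r
# Toy design columns (made-up values); in practice these would come from xm
design <- data.frame(
  stratum  = c(1, 1, 1, 2, 2),
  PSU      = c(1, 1, 2, 3, 2),
  hhweight = c(364.8, 364.8, 410.2, 298.1, 410.2)
)

# Number of distinct strata in which each PSU appears
strata_per_psu <- tapply(design$stratum, design$PSU,
                         function(s) length(unique(s)))

# PSU ids that span more than one stratum (ideally none)
bad_psu <- names(strata_per_psu)[strata_per_psu > 1]
print(bad_psu)
```

In this toy example PSU 2 turns up in both strata, which is exactly the kind of case the hhweight assumption could produce when two strata happen to share a weight.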

1 comment:

  1. Careless me! The extension for the data file to be saved was given as ".daf". Doesn't work. Corrected to "ineq_HHdata.dad".
