Recently an old friend whom I haven't met for many years asked me to find suitable software for him to play around with some income-expenditure data. It seems he had calculated the Gini coefficient and the like, by hand, from this data. Maybe that was forty years ago, perhaps while preparing his economics diploma term paper.
A long time ago, I remember giving him the DAD (Distributive Analysis) software, but it seems he was too busy to do anything with it. Now, knowing that I dabble in R, he says he doesn't know R and wants something simpler. So I downloaded the latest version of DAD (version 4.6, 2010), tested it, and now I am ready to send him the software together with a few suggestions for getting started. At the same time, for the benefit of my fellow (old) dummies, and for our young people who would hopefully be curious about this business, I am posting my experience. As always, mostly for fun, and possibly for some use.
The DAD website says:
DAD is designed to facilitate the analysis and the comparisons of social welfare, inequality, poverty and equity across distributions of living standards. Its features include the estimation of a large number of indices and curves that are useful for distributive comparisons as well as the provision of asymptotic standard errors to enable statistical inference.
DAD: A Software for Distributive Analysis / Analyse Distributive. This programme is freely distributed and freely available. Please acknowledge its use by quoting it as Jean-Yves Duclos, Abdelkrim Araar and Carl Fortin, "DAD: a software for Distributive Analysis / Analyse Distributive", MIMAP programme, International Development Research Centre, Government of Canada, and CIRPÉE, Université Laval.
For importing data, DAD can read files with the extensions dat, txt, or prn, or more generally any ASCII file, whatever its extension, if you specify it. This post shares my experience of creating a data file that DAD can read and then importing it. The next post will show how the Gini coefficient is calculated and the Lorenz curve is drawn with DAD, and optionally with the IC2 package for R.
However, getting the DAD software is a story in itself. I still have DAD 4.4, but to get version 4.6, their website asked me to register. The problem was that in the field where I needed to select my country, I couldn't find either “Myanmar” or “Burma”, so the registration failed. Fortunately, when I sent them an email about it, one of the authors of DAD, Mr. Abdelkrim Araar, promptly and kindly advised me to use another website, and it worked. Thanks.
Creating ASCII data file for importing data into DAD
(1) You can type in data using DAD's own spreadsheet-like interface. Or, if the data is in Excel format, you can create an ASCII data file by exporting it to csv or prn format with Excel's own export facility. Or you could convert a non-ASCII data file using your pet software, for example, R.
(2) For our exercise, we will try to do the same kind of inequality analysis using two different software packages: (i) DAD, and (ii) the IC2 R package. The World Bank's ADePT package can do inequality analysis and a lot more besides. I am not going to do our exercise with ADePT, but I will download an exercise dataset that comes with it. That will allow us to check our results against those given in ADePT's example report.
Getting and converting ADePT example data
(3) I downloaded ADePT from its download page, which also offers example datasets (“adept_example_data”) from LSMS surveys for a number of countries. But the data are in dta (Stata) format.
(4) This dataset was imported into R and then exported as a tab-delimited ASCII file, ineq_HHdata.txt (to avoid the problems of a comma-delimited csv file, such as character data containing commas), and imported into DAD.
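The comma problem mentioned in step (4) can be illustrated with a tiny made-up data frame. The column values below are invented for the illustration and are not taken from the ADePT data. This sketch uses only base R: it writes the same data once with tabs and once with commas, without quoting, and then counts the fields that a simple delimiter-splitting reader would see on each line.

```r
# toy data: one character value contains a comma (invented example values)
toy <- data.frame(id = 1:2,
                  obrazovanje = c("Primary school", "College, 2 years"),
                  stringsAsFactors = FALSE)

tab_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

# write without quotes, so only the separator marks field boundaries
write.table(toy, tab_file, sep = "\t", row.names = FALSE, quote = FALSE)
write.table(toy, csv_file, sep = ",",  row.names = FALSE, quote = FALSE)

# count fields per line the way a plain delimiter-splitting reader would
tab_fields <- count.fields(tab_file, sep = "\t", quote = "")
csv_fields <- count.fields(csv_file, sep = ",",  quote = "")

tab_fields  # the same number of fields on every line
csv_fields  # the line with the embedded comma gets one field too many
```

Quoting the character fields would also protect the commas, but then the importing program has to understand the quoting rules. A tab is simply much less likely to turn up inside the data.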
(5) Some variable names could be changed to English, e.g. “starost” = Age, “pol” = Gender, “srodstvo” = Relationship to Household head, “obrazovanje” = Education, and “aktivnost” = Economic status. But you could do it easily at database creation time in DAD, so I'll leave that as is.
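If you would rather rename the variables in R before exporting, a minimal sketch could look like this. The toy data frame stands in for the real data, the lookup vector "english" is a name I made up for the illustration, and the new names are shortened versions of the translations listed above.

```r
# toy stand-in for the household data (values invented)
d <- data.frame(id = 1:2,
                starost = c(40, 53),
                pol = c("Male", "Female"),
                stringsAsFactors = FALSE)

# lookup: original variable name -> English name
english <- c(starost     = "Age",
             pol         = "Gender",
             srodstvo    = "Relationship",
             obrazovanje = "Education",
             aktivnost   = "EconomicStatus")

# rename only the columns found in the lookup; leave the rest untouched
hit <- names(d) %in% names(english)
names(d)[hit] <- unname(english[names(d)[hit]])

names(d)
```

Columns such as id that are not in the lookup keep their names, so the same lines can be run on the full data frame or on any subset of its columns.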
(6) The dataset contains household as well as individual (household member) data. However, our purpose is to do some basic inequality analysis, so we create a file containing only the household/household head data. This was done in R and exported as tab-delimited ASCII data. DAD needs a PSU id in addition to the stratum id and hhweight, which were already in the data file, so I assumed that each unique hhweight value corresponds to a distinct PSU and created the PSU ids accordingly. This was the R script:
# import Stata data file into R
library(foreign)
INEQdata <- read.dta("adept_2002.dta")
# sort the data by household ID, and filter the rows only for HH head
# using piping for the first time!
library(magrittr)
x <- INEQdata[order(INEQdata$id), ] %>%
  subset(srodstvo == "Head of the household")
head(x)
## mesto urban region hhweight income starost pol
## 6392 1 Urban Belgrade 364.8254 1099.873 40 Male
## 2 1 Urban Belgrade 364.8254 3655.284 53 Male
## 6397 1 Urban Belgrade 364.8254 3481.157 48 Female
## 6402 1 Urban Belgrade 364.8254 9023.255 53 Male
## 6404 1 Urban Belgrade 364.8254 5155.616 79 Female
## 6406 1 Urban Belgrade 364.8254 10029.057 46 Male
## srodstvo obrazovanje aktivnost
## 6392 Head of the household Primary school Employed
## 2 Head of the household Vocational schools from 1-3 years Employed
## 6397 Head of the household Vocational schools from 1-3 years Employed
## 6402 Head of the household Vocational schools from 1-3 years Employed
## 6404 Head of the household High school Inactive
## 6406 Head of the household Primary school Employed
## pline_u pline_l id consump stratum
## 6392 6281.115 4643.848 1 2019.232 1
## 2 6281.115 4643.848 2 9011.678 1
## 6397 6281.115 4643.848 3 2954.915 1
## 6402 6281.115 4643.848 4 30448.318 1
## 6404 6281.115 4643.848 5 13631.313 1
## 6406 6281.115 4643.848 6 10990.324 1
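# add PSU id for unique hhweight
# keep one row per distinct hhweight, ordered by stratum and household id
xx <- x[, c("mesto", "urban", "region", "hhweight", "id", "stratum")] %>%
  extract(!duplicated(.$hhweight), ) %>%
  extract(order(.$stratum, .$id), )
# number the distinct weights 1..n; this number serves as the PSU id
xx$PSU <- seq.int(nrow(xx))
# lookup table: hhweight -> PSU
xx.1 <- xx[, c("hhweight", "PSU")]
# attach the PSU id to every household head, then sort by PSU and household id
xm <- merge(x, xx.1, by = "hhweight") %>%
  extract(order(.$PSU, .$id), )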
# preview the first six rows of data
head(xm)
## hhweight mesto urban region income starost pol
## 2896 364.8254 1 Urban Belgrade 1099.873 40 Male
## 2897 364.8254 1 Urban Belgrade 3655.284 53 Male
## 2898 364.8254 1 Urban Belgrade 3481.157 48 Female
## 2899 364.8254 1 Urban Belgrade 9023.255 53 Male
## 2900 364.8254 1 Urban Belgrade 5155.616 79 Female
## 2901 364.8254 1 Urban Belgrade 10029.057 46 Male
## srodstvo obrazovanje aktivnost
## 2896 Head of the household Primary school Employed
## 2897 Head of the household Vocational schools from 1-3 years Employed
## 2898 Head of the household Vocational schools from 1-3 years Employed
## 2899 Head of the household Vocational schools from 1-3 years Employed
## 2900 Head of the household High school Inactive
## 2901 Head of the household Primary school Employed
## pline_u pline_l id consump stratum PSU
## 2896 6281.115 4643.848 1 2019.232 1 1
## 2897 6281.115 4643.848 2 9011.678 1 1
## 2898 6281.115 4643.848 3 2954.915 1 1
## 2899 6281.115 4643.848 4 30448.318 1 1
## 2900 6281.115 4643.848 5 13631.313 1 1
## 2901 6281.115 4643.848 6 10990.324 1 1
# write data file in tab delimited ASCII format
write.table(xm, "ineq_HHdata.txt", row.names = FALSE, sep ="\t")
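The PSU numbering in the script rests on my assumption that one hhweight value means one PSU. On a toy data frame (all values invented), the same idea can be sketched more compactly in base R with match(), which is a different route from the merge() used in the script. The result can then be checked to make sure no PSU straddles two strata.

```r
# toy household file: three distinct weights across two strata (invented values)
hh <- data.frame(id       = 1:6,
                 stratum  = c(1, 1, 1, 1, 2, 2),
                 hhweight = c(364.8, 364.8, 410.2, 410.2, 298.5, 298.5))

# sort by stratum and household id, as in the script
hh <- hh[order(hh$stratum, hh$id), ]

# PSU id = position of each weight among the distinct weights,
# taken in order of first appearance
hh$PSU <- match(hh$hhweight, unique(hh$hhweight))

# sanity check: each PSU should belong to exactly one stratum
strata_per_psu <- tapply(hh$stratum, hh$PSU, function(s) length(unique(s)))
all(strata_per_psu == 1)

hh
```

If the check fails on real data, the same weight occurs in more than one stratum, and the "one weight, one PSU" assumption would need another look before the sample design is set in DAD.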
Using the DAD software
(7) When DAD is run, a spreadsheet appears. When I click the file open button and select the saved “ineq_HHdata.txt” file, the Data Import Wizard opens. Here, when I unselect the Space delimiter and select the Tab delimiter and First row includes name of variables, the data is correctly displayed in the Data Import Wizard. Then I click Advanced and select dot for the delimiter.
This way, I get the data into the DAD database correctly, and then I can save the file in DAD format by clicking the save button (saved as “ineq_HHdata.dad”).
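Before going through the wizard, the file can also be checked from R with the same kind of settings: tab as the delimiter, variable names in the first row, and dot as the decimal mark. Here I am assuming that the dot option under Advanced refers to the decimal mark. The sketch writes a small invented file to a temporary location instead of using the real ineq_HHdata.txt.

```r
# small invented stand-in for ineq_HHdata.txt
toy <- data.frame(hhweight = c(364.8254, 364.8254),
                  region   = c("Belgrade", "Belgrade"),
                  consump  = c(2019.232, 9011.678),
                  stringsAsFactors = FALSE)
f <- tempfile(fileext = ".txt")
write.table(toy, f, row.names = FALSE, sep = "\t")

# read it back: tab delimiter, first row holds variable names, dot as decimal mark
chk <- read.table(f, header = TRUE, sep = "\t", dec = ".",
                  stringsAsFactors = FALSE)

str(chk)                               # numeric columns should come back as numeric
isTRUE(all.equal(chk$consump, toy$consump))  # TRUE if the values round-trip
```

If a numeric column comes back as character here, the wizard is likely to have the same trouble with it, so it is worth fixing in R first.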
Before saving, I specify the sample design by selecting Set sample Design from the Edit menu: