A long time ago, I remember giving him the DAD (Distributive Analysis) software, but it seems he was too busy to do anything with it. Now, knowing that I dabble in R, he says he doesn't know R and wants something simpler. So I downloaded the latest version of DAD (version 4.6, 2010), tested it, and am now ready to send him the software together with a few suggestions for getting started. At the same time, for the benefit of my fellow (old) dummies, and for our young people who will hopefully be curious about this business, I am posting my experience. As always, mostly for fun, and possibly for some use.
The DAD website says:
DAD is designed to facilitate the analysis and the comparisons of social welfare, inequality, poverty and equity across distributions of standard living. Its features include the estimation of a large number of indices and curves that are useful for distributive comparisons as well as the provision of asymptotic standard errors to enable statistical inference.
DAD: A Software for Distributive Analysis / Analyse Distributive. This programme is freely distributed and freely available. Please acknowledge its use by quoting it as Jean-Yves Duclos, Abdelkrim Araar and Carl Fortin, "DAD: a software for Distributive Analysis / Analyse Distributive", MIMAP programme, International Development Research Centre, Government of Canada, and CIRPÉE, Université Laval.
However, getting the DAD software is a story in itself. I still have DAD 4.4, but to get version 4.6, their website here asked me to register with them. The problem was that the field where I had to select my country offered neither “Myanmar” nor “Burma”, so the registration failed. Fortunately, when I sent them an email about it, one of the authors of DAD, Mr. Abdelkrim Araar, promptly and kindly advised me to use this website, and it worked. Thanks.
(1) You can type in data using DAD's own spreadsheet-like interface. Or, if the data is in Excel format, you can create an ASCII data file by exporting to csv or prn format with its own export facility. Or you could convert a non-ASCII data file using your pet software, for example, R.
(2) For our exercise, we will try to do the same kind of inequality analysis using two different pieces of software: (i) DAD, and (ii) the IC2 R package. The World Bank's ADePT package can do inequality analysis and a lot more besides. I am not going to do our exercise with ADePT, but I will download an exercise dataset that comes with it. That will allow us to check our results against those given in ADePT's example report.
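As a point of reference for the kind of index such packages report, here is a minimal base-R sketch of a weighted Gini coefficient. The function name `weighted_gini` and the toy numbers are my own illustration, not code from DAD, IC2 or ADePT, and those packages may apply corrections (for example for small samples) that this sketch leaves out.

```r
# Weighted Gini coefficient from the Lorenz curve (illustrative sketch only).
# y: living standard (e.g. consumption); w: sampling weights (default: equal).
weighted_gini <- function(y, w = rep(1, length(y))) {
  ord <- order(y)
  y <- y[ord]
  w <- w[ord] / sum(w)                  # normalised weights, sorted by y
  cum_y <- cumsum(w * y) / sum(w * y)   # Lorenz ordinates (cumulative share of y)
  prev_y <- c(0, head(cum_y, -1))       # Lorenz ordinate at the previous point
  lorenz_area <- sum(w * (cum_y + prev_y) / 2)   # trapezoid rule
  1 - 2 * lorenz_area
}

# Toy checks with made-up numbers
weighted_gini(c(5, 5, 5, 5))              # everybody equal
weighted_gini(c(0, 0, 0, 10))             # one household has everything
weighted_gini(c(0, 10), w = c(3, 1))      # same distribution, expressed with weights
```

The last two calls describe the same population, once row by row and once through weights, so they should agree. That is a handy check that the weights are being used the way you expect.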
Getting and converting ADePT example data
(3) ADePT was downloaded from here. The download page includes example datasets “adept_example_data” for a number of countries from LSMS surveys. But the data are in dta (Stata) format.
(4) This dataset was imported into R and then exported as a tab-delimited ASCII file, ineq_HHdata.txt (tab rather than comma, to avoid the problems a comma-delimited csv file has with character data containing commas), and imported into DAD.
(5) Some variable names could be changed to English, e.g. “starost” = Age, “pol” = Gender, “srodstvo” = Relationship to Household head, “obrazovanje” = Education, and “aktivnost” = Economic status. But you could do it easily at database creation time in DAD, so I'll leave that as is.
(6) The dataset contains household as well as individual (household member) data. However, our purpose is to do some basic inequality analysis, so we create a file containing only the household/household head data. This was done in R and exported to tab-delimited ASCII data. DAD needs a PSU id in addition to the stratum id and hhweight, which were already in the data file, so I assumed that each unique hhweight corresponds to a distinct PSU and went on to create the PSU ids. This was the R script:
# import Stata data file into R
library(foreign)
INEQdata <- read.dta("adept_2002.dta")
# sort the data by household ID, and filter the rows only for HH head
# using piping for the first time!
library(magrittr)
x <- INEQdata[order(INEQdata$id), ] %>%
  subset(srodstvo == "Head of the household")
head(x)
## mesto urban region hhweight income starost pol
## 6392 1 Urban Belgrade 364.8254 1099.873 40 Male
## 2 1 Urban Belgrade 364.8254 3655.284 53 Male
## 6397 1 Urban Belgrade 364.8254 3481.157 48 Female
## 6402 1 Urban Belgrade 364.8254 9023.255 53 Male
## 6404 1 Urban Belgrade 364.8254 5155.616 79 Female
## 6406 1 Urban Belgrade 364.8254 10029.057 46 Male
## srodstvo obrazovanje aktivnost
## 6392 Head of the household Primary school Employed
## 2 Head of the household Vocational schools from 1-3 years Employed
## 6397 Head of the household Vocational schools from 1-3 years Employed
## 6402 Head of the household Vocational schools from 1-3 years Employed
## 6404 Head of the household High school Inactive
## 6406 Head of the household Primary school Employed
## pline_u pline_l id consump stratum
## 6392 6281.115 4643.848 1 2019.232 1
## 2 6281.115 4643.848 2 9011.678 1
## 6397 6281.115 4643.848 3 2954.915 1
## 6402 6281.115 4643.848 4 30448.318 1
## 6404 6281.115 4643.848 5 13631.313 1
## 6406 6281.115 4643.848 6 10990.324 1
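# add PSU id: keep the first row for each unique hhweight,
# order those rows by stratum and id, and number them 1, 2, 3, ...
xx <- x[, c(1:4, 13, 15)] %>%   # mesto, urban, region, hhweight, id, stratum
  extract(!duplicated(.$hhweight), ) %>%
  extract(order(.$stratum, .$id), )
xx$PSU <- seq.int(nrow(xx))
# attach the PSU id to every household by matching on hhweight
xx.1 <- xx[, c(4, 7)]           # hhweight, PSU
xm <- merge(x, xx.1, by = "hhweight") %>%
  extract(order(.$PSU, .$id), )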
# preview the first six rows of data
head(xm)
## hhweight mesto urban region income starost pol
## 2896 364.8254 1 Urban Belgrade 1099.873 40 Male
## 2897 364.8254 1 Urban Belgrade 3655.284 53 Male
## 2898 364.8254 1 Urban Belgrade 3481.157 48 Female
## 2899 364.8254 1 Urban Belgrade 9023.255 53 Male
## 2900 364.8254 1 Urban Belgrade 5155.616 79 Female
## 2901 364.8254 1 Urban Belgrade 10029.057 46 Male
## srodstvo obrazovanje aktivnost
## 2896 Head of the household Primary school Employed
## 2897 Head of the household Vocational schools from 1-3 years Employed
## 2898 Head of the household Vocational schools from 1-3 years Employed
## 2899 Head of the household Vocational schools from 1-3 years Employed
## 2900 Head of the household High school Inactive
## 2901 Head of the household Primary school Employed
## pline_u pline_l id consump stratum PSU
## 2896 6281.115 4643.848 1 2019.232 1 1
## 2897 6281.115 4643.848 2 9011.678 1 1
## 2898 6281.115 4643.848 3 2954.915 1 1
## 2899 6281.115 4643.848 4 30448.318 1 1
## 2900 6281.115 4643.848 5 13631.313 1 1
## 2901 6281.115 4643.848 6 10990.324 1 1
# write data file in tab delimited ASCII format
write.table(xm, "ineq_HHdata.txt", row.names = FALSE, sep ="\t")
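The PSU step in the script can also be sketched more compactly. This is only an alternative under the same assumption (each distinct hhweight is one PSU), shown on a made-up data frame `hh` whose values are hypothetical. It also assumes the rows are already sorted by stratum and id, so that numbering the weights in order of first appearance gives the same ids as the script.

```r
# Toy household-head data (hypothetical values), already sorted by stratum and id
hh <- data.frame(
  id       = 1:6,
  stratum  = c(1, 1, 1, 1, 2, 2),
  hhweight = c(364.8, 364.8, 410.2, 410.2, 298.5, 298.5)
)

# Assumption carried over from the script: one distinct hhweight = one PSU.
# match() gives every row the position of its weight among the unique weights,
# numbered in order of first appearance.
hh$PSU <- match(hh$hhweight, unique(hh$hhweight))

# Sanity check: every PSU should fall inside exactly one stratum.
# If two strata happened to share a weight, this would stop with an error.
psu_strata <- tapply(hh$stratum, hh$PSU, function(s) length(unique(s)))
stopifnot(all(psu_strata == 1))

print(hh)
```

The stratum check matters because the whole construction rests on the hhweight assumption. If it fails on the real data, the PSU ids need to come from somewhere else.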
Using the DAD software
(7) When the DAD software is run, a spreadsheet appears. When I click the file open button and select the saved “ineq_HHdata.txt” file, the Data Import Wizard opens. Here, when I deselect the Space delimiter and select the Tab delimiter and First row includes name of variables, the data are displayed correctly in the Data Import Wizard. Then I click Advanced and select dot for the delimiter.
This way, I get the data into the DAD database correctly, and I can then save the file in DAD format by clicking the save button (saved as “ineq_HHdata.dad”).
Before saving, I specify the sample design by selecting Set sample Design from the Edit menu.
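If the import wizard shows the columns in the wrong places, a quick check of the exported file in R can help. The sketch below uses a made-up two-row stand-in (`toy`, hypothetical values) written to a temporary file, not the real ineq_HHdata.txt. It assumes that the dot chosen under Advanced refers to the decimal mark.

```r
# Toy stand-in for the exported data (hypothetical values);
# note the comma inside one of the character values
toy <- data.frame(
  hhweight    = c(364.8254, 364.8254),
  obrazovanje = c("Primary school", "High school, 4 years"),
  consump     = c(2019.232, 9011.678)
)
path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE, sep = "\t")

# Every line, header included, should split into the same number of tab fields
n_fields <- count.fields(path, sep = "\t")
print(n_fields)

# Re-read with tab as the field delimiter and dot as the decimal mark
back <- read.delim(path, sep = "\t", dec = ".", stringsAsFactors = FALSE)
stopifnot(ncol(back) == ncol(toy), is.numeric(back$consump))
print(back)
```

Because the field delimiter is a tab, the comma inside "High school, 4 years" does not split the value. That is the reason for preferring tab-delimited over comma-delimited output here.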
Careless me! The extension for the data file to be saved was given as ".daf". Doesn't work. Corrected to "ineq_HHdata.dad".