To run the following code, you need to have (i) downloaded the Stata data file “myanmarcs_fy14_datafile_with_dk.dta” from the World Bank site, (ii) run the code up to step (16) in my previous post “Playing with microdata II: my first parallel coordinates plot” and saved the resulting data frame to “pcdf.RData”, and (iii) placed that file in the directory of the R Notebook project in RStudio.
load("pcdf.RData")
str(pcdf)
## 'data.frame': 511 obs. of 10 variables:
## $ row : int 1 1 1 2 2 2 3 3 3 4 ...
## $ row0 : int 1 1 1 2 2 2 3 3 3 4 ...
## $ id : int 101 101 101 102 102 102 103 103 103 104 ...
## $ g1r : Factor w/ 9 levels "Office of the President/ Prime Minster/ Minister",..: NA NA NA 5 5 5 1 1 1 5 ...
## $ col : num 18 19 20 7 13 20 7 8 0 7 ...
## $ r1 : int 18 18 18 7 7 7 7 7 7 7 ...
## $ r2 : int 19 19 19 13 13 13 8 8 8 13 ...
## $ r3 : int 20 20 20 20 20 20 NA NA NA 20 ...
## $ rid : num 1 2 3 1 2 3 1 2 3 1 ...
## $ sgcode: Factor w/ 9 levels "1","2","3","4",..: NA NA NA 5 5 5 1 1 1 5 ...
head(pcdf)
## row row0 id g1r col r1 r2 r3
## 1 1 1 101 <NA> 18 18 19 20
## 2 1 1 101 <NA> 19 18 19 20
## 3 1 1 101 <NA> 20 18 19 20
## 4 2 2 102 Private Sector/ Financial Sector/ Private Bank 7 7 13 20
## 5 2 2 102 Private Sector/ Financial Sector/ Private Bank 13 7 13 20
## 6 2 2 102 Private Sector/ Financial Sector/ Private Bank 20 7 13 20
## rid sgcode
## 1 1 <NA>
## 2 2 <NA>
## 3 3 <NA>
## 4 1 5
## 5 2 5
## 6 3 5
In my post “Playing with microdata II: my first parallel coordinates plot”, the plot at the bottom of the post was created by running the following code chunk. (Note that to get the desired plot size, we use a chunk header such as “{r fig.height=6, fig.width=6}” for a given code chunk.)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
y_levels <- levels(factor(1:35))
ggplot(pcdf, aes(x = rid, y = col, group = row)) +
geom_path(aes(size = NULL, color = sgcode),
alpha = 0.5,
lineend = 'round', linejoin = 'round')+
scale_y_discrete(limits = y_levels, expand = c(0.5, 0)) +
scale_size(breaks = NULL, range=c(1,7))
Then I left it there for the reader to try to create plots like the one shown in the last part of my post “Playing with microdata”. In fact, this last graphic consisted of six separate plots, which I couldn't find an easy way to combine on one page using ggplot2. To dodge this issue, I just combined them into a single graphic using GIMP! I think this is fine for the time being, because my primary purpose is blogging. But for sharing my R code, I should learn to place multiple plots produced by ggplot2 on one page. One promising way would be to use the multiplot function given at http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/, and there may be others.
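As a rough illustration of the one-page idea, here is a minimal sketch that lays out several ggplot2 plots with the grid package, which ships with R. It is not the cookbook's multiplot function itself; the helper name arrange_plots and the toy data are my own assumptions, used only to show the layout mechanism.

```r
library(ggplot2)
library(grid)

# Hypothetical helper (not from the cookbook page): draw a list of ggplot
# objects on one page, filling a grid row by row.
arrange_plots <- function(plots, ncol = 2) {
  nrow <- ceiling(length(plots) / ncol)
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(nrow, ncol)))
  for (i in seq_along(plots)) {
    r  <- (i - 1) %/% ncol + 1   # row of the i-th plot
    cl <- (i - 1) %% ncol + 1    # column of the i-th plot
    print(plots[[i]],
          vp = viewport(layout.pos.row = r, layout.pos.col = cl))
  }
  popViewport()
  invisible(c(nrow = nrow, ncol = ncol))
}

# Toy data standing in for the six stakeholder-group plots
toy <- data.frame(x = rep(1:3, 2),
                  y = c(1, 3, 2, 2, 1, 3),
                  g = rep(c("a", "b"), each = 3))
plots <- lapply(1:6, function(i) {
  ggplot(toy, aes(x = x, y = y, group = g)) +
    geom_path() +
    labs(title = paste("Plot", i))
})

# Six plots in a 2 x 3 layout on a single page
layout_used <- arrange_plots(plots, ncol = 3)
```

With the real plots, the list would simply hold the six ggplot objects, one per stakeholder group.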
The main idea for improving the above plot is to select one stakeholder group and plot the lines of this group in one color, in front of the lines of all other groups in another color. To do so, we (i) create Y-axis labels that represent the codes for development priorities, (ii) define the order in which the two groups of lines are drawn, (iii) create legends for the line colors with text wrapping, for the readability of the plots, and (iv) add appropriate axis labels and headings.
Create Y-axis labels
# create data frame of response codes to use with ggplot
YL <- data.frame(A=as.character(rep("a2_",35)), n=as.character(seq(1,35,1)))
q <- paste(YL$A,YL$n,sep="")
YL$rcode <- factor(q,levels=q)
head(YL)
## A n rcode
## 1 a2_ 1 a2_1
## 2 a2_ 2 a2_2
## 3 a2_ 3 a2_3
## 4 a2_ 4 a2_4
## 5 a2_ 5 a2_5
## 6 a2_ 6 a2_6
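The same labels could probably be built more compactly. The sketch below uses paste0() and keeps only the factor; the variable name rcode_alt is my own, and the point of passing levels explicitly is that the default alphabetical ordering would put a2_10 right after a2_1.

```r
# Compact alternative (a sketch, equivalent in spirit to YL$rcode above)
codes <- paste0("a2_", 1:35)
rcode_alt <- factor(codes, levels = codes)  # explicit levels keep numeric order

# Without explicit levels, the levels are sorted as strings
default_levels <- levels(factor(codes))

head(levels(rcode_alt), 3)
head(default_levels, 3)
```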
Create stakeholder group names with text wrapping
pcdf$SGname <- gsub("/","/ \n",pcdf$g1r)
sgn <- gsub("/","/ \n", levels(pcdf$g1r))
pcdf$SGname <- factor(pcdf$SGname, levels= sgn)
head(pcdf$SGname)
## [1] <NA>
## [2] <NA>
## [3] <NA>
## [4] Private Sector/ \n Financial Sector/ \n Private Bank
## [5] Private Sector/ \n Financial Sector/ \n Private Bank
## [6] Private Sector/ \n Financial Sector/ \n Private Bank
## 9 Levels: Office of the President/ \n Prime Minster/ \n Minister ...
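A shorter route to the same wrapped names may be to rewrite the factor levels in place, which keeps the level order and the underlying integer codes. This is only a sketch on a made-up two-level factor, not the survey data; the names g and g_wrapped are assumptions.

```r
# Toy factor standing in for pcdf$g1r
g <- factor(c("Private Sector/ Financial Sector/ Private Bank", "CSO", NA))

# Copy the factor and insert a line break after every slash in its levels
g_wrapped <- g
levels(g_wrapped) <- gsub("/", "/ \n", levels(g_wrapped), fixed = TRUE)

cat(levels(g_wrapped)[2], "\n")
```

Because only the level labels change, missing values stay NA and no second call to factor() is needed.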
Create new variables in which one stakeholder group name is preserved and other groups are collapsed into “All Others”
pcdf$SG_1 <- ifelse(as.integer(pcdf$g1r) == 1,
levels(pcdf$SGname)[1], "All Others")
pcdf$SG_2 <- ifelse(as.integer(pcdf$g1r) == 2,
levels(pcdf$SGname)[2], "All Others")
pcdf$SG_3 <- ifelse(as.integer(pcdf$g1r) == 3,
levels(pcdf$SGname)[3],
"All Others")
pcdf$SG_5 <- ifelse(as.integer(pcdf$g1r) == 5,
levels(pcdf$SGname)[5],
"All Others")
pcdf$SG_6 <- ifelse(as.integer(pcdf$g1r) == 6,
levels(pcdf$SGname)[6], "All Others")
pcdf$SG_7 <- ifelse(as.integer(pcdf$g1r) == 7,
levels(pcdf$SGname)[7], "All Others")
head(pcdf)
## row row0 id g1r col r1 r2 r3
## 1 1 1 101 <NA> 18 18 19 20
## 2 1 1 101 <NA> 19 18 19 20
## 3 1 1 101 <NA> 20 18 19 20
## 4 2 2 102 Private Sector/ Financial Sector/ Private Bank 7 7 13 20
## 5 2 2 102 Private Sector/ Financial Sector/ Private Bank 13 7 13 20
## 6 2 2 102 Private Sector/ Financial Sector/ Private Bank 20 7 13 20
## rid sgcode SGname
## 1 1 <NA> <NA>
## 2 2 <NA> <NA>
## 3 3 <NA> <NA>
## 4 1 5 Private Sector/ \n Financial Sector/ \n Private Bank
## 5 2 5 Private Sector/ \n Financial Sector/ \n Private Bank
## 6 3 5 Private Sector/ \n Financial Sector/ \n Private Bank
## SG_1 SG_2 SG_3
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 All Others All Others All Others
## 5 All Others All Others All Others
## 6 All Others All Others All Others
## SG_5 SG_6
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 Private Sector/ \n Financial Sector/ \n Private Bank All Others
## 5 Private Sector/ \n Financial Sector/ \n Private Bank All Others
## 6 Private Sector/ \n Financial Sector/ \n Private Bank All Others
## SG_7
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 All Others
## 5 All Others
## 6 All Others
# convert to factors
pcdf[,12:17] <- lapply(pcdf[,12:17], as.factor)
# change levels of factors
pcdf[,12] <- relevel(pcdf[,12], ref=levels(pcdf[,12])[2])
pcdf[,13] <- relevel(pcdf[,13], ref=levels(pcdf[,13])[2])
pcdf[,14] <- relevel(pcdf[,14], ref=levels(pcdf[,14])[2])
pcdf[,15] <- relevel(pcdf[,15], ref=levels(pcdf[,15])[2])
pcdf[,16] <- relevel(pcdf[,16], ref=levels(pcdf[,16])[2])
pcdf[,17] <- relevel(pcdf[,17], ref=levels(pcdf[,17])[2])
str(pcdf)
## 'data.frame': 511 obs. of 17 variables:
## $ row : int 1 1 1 2 2 2 3 3 3 4 ...
## $ row0 : int 1 1 1 2 2 2 3 3 3 4 ...
## $ id : int 101 101 101 102 102 102 103 103 103 104 ...
## $ g1r : Factor w/ 9 levels "Office of the President/ Prime Minster/ Minister",..: NA NA NA 5 5 5 1 1 1 5 ...
## $ col : num 18 19 20 7 13 20 7 8 0 7 ...
## $ r1 : int 18 18 18 7 7 7 7 7 7 7 ...
## $ r2 : int 19 19 19 13 13 13 8 8 8 13 ...
## $ r3 : int 20 20 20 20 20 20 NA NA NA 20 ...
## $ rid : num 1 2 3 1 2 3 1 2 3 1 ...
## $ sgcode: Factor w/ 9 levels "1","2","3","4",..: NA NA NA 5 5 5 1 1 1 5 ...
## $ SGname: Factor w/ 9 levels "Office of the President/ \n Prime Minster/ \n Minister",..: NA NA NA 5 5 5 1 1 1 5 ...
## $ SG_1 : Factor w/ 2 levels "Office of the President/ \n Prime Minster/ \n Minister",..: NA NA NA 2 2 2 1 1 1 2 ...
## $ SG_2 : Factor w/ 2 levels "Office of Parliamentarian",..: NA NA NA 2 2 2 2 2 2 2 ...
## $ SG_3 : Factor w/ 2 levels "Employee of a Ministry/ \n PMU/ \n Consultant on WBG project",..: NA NA NA 2 2 2 2 2 2 2 ...
## $ SG_5 : Factor w/ 2 levels "Private Sector/ \n Financial Sector/ \n Private Bank",..: NA NA NA 1 1 1 2 2 2 1 ...
## $ SG_6 : Factor w/ 2 levels "CSO","All Others": NA NA NA 2 2 2 2 2 2 2 ...
## $ SG_7 : Factor w/ 2 levels "Media","All Others": NA NA NA 2 2 2 2 2 2 2 ...
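The collapse-and-relevel steps above repeat the same pattern for each group, so they could also be written as a loop. The following is a minimal sketch on a toy data frame; the group name "Academia" and the data are made up for illustration. Instead of as.factor() followed by relevel(), it sets the two levels explicitly, so the result does not depend on how the names happen to sort.

```r
# Toy stand-in for pcdf with a three-level SGname and one missing value
toy <- data.frame(
  SGname = factor(c("Media", "CSO", NA, "Academia", "Media"),
                  levels = c("Academia", "CSO", "Media"))
)

for (k in c(2, 3)) {
  target <- levels(toy$SGname)[k]
  # Keep the selected group's name, collapse everything else; NA stays NA
  collapsed <- ifelse(as.integer(toy$SGname) == k, target, "All Others")
  # Selected group first, "All Others" second
  toy[[paste0("SG_", k)]] <- factor(collapsed,
                                    levels = c(target, "All Others"))
}

str(toy)
```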
Plot by all stakeholder groups
# plot responses for development priorities for all stakeholder groups
y_levels <- levels(YL$rcode)
ggplot(pcdf, aes(x = rid,
y = col, group=row))+
labs(title = "General Issues Facing Myanmar:",
subtitle = "Development Priority")+
xlab("Three responses")+
ylab("Response code")+
geom_path(aes(color = SGname), lineend='round',
linejoin='round', size=0)+
scale_y_discrete(limits = y_levels)+
scale_size(breaks = NULL, range = c(1, 35))
Define the order of drawing the groups of lines
The idea is from “ggplot2: Determining the order in which lines are drawn”: http://blog.mckuhn.de/2011/08/ggplot2-determining-order-in-which.html.
pcdf$o1 <- as.factor(apply(format(pcdf[,c("SG_1", "row")]), 1, paste, collapse=" "))
pcdf$o2 <- as.factor(apply(format(pcdf[,c("SG_2", "row")]), 1, paste, collapse=" "))
pcdf$o3 <- as.factor(apply(format(pcdf[,c("SG_3", "row")]), 1, paste, collapse=" "))
pcdf$o5 <- as.factor(apply(format(pcdf[,c("SG_5", "row")]), 1, paste, collapse=" "))
pcdf$o6 <- as.factor(apply(format(pcdf[,c("SG_6", "row")]), 1, paste, collapse=" "))
pcdf$o7 <- as.factor(apply(format(pcdf[,c("SG_7", "row")]), 1, paste, collapse=" "))
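To see why this fixes the drawing order, here is a small sketch on made-up data. As I understand the linked post, ggplot2 draws the groups in the order of the levels of the grouping variable, and as.factor() sorts the pasted keys as strings, so every key starting with “All Others” comes before the keys of the selected group and is therefore drawn first, behind it.

```r
# Toy stand-in for the SG_7 and row columns of pcdf
toy <- data.frame(
  SG  = factor(c("Media", "All Others", "Media", "All Others"),
               levels = c("Media", "All Others")),
  row = c(1L, 2L, 3L, 10L)
)

# Same construction as above: pad both columns, then paste them per row
key <- apply(format(toy[, c("SG", "row")]), 1, paste, collapse = " ")
o <- as.factor(key)

# Levels are sorted as strings, so the "All Others" keys come first
levels(o)
is_background <- grepl("^All Others", levels(o))
```

Note that this relies on “All Others” sorting before each selected group name, which holds for the group names used in this post.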
The previous plot shows that some respondents didn't identify the stakeholder group (SG) they belong to. So the rows with NA in pcdf$g1r have to be dropped from the data frame.
pcdf.1 <- pcdf[!is.na(pcdf$g1r),]
nrow(pcdf.1)
## [1] 445
Create plots with emphasis on a particular stakeholder group
Two plots were drawn, for stakeholder groups SG_3 and SG_7, with headings, labels for axes, a thicker line size, blue for the front lines, and yellow for the background lines. The colors are set by using “scale_colour_manual(values=c('blue','yellow'))”. I was happy playing with many combinations of two different colors before I settled on blue and yellow. Find out what a staggering number of colors you could use by running “colors()”. To learn how ggplot2 assigns colors to factor levels, Q&As such as these may be useful: https://stackoverflow.com/questions/46393082/ggplot2-why-is-color-order-of-geom-line-graphs-reversed, and https://stackoverflow.com/questions/9887342/ggplot2-plotting-order-of-factors-within-a-geom.
The legend is placed at the bottom of the plot. Plots for the remaining groups can be drawn by changing group=, the subtitle, and color= appropriately.
# for SG_3
plot <- ggplot((pcdf.1), aes(x = rid,
y = col, group=o3))+
labs(title = "General Issues Facing Myanmar: \n Development Priority",
subtitle = "(Group-3 vs. All Others)")+
xlab("Three responses")+
ylab("Response code")+
geom_path(aes(color = SG_3), lineend='round',
linejoin='round', size=.8)+
scale_colour_manual(values=c('blue','yellow'))+
scale_y_discrete(limits = y_levels)+
xlim("First","Second", "Third", "(Fourth)")+
scale_size(breaks = NULL)
plot + theme(legend.text = element_text(size = 8, hjust = .1, vjust = .1),
legend.position = "bottom")
# for SG_7
plot <- ggplot((pcdf.1), aes(x = rid,
y = col, group=o7))+
labs(title = "General Issues Facing Myanmar: \n Development Priority",
subtitle = "(Stakeholder Group-7 vs. All Others)")+
xlab("Three responses")+
ylab("Response code")+
geom_path(aes(color = SG_7), lineend='round',
linejoin='round', size=.8)+
scale_colour_manual(values=c('blue','yellow'))+
scale_y_discrete(limits = y_levels)+
xlim("First","Second", "Third", "(Fourth)")+
scale_size(breaks = NULL)
plot + theme(legend.text = element_text(size = 8, hjust = .1, vjust = .1),
legend.position = "bottom")
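Since the two chunks above differ only in the group number, the plots for the remaining groups could be produced by a small helper. The sketch below is not part of the original notebook: the function name plot_group is hypothetical, it assumes ggplot2 3.0.0 or later for the .data pronoun, and it leaves out the xlim() call for brevity. The toy data frame only mimics the columns of pcdf.1 that the helper needs.

```r
library(ggplot2)

# Hypothetical helper: plot stakeholder group k in front of all others.
# Assumes a data frame with columns rid, col, SG_<k> and o<k> built as above.
plot_group <- function(df, k, y_levels, colours = c("blue", "yellow")) {
  sg  <- paste0("SG_", k)  # colour variable, e.g. "SG_7"
  ord <- paste0("o", k)    # grouping variable that fixes the drawing order
  ggplot(df, aes(x = rid, y = col, group = .data[[ord]])) +
    labs(title = "General Issues Facing Myanmar: \n Development Priority",
         subtitle = paste0("(Stakeholder Group-", k, " vs. All Others)")) +
    xlab("Three responses") +
    ylab("Response code") +
    # size is kept to match the post; newer ggplot2 prefers linewidth
    geom_path(aes(color = .data[[sg]]), lineend = "round",
              linejoin = "round", size = .8) +
    scale_colour_manual(values = colours) +
    scale_y_discrete(limits = y_levels) +
    theme(legend.text = element_text(size = 8, hjust = .1, vjust = .1),
          legend.position = "bottom")
}

# Toy stand-in for pcdf.1 (two respondents, three responses each)
toy <- data.frame(
  rid  = rep(1:3, 2),
  col  = c(7, 13, 20, 18, 19, 20),
  SG_7 = factor(rep(c("Media", "All Others"), each = 3),
                levels = c("Media", "All Others")),
  o7   = factor(rep(c("Media      1", "All Others 2"), each = 3))
)

p7 <- plot_group(toy, 7, y_levels = paste0("a2_", 1:35))
```

With the real data, something like lapply(c(1, 2, 3, 5, 6, 7), function(k) plot_group(pcdf.1, k, y_levels)) would give the six plots as a list.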
In the last section of the narrative, "... levels for axes, thicker line size, ..." should read "... labels for axes, thicker line size, ...". Corrected but too late for the linked nb.html file.