Thursday, July 16, 2015

Fooling around and having fun with PVT


Even some election observation experts don't believe in taking samples of voting stations for a quick count or PVT. They think it is better to cover all the voting stations and do away with the risk of relying on a sample. This idea seems like common sense, but it is flawed.


Although international and domestic groups have conducted sample-based PVTs in dozens of countries since 1988, PVTs have sometimes drawn controversy in some quarters of the international community. National election authorities, foreign aid officials, and technical advisers have sometimes questioned the feasibility and accuracy of a vote count verification exercise based on statistical sampling, even though the use of statistical sampling in polling and research is widely accepted among social scientists, media organizations, public opinion researchers, and politicians around the world. They also worry that a separate, unofficial vote projection that diverges from the official count might foment postelection unrest.


Misgivings among election authorities and national political elites about the purposes and methodology of PVTs are not surprising. Election authorities rarely like the idea of independent organizations, domestic or foreign, threatening to second-guess the official results or offering their own reports of the election outcome. Foreign involvement in such exercises can also be seen as a threat to national sovereignty or an affront to national pride, because it seems to imply that national authorities require international oversight.


The reason that collecting data from all the units (a census) might not give results as reliable as collecting data from some of the units (a sample) is the vastly larger scale of operation required for the former. This is a well-known fact in the census and survey community. Even the seemingly simple and routine tasks of collecting vote count results from the voting stations, transmitting them to headquarters, and tabulating the results are no exception to this rule.


The critically important transitional elections in Indonesia in June 1999 produced considerable controversy among both domestic and international actors.


In response to substantial public mistrust of the official election authorities, a coalition of Indonesian universities called the Rectors' Forum, with advice from NDI, proposed a sample-based PVT.
... Apparently for the first time, however, development agency officials and technical advisers questioned the intellectual basis of a sample-based PVT. In particular, some PVT critics questioned the PVT's reliance on statistics. They claimed, incorrectly, that random statistical sampling would not work in the absence of extensive baseline demographic data, or could not be used for proportional representation elections. This was a fundamental misunderstanding of the principles of statistics.
Yet because of these unfounded concerns about a sample-based PVT, many Indonesian election and government officials, a number of foreign technical advisers, and some development agency officials initially opposed the PVT. Some urged instead that an independent vote tabulation should consist of a comprehensive PVT, which would attempt to collect all the results from several hundred thousand polling stations in the country, much as NAMFREL had attempted to do in the Philippines in 1986.


Subsequently, key international actors organized an unofficial comprehensive count in Indonesia, called the Joint Operations Media Center (JOMC). It was organized on behalf of the Indonesian election commission with funding and technical assistance from American, Australian, and Japanese organizations and the United Nations Development Program (UNDP). Before the election, one of the international organizers promised a “facility . . . capable of reporting reliable results of the elections at the earliest practical moment.”


The JOMC's spokesperson told the media he hoped that 50 percent of the results would be known by the day after polling.


... The JOMC was ultimately unable to collect meaningful results. By the morning after election day, it was reporting less than 1/4 of 1 percent of the vote, a meaningless number. Even by three days after the elections, the JOMC could report only 7.8 percent of the vote count, still too small to support any conclusions about the outcome of the elections. ... Rather than reassuring Indonesians and the international community about the integrity of the vote count, the JOMC parallel count actually undermined confidence by raising expectations that it could not meet. Both the sample-based PVT and the comprehensive JOMC ultimately failed to build confidence in the integrity of the reported election results.


Leaving aside the complex issues of PVT vs. exit polls, sample PVT vs. comprehensive PVT, or vote count verification in general, you may like to relax for a moment and have some fun playing around with sample sizes for a PVT using real-life voting data. You could do that with what is known as computer simulation. You could learn about the rationale and philosophy and all the nice and impressive things about simulation later, if you like (pardon me, I didn't).


To start with, you will need a bit of knowledge about using computers. I will assume that you have installed R on your computer and know how to run a script file with it. If you haven't installed the simFrame package, install it first with install.packages("simFrame").


As for the data, download the precinct-level 2012 US elections data for Texas from the Harvard Elections Data Archive on Harvard Dataverse. You could download the data file in tab-delimited text format, R data format, or the original Stata file format. Unfortunately, the R data file doesn't work; the Stata data file is fine. I don't know for sure whether precinct-level election data is the same thing as voting-station-level data. I assumed it is, and it would do no harm for the purpose of our exercise if the two are not exactly equivalent.
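Loading the Stata file into R could look like this. This is a minimal sketch: the filename "TX-2012.dta" is my own placeholder for whatever name your Dataverse download has, and the column names are the ones used later in this post.

```r
# Sketch of loading the data; the filename "TX-2012.dta" is an assumption --
# substitute the actual name of your downloaded file.
library(foreign)                      # read.dta() reads older Stata formats

tx <- read.dta("TX-2012.dta")

# Keep the presidential vote columns and drop precincts with zero votes
tx <- tx[tx$g2012_USP_tv > 0,
         c("g2012_USP_dv", "g2012_USP_rv", "g2012_USP_tv")]
nrow(tx)    # 8952 - 278 = 8674 precincts with any votes
```

If read.dta() complains about the Stata file version, the haven package's read_dta() is an alternative reader.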


The handbook for quick count/PVT by NDI mentioned in my previous post gives a detailed description of how to determine the sample size. The report by the Committee for Free and Fair Elections in Cambodia (COMFREL), Parallel Vote Tabulation Through Quick Count for 2008 National Assembly Elections, October 2008, shows that it followed the NDI approach. Among other resources, the ACE encyclopedia (version 1.1) noted: "On the whole and probably in a rather random way, one might say that there is an inclination towards doing quick counts on 10% of the population in the case of transition elections (e.g., Chile in 1988, Panama in 1989 and Bulgaria in 1990)." The Handbook for Domestic Election Observers by OSCE/ODIHR, 2003, observed similarly: "Experience shows that where there is little demographic data and the population is quite diverse, the tendency is to use a relatively large sample, such as 10 per cent of polling stations. Where the opposite is true, a smaller sample can be used and provide sufficiently credible and accurate results for national elections."


In the methodology note on PVT in Pakistan General Elections 2008: Election Results Analysis, the Free and Fair Election Network explains:


Experience with past PVTs has shown that drawing a sample of 25-30 polling stations provides sufficient data, within a relatively small margin of sampling error, to assess the reasonableness of official election results. Adding additional polling stations to the sample, even when the number of total polling stations is large, does not improve the margins of sampling error dramatically.


The reason for this statistical principle is that a PVT works with “cluster samples” – each polling station “cluster” averages 1,000 registered voters, and 25 polling stations in a constituency produces a sample of 25,000 voters (25 polling stations x 1,000 voters each) which is much more than statistically sufficient to permit comparisons with official results.
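The diminishing returns can be seen with a little arithmetic. The toy calculation below is my own illustration, not FAFEN's: it computes an approximate 95 percent margin of error for an estimated 50 percent vote share from n sampled stations out of a hypothetical constituency of 250 stations of 1,000 voters each, with an assumed design effect of 5 to account for clustering.

```r
# Toy margin-of-error calculation; N = 250 stations, 1,000 voters per
# station, and the design effect of 5 are illustrative assumptions,
# not FAFEN's figures.
moe <- function(n, N = 250, p = 0.5, deff = 5) {
  fpc    <- (N - n) / (N - 1)    # finite population correction
  voters <- n * 1000             # voters covered by the sampled stations
  1.96 * sqrt(deff * p * (1 - p) * fpc / voters)
}

# Margin of error (in percentage points) for 10, 25, 50, and 100 stations
round(100 * sapply(c(10, 25, 50, 100), moe), 2)
```

Under these assumptions, quadrupling the field effort from 25 to 100 stations only shrinks the margin of error from about 1.3 to about 0.5 percentage points, which is the point FAFEN is making.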


... As part of the world’s largest PVT, almost 16,000 Polling Station Observers (PSOs) from the Free and Fair Election Network (FAFEN) witnessed and recorded the actual vote count in a statistically valid sample of 7,778 randomly-selected polling stations during the 2008 Pakistan National and Provincial Assembly Elections. The national sample of 7,778 polling stations represented almost eight million registered voters.


Common people, and even some experts, find it hard to believe that taking 25 or 30 voting stations out of the large number in a constituency would give a good enough estimate of the true voting results. For our exercise we have downloaded the Texas data for the 2012 elections. It includes data for 8952 precincts, of which 278 have 0 votes. It covers election results for U.S. President, and for the U.S. and State House of Representatives and Senate. For this exercise you will take the votes for the President.


Here's how you could play around with the sample size for a PVT. You take a simple random sample of 25 precincts out of the 8674 with any votes. Then you total up the votes for "g2012_USP_dv" (Democratic votes), "g2012_USP_rv" (Republican votes), and "g2012_USP_tv" (total votes) in this sample. Then you expand these sample totals into estimated totals for Texas.

Theoretically you want to do this for an infinite number of samples. Obviously you can't. As someone said, running 10,000 samples won't hang your computer, and that is as close to infinity as you could comfortably get. So you would run the simulation with 10,000 samples. Finally, you would estimate the total votes for Texas by taking the mean of the estimates from each of the 10,000 samples. Then you could compare them with the known results for Texas to see how accurate they are.


Here's how I did that with the simFrame package:
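Below is a minimal sketch of such a simulation, assuming the cleaned data sits in a data frame named tx with the three vote columns used above (the object names and the seed are my own choices):

```r
# Sketch of the 10,000-sample simulation with simFrame; "tx" is my
# assumed name for the cleaned Texas data frame.
library(simFrame)
set.seed(1234)                       # make the samples reproducible

N <- nrow(tx)                        # 8674 precincts with any votes
n <- 25                              # precincts per PVT sample

# 10,000 simple random samples of 25 precincts each
sc <- SampleControl(size = n, k = 10000)

# Expand each sample's totals to population totals (multiply by N/n)
results <- runSimulation(tx, sc, fun = function(x) {
  c(Democrats  = sum(x$g2012_USP_dv) * N / n,
    Republican = sum(x$g2012_USP_rv) * N / n,
    Total      = sum(x$g2012_USP_tv) * N / n)
})

# Mean of the 10,000 estimates, to set against the true Texas totals
aggregate(results)
```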
You should get these results with the above code:
(i) For total votes


   Vote_For SimulatedTotVotes TrueTotVotes AccuracyPercent
1  Democrats           3302674      3307609           99.85
2 Republican           4562952      4568788           99.87
3      Total           7986507      7997303           99.86


(ii) For percentage of total votes


   Vote_For  SimulatedPCVotes  TruePCVotes AccuracyPercent
1  Democrats             41.35        41.36           99.98
2 Republican             57.13        57.13          100.00


In sampling terms, a PVT consists of a sample of clusters (the voting stations). When the clusters differ greatly in "size", the precision of the estimates suffers. Stratifying the voting stations by "size" and sampling independently within each group (stratum) could improve the precision of the vote count estimates.
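simFrame can draw such stratified samples directly. A hypothetical sketch, again assuming the data frame tx: the grouping into size terciles and the per-stratum sample sizes below are my own choices for illustration.

```r
# Hypothetical stratified design: split precincts into three size groups
# ("size_group" is a column I add for this purpose).
tx$size_group <- cut(tx$g2012_USP_tv,
                     breaks = quantile(tx$g2012_USP_tv, c(0, 1/3, 2/3, 1)),
                     labels = c("small", "medium", "large"),
                     include.lowest = TRUE)

# 8 + 8 + 9 = 25 precincts per sample, drawn independently in each stratum
sc <- SampleControl(design = "size_group", size = c(8, 8, 9), k = 10000)
# The control object can then be fed to runSimulation() exactly as before.
```

With terciles the strata hold nearly equal numbers of precincts, so 8 + 8 + 9 keeps the design close to proportional allocation and the simple N/n expansion remains approximately valid.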


I guess one way to look into this in our Texas data would be to draw a scatter-plot with the ratio of Republican votes to Democratic votes on the y-axis and total votes on the x-axis. We could then see whether this ratio changes with the "size" (number of voters) of the precincts.
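In base R the plot could be drawn like this (a sketch, assuming the data frame tx as before; precincts with zero Democratic votes must be dropped to keep the ratio finite):

```r
# Scatter-plot of the Republican/Democrat vote ratio against precinct size;
# "tx" is my assumed name for the cleaned data frame.
ok    <- tx$g2012_USP_dv > 0            # avoid division by zero
size  <- tx$g2012_USP_tv[ok]
ratio <- tx$g2012_USP_rv[ok] / tx$g2012_USP_dv[ok]

plot(size, ratio, pch = ".",
     xlab = "Total votes in precinct",
     ylab = "Republican votes / Democratic votes")
abline(lm(ratio ~ size), col = "red")   # regression line
```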


Here's the scatter-plot:



The same scatter-plot done with the package "hexbin" is here:
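The hexbin version could be produced like this (a sketch with the same assumed data frame tx; the hexbin package must be installed):

```r
# Hexagonally binned version of the ratio-vs-size scatter-plot;
# "tx" is my assumed name for the cleaned data frame.
library(hexbin)

ok    <- tx$g2012_USP_dv > 0            # avoid division by zero
size  <- tx$g2012_USP_tv[ok]
ratio <- tx$g2012_USP_rv[ok] / tx$g2012_USP_dv[ok]

hb <- hexbin(size, ratio, xbins = 50)
p  <- plot(hb, xlab = "Total votes in precinct",
           ylab = "Republican votes / Democratic votes")

# Add the regression line inside the hexbin plotting viewport
fit <- lm(ratio ~ size)
hexVP.abline(p$plot.vp, coef(fit)[1], coef(fit)[2], col = "red")
```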


Note that both have a regression line drawn on the graph. From these two graphs, I guess, I could make out that stratification would not be very effective in this situation. I also have a hunch that plain systematic sampling would work well here.


Although this simulation exercise is directed at PVT, it could be useful in helping convince the skeptics that sampling really works. In a sense, I was hoping to give young people and ordinary folks a peek at simulation, PVT, and sampling. Once they are interested, I'm sure they would like to try out the beautiful hexbin plots too.

Look for resources on simulation, PVT, and hexbin on the Web; learn more, experiment, and enjoy (more like advising myself)! Besides, improve on my ideas and code, would you?

Thursday, July 9, 2015

The story of Parallel Vote Tabulation


Data is just one necessary inconvenience. But who cares?
As quoted in Robert and Casella's Introducing Monte Carlo Methods with R:

"What if you haven't the data?"
"Then we shall proceed directly to the brandy and cigars."
                                                                   
                                                                     Lyndsay Faye
                        The Case of Colonel Warburton's Madness

For anyone with some familiarity with sample surveys, the advantage of parallel vote tabulation (PVT) over the exit poll as a method for verifying vote counts would be obvious. In a PVT you observe and record the actual vote counts at a voting station, whereas in an exit poll you ask a sample of voters how they voted, and that way you don't always learn how they actually voted, as in the case of the Shy Tories in the recent British election. There are also cases in which a voter drops a blank ballot, or tampers with it to get it rejected, but obviously would not reveal that.

However, in exit polls you are interviewing the voters, and therefore you could ask not only their ballot choices but also their motivations for voting that way. And that is exciting information for political parties, academics, analysts, and everyone interested. While the exit poll and the PVT basically measure the same thing (the voting results), one is potentially more accurate than the other, and each primarily serves a different purpose.

Sample survey practitioners unfamiliar with PVTs might assume that they have always been conducted as probability samples, or what are popularly known as scientific samples, of voting stations. They have not. The National Citizens' Movement for Free Elections (NAMFREL), which is acknowledged as the first adopter of the PVT, then known as a "quick count", collected vote counts from as many voting stations as it could. It ended up collecting election results from 70 percent of the 85,000 voting stations in the 1986 Philippines presidential election. Though not exactly statistically sound, this gave good enough evidence that Corazon Aquino was leading Marcos by more than half a million votes out of 20 million cast, in contrast to the officially declared victory of Marcos (Vote Count Verification, Bjornlund and Cowan, 2011, Democracy International).

It was for the 1988 plebiscite in Chile, on whether President Augusto Pinochet could continue in office, that probability sampling was introduced, with advice from the National Democratic Institute for International Affairs (NDI). The nongovernmental Committee for Free Elections responsible for it chose to do the quick count on a statistical sample of voting stations instead of trying to obtain the results from all of the voting stations in the entire country.

NDI experts Garber and Cowan "coined the term “parallel vote tabulation” in lieu of “quick count,” which they thought better reserved for an independent verification designed to project results quickly rather than to verify the results. They chose the term “parallel” to distinguish the operation from the official vote tabulation conducted by relevant authorities. They settled on the word “tabulation” to refer to the aggregation or summing of ballots rather than “count” to avoid any connotation of reviewing and recording individual ballots. Nevertheless, many donors, advisers, and observers continue to use the term “quick count” regardless whether the objective of the exercise is to project results quickly or to verify them later and regardless of whether the analysis is based on comprehensive or sample-based data."

That these two pioneering uses of the quick count, or PVT, were crucial in uncovering electoral fraud led the election monitoring community to recognize the PVT as an effective means of deterring or detecting ballot count fraud. So, by the early 1990s, PVTs had become an important tool in election monitoring.

The idea of the PVT is deceptively simple. You draw a big enough sample of voting stations, watch the vote count process and collect the results, transmit them to your headquarters, process the data there, make the estimate for the entire country through statistical analysis, and it is done! In practice, however, there is a lot more to do. You can guess as much from the contents of the NDI guide on quick counts (The Quick Count and Election Observation: An NDI Handbook for Civic Organizations and Political Parties, Estok, Nevitte, and Cowan, 2002):

CHAPTER TWO: GETTING STARTED
Leadership and Staff; Project Planning; Budgets and Fundraising

CHAPTER THREE: PROMOTING THE QUICK COUNT
Relations with Electoral Authorities; External Relations; The Media Campaign

CHAPTER FOUR: BUILDING THE VOLUNTEER NETWORK
Designing Materials; Recruiting; Training; Logistics

CHAPTER FIVE: STATISTICAL PRINCIPLES AND QUICK COUNTS
Basic Statistical Principles; Constructing the Sample

CHAPTER SIX: THE QUALITATIVE COMPONENT OF THE QUICK COUNT
Designing Observation Forms; Analyzing Qualitative Data

CHAPTER SEVEN: COLLECTING AND ANALYZING QUICK COUNT DATA
Data Reporting Protocols; Information Flows; Statistical Analysis of Quick Count Data

CHAPTER EIGHT: THE “END GAME”
Developing a Protocol for Data Use; Releasing Quick Count Data; End Game Activities;
Preparing for the Future

The handbook is rather old, but I guess much of the advice would still be applicable. In our case, I felt that the hardest tasks we would face may be those relating to chapter three. This would be particularly true for the political parties, one of the two user groups to which the handbook is directed.

The Union Election Commission of Myanmar has issued procedures for international observers (June 26, 2015; translated by GNLM, 3 July 2015) and procedures for national observers (June 26, 2015).

The accredited international as well as national observers have the following rights, among others:

1. legal protection and security of the Republic of the Union of Myanmar;
2. the right to observe and to have access to the information on the Election process;
3. the right to observe voting, vote counting and developing the voting results;
4. the right to observe in the polling station; (report to the Presiding officer and comply with his arrangement because he is performing his responsibilities.)

The third item ensures that a PVT could be performed by both accredited international and national observers. The procedures make no explicit reference to collecting information from voters outside of the voting process and voting premises, that is, the exit poll. On the strength of item 2, by extension, the accredited observers are clearly entitled to conduct exit polls. But what about the public media? Could they do exit polls freely? I guess that depends on how the authorities would interpret the exit poll. Is it election observation, subject to the above provisions of the UEC, or not?

Anyone can see that this national election, due in November 2015, is a crucial step in the democratic transition of Myanmar. And anyone can see that we need to do things right. For that, international assistance is indispensable. So we see the UEC working with the European Union Electoral Support Team, the International Institute for Democracy and Electoral Assistance (International IDEA), and the International Foundation for Electoral Systems (IFES). I guess the political parties, interest groups, and civic organizations would benefit from the cooperation of these institutions, as well as from other institutions well experienced in election observation, such as the Carter Center, and also from diplomatic missions in Myanmar, and others.

As for the general public and the small guys, they could get occasional publications in the Myanmar language from these institutions. For a stable supply of information, they could collect the spillovers of tea-shop conversations, or go to Facebook, to the immensely popular local weekly news journals, or to radio broadcasts.

If they want to know more about the quantitative side of vote count verification, they could go to The Quick Count and Election Observation: An NDI Handbook for Civic Organizations and Political Parties, by Estok, Nevitte, and Cowan, and Vote Count Verification: A User’s Guide for Funders, Implementers, and Stakeholders by Bjornlund and Cowan, mentioned earlier.

In addition to the PVT, the latter covers the exit poll, as well as post-election statistical analysis and election forensics:

Chapter 1: Overview of vote count verification and purpose of study
Chapter 2: Parallel vote tabulations
Chapter 3: Limitations of exit polls and other types of public opinion research
Chapter 4: Post-election statistical analysis and election forensics
Chapter 5: Managing vote count verification
Chapter 6: New challenges for vote count verification

If they have gone this far, they may also like to look at Report on Roundtable on Vote Count Verification by the Carter Center and Democracy International, 2011.

The possibility of conflicting conclusions from a PVT and an exit poll when both are present in an election is interesting and instructive. In the post Improving Vote Count Verification in Transitional Elections in the Electoral Insight blog of Elections Canada, March 2006, Eric Bjornlund said:

But PVTs and exit polls have sometimes worked at cross-purposes. Exit polls sponsored by international groups may distract from PVTs conducted by domestic groups or may not be reliable in less-than-free political environments. Indeed, if reliable exit polls are possible in a given country, PVTs – which tend to be more expensive and difficult to organize – are probably not necessary. Where both PVTs and exit polls exist, the results of a reliable PVT should take precedence for vote count verification, and interested parties should look to exit polls primarily for insights about voter motivation as opposed to vote count verification. ... Experiences from recent elections in Macedonia and Ukraine offer some important lessons about the need for better coordination among the sponsors of different election monitoring techniques. In Macedonia in 2002, a foreign-sponsored exit poll used to quickly project results overshadowed a well-executed PVT by a national group. This did little to advance the larger democratic development goals shared by all the organizations involved. In Ukraine in 2004, exit polls suggested fraud, but a PVT did not support this conclusion. Such discrepancies can hurt the credibility of election monitoring.

To me it is clear that there is basically nothing against having both tools in a given election so long as each is used primarily for what it does best. For us in Myanmar, I feel we need both, as we are short of reliable information relating to every part of our lives.

Well, there is at least one example of conflict between the locals and an outsider, or shall we say "People vs. Margate House Films"? "Every vote must be counted, every voice must be heard" is the name of a post by Rob Allyn that appeared in New Mandala, July 16, 2014. Rob Allyn is Chairman of Margate House, which handled TV ads for Jokowi’s 2012 campaign for Governor of Jakarta and the Prabowo-Hatta bid for the presidency in 2014.

Allyn said:

Much has been written by conspiracy theorists and crackpot websites about our work for democracy in Indonesia, where our firm handled media campaigns both for Jokowi’s 2012 victory and Prabowo-Hatta’s 2014 presidential campaign – the results of which have yet to be counted, verified and released. A few academics and journalists have been misled by this wildly inaccurate Internet chatter as they make a case for deciding the election based on so-called “quick counts” (produced by private pollsters on Jokowi’s campaign team). ...

To those who declare this election over based on alleged “quick counts” produced by members Jokowi’s campaign team, let us remember these key facts:
- Quick counts represent less than half of 1% of voters (some 2,000 voting places out of 480,000+ across the vast archipelago of the world’s third-largest democracy).
- The pollsters producing those quick counts (based on less than ½%), as well as the head of Indonesia’s polling association, are all members of Jokowi’s campaign.
- These same Jokowi-team “quick count” pollsters were cited by Aaron Connelly and other distinguished academics and journalists for refusing to disclose their poll results in June, for fear they would reveal Prabowo’s 30-point gain in the race.
- Inarguably, awarding the election based on quick counts of ½% by Jokowi’s pollsters completely disenfranchises more than 99.5% of Indonesian voters.
- That’s not democracy. In a true democracy, every vote must be counted, every ballot must be recorded, and every voice must be heard.

There were altogether 72 comments, most overwhelmingly against him. Here are a few.

Monique:
... You have confirmed my belief in publicily-funded elections, and the complete and total banning of all public or private commercial political advertising. We see what it does to the electoral process. It degrades and impugns it.

Aaron Connelly:
... I’d recommend reading Diane Zhang’s breakdown of the quick count numbers and what they mean (http://electionwatch.edu.au/indonesia-2014/what-basis-jokowis-claim-presidency). But if at this late stage you still don’t understand how quick counts work, if you still don’t understand that Jokowi won the most votes last week, I am not sure that I or anyone else here can help you.

danau:
Why is a reputable research institution like the ANU publishing this piece? It is an advertising piece and a PR manoeuver for Margate House/Rob Allyn. Where is the scholarly critique to go with this piece?

As for Rob Allyn, he can say all he wants about “working for democracy” but it is rhetorical (in the same way that much of what Prabowo says is empty rhetoric). If Allyn really was a champion for democracy the way he makes himself out to be in this piece, he would have done some background check about Prabowo before offering him Margate House’s services for his presidential campaign.

But their actions suggest that Margate House/Rob Allyn is not about democracy – instead, they are about profit. Either that, or they are very naive about the precariousness of Indonesia’s young democracy. ...

Tempodulu:
Beware of this deceit!!!! Rob Allyn is basically saying that statics sampling is a load of old baloney – WHICH IT IS NOT. Quick counts do take small samples, but the beauty of statistics is that they are EXTREMELY ACCURATE in predicting outcomes. In fact, the truth is that you only need a small sample size of around 1,500 people and you could accurately determine the percentage of Indonesians who prefer Jokowi or Prawbowo. Quick counts predict election results so accurately that in most countries they are taken as the final result (with a small margin of error). In Indonesia, all the IMPARTIAL quick counts were very similar, which backs this simple truth – JOKOWI circa 52-53%, Prabowo circa 47-48%. If Prabowo were to win the election, this would actually violate the rules of mathematics – a remarkable achievement by anyone. Perhaps Stephen Hawking might get interested!

Angrymagpie:
Now Kompas is running a story about Rob’s voice; basically summarised Rob Allyn’s article here without mentioning the fierce criticism it receives from the readers of New Mandala. I hope this won’t be used by some people to falsify an air of Rob Allyn neutrality & professionalism, or even victimhood.
http://indonesiasatu.kompas.com/read/2014/07/16/13141361/konsultan.kampanye.prabowo.asal.as.buka.suara?utm_source=WP&utm_medium=box&utm_campaign=Khlwp

danau:
I share Angrymagpie’s concerns. (Hence my multiple posts!) Merdeka is also citing ANU’s website for publishing Allyn’s post. Thankfully Merdeka has taken the time to ask for comments from Made Supriatma, who has offered some critique. But Merdeka has wrongly attributed the italicize two line introduction to Allyn (but did not mention the purpose of the publication – for public response). And they say that Allyn posted this from the US, when he clearly states that he has gone back to the US. Luckily, they have only made minor misinterpretations…for now.


Mara Dyer:
Mr. Allyn? Your client lost…Given his notorious temper and well-known habit for pulling out his hand gun during an argument, I sincerely hope you are no longer in Indonesia. If you are still here, a word of advice: RUN!

This selection of comments is necessarily subjective; better to read the post yourself.