Wednesday, May 20, 2015

Electoral fraud and Benford's Law


Allegations of electoral fraud are at least as old as elections themselves. Even granted that such fraud exists in a given situation, it may not be simple to prove. Some have said that the motivation for electoral fraud is strengthened by the fact that once you get elected, it is almost impossible to be removed.

Electoral fraud can be committed in barefaced ways as well as subtler ones. It may be tempting to think that blatant fraud would be easier to prove, yet it is typically the powerful who choose such methods, precisely because they do not care, and so the proof is all the harder to obtain. Subtler forms of fraud, meanwhile, can still bring about the desired result, especially in closely fought elections, and are then almost impossible to point a finger at. Anyway, to get started with electoral fraud, you can look it up in Wikipedia, or look for electoral abuses in the Encyclopedia Britannica, or anything similar.

Is electoral fraud widespread across space and time in our world? We don't know. You may assume that for every convicted case of electoral fraud there are several that escaped conviction, or you may treat each conviction as an isolated case. It depends on whether you are with the camp, as in the U.S., that argues election fraud is rampant and elections completely corrupt, or with the other camp which dismisses all claims of election fraud as partisan and argues instead that election fraud is nonexistent in U.S. elections. While it seems easy enough to have a gut feeling that such abuses exist anywhere, it is generally harder to detect them, and harder still to convince the courts or the election authorities of the irregularities.

Are there statistical methods available for detecting electoral fraud? Are there signatures in the data that distinguish uncertainty in data collection, process errors, and methodological biases from downright vote rigging? In this connection, the attempt to develop statistical methods to verify whether election results are accurate has come to be known as election forensics in the field of social science.

An obviously appealing approach for detecting election irregularities would seem to be comparing official voting returns with exit poll results. However, exit polls can be quite imprecise, as we saw, for example, in the very recent UK 2015 General Election. You may think that intensive polling, independent of that done by media groups, would do the trick. In fact, this idea is nothing new, and it is also unworkable. David W. Moore, Senior Gallup Poll Editor, refuted it in Exit Polls Probably Ineffective Against Vote Fraud (2004). He quoted Warren Mitofsky, the inventor of the exit poll, who said it would not work unless the size of the error in any single polling place is very large, because small errors will be undetectable. We could also note that, additionally, the voters have to say truthfully whom they voted for, which is not always the case. In the UK 2015 election it was acknowledged that "Shy Tories", those who voted Conservative but were reluctant to say so, really existed, and that this was one major factor in the downfall of the exit poll and all the other pre-election polls.
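Mitofsky's point is essentially about sampling error. A minimal sketch of the arithmetic (mine, not Moore's or Mitofsky's): with the number of interviews one can realistically get at a single polling place, the 95% margin of error alone is larger than the few percentage points a careful fraudster would need to shift.

```python
# A rough illustration (not from the cited article) of why small frauds hide
# inside exit-poll noise at a single polling place: under simple random
# sampling, the 95% margin of error for a candidate's share is already
# several percentage points for realistic interview counts.
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a share p estimated from n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):   # hypothetical numbers of interviews at one polling place
    print(f"n = {n:4d}  ->  MOE about +/- {100 * margin_of_error(n):.1f} points")
# Roughly +/-9.8 points for n=100, +/-4.9 for n=400, +/-2.5 for n=1600.
```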

If a less intrusive method such as the exit poll will not work, let's consider the most intrusive and most direct way of investigating the existence or otherwise of electoral abuse: examining the ballot papers themselves. Assuming you get past the intricacies of having an electoral fraud complaint taken up by the electoral authority or the court, what can you do when the ballot papers no longer exist? It was this latter kind of situation that the researchers Leemann and Bochsler were faced with: "In a Swiss referendum in 2011, one in twelve municipalities irregularly destroyed the ballots, rendering a recount impossible. We do not know whether this happened due to sloppiness, or to cover possible fraudulent actions." (A systematic approach to study electoral fraud, Electoral Studies, 2014), and so:

    How can we detect electoral fraud? The answer to this question depends on the type of committed fraud. Lacking access to the proof (the ballot papers), researchers have started to develop statistical methods to detect irregularities in the reported election results, which might be due to illegitimate manipulations. ...
    In general, there are two ways to go about detecting electoral fraud. We focus on the returns at the lowest levels possible and we try to compare outcomes with expectations. The origins of these expectations distinguish the two instruments we have. First, we may rely on ecological information. Knowing the political structure of a village may allow us to predict the voting pattern we should observe (Alvarez and Boehmke, 2008). This approach relies on regression style models based on a subsample where we can (with large confidence) outrule fraud.
    Second, we can focus solely on the return sheets (the reported numbers). We compare these figures not with other returns but with a theoretical distribution of digits. ... The basic idea is that when someone makes up numbers they fail to produce numbers that are truly random in the way they would be in a truly fair election or vote.

This second technique is based on Benford's Law.
Looking for an explanation of this law for dummies (myself included), I couldn't exactly find one. The following, from RonJoniak.org (The Mysteries of Benford's Law, June 7, 2014), with no math and no probabilities, may be good enough to give everyone some sense of this strange law, though it is much simplified and incomplete.

Benford’s Law, in the most elementary form of understanding, states that the number “1” transpires as the leading digit 30% of the time compared to higher digits such as 9 which occurs 5% of the time. This occurs for all kinds of data sets ranging from electricity bills, street addresses, stock prices, to even physical and mathematical constants. Yes, that’s right, the physical and mathematical constants of the universe follow this mysterious law.


That is Benford's Law stated for the first significant digit; a more complete definition covers the other digits as well. Election forensics researchers actually favor looking at the second, third, or fourth digit, and particularly the second digit, rather than the first.
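For the curious, the full statement is short enough to compute. Here is a minimal sketch (my own, not from the cited sources) of the Benford probabilities for the first and second significant digits, the distributions that the digit tests compare observed frequencies against.

```python
# Benford probabilities for the first and second significant digits.
# P(first digit = d)  = log10(1 + 1/d),                              d = 1..9
# P(second digit = d) = sum over d1=1..9 of log10(1 + 1/(10*d1 + d)), d = 0..9
import math

def benford_first(d):
    return math.log10(1 + 1 / d)

def benford_second(d):
    return sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))

print("first :", [round(benford_first(d), 3) for d in range(1, 10)])
print("second:", [round(benford_second(d), 3) for d in range(10)])
# first : [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
# second: [0.12, 0.114, 0.109, 0.104, 0.1, 0.097, 0.093, 0.09, 0.088, 0.085]
```

Notice how much flatter the second-digit distribution is; part of the usual argument for preferring it on vote counts is that first digits are dominated by precinct sizes and turnout rather than by how the numbers were produced.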

A nice introduction to Benford's Law is the presentation by John D. Barrow, Benford's Very Strange Law (see, for example, slide number 28).

While Benford's Law has increasingly been used for election forensics in recent times, some believe it is suitable only for cases where vote counts are altered outright, which is considered quite unlikely (Leemann and Bochsler, cited earlier), while others think it is unsuitable altogether (The Irrelevance of Benford's Law for Detecting Fraud in Elections, Deckert et al., 2010). Walter Mebane, who has been investigating Benford's Law in his election forensics research, finds it inconclusive (Using Vote Counts' Digits to Diagnose Strategies and Frauds: Russia, Walter Mebane, 2013):

Both the second-digit Benford’s-like Law (2BL) and the idea that the last digits should be uniformly distributed have been proposed as standards for clean elections. Many claim that election fraud is rampant in recent Russian federal elections (since 2004), so Russia should be a good setting in which to see whether the digit tests add any diagnostic power. Using precinct-level data from Russia, ... The digit tests produce surprising and on balance implausible results. ...The usefulness of simple and direct application of either kind of digit tests for fraud detection seems questionable, although in connection with more nuanced interpretations they may be useful.

Pointing out that while the Second-Digit Benford's Law (2BL) test is getting popular with researchers, it has mostly been applied to elections already suspected of fraud, Shikano and Mack tried applying the test to the 2009 German Federal Parliamentary Election, against which no serious allegation of fraud has been raised (When Does the Second-Digit Benford's Law-Test Signal an Election Fraud? Facts or Misleading Test Results, Shikano and Mack, 2011). According to them:

Surprisingly, the test results indicate that there should be electoral fraud in a number of constituencies. These counterintuitive results might be due to the naive application of the 2BL-test which is based on the conventional χ² distribution. If we use an alternative distribution based on simulated election data, the 2BL-test indicates no significant deviation. Using the simulated election data, we also identified under which circumstances the naive application of the 2BL-test is inappropriate. Accordingly, constituencies with homogeneous precincts and a specific range of vote counts tend to have a higher value for the 2BL statistic.
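To make the "naive application" concrete, here is a minimal sketch of the 2BL test in its textbook form (my own illustration, not Shikano and Mack's code): tabulate the second digits of precinct-level vote counts and compare them with the Benford expectation using a Pearson chi-squared statistic, which the naive approach refers to a χ² distribution with 9 degrees of freedom.

```python
# Naive 2BL test: Pearson chi-squared of the observed second digits of precinct
# vote counts against the second-digit Benford expectation (9 d.f.).
# The vote counts below are hypothetical, for illustration only.
import math
from collections import Counter

BENFORD_2ND = [sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))
               for d in range(10)]

def chi2_2bl(counts):
    digits = [int(str(c)[1]) for c in counts if c >= 10]   # second digits
    obs = Counter(digits)
    n = len(digits)
    return sum((obs.get(d, 0) - n * BENFORD_2ND[d]) ** 2 / (n * BENFORD_2ND[d])
               for d in range(10))

precinct_votes = [183, 412, 97, 530, 268, 341, 75, 198, 623, 284]   # hypothetical
print(f"2BL chi-squared = {chi2_2bl(precinct_votes):.2f} "
      f"(naive 95% cut-off: chi-squared with 9 d.f. is about 16.9)")
```

Shikano and Mack's point is that the reference distribution, not the statistic, is the weak link: in constituencies with homogeneous precincts the statistic can exceed that cut-off even without fraud, which is why they calibrate it against simulated election data instead.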

In a later paper (Mack and Shikano, April 2013), they summarized that:

Concentration of precincts votes in a certain range can boost the BL statistic. Some argue that the use of second digits instead of first digits would solve this problem. This is, however, no solution since concentrated precincts votes appear in certain circumstances which can affect the distribution of even second digits. In this paper, we apply 2BL and an alternative distribution systematically to different institutional settings. More specifically, we investigate the latest parliamentary and presidential elections of France (both 2012), with no suspicion of fraud, and Russia (2011 and 2012), with strong suspicion of fraud. Finally, we replicate another detection method for the purpose of cross validation and compute simple fraud scenarios to assess the performance and mechanisms of 2BL. We can identify a circumstance when 2BL gives misleading signals and have to conclude that 2BL is inappropriate for fraud detection.

On the other hand, Pericchi and Torres, who were the first to suggest using the 2BL-test, support the Benford's Law tests with a number of variants (Quick Anomaly Detection by the Newcomb–Benford Law, with Applications to Electoral Processes Data from the USA, Puerto Rico and Venezuela, Statistical Science, 2011):

The test examines the frequencies of digits on voting counts and rests on the First (NBL1) and Second Digit Newcomb–Benford Law (NBL2), and in a novel generalization of the law under restrictions of the maximum number of voters per unit (RNBL2). We apply the test to the 2004 USA presidential elections, the Puerto Rico (1996, 2000 and 2004) governor elections, the 2004 Venezuelan presidential recall referendum (RRP) and the previous 2000 Venezuelan Presidential election. ... The adequacy of the law is assessed through Bayes Factors (and corrections of p-values) instead of significance testing, since for large sample sizes and fixed α levels the null hypothesis is over rejected. Our tests are extremely simple and can become a standard screening that a fair electoral process should pass.

The appeal of Benford's Law for election forensics is that it is simple and needs little more data than the vote counts themselves. However, research has not supported the view that it can serve as a model of fair voting patterns for all elections, a standard against which any election result may be compared to judge irregularities. That may not be the last word, though. Certain types of blatant electoral fraud may still be detectable with some adaptation of Benford's Law, or with appropriately modified variants based on the investigation of digits. Some things are just too dear to throw away.

Mebane (2013) said:

A caveat is that the motivation for the digit tests presumes that those who commit election fraud are unsophisticated or careless, or that the mechanisms used to commit the frauds do not allow precise control of what the fraudulent outcomes are. Beber and Scacco (2012) argue that humans who fake vote counts simply by writing down numbers they happen to think of are subject to psychological limitations that produce nonuniform patterns in the results. Such human limitations would be easily overcome, say by using a random number generator to create the fake numbers: using well-known algorithms, the fake numbers can have any desired distribution. Vote counts that are 2BL-distributed are easy to simulate as well, which would tend to undercut the test advocated by Pericchi and Torres (2011). Indeed, Beber and Scacco (2012) point out that the simulated counts produced by one of the mechanisms in Mebane (2006) that was designed to produce counts that satisfy 2BL also have uniformly distributed last digits.
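A quick illustration of that last point (my own sketch, not Mebane's or Beber and Scacco's code): counts produced by a random number generator, rather than by a human writing numbers down, have essentially uniform last digits by construction, so the last-digit test raises no flag against them.

```python
# Fabricated counts drawn with an RNG pass the last-digit uniformity check.
import random
from collections import Counter

random.seed(1)
fake_counts = [random.randint(50, 900) for _ in range(5000)]   # hypothetical fakes

n = len(fake_counts)
freqs = Counter(c % 10 for c in fake_counts)
print([round(freqs.get(d, 0) / n, 3) for d in range(10)])
# Every last digit comes out close to the uniform 0.10.
```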

True, it is complicated. And yet it's all the more interesting for it.

Monday, May 11, 2015

Poll Entrée: UK 2015




Just a few days before May 7, the UK 2015 Election Day, I was lucky enough to watch the celebrated US poll aggregator Nate Silver in action in Britain on BBC Television. As you may well know, poll aggregators are websites that take published poll results and combine them statistically to report their own election predictions.
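At its very simplest, aggregation is just a weighted average of recent polls. The sketch below (my own, with made-up numbers; real aggregators such as FiveThirtyEight add house effects, trend adjustments and seat models on top) shows the basic idea for one party's vote share.

```python
# Sample-size-weighted average of several (hypothetical) polls for one party.
polls = [
    # (pollster, sample size, Conservative share in %)
    ("Pollster A", 1000, 34.0),
    ("Pollster B", 2000, 33.5),
    ("Pollster C", 1500, 35.0),
]

total_n = sum(n for _, n, _ in polls)
aggregate = sum(n * share for _, n, share in polls) / total_n
print(f"Aggregated Conservative share: {aggregate:.1f}% over {total_n} respondents")
# Aggregated Conservative share: 34.1% over 4500 respondents
```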


According to Wikipedia, notable poll aggregators are Real Clear Politics; Electoral-vote.com; Princeton Election Consortium; FiveThirtyEight (founded by Nate Silver); Pollster.com or Huffpost Pollster; the political blog Talking Points Memo; Votamatic; Frontloading HQ; Election Projection; and Politics by the Numbers.


Nate Silver's prediction on BBC Television was that although the Conservative Party would beat the Labour Party, the shares of parliamentary seats would be so close as to result in a hung parliament. As someone wrote: "... after near 30 minutes of mind numbingly boring footage of basically a caravan driving around the UK, finally has Nate Silver state his forecast conclusion that put the Conservatives on 283 seats, Labour 270, SNP 48, Lib Dems 24 and UKIP on just 1 seat. Then he states the obvious that no major party even with the Lib Dem support could form a majority:"


Living in the UK and having seen a similar polling scene in the 2010 elections, ordinary folks may very well have had a gut feeling that the Tories would win the most seats but could not get a majority by themselves. So it was nothing new. In fact, the latter part of that gut feeling may well have been influenced by the pollsters' predictions.


Meanwhile, one UK polling company, Survation, said it had played too safe and thrown away what would have been the only correct prediction. It claimed that its election-eve poll had been close to the final result, with the Conservatives on 37% and Labour on 31% (the final results were 36.9% to 30.4%). According to its CEO, "The results seemed so 'out of line' with all the polling conducted by ourselves and our peers -- what poll commentators would term an 'outlier' -- that I 'chickened out' of publishing the figures -- something I'm sure I'll always regret".


As it happened, some of the election seat predictions, as compiled by one source, were:



Source: Market Oracle

Party          Market Oracle  May2015.com  Electoralcalculus.co.uk  ElectionForecast.co.uk  The Guardian
               (28th Feb)     (26th Apr)   (26th Apr)               (27th Apr)              (27th Apr)
Conservative   296            272          279                      286                     274
Labour         262            271          282                      267                     270
SNP             35             55           47                       48                      54
Lib Dem         30             26           18                       24                      27
UKIP             5              3            1                        1                       3
Others          22             22           22                       22                      22


Predictions from other sources, for example those listed in Wikipedia, were not much different from these. In terms of vote share percentages, the Conservative Party was mostly put one or two percentage points ahead of Labour.
The exit poll, released at 10 PM at the close of voting on Election Day, gave a more realistic number of seats for the Conservatives, but still not enough for a majority.


Why could none of the best pollsters predict the majority for the Conservatives? Why not even the exit poll?


One of the biggest problems for pollsters in recent times is that the response rate for pre-election polls has been getting really low, so that conducting a traditional poll on a good representative sample of voters is becoming well-nigh impossible, or prohibitively expensive. Some hail internet polling, as used by YouGov and others, as a possible remedy for these flaws of the traditional polls.

But YouGov’s poll on Thursday night, conducted after votes had been cast, was baffling: it again showed the parties neck-and-neck (How 'shy Tories' confounded the polls and gave David Cameron victory, Jessica Elgot, The Guardian, 8 May 2015). Nor did YouGov do well in the 2014 Scottish independence referendum, when one of its polls created a stir by publishing a 2% lead for "Yes" while the actual outcome was an 11% lead for "No" (Election 2015: Hold on, this isn't what you said would happen, BBC Newsbeat).

Generally, it would be more difficult to get the polls right in the political system prevailing in the UK than in the US. On the FiveThirtyEight website, Nate Silver said: "When there are only two major candidates, the choice isn’t very complicated. ... UK has become less and less of a two-party system. While the Conservatives and Labour collectively accounted for about 90 percent of the vote through the election of 1970, they’ll be down to somewhere in the neighborhood of 65 percent to 70 percent this year."


He was worried that "The World May Have A Polling Problem":


Consider what are probably the four highest-profile elections of the past year, at least from the standpoint of the U.S. and U.K. media:


Perhaps it’s just been a run of bad luck. But there are lots of reasons to worry about the state of the polling industry. Voters are becoming harder to contact, especially on landline telephones. Online polls have become commonplace, but some eschew probability sampling, historically the bedrock of polling methodology. And in the U.S., some pollsters have been caught withholding results when they differ from other surveys, “herding” toward a false consensus about a race instead of behaving independently. There may be more difficult times ahead for the polling industry.


The British Polling Council (an association of UK polling organizations) declared on 7 May that it would set up an independent inquiry to examine "the possible causes of this apparent bias" in the UK 2015 election polls and make recommendations for future polling.


As reported by CNN, Council president John Curtice -- Professor of Politics at Strathclyde University -- said that while polls should be judged on their percentages rather than on seats, and while it could well be true that many had fallen within their margins of error, an inquiry was still needed. The polls had been accurate on the SNP, Liberal Democrats and Greens, but they all had an error in the same direction, he said, adding that "The reason an inquiry has been set up is that actually the industry collectively clearly underestimated the Conservative lead over Labour".


What would have happened if the UK 2015 election had been run under proportional representation rather than first-past-the-post? This is what the Electoral Reform Society, a campaign group, worked out, using the D'Hondt method of converting votes to seats (the allocation rule itself is sketched below).
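For readers unfamiliar with it, D'Hondt is a highest-averages rule: seats are handed out one at a time to whichever party currently has the largest quotient votes / (seats already won + 1). A minimal sketch with made-up numbers (my own illustration, not the Electoral Reform Society's calculation):

```python
# D'Hondt highest-averages allocation of seats to parties.
def dhondt(votes, total_seats):
    """votes: dict of party -> vote total; returns dict of party -> seats."""
    seats = {party: 0 for party in votes}
    for _ in range(total_seats):
        winner = max(votes, key=lambda p: votes[p] / (seats[p] + 1))
        seats[winner] += 1
    return seats

# Toy example: three parties sharing 10 seats.
example_votes = {"A": 100_000, "B": 80_000, "C": 30_000}
print(dhondt(example_votes, 10))   # {'A': 5, 'B': 4, 'C': 1}
```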



Well, this is interesting, but the real challenge for pollsters under the present electoral system in the UK is another kind of vote conversion: converting estimates of the votes received by the parties into estimates of seats in Parliament. So even if you get the vote shares within the margin of error you planned for, you may still get an error in the projected parliamentary seats, and that is the one that hurts. One common way of doing the conversion, uniform national swing, is sketched below.
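Here is a minimal sketch of uniform national swing, a standard (if crude) way of turning national vote-share estimates into first-past-the-post seat estimates; the constituencies and swings are made-up numbers, and none of the forecasters named in this post necessarily use exactly this model.

```python
# Uniform national swing: apply each party's national change in vote share
# uniformly to its share in every constituency, then award each seat to the
# local leader.
def uniform_swing_seats(constituencies, national_swing):
    """constituencies: list of dicts of party -> previous vote share (%);
    national_swing: dict of party -> change in national share (points)."""
    seats = {}
    for shares in constituencies:
        adjusted = {p: s + national_swing.get(p, 0.0) for p, s in shares.items()}
        winner = max(adjusted, key=adjusted.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats

# Three hypothetical constituencies and a made-up national swing.
previous = [
    {"Con": 38.0, "Lab": 41.0, "LD": 15.0},
    {"Con": 45.0, "Lab": 35.0, "LD": 12.0},
    {"Con": 33.0, "Lab": 34.0, "LD": 25.0},
]
swing = {"Con": +0.5, "Lab": -1.5, "LD": -8.0}
print(uniform_swing_seats(previous, swing))   # {'Lab': 1, 'Con': 2}
```

Because each seat hinges on which party leads locally, a one-point error in national shares can flip many marginal seats, which is exactly the nonlinearity described above.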


Meanwhile, the International New York Times labeled pollsters the British Election's Other Losers (Dan Bilefsky, 8 May 2015). It cited Alberto Nardelli, the Guardian's data editor, who said there was no simple explanation for what went wrong with the polling.
“It could be simply that people lied to the pollsters, that they were shy or that they genuinely had a change of heart on polling day,” he said. “Or there could be more complicated underlying challenges within the polling industry, due, for example, to the fact that a diminishing number of people use landlines or that Internet polls are ultimately based on a self-selected sample.”
It also cited Korteweg, who thinks that polling disasters are becoming a trend in the UK:
Rem Korteweg, a senior research fellow at the Center for European Reform in London, said that British pollsters were going through a particularly bad period. He cited the referendum last year on Scottish independence, when pollsters’ predictions of a neck-and-neck performance for the no and yes camps were upended by results in which 55 percent of Scots voted against becoming independent compared with 45 percent in favor.
“This isn’t the first time in 18 months when the polls got it wrong,” he said. “This is starting to become a trend in this country.”
Mr. Korteweg attributed the pollster’s failings in this latest election to the fact that voters often give socially desired responses during polling, only to behave differently when they vote. “People say who they are voting for with their heart and then vote with their wallets,” he said.
And he even suggested the possibility that dark fears had been exploited in the campaign:


He said that the last-minute surge by the conservatives could also be explained by the fact that the Conservatives had adroitly exploited fears among voters that the Labour Party would be able to govern only in coalition with the Scottish National Party, which wants an independent Scotland.


What really happened in the UK 2015 General Election was that the Conservatives won their first majority since 1992, and the Labour Party suffered its worst defeat since 1987 (United Kingdom general election, 2015, Wikipedia).


Party                               Leader               Votes        Votes %   Seats   Seats %
Conservative Party                  David Cameron        11,334,920   36.9%     331     50.9%
Labour Party                        Ed Miliband           9,344,328   30.4%     232     35.7%
UK Independence Party               Nigel Farage          3,881,129   12.6%       1      0.2%
Liberal Democrats                   Nick Clegg            2,415,888    7.9%       8      1.2%
Scottish National Party             Nicola Sturgeon       1,454,436    4.7%      56      8.6%
Green Party                         Natalie Bennett       1,154,562    3.8%       1      0.2%
Democratic Unionist Party           Peter Robinson          184,260    0.6%       8      1.2%
Plaid Cymru                         Leanne Wood             181,694    0.6%       3      0.5%
Sinn Féin                           Gerry Adams             176,232    0.6%       4      0.6%
Ulster Unionist Party               Mike Nesbitt            114,935    0.4%       2      0.3%
Social Democratic & Labour Party    Alasdair McDonnell       99,809    0.3%       3      0.5%
Others                              N/A                     349,487    1.1%       1      0.2%