Why did Indian exit polls get it so very wrong?
Sampling error, preference falsification, media bias, postponed census, and more...
India’s exit polls predicted a clean sweep for Modi, with the BJP winning more seats than in the 2019 election (i.e., more than 303 seats) and close to 400 seats for the NDA coalition. The election commission is yet to make its final announcement, but the BJP won only 240 seats, and the NDA coalition will land at about 295. It seems they are all set to form a government led by Modi.
But why did everyone call it so wrong? One argument is media bias in favor of Modi. A second is preference falsification by voters/surveyors/media etc. But I think the third possibility, that the sampling was faulty and the data were bad, is more likely. And I think they erred because constituency sizes are very large, and small samples need to be either very precisely drawn or lucky to get it right. And very precise sampling is difficult because of the paucity of overall census data. The Modi government’s decision to postpone the 2021 census may have been to their and their supporters’ detriment.
I hope political scientists and data scientists will dig into this in the coming weeks and months. But until then, my hunch is that it is a sampling problem possibly exacerbated by lack of census data.
1. Media Bias
Andy Mukherjee, prescient as always, told us to take the exit polls with a pinch of salt. He described the exit polls as more like “psychological warfare” against the opposition, with big numbers called in favor of Modi because of the “partisan role of media moguls.”
While Andy is right that media bias is a problem, it is not the reason for faulty exit polls. The media’s bias stems from Modi’s position of power, rather than existing independently of it. The government has enormous powers over businesses, especially media houses, and it can create a lot of problems through tax raids, audits, searches, asset seizures, and freezing bank accounts. And it also has a lot of favors to dole out to the media houses who comply. But all this is true only when Modi and BJP are in government wielding that position of power.
If the exit polls had suggested that Modi is not coming back, media houses would have used it to increase TRPs and get interviews from BJP leadership. And if the exit polls suggested that the BJP will not win a majority of seats, then it is in their interest to curry favor with those who are likely to form the next government, since the new masters will wield the same legal and regulatory weapons.
2. Could it be preference falsification?
Another possibility discussed across TV channels today is whether voters lied to surveyors, or surveyors/polling companies lied to their bosses. The argument is that Indian voters were not in support of Modi, but when asked, they lie about their true preferences, either out of fear of harassment or social censure. Timur Kuran argues that because of group pressure, the preferences people express in public can often differ from those they hold privately. Kuran used it to explain the perceived stability of the communist regimes that collapsed. Under communism, because of the brute force of the state and social surveillance, everyone had praise for the regime and its leaders despite mass discontent. Social pressure created a situation where individuals could not express their true preferences in public. But as public opposition to communism started to rise, people’s public preferences changed quickly and came closer to their private preferences. In the 1990s, this caused the sudden fall of several seemingly strong and stable communist regimes.
Preference falsification can happen at the level of individual voters or at the level of media persons and journalists. With individual voters, it seems unlikely, though in the past I argued that it may have been a factor in the 2019 elections (I was wrong).
Most of the major exit poll surveyors use a tablet where the voter registers their preference anonymously. In the past, it used to be through a paper list where the voter checked their preference and dropped it into the box/bag of the surveyor. It is not a short or long interview with the voter over audio or video. There is relative anonymity. It seems odd to lie, and in such large numbers, but only in some states, across all the different surveyors across different exit polls.
A friend who is a political scientist told me about an even odder, though believable, form of falsification by surveyors. In the middle of the heat wave, surveyors who are already overworked, underpaid, and running a marathon over 7 phases have lots of incentives to not stand outside polling booths in the heat. They falsify polls, and no one will question them if they add a false BJP vote, but other parties, especially those which have performed poorly in the recent past, may raise questions. I doubt this is the case, but it doesn’t sound outrageous.
If it were a case of preference falsification at such a large scale, then betting markets (illegal in India) should have diverged a lot from the exit polls. But the betting markets, while landing at fewer seats for the BJP than suggested by the polls, did not suggest that the BJP would not win a majority.
Another reason is that even the INDIA bloc’s poll numbers, which suggested the opposite result of the exit polls, led their leadership to believe they would win 295 seats. But they are 60 seats shy of that number. There was no reason for preference falsification because it was not a poll by a media house. And yet their own exit poll was also way off. This makes me think the problem is not so much preference falsification as data/sampling/surveying error.
3. Faulty sampling/surveying
For instance, Pradeep Gupta and his team at Axis My India have a pretty good track record predicting the last two national elections and most of the state elections in the last decade. Gupta personally prides himself on his surveys getting it right and wept on national TV today when the actual results diverged so much from his expectations and exit polls. One reason for that track record is that Axis My India uses the largest sample sizes of the various exit polls. They interviewed about 580,000 voters using 912 surveyors. While these numbers sound staggering, they are relatively small in India. If spread equally, that is about 1,100 voters per constituency. India has both very large constituency sizes and severe malapportionment, i.e., it does not have equal constituency sizes across states. To learn more, read this post of mine; it is a long story. Some constituencies have over 3 million voters, while others have about 1.8 million voters.
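To put the "about 1,100 voters per constituency" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes 543 Lok Sabha constituencies (a number not stated in this post) and treats each constituency's sample as a simple random sample, which real exit polls are not. The function names are illustrative only.

```python
import math

TOTAL_INTERVIEWS = 580_000  # figure quoted in the post
CONSTITUENCIES = 543        # assumption: number of Lok Sabha seats, not from the post


def per_constituency_sample(total, seats):
    """Average number of interviews per seat if spread equally."""
    return total / seats


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a vote share p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


n = per_constituency_sample(TOTAL_INTERVIEWS, CONSTITUENCIES)
moe = margin_of_error(n)
print(f"{n:.0f} interviews per seat, margin of error about ±{moe * 100:.1f} points")
```

Under these assumptions the margin on a single party's vote share in one seat is roughly three percentage points either way. Interviews clustered around a handful of polling booths would likely make the real uncertainty larger, so a close seat can easily be called for the wrong party.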
In India, typically, the exit polls use the method called stratified sampling. In this method, pollsters select subdistricts/wards/blocks that represent the demographics and socioeconomic, religious, cultural, and partisan makeup of the state or constituency. This ensures the sample covers the entire gamut of voters, even though only about 1,100 will be polled.
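As a minimal sketch of what proportional stratification could look like, the snippet below splits a 1,100-interview budget across strata in proportion to their share of voters. The three strata and their voter counts are hypothetical, made up purely for illustration; actual pollsters stratify on many more dimensions and do not necessarily allocate proportionally.

```python
def allocate(sample_size, strata_voters):
    """Split sample_size across strata in proportion to voter counts,
    using largest remainders so the allocations sum exactly."""
    total = sum(strata_voters.values())
    raw = {name: sample_size * count / total for name, count in strata_voters.items()}
    alloc = {name: int(value) for name, value in raw.items()}
    leftover = sample_size - sum(alloc.values())
    # Hand any remaining interviews to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical constituency with 2 million voters.
strata = {"urban": 800_000, "peri_urban": 500_000, "rural": 700_000}
plan = allocate(1_100, strata)
print(plan)
```

The allocation is only as good as the voter counts fed into it: if the counts per stratum are out of date, the sample will be proportional to a constituency that no longer exists.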
However, the precision of choosing the different strata for sampling depends on up-to-date data. And India has not had a census since 2011. The 2021 Indian census was initially delayed due to the pandemic-related lockdowns. But it’s now 3 years since the pandemic, and while everything seems to have bounced back, census data is nowhere in sight.
And India is amid what we call a structural transformation of the economy. Some regions are going from rural to peri-urban to urban. Metropolitan areas like Mumbai are expanding their boundaries. The pandemic-related economic stresses have led to the decline of some areas while leading to a boom in others. Migration is a mainstay, with young men moving from some of the poorest regions of India to some of the richest regions. To precisely stratify the constituency to create a sample, pollsters need up-to-date data.
In addition to stratification, another method is weighting the samples correctly. The raw data is weighted to match the likely composition of the electorate based on factors like past exit polling or past election results. This corrects for over- or under-sampling of certain voter groups. But again, if the census data is 13+ years old, and the constituencies have transformed, it is much harder to weight the samples correctly.
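The effect of stale weights can be illustrated with a small sketch. All numbers here are hypothetical: a sample of 1,100 voters split into two groups, one set of population shares standing in for an old census, and another standing in for the constituency as it might look today. The estimate is a simple post-stratified average, which is only one of several weighting schemes a pollster might use.

```python
def poststratified_share(respondents, supporters, population_share):
    """Weight each group's observed support rate by its assumed
    share of the electorate and sum across groups."""
    return sum(
        population_share[g] * supporters[g] / respondents[g]
        for g in respondents
    )


# Hypothetical raw exit-poll data for one constituency.
respondents = {"rural": 600, "urban": 500}
supporters = {"rural": 330, "urban": 200}  # respondents backing party A

stale_shares = {"rural": 0.70, "urban": 0.30}    # assumed old-census composition
current_shares = {"rural": 0.55, "urban": 0.45}  # assumed present-day composition

stale_estimate = poststratified_share(respondents, supporters, stale_shares)
current_estimate = poststratified_share(respondents, supporters, current_shares)
print(f"stale weights: {stale_estimate:.1%}, current weights: {current_estimate:.1%}")
```

In this made-up case the same raw responses give party A a little over 50% with the old weights and a little over 48% with the updated ones. A gap of a couple of points from weighting alone is enough to flip a close seat.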
In September 2020, Nityanand Rai, Minister of State for Home Affairs, told Parliament the census was postponed indefinitely because of Covid. In December 2023, Rai provided the same reason in a written reply to Parliament. Covid didn’t prevent other government programs involving millions of people, like the 2021 Kumbh Mela. Almost two dozen state elections have been conducted since the pandemic. Census operations start three months after freezing administrative boundaries. On December 30, 2023, the Additional Registrar General of India informed states the deadline to freeze boundaries was extended to June 30, 2024, from January 1, 2024. This ninth extension was required to conduct elections before the administrative freeze. Now it seems that the census is indefinitely postponed, for the first time in the history of colonial or independent India.
India has added a quarter of a billion people since the last census. It has grown at 5-6%, and its metropolitan areas and migrant populations have grown even faster. Not having census numbers makes large-scale and complex constituencies harder to sample. It could be the reason the same polling firms have a better record at state-level elections where the assembly sizes are smaller, and constituents and their issues are less stratified.
4. Sampling error combined with preference falsification.
Finally, given the faulty sampling, it is possible media houses engaged in some preference falsification. Let’s say some media houses got poll results suggesting BJP will not win a clear majority. But because of small sample sizes, or all the data and sampling problems, they are not confident in their poll. The other media houses are all showing results that give BJP a landslide win. In such a scenario, announcing that the BJP will underperform may place a target on their back should Modi win a third term. So, it is safer to falsify preferences in this scenario.
We will eventually find out the reasons the exit polls were so wrong. But I think the lack of census data for precise sampling and weighting is definitely one of the factors. The BJP has postponed and suppressed the census for political reasons, such as India’s next electoral delimitation, which is constitutionally due after the first census conducted after 2026. But it looks like suppressing census data has not worked out well for the BJP, in setting expectations or finding reliable polling data for their electoral outcomes.
Thanks for covering what for me was the biggest question coming out of the Indian elections. Sample size and its construction were probably the reason. Larger samples and well-constructed survey plans may have narrowed the confidence interval, but that costs more and involves delays. And who wants to be last in declaring the results of an exit poll?
Great piece ma'am. I was also wondering the same thing and am glad that I found this article.