September 16, 2024

Flaws of RCP poll aggregation

Currently the RealClearPolitics (RCP) average of polls shows Harris +0.1 in Pennsylvania, based on a straight average of 6 of the most recent polls. I'm not focusing on Pennsylvania here, merely using the RCP data for the state to make some observations. This RCP average of polls is deeply flawed on several levels. Let me explain.

It is indeed true that if you take these 6 polls and compute a straight average of the support for Harris and for Trump, Harris averages 47.83% and Trump averages 47.67%. But take a closer look and some interesting things begin to appear.

Notice first that polls from as far back as August 18-19 are being averaged with polls as recent as September 3rd through 6th. Those are not like periods. I won't quibble over this point too much, but recent polls are more likely to be accurate than older ones. Then again, the debate wasn't until September 10th, so these polls may be more alike in voter temperament than pre-debate and post-debate polls would be. Additionally, in RCP's defense, maybe there are not enough recent polls to do much better. Both are fair points, but this is actually the least of my concerns.

There are 3 polls that are tied, 2 in which Trump leads Harris, and only one in which Harris leads Trump. That's 5 polls with Trump tied or ahead and only one with Harris ahead, and yet she leads the RCP aggregate. Not just odd, very odd. The one poll that has Harris ahead has her way ahead, and its skew is overpowering the 5 other polls. That means a single outlier poll is dominating the RCP average. That's not good math.
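To see how one outlier can flip a straight average, here is a minimal sketch comparing the mean with the median. The margins below are hypothetical values chosen only to match the pattern described above (three ties, two small Trump leads, one large Harris lead); they are not the actual RCP figures.

```python
from statistics import mean, median

# Hypothetical Harris-minus-Trump margins in points (NOT the actual poll results):
# three ties, two small Trump leads, one large Harris lead.
margins = [0, 0, 0, -1, -2, 4]

straight_average = mean(margins)   # pulled toward the single large value
middle_value = median(margins)     # unaffected by how large the outlier is

print(f"straight average: {straight_average:+.2f}")
print(f"median:           {middle_value:+.2f}")
```

With these illustrative numbers the straight average comes out slightly positive for Harris while the median is a tie, which is the kind of gap a single lopsided poll can create.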

Let me take that last point one step further. The outlier is the Bloomberg poll, which surveyed registered voters, not likely voters. It is not as indicative of actual turnout as all the other polls. So not only is the outlier overemphasized because of its skew, it is also not as good a poll as the likely-voter polls.

In calculating an average of polls I would exclude the Bloomberg poll because of the people polled. That's without even getting into the crosstabs of the polls to see if Democrats were oversampled. Maybe they were and maybe they weren't, but remember Michael Bloomberg ran for president as a Democrat. Again, I'm not suggesting there is a bias in the sample selection, because there is enough reason to exclude the poll from the average without even considering that. To be fair, Insider Advantage and Trafalgar Group tend to skew Republican, and those are the two polls that have Trump ahead. I would not exclude them, however, just as I would not exclude CNN or The Hill polls merely because they tend to skew Democratic.

The CNN poll has an absurd 4.7% margin of error though. The rest of the polls have margins of error below 3.5%, which to me is a reasonable number. For that reason I would exclude CNN.
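For context on where those margins come from, here is a rough sketch of the textbook margin-of-error formula. It assumes a simple random sample, a 95% confidence level, and a 50/50 split. Pollsters often report a larger margin after adjusting for weighting, so the implied sample size is only a ballpark and not CNN's actual figure. The function names are illustrative.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumed)

def margin_of_error(sample_size, proportion=0.5):
    """Approximate margin of error, as a fraction, for a simple random sample."""
    return Z_95 * math.sqrt(proportion * (1 - proportion) / sample_size)

def sample_size_for(moe, proportion=0.5):
    """Approximate simple-random-sample size implied by a margin of error."""
    return (Z_95 / moe) ** 2 * proportion * (1 - proportion)

# Sample sizes of 800 and 1085 are the two mentioned in this post.
for n in (800, 1085):
    print(f"n = {n}: about +/-{margin_of_error(n):.1%}")

print(f"a 4.7% margin implies roughly {sample_size_for(0.047):.0f} respondents")
```

Under these assumptions, a margin near 4.7% corresponds to a much smaller effective sample than the 800 to 1,085 respondents in the other polls.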

All else being equal, and it's really not, that would leave me with 4 polls. And if every poll were conducted the same way, I would take a weighted average of the polls rather than a straight average. The Insider Advantage poll of 800 should count slightly less than the CBS poll of 1085. Weighting the percentages according to sample size makes a bit more sense.
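A sample-size-weighted average can be sketched as follows. This is not the exact calculation behind the figure in this post. The two sample sizes (800 and 1085) come from the text, but the percentages are made-up placeholders, and `weighted_average` is an illustrative helper name.

```python
def weighted_average(polls):
    """Sample-size-weighted average of candidate shares.

    polls: list of (sample_size, harris_pct, trump_pct) tuples.
    Returns (harris_avg, trump_avg).
    """
    total = sum(n for n, _, _ in polls)
    harris = sum(n * h for n, h, _ in polls) / total
    trump = sum(n * t for n, _, t in polls) / total
    return harris, trump

# Sample sizes are from the post; the percentages are placeholders, not real results.
example_polls = [
    (800, 46.0, 48.0),
    (1085, 48.0, 48.0),
]

harris_avg, trump_avg = weighted_average(example_polls)
print(f"Harris {harris_avg:.2f}%, Trump {trump_avg:.2f}%, "
      f"margin {harris_avg - trump_avg:+.2f}")
```

Each poll's percentages are multiplied by its sample size before averaging, so the larger poll pulls the result toward its own numbers a bit more than the smaller one does.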

Taking the sample-size-weighted average of those 4 remaining polls, I get Trump +0.8% in Pennsylvania.

