September 13, 2016

Election is closer than it appears

RealClearPolitics regularly updates a summary of recent polls and aggregates them into a polling average.  That average has become a widely referenced gauge of voter preference.  But blind faith in it is a risky proposition when it comes to predicting the presidential election.

Let me start by explaining the caveats for the headline before pointing out why the race is closer than the media would have you believe.

Of course not all polls are created equal.  Firstly, some poll Registered Voters and others poll Likely Voters (the latter rely on different models to predict turnout).  Registered Voter polls typically skew Democratic, as many Democratic voters, particularly younger voters, don't turn out as strongly as older, more conservative voters.  But a Likely Voter model entails risks as well.  Bias can creep into the definition of a likely voter, and that can skew the model too.  Unfortunately there's no way to tell what the skew of an individual poll might be.  Nevertheless, on the whole the Likely Voter polls tend to be more accurate, and as we get closer to the election, more polls switch to the Likely Voter model.

Secondly, polls have different sample sizes and therefore different margins of error.  A poll of 827 Likely Voters is not the same as a poll of 1476 Likely Voters.  But the RealClearPolitics average does not appear to weight polls according to size.  Three smaller polls could carry the same weight as three larger polls with lower margins of error.
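To put rough numbers on that, here's a minimal Python sketch.  It uses the textbook margin-of-error approximation for a simple random sample (which real polls only approximate), and it compares a plain average with a sample-size-weighted one.  The two sample sizes come from the paragraph above; the candidate shares in `hypothetical_polls` are made-up placeholders, and I'm not claiming this is how RealClearPolitics or any pollster computes anything.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error (as a fraction) for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def simple_average(polls):
    """Plain average: every poll counts the same regardless of size."""
    return sum(share for share, _ in polls) / len(polls)


def size_weighted_average(polls):
    """Average in which each poll counts in proportion to its sample size."""
    total_respondents = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total_respondents


# Sample sizes mentioned in the text.
for n in (827, 1476):
    print(f"n={n}: margin of error is about {100 * margin_of_error(n):.1f} points")

# HYPOTHETICAL shares (percent) paired with those sample sizes, for illustration only.
hypothetical_polls = [(44.0, 827), (46.0, 1476)]
print("simple average:       ", round(simple_average(hypothetical_polls), 2))
print("size-weighted average:", round(size_weighted_average(hypothetical_polls), 2))
```

The weighted version simply lets the larger poll pull the average toward its number; with only a handful of polls of similar size the difference is small, but it is not zero.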

Even that is a minor issue compared to poll weighting.  Some polls are conducted entirely randomly, but many pollsters weight their respondents to match what they believe the ratio of Democrats to Republicans to Independents is in the general population.  This almost certainly adds bias: using the 2012 electorate split for the 2014 election led many polls to be considerably off, and the same was true with Obama's re-election in 2012.  Some reliable pollsters were quite humbled by their inaccurate predictions that year.  Clearly this approach can carry a high degree of bias, or simply reflect a misreading of the electorate.
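Here's a small Python sketch of why the assumed party split matters so much.  Every number in it is hypothetical: the support levels by party and both electorate mixes are placeholders I picked for illustration, not figures from any real poll.

```python
def topline(support_by_party, party_mix):
    """Weighted candidate share (percent) given support within each party
    and the assumed share of the electorate each party represents."""
    assert abs(sum(party_mix.values()) - 1.0) < 1e-9, "party mix must sum to 1"
    return sum(support_by_party[party] * weight for party, weight in party_mix.items())


# HYPOTHETICAL support for one candidate within each group (percent).
support = {"Democrat": 90.0, "Republican": 8.0, "Independent": 40.0}

# Two HYPOTHETICAL assumptions about who shows up to vote.
mix_a = {"Democrat": 0.38, "Republican": 0.32, "Independent": 0.30}
mix_b = {"Democrat": 0.33, "Republican": 0.33, "Independent": 0.34}

print("topline under mix A:", round(topline(support, mix_a), 2))
print("topline under mix B:", round(topline(support, mix_b), 2))
```

The same raw responses produce noticeably different toplines depending on which electorate the pollster assumes, which is the whole problem.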

Other issues include whether the polls include the third-party candidates or just poll head-to-head.  As we approach election day, the support for third-party candidates like Gary Johnson will often drift to the main candidates, or those voters may simply stay home.

And of course at the end of the day, even if all of the above issues are reasonably resolved, these polls are at a national level.  As we all know, the electoral college is decided on a state-by-state basis.  Unless either candidate has a lead of more than 4% nationally, the really important polls are the state polls, particularly in the swing states - Florida, Ohio, Virginia, North Carolina, Missouri, Wisconsin, Colorado, Iowa and potentially in this election Georgia, Nevada, Arizona and Pennsylvania.  That's what matters.

Nevertheless, there is some value in the RealClearPolitics all-polls data - with so many data points included, it can help identify trends.  Using Microsoft Excel, I've taken the polling data and looked at the Likely Voter polls only.  Here are my results by half-month, along with a polynomial regression of voter preferences for Trump and Clinton.  Judging by Likely Voters, the election is a lot closer than the media is portraying it.  Additionally, Trump has a slightly upward trend and Clinton appears to have roughly flattened out.
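For anyone who wants to try the same kind of trend fit outside of Excel, here's a minimal Python sketch of a least-squares polynomial regression.  It is not my actual spreadsheet: the function names are my own for this example, the degree of 2 is an assumption, and the half-month averages in `placeholder_share` are made-up stand-ins rather than the RealClearPolitics numbers.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x**2 + ...
    """
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Half-month buckets numbered 0, 1, 2, ... in time order.
periods = [0, 1, 2, 3, 4, 5]
# PLACEHOLDER average shares (percent) per bucket, for illustration only.
placeholder_share = [40.0, 41.0, 41.5, 42.5, 42.0, 43.0]

coeffs = fit_polynomial(periods, placeholder_share, degree=2)
for t in periods:
    print(t, round(evaluate(coeffs, t), 2))
```

A low degree is the safer choice here: with only a handful of half-month buckets, a higher-degree polynomial will happily chase noise and produce a trend line that means nothing.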

Take it with a grain of salt - after all, September and October are when people pay attention, and that is when preferences get cemented or are most subject to change.  But to me this election, right now, is a lot closer than an individual ABC News/Washington Post Hillary +8 poll of 640 likely voters would have you believe.
