Red vs. Blue: A Study in Numbers of the US 2020 Election

DataRes at UCLA
Mar 26, 2021

By: Ishaan Shah (Project Lead), Alan Wang, Yixin Chen, Ovie Soman, Nicole Park

Introduction

The most tumultuous year in recent memory was capped off with a fitting end: America closed out a period marked by the rise of Carole Baskin and “Tiger King,” the release of classified UFO videos by the Pentagon, and, of course, a global pandemic, with one of the largest and most controversial elections to date. Our team set out to analyze the 2020 US Election, gaining insight not only into voters by the numbers but also into the effects of the coronavirus on the election.

Dataset Overview

The “Election, COVID, & Demographic Data by County” dataset on Kaggle was newly created by Ethan Schacht. It summarizes demographic and voting statistics for over 3000 counties in the United States. We used 3 sub-datasets from the main Kaggle file, covering voter demographics and distribution by county, cumulative coronavirus cases at the time of the election, and polling data collected on all candidates of this election cycle. As a bonus, the dataset also includes 2016 election data, which allowed us to create a more thorough analysis.

Data Cleaning

After collecting the data from Kaggle, our team cleaned it and created new datasets. The voter data was originally collected by county, so we also summarized it by state to get a more comprehensive picture of the election. We cleaned up variables ranging from the percentage of a state’s votes cast for a particular candidate to race. We also cleaned the polling dataset, which contained information from dozens of pollsters. We did this by selecting a single pollster with high credibility according to FiveThirtyEight’s pollster ratings, YouGov, and keeping only its polls from the start of Biden’s Democratic nomination in April of 2020 onward.
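As a rough illustration of these cleaning steps, here is a minimal sketch in plain Python. The field names, the sample rows, and the April 1 cutoff date are all assumptions made for the example and are not the Kaggle schema. The percentage shown is a two-party share, which is also a simplification.

```python
from collections import defaultdict
from datetime import date

# Hypothetical county rows; field names are assumptions, not the Kaggle schema.
counties = [
    {"state": "CA", "votes_dem": 600, "votes_rep": 300},
    {"state": "CA", "votes_dem": 300, "votes_rep": 300},
    {"state": "UT", "votes_dem": 100, "votes_rep": 200},
    {"state": "UT", "votes_dem": 150, "votes_rep": 250},
]

# Sum raw vote counts per state first. Percentages are derived from the sums,
# because averaging county percentages would ignore county size.
totals = defaultdict(lambda: {"votes_dem": 0, "votes_rep": 0})
for row in counties:
    totals[row["state"]]["votes_dem"] += row["votes_dem"]
    totals[row["state"]]["votes_rep"] += row["votes_rep"]

# Two-party Democratic share per state, in percent.
state_pct_dem = {
    state: 100 * t["votes_dem"] / (t["votes_dem"] + t["votes_rep"])
    for state, t in totals.items()
}

# Hypothetical poll rows.
polls = [
    {"pollster": "YouGov", "end_date": date(2020, 2, 10)},
    {"pollster": "YouGov", "end_date": date(2020, 4, 15)},
    {"pollster": "SurveyUSA", "end_date": date(2020, 5, 1)},
    {"pollster": "YouGov", "end_date": date(2020, 10, 20)},
]

CUTOFF = date(2020, 4, 1)  # assumed start of the April 2020 window
yougov = [
    p for p in polls
    if p["pollster"] == "YouGov" and p["end_date"] >= CUTOFF
]
```

Summing counts before dividing is the design choice that matters here: a state percentage built from county percentages would give a small rural county the same weight as a large urban one.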

Visual Insights

Coronavirus Numbers at the County and State Levels

We plotted two visualisations that show the impact of the coronavirus on the county and the state levels, respectively.

One can see that some of the most affected states were North Dakota, South Dakota, Utah, Arizona, and Tennessee. States with the highest raw case counts were relatively less affected once their large populations were taken into account. The relatively unaffected states were Washington, Oregon, Maine, Vermont, and Hawaii.
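The difference between raw case counts and cases relative to population can be sketched as follows. The two states and their figures are hypothetical and chosen only to show how the rankings can flip.

```python
# Hypothetical states: (cumulative cases, population). Not real figures.
states = {
    "Large State": (900_000, 39_000_000),
    "Small State": (45_000, 760_000),
}

# Case rate as a percentage of the population.
case_rate = {
    name: 100 * cases / population
    for name, (cases, population) in states.items()
}

# The state with the most cases is not the state with the highest rate.
most_cases = max(states, key=lambda name: states[name][0])
highest_rate = max(case_rate, key=case_rate.get)
```

Here the larger state leads on raw cases, while the smaller state has the higher share of its population infected.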

Poll Data by Pollster

As seen in the pie graph, the pollster SurveyUSA conducted the greatest number of polls at 2204, far more than MorningConsult, the pollster with the second greatest number at 460. In total, 5543 polls were conducted.
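To put those counts in terms of pie slices, the shares can be worked out directly from the three figures quoted above. The "All other pollsters" label is our own grouping for the example.

```python
# Poll counts quoted in the paragraph above.
total_polls = 5543
counts = {"SurveyUSA": 2204, "MorningConsult": 460}

# Everything not attributed to the top two pollsters (our own grouping).
counts["All other pollsters"] = total_polls - sum(counts.values())

# Share of all polls per slice, in percent, rounded to one decimal place.
shares = {
    name: round(100 * n / total_polls, 1)
    for name, n in counts.items()
}
```

SurveyUSA alone accounts for roughly two fifths of all polls in the dataset, which is worth keeping in mind when reading any aggregate across pollsters.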

Poll Predictions Per Candidate

Overall, when aggregating the predictions of each poll across the 2020 presidential election cycle, fewer polls predicted that Biden would win than predicted that Trump would win. This may be due to factors such as Trump’s incumbency advantage and the large amount of campaign money he raised leading up to the election.

To create this visualization, we added up the total number of polls that predicted Biden would win and the total number of polls that predicted Trump would win. We then plotted them side by side in the bar chart above, which shows that more polls predicted Trump would win, by a slim margin of 273 polls. Although Trump did better in the polls overall, the polls did not give him a big advantage over Biden.

Clearly, the polls were proven wrong as Biden won the election.
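The tally behind the bar chart can be sketched like this. The sample polls are hypothetical, and the rule that a poll "predicts" whichever candidate has the higher percentage is our assumption about how a predicted winner is assigned.

```python
from collections import Counter

# Hypothetical polls as (Biden %, Trump %). Not rows from the dataset.
polls = [(48, 46), (45, 47), (44, 49), (50, 43), (46, 48)]


def predicted_winner(biden_pct, trump_pct):
    # Assumption: a poll "predicts" the candidate with the higher percentage.
    if biden_pct == trump_pct:
        return "Tie"
    return "Biden" if biden_pct > trump_pct else "Trump"


# Count how many polls favor each candidate.
tally = Counter(predicted_winner(b, t) for b, t in polls)

# Positive margin means more polls favored Trump than Biden.
margin = tally["Trump"] - tally["Biden"]
```

A count like this treats every poll equally, regardless of sample size, date, or pollster, so a pollster that publishes many polls has a large influence on the result.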

Poll Data over Time and Cumulative Coronavirus Cases

Coronavirus cases appeared to be a motivating factor in voter turnout. We also wanted to see if coronavirus cases were correlated with the popularity of the presidential candidates. To visualize this correlation, we plotted polling ratings from YouGov over time, split by presidential candidate. Alongside these ratings, we visualized cumulative coronavirus cases over the same period in which the polls were collected.

For Biden, there is a clear, generally increasing trend in his poll performance alongside the growing coronavirus cases, both cumulatively and per day. His final months in the polls proved to be his strongest, according to YouGov, where he received an average rating of 51% in the month of October. Conversely, the polls alongside virus cases told a much different story for incumbent President Trump. His polling performance was more variable than his counterpart’s, exhibiting both decreases and increases amidst the increasing coronavirus cases. During the increasing cases from April to July of 2020, Trump’s ratings dip, reaching a then-low 39%. We see subsequent rises during July and September, but these rises are interspersed with dips in the months of August and October.
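The two series behind this kind of plot, monthly average ratings per candidate and a running total of cases, can be sketched as follows. All values are hypothetical and the tuple layout is an assumption made for the example.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate
from statistics import mean

# Hypothetical poll results: (poll end date, candidate, rating in %).
polls = [
    (date(2020, 9, 5), "Biden", 49.0),
    (date(2020, 9, 20), "Biden", 50.0),
    (date(2020, 10, 3), "Biden", 50.0),
    (date(2020, 10, 25), "Biden", 52.0),
]

# Group ratings by candidate and calendar month, then average each group.
by_month = defaultdict(list)
for end_date, candidate, pct in polls:
    by_month[(candidate, end_date.year, end_date.month)].append(pct)
monthly_avg = {key: mean(values) for key, values in by_month.items()}

# Hypothetical daily new cases, turned into a cumulative series.
daily_new_cases = [100, 150, 120, 200]
cumulative_cases = list(accumulate(daily_new_cases))
```

Because a cumulative series can only go up, any rating that trends upward over the same months will appear to move with it, so the per-day series is the more informative comparison.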

Impact of Coronavirus on Votes: Comparison of Voter Turnout in 2016 and 2020

The 2020 election saw the largest voter turnout in the country’s history. Here’s how the most recent election’s voter numbers stacked up to 2016’s:

The vast majority of states saw a large increase in voting numbers. Most notably, all the states along the West Coast saw strong rises in voter turnout, with California and Washington seeing 17% increases in voter numbers from the prior election, and Oregon boasting a 13% increase. Other crucial states in the election, like Pennsylvania and Arizona, saw similar jumps in those who took to voting booths, with Arizona posting the second highest increase from the prior election at 19%, behind only Utah’s 19.5%.
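The percentage changes quoted here follow the usual formula, (2020 votes - 2016 votes) / 2016 votes. A minimal sketch with hypothetical state names and vote totals:

```python
# Hypothetical vote totals per state. Not real figures.
votes_2016 = {"State A": 1_000_000, "State B": 2_000_000, "State C": 500_000}
votes_2020 = {"State A": 1_190_000, "State B": 2_340_000, "State C": 450_000}

# Percent change in total votes relative to the 2016 baseline.
pct_change = {
    state: 100 * (votes_2020[state] - votes_2016[state]) / votes_2016[state]
    for state in votes_2016
}

# States ordered from the largest increase to the largest decrease.
ranked = sorted(pct_change, key=pct_change.get, reverse=True)
```

Note that this measures the change in votes cast, not the change in turnout as a share of eligible voters, which would also need population figures for both years.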

Interestingly enough, the only states that saw decreases in their vote totals were the Northeast states of Vermont, New Hampshire, and Maine, which lost, respectively, 40%, 51%, and 54% of their 2016 voting totals. Comparing this to our heat map of coronavirus cases from earlier, the decrease in votes in these states coincides with some of the lowest coronavirus case numbers in the country at the time. These Northeast states had the lowest coronavirus rates at that time: New Hampshire had a 0.58% rate of coronavirus among its population, Maine had a 0.36% rate, and Vermont had the nation’s lowest coronavirus rate leading up to the November election at 0.29%.

This correlation between the relative lack of severity of coronavirus in a state’s community and a decrease in voting numbers could exist for a number of reasons. The fact that the only states to lose voters compared to the 2016 election were also the three states with the lowest coronavirus cases relative to their population size seems to suggest that coronavirus was one of the motivating factors for the surge of votes in this election. This idea is again supported when looking at states with higher voter turnouts. States like California, Utah, and Arizona had some of the highest coronavirus cases as well as some of the highest voting increases, as can be seen from the visualisation depicting voter turnout changes and coronavirus cases by state.
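One way to put a number on this relationship is a Pearson correlation coefficient between each state’s case rate and its change in votes. The sketch below uses hypothetical values and a hand-written helper; it is not the method used for the visuals in this article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-state values. Not figures from the dataset.
case_rate_pct = [0.3, 0.6, 2.0, 3.5, 4.0]
vote_change_pct = [-5.0, 2.0, 10.0, 15.0, 17.0]

r = pearson(case_rate_pct, vote_change_pct)
```

A value of r near 1 would indicate that states with higher case rates also tended to see larger vote increases, though a coefficient on its own says nothing about which factor, if either, drives the other.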

A Divided Story by Parties

Breaking down these voting increases by party, we see two different scenarios. Glancing at the voter changes in support of the Democratic party, we can see that the large majority of states garnered more support for Biden’s run in 2020 relative to Hillary Clinton’s 2016 bid. This majority is made more apparent when comparing the Democratic choropleth map to that of the Republican party. The Republican party’s visual tells a much different tale, where more states either showed relatively no change in support for Trump compared to 2016 or lost their support for him. Looking into key states during the election, Georgia and Arizona chief among them, the visuals show a loss in support for Trump coinciding with a surging increase of votes for Biden, suggesting that Democratic voters not only rose to the occasion, but that some former Trump supporters also withdrew support for the incumbent President at the time. In these two states, we see 4% increases in Democratic support for Biden relative to Clinton alongside decreases of 0.5% and 2% in support for Trump from the 2016 election.

Impact of Coronavirus: Comparison of Voter Preference on Cases in 2016 and 2020

The bar plot above shows how counties, grouped by their covid case rates in 2020, voted during the 2016 election.

We grouped all the counties across the states by the covid case rate in their population and presented their voter preference outcomes in 2016 and 2020. One might expect that the more affected a group of counties is, the more likely its voter preference is to differ from the last election. However, for the groups with more than around 8% of the population infected, the overall voter preference is almost unchanged: there is a significantly higher preference for Donald Trump over Hillary Clinton or Joe Biden. The fluctuations in vote rates mostly occur in groups that are less affected by covid. Comparing the two plots, it’s apparent that groups of counties more affected by covid have had a high preference for Donald Trump since long before the covid pandemic occurred. Though we cannot say from the two bar plots that the covid case rate actually influenced the final vote rate outcome of the 2020 election, it’s reasonable to conclude that a preference for Donald Trump is correlated with a relatively high covid case rate in the population. One possible factor may be that supporters of Trump tend to take his advice on covid and underestimate the risks.
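The grouping behind these bar plots can be sketched as fixed-width binning followed by a vote share per bin. The helper name, field names, bin width, and sample counties below are all hypothetical, and the share is a two-party share for simplicity.

```python
from collections import defaultdict


def vote_share_by_bin(counties, key, width):
    """Bin counties by `key` and return the Republican two-party share (%) per bin.

    Bins are labeled by their lower edge, so width=2 gives bins 0, 2, 4, ...
    """
    sums = defaultdict(lambda: [0, 0])  # lower edge -> [rep votes, all votes]
    for county in counties:
        lower = int(county[key] // width) * width
        sums[lower][0] += county["votes_rep"]
        sums[lower][1] += county["votes_rep"] + county["votes_dem"]
    return {
        lower: 100 * rep / total
        for lower, (rep, total) in sorted(sums.items())
    }


# Hypothetical counties; case_rate is the percent of the population infected.
counties = [
    {"case_rate": 1.5, "votes_dem": 700, "votes_rep": 300},
    {"case_rate": 3.0, "votes_dem": 500, "votes_rep": 500},
    {"case_rate": 9.0, "votes_dem": 200, "votes_rep": 600},
    {"case_rate": 9.5, "votes_dem": 100, "votes_rep": 300},
]

shares = vote_share_by_bin(counties, "case_rate", width=2)
```

The same helper would work for any numeric county attribute, such as income per capita or the share of white residents, by changing `key` and `width`.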

Impact of demographic factors: Rich vs. Poor

We grouped all the counties across the states by their average income per capita; the bar plots of their voter preference outcomes in 2016 and 2020 are shown above. The overall pattern is much the same in both elections: for counties whose income per capita is greater than $1.8k, the higher the income, the more likely they are to vote Democratic. Counties whose income per capita is less than $1.8k are also more likely to vote for the Democratic party, and when the income per capita is less than $1.2k, the difference in vote rate between the two parties is 25 to 30%. In 2020, only counties with income per capita between $1.8k and $2.8k show an average voter preference for the Republican party.

Impact of demographic factors: Role of Race

In the two bar plots above, we grouped all the counties across the states by the share of white residents in their population and showed their voter preference outcomes in 2016 and 2020. The overall trend of vote rate by diversity did not change between 2016 and 2020. In groups of counties with a higher share of white residents, people are more likely to vote for Donald Trump. There are some fluctuations among counties where white residents make up roughly 55% to 75% of the population. But in 2020, the difference between vote rates in areas where white residents make up less than 10% of the population narrows significantly. The effect of diversity therefore differs between the two elections, especially for areas with extremely low shares of white residents.

Conclusion

Our team had a great time using this election dataset to glean valuable insights. We saw the impact of a variety of demographic factors such as race and income on voter preference. We also got to see the impact of the coronavirus on this election. Whether the pandemic had an impact on this election was one of the most discussed topics, and ex-president Trump also blamed the virus for his loss. Although we can’t conclusively say the pandemic caused Trump to lose the election, there does appear to be a correlation between the increasing cases and Trump’s worsening performance. Anyone who wants to continue this project could look at the impact of a variety of other demographic factors, work with time-series data, and perhaps use this data to predict a state’s vote in the next election.
