What Does Current COVID-19 Data Tell Us?

DataRes at UCLA
7 min read · Jan 16, 2023

Authors: Joseph Ramirez (Project Lead), Olivia Wang, Olivia Weisiger, Nicole Ju, and Bill Gu

Introduction

From toilet paper shortages and emptied grocery stores to lifted mask mandates and reopened schools, it is clear that COVID-19 regulations have changed significantly since the virus first took hold in the US in early 2020. Because the virus has had such a significant impact on economies and well-being around the world, institutions such as the Centers for Disease Control and Prevention (CDC) and the National Institutes of Health (NIH) closely oversee the collection of COVID-19 data on a daily basis. Now that the virus is approaching its three-year mark, we were interested in seeing what the latest data says about it. In this article, we explore how COVID has impacted different US states through a variety of lenses, ranging from sewage systems to weather analyses!

Wastewater Surveillance Analysis

Although one might not expect sewage systems to offer insight into how a disease spreads, wastewater surveillance has been an important indicator for COVID. Even at UCLA, residence halls continue to monitor their sewage systems for COVID-19 strains to ensure cases are properly handled.

The figure above is a bar plot showing, for each State, the average proportion of wastewater tests with SARS-CoV-2 detected by sewershed over the most recent 15-day window. A detection means a cycle threshold (Ct) value <40 for RT-qPCR or at least 3 positive droplets/partitions for RT-ddPCR. As we can see, in the majority of States over 80% of tests still detected SARS-CoV-2, and in 20 jurisdictions (Vermont, Tennessee, South Dakota, Rhode Island, Oklahoma, New York City, New Mexico, New Jersey, New Hampshire, Nevada, Minnesota, Massachusetts, Maine, Louisiana, Kansas, Iowa, Hawaii, District of Columbia, Connecticut, and Alaska) 100% of tests detected it.
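As a rough illustration of the aggregation behind a plot like this, the sketch below keeps each site's most recent 15-day window and then averages the detection proportion by state. The record layout, the field names (`state`, `site`, `window_end`, `detect_prop`), and all of the numbers are invented for illustration; they are assumptions, not the CDC dataset's schema or values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one row per sewershed site and 15-day window.
# detect_prop is the percentage of tests in that window with SARS-CoV-2 detected.
records = [
    {"state": "Vermont", "site": "VT-1", "window_end": "2022-10-01", "detect_prop": 90.0},
    {"state": "Vermont", "site": "VT-1", "window_end": "2022-10-16", "detect_prop": 100.0},
    {"state": "Vermont", "site": "VT-2", "window_end": "2022-10-16", "detect_prop": 100.0},
    {"state": "Ohio", "site": "OH-1", "window_end": "2022-10-16", "detect_prop": 80.0},
    {"state": "Ohio", "site": "OH-2", "window_end": "2022-10-16", "detect_prop": 60.0},
]


def latest_state_averages(rows):
    """Average each state's detection proportion over its sites' latest windows."""
    latest = {}  # site -> most recent row seen for that site
    for row in rows:
        current = latest.get(row["site"])
        # ISO-formatted dates compare correctly as plain strings.
        if current is None or row["window_end"] > current["window_end"]:
            latest[row["site"]] = row

    by_state = defaultdict(list)
    for row in latest.values():
        by_state[row["state"]].append(row["detect_prop"])
    return {state: mean(values) for state, values in by_state.items()}


state_avg = latest_state_averages(records)
for state, avg in sorted(state_avg.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {avg:.1f}%")
```

Sorting the averages in descending order gives the bar order one would expect in a plot like the one described above.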

The figure above includes 49 side-by-side boxplots, one per State, showing the percentiles of current SARS-CoV-2 virus levels compared to historical levels at the same site. The 0th percentile means levels are the lowest they have been at the site; the 100th percentile means levels are the highest they have been at the site. Most States have a median percentile around 50, which means that current levels in most States sit near the middle of their historical range. However, New Mexico, Indiana, Arkansas, and Alaska show relatively higher percentiles than the other States, which means that SARS-CoV-2 concentrations at their sites are currently high relative to those sites' own histories.
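For readers who want to see what each box encodes, here is a small sketch that computes the quartiles and median a boxplot would draw for each state. The percentile values are invented, and the cutoff of 75 used to flag "elevated" states is an arbitrary assumption for illustration, not a threshold from the analysis.

```python
from statistics import median, quantiles

# Hypothetical per-site percentiles (0 = lowest level seen at the site,
# 100 = highest), grouped by state. Values are invented for illustration.
site_percentiles = {
    "New Mexico": [70.0, 85.0, 90.0, 95.0],
    "California": [30.0, 45.0, 55.0, 70.0],
    "Ohio": [10.0, 40.0, 60.0],
}


def box_summary(values):
    """Return the (Q1, median, Q3) that a boxplot would draw for these values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median(values), q3


summaries = {state: box_summary(vals) for state, vals in site_percentiles.items()}
for state, (q1, med, q3) in summaries.items():
    print(f"{state}: Q1={q1:.1f}, median={med:.1f}, Q3={q3:.1f}")

# Arbitrary cutoff: flag states whose median sits well above 50,
# i.e. whose sites are running high relative to their own history.
elevated = [state for state, (_, med, _) in summaries.items() if med > 75]
print("Elevated:", elevated)
```

Note that a high percentile only says current levels are high compared with that site's past readings; it does not by itself describe how fast levels are changing.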

SARS-CoV-2 can be detected in wastewater because it is shed by infected people with or without symptoms. Therefore, wastewater surveillance could provide an early warning that COVID-19 is spreading in a community.

Racial and Education Analysis of COVID Data

A more traditional analysis of COVID-19 would also include seeing what socioeconomic data says about how the virus has affected certain populations.

Education is a frequently used metric in evaluating socioeconomic status. For example, one may expect more educated states to have a lower number of cases. The figure above is a graph which plots a state’s total population divided by the number of cases ever recorded against the proportion of people with a college degree. By initial inspection, there may appear to be a positive correlation between the variables, suggesting that more educated states actually tend to have a greater number of cases. However, a linear regression model from scikit-learn yielded an R² value of only 0.08. Logistic regression and XGBoost tree models also showed low accuracy, which, together with the low R² value, suggests that there is likely little correlation between a state's education level and its number of infections. Even in other models, which are shown further below, states that have a greater proportion of adults with college education, such as Massachusetts, Virginia, and Colorado, vary greatly in the number of confirmed cases.
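To make the R² figure concrete, here is a minimal sketch of a one-variable least-squares fit and its coefficient of determination. The analysis itself used scikit-learn's linear regression; this version swaps in a hand-written fit using only the standard library, and the data points are invented, so it illustrates the calculation rather than reproducing the 0.08 result.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(x, y):
    """Share of the variance in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    y_bar = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented values: share of adults with a college degree vs. a per-state case metric.
college_share = [0.25, 0.30, 0.35, 0.40, 0.45]
case_metric = [3.1, 2.6, 3.4, 2.9, 3.3]

print(f"R^2 = {r_squared(college_share, case_metric):.3f}")
```

An R² of 0.08 means the fitted line accounts for about 8% of the state-to-state variation in the plotted metric, which is why the apparent trend in the scatter plot should be read with caution.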

Race has also been frequently studied in terms of how the virus impacts certain populations. The graphs above show the relationship between race demographics and weekly COVID case rates among states in the US. Each dot represents a state and contains two pieces of information: the percentage of the state’s population that is a particular race (specified by the color of the dot) and the state’s weekly COVID rate. The first visualization contains information for three different racial groups and doesn’t reveal any strong correlation between race demographics and COVID rates. The second visualization focuses on the Black population, and one can see a slight negative correlation between the percentage of a state’s population that is Black and the state’s weekly COVID rate. More precisely, the correlation coefficient is approximately -0.28. This data seems to contradict the idea that minority groups are more severely impacted by COVID. However, it does not account for the severity of COVID’s effects on a state’s population, as it merely includes the case rate, and state-level percentages cannot show who within a state is being infected. There are also possible confounding variables, including the availability of resources and the likelihood that someone will get tested.
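The correlation coefficient quoted above is a Pearson correlation, which can be sketched in a few lines. The function below is a plain standard-library version written for illustration, and the two input lists are invented values (one entry per state), not the data behind the plots.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Invented values, one entry per state.
pct_black = [3.0, 8.0, 13.0, 22.0, 31.0]
weekly_rate = [140.0, 155.0, 120.0, 130.0, 110.0]

r = pearson_r(pct_black, weekly_rate)
print(f"r = {r:.3f}")
```

For scale, a coefficient of about -0.28 squares to roughly 0.08, so a straight-line trend of that strength accounts for under a tenth of the variation in weekly case rates.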

Weather Data Analysis

One may also ask how a variable like weather may provide any unexpected insights into which states are more significantly affected by the virus.

The density map of the U.S. states and territories shows the proportion of confirmed COVID-19 cases from January 1st, 2020 to September 14th, 2022. The proportion of confirmed cases was calculated by dividing the cumulative number of confirmed cases per state/territory by the corresponding population. As seen in the key, states with greener tints had lower proportions of confirmed cases within the specified date range.

When population size is not considered, it is no surprise that states like California, Texas, New York, and Florida rank highest for confirmed cases, since they have the largest populations. However, when population size is considered, we see from the figure that Alaska, Rhode Island, Guam, and Kentucky have the highest proportions of confirmed cases.

To investigate the general effect of weather on confirmed COVID cases, we took four states from each of three categories: those with the highest, middle, and lowest ranking proportions of confirmed COVID-19 cases. Then, we plotted aspects of their average weather patterns to see whether the categories differed enough to help explain the differences in confirmed case proportion. The state rankings are as follows:

Top Four (High Case Proportion): Alaska, Rhode Island, Guam, Kentucky

Middle Four (Medium Case Proportion): Maine, Oregon, Maryland, American Samoa

Bottom Four (Low Case Proportion): Minnesota, Massachusetts, Puerto Rico, Colorado
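The selection described above (compute cases per capita, rank, then take the top, middle, and bottom four) can be sketched as follows. The helper names `case_proportions` and `pick_groups`, the placeholder regions "A" through "L", and all counts are hypothetical; the exact rule the authors used to define the "middle" four is not stated, so centering the middle group in the ranking is an assumption.

```python
def case_proportions(cases, population):
    """Cumulative confirmed cases divided by population, per state or territory."""
    return {name: cases[name] / population[name] for name in cases}


def pick_groups(proportions, k=4):
    """Return the k highest, k middle, and k lowest entries by case proportion."""
    ranked = sorted(proportions, key=proportions.get, reverse=True)
    if len(ranked) < 3 * k:
        raise ValueError("not enough entries for three non-overlapping groups")
    # Assumption: the middle group is centered in the ranking.
    mid_start = (len(ranked) - k) // 2
    return ranked[:k], ranked[mid_start:mid_start + k], ranked[-k:]


# Invented counts for twelve placeholder regions "A".."L".
names = "ABCDEFGHIJKL"
population = {n: 1_000_000 for n in names}
cases = {n: 100_000 + 20_000 * i for i, n in enumerate(names)}

top, middle, bottom = pick_groups(case_proportions(cases, population))
print("Top:", top)
print("Middle:", middle)
print("Bottom:", bottom)
```

Ranking on the proportion rather than the raw count is what moves small jurisdictions such as Alaska or Guam to the top of the list while large states drop out.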

For average relative humidity per month, one might expect more humid states to see COVID spread at higher rates. Yet, as we can see from Figure 3, average humidity per month was relatively similar regardless of whether a state's confirmed case proportion was high, medium, or low.

Rainfall is closely tied to relative humidity, since increased rainfall drives humidity up. By the same reasoning, states with higher average rainfall (mm) per month would be expected to have a greater proportion of confirmed cases. However, Figure 3 conveys no clear association between higher rainfall and a high case proportion: most states are clustered around similar monthly averages, and the state with the greatest average rainfall overall has a low case proportion.

For average snowfall (mm) per month, we would expect states with more snowfall, which suggests colder weather, to have higher proportions of confirmed cases. However, Figure 3 shows that high average snowfall is not typically a trait of states with a higher COVID case proportion.

Hence, the figure above suggests that these aspects of weather (average humidity, rainfall, and snowfall per month) are not reliable indicators of a state’s COVID severity as measured by the proportion of confirmed cases.

Conclusion

Overall, much of the data collected regarding COVID-19 can lead to new insights. For instance, wastewater has been seen to be highly sensitive to small changes in COVID levels, so it may be a good indicator for tracking other diseases in the future. While some relationships remain unclear, such as the one between infection rates and weather, it was still interesting to investigate whether any correlation existed to begin with. Hopefully, these findings showcase analyses that could be done in the future with other diseases as well.

If you are interested in seeing the code used for these analyses, please look at our GitHub!

Sources

https://data.cdc.gov/Public-Health-Surveillance/NWSS-Public-SARS-CoV-2-Wastewater-Metric-Data/2ew6-ywp6

https://data.cdc.gov/Public-Health-Surveillance/NWSS-Public-SARS-CoV-2-Concentration-in-Wastewater/g653-rqe2

https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/docs/table-weather.md

https://worldpopulationreview.com/states/states-by-race

https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36/data
