Session Catalog Header

Predicting Counties with Elevated COVID-19 Vaccine Hesitancy Utilizing a Data Mining Approach

  • Program: Applied Public Health Statistics
  • Background: Over the past two years, the pandemic has reemphasized how important modeling and forecasting are for guiding more effective and targeted public health interventions. This is particularly true when resources (e.g., financial and personnel) are stretched thin. This abstract describes a multivariable model to predict counties with elevated hesitancy toward the COVID-19 vaccine throughout the United States. Results will inform prevention and education strategies targeting messaging and misinformation.
  • Methods: We used data from the US Census Bureau’s Household Pulse Survey, the American Community Survey, and county-level political affiliations from the 2020 election results. Predictors consisted of county-level demographics, vulnerability characteristics, health-related factors, pandemic severity, and political affiliations; the outcome was whether a county had moderate hesitancy (>12%) toward COVID-19 vaccination. Logistic regression and random forest models were built using stepwise selection and evaluated on their ability to correctly classify counties with elevated hesitancy. We performed statistical analyses in R and Orange and mapped the predicted probabilities using GIS.
  • Results: Both the logistic regression (LR) and random forest (RF) models effectively identified counties with elevated hesitancy (AUC: 0.83 and 0.86, respectively). The two models had similar specificity (LR: 82.9%, RF: 79.1%), but the random forest's sensitivity was about 14 percentage points higher (LR: 62.8%, RF: 76.9%). We observed strong correlations between counties with elevated hesitancy and counties with reduced vaccination rates nearly a year later.
  • Conclusion: Using county-level characteristics, our models effectively classified counties with elevated hesitancy. These machine learning models can help identify misinformation surrounding vaccines and enable more tailored public health interventions.
Joel Hartsell
Maximus Public Health Data Analytics
Director, Data Science
  • Linked Session ID: 943520
  • Authors: Joel Hartsell, MPH, PMP, CPMAI (1); Jonnell Sanciangco, BS, GISP (2); Roberto Mejia, DDS, Ph.D. (1); Karin Hoelzer, DVM, Ph.D. (1)
    (1) Maximus Public Health Data Analytics; (2) Maximus Public Health
  • Learning Outcome 1: Describe how machine learning can be used to predict counties with elevated hesitancy toward the COVID-19 vaccine.
  • Learning Outcome 2: Demonstrate how predictive modeling can inform more tailored public health interventions.
  • Learning Outcome 3: Evaluate how hesitancy affected vaccination rates almost a year later.
  • Linked Session Title: Big Data and Machine Learning for Health Research
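
The abstract compares its two classifiers on sensitivity, specificity, and AUC. As a rough illustration of how those three metrics are computed from predicted probabilities, here is a minimal Python sketch. It is not the authors' code (their analyses were run in R and Orange); the function names, the 0.5 probability cutoff, and the toy labels and scores are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Illustrative sketch only: hypothetical helper names and toy data,
# not the authors' R/Orange pipeline.

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities.

    The 0.5 cutoff is an assumption for illustration; the abstract does
    not state which probability cutoff was used.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of truly elevated-hesitancy counties that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-elevated counties that were correctly left unflagged."""
    return tn / (tn + fp)


def auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half). Quadratic time, fine for small data."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = county with elevated hesitancy, 0 = otherwise.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    tp, fp, tn, fn = confusion_counts(labels, scores)
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
    print("AUC:", auc(labels, scores))
```

Sensitivity and specificity depend on the chosen cutoff, while AUC summarizes ranking quality across all cutoffs, which is why two models can have similar AUC and specificity yet differ noticeably in sensitivity.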