Regional UK Study • 2015 – 2023

Does Dirty Air Fill Hospital Wards?

An interactive, data-driven investigation into the relationship between air pollution and respiratory hospital admissions across England.

7 Regions Analysed
9 Years of Data
~7.5M Total Admissions
0.95 Model R²
Abstract

Executive Summary

This study investigates the localized impact of air pollution—specifically Fine Particulate Matter (PM2.5), Nitrogen Dioxide (NO2), and Ozone (O3)—on respiratory hospital admissions across the regions of England from 2015 to 2023.

By merging environmental monitoring data from DEFRA with healthcare demand data from the NHS, this report builds a multivariable Fixed Effects OLS regression model.

+40 admissions per 100k for every 1 μg/m³ increase in PM2.5, controlling for region and year.
01

Introduction

Air pollution remains one of the most critical environmental determinants of public health. While national studies often generalize the impact of pollutants, localized regional analysis provides actionable intelligence for healthcare administration and urban planning.

This project specifically examines the varying degrees of respiratory hospital admissions across English regions and correlates them with key pollutants.

The primary objective is to determine whether localized spikes in specific pollutants consistently predict increased hospital demand.

By identifying the primary culprits among PM2.5, NO2, and O3, policymakers can target emission-reduction strategies more effectively, and the NHS can model future respiratory ward demand against air quality forecasts.

02

Methodology

🌫️

Environmental Data

Sourced from DEFRA's Automatic Urban and Rural Network (AURN). Daily metrics for PM2.5, NO2, and O3 were aggregated into annual regional averages.
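
The aggregation step described above could be sketched roughly as follows. The column names (`date`, `region`, `pm25`, `no2`, `o3`) and the readings themselves are illustrative assumptions, not the study's actual schema or data.

```python
import pandas as pd

# Hypothetical daily readings; names and values are illustrative only.
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2015-01-01", "2015-01-02", "2016-01-01", "2016-01-02"] * 2
    ),
    "region": ["London"] * 4 + ["South West"] * 4,
    "pm25": [14.0, 16.0, 12.0, 14.0, 8.0, 10.0, 7.0, 9.0],
    "no2": [40.0, 44.0, 38.0, 40.0, 15.0, 17.0, 14.0, 16.0],
    "o3": [30.0, 34.0, 32.0, 36.0, 50.0, 54.0, 52.0, 56.0],
})

# Collapse daily values to one row per region and calendar year.
annual = (
    daily.assign(year=daily["date"].dt.year)
    .groupby(["region", "year"], as_index=False)[["pm25", "no2", "o3"]]
    .mean()
)
print(annual)
```

A simple mean treats every day equally; a real pipeline would also need a rule for missing days and for combining several monitoring sites within one region, which this sketch does not attempt.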

🏥

Healthcare Data

Sourced from NHS England, providing total annual respiratory hospital admissions mapped to English administrative regions.

📊

Normalization

Population data from ONS/Nomis was used to normalize admissions into a standardized rate, admission_rate_per_100k: admissions divided by regional population, multiplied by 100,000.
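
As a minimal sketch of that normalization, assuming a table with hypothetical `admissions` and `population` columns (the figures below are invented for illustration):

```python
import pandas as pd

# Invented figures; only the admission_rate_per_100k name comes from the report.
panel = pd.DataFrame({
    "region": ["A", "B"],
    "year": [2023, 2023],
    "admissions": [120_000, 45_000],
    "population": [8_000_000, 5_000_000],
})

# Admissions per 100,000 residents makes regions of different size comparable.
panel["admission_rate_per_100k"] = (
    panel["admissions"] / panel["population"] * 100_000
)
print(panel[["region", "admission_rate_per_100k"]])
```

Region A has more admissions in absolute terms simply because it is larger; the rate is what the regression uses.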

🔬

Statistical Model

Progressive OLS regression, culminating in a Fixed Effects model that controls for time-invariant regional characteristics and year-specific shocks (e.g., COVID-19).
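
One common way to write a two-way fixed effects specification of this kind is shown below. The notation is an assumption introduced here for illustration: y is the admission rate per 100k in region r and year t, α_r is a region effect, γ_t is a year effect, and ε is the error term.

```latex
y_{rt} = \beta_1\,\mathrm{PM2.5}_{rt} + \beta_2\,\mathrm{NO_2}_{rt} + \beta_3\,\mathrm{O_3}_{rt} + \alpha_r + \gamma_t + \varepsilon_{rt}
```

Under this form, β₁ is identified only from variation within a region over time that is not shared by all regions in the same year.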

03

Exploratory Data Analysis

PM2.5 Trends Over Time by Region

Average PM2.5 levels fall steadily across all regions from 2015 to 2023, a pattern in line with recent clean air initiatives, although this analysis does not test that link directly.

Average Respiratory Admissions by Region

The North West and North East & Yorkshire consistently record the highest admission rates per 100k, highlighting deep-rooted regional health disparities.

PM2.5 vs Respiratory Admission Rate

A direct comparison reveals a positive correlation, but distinct clustering by region indicates that baseline admission rates differ widely by geography, even at similar pollution levels.

Pollutant Correlation Heatmap

Examining the cross-correlations between the three measured pollutants and the admission rate reveals which variables move together.
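
A correlation matrix of this kind could be computed roughly as follows. The data are randomly generated stand-ins and the pollutant column names are assumptions; only the `DataFrame.corr` call is the point of the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 63  # 7 regions x 9 years, matching the panel size described in the report

# Synthetic stand-in data, not the study's observations.
pm25 = rng.normal(10, 2, n)
df = pd.DataFrame({
    "pm25": pm25,
    "no2": 2.0 * pm25 + rng.normal(0, 3, n),
    "o3": rng.normal(50, 5, n),
    "admission_rate_per_100k": 60 * pm25 + rng.normal(0, 100, n),
})

# Pairwise Pearson correlations; this matrix is what a heatmap would colour.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Strong correlation between two pollutants is worth noting before regression, because it makes their individual coefficients harder to separate.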

Regional Pollutant Profiles (Latest Year: 2023)

Comparing the 2023 pollutant fingerprint of each region: London stands out with the highest PM2.5 and NO2, while the largely rural South West leads on O3.

04

Regression Analysis

Progressive Model Comparison

We built models of increasing complexity to test the robustness of the PM2.5 effect.

Model                 PM2.5 Coef.   NO2 Coef.   O3 Coef.   Controls        R²
Baseline              69.21***      n/a         n/a        None            0.219
Pollutant Controls    61.20***      3.02        0.31       NO2, O3         0.220
Fixed Effects ★       39.99*        −3.58       2.02       Region + Year   0.950

★ Selected final model  |  *** p < 0.001  |  * p ≈ 0.05
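
The three specifications in the table could be fitted with the statsmodels formula API along these lines. This is a sketch on synthetic data: the pollutant column names, the simulated regional baselines and the simulated 2020 and 2021 shock are all assumptions, so the printed numbers will not match the table.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
regions = [f"R{i}" for i in range(1, 8)]
years = list(range(2015, 2024))

# Synthetic region-year panel (hypothetical values throughout).
df = pd.DataFrame(
    [(r, y) for r in regions for y in years], columns=["region", "year"]
)
region_base = dict(zip(regions, rng.normal(1500, 300, len(regions))))
year_shock = {y: (-400.0 if y in (2020, 2021) else 0.0) for y in years}
df["pm25"] = rng.normal(10, 2, len(df))
df["no2"] = rng.normal(20, 5, len(df))
df["o3"] = rng.normal(50, 5, len(df))
df["admission_rate_per_100k"] = (
    df["region"].map(region_base)
    + df["year"].map(year_shock)
    + 40 * df["pm25"]
    + rng.normal(0, 50, len(df))
)

# Models of increasing complexity; C(...) expands a column into dummies.
specs = {
    "Baseline": "admission_rate_per_100k ~ pm25",
    "Pollutant Controls": "admission_rate_per_100k ~ pm25 + no2 + o3",
    "Fixed Effects": (
        "admission_rate_per_100k ~ pm25 + no2 + o3 + C(region) + C(year)"
    ),
}
fits = {name: smf.ols(f, data=df).fit() for name, f in specs.items()}

for name, res in fits.items():
    print(f"{name:20s} pm25={res.params['pm25']:8.2f}  R2={res.rsquared:.3f}")
```

Because each model nests the previous one, R² cannot fall as terms are added; the large jump for the Fixed Effects row in the table reflects how much of the variation is between regions and years rather than explained by pollutants.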

Pollutant Coefficients with Confidence Intervals

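The coefficient plot shows that PM2.5 has the largest coefficient of the three pollutants. An effect is statistically significant at the 5% level when its 95% confidence interval excludes zero; with p = 0.052 in the Fixed Effects model, the PM2.5 interval sits right at that boundary.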
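
The numbers behind a coefficient plot can be pulled from a fitted statsmodels result as sketched below. The data are synthetic and the short outcome name `rate` is a placeholder; the 95% level is an assumption about what the plot displays.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 63

# Synthetic data for illustration only.
df = pd.DataFrame({
    "pm25": rng.normal(10, 2, n),
    "no2": rng.normal(20, 5, n),
    "o3": rng.normal(50, 5, n),
})
df["rate"] = 1000 + 40 * df["pm25"] + rng.normal(0, 60, n)

res = smf.ols("rate ~ pm25 + no2 + o3", data=df).fit()

# conf_int returns lower/upper bounds in columns 0 and 1.
ci = res.conf_int(alpha=0.05).rename(columns={0: "lower", 1: "upper"})
ci["coef"] = res.params
ci["excludes_zero"] = (ci["lower"] > 0) | (ci["upper"] < 0)
print(ci.loc[["pm25", "no2", "o3"]].round(2))
```

An interval that excludes zero at the 95% level corresponds to a two-sided p-value below 0.05 for the same coefficient, which is why the two readings of the plot agree.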

Regional Baseline Disparities (Fixed Effects)

Even after controlling for air quality, regions in the North have significantly higher baseline admission rates compared to the reference (East of England).

Year Effects: The COVID-19 Anomaly

The year coefficients capture temporal shocks. The large negative coefficients for 2020 and 2021 reflect the sharp drop in respiratory admissions during COVID-19 lockdowns.

Residuals vs Predicted Values (Model Validation)

A well-fitted model should show residuals randomly scattered around zero. The plot below shows no obvious systematic pattern, which supports the Fixed Effects specification.

05

Robustness Checks & Diagnostics

Testing for Heteroskedasticity (Breusch-Pagan Test)

Given the borderline p-value for PM2.5 (p = 0.052), it is crucial to check that the standard errors behind it can be trusted. Conventional OLS standard errors assume homoskedasticity, that is, constant error variance. We performed the Breusch-Pagan test on the Fixed Effects model.

Lagrange multiplier statistic: 13.9103
p-value: 0.2380
f-value: 1.3413
f p-value: 0.2227

Interpretation: Since the LM p-value (0.2380) exceeds 0.05, we fail to reject the null hypothesis of homoskedasticity. The test gives no evidence against constant error variance, so the conventional OLS standard errors are a reasonable basis for inference.
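
The four statistics reported above are the four values returned by `het_breuschpagan` in statsmodels. A minimal sketch on a synthetic region-year panel (invented data, placeholder column names) looks like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)

# Synthetic 7-region x 9-year panel; not the study's data.
panel = pd.DataFrame(
    [(r, y) for r in "ABCDEFG" for y in range(2015, 2024)],
    columns=["region", "year"],
)
panel["pm25"] = rng.normal(10, 2, len(panel))
panel["rate"] = 1000 + 40 * panel["pm25"] + rng.normal(0, 50, len(panel))

res = smf.ols("rate ~ pm25 + C(region) + C(year)", data=panel).fit()

# The test regresses squared residuals on the model's design matrix,
# which must include a constant (the formula API adds one by default).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(
    res.resid, res.model.exog
)
print(f"LM statistic: {lm_stat:.4f}, p-value: {lm_pvalue:.4f}")
print(f"F statistic:  {f_stat:.4f}, p-value: {f_pvalue:.4f}")
```

If the test had rejected, one common response would be to refit with heteroskedasticity-consistent standard errors, for example `fit(cov_type="HC3")`, rather than change the model itself.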

Partial Regression Plot for PM2.5

This plot isolates the partial association between PM2.5 and hospital admissions after statistically partialling out region, year, NO2, and O3 from both variables.
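
The idea behind a partial regression plot can be sketched with the Frisch-Waugh-Lovell result: regress both the outcome and PM2.5 on the other covariates, then relate the two sets of residuals. The data and column names below are synthetic assumptions used only to show the mechanics.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Synthetic region-year panel for illustration.
df = pd.DataFrame(
    [(r, y) for r in "ABCDEFG" for y in range(2015, 2024)],
    columns=["region", "year"],
)
df["pm25"] = rng.normal(10, 2, len(df))
df["no2"] = rng.normal(20, 5, len(df))
df["o3"] = rng.normal(50, 5, len(df))
df["rate"] = 1000 + 40 * df["pm25"] + rng.normal(0, 50, len(df))

controls = "no2 + o3 + C(region) + C(year)"
full = smf.ols(f"rate ~ pm25 + {controls}", data=df).fit()

# Residualise outcome and PM2.5 on the same set of controls.
y_res = smf.ols(f"rate ~ {controls}", data=df).fit().resid
x_res = smf.ols(f"pm25 ~ {controls}", data=df).fit().resid

# Slope of residual-on-residual regression (no intercept needed,
# since both residual series have mean zero).
slope = np.dot(x_res, y_res) / np.dot(x_res, x_res)
print(f"partial slope: {slope:.4f}, full-model coef: {full.params['pm25']:.4f}")
```

Plotting `y_res` against `x_res` gives the partial regression scatter, and the fitted line through it has the same slope as the PM2.5 coefficient in the full model.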

06

Conclusion & Policy Implications

1

Target PM2.5

Fine particulate matter is the only pollutant with a positive association with hospital admissions that approaches statistical significance in the final model (p = 0.052). On this evidence, environmental policy and low-emission zones should prioritize PM2.5 reduction above other metrics.

2

Address Regional Inequity

The severe baseline disparities in the North of England suggest that public health interventions must also address compounded socioeconomic vulnerabilities and healthcare access gaps in these specific regions.

3

NHS Demand Forecasting

The quantified coefficient (~40 additional admissions per 100k per 1 μg/m³ of PM2.5) provides a tangible, data-backed metric for the NHS to model expected respiratory ward demand based on localized air quality forecasts.
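
As a worked illustration of how that coefficient might feed a planning estimate, the sketch below scales it by population. The function name and the population of 5 million are hypothetical, and the calculation treats the regression association as if it held for a forecast change, which is an assumption rather than a finding of the study.

```python
def extra_admissions(delta_pm25_ugm3, population, coef_per_100k=40.0):
    """Expected change in annual admissions for a change in annual mean PM2.5.

    coef_per_100k is the report's rounded coefficient: about 40 admissions
    per 100k people per 1 ug/m3 of PM2.5.
    """
    return coef_per_100k * delta_pm25_ugm3 * population / 100_000


# Hypothetical region of 5 million people, PM2.5 up by 1 ug/m3:
print(extra_admissions(1.0, 5_000_000))  # 2000.0

# The same region with PM2.5 down by 0.5 ug/m3:
print(extra_admissions(-0.5, 5_000_000))  # -1000.0
```

Because the underlying estimate is borderline significant, any forecast built this way would be better presented as a range based on the coefficient's confidence interval than as a single number.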

4

Limitations

The current data is aggregated at the broad regional level. Future analysis should incorporate Local Authority District (LAD) level data and socioeconomic indices such as the Index of Multiple Deprivation.

References

Data Sources & Citations

  1. Department for Environment, Food & Rural Affairs (DEFRA). Automatic Urban and Rural Network (AURN) datasets. uk-air.defra.gov.uk
  2. NHS England. Hospital Episode Statistics (HES) for Respiratory Admissions. digital.nhs.uk
  3. Office for National Statistics (ONS). Regional Population Estimates. ons.gov.uk
  4. Seabold, S. & Perktold, J. "statsmodels: Econometric and statistical modeling with python." Proc. 9th Python in Science Conf., 2010.
  5. Martin Chorley (martinjc). UK-GeoJSON. GitHub repository of administrative boundary files in GeoJSON format.