Machine-learning downscaling of NOAA GFS global forecasts from ~28 km (0.25°) to 5 km resolution.
An end-to-end ML pipeline that transforms coarse global forecasts into high-resolution local predictions
Automatic download of GFS forecasts (Herbie), ERA5-Land reanalysis (CDS API), and SRTM 30 m elevation data from official sources.
Three progressive approaches: bilinear interpolation baseline, Random Forest/XGBoost with topographic features, and U-Net deep learning.
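As an illustration of the first approach, here is a minimal sketch of bilinear interpolation from a coarse grid to an arbitrary target point. It is written in plain Python so it runs without dependencies; the function names (`bilinear`, `_bracket`) and the sample values are hypothetical, and the project's own baseline may well rely on a library regridder instead.

```python
from bisect import bisect_right

def _bracket(coords, x):
    """Find the cell containing x: index i and weight w in [0, 1] such that
    x == coords[i] * (1 - w) + coords[i + 1] * w. coords must be ascending."""
    if not coords[0] <= x <= coords[-1]:
        raise ValueError(f"{x} is outside the grid")
    i = min(bisect_right(coords, x) - 1, len(coords) - 2)
    w = (x - coords[i]) / (coords[i + 1] - coords[i])
    return i, w

def bilinear(grid, lats, lons, lat, lon):
    """Interpolate grid[i][j] (defined at lats[i], lons[j]) to (lat, lon)."""
    i, wy = _bracket(lats, lat)
    j, wx = _bracket(lons, lon)
    # Interpolate along longitude on the two bracketing latitude rows...
    south = grid[i][j] * (1 - wx) + grid[i][j + 1] * wx
    north = grid[i + 1][j] * (1 - wx) + grid[i + 1][j + 1] * wx
    # ...then along latitude between those two intermediate values.
    return south * (1 - wy) + north * wy

# Hypothetical 2 x 2 patch of a 0.25-degree temperature field (degrees C).
coarse_lats = [45.0, 45.25]
coarse_lons = [6.0, 6.25]
t2m = [[10.0, 12.0], [14.0, 16.0]]

center = bilinear(t2m, coarse_lats, coarse_lons, 45.125, 6.125)
```

Because the result depends only on the four surrounding coarse values, this baseline cannot add terrain detail, which is the gap the feature-based and U-Net approaches are meant to close.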
Per-variable, per-region metrics (RMSE, MAE, Bias, Pearson R, KGE) stratified by terrain type: plains, mountains, coastal.
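The sketch below shows one way these metrics could be computed and grouped by terrain type. It assumes the commonly used Kling-Gupta efficiency form, KGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²) with α the ratio of standard deviations and β the ratio of means; the function names and sample values are illustrative and not taken from the project code.

```python
import math

def metrics(pred, obs):
    """Return RMSE, MAE, bias, Pearson R and KGE for paired samples."""
    n = len(obs)
    errors = [p - o for p, o in zip(pred, obs)]
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs)) / n
    r = cov / (std_p * std_o)   # undefined if either series is constant
    alpha = std_p / std_o       # variability ratio
    beta = mean_p / mean_o      # mean ratio; unstable when mean_o is near 0
    kge = 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": mean_p - mean_o,
        "r": r,
        "kge": kge,
    }

def stratified_metrics(pred, obs, terrain):
    """Compute the same metrics separately for each terrain label."""
    groups = {}
    for p, o, label in zip(pred, obs, terrain):
        preds, obss = groups.setdefault(label, ([], []))
        preds.append(p)
        obss.append(o)
    return {label: metrics(ps, os_) for label, (ps, os_) in groups.items()}

# Hypothetical samples: a constant +1 error on plains, perfect in mountains.
obs = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
pred = [2.0, 3.0, 4.0, 10.0, 12.0, 14.0]
terrain = ["plains"] * 3 + ["mountains"] * 3

by_terrain = stratified_metrics(pred, obs, terrain)
```

Note that the β term makes KGE sensitive to the choice of units: for temperature in °C the observed mean can be close to zero, so evaluating in kelvin may be the safer choice.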
Side-by-side comparison maps, difference maps, error distributions, and regional zoom panels for Alps, Pyrenees, and Paris.
Elevation, slope, aspect, terrain roughness, and distance to the coast, derived from the SRTM 30 m DEM, serve as predictors that improve accuracy in complex terrain.
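To make the terrain features concrete, here is a minimal sketch that derives slope, aspect, and a simple roughness measure for one interior DEM cell using central differences. Several things are assumptions rather than the project's method: rows are taken to increase northward and columns eastward, cells are treated as square with a fixed 30 m size, aspect is the downslope compass bearing, and roughness is the largest elevation difference to the eight neighbours.

```python
import math

def terrain_features(dem, i, j, cell_size_m=30.0):
    """Slope (deg), aspect (deg clockwise from north) and roughness (m)
    for interior cell (i, j) of a DEM given as a list of rows."""
    # Central differences: elevation gradient towards east and north.
    dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell_size_m)
    dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell_size_m)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    # Downslope direction is (-dz_dx, -dz_dy); atan2(east, north) gives a
    # compass bearing. The value is not meaningful on perfectly flat cells.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    neighbours = [
        dem[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    ]
    roughness = max(abs(z - dem[i][j]) for z in neighbours)
    return {"slope": slope, "aspect": aspect, "roughness": roughness}

# Hypothetical 3 x 3 DEM: a plane rising 3 m per 30 m cell towards the east.
dem = [[3.0 * j for j in range(3)] for _ in range(3)]
features = terrain_features(dem, 1, 1)
```

For the sample plane the slope is atan(0.1), about 5.7°, and the surface faces west (aspect 270°). On real SRTM tiles the east-west cell size shrinks with latitude, so a production version would need per-row spacing.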
Conda environment, pinned dependencies, fixed random seeds, and modular code make results reproducible across machines.
Built with industry-standard Python scientific computing libraries
Three complementary open data sources power the downscaling pipeline
Global Forecast System: 0.25° (~28 km) resolution, 4 runs daily, up to 384-hour forecasts. Seven atmospheric variables, including temperature, precipitation, wind, humidity, pressure, and radiation.
ECMWF reanalysis: 0.1° (~9 km) resolution, hourly, global land coverage since 1950. Serves as the high-resolution reference ("ground truth") for training and validation.
NASA Shuttle Radar Topography Mission: 30 m resolution digital elevation model. Provides the topographic features used for terrain-aware downscaling.
All three ML models reduce RMSE relative to the bilinear interpolation baseline for temperature, precipitation, and wind, with U-Net scoring best on every metric
| Model | T2m RMSE (°C) | Precip RMSE (mm) | Wind RMSE (m/s) | Avg Correlation |
|---|---|---|---|---|
| Bilinear (baseline) | 2.84 | 5.67 | 2.15 | 0.79 |
| Random Forest | 1.98 | 4.52 | 1.73 | 0.85 |
| XGBoost | 1.82 | 4.21 | 1.58 | 0.88 |
| U-Net (CNN) | 1.65 | 3.89 | 1.42 | 0.90 |
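To read the table as relative improvements, the following sketch computes the percentage RMSE reduction of each model against the bilinear baseline, using only the values reported above. The dictionary keys and the helper name `rmse_reduction` are illustrative.

```python
# RMSE values copied from the results table.
baseline = {"t2m": 2.84, "precip": 5.67, "wind": 2.15}
models = {
    "Random Forest": {"t2m": 1.98, "precip": 4.52, "wind": 1.73},
    "XGBoost": {"t2m": 1.82, "precip": 4.21, "wind": 1.58},
    "U-Net": {"t2m": 1.65, "precip": 3.89, "wind": 1.42},
}

def rmse_reduction(model_rmse, baseline_rmse):
    """Percentage by which the model lowers RMSE relative to the baseline."""
    return 100.0 * (1.0 - model_rmse / baseline_rmse)

reductions = {
    name: {var: round(rmse_reduction(rmse, baseline[var]), 1)
           for var, rmse in scores.items()}
    for name, scores in models.items()
}

for name, scores in reductions.items():
    print(name, scores)
```

For U-Net this works out to roughly 42% for 2 m temperature, 31% for precipitation, and 34% for wind.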