NOAA GFS + ML Downscaling

High-Resolution Weather Forecasts for France

Machine-learning downscaling of NOAA GFS global forecasts over France, from 0.25° (~28 km) to 0.05° (~5 km) resolution.

0.25° → 0.05° Resolution Gain
~42% Temperature RMSE Reduction
63,000 Grid Points
3 ML Models
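The ~28 km and ~5 km figures are the north-south spacing of a regular latitude/longitude grid. Here is a minimal sketch of that conversion. It assumes a spherical Earth with a mean radius of 6,371 km, and it uses 46.5°N as an illustrative mid-latitude for France, which is our choice and not a value from the project:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)


def grid_spacing_km(step_deg: float, lat_deg: float) -> tuple[float, float]:
    """Return (north-south, east-west) spacing in km of a regular lat/lon grid."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # ~111.2 km per degree of latitude
    ns = step_deg * km_per_deg
    # Meridians converge toward the poles, so E-W spacing shrinks with cos(latitude)
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew


for step in (0.25, 0.05):
    ns, ew = grid_spacing_km(step, lat_deg=46.5)
    print(f"{step:.2f} deg: {ns:.1f} km N-S, {ew:.1f} km E-W at 46.5 N")
```

Over France, a 0.05° cell therefore measures roughly 5.6 km north-south by 3.8 km east-west.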

Key Features

An end-to-end ML pipeline that transforms coarse global forecasts into high-resolution local predictions

Automated Data Acquisition

Automatic download of GFS forecasts (Herbie), ERA5-Land reanalysis (CDS API), and SRTM 30m elevation data from official sources.
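The CDS API takes a dataset name and a request dictionary, and ERA5-Land is fetched through it. The sketch below shows that pattern. The bounding box, the helper names `era5_land_request` and `download_era5_land`, and the variable list are illustrative assumptions and do not come from the project's code. The output-format key is left at the service default here, so set it according to the CDS documentation for your client version:

```python
from typing import Sequence

# Hypothetical bounding box for metropolitan France, in degrees (illustrative only)
FRANCE_BBOX = {"north": 51.5, "west": -5.5, "south": 41.0, "east": 10.0}


def era5_land_request(variables: Sequence[str], year: int, month: int,
                      days: Sequence[int], bbox: dict) -> dict:
    """Build a request body for the 'reanalysis-era5-land' CDS dataset."""
    return {
        "variable": list(variables),
        "year": f"{year:04d}",
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in days],
        "time": [f"{h:02d}:00" for h in range(24)],  # every hourly step
        # CDS expects the area as [North, West, South, East]
        "area": [bbox["north"], bbox["west"], bbox["south"], bbox["east"]],
    }


def download_era5_land(request: dict, target: str) -> None:
    """Submit the request; needs a CDS account and ~/.cdsapirc credentials."""
    import cdsapi  # imported lazily so building requests works without it

    cdsapi.Client().retrieve("reanalysis-era5-land", request, target)


req = era5_land_request(
    ["2m_temperature", "total_precipitation",
     "10m_u_component_of_wind", "10m_v_component_of_wind"],
    year=2023, month=7, days=range(1, 32), bbox=FRANCE_BBOX,
)
```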

Multi-Model Downscaling

Three progressive approaches: bilinear interpolation baseline, Random Forest/XGBoost with topographic features, and U-Net deep learning.
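The bilinear baseline can be sketched in a few lines of NumPy. This sketch assumes the GFS field has already been cropped to a regular latitude/longitude grid. The name `bilinear_regrid` is ours, and the project may use xesmf or xarray instead (both appear in its stack):

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto another regular grid.

    Separable 1-D linear interpolation (longitude first, then latitude) equals
    bilinear interpolation on a rectilinear grid. np.interp clamps target
    points outside the source grid to the edge values.
    """
    field = np.asarray(field, dtype=float)
    src_lat = np.asarray(src_lat, dtype=float)
    src_lon = np.asarray(src_lon, dtype=float)
    # np.interp needs increasing coordinates; GFS latitudes run north to south
    if src_lat[0] > src_lat[-1]:
        src_lat, field = src_lat[::-1], field[::-1, :]
    # Step 1: interpolate every source latitude row onto the target longitudes
    rows = np.stack([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate every resulting column onto the target latitudes
    return np.stack(
        [np.interp(dst_lat, src_lat, rows[:, j]) for j in range(rows.shape[1])],
        axis=1,
    )


# Usage: 0.25 deg source grid (descending latitudes, like GFS) to a 0.05 deg target
src_lat = np.arange(52.0, 40.75, -0.25)
src_lon = np.arange(-6.0, 10.25, 0.25)
field = 0.5 * src_lat[:, None] - 0.2 * src_lon[None, :]  # synthetic linear field
dst_lat = np.arange(51.0, 42.0, -0.05)
dst_lon = np.arange(-5.0, 9.0, 0.05)
fine = bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon)
```

Bilinear interpolation reproduces fields that are linear in latitude and longitude exactly. It cannot create sub-grid detail such as valley cold pools, and the topography-aware models are meant to fill that gap.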

Rigorous Evaluation

Per-variable, per-region metrics (RMSE, MAE, Bias, Pearson R, KGE) stratified by terrain type: plains, mountains, coastal.
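Here is a minimal NumPy sketch of these scores. The KGE shown is the original (2009) Kling-Gupta formulation. Whether the project uses that version or a later modified variant is not stated, so treat it as an assumption. The same applies to the per-point `terrain_class` label array used for stratification:

```python
import numpy as np


def verification_scores(pred, obs) -> dict:
    """RMSE, MAE, bias, Pearson r and Kling-Gupta efficiency (2009 form)."""
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    ok = np.isfinite(pred) & np.isfinite(obs)  # drop masked points (e.g. sea)
    pred, obs = pred[ok], obs[ok]
    err = pred - obs
    r = np.corrcoef(pred, obs)[0, 1]
    alpha = pred.std() / obs.std()   # variability ratio
    beta = pred.mean() / obs.mean()  # bias ratio; unstable if the obs mean is near 0 (e.g. T in degC)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
        "pearson_r": float(r),
        "kge": float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)),
    }


def scores_by_terrain(pred, obs, terrain_class) -> dict:
    """Compute the scores separately for each terrain label (hypothetical labels)."""
    pred, obs, terrain_class = map(np.asarray, (pred, obs, terrain_class))
    return {
        str(c): verification_scores(pred[terrain_class == c], obs[terrain_class == c])
        for c in np.unique(terrain_class)
    }
```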

Publication-Quality Maps

Side-by-side comparison maps, difference maps, error distributions, and regional zoom panels for Alps, Pyrenees, and Paris.

Topographic Intelligence

Elevation, slope, aspect, terrain roughness, and coast distance from SRTM 30m DEM drive model accuracy in complex terrain.
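Three of these features (slope, aspect, and roughness) can be derived from a projected DEM with plain NumPy. The sketch below assumes a north-up array with a constant pixel size in metres. It defines roughness as max minus min over a 3x3 window, which matches GDAL's `gdaldem roughness`, but the project's exact definitions are not stated. Coast distance is omitted:

```python
import numpy as np


def terrain_features(dem, dx, dy):
    """Slope (deg), aspect (deg clockwise from north) and 3x3 roughness (m).

    dem: 2-D elevation array in metres, rows ordered north to south.
    dx, dy: pixel size in metres along columns (east) and rows (south).
    """
    dem = np.asarray(dem, dtype=float)
    dz_drow, dz_dcol = np.gradient(dem, dy, dx)
    dz_dx = dz_dcol    # change toward the east
    dz_dy = -dz_drow   # rows increase southward, so flip to get change toward the north
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass direction of the downhill vector (-dz_dx, -dz_dy);
    # it is arbitrary (0) on perfectly flat cells
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    # Roughness: largest elevation difference within each 3x3 neighbourhood
    padded = np.pad(dem, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    roughness = windows.max(axis=(-2, -1)) - windows.min(axis=(-2, -1))
    return {"slope": slope, "aspect": aspect, "roughness": roughness}


# Usage: a plane rising 1 m per metre toward the east (faces west, 45 deg slope)
dem = np.tile(np.arange(5) * 10.0, (4, 1))
feats = terrain_features(dem, dx=10.0, dy=10.0)
```

Models often consume aspect as its sine and cosine to avoid the 0/360° discontinuity. Whether this pipeline does so is not stated.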

Fully Reproducible

A Conda environment, pinned dependencies, fixed random seeds, and modular code are intended to make results reproducible across machines.
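Here is a minimal seeding helper of the kind this implies. The name `seed_everything` is hypothetical, and PyTorch is treated as optional. Even with seeds fixed, PyTorch only promises repeatable results when deterministic algorithms are enabled on the same hardware and software stack. The scikit-learn and XGBoost models also need an explicit `random_state` argument:

```python
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy and (if installed) PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: nothing more to seed
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    # Prefer deterministic kernels; warn instead of failing where none exists
    torch.use_deterministic_algorithms(True, warn_only=True)
```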

Technology Stack

Built with industry-standard Python scientific computing libraries

Data Access

Herbie cdsapi rasterio Earthdata

Processing

xarray numpy pandas dask cfgrib xesmf

Machine Learning

scikit-learn XGBoost PyTorch U-Net

Visualization

matplotlib Cartopy Plotly

Pipeline: GFS 0.25° (7 variables) → Preprocessing (regrid + align) → Features (+6 topographic) → ML Models (RF / XGB / U-Net) → 0.05° Forecast (3 variables)
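The feature-based models (RF/XGBoost) treat each fine-grid point as one sample. Interpolated GFS predictors and topographic features go in, and the ERA5-Land value is the target. The sketch below shows that setup with scikit-learn on synthetic data. The feature set, the synthetic target built from a standard-atmosphere lapse rate of 6.5 °C per km, and all array names are assumptions for illustration. `xgboost.XGBRegressor` exposes the same `fit`/`predict` interface and could be swapped in:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): one row per fine-grid point
n = 5000
gfs_t2m = rng.normal(12.0, 4.0, n)                # bilinearly interpolated GFS 2 m temperature (degC)
elev_fine = rng.uniform(0.0, 2500.0, n)           # fine-grid elevation from the DEM (m)
elev_coarse = elev_fine + rng.normal(0.0, 300.0, n)  # smoothed elevation seen by GFS (m)
slope = rng.uniform(0.0, 40.0, n)                 # slope (deg), an uninformative feature here

# Synthetic "truth": GFS corrected for the unresolved elevation difference, plus noise
truth = gfs_t2m - 0.0065 * (elev_fine - elev_coarse) + rng.normal(0.0, 0.3, n)

X = np.column_stack([gfs_t2m, elev_fine, elev_fine - elev_coarse, slope])
train, test = slice(0, 4000), slice(4000, None)

model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
model.fit(X[train], truth[train])
pred = model.predict(X[test])


def rmse(a, b) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))


print("interpolated GFS RMSE:", rmse(gfs_t2m[test], truth[test]))
print("random forest RMSE:   ", rmse(pred, truth[test]))
```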

Data Sources

Three complementary open data sources power the downscaling pipeline

Input Forecasts

NOAA GFS

Global Forecast System — 0.25° (~28 km) resolution, 4 runs daily, up to 384-hour forecasts. 7 atmospheric variables including temperature, precipitation, wind, humidity, pressure, and radiation.

0.25° grid | 4x daily | Public domain
Ground Truth

ERA5-Land

ECMWF reanalysis: 0.1° (~9 km) resolution, hourly, global land-only coverage since 1950. Serves as the higher-resolution ground truth (relative to GFS) for training and validation.

0.1° grid | Hourly | Open access
Topography

SRTM DEM

NASA Shuttle Radar Topography Mission — 30m resolution digital elevation model. Provides critical topographic features for terrain-aware downscaling.

30m grid | 66 tiles | Free access
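SRTM is distributed as 1°x1° tiles named after their south-west corner, for example `N45E006`. The sketch below lists the base tile names covering a latitude/longitude box. The function name and the example box are hypothetical, and the actual download file suffixes depend on the product and portal:

```python
import math


def srtm_tile_names(south: float, north: float, west: float, east: float) -> list[str]:
    """List 1x1 degree SRTM tile names covering a lat/lon box (south-west corner naming)."""
    names = []
    for lat in range(math.floor(south), math.ceil(north)):
        for lon in range(math.floor(west), math.ceil(east)):
            ns = "N" if lat >= 0 else "S"
            ew = "E" if lon >= 0 else "W"
            names.append(f"{ns}{abs(lat):02d}{ew}{abs(lon):03d}")
    return names


# Usage: a small box in the Pyrenees straddling the prime meridian
tiles = srtm_tile_names(43.5, 44.5, -1.5, 0.5)
```

A bounding box also covers sea-only cells, for which no SRTM tile exists. The list is therefore an upper bound on the tiles actually needed.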

Downscaling Results

All three ML models beat bilinearly interpolated GFS on every variable, and U-Net performs best

-42%
Temperature RMSE (U-Net vs. bilinear baseline)
0.96
Precipitation Correlation
-34%
Wind Speed RMSE (U-Net vs. bilinear baseline)
63K
Target Grid Points

Model               | T2m RMSE (°C) | Precip RMSE (mm) | Wind RMSE (m/s) | Avg Correlation
Bilinear (baseline) | 2.84          | 5.67             | 2.15            | 0.79
Random Forest       | 1.98          | 4.52             | 1.73            | 0.85
XGBoost             | 1.82          | 4.21             | 1.58            | 0.88
U-Net (CNN)         | 1.65          | 3.89             | 1.42            | 0.90
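The headline percentages follow from the results table: relative reduction = (baseline RMSE - model RMSE) / baseline RMSE. This short check recomputes them, assuming the stat cards compare U-Net against the bilinear baseline:

```python
# RMSE values copied from the results table: (T2m degC, precip mm, wind m/s)
RMSE = {
    "Bilinear (baseline)": (2.84, 5.67, 2.15),
    "Random Forest": (1.98, 4.52, 1.73),
    "XGBoost": (1.82, 4.21, 1.58),
    "U-Net (CNN)": (1.65, 3.89, 1.42),
}
BASELINE = RMSE["Bilinear (baseline)"]


def reduction_pct(model_rmse: float, base_rmse: float) -> float:
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (base_rmse - model_rmse) / base_rmse


for name, values in RMSE.items():
    if name == "Bilinear (baseline)":
        continue
    cells = ", ".join(f"{reduction_pct(m, b):.0f}%" for m, b in zip(values, BASELINE))
    print(f"{name}: {cells}")
```

For U-Net this gives 42% (T2m), 31% (precipitation) and 34% (wind), which matches the -42% and -34% cards.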