For decades, weather forecasting has relied on the same basic approach: feed current atmospheric conditions into physics equations, run them on the biggest supercomputer available, and wait. A 16-day global forecast from NOAA’s Global Forecast System (GFS) eats through enormous computational resources and still takes hours to produce.
That approach is now being replaced in practice, not just in research papers. NOAA put three AI-driven weather models into operational service on December 18, 2025, through a program called Project EAGLE. Five weeks later, NVIDIA open-sourced its Earth-2 family of AI weather models, covering everything from global data assimilation to kilometer-resolution storm tracking. Together, these developments mean AI weather prediction has moved from “promising experiment” to “what forecasters actually use.”
NOAA’s Project EAGLE: Three Models, 99.7% Less Compute
Project EAGLE (Experimental AI Global and Limited-Area Ensemble) is a collaboration between NOAA’s National Weather Service, its Oceanic and Atmospheric Research labs, the Environmental Modeling Center at the National Centers for Environmental Prediction, and the Earth Prediction Innovation Center, along with academic and private industry partners.
The initiative produced three operational models:
AIGFS (Artificial Intelligence Global Forecast System) generates 16-day forecasts using just 0.3% of the computing resources required by the traditional GFS. A single forecast run finishes in roughly 40 minutes. The model was built by taking Google DeepMind’s GraphCast as a starting foundation, then fine-tuning it on NOAA’s own Global Data Assimilation System analyses. That fine-tuning step improved performance beyond the original Google model.
AIGEFS (Artificial Intelligence Global Ensemble Forecast System) produces 31-member ensemble forecasts - multiple possible scenarios rather than a single prediction - using only 9% of what the traditional GEFS requires. Early testing shows it extends useful forecast skill by an additional 18 to 24 hours compared to the physics-only GEFS.
HGEFS (Hybrid-GEFS) combines 31 members from the physics-based GEFS with 31 members from the AI-based AIGEFS into a 62-member “grand ensemble.” NOAA says it is the first operational weather center in the world to run a hybrid physical-AI ensemble system, and the HGEFS consistently outperforms both the AI-only and physics-only versions across most major verification metrics.
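The mechanics of a grand ensemble are simple to illustrate: pool the members from both systems and compute statistics over the combined set. The sketch below assumes equal weighting of members and uses made-up temperature values; NOAA's actual combination and calibration method is not described here.

```python
import statistics

def grand_ensemble(physics_members, ai_members):
    """Pool members from two forecast systems into one grand ensemble.

    Equal weighting is an illustrative assumption; an operational
    center may calibrate or weight the two systems differently.
    """
    pooled = list(physics_members) + list(ai_members)
    mean = statistics.fmean(pooled)    # best-guess forecast
    spread = statistics.stdev(pooled)  # larger spread = more uncertainty
    return pooled, mean, spread

# Hypothetical 2 m temperature forecasts (deg C) from each system's members
gefs_members = [21.0, 22.5, 20.8, 23.1]    # physics-based
aigefs_members = [22.0, 21.4, 23.3, 20.9]  # AI-based
members, mean, spread = grand_ensemble(gefs_members, aigefs_members)
```

The appeal of pooling is that the two systems tend to make different kinds of errors, so the combined distribution covers more of the plausible outcomes than either set alone.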
“This represents a new paradigm for NOAA in providing improved accuracy for large-scale weather and tropical tracks with drastically reduced computational expenses,” said NOAA Administrator Neil Jacobs.
Daryl Kleist, Deputy Director of the Environmental Modeling Center, explained the underlying approach: the models “learn to predict patterns and behaviors of the atmosphere by being trained upon decades of historical data.”
What They Get Right, and What They Don’t
The AIGFS performs well on tropical cyclone track forecasting - predicting where a hurricane will go. But early versions still lag behind traditional models on intensity prediction - forecasting how strong a storm will get. Addressing that gap is the primary focus of AIGFS version 2.0, expected in late 2026.
Erica Grow Cei, a National Weather Service spokesperson, emphasized that the AI models complement rather than replace the traditional systems. Forecasters now have both sets of guidance available, and the hybrid approach lets them benefit from the strengths of each method.
There is also a cost-accounting question worth noting. The 99.7% reduction in computing resources applies to running forecasts (inference), not to training the models in the first place. Training a model like GraphCast on decades of atmospheric data is itself computationally expensive, and NOAA has not published training cost figures.
NVIDIA Earth-2: The Open-Source Weather Stack
On January 26, 2026, NVIDIA released its Earth-2 family of open AI weather models, covering the full forecasting pipeline from initial data processing to multi-day predictions and local storm tracking. The release includes five models:
Earth-2 Medium Range, built on a new architecture called Atlas, handles forecasts up to 15 days ahead across more than 70 weather variables including temperature, pressure, wind, and humidity. NVIDIA says Atlas outperforms leading open models like Google’s GenCast on standard industry benchmarks.
Earth-2 Nowcasting, powered by the StormScope architecture, uses generative AI to produce kilometer-resolution predictions of local storms and hazardous weather zero to six hours out. NVIDIA claims this is the first AI model to outperform traditional physics-based models on short-term precipitation forecasting - a task where AI has historically struggled because rain and snow involve complex microphysics that global models tend to smooth over.
Earth-2 Global Data Assimilation, based on the HealDA architecture, produces initial atmospheric condition snapshots in seconds on GPUs rather than the hours required on supercomputers. This is the data that feeds into forecast models. It is expected to be fully released later in 2026.
Earth-2 CorrDiff handles downscaling - taking coarse global predictions and refining them to local detail - up to 500 times faster than traditional methods.
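To make the downscaling task concrete, here is a naive one-dimensional baseline: linear interpolation of a coarse profile onto a finer grid. This is not CorrDiff's method (CorrDiff is a generative diffusion model that adds physically realistic fine-scale detail rather than just smoothing), and the grid spacings and values are invented for illustration.

```python
def downscale_linear(coarse, factor):
    """Naive 1-D downscaling baseline: linearly interpolate a coarse
    profile onto a grid `factor` times finer. Interpolation can only
    smooth between coarse points; it shows the task, not CorrDiff's
    generative approach.
    """
    fine = []
    for i in range(len(coarse) - 1):
        for j in range(factor):
            t = j / factor
            fine.append(coarse[i] * (1 - t) + coarse[i + 1] * t)
    fine.append(coarse[-1])  # keep the last coarse point
    return fine

# Coarse temperatures (deg C) at ~100 km spacing, refined 4x to ~25 km
coarse = [15.0, 11.0, 19.0]
fine = downscale_linear(coarse, 4)
```

The gap between this baseline and a generative model is exactly the fine-scale structure (terrain effects, storm cells) that interpolation cannot invent.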
Earth-2 FourCastNet3 generates forecasts up to 60 times faster than diffusion-based competitors.
The Medium Range and Nowcasting models are available now through GitHub, Hugging Face, and NVIDIA’s Earth2Studio toolkit. The open release is significant because it means smaller national weather services, startups, and energy companies can build and customize their own forecasting systems without depending solely on data from major centers like NOAA or the European Centre for Medium-Range Weather Forecasts (ECMWF).
Partners already using Earth-2 include the Israel Meteorological Service, Taiwan’s Central Weather Administration, The Weather Company, TotalEnergies, AXA, and S&P Global Energy.
Why This Matters Beyond Faster Forecasts
The shift to AI weather models is not just about speed. Three implications stand out.
Cost reduction changes who can forecast. When a 16-day global forecast requires 0.3% of a supercomputer’s capacity, countries and organizations that could never afford traditional numerical weather prediction infrastructure can suddenly produce their own forecasts. NVIDIA’s open-source release pushes this further - a well-resourced university department could run its own weather models.
Ensemble size can grow cheaply. Ensemble forecasts - running many slightly different versions of a prediction to estimate uncertainty - are limited by computing budgets. When each ensemble member costs a fraction of what it used to, agencies can run larger ensembles, producing better uncertainty estimates. NOAA’s 62-member HGEFS is already demonstrating this.
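The simplest way an ensemble turns many runs into an uncertainty statement is an exceedance probability: the fraction of members crossing a threshold. The sketch below uses hypothetical precipitation values purely for illustration.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members exceeding a threshold -- the most
    basic probabilistic product an ensemble provides. More members
    give a finer-grained, more stable probability estimate.
    """
    hits = sum(1 for m in members if m > threshold)
    return hits / len(members)

# Hypothetical 24 h precipitation forecasts (mm) from 10 members
precip = [0.0, 2.1, 5.3, 0.4, 8.9, 1.2, 6.7, 0.0, 3.3, 7.5]
p_heavy = exceedance_probability(precip, 5.0)  # P(precip > 5 mm)
```

With 10 members the probability can only move in steps of 10%; a 62-member ensemble resolves steps of about 1.6%, which is one concrete payoff of cheap members.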
The training data bottleneck replaces the compute bottleneck. As the computational cost of running forecasts drops, the constraint shifts to the quality and availability of observational data used to train and initialize models. Investments in weather satellites, ocean buoys, and ground stations become relatively more important.
What To Watch
NOAA’s AIGFS v2.0, targeting late 2026, will be a meaningful test of whether AI models can close the gap on hurricane intensity forecasting - one of the hardest problems in meteorology and one where errors cost lives.
NVIDIA’s HealDA release will also matter. Data assimilation - turning raw satellite measurements, weather balloon readings, and surface observations into a coherent snapshot of the current atmosphere - is arguably the most technically demanding step in the forecasting pipeline. If HealDA works well in practice, it removes one of the last remaining steps where traditional supercomputing held an advantage.
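At the heart of classical data assimilation is a variance-weighted blend of the model's first guess with incoming observations. The toy scalar update below illustrates that idea (it is the scalar form of optimal interpolation); operational systems solve this for millions of coupled variables, and HealDA replaces the solver with a learned model rather than this formula. All values are invented.

```python
def assimilate_scalar(background, obs, var_b, var_o):
    """Toy scalar optimal-interpolation update: blend the model's first
    guess (`background`) with an observation, weighted by their error
    variances. Higher background uncertainty -> trust the observation more.
    """
    gain = var_b / (var_b + var_o)
    analysis = background + gain * (obs - background)
    # The analysis error variance is smaller than either input's alone.
    var_a = (1 - gain) * var_b
    return analysis, var_a

# Hypothetical: model first-guess 20.0 deg C, satellite retrieval 22.0 deg C,
# with equal error variances
analysis, var_a = assimilate_scalar(20.0, 22.0, var_b=1.0, var_o=1.0)
```

The computational burden in real systems comes from doing this jointly across the whole globe with spatially correlated errors, which is why fast GPU-based assimilation would close out the pipeline.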
The quiet part of this story is that weather forecasting may be the clearest case where AI models are not just matching but operationally replacing conventional computational science. The models are in production. Forecasters are using them. The results are being independently verified. That is a different kind of evidence than benchmark scores on a leaderboard.