AI Is Finding What Astronomers Missed: 800 Unknown Objects in Hubble's Archive, 1.5 Million in NEOWISE

Astronomical archives contain decades of observations that no human team could ever fully review. Hubble alone has captured over a million images since 1990. NASA’s NEOWISE telescope recorded 200 billion individual infrared detections across 10.5 years. The data is there. The problem has always been that looking through it all was functionally impossible.

Two recent projects demonstrate that AI can do what human eyes and traditional search methods could not: systematically scan these massive archives and pull out objects that fell through the cracks. One found over 800 never-documented cosmic anomalies in Hubble data. The other flagged 1.5 million variable objects hiding in NEOWISE’s infrared surveys. Neither project discovered new physics. Both showed that existing data contains far more than we had extracted from it.

AnomalyMatch: 100 Million Hubble Cutouts in Two and a Half Days

David O’Ryan and Pablo Gomez, both at the European Space Agency, built a neural network called AnomalyMatch and pointed it at the Hubble Legacy Archive. The tool processed nearly 100 million image cutouts - each measuring 7 to 8 arcseconds on a side - in roughly two and a half days.
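To put that speed in perspective, here is a back-of-the-envelope calculation using only the two figures quoted above. The "one cutout per second" human reviewer is a hypothetical comparison for illustration, not a number from the paper.

```python
# Back-of-the-envelope throughput implied by the figures quoted in the article.
N_CUTOUTS = 100_000_000   # "nearly 100 million" cutouts, rounded up
RUNTIME_DAYS = 2.5        # "roughly two and a half days"

runtime_seconds = RUNTIME_DAYS * 24 * 60 * 60
cutouts_per_second = N_CUTOUTS / runtime_seconds

# Hypothetical comparison: one person glancing at one cutout per second, nonstop.
human_years = N_CUTOUTS / (60 * 60 * 24 * 365)

print(f"{cutouts_per_second:.0f} cutouts per second")                     # 463
print(f"{human_years:.1f} years for a human at one cutout per second")    # 3.2
```

Even under that generous assumption, a single reviewer working around the clock would need more than three years to see every cutout once.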

The system identified more than 1,300 objects with unusual appearances. Of those, more than 800 had never been documented in the scientific literature. The results were published in Astronomy & Astrophysics in December 2025.

AnomalyMatch is trained to recognize visual patterns and flag objects that look different from the overwhelming majority of normal galaxies, stars, and other familiar sources in the archive. The researchers then manually reviewed the flagged candidates to confirm they were genuinely unusual. That step matters, because neural networks are good at finding outliers but not at distinguishing interesting outliers from imaging artifacts.
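The flag-then-review workflow can be sketched in a few lines. This is not AnomalyMatch's code or interface: the Cutout type, the anomaly_score field, both function names, and the reviewer verdicts are hypothetical, and the scores are invented. It only illustrates the division of labor, in which a model ranks everything and people judge the short list.

```python
from typing import NamedTuple

class Cutout(NamedTuple):
    cutout_id: str        # hypothetical identifier
    anomaly_score: float  # hypothetical model output; higher means more unusual

def select_for_review(cutouts, top_k):
    """Return the top_k highest-scoring cutouts, most unusual first."""
    return sorted(cutouts, key=lambda c: c.anomaly_score, reverse=True)[:top_k]

def apply_human_labels(candidates, labels):
    """Keep only the candidates a reviewer confirmed as genuine anomalies."""
    return [c for c in candidates if labels.get(c.cutout_id) == "anomaly"]

# Invented scores standing in for a model's output over the archive.
scored = [
    Cutout("a", 0.02), Cutout("b", 0.97), Cutout("c", 0.91),
    Cutout("d", 0.10), Cutout("e", 0.88),
]
review_queue = select_for_review(scored, top_k=3)

# Reviewer verdicts are made up for illustration.
verdicts = {"b": "anomaly", "c": "artifact", "e": "anomaly"}
confirmed = apply_human_labels(review_queue, verdicts)

print([c.cutout_id for c in review_queue])  # ['b', 'c', 'e']
print([c.cutout_id for c in confirmed])     # ['b', 'e']
```

In this toy run the model narrows five cutouts to three, and the reviewer rejects one of them as an artifact.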

The confirmed anomalies break down into several categories:

417 merging or interacting galaxies with distorted shapes, trailing streams of stars and gas, or other morphological disruptions
138 candidate gravitational lenses, where a foreground galaxy’s gravity bends light from a more distant background galaxy into arcs or rings
18 jellyfish galaxies, which trail long tendrils of gas as they move through dense cluster environments
Ring galaxies, a rare type formed when one galaxy punches through the center of another
Galaxies with massive star-forming clumps that stand out from normal galactic structure
Edge-on planet-forming disks seen in profile
Several dozen objects that do not fit any existing classification scheme

That last category is worth paying attention to. Some of these objects have morphologies that astronomers have no standard label for. They will require follow-up observation to understand.

“This is a powerful demonstration of how AI can enhance the scientific return of archival datasets,” said Pablo Gomez. David O’Ryan noted that his primary research focus remains the relationship between galaxy evolution and galaxy morphology - and that AnomalyMatch is giving him far more examples to study than any manual survey could produce.

The 138 candidate gravitational lenses are particularly valuable. Gravitational lenses act as natural magnifying glasses, allowing astronomers to study extremely distant objects that would otherwise be too faint to observe. Finding more of them in existing data means new science from old observations without pointing a single telescope at anything new.

VARnet: A High School Student Finds 1.5 Million Variable Objects

The second project comes from an unlikely source. Matteo Paz, a 17-year-old from Pasadena High School, built an AI model called VARnet while participating in Caltech’s Summer Research Connection program in 2023. Under the mentorship of Caltech senior research scientist Davy Kirkpatrick, Paz designed the system to identify variable astronomical objects - sources that change brightness over time - in NASA’s NEOWISE infrared data.

VARnet works in three stages. First, wavelet decomposition reduces measurement noise in the raw data. Then, a modified discrete Fourier transform (which Paz calls FEFT) extracts periodic patterns from irregularly sampled light curves - a common problem in astronomical surveys where observations are not evenly spaced. Finally, convolutional neural networks classify each source into one of four categories: non-variable, transient events, intrinsic pulsators (stars that physically expand and contract), or eclipsing binaries (pairs of stars that periodically block each other’s light).
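To see why uneven sampling calls for something other than a standard FFT, consider the sketch below. It is not Paz's FEFT, and it leaves out the wavelet and neural-network stages entirely. It uses a plain, classical direct Fourier sum evaluated at the actual observation times, applied to a synthetic light curve. The function name nudft_power, the 2.5-day period, the noise level, and the frequency grid are all assumptions chosen for illustration.

```python
import numpy as np

def nudft_power(times, values, frequencies):
    """Power of a direct Fourier sum evaluated at arbitrary trial frequencies.

    The sum uses the real observation times, so they need not be evenly
    spaced (the FFT, by contrast, assumes a regular grid).
    """
    centered = values - values.mean()                    # remove the constant offset
    phases = -2j * np.pi * np.outer(frequencies, times)  # shape: (n_freq, n_obs)
    spectrum = np.exp(phases) @ centered                 # one complex sum per frequency
    return np.abs(spectrum) ** 2 / len(times)

# Synthetic, irregularly sampled light curve (all values are made up).
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 100.0, size=300))       # observation times in days
true_period = 2.5                                        # days
flux = 1.0 + 0.3 * np.sin(2 * np.pi * times / true_period)
flux += rng.normal(0.0, 0.05, size=times.size)           # measurement noise

frequencies = np.linspace(0.01, 2.0, 4000)               # trial frequencies, cycles/day
power = nudft_power(times, flux, frequencies)
best_period = 1.0 / frequencies[np.argmax(power)]
print(f"recovered period: {best_period:.2f} days")
```

The strongest peak should land close to the 2.5-day period that was injected. The cost of this direct approach grows with the number of observations times the number of trial frequencies, which is one reason faster transforms matter at survey scale.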

The system processes each source in under 53 microseconds and achieved an F1 score of 0.91 on validation data.
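For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision (how many flagged sources are real) and recall (how many real sources get flagged). The sketch below shows the standard two-class definition. The counts are made up so the result lands near 0.91. They are not VARnet's validation numbers, and how the score is averaged across VARnet's four classes is not covered here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for a single class."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Made-up counts, chosen only so the result lands near 0.91.
example_f1 = f1_score(true_positives=91, false_positives=9, false_negatives=9)
print(round(example_f1, 2))  # 0.91
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a classifier cannot score well by trading away either precision or recall.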

Applied to NEOWISE’s full archive of 200 billion infrared detections, VARnet produced a catalog of 1.9 million infrared variable objects. Of those, 1.5 million had not been previously identified. The candidates include potential quasars (supermassive black holes actively consuming material), newborn stars, variable stars of various types, and transient events like supernovae.

Paz published the work as a single-author paper in The Astronomical Journal, and the project earned first place at the 2025 Regeneron Science Talent Search. Along the way, other Caltech researchers, including Shoubaneh Hemmati, Daniel Masters, Ashish Mahabal, and Matthew Graham, contributed machine learning and astronomical expertise.

Paz plans to release the full VarWISE catalog to the astronomical community with updated classifications and cross-matches against other surveys.

What This Pattern Means

These two projects share a common structure: take an existing archive that is too large for manual review, build an AI system to systematically scan it, and find objects that previous search methods missed. Neither AnomalyMatch nor VARnet required new observations. All of the discoveries were already present in the data.

This matters because astronomy is generating data faster than humans can analyze it. The Vera C. Rubin Observatory, expected to begin full science operations soon, will survey the entire visible southern sky every few nights and generate roughly 20 terabytes of raw data per night. ESA’s Euclid mission, launched in 2023, is building a three-dimensional map of the universe that will contain billions of galaxies. Without systematic AI tools, most of what these instruments observe will sit in archives unexamined.

The AnomalyMatch team has explicitly noted that their methodology is designed with future surveys like Euclid in mind. If the tool works on Hubble’s archive, it should scale to even larger datasets.

There is a limitation worth flagging. Both projects still require human verification. AnomalyMatch flagged candidates, but researchers had to manually confirm each one as a genuine anomaly rather than an imaging artifact. VARnet produced 1.9 million candidates, and follow-up observations are needed to confirm and classify them. AI handles the scale problem - scanning millions or billions of sources - but the final scientific judgment remains with people.

Still, the ratio has shifted. Instead of searching for needles in haystacks, astronomers are reviewing curated lists of candidates. That is a different kind of work, and it turns discoveries that would have taken decades of manual effort into something achievable in days.