TL;DR
VideoPDE casts a spatiotemporal PDE solution u(x,t) as a video tensor and applies conditional diffusion (inpainting/forecasting) to reconstruct the full field from partial observations. The framing naturally supports irregular spatiotemporal masks and emphasizes coherent long-horizon generation.
Problem
Many PDE inference settings provide only partial observations over space and time (missing sensors, masked pixels, intermittent trajectories). Classical operator learners struggle when supervision is sparse or the observation pattern changes (new masks). VideoPDE instead learns a diffusion prior over full trajectories and reconstructs or forecasts in a mask-conditional way.
Benefits vs others
- Unified interface for **inpainting + forecasting**: different masks correspond to different conditioning inputs.
- Video diffusion backbones can model strong temporal coherence, helping long-horizon rollouts.
- Mask conditioning makes it easy to add new observation patterns without redesigning the model.
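To make the mask-conditioning interface concrete, here is a minimal sketch of how a partially observed trajectory might be packaged as model input. The shapes and the channel-concatenation convention are assumptions for illustration, not VideoPDE's exact interface:

```python
import numpy as np

def make_conditioning(u, mask):
    """u: (T, H, W) trajectory; mask: (T, H, W) in {0, 1} (1 = observed).

    Returns a (2, T, H, W) array: masked observations plus the mask itself,
    so the model sees both the observed values and the observation pattern.
    """
    observed = u * mask                # zero out unobserved entries
    return np.stack([observed, mask])  # concatenate along a channel axis

T, H, W = 8, 16, 16
u = np.random.randn(T, H, W)
mask = (np.random.rand(T, H, W) < 0.1).astype(u.dtype)  # ~10% "sensors"
cond = make_conditioning(u, mask)
print(cond.shape)  # (2, 8, 16, 16)
```

A new observation pattern (a different mask) changes only this input tensor, not the model itself, which is what makes adding new patterns cheap.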
Interesting detail
- The video perspective suggests plug-and-play reuse of large generative video models for scientific spatiotemporal data.
- Future extensions: conditioning on PDE coefficients, boundary geometry, or control signals.
Core method (math)
The paper-specific derivation is not reproduced here; what follows is the standard conditional denoising-diffusion formulation this framing builds on (VideoPDE's exact parameterization may differ). The forward process gradually noises a clean trajectory x_0:

$$ q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\big), $$

and a denoiser ε_θ is trained, conditioned on the masked observations M⊙x_0 and the mask M, to predict the injected noise:

$$ \mathcal{L} = \mathbb{E}_{x_0,\,t,\,\epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t,\ t,\ M\odot x_0,\ M)\big\|^2. $$
Main theoretical contribution
- Mask-conditioning is equivalent to learning p(x_0 | M⊙x_0) for a family of observation operators parameterized by M.
- Replacing observed entries during sampling enforces hard constraints and stabilizes reconstructions.
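The replacement trick above can be sketched as a toy sampling loop: after every reverse step, observed entries are overwritten with a correspondingly noised copy of the observations. The `denoise_step` placeholder and the linear noise schedule are assumptions standing in for the trained network and the paper's schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    # Placeholder denoiser: shrink toward zero (stands in for a network).
    return 0.9 * x

def sample_with_replacement(obs, mask, steps=10):
    x = rng.standard_normal(obs.shape)          # start from pure noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t)
        sigma = t / steps                       # assumed noise level at step t
        noised_obs = obs + sigma * rng.standard_normal(obs.shape)
        x = mask * noised_obs + (1 - mask) * x  # hard-constrain observed entries
    return x

obs = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[0, 0] = 1.0  # a single observed pixel
x0 = sample_with_replacement(obs, mask)
print(x0[0, 0])  # 1.0 — the final step has sigma = 0, so the constraint is exact
```

Because the last step uses zero noise, observed entries match the data exactly, which is the "hard constraint" behavior described above.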
Main contribution
- Introduces a **video-diffusion framing** for PDE trajectories under partial observation.
- Demonstrates reconstructions/forecasts across PDE datasets with diverse masking (see tables).
- Provides ablations on mask patterns and temporal context.
Main results (headline)
Per the tables below, VideoPDE attains the lowest MSE in every reported forward and inverse setting. The margin is largest on Wave-Layer (0.023 vs. 0.102 forward; 0.009 vs. 0.031–0.077 inverse) and narrowest on the Navier–Stokes inverse task (0.024 vs. 0.026 for DiffusionPDE).
Experiments
PDE problems
- Wave equation
- Navier–Stokes
- Kolmogorov flow
Tasks
- Forward prediction
- Inverse reconstruction from sparse observations
Experiment setting (high level)
- Spatiotemporal masking patterns (video inpainting).
- Reports both forward and inverse scenarios with MSE.
- Uses diffusion sampling at multiple step budgets.
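The MSE metric reported in the tables can be sketched as below; whether the paper normalizes per frame or over the whole trajectory is an assumption here, as is evaluating the forward and inverse scenarios with the same function:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over the full spatiotemporal field."""
    return float(np.mean((pred - target) ** 2))

# Toy example: a (T, H, W) trajectory with a uniform 0.1 error everywhere.
target = np.zeros((2, 4, 4))
pred = target + 0.1
print(mse(pred, target))  # ~0.01
```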
Comparable baselines
- DiffusionPDE
- FNO
- PINO
- DeepONet
Main results
Forward scenario (MSE)
| Method | Wave-Layer | Navier–Stokes | Kolmogorov |
|---|---|---|---|
| VideoPDE | 0.023 | 0.026 | 0.125 |
| DiffusionPDE | 0.102 | 0.051 | 0.140 |
| PINO | 0.261 | 0.078 | 0.424 |
| DeepONet | 0.254 | 0.083 | 0.421 |
| FNO | 0.260 | 0.071 | 0.421 |
Inverse scenario (MSE)
| Method | Wave-Layer (inv) | Navier–Stokes (inv) |
|---|---|---|
| VideoPDE | 0.009 | 0.024 |
| DiffusionPDE | 0.077 | 0.026 |
| PINO | 0.034 | 0.044 |
| DeepONet | 0.036 | 0.031 |
| FNO | 0.031 | 0.030 |
Citation (BibTeX)
@article{videopde2025,
  title   = {VideoPDE: Masked Video Diffusion for Partial-Observation PDE Inference},
  author  = {He, Ruicheng and others},
  journal = {arXiv preprint arXiv:2506.13754},
  year    = {2025}
}