Spatial-Temporal Super-Resolution of Satellite Imagery via Conditional Pixel Synthesis

Stanford University

Given a 10m low-resolution (LR) image from 2016 and a 1m high-resolution (HR) image from 2018, we generate a photo-realistic and accurate HR image for 2016.


High-resolution satellite imagery has proven useful for a broad range of tasks, including measurement of global human population, local economic livelihoods, and biodiversity, among many others. Unfortunately, high-resolution imagery is both infrequently collected and expensive to purchase, making it hard to efficiently and effectively scale these downstream tasks over both time and space. We propose a new conditional pixel synthesis model that uses abundant, low-cost, low-resolution imagery to generate accurate high-resolution imagery at locations and times in which it is unavailable. We show that our model attains photo-realistic sample quality and outperforms competing baselines on a key downstream task – object counting – particularly in geographic locations where conditions on the ground are changing rapidly.


An illustration of our proposed framework (discriminator omitted). Details can be found in the paper.

Our method uses two pairs of high-resolution and low-resolution satellite images of the same area, captured at different timestamps. Given a low-resolution image from the target time and a high-resolution image from a different time, it generates a high-resolution image of the area at the time the low-resolution image was taken. The model is trained as a generative adversarial network (GAN), with a generator and a discriminator.
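To make the input setup concrete, here is a minimal sketch of one plausible way to assemble the conditioning input for such a generator: the low-resolution image from the target time is upsampled to the high-resolution grid and stacked channel-wise with the high-resolution image from the other time. The function name `build_condition`, the nearest-neighbour upsampling, and the channel-wise stacking are illustrative assumptions, not the architecture described in the paper; the scale factor of 10 simply reflects the 10m and 1m resolutions mentioned above.

```python
import numpy as np


def build_condition(lr_target, hr_reference, scale=10):
    """Stack an upsampled LR image with an HR reference image (hypothetical helper).

    lr_target:    (h, w, c) low-resolution image taken at the target time
    hr_reference: (h * scale, w * scale, c) high-resolution image from another time
    Returns an array of shape (h * scale, w * scale, 2 * c).
    """
    # Nearest-neighbour upsampling: repeat every LR pixel over a scale x scale block.
    lr_up = np.repeat(np.repeat(lr_target, scale, axis=0), scale, axis=1)
    if lr_up.shape != hr_reference.shape:
        raise ValueError(
            f"shape mismatch: upsampled LR {lr_up.shape} vs HR {hr_reference.shape}"
        )
    # Assumption: the generator sees both sources concatenated along the channel axis.
    return np.concatenate([lr_up, hr_reference], axis=-1)


# Toy stand-ins for a 2016 LR tile (10m) and a 2018 HR tile (1m) of the same area.
lr_2016 = np.zeros((16, 16, 3), dtype=np.float32)
hr_2018 = np.ones((160, 160, 3), dtype=np.float32)
cond = build_condition(lr_2016, hr_2018)
print(cond.shape)
```

Nothing in this sketch depends on which image is older, so the same helper applies whether the low-resolution image predates or postdates the high-resolution one.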


Samples from all models on the Texas housing dataset. Our models show advantages in both sample quality and structural detail consistency with the ground truth, especially in areas with house or pool construction (zoomed in with colored boxes).

Human evaluation showed improvements over other models in image realism, similarity to ground truth, and accuracy in building and pool counting.

Our method can generate images given a low-resolution image timestamped either before or after the high-resolution image.

Samples from all models on the Functional Map of the World crop field dataset.


  title={Spatial-Temporal Super-Resolution of Satellite Imagery via Conditional Pixel Synthesis},
  author={He, Yutong and Wang, Dingjie and Lai, Nicholas and Zhang, William and Meng, Chenlin and Burke, Marshall and Lobell, David B. and Ermon, Stefano},
  abbr={NeurIPS 2021},
  booktitle={Neural Information Processing Systems},