CMU-CS-24-132
Computer Science Department
School of Computer Science, Carnegie Mellon University




Towards City-Scale Neural Rendering

Haithem Turki

Ph.D. Thesis

July 2024



Keywords: Computer Vision, Machine Learning, 3D Deep Learning, Neural Rendering, Novel View Synthesis

Advances in neural rendering techniques have led to significant progress towards photo-realistic novel view synthesis. When combined with increases in data processing and compute capability, this promises to unlock numerous VR applications, including virtual telepresence, search and rescue, and autonomous driving. Large-scale virtual reality, long the domain of science fiction [31, 62], feels markedly more tangible.

This thesis explores the frontier of large-scale neural rendering by building upon Neural Radiance Fields (NeRFs) [118], a family of methods attracting attention due to their state-of-the-art rendering quality and conceptual simplicity. In the less than three years since their introduction, research groups across the world have published at least 3,000 papers spanning numerous use cases [135]. However, many shortcomings remain. The first is scale itself. Only a handful of existing methods capture scenes larger than a single object or room, and those that do handle only static reconstruction, which limits their applicability. Another is speed, as rendering falls below interactive thresholds, and current acceleration methods remain too slow or degrade quality at high resolution. Quality is a third issue, as NeRF assumes ideal viewpoint conditions that are unrealistic in practice, and its output degrades when they are violated.

We first explore scaling within the context of static reconstruction. We design a sparse network structure whose parameters are specialized to different regions of the scene and can be trained in parallel, allowing us to scale linearly as we increase model capacity (versus quadratically in the original NeRF) and to reconstruct urban-scale environments orders of magnitude larger than prior work. We then address dynamic reconstruction of entire cities and build the largest dynamic NeRF representation to date. To accelerate rendering, we improve sampling efficiency through a hybrid surface-volume representation that encourages the model to represent as much of the world as possible through surfaces (which require few samples per ray) while maintaining the freedom to render transparency and finer details (which pure surface representations struggle to capture). Finally, we propose a fast anti-aliasing method that greatly improves rendering quality when training with data collected from freeform camera trajectories. Importantly, this method incurs minimal performance overhead and is compatible with the scale and speed improvements described above.

118 pages

Thesis Committee:
Deva Ramanan (Chair)
Jessica K. Hodgins
Martial Hebert
Jonathan T. Barron (Google DeepMind)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

