Bilateral Guided Radiance Field Processing
SIGGRAPH (ACM TOG) 2024


We propose a radiance field processing pipeline: 1) Bilateral guided training merges input views with photometric variation during radiance field optimization. 2) Bilateral guided finishing achieves 3D-level enhancement by lifting a user-provided single-view edit to 3D.

Abstract

Neural Radiance Fields (NeRF) achieves unprecedented performance in novel view synthesis by exploiting multi-view consistency. When capturing multiple inputs, the image signal processing (ISP) in modern cameras independently enhances each of them, applying exposure adjustment, color correction, local tone mapping, etc. While these operations greatly improve image quality, they often break the multi-view consistency assumption, leading to "floaters" in the reconstructed radiance fields. To address this concern without compromising visual aesthetics, we aim to first disentangle the enhancements made by the ISP at the NeRF training stage and re-apply user-desired enhancements to the reconstructed radiance fields at the finishing stage. Furthermore, to make the re-applied enhancements consistent between novel views, we need to perform image signal processing in 3D space (i.e., "3D ISP"). For this goal, we adopt the bilateral grid, a locally-affine model, as a generalized representation of ISP enhancements. Specifically, we optimize per-view 3D bilateral grids jointly with the radiance field to approximate the effects of the camera pipeline on each input view. To achieve user-adjustable 3D finishing, we propose to learn a low-rank 4D bilateral grid from a given single-view edit, lifting photo enhancements to the whole 3D scene. We demonstrate that our approach boosts the visual quality of novel view synthesis by effectively removing floaters and applying enhancements from user retouching.

Video

Disentangle ISP enhancements

When capturing multi-view images for training NeRF, the camera's image signal processing (ISP) pipeline enhances each captured view independently. This introduces photometric variation into the NeRF input, leading to "floaters" in the synthesized novel views. Our bilateral guided NeRF training addresses this issue by disentangling the per-view enhancements, as sketched below.
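Concretely, each training view is paired with its own low-resolution bilateral grid of affine color transforms, which is sliced at every rendered ray and optimized jointly with the radiance field. The PyTorch sketch below illustrates the idea; the grid resolution, the luminance-based guidance, and the training details are illustrative assumptions rather than our exact implementation (see the lib_bilagrid.py module for the actual code).

import torch
import torch.nn.functional as F

class PerViewBilateralGrids(torch.nn.Module):
    """One low-resolution grid of 3x4 affine color transforms per training view.

    Grid axes are (guidance, y, x); every cell stores a local affine model,
    so the grid can absorb per-view ISP effects such as exposure and tone curves.
    """
    def __init__(self, num_views, W=16, H=16, G=8):
        super().__init__()
        eye = torch.eye(3, 4).flatten()                  # identity transform [A | b]
        grids = eye.repeat(num_views, G, H, W, 1)        # (V, G, H, W, 12)
        # Reorder to (V, 12, G, H, W) for F.grid_sample.
        self.grids = torch.nn.Parameter(grids.permute(0, 4, 1, 2, 3).contiguous())

    def forward(self, view_idx, uv, rgb):
        """Slice view_idx's grid at pixel coords uv (R, 2, in [0, 1]) and
        rendered colors rgb (R, 3), and apply the interpolated affine model."""
        # Guidance = luminance of the rendered (scene-referred) color.
        g = (rgb * rgb.new_tensor([0.299, 0.587, 0.114])).sum(-1, keepdim=True)
        # grid_sample expects (x, y, z) coordinates in [-1, 1].
        coords = (torch.cat([uv, g], dim=-1) * 2.0 - 1.0).view(1, 1, 1, -1, 3)
        grid = self.grids[view_idx : view_idx + 1]       # (1, 12, G, H, W)
        affine = F.grid_sample(grid, coords, align_corners=True)
        affine = affine.view(3, 4, -1).permute(2, 0, 1)  # (R, 3, 4)
        # Locally-affine color transform: c' = A @ c + b.
        return (affine[..., :3] @ rgb.unsqueeze(-1)).squeeze(-1) + affine[..., 3]

During training, the photometric loss compares the grid-processed colors against the observed pixels (typically together with a smoothness prior on the grid parameters); at test time, the grids are simply discarded, leaving the disentangled radiance field.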

In this case, the scene is captured with a cell phone camera under varying exposure and ISO. Floaters appear in the baseline result (left), while our method (right) synthesizes clean novel views.

Lift 2D enhancements to 3D

Our proposed bilateral guided finishing enables user-driven enhancements at the 3D level. Users simply select a rendered view and retouch it in an image editor (e.g., Adobe Lightroom®). Our method then lifts the 2D edit to the whole scene, achieving compelling renditions that remain consistent across synthesized views.
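Under the hood, the finishing operator is a 4D bilateral grid over scene coordinates plus a guidance value, stored in low-rank form. The sketch below uses a CP-style factorization (a sum of outer products of 1D factors, in the spirit of TensoRF); the rank, resolutions, luminance guidance, and the choice of per-ray 3D slicing point are illustrative assumptions.

import torch

class LowRank4DBilateralGrid(torch.nn.Module):
    """CP-factorized 4D grid of 3x4 affine color transforms over (x, y, z, guidance)."""
    def __init__(self, res=(16, 16, 16, 8), rank=4):
        super().__init__()
        self.rank = rank
        # One set of rank 1D factors per axis; near-zero init so the product
        # starts tiny and the identity bias dominates (a no-op edit).
        self.factors = torch.nn.ParameterList(
            torch.nn.Parameter(torch.randn(rank, n, 12) * 0.01) for n in res
        )
        self.bias = torch.nn.Parameter(torch.eye(3, 4).flatten())

    def _interp1d(self, factor, t):
        """Linearly interpolate a (rank, n, 12) factor at t in [0, 1] -> (N, rank, 12)."""
        n = factor.shape[1]
        x = t.clamp(0, 1) * (n - 1)
        i0 = x.floor().long().clamp(max=n - 2)
        w = (x - i0.float())[:, None, None]
        return (1 - w) * factor[:, i0].transpose(0, 1) + w * factor[:, i0 + 1].transpose(0, 1)

    def forward(self, xyz, rgb):
        """xyz: (N, 3) slicing points in [0, 1]^3 (e.g. expected ray termination),
        rgb: (N, 3) rendered colors; returns enhanced colors."""
        g = (rgb * rgb.new_tensor([0.299, 0.587, 0.114])).sum(-1, keepdim=True)
        coords = torch.cat([xyz, g], dim=-1)             # (N, 4)
        prod = torch.ones(xyz.shape[0], self.rank, 12, device=xyz.device)
        for axis, factor in enumerate(self.factors):     # outer product over axes
            prod = prod * self._interp1d(factor, coords[:, axis])
        affine = (prod.sum(dim=1) + self.bias).view(-1, 3, 4)
        return (affine[:, :, :3] @ rgb.unsqueeze(-1)).squeeze(-1) + affine[:, :, 3]

Fitting minimizes the difference between the grid-processed rendering of the selected view and the user's retouched image; because the grid lives in 3D scene space, slicing it while rendering any other viewpoint reproduces the enhancement consistently.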

HDR fusion

Our bilateral guided training can achieve HDR fusion in NeRF. The best-exposed parts of the input images (a), captured with varying exposure, are fused into an HDR radiance field (b). Our radiance-finishing can further adjust the color tone of the fused radiance field by lifting a single-view enhancement (c).

(a) Input samples w/ varying exposure

(b) Radiance field with HDR fusion

(c) Tone-mapped by our 3D finishing

Disentangle varying lighting

Varying lighting across the input views (a) causes "disco" artifacts in the baseline results (b), while our method (c) can partially disentangle the effects of the varying illumination.

(a) Input samples w/ varying lighting

(b) Results of ZipNeRF baseline

(c) Results of bilateral guided training

Object recoloring

In this case, we first edit the color of the bulldozer in a single view, then train the low-rank 4D bilateral grid on this 2D edit to perform 3D-level recoloring. Because the operator is local and edge-aware in bilateral space, it changes the color of the subject without significantly affecting surrounding areas, as the sketch below illustrates.
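The toy example below shows why the edit stays local: two nearby points that differ in luminance fall into different cells along the guidance axis of a bilateral grid and therefore receive different affine transforms. The grid contents here are made up purely for illustration.

import torch
import torch.nn.functional as F

# Toy 3D bilateral grid (guidance, y, x) with two guidance cells: low-guidance
# cells hold the identity, high-guidance cells swap red and blue, mimicking a
# recoloring edit that targets only bright pixels like the bulldozer.
identity = torch.eye(3, 4)
recolor = torch.tensor([[0., 0., 1., 0.],   # R <- B
                        [0., 1., 0., 0.],   # G <- G
                        [1., 0., 0., 0.]])  # B <- R
grid = torch.stack([identity, recolor]).view(2, 1, 1, 12)  # (G=2, H=1, W=1, 12)
grid = grid.permute(3, 0, 1, 2).unsqueeze(0)               # (1, 12, 2, 1, 1)

def slice_apply(rgb):
    g = (rgb * torch.tensor([0.299, 0.587, 0.114])).sum()  # luminance guidance
    coords = torch.tensor([[[[[0., 0., g * 2 - 1]]]]])     # (x, y, guidance)
    affine = F.grid_sample(grid, coords, align_corners=True).view(3, 4)
    return affine[:, :3] @ rgb + affine[:, 3]

bright = torch.tensor([0.9, 0.8, 0.1])    # e.g. the yellow bulldozer
dark = torch.tensor([0.15, 0.1, 0.05])    # e.g. shadowed ground right next to it
print(slice_apply(bright))  # strongly shifted toward the recoloring edit
print(slice_apply(dark))    # nearly unchanged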

Photorealistic style transfer

We first transfer the reference style to a selected view, then lift the 2D stylization to the whole 3D scene with our proposed radiance-finishing, again via a low-rank 4D bilateral grid.

Live demo

We develop a simple interactive 3D editor with our method, built on the ngp_pl backbone (a PyTorch implementation of Instant-NGP). To try our method with other backbones, please check out the lib_bilagrid.py module.

Related work

This work stands on the shoulders of many prior papers. In particular, the following works informed our approach:

  • Real-time Edge-aware Image Processing with the Bilateral Grid introduces the bilateral grid data structure, which is the foundation of our method. Bilateral Guided Upsampling demonstrates that the bilateral grid is a universal approximator for various image enhancements. We extend this operator to process 3D scenes.
  • HDRNet demonstrates that the bilateral grid can work with neural networks due to its differentiability.
  • TensoRF shows the effectiveness of low-rank approximation for 3D representations.
  • RawNeRF first incorporates camera pipelines into NeRF.
  • Our pipeline resembles the classic HDR+ algorithm, which first merges a burst of frames into an HDR image and then applies local tone mapping for photo-finishing.

Citation

If you find this work helpful, please consider citing:
@misc{wang2024bilateral,
    title={Bilateral Guided Radiance Field Processing}, 
    author={Yuehao Wang and Chaoyi Wang and Bingchen Gong and Tianfan Xue},
    year={2024},
    eprint={2406.00448},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgements

We thank all anonymous reviewers for their constructive feedback and suggestions. We thank Ruikang Li for discussing image signal processing with us. We also thank Wang Wei, Xuejing Huang, and Chengkun Li for their assistance in making the video demo and capturing the dataset.

The website template was borrowed from Michaël Gharbi and Ref-NeRF.