We present a novel approach to mesh shape editing, building on recent progress in 3D reconstruction from multi-view images. We formulate shape editing as a conditional reconstruction problem, where the model must reconstruct the input shape except within a specified 3D region, in which the geometry is instead generated from a conditioning signal. To this end, we train a conditional Large Reconstruction Model (LRM) for masked reconstruction, using multi-view consistent masks rendered from a randomly generated 3D occluder and one clean viewpoint as the conditioning signal. At inference time, we manually define a 3D region to edit and provide an edited image from a canonical viewpoint to fill in that region. We demonstrate that, in just a single forward pass, our method not only preserves the input geometry in the unmasked region, with reconstruction quality on par with SoTA, but is also expressive enough to perform a variety of mesh edits from single-image guidance that past works struggle with, while being 10x faster than the top-performing prior method.
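To make the mask-generation step concrete, here is a minimal PyTorch sketch of producing multi-view consistent masks by projecting a random axis-aligned 3D box into each posed view. The pinhole-camera (OpenCV) convention, the point-splatting rasterization, and all names such as render_box_masks are our illustrative assumptions, not the paper's actual mask renderer.

import torch
import torch.nn.functional as F

def render_box_masks(box_min, box_max, K, world2cam, H, W, n_pts=50000):
    """Splat points sampled inside an axis-aligned 3D box into V binary view masks."""
    pts = box_min + torch.rand(n_pts, 3) * (box_max - box_min)   # [n, 3] world space
    pts_h = torch.cat([pts, torch.ones(n_pts, 1)], dim=1)        # [n, 4] homogeneous
    masks = torch.zeros(world2cam.shape[0], 1, H, W)
    for v in range(world2cam.shape[0]):
        cam = (world2cam[v] @ pts_h.T).T                         # [n, 3] camera space
        cam = cam[cam[:, 2] > 1e-6]                              # keep points in front
        uv = (K[v] @ cam.T).T                                    # perspective projection
        uv = (uv[:, :2] / uv[:, 2:3]).round().long()             # pixel coordinates
        ok = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
        masks[v, 0, uv[ok, 1], uv[ok, 0]] = 1.0
    # One dilation pass closes pinholes left by the point splatting.
    return F.max_pool2d(masks, kernel_size=3, stride=1, padding=1)

# Usage: one 256x256 view, camera at world z = -2 looking along +z.
K = torch.tensor([[[200., 0., 128.], [0., 200., 128.], [0., 0., 1.]]])
w2c = torch.tensor([[[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 2.]]])
masks = render_box_masks(torch.tensor([-0.3, -0.3, -0.3]),
                         torch.tensor([0.3, 0.3, 0.3]), K, w2c, 256, 256)
print(masks.shape)  # torch.Size([1, 1, 256, 256])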
Our model is a Large Reconstruction Model that takes posed images of an object as input and predicts triplanes, which are decoded into an SDF and RGB colors. In contrast to standard LRMs, we randomly generate rectangular 3D masks during training and render them from the same camera poses as the input images. Patches that contain pixels occluded by these masks are replaced with a learnable token. Through this masking procedure, our LRM learns how to "inpaint" a 3D masked region in the input shape.
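The patch-replacement step can be pictured as a ViT-style patch embedding in which any patch touched by the rendered mask is swapped for a single learnable token. The sketch below is our illustration of that idea under assumed shapes; the class name MaskedPatchEmbed, the parameter mask_token, and the patch size of 16 are placeholders, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedPatchEmbed(nn.Module):
    """ViT-style patch embedding that swaps occluded patches for a learnable token."""

    def __init__(self, in_ch=3, embed_dim=768, patch=16):
        super().__init__()
        self.patch = patch
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        # One learnable embedding shared by every masked patch.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, imgs, occ_masks):
        # imgs:      [V, 3, H, W] posed input views
        # occ_masks: [V, 1, H, W] 1 where the rendered 3D mask occludes the view
        tokens = self.proj(imgs).flatten(2).transpose(1, 2)      # [V, N, D]
        # A patch counts as masked if any pixel inside it is occluded.
        occluded = F.max_pool2d(occ_masks, self.patch)           # [V, 1, H/p, W/p]
        occluded = occluded.flatten(2).transpose(1, 2) > 0       # [V, N, 1]
        return torch.where(occluded, self.mask_token, tokens)

# Usage: four views with random per-pixel occlusion masks.
embed = MaskedPatchEmbed()
imgs = torch.rand(4, 3, 256, 256)
occ = (torch.rand(4, 1, 256, 256) > 0.99).float()
print(embed(imgs, occ).shape)  # torch.Size([4, 256, 768])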
@article{gao2024meshedit,
  title={3D Mesh Editing using Masked LRMs},
  author={William Gao and Dilin Wang and Yuchen Fan and Aljaž Božič and Tuur Stuyck and Zhengqin Li and Zhao Dong and Rakesh Ranjan and Nikolaos Sarafianos},
  journal={arXiv preprint arXiv:2412.08641},
  year={2024}
}