Tracking and reconstructing 3D objects from cluttered scenes are key components of computer vision, robotics, and autonomous driving systems. While recent progress on implicit functions has shown encouraging results for high-quality 3D shape reconstruction, generalizing to cluttered and only partially observable LiDAR data remains very challenging.
In this paper, we propose to leverage the continuity in video data. We introduce a novel and unified framework that uses a neural implicit function to simultaneously track and reconstruct 3D objects in the wild. Our approach adapts the DeepSDF model (i.e., an instantiation of the implicit function) online over the video, iteratively improving the shape reconstruction, which in turn improves the tracking, and vice versa. We experiment on both the Waymo and KITTI datasets and show significant improvements over state-of-the-art methods on both the tracking and shape reconstruction tasks.
After the shape code is initialized, tracking and shape adaptation are performed iteratively: at each frame, the incoming object point cloud is first aligned to the previous shape, and the shape is then adapted to the aligned point cloud. Both procedures are based on DeepSDF.
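To make the alternating procedure concrete, the following is a minimal PyTorch-style sketch of one possible per-frame loop under this scheme. The toy decoder, the translation-plus-yaw pose parameterization, and all names and hyperparameters (SDFDecoder, to_object_frame, learning rates, iteration counts) are illustrative assumptions, not the released implementation.

import torch
import torch.nn as nn

# Stand-in DeepSDF-style decoder: maps (latent code, 3D point) -> signed distance.
# The real network is pretrained on a shape collection; this toy MLP only fixes the interface.
class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, latent, xyz):
        # latent: (D,), xyz: (N, 3) -> (N,) signed distances
        z = latent.unsqueeze(0).expand(xyz.shape[0], -1)
        return self.net(torch.cat([z, xyz], dim=-1)).squeeze(-1)

def to_object_frame(points, trans, yaw):
    # Rigid transform (translation + yaw) bringing sensor-frame points into the object frame.
    c, s = torch.cos(yaw), torch.sin(yaw)
    zero, one = torch.zeros_like(c), torch.ones_like(c)
    R = torch.stack([torch.stack([c, -s, zero]),
                     torch.stack([s,  c, zero]),
                     torch.stack([zero, zero, one])])
    return (points - trans) @ R  # equivalent to R^T (p - t)

def track_and_adapt(frames, decoder, latent, trans, yaw,
                    pose_iters=50, shape_iters=20, lr_pose=1e-2, lr_shape=1e-3):
    """frames: list of (N_i, 3) object point clouds, one per video frame."""
    for points in frames:
        # (a) Tracking: freeze the shape, update the pose so that the observed
        #     points land on the zero-level set of the current SDF.
        trans = trans.clone().requires_grad_(True)
        yaw = yaw.clone().requires_grad_(True)
        opt = torch.optim.Adam([trans, yaw], lr=lr_pose)
        for _ in range(pose_iters):
            opt.zero_grad()
            loss = decoder(latent, to_object_frame(points, trans, yaw)).abs().mean()
            loss.backward()
            opt.step()
        # (b) Shape adaptation: freeze the pose, update the latent code so the
        #     SDF fits the aligned points; a norm penalty keeps the code near the prior.
        trans, yaw = trans.detach(), yaw.detach()
        latent = latent.clone().requires_grad_(True)
        opt = torch.optim.Adam([latent], lr=lr_shape)
        aligned = to_object_frame(points, trans, yaw)
        for _ in range(shape_iters):
            opt.zero_grad()
            loss = decoder(latent, aligned).abs().mean() + 1e-4 * latent.norm() ** 2
            loss.backward()
            opt.step()
        latent = latent.detach()
    return trans, yaw, latent

In practice the decoder would be pretrained on a shape collection and could also be fine-tuned during step (b); the sketch only fixes the interface and the order of the two optimizations.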
In (a), we optimize the pose by pushing the point cloud onto the zero-level set of the SDF field; in (b), we deform the SDF field to match the point cloud for a better shape. Color temperature encodes distance from the surface.
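Up to the exact loss terms used in the paper, the two steps in this figure can be summarized by the following objectives, where f_theta is the DeepSDF decoder, z the latent shape code, p_i the observed points, T a rigid transform, and lambda a regularization weight (all notation introduced here for illustration):

\begin{align}
  T^{\ast} &= \arg\min_{T}\ \frac{1}{N}\sum_{i=1}^{N} \bigl|\, f_{\theta}\bigl(\mathbf{z},\, T(\mathbf{p}_i)\bigr) \,\bigr|, \\
  \mathbf{z}^{\ast} &= \arg\min_{\mathbf{z}}\ \frac{1}{N}\sum_{i=1}^{N} \bigl|\, f_{\theta}\bigl(\mathbf{z},\, T^{\ast}(\mathbf{p}_i)\bigr) \,\bigr| \;+\; \lambda \lVert \mathbf{z} \rVert_2^2.
\end{align}

The first objective moves the transformed points onto the zero-level set of the fixed SDF (pose, panel (a)); the second deforms the field, through the latent code, to fit the aligned points while keeping the code close to the learned prior (shape, panel (b)).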
@article{ye2022online,
author = {Ye, Jianglong and Chen, Yuntao and Wang, Naiyan and Wang, Xiaolong},
title = {Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild},
journal = {IEEE Robotics and Automation Letters},
year = {2022},
}