3D Reconstruction & 6-DoF Pose Estimation

Ansh Chablani, Gaurav Behera, Samkit Jain • IIIT Hyderabad
Reconstructed 3D mesh of a kitchen environment from the 7-Scenes dataset.

Abstract: We present a pipeline for real-time 3D reconstruction and 6 Degrees of Freedom (6-DoF) pose estimation of objects from monocular RGB-D video. Our approach uses TSDF fusion to build dense 3D models; poses are initialized via nearest-neighbor search and refined through epipolar geometry and differentiable rendering.

Introduction & Related Work

6-DoF pose estimation determines the position and orientation of an object in 3D space. Traditional methods rely on CAD models and often fail under occlusion or lighting changes. Modern CAD-model-free approaches such as OnePose++ and BundleSDF (shown below) track objects without prior geometry models.

Comparison of CAD-model-free architectures: OnePose++ and BundleSDF.

Reconstructed Results

The core of the project involves integrating depth maps into a volumetric grid. Below are reconstructions of scenes and synthetic objects.

Textured mesh reconstruction of the Table dataset.
Incremental reconstruction of the Stanford Bunny (10 frames vs. 200 frames).
Full scene reconstruction of a kitchen from the 7-Scenes dataset.

Methodology: TSDF Fusion

Each voxel stores a signed distance $D$ to the nearest surface, truncated to $[-\tau, \tau]$, together with a weight $W$. Given a new depth measurement $d$ with weight $w_{new}$, the update is a weighted running average:

D' = (W * D + w_new * d) / (W + w_new)
W' = W + w_new
2D TSDF integration logic: voxel-to-camera projection.

Using CUDA, we optimized voxel updates and mesh generation steps to ensure real-time performance even with high voxel resolutions.
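A minimal CPU sketch of one integration step (NumPy rather than the project's CUDA kernels; the function and parameter names are illustrative): each voxel center is projected into the camera, its depth measurement is looked up, and the weighted-average update above is applied.

```python
import numpy as np

def tsdf_integrate(D, W, voxel_coords, depth, K, cam_T_world, trunc=0.05, w_new=1.0):
    """Fuse one depth frame into a TSDF volume via a weighted running average.

    D, W         : flat arrays of per-voxel TSDF values and weights (updated in place)
    voxel_coords : (N, 3) voxel centers in world coordinates
    depth        : (H, W) depth map in meters
    K            : (3, 3) camera intrinsics
    cam_T_world  : (4, 4) world-to-camera extrinsics
    """
    H, Wd = depth.shape
    # Transform voxel centers into the camera frame.
    pts = (cam_T_world[:3, :3] @ voxel_coords.T + cam_T_world[:3, 3:4]).T
    z = pts[:, 2]
    # Project into the image plane (assumes z > 0 for voxels in view).
    uv = (K @ pts.T).T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    valid = (z > 0) & (u >= 0) & (u < Wd) & (v >= 0) & (v < H)
    d_meas = np.full_like(z, np.nan)
    d_meas[valid] = depth[v[valid], u[valid]]
    # Signed distance along the viewing ray, clamped to [-trunc, trunc].
    sdf = d_meas - z
    upd = valid & np.isfinite(d_meas) & (d_meas > 0) & (sdf > -trunc)
    tsdf = np.clip(sdf, -trunc, trunc)
    # Weighted running average: D' = (W*D + w_new*d) / (W + w_new), W' = W + w_new.
    D[upd] = (W[upd] * D[upd] + w_new * tsdf[upd]) / (W[upd] + w_new)
    W[upd] = W[upd] + w_new
    return D, W
```

The CUDA version parallelizes the same update with one thread per voxel, which is what makes the loop-free formulation above map so directly onto the GPU.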

Overall Pipeline
System architecture for real-time 3D reconstruction and pose estimation.

6-DoF Pose Refinement

After estimating an initial pose via nearest-neighbor search, we apply refinement:

  • Epipolar Geometry: corrects misalignment between frames by enforcing the two-view epipolar constraint.
  • Differentiable Rendering: minimizes the discrepancy between rendered silhouettes and observed data via backpropagation.
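As a sketch of the epipolar constraint exploited in the first step (assuming a calibrated pinhole model with normalized image coordinates; the helper names are illustrative), corresponding points in two views must satisfy x2ᵀ E x1 = 0 with E = [t]× R, so the residual of this product measures how consistent a relative pose is with the observed correspondences.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def essential_matrix(R, t):
    """E = [t]x R relates normalized points in two views: x2^T E x1 = 0."""
    return skew(t) @ R

def max_epipolar_residual(R, t, points_world, rng=None):
    """Project random 3D points into both views and return the largest
    |x2^T E x1| residual (zero for a consistent pose, up to float error)."""
    E = essential_matrix(R, t)
    residuals = []
    for Xw in points_world:
        x1 = Xw / Xw[2]           # normalized coords in camera 1 (at origin)
        Xc2 = R @ Xw + t          # same point in camera 2's frame
        x2 = Xc2 / Xc2[2]
        residuals.append(abs(x2 @ E @ x1))
    return max(residuals)
```

A pose hypothesis with a large residual over the matched features is rejected or corrected before the differentiable-rendering stage takes over.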

Datasets

Experiments were conducted on the 7-Scenes dataset, synthetic Blender models, and Gazebo simulations of dynamic objects (SUV models).

Examples from the various datasets used for testing.

Performance & Benchmarks

The GPU implementation achieved substantial speedups (up to 1000x) over CPU-based methods.

Dataset         Frames  Time (s)  FPS
Stanford Bunny  500     1.68      296.67
Suzanne         500     0.45      530.74
Table Scene     500     1.68      296.67

Hyper-parameters & Methods

We compare two mesh-extraction methods, Marching Cubes and Occupancy Dual Contouring (ODC), and study the impact of voxel size and truncation distance on reconstruction quality.

Marching Cubes vs. Occupancy Dual Contouring.
Impact of voxel size (0.01 vs. 0.5).
Impact of truncation distance (0.01 vs. 0.5).
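The voxel-size trade-off can be probed with a toy experiment (a NumPy sketch, not the project's pipeline; the sphere SDF and function name are illustrative): sample a sphere's signed distance field on grids of different resolutions and count the cells that straddle the zero level set, a rough proxy for how much surface detail the mesh extractor has to work with.

```python
import numpy as np

def surface_voxel_count(voxel_size, radius=0.3, extent=0.5):
    """Sample a sphere SDF on a regular grid and count sign changes
    between neighboring voxels (cells crossing the zero level set)."""
    axis = np.arange(-extent, extent, voxel_size)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf = np.sqrt(X**2 + Y**2 + Z**2) - radius
    crossings = 0
    for ax in range(3):
        # Compare each voxel with its neighbor along this axis.
        s = np.moveaxis(sdf, ax, 0)
        crossings += np.count_nonzero(np.sign(s[:-1]) != np.sign(s[1:]))
    return crossings
```

A fine grid (e.g. 0.01) yields orders of magnitude more zero-crossing cells than a coarse one (e.g. 0.1), which is why small voxel sizes recover detail at the cost of memory and integration time.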