RGB-D SLAM with Masks #30

@fmuehlis

System setup
Hardware:

  • RTX 3070
  • Ryzen 5600X (32GB RAM)

GPU Info:

  • Driver 580.95.05
  • CUDA 13.0

OS Info:

  • Ubuntu 24.04
  • Docker 29.1.3
  • Container Toolkit 1.18.1

PyCuVSLAM is running in a Docker container based on the nvidia/cuda:12.6.1-devel-ubuntu22.04 image.

Camera setup
Camera data comes from a Toyota HSR robot (RGB-D camera), replayed from a ROS bag file.

Description
I am trying to use RGB-D SLAM with PyCuVSLAM together with masks to cut out dynamic objects in the scene. This causes almost all features to be discarded from the landmarks, although the observations themselves seem to be correctly masked out of the image.
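
For reference, this is the mask convention I am assuming throughout (a minimal sketch; numpy only, and the bounding box is hypothetical):

import numpy as np

# 255 marks pixels whose features should be discarded (dynamic objects),
# 0 marks pixels whose features should be kept.
H, W = 480, 640
mask = np.zeros((H, W), dtype=np.uint8)
x0, y0, x1, y1 = 200, 150, 420, 400  # e.g. the bounding box of a detected chair
mask[y0:y1, x0:x1] = 255  # discard features on the chair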

Image without mask (last observations in red):
[image]
Features are detected on the chair.

Image with mask:
[image]
Features are properly discarded.

Pointcloud without any masking:
[image]
Example result:

last_observations: 217
last_landmarks: 130
final_landmarks: 512

Pointcloud with an all 0 mask:
[image]
Example result:

last_observations: 321
last_landmarks: 0
final_landmarks: 0

Pointcloud with an all 255 mask:
[image]
Result on every frame:

last_observations: 0
last_landmarks: 0
final_landmarks: 0

Pointcloud with a real mask:
[image]
Example result:

last_observations: 316
last_landmarks: 0
final_landmarks: 0
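
For context, the counters above were collected after each track() call roughly as below. get_final_landmarks is used in step 3 of the repro; I assume get_last_observations (per camera index) and get_last_landmarks behave analogously:

stats = {
    "last_observations": len(tracker.get_last_observations(0)),  # camera index 0
    "last_landmarks": len(tracker.get_last_landmarks()),
    "final_landmarks": len(tracker.get_final_landmarks()),
}
print(stats)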

To Reproduce

  1. Use PyCuVSLAM in RGB-D mode with a SLAM config:
import cuvslam as vslam  # PyCuVSLAM bindings (import name assumed)

rgbd_settings = vslam.Tracker.OdometryRGBDSettings(
    depth_scale_factor=1.0 / 0.001,  # 1000 depth units per meter (depth in mm)
    depth_camera_id=0,
    enable_depth_stereo_tracking=False,
)
odom_config = vslam.Tracker.OdometryConfig(
    async_sba=True,
    enable_final_landmarks_export=True,
    odometry_mode=vslam.Tracker.OdometryMode.RGBD,
    rgbd_settings=rgbd_settings,
    use_denoising=True,
    use_motion_model=True,
    use_gpu=True,
)
slam_config = vslam.Tracker.SlamConfig(
    use_gpu=True,
    sync_mode=False,
)
loc_settings = vslam.Tracker.SlamLocalizationSettings(  # not used further in this repro
    horizontal_search_radius=8.0,
    vertical_search_radius=2.0,
    horizontal_step=0.5,
    vertical_step=0.2,
    angular_step_rads=0.03,
)

rig = vslam.Rig()

cam = vslam.Camera()
cam.distortion = vslam.Distortion(vslam.Distortion.Model.Pinhole)
cam.focal = fx, fy          # intrinsics from the HSR camera calibration
cam.principal = cx, cy
cam.size = width, height
cam.rig_from_camera.rotation = [-0.500, 0.500, -0.500, 0.500]  # quaternion, optical frame -> rig frame

rig.cameras = [cam]

tracker = vslam.Tracker(rig, odom_config, slam_config)
  2. Provide any mask to the track method:
import numpy as np

H, W = image.shape[:2]

# image and depth must share the same spatial resolution
assert image.shape[:2] == depth.shape[:2]

mask = np.zeros((H, W), dtype=np.uint8)  # all-zero mask; or all-255 via np.full((H, W), 255, np.uint8)

odom_pose_estimate, slam_pose_raw = tracker.track(
    timestamp_ns,
    images=[image],
    masks=[mask],
    depths=[depth],
)
  3. Observe that all points are discarded from the final landmarks:
landmarks = tracker.get_final_landmarks().values()
print(f"number of landmarks: {len(landmarks)}")
# number of landmarks: 0

Expected behavior
I would expect that only features inside the masked region are discarded: with an all-zero mask, all features should be kept; with an all-255 mask, all features should be discarded.
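
A sanity check that would pin down the semantics (a sketch; it assumes each observation exposes pixel coordinates u/v): mask out only the left half of the image and check that observations survive on the right half.

import numpy as np

H, W = image.shape[:2]
half_mask = np.zeros((H, W), dtype=np.uint8)
half_mask[:, : W // 2] = 255  # discard features in the left half only

tracker.track(timestamp_ns, images=[image], masks=[half_mask], depths=[depth])

# Expectation: every surviving observation lies in the unmasked right half.
obs = tracker.get_last_observations(0)  # assumed accessor, see above
assert all(o.u >= W // 2 for o in obs)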

What you have tried
I have tried:

  • providing a real-world mask created with segmentation models such as YOLOE and SAM (see the sketch after this list)
  • providing an all-zero mask (np.zeros)
  • providing an all-one mask (np.ones)
  • providing an all-255 mask (np.ones * 255)
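
The real-world mask was assembled along these lines (a sketch; seg_masks is a hypothetical list of per-object boolean masks as returned by YOLOE or SAM):

import numpy as np

def build_tracking_mask(seg_masks, H, W):
    # seg_masks: list of boolean (H, W) arrays, one per dynamic object
    mask = np.zeros((H, W), dtype=np.uint8)
    for m in seg_masks:
        mask[m] = 255  # mark dynamic pixels for exclusion
    return mask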

Additional information
N/A
