RGB-D SLAM with Masks #30

@fmuehlis

System setup
Hardware:

  • RTX 3070
  • Ryzen 5600X (32GB RAM)

GPU Info:

  • Driver 580.95.05
  • CUDA 13.0

OS Info:

  • Ubuntu 24.04
  • Docker 29.1.3
  • Container Toolkit 1.18.1

PyCuVSLAM is running in a Docker container based on the nvidia/cuda:12.6.1-devel-ubuntu22.04 image.

Camera setup
Camera data comes from a Toyota HSR robot (RGB-D camera), replayed from a ROS bag file.

Description
I am trying to use RGB-D SLAM with PyCuVSLAM together with masks to cut out dynamic objects in the scene. This causes almost all features to be discarded from the landmarks, although the observations themselves seem to be correctly masked out of the image.
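
For reference, this is the mask convention I am assuming throughout (a minimal sketch; numpy only, and the bounding box is hypothetical):

import numpy as np

# 255 marks pixels whose features should be discarded (dynamic objects),
# 0 marks pixels whose features should be kept.
H, W = 480, 640
mask = np.zeros((H, W), dtype=np.uint8)
x0, y0, x1, y1 = 200, 150, 420, 400  # e.g. the bounding box of a detected chair
mask[y0:y1, x0:x1] = 255  # discard features on the chair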

Image without mask (last observations in red):
[image]
Features are detected on the chair.

Image with mask:
[image]
Features are properly discarded.

Pointcloud without any masking:
[image]
Example result:

last_observations: 217
last_landmarks: 130
final_landmarks: 512

Pointcloud with an all 0 mask:
[image]
Example result:

last_observations: 321
last_landmarks: 0
final_landmarks: 0

Pointcloud with an all 255 mask:
[image]
Result on every frame:

last_observations: 0
last_landmarks: 0
final_landmarks: 0

Pointcloud with a real mask:
[image]
Example result:

last_observations: 316
last_landmarks: 0
final_landmarks: 0
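
For context, the counters above were collected after each track() call roughly as below. get_final_landmarks is used in step 3 of the repro; I assume get_last_observations (per camera index) and get_last_landmarks behave analogously:

stats = {
    "last_observations": len(tracker.get_last_observations(0)),  # camera index 0
    "last_landmarks": len(tracker.get_last_landmarks()),
    "final_landmarks": len(tracker.get_final_landmarks()),
}
print(stats)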

To Reproduce

  1. Use PyCuVSLAM in RGB-D mode with a SLAM config:
import cuvslam as vslam  # PyCuVSLAM bindings (import name assumed)

rgbd_settings = vslam.Tracker.OdometryRGBDSettings(
    depth_scale_factor=1.0 / 0.001,  # 1000 depth units per meter (depth in mm)
    depth_camera_id=0,
    enable_depth_stereo_tracking=False,
)
odom_config = vslam.Tracker.OdometryConfig(
    async_sba=True,
    enable_final_landmarks_export=True,
    odometry_mode=vslam.Tracker.OdometryMode.RGBD,
    rgbd_settings=rgbd_settings,
    use_denoising=True,
    use_motion_model=True,
    use_gpu=True,
)
slam_config = vslam.Tracker.SlamConfig(
    use_gpu=True,
    sync_mode=False,
)
loc_settings = vslam.Tracker.SlamLocalizationSettings(  # not used further in this repro
    horizontal_search_radius=8.0,
    vertical_search_radius=2.0,
    horizontal_step=0.5,
    vertical_step=0.2,
    angular_step_rads=0.03,
)

rig = vslam.Rig()

cam = vslam.Camera()
cam.distortion = vslam.Distortion(vslam.Distortion.Model.Pinhole)
cam.focal = fx, fy          # intrinsics from the HSR camera calibration
cam.principal = cx, cy
cam.size = width, height
cam.rig_from_camera.rotation = [-0.500, 0.500, -0.500, 0.500]  # quaternion, optical frame -> rig frame

rig.cameras = [cam]

tracker = vslam.Tracker(rig, odom_config, slam_config)
  2. Provide any mask to the track method:
import numpy as np

H, W = image.shape[:2]

# image and depth must share the same spatial resolution
assert image.shape[:2] == depth.shape[:2]

mask = np.zeros((H, W), dtype=np.uint8)  # all-zero mask; or all-255 via np.full((H, W), 255, np.uint8)

odom_pose_estimate, slam_pose_raw = tracker.track(
    timestamp_ns,
    images=[image],
    masks=[mask],
    depths=[depth],
)
  3. Observe that all points are discarded from the final landmarks:
landmarks = tracker.get_final_landmarks().values()
print(f"number of landmarks: {len(landmarks)}")
# number of landmarks: 0

Expected behavior
I would expect that only features inside the masked region are discarded: with an all-zero mask, all features should be kept; with an all-255 mask, all features should be discarded.
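
A sanity check that would pin down the semantics (a sketch; it assumes each observation exposes pixel coordinates u/v): mask out only the left half of the image and check that observations survive on the right half.

import numpy as np

H, W = image.shape[:2]
half_mask = np.zeros((H, W), dtype=np.uint8)
half_mask[:, : W // 2] = 255  # discard features in the left half only

tracker.track(timestamp_ns, images=[image], masks=[half_mask], depths=[depth])

# Expectation: every surviving observation lies in the unmasked right half.
obs = tracker.get_last_observations(0)  # assumed accessor, see above
assert all(o.u >= W // 2 for o in obs)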

What you have tried
I have tried:

  • providing a real-world mask created with segmentation models such as YOLOE and SAM (see the sketch after this list)
  • providing an all-zero mask (np.zeros)
  • providing an all-one mask (np.ones)
  • providing an all-255 mask (np.ones * 255)
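
The real-world mask was assembled along these lines (a sketch; seg_masks is a hypothetical list of per-object boolean masks as returned by YOLOE or SAM):

import numpy as np

def build_tracking_mask(seg_masks, H, W):
    # seg_masks: list of boolean (H, W) arrays, one per dynamic object
    mask = np.zeros((H, W), dtype=np.uint8)
    for m in seg_masks:
        mask[m] = 255  # mark dynamic pixels for exclusion
    return mask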

Additional information
N/A
