cudaMalloc on different gpus and ram sizes #38

@kaotika

Description

Hi,

I am currently testing out MILo and its parameters. I'm using the RTX 50 Dockerfile from #20 and the RTX 50 repo. I started the following training on the Truck demo data on several different GPUs and always get a cudaMalloc exception.

Training command

python /workspace/MILo/milo/train.py -s /data/input/Truck -m /data/output/Truck --imp_metric outdoor --rasterizer radegs --eval --mesh_config default --decoupled_appearance --log_interval 200 --save_iterations 2000 4000 6000 8000 10000 12000 14000 16000 18000 --checkpoint_iterations 2000 4000 6000 8000 10000 12000 14000 16000 18000 --data_device cpu --config_path /workspace/MILo/milo/configs/fast

Tested GPUs

  • NVIDIA RTX 5060 Ti, 16 GB VRAM
  • NVIDIA RTX 5070 Ti, 16 GB VRAM
  • NVIDIA RTX 5090, 32 GB VRAM
  • NVIDIA H100, 80 GB VRAM

Exception

The crash does not happen at the same iteration on every GPU, so I don't think it's a problem with the demo data.

Training progress:  71%|██████████████████████████████████████████████████████████████████████████████████████████▉                                     | 12790/18000 [31:50<15:18,  5.67it/s, Loss=0.0630130, DNLoss=0.0062229, MDLoss=0.0019296, MNLoss=0.0061558, OccLoss=0.0000477, OccLabLoss=0.0011094, N_Gauss=319469]
[INFO] Resetting occupancy labels at iteration 12800. [03/02 11:15:44]
Computing occupancy from mesh:   0%|          | 0/219 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/MILo/milo/train.py", line 652, in <module>
    training(
  File "/workspace/MILo/milo/train.py", line 288, in training
    mesh_regularization_pkg = compute_mesh_regularization(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/MILo/milo/regularization/regularizer/mesh.py", line 535, in compute_mesh_regularization
    voronoi_occupancy_labels, _ = evaluate_mesh_occupancy(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/MILo/milo/regularization/sdf/depth_fusion.py", line 541, in evaluate_mesh_occupancy
    render_pkg = mesh_renderer(
                 ^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/MILo/milo/scene/mesh.py", line 396, in forward
    fragments, rast_out, pos = self.rasterizer(mesh, cameras, cam_idx, return_rast_out=True, return_positions=True)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/MILo/milo/scene/mesh.py", line 341, in forward
    nvdiff_rast_out = nvdiff_rasterization(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/MILo/milo/scene/mesh.py", line 127, in nvdiff_rasterization
    rast_chunk, _ = dr.rasterize(
                    ^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/nvdiffrast/torch/ops.py", line 135, in rasterize
    return _rasterize_func.apply(glctx, pos, tri, resolution, ranges, grad_db, -1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/autograd/function.py", line 575, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/nvdiffrast/torch/ops.py", line 78, in forward
    out, out_db = _nvdiffrast_c.rasterize_fwd_cuda(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Cuda error: 2[cudaMalloc(&m_gpuPtr, bytes);]
Training progress:  71%|██████████████████████████████████████████████████████████████████████████████████████████▉                                     | 12790/18000 [31:52<12:59,  6.69it/s, Loss=0.0630130, DNLoss=0.0062229, MDLoss=0.0019296, MNLoss=0.0061558, OccLoss=0.0000477, OccLabLoss=0.0011094, N_Gauss=319469]
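Cuda error 2 is cudaErrorMemoryAllocation, so it looks like the GPU is simply out of free memory at the moment nvdiffrast calls cudaMalloc for its rasterization buffer. To confirm that, a small logging helper like the one below could be dropped in right before the failing evaluate_mesh_occupancy call (rough sketch only; the helper and its placement are my guess, not something from the MILo code):

import torch

def log_cuda_memory(tag: str, device: int = 0) -> None:
    # Free/total memory as seen by the CUDA driver (what nvdiffrast's cudaMalloc competes for).
    free_b, total_b = torch.cuda.mem_get_info(device)
    # Memory held by PyTorch's caching allocator: actively allocated vs. reserved (cached).
    alloc_b = torch.cuda.memory_allocated(device)
    resv_b = torch.cuda.memory_reserved(device)
    gib = 1024 ** 3
    print(f"[{tag}] driver free {free_b / gib:.2f}/{total_b / gib:.2f} GiB | "
          f"torch allocated {alloc_b / gib:.2f} GiB, reserved {resv_b / gib:.2f} GiB")

# e.g. call log_cuda_memory("before occupancy") right before evaluate_mesh_occupancy(...)

If "driver free" is close to zero there while "reserved" is large, PyTorch's cache is holding most of the card and nvdiffrast's own allocation has nothing left to grab.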

Any ideas what's going wrong? Am I passing some misconfigured parameters?
If I use mesh_config=verylowres and set sampling_factor=0.1 or 0.2, it works.
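If it really is memory pressure, one thing that might help without dropping the mesh resolution is releasing PyTorch's cached blocks back to the driver right before the occupancy step, so nvdiffrast's own cudaMalloc has more headroom. A minimal sketch (the placement around the mesh regularization call in train.py is hypothetical, I haven't tried patching it yet):

import gc
import torch

# Hypothetical placement: just before the step that crashes, i.e. the
# compute_mesh_regularization(...) call that triggers evaluate_mesh_occupancy.
gc.collect()
torch.cuda.empty_cache()  # hand cached blocks back to the driver for nvdiffrast to allocate
# ... then run the mesh regularization / occupancy step as usual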
