Running the eval.py script runs out of GPU memory #5

@bertum-bo

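Has anyone run into an out-of-GPU-memory problem when running eval.py?
My graphics card is a 4090. The memory usage I monitored while running eval.py is shown in the screenshots below.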

[Two screenshots: memory usage monitored while eval.py runs]

The error output is:

Error executing job with overrides: []
Traceback (most recent call last):
File "eval.py", line 276, in main
info.update(evaluate())
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/extscache/omni.pip.torch-1_13_0-0.1.4+104.1.lx64/torch-1-13-0/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "eval.py", line 235, in evaluate
return_contiguous=False
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 6677, in clone
*[td.clone() for td in self.tensordicts],
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 6677, in <listcomp>
*[td.clone() for td in self.tensordicts],
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 4448, in clone
source={key: _clone_value(value, recurse) for key, value in self.items()},
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 4448, in <dictcomp>
source={key: _clone_value(value, recurse) for key, value in self.items()},
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 8580, in _clone_value
return value.clone()
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 4448, in clone
source={key: _clone_value(value, recurse) for key, value in self.items()},
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 4448, in <dictcomp>
source={key: _clone_value(value, recurse) for key, value in self.items()},
File "/home/itlab/ybl/SimpleFlight/third_party/tensordict/tensordict/tensordict.py", line 8580, in _clone_value
return value.clone()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.52 GiB total capacity; 16.07 GiB already allocated; 19.62 MiB free; 16.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Exception ignored in: <function _make_registry.<locals>._Registry.__del__ at 0x7f552b0e8710>
Traceback (most recent call last):
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 103, in __del__
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 98, in destroy
TypeError: 'NoneType' object is not callable
Exception ignored in: <function _make_registry.<locals>._Registry.__del__ at 0x7f552b0e8710>
Traceback (most recent call last):
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 103, in __del__
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 98, in destroy
TypeError: 'NoneType' object is not callable
Exception ignored in: <function SettingChangeSubscription.__del__ at 0x7f587690f710>
Traceback (most recent call last):
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/kit/kernel/py/omni/kit/app/_impl/__init__.py", line 114, in __del__
AttributeError: 'NoneType' object has no attribute 'get_settings'
Exception ignored in: <function RegisteredActions.__del__ at 0x7f48510eac20>
Traceback (most recent call last):
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/extscache/omni.kit.viewport.menubar.lighting-104.0.8/omni/kit/viewport/menubar/lighting/actions.py", line 345, in __del__
File "/home/itlab/isaacsim/isaac_sim-2022.2.0/extscache/omni.kit.viewport.menubar.lighting-104.0.8/omni/kit/viewport/menubar/lighting/actions.py", line 350, in destroy
TypeError: 'NoneType' object is not callable
2025-11-20 08:04:40 [583,484ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,484ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,484ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,484ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,484ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,485ms] [Warning] [omni.usd] Warning: in operator() at line 95 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyFunction.h -- Tried to call a method on an expired python instance

2025-11-20 08:04:40 [583,486ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,486ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,486ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,486ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,486ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,486ms] [Warning] [omni.usd] Warning: in operator() at line 95 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyFunction.h -- Tried to call a method on an expired python instance

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Warning] [omni.usd] Warning: in operator() at line 95 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyFunction.h -- Tried to call a method on an expired python instance

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Error] [omni.usd] TF_PYTHON_EXCEPTION: in TfPyConvertPythonExceptionToTfErrors at line 114 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyError.cpp -- Tf Python Exception

2025-11-20 08:04:40 [583,487ms] [Warning] [omni.usd] Warning: in operator() at line 95 of /buildAgent/work/ca6c508eae419cf8/USD/pxr/base/tf/pyFunction.h -- Tried to call a method on an expired python instance

2025-11-20 08:04:40 [583,783ms] [Warning] [carb.audio.context] 1 contexts were leaked
Segmentation fault (core dumped)
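
To make the numbers in the OOM line above easier to read, here is a minimal sketch that parses that message and splits the 23.52 GiB into what PyTorch holds and what it does not. The helper names (`parse_oom`, `summarize`) and the regular expressions are my own and only assume the PyTorch 1.13-style wording seen in this log. The reading of "total minus reserved minus free" as memory held outside PyTorch's allocator (for example by the simulator itself or the CUDA context) is an assumption, not something the log states.

```python
import re

# The numeric part of the OOM line, copied from the log above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 23.52 GiB total capacity; 16.07 GiB already allocated; "
    "19.62 MiB free; 16.08 GiB reserved in total by PyTorch)"
)

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Hypothetical patterns matching the wording of this particular message.
_PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
    "reserved": r"([\d.]+) (MiB|GiB) reserved",
}


def parse_oom(message):
    """Return the sizes quoted in the OOM message, converted to MiB."""
    sizes = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        value, unit = match.groups()
        sizes[name] = float(value) * _UNIT_TO_MIB[unit]
    return sizes


def summarize(sizes):
    """Derive two figures that help interpret the message."""
    return {
        # Assumption: whatever is neither reserved by PyTorch nor free
        # is held by something outside PyTorch's caching allocator.
        "outside_pytorch_mib": sizes["total"] - sizes["reserved"] - sizes["free"],
        # Reserved but not allocated: a large value here would hint at
        # fragmentation, which is what max_split_size_mb targets.
        "cached_unused_mib": sizes["reserved"] - sizes["allocated"],
    }


if __name__ == "__main__":
    for key, mib in summarize(parse_oom(OOM_MESSAGE)).items():
        print(f"{key}: {mib:.2f} MiB ({mib / 1024:.2f} GiB)")
```

For this log, reserved and allocated differ by only about 10 MiB, so the `max_split_size_mb` hint in the message probably does not apply. Roughly 7.4 GiB of the card is not accounted for by PyTorch at all, and the 16 GiB that PyTorch does hold is being grown by the `clone()` call at eval.py line 235.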
