Error when running distributed_train_acme_qrdqn.py #2
Description
Hi, I have been trying to run 'distributed_train_acme_qrdqn.py' with only a few agents and I get the error below. I think it might be a compatibility issue between jax, dm-acme, and dm-launchpad.
I did some digging and came across acme/agents/jax/actors, which is where the traceback ends.
This is where I get stuck, as I'm not sure how the QR-DQN agent is built with jax and passed to launchpad. I would really appreciate any thoughts on this issue.
Operating System
- Python 3.9.13
- Ubuntu 20.04
Error
/usr/local/lib/python3.9/dist-packages/haiku/_src/data_structures.py:37: FutureWarning: jax.tree_structure is deprecated, and will be removed in a future release. Use jax.tree_util.tree_structure instead.
PyTreeDef = type(jax.tree_structure(None))
I0908 13:09:34.228399 140062111078208 xla_bridge.py:345] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0908 13:09:34.228528 140062111078208 xla_bridge.py:345] Unable to initialize backend 'cuda': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I0908 13:09:34.228579 140062111078208 xla_bridge.py:345] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I0908 13:09:34.229399 140062111078208 xla_bridge.py:345] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
W0908 13:09:34.229537 140062111078208 xla_bridge.py:352] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0908 13:09:34.483748 140052206171904 courier_utils.py:120] Binding: run
I0908 13:09:34.487003 140052206171904 lp_utils.py:87] StepsLimiter: Starting with max_steps = 9600000 (actor_steps)
I0908 13:09:34.487962 140050360694528 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:34.488504 140052214564608 savers.py:164] Attempting to restore checkpoint: None
I0908 13:09:35.382974 140050352301824 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.442431 140046896195328 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.453237 140046232733440 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.453534 140052214564608 courier_utils.py:120] Binding: get_counts
I0908 13:09:35.463889 140046132836096 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.473653 140046098515712 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.482568 140052214564608 courier_utils.py:120] Binding: get_directory
I0908 13:09:35.483737 140045998618368 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.503851 140045981832960 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.504534 140045923084032 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.504815 140052214564608 courier_utils.py:120] Binding: get_steps_key
I0908 13:09:35.524922 140045914691328 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.525063 140052214564608 courier_utils.py:120] Binding: increment
I0908 13:09:35.525359 140045822371584 node.py:61] Reverb client connecting to: localhost:33011
I0908 13:09:35.526567 140052214564608 courier_utils.py:120] Binding: restore
I0908 13:09:35.533543 140052214564608 courier_utils.py:120] Binding: save
I0908 13:09:35.542086 140052214564608 savers.py:155] Saving checkpoint: /root/acme/20220908-130931/checkpoints/counter
I0908 13:09:36.944851 140052206171904 lp_utils.py:95] StepsLimiter: Reached 0 recorded steps
Node ThreadWorker(thread=<Thread(actor, stopped daemon 140045923084032)>, future=<Future at 0x7f61f80a19a0 state=finished raised AttributeError>) crashed:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/launchpad/launch/worker_manager.py", line 474, in _check_workers
worker.future.result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/usr/local/lib/python3.9/dist-packages/launchpad/launch/worker_manager.py", line 250, in run_inner
future.set_result(f())
File "/usr/local/lib/python3.9/dist-packages/launchpad/nodes/python/node.py", line 75, in _construct_function
return functools.partial(self._function, *args, **kwargs)()
File "/usr/local/lib/python3.9/dist-packages/launchpad/nodes/courier/node.py", line 113, in run
instance = self._construct_instance() # pytype:disable=wrong-arg-types
File "/usr/local/lib/python3.9/dist-packages/launchpad/nodes/python/node.py", line 180, in _construct_instance
self._instance = self._constructor(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/acme/jax/experiments/make_distributed_experiment.py", line 169, in build_actor
actor = experiment.builder.make_actor(actor_key, policy_network,
File "/usr/local/lib/python3.9/dist-packages/acme/agents/jax/dqn/builder.py", line 99, in make_actor
return actors.GenericActor(
File "/usr/local/lib/python3.9/dist-packages/acme/agents/jax/actors.py", line 67, in __init__
self._init = jax.jit(actor.init, backend=backend)
AttributeError: 'function' object has no attribute 'init'
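My reading of the last frame is that the constructor in actors.py looks up an `init` attribute on whatever it is given as the actor, while the object it actually receives is a plain Python function. Below is a minimal sketch of that mechanism only. It does not import acme or jax, and the names `ActorCoreLike`, `ActorLike`, and `plain_policy` are hypothetical stand-ins I made up for illustration, not the library's classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ActorCoreLike:
    """Hypothetical stand-in: an object that bundles the actor's functions."""
    init: Callable[[Any], Any]
    select_action: Callable[..., Any]
    get_extras: Callable[[Any], Any]


class ActorLike:
    """Hypothetical stand-in for the constructor in the last traceback frame."""

    def __init__(self, actor: Any) -> None:
        # Same attribute access as the failing line; the jax.jit call is omitted.
        self._init = actor.init


def plain_policy(params: Any, observation: Any) -> int:
    """A bare policy function: callable, but it has no `init` attribute."""
    return 0


def reproduce() -> str:
    """Return the error message raised when a bare function is passed in."""
    try:
        ActorLike(plain_policy)
    except AttributeError as error:
        return str(error)
    return "no error"


if __name__ == "__main__":
    print(reproduce())

    # Wrapping the functions in the bundling object avoids the error.
    core = ActorCoreLike(
        init=lambda key: None,
        select_action=plain_policy,
        get_extras=lambda state: (),
    )
    ActorLike(core)
```

If this reading is right, the training script and the installed dm-acme disagree on what the policy object should be (a bare function versus an object exposing `init`), which could happen if the script was written against a different dm-acme version than the one installed. I have not confirmed this.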
Python Packages
absl-py 0.15.0
ale-py 0.7.3
astunparse 1.6.3
async-generator 1.10
atari-py 0.2.9
attrs 22.1.0
bsuite 0.3.5
cached-property 1.5.2
cachetools 4.2.4
certifi 2021.10.8
chardet 3.0.4
charset-normalizer 2.0.7
chex 0.1.4
clang 5.0
cloudpickle 2.0.0
colorama 0.4.5
commonmark 0.9.1
cycler 0.11.0
dbus-python 1.2.16
decorator 5.1.0
dill 0.3.5.1
distrax 0.1.2
dm-acme 0.4.1
dm-control 0.0.364896371
dm-env 1.5
dm-haiku 0.0.7
dm-launchpad 0.5.2
dm-reverb 0.7.2
dm-sonnet 2.0.0
dm-tree 0.1.6
docker 6.0.0
dopamine-rl 4.0.0
etils 0.7.1
execnet 1.9.0
flatbuffers 1.12
flax 0.5.3
fonttools 4.37.1
frozendict 2.3.4
future 0.18.2
gast 0.4.0
gin 0.1.6
gin-config 0.5.0
glfw 2.5.4
google-api-core 2.8.2
google-api-python-client 2.58.0
google-auth 1.35.0
google-auth-httplib2 0.1.0
google-auth-oauthlib 0.4.6
google-cloud-aiplatform 1.16.1
google-cloud-bigquery 2.34.4
google-cloud-core 2.3.2
google-cloud-resource-manager 1.6.1
google-cloud-storage 2.5.0
google-crc32c 1.3.0
google-pasta 0.2.0
google-resumable-media 2.3.3
googleapis-common-protos 1.56.4
grpc-google-iam-v1 0.12.4
grpcio 1.47.0
grpcio-status 1.47.0
gym 0.21.0
h5py 3.1.0
httplib2 0.20.4
humanize 4.3.0
idna 3.3
imageio 2.21.2
immutabledict 2.2.1
importlab 0.7
importlib-metadata 4.8.1
importlib-resources 5.4.0
iniconfig 1.1.1
jax 0.3.16
jaxlib 0.3.14
jmp 0.0.2
joblib 1.1.0
keras 2.8.0
Keras-Preprocessing 1.1.2
kiwisolver 1.3.2
kubernetes 24.2.0
labmaze 1.0.5
libclang 12.0.0
libcst 0.4.7
lxml 4.9.1
Markdown 3.3.4
matplotlib 3.5.3
mizani 0.7.4
mock 4.0.3
msgpack 1.0.2
mypy-extensions 0.4.3
networkx 2.8.6
ninja 1.10.2.3
numpy 1.22.4
oauthlib 3.1.1
opencv-python 4.5.4.58
opensimplex 0.3
opt-einsum 3.3.0
optax 0.0.9
packaging 21.3
palettable 3.3.0
pandas 1.4.4
patsy 0.5.2
Pillow 8.4.0
pip 22.2.2
plotnine 0.9.0
pluggy 1.0.0
portpicker 1.5.2
promise 2.3
proto-plus 1.22.1
protobuf 3.19.1
psutil 5.9.1
py 1.11.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pygame 2.1.0
Pygments 2.13.0
PyGObject 3.36.0
PyOpenGL 3.1.6
pyparsing 3.0.4
pytest 7.1.2
pytest-forked 1.4.0
pytest-xdist 2.5.0
python-apt 2.0.0+ubuntu0.20.4.8
python-dateutil 2.8.2
pytype 2021.8.11
pytz 2021.3
PyWavelets 1.3.0
PyYAML 6.0
requests 2.26.0
requests-oauthlib 1.3.0
requests-unixsocket 0.2.0
rich 11.2.0
rlax 0.1.4
rlds 0.1.5
rsa 4.7.2
s2sphere 0.2.5
scikit-image 0.19.3
scikit-learn 1.0.1
scipy 1.7.1
setuptools 45.2.0
six 1.15.0
sklearn 0.0
SQLAlchemy 1.2.19
statsmodels 0.13.2
tabulate 0.8.10
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.8.0
tensorflow-datasets 4.5.2
tensorflow-estimator 2.8.0
tensorflow-io-gcs-filesystem 0.26.0
tensorflow-metadata 1.10.0
tensorflow-probability 0.15.0
tensorstore 0.1.23
termcolor 1.1.0
tf-estimator-nightly 2.8.0.dev2021122109
tf-slim 1.1.0
tfp-nightly 0.15.0.dev20211104
threadpoolctl 3.0.0
tifffile 2022.8.12
toml 0.10.2
tomli 2.0.1
toolz 0.11.1
tqdm 4.64.0
transitions 0.8.10
trfl 1.2.0
typed-ast 1.5.4
typing_extensions 4.3.0
typing-inspect 0.8.0
uritemplate 4.1.1
urllib3 1.26.7
websocket-client 1.4.0
Werkzeug 2.0.2
wheel 0.34.2
wrapt 1.12.1
xmanager 0.2.0
zipp 3.6.0
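Since the list above is long, here is a small sketch for printing only the packages that seem relevant to this error. The log reports that 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig', and the list shows jax 0.3.16 next to jaxlib 0.3.14, so a jax/jaxlib mismatch may also be involved, though that is an assumption on my part. The selection of package names in `PACKAGES` is my own guess at what matters here.

```python
from importlib import metadata

# Assumed to be the packages relevant to this issue; adjust as needed.
PACKAGES = ("jax", "jaxlib", "dm-acme", "dm-launchpad", "dm-reverb", "dm-haiku")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15s} {version or 'not installed'}")
```

Running this in the same environment as the training script makes it easier to compare the versions against whatever the script's requirements specify.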