Description
I can run `compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20` on a remote interactive cluster job, but when I submit via sbatch, I get `ValueError: All arrays must be of the same length`. This is independent of the cryo heterogeneity challenge wrapper. The refs and targets both have shape (80, 11239424) = (80, 224**3).
error trace
::::::::::::::
slurm/logs/4713549.err
::::::::::::::
2025-05-06 23:42:09.330840: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-06 23:42:09.795816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-06 23:42:09.796165: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-06 23:42:09.828360: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-06 23:42:09.929514: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-06 23:42:15.686854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Projecting volumes: 14%|█▍ | 11/80 [00:06<00:26, 2.38it/s]
Projecting volumes: 28%|██▊ | 22/80 [00:10<00:21, 2.65it/s]
Projecting volumes: 41%|████▏ | 33/80 [00:15<00:19, 2.50it/s]
Projecting volumes: 55%|█████▍ | 43/80 [00:18<00:13, 2.80it/s]
Projecting volumes: 68%|██████▋ | 53/80 [00:22<00:09, 2.90it/s]
Projecting volumes: 80%|███████▉ | 63/80 [00:26<00:06, 2.65it/s]
Projecting volumes: 91%|█████████▏| 73/80 [00:29<00:02, 2.54it/s]
Projecting volumes: 100%|██████████| 80/80 [00:32<00:00, 2.44it/s]
Traceback (most recent call last):
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/compute_distance_matrix_zernike3deep.py", line 8, in <module>
sys.exit(main())
File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 167, in main
compute_distance_matrix(**inputs)
File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 70, in compute_distance_matrix
md = XmippMetaData(os.path.join(outPath, "projections.mrcs"), angles=angles_all,
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/xmipp_metadata/metadata/xmipp_metadata.py", line 86, in __init__
self.table = pd.DataFrame.from_dict(COLUMN_DICT)
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/frame.py", line 1813, in from_dict
return cls(data, index=index, columns=columns, dtype=dtype)
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/frame.py", line 733, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
index = _extract_index(arrays)
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
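pandas raises this error when the columns passed to `pd.DataFrame.from_dict` have different lengths. A minimal sketch of a pre-flight check that could be run just before the `XmippMetaData(...)` call is shown below. It assumes that each of the 80 volumes yields `--num_projections` (20) projections, so every column should have 80 * 20 = 1600 rows; the helper `find_length_mismatches` and the column names are hypothetical and not part of `xmipp_metadata`.

```python
def find_length_mismatches(columns, expected_rows):
    """Return {column_name: actual_length} for columns whose length is not expected_rows."""
    return {
        name: len(values)
        for name, values in columns.items()
        if len(values) != expected_rows
    }


# Values taken from the report: 80 maps and --num_projections 20.
num_volumes = 80
num_projections = 20
expected_rows = num_volumes * num_projections  # assumption: one row per projection

# Hypothetical stand-ins for the image list and the angles passed to XmippMetaData.
columns = {
    "image": [f"{i + 1:06d}@projections.mrcs" for i in range(expected_rows)],
    "angleRot": [0.0] * (expected_rows - num_projections),  # deliberately one volume short
}

mismatches = find_length_mismatches(columns, expected_rows)
print(mismatches)  # {'angleRot': 1580}
```

Logging the result of such a check in both the interactive and the sbatch run would show which column, if any, differs between the two environments.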
submission script
#!/bin/bash
#SBATCH --job-name=map_to_map_zernike
#SBATCH --output=slurm/logs/%j.out
#SBATCH --error=slurm/logs/%j.err
#SBATCH --partition=gpu
#SBATCH -c 1
#SBATCH --time=99:00:00
#SBATCH --gpus=1
#SBATCH --constraint="a100|h100"
# for N in {1..24}; do sbatch submission_zernike.sh $N ; sleep 2; done
N=${1:-1} # default to 1 if no argument is given
cd /mnt/home/smbp/ceph/smbpchallenge/preprocessing_submissions/set2/
for ICE_CREAM in $(ls submission*pt | sed -e 's/submission_//' -e 's/.pt//' | head -n "$N" | tail -n 1);
do
cd /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/
PATH_TO_SUBMISSION_FILE=/mnt/home/smbp/ceph/smbpchallenge/preprocessing_submissions/set2/submission_${ICE_CREAM}.pt
PATH_TO_OUTPUT=/mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/map_to_map_zernike_${ICE_CREAM}.pkl
TMP_DIR_ZERNIKE=/mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir_zernike_${ICE_CREAM}
sed -e "s|PATH_TO_SUBMISSION_FILE|${PATH_TO_SUBMISSION_FILE}|" -e "s|PATH_TO_OUTPUT|${PATH_TO_OUTPUT}|" -e "s|TMP_DIR_ZERNIKE|${TMP_DIR_ZERNIKE}|" config_files/config_map_to_map_template_zernike.yaml > config_files/config_map_to_map_zernike_${ICE_CREAM}.yaml
run_map_to_map_pipeline --config config_files/config_map_to_map_zernike_${ICE_CREAM}.yaml
# wraps the command: compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20
done
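Since the failure appears only under sbatch, it may help to compare the two runtime environments. The following is a rough sketch, not part of the toolkit: it records a few environment variables that commonly differ between interactive and batch jobs so the two snapshots can be diffed. The function names and the choice of variables are assumptions made for illustration.

```python
import json
import os
import platform
import sys

# Variables that often differ between an interactive shell and an sbatch job.
KEYS = (
    "CUDA_VISIBLE_DEVICES",
    "CONDA_DEFAULT_ENV",
    "CONDA_PREFIX",
    "SLURM_JOB_ID",
    "SLURM_CPUS_PER_TASK",
    "PATH",
    "LD_LIBRARY_PATH",
)


def environment_snapshot(environ=None):
    """Collect interpreter details and selected environment variables."""
    environ = os.environ if environ is None else environ
    return {
        "python": sys.executable,
        "python_version": platform.python_version(),
        "cwd": os.getcwd(),
        "env": {key: environ.get(key) for key in KEYS},
    }


def diff_env(first, second):
    """Return {key: (first_value, second_value)} for variables that differ."""
    return {
        key: (first["env"][key], second["env"][key])
        for key in KEYS
        if first["env"][key] != second["env"][key]
    }


if __name__ == "__main__":
    # Run once interactively and once inside the sbatch script, then compare the output.
    print(json.dumps(environment_snapshot(), indent=2))
```

Calling `diff_env` on the two saved snapshots lists only the variables whose values differ between the runs.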
When I run the command directly, this is the error trace
(flexutils-tensorflow) (base) [gwoollard@workergpu001 tmpdir2_zernike_cherry_1]$ compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20
2025-05-07 00:02:10.658147: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-07 00:02:10.696375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-07 00:02:10.696403: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-07 00:02:10.697491: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-07 00:02:10.703892: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-07 00:02:11.498334: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-07 00:02:15.825978: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-07 00:02:15.863871: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-07 00:02:15.863896: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-07 00:02:15.864970: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-07 00:02:15.871257: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-07 00:02:16.633121: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Open3D has not been installed. The program will continue without this package
/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:
TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
For more information see: https://github.com/tensorflow/addons/issues/2807
warnings.warn(
Open3D has not been installed. The program will continue without this package
2025-05-07 00:02:17.855187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38548 MB memory: -> device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:31:00.0, compute capability: 8.0
Traceback (most recent call last):
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/train_zernike3deep.py", line 8, in <module>
sys.exit(main())
File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/train_zernike3deep.py", line 226, in main
train(**inputs)
File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/train_zernike3deep.py", line 133, in train
autoencoder.build(input_shape=[(None, autoencoder.xsize, autoencoder.xsize, 1),
AttributeError: 'AutoEncoder' object has no attribute 'xsize'
Traceback (most recent call last):
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/compute_distance_matrix_zernike3deep.py", line 8, in <module>
sys.exit(main())
File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 167, in main
compute_distance_matrix(**inputs)
File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 102, in compute_distance_matrix
subprocess.check_call(f'eval "$({conda_base}/bin/conda shell.bash hook)" && '
File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'eval "$(/mnt/home/gwoollard/software/mambaforge/bin/conda shell.bash hook)" && conda activate flexutils-tensorflow && train_zernike3deep.py --md_file /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1/proj_metadata.xmd --out_path /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1 --L1 7 --L2 7 --batch_size 1024 --lr 0.001 --epochs 40 --architecture mlpnn --cost corr --regNorm 0.001 --apply_ctf 0 --shuffle --step 1 --split_train 1 --ctf_type apply --sr 1.0 --pose_reg 0.0 --ctf_reg 0.0 --gpu 0' returned non-zero exit status 1.
(flexutils-tensorflow) (base) [gwoollard@workergpu001 tmpdir2_zernike_cherry_1]$ more /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1/proj_metadata.xmd
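In the trace above, `subprocess.check_call` only reports that the child command exited with a non-zero status; the underlying failure is the `AttributeError` raised inside `train_zernike3deep.py`. The sketch below illustrates that relationship with a stand-in child process (not the real training script), using `subprocess.run` to capture the child's stderr. The helper name `run_and_capture` is hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit status, captured stderr) instead of raising."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


# Stand-in child that fails the same way as the training script in the trace.
child = [
    sys.executable,
    "-c",
    "raise AttributeError(\"'AutoEncoder' object has no attribute 'xsize'\")",
]

returncode, stderr = run_and_capture(child)
last_line = stderr.strip().splitlines()[-1]

print(returncode)  # 1
print(last_line)   # AttributeError: 'AutoEncoder' object has no attribute 'xsize'
```

The parent process sees only the exit status, so the last line of the child's stderr is where the root cause appears.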