compute_distance_matrix_zernike3deep - ValueError: All arrays must be of the same length #30

@geoffwoollard

I can run `compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20` on a remote interactive cluster job, but when I submit via sbatch, I get `ValueError: All arrays must be of the same length`. This is independent of the cryo heterogeneity challenge wrapper. The shapes of the refs and targets are both (80, 11239424) = (80, 224**3).
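
To confirm the reported shapes in both the interactive and the sbatch environment, something like the following could be run against the two `.npy` files. This is a minimal sketch, not part of the toolkit: the helper name `describe_maps` is hypothetical, and it assumes each file holds a 2D array of flattened cubic volumes, as described above.

```python
import numpy as np


def describe_maps(path):
    """Return (n_maps, box_size) for a stack of flattened cubic volumes."""
    # mmap_mode="r" memory-maps the file instead of loading the whole array
    maps = np.load(path, mmap_mode="r")
    n_maps, n_voxels = maps.shape
    # Recover the box edge length and verify that the voxel count is a perfect cube
    box = round(n_voxels ** (1 / 3))
    if box ** 3 != n_voxels:
        raise ValueError(f"{n_voxels} voxels per map is not a perfect cube")
    return n_maps, box
```

For files with the shape reported here, (80, 11239424), this would return `(80, 224)` for both the references and the targets.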

error trace

::::::::::::::
slurm/logs/4713549.err
::::::::::::::
2025-05-06 23:42:09.330840: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-06 23:42:09.795816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-06 23:42:09.796165: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-06 23:42:09.828360: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-06 23:42:09.929514: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-06 23:42:15.686854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Projecting volumes:  14%|█▍        | 11/80 [00:06<00:26,  2.38it/s]
Projecting volumes:  28%|██▊       | 22/80 [00:10<00:21,  2.65it/s]
Projecting volumes:  41%|████▏     | 33/80 [00:15<00:19,  2.50it/s]
Projecting volumes:  55%|█████▍    | 43/80 [00:18<00:13,  2.80it/s]
Projecting volumes:  68%|██████▋   | 53/80 [00:22<00:09,  2.90it/s]
Projecting volumes:  80%|███████▉  | 63/80 [00:26<00:06,  2.65it/s]
Projecting volumes:  91%|█████████▏| 73/80 [00:29<00:02,  2.54it/s]
Projecting volumes: 100%|██████████| 80/80 [00:32<00:00,  2.44it/s]
Traceback (most recent call last):
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/compute_distance_matrix_zernike3deep.py", line 8, in <module>
    sys.exit(main())
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 167, in main
    compute_distance_matrix(**inputs)
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 70, in compute_distance_matrix
    md = XmippMetaData(os.path.join(outPath, "projections.mrcs"), angles=angles_all,
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/xmipp_metadata/metadata/xmipp_metadata.py", line 86, in __init__
    self.table = pd.DataFrame.from_dict(COLUMN_DICT)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/frame.py", line 1813, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/frame.py", line 733, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
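
For context, pandas raises this error whenever the values passed to `pd.DataFrame.from_dict` differ in length. The sketch below reproduces it with made-up columns and shows a small helper for printing the per-column lengths. The contents of `COLUMN_DICT` inside `xmipp_metadata` are not shown in the trace, so the column names and the helper `column_lengths` here are hypothetical, and the helper assumes every value is a sized sequence.

```python
import pandas as pd


def column_lengths(column_dict):
    """Map each column name to the length of its value (assumes sized sequences)."""
    return {name: len(values) for name, values in column_dict.items()}


# Hypothetical columns: four image names but only three angle entries
columns = {"image": ["a", "b", "c", "d"], "angleRot": [0.0, 10.0, 20.0]}

try:
    pd.DataFrame.from_dict(columns)
except ValueError as err:
    # Report the mismatch alongside the pandas message
    message = str(err)
    lengths = column_lengths(columns)
    print(message, lengths)
```

Printing the lengths of the arguments passed to `XmippMetaData` (for example the number of projections written versus the length of `angles_all`) just before line 70 of `compute_distance_matrix_zernike3deep.py` would show which input is short in the sbatch run.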

submission script

#!/bin/bash
#SBATCH --job-name=map_to_map_zernike
#SBATCH --output=slurm/logs/%j.out
#SBATCH --error=slurm/logs/%j.err
#SBATCH --partition=gpu
#SBATCH -c 1
#SBATCH --time=99:00:00
#SBATCH --gpus=1
#SBATCH --constraint="a100|h100"

# for N in {1..24}; do sbatch submission_zernike.sh $N ; sleep 2; done

N=${1:-1}  # default to 1 if no argument is given

cd /mnt/home/smbp/ceph/smbpchallenge/preprocessing_submissions/set2/

for ICE_CREAM in $(ls submission*pt | sed -e 's/submission_//' -e 's/.pt//' | head -n "$N" | tail -n 1);
do   
    cd /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/
    PATH_TO_SUBMISSION_FILE=/mnt/home/smbp/ceph/smbpchallenge/preprocessing_submissions/set2/submission_${ICE_CREAM}.pt
    PATH_TO_OUTPUT=/mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/map_to_map_zernike_${ICE_CREAM}.pkl
    TMP_DIR_ZERNIKE=/mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir_zernike_${ICE_CREAM}
    sed -e "s|PATH_TO_SUBMISSION_FILE|${PATH_TO_SUBMISSION_FILE}|" -e "s|PATH_TO_OUTPUT|${PATH_TO_OUTPUT}|" -e "s|TMP_DIR_ZERNIKE|${TMP_DIR_ZERNIKE}|" config_files/config_map_to_map_template_zernike.yaml > config_files/config_map_to_map_zernike_${ICE_CREAM}.yaml
    run_map_to_map_pipeline --config config_files/config_map_to_map_zernike_${ICE_CREAM}.yaml
   #  wraps the command: compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20
done 
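
Since the same command behaves differently interactively and under sbatch, one way to narrow things down is to dump the environment in both settings and diff the results. The following is a rough sketch under that assumption. The list of variables in `KEYS` is a guess at what might matter, and the function names are hypothetical.

```python
import json
import os
import sys

# Variables that commonly differ between interactive and batch jobs (an assumption)
KEYS = ("CUDA_VISIBLE_DEVICES", "CONDA_PREFIX", "CONDA_DEFAULT_ENV",
        "PATH", "LD_LIBRARY_PATH", "TMPDIR")


def environment_snapshot(environ=os.environ):
    """Collect selected variables, all SLURM_* variables, interpreter and cwd."""
    snap = {key: environ.get(key) for key in KEYS}
    snap.update({k: v for k, v in environ.items() if k.startswith("SLURM_")})
    snap["python"] = sys.executable
    snap["cwd"] = os.getcwd()
    return snap


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Write the snapshot as JSON so two runs can be compared afterwards
    json.dump(environment_snapshot(), sys.stdout, indent=2, sort_keys=True)
```

Running this once in the interactive session and once as the first step of the sbatch script, then passing the two loaded JSON files to `diff_snapshots`, would list every variable that differs between the two runs.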

Running the command directly, this is the error trace (it fails in `train_zernike3deep.py` with an `AttributeError`, not with the `ValueError` above)

(flexutils-tensorflow) (base) [gwoollard@workergpu001 tmpdir2_zernike_cherry_1]$     compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20
2025-05-07 00:02:10.658147: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-07 00:02:10.696375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-07 00:02:10.696403: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-07 00:02:10.697491: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-07 00:02:10.703892: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-07 00:02:11.498334: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-07 00:02:15.825978: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-07 00:02:15.863871: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-07 00:02:15.863896: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-07 00:02:15.864970: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-07 00:02:15.871257: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-07 00:02:16.633121: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Open3D has not been installed. The program will continue without this package
/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Open3D has not been installed. The program will continue without this package
2025-05-07 00:02:17.855187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38548 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:31:00.0, compute capability: 8.0
Traceback (most recent call last):
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/train_zernike3deep.py", line 8, in <module>
    sys.exit(main())
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/train_zernike3deep.py", line 226, in main
    train(**inputs)
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/train_zernike3deep.py", line 133, in train
    autoencoder.build(input_shape=[(None, autoencoder.xsize, autoencoder.xsize, 1),
AttributeError: 'AutoEncoder' object has no attribute 'xsize'
Traceback (most recent call last):
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/compute_distance_matrix_zernike3deep.py", line 8, in <module>
    sys.exit(main())
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 167, in main
    compute_distance_matrix(**inputs)
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 102, in compute_distance_matrix
    subprocess.check_call(f'eval "$({conda_base}/bin/conda shell.bash hook)" && '
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'eval "$(/mnt/home/gwoollard/software/mambaforge/bin/conda shell.bash hook)" && conda activate flexutils-tensorflow && train_zernike3deep.py --md_file /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1/proj_metadata.xmd --out_path /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1 --L1 7 --L2 7 --batch_size 1024 --lr 0.001 --epochs 40 --architecture mlpnn --cost corr --regNorm 0.001 --apply_ctf 0 --shuffle --step 1 --split_train 1 --ctf_type apply --sr 1.0 --pose_reg 0.0 --ctf_reg 0.0 --gpu 0' returned non-zero exit status 1.
(flexutils-tensorflow) (base) [gwoollard@workergpu001 tmpdir2_zernike_cherry_1]$ more /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1/proj_metadata.xmd 
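
In the trace above, the `CalledProcessError` only reports the exit status, and the real cause (the `AttributeError` in the child process) has to be read from the interleaved output. A minimal sketch of a wrapper that puts the child's last stderr line into the raised error is shown below. The name `run_and_report` is hypothetical, and it takes an argument list rather than the shell string with `conda activate` that the toolkit uses.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and include the child's last stderr line on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        stderr_lines = result.stderr.strip().splitlines()
        # The last line of a Python traceback names the exception that was raised
        last_line = stderr_lines[-1] if stderr_lines else ""
        raise RuntimeError(
            f"child exited with status {result.returncode}: {last_line}"
        )
    return result.stdout
```

For example, `run_and_report([sys.executable, "-c", "raise AttributeError('no xsize')"])` raises a `RuntimeError` whose message ends with `AttributeError: no xsize`.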
