The demo runs successfully, but doesn't work properly on my own data

Hi, I ran the demo with the command `python demo_run.py --parameter kcat --input_file demo/batch_kcat.csv --use_gpu --checkpoint_dir ../data/pretrained/production/kcat` as described in your instructions, and everything ran normally.

However, when I replaced the sequences and substrates in batch_kcat.csv with my own data, the program returned an error. Interestingly, it runs without any issues when I use the ki parameter, but the same error occurs with both kcat and km.

I’ve attached the input file, command, and the full error message below. Could you please help me troubleshoot this issue or provide some guidance on what might be causing it?

Thank you!

```
cat test/test.csv 
Substrate,SMILES,sequence,pdbpath
t1,CC1([C@@H](N2[C@H](S1)[C@@H](C2=O)NC(=O)Cc3ccccc3)C(=O)O)C,MMSTATASPAVKLNSGYEIPLVGFGCWKLTNDVASDQIYRAIKSGYRLFDGAEDYANEQEVGEGIKRAIKEGIVKREELFITSKLWNSFHDKKNVEVALMKTLSDLNLDYVDLFYIHFPIAQKPVPIEKKYPPGFYCGDGDKWSIEEVPLLDTWRALEKLVDQGLAKSIGISNFSAQLIYDLIRGCTIKPVALQIEHHPYLTQPKLVEYVQLHDIQITGYSSFGPQSFLEMDLKRALDTPVLLEEPTVKSIADKHGKSPAQVLLRYQTQRGIAVIPRSNSPDRMAQNLSVIDFELTQDDLQAIAELDCNLRFNEPWDFSNIPVFVHPRRHFCAGLLAVGLFAAVSAPAAGRSELPYIDSVVNEAARAVIRQHDIAGMVIAVTHQGRQRFYTYGVESLQTRRAVNRDTIFEVGSISKTFTVTLAAYAQAKGLLQLTDSPARFLPELAGTEFAKLSLLNLATHTTGGFPLQVPDEVRDNAQLMQYLKAWKPEHAPGTYRSYANPSIGMLGVVAAVSLKQPFAQAMEKDLFPKLGLSSTFIDVPAAKASRYAQGYNKQGAPVRVNPGVLAAEAYGVKTSARDLLRFVEASMDMDVLDKDIRRAIADTHVGYYQVGAMTQDMVWEQFPYPVPLDSLLTANAGTLNSQSHPAQALQPPLAPQAQTWINKTGSTNGFGAYVAFVPARKLGIVILANRNYPNDARVRLAAEILGAVEKQPMAPAGAR,seq1.pdb
```

```
python demo_run.py --parameter kcat --input_file test/test.csv --use_gpu --checkpoint_dir ../data/pretrained/production/kcat
/home/data/usr/dx/software/catpred_pipeline/CatPred
Predicting.. This will take a while..
calculating protein embed only on cpu
/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/rotary_embedding_torch/rotary_embedding_torch.py:35: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/rotary_embedding_torch/rotary_embedding_torch.py:268: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
Loading training args
/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/utils.py:501: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  vars(torch.load(path, map_location=lambda storage, loc: storage)["args"]),
Loading models
Setting molecule featurization parameters to default.
Loading data
0it [00:00, ?it/s]/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/data/cache_utils.py:95: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(str(entry_path))
1it [00:00, 14.58it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 20262.34it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1822.03it/s]
Validating SMILES
Test size = 1
  0%|                                                                                                                              | 0/10 [00:00<?, ?it/s]/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/utils.py:113: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state = torch.load(path, map_location=lambda storage, loc: storage)
Creating protein model
MoleculeModel(
  (softplus): Softplus(beta=1.0, threshold=20.0)
  (encoder): MPN(
    (encoder): ModuleList(
      (0): MPNEncoder(
        (dropout): Dropout(p=0.0, inplace=False)
        (act_func): ReLU()
        (W_i): Linear(in_features=147, out_features=300, bias=False)
        (W_h): Linear(in_features=300, out_features=300, bias=False)
        (W_o): Linear(in_features=433, out_features=300, bias=True)
      )
    )
  )
  (seq_embedder): Embedding(21, 36, padding_idx=20)
  (rotary_embedder): RotaryEmbedding()
  (multihead_attn): MultiheadAttention(
    (out_proj): NonDynamicallyQuantizableLinear(in_features=36, out_features=36, bias=True)
  )
  (attentive_pooler): AttentivePooling(
    (linear1): Linear(in_features=1316, out_features=1316, bias=True)
    (tanh): Tanh()
    (linear2): Linear(in_features=1316, out_features=1, bias=True)
    (softmax): Softmax(dim=1)
  )
  (readout): Sequential(
    (0): Dropout(p=0.0, inplace=False)
    (1): Linear(in_features=1616, out_features=300, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=300, out_features=2, bias=True)
  )
)
Loading pretrained parameter "encoder.encoder.0.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.0.W_i.weight".
Loading pretrained parameter "encoder.encoder.0.W_h.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.bias".
Loading pretrained parameter "seq_embedder.weight".
Loading pretrained parameter "rotary_embedder.freqs".
Loading pretrained parameter "multihead_attn.in_proj_weight".
Loading pretrained parameter "multihead_attn.in_proj_bias".
Loading pretrained parameter "multihead_attn.out_proj.weight".
Loading pretrained parameter "multihead_attn.out_proj.bias".
Loading pretrained parameter "attentive_pooler.linear1.weight".
Loading pretrained parameter "attentive_pooler.linear1.bias".
Loading pretrained parameter "attentive_pooler.linear2.weight".
Loading pretrained parameter "attentive_pooler.linear2.bias".
Loading pretrained parameter "readout.1.weight".
Loading pretrained parameter "readout.1.bias".
Loading pretrained parameter "readout.4.weight".
Loading pretrained parameter "readout.4.bias".
Moving model to cuda
/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/utils.py:446: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state = torch.load(path, map_location=lambda storage, loc: storage)
  0%|                                                                                                                              | 0/10 [00:01<?, ?it/s]
Traceback (most recent call last):                                                                                                                        
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/predict.py", line 35, in <module>
    results = main()
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/predict.py", line 30, in main
    results = catpred_predict()
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/train/make_predictions.py", line 515, in catpred_predict
    make_predictions(args=PredictArgs().parse_args())
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/utils.py", line 619, in wrap
    result = func(*args, **kwargs)
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/train/make_predictions.py", line 471, in make_predictions
    preds, unc = predict_and_save(
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/train/make_predictions.py", line 167, in predict_and_save
    estimator = UncertaintyEstimator(
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/uncertainty/uncertainty_estimator.py", line 29, in __init__
    self.predictor = build_uncertainty_predictor(
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/uncertainty/uncertainty_predictor.py", line 1290, in build_uncertainty_predictor
    predictor = predictor_class(
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/uncertainty/uncertainty_predictor.py", line 51, in __init__
    self.calculate_predictions()
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/uncertainty/uncertainty_predictor.py", line 354, in calculate_predictions
    preds, var = predict(
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/train/predict.py", line 108, in predict
    batch_preds = model(
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/catpred/models/model.py", line 435, in forward
    seq_outs = torch.cat([esm_feature_arr, seq_outs], dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1329 but got size 720 for tensor number 1 in the list.
Traceback (most recent call last):
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/demo_run.py", line 150, in <module>
    main(args)
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/demo_run.py", line 131, in main
    output_final = get_predictions(args.parameter, outfile)
  File "/home/data/usr/dx/software/catpred_pipeline/CatPred/demo_run.py", line 75, in get_predictions
    df = pd.read_csv(outfile)
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
  File "/home/data/usr/dx/miniconda3/envs/catpred/lib/python3.9/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'test/test_input_output.csv'
```
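The `RuntimeError` above reports two tensors whose sizes disagree (1329 vs. 720) in a dimension other than the one being concatenated. One thing that may be worth checking is whether the sequence length in the input file matches either number, and whether a `pdbpath` value is shared by more than one sequence. The sketch below does that with the standard library only. It assumes that `pdbpath` acts as an identifier for cached protein features, which the log hints at (`cache_utils.py` is loaded during "Loading data") but does not confirm. The helper names `sequence_lengths` and `conflicting_ids` and the toy rows are hypothetical, not part of CatPred.

```python
import csv
import io


def sequence_lengths(csv_text):
    """Map each pdbpath value to the set of sequence lengths listed under it."""
    lengths = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        lengths.setdefault(row["pdbpath"], set()).add(len(row["sequence"].strip()))
    return lengths


def conflicting_ids(csv_text):
    """Return pdbpath values that are attached to more than one distinct sequence."""
    seqs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        seqs.setdefault(row["pdbpath"], set()).add(row["sequence"].strip())
    return sorted(key for key, values in seqs.items() if len(values) > 1)


# Toy input with the same header as test/test.csv (not the real data).
example = """Substrate,SMILES,sequence,pdbpath
t1,CCO,MKTAYIAK,seq1.pdb
t2,CCN,MKT,seq1.pdb
t3,CCC,MKV,seq2.pdb
"""

print(sequence_lengths(example))
print(conflicting_ids(example))
```

To run it on the real file, pass `open("test/test.csv").read()` instead of `example`. If a reported length matches one of the two sizes in the error and the other size belongs to a different sequence that used the same `pdbpath` earlier, a stale cached entry would be a plausible suspect, though that remains an assumption.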
