1D protein convolution layer issues

I am following up on issues #13 (1D convolution direction) and #12 (inconsistencies between the repository and the publication), as both appear to remain unresolved.

As discussed in #13, the protein convolution layer in the current implementation convolves over the embedding dimension rather than the sequence dimension, which deviates from the standard way of applying convolutions to sequences. Because `Conv1d` then treats the 1000 amino acid positions as input channels, each output channel mixes all 1000 amino acid embeddings into a single vector, eliminating almost all sequential information.

When the protein embedding matrix is permuted from `[batch, 1000, 128]` to `[batch, 128, 1000]`, allowing PyTorch's `Conv1d` to operate along the sequence dimension as intended, the performance of all evaluated models changes drastically.
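To make the difference concrete, here is a minimal sketch of both variants. The input shape `[batch, 1000, 128]` comes from the description above; the number of output channels and the kernel size are assumptions chosen only for illustration, not values taken from the repository.

```python
import torch
import torch.nn as nn

batch, seq_len, emb_dim = 2, 1000, 128   # shapes from the issue text
out_channels, kernel_size = 32, 8        # assumed values, for illustration only

x = torch.randn(batch, seq_len, emb_dim)  # [batch, 1000, 128]

# Conv1d expects input as [batch, channels, length].
# Without a permute, the 1000 sequence positions are treated as input
# channels and the kernel slides along the 128 embedding features.
conv_over_embedding = nn.Conv1d(seq_len, out_channels, kernel_size)
y_embedding = conv_over_embedding(x)      # [batch, 32, 128 - 8 + 1]

# With the permute, the 128 embedding features are the input channels
# and the kernel slides along the 1000 sequence positions.
conv_over_sequence = nn.Conv1d(emb_dim, out_channels, kernel_size)
y_sequence = conv_over_sequence(x.permute(0, 2, 1))  # [batch, 32, 1000 - 8 + 1]

print(tuple(y_embedding.shape))  # (2, 32, 121)
print(tuple(y_sequence.shape))   # (2, 32, 993)
```

In the first variant the output length is tied to the embedding size, so no axis of the result corresponds to a position in the protein sequence.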

In addition, as mentioned in #12, the 1D protein convolution architecture described in the [paper](https://doi.org/10.1093/bioinformatics/btaa921) (Section 2.3 and Figure 1) specifies three consecutive 1D convolution layers, whereas the current implementation in this repository uses only a single layer. 
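For reference, a three-layer stack operating along the sequence dimension could look roughly like the sketch below. The class name `ProteinCNN`, the channel widths, the kernel size, the ReLU activations, and the global max pooling are all assumptions made for illustration; they are not taken from the paper or from this repository.

```python
import torch
import torch.nn as nn


class ProteinCNN(nn.Module):
    """Hypothetical three-layer 1D CNN over the sequence dimension."""

    def __init__(self, emb_dim=128, channels=(32, 64, 96), kernel_size=8):
        super().__init__()
        layers = []
        in_ch = emb_dim
        for out_ch in channels:  # three consecutive Conv1d layers
            layers += [nn.Conv1d(in_ch, out_ch, kernel_size), nn.ReLU()]
            in_ch = out_ch
        self.convs = nn.Sequential(*layers)

    def forward(self, x):
        # x: [batch, seq_len, emb_dim] -> [batch, emb_dim, seq_len]
        x = x.permute(0, 2, 1)
        x = self.convs(x)            # [batch, channels[-1], reduced_len]
        # Assumed pooling: global max over the remaining sequence positions.
        return x.max(dim=2).values   # [batch, channels[-1]]


model = ProteinCNN()
features = model(torch.randn(2, 1000, 128))
print(tuple(features.shape))  # (2, 96)
```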

I will create a pull request shortly that addresses both the convolution direction and the architectural mismatch.
