Description
I am reopening issues #13 (1D convolution direction) and #12 (inconsistencies between the repository and publication) to follow up, as both appear to remain unresolved.
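As discussed in #13, the protein convolution layer in the current implementation operates over the embedding dimension rather than the sequence dimension, which deviates from the standard way of applying 1D convolution to sequences. Because the input has shape [batch, 1000, 128], Conv1d treats the 1000 amino acid positions as input channels, so every convolution filter mixes all 1000 positions at once while sliding along the 128 embedding features. As a result, the 1000 amino acid embeddings are effectively collapsed into a single embedding per convolution channel, eliminating almost all sequential information.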
When the protein embedding matrix is permuted from [batch, 1000, 128] to [batch, 128, 1000], allowing PyTorch's Conv1d to operate along the sequence dimension as intended, the performance of all evaluated models changes drastically.
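To make the difference concrete, here is a minimal sketch of the two orientations. The shapes [batch, 1000, 128] come from this issue; the batch size, the filter count (32), the kernel size (8), and all variable names are assumptions chosen for illustration and may not match the repository.

```python
import torch
import torch.nn as nn

# Shapes from the issue: 1000 amino acid positions, 128-dim embeddings.
# batch, n_filters and kernel_size are illustrative assumptions.
batch, seq_len, embed_dim = 4, 1000, 128
n_filters, kernel_size = 32, 8

x = torch.randn(batch, seq_len, embed_dim)  # [batch, 1000, 128]

# Orientation described as the current behaviour: Conv1d expects
# [batch, channels, length], so the 1000 positions become input channels
# and the kernel slides along the 128 embedding features.
conv_over_embedding = nn.Conv1d(in_channels=seq_len,
                                out_channels=n_filters,
                                kernel_size=kernel_size)
out_embedding = conv_over_embedding(x)  # [batch, 32, 128 - 8 + 1]

# Sequence-wise orientation: permute to [batch, 128, 1000] so the embedding
# features are the channels and the kernel slides along the sequence.
conv_over_sequence = nn.Conv1d(in_channels=embed_dim,
                               out_channels=n_filters,
                               kernel_size=kernel_size)
out_sequence = conv_over_sequence(x.permute(0, 2, 1))  # [batch, 32, 1000 - 8 + 1]

print(tuple(out_embedding.shape))  # (4, 32, 121)
print(tuple(out_sequence.shape))   # (4, 32, 993)
```

In the first case each output position depends on all 1000 residues at once, so the output length (121) indexes embedding features, not sequence positions. In the second case each output position depends only on a window of 8 neighbouring residues.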
In addition, as mentioned in #12, the 1D protein convolution architecture described in the paper (Section 2.3 and Figure 1) specifies three consecutive 1D convolution layers, whereas the current implementation in this repository uses only a single layer.
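For reference, a stack of three consecutive 1D convolutions over the sequence dimension could look roughly like the sketch below. Only the layer count (three) and the input shape are taken from this issue; the class name, filter counts, kernel size, ReLU activations, and global max pooling are hypothetical placeholders and should be replaced with the values from the paper.

```python
import torch
import torch.nn as nn


class ProteinConvStack(nn.Module):
    """Hypothetical three-layer 1D CNN applied along the protein sequence."""

    def __init__(self, embed_dim=128, n_filters=32, kernel_size=8):
        super().__init__()
        # Filter counts (32, 64, 96) and kernel size are assumptions.
        self.convs = nn.Sequential(
            nn.Conv1d(embed_dim, n_filters, kernel_size),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_filters * 2, kernel_size),
            nn.ReLU(),
            nn.Conv1d(n_filters * 2, n_filters * 3, kernel_size),
            nn.ReLU(),
        )

    def forward(self, embedded):
        # embedded: [batch, seq_len, embed_dim] -> [batch, embed_dim, seq_len]
        x = embedded.permute(0, 2, 1)
        x = self.convs(x)              # [batch, 3 * n_filters, reduced_len]
        # Global max pooling over the remaining sequence positions
        # (an assumed pooling choice, not confirmed by the issue).
        return x.max(dim=2).values     # [batch, 3 * n_filters]


protein_embedding = torch.randn(4, 1000, 128)  # [batch, 1000, 128]
features = ProteinConvStack()(protein_embedding)
print(tuple(features.shape))  # (4, 96)
```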
I will create a pull request shortly that addresses both the convolution direction and the architectural mismatch.