1D protein convolution layer issues #20

@matija-marijan

Regarding issues #13 (1D convolution direction) and #12 (inconsistencies between the repository and publication), I am opening this issue to follow up, as both appear to remain unresolved.

As discussed in #13, the protein convolution layer in the current implementation operates over the embedding dimension rather than the sequence dimension, which deviates from the standard way of applying convolution to sequences. Because Conv1d expects input shaped [batch, channels, length], the 1000 sequence positions are treated as input channels. As a result, the 1000 amino acid embeddings are effectively collapsed into a single embedding per convolution channel, eliminating almost all sequential information.

When the protein embedding matrix is permuted from [batch, 1000, 128] to [batch, 128, 1000], allowing PyTorch's Conv1d to operate along the sequence dimension as intended, the performance of all evaluated models changes drastically.
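To make the difference concrete, here is a minimal sketch comparing the two directions on a random tensor. The shapes [batch, 1000, 128] come from the description above; the batch size, `n_filters = 32`, and `kernel_size = 8` are assumed values chosen for illustration and are not necessarily the repository's settings.

```python
import torch
import torch.nn as nn

# 1000 and 128 come from the issue text; the other values are assumptions.
batch, seq_len, embed_dim = 4, 1000, 128
n_filters, kernel_size = 32, 8

# Stand-in for the output of the protein embedding layer: [batch, 1000, 128].
embedded = torch.randn(batch, seq_len, embed_dim)

# Conv1d expects [batch, channels, length].
# Without a permute, the 1000 sequence positions act as input channels and
# the kernel slides along the 128-dimensional embedding axis.
conv_over_embedding = nn.Conv1d(seq_len, n_filters, kernel_size)
out_embedding_axis = conv_over_embedding(embedded)

# With a permute to [batch, 128, 1000], the embedding features act as input
# channels and the kernel slides along the 1000 sequence positions.
conv_over_sequence = nn.Conv1d(embed_dim, n_filters, kernel_size)
out_sequence_axis = conv_over_sequence(embedded.permute(0, 2, 1))

print(out_embedding_axis.shape)  # length axis is 128 - 8 + 1 = 121
print(out_sequence_axis.shape)   # length axis is 1000 - 8 + 1 = 993
```

In the first case the output length axis is derived from the embedding width, so no output position corresponds to a location in the protein sequence. In the second case each output position corresponds to a window of 8 consecutive residues.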

In addition, as mentioned in #12, the 1D protein convolution architecture described in the paper (Section 2.3 and Figure 1) specifies three consecutive 1D convolution layers, whereas the current implementation in this repository uses only a single layer.
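For reference, a three-layer stack operating along the sequence dimension might look roughly like the sketch below. This is not the code from the repository or the planned pull request. The class name `ProteinConvStack`, the vocabulary size, the filter counts, the kernel size, the ReLU activations, and the global max pooling are all assumptions made for illustration; only the three consecutive Conv1d layers, the sequence length of 1000, and the embedding size of 128 are taken from the discussion above.

```python
import torch
import torch.nn as nn


class ProteinConvStack(nn.Module):
    """Hypothetical three-layer 1D CNN over the protein sequence axis."""

    def __init__(self, vocab_size=26, embed_dim=128, n_filters=32, kernel_size=8):
        super().__init__()
        # vocab_size, n_filters and kernel_size are assumed values.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.Sequential(
            nn.Conv1d(embed_dim, n_filters, kernel_size),
            nn.ReLU(),
            nn.Conv1d(n_filters, 2 * n_filters, kernel_size),
            nn.ReLU(),
            nn.Conv1d(2 * n_filters, 3 * n_filters, kernel_size),
            nn.ReLU(),
        )
        # Global max pooling over the remaining sequence positions (assumed).
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, tokens):
        x = self.embedding(tokens)       # [batch, seq_len, embed_dim]
        x = x.permute(0, 2, 1)           # [batch, embed_dim, seq_len]
        x = self.convs(x)                # [batch, 3 * n_filters, reduced_len]
        return self.pool(x).squeeze(-1)  # [batch, 3 * n_filters]


# Usage: a batch of 4 integer-encoded proteins, each of length 1000.
tokens = torch.randint(0, 26, (4, 1000))
features = ProteinConvStack()(tokens)
print(features.shape)
```

The permute happens once, right after the embedding, so every convolution in the stack slides along sequence positions.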

I will create a pull request shortly that addresses both the convolution direction and the architectural mismatch.
