@ryanbarouki

The segmentation fault usually occurs when 'U' (selenocysteine) is found in a protein sequence.
It only seems to happen when the code is called through the Python bindings. When the binary is run directly, the uninitialized memory appears to be zero, which would mask the problem.

Currently, when a character that isn't in the alphabet is encountered, a warning is emitted but the loop continues without writing to the corresponding array element, leaving it uninitialized. That element is later used as an index in
lib/src/bpm.c:

for (int i = 0; i < n+W; i++) {
        /* LOG_MSG(" (%d)",i); */
        uint8_t c = 0;
        if(i >= n){
                /* seq padding  */
                c = 0;
        }else{
                c = (uint8_t)t[i]; /* <---- uninitialized memory used as an index */
        }
        /* LOG_MSG(" c: %d  (%d)", c,i); */
        int carry = 0;

        for (int b = 0; b <= y; b++) {
                /* LOG_MSG("y: %d c: %d b: %d n: %d m:%d",y, c,b,n,m); */
                uint64_t Pv = P[b];
                uint64_t Mv = M[b];
                uint64_t Eq = Peq[c][b]; /* <---- undefined behaviour */

The fix treats letters that are not in the alphabet as ambiguous where possible; otherwise it falls back to zeroing the memory.
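
As a rough illustration of that idea (not the actual patch), the sketch below encodes a sequence so that every output element is written, even for characters outside the alphabet. The names `lookup`, `init_lookup`, `encode_sequence`, `AMBIGUOUS_CODE`, and the alphabet layout are all hypothetical and are not taken from the library; the sketch only assumes that the encoder maps characters to small integer codes that are later used as table indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alphabet: 20 amino acids plus one ambiguity code ('X'). */
#define AMBIGUOUS_CODE 20

/* Hypothetical lookup table: -1 marks characters outside the alphabet. */
static int8_t lookup[256];

static void init_lookup(void)
{
        const char *letters = "ACDEFGHIKLMNPQRSTVWY";
        memset(lookup, -1, sizeof lookup);
        for (int i = 0; letters[i] != '\0'; i++) {
                lookup[(unsigned char)letters[i]] = (int8_t)i;
        }
        lookup[(unsigned char)'X'] = AMBIGUOUS_CODE;
}

/*
 * Encode seq into out. Unknown characters get fallback_code, or 0 when
 * fallback_code is negative (no ambiguity code available). Every out[i]
 * is written, so no element is left uninitialized.
 * Returns the number of unknown characters seen.
 */
static int encode_sequence(const char *seq, size_t len, int fallback_code, uint8_t *out)
{
        int n_unknown = 0;
        assert(seq != NULL && out != NULL);
        for (size_t i = 0; i < len; i++) {
                int code = lookup[(unsigned char)seq[i]];
                if (code < 0) {
                        /* A warning could still be logged here. */
                        code = fallback_code >= 0 ? fallback_code : 0;
                        n_unknown++;
                }
                out[i] = (uint8_t)code;
        }
        return n_unknown;
}
```

With this shape, a sequence such as "ACUD" yields a valid code at every position, so a later lookup like `Peq[c][b]` never indexes with an indeterminate value.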
