
å character mojibake is not processed correctly by fix_encoding() when it is at the beginning of a string #222

@Justin-Folvarcik

Description


The "å" character, which can end up looking like "Ã¥" due to mojibake, is not repaired by fix_encoding() if it is the first character in the string.
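For context, here is a minimal sketch of how this mojibake arises, using only the Python standard library. It assumes the misreading codec is Latin-1. Windows-1252 maps these two bytes to the same characters, so the result is identical. The sketch also shows that reversing the steps restores the character:

```python
# "å" (U+00E5) is two bytes in UTF-8: 0xC3 0xA5.
original = "å"
utf8_bytes = original.encode("utf-8")  # b'\xc3\xa5'

# Decoding those bytes with a single-byte codec turns each byte into its
# own character: 0xC3 -> "Ã", 0xA5 -> "¥". That pair is the mojibake.
mojibake = utf8_bytes.decode("latin-1")  # 'Ã¥'
print(utf8_bytes, mojibake)

# Undoing the two steps in reverse order recovers the original text.
restored = mojibake.encode("latin-1").decode("utf-8")
print(restored)  # å
```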

Some example code that will highlight the problem:

```python
from ftfy import fix_encoding

text = "Ã¥klagarmyndighets"

print(fix_encoding(text))
```

Output:

Ã¥klagarmyndighets

Expected output:

åklagarmyndighets

I have absolutely no idea why this is the case, but it only happens when that particular mix-up is at the beginning of the string. Adding even a single leading space avoids the problem, e.g.:

```python
text = " Ã¥klagarmyndighets"
```

This is correctly fixed to " åklagarmyndighets".
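The byte-level repair itself does not depend on where "Ã¥" appears. The sketch below uses `naive_unmojibake`, a hypothetical helper that is not part of ftfy. It applies the Latin-1 to UTF-8 reversal unconditionally, without any of ftfy's heuristics, and fixes both variants identically. That suggests the difference presumably lies in how fix_encoding() decides whether a string needs fixing, not in the decoding step. This helper is only an illustration, not a replacement for ftfy:

```python
def naive_unmojibake(text: str) -> str:
    """Hypothetical helper: undo one round of UTF-8-read-as-Latin-1 mojibake.

    Unlike ftfy, this applies the repair unconditionally and does not
    check whether the text actually looks broken.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8:
        # assume the text was not mojibake and return it unchanged.
        return text


for sample in ["Ã¥klagarmyndighets", " Ã¥klagarmyndighets"]:
    print(repr(naive_unmojibake(sample)))
# 'åklagarmyndighets'
# ' åklagarmyndighets'
```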
