@vasilykolosov

This fix may be somewhat too specific, since it first checks whether LC_CTYPE is set to "UTF-8".

I tested this on several systems, and it does (kind of) solve the problem. Basically, if locale.setlocale(locale.LC_ALL, '') fails, it calls the following instead:

locale.setlocale(locale.LC_ALL, ('en_us', 'UTF-8'))

'en_us' and 'UTF-8' are derived from the LANG environment variable, whose value is specific to the remote Ubuntu system.
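
For readers following along, the fallback described above could look roughly like the sketch below. This is not the code from the PR: the function name init_locale and the hard-coded fallback tuple are assumptions made for illustration, and the fallback locale may not be installed on a given system.

```python
import locale


def init_locale(fallback=("en_US", "UTF-8")):
    """Try the environment's locale first, then a hard-coded fallback.

    Returns the locale string that was applied, or None if neither worked.
    The fallback value is an assumption for this sketch.
    """
    try:
        # '' asks the C library to use the LANG / LC_* environment settings.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass
    try:
        # setlocale() accepts a (language, encoding) tuple as well as a string.
        return locale.setlocale(locale.LC_ALL, fallback)
    except locale.Error:
        # The fallback locale is not available either; leave the locale as-is.
        return None


applied = init_locale()
```

Returning None instead of raising keeps the caller from crashing when neither locale is usable.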

Owner

I would do os.environ.get("LANG", "some default") to be extra paranoid.

Why do you split() it on the '.' character? That will return an array, did you want to get the first component of that?
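
A minimal sketch of both points, assuming a hypothetical default of "en_US.UTF-8" and a hypothetical helper name lang_from_env (neither comes from the PR):

```python
import os


def lang_from_env(environ=None, default="en_US.UTF-8"):
    """Return LANG, or a default when it is unset (the default is an assumption)."""
    if environ is None:
        environ = os.environ
    return environ.get("LANG", default)


lang = lang_from_env({"LANG": "en_US.UTF-8"})
parts = lang.split(".")       # a list of components, not a string
language = parts[0]           # the first component, if that was the intent
missing = lang_from_env({})   # LANG unset: the default is returned
```

Passing the mapping in as an argument is only there to make the behaviour easy to check without touching the real environment.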

Author

> I would do os.environ.get("LANG", "some default") to be extra paranoid.

Makes sense

> Why do you split() it on the '.' character? That will return an array, did you want to get the first component of that?

No, but you're right: there's no need to do a split. I was a bit confused by the example in the documentation, where setlocale is called with a tuple ("en_us", "UTF-8") as the second argument, but a string is okay too.

Fixing that in the next commit.

Owner

Note that the spec says you can put some kind of expression there, like en_US.UTF-8;fr_FR.UTF-8 for example. I think it's best to minimize our need to parse that.
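
To make the parsing concern concrete, here is a small sketch of what a naive split does to the example value above. The value is taken from the comment as written; whether a real system ever produces exactly this form is not verified here.

```python
value = "en_US.UTF-8;fr_FR.UTF-8"  # the example value from the comment

# Splitting on '.' no longer yields a clean (language, encoding) pair.
pieces = value.split(".")

try:
    language, encoding = pieces
    unpacked = True
except ValueError:
    # Three pieces cannot be unpacked into two names.
    unpacked = False
```

Handing the string to setlocale unchanged avoids having to guess at its structure.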

Author

Removed the unnecessary split(). (You were absolutely right about that!)

Do you think falling back to 'C' is a good idea? I tested that just now, and cnf seems to behave correctly with LC_CTYPE set to 'C'. Besides, it's pretty universal.

Owner

Using C can be tricky. At least when you start with LANG=C, Python behaves like it's the 1980s again and sys.stdout.encoding is ASCII. That breaks everything, including our attempt to print the input line (say I mistype ł, a non-ASCII character). For all intents and purposes, I would not do that.

I think it's worth separating translation support, which depends on lots of stuff working right, from just not crashing on weird settings and random garbage as input. We should strive not to crash above all. Getting translation support should be the cherry on top that we add once everything else is working.
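
To illustrate the failure mode and one way of not crashing on it, here is a minimal sketch. The helper name printable is hypothetical, and backslash-escaping is just one possible policy, not something this PR does.

```python
def printable(text, encoding):
    """Return text with characters that `encoding` cannot represent escaped."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


try:
    "ł".encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    # This is the kind of error an ASCII-only stdout runs into.
    strict_ok = False

escaped = printable("ł", "ascii")  # the character is escaped, nothing raises
```

In a program this might be used as print(printable(line, sys.stdout.encoding or "ascii")), so that a mistyped non-ASCII character is echoed in escaped form instead of raising.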

Owner

For a quick test, use a non-English locale: try installing something that has translations enabled and that you can still operate without understanding the language. That will trigger all kinds of issues. I recommend a Far Eastern language (though I don't know what has translations at the moment) or something as simple as German, which should have translations and enough non-ASCII characters to cause trouble.

If that works for you (using the LANG=C fallback), then say so. So far I assume it breaks.
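
One way to check what the interpreter reports under a given setting is to start a child Python with the variables overridden. This is a sketch only: the helper name stdout_encoding_under is hypothetical, setting LC_ALL in addition to LANG is my choice so that other LC_* variables do not interfere, and the reported encoding depends on the Python version and platform.

```python
import os
import subprocess
import sys


def stdout_encoding_under(lang):
    """Run a child Python with LANG and LC_ALL set to `lang`; return its stdout encoding."""
    env = dict(os.environ, LANG=lang, LC_ALL=lang)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


encoding_under_c = stdout_encoding_under("C")
```

The same helper can be pointed at a German or other non-English locale name, provided that locale is installed on the test machine.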
