[PATCH] filter: set environment variable PYTHONIOENCODING to utf-8
John Keeping
john at keeping.me.uk
Fri Mar 17 21:04:55 CET 2017
On Fri, Mar 17, 2017 at 07:07:02PM +0100, Jason A. Donenfeld wrote:
> On Sun, Mar 12, 2017 at 6:51 PM, John Keeping <john at keeping.me.uk> wrote:
> > While I'm inclined to agree with this, in this particular case we
> > explicitly encode pages as UTF-8 so there is an argument that we should
> > be telling child processes that UTF-8 is the correct encoding.
>
> That's a compelling argument, actually.
>
> >
> > Maybe we should be looking to change LANG instead, but I'm not sure how
> > reliably we can do that.
>
> I'm more on board with that. Does changing LANG influence the PYTHON
> variable implicitly?
Yes: if no encoding is requested explicitly (which is what
PYTHONIOENCODING does), Python derives it from the locale.
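The patch in the subject takes the explicit route instead. It sets PYTHONIOENCODING for the filter so that Python no longer consults the locale for its stdio encoding. A minimal sketch of that idea in plain POSIX C, assuming a fork()/execvp() spawn in a single-threaded parent; `run_filter_utf8()` is a hypothetical name, not cgit's actual filter code:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a filter command with PYTHONIOENCODING=utf-8
 * set only in the child, leaving the parent's environment untouched.
 * Returns the child's exit status, or -1 on error.
 */
int run_filter_utf8(char *const argv[])
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: request UTF-8 explicitly instead of relying on the locale. */
		if (setenv("PYTHONIOENCODING", "utf-8", 1) != 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Setting the variable after fork() keeps the change scoped to the filter process; an alternative would be to build a separate envp array and use execve().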
However, deriving it from the locale only works if the locale is
actually installed on the system. For example, on my system I get:
    $ LANG=en_GB.UTF-8 python2 -c 'import sys; print(sys.stdin.encoding)'
    UTF-8
    $ LANG=en_GB.ISO-8859-1 python2 -c 'import sys; print(sys.stdin.encoding)'
    ISO-8859-1

but I don't have C.UTF-8, so:

    $ LC_ALL=C.UTF-8 python2 -c 'import sys; print(sys.stdin.encoding)'
    ANSI_X3.4-1968
ANSI_X3.4-1968 is glibc's name for plain ASCII. There's an open glibc
bug [1] to support C.UTF-8, but for now it looks like it's only
available on Debian and its derivatives.
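To see which codeset a given locale name actually yields, the C library can be asked directly. This sketch assumes Python 2 picks its terminal encoding in roughly the same way (setlocale() followed by nl_langinfo(CODESET)); `codeset_for_locale()` is a hypothetical helper. When setlocale() fails, a freshly started program stays in the "C" locale, whose codeset glibc reports as ANSI_X3.4-1968, which would explain the last output above.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper: report the codeset the C library associates with
 * locale `name` for LC_CTYPE, or NULL if that locale is not installed.
 * Side effect: changes the process's LC_CTYPE on success. The returned
 * string is owned by libc and may be overwritten by later locale calls.
 */
const char *codeset_for_locale(const char *name)
{
	if (!setlocale(LC_CTYPE, name))
		return NULL;	/* e.g. C.UTF-8 on a system that lacks it */
	return nl_langinfo(CODESET);
}
```

For example, on a system without C.UTF-8, `codeset_for_locale("C.UTF-8")` returns NULL, which is the case where LC_ALL=C.UTF-8 quietly degrades to ASCII.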
> > Is it safe to do something like:
> >
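> > const char *lang = getenv("LANG");
> > struct strbuf sb = STRBUF_INIT;
> >
> > if (!lang)
> > 	lang = "C";
> > /* Keep everything before the first '.', then force the UTF-8 codeset. */
> > strbuf_addf(&sb, "%.*s.UTF-8",
> > 	    (int) (strchrnul(lang, '.') - lang), lang);
> > /* setenv() takes an overwrite flag and copies the value. */
> > setenv("LANG", sb.buf, 1);
> > strbuf_release(&sb);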
>
> That's probably not too bad, though I wonder if we could get away with
> just explicitly setting a more generic UTF-8 locale instead of trying to read
> the user's language preferences.
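Other people have already found [2] that it's not quite that simple if
we want it to work on all systems: as the C.UTF-8 example above shows,
there is no single UTF-8 locale name that is guaranteed to be installed
everywhere.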
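A portable variant of the LANG rewrite quoted earlier could check that the derived locale exists before relying on it. The sketch below is hypothetical: it swaps git's strbuf and the GNU strchrnul() for snprintf() and strcspn(), also drops any "@modifier", and probes the name with POSIX newlocale() so the process's own locale is left untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<language>.UTF-8" for `lang` into `buf`,
 * keeping the part before any ".codeset" or "@modifier" (a NULL or empty
 * `lang` is treated as "C"). Returns 0 if that locale is installed,
 * -1 if it is missing or the name does not fit in `buf`.
 */
int utf8_variant_of(const char *lang, char *buf, size_t len)
{
	size_t base;
	int n;
	locale_t probe;

	if (!lang || !*lang)
		lang = "C";
	base = strcspn(lang, ".@");	/* portable stand-in for strchrnul() */
	n = snprintf(buf, len, "%.*s.UTF-8", (int) base, lang);
	if (n < 0 || (size_t) n >= len)
		return -1;	/* name was truncated */

	/* Probe the locale without changing the current one. */
	probe = newlocale(LC_CTYPE_MASK, buf, (locale_t) 0);
	if (probe == (locale_t) 0)
		return -1;	/* e.g. C.UTF-8 where it isn't installed */
	freelocale(probe);
	return 0;
}
```

A caller would only export the result when the probe succeeds, e.g. `if (!utf8_variant_of(getenv("LANG"), buf, sizeof(buf))) setenv("LANG", buf, 1);`. Note that LC_ALL and LC_CTYPE take precedence over LANG, so rewriting LANG alone has no effect for users who set either of those.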
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17318
[2] https://github.com/commercialhaskell/stack/issues/856