Update encodings.aliases #149891

@serhiy-storchaka

Bug report

I compared encodings.aliases with the IANA registry (see https://www.iana.org/assignments/character-sets/character-sets.xhtml; I used the CSV format) and found more missing aliases. Most of them have the "cs" prefix, but there were also CCSID01140, iso-ir-149, KS_C_5601-1989 (we only have KS_C_5601-1987), GB_2312-80, windows-936, etc. One existing alias, csHPRoman8, was not stored in normalized form, so it did not work.
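A rough sketch of how such a check could be reproduced, not the exact script used for the comparison. The helper names `unresolved` and `non_normalized_keys` are hypothetical, and the list of names is assumed to have been extracted from the IANA CSV beforehand. The second helper assumes that a key is only reachable if it equals its own lowercased `encodings.normalize_encoding()` form.

```python
import codecs
import encodings
from encodings.aliases import aliases


def unresolved(names):
    """Return the names that codecs.lookup() cannot resolve to a codec."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append(name)
    return missing


def non_normalized_keys(table):
    """Return keys of an alias table that are not in normalized form.

    Assumption: lookups use lowercase names with runs of punctuation
    collapsed to "_", so any other spelling of a key can never match.
    """
    return sorted(
        key for key in table
        if key != encodings.normalize_encoding(key).lower()
    )


if __name__ == "__main__":
    # Names taken from the report; results depend on the Python version.
    print(unresolved(["KS_C_5601-1987", "KS_C_5601-1989", "windows-936"]))
    print(non_normalized_keys(aliases))
```

Going through `codecs.lookup()` rather than reading the `aliases` dict directly also accepts names that resolve to a codec module without any alias entry.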

There are some errors in the IANA registry. ISO-8859-11 is an alias of TIS-620, while in Python they differ by one character (euro). MS_Kanji is an alias of Shift_JIS, while in Python it is an alias of cp932 (not IANA registered). I suppose Python is more correct here.
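The two discrepancies can be inspected with a small sketch like the following. The helper names `decode_byte`, `differing_bytes` and `resolved_name` are hypothetical, and the byte-by-byte comparison only makes sense for single-byte codecs.

```python
import codecs


def decode_byte(codec, value):
    """Decode one byte with the given codec, or return None if undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def differing_bytes(codec_a, codec_b):
    """List (byte, char_a, char_b) for every byte the two codecs decode differently."""
    diffs = []
    for value in range(256):
        a = decode_byte(codec_a, value)
        b = decode_byte(codec_b, value)
        if a != b:
            diffs.append((value, a, b))
    return diffs


def resolved_name(name):
    """Return the canonical codec name that Python resolves an alias to."""
    return codecs.lookup(name).name


if __name__ == "__main__":
    for value, a, b in differing_bytes("iso8859_11", "tis_620"):
        print(f"0x{value:02X}: iso8859_11={a!r} tis_620={b!r}")
    print(resolved_name("MS_Kanji"))
```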

cc @malemburg

Labels

- 3.15: pre-release feature fixes, bugs and security fixes
- 3.16: new features, bugs and security fixes
- stdlib: Standard Library Python modules in the Lib/ directory
- topic-unicode
- type-bug: An unexpected behavior, bug, or error
- type-feature: A feature request or enhancement
