Code page

In computing, a code page is a table of values that describes the character set used for encoding a particular set of glyphs, usually combined with a number of control characters. The term "code page" originated from IBM's EBCDIC-based mainframe systems,^[1] but many vendors use this term, including Microsoft, SAP,^[2] and Oracle Corporation.^[3] Vendors often allocate their own code page number to a character encoding, even if it is better known by another name; for example, the UTF-8 character encoding has code page number 1208 at IBM, 65001 at Microsoft, and 4110 at SAP. The multitude of code page assignments leads many vendors to recommend Unicode.
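
Because the same encoding carries a different number at each vendor, a number is only meaningful together with the vendor that assigned it. The following minimal sketch keys a lookup table on both. The three UTF-8 entries come from the paragraph above; the table name `VENDOR_CODE_PAGES` and the helper `encoding_for` are hypothetical, and a real table would need many more rows.

```python
# Hypothetical lookup table: (vendor, code page number) -> encoding name.
# Only the three UTF-8 assignments quoted in the text are included.
VENDOR_CODE_PAGES = {
    ("IBM", 1208): "UTF-8",
    ("Microsoft", 65001): "UTF-8",
    ("SAP", 4110): "UTF-8",
}


def encoding_for(vendor: str, number: int) -> str:
    """Return the encoding name for a vendor-specific code page number."""
    try:
        return VENDOR_CODE_PAGES[(vendor, number)]
    except KeyError:
        raise LookupError(
            f"unknown code page {number} for vendor {vendor}"
        ) from None


if __name__ == "__main__":
    print(encoding_for("IBM", 1208))
    print(encoding_for("Microsoft", 65001))
```

Looking up a number under the wrong vendor, such as `("IBM", 65001)`, fails in this sketch rather than silently returning a guess.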

The code page numbering system

IBM introduced the concept of systematically assigning a small, but globally unique, 16-bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest (first) numbers are assigned to variations of IBM's EBCDIC encoding and slightly larger numbers refer to variations of IBM's extended ASCII encoding as used in its PC hardware.

With the release of PC DOS version 3.3 (and the near-identical MS-DOS 3.3), IBM introduced the code page numbering system to regular PC users: the code page numbers (and the phrase "code page") were used in new commands that allowed the character encoding used by all parts of the OS to be set in a systematic way.^[4]

Since IBM and Microsoft ceased to cooperate in the 1990s, the two companies have maintained their lists of assigned code page numbers independently of each other, resulting in some conflicting assignments. At least one third-party vendor (Oracle) also has its own, different list of numeric assignments.^[3] IBM's current assignments are listed in its CCSID repository, while Microsoft's assignments are documented on MSDN.^[5] Additionally, a list of the names and approximate IANA (Internet Assigned Numbers Authority) abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine (this information is used by Microsoft programs such as Internet Explorer).
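
The conflicting assignments can be illustrated with a small translation table. The four pairs below are taken from the "conflictive ID" notes in the DOS code page list later in this article; the helper name `ibm_number_for_windows` is hypothetical, and returning unlisted numbers unchanged is a simplifying assumption rather than a documented rule.

```python
# Windows code page number -> IBM number for the same encoding, as noted
# in this article's DOS (OEM) code page list.
WINDOWS_TO_IBM = {
    874: 1162,   # Thai
    932: 943,    # Japanese
    936: 1386,   # Simplified Chinese
    949: 1363,   # Korean
}


def ibm_number_for_windows(code_page: int) -> int:
    """Translate a Windows code page number to IBM's number.

    Assumption: numbers that are not in the table are returned unchanged.
    This is only safe for numbers on which both vendors agree.
    """
    return WINDOWS_TO_IBM.get(code_page, code_page)


if __name__ == "__main__":
    for number in (874, 932, 936, 949, 437):
        print(number, "->", ibm_number_for_windows(number))
```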

Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code-points into eight bits and do not involve anything more than mapping each code-point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved.
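
The one-byte-to-one-character mapping can be seen by decoding the same byte under several single-byte code pages. This sketch relies on the codecs shipped with Python's standard library (`cp437`, `cp850`, `cp1252`, `latin_1`); the helper name `decode_byte` is hypothetical.

```python
def decode_byte(value: int, code_page: str) -> str:
    """Decode a single byte value under the given single-byte code page."""
    return bytes([value]).decode(code_page)


if __name__ == "__main__":
    # The same byte, 0xE9, stands for a different character in each table.
    for name in ("cp437", "cp850", "cp1252", "latin_1"):
        char = decode_byte(0xE9, name)
        print(f"{name}: 0xE9 -> U+{ord(char):04X} {char}")
```

Each lookup yields exactly one character, with no combining sequences or multi-byte state involved.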

The text mode of standard (VGA-compatible) PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to eight may be stored in the display adaptor for easy switching.^[6] There was a selection of third-party code page fonts that could be loaded into such hardware. It is now commonplace, however, for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. Even so, the system of referring to character encodings by a code page number remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in various protocols such as e-mail and web pages.

Relationship to ASCII

The vast majority of code pages in current use are supersets of ASCII, a 7-bit code representing 128 control codes and printable characters. In the distant past, 8-bit implementations of the ASCII code set the top bit to zero or used it as a parity bit in network data transmissions. When the top bit was made available for representing character data, a total of 256 characters and control codes could be represented. Most vendors (including IBM) used this extended range to encode characters used by various languages and graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these ‘extended character sets’ and vendors referred to the variants as code pages, as IBM had always done for variants of EBCDIC encodings.
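
Whether a given code page is a superset of ASCII can be checked by decoding the 128 ASCII byte values and comparing the result with plain ASCII. The sketch below uses codecs from Python's standard library; the helper name `is_ascii_superset` is hypothetical, and the check reflects Python's mapping tables rather than any vendor's official definition.

```python
ASCII_BYTES = bytes(range(128))


def is_ascii_superset(code_page: str) -> bool:
    """Return True if bytes 0x00-0x7F decode exactly as they do in ASCII."""
    try:
        return ASCII_BYTES.decode(code_page) == ASCII_BYTES.decode("ascii")
    except UnicodeDecodeError:
        # Some byte in the ASCII range is not valid in this code page.
        return False


if __name__ == "__main__":
    # cp037 is an EBCDIC code page, so it is expected to fail the check.
    for name in ("cp437", "cp1252", "latin_1", "cp037"):
        print(name, is_ascii_superset(name))
```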

Relationship to Unicode

Unicode is an effort to include all characters from previous code pages into a single character enumeration that can be used with a number of encoding schemes. In the process, duplicate characters are eliminated and new variants are introduced, like fullwidth ASCII. While consistent use of any single Unicode encoding would theoretically eliminate the need to keep track of different code pages or character encodings, that need persists, both because several encodings of Unicode exist and because compatibility must be maintained with existing documents and systems that use the older encodings. In practice the various Unicode character set encodings have simply been assigned their own code page numbers, and all the other code pages have been technically redefined as encodings for various subsets of Unicode.
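
The point that Unicode itself has several encodings, each with its own code page number, can be sketched as follows. The numbers 1200, 1201 and 65001 are the assignments listed later in this article; pairing them with the Python codec names `utf-16-le`, `utf-16-be` and `utf-8` is an assumption of this example, as is the helper name `encode_all`.

```python
# Code page number -> Python codec name (pairing assumed for illustration).
UNICODE_CODE_PAGES = {
    1200: "utf-16-le",
    1201: "utf-16-be",
    65001: "utf-8",
}


def encode_all(text: str) -> dict:
    """Encode the same text under each Unicode code page in the table."""
    return {
        number: text.encode(codec)
        for number, codec in UNICODE_CODE_PAGES.items()
    }


if __name__ == "__main__":
    # One character, three different byte sequences.
    for number, data in encode_all("\u00e9").items():
        print(number, data.hex(" "))
```

A receiver therefore still has to know which of these code pages produced a given byte sequence before it can decode it.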

Code pages

EBCDIC-based code pages

37/1140 - USA/Canada - CECP
37-2 - 3279 APL as used by C/370 (a variant of 37 and 1047)^[7]
256 - International #1
259 - Symbols, Set 7
273/1141 - Germany F.R./Austria - CECP
274 - Old Belgium Code Page
275 - Brazil - CECP
276 - Canada (French) - 94
277 - Denmark, Norway - CECP
278 - Finland, Sweden - CECP
279^[7]
280 - Italy - CECP
281 - Japan (Latin) - CECP
282 - Portugal - CECP
283^[7]
284 - Spain/Latin America - CECP
285/1146 - United Kingdom - CECP
297 - France^[7]
500/1148 - International ECECP
871 - Iceland^[7]
892 - EBCDIC, OCR A
893 - EBCDIC, OCR B
1047 - Latin 1/Open System^[7]
1137 - Devanagari EBCDIC
1140 - USA, Canada, etc. ECECP
1141 - Austria, Germany ECECP
1142 - Denmark, Norway ECECP
1143 - Finland, Sweden ECECP
1144 - Italy ECECP
1145 - Spain, Latin America (Spanish) ECECP
1146 - UK ECECP
1147 - France ECECP with euro
1149 - Icelandic ECECP with euro
1153 - EBCDIC Latin 2 Multilingual with euro
1154 - EBCDIC Cyrillic, Multilingual with euro
1155 - EBCDIC Turkey with euro
1156 - EBCDIC Baltic Multi with euro
1157 - EBCDIC Estonia with euro
1158 - EBCDIC Cyrillic, Ukraine with euro
1159 - T-Chinese EBCDIC

ISO/IEC 646-related code pages

367 – 7-bit US-ASCII
1009 – 7-bit ISO IRV
1010 – 7-bit France
1011 – 7-bit Germany F.R.
1012 – 7-bit Italy
1013 – 7-bit United Kingdom
1014 – 7-bit Spain
1015 – 7-bit Portugal
1016 – 7-bit Norway
1017 – 7-bit Denmark
1018 – 7-bit Finland/Sweden
1019 – 7-bit Netherlands
1020 – 7-bit Canadian (French) Variant
1021 – 7-bit Switzerland Variant
1023 – 7-bit Spain Variant
20105 - 7-bit IA5 IRV (Western European)^[8]^[9]^[10]
20106 - 7-bit IA5 German (DIN 66003)^[8]^[9]^[11]
20107 - 7-bit IA5 Swedish (SEN 850200 C)^[8]^[9]^[12]
20108 - 7-bit IA5 Norwegian (NS 4551-2)^[8]^[9]^[13]
20127 - 7-bit US-ASCII^[8]^[9]^[14]

Other 7-bit code pages:

895 – 7-bit Japan Latin
896 – 7-bit Japan Katakana Extended
1101 – 7-bit British NRC Set
1102 – 7-bit Dutch NRC Set
1103 – 7-bit Finnish NRC Set
1104 – 7-bit French NRC Set
1105 – 7-bit Norwegian/Danish NRC Set
1106 – 7-bit Swedish NRC Set
1107 – 7-bit Norwegian/Danish NRC Alternate

ISO/IEC 8859-related code pages

ISO/IEC 8859 related 8-bit code pages:

813 - ISO 8859-7
819 - ISO 8859-1
901 - Extension of ISO 8859-13 with euro
902 - ISO Estonian with euro
912 - Extension of ISO 8859-2
913 - ISO 8859-3
914 - ISO 8859-4
915 - Extension of ISO 8859-5
916 - ISO 8859-8
919 - ISO 8859-10
920 - ISO 8859-9
921 - Extension of ISO 8859-13
922 - ISO Estonian
923 - ISO 8859-15
1111 - ISO 8859-2
1124 - ISO 8859-5
1129 - ISO Vietnamese
1163 - ISO Vietnamese with euro
28591 – ISO-8859-1 (Windows)
28592 – ISO-8859-2 (Windows)
28593 – ISO-8859-3 (Windows)
28594 – ISO-8859-4 (Windows)
28595 – ISO-8859-5 (Windows)
28596 and 38596 – ISO-8859-6 (Windows)
28597 – ISO-8859-7 (Windows)
28598 and 38598 – ISO-8859-8 (Windows)
28599 – ISO-8859-9 (Windows)
28603 – ISO-8859-13 (Windows)
28605 – ISO-8859-15 (Windows)
28606 – ISO-8859-16 (Windows)
38596 – ISO-8859-6 (Windows)
38598 – ISO-8859-8 (Windows)

IBM PC / DOS (OEM) code pages

These code pages were originally embedded directly in the text mode hardware of the graphic adapters used with the IBM PC and its clones, including the original MDA and CGA adapters whose character sets could only be changed by physically replacing a ROM chip that contained the font. The interface of those adapters (emulated by all later adapters such as VGA) was typically limited to single byte character sets with only 256 characters in each font/encoding (although VGA added partial support for slightly larger character sets). Since the original IBM PC code page (number 437) was not really designed for international use, several partially compatible country or region specific variants emerged. Microsoft refers to these as the OEM code pages because they were defined by the OEMs who licensed MS-DOS for distribution with their hardware, not by Microsoft or a standards organization. Examples include:

100 – Hebrew hardware fontpage (Not from IBM; HDOS)^[15]
111 – Greek (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18])
112 – Turkish (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18])
113 – Yugoslavian (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18])
151 – Nafitha Arabic (Not from IBM; ADOS)
152 – Nafitha Arabic (Not from IBM; ADOS)
161 – Arabic (Not from IBM; ADOS)^[15]
162 – Arabic (Not from IBM; ADOS)
163 – Arabic (Not from IBM; ADOS)^[15]
164 – Arabic (Not from IBM; ADOS)
165 – Arabic (Not from IBM; ADOS)^[15]
166 – IBM Arabic PC (ADOS)^[15]
210 - Greece (NEC Jetmate printers)
220 – Spanish (Not from IBM)
437 – Original IBM PC hardware code page
667 – Polish (Mazovia) (Not from IBM)
668 – Polish (Not from IBM)
708 – Arabic (ASMO 708)
709 – Arabic (Not from IBM; ASMO 449+/BCON V4)
710 – Arabic (Transparent Arabic)
711 – Arabic (Not from IBM; Nafitha Enhanced)
720 – Arabic (Transparent ASMO)
737 – Greek
770 – Estonian, Latvian, Lithuanian
771 – Lithuanian/Cyrillic — KBL
772 – Lithuanian/Cyrillic
773 – Latin-7 — KBL
774 – Lithuanian
775 – Latin-7 Baltic RIM
776 – Lithuanian (extended CP770)
777 – Accented Lithuanian (old) (extended CP771) — KBL
778 – Accented Lithuanian (extended CP775)
790 – Polish (Mazovia)
808 – Russian/Cyrillic with euro
848 – Ukrainian with euro
849 – Belarusian with euro
850 – Multilingual / Latin-1
851 – Greek
852 – Eastern European / Latin-2
853 – Turkish / Latin-3
854 – Spanish
855 – Cyrillic
856 – Hebrew
857 – Turkish / Latin-5
858 – Multilingual / Latin-1 with euro symbol
859 – Latin-9
860 – Portuguese
861 – Icelandic
862 – Hebrew
863 – Canadian French
864 – Arabic
865 – Nordic / Danish/Norwegian
866 – Belarusian, Russian, Ukrainian / Cyrillic-2
867 – Hebrew + euro (based on CP862) (conflictive ID: NEC Czech (Kamenický))
868 – Urdu
869 – Greek
872 – Cyrillic with euro
874 – Thai with Low Tone Marks & Ancient Chars (conflictive ID with Windows 874; Windows version is IBM 1162)
881 – Latin 1 (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18]) (conflictive ID with IBM EBCDIC 881)
882 – Latin 2 (ISO 8859-2) (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18]) (conflictive ID with IBM EBCDIC 882)
883 – Latin 3 (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18]) (conflictive ID with IBM EBCDIC 883)
884 – Latin 4 (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18]) (conflictive ID with IBM EBCDIC 884)
885 – Latin 5 (Not from IBM; AST Premium Exec DOS 5.0^[16]^[17]^[18]) (conflictive ID with IBM EBCDIC 885)
891 – Korean
895 – Czech (Kamenický), (Not from IBM; conflictive ID with IBM CP895 — 7-bit EUC Japanese Roman)
900 – Cyrillic (Russian MS-DOS 5.0 LCD.CPI)
912 – Eastern Europe / Latin 2 (ISO 8859-2) (PC DOS 7.0)
915 – Cyrillic / Latin (ISO 8859-5) (PC DOS 7.0)
932 – Japanese (DOS/V) (DBCS) (conflictive ID with Windows 932; Windows version is IBM 943)
934 – Korean (DOS/V) (DBCS)
936 – ANSI/OEM Simplified Chinese (gb2312) (DOS/V) (DBCS) (conflictive ID with Windows 936; Windows version is IBM 1386)
938 – Traditional Chinese (DOS/V, OS/2)
942 – Japanese SAA (OS/2)
943 – Japanese (Windows CP 932)
944 – Korean SAA (OS/2)
948 – Traditional Chinese SAA (OS/2)
949 – Korean (Unified Hangul / Extended Wansung (ks_c_5601-1987)) (conflictive ID with Windows 949; Windows version is IBM 1363)
950 – Traditional Chinese (Big5 encoding)
966 – Saudi Arabian (Not from IBM)
991 – Polish (Mazovia) (Not from IBM)
1098 – Farsi
1116 – Estonian
1117 – Latvian
1118 – Lithuanian
1119 – Lithuanian/Cyrillic
1125 – Ukrainian
1131 – Belarusian

When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but newer encoding systems, in particular Unicode, are encouraged for new designs.
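
Supporting such legacy data in a new design usually means converting it to Unicode at the boundary. The following is a minimal sketch, assuming the source code page is known in advance and that Python's standard `cp437` codec is an acceptable stand-in for the hardware character set; the function name `transcode_to_utf8` is hypothetical.

```python
def transcode_to_utf8(data: bytes, source_code_page: str) -> bytes:
    """Decode legacy bytes with a known code page and re-encode as UTF-8."""
    return data.decode(source_code_page).encode("utf-8")


if __name__ == "__main__":
    # Three bytes that code page 437 uses for the top edge of a box.
    legacy = b"\xc9\xcd\xbb"
    converted = transcode_to_utf8(legacy, "cp437")
    print(converted.decode("utf-8"))
    print(len(legacy), "bytes in,", len(converted), "bytes out")
```

If the wrong source code page is supplied, the conversion still succeeds for most single-byte code pages but produces the wrong characters, which is why the code page has to be recorded alongside the data.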

Code page 819 is identical to Latin-1, ISO/IEC 8859-1, and with slightly-modified commands, permits MS-DOS machines to use that encoding. It was used with IBM AS/400 minicomputers.

DOS code pages are typically stored in .CPI files.^[19]^[20]^[21]^[22]^[23]

Windows (ANSI) code pages

Microsoft defined a number of code pages known as the ANSI code pages (as the first one, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but often rearranged to make them closer to 1252.

874 – Windows Thai
1250 – Windows Central and East European Latin 2
1251 – Windows Cyrillic
1252 – Windows West European Latin 1
1253 – Windows Greek
1254 – Windows Turkish
1255 – Windows Hebrew
1256 – Windows Arabic
1257 – Windows Baltic
1258 – Windows Vietnamese
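
The difference between code page 1252 and ISO 8859-1 in the range 0x80-0x9F, described at the start of this section, can be inspected directly. This sketch uses Python's `cp1252` and `latin_1` codecs; the helper name `printable_extras` is hypothetical, and the set of unassigned bytes reflects Python's mapping table, which may differ from the behaviour of Windows itself.

```python
def printable_extras(code_page: str = "cp1252") -> dict:
    """Map each byte in 0x80-0x9F to the character the code page assigns.

    Bytes that the codec leaves unassigned are skipped.
    """
    extras = {}
    for value in range(0x80, 0xA0):
        try:
            extras[value] = bytes([value]).decode(code_page)
        except UnicodeDecodeError:
            continue  # no character assigned to this byte in the codec
    return extras


if __name__ == "__main__":
    for value, char in printable_extras().items():
        # latin_1 maps the same byte to a C1 control code instead.
        control = bytes([value]).decode("latin_1")
        print(f"0x{value:02X}: cp1252 {char!r}  latin_1 {control!r}")
```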

Microsoft recommends new applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.^[24]

DBCS code pages

These code pages represent DBCS character encodings for various CJK languages. In Microsoft operating systems, these are used as both the "OEM" and "ANSI" code page for the applicable locale.

932 – Supports Japanese
936 – GBK supports Simplified Chinese
949 – Supports Korean
950 – Supports Traditional Chinese
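
In these code pages an ASCII character still occupies one byte while a CJK character occupies two, which can be observed by measuring the encoded length of each character. The sketch assumes that Python's `cp932`, `cp936`, `cp949` and `cp950` codecs correspond to the four code pages above; the helper name `byte_widths` is hypothetical.

```python
def byte_widths(text: str, code_page: str) -> list:
    """Return the number of bytes each character occupies in the code page."""
    return [len(char.encode(code_page)) for char in text]


if __name__ == "__main__":
    samples = {
        "cp932": "A\u65e5",   # Latin A, then a Japanese kanji
        "cp936": "A\u4e2d",   # Latin A, then a Chinese character
        "cp949": "A\ud55c",   # Latin A, then a Hangul syllable
        "cp950": "A\u4e2d",   # Latin A, then a Chinese character
    }
    for code_page, text in samples.items():
        print(code_page, byte_widths(text, code_page))
```

Software that assumes one byte per character, as is safe for the single-byte code pages, will therefore miscount or split characters in DBCS text.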

Apple related code pages

1275 – Apple, Latin 1
1280 – Apple Greek
1281 – Apple Turkish
1282 – Apple Central European
1283 – Apple Cyrillic
1284 – Apple, Croatian
1285 – Apple, Romanian
1286 – Apple, Icelandic
10000 - Apple Macintosh Roman
10001 - Apple Japanese
10002 - Apple Chinese (traditional) (BIG-5)
10003 - Apple Korean
10004 - Apple Arabic
10005 - Apple Hebrew
10006 - Apple Greek
10007 - Apple Macintosh Cyrillic
10008 - Apple Chinese (simplified) (GB 2312)
10010 - Apple Romanian
10017 - Apple Ukrainian
10021 - Apple Thai
10029 - Apple Macintosh Central Europe / Roman II
10079 - Apple Icelandic
10081 - Apple Turkish
10082 - Apple Croatian

Various other Microsoft code pages

The following code page numbers are specific to Microsoft Windows. IBM may use different numbers for these code pages.

1276 – Adobe (PostScript) Standard Encoding
1277 – Adobe (PostScript) Latin 1
20000
20001
20002
20003
20004
20005
20261
20269
20273
20277
20278
20284
20285
20290
20297
20420
20423
20424
20833
20838
20866 - KOI8-R
20871
20880
20905
20924
20932
20936
20949
21025
21027
21866 - KOI8-U

ISO/IEC 10646 / Unicode code pages

Unicode is the most recommended encoding for modern applications.

1200 – UTF-16LE Unicode (little-endian)
1201 – UTF-16BE Unicode (big-endian)
1400 – ISO 10646 UCS-BMP (Based on Unicode 6.0)
1401 – ISO 10646 UCS-SMP (Based on Unicode 6.0)
1402 – ISO 10646 UCS-SIP (Based on Unicode 6.0)
1414 – ISO 10646 UCS-SSP (Based on Unicode 4.0)
1445 – IBM AFP PUA No. 1
1446 – ISO 10646 UCS-PUP15 (Based on Unicode 4.0)
1447 – ISO 10646 UCS-PUP16 (Based on Unicode 4.0)
1448 – UCS-BMP (Generic UDC)
1449 – IBM default PUA
65000 – UTF-7 Unicode
65001 – UTF-8 Unicode
65520 – Empty Unicode Plane

Miscellaneous

(number missing) – MIK supports Bulgarian and Russian as well

List of code page assignments

List of known code page assignments (incomplete):

ID	Names	Description	Origin	Platform	DOS	OS/2	Windows	Mac	Else	Encoding	Comment
0	N/A	Reserved	IBM, Microsoft	N/A	3.3+	1.0+	?	?	?		Internal OS use^[15]
437	CP437, IBM437	PC US	IBM^[25]	IBM PC	3.3+	1.0+	Yes	?	Yes	8-bit SBCS
57344 - 61439	N/A	Private use derivations	IBM	N/A	N/A	N/A	N/A	N/A	N/A	various	Private use code page derivations (E000h-EFFFh)
65280 - 65533	N/A	Private use definitions	IBM	N/A	N/A	N/A	N/A	N/A	N/A	various	Private use code page definitions (FF00h-FFFDh)
65534	N/A	Reserved	IBM, Microsoft	N/A	?	?	?	?	?	various	Internal OS use (FFFEh)
65535	N/A	Reserved	IBM, Microsoft	N/A	3.3+	1.0+	?	?	?	various	Internal OS use (FFFFh)^[15]

Criticism

Many older character encodings (unlike Unicode) suffer from several problems. Some code page vendors insufficiently document the meaning of all code point values, which makes it harder to handle textual data consistently across different computer systems. Some vendors add proprietary extensions to some code pages to add or change certain code point values; for example, byte 0x5C in Shift JIS can represent either a backslash or a yen currency symbol depending on the platform. Finally, in order to support several languages in a program that does not use Unicode, the code page used for each string/document needs to be stored.
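
The last point, that the code page has to travel with each string, can be sketched with a small record type. The class name `TaggedText` and its fields are hypothetical; the example assumes Python's `cp1251` and `cp1253` codecs for Windows Cyrillic and Windows Greek.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Raw bytes together with the code page needed to interpret them."""

    data: bytes
    code_page: str  # Python codec name, for example "cp1251"

    def to_str(self) -> str:
        """Decode the bytes using the stored code page."""
        return self.data.decode(self.code_page)


if __name__ == "__main__":
    # The same byte means different letters depending on the stored tag.
    cyrillic = TaggedText(b"\xcf", "cp1251")
    greek = TaggedText(b"\xcf", "cp1253")
    print(cyrillic.to_str(), greek.to_str())
```

Without the tag, the two records above would be indistinguishable, and a program could only guess which alphabet was intended.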

Due to Unicode's extensive documentation, vast repertoire of characters, and character stability policy, the problems listed above are rarely a concern for Unicode. Applications may also mislabel text in Windows-1252 as ISO-8859-1. Fortunately, the only difference between these code pages is that the code point values used by ISO-8859-1 for control characters are instead used as additional printable characters in Windows-1252. Since control characters have no function in HTML, web browsers tend to use Windows-1252 rather than ISO-8859-1. In HTML5, treating ISO-8859-1 as Windows-1252 is even codified as standard. UTF-8 has since overtaken both encodings in popularity on the Internet.^[26]^[27]

Private code pages

When, early in the history of personal computers, users didn't find their character encoding requirements met, private or local code pages were created using Terminate and Stay Resident utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., CP895).

When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another example is the Iran System encoding standard, which was created by the Iran System corporation for Persian language support. This standard was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.

In order to overcome such problems, the IBM Character Data Representation Architecture level 2 specifically reserves ranges of code page IDs for user-definable and private-use assignments. Whenever such code page IDs are used, the user must not assume that the same functionality and appearance can be reproduced in another system configuration or on another device or system unless the user takes care of this specifically. The code page range 57344-61439 (E000h-EFFFh) is officially reserved for user-definable code pages (or actually CCSIDs in the context of IBM CDRA), whereas the range 65280-65533 (FF00h-FFFDh) is reserved for any user-definable "private use" assignments. For example, a non-registered custom variant of code page 437 (1B5h) or 28591 (6FAFh) could become 57781 (E1B5h) or 61359 (EFAFh), respectively, in order to avoid potential conflicts with other assignments and maintain the sometimes existing internal numerical logic in the assignments of the original code pages. An assignment in the private range, such as 65280 (FF00h), could be used for an unregistered private code page not based on an existing code page, for a device-specific code page such as a printer font that just needs a logical handle to become addressable for the system, for a frequently changing download font, or for a code page number with a symbolic meaning in the local environment.

The code page IDs 0, 65534 (FFFEh) and 65535 (FFFFh) are reserved for internal use by operating systems such as DOS and must not be assigned to any specific code pages.
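
The ranges and the two worked examples above can be checked with a few lines of arithmetic. In the examples the derived ID keeps the low three hexadecimal digits of the original ID and replaces the leading digit with E; treating that as a general rule is an assumption of this sketch, not something the text states. The names `classify` and `private_derivation` are hypothetical.

```python
RESERVED_IDS = {0, 0xFFFE, 0xFFFF}
PRIVATE_DERIVATIONS = range(0xE000, 0xF000)   # 57344-61439
PRIVATE_DEFINITIONS = range(0xFF00, 0xFFFE)   # 65280-65533


def classify(code_page_id: int) -> str:
    """Classify a 16-bit code page ID by the ranges described in the text."""
    if not 0 <= code_page_id <= 0xFFFF:
        raise ValueError("code page IDs are 16-bit numbers")
    if code_page_id in RESERVED_IDS:
        return "reserved"
    if code_page_id in PRIVATE_DERIVATIONS:
        return "private derivation"
    if code_page_id in PRIVATE_DEFINITIONS:
        return "private definition"
    return "other"


def private_derivation(base_id: int) -> int:
    """Keep the low 12 bits of base_id and set the leading hex digit to E.

    Assumed rule, inferred from the 437 and 28591 examples in the text.
    """
    return 0xE000 | (base_id & 0x0FFF)


if __name__ == "__main__":
    for base in (437, 28591):
        derived = private_derivation(base)
        print(f"{base} ({base:X}h) -> {derived} ({derived:X}h)")
```

Note that the assumed rule discards the leading digit, so two base IDs that differ only in that digit would map to the same private ID.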

References

[1] IBM i Globalization - EBCDIC Code Pages
[2] "Code Page". sap.com.
[3] "Glossary". oracle.com.
[4] The MS-DOS Encyclopaedia, Microsoft press (1988, ISBN 1-55615-049-0, ISBN 978-1-55615-049-4)
[5] "Code Page Identifiers". microsoft.com. Microsoft.
[6] "VGA/SVGA Video Programming--VGA Text Mode Operation". osdever.net.
[7] xlate - Transliterate Contents of Records, IBM Corporation, 2010 [1986], retrieved 2016-10-18
[8] "Code Page Identifiers". Microsoft Developer Network. Microsoft. 2014. Archived from the original on 2016-06-19. Retrieved 2016-06-19.
[9] "Web Encodings - Internet Explorer - Encodings". WHATWG Wiki. 2012-10-23. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[10] Foller, Antonin (2014) [2011]. "Western European (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[11] Foller, Antonin (2014) [2011]. "German (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[12] Foller, Antonin (2014) [2011]. "Swedish (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[13] Foller, Antonin (2014) [2011]. "Norwegian (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[14] Foller, Antonin (2014) [2011]. "US-ASCII encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[15] Paul, Matthias (2002-09-05), Technical info on undocumented DOS country info for LCASE, ARAMODE and CCTORC records, FreeDOS development list fd-dev at Topica, archived from the original on 2016-05-27, retrieved 2016-05-26
[16] Brown, Ralf (2002-12-29). "The x86 Interrupt List". Retrieved 2011-10-14.
[17] Paul, Matthias (1997-07-30). NWDOS-TIPs — Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds. MPDOSTIP (e-book) (in German) (edition 3, release 157 ed.). Archived from the original on 2016-05-22. Retrieved 2012-01-11. NWDOSTIP.TXT is a comprehensive work on Novell DOS 7 and OpenDOS 7.01, including the description of many undocumented features and internals. It is part of the author's yet larger MPDOSTIP.ZIP collection maintained up to 2001 and distributed on many sites at the time. The provided link points to a HTML-converted older version of the NWDOSTIP.TXT file.
[18] Paul, Matthias (2001-04-09). NWDOS-TIPs — Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds. MPDOSTIP (e-book) (in German) (edition 3, release 183 ed.).
[19] Paul, Matthias (2001-06-10) [1995]. "Format description of DOS, OS/2, and Windows NT .CPI, and Linux .CP files" (CPI.LST file) (1.30 ed.). Archived from the original on 2016-04-20. Retrieved 2016-08-20.
[20] Elliott, John (2006-10-14). "CPI file format". Archived from the original on 2016-09-22. Retrieved 2016-09-22.
[21] Brouwer, Andries Evert (2001-02-10). "CPI fonts". 0.2. Archived from the original on 2016-09-22. Retrieved 2016-09-22.
[22] Haralambous, Yannis (September 2007). Fonts & Encodings. Translated by Horne, P. Scott (1st ed.). Sebastopol, California, USA: O'Reilly Media, Inc. p. 601–602, 611. ISBN 978-0-596-10242-5. ISBN 0-596-10242-9.
[23] MS-DOS Programmer's Reference. Microsoft Press. 1991. ISBN 1-55615-329-5.
[24] "Code Pages". microsoft.com. Microsoft.
[25] IBM. "SBCS code page information document - CPGID 00437". Retrieved 2014-07-04.
[26] "Usage Statistics of Character Encodings for Websites, (updated daily)". w3techs.com. Retrieved 6 August 2015.
[27] "UTF-8 Usage Statistics". trends.builtwith.com. Retrieved 28 March 2011.

External links

IBM CDRA glossary
IBM code pages
IBM code pages by encoding scheme
IBM/ICU Charset Information
Microsoft Code Page Identifiers (Microsoft's list contains only code pages actively used by normal apps on Windows. See also Torsten Mohrin's list for the full list of supported code pages)
Shorter Microsoft list containing only the ANSI and OEM code pages but with links to more detail on each
Character Sets And Code Pages At The Push Of A Button
Microsoft Chcp command: Display and set the console active code page

This article is issued from Wikipedia - version of the 12/1/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.