Appendix:Control characters

Besides alphabetic characters and symbols, Unicode also includes a variety of control characters with no graphical representation. While some of these are actually used to modify certain characters, others are largely disused remnants from older computing systems.

C0 (ASCII and derivatives)
C0 control codes are in the Unicode range U+0000-U+001F, and were inherited from the ASCII standard. Often, these cannot be input directly because they fulfill specific low-level functions in the operating system.
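
Because the C0 set is a single contiguous range, stray control codes in a string can be located with a simple range check. The following Python sketch illustrates this; the function name `find_c0_controls` is hypothetical and chosen only for this example.

```python
def find_c0_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') pairs for every C0 control code in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if 0x0000 <= ord(ch) <= 0x001F  # the C0 range described above
    ]


# A tab, a bell and a line feed hidden among ordinary letters.
print(find_c0_controls("a\tb\x07z\n"))
```

Note that this checks only the C0 range itself; it does not report U+007F or the C1 codes.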

C1 set
C1 control codes are in the Unicode range U+0080-U+009F, and were inherited from the ISO 8859 series of standards. Nowadays, these control codes are rarely if ever used for their intended purpose. Their presence in a text usually indicates that text in another character set, such as Windows-1252, has been decoded incorrectly.
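
The Windows-1252 case can be reproduced in Python. In this sketch, bytes that Windows-1252 assigns to curly quotation marks turn into C1 control codes when decoded as ISO 8859-1 (`latin-1`). The helper names `has_c1_controls` and `repair` are hypothetical, and `repair` assumes that every character in its input is below U+0100 and that no byte is one of the few values Windows-1252 leaves undefined; otherwise it raises an error.

```python
raw = b"\x93quoted\x94"  # 0x93 and 0x94 are curly quotes in Windows-1252

as_latin1 = raw.decode("latin-1")  # each byte becomes the code point of the same value
as_cp1252 = raw.decode("cp1252")   # the interpretation the author intended


def has_c1_controls(text: str) -> bool:
    """True if text contains any code point in U+0080-U+009F."""
    return any(0x0080 <= ord(ch) <= 0x009F for ch in text)


def repair(text: str) -> str:
    """Undo a latin-1 misreading of Windows-1252 bytes (sketch only)."""
    return text.encode("latin-1").decode("cp1252")


print(has_c1_controls(as_latin1))  # the misdecoded string contains C1 codes
print(has_c1_controls(as_cp1252))  # the correctly decoded string does not
print(repair(as_latin1) == as_cp1252)
```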

Unicode control characters
Unicode control characters are control characters which do not occur in the C0 or C1 sets. While the C0 and C1 control characters have the general category Cc (Other, Control), control characters exclusive to Unicode are mostly in the category Cf (Other, Format). Variation selectors are an exception: they are classified as Mn (Mark, Nonspacing).
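
The general category of a character can be looked up with Python's standard `unicodedata` module. The sample characters below are chosen only for illustration, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Label -> sample character; the selection is illustrative only.
samples = {
    "U+0009 (C0, character tabulation)": "\u0009",
    "U+0085 (C1, next line)": "\u0085",
    "U+200D (zero width joiner)": "\u200d",
    "U+E0001 (language tag)": "\U000E0001",
    "U+FE0F (variation selector-16)": "\ufe0f",
}

# Map each label to its two-letter general category code.
categories = {label: unicodedata.category(ch) for label, ch in samples.items()}

for label, category in categories.items():
    print(f"{label}: {category}")
```

The C0 and C1 samples report Cc, the zero width joiner and the language tag report Cf, and the variation selector reports Mn, since variation selectors are classified as nonspacing marks rather than format characters.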

Language tags
Language tags are encoded in the Unicode range U+E0000-U+E007F. These were added as a way to mark the language of a text without having to resort to higher-level mechanisms. They were intended specifically for Chinese, to allow a text to indicate unambiguously whether the simplified or traditional form of a Chinese character is meant. However, these characters never caught on, and their use for language tagging is now deprecated by Unicode.

Example:
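
As an illustration of the mechanism, a language tag can be assembled in Python from U+E0001 LANGUAGE TAG followed by tag characters, each of which is an ASCII character of the language code shifted up by 0xE0000. The function names and the sample code "zh-Hant" are assumptions made for this sketch, which is not a recommended practice, since the mechanism is deprecated.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag character = ASCII character + 0xE0000


def make_language_tag(code: str) -> str:
    """Encode an ASCII language code as an invisible tag sequence."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def read_language_tag(tagged: str) -> str:
    """Recover the ASCII text spelled out by tag characters U+E0020-U+E007E."""
    return "".join(
        chr(ord(c) - TAG_OFFSET)
        for c in tagged
        if 0xE0020 <= ord(c) <= 0xE007E
    )


tag = make_language_tag("zh-Hant")
print(" ".join(f"U+{ord(c):04X}" for c in tag))  # the invisible code points
print(read_language_tag(tag))
```

Printing the code points rather than the string itself is deliberate, because the tag characters have no visible form.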

Variation selectors
Variation selectors are encoded in three Unicode ranges scattered across the planes. A variation selector follows a base character and requests one particular form of it, in cases where specifying the form is appropriate. The variations are not arbitrary, but specifically listed in the Unicode standard. There are currently three sets of variation selectors:


 * Variation selectors 1-16 are encoded in the U+FE00-U+FE0F block. Only the first three and the last two are actually used in Unicode. The first (VARIATION SELECTOR-1, U+FE00) is used for many different characters from several scripts, while the second (VARIATION SELECTOR-2, U+FE01) and third (VARIATION SELECTOR-3, U+FE02) are used only for variant forms of Han ideographs. The last two specify the text (monochrome) or emoji form of a character: VARIATION SELECTOR-15 (U+FE0E) selects the text form, while VARIATION SELECTOR-16 (U+FE0F) selects the emoji form.
 * Variation selectors 17-256 are encoded in the U+E0100-U+E01EF block. This set is intended to define special variants of Han ideographs.
 * The Mongolian block has its own set of variation selectors (the free variation selectors, beginning at U+180B), which are used orthographically in that script.
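
The numbering of the two general-purpose ranges listed above maps directly onto code points, which the following Python sketch makes explicit. The function name `variation_selector` is hypothetical, and U+2764 HEAVY BLACK HEART is used only as a sample base character that has both a text and an emoji form. Whether the two sequences actually look different depends on the fonts and renderer in use.

```python
def variation_selector(n: int) -> str:
    """Return VARIATION SELECTOR-n for n in 1..256 (Mongolian selectors excluded)."""
    if 1 <= n <= 16:
        return chr(0xFE00 + (n - 1))    # U+FE00-U+FE0F
    if 17 <= n <= 256:
        return chr(0xE0100 + (n - 17))  # U+E0100-U+E01EF
    raise ValueError("variation selector number must be between 1 and 256")


HEART = "\u2764"  # sample base character

text_heart = HEART + variation_selector(15)   # requests the text form
emoji_heart = HEART + variation_selector(16)  # requests the emoji form

for sequence in (text_heart, emoji_heart):
    print(" ".join(f"U+{ord(c):04X}" for c in sequence))
```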