Wiktionary:About Han script

There are a number of entries on Chinese characters, which are used in the People's Republic of China (simplified Chinese), the Republic of China (Taiwan) (traditional Chinese), Japan (kanji), and Korea (hanja).

Chinese characters were formerly used for Vietnamese (chữ Nôm) and have been used for minority languages in China, namely Bai, Dong, Miao, and Zhuang (the last using significant variants). Siniform scripts were also used for the extinct Khitan, Jurchen, and Tangut languages.

In addition to laying out a standard format, this page recommends standards on writing (creating or editing) these pages.

Recommendations

 * Use to wrap Chinese character text, specifying the language and using   (traditional Chinese),   (simplified Chinese),   (Japanese), or   (generic Chinese) as the script, or characters may display using inappropriate fonts and forms.
 * Due to Han unification, a given character does not have a language attached. As a result, if the language is not explicitly defined, browsers may fall back on fonts that are inappropriate for a given text. Chinese words may be jarringly displayed using a Japanese font with insufficient coverage for Chinese (resulting in two fonts with different styles being used at once), or Japanese terms may be displayed using a Chinese font whose character forms deviate from the Japanese norm. To avoid this, the wrapping template tells the browser that "this is Chinese text, use a Chinese font" or "this is Japanese text, use a Japanese font".
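
The underlying mechanism can be illustrated outside the wiki. The following is a minimal Python sketch, not the implementation of any Wiktionary template: it wraps text in an HTML span whose lang attribute lets the browser pick a suitable font. The function name tag_text and the mapping from script codes to language tags are assumptions made for illustration.

```python
from html import escape

# Assumed mapping from ISO 15924 script codes to BCP 47 language tags.
# The real template may emit different attributes; this is illustrative only.
LANG_TAGS = {
    "Hant": "zh-Hant",  # traditional Chinese
    "Hans": "zh-Hans",  # simplified Chinese
    "Jpan": "ja",       # Japanese
    "Hani": "zh",       # generic Chinese
}


def tag_text(text, script):
    """Wrap text in a span that declares its language (hypothetical helper)."""
    lang = LANG_TAGS[script]
    return f'<span lang="{lang}">{escape(text)}</span>'


print(tag_text("読", "Jpan"))
print(tag_text("读", "Hans"))
```

Without such a declaration the same code point is rendered with whatever fallback font the browser chooses, which is the problem described above.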

Entry layout
Chinese characters are both characters and the spellings of words in various languages. Thus entries for Chinese characters:
 * begin with a "Translingual" section on the character itself, and
 * then include entries in each language that uses them.
 * The language-specific sections themselves ("Chinese", "Japanese", "Korean", etc.) begin with a section on the character itself (except for Chinese), then include a part of speech section if the character can be used in isolation (for example, can be used as a normal noun in Japanese). The part of speech sections follow WT:ELE, but the character sections have specific formats (reading, eumhun, compounds, etc.), as detailed below.

Theoretical code for a most basic entry is shown below, using.

Categories of characters
Chinese characters have been used in a number of languages and regions in the, and thus have a great deal of variation.
 * Many characters are used in the same form, across all languages;
 * some characters only exist in one language or another;
 * and other characters have different forms in different languages.
 * In, many are considered variants not worth encoding separately (see ), some are encoded separately to preserve backwards compatibility with legacy encodings, and some appear identical but have different stroke orders.
 * There are also handwritten simplifications (Japanese, Korean ) which may or may not be encoded.

It is useful to indicate both:
 * What categories a character falls into
 * What, if any, a character has

Most basically, there are s, and two major simplifications: s (Chinese) and (Japanese).

In more detail:
 * Chinese
 * s
 * s
 * Japanese
 * – generally identical to traditional
 * – sometimes the same as simplified, sometimes different; see Category:CJKV characters simplified differently in Japan and China
 * – Japanese characters coined in Japan (sometimes borrowed into China) (see Category:Japanese-only CJKV Characters)
 * , or – simplification of kyūjitai forms outside the ; primarily used by the newspaper
 * ghost kanji (ja) – non-existent characters erroneously included in
 * Note that and  are written in shinjitai, while  are generally written in kyūjitai, or rarely in extended shinjitai.
 * Korean
 * Generally the same as traditional.
 * coined in Korea (rare) (see Category:Korean-only CJKV Characters)
 * Vietnamese
 * Generally the same as traditional.
 * chữ Nôm coined in Vietnam (see Category:Vietnamese-only CJKV Characters)

Variant forms
A character may have multiple forms. One should:
 * Indicate the variant forms, both by a hatnote and via a  (Translingual) or  (Chinese) template.
 * Not use inappropriate forms. For instance, do not write Japanese words in simplified Chinese (if the form differs from the shinjitai form).

For instance, the character for "reading" has 3 forms, as in the box at right. This box is produced by the template.

Headings
There are a number of templates which help with the layout, which are listed below.

The only thing that should come before the “Translingual” heading is, if necessary, a hatnote for similar characters which may be confused (such as  and ), and for variant forms; plus.

(Stroke order)
A stroke order diagram may be displayed using the template, with parameter strokes= for sizing. All existing diagrams can be found at commons:Category:CJK stroke order. Do not create a separate “Stroke order” section.

Caveat: Different stroke orders
Beware that, just as some characters have different forms in different languages, some characters have different stroke orders in different languages.

There are potentially 3 (or more) different stroke orders, but these very often coincide:
 * Traditional Chinese, used historically, and in Taiwan and Hong Kong
 * Note that there are some differences between modern Taiwanese and Hong Kong standards and actual historical practice; for example, differs in Taiwan, while  and  differ in Hong Kong.
 * Japanese
 * e.g., ,.
 * Simplified Chinese, used in mainland China
 * Also known as Modern Chinese; some characters were not simplified, but their stroke order was changed.

There are also Korean and Vietnamese stroke orders and character forms, but modern Korea generally uses Japanese conventions, and Vietnamese is only of historical interest, hence relatively unimplemented.

For instance, is different in Chinese and Japanese, while the radical  (and thus all derived characters) differs in simplified (and Taiwan) and (historical) traditional Chinese.

When simplified and traditional Chinese stroke orders differ, Japanese and simplified Chinese coincide. There are apparently no examples where all three share the same form but different stroke orders, though there are examples where the form differs in all three.


 * defaults to Simplified Chinese.
 * If there are multiple stroke order diagrams available, please include all forms.
 * To include Chinese and Japanese, use the parameter  for the Japanese stroke order: see.
 * To include traditional and simplified forms, you must currently do so manually: see.

Etymology
If possible, include an “Etymology” section explaining the form of the character, listing earlier forms, and explaining the development of the character form.

Beware that there are many folk etymologies based on analyses of modern forms, with many dating to the 2nd century CE (when present forms largely stabilized)! Modern scholarship based on oracle bone script often provides different etymologies. See References.

Please do not include discussion of the etymology of the word (often Old Chinese) that the character was developed to represent; this belongs in the language-specific section. The Translingual "Etymology" section should not include pronunciation information, except when necessary to understand the form. This occurs for example in phono-semantic compounds, where reconstructions of the pronunciations of the compound character and its phonetic are relevant to the form, but sound is completely irrelevant to pictographs and ideographs. Reconstructed pronunciations should be cited and follow the usual rules for historical Sinitic languages – see About Old Chinese and About Middle Chinese for guidelines, and for an example.

Most characters were coined during the Old Chinese period; this needn’t be explicitly mentioned, but can be stated if helpful. If a character was not coined during the Old Chinese period (notably Middle Chinese or foreign coinages, especially Japanese, and some Korean and Vietnamese), this should be mentioned.

Simplified and Shinjitai
For simplified Chinese and shinjitai characters, the "Etymology" section should simply link to the traditional Chinese or kyūjitai form and explain the method of simplification, as in Simplified Chinese characters: Methods of simplification and Shinjitai: Methods of simplifying Kanji. This can be done using the template, which also categorizes.

Traditional and coinages
For traditional Chinese and country-specific coinages, the "Etymology" section should:
 * Classify composition (see ). One should provide traditional classification using the template  and break up compound characters via . Note that:
 * The overwhelming majority of Chinese characters (90%+) are phono-semantic compounds.
 * Beware of folk etymologies based on current forms (especially claims that a character is an ideogrammic compound) – the current form is often a simplification of an older form, which may not be related to the current components. For instance, the lower part of is cognate to, not to , which it more closely resembles.
 * Show previous forms. These are collected at Wikimedia Commons, and the template will display them if they exist.
 * Note that older forms themselves had variants, which need not be exhaustively displayed.

Han character
The main section is the “Han character” section, using the template, which includes radical, stroke count, and various input methods.

Previously, this was followed by definitions (still under “Han character”). This practice is deprecated and should not be used. Definitions should be placed under the language heading (for example, “Japanese”).

This should also include a “Reference” section, using, which links to the character in various standard dictionaries, and includes the Unicode number (linking to Unihan in the process).

Etymology
(Explanation of form; ideally shows earlier forms.)

Compounds
Compounds and idioms involving a character (熟語) are listed language by language, since they vary between languages.

List compounds using a suitable template from Category:Column templates, generally or  if only listing compounds, or  or  if also providing a gloss. The entry 水 is an excellent example.

Compounds should be collated by radical-and-stroke sorting; for order of radicals, see Appendix:Chinese radical. However, as per About Japanese, compounds that begin with the character should come first.
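
The collation rule above can be sketched as follows. This is a minimal illustration, not a tool used on the wiki: the RADICAL_STROKE table is a tiny hand-made sample (radical number, residual stroke count) covering only the characters in the example, and the names sort_key and collate are hypothetical. A real list would need a full data source such as the Unihan database.

```python
# Sample data only: character -> (Kangxi radical number, residual strokes).
# Each character here is itself a radical, so the residual count is 0.
RADICAL_STROKE = {
    "口": (30, 0),
    "山": (46, 0),
    "水": (85, 0),
    "火": (86, 0),
    "風": (182, 0),
}


def sort_key(compound, headword):
    """Compounds starting with the headword sort first, then by
    radical-and-stroke order of each character in turn."""
    starts_with_headword = 0 if compound.startswith(headword) else 1
    return (starts_with_headword, [RADICAL_STROKE[ch] for ch in compound])


def collate(compounds, headword):
    return sorted(compounds, key=lambda c: sort_key(c, headword))


print(collate(["風水", "水火", "山水", "水口"], "水"))
```

Comparing character by character is one reasonable reading of "radical-and-stroke sorting" for multi-character compounds; other tie-breaking conventions are possible.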

As per About Japanese, terms involving a character should be listed in an L4 section called “Compounds”; by contrast, in the entry for a compound of two or more characters, longer compounds should be listed under “Derived terms”.

A separate L4 section called “Names” should contain any common names constructed from the character, even if such names duplicate a compound word.

Note that some pages list compounds as “Derived terms” in the “part of speech” section: contrast 日 and 天.

Compound entry
Some general considerations apply on the page for a compound (two or more Chinese characters).

As above, longer compounds (containing a given compound) should be in a section called “Derived terms”.

If one compound is obtained from another by re-arranging the characters, such as and, it is useful to link these; the “Related terms” section fits best, presuming an etymological connection.

Chinese
For the layout of the “Chinese” section, see About Chinese. The following is an example:

Japanese
In addition to L3 part of speech headings, Japanese entries for a Chinese character have an L3 heading called “Kanji”, which has an L4 heading called “Readings”, which can use the template. This currently supports the usual on, kun, and (rarer) nanori readings, but also nazuke and 呉音 (go-on) readings.

See About Japanese for more on the format of Japanese entries.

Korean
There should be an L3 heading for “Hanja”, beginning with the eumhun (meaning/reading), which can be obtained by the template. This also supports the following romanizations, via the respective parameters: Revised Romanization of South Korea (ehrv), McCune-Reischauer (ehmr), Yale Romanization of Korean (ehy).

Next there should be an L4 heading “Compounds”; in addition to the hanja form, it should also include hangeul forms for all words.

Vietnamese
Currently, the vast majority of Vietnamese character entries indicate Hán-Việt readings and omit Nôm readings. The layout has not been standardized, though most have a single L3 heading, "Han character", with below it.

Works in chữ Nôm are quoted in the part of speech section using the template. Any quốc ngữ works should be quoted in the corresponding quốc ngữ entry.

Note that most Nôm text includes characters not yet encoded in Unicode. Most Nôm sources make use of Private Use Area characters that are found in various Nôm fonts. Do not use Private Use Area characters, because they will be misinterpreted by readers with different Nôm fonts installed. Instead, use Ideographic Description Sequences. (See Template:vi-ruby for an example.)
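
Before saving a quotation, it can help to check it for Private Use Area code points. The following is a minimal Python sketch of such a check, assuming only the standard unicodedata module; the function name find_private_use is hypothetical and this is not an existing wiki tool.

```python
import unicodedata


def find_private_use(text):
    """Return (index, code point) pairs for every Private Use character.

    Unicode assigns the general category "Co" to all Private Use Area
    code points, both in the BMP and in planes 15 and 16.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


# A string containing one PUA code point is flagged:
print(find_private_use("a\ue000b"))

# An Ideographic Description Sequence uses only encoded characters:
print(find_private_use("⿰口天"))
```

Any character flagged this way should be replaced with an Ideographic Description Sequence, as recommended above.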

Pronunciations and etymology generally belong in the quốc ngữ entry. Also in that entry, each headword line takes a list of characters (according to Nôm readings) as an additional parameter. Hán-Việt forms may be listed under an L3 "Readings" section using.

Proposal
There is a proposal at Beer parlour/2013/December that would do away with the current layout in favor of the following structure:


 * Character – "Han character" is avoided because it appears to exclude Nôm readings or Nôm-only characters.
 * Readings – Specify any Hán-Việt and Nôm readings using the template.
 * Compounds
 * part of speech (Noun, Verb, etc.) – Because Hán-Việt readings rarely differ from the definitions in the Translingual section, the parts of speech sections are for definitions of Nôm readings only. Use headword templates like and, listing Nôm readings in the first parameter.
 * References (if applicable)

Others
Chinese characters and similar scripts (see ) are used for languages other than those primarily discussed above.

Ryukyuan languages
Chinese characters are used for some, following the format for Japanese.

Minority languages in China
Some languages, such as Bai, Dong, Miao, and Zhuang, are now officially romanized, as Vietnamese is, but were written with Chinese characters in the past and still see limited use of them. These languages may use some significant variant characters that are not yet fully encoded in the Unicode Standard.

Although most entries in these languages are in romanized script, entries in Chinese characters may exist. They may be soft redirected to entries in romanized script using an appropriate template (e.g. ).

Other scripts
The extinct languages Khitan, Jurchen, and Tangut each used their own script, derived from Chinese characters. Characters in these scripts are not unified with Han characters, so the information on this page does not apply to entries in these languages.

Some other scripts used in China are not derived from Chinese characters, but often have borrowed from Chinese. These notably include, , , and.

Etymology

 * Xǔ Shèn 許慎/许慎. Shuōwén Jiězì “說文解字”/“说文解字” 100–121 CE – classic reference, but due to lack of access to earlier forms, has errors
 * Xu Zhongshu 徐中舒. “丁山說文闕義箋” [Commentary on the errors in Shuowen by Ding Shan]
 * 李孝定 Lĭ Xiàodìng (Lee Hsiao-ting, 1965). 甲骨文字集釋 Jiǎgǔwénzì jíshì, [Collected interpretations of oracle bone characters], 台北 Táibĕi, 南港 Nángǎng (Nankang): 中央研究院歷史語言研究所 Institute of History and Philology, Academia Sinica
 * An authoritative modern reference.