Template talk:zh-forms

rfm
I suggest moving this template to Template:Hani-forms, and keeping the old name indefinitely as a redirect.

The code "zh" is ambiguous and unwanted per consensus for a number of reasons. In particular, this template begins with "zh" (which means "Chinese", or "Mandarin", depending on how you look at it), but it is also used in other languages written in Han script, whose code is "Hani". This template serves the purpose of showing varieties of Han script, so a name beginning with that code seems to be a very natural choice.

FWIW, another high-use template whose name begins with a script code is Template:Latn-def. --Daniel 08:59, 8 June 2011 (UTC)
 * I think "zh" refers to the Chinese languages as a whole as opposed to the script. No real strong feelings; you could move it to Template:zhx-forms, for example. --Mglovesfun (talk) 12:04, 8 June 2011 (UTC)
 * If there are many good template names to be chosen, then you can consider my proposal of "Hani-forms" as completely arbitrary, but a proposal nonetheless, that I believe to be better than the current system.
 * However, I do think that "Hani-forms" is even better than "zhx-forms". The template is used with Translingual entries, which are neither Sinitic nor of any other family but are nonetheless written with Han characters. --Daniel 12:28, 8 June 2011 (UTC)

Done. Nobody objected. --Daniel 01:44, 22 June 2011 (UTC)

技巧
Please see 技巧, thanks. Wyang (talk) 00:39, 31 January 2016 (UTC)
 * What is the problem? --kc_kennylau (talk) 02:03, 31 January 2016 (UTC)
 * Word linking in trad. Wyang (talk) 02:24, 31 January 2016 (UTC)
 * Sorry, my bad. --kc_kennylau (talk) 02:25, 31 January 2016 (UTC)

Size of text in alt=
For me, it's barely legible （´∀｀；） Perhaps it's my font choices though. —suzukaze (t・c) 03:41, 5 June 2016 (UTC)
 * Fair enough. Taking the average of 70%. Wyang (talk) 03:48, 5 June 2016 (UTC)

Reduplication
It's putting and the like into Chinese reduplications. — justin(r)leung { (t...) 01:29, 26 October 2016 (UTC)
 * I originally put "This category includes any Chinese word containing two consecutive identical characters." in the description of Category:Chinese reduplications to show that this is only a category for all words with reduplicated characters. I tightened the criteria a bit to exclude sole transcriptions, and reduplications crossing component boundaries, but it may be hard to achieve the linguistic sense of reduplication automatically. Wyang (talk) 01:57, 26 October 2016 (UTC)
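
The basic category rule quoted above ("two consecutive identical characters") can be sketched as follows. This is only an illustration in Python, not the module's actual code: the function name is hypothetical, and the tightened criteria (excluding sole transcriptions and reduplications that cross component boundaries) are not modelled.

```python
def has_consecutive_identical_chars(word: str) -> bool:
    """Return True if any two adjacent characters in `word` are identical.

    Hypothetical helper: it checks only the surface form, so it cannot
    tell linguistic reduplication apart from accidental repetition.
    """
    # Iterating over a str yields whole code points, so characters outside
    # the Basic Multilingual Plane are compared as single units.
    return any(a == b for a, b in zip(word, word[1:]))
```

For instance, `has_consecutive_identical_chars("媽媽")` is true and `has_consecutive_identical_chars("技巧")` is false, which shows why a purely automatic check over-collects compared with the linguistic sense of reduplication.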

草蜢撩雞公——自尋死路; maybe a list or tracking category should be generated and the category should be applied manually. —suzukaze (t・c) 09:41, 26 October 2016 (UTC)


 * Fixed just then. Wyang (talk) 09:42, 26 October 2016 (UTC)

on 㙌
Attention needed. Wyang (talk) 22:24, 29 June 2018 (UTC)

Discussion from Talk:溍
Hi. After the most recent edit in Module:zh-forms, the zh-forms box isn't displaying the proper traditional and simplified forms for the two forms of 溍 (both encoded under the same code point) based on the language tag. Also, I think it would be preferable to add such characters manually rather than letting the module do so automatically. If you look at revision 49664286, some characters added to the list show no significant difference between traditional and simplified forms in the Unicode charts. I don't think it is necessary to split between traditional and simplified forms for characters that show only minor cosmetic differences (mostly in the stroke direction) such as 今 / 今, 氐 / 氐, 令 / 令, 艾 / 艾, 叟 / 叟, 丰 / 丰, 犮 / 犮, 壬 / 壬, 呈 / 呈 (each pair tagged zh-Hant / zh-Hans). Instead, these differences should be noted in the translingual section (either as alternative forms or in their respective ids). KevinUp (talk) 13:08, 8 June 2018 (UTC)

I think it would be much better to mark characters such as 珊 / 珊, 琤 / 琤, 猺 / 猺, 瘟 / 瘟, 莒 / 莒 (each pair tagged zh-Hant / zh-Hans) manually, as these characters are special exceptions that have been unified, whereas other derived characters of 冊 / 册, 爭 / 争, 䍃, 𥁕 / 昷, 呂 / 吕 have been disunified. It should be noted that the unification is slightly inconsistent, with frequently used characters split into separate code points whereas rarely used characters are unified. Hence, I would suggest marking such anomalies manually when encountered instead of having a list that is prone to errors when not properly checked. KevinUp (talk) 02:12, 6 July 2018 (UTC)


 * 溍 looks fine on my computer. What does it look like on your system? About the list: I do agree that it needs improvement, but I like the idea that this is done automatically. We can always update the list when needed. There are still some problems to consider: (1) not all systems have the right fonts; (2) some simplified glyph shapes are acceptable (or even standard in Hong Kong) in traditional Chinese; (3) how different is different? To me, 犮 (zh-Hant) and 犮 (zh-Hans) are different enough. — justin(r)leung { (t...) 02:30, 6 July 2018 (UTC)


 * I'm not sure if the problem still persists on your (KevinUp) computer, but I can see a trad-simp form difference on 溍, same as Justin above. Wyang (talk) 03:07, 6 July 2018 (UTC)


 * No, it's still not working for me. However, if I were to copy the code from your previous edit at 49663403 and apply it to the page, I would be able to distinguish between the two forms. Otherwise I'm only seeing the simplified forms in both boxes. KevinUp (talk) 04:40, 6 July 2018 (UTC)


 * I managed to get the characters to display correctly via this edit. Can you all check to see if the fonts are applied correctly on the devices that you are using? Thanks. KevinUp (talk) 07:56, 6 July 2018 (UTC)


 * Yes, thanks, it's still displayed correctly for me. Wyang (talk) 13:24, 6 July 2018 (UTC)


 * (1) On my system I am able to distinguish between 溍 (zh-Hant) and 溍 (zh-Hans). Before this, in edit 49666022, I was still able to see the difference. But since edit 49666028, only the simplified form is shown. (2) Can you list a few more examples where the glyph shape in Hong Kong is different from the one used in Taiwan, besides the two pairs in which one form is standard in Hong Kong/mainland China and the other is standard in Taiwan? So far I'm only aware of these two, as well as 𤏁 (zh-Hant-HK) / 𤏁 (zh-Hant-TW) and 𤇍 (zh-Hant-HK) / 𤇍 (zh-Hant-TW), which have different compositions in Hong Kong compared to Taiwan based on HKSCS 2016. In this case, adding usage notes for the respective characters would be more helpful. (3) I agree that 犮 (zh-Hant) and 犮 (zh-Hans) are different enough because there is an additional horizontal stroke in the form used in mainland China. Most of the characters that look different in mainland China due to Xin Zixing and are encoded under the same code point in Unicode should not be considered "simplified forms", as this would cause some confusion. Simplified characters should be defined as those that are found in 1956 《漢字簡化方案》, 1964 《簡化字總表》, 1988 《現代漢語通用字表》, 2013 《通用規范漢字表》 and 1956 《第一批异体字整理表》 (revised 1986, 1988, 1993). Besides this, I am of the opinion that characters which have separate code points in Unicode, or preferred forms that are encoded separately (one preferred in Taiwan and another preferred in mainland China), can be listed as being traditional/simplified in the zh-forms box. However, I don't think it is a good idea to consider Xin Zixing characters as being simplified.
Some traditional characters in China are composed of simplified elements due to Xin Zixing, such as 殺 (zh-Hans, mainland China) vs 殺 (zh-Hant, Taiwan) and 鷀 (zh-Hans, mainland China) vs 鷀 (zh-Hant, Taiwan). In this case both the mainland China and Taiwan character forms are encoded under the same code point but are composed of different forms and have different stroke counts. I think having the list is great, but it needs to be properly checked and compared with the Unicode charts to ensure that the characters are actually different. To me, characters that were unified inconsistently (such as the anomalies given in the second top-level comment of this discussion) should be added to the list, while those that are unified consistently across their set of derived characters, such as 犮 (zh-Hant) / 犮 (zh-Hans), should not be added to the list. Consider 任 (zh-Hant) / 任 (zh-Hans), listed as being traditional/simplified due to the difference in composition of 壬 (zh-Hant) / 壬 (zh-Hans). By analogy, the derived characters of 任 such as 凭 (zh-Hant) / 凭 (zh-Hans) should be added as well. But if someone were to add derived characters en masse, some anomalies are bound to occur, such as a character that is both a traditional character and the simplified form of another. Hence I don't think it is a good idea to define Xin Zixing characters that are unified consistently as being simplified. By the way, the fonts I'm using cover the differences in glyph shapes between different regions. KevinUp (talk) 04:40, 6 July 2018 (UTC)


 * Do you think characters that have different glyph shapes but are encoded under the same code point, such as 今 / 今, 令 / 令, 艾 / 艾, 叟 / 叟, 丰 / 丰, 壬 / 壬, 呈 / 呈 (each pair tagged zh-Hant / zh-Hans), should be considered simplified and separated from traditional forms in the zh-forms box? KevinUp (talk) 04:40, 6 July 2018 (UTC)


 * We should determine a reasonable limit to this, or otherwise we might as well show zh_CN-Hans, zh_CN-Hant, zh_HK-Hant, and zh_TW-Hant at all times. —Suzukaze-c◇◇ 05:42, 6 July 2018 (UTC)


 * I think that one way to overcome this issue is to upload SVG files of such as,  and  Sans/Serif CJK to Wikimedia Commons so that the different character forms can be displayed independently of the fonts used by the user's computer system. Another possibility is to put a special note to specify that the character may appear differently due to  character forms rather than splitting the box into traditional and simplified forms. Note that some  and  images have already been uploaded to Wikimedia Commons, and these can be found on the  page on Chinese Wikipedia. KevinUp (talk) 07:56, 6 July 2018 (UTC)


 * (I just remembered, zh.wiktionary actually does show all 4 at once 🤔 —Suzukaze-c◇◇ 06:13, 15 July 2018 (UTC))


 * I am flattered to be pinged to this discussion, but I'm afraid I can contribute very little to these kinds of technical issues. I defer to the experts here. ---> Tooironic (talk) 07:08, 6 July 2018 (UTC)


 * The problem with the output is that this template is outputting a lang value that contains only a language code. To get correct display between simplified and traditional Chinese, you need to use ISO 15924 script codes in the lang attribute (i.e., zh-Hans and zh-Hant) because this information is what Web browsers use for correct glyph selection. For a proof of concept, see Template:CJKV-forms, which had the problem this template currently has; I just fixed it.


 * If it's necessary to display distinct glyphs for Hong Kong and Taiwan traditional Chinese, you'll have to get even more specific and use the language codes for Cantonese and Mandarin. Or so I assume; I've never tried to display distinct glyphs in this case.


 * You can also use region codes. However, I dislike this approach because it ties a language to a political designation.


 * For the first, simpler case, it looks like there are two places in the code where  attributes are output and need to be fixed. In each,   needs to be output when the   arguments are ,   when they are  , and   otherwise.


 * I can attempt to fix this template myself, but I would prefer that someone else try since I don't feel particularly comfortable modifying live code in a programming language I don't know. (I have strong abilities in several programming languages, but Lua isn't one of them.) If no one tries, I'll probably make an attempt anyway.


 * —Patrick Dark (talk) 15:49, 5 August 2018 (UTC)
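
The three-way choice proposed above (a script-specific tag for traditional forms, another for simplified forms, and the bare language code otherwise) might look roughly like this. It is a Python sketch for illustration only, since the module itself is written in Lua; the labels "trad" and "simp" and both function names are assumptions, not the template's real arguments.

```python
from html import escape

# Hypothetical labels for the kind of form being rendered; the template's
# real argument names are not shown in this discussion.
SCRIPT_TAGS = {
    "trad": "zh-Hant",  # ISO 15924 script code for traditional Han
    "simp": "zh-Hans",  # ISO 15924 script code for simplified Han
}


def lang_tag(form_kind: str) -> str:
    """Return the lang value for a form, falling back to plain "zh"."""
    return SCRIPT_TAGS.get(form_kind, "zh")


def wrap_form(text: str, form_kind: str) -> str:
    """Wrap `text` in a span whose lang attribute carries the script code."""
    return f'<span lang="{lang_tag(form_kind)}">{escape(text)}</span>'
```

Keeping the mapping in one table means the two places that emit the attribute can share it, so they cannot drift apart.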


 * If we want to display proper glyphs, we should not be using regional language codes, which allow the user's browser to pick fonts; instead, we should try to use the classes listed in MediaWiki:Common.css. — justin(r)leung { (t...) 16:04, 5 August 2018 (UTC)


 * I also feel that region codes are a bad idea (as previously stated), but ISO 15924 script codes should be used. Users' browsers are already picking fonts since Wiktionary doesn't serve its own fonts. It's using a stylesheet to make educated guesses about what fonts are available on a user's system, but those fonts can't be predicted reliably and the guesses are more likely to be wrong for users on minority operating systems (e.g., Ubuntu (Linux)) such as myself. It therefore should be assumed that browsers will need this information until Wiktionary serves its own fonts.


 * As for that stylesheet, CSS has a :lang() selector specifically for dealing with this subject, but it doesn't work properly if script codes aren't specified. This is evidenced in said stylesheet, which is using classes in an attempt to work around a lack of script codes. For example, code that matches the attribute value exactly is brittle and breaks as soon as someone adds a script or region code; it should use :lang() instead.


 * —Patrick Dark (talk) 16:55, 5 August 2018 (UTC)
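
The brittleness described above can be illustrated by comparing exact matching of a language tag with prefix-based matching, which is roughly how language-range matching behaves. This is a simplified Python sketch with hypothetical function names; it does not reproduce the stylesheet's selectors or the full matching rules of any specification.

```python
def exact_match(tag: str, wanted: str) -> bool:
    """Match only when the whole tag equals `wanted` (case-insensitive)."""
    return tag.lower() == wanted.lower()


def prefix_match(tag: str, wanted: str) -> bool:
    """Match `wanted` itself or any tag that extends it with '-subtag'."""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")
```

With these definitions, `exact_match("zh-Hant", "zh")` is false while `prefix_match("zh-Hant", "zh")` is true, so a rule written against the bare language code keeps applying after a script or region subtag is added only under prefix matching.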

unified_char list
AFAIK, allographic variant characters like 次 / 次, 草 / 草, 道 / 道 or 骨 / 骨 are regional differences rather than differences between traditional and simplified characters, not least because these characters aren't part of the Complete List of Simplified Characters. Therefore it's probably better to abandon that list. By the way, this list is far from complete. --SelfishSeahorse (talk) 21:31, 18 February 2020 (UTC)

Edit: Some examples of characters that look the same in mainland China and Hong Kong, but different in Taiwan:

And an example of a character that looks different in mainland China, Hong Kong and Taiwan:

These examples show such variant characters are related to the region and not to simplified characters. --SelfishSeahorse (talk) 18:18, 20 February 2020 (UTC)


 * Honestly we should probably display a "Taiwan" section at all times, like zh.wiktionary, instead of maintaining these huge lists. —Fish bowl (talk) 01:02, 7 April 2022 (UTC)
 * and also 臺標 is ugly and does not deserve to be presented as 繁體字. —Fish bowl (talk) 06:40, 10 April 2022 (UTC)

2022
What do you think of removing  and instead adding separate rows for different country standards at all times?

Example:


 * "Traditional" (codepoint-wise same as Taiwan, but not in 臺標): 說夢話
 * PRC traditional: 説夢話
 * Taiwan traditional: 說夢話
 * HK traditional: 說夢話
 * PRC simplified: 说梦话

—Fish bowl (talk) 07:59, 1 May 2022 (UTC)


 * I think it would be nice, though would it look too clunky on the side? Another issue is that sometimes Taiwan or HK (and in the rare occasion, simplified) may have more than one acceptable/accepted variant (officially or otherwise); it’s hard to say for HK sometimes because there is much less standardization at the 詞 level afaik, given that there aren’t big official dictionaries for HK afaik. A third issue is that places like Macau don’t have a clear standard afaik; do we assume it’s traditional or following Hong Kong? BTW, the HK standard (according to 常用字字形表) should be 説夢話. — justin(r)leung { (t...) 13:26, 1 May 2022 (UTC)
 * Do you mean something like the Chinese Wiktionary template that shows variants and relatives?
 * Also what about adding a remote character composer renderer?
 * Like ⿱𥫗旦 becoming 笪 automatically? It could pull in an SVG renderer, or maybe combining it with a tag would render over those characters? Kernel-chan (talk) 02:12, 2 May 2022 (UTC)
 * The problem is old, and in fact it has had a clean solution since Unicode 4. The attempt to support variants in browsers using language tagging (either with deprecated Unicode language tag characters, or using rich-text tagging with HTML, XML, or even CSS) has been deprecated for years. The real solution (which works in modern browsers and renderers, even in plain text) is to use variation sequences, i.e. to follow the unified ideograph with a variation selector; these sequences are standardized in the Ideographic Variation Database (IVD), an integral standard part of the Unicode character database (UCD). However, this template (and the associated module) does not use any such IVD sequence.
 * Note that the module would need to specify which variation selector to use for each form of each ideograph: the same variation selector used after different characters is not guaranteed to select the same form, and in fact Han ideographs may have more than just two forms ("simplified" and "traditional"). These variant forms may be encoded and added to the IVD at any time, long after the encoding of isolated ("unified" or "compatibility") ideographs and isolated variation selectors, so you need to use the normative data from the IVD. There you'll find multiple variants for traditional forms and multiple variants for the simplified form, depending on the language (Chinese, Japanese, Vietnamese, Korean) or on the relevant national standards.
 * It is the standardisation of the IVD that allowed Unicode and the ISO TC to affirm that there would no longer be any new additions to "compatibility ideographs" and that any such request for standardization will now be rejected. The two existing compatibility blocks in the BMP have been "frozen", except to fix a few missing characters that were forgotten in the relevant standards that were accepted and normatively referenced in past Unicode/ISO standard versions, due to past defects in the Han unification. All seems to be fixed now, and Unicode and the IRG use more quality assurance tools to make sure that all variants are referenced in the IVD (all past compatibility ideographs are present in the IVD with their defined variation selector, along with the variation selector for the unified ideograph, so that canonical equivalence now works perfectly with Han characters). All "compatibility ideographs" are now deprecated; this does not concern 12 characters from the "IBM 32" subset that are present in one compatibility block but are not "compatibility ideographs": they are unified ideographs. Since this IVD standardization, all new additions to Han ideographs have occurred only in new blocks allocated exclusively for "unified ideographs", all of them mappable at any time in the IVD to assign their needed variants.
 * I therefore strongly suggest that you include support for the IVD (part of the standard UCD and integrated in the Unihan Database), and then generate variation sequences in this template instead of relying on language tagging, which was experimental and was removed from all modern browsers. Their text renderers are already capable of correctly displaying the variation selectors, given quality fonts that have mappings for them; legacy font mappings on compatibility ideographs are also starting to disappear, as modern fonts are now removing these old mappings in favor of mappings of variation sequences.
 * Verdy p (talk) 22:20, 3 May 2024 (UTC)
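
For readers unfamiliar with variation sequences, the mechanism suggested above can be sketched as follows: an ideographic variation sequence is a base ideograph followed by one of the selectors VS17 to VS256 (U+E0100 to U+E01EF). The Python below is illustrative only. The function name is hypothetical, and the per-character table contains made-up placeholder values, not data from the IVD, which would have to be the real source.

```python
VS17 = 0xE0100   # VARIATION SELECTOR-17, first selector used for ideographs
VS256 = 0xE01EF  # VARIATION SELECTOR-256, last one


def variation_sequence(base: str, selector_number: int) -> str:
    """Append variation selector VS<selector_number> (17..256) to `base`."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("selector number must be between 17 and 256")
    return base + chr(VS17 + selector_number - 17)


# PLACEHOLDER data, invented for this sketch: which selector picks which
# form must be looked up per character in the IVD, never assumed.
PLACEHOLDER_SELECTORS = {
    ("溍", "form-a"): 17,
    ("溍", "form-b"): 18,
}


def sequence_for(char: str, form: str) -> str:
    """Return the variation sequence for a form, or the bare character."""
    number = PLACEHOLDER_SELECTORS.get((char, form))
    return char if number is None else variation_sequence(char, number)
```

The table is keyed by character and form together because, as noted above, the same selector after different characters is not guaranteed to select the same kind of form.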

Template:zh-forms not displaying a definition
In this entry (八面玲瓏), Template:zh-forms does not properly display definitions in the box. Maybe it's because all four definitions in the entry are prefaced with template:lb.--Prisencolin (talk) 07:33, 29 November 2018 (UTC)
 * I don't think that's the problem. The problem is probably that it only takes the first definition. The first definition is wrapped in, which it automatically ignores. — justin(r)leung { (t...) 23:07, 5 December 2018 (UTC)

|alt= parameter
Is the alt parameter for written or spoken variants? is unfortunately mixing the two.

I think it's better to move spoken variants to the "Alternative forms" section and format them so that their pronunciation is visible. That's what I'm doing too.

--Dine2016 (talk) 15:10, 25 November 2019 (UTC)


 * idk, but i'd really like to distinguish the two tbh. —Suzukaze-c◇◇ 16:01, 25 November 2019 (UTC)


 * I think of 'alt=' as written variants which are pronounced in the same way as the word being defined (like on the or  pages). But here's another page where that rule is not being followed:  --Geographyinitiative (talk) 22:06, 25 November 2019 (UTC)


 * I agree. I prefer to have alt reserved for variations in orthography only, i.e. they are all pronounced the same way as the main entry. Any other type of "alternative form" should either be treated as a synonym (if it's very different) or as an alternative form under the "alternative form" header. — justin(r)leung { (t...) 22:43, 25 November 2019 (UTC)


 * Maybe? --Geographyinitiative (talk) 22:55, 25 November 2019 (UTC)

IDS
Could the IDS functions from Module:zh-sortkey be called so that these work properly? (and perhaps moved to a more general module? Module:Hani?)

—Suzukaze-c (talk) 08:23, 5 July 2020 (UTC)
 * Good idea. See the draft in Module:zh-forms/sandbox. (I haven't split the IDS code into a separate module yet.) It handles these two cases at least, though it might need more testing. — Eru·tuon 03:56, 6 July 2020 (UTC)
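
As background for the IDS handling discussed here, an Ideographic Description Sequence can be read as a prefix-notation tree. The following Python sketch is not the code in Module:zh-sortkey or the sandbox; it is a hypothetical minimal parser that covers only the description characters U+2FF0 to U+2FFB and treats every other character as a component.

```python
# ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands; the other operators in
# U+2FF0..U+2FFB take two. Later additions to Unicode are not handled here.
TERNARY = {"\u2FF2", "\u2FF3"}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def parse_ids(ids: str):
    """Parse an IDS string into nested tuples: (operator, operand, ...)."""
    chars = list(ids)  # one entry per code point, including non-BMP ones

    def parse(pos: int):
        if pos >= len(chars):
            raise ValueError("unexpected end of IDS")
        ch = chars[pos]
        if ch in BINARY:
            arity = 2
        elif ch in TERNARY:
            arity = 3
        else:
            return ch, pos + 1  # a plain component character
        pos += 1
        parts = []
        for _ in range(arity):
            node, pos = parse(pos)
            parts.append(node)
        return (ch, *parts), pos

    tree, end = parse(0)
    if end != len(chars):
        raise ValueError("trailing characters after IDS")
    return tree
```

For example, `parse_ids("⿱𥫗旦")` yields the tuple `("⿱", "𥫗", "旦")`; mapping such a tree back to a precomposed character would still need a lookup table, which this sketch does not attempt.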

Broken 星期日
The template call outputs unterminated wikicode. Can someone look at it? --Derbeth talk 14:18, 20 December 2020 (UTC)

共和國
Also broken, outputs unterminated wikicode. --Derbeth talk 16:01, 2 May 2021 (UTC)
 * Justinrleung, I see you edit Module:zh-forms, are you able to take a look? --Derbeth talk 16:05, 2 May 2021 (UTC)
 * I've suppressed anything that shows wikicode like this as a temporary solution. I will dig into it a little more to come up with a better solution. — justin(r)leung { (t...) 17:00, 2 May 2021 (UTC)

Justify left and right seems random
Using this template, sometimes the box sits on the left, unchanged. Sometimes it has its justification set to the right. I can't see anything in the module code that would do this, and it's very annoying. Is it meant to be this way? Levi OP (talk) 16:49, 10 January 2022 (UTC)


 * https://en.wiktionary.org/w/index.php?title=Module:zh-forms&oldid=66370000#L-80. Honestly I don't get it either. —Fish bowl (talk) 06:38, 10 April 2022 (UTC)
 * @Fish bowl Nice find. I might just be bold and change this. If anyone has an issue with it it can be reverted but I can't see any reason that it would be like this. Levi OP (talk) 01:08, 15 April 2022 (UTC)

I think the reasoning for this is that above a certain length it takes up the whole row anyway, so it may as well be aligned left. I don’t really mind it, but it seems to be set too short, and it does make it inconsistent. Theknightwho (talk) 12:32, 1 May 2022 (UTC)

ss and 二簡 1977 vs. 1981
You guys should probably figure out a way to adapt ss for whatever this difference is.


 * 1) https://en.wiktionary.org/w/index.php?title=罐&action=history (?)
 * 2) 蒙
 * 3) 鼠
 * 4) 雕
 * 5) 贏
 * 6) 粵

—Fish bowl (talk) 03:12, 18 May 2022 (UTC)

Issue with "Template:vern" transclusion in "橡果#Chinese"
I think has an issue with transcluding Template:vern in 橡果. One of the column headers displays html+handlebars literally instead of transcluding it. -- F1yingpig (talk) 00:27, 17 October 2022 (UTC)


 * The problem is in Module:zh/data/glosses; the template syntax is not rendered. How important are these templates in this context? Should they be processed? —Fish bowl (talk) 00:15, 7 May 2023 (UTC)
 * It is not at all critical. The problem may have to do with there being two templates on that line in the module. There are many instances of vern and taxlink in that module, but I don't recall ever seeing the two together.
 * The purpose of those templates is to count links to determine which organism names are most worth adding. I would like to count all uses of the name, from definitions, image captions, etymology sections, and the forms boxes. But, as I have to use the XML dumps to count the templates, I can't count links in the forms boxes. Arguably, they are of lesser importance than the links from the other items, but it may lead to failure to add organism-name entries that are of fundamental cultural importance in China and elsewhere in Asia. As far as I can tell, my template count finds only three links to Quercus serrata, whereas a search finds 18 uses. DCDuring (talk) 02:10, 7 May 2023 (UTC)

needs simplified form suppression
凝 "2nd round simp. 泞／𰛑" —Fish bowl (talk) 07:52, 5 March 2024 (UTC)


 * @Fish bowl ✅. I've also suppressed it for nonstandard simplified forms. Theknightwho (talk) 14:05, 5 March 2024 (UTC)

FYI this doesn't seem to be handling properly: see 上海白菜. Weylaway (talk) 22:12, 21 April 2024 (UTC)