Wiktionary talk:About Sinitic languages/archive 1

What heading?
Reading this page, I'm left with the impression that instead of ==Chinese==, Wiktionary would prefer to use ==Mandarin== and ==Cantonese==. Is this interpretation correct? In the variety of cleanup scans I do from the monthly XML dumps, I've usually had to exclude Chinese, as we never used to have knowledgeable contributors to ask. Lately, we seem to have acquired five or six.

The problem I have, parsing entries, is that on the English Wiktionary, a level two heading is assumed to be the language. Lately, I've seen ===Mandarin=== (third level heading) which to me seems wrong.

I've also seen ==Mandarin== as a heading. Before just a few moments ago (reading this page) I was convinced that heading was not a valid language heading. Is it? Or have I just been utterly confused by the effects of User:NanshuBot's activities from years ago? --Connel MacKenzie T C 17:22, 13 June 2006 (UTC)
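The parsing problem described above can be sketched in a few lines. This is only an illustration, not the cleanup script actually run against the XML dumps: the function names and the `SINITIC_NAMES` set are assumptions made up for the example, and the set is not an official list of accepted language headings.

```python
import re

# Matches a wikitext heading line such as "==Mandarin==" or "===Noun===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

# Hypothetical watch list for this sketch; not an official Wiktionary list.
SINITIC_NAMES = {"Chinese", "Mandarin", "Cantonese", "Min Nan"}


def find_headings(wikitext):
    """Return a (level, title) pair for every heading line in the wikitext."""
    headings = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line.strip())
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def misplaced_language_headings(wikitext):
    """Return watched language names used at any level other than two."""
    return [
        title
        for level, title in find_headings(wikitext)
        if title in SINITIC_NAMES and level != 2
    ]


sample = "==Chinese==\n===Mandarin===\n====Noun===="
print(misplaced_language_headings(sample))
```

A scan like this would flag ===Mandarin=== under ==Chinese== while leaving a level two ==Mandarin== alone. Whether either form is acceptable is a policy question the code cannot settle.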


 * Whether to use "==Chinese==" or "==Cantonese==" and "==Mandarin==" depends on whether you consider Chinese to be a single language or a language family. There are serious implications (both linguistic and political) of each stance. Rod (A. Smith) 00:30, 17 June 2006 (UTC)


 * When I first started entering words into Wiktionary, I used level two headings such as ==Mandarin== and ==Min Nan==, but this was met with resistance. I'm not sure why.  ==Chinese== simply is too vague for Wiktionary purposes.  This is why I began to add level three headings.  Linguistically, this is akin to doing something like ==Romance language== followed by ===French===.  At any rate the headings are not much help in terms of stats because they are too inconsistent.  In addition to the above, I have also seen ==Simplified Chinese==, ==Chinese Hanzi==, ==Chinese Pinyin== etc.  I'm not sure how this can be fixed.  Perhaps the wiktionaryz people will come up with a better way.  I think we need some way to tag an entry by its precise language classification.  Perhaps ISO codes in combination with Templates could help out.  For example, we could tag "color" with , and "colour" with  .  For Chinese, you could do the following:

I did take a little liberty with the ISO codes (there is no nan-poj) because even ISO 639 is not sufficiently detailed for our purposes. At any rate, tagging terms in this way would allow us to more precisely keep track of the words in each language. It would also allow the user to filter a search by saying, "give me only results for Cantonese written in Traditional Chinese characters " etc. In order for this to work, you would have to make it a requirement that each and every entry have at least one previously agreed upon tag. Of course, some entries will have multiple tags. A-cai 14:50, 23 July 2006 (UTC)
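A minimal sketch of the tagging idea, assuming each entry carries a set of agreed-upon tags. The `Entry` class, the helper names, and the sample data are hypothetical. The tag strings are borrowed from the zh-cn/zh-tw/yue-hk codes discussed elsewhere on this page purely for illustration, since the tags originally proposed in this comment are not preserved here.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A dictionary entry with its set of language/script tags (hypothetical model)."""
    title: str
    tags: set = field(default_factory=set)


def validate(entry):
    """Enforce the proposed rule that every entry has at least one tag."""
    if not entry.tags:
        raise ValueError(f"entry {entry.title!r} has no language tag")


def filter_by_tag(entries, tag):
    """Return the titles of the entries carrying the given tag, in input order."""
    return [entry.title for entry in entries if tag in entry.tags]


# Illustrative data only; the tag assignments are assumptions for the example.
entries = [
    Entry("文明", {"zh-cn", "zh-tw"}),
    Entry("而家", {"yue-hk"}),
    Entry("七月", {"zh-cn", "zh-tw", "yue-hk"}),
]

for entry in entries:
    validate(entry)

# "Give me only results for Cantonese written in Traditional Chinese characters."
print(filter_by_tag(entries, "yue-hk"))
```

Storing the tags as a set lets one entry carry several tags at once, which covers written forms shared by more than one language.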

Need to rework the wording
And, in the R.O.C., Pe̍h-oē-jī is an orthography, that was introduced by Presbyterian missionaries as a romanization system for the Taiwanese language; which is spoken by the "Native" (situation akin to North American Native) People of Taiwan.
 * This sentence needs to be reworked a little:

A-cai 16:13, 7 July 2006 (UTC)
 * 1) And, in the R.O.C., Pe̍h-oē-jī is an orthography
 * This makes it sound as though Pe̍h-oē-jī is the standard method of romanizing Taiwanese within the ROC. This may have been true at one time, but a quick perusal of recent teaching and reference materials published in Taiwan will show that TLPA is now more common.  However, I don't mean to imply that either TLPA or Pe̍h-oē-jī is recognized as a standard.  Unlike Mandarin, which has Pinyin, Taiwanese Min Nan still lacks a method of romanization that is a de facto standard.  Since almost all Taiwanese Min Nan speakers are also fluent in Mandarin, it is far more convenient for them to write in Mandarin.  This has probably retarded the development of a standard writing system for Min Nan.
 * 2) which is spoken by the "Native" (situation akin to North American Native) People of Taiwan
 * This is tricky: the Min Nan citizens of Taiwan describe themselves as 本土, which can be translated as "Native." However, when I first read this sentence, I was left with the impression that the Min Nan citizens of Taiwan are the original indigenous people of Taiwan.  This is not the case.  Taiwan does have aboriginal people, but they are not linguistically or ethnically related to the Min Nan people.

Text removed from page
The following text was removed from the policy page. While some of it is useful, it is heavily POV, treating the Chinese language group as ~ Standard Mandarin and variants. Some of it may be useful and could be added back into the page. Robert Ullmann 06:26, 10 February 2007 (UTC)

Question
How does Wiktionary deal with Chinese languages other than Mandarin that have their own script?

Background
Most people quickly recognize Japanese and Chinese as two separate languages, even though Japanese borrowed heavily from Chinese when it created its classic writing system, kanji, which thus resembles the Traditional Chinese logogram collection. However, Chinese is not a single language. Chinese might be more helpfully thought of as a culture, which is composed of many different societies, each of which speaks and writes a different language. Thus, a whole family of languages is associated with those Asian societies that are known as Chinese. The Chinese speak many languages (such as Mandarin, Cantonese, Shanghainese, Hakanese, etc.), each of which is unique in its own way, holds its own history, and is (by and large) spoken in a different geographic region. These languages differ not only in their phonological characteristics but also significantly in both spoken and orthographic syntax. The Mandarin language is a tonal language, which employs four tonal phonemes in speech. Mandarin is very different from Western languages in this regard. The orthographies of the Chinese consist of bounded collections of logographs, with which spoken words have been associated. The orthographies do NOT consist of phonetic alphabets, which are able to compose unlimited numbers of words (as with Western languages). The "working" logogram collection comprises approximately 2500 logograms.

There exists a common misconception that all Chinese languages use one and only one common script as their orthography. But in fact, the situation is much more complex. In a nutshell, Standard Written Chinese (SWC) does not consist of one orthography, but two, which exist in Classic and Simplified forms. Each orthography is associated with the Mandarin language, which is spoken as a primary language in the People's Republic of China (Beijing), the Republic of China (Taiwan), and Singapore. The orthographies of other Chinese languages, like Shanghainese, Cantonese, Taiwanese, etc., have various logograms in common with those of SWC. This orthographic commonality has persisted throughout the history of the Chinese languages. Through history, there was Classical Written Chinese (CWC), which was used for mutual communication. By the 1800s, however, it did not represent the "modern" spoken form of any Chinese language. CWC was reformed into SWC at the turn of the century in response to a demand for a written language that more closely represented a spoken Chinese language. Mandarin was the spoken language that SWC was developed to represent, as it was the primary language spoken in and around the Chinese capital (now Beijing). This same situation has been on the minds of speakers of other Chinese languages for hundreds of years. There is Written Cantonese (which has a history of at least a couple hundred years), which uses modern characters present in the classic form of SWC, archaic characters not in SWC, and recently invented characters. And, in the R.O.C., Pe̍h-oē-jī is an orthography that was introduced by Presbyterian missionaries as a romanization system for the Taiwanese language, which is spoken by the "Native" (situation akin to North American Native) People of Taiwan.
Hanyu Pinyin (or more commonly Pinyin) and Zhuyin Fuhao are systems of phonemic notation for Mandarin used by students. Pinyin is a romanization (a transcription into Roman script), while Zhuyin Fuhao uses its own set of symbols.

Problems
From my understanding right now, it appears that Wiktionary is set up so that each language version has a dictionary in its own language and translates multiple foreign words into its own language. For instance, English Wiktionary defines a word like "cat" in English, lists foreign words with the same meaning like "French: chat", and can link "chat" to a separate entry which defines the French word "chat" in English.

Right now in the list of translations, many entries have a simple "Chinese:..." without distinguishing whether it is Mandarin, Cantonese, or some other Chinese language. Even if this were clarified, there is still the interesting problem of what to do when the translation for Mandarin has the same glyph form as the one for Cantonese (e.g. English: July, Mandarin: 七月, Cantonese: 七月). The problem already exists with Japanese (July is also 七月 in Japanese) and has been dealt with by having a single entry for 七月 with separate sections for Chinese and Japanese.

Possible Solutions

 * Distinguish the Chinese languages more precisely. On the list of translations, most current entries are labeled "Chinese" when they would be more properly labeled as "Mandarin."  The same should apply to entries like "七月" where there are multiple languages that use the same written form.
 * In this situation, written forms in Chinese languages with other scripts would be handled elegantly by having their own page (or a shared page if another language used the same written form). For instance, the Cantonese words "而家" (Eng: now) and "BB" (Eng: baby) are not in Mandarin, but they would still be able to have their own page to define this written form.

Benefits
Outside of Mandarin, many of the Chinese languages have been sadly under-studied and resources for foreign students are lacking. Take Cantonese for example: although there are a handful of dictionaries (many outdated), none are truly comprehensive or give sufficient examples of usage to be really practical for a student. I would love to see Cantonese translations on English Wiktionary developed to the extent where entries could have example sentences or phrases.

There is another important reason why we should distinguish between the different Chinese languages. There are written forms that are in multiple Chinese languages but do not necessarily have the same meaning (note, this is also a problem between Japanese and Chinese). For instance, in Cantonese 整 can mean "do" but in Mandarin it means "whole". There are plenty more examples of this situation.

end of moved text Robert Ullmann 06:26, 10 February 2007 (UTC)

Chinese translations
Hi everyone, you have probably noticed that I have been doing the Chinese translations for a while... Could I possibly have some opinions on which version of the translation you would prefer in Wiktionary? Thank you very much!!
 * version one: time
 * version two: civilization
 * version three: anything not one or two

Thanks everyone!! ^^ Chloejr 16:36, 31 December 2007 (UTC)


 * Chloejr, allow me to explain my reasons for preferring version one:
 * In English, the word Chinese usually refers to Mandarin. However, in the west, there are a number of speakers of other Chinese dialects.  For example, Cantonese and Min Nan are also quite common.  I believe it is better to identify the language as either:
 * Mandarin:
 * or
 * Chinese
 * Mandarin:
 * For Pinyin Romanization, most Chinese dictionaries only use spaces to separate full words. For example, 文明 would be wénmíng, because it represents a single word.  However, 东欧 would be rendered as Dōng Ōu, since it actually represents two words: Eastern and Europe.  Notice, that I have capitalized the first letter of each word in Dōng Ōu.  That is because Eastern Europe is a proper noun.  A good Chinese print dictionary such as 现代汉语词典 ISBN 9620701348 can help you to know the "correct" Pinyin spelling.
 * If a word is the same in both simplified and traditional, it is acceptable to write it only once. In the example for 文明, you could write:
 * Chinese
 * Mandarin: 文明 (wénmíng)
 * or
 * Mandarin: 文明 (wénmíng)
 * The reason that the Pinyin is hyperlinked is that Wiktionary also includes entries for Pinyin spellings. For example, shíjiān.


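 * That's all I can think of for now. 如果上面有什么不清楚的地方，请随时给我贴讯息（中英都可以）. (If anything above is unclear, feel free to leave me a message at any time; Chinese or English is fine.)   -- A-cai 23:48, 31 December 2007 (UTC)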

zh-tw, zh-cn, etc.

 * Moved discussion from main Project Page to here
 * Nbarth (email) (talk) 17:03, 8 June 2008 (UTC)

I need a little help identifying the difference between the numerous zh categories, since I can't find the explanations in the categories themselves. I'm worried I'm screwing it up and if I am I'd like to stop as soon as possible. :) Ric | opiaterein 19:37, 22 October 2006 (UTC)


 * cmn, nan and yue are all ISO 639-3 codes. They all belong to the macro language code zh from the original ISO 639.


 * I'm not sure what the intention was for cmn, nan and yue, but since there are no codes for romanizations such as Pinyin, Jyutping, POJ etc, I decided to use them for that. This limits each one to the main phonetic system (e.g. cmn for Chinese Mandarin Pinyin, nan for Min Nan POJ, yue for Cantonese Jyutping etc).  I debated switching all the zh codes to cmn for the sake of consistency, but have not done so as of yet.  More people are familiar with zh than cmn (if they know the codes at all).  So here is the way it breaks down:


 * zh-cn = Chinese (Standard Mandarin) in simplified script, per PRC usage
 * zh-tw = Chinese (Standard Mandarin) in traditional script, per Taiwan usage
 * zh = Chinese (Standard Mandarin) in romanized Pinyin
 * nan-cn = Min Nan (Amoy) in simplified script, per PRC usage
 * nan-tw = Min Nan (Amoy) in traditional script, per Taiwan usage
 * nan = Min Nan (Amoy) in romanized POJ
 * yue-cn = Cantonese in simplified script, per PRC usage
 * yue-hk = Cantonese in traditional script, per Hong Kong usage
 * yue = Cantonese in romanized Jyutping (I don't speak Cantonese, so if this is wrong, please correct)


 * I took it upon myself to work all of this out. In the absence of feedback from a large number of contributors, I tried to work out a system that I felt would make logical sense.  Time will tell whether it is the best system, but at least it is a system (that a bot could later modify as needed).  I will try to write something in WT:AC about the categories, but I agree that perhaps the best way would be to make a note in the categories themselves.

A-cai 22:34, 22 October 2006 (UTC)
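The breakdown above can be written down as a small lookup table, which is roughly what a bot would need in order to rename or audit the categories later. This is only a sketch: the table copies the nine codes listed above, while `CATEGORY_CODES`, `describe`, and `category_name` are names invented for the example. The "Category:code:Topic" pattern is assumed from category names such as Category:zh-cn:Nouns.

```python
# Code prefix -> (language, written form), following the list above.
CATEGORY_CODES = {
    "zh-cn": ("Mandarin", "simplified"),
    "zh-tw": ("Mandarin", "traditional"),
    "zh": ("Mandarin", "Pinyin"),
    "nan-cn": ("Min Nan", "simplified"),
    "nan-tw": ("Min Nan", "traditional"),
    "nan": ("Min Nan", "POJ"),
    "yue-cn": ("Cantonese", "simplified"),
    "yue-hk": ("Cantonese", "traditional"),
    "yue": ("Cantonese", "Jyutping"),
}


def describe(code):
    """Return the (language, written form) pair for a code; KeyError if unknown."""
    return CATEGORY_CODES[code]


def category_name(code, topic):
    """Build a category title in the assumed 'Category:code:Topic' pattern."""
    describe(code)  # reject codes that are not part of the scheme
    return f"Category:{code}:{topic}"


print(category_name("zh-cn", "Nouns"))
print(describe("yue-hk"))
```

Keeping the scheme in one table means that a later decision, such as switching the zh codes to cmn, would be a change in one place.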


 * P.S. I did create two templates a while ago that could be used more often: and  :

and


 * See Category:zh-cn:Beginning Mandarin for an example of their use. Perhaps these and other similar templates should be placed in the categories for all of the Chinese dialects.

A-cai 22:42, 22 October 2006 (UTC)


 * Thanks :) Just wanted to make sure I was putting things in the right categories. Ric | opiaterein 21:20, 23 October 2006 (UTC)

Character forms and romanization
I’ve added a discussion of character forms and romanizations, which I believe reflects consensus – feel free to change as necessary.

Nbarth (email) (talk) 17:32, 8 June 2008 (UTC)

below? should it not be above?
The project page says: This template,, should be placed below the language header. But should it not be above? Otherwise, which language are we talking about if the term has a meaning in both Cantonese and Mandarin? The similar template ja-forms is placed above in the example 翻译. The problem with putting them above is that it will break the layout when there is no TOC, as in 丹麦 (see this earlier version). So what is the right thing to do? Kinamand 13:27, 20 June 2008 (UTC)


 * I had been placing the template above the language header until recently. I am now convinced that it is better to put it underneath the language header.  The entry for 現在 should illustrate my line of thinking.  Note the difference between the template in the Japanese section vs. the rest of the languages.  -- A-cai 15:11, 20 June 2008 (UTC)


 * I understand, but what about ja-forms? It is not used in 翻訳, which the project page gives as an example, but it is used in 櫻. Is it used correctly in 櫻, or should it be placed under the Japanese language header? Kinamand 05:31, 21 June 2008 (UTC)


 * The honest answer is that we've never had enough Asian language experts on hand to reach any kind of consensus. In the absence of such a consensus, I have tried to make my own entries as consistent as possible.  However, I have gradually refined my ideas about how entries should be formatted.  If you were to run across one of my entries from 2006 (which has not yet been updated), you would find that it looks a lot different from an entry that I made a few days ago.  Sometimes, it is only after creating several hundred (or several thousand) entries that I arrive at certain opinions about how to best format certain parts of an entry.  If you would like to see a fair representation of the way I currently am formatting entries, see Special:Contributions/A-cai.  -- A-cai 05:43, 21 June 2008 (UTC)

Chinese Categories
The section which describes Chinese categories does not say anything about why, for example, there are two categories for Chinese nouns: Category:zh-cn:Nouns and Category:Mandarin nouns. Should we continue to have both or is the first obsolete? Kinamand 08:17, 17 August 2008 (UTC)


 * Category:Mandarin nouns includes all nouns, regardless of orthography. In contrast, Category:zh-cn:Nouns (should) only contain nouns which are written in Simplified Chinese.  Similarly, Category:zh-tw:Nouns (should) only contain nouns which are written in Traditional Chinese.  Many students of Chinese either learn one or the other, but not both.  The zh-cn/zh-tw categories are intended to help with this by not mixing the two scripts together within the same category.  Hope this explanation helps.  -- A-cai 23:38, 18 August 2008 (UTC)


 * Hi A-cai. Thanks for your answer. What do you mean by "are intended to help with this"? Just because a student has only learnt Traditional Chinese, do you think it would be a problem for him or her to use a category which contains both? I don't believe that, since they are very similar, but I can see that the sort order of Category:zh-tw:Nouns is different from that of the zh-cn and Mandarin categories. Is that the reason for the extra categories? In that case I suggest that the 3 categories (Mandarin + zh-cn + zh-tw) be changed to "Mandarin sorted by pinyin" and "Mandarin sorted by radical and stroke number" (maybe we can abbreviate the category names). What do you think? Kinamand 07:15, 19 August 2008 (UTC)


 * I like the stroke category idea, but that would apply to all writing systems that use Han characters, not just Mandarin. (perhaps Category:Han characters by stroke?)--TBC  07:27, 19 August 2008 (UTC)


 * Or more like Mandarin nouns by stroke, Mandarin nouns by pinyin,  Cantonese nouns by stroke, etc. Kinamand 12:37, 19 August 2008 (UTC)


 * For a more complete description of my original thinking on the subject, please refer back a couple of posts to Wiktionary_talk:About_Chinese. Please keep in mind that the zh-cn/zh-tw scheme now applies to dozens of categories.  I would not recommend changing anything unless there is broad consensus within the Wiktionary community.  If you feel strongly about the issue, it may be worth raising it on WT:BP.  -- A-cai 00:13, 21 August 2008 (UTC)

I have raised the issue on the Beer Parlour, and there Robert Ullmann writes that it is easy to change because we only have to edit the POS templates, and he thinks that we here on About Chinese should decide on structure and names. I think we should have the names Mandarin nouns by radical, Mandarin nouns by pinyin, Cantonese nouns by radical, etc. What do you think? Kinamand 11:43, 27 August 2008 (UTC)

I have just found another strange thing about the structure of the categories. Look at Category:zh:Verbs. You write that there is no suffix for the standard romanization (Pinyin, POJ, Jyutping), but actually it does not contain Pinyin entries; instead it has Category:Mandarin verbs as a subcategory. It is, in my opinion, a mess. It can be fixed in the template, but it is a good example of how confusing it is right now with two sets of categories. Kinamand 08:24, 1 September 2008 (UTC)

cmn-hanzi and hanzi section under mandarin section
The project page doesn't say anything about the cmn-hanzi template. It looks like there is supposed to be a hanzi section under the Mandarin section. But should the hanzi section be the first or the last section under the Mandarin section? It looks like it is supposed to be the last subsection, but this is confusing: at present, when there is no definition (only the defn template), the hanzi section comes first with defn below it, so if I replace the defn with a definition, the hanzi section will be first. See 東. Kinamand 12:34, 24 October 2008 (UTC)


 * See the entry for 字. We used that entry as a sort of template for how all single character entries should eventually be formatted.  See 八 as well.  -- A-cai 12:59, 24 October 2008 (UTC)
 * But 字 doesn't have a hanzi section in the Mandarin section, and 八 has it between the Pronunciation and Noun sections. Why does 字 not have a hanzi subsection in the Mandarin section? Kinamand 13:36, 24 October 2008 (UTC)
 * There was some confusion early on about whether the hanzi section should be retained, once the Mandarin definitions were added. One of our administrators made an argument that the hanzi template is still needed, even after the rest is added.  The hanzi template should be above the other inflection lines, as you see in the entry for 八.  I have modified 字 so that it now conforms to the latest thinking with respect to single character entries.  -- A-cai 14:32, 24 October 2008 (UTC)
 * When I look at 八 I can see that the Compounds subsection is under the noun section. I think that in most other entries, Compounds is a subsection of the hanzi section. What is the right way to do it? Kinamand 12:46, 27 October 2008 (UTC)


 * Compounds should only be under the Hanzi section, terms derived from the noun should be Derived terms as anywhere else in the wikt. This is one of the reasons to have the Hanzi section; there will be compounds that use the character that are not derived from the noun or other PsOS shown. Another reason is to show the pinyin for the readings that are not the PsOS given. Robert Ullmann 14:14, 27 October 2008 (UTC)
 * I have now moved the compounds section in 八 to under the hanzi section. Please take a look and tell me if it is right or wrong. Then in the future I will use 八 as "template" when I update other mandarin entries. Kinamand 20:56, 27 October 2008 (UTC)


 * Robert, thanks for the explanation of the difference between compounds and derived terms. I don't remember seeing that before, but it makes sense.


 * Kinamand, I think what Robert is saying is that the compounds section is for compound words which include a single character, even if the compound word doesn't have any thematic relation to the single character. For example, 八卦 (see Bagua) might go under a derived terms section, but one might argue that 三八 (a pejorative term for women: see explanation in Mandarin Chinese profanity) could go under a compounds section.  Actually, you could go either way with 三八.  This stuff is not clear cut.  In my experience, analyzing languages is part art and part science.  -- A-cai 19:05, 28 October 2008 (UTC)
 * Ah! I understand. Thanks for the explanation. Yes it really is some kind of art to make right here :-) Kinamand 09:53, 30 October 2008 (UTC)

categorizing characters by component radicals
I don't know if this is the proper discussion page for this, as this is more generally a CJK character issue rather than specifically about the Sinitic languages. Would it not be a good idea to categorize each character entry according to the different radicals that comprise it? This way, entries for characters can be cross-referenced by radical using Wiktionary's own category mechanism. - Gilgamesh 03:36, 6 December 2008 (UTC)

Chinese vs. Mandarin
Please see this straw poll and the discussion that led to it. DAVilla 05:37, 10 May 2009 (UTC)
 * The links just lead to the Wiktionary:Beer parlour main page. Please give us the correct links and tell us why you want us to look at it. Kinamand 09:01, 26 August 2009 (UTC)

Why is it so difficult to find this page?
I cannot find any way to go to this page from the main page. The only things that link to this page are some private discussions. It would be very helpful if people could find this page, since it contains essential information about how to make entries in Chinese, which is rather complex. Can someone add it to some main page? Kinamand 09:11, 26 August 2009 (UTC)
 * You can find the list of language policy pages by clicking on the "WT:AXX" link at the top of the main policy pages such as WT:ELE and WT:CFI. Alternatively, if you remember that "zh" is the language code for Chinese, then you can just go to the shortcut "WT:AZH". --Bequw → ¢ • τ 03:26, 18 December 2009 (UTC)

Template:zh-tone
Please see this link. Mglovesfun (talk) 11:14, 6 December 2009 (UTC)

Also, see the Beer Parlour about what to do with toneless pinyin. —Internoob (Disc.•Cont.) 02:31, 10 December 2009 (UTC)

Context (sic) templates
I'd like to know what these are, and what they do. We're having a review of context labels, and AFAICT these aren't to do with context. I'm not suggesting that they should be deleted, but it needs to be more obvious what purpose they serve. Thanks, Mglovesfun (talk) 11:34, 14 December 2009 (UTC)


 * See RFDO --Bequw → τ 14:46, 1 February 2010 (UTC)

Category names
The WT:BP discussion led to Votes/2009-12/Chinese categories, which just started. Please contribute. --Bequw → ¢ • τ 03:28, 18 December 2009 (UTC)