Wiktionary talk:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters

feedback from -sche

 * 1) As written and titled, this vote would ban Romanizations / Pinyin like Běijīng (because they are "not in Chinese characters" and are not "entirely in Chinese characters"). I imagine that is not your intention (and even if it is your intention to ban Romanizations, that is a separate issue and should be a separate vote). With this vote, we want to exclude mixed-script entries like Thames河, and entries like *London / *Москва (non-Pinyin, non-Hanzi script), but we shouldn't affect Pinyin like Běijīng and Sān-K Dǎng.
 * 2) I would ban entries only if the non-Chinese part was present in another language with the relevant meaning. (Not "only if we had an entry for it in another language", but if it was present, regardless of whether we had an entry or not.) In other words, I would keep *Taymz河 (if it was attested, and if no language called the river Thames "the Taymz"), because such a term could not be considered code-switching or even borrowing, but would have to be considered a Chinese invention. This would "take care of" entries like "三K黨" (that is, it would allow them to be included), because "K" does not mean "Ku Klux Klan" in any Latin-script language.
 * 3) I suggest we remove the section "The user User:Engirst (aka User:123abc, User:Ddpy, and a large number of anonymous IP's)  who creates entries like Thames河: is known for creating Mandarin entries only in pinyin romanization, not following Wiktionary rules, generating IP's to avoid administrative bans." It seems irrelevant and ad hominem; at best, it is background information which everyone should have acquired by reading the RFD discussion. - -sche (discuss) 01:13, 3 October 2011 (UTC)


 * Ok, I agree with all three. Do you want to have a go or want me to change the wording? This is the first vote I have set up, I could use some help. --Anatoli 01:24, 3 October 2011 (UTC)
 * OK, I'll take out the second paragraph and make changes to the first paragraph. I'll think about how we can best word it, so that it excludes Thames河 and London (eg if someone found the line *"London 是欧洲第三大都市. ") but does not affect Běijīng, 三K黨, Sān-K Dǎng, or a hypothetical *Taymz河. - -sche (discuss) 05:24, 3 October 2011 (UTC)
 * I changed the mechanism by which the entries are excluded: I worded it so that it is a vote to consider "首先看到的是Thames河的出海口," not a use of "Thames"/"Thames河" as a Chinese word, with the effect that "Thames"/"Thames河" cannot meet WT:CFI (unless someone finds three citations of "Thames" being used with some meaning other than its English meaning, for example being used as a verb "to swim"). I find it hard to formulate the vote such that it does not affect "三K黨": you had written "an exception could be where there is no other way to write the proper name", but I think that seems likely to allow all small, unimportant placenames that occur in Roman or other script on Chinese maps of the world (being too small to have Hanzi names). :/ Hopefully other editors give input. - -sche (discuss) 06:14, 3 October 2011 (UTC)


 * For your reference 英國 London 大學地質學博士 Engirst 09:01, 3 October 2011 (UTC)

In the UK, there are a lot of spoken and written usages of mixed scripts, can we ban them? Engirst 11:30, 3 October 2011 (UTC)
 * Correction: In the UK, there are a lot of spoken and written usages of mixed scripts/terms, can we ban them? Engirst 11:57, 3 October 2011 (UTC)


 * That is literally impossible: scripts cannot be spoken. —Ruakh TALK 11:36, 3 October 2011 (UTC)


 * As I predicted, we may now expect entries like London市 - city of London, London大学 - London University. What's next? Engirst, perhaps you should start a language that doesn't use complex scripts? Difficulty in transcribing foreign words into Mandarin is not a good reason to try to force them into Roman script. --Anatoli 11:40, 3 October 2011 (UTC)


 * The fact is that people speak and write in mixed scripts/terms, so why not? Do you want to ban the people of the UK from speaking and writing Chinese like this? Engirst 13:21, 3 October 2011 (UTC)


 * This is not about what script is used. This is about what language is used, and how to categorize terms used in mixed-language contexts.
 * The script used is immaterial -- as described (I think accurately) by -sche above, the issue is whether these are examples of Chinese words, or examples of foreign words used in a Chinese context. Most of us here seem to be of the view that these are examples of foreign words used in a Chinese context. Note that this does not make these words Chinese. In Thames河, "Thames" is still English. In London大学, "London" is still English.
 * These examples show mixed languages more than mixed scripts or terms, and they should be treated as such. There are no grounds for creating London or Thames entries.  By the same argument, there are no grounds for creating Москва or natsukashii entries.  -- Eiríkr Útlendi | Tala við mig 20:23, 3 October 2011 (UTC)
 * For your references: OK, pizza and Hyde公园. Engirst 20:41, 3 October 2011 (UTC)


 * This proposal is about proper nouns, not common nouns. As for common nouns: OK is standard, and pizza may be acceptable in non-standard Mandarin (the formatting and/or CFI can be addressed later). The proper noun Hyde公园 is an example of SoP and code-switching, and is to be disallowed if this vote passes, regardless of whether there is a citation: it is an example of an English name inside Mandarin, not a Mandarin name. --Anatoli 21:45, 3 October 2011 (UTC)


 * -sche, we probably need to add a clause about citations on the page. --Anatoli 21:47, 3 October 2011 (UTC)
 * It's there; the vote, as worded, prevents *"London 是欧洲第三大都市. " and similar things from being considered citations (meaning that London and similar terms simply cannot meet CFI as Mandarin words, unless they take on new meanings in Chinese). - -sche (discuss) 22:17, 3 October 2011 (UTC)


 * Thanks. Is there anything to be done before the vote apart from spreading the word? I will try to check the wording again some time before the vote starts, allowing everyone to discuss. --Anatoli 22:31, 3 October 2011 (UTC)


 * Is Hyde Park (海德公园, Hyde公园) really sum of parts? Engirst 22:03, 3 October 2011 (UTC)


 * @Engirst --
 * I realize there is some dispute about what constitutes SOP for place names. That said, "Hyde" in Hyde公园 is *still English*.  Consequently, Hyde公园 has no place under a "Mandarin" heading.  -- Eiríkr Útlendi | Tala við mig 23:09, 3 October 2011 (UTC)


 * I think we can classify this as a "hybrid language" or "mixed language", common with people living outside their home country, where one of the parts is not an assimilated part of the main language. If Hyde Park meets CFI then 海德公園 / 海德公园 (Hǎidé Gōngyuán) also could, but not Hyde公园, where the "Hyde" part (the actual name) is in English, not in Mandarin. 公園 / 公园 (gōngyuán) is simply attached to the names of parks, like 河 to the names of rivers and 市 to the names of cities, but these suffixes or words don't convert the preceding words into Mandarin. I don't know whether Wiktionary caters, or should cater, for mixed languages. --Anatoli 23:31, 3 October 2011 (UTC)

Can we generalize?
This is all specific to the Mandarin language, but does it make sense for us to create a more general rule governing this type of word going forward? When is it OK to include a mixed language word? A mixed script word? A mixed code word? I would hate to see this vote take place and the CFI updated only to have the same debate spring up in Japanese next week and Arabic the week after that. - 10:31, 6 October 2011 (UTC)


 * I know what you mean. I felt that even including all words, not just proper nouns, might be too difficult. The situation with proper nouns will be similar in all languages using non-Roman scripts: all foreign names are written in the native script. The level of acceptance and tolerance of non-native script differs. Chinese is definitely the hardest language to transliterate a foreign name into, so, out of laziness, lack of knowledge, for clarity or just for fashion, people may occasionally write a name in Roman letters, less commonly in Cyrillic, and even less commonly in Greek or Korean Hangeul.


 * Not sure if the community will look at it differently but see:


 * All brand and company names can be written in Roman letters in all languages along with the transliteration, sometimes translation - 微软 (Wēiruǎn) - Microsoft, 谷歌 (Gǔgē) - Google.


 * Japanese uses katakana for foreign names. English words, sometimes very long ones, enter Japanese in katakana form, and Japanese speakers create their own "English" words from existing roots and parts of words. I digress: all proper nouns are in katakana, unless they are a Japanese invention, except for abbreviations and company names, like JR/ＪＲ. Some brands like Sony are written in Romaji officially, making it difficult to decide whether the term "Sony" is Japanese or English or international.
 * Russian (also Belarusian, Ukrainian, Bulgarian, Macedonian), Korean, Arabic, Hindi, Greek, Persian as far as I know, transliterate proper nouns into native scripts. Company names and brands are a contentious issue.


 * If the current vote passes, it may serve as a basis for others, but I think other languages will have fewer issues. Mandarin is known for its difficulties with proper nouns. In literature, some names may have a wide variety of translations, multiplied by the number of standards across Greater China. --Anatoli 11:10, 6 October 2011 (UTC)


 * It's a very good point that mixed-script code-switching occurs in other languages, but as Anatoli says, we should have a narrow vote on Mandarin first. In part, this is because we've had a strong and coherent consensus in the discussions regarding Mandarin, but have seen less coherent positions expressed regarding other languages. If this vote passes and voters want to generalise (or if it fails because voters do not want to establish criteria for only one language, but want general criteria), it can be a basis for another vote. - -sche (discuss) 18:04, 6 October 2011 (UTC)

soft redirects
If we cannot agree to soft redirects as a compromise (which is not clear yet), I think it is best to hold the vote to ban mixed-script entries, without the option of soft redirects. (That option neutralises the vote.) Wiktionary's Chinese-speaking editors (except Engirst) agree the entries should be banned, and many other editors support or defer to the same view, so a ban will probably pass. It won't affect mixed-script, non-SOP common nouns, but I think banning the biggest class of potentially not-SOP mixed-script entries, and then handling (possibly keeping or possibly banning) the presumably much smaller number of possibly not-SOP entries that do not contain proper nouns, is a good approach. - -sche (discuss) 09:30, 8 October 2011 (UTC)


 * The vote has started: Votes/pl-2011-10/Mixed script Mandarin entries. --Anatoli 04:27, 18 October 2011 (UTC)