User talk:Wyang/Archive1

斗
As far as I know, no one has expressed an opinion on the edits you tried to retract. I accidentally blocked the IP that added them (presumably you), but I thought I was blocking another IP when I did it. I unblocked the first IP as soon as I discovered my error. My apologies for any misunderstanding. Chuck Entz (talk) 21:12, 18 January 2013 (UTC)


 * All those IPs were me. Wyang (talk) 09:53, 22 January 2013 (UTC)
 * I'm sure there are lots of people from other countries who disagree with the way we drive on the right side of the road in the US, but I've never heard of any of them driving on the left side just to make a point- it wouldn't change anything important, but it sure would make a mess of things... Chuck Entz (talk) 11:29, 22 January 2013 (UTC)


 * You are using an inappropriate analogy. Driving on which side of the road is unimportant because the purpose of driving is to reach a destination, and the options are basically equally efficient in this respect. But here the right side of the road is full of bumps and hollows and may not even lead to the desired destination, whilst the left side is well-paved and -targeted. I know this has been raised a zillion times but whenever the issue was raised, the decision-making coterie has been reluctant to realise the benefits and opposed it fiercely. Yes the issue is complicated, but asking people to comply with whatever rules the previous people came up with but not the most logical rules is not the way to resolve this. This issue is going to be raised another zillion times and people need to examine the issue impartially and accept that a change in format is hugely beneficial to further editing. Wyang (talk) 10:38, 24 January 2013 (UTC)

Template:zh-n.
What is this? Mglovesfun (talk) 11:01, 22 January 2013 (UTC)

囧
Could you please provide the pīnyīn transcription and translations of citations as separate lines instead of in a template? For an example of what I mean, see the quotation at Νεφελοκοκκυγία. —Μετάknowledge discuss/deeds 08:30, 24 January 2013 (UTC)
 * I don't think Pinyin is necessary. Wyang (talk) 10:38, 24 January 2013 (UTC)
 * zh-n. is not an intuitive template, and traditionally any Wiktionary templates that use mouseover to show more information, like certain Persian conjugational templates (if my memory serves me well), have an obvious explanatory line at the top. —Μετάknowledge discuss/deeds 02:16, 25 January 2013 (UTC)

Note: Template talk:pinyin-analyser. —Μετάknowledge discuss/deeds 07:20, 3 February 2013 (UTC)

bí mật
Hi,

Good job! Only điều bí mật is also a synonym for bí mật in my sources. Please add, don't replace words. Do you mind adding a to your user page? --Anatoli (обсудить/вклад) 02:02, 2 February 2013 (UTC)


 * điều bí mật just means "secret things", to emphasise that "bí mật" which has both adjective and noun senses, is used here as a noun. Same for sự which is used if the following word has both verb and noun senses. Wyang (talk) 10:39, 2 February 2013 (UTC)


 * About the header of Babel, I got into the weird part of YouTube, which I may not mind. --Lo Ximiendo (talk) 10:19, 3 February 2013 (UTC)


 * The deal with words such as điều or sự is to display them, anyway. One can use alt=điều bí mật, e.g. bí mật (changing now). Oh, please don't be too sensitive on us using "cmn" instead of "zh". Your language hasn't been destroyed. We cover dialects as well, especially "yue" and "nan". We had long discussions and votes, so we had to compromise. If you add "cmn" in the Babel, users will know that you speak Chinese, especially its standard and most known form - Mandarin. --Anatoli (обсудить/вклад) 11:43, 3 February 2013 (UTC)


 * zh-N, cmn-4 will appear as cmn-N, cmn-4. Wyang (talk) 00:33, 4 February 2013 (UTC)


 * Not sure what you mean, sorry. Do you mean that and  have wrong wordings? --Anatoli (обсудить/вклад) 00:44, 4 February 2013 (UTC)


 * I have replaced 中文 with 普通話/國語 and 普通话/国语 in language user templates, even though it doesn't cover all names for Mandarin, these templates separate Mandarin speakers from 粵語/粤语 speakers, etc. --Anatoli (обсудить/вклад) 00:54, 4 February 2013 (UTC)


 * I speak non-Mandarin Chinese natively (and I don't really know which branch of Chinese this dialect falls under under ISO 639), and can communicate well (4) in Modern Standard Chinese (MSC). I can't find templates to accurately describe this situation. doesn't exist any more, and  is confusing MSC (no code) with Mandarin (a group of Chinese dialects, code: cmn). Wyang (talk) 01:41, 4 February 2013 (UTC)


 * The branches ("dialects") are somewhat determinable by region... I'm sure you could figure out what it's called in English by the ISO if you look it up on Wikipedia. If you feel comfortable telling me where you grew up, I could help finding possibilities for you to check. I think grouping MSC with actual Beijing region dialects is probably the best way to solve that particular subproblem, just based on coverage and similarity. —Μετάknowledge discuss/deeds 01:47, 4 February 2013 (UTC)


 * We don't have templates for all dialects. That may be a problem, but small dialects are usually not in big demand. You can add your dialect to your user page, so people could find you. We merge Mandarin and MSC here, treating entries/translations in standard Mandarin and Northern Chinese as one language. Using, marking words as regional dialects and using other means. Everything is solvable; you can even create templates for your dialect, just discuss some technical details first. --Anatoli (обсудить/вклад) 01:52, 4 February 2013 (UTC)


 * Which dialect is unimportant, because that is too specific. It doesn't make sense that doesn't exist, while  or  do. All are macrolanguages. Arabic or Malay speakers are likely to perceive themselves as speaking Arabic or Malay (not some variety that has an ISO code) when encountering non-speakers, the same way that Chinese speakers do. Wyang (talk) 02:37, 4 February 2013 (UTC)


 * Standard or most common Arabic (including certain colloquialisms common to various dialects, and loanwords) is "ar". For dialects we have "ary", "arz", etc. Arabic wasn't heavily discussed; we didn't have battles and multiple votes about it. As I said, a while ago we reached a compromise for translations:

etc.
 * Chinese:
 * Cantonese:
 * Mandarin:
 * Min Nan:
 * Having "Chinese" as the main header for entries was rejected by some of your compatriots and Taiwanese people and others. "Mandarin" is more specific than "Chinese". When one says "an Arabic word", no-one immediately questions which variety, whereas "a Chinese word" raises questions like "Mandarin or Cantonese". If we used zh instead of cmn, yue and nan, and "Chinese" instead of "Mandarin", "Cantonese" and "Min Nan", we would have a mix-up. In any case, things are the way they are; if you want to open this can of worms, then you can start a discussion in the Beer parlour. I personally don't want any change, and other Chinese editors (native or learners) have got used to the status quo. --Anatoli (обсудить/вклад) 03:03, 4 February 2013 (UTC)

Or,
 * Chinese:
 * Cantonese: 安全
 * Classical Chinese: 安全
 * Gan: 安全
 * Hakka: 安全
 * Huizhou: 安全
 * Jinyu: 安全
 * Mandarin: 安全
 * Middle Chinese: 安全
 * Min Bei: 安全
 * Min Dong: 安全
 * Min Nan: 安全
 * Min Zhong: 安全
 * Old Chinese: 安全
 * Pu Xian: 安全
 * Xiang: 安全
 * Wu: 安全

as in 祕密 or 安全. Specificness is hardly an improvement. Besides, equating "Mandarin" with (or using it to denote) "" or "" is just outright wrong. Wyang (talk) 03:17, 4 February 2013 (UTC)

Thank you!
Thank you for the great Mandarin entries, as well as the Korean bits... if you're ever looking for more red links to create, there are a bunch at User:Tooironic that could use some attention. Again, thanks! —Μετάknowledge discuss/deeds 05:30, 5 February 2013 (UTC)


 * OK. Is there a longer list? Wyang (talk) 05:35, 5 February 2013 (UTC)


 * Yeah, there's also Appendix:HSK list of Mandarin words/Elementary Mandarin (almost done) + Intermediate + Advanced. Please use the HSK categories as in the example entries (most bluelinked belong to them) if you decide to work on them. Note the difference between trad. and simp. in categorizations. --Anatoli (обсудить/вклад) 05:42, 5 February 2013 (UTC)


 * Thanks. I've updated to account for those (damn) templates. Wyang (talk) 05:59, 5 February 2013 (UTC)


 * I wouldn't have a clue how to use it but thanks. Perhaps it's just easier to create entries manually. --Anatoli (обсудить/вклад) 06:12, 5 February 2013 (UTC)


 * Awesome template! @Anatoli, I made a generalised version independently based on the same idea at, but it's a lot worse because it's mainly for minority languages with less infrastructure. For this one, the documentation is currently at Template talk:cmn new. —Μετάknowledge discuss/deeds 06:45, 5 February 2013 (UTC)


 * I've seen it but I still don't understand. Do I need to add it to my User:Atitarev/common.js to be able to use it? What triggers this template and when and how do I use the parameters? --Anatoli (обсудить/вклад) 07:04, 5 February 2013 (UTC)

Template:zh-trad-to-simp
From experience I know that very large switch statements like the one in that template are very slow. I hope you'll take that into account and not use this template often, or always substitute it. You can also try an alternative approach, by using subpages, one for each character, in the same way as. That would be faster I think, especially when there are many options. 03:44, 8 February 2013 (UTC)

နေ
Hi, do both of the Proto-Sino-Tibetan words given at နေ really mean "sun, day"? Or does *g-na-s mean something else? —Angr 20:33, 13 February 2013 (UTC)
 * Yeah, my bad. Corrected. Wyang (talk) 22:14, 13 February 2013 (UTC)
 * Speaking of which, is also from, in spite of the different tone? —Angr 21:26, 10 March 2013 (UTC)


 * Yes, according to Paul Benedict's Sino-Tibetan: A Conspectus. Wyang (talk) 11:05, 11 March 2013 (UTC)

Q about cmn vs. msc
First off, many thanks for your various ZH and KO term additions! (I don't suppose you have any more detail about etym 3, like first appearance or quotes or anything?)

I read above that “equating "Mandarin" with (or using it to denote) "" or "" is just outright wrong.” However, the EN WP article on  says right in the first line that MSC == Mandarin, leaving me confused. I ask purely out of ignorance -- I studied some 普通话 for a couple semesters in university, but most of my time is taken up with Japanese. Given my meager understanding of the wide varieties of Chinese, I'm left wondering what MSC as a spoken lect would equate to, if not Mandarin? I thought Mandarin was the proper English label for 普通话, and I thought too that 普通话 was the same thing as MSC, but perhaps I'm way off the mark? Does "Mandarin", as you understand it, mean the Beijing dialects more specifically? Curious, -- Eiríkr Útlendi │ Tala við mig 07:46, 17 February 2013 (UTC)


 * Thanks. For agwi, I could only find quotations and dialectal forms (agu, akku), not Middle Korean forms. The addition of the obsolete sense of "mouth" by KYPark seems reasonable but needs checking though (may be dialectal instead; I only know of agari). A possible etymological connection between these is interesting: ag- (, ağız) is the common Altaic word for "mouth", "surviving" (from an Altaicist's POV) in Modern Korean as agari (derogatory: "mouth").
 * Wrt MSC, "Mandarin" is the name for a group of Chinese dialects, while MSC is a standardised variety of Chinese. There is no "Standard Mandarin" really, as MSC and written vernacular Chinese (the standardised written form of MSC) serve as de facto standards for spoken (in PRC, ROC, sg) and written (all Chinese-speaking regions) Chinese. Wyang (talk) 01:10, 18 February 2013 (UTC)


 * Thanks for both answers. Interesting about ag-; I note also that Turkish ağız purportedly derives from Proto-Turkic *āgıŕ, and that final "r" appears to have echoes in KO agari and JA anguri (“agape, gawping”).  That said, JA anguri looks like it might ultimately derive from verb aku, “to open”.  That might still be traceable to Altaic “mouth” words, but it seems to get tenuous, unless Altaic also has words of similar sound that have to do with “opening”.  I note that KO  doesn't seem to include any such ak or ag elements, though I suppose this might be the result of some phonetic change from an earlier form.  That said, it looks like 🇨🇬 had aç- (“to open”), from, , which is interestingly close to JA root ak “to open”.  -- Eiríkr Útlendi │ Tala við mig 01:24, 18 February 2013 (UTC)


 * Sorry, Wyang, but I'm sure your answer to the second question is biased. In the Western world, "Mandarin" (language) stands for two things - 1) the most common Chinese dialect (or group of dialects) - 官话 (Guānhuà), 北方话 (Běifānghuà) and 2) the standard Chinese (Putonghua, Guoyu, Huayu) - 普通话 (Pǔtōnghuà), 国语 (Guóyǔ), 华语 (Huáyǔ). It's just a reality. People study Mandarin at universities. Even though "standard Chinese" would be a more correct term, it's seldom used, even in academic circles. Dictionary names still use just "Chinese", e.g. Chinese-English dictionary. --Anatoli (обсудить/вклад) 01:27, 18 February 2013 (UTC)


 * Yes, but the majority of people who call it that probably think of Chinese as a simple dichotomy between Mandarin and Cantonese.. Wyang (talk) 02:57, 18 February 2013 (UTC)


 * There may be some who do, but dictionaries only describe the language as it is used. I can attest that Mandarin classes, where people are especially aware of what Mandarin actually is, still use either Mandarin or Chinese to refer to the standard Chinese language they study, even when they study standard Chinese. There are too many names and too many language codes. The current practice is not based on a lack of knowledge or on confusion but on a compromise. We use the "Mandarin" header even if we talk about Northern Chinese dialects (not a standard Chinese term), like 啥, etc. --Anatoli (обсудить/вклад) 03:33, 18 February 2013 (UTC)


 * I believe it is an inappropriate and inefficient compromise, as the 15 or so headings for Chinese will largely turn out to be reduplications of each other eventually. Wyang (talk) 03:43, 18 February 2013 (UTC)


 * You probably mean a different issue now. Words that are 95-99% identical in dialects but are split into Mandarin, Cantonese, etc.? There are not many editors eager to develop dialects. Min Nan and Cantonese are a slight exception. I don't think you'll have luck persuading the community to merge them into one language but if you stay longer, you may get a case. --Anatoli (обсудить/вклад) 03:51, 18 February 2013 (UTC)


 * That's what I meant. Using "Mandarin" to denote something that should be more appropriately labelled "Chinese" only seems fine now because there are currently practically no additions in other varieties, but it will appear increasingly less appropriate as the category of "Mandarin" starts to become saturated and other varieties grow. Wyang (talk) 04:02, 18 February 2013 (UTC)


 * Well, the community warmed up over time to merging Serbo-Croatian varieties, Romanian and Moldavian, Indonesian and Malay wasn't successful, Hindi and Urdu was never attempted. You can always try and raise it again at Beer parlour. What are you suggesting? Having ==Chinese== header and list all dialect pronunciations? --Anatoli (обсудить/вклад) 04:12, 18 February 2013 (UTC)


 * Yes, examples: compound, character. I agree with you; I too (highly) doubt this will pass if raised. Wyang (talk) 04:17, 18 February 2013 (UTC)


 * Wyang, I'm on your side because lesser bytes in many entries would be nice for me. --Lo Ximiendo (talk) 04:33, 18 February 2013 (UTC)
 * FWIW, I agree as well, not least as there is simply so much overlap between the various Chinese languages/dialects. It just seems wrong to have the headword and etym duplicated so many times over one single page.  And most defs, too, at that.
 * Though that does raise the question of how to handle cases where the same word has different definitions in the different langs/lects. -- Eiríkr Útlendi │ Tala við mig 04:44, 18 February 2013 (UTC)


 * Regional words can be marked with Category:Regional context labels (or, if it exists in too many varieties, simply "dialectal"). Wyang (talk) 04:49, 18 February 2013 (UTC)


 * Hmmm. 'Chinese' is linguistically inaccurate and likely to cause a godawful mess. On the bright side, we already have a godawful mess that is arguably worse. The pronunciation section could also be solved by means of Lua if all topolects go straight from romanisation to IPA without a hitch (Lua will hopefully also remove the need for overly complex templates like py-to-ipa and grc-ipa-rows). —Μετάknowledge discuss/deeds 05:02, 18 February 2013 (UTC)


 * @Wyang. All depends how strong the case is, how you present it. You need to know the moods of other Chinese editors and your possible opponents - their arguments. The arguments will need to be addressed. I'd hate to set up votes myself, since my only vote on banning entries like "Planck常数" in Chinese failed (Votes/pl-2011-10/Mixed script Mandarin entries), even with a compromised solution to having them as soft redirects. I would probably support your idea in the vote. --Anatoli (обсудить/вклад) 05:04, 18 February 2013 (UTC)


 * Just wondering, what would the potential "godawful mess" be like? Wyang (talk) 05:37, 18 February 2013 (UTC)
 * Well, I'll try to explain more clearly. This approach, or to be exact an approach very similar to it, is what they use at zh.wikt, right? It works at zh.wikt because Chinese is something that everybody there knows and that enough people are willing to upkeep. Around here, the merge wouldn't be pretty. Sure, Hakka and Wu will go without a fight. But Cantonese entries, for example, will sometimes diverge from other languages or have a different level of detail, and merging those will often take a human, not to mention that I'm already assuming that somebody is running a bot to do all the easy parts and to import data from zh.wikt, most likely. There are a lot of characters and shared words. So, if you're volunteering to write and run a bot, to sift through entries and to edit your way through massive categories, I still might not support, but I wouldn't oppose. The problem is that if we don't have someone, we could get a mess at least as bad as this one, especially if our format got frozen half-and-half Chinese-based/topolect-based or something horrible like that. —Μετάknowledge discuss/deeds 16:04, 18 February 2013 (UTC)

Appendix:Proto-Sino-Tibetan/m/s-glak ~ m-glaŋ
For technical reasons, we really oughtn't to have a slash in the title... —Μετάknowledge discuss/deeds 23:43, 23 February 2013 (UTC)
 * Is the m/s actually a part of the reconstruction, or does it indicate alternative reconstructions? Normally, alternative forms get their own page. 23:47, 23 February 2013 (UTC)


 * It is part of the reconstruction; there is often variability in the prefix. I see it seems to be uninterpretable by . Don't know how to fix it though - perhaps create a ? Wyang (talk) 23:50, 23 February 2013 (UTC)
 * I don't know anything about Sino-Tibetan, but are you saying that is a prefix, but it's not known whether it was  or ?  23:53, 23 February 2013 (UTC)


 * *m- and *s- were the prefixes reconstructable that could be added to the stem/root to form derivatives. So the whole etymon (treated as a single unit, a word family which would contain multiple allofams) is written *m/s-. Wyang (talk) 23:57, 23 February 2013 (UTC)
 * Doesn't that technically mean that no single form is actually reconstructable for Proto-Sino-Tibetan proper? If this word has two different prefixes that cannot be cognates, then it seems to me that this word didn't actually exist in PST but was only formed later, and that one branch used s- while the other used m-. 00:16, 24 February 2013 (UTC)


 * It is possible for differently prefixed forms to exist in a language simultaneously, with the derived words having divergent or largely identical meanings. For example: ཉལ་བ. Wyang (talk) 00:21, 24 February 2013 (UTC)

Etymology of Latvian tirgus
As you requested on the talk page, I have just added an etymology to that word (plus examples, derived terms, and a picture). --Pereru (talk) 09:51, 26 February 2013 (UTC)


 * Thanks! Looks great as usual. Wyang (talk) 23:30, 26 February 2013 (UTC)

All those ko new templates
What are they for? And why so many? 14:44, 26 February 2013 (UTC)
 * They allow for semi-automated creation of Korean entries and IPA transcriptions. Very helpful, too. —Μετάknowledge discuss/deeds 23:29, 26 February 2013 (UTC)


 * Yes. For edits like the one at 경쟁력. Wyang (talk) 23:30, 26 February 2013 (UTC)
 * Why not just write a Module that does it? That wouldn't require dozens of templates and would be much easier to do. 23:39, 26 February 2013 (UTC)


 * I find MediaWiki code more familiar to work with. I've never used Lua before (so it probably wouldn't be easier, for me..). Wyang (talk) 23:42, 26 February 2013 (UTC)
 * The more I work with Lua, the more I realise how useful it is. It allows you to remove templates and parameters that aren't actually necessary because Lua is able to split strings and look at individual characters. A Lua function that converts, say, Hangul to IPA could be written in only a few lines of code. 00:09, 27 February 2013 (UTC)
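
The string-splitting idea above can be illustrated with a minimal sketch. It is written in Python rather than Lua, and it covers only the first step (splitting a precomposed Hangul syllable into initial, medial and final indices), relying on the arithmetic layout of the Unicode Hangul Syllables block. The function names are hypothetical, and the actual mapping to IPA, including any sound-change rules, is not attempted here.

```python
# Sketch: split precomposed Hangul syllables into jamo indices using
# the arithmetic layout of the Unicode Hangul Syllables block.
HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable
MEDIALS = 21           # number of medial vowels
FINALS = 28            # 27 final consonants plus "no final"


def decompose(syllable):
    """Return (initial, medial, final) indices, or None for non-Hangul input."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None
    index = code - HANGUL_BASE
    initial = index // (MEDIALS * FINALS)
    medial = (index % (MEDIALS * FINALS)) // FINALS
    final = index % FINALS
    return initial, medial, final


def decompose_word(word):
    """Decompose every character of a word (hypothetical helper name)."""
    return [decompose(ch) for ch in word]
```

A transcription routine would then look each index up in per-position tables and apply whatever assimilation rules the target transcription needs; those tables and rules are outside this sketch.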


 * OK, will study this when I have time. (A few lines probably won't be enough, considering the complexities in ). Wyang (talk) 09:43, 27 February 2013 (UTC)

黄泉 etym
I dimly remember reading about the reason that the afterworld is associated with yellow springs, but it's been quite a while and I've forgotten most of the story. Could you add something about that to the etym, assuming of course that you're familiar with the tale? It's a bit obscure otherwise. :) -- Eiríkr Útlendi │ Tala við mig 15:29, 13 March 2013 (UTC)


 * Sure, added. Don't know if my explanation is understandable though :) Wyang (talk) 03:57, 14 March 2013 (UTC)

Mandarin translation of dowsing
Thanks for splitting the SoP translation. I was just being lazy. It's so much easier just to use the JavaScript tool. --Anatoli (обсудить/вклад) 23:30, 20 March 2013 (UTC)
 * No worries. Wyang (talk) 23:33, 20 March 2013 (UTC)

Middle Chinese → Sino-Xenic
I was looking around the Tubes for a handy chart showing how Middle Chinese initials, medials, and finals generally change upon entrance into Japanese, Korean, and Vietnamese, but I could only find charts with example words, for the most part, not generalizations about the phonemes. To what degree is it in fact predictable, and if that degree is high enough, is there a chart anywhere? Thank you —Μετάknowledge discuss/deeds 17:51, 23 March 2013 (UTC)
 * I'm pretty confused... *nyijH and *sijH have the same syllable coda and tone AFAICT. The Sino-Japanese descendants have the same vowel, but the Korean and Vietnamese ones don't. Why is that? —Μετάknowledge discuss/deeds 18:20, 23 March 2013 (UTC)


 * There are detailed explanations of how each Middle Chinese initial/final/tone corresponds to modern Chinese and Sinoxenic readings in many publications written in CJKV languages, however maybe not so much in English. I had a search of Wikipedia and could only find this rather (unnecessarily) rudimentary page at ; the other language versions are more detailed:, , , , . As for predictability, I estimate 90-95% of modern readings to be regular. The percentage is a lot less in Chinese varieties with prominent . The reason for the difference in the vowels is the difference in MC initials. This -ij rhyme corresponds to:


 * Japanese: -i
 * Middle Korean: -uy (velar and laryngeal initials) (> Modern i), -o (coronal sibilant initials) (> a), -i (else)
 * Vietnamese: -ư (coronal sibilant initials), -i/y (else)

Wyang (talk) 07:45, 24 March 2013 (UTC)
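
For illustration only, the three correspondences listed above can be written as a small lookup table in Python. The class labels used as keys ("velar", "laryngeal", "coronal sibilant") and the function name are assumptions made for this sketch; only the reflexes themselves come from the list above, and assigning a given initial to a class is left to the caller.

```python
# Reflexes of the Middle Chinese -ij rhyme by initial class, as listed above.
# The key strings are hypothetical labels chosen for this sketch.
IJ_REFLEXES = {
    "Japanese": {"default": "-i"},
    "Middle Korean": {
        "velar": "-uy",
        "laryngeal": "-uy",
        "coronal sibilant": "-o",
        "default": "-i",
    },
    "Vietnamese": {"coronal sibilant": "-ư", "default": "-i/y"},
}


def ij_reflex(language, initial_class):
    """Return the listed reflex of -ij for a language and an initial class."""
    table = IJ_REFLEXES[language]
    return table.get(initial_class, table["default"])
```

Under the assumption that *s- counts as a coronal sibilant initial, `ij_reflex("Middle Korean", "coronal sibilant")` gives `-o`, while any initial outside the listed classes falls through to the "else" value.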
 * I see, excellent. And thank you for the Wikipedia links; they are very slow reading for me, but better than the English resources by far. How often does the initial affect the vowel like that? (PS: All you have to do is learn Lua and you will be worthy of worship. Your knowledge is impressive in the extreme.) —Μετάknowledge discuss/deeds 16:14, 24 March 2013 (UTC)


 * I'm flattered :) Influence by initial or medial glide is very common. Almost every rime corresponds to multiple finals in the modern language, with the exact reflex depending on initial or glide (-y-, -w-, -ɣ-). Just found a good Wikipedia article explaining the correspondence between MC finals and Beijing Mandarin ones: . Wyang (talk) 00:31, 25 March 2013 (UTC)

Found an issue
Sorry to bug you. Please read Template_talk:ja-romaji. --Anatoli (обсудить/вклад) 04:30, 4 April 2013 (UTC)
 * Replied there. Wyang (talk) 06:47, 4 April 2013 (UTC)


 * The problem keeps coming back, please see Template_talk:ja-romaji. We have removed all deprecated parameters, so it should be easier to code. --Anatoli (обсудить/вклад) 11:43, 7 April 2013 (UTC)

Beer_parlour/2013/April
Hi,

Please say if you have an opinion on this topic. --Anatoli (обсудить/вклад) 04:35, 8 April 2013 (UTC)

拆開
Hi Wyang, I love the automatic entry-creation script, but the IPA it generates seems to be incorrect. See 拆開 for example. Is this fixable? ---> Tooironic (talk) 00:48, 9 April 2013 (UTC)
 * Thanks, but how is the IPA incorrect? Wyang (talk) 01:01, 9 April 2013 (UTC)
 * Also, that template allows more functions:
 * type= : 21 (eg. 市政厅), 12, 22 (if length > 2)
 * e1=, |e2= : new etymology section, definitions for the first and second characters
 * c1=, |c2= : if components for etymology are different from the first and second characters
 * : more definitions
 * wp= : link to zh.wikipedia
 * eg.
 * 市政厅: n
 * 落實: a
 * Wyang (talk) 01:10, 9 April 2013 (UTC)


 * Thank you, that's extremely helpful. I don't actually speak IPA but, I dunno, it doesn't look right to me. E.g. at 内地 is that really how it's written? So many numbers... ---> Tooironic (talk) 01:15, 9 April 2013 (UTC)


 * That's just the tones and the tone sandhis in Beijing Mandarin. Superscripts 1-5 are the same as tone symbols ˩˨˧˦˥; they are easier to recognise typographically. Wyang (talk) 01:19, 9 April 2013 (UTC)
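
The equivalence described above (superscripts 1-5 standing for the tone letters ˩˨˧˦˥) can be shown with a short Python sketch. This is not the code of any existing template; the function name is hypothetical, and the only thing it does is swap superscript digits for the corresponding tone letters.

```python
# Superscript digits 1-5 mapped to the tone letters for pitch levels 1-5.
SUPERSCRIPTS = "\u00b9\u00b2\u00b3\u2074\u2075"   # superscript 1, 2, 3, 4, 5
TONE_LETTERS = "\u02e9\u02e8\u02e7\u02e6\u02e5"   # extra-low up to extra-high
_TABLE = str.maketrans(SUPERSCRIPTS, TONE_LETTERS)


def to_tone_letters(ipa):
    """Replace superscript pitch digits with tone letters; other text is kept."""
    return ipa.translate(_TABLE)
```

A contour written with superscript 5 followed by superscript 1 comes out as the two tone letters for levels 5 and 1; anything that is not one of the five superscripts passes through unchanged.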


 * Wyang, how do you invoke the accelerated entry creation? --Anatoli (обсудить/вклад) 03:50, 9 April 2013 (UTC)


 * I got, thanks. Have you documented the code somewhere? c1 and c2 don't seem to work. I tried n . Also how do you create entries for simp. or trad. only (not both)?--Anatoli (обсудить/вклад) 03:58, 9 April 2013 (UTC)


 * It worked when I tried on 市政厅 . The instructions are at . This code is for one entry only, so for simp+trad, you have to submit the code on both pages, and for one of simp/trad, it's one submission at the missing entry. Wyang (talk) 04:10, 9 April 2013 (UTC)


 * Nice tool! I got it to work with the shorter version (cmn new/a) on 废除 and 廢除.
 * User:Ruakh has developed a nifty tool (User:Ruakh/Tbot.js) for accelerated Russian entries creation from translation sections. I enabled it here. Do you think you could create the same for Mandarin? Just clicking on a red link (green if the script is enabled) in translations creates an entry in Russian. I'm just filling the rest manually (inflection, etc.). --Anatoli (обсудить/вклад) 04:28, 9 April 2013 (UTC)


 * The cases are a little different. In translations {{t|cmn for trad is not assigned a transcription, so it would not (?) be possible to extract pinyin for the missing trad entry. I'm not used to writing .js things like that, so having me digest what's written there will probably take days. The current code is simple enough, for me... Wyang (talk) 04:41, 9 April 2013 (UTC)


 * I thought I'd let you know. The transcription for trad. is the same as for simpl., so tr= can be copied from simplified.
 * After enabling 'cmn' and clicking on the green link (减弱 - appears green on mine in Translation section) in abate I instantaneously got this:

Verb

 * 1) abate to bring down or reduce to a lower state


 * The gloss, the part of speech, tr are all there, only the code is generic and uses {{t|head. I find both yours and his work amazing pieces for accelerated development, keep up the good job. --Anatoli (обсудить/вклад) 04:52, 9 April 2013 (UTC)


 * With cmn the trad-simp conversion has to be done in addition, which is preferably achieved through substitution of existing trad-simp lists (which is what {{cmn_new is doing). I envisaged filling a missing entry with the code {{subst:cmn new/a|p1=PINYIN|PoS|defn}} when clicked, then having to decompose the PINYIN into syllables separated by |p2= etc. Since trad form is not assigned a |tr= in translation sections, one would have to copy the pinyin manually. So for simp it's one extra step of decomposition of PINYIN, while for trad it's two extra steps. The overall process is not hugely simpler than substituting {{cmn_new from scratch, that way the definition and SoPness are also checked (which is important for cmn as the definitions are likely different from the translation glosses). Wyang (talk) 05:04, 9 April 2013 (UTC)

Adding new languages
Hi,

Sorry to be a nuisance. Could you please write a basic script to create new Russian and Japanese entries (in this order) like you did for Mandarin, Korean and Vietnamese?

A basic Russian (ru) noun entry is very simple but it needs gender (g) and transliteration. For example: "лютик" - a buttercup (leaving the entry uncreated)

Noun

 * 1) buttercup

Interjections, conjunctions, particles, prepositions use {{head||ru|preposition... Japanese entries are more complicated, divided into hiragana, katakana, kanji and I don't know if it's feasible. --Anatoli (обсудить/вклад) 00:45, 11 April 2013 (UTC)


 * No worries. Russian one done at . Don't really know how Japanese entries should be formatted. If you can point to me all format possibilities, that'd be great. Wyang (talk) 01:12, 11 April 2013 (UTC)


 * Wow! That was quick. Thank you! Will test and come with some feedback. Japanese can be basic and more complex, depends how far you're willing to go. Didn't give as I didn't know if you would agree. --Anatoli (обсудить/вклад) 01:36, 11 April 2013 (UTC)


 * Japanese should be alright. Forms are easy to detect (see if it's a pure hiragana or katakana string, assign it as such if so; otherwise, kanji if no kana present, mixed if kana present), and script conversion shouldn't be an issue as well (hira to kana and to romaji, or the reverses, depending on requirement). Wyang (talk) 01:40, 11 April 2013 (UTC)
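
The detection rule described above could be sketched as follows. This is a hedged Python illustration, not the code of the template or of any module; the function names and the returned labels are hypothetical, and the Unicode block ranges are used as the test for kana. It follows the rule exactly as stated, so a string with no kana at all (even a Latin one) is labelled "kanji".

```python
def is_hiragana(ch):
    # Assigned characters of the Unicode Hiragana block.
    return "\u3041" <= ch <= "\u309f"


def is_katakana(ch):
    # Unicode Katakana block, including the prolonged sound mark.
    return "\u30a0" <= ch <= "\u30ff"


def classify(term):
    """Label a term as hiragana, katakana, mixed or kanji, per the rule above."""
    if not term:
        raise ValueError("empty term")
    if all(is_hiragana(ch) for ch in term):
        return "hiragana"
    if all(is_katakana(ch) for ch in term):
        return "katakana"
    if any(is_hiragana(ch) or is_katakana(ch) for ch in term):
        return "mixed"
    return "kanji"
```

A stricter version would also check that the non-kana characters really are kanji, which this sketch deliberately leaves out.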


 * JA noun example 愛国心, kanji, need to provide kanjitab (this is done simpler than Chinese hanzi), wikified hiragana, romaji:

Noun

 * 1) patriotism


 * JA noun example あいこくしん, hiragana, need to provide romaji and {{ja-def:

Noun

 * 1)  patriotism


 * JA noun example アニメ, katakana, need to provide romaji and, hidden index (convert katakana to hiragana アニメ -> あにめ but without voiced consonants (e.g. が (ga)-> か (ka)) - this part may be hard, will check with Haplology or Eirikr). It's fine if just a parameter, without any tricks.

Noun

 * 1) anime
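
The hidden-index conversion mentioned for the katakana example (katakana to hiragana, without voiced consonants) might be sketched like this in Python. It assumes that shifting code points by 0x60 is an acceptable katakana-to-hiragana mapping and that removing the combining voicing marks after Unicode decomposition is what "without voiced consonants" means; the function names are hypothetical, and how the prolonged sound mark should be indexed is not addressed.

```python
import unicodedata


def kata_to_hira(text):
    """Shift katakana letters to the matching hiragana; leave the rest alone."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def strip_voicing(text):
    """Drop the combining voiced and semi-voiced marks after decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch not in "\u3099\u309a")


def sort_key(term):
    """Hiragana index form of a kana term, with voicing removed."""
    return strip_voicing(kata_to_hira(term))
```

With these definitions アニメ becomes あにめ, and a voiced kana such as ガ is reduced to か.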



Thanks. Will look into these. (Forgive me if I seem unresponsive and distracted by real life...) Wyang (talk) 01:58, 11 April 2013 (UTC)


 * Hi, no pressure at all, just reminding that I still need Japanese templates. A simple template, without IPA or script conversions will do, as long as the formatting matches the above. --Anatoli (обсудить/вклад) 03:25, 12 April 2013 (UTC)


 * Done: see . Wyang (talk) 04:01, 12 April 2013 (UTC)


 * Great stuff, thank you! Only I don't understand how the template detects the script. Is it automatic or by the params used?
 * Please see ラジオカセット, the romaji has "ッ" in "rajiokaseッto". --Anatoli (обсудить/вклад) 04:16, 12 April 2013 (UTC)


 * It's automatic. Yeah, I haven't done geminate consonants yet... Now done. Wyang (talk) 04:31, 12 April 2013 (UTC)


 * I see. It's only for nouns at the moment? Isn't it? Even so, is not so great. :) --Anatoli (обсудить/вклад) 04:33, 12 April 2013 (UTC)


 * Fixed. (I didn't know what's the parameter for mixed script, so I just put 'm') Other PoS enabled too. Code has been altered, see the template for example usages. Wyang (talk) 04:48, 12 April 2013 (UTC)

Script detecting
Is M.script in Module:ja for detecting the script? If so, there are easier ways to do so. --Z 08:21, 11 April 2013 (UTC)

More requests
I am a serial pest and I'd like to ask you for two more very simple templates - pinyin and romaji, if you haven't created them yet. (I hope I can learn from these how to create my own).

Pinyin is standard; romaji is still in a bit of limbo, but we have many thousands of romaji entries, so I don't see them being reverted soon.

Pinyin can have one or two parameters (and more) per line, e.g. dòngnéng:

Romanization
Romaji can have up to six params, e.g. akachan:

Romanization
They are not as important and the templates are already very easy. Perhaps this could be done differently, by some accelerated method like English plurals (green links) or something. --Anatoli (обсудить/вклад) 06:50, 12 April 2013 (UTC)


 * No worries. Done at and . (despite the fact that I do not support keeping these romanised entries in the long term...) Wyang (talk) 12:53, 12 April 2013 (UTC)


 * Excellent! Many thanks.

rs value for 纯洁
Hi,

I think Template:cmn new incorrectly generated the rs value for "纯洁". It should be 纟04, not 糸04. But I used the "t" parameter, not "s" (by mistake). It didn't matter on "厂长" though, where I also used "t" (I wonder how the script figured it out). --Anatoli (обсудить/вклад) 01:12, 17 April 2013 (UTC)


 * I see. They are basically the same radical, one is the combining or simplified form (纟) of the radical form (糸). Characters should be listed under the radical forms. I don't think I'm able to separate the two at the code level, since the subpages of Index:Chinese radical (where I extracted my rs values from) do not differentiate the two. I suggest merging the combining or simplified forms into the radical forms (like those subpages), which can be done if you replace all the in Category:Mandarin headword-line templates with  (They are protected so I can't edit them). Wyang (talk) 01:39, 17 April 2013 (UTC)


 * There is no need to specify traditional/simplified. They are detected automatically using conversion lists . Wyang (talk) 01:41, 17 April 2013 (UTC)


 * Thank you. Automatic script detection seems to work well. I know that 纟 is a simplified form of 糸. They have usually been sorted differently though. You lost me with your suggestion. Are you basically suggesting that both the simplified and the radical forms be sorted the same way, if they use radicals, not pinyin, for sorting? Will this also affect characters like 門 and 开? Not sure if I understand this correctly. --Anatoli (обсудить/вклад) 02:17, 17 April 2013 (UTC)


 * I was suggesting that the combining forms and simplified forms of the radicals be treated as identical to the basic radical forms. It would affect 門 but not 开. Or, even better, just sort everything in pinyin and get rid of this parameter altogether (currently uses an awful mix of sorting methods if you have a look at the code). Wyang (talk) 03:35, 17 April 2013 (UTC)
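
The proposal to treat combining and simplified radical forms as identical to the basic radical could be sketched like this. The mapping lists only a handful of well-known pairs as examples, and the "radical plus two-digit residual stroke count" layout of the rs value is inferred from the "纟04" example above; the names `RADICAL_VARIANTS` and `normalize_rs` are hypothetical.

```python
# Illustrative subset: simplified or combining form -> basic radical form.
RADICAL_VARIANTS = {
    "纟": "糸",
    "讠": "言",
    "钅": "金",
    "饣": "食",
    "门": "門",
}


def normalize_rs(rs: str) -> str:
    """Rewrite an rs sort value such as '纟04' so that variant radical
    forms sort together with the basic radical."""
    radical, strokes = rs[0], rs[1:]
    return RADICAL_VARIANTS.get(radical, radical) + strokes
```

Under this sketch both "纟04" and "糸04" normalise to "糸04", so 纯 and 純 would sort in the same place.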


 * From a Western perspective, I was wondering this whole time why we don't just sort by pinyin. I assumed it is out of respect to traditional Chinese lexicography. In any case, I'd support that. —Μετάknowledge discuss/deeds 03:57, 17 April 2013 (UTC)


 * Sorry, I put the wrong character, I meant 開 and 开, the two equivalents (t/s).
 * @Metaknowledge. The arguable benefit for sorting by radical, not pinyin was meant for people (primarily Chinese) not familiar with pinyin, speakers of dialects. Especially applicable to overseas Chinese where pinyin was not taught and various input methods exist, which don't rely on pronunciation of characters. The designer of this method - User:A-cai comes from Taiwan where dictionaries are also structured more towards radicals and romanisation, including two variants of pinyin have changed over time and still don't enjoy the full support of the population.
 * I have already expressed my support for sorting words by numbered pinyin. If you're able to do it, go ahead. All Chinese-speaking editors have already expressed support. I think for people not knowing Chinese, it's still possible to find the words they want just by entering them in the search window. Sorting by pinyin is more beneficial for learners than for native speakers.
 * This discussion reminded me of another thing I wanted to ask Chinese-speaking editors. (Tooironic wasn't very enthusiastic about it when I asked him a while ago.) Please take a look at Category:Japanese_terms_by_their_individual_characters. I think creating categories of terms by their individual characters is a brilliant idea. In Chinese their number could be quite big, but finding words that use a specific character is very helpful for learners. But creating categories can be done automatically, can it not? Adding to categories is done via Template:ja-kanjitab. We could tweak Template:Hani-forms and Template:zh-hanzi to automatically add words to categories like e.g. Category:Mandarin terms spelled with 始. What do you think? User:Daniel Carrero has created this, see User_talk:Daniel_Carrero. --Anatoli (обсудить/вклад) 04:44, 17 April 2013 (UTC)


 * I don't think I support creating such categories (for Chinese and Japanese). It's going to be more troublesome to maintain those categories for Chinese. All one has to do to find all compounds and related pages of 始 is to go to or some page like this. Wyang (talk) 05:08, 17 April 2013 (UTC)


 * The trouble with those links is that they mix in all other languages using the same character, translations and pinyin entries. The format is not user-friendly either.
 * Daniel might be able to help with making it work. When a Japanese entry is created, the categories are added automatically. Their structure is identical and only differs by the kanji and sort value. The category Category:Japanese_terms_spelled_with_始 lists only Japanese words, no others. Just looking at the list is educational and shows how words can be created with the character, especially common infixes or suffixes. I think creation of the categories can be automated as well. --Anatoli (обсудить/вклад) 05:24, 17 April 2013 (UTC)


 * Creating these is easy, but I am not particularly fond of the idea. These should be information provided at the character page, which should explain the definitions and the relevant compounds. It shouldn't be done with thousands of categories. Wyang (talk) 05:29, 17 April 2013 (UTC)

IPA 今兒 and 今儿
It may be hard to generate correct IPA for erhua (儿化) by your template. Besides, the reading can be as expected, like 女儿. Anyway, could you fix the IPA and check the entries otherwise, please?

For words with 儿 we agreed to make alternative forms, like 沒事兒 and 没事儿, unless it's a different word. 今儿 probably won't qualify as an alternative form of 今天. --Anatoli (обсудить/вклад) 01:56, 19 April 2013 (UTC)


 * p1= should be 'jīnr' and |p2= is empty (although the template automatically generates pronunciation for the second character too... will solve this with the new Module:zh-based template). The pronunciation is still generatable by substituting, eg. jīn . I've created . These are not really alternative forms, they are diminutive forms (like Dutch -je, English -ling) with sometimes different meanings. Wyang (talk) 01:58, 19 April 2013 (UTC)


 * Thanks. Just on the erhua forms. It may be worth separating words that simply attach 儿 (没事 -> 没事儿) from those that replace another form (今儿 = 今天, 这儿 = 这里). I think 今 is seldom used as "today" and 这 doesn't mean "here". --Anatoli (обсудить/вклад) 02:05, 19 April 2013 (UTC)


 * 这儿 is formed from 这 ("this, such, here"), not 这里. The development of 这 -> 这里 is parallel to the development of 这 -> 这儿. The original un-erhua-ed forms may not be in use colloquially, but etymologically this is how erhua forms are generated. Wyang (talk) 02:10, 19 April 2013 (UTC)


 * I'm aware of this (origin). I just found Category:Mandarin erhua terms, could you perhaps repoint the template and/or merge the categories? --Anatoli (обсудить/вклад) 02:33, 19 April 2013 (UTC)
 * Repointed the template. Also, how can you enable sorting by pinyin? --Anatoli (обсудить/вклад) 02:54, 19 April 2013 (UTC)


 * They should be sorted now. Wyang (talk) 03:25, 19 April 2013 (UTC)


 * Thank you. --Anatoli (обсудить/вклад) 03:26, 19 April 2013 (UTC)

Template:cmn new and erhua
Could you have a look at this, please? The generated pinyin and IPA wasn't right:


 * Oops, forgot the entry: 夫妻店儿, created by Tooironic. --Anatoli (обсудить/вклад) 12:10, 19 April 2013 (UTC)


 * Yeah... Avoid erhua entries for now. Let me work on Module:zh. Wyang (talk) 12:07, 19 April 2013 (UTC)


 * OK, no worries. --Anatoli (обсудить/вклад) 12:10, 19 April 2013 (UTC)

Etym for 年度
I was hoping you might be able to help with the etymology for Japanese. I suspect this was in use in China some time back, but there's a slim chance it's a more modern coinage. Do you have any insight? If so, please change the etymology there as appropriate. TIA, -- Eiríkr Útlendi │ Tala við mig 16:49, 15 May 2013 (UTC)

Come back
Hi,

You should come back, there's a lot of work with Mandarin or Chinese if you want to call it so. --Anatoli (обсудить/вклад) 01:56, 24 May 2013 (UTC)

JA etyms for Buddhist terms
I'm chewing on the etymologies for various kinds of 如来. I'm working on the assumption that Buddhist terms would have been imported from Chinese wholesale; JA-JA dictionaries give etymologies traced back to Sanskrit, which must have come via China, especially considering the history of Buddhism and even literacy in Japan.

However, I'm uncertain if names like were brought into Japanese as an integral unit, or if the name portion of  came into Japanese, with the  added in Japan. I ask because I'm noticing that some of the Five Dhyani Buddhas show up on the ZH WT with the epithet instead of, such as zh:w:不空成就佛 vs. ja:w:不空成就如来. I do see that zh:w:不空成就如來 redirects to the zh:w:不空成就佛 entry, and does find over 100K hits, but in terms of etymologies, I'm not sure what's best.

For now, I'm going on the assumption that the names were imported into JA as integral units from Middle Chinese. Please hit me over the head with the cluebat as necessary. :) -- Eiríkr Útlendi │ Tala við mig 17:43, 24 July 2013 (UTC)


 * 大日如來 is attested in "大日經" ("大毗盧遮那成佛神變加持經"), translated by into Chinese in 724. I'd agree with your assumption. Wyang (talk) 03:54, 25 July 2013 (UTC)

Etymologies again -- KO and JA
Thank you for your etymological activities of late, I very much appreciate the fuller picture of various KO terms.

Along similar lines, I was wondering about. Modern JA has, , deriving from OJP opoji, which at first glance looks like a possible relative to KO.

That said, OJP opoji is itself a compound of opo “big, great; many” (root of modern JA, ) + ji < chi =. Any chance that is also of compound derivation?

TIA, -- Eiríkr Útlendi │ Tala við mig 18:43, 25 July 2013 (UTC)


 * abeoji was abi in the 15th century. The form abeoji clearly violates vowel harmony and seems to be of late origin, formed from abi + some sort of suffix -Aji. The -Aji (-아지/어지) suffix is probably the same suffix as in 바가지 (< 박, "gourd"), 싸가지 (< 싹, "hope" < "bud"), or even the diminutive suffix -ngaji in 송아지, 강아지. The first component abi was probably Altaic: Turkish aba, Mongolian abu. I'm not sure of the etymology of the Japanese words. Wyang (talk) 00:28, 26 July 2013 (UTC)


 * Brilliant, exactly the kind of detail I was hoping for. It sounds pretty clear from your description then that opoji and abeoji are only superficially similar, and that OJP chi, "male" is no match for Middle Korean aji, "diminutive suffix". :) Thank you! -- Eiríkr Útlendi │ Tala við mig 01:12, 26 July 2013 (UTC)

“Phonosemantic interpretation” of Chinese characters
Just a heads-up that I’ve commented on a discussion you’ve participated quite extensively in: Beer parlour/2013/June Cheers!
 * —Nils von Barth (nbarth) (talk) 14:53, 1 August 2013 (UTC)

老头儿 and 노틀
Do you have a source to back up why you think these two terms are related? I can't find anything on the Internet. ---> Tooironic (talk) 09:22, 1 September 2013 (UTC)


 * For example, . Wyang (talk) 12:52, 3 September 2013 (UTC)

Korean templates
Hi,

How do I use your Korean template(s) to generate RR transcription for Korean terms? --Anatoli (обсудить/вклад) 22:51, 9 November 2013 (UTC)


 * Hi, just created a new template. Please use 가방 . Wyang (talk) 03:13, 11 November 2013 (UTC)


 * Thank you! Do the rules match those described by Shinji, his Korean link and the French templates? If you don't know, then I guess I will need to run test cases with your templates as well and see if they are acceptable. Can it work for single words only, not strings with spaces? --Anatoli (обсудить/вклад) 03:22, 11 November 2013 (UTC)


 * It basically matches the official guidelines, except that it uses dashes in consonant + vowel syllable divisions, i.e. bur-yaseong instead of buryaseong (This can be changed by replacing all "-" in the produced string). This template was written before the advent of Lua, so it is quite slow and may not be useful with long strings, but for simple strings like 5 - 6 characters this should be sufficient. To incorporate Lua into this template would probably involve a quite substantial rewrite. Wyang (talk) 03:29, 11 November 2013 (UTC)


 * I just wanted to verify what it does. If it works accurately, even if only with short strings, it can still be used to verify {{user|Kephir}}'s module or to add test cases based on the results from your template. E.g. 있습니다 worked OK ("issseumnida") but 갋 gives "ga". You probably need to test it with the Module:ko-translit/testcases examples (some examples don't match the RR transcription rules). --Anatoli (обсудить/вклад) 03:50, 11 November 2013 (UTC)


 * Yes double consonants in codas were not included - couldn't be bothered to do research and generate a large matrix of how to romanise all combinations (also because exceptions are quite common), so just left out this bit altogether. Wyang (talk) 03:58, 11 November 2013 (UTC)
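
To show where the cluster codas come from, here is a small Python sketch of the standard Unicode arithmetic for decomposing a precomposed Hangul syllable. It is not the template's code; `decompose` and `has_cluster_coda` are hypothetical names. A romaniser would need an extra rule for each of the eleven cluster finals flagged here, which is the "matrix" that was left out.

```python
# Final consonants in Unicode order; index 0 means "no coda".
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Codas written with two different consonants.
CLUSTER_FINALS = {"ㄳ", "ㄵ", "ㄶ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅄ"}


def decompose(syllable: str) -> tuple:
    """Return (initial, medial, final) indices of a Hangul syllable."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, 588)  # 588 = 21 medials * 28 finals
    medial, final = divmod(rest, 28)
    return initial, medial, final


def has_cluster_coda(syllable: str) -> bool:
    """True if the syllable ends in a two-consonant cluster."""
    return FINALS[decompose(syllable)[2]] in CLUSTER_FINALS
```

A syllable such as 갋 is flagged by `has_cluster_coda`, while 방 (single coda) and 가 (no coda) are not.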

"measure word", "counter" and "classifier" - headers
Hi,

You might be interested in this topic: Beer_parlour/2013/November. --Anatoli (обсудить/вклад) 01:44, 29 November 2013 (UTC)

cháy
Thank you for adding an etymology to this entry. I'd love to add etymologies like these to the Vietnamese Wiktionary, but I've had very little luck finding etymologies apart from modern loanwords. What sources do you consult?

Also, I noticed that the Mường word cal³ is given in an orthography I'm not familiar with. The Vietnamese Wiktionary has been using the Vietnamese-based orthography that seems to be ubiquitous among Vietnamese academic and government sources, since the Mường live in Vietnam. (This orthography uses Vietnamese tone marks rather than tone numbers.) Where can I find out about the orthography you're using?

– Minh Nguyễn (talk, contribs) 05:39, 16 December 2013 (UTC)


 * Hi Minh. The Sealang Mon-Khmer Comparative Dictionary is a very useful resource for this purpose, which includes results from Shorto's Mon-Khmer Comparative Dictionary, and Ferlus' unpublished 2007 manuscript "Lexique de racines Proto Viet-Muong" (from the POV of Vietnamese). The notation is per Ferlus (2007) - The Mường form using Vietnamese diacritics appears to be chẳl. Wyang (talk) 06:00, 16 December 2013 (UTC)


 * Wow, that's awesome! What's the copyright status on the database? Some of the citations have tooltips that say, "Do not cite entries from this manuscript!" What's that about?


 * If I'm not mistaken, spellings like cal³ in the database are just IPA transcriptions with Chinese-style tone numbers. It would be more appropriate to use the "local orthography" field in . Some linguists use ad-hoc transcriptions, but the Vietnamese-based one seems to be prevalent in the media and other dictionaries. Would you mind if I changed transcriptions like cal³ to chẳl when I see them?


 * – Minh Nguyễn (talk, contribs) 06:51, 16 December 2013 (UTC)


 * Sure, I have changed it myself. Are there any good online resources describing the phonology or orthography of tieng Muong, or other Vietic languages? Googling in English and Vietnamese does not seem to yield much that is useful. The tooltip note means the work is a preliminary unpublished manuscript and is subject to errors. Wyang (talk) 21:38, 16 December 2013 (UTC)


 * I've found very little online, but here's a decent primer on the Mường orthography (Flash Paper) written in Vietnamese. In general, Vietnamese glosses are given after Mường words. Let me know if you have any questions about this document. – Minh Nguyễn (talk, contribs) 13:09, 17 December 2013 (UTC)


 * Excellent, thanks! :) Wyang (talk) 01:15, 18 December 2013 (UTC)

Template:vi new
I'm trying to find new Vietnamese words but what is this template supposed to do for me? TeleComNasSprVen (talk) 07:44, 28 December 2013 (UTC)
 * There's no documentation or examples but I have just created [[cùng nhau]] using this code on a blank page: . --Anatoli (обсудить/вклад) 13:13, 28 December 2013 (UTC)

NEW templates
Hi,

Could you document a bit your templates, like Template:vi new and Template:ko new. A few examples would do. Not sure what happened with phim (thanks for fixing!). I used. What did I do wrong? --Anatoli (обсудить/вклад) 02:11, 6 January 2014 (UTC)


 * Hi, should be phim Wyang (talk) 02:18, 6 January 2014 (UTC)

Inaccuracy in the Korean verb template
Hi, please join this discussion, if it's okay with you. --Anatoli (обсудить/вклад) 09:43, 23 January 2014 (UTC)

Zhuyin and erhua
Hi,

Thanks for your efforts on the conversion module! I posted a question there (described as I personally see it), copying here:

Issue with erhua: Although erhua is very unpopular in Taiwan, we still need to convert such forms correctly, but there may be back-conversion problems. ㄦ is equivalent to a full syllable "ēr" (first tone, without a tone mark). To convert Pinyin like wánr and dàir, we probably need to produce ㄨㄢˊㄦ˙ and ㄉㄞˋㄦ˙. Converting them back would give wáner (wán+er) and dàier (dài+er). I can't find a definitive explanation of how to transliterate erhua using Zhuyin, but Pleco uses ㄦ˙ (with a neutral tone marker) to render the "-r" suffix.

Could you reply at Grease_pit/2014/January, please?

I have also put there my ideas about what needs to happen next, when coding and testing is complete, please comment, if you can. --Anatoli (обсудить/вклад) 22:30, 27 January 2014 (UTC)


 * Hi, replied at Module talk:PinyinBopo-convert/testcases. Wyang (talk) 22:58, 27 January 2014 (UTC)
 * I've commented at Module_talk:PinyinBopo-convert/testcases. Your feedback is important on this, if you're familiar with erhua spelling in Zhuyin. --Anatoli (обсудить/вклад) 01:01, 28 January 2014 (UTC)

Pinyin dì'èr shǒu, hm
Hi,

Could you take a look at Module:PinyinBopo-convert/testcases, please? "dì'èr shǒu" becomes "ㄉㄧˋ 'ㄜˋㄦ ㄕㄡˇ" but should be "ㄉㄧˋ ㄦˋ ㄕㄡˇ". Perhaps if apostrophes are removed before the conversion it'll work. Also, what would the Zhuyin for Pinyin "hm" look like, as in 噷, also hèn? I'm just trying to cover all corner cases, not trying to bombard you with requests :). --Anatoli (обсудить/вклад) 22:09, 28 January 2014 (UTC)


 * Hi, no worries. The former was taken into account and works as expected: [Module call redacted]. However apostrophe in PAGENAME fails to be recognised no matter what I do to the module. Thus  fails at dì'èr shǒu. I am not sure what can be done to fix this. As for the latter, how should 'hm' etc. be transcribed in Zhuyin? Wyang (talk) 22:40, 28 January 2014 (UTC)
 * Thanks. I'll post a question on Grease_pit/2014/January regarding the apostrophes, they are quite common in pinyin. I'll also search more for "hm", the dictionaries I've checked so far only had "hèn". --Anatoli (обсудить/вклад) 22:48, 28 January 2014 (UTC)


 * I have imported the whole article 噷 from my Pleco dictionary (after setting transliteration to Zhuyin); all usage examples also get Zhuyin:
 * 噷
 * ㄏㄇ˙
 * {interjection} (expressing disapproval or reproach) humph


 * 噷, 别提了.
 * ㄏㄇ˙, ㄅㄧㄝˊㄊㄧˊ ㄌㄜ˙.
 * Humph, don't bring that up.


 * 噷, 算了吧.
 * ㄏㄇ˙, ㄙㄨㄢˋ ㄌㄜ˙ ㄅㄚ˙.
 * Humph, forget about it.
 * So, string "hm" should be just "ㄏㄇ˙"--Anatoli (обсудить/вклад) 23:17, 28 January 2014 (UTC)


 * OK, should be like that now. Wyang (talk) 23:20, 28 January 2014 (UTC)
 * Thanks. I have asked a question about apostrophes. Have you tried using codes for apostrophes, rather than literals? --Anatoli (обсудить/вклад) 23:54, 28 January 2014 (UTC)


 * The apostrophe only fails to be recognised when it is part of PAGENAME, using it inside the string is not buggy (as I said above). For the latter, I am not sure I understand what you mean in the question.. Wyang (talk) 23:59, 28 January 2014 (UTC)
 * Sorry for being dumb, I have misunderstood you :) Could you clarify what you're trying to achieve at Grease_pit/2014/January because I can't help you here? Note I had to put double quotes around "dì'èr shǒu" in the test case but it's still reporting as "failed". --Anatoli (обсудить/вклад) 00:07, 29 January 2014 (UTC)


 * I am terrible at making myself understood. I have replied there, hopefully it is understandable. Wyang (talk) 00:15, 29 January 2014 (UTC)
 * No-no. It makes sense, I just hadn't read it carefully the first time. Maybe a silly suggestion, but have you tried storing PAGENAME in a variable, displaying it first, removing the apostrophe, displaying it again, then converting and checking the result, in this order? It's not easy to debug Lua, I know. --Anatoli (обсудить/вклад) 00:21, 29 January 2014 (UTC)


 * No, I haven't tried that. I guess I will wait for the more knowledgeable to kindly shed some light on the problem first, and resort to my dilettantish skills if all else fails... Wyang (talk) 00:31, 29 January 2014 (UTC)


 * More "weird" Pinyin and Zhuyin: ng=兀 (with various tones or neutral, Pleco lists a few) as in 嗯 and 㕶. 兀 is both a Han character and a Zhuyin symbol reserved for non-Mandarin sounds and interjections like this. --Anatoli (обсудить/вклад) 00:56, 29 January 2014 (UTC)


 * Should be OK now. Wyang (talk) 01:06, 29 January 2014 (UTC)


 * It might be more complicated as "ng" is realised as ńg, ňg, ǹg with tones. I would add it myself but I don't understand the code well. :) BTW. re 第二手 - User_talk:Atitarev, has some concerns about limiting it to Taiwan, me too :) --Anatoli (обсудить/вклад) 01:30, 29 January 2014 (UTC)


 * I think it is done too now. Wyang (talk) 02:08, 29 January 2014 (UTC)
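
The two corner cases in this thread (apostrophes as syllable separators, and vowelless interjections such as "hm") can be illustrated with the Python sketch below. It is not Module:PinyinBopo-convert; the `SYLLABLES` table is a hypothetical stub holding only the syllables quoted above, and the approach assumes the apostrophe is treated purely as a syllable boundary before lookup.

```python
import unicodedata

# Hypothetical stub table: only the syllables quoted in this thread.
_RAW = {"dì": "ㄉㄧˋ", "èr": "ㄦˋ", "shǒu": "ㄕㄡˇ", "hm": "ㄏㄇ˙"}
SYLLABLES = {unicodedata.normalize("NFC", k): v for k, v in _RAW.items()}


def to_zhuyin(pinyin: str) -> str:
    """Convert space- and apostrophe-delimited pinyin to Zhuyin,
    one syllable at a time."""
    text = unicodedata.normalize("NFC", pinyin).replace("\u2019", "'")
    result = []
    for word in text.split():
        # The apostrophe only marks a syllable boundary, so split on it
        # instead of passing it through to the output.
        for syllable in word.split("'"):
            if syllable:
                result.append(SYLLABLES[syllable])
    return " ".join(result)
```

With this stub, "dì'èr shǒu" converts to "ㄉㄧˋ ㄦˋ ㄕㄡˇ" and "hm" to "ㄏㄇ˙", matching the forms requested above.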

第二手
Hi Wyang. What makes you think this is Taiwanese Mandarin? I've heard Mainlanders use it before. ---> Tooironic (talk) 01:48, 29 January 2014 (UTC)


 * I don't know, I just thought it sounded strange, so I assumed it is a Taiwanese usage. To me it just means "secondary (information)". The more common term is 二手. It seems 二手 is more common in Taiwan as well. Wyang (talk) 01:59, 29 January 2014 (UTC)
 * I've changed to "rare" for the correct categorisation (Mandarin terms with rare senses, not rare forms), it doesn't mean I agree it's rare. I don't know. --Anatoli (обсудить/вклад)


 * I've been seeking used bookshops in Taipei recently and they all seem to use just 二手 rather than 第二手, or the other purported Taiwanese term 中古 for that matter. -- 18:57, 5 February 2014 (UTC)

fèiyong
Hi,

Could you have a look at Zhuyin for fèiyong, please? I've added it to Module:PinyinBopo-convert/testcases--Anatoli (обсудить/вклад) 00:52, 6 February 2014 (UTC)


 * Hi, it seems Lua is having a bit of a meltdown at the moment, if you try . I don't know what is going on. Wyang (talk) 00:58, 6 February 2014 (UTC)


 * Sorry. I don't understand it but other cases seem to be alright at Module:PinyinBopo-convert/testcases. It seems i+y cause this. yòng looks OK. I wish I could help more. --Anatoli (обсудить/вклад) 01:04, 6 February 2014 (UTC)


 * There is no problem now. Could you have a look at Module talk:PinyinBopo-convert? Are all the conversions there working on your computer? Wyang (talk) 01:13, 6 February 2014 (UTC)
 * Yes, they do. I am on the lookout for these. Thank you!
 * Sorry to be a serial pest. I have 2 requests for and, when you have time and if you have interest. In Korean, the hangulisation template should be orphaned and deleted, IMO. We should use the standard  for loanwords (doesn't apply to Sino-Korean, Sino-Japanese, etc.). With Japanese, the template should produce simpler output, hiragana being the first parameter.

E.g. 招き猫. Instead of:

Noun

 * 1) beckoning cat; figure of a cat with one paw raised

Should be just:

Noun

 * 1) beckoning cat; figure of a cat with one paw raised

Note that hiragana and katakana may have spaces, e.g. "まねき ねこ", which are not displayed but produce a more user-friendly romaji. Not urgent, but it would be great to have. I will take "no" for an answer if you'd rather not change it. Appreciate your efforts! --Anatoli (обсудить/вклад) 01:26, 6 February 2014 (UTC)


 * There have been many changes to the standard format of a Japanese entry, thanks to the simplification efforts by User:Haplology. I have changed Template:ja new to adapt (it seems) to those changes. As for Korean, you can use |ee=league in 리그. To change it to other languages, you can use |el=fr ... |el= is 'en' by default. Wyang (talk) 02:03, 6 February 2014 (UTC)


 * Yes, you guys are doing a great job. I have created 予習 using the modified template. Thank you again. --Anatoli (обсудить/вклад) 02:20, 6 February 2014 (UTC)


 * Thanks. The suru function also works: よしゅう . Wyang (talk) 02:22, 6 February 2014 (UTC)
 * Great feature! --Anatoli (обсудить/вклад) 02:32, 6 February 2014 (UTC)

Using in
I created this template some time ago to encompass both and. I am just wondering if you are willing to include it in your template? Jamesjiao → T ◊ C 00:51, 11 February 2014 (UTC)


 * Thanks, done. Someone bot-possessing should have obsoleted those long ago... Wyang (talk) 00:58, 11 February 2014 (UTC)

Re : Errors in Chinese character etymologies
Thanks for your comments on my edits.

I tried to correct them following your comments. I kept some of the wrong etymologies and labelled them as mnemonic. Is this good practice for the Chinese character etymology section?

Feel free to comment on or modify my edits, as I am a beginner on Wiktionary and English is not my native language.

Is this the right way to answer your message?

Meihouwang (talk) 15:34, 20 February 2014 (UTC)

Pinyin with apostrophes
Hi Wyang, what do we do about pinyin that has apostrophes e.g. 反而, 西安? It seems to generate a question mark when I put the apostrophe in the pronunciation template. ---> Tooironic (talk) 22:45, 25 February 2014 (UTC)


 * Hi, the template can handle pinyin with apostrophes correctly. The small superscript in the IPA represents the semi-, found in the onset of certain null-initial syllables. Wyang (talk) 03:05, 26 February 2014 (UTC)


 * Ah I see, thank you. ---> Tooironic (talk) 03:42, 26 February 2014 (UTC)

一般
Is there a template to clearly show the tone sandhi change here (i.e. yībān as yìbān)? ---> Tooironic (talk) 06:07, 2 March 2014 (UTC)
 * Hi. There is only one pronunciation (not variant pronunciations), and Pinyin only writes the non-sandhi form (yi1ban1) . Please see the page now. Cheers, Wyang (talk) 11:05, 2 March 2014 (UTC)
 * Looks great now, thanks. Is there a way to mention tone sandhi in the pronunciation box? I think that would be helpful to users who may not understand why the change occurs. ---> Tooironic (talk) 08:48, 4 March 2014 (UTC)


 * On a side note, I'm having second thoughts about this not being a variant pronunciation. The 國語辭典 (a trusted Taiwan dictionary) lists 一般 as yībān, even in the pronunciation sample, along with example sentences, listen here: Do you think this is a Taiwan variant perhaps? I can't recall ever hearing a mainlander pronouncing it as yībān. ---> Tooironic (talk) 08:52, 4 March 2014 (UTC)
 * To me the two pronunciations sound like the pronouncer's attempt to pronounce the two syllables as if they are in isolation, probably as a consequence of the Pinyin orthography being the non-sandhi version (yi1ban1) (i.e. spelling pronunciation). Instead, the two 一般's in example sentences show tone sandhi. I don't think it is a Taiwan variant, at most a rare one. There are many online resources describing the tone sandhi patterns of 一 and 不 in Taiwanese Mandarin - . Wyang (talk) 11:28, 4 March 2014 (UTC)
 * Ah, another thing. For the pronunciation template, it is not necessary to specify the audio file if the filename is 'zh-PINYIN.ogg'. Using '|a=y' suffices. Wyang (talk) 11:30, 4 March 2014 (UTC)
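
As background to the yībān / yìbān question, the usual sandhi pattern for a word-initial 一 can be sketched in Python: it is pronounced with the second tone before a fourth-tone syllable and with the fourth tone before first-, second- and third-tone syllables, while Pinyin keeps writing yi1. The function below is only an illustration (the name `yi_sandhi` is hypothetical); it assumes the caller already knows the first syllable is 一 and that the word is not an ordinal or a digit-by-digit reading, and it leaves 一 unchanged before a neutral tone.

```python
def yi_sandhi(syllables: list) -> list:
    """Apply 一 tone sandhi to numbered pinyin whose first syllable is 一,
    e.g. ['yi1', 'ban1'] for 一般."""
    if len(syllables) < 2 or syllables[0] != "yi1":
        return list(syllables)
    next_tone = syllables[1][-1]
    if next_tone == "4":
        spoken = "yi2"   # before a fourth tone
    elif next_tone in "123":
        spoken = "yi4"   # before a first, second or third tone
    else:
        spoken = "yi1"   # neutral tone: not handled in this sketch
    return [spoken] + list(syllables[1:])
```

For 一般 (yi1ban1) the sketch returns yi4ban1, and for 一定 (yi1ding4) it returns yi2ding4.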

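About 黑人僧
Hello. Regarding the pronunciation of 黑人僧, I checked the 粵語審音配詞字庫, which gives t͡ʃɐŋ˥, so I have reverted the change for now. If my edit is wrong, please point it out. Also, some strange entries have appeared on the Chinese Wiktionary recently; could I trouble you to deal with them? Thank you. --Hahahaha哈 (talk) 06:05, 5 March 2014 (UTC)
 * Hello Hahahaha哈. The 'z' of  is /ts/. Apart from being one of the palatalised variants of /ts/ (alongside /tɕ/), /tʃ/ does not exist in Guangzhou Cantonese. As for the Chinese edition, thanks for the reminder; I will deal with it when I have time. Wyang (talk) 06:10, 5 March 2014 (UTC)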

黑人僧
That last character, shouldn't that be 憎? Or is that the joke? :) Also in the hanzi box (but not the lemma) at 乞人憎. -- Eiríkr Útlendi │ Tala við mig 01:01, 8 March 2014 (UTC)
 * That is part of where the pun is. There was a typo in 乞人憎 which I have fixed. This is a  in Chinese, the first part being 非洲和尚 (a monk from Africa), and the last part being 黑人僧 (a black monk) (another synonymous way of putting it) - 乞人憎 (makes people hate) (the near-homophone). Someone might say "something is really 非洲和尚", to mean "something is really annoying". Cheers, Wyang (talk) 03:40, 8 March 2014 (UTC)

鰓 and 鳃
I hardly ever edit hanzi entries. Is the Hanzi heading supposed to be retained? ---> Tooironic (talk) 02:20, 11 March 2014 (UTC)
 * I'd leave them for now for each single character entry. There's no decision on changing this yet. --Anatoli (обсудить/вклад) 03:57, 11 March 2014 (UTC)


 * I don't know as I rarely edit hanzi entries too. I just dislike the current format. It is too distant from the ideal logical format I have in mind. Wyang (talk) 07:17, 11 March 2014 (UTC)

apostrophe in Zhuyin for gè'àn - ㄍㄜˋ 'ㄢˋ
Hi,

Could you fix this please :). --Anatoli (обсудить/вклад) 07:02, 14 March 2014 (UTC)
 * Also, do you mind adding Zhuyin to, next to Pinyin? I hope it's not too hard for you. The table is a little too tall; maybe it could be simplified a bit with bullets, considering that we will include dialects as well. --Anatoli (обсудить/вклад) 07:05, 14 March 2014 (UTC)


 * Hi. The apostrophe issue is fixed. I have added Zhuyin to Pinyin-IPA and made it a little more compact (The extra line at the top and bottom of the table is something I plan to remove when my bot gets granted bot rights). Wyang (talk) 13:20, 14 March 2014 (UTC)
 * Thanks a lot! --Anatoli (обсудить/вклад) 13:42, 14 March 2014 (UTC)

Template:cmn-erhua form of
This template has triggered a script error for a while now. Could you please fix that? 15:27, 16 March 2014 (UTC)
 * You reported the wrong culprit (Template:zh-compound/code). Now done. Wyang (talk) 22:40, 16 March 2014 (UTC)

Putting a homophone field in the pronunciation header template
Hi Wyang, is it possible to put a homophone field in the pronunciation header template? I think it would be useful. Here are two sets of entries that could benefit from it: 營利/营利 VS 盈利 and 迷路 VS 麋鹿. ---> Tooironic (talk) 00:07, 18 March 2014 (UTC)
 * See Russian homophones and . They can simply be added manually with:


 * * Homophones:, at the bottom of "====Pronunciation====" section. --
 * Yes I'm aware of that. But I was hoping there was a way to integrate it into the new pronunciation template. It looks strange and ugly to have it listed as a bullet-point under the lovely box. Here's another example: 便利 VS 遍歷/遍历. ---> Tooironic (talk) 00:24, 18 March 2014 (UTC)


 * Hi, you can create the page Template:Pinyin-IPA/hom/PINYIN to include homophones. Anything with that Pinyin will show the homophones field. For an example please see any of the above, or Template:Pinyin-IPA/hom/yìzhì - 意志. Cheers, Wyang (talk) 00:40, 18 March 2014 (UTC)
 * I think homophones should be manually parameterised then, so that not the template but the entries are maintained - 遍歷/遍历 - maybe? But if we keep the simple bulleted style then homophones could fit nicely into the format. Of course, pinyin entries should be kept up-to-date with homophones (no problem listing currently missing entries). Actually, homophones in Chinese is an issue we should discuss separately. There could be too many and fitting them into the template will become problematic. --Anatoli (обсудить/вклад) 00:58, 18 March 2014 (UTC)
 * Maybe you could use categories instead? DTLHS (talk) 01:14, 18 March 2014 (UTC)
 * (E.C.) Possibly. It seems the structure of templates with homophones is unsustainable. Editors should be able to add/remove them manually into entries or ignore them altogether and let categories and Pinyin entries list them. If we go away from one complexity with Chinese entries, such as "rs" value, we shouldn't create a new one. Topolectal pronunciation/transliteration should be optional, of course. --Anatoli (обсудить/вклад) 01:24, 18 March 2014 (UTC)
 * Re:Anatoli: How about now (collapsed)...? Pinyin entries can be made to have zero information, only links to these templates (in another format). Re:DTLHS: Too many of them... Probably looking at >10000 of these. Templates are probably easier to manage. Wyang (talk) 01:18, 18 March 2014 (UTC)
 * How is creating 10000 subtemplates easier than creating 10000 categories? DTLHS (talk) 01:29, 18 March 2014 (UTC)
 * It might be harder to do manipulations of the data... For example, if one is interested in finding out all near-homophones (minimal pairs wrt tones) of shi4shi4 (i.e. shiNshiN), templates would seem easier in producing the list, no? Wyang (talk) 01:42, 18 March 2014 (UTC)
 * I don't think so- either way you're passing the pinyin through a module that can generate and parse it any way you like. The only disadvantage is you can't add terms we don't have an entry for yet. DTLHS (talk) 01:45, 18 March 2014 (UTC)
 * Yep. Wyang (talk) 02:29, 18 March 2014 (UTC)


 * Not sure about your first question. Could you give me a link? Who will maintain templates? Pinyin entries are much simpler and they have been used to find homophones or choose the right Hanzi entry, anyway. If a template could read Pinyin entries, then it's probably better but seems too complex to me, anyway. In short, status quo is better for homophones, IMHO. --Anatoli (обсудить/вклад) 01:24, 18 March 2014 (UTC)
 * Perhaps it's time you change your position on Pinyin entries? :) They work exactly as Pinyin indices in published dictionaries. I saw your expanded example. It looks OK but would be hard to maintain in the long run. --Anatoli (обсудить/вклад) 01:37, 18 March 2014 (UTC)


 * I meant the template format if you have a look at 犀利. The Pinyin entries in their current state cannot be called by the pronunciation template to generate a list of homophones. The Pinyin information should be kept centralised somewhere, such that both Pinyin entries and character entries can call these templates without the need to synchronise contents (especially homophones which can be quite a headache to keep identical). The conversion would not be hard, and it would make Pinyin entries even more unjustified. Wyang (talk) 01:42, 18 March 2014 (UTC)
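
The near-homophone lookup mentioned above (shi4shi4 generalised to shiNshiN) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the sample word list are hypothetical, the input is numbered Pinyin, and nothing here reflects how Template:Pinyin-IPA or any module actually stores its data.

```python
import re
from collections import defaultdict


def toneless_key(numbered_pinyin: str) -> str:
    """Replace every tone digit (1-5) with 'N', e.g. 'shi4shi4' -> 'shiNshiN'."""
    return re.sub(r"[1-5]", "N", numbered_pinyin)


def group_near_homophones(entries):
    """Group (word, numbered pinyin) pairs whose readings differ only in tone."""
    groups = defaultdict(list)
    for word, pinyin in entries:
        groups[toneless_key(pinyin)].append((word, pinyin))
    return dict(groups)


# Hypothetical sample data; a real list would come from entries or a dump.
sample = [
    ("事實", "shi4shi2"),
    ("適時", "shi4shi2"),
    ("世事", "shi4shi4"),
    ("逝世", "shi4shi4"),
    ("董事", "dong3shi4"),
]
groups = group_near_homophones(sample)
```

Whether the word list lives in subtemplates or in categories, the same key function applies once the Pinyin has been read, which is the point made in the exchange above.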

"Yi" in Zhuyin
In our entry for 意義, the Zhuyin for "yi" is written as ㄧ, whereas in the 國語辭典 entry it is written as 一. Which is considered correct? ---> Tooironic (talk) 04:29, 24 March 2014 (UTC)


 * Isn't it ㄧ not 一 in the 國語辭典 entry too? Wyang (talk) 04:37, 24 March 2014 (UTC)


 * Yes, it's ㄧˋ ㄧˋ as well in 國語辭典 entry. If you copy-paste, you'll see ㄧˋ ㄧˋ. Interesting that it appears horizontally. Does the symbol appear horizontally in horizontal writing and vertical in vertical, similar to the Japanese elongation symbol, which appears as a vertical stroke in vertical writing? I've only seen Zhuyin symbol as a vertical sign before. --Anatoli (обсудить/вклад) 05:19, 24 March 2014 (UTC)


 * Theoretically it should be ㄧ in horizontal writing and 丨 in vertical writing. In reality there are often exceptions to this rule. Wyang (talk) 05:22, 24 March 2014 (UTC)


 * Interesting that in Bopomofo 瓶子 appears in vertical as:


 * ㄆ
 * 1) ㄧ (appears horizontally, in vertical writing ?!, can't render here)
 * ㄥˊ
 * ㄗ
 * and in horizontal as ㄆ丨ㄥˊ ㄗ˙ (the opposite of what you suggested). Note ㄧ and the position of the neutral marker as well. Is there any rule in these examples? --Anatoli (обсудить/вклад) 05:29, 24 March 2014 (UTC)


 * Ah, yeah. I tried to search for the official rules when I edited Module:PinyinBopo-convert, but did not appear to have found anything very useful. There are also multiple versions of Zhuyin. The Wikipedia example might have been what the official rule (if there is one) considers as correct. I don't know about the rule for tonelessness. As the moedict.tw example above shows, in reality there are often exceptions to how ㄧ/丨 are supposed to be used, if Wikipedia is correct (chances are). Wyang (talk) 05:40, 24 March 2014 (UTC)
 * Thanks. Wikipedia doesn't describe it either. It seems like with the neutral tone marker, there is no real consistency. --Anatoli (обсудить/вклад) 06:01, 24 March 2014 (UTC)

错觉
Hi,

If 错觉 has two readings - cuòjué and cuòjiào (?), how would you make an entry using your template? --Anatoli (обсудить/вклад) 08:09, 24 March 2014 (UTC)


 * I only know the former pronunciation. What does cuòjiào mean? Wyang (talk) 09:42, 24 March 2014 (UTC)
 * I got it from less reputable dictionaries and I thought it was a valid variant. Anyway, I don't recall how you handle words with multiple readings, such as 瘦削. What's the right way? --Anatoli (обсудить/вклад) 10:21, 24 March 2014 (UTC)


 * My plan after User:Wyangbot gets granted a bot flag by a bureaucrat is to finish the format change on pages using Pinyin-IPA (i.e. ). Currently there are >3000 pages using the old format of Template:Pinyin-IPA, which requires each syllable to be fed into the template separately. Once that is done, I will modify Template:Pinyin-IPA, to make it accept alternative readings as second, third, ... parameters and enable one to write comments for each pronunciation (like Taiwan/Mainland, standard/colloquial, if one needs to), and use that to end the template awkwardness in Category:cmn:Variant pronunciations pages. Template:cmn-new will also be modified, so that one can use the parameter |py2=... to add a second pronunciation, although it would be better if one modifies the page afterwards, since the readings are often used in different contexts. By the way, 觉 is one of the few characters in Standard Chinese which show, with jiao4/jiao2 being the colloquial reading (limited to 睡觉), and jue2 being the literary reading (all other situations). People speaking other dialects may use the colloquial reading in compounds in which Standard Chinese normally uses the literary one, eg. 觉得jiao2de, 自觉zi4jiao2, and this would be typically considered heavily accented or colloquial. I haven't heard 错觉 being pronounced cuo4jiao4 or cuo4jiao2 though. Wyang (talk) 22:48, 24 March 2014 (UTC)
 * Multiple pronunciations often have different statuses. E.g. Russian when stressed on the first syllable is considered less educated, so is, which is also an alternative spelling of a more standard . Manual feeding of templates is fine with me but I'd like to see an example. At Wiktionary, it's OK to list all acceptable but verifiable forms, even if they are colloquial. As for different contexts, words could be split into etymologies, like 得了. No rush. I can see you're busy. Please consider that we will need templates for words, which ARE NEVER USED in Mandarin as well - Cantonese, Min Nan (including non-Han scripts - Latin, Cyrillic, Arabic), where Pinyin/Zhuyin may not be appropriate. See Talk:老番. Perhaps Cantonese 佢哋 could be a good example, how this type of entries are going to look (after a change).
 * Keep Votes/pl-2014-04/Unified Chinese in mind as well.
 * BTW, 指指点点 doesn't show tone sandhi. --Anatoli (обсудить/вклад) 23:14, 24 March 2014 (UTC)


 * It shows tone sandhi in IPA, but not in Pinyin. I only did tone sandhi for Pinyin for words containing 一 and 不, not other cases, since the effects cannot be represented well by Pinyin tone marks. In the case of 指指点点, all syllables undergo tone sandhi, the first three undergo third-to-second tone sandhi which you can represent using the acute accent, but the last syllable undergoes third-to-half-third (half third: only the first half of third tone, only dipping, no rising) which you cannot represent using Pinyin tone markers. There are also half fourth-tone and tone sandhi of neutral syllables, for which there are no Pinyin diacritics available either.
 * For Cantonese-only entries like 老番 and 佢哋, would a pronunciation template like the one in Talk:老番 (either collapsed or uncollapsed) speak your mind? Wyang (talk) 23:28, 24 March 2014 (UTC)
 * Yes, I think so. You could release the pronunciation template without having to wait for the vote. Since you're not breaking anything with it. I guess "==Mandarin==" and "lǎofān" looks confusing on 老番, even if Mandarin usage might be attestable as well.--Anatoli (обсудить/вклад) 00:23, 25 March 2014 (UTC)
 * OK, I have done so at 老番. Wyang (talk) 01:02, 25 March 2014 (UTC)
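
As a rough illustration of the third-to-second sandhi described above for 指指点点, here is a deliberately naive sketch. It assumes numbered Pinyin input and applies a single rule (a third tone directly before another third tone is written as second tone); the function names are hypothetical. It does not model the half-third or half-fourth tones, neutral-tone effects, or prosodic grouping, and it is not the logic used by Module:Pinyin-IPA.

```python
import re


def split_syllables(numbered_pinyin: str):
    """Split 'zhi3zhi3dian3dian3' into ['zhi3', 'zhi3', 'dian3', 'dian3']."""
    return re.findall(r"[a-zü]+[1-5]", numbered_pinyin)


def third_tone_sandhi(numbered_pinyin: str):
    """Rewrite tone 3 as tone 2 when the following syllable is underlyingly tone 3."""
    syllables = split_syllables(numbered_pinyin)
    result = []
    for i, syllable in enumerate(syllables):
        following = syllables[i + 1] if i + 1 < len(syllables) else None
        if syllable.endswith("3") and following is not None and following.endswith("3"):
            result.append(syllable[:-1] + "2")
        else:
            result.append(syllable)
    return result


sandhi = third_tone_sandhi("zhi3zhi3dian3dian3")
```

Under this rule the first three syllables come out as second tone and the last stays as written, which is as far as numbered or diacritic Pinyin can go; the half-third on the final syllable has no Pinyin representation.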

Homophones bug
Hi Wyang. Why do homophones for 董事 show up in its pronunciation header, but not for all the shìshí entries, like 事實, 適時, 侍食, 試食? ---> Tooironic (talk) 12:18, 27 March 2014 (UTC)
 * I haven't created those yet... I've only done A-G so far - . I will create the rest when I have time. You can create these lists yourself: create the Template:Pinyin-IPA/hom/PINYIN page (like Template:Pinyin-IPA/hom/shìshí) and save with the following text:

Leave SIMP1 empty if TRAD1 == SIMP1. eg. Template:Pinyin-IPA/hom/shìshí has the following text:

Cheers,Wyang (talk) 22:49, 27 March 2014 (UTC)

OK
Was [//en.wiktionary.org/w/index.php?title=OK&curid=5592&diff=25962160&oldid=25833153 this] intentional? The edit summary was "formatting", but the edit removed the entire language section, citations and all. - -sche (discuss) 04:16, 28 March 2014 (UTC)


 * It's not a Chinese word. Wyang (talk) 04:25, 28 March 2014 (UTC)


 * We have to allow a minimum of non-Hanzi Mandarin. "OK" and other all-caps Roman words are included in Chinese dictionaries and OK was used to create 卡拉OK (a Chinese invention), so we should keep OK. --Anatoli (обсудить/вклад) 04:45, 28 March 2014 (UTC)
 * Sorry but I have to revert the bot's edit. --Anatoli (обсудить/вклад) 04:50, 28 March 2014 (UTC)


 * Oh, well. Although it's something I don't agree with. Also, that "OK" has nothing to do with the OK in 卡拉OK. Wyang (talk) 04:52, 28 March 2014 (UTC)
 * Possibly but "OK" is spoken quite a lot by Chinese (it's questionably considered the most common word in the world!), even if it can be argued as "code-switching", nobody found reasonably acceptable Hanzi to render the sounds, so that it was accepted by the majority, besides "OK" is so easy to type, compared to anything else. OK in 卡拉OK is usually pronounced identically, the Chinese way, that's all (with some variations in both). Of course, they have nothing in common otherwise. IMHO, rendering foreign "/k/" sounds seems problematic in standard Chinese with some exceptions, since many words have "j" via Cantonese or otherwise, even "卡" is not common for foreign /ka/. --Anatoli (обсудить/вклад) 05:03, 28 March 2014 (UTC)

免疫力
Hi there. Do you know why there is a blank line under the pronunciation header? ---> Tooironic (talk) 06:32, 29 March 2014 (UTC)
 * Oh, it seems it's in all the Mandarin entries. ---> Tooironic (talk) 06:35, 29 March 2014 (UTC)


 * Yeah, I'm not sure. It's probably related to my edits to the pronunciation template. I'll see what I can do. By the way, the template now can handle variant pronunciations (eg. 骨頭, 普遍) and can generate Mainland-Taiwan differences automatically (eg. 星期, 乳酪). Cheers, Wyang (talk) 06:42, 29 March 2014 (UTC)
 * Hi. The HSK categories are without cmn. prefix. --Anatoli (обсудить/вклад) 07:58, 29 March 2014 (UTC)


 * Looking good. Thanks for your hard work. I hope you can fix that blank line issue though. ---> Tooironic (talk) 11:31, 29 March 2014 (UTC)

Template:Pinyin-IPA, Template:Pinyin-IPA/essence, Template:Pinyin-IPA/code
The code in these templates is pretty much unreadable right now, it's just a big giant blob of code. Could you clean it up please? 18:14, 29 March 2014 (UTC)


 * Not sure if this is what you meant: - is it clearer now? Wyang (talk) 21:43, 29 March 2014 (UTC)
 * Yes, although I would have done it slightly differently myself. 22:01, 29 March 2014 (UTC)
 * I've reorganised a lot of the code in these templates, to make it easier to maintain. The /essence template really contained the same code 5 times with some small variations, so I split that code out into a separate template, Template:Pinyin-IPA/table. The /essence template is no longer needed now. 00:07, 30 March 2014 (UTC)


 * Yes, the variant pronunciation feature added two days ago involved a lot of duplications. I wanted to put it entirely into Module:Pinyin-IPA when I added it, but I just opted for the easier option out of laziness. Thanks for doing that. By the way, there was a minor error in your code of the main template, which is now fixed. Wyang (talk) 22:36, 30 March 2014 (UTC)

More specifically
Using User:Wyang/歷史 as an exemplar, this is the xml which I would be processing - the most current revision of the article:

 <page>
   <title>User:Wyang/歷史</title>
   <ns>2</ns>
   <id>4354968</id>
   <revision>
     <id>25957600</id>
     <parentid>25957507</parentid>
     <timestamp>2014-03-28T00:38:39Z</timestamp>
     <contributor>
       <username>Atitarev</username>
       <id>27724</id>
     </contributor>
     <comment>/* Chinese */ add, rm Wikipedia, etymology, irrelevant to the proposal</comment>
     <text xml:space="preserve" bytes="438">==Chinese==

 Noun
  * 1)  records of past events; historical records
  * 2) history, past
  * 3) past experiences of a person, the history of a person
  * 4) historiography, the study of history, usually
 </text>
     <sha1>tp7iuzb46po75ssvxo3n88s5lbl3ouu</sha1>
     <model>wikitext</model>
     <format>text/x-wiki</format>
   </revision>
 </page>

For my current project this would result in the title (User:Wyang/歷史) being added to the list of words for "Chinese". When creating captcha images, having mixed scripts can result in text in one script appearing much smaller than the other, usually illegibly. This can be worked around, but it's an additional investment of time and effort. Every wiki whose language would be collapsed to "Chinese" would end up with, possibly, all the words in that classification being used.

My particular project is very WMF-focused, and you can easily say that it would not matter on other sinitic WMF projects. But Wiktionary's data is not intended solely for use inside the WMF. A researcher may wish to use a wiktionary dump to create pools of zh-classical 'words', or a teacher might wish to create a booklet of Min-Nan zoological terms, or a developer might pull solely Cantonese translations and want solely Cantonese senses to go with them. Doing so under the proposed model would not be possible using the dump data, because the relevant information is carried solely within the parsed templates. Working with the dump takes about 12 minutes to build my 1612 word lists; building the same from the API apparently missed about 4 million entries, and took 36 hours.

In the city I live in, Richmond in BC, Canada, the majority of people speak one or another Sinitic language, but most students must speak English in school. Even very young students use Wiktionary to clarify both their English and their Chinese language use. While I do not have specific evidence, I would expect a speaker of a Chinese language would look on the page for (example) Cantonese first, and Chinese second (or not at all.) In my opinion, Wiktionary should strive to help that student find what they are looking for on their first attempt. - Amgine/t·e 06:07, 31 March 2014 (UTC)
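
The dump-based word-list approach described above (one list of page titles per L2 language header) might look roughly like the following. This is a minimal sketch, not Amgine's actual script: the function names, the header regex and the embedded sample are assumptions, and a real run would stream a pages-articles dump file rather than an in-memory string.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches level-2 headers such as "==Chinese==" but not "===Noun===".
L2_HEADER = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def titles_by_language(xml_stream):
    """Stream <page> elements and list each title under every L2 header it contains."""
    lists = defaultdict(list)
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            for match in L2_HEADER.finditer(text):
                lists[match.group(1).strip()].append(title)
            title, text = None, ""
            elem.clear()  # release the finished page to keep memory flat
    return dict(lists)


# Hypothetical miniature dump, for illustration only.
SAMPLE = """<mediawiki>
  <page>
    <title>歷史</title>
    <ns>0</ns>
    <revision>
      <text xml:space="preserve">==Chinese==
===Noun===
# history

==Japanese==
===Noun===
# history</text>
    </revision>
  </page>
</mediawiki>"""

word_lists = titles_by_language(io.BytesIO(SAMPLE.encode("utf-8")))
```

Because the language is read from the header alone, any topolect information that exists only inside a template is invisible to a script of this shape, which is the concern raised in the comment above.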


 * Hi, Amgine. Thanks for the clarification. I see what you mean in your comment now. Let me get back to you in two or three hours. Wyang (talk) 06:20, 31 March 2014 (UTC)

Hi, sorry about the delay.


 * I agree that data maintenance of multi-scripted languages is typically particularly troublesome, and people on Wiktionary working with those languages (eg. Serbo-Croatian, Chinese, Japanese) can certainly relate to that. However, digraphia in Serbo-Croatian, Chinese and Japanese is unrelated to the amalgamation or separation of its varieties. If the grouping of Serbo-Croatian is not in place, the issue of multiple scripts would still pose a problem for your captcha work, as both the Cyrillic and Latin alphabets are used to write the, with Serbian being the only European language which has synchronic digraphia. Similarly, both  and  are used to write every Chinese variety. Pulling out all entries of a Chinese topolect, before and after the amalgamation of varieties, would both inevitably run into the problem of having to deal with both sets of Chinese characters. The difference in font size between simplified and traditional is probably not significant, if any, fortunately.


 * As you probably know, the Chinese varieties share a common written form - in the past it was Classical Chinese, and now it is . Consequently there is not much point in generating a topolect-specific captcha, say a Wu-language captcha. It would be more realistic to generate a captcha based on Written vernacular Chinese, or just Chinese characters in general. I'm not sure about the details of your captcha project. Do you use the presence of '== ==' language headers to pull out all entries in a particular language? If so, then the merger would be great for your project, since it only applies to Chinese character-scripted entries here. You could pull out the title of every page which contains the header '==Chinese==' in its content, as they are guaranteed to be the same script.


 * If you have attempted generating captcha for non-Mandarin Chinese topolects using Wiktionary data, you may have noticed that at present, such data is remarkably meagre on Wiktionary. For example, Category:Wu nouns only has 10 pages, Category:Gan nouns has 1 page, and Category:Xiang nouns has only 1 page as well. Generating page title data for Category:Min Nan nouns would have resulted in a terrible mix of three scripts, whereas the unified Chinese approach would eliminate this script multiplicity, as said above. I am curious, though, regarding how you would handle Japanese data on Wiktionary? It is written in three (Kanji, Hiragana, Katakana) different scripts here... well actually, four (plus Romaji) if you pull out everything. It must be a headache to try to analyse this.


 * I'm not sure I agree with your point on language self-identification. Most of the Chinese people I know identify their speeches to be 'Chinese' when asked. It is when people enquire further which division of Chinese it is that they give the 'Cantonese' or 'Mandarin' answer.

Cheers, Wyang (talk) 09:42, 31 March 2014 (UTC)


 * For my specific project it is, in fact, important to identify which script is used as the identities of the Wikipedia communities are in part based on their use. It may be offensive to, for example, a Bosnian wikipedian if xe is given a Cyrillic captcha, while it would not matter for sh.wikipedia and would possibly be offensive not to do so on sr.wikipedia. My personal opinion for Chinese languages would be to use vernacular Chinese, as you suggest, but how would such be identifiable under your proposal? I do use the L2 header ('== ==') to identify the language, but this is exactly why your proposal is a problem, as I will explain below.
 * As I understand your proposal, script-wise the article titles would probably not face great difficulties. The breadth of characters available in any given family member language may, however, be more limited than the total number of entries in "Chinese". Being able to easily identify a relevant vocabulary - in some ways to limit the expressions to those in common use by the target reader population - is often an important reuse of Wiktionary data. Would it be possible to include in your model an unambiguous method of identifying language codes which would commonly be expected to use the entry?
 * Yes, I have generated word lists for all L2 on en.WT; for Wu I found exactly 45 entries, 19 in Gan, and two for Xiang. Although Min-Nan does include a mix of scripts, this appears to be normal across the spectrum for written Min-Nan although I found references to efforts to standardize Hokkien in any of several writing systems. With your proposal, each of the Chinese languages would suddenly seem to have a very large number of entries if one assumed that, for example, Min-Nan = Min-Nan + Chinese. But it doesn't. Likewise I, as a person not familiar with these languages but working with the en.Wiktionary data, would not know if Bopomofo entries are Chinese or not, or if terms written in Taiwanese Kana or Latin are included. If they are not, would Min-Nan suddenly consist solely of entries in these other writing systems?
 * For generating word lists for Min-Nan your proposal would have no effect on reducing the multiplicity of scripts actually used in written Min-Nan. It would, however, possibly erroneously limit (and/or expand) the list of terms found for Min-Nan in en.Wiktionary data. Consolidating terms under a single L2 header "Chinese", while excluding Chinese family language headers, will likely result in confusing data for later use. Having an unambiguous language code list identifying which languages in which a term is in common use would reduce this confusion, but not entirely alleviate it. Put another way, it is likely to cause future errors, requiring a greater investment of effort in order to use Wiktionary data.
 * I think what I am trying to say is that although English may occasionally use words from many related languages, especially German, French, and Latin, these words are not commonly considered 'part' of the English language except here on en.Wiktionary. These words and phrases make up a much larger vocabulary for English than is actually recognized or understood by a large percentage of the population, even though the terms may follow the linguistic rules and rôles of English. I approach the concept of Chinese written language and Japanese Kanji in much this way - intelligibly part of the larger classification, but not always part of the vernacular - which may be completely ignorant.
 * For this project I am using all entries in any writing system, so all four writing systems of Japanese are valid. The generation of captcha images, however, is failing mostly due to the Kanji, which have a high percentage of illegibility due to the complexity of the characters versus the distortion effect used. This is also a problem with Chinese scripts - the highly refined characters become illegible when even slightly distorted. For other analyses I have done with Wiktionary data having multiple writing systems is a distinct benefit, allowing use of a larger corpus of source documents. The limitations become creativity and time, rather than what can be analysed. - Amgine/t·e 15:22, 31 March 2014 (UTC)

Hi, Amgine. Thanks for the reply. I would like to mention a few things:


 * Content under the heading '==Chinese==' will not be absentmindedly assigned to every ISO-coded Chinese variety under the proposal. The unambiguous language code list for a term is the pronunciation template, and pronunciation in each variety will be fed into that template. In the xml code above, the varieties for which pronunciations have been given include: Mandarin, Cantonese, Min Nan and Wu, hence the entry will be categorised into those respective categories, sorted by the appropriate romanisation. One could parse through the template code to extract the topolect-specific page titles, eg. regex <tt> \{\{zh\-pron\n\|([^}]+\n)*\|c=(.+)(\n[^}]+)*\n\}\} </tt> or something similar, to extract all the Cantonese pages. Even easier perhaps, one could recursively extract the titles of all pages from the category Category:Cantonese parts of speech and its subcategories, which would be much more convenient.
 * I concur with your point of using the appropriate script so as to avoid offending specific subpopulations of a larger speaker population. However, the circumstance may be different for languages with speech-writing separation. The imposition of a modern literary standard is in place in all countries which designate Chinese as one of the official languages, and texts written in Written vernacular Chinese would be understandable to any educated person. The scope of other orthographies would be very limited. For example, the  romanisation of Min Nan is generally only understood by some seniors. Young people in Taiwan, who are fully conversant in Min Nan, are mostly illiterate in Pe̍h-ōe-jī. Even if they are able to read it, it is unlikely that they would have the appropriate input method for it.
 * Generating Chinese character captchas seems quite uncommon, and I can imagine it will be much more difficult than the Latin alphabet. A Google image search suggests most have basically unobscurified characters, or just characters in different fonts. Most Chinese fora are simply not bothered and just opt to use Latin-alphabet captchas. For the captcha project, I think the best approach would be to generate captcha based on Written vernacular Chinese. It could be produced by pulling out titles of all entries containing '==Chinese=='. Alternatively, you might want to pull out titles which are used in both Simplified Chinese and Traditional Chinese, as the reader population for any Chinese variety will be a mix of the two, and it may not be possible for them to have the input method for both sets of characters. This could be done by parsing through the simp-trad form template (, as alternatives have been made obsolete by User:Wyangbot), and generating all mainspace pages transcluding the template but lacking a second parameter - eg. <tt> \{\{zh\-hanzi\-box\|([^\|]+)\}\} </tt>. Or, you could use a frequency list for Chinese characters (eg. ) and do combinations and modifications on characters which lack a simp-trad distinction. The text probably does not have to be meaning-conveying for Chinese character captchas.

Cheers, Wyang (talk) 00:08, 1 April 2014 (UTC)
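
The two template-based filters suggested above can be tried out on sample wikitext along these lines. This is a hedged sketch: the regular expressions are simplified adaptations of the ones quoted in the comment rather than copies, the helper names are hypothetical, and the two sample pages (including the |m= parameter and the readings shown) are invented for illustration. Only the template names and the |c= parameter come from the discussion.

```python
import re

# Multi-line {{zh-pron ...}} call: one "|key=value" parameter per line.
ZH_PRON = re.compile(r"\{\{zh-pron\n((?:\|[^\n}]*\n)*)\}\}")
# {{zh-hanzi-box|X}} with exactly one parameter, i.e. no simp/trad distinction.
HANZI_BOX_SINGLE = re.compile(r"\{\{zh-hanzi-box\|([^|}]+)\}\}")


def pron_params(wikitext):
    """Return the parameters of the first zh-pron call as a dict (empty if absent)."""
    match = ZH_PRON.search(wikitext)
    if match is None:
        return {}
    params = {}
    for line in match.group(1).splitlines():
        key, _, value = line[1:].partition("=")  # drop the leading "|"
        params[key.strip()] = value.strip()
    return params


def has_cantonese(wikitext):
    """True if the pronunciation template carries a non-empty |c= value."""
    return bool(pron_params(wikitext).get("c"))


def lacks_simp_trad_distinction(wikitext):
    """True if the hanzi box has a single form only."""
    return HANZI_BOX_SINGLE.search(wikitext) is not None


# Invented sample pages, for illustration only.
PAGE_A = "==Chinese==\n{{zh-hanzi-box|歷史|历史}}\n===Pronunciation===\n{{zh-pron\n|m=lìshǐ\n|c=lik6 si2\n}}\n"
PAGE_B = "==Chinese==\n{{zh-hanzi-box|其中}}\n===Pronunciation===\n{{zh-pron\n|m=qízhōng\n}}\n"
```

Parsing the parameters into a dict instead of matching |c= directly makes it easy to ask the same question for any other variety code, at the cost of a second pass over each page.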


 * Thank you for this, Wyang. I've linked some of this information on the captcha bugs.


 * Having a second regex inside the L2 loop to check for the presence of the zh-pron template doubles the amount of work the parsing script is required to perform. It also more than doubles the amount of content processing, and a basic test of your example page 1000000 shows a time to process increase of just over 18x on average. (I could do a formal benchmark if you would like.) The template is not documented. The codes used are not unambiguous, and do not follow a reference standard. What this means is it cannot be trusted to reliably identify which languages the entry can be used for, nor can any metadata processor future-proof their code.
 * Using the API to recursively iterate over Category:Cantonese parts of speech would take, I estimate, two or three days. Multiply this for each language which is at least equal in size. This time expense is prohibitive; it is not an option. Additionally, previous parsing for European languages found about 98% of terms were properly categorized; the remaining 1-2% were uncategorized or miscategorized.
 * Yesterday a new dump of wiktionary was produced, and I'm working on automating a process to update the word lists generated from it. However, I will be recommending that we not use en.Wiktionary data in the future, and instead derive word lists from the wikipedia dumps.


 * - Amgine/t·e 15:55, 1 April 2014 (UTC)

Post-recommendation discussions
Hi, User:Amgine. I have added some more detailed descriptions of the template. Please forgive me if it is insufficiently detailed and unambiguous; the template itself rests on the assumption that a unified Chinese approach is agreed upon. I am wondering what the outputs for your runs are like? Are they page title lists for entries satisfying a particular criterion, without any page content or history information? If so, there are probably better ways to achieve this. Wyang (talk) 04:27, 2 April 2014 (UTC)


 * I have, previously, processed en.Wiktionary content for many different purposes ranging from a mediawiki gadget to creating DICT output (as a proof of concept) to various structured dump processing scripts for linguistic research and cross-referencing all wiktionaries to a private corpus. My current project's output is simple lists of terms, a couple of quick hacks which produce output like this from the 2014/03/28 dump of en.WT.
 * In short, I manipulate Wiktionary content in many different ways for diverse clients. Unlike many project members, I am aware of how exceptionally relevant en.WT data can be in real-world applications, both inside and out of acadæmia. And how useless.
 * To answer your question directly regarding the current request, I am creating lists of terms or phrases which are considered vulgar or obscene, and lists of terms which are *not* considered vulgar or obscene which meet the further requirements of being not confusable, single scripted, and/or non-spoofed (invisible || single- or multi-script equivalencies to non-linguistic terms.) This is related to Mediawiki bugs #32695 #5309 (primary), #63216, #63217, #62960 (prototypes via GSOC 2014) and of course CAPTCHA. - Amgine/t·e 05:35, 2 April 2014 (UTC)


 * Thanks, I see. For the current project, it seems Mandarin.txt is a mix of Simplified Chinese, Traditional Chinese, Latin letters (some with diacritics), numbers, and special symbols. Just a thought: In multi-script cases, you could perhaps use AWB for simple tasks like generating word lists. This is my take on Mandarin.txt (via recursively extracting pages under Category:Mandarin parts of speech three times) and Mandarin_NoSimpTradDistinction.txt (using the second latest en.WT dump, finding mainspace transclusions of lacking a second parameter). In both cases non-Chinese-character symbols have been filtered out. These are effectively Written vernacular Chinese wordlists, and the latter is probably good for producing captchas targeted at speakers of Chinese varieties. Wyang (talk) 06:24, 2 April 2014 (UTC)


 * Not sure that AWB can run as an unattended event on a *nix server, but I'll ask Reedy about how automatable the process could be. - Amgine/t·e 16:23, 2 April 2014 (UTC)
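
For reference, the "recursively extracting pages under a category three times" step mentioned above amounts to a depth-limited walk of the category tree. The sketch below shows only that traversal logic. The `fetch_members` callable is a hypothetical stand-in for whatever supplies category members (an API client, AWB output, or a dump-derived index), and the subcategory names and page titles in the fake tree are invented.

```python
from collections import deque


def collect_pages(root, fetch_members, max_depth=3):
    """Breadth-first walk of a category tree, descending at most max_depth levels.

    fetch_members(category) must return (pages, subcategories) for that category.
    """
    pages, seen = set(), {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        member_pages, subcategories = fetch_members(category)
        pages.update(member_pages)
        if depth < max_depth:
            for sub in subcategories:
                if sub not in seen:  # guards against category loops
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return pages


# Hypothetical in-memory category tree, including a loop back to the root.
FAKE_TREE = {
    "Category:Mandarin parts of speech": ([], ["Category:Mandarin nouns", "Category:Mandarin verbs"]),
    "Category:Mandarin nouns": (["歷史", "董事"], ["Category:Mandarin proper nouns"]),
    "Category:Mandarin verbs": (["睡觉"], []),
    "Category:Mandarin proper nouns": (["北京"], ["Category:Mandarin parts of speech"]),
}


def fake_fetch(category):
    return FAKE_TREE.get(category, ([], []))


all_pages = collect_pages("Category:Mandarin parts of speech", fake_fetch)
shallow_pages = collect_pages("Category:Mandarin parts of speech", fake_fetch, max_depth=1)
```

The traversal itself is cheap; the cost estimated in the thread comes from the number of requests `fetch_members` has to make when it is backed by the live API.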

Module:zh
Why remove counter? --kc_kennylau (talk) 09:19, 2 April 2014 (UTC)


 * 'Counter' is superseded by 'Classifier' per the discussions at Template talk:cmn-new and Beer_parlour/2014/April. Wyang (talk) 10:17, 2 April 2014 (UTC)

Moving classifiers back to counters
Thank you so much for this and sorry for the confusion and making you work! You seem to be able to do the formatting work with your bot as well for the Japanese. --Anatoli (обсудить/вклад) 22:43, 3 April 2014 (UTC)


 * No worries. I have posted at Beer parlour/2014/April. Yes, I put 'simplification of the headword templates' on my tasks-to-do list for the bot (at the end of the list). :) Wyang (talk) 23:16, 3 April 2014 (UTC)

其中
I think something went wrong here. 13:40, 8 April 2014 (UTC)
 * Fixed. --kc_kennylau (talk) 14:41, 8 April 2014 (UTC)
 * Undone. --kc_kennylau (talk) 14:42, 8 April 2014 (UTC)


 * OK. There was only |sort2= but not |sort= in the previous version, which is why it got confused. Thanks people for fixing it. Wyang (talk) 04:12, 9 April 2014 (UTC)

Japanese counters -> classifiers
Great job, thank you! --Anatoli (обсудить/вклад) 06:43, 3 April 2014 (UTC)

Erroneous deletion of "References" headers
I just ran across this a second time, and realized that Wyangbot is the one doing it: for one example. I can't remember the earlier example of where else I've seen this, but just now checking Wyangbot's contribs, I also found and. Could you look into this? ‑‑ Eiríkr Útlendi │ Tala við mig 19:55, 8 April 2014 (UTC)
 * "rs" is not references but "radical sort". Mandarin entries should now be sorted by numbered pinyin instead ("pint"), by an earlier agreement with all Chinese editors. Suffixes "in simplified script"/ "in traditional script" are removed in topical categories. The bot should actually replace rs with pint, IMO. Maybe Wyang wants to do it in stages? --Anatoli (обсудить/вклад) 20:11, 8 April 2014 (UTC)


 * Anatoli, have another look -- Wyangbot is deleting the  header from some, but not all, Japanese entries that are above a Mandarin section that was edited by the bot.  I'm not sure why Wyangbot is only doing this some of the time.  ‑‑ Eiríkr Útlendi │ Tala við mig 20:14, 8 April 2014 (UTC)
 * I see, thanks. Hopefully, can explain and help fix it. --Anatoli (обсудить/вклад) 22:48, 8 April 2014 (UTC)


 * There was an error in the code looking for empty reference sections, and I have fixed this. Sorry and thanks. Could you please have a look at bot edits of the 'references' section of other articles in your watchlist? I will search for other affected articles too once the new en.wikt dump is available in about ten days time. Thanks. Wyang (talk) 04:27, 9 April 2014 (UTC)


 * Cheers, yes, I'm slowly working down the contribs list. Any idea when this bug was introduced, so I have an idea when to stop?  :)  ‑‑ Eiríkr Útlendi │ Tala við mig 06:56, 9 April 2014 (UTC)


 * The automated changes were started approximately 24 hours ago. I am automatically readding the references header to lines of instead of  . However, I have added codes to adapt to names already, so you're safe to use   now. --kc_kennylau (talk) 10:23, 6 May 2014 (UTC)
 * Yes, Kenny is right. Wyang (talk) 12:26, 6 May 2014 (UTC)
 * Thanks very much. I don't usually add idioms but this one is one of my favourites. :) ---> Tooironic (talk) 22:45, 6 May 2014 (UTC)

Two questions
Above. --kc_kennylau (talk) 09:26, 7 May 2014 (UTC)
 * 1) Look at 唔該, the Yale romanization is not functioning properly because the grave accent cannot be displayed on top of the letter m. Any idea how to fix this?
 * 2) What parameters should be included in ? I feel so awful deleting every detail in the head.


 * #1 is due to the <tt> formatting. We could remove that, although it wouldn't look as nice typographically. 2) I would go for no parameter at all. Anything included would be duplicative of something that is already present. Wyang (talk) 09:32, 7 May 2014 (UTC)

return your head
Did you understand this phrase I put in the edit summary? --kc_kennylau (talk) 10:42, 8 May 2014 (UTC)
 * A calque of Chinese ……你個頭. Wyang (talk) 10:43, 8 May 2014 (UTC)
 * Does this phrase exist in Mandarin? --kc_kennylau (talk) 10:48, 8 May 2014 (UTC)
 * Yes. Wyang (talk) 10:48, 8 May 2014 (UTC)
 * What would an appropriate translation be? --kc_kennylau (talk) 11:10, 8 May 2014 (UTC)
 * my arse. See 你妹. Wyang (talk) 11:12, 8 May 2014 (UTC)
 * Interesting that Russian uses a similar swearword (zh:你妈！) in such cases - "your mother!" (in the accusative case - object), well "fuck" is implied and is also used explicitly: ! --Anatoli (обсудить/вклад) 05:21, 9 May 2014 (UTC)

Cantonese - done, Min Nan - to do
Cantonese multisyllabic entries are now converted/merged/fixed to use "Chinese" L2 (every PoS, if they used the proper templates)! Now it is Min Nan's turn, which is a much larger set. I'm not familiar with Min Nan; I can tread carefully and check, but I may not be able to spot wrong entries (transliteration, senses, etc.). Do you think you can run your bot again (I saw you merged Min Nan entries as well)? Min Nan entries seem a bit more complicated than Cantonese, though. --Anatoli (обсудить/вклад) 04:59, 9 May 2014 (UTC)
 * Thanks for all the hard work, 安德利 :) I definitely will, when I have time. Wyang (talk) 05:07, 9 May 2014 (UTC)
 * 不用谢，方智. 我越来越喜欢学中文，现在做新文章也比较容易了，感谢你了. :)--Anatoli (обсудить/вклад) 05:15, 9 May 2014 (UTC)
 * 誰是方智？還有，祝你好運！--kc_kennylau (talk) 09:03, 9 May 2014 (UTC)
 * 你猜啊！谢谢你. --Anatoli (обсудить/вклад) 12:07, 9 May 2014 (UTC)
 * 你的中文蛮不错的嘛，呵呵. Wyang (talk) 01:21, 10 May 2014 (UTC)
 * Why can't we make the category for Cantonese Jyutping act more like the category for Mandarin Pinyin? --Lo Ximiendo (talk) 03:43, 10 May 2014 (UTC)
 * I'm probably the wrong person to ask for this...I'm against having Mandarin Pinyin in the way they are now, or having any romanised entries at all. Wyang (talk) 05:28, 10 May 2014 (UTC)
 * Monosyllabic are allowed, polysyllabic probably not - Votes/2013-11/Jyutping. That vote was controversial. --Anatoli (обсудить/вклад)

a potential issue
Hi Wyang, I noticed that at the new entry I created for 休養 it appears that it is not linked to any category. What's going on here? ---> Tooironic (talk) 01:26, 10 May 2014 (UTC)
 * Hi, please use instead of . Mandarin audios have the parameter '|ma=', which works exactly like the parameter '|a=' in Pinyin-IPA. Please see my change on that page. Wyang (talk) 01:29, 10 May 2014 (UTC)
 * Gotcha, thanks. ---> Tooironic (talk) 02:36, 10 May 2014 (UTC)

你懂的
Does this exist in Mandarin? --kc_kennylau (talk) 12:51, 11 May 2014 (UTC)
 * Yes. I have added some examples there but they are in part 18+. Wyang (talk) 00:15, 12 May 2014 (UTC)

cat=con vs. cat=conj
It's a minor issue, but I just made the above change to zh-pron in 4 entries to empty Category:Mandarin con and its sister categories: you evidently told your bot to use "con", and zh-pron didn't recognize it. Fortunately I knew where to find the correct abbreviation, but others won't- so someone will probably make that or similar mistakes as long as there's no list in the documentation. Chuck Entz (talk) 00:21, 12 May 2014 (UTC)
 * Thanks for letting me know. I have enabled 'con' and 'conjunction' as valid aliases of 'conjunctions'. Wyang (talk) 00:24, 12 May 2014 (UTC)
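
A rough sketch of the alias handling described above, assuming a plain lookup table. This is not the actual zh-pron code; `CAT_ALIASES` and `resolve_cat` are hypothetical names, and only the conjunction aliases named in this thread are listed.

```python
# Hypothetical alias table. Only "con", "conj" and "conjunction" are taken
# from the thread; all of them resolve to the canonical "conjunctions".
CAT_ALIASES = {
    "con": "conjunctions",
    "conj": "conjunctions",
    "conjunction": "conjunctions",
    "conjunctions": "conjunctions",
}


def resolve_cat(value):
    """Return the canonical category name for a cat= value.

    Returns None when the value is not a known alias, so the caller can
    flag the entry instead of creating a stray category.
    """
    return CAT_ALIASES.get(value.strip().lower())
```

Returning `None` for unknown values is a design choice made for this sketch: it makes mistakes like an unrecognised abbreviation visible rather than silently producing a category such as "Mandarin con".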

Wu Chinese transliteration
Sorry, Frank, I'm making too many mistakes so far. I'll make a list of words and my transliteration attempts for you to check. Is that okay? --Anatoli (обсудить/вклад) 01:07, 12 May 2014 (UTC)
 * Please if you can. It's all right since the Wu pronunciations are quite unintuitive for anyone not familiar with it. Wyang (talk) 01:11, 12 May 2014 (UTC)

Middle Chinese and Hakka transliterations
Hi Frank,

Is there a way to transliterate Middle Chinese? I've merged 我 but not happy about Middle Chinese (ŋấ, ngɑ̌) and Hakka (a big list with a reference to a dictionary). I'd like to do 好. It also has Middle Chinese transliterations: *xaù, *xǎu and a list of Hakka. Not sure about the best way to add them. --Anatoli (обсудить/вклад) 14:41, 13 May 2014 (UTC)
 * Please use the |mc= parameter in . Please use this page to look up MC pronunciations, the parameter value is "中古声母(1 syl)-中古韵母(1 syl)-中古等(1 or 3 syl, no "等" character)-中古开合(1 syl)-中古摄(1 syl)-中古声调(1 syl)-中古反切(2 syl)". Multiple readings ("后一条") are separated by ",". Please see my edits at 我 and 好. Wyang (talk) 05:01, 14 May 2014 (UTC)


 * Hi. I did a couple but adding Middle Chinese transliterations seems such a hassle using [] Perhaps, we should just adopt one or two of the transliterations there without extra info? Same with Hakka, actually, perhaps a simple list would do, not sure if every word/character can be found in the used references. --Anatoli (обсудить/вклад) 01:29, 20 May 2014 (UTC)


 * The process of extracting those values can be automated. The "one or two of the transliterations there" are for Old Chinese, not Middle Chinese. Wyang (talk) 01:47, 20 May 2014 (UTC)


 * I meant Middle Chinese, e.g. value "疑歌一开果上五可" for 我. If this can be automated, this would be wonderful but that's for single-character words? --Anatoli (обсудить/вклад) 01:54, 20 May 2014 (UTC)
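
To illustrate how the |mc= value format described above could be taken apart automatically, here is a minimal sketch. The English field names are glosses chosen for this example (声母 initial, 韵母 final, 等 division, 开合 openness, 摄 rhyme group, 声调 tone, 反切 fanqie), and the function names are hypothetical. It assumes that a reading is either 8 or 10 characters long, depending on whether the 等 field takes 1 or 3 characters.

```python
FIELDS = ("initial", "final", "division", "openness",
          "rhyme_group", "tone", "fanqie")


def parse_mc_reading(reading):
    """Split one Middle Chinese reading into its seven fields."""
    if len(reading) == 8:
        division_len = 1
    elif len(reading) == 10:
        division_len = 3
    else:
        raise ValueError("expected 8 or 10 characters, got %d" % len(reading))
    i = 2 + division_len  # index of the first character after the division
    parts = (
        reading[0],        # initial
        reading[1],        # final
        reading[2:i],      # division, 1 or 3 characters
        reading[i],        # openness
        reading[i + 1],    # rhyme group
        reading[i + 2],    # tone
        reading[i + 3:],   # fanqie, 2 characters
    )
    return dict(zip(FIELDS, parts))


def parse_mc(value):
    """Parse a full |mc= value; multiple readings are separated by ','."""
    return [parse_mc_reading(r) for r in value.split(",") if r]
```

For the value quoted above for 我, `parse_mc("疑歌一开果上五可")` yields a single reading with initial 疑, final 歌, division 一, openness 开, rhyme group 果, tone 上 and fanqie 五可.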

備胎
Was just wondering if you had a suggestion about how to translate the extended meaning of 備胎? The best I could do was "a possible replacement for one's current partner". It's a terrible translation, but I'm not sure if there is any equivalent for this in English. ---> Tooironic (talk) 04:17, 14 May 2014 (UTC)
 * Aha, good one. Don't think an exact equivalent exists in English - maybe "a backup", "a second choice", "a just-in-case", "a plan B", "a contingency"? Wyang (talk) 04:26, 14 May 2014 (UTC)

㑚
Hi Frank,

Could you check this entry please - specifically word boundaries and Min Nan transliteration? There is some Wu specific grammar and words I don't understand in this usage example. --Anatoli (обсудить/вклад) 00:06, 15 May 2014 (UTC)
 * I have checked Wu. It doesn't seem to be used in Min Nan. Which bit of the grammar do you not understand? 立(站)-辣(在)-窗口頭(窗口前)-額(的)-搿(這)-個-人-是-㑚(你)-經理，對- 𠲎(嗎)？ Wyang (talk) 03:13, 15 May 2014 (UTC)


 * Thanks for adding Mandarin, I understand now. I hoped there is a Min Nan reading, also for 你们, even if the words are not used in Min Nan. Should 窗口頭(窗口前) be split or is it synonymic to 窗口? Also, Qian Nairong says 我 is also pronounced as "whu23" by young people, normally "ngu34", that's "3ngu" and "3hhu", right? Which tone is right? Can I add the alternative "hhu" pronunciation? --Anatoli (обсудить/вклад) 03:22, 15 May 2014 (UTC)


 * You (plural) in Min Nan is 恁. Yes, 我 is 3ngu and 3hhu. 窗口頭 is a word, 頭 is a suffix in that word, like 木頭. Wyang (talk) 03:28, 15 May 2014 (UTC)

yo
Is this sound even present? --kc_kennylau (talk) 10:18, 15 May 2014 (UTC)
 * 喔唷 (ōyō), 哼唷 (hēngyō), 哎哟 (āiyou, āiyō) --Anatoli (обсудить/вклад) 10:34, 15 May 2014 (UTC)
 * Yep. Wyang (talk) 23:12, 15 May 2014 (UTC)


 * Zhuyin is failing, though, see 唷喔 or ōyō. Could you please add? I think it's "｜ㄛ". --Anatoli (обсудить/вклад) 23:29, 15 May 2014 (UTC)


 * Yes, "yō" is definitely "｜ㄛ" in Zhuyin: --Anatoli (обсудить/вклад) 23:32, 15 May 2014 (UTC)


 * Fixed. Wyang (talk) 00:22, 16 May 2014 (UTC)

Pinyin-IPA to zh-pron 2
When converting topolects to the new format, the longest time is to convert from using to. Could you run a bot to change those on existing Mandarin entries? I don't know if it's hard and if it may cause other problems, though. --Anatoli (обсудить/вклад) 01:18, 16 May 2014 (UTC)


 * Hi, you can use . Replace 'Pinyin-IPA' with 'subst:Pinyin-IPA/a', and add a |cat= parameter at the end. :) Wyang (talk) 02:09, 16 May 2014 (UTC)
 * I'm not sure what you mean. I am only doing it manually (copy/paste) or re-generate with, which adds . Can you show, please? --Anatoli (обсудить/вклад)

吃飯:

Replace it with

chīfàn

Wyang (talk) 02:19, 16 May 2014 (UTC)


 * I got it, thanks. Used on 草書 + c=, mn=. --Anatoli (обсудить/вклад) 02:55, 16 May 2014 (UTC)
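
The replacement described above (swap 'Pinyin-IPA' for 'subst:Pinyin-IPA/a' and append a |cat= parameter) could be applied to raw wikitext with something like the sketch below. The double-brace call syntax with a positional pinyin parameter, the function name `to_subst_form`, and the cat value used in the usage note are all assumptions made for illustration.

```python
import re

# Matches a simple, non-nested {{Pinyin-IPA|...}} call. Calls that already
# use the /a subpage or the subst: prefix are not matched.
PINYIN_IPA = re.compile(r"\{\{Pinyin-IPA(\|[^{}]*)?\}\}")


def to_subst_form(wikitext, cat):
    """Rewrite Pinyin-IPA calls to subst:Pinyin-IPA/a with a trailing cat=."""
    def repl(match):
        params = match.group(1) or ""
        return "{{subst:Pinyin-IPA/a" + params + "|cat=" + cat + "}}"
    return PINYIN_IPA.sub(repl, wikitext)
```

Under these assumptions, `to_subst_form("{{Pinyin-IPA|chīfàn}}", "v")` gives `{{subst:Pinyin-IPA/a|chīfàn|cat=v}}`, and running it a second time leaves the text unchanged.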

Other Topolects for 加油
Hi, when are you adding the pronunciations for Wu, Gan, Hakka, Min Dong and Xiang on the entry for 加油? --Lo Ximiendo (talk) 01:55, 16 May 2014 (UTC)
 * Wyang is very busy with merging topolects. I'm also hassling him to add Wu pronunciations, which I attempt to do myself. For some topolects without a developed transliteration system it's especially complicated and may not be even available. If IPA or sound recording is found, then it's possible but this information has to be found. Having said this, a straightforward way to add topolects that are not handled yet must be worked out, if IPA or sound recording is found.
 * My attempt with Wu: "1ka yeu" (probably wrong), Hakka: "ka-yû". --Anatoli (обсудить/вклад) 02:06, 16 May 2014 (UTC)
 * I'd probably go for "4ka yeu". Jamesjiao → T ◊ C 02:18, 16 May 2014 (UTC)
 * In Shanghainese Wu, it follows phrase tone sandhi rules, as its individual parts are evident. It's ka44 hhieu23. Wyang (talk) 02:28, 16 May 2014 (UTC)
 * Is the /ɦ/ really there? I can't hear it myself (doesn't mean it doesn't exist though). Jamesjiao → T ◊ C 02:44, 16 May 2014 (UTC)
 * I can't add |w=ka44 hhieu23 (Module error). --Anatoli (обсудить/вклад) 02:43, 16 May 2014 (UTC)
 * I don't hear /ɦ/ either. I've got a little book on Shanghainese. They speak very fast, though and I don't seem to get Wu sounds well. here's a nice recording on, the site Wyang gave me.--Anatoli (обсудить/вклад) 03:01, 16 May 2014 (UTC)
 * /ɦ/ is the slight constriction of the glottis in the recording. Apart from the constriction, the presence of 'hh' also causes the tone to be lower when the character is pronounced in isolation. Compare 椅 i and 夷 hhi, as well as 矮 a and 鞋 hha. Null-initial and /ɦ/ are found in complementary distribution, occurring in characters which had voiceless and voiced initials in MC respectively. 油 (you2) had voiced initial in MC, which is why it is tone 2 in Mandarin (阳平) not tone 1 (阴平). 幽 (you1) would have voiceless initial in MC, and its Shanghainese pronunciation would therefore lack 'hh' and be just 'ieu'. Wyang (talk) 04:42, 16 May 2014 (UTC)
 * Makes a bit of sense but how do you know if it's 阳平 or 阴平 tone? Do you know Middle Chinese pronunciation for these characters? For 油 Wu minidict only shows "yeu" 平/1. So it can be either 1yeu or 3hhieu? --Anatoli (обсудить/вклад) 05:23, 16 May 2014 (UTC)
 * For 油 it is 3hhieu (MD: yeu 平/1, 阳平), and for 幽 it is 1ieu (MD: ieu 平/1, 阴平). You can use the MC pronunciation or other dialectal information. For the level tone it is easy, Mandarin 1st tone = 阴平, 2nd tone = 阳平; Cantonese 1st tone = 阴平, 4th tone = 阳平. So compare: 幽 (M you1, C jau1, W 1ieu), 油 (M you2, C jau4, W 3hhieu). Wyang (talk) 05:28, 16 May 2014 (UTC)
 * So, you basically can use Mandarin pronunciation + 平/去/入 from MD to determine the tone of isolated hanzi? I was only relying on MD for tones when I couldn't use Qian's book. --Anatoli (обсудить/вклад) 05:34, 16 May 2014 (UTC)
 * You don't need to use Mandarin. The voicedness of the initial and 平上去入 is enough for knowing which tonal category the character belongs to in Shanghainese. MiniDict's 'y' is 'hhi', so for the voiced initial 'hh', the tonal category of 油 is tone 3 (voiced, 平, i.e. light level). Wyang (talk) 05:37, 16 May 2014 (UTC)
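
The level-tone rule of thumb from the replies above can be written down as a small lookup. This sketch covers only characters already known to belong to the level (平) category, since that is the only case spelled out in this thread; the names are hypothetical and nothing here applies to 上, 去 or 入 characters.

```python
# Correspondences quoted in the thread, valid for level-tone (平) characters.
MANDARIN_LEVEL = {1: "阴平", 2: "阳平"}
CANTONESE_LEVEL = {1: "阴平", 4: "阳平"}

# Shanghainese tone numbers as used in the thread's examples:
# 幽 is 1ieu (阴平, voiceless initial), 油 is 3hhieu (阳平, voiced initial).
WU_LEVEL_TONE = {"阴平": 1, "阳平": 3}


def level_register(mandarin_tone=None, cantonese_tone=None):
    """Return 阴平 or 阳平 for a level-tone character, or None if unknown."""
    if mandarin_tone in MANDARIN_LEVEL:
        return MANDARIN_LEVEL[mandarin_tone]
    if cantonese_tone in CANTONESE_LEVEL:
        return CANTONESE_LEVEL[cantonese_tone]
    return None


def wu_level_tone(register):
    """Map 阴平 or 阳平 to the Shanghainese tone number used above."""
    return WU_LEVEL_TONE[register]
```

For 油 (Mandarin you2, Cantonese jau4) this gives 阳平 and tone 3; for 幽 (you1, jau1) it gives 阴平 and tone 1, matching the two examples in the thread.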


 * I still find it hard to convert what I find in wu-minidict to what you have described. I'm not giving up but it's kind of difficult to combine learning and editing. Even if I get an audio file to listen to Shanghainese words, I can now pick up only some tones, phrasal tones make little sense. I'm more or less comfortable with reproducing and picking up Mandarin tones, I never really bothered with IPA, since I used pinyin and characters. And I'm still a bit uncomfortable with numbers used to represent tones in IPA but I'm getting more understanding. My exposure to Cantonese is much shorter; I used lessons and listened to recordings, but I'm not comfortable with Cantonese tones. Still, Cantonese doesn't sound as alien as Shanghainese, my former Chinese classmates taught me some too. After the merger, I'll do a bit more Shanghainese. Sorry for bugging about transliterations and thank you very much for your help. If it's not a burden, I'll keep adding words to my list of words to transliterate in Wu.
 * On the topic of Xiang, Gan, etc. Since there's so little documentation, no standard or official transliteration, are we going to handle those at all? if yes, in what way? Currently, there's almost nothing in Wiktionary, outside Mandarin and major popular topolects - Cantonese, Min Nan, Wu and Hakka. What if there's a sourced audio-recording or IPA in Xiang Chinese? Can we have a simple framework for those? E.g. as simple as x=, etc. in 湘语? Just a thought. --Anatoli (обсудить/вклад) 01:50, 20 May 2014 (UTC)


 * Shanghainese is a bit unusual among Chinese dialects. It arose as sort of a creole of different Wu and Mandarin dialects in the past century, which is why its phonology is a lot simplified compared with the neighbouring dialects. Its tone system is on the verge of breakdown (or from another perspective, on the path to a pitch accent system), and there is so much homophony and multisyllabification. For example, the listener wouldn't know whether the person who said 我买/卖过汽车 has the experience of buying or selling a car. No worries about the transliteration checks.
 * Tones are hard to get used to, especially when there are too many of them in the language.
 * With regard to the other groups, the only one with some printed romanisation material would probably be . The romanisation is or "Bàng-uâ-cê" (same characters as POJ). However, the phonology of Min Dong is notoriously difficult, arguably the hardest in theory among Chinese dialects. There are complex sandhi rules not only for tones, but for initials and finals (!) as well (See how  has two sets of values for each rime). Luckily I had some exposure to it before. The amount of printed material using that romanisation is meagre, although I am looking for ways of obtaining that material either electronically or in print.
 * The other ones - I would just set the parameter |x=, |g= to IPA. The parameter will be passed to a function which converts numbers to superscripts: x=siɔ̃44 ny31. Audios can be added using |xa=, |ga=; see 中国. Wyang (talk) 02:22, 20 May 2014 (UTC)
 * Would |x=IPA=/siɔ̃44 ny31/ be OK for Xiang or just |x=siɔ̃44 ny31 ? --Anatoli (обсудить/вклад) 02:27, 20 May 2014 (UTC)
 * It's a shame Wu/Shanghainese has so few resources. The site you gave me - doesn't use consistent spelling and there's so little about grammar. Min Dong seems scary and there must be very little written in this dialect or only in Roman letters. Another problem is, dialectal words may not pass RFV, if they only appear in chats, dubious web-sites and the pronunciation/transliteration provided is amateurish or otherwise incompatible with the way we write IPA/transliteration here. So, some dialects, even big ones may miss out completely. --Anatoli (обсудить/вклад) 02:36, 20 May 2014 (UTC)
 * We could always resort to dictionaries perhaps, such as the Comprehensive Dictionary of Chinese Dialects I mentioned before or the . Wyang (talk) 03:43, 20 May 2014 (UTC)
 * It's not easy to access them, I don't see myself mass-adding entries in smaller dialects, I may become more comfortable with Wu later, and Min Nan and Cantonese are available enough. I think we should create a simple enough framework, though (like you said x=IPA(x)). Please also answer my question above about the format of Xiang IPA or let me know if you're undecided yet. --Anatoli (обсудить/вклад) 03:50, 20 May 2014 (UTC)
 * Just |x=siɔ̃44 ny31, since there is no romanisation for it. Wyang (talk) 03:52, 20 May 2014 (UTC)
 * OK. I've added in 湘語/湘语, please make it display and categorise (Xiang nouns) if you can, and other topolects we might include in the future (IPA only). --Anatoli (обсудить/вклад) 04:07, 20 May 2014 (UTC)
 * OK, |x=, |g=, |j= enabled now. Wyang (talk) 04:22, 20 May 2014 (UTC)

Thanks but 湘語/湘语 doesn't seem to work - I mean categorisation. I think they should also be visible in collapsed mode as well--Anatoli (обсудить/вклад) 04:27, 20 May 2014 (UTC)
 * Categorised now. Gan, Jin and Xiang promoted (it looks a bit weird though, having a mix of romanisations and ipas). Wyang (talk) 04:37, 20 May 2014 (UTC)
 * Thank you. I think it looks OK for the lack of romanisation and because there could be multiple IPA for other varieties. we can document it later.
 * Without actually suppressing any dialect, there should probably be a technical limit on what can go into, and can be added to PoS categories. What if a small regional entry with a pronunciation is added by a contributor, e.g. Sichuanese Mandarin 横顺 (huan2 sen1) (=反正) or even smaller, less known dialect? Wiktionary principle is all words in all languages, though. What do you think? --Anatoli (обсудить/вклад) 04:56, 20 May 2014 (UTC)
 * I agree. We could account for those by allowing things like |m=Sichuan=IPA (in the future). Wyang (talk) 06:29, 20 May 2014 (UTC)
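
For the IPA-only parameters discussed in this thread (|x=, |g=, |j=), the step that converts tone digits to superscripts, as in x=siɔ̃44 ny31, might look like the sketch below. It uses Unicode superscript digits; whether the module does this or wraps the digits in markup instead is not stated here, so treat the approach and the name `superscript_tones` as assumptions.

```python
# Translation table from ASCII digits to Unicode superscript digits.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def superscript_tones(ipa):
    """Replace every ASCII digit in an IPA string with its superscript form."""
    return ipa.translate(SUPERSCRIPTS)
```

With this, the Shanghainese value "ka44 hhieu23" from earlier in the thread becomes "ka⁴⁴ hhieu²³"; all other characters, including combining diacritics, pass through untouched.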

Chinese
Please check the Mandarin pronunciation of 一次方程. --kc_kennylau (talk) 14:31, 18 May 2014 (UTC)
 * Checked, it is correct. Wyang (talk) 23:21, 18 May 2014 (UTC)

AWB
Keep it going, please :) There are verbs, adverbs, adjectives... --Anatoli (обсудить/вклад) 07:27, 20 May 2014 (UTC)


 * Seems to be all done now. Wyang (talk) 08:10, 20 May 2014 (UTC)


 * Good job! There are some multisyllabic adjectives, interjections, pronouns and prepositions. I have just cleaned a few proper nouns. Well, when all varieties are done, you can do Mandarin? --Anatoli (обсудить/вклад) 08:45, 20 May 2014 (UTC)


 * You are right... For some reason I erroneously filtered some articles off the list. I'm now generating a still-to-do list from the dump, and I'm probably looking at >100 pages here. Wyang (talk) 10:52, 20 May 2014 (UTC)
 * Could you run your AWB again, please? It's just not efficient to do it manually.
 * There are only two Min Dong entries - 平話/平话.--Anatoli (обсудить/вклад) 13:06, 21 May 2014 (UTC)
 * No problem, but probably tomorrow since it's quite late now. Would you like to use that tool too? It is very simple. I have shared my file at https://www.dropbox.com/s/ue72ilv0x62ylvd/topolect_merger.xml. Wyang (talk) 13:21, 21 May 2014 (UTC)
 * I have saved the file but I have no idea how AWB works and I don't have it. You'd probably have to spend much time explaining. Tomorrow's fine or any other time, as long as you're planning to do it. --Anatoli (обсудить/вклад) 13:28, 21 May 2014 (UTC)
 * At any rate, if you would like to learn to use it any time, I'm more than happy to help. All you do is download it, put the file in (File > Open Settings), log in (File > Log in/Profiles > Add), and run (Start > Start). I will do some when I have time. Wyang (talk) 23:25, 21 May 2014 (UTC)

Sorting
Why use pinyin to sort both simplified and traditional versions in Chinese topic categories? --kc_kennylau (talk) 12:23, 20 May 2014 (UTC)
 * I don't know. What do you reckon? Wyang (talk) 00:21, 21 May 2014 (UTC)

Pinyin spacing
Should 土衛七 be Tǔwèiqī or Tǔwèi qī or Tǔwèi Qī? --kc_kennylau (talk) 12:33, 20 May 2014 (UTC)
 * I'm not familiar with the orthography rules of Pinyin. This website might be helpful. Wyang (talk) 00:22, 21 May 2014 (UTC)
 * I suggest "Tǔwèiqī", the same with other Saturnian or Jovian moons. --Anatoli (обсудить/вклад) 00:42, 21 May 2014 (UTC)

房卡, 房號, 房号, 房型
Just bringing this to your attention. All these entries have come up with "(At least one of the forms in the hanzi box is uncreated...)" at the top of the page. ---> Tooironic (talk) 04:29, 21 May 2014 (UTC)
 * It goes away if you save the page with an empty edit. It's a server lag problem. I'm using my bot to do null edits on these, so it should go away soon. Wyang (talk) 04:32, 21 May 2014 (UTC)

zh-usex
How to link to one page while having the transliteration with spaces? For example in 愛怎麼著怎麼著. --kc_kennylau (talk) 11:14, 21 May 2014 (UTC)
 * I believe that is not possible currently... Well, unless you modify Module:zh-usex. :) Wyang (talk) 11:17, 21 May 2014 (UTC)
 * I have already implemented this function. Please update the documentation accordingly if you like. :) --kc_kennylau (talk) 12:31, 21 May 2014 (UTC)
 * Thanks. I have expanded Template:zh-usex/documentation. It seems simp_word[i] fails to accept the new tricks though, when I tried to add 愛怎麼著怎麼著 as an example there. Wyang (talk) 12:56, 21 May 2014 (UTC)
 * Where is the example that failed? --kc_kennylau (talk) 13:12, 21 May 2014 (UTC)
 * At that documentation page now. Wyang (talk) 13:23, 21 May 2014 (UTC)
 * Done. --kc_kennylau (talk) 13:35, 21 May 2014 (UTC)

個/个 and topolect merger
Frank, could you please edit the entry yourself, specifically the Wu transliteration, perhaps some use examples? Just one of them is okay - 個, I'll fix the other one (trad./simp.).

I have added a few Wu entries without updating the check-list. Some are from the Wu dictionary (astronomy, weather), so I have some confidence about the tones but initials/consonants may need checking but the IPA generated looked similar (not identical to Wiktionary methods you designed). I also used existing verified entries for reference. I can't easily access the dictionary, though. So, other entries need more attention still - both tones and the rest. Would you prefer me to add any new Wu entry to the checklist? Thanks for regularly checking it! It's really helpful.

I'd like to do more Wu, I'd appreciate if you check my edits. The more entries we have, the easier it gets to add more contents.

I'll leave the remaining work on topolect merger to you, since you're better equipped with tools and skills (there are still remaining multisyllabic entries but you need to update your list, since I have done a few) but I will work gradually on single-characters entries, they probably can't be done automatically?

I think all remaining Min Nan entries without Mandarin equivalent should get in front of the definition. What do you think? --Anatoli (обсудить/вклад) 01:56, 23 May 2014 (UTC)


 * No problem for the checklist. Please add anything you are unsure about, or anything that is not supported by those references.
 * I have expanded 個. I propose that we allow the use of the header "Definitions" for hanzi entries, makes it a lot clearer and editing a lot easier.
 * I will get on with the merger job... There were 177 remaining the last time I checked. Should be done soon. Wyang (talk) 06:54, 23 May 2014 (UTC)

Null-initial syllables
Should it be /ɥy/ instead of /y/ for the word 語 in 語言? --kc_kennylau (talk) 09:28, 23 May 2014 (UTC)
 * Do you mean when it is null-initial? Or any /y/? Wyang (talk) 09:43, 23 May 2014 (UTC)
 * Yes, any null-initial. --kc_kennylau (talk) 09:47, 23 May 2014 (UTC)
 * I don't have a strong preference for this... To me they are just different ways of looking at the phonotactics. I prefer /y/, as I think there isn't a semivocalic component that is worth notating, but that might just be my idiosyncrasy. If you change it, make sure you change /i/ and /u/ as well. Wyang (talk) 10:00, 23 May 2014 (UTC)

核兒
I think this should be pronounced húr, right? ---> Tooironic (talk) 17:19, 23 May 2014 (UTC)
 * Yes, you are right. Thanks, added. Wyang (talk) 00:43, 24 May 2014 (UTC)