Module talk:PinyinBopo-convert/testcases

Related discussions: Grease_pit/2014/January, Module_talk:PinyinBopo-convert

I think it should be tone numbers, not tone marks: the "pint=" parameter is also in the entries and should be used for sorting Mandarin entries as well. We have only one "pint=" (numbered tones) per entry but multiple "pin=" (diacritic tones). Is it OK to have one Zhuyin per entry? --Anatoli (обсудить/вклад) 03:28, 24 January 2014 (UTC)
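
For reference, the relation between a "pin=" style syllable (diacritic tone) and a "pint=" style syllable (numbered tone) can be sketched in a few lines. This is a minimal Python illustration, not the module's code: the function name is made up, it handles one syllable at a time and does not split words such as "Běijīng" into syllables, and it assumes an unmarked syllable is written with tone 5, as in the 'zhong' -> 'zhong5' test case on this page.

```python
import unicodedata

# Combining marks used for the four marked tones (assumed mapping).
TONE_MARKS = {
    "\u0304": "1",  # macron, first tone
    "\u0301": "2",  # acute, second tone
    "\u030c": "3",  # caron, third tone
    "\u0300": "4",  # grave, fourth tone
}

def syllable_to_numbered(syllable):
    """Convert one Pinyin syllable with a tone mark to numbered form."""
    # Decompose so that the tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "5"  # assumption: no tone mark means neutral tone, written as 5
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps the diaeresis of "ü"
    # Recompose so that "u" + diaeresis becomes "ü" again.
    return unicodedata.normalize("NFC", "".join(kept)) + tone
```

With this sketch, "běi" gives "bei3", "jīng" gives "jing1" and "lǜ" gives "lü4".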

7 tests failed
At the moment, there are 7 failed tests, although expected=actual, except for one (zhong - ㄓㄨㄥ˙ - ㄓㄨ):


 * Běijīng	ㄅㄟˇ ㄐㄧㄥ	ㄅㄟˇ ㄐㄧㄥ
 * Zhōngguó	ㄓㄨㄥ ㄍㄨㄛˊ	ㄓㄨㄥ ㄍㄨㄛˊ
 * biān... biān...	ㄅㄧㄢ ... ㄅㄧㄢ ...	ㄅㄧㄢ ... ㄅㄧㄢ ...
 * bùyóu fēnshuō	ㄅㄨˋ ㄧㄡˊ ㄈㄣ ㄕㄨㄛ	ㄅㄨˋ ㄧㄡˊ ㄈㄣ ㄕㄨㄛ

And


 * Běijīng	bei3 jing1	bei3 jing1
 * Zhōngguó	zhong1 guo2	zhong1 guo2

Is anything wrong with the testing module? --Anatoli (обсудить/вклад) 12:32, 25 January 2014 (UTC)
 * There is — in that the testing module does not show whitespace properly. Keφr 12:39, 25 January 2014 (UTC)


 * Thank you for your help. Are you able to fix it? I can see spaces (U+0020) on both sides. In any case, there should be only one space between Zhuyin syllables, even if Pinyin may have none or one space. I've added more cases for erhua but I'm not 100% sure about correctness, e.g. whether "wánr" should be "ㄨㄢˊㄦ" or "ㄨㄢㄦˊ". Will wait for some feedback. --Anatoli (обсудить/вклад) 03:22, 26 January 2014 (UTC)
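
When expected and actual look identical on screen, listing the code points can show whether an invisible character differs. The following is a small Python sketch of such a check. It is not part of the testing module, the function names are hypothetical, and the no-break space in the usage note is only an example of an invisible difference, not a claim about what caused the failures here.

```python
import unicodedata

def label(ch):
    """Return a 'U+XXXX NAME' label for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"

def first_difference(expected, actual):
    """Return (index, expected label, actual label) of the first mismatch.

    Returns None when the two strings are identical.
    """
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i, label(e), label(a)
    n = min(len(expected), len(actual))
    if len(expected) > n:
        return n, label(expected[n]), "<end>"
    if len(actual) > n:
        return n, "<end>", label(actual[n])
    return None
```

For example, comparing "ㄅㄟˇ ㄐㄧㄥ" with the same string containing U+00A0 instead of the space returns `(3, 'U+0020 SPACE', 'U+00A0 NO-BREAK SPACE')`.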


 * My bad, I thought erhua in Zhuyin works analogously to Pinyin (-er fuses with syllable and comes before tone mark). The online dictionary by the Ministry of Education of Taiwan uses ㄨㄚˊㄦ without the neutral tone mark at the end - erhua fixed according to that. There is something weird (possibly space or line break) with the remaining error. Both 'zhong' -> 'zhong5' and 'zhong5' -> 'ㄓㄨㄥ˙' are converted correctly on their own, but the combination fails. Even works correctly (produces ). I'm not sure how to fix it. Wyang (talk) 23:03, 27 January 2014 (UTC)


 * (Before edit conflict) @Wyang. According to Pleco dictionary "wánr" and "dàir" are "ㄨㄢˊㄦ˙" and "ㄉㄞˋㄦ˙" (exactly), not "ㄨㄢˊㄦ" and "ㄉㄞˋㄦ", even though they can be misread as "wáner/wán'er" and "dàier/dài'er".  without a tone mark becomes "ēr".
 * (After edit conflict) I'm not sure myself. I remember seeing erhua in Taiwanese texts. It is used but much less often than in mainland China. I wonder if we can use Pleco approach - see my suggestion above, if is marked with a neutral tone marker, make it simply "-r".
 * Another "challenge" for you. The English word "or" (it's used in pin= parameter) should not be converted (see - "pin=lājī or lèsè"). It's easier to implement it here than in templates. Delinking is not a problem. I've made a simple Template:Zhuyin template. Please check. --Anatoli (обсудить/вклад) 23:16, 27 January 2014 (UTC)


 * I'm leaning towards the Ministry of Education scheme. Erhuaed syllables are essentially monosyllables, and the other resources I found appear to use the same rules too. Fixed the second one: " lājī or lèsè" or "lājī or lèsè" become . Wyang (talk) 01:06, 28 January 2014 (UTC)
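
The handling of "or" can be sketched as follows. This is a minimal Python illustration and not the module's actual code: `convert_word` stands for whatever Pinyin-to-Zhuyin conversion is used for a single word, and both that name and `SKIP_WORDS` are hypothetical. Splitting on whitespace also means that " lājī or lèsè" and "lājī or lèsè" give the same result, with exactly one space between items.

```python
# Assumption: the English word "or" is the only token that must stay as it is.
SKIP_WORDS = {"or"}

def convert_phrase(text, convert_word):
    """Convert each whitespace-separated word, leaving skip words untouched."""
    words = text.split()  # drops leading/trailing and repeated whitespace
    return " ".join(
        word if word in SKIP_WORDS else convert_word(word)
        for word in words
    )
```

With a stand-in converter such as `lambda w: "<" + w + ">"`, the input " lājī or lèsè" becomes "<lājī> or <lèsè>".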
 * Cool, if you find some definite rules/examples, could you post here? --Anatoli (обсудить/вклад) 01:15, 28 January 2014 (UTC)


 * Testing Template:Zhuyin with "lājī or lèsè": .
 * Good but maybe "or" shouldn't be italicised (could use the reverse italicisation, so that it appears normal)! Template:Zhuyin puts extra white lines, do you think you could fix it, please? --Anatoli (обсудить/вклад) 01:20, 28 January 2014 (UTC)


 * OK, just thought italics perhaps looked better. Rules: . I am not sure about the line thing. I think it is the same problem as the remaining 'zhong' error. Wyang (talk) 01:31, 28 January 2014 (UTC)
 * Just a single with no tone mark? OK, question: does a full syllable "ēr" (1st tone) or "er" (neutral tone) exist in Mandarin (in case we have a syllable which should be "ēr" or "er", not "-r")? I know "ěr", "ér" and "èr" do. See er - is this correct? Do characters EVER have full syllable "er" reading - 儳, 兒? --Anatoli (обсудить/вклад) 01:45, 28 January 2014 (UTC)


 * Yes. 'ēr' and 'er' don't exist, so there is no confusion. Zhuyin has to have syllables separated by ' ' anyway, since non-separation results in ambiguity (eg. ㄧㄢ 'yān' = ㄧ 'yī' + ㄢ 'ān'). Wyang (talk) 01:53, 28 January 2014 (UTC)
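
The two points above (syllables always separated by one space, erhua written as a bare ㄦ after the tone mark) can be illustrated with a short Python sketch. The helper names are hypothetical and this is not the module's code. It assumes the syllables are already available as separate Zhuyin strings.

```python
ERHUA = "ㄦ"

def join_syllables(syllables):
    """Join Zhuyin syllables with exactly one space between them."""
    return " ".join(s for s in syllables if s)

def add_erhua(syllable):
    """Append a bare ㄦ (no tone mark of its own) after the syllable's tone mark."""
    return syllable + ERHUA

# Without a separator, ㄧ + ㄢ is indistinguishable from the single syllable ㄧㄢ.
unseparated = "".join(["ㄧ", "ㄢ"])
single = "ㄧㄢ"
```

Here `unseparated == single` is true, while `join_syllables(["ㄧ", "ㄢ"])` gives "ㄧ ㄢ", which stays distinct from "ㄧㄢ". `add_erhua("ㄨㄢˊ")` gives "ㄨㄢˊㄦ".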


 * I see, great, no confusion. So, no mark for, if it's "-r"? BTW, what's the exact rule about the position of the neutral tone mark ? Some resources use it in FRONT of syllables and some at the BACK. Can it be both? Should we change and put it at the front? --Anatoli (обсудить/вклад) 01:59, 28 January 2014 (UTC)


 * I don't know. Resources are inconsistent wrt this. I would leave it as it is, since it is ... less complicated ... :) Wyang (talk) 02:20, 28 January 2014 (UTC)


 * No errors now. Wyang (talk) 03:27, 28 January 2014 (UTC)
 * Thank you! Do I need to add a comprehensive test? I can't think of anything missing. If you are confident, we can focus on making Pinyin entries display Zhuyin in brackets, which is simpler and then move on to headword templates. We can try and get help again. --Anatoli (обсудить/вклад) 03:35, 28 January 2014 (UTC)


 * Yeah, a comprehensive test would be good, if that can be done. I don't really have an opinion on the latter, except that both Pinyin and Zhuyin should be de-linked and disallowed as page titles in these templates. Wyang (talk) 03:45, 28 January 2014 (UTC)
 * OK, I promise that I won't try to make Zhuyin linked, nobody has an intention to have entries for them, there's no-one to maintain them. Can you still help and get display? It'll become a bit more useful, more than just a soft redirect. If you really hate it, could you try a Chinese headword template to call ?
 * Are differences in Pinyin for mainland China and Taiwan a problem, e.g. "shénme"/"shénmo"? Alternative pronunciations are separated by "or" or commas, but I'm not sure if such differences may cause problems with the conversion. I'll think of a more comprehensive test. --Anatoli (обсудить/вклад) 04:53, 28 January 2014 (UTC)


 * I've edited Template:cmn-pinyin to include Zhuyin. Other Chinese headword templates are protected, but it's probably just an extra parameter in Template:head (although I don't know whether it can be de-linked in that template). Other annotations are potentially troublesome. Maybe we can write a script to do simple Pinyin-Zhuyin if pin= contains no '[' or ']', and only convert what is inside [ ... ] if it does. Wyang (talk) 12:02, 28 January 2014 (UTC)
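
The bracket idea could look roughly like this. It is a Python sketch under stated assumptions, not an existing script: `convert` is a placeholder for the real Pinyin-to-Zhuyin conversion, the function name is made up, and dropping the brackets from the output (that is, de-linking) is an assumption on top of what is described above.

```python
import re

# One or more opening brackets, bracket-free text, one or more closing brackets.
BRACKETED = re.compile(r"\[+([^\[\]]*)\]+")

def convert_pin_parameter(value, convert):
    """Convert the whole value, or only the bracketed parts if there are any."""
    if "[" not in value and "]" not in value:
        return convert(value)
    # Text outside brackets (annotations, "or", commas) is left unchanged.
    return BRACKETED.sub(lambda m: convert(m.group(1)), value)
```

With a stand-in `convert` that wraps its input in angle brackets, "[[lājī]] or [[lèsè]]" becomes "<lājī> or <lèsè>", and a value without brackets is converted as a whole.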


 * Thanks. I saw your edit and I managed to unlink Zhuyin with "#invoke:links|remove_links". See Zhōngwén and . It seems to have worked. Please tell me, which template you want to try first and I'll unprotect it. I suggest to experiment with something like, it's not protected, actually. --Anatoli (обсудить/вклад) 12:10, 28 January 2014 (UTC)
 * has just made it better. You have already added handling for "or" and the delinking is available. --Anatoli (обсудить/вклад) 12:38, 28 January 2014 (UTC)

dì'èr shǒu
I've just added dì'èr shǒu. The entry looks "ㄉㄧˋ 'ㄜˋㄦ ㄕㄡˇ", the test case looks OK - "ㄉㄧˋ ㄦˋ ㄕㄡˇ" but I used double quotes. Apostrophes should be stripped before conversion. --Anatoli (обсудить/вклад) 13:12, 28 January 2014 (UTC)
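
One way to deal with the apostrophe before conversion is sketched below in Python. Note the assumption: instead of simply deleting the apostrophe, this sketch replaces it with a space, so that the syllable boundary it marks is kept. The function name and the set of apostrophe characters are hypothetical.

```python
# Assumption: both the ASCII apostrophe and U+2019 may occur in Pinyin input.
APOSTROPHES = "'\u2019"
_TO_SPACE = str.maketrans({ch: " " for ch in APOSTROPHES})

def split_on_apostrophes(pinyin):
    """Replace apostrophes with a syllable break and normalise spacing."""
    return " ".join(pinyin.translate(_TO_SPACE).split())
```

With this, "dì'èr shǒu" becomes "dì èr shǒu" before it is passed to the converter.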