Module talk:Jpan-headword

Split by part of speech
Most headword modules have separate functions for each part of speech. Why wasn't that done here too? 21:36, 26 December 2013 (UTC)
 * It isn't? It has separate functions under pos_functions["nouns"], pos_functions["adjectives"], and pos_functions["verbs"]. The other parts of speech behave the same way and don't inflect or conjugate so (AFAIK) there's no reason to split them. If you know a better way to write this module and want to revise it, please feel free. I'm not a Lua expert and I wrote this module because nobody else had done it, and the #invoke calls in the templates got unwieldy. Thanks for the most recent edit btw. Haplogy (話) 01:59, 27 December 2013 (UTC)
 * You're right, they are split. I'm just confused by the size of all the code that handles different scripts, it seems like it does it depending on the POS so it made it look like there was no split. I don't really understand what it's for, but why is so much code needed for that? 02:21, 27 December 2013 (UTC)
 * There is partly the fact that I started using Lua at about the same time that I wrote that, so there may be a better way to write it. The part for accelerated entry creation for hiragana entries is split based on POS and there's probably a better way to pass the POS to Conrad.Irwin's JavaScript, but I did it that way because that was how I knew how to do it at the time. The code for -suru verbs also makes it bulkier. Maybe that should be split into a different function. I admit the code written to handle -suru verbs is ugly. -suru verbs are unique because the "suru" has to be added to each kana or romaji form at the end. The romanization code in the main function also tests for POS because it does the same trick for romanization for those parts of speech. Haplogy (話) 12:31, 27 December 2013 (UTC)
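
The dispatch-by-POS idea and the -suru handling described in this thread can be sketched roughly as follows. This is an illustration in Python rather than the module's Lua, and every name in it (`POS_FUNCTIONS`, `append_suru`, `build_headword`, the shape of `data`) is hypothetical and not taken from the module.

```python
# Hypothetical sketch; none of these names come from the actual module.

def append_suru(forms, suffix):
    """Return every form with the suru suffix attached at the end."""
    return [form + suffix for form in forms]

def plain_headword(data):
    # Parts of speech that do not inflect share this default handler.
    return data

def suru_verb_headword(data):
    # The suffix has to be added to each kana form and each romaji form.
    return {
        "kana": append_suru(data["kana"], "する"),
        "romaji": append_suru(data["romaji"], " suru"),
    }

# One handler per part of speech, looked up by name.
POS_FUNCTIONS = {
    "nouns": plain_headword,
    "suru verbs": suru_verb_headword,
}

def build_headword(pos, data):
    handler = POS_FUNCTIONS.get(pos, plain_headword)
    return handler(data)
```

Keeping the suffix logic in its own handler is one way to act on the "maybe that should be split into a different function" remark: the main function then needs no POS tests of its own.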

Line 13
The CJK character 一 (one) is included in the hiragana checklist. Why so? --kc_kennylau (talk) 13:17, 5 June 2014 (UTC)
 * I think it was meant to be ー (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK). — Keφr 13:55, 5 June 2014 (UTC)
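
The two characters look alike but sit in different Unicode blocks. A small Python check (used here only for illustration; the module itself is Lua, and `is_hiragana_like` is a made-up helper name) makes the difference visible:

```python
import unicodedata

KANJI_ONE = "一"        # U+4E00, the CJK ideograph for "one"
PROLONGED_MARK = "ー"   # U+30FC, the long-vowel mark

def describe(ch):
    """Return the code point and the Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def is_hiragana_like(ch):
    # Assumption: a hiragana check that accepts the hiragana block
    # plus the prolonged sound mark, and nothing else.
    return "\u3041" <= ch <= "\u3096" or ch == PROLONGED_MARK

print(describe(KANJI_ONE))
print(describe(PROLONGED_MARK))
```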

romaji for kanji+kana entries
Entries like 足りる, 死ぬ and 居眠り show extra incorrect romaji in the headword such as "romaji 死nu". How can we remove these? In  there are the lines below that look wrong to me, but I'm not confident enough to fix them myself.
 if mw.ustring.gsub(PAGENAME, kanapattern, "") == "" then
     if #allkana == 0 then table.insert(allkana, PAGENAME) end
 else
     table.insert(allkana, PAGENAME)
 end
Can we just remove the "else" clause? Do you have any idea, Wyang, Kc kennylau? Whym (talk) 12:52, 18 July 2014 (UTC)


 * Fixed. Wyang (talk) 00:29, 21 July 2014 (UTC)
 * Thanks a lot! Whym (talk) 08:28, 22 July 2014 (UTC)
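
For readers following the question, the effect of dropping the "else" clause can be sketched as below. This is a Python illustration, not the fix that was actually applied (the thread does not show it), and the kana pattern is an assumption rather than the module's real `kanapattern`.

```python
import re

# Assumed pattern: hiragana, katakana and the prolonged sound mark.
KANA_PATTERN = re.compile(r"[\u3041-\u3096\u30A1-\u30FA\u30FC]")

def is_pure_kana(text):
    # Same test as in the quoted lines: strip all kana, expect nothing left.
    return KANA_PATTERN.sub("", text) == ""

def collect_kana(pagename, supplied):
    """Fall back to the page name only when it is pure kana."""
    allkana = list(supplied)
    if is_pure_kana(pagename) and not allkana:
        allkana.append(pagename)
    # No else branch: a kanji+kana title such as 死ぬ is never added,
    # so it cannot be romanized into something like "死nu".
    return allkana
```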

Multiple readings
Entries such as and  should not be in Category:Japanese words with multiple readings, but ja-headword seems to do that. How can we prevent this? (Sorry for the mess I created when I tried by myself) Whym (talk) 04:23, 26 July 2014 (UTC)

What about historical katakana?
Historical hiragana can be input using hhira. Can historical katakana also be made inputtable using hkata, please? — I.S.M.E.T.A. 21:32, 9 January 2015 (UTC)


 * ✅ —suzukaze (t・c) 02:30, 11 January 2016 (UTC)

Latin-script in entry spelling
The module doesn't accept spelling with Latin mixed in it. Can this be fixed? I tried doing this but this also resulted in  being interpreted as "". —suzukaze (t・c) 02:30, 11 January 2016 (UTC)
 * See, which uses or . --Anatoli T. (обсудить/вклад) 02:36, 11 January 2016 (UTC)

RFC discussion: August 2017

 * See Category talk:Japanese.

Type 1 verbs ending in -iru or -eru
Would it be possible or desirable to categorize godan verbs ending in -iru or -eru automatically? Many Japanese-language resources have such a list. --Dine2016 (talk) 09:58, 12 May 2018 (UTC)
 * Sure. I think works. —Suzukaze-c◇◇ 02:01, 14 May 2018 (UTC)
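
A rough idea of such a check, written in Python for illustration only. The function name and the "godan" label are assumptions, and the thread does not show what was actually implemented.

```python
def is_godan_iru_eru(romaji, conjugation):
    """True for godan verbs whose romaji ends in -iru or -eru."""
    return conjugation == "godan" and romaji.endswith(("iru", "eru"))

# Example: hashiru (godan) qualifies, taberu (ichidan) does not.
for romaji, conjugation in [("hashiru", "godan"), ("taberu", "ichidan")]:
    print(romaji, is_godan_iru_eru(romaji, conjugation))
```

The conjugation class has to be supplied by the editor, since the ending alone cannot tell a godan verb from an ichidan one.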

remodel after Russian format
Hi. During the next overhaul of the Japanese infrastructure, what about redesigning headword templates like this?

• (harau) trans cons (infinitive, past )

• (tokubetsu) na/no-adj (adnominal or, adverb )

Rationale:
 * 1) The current headword format describes both the word (lexical item) and the spelling, which makes maintenance harder. Moving spelling information to somewhere else (like this) would be more logical.
 * 2) Russian headword templates add accent marks to the lemma (page name); Arabic headword templates add the vowels. Japanese headword templates could correspondingly add the reading in furigana.

--Dine2016 (talk) 16:37, 21 December 2018 (UTC)


 * I really like that you're having a close look at formatting and layout for JA entries. This is good work, and fresh eyes are a good thing. Thank you!
 * That said, looking at the example lines above, I have concerns with the suggested reworking.
 * Excessive abbreviation.
 * I have to guess at what cons means, for instance. I see in the wikicode that this is using   tags, providing more detail on mouse-over.  However, this presents usability problems -- the user has to understand to mouse over in the first place, and mouse-over only works for mouse-equipped environments, which fails outright for touchscreens (no character formatting at all, and no pop-up or additional detail available).
 * Overly busy, with visually indistinct text.
 * There is simply too much on one line. I'd suggest leaving out the inflected forms, as those are already included in inflection tables.
 * The formatting conventions are also a bit muddled. The romanization and the explanatory text should have contrasting formatting, in keeping with existing JA entry formats.
 * The furigana add visual busy-ness that makes things harder to read, especially for beginners or at smaller display sizes. Also, especially on the headword line, they're unnecessary -- we specify the reading in romaji right there, and the kanji breakdown is also given using.
 * Regarding the rationale, I confess I don't understand your point about the current headword format describing the spelling. The headword is the spelling, no?  Could you clarify?
 * → I recognize that the above might sound very negative. Please bear in mind that I am very appreciative that you are looking into this and striking up a discussion.  I am not opposed to changing our layout for JA entries: when we do so (and I'm hoping we will), I want to make sure that our changes improve usability.
 * Separately, I quite like the proposed pronunciation section format in your sample entry for まっとう. My concerns there are mainly formatting: the bolding is a bit distracting, the red for "Phonetic kana" confusingly suggests a redlink, the use of color in the katakana is also confusing, and the encircled numbers to show pitch accent pattern are really hard to see.  I also really like the table at right to show the kanji -- presumably this would only go on wago entries?
 * Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 20:14, 21 December 2018 (UTC)
 * Thanks for your prompt reply.
 * Yes, the abbreviations are confusing. What about “ • (harau) (infinitive, past ); consonant base, transitive” like the Latin entry format?
 * I'm also ok with “Group I conjugation”, but godan makes no sense outside the context of Japanese school grammar. (Why didn't the Japanese use terms like カ行子音動詞, イ段母音動詞, or エウ段(交替)母音動詞 (出づ)?)
 * The format proposed above is parallel to many languages, including Russian, Arabic, Korean, and Tibetan, all of which consist of (1) the headword with some phonological specification, (2) a dot linking to the transliteration page, (3) transliteration in parentheses and not italicized, and (4) key inflections helping to identify the conjugation class. As for furigana, I think it's ok to leave them out in the case of Japanese because of . Other languages are not so fortunate.
 * I think it's common sense among linguists that the language is to a great extent independent from the writing system. Currently the headword format includes two types of information: one describes the word (“godan conjugation, transitive”) and is spelling-independent, the other describes a particular spelling of the word (“shinjitai, kyūjitai X”). In database theory, such a design fails to meet the, resulting in duplication of information and.
 * sample entry for まっとう: the pronunciation section format is based on . The use of color in katakana indicates differences between 現代仮名遣い and pronunciation, although it was originally intended for cases like [コンニチワ]. Pronunciation rules like とう → トー are probably too trivial.
 * Again, thanks for your detailed reply! --Dine2016 (talk) 03:18, 22 December 2018 (UTC)


 * I like it.
 * infinitive and past forms: Why were these forms chosen in particular? (similarly, I've wondered why Daijirin sometimes lists the potential form.)
 * "cons": I'm more familiar with godan/ichidan. 🤷
 * : i blame browser developers for making abbr inaccessible 🤷 Perhaps they could also link to the Appendix page, like in your reply to Eirikr.
 * Unlike Eirikr, I don't think it's particularly busy. I like the idea of moving spelling details elsewhere.
 * —Suzukaze-c◇◇ 04:08, 22 December 2018 (UTC)
 * Thanks for your reply.
 * The “nonpast, infinitive, past” paradigm was inspired by dictionaries compiled by western missionaries such as 日葡辞書 and 和英語林集成, where verbs are listed under the infinitive followed by the nonpast and the past (e.g. kaki, u, aita).
 * As for 大辞林, I guess it's because there is no way to fit the formation of the potential form into 学校文法's “動詞活用形＋助詞／助動詞” system of verb conjugation. If we do a proper analysis of Japanese verb morphology, it's easy to see that the potential suffix is -e- after consonant-base verbs: kak- (to write) → kake- (can write). [Three etymologies of the suffix have been proposed: from the passive suffix -are- after consonant-base verbs, from the auxiliary use of the verb e- (得), or by analogy of transitive-intransitive pairs.] In 学校文法, there is no way to posit the -e- (終止形 -eru) as a 助動詞, and the formation of the potential form for consonant-base verbs is instead described as a change in conjugational class (五段→下一段), obscuring its similarity to the formation of the passive form.
 * The reason I prefer linguistic terms such as “consonant-base” over “godan” should be pretty clear by now :p That said, “Group I” is probably more neutral.


 * --Dine2016 (talk) 08:07, 22 December 2018 (UTC)

I suggest Japanese headings for verbs should introduce transitive-intransitive pairs

 * For example, 上がる should introduce 上げる. There should be a template argument for this. This is how it is supposed to look like:
 * 上がる (intransitive, godan conjugation, hiragana あがる, rōmaji agaru, transitive 上げる)

Huhu9001 (talk)
 * It seems that with the current code it is difficult to insert anything after the "rōmaji" piece. Then maybe we can go like this:
 * 上がる (intransitive, godan conjugation, transitive 上げる, hiragana あがる, rōmaji agaru)


 * The code can be inserted to.

Huhu9001 (talk) 16:59, 28 December 2018 (UTC)


 * That ordering presents a usability issue -- it's confusing, as there's a completely different verb form stuck in the middle of information about the lemma form. I'm also concerned that this line is getting increasingly crowded.  While I wholly support including this information somewhere prominent, I don't think it makes sense in your second example above (where it's inserted in the middle).  ‑‑ Eiríkr Útlendi │Tala við mig 01:32, 8 January 2019 (UTC)


 * To Eiríkr Útlendi: How about calling them active-middle voice pairs instead? -- Huhu9001 (talk) 07:06, 10 January 2019 (UTC)
 * ??? I'm fine with calling them "transitive - intransitive", and in fact I think I prefer that, since it's probably the more well-known terminology among our readers.
 * I'm sorry if I was confusing earlier -- I'm fine with the label, it's simply the layout that I think might be problematic. If we are to include the trans / intrans in the verb line (which is somewhat worrisome due to how crowded the line itself is getting), then the paired form should come at the end of the line, as in your "this is how it is supposed to look like" example.  I think your "then maybe we can go like this" example is very confusing, because we have five data points, with the first two about the headword, then something about an entirely different term, then two more about the headword.  It breaks things up strangely.  ‑‑ Eiríkr Útlendi │Tala við mig 17:34, 10 January 2019 (UTC)
 * I think it's better to show the transitive-intransitive pairs in the "related terms" or "see also" section. Such pairs are unpredictable and sometimes one-to-many like のる→のせる・のす【乗】, おこす→おきる・おこる【起】. --Dine2016 (talk) 12:23, 11 January 2019 (UTC)
 * If we're going to add different verb forms to headword lines, priority should be given to inflected forms rather than transitive/intransitive pairs. --Dine2016 (talk) 04:53, 16 January 2020 (UTC)

hiragana entry linking to katakana in headword line and vice versa
とら and マジ are now broken --Dine2016 (talk) 11:00, 26 January 2020 (UTC)
 * A side effect of new hira-kata matching in mod:ja. I guess it is fixed now. -- Huhu9001 (talk) 11:36, 26 January 2020 (UTC)
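
Hiragana and katakana matching of this kind usually relies on the fixed offset between the two Unicode blocks. A minimal Python sketch of the idea follows; the function names are invented here, and mod:ja may well do it differently.

```python
HIRA_START, HIRA_END = 0x3041, 0x3096  # ぁ .. ゖ
OFFSET = 0x60                          # distance from hiragana to katakana

def to_katakana(text):
    """Shift hiragana to katakana and leave every other character alone."""
    return "".join(
        chr(ord(ch) + OFFSET) if HIRA_START <= ord(ch) <= HIRA_END else ch
        for ch in text
    )

def same_kana(a, b):
    """Compare two strings while ignoring the hiragana/katakana difference."""
    return to_katakana(a) == to_katakana(b)
```

With this, a hiragana title such as とら and its katakana counterpart compare as equal, which is the sort of match that lets one entry link to the other.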

用ゐる, 出づ
I have a question. What are we going to do with these entries? -- Huhu9001 (talk) 12:41, 27 January 2020 (UTC)
 * Just aligning the furigana with the base characters should be enough. There are printed Classical Japanese texts with modern kana over historical kana, e.g. . --Dine2016 (talk) 16:10, 27 January 2020 (UTC)


 * for and, how about a special value such as  ? Put a note on the module that this applies for   only. For , I tried to create a section on the module for the shimo nidan value but it may not take into account one-kanji classical verbs such as.
 * Classical verbs like these should have a stem, firsthand past (-ki), and secondhand past (-keri). ～ POKéTalker（═◉═） 01:39, 1 April 2020 (UTC)
 * 1. What about 用(もち)ゐる (modern kana もちいる) instead?
 * 2. I think it's more helpful to show the stem (ren'yōkei) and the attributive (rentaikei) of classical verbs. The past forms are built directly on the stem and may further inflect, and don't express perfective (nu/tsu) or stative (eri/tari), so they're less helpful. --Nyarukoseijin (talk) 02:50, 1 April 2020 (UTC)

Longer processing time
After recent changes, Lua is timing out in Grease pit/2013/February from § Japanese headword-line templates down. At least it isn't happening in entries, but I wonder if there is a way to speed it up. — Eru·tuon 05:45, 7 February 2020 (UTC)
 * How is it now? -- Huhu9001 (talk) 05:59, 7 February 2020 (UTC)
 * Fixed! I don't know how moving  could do it, so maybe the page was timing out from an earlier edit in this module.... — Eru·tuon 06:18, 7 February 2020 (UTC)
 * No, it is not. I have some idea but I need to investigate it further before I can explain. -- Huhu9001 (talk)
 * Oh duh, you removed the ruby. That was what "fixed" it. — Eru·tuon 06:59, 7 February 2020 (UTC)
 * The ruby is back now. -- Huhu9001 (talk) 07:19, 7 February 2020 (UTC)
 * I have set a limit on the ruby algorithm. I think this would not happen again. -- Huhu9001 (talk) 08:51, 8 February 2020 (UTC)
 * Thanks for putting in a safeguard. — Eru·tuon 10:57, 8 February 2020 (UTC)
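
One common way to put a limit on a backtracking matcher is a step budget. The Python sketch below only illustrates that idea: the thread does not say how the module's limit works, and `align`, `is_kana` and `max_steps` are invented names.

```python
def is_kana(ch):
    return "\u3041" <= ch <= "\u3096" or "\u30A1" <= ch <= "\u30FC"

def align(text, reading, max_steps=1000):
    """Pair each character of `text` with a slice of `reading`.

    Returns None when no alignment exists or the step budget runs out.
    """
    steps = 0

    def walk(i, j):
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step budget exhausted")
        if i == len(text):
            return [] if j == len(reading) else None
        ch = text[i]
        if is_kana(ch):
            # Kana in the spelling must appear literally in the reading.
            if j < len(reading) and reading[j] == ch:
                rest = walk(i + 1, j + 1)
                return None if rest is None else [(ch, ch)] + rest
            return None
        # A kanji takes one or more kana; try the shortest slice first.
        for k in range(j + 1, len(reading) + 1):
            rest = walk(i + 1, k)
            if rest is not None:
                return [(ch, reading[j:k])] + rest
        return None

    try:
        return walk(0, 0)
    except RuntimeError:
        return None
```

Giving up early means a page with many headword calls degrades to unaligned furigana instead of timing out.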

Avoiding excessive abbreviation
Could I ask you to revert your addition of abbreviated forms in ? Abbreviations make the entry harder to read and harder to understand. Meanwhile, we have plenty of screen real estate, and the wiki is not paper-based (see #4), so there's no real reason to abbreviate when the full word would fit just fine. Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 22:23, 9 March 2020 (UTC)

romaji
The romaji of is incorrect. By the way, I don't think it's necessary to link to the romaji entry. According to Eirikr, the romaji entries are created for users who can't type Japanese characters. There isn't useful stuff on them. --Nyarukoseijin (talk) 04:52, 14 March 2020 (UTC)
 * There are some romaji link supporters around. In fact I did not try to restore it before I saw someone else requesting this. Removing romaji links may need to go to the Beer Parlour first. -- Huhu9001 (talk) 05:01, 14 March 2020 (UTC)
 * Thanks. I don't think it's necessary to link to the romaji entries from headword lines. Other languages don't do it, and doing so just complicates the logic of Module:ja-headword. Users only need romaji entries to land on the lemma entry. They don't need to go from the lemma entry to the romaji.
 * I have no objection to romaji in the morpheme template, but I think they should also be in the pronunciation section for reference. Otherwise, there is no way to show which romaji corresponds to which pronunciation in entries like . This is especially important for historical romaji from the Jesuit publications. --Nyarukoseijin (talk) 05:24, 14 March 2020 (UTC)
 * re: other languages don't...: Gothic has romanized entries and links in the headword line: . —Suzukaze-c◇◇ 17:37, 16 March 2020 (UTC)
 * Adding  to カナディアンフットボール doesn't fix other entries. What was the old implementation like? It seems like any input overrides the page title (speaking in terms of how the page title is used in the new implementation) if the page title is pure kana? —Suzukaze-c◇◇ 17:37, 16 March 2020 (UTC)
 * You mean entries like アメリカンフットボール? -- Huhu9001 (talk) 03:14, 17 March 2020 (UTC)

Please stop adding percent signs to headword templates
There are many ways to improve the accuracy of furigana matching (i.e. which kana go above which kanji), but manually adding percent signs to headword templates is not a good idea. First, soft-redirect templates expand the headword templates on a different page, and hardcoded percent signs may work on one page but fail on another. For example, once had, which worked on the three-kanji  page but caused an invisible module error on the two-kanji. Second, the same information can be deduced from the arguments to so adding percent signs is a waste of everyone's effort. (How many pages do you have to do that to?) Proposed action: --Nyarukoseijin (talk) 05:19, 10 April 2020 (UTC)
 * 1) Stop adding percent signs to headword templates: modify Module:ja-headword to stop accepting percent signs in its kana arguments, and fix any page containing them.
 * 2) Implement accurate furigana matching in a way that doesn't require changes to mainspace entries: For headword templates, transclude the entire page and analyze the arguments to the  to recreate such percent sign info. (In case of multiple s, look for the one matching the current reading.) Any failures should be patched in the module to keep mainspace entries clean and encourage reuse. The code should probably be exported so other templates that do furigana matching  can use it.


 * Counter-proposal / workaround:
 * In and other templates that transclude a whole page, preprocess by stripping out the   symbols.
 * This has the benefit of working now, without requiring that potentially tons of other pages be edited. I suspect this may also be less expensive than parsing for the  arguments.  ‑‑ Eiríkr Útlendi │Tala við mig 15:29, 10 April 2020 (UTC)
 * ✅. However, manually added percent signs still have the problem of only working on the lemma entry. See for example. --Nyarukoseijin (talk) 05:55, 11 April 2020 (UTC)
 * There is a problem with okurigana. E.g., has a t:ja-kanjitab with "書(か)" and "取(と)". き and り are completely missing. -- Huhu9001 (talk) 04:00, 11 April 2020 (UTC)
 * Then you instead of the headword template. Note that if the kanjitab doesn't match the reading, the headword template should simply generate 書取(かきとり) instead of throwing an error. The kanjitab is intended to ensure optimal result. --Nyarukoseijin (talk) 05:25, 11 April 2020 (UTC)
 * I take "ensure optimal result" as "manual separators are still needed". -- Huhu9001 (talk) 05:45, 11 April 2020 (UTC)
 * At some timepoint I thought you were pushing for the abolition of t:ja-kanjitab, so I have been trying to avoid doing anything related to this template. Have I misremembered? -- Huhu9001 (talk) 05:54, 11 April 2020 (UTC)
 * I proposed to replace with a template that does automatic morphological analysis. For example, given the title 科学小説, and the reading かがくしょうせつ (fetched by transcluding the whole entry), the template should be able to identify the four morphemes 科/か, 学/がく, 小/しょう and 説/せつ. Then the editor only needed to type  (which could be simplified to  if the reading was given as  ).
 * Such automation would not be possible before a large database of kanji readings like Module:ja/data/jouyou-yomi was built. So abolition of wouldn't happen very soon. But once it's abolished, you can use the database to do kanji-furigana matching directly, which avoids the need of transclusion. Also you will be able to avoid  and focus on real work. --Nyarukoseijin (talk) 08:51, 11 April 2020 (UTC)
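
The 科学小説 example above can be sketched as a small backtracking search over a reading table. This is a Python illustration under stated assumptions: the `READINGS` table is a made-up stand-in for a database such as Module:ja/data/jouyou-yomi, and `segment` is a hypothetical name.

```python
# Hypothetical reading table; real data would come from a kanji database.
READINGS = {
    "科": ["か"],
    "学": ["がく", "まな"],
    "小": ["しょう", "こ", "ちい"],
    "説": ["せつ", "と"],
}

def segment(title, reading, table=READINGS):
    """Split `reading` into one known reading per kanji of `title`.

    Returns a list of (kanji, reading) pairs, or None if nothing fits.
    """
    if not title:
        return [] if not reading else None
    head, tail = title[0], title[1:]
    for candidate in table.get(head, []):
        if reading.startswith(candidate):
            rest = segment(tail, reading[len(candidate):], table)
            if rest is not None:
                return [(head, candidate)] + rest
    return None

print(segment("科学小説", "かがくしょうせつ"))
```

A failed match returns None rather than raising, in line with the suggestion that a mismatch should fall back to unsegmented furigana instead of throwing an error. Sound changes such as rendaku are not handled here.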

extended shinjitai
I believe the current idea is to move shinjitai/kyujitai to ja-kanjitab. —Fish bowl (talk) 22:30, 31 March 2022 (UTC)
 * Better remove shinjitai and kyujitai from headword templates. Those are for inflected forms. -- Huhu9001 (talk) 02:49, 1 April 2022 (UTC)
 * I see, that makes sense. LittleWhole (talk) 23:47, 2 June 2022 (UTC)

Romanizing |hhira= with
Man I am still not convinced that this is a good idea. (I've thought about it a lot, but I can't remember the reasons well right now. some below:)


 * 1) loads of existing |hhira= for compound words without spacing; ugly
 * 2) same, but with sequences in compound words that should not be romanized as long vowels
 * 3) 歴史的仮名遣い is etymologically faithful but also formulated in the Meiji era
 * 4) Japanese people today read classical Japanese with modern pronunciation (従ふ shitagō)
 * 5)  -mode romanization is some arbitrary unsourced shit, same as "ancient" in ja-readings

—Fish bowl (talk) 04:38, 5 May 2022 (UTC)

FIXME 10 March 2023
Can you please explain why this (Special:Diff/71620155) is necessary? -- Huhu9001 (talk) 03:43, 12 March 2023 (UTC)
 * The addition of `no_redundant_head_cat` prevents entries from being added unnecessarily to Category:Japanese terms with redundant head parameter. See the Grease Pit discussion about this. The FIXME is just about better tracking of redundant `head=` parameters; it isn't pressing. Benwing2 (talk) 04:24, 12 March 2023 (UTC)
 * Damn typo. Benwing2 (talk) 04:25, 12 March 2023 (UTC)
 * Why should "head" be a list param? Or did you mean "hhira" and "hkata"? -- Huhu9001 (talk) 04:41, 12 March 2023 (UTC)
 * Almost all language headword modules support head2=, head3=, etc. because there may be some circumstances where this is needed, e.g. multiword terms where there may be more than one way of linking the terms, or various other complexities. It is more needed for languages with extra diacritics in the headwords (e.g. Russian, Latin) because there may be more than one way of adding the diacritics. Maybe it isn't needed for Japanese, although surely you need to have multiple hhira= and hkata= values occasionally? Benwing2 (talk) 04:50, 12 March 2023 (UTC)
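
What a "list param" means in practice: the numbered arguments (head=, head2=, head3=, and likewise for hhira= or hkata=) are gathered into one list. A minimal Python sketch of that convention follows; the helper name is invented, and this is not how Wiktionary's own parameter handling is written.

```python
def collect_list_param(args, name):
    """Gather name, name2, name3, ... from a template-argument mapping."""
    values = []
    if name in args:
        values.append(args[name])
    index = 2
    # Assumption: stop at the first missing number instead of skipping gaps.
    while f"{name}{index}" in args:
        values.append(args[f"{name}{index}"])
        index += 1
    return values
```

Called with `name="hhira"`, it would return every historical hiragana value given for the entry, in order.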

|head= with kana does not have links anymore
as with クリアファイル – Category:Japanese terms with redundant head parameter —Fish bowl (talk) 09:16, 3 June 2023 (UTC)


 * special:diff/73299076 I thought this is much simpler so I made it this way. Do you want to change it back? -- Huhu9001 (talk) 13:46, 3 June 2023 (UTC)


 * That seems fine (the reduced redundancy is nice), as long as the existing entries can be fixed. —Fish bowl (talk) 23:53, 3 June 2023 (UTC)
 * I will try to work out a script to fix them. -- Huhu9001 (talk) 01:20, 4 June 2023 (UTC)

Verb conjugation is no longer displayed on kana page titles
Something within https://en.wiktionary.org/w/index.php?title=Module:Jpan-headword&diff=75379557&oldid=75165520 changed this; compare and. —Fish bowl (talk) 03:58, 14 August 2023 (UTC)


 * Fixed in ? —Fish bowl (talk) 18:44, 1 September 2023 (UTC)


 * Broken again. —Fish bowl (talk) 11:23, 24 December 2023 (UTC)

Hrkt-translit is called on all ja-see parameters
or something.

... is this necessary?

example: (Sandbox) triggers new tracking for attempts to romanize kanji (Hrkt-translit), because this line (Jpan-headword) tries to romanize the " readings "   and.

—Fish bowl (talk) 00:21, 10 December 2023 (UTC)
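
If the romanization call is kept, one possible guard is to skip any value that still contains kanji before passing it on. The Python sketch below is only an illustration of that idea: the code point ranges are an assumption covering common kanji, not the definition the tracking code uses, and both function names are invented.

```python
import re

# Assumed ranges: CJK Unified Ideographs, Extension A, and the iteration mark 々.
HAN = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF\u3005]")

def contains_kanji(text):
    return HAN.search(text) is not None

def romanizable(values):
    """Keep only the values that are safe to hand to a kana romanizer."""
    return [value for value in values if not contains_kanji(value)]
```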