User talk:Benwing2

Catalan inflections
Hi Ben, any chance we could have automatic Catalan inflections? There's User:DTLHS/catalan bot requests, but it doesn't seem to be running very often, and it's tedious to add manually to a list. Jberkel 18:12, 11 December 2023 (UTC)
 * @Jberkel Yeah I have looked into this. The thing is that I'd probably have to rewrite Module:ca-verb to work like Module:es-verb or Module:pt-verb. The Spanish, Portuguese and Galician modules were all written mostly by me and implement JSON fetching of the inflections as well as es-verb form of and similar to automatically fetch the correct inflections for a given verb form. The former wouldn't be too hard to add to the existing module but the latter would be painful, and it would probably be better to rewrite the module instead. I have looked into doing this but I don't have that good a handle on Catalan verbs, esp. those in -er/-re. Do you have any good references that explain how Catalan verbs work, especially focusing on the -er/-re verbs, which is where the irregularities seem to be? The current module seems to push a lot of the complexity down into the template call, e.g. veure's invocation looks like this, which is a mess:

I'd want to have this stuff all in the module itself, similarly to what's being done for Spanish, Portuguese, Italian, French, etc. Benwing2 (talk) 23:15, 11 December 2023 (UTC)


 * Ok, thanks for looking into it, I sent you some reference material via email. Jberkel 09:01, 12 December 2023 (UTC)
 * @Jberkel Thanks, I received it. Benwing2 (talk) 21:12, 12 December 2023 (UTC)
 * @Jberkel I have a question, not sure if you know the answer. In -ar verbs whose root vowel is e or o, is that vowel pronounced è or é (or ò or ó for roots in o) in root-stressed forms (e.g. first-singular present indicative), or does it vary from verb to verb? In Proto-Romance it varied from verb to verb, and this is still the case in modern Italian. Spanish has a reflex of that in verbs that unexpectedly have ie or ue in root-stressed forms, but Portuguese has regularized the vowel quality (for example, using low-mid vowels in -ar verbs). I think in conservative varieties of Occitan at least, it varies from verb to verb, and this is reflected in the spelling. Benwing2 (talk) 08:41, 13 December 2023 (UTC)
 * Pinging from ca.wikt. Ultimateria (talk) 23:28, 13 December 2023 (UTC)
 * @Vriullop @Ultimateria It appears that it varies from verb to verb in Catalan, at least based on the two verbs pegar, which ca.wikt says has /ɛ/ in Central Catalan (consistent with its origin from Latin short ĭ), and membrar, which ca.wikt says has /e/ in Central Catalan (again, consistent with its origin from Latin short ĕ). But the situation is complicated by the dialects, where many dialects have /e/ for both verbs. I'm interested in finding a dictionary that indicates these vowel qualities so that maybe we can include them in the conjugation table, similarly to how the French and Italian conjugation tables give pronunciation; this would only be for Central Catalan for now (maybe forever), since the dialects are complicated. Benwing2 (talk) 00:19, 14 December 2023 (UTC)
 * BTW if what I've said is correct, where can I find in Catalan dictionaries the indication of how the stressed vowel is pronounced for a given verb? Benwing2 (talk) 05:32, 14 December 2023 (UTC)
 * For variation in dialects see the notation used with ca-IPA: ê for /ɛ/ in Central, /e/ in Valencian and /ə/ in Balearic. Similarly with ô, and è, é, ò, ó have no variation. This is fairly consistent, with few exceptions.
 * It is etymological, ê from Latin ĭ or ē, but with some exceptions.
 * The only dictionary that indicates the rhizotonic stress is the DNV; for example, membrar says é, but it is only for Valencian and it could be either ê or é. It is only helpful for è and ò. I have not found any other source systematically indicating the rhizotonic stress; even the pronunciation dictionary on my bookshelf only includes some paradigmatic verbs. Frankly, there are some verbs whose pronunciation I don't know, apart from my personal perception, which is not a good sample. The only clues are a noun related to the verb, and the etymology of inherited ones. On ca.wikt I include a rhizotonic parameter verb by verb with ca-IPA notation. Vriullop (talk) 09:25, 14 December 2023 (UTC)
 * @Vriullop Thank you! I wonder why Catalan dictionaries are so bad at including the rhizotonic vowel quality patterns. Pretty much all monolingual Italian dictionaries list the rhizotonic quality (and position) for all verbs. What about the pronunciation of other forms, such as verbs with pres 3s in -ou or -eu? Are there any dictionaries indicating the vowel quality of these and other endings? Thanks for any help you can give. Benwing2 (talk) 09:57, 14 December 2023 (UTC)
 * I'm not sure what you mean; 'mou' from 'moure' and 'veu' from 'veure' have the same stress as the infinitive.
 * Endings that may be ambiguous, without any graphic accent:
 * -em, -eu, as in cantem, canteu, cantarem, cantareu: ê
 * -essis, -essin, as in cantessis, cantessin: é
 * -eres, -eren, as in temeres, temeren: é
 * infix -eix- (-eixo, -eixes, -eix, -eixen, -eixi, -eixis, -eixin): ê, but it is not used in Valencian, which changes it to -ix-
 * This is a summary from different sources, coherent with the etymology. Vriullop (talk) 12:38, 14 December 2023 (UTC)
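The list of ambiguous endings above can be sketched as a small lookup table. This is only an illustration in Python, not the code of Module:ca-verb or Module:ca-IPA; the names `ENDING_VOWEL` and `ending_vowel` are hypothetical, and the function assumes the caller already knows that the string is a verb form in which the ending is stressed (it would wrongly match a root form such as 'veu').

```python
# Stressed-vowel notation for endings that carry no graphic accent,
# as summarized in the list above (ca-IPA notation: ê, é).
ENDING_VOWEL = {
    "em": "ê", "eu": "ê",            # cantem, canteu, cantarem, cantareu
    "essis": "é", "essin": "é",      # cantessis, cantessin
    "eres": "é", "eren": "é",        # temeres, temeren
    # infix -eix- (not used in Valencian, which has -ix- instead)
    "eixo": "ê", "eixes": "ê", "eix": "ê", "eixen": "ê",
    "eixi": "ê", "eixis": "ê", "eixin": "ê",
}


def ending_vowel(form):
    """Return the notation for the stressed e of the ending, or None.

    Hypothetical helper: tries the longest suffix first and does not
    verify that the suffix really is an inflectional ending.
    """
    for suffix in sorted(ENDING_VOWEL, key=len, reverse=True):
        if form.endswith(suffix):
            return ENDING_VOWEL[suffix]
    return None
```

For instance, `ending_vowel("cantem")` looks up `-em`, while a form such as `canta` matches no entry and yields `None`.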
 * @Vriullop OK thanks, I suppose that the DCVB dictionary gives the infinitive pronunciation of words like moure. This is very helpful; if I have other questions I'll let you know. Benwing2 (talk) 19:55, 14 December 2023 (UTC)
 * DCVB is fine for pronunciation, but in some cases it is incomplete or confusing. If necessary, you can compare it with the GDLC in the link "francès" that includes translation ca-fr and also pronunciation in Central Catalan, and the DNV for Valencian. Vriullop (talk) 20:57, 14 December 2023 (UTC)
 * @Vriullop Thanks! Benwing2 (talk) 21:41, 14 December 2023 (UTC)
 * @Jberkel I wrote a preliminary Catalan conjugation module; see User:Benwing2/test-ca-conj for examples. It has a few bugs in it that I'm working out, but it's close. Benwing2 (talk) 22:13, 17 December 2023 (UTC)
 * Already looking good, thanks for working on this! Jberkel 22:26, 17 December 2023 (UTC)

Pronunciation of feu is correct, 2nd pl. regular with -eu, and the irregular past was spelled in pre-2016 orthography which is more helpful.

The pattern /e/ in Central and /ɛ/ in Valencian is possible, but rare. It can appear for different reasons: The DCVB indicates these local details. In this case I trust the GDLC more. The DCVB comes from fieldwork in the 1920s. Some of the pronunciations have not been registered in other late 20th c. fieldwork. The GDLC compiles the pronunciation of the main reference work used for radio and TV speakers in Central formal speech. In short, this pattern is rare in formal pronunciation. As far as I can remember, it doesn't happen with verb forms, and it can be treated like other irregular cases that do not follow an expected pattern. --Vriullop (talk) 18:00, 19 December 2023 (UTC)
 * Pronunciation of stressed e is not as uniform in Central Catalan as in other dialects. For example, some words can be /e/ in Barcelona and /ɛ/ in Girona or vice versa. In general, one of the two is considered formal and the other local or dialectal. The formal one is usually the expected one or the same as in Valencian and Balearic.
 * Recent loanwords may have hesitations in their adaptation. They are usually adapted with è, but with é for the Spanish ones.


 * Although the /e/-/ɛ/ pattern above is rare, the other way is more common: /ɛ/ in Central and Balearic, /e/ in Valencian. This is noted on cawikt as ë (double e), a variant of ê (triple e). Stressed schwa in Balearic is used in inherited words and inflections. In cultisms or loanwords (i.e. ), or just words perceived as literary (i.e. ), instead of schwa it is /ɛ/ as in Central. There are indeed verb forms with rhizotonic vowel ë. There is no equivalent with stressed o, but for consistency it could be noted ö (double o) instead of ô. Vriullop (talk) 08:02, 21 December 2023 (UTC)
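To keep the notation straight, the values of the stressed mid-vowel symbols discussed so far can be written down as a table. This is a minimal Python sketch, not the actual data structure of Module:ca-IPA; the names `MID_VOWEL` and `realize` are hypothetical. The rows for ê and ë follow the descriptions above; the row for ô is an assumption inferred from "similarly with ô" and from the remark that stressed o has no equivalent of the ê/ë split.

```python
# Per-dialect realization of stressed mid vowels in ca-IPA notation.
MID_VOWEL = {
    "é": {"central": "e", "balearic": "e", "valencian": "e"},
    "è": {"central": "ɛ", "balearic": "ɛ", "valencian": "ɛ"},
    # ê: open in Central, stressed schwa in Balearic, close in Valencian
    "ê": {"central": "ɛ", "balearic": "ə", "valencian": "e"},
    # ë: like ê, but Balearic has /ɛ/ instead of schwa
    "ë": {"central": "ɛ", "balearic": "ɛ", "valencian": "e"},
    "ó": {"central": "o", "balearic": "o", "valencian": "o"},
    "ò": {"central": "ɔ", "balearic": "ɔ", "valencian": "ɔ"},
    # ô: ASSUMPTION, open in Central and Balearic, close in Valencian
    "ô": {"central": "ɔ", "balearic": "ɔ", "valencian": "o"},
}


def realize(symbol, dialect):
    """Return the IPA vowel for a notation symbol in a given dialect."""
    return MID_VOWEL[symbol][dialect]
```

With this table, the only difference between ê and ë is the Balearic column, which is why inherited words and cultisms need different symbols.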
 * Thanks for all your help. I have implemented ë in Module:ca-IPA. Can you help me by fixing the default rules in the module that currently default to ê to instead default to ë when it's correct? For example, cens defaults to cêns when it should be cëns. This is in the mid_vowel_e function of Module:ca-IPA. I don't know Catalan well enough to fix it myself, and the corresponding cawikt module in ca:Module:ca-pron/AFI seems to have the same rules we currently have. Benwing2 (talk) 20:49, 21 December 2023 (UTC)
 * As stressed schwa depends on inherited vs. cultism, there is too much variation with -ens, -ena, -enes endings to be able to redefine the rule. I have added tracking and checked where it was being applied by default. After adding the hint ê or ë, I think it is safer to remove this rule: Special:WhatLinksHere/Template:tracking/ca-IPA/ens-ena-enes. Later, I'll look at other rules with default ê. Vriullop (talk) 09:19, 22 December 2023 (UTC)
 * @Vriullop Thank you. I agree about removing the rule. In general I'm not much in favor of rules like this that are wrong a significant fraction of the time, and prefer to be explicit except when it's nearly completely predictable. Benwing2 (talk) 11:06, 22 December 2023 (UTC)
 * @Vriullop I just discovered that cerndre is irregularly missing the first r in pronunciation. Does this carry through to inflected forms like cerno, cerns or are they pronounced regularly with /r/? Benwing2 (talk) 03:05, 24 December 2023 (UTC)
 * BTW there is a bug in cawikt's handling of Balearic pronunciation with ; hard /k/ shows up as /c/ in the first of two alternants. See cerca for an example. Benwing2 (talk) 03:08, 24 December 2023 (UTC)
 * @Vriullop OK, I have several more questions. I'll try to list them all here and avoid pinging you individually.
 * cors "privateering campaign" and cors "Corsican" are given without the /r/ in Eastern Catalan pronunciation both here and in cawikt. However, GDLC says /kórs/ for the former and /kɔ́rs/ for the latter. Which is correct, and if the /r/ is correct, do we need to update Module:ca-IPA?
 * I am going through mid-vowel verbs trying to update the inflected forms to have the correct vowels. I am probably going to implement something soon in ca-conj and/or ca-verb to let you specify the mid-vowel quality and display it, similar to what cawikt does. I cannot determine the vowel quality of the following verbs so far: cessar, conrar, copar, copsar, crepar, dopar, drenar, gestar. Can you help?
 * I am going to update Module:ca-IPA so you can individually specify the pronunciation of different dialects, as I have found some need for this. Apropos of this, I notice that the cawikt version of ca-pron supports ; do you think we should support this, or just use the per-dialect support I am going to add?
 * Also, I'm more and more convinced that we should have few default rules for mid-vowel quality, and require it to be given explicitly in all cases that don't involve a well-known affix.
 * fossa "pit, grave, etc.": does it have /o/ [per GDLC] or /ɔ/ [per DNV, DCVB and cawikt]?
 * llei "law": does it have /e/ or /ɛ/ in Eastern Catalan, or some complex mixture? cawikt says /ɛ/, GDLC says /e/, DCVB says a complex mixture.
 * Thanks for your help, Benwing2 (talk) 06:41, 24 December 2023 (UTC)
 * Lot of stuff here, but I'm happy to help.
 * 'Cerndre' loses its first r when followed by the sequence -ndr-, that is, in the infinitive, future and conditional. All other forms have regular pronunciation. This happens also with and derived verbs. See ca:Categoria:Rimes en català -ɛndɾe including 14 verbs ending with -prendre. The sequence -rndr- only occurs in 'cerndre' and there is no other term with the sequence -rendr- other than these 14 verbs.
 * /c/ in Mallorcan is an allophone of /k/, i.e. local pronunciation [məˈʎɔ̞ɾ.ca̟]. You're right, this is phonetic and not phonemic. Catalan works often include some phonetic symbols in phonemic representations for dialectal contrast, but this is not the case of [c] with restricted use. I plan to remove it for being misleading.
 * 'Cors' fixed on cawikt. This r is really retained, respelled 'corrs'. The module should not assume the loss of -r(s) in final coda for monosyllables. While most polysyllables lose it, most monosyllables don't. The problem is how to manage that.
 * My guess on rhizotonic vowels:
 * cessar: é; inherited from Latin ě not followed by an opening context, and DNV é.
 * conrar: ó; from unstressed o, reduction of conrear, DNV ó.
 * copar: ó; from French /u/ and analogous to noun, DNV ó.
 * copsar: ó; inherited from Latin ǔ, DNV ó.
 * crepar etym 1: ë; as noun from the same French root, neologism not attested in Balearic, DNV é.
 * crepar etym 2: é; from Latin ě, only used in Balearic.
 * dopar: ó; neologism as in Spanish, close to the English original, DNV ó.
 * drenar é; idem.
 * gestar: é; from Latin ě, as the noun from the same root, DNV é.
 * Notation ẽ is hardly used. It is better to fix that with parameters per-dialect: ca:Special:Diff/2245937. I'll remove it on cawikt.
 * Some rules for mid-vowels are theoretically justified. I have this pending to review the unwanted side effects. I agree that it shouldn't lead to erroneous results.
 * Fossa should be ò from Latin ǒ, but there have been some modern changes during the 20th c. that I am still unable to explain. The DCVB shows the situation in the first third of the 20th c. in accordance with etymology. Central today is probably hesitant. In this case, I would say ó in Central and ò in Balearic and Valencian, two more conservative dialects.
 * Llei fixed on cawikt. From Latin ē it should be ê, but the diphthong has changed it: é in most Central, retained è in northern Central, /ə/ in Balearic, é in Valencian.
 * Vriullop (talk) 18:27, 26 December 2023 (UTC)
 * @Vriullop Thank you! I have applied the changes offline to the specific verbs and other words mentioned above, and I will push them soon. Still working on Module:ca-IPA. A few more questions:
 * More verbs where I'm not sure of the rhizotonic vowel quality: menar "to lead" (is this ê?), menjar "to eat" (apparently it uses now-deprecated ẽ?), mentir "to lie" (?), molar "to mock" (from Spanish; ó?).
 * mesa "altar, mense, table": cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/. Mistake?
 * messes "harvest time": again, cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/.
 * Benwing2 (talk) 05:46, 27 December 2023 (UTC)
 * 'menar': ê.
 * 'menjar': é but Balearic ə. I'll modify the rizo parameter to accept an explicit /e/, /ə/, only used here.
 * 'mentir': é in forms without -eix-.
 * 'molar', to rock, from Spanish: ó.
 * 'mesa' as a noun has two etyms with different pronunciations, but GDLC only shows one in translations. Here DCVB is correct.
 * 'messes', I would say é but irregular è in Central.
 * Vriullop (talk) 09:47, 27 December 2023 (UTC)
 * @Vriullop Thanks for your quick response! I have made the offline updates. Some more questions (for N and O) ...
 * noble: I already pinged you about this. DNV says /o/ for Valencian but DCVB says /ɔ/.
 * nombre: cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
 * odre: Same. cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
 * ofi "office": Vowel quality? Maybe /o/ since the o is unstressed in oficina?
 * oi: DCVB splits the interjection into /ɔj/ "yes" from Latin hoc and /oj/ (expression of pain or surprise). GDLC and DNV group these two meanings and say the pronun for both is /ɔj/. Who is right?
 * orla "border, fringe": DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/.
 * oro "suit in a Spanish deck of cards": Same as previous: DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/. (Not in GDLC.)
 * Benwing2 (talk) 01:03, 28 December 2023 (UTC)
 * For P:
 * peli "film" (clipping of pel·lícula): cawikt says pel·li has ê, so I assume this is the same, but it seems strange to have ê for a recent coinage.
 * perca "perch (fish)": cawikt says /ɛ/ for Valencian but DNV says /e/. DCVB doesn't give a pronunciation.
 * pesta "plague": cawikt and DCVB say /ɛ/ for Central but GDLC says /e/ (mistake?).
 * pleca "vertical bar": Balearic vowel? Is it ê?
 * poblar "to populate": DNV says stressed vowel is /o/ despite poble having /ɔ/. Mistake?
 * porro "leek; spliff": cawikt and DCVB say /ɔ/ but both GDLC and DNV say /o/.
 * posa "pose" (not in cawikt): GDLC says /o/ despite this being derived from posar, which has /ɔ/. (Are there two different pronuns/etyms here?)
 * postres "dessert": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
 * pregar "to pray": Presumably /e/ (same as prec)?
 * Benwing2 (talk) 05:23, 28 December 2023 (UTC)
 * For P:
 * 'peli' is an informal spelling of 'pel·li'. The latter is used in the press and has been consolidated, unlike other clippings. I spontaneously pronounce it è just like any word beginning with consonant + stressed e + l, including inherited ones from Latin both ě and ē. Being of general use and not exclusively colloquial, I would say ê, fully adapted in Central and the same value as unstressed in Balearic and Valencian.
 * 'perca': ë. Expected é but è per context C+ě+r, not fully changed in learned borrowings.
 * 'pesta' is weird, expected é but with some irregular è not sufficiently explained in context C+ě+s. From the sources, è but irregular é in Central, although the irregularity is the other way around.
 * 'pleca': ë, as a technical word, schwa is improbable in Balearic.
 * 'poblar': I can't find any explanation for the difference between 'poble' and 'pobla'. Without any confirmation, for now I would say ò.
 * 'porro': ó. Expected ò but usually changes to ó before -rr-.
 * 'posa': noun ó and verb ò. Expected ò both from 'pausa' and 'pausare', but most current senses of the noun are calques of French or Spanish, both ó.
 * 'pregar': é.
 * Vriullop (talk) 13:30, 29 December 2023 (UTC)
 * On cawikt the pronunciation was first added according to DCVB. Revision with GDLC is partial, not completed. Inclusion of pronunciation on DNV is recent, not yet checked. Your guesses are usually correct.
 * For N and O:
 * 'noble': ô. Expected ó, on first syllable changed to ò per consonant context, except on areas with Mozarabic influence as in Valencian.
 * 'nombre': ô. The same case, but I trust DCVB for Balearic with irregular ó.
 * 'odre': ô, but Balearic ó.
 * 'ofi', I've never heard it in Catalan. My guess is ó either from an unstressed vowel or from Spanish.
 * 'oi' both ò and ó. I trust DCVB with three groups, the last one used especially in Balearic. The two authors of the DCVB were Balearic, and both 'oi las' (surprise) and 'ois' (moans) sound familiar to me from hearing Balearic people. Probably outside the Balearic Islands people don't care about the difference with barely used senses.
 * 'orla': ô. Again, an expected ó changed to ò except in Valencian, confirmed in descriptive works.
 * 'oro': ô, hesitant by analogy with inherited 'or'.
 * Vriullop (talk) 15:32, 28 December 2023 (UTC)

For R: Benwing2 (talk) 08:37, 28 December 2023 (UTC)
 * 1) reble "rubble": cawikt and DCVB say ê, but GDLC says /e/ for Central Catalan.
 * 2) recar "to regret": DNV says /e/; DCVB suggests /e/ everywhere, is that right?
 * 3) regar "to water": Etymologically should be ê, is that right? (OTOH reg has /e/ everywhere per GDLC and DNV)
 * 4) regna, regne, regnar: These seem to have [ŋn]. Do all words in -gn- have this? If so we should fix Module:ca-IPA to do this automatically. (Is this Eastern Catalan only? Valencian seems to have [gn].)
 * 5) reptar: In the meaning "to reprimand; to challenge" it seems to have rhizotonic /e/. In the meaning "to crawl" I am not sure.
 * 6) resar "to pray": Since this is a Spanish borrowing, does it have /e/? res "prayer" seems to have /e/.
 * 7) retre "to give back, to return": cawikt and DCVB say /e/ in Eastern Catalan but GDLC says /ɛ/.
 * 8) rosca "screw thread": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
 * 9) rosta "fried bacon, fried bread": cawikt says /ɔ/ for both Eastern and Western; DNV says /ɔ/ for Valencian but GDLC says /o/. DCVB has /ɔ/ and /o/ dialectally.
 * 10) rosta (feminine of rost "steep"): Same. cawikt says /ɔ/ for all, DNV says /ɔ/ but GDLC says /o/. Here, DCVB has only /ɔ/.
 * 11) rotar: Two etyms: (1) "to belch": Does it have /o/ like rot "belch"? (2) "to rotate": Does it have /o/ because it's borrowed from Spanish?
 * 12) rotllo: "roll; annoyance": DNV says it has /o/ but rotlle has /ɔ/. Mistake? cawikt and DCVB say forms have /ɔ/ everywhere, and GDLC agrees that both forms have /ɔ/ in Central Catalan. Note also rotlo, where again DNV has /o/; here again, DCVB says /ɔ/ everywhere but in this case cawikt uses ô to get /o/ in Valencian.


 * @Vriullop Thanks again for your detailed responses, I really appreciate the work you're putting into the responses. Issues I found involving terms with S:
 * seca "mint": GDLC says /ɛ/, DNV says /e/ and cawikt says ê, which are all compatible, but DCVB says /ɛ/ everywhere. In this case I wonder if DCVB is actually correct while both DNV and cawikt are mistaken.
 * sedar "to sedate": DNV says /e/ for root vowel but unknown in Central Catalan.
 * sense "without": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected /ɛ/.
 * sentir "to feel": DNV says /e/ root vowel. No dictionary attests the Central Catalan root quality, although /e/ is expected.
 * serva "serviceberry": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected /ɛ/.
 * setge "siege; figwort": cawikt and DCVB say é, DNV says /e/ but GDLC says /ɛ/.
 * soga "rope": DNV and GDLC both say /ɔ/ but DCVB says variously /o/ or /ɔ/ for a bunch of obscure places that I'm not familiar with but seem mostly Northwest Catalonian. I assume Balearic must have /ɔ/ but not sure.
 * sonso "clumsy, gauche": cawikt and DCVB say /o/ for both East and West; DNV agrees with /o/, but GDLC says /ɔ/ for Central Catalan. Maybe this is a case of changing over the last century?
 * sorna "sarcasm": cawikt says ô, but both DNV and GDLC say /o/. DCVB doesn't give pronun.
 * sosa "saltwort, soda ash": cawikt and DCVB say ô, but both DNV and GDLC say /o/.
 * sostre "ceiling": cawikt says ó and DNV says /o/, but GDLC says /ɔ/. DCVB maybe has the real story: /ɔ/ in Barcelona, /o/ elsewhere. I'm going with the idea that Western Catalan (Northwestern and Valencian) have /o/, while Central has /ɔ/ and Balearic has /o/. Correct?
 * sotjar "to spy on": DNV says /o/ root vowel. No dictionary attests the Central Catalan root quality, but I am guessing /o/ based on the proposed etymologies. Correct?
 * Note that I'm now 87% through the set of 2,722 terms that I identified for auditing the mid-vowel quality, and have finished with S. T represents about 7% of the total, V represents 4-5%, and the remaining letters around 1%. So I'm quite close to finishing, with lots of help from you :) ... Benwing2 (talk) 09:20, 30 December 2023 (UTC)
 * For S:
 * seca: I think the correct one is ë, although I'm not sure about its evolution from Arabic.
 * sedar: expected ê from Latin sēdō.
 * sense and sens: expected ê, but such words often used as proclitics tend to become closed. So é but schwa in Balearic.
 * sentir: é as expected.
 * serva: ê is correct. As in other similar cases, the GDLC does not distinguish properly different pronunciations from different etyms.
 * setge: expected é, but è in Central per context subject to openness.
 * soga: ò in general. It was identified by Coromines in a handful of about 40 words that have changed an etymological ó to ò except in some specific areas. It is known as the Coromines law, and it is still unknown why it includes certain words and not others.
 * sonso: ó but ò in Central, for a reason unknown to me.
 * sorna: ó in general.
 * sosa: ó in general.
 * sostre: it is one of the Coromines law words, with expected ó changed to ò. This law may have various degrees of extension. Probably the most conservative areas, Balearic and Valencian, maintain the old ó, while most of Central has changed to ò. Usually Northwestern also changes by attraction to Central, to be confirmed.
 * sotjar: not sure, but ó is the best guess.
 * Vriullop (talk) 08:31, 5 January 2024 (UTC)
 * For R:
 * reble: expected é. The DCVB's ê seems to be by analogy with other words. I would say é but with an irregular ə in Balearic.
 * recar: é as expected from an earlier 'a'.
 * regar: ê as expected. Nouns 'rec' and 'reg' are interrelated and are not a good indicator for the verb.
 * All -gn- between vowels are pronounced [ŋn]. Also -n- followed by /k/ or /ɡ/, but this one was reverted as not phonemic.
 * reptar: é from Latin rěp(u)tō and ê from rēptō.
 * resar: é as noun 'res'.
 * retre: I really don't know which process applies here. For now I'd say ë, pending confirmation.
 * rosca: ô.
 * rosta, as a slice of bacon usually fried with bread, is a typical dish of the Pyrenees. Although it is the feminine form of 'rost', from the old sense "roasted", in the Pyrenees this ò usually changes to ó. In the DCVB, I read that the northernmost localities say ó, and ò is found quite far from the Pyrenees. In short, as a noun ó in Central, ò in Valencian and Balearic. As an adjective form: ò, although the GDLC does not separate it properly.
 * rotar: ó for both etyms.
 * rotllo, what a mess! It is not attested in Valencian until recent times, probably from Spanish . This ó is archaic, not accepted in other areas where it is used from Old Catalan. 'Rotlle' is the inherited form, hardly used in Valencian, where the spelling 'rotle' is preferred, both ò. 'Rotlo' is only used in Balearic; for me, how outsiders try to pronounce it, with a range of alternative spellings, is anecdotal.
 * Vriullop (talk) 11:29, 4 January 2024 (UTC)
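The -gn- rule stated in the reply above (intervocalic -gn- pronounced [ŋn]) is simple enough to sketch as a single substitution. This is a hedged Python illustration, not what Module:ca-IPA does; the function name `nasalize_gn` and the vowel inventory are assumptions, the input is taken to be a lowercase spelling, and the output is a mixed spelling/IPA string used only to show where the rule applies.

```python
import re

# Assumed set of Catalan vowel letters (lowercase only).
VOWELS = "aeiouàèéíïòóúü"

# -gn- with a vowel letter on both sides; lookarounds keep the vowels intact.
GN_BETWEEN_VOWELS = re.compile(rf"(?<=[{VOWELS}])gn(?=[{VOWELS}])")


def nasalize_gn(word):
    """Replace intervocalic 'gn' with 'ŋn'; leave other 'gn' untouched."""
    return GN_BETWEEN_VOWELS.sub("ŋn", word)
```

Whether Valencian should be excluded from this rule is left open here, as it is in the discussion.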
 * @Vriullop Thank you again! BTW I have gone through and added (offline) stressed root vowels to all enwikt Catalan verbs with e or o where I could determine it, using some combination of cawikt, DNV, GDLC and DCVB. (It looks like I was able to figure out the vowel for 1,174 verbs in -ar, 33 verbs in -ir and all relevant verbs in -re and -er, and only couldn't figure out the vowel for 72 verbs in -ar and 2 verbs in -ir.) I am mostly done coding the changes I want to make to Module:ca-IPA and I'll use the new code to support displaying the root vowel info. I'll post the list of undetermined verbs soon. Benwing2 (talk) 19:55, 4 January 2024 (UTC)
 * BTW I have finished the changes to Module:ca-IPA and Module:ca-headword and pushed all the root vowel additions. You can see them in action e.g. in flirtejar, besar, adreçar, annexar and several others. Benwing2 (talk) 07:45, 5 January 2024 (UTC)
 * Also, I added tracking for all terms with defaulted mid vowel quality, with the plan of removing some of the defaults. The first word I looked at, for example, is amulet, a recent borrowing that claims to have ê, which seems unlikely. Benwing2 (talk) 08:07, 5 January 2024 (UTC)
 * Here is the list of now 68 -ar verbs where I couldn't identify the Central Catalan root vowel (sometimes only in one etymology out of several): afogar, agregar, al·legar, alterar, amonestar, ancorar, atemptar, celebrar, col·laborar, commemorar, compensar, condensar, confessar, congregar, conrear, contemplar, crebar, delegar, denegar, depredar, desagregar, desintegrar, deteriorar, devorar, discrepar, dreçar, dropar, edulcorar, elaborar, elevar, encetar, engegar, enllumenar, ennuegar, ensopegar, entaforar, entollar, entrenar, esborrar, esbotzar, esmicolar, espitregar, esverar, evaporar, exacerbar, expectorar, explorar, gofrar, impetrar, increpar, integrar, interpretar, isolar, laborar, negar, perforar, prolongar, rememorar, retolar, rosegar, secretar, segregar, somorgollar, temptar, tomar, trafegar, trepar, trepollar. Benwing2 (talk) 08:12, 5 January 2024 (UTC)
 * In some cases I can't be completely sure, these are my best guesses: afogar ó, agregar é, al·legar ê, alterar é, amonestar é, ancorar ó, atemptar é, celebrar é, col·laborar ó, commemorar ô, compensar ê, condensar ê, confessar é, congregar é, conrear ë, contemplar é, crebar é, delegar é, denegar é, depredar é, desagregar é, desintegrar é, deteriorar ó, devorar ô, discrepar é, dreçar ë, dropar ó, edulcorar ô, elaborar ó, elevar é, encetar é, engegar é, enllumenar ê, ennuegar ë, ensopegar ê, entaforar ó, entollar ò (both), entrenar é, esborrar ó, esbotzar ó, esmicolar ô, espitregar ë, esverar é, evaporar ó, exacerbar é, expectorar ó, explorar ó, gofrar ó, impetrar é, increpar é, integrar é, interpretar é, isolar ô, laborar ó, negar é (both), perforar ó, prolongar ó, rememorar ó, retolar ó, rosegar ê, secretar ë, segregar é, somorgollar ó, temptar é, tomar ó, trafegar ê, trepar é, trepollar ó. Vriullop (talk) 08:23, 10 January 2024 (UTC)
 * Reviewing mid-vowel defaults tracked:
 * e/u: doesn't make any sense, probably it was intended for a diphthong -eu-.
 * o/u: also nonsense.
 * e/ct-cts-cts-ctes: too many variations è with cases of é only in Central.
 * e/dre-dres: mostly ë instead of é.
 * e/final-l: it is stable but needs to exclude -ell(s).
 * e/l-ls-ll: it's ok, I haven't found any problem.
 * e/ma-mes: too many variations
 * e/ens-ena-enes: too many variations ê/ë
 * e/nse-nses: it isn't worth it for a few words
 * e/nt-nts: mostly é with few exceptions, widely used
 * e/r-rs-ra-res: too many variations é/ê
 * e/rC: it's ok
 * e/sos-sa-ses: it's ok
 * e/t-ts-ta-tes: too many variations
 * è/s-blank: FIXME only in last syllable stressed, currently includes, , ...
 * o/r-rs-ra-res: too many variations
 * Vriullop (talk) 09:20, 8 January 2024 (UTC)

I have finished everything up through T and pushed the offline changes to Wiktionary. Issues I found with T:
 * 1) teca three etyms: (1) "food"; (2) "teak"; (3) "theca". All three have /e/ per DNV, and (1) and (3) have /ɛ/ per GDLC. (1) has /ɛ/ per DCVB, otherwise not indicated. I am guessing then that (1) and (3) have ë, and (2) must have either é or ë.
 * 2) temprar: Exactly parallel to emprar. cawikt says ê but DNV says /e/ for tempre. Is /e/ more recent for Central?
 * 3) temptar "to try": /e/ per DNV, I'm guessing é per etymology.
 * 4) tesla "tesla": /e/ per DNV, I'm guessing é.
 * 5) testar "to witness": /e/ per DNV, I'm guessing é per etymology.
 * 6) teu "your": /ɛ/ per GDLC for Central Catalan but /e/ per cawikt. GDLC says /e/ for meu "my" so I wonder if this isn't a mistake in GDLC.
 * 7) text "text": /tekst/ per GDLC, /tɛkst/ per DNV. Correct? DCVB says /test/ for everywhere, which may be antiquated.
 * 8) tomar (1) "to catch"; (2) "to knock down". Root vowel?
 * 9) tondre "to shear": /o/ for Central in cawikt and DCVB, but /ɔ/ in GDLC (DNV says /o/). However, note that tosa has /o/ in GDLC. What's going on here?
 * 10) tora "aconite": GDLC and DNV both say /o/ but DCVB says /ɔ/ for both Western and Eastern. Is /ɔ/ antiquated?
 * 11) torbar "to disturb", torba "disturbance" and torba "peat": GDLC and DNV both say /o/ but cawikt says /ɔ/ for Central Catalan (/o/ for Valencian). Is /ɔ/ wrong or antiquated?
 * 12) tors "torso": cawikt says /o/ (dialect not indicated), but GDLC says /ɔ/ for Central (and DNV says /o/ for Valencian). I am assuming GDLC is correct.
 * 13) trempa, trempar: cawikt says ê everywhere, in agreement with DCVB for tremp and trempa, but GDLC gives /e/ for both tremp and trempa; maybe /e/ is more modern as DCVB's fieldwork is ~ 100 years old.
 * 14) trenca "duffel coat": A borrowing from Spanish trenca. The other meaning of the noun "breakage; lesser grey shrike" has ê but this seems unlikely for a Spanish borrowing. I'm guessing ë.
 * 15) trepa "trimming; stencil" also "mob, riffraff, rabble" also a form of trepar "to drill, to perforate". DNV says /e/ for all three etyms; GDLC says /e/ for the first two, but DCVB says /ɛ/ for the meaning "mob, rabble". I am not sure whether all three are etymologically related.
 * 16) tropa "troop; crowd": cawikt says /ɔ/ everywhere (and DNV says /ɔ/) but GDLC says /o/. DCVB says /ɔ/ for Eastern but /o/ for Girona; maybe /o/ for Central is more recent.
 * 17) trotllo "medusafish": cawikt says /ɔ/ everywhere but DNV says /o/, so I'm assuming ô.

Also a few other issues:


 * 1) alliberar: cawikt says /ɛ/ everywhere but DNV says /e/.
 * 2) beca and derived becar "to give a scholarship to": cawikt says ë but DNV says è.
 * 3) clon: cawikt says ò but DNV says /o/. I am guessing ô then.
 * 4) emprar: cawikt says ê but GDLC says /e/ for empre. Is /e/ more recent for Central?
 * 5) perseverar: cawikt says ê, are we sure? sever has é.

Benwing2 (talk) 01:53, 6 January 2024 (UTC)


 * One more question (sorry for the barrage of questions): Currently the module section for Central Catalan unilaterally removes final single -r, whether absolutely word-final or followed by an -s. I'm thinking of making this less absolute, as follows:
 * Don't remove final -r(s) in monosyllables.
 * In non-monosyllables, remove final -r(s) in -ar, -er, -ir and in -[dtsç]or, but not otherwise. This is based on the fact that most words in -[dtsç]or are agent nouns and seem to fairly consistently remove the -r, while the remaining words in -or often (but not always) preserve the -r per GDLC. Here is a long list of such words: amor, humor, anterior, vapor, rumor, labor, major, tenor, tumor, terror, inferior, superior, clamor, posterior, furor, ulterior, tricolor, temor, rigor, vigor, menor, decor, olor, llavor, suor, licor, rubor, petricor, negror, remor, millor, albor, cremor, claror, grogor, blavor, maror, pitjor, frescor, senyor, finor, incolor, rojor, vermellor, blancor, lletjor, amargor, primor, favor, picor, escalfor, tremolor, esgarrifor, llacor, raor, xafogor. The idea is that to force the preservation of -r, write 'rr', and to force the non-preservation, write '-ó' (although if all these words preserve the -r in Valencian, we'd want some other signal, e.g. '-(r)'). Thoughts? Benwing2 (talk) 09:50, 6 January 2024 (UTC)
 * This plan sounds fine, assuming:
 * The non-preservation happens when the final syllable is stressed. When unstressed only affects some words, like,.
 * In Valencian it is always preserved. To force the non-preservation in Central and Balearic writing '-(r)' or '-(r)s' is intuitive. In fact, this is similar to the rhymes, i.e. Rhymes:Catalan/a(ɾ).
 * In Balearic there are more losses of final -r than in Central Catalan. See, although the result is correct, it is not consistent when the preservation is forced by writing 'rr', and it is not reasonable to assume that in Balearic no final -r is ever pronounced. Maybe it should be fixed with a per-dialect parameter.
 * There are many pending things above that require more time. Vriullop (talk) 11:57, 6 January 2024 (UTC)
 * @Vriullop Thanks for your comments. I'm thinking that writing rr should force the pronunciation of final -r everywhere, while writing something like rh should cause it to be pronounced in Central Catalan but not Balearic. This is based on looking through the DCVB with a sample of the above nouns, some of which appear to have pronounced -r in the Balearics, some not, and for some it depends on where in the Balearics. More complex scenarios can be handled using dialect-specific params (which are now implemented; see llei for an example). Benwing2 (talk) 21:23, 6 January 2024 (UTC)
 * Another possibility in place of rh for "pronounced everywhere but Balearics" is (rr). This sets up a hierarchy of pronunciation: rr > (rr) > (r) > nothing. Benwing2 (talk) 01:13, 7 January 2024 (UTC)
 * BTW I am planning on making it required to specify the way final -r is pronounced, using one of rr, (r), (rr) [or maybe rh if we decide on that] or omitting it, except in the circumstances where it defaults to (r), which are multisyllabic words ending in stressed -ar, -er, -ir or -[dtsç]or. In all other circumstances, the pronunciation seems far too irregular to provide a default.
 * Note that I have already removed the majority of defaults for mid vowel o and added the vowel explicitly, and I'm planning on doing the same for mid vowel e. For the defaults I removed, either there were few places that made use of the defaults or there were many but with lots of errors, e.g. o and e in the penultimate syllable with -i or -is in the last syllable were defaulting to ò and è respectively, which makes sense for adjectives of this form but doesn't work for subjunctive verb forms, and there were lots of places where this default was being used for subjunctives, producing incorrect results.
 * One other thing: the pronunciation given in GDLC for meteor is [məteɔ́ɾ], with unstressed [e]. Is this correct? If so I'll need to add a special symbol to allow for unstressed unreduced vowels. However, maybe it's a mistake; I found a pronunciation on Forvo here, which sounds more like [mətəɔ́ɾ] (BTW cawikt says [mətəóɾ] with /o/, which may be wrong as well for Central Catalan). Benwing2 (talk) 05:08, 7 January 2024 (UTC)
 * I forgot to add, I'm implementing a shortcut notation to make it easier to specify things like the pronunciation of final -r without having to repeat the entire word. If you write  where   is part of the spelling and   is the corresponding respelling, it will make that substitution in the respelling as long as it's unambiguous. So you can write   for meteor. To make it even shorter, in cases where the spelling and respelling are similar enough, you can just write the respelling, hence , and the code knows that   should match either   or   in the original spelling and   should match either   or  . Another common example is  , which is equivalent to   and can be used to respell   as   in words like boxejador. This will all be documented in ca-IPA as soon as I push the code. Benwing2 (talk) 05:16, 7 January 2024 (UTC)
 * Great.
 * For final -r, I like the hierarchy rr > (rr) > (r)
 * 'meteor' with unstressed [e] is correct. No need to do anything in the module, function reduction_ae does not apply any reduction in groups 'eà' and 'eò'.
 * A shortcut for respelling is useful.
 * Vriullop (talk) 10:27, 8 January 2024 (UTC)
 * @Vriullop I have implemented everything described above and fixed up all terms in final -r(s) appropriately. The use of the respellings for -r is documented in ca-IPA. The substitution notation like is still being documented. Benwing2 (talk) 02:29, 10 January 2024 (UTC)
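The final -r defaulting discussed above can be sketched roughly as follows. This is only an illustrative Python sketch, not the actual Module:ca-IPA code (which is written in Lua); the function name `final_r_default`, the `PRONOUNCED_IN` table and the convention of passing the syllable count and stress in by hand are all assumptions made for the example.

```python
import re

# Endings that default to "(r)" per the plan above: -ar, -er, -ir and
# -[dtsç]or, optionally followed by plural -s.
DEFAULT_ENDING = re.compile(r"(?:[aei]r|[dtsç]or)s?$")
# A single final -r(s); a doubled "rr" does not count as a single final -r.
FINAL_SINGLE_R = re.compile(r"(?<!r)rs?$")

# Hierarchy rr > (rr) > (r): dialects in which the final -r is pronounced.
PRONOUNCED_IN = {
    "rr": {"central", "balearic", "valencian"},
    "(rr)": {"central", "valencian"},
    "(r)": {"valencian"},
}

def final_r_default(word, syllables, final_stressed=True):
    """Return "(r)" when the default applies, "explicit" when the editor
    has to choose a notation, and "n/a" when there is no single final -r."""
    if not FINAL_SINGLE_R.search(word):
        return "n/a"
    if syllables > 1 and final_stressed and DEFAULT_ENDING.search(word):
        return "(r)"
    return "explicit"
```

With this sketch, cantar and jugadors fall under the "(r)" default, while a monosyllable like mar or a word like amor has to be marked explicitly.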
 * @Vriullop Thanks for your comments! I have added the root vowels you specified and am going through the defaulted mid vowel conditions and fixing them up. One thing I notice is that written bl pronounced /b.bl/ and similarly written gl pronounced /g.gl/ aren't correctly handled. For bl at least it seems not all occurrences of bl result in this doubling, e.g. doblar does but sublim doesn't, yet they have the same structure in terms of # of syllables, word shape, position of the accent, etc. What do you recommend? I tried manually adding written g.gl to segle, writing it as seg.gle, but then Valencian also gets the doubling, which is wrong. I see two approaches: (1) Manually require all doubled bl and gl to be written as bbl and ggl except maybe in certain suffixes (e.g. -able(s), -ible(s)), and have the Valencian-specific code remove the doubling and convert it back to single stops; (2) Double bl and gl by default. This would mean we'd need some method of indicating the non-doubled occurrences, maybe by writing sub.lim or something (although this might be problematic when we start providing phonetic output with fricative [βɣð], which I'd like to do soon; not actually sure though if there will be an issue). Thoughts? Benwing2 (talk) 07:02, 12 January 2024 (UTC)
 * The groups -bl- and -gl- are geminate in Central and Balearic in post-stressed position: poble /ˈpɔb.blə/, regla /ˈreɡ.ɡlə/, including endings -able, -ible. That can be coded in the module. It doesn't happen in Valencian, nor in pre-stressed position, as in sublim. But all its derivatives are also geminate even if in pre-stressed position: poblar, població, reglar, reglament, ... That needs to be respelled pobblar, pobblació, regglar, regglament, and then undone in Valencian. Vriullop (talk) 08:22, 12 January 2024 (UTC)
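The gemination rule just described can be sketched as below. This is a hypothetical Python illustration, not the module's Lua code: the function name `geminate` is invented, and it assumes the respelling marks the stressed vowel with an accent (e.g. pòble, règla, sublím). Lexical exceptions are not handled.

```python
import re

# Assumption for this sketch: the stressed vowel is always written accented.
STRESSED = "àèéíòóú"

def geminate(respelling, dialect):
    """Apply the -bl-/-gl- rule to an accent-marked respelling."""
    if dialect == "valencian":
        # Valencian never geminates: undo respelled bbl/ggl.
        return re.sub(r"([bg])\1l", r"\1l", respelling)
    # Central and Balearic: double b/g in bl/gl right after the stressed vowel.
    return re.sub(f"([{STRESSED}])([bg])l", r"\1\2\2l", respelling)
```

Here pòble becomes pòbble in Central but stays pòble in Valencian, sublím is untouched, and a derivative respelled pobblàr keeps its geminate in Central and is reduced to poblàr in Valencian.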
 * @Vriullop Got it, thanks. I'll implement this. What do you think of just providing phonetic output and changing the /.../ to [...]? This seems consistent with what the various dictionaries do; or at least, they explicitly show the fricative allophones [βɣð]. This would mean, for example, that the issue of whether to display [ŋ] goes away: we just display it whenever it's pronounced as such. Benwing2 (talk) 20:04, 12 January 2024 (UTC)
 * I have implemented what you said for -bl- and -gl-. I am currently working on auto-adding secondary stress to adverbs in -ment. (In the process I'm adding a quick shorthand to indicate a part of speech for a given term, e.g.  or just   for a noun,   or just   for an adjective, etc. The idea here is that terms in -ment default to adverbs, which means they get secondary stress by default, but you can override this by specifying   for a noun like desembarcament or   for an adjective like vehement. Some terms need both a part of speech and respelling, e.g. desdoblament needs   to indicate that it's a noun and the -bl- is pronounced /bbl/.) I have a question though about this. Adverbs in the DNV are indicated with *primary* stress on the preceding component and no stress on -ment, e.g. see  for feliçment. This seems rather strange to me and it's contrary to what the Wikipedia article on  says. Is this really true or is it just something weird in the DNV? Benwing2 (talk) 23:40, 12 January 2024 (UTC)
 * BTW I found an exception to the rule that post-stressed -bl- is geminate: bíblic (and Bíblia). Are there others? If so and given how many exceptions there are in the other direction, I wonder if we shouldn't just make all -bl- and -gl- geminate by default in Central Catalan and Balearic, and require that all cases where this doesn't happen get rewritten using  or  . Benwing2 (talk) 04:01, 13 January 2024 (UTC)
 * I implemented the auto-adding of secondary stress to adverbs in -ment, along with the part of speech hints described above, and fixed up all nouns and adverbs in -ment appropriately. (I actually added pronunciations to all or almost all nouns and adverbs in -ment that were missing them; this took several hours for adverbs because there are around 800 of them in -ment, and many of them have secondarily stressed e or o, which needed looking up.) The mid vowel hint now applies to the part preceding the adverbial -ment, not to the -ment itself (which is always pronounced /men(t)/ with /e/). Note also that in the future, these part of speech hints can also help with things like terms in -ar, where adjectives in Central Catalan pronounce the final -r but nouns and verbs generally don't. Benwing2 (talk) 07:33, 13 January 2024 (UTC)
 * OK, from the GDLC it looks like there are actually three ways that -bl- can be pronounced: obligar has [βl], doblar has [bbl], and obliterar has [bl]. Is that correct? If so I'll need to come up with some notation to distinguish these three. Maybe we should write o-bliterar to get [bl]; this is consistent with words like hipoglucèmia, which have hard single [gl] following a prefix with secondary stress [ìpuglusɛ́miə]. This would suggest a respelling hípo-glusèmia. Then if we need post-stressed [βl], we write e.g. Bíb.lia, and if we need post-stressed [bl] for some reason we'd write e.g. Bí-blia or something, and to get post-stressed [bbl] we'd write e.g. Bíbblia (or rely on the default). Make sense? Sorry to dump so much text on you. Benwing2 (talk) 09:25, 13 January 2024 (UTC)
 * Great work here.
 * The inclusion of the allophones βɣðŋɱ does not imply changing the transcription to brackets [...]. In fact, /β/ is not a voiced bilabial fricative but a simplification without diacritic of an approximant [β̞]. Catalan works follow a convention of "broad transcription" that includes what is considered relevant, without any claim about phonemic values. A purely phonemic transcription is a theoretical discussion. According to different authors, between 25 and 31 phonemes can be considered in Catalan. For example, the schwa is a predictable dialectal allophone, but it is relevant in contrast with other Romance languages. If it were necessary to mark that it is not strictly phonemic, frwikt uses \backslashes\. They are also used by Merriam-Webster as a notation for its own IPA transcription. The criteria followed in enwikt do not seem consistent enough to me.
 * The DNV does not show primary and secondary stress, nor does it in compound words. It is more noticeable in Eastern dialects without schwa in secondary stress. The stress shown in adverbs with -ment is misleading.
 * 'Bíblic' and 'Bíblia' are the only exceptions to geminate bl.
 * I have not found any explanation for 'obliterar' and 'hipoglucèmia'. See https://giec.iec.cat/textgramatica/codi/4.4.3.3. Maybe as a cultism in very formal speech, but I don't think it is worth making exceptions here. On the contrary, note that /β/ does not happen in Balearic and formal Valencian after a vowel, that is, in dialects that distinguish /b/-/v/.
 * Vriullop (talk) 09:17, 15 January 2024 (UTC)
 * @Vriullop Thanks for your response, this is very helpful. I am currently working on fixing up terms with written x (there are a lot of mistakes) but I'm almost done with the offline portion and I think next I'll focus on adding the fricative allophones and correctly handling multiple words. For handling multiple words I need to know the following:
 * What are the unstressed words? I assume they are all the proclitic object pronouns em, et, es, el, la, els, les, li, ens, us, ho, hi, en; plus the enclitic ones -me, -te, -se, -lo, -la, -los, -les, -li, -nos, -vos/-us, -ho, -hi, -ne (which might already be handled correctly); the contracted ones with apostrophe (which may already be handled correctly); maybe the unstressed possessives mon, ma, mos, mes, ton, ta, tos, tes, son, sa, sos, ses; the prepositions a, de, per, amb (and obsolete ab?), en (what about cap, des?); the prepositional contractions al, als, del, dels, pel, pels; articles el, la, els, les (already handled as proclitic pronouns), personal articles en, na (what about indefinite articles un, u, uns?); maybe salat articles es/ets, sa, ses, so, sos; the conjunctions i, o (what about si?). Any others?
 * Which assimilation rules apply across words? The Wikipedia article says that final -s voices before a vowel, which seems to cause a preceding consonant to voice as well, hence tots els has /dz/ in the middle. I assume that lenition of written b d g occurs across word boundaries as well. What about final omitted -r? Does it reappear before a vowel in the next word, e.g. in a phrase like vaig amar una dona? (And for that matter, does the -ig in vaig become voiced in this phrase?) Do you have any references on this?
 * Thanks again. Benwing2 (talk) 09:57, 15 January 2024 (UTC)
 * The list is correct: proclitic and enclitic pronouns, unstressed possessives, prepositions but not 'cap', 'des', contractions, articles including personal ones and salats, indefinite articles but not 'u', conjunctions including 'si' and 'ni', and also as a pronoun and conjunction.
 * In general, contact between words undergoes the same processes of assimilation, voicing, or devoicing as inside words. A typical example is els avis /əlz/, els savis /əls/, and tots els is really /ˈtodz.əls/, and vaig amar /ˌbad͡ʒ.əˈma/. The final -t reappears when followed by a vowel (sant Antoni /ˌsan.tənˈtɔ.ni/). The final -r of infinitives only reappears when followed by a pronoun (anar-hi /əˈna.ɾi/). From chapter 4.4 onwards of the IEC grammar you can find a lot of examples. Vriullop (talk) 12:37, 15 January 2024 (UTC)
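The voicing of final -s across a word boundary, as in els avis /əlz/ versus els savis /əls/, can be illustrated with a small sketch. This is hypothetical Python, not the module's Lua code; the function name and the two segment inventories are simplified assumptions, and only final /s/ is handled (not the voicing of a preceding stop as in tots els).

```python
# Simplified segment inventories, assumed for this sketch only.
VOWELS = set("aeiouəɛɔ")
VOICED_CONSONANTS = set("bdgɡvzʒmnɲŋlʎrɾjw")

def sandhi_final_s(word, next_word):
    """Voice a word-final /s/ to /z/ before a vowel or voiced consonant."""
    # Ignore stress marks when looking at the first segment of the next word.
    onset = next_word.lstrip("ˈˌ")[:1]
    if word.endswith("s") and onset and (
        onset in VOWELS or onset in VOICED_CONSONANTS
    ):
        return word[:-1] + "z"
    return word
```

For example, `sandhi_final_s("əls", "ˈa.bis")` gives `"əlz"`, while the same article before `"ˈsa.bis"` is left as `"əls"`.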
 * @Vriullop Thanks again for your help. I finally finished most of the work on multiword support. Still to go is approximant allophones of b/d/g, correct handling of apostrophes (represented with ‿), and ‿ as an indicator of liaison in respelling for cases like Sant Antoni respelled Sànt‿Antòni (which should produce /ˌsan.tənˈtɔ.ni/). I (more or less) read chapter 4.4 in the IEC grammar and I notice it also talks about certain cases of total assimilation where maybe cap de is pronounced /kad də/ or something, but I'm not sure we should implement that. I have some questions though:
 * Brunsvic (as in e.g. Nova Brunsvic) given as [bɾunzvík] in GDLC, is the v correct?
 * For drets humans, the module currently generates /ˈdɾɛdz uˈmans/, is that correct?
 * fer cas, fer acte de presència: Is the  pronounced in Central Catalan?
 * Sant Llorenç de la Salanca: the module currently generates /ˈsaɲ ʎuˈɾɛnz də lə səˈlaŋ.kə/ for Central and /ˈsand ʎoˈɾɛnz de la saˈlaŋ.ka/ for Valencia; correct? In general, does final -ç voice when the next word begins with a vowel?
 * The IEC grammar is equivocal about whether b/d/g become fricatives after /r/, /ɾ/ and /z/, what should we do in this case?
 * It appears double schwa /əə/ is often compressed to single schwa /ə/ in Central and maybe Balearic, but not in Valencian. This is indicated in GDLC and seems to operate fairly consistently if the second schwa is in a closed syllable (sobreescalfament, contraescarpa), but only sometimes in an open syllable (centreafricà, contraatac). Can you comment here? Likewise, /i/ or /u/ followed by schwa seems to elide the schwa in aeroespacial, autoescola, antiespasmòdic, but only sometimes if the schwa is in an open syllable (hence not in autoerotisme, antiemètic but yes in fotoelèctric, fotoelectricitat, macroeconomia). Likewise /uu/ seems to compress to /u/ if the second /u/ is in a closed syllable (microorganisme), but only sometimes in an open syllable. How do you think we should handle these cases?
 * I am trying to figure out what to do for written , , , . It seems that these tend to be pronounced as geminates in native words (e.g. cotna, setmana) but with [d] in cultisms/learned words. I'm thinking maybe we should make the cultism behavior the default and require respelling for the remainder, at least for  where there are more terms like ritme, aritmètic, atmosfera than terms like setmana. But maybe this should differ depending on the different spellings, e.g.  even in a cultism like atlàntic seems to have a geminate in it in Central Catalan but not in Valencian. Can you comment on what you think should be done?
 * Benwing2 (talk) 22:45, 26 January 2024 (UTC)
 * Note, I also revamped the testcases, see Module:ca-IPA/testcases (which demonstrate there's still a lot to fix). Benwing2 (talk) 23:26, 26 January 2024 (UTC)
 * Brunsvic is strange. The GDLC is supposed to include pronunciations from the Diccionari ortogràfic i de pronúncia (DOP), but it turns out that the DOP does not include proper names. For non-Catalan place names I check ésAdir, a website for radio and TV journalists, and it shows /'bɾunz.βik/ as I expected.
 * 'Drets humans' is correct.
 * 'Fer cas', 'fer acte', are correct. The r of infinitives only reappears when followed by pronouns: fer-se /ˈfer.sə/, fer-hi /ˈfe.ɾi/, fer-t'ho /ˈfer.tu/...
 * 'Sant Llorenç de la Salanca' is correct. Final /s/ of Llorenç is voiced /z/ followed by a voiced consonant or by a vowel.
 * The IEC grammar is too descriptive about approximants, listing when they may or may not appear. Considering that /β/ is rare in dialects with the contrast /v/-/b/, that is, Balearic and Valencian, and trying to be consistent with GDLC and DNV:
 * No approximants r/s + b/d/g in Central.
 * No approximants r/s + b in Balearic and Valencian.
 * Approximants r/s + d/g in Balearic and Valencian.
 * In general, the concurrence of two identical vowels /əə/ (or /aa/, /ee/), /uu/ (or /oo/) is reduced to a single vowel. Variations may depend on formal v. informal, or common use v. cultism, or emphasis of some prefixes. It is hard to define any exception.
 * Written  and  are geminated in a handful of inherited words: cotna, reguitnar, setmana and its derivatives. But 'setmana' with a single /m/ in Valencian. 'Vietnamita' and 'sotmetre' are hesitant. Others like 'ritme', 'ètnic', 'algoritme' are cultisms /dm/.
 * Written  is always /ll/ in Central and Balearic. In Valencian it is /ll/ in inherited words and /dl/ otherwise. Valencian inherited words include those with alternative spelling : ametla > ametlla, butla > butlla...
 * Written as alternative spelling of inherited  is pronounced /ʎʎ/ in Central and /ll/ in Balearic and Valencian. Although the DNV includes 'ametlla', 'butlla'... it is not really used, and if written it is still pronounced as . As a cultism, like 'ratlla', 'bitllet' or 'butlletí', it is pronounced /ʎʎ/ in Central and /ʎ/ in Balearic and Valencian.
 * Vriullop (talk) 10:54, 29 January 2024 (UTC)
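The three rules above for b/d/g after r/s can be written down as a small lookup. This is a hypothetical Python sketch of those rules only, not the module's Lua code; the function name `after_r_or_s` and the plain-letter inputs are assumptions for the example.

```python
# Approximant symbols used in the broad transcription discussed above.
APPROXIMANT = {"b": "β", "d": "ð", "g": "ɣ"}

def after_r_or_s(stop, dialect):
    """Realisation of b/d/g when the preceding segment is r or s."""
    if dialect == "central":
        return stop  # no approximants after r/s in Central
    if stop == "b":
        return stop  # no [β] after r/s in Balearic and Valencian
    return APPROXIMANT[stop]  # d and g lenite in Balearic and Valencian
```

So `after_r_or_s("d", "valencian")` yields the approximant, while `after_r_or_s("b", "balearic")` and every Central case keep the stop.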
 * @Vriullop Thanks. I have (already) implemented most of the above things. I haven't yet implemented reduction of adjacent unstressed vowels or redone the implementation of  and . As for Sant Llorenç de la Salanca, the module formerly generated [ˈsand ʎoˈɾɛnz ðe la saˈlaŋ.ka] for Valencia (note the [d] in /sand/) but I am guessing this is wrong, so I changed it so it now generates [ˈsaɲ ʎoˈɾɛnz ðe la saˈlaŋ.ka]. Basically I am guessing that elision of stops after nasals happens in Valencia before a consonant but not a vowel or utterance-finally. Is this correct? Benwing2 (talk) 01:53, 30 January 2024 (UTC)
 * I didn't notice 'sant'. It is correct, elision of t and assimilation of the nasal before a consonant, not before a vowel or isolated.--Vriullop (talk) 08:00, 30 January 2024 (UTC)

Your bot is removing valid categories
e.g. at Westsahara. —Justin ( koavf ) ❤T☮C☺M☯ 00:55, 1 January 2024 (UTC)


 * @Koavf This is unavoidable. When you add a page to a category, sometimes it takes a little while for the category to register having the page in it, and in the meantime it shows up in CAT:Empty categories, which is what I use periodically to delete empty categories. I check that category before deleting the empty categories referenced, but I can't notice everything. Any non-empty categories so deleted will get re-created in a few days in any case. Benwing2 (talk) 01:06, 1 January 2024 (UTC)
 * What are you talking about? That category was on that page for 5.5 years and your bot removed it for no reason. How is that unavoidable? Are you telling me that your bot is going to re-add all of these categories and undelete them as well? —Justin ( koavf ) ❤T☮C☺M☯ 01:09, 1 January 2024 (UTC)
 * Dude, fuck off. Seriously. Yelling at me is not going to get me to help you any quicker than writing nicely.
 * As for my response, I thought you were referring to my recent deletion of empty categories (as of a few hours ago) rather than a bot change from a month and a half ago. In the future I'd recommend you link to the specific diff. My removal of the category at that time was a by-hand change, not a script change, even though the bot pushed the change; that's what "manually assisted" means (and I have a strong feeling I've already explained this to you). The reason for the removal is that Module:place normally auto-adds categories of this nature, and I thought it would in this case; the reason it didn't is apparently because Western Sahara is listed in Module:place/shared-data as a country, but its definition identifies it (correctly) as a territory rather than a country. I'll fix this so it gets correctly auto-added. Benwing2 (talk) 01:30, 1 January 2024 (UTC)
 * I was much nicer than you were just now and was in no sense "yelling". There was no reason for that language. I didn't realize that what I wrote was ambiguous and I thought that referring you to the entry would be sufficiently clear where you can see what your bot (or script or by-hand you) did. Thanks for agreeing to fix this and undelete all of these categories. When will this happen? —Justin ( koavf ) ❤T☮C☺M☯ 22:18, 1 January 2024 (UTC)
 * When will you or your bot undo these category removals? —Justin ( koavf ) ❤T☮C☺M☯ 22:42, 15 January 2024 (UTC)
 * @Koavf Which removals are you referring to? Specifically to do with Western Sahara, or are there any others? Benwing2 (talk) 22:44, 15 January 2024 (UTC)
 * The only ones I am aware of are removals of the sort which emptied several categories that were then deleted. I'm not familiar with any others. —Justin ( koavf ) ❤T☮C☺M☯ 22:46, 15 January 2024 (UTC)
 * When will you or your bot undo these category removals? —Justin ( koavf ) ❤T☮C☺M☯ 01:37, 21 January 2024 (UTC)
 * @Koavf Did you not get my ping? I did this days ago. Benwing2 (talk) 02:37, 21 January 2024 (UTC)
 * I see that it did and no, I didn't. For some weird reason, I also did not get updates for this thread even after subscribing. :/ Thanks a lot. —Justin ( koavf ) ❤T☮C☺M☯ 10:16, 21 January 2024 (UTC)

Twice-borrowed terms
I looked up παλάβρα, which is from παραβολή after passing through Ladino, and found out that, after moving all the "twice-borrowed terms" categories to "terms borrowed back into", there are still lots more Greek twice-borrowed terms than Greek terms borrowed back into Greek. This may also be true of other languages. Can you look into it? PierreAbbat (talk) 16:43, 1 January 2024 (UTC)


 * @PierreAbbat It’s because they were added manually due to the origin being Ancient Greek, which is a misuse of the category imo. Theknightwho (talk) 19:17, 1 January 2024 (UTC)


 * Yeah @Pierre, if I may expand on what Theknightwho said, it is indeed because of Ancient Greek being considered a separate language, and this is discussed at Beer_parlour/2023/November (and actually quite a few other places over the years, e.g. Etymology_scriptorium/2016/June, Beer_parlour/2011/October), and ... it's tricky. Because ... while I'm sympathetic to the potential complaint that it's somewhat arbitrary that a word used in the modern form of Hebrew or Latin (or Chinese) and derived from the variety spoken two thousand years ago can be automatically categorized as "borrowed back" while a word in modern Greek or English can't be, just because we decided it was most convenient to handle the changes those languages underwent as still being ==Hebrew==, ==Latin== (or ==Chinese==) but decided to split the changes Greek underwent between two languages ... we do have to draw a line somewhere or else we get into absurdities (e.g. a term from Proto-Indo-European, which went into French, and was borrowed into English, is twice-borrowed/borrowed-back?), and if we draw the line anywhere other than "whatever we've decided to consider a separate full language", it gets fuzzy and messy fast. But please comment in the November BP discussion linked above if you have suggestions. - -sche (discuss) 19:45, 1 January 2024 (UTC)

New :toBcp47Code method
If I interpret this recent change to Scribunto correctly, it provides a way to convert from MediaWiki langcodes to proper langcodes directly. Might be worth incorporating, as I imagine it’ll simplify some of our code, and I think you’re more familiar with that side of things than me. Theknightwho (talk) 15:50, 2 January 2024 (UTC)


 * @Theknightwho Unfortunately I'm not sure this is useful for our purposes. Wiktionary language codes aren't always the same as MediaWiki language codes and I don't think we ever need to convert MediaWiki -> BCP47; instead if anything we'd need to convert MediaWiki <-> Wiktionary and Wiktionary -> BCP47. Benwing2 (talk) 22:47, 15 January 2024 (UTC)

Addition to quotation-template documentation
I just fixed a module error caused by WF converting a quote to quote-book without checking what goes where. The template documentation is thoroughly organized, voluminous, and useless for figuring out how to fix parameter values in the wrong slots. I was going to add a little index of positional parameters, but that would have required reverse-engineering your documentation module. Instead, I'm just going to dump a mockup here, and let you deal with it:

An alphabetical index of parameter names might also be nice.

And, no, I don't want fries with that...

Thanks! Chuck Entz (talk) 06:14, 5 January 2024 (UTC)


 * @Chuck Entz Yeah there are so many params that organizing them properly is a very challenging task. For this reason I tried to do away entirely with positional params but some people squawked loudly enough that they are kept for quote-book and quote-journal, and disallowed for the rest. I think your mockup is a good idea. Benwing2 (talk) 06:17, 5 January 2024 (UTC)

Using the Old French conjugation table as an inspiration
I was trying to create a more complex conjugation table for the Old Spanish language. Then I started viewing other templates and learned that the one used for the Old French language is perfect. I might be able to make some basic edits to adapt it for the Old Spanish conjugation system. However, I couldn't get a sample of that template to edit as there are so many links together. So would you please share with me a simple, editable sample of the template of the Old French language so I can apply it to this page: Cantar? Besides, it'd be helpful to better standardize Wiktionary. Thalyson2019 (talk) 05:42, 6 January 2024 (UTC)


 * @Thalyson2019 The Old French conjugation tables aren't implemented using templates but rather using a module: Module:fro-verb. I agree that it's a good base to start with when designing a conjugation system for a language that wasn't really standardized. I'm not sure if you are comfortable working in Lua, because the module is written in Lua and it's not really possible to do what it does just using template syntax. Benwing2 (talk) 05:57, 6 January 2024 (UTC)
 * Is there any solution for that? I already have the verbs and their positions in mind. I'm not familiar with Lua, even though I create basic templates. Thalyson2019 (talk) 06:08, 6 January 2024 (UTC)
 * @Thalyson2019 You'd have to get someone to create the Lua module for you. I can't commit to something like this right now as I have already committed to several other projects. However if you create some mockups and link them here, then if/when I or someone else is able to contribute, the mockups can be a good starting point. Benwing2 (talk) 06:10, 6 January 2024 (UTC)
 * Should such mockups be in the form of code or pictures? Thalyson2019 (talk) 06:14, 6 January 2024 (UTC)
 * @Thalyson2019 Maybe some sample template calls for some simple verbs like cantar and some complicated verbs as well (tener? ir?). I or anyone working on this would in addition need some good resources on Old Spanish verb conjugation. Benwing2 (talk) 06:18, 6 January 2024 (UTC)

Finnish inflections
Hey Benwing, I know that WingerBot is used to mass-create the inflection pages for Romance verbs. Is there any way that it could do similar work with Finnish noun forms? According to Jberkel's last data dump there are literally millions of Finnish redlinks, most of which appear to be nouns, so bot help is probably necessary to make a real dent. Thanks for your time! Vergencescattered (talk) 20:01, 6 January 2024 (UTC)
 * : have you talked to about this? As a native speaker with a bot, they would be a more logical choice, and more likely to be aware of potential problems. Chuck Entz (talk) 20:35, 6 January 2024 (UTC)
 * @Vergencescattered I agree with Chuck. Also pinging @Hekaheka. E.g. there may be a reason these forms aren't created (too many of them?). Benwing2 (talk) 21:20, 6 January 2024 (UTC)
 * There are probably somewhere around 200,000 nouns in Finnish and each has 30 inflected forms (15 cases in singular and plural) without taking into account any suffixes. This is the rough number found in . Adding dialects and slang one gets roughly to half a million or more. That would give 6 to 15 million entries. If we add the six (third person possessive suffixes are the same for plural and singular but to compensate this potential simplification there are two of them) possessive suffixes, the number of potential entries increases to 40 to 100 million. Some of the forms might be unattestable as abessive, comitative and instructive are quite rarely used but that does not cut more than 20% of the total. On top of this each verb has close to one hundred inflected forms if we take into account the possessive forms of some infinitives and participles.
 * This leads me to think that we might need a new approach to inflected forms in general. Perhaps they should have an entry of their own only in such rare cases in which the inflected form has a meaning or meanings that cannot be readily derived from the lemma form. In most cases the system would work so that a search for an inflected form would redirect to the article of the lemma form. --Hekaheka (talk) 23:33, 8 January 2024 (UTC)
 * @Hekaheka It would be great if MediaWiki could autogenerate the text of an inflected form, but in its current state it can't do either that or redirect from an inflected form to a lemma form. IMO the most useful thing about having inflected forms entered as such is when you have homophones or homographs between different inflected forms. This occurs fairly often in the Romance languages, for example, between noun and verb forms or between adjective and verb forms. It also occurs fairly often in Russian between noun and verb forms but rarely for adjectives except for short forms of adjectives; for this reason I have never done a bot run to create Russian adjective forms (besides the fact that there are a lot of them). If Finnish grammar is largely regular and doesn't have a lot of homonyms, I would think it's not useful to have inflected forms generated. I suppose for the moment we need to use our judgment as to whether it's worth it to create such forms. Benwing2 (talk) 23:38, 8 January 2024 (UTC)
 * I would definitely appreciate their input! I didn't know about Surjection or their bot before you mentioned them, so I apologize for bothering you about it. Thank you! Vergencescattered (talk) 23:27, 6 January 2024 (UTC)
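For reference, the rough arithmetic in Hekaheka's estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name is hypothetical, and the multiplier of 7 (one bare form plus six possessive-suffixed variants) is an assumption chosen to match the quoted range of 40 to 100 million.

```python
def estimate_noun_forms(lemmas, cases=15, numbers=2, possessive_suffixes=0):
    """Rough count of inflected noun forms for a given number of lemmas."""
    base_forms = lemmas * cases * numbers
    # Assumption: each base form occurs bare or with exactly one possessive suffix.
    return base_forms * (1 + possessive_suffixes)


for lemmas in (200_000, 500_000):
    plain = estimate_noun_forms(lemmas)
    with_poss = estimate_noun_forms(lemmas, possessive_suffixes=6)
    print(f"{lemmas:,} lemmas: {plain:,} forms, {with_poss:,} with possessives")
```

Under these assumptions the figures come out at 6 and 15 million base forms, and 42 and 105 million once possessive suffixes are included, before discounting the rarely attested cases.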

Request to deploy szy-pron
I've created a Sakizaya pronunciation template, and I need help deploying it to all Sakizaya language entries on Wiktionary. Could you assist with this using your bot account? --TongcyDai (talk) 17:29, 7 January 2024 (UTC)


 * @TongcyDai What needs to be done here? Are there any cases where manual respelling or other help for the template is needed? Benwing2 (talk) 22:54, 7 January 2024 (UTC)
 * When adding the template, simply insert szy-pron into each Sakizaya entry, no parameters and respelling are needed. TongcyDai (talk) 10:16, 8 January 2024 (UTC)
 * Please let me know if there's anything else you need from me to deploy the template. --TongcyDai (talk) 18:38, 1 March 2024 (UTC)
 * Is there anything I can help with? --TongcyDai (talk) 07:06, 17 April 2024 (UTC)
 * @TongcyDai Apologies for the delay, I am working on this now. However, the template should be called szy-IPA for consistency with other pronunciation templates. Do you mind if I rename it? Benwing2 (talk) 23:26, 17 April 2024 (UTC)
 * Thank you for the update. I appreciate your help and have no objections to renaming the template, please go ahead. --TongcyDai (talk) 07:33, 18 April 2024 (UTC)
 * @TongcyDai Done. Benwing2 (talk) 23:18, 18 April 2024 (UTC)

Relational -> demonym
Could you clean up Spanish demonyms like ? It makes more sense than categorizing 900+ demonyms as relational adjectives just because they don't have a one-word translation in English. Ultimateria (talk) 19:23, 7 January 2024 (UTC)


 * @Ultimateria Hi, I actually wrote a script awhile ago to do exactly this but never ran it. I don't remember why; maybe it needed a few fixes. I'll go ahead and finish this. Benwing2 (talk) 22:52, 7 January 2024 (UTC)

Revert adding acceleration forms to pl-conj-ai
Hi @Benwing2. You just reverted the changes to the template pl-conj-ai. Could you please elaborate on what was broken? So I could see how it could be fixed while preserving the benefit of the acceleration forms? Incidentally, similar changes have been made to other templates, so the same error could arise for other verbs. You are referring to active adverbial participles, for which only one single form was used before, even though those adverbs have different forms depending on plural/singular and gender. Maybe the breaking tool needs to be updated to cater for those other forms. @Vininn126 JuChelou (talk) 14:04, 25 January 2024 (UTC)


 * @JuChelou For one thing, the specific value of 'active adjectival participle' (along with various other specific values) is processed specially in Module:accel/pl and causes the inflection to be set to 'actv|adj|part'. By changing this you broke this support, and caused it to use an invalid inflection tag set 'm|s|active adjectival participle'. The other inflections of the participle were similar. The correct thing to do is to leave the masc sing participial forms unchanged and if you want to add acceleration to the other forms, they should cause the form to be created as e.g. pl rather than as an inflection of the verb. You can see an example of how to handle this correctly by looking at the lines starting at Module:accel/pt. Benwing2 (talk) 22:50, 25 January 2024 (UTC)
 * Thank you @Benwing2 for your reply. @Vininn126
 * I tried something in Module:accel/pl and pl-conj-ai to add proper accel form support for the adjectival participles.
 * However, I am not fully satisfied with the result because:
 * 1/ on the masculine singular form, it could add 2 forms, for example for wyrzucający wyrzucać
 * 2/ the result would not be similar if the new wiki page is triggered from the conjugation chart or from the adjective declension chart (which I also added recently). For example, for wyrzucające, the new wiki page triggered from the verb link would "miss" the fact that it is also the form for accusative neuter and accusative non virile.
 * Any advice? Or should I just ditch the extra accel forms for the participle and contributors would use the new accel links from the adjective declension module? JuChelou (talk) 16:18, 26 January 2024 (UTC)
 * In theory you could generate wyrzucający and from there generate the others, but it's less than ideal. Vininn126 (talk) 16:40, 26 January 2024 (UTC)
 * @JuChelou Hmmm, I'm not quite sure how to handle #2; either you'd have to add all the non-nominative forms of the participles to the verb table so that the accelerator code knows about them automatically, or you'd have to hack the code in MOD:accel/pl somehow to add the remaining inflections in. (This latter thing is possible, as I think I added a hook that you can define in the accelerator module that operates at the end after all the inflections have been combined.) As for #1, the general principle I've followed is not to include definitions for non-lemma forms that are identical in spelling to the lemma. I followed this principle, for example, when I created a bot script to add Russian noun inflections. This also happens in Portuguese verbs (where the 1st and 3rd singular future subjunctive usually looks the same as the infinitive), and for Latin feminine nouns (where the ablative singular is spelled the same as the nominative singular, although the pronunciation is different as the ablative ends in long -ā while the nominative ends in short -a). I actually removed the cases where Portuguese verbs were defined normally but had an additional definition as the 1st/3rd singular future subjunctive, but I may have left alone the Latin ablative cases because of the different pronunciation. In the Polish case, the pronunciation is the same and so you could fix this by just not having an accelerator defined on the forms that look like the lemma.
 * In general, I would actually argue that instead of including only the nominative case forms, it's best not to include anything but the masculine nominative singular of the various participles in the verb table, and require that the remaining forms be defined using accelerators on the participle table, even though User:Vininn126 thinks this is non-ideal. This is how we handle participles in Russian, for example, which is similar in many ways to Polish. I think the main benefit to having non-lemma participle forms defined in the verb table is if there are irregularities in their formation, but I don't think this is the case in Polish. Benwing2 (talk) 23:20, 26 January 2024 (UTC)
 * An additional thought is maybe we shouldn't be defining non-lemma forms of participles at all, since AFAIK they're quite regular and there are a lot of them. See the discussion above about . This is the policy we follow for Russian, for example. Benwing2 (talk) 23:22, 26 January 2024 (UTC)
 * Where do we define non-lemma participles? Vininn126 (talk) 10:17, 27 January 2024 (UTC)
 * @Vininn126 Sorry, can you clarify what you mean? Benwing2 (talk) 10:37, 27 January 2024 (UTC)
 * I simply didn't understand your last message Vininn126 (talk) 10:59, 27 January 2024 (UTC)
 * Thank you @Benwing2 for your very detailed answer.
 * Basically, regarding your recommendations for #1, it would be easy to remove the accel form for the version identical to the lemma form.
 * For #2, however, it would be trickier, as it would require duplicating the generation of all the forms, opening room for discrepancies between the pl-adj module and the Polish accel module.
 * If I understand correctly, your overall recommendation is to remove all the other forms of the participles in conjugation templates. Basically, we would just have "active adjectival participle: masculine singular nominative form".
 * It would be similar to what is done for the verbal noun, where there is only the masculine singular nominative form, even though other forms exist.
 * @Vininn126 what would be your opinion on removing the additional forms of the adjectival participles from the conjugation templates? JuChelou (talk) 17:02, 28 January 2024 (UTC)
 * Sounds fine to me; it's not typical to have them. Vininn126 (talk) 18:03, 28 January 2024 (UTC)
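The principle mentioned above for #1 (no accelerator on a form that is spelled the same as the lemma) can be illustrated with a minimal Python sketch. This is not the code in Module:accel/pl; the function name, the slot names and the data layout are all hypothetical and chosen only for illustration.

```python
def forms_needing_accelerators(lemma, forms):
    """Return the slots whose form differs in spelling from the lemma.

    `forms` maps a hypothetical slot name (e.g. "nom|n|s") to a spelled form.
    Slots whose form is identical to the lemma are skipped, so that no
    non-lemma definition would be generated on the lemma's own page.
    """
    return {slot: form for slot, form in forms.items() if form != lemma}


# Illustrative subset of the declension of the participle wyrzucający.
participle_forms = {
    "nom|m|s": "wyrzucający",
    "nom|f|s": "wyrzucająca",
    "nom|n|s": "wyrzucające",
    "voc|m|s": "wyrzucający",
}
print(forms_needing_accelerators("wyrzucający", participle_forms))
```

Only the feminine and neuter forms survive the filter here; the two slots spelled like the lemma get no accelerator.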

On the template
Hi, I was wondering what exactly the combined use of the parameters start_year and year is supposed to communicate. It's supposed to mean a range of dates, but—with an example 1390–1400—is range meant:
 * in the sense of "the composition of this work started in 1390, and ended in 1400"?
 * or in the sense of "this work was probably completed (or brought to its current state, if unfinished) somewhere between 1390 and 1400"?

Thanks in advance for any clarification. I've recently discovered these parameters, and I'm not sure I've been using them properly. —— GianWiki (talk) 15:24, 25 January 2024 (UTC)


 * @GianWiki These parameters were there before I started to clean the template up, so you might ask User:Sgconlaw, but I'm thinking it's used for works that took several years to create. Benwing2 (talk) 23:45, 25 January 2024 (UTC)
 * I see, I hadn't noticed that. I'll try asking them just to make sure.
 * Thank you for your time. —— GianWiki (talk) 08:18, 26 January 2024 (UTC)
 * I don't think the parameters were clearly defined at the time when I first tidied up the  templates. Personally, I use them to mean a range of publication dates (for example, if a novel is originally published in parts in a magazine over many months), and if I intend a range of dates to mean anything else I add a qualification in parentheses for clarity like this: c. 1597–1600 (date written). — Sgconlaw (talk) 10:54, 26 January 2024 (UTC)

WingerBot and Welsh animal genders
Hi, your bot edited garan ("crane") and petris ("partridge") so they would be “m or f by sense”, which isn’t correct. I've corrected them, but can you amend the bot so it doesn’t edit other animals like this please?

Garan is usually a masculine noun, that can be feminine due to dialect, rather than the sex of the animal (e.g. in Iolo Williams’s Llyfr Adar and the Geiriadur yr Academi) and petris is feminine.

I’ve consulted a bit with other Welsh speakers and the only source I can see for petris ever being masculine is the Geiriadur Prifysgol Cymru, which could easily be due to one or two examples from centuries ago. “A small cock partridge” would be ceiliog petris bach – where bach modifies ceiliog, not petris.

Cheers, Arafsymudwr (talk) 15:54, 30 January 2024 (UTC)


 * @Arafsymudwr This was a one-off run where I manually made the changes in question in a text editor and only used the bot to push the changes (that's what "manually assisted" means in the changelog message). So there's no script to amend but I'll make sure not to change the genders of animal terms in Welsh (or generally in any language, I think) in the future. Benwing2 (talk) 06:11, 31 January 2024 (UTC)

Links to English possessives in inflection-line templates
I wish I had included this in my request about links to components of hyphenated terms in English inflection templates. (How's that coming, BTW?) Many vernacular names of organisms are like Gundlach's hawk (See Gundlach's hawk). It would be better, especially for me, if the link were to Gundlach rather than the possessive. I can't think of any instances for which the possessive would be a better link target and believe that any such instances are relatively rare exceptions. DCDuring (talk) 16:29, 31 January 2024 (UTC)


 * @DCDuring Yes, in fact my concerns over how to handle apostrophes are why this hasn't already gotten done. I'm thinking that we should split any term with a trailing 's except for one's and someone's (with exceptions also maybe for he's, she's, it's), but not split other terms with apostrophes (e.g. I'm, don't, haven't). BTW I notice that we've split apostrophe-s into two terms, 's for the contraction and -'s for the possessive. Personally I think this is confusing and probably they should be merged into 's (without the hyphen). It also makes auto-linking more difficult; probably we should link all occurrences of  into -'s since this is the more common case. Benwing2 (talk) 22:07, 31 January 2024 (UTC)
 * This 's/-'s distinction gets to how to indicate the distinction between an inflectional ending and a contraction, doesn't it? On one level one needs a linguistics or philosophy degree to be qualified and/or motivated to argue this, but I don't hold the right degrees. On another level, how to help users, it would seem both should be on the same page, almost certainly 's. It probably should go to BP, but you may be able to go ahead with what is convenient to implement and rely on links between 's and -'s to help users in the meanwhile. DCDuring (talk) 22:22, 31 January 2024 (UTC)
 * @DCDuring Please see User:Benwing2/test-en-multiword for some examples of the new headword link handling system that I'm testing. It includes the ability to change the link of one (or several) of the words of a multiword expression without having to write out the entire expression; see the examples that specify ~.... (This functionality was already implemented for Italian and later extended to other Romance languages.) Note that if there are both hyphens and spaces, the default behavior is to link the space-separated components but not break up hyphen-separated components, although this can be changed using 1. Possibly the default should be reversed and hyphen-separated components broken up by default unless 1 is given; what do you think? Benwing2 (talk) 00:01, 2 February 2024 (UTC)
 * I will look at it in about 16 hours. DCDuring (talk) 00:04, 2 February 2024 (UTC)
 * OK, thanks. BTW I'm thinking we should indeed change the default when there are both hyphens and spaces, and maybe make an argument to convert hyphenated terms to space-separated terms, e.g. for cases like civil-rights movement and claw-hammer coat that should be linked as  and   (likewise closed-circuit television, clock-face timetable, coffee-table book, etc.), although there are also examples like close-up lens, coin-operated laundry, context-free grammar, co-occurrence network, etc. where we do want to link the hyphenated component as such. Benwing2 (talk) 00:58, 2 February 2024 (UTC)
 * I really like the more hyphenated forms because they reduce certain kinds of possible misreading of MWEs, but contemporary relative frequency may indicate that hyphenated forms are already much less frequent. For three-part English vernacular names of organisms, I often find that the hyphen is in the wrong place or is not useful. But black billed amazon is not a good substitute for . DCDuring (talk) 01:10, 2 February 2024 (UTC)
 * @DCDuring I have redone the handling of terms with both hyphens and spaces so that it now looks up the hyphenated term to see whether it exists in order to determine how to link it. Specifically:
 * (1) If the term exists as a space-separated compound, link to that. (We prefer space-separated compounds because the hyphen-separated form often exists as a soft redirect.)
 * (2) Otherwise, if the term exists as a hyphen-separated compound, link to that.
 * (3) Otherwise, link the hyphenated terms separately.
 * This handles most cases properly, although there are occasional situations where it fails; for example, close up and close-up both exist and are different, and by default close-up lens links (wrongly) to the former. For this reason I've provided params to override the default handling: 1 forces case (1) above, 1 forces case (2) above, and 1 forces case (3) above.
 * Benwing2 (talk) 05:27, 2 February 2024 (UTC)
 * I hope we will never have entries for terms like scaly-headed. So I'll have to use nosplithyph=1 for a vast number of vernacular names. I may as well not have asked for this favor. I suppose I could create a new template to wrap en-noun or head, specifying the parameter, to save keystrokes for these vernacular name entries. DCDuring (talk) 13:41, 2 February 2024 (UTC)
 * @DCDuring If you need to use 1 for a large number of vernacular names, that is defeating the purpose of things. Can you explain why you think you need to use this for so many? Things like scaly-headed are SOP so should be split, IMO. Benwing2 (talk) 20:37, 2 February 2024 (UTC)
 * I misread in haste, I think. DCDuring (talk) 22:43, 2 February 2024 (UTC)
 * @DCDuring I have implemented the various changes to the linking behavior of Module:en-headword. They are documented on the module documentation page Module:en-headword/documentation (although the section on link modifiers is still to be written). There is text in the documentation of en-noun, en-verb and en-adj pointing to the module documentation page for the specifics about multiword linking and suffix handling. Let me know if there's anything else needed documentation-wise. Benwing2 (talk) 00:10, 7 February 2024 (UTC)
 * The section on link modifications (renamed from link modifiers for clarity) is written. Benwing2 (talk) 00:46, 7 February 2024 (UTC)
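The three-case lookup described above for hyphenated components of a multiword term can be sketched roughly as follows. This is an illustrative Python sketch, not the Lua in Module:en-headword: the `page_exists` callback, the function name and the `force` argument (standing in for the override parameters) are assumptions, and the set of existing pages is made up for the example.

```python
def link_hyphenated_component(component, page_exists, force=None):
    """Decide how to link one hyphenated component of a multiword term.

    Returns a list of link targets. `force` may be 1, 2 or 3 to select a
    case explicitly instead of consulting `page_exists`.
    """
    spaced = component.replace("-", " ")
    if force == 1 or (force is None and page_exists(spaced)):
        return [spaced]           # case 1: space-separated compound
    if force == 2 or (force is None and page_exists(component)):
        return [component]        # case 2: hyphen-separated compound
    return component.split("-")   # case 3: link each part separately


# Hypothetical set of existing pages, for illustration only.
existing = {"close up", "close-up", "context-free"}
exists = existing.__contains__

print(link_hyphenated_component("close-up", exists))           # ['close up']
print(link_hyphenated_component("close-up", exists, force=2))  # ['close-up']
print(link_hyphenated_component("scaly-headed", exists))       # ['scaly', 'headed']
```

The first call shows the failure mode mentioned above (close-up resolving to close up), and the second shows an override correcting it.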

devil's own
I reverted WingerBot's edit to this entry not just because of the module error (I think you added def to the noun and proper noun code, but not to the adjective), but because it looks to me like the syntax is more along the lines of "[the devil's] own" rather than "the [devil's own]". Not that I would get into an edit war over this; I just wanted to make sure you were aware of that dimension before deciding how to fix things. Chuck Entz (talk) 04:23, 4 February 2024 (UTC)


 * @Chuck Entz Thanks. Yeah I forgot about handling adjectives with "the" in them. As for the syntax issue, all that 1 does is add "the" before the head; it doesn't assert any particular way of parsing the constituents. I suppose it could be interpreted as asserting an analysis like the [devil's own] but that wasn't my intention (and I'm not quite sure how we'd indicate such an analysis in the head). Benwing2 (talk) 04:47, 4 February 2024 (UTC)
 * But adjectives don't have "the" in them. We should review the entries that so claim and determine whether there is good reason to ever have "the" inside the headword template for adjectives. DCDuring (talk) 14:14, 4 February 2024 (UTC)
 * Never mind. I was thinking of leading "the". We have numerous entries of purported adjectives with "the" embedded. Some of them seem like attributive use of a noun, but not all. DCDuring (talk) 14:23, 4 February 2024 (UTC)

Category:LANG nouns with other-gender equivalents
Hello Benwing. I hope that this does not take too much of your time. How should CAT:Telugu nouns with gendered forms be added to MOD:te-headword? I tried looking at MOD:hi-pa-headword, but could not figure out what and where to add the equivalent of:
 * lua

to MOD:te-headword. I noticed that this feature was missing for Telugu when I saw
 * wiki

at the entry for. The Lua-fication of te-noun means adding features such as this is not as easy as adding

to te-noun. Kutchkutch (talk) 00:46, 5 February 2024 (UTC)
 * Adding
 * at the end of te-noun seems to work for categorisation but not for the headword line. Kutchkutch (talk) 00:59, 5 February 2024 (UTC)
 * @Kutchkutch Glad you figured it out. IMO Module:te-headword needs to be rewritten; it wasn't written by me and doesn't really follow the standard structure for such modules, which is probably why you had difficulty figuring out how to add the appropriate code. Benwing2 (talk) 22:33, 8 February 2024 (UTC)

Email
Btw, idk if you have notifications turned on for emails, but I sent you one. Vininn126 (talk) 22:24, 8 February 2024 (UTC)


 * Thanks, I responded. For some reason I didn't get an email notification here on Wiktionary even though I do have email notifications turned on. Benwing2 (talk) 22:32, 8 February 2024 (UTC)

bùzháodiào
Hello. Could you help me fix the Traditional Chinese conversion here? Thanks. ---> Tooironic (talk) 00:31, 11 February 2024 (UTC)


 * @Tooironic What's the exact issue? BTW in general I am not too familiar with how the Trad <-> Simp conversion works; User:Theknightwho knows more. Benwing2 (talk) 00:32, 11 February 2024 (UTC)


 * Thank you User:Theknightwho! ---> Tooironic (talk) 00:39, 11 February 2024 (UTC)

hmm
How much longer is it going to take you to finally finish making this new pronunciation module for Polish? You've been doing it for several months now, hurry up, or someone might think you're getting a little lazyyy :) Gugugagasraniewbanie (talk) 08:30, 13 February 2024 (UTC)


 * @Gugugagasraniewbanie Yeah it will happen soon. Benwing2 (talk) 08:32, 13 February 2024 (UTC)
 * OK, then you have my forgiveness Gugugagasraniewbanie (talk) 08:35, 13 February 2024 (UTC)

Mon-Burmese script
I changed some letters defined for specific languages (e.g. "X is a letter of the Shan alphabet") to that language (i.e. Shan), then added a request for definition to the translingual entry. If this is somehow considered vandalism, I'll revert myself, but I'm assuming obvious fixes like this are acceptable, and it parallels other entries that only have definitions for specific languages. (A definition might be as simple as stating that it's a letter of the Mon-Burmese script corresponding to a certain letter in Sanskrit, but I didn't do that myself as I thought I might be accused of vandalism.)

I also removed a couple pronunciations that were for the wrong entry. kwami (talk) 04:25, 14 February 2024 (UTC)


 * @Kwamikagami "Vandalism" doesn't seem like the right word for changes that are in good faith. As to whether they are wrong or counterproductive I don't know but they seem generally fine to me. User:RichardW57 do you have any comments? Benwing2 (talk) 04:45, 14 February 2024 (UTC)
 * Okay, "blockable offense" then. kwami (talk) 04:47, 14 February 2024 (UTC)
 * Yeah I understand. BTW I think blocking is only likely if you edit-war or keep making changes of a specific nature after people have objected to them. (Also editors who don't know what they're doing but think they do; editors of this nature can do a lot of damage.) Wikipedia seems generally more tolerant of edit-warring, maybe because of the number of editors relative to how many articles there are. Benwing2 (talk) 04:57, 14 February 2024 (UTC)
 * Which Shan alphabet? There are several Shan languages, which often makes the letters translingual because shared by several Shan languages!  The change seems backwards - I would have said that the thing to do was to waste space by adding the Shan entry.  As Burmese-script words easily consist of a single letter, cloning letters to each language using them makes Wiktionary more difficult to find by eye, in accordance with the apparent aim of difficulty of use. --RichardW57 (talk) 08:49, 14 February 2024 (UTC)
 * If there are other Shan languages besides [shn], and they use the same letter, then they should be listed. But as it was, they were not listed -- only [shn] was.
 * And yes, I know you want to lump all languages together, but that's not the consensus for Wikt. kwami (talk) 18:49, 14 February 2024 (UTC)
 * We have Shan (shn), Khamti Shan (kht), Aiton (aio), Phake (phk) and Tai Laing (tle) that use the Burmese script. The Tai Nuea (tdd) (= Tai Le /Tai Dehong / Chinese Shan) (not to be confused with Northern Tai or Northern Thai) and Tai Khuen (kkh) (though their speech is more akin to Northern Thai, but they identify as Shan) use different scripts.  There's also Khamyang (ksu or nrr).  Tai Ahom should arguably be included, but again it has its own script. --RichardW57 (talk) 23:32, 14 February 2024 (UTC)
 * And when we say a letter is used by [shn], do we necessarily know that it's also used by the others? E.g. in Lik-Tai for Khamti? The label "Shan" may cover multiple languages in some usage, but when Wikt has an entry for Shan [shn], we mean specifically that language. When we mean Khamti, we say Khamti. Etc. But sure -- if we can demonstrate that a letter is used by multiple languages, we can say that it's used for multiple languages. Though when giving the pronunciation and orthographic rules, we need to be careful not to present [shn] as representative if it isn't. kwami (talk) 01:23, 15 February 2024 (UTC)

Seeking template help
Hi, we find your Hindi language templates very helpful. Could you assist us with essential Sylheti templates (language code: syl) on English Wiktionary? We could contribute with translations, although we are still familiarizing ourselves with Wiktionary policies. -- ꠢꠣꠍꠘ ꠞꠣꠎꠣ (talk) 07:52, 16 February 2024 (UTC)


 * @ꠢꠣꠍꠘ ꠞꠣꠎꠣ Hi, I'm up to my ears in requests so I won't be able to get to this soon, although if someone else wants to work on it using the Hindi modules as a starting point, I can provide guidance. Benwing2 (talk) 09:55, 16 February 2024 (UTC)

Category:Romance terms inherited from Latin nominatives
Hi. Sorry, I think I was a bit too 'bristly' with how I responded earlier. I really do support removing these categories and sticking the relevant content into 'Appendix: Romance terms plausibly inherited from Latin nominatives'. Nicodene (talk) 17:21, 18 February 2024 (UTC)


 * @Nicodene This sounds good to me and "plausibly" sounds like a good term to use, and I apologize if I also was a bit in-your-face. If you can write the appendix and put the terms there in a list, I can remove the categories from the terms by bot. Benwing2 (talk) 19:56, 18 February 2024 (UTC)
 * Done. This should actually make it easier for me to reorganise/restructure it all, which I've been meaning to do. Nicodene (talk) 20:39, 18 February 2024 (UTC)
 * @Nicodene Thanks! Benwing2 (talk) 00:45, 19 February 2024 (UTC)
 * @Nicodene I am going to remove the pages listed in the appendix from the '... inherited from Latin nominatives' categories. Just checking that this is OK with you. Benwing2 (talk) 04:56, 19 February 2024 (UTC)
 * Yes, go for it please. Nicodene (talk) 05:01, 19 February 2024 (UTC)
 * @Nicodene OK it's done. BTW the appendix is looking good and I'm glad you have included detailed notes. Benwing2 (talk) 05:26, 19 February 2024 (UTC)

Macrolanguages
Hi - do you have any ideas for how we could handle macrolanguages in the data (Chinese being the most obvious example, given how we handle Chinese L2s). I’m not keen to create a whole new type of object, since this situation comes up in loads of places, as we don’t have a coherent distinction between “is a type of” and “is a descendant of”, leading to the issues I mentioned in WT:RFM, where Teochew and Leizhou Min are “descended from” Min Nan, whereas they’re actually types of Min Nan.

I suspect you’ve noticed similar things with how Persian and Latin are handled. One common situation which stands out are language periods: we list Old Latin as ancestral to Latin, but as it’s an etym-only language of Latin that technically means we’re saying it’s ancestral to itself. Same for Early Modern English and English, and so on. We get round it by adding an explicit check to Module:languages to prevent a language being ancestral to itself, but that’s a kludge which is symptomatic of our poorly defined language model.

Also see the Japonic family tree at Category:Proto-Japonic language, where the periodisation of Japanese is all messed up because they’re all treated as etym-only languages part of Japanese, even though Early/Late Middle Japanese have Middle Japanese as their immediate parent. (They currently display in the wrong order, since Middle Japanese should not be listed before Early Middle Japanese if we were to follow the same system as Latin; the data is correct but Module:family tree is bugged.) A much bigger issue is that we imply Middle Japanese is split into three periods, and that the central period is somehow representative. This is confusing at best, and outright misleading to anyone who isn’t familiar with the nuances of our data modules. Theknightwho (talk) 18:29, 18 February 2024 (UTC)
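To make the self-ancestry problem concrete, here is a small Python sketch of the situation described for Old Latin and Latin, together with the kind of explicit check mentioned above. The data layout, the field names `parent` and `ancestors`, and the codes used are simplified assumptions for illustration; this is not the actual structure of Module:languages.

```python
# Hypothetical, simplified language data. "parent" marks an etym-only
# variety as a type of a full language; "ancestors" lists what it descends from.
LANGUAGES = {
    "itc-pro": {"name": "Proto-Italic", "ancestors": []},
    "itc-ola": {"name": "Old Latin", "parent": "la", "ancestors": ["itc-pro"]},
    "la": {"name": "Latin", "ancestors": ["itc-ola"]},
}


def full_language(code):
    """Follow "parent" links up to the full language a variety belongs to."""
    while "parent" in LANGUAGES[code]:
        code = LANGUAGES[code]["parent"]
    return code


def self_ancestral(code):
    """Return the listed ancestors that are really varieties of `code` itself."""
    return [a for a in LANGUAGES[code]["ancestors"] if full_language(a) == code]


print(self_ancestral("la"))  # ['itc-ola']
```

A check like `self_ancestral` only detects the symptom. Separating "is a type of" from "is a descendant of" in the data would make it unnecessary.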


 * @Theknightwho Since you have merged etym-only and full languages to the point that both are more or less just types of Language objects, can we not just have a "type" field identifying something as a macrolanguage? That way it will still work as a language for most purposes. IMO we do need to properly distinguish is-a-X and is-a-descendant-of-X, and it seems you've provided a way with the ancestors field. As for the issue of Old Latin vs. Latin, we do have a "Classical Latin" etym language and ultimately we need to push more in this direction, although it will require some thinking. These are just the thoughts off the top of my head. Benwing2 (talk) 19:54, 18 February 2024 (UTC)
 * @Benwing2 Thanks - that's helpful to think about.
 * I'd rather not have a specific macrolanguage field, since it's superfluous to whether or not something is set as being a "type of" that language. I think the handling of Chinese, Latin, Persian, English and (one I missed above) Norwegian should probably all be done in the same way. At the most extreme end, the Sinitic family and Chinese are in fact the same thing, so I'm more inclined towards having a way to set one language as a type of another (as we do with etym-only languages), fully merging etym-only languages into languages, and then having a flag which sets whether it should be treated as a full language. That way, we also get rid of the weird half-and-half situation going on with Classical Persian and the arbitrary distribution of Chinese lects between language and etym-only language, while making it more straightforward to switch something from one to the other (e.g. the Prakrits). It may also be worth doing the same with families, since (as Chinese shows) macrolanguages and families are basically the same thing in most situations.
 * I think we probably need some kind of periodisation mechanism. In the case of Latin, if we're treating Old Latin as a "type of" Latin, then strictly speaking Latin's ancestor should be Proto-Italic. However, within that we could have the various periods, including Classical Latin, and there should be a way to set a default period for situations when only the generic language code is provided. For most languages that would be the standard language; in the case of Latin, it would be Classical. This would also potentially address the issue of cross-overs between regional lects and periods (e.g. Northern Early Modern English), and should also help avoid the silly Japanese situation, since it should be possible to nest periods inside each other. Theknightwho (talk) 20:10, 18 February 2024 (UTC)
 * @Theknightwho All this sounds good to me in general although it would be helpful if you could write out your proposals in more detail as it's sometimes a bit hard for me to work out what your thoughts are when presented abstractly. Benwing2 (talk) 20:31, 18 February 2024 (UTC)
 * @Benwing2 Will do. I’ll also have a think about how we should handle this in the family tree display, since a lot of the confusion stems from that displaying descendants and variants/types in exactly the same way. Theknightwho (talk) 20:52, 18 February 2024 (UTC)
 * One problem that needs to be addressed is that language change doesn't always follow a tidy tree model. Macrolanguages are messy. A macrolanguage always has a standard lect that the other lects identify with- but there can be more than one, and which lect is the standard can change over time. Even some of the more complex ordinary languages have similar phenomena. This can end up being reflected in the history of languages both within and deriving from the (macro)language.
 * With English, you have the same language changing its prestige/standard dialect several times in Old English due to the rise and fall from prominence of specific kingdoms: Anglia, Mercia, Northumbria, and finally Wessex (this is off the top of my head- I'm sure I missed something). With the transition to Middle English it all moved to London. Middle English borrowed heavily from Old Northern French, but since then the source has been Parisian French. Scots split off from the northern dialects that descended primarily from Northumbrian. I'm sure there were changes in the Old Norse dialects that Old English and Middle English borrowed from, and then there's the matter of Brythonic Pictish and Goidelic Gaelic in Scotland and their influence on Scots and northern English.
 * China had several changes in which were the prestige lects, and these are reflected in the various named yomi in Japanese, as well as the borrowings into other neighboring languages. Then there's Mycenaean Greek, which is different from whatever became Ancient Greek, and the fact that older Latin borrowings didn't come from the Attic dialect that became modern Greek, and Tsakonian that came from Doric, etc.
 * If you look at a regional lect, you can find things descended directly from the same region in the ancestral language, and things that came in from the standard lects of the different historical stages, and other things that were borrowed from various external languages. Sometimes separate languages split off from these regional lects, so they have more in common with the regional varieties of the main language than with the standard lects of any historical period.
 * To stretch the tree analogy a bit: sometimes a limb that's touching the ground sets root and becomes a tree in its own right, and other times branches or roots from separate trees graft together after prolonged contact.
 * I seem to have written a book here, but I hope you can see what I'm getting at. It would be a good idea to think about some way of representing the internal structure of macrolanguages and even regular languages, and the way that different descendants can come from different parts of the same language. There's a complex interchange between region and historical period, so the Wessex dialect of today has a completely different status from the Wessex dialect of a thousand years ago, and the geographical identification of what's mainstream and what's dialectal changes over time. It's all secondary to the main concept of parent and daughter language, but it might help us with some exceptional cases like Chinese. Chuck Entz (talk) 23:15, 18 February 2024 (UTC)
 * Agreed. Even Anglo-Norman, the main vehicle of 'Gallicisms' in Middle English, began as a chaotic hodge-podge of Old French dialects, certainly in many respects 'northern-flavoured', but not only, and increasingly slanting towards (but never quite attaining) Central French norms as the centuries went by. In this case as well there is no question of a precise dialectal ancestry. Nicodene (talk) 14:34, 19 February 2024 (UTC)
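The "type of" and default-period proposal made earlier in this thread can be pictured with a small data model. The following is only a rough Python sketch of one possible reading of that proposal, not the structure of Module:languages: the class `Lect`, its field names, the two helper functions, and the labels such as "old-latin" are all hypothetical placeholders rather than real Wiktionary codes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lect:
    code: str
    type_of: Optional[str] = None          # "is-a-X" relation
    ancestors: List[str] = field(default_factory=list)  # "is-a-descendant-of-X"
    full_language: bool = True             # flag: treat as a full language?
    default_period: Optional[str] = None   # used when only the generic code is given


def resolve(code: str, lects: Dict[str, Lect]) -> str:
    """Follow default_period links from a generic code to a specific lect."""
    seen = set()
    while lects[code].default_period and code not in seen:
        seen.add(code)  # guard against accidental cycles
        code = lects[code].default_period
    return code


def ancestors_of(code: Optional[str], lects: Dict[str, Lect]) -> List[str]:
    """Return a lect's own ancestors, else those of whatever it is a type of."""
    while code is not None:
        lect = lects[code]
        if lect.ancestors:
            return lect.ancestors
        code = lect.type_of
    return []


# Placeholder data mirroring the Latin example from the discussion.
lects = {
    "proto-italic": Lect("proto-italic"),
    "latin": Lect("latin", ancestors=["proto-italic"],
                  default_period="classical-latin"),
    "old-latin": Lect("old-latin", type_of="latin", full_language=False),
    "classical-latin": Lect("classical-latin", type_of="latin",
                            full_language=False),
}
```

In this reading, asking for plain "latin" resolves to the Classical period, while "old-latin" inherits Proto-Italic as its ancestor through its "type of" link instead of being listed as an ancestor of Latin itself.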

Italicising synonyms for taxonomic names
Hi Benwing. Could you edit Module:form of, Module:form of/templates, and/or T:synonym of to add the ability to italicise the linked-to term in transclusions of (preferably by calling i), please? Such functionality is needed for taxonomic synonyms. ATM, work-arounds like those seen in Asclepias filiformis var. buchenaviana, Bulbophyllum buchenavianum, Gomphocarpus filiformis var. buchenavianus, Megaclinium buchenavianum, and Tropaeolum buchenavianum are necessary. 0DF (talk) 00:38, 19 February 2024 (UTC)
 * who would know how this is handled in other taxonomic entries. Chuck Entz (talk) 01:08, 19 February 2024 (UTC)
 * Now, syn of (and alt form of, possibly others) suppresses italics formatting that taxlink provides or direct or piped wikitext formatting. All we would need is templates like syn of and alt form of to handle embedded wikitext for italics, as is now possible in other templates that incorporate links. Alternatively, something like syn of, say taxsyn (also taxalt), would have all the formatting capabilities of taxlink, which include not italicizing terms like "var.", "section" ("sect.", "subsect"), "subg.", and "subsp." in taxonomic names. This would probably not involve too much renaming of templates at this point. DCDuring (talk) 13:58, 19 February 2024 (UTC)
 * And it would be nice to allow † to appear without requiring pipes. DCDuring (talk) 14:37, 19 February 2024 (UTC)


 * I assume it would be possible to include the non-italicising functionality of in  by making it contingent upon both mul and 1 being true. I can't imagine a case in which one would want to define a term as a synonym of something translingual that contains any of the strings, , , , or ; italicise it; and for that term not to be a taxonomic name. 0DF (talk) 14:38, 19 February 2024 (UTC)
 * The italicization rules of the various taxonomic bodies include that all taxonomic names (ie, any rank) of viruses, bacteria, and archaebacteria be italicized. It is probably simpler to use passed-through wikitext italics than to duplicate taxlink functionality. DCDuring (talk) 14:47, 19 February 2024 (UTC)


 * I only meant 's functionality of automatically de-italicising those few abbreviations. Italicising dependent on parsing the taxon (as a species, genus, phylum, or whatever) seems superfluous and unnecessarily complicated for ; 1 should be all that's necessary. 0DF (talk) 14:59, 19 February 2024 (UTC)
 * It seems too complicated to me too, but I've often been surprised with what our techno-mavens are willing to do, for reasons that remain mysterious to me. Simply passing through wikiformatting (and, possibly, "†") would be fine with me. It would be easy enough to find the relatively few instances we would have of improper handling of those not-to-be-italicized terms in syn of, alt of, and the various etymology templates, too. DCDuring (talk) 19:06, 19 February 2024 (UTC)


 * How would you want the obelus to be treated? 0DF (talk) 22:42, 19 February 2024 (UTC)
 * Directly in front of taxon, ignored for linking, but displayed without being italicized. DCDuring (talk) 12:42, 20 February 2024 (UTC)


 * De-italicising would be handled in the same way as it's handled for, , , , and , I expect. Stripping † from the link text would be easy (handled in the same way 🇨🇬 link to 🇨🇬), but it may end up being enacted in undesirable circumstances. Do we need a new ( ?) language code for taxonomic names, perhaps? 0DF (talk) 18:06, 20 February 2024 (UTC)
 * I'd prefer a shorter one, of course, like 'mult' or 'mul-t'. DCDuring (talk) 18:27, 20 February 2024 (UTC)


 * How much freedom do we have in devising language codes? 0DF (talk) 18:30, 20 February 2024 (UTC)
 * You'd have to get consensus at WT:RFM for it. I wouldn't hold my breath. —Mahāgaja · talk 18:48, 20 February 2024 (UTC)


 * Thanks for the response. I mean, rather, what restrictions are there on the form that language codes take? I know we use ISO 639-3 codes where they're available, but what about custom, in-house codes? 0DF (talk) 20:17, 20 February 2024 (UTC)
 * @0DF @Mahagaja @DCDuring We actually already have  as a variant of Translingual (no idea when it got added, but see Module:etymology languages/data). I don't think it's used for anything at the moment, but it would make sense to use it for this. Theknightwho (talk) 20:25, 20 February 2024 (UTC)


 * Thank you. How 'bout it? 0DF (talk) 20:29, 20 February 2024 (UTC)
 * I always fear that the cure will turn out worse than the disease. Can it all be done automagically or will there be a few hundred exceptions? It is true that mul in Latin script is hard to confuse with mul in CJKV. DCDuring (talk) 20:37, 20 February 2024 (UTC)
 * Daniel Carrero added  "for test purposes" back in November 2016; -sche then standardized it to  . I don't know what he was testing, but the code is there for anyone who wants to use it. —Mahāgaja · talk 20:38, 20 February 2024 (UTC)


 * BTW, why are discussions like this conducted on a userpage rather than, say, BP or GP? Does that just reflect where the power is? DCDuring (talk) 20:42, 20 February 2024 (UTC)


 * I looked at the histories of Module:form of, Module:form of/templates, and . They showed me that Benwing had done a lot of editing on all three, so I figured would be sufficiently familiar with those pages to make the changes I requested. There's nothing suspicious about that and I hardly see how I can be said to have "power" here. 0DF (talk) 00:33, 21 February 2024 (UTC)
 * It's a habit of exclusion, not an intent of exclusion. Specific folks can always be pinged. DCDuring (talk) 14:31, 21 February 2024 (UTC)


 * I guess so. Not that I intended the request to turn into a prolonged discussion. 0DF (talk) 15:12, 21 February 2024 (UTC)

Error handling with Module:parameters and Module:languages
Hiya - just a heads up (and you've probably noticed already), but I've recently updated Module:parameters to allow languages, scripts, families (etc) as data types, as well as a few other things. This means that the argument table which is returned contains the relevant object(s), and invalid codes will throw an error (which automatically highlights the incorrect parameter). This avoids having to manually handle invalid codes, since the only way to do proper error-handling previously was to pass the ready-baked parameter into Module:languages using 's   parameter, which was tricky when dealing with lists etc. Having converted a number of template modules, it's also cut down on code length by quite a bit, too.

Ideally, we should be able to remove error handling from Module:languages and Module:scripts altogether at some point, since it doesn't really belong there, and it's annoying having to work around it when requesting etymology langs and families, too. Theknightwho (talk) 15:21, 27 February 2024 (UTC)


 * @Theknightwho Yup I did notice it, thanks. I haven't had a chance to use the new functionality but it sounds good to me. BTW if you haven't already done this you might consider adding support for comma-separated lists of lang codes and for a term with a preceding language code (see in Module:parse utilities, which implements this latter functionality currently). Benwing2 (talk) 20:01, 27 February 2024 (UTC)
 * @Benwing2 I've already done the comma-separated list actually, but haven't updated the documentation since I want to make sure the implementation is stable/won't need further expansion. The solution I opted for was, where   splits the list using  , but using a string value allows for other splits. The other thing which isn't yet documented is  , which is for parameters that take an (ideally small) closed set of values, where inputs with other values would be nonoperative anyway.
 * I'll have a think about how to handle preceding langcodes. Theknightwho (talk) 20:07, 27 February 2024 (UTC)
 * @Theknightwho The set support is definitely useful. Note that the corresponding flag in Python's module is called choice, which might possibly be a clearer name (although I can see the argument for using  as well). Benwing2 (talk) 20:16, 27 February 2024 (UTC)
 * @Benwing2 That makes sense. The reason I opted for  is because it uses the {a = true, b = true, c = true} format, since that makes lookup much faster/simpler. Theknightwho (talk) 20:26, 27 February 2024 (UTC)
 * @Theknightwho Hmm, I wonder if that isn't false economy since it requires more typing, and I imagine a lot of people will call listToSet on a list to handle this format. Benwing2 (talk) 20:28, 27 February 2024 (UTC)
 * @Benwing2 That's a good point, but checking a list is the same amount of work as doing listToSet, so changing Module:parameters to accept a list would simply guarantee the worst-case scenario, instead of leaving it up to the calling module. Theknightwho (talk) 20:34, 27 February 2024 (UTC)
 * @Theknightwho I suppose but the actual difference in memory and speed is completely negligible, so IMO you might as well make it easier for the callers. Benwing2 (talk) 20:54, 27 February 2024 (UTC)
 * And also you don't have the overhead of loading a new module. Benwing2 (talk) 20:54, 27 February 2024 (UTC)
 * @Benwing2 If I have time, I might do some profiling on Module:parameters, since I have a feeling it's contributing a significant chunk to page loading time. e.g. a loads about a second faster since I made the changes, and there are still quite a few other optimisations that could be made. Theknightwho (talk) 21:02, 27 February 2024 (UTC)
 * @Theknightwho OK but I still think requiring the use of a set rather than (also) allowing a list is a micro-optimization since the number of items should be small. Benwing2 (talk) 21:10, 27 February 2024 (UTC)
 * @Benwing2 Alright - I can change it. Theknightwho (talk) 21:16, 27 February 2024 (UTC)
 * @Theknightwho & Ben: pardon the partial threadjacking, but I've been wanting to ask you two about the practicality of adding parameter checking to existing, non-Lua templates, and this seems like an opportune moment while you're both already thinking about Module:parameters. I'm envisioning something like an unobtrusive template 1,2,3,foo,bar,baz that could be added to existing templates to generate errors/warnings when the template is invoked with any params besides those listed. On the backend, it could just call Module:parameters.process with the list of supplied params and then do nothing with the result. Ignoring the difficulty of identifying the valid parameters and cleaning up all the existing calls with invalid parameters, would adding param checking to every template add an unacceptable overhead to page processing? JeffDoozan (talk) 01:45, 28 February 2024 (UTC)
 * @JeffDoozan I think User:Theknightwho can best answer the question about efficiency as he's done a lot more investigations of this sort. Benwing2 (talk) 01:48, 28 February 2024 (UTC)
 * @JeffDoozan That's certainly doable, but it would add an extra Lua burden to those templates, and in many cases it would be more straightforward to do the whole thing in Lua anyway.
 * The reason why it concerns me is that a lot of these mixed templates already make multiple calls into Lua to retrieve things like language names, and there is an inherent cost every time a module is invoked; this is the reason why multitrans is so effective, because it removes that inherent cost from each template. Aside from memory costs, each invocation is quite time-consuming (relatively speaking), since a ton of things are done by the back-end to create each new Lua environment. Theknightwho (talk) 01:48, 28 February 2024 (UTC)
 * Thank you for the explanation. I had naively assumed that if a page calls Lua once, then subsequent calls would be relatively cheap. I'm still assuming that most pages include few enough templates that the benefit of having parameter checking outweighs the cost of invoking the checks, but as pages get bigger and closer to memory/speed limits, the calculus may change. Do you have any guess where that tipping point might be? (100 additional calls? 1,000? 10,000?) For pages that exceed that threshold, maybe allowparams could check the pagename against a fixed "denylist" of problematic pages before invoking Lua. I'm assuming the denylist would be < 100 pages and could be programmatically generated from an XML dump by counting the number of templates that would call allowparams. What do you think? JeffDoozan (talk) 17:39, 28 February 2024 (UTC)
 * @JeffDoozan So conventional wikicode would probably preclude that being workable, because there's the post-expand include size limit of 2MB, which is calculated by adding up the size of every page accessed, multiplied by the number of times it's accessed, and on top of that, parser functions like #if: actually apply a multiplier to anything that goes through them (which compounds, though I think it's capped at something like x12). This was a big problem we ran into with the lite templates, where the bottom 10% of a simply wasn't loading templates anymore. Even now, it's using about 1.8MB of the limit. Obviously I'm being really pessimistic when I say these things, but the irony of it is that adding these kinds of checks to aid large pages can end up having the opposite of the intended effect!
 * The things that help are:
 * Reducing the number of calls into Lua. If it can be done in one invoke that's ideal, but really it should be no more than 5. This includes uses of any templates which themselves are Lua based (like l), since they each result in independent calls into Lua. The Coptic conjugation templates are a great example of why this matters, since they're way slower than water/translations despite having nowhere near as many links.
 * Not creating complex wikicode logic with the parser functions (like we do with the citation templates, for example). They're really slow, a pain in the neck to maintain, and inevitably result in lots of separate Lua invocations for basic information like language names.
 * In terms of the parameter checking, let me know if there are any templates which are on your priority list, because it may be that we can score some quick-wins by converting some of them into pure-Lua, whereas with others the manual parameter checking may be workable. Theknightwho (talk) 17:51, 28 February 2024 (UTC)
 * @Theknightwho That kind of deep information is exactly why I wanted to run this by you. Since I'm hoping to do this programmatically and en masse, it would be limited to templates where I can parse the code to find all of the parameters used, which eliminates anything already calling #invoke since the invoked module can make its own use of the parameters and I'm not sure how practical it is to try to determine the parameters used by a Module. I think this means that every modified template would mean 1 additional call to Lua for every use and also that there's likely little or no benefit to converting them to Lua. How many total Lua calls on a page is too many?
 * I would probably start with the templates that don't already have calls with bad parameters, which probably means the lesser used templates that might not even be included on our biggest pages. I can check which templates are used on pages with more than X template calls and exclude those templates from the mass conversion, to ensure we're not adding additional stress to our biggest pages. I understand that not all template calls are equal, but is there some reasonable number of template calls I could use for detecting "big" pages? 100? 500? 1000? JeffDoozan (talk) 20:34, 28 February 2024 (UTC)
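The parameter-checking idea discussed in this thread can be sketched outside of Lua. What follows is a minimal Python analogue, not the Module:parameters API: the function name `unknown_params` and the sample arguments are hypothetical. It accepts the allowed names as either a list or a set and converts them once to a set (the listToSet point raised above), then reports every supplied parameter that is not allowed.

```python
from typing import Iterable, List, Mapping


def unknown_params(allowed: Iterable[str], supplied: Mapping[str, str]) -> List[str]:
    """Return the supplied parameter names that are not in `allowed`, sorted."""
    # Accept a list or a set; a single conversion keeps each lookup O(1).
    allowed_set = frozenset(allowed)
    return sorted(name for name in supplied if name not in allowed_set)


# Hypothetical template call containing one misspelled parameter name.
call_args = {"1": "en", "2": "word", "fooo": "x"}
bad = unknown_params(["1", "2", "3", "foo", "bar", "baz"], call_args)
```

For the small closed sets discussed here, the cost of converting a list on each call is negligible, which is the argument made above for letting callers pass a plain list.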

"terms spelled with"
Hi, I would like to bring your attention to categories such as Category:Hindi terms spelled with ॉ. We seem to have decided that ◌ (U+25CC) should not be used for the Hindi combining characters, but Translingual doesn't seem to know about that, which is why Category:Translingual terms spelled with ◌ॉ exists. What should we do about that? --kc_kennylau (talk) 16:47, 28 February 2024 (UTC)


 * @Kc kennylau Can you explain further about U+25CC? What is its replacement? As for the "terms spelled with" categories, AFAIK these categories are suppressed for one-character entries but this entry seems to involve two Unicode chars. Maybe User:Theknightwho can comment more as he reworked the code to generate these categories. Benwing2 (talk) 02:40, 29 February 2024 (UTC)
 * U+25CC is usually used with combining characters (see Category:Translingual terms spelled with ◌̺, which is U+25CC followed by U+033A) in order to display the character. However, due to some unknown reasons, at least in my browser the Hindi combining characters in "isolation" already come with a dotted circle when they are rendered, so using U+25CC would create two dotted circles when displayed. I tried to look at The Unicode Standard, but so far it seems to me that this is not really specified one way or another, at least not specifically for Devanagari. This is why I don't really know if we should include U+25CC or not. --kc_kennylau (talk) 02:48, 29 February 2024 (UTC)

(moved to Beer parlour/2024/April) --kc_kennylau (talk) 00:59, 4 April 2024 (UTC)

Latin macronization change: veho, vē̆xī, vectum
Hello, I was just looking into the vowel length of Latin vē̆xī (perfect of ) and it looks like most recent sources think there's a good chance that it had a long vowel like Sanskrit ávākṣam (although there is some uncertainty). I edited the entry for with notes on this and to mark the vowel in the perfect stem as ē̆, but of course, that doesn't affect all the inflected forms and derived compounds (e.g. advehō, convehō, invehō, prōvehō, subvehō, trānsvehō, ēvehō). Could you have Wingerbot update those? (The long vowel seems to only be reconstructed for the perfect stem vē̆x-, not the supine stem vect-). I hope it's not too much trouble. I have also been wondering how I might set up a bot account of my own to make changes like this after editing the length of a vowel in Latin entries; if that's feasible for me to do, any tips would be welcome! Urszag (talk) 20:46, 1 March 2024 (UTC)


 * @Urszag Hi. I'll go ahead and fix these. As for setting up a bot account, in order to do that (a) you need to be able to write Python scripts, (b) you do some small test runs using your own account and verify that everything works, (c) you set up a vote to create an account for your bot using the link in WT:Votes. I recommend using a combination of pywikibot to interface to Wiktionary and mwparserfromhell to parse the template invocations on a given page. Note that there's also AutoWikiBrowser which lets you make semi-automated changes based on regular expressions and takes less work to set up than a bot account; I used this several years ago before I set up a bot account. (It is only supported on Windows but it seems to work OK through Wine on MacOS, and there's also a JavaScript browser variant called JWB.)
 * BTW are there any other macron changes you need done? I think there's an outstanding request somewhere in my archives that I never got to, possibly it was from you. Benwing2 (talk) 01:49, 2 March 2024 (UTC)
 * Done. Benwing2 (talk) 05:18, 2 March 2024 (UTC)
 * OK, I found the previous request. It was from you in April 2023: User talk:Benwing2/2023. You mentioned hirtus, hirsutus, luxus, luctor. The relevant part of the input to my script has this:

 # hīrtus
 a1 hīrtus
 pn2 Hīrtius
 a1 hīrsūtus
 a1 hīrtellus
 a3 hīrtipēs hīrtiped
 # lūctor
 v1+ lūctor
 n1 lūcta
 n3 lūctātiō
 n3 lūctātor
 v1+ adlūctor
 v1+ allūctor
 v1+ collūctor
 n3 collūctātiō
 v1+ conlūctor
 n3 conlūctātiō
 v1+ ēlūctor
 a3 ēlūctābilis
 a3 inēlūctābilis
 v1+ relūctor
 n3 relūctātiō
 # lūxus "dislocated"
 a1 lūxus
 n4 lūxus
 v1+ lūxō
 * Do all these need to change to ī̆ ū̆? Are there any words missed here? Also can you give me the appropriate changelog comment(s) to have the bot add when making the changes? The default is but that's obviously wrong for these cases. Benwing2 (talk) 05:30, 2 March 2024 (UTC)
 * Thanks! Those all look correct with ī̆ ū̆. I would add lūxuria, lūxuriō, lūxuriōsus, lūxuriēs, obluctor.


 * In addition, it looks like I missed some inflected forms of derivatives of nūbō, nūpsī, nū̆ptum when I made that change (e.g., ). Specifically, there's innūbō, inflected forms of innū̆ptus, nū̆ptia, nū̆ptiae, nū̆ptiālis, nū̆ptus (It seems I just edited the main entry for these), and connūbium and its inflected forms.


 * I just made a new change to the perfects of alliciō, allē̆xī (formerly marked as just long) and illicio, illexī and pellicio, pellexī (formerly marked as just short) to mark them as uncertain (it seems likely all three had the same quality, probably short). These just need the inflected verb forms updated.


 * The references I'm basing these on are cited at the pages for, , , , , , so I think one option is to add notes of the format "Vowel length marked as uncertain based on references cited at ", and so on. Or the specific references could be listed as follows. Hirt- and lux-: uncertain based on Bennett (long) vs. De Vaan (short). Luct-: uncertain based on Bennett (long) vs. De Vaan, Wartburg, Buchi and Schweickard (short, with complications). Allex-: uncertain based on Bennett, Buck and Allen. Nupt-: uncertain based on Lewis and Bennett (long) vs. De Vaan, Ernout and Meillet, Wartburg and Bienvenu (short). -nubium: uncertain per Kennedy. -licio, -lē̆xī: uncertain per Bennett and Buck, "probably short" per Allen.--Urszag (talk) 15:13, 2 March 2024 (UTC)
 * @Urszag Done. Note that there also exists conubialis, which is currently indicated with long ū. Not sure if this needs ū̆. Benwing2 (talk) 06:18, 4 March 2024 (UTC)
 * Thank you! Yes, conubialis seems to be like conubium.--Urszag (talk) 06:36, 4 March 2024 (UTC)
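The bulk change requested in this thread, marking a long vowel as uncertain (ē becoming ē̆) in every form built on a given stem, comes down to a string transformation that can be tested before any bot framework is involved. The sketch below is an illustration only, not the script Wingerbot runs: the function names `mark_uncertain` and `update_form` are hypothetical, and it uses only Python's standard `unicodedata` module.

```python
import unicodedata

COMBINING_MACRON = "\u0304"
COMBINING_BREVE = "\u0306"


def mark_uncertain(stem: str) -> str:
    """Add a breve after the first macron in `stem` (e.g. vēx -> vē̆x)."""
    # Decompose so that the macron is a separate combining character.
    decomposed = unicodedata.normalize("NFD", stem)
    pos = decomposed.find(COMBINING_MACRON)
    if pos == -1:
        raise ValueError(f"no long vowel in {stem!r}")
    marked = decomposed[: pos + 1] + COMBINING_BREVE + decomposed[pos + 1 :]
    return unicodedata.normalize("NFC", marked)


def update_form(form: str, long_stem: str) -> str:
    """Replace `long_stem` inside `form` with its uncertain-length variant.

    The stem should include the consonant after the long vowel, so that
    forms which are already marked no longer match and are left alone.
    """
    form = unicodedata.normalize("NFC", form)
    long_stem = unicodedata.normalize("NFC", long_stem)
    return form.replace(long_stem, mark_uncertain(long_stem))
```

Restricting the replacement to the perfect stem (vēx-) leaves supine forms in vect- untouched, which matches the request above. An actual bot would still need to apply this to template arguments and page titles rather than to raw page text.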

Category:Hijazi Arabic terms with IPA pronunciation - Alphabet order
how can you change the alphabet order of the Hijazi Arabic letters from to since پ and ڤ are additional letters and not part of the Alphabetical order عربي-٣١ (talk) 12:39, 2 March 2024 (UTC)


 * @عربي-٣١ Are you referring to the sort order as it appears on category pages? The thing is, those additional letters are letters even if they aren't part of the standard Hijazi alphabet, and they need to be sorted *somewhere*. The "to chart" you gave doesn't include them anywhere. Benwing2 (talk) 22:47, 3 March 2024 (UTC)
 * Oh NVM, you want them placed at the end. Benwing2 (talk) 22:48, 3 March 2024 (UTC)
 * @Theknightwho @Fenakhay What do you think about this? It looks to me like there is no explicit sort key currently specified for Hijazi Arabic (nor for Egyptian and Gulf Arabic). Standard Arabic has a sort key but only for one Judeo-Arabic character, and Moroccan Arabic has a sort key of some sort that has no comments so I'm not sure what it's doing. IMO we should strive to treat all varieties of Arabic the same as much as possible, e.g. in using the same sort order everywhere as much as feasible; the additional letters correspond to /p/ and /v/, which are marginal phonemes in most varieties of Arabic (with the possible exception of /p/ in Iraqi varieties?). (Also per Wikipedia's article, there are two different ways of writing /v/ in the Arabic script, corresponding to an East-West split.) Benwing2 (talk) 03:06, 4 March 2024 (UTC)
 * @Benwing2 Well they are additional variants of letters (foreign letters) and should be included at the end of this list, since they are already included as the last when you check the pages in any of the Arabic dialects sorting pages, also the Arabic sorting key should be from right to left as with the rest of Arabic dialects (not from left to right as it is now in Category:Arabic terms) عربي-٣١ (talk) 16:01, 13 March 2024 (UTC)
 * @عربي-٣١ Sounds good to me but can you post about this in the Beer parlour (WT:BP) to make sure no one objects? Benwing2 (talk) 18:05, 21 March 2024 (UTC)

Replacement of quotation templates
Hi, when you have time could you please do the following quotation template replacements?



Thank you! — Sgconlaw (talk) 13:45, 3 March 2024 (UTC)


 * @Sgconlaw Done. Benwing2 (talk) 22:47, 3 March 2024 (UTC)
 * Thanks! — Sgconlaw (talk) 11:28, 4 March 2024 (UTC)
 * By the way, was the replacement also done? I couldn’t tell; maybe none of the entries it’s used in are on my watchlist. If so I’m changing the template to swap around the 1 and 2 parameters so that the template is in line with other templates. — Sgconlaw (talk) 11:37, 4 March 2024 (UTC)
 * @Sgconlaw Yes. There were only a few pages using those params though. Benwing2 (talk) 21:08, 4 March 2024 (UTC)
 * OK, great. — Sgconlaw (talk) 22:04, 4 March 2024 (UTC)

Bugs in ar-conj/module:ar-verb
Hi. I want to inform you about a couple of problems in ar-conj/module:ar-verb. I already informed Fenakhay about them, and I'll also inform you just in case; perhaps you can sort it out. I'm sorry in advance for my post being this long:

when I was looking for entries on حَيَّ/حَيِيَ (root ح ي و), I saw long present tense alone (يَحْيَا) still being generated for short form, and it doesn't generate the short one (يَحَيُّ), which exists per Lisan al-Arab:. Needs to be fixed to generate short present tense.

Also a related problem is for عَيَّ/عَيِيَ (root ع ي ي), while the conjugation table for long form عَيِيَ will be generated with specified paradigm i/a with long present يَعْيَا, unlike with حَيَّ, conjugation table for عَيَّ won't be generated at all. Btw, it also has short version of present: يَعْيُّ:

Also notice how participles aren't generated at all for حَيَّ/حَيِيَ (should be short and long versions: حَيّ and حَيِيّ). Fixmaster (talk) 20:45, 5 March 2024 (UTC)

Bugs in ar-conj/module:ar-verb (part 2)
Also notice how participles aren't generated at all in conjugation tables for حَيَّ/حَيِيَ (should be short and long versions of active participles: حَيّ and حَيِيّ). Same goes for عَيَّ/عَيِيَ (should be عَيّ/عَيِيّ per dictionaries).

And if you generate the conjugation table with عَيِيَ (don't forget, the table for عيَّ won't generate at all), there will be participles, but in the wrong form: عَايٍ for the active and مَعْيُوّ for the passive.

Btw, speaking of passive participles, what should they be? In the almaany online dictionary, I found مَحْيىّ and مَعِيّ respectively. Notice how the patterns don't match? In any case, they could probably be ignored; those passives are mostly theoretical and impersonal anyway. I just thought it was worth mentioning.

What matters is the ability to generate the conjugation table at all for the short verb عَيَّ, as we have for حَيَّ; the short present tense for these two (يَحَيُّ and يَعَيُّ), which currently isn't generated; and generation of the short/long active participles (حَيّ/حَيِيّ and عَيّ/عَيِيّ).

Just as a side note: maybe there should be parameters in the template to forcefully override active/passive participles (like we have the parameter for verbal nouns)? Just an idea. Fixmaster (talk) 20:41, 5 March 2024 (UTC)

About categories
Feedback on categories from a not-so-clever reader, if you allow me. I find categories at en.wikt very complex and unpatrolled (many were started by someone and then left untouched). Some of them are broken into such specialised subcategories that one cannot find a wanted word, e.g. dog in Cat:en:Animals. Is there an index=1 kind of category index (all members a...z)? We have done this at @el.wikt.Animals, plants, medicine with a different colour. Just 3 or 4 Cats. The little ««« links to the overall Cat for all languages. Also! The code indicator for topics makes alphabetisation and comprehension impossible: why should a reader know the codes? If a first word is to be avoided, why not the style Cat:Animals (English)? Thank you for listening. ‑‑Sarri.greek ♫ I 03:31, 6 March 2024 (UTC)


 * @Sarri.greek You've brought up several points and this is a big topic. Can you bring this up in the Beer Parlour? Most of the basic decisions concerning category structure predate me and we'd need consensus to institute any significant changes. Benwing2 (talk) 03:35, 6 March 2024 (UTC)


 * Here, I am not an admin; it is not my place to bring such things up for discussion, and my understanding of en.wikt structure and modules is not adequate. Sir, I have been thinking at el.wikt (where my admin colleagues, mostly Wikipedians, demanded that I stop, for being too autocratic... True: I cannot stand sloppiness, lack of refs, loose CFI etc. :) But the same is perhaps valid for all wiktionaries: 20 years have passed. Basics (plus details too) are covered. What now? I think a general workpage is needed for: a. Feedback on the current state. b. Future plans for the formation of crews on each subject. Cleanup, reviewing, and unifying: cats, params, templates. Leadership: vote plans by Xadmin, by Zadmin; people responsible for carrying out the plan and supervising the crews. If you organise a room /wikt.Future or something... and subpages for Cats, for Temps etc... we could all bring ideas? Plus, a very important thing: en.wikt is now the leader of all wiktionaries, the one every little wikt. copies from. IF you had to design a wiktionary from scratch, how would you go about it? Because now it is a patchwork procedure: adding and correcting in a maze of things... Hhhhh I talk too much too! Sorry ‑‑Sarri.greek ♫ I 04:01, 6 March 2024 (UTC)
 * @Sarri.greek I think in a wiki it's impossible to do everything top-down. It has to be done through consensus. Also I don't think we need a separate wikt.Future discussion forum or anything; that's what the Beer Parlour is for. There's no need to be an admin to initiate a discussion for change, just go ahead. Benwing2 (talk) 04:32, 6 March 2024 (UTC)

Adding a category with multiple subcategories
Hi, I'd like to add categories to track calls to templates with bad parameters, but I haven't touched categories before, so I wanted to double-check that this is a reasonable idea and that I'm going about it the right way. I think I need to create a parent category and then use a handler for the per-template categories. Since these would be maintenance categories, I would edit Module:category tree/poscatboiler/data/wiktionary maintenance and insert:

 -- add the variable handlers at the top of the page (the file doesn't currently use any handlers)
 local handlers = {}

 --- snip ---

 raw_categories["Pages using bad params when calling a template"] = {
     description = "Pages that use unrecognized parameters when calling a template.",
     additional = "These template calls should be reviewed and corrected or removed",
     breadcrumb = "Bad template params",
     parents = {"Wiktionary maintenance"},
     can_be_empty = true,
     umbrella = false,
     hidden = true,
 }

 table.insert(handlers, function(data)
     local template = data.label:match("^Pages using bad params when calling (.+)$")
     if template then
         return {
             description = "Pages that use unrecognized parameters when calling " .. template .. ".",
             additional = "These template calls should be reviewed and corrected or removed",
             breadcrumb = template,
             umbrella = false,
             parents = {"Pages using bad params when calling a template"},
         }
     end
 end)

 -- add HANDLERS to the existing return table
 return {RAW_CATEGORIES = raw_categories, HANDLERS = handlers}

I know I can do something similar using template tracking, but I'm trying to make this a little more "user friendly" with the hope that it won't just be me cleaning up these categories. Is there an overhead cost to using categories like this or anything else I should take into consideration? Thanks! JeffDoozan (talk) 21:04, 8 March 2024 (UTC)


 * @JeffDoozan Yup, this approach will work, although you need a few changes: (1) use a raw handler instead of a regular handler (because the category in question doesn't begin with a language name), and the first line of the handler should use `data.category` instead of `data.label`; (2) you don't need the `umbrella` settings because raw categories don't have corresponding umbrella categories. Other than that everything looks good. Benwing2 (talk) 21:18, 8 March 2024 (UTC)


 * After adding categorization to ~300 templates that are used less than 5 times and called at least once with invalid parameters, I think it would be easier for cleanup if the templates were categorized into "language" templates and "general use" templates, like this:
 * Category:Pages using bad params when calling a template
 ** Category:Pages using bad params when calling Finnish templates
 *** Category:Pages using bad params when calling Template:fi-decl-hame-dot
 ** Category:Pages using bad params when calling general use templates
 *** Category:Pages using bad params when calling Template:cite-av


 * To do that, I came up with the following code:

 raw_categories["Pages using bad params when calling a template"] = {
     description = "Pages that use unrecognized parameters when calling a template.",
     breadcrumb = "Bad template params",
     parents = {"Wiktionary maintenance"},
     can_be_empty = true,
     hidden = true,
 }

 table.insert(raw_handlers, function(data)
     local template_type = data.category:match("^Pages using bad params when calling (.+) templates$")
     if template_type then
         return {
             description = "Pages that use unrecognized parameters when calling " .. template_type .. " templates.",
             breadcrumb = template_type,
             parents = {"Pages using bad params when calling a template"},
             hidden = true,
         }
     end
 end)

 table.insert(raw_handlers, function(data)
     local template = data.category:match("^Pages using bad params when calling (.+)$")
     if template then
         local template_name_without_namespace = template:gsub("^Template:", "")

         -- Check if the template name starts with a hyphenated language code
         local lang
         local possible_language_code = template_name_without_namespace:match("^([a-z][a-z][a-z]?-[a-z][a-z][a-z])-")
         if possible_language_code ~= nil then
             lang = require("Module:languages").getByCode(possible_language_code)
         end

         -- Check if the template name starts with a two or three character language code
         if lang == nil then
             possible_language_code = template_name_without_namespace:match("^([a-z][a-z][a-z]?)-")
             if possible_language_code ~= nil then
                 lang = require("Module:languages").getByCode(possible_language_code)
             end
         end

         local template_type
         if lang == nil then
             template_type = "general use"
         else
             template_type = lang:getCanonicalName()
         end

         return {
             description = "Pages that use unrecognized parameters when calling " .. template .. ".",
             additional = "These template calls should be reviewed and the bad parameter should be corrected or removed.",
             breadcrumb = template,
             parents = {"Pages using bad params when calling " .. template_type .. " templates"},
             hidden = true,
         }
     end
 end)


 * Am I just re-inventing umbrella categories? Is there a better way to do this? Would this add unnecessary overhead to the categorization system? JeffDoozan (talk) 22:28, 15 March 2024 (UTC)
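One point about the two handlers above is easier to see in a small model. The Python sketch below is not the category tree code; it only imitates the two patterns, on the assumption that handlers are tried in order and the first one that returns a result wins. `KNOWN_LANGUAGES` is a hypothetical stand-in for the `getByCode` lookup. Because the per-template pattern also matches the "... templates" categories, the more specific handler has to be registered first, as it is in the proposal.

```python
import re

# Hypothetical stand-in for the language lookup; the real module would call
# require("Module:languages").getByCode instead.
KNOWN_LANGUAGES = {"fi": "Finnish"}

PREFIX = "Pages using bad params when calling "


def template_type_for(template):
    """Return the language name for a language-prefixed template, else 'general use'."""
    name = re.sub(r"^Template:", "", template)
    # Try a hyphenated code first, then a plain two or three letter code.
    for pattern in (r"^([a-z]{2,3}-[a-z]{3})-", r"^([a-z]{2,3})-"):
        m = re.match(pattern, name)
        if m and m.group(1) in KNOWN_LANGUAGES:
            return KNOWN_LANGUAGES[m.group(1)]
    return "general use"


def type_handler(category):
    """Handle the intermediate '... <type> templates' categories."""
    m = re.fullmatch(re.escape(PREFIX) + r"(.+) templates", category)
    if m:
        return {"breadcrumb": m.group(1), "parents": [PREFIX + "a template"]}
    return None


def template_handler(category):
    """Handle the per-template categories (the pattern is deliberately broad)."""
    m = re.fullmatch(re.escape(PREFIX) + r"(.+)", category)
    if m:
        template = m.group(1)
        parent = PREFIX + template_type_for(template) + " templates"
        return {"breadcrumb": template, "parents": [parent]}
    return None


def dispatch(category, handlers):
    """Return the result of the first handler that recognizes the category."""
    for handler in handlers:
        result = handler(category)
        if result is not None:
            return result
    return None
```

With the handlers in the proposed order, `dispatch(PREFIX + "Template:fi-decl-hame-dot", [type_handler, template_handler])` gets the Finnish category as its parent, while `Template:cite-av` falls through to "general use". Swapping the order would make the per-template handler claim the "Finnish templates" category itself.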

A couple of code replacements
Hi, as part of the Min Nan split, would it please be possible for you to bot replace a couple of the codes which are being deprecated? The only places these are now used should be links, which should make the switch straightforward. Thanks. Theknightwho (talk) 01:12, 11 March 2024 (UTC)
 * 1) Hokkien:   →   (etym-only to full language conversion)
 * 2) Teochew:   →   (code standardisation within the   family)


 * @Theknightwho Sure, will do. Benwing2 (talk) 01:30, 11 March 2024 (UTC)
 * Thanks. Theknightwho (talk) 01:34, 11 March 2024 (UTC)
 * @Theknightwho Does the code still exist? I can't find any references to it in the language data. Benwing2 (talk) 01:34, 11 March 2024 (UTC)
 * @Benwing2 It's currently set up as an alias, but that's just a temporary thing. I recently changed the way aliases are handled so that they're no longer directly integrated into the data, because (a) that added overhead we don't need most of the time, (b) it makes keeping track of aliases easier by collating them all in one place, (c) it means we can use them for situations like this, where a code is being changed for whatever reason, and (d) we can now use them for full languages without having to complicate the language data (see point c). They're now stored in Module:languages/data at the bottom. Theknightwho (talk) 01:37, 11 March 2024 (UTC)
 * @Theknightwho Ahh, thanks. Benwing2 (talk) 01:41, 11 March 2024 (UTC)
 * @Benwing2 Btw, it does mean the integration isn't quite as smooth as before, since you now can't use aliases for anything that accesses the language data directly as the alias is only looked up during the creation of a language object. In practical terms, that just means they can't be used anywhere in the language data itself (e.g. the ancestors field). That was semi-intentional, though, since we don't really want aliases in the first place. Theknightwho (talk) 01:45, 11 March 2024 (UTC)
 * @Theknightwho Yeah that is fine. I agree we should eliminate aliases as much as possible, and in fact I did that previously with a bunch of random etym-only aliases. Benwing2 (talk) 01:47, 11 March 2024 (UTC)
 * @Benwing2 I've just added a check to Module:data consistency check for alias codes, which covers the data for languages, etym-only languages, families and scripts: all it does is check that none of the subtables has multiple keys (e.g. due to someone adding, which is the old way aliases were handled).
 * The only ones it's found at the moment are for various Arabic script codes, where I consolidated all the ones that had identical tables a while back. Working out what to do with them will need a proper discussion, though. Theknightwho (talk) 02:43, 11 March 2024 (UTC)
 * @Theknightwho Yeah I've never been very happy with having a bunch of language-specific script codes for Arabic and certain other scripts. However, I'm not sure whether it's possible to eliminate them (or some of them) using things like language selectors in CSS. Maybe User:This, that and the other and/or User:Erutuon can comment more. Benwing2 (talk) 02:48, 11 March 2024 (UTC)
 * @Theknightwho I did a replacement run for both codes but as the tracking categories were only added yesterday, it will take longer to flush out all the old usages (indeed I now see 8 new pages in the nan-hok category and 3 in the zhx-teo category). Benwing2 (talk) 03:22, 11 March 2024 (UTC)
 * Thanks. Theknightwho (talk) 05:38, 11 March 2024 (UTC)
 * @Theknightwho I'll do another run tomorrow. Benwing2 (talk) 05:41, 11 March 2024 (UTC)
 * @Theknightwho Did another run. Going to bed now but will do another one tomorrow evening; hopefully that will catch any stragglers. Benwing2 (talk) 08:42, 11 March 2024 (UTC)
 * Sounds good - thanks. Theknightwho (talk) 08:43, 11 March 2024 (UTC)
 * @Theknightwho I did two runs, one just now and one about 10 hours ago, and already more have appeared, so it may be a few days before everything catches up and there are no more additions to the tracking categories. Benwing2 (talk) 07:50, 12 March 2024 (UTC)
 * @Theknightwho I went through CAT:Terms derived from Hokkien and CAT:Terms derived from Teochew recursively and changed all the terms in them as well as remaining tracked terms (including uses in rfp and cog and such). I *THINK* this is done now; probably close enough that you can delete the old codes and handle any remaining errors as they occur. Benwing2 (talk) 22:40, 12 March 2024 (UTC)
 * @Benwing2 Thanks - I caught one, but that looks to be it. Theknightwho (talk) 18:49, 13 March 2024 (UTC)
 * I have also wondered why we use those special lang+script codes for the  and   scripts. Perhaps they date from a time when no other solution was well-supported enough to deliver different fonts for different languages. I note that   and   specify different fonts for different languages with CSS alone, so it is clearly possible to do it that way. (Not too sure what is going on with  ...) This, that and the other (talk) 03:50, 11 March 2024 (UTC)
 * Maybe either of you could comment. If we can replace things like with just the appropriate language selectors in MediaWiki:Gadget-LanguagesAndScripts.css I would rather do it that way and not expose what is essentially an implementation detail into the wikicode. Benwing2 (talk) 03:57, 11 March 2024 (UTC)
 * In the case of, it's been split because the code actually covers four closely related scripts: Mongolian (proper), [Oirat] Clear Script, Manchu and Xibe. It's a situation where the split exists to get more accurate language data, rather than because we need different CSS classes (though that may be something we want in the future; Manchu and Oirat-specific fonts exist, and I suspect Xibe as well). In each case, the character ranges only cover the characters used by those scripts; there's some overlap, but most are only used in a subset of the four. See  for a breakdown (note: Todo = Clear Script; Sibe = Xibe). (Edit: this distinction does matter in some cases, e.g. Sanskrit, which has  ,   and  .) Theknightwho (talk) 05:38, 11 March 2024 (UTC)
 * @Theknightwho could you update the Chinese entry at WT:LT, such as it is? This, that and the other (talk) 03:51, 11 March 2024 (UTC)
 * Done. Theknightwho (talk) 05:38, 11 March 2024 (UTC)
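For readers curious what a code replacement run of this kind involves, here is a minimal Python sketch. It is not the bot script that was actually used. It assumes the deprecated code appears as a whole positional template parameter, and the `nan-hok` to `nan-hbl` pair in `CODE_MAP` is only an illustration pieced together from codes mentioned elsewhere on this page.

```python
import re

# Illustrative mapping only; the real old and new codes are the ones
# requested in the thread.
CODE_MAP = {"nan-hok": "nan-hbl"}


def replace_codes(wikitext, code_map=CODE_MAP):
    """Replace deprecated codes that form an entire positional template parameter."""
    if not code_map:
        return wikitext
    alternatives = "|".join(re.escape(code) for code in code_map)
    # The code must directly follow a "|" and be followed by "|" or "}", so
    # running text and named parameters such as "lang=..." are left alone.
    pattern = re.compile(r"(?<=\|)\s*(" + alternatives + r")\s*(?=[|}])")
    return pattern.sub(lambda m: code_map[m.group(1)], wikitext)
```

For example, `replace_codes("{{cog|nan-hok|福建}}")` yields `{{cog|nan-hbl|福建}}`. A real run would also need to handle named parameters, and would need repeating until the tracking categories stop filling up, as described above.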

Module editing tutorials
Hi, would you be able to point me to some places where I can learn more about module creation and editing?

I'm self-taught in HTML which has served me fine for entries and templates, but there are quite a lot of things I would like to see done at the module level in Welsh (ways of presenting collective-singulative nouns, accounting for literary and colloquial forms in adjectives, a template for phrasal verbs, a template for generating IPA transcriptions) that at the moment are well beyond my abilities.

I'd also prefer not to bother other users by constantly asking them to do tasks for me when I could just learn to do it myself. Cheers, Arafsymudwr (talk) 16:45, 13 March 2024 (UTC)


 * @Arafsymudwr Sorry for the very belated response! The documentation on how modules work, as well as links to tutorials (under the "Getting started" section), is found in WT:LUA. The first thing you will need to do is learn something about Lua. If you are at all familiar with JavaScript, you will find Lua rather similar. When you make a change to a module, you should always test it before saving. The way to do that is to use the "Preview page with this template" functionality (a box near the bottom left) to preview one of the Welsh pages that uses the module. Start by making a small change and gradually make more extensive changes as you get more comfortable. Let me know if I can be of more assistance. Benwing2 (talk) 23:59, 19 March 2024 (UTC)

Min translations
Hi - following the renaming of various Min lects, could you please do the following name replacements in translation sections? They should all be nested under Chinese.
 * 1) Min Bei → Northern Min
 * 2) Min Dong → Eastern Min
 * 3) Min Zhong → Central Min
 * 4) Puxian → Puxian Min

I'm not including Min Nan, since all the translations have to be converted manually due to the split anyway, so changing them to Southern Min would just create confusion. Thanks. Theknightwho (talk) 21:26, 13 March 2024 (UTC)
 * @Theknightwho OK, I have an existing script to sort translations that I was able to modify to handle this. I will run it shortly. As for Min Nan in translation sections, I checked and there are 2,637 pages with Min Nan translations in them, so it will take a while to do this entirely by hand. I had hoped they would have a qualifier by them indicating the particular Min Nan lect, but usually that doesn't seem to be the case. The first two examples, from dictionary and rain cats and dogs, are typical:


 * Min Nan:, ,
 * Min Nan:
 * I know little about Min Nan but from what I've heard, I suspect the vast majority of them are Hokkien. It may be possible in any case to speed this up by looking up the terms in question to see whether the lect can be identified. For example, the four terms given above all have Pronunciation sections indicating that the transliterations in question are Hokkien (and some of them also have qualifiers). Some translations don't have transliterations given, but in that case as long as there is a Hokkien pronunciation given, I think it's fine to tag it as Hokkien. (Also I looked for Teochew translations and several of them are tagged as  or even, presumably because someone thought  stood for Min Nan.) Benwing2 (talk) 23:53, 13 March 2024 (UTC)
 * @Benwing2 Thanks - I've spent a couple of hours going over them so far, and I've already dealt with all the ones that were marked Teochew (including the one labelled, yeah). Out of the ones simply marked "Min Nan", I've only found one which was definitely Teochew, with the others all being Hokkien.
 * In terms of automating it, the safest thing to do would be to convert any which don't have numbered tones to Hokkien, leaving the rest for manual review (which will probably be <20).
 * There could plausibly be a handful which are in fact Teochew but have POJ-style (i.e. Hokkien-style) transliterations, but I don't think it's feasible to determine those, since it would be way too time-consuming to convert it to the correct romanisation and check against the entry for every single translation.
 * Theknightwho (talk) 00:02, 14 March 2024 (UTC)
 * Sounds good. For reference here is the complete list of Min Nan translations as of the Mar 1 dump that have numbered tones in them:

Page 872 four: Found match for regex: *: Min Nan: ,
Page 873 five: Found match for regex: *: Min Nan: ,
Page 1054 eight: Found match for regex: *: Min Nan: ,
Page 2107 percent: Found match for regex: *: Min Nan: 百分之
Page 2462 cousin: Found match for regex: *: Min Nan: 叔伯兄, 叔伯阿兄 , 叔伯小弟 , 叔伯阿姊 , 叔伯小妹 , 表兄 , 表小弟 , 表姊 , 表小妹
Page 2809 handmaid: Found match for regex: *: Min Nan: 女婢, tsa1-boo2-kan2
Page 4233 eyelash: Found match for regex: *: Min Nan: 目睭毛//目珠毛,, 目毛; 目眥毛
Page 4352 flesh: Found match for regex: *: Min Nan:
Page 5089 stiff: Found match for regex: *: Min Nan: liau1
Page 16449 aircraft: Found match for regex: *: Min Nan: 飞行器
Page 30166 gnash: Found match for regex: *: Min Nan: 咬牙切齒, 咬牙, 切齒
Page 31973 farmer: Found match for regex: *: Min Nan:, , , 农夫
Page 35994 cabbage: Found match for regex: *: Min Nan: 植物人
Page 38088 arsehole: Found match for regex: *: Min Nan: lan7-tsiau2-bin7, 臭面人
Page 43201 glove: Found match for regex: *: Min Nan: ,
Page 45493 reunion: Found match for regex: *: Min Nan: ui5-loo5 围炉
Page 45800 dung beetle: Found match for regex: *: Min Nan: 蜣螂, 屎龜,,  牛屎核
Page 48510 loess: Found match for regex: *: Min Nan:, 黄砂
Page 50690 troublesome: Found match for regex: *: Min Nan: lo1so1, lui1-lui1-tui1-tui1, 啰嗦
Page 50799 feud: Found match for regex: *: Min Nan: se3-siu5
Page 54507 sashimi: Found match for regex: *: Min Nan: 刺身
Page 64034 shove: Found match for regex: *: Min Nan: long1, lang1, nng1
Page 67068 vulva: Found match for regex: *: Min Nan: 陰門, 外阴
Page 76097 shirk: Found match for regex: *: Min Nan: liu1-kiang1
Page 104634 halfway: Found match for regex: *: Min Nan: 半路
Page 106660 thimble: Found match for regex: *: Min Nan: 鍼黹,, 指套 , 頂針, 銅指
Page 106811 spacious: Found match for regex: *: Min Nan: 阔, khuann3-long1-long1
Page 125580 K2: Found match for regex: *: Min Nan: K2 Hong
Page 179602 disadvantageous: Found match for regex: *: Min Nan: put4-li7
Page 335793 Wiktionary:Beer parlour/2007/April: Found match for regex: :::*:Min Nan: (Amoy) 囡仔 (gín-á); (Teochew) 孥囝 (nou5gian2)
Page 1357199 Wiktionary:Beer parlour/2009/May: Found match for regex: :: That's 1 then. The child has 3 levels. Is it really necessary? Can we keep to 2 levels? For example, ** Min Nan: 囡仔 (gín-á), 孥囝 (nou5gian2) (Teochew)? Anatoli 22:39, 10 May 2009 (UTC)

Benwing2 (talk) 00:58, 14 March 2024 (UTC)


 * Thanks. Theknightwho (talk) 00:59, 14 March 2024 (UTC)
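The triage rule suggested above (anything without numbered tones goes to Hokkien, the rest is left for manual review) can be sketched roughly as follows in Python. The regular expression is an assumption: it treats a run of Latin letters followed directly by a single digit from 1 to 8 as a numbered-tone syllable, and it is not the pattern used to produce the list above.

```python
import re

# A romanized syllable followed by a tone digit, e.g. "nou5gian2" or "put4-li7".
# The digit range 1-8 is an assumption made for this sketch.
NUMBERED_TONE = re.compile(r"[A-Za-z]+[1-8](?![0-9])")


def triage(translation_line):
    """Return 'manual review' if the line shows numbered tones, else 'Hokkien'."""
    if NUMBERED_TONE.search(translation_line):
        return "manual review"
    return "Hokkien"
```

A check this simple also flags lines such as the K2 entry in the list, where the digit is part of the term rather than a tone mark. That is harmless here, since flagged lines only go to manual review.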
 * @Theknightwho I am running my script now to change Min Dong -> Eastern Min and Min Bei -> Northern Min and re-sort appropriately (there were no translations involving Min Zhong or Puxian). A couple of questions:
 * Are you finished fixing up the pages with numbered tones in them that I mentioned above? If so once the script finishes I'll do a run to change Min Nan -> Hokkien in translations along with nan -> nan-hbl, and re-sort.
 * What about occurrences of Min Dong etc. in lb, tlb, zh-forms, q (occurring mostly in Synonyms sections), etc.? Do these need to be renamed? On rough count, there are 1,318 occurrences of Min Dong in lb, 279 in q, 48 in zh-forms and 20 in tlb. Counts for Min Bei are roughly similar, while there are only a few instances of Min Zhong and Puxian (without "Min").
 * Benwing2 (talk) 02:50, 14 March 2024 (UTC)
 * @Benwing2 Thanks.
 * Yes.
 * Yes. For things like labels etc., "Min Nan" should be changed to "Southern Min".
 * Theknightwho (talk) 04:12, 14 March 2024 (UTC)
 * @Theknightwho OK sounds good. #1 is running now. Benwing2 (talk) 04:14, 14 March 2024 (UTC)
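As a rough picture of the rename-and-re-sort step, the Python sketch below works on the nested lines of a single translation block. It assumes each line has the shape `*: Name: content` and that plain alphabetical order is good enough; the actual script and the sort order it uses may well differ. The `RENAMES` table is taken from the request at the top of this section.

```python
import re

RENAMES = {
    "Min Bei": "Northern Min",
    "Min Dong": "Eastern Min",
    "Min Zhong": "Central Min",
    "Puxian": "Puxian Min",
}

# A nested translation line: "*: <lect name>:<rest of the line>".
NESTED = re.compile(r"^\*: ([^:]+):(.*)$")


def rename_and_sort(nested_lines, renames=RENAMES):
    """Rename lect names in nested translation lines, then sort the lines by name."""
    entries = []
    for line in nested_lines:
        m = NESTED.match(line)
        if not m:
            raise ValueError(f"not a nested translation line: {line!r}")
        old_name = m.group(1).strip()
        entries.append((renames.get(old_name, old_name), m.group(2)))
    # Sorting after the rename matters: "Min Dong" and "Eastern Min" land in
    # different places in the list.
    entries.sort(key=lambda entry: entry[0])
    return [f"*: {name}:{rest}" for name, rest in entries]
```

For instance, a block containing Mandarin, Min Dong and Cantonese lines comes back as Cantonese, Eastern Min, Mandarin.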
 * @Theknightwho What about things like "[Cc]oastal Min" as occurs in zh-forms in 唐人 and in lb in 牛母? (I guess these need manual editing, as it appears Coastal Min can be any of Eastern, Southern or Puxian.) Benwing2 (talk) 04:19, 14 March 2024 (UTC)
 * See also  in 儂. Benwing2 (talk) 04:20, 14 March 2024 (UTC)
 * Not sure if this is useful but there are 203 occurrences of in the Mar 1 dump, which generally occur in surname:

Page 27803 Cu: Found match for regex: # of Hokkien Chinese origin Page 31307 Lao: Found match for regex: # of Chinese origin Page 54700 Dee: Found match for regex: #, most notably borne by: Page 68861 Kong: Found match for regex: # of Chinese origin Page 71443 Juan: Found match for regex: # of Chinese origin Page 80226 Chan: Found match for regex: # (Hokkien) of Chinese origin Page 80245 Chi: Found match for regex: # of Chinese origin Page 80288 Co: Found match for regex: # of Chinese origin Page 80532 Du: Found match for regex: # of Chinese origin, mostly around Cebu Page 80539 Dy: Found match for regex: # Page 80915 Go: Found match for regex: # of Chinese origin Page 81022 Haw: Found match for regex: # of Chinese origin Page 81061 Ho: Found match for regex: # of Chinese origin, most notably borne by: Page 81318 King: Found match for regex: # Page 81334 Ko: Found match for regex: # of Chinese origin Page 81420 Lee: Found match for regex: # Page 81515 Lu: Found match for regex: # of Chinese origin Page 81516 Lua: Found match for regex: # of Hokkien Chinese origin Page 81890 Ng: Found match for regex: # Page 82214 Po: Found match for regex: # of Chinese origin Page 82353 Que: Found match for regex: # of Chinese origin Page 82618 See: Found match for regex: # of Chinese origin Page 82665 Shaw: Found match for regex: # of Chinese origin Page 82674 Sia: Found match for regex: # of Chinese origin Page 82690 Sin: Found match for regex: #, most associated with former Archbishop of Manila, Page 82735 So: Found match for regex: # of Chinese origin, most notably borne by: Page 82750 Son: Found match for regex: # of Hokkien origin Page 82890 Sy: Found match for regex: # of Chinese origin Page 82930 Tan: Found match for regex: # of Chinese origin Page 82931 Tang: Found match for regex: #. 
Page 82949 Te: Found match for regex: # of Chinese origin Page 82956 Tee: Found match for regex: # of Chinese origin Page 83037 To: Found match for regex: # of Chinese origin Page 83141 Ty: Found match for regex: # of Chinese origin Page 83141 Ty: Found match for regex: # of Chinese origin Page 83396 Yap: Found match for regex: # of Chinese origin Page 83409 Young: Found match for regex: # of Chinese origin Page 83409 Young: Found match for regex: # of Chinese origin Page 121853 Tiu: Found match for regex: #, most notably borne by: Page 196098 Samson: Found match for regex: # common on Filipinos of Chinese ancestry Page 766971 Lew: Found match for regex: # of Chinese origin Page 825754 Anson: Found match for regex: # common on Filipinos of Chinese ancestry Page 825754 Anson: Found match for regex: # common on Filipinos of Chinese ancestry Page 1066196 Chu: Found match for regex: # of Chinese origin Page 1178407 Yao: Found match for regex: # of Chinese origin Page 1265062 Lim: Found match for regex: # Page 1265062 Lim: Found match for regex: # of Chinese origin Page 1265654 Cheng: Found match for regex: # of Chinese origin Page 1265730 Ang: Found match for regex: # of Chinese origin Page 1265732 Ong: Found match for regex: # of Chinese origin Page 1265733 Suan: Found match for regex: # of Chinese origin Page 1265734 Cua: Found match for regex: # of Chinese origin Page 1266900 Pua: Found match for regex: # of Chinese origin Page 1266901 Uy: Found match for regex: # of Chinese origin Page 1266918 Chua: Found match for regex: # of Chinese origin Page 1266924 Khoo: Found match for regex: # of Hokkien Chinese origin Page 1266970 Ching: Found match for regex: # of Hokkien Chinese origin or  of Cantonese Chinese origin, notably borne by: Page 1277675 Gan: Found match for regex: # of Chinese origin Page 1284142 Koa: Found match for regex: # of Chinese origin Page 1443955 Nga: Found match for regex: #. 
Page 1579807 Kang: Found match for regex: # of Chinese origin Page 2178085 Deang: Found match for regex: # of Chinese origin Page 2625641 Wee: Found match for regex: # of Chinese origin Page 2700666 Tin: Found match for regex: # of Chinese origin Page 2845428 Henson: Found match for regex: # common among Filipinos of Chinese ancestry Page 3305014 Yang: Found match for regex: # of Chinese origin Page 3750292 Lo: Found match for regex: # of Hokkien origin Page 4170429 Chung: Found match for regex: #, or (Hokkien) of Chinese origin. Page 4713793 Coo: Found match for regex: # of Hokkien Chinese origin, or  of Cantonese Chinese origin. Page 5112069 Sanson: Found match for regex: # common on Filipinos of Chinese ancestry Page 5152613 Kho: Found match for regex: #, most notably borne by: Page 5152613 Kho: Found match for regex: #, most notably borne by: Page 5159150 Kua: Found match for regex: # of Chinese origin Page 5171997 Yee: Found match for regex: # of Chinese origin Page 5375208 Yu: Found match for regex: #, the 26th most common in the Philippines Page 5375208 Yu: Found match for regex: #, the 26th most common in the Philippines Page 5404772 Ngo: Found match for regex: # of Chinese origin Page 5406204 Chong: Found match for regex: # of Cantonese Chinese origin, or  of Hokkien Chinese origin. Page 5406528 Tong: Found match for regex: # of Chinese origin Page 5406530 Chiu: Found match for regex: #, most notably borne by: Page 5410833 Leong: Found match for regex: # of Cantonese Chinese origin or  of Hokkien Chinese origin. 
Page 5411779 Pang: Found match for regex: # of Chinese origin Page 5413143 Ison: Found match for regex: # common on Filipinos of Chinese ancestry Page 5415076 Dizon: Found match for regex: # of Chinese origin, notably borne by: Page 5415076 Dizon: Found match for regex: # of Chinese origin, notably borne by: Page 5435565 Yung: Found match for regex: # of Chinese origin Page 5437599 Shao: Found match for regex: # of Chinese origin Page 5437924 Loo: Found match for regex: # of Chinese origin Page 5438022 Sison: Found match for regex: #. Page 5438022 Sison: Found match for regex: # common on Filipinos of Chinese ancestry, notably borne by: Page 5438104 Hau: Found match for regex: # of Chinese origin Page 5438288 Tian: Found match for regex: # of Chinese origin Page 5439278 Teng: Found match for regex: # of Hokkien origin Page 5442404 Ting: Found match for regex: # of Chinese origin Page 5453194 Tien: Found match for regex: # of Chinese origin Page 5512124 Tuazon: Found match for regex: #. Page 5512124 Tuazon: Found match for regex: # Page 5512124 Tuazon: Found match for regex: # common on Filipinos of Chinese ancestry Page 5514761 Goh: Found match for regex: #. Page 5514761 Goh: Found match for regex: # of Chinese origin Page 5538352 Niu: Found match for regex: #. Page 5538352 Niu: Found match for regex: # Page 5543775 Quiambao: Found match for regex: #. Page 5543775 Quiambao: Found match for regex: # of Hokkien origin, most notably borne by: Page 5558677 Lacson: Found match for regex: #. Page 5558677 Lacson: Found match for regex: # common on Filipinos of Chinese ancestry, most notably borne by: Page 5582383 Tecson: Found match for regex: # Hokkien Chinese, common among Filipinos of Chinese descent. 
Page 5582383 Tecson: Found match for regex: # of Hokkien Chinese origin, most notably descendants of ‘Tek Sun’ brothers from Guangzhou (Canton), China Page 5584134 Layson: Found match for regex: # common among Filipinos of Chinese ancestry Page 5586737 Cinco: Found match for regex: # common on Filipinos of Chinese ancestry Page 5586737 Cinco: Found match for regex: # common on Filipinos of Chinese ancestry Page 5614689 Soon: Found match for regex: # of Hokkien origin Page 5618852 Singson: Found match for regex: # common among Filipinos of Chinese ancestry Page 5636472 Gozon: Found match for regex: # common on Filipinos of Chinese ancestry Page 5646811 Gotamco: Found match for regex: # Page 5652715 Cayco: Found match for regex: # Page 5652718 Syson: Found match for regex: # Page 5652722 Layco: Found match for regex: # Page 5653661 Tengco: Found match for regex: # of Hokkien origin Page 5655949 Yuzon: Found match for regex: # Page 5655949 Yuzon: Found match for regex: # Page 5656631 Tiongson: Found match for regex: # common on Filipinos of Chinese ancestry Page 5656647 Cojuangco: Found match for regex: #, borne by a known political and business clan in the Philippines Page 5671242 Jocson: Found match for regex: # Page 5673469 Tiangco: Found match for regex: # Page 5674047 Quisumbing: Found match for regex: # of Chinese origin Page 5674054 Lichauco: Found match for regex: # Page 5676213 Locsin: Found match for regex: # of Chinese origin, most notably borne by: Page 5677430 Quizon: Found match for regex: # Page 5677430 Quizon: Found match for regex: # of Chinese origin, most associated with Dolphy, which bears the real name of Rudolf Quizon Page 5677431 Quimpo: Found match for regex: # of Chinese origin Page 5677431 Quimpo: Found match for regex: # of Chinese origin Page 5678951 Tangco: Found match for regex: # or Hokkien origin Page 5678980 Tiongco: Found match for regex: # of Hokkien origin Page 5678984 Guanzon: Found match for regex: # common among Filipinos of 
Chinese ancestry Page 5678991 Hizon: Found match for regex: # common among Filipinos of Chinese ancestry, most notably descendants of migrants from Macau to, , Page 5684485 Tiamson: Found match for regex: # or Hokkien origin Page 5686268 Tuason: Found match for regex: # of Hokkien Chinese origin, Page 5686671 Tio: Found match for regex: # of Chinese origin Page 5687329 Ganzon: Found match for regex: # or Hokkien origin Page 5689830 Pecson: Found match for regex: # Page 5689830 Pecson: Found match for regex: # of Hokkien origin Page 5690622 Siason: Found match for regex: # of Hokkien origin Page 5690623 Tiozon: Found match for regex: # of Hokkien origin Page 5691453 Unson: Found match for regex: # of Hokkien origin common among Filipinos of Chinese ancestry Page 5692143 Cuizon: Found match for regex: # of Hokkien origin Page 5692145 Suico: Found match for regex: # of Hokkien origin Page 5693840 Quimson: Found match for regex: # of Hokkien origin Page 5694341 Tancinco: Found match for regex: # of Hokkien origin Page 5696938 Ongkiko: Found match for regex: # of Hokkien origin Page 5696941 Sioson: Found match for regex: # of Hokkien origin Page 5700562 Bauzon: Found match for regex: # of Hokkien origin Page 5700580 Yatco: Found match for regex: # of Hokkien origin Page 5700589 Gancayco: Found match for regex: # of Hokkien origin Page 5700604 Limjoco: Found match for regex: # of Hokkien origin Page 5700656 Coquia: Found match for regex: # of Hokkien origin Page 5700659 Dijamco: Found match for regex: # of Hokkien origin Page 5700712 Ticzon: Found match for regex: # of Hokkien origin Page 5700939 Cosico: Found match for regex: # of Hokkien origin Page 5701342 Yuvienco: Found match for regex: # of Hokkien origin Page 5701354 Sangco: Found match for regex: # of Hokkien origin Page 5738755 Ayson: Found match for regex: # of Hokkien origin Page 5740882 Songco: Found match for regex: # Page 5764989 Leyson: Found match for regex: # common among Filipinos of Chinese ancestry 
Page 5769732 Kiamzon: Found match for regex: # Page 5769773 Sayson: Found match for regex: # common among Filipinos of Chinese ancestry Page 5773490 Sanciangko: Found match for regex: # Page 5773649 Guico: Found match for regex: # Page 5773673 Tanchoco: Found match for regex: # Page 5773685 Siongco: Found match for regex: # Page 5788737 Tayson: Found match for regex: # Page 5788738 Limcaoco: Found match for regex: # Page 5885208 Joson: Found match for regex: # Page 5889986 Tanseco: Found match for regex: # Page 5906982 Siao: Found match for regex: # of Chinese origin Page 5906982 Siao: Found match for regex: # of Chinese origin Page 5983082 Yongco: Found match for regex: # of Chinese origin Page 5983082 Yongco: Found match for regex: # of Chinese origin Page 6060762 Pacquiao: Found match for regex: # Page 6601914 Caw: Found match for regex: # of Chinese origin Page 6601919 Pueson: Found match for regex: # common on Filipinos of Chinese ancestry Page 6601923 Causon: Found match for regex: # common with Filipinos with Chinese ancestry Page 6601938 Quitson: Found match for regex: # common on Filipinos of Chinese ancestry Page 6601988 Auyong: Found match for regex: # of Chinese origin Page 6601989 Awyoung: Found match for regex: # of Chinese origin Page 6603830 Syaw: Found match for regex: # of Chinese origin Page 6603831 Shau: Found match for regex: # of Chinese origin Page 6603884 Hwan: Found match for regex: # of Chinese origin Page 6603960 Liong: Found match for regex: # of Cantonese Chinese origin, or  of Hokkien Chinese origin. Page 6603976 Mapua: Found match for regex: # of Hokkien Chinese origin, notably borne by: Page 6638858 Banzon: Found match for regex: # of Hokkien Chinese origin Page 7439359 Teh: Found match for regex: #. Page 7782052 Sitchon: Found match for regex: # common on Filipinos of Chinese ancestry Page 7782063 Itchon: Found match for regex: # common on Filipinos of Chinese ancestry Page 7849686 Tiong: Found match for regex: #. 
Page 7849688 Diong: Found match for regex: #. Page 7924413 Ngeh: Found match for regex: #. Page 8003694 Canoy: Found match for regex: # common among Filipinos of Chinese ancestry Page 8060607 Gueco: Found match for regex: # Page 8343774 Siocson: Found match for regex: # of Hokkien origin Page 8343781 Bengzon: Found match for regex: # of Hokkien origin Page 9058034 Quiason: Found match for regex: # common on Filipinos of Chinese ancestry Page 9058035 Quiazon: Found match for regex: # common on Filipinos of Chinese ancestry
 * As can be seen, these are almost all Min Nan, almost all Tagalog and some of them explicitly say "of Hokkien origin". Are these all Hokkien? If so I'll change them accordingly. Benwing2 (talk) 04:29, 14 March 2024 (UTC)
 * @Benwing2 Thanks for this re the surnames. The whole "of X origin" thing is totally superfluous imo, so should be deleted. If it explicitly says Hokkien somewhere then change it to that; it might also be possible to infer it from the etymology section, too. Any remaining ones should be left to manual review. Theknightwho (talk) 04:33, 14 March 2024 (UTC)
 * @Theknightwho All right, I'll do this. BTW some of them are already fixed; I randomly picked Siocson and User:Mlgc1998 fixed it 3 days ago. Benwing2 (talk) 04:36, 14 March 2024 (UTC)
 * @Benwing2 It's probably fine to keep Coastal Min in zh-forms. We should probably have proper categories set up for it, which categories like Category:Southern Min Chinese would be part of.
 * There's a whole issue with labels in Chinese entries causing a ton of duplication between the label categories and the lemma categories, but we've not come up with a satisfactory solution to it yet. Theknightwho (talk) 04:30, 14 March 2024 (UTC)
 * @Theknightwho Yeah, IMO things like Category:Hokkien Chinese should go away in favor of Category:Hokkien lemmas now that we have the latter. lb could be made to generate the latter category in place of the former but it doesn't seem like such a good idea as it wouldn't categorize correctly into the other categories. Benwing2 (talk) 04:34, 14 March 2024 (UTC)
 * Also IMO all label categories that refer to specific lects should have corresponding lang codes, either full or etym-only, and probably the etym-only categories added by the Pronunciation section instead of the lb. Note also that User:-sche proposed awhile ago renaming "etym-only language" to something else, which IMO is a good idea; they have gone far beyond being used only for etymologies. Benwing2 (talk) 04:39, 14 March 2024 (UTC)
 * Yeah, agreed. It's probably worth starting a thread on the BP about renaming etym-only languages, as the current name is really misleading. Theknightwho (talk) 04:50, 14 March 2024 (UTC)
 * Done. BTW it looks like "Min Nan" was already removed from all Tagalog etc. surnames; the only remaining instances of "from=Min" occurred in a few English surnames of Min Dong origin. I cleaned them up and removed the text "of Chinese origin" etc. following various surname invocations. The script to implement #2 above (correct "Min Dong", "Min Bei" etc. in labels/qualifiers/etc.) is running. Benwing2 (talk) 07:05, 14 March 2024 (UTC)
 * Task #2 is close to done; going to sleep now. There are still 6,406 occurrences of "Min Nan" in qualifiers, which my script didn't touch. The occurrences can be found here: User:Benwing2/qualifier-min-nan-1 and User:Benwing2/qualifier-min-nan-2 (split over two files because otherwise the files supposedly exceed the 2MB size; in fact the total file size is 1.2MB but there's that stupid doubler effect). Some of the qualifiers occur in Reference sections but the vast majority seem to occur in Synonyms and Antonyms sections. I am guessing again that the majority are Hokkien but I'm not sure, and generally the transliterations aren't attached. Here we might have to fall back on looking up the terms in question to see which lects they are listed as occurring in (which should be bottable, if you provide appropriate instructions). Benwing2 (talk) 08:31, 14 March 2024 (UTC)
 * @Theknightwho Let me know if you need help with any other renaming tasks that can be done or sped up by bot. I notice you're going through and renaming instances of "Min *" in comments, rfp params and other random places but there may be too many to do by hand. There were 17,750 pages satisfying the regex  as of the Mar 1 dump, and 12,222 remaining when I re-downloaded the same pages last night before running task #2. Task #2 changed 6,245 pages, meaning there might be on the order of 6,000 pages left, although I can check for sure by re-downloading the same pages. As I mentioned above, most of the occurrences are probably  occurring in qualifiers because my script didn't change them. Benwing2 (talk) 22:51, 14 March 2024 (UTC)
 * @Benwing2 Thanks. Yeah, I was just going through and renaming the various "Min Bei" and "Min Dong" labels, but noticed that "Min Nan" is used on thousands of pages. It's annoying, as it's the one where "Hokkien" is sometimes a more appropriate label. That being said, it's not wrong to put "Southern Min", so it would probably be helpful to change those automatically. Theknightwho (talk) 23:04, 14 March 2024 (UTC)
 * @Theknightwho See my comment above from last night. It's probably possible to figure out how to change Min Nan automatically to the right label by looking up the page in question to see what lects are listed on the page. If you want me to work on that I can although I'd need some instructions as to what lects to look out for. Benwing2 (talk) 23:10, 14 March 2024 (UTC)
 * @Benwing2 Yes please - @Justinrleung might be able to give better pointers than me. Theknightwho (talk) 23:12, 14 March 2024 (UTC)
 * @Theknightwho OK, I re-downloaded the relevant pages. There are 7,396 pages remaining satisfying the regex . Of these, 7,128 mention Min Nan; 39 mention Min Bei; 59 mention Min Dong; 22 mention Min Zhong; and 211 mention Puxian but only 15 of those mention Puxian using the regex , which excludes "Puxian Min". There are 8,195 total lines mentioning Min Nan (since some pages mention Min Nan more than once). Of these lines, 6,761 contain a qualifier and 6,593 specifically satisfy the regex  , i.e. a qualifier followed by a Chinese-style link. Of the 1,605 lines not satisfying  , 45 match   (a qualifier with a generic link); 111 contain thcwd or a variant (thcwda, thcwdq), almost all preceded by a Min Nan qualifier; 227 contain Min Nan inside of zh-forms; 21 contain Min Nan inside of zh-see; 105 contain Min Nan inside of zh-der, col3 or a variant; and 24 contain an occurrence of desc. Excluding all of these leaves 1,063 occurrences over 412 pages, of which 260 are outside of mainspace. So I think it should be possible to create a script to handle the   occurrences, and handle the remainder type-by-type in a semi-manual fashion. Benwing2 (talk) 00:02, 15 March 2024 (UTC)
 * @Benwing2 Sounds like a good plan. Thanks for doing this. Theknightwho (talk) 00:03, 15 March 2024 (UTC)
 * @Theknightwho FYI I also did a download run of those same pages checking for those now containing "Southern Min". There are 5,119 lines over 4,377 pages mentioning Southern Min, mostly in labels (as expected) but occasionally in other places that could stand to be reviewed. Benwing2 (talk) 00:05, 15 March 2024 (UTC)
 * @Theknightwho OK. Can you help me sketch out a general idea of what the qualifiers should be transformed into? For example, I randomly picked page 4445 天涯海角, which contains a synonym 天邊海角 labeled "Min Nan". This latter page has a label and it also has zh-pron. According to the documentation of zh-pron,  means Hokkien and the codes inside mean ="Mainland China (Xiamen, Quanzhou, Zhangzhou)", ="Jinjiang", ="mainstream Taiwan", for which a pronunciation is given. How much info do we want in the qualifiers? Is just "Hokkien" enough in this situation? In general, what lects should be specified in the qualifiers? Maybe just Hokkien, Teochew, Leizhou? Possibly also Quanzhou and/or Zhangzhou dialect if pronun is given for these dialects? This is where I need a bit of guidance from someone like you who knows the languages in question. Benwing2 (talk) 00:24, 15 March 2024 (UTC)
 * @Benwing2 I'd wait for Justin to comment, as I think you're really overestimating my knowledge. I've got a very broad understanding of what needs to be done, but my understanding of Module:nan-pron is relatively low, so I won't be much help in interpreting the input. Theknightwho (talk) 00:30, 15 March 2024 (UTC)
 * @Theknightwho OK. I had assumed you know the languages because you seem able to correctly split the lects; maybe you're just a fast learner ;) ... Benwing2 (talk) 00:40, 15 March 2024 (UTC)
 * I think for qualifiers of synonyms, etc., it can just be
 * "Hokkien" when there's only a Hokkien pronunciation, "Teochew" when there's only a Teochew pronunciation, etc., and we don't need to worry about the finer distinctions, which we will get with at the entry. If it's more than one Southern Min variety, we could either use the Southern Min label or list all the relevant Southern Min languages; I don't have a strong feeling about either way. — justin(r)leung { (t...) 01:38, 15 March 2024 (UTC)
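The rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual bot code: the function name `choose_qualifier`, the `list_all` flag and the fixed ordering of varieties are assumptions made for the example.

```python
# Hypothetical sketch of the qualifier rule for synonym/antonym links.
# The variety names come from the discussion; everything else is assumed.
SOUTHERN_MIN_ORDER = ["Hokkien", "Teochew", "Leizhou"]


def choose_qualifier(lects_with_pron, list_all=False):
    """Return the qualifier text for a link, given the Southern Min
    varieties that have a pronunciation on the target page."""
    present = [lect for lect in SOUTHERN_MIN_ORDER if lect in lects_with_pron]
    if not present:
        # Nothing to go on; leave the decision to manual review.
        return None
    if len(present) == 1:
        # Only one variety has a pronunciation: use its name directly.
        return present[0]
    # More than one variety: either list them all or fall back to the
    # umbrella label, depending on the (assumed) list_all switch.
    return ", ".join(present) if list_all else "Southern Min"
```

Either behaviour for the multi-variety case is compatible with the comment above, which expresses no strong preference between the two.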
 * @Justinrleung All right. What is the complete list of Southern Min varieties? Benwing2 (talk) 01:39, 15 March 2024 (UTC)
 * The currently supported varieties in are Hokkien, Teochew and Leizhou Min. Other than these, there's Hainanese as well as other varieties that haven't been dealt with (WT:RFM). — justin(r)leung { (t...) 01:46, 15 March 2024 (UTC)

I finished the script to convert Min Nan and Southern Min in qualifiers in Synonym/Antonym sections (and the like; whenever followed by a zh-l link). Out of 6,283 pages where it tried to do something, it was able to process 5,938, which is a pretty good record (94.5%). The breakdown of lects generated is as follows:
 * 5,418 Hokkien
 * 485 Hokkien|Teochew
 * 16 Hokkien|Teochew|Leizhou
 * 10 Hokkien|Leizhou
 * 9 Teochew
The script issued 663 warnings. They are here: User:Benwing2/min-nan-qualifier-conversion-warnings. One of you two might want to go through them. Note that 268 "may be ignorable" (meaning that the script was able to continue on and ultimately do something, despite the warning). Of the remaining 395, 276 are due to the link referring to a nonexistent page; you'd need domain knowledge to know which lect(s) are appropriate. This leaves 119, of which 50 are "Couldn't parse" errors (the line wasn't formatted in a standard fashion); 35 are "Couldn't find 'Min Nan' or 'Southern Min' qualifier" errors (the qualifier template says something like literary or Min Nan, Hakka or Cantonese, Min Nan rather than just "Min Nan"); 22 are "Saw multiple Etymology/Pronunciation sections" (in such a case, the code tries hard to figure out the correct lects, including using the gloss in the zh-l link and making sure there is more than one Etymology/Pronunciation section that refers to Min Nan and that the two sections have different lects in them); 5 are "Can't find Chinese section"; and 7 are some random misc stuff. I am going to run the script in save mode either tonight or tomorrow. Benwing2 (talk) 07:52, 15 March 2024 (UTC)
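As a quick sanity check on the figures quoted above, the arithmetic can be reproduced in a few lines of Python. The numbers are taken directly from the comment; the variable names and dictionary keys are just labels chosen for this sketch.

```python
# Figures quoted in the comment above; names are illustrative only.
lect_breakdown = {
    "Hokkien": 5418,
    "Hokkien|Teochew": 485,
    "Hokkien|Teochew|Leizhou": 16,
    "Hokkien|Leizhou": 10,
    "Teochew": 9,
}
pages_attempted = 6283
pages_processed = sum(lect_breakdown.values())
success_rate = round(100 * pages_processed / pages_attempted, 1)

total_warnings = 663
ignorable = 268
missing_page = 276
other_causes = {
    "could not parse": 50,
    "no Min Nan/Southern Min qualifier": 35,
    "multiple Etymology/Pronunciation sections": 22,
    "no Chinese section": 5,
    "miscellaneous": 7,
}

non_ignorable = total_warnings - ignorable   # warnings that blocked a change
needs_review = non_ignorable - missing_page  # excluding links to nonexistent pages

print(pages_processed, success_rate, non_ignorable, needs_review,
      sum(other_causes.values()))
# prints: 5938 94.5 395 119 119
```

The per-lect counts sum to the 5,938 processed pages, and the five residual warning types sum to the 119 left after excluding ignorable warnings and nonexistent pages.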


 * @Theknightwho This is running; maybe 1 to 1.5 hours and it will finish. Benwing2 (talk) 20:43, 15 March 2024 (UTC)
 * Cool - thanks. Theknightwho (talk) 20:47, 15 March 2024 (UTC)
 * BTW can zh-l be replaced by zh? I'm not sure any more what the Chinese-specific behavior in zh-l is. Maybe it's just automatic handling of traditional vs. simplified forms? Benwing2 (talk) 20:47, 15 March 2024 (UTC)
 * @Theknightwho Also maybe we can have the lect be specified using a lang code prefix instead of having it a separate qualifier. Benwing2 (talk) 20:48, 15 March 2024 (UTC)
 * @Benwing2 On that point, would it be possible to do a similar analysis for all uses of the  code used in the Thesaurus namespace? There are 483 uses at the moment, but conversion is slow as it requires a bunch of manual analysis. Some of them also have "Min Nan" in qualifiers, which will need revising as well. Theknightwho (talk) 20:52, 15 March 2024 (UTC)
 * @Theknightwho OK, I'll take a look. Benwing2 (talk) 20:54, 15 March 2024 (UTC)
 * @Theknightwho @Justinrleung For this purpose I think we (a) need to add the missing etym-only codes for Min Nan lects, and (b) we should include the specific lect and not just "Hokkien" in the lang prefix or qualifier. For example, I took a look at Thesaurus:打耳光 meaning "to slap someone in the face"; there are three synonyms labeled as well as two more explicitly labeled Zhangzhou Hokkien and Tainan Hokkien respectively. Of the three labeled, one is a red link, one is labeled Xiamen Hokkien and one is labeled "Quanzhou, Zhangzhou and Taiwanese Hokkien". Labeling the latter two just "Hokkien" would seem incomplete. Benwing2 (talk) 21:09, 15 March 2024 (UTC)
 * @Benwing2 The principle I've followed so far has been to use the most specific label which adequately covers everything at the target, where that's possible. So anything that's labelled (e.g.) "Xiamen Hokkien" would get the langcode, but something labelled "Quanzhou, Zhangzhou and Taiwanese Hokkien" would just get  . I agree with Justin that the labels for links aren't as important as those on the entries themselves, so incompleteness isn't the end of the world. When multiple lects are mentioned (e.g. Hokkien and Teochew), I've ditched the langcode altogether and put (e.g.) "Southern Min" as a qualifier. Theknightwho (talk) 21:12, 15 March 2024 (UTC)
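The "most specific label that covers everything at the target" principle amounts to finding the lowest common ancestor in a tree of lects. A minimal Python sketch follows, assuming a small hand-written parent table; the table and function names are hypothetical, and labels are used instead of language codes since the exact codes are not quoted here.

```python
# Hypothetical parent table; only varieties named in this thread are listed.
PARENT = {
    "Xiamen Hokkien": "Hokkien",
    "Zhangzhou Hokkien": "Hokkien",
    "Quanzhou Hokkien": "Hokkien",
    "Taiwanese Hokkien": "Hokkien",
    "Hokkien": "Southern Min",
    "Teochew": "Southern Min",
    "Leizhou Min": "Southern Min",
    "Southern Min": None,
}


def ancestors(lect):
    """Return the lect followed by its ancestors, most specific first."""
    chain = []
    while lect is not None:
        chain.append(lect)
        lect = PARENT[lect]
    return chain


def most_specific_cover(lects):
    """Return the most specific label covering every lect in `lects`."""
    lects = list(lects)
    if not lects:
        raise ValueError("need at least one lect")
    common = ancestors(lects[0])
    for lect in lects[1:]:
        others = set(ancestors(lect))
        # Keep only shared ancestors, preserving most-specific-first order.
        common = [a for a in common if a in others]
    return common[0]
```

For example, `most_specific_cover(["Quanzhou Hokkien", "Zhangzhou Hokkien", "Taiwanese Hokkien"])` gives "Hokkien", while a mix of Hokkien and Teochew gives "Southern Min", which in the practice described above is written as a qualifier rather than a langcode.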
 * Also, as an aside, we don't currently have an etym-only langcode for Taiwanese Hokkien, because it's not a well-defined lect in the way varieties like Xiamen, Zhangzhou and Quanzhou are; all three are spoken on Taiwan, but (for historical reasons) the Hokkien-speaking communities on Taiwan have undergone a lot more influence from Japanese and English than their equivalents on the mainland, so it makes sense to use that label sometimes. In those cases, just labelling them "Hokkien" isn't really a problem if it's just in the thesaurus entry. Theknightwho (talk) 21:20, 15 March 2024 (UTC)
 * @Theknightwho All right, let me look at a few more examples. While we're at it, what do you think of replacing the etym-only codes for the Hokkien varieties with ones conforming to the principles I laid out in WT:RFM? Since these codes are newly added I suspect they're barely used. This would mean nan-jnj -> nan-jin (Jinjiang Hokkien), nan-qzh -> nan-qua (Quanzhou Hokkien), nan-xmn -> nan-xia (Xiamen Hokkien), nan-zzh -> nan-zha (Zhangzhou Hokkien), nan-plp -> nan-qua-PH (probably) or nan-PH (possibly) or nan-phi (perhaps) (Philippine Hokkien). Benwing2 (talk) 21:44, 15 March 2024 (UTC)
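A bot-side rename of this kind could look roughly like the Python sketch below. Only the four unambiguous renames listed above are included; the Philippine Hokkien code is left out because several candidates are proposed. The helper name `rename_codes` and the sample template call are assumptions made for illustration.

```python
import re

# Old code -> proposed new code, as listed in the comment above.
RENAMES = {
    "nan-jnj": "nan-jin",  # Jinjiang Hokkien
    "nan-qzh": "nan-qua",  # Quanzhou Hokkien
    "nan-xmn": "nan-xia",  # Xiamen Hokkien
    "nan-zzh": "nan-zha",  # Zhangzhou Hokkien
}

# Match a code only when it is not part of a longer hyphenated code or word.
CODE_RE = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(code) for code in RENAMES) + r")(?![\w-])"
)


def rename_codes(wikitext):
    """Replace each old etym-only code in `wikitext` with its new form."""
    return CODE_RE.sub(lambda m: RENAMES[m.group(1)], wikitext)


sample = "{{l|nan-xmn|食}} and {{l|nan-plp|食}}"  # illustrative wikitext only
print(rename_codes(sample))
# prints: {{l|nan-xia|食}} and {{l|nan-plp|食}}
```

The lookarounds keep a longer code that merely starts with one of the old codes from being rewritten by accident.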
 * @Benwing2 I don't mind too much. I have a small preference for doing it syllabically rather than by the first letters of the name, but I don't mind if you want to use a standardised format for them.
 * There are sometimes instances where we won't be able to follow it, though (e.g. Category:South Dravidian I languages and Category:South Dravidian II languages, where I opted for  and , respectively). Theknightwho (talk) 21:48, 15 March 2024 (UTC)
 * @Theknightwho Yes, understood. BTW I wouldn't have an issue with something more syllabic than using the first three letters, it's just that it's not so easy to guess automatically what the right set of letters to use is in that case. (Actually the principle you followed for South Dravidian I/II *is* consistent with the principles I laid out, which call for using the initials of the lect when using the first three letters isn't practical.) Benwing2 (talk) 21:53, 15 March 2024 (UTC)
 * @Theknightwho I changed the language codes. I used for Philippine Hokkien. I think we can go ahead and use  for Taiwanese Hokkien, and create subvariety codes for the specific dialects that are derived respectively from Xiamen, Zhangzhou and Quanzhou (e.g.  etc.). I also modified Module:columns so that it can take a comma-separated list of prefixed lang codes, e.g.  and handle them appropriately (i.e. using the first one to create the term link but displaying all of them as qualifiers). I'm going to work on fixing up the Thesaurus entries now. Benwing2 (talk) 23:29, 16 March 2024 (UTC)
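The handling of a comma-separated list of prefixed lang codes can be pictured with the following Python sketch. The real logic lives in the Lua module and is not reproduced here: the `code1,code2:term` syntax, the function name and the default code `zh` are assumptions for illustration only.

```python
def parse_prefixed_term(item, default_lang="zh"):
    """Split an item such as 'code1,code2:term' into its parts.

    The first code is used for the link; all codes are kept so they can
    be displayed as qualifiers. Syntax and default are assumed here.
    """
    prefix, sep, term = item.partition(":")
    codes = [c.strip() for c in prefix.split(",") if c.strip()]
    # Treat the item as unprefixed if there is no colon, no usable code,
    # or the would-be prefix contains spaces (so it is probably just text).
    if not sep or not codes or " " in prefix.strip():
        return {"link_lang": default_lang, "qualifier_langs": [], "term": item}
    return {"link_lang": codes[0], "qualifier_langs": codes, "term": term}
```

With this sketch, `parse_prefixed_term("nan-xia,nan-zha:食")` links the term using `nan-xia` and keeps both codes for display as qualifiers, while an item without a prefix falls back to the default language.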
 * I think in most cases, specific dialects of Taiwanese Hokkien should not be tied back to the source varieties of Quanzhou and Zhangzhou (and maybe Xiamen, which is itself generally thought of as a Quanzhou-Zhangzhou mixed variety). These kinds of labels are generally not helpful lexicographically; they are only well-defined phonologically and have small bearing on vocabulary, where much more convergence has occurred in Taiwan due to dialect levelling. The locales in Taiwan (e.g., Lukang, Yilan, etc.) for subdialects of Taiwanese that are less mixed may be more helpful in cases where we want to highlight them. — justin(r)leung { (t...) 02:55, 17 March 2024 (UTC)
 * @Justinrleung OK, this is fine and it jibes well with the label. I was just responding to User:Theknightwho's assertion that Taiwanese Hokkien isn't a well-defined lect. Benwing2 (talk) 03:21, 17 March 2024 (UTC)
 * @Theknightwho Code is written to process Thesaurus entries and convert as appropriate. I will finish the analysis tomorrow and run it. Benwing2 (talk) 09:32, 17 March 2024 (UTC)
 * @Theknightwho I expanded the script I wrote so it also attempts to convert lects mentioned in qualifiers into lect code prefixes. (This is the origin of that "part 1" section in WT:RFM.) These should not change the qualifier output much (possibly in some cases rearranging the order, that's it) but will help with transliteration and such. Some stats on what I have so far:
 * I ran it on the 2,013 pages in CAT:Chinese thesaurus entries. It would change 620 pages.
 * It issues 328 warnings. Of these:
 * 255 of these are due to unrecognized lects in qualifiers. All of these are already discussed in the "part 1" WT:RFM section.
 * Of the remaining 73, 40 are due to looking up a page tagged as and finding it doesn't exist.
 * Of the remaining 33, 14 are "informational" warnings that can be ignored.
 * Of the remaining 19, 15 are due to finding multiple etymologies with different sets of Southern Min varieties in the different etymologies.
 * Benwing2 (talk) 05:04, 18 March 2024 (UTC)
 * @Theknightwho Scratch the above stats. My script needs some changes to not overgenerate in the presence of multiple definitions (it already handles multiple etymology/pronunciation sections but needs to be extended for multiple definitions, because sometimes specific labels apply only to specific definitions). Benwing2 (talk) 05:24, 18 March 2024 (UTC)
 * OK, I rewrote the script to take into account the presence of multiple definitions and try to use the glosses present in Thesaurus pages to whittle down the set of possible definitions to use. The first pass doing that increased warnings from 328 to 1,344 (!) and reduced the number of pages changed from 620 to 490, but I think I can do a whole lot better than that. Stay tuned. Benwing2 (talk) 07:07, 18 March 2024 (UTC)
 * Theknightwho With some changes I brought the warnings down to 498 and increased the pages changed up to 624. I just ran the script. There are now only 52 pages remaining in the Thesaurus namespace with links. The warnings generated are here: User:Benwing2/zh-thesaurus-conversion-warnings, minus the warnings about "Saw unhandled lect qualifier", which aren't very important. (For reference, the first four such warnings are as follows:

Page 4 Thesaurus:一會: WARNING: Saw unhandled lect qualifier Anxi Hokkien (term 一孔久):
Page 46 Thesaurus:中飽: WARNING: Saw unhandled lect qualifier Taiwan (term 歪哥):
Page 56 Thesaurus:亂說: WARNING: Saw unhandled lect qualifier Internet slang (term 口胡):
Page 61 Thesaurus:互聯網: WARNING: Saw unhandled lect qualifier Mainland China (term 網絡):
)

Of the 245 warnings in that file (covering 144 pages), only 67 of them actually concern being unable to convert the code (or occasionally the  qualifier) to something more specific. I'd focus on those. A couple of such warnings are given here for reference:

Page 26 Thesaurus:不料: WARNING: Unable to convert 'nan' to correct lang code (reason: Found synonym/antonym 無疑悟 (template, glossed as 'unexpectedly') but page doesn't exist)
Page 53 Thesaurus:乞討: WARNING: Unable to convert 'nan' to correct lang code (reason: Saw multiple definitions with different Southern Min types for synonym/antonym 分 (template , glossed as 'to beg (ask for food or money as charity)'): defn '# to divide; to separate' has Hokkien,Teochew while defn '#  to give' has Hokkien,Teochew,Hainanese; skipping)
 * Benwing2 (talk) 04:36, 19 March 2024 (UTC)
 * Generally, zh-l should be replaced (especially if it's giving a Hokkien pronunciation), but that's probably something to do en masse at another time, as there are tens of thousands of uses so we'll probably want to hash out a proper conversion method. Theknightwho (talk) 20:55, 15 March 2024 (UTC)
 * @Theknightwho Yes, agreed; just something to keep in mind. Benwing2 (talk) 20:56, 15 March 2024 (UTC)

Module:columns and Module:sa-verb, Module:sa-verb/data
There are 3 Sanskrit entries in CAT:E because of an error in sa-conj, and I checked the entire transclusion list for ततान: your edit to Module:columns is the only recent change to executable code for anything in the list. Indeed, there are comments in Module:sa-verb saying that code was copied from Module:columns and would need to be updated if that were changed. Chuck Entz (talk) 00:19, 17 March 2024 (UTC)


 * @Chuck Entz Thank you, I'll fix. I looked for modules using Module:columns but I forgot about the entry point. Benwing2 (talk) 00:21, 17 March 2024 (UTC)
 * @Chuck Entz I don't think my change to Module:columns has anything to do with this error. User:Exarchus is actively working on Module:sa-verb/data and made the last change only an hour ago. User:Exarchus, can you take a look at these errors? They are due to a buggy Lua pattern. Benwing2 (talk) 00:35, 17 March 2024 (UTC)
 * I somehow read the dates wrong on those edits- I could have sworn they were from the same date as the ones to Module:sa-verb. You're no doubt right. Sorry! Chuck Entz (talk) 00:47, 17 March 2024 (UTC)

Replacement of quotation templates
Hi, I'd appreciate it if you could do the following bot replacements:


 * → (the template has been updated to add the 1st edition, which is now the default).
 * → (the template has been updated to add the 1st edition, which is now the default).
 * → (the template has been updated to add the 1st edition, which is now the default).

Thank you. — Sgconlaw (talk) 16:00, 27 March 2024 (UTC)


 * @Sgconlaw Done. Benwing2 (talk) 02:08, 28 March 2024 (UTC)
 * Thanks! — Sgconlaw (talk) 04:18, 28 March 2024 (UTC)

Subsequent to our discussion at Grease_pit/2024/February regarding, would you mind making the appropriate edits to the module? RcAlex36 (talk) 07:05, 28 March 2024 (UTC)

Wu information origin
The recent update to Module:labels/data/lang/zh is very much appreciated. However, a lot of the information included seems poorly researched, with a lot of unnecessary/false information, etc. seemingly lifted straight from some enwikp entries, which may be problematic. What sources did you consult? If you need pointers regarding reading/an explanation of the zh primary sources feel free to let me know. Thanks — nd381 (talk) 12:31, 29 March 2024 (UTC)


 * @ND381 Apologies, I was using a combination of Chinese and English Wikipedia entries and Glottolog, and had assumed they were reliable since they generally agree with each other. I can't read primary sources in Chinese, though, except using Google Translate. Let me know what specifically seems wrong and feel free to correct and/or delete stuff. Benwing2 (talk) 18:46, 29 March 2024 (UTC)
 * I should add, I generally only created a label when there is a page in the Chinese or English Wikipedia (and hence a Wikidata item) for the particular lect, except in some cases of higher-level groupings. Benwing2 (talk) 19:23, 29 March 2024 (UTC)
 * Glottolog's family tree is known to have some mistakes and Wikipedia is, well, Wikipedia. A few notes from a cursory look at Glottolog's family tree:
 * it uses Li Rong (1987)'s classifications, which are famously unreliable not just in Wu but nation-wide
 * It makes some pretty unorthodox naming choices and forgot to put Jinhua under Wuzhou
 * Its Northern (Taihu) Wu is a pretty big mess, and although I agree with some of their choices, it is important to note that not everything there is accepted
 * In particular, "Northern Zhejiang" (which I've seen more as "Southern N Wu", tautological as it may be) is one that is highly likely to be a valid branch, which contrasts with Glottolog's "Northwestern", "Su-Hu-Jia", and "Tiaoxi" branches, themselves also forming a Northern N Wu branch
 * The "Northwestern Wu" branch is completely disproven as Piling has Southern Mandarinic (ie. Huai) influence whereas Hangzhounese has Northern Mandarinic influence
 * I (and wpi for Yue) have notified/fixed a lot of the mistakes already present, however, please consult us next time before making large scale, academically controversial changes to Chinese templates. Do you have a Discord or other means of instant messaging? I can send some English-language sources about Wu diachronics and classification so that you can make a more informed decision next time. Thanks — nd381 (talk) 00:17, 30 March 2024 (UTC)
 * @ND381 I am not on Discord currently; feel free to post the sources here. I did notice that Jinhua was not under Wuzhou in Glottolog, and in general I followed the names used in Wikipedia (English and/or Chinese) except for the reclassifying of Wuzhou Wu as Jinqu Wu, which seems not well-accepted. In terms of intermediate branches, if there is controversy about them, one fairly easy way to handle that is to flatten the trees, so that e.g. the Tiaoxi etc. branches go away. Overall what I have been trying to do for all primary branches is fill out the main missing labels, esp. those corresponding to labels already present in various entries, although I did add more labels for Wu than other branches. Before my changes, things seemed to be in a pretty haphazard state. The idea is that from labels we can create categories and then add the more important ones as etymology-only languages. Keep in mind that in general, information in a place like Module:labels/data/lang/zh can easily be changed as it's in a single location and not propagated across several entries; but indeed I will try to consult you guys in the future. Note also that because of these label changes there are now some uncreated categories in Special:WantedCategories, such as Category:Jinhua Wu, Category:Chuqu Wu, Category:Quzhou Wu and Category:Shaoxing Wu (and possibly more; the data in Special:WantedCategories is from a couple of days ago). I will be creating these categories using auto cat if that's OK with you; again, the information here is easily fixable if needed. Benwing2 (talk) 00:36, 30 March 2024 (UTC)
 * Please also note that I just made a change to the code that handles the language variety categories using auto cat so that the Wikipedia articles are automatically pulled out of the labels in places like Module:labels/data/lang/zh if not explicitly specified. See Category:Jiaoliao Mandarin for an example of where this does its thing. In general I am trying to consolidate the information on lects in fewer places, as it's currently scattered in at least five locations (labels data modules; language data modules such as Module:etymology languages/data; "dialect data" modules used for alt; category pages that use auto cat; dialect synonyms data modules such as Module:dialect_synonyms/zh). Benwing2 (talk) 00:47, 30 March 2024 (UTC)
 * I think what I'm also going to do is add a field to the label data so that the tree of lects can be indicated properly; this info is already specifiable in Module:etymology languages/data and auto cat. Benwing2 (talk) 01:02, 30 March 2024 (UTC)
 * I see, thank you. I am not against the creation of the category pages.
 * What we have in the labels page for the most part works now, though I would like to make the following changes:
 * Lishui and Pucheng (Fujian) Wu are to be separated; the original Fujian ∈ Lishui notation was only done because of the lack of Chuzhou.
 * "Northern Zhejiang Wu" and "Northwestern Wu" are to be removed as very few sources even mention let alone include them
 * Jinhuanese is to be included as a Jinqu variety
 * Taizhouic is to be renamed to Taizhou; "Taizhounese" in itself doesn't really exist as the urban centre of Taizhou is home to several varieties. There is nothing that is conflated with Taizhou in reality other than maybe Tàizhounese (Mandarin > Huai) in Jiangsu
 * Not implemented yet, but if necessary, Urban Shanghainese should be a subcategory of Shanghainese, which should in turn be a subcategory of Sujiahu
 * If I have any additional thoughts later I can inform you/edit the labels page. When the time comes.
 * My sources document is here (a lot of the books are pirated) and is maintained and handled by many other users, including several here on Wiktionary. The Wu section is Section 1.6. Thanks — nd381 (talk) 03:12, 30 March 2024 (UTC)
 * @ND381 Wow that is a lot of information in that link! Thank you. I will make the suggested changes. A couple of questions, though:
 * If we rename "Taizhouic Wu" -> "Taizhou Wu", what should happen to the current "Taizhou Wu"? The reason I chose "Taizhouic" as the name is because of the existence of both the and  articles, based on the name "Beijingic Mandarin" (which is found in Glottolog) corresponding to the primary Mandarin branch  as opposed to "Beijing Mandarin" corresponding to the dialect of Beijing itself, i.e. . Should we use something like "Urban Taizhou Wu" corresponding to Wikipedia's ?
 * "Jinqu" seems to be the more recent name of Wuzhou Wu. Should we get rid of Wuzhou Wu in favor of Jinqu Wu?
 * Benwing2 (talk) 03:25, 30 March 2024 (UTC)
 * Also do you have knowledge of Northern Min? According to Chinese Wikipedia, there are two primary branches called Dongxi (東溪) and Xixi (西溪) (although Glottolog does not have them, but groups all Northern Min varieties other than Shaojiang, which seems to not be Northern Min at all, under "Northwestern Min Bei"). If they are real, I am thinking maybe it would be better to call them something like "East Northern Min" and "West Northern Min". Does this make sense? I have similar notes in Module:labels/data/lang/zh about Eastern Min, where the primary branches Funing and Houguan should maybe instead be called North Eastern Min and South Eastern Min. Benwing2 (talk) 03:43, 30 March 2024 (UTC)
 * Regarding this, I do not study Min. I do not know. Ask one of the Minguists here instead. I would direct you to one but I'm not sure which of the people I know are here and which are on Discord, sorry — nd381 (talk) 09:01, 30 March 2024 (UTC)
 * 1. Delete it. I saw your comment regarding "Beijingic Mandarin" already. The main problem here is that "Taizhounese" doesn't actually refer to something that "corresponds to the dialect of [urban] Taizhou itself", as that would usually be called Linhainese (臨海話), Huangyanese (黃巖話), Jiaojiangese (椒江話), etc., cf. Wugniu. What I have as "Urban Taizhounese" in the dump™ is just a helpful label for searching and is not meant to be used as an authoritative source in classification. I would recommend removing (Urban) Taizhou Wu and renaming Taizhouic Wu to just Taizhou Wu. If an "urban Taizhou" label is desired, use Jiaojiang.
 * 2. It's not so much that it's more recent as that the revised edition of Li's atlas (Li 2012; the one in the dump™ called 中國語言地圖集), which is still filled with blatantly false information, uses it. I personally use a Wuzhou-Chuzhou-Xinqu (Xin referring to Shangrao) split, but you can really do whatever you want or leave it be, since classifying these lects is still pretty contentious.
 * 3. Change "Fujian Wu" to Pucheng Wu. No conflict with Pucheng Min since Wu is specified; lects in Ningde and the Jinxiang isolate are Auish (ie. Wenzhou-related) and Northern respectively and would lead to more ambiguity.
 * Thanks — nd381 (talk) 08:59, 30 March 2024 (UTC)
 * @ND381 Hi. You caught me right as I'm going to bed but please take a look at the current state of the module. I already renamed Taizhouic to Taizhou and Taizhou to Urban Taizhou; I'll delete the latter. I removed Northern Zhejiang Wu and Northwestern Wu but left Wuzhou as-is, and added a Pucheng Wu node, removing Fujian Wu. BTW I am now going through and adding textual descriptions and parent label properties, which will be useful in centralizing the information currently found in the individual category pages; but none of this is in the production module yet, just on my own machine. Also see the Grease Pit post I made about centralizing/consolidating lect info. Benwing2 (talk) 09:06, 30 March 2024 (UTC)
 * Thank you. — nd381 (talk) 09:35, 30 March 2024 (UTC)
 * @ND381 I have added support for including descriptions and parent labels in the label data in Module:labels/data/lang/zh. I have converted the Lua comments to parent labels in most cases and added descriptions (using the region or sometimes def fields) for Mandarin and Northern Wu lects and some others. I am working on Southern Wu now but I may need a bit of help. In particular, I added labels for all the subgroups called  in Chinese, which we define as "cluster" (see the box near the bottom of the 吴语 page for a diagram of all these clusters), but increasingly I think they shouldn't be defined. Enwiki generally doesn't include such intermediate divisions in its descriptions of individual dialects, and most of these "clusters" are red links, redirects or stubs in zhwiki. Tentatively I'm thinking of keeping the ones for Northern Wu (Piling, Tiaoxi, Sujiahu, etc.) and probably the ones for Chuqu Wu (Longqu, Chuzhou, Shangshan), and discarding the remainder. Thoughts?
 * Also, on a related subject, why is it that there is such extraordinary diversity in the Wu lects (esp. the southern ones) in such a small area, when Mandarin lects seem to vary only a little over vast areas? Is the terrain in Zhejiang such that movement is very difficult? Or was there some sort of recent calamity in Northern China that caused migration all over the place (and resulting dialect mixing)? Benwing2 (talk) 06:06, 1 April 2024 (UTC)
 * mountain = dividing + no wide-scale areal effects (Mandarinic is not a phylogenetic group)
 * as for xiaopian, honestly if you want you don't need to add any since they're very contentious — nd381 (talk) 10:51, 1 April 2024 (UTC)
 * @ND381 OK thanks. I have left the xiaopians that I mentioned above (for Northern Wu and Chuqu Wu) and removed the others. I finished adding parents and region descriptions for Southern and Northern Wu lects to Module:labels/data/lang/zh, added all Wu categories that had at least one entry in them and fixed the existing Wu categories to read just auto cat (instead of having additional parameters to specify the parent, region, etc.), so that the parent and region description get picked up from the label data. There shouldn't be anything very controversial that I added; the descriptions are mostly just listing the area(s) where each lect is spoken per English and Chinese Wikipedia, although in some cases (e.g. Fuyang Wu, Hangzhounese, Jinxiang Wu, Old Guangde Wu, Sujiahu Wu, Urban Shanghainese Wu, Baizhang Wu, Changbei Wu, Jujiang Wu, Old Xuanzhou Wu, Pucheng Ou Wu, Pucheng Wu, and in general all the primary branches) I added text under the addl field describing the notable characteristics of the lect.
 * It occurs to me we will eventually probably need to split Wu into different languages, at least on the primary branch lines (Northern Wu, plus some number of Southern Wu branches); but I think we probably should wait to tackle that until we finish the Southern Min and Yue splits. Benwing2 (talk) 05:13, 2 April 2024 (UTC)
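To illustrate the idea of storing a parent on each label so that category pages can derive their tree from the label data, here is a minimal sketch in Python (not the Lua of the actual module). The dict layout, the `parent` field name and the `ancestors` helper are assumptions made for illustration; the chain itself uses only the relationships mentioned in this thread.

```python
# Hypothetical label data: the "parent" field name and the dict layout are
# assumptions for illustration, not the actual contents of the labels module.
LABELS = {
    "Wu": {"parent": None},
    "Northern Wu": {"parent": "Wu"},
    "Sujiahu": {"parent": "Northern Wu"},
    "Shanghainese": {"parent": "Sujiahu"},
    "Urban Shanghainese": {"parent": "Shanghainese"},
}


def ancestors(label, labels=LABELS):
    """Return the chain of parent labels, nearest parent first."""
    chain = []
    seen = {label}
    parent = labels[label]["parent"]
    while parent is not None:
        # Guard against accidental cycles in hand-edited data.
        if parent in seen:
            raise ValueError(f"cycle in parent labels at {parent!r}")
        chain.append(parent)
        seen.add(parent)
        parent = labels[parent]["parent"]
    return chain
```

With data shaped like this, a category page only needs its own label name; flattening a contested intermediate branch amounts to re-pointing the `parent` of its children in one place.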

Gender-neutral adjectives in Module:es-headword
I noticed you added the option  for gender-neutral nouns in Spanish. Could you add the same option for adjectives?

For example, the headword-line for the adjective latine currently displays as "m or f", which is wrong, it should look the same as the headword-line for the noun. 26agcp (talk) 19:20, 30 March 2024 (UTC)


 * @26agcp I added this. You can use 1 on an adjective to indicate that it's gender-neutral, which I have done for latine. I'm not sure whether this will work correctly on adjectives not ending in -e, such as latinx or latin@ (if it's possible to use these as adjectives). If these are adjectives and 1 doesn't work right, let me know and I'll fix it. Benwing2 (talk) 06:15, 1 April 2024 (UTC)

Category:Chinese terms written in foreign scripts
Hi, I noticed that you've added functionality in Module:zh-pron to automatically add pages that do not contain any Chinese characters to the category. However this has caused the category to be flooded with POJ entries (which are romanisation entries and therefore shouldn't be there) and Zhuyin entries (which are not "foreign"). Can you see how this can be fixed? Or perhaps revert the changes for the time being. Much thanks.

PS: The POJ entries are there because of Module:zh-see which tries to call Module:zh-pron with only_cat. It's a total mess there which I don't want to talk about.

– wpi (talk) 13:56, 2 April 2024 (UTC)


 * @Wpi Thanks for letting me know. The Zhuyin entries should be fixable by changing the regex to exclude Zhuyin/Bopomofo characters. If the POJ entries are only there because they are calling Module:zh-pron with a specific flag, I can check for that flag. Let me see what I can do. Benwing2 (talk) 19:41, 2 April 2024 (UTC)
 * OK, this should be fixed. Benwing2 (talk) 20:07, 2 April 2024 (UTC)
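For readers following along, the two checks described above (exclude Zhuyin from the "foreign script" test, and skip categorization when the caller passes the flag) could look roughly like the following Python sketch. It is not the code in Module:zh-pron; the function name, the exact character ranges and the meaning given to `only_cat` here are assumptions.

```python
import re

# Han ideographs: CJK Unified Ideographs plus Extension A (BMP only, for brevity).
HAN = r"\u3400-\u4dbf\u4e00-\u9fff"
# Bopomofo and Bopomofo Extended blocks, i.e. Zhuyin.
ZHUYIN = r"\u3100-\u312f\u31a0-\u31bf"
NATIVE_RE = re.compile(f"[{HAN}{ZHUYIN}]")


def needs_foreign_script_category(title, only_cat=False):
    """Decide whether a page title belongs in the foreign-scripts category."""
    # Assumption: calls made with only_cat come from romanisation entries,
    # so they are never categorized here.
    if only_cat:
        return False
    # A title with no Han or Zhuyin character at all counts as foreign script.
    return NATIVE_RE.search(title) is None
```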

Replacement of quotation template
Hi, when you are free could you please do the following bot runs?


 * → (the quotation template has been updated to add the 1st edition (1994), so current uses need to be updated)
 * → (the quotation template has been updated to add the 1st edition (1994), so current uses need to be updated)

Thank you. — Sgconlaw (talk) 22:52, 2 April 2024 (UTC)


 * @Sgconlaw Done. Benwing2 (talk) 04:13, 5 April 2024 (UTC)
 * Thanks! — Sgconlaw (talk) 04:36, 5 April 2024 (UTC)

Time-outs from change to Module:headword
Hi - I think your latest change to Module:headword is causing time-outs at some Written Oirat entries, like. Theknightwho (talk) 03:38, 5 April 2024 (UTC)


 * @Theknightwho Yup, I just added a check to make sure this doesn't happen. I don't quite know why it's happening, something weird about the script being returned, but I limit the iterations to 10 now no matter what. Benwing2 (talk) 03:41, 5 April 2024 (UTC)

By the way
It seems that one thing that really slows things down is declaring functions inside other functions. Sometimes it's unavoidable, but there are plenty of instances where it's straightforward to move them out of the parent function; sometimes with extra parameters, if they needed to access any upvalues. This happens a lot with anonymous functions declared inside, but Module:languages is currently a big offender, since all the methods get redeclared every time a language object is requested. Theknightwho (talk) 11:52, 5 April 2024 (UTC)


 * @Theknightwho Hmmm, interesting. Do you know if it's related to the size of the function (in which case we could move the contents of the larger functions in Module:languages outside of the object) or just the presence of the function? Benwing2 (talk) 18:29, 5 April 2024 (UTC)
 * @Benwing2 There seems to be an inherent cost for each closure, just like objects. The literal length of the function in bytes is a (very small) factor, but since it's only parsed once it makes no difference whether it's inside another function or not. Theknightwho (talk) 18:35, 5 April 2024 (UTC)
 * This is basically the memory-speed trade-off with Lua. Local objects/functions are cleared very quickly by the garbage collector (especially anonymous ones), but you need to spend extra time generating each one, and often it's just not worth it. Theknightwho (talk) 18:39, 5 April 2024 (UTC)
 * @Theknightwho Got it, in which case the only way to speed up Module:languages is to redo it without the use of an object, which would entail (AFAIK) a huge amount of rewriting of code that uses it. Possibly there is an in-between way, e.g. create an object-less version of Module:languages and then create an object that wraps it, and rewrite only the core modules (Module:links, Module:headword, Module:translations, ...?) to use the object-less version. But this is still a fair amount of work. Benwing2 (talk) 19:26, 5 April 2024 (UTC)
 * @Benwing2 I don’t think it’s as bad as that - it should be possible to move the function declarations out of the language-generating function, since they’re inherited via metamethods anyway. If I remember correctly, the reason for the current set-up is because I wanted to make it possible to grab language objects that use lua instead of lua in contexts where speed is more important than memory-use (since lua adds a lot of overhead to data access times). I think the only module which uses that option is Module:family tree, since everything’s done in a single invocation via auto cat. Theknightwho (talk) 15:12, 6 April 2024 (UTC)
 * @Benwing2 I just implemented this, and this change alone sped things up by about 5% on very large pages.  is now specified using a key and a dedicated  method, but in all honesty it might not be necessary anymore: it was only ever implemented because  makes memory usage worse with auto cat, because everything's contained within one invoke, and some proto-language pages were pushing the old 50MB limit due to the descendants trees. Theknightwho (talk) 23:16, 21 April 2024 (UTC)
 * @Theknightwho Hmm, are you saying the functionality might not be needed because it's only used in Module:family tree, and the 50MB limit no longer applies? If so, it might be reasonable to consider removing it at some point, but I would say leave it for the moment because it might be needed elsewhere. I notice for example that some pages that use auto cat are hitting 65MB or so of memory, and maybe could benefit from this. I think the high memory usage is because of the implementation that searches through all labels to find those that categorize into a particular category, since I've been consolidating the lect info into the labels modules instead of having them scattered in the auto cat calls themselves and duplicated in several other places. I added memoization of the calls to  (which ends up being called multiple times due to the way that the poscatboiler code retrieves information on all parent categories in order to determine the breadcrumbs and the parents' parents etc.), which reduced the memory a bit and sped things up a lot. Maybe adding further memoization of the fetched labels data would reduce memory usage significantly and/or use  (the labels data itself is already loaded using  but it is then converted into containing structures by Module:labels/utilities, which maybe could be cached since it's all happening in a single auto cat invocation). Benwing2 (talk) 23:31, 21 April 2024 (UTC)
 * @Theknightwho BTW thanks for all the profiling work you're doing. This sort of work isn't really my strong point and something I don't really enjoy doing that much, so I am glad you are putting the time into doing it as it's quite necessary. Benwing2 (talk) 23:33, 21 April 2024 (UTC)

pcall and accessing nonexistent pages
I think I've worked out the reason why lua is so slow when used with nonexistent modules: it's because nothing gets cached in lua, so every time the module's requested it's forced to run the full loader, whereas retrieved modules simply use the cache on subsequent accesses. We could get around the issue by adding lua to lua after the first failure, which should speed things up. After doing various profiling tests, I'm pretty sure the issue isn't down to lua itself. Theknightwho (talk) 06:23, 8 April 2024 (UTC)


 * @Theknightwho Interesting. This does make sense, and I wondered why I was seeing such slow pcalls (loading nonexistent modules) when you reported no issues with them. Benwing2 (talk) 06:36, 8 April 2024 (UTC)
 * @Benwing2 I've added lua to Module:utilities, which (1) checks if there's a cached value for the module in lua and returns it if so, (2) runs lua, and (3) if the module doesn't exist, caches it as lua. Two things of interest:
 * It's still about twice as fast as lua even when handling already-cached modules, since it doesn't bother with all the lua safety checks and so on (close to 1 million iterations per second).
 * Nonexistent modules still don't work with lua even after they've been cached, since lua checks lua instead of lua. I'll put in a Phabricator ticket about that, but they'll probably ignore it.
 * Theknightwho (talk) 07:43, 8 April 2024 (UTC)
 * Related to this, I've discovered that if you lua a module with a return value of lua, you get lua haha. lua seems to use lua as a placeholder so that modules with no return value get cached, but the falsy existence check causes this bug. Theknightwho (talk) 08:58, 8 April 2024 (UTC)
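The three-step approach described above (check the cache, run the loader, remember failures) can be illustrated with a short Python analogue. This is not the function added to Module:utilities: `cached_require`, `_MISSING` and the injectable `loader` parameter are hypothetical names, and Python's import machinery stands in for Lua's.

```python
import importlib

_MISSING = object()  # sentinel recorded for modules known not to exist
_cache = {}


def cached_require(name, loader=importlib.import_module):
    """Load a module by name, remembering failures as well as successes."""
    # Test membership rather than truthiness, so that a cached falsy value
    # is not mistaken for "never loaded".
    if name in _cache:
        cached = _cache[name]
        return None if cached is _MISSING else cached
    try:
        module = loader(name)
    except ImportError:
        # Negative caching: later requests skip the full loader entirely.
        _cache[name] = _MISSING
        return None
    _cache[name] = module
    return module
```

Using a dedicated sentinel together with an explicit membership test sidesteps the kind of bug described in the last comment, where a placeholder value collides with a falsy existence check.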

Consolidating into Module:string utilities
Hiya - I've done a total rewrite of most of Module:string utilities, which I'll be introducing over the next few days (so that I don't run into issues changing everything at once). I've decided to reverse course on splitting out functions into their own modules, as I'm not convinced that it's actually very helpful, and it makes organising everything much more confusing.

At the same time, we've got a bunch of duplicate functions floating around (I think there are 4 versions of lua), so it makes sense to consolidate everything. To that end, I think it makes sense to merge in most of the single-function modules, as well as some of the smaller satellite modules which are integral to string manipulation, like Module:pattern utilities, because so many of them are dependent on each other anyway. Theknightwho (talk) 18:22, 8 April 2024 (UTC)


 * @Theknightwho OK, makes sense! Benwing2 (talk) 18:25, 8 April 2024 (UTC)
 * @Benwing2 I've rolled out pretty much everything new - there are still a few single-function modules I want to merge in, but at least the new code seems to be holding up well. By the way - I've renamed lua to lua, since it's faster than lua for everything except the default charset (since that's the only time lua uses the standard string library), so it makes sense to use with and without capturing groups. Theknightwho (talk) 21:04, 8 April 2024 (UTC)
 * @Theknightwho Interesting ... I wrote capturing_split long ago with no particular intention of making it fast; the capturing functionality was just needed by Module:ru-common. Benwing2 (talk) 21:10, 8 April 2024 (UTC)
 * @Benwing2 I've reworked it pretty heavily, but it essentially still works in the same way. The big thing is finding fast ways to detect whether you can use the string library, since anything involving magic characters in the ustring functions is completely hopeless. Theknightwho (talk) 21:17, 8 April 2024 (UTC)
 * @Theknightwho Cool, thanks for all this work. I do think after you finish this you should revisit the pattern change in Module:headword made by User:Erutuon that seems to have slowed down the average time of big pages; maybe there's a way to preserve the functionality while avoiding the double Kleene star operators. Benwing2 (talk) 21:21, 8 April 2024 (UTC)
 * @Benwing2 Yeah, it should be possible to do it in multiple stages. Theknightwho (talk) 21:24, 8 April 2024 (UTC)
 * Also, just to illustrate the point about speed: the revised lua function is over 10 times faster than lua with the input lua, and the gap increases as the string gets longer. Theknightwho (talk) 23:12, 8 April 2024 (UTC)
 * I'll say again that changing and even removing that pattern altogether and previewing a page didn't seem to significantly up Lua execution, but I gave up pretty quickly so maybe it's worth further testing. — Eru·tuon 00:20, 12 April 2024 (UTC)
 * @Erutuon Yeah there seems to be a whole lot of variability in times, but something definitely caused an average-time slowdown, just don't know what. Benwing2 (talk) 00:22, 12 April 2024 (UTC)
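The "detect whether you can use the plain string library" idea from this thread can be sketched in Python, where `str.split` plays the role of the fast path and `re.split` the slower pattern-aware path. The `split` function and the `_MAGIC` set are hypothetical and do not reproduce the Module:string utilities code.

```python
import re

# Characters that have special meaning in a Python regular expression.
_MAGIC = set(".^$*+?{}[]\\|()")


def split(text, sep):
    """Split `text` on `sep`, treating `sep` as a pattern only when needed."""
    # Fast path: a separator with no magic characters is a literal string.
    if sep and not (_MAGIC & set(sep)):
        return text.split(sep)
    # Slow path: full pattern matching; capturing groups are kept in the
    # result, which gives capturing-split behaviour for free.
    return re.split(sep, text)
```

For example, `split("a,b,c", ",")` takes the literal path, while `split("a-b", "(-)")` goes through the pattern engine and keeps the captured separator in the result.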

Module:grc:Dialects
CAT:E is being swamped with Greek pages complaining that this module doesn't exist. I think it may have been deleted prematurely... Ioaxxere (talk) 01:41, 10 April 2024 (UTC)


 * @Ioaxxere Ahh, fuck me. Thanks to whoever undeleted it. Benwing2 (talk) 01:43, 10 April 2024 (UTC)

Nonfunctional newversion in
See. Neither  or   is accepted. ―⁠Biolongvistul (talk) 13:42, 11 April 2024 (UTC)

Template:tracking/defdate/hyphen
defdate still seems to be using this, and is now displaying a redlink to it; see e.g. sirrah or bḥ. (Searching mainspace for "template:tracking" finds about 600 instances of this, but AFAICT no instances of any other template besides defdate doing anything like this, so it seems to be an issue with only this one template, not a more widespread issue.) - -sche (discuss) 18:20, 11 April 2024 (UTC)


 * @-sche Should be fixed. Benwing2 (talk) 22:23, 11 April 2024 (UTC)

Bot-addition of templates
You've recently told Theknightwho multiple times to not introduce changes to core modules without discussing it with the community beforehand. Then why are you doing the exact same thing with templates? I have not heard anything about this, it's from before my time, I am now finding it all over the place added by a bot, and I have many objections! You can't just go off an old discussion and start mass-adding templates with a bot without making sure the current editors are still fine with it, especially when the discussion that this template was based on had just five users comment on, and, I repeat, is eight years old. Thadh (talk) 21:42, 11 April 2024 (UTC)


 * @Thadh I did not do that. User:-sche posted in the Beer Parlour about this template, see WT:Beer parlour/2024/April, and I posted and said I would do a bot run to introduce this if no one objected. I waited about a week; next time I'll wait a month if that would help. What are your specific objections? This can be undone if necessary but I want to make sure your objections can't be met in some other way. Benwing2 (talk) 21:46, 11 April 2024 (UTC)
 * Yes, waiting a bit longer would be a good idea for next time. I'll post my objections in the BP thread, but thanks for pointing me towards it. Thadh (talk) 21:48, 11 April 2024 (UTC)

Why has WingerBot been made to "canonicalize Sicilian phonemic pronun"?
Can I ask why every Sicilian pronunciation I encounter is being wrongly changed into a phonological transcription? Hyblaeorum (talk) 09:59, 19 April 2024 (UTC)


 * @Hyblaeorum What is wrong? User:Nicodene asked me to convert Sicilian pronunciations into their phonemic form, that's all. Benwing2 (talk) 19:24, 19 April 2024 (UTC)
 * Transcriptions using // are phonemic, and Sicilian only has five vowel phonemes: /i ɛ a ɔ u/. Nicodene (talk) 19:47, 19 April 2024 (UTC)
 * Actually, the Sicilian language has 5 stressed vowels and 3 unstressed ones. English has 28.
 * So if a language like Sicilian has a given set of unstressed vowels in its system, are they going to be left out of the slashes?
 * Just to be clear:
 * u lupu is not pronounced /uˈlu.pu/;
 * it's unavoidably /ʊˈlu.pʊ/
 * I would like people to be able to learn how to speak my language, not to spread misinformation about it. Hyblaeorum (talk) 08:42, 20 April 2024 (UTC)
 * [ʊ] is an unstressed allophone of the phoneme /u/ in Sicilian. Unless you can show an example where [u] versus [ʊ] can distinguish word A versus word B (a minimal pair) I don’t see any basis for giving a phoneme */ʊ/. It is simply a mis-use of basic linguistic notation. Phonetic and phonemic transcription are not the same thing.
 * Speaking of misinformation, according to The Oxford Guide to the Romance Languages (pages 250–1) Sicilian /i/, /u/ in word-final position are phonetically [i], [u] and not [ɪ], [ʊ]. I’m not sure why you keep doing that. Nicodene (talk) 09:38, 20 April 2024 (UTC)

"Cannot handle template synonym of."
I thought I would bring this up here rather than throwing red meat to the wolves at a certain other discussion. If we're going to be having Module:transclude pulling from a wide variety of entries, we need to make it robust enough so it doesn't get the vapors at the first sight of a template someone didn't think to program into it. I'm really surprised I haven't seen this error before. Chuck Entz (talk) 02:22, 21 April 2024 (UTC)


 * @Chuck Entz Yeah this is why I generally only use it for toponyms. Handling things like synonym of, alternative form of, etc. is tricky because when you switch to another language, the form-of template is no longer valid. In this case, admiral was changed to say it's a synonym of flagship, but naturally that relationship doesn't apply in Middle Polish or any other language. I think in some cases like this one, this can be fixed by just listing the other term without any form-of qualifiers (hence "synonym of flagship; a ship of the line [etc.]" becomes just "flagship; a ship of the line [etc.]"), but that may not work in all cases. The only other alternative I can think of is to just ignore the form-of template in the transclusion. Thoughts? Benwing2 (talk) 02:44, 21 April 2024 (UTC)
 * My first thought was to ignore, but flag. That way someone could follow up to look for things that could be fixed or that would need to be addressed. Chuck Entz (talk) 02:55, 21 April 2024 (UTC)
 * @Chuck Entz OK, I will implement something like this: (a) handle all the form-of templates I can think of in some sensible way, (b) handle unrecognized templates by ignoring them but issuing a warning during Preview, and also add template tracking and/or a tracking category, and also maybe logging using mw.log. We could also insert some text into the output itself saying essentially "implement handling for this template"; I don't know if you think this is a good idea. Benwing2 (talk) 03:27, 21 April 2024 (UTC)
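A rough Python sketch of the plan in (a) and (b): known form-of templates are reduced to the bare term, and anything unrecognized is dropped but recorded so that it can be flagged. This is not Module:transclude; the regex only handles non-nested templates, and every name here (`HANDLERS`, `render_for_transclusion`, the returned warnings list) is an assumption for illustration.

```python
import re

# Matches a simple, non-nested template call such as {{name|arg1|arg2}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)(?:\|([^{}]*))?\}\}")


def _bare_term(args):
    # Assumes args[0] is the language code and args[1] the target term.
    return args[1] if len(args) > 1 else ""


# (a) Templates handled "in some sensible way": keep only the linked term.
HANDLERS = {
    "synonym of": _bare_term,
    "alternative form of": _bare_term,
}


def render_for_transclusion(text):
    """Return (rendered_text, names_of_unrecognized_templates)."""
    warnings = []

    def replace(match):
        name = match.group(1).strip()
        args = match.group(2).split("|") if match.group(2) else []
        handler = HANDLERS.get(name)
        if handler is None:
            # (b) Ignore the template, but remember it for a preview warning
            # or a tracking category.
            warnings.append(name)
            return ""
        return handler(args)

    return TEMPLATE_RE.sub(replace, text), warnings
```

The returned list is where a preview warning, tracking category or log entry would hook in, so an unknown template degrades the output instead of raising an error.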

transliteration of Greek to Latin characters
Hi Benwing2, I am usually in el-wikt and only occasionally here. We are looking for a tool in el-wikt to transliterate names in Greek characters to Latin characters according to en:w:ISO 843. I see that here Template:t does something similar (we would only have to change the table of equivalent characters), but my knowledge is not enough to locate and copy the relevant part (I am looking at Module:translations, but as I told you, I cannot see which module is invoked). Are you the right person to ask for help? If not, who could possibly give us a hand? FocalPoint (talk) 16:45, 22 April 2024 (UTC)


 * Forget it, we found it ! No need to invest time. Have a nice day. FocalPoint (talk) 05:32, 23 April 2024 (UTC)
 * @FocalPoint Glad you found it, and sorry for the delay in responding. Benwing2 (talk) 05:50, 23 April 2024 (UTC)

Duplicate categories
Hi, I'm from ckbwiktionary. While mass importing subcategories of Category:Languages by country on ckbwiktionary, I noticed that Category:Languages of Republic of the Congo and Category:Languages of the Republic of the Congo are the same category. Which one should stay? Thanks! Aram (talk) 21:56, 26 April 2024 (UTC)


 * @Aram Thanks for pointing this out. The latter category should stay; this one is consistent with our naming policies (which include the word "the" when appropriate), and contains all but one of the languages. Benwing2 (talk) 22:08, 26 April 2024 (UTC)

toxic hellstew
The last CFI-meeting quotation I found was from 2015 (this 2019 Wired article is quoting the 2015 blog). I'm not sure whether non-CFI-meeting quotations should have any bearing on a label like ephemeral, assuming we did start using it. Ioaxxere (talk) 06:32, 28 April 2024 (UTC)


 * @Ioaxxere The larger point is that I don't think you can determine "ephemerality" until well after the fact; 2015 is not far enough in the past. Even if a well-known blog like medium.com doesn't count for CFI, the fact that it is still in use by them (and not in any way quoting the 2015 blog) means it's likely not "ephemeral". Benwing2 (talk) 07:00, 28 April 2024 (UTC)
 * Yes, I guess we'll have to wait and see although I think it's fairly unlikely that this particular term will get revived. FYI, Medium isn't a blog: it's user-generated content on par with Twitter and others. Ioaxxere (talk) 07:05, 28 April 2024 (UTC)
 * @Ioaxxere OK, my mistake about Medium but I think the point still stands. Benwing2 (talk) 07:19, 28 April 2024 (UTC)
 * BTW I would label this term as a neologism; IMO that fairly portrays the facts that it is (relatively) new and has not yet (ever?) clearly entered the lexicon. Benwing2 (talk) 07:22, 28 April 2024 (UTC)

Your reverts
Why are you reverting? — Fenakhay ( حيطي · مساهماتي ) 22:32, 28 April 2024 (UTC)


 * Requests_for_deletion/Others seems to show consensus for removal of noun plural form as a headword. Benwing2 (talk) 22:34, 28 April 2024 (UTC)
 * IMO it accomplishes nothing over noun form. What other sorts of noun forms are there? Benwing2 (talk) 22:35, 28 April 2024 (UTC)
 * What consensus?? You need a vote in WT:BP if you want to implement this drastic change. There are dual, paucal and plural forms in Chadian Arabic. — Fenakhay ( حيطي · مساهماتي ) 22:39, 28 April 2024 (UTC)
 * A vote for this kind of thing seems appropriate to me. DCDuring (talk) 11:43, 29 April 2024 (UTC)

Replacement of quotation templates (April 2024)
Hello, when you are free, please carry out the following replacements:


 * → (except  and )
 * → (except  and )
 * → (except  and )

(I have updated the templates with the 1st editions of the work as the default, so all current uses which were based on later versions need to have those versions specified).

Thank you. (I may add a few more requests over the next few days.) — Sgconlaw (talk) 17:21, 30 April 2024 (UTC)


 * A friendly suggestion: vary your discussion-section titles, rather than always naming them “Replacement of quotation templates”. Identical names mean that section links all point to the first such-named section of a given page (e.g. ). 0DF (talk) 17:47, 30 April 2024 (UTC)


 * OK. I used to give them numbers, but it was hard to keep count … — Sgconlaw (talk) 17:50, 30 April 2024 (UTC)


 * I can quite imagine. What about titles that specify what you're quoting? E.g., in this case, “Replacement of [Homer|Chapman|Odysseys] quotation templates”? 0DF (talk) 21:05, 30 April 2024 (UTC)


 * I'll put the date. But, frankly, does it really matter at all? — Sgconlaw (talk) 22:50, 1 May 2024 (UTC)


 * On talk pages, not all that much, but it must be inconvenient for Benwing, though I don't know whether cares. Alternatively, you can just append new requests under old requests in the same section. Feel free also to ignore my suggestion; I only made it because I'd noticed it happening several times, but this isn't my talk page, so it hardly affects me. 0DF (talk) 23:06, 1 May 2024 (UTC)


 * @0DF @Sgconlaw Honestly I didn't even notice that all the titles are the same. Benwing2 (talk) 23:13, 1 May 2024 (UTC)


 * @Benwing2: À fortiori you didn't care. 0DF (talk) 14:48, 2 May 2024 (UTC)


 * : For those of us who read the diffs from the revision history, it's completely irrelevant. A completely unrelated issue, though, is the years of redlinks to the deleted/moved templates that are cluttering Special:WantedTemplates. Now that most of the tracking template links are gone, redlinks to deleted templates are the next obvious target. Chuck Entz (talk) 03:19, 2 May 2024 (UTC)


 * Perhaps, but for those who don't, it is relevant. 0DF (talk) 14:48, 2 May 2024 (UTC)


 * @Chuck Entz Yup, I noticed that although I'm not sure how to solve it. I suppose we could go through the user, talk and Wiktionary pages that link to these templates and comment them out using  or whatever, although it would be nice if MediaWiki provided a way of suppressing links by namespace when generating these lists. Benwing2 (talk) 03:49, 2 May 2024 (UTC)
 * @Sgconlaw Are you expecting to add more requests? If not I will go ahead and run the ones you've listed. Benwing2 (talk) 04:40, 6 May 2024 (UTC)
 * OK, I've added the other requests. Please go ahead! — Sgconlaw (talk) 15:31, 6 May 2024 (UTC)
 * @Sgconlaw Should be done. Note that it also ran on the template pages themselves as well as Quotations/Templates/English T–Z and Quotations/Templates/English C; if this is wrong, please undo. Thanks! Benwing2 (talk) 20:35, 6 May 2024 (UTC)
 * Thanks! — Sgconlaw (talk) 20:39, 6 May 2024 (UTC)

Template:eu-verb form of/new
Hello: If possible, could you run your bot to change all instances of to ? Both templates are identical so no other changes are required. Thank you in advance. Santi2222 (talk) 18:03, 6 May 2024 (UTC)


 * @Santi2222 Done. Benwing2 (talk) 20:43, 6 May 2024 (UTC)


 * Perfect, thanks! Santi2222 (talk) 11:10, 7 May 2024 (UTC)

WingerBot converting "gl" to "q"
Hi! I noticed WingerBot converting "gl" to "q" in Tagalog. However, I see it changing glosses of definitions into "q". It says here that to provide context in definitions, we use "gl" instead of "q". So it seems to me that the bot edits done to Tagalog lemmas are wrong. Could you check? Thanks! Mar vin kaiser (talk) 07:24, 7 May 2024 (UTC)


 * @Mar vin kaiser Hi Mar (can I call you that?). Can you give me some examples? I am following standard practice, which is AFAIK: glosses are used when giving full definitions of terms; labels are used to label registers and grammar and usage characteristics before the definition; raw parens are used to indicate typical direct objects that are not part of the meaning of the verb when standing by itself; and qualifiers are used for everything else. The documentation you are pointing to is in no way standard practice; it was added by User:Fytcha in November 2021 based, I assume, on a misunderstanding of what gloss is for. More specifically:
 * direct object arguments to verbs go in raw parens, e.g. NOT.
 * full definitions for concepts, either to clarify polysemy or for unfamiliar concepts, use gl. e.g..
 * all other clarifications and qualifications use q, e.g..
 * Note the difference between case 2, which *defines* ("glosses") the term advocacy and case 3, which does *NOT* define the terms developed or progressive but *qualifies* them by giving the context in which the terms might be appropriately used.
 * Note also that I went through manually looking for uses of gl and corrected them by hand to use the correct syntax. The only purpose of the bot was to push changes I made manually offline, in a text editor. If you think I've gotten standard practice wrong, please post in the BP about this and we can have a discussion about what standard practice in definitions really is when it comes to q vs. lb vs. gl. Benwing2 (talk) 08:25, 7 May 2024 (UTC)
 * Thanks for the reply. I thought the documentation I shared was the standard practice. Where can I find a guide where the difference is explained as part of standard practice? --Mar vin kaiser (talk) 08:29, 7 May 2024 (UTC)
 * @Mar vin kaiser The closest I could find was Style_guide, where it says the usage of gl is "Used to gloss a definition by redefining it in different words, especially to disambiguate an English defining word with many definitions, goes after the definition." and the usage of q is "Miscellaneous explanatory text (register, variety etc.)". Meanwhile under Style_guide it says "Parentheses should be used in definitions only for the purpose of identifying the selectional restrictions of the headword in the current sense:" which is consistent with my #1 above. Benwing2 (talk) 19:30, 7 May 2024 (UTC)

"a" is for module errors
Adding the new tl-pr template seems to have pushed this over the edge into permanent rather than intermittent Lua timeout errors. I'm not sure what can be done about it. Chuck Entz (talk) 14:05, 12 May 2024 (UTC)


 * Yeah, this has been concerning me as well, and there are no obvious causes which stand out. Theknightwho (talk) 16:39, 12 May 2024 (UTC)
 * @Chuck Entz @Theknightwho We can revert the changes on this page back to where they were before as a temporary fix, but I think for a permanent fix we might have to either split up letter pages or request some sort of per-page exception to the Lua limits. Letter pages in general are problematic because there are potentially thousands of languages that could be on the page. Benwing2 (talk) 19:25, 12 May 2024 (UTC)
 * BTW I tried switching tl-pr to use "raw" notation like this:


 * Here we specify the IPA directly to avoid going through the code to generate the IPA, but it seems to make no difference. Benwing2 (talk) 19:47, 12 May 2024 (UTC)
 * @Benwing2 Yeah, a lot of it is down to the inherent costs of Scribunto and the underlying core modules. I submitted this patch for review today which should help a little bit: it speeds up by about 15%, and cloning  for each invoke is a big contributor to the inherent cost. Theknightwho (talk) 20:55, 12 May 2024 (UTC)
 * The problem with asking for more Lua time is that the page already takes too long to load overall, independent of any system resources it uses. One time I got a server timeout while doing a null edit. Chuck Entz (talk) 21:28, 12 May 2024 (UTC)
 * @Chuck Entz Yeah that is true. I personally think MediaWiki should not try to recompute pages on the fly if they took more than a certain time to be generated last time they were generated, but that would require some discussion I'm sure by the MediaWiki developers. I think in the long run we'll have to split the letter pages somehow, although I'm not sure how. Benwing2 (talk) 21:34, 12 May 2024 (UTC)
 * @Benwing2 We could try doing a general version of multitrans, which was what the template parser was originally written to do, but we'd need to come up with a way to solve the section edit link issue, because they don't normally appear for headings enclosed in templates. Theknightwho (talk) 21:41, 12 May 2024 (UTC)
 * just substed the entire page, which caused a module error in desctree at Reconstruction:Proto-Slavic/a. I fixed a problem with derivation categories caused by the Breton etymology using Welsh language codes, but I won't mind if you wipe out my work by reverting to the pre-subst version. Chuck Entz (talk) 23:26, 12 May 2024 (UTC)
 * Oops, I admit that I neglected the possibility of other pages relying on the templates at . Maybe the best solution is to only subst certain templates which are both Lua-hungry and relatively unlikely to be modified. Alternatively, these other templates could be modified to work with HTML. Ioaxxere (talk) 23:34, 12 May 2024 (UTC)
 * @Theknightwho I think there is a way to solve the section link issue. User:This, that and the other might have ideas.
 * @Ioaxxere @Chuck Entz If we subst it like this, we need to write JavaScript to allow the page to be updated easily from the source. In general though this is a sub-optimal solution, as it's difficult to prevent people from manually hacking the substed version rather than the source. Benwing2 (talk) 23:50, 12 May 2024 (UTC)
 * I think the best solution is to spin the letter entries off to subpages, since they're a completely different sort of lexicography: no syntax or semantics. So "X" as the 24th letter of the English alphabet would go there, but the "adult-content" symbol would stay. Likewise, all the prepositions, particles, etc. at "a" would stay, and probably the abbreviations on all the pages. It would at least temporarily solve our system problems and also address the issues that Kwamikagami has been hacking and ad-hoc-ing over. We might even think about setting up a namespace for the character entries. Chuck Entz (talk) 01:23, 13 May 2024 (UTC)
 * @Chuck Entz I would be in favor of that. What about names of letters, since often the name of the letter "a" is also "a"? Do we leave those or move them? Benwing2 (talk) 01:44, 13 May 2024 (UTC)
 * Those seem like entry-page material. After all, there would still be entries like aitch, so why would we treat the ones that happen to be the same as letters any differently? Chuck Entz (talk) 02:09, 13 May 2024 (UTC)
 * @Chuck Entz Makes sense. I think we should bring this up in the Beer parlour. Benwing2 (talk) 02:16, 13 May 2024 (UTC)
 * @Benwing2 @Chuck Entz I've discovered a major cause of the recent uptick: Module:headword/data, which is used by Module:headword, calls the function in Module:headword/page right at the end, which means the data table contains a load of computationally expensive data that only needs to be generated once for the whole page.
 * Normally this isn't an issue, but if the parameter is given to head, then  will be run again with that new pagename instead. This is usually only needed for testing, but a bunch of headword modules have been using the actual pagename as a default value for, which has meant  was being called about 15 times on a, adding a ton of unnecessary work.
 * I've amended Module:headword so that it only does that if is something other than the default value, but this really should be fixed in the calling modules, because they'll be wrongly overriding the proper default pagename calculated by Module:headword/data, which is non-trivial to determine for things like unsupported titles. Theknightwho (talk) 17:23, 13 May 2024 (UTC)
 * @Theknightwho Oops, that was my doing. I have meant to clean up the handling of pagename in the various modules I've worked on but I didn't realize it was causing this problem. I'll make it a priority to fix these. Can you point me to some of the modules doing this? Benwing2 (talk) 19:43, 13 May 2024 (UTC)
 * @Benwing2 I didn't look in much detail, but it was a bunch of Romance ones plus Tagalog iirc. Theknightwho (talk) 19:47, 13 May 2024 (UTC)
 * @Theknightwho OK thanks! Benwing2 (talk) 19:55, 13 May 2024 (UTC)
 * @Theknightwho The easiest way to clean this up is to keep your fix in place that avoids rerunning (which you can view as not a hack but a sort of caching fix), and change the modules to use  rather than  as the default pagename when pagename isn't explicitly given by the user. (This is already done by the Spanish headword module in fact.) Lots of the modules unilaterally set  in the Module:headword  structure passed from the main entry point to the POS-specific functions, and some of them need to be able to have access to the pagename (whether the actual pagename or user-specified proxy) to do POS-specific processing. The alternative is to have two different fields in the Module:headword  structure, one that is always set with the pagename and one which is set only when the pagename needs to be overridden. To me this seems an unnecessary complication given your recent fix, but if you think this is the right approach, I would recommend renaming the  field that is recognized by Module:headword to be . That makes it clear that this field should only be set when overriding the default pagename; otherwise the various headword modules would need some other field name to hold the always-available pagename, which would IMO be less than clear. What do you think? Benwing2 (talk) 05:52, 14 May 2024 (UTC)
 * @Benwing2 I think the first solution is preferable, to be honest.
 * A third thing that could help would be to separate out the stuff in Module:headword/page that relies on the pagename from the stuff that doesn't, but that would be pretty faffy. Theknightwho (talk) 16:44, 14 May 2024 (UTC)
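The caching fix discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not the Lua in Module:headword; the class name `HeadwordData`, the method names, and the stand-in "expensive" computation are all hypothetical. The point it shows is that the costly page processing runs once for the real pagename and is rerun only when a caller passes a pagename that genuinely differs from the default.

```python
from typing import Optional


class HeadwordData:
    """Hypothetical stand-in for per-page data that is costly to build."""

    def __init__(self, actual_pagename: str) -> None:
        self.actual_pagename = actual_pagename
        self.expensive_calls = 0  # how often the costly step really ran
        # Computed once for the whole page.
        self.page = self._process_page(actual_pagename)

    def _process_page(self, pagename: str) -> dict:
        # Placeholder for the expensive work; the real computation is not shown.
        self.expensive_calls += 1
        return {"pagename": pagename, "chars": len(pagename)}

    def page_for(self, pagename: Optional[str] = None) -> dict:
        # Reuse the cached data unless the caller supplies a different pagename.
        if pagename is None or pagename == self.actual_pagename:
            return self.page
        return self._process_page(pagename)


data = HeadwordData("a")
for _ in range(15):
    data.page_for("a")  # headword modules passing the actual pagename as default
print(data.expensive_calls)
```

In this sketch, callers that pass the actual pagename as a default no longer trigger recomputation, while a real override (for example, for testing) still does.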

привыкнул
Would you mind undeleting привыкнул? It is an archaic form of привык. Source: Ushakov Dictionary, online:, quote: ПРИВЫКНУТЬ, привыкну, привыкнешь, прош. привык, привыкла, и (устар.) привыкнул, сов. (к привыкать). For context, "устар." is short for "устаревшее" (neuter form of устаревший) (see ru:Викисловарь:Условные_сокращения). This archaic form can be encountered in older books. For example, it shows up five times in the Russian National Corpus: corpus search results. —andrybak (talk) 10:56, 15 May 2024 (UTC)


 * It's not even archaic, I mainly use this form. Thadh (talk) 13:01, 15 May 2024 (UTC)
 * @Thadh, same, but this would be original research ;-) —andrybak (talk) 13:13, 15 May 2024 (UTC)
 * @Andrybak I undeleted it and marked it as archaic. Benwing2 (talk) 23:21, 15 May 2024 (UTC)

K
K 194.71.19.145 08:42, 20 May 2024 (UTC)


 * ? Benwing2 (talk) 09:02, 20 May 2024 (UTC)

Phrasal verbs with forward
Your bot deleted these, but we really should include them. Any fix? Denazz (talk) 09:55, 20 May 2024 (UTC)
 * There may be other particles missing too, as there used to be over 5000 entries in Category:English phrasal verbs. Would be a bummer to add them all again manually. Denazz (talk) 10:02, 20 May 2024 (UTC)
 * I'm referring to the fact that Category:English phrasal verbs formed with "forward" is empty. Denazz (talk) 10:04, 20 May 2024 (UTC)
 * @Denazz This is easy to fix without adding all the manual categorization; there is a list of recognized particles in Module:en-headword. Please check out the list and let me know if anything is missing. Benwing2 (talk) 00:26, 21 May 2024 (UTC)
 * Easy to fix for the programmers like you! I don't understand modules, Ben, and see no "list of recognized particles". You'll need to guide me to the list, I'm afraid. Assume I'm a moron, and you won't be far off! Denazz (talk) 14:52, 21 May 2024 (UTC)
 * OK, the list is easy to find on that page, but it is protected. I suggest adding "asunder" to the list. Denazz (talk) 14:56, 21 May 2024 (UTC)
 * I think asunder,forward,forwards,adrift,aground,backwards should be added. Denazz (talk) 14:58, 21 May 2024 (UTC)
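For anyone unfamiliar with how a particle list drives categorization, here is a rough sketch in Python. It is not the code in Module:en-headword; the function name, the starting particle set, and the matching rule are assumptions for illustration. Only the six proposed particles and the category name pattern come from this thread.

```python
# Illustrative starting list; the real list in the module is not reproduced here.
RECOGNIZED_PARTICLES = {"up", "down", "out", "off"}

# Particles proposed in this thread.
PROPOSED_PARTICLES = {"asunder", "forward", "forwards", "adrift", "aground", "backwards"}


def phrasal_verb_categories(term: str, particles: set) -> list:
    """Return category names for each recognized particle after the verb."""
    words = term.split()
    return [
        'English phrasal verbs formed with "%s"' % word
        for word in words[1:]  # skip the verb itself
        if word in particles
    ]


print(phrasal_verb_categories("put forward", RECOGNIZED_PARTICLES))
print(phrasal_verb_categories("put forward", RECOGNIZED_PARTICLES | PROPOSED_PARTICLES))
```

With only the illustrative starting list, "put forward" gets no category; once the proposed particles are included, it is categorized under "forward".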

A minor issue.
is the altform of and has two descendants. and are descendants of, so under that Sanskrit entry I put  , which should have shown ,  and the two descendants of. It is doing exactly that, except for one mistake: is now appearing twice. Could you please look into the issue when you have time? -- 𝘗𝘶𝘭𝘪𝘮𝘢𝘪𝘺𝘪(𝘵𝘢𝘭𝘬) 01:52, 22 May 2024 (UTC)

IPA|a=
Hi, now that IPA takes a for accents, please edit x2IPA to take it as well so that inputs like en work. Thanks! —Mahāgaja · talk 12:47, 23 May 2024 (UTC)


 * @Mahagaja Should be fixed. Benwing2 (talk) 18:51, 23 May 2024 (UTC)

Replacement for
Hi, I have overhauled. Could you please carry out the following bot replacement?


 * 1) If a use includes 2nd, don't make any changes.
 * 2) Otherwise, add 1873 to the use.
 * 3) If a use has an Arabic or Roman numeral in the 1 position, add book to it.
 * 4) If a use includes url, remove it as that parameter is no longer in the template.

Thank you. ( for your information.) — Sgconlaw (talk) 15:22, 27 May 2024 (UTC)


 * @Sgconlaw I implemented (4) for the time being to reduce the errors in CAT:E. The others will come shortly. Benwing2 (talk) 23:42, 27 May 2024 (UTC)


 * Just a gentle reminder that there are still 91 entries in CAT:PFE because of this. Chuck Entz (talk) 14:48, 29 May 2024 (UTC)
 * @Chuck Entz OK, I'll take a look. Benwing2 (talk) 18:34, 29 May 2024 (UTC)
 * Spot-checking CAT:PFE, they seem to all be due to Roman numerals in 1. Of course, anything with Arabic numerals for the book number in 1 won't show up in CAT:PFE, so I wouldn't see those. I don't know if there are any that have page numbers already. Perhaps just flag those with ≤12 and skip >12, since there are no more than 12 books in any edition.
 * One additional step, though: changing 1 to book will cause 2, if present, to become 1. That means the former 2 would need to have passage added in those cases. Just to be clear and thorough:
 * {| class="wikitable"
! Parameter 1 !! Action !! Parameter 2 !! Action
|-
| none || none || none || none
|-
| none || none || text || none
|-
| Roman numeral || add "book=" || none || none
|-
| Roman numeral || add "book=" || text || add "passage="
|-
| Arabic numeral || not sure: add "book="? || none || none
|-
| Arabic numeral || not sure: add "book="? || text || if "book=" added to Parameter 1, add "passage="
|}
 * Chuck Entz (talk) 01:10, 1 June 2024 (UTC)
 * Thanks, I've been punting on this because (a) I need to verify and enumerate all the cases, which you seem to have done, (b) I need to write a script (I have a script to handle most requests from User:Sgconlaw, but this one has special logic in it). Thanks for writing out the steps. Benwing2 (talk) 01:13, 1 June 2024 (UTC)
 * The template can deal with either an Arabic or Roman numeral as a value for book. — Sgconlaw (talk) 04:57, 1 June 2024 (UTC)
 * Wonder if you've had a chance to work on this? — Sgconlaw (talk) 21:40, 17 July 2024 (UTC)
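The parameter logic from the table above, plus step (4) of the original request, could be sketched like this. It is a minimal Python illustration, not the actual bot script: the function names and the dict representation of template parameters are assumptions, and steps (1) and (2) are left out because their exact parameter names are not recoverable from this thread. Arabic numerals are treated as book numbers only up to 12, following the suggestion that no edition has more than 12 books.

```python
import re

# Matches a non-empty Roman numeral, in either case.
ROMAN = re.compile(
    r"(?=[IVXLCDM])M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
    re.IGNORECASE,
)
MAX_BOOKS = 12  # assumption taken from the discussion above


def is_book_number(value: str) -> bool:
    """True for a Roman numeral, or an Arabic numeral from 1 to MAX_BOOKS."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return 1 <= int(value) <= MAX_BOOKS
    return ROMAN.fullmatch(value) is not None


def migrate(params: dict) -> dict:
    """Return a copy of the template parameters with the rewrite applied."""
    new = dict(params)
    new.pop("url", None)  # step (4): the url parameter is no longer supported
    first = new.get("1")
    if first is not None and is_book_number(first):
        del new["1"]
        new["book"] = first.strip()
        # Former parameter 2 would otherwise shift to position 1.
        second = new.pop("2", None)
        if second is not None:
            new["passage"] = second
    return new


print(migrate({"1": "IV", "2": "some quoted text", "url": "http://example.org"}))
print(migrate({"1": "350"}))
```

Values such as 350 in parameter 1 are left alone here on the assumption that they are page numbers; a real run would probably want to log those cases for manual review.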

Template:label
We have to come up with a better way to show the labels. Currently the template page is periodically popping up in CAT:E, and just now I got a server timeout while doing a null edit with previewing enabled. It's only a matter of time until the module error becomes permanent. Perhaps the list could be split up into separate subpages linked to from the documentation page. Chuck Entz (talk) 14:46, 29 May 2024 (UTC)


 * @Chuck Entz Yes, I agree. I just previewed it and it took 3 tries to get it not to time out. From looking at the list, it looks like maybe we could split it three ways: (a) language-independent, (b) language-dependent A-G, (c) language-dependent H-Z. Benwing2 (talk) 18:38, 29 May 2024 (UTC)
 * @Chuck Entz I turned off label tracking while processing the large table of labels, and now the CPU usage is around 6 secs, the memory usage is about 64MB and the post-expand size is around 1.4MB; all well below the limits. So it may not be necessary to split the labels for a while. Benwing2 (talk) 04:44, 30 May 2024 (UTC)

Replacement of quotation templates (May–July 2024)
Hi, could you please carry out the following replacements:


 * → . The parameter 1678 is no longer used and should be removed.
 * → (if year is not currently used). Also, remove all instances of the chapter parameter as it is not used by the template any more.
 * If no year, add.
 * If 1580 appears, change to.
 * – remove the chapter parameter which is no longer used.
 * If 1580 appears, change to.
 * – remove the chapter parameter which is no longer used.

Thank you. (, for your information.) — Sgconlaw (talk) 18:12, 29 May 2024 (UTC)

Wondering if you've had a chance to work on these? I added one more. — Sgconlaw (talk) 21:39, 17 July 2024 (UTC)


 * @Sgconlaw Oops, I totally overlooked this. Let me do them now. Benwing2 (talk) 22:21, 17 July 2024 (UTC)
 * Much obliged! — Sgconlaw (talk) 22:32, 17 July 2024 (UTC)
 * @Sgconlaw You mentioned above that year=1678 for RQ:Butler Hudibras should be removed, but in windore the year is 1663. Benwing2 (talk) 23:11, 17 July 2024 (UTC)
 * That was a typo, as 1663 doesn’t do anything in the template as far as I’m aware. — Sgconlaw (talk) 05:06, 18 July 2024 (UTC)
 * @Sgconlaw OK, everything else should be done. Let me know if you see any errors or issues. Benwing2 (talk) 05:25, 18 July 2024 (UTC)
 * Thanks! By the way, I also asked about above. — Sgconlaw (talk) 16:30, 18 July 2024 (UTC)
 * Oops, I missed that too :( ... Benwing2 (talk) 20:44, 18 July 2024 (UTC)

Coding help
Hello,

I created some categories (Theodore Roosevelt, Wars) and I need help integrating them into the module/cat tree (because Wiktionary uses a module and not just categorization like everywhere else 😡😡😡). Do you do that sort of thing? I filed a couple requests on Grease Pit but nobody replied. Purplebackpack89 13:54, 30 May 2024 (UTC)
 * Your request there reads like something an executive would leave on the voicemail of an assistant. It would help if you made a case for it. Otherwise the unspoken gut response is going to be "did you want fries with that?". Chuck Entz (talk) 15:09, 30 May 2024 (UTC)
 * I'm going to interpret your comment as, "make a case for the category's existence". The Theodore Roosevelt category should exist because there are more than a dozen words about, created by, or popularized by him. The Wars category should exist because there are quite a few proper nouns that are Wars and contain the word "War". Purplebackpack89 16:20, 30 May 2024 (UTC)

Yeak Laom
I am a newbie at using the place template- could you please improve its formatting in this entry to enable better categorization? Thank you! Inqilābī 17:00, 2 June 2024 (UTC)


 * @Inqilābī It generally looks good to me. I made a minor tweak to add the words municipality and province for clarity. Communes aren't currently categorized separately. If you want lakes categorized, you could write it like this (it's not categorizing currently because it doesn't recognize the words volcanic or crater as qualifiers):
 * en.
 * Benwing2 (talk) 00:33, 3 June 2024 (UTC)

dont tread on me RfD...suggest immediate withdrawal
I think you really need to withdraw that RfD. First off, why do you start the RfD with "an entry created by Purplebackpack89"? What the hell's that got to do with anything? It smacks of treating entries I create in bad faith. Please remember to assume good faith with all editors. Also, what research did you do on historical use of the term? Because I DID do research on the term and rendering it that way IS commonly used, it's not some random misspelling as you seem to allege. Purplebackpack89 23:31, 2 June 2024 (UTC)


 * @Purplebackpack89 If you disagree with an RfD, the procedure for doing so is not to demand that the creator withdraw the RfD. Instead you should post a response justifying the term. As for tagging you, I routinely do that in RfD's as a courtesy to the creator of the page that their term is being submitted to RfD or RfV. No bad faith is intended. Benwing2 (talk) 23:51, 2 June 2024 (UTC)
 * I've also provided rationale at the RfD. Please read it. Purplebackpack89 23:54, 2 June 2024 (UTC)
 * You're down 5-1 on this one...sure you don't want to withdraw? Purplebackpack89 21:11, 3 June 2024 (UTC)
 * Dude. Chill. For real. Benwing2 (talk) 00:00, 4 June 2024 (UTC)
 * or...you could admit that maybe that RfD was a bad idea? Purplebackpack89 05:12, 5 June 2024 (UTC)

Weird WingerBot bug with translations
Hi,

I just wanted to let you know that your bot seems to have made a strange formatting change on the entry left-wing (I reverted it) causing all the translations to show up in the gloss field. I'm not sure if this has happened in any other pages, but you might want to take a look. LOOKSQUARE (talk) 20:37, 4 June 2024 (UTC)


 * @LOOKSQUARE I think this was a one-off issue related to the strange formatting of the trans-top line. Looks like User:Fenakhay corrected the formatting. Benwing2 (talk) 03:05, 5 June 2024 (UTC)

Persian Audio labels
Hi, is it possible to add the parameter  to all audios recorded by User:Darafsh? And honestly, to all audios that are imported from Persian Wiktionary as well: Afghanistan is not as connected to the internet as Iran is, so it's pretty safe to assume all audios from Persian Wiktionary are in the Iranian dialect. The bot that is mass importing from Persian Wiktionary is not labeling any of the recordings (I don't believe it is your bot, though).

But, if you want to be safe, I suspect that almost all Persian audios are only recorded by a handful of people; If that's the case, could the bot like, make a list? Then I can listen to the audios recorded by that user to confirm if they are Iranian. — S AMEER  (؂・؄・؏) 03:47, 5 June 2024 (UTC)


 * @Sameerhameedy Sure. Are you looking for a list of all Persian audios, or a list of all users that created Persian audios (which would have to be based on the file names), or both? Benwing2 (talk) 03:50, 5 June 2024 (UTC)
 * I was hoping it was possible to compile a list of all users who created audio files. Then I could confirm all users speak the Iranian dialect before the bot began to label what dialect they're speaking in the audio.
 * If that's not possible, I'm not sure what the best course of action is to reduce the amount of audios that need review... but the large number of unlabeled audios is a bit problematic — S AMEER  (؂・؄・؏) 04:21, 5 June 2024 (UTC)
 * @Sameerhameedy I recently made a dump of all occurrences of audio (661,680 of them) for cleaning up their captions. There are 1,028 Persian audio files. 701 of them come from Lingua Libre, from only four authors: Afsham23 (not all are labeled but the labeled ones all say "Iran" or "Iranian" in some variant, except for میخ that says Dari); Darafsh (most aren't labeled; the labeled ones say "Iran" or "Iranian"); Mazanin (same labeling issues as with Darafsh); and a single audio معماری from Soroor (Opsylac), labeled "Iran". The remaining 327 are anonymous, and mostly have the format ; about half are labeled "Iran" or "Iranian" and the others are unlabeled, except for and  labeled Tehran. If you want I can get you a sorted list of all the audio files so you can listen to a sample of them and verify that they're Iranian. Benwing2 (talk) 04:47, 5 June 2024 (UTC)
 * After listening to their audio recordings, I can confirm that all 4 people you mentioned had Iranian accents and all audio recordings by them can be safely labeled as "Iranian Persian".
 * Yes, if you could make a list I can give them a listen and try to pick out any that are not Iranian. Admittedly there might be some ambiguous cases if I don't have multiple audio samples per person, but I should be able to determine most. — S AMEER  (؂・؄・؏) 05:31, 5 June 2024 (UTC)
 * See User:Benwing2/fa-audio. Benwing2 (talk) 05:43, 5 June 2024 (UTC)
 * lol I'm really sorry, but my browser keeps crashing when I open the link. It looks like all the audios are there though, is it possible to remove all audios that are already identified? If not, it's fine. I'll try to open it on my computer or something, it should be able to handle it. — S AMEER  (؂・؄・؏) 06:05, 5 June 2024 (UTC)
 * Sure, I'll remove those that are already identified. Benwing2 (talk) 06:08, 5 June 2024 (UTC)
 * @Sameerhameedy See User:Benwing2/fa-audio-no-iran-1 and User:Benwing2/fa-audio-no-iran-2. I removed the audios identified as Iran or Iranian, which cuts out about 25%, and split the remainder into two. Let me know if you're able to open them. Benwing2 (talk) 06:12, 5 June 2024 (UTC)
 * yes thank you that works perfectly. Since we're having a bot add the labels, I'll just delete all the Iranian audios as I go through the list. I'll let you know when I am done. — S AMEER  (؂・؄・؏) 06:45, 5 June 2024 (UTC)
 * @Sameerhameedy Sounds good. Benwing2 (talk) 08:45, 5 June 2024 (UTC)
 * @Benwing2, I've finished listening to all the audios. They are all Iranian speakers. I did upload some audios (of a family member speaking) a few days ago with dialect labels. You intentionally removed those, right? Or did I miss them?
 * Anyway, we can safely label all those audios as Iranian. (though could the bot also make sure they are below fa-IPA and not above it?) — S AMEER  (؂・؄・؏) 23:41, 5 June 2024 (UTC)
 * @Sameerhameedy The list of audio files I generated finished on June 1, so if you uploaded anything after that, it will have been missed. I will label all the Persian audio files I found as Iranian (and make sure they are below fa-IPA). Benwing2 (talk) 03:01, 6 June 2024 (UTC)
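The per-author breakdown described above (701 Lingua Libre files from four authors, 327 anonymous) can be approximated from file names alone. The sketch below is a Python illustration under stated assumptions: it assumes Lingua Libre names of the form `LL-Q<id> (<code>)-<speaker>-<word>.<ext>`, it will misparse words that themselves contain a hyphen, and the sample file names are made up. It is not the script that produced the dump.

```python
import re
from collections import Counter

# Assumed Lingua Libre naming pattern; anything else is counted as anonymous.
LL_PATTERN = re.compile(
    r"^LL-Q\d+ \([a-z]{3}\)-(?P<speaker>.+?)-(?P<word>[^-]+)\.\w+$"
)


def tally_speakers(filenames):
    """Count audio files per speaker, grouping unrecognized names together."""
    counts = Counter()
    for name in filenames:
        match = LL_PATTERN.match(name)
        counts[match.group("speaker") if match else "(anonymous)"] += 1
    return counts


# Made-up sample names for illustration only.
sample = [
    "LL-Q9168 (fas)-Darafsh-آب.wav",
    "LL-Q9168 (fas)-Darafsh-نان.wav",
    "LL-Q9168 (fas)-Soroor (Opsylac)-معماری.wav",
    "some other recording.ogg",
]
for speaker, count in tally_speakers(sample).most_common():
    print(speaker, count)
```

A tally like this is only a starting point for review, since it cannot tell which dialect a speaker uses; that still needs someone to listen to a sample from each speaker.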

Category:Pages using duplicate arguments in template calls
From audio. —Fish bowl (talk) 04:53, 5 June 2024 (UTC)


 * @Fish bowl Thanks. I just encountered these myself, will fix. Benwing2 (talk) 04:54, 5 June 2024 (UTC)

Creating automatic IPA module for Northern Mansi
Hey there! I thought I'd ask you for some help in creating the automatic IPA module for Northern Mansi, since I cannot code this type of module myself. I can give you all the info about Mansi pronunciation. Ewithu (talk) 11:16, 6 June 2024 (UTC)

editing again
Hi Benwing,

RichardW57m commented in one of the discussions that "And adding mul to an existing entry is a valid process."

Is that the way to go? Or if there's a pattern of errors, should it be a consolidated mass request? kwami (talk) 00:47, 10 June 2024 (UTC)


 * @Kwamikagami It would be best not to mass-add the same rfc tag to a bunch of entries. Instead, make a list of all the entries you have issues with and bring that list to the Beer Parlour with specific proposals for how to fix the issues. Benwing2 (talk) 01:01, 10 June 2024 (UTC)
 * Okay, noted on my page. kwami (talk) 01:13, 10 June 2024 (UTC)

Split Yue and Split Wu
Hi, please have a look at these two threads. Are the splits ready? --kc_kennylau (talk) 13:24, 10 June 2024 (UTC)

Getting rid of the old category boilers
I'm going to do some reworking of Module:auto cat so that it no longer relies on keeping the old category boilerplate templates around, since none of them are used directly anymore so there's no reason not to just do everything via direct function calls. This affects poscatboiler, topic cat and ws topic cat. Theknightwho (talk) 14:45, 22 June 2024 (UTC)


 * @Theknightwho Agreed. I didn't even know about ws topic cat. Benwing2 (talk) 18:15, 22 June 2024 (UTC)

A favour
Hi Benwing,

I've been wanting to set up an auto-conjugator for Franco-Provençal. Unfortunately I don't quite understand how the ‘architecture’ you've set up for French works as it is a bit elaborate and spread across multiple modules/templates.

I was wondering, would you mind perhaps setting up a basic architecture for Franco-Provençal by recycling some of the code used for French? The French verb-endings (etc) can be kept as placeholders for me to replace/modify.

Best,

- Nicodene (talk) 06:18, 23 June 2024 (UTC)


 * @Nicodene The conjugator for French is in any case non-ideal; it was inherited from older code written by User:Kc kennylau and hacked on by many people. It is really messy and IMO it needs a complete rewrite along the lines of Module:es-verb, Module:pt-verb, Module:gl-verb, Module:ca-verb or the like (which are not split across multiple modules and templates). Unfortunately I don't know the first thing about Franco-Provençal and creating such modules is not as easy as merely copying another module and changing the endings; usually rather more fundamental changes are needed. On top of that, I don't even know if Franco-Provençal is standardized in its spelling; if not, that's a whole nother can of worms. If you point me to detailed (and ideally complete) information on how Franco-Provençal verb endings and irregular verbs work, I will take a look and will have a better idea how to create such a module, but I'm not guaranteeing it will get done on any specific time frame. Benwing2 (talk) 07:00, 23 June 2024 (UTC)
 * @Benwing2 I see.
 * There is a standard ‘supra-dialectal’ spelling system that is widely enough accepted to have been used e.g. for an article in the JIPA. A fairly comprehensive description of the spelling system is available here. The same source also provides, among other things, a description of Franco-Provençal morphology. I'll go ahead and write up a summary, in English, of how the verbs work. This will probably take a day or two. Nicodene (talk) 07:23, 23 June 2024 (UTC)
 * @Nicodene Great, that will be very helpful. Benwing2 (talk) 07:31, 23 June 2024 (UTC)
 * It's done. Conjugations 1 and 2 can be automated (apart from the handful of exceptions mentioned). For conjugation 3 the verbs will require their own individualized tables.
 * I've also included a description of noun and adjective morphology, as well as the orthographic consonant changes caused by changes in following vowels. This is already more or less handled by Module:frp-headword, which I cloned from the French counterpart and modified a bit, but perhaps it can be done more ‘cleanly’ or efficiently. Nicodene (talk) 22:33, 1 July 2024 (UTC)
 * @Nicodene Awesome, thanks! The way that the equivalent of conjugation 3 verbs are handled in French (i.e. irregular verbs in -re/-ir/-oir) is through principal parts, along with rules that default certain principal parts to others if not explicitly given. (Same goes for the Spanish, Portuguese, Galician and Catalan verb modules.) Probably the same can be done here. Benwing2 (talk) 22:46, 1 July 2024 (UTC)
 * @Benwing2 I see. I've yet to (definitively) complete the third-conjugation verb inflections but there does seem to be a fair amount of predictability.
 * You'll notice there is quite a bit of allomorphy. For instance ‘I would finish’ would be any of the following: fenirê, fenirên, fenitrê, fenitrên. In principle this could be condensed to feni(t)rê(n), at the cost of looking a bit unpleasant and also messing up any links. Alternatively we could perhaps shove some of the allomorphy into footnotes.
 * For the tables we already have Category:Franco-Provençal verb inflection-table templates as a starting point. All of these are incorrect in one way or another but the layout seems usable. Nicodene (talk) 23:02, 1 July 2024 (UTC)
 * Yeah the table layout looks fine, and similar to the layout of other Romance languages. As for allomorphy, that also happens e.g. in the reintegrationist Galician norm. See for a typical example, or  for an example with up to five possible forms (in the 2nd plural pluperfect). Since the source dictionary for reintegrationist Galician identified some forms as more or less recommended, we followed suit in putting the less recommended ones in italics with a footnote. If such a distinction is made here, we could do that too, or just list the possible forms as long as there aren't too many. Benwing2 (talk) 23:41, 1 July 2024 (UTC)
 * I suppose it's not too bad. We can go ahead and list the forms consecutively, in whatever order.
 * For the forms in question, the source doesn't seem to explicitly recommend one over another. My copy of Stich 2003 (a paperback dictionary) should arrive in a few weeks - if that has anything to say on the matter we can adjust accordingly.
 * Nicodene (talk) 03:44, 2 July 2024 (UTC)
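The condensed notation feni(t)rê(n) mentioned in this thread can be expanded mechanically into the four full forms, which is one way a table could list each allomorph separately and keep every link intact. The following is only a minimal Python sketch of that idea; the function name `expand_optional` is hypothetical and is not part of any existing module.

```python
import itertools
import re


def expand_optional(pattern):
    """Expand a form with parenthesised optional segments into all variants.

    Hypothetical helper for illustration only; nested parentheses are
    not supported.
    """
    # re.split with one capture group alternates between literal text
    # (even indices) and the contents of each "(...)" group (odd indices).
    parts = re.split(r"\(([^()]*)\)", pattern)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])        # literal text is always present
        else:
            choices.append(["", part])    # optional segment: absent or present
    return ["".join(combo) for combo in itertools.product(*choices)]


forms = expand_optional("feni(t)rê(n)")
print(", ".join(forms))
```

Each expanded form can then be linked on its own, while the source data stays compact.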

WingerBot is changing language codes to the wrong code
Hi, as you can see in, while cleaning up entries to use the tcl template, your bot incorrectly changed two language codes (Central Nahuatl, nhn; and Czech, cs) to ncn, which corresponds to the Nauna language. Please correct your bot so it refrains from making such errors in the future. This edit happened a little over 7 months ago, so I don't know if any other pages were affected, but it seems probable.

Thank you. Brusquedandelion (talk) 06:40, 25 June 2024 (UTC)


 * @Brusquedandelion Thanks for the bug report. This was probably a one-off error; anything that says 'manually assisted' by it was done by me manually editing the respective pages offline in a text editor. In these cases there isn't a bot script responsible. Benwing2 (talk) 06:47, 25 June 2024 (UTC)
 * Gotcha. Thanks! Brusquedandelion (talk) 20:01, 25 June 2024 (UTC)

caricature form
Thinking about debbil, ebery, etc, it occurs to me that finding wording that will look grammatical in a range of situations (templates) is non-trivial. This would work in lb, but in pronunciation spelling of, it'd display "a caricature of Black speech English": perhaps you know how to make it instead display "a caricature of Black speech in " in that environment, to produce "a caricature of Black speech in English", or else I suppose suppress "speech" so it produces "...Black English"? Such a label also doesn't look good in altform's from=, but perhaps we just have to not use it in altform, unless you can think of a wording that would work there (either for use in that particular family of templates, or a better general-use wording); it does seem like most Remus-esque spellings are currently handled as pronunciation spelling ofs, not altforms(?). (I also considered whether to say "a caricature..." or "caricatures...".) - -sche (discuss) 20:47, 25 June 2024 (UTC)


 * There is support in Module:labels for changing the display form, Wikipedia link, categorization etc. depending on the "mode", where "mode" means how it's being called (whether from lb, accent qualifiers or form-of templates). You can see an example here Module:labels/data/lang/la which changes the display and Wikipedia link for the label when called from a or from the various a and aa parameters for specifying accent qualifiers. In this case, if you add a key, it overrides the value of  for form-of templates, so maybe you can add  or  or something. Benwing2 (talk) 20:55, 25 June 2024 (UTC)


 * Thanks for explaining how display-changing works. I can't think of a way to make both T:altform/T:altspell and T:pronunciation spelling of display sensibly, unless there is a way to send a different value to the former vs the latter:
 * if I set  then T:pronunciation spelling of can display "Pronunciation spelling of every, representing a caricature of Black English.", but T:altform displays "A caricature of Black form of every",
 * if I set it to  instead, then T:altspell and to a lesser extent T:altform display decently, "A caricature of a Black pronunciation spelling of every.", but then T:pronspell displays badly
 * I suppose this is not necessarily a problem: I'm sure we have other "labels" which are only intended for use in one place and not another (e.g. it'd make little sense to stick "RP" in T:lb). And on a balance, I guess we should prioritize having T:pronunciation spelling of display well, and consider that to be the correct template to use for these (and consider using T:altform to be substandard)...? I was given pause for a moment by the fact that these don't represent anyone's actual pronunciation, but I see we use T:pronunciation spelling of for other potentially irreal pronunciations, like representations of Canadian or German accents that don't always reflect how a Canadian or German accent actually sounds.
 * You can check how it looks now in ebery, and in T:lb in debbil. I think maybe putting it in from= is better than putting it in a label; what do you think? - -sche (discuss) 23:24, 25 June 2024 (UTC)
 * I think I agree with you that using from is better. However it's always possible (and not hard) to add another "mode" to distinguish pronunciation spelling of (and any other templates behaving similarly?) from altform. I don't think there are any labels currently using so there are no compatibility issues in any case to worry about. (In general I prefer having one label to represent a single semantic concept rather than multiple labels for different templates, where you need to remember which label goes with which template.) Benwing2 (talk) 23:40, 25 June 2024 (UTC)
 * OK. As far as I'm concerned, we can hold off on adding another "mode" for now, since I'm not sure it's needed here: my idea above for how to make T:altspell+from= display something grammatical basically entailed using the fact that the template generates "Foo spelling of bar" + supplying "pronunciation" as part of the from=, to make it display "...pronunciation spelling of...", which is . . . just making it an ersatz version of pronunciation spelling of, ha; maybe we can just consider using this label in T:altspell substandard, for now, and let the awkward way it'd display be a sign of that. - -sche (discuss) 02:58, 26 June 2024 (UTC)
 * OK that's fine. Benwing2 (talk) 03:19, 26 June 2024 (UTC)
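As a rough illustration of the "mode" mechanism described in this thread, here is a minimal Python sketch of a label whose display text depends on how it is called. The label key, the field names `display` and `display_by_mode`, and the mode strings are all assumptions made for this example; they are not the actual fields of Module:labels.

```python
# Hypothetical label data: a default display plus optional per-mode overrides.
LABELS = {
    "Black speech caricature": {  # hypothetical label key
        "display": "a caricature of Black speech",
        "display_by_mode": {
            # Used when the label is passed to a form-of template, which
            # appends the language name after the label text.
            "form-of": "a caricature of Black speech in",
        },
    },
}


def display_for(label, mode):
    """Return the display text for a label in the given calling mode.

    Falls back to the default display when the mode has no override.
    """
    data = LABELS[label]
    return data.get("display_by_mode", {}).get(mode, data["display"])


print(display_for("Black speech caricature", "lb"))
print(display_for("Black speech caricature", "form-of") + " English")
```

Adding a further mode, for instance one that separates pronunciation-spelling templates from altform, would only need another key in the override table.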

Prakrit module errors
Hi! I found that non-mainspace pages like talkpages, user subpages, etc. using old codes like psu, pmh, etc. have module errors and have become unreadable due to their deprecation in favour of new etymology-only codes of these Prakrit lects. Would it be possible for you to change these by bot like it was done for the mainspace pages? —Svārtava · 01:44, 26 June 2024 (UTC)


 * I can do this although it will take some time before I get to it. Can you make a list of the old and new codes? I think there may have been two sets of old codes because I rationalized the etym-only codes at a certain point. Benwing2 (talk) 01:54, 26 June 2024 (UTC)
 * Yes, you're right. Here's the list:
 * Ardhamagadhi Prakrit:  /   →
 * Helu Prakrit:  /   →
 * Khasa Prakrit:  /   →
 * Magadhi Prakrit:  /   →
 * Maharastri Prakrit:  /   →
 * Paisaci Prakrit:  /   →
 * Sauraseni Prakrit:  /   →
 * It's certainly not very urgent, you can take your time. Thanks in advance. —Svārtava · 03:25, 26 June 2024 (UTC)

T:hi-ndecl
Hello! Is it currently possible to have both the phonetic respelling of the lemma and the Perso-Arabic broken plural in the Hindi declension template hi-ndecl? It seems like the template only supports one or the other but not both.

For example, at the phonetic respelling is  and the Perso-Arabic broken plural is. Although Perso-Arabic broken plurals are usually rare in Devanagari-script Hindustani, it still might be useful to display them on the lemma entry.

In 1 the syntax for phonetic respelling is
 * wiki

and the syntax for broken plurals in 1 is
 * wiki

When I try to combine the two as
 * wiki

I see both
 * the phonetic respelling below the page title
 * and the automated transliteration below the page title

separated by a comma in the singular column as
 * ज़लज़ला, ज़लज़ला
 * zalzalā, zalazalā

Am I not using the syntax correctly, or does something need to be fixed in the template hi-ndecl or the underlying module MOD:hi-noun? Kutchkutch (talk) 09:53, 26 June 2024 (UTC)
 * Can you try this?

wiki
 * Benwing2 (talk) 18:08, 26 June 2024 (UTC)


 * Thank you! That expression produces the output that I was expecting. It seems like you did not have to edit anything. That is good, since this has probably not taken too much of your time, and the template and module already had this capability.


 * Although there are only a few users who have added Perso-Arabic broken plurals to
 * hi-ndecl
 * and CAT:Hindi nouns with irregular plural stem
 * it might be helpful to mention this example at:
 * T:hi-ndecl/documentation
 * for future reference. Kutchkutch (talk) 05:18, 27 June 2024 (UTC)
 * Yup, I need to document the overrides. I'll use the above example as one; do you have any other examples of individual overrides? Benwing2 (talk) 05:20, 27 June 2024 (UTC)


 * Regarding manual overrides in general:
 * The only category that I am aware of to find existing manual overrides is:
 * CAT:Hindi nouns with irregular plural stem
 * and are the only Sanskrit learned borrowings in that category. The rest of the terms in that category are Perso-Arabic borrowings. For  and, the spelling of the lemma does not end in , but  appears in the spelling of the oblique plural stem.


 * Determining which declension patterns need examples at
 * T:hi-ndecl/documentation
 * may need some detailed consideration. If there are any interested Hindi editors, perhaps they could add important examples on that documentation page.
 * There are two phenomena that I can think of at the moment for which I do not know what the syntax should be to obtain the expected output.


 * Correct lemma auto translit, incorrect irregular plural auto translit:
 * For and, the automated transliterations of the lemma forms are correct. However, the automated transliteration of their irregular plurals
 * and


 * need to be respelled as
 * and


 * What would be the syntax for this? I tried
 * wiki
 * and wiki
 * but they do not produce the expected output.


 * See for additional context about  / :
 * In the literary language, animate nouns generally use the suffix ـان -ân


 * Multiple irregular plurals:
 * For &, the comma & space are included in the link. For example, the direct plurals of
 * and


 * have a single link for the entire list of forms:
 * wiki → नताइज, नताएज
 * wiki → जवाहिरात, जवाहरात, जवाहर


 * instead of a separate link for each form:
 * wiki → नताइज, नताएज
 * wiki → जवाहिरात, जवाहरात, जवाहर


 * needs to be respelled as.


 * For some context on these forms:
 * & are just alternative forms of the Perso-Arabic broken plural . However,
 * from are actually double broken plurals derived from the single broken plural
 * from where it says:
 * has become singular in Persian and so has its own plural forms despite its etymology as a plural form.
 * Kutchkutch (talk) 13:59, 27 June 2024 (UTC)

After your edit to and figuring out how to add multiple irregular plurals, the two issues mentioned above appear to be resolved. Kutchkutch (talk) 06:53, 28 June 2024 (UTC)


 * Great! I added an example of using but I will add some more. Benwing2 (talk) 06:56, 28 June 2024 (UTC)

Your undoing of my close was in error
There is no reason for the RfD at hawk tuah to continue. The IP who RfDed it hasn't even participated and no one has provided any rationale at all for deletion. Furthermore, no rationale exists to have it deleted via RfD at all; IDONTLIKEIT or ITHINKITSGROSS aren't rationales for deletion here. I ask you: what is the point of that RfD continuing?

Also, let's be perfectly honest here: if anybody but me had closed that, you wouldn't have undone it. You've somehow decided that I shouldn't be on this project and are attempting to harass me off it by undoing anything questionable. Please stop. <b style="font-family:Verdana"><b style="color:#3A003A">Pur</b><b style="color:#800080">ple</b><b style="color:#991C99">back</b><b style="color:#C3C">pack</b><b style="color:#FB0">89</b></b> 13:03, 7 July 2024 (UTC)


 * @Purplebackpack89 Dude. You need to Assume Good Faith or I *will* start blocking you. I have told you this before. No one but you speedy-closes RfD's after 4 days. I would recommend you not close any RfD's since you seem to believe all of them are in error and don't seem able to follow normal policy. Benwing2 (talk) 19:09, 7 July 2024 (UTC)
 * @Benwing2 I blocked them for 3 days for this, as they do this kind of thing in almost every single discussion they participate in, and it’s getting really disruptive. Check their contribution history, too: they make these kinds of threads on tons of users’ talk pages.
 * See their talkpage for more details on the block. Theknightwho (talk) 19:14, 7 July 2024 (UTC)
 * Bad and POINTY block by Theknightwho. "Almost every single discussion" is an exaggeration at best. Blocking somebody for criticizing a close, or an undo of a close, is highly inappropriate. Blocking somebody for feeling victimized is 1984 territory. Criticism is not disruption, and criticizing an admin should NOT be a blockable offense. If we blocked Theknightwho every time he criticized somebody or started a fight, he'd be indeffed.
 * As for, "No one but you speedy-closes RfD's after 4 days.", you almost did that yourself...until you had to walk it back once you realized how hypocritical it was. <b style="font-family:Verdana"><b style="color:#3A003A">Pur</b><b style="color:#800080">ple</b><b style="color:#991C99">back</b><b style="color:#C3C">pack</b><b style="color:#FB0">89</b></b> 14:22, 10 July 2024 (UTC)
 * @Purplebackpack89 I blocked you for precisely the kind of behaviour you have immediately started engaging in again, which has now escalated to false accusations of hypocrisy: Ben did not speedy close any thread, and didn't even say they had considered doing so. If you carry on causing drama like this, I will block you for a week, because it is disruptive. Theknightwho (talk) 22:17, 10 July 2024 (UTC)
 * You will do no such thing. You will instead stop getting your jollies by blocking me and find something else to do.
 * FWIW, the edit I am referring to is right here <b style="font-family:Verdana"><b style="color:#3A003A">Pur</b><b style="color:#800080">ple</b><b style="color:#991C99">back</b><b style="color:#C3C">pack</b><b style="color:#FB0">89</b></b> 23:57, 10 July 2024 (UTC)
 * Can you not read? Wonderfool speedy-closed the term, not me. Benwing2 (talk) 00:02, 11 July 2024 (UTC)
 * My mistake. But if Wonderfool did it, shouldn't he have also been blocked 3 days?  Or blocked indef because he's a sock?
 * Anyway: the main thing is that both of you need to stop blocking me, or considering blocking me, over essentially nothing. You and I disagree?  Whatever!  That's not grounds for a block!  <b style="font-family:Verdana"><b style="color:#3A003A">Pur</b><b style="color:#800080">ple</b><b style="color:#991C99">back</b><b style="color:#C3C">pack</b><b style="color:#FB0">89</b></b> 00:15, 11 July 2024 (UTC)
 * @Purplebackpack89 That isn't why you were blocked. Theknightwho (talk) 00:24, 11 July 2024 (UTC)
 * @Purplebackpack89 The problem is not that you and I disagree. The problem is that your constant assumptions of bad faith on the part of everyone but you are extremely disruptive. Please read and, as a lot of the points there apply to you. I'm giving you a last warning: next time you assume bad faith towards someone, you will be blocked for a week. Benwing2 (talk) 01:02, 11 July 2024 (UTC)

Optional aliases for "[language] foos" categories
I was thinking about Category:Four-character idioms by language, since it's not a very intuitive name, but I know we struggled to come up with something language-neutral that was better. However, I don't think we need to constrain ourselves to that name when it comes to the language-specific categories themselves, where it would make more sense to use "chengyu", "yojijukugo" or whatever. Would it be feasible for categories like this to have aliases, so that languages can use a more appropriate custom name where applicable? As another example, having Category:Japanese Han characters instead of "Japanese kanji" is just silly. Theknightwho (talk) 21:43, 8 July 2024 (UTC)


 * @Theknightwho Well, there is support for language-specific categories currently. You could use that to simulate aliases of sorts. For true aliases we might need a bit of coding; how do you envision them working? Would e.g. someone need to specify the category name as 'chengyu' or are you expecting that if they say it auto-redirects to 'chengyu'? Benwing2 (talk) 22:08, 8 July 2024 (UTC)
 * @Benwing2 So if you had category "foo" with alias "bar", you could create "Category:langname foos" or "Category:langname bars", but the latter would be treated as though it were the former for the purposes of subcategorisation and so on. I don't think it would be feasible to maintain some kind of special whitelist, so we'd just have to trust that people wouldn't create "Japanese hanja" or whatever. We could probably enforce this by doing existence checks against other possible aliases with auto cat. Theknightwho (talk) 22:23, 8 July 2024 (UTC)
 * @Theknightwho What do you mean "for the purposes of subcategorization"? BTW I'm not sure why whitelists or lists of the language-specific equivalents of given categories aren't feasible. We do something similar with pseudo-loan, which displays as pseudo-anglicism if the source is English, pseudo-Latinism if the source is Latin, etc. falling back to pseudo-loan from SOURCE for non-special-cased source languages. It even displays wasei eigo (appropriately linked) if the source is English and destination is Japanese. Benwing2 (talk) 22:42, 8 July 2024 (UTC)
 * @Benwing2 As in, "Category:Chinese chengyu" would be categorised as though it were "Category:Chinese four-character idioms", "Category:Japanese kanji" as though it were "Category:Japanese Han characters" etc. etc.
 * The reason I'm not keen on a whitelist is that terms like "chengyu" and "kanji" potentially apply to whole language families (Sinitic and Japonic), so it's just annoying to maintain, and I don't think there's much harm in someone technically being able to create "Chinese kanji", since if they're messing about like that then they're unlikely to care about details like this. I doubt we will need aliases in more than a handful of situations, but the number of languages which will need to use them is potentially in the dozens. Theknightwho (talk) 22:48, 8 July 2024 (UTC)
 * Why not do the whitelist at the family level then? Benwing2 (talk) 23:16, 8 July 2024 (UTC)
 * I guess. I've got some bugfixes to do elsewhere, but once those are done I'll have a look at it today.
 * On a semi-related note, there's preliminary consensus for having a Chinese equivalent to Category:Japanese kanji by reading, which will probably need to be covered in a similar way. It's slightly complicated by a couple of edge-cases (hanzi with polysyllabic readings (e.g. ) and non-hanzi with monosyllabic readings (e.g., , ), so I'm not certain on the exact name yet. Theknightwho (talk) 12:07, 9 July 2024 (UTC)
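The family-level whitelist idea from this thread could look roughly like the following Python sketch. The table names, the language-to-family mapping, and the function `canonical_category` are all hypothetical and are not how Module:auto cat actually works; the category names are the ones used as examples above.

```python
# Hypothetical mapping from language name to family.
FAMILY_OF = {
    "Chinese": "Sinitic",
    "Japanese": "Japonic",
}

# Hypothetical whitelist: for each family, alias label -> canonical label.
ALIASES_BY_FAMILY = {
    "Sinitic": {"chengyu": "four-character idioms"},
    "Japonic": {"kanji": "Han characters"},
}


def canonical_category(langname, label):
    """Return the category an aliased name should be treated as.

    Labels that are not whitelisted for the language's family are
    returned unchanged.
    """
    family = FAMILY_OF.get(langname)
    aliases = ALIASES_BY_FAMILY.get(family, {})
    return f"Category:{langname} {aliases.get(label, label)}"


print(canonical_category("Chinese", "chengyu"))
print(canonical_category("Japanese", "kanji"))
print(canonical_category("Chinese", "kanji"))  # not whitelisted for Sinitic
```

Keeping the whitelist per family means one entry covers every language in that family, and an unlisted combination such as "Chinese kanji" simply gets no alias treatment.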

A little guidance
Hi, if you have time, could you explain the easiest way to set up a conjugation module? I made Module:User:Babr/fa-conj (though I haven't started actual conjugations yet), but I can't get the table to display. I was hoping to make a module that can create conjugations for Dari, Iranian, Kabuli & Tehrani with one input.

I think I can probably do it myself, I was just hoping you could explain how to set it up. If you have time, that is. — B ABR ・talk 00:30, 9 July 2024 (UTC)


 * @Babr I can try to help you. How familiar are you with Lua, and coding in general? Benwing2 (talk) 00:36, 9 July 2024 (UTC)
 * Hmm, well I understand some very basic elements of Lua like string functions. That is to say, I'm definitely a beginner, lol. But, even with my very basic ability, I find replacing characters contextually is a bit easier to do in Lua than it is to do with dozens of templates, which is basically why I was hoping to make a luacized conjugation table. — B ABR ・talk 01:19, 9 July 2024 (UTC)
 * @Babr Well, you might start with MOD:hi-verb, which is about the most straightforward of the verb modules I've written. But even then there is significant work to do in creating a conjugation module and it goes well beyond just find and replace, because each language has its own system for how conjugations work and how to deal with irregularities, etc. Benwing2 (talk) 01:40, 9 July 2024 (UTC)