Wiktionary:Beer parlour/2023/April

The treatment of Proto-Austronesian?
(Just as a heads-up, I'm doing this merely as a reader pointing out the issues and inconsistencies in the Proto-Austronesian reconstructions, not as a fellow editor)

About_Proto-Austronesian states that “[e]ntry names should generally follow the scheme by Wolff.” and “Reconstructions by Blust should be generally alternative forms unless they a certain consonant does not appear in Wolff 2010”, but clearly the Category:Proto-Austronesian lemmas use *S, which should instead be written as *s per Wolff; likewise for *N (Blust) and *ɬ (Wolff). Also note that the often-referenced Austronesian Comparative Dictionary is by Blust and obviously follows Blust's reconstruction.

The quoted text was added in November 2021 by, but a lot of Proto-Austronesian entries were created before that and follow Blust; some newer entries follow the About page and hence Wolff (e.g. ), resulting in inconsistencies. Also pinging who seems to be the main editor of Proto-Austronesian.

(I should also mention that the order of listing descendants is often problematic – they should be nested, but not flat like in . A proper phylogeny of some sort should be established, but that is perhaps a separate discussion) – Wpi31 (talk) 10:35, 1 April 2023 (UTC)


 * Yes, I did add that line, but it's mostly due to some entries getting moved to follow the reconstructions by Wolff. It's worthy of discussion which reconstruction should be used on PAN entries, but I will agree PAN reconstruction should still use Blust's.
 * On the issue with PAN descendants list, there's really a lot of them so far as I'm concerned; it's mostly out of laziness. TagaSanPedroAko (talk) 20:07, 2 April 2023 (UTC)
 * Let's update the About page to follow Blust's notation. It is not perfect, but it is the common standard, and following Wolff would be insular. The only modification that most Austronesianists (including me) make is changing *e to *ə. Blust's "e" has been a source of much annoyance for users of his Austronesian Comparative Dictionary.
 * We have a phylogeny here for Austronesian that is pretty decent (except for my pet peeve, the hoax "Sunda-Sulawesi", and some wrongly placed languages, see User:Austronesier/sandbox), it just needs to be applied systematically. –Austronesier (talk) 19:25, 7 April 2023 (UTC)
 * I agree. Blust's e should be ə, and R should be ʀ. Kwékwlos (talk) 19:55, 7 April 2023 (UTC)

Latin gerunds
Last year I did some cleanup work on POS headers and moved a bunch (~27) of Latin gerunds from L2 "Gerund" to "Verb". just pointed out that Verb is the wrong header for these forms. However, "Gerund" is listed as "explicitly disallowed" in WT:EL. So, what should the header be? I found some prior discussion on BP, which didn't reach a conclusion. Jberkel 12:37, 1 April 2023 (UTC)


 * The 2019 discussion points out the problems for German gerunds, where the gerund is capitalized, but the verb is not. A summary of my point of view on Latin gerunds:
 * Latin gerunds aren't verbs: there is no number, no tense, no mood, and no conjugation. It has no characteristics of a Verb.  I can understand why Gerund isn't allowed for English, Spanish, or any modern language I'm familiar with, but for Latin Gerund is the best option.  But they are definitely NOT Verbs.
 * In English entries, gerunds are treated as nouns: e.g., "Walking is good exercise." In Latin, it's not that straightforward, and the model used at English walking fails to account for which senses of the verb transfer to the gerund, and makes it impossible to organize quotations.
 * Latin gerunds also have inflected forms, so there needs to be a lemma form that all the inflected forms can point to, and where the table of forms is listed. --EncycloPetey (talk) 15:39, 1 April 2023 (UTC)
 * Why not Participle? That is the standard header used for things that are combinations of verbs and nouns/adverbs/adjectives. Benwing2 (talk) 17:23, 1 April 2023 (UTC)
 * I think it's confusing to use "participle" to describe a gerund in Latin when the word participle usually refers specifically to another verb form.--Urszag (talk) 21:15, 1 April 2023 (UTC)
 * I'm not very impressed by the argument that "It has no characteristics of a Verb". The gerund is not a finite verb form, so it isn't surprising that it has no mood, no person conjugation, etc. However, the gerund can take some of the same kinds of complements and adjuncts that finite verbs can, such as a noun in the case governed by the verb (although rarely a direct object, since that is usually used with the gerundive instead) or an adverb. It is similar in usage to the infinitives, and we put the infinitives simply under the heading "verb".


 * As for English, it is poor use of terminology to refer to "walking" in contexts like "did assent to these walkings" as a gerund. Those kinds of words are not in fact gerunds; rather, they are derived verbal nouns (also called "deverbal nouns" or "gerundial nouns") with the same form as the gerund and present participle: a comparison in Latin would be the derived -tus or -tiō nouns. A gerund does not take an article or adjectival modifiers and can take a direct object, while a gerundial noun cannot take a direct object but can be used with an article or adjective(s) (or can be pluralized). E.g. "making" is a gerund (and thus a verb, not a noun) in "Making this machine is a difficult task"; it is a gerundial noun in "The making of this machine is a difficult task".--Urszag (talk) 21:15, 1 April 2023 (UTC)
 * I agree with Urszag. It is not true that the Latin gerund "has no characteristics of a Verb", as the Oxford Latin Syntax also notes explicitly on vol. 1, p. 59: "When one compares the behaviour of [the gerund] ulciscendi and [the verbal noun] ultionis in more detail, the verbal properties of ulciscendi become more apparent (it is found with a second argument, as in ['qui ... cupiditate inimicos ulciscendi arderent']), while its nominal properties appear to be more restricted." "Verb" (as a set of verb forms) or "Participle" are both fine IMO. —Al-Muqanna المقنع (talk) 22:11, 1 April 2023 (UTC)
 * Another indication of POS is that the first takes an adverb: "Correctly making this machine is a difficult task", while the second takes an adjective: "The correct making of this machine is a difficult task". Chuck Entz (talk) 22:18, 1 April 2023 (UTC)
 * In the current Latin verb conjugation tables, gerunds are described as verbal nouns, and they decline like nouns. --EncycloPetey (talk) 03:46, 2 April 2023 (UTC)
 * I still maintain we should use the header "Participle"; Latin has lots of participles, gerunds are one of them. I don't get your argument at all. If there were a single form called "participle" in Latin that would be different, but Latin has numerous types of participles, including some that don't normally have the word "participle" in their names, e.g. gerundives. Labeling them as "Verb" is even more confusing IMO because it implies that they are verb forms, and verb forms don't normally inflect at all; the whole point of having "Participle" as a separate POS is to express the fact that they are non-lemma forms that (typically) have their own inflections. Wiktionary's POS headings are intentionally cross-linguistic (that's why language-specific headings like "Gerund" aren't normally allowed), and cross-linguistically gerunds are closest to participles so we should use that header. Benwing2 (talk) 07:31, 2 April 2023 (UTC)
 * Also, if gerunds are verb forms, then what are inflections of gerunds? Verb form forms? This is nonsensical, but it makes perfect sense as "participle forms". Benwing2 (talk) 07:33, 2 April 2023 (UTC)
 * It's only nonsensical if you introduce a Platonic gerund form that they are inflections of, but there's no reason to do so—if we wanted to we could remove the gerund lemma and treat them all directly as inflections of verbs. The inflections of gerunds are just verb forms. The Oxford Syntax similarly just calls the gerund a paradigm of four verb forms. Like I said, though, my own view is that either "Verb" or "Participle" works, but it sounds like "Participle" is probably the easiest option for people to grasp. Since (as Fay Freak points out) you yourself wrote the incorrect description of the Latin gerund as a verbal noun and the Appendix pages are generally poorly maintained, I don't think that particular point is relevant. —Al-Muqanna المقنع (talk) 14:19, 2 April 2023 (UTC)
 * Yes, I wrote that Wiktionary text, but that description did not originate with me. See Wheelock's Latin, 6th ed., ch. 39, p. 276: "The "gerund" is a verbal noun...".  Also Latin via Ovid, 2nd ed., sxn 162, p. 282: "The gerund is a verbal noun..."  Also Allen & Greenough's New Latin Grammar, sxn 155: "The following Noun and Adjective forms are also included in the inflection of the Latin Verb:— [...] b. The Gerund: this is in form a neuter noun of the second declension." This section lists the Participles in a. (as have been discussed here) and in c. the Supine as a verbal noun. --EncycloPetey (talk) 17:36, 2 April 2023 (UTC)
 * Words like "participle" are used lots of ways. It's possible that the Latin gerund has at some point been called a participle (or in Latin, a participium); it does literally "partake" in both the nature of a verb and a noun. But I have never heard it called a participle or listed among the participle forms of Latin, and I have read explanations of Latin grammar that explicitly treat the gerund as a separate category from participles. (The gerundive, as you said, is sometimes considered to be a participle.) The word participle is most often used to refer to verbal adjectives, not verbal nouns. Latin has various participles, but the present participle, perfect participle, future participle and gerundive all share in common that they are verb forms that are used "like an adjective". This web page says "It is important to keep in mind the difference between a participle and a gerund".


 * I don't see why "verb forms don't normally inflect at all" should be considered some kind of inviolable rule. The inflectional categories of verbs differ greatly between languages and it isn't uncommon for there to be multiple intersecting types of inflection on verb forms (e.g. a verb form might simultaneously be inflected for mood, tense, person and number). Let's look at other languages with case inflection for comparison. Finnish has some infinitives that are inflected for case; we list them in the verb tables and under the heading "Verb" (e.g. puhumaan "illative of third active infinitive of puhua", puhumatta "abessive of third active infinitive of puhua"); some are listed separately both under the heading "Verb" and "Noun" (e.g. puhuminen). In other languages, such as Irish and Arabic, we apparently list verbal nouns just under the header "Noun" (e.g. bualadh, منظر). Are there any languages where we list a verbal noun under the header "Participle"?--Urszag (talk) 08:45, 2 April 2023 (UTC)
 * Also EncycloPetey wrote on Appendix:Latin gerunds in 2010 that Latin gerunds are verbal nouns.
 * Makes sense, since apparently the Arabic maṣādir, which are called Arabic gerunds, are also verbal nouns—also having the property of possibly, though rarely, as in Latin, taking a direct object in the accusative. And we definitely rightly categorize them as nouns, also for their formal relation to other forms that are mere nouns without such functions: like is well a plain noun without verb-like function even though a correct verbal noun, while other words of the same pattern as verbal nouns are only nouns, and there is nothing that would make verbal nouns morphologically distinct from other nouns bar non-base stem derivations.
 * Similarly, the “German gerund” is a verbal noun and a mere noun in another sense. Thus Latin gerunds should have the POS header “noun”, while “gerundives” are adjectives.
 * “Classifying” as “gerund” here is equally inconsequential as classifying as “participle”. I have also misused “participle” for Arabic active and passive participles once, as on ; they merit the header “Adjective”. German Wikipedia sees the analogical relation: “Die Bezeichnung ‘Partizip’ und ebenso die deutsche Bezeichnung Mittelwort bringen diese Eigenart zum Ausdruck, an zwei Kategorien zugleich teilzuhaben, nämlich Verb und Adjektiv. In ähnlicher Art gibt es auch Zwischenstufen zwischen Verb und Substantiv, die als Gerundien bezeichnet werden.” In either case the properties the former verb is converted into are decisive for lexical class.
 * Due to this like situation, we may also criticize that WT:EL disallows the header “gerund” but allows “participle”: both are dispensable, as  shows; we have the “participle” used in periphrastic tenses as “verb”, then the gerund as “noun”, then again the participle as “adjective”. “Participle” and “gerund” are terms to describe certain morphological derivations of verbs (they share forms in English, syncretism), so they are useful for breviloquent definitions and etymologies but do not ultimately represent word classes. Fay Freak (talk) 09:22, 2 April 2023 (UTC)
 * Appealing to other languages here (including English) might confuse more than it helps. Regardless of the conventions of Arabic and German grammar, and the conflation of the two in non-specialist sources, in Latin verbal nouns and gerunds are two distinct things. See also the section in Spevak's Nominalization in Latin comparing and contrasting the two. —Al-Muqanna المقنع (talk) 14:26, 2 April 2023 (UTC)
 * The words for part of speech don’t have different senses, at least on Wiktionary, as Benwing has already worked out above: “Wiktionary's POS headings are intentionally cross-linguistic”. Even if we were to acknowledge different senses in the worldwide practice of linguistic terminology then we would be inclined to even out the differences and standardize the meanings for Wiktionary because people would rightly be confused by header names having different meanings between languages. So this is not a well-thought-out argument. Fay Freak (talk) 14:44, 2 April 2023 (UTC)
 * I'm not talking about whether the labels are being used differently, I mean the actual phenomenon we are classifying. The fact that two things refer to one phenomenon in some languages has no bearing on whether they're separate phenomena in another: how Latin lemmas are classified ultimately depends on Latin syntax. Wiktionary's POS headers are indeed cross-linguistic, I agree. —Al-Muqanna المقنع (talk) 14:59, 2 April 2023 (UTC)
 * 'Gerund' is therefore inappropriate as an L2-header - its sense ranges massively across languages, even across Indo-European languages. It can be the verbiest verbal noun, as in usage for Latin, an adverb as in Russian, having a participial tinge, and effectively an otherwise uninflected subordinating verb as in Sanskrit and Pali, for whose usage alternative words such as 'absolutive' have been coined. --RichardW57m (talk) 10:42, 3 April 2023 (UTC)

Old Telugu, Old Malayalam, Proto South Dravidian, Proto North Dravidian, Proto Central Dravidian and Hindustani
Can they get codes? Also, many places (even Wikipedia) consider Kumarbhag Paharia and Sauria Paharia to be a single language, Malto, so can they be merged into one?

The Kannada script is recent; during the Middle Kannada and Old Kannada periods, the Kadamba script (and southern Brahmi?) was used, but it's shown as if the modern Kannada script was used for Old Kannada. AleksiB 1945 (talk) 14:09, 8 March 2023 (UTC)

Talk pages or discussion pages?
Wikipedia has talk pages, Wiktionary has discussion pages. Right?

So why does Help:Talk_pages say: "A talk page is a special Wiktionary page containing discussion about the contents of its associated "subject" page. To view the talk page of an article, click on the "discussion" tab at the top of the page." It seems unnecessarily confusing.

And why lock the talk/discussion page for that Help article?

—DIV (1.145.110.167 12:35, 2 April 2023 (UTC))


 * From the next paragraph in the lede to Help:Talk_pages:
 * "It is highly recommended to use the various Wiktionary Wiktionary:Discussion rooms to talk about entries instead, as those receive much higher traffic in general." DCDuring (talk) 16:26, 2 April 2023 (UTC)
 * Yes, I am aware of that.
 * I don't think that it has addressed any of my questions/comments here though.
 * —DIV (1.145.125.236 12:07, 4 April 2023 (UTC))
 * There is only one question that I can understand above. Answer: Help talk:Talk_pages is locked to unregistered users because such users vandalized the page and no one else used it. DCDuring (talk) 12:25, 4 April 2023 (UTC)


 * @DIV I see what you mean. I always understood "discussion page" and "talk page" to be interchangeable. Wikipedia's help page about "talk pages" says they are also called "discussion pages". Wiktionary still has a "Talk" namespace rather than a "Discussion" namespace. (This is why the discussion page for pin is titled Talk:pin rather than Discussion:pin.) — excarnateSojourner (talk · contrib) 05:05, 12 April 2023 (UTC)

Talk/Discussion page conventions/tips
I suggest it may be helpful to introduce a suggestion for the (sub)headings of comments in the Discussion/Talk pages that marks more specifically the part of the entry being referred to.

For instance, headings such as the following would help direct attention at a glance:
 * "en:noun antonyms", or
 * "en:noun How about pureness?", and
 * "fr:verb attestation" or
 * "fr:verb Usage on daytime TV soap operas!"

(Of course, although there are more than a dozen languages in the entry for vice, on its Discussion page there's currently a grand total of one brief comment — and a brief response — dating back to more than a decade ago. So who knows.)

There's nothing about that currently at Help:Talk_pages. I wouldn't imagine it to be something to be stringently enforced, but including it as a tip might help it to catch on.

—DIV (1.145.110.167 12:50, 2 April 2023 (UTC))


 * @DIV I wouldn't oppose establishing a pattern that editors may use when they think it would be helpful, but I would oppose expecting it to be used universally, because of the substantial proportion of singular-sense entries. — excarnateSojourner (talk · contrib) 05:12, 12 April 2023 (UTC)

Etym-only languages and where do they stop

 * Previous discussion: WT:RFM

I have always been under the impression that etym-only codes are specifically created to be used in etymology sections, as a way to categorise various historical stages of a language in derivational categories: For instance, a French borrowing from Medieval Latin is something completely different from a borrowing from New Latin. Since this is purely used to differentiate between borrowings from different stages, which are handled differently by the acceptor languages, there's no need to use them in any other way. This also makes the number of etym-only codes relatively small: There are only a couple of languages with a rich literary history where borrowing was perpetual, and a couple of Proto-language donors where it's useful to merge them but distinguish them in etymologies.

However, apparently etym-only codes are also widely used in descendant sections, which in my opinion beats the point: If the languages are dissimilar enough, then you should create a full-fledged code, if they aren't, then you should use labels and qualifiers in descendants, and in the case of dialects, you could just omit them from the descendant section and only give them at the lemma form. If we are to use etym-only codes in descendant sections, and in links, then why would we not create an etym-only code for all dialects? How is, say, Medieval Latin any different from Szczecin Polish with regard to descendant sections? Why would one deserve its own code and not the other? Obviously it's untenable to create a code for every single variety of every single language, because we don't have room for over 7 billion language codes, nor is it useful.

So my question is, where do we draw the line? I personally argue that etym-only codes should stay where they are, i.e. in etymology sections, and just not be given in descendant sections at all, and also be restricted to significant historical stages that were donors of a good amount of borrowings that are distinct from borrowings from other stages. I'm curious what other people think. Thadh (talk) 14:12, 2 April 2023 (UTC)


 * I'm very curious about it as well. Do we really need to list separate L2 in etymology including Pannonian Rusyn, Non-Romani Traveller Norwegian, Lower Dalecarlian, Northern Raba-Prekmurian, Non-Polish Silezian, Late-Proto-Balto-Slavic and so on? Or can it be solved by redirects into more common entries with alternative forms listed? User:ZomBear and User:Benwing2 were also talking about this subject recently. Tollef Salemann (talk) 14:34, 2 April 2023 (UTC)
 * I think this highlights the issue with how etymology-only languages are implemented almost haphazardly. In some cases, they have to appear in descendant sections to show the history of the word properly, primarily when there's another descendant from the etymology-only language. The biggest example I've seen of that, other than Early Modern Korean which we've since made a full-fledged L2 for these reasons, is the case of Saint Dominican Creole French (ht-sdm) & Haitian Creole (ht). Since Haitian Creole is descended from Saint Dominican Creole French, but the latter is only an etymology-language connected to the former, when showing the descendants in entries like French, it'd be extremely weird to show "Haitian Creole" descending from "Haitian Creole", thus the etymology-only language is used. I'd personally argue that Saint Dominican Creole French should be its own L2, but it's also not the only example of this, and with our current status quo, it's a legitimate reason to have etym-only languages appear in descendant sections. AG202 (talk) 14:44, 2 April 2023 (UTC)
 * I would just suggest omitting Saint Dominican Creole French from the descendant sections (so zòrèy < oreille), and have the SDCF form nest on instead (as an altform with a qualifier). It's pretty weird to say that Haitian Creole inherited from a term that redirects to Haitian Creole anyway; That said, making SDCF an L2 does seem sensible to me as well. Thadh (talk) 14:51, 2 April 2023 (UTC)
 * It feels inaccurate to say that "zòrèy" comes directly from "oreille" though as there is an intermediate form that is markedly different. The SDCF word is also not an alt form, but a direct ancestor. It'd be as if our current policy were to have Middle French as an etymology-only language (which some folks have argued before) and we listed MF terms as alt forms of modern French, it just doesn't feel the most accurate per our practice on alternative forms. I do agree though that it is weird with the redirect situation though, which is why I think it should be an L2. AG202 (talk) 15:01, 2 April 2023 (UTC)
 * @AG202 Old Latin > Latin > Vulgar Latin > Old French > Anglo-Norman > English. Loads of English entries have at least one of those three etym-only inheritances. Theknightwho (talk) 15:05, 2 April 2023 (UTC)
 * Yes I've noticed this before, though I'm not sure exactly what this reply is wanting to show. AG202 (talk) 15:26, 2 April 2023 (UTC)
 * @AG202 That this isn't some edge-case issue, and affects a very large number of entries. Theknightwho (talk) 15:28, 2 April 2023 (UTC)
 * oh yes, I thought I made that clear with saying that this is the biggest example I've [personally] seen of it and "but it's also not the only example of this, and with our current status quo, it's a legitimate reason to have etym-only languages appear in descendant sections", but to reiterate, yes, etymology-only languages, if they have direct descendants, should continue to appear in descendants sections. AG202 (talk) 15:31, 2 April 2023 (UTC)
 * @AG202 I think I might have miscommunicated here - I was just supporting your point, is all! Theknightwho (talk) 18:08, 2 April 2023 (UTC)
 * Any language has intermediate forms, we're just lucky to have it attested in this case. We can mark this in the etymology section ("From earlier ..."), but if you compare European languages, like French, we don't give all intermediate spellings in the etymology section, because that would quickly become too cluttered to be useful. For example, we know over four different ancestral forms to the word (lemelle > alumelle > alumecte > amelette > omelette), and this is a pretty rare word. Imagine what we would find with a word like  or ! Thadh (talk) 15:07, 2 April 2023 (UTC)
 * I mean, yes, but that doesn't really feel like the same situation... It's not just an intermediate spelling, but a different intermediate language/stage entirely as defined by the linguists that work with them. It's not just an alternative spelling. AG202 (talk) 15:26, 2 April 2023 (UTC)
 * Again, I wouldn't be against splitting off SDCF, since it's obviously attested and very different from its descendant. If we don't however, we are saying that SDCF is Haitian Creole, which means that any of its forms is an obsolete/archaic spelling variant, which is exactly the same as French spellings. Thadh (talk) 15:50, 2 April 2023 (UTC)
 * But Pannonian Rusyn is a dialect, not a parent language Tollef Salemann (talk) 14:54, 2 April 2023 (UTC)
 * @Tollef Salemann We shouldn't be omitting information for the sake of policy. It should be the other way around. We certainly shouldn't be listing the intermediate form separately on the tree, which is actively misleading. Whether SDCF should be a separate L2 is a completely separate question to how we display our descendant sections, in my opinion. They're two different things. Otherwise, you're going to mess up every instance of a Latin form inherited from Old Latin, and we don't want Old Latin to be a separate L2. Theknightwho (talk) 14:56, 2 April 2023 (UTC)
 * Cool! But in this case, are Rusyn and Pannonian Rusyn descendants from Old Rusyn? Tollef Salemann (talk) 15:02, 2 April 2023 (UTC)
 * @Tollef Salemann That question seems more appropriate to the original thread on WT:RFM. This thread is about a much broader question that affects hundreds of languages. Theknightwho (talk) 15:06, 2 April 2023 (UTC)
 * Descendants sections and etymology sections are two sides of the same coin. Fay Freak (talk) 14:48, 2 April 2023 (UTC)
 * So I'm of the strong opinion that we should be using these in descendant sections, because they provide a consistent method for giving specific lects. Using qualifiers doesn't allow for this, and in any event would make descendant sections inconsistent with their corresponding etymology sections. For example, a term like shows that  inherited from (Classical) . How could we show that without using the etym-only language for Late Latin? I don't see why we should use a separate label such as {{q|Late Latin}} either, which is just a recipe for confusion and miscommunication - not to mention the fact that it won't be consistent between entries. How would you handle a term which had the tree Old Latin > Classical Latin > Vulgar Latin > Romance languages AND > Late Latin > Medieval Latin > New Latin? It would be a complete mess to use labels. Plus, why should we do that when we don't do it in etymology sections themselves?
 * I also think it's completely unrealistic to give each of these their own L2. There is often no impetus to do that, because the etym-only variety might only differ in a specific way, despite being considered the same language.
 * To me, what is a bigger concern is that I don't think we have a coherent idea of what etymology languages are for. Comparing two examples:
 * We have Latin and Ancient Greek, which each have numerous etym-only languages that mostly represent different stages of the language, or different regional varieties. Every descendant of these languages is obviously descended from a particular stage of the language: for example, Old Latin > Classical Latin > Vulgar Latin > Romance languages (leaving aside any debates on the specifics of that for now). What is important is that you cannot say that a modern Italian term was inherited from New Latin, for instance. However, I could type the (total nonsense) it on an entry and get "", which does not throw an error. However, at the moment it's not technically feasible to set (e.g.) Vulgar Latin as the proto-language of Romance languages, which would restrict Romance ancestors to Vulgar, Classical and Old Latin, because that would cause things like it ("") to throw an error. Instead, it would be necessary to put something like it, which is - in practical terms - unrealistic to expect of editors.
 * On the other hand, we do already do this for certain languages like Tajik, where we specify that its ancestor is Classical Persian. This means that tg ("") is totally fine, but tg (inheriting from "Persian") will throw an error, because  on its own refers to the modern language.
 * Perhaps this is a mistake, but at the moment this feels like we're treating etymology-only languages in two different ways, and that we need to have a better idea of exactly what they represent. Theknightwho (talk) 14:55, 2 April 2023 (UTC)
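The two behaviours contrasted above (Italian accepting any Latin variety, Tajik rejecting plain Persian) can be sketched as a small ancestor check. This is only an illustrative model, not Wiktionary's actual module code: the `Lect` class, the `build_lects` and `can_inherit_from` functions, and the ancestor data are all hypothetical, and lect names are used in place of real language codes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Lect:
    """A full language or an etym-only lect (hypothetical model)."""
    name: str
    parent: Optional[str] = None    # full language an etym-only lect belongs to
    ancestor: Optional[str] = None  # declared direct ancestor, if any


def build_lects(italian_ancestor: str) -> Dict[str, Lect]:
    """Build illustrative data; the ancestor of Italian is configurable."""
    entries = [
        Lect("Latin"),
        Lect("Old Latin", parent="Latin"),
        Lect("Classical Latin", parent="Latin", ancestor="Old Latin"),
        Lect("Vulgar Latin", parent="Latin", ancestor="Classical Latin"),
        Lect("New Latin", parent="Latin"),
        Lect("Italian", ancestor=italian_ancestor),
        Lect("Persian"),
        Lect("Classical Persian", parent="Persian"),
        Lect("Tajik", ancestor="Classical Persian"),
    ]
    return {lect.name: lect for lect in entries}


def is_variety_of(lects: Dict[str, Lect], lect: str, target: str) -> bool:
    """True if `lect` is `target` itself or an etym-only variety of it."""
    current: Optional[str] = lect
    while current is not None:
        if current == target:
            return True
        current = lects[current].parent
    return False


def can_inherit_from(lects: Dict[str, Lect], descendant: str, source: str) -> bool:
    """True if `source` matches any lect on the ancestor chain of `descendant`."""
    current = lects[descendant].ancestor
    seen = set()
    while current is not None and current not in seen:
        seen.add(current)
        if is_variety_of(lects, source, current):
            return True
        current = lects[current].ancestor
    return False


if __name__ == "__main__":
    loose = build_lects(italian_ancestor="Latin")
    strict = build_lects(italian_ancestor="Vulgar Latin")
    for data, desc, src in [
        (loose, "Italian", "New Latin"),
        (loose, "Tajik", "Persian"),
        (strict, "Italian", "Latin"),
        (strict, "Italian", "Old Latin"),
    ]:
        print(desc, "<-", src, ":", can_inherit_from(data, desc, src))
```

Under these assumptions, setting the ancestor of Italian to plain Latin lets a nonsensical inheritance from New Latin pass, while setting it to Vulgar Latin rejects New Latin but also rejects the unqualified "Latin" that editors would normally type, which is the trade-off described in the comment above.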
 * The way to handle barca deriving from baris is simple: Just give la - it's still Latin, after all (it doesn't matter whether this derivation was made in Classical or Late Latin, since it's still a derivation in the same language, according to our model). Or better yet, just don't give it at all, it's a derivation, it's not like we're going to show all derivations of in all the various languages in the descendant section of, we'll just give the one form and direct the user to see the derivations and descendants there. Thadh (talk) 15:02, 2 April 2023 (UTC)
 * @Thadh Shouldn't you also be arguing to delete them from etymology sections, too? What makes no sense to me is the inconsistency. Theknightwho (talk) 15:09, 2 April 2023 (UTC)
 * Their value in etymology sections is completely different from descendant sections:
 * If French borrows from Medieval Latin, a completely different set of sound laws applies than if it borrows from Late Latin.
 * However, it's still a borrowing from Latin. This is why if you say that it's a borrowing into a particular language, that language is Latin.
 * I have a much bigger problem with denoting inheritance than borrowing in descendant sections though. Thadh (talk) 15:14, 2 April 2023 (UTC)
 * Why would the sound laws only matter in one direction? I'm not following the logic. —Al-Muqanna المقنع (talk) 15:21, 2 April 2023 (UTC)
 * Likewise. It seems nonsensical to say we can only give finer detail like this in etymology sections, or to say we should separate off the intermediate term as an alt form or whatever. Imagine doing that for Proto-Sino-Tibetan terms with all the Chinese lects - it'd be chaos. Theknightwho (talk) 15:24, 2 April 2023 (UTC)
 * Let's not mix up two completely different things here. How Chinese is handled comes nowhere close to historical stages' handling or even dialect handling. Chinese is a very unique situation, because it's one writing system for numerous languages, which has absolutely nothing to do with Late Latin, Middle Russian or Pannonian Rusyn. Thadh (talk) 15:47, 2 April 2023 (UTC)
 * @Thadh That is a cop-out, because we shouldn't be carving out an exception for Chinese. Isn't your whole concern that the different dialects use the same spelling, which makes it directly analogous to the Chinese situation? Plus, if you actually look at the descendants for PST in detail, you'll see that they split into many more branches than we have L2s. Theknightwho (talk) 17:49, 2 April 2023 (UTC)
 * No, that's not at all my concern, my concern is that we try to handle lects as both languages and dialects at the same time. And adding branches without a code is also not a problem, a problem is making codes when there is no need for them. I'm completely lost in what you're trying to argue for here and it seems you're just trying to make everything seem logical when it doesn't. Thadh (talk) 18:03, 2 April 2023 (UTC)
 * @Thadh The code provides a consistent name for the lect, and ensures that anything specific to that lect is applied to that term (e.g. different transliteration or whatever). Why do you want to make that less convenient? You are quite literally the only person arguing that we shouldn't be using etym-only codes here. Theknightwho (talk) 18:06, 2 April 2023 (UTC)
 * There needs to be a balance between convenience in handling and simplicity in the model.
 * Our original model was simple: We have a language, and we have varieties that are part of this language. These varieties are not noted anywhere, because they are nonstandard, but are instead given soft redirects and noted on the standard lemma form.
 * The inclusion of etym-only codes was a way to make the model slightly more difficult but to ensure that our etymologies became more exact: If you say a term is borrowed from Latin, it's much less specific than if it's borrowed from Medieval Latin. These codes are thus only there as specifiers for categories and borrowings.
 * Now, currently, two new uses of this system have arisen: the usage of etym-only codes for contemporary varieties (Pannonian Rusyn) and the usage of etym-only codes in descendant sections. I have the following concerns with these:
 * The creation of codes for contemporary varieties without clear etymological motivation means that we can add a code for any contemporary variety (Szczecin Polish), because why would Pannonian Rusyn be of more interest than Szczecin Polish? And a model where every single variety of every single language can be indicated is too complex to handle.
 * The addition of codes to descendant sections implies that we do not handle the etym-only code as the same language as its L2, which makes the whole idea of etym-only codes useless. If they're not the same language, why do we handle them as one? And if they are, why would we indicate them differently in descendant sections? If you borrow a term into Medieval Latin, you're automatically borrowing it into Latin; then what is the point of giving the "Medieval Latin" name instead of "Latin", if it's all Latin, and the L2 name is also Latin?
 * We couldn't possibly handle Chinese in a way where there is one standardised language under which all varieties are nested (partly because of the political mess we'd be getting in, and partly because the varieties are so dissimilar that it defeats the purpose). In the case of pretty much any other language, however, if the two varieties are too dissimilar to be nested under one L2, we can split them. If we don't, that means that they are similar enough to handle in the standard/variety model.
 * Note also the meaning of the word transliteration: We aren't supposed to give any phonetic details in it, because that's not what it's for; we have the pronunciation section for that. It's supposed to help readers that do not know the script, not to give them an accurate representation of the phonological shape of that language. Thadh (talk) 18:31, 2 April 2023 (UTC)
 * @Thadh The first concern is a theoretical one, not a practical one: if there is a desire to have a code for one lect, that does not necessitate creating codes for every other possible lect right away. We create them as and when they're desired. If that starts to cause problems, then we cross that bridge when we come to it. However, I don't think it will for a long time.
 * Your second concern simply doesn't make sense to me - why shouldn't we explain that a term was inherited through different stages of an L2 that we happen to group together? We recognise the boundary of Middle English and Modern English, so why not the boundaries that exist between any of the lects which we group together under Latin? There are practical reasons why we might group them under one L2 or not, but that doesn't mean we can't recognise the very real fact that a language is an ever-evolving process, and that it's possible to give finer detail in some cases.
 * With regards to Chinese, you're ignoring the fact (which I have already pointed out to you!) that Proto-Sino-Tibetan descendants frequently list far more descendants than we have L2s. It is irrelevant that "Chinese" covers several L2 codes, because the point still stands even if we did away with that. We absolutely do list pronunciations in descendant sections, too - it's arbitrary to forbid this.
 * As for transliteration, you've missed the point (which I have already pointed out to you twice!), which is that sometimes it is desirable to have separate transliteration systems for etym-only codes, like Classical Persian. This might also apply to regional varieties of a language like Korean, where different romanisation systems exist depending on the country. Theknightwho (talk) 18:51, 2 April 2023 (UTC)
 * Solving problems when we come to them is just bad management. We should limit the use of the codes now before we run into problems with them.
 * You're acting as if handling of the languages in descendant sections and under L2s are completely separate things. They're not. They're based on exactly the same parameters, namely: Are these languages similar enough, is it practical to handle them as separate languages. I'm running out of ways to explain this to you.
 * And you're also acting like whatever is in use now is dead-set and everybody is fine with it. If there are varieties given in the Sino-Tibetan descendant lists that do not have their own L2s and do not link to anything, then they should imo be removed as well. There's nothing special about Sino-Tibetan languages.
 * And I haven't missed anything, I've addressed this issue already: Why would a transliteration system be different for variants than it is for the standards? They are still the same characters! Thadh (talk) 19:02, 2 April 2023 (UTC)
 * @Thadh It is inane to oppose the use of etymology-only codes because of the hypothetical issue that we would need to create too many of them to handle. The fact is that we are already using them in descendant sections, and they are far outnumbered by regular codes at the moment anyway. Unless you have plans to create several thousand of them soon, I do not foresee any issue in the short or medium term. I don't consider this a serious objection.
 * Your assumption about descendant sections and L2s being based on the same parameters is really faulty, in my opinion. The fact that and  are different is obviously relevant in the descendant section of, whereas the question of whether Late Latin should be a separate L2 is one where we need to consider the lects as a whole. It's a question of a systematic difference, and not really relevant to an individual descendant section.
 * Here's the difference: in a chain of inheritances, it should be mandatory to include every L2 in the chain (where possible). However, we generally only want to include the finer detail given by etym-only languages where it's actually relevant.
 * Even aside from that, though, the fact that you want to remove large amounts of information because it doesn't fit your schema suggests that you're coming at things from the wrong angle. We are descriptive, which means that we adapt to the facts, instead of pretending they don't exist because we don't like how complicated they are. Theknightwho (talk) 19:30, 2 April 2023 (UTC)
 * I'm not trying to remove any information, I'm just trying to relocate it and to handle it in a way that is not untenable in the long run. Thadh (talk) 19:42, 2 April 2023 (UTC)
 * Because Latin doesn't have defining sound laws in that period. French does. Whether French borrowed from Late Latin or Medieval Latin is only of importance to the French entry, since it describes at which point the word was borrowed and thus which sound laws applied to the original word. Thadh (talk) 15:44, 2 April 2023 (UTC)
 * Could you please give an example of where this issue applies? It also doesn't seem relevant to most situations involving etym-only languages, where the particular lect is relevant due to changes inside the language itself. Theknightwho (talk) 17:56, 2 April 2023 (UTC)
 * Just my 2p but since they're specifically not being treated as separate L2s it makes sense for etym-only languages to be consistently handled as proper subsets of their parent languages, so the first, more permissive option should be standard. —Al-Muqanna المقنع (talk) 15:12, 2 April 2023 (UTC)
 * @Al-Muqanna In terms of the logic, I think we should change it so that if you specify an etym-only language as an ancestor (e.g. Vulgar Latin), a language can inherit from that etym-only language's parent (Latin), or any of its children (we don't have any, but imagine if we had varieties of Vulgar Latin too). However, it shouldn't be able to inherit from the etym-only language's siblings (e.g. Ecclesiastical Latin), unless they also happen to be one of its ancestors (e.g. Classical Latin). Theknightwho (talk) 15:16, 2 April 2023 (UTC)
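
The inheritance rule proposed in the comment above can be sketched in code. This is only an illustrative model, not how Wiktionary's modules actually work: the tables `PARENT` and `ANCESTORS`, the function names, and the lect "Gallic Vulgar Latin" (standing in for the imagined child variety) are all hypothetical, and the ancestor relation simply follows the examples given in the comment.

```python
# Hypothetical data: which broader lect each etym-only lect is nested under.
PARENT = {
    "Classical Latin": "Latin",
    "Vulgar Latin": "Latin",
    "Ecclesiastical Latin": "Latin",
    "Gallic Vulgar Latin": "Vulgar Latin",  # imagined child variety
}

# Hypothetical data: earlier stages a lect descends from (per the comment's example).
ANCESTORS = {
    "Vulgar Latin": ["Classical Latin"],
}


def broader_lects(lect):
    """Return every lect that `lect` is nested under, nearest first."""
    chain = []
    while lect in PARENT:
        lect = PARENT[lect]
        chain.append(lect)
    return chain


def earlier_stages(lect):
    """Return all recorded ancestors of `lect`, followed transitively."""
    found = []
    todo = list(ANCESTORS.get(lect, []))
    while todo:
        stage = todo.pop()
        if stage not in found:
            found.append(stage)
            todo.extend(ANCESTORS.get(stage, []))
    return found


def may_inherit_from(declared, source):
    """Check whether a term may be marked as inherited from `source`
    when `declared` is the etym-only lect listed as the ancestor."""
    if source == declared:
        return True
    if source in broader_lects(declared):   # the parent, e.g. Latin
        return True
    if declared in broader_lects(source):   # a child variety of the declared lect
        return True
    if source in earlier_stages(declared):  # a sibling that is also an ancestor
        return True
    return False                            # any other sibling is rejected
```

Under these assumptions, `may_inherit_from("Vulgar Latin", "Latin")` and `may_inherit_from("Vulgar Latin", "Classical Latin")` are allowed, while `may_inherit_from("Vulgar Latin", "Ecclesiastical Latin")` is rejected as a plain sibling.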


 * I agree with Fay Freak and TKW above that what's good for the goose is good for the gander. If a French borrowing from Medieval Latin is completely different from a borrowing from New Latin then why wouldn't a borrowing into Medieval Latin also be completely different from a borrowing into Classical Latin? Historically these are also very different things. It's not always obvious, either: there are independent borrowings from Ancient Greek into each stage of Latin. IMO etym-only languages should be fair game for descendants sections, and "where they stop" should be the same in both cases. Treating the relationship as reversible in this sense seems natural to me, and I don't see any advantage to using qualifiers instead in descendants sections. I do also tend to agree with TKW about the fuzziness of etymology-only languages, though. —Al-Muqanna المقنع (talk) 14:49, 2 April 2023 (UTC)

Addition of Church Slavonic as a language
So I understand that Old Church Slavonic is a language in Wiktionary, but for some reason, Church Slavonic is not. Is there a reason why such is the case? They are two separate languages after all. Yakodobro (talk) 21:08, 3 April 2023 (UTC)


 * @Yakodobro there was a request for it: WT:RFM (by @Sławobóg). But everything fell silent. ZomBear (talk) 18:09, 4 April 2023 (UTC)
 * . Sławobóg (talk) 15:41, 14 April 2023 (UTC)

Supplemental and alternative/non-standard forms in daughter languages from Proto-Slavic terms with varying stems
A couple related things:
 * should the Proto-Slavic/OCS stems be mentioned in entries where there are forms that differ by their etymological stem, and should this information appear as a footnote in the declension table? Doing so would clarify these irregularities for learners and new categories could potentially be created and linked to, in order to point to similar instances in these languages. For example, the descendants of : eg. Polish and Serbo-Croatian / inherit the hard o-stem plural and i-stem dual for different usages, and Serbo-Croatian also retains the s-stem as a pejorative; the Russian  actually does mention the plural being derived from the dual, while leaving out stem info. The relevant descendants of eg., ,  leave this information on unexpected irregularity out of their entries.
 * there are also some noun entries where forms are not shown in the usual declension table (ex. ), but are given their own entries because they have another meaning and are plurale tantum:, , (all have s-stem but their usual plurals come from hard o-stem); likewise, , , ,  are all inconsistently constructed, variously forgoing etymology, haphazardly linking or being entirely forgotten on their etymon's entry. Should these pages be unified like the usual irregular plurals, with notes on ironic/colloquial/obsolete/historical/etc. natures, and when retaining two head words, what is the proper custom to link them (the most developed seems to be )? Anarhistička Maca (talk) 00:53, 4 April 2023 (UTC)


 * The Polish plural of ucho is uszy; ucha is used for non-living ears (compare oko/oczy). I am not sure how these are etymologically different, however; all these forms are related.
 * It seems like it would make more sense to mention irregular plural information in the appropriate L2's etymology section, but we don't always do this (i.e. with English irregular declensions). Vininn126 (talk) 08:07, 6 April 2023 (UTC)
 * Make sure that Телеса, небеса, чудеса, ушеса, and Russian stuff like that aren't borrowings from Church Slavonic. Maybe some OCS user can help with this question? Tollef Salemann (talk) 20:42, 2 May 2023 (UTC)

German Low German?
Why does Wiktionary refer to Low German as German Low German? That's just weird. Synotia (talk) 20:52, 4 April 2023 (UTC)


 * There's also Dutch Low Saxon and Plautdietsch. This helps distinguish the three. Thadh (talk) 21:01, 4 April 2023 (UTC)


 * And there's also the general Category:Low German language - Low German is also spoken in America, both north and south (e.g., ).

Four hyphens on citations pages
Parallel to practice in entries, I've long seen people use four hyphens to distinguish language sections on citations pages, as on Citations:hubris. This has the same benefits (clearly separates the sections when editing the page from the traditional wikitext, not Visual, editor) and presumably the same drawbacks (hard for bots?). Following the removal of the hyphens from entries, should we remove them also from citations pages? (Should we also make a horizontal rule display automatically between language sections on citations pages, as is done in mainspace entries?) - -sche (discuss) 00:09, 5 April 2023 (UTC)


 * The editor-navigation problem is worse on citation pages. At least main entries mandatorily give clues like language names, PoS headers etc. DCDuring (talk) 22:30, 5 April 2023 (UTC)

Cebuano names
I've seen a lot of Cebuano names that shouldn't make any sense historically, such as Jansen van Vuuren, which is Dutch. I wouldn't see a reason why they should exist in the language if they are absent in Tagalog (or any other Philippine language).

Most of them appear to be the result of a single person, @Carl Francis, creating a countless number of proper names in Cebuano. The same, but less extreme, situation is also found in Tagalog. Kwékwlos (talk) 15:06, 5 April 2023 (UTC)


 * The only problem I have with Jansen van Vuuren is that it is arguably Jansen + van Vuuren. If we allowed such names we would have a combinatorial explosion of Spanish names. DCDuring (talk) 22:27, 5 April 2023 (UTC)
 * Besides that, there is a whole lot of English names that have Cebuano sections, but not a Tagalog section. Kwékwlos (talk) 11:29, 6 April 2023 (UTC)
 * This is stupid. Why should the Cebuano entries be dependent on the existence of a Tagalog entry? Pretty much everyone has English-sourced names in Cebuano-speaking areas. Parents named children Hugh after the grandfather Hugo, Rosalinda became Rosalind, Marcos became Mark, Julian pronounced 'j' not 'h', etc. Carl Francis (talk) 04:37, 7 April 2023 (UTC)
 * At least, provide a reference, quotation or citation attesting their presence in the local-language media. Kwékwlos (talk) 18:03, 7 April 2023 (UTC)

One solution is to verify if the Cebuano term actually has any attestations in the media (some of them have Citations pages). This can be done via RFV as there is a disproportionate amount of these lacking a Tagalog entry. We wouldn't want to see English names being littered with Cebuano entries that were indiscriminately added by a single person. Kwékwlos (talk) 17:32, 6 April 2023 (UTC)


 * The criteria for inclusion you suggest are not part of WT:CFI. In particular, the relative proportion of Cebuano and Tagalog names has no bearing whatsoever on inclusion decisions. If there are missing Tagalog sections for some names attestable in Tagalog, then someone could add them. If there are some names unattestable in Cebuano, they can indeed be RfVed, though it would be wise to focus on a few of the least likely ones rather than doing so for every name. DCDuring (talk) 14:51, 7 April 2023 (UTC)
 * I read this as merely circumstantial evidence: if there's no substantial difference in the distribution of English given names between the languages of the Philippines, then the fact that there's such a difference in our coverage would suggest a bias in our coverage.
 * That said, we need to be careful: languages tend to borrow things because of cultural attitudes toward the donor language. For instance, you'll find a lot of female given names in the US borrowed directly from French instead of using existing English equivalents because French names are (or were) considered exotic and pretty, and male given names borrowed from place names in England to give a connotation of wealth and prestige.
 * As I understand it, English has a very strong influence on modern usage in the Philippines, with a lot of code-switching and borrowing. If Filipino parents perceive that English given names have some kind of desirable connotation, that would explain why there would be a lot of them. Also, don't forget that older Filipino given names are predominantly direct borrowings from Spanish, so it's not like borrowed given names are a new idea. I get the impression that the system of surnames and given names was foreign to ancient Filipino culture, so that native Filipino given names aren't very common.
 * Another thing to watch out for, since this is English Wiktionary: it's very easy for native English speakers to fall into the trap of perceiving English as normal and other languages as foreign and exotic, and rejecting unadapted borrowings from English as violating this dichotomy. I'm not saying that's happening here, but it should at least be considered as a possibility.
 * Pinging, who may have something to say about this. Chuck Entz (talk) 19:19, 7 April 2023 (UTC)
 * I could populate Category:Toba_Batak_proper_nouns with thousands of names derived from all kinds of western given names, surnames and full names. My first language consultant had the given name Robin Hood (plus ), I also heard about a couple of guys called Hitler. In the Philippines, turning last and full names into given names is less common (my cousin's buddy Rommel comes to mind), but still not unusual. But in the case of the Boer surname Jansen van Vuuren, I'd definitely like to see a concrete attestation of it being used by Cebuanos. Austronesier (talk) 19:37, 7 April 2023 (UTC)
 * PS: And then there are of course some founder effects: e.g. Schuck has become a common Tausug family name. Austronesier (talk) 19:41, 7 April 2023 (UTC)
 * I overlooked the example at the start of the thread: surnames are trickier, since they can be independent of the language spoken by those who bear them. My great great grandfather changed the spelling of his surname from "Enz" to "Entz" when he came to the US, but many immigrants haven't. IMO we've never really dealt with the difference between a surname that arose within a language and one that followed someone who adopted the language. Over time, it becomes irrelevant: only an etymologist would consider Norris and North to be different. In the first few generations, though, it's hard to sort out. Chuck Entz (talk) 19:55, 7 April 2023 (UTC)
 * Most of these do not have citations. For example, we have Bambi as a generic Disney name, and Aramis, which is a French name derived from the Three Musketeers. This poses a problem for English readers who are surprised to find a Cebuano entry of a usually American, British, or even German and Russian name. Kwékwlos (talk) 22:28, 10 April 2023 (UTC)
 * Most of our definitions do not have citations supporting them. If someone tried to remove any of them outside of the RfD and RfV processes, especially en masse, they would be blocked. The best remedy for some of the possible problems of having a Cebuano section, but not an English one, for a common English name would be to add the English section, not remove the Cebuano one. DCDuring (talk) 16:33, 11 April 2023 (UTC)
 * @Kwékwlos I already reverted your mass removal of these names a couple of months ago on the basis that these should be RFV’d, and I note that you’re now doing it again. Please stop ignoring our processes, even if you don’t agree with them. A bit later on, I’m going to go through your contributions and revert any non-RFV/RFD removals again. Theknightwho (talk) 19:57, 7 April 2023 (UTC)

Transliteration versus transcription
I have found several "transliteration" systems that are apparently transcription systems (WT:AS TR, WT:FA TR, WT:RU TR to name a few). According to Transliteration and romanization, transliteration is the "rendering of written text from one writing system into another, letter-by-letter, or character-by-character for non-alphabetic scripts", while transcription means "phonological or phonetic transcription, the written representation of spoken utterances." Some "transliteration" systems are really "written representation[s] of spoken utterances". One editor at Wiktionary talk:Hindi transliteration even said, "...different [Indo-Aryan] and Dravidian phonologies are different and trying to merge them under a single romanisation is ridiculous. I don't think we should transliterate at all, rather transcribe them phonemically." So I think the policy regarding transliteration and transcription should be changed to reflect the current situation. Sbb1413 (he) (talk • contribs) 15:59, 5 April 2023 (UTC)
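
The distinction can be illustrated with a toy example. The sketch below is hypothetical and not taken from any Wiktionary module: `LETTER_MAP` covers only three Cyrillic letters, and `SPOKEN_FORM` is a hand-written lookup with a rough phonemic rendering, since a transcription cannot in general be derived from the letters alone. It uses Russian его, whose letters map one-to-one to "ego" but which is pronounced with a /v/.

```python
# Hypothetical letter-by-letter table (only the letters needed for the example).
LETTER_MAP = {"е": "e", "г": "g", "о": "o"}

# Hypothetical hand-written record of how a word is actually spoken.
SPOKEN_FORM = {"его": "jevo"}


def transliterate(word):
    """Render each written letter with a fixed Latin counterpart."""
    return "".join(LETTER_MAP[ch] for ch in word)


def transcribe(word):
    """Return the recorded spoken form, or None if none is recorded."""
    return SPOKEN_FORM.get(word)


word = "его"
print(transliterate(word))  # follows the spelling
print(transcribe(word))     # follows the pronunciation
```

Keeping the two functions separate mirrors the proposal to keep transliteration and transcription as separate schemes rather than folding both into one "transliteration".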


 * Note: Chinese romanization follows the transcription system since it is impossible to transliterate Chinese characters. Sbb1413 (he) (talk • contribs) 16:35, 5 April 2023 (UTC)


 * My proposal is to use tr for transliteration and ts for transcription and there should be separate transliteration and transcription schemes. --Sbb1413 (he) (talk • contribs) 17:04, 5 April 2023 (UTC)
 * The rule for the use of ts seems to be given at the documentation of Module:headword:
 * "This is only used in a few languages with non-Latin scripts where the spelling is significantly different from the pronunciation, such as Akkadian, Old Persian or Hittite."
 * This would therefore be inappropriate where reading rules should suffice, such as the pronunciation of  in modern Indic languages. An example of appropriate Indic usage is the restoration of unwritten clusters in Ashokan Prakrit (e.g. .  I suggest a comparable dubious usage would be marking the &lt;i&gt; in canonical Pali  as a svarabhakti vowel.  (By contrast, it is a full vowel in mediaeval Pali - Warder). --RichardW57m (talk) 12:54, 6 April 2023 (UTC)
 * Thai is an example where mechanical transliteration is close to unintelligible, and is used at most for inscriptions and words of Indic origin. Khmer was treated on a similar basis, though I think with less justification.  We do, however, allow for vowels to be supplied from the phonetics for abjads, and similar toleration would make sense for languages such as Thai and the Northern Indic languages, which very inadequately mark whether vowels are unsounded.  --RichardW57m (talk) 12:54, 6 April 2023 (UTC)

Arabic Special:Contributions/2A01:CB09:8067:5937:457E:8F63:4D20:4040
The IP user is mass-RFD-ing Arabic entries with no hamza above and below. The hamza above and below alif is not considered mandatory in less strict spelling by many authors, even in Qur'anic studies. Please review. Mass-revert if necessary. Alternatively change to Anatoli T. (обсудить/вклад) 05:17, 6 April 2023 (UTC)
 * Is unhamzated spelling of the correct template to use? Vox Sciurorum (talk) 17:28, 6 April 2023 (UTC)
 * Yes. I agreed for Template:rasm spelling of with already that we should keep and consequentially use this template. Fay Freak (talk) 11:57, 8 April 2023 (UTC)

Addition of "adjunct" as a POS
In a language such as Blackfoot, most references and scholars use the term "adjunct" for a morpheme that does not necessarily become a prefix or suffix, but is used in different sentences in different ways; this makes it hard to classify Blackfoot stems as either prefixes or suffixes. I'm proposing the addition of "adjunct" to the list of allowed POSs in order to abide by the professionals' stance. GKON (talk) 14:51, 6 April 2023 (UTC)
 * That is not how adjunct is used for English. Do we even have this definition of adjunct? (It's not def. 6 (labelled "linguistics"), and I can't decipher def. 7 (labelled "syntax").) I see potential for confusion without some kind of link to explanatory text at a Wiktionary (e.g., About Blackfoot) or Wikipedia page (e.g., Blackfoot language). DCDuring (talk) 14:35, 7 April 2023 (UTC)
 * An adjunct is a morpheme that must be attached to another affix to be a valid part of the sentence. I understand the confusion but there really is no other way to classify some of these morphemes... GKON (talk) 04:36, 8 April 2023 (UTC)
 * Well, there is (i.e. prefix or potentially in the future clitic, or even just adverb/adjective etc.), but it's definitely not pretty. Thadh (talk) 11:15, 8 April 2023 (UTC)
 * Can we start by adding and attesting the appropriate definition of adjunct and its usage context? Is it only used for Blackfoot? Is it used for all Algonquian languages, all polysynthetic languages, all agglutinative languages? DCDuring (talk) 16:32, 8 April 2023 (UTC)
 * In the section Lexical categories, the WP article on does not include adjunct, so it is apparently not universally used. DCDuring (talk) 16:45, 8 April 2023 (UTC)
 * To my knowledge it was universally used but maybe I need to broaden my sources further? What would you use then as an alternative that already exists? GKON (talk) 16:47, 8 April 2023 (UTC)
 * Ah, I have found a source that uses "general root" or "other affix", do any of these exist? GKON (talk) 16:50, 8 April 2023 (UTC)
 * Affix is likely to be relatively uncontroversial here and apparently with some usage among Blackfoot linguists. DCDuring (talk) 17:28, 8 April 2023 (UTC)
 * Ok sounds good, I will replace all adjuncts with affix GKON (talk) 17:30, 8 April 2023 (UTC)
 * @GKØN440 @DCDuring Be careful, as we tend to be more specific by saying "prefix", "suffix" and so on. I would prefer that we used the most appropriate terminology for a given language, rather than trying to box it into pre-existing categories. We have plenty of other unusual parts of speech for other languages already, like converbs and ideophones. Theknightwho (talk) 07:32, 9 April 2023 (UTC)
 * Prefixes and suffixes do not work in this specific scenario. What is wrong with affix? GKON (talk) 15:22, 9 April 2023 (UTC)
 * @GKØN440 That's kinda my point: it sounds like it would make sense to use the more common term "adjunct", as "affix" isn't the best part of speech header to begin with. Theknightwho (talk) 15:44, 9 April 2023 (UTC)
 * We don't even have an appropriate definition in principal namespace, let alone an attested one. Nor do we have evidence that adjunct is commonly used in linguistic discourse about Blackfoot or other Algonquian languages. DCDuring (talk) 16:10, 9 April 2023 (UTC)
 * In Cree, the terms "suffix" and "prefix" are usually used, as mirrored by our handling of these. I can't speak of other Algonquian languages though. Thadh (talk) 18:26, 9 April 2023 (UTC)
 * I see... GKON (talk) 18:44, 9 April 2023 (UTC)
 * I'm just going to say, it seems as though even among Blackfoot linguists, there isn't a clear consensus as to the terminology... I am comfortable using whatever you guys decide... but I strongly suggest the term affix, because the words we are talking about here are not necessarily "prefixes" or "suffixes" - they can be added to a word in any place really. @Thadh, I know you work with Cree (what dialect specifically?)... what is your suggestion? GKON (talk) 18:50, 9 April 2023 (UTC)
 * I'm working sporadically on Plains Cree. There, these kinds of words are always bound to one part of the root, so are actually either prefixes or suffixes. If Blackfoot doesn't distinguish between the two, it does seem best not to use "suffix" or "prefix". An alternative solution would be to call these adjectives/adverbs etc., but to indicate that they are bound, and also mark them with double hyphens (-foo-). I would suggest either that or indeed allowing a nonspecific header "affix" site-wide. Thadh (talk) 19:25, 9 April 2023 (UTC)
 * Is "affix" not a valid POS like adjunct isn't? GKON (talk) 19:41, 9 April 2023 (UTC)
 * Adjective and adverb might not work because these affixes include tense and aspect sometimes... GKON (talk) 19:42, 9 April 2023 (UTC)
 * See Entry layout. The affixes that include tense/aspect would then probably be called "particles". But again, we simply don't have the tools to handle polysynthetic languages yet, so we just have to create them. Thadh (talk) 19:58, 9 April 2023 (UTC)

Vandalism and harassment
A user is vandalizing Template:ll/documentation, trying to add to the documentation a parameter that is nonexistent (special:diff/72634172). That user is also harassing me (special:diff/72634188). Please stop this user if you can. I have difficulty doing things with this harassment around.

-- Huhu9001 (talk) 06:23, 9 April 2023 (UTC)


 * @Huhu9001 I gave you a warning for engaging in passive-aggressive behaviour, which you knowingly did. That isn't harassment.
 * It's highly inappropriate to make a complaint about someone without tagging them, or even referring to them by name; it merely proves my point, which is that you are not engaging in good faith. It's also inappropriate to tag ~75 users, too. Theknightwho (talk) 06:27, 9 April 2023 (UTC)
 * Well, they somehow managed not to ping any dead people, but they do seem to have pinged everyone else. Why, I'm not exactly sure- it just makes them look bad. Chuck Entz (talk) 06:38, 9 April 2023 (UTC)
 * Oh please. This user keeps harassing, stalking and obstructing me. This user is now reverting some of my edits that are intended to fix some bugs in mod:Jpan-sortkey (special:diff/72635378). Do you just care more about who looks bad than about actual Wiktionary work getting done? -- Huhu9001 (talk) 08:06, 9 April 2023 (UTC)
 * : the first thing you need to recognize is that is doing what they're doing because they sincerely believe they're improving things. They're not infallible, and they've certainly made their share of mistakes, but they mean well- so there's no vandalism involved. You're the one who keeps undoing their edits, so the charge of obstruction won't stick, either.
 * What needs to happen here is that you need to let go of your anger at their making changes without asking first. You need to level with us and make your case on the merits- no ad hominems, and no stunts- so we can objectively decide which is the best way to go from here. Until you do that, it will be all too easy to dismiss you as a grumpy obstructionist with a grudge. If you think they've been unfairly painting you in a bad light, the first thing to do is to stop doing their work for them. Chuck Entz (talk) 08:39, 9 April 2023 (UTC)
 * Do you mean you can not even simply objectively decide "trying to add to the documentation a parameter that is nonexistent" is wrong? -- Huhu9001 (talk) 08:48, 9 April 2023 (UTC)
 * If not even "trying to add to the documentation a parameter that is nonexistent" can "make my case on the merits", what else can I do? Since when does Wiktionary accept documentation pages that describe something nonexistent? -- Huhu9001 (talk) 08:51, 9 April 2023 (UTC)
 * @Huhu9001 It's not a non-existent parameter lol. Theknightwho (talk) 09:12, 9 April 2023 (UTC)

Japanese sortkey
Japanese entries are now capable of automatically generating correct sortkeys without manual input. E.g. 成り金 is sorted as なりきん in cat:ja:Shogi despite that it does not provide  to. (Some sortkeys take time to update.)

-- Huhu9001 (talk) 07:47, 9 April 2023 (UTC)

The auto-sortkey mechanism is based on Template:ja-kanjitab. For entries with no kanji, add an empty with just a sort parameter. E.g. in PC-98, do this:

Proper noun

 * : the NEC PC-9801

This will make generate correct sortkey   for cat:ja:Computing.

Trouble
Sorry, this PC-98 part does not work now because User:Theknightwho keeps vandalizing and I can't have my work done. For other normal entries like 成り金 it works normally.

Related information special:diff/72638140, special:diff/72638224, special:history/Module:languages/data/2. User:Theknightwho is following me everywhere to revert my edits, or in the user's own word "to get the last word". Are you still unable to see who is right and who is wrong, who is "the one who keeps undoing their edits", who is overwhelmed by anger and is extremely emotional? -- Huhu9001 (talk) 10:07, 9 April 2023 (UTC)


 * I reverted your changes because you misunderstand how scripts work, and the fact you are clearly refusing to communicate with me means it is impossible for me to explain your mistakes to you. You very obviously lack the ability to collaborate productively with others, and seem to enjoy throwing tantrums when you can't get your own way. It's much worse than last time, and definitely ban-worthy. Theknightwho (talk) 10:10, 9 April 2023 (UTC)

Sorry. I remember you have previously complained about the tedious Japanese sortkey work. Now I am trying to solve it. Can you please do something to let me work smoothly? -- Huhu9001 (talk) 10:14, 9 April 2023 (UTC)


 * My understanding of the Japanese sort key is that there should be a separate radical-stroke data module for =Jpan, since there are some differences in the stroke numbers between Chinese and Japanese, but Module:Hani-sortkey (formerly Module:zh-sortkey) would be used as a stop gap measure until a separate module is created. – Wpi31 (talk) 10:47, 9 April 2023 (UTC)


 * Radical-stroke sortkeys are only used by . Why not just make that template use mod:Hani-sortkey? Why must other templates get involved? -- Huhu9001 (talk) 10:50, 9 April 2023 (UTC)
 * @Huhu9001 While it's obviously great to have a system that sorts by reading automatically, you've done it in a way that makes Japanese sorting incompatible with how sorting is done for every other language. That is a problem, because it means we need to rework what you've done. Otherwise, we can't rely on Japanese sortkeys being reliable in other modules.
 * One major flaw is that your new system is not compatible with Module:collation, which means it won't work in any of the automatically sorted list templates, which is now an extra hindrance to using them with Japanese. This obviously needs to be fixed. Theknightwho (talk) 11:07, 9 April 2023 (UTC)
 * see for example Category:ja:Animals where you could find 動, 匹, 頭 as a separate header because those pages lack ja-kanjitab. Imagine if there are dozens of pages like this – there will be roughly the same amount of headers. If Hani-sortkey is used, e.g. in Category:ja:Sciences 線 is sorted under 糸; this at least provides some sort of organisation when radicals are available.
 * Obviously sorting by kana is preferred, and I think the idea to use ja-kanjitab for sorting is brilliant, but this should be done carefully, and that's why I said Hani-sortkey is used as a stop gap measure, or a fallback. Note that this is especially important for the Japonic languages that don't have a kanjitab template, see for example Category:mvi:Family. – Wpi31 (talk) 11:08, 9 April 2023 (UTC)
 * Also, we definitely don't want to have multiple sorting methods called in different ways for a single language. That's just chaotic. Theknightwho (talk) 11:10, 9 April 2023 (UTC)
 * Please look carefully, 動, 匹, 頭's sortkeys are incorrect not because those pages lack, but instead because and others do not call the sortkey function. They simply write a plain category link there, providing no sortkey at all. The sortkey function is not responsible for situations in which it is never called.
 * Making a kanjitab for other languages is extremely simple. If you ever want one, you just create the template with . -- Huhu9001 (talk) 11:33, 9 April 2023 (UTC)
 * Apologies for missing that. However, the argument still holds for cases where pages call ja-sortkey but lack ja-kanjitab, e.g. 線 which I've mentioned above. – Wpi31 (talk) 11:38, 9 April 2023 (UTC)
 * @Wpi31 Also, it strongly suggests ja-kanji should be modified so as to include head, which does include the conventional sorting function. Even Chinese always had that. Theknightwho (talk) 11:39, 9 April 2023 (UTC)
 * In that case just add a kanjitab for 線, instead of asking Jpan-sortkey for some incorrect result. -- Huhu9001 (talk) 11:47, 9 April 2023 (UTC)
 * As we've discussed yesterday on Discord, I think we should track which entries are (probably incorrectly) using Module:Hani-sortkey as the fall back for Module:Jpan-sortkey. – Wpi31 (talk) 11:43, 10 April 2023 (UTC)
 * @Wpi31 Sounds good. Theknightwho (talk) 12:07, 10 April 2023 (UTC)
 * mod:Jpan-sortkey is locked from editing. It needs to be unlocked to do this. -- Huhu9001 (talk) 02:41, 11 April 2023 (UTC)
 * It could still be edited by an administrator. Would you mind doing the honours? – Wpi31 (talk) 05:04, 11 April 2023 (UTC)
 * @Wpi31 Yes - no problem. I'll do it once I have some free time today. Theknightwho (talk) 05:48, 11 April 2023 (UTC)
 * This can now be tracked at T:tracking/Jpan-sortkey/fallback and T:tracking/Jpan-sortkey/fallback/ja (or whichever langcode). Theknightwho (talk) 19:11, 11 April 2023 (UTC)
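The fallback-plus-tracking idea discussed in this thread can be pictured with a small sketch. This is only an illustration under stated assumptions, not the code in Module:Jpan-sortkey or Module:Hani-sortkey: the function name `make_sortkey`, the `RADICAL_OF` table and the way the reading is passed in are all hypothetical, and the fallback key is simplified to a bare radical with no stroke count. Only the two tracking names and the examples 成り金 → なりきん and 線 → 糸 come from the discussion above.

```python
# Hypothetical toy table; a real radical-stroke module holds far more data
# and also encodes stroke counts, which this sketch leaves out.
RADICAL_OF = {"線": "糸"}


def make_sortkey(title, lang, kana_reading=None):
    """Return (sortkey, tracking_keys) for a page title.

    Prefers a kana reading (e.g. one supplied by a kanjitab-style template)
    and otherwise falls back to a radical-based key, recording tracking keys
    so that fallback cases can be found and fixed later.
    """
    if kana_reading:
        # Preferred path: sort by reading, nothing to track.
        return kana_reading, []
    # Fallback path: replace each known character by its radical and
    # record that the fallback was used, both overall and per language code.
    tracking = ["Jpan-sortkey/fallback", f"Jpan-sortkey/fallback/{lang}"]
    key = "".join(RADICAL_OF.get(ch, ch) for ch in title)
    return key, tracking


print(make_sortkey("成り金", "ja", "なりきん"))
print(make_sortkey("線", "ja"))
```

Returning the tracking keys alongside the sortkey keeps the sketch free of global state; a real module would presumably hand them to whatever tracking facility the site provides.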

Let's get rid of exceptionally formatted etym lang codes
Under WT:LOL/S there are some weirdly formatted etym-lang codes. For example, 'Early Scots' has in addition to the standard code 'sco-osc', the exceptionally-formatted codes 'Early Scots', 'O.Sc.' and 'Old Scots'. Similarly, 'Ecclesiastical Latin' has 'EL', 'EL.' and 'Ecclesiastical Latin' in addition to 'la-ecc'. The substrate 'a substrate language originally spoken by the Pygmies' has codes 'qfa-pyg' as well as 'pygmy'. The weird pseudo-language 'taxonomic name' has both 'mul-tax' and 'Tax.'. We also have 'en-HK' as the only code for 'Hong Kong English' (and similarly for several other country-specific variants of common languages such as English, Spanish, German, Dutch and Hebrew). Supporting these weirdly formatted codes while filtering out non-language-prefixed items in col (e.g. 'Appendix:Foo', '5:2 diet', the output of desc in a col entry, etc.) is difficult.

As a first pass I'd like to propose eliminating all codes that don't follow the standard format of two-or-three-letter lowercase groups separated by hyphens, except for the country-specific variants (which allow the last group to be all-caps) and the period-final codes, which should follow the Lua pattern '[A-Z][A-Za-z]*%.', i.e. an uppercase letter, followed optionally by upper or lowercase letters, followed by a final period. This means that 'CL' needs to go (but 'CL.' can stay for now); things like 'O.Sc.' should be renamed to 'OSc.'; and weirdnesses like 'pygmy', 'Koine' and 'pregrc' should go entirely. If this is agreed to, I can do a bot run to convert the offending codes appropriately, and then we can get rid of them. In the long run I'd like to get rid of the period-final codes like 'CL.', 'VL.', 'Tax.' etc. but that may be more controversial. Benwing2 (talk) 08:46, 9 April 2023 (UTC)
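To make the proposed rule concrete, here is a minimal sketch of a checker for it. It is an illustration rather than anything taken from the modules: the Lua pattern quoted above is translated into a Python regular expression, the function name `classify` and the category labels are made up, and the assumption that the all-caps country group is two or three letters long is mine.

```python
import re

# Two-or-three-letter lowercase groups separated by hyphens, e.g. "sco-osc".
STANDARD = re.compile(r"[a-z]{2,3}(?:-[a-z]{2,3})*")
# Same shape, but the last group is all-caps, e.g. "en-HK".
# Assumption: the country group is two or three letters long.
COUNTRY_VARIANT = re.compile(r"[a-z]{2,3}(?:-[a-z]{2,3})*-[A-Z]{2,3}")
# Python equivalent of the Lua pattern '[A-Z][A-Za-z]*%.', e.g. "CL.".
PERIOD_FINAL = re.compile(r"[A-Z][A-Za-z]*\.")


def classify(code):
    """Classify an etymology-language code under the proposed scheme."""
    if STANDARD.fullmatch(code):
        return "standard"
    if COUNTRY_VARIANT.fullmatch(code):
        return "country-variant"
    if PERIOD_FINAL.fullmatch(code):
        return "period-final"
    return "nonconforming"


for code in ["sco-osc", "en-HK", "CL.", "CL", "O.Sc.", "OSc.", "pygmy", "pregrc"]:
    print(code, classify(code))
```

Under these assumptions 'CL', 'O.Sc.', 'pygmy' and 'pregrc' come out as nonconforming, while 'CL.' and the renamed 'OSc.' pass as period-final codes, which matches the examples given in the proposal.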


 * @Benwing2 I'm fully supportive of this, and I think we should eventually look to eliminate codes like  as well. However, I think the ones with country codes (like  ) are fine on a long-term basis, as they're widely understood (and probably part of some standard).
 * While we're on this topic, we should also sort out the 7 anomalous script codes,  ,  ,  ,  ,   and  . Every other script code follows the pattern  ; or sometimes with a language code in front. Theknightwho (talk) 08:57, 9 April 2023 (UTC)


 * Re exceptional script codes, we should make sure (like with exceptional language codes) that we're naming them in a way that doesn't conflict with—i.e. makes clear they aren't—ISO codes. So, don't just rename  to something four letters long like , because that string could be assigned by the ISO to some other script. Maybe we could use codes in the private use range,  , but these wouldn't be able to resemble the scripts' names, so they might be hard to remember or understand. One idea is to make all the exceptional codes five letters:  ,  ,  ,  ,  ,  ,  . - -sche (discuss) 23:25, 9 April 2023 (UTC)
 * Sounds good to me. Benwing2 (talk) 00:04, 10 April 2023 (UTC)
 * @-sche @Benwing2 Agreed. That’s a good solution.
 * I also think the time has probably come to retire  altogether, as it came from a time when font support was considerably worse. These days, I suspect it’s totally unnecessary. Pinging @Erutuon, who knows about these things. Theknightwho (talk) 12:11, 10 April 2023 (UTC)
 * Font support for Ancient Greek single code points has been fine for a long time; however, as far as I know, combinations of vowels with macrons or breves and additional diacritics like ᾱ̓́ ῐ̈́ (macron, smooth breathing, acute; breve, diaeresis, acute) are only properly rendered in special fonts like SBL Greek and New Athena Unicode that aren't installed by default on any operating system. Actually New Athena Unicode doesn't render ῐ̈́ properly. (Gentium Plus, which is supposed to do well with lots of combinations of diacritics in Latin script, does terribly on these Greek combinations.)
 * It's possible that we could still get rid of  because only Paeonian and Ancient Macedonian besides Ancient Greek use   and they might not have these mostly unsupported diacritic combinations because we don't seem to be marking vowel length on them. So we might be able to switch all three over to   with no ill-effects and use   in the CSS instead of  . — Eru·tuon 00:19, 11 April 2023 (UTC)


 * Main downside I can see to getting rid of 'Ecclesiastical Latin', 'EL' and the like is that people who use those languages in etymologies are probably used to typing those. But now that I think about it, it is pretty weird that we have both 'EL' and 'el' as codes for completely different languages. (No objection from me, to getting rid of them.) Codes like 'de-AT' are valid IETF codes, as TKW alludes to, so those are probably OK(?). - -sche (discuss) 00:29, 10 April 2023 (UTC)
 * I have eliminated all the exceptional etymology codes except for,  , and the seven Latin variant codes  ,  ,  ,  ,  ,   and  . I did this by setting up tracking in Module:etymology languages/track-bad-etym-code; eventually all the pages using the codes to be eliminated get listed. Many of the codes, e.g.  ,   and all the non-standard Scots variant codes, were totally unused.   is still here because I forgot to track it; I didn't track the seven Latin variant codes above, but they are being tracked now. As for  , I haven't eliminated it yet because it's used on 1,695 pages (the other codes were used on < 100 pages each at the most). What do you think? Should we go ahead and eliminate it? The canonical value is  , which is unfortunately 5 chars longer than  ; maybe we could set up a shorter but still standard-format alias? One possibility is to rename all the substrate codes from   to just  ; that shouldn't clash with anything and it's IMO clearer. Along the same lines, the Pygmy substrate currently has an exceptionally-named code   instead of expected  ; we could call this  . Benwing2 (talk) 05:47, 12 July 2023 (UTC)
 * Also User:Theknightwho, it looks like scripts don't currently support alias codes the way that full and etymology languages do. Can you help me implement this, or at least point me to how to do it and any pitfalls you think I might encounter? Benwing2 (talk) 05:51, 12 July 2023 (UTC)
 * Sorry to ping you again. I am definitely thinking now of standardizing substrate codes to begin with  rather than  . I'm not sure if there is any current module code that depends on these codes beginning with   but I will check for it before renaming. I renamed the 415 pages using   to  . For the Latin varieties, we have approximately the following usages:


 * After I eliminate, the above codes will be the only nonstandard language codes left. We have three options, I think, for them:
 * Leave them all in place, grandfathered, as special-case exceptions. This requires some extra logic in certain places that accept  terms, to recognize the nonstandard codes.
 * Eliminate them all in favor of the canonical codes. The main disadvantage here is that they are three extra chars to type -- not such a big deal as we already have things like  and even   in common use and I haven't heard grumbles about people wanting a   or   code or anything.
 * Eliminate the lesser-used ones (which would likely be the ones with < 1000 uses, hence we would eliminate,   and   and keep the other four). This would reduce the number of exceptions but still require the extra logic mentioned in choice #1 above, and might create confusion as people wouldn't necessarily be able to remember which varieties have the special-format codes and which ones don't.
 * I personally favor #2 (eliminate them all), I think, but I am not strongly opposed to #1 (keep them all). Thoughts? Benwing2 (talk) 06:45, 13 July 2023 (UTC)
 * Barging in here, but I'm definitely opposed to naming substrate codes as sub-.... It clashes with an existing language with the code sub. We should absolutely use a code starting with q for this, like . &mdash; S URJECTION  / T / C / L / 06:59, 23 July 2023 (UTC)
 * OK sure.  or  ? The only advantage of the latter is it's four chars and so less likely to clash in the (unlikely) event that   gets assigned to some language. User:-sche said   could conceivably clash with a four-digit script code but that seems unlikely to me in that (a) script codes begin with a capital letter (unless CSS classes are case-insensitive; I dunno), (b) new scripts get named fairly infrequently compared with langs, and (c) script codes generally are of the form   or , not  . Benwing2 (talk) 07:06, 23 July 2023 (UTC)
 * ISO will never assign codes in the range qaa-qtz to any language, as they are specifically reserved for "local use". &mdash; S URJECTION / T / C / L / 07:17, 23 July 2023 (UTC)
 * Aha, OK in that case  it is. Benwing2 (talk) 07:30, 23 July 2023 (UTC)
 * (e/c) Yeah,  is a good prefix, for that reason. - -sche (discuss) 07:59, 23 July 2023 (UTC)
 * I have renamed the  codes to   and changed all uses. Benwing2 (talk) 00:43, 24 July 2023 (UTC)
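The point about the reserved range can be checked mechanically. The sketch below assumes only what is said in this thread, namely that codes in the range qaa to qtz are reserved for local use and so cannot clash with a future ISO assignment. The function name `is_local_use` is hypothetical, and the examples reuse codes that already appear in this discussion.

```python
def is_local_use(code):
    """Return True if the first group of a code lies in the range qaa-qtz."""
    prefix = code.split("-")[0]
    return (
        len(prefix) == 3
        and prefix.isascii()
        and prefix.isalpha()
        and prefix.islower()
        and "qaa" <= prefix <= "qtz"  # plain string comparison is enough here
    )


print(is_local_use("qfa-pyg"))  # first group "qfa" is inside the range
print(is_local_use("sub"))      # a code like "sub" could be (and is) assigned
```

A prefix that passes this check is safe from clashes of the kind raised above, whereas a prefix such as "sub" is not.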

Glossary definition of slang
Our Appendix:Glossary definition of slang, which en links to, says it's a jargon or cant of a group, profession or subject. It has come to my attention that this does not match either how some people use the label, how our entry [[slang]] defines it, or how other dictionaries like Dictionary.com define it: they include, alongside the sense "jargon", a sense along the lines of "informal language". Should we expand our glossary definition of slang to include this? But then, why use "slang" to label informal language when we have a label "informal"? Should we, in the other direction, clean up uses of en on words that are only en and not jargon? (Discussion which prompted this was here.) - -sche (discuss) 01:44, 10 April 2023 (UTC)
 * I agree with you, although I think the glossary label can still have a narrower definition than the entry for the word as it's used in general, and therefore I wouldnt expect it to line up with other dictionaries' entries either. It might help also to note that our slang label is often used attributively, e.g. Internet|_|slang, which appears to the reader as a single label "Internet slang" despite the software categorizing it separately into two categories. — Soap — 07:34, 10 April 2023 (UTC)
 * Internet slang (without |_|) categorizes into one category (e.g., Category:English internet slang for English). J3133 (talk) 12:10, 10 April 2023 (UTC)
 * In speech, I don't think I've ever heard "slang" refer to "jargon". I assume most instances of the label here refer to informal language. I'm in favor of having "slang" and "informal" produce the same output like "pejorative" and "derogatory" do now. Ultimateria (talk) 16:11, 17 April 2023 (UTC)


 * I'm convinced there is a difference, e.g. "ghostie" is informal, but not slang, bc it's from the normal word "ghost". But something like "spook", a separately invented word for informal contexts only, is slang. That's what I believe. Anyway please check strong sources like OED for the meaning of "slang" vs. "informal" before making such changes. Equinox ◑ 16:13, 17 April 2023 (UTC)
 * I don’t doubt that your conviction has its basis in usage. Not being “the normal word” or “invented”, in addition to being informal, is insufficient merit for the labelling, which is a speech act that goes beyond merely attaching any labels you think of having a matching meaning. You try to say something with it, and in the context of a dictionary the reader has legitimate reason to assume that the editor adding this label, in contradistinction of other possible labels (for invented abnormal words or meanings), attempted to inform the reader of it having only usage in exclusive circles, which is probably also the basis why the glossary definition specific to this dictionary does not have the sense you maintain (the appendix author thought of this meaning first!). Which latter itself may also be a more informal, broader meaning of the word, like some linguistic terms get abused by a broader population, considering our chronology of the term  assuming the order “special vocabulary of tramps or thieves” → “specialized language of a social group” → “language in the informal register and not the conventional (normal) word”. Fay Freak (talk) 17:14, 17 April 2023 (UTC)
 * From what I've seen, the overwhelming majority of entries categorized as "slang" are those which should (according to the glossary) be tagged as "colloquial" instead, with jargon usually ending up in specific subcategories (e.g. fandom slang, criminal slang, military slang). Given how difficult it would be to manually recategorize thousands of entries, as well as the fact that most people use to refer to colloquial speech, I think the best approach would be to merge Category:English slang and Category:English colloquialisms, similar to how en and en both categorize pages under Category:English derogatory terms. Binarystep (talk) 13:14, 31 May 2023 (UTC)
 * To me "slang", at least Category:English slang would best be considered a long-term cleanup category, as more specific labels seem less ambiguous. I have occasionally converted the "slang" label to "informal". To me, also, "slang" seems somewhat pejorative. I'm not sure that our use of it fits my belief. DCDuring (talk) 15:10, 31 May 2023 (UTC)
 * I honestly agree that slang is marked differently than colloquial - slang has a certain level of jargon-y-ness. Vininn126 (talk) 13:23, 31 May 2023 (UTC)

FYI: APRIL UPDATE FROM UNICODE
https://mailchi.mp/266f7a23de0e/unicode-in-6229370 —Justin ( koavf ) ❤T☮C☺M☯ 15:38, 10 April 2023 (UTC)

A category for arms of the sea
There doesn't seem to be one. By arms of the sea I mean inlets, fjords, harbours, sea lochs or loughs, sounds and the like, which geographically are not bays. There is Category:Bays, and Category:Sounds, which is intended for sounds which are noises. I would like to see a category to cover these inlets, but what to call it - Category:Inlets? There are also Category:Gulfs and Category:Straits, but they are not suitable either. DonnanZ (talk) 19:06, 11 April 2023 (UTC)

Thesaurus:nonentity and Thesaurus:worthless person
Should we merge "Thesaurus:nonentity" and "Thesaurus:worthless person"? It seems to me there is a lot of overlap. — Sgconlaw (talk) 11:21, 12 April 2023 (UTC)
 * I dont know how to word this .... I'd like to see a distinction based on the speaker's compassion. There are terms of contempt for people perceived to deserve it, and terms that might range from contempt to pity depending on how theyre used. e.g. "he's just a pipsqueak, he doesnt belong out there on the worksite" shows pity for someone who's being tasked with a job they cant handle.  The terms in the nonentity category seem to be mostly of that type, whereas the terms under worthless person show a lack of sympathy even when the referent is in a position of pain.  The first category also specifically means the referent being small, either physically or metaphorically. So I think there is a distinction to be made here, though perhaps we could refine it by adding more specific terms and removing the ones that overlap, or putting the overlap terms into a  parent category that encircles both of them. — Soap — 12:47, 12 April 2023 (UTC)
 * hmmm, first off there's nothing in the meaning of nonentity or worthless person which indicates any connotation of the speaker's level of compassion, so the names of the thesaurus pages would have to be altered in some way. Secondly, the thesaurus pages could be merged (my suggestion would be to "Thesaurus:worthless person" as it is clearer than "nonentity"), and in that one page the distinction which you suggest could be made. Having said that, I'm not sure how easy it will be to determine whether certain words are used more compassionately than others. It seems to me a very subjective thing. — Sgconlaw (talk) 22:04, 12 April 2023 (UTC)
 * I'd be okay with merging and listing them in two sections were it not for my impression that our Thesaurus pages are cluttered already, with much use of L4 and even L5 headers that impedes readability. But perhaps that's just me.  Since this isn't a vote about that, I have no objection to the merger so long as there are still two sections on the page.  Indeed, I noticed just now that we have the worthless person as a subset of the nonentity already. So this won't change much except its appearance.   I will try to think of a better way to word my idea for separating based on the speaker's compassion. — Soap — 22:43, 12 April 2023 (UTC)
 * I've just added wankstain to both categories, as it was surprisingly missing, but I don't see any problem with merging them. --Overlordnat1 (talk) 00:01, 13 April 2023 (UTC)
 * Are not those who regard non-entities as worthless people themselves despicable? A non-entity may be a worthy person. --RichardW57m (talk) 16:11, 13 April 2023 (UTC)

PRM Theory: Does this WorldCat OCLC apply to this NYT correction?
Hey- does the OCLC number I used apply to the correction I cited here: ??? I am weak on the theory of the connection between an ISSN for a periodical and an OCLC. I see that relation as infinitely more suspect than an ISBN with an OCLC. With an ISBN, if the ISBN appears on the OCLC page, and other details align, I feel confident that this is permanently recorded media under Wiktionary's theory. With an ISSN and OCLC, I don't really know if everything I'm seeing online is permanently recorded unless there are some corroborating details. In this particular case, I feel like the correction may not be permanently recorded while the article proper may be. (You may say: Frankly, my dear Geographyinitiative, Wiktionary don't give a damn--- but I'm still trying to satisfy the higher-bar Attestation standards for entries while simultaneously using pure web sources when it seems to give some special insight or support on a given term.) please ping me here.
 * --Geographyinitiative (talk) 12:55, 12 April 2023 (UTC)
 * I think that errata such as that is recorded solely online. Print errata in news has its own thing. Wouldn't hurt to double-check, tho. From: CitationsFreak: Accessed 2023/01/01 (talk) 05:29, 13 April 2023 (UTC)
 * I add OCLC numbers for serial publications because, unlike ISBNs for some reason, a serial publication can change its ISSN from time to time. — Sgconlaw (talk) 04:58, 18 April 2023 (UTC)

Adding categorisation for Chinese character lists
There are a number of official/standard Chinese character lists, which I think we should set up categories for them, similar to how Japanese has categories for Jōyō kanji, Jinmeiyō kanji, etc. The proposed lists include:
 * (通用规范汉字表) of mainland China
 * (常用國字標準字體表) of Taiwan (all four lists)
 * (常用字字形表) of Hong Kong (the 2000 version, since that is the newest version available online; also note that the 2007 version is somewhat controversial)

– Wpi31 (talk) 14:04, 12 April 2023 (UTC)


 * 常用字字形表 2007 is online in the form of 香港小學學習字詞表. I also have a hard copy of the 2012 edition. I don’t know what the controversy is; is the 2000 version still adopted by HK schools now? — justin(r)leung { (t...) 15:04, 12 April 2023 (UTC)
 * I think the textbooks follow the newer standard, but the difference should be minimal between them. The controversy (mentioned on ) mainly surrounds the character shape rather than the coverage of the list, so I suppose it should be fine to use the later versions, preferably the 2012 version if it has corrected those mistakes of the 2007 version. – Wpi31 (talk) 15:45, 12 April 2023 (UTC)
 * I am inclined to oppose. Japanese is one language (AFAIK). Chinese is a collection of several dozens of languages. These Mandarin-centered lists do not reflect the commonness in other varieties. I can't even find 喺哋嗮諗 in the Hong Kong list. 恨国党非蠢即坏 (talk) 06:06, 13 April 2023 (UTC)
 * I'm not sure why that's an issue - we can add lists for other lects too. We still treat and  as being part of different languages, even though they both go under the "Chinese" header, so a category that applies to one does not need to apply to the other. Just so long as we're clear about which lect the list applies to. Theknightwho (talk) 01:30, 18 April 2023 (UTC)
 * I am of the same opinion. Words or characters can belong e.g. to Appendix:HSK list of Mandarin words/Beginning Mandarin or any similar lists for Cantonese, Min Nan, if such lists or categories exist simultaneously.
 * We already have Category:Mandarin by difficulty level (with subcategories).
 * @Wpi31. Anatoli T. (обсудить/вклад) 01:48, 18 April 2023 (UTC)
 * Agreed. I don't have such concern, but I am of the opinion that these categories should still fall under Chinese, since the Hong Kong list contains characters like 鎅 which are Cantonese but have entered formal usage. – Wpi31 (talk) 02:01, 18 April 2023 (UTC)
 * @Wpi31: The individual names of categories can be derived from the usage and applicability. If characters are in various standards for various or at least two Chinese varieties/topolects, "Chinese" is appropriate, especially if the original list includes the word "Chinese". Anatoli T. (обсудить/вклад) 02:12, 18 April 2023 (UTC)
 * The reason these categories exist is that they come from some official educational materials. But since, as shown above, official lists are all Mandarin-based, similar lists made for other varieties will instead be original research of Wiktionary. This nullifies the reason why these categories are created in the first place.
 * I strongly oppose the idea to choose whether to use "Chinese" according to whether "at least two Chinese varieties" or not. This will make it almost impossible to get a complete list of a single Chinese variety from these categories. They will not be very useful in that case. 恨国党非蠢即坏 (talk) 17:28, 18 April 2023 (UTC)

script codes present in ISO but not here
We try to monitor if we need to make any updates when the ISO adds/removes language codes, so I'm highlighting for anyone interested that we currently lack the following ISO script codes, a few of which seem useful to add: - -sche (discuss) 16:02, 13 April 2023 (UTC)
 * Aran (Arabic - Nastaliq variant, variant of Arabic; we do cover some other variants of scripts, like Latg, but I don't know if we need this one for anything...?)
 * Cirt (Cirth, not in Unicode - though we do have some other codes despite them not being in Unicode, like Blis, and Kpel)
 * Elym (Elymaic, a real script which was encoded in Unicode in 2019, so unless we're subsuming it into something else, it seems like one we should add...?) (added)
 * Hanb (Han with Bopomofo, an alias for Han + Bopomofo, presumably we get by fine just using those codes)
 * Hmnp (Nyiakeng Puachue Hmong, also exists in Unicode and seems like a candidate for adding here) (added)
 * Hrkt (Japanese syllabaries, alias for Hiragana + Katakana)
 * Jamo (Jamo, alias for Jamo subset of Hangul, presumably we just use Hangul's code)
 * Pcun (Proto-Cuneiform, not in Unicode)
 * Pelm (Proto-Elamite, not in Unicode)
 * Piqd (Klingon / KLI pIqaD, rejected for inclusion in Unicode)
 * Psin (Proto-Sinaitic, not in Unicode)
 * Sara (Sarati, not in Unicode)
 * Sunu (Sunuwar, not in Unicode)
 * Syre, Syrj, Syrn (variants of Syriac ... would this help with encoding different fonts?)
 * Yezi (Yezidi, revived by Yazidis in 2013 and encoded in Unicode in 2020 - candidate for adding here? — even if we don't use it for lemmas, we could include it and set fonts for it so quotations using it might, possibly, not be tofu) (added)
 * Zinh (code for inherited script, don't suppose we need it), Zsye (emoji variants of symbols), Zxxx.

Italics in labels and usage examples
Should italics be used in for titles of works that are usually italicized? This would be consistent with different formatting (italics) in other places (e.g., the definition). After WordyAndNerdy reverted this change at —“” to “”,—I provided an example of this (italics changing to roman within italics) at Wikipedia (: “For the namesake franchise featuring him, see Mario (franchise) and Super Mario. For other uses, see Mario (given name) and Mario (disambiguation).”) and in a book in the edit summary. WordyAndNerdy reverted it again, calling it “hypercorrectness”; however, it seems simply correct, not hypercorrect. This also applies to usage examples because they are in italics, too. J3133 (talk) 21:04, 15 April 2023 (UTC)


 * I think they should be (and the general practice is that if text is already italicized, then the term that would usually be italicized should be displayed without italics). — Sgconlaw (talk) 21:59, 15 April 2023 (UTC)
 * Agree with you on all points, J3133 – although my opinion may not count much; I confess I tend to be a nitpicker. But there is a fairly sharp distinction between nitpicking ("the sort of pedantry up with which I shall not put," as Churchill supposedly said) and hypercorrectness...just between you and I. Using roman type to render italics within italics may be nitpicking in some contexts, but it is never hypercorrect. My own experience is that it appears to be the universal standard in well-edited publications. – HelpMyUnbelief (talk) 04:41, 16 April 2023 (UTC)
 * Splitting hairs like this really isn't helping your case but it's hypercorrect because it's seeking to impose a standard of formatting correct in one area (prose text) onto an area where it is nonstandard (context labels). WordyAndNerdy (talk) 17:28, 18 April 2023 (UTC)
 * Oh, this is too good not to share: I just looked up nitpicking, and the sample sentence quoted Wally from the comic strip Dilbert. No, the name of the comic strip, which linked to its Wikipedia entry, was – you guessed it – not italicized. Yes, I changed it. I must be totally incorrigible...HelpMyUnbelief (talk) 04:53, 16 April 2023 (UTC)
 * I'm also in favor of properly italicizing (in this case toggling italics). And it looks like somebody intended for this to be possible because they added  to MediaWiki:Common.css, which makes the italics syntax not be italicized in contrast to the rest of the text inside of the parentheses of . — Eru·tuon 01:01, 17 April 2023 (UTC)
 * I agree that we should be doing this as well. The consensus seems reasonably clear here, so I've restored the italics-in-italics. Theknightwho (talk) 02:13, 18 April 2023 (UTC)
 * Usexes are text content. It makes sense to apply the same formatting conventions to them that one would apply to any other body of text. Context labels are not text content. They are a template meant to convey limited information in a concise format. They are functionally wiki-content, like the links in the sidebar. They should be presented in a way that is clear and consistent. We don't allow the insertion of wikilinks into quotations even though they might be helpful. Preserving the integrity of quotes is weighted above any kind of wikis-should-be-formatted-like-wikis ethos. Anyway, this "problem" is resolved by moving mention of Cuphead from the context label to the definition. WordyAndNerdy (talk) 03:53, 18 April 2023 (UTC)
 * @WordyAndNerdy This presentation is clear and consistent, because it is the standard way of presenting italics within italic text. Please stop edit warring over your own personal preferences. Theknightwho (talk) 14:45, 18 April 2023 (UTC)
 * Yeah, no. This is a textbook case of a solution in search of a problem. No one in this discussion aside from me seems to have made substantial contributions to the fandom slang topic area. Nor have any readers expressed concerns over the so-called "incorrect" formatting of a context label in a single entry to my knowledge. You're swooping in and demanding to impose prose text formatting conventions on context labels over all other considerations solely to satisfy your own fixation with enforcing specific grammar/formatting. This is both unwelcome and not a constructive use of time. WordyAndNerdy (talk) 16:33, 18 April 2023 (UTC)
 * I didn't know this until just now but readers can customise how they want context labels to appear. Adding formatting to context labels would presumably break or limit readers' ability to set custom preferences. Once again, this is a solution in search of a problem. No one is asking for this "fix." WordyAndNerdy (talk) 17:04, 18 April 2023 (UTC)
 * The consensus is clearly against you, which you keep ignoring. Please stop. This is not something specific to fandom slang, and your arguments are unpersuasive because they essentially just amount to personal preference and speculation. Saying "nor have any readers expressed concerns" feels a bit absurd when there are several of them in this thread doing just that. Theknightwho (talk) 01:08, 19 April 2023 (UTC)
 * And now you're edit-warring and blindly reverting unrelated edits instead of actually engaging with any of the points I have raised. WordyAndNerdy (talk) 01:19, 19 April 2023 (UTC)
 * @WordyAndNerdy I did engage, as you can see in my comment above. Continuing to ignore the fact that consensus is against you is not a good approach. Theknightwho (talk) 01:22, 19 April 2023 (UTC)
 * You've also reverted 4 times in 24 hours, and don't seem to be at all interested in having good faith discussion. Frankly, that's grounds for a block. Theknightwho (talk) 01:27, 19 April 2023 (UTC)
 * Last I checked, Wiktionary doesn't have 3RR, and by my count you have also reverted four times, including blindly reverting unrelated edits (updating old-format categories to the new template) out of apparent spite. I may have been distemperate in some of my comments. But the difference is that I'm not an admin. I'm not expected to exercise special restraint and avoid wielding my authority in the service of personal conflicts. I haven't threatened to block someone in order to "win." WordyAndNerdy (talk) 01:59, 19 April 2023 (UTC)
 * @WordyAndNerdy I didn't threaten to block you: I said that what you did was grounds for a block, but didn't action it myself because I'm involved. The comment that you link is me explicitly calling you out on your attempt to engineer a block, which you make very obvious by acting like I did it anyway. It's just not credible. Sorry.
 * You have been extremely unpleasant throughout this exchange (especially on your talk page), and have essentially just tried to bully everyone else into submission with consistent personal attacks and insinuations about their motivations. Now that I've pulled you up on that, you seem to be making things personal as a way to avoid landing yourself with a block. That is completely unacceptable behaviour. Theknightwho (talk) 02:05, 19 April 2023 (UTC)
 * I would like to note that after a week of discussion (22 April) I will add a note to Style guide regarding what the consensus is (which seems to be to include the italics as WordyAndNerdy is the only one who opposed). J3133 (talk) 06:44, 18 April 2023 (UTC)
 * It's a bit early to declare consensus. But, as a contributor of text for taxonomic names, I would be greatly disappointed if we did not follow the practice of italicizing those taxonomic names that are recommended by the taxonomic authorities to be italicized when in non-italicized text and to have contrasting typography for such taxonomic names in italicized text. The recommendations of such authorities are widely followed. There do seem to be (have been?) some templates that don't permit this. DCDuring (talk) 18:26, 20 April 2023 (UTC)
 * All right, let’s look at the consensus after another week. J3133 (talk) 18:34, 20 April 2023 (UTC)
 * The bone of contention here is whether English italics-within-italics conventions should be applied specifically to context labels. Applying those conventions to the prose-text parts of entries (definitions, usage examples, usage notes) is uncontroversial to me. But I believe that context labels are a different animal. They are templates designed to convey limited information in a concise format. I don't think they should be treated like the prose-text parts of entries. The detriments of breaking the always-italicized format of context labels outweigh any perceived benefit of universally applying italics-within-italics rules in my mind. We have other style exceptions like disallowing the addition of wikilinks to the text of quotations. In any case, with cupsona the problem was neatly elided by moving mention of the video-game title from the context label to the definition proper. WordyAndNerdy (talk) 19:48, 20 April 2023 (UTC)
 * What are your thoughts on italics in, similar in many ways to ?
 * What about italics around Latinisms (eg, per se, c.) or terms like Schadenfreude in ? DCDuring (talk) 22:46, 20 April 2023 (UTC)
 * Not sure there's a one-size-fits-all policy solution that can be applied to every case. I tend to italicise et al. whenever its use seems necessary, e.g. when formatting a cite with too many authors to practically list. I tend not to italicise uses of per se, circa, or e.g. based on the sense that these terms have become integrated into English through common use. (I'm italicising them here as mentions.) Whereas et al. feels a bit more alien – a bit more like a loan word, not fully integrated into English. I understand this is a matter of personal sensibility. I'm not sure we could craft meaningful policy solutions to account for such individual determinations. They ultimately cut back to subjective questions of "what is English, and what's not?" and obviously everyone's personal standards are different in that regard. Whereas italicising taxonomic names isn't a matter of individual preference for me. It's standard scientific practice.
 * TL;DR: A certain degree of latitude should ideally go into crafting policy. We ought to avoid broad prescriptions and weigh options on a case-by-case basis. WordyAndNerdy (talk) 06:36, 21 April 2023 (UTC)
 * Normal users do, unbidden, point out inconsistencies of various kinds from time to time, so "flexibility" has drawbacks. DCDuring (talk) 16:35, 21 April 2023 (UTC)
 * I don't really have much else to say, but I do think that it's a bit unfair to strongly declare "consensus" and move forward in just a few days after the discussion has started and there are not that many people discussing. While I do agree that italics within italics should be displayed as Erutuon mentioned, I do think that there's time for others to join in as well. AG202 (talk) 14:22, 20 April 2023 (UTC)
 * I am not sure whether you were referring to my message. Is 22 April (a week after starting the discussion) too early? J3133 (talk) 15:05, 20 April 2023 (UTC)
 * No, not yours specifically, but mainly @Theknightwho's. AG202 (talk) 15:55, 20 April 2023 (UTC)
 * @AG202 In the context of what was happening, I was making the point that it was a bad idea to keep imposing something unilaterally. I don't have any issue with keeping the discussion open. Theknightwho (talk) 16:02, 20 April 2023 (UTC)
 * Edit: I've realised you were referring to a different comment I made about the consensus being reasonably clear, which I'd forgotten I'd said as my change got reverted - sorry. Theknightwho (talk) 17:34, 20 April 2023 (UTC)
 * (chiming in a bit late)
 * My general sense is that the point of italics is to visually emphasize a short string of text by distinguishing its formatting from the formatting of its greater context.
 * If the greater context is already italicized, the appropriate technical approach to visually emphasizing would be to un-italicize within the longer italicized string. This is indeed the approach I am most familiar with from academic and other formal writing.
 * This string is not italicized, and uses italics to visually emphasize.
 * This string is italicized, and uses non-italics to visually emphasize.
 * Much as nothing much is visually distinct if the entire string is un-italicized, nothing much is visually distinct if the entire string is italicized.
 * If emphasis within an italicized string is required, and bolding is not optimal for some reason, then non-italics is the way to go. ‑‑ Eiríkr Útlendi │Tala við mig 22:11, 21 April 2023 (UTC)
 * We use bold for other things and overuse of bold is almost as bad as SHOUTING. DCDuring (talk) 01:22, 22 April 2023 (UTC)
 * I personally agree with Wordy that de-italicising in labels doesn't really make sense (it's not an italics continuous text, but rather one label, much like "Cuphead" is on its own). Of course this shouldn't apply to non-labels. I don't have a strong opinion though, and would be fine with either outcome. Thadh (talk) 13:35, 22 April 2023 (UTC)

I have changed Style guide to include the consensus (80% supported italicizing). J3133 (talk) 07:33, 1 May 2023 (UTC)
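
The toggling convention supported in this thread (a title that would normally be italicized is set in roman when its surroundings are already italic) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `render_label` and the `(text, is_title)` segment format are hypothetical, and it is not how the label template or MediaWiki:Common.css actually implements the behaviour.

```python
from html import escape


def render_label(segments, context_italic=True):
    """Render (text, is_title) pairs as HTML spans.

    Hypothetical helper: a title is set in the opposite style from its
    surroundings, so a title inside italic text comes out roman, and a
    title inside roman text comes out italic.
    """
    parts = []
    for text, is_title in segments:
        # XOR: a title toggles whatever style the context already has.
        italic = context_italic != is_title
        style = "italic" if italic else "normal"
        parts.append(f'<span style="font-style: {style}">{escape(text)}</span>')
    return "".join(parts)


# Inside an italic context label, the title is rendered roman.
print(render_label([("in ", False), ("Cuphead", True)], context_italic=True))
# Inside roman prose (e.g. a definition), the same title is rendered italic.
print(render_label([("in ", False), ("Cuphead", True)], context_italic=False))
```

The same function covers both cases raised in the discussion (italic labels and usage examples versus roman definitions), because the only input that changes is the style of the surrounding context.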

Pronunciation templates
A couple of questions:

1) a) Is there rhyme or reason to the syntax for the templates that auto-generate pronunciations for various languages? The differences seem random; we have, for example, even within the relatively narrow confines of the Romance family, "ca-IPA", "fr-IPA", and "pt-IPA" for Catalan, French, and Portuguese respectively; "it-pr| " for Italian; and "es-pr" for Spanish. Obviously there is a need for specialized templates for the endless array of peculiarities that can break the neat one-to-one correspondence between a word in a particular language and a set of pronunciations; but the template syntax variations among the languages where I have any useful knowledge appear needlessly confusing.

1) b) Where the differences are truly arbitrary, are there any plans for standardization?

2) With or without standardization of templates where it's possible, the process of adding pronunciations (with appropriate template included) to language sections that are missing them would seem to be an ideal candidate for a bot. At this point I have no intention of trying to learn how to use bots, much less create them, but I'm curious: Is there any such bot, or plans in the works to create one? – HelpMyUnbelief (talk) 22:15, 15 April 2023 (UTC)


 * IPA modules are usually smaller than their pronunciation counterparts, usually only generating IPA, whereas pronunciation modules include the IPA module, rhymes, syllabification... Part of the issue is that pronunciation modules just haven't been made yet, and also all the IPA and pronunciation modules NEED to work differently as different languages have different rules, i.e. stress or they need respelling if the orthography doesn't account for all the spelling (like in Italian there's no way to distinguish /e/ or /ɛ/, if I recall correctly, so you need to tell it which e it is). Vininn126 (talk) 10:44, 16 April 2023 (UTC)
 * As User:Vininn126 mentions, the *-IPA templates just generate pronunciation but the *-pr and *-p templates also generate rhymes and hyphenation and allow things like homophones and audio to be specified. The latter type of templates only exist for a few languages; I wrote the *-pr templates for Spanish and Italian while User:Surjection (I think) wrote the *-p templates, which explains the different syntax. Benwing2 (talk) 20:42, 17 April 2023 (UTC)

Global CheckUser activity policy
Perhaps this has already been advertised somewhere here that I have simply failed to notice, but there's a global RfC at meta:Requests for comment/CheckUser activity RFC about requiring that every CheckUser on a project must run at least five checks a year or else they (and if the project only has two CheckUsers, also the other CheckUser) get removed for inactivity, if that interests anyone. I think our two CheckUsers are both active enough to be unaffected. - -sche (discuss) 03:21, 16 April 2023 (UTC)

Should spaces and punctuation be ignored in column template sorting?
I was informed by Erutuon at that spaces and punctuation are no longer being ignored in column template sorting after Theknightwho’s edit to Module:collation made in March which switched to  in Module:utilities.

This results in, for example, at, the derived terms being listed as “Potter County / Potter Valley / Potterdom Potterphile / Potterverse” instead of “ Potterphile / Potter Valley / Potterverse”.

Erutuon also stated, “I had made Module:columns stop ignoring spaces on August 3, 2018, after a comment on the talk page, and DTLHS undid that on September 8, 2018, so ignoring of spaces and punctuation was carried over into Module:collation. On January 4, 2023, punctuation stopped being ignored in languages that use Module:Tibt-sortkey with this edit.”

I think we should look at the order other dictionaries use for their terms. J3133 (talk) 10:43, 18 April 2023 (UTC)
 * As another example, listed Kennedy Town after Kennedy's disease in the derived terms list of  yesterday but this is the displayed order of the list: “Kennedy plea / Kennedy Town / Kennedyesque  Kennedynomics / Kennedy's disease”. J3133 (talk) 11:33, 18 April 2023 (UTC)
 * FWIW, I just follow whatever order the rest of the list (in the source code) is using. – Wpi31 (talk) 12:59, 18 April 2023 (UTC)
 * I agree with @J3133 that we should follow the practice of other dictionaries in this regard. We can exploit their research or experience with user expectations in this (and other) regards.
 * I always thought that hyphenated, spelled-open, and spelled-solid items on such lists should appear next to each other (even on the same line). I have not noticed what other dictionaries do about contractions and possessives, ie, apostrophes, but the logic of ordering English terms as if they were spelled solid (ie, ignoring spaces and hyphens) would suggest that ALL punctuation be ignored. DCDuring (talk) 14:24, 18 April 2023 (UTC)
 * We should not use a question about English to decide something for all languages. Even within English, it is nonstandard to ignore spaces (but usual to ignore punctuation), and so we shouldn’t do that because it is unintuitive to do so. Theknightwho (talk) 14:38, 18 April 2023 (UTC)
 * That looks like your intuition only, unless you have some contrary evidence. In any event, at English Wiktionary it would behoove us to follow the practice of comparable English dictionaries for English L2s in this regard rather than a single user's intuition or prevailing practice in any other language or group of languages. MW3 has angels followed immediately by angel's hair, followed by angel shark. That is, MW3 ignores apostrophes and spaces. MW3 has andoke, and/or, andorite. That is, MW3 ignores /. Longmans DCE has hot air, hotbed, hot-blooded, hotch-potch, hot-cross bun, hot dog, hotel. That is, DCE ignores spaces and hyphens.
 * MW3 has the following in its "Explanatory Notes": 'The main entries follow one another in this dictionary from a to zyzzogeton in alphabetical order letter by letter. For example, above the line follows abovestairs (not above all) as if it were printed abovetheline with no spaces in the middle.'
 * You can satisfy yourself consulting your favorite dictionary. Print dictionaries are probably more likely than online dictionaries to have thought this through from a human perspective, however. DCDuring (talk) 16:25, 18 April 2023 (UTC)
 * @DCDuring I'm not that bothered about what we do for English, but I will strongly oppose any argument that we should apply this to all languages. That's all.
 * It also doesn't make sense for us to use different sorting methods for the column template and categories. They should always be the same. Theknightwho (talk) 22:21, 18 April 2023 (UTC)
 * I care deeply about what we do for English, especially as this is English Wiktionary. As category names are English, I would have thought that we would want them to be sorted in the same way as column items in English L2s, that is, just as other English dictionaries seem to, ignoring spaces, hyphens, dashes, apostrophes, virgules, and, probably, all other non-alphabetic characters. I would hope that ligatures are expanded and normal folding would take care of accented characters. I would be interested to know how alphabetization is carried out in dictionaries of all the languages that use alphabets, especially those derived ultimately from Phoenician, however altered and augmented. Has someone looked into it? DCDuring (talk) 23:11, 18 April 2023 (UTC)
 * Sorting of category names (categories inside other categories) isn't at issue here. I'm not sure if the sortkey generation modules are ever used to generate the sortkeys of categories within other categories. It probably wouldn't be noticeable in the vast majority of cases either, because category names don't usually differ by just a space or punctuation mark. The question here is how to sort entry names inside categories that are dedicated to a single language (such as Category:English lemmas) and in templates like . — Eru·tuon 23:29, 18 April 2023 (UTC)
 * You might be right. Category names are artificial - our creations. Headwords (and redirects) can't be forced without us getting prescriptive. DCDuring (talk) 04:14, 19 April 2023 (UTC)
 * @DCDuring I didn't mean category names; I meant the terms inside them. This thread refers to column sorting, but it would make more sense for it to refer to all sorting, as there's no sense in doing it differently. The change I made brought them in line with each other, which is something we should retain regardless of how we think the terms themselves should be sorted. Theknightwho (talk) 23:30, 18 April 2023 (UTC)
 * I strongly agree with @User:J3133 for English. I would like to see evidence that it would not be acceptable for other languages that use an alphabet and other characters (punctuation, etc.) like the English one. DCDuring (talk) 04:22, 19 April 2023 (UTC)
 * @DCDuring Shouldn't it be the other way around? I'm not sure we should assume that what works in English should be the default for all languages unless evidence is presented otherwise. It's not even compatible with some writing systems, and it has never been that way for category sort. Theknightwho (talk) 04:50, 19 April 2023 (UTC)
 * Before this discussion there was no evidence whatsoever advanced to support the statement above: "Even within English, it is nonstandard to ignore spaces (but usual to ignore punctuation), and so we shouldn’t do that because it is unintuitive to do so." Why did one user's intuition become privileged? The only evidence looked at so far is the practice of a couple of English dictionaries. (I can probably do the work of finding more in a week or so.) I think it behooves us to get actual evidence and not rely solely on the unaided intuition of our technical adepts. The practice of lexicographers, both commercial and governmental (language academies) seems the best indication of what normal users expect. As "smaller" languages usually don't have competing modern dictionaries, we would end up relying on one lexicographer for the evidence for such languages. DCDuring (talk) 14:27, 19 April 2023 (UTC)
 * You are conflating two issues here: I am saying that what we do for English should not set the standard for what we do for all languages, and that there is no precedent for doing that. I am not talking about what we do for English, which I have already said I’m not that bothered about - so I have no idea why you keep focusing on it despite me making it clear I’m not talking about that. I have already encountered this problem in Tibetan, and it would be absurd to impose a standard on all languages based on your Anglocentric intuition that we should follow English rules unless proven otherwise. Thanks. Theknightwho (talk) 14:40, 19 April 2023 (UTC)
 * This discussion is about "Should spaces and punctuation be ignored in column template sorting?" You seem to be deflecting away from that. As far as other languages are concerned, I can only hope that we do not rely merely on the intuition of one person, rather than evidence, as to what practice is in other languages. You have not expressed any acknowledgment of the value of such evidence, let alone produced any. DCDuring (talk) 14:52, 19 April 2023 (UTC)
 * "Should spaces and punctuation be ignored in column template sorting?" is not a question which is specific to English, and so we should not be saying “yes” based on the evidence of a few English dictionaries. That does not mean we cannot change the setting for English, however, but it does mean I will oppose any argument that we should make it the default. Instead, I am saying that we should do it based on what makes most sense for each specific language, and that in the absence of evidence we should use the default electronic sort, which is how things have always worked with categories. Is that something you disagree with? Theknightwho (talk) 15:19, 19 April 2023 (UTC)
 * Sure. But what little evidence we have gathered so far in this discussion suggests that, for English, at English Wiktionary, spaces and "punctuation" should be ignored for all normal-user-facing content. DCDuring (talk) 17:36, 19 April 2023 (UTC)
 * Perhaps we should accept evidence of lexicographical practice of neighboring languages in the same family rather than letting the sort favored by the machines govern. DCDuring (talk) 17:38, 19 April 2023 (UTC)
 * @DCDuring I don’t think that’s a good idea. Neighbouring languages frequently work quite differently in that regard, and the default sort has the advantage of being natively supported by the software. It would be wrong to sort French like Spanish, or (a more likely scenario) Buryat like Mongolian. Theknightwho (talk) 18:29, 19 April 2023 (UTC)
 * Strongly agree with this. Every language has its own sorting rules, and there should be more flexibility rather than less. AG202 (talk) 15:52, 19 April 2023 (UTC)
 * And some have several sets - French, German, Lao and Northern Thai to mention just a few! --RichardW57m (talk) 16:35, 19 April 2023 (UTC)
 * Me too. I think we need evidence about prevailing practice in languages for which such evidence is available before imposing uninformed intuitions on any language. DCDuring (talk) 17:33, 19 April 2023 (UTC)
 * @DCDuring Nobody is suggesting that we impose anything on a language without evidence. Theknightwho (talk) 18:27, 19 April 2023 (UTC)
 * That did seem to be the initial position you took when you stated "it is nonstandard to ignore spaces (but usual to ignore punctuation), and so we shouldn’t do that because it is unintuitive to do so." DCDuring (talk) 19:21, 19 April 2023 (UTC)
 * @DCDuring I'm not sure why me giving my initial view implies that I think we should just ignore evidence... Theknightwho (talk) 19:32, 19 April 2023 (UTC)
 * You stated your position categorically. You were dismissive of other points of view. You have not presented or indicated any interest in evidence. DCDuring (talk) 19:51, 19 April 2023 (UTC)
 * @DCDuring That simply isn't true, and you seem to have misunderstood my point. I repeatedly clarified, and the other participants seem to have understood what I was saying, so I don't think there's any value in me going over it again. Theknightwho (talk) 21:33, 19 April 2023 (UTC)
 * What you repeatedly did was blow me off. I understood what you were saying and at no time did I disagree or say that I favored imposing the ignore-spaces-and-punctuation sort order for anything but English. You failed to address my point at any time, which I take as possible unwillingness to do anything other than what you said was "intuitive" at the beginning. DCDuring (talk) 23:07, 19 April 2023 (UTC)
 * @DCDuring We've obviously been talking past each other, because I thought I made it very clear that I wasn't arguing against what you were saying for English in my subsequent comments. Theknightwho (talk) 23:23, 19 April 2023 (UTC)

For a comprehensible view, see The Chicago Manual of Style: "18.56 The two principal modes of alphabetizing —or sorting— indexes are the letter-by-letter and the word-by-word systems [...] Dictionaries are arranged letter by letter, library catalogs word by word [...] In an index including many open compounds starting with the same word, the word-by-word system may be easier for users. Both systems have their advantages and disadvantages, and few users are confused by either." --Vriullop (talk) 06:42, 21 April 2023 (UTC)
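
To make the two modes concrete, here is a minimal Python sketch. It is illustrative only: Module:collation and Module:columns are Lua modules, and this is not their implementation. The function names `letter_by_letter_key` and `word_by_word_key` are hypothetical. The sketch assumes that "letter by letter" means folding case and dropping every character that is not a letter or digit, and that "word by word" means comparing space-separated words in turn.

```python
import unicodedata


def letter_by_letter_key(term):
    """Sort key that ignores spaces, punctuation, case, and combining accents."""
    folded = unicodedata.normalize("NFKD", term).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def word_by_word_key(term):
    """Sort key that compares whole words one at a time, left to right."""
    return tuple(letter_by_letter_key(word) for word in term.split())


terms = ["Potterverse", "Potter Valley", "Potterphile", "Potterdom", "Potter County"]

# Letter by letter (the dictionary convention described above):
# Potter County, Potterdom, Potterphile, Potter Valley, Potterverse
print(sorted(terms, key=letter_by_letter_key))

# Word by word (open compounds starting with "Potter" are grouped first):
# Potter County, Potter Valley, Potterdom, Potterphile, Potterverse
print(sorted(terms, key=word_by_word_key))
```

With the MW3 example quoted earlier in this thread, the letter-by-letter key gives angels, angel's hair, angel shark, matching the order reported for that dictionary. Any per-language behaviour (for example, scripts where ignoring spaces or punctuation is not appropriate) would need its own key function; nothing here is meant as a default for all languages.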


 * Indexes have other considerations than dictionaries do, which is why I didn't use that work to support the spaces-and-punctuation-ignored argument.
 * Works on lexicography sometimes discuss the order of the headword list, which is more similar to the question of order in column templates. The only three books on lexicography that I've looked at have something to say about alphabetization:
 * Dictionaries: The Art and Craft of Lexicography 2nd ed. (Landau 2001) (limited to English): "Dictionaries usually alphabetize letter by letter rather than word by word. Letter-by-letter arrangement has the great virtue that readers need not know whether a compound is spelled as one word, as a hyphenated word, or as two words."
 * A Handbook of Lexicography (Svensen 2009) addresses language differences, but has this to say about English: "In a language such as English, however, there may be difficulties since the varying way of writing compounds (separately, hyphenated, or solid) rather speaks in favor of letter-by-letter sorting which has the advantage that the user who is unsure about the spelling will not have to search several places in the dictionary."
 * The Oxford Guide to Practical Lexicography (Atkins & Rundell 2008), apparently aimed at entry-level lexicographers, has less to say, giving as reasons "The discussion here relates only to print dictionaries: alphabetization holds no fears for editors or users of electronic dictionaries." and "Deciding on the alphabetical order of the headword list is a quagmire, but one which poses few real problems for editors of current English dictionaries. This is principally because every publishing house has its own policy, enshrined in the dictionaries already in print. the Style Guide will give explicit guidance on what goes where in the entries you are writing." and "Dictionaries therefore tend to alphabetize letter by letter, ignoring capitalization."
 * All of these discuss the special problems of finding idioms among headwords, but this is of less concern for lists within entries.
 * DCDuring (talk) 15:53, 21 April 2023 (UTC)
 * : Do you think I should make a vote regarding this issue (specifically for English)? J3133 (talk) 10:18, 2 May 2023 (UTC)
 * Now, one can dispense with the column templates and put items in whatever order one prefers, including grouping alternative forms, even on a single line. One can also thereby overcome the technical limitations that apparently prevent proper alphabetization of terms with templates such as and . So, as a practical matter, a vote would simply be:
 * precatory upon our technical contributors to do the work required to create templates that addressed the problem only in part (because of the apparent technical limitations)
 * preventive of mandatory use of the column templates in English L2 sections.
 * preventive of edit-warring about such template use in English L2 sections.
 * There are other concerns that have not been discussed at all: what is the impact of such alphabetization on the servers? What would be the server-load benefits of replacing alphabetization-on-demand via templates with some kind of one-time "hard" alphabetization? Should letter-by-letter also be applied to and ? Are there other instances of alphabetization, besides under derived terms and related terms, that need alphabetization (e.g., see also, synonyms)?
 * IOW, I think it might be premature to have a vote.
 * DCDuring (talk) 12:28, 2 May 2023 (UTC)
 * I don’t think we should include anything about mandatory use or edit warring: people have different opinions on whether column templates should or shouldn’t be used that don’t necessarily align with their opinions on sorting, and edit warring is already disallowed, so any provision on it would be pointless. Server load is negligible - not worth taking into account. Theknightwho (talk) 12:42, 2 May 2023 (UTC)
 * : I thought of making a vote because it seems this discussion died down without a conclusion (and will likely be forgotten). Having one standard decided by consensus would be consistent and prevent edit-warring. J3133 (talk) 13:49, 2 May 2023 (UTC)
 * Does anyone have any observations on the technical benefits or costs of:
 * word-by-word vs. letter-by-letter alphabetization in column templates or elsewhere
 * different sort orders for different languages
 * replacing sorting on demand with (semi-)automated offline sorting, for all or some languages
 * DCDuring (talk) 15:00, 2 May 2023 (UTC)
 * @DCDuring
 * Letter sorting seems to be how it’s done for English, which is trivial to implement.
 * We have extensive sorting set up for lots of different languages, but they should be considered on a language-by-language basis. I suggest no change to these for the time being, as this discussion mainly seems to concern English.
 * I don’t see any advantage to sorting by bot, as it’s not resource-intensive, and it has three major disadvantages: it’s not instant, it relies on somebody actually running a bot to do it in the first place (which is not reliable, as sometimes people don’t do this stuff for years, like anagrams) and it removes direct control of the sorting algorithm from everyone except the bot owner.
 * Theknightwho (talk) 16:46, 2 May 2023 (UTC)
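For readers following the word-by-word versus letter-by-letter question above, here is a minimal Python sketch of the two orderings. It is not how the column templates or Module:collation actually work; the function names and the sample list are made up for illustration, and it assumes simple English terms with no diacritics.

```python
import re

def word_by_word_key(term: str) -> list[str]:
    # Hypothetical helper: split on spaces and hyphens, then compare word lists.
    return [w for w in re.split(r"[\s\-]+", term.casefold()) if w]

def letter_by_letter_key(term: str) -> str:
    # Hypothetical helper: ignore everything except letters and digits.
    return "".join(ch for ch in term.casefold() if ch.isalnum())

terms = ["brown bag", "brownbag", "brown-bagging", "brown", "browning", "brown sugar"]

word_sorted = sorted(terms, key=word_by_word_key)
letter_sorted = sorted(terms, key=letter_by_letter_key)

print(word_sorted)
print(letter_sorted)
```

Under word-by-word ordering every "brown ..." phrase comes before "brownbag"; under letter-by-letter ordering the spelling variants of one compound end up next to each other, which is the behaviour the dictionary handbooks quoted above describe.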
 * Some kind of offline or manual sorting will apparently still be necessary to handle lists that include templates or  (or possibly others). Automatic in-line sorting would undo any efforts to deliver the sort order a user would expect in lists that have those templates leading off items. DCDuring (talk) 17:52, 2 May 2023 (UTC)
 * Could you remind me what the issue with sorting those templates is? It would be good to find an automatic solution if possible. Theknightwho (talk) 10:14, 4 May 2023 (UTC)
 * As I interpreted a comment by, I believe, in a discussion of column templates that I haven't located, it had something to do with the timing of template expansion relative to the timing of the sort mechanism used in the column templates. DCDuring (talk) 12:05, 4 May 2023 (UTC)
 * "In the new column template regime, will there be any template that alphabetizes, ignoring templates like the list family and and . DCDuring (talk) 9:57 am, 17 April 2023, Monday (17 days ago) (UTC−4)"
 * "@DCDuring That's not really possible with nested templates because the outer template sees the expansion of the inner template rather than the template itself. The best solution I think is a specialized column template that supports the equivalent of and  using inline modifiers. Benwing2 (talk) 10:39 am, 17 April 2023, Monday (17 days ago) (UTC−4)"
 * This is from a March GP discussion. DCDuring (talk) 12:13, 4 May 2023 (UTC)
 * I think what we really want is for only the visible text of the term to be taken into account, which certainly is possible. We also don’t necessarily want to ignore all templates, because they sometimes change the visible output (which we do want to take into account). We already do something like this for l, so I will have a look at how vern and taxlink work to see how straightforward this is.
 * By the way, there’s no reason to have a specialised column template for this: it’s something the main ones should be able to handle without any extra faff for the user. I would much prefer that we have a small number of powerful, versatile templates over hundreds of specialised ones that require the user to know about them/use them properly.

Theknightwho (talk) 12:24, 4 May 2023 (UTC)
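The idea of sorting on "only the visible text" can be sketched roughly as follows. This is an illustration in Python, not the Lua used by Module:columns; it only understands plain wikilinks and italic/bold quote marks, does not expand templates at all, and `visible_text` is a hypothetical name.

```python
import re

# [[target]] or [[target|display]]: keep only the part a reader would see.
_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# Runs of two or more apostrophes are italic/bold markup in wikitext.
_QUOTES = re.compile(r"'{2,}")

def visible_text(wikitext: str) -> str:
    # Hypothetical helper: approximate the displayed text of a list item.
    text = _LINK.sub(r"\1", wikitext)
    text = _QUOTES.sub("", text)
    return text.strip()

items = ["''[[Trifolium pratense]]''", "[[red clover]]", "[[apple#English|apple]]"]

# Sort on what is displayed rather than on the raw markup.
sorted_items = sorted(items, key=lambda s: visible_text(s).casefold())
print(sorted_items)
```

Sorting the raw strings instead would put the italicised taxonomic name first, because the apostrophe sorts before the opening bracket.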


 * If it could all be done in a standard template without causing excessive delay in rendering the pages that use the modules (or templates), that would be fine. But, it might end up being resource-intensive. If so, it would not be too hard to maintain a specialized column template. Column templates that included an item beginning with or  could be fairly readily identified and converted to a specialized template offline. Online detection of the templates and use of a distinct sort mechanism might also be resource intensive. DCDuring (talk) 14:01, 4 May 2023 (UTC)
 * So far as I can tell, vern and taxlink both already sort correctly inside column templates without any modification being needed. Could you have an experiment and see if you can cause either of them to sort wrongly? I've not been able to, but I've not tested them extensively. I can't see anything obvious that would cause problems, though, as they both rely on formatting that's already accounted for by Module:columns and Module:collation. Theknightwho (talk)
 * Indeed, the column templates I've been dealing with don't seem to have that problem, but they do have the problem of being hard to edit because:
 * the items appear in a haphazard order in the edit window. (solution: offline alphabetization)
 * items like brownbag, brown-bag, brown bag, and brown-bagging don't even display near each other (solution: sorting ignoring spaces and punctuation, esp. hyphens, dashes, slashes)
 * Translingual names need mul to avoid displaying as orange. (solution: don't make Latin-script translingual terms appear orange whatever parm1 in a column template is)
 * Another solution is to make sure that, after all is said and done, there is a non-alphabetizing column template that just balances columns, the maximum number of columns being selected by the user. DCDuring (talk) 20:54, 5 May 2023 (UTC)
 * @DCDuring I'm very hesitant to have a generic unsorted column template, as their use is often a sticking-plaster for some deficiency in automatic sorting. In some ways, it's better for people to complain that there's a problem instead of silently circumventing the issue in a way that results in lots of manually sorted lists, as they can very easily devolve into an unordered mess. If we know about the issue quickly, it means the underlying problem's much more likely to get solved quickly.
 * That being said, the most obvious place I can think of where we don't want automatic sorting is for descendants, but I think they probably merit their own template, as there are other things that would be useful for those.
 * I would raise your third point as its own thread, as there aren't many people who know how gadgets work (and I'm not one of them). Theknightwho (talk) 21:43, 5 May 2023 (UTC)
 * Whining is no fun. The best way to find out about problems would be to look at places where people bypass canned uniformitarian solutions. If there is more effort required to bypass, but people persist in bypassing, that is something economists call revealed preference against the low-effort alternatives. For example, one can always have one-column lists using wikitext unless that little freedom is to be taken away, too. DCDuring (talk) 22:20, 5 May 2023 (UTC)
 * @DCDuring Yes, but it would be much better for people to flag any issues up so that we can fix them, instead of doing ad hoc solutions that are unmaintainable and inconsistent. You may think collation on Wiktionary is "low effort", but I can assure you that it is not. As for being "canned", isn't consistency and predictability the entire point? It's to help people find things, after all. Theknightwho (talk) 22:36, 5 May 2023 (UTC)
 * Better for whom?
 * It is because I don't believe that collation is low effort and because I do believe that this discussion will last for months without any outcome of value to me, even on items 1 and 2 (despite 2 being "trivial to implement"; see above), that I would like a non-collating column template. Is there still one around? DCDuring (talk) 01:51, 6 May 2023 (UTC)
 * @DCDuring Better for those of us who actually want to fix problems at their source. If you don't believe there'll be any useful outcome for you, I don't know why you bothered commenting. Feel free not to waste my time in future: you have the "freedom" to make whatever templates you like. Theknightwho (talk) 04:42, 6 May 2023 (UTC)
 * I had not previously complained much about various aspects that make handling taxonomic names cumbersome, except about the heterogeneous nature of Translingual. I had not complained about the difficulty of editing large derived and related terms sections with auto-collation. This discussion seemed to offer hope for fixing things. In this case I was also concerned about being railroaded into a situation where things would be worse than they are for taxonomic items. I was also more optimistic at the start of the discussion than I am now. DCDuring (talk) 11:12, 6 May 2023 (UTC)
 * @DCDuring I was happy to help until you said "I do believe that this discussion will last for months without any outcome of value to me". Theknightwho (talk) 11:29, 6 May 2023 (UTC)
 * : I am not sure what the status of this issue is; will you implement the ignoring for English? J3133 (talk) 16:06, 9 May 2023 (UTC)
 * @J3133 Yes - it now discounts spaces. Regarding punctuation, we probably need a fallback for entries like and, or otherwise they'll sort unpredictably. I'll check what needs to be done regarding that. Theknightwho (talk) 16:36, 9 May 2023 (UTC)
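The fallback mentioned here can be pictured with a two-part sort key. This is a Python sketch under assumptions, not the actual module code: the primary key ignores spaces and punctuation, and the raw term is used as a tie-breaker so that entries differing only in punctuation always come out in the same order. `sort_key` and `collate` are hypothetical names.

```python
def sort_key(term: str) -> tuple[str, str]:
    # Primary key: letters and digits only, case-insensitive.
    primary = "".join(ch for ch in term.casefold() if ch.isalnum())
    # Fallback: the raw term, so ties on the primary key are resolved deterministically.
    return (primary, term)

def collate(terms: list[str]) -> list[str]:
    return sorted(terms, key=sort_key)

# The same three spellings, supplied in two different orders.
first = collate(["brown-bag", "brownbag", "brown bag"])
second = collate(["brownbag", "brown bag", "brown-bag"])
print(first)
print(second)
```

Without the second element of the key, a stable sort would simply preserve whatever order the editor typed the tied items in, which is the unpredictability the fallback is meant to remove.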

See also Usage
has added unrelated words to the 'See also' section on chan, chi, chia, chin, ching, hao, ju, kai-- see for instance, , , where I revert them. Is there anything that can be done about this? Am I in the wrong for removing these words? Semantic_relations says "If the semantic relation is none of the above (such as, for example, plesionymic, such that it is partially overlapping on a semantic field but with important distinctions), or if you don't know exactly how a word is semantically related to the word defined by the entry you are editing, please add it to this section. However, since almost all words are semantically related to each other on some (sufficiently remote) abstract level, please use your own judgement on whether somebody possibly would find it useful." --Geographyinitiative (talk) 13:23, 18 April 2023 (UTC)


 * Just revert the edits if you dislike them, or try to get the user blocked (the latter will happen sooner or later anyway). BTW, haven't we already had this conversation? It is probably (talk) 13:30, 18 April 2023 (UTC)


 * I just want to make sure I'm not breaking the rules of Wiktionary by doing these eight removals and any future removals. I'm sorry if this is duplicative discussion, but, speaking honestly, is bringing that fact up an academically or morally legitimate control device on me speaking out against this behavior? Thanks. --Geographyinitiative (talk) 13:42, 18 April 2023 (UTC) (Modified)


 * Screw the rules, man! It is probably (talk) 13:46, 18 April 2023 (UTC)


 * I suggest: that (& future similar accounts) use text from See also in the edit summaries of any future edits in which a word with no relation excepting the same Roman alphabet letters as the entry title is added to a 'See also' section to justify such an edit. (I believe that will be impossible.) I hope (beyond hope) that community members will second my suggestion. My goal is to prevent unrelated but similarly spelled items from appearing in See also sections (unless that is what the community members want). Thank you. --Geographyinitiative (talk) 13:49, 18 April 2023 (UTC) (Modified)
 * Is this about See also in the English L2 section or Galician or ...? It would have been nice to see the diff by which they were added if saying what L2 was involved was too hard. Also, 'rollback', which removes something from entry history and leaves no trace that it occurred, is more drastic than 'undo', which leaves the removed content and the fact of the undoing in the entry history.
 * Is that control device or control + device? DCDuring (talk) 15:23, 18 April 2023 (UTC)
 * Here are four of the diffs where I revert:, , , . To keep discussion on topic, please forgive my eccentric use of 'control device'. I don't plan to make any further comments here. --Geographyinitiative (talk) 15:29, 18 April 2023 (UTC)
 * I am only looking for clear communication on this page. People probably weren't rushing to answer because they didn't see what the issue was for the cases you mentioned. DCDuring (talk) 16:31, 18 April 2023 (UTC)
 * The policy says the see also section may be used for semantically related words (paraphrasing here), but I believe I've often seen it used for words that are orthographically similar and I think that should be permitted because there is no other place to put that information when it is not appropriate for the at the top of the page, which should be reserved for strictly orthographically related terms (including terms in other scripts). — Eru·tuon 23:19, 18 April 2023 (UTC)

Category:Spanish Spanish and Category:Portuguese Portuguese
These category names sound kind of stupid to me :D Synotia (talk) 14:36, 19 April 2023 (UTC)
 * Also CAT:English English. Meh. Could rename them to "Spain Spanish" (a la Category:Switzerland German) or "Spanish of Spain", I suppose. - -sche (discuss) 17:56, 19 April 2023 (UTC)
 * Why not just European or Peninsular Spanish?
 * Same for Portuguese; the most common term for how they speak in Portugal is European Portuguese. Synotia (talk) 19:09, 19 April 2023 (UTC)
 * Yeah, if we're going to change anything, then "Spain Spanish" & "Portugal Portuguese" would be fine. (Though I'm also fine with what already exists.) Category:European Spanish and corresponding one for Portuguese already exist and while they don't include much, I assume that they're there for categorical reasons and that messing with them would mess up categorization (also it's just not as accurate), so I'd personally oppose any name changes to those. AG202 (talk) 20:55, 19 April 2023 (UTC)
 * As far as the technical/categorization side of things, I think we could just set the labels "Spain" and "Portugal" to have language-specific categorization (into the European category) when the language was Spanish or Portuguese, respectively. I notice we already put Spain-Spanish and Portugal-Portuguese verb forms under "European", so I don't really see any problem with putting lemmas into the European category too (indeed, it would be consistent)... but I also don't mind "Spain Spanish" or just leaving "Spanish Spanish" either. - -sche (discuss) 23:05, 19 April 2023 (UTC)
 * I’m honestly more worried that if there’s a reason in the future to add, say, Gibraltarian Spanish (llanito) terms, they’d fall under the European Spanish category, so I find it better to still have it even if it’s just populated by Spain/Portugal right now. AG202 (talk) 01:19, 20 April 2023 (UTC)
 * That's a good point. Plus, we occasionally see terms that exist only in diaspora communities, so it's good to keep the field of scope accurate even if the name sometimes looks a bit silly. "English English" > "England English" in my opinion, but I'm not hugely fussed. Theknightwho (talk) 01:28, 20 April 2023 (UTC)
 * "Spain Spanish" would be barbarous. --RichardW57 (talk) 07:51, 20 April 2023 (UTC)

names used as nouns
Do we have a subcategory or list for when a name is used as a common noun? I'm thinking especially of when given names are just lowercased, like, , , /, , , , , , though I suppose uppercase examples and surnames are also interesting. The top-level Category:Eponyms by language categories like Category:English eponyms include all kinds of derived terms (Sherlock Holmesiana, Adam and Eve on a raft, Adamic, etc), so what I'm imagining is a subcategory of that. Would such a thing be of interest / worthwhile? - -sche (discuss) 18:07, 19 April 2023 (UTC)


 * I don’t know; seems like trivia to me. — Sgconlaw (talk) 19:51, 19 April 2023 (UTC)
 * Are you thinking only of exact correspondence (except for initial capitals) of common and proper noun in English? It might be a smallish category. There are a good few well-known standard physical units that would fit the bill, not to mention the smoot. DCDuring (talk) 01:31, 22 April 2023 (UTC)

"Translingual"
Isn't it just Latin? Synotia (talk) 21:25, 19 April 2023 (UTC)


 * @Synotia We put a lot of New Latin taxonomy in there because it's used in tons of languages. It's a bit of a compromise. Theknightwho (talk) 21:27, 19 April 2023 (UTC)
 * Translingual includes lots of things, such as Category:Translingual emoticons, internet top-level domains, airport codes. Taxonomic names are included partly because they are intended to be used in multiple languages. Another reason is because including them in Latin would bug some editors, including me: it's kind of misleading to describe names that often have minimal or zero use in Latin text or speech as being part of the Latin language, and it would have a distortionary effect on Latin language categories if all of the taxonomic names were included.--Urszag (talk) 01:09, 20 April 2023 (UTC)
 * All of the above, plus pronunciation. We can all agree that Trifolium pratense means red clover but few if any of us are going to use the classical or even Ecclesiastical Latin pronunciation for it when speaking our own languages. — Soap — 11:31, 20 April 2023 (UTC)


 * 'Translingual' is meant in the same way 'Chinese' is meant, except that 'Translingual' is biased toward seeing the Roman alphabet as universal. Both are an overarching label over numerous languages that use similar written forms. (Hey Mom, come look, I stuck it to the man again online!) --Geographyinitiative (talk) 11:37, 20 April 2023 (UTC)
 * I believe that the most numerous types of item under the Translingual L2 header are characters and symbols. Translingual is a hodgepodge of items unwanted by real languages. DCDuring (talk) 13:24, 20 April 2023 (UTC)


 * "Translingual" is for things that are independent of individual languages. If you look at Chinese scientific writing, for instance, you'll see a sea of Chinese characters with translingual terms in Latin script here and there. Same with Greek or Russian or Arabic.
 * As for taxonomy, taxonomic nomenclature is based on Latin because modern science started at a time when scholars all over the world corresponded with each other in Latin: since Latin wasn't anyone's native or national language, it was considered neutral territory. The fact that a European language was chosen for this is an artifact of the history of modern science. It may not be fair to the rest of the world, but it's pretty much arbitrary.
 * Taxonomic nomenclature started as actual Latin, but the use of Latin running text in science faded out, leaving a very restricted subset. Real Latin has 5 main cases: nominative, accusative, genitive, dative and ablative, with traces of two more, locative and vocative. Taxonomic Latin just has nominative and genitive. Taxonomic Latin has no verbs, adverbs, prepositions, interjections, etc. You can't write a sentence in taxonomic Latin. Everything is nouns in the nominative case and modifiers, with the modifiers being either adjectives in the nominative case agreeing in gender and number with the main noun or nouns either as nominatives in apposition, or in the genitive case. The names of higher taxa are nominalized adjectives derived by adding endings that indicate the taxonomic rank to the genitive stem of a noun. There's also Ancient Greek in taxonomy, but it follows the same restrictions of part of speech, gender, number and case- and then it gets converted into taxonomic Latin before it can be used. You can also use any other word from any language (or what's called a "random sequence of letters"), but it also has to be made into taxonomic Latin, including being assigned to Latin grammatical categories as needed.
 * In other words, taxonomic nomenclature is a very artificial construct built out of mostly Latin or Ancient Greek parts and put together in such a way as to look just like Latin. Chuck Entz (talk) 03:05, 22 April 2023 (UTC)

CAT:en:Names (Sergei) vs CAT:English names (Stephen); also CAT:en:Exonyms (Sofia)
As discussed at WT:RFM, our "exonyms" categories are categorized into "CAT:Places", and currently only contain place exonyms. However, other things can be exonyms, e.g. German for Deutsch, Germans for the people, or Averroes for the person Ibn Rushd. In that old RFM discussion, it is proposed to rename categories like CAT:en:Exonyms to CAT:English exonyms, and move them from CAT:en:Places (a subcat of CAT:en:Names) into CAT:English names. It is relatedly proposed to merge CAT:en:Names (which contains e.g., in the subcategory CAT:en:Russian male given names) with CAT:English names (which contains , in CAT:English male given names). We'd need to come up with a naming scheme for the renamed "en:Russian male given names" cats ("English Russian male..." is bad, maybe "English renderings of Russian male given names"?), but I agree with the RFM opener it's not sensible to have two categories like this, so I think we should merge "en:Names" and "English names" (likewise for other languages). As the RFMs are old, I'm asking here: agree, disagree, other feedback? - -sche (discuss) 03:07, 21 April 2023 (UTC)


 * Admittedly, I am unsure if I understand the problem here, but given the silence that this topic has received so far perhaps this response is better than nothing:
 * I prefer ‘en:’ over ‘English’ as it’s briefer and might reduce ambiguity. ‘Places’ should be a subcategory of ‘Names’, and I am fine with us applying ‘Exonyms’ more broadly. Rather than something like ‘English renderings of Russian male given names’, we could go for ‘en:Transliterations of Russian male given names’ and make that a subcategory of CAT:English transliterations of Russian terms (which we should probably rename to CAT:en:Transliterations of Russian terms ).
 * I hope that this helps. —(((Romanophile))) ♞ (contributions) 21:52, 25 April 2023 (UTC)

Ryukyuan templates
Ryukyuan languages now have access to Japanese-styled templates for the entry layout. Ryukyuan language versions of, , and Japanese headword-line templates can be made in just a few lines, described below.

I am not sure whether Ryukyuan editors actually need them (or even whether there is still any Ryukyuan editor active on Wiktionary). But I saw many Ryukyuan languages use because they lack a template of their own. produces incorrect links and categories for these languages so I have replaced them with of the correct language I had just created. If Ryukyuan editors don't like this, please revert these replacements and ignore the rest.

How to Play:

1., , :

For example, to make a Miyako version of, i.e. , just create the template with this line:

For the documentation page, if you don't feel like writing a new one, you can reuse the Japanese documentation page with:

2. Japanese headword-line templates

Headword templates are slightly different because 1 is for POS. So they use lang instead for the language code. To make a Miyako version of, i.e. , create the template with this line:

3. Transliteration

The transliteration data of Ryukyuan languages is located in "Module:ja-translit/data/(language code)" (Module:ja-translit/data/mvi for Miyako). It is not as straightforward as creating templates. But basically you can understand it as a table that overwrites the values in Module:ja-translit/data (data for Standard Japanese) to make the specific rules of a Ryukyuan language's transliteration. -- Huhu9001 (talk) 17:46, 21 April 2023 (UTC)
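The "table that overwrites the values" idea can be sketched as a simple merge. The real data lives in Lua modules; the Python below only illustrates the override pattern, and every key and value in it is a placeholder rather than an actual transliteration rule for Japanese or Miyako.

```python
# Placeholder base table standing in for Module:ja-translit/data.
BASE_RULES = {"kana1": "x1", "kana2": "x2", "kana3": "x3"}

# Placeholder per-language overrides standing in for Module:ja-translit/data/mvi.
OVERRIDES = {
    "mvi": {"kana2": "y2"},
}

def rules_for(lang_code: str) -> dict[str, str]:
    # Start from a copy of the base rules, then let the language-specific table win.
    merged = dict(BASE_RULES)
    merged.update(OVERRIDES.get(lang_code, {}))
    return merged

print(rules_for("mvi"))
print(rules_for("ja"))
```

A language table therefore only needs to list the entries where it differs from Standard Japanese; everything else is inherited from the base data.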


 * This has long been needed. Thank you, @Huhu9001! ‑‑ Eiríkr Útlendi │Tala við mig 22:13, 21 April 2023 (UTC)

Renaming Lari to Larestani
Pinging : I think renaming Lari (lrl) to Larestani would be a good change. For one, "Lari" is ambiguous since it can also refer strictly to the dialect spoken in Lar, Iran; Larestan clearly signifies the whole region. The title of also makes this distinction, since their wordlist is meant to be for the Lari dialect while they call the whole series "Larestani Studies".

The only other issue is that "Achomi" is also a common term for the language, e.g. the wiki article (which does mention Larestani as an alternative). Either way though, "Lari" is definitely not the best name, but we should decide between Achomi and Larestani. —AryamanA (मुझसे बात करें • योगदान) 19:05, 21 April 2023 (UTC)


 * I agree with everything here. With regard to Achomi vs. Larestani, the former (Achomi) is the common contemporary name in Iran today, but is not the name used by Iranists in the literature. The latter (Larestani) is the most common name in the linguistic literature. Kadxuda (talk) 19:10, 21 April 2023 (UTC)


 * I suppose there might be potential for confusion with Achumawi and its variant Achomawi, but I don't know how well known it is beyond those who study the California Indians and their languages. Chuck Entz (talk) 20:05, 21 April 2023 (UTC)
 * I looked at the bibliography of this paper. Most of the sources indeed use "Larestani", although the paper itself uses "Lari". "Achomi" seems to be a vernacular name, like "Farsi", "Sorani", "Kurmanji". We avoid those in favor of more professional "Persian", "Central Kurdish", "Northern Kurdish". In short, I agree with renaming to "Larestani". Vahag (talk) 20:52, 21 April 2023 (UTC)
 * Support: Larestani (Lārestāni) also prevents confusion with Luri. -- Sokkjō 23:17, 21 April 2023 (UTC)
 * ✅ —AryamanA (मुझसे बात करें • योगदान) 03:07, 25 April 2023 (UTC)

Wugniu implementations and bots
As per previous discussions, we plan for Wiktionary to eventually transition to Wugniu as the default Shanghainese romanisation. @Justinrleung has requested a discussion with people with bots, in particular, @Fish bowl, as to how the switch should happen. In previous beer parlours, a scheme like this was proposed: What do you think? Please leave any suggestions.
 * Display Wugniu in the zh-pron dropdown menu (this is likely to be skipped)
 * A temporary |wugniu= parameter may be implemented for switching the code in modules, bots, etc
 * When all code switches to the Wugniu module, the |w= header switches to Wugniu OR becomes legacy and a different header will be chosen for Shanghainese

Relevant: Module:wuu-pron/sandbox

(@Wpi31, @Musetta6729) — 義順 (talk) 19:33, 21 April 2023 (UTC)


 * The way I designed the module allowed for full backwards compatibility: it could still accept the older Wiktionary romanisation and output Wugniu. Things will not break and there is no need in using a bot for conversion when the module switches over, though obviously in the long term we would want to convert all of the input to Wugniu. I don't have much further comments. – Wpi31 (talk) 19:44, 21 April 2023 (UTC)
 * Thanks. I don't understand why step 1 is likely to be skipped; it should always be shown if it's the default romanization. For legacy purposes, it would be interesting to know what people want to do with the in-house system; should it still be displayed in the dropdown? For bot work, I was wondering if there's a quick way to move to Wugniu all at once, such as something like a conversion from the existing  into  . Another thing I forgot to mention is what to do with other places where Wu romanization is used, such as translation tables, etymologies, and . Also pinging  because they have also voiced interest in helping with bot work. — justin(r)leung { (t...) 19:47, 21 April 2023 (UTC)
 * Thanks for the comment and pinging Manishearth. What I meant by the first point is that the default stays as WT and Wugniu is only shown in the dropdown, not also the default. As previously discussed, WT's system will be kept in the dropdown even after Wugniu being set as default, and unless there are any objections, that should be happening.
 * Personally, I would believe that the current Wugniu transcription should be used for all three. This orthography is a lightly altered version of what the creators of Wugniu and the populace use, and so I do not see why it should not be put in place — 義順 (talk) 19:56, 21 April 2023 (UTC)
 * I do agree that it would be good to keep the current Wiktionary romanisation under the dropdown menu for legacy really; since a considerable amount of Wu romanisation schemes did arise online from waves of small-scale individual efforts that were subsequently lost without being archived, I would consider it good to keep the Wiktionary romanisation accessible on Wiktionary for posterity as part of that.
 * Also agree with @ND381 regarding using Wugniu for translation tables, etymologies, etc - I reckon that Wugniu is the closest thing that exists to a somewhat authoritative, viable Pan-Wu romanisation right now so I do think it would be the best choice out of all the romanisation schemes possible. To my knowledge though Wugniu does not necessarily cover all Wu varieties equally well: many Southern Wu lects for example do not necessarily have well-designed and well-recognised schemes, so it might be good to couple the use of Wugniu with some IPA too where exact phonetic transcription becomes necessary in etymologies and such. Musetta6729 (talk) 13:22, 22 April 2023 (UTC)
 * As for the note on showing both Wugniu and IPA, perhaps something similar to what Hokchiu uses (see 伓) could work? — 義順 (talk) 13:27, 22 April 2023 (UTC)

China-related Geography: Matching Wikipedia Coverage
If you look at Category:en:Places in China, I would say that every administrative division's English language name (or a close variant) meets Attestation. Exception: anything below the county level (towns, villages) can be shaky (but some shaky ones still have three website cites (I mean: normal cites of English language sentences from some online source (no OCLC number) that happen to not meet Permanently Recorded Media.)). So I believe (with that justification and with my own experience) that every county-level division linked from is going to meet Wiktionary Attestation thresholds (especially if pure website cites are allowed) one way or another. And further, a lot of the counties (but not districts) are in the Columbia University Gazetteers. So I think the time has come for me (and others interested) to just make entries for all of the words in those county-level division lists on Wikipedia. There will be problems and hiccups here or there, and an entry may need to be shifted from one spelling to another, or perhaps some districts will be too obscure. Perhaps there will be a region where there's just zero coverage, and so a slew of entries might not meet Wiktionary:Attestation. But I think on the whole that a duplication of English Wikipedia's coverage of county-level divisions would be within English Wiktionary standards, and at this point is possible and reasonable and would be trustworthy. That would look like (per ) a total of 3000 or so entries. Without this Wikipedia duplication, Wiktionary's coverage is not professional enough- too slanted into the seven or eight provinces I have focused on. This fuller skeleton will be a great jumping off place for people who can go deeper into other geographical features. For county-level divisions with one syllable names (x qu or x xian or similar), I would just make "something"- either the x xian entry or just the x entry. Both are legitimate; if one is more common or official, one could shift the page to that one later on if needed. Just following Wikipedia's lead or whatever you see. The goal is to relieve what I feel to be the stilted nature of the coverage while of course following the attestation rule of Wiktionary. --Geographyinitiative (talk) 18:32, 22 April 2023 (UTC) (Modified)


 * What do you mean by websites? ETA: Also, gazetteers are just geographical dictionaries. They mention terms; they don't use them. CitationsFreak: Accessed 2023/01/01 (talk) 19:18, 22 April 2023 (UTC)


 * "website cite" means a normal cite of an English language sentence that is from an online source that doesn't have an OCLC number connected to it: something online that's not really "Permanently Recorded Media" under the Wiktionary meaning. I bring up the gazetteers because they are an indication of reliability that the word is the academically accepted word and that the word exists. I changed the above comment to make that clearer. --Geographyinitiative (talk) 20:02, 22 April 2023 (UTC) (Modified)
 * I feel like counting gazetteers as dictionaries for this lets us not have the Chinese towns that no one's talking about in English. Sure, they may exist, but if the only results are atlases and the like, then is it really worth having? (The same goes with every other city.) CitationsFreak: Accessed 2023/01/01 (talk) 21:02, 22 April 2023 (UTC)
 * My question is: will you have fun? Catonif (talk) 21:41, 22 April 2023 (UTC)
 * I would argue that there's probably enough cites out there to support the modern spelling (the one from pinyin) of the counties, but I'd prefer seeing at least a cite or two on the entry. – Wpi31 (talk) 05:46, 23 April 2023 (UTC)


 * Responses to above: The gazetteers count for nothing. It is still a "zero" in terms of Attestation. But I'm telling you that when I (subjectively) come back to one of the obscure entries I've worked on, and there's a Columbia Gazetteer in Further reading, or one of the dictionaries, it feels like I'm on a reliable website. And really, some of those gazetteer descriptions from 1998 or 2008 are actually STILL better than Wikipedia's descriptions in 2023. It's just valuable material! It doesn't go toward Attestation directly, but it can be more illuminating than all the info in the three cites combined. --- Having this skeleton in place will allow for towns to be covered. Guiyu is a great example of something that no Gazetteer or dictionary has, but is definitely meeting Attestation. With this skeleton in place, you're ready for people to make town entries for the higher profile towns. Further, it will allow for towns to be covered in their Chinese character entries without having redlinks. --- It will be fun to work through it and fun to see how things will grow from this as time goes on. I agree with the cite or two on the entry concept. But I think it's clear that if I can reach three cites with Yuanhui or the districts and counties of Ningxia, then I'm guessing that some high percentage (maybe even "all") will get to three cites, either by English Wikipedia's spelling or something close/similar. I have never seen an English-language name of a county-level division successfully challenged on Attestation grounds. --Geographyinitiative (talk) 11:44, 23 April 2023 (UTC)

To carry this out, I would just find the list of county-level locations in a single province-level division (for instance: ) and just add them all over the course of several days or maybe a week or so. Then I will wait a week or some similar length of time (where I go back to my regular routine) before doing the next province so that the community can review what has just happened. --Geographyinitiative (talk) 20:57, 22 April 2023 (UTC)


 * To be brutally honest, you're the only Wiktionarian who cares about Chinese villages, so don't expect a proper "review". We'll pretty much let you get on with your niche interest. Wonderfool April 2023 (talk) 11:58, 23 April 2023 (UTC)


 * Well I have decided against doing this. I think it is a good idea on some level, but I'm looking at stuff like Longzihu, and it's just weak in terms of the cites I can immediately see online! Too weak! So I'll just expand this area in the more conservative and slow manner that I normally do rather than make a Great Leap Forward here. Maybe I will do this later if it makes sense. --Geographyinitiative (talk) 12:31, 23 April 2023 (UTC)

De-italicizing "Usenet" in Template:quote-newsgroup
Currently, “Usenet” is in italics in Template:quote-newsgroup; e.g.:


 * 1986 August 27, Marc A. Ries, “Re: UNIX v.3 Query”, in , , retrieved 2 July 2016, message-ID <1267@trwrb.UUCP>:

I asked about the italics at ; this was Sgconlaw’s reply:
 * “Because the template uses the standard “Module:quote” backend as all the other quotation templates. This is something I have no technological capacity to change as I don’t know how to use Lua. Probably best to raise it as a general issue for discussion at the Beer Parlour before asking for any change.”

J3133 (talk) 11:41, 23 April 2023 (UTC)


 * I suggested that raise this at the Beer Parlour as I don't know if there is any established rule that the names of websites (which are not, say, magazines or newspapers) should be left unitalicized and not treated as titles of works which would usually be italicized. If there is consensus that names of websites shouldn't be italicized, I imagine it would be fairly straightforward to update "Module:quote" to provide a parameter for turning off the italics of titles where needed. — Sgconlaw (talk) 12:18, 23 April 2023 (UTC)
 * Note that Usenet is not a website. Italicizing “Twitter”, for example, would be unusual. J3133 (talk) 12:26, 23 April 2023 (UTC)
 * We should probably discuss the italicization of names of websites at the same time, then. — Sgconlaw (talk) 12:52, 23 April 2023 (UTC)
 * Yes, only the newsgroup should be italicized, that is the information—for those who recognize the format. The name of the Usenet and that of a newsgroup are not on the same ontic level so cannot be both italicized. Fay Freak (talk) 12:56, 23 April 2023 (UTC)
 * I made a change as an experiment. What do you think? This, that and the other (talk) 00:56, 25 April 2023 (UTC)
 * Seems OK to me. Or, actually, we could just use the format parameter. Thanks. — Sgconlaw (talk) 05:11, 25 April 2023 (UTC)
 * Perhaps it should link to Usenet newsgroup? J3133 (talk) 07:46, 25 April 2023 (UTC)
 * I'm OK either way. — Sgconlaw (talk) 13:07, 25 April 2023 (UTC)

Appendix:Unicode/Private Use Area/GB-T 20542 and Appendix:Unicode/Supplementary Private Use Area-A/GB-T 22238
This came to my attention because the second of these is in CAT:E with a Lua time-out error. At first I thought I would fix that by removing all the wikilinks, since they were all redlinks and we shouldn't have mainspace entries for PUA codepoints. Then I noticed that the first page has bluelinks and there are actual entries that seem to be using those codepoints (or is something converting the links to non-PUA codepoints? I have no way to check).

What's going on here? Should these appendices even exist? And if those really are PUA-codepoint mainspace entries, should they even exist? . Chuck Entz (talk) 15:14, 23 April 2023 (UTC)
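
On the "no way to check" point above, one way to test whether a given page title or link target actually contains private-use code points is to look at each character's Unicode general category. The following is only a minimal sketch in Python, not anything that runs on the wiki; the helper name `pua_codepoints` is made up for illustration, and it assumes that category `Co` (private use) is the right test, which covers U+E000..U+F8FF and the two supplementary private use planes.

```python
import unicodedata


def pua_codepoints(title):
    """Return the private-use code points found in `title`, as U+XXXX strings.

    Hypothetical helper for illustration: a character counts as private-use
    when its Unicode general category is "Co".
    """
    return [
        f"U+{ord(ch):04X}"
        for ch in title
        if unicodedata.category(ch) == "Co"
    ]


# A title made only of ordinary Tibetan letters yields an empty list,
# while a title containing a BMP private-use character is reported.
ordinary = pua_codepoints("བོད")
suspicious = pua_codepoints("a\ue000b")
```

An empty result for a bluelinked title would suggest the link points at ordinary encoded characters rather than at a PUA-codepoint entry.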


 * @Chuck Entz These appendices shouldn’t exist, as private use characters are only relevant to character encoding, and don’t represent anything linguistic at all.
 * The weird results could be a quirk of private use characters being temporarily used for various internal functions by various modules, as they’re a convenient way to swap something you want to protect out for a placeholder while you process a string, like formatting. The reason this is possible is because there are 0 circumstances under which we should ever be inputting or outputting private use characters, as they have no real identity, so different fonts use them for loads of different things. For our purposes that’s actually a good thing, as it means we can rely on them for what we need them for, but you’ll get weird things like this if you actually try to use them like normal characters.
 * That all being said, in this case it seems like the links are to normal characters, which these PUA characters represent under some scheme - by the looks of things, the rejected proposal to encode Tibetan as several thousand atomic characters, instead of the combining ones we use today. Still not linguistically relevant, though. I only checked a handful of links, though, so there could be attempted links to actual private use characters too. Theknightwho (talk) 15:31, 23 April 2023 (UTC)
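
For readers unfamiliar with the placeholder technique described above, here is a minimal sketch in Python rather than Lua. It is not the code in any Wiktionary module; the function names `protect`, `restore` and `format_term`, the link pattern, and the uppercasing step are all assumptions chosen only to illustrate the idea. It also assumes the input never contains private-use characters of its own, which is the very property the technique relies on.

```python
import re

PUA_START = 0xE000  # first code point of the BMP Private Use Area


def protect(text, pattern):
    """Swap every match of `pattern` for a single private-use placeholder."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        # Each protected span gets its own placeholder character.
        return chr(PUA_START + len(saved) - 1)

    return re.sub(pattern, stash, text), saved


def restore(text, saved):
    """Put the protected spans back in place of their placeholders."""
    for index, original in enumerate(saved):
        text = text.replace(chr(PUA_START + index), original)
    return text


def format_term(text):
    """Uppercase the text while leaving [[wikilinks]] untouched."""
    protected, saved = protect(text, r"\[\[.*?\]\]")
    processed = protected.upper()  # stand-in for any string processing
    return restore(processed, saved)
```

If the input already contained U+E000 or a neighbouring private-use character, `restore` would replace it with a saved span, which illustrates why such characters give odd results when treated as ordinary text.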


 * ...Nope.
 * Just look a bit more carefully and you will notice that the page uses no PUA characters for wikilinks at all. PUA characters only appear in the "PUA" column, and they are in plain text: no link, entry, template, or module involved. PUA characters have nothing to do with this.
 * The real culprit for the time out error is the laggy string functions in Module:string utilities. To prove this, add the following lines to the end of that module (before return):


 * Then you will see the Lua error is gone. I did warn the community about the potential risks of these functions with good reasons.
 * If you don't want to touch that module, converting all to, or splitting up that appendix page can also help to get rid of the error. -- Huhu9001 (talk) 05:47, 24 April 2023 (UTC)
 * @Huhu9001 You obviously didn't read my reply properly. If they were the "real culprit", then it doesn't make sense that the time-outs would only be happening now, either. We can certainly make that change and cause memory errors on some high-use pages, though, if you feel that strongly about it. Theknightwho (talk) 06:01, 24 April 2023 (UTC)


 * The module errors were only what made me aware of the appendix, not the reason I suggested we might not want it. I never thought anything having to do with PUA characters was causing the module errors, just that the module errors made the issue of whether we should have such an appendix more than academic. At any rate, when I saw the bluelinks, I realized I didn't understand what was going on, so I asked here. Note that I didn't post this to RFDO or the Grease pit because this wasn't a matter for just one or the other, but both, along with possibly other issues.
 * I'm not trying to draw attention away from system architecture issues- there's obviously a reason this only showed up in CAT:E recently even though it was created over a year ago. Same with the Korean-syllable entries that are also there (yes, having dozens and dozens of templates showing every character that has a given syllable as a reading drawing from dozens and dozens of data modules is a massive drain on resources, but it only went over the edge into a memory error in the past few weeks).
 * And I'm not making excuses for Tkw- I've had to deal with thousands and thousands of module errors after we had had CAT:E pretty much clear- but they've been tireless in cleaning up the problems they've created, and we can't just revert everything they've done and magically make it all better. For better or worse, we have to work with things as they are now, and with the people who made them that way. "I told you so" doesn't fix anything. Chuck Entz (talk) 06:56, 24 April 2023 (UTC)
 * For the record: the change Huhu9001 suggests reduces the page time to "only" 9.7s (out of a 10s limit), and this page is also an outlier in that it consists exclusively of text for which the Module:string utilities functions are slower than the ustring ones. This is atypical, and a similar comparison on the majority of slow, high use pages (which have a lot of Latin text) should be enough to explain why those custom functions are useful.
 * I suspect Appendix:Unicode/Supplementary Private Use Area-A/GB-T 22238 has been hovering extremely close to the 10s limit for quite a while now, and something's obviously just tipped it over the edge. Stripping out any number of relatively minor features causes it to (just about) finish within time.
 * Huhu9001 is also flat wrong about a few things, too: lang calls Module:links if it detects links, just like many other templates. That means the full back-end infrastructure is being used. Theknightwho (talk) 07:16, 24 April 2023 (UTC)
 * I don't know what you are complaining about. Haven't I just told you the error can be removed by converting the page to t:lang-lite? If that is not "work with things as they are now", what is? -- Huhu9001 (talk) 07:41, 24 April 2023 (UTC)
 * Yes, but that might introduce another error, i.e. Category:Pages where template include size is exceeded. You can't only focus on eliminating the other problem. – Wpi31 (talk) 08:05, 24 April 2023 (UTC)
 * What...? No it doesn't. You can easily know this by doing a preview yourself, instead of guessing "that might". -- Huhu9001 (talk) 08:31, 24 April 2023 (UTC)
 * I wasn't complaining about that part. That was the one piece of good advice out of the whole thing. While you were posting your reply, I already fixed the module error- but instead of lang-lite, I used l-lite. There was absolutely no reason to use lang instead of l in the first place. lang with the square brackets was doing exactly what l would do without them, except it had the extra overhead of converting the square-bracketed form into a link. You gave me a good suggestion, but it wouldn't have worked without addressing the point that Tkw brought up. That's the way to get things done: coming up with a solution by bringing together different people and listening to them. Chuck Entz (talk) 08:12, 24 April 2023 (UTC)
 * I am just reluctant to introduce additional changes beyond the necessary technical ones to avoid more arguments. That's why I suggested t:lang-lite (of course it works, not "it wouldn't have worked without..."). If you are not so, alright, I have no problem with it. And to add a little more, you forgot Tibt. -- Huhu9001 (talk) 08:51, 24 April 2023 (UTC)
 * Good point. Tibt added. Chuck Entz (talk) 09:06, 24 April 2023 (UTC)

Thai Coalmines
What is the status of Thai compounds such as, which might be argued to be an SoP of and ? It has been reduced to a bare redirect to the latter without any word of explanation, which is why there is no 'transliteration' displayed above. If the semantics don't work, does the answer depend on analysis of Thai line-breaking (and possibly spell-checking)? --RichardW57m (talk) 13:16, 26 April 2023 (UTC)


 * @RichardW57m: Good question. It's something around inclusion in monolingual dictionaries, perception by native speakers and a couple of other criteria, which now escape me. @Octahedron80 has converted the entry to a redirect. has too many meanings. With the qualifier  it becomes "Monday". I personally think that  should be included but without any rules, it's just voting, if I'm not mistaken.
 * It would be good to have CFI specific for scriptio continua languages (including Vietnamese, if it has spaces between syllables). Languages like German or Finnish (which form long compound words) would need their specific CFI but that's a different story. Anatoli T. (обсудить/вклад) 13:40, 26 April 2023 (UTC)


 * Does "จันทร์" alone regularly mean Monday? If not, I don't see how "Monday" is in any way guessable from "[term meaning moon]" plus "[term meaning day]". (Sure, "moon" + "day" is the ety of the English word, too, but the point is: how do I know which day of the week that is? Is it lunar day the day after Thursday? You can't guess from the parts.) - -sche (discuss) 00:41, 27 April 2023 (UTC)
 * @-sche: Yeah, the choice for removing/converted to redirects seems arbitrary. วันจันทร์ is included in https://www.sanook.com/dictionary/dict/all/search/วันจันทร์/ Anatoli T. (обсудить/вклад) 00:51, 27 April 2023 (UTC)
 * Digression in same post above and multitudinous replies to it have been moved to Grease pit/2023/April in response to an implied complaint from . --RichardW57 (talk) 11:17, 29 April 2023 (UTC)
 * Yes; the Royal Institute Dictionary (RID) dated 2542 BE defines the simplex as 'name of the second day of the week'. The origin of the order of the planets is obscure.  In this case, one could see  as an optional classificatory prefix, just as  is a classificatory prefix for celestial objects other than the two luminaries and  is for months.  Irritatingly, I can't find  itself in the RID, though Se-ed's Modern English-Thai dictionary gives it as the translation of 'Monday'.  Obviously, I can't see from the latter whether it's one word or two. --RichardW57 (talk) 11:56, 29 April 2023 (UTC)

Seeking volunteers for the next step in the Universal Code of Conduct process


Hello,

As follow-up to the message about the Universal Code of Conduct Enforcement Guidelines by Wikimedia Foundation Board of Trustees Vice Chair, Shani Evenstein Sigalov, I am reaching out about the next steps. I want to bring your attention to the next stage of the Universal Code of Conduct process, which is forming a building committee for the Universal Code of Conduct Coordinating Committee (U4C). I invite community members with experience and deep interest in community health and governance to nominate themselves to be part of the U4C building committee, which needs people who are:


 * Community members in good standing
 * Knowledgeable about movement community processes, such as, but not limited to, policy drafting, participatory decision making, and application of existing rules and policies on Wikimedia projects
 * Aware and appreciative of the diversity of the movement, such as, but not limited to, languages spoken, identity, geography, and project type
 * Committed to participate for the entire U4C Building Committee period from mid-May - December 2023
 * Comfortable with engaging in difficult, but productive conversations
 * Confidently able to communicate in English

The Building Committee shall consist of volunteer community members, affiliate board or staff, and Wikimedia Foundation staff.

The Universal Code of Conduct has been a process strengthened by the skills and knowledge of the community and I look forward to what the U4C Building Committee creates. If you are interested in joining the Building Committee, please either sign up on the Meta-Wiki page, or contact ucocproject@wikimedia.org by May 12, 2023. Read more on Meta-Wiki.

Best regards, 

Xeno (WMF) 19:01, 26 April 2023 (UTC)

Attested Arabic transliterations and pronunciations getting removed or replaced
Hello all. Can the rules for transliterations of foreign names and loanwords be defined for Arabic, so that editors don't remove attested transliterations if they don't like vowel symbols like "ō"/"o", "ē"/"e" (or consonants), which are not part of standard Arabic? Or can we have a productive discussion or a vote? Specifically, has an attested transliteration "kōriyā" (shortened to "kōriya" in real life, but the last "ā" is phonemic, so it can be left with a macron).

@Fenakhay and @عربي-٣١ always like to correct transliterations, even if they are provided in the reference just a couple of clicks away. Hans Wehr (HW) dictionary must be very disliked by some native speakers, LOL. HW doesn't answer all questions but it's pretty good and mostly reliable.

We all know that phonemes "ō"/"o", "ē"/"e" exist in (standard) Arabic. I don't even have to search for an audio evidence, there are plenty. We had a similar back and forth discussion about transliterated as "beljīkā". I had to put a comment there "this is sourced in Hans Wehr dictionary, do not remove without a discussion, phoneme /e/ is available in Arabic".

BTW, you can always ADD, as native speakers, the more common pronunciation/transliteration but please don't REMOVE transliterations. Let's not waste each other's time :)

Disclaimer: I am not claiming to know Arabic well but prove me wrong on the subject. There is more than one way to pronounce many loanwords in Arabic and it so often depends on who you ask, what region they are from and what school of thought they follow. I do have associates among educated Arabic speakers and they all seem to have various views. We have Al Jazeera, YouTube channels, Forvo to get an idea on pronunciations of various words. Anatoli T. (обсудить/вклад) 00:23, 27 April 2023 (UTC)


 * My understanding is that the vowels of Standard Arabic are unanimously rendered into either open or close vowels, without high-mid/close-mid ones. The exceptions are very scarce, like the case of imaalah, which is attested in the Quranic prosody only once in the most common narration but prevalent in other less common narrations. It's true that some of the modern spoken dialects (like my native Egyptian) and some borrowings feature those close-mid vowels, but I have to agree with @Fenakhay and @عربي-٣١ about كوريا in particular. That pronunciation sounds very unnatural, particularly so for Standard Arabic, in which the entry is. I've never been exposed to it, despite a significant familiarity with Arabic dialectology working as an Arabic interpreter for Arabs in the Anglophone. I hopped on HW and was able to confirm it, but I still think it'll need to be corroborated with an explanation of the why and, ideally, the where of such a deviation from the usual phonology of similar borrowings takes place. Assem Khidhr (talk) 05:25, 27 April 2023 (UTC)
 * We can start from what we know and what we (possibly) agree on.
 * Phonemes "ō"/"o", "ē"/"e" do exist in MSA (Modern Standard Arabic). They are infrequent, maybe "dispreferred" but they ARE used - by speakers themselves. Let's take - any of "muskū", "mūskū", "mōskū", "mōskō" are the current transliterations. (You may even find some discussions about. I won't search for it. Some discussions may have gone away after I deleted my talk page several times.) It depends on who you ask. So, is "mōskō" unnatural? Possibly yes, is it used by native speakers? Yes, you can even try Al Jazeera or other Arabic media.
 * There are roughly three views on the matter, two extreme ones with a few in the middle:
 * All loanwords should be Arabised, e.g. adjust to the Arabic spelling. becomes "mūskū".
 * When an Arabic speaker recognises a foreign word, he or she pronounces it based on their knowledge of the word. E.g. جو بايدن (Joe Biden) will not be necessarily pronounced "jū bāydan" but closer to the English pronunciation.
 * Depends on the word, region, its currency in Arabic, knowledge of foreign languages. Sometimes on how difficult a word is to pronounce. [p] may be replaced with [b], [ɡ] with [d͡ʒ] or [ʒ], etc.
 * Re corroboration per @Assem Khidhr: exactly! Native speakers could do it. If a reading can be said to be out of use, dated, (extremely) rare, regional, whatever, it can be labelled but not deleted without any explanation, ignoring the reference provided. Typically, ignoring references is frowned upon. If the reference is bad (I doubt it), it can be discussed. I respect the contributions of these users. Just calling for some cooperation. A revert like with an edit summary "No..." won't do.
 * I wouldn't insist on using referenced pronunciation from Hans Wehr but is there anything better out there? Is there a respected authority to prescribe how words should be pronounced? Do we need to search through YouTube videos for each loanword? Native speakers can shed light on how a term is more commonly pronounced but I don't think one needs to remove a referenced pronunciation. Someone has already done the hard work if a word is already defined in dictionaries. Anatoli T. (обсудить/вклад) 07:08, 27 April 2023 (UTC)
 * I am inclined to agree with you. In particular, MSA is no one's native language and no one speaks "pure" MSA especially with regards to numbers and recent loanwords (which includes most proper nouns). Sounds like ē and ō do occur in the typical speaker's native language so there's no reason they won't be carried over into MSA. Hans Wehr's dictionary is generally both accurate and conservative, so I think it's a good resource to use to cite pronunciations with "nonstandard" sounds. Benwing2 (talk) 08:11, 27 April 2023 (UTC)
 * @Benwing2: Thank you.
 * A related older discussion can be found at pronunciation of بلجيكا in Arabic Anatoli T. (обсудить/вклад) 08:22, 27 April 2023 (UTC)
 * Also inviting @Esperfulmo, since you posted on my talk page on a similar matter a while ago in User_talk:Atitarev. Anatoli T. (обсудить/вклад) 08:28, 27 April 2023 (UTC)
 * I agree that @Fenakhay's approach might have been prescriptivist in enforcing the fully Arabized pronunciation, which I know a lot of MSA speakers don't adhere to, particularly in the Levant/Mashreq. The examples of موسكو and جو بايدن are spot-on and I agree your renditions are in use. The lack of elaboration in the edit summary is also unproductive, compared to labelling for example. As for the general views, I actually think they work in tandem, the first emphasizing the ideal and the third emphasizing the hurdles against that ideal, and the second emphasizing the choice to code-switch. That's why the most accurate view is more empirical than analytic, based on reflecting convention, kind of like a mild case of English pronunciation (which is extremely pragmatist).
 * Now I wouldn't say the phonemes exist per se in MSA, not even with the qualifiers you mentioned. The phonology of loanwords is hardly considered part of any language's phonology, let alone at a time of close contact with the one most global language by a long shot. The occurrence of these phonemes is less about an actual phonemic expansion or switch and more about codeswitching, not just to English but to the spoken Arabic varieties that surely contain these phonemes and whose pronunciation better matches the source loan. This kind of codeswitching is a daily behavior for modern Arabic speakers and therefore seems the best explanation.
 * Be that as it may, there are still some generalizations to be made about the likelihood of full Arabization. In words like موسكو, the source pronunciation only differs from the fully Arabized one by a couple of vowels that are inherently similar to their Arabic counterparts, in terms of articulation. This is why it's more likely for both the consonants and the vowels to be borrowed. However, the first vowel in Korea is actually a schwa with a stress on the second syllable. As stress isn't phonological in Arabic and schwa is relatively distant from its vowel system, كوريا is more likely to have only its consonants borrowed and its vowels Arabized. I know this is all speculative, but this is our best bet as pronunciation is translingually more difficult to verify in writing due to the unfamiliarity of the public with phonetic notation. Assem Khidhr (talk) 14:46, 27 April 2023 (UTC)
 * @Benwing2 @Assem Khidhr I changed the pronunciation for كوريا because I never heard an Arabic speaker pronounce it as kōrya/kōriya (I'm not sure about Arabic speakers west of Libya); it sounds so unnatural. For some reason it’s pronounced kūrya, but for example موسكو (Moscow) is mostly pronounced Mōsku [moːsko] - [mosːku] even in Standard Arabic / TV news, and mūsku information would sound unnatural and enforced. So for most Arabic speakers east of Algeria, long ē ō ī ū are four different clear sounds/phonemes, but short i/e and u/o are allophones, even for speakers with zero foreign language knowledge. --عربي-٣١ (talk) 14:51, 27 April 2023 (UTC)
 * I think the idea of adding the appropriate labels is a good one, e.g. if 'kōriya' is attested in Hans Wehr but rare, we can mention that. As for what counts as MSA phonology, there are several languages that have additional sounds present only in loanwords; e.g. German has nasal vowels in words such as Orange that are borrowed from French, which are quite stable despite German not having any native nasal vowels. So I don't think it's correct to say this is necessarily code switching. Benwing2 (talk) 15:28, 27 April 2023 (UTC)
 * Apologies, messed up the ping. Benwing2 (talk) 15:28, 27 April 2023 (UTC)
 * @عربي-٣١: Thanks for your response. I did hear it pronounced 'kōriya'. Also https://forvo.com/search/كوريا/, which also include "كوريا الشمالية" and "كوريا الجنوبية". It's 'kōriya' in all cases. It's fine to make "kūriyā" default and most common but the entry can handle head2, head3, etc. to add various pronunciations.
 * @Assem Khidhr: Yeah, the code-switching argument is always there at Wiktionary but loanwords in Arabic are written in the Arabic script and can get adopted. Would you say, if "كوريا" is pronounced "kūriyā", then it's a borrowing but if it's "kōriyā", then it's code-switching? It happens that, as with Germans, who pronounce foreign words close to the original, Arabs seem to be in the same category, even if several sounds are not part of the standard Arabic. has various pronunciations. "doktōr", "duktōr" or "duktūr" (some of these attested readings have also been reverted from the entry). Do you consider, e.g. "duktōr" code-switching as well? Anatoli T. (обсудить/вклад) 22:10, 27 April 2023 (UTC)
 * @Atitarev You may have misinterpreted my comment. I'm not saying this is an instance of code-switching to English or the source language of the loanword in general, but to the spoken Arabic variety that does allow those close-mid vowels. This is especially true as switching back and forth between Standard Arabic and the spoken varieties is a constant behavior for most contemporary Arabic speakers. is indeed an example of such code-switching.
 * Regardless of how the Forvo entries you had mentioned turned out to sound, it isn't a particularly good source for that purpose as the fact that it's community-based means that most contributors won't pay much attention to whether their entries should go in the Arabic section or the spoken Arabic one. If you ask any Arabic-speaking layperson what language they speak, they'd just say "Arabic", even if their Standard Arabic proficiency is very low.
 * I'd also ditto عربي-٣١'s distinction between short and long close-mid vowels. In Arabic, short close-mid vowels are only phonetically rather than phonologically distinct from their close counterparts. In my experience, non-phonological phonetic differences aren't necessary to mark. And even when they are, they'd be enclosed in square brackets after the phonological rendition. For instance, might be notated as,  [bel.d͡ʒi.kaː] (dialectal). A similar resolution may be adopted for long close-mid vowels but without the square brackets and with a qualifier, which is already the case in  (see the Gulf Arabic pronunciation). This is more efficient too as it eliminates the need to create a whole new section for the spoken variety just to accommodate a different pronunciation, assuming there are no semantic differences. Assem Khidhr (talk) 00:33, 29 April 2023 (UTC)
 * /e/ and /o/ are never contrastive in Arabic. They are realizations of, but so is /g/ for among many Egyptians and, reportedly, early-medieval Arabs.
 * @обсудить every single pronunciation on Forvo is kūriya or kūrya. I think you are having a hard time differentiating between [uː] and the close-o sound [oː]. Also, regarding دكتور, as I explained, the difference between short u and o (dammah) is largely irrelevant in Arabic, and most speakers treat them as interchangeable allophones, whether in Standard Arabic or in the dialects. So only duktōr and duktūr are phonemically distinct pronunciations, with short u/o being interchangeable, but any speaker can tell the difference between long ū and long ō, since it is as clear as saying sink instead of think in English, or mūt (die! 'imperative') instead of mōt (death) in dialectal Arabic.
 * @عربي-٣١: Sorry, I had bad sound or my hearing let me down when I checked. Please ignore the Forvo links. I won't touch the entry, if I don't find the evidence with "kōriyā"/"kōriya" but you do agree that some words use "ō". I don't buy your argument about irrelevance, though. I find that Arabs who normally clearly pronounce "ū" would pronounce "ō" on "دكتور" or on some other foreign words. This was also on some training videos I used to learn Arabic. They didn't claim they spoke pure MSA but it was elevated, sort of mixture, so-called "educated spoken Arabic". (Your ping was stuffed up, you didn't sign either, so it didn't go through. BTW, you can change your signature, so that people also know what to call you in English "Arabic-31/Arab-31/arabiyy-31", etc.) --Anatoli T. (обсудить/вклад) 07:37, 28 April 2023 (UTC)
 * Sorry about the signature and I do pronounce دكتور as [do̞kˈto̞ːr] or [dʊkˈto̞ːr] but at the end phonemically it’s /dukˈtoːr/ not /dokˈtoːr/ and btw when i checked the pronunciation in the Egyptian Arabic entries in Forvo they pronounce كوريا as [korja] - /kurja/ with short [o] but that’s the stress distribution in their dialect and not in Standard Arabic - Arabi-31 عربي-٣١ (talk) 13:08, 28 April 2023 (UTC)
 * Transliterating as /kūriyā/ (with slashes) seems therefore reasonable since this matches the vowel points of the word. How every Arab among the 450-something million in the Arab world would say it is irrelevant. Roger.M.Williams (talk) 19:17, 27 April 2023 (UTC)
 * Also, there is no need to look this deeply into it and bring up code-switching. Kids who pronounce the ḍammah as /o/ and as /ʒ/ or /g/ when reading out Arabic in a classroom are just like the adults who might do the same on television and in the gravest of speeches: they do not know how to say it otherwise or find it too uncomfortable. Noting their speech in every entry, however numerous they may be, is irrelevant. Roger.M.Williams (talk) 19:39, 27 April 2023 (UTC)
 * @Roger.M.Williams: Thanks for your opinion. I disagree. I think it's relevant. "ō"/"o", "ē"/"e" can't be rendered using native means - with the Arabic script and diacritics but as a dictionary, we can do more. I don't think Hans Wehr dictionary is irrelevant. It is an effort to show how not only native words but many loanwords are pronounced. Should be transliterated as 'ʔinjlīziyy' or 'ʔinglīziyy' then?
 * This topic is also about pronunciation, not just transliteration. If you enter, you get /koː.ri.jaː/, if you enter or , you get /kuː.ri.jaː/. Anatoli T. (обсудить/вклад) 22:20, 27 April 2023 (UTC)
 * I'm pretty sure I heard a radio host say 'ʔinjilīziyyah' ("the English language") once some time ago. And some people say it as 'ʔinkilīziyyah' (a spelling pronunciation, given that it is sometimes written as ). Searching for on Google shows various results, but I doubt many people say it like this (I don't recall hearing it myself). I do, however, admit that 'ʔinjilīziyyah' and 'ʔinkilīziyyah' sound cringe.


 * But you should not equate this case (or that of "England") with the situation of vowels. The word "English" has established itself, and how the English say it is well-known (somewhat). This is like the names of people: one is less likely to pronounce as 'jāndalf' if one knows that this is not what he is called, but even then he might say it like this anyway, as every word can be bent and fit into the frame of (native) Modern Standard Arabic sounds.


 * What I mean by this is that, if there is a name whose origin is well-known to have a sound that is not in the inventory of MSA (like ), the form is less likely to be twisted. This is why there can never be a "rule" for the transliteration of country names and loanwords in Arabic since there are no rules. If one wishes to integrate into an entry Hans Wehr's insights, one could write some usage notes rather than agonize over how this word or that is precisely said across Arab populations. Roger.M.Williams (talk) 19:12, 28 April 2023 (UTC)

Script parameter for headwords of Chinese entries in Latin script
So I see that recently User:Theknightwho has been adding = to pages, e.g. WiFi, PO and BT. (No, I have no intention of blaming TKW for this.) I believe the current practice is to use = (i.e. omitting it) when the entry title is entirely in the Latin alphabet, which is what I have been doing since day one. IMO the usage of = is aesthetically unpleasing, since it enlarges the font to 150% and displays it in a (rather) ugly font. It also causes the issue mentioned in Grease pit/2023/April. Nevertheless, I would like to see if there are other reasons (and/or consensus) that we should be doing things the other way. – Wpi31 (talk) 17:46, 27 April 2023 (UTC)


 * @Wpi31 So I did this as a way of distinguishing entries written in pinyin from those written in running text with Hanzi (which I know also applies to pinyin sometimes, but you know what I mean). The most obvious example I can think of is . I agree it's really ugly, though, but realistically speaking this is (a) how it is in usage examples with those terms, and (b) how they usually look outside of Wiktionary, too. Theknightwho (talk) 18:11, 27 April 2023 (UTC)
 * In my experience, they are always written in half width and in generic English fonts, whether online or in print, serifs when used along Ming/Song fonts and sans-serifs when others. I suspect the places which don’t do this are either prescriptive dictionaries or older books that used to have a stricter typesetting requirement. – Wpi31 (talk) 18:38, 27 April 2023 (UTC)
 * You're probably right. This doesn't happen to me on mobile, and I assume that's because iPhone simply uses a less crappy font to display Latin characters tagged as.
 * Regardless of the font (which we can probably fix in CSS), I do think we probably should be tagging Latin characters used in running text with Hanzi as, as that's how they're being used, but I'm happy to defer to others on this. Theknightwho (talk) 15:11, 28 April 2023 (UTC)
 * That makes sense to me as well because the host script where those words would be used is Han. Could you see if you can follow the instructions in my post here and figure out what the terrible-looking font is? That would allow us to remove it from the CSS, if it's there. (I don't know if the instructions are good enough because nobody's used them yet when I've asked.) — Eru·tuon 19:02, 30 April 2023 (UTC)
 * The font is PingFang TC, which is the default Chinese font on Macbooks. I have custom fonts set up in my common.css, but PingFang TC is also the first one in MediaWiki:Common.css, which means it should be the default displayed font for Mac OS users. – Wpi31 (talk) 19:12, 30 April 2023 (UTC)
 * @Erutuon @Wpi31 For me it's MingLiU on Windows. As Wpi31 suggests, it's probably not a good idea to just remove them, as they're really common. Alternatively, we could maybe come up with a new CSS class for Latin text that's used in running text with Han (which would be trivial to autodetect, meaning no manual input). However, that would mean we'd need to find a way of implementing it when mixed with Hanzi as well, as it wouldn't make sense for it to work for but not . The existence of Category:Chinese terms written in multiple scripts does suggest we should implement that feature at some point, but I don't know if it'd be straightforward.
 * As a side point, another reason we want to use  is because it means automatic transliteration works for terms like . Theknightwho (talk) 19:17, 30 April 2023 (UTC)
 * I think it's fine for them to be tagged as =Hani if they're within running Chinese text (e.g. examples, quotes, and mixed terms) and the Chinese font has well-designed glyphs for Latin letters to be mixed within running Chinese text, which sadly isn't the case for most fonts, or they don't have a glyph for the Latin letters (PingFang TC is pretty decent on this front, along with Noto Sans CJK IIRC, which is why I only complained about the purely Latin ones, except that the font size change on =Hani is a bit weird in all cases).
 * In places where it only has Latin letters without CJK characters (e.g. headwords of such terms, links to such terms), IMO it's better to mark them as =Latn, especially for many of the longer Hong Kong Cantonese words. Wpi31 (talk) 19:37, 30 April 2023 (UTC)
 * I guess the IETF language tag for Mandarin pinyin would be  based on the language subtag registry file. (I'd prefer   to be more specific about the language, though they don't indicate that as valid.) Using IETF tags would make browsers display some text better, but unfortunately we don't have a way to output valid IETF language tags, even the minimal tags of language code and script code  because many of our script codes are not valid in IETF language tags. For instance, tagging CJK romanizations as ,  ,   seems to prevent the browser from overriding our CSS and giving them Han-script-appropriate fonts. (Testcases: zh , ja , ko ; zh-Latn , ja-Latn , ko-Latn .) We would at least have to specify in Module:scripts/data which internet script codes (if that's what you call it) to use in place of our special script codes and then in addition to our script classes, add the internet script code at the end of the language attributes when tagging text. — Eru·tuon 19:12, 27 April 2023 (UTC)
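The mapping step suggested above (recording, for each special script code, which standard script code to emit in the language attribute) could look roughly like the following. This is only a sketch in Python rather than in the Lua used by Module:scripts/data; the table name `IETF_SCRIPT_OVERRIDES`, the two function names and the sample override entries are assumptions made for illustration. It only checks that a script code has the four-letter ISO 15924 shape, not that it is actually registered, and it does not address language codes that are themselves not valid in IETF tags.

```python
import re

# Hypothetical override table: internal script code -> ISO 15924 script code.
# The entries are illustrative assumptions, not a copy of any real module data.
IETF_SCRIPT_OVERRIDES = {
    "Latinx": "Latn",
    "polytonic": "Grek",
    "fa-Arab": "Arab",
}

# ISO 15924 codes are four letters in title case, e.g. "Latn" or "Hani".
ISO_15924_SHAPE = re.compile(r"^[A-Z][a-z]{3}$")


def ietf_script(script_code):
    """Return a script subtag usable in an IETF tag, or None if there is none."""
    code = IETF_SCRIPT_OVERRIDES.get(script_code, script_code)
    return code if ISO_15924_SHAPE.match(code) else None


def lang_attribute(lang_code, script_code):
    """Build the value of a lang attribute, e.g. 'zh-Latn'.

    Falls back to the bare language code when no valid script subtag is known.
    """
    script = ietf_script(script_code)
    return f"{lang_code}-{script}" if script else lang_code


print(lang_attribute("zh", "Latn"))        # zh-Latn
print(lang_attribute("grc", "polytonic"))  # grc-Grek
print(lang_attribute("zh", "some-code"))   # zh
```

The existing script classes would stay as they are for CSS purposes; only the language attribute would use the mapped code.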

Identifying circular definitions
In this diff an anon complained about an abundance of circular definitions here. Is that sufficiently true to merit an XML-dump run to identify simple circular definitions (2-entry circularity: A defined as "B" and B defined as "A")? How hard would it be to do a run for three-entry circularity? DCDuring (talk) 14:31, 28 April 2023 (UTC)
 * That IP's complaint is perplexing, because none of the definitions at are circular. I often see complaints, however, about e.g. verbal nouns using the base verb in the definition, yet those are merely referential, not circular, even if it can be less than helpful at times. That said I am somewhat curious about the real prevalence of 2-entry circularity. ←₰-→  Lingo Bingo Dingo (talk)  21:24, 4 May 2023 (UTC)
 * There are specific types of circularity, some pernicious, some not. For example, I fear that many taxonomic names are "defined" using English vernacular names and some of those English vernacular names are "defined" using taxonomic names. But taxonomic names are really defined by their hypernyms and hyponyms. In some other discussion of definition circularity, I pointed to the definitions taken from MW 1913 that consist, at least in part, of lists of purported synonyms, which are defined similarly or with just a single synonym.
 * If I were hunting for circularity, I would not try to root out every possible instance, but rather focus on high-likelihood cases. For example, a definiendum that has a single-term definiens (excluding determiners). In such cases, it would be easy to check whether the definiens was itself defined using the definiendum. One could even use this method to find 3-term (or n-term) circularity. DCDuring (talk) 21:49, 4 May 2023 (UTC)
 * I agree that the type described in your second paragraph seems like one that would best be prioritised. ←₰-→ Lingo Bingo Dingo (talk)  18:23, 5 May 2023 (UTC)
 * And then there's Xochiatipan, which is in a circle of one ... Chuck Entz (talk) 03:45, 6 May 2023 (UTC)
 * They aren't really circular, but the definitions at (etymologies 1 and 5; 5 may entirely be a reduplication of 1) do suffer from unclearness. ←₰-→  Lingo Bingo Dingo (talk)  18:23, 5 May 2023 (UTC)
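The high-likelihood check described above (single-term definitions only, then looking for 2-term, 3-term or n-term loops) can be sketched as follows. This is a minimal illustration in Python, assuming the XML dump has already been reduced to a mapping from headword to a list of definition strings. The sample `entries`, the determiner list and all function names are hypothetical and not part of any existing Wiktionary tool.

```python
DETERMINERS = {"a", "an", "the"}


def single_term_definiens(definition):
    """Return the only content word of a definition, or None if there are several."""
    words = [w.strip(".,;:()").lower() for w in definition.split()]
    content = [w for w in words if w and w not in DETERMINERS]
    return content[0] if len(content) == 1 else None


def build_graph(entries):
    """Map each headword to the set of terms that define it on their own."""
    graph = {}
    for head, definitions in entries.items():
        targets = set()
        for definition in definitions:
            term = single_term_definiens(definition)
            if term is not None:
                targets.add(term)
        graph[head.lower()] = targets
    return graph


def find_cycles(graph, max_len=3):
    """Find definition loops of up to max_len entries.

    Each loop is reported once, starting from its alphabetically first member.
    """
    cycles = set()

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.add(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in graph:
        walk(start, start, [start])
    return sorted(cycles)


# Made-up sample data, for illustration only.
entries = {
    "big": ["Large."],
    "large": ["Big.", "Of great size."],
    "huge": ["Enormous."],
    "enormous": ["Gigantic."],
    "gigantic": ["Huge."],
    "loop": ["A loop."],
    "cat": ["A small domesticated feline animal."],
}

for cycle in find_cycles(build_graph(entries)):
    print(" -> ".join(cycle))
```

Raising `max_len` extends the search to longer loops, at the cost of more false alarms. Definitions that merely refer to a base term (such as verbal nouns defined via their verb) do not form a loop unless the base term points back.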

Bot suggestions
Hey, I have two ideas that could be performed by a bot, respectively regarding Dutch and French entries.

Dutch entries often lack specification of whether they are used in the Netherlands or Belgium. On the Dutch Wiktionary, one can find extensive prevalence categories of what % of Dutch/Flemish people know a certain word. A large discrepancy in the knowledge % very likely indicates that one word is only common in one of the two regions. I don't know what threshold can be used, but there is certainly bot potential in this, I can just feel it.

Moreover, French entries really often lack an Etymology section. On the French Wiktionary, loanwords and so on are much more extensively documented, compare for example this and this. And many of the categorized French words are already present here, their entries are just not as detailed. Surely there is a way to code a bot into parsing this information and indicating it. Synotia (talk) 15:55, 28 April 2023 (UTC)
 * The Dutch percentages could be added by a bot to a statistics section. I've argued against surname frequencies here for English, but this seems like an ideal use of the statistics header. I'm not sure the categories for each possible percentage are necessary though.


 * For the French etymologies, I don't think anyone would support importing them by bot, but someone could create lists of our entries that are not in fr.wikt's categories for certain borrowings and the lists could be added to WT:TODO. Of course you have to consider that the information at fr.wikt might be wrong, so each entry should be checked against a reliable source. Ultimateria (talk) 05:17, 2 May 2023 (UTC)
 * Do we know what the licence of those statistics is? I have sometimes consulted them to determine Belgium-Netherlands splits, but they are something in need of interpretation; i.e. you cannot rely on them blindly. Certain Dutch or Belgian words can be quite famous in the other country. ←₰-→ Lingo Bingo Dingo (talk)  18:17, 5 May 2023 (UTC)
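The thresholding idea from the original proposal can be sketched as below. This is a rough illustration in Python, assuming the prevalence figures are available as a percentage per word for each country. The placeholder words, the numbers and the 30-point threshold are all made up, and, as cautioned above, the output should be read as a list of candidates for manual review rather than as regional labels.

```python
def regional_candidates(prevalence, threshold=30.0):
    """List words whose known-by percentages differ strongly between regions.

    prevalence maps a word to (percent_netherlands, percent_belgium).
    Returns (word, better-known region, gap in points), largest gap first.
    """
    candidates = []
    for word, (pct_nl, pct_be) in prevalence.items():
        gap = pct_nl - pct_be
        if abs(gap) >= threshold:
            region = "Netherlands" if gap > 0 else "Belgium"
            candidates.append((word, region, round(abs(gap), 1)))
    return sorted(candidates, key=lambda row: -row[2])


# Placeholder data, not real prevalence figures.
sample = {
    "woord_a": (98.0, 41.5),
    "woord_b": (37.0, 95.0),
    "woord_c": (99.0, 97.5),
}

for word, region, gap in regional_candidates(sample):
    print(f"{word}: better known in {region} by {gap} points")
```

A bot could write such a list to a maintenance page instead of editing entries directly, which keeps the interpretation step with human editors.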

Elfdalian nasalization
Hello! I was holding off on this question, but it might be an important one, I don't know. What are we supposed to do with the etymology tree for stuff like 🇨🇬 and ? They are not from Old Norse, because they've got the old Proto-Germanic nasalization, which was already obsolete in Old Norse.

But if I put them under Proto-Germanic, it's also going to be weird, because Elfdalian isn't a separate language family from Swedish and Norwegian. Or is it? Do we have any example of this in some other language, where an extremely archaic form is preserved in some dialect but is non-existent in any related language? Or is it highly abnormal?

Post scriptum: actually, this kind of nasalization is also found in some archaic and queer Norwegian dialects (I reckon some kind of old Tydalmålet or something), but I can't find any examples or studies on this subject. Tollef Salemann (talk) 18:09, 29 April 2023 (UTC)
 * Nasalization was present in the Old Norse of the . When I have looked at Wiktionary entries, they seem to ignore nasalization outside of IPA transcriptions (search query showing Old Norse IPA transcriptions with the nasalization diacritic), and the Old Norse orthography in the books I used to look at also ignores it, so I guess they are prioritizing Old Norse varieties that had lost nasalization. — Eru·tuon 18:47, 29 April 2023 (UTC)
 * Ok! I've seen some examples of Old Norse here with nasalization, but it seems not to be listed in 99% of cases. May I just put a reconstructed nasalized form into the Old Norse descendant tree, marked with a star? Tollef Salemann (talk) 19:00, 29 April 2023 (UTC)
 * I'm not involved enough in Old Norse entries to answer. Pinging Mårtensås who probably knows more. — Eru·tuon 19:15, 29 April 2023 (UTC)
 * All of Wiktionary Old Norse is based on standard 1200s Icelandic normalization. This has very strange effects when 13th century Icelandic forms are listed as the ancestors of 12th century Old Danish ones. In general I think we could be a lot more archaic with our Old Norse orthography, like differentiating e from Proto-Germanic *e vs e from Proto-Germanic *a with i-umlaut, as is still done in some dialects to this day. The only reason this isn't done is because they merged in Icelandic.
 * Elfdalian is a Swedish dialect, specifically of the Dalecarlian group. That does not mean that it is descended from more southern Old Swedish, only that it's part of the Swedish dialect area. As said, the nasals are attested in the 12th century Icelandic FGT. They are also in Swedish and Danish Runic inscriptions, showing that they existed in common North Germanic, which on Wiktionary is called Old Norse. What I'm saying is that you should list them as descended from  since that is the current convention, but we should consider normalizing our Old Norse according to a more neutral and less Icelandic-specific orthography. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 19:23, 29 April 2023 (UTC)
 * Yes, I've already noticed that there are forms which can't easily be found in the original manuscripts. On the other hand, the Old Norse section still looks poor, so I think we should concentrate on registering the words themselves before registering all of their alternative forms. So I see no problem with the Icelandic spelling, even if it can sometimes look too Icelandic. Do you perhaps have any suggestions for how to spell the nasalized forms in Old Norse? Tollef Salemann (talk) 14:52, 30 April 2023 (UTC)
 * Inspired by the "", we could use a dot (so ȧ) in such cases. Thus from Proto-Germanic, but  from Proto-Germanic . ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 17:44, 1 May 2023 (UTC)
 * Well, I did that for *gans, but I'm unsure where the Elfdalian lǫs comes from. I mean, whether it is from *lȧss or perhaps from *lȧs. I haven't studied Elfdalian very much so far. Tollef Salemann (talk) 18:02, 1 May 2023 (UTC)
 * Surely gȧs doesn't need to be reconstructed? It is a spelling variant of gás that shows the etymology better. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 07:56, 2 May 2023 (UTC)

Template:sup
User:Sokkjo added =style="font-size:.6em" to the template in. This is inconsistent with the documentation (which says it's =text), and also broke the display of entries like 方太教煮餸——整定: notice that the superscript in (using sup) and  (which has its superscript generated automatically by the translit) is now different because of this change. After I reverted their change, Sokkjo then changed the template to =text, and also the documentation. Sokkjo said "the fact that this matched unicode uppercase in size is the only reason I use this template", but I don't see any point in doing so: if you want to match Unicode upper case [sic, I assume superscript?], then why not just use =undefined for all the characters? I should also note that this also makes the text smaller and harder to read. – Wpi31 (talk) 04:49, 30 April 2023 (UTC)


 * PS: This is now also inconsistent with templates like sub, underline, italic, color, etc. that do exactly what the template name says without doing fancy things like smaller text. – Wpi31 (talk) 04:57, 30 April 2023 (UTC)
 * So, when you say "broke" you mean it's "harder to read"? html or html is consistent with the sizing of unicode superscript characters. If you think sup is too small, why not just use html? What's the issue? -- Sokkjō 08:24, 30 April 2023 (UTC)
 * Also, for the record, the template had employed = since . I was later, for some reason, and I merely added it back. -- Sokkjō 08:37, 30 April 2023 (UTC)
 * 1. By "broke" I meant making the display in the entry I mentioned to be different; harder to read is another separate reason – I have no idea why you comprehend a side remark of my comment as the main reason.
 * 2. The sizing of Unicode superscript characters depends heavily on the font. It may look good on your device with whichever font you're using, but they do NOT display as such for me, and probably others as well.
 * 3. Bare html =text is less ideal than using a template text.
 * 4. For the record, the template has been using =undefined since two years ago, why are you making this change only now, not earlier?
 * – Wpi31 (talk) 09:02, 30 April 2023 (UTC)
 * 1. The reasoning you gave for was, in your words, "this broke a lot of stuff". Again, what you seem to have really meant was "harder to read", which is a) not "broken", and b) subjective. 2. All font sizing is based on the defaults in Vector. If it does not display well in other scripts or languages, it is on those scripts to adapt accordingly. 3. Using html in conjunction with html is a facsimile. 4. Not really relevant to the point, but I was inactive for a period. -- Sokkjō 15:25, 30 April 2023 (UTC)
 * 1. I never meant "this broke a lot of stuff" to be "harder to read" – I always meant "[breaking] a lot of stuff" to be making the display of things different, which is obviously breaking the consistency across places in the wiki. As I've said, "harder to read" is a separate (and additional) reason for the change.
 * 2. The default font for css is simply css, which depends entirely on what fonts are installed on your device and how they are set up. The MediaWiki settings have no effect whatsoever here (including in my example, which also has css if you use inspect element).
 * 3. searching for has 116 hits (this should account for both ways of ordering the tags and also some false positives);, 49078 hits; {{search|insource:/\{\{sup\{{!}}/}}, 1260 hits. It's pretty obvious that {{code|html| }} is not a common practice, instead {{code|html| }} is way more common.
 * – Wpi31 (talk) 16:06, 30 April 2023 (UTC)
 * 1. I'm just quoting your (misused) words. 2. This very text is set to {{code|css|font-size: calc(1em * 0.875); line-height: 1.6}} inside the Vector theme. That size and line height works well by default, but may not be optimal in all usages and scripts, and should adjust accordingly. 3. I was referring to in general use online, not on the project -- obviously, we've had sup here since 2012 for that. -- Sokkjō 16:40, 30 April 2023 (UTC)
 * While my wording may have been inaccurate, I never said "broken" = "harder to read". Please do not misinterpret my words. "harder to read" is a clear consequence of "[this] makes the text smaller".
 * Nudging the line height and/or size does not help in any way, since some fonts have a different glyph for Unicode superscript characters that is different from the normal characters. For example these all display differently for me: {{code|={{sup{{!}}1}}}}: $2$; {{code|html|={{sup|1}}}}: {{sup|1}}; {{code|=¹}}: ¹; {{code|={{sup{{!}}6}}}}: $2$; {{code|html|={{sup|6}}}}: {{sup|6}}; {{code|=⁶}}: ⁶ I don't see any point in the attempt of trying to make them match in line height or display or whatsoever; the template shouldn't serve such purpose that would be impossible to achieve across the many operating systems, browsers{{,}} and fonts; instead it should simply be a shorthand for {{code|html|={{sup|}}}}.
 * The predominant use of {{code|html|={{sup|}}}} over the other two syntaxes (around 36 times of the latter, even if you count all uses of sup as {{code|html|{{sup| }}}}) just shows what the demand is. Also, {{tl|sup}} was only changed to the modern {{code|html|={{sup|}}}} syntax in  in 2021, before that it was using some ugly CSS relative position hack. Please refrain from misrepresenting the history of the template by implying that {{tl|sup}} was always {{code|html|{{sup|  }}}} and/or hasn't been changed since 2012.
 * – Wpi31 (talk) 17:18, 30 April 2023 (UTC)
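As a quick check of the "around 36 times" figure, the ratio can be recomputed from the three hit counts quoted in the search comparison. The sketch below is Python; the variable names are illustrative, and it assumes that the 49078-hit search is the one being compared against the other two combined.

```python
# Hit counts as quoted in the discussion (search results from April 2023).
hits_first_search = 116      # first search, said to include some false positives
hits_second_search = 49078   # second search
hits_template_search = 1260  # search for uses of the sup template

# Ratio of the second search to the other two combined.
ratio = hits_second_search / (hits_first_search + hits_template_search)
print(round(ratio, 1))  # 35.7
```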
 * Victar has a way of taking words and implying you said something else while also ignoring what you actually said, because it's inconvenient. Vininn126 (talk) 17:45, 30 April 2023
 * Vininn126, what the fuck are you talking about? Contributing nothing to a discussion beyond making false claims to personally attack someone only makes you look bad. -- Sokkjō 20:32, 30 April 2023 (UTC)
 * Wpi31, you apparently missed my link to this {{revision|16297177|original version}}, where the content was simply {{code|lua|= {{sup|{{{1}}}}} }}. Later it actually had made use of two nested html tags. -- Sokkjō 20:47, 30 April 2023 (UTC)


 * To move on productively from this, I created the template {{tl|smallsup}}. {{ping|Erutuon}}, would it be possible to run ToilBot to replace all instances of {{tl|sup}} with {{tl|smallsup}}, giving users the option to switch to plain {{tl|sup}} if they wish? -- Sokkjō 21:05, 30 April 2023 (UTC)
 * Thank you for your cooperation, but why are you unilaterally deciding that all {{tl|sup}} should be changed to {{tl|smallsup}} when there are clearly use cases (e.g. mine) that should remain using {{tl|sup}}? – Wpi31 (talk) 06:16, 1 May 2023 (UTC)
 * Because it has over 9 years of usage with {{code|html| }} and two without. I'm sure {{ping|Erutuon}} can exclude cases you specify from the replacement. -- Sokkjō 07:04, 2 May 2023 (UTC)
 * {{reply to|Sokkjo|Wp31}} Sure. I can publish a list of the lines where they are used and let you look them over to see which ones you want to be left alone. Looks like there will be at least 1,400. There are 110 cases in templates, which may be harder to evaluate from just a line. — Eru·tuon 16:52, 3 May 2023 (UTC)
 * I strongly oppose doing a mass replacement of {{tl|sup}} with {{tl|smallsup}}. Theknightwho (talk) 17:58, 3 May 2023 (UTC)
 * Agreed. I doubt that most people would know the subtleness in {{tl|sup}} and then specifically wanted to use {{code|html| }} rather than {{code|html| }}. – Wpi31 (talk) 19:14, 3 May 2023 (UTC)

Interlingue translation entries
I just noticed that Interlingue has been excluded from the mainspace languages and am curious where all the data has gone. If/when it gets restored as a mainspace language, can the data be restored? (See my contributions for the contributions I made) Mithridates (talk) 15:07, 30 April 2023 (UTC)
 * See Special:PrefixIndex/Appendix:Interlingue/ Chuck Entz (talk) 15:34, 30 April 2023 (UTC)
 * I don't see any of my contributions in there - have they been erased or can they be at least automatically moved to the appendix in the meantime while efforts are made to restore Interlingue as a mainspace language? Mithridates (talk) 15:56, 30 April 2023 (UTC)
 * Apparently all your contributions were translations, and the translations were removed by bot starting here Chuck Entz (talk) 20:49, 30 April 2023 (UTC)
 * Thanks for the link. Seeing that saddens me a bit.
 * https://en.wiktionary.org/w/index.php?title=Special:Contributions&end=2021-03-15&namespace=all&start=2021-03-15&tagfilter=&target=NadandoBot&offset=&limit=500
 * Mithridates (talk) 07:42, 1 May 2023 (UTC)
 * Our current practice is to not have translations into appendix-only languages. I wouldn't hold your breath on Interlingue becoming a mainspace language again any time soon - the vote to not allow it only had one vote against (which argued for the seemingly radical position that any language with an ISO code should be allowed in the mainspace). &mdash; S URJECTION / T / C / L / 20:52, 30 April 2023 (UTC)
 * Yes, I found the page on the vote and believe it needs to be redone as the rationale is wildly incorrect and this would have unduly influenced the vote. It claims that Occidental/Interlingue had "some success (a regular magazine and a few books)" when as far back as 1936 the number of publications was 80. The CDELI in Switzerland where much of the older material is stored has 75 linear metres in the language (compared to 60 for Volapük and 50 for Interlingua, which is a mainspace language). One of the community's biggest problems at the moment is simply getting through all the material and digitizing it because there is so much of it. A recent discussion talks about attestability, which the language clearly has (again, more attestability than Interlingua and Volapük). It looks like the proposal for the vote is confusing it with another sort of Novial which did indeed flare out pretty quickly and is not even mentioned on the CDELI homepage. Mithridates (talk) 01:05, 1 May 2023 (UTC)
 * I'm hardly active in this wiki (somewhat more in others), but I agree that the vote should be redone since it was based on false premises which none of the voters even bothered to check. They also seem to have totally missed that Occidental has its own Wikipedia and its own Wiktionary – surely that should count for something? Generally I think that every conlang that has an active Wikipedia should be allowed in the main namespace. That would add five more conlangs in addition to the four that are currently allowed – certainly a number that Wiktionary can handle. And it would be a more principled approach than trying to decide about each conlang on an individual basis. Krissie (talk) 17:56, 2 May 2023 (UTC)
 * I think that would be easiest too. Otherwise it ends up being a vote in a single language on a single Metawiki project despite decisions already having been made to support Wikipedias in said languages.
 * And for Occidental/Interlingue, even if the criterion is attestation then that has certainly already been met. This page alone has 1 million+ words, these pdfs add another 70 or so works as pdfs (most not yet digitized), and there are more left that are scanned to upload and start the work on. Attestation is in no way an issue for this language. Mithridates (talk) 23:19, 3 May 2023 (UTC)
 * As an aside, if anyone is interested in helping digitize some of the CDELI content by typing it up they are always welcome (we are buried in it...). Mithridates (talk) 01:07, 1 May 2023 (UTC)
 * I just saw this. You are welcome to start another vote but I think you are trying to fight the tide here. I personally would rather have *NO* conlangs at all (except maybe Esperanto) in the mainspace, and I think many others have at least somewhat similar views. Benwing2 (talk) 07:34, 12 July 2023 (UTC)
 * Sounds good, I will prepare some arguments over some time and see if we can get another vote started. Personally I think the explosive growth of Interslavic shows that fighting against auxiliary languages is a losing battle and will phrase the vote in that context as opposed to just Interlingue. Thanks! Mithridates (talk) 08:50, 12 July 2023 (UTC)