Wiktionary:Grease pit/2020/November

as indeclinable adjective
Hi. I needed to identify this word, Latin prōde, as an indeclinable adjective but I couldn't figure out how. I looked at Template:la-adecl, linked to from Template:la-adj, and it seemed to suggest I could use "la-adj|prōde<+>" to do what I wanted to do (I quote: "A bare lemma is equivalent to the lemma followed by <+>."), but it doesn't seem to work at all. I'd appreciate it if you fixed this for me.--Ser be etre shi (talk) 07:17, 1 November 2020 (UTC)
 * The default method is just adjective indeclinable. You should never leave an entry without a headword template. Chuck Entz (talk) 07:44, 1 November 2020 (UTC)
 * I implemented 1 a while ago for this purpose; unfortunately it wasn't documented. It now is. Benwing2 (talk) 16:51, 1 November 2020 (UTC)

Morphology tables for Kunwok verbs
Dear editors, my colleagues are studying the Kunwok dialect of the Bininj Gun-Wok group. They have information about the various forms of the verb; see the page.

Could you please advise which morphology template on Wiktionary we could take as an example to create a morphology template for Kunwok verbs? --Andrew Krizhanovsky (talk) 11:47, 1 November 2020 (UTC)

Help for a primitive module
Trying to make, for el.wiktionary, probably the most primitive 'auto cat' ever seen. All was going well, but a problem occurred with one Category. Its title comprises two keywords taken from the language-data module. I cannot split the title and match its parts. I describe the problem at el:Module talk:yy. If it is not possible, we could just change this Category's name completely to a more manageable one. I was just wondering if it were possible -sorry to bother you with such questions- Thank you ‑‑Sarri.greek ♫ | 12:38, 1 November 2020 (UTC)
 * Hi Sarri ... I think you're trying to pull out the two language names from a category like Νεοελληνικές λέξεις αγγλικής προέλευσης? Then you'd want something like this:

local receiver_lang, donor_lang = mw.ustring.match(category, "^(.+) λέξεις (.+) προέλευσης$")

After this, if category has the value "Νεοελληνικές λέξεις αγγλικής προέλευσης", then receiver_lang will have the value "Νεοελληνικές" and donor_lang will have the value "αγγλικής". Benwing2 (talk) 17:34, 1 November 2020 (UTC)
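For anyone experimenting outside Scribunto (where `mw.ustring` does not exist), the same extraction works in plain Lua with `string.match`, because the literal Greek words anchor the pattern and byte-level matching suffices; a self-contained sketch:

```lua
-- Extract the receiver and donor language names from a category title
-- of the form "<RECEIVER> λέξεις <DONOR> προέλευσης".
local category = "Νεοελληνικές λέξεις αγγλικής προέλευσης"
local receiver_lang, donor_lang =
    string.match(category, "^(.+) λέξεις (.+) προέλευσης$")
print(receiver_lang, donor_lang)  -- prints "Νεοελληνικές" then "αγγλικής"
```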


 * Thank you, thank you. I need help, review and examples like that for Ref#Captures, because the expressions fluctuate (genitive, accusative, etc.; I need to match them with some data). The manual is difficult to apply, and it is my only learning source. I try to write down easy copy-paste applications I need at my notes, for example, notes for titles.
 * It would be so nice if real Lua programmers reviewed such notes, because all lesson pages make an abrupt leap from 'Hello world' to very complicated things, without any intermediate examples. Describing does not help, because even a comma makes a difference. Erutuon recommended help also from Wikipedia, and indeed Johnuniq and Trappist the monk gave me lots of tips. The only helpful course I found is this one at Wikiversity, by prof. Dave Braunschweig (I asked for help for small wikis). The aim is not to teach Lua, but to offer copy-paste, easy-to-apply modules for small wiktionaries. ‑‑Sarri.greek ♫ | 00:07, 2 November 2020 (UTC)
 * Feel free to ask me questions. I agree that more examples would help; the manual basically assumes you have experience with another programming language, and it can be rough otherwise. Benwing2 (talk) 00:25, 2 November 2020 (UTC)


 * Thank you. It works fine with the above expression. I have a problem with el:Κατηγορία:Νεοελληνικές λέξεις προέλευσης από τη μέση άνω γερμανική, where I need to extract προέλευσης από τη μέση άνω γερμανική. ‑‑Sarri.greek ♫ | 00:31, 2 November 2020 (UTC)
 * Never mind. I shall try a different combination. ‑‑Sarri.greek ♫ | 00:34, 2 November 2020 (UTC)
 * The same problem at e.g. trial el:Module:yy line 500 is that I cannot match the second part of e.g. el:Κατηγορία:Δάνεια από τα αγγλικά to the el:Module:Languages apota keyword. I did it manually, by subtracting the first words (Borrowings, Calques, etc.) and assuming that what is left is the apota keyword. ‑‑Sarri.greek ♫ | 01:38, 2 November 2020 (UTC)
 * I assume από τα αγγλικά means "from English"? I see that each language has an apota entry expressing how to say "from LANG" for that language. Are you trying to go from the από τα ... text to the language code, i.e. get the language code for a specific language given the category? In that case, you first have to build a table that maps from apota text to the language code. What you have in el:Module:Languages is the opposite, it's a map from language code to apota text. If this is what you're trying to do, then you first need to build a table like this:

apota_to_language_code = {}
for langcode, data in pairs(Languages) do
  apota_to_language_code[data.apota] = langcode
end
 * Then you need to extract the apota text and look up the language code, something like this:

local apota_text = mw.ustring.match(category, " (από .*)$")
local apota_langcode = apota_to_language_code[apota_text]
 * This assumes that all apota text variants begin with από surrounded by spaces and that the apota text is always at the end of a category. If this isn't the case, then it gets harder, and we can figure out how to handle it depending on what if anything is always true about the apota text. Benwing2 (talk) 04:46, 2 November 2020 (UTC)
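Putting the two snippets above together, here is a self-contained plain-Lua sketch. The two-language slice of el:Module:Languages is hypothetical (the record shapes and the German apota text are made up for illustration), and `string.match` stands in for `mw.ustring.match`, which behaves the same here because the pattern anchors on literal text:

```lua
-- Hypothetical slice of el:Module:Languages; each record carries the
-- 'apota' ("από τα ..." = "from ...") text for that language.
local Languages = {
  en = { apota = "από τα αγγλικά" },
  de = { apota = "από τα γερμανικά" },
}

-- Step 1: invert the map once, apota text -> language code.
local apota_to_language_code = {}
for langcode, data in pairs(Languages) do
  apota_to_language_code[data.apota] = langcode
end

-- Step 2: pull the trailing "από ..." text out of a category title
-- and look the language code up.
local category = "Δάνεια από τα αγγλικά"
local apota_text = string.match(category, " (από .*)$")
local apota_langcode = apota_to_language_code[apota_text]
print(apota_langcode)  -- prints "en"
```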
 * You are great. I will study and try out all this. I have already tried some at el:Module:lang. There are so many expressions in inflectional languages. I can at least break them into even smaller parts. If needed, we would have to change all keys. I will let you know what happened! PS: I do not know how to write captures; I write them wrong all the time. ‑‑Sarri.greek ♫ | 04:53, 2 November 2020 (UTC)
 * apota was not a problem. It already works fine at el:Module:yy line 500 + el:Module:langp.langapota_to_langiso. I will try some of your thoughts at titles with two keys, plus 2 variations per key, with your style for captures. Thank you so much. ‑‑Sarri.greek ♫ | 07:20, 2 November 2020 (UTC)
 * The problem is keyword «from»: it does not know where to put boundaries, unless it finds it somewhere, in order to match it. The expressions preceding it have the same problem: they are other keywords, also varying. I guess the whole thing is impossible. It is the fault of such a varying style of naming. I will propose to my bureaucrat a change of all these keywords. ‑‑Sarri.greek ♫ | 08:09, 2 November 2020 (UTC)
 * It might be possible with the existing naming scheme but you might have to loop over all languages and check each one in turn, which isn't efficient if you have a lot of languages. Definitely better if the naming scheme is as consistent as possible. Benwing2 (talk) 15:19, 2 November 2020 (UTC)
 * Yes, yes, Sir! I am now trying to persuade them to change... :) We all thank you for your help ‑‑Sarri.greek ♫ | 15:25, 2 November 2020 (UTC)

Module:ckb-pron
Could someone who understands pronunciation modules please take a look at Module:ckb-pron and figure out why it's suddenly started creating links to the pronunciations? For example, at instead of outputting  it's outputting. The weird thing is that neither Module:ckb-pron nor ckb-IPA has been edited recently, but this behavior has only started within the last two weeks. —Mahāgaja · talk 13:20, 1 November 2020 (UTC)
 * It's because of a change made by User:Fenakhay to Module:ckb-translit on Oct 21. I think the intent was to have translits linked similarly e.g. to what happens with Gothic, but this isn't the right way to do this. I undid the change and I'll poke around and see if I can figure out how the Gothic translit linking is happening. Benwing2 (talk) 17:02, 1 November 2020 (UTC)
 * There's a  setting for Gothic in Module:languages/data3/g. This is implemented in Module:links. We can potentially add this to Module:languages/data3/c for Central Kurdish. Benwing2 (talk) 17:07, 1 November 2020 (UTC)

Automatic transclusion of English pronunciation symbols
One area where Wiktionary is still severely lacking is consistent and user-convenient inclusion of pronunciation guide material. While pronunciation files are far more difficult to produce in the first place, to incorporate into entries, and to make reflect standard phonology, phonetic alphabets can revolutionize the enrichment of Wiktionary with pronunciation data. To date, however, they can only be entered by individual contributors using the on-screen IPA keyboard, which I think is the largest barrier to widespread participation in creating pronunciation sections, especially for laypersons. At the same time, almost every entry on Merriam-Webster is accompanied by the site's specialized transcription, which can be converted losslessly to IPA by a simple conversion table. Luckily, Merriam-Webster has a free API, too. Based on the above, I propose the creation of:


 * 1) template:MerriamIPA, to be used by individual contributors to automatically transclude US pronunciation characters and convert them into IPA. These can include both symbolic representations and audio files, though incorporating the files would require further technical and legal compatibility with Commons to perform the extra steps to create an entry there.
 * 2) a MerriamIPA bot to search for pages lacking US pronunciation data and add it as appropriate. As a further proofreading step, to supplement regional pronunciations added as perceived by editors, it could compare already-present IPA symbols for US pronunciation with standards from Merriam-Webster and add the latter if not found.

Unfortunately, I have neither the time nor the expertise required for such a project, but I do think that such a preliminary conception can be easily translated into a concrete piece of code by volunteers on the site. Please let me know your feedback. Assem Khidhr (talk) 15:38, 2 November 2020 (UTC)
 * Excellent idea. Let's hear what the bot guys have to say. -- Dentonius (my politics | talk) 15:54, 2 November 2020 (UTC)
 * I don't think we can legally do this. The API says this:
 * The Merriam-Webster Dictionary API is free as long as it is for non-commercial use, usage does not exceed 1000 queries per day per API key, and use is limited to two reference APIs.
 * I'm not quite sure what this means as I haven't looked into what "API key" means in this context, but I seriously doubt it would fly to start copying stuff into Wiktionary. Their goal is clearly to entice people into building their API into games and such so that the successful ones end up having to pay them for continued use, not to allow a free site like Wiktionary to leverage their work. Benwing2 (talk) 03:11, 4 November 2020 (UTC)
 * Anything restricted to noncommercíal use isn't free enough for a Wikimedia project. —Mahāgaja · talk 14:43, 4 November 2020 (UTC)
 * I was thinking about this when I read the noncommercial condition on the API site, but as far as I know, content available under fair-use terms is sometimes allowed in Wikimedia projects. Whether this would apply to a Wiktionary template, a Wiktionary bot, and pronunciation data, however, is beyond my knowledge at the moment. I'll try to consult the appropriate sources. Legalities aside, though, I was wondering about the technical feasibility of such a project. Once the vision is adopted and the technical readiness achieved, even if with significantly less comprehensive free equivalents, data availability might follow suit in the future. Assem Khidhr (talk) 15:42, 4 November 2020 (UTC)

template lb produces a wrong link

 * linking to w:Brandenburgisch dialect.
 * linking to w:Brandenburgisch dialect.
 * linking to w:Brandenburgisch dialect.

for example are attested in Friedrich Woeste, "Eine Zwergsage. Mündlich .. in märk. [= märkischer] Mundart" and are from a Westphalian dialect, for which WP only has w:Westphalian language. --Der Zeitmeister (talk) 19:14, 2 November 2020 (UTC)
 * I guess Woeste is using a different definition of Märkisch than we are. If you're sure these words are Westphalian, then just label them Westphalian, rather than Märkisch. Does he say where in the Westphalian Sprachraum his "Märkisch" is/was spoken? —Mahāgaja · talk 16:43, 4 November 2020 (UTC)
 * It relates to Grafschaft Mark (en.wp) and Märkischer Kreis (en.wp); also compare.
 * Westphalian is less specific, hence it's not good, especially as there are many quite different Westphalian dialects, such as: Westmünsterländisch, Münsterländisch, Lippisch, Paderbornisch, Soestisch, ...
 * Using an ambiguous term when there are common and less ambiguous or even unambiguous terms is a bad idea. For Brandenburgisch there are common and less ambiguous terms:
 * Brandenburgisch
 * Märkisch-Brandenburgisch (*Mark-Brandenburgisch might be better but AFAIK doesn't exist)
 * Brandenburgish (English and not German; e.g. The Dialects of Modern German, The Oxford Handbook of Comparative Syntax, poor e-book edition).
 * More specific terms like Nordmärkisch, Mittelmärkisch, Nordbrandenburgisch.
 * --Der Zeitmeister (talk) 10:25, 2 January 2021 (UTC)
 * It seems both dialects get called "Märkisch", so we're probably best off avoiding that term altogether. I've no objection to calling the East Low German lect "Brandenburgisch". As for the Westphalian lect, maybe we could call it "Märkisch Westphalian"? Also, Westfälische Dialekte suggests that Westphalian can be divided roughly into four dialect groups or more narrowly into nine dialects, including Märkisch. Would it be good enough for us to use the four-group scheme instead, and assign Märkisch words to whichever group Märkisch belongs to (I assume it's South Westphalian but I'm not sure)? —Mahāgaja · talk 11:05, 2 January 2021 (UTC)
 * South Westphalian too is too unspecific. Hence, or  should be used, with  (or , , if any is a common name?) for the other.
 * BTW:
 * What can be found:
 * gets only one or two Google Books results, but not related to languages.
 * gets a few results relating to languages (, ; borderline: ).
 * gives some titles with, and there might be some titles with (märkisch-sauerländ. is an abbreviation, märkisch-sauerländem probably a mistake lacking an ).
 * gets a few results (with lack of capital:, ).
 * too.
 * Possibilities:
 * Use an existing term, like Märkisch or Westfälisch-Märkisch.
 * Make up a translation of an existing term, like Märkish or Markish, Westphalian-Markish. Not a good idea, and in English the German terms are often used, e.g. from  and not Münsterlandish (or with Munster, -ic or -ian).
 * Make up a term, like *Märkisch-Westfälisch or *Markish-Westphalian. A bad idea, at least if there are (common) existing terms.
 * --Der Zeitmeister (talk) 19:00, 30 January 2021 (UTC)
 * I do think using a German name is probably best, because there is certainly far too little English-language literature about this dialect for an English name to be established. And I reiterate that Märkisch alone is ambiguous, because Brandenburgisch is also called Märkisch. I could support "Westfälisch-Märkisch" or "Märkisch-Sauerländisch", though. —Mahāgaja · talk 19:12, 30 January 2021 (UTC)

Bot task: change hyphens to en dashes in defdate argument
There is a page Special:WhatLinksHere/Template:tracking/defdate/hyphen listing uses of defdate with hyphens. One is supposed to use an en dash instead of a hyphen to separate centuries. Most of these can be mechanically converted by looking for the pattern digit,digit,t,h,hyphen,digit,digit in the argument. Alternatively, the template itself could do the work. See the next post. Vox Sciurorum (talk) 19:26, 2 November 2020 (UTC)
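The mechanical conversion described above could be sketched in Lua as follows. The helper name is hypothetical; the pattern mirrors the digit,digit,t,h,hyphen,digit,digit sequence (allowing single-digit centuries too):

```lua
-- Rewrite "NNth-NNth" century ranges to use an en dash ("NNth–NNth").
local function fix_defdate(arg)
  -- parentheses drop gsub's second return value (the match count)
  return (string.gsub(arg, "(%d%d?th)%-(%d%d?th)", "%1–%2"))
end
print(fix_defdate("15th-17th c."))  -- prints "15th–17th c."
```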

Improving defdate
On French Wiktionary there is a template named  that takes a century number (in Roman numerals) as argument and adds the appropriate text around it, e.g.   yields (XXe siècle). We could improve the most common use of  by adding a template   with one or two arguments. With one argument it generates the equivalent of. With two arguments it generates a range using the en dash that nobody likes to type. Thoughts? Vox Sciurorum (talk) 19:31, 2 November 2020 (UTC)
 * Good idea, it has been brought up before, but nothing came out of it: . – Jberkel 21:25, 2 November 2020 (UTC)
 * I created century. 3 = [ c.] .  I will attempt to document it.  Vox Sciurorum (talk) 23:57, 2 November 2020 (UTC)
 * Seeing who linked to my just-created template turned up another discussion from 2009: Beer_parlour/2009/September. I like the idea suggested there that the obsolete label should say when the term became obsolete.  And yet, the best is the enemy of the good. Vox Sciurorum (talk) 00:07, 3 November 2020 (UTC)
 * An alternative is to make defdate smarter, so that e.g. it autoconverts hyphens next to numbers to en dashes, and automatically converts e.g. 17c -> 17th c., and 17-19c -> 17th–19th c. If you think this is the right approach, I can probably implement it. Benwing2 (talk) 05:36, 5 November 2020 (UTC)
 * I figured if it hadn't happened in two years it wasn't going to happen. Making defdate smarter is fine, if it doesn't use too much memory. Vox Sciurorum (talk) 14:35, 10 November 2020 (UTC)
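The autoconversion proposed above (17c -> 17th c., 17-19c -> 17th–19th c.) could look roughly like this. This is a sketch, not the actual defdate code; the function names are made up:

```lua
-- Turn a century number into its ordinal form ("17" -> "17th", "21" -> "21st").
local function ordinal(n)
  n = tonumber(n)
  local suffix = "th"
  -- 11th, 12th, 13th are exceptions to the 1/2/3 -> st/nd/rd rule
  if n % 100 < 11 or n % 100 > 13 then
    suffix = ({ [1] = "st", [2] = "nd", [3] = "rd" })[n % 10] or "th"
  end
  return n .. suffix
end

-- Expand compact century notation; leave anything else untouched.
local function expand_century(text)
  local from, to = string.match(text, "^(%d+)%-(%d+)c$")
  if from then
    return ordinal(from) .. "–" .. ordinal(to) .. " c."
  end
  local single = string.match(text, "^(%d+)c$")
  if single then
    return ordinal(single) .. " c."
  end
  return text
end

print(expand_century("17c"))    -- prints "17th c."
print(expand_century("17-19c")) -- prints "17th–19th c."
```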

Strip soft hyphens from links
I unknowingly cut and pasted a word containing U+00AD, a soft hyphen. The character is invisible, so I didn't know I had it, but it caused what should have been a blue link to turn red: instead of. Soft hyphens should be stripped like diacritics are in Old English links ( = ), and they should not be allowed in page names. Vox Sciurorum (talk) 19:05, 3 November 2020 (UTC)
 * I've prohibited the soft hyphen in titles with a title blacklist entry. I get that it'd be convenient if it were automatically stripped from links formatted by Module:links, but I'm uncertain whether to do that because we mostly strip things that are supposed to be in the displayed text and I'm not sure we want soft hyphens. If we stripped soft hyphens, I think it would be done for all languages in  in Module:languages. (Would be nice if the MediaWiki software would do it for us in wikilinks so that plain links would also automatically link to the soft-hyphen-less title.) — Eru·tuon 01:03, 4 November 2020 (UTC)
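For reference, stripping U+00AD at the byte level is straightforward in Lua; where exactly this would hook into Module:languages is the open question above, so this is illustrative only:

```lua
-- U+00AD (soft hyphen) is the two-byte sequence 0xC2 0xAD in UTF-8,
-- so a plain byte-level gsub removes it safely.
local SOFT_HYPHEN = "\194\173"
local function strip_soft_hyphens(title)
  return (string.gsub(title, SOFT_HYPHEN, ""))
end

local pasted = "exam" .. SOFT_HYPHEN .. "ple"
print(strip_soft_hyphens(pasted))  -- prints "example"
```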

Buggy Buginese Template
The template bug-noun isn't displaying Lontara spellings properly. --Apisite (talk) 09:10, 6 November 2020 (UTC)

memory errors
Anyone know any changes made recently resulting in more memory errors? Formerly using would reduce memory by around 13M on average, whereas recently it's not reducing as much memory. I didn't make any changes to the mechanism that would result in this. land and bat are now running over when recently, with, they were well within range. Benwing2 (talk) 07:32, 7 November 2020 (UTC)
 * It's sad to see how little support there is from the WMF regarding our memory problems. Phabricator tickets regularly get closed deferring to "on-wiki" solutions, a ticket regarding better memory profiling support (so we can fix on-site) doesn't move. Plus we're stuck with an ancient version of Lua which has some inherent memory management weaknesses. – Jberkel 10:46, 11 November 2020 (UTC)
 * Do all the things that are done in real time have to be done in real time? Some content may never change if done correctly the first time. Some templated content could be substed. We could decide not to have all translations downloaded with the rest of the entry. We could offload content such as translations to Wikidata and only load the translations on demand. Do we really need  rather than ? DCDuring (talk) 22:53, 11 November 2020 (UTC)
 * The reason for doing some of these in real time and not substing is to avoid things getting messed up or out of sync if the underlying code that the subst is based on is changed. I would rather first see if we can get more creative with memory usage. For example, I was able to cut around 6MB on average out of pages with large translation tables by changing the way the language data is loaded, and below I propose splitting the language data further for more memory savings. Benwing2 (talk) 02:20, 16 November 2020 (UTC)
 * The "substing" should be done on a different layer, and already is to some degree: the transcluded pages are cached and don't have to be re-rendered unless dependent modules change. I like the idea of moving some data to Wikidata; it's better suited for the job than the huge blobs of JSON/mediawiki markup and can be queried as needed. – Jberkel 08:25, 16 November 2020 (UTC)
 * It's a high price we will be paying for the advantage. Shifting the cost to WM by increasing the memory limit to, say, 75 MB is nice for us, but isn't that a cost for Wikimedia that is proportional to the number of Wiktionary users online at a particular time? I assume that  is on SSDs. Do we know how many users and how many wiktionary pages they have open on average? Is it just windows that have focus that matter? And how many items of relevant content get changed in a day? How much would relying on Wikidata slow things down compared to our current Lua-module-dependent approach? DCDuring (talk) 21:43, 16 November 2020 (UTC)
 * The limit is on RAM used to regenerate a page (generally in response to an edit or template/module change). The generated page which consumes SSD space is smaller. Vox Sciurorum (talk) 22:18, 16 November 2020 (UTC)
 * In any event any cleverness in working to operate within the current limit is greatly appreciated. DCDuring (talk) 21:43, 16 November 2020 (UTC)

strange result from template:zh-see on 库
Saw something leak out of using  at the bottom of page 库. It is showing (bold added):
 * For pronunciation and definitions of 库 – see 庫 (“warehouse; storehouse; {{zh-short|庫侖|coulomb; etc.”).

whereas for another use like 车 we see a more benign
 * For pronunciation and definitions of 车 – see 車 (“chariot; cart; land wheeled vehicle; car; etc.”)

Not having looked at this before, apparently Module:zh-see is simply sucking up and repeating the initial definitions from the destination page? At page 庫 we see:
 * # warehouse; storehouse
 * # library
 * # library

So is this module not understanding the use of templates within the initial definition strings? But wait, at 車 we see the early use of templates in initial definitions ( lb ):
 * # chariot; cart
 * # land wheeled vehicle; car

So... is it that Module:zh-see is only looking at the first NN characters, and not noticing an unterminated template reference? The second example 車 has just about 100 chars to get past the included " car ". Checking only 100 chars in the first example 庫 would leave us in the middle of the zh-short template, leaving us pretty ugly looking, yep. Shenme (talk) 22:06, 7 November 2020 (UTC)
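If the guess above is right, one defensive fix is to back the truncation off past any unterminated template invocation. A sketch (not the actual Module:zh-see code; byte-based and assuming templates are not nested — a real implementation would count characters with mw.ustring):

```lua
-- Truncate a definition string to `limit` bytes, then drop any
-- "{{...}}" template invocation the cut left unterminated.
local function safe_truncate(s, limit)
  local cut = string.sub(s, 1, limit)
  local _, opens = string.gsub(cut, "{{", "")
  local _, closes = string.gsub(cut, "}}", "")
  if opens > closes then
    -- chop off the trailing, unclosed "{{..." fragment
    cut = string.gsub(cut, "{{[^{]*$", "")
  end
  return cut
end

local def = "warehouse; storehouse; {{zh-short|coulomb}}; etc."
print(safe_truncate(def, 30))  -- prints "warehouse; storehouse; "
```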

Issue with link to Latin -entia
If you go to the English entry at  and click on the Etymology link to -entia, it takes you to ; not to. It seems there is a redirect from  to , but why? Leasnam (talk) 23:03, 8 November 2020 (UTC)
 * This is not a Grease Pit issue. See Talk:-antia. Chuck Entz (talk) 23:22, 8 November 2020 (UTC)

en.wiktionary costs CPU cycles
Hi there,

my browser's task manager has lately been showing the English Wiktionary tab as gobbling up a lot of resources on my computer, regardless of what page I'm on (like this simple edit page). So either it's my browser, or Wikt. The French Wikt tab does not seem to require as much, although it's not far behind. —Jerome Potts (talk) 12:31, 9 November 2020 (UTC)

Pronunciation templates
Looking at the out-of-memory error on i, I found that removing the various generators of pronunciation, e.g. hu-IPA, which calls Module:hu-pron, allowed several more languages to generate without error. I don't understand what causes the Lua evaluator to use memory, but perhaps there is an optimization to be done in the various IPA modules. Otherwise, what do people think about manually copying the auto-generated pronunciation to reduce module invocations on this large page? (I have a lot of experience debugging memory use in various languages with automatic and manual memory management, but I don't know Lua in particular.) Vox Sciurorum (talk) 14:34, 10 November 2020 (UTC)

Bug with multiword terms category for unsupported titles?
Unsupported_titles/Colon_slash automatically went into Category:Translingual multiword terms. This seems like a mistake. Equinox ◑ 22:36, 10 November 2020 (UTC)
 * I think it's because Module:links/data (which is fully protected) needs to be updated to turn the title from "Colon_slash" to ":/" – Nixinova [T|C] 22:52, 10 November 2020 (UTC)
 * I've added it to Module:links/data, but that doesn't remove it from the category. — Eru·tuon 23:15, 10 November 2020 (UTC)
 * Oh, it happens to all unsupported titles as well -- Category:Translingual multiword terms. Definitely an issue with the categorisation template(s). Thought it was just an issue with this new page. – Nixinova [T|C] 23:20, 10 November 2020 (UTC)
 * I'll look into this and fix it. Benwing2 (talk) 01:31, 12 November 2020 (UTC)
 * I disabled multiword terms in Translingual; too many false positives of various sorts. Benwing2 (talk) 03:48, 12 November 2020 (UTC)

English multiword terms?
And speaking of the new "multiword term" categories, why are there no CAT:English multiword terms? —Mahāgaja · talk 22:46, 10 November 2020 (UTC)
 * This is because I added English to the  list in Module:headword/data. My logic was that there are so many such entries that it's not clear the category is useful; but if you think the category will be useful, I can remove English from the list. Benwing2 (talk) 01:34, 12 November 2020 (UTC)
 * I took English out of the list. Benwing2 (talk) 03:48, 12 November 2020 (UTC)

Special:BookSources obsolete link
I assume one needs some sort of elevated privileges to fix this... clicking on the U.S. Library of Congress link for an ISBN number results in the error "LC Catalog - Legacy Interface Retired"; the correct current URL appears to be

...with "【ISBN】" replaced by the ISBN number. --Struthious Bandersnatch (talk) 19:11, 11 November 2020 (UTC)

mod:ja and lua memory
It seems that a copy of mod:ja/data is made every time mod:ja is loaded:

Is it possible to alleviate the memory problem by improving the code above? -- Huhu9001 (talk) 06:50, 12 November 2020 (UTC)
 * Unlike require, mw.loadData only generates the module table the first time it is called on a page, while every time it is called it generates a special table that allows the code to access the cached module table. The special table takes up memory every time mw.loadData is called, so we can save memory if any templates use Module:ja but don't call a function that accesses Module:ja/data. It can be tested by replacing   in Module:ja with   and then replacing   with  . This will only load the module if needed, so it will save whatever memory is used for  . This technique can be used for other mw.loadData-loaded modules too. — Eru·tuon 00:13, 15 November 2020 (UTC)
 * I made the change to Module:ja described above and previewed 水 with the template sandbox extension (see the diff), but there was no difference in where the out-of-memory error showed up. Oh well. — Eru·tuon 00:18, 15 November 2020 (UTC)
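The lazy-loading change described above follows a standard memoization pattern; schematically (with a stand-in table instead of mw.loadData, which only exists inside Scribunto, so the sketch stays runnable on its own):

```lua
-- Load the data module only on first access, then reuse the cached table.
local data  -- nil until first call
local function get_data()
  if not data then
    data = { kana = "かな" }  -- stand-in for mw.loadData("Module:ja/data")
  end
  return data
end

-- Callers that never need the data never trigger the load; callers
-- that do always get the same cached table back.
print(get_data().kana)  -- prints "かな"
```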

splitting language data
Currently, language data is split by the first letter. However, I think this is not fine-grained enough. I suggest we split further, either by first+second letter, or go all out and put each language in its own module. The latter solution is used by various Asian-language modules and seems to work OK. Benwing2 (talk) 22:59, 14 November 2020 (UTC)
 * BTW for reference, I made a recent change to to only load language modules as needed and use   instead of , and it reduced land by 6 MB, which was enough to get it below the 50MB threshold. (It's not yet enabled on other pages that use .) So there is still a lot of room for optimization in the language modules. Benwing2 (talk) 23:05, 14 November 2020 (UTC)
 * Also, the solution can be adopted anywhere we have a large number of similar templates called. Benwing2 (talk) 23:06, 14 November 2020 (UTC)
 * Any thoughts about this? Benwing2 (talk) 03:34, 17 November 2020 (UTC)
 * I'm not completely opposed to splitting up the language data modules, but I feel like they'd be tedious to edit because there would be many more of them (using the two-letter solution, 602 according to this script, up from 28). We have variables like,  ,   that are used in multiple language data records, and they'd have to be copied to some of the child modules, or moved to fields in a helper module, because actual literal combining diacritics are hard to read and it's hard to ensure that the punctuation remains the same. Having a lot of modules also makes it more work to update multiple languages at once: more pages to edit.
 * As it is, I have already felt a bit overwhelmed even with just 28 modules. At least the modules would be smaller, with no more than 112 languages in each module (currently the maximum is 624 in Module:languages/data3/k according to my "language stuff" page).
 * I'd prefer to edit the larger modules, and automatically create the smaller ones. A bot could dump the language data as Lua using the equivalent of . Trouble is some bot operators would have to maintain the bot and make sure it updates the modules quickly enough that people don't get annoyed that their changes aren't being reflected in pages. A benefit would be that we could split the language data modules however we like, while template editors wouldn't have to learn a new system.
 * In any case, to test this out, a bot would be the easiest method.
 * Historical note: Rua mentioned the splitting-language-data idea in . Earlier, TheDaveRoss suggested it in , and I mentioned using a bot, and DTLHS said that I should volunteer to maintain it. I hadn't written bot scripts then, but now I think I could at least write a script to create the modules; I don't yet know how to make a bot respond to certain pages being edited.
 * I was also thinking the bot could dump the data as JSON so that it's easier for gadgets and bots to use — if they often retrieve data for one or more languages. Then the smaller data modules could just be something like: Just throwing this out there; I'm not sure if it's a good idea. But it'd be easy to do with a bot. — Eru·tuon 09:14, 17 November 2020 (UTC)
 * Splitting into smaller modules by two letters would not work in the long run, if translation tables get larger and have more languages in them. Eventually, every language will be covered. On the other hand, splitting off the data points that translation tables need will reduce that problem. —Rua (mew) 09:38, 17 November 2020 (UTC)
 * Thanks for your comments. I get your point about it being hard to edit 600+ modules if you need to do it by hand. An alternative to having a bot script that autosplits the modules is to do it using Javascript, similar to how Module:languages/canonical names works. I don't know exactly how this is set up, but I imagine it isn't too hard for someone like you who knows Javascript. The Javascript would have to be smart enough to only make changes when needed so it doesn't take too long. If you can set up the basic structure, I can help you improve it; I've worked a bit with Javascript and can figure it out if the infrastructure is already in place. As for things like , this can be handled automatically by the script that does the splitting; it doesn't actually need to create variables like  , but can just use something like   directly wherever needed (or even just inline the character directly), since people shouldn't be manually editing the auto-generated files. Benwing2 (talk) 01:58, 18 November 2020 (UTC)
 * The translation tables are actually not an issue currently, due to (based on a suggestion you made). The problem is more with pages like a, e, i, etc. that have a lot of entries for different languages on them, and here I think splitting the language tables by two characters would really help. I think your alternative suggestion is to split on the other dimension, e.g. split out certain fields like 'otherNames/varieties/aliases' and Wikimedia codes and such. This is definitely possible but it would introduce some of the same issues that splitting by two characters would introduce in terms of making it harder to do edits across modules, and would have an additional IMO negative effect in that the information on individual languages would be split across multiple files. Benwing2 (talk) 02:02, 18 November 2020 (UTC)
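The two-character splitting discussed above is mechanically simple; here is a minimal Python sketch of how a bot might group the language data (the function and module naming are hypothetical, not actual Wiktionary bot code):

```python
def split_by_prefix(languages):
    """Group language data by the first two characters of each code.

    `languages` maps language codes (e.g. "en", "enm", "gsw") to their
    data tables; the result maps each two-character prefix to the
    sub-table of codes sharing it, mirroring hypothetical submodules
    like Module:languages/data/en.
    """
    modules = {}
    for code, data in sorted(languages.items()):
        modules.setdefault(code[:2], {})[code] = data
    return modules
```

A bot would then serialize each sub-table back to Lua (or JSON) and save it under the corresponding subpage, so an entry only loads the prefixes it actually needs.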
 * If we do all of this automatically, it'd be easy to create another set of modules with only some of the language data fields and see how that affects memory usage. I'm doubtful that it would improve things, but it doesn't hurt to try.
 * If we want to use both Python and JavaScript, it would be easiest to have a module that outputs the list of data modules that need to be saved. Then the script only has to call the module function and save the data modules. The module can decide which data modules need to be updated, which is easiest to do from inside the Wiktionary module system.
 * I already tried writing a sandbox module that would bundle the source code for the two-letter-prefix language data modules in JSON format, but it ran into the template include limit, and this limit also applies to the ExpandTemplates API, so the module has to break the response into chunks that are each under 2 MiB. Just recreated the test module at Module:User:Erutuon/split language data modules, but without chunking logic yet.
 * I guess chunking could lead to an inconsistent state in the split-up modules if someone edits another language module between the time one chunk is generated and the next is. The script could retrieve the latest revision timestamp of all the user-edited language data modules before and after the chunks are retrieved. If the before and after timestamps are not the same, it could keep retrieving the chunks, until a certain number of retries has been exceeded. — Eru·tuon 04:08, 19 November 2020 (UTC)
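The before/after timestamp check with a bounded retry could be sketched like this; `get_timestamp` and `get_chunk` stand in for the actual API calls (the names are illustrative only):

```python
def fetch_consistent_chunks(get_timestamp, get_chunk, chunk_count, max_retries=3):
    """Retrieve all chunks, retrying if the source data changed mid-run.

    `get_timestamp` returns the latest revision timestamp of the
    user-edited language data modules; `get_chunk(i)` returns chunk i.
    If the timestamp differs before and after retrieval, a module was
    edited in between, so the chunks may describe an inconsistent
    state and are fetched again, up to `max_retries` attempts.
    """
    for _ in range(max_retries):
        before = get_timestamp()
        chunks = [get_chunk(i) for i in range(chunk_count)]
        if get_timestamp() == before:
            return chunks
    raise RuntimeError("data modules kept changing; giving up")
```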
 * If the module just outputs the language data for each two-letter-prefix module as JSON, it comes well under 2 MiB. (There are a lot of \n and \t in pretty-printed Lua.) So either I need a Lua table pretty-printer for JavaScript or Python, or to just create Lua modules that parse a JSON string or JSON pages that are loaded by Lua modules. The latter currently requires changing the content model, because   pages in the Module namespace are not automatically set to JSON, but that could be done with a script if an admin runs it every time a new JSON page needs to be created. The JSON string method seems the easiest because it doesn't involve Lua pretty-printing or content model changing. — Eru·tuon 10:28, 19 November 2020 (UTC)
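A bot-side sketch of the JSON string method, assuming the generated module simply parses an embedded JSON payload with Scribunto's mw.text.jsonDecode (the exact wrapper format here is an assumption, not the actual sandbox module):

```python
import json

def make_json_data_module(data):
    """Render language data as the source of a Lua module that decodes
    an embedded JSON string, avoiding the need for a Lua table
    pretty-printer on the bot side.
    """
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    # Long-bracket delimiters [==[ ... ]==] mean quotes in the JSON
    # need no escaping; the payload just must not contain "]==]".
    assert "]==]" not in payload
    return "return mw.text.jsonDecode([==[%s]==])\n" % payload
```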

I split the language data under a sandbox module using the JSON string method. I tried out using the alternative data modules by adding  at the beginning of   in Module:languages and entering a title in the "Preview page with this template" box below the edit box and clicking "Show preview". I think this makes it use the split-up modules even though it doesn't load language data with Module:languages. This caused land and white to run out of memory, and fire and black and water to increase their memory usage somewhat. Not what I expected. — Eru·tuon 11:35, 24 November 2020 (UTC)

apihighlimits question
May I request apihighlimits to be enabled for my account? Once every few months I'd like to use API:categorymembers on subcategories of Hungarian lemmas (or this category proper). Getting results 200 or 500 at a time and then having to combine them manually can get tedious when the total is closer to the tens of thousands than to the thousands. Or I wonder if I could get a researcher bit (which includes apihighlimits), if that is more convenient for you. Or shall I request it on behalf of my bot account instead? (That account hasn't been active on the English-language Wiktionary, but I made quite a few edits on the Hungarian Wikipedia around a decade ago.) Thank you in advance. Adam78 (talk) 18:44, 16 November 2020 (UTC)
 * The only user groups with the  right in Special:ListGroupRights are administrator or bot, so you'd have to be one of those. If your bot account were botified here, you could download subcategories using that account.
 * But there might be a tool on ToolForge (see the directory) that does what you want, for instance PetScan.
 * You can also generate it from the dump (specifically from the files,  ,  ). I have a program that does this; it creates a JSON map from page title to array of categories that start with a certain prefix. Filtering by the prefix   should get all the pages in the subcategories of Category:Hungarian lemmas as well as some other categories. (The result looks like  .) If that sounds useful for your purposes, I can figure out a way to get the JSON to you (it's 8.9 MB) or explain how to run the program if you want to be able to generate the file yourself. — Eru·tuon 01:01, 17 November 2020 (UTC)
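For anyone combining the 200/500-item batches by hand, the API's continuation mechanism can do the combining in a loop; a Python sketch, with the HTTP call injected as `api_get` so no particular client library is assumed:

```python
def get_all_category_members(api_get, category, limit=500):
    """Collect every member of a category by following MediaWiki API
    continuation, merging the batched responses automatically.

    `api_get(params)` performs one API request and returns the decoded
    JSON response.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    members = []
    while True:
        response = api_get(params)
        members.extend(m["title"] for m in response["query"]["categorymembers"])
        if "continue" not in response:
            return members
        # Feed the continuation parameters (e.g. cmcontinue) back in.
        params.update(response["continue"])
```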

, thank you very much for your reply! PetScan seems to be perfect for all my purposes. – It's a bit odd, though, that when I tried looking up Hungarian lemmas, it produced one fewer result (26,634) than the number of articles listed there (26,635) – and not due to the exact time of the request, because PetScan did have the 10 most recent lemmas given here under "Recent additions to the category", so the difference must lie somewhere else. Non-lemma forms resulted in the same total in PetScan as here, even though the number is higher (38,089). Adam78 (talk) 19:49, 17 November 2020 (UTC)

Missing babel box template
The template Phnx-1  for Phoenician doesn't exist for some reason. Dngweh2s (talk) 20:22, 16 November 2020 (UTC)

Adding labels to comparative/superlative forms
In the adjective "head" line, e.g. as generated by "en-adj" template, I want to list both "more/most" and "-er/-est" forms, but label the latter as less common, or rare, or whatever. I sort of remember seeing this done somewhere, but I can't remember where, and I could be wrong anyway. Does anyone know a recommended way to achieve this, either using "en-adj" or some other way? Mihia (talk) 11:51, 18 November 2020 (UTC)
 * I added support for this in ; use comp_qual, comp2_qual, ... to specify comparative qualifier(s), and sup_qual, sup2_qual to specify superlative qualifiers.
 * Oops. Benwing2 (talk) 01:37, 19 November 2020 (UTC)
 * Great, thanks very much for doing that. Mihia (talk) 21:59, 19 November 2020 (UTC)

Suggestion for Module:la-adj/table: use diagonal split header
For greater clarity and a better appearance, use diagonal split header instead of plain Case / Gender. I don’t know how to make it work in Lua, otherwise I’d implement the change myself. —Born2bgratis (talk) 02:13, 19 November 2020 (UTC)

template:de-conj-strong & template:de-conj-weak
It lacks forms which are quite regular. Some examples where these forms do exist but are missing from the table:, , , , (at least for the indicative). --Schläsinger X (talk) 02:45, 20 November 2020 (UTC)
 * 1st ps. sg. indicative present active: e.g. and  besides  (from )
 * 2nd ps. sg. imperative for words with stem in d: e.g. and  besides  (from )
 * I'm not sure we want to include forms like and . These are informal pronunciations, for sure, but as spellings they may be nonstandard. Benwing2 (talk) 03:10, 20 November 2020 (UTC)
 * Even as spellings they are quite common. And as for the imperative, forms like are already included when the stem doesn't end in d or t (e.g.  has "lauf (du) | laufe (du)"; but  and  only have the form with -e). --Schläsinger X (talk) 11:40, 20 November 2020 (UTC)
 * Because in the imperative the apocopic forms are standard and not considered informal. Apart from that, a final schwa can be clipped virtually anywhere in poetry and in informal or dialect-close language, so there is no point in including these forms in the tables. They should be avoided as noise, so that readers see what the expected standard forms are. And indeed it is useless to create such apostrophic spellings, and I reckon that such verb forms should be deleted on sight, so nonstandard and irregular are these spellings – the expected spelling is . Fay Freak (talk) 23:03, 20 November 2020 (UTC)
 * For verbs with stems in d or t, the endingless imperatives are missing – though even Duden has them: binden, retten. --Schläsinger X (talk) 10:04, 21 November 2020 (UTC)

"Watchlist" facility
When I create a new page, I leave the "watch this page" box ticked. What is this supposed to do? I mean, what is supposed to happen once I have "watched" a page? Should I get notifications or messages relating to that page or what? I have looked at the "Watchlist" page but the list is dominated by MY edits (which I can see on my "Contributions" page and don't need to see again here). How do I turn those off? The presentation of that "filters" list is also thoroughly confusing because many are potentially subsets of others, and it is unclear how they interact. For example, the default settings leave all types of editors unchecked in the list, so wouldn't that exclude all edits of all types? Then there are the notifications of "A link was made from X to Y" that I get, I think when Y is a page I created. This seems to be controlled by the "notification preferences" settings. Is this separate from "Watchlist" or are the two connected? Should I get any notifications as a result of "watchlist" settings? And why are there two separate places to configure watchlists (one under "Preferences" and one under "Watchlist" itself)? I am just totally confused by all this. Can anyone help, or is there anywhere it is clearly explained? Mihia (talk) 20:55, 20 November 2020 (UTC)
 * Whenever my Watchlist kept getting crowded, I just got a new account. I recommend it Darren X. Thorsson (talk) 21:05, 20 November 2020 (UTC)
 * Yes, unfortunately the watchlist feature doesn't seem to scale well, it takes several seconds to load when your list reaches a certain size. – Jberkel 13:02, 21 November 2020 (UTC)
 * Notifications for links are separate from the watchlist. You can turn them off in Special:Preferences under "Page link". When "watch this page" is ticked, the page is added to your watchlist. You shouldn't get notifications for activity related to pages on your watchlist, because that's not an option in preferences. I think with no filters on a particular aspect of an edit it just shows you everything, but as soon as you choose one or more filters that pertain to some aspect, like the user who performed an action, it hides edits that don't match any of those filters. For instance, you can select two filters to show only actions by newcomers and unregistered users. For more information you might try mw:Manual:Watchlist and mw:Help:Watchlist and w:Help:Watchlist. — Eru·tuon 21:48, 20 November 2020 (UTC)
 * Thanks, so, just to confirm, the "Watchlist" is just that list of edits on the "Watchlist" page, where it says "Live updates", and is unconnected with the notification system, right? Do you (or anyone) know a way to turn off my own edits in that list? I thought that the main idea of "watchlist" was that you wanted to see other activity on an article, not your own. Mihia (talk) 21:57, 20 November 2020 (UTC)
 * There's a row of checkboxes: Hide: [] registered users [] anonymous users [] my edits [] bots [] minor edits [] page categorization [] Wikidata. Check my edits.  Vox Sciurorum (talk) 21:59, 20 November 2020 (UTC)
 * Vox Sciurorum's directions work if you've got the earlier version of the watchlist. If you have the newer fancier version, check in "filter changes". There's a filter for "changes by others". You can select it, and if you want to always use it, save it using the bookmark icon inside the "active filters" box. — Eru·tuon 22:04, 20 November 2020 (UTC)
 * Aha, thank you very much. IMO the design of that filter list is HIGHLY confusing. By default "Changes by You" is UNCHECKED, and yet "Changes by You" are by default DISPLAYED. "Go figure", as I believe they say. Mihia (talk) 22:28, 20 November 2020 (UTC)
 * I find I have to experiment on the occasions when I am adjusting the watchlist because the meaning of the filters is not clear. I can never remember whether the filter controls include or exclude. The result is that I rarely adjust the watchlist and the controls to do so are just a waste of screen space. DCDuring (talk) 13:46, 21 November 2020 (UTC)

template:gsw-decl-adj
The template is incorrect, or incomplete and misleading. There are other forms as added in and. Possibly, there are even more as seen in: w:Low Alemannic German (unsourced), w:Walser German, Walliserdeutsch. as creator of the template. --Schläsinger X (talk) 10:09, 21 November 2020 (UTC)
 * To which dialect(s) do the forms in the template belong?
 * How should other forms be added? All in one template (which becomes ugly, messy, confusing, and is incorrect for adjectives which don't exist in all dialects), or with different templates for different sub-dialects?
 * It's based on the "standard" Swiss German, which is a sort of codified mixture based mainly on the dialect of Zurich. I have no idea how other dialects should be treated; I'm just adding what is important and necessary for me (I live in Switzerland), and in practice every valley here does things slightly differently, without exaggeration. In my opinion the best way to illustrate these things is with lots of examples – though this again poses unique problems for Swiss German, which is mainly unwritten. Ƿidsiþ 07:20, 23 November 2020 (UTC)

Font of the headword at "root"
The headword at is appearing in a sans serif font, unlike ordinary entries. I suspect this is because there is a Chinese section (which uses and places the entry into "Category:Chinese terms written in foreign scripts"). I think the change of font is fine if the entry consists of only a Chinese term written in a Latin script, but it doesn't seem like a good idea when there is an English section and other language sections as well. Should this be fixed? — SGconlaw (talk) 12:50, 21 November 2020 (UTC)


 * (Additional discussion: .) - -sche (discuss) 00:56, 25 November 2020 (UTC)
 * Resolved: see "" below. — SGconlaw (talk) 17:16, 29 November 2020 (UTC)

Template:R:Mindat date problem
Creating, I added a MinDat reference and noticed that it said "accessed 29 August 2016", which is wrong, because I accessed it just now, today, and perhaps this entry didn't exist there in 2016. I see that this date is hard-coded into the template. Really I think it should embed "today's date" at the point when a new entry is saved. Or what can we do about this? Equinox ◑ 15:40, 21 November 2020 (UTC)
 * That date seems to have been in the template from the beginning; I'm not sure why. I couldn't figure out how to have the template indicate the date at the point when a new entry is saved, so as a temporary (semi-permanent?) fix I replaced it with the year when the database was launched and the current year. — SGconlaw (talk) 16:25, 21 November 2020 (UTC)

Can we start categorizing archived rfv's, rfd's etc. by language?
It occurred to me that I wanted to look through words that have previously failed rfv to see if I can attest them, but the language isn't currently recorded. Adding a lang parameter to is easy, and the header that is generated when you follow the "+" link on  and  already contains language data; however, that data is simply discarded when you archive it. I don't have permission to edit MediaWiki:Gadget-aWa.js for some reason (I edited MediaWiki:Gadget-QQ.js in the past, but I seem to have lost that permission at some point?).__Gamren (talk) 20:46, 21 November 2020 (UTC)
 * Sure., Gamren should be made an interface admin. —Μετάknowledge discuss/deeds 21:01, 21 November 2020 (UTC)
 * Received; thanks Chuck.__Gamren (talk) 14:22, 22 November 2020 (UTC)

Chinese section on out (etc?) causing top of page to change font
Go to "outside" and look at the appearance of the word "outside" at the very top of the page, above the TOC: it appears, for me, in a relatively smaller and 'fancier' serif font. Now look at "out": it appears in a different, relatively larger and simpler / sans-serif font. Previewing individual language sections by themselves, I have found that this is due to something in the Chinese L2 section. What should be changed so that the Chinese modules/templates do not change the display of top-of-page Latin-script text? - -sche (discuss) 21:54, 24 November 2020 (UTC)
 * This is the same issue as brought up at above. —Mahāgaja · talk 22:07, 24 November 2020 (UTC)
 * I see. Until a better fix can be had, I've fixed it by removing the headword-line template and replacing it with bare text + manual categorization. - -sche (discuss) 00:52, 25 November 2020 (UTC)
 * Update: judging by all right, and then tested on out and root, simply using head instead of Chinese-specific templates also seems to produce correct results; I've switched the two aforementioned entries to that format. - -sche (discuss) 00:55, 25 November 2020 (UTC).
 * Great, thanks. This may need to be documented at the Chinese-specific templates (that is, don’t use these templates if there are other language sections in the entry?). — SGconlaw (talk) 04:37, 25 November 2020 (UTC)
 * Resolved: see "" below. — SGconlaw (talk) 17:16, 29 November 2020 (UTC)

No italics for ikt mentions
On my Mac (OS 10.15.7, Firefox 83.0) links and mentions for Inuvialuktun (code ikt) are formatted identically and use a font that doesn't match anything else. Compare, , ,. I see italics, roman, roman, roman with the last two in a different font than the first two. Is there anything special about Inuvialuktun that calls for a different font? If so, it needs to provide both roman and italic forms. Vox Sciurorum (talk) 15:17, 26 November 2020 (UTC)
 * The reason is that Inuvialuktun is listed at Module:languages/data3/i as being written only in Canadian Syllabics, not in the Latin alphabet. Should I add Latin as a script for it? —Mahāgaja · talk 17:32, 26 November 2020 (UTC)
 * There is a translation dictionary cited in an RFV thread using Latin script. According to internet searches both scripts are in use.  Vox Sciurorum (talk) 18:51, 26 November 2020 (UTC)
 * OK, I've added Latin script for Inuvialuktun, and just look at the words in your original post now! —Mahāgaja · talk 19:24, 26 November 2020 (UTC)
 * It's a miracle! It also pains me to see.  I used to work for a company that sold a cloud service where we kept all your documents (3d models) on our server.  One of our basic rules was old versions of a document should render the same forever even if the generating functions (equivalent to templates) changed later.  We did this by linking each document to the specific version of a template being used.  There was a task that would generate new versions of documents with links updated, but the old version was always available.  We never had the problem seen on Wiktionary where you go to an old version of page and see a sea of red due to an incompatible template change.  Vox Sciurorum (talk) 19:29, 26 November 2020 (UTC)
 * Now I'm seeing a funny font for Inuktitut, code iu, which does have "Latn" (not Latn) in the script list. iu = 🇨🇬.  Vox Sciurorum (talk) 19:35, 26 November 2020 (UTC)
 * Yeah, that happens for several languages. I don't know how to fix it; it doesn't seem to have anything to do with the language's entry in Module:languages. —Mahāgaja · talk 20:03, 26 November 2020 (UTC)
 * Looks like it's a combination of (1) using a Mac, (2) the directive "html,body{font-family:sans-serif}" in the first stylesheet loaded by the head section of HTML, (3) lang="iu" in the &lt;i/&gt; surrounding the mentioned word. I wonder if this is a problem with Apple's font handling.  It happens with both Firefox and Safari on Mac, not with Chrome on Chrome OS.  Does Firefox use WebKit? Vox Sciurorum (talk) 20:34, 26 November 2020 (UTC)
 * I don't know, but the fonts are different for me too, and I use Windows, not Mac. —Mahāgaja · talk 21:06, 26 November 2020 (UTC)

Bug report (WikiHiero)
Just noticed this bug on the Egyptian nḫt-nb.f page. The word's WikiHiero code reads as follows: n:xt-x*t:D40-nb:f, which appears correct. The hieroglyphs, however, render as instead of . Looks to me like WikiHiero automatically added a couple of extra letters when it ligatured the first cluster. If anyone here has access to the back end, you might want to check it out. (While I could just edit the nḫt-nb.f page, I thought I'd bring the issue up since it might be affecting other pages too. Not sure why the n:xt was even ligatured to begin with, since it's not n&xt.) 2601:49:C301:D810:B163:5C7E:AE44:4CEE 00:50, 27 November 2020 (UTC)
 * —Μετάknowledge discuss/deeds 03:10, 27 November 2020 (UTC)
 * WikiHiero unfortunately tries to ligate glyphs separated by  and not just   as you’d expect; you have to use   or   instead to force a non-ligature. This is a product of bad logic in the main WikiHiero code here, where it converts all separators into   when testing for available ligature images. Rarely, as with , it inappropriately expands ligatures if it has an image with the right name available. In this case the cause of the issue is that the image   is named   in the WikiHiero code repository instead of being named   as it should be. Unfortunately we don’t have control over that at Wiktionary; you’d have to file a bug report at Phabricator to get it fixed. It’s annoying, but it’s a known bug, and most of the uses of   on Wiktionary are intentional, done with the bug in mind. You’re right that the one at  isn’t, though; it should be  . I’ve gone ahead and fixed it in the entry. — Vorziblix (talk · contribs) 03:58, 27 November 2020 (UTC)
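The separator-normalizing lookup described above can be illustrated like this (a loose reconstruction of the behavior, not the actual WikiHiero PHP source; the image table is made up):

```python
def render_group(glyphs, separator, ligature_images):
    """Illustrate the ligature lookup described above: the glyph codes
    are joined with "&" regardless of the original separator when
    checking for a ligature image, so "n:xt" and "n&xt" hit the same
    table entry.
    """
    key = "&".join(glyphs)           # the separator (":" or "&") is discarded
    if key in ligature_images:
        return ligature_images[key]  # ligature image wins even for ":"
    return separator.join(glyphs)    # otherwise keep the glyphs separate
```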
 * Ok, thanks! 2601:49:C301:D810:9108:9228:94E:3E1F 12:26, 27 November 2020 (UTC)

Weird styling applied to the entry title at check
Using the browser's element inspector, it is easy to verify that the "Hani" class is the culprit. I think we should take action right now or else we might have a "Hani" Wikidemic soon. Dixtosa (talk) 12:11, 28 November 2020 (UTC)
 * I've changed the Chinese section to use zh instead of zh-verb as a temporary workaround. Vox Sciurorum (talk) 13:52, 28 November 2020 (UTC)
 * This issue has now been mentioned on this page three times. Should we document your workaround on the Chinese template pages (i.e., use instead of this template if there are other language sections in the entry)? — SGconlaw (talk) 17:00, 28 November 2020 (UTC)
 * Rather, "use head instead of this template if the word is written in the Latin alphabet". There's no reason not to use zh-verb on pages written in Hanzi that also have Korean, Japanese, or Vietnamese entries. —Mahāgaja · talk 17:21, 28 November 2020 (UTC)
 * Ha ha, good point. — SGconlaw (talk) 18:26, 28 November 2020 (UTC)
 * Or better yet, change the module that adds "Hani" so it doesn't do so when the headword isn't in Han characters. Chuck Entz (talk) 17:37, 28 November 2020 (UTC)
 * can what Chuck suggested be implemented? — SGconlaw (talk) 18:26, 28 November 2020 (UTC)
 * I've implemented a similar fix: Module:headword will only add a display title if the title is not all-ASCII. That fixes the cases mentioned so far, but wouldn't prevent a non-completely-ASCII Latin title from being formatted as Hani. I'll see if I can find any such cases and edit the module if I do. — Eru·tuon 21:15, 28 November 2020 (UTC)
 * Okay, so here are all the script-tagged display titles where the title contains Latin characters (from a SQL query). A few contain non-ASCII Latin characters (as defined by Unicode, not by Module:scripts/data):
 * ,, and   are errors of script recognition; Eastern Cham, Tai Dam, and Akkadian are defined as only using one script each, so  ,  , and   respectively are assigned with no checking. This can be fixed by adding   to the list of scripts for those languages (as all of them use Latin script fairly often: search User:Erutuon/scripts in link templates for their language codes,  ,  , and  ).   is manually assigned and the   is not an error. — Eru·tuon 22:49, 28 November 2020 (UTC)
 * Fixed Eastern Cham and Tai Dam by adding Latin to their scripts, and added Latin to the scripts of Akkadian as well, because Akkadian in link templates does sometimes use Latin script according to User:Erutuon/scripts in link templates. But that doesn't fix the display title in pa-rá-su-um, because head in that entry uses 𒉺𒁺𒋢𒌝 and the display-title logic in Module:headword doesn't check whether the script of the head parameter matches the script of the title. I could just change the ASCII check above to a check for Latin script, punctuation, or whitespace, but I'm not sure that's the best solution when the actual problem is that the script in the display title might match the head parameter rather than the title. — Eru·tuon 23:07, 28 November 2020 (UTC)
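The all-ASCII check, and the stricter "Latin script or punctuation or whitespace" variant considered above, might look like this in Python (the Lua module presumably consults its own script data rather than Unicode character names, so this is only an approximation):

```python
import unicodedata

def needs_display_title(title):
    """Sketch of the check described above: skip the script-tagged
    display title for all-ASCII page names, and also for titles made
    up only of Latin-script characters, punctuation, and whitespace
    (so a title like "pa-rá-su-um" is caught too).
    """
    if title.isascii():
        return False
    return not all(
        ch.isspace()
        or unicodedata.category(ch).startswith("P")
        or "LATIN" in unicodedata.name(ch, "")
        for ch in title
    )
```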
 * If there's only one entry where the pagetitle is Latin script but the head= is cuneiform, that suggests that that entry is doing something nonstandard and needs to be changed, rather than that we need a bigger change to "handle" it. Looking at the edit history, I see the page had been at the cuneiform title before Tom 144 moved it, and I recall that Votes/2019-05/Lemmatizing Akkadian words in their transliteration passed, but I see it has mostly not been implemented, due to various problems with it. Tom or Victar might know what should be done with this entry, which might include moving it back to cuneiform. - -sche (discuss) 23:28, 28 November 2020 (UTC)
 * There were also a few Akkadian links with Latin script, which can be seen here. I moved them to tr if they have hyphens, otherwise to ts, and reverted my edit to the language data. That was the right way to handle this. Someone else will have to deal with the Latin-script Akkadian title. — Eru·tuon 05:05, 29 November 2020 (UTC)
 * Great! Thanks, . — SGconlaw (talk) 17:16, 29 November 2020 (UTC)


 * @Dixtosa: I can't believe you didn't go with "Handemic". 😂 - -sche (discuss) 20:19, 28 November 2020 (UTC)

Additional blank line at swack
What causes the extra blank line before Etymology 3? Is inserting a spurious blank line? Equinox ◑ 16:32, 29 November 2020 (UTC)
 * I fixed it by removing a newline from R:Partridge 1984. Vox Sciurorum (talk) 16:41, 29 November 2020 (UTC)

Template:it-prep phrase
Can someone create this please? It's just a headword-line template, but I don't have much knowledge of how to code these. Imetsia (talk) 01:28, 30 November 2020 (UTC)
 * Would it make anything easier that would be more complicated with it? If not, there's no particular need for a language-specific template. —Mahāgaja · talk 08:17, 30 November 2020 (UTC)
 * Many of the major languages have the template (Template:en-prep phrase, Template:fr-prep phrase, etc.), so it seems fitting that Italian have it too. It does nothing more than save a few characters and templatize the headline. It's not strictly necessary, but nice to have. Imetsia (talk) 17:26, 30 November 2020 (UTC)
 * The French headword templates invoke Module:fr-headword, which, thanks to, automatically splits terms containing an apostrophe or a hyphen. This allows me to do edits like.
 * If Module:it-headword was able to do the same, Italian-specific headword templates invoking it would be an improvement over the generic templates. PUC – 12:09, 2 August 2021 (UTC)

Ban variation selectors in page names
User:Justinrleung recently nominated for deletion the page 次󠄁, which is 次 plus the Unicode variation selector U+FE01 (the URL shows as /次%F3%A0%84%81 in my browser, using the UTF-8 encoding of U+FE01). U+FE00 to U+FE0F should be prohibited in page names. According to Wikipedia, they should only be used when they do not change the meaning. Vox Sciurorum (talk) 10:15, 30 November 2020 (UTC)
 * For reference, titles with variation selectors as of the 2020-11-20 dump are listed at User:Erutuon/lists/variation selectors. At the moment, mainspace pages only use the emoji-related variation selectors (VS15, VS16: U+FE0E, U+FE0F), apart from the pages for the variation selectors themselves, which just redirect to Appendix:Control characters. And most are just emoji (marked by VS16) or non-emoji (marked by VS15) versions of characters that redirect to the entry for the character. — Eru·tuon 11:06, 30 November 2020 (UTC)
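A filter along the lines proposed here only has to scan titles for the two variation selector blocks; for example (the VS numbering follows the Unicode character names, e.g. U+FE01 is VS2 and U+E0100 is VS17):

```python
def variation_selectors(title):
    """Return the variation selectors (VS1–VS16 at U+FE00–U+FE0F, plus
    the supplementary block VS17–VS256 at U+E0100–U+E01EF) found in a
    page title, so a filter could flag or strip them before a page is
    created.
    """
    return [
        "VS%d" % (ord(ch) - 0xFE00 + 1) if ord(ch) <= 0xFE0F
        else "VS%d" % (ord(ch) - 0xE0100 + 17)
        for ch in title
        if 0xFE00 <= ord(ch) <= 0xFE0F or 0xE0100 <= ord(ch) <= 0xE01EF
    ]
```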
 * 次+VS2 exists (linked above) but is not on your list. Vox Sciurorum (talk) 11:16, 30 November 2020 (UTC)
 * Right, that's a recent addition and my list reflects the state of affairs on November 20th. — Eru·tuon 11:19, 30 November 2020 (UTC)
 * Hmm, apparently I wasn't checking for titles with code points in the block, but there weren't any before November 30th when 次 was created. — Eru·tuon 20:23, 3 December 2020 (UTC)
 * As the employment of variation selectors, however uncommon in practice, is legitimate under various circumstances (obviously they exist for linguistic content), one should be averse to a ban. That “they should only be used when they do not change the meaning” is doubtful:  and  are not different in meaning either, yet the distinction exists. Editors are expected to exhaust the possibilities of Unicode to represent writing as it appears on paper. It’s disheartening if entries are deleted or banned beforehand because they are “too correct”, and in general one cannot use Unicode correctly because it exceeds the vulgar’s computer education. Fay Freak (talk) 11:48, 30 November 2020 (UTC)
 * Chiming in late as I do some maintenance work on RFDs.
 * One technical approach that might work as a partial ban would be to ban the variation selectors when used in combination with certain codepoint ranges. So far as I'm aware, these selectors should never be used in conjunction with any CJK string, so that's a sizable chunk that could be implemented right there.  ‑‑ Eiríkr Útlendi │Tala við mig 17:57, 10 February 2021 (UTC)

Definition API
I am trying to figure out how to use the Wiktionary API to get the definition markup for a word. I found the API documentation and know how to get lists of words and information about words, just not the actual definitions. I have also done a web search and could find no answer other than screen scraping. I even know how to get Wikipedia content; frustratingly, I can get everything other than what I want.

My real goal, if it matters, is to find the year of first use of a word, which I would get by looking at the years of all the illustrative quotations. I know that there is also the Citations: namespace. Do I need to look there as well, or is it fairly reliable that the earliest known use will always be in the main entry?

Matchups (talk) 13:54, 30 November 2020 (UTC)

P.S. Should API and help desk have hatnote references to the appropriate project pages, as help does?
 * There isn't a real API for Wiktionary; there's only a somewhat unmaintained endpoint used by the Wikipedia app: /page/definition/{term}. It does not return any quotations, however. I'm not sure the quotations on Wiktionary are really useful for what you're trying to do: a word might have been used in a lot of other contexts before the earliest date listed here. – Jberkel 14:08, 30 November 2020 (UTC)
 * As Jberkel pointed out, the quotations in our entries are insufficient for the purpose you intend. Many entries have no quotations at all, and even for those that do, we are in most cases unable to guarantee that the earliest quotations shown are the first known published occurrences of particular entries, simply because editors have neither the time nor resources to conduct extensive research. (Occasionally when we do know that a particular quotation is the first known occurrence, we indicate this in a note.) — SGconlaw (talk) 18:48, 30 November 2020 (UTC)
 * Looking for the defdate text at the end of a definition would be better, but still not very good. Vox Sciurorum (talk) 18:53, 30 November 2020 (UTC)
 * Indeed, because we don’t consistently add for all senses of all entries. — SGconlaw (talk) 19:18, 30 November 2020 (UTC)
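For anyone going the dump-scraping route anyway, a very rough heuristic for pulling years out of raw wikitext might look like this ({{defdate}} is a real template, but the quotation-line pattern here is a simplifying assumption and will miss many formats):

```python
import re

def earliest_year(wikitext):
    """Rough first-use heuristic: take the smallest plausible year
    found either in {{defdate|...}} templates or in quote templates on
    quotation lines (#* {{quote-...}}). Returns None when nothing is
    found. Only a sketch; real quotation markup varies a lot.
    """
    years = []
    for m in re.finditer(r"\{\{defdate\|[^}]*?(\d{4})", wikitext):
        years.append(int(m.group(1)))
    for m in re.finditer(r"^#\*\s*\{\{quote[^}|]*\|[^}]*?(\d{4})", wikitext, re.M):
        years.append(int(m.group(1)))
    plausible = [y for y in years if 1000 <= y <= 2100]
    return min(plausible) if plausible else None
```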

Thanks, all, for the info. I will do the best I can with what's there, for now. The intent is to use it as one of the inputs to a machine learning process, so even dirty data is better than nothing at all. And perhaps I will also see what I can do to improve some words going forward. Matchups (talk) 01:10, 1 December 2020 (UTC)