User talk:Erutuon/2019

Flag of Portuguese
Hello. I see you are an administrator who deals with flags so maybe you can help me. We currently use only the flag of Portugal to represent Portuguese and here it was requested twice that it be replaced by this flag, which represents Brazil as well (consider that Brazil has twenty times more Portuguese speakers than Portugal). Both requests, the first in May 2016 by myself and the other one in June 2018, have been largely ignored. I'm here to request that change for the third time. - Alumnum (talk) 22:50, 3 January 2019 (UTC)
 * Done, since nobody has objected to it. — Eru·tuon 23:04, 3 January 2019 (UTC)
 * Thanks! - Alumnum (talk) 23:07, 3 January 2019 (UTC)

Change to MediaWiki:Common.js
This is about this change. IE9 and older browsers get grade C support which means our js does not even get to run on them. more info. Giorgi Eufshi (talk) 06:37, 4 January 2019 (UTC)
 * Thank you! I was trying to find that information, but didn't succeed. That makes things easier. I'll remove stuff relating to unsupported versions of IE. — Eru·tuon 06:52, 4 January 2019 (UTC)

do ... end?
I noticed you added a block of code to Module:nyms that begins with do and ends with end, but it doesn't seem to loop at all. Is this some Lua construct I'm not aware of? There's nothing on https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual about it. —Rua (mew) 21:33, 6 January 2019 (UTC)
 * It's a simple block. It's mentioned briefly under . The only effect it has is to make the local variable  inaccessible below where it's actually used. I'm not sure I've ever seen it on Wiktionary before. — Eru·tuon 21:50, 6 January 2019 (UTC)

problem with
At [[rock]] Derived terms the control that expands the list does not appear. Clicking the place it should appear does expand the list. DCDuring (talk) 19:36, 7 January 2019 (UTC)
 * In all of the Derived terms sections that have enough terms in them, the control appears for me and works. The one for "an act of rocking" doesn't have a control because it doesn't have enough terms. Are there any other show–hide things that aren't working, like the translations boxes? — Eru·tuon 20:09, 7 January 2019 (UTC)
 * Others on the same page appear. I wonder if it is some kind of interference from the rhs table of contents or other stuff appearing on the right side. DCDuring (talk) 20:18, 7 January 2019 (UTC)
 * : Which one are you talking about? They all appear for me except Etym 2 noun which doesn't have enough entries to activate it (set for 4 columns). Not a system I am fond of. DonnanZ (talk) 20:30, 7 January 2019 (UTC)
 * The first is the one that doesn't appear for me. The Translation controls appear. DCDuring (talk) 20:38, 7 January 2019 (UTC)
 * I'm doubtful that anything else on the page would interfere with it. On my screen, the image on the right pushes the derived terms over, but the control still appears and works. If you're in mobile mode, the control will not appear. It sounds to me like the JavaScript for the list switcher is not working. I am curious if your browser console has any error messages indicating that this is the case. If you have Firefox or Chrome you can press F12 and select the Console tab to view it. There are usually a lot of annoying warning messages, but maybe the only thing relevant would be a message containing "Error" (or "TypeError", or some variation on that). — Eru·tuon 20:48, 7 January 2019 (UTC)
 * It's OK for me. I had to revise earlier where hyp4 was used to hyp3 because of odd behaviour; it doesn't need four columns anyway. DonnanZ (talk) 21:17, 7 January 2019 (UTC)
 * The problem also arises in Chrome. I found lots of messages, but none showing error. The control to reduce the list does appear, of course well after the right-hand side table of contents. I wonder whether anyone has looked at that gadget in the last few years. DCDuring (talk) 01:49, 8 January 2019 (UTC)
 * Disabling the rhs ToC gadget gave me the control. But I'd rather have the rhs ToC than the control or that configuration. I am also loath to defeat it with JS as we have often had long periods where the JS functionally lagged dreadfully. I've stripped a lot of optional JS away. DCDuring (talk) 02:01, 8 January 2019 (UTC)
 * We are talking about, right? That's one of the templates that I redid recently and that the discussion and vote in November's Beer Parlour was about. I don't have the RHS ToC so I will have to enable that to see if I can reproduce what you're talking about. — Eru·tuon 02:07, 8 January 2019 (UTC)
 * Whew, I enabled the RHS ToC and it is quite horrifying what it does to the control for the derived terms list: pushes it all the way down into the next set of definitions. It's some weird interaction between all the images on the right and the HTML elements in the derived terms list. I'll look into it. — Eru·tuon 02:14, 8 January 2019 (UTC)
 * Okay, it was a CSS issue and I think it will be resolved whenever your browser manages to get ahold of the most recent version of MediaWiki:Common.css. — Eru·tuon 02:37, 8 January 2019 (UTC)
 * Does the rhs ToC gadget have dated, deprecated elements? DCDuring (talk) 03:06, 8 January 2019 (UTC)
 * Thanks. It works. DCDuring (talk) 03:09, 8 January 2019 (UTC)
 * Thank you for pointing out the bug. I've noticed lagginess too. I'm working on making more of the default JavaScript into gadgets loaded by default, which could help if the problem is download time. — Eru·tuon 02:21, 9 January 2019 (UTC)
 * The RHS ToC gadget just involves a very short CSS file, so no HTML elements are changed. The issue was a CSS property that was applied to the control for the derived terms list, which was putting the control into the stack of elements floating on the right side of the page, below the TOC and most of the images. Removing that property fixed the problem. — Eru·tuon 03:17, 8 January 2019 (UTC)


 * could be an issue with WOTD causing problems at the top of the page entry. That code is definitely in there. DonnanZ (talk) 10:42, 8 January 2019 (UTC)
 * This is what I mean, see, which hasn't been modified. There is a workaround which fixes the gap (which I see, maybe you don't) where images and Wikipedia links are placed under the first header, which would be Etymology here. DonnanZ (talk) 00:09, 11 January 2019 (UTC)
 * Where are you seeing a gap? With ToC on the left side, I see the WOTD link under the header, with a reasonably sized gap that is simply due to the bottom padding of the header above it; with ToC on the right side, the WOTD link is pushed down by the ToC but there isn't a significant gap between it and the ToC. — Eru·tuon 00:29, 11 January 2019 (UTC)
 * I have the TOC on the left, with a large gap alongside the image, with Etymology pushed down to the line below the image. It could be my browser doing it, I'm on Windows 10. Hiding the TOC makes no difference. I have discussed this with Sgconlaw before, he now modifies current WOTDs. DonnanZ (talk) 00:42, 11 January 2019 (UTC)
 * I'm using Firefox on Linux Mint. I can get the same effect by adding the CSS properties  or   to the "Etymology" header. Maybe there is a gadget that adds that CSS property to the header. — Eru·tuon 01:01, 11 January 2019 (UTC)
 * I haven't got a clue. I experimented with placing the image on the left, which gets rid of the gap for me, but it had some odd effects, with bullets and numbers showing through the image, so I didn't save it that way. Images default to the right, unless they are modified; and so do wp, swp, wikipedia, so maybe WOTD should too, and not use . I don't know, I'm not a programmer. DonnanZ (talk) 10:32, 11 January 2019 (UTC)


 * I still use IE, as I have all my favourites stored there, but I thought I would try Edge and Orange. No gap on Orange, everything as it should be, but Edge is the same as IE, a massive gap. So I guess it's a Microsoft problem. DonnanZ (talk) 13:05, 11 January 2019 (UTC)
 * Why did I say Orange? I meant Chrome. Oops. DonnanZ (talk) 16:20, 26 January 2019 (UTC)
 * , I think you should know we're discussing this. DonnanZ (talk) 15:14, 11 January 2019 (UTC)
 * Is there anything specific you'd like my input on? — SGconlaw (talk) 16:26, 11 January 2019 (UTC)
 * Not that I can think of at the moment, I was just drawing your attention. I know many editors use Firefox or Orange rather than Microsoft products, but countless other users (passive or otherwise) may use Edge or IE, so we still have to cater for them. DonnanZ (talk) 17:15, 11 January 2019 (UTC)
 * At some point I should go into Windows and see if I can reproduce the problem and figure out a solution. — Eru·tuon 07:34, 15 January 2019 (UTC)
 * OK, I was tempted to modify, but I will leave it as it is for now. DonnanZ (talk) 09:39, 15 January 2019 (UTC)
 * Okay, I'm in Windows and the problem in telemark is the  CSS property on the HTML element that encloses the image (html). For some reason, Microsoft Edge thinks that the property means that the etymology heading has to be below the image, but Firefox and Chrome don't. Just removing the property isn't desirable; then the image appears to the left of the "WOTD" text. — Eru·tuon 18:39, 17 January 2019 (UTC)
 * I take it that means there's nothing you can do. Ironically, IE seems to have "packed up", so I've been using Chrome (in preference to Edge) for the last two days. DonnanZ (talk) 16:28, 26 January 2019 (UTC)
 * Well, I did some web searches and didn't find any references to the issue or any solutions. I don't feel very motivated to do more searching, but I might go back to it. — Eru·tuon 01:01, 27 January 2019 (UTC)

-ύς epic declension
Hi! Could you please have a look here? Thank you very much, --Epìdosis (talk) 10:20, 9 January 2019 (UTC)

Issue with "Template:WOTD" and audio files
Wonder if you can see if I did something wrong. I updated so that it would recognize audio files in the format "File:En-au-[entry].ogg" which Commander Keane has been diligently uploading and inserting into entries. However, it works for some entries and not others. For example, if you look at the January 2019 WOTDs at "Word of the day/Archive/2019/January", the audio file of appears but that of  doesn't. I tried resetting the transcode of File:En-au-Tiggerish.ogg but that didn't make a difference. Any idea what might be going wrong? Thanks. — SGconlaw (talk) 06:47, 15 January 2019 (UTC)
 * Heh, I spent a lot of time looking at the template code and seeing no problems, and then finally edited the section and the audio showed up, so I pressed "refresh" in the upper right hand side of the WOTD box to make the audio show up on the page. It was apparently a caching issue. — Eru·tuon 07:22, 15 January 2019 (UTC)
 * Ohhh. I tried refreshing the entry page and the WOTD archive page, but didn't think it was an issue of the template page needing to be refreshed as well. Thanks for discovering that! — SGconlaw (talk) 07:27, 15 January 2019 (UTC)
 * I mean, I clicked one of the "refresh" buttons in a WOTD box in Word of the day/Archive/2019/January, not in Template:WOTD. That purges the page ( in the URL), different from reloading the browser. — Eru·tuon 07:32, 15 January 2019 (UTC)
 * Ah, I see. — SGconlaw (talk) 07:41, 15 January 2019 (UTC)


 * Whatever you did seems to have speeded up the loading of audio on a page, I had noticed recently there was a delay where you had to wait for it to catch up before the page or entry could be edited. DonnanZ (talk) 11:02, 15 January 2019 (UTC)
 * That's weird. — SGconlaw (talk) 11:06, 15 January 2019 (UTC)

What have you done?
I know you're trying to clean up my (admittedly) poorly constructed category, but you do realize that 糹 is not a triplication? Johnny Shiz (talk) 16:19, 9 February 2019 (UTC)
 * Oops! That should be fixed with . — Eru·tuon 20:46, 9 February 2019 (UTC)

rookie's question
Eru, you don't have to answer this... But if you ever have time: I'm trying to understand Lua (at my age, impossible), which is needed at el.wiktionary, because the last person who could handle it disappeared last year. I know that neither is correct, but which one is worse, the 1st or the 2nd? sarri.greek (talk) 23:56, 13 February 2019 (UTC)
 * Neither of them will work, but I added a version that probably will. — Eru·tuon 01:22, 14 February 2019 (UTC)
 * Αχ, thank you, ευχαριστώ! I'll study it, I promise. I am indebted to you. sarri.greek (talk) 01:43, 14 February 2019 (UTC)
 * I remember when I was just starting to learn Lua. It was pretty hard and I made a lot of mistakes. It was just about my first programming language. These cheat sheets might be helpful: 1, 2. — Eru·tuon 02:21, 14 February 2019 (UTC)
 * Another idea is to try playing around with a, where you can type in code and see the result. There is a console below the editing area in module pages, or you could try the console on the Fengari website (yes, they named it after the Greek word for moon!) which uses a more recent version of Lua and doesn't have MediaWiki-related stuff. — Eru·tuon 02:40, 14 February 2019 (UTC)
 * You are so sweet, and a genius!! I did it! el:Πρότυπο:sarritest and el:Module:sarritest (I only changed 'local' at function). And I will study your links too. I'll try to keep most of the things in simple templates. If something happens to me, there will be no one to continue or correct things. sarri.greek (talk) 05:04, 14 February 2019 (UTC)

Just informing that the Saudi IP has an agenda for removing computing-related senses
Regarding the of the quote on, this follows a long line of removing anything implying usage of Arabic words for computing, regard the history of , , , and others I cannot name off the cuff. The removal of such references may also be the only motivation for layout changes, this IP appears to frequently camouflage removals by changes in other respects of dubious worth. Informing also who have previously tackled this IP. Fay Freak (talk) 23:30, 23 February 2019 (UTC)
 * Ahh, thanks for the information. That makes sense of what the user was doing. — Eru·tuon 23:34, 23 February 2019 (UTC)
 * I don't think I have ever seen this editor communicate, but I have seen them edit-war in the past. They simply undo any edits that counteract theirs. — surjection ⟨?⟩ 23:44, 23 February 2019 (UTC)

proper way to clone a table
Why doesn't mw.clone work on loadData'd tables? What is the error? What is the proper way to clone a table? Maybe table.shallowClone and/or table.deepcopy? Module:parameters should *DEFINITELY* not be side-effecting the params table passed into it; that's bad juju and can lead to all sorts of subtle and hard-to-debug errors. Benwing2 (talk) 01:36, 1 March 2019 (UTC)
 * from Module:table is intended to copy tables loaded with, but when I tried plugging it into your edit, there was a stack overflow. Not sure how that happened. — Eru·tuon 01:40, 1 March 2019 (UTC)
 * The problem with  is that it copies the metatable, and the metatable makes the copied table read-only, and prevents   from writing any keys to it.   allows the metatable not to be copied. — Eru·tuon 01:43, 1 March 2019 (UTC)
 * I see. Well, in this case, either deepcopy without metatable copying or shallowClone should work, as only the top level is being side-effected. Benwing2 (talk) 01:45, 1 March 2019 (UTC)

I am not familiar with the particular standard used here, but I do not believe "alternative" is an accurate term for these terms. I have previously used "alternative form" for slightly different forms of the same word that are more or less equal in the standard language, like "kaitsema" and "kaitsma". In this case, these forms are not entirely equal in meaning. "Jauhatama" for instance is the Võro word, and would not be considered an "Estonian" word by most. While "jahvama" is listed in the ÕS as a dialectal term, I would still not consider it an "alternative form", but rather a dialectal synonym. If used, it carries a dialectal connotation which makes it different from "jahvatama". Strombones (talk) 09:55, 6 March 2019 (UTC)
 * "Alternative forms" is not meant to be very specific or descriptive; it's simply the header that's used (see WT:ALTER) for dialectal forms, as well as quite a few other things. For instance, see the Alternative forms sections of or, which list dialectal forms. The words in the Alternative forms section should ideally be labeled with the name of the dialect or dialects that they belong to. Often the entry for these words will contain a definition line like "alternative form of x" or "{dialect name} form of x".
 * However, I think Võro is a special case; since it is considered a separate language here on Wiktionary (meaning, it uses the "Võro", not the "Estonian" header), a Võro word should probably be linked from some other part of the entry, like the etymology section (as a cognate), though I don't know what would be appropriate in this case. Usually the Alternative forms only contains words that have or will have an entry with the same language header as the current entry. — Eru·tuon 10:11, 6 March 2019 (UTC)
 * Ah, thank you. I misconstrued the heading because I had only seen it being used one way. The thing with Võro terms is that sometimes they are used in standard Estonian for a dialectal "twist" of sorts, along with other non-Võro dialectal words. That's probably irrelevant here, so I think I'll remove the Võro words and just keep "jahvama".Strombones (talk) 14:06, 6 March 2019 (UTC)

grc-noun form
Frankly, I do not think grc-noun form is necessarily preferable to grc. --Dan Polansky (talk) 06:25, 23 March 2019 (UTC)
 * It might be worth a discussion. As noted in the documentation for Module:grc-headword, the module does some things that does not. — Eru·tuon 06:45, 23 March 2019 (UTC)
 * All right, then. You probably mean the following: "This module tracks the monophthongs α, ι, υ (a, i, u) without macrons, breves, circumflexes, or iota subscripts (◌̄, ◌̆, ◌͂, ◌ͅ) with the tracking template grc-headword/ambig, so that length can be marked as policy requires, and it categorizes all Ancient Greek words into categories for accent type, such as Ancient Greek oxytone terms." --Dan Polansky (talk) 06:49, 23 March 2019 (UTC)
 * Yeah. So converting non-lemma entries to use Module:grc-headword is mainly so that these services are provided for non-lemmas as well as lemmas. However, tracking ambiguous vowels and listing terms by accent could be done by analyzing the dump instead. — Eru·tuon 06:55, 23 March 2019 (UTC)
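The dump-analysis alternative mentioned above can be sketched in Python. This is only an illustration, not the logic of Module:grc-headword: the function names, the diphthong list, and the rule for skipping diphthongs are assumptions made for the example. It decomposes a word to NFD and reports each α, ι, or υ that carries none of the four marks quoted from the documentation (macron, breve, circumflex, iota subscript).

```python
import unicodedata

# Combining marks that settle vowel length: macron, breve,
# perispomeni (circumflex), ypogegrammeni (iota subscript).
LENGTH_MARKS = {"\u0304", "\u0306", "\u0342", "\u0345"}
DIAERESIS = "\u0308"
AMBIGUOUS = set("αιυ")
# Simplifying assumption: these letter pairs always form a diphthong
# unless the second letter carries a diaeresis.
DIPHTHONGS = {"αι", "αυ", "ει", "ευ", "οι", "ου", "υι", "ηυ"}


def split_letters(word):
    """Return [base_letter, set_of_combining_marks] pairs for a word."""
    letters = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and letters:
            letters[-1][1].add(ch)
        else:
            letters.append([ch, set()])
    return letters


def ambiguous_vowels(word):
    """List the α, ι, υ monophthongs whose length is not marked."""
    letters = split_letters(word)
    found = []
    for i, (base, marks) in enumerate(letters):
        if base not in AMBIGUOUS or marks & LENGTH_MARKS:
            continue
        # Skip the second element of a diphthong.
        if i > 0 and letters[i - 1][0] + base in DIPHTHONGS and DIAERESIS not in marks:
            continue
        # Skip the first element of a diphthong.
        if i + 1 < len(letters):
            nxt, nxt_marks = letters[i + 1]
            if base + nxt in DIPHTHONGS and DIAERESIS not in nxt_marks:
                continue
        found.append(base)
    return found
```

Run over the page titles or headword parameters extracted from a dump, a function like this would yield the same kind of list that the tracking template collects, without requiring every non-lemma entry to go through the module.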

dot= in form-of templates
Can you rerun your script checking for any of the following templates? Some of them don't end in 'of' (particularly the shortcut aliases). Thanks:

Benwing2 (talk) 05:00, 25 March 2019 (UTC)
 * Sure. I will have to rewrite the program a bit first though so that it can digest a list of templates to look up instances of. — Eru·tuon 05:05, 25 March 2019 (UTC)
 * Done! Now I can make a listing of multiple templates pretty quickly. Do you find it useful to have the wikitext of the templates printed on the page under the titles like this, or no? — Eru·tuon 06:29, 25 March 2019 (UTC)
 * Thank you! This is very cool. Having the wikitext of the templates is useful so I can inspect it, e.g. I wouldn't have thought to look for cases like dot. Benwing2 (talk) 09:53, 25 March 2019 (UTC)
 * Can you make a list of all pages that have a template with the dot param? Thanks! Benwing2 (talk) 01:34, 28 March 2019 (UTC)
 * I went through the previously generated data file and there were only a few matches: B.J., C.I.A., J.C., M.S., steerike. They are already on the list above. — Eru·tuon 02:20, 28 March 2019 (UTC)
 * Oh, I forgot that was on the preceding list. Thanks! Benwing2 (talk) 02:21, 28 March 2019 (UTC)
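For reference, a lookup like the one described in this thread can be approximated with a short Python script. This is a minimal sketch and not the program actually used: the regular expression and the function name are assumptions, and the pattern only handles templates with no nested templates or piped links inside them.

```python
import re

# Matches a template with at least one parameter and no nested braces.
TEMPLATE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def templates_with_param(wikitext, param="dot"):
    """Return (template name, value) for each use of the named parameter."""
    results = []
    for match in TEMPLATE.finditer(wikitext):
        name = match.group(1).strip()
        for part in match.group(2).split("|"):
            key, sep, value = part.partition("=")
            if sep and key.strip() == param:
                results.append((name, value.strip()))
    return results
```

A real run would need a proper wikitext parser to cope with nesting; the sketch is only meant to show the shape of the search.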

καλός
Sir, have it your way, if you must. But, with respect, you need to understand that language and poetic scansion are not the same thing. You seem to be confusing the two. Traditional metric scansion can sometimes do violence to language, forcing it to do abnormal things, but that does not mean that those abnormalities then become part and parcel of normal everyday speech. Thus, we can be quite certain that, while forced to scan κᾱλός while reciting certain kinds of poetry, speakers of Attic-Ionic never said κᾱλός in actual normal speech.

Now, I am not saying that information about the linguistic abnormalities of metric scansion is not germane to the Wiktionary. On the contrary, I think it is very useful to a student of Classical Greek poetry, provided that it is placed in the appropriate context, such as in a Usage Note (as it is now), and stating very clearly that what applies to the traditional metric scansion of poetry does not apply to normal everyday language.

However, that kind of metric information certainly does not belong under Pronunciation. “Epic Greek” scansion is not a Greek dialect, as Doric, Attic, Ionic, Aeolian or Boeotian are. Nor are “certain other cases” forms of language in the way that dialects are.

Perhaps, as a student of Classical Greek poetry (a highly commendable endeavor to be sure), you have little concern for language outside of metric scansion. But I very much doubt that ancient Greeks went about their day reciting Homer all the time (thus saying κᾱλός much more often than κᾰλός). I am certain that they actually spoke their Greek as a real everyday language.

The Wiktionary is primarily a dictionary, not a guide to metric scansion. And the purpose of a dictionary is to record actual language, not the abnormalities of metric scansion. Pasquale (talk) 16:02, 26 March 2019 (UTC)
 * As I understand it, both pronunciations of are inferred from poetry: the short-vowel version from Attic drama and some other types of poetry, the long-vowel version from epic poetry and some other types of poetry. As you say, presumably the usual pronunciation had a short vowel since Attic drama would probably use the usual pronunciation, but the long-vowel pronunciation would have been used when reading Homer. (As the entry points out, the long-vowel pronunciation was probably not the actual Homeric pronunciation since at the time of composition the word would have had a short vowel and a digamma. It was a reinterpretation after the digamma was lost.) But that doesn't matter; here on Wiktionary we show transcriptions for both the usual and unusual vowel lengths. — Eru·tuon 17:41, 26 March 2019 (UTC)


 * Indeed, that's absolutely correct. When it comes to α, ι, and υ, vowel quantity has to be inferred from poetry. But then linguistic analysis takes over. There have been several important volumes written about the phonetics and phonology of Attic-Ionic Greek, as well as other dialects, for over a century now. As a result, we know for certain that the short-vowel pronunciation κᾰλός was, in fact, the standard spoken Attic-Ionic pronunciation, while the long-vowel pronunciation κᾱλός was restricted to scanning certain types of poetry, especially epic poetry, ergo not part of the actual spoken language. Back to what I wrote about the difference between actual language and poetic scansion... Thanks. Pasquale (talk) 21:00, 26 March 2019 (UTC)
 * Yes, and there are a variety of other ways it's inferred, for instance in from the circumflex, or in  from either the circumflex in certain forms or the fact that Ionic had, or from the expected vowel grade in certain formations, or from the fact that a form has undergone compensatory lengthening or quantitative metathesis.
 * Anyway, are you arguing that the long-vowel version would never have actually been used by Athenians, even when reciting Homer, that they would have recited it un-metrically? — Eru·tuon 21:18, 26 March 2019 (UTC)
 * (butting in) I agree with Pasquale that artificial pronunciations that exist solely for the sake of the meter should not be put on an equal footing with the natural, prosaic ones. Chignon – Пучок 21:22, 26 March 2019 (UTC)
 * What do you mean exactly? Marking the artificial pronunciations somehow, not showing them at all? At the moment the κᾱλός pronunciation is labeled as being used in epic poetry.
 * Note that there are some metrically modified forms that have or will have their own entries, like words with other vowel lengthenings, like ε to ει or ο to ου, or with consonants doubled for the sake of meter, or with οω instead of ω. — Eru·tuon 21:41, 26 March 2019 (UTC)
 * I think we can still show the artificial pronunciations, but yes, in my view we should explicitly mark them as artificial. Maybe we could write (artificial lengthening for the sake of the meter) (although that's a bit long) or something like that?
 * As to your second point, yes, I agree that those deserve their own entries of course. But again, let's write explicitly where they come from. Chignon – Пучок 21:55, 26 March 2019 (UTC)
 * Now I'm kind of interested in writing a script that looks through page titles to find words that might be metrical modifications of other words. — Eru·tuon 22:13, 26 March 2019 (UTC)
 * According to Proto-Greek language, "Loss of /h/ and /w/ after a consonant was often accompanied by compensatory lengthening of a preceding vowel." This suggests that this is a regular change and not artificial at all. —Rua (mew) 22:00, 26 March 2019 (UTC)
 * Yes, I think I've encountered that before. Maybe is a bad example, but my point still stands: there are some lengthenings that have no etymological justification. Chignon – Пучок 22:06, 26 March 2019 (UTC)
 * Compensatory lengthening after the loss of after a consonant may be more common in Ionic than Attic, though: *monwos → Attic, Ionic ; *ksenwos → Attic , Ionic . — Eru·tuon 22:10, 26 March 2019 (UTC)
 * So perhaps the longer form is just non-Attic, but really was used somewhere. Given that Homer was an Ionian himself, that seems like the first place to look. —Rua (mew) 22:21, 26 March 2019 (UTC)
 * In reply to your question: "are you arguing that the long-vowel version would never have actually been used by Athenians, even when reciting Homer, that they would have recited it un-metrically?": No, of course, I am not suggesting any such thing. And that's perfectly clear from my previous comments, I believe. What I did say is that information about the linguistic abnormalities of metric scansion is fine, and indeed very useful, as long as it is placed in a Usage Note. But it certainly does not belong under Pronunciation and should be removed from that section. The Pronunciation section strictly references the Attic dialect (and not, for example, the Homeric). Look at what the Pronunciation section says now:
 * In most cases:


 * (5th BCE Attic) IPA: /ka.lós/
 * (1st CE Egyptian) IPA: /kaˈlos/
 * (4th CE Koine) IPA: /kaˈlos/
 * (10th CE Byzantine) IPA: /kaˈlos/
 * (15th CE Constantinopolitan) IPA: /kaˈlos/


 * In epic poetry and in some other cases:


 * (5th BCE Attic) IPA: /kaː.lós/
 * (1st CE Egyptian) IPA: /kaˈlos/
 * (4th CE Koine) IPA: /kaˈlos/
 * (10th CE Byzantine) IPA: /kaˈlos/
 * (15th CE Constantinopolitan) IPA: /kaˈlos/


 * This is just silly and probably incorrect. I repeat, metric scansion is distinct from normal everyday language, and it has a history of its own. An experienced reciter of poetry may well have pronounced /kaː.lós/ with a long ᾱ while scanning Homeric verse, not only in 5th BCE Attic, but also well into 1st CE Egyptian, 4th CE Koine, and even later. On the other hand, there were surely speakers of 5th BCE Attic Greek who never recited Homeric verse in their lives and only ever said /ka.lós/, which we know was the normal everyday pronunciation.


 * Lots of other languages, ancient and modern, offer similar peculiarities of metric scansion, which often shed light on earlier stages of those languages, but synchronically are always artificial. They merit mention in a note, but are not listed as synchronic pronunciation variants. For example, there are numerous words in Sanskrit that have peculiar scansions in Vedic hymns (sometimes only in the oldest hymns of the Rigveda); e.g. स्वर् (svár, but metrically súar or súvar in Vedic hymns), see स्वर्. Or, in Italian, the word oriente, which is normally pronounced in three syllables, but is often pronounced in four syllables in poetry (and may be spelled orïente), see oriente. There are myriad such cases. But, invariably, the synchronically artificial metric scansion is discussed at best in a note, never listed as a normal pronunciation variant. Hope this is clear. Pasquale (talk) 17:24, 27 March 2019 (UTC)

Hmm, I agree that the post-Classical Attic pronunciations of are probably inaccurate: it's not clear when Greek speakers would have stopped reciting Homer metrically. So it would make more sense to only show the Classical Attic pronunciation for any special Epic forms.

As mentioned above, we tend to include in every entry, and some entries are for special metrical forms that are spelled differently from the normal forms. So to apply your preferred policy, pronunciations would have to be removed altogether from certain entries. Another option is to label these forms and to only show the Classical Attic pronunciation, because it is not clear how the pronunciation used in reciting poetry would have evolved, but it is plausible that poetry known to the Athenians would have been recited in something like an Athenian accent. I'm more inclined to the latter because I think it is helpful to provide some kind of transcription of poetic words or pronunciations. — Eru·tuon 18:06, 27 March 2019 (UTC)

Wikipedia links
I am interested in the inline links to Wikipedia that we have enclosed in,  (96K pages of transclusions, mostly multiple), and  (3K). There is an argument to be made that the frequency with which we feel compelled to have such a link is an indication that it might make a useful entry, provided, of course, that it meets CFI.

I could extract these using modifications of the Perl scripts that I use for (15K) and  (25K). Are you aware of any such compilation or of any well-designed code to do the same thing?

Does it make sense to do it from the tool server? How? DCDuring (talk) 23:33, 29 March 2019 (UTC)
 * I haven't worked with the toolserver yet, and am not sure if there are tools for getting statistics on wikilinks, but I generated a list of and  instances and could make a program to find wikilinks if you'd like. I figure it would be faster to generate the data from a list of templates and wikilinks than the whole dump. — Eru·tuon 00:12, 30 March 2019 (UTC)
 * Where are said lists of the template instances?
 * How would I get a count of links that use, which is not a template?
 * I didn't even know about until I was looking for the number of pages that transcluded . My  does a link to WP, but obviously I know all about them. Are there other templates that link to WP?
 * My needs are only for links to English Wikipedia. (I think I've already extirpated all links to Wikispecies other than via .) Ideally, I would like to merge all the lists of instances and get counts. Specifically, for a given link, I would like the wikipedia link and the display. The WP links might include links to headers and there could be different displays for a given WP link. The sort should group instances by WP link page, and subsort by any header links and then by display. The groups should sort by decreasing frequency of the entire group. DCDuring (talk) 01:55, 30 March 2019 (UTC)
 * I probably can do this myself. Please don't spend time on it unless I come begging. DCDuring (talk) 01:57, 30 March 2019 (UTC)
 * I've got the and  files on my computer. Unfortunately the  file is too big to save on a Wiktionary page (about 3.5 MB), the other one a bit smaller and available here. If you'd like the  file, you can send me an email via Special:Emailuser. (I'm not super familiar with file-sharing sites.) — Eru·tuon 02:22, 30 March 2019 (UTC)
 * ✅, I think. DCDuring (talk) 16:30, 30 March 2019 (UTC)
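The grouping and sorting requested in this thread can be sketched in Python. This is an illustration under assumptions, not the scripts either editor used: it assumes the non-template links take the interwiki form `[[w:Page#Section|display]]`, and the function names are invented for the example. Links produced by templates would need to be extracted separately and fed into the same report.

```python
import re
from collections import Counter, defaultdict

# Assumed link shape: [[w:Page]], [[w:Page#Section]], [[w:Page|display]].
WP_LINK = re.compile(r"\[\[\s*w:([^\]|#]*)(?:#([^\]|]*))?(?:\|([^\]]*))?\]\]")


def collect_links(wikitext):
    """Return (page, section, display) for each Wikipedia wikilink."""
    links = []
    for match in WP_LINK.finditer(wikitext):
        page = match.group(1).strip()
        section = (match.group(2) or "").strip()
        # Empty string when the link has no piped display text.
        display = (match.group(3) or "").strip()
        links.append((page, section, display))
    return links


def grouped_report(links):
    """Group by page; order groups by decreasing total frequency.

    Within a group, entries are sorted by section and then by display.
    """
    groups = defaultdict(Counter)
    for page, section, display in links:
        groups[page][(section, display)] += 1
    ordered = sorted(groups.items(),
                     key=lambda item: (-sum(item[1].values()), item[0]))
    return [(page, sum(counts.values()), sorted(counts.items()))
            for page, counts in ordered]
```

Ties between groups of equal size are broken alphabetically by page name, which is a choice made here for reproducible output rather than something specified in the request.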

Not quite done etyl cleanups
Hi, if I edit etyl to show that a certain language is done, but there are actually still a few forms left, please don't undo my edit to the template. I often list language codes as done that don't actually have an "etyl cleanup/xx" subcategory precisely so that I can find the last few stragglers via CAT:E. Obviously if I've really jumped the gun and there are suddenly dozens of pages in CAT:E, you can and should revert me, but if there's only a handful of pages, then please let it be. I'll find and fix the module errors soon. Thanks! —Mahāgaja · talk 14:43, 9 April 2019 (UTC)
 * Sorry about that. I'll stop interfering. I'm not super enthusiastic about using module errors, but it works.
 * If it would help, I can supply lists of pages that had a given language code as of the last dump. I created a folder with separate files for each language code in a format similar to . (It comes to about 4 MB.) I was doing a JWB run through a list like that to clean up the last instances. Usually there are several pages that have already been cleaned up since the dump was generated, so it's not the best for manual editing, unless you have a way to filter out pages that have already been cleaned up. — Eru·tuon 21:06, 9 April 2019 (UTC)

form-of templates: Full information
Thought you might find this useful ... I created a programmatic list of all the non-language-specific form-of templates and their properties. Not sure if you use Python but if you do it should be very easy to fetch whatever you want out of this list. Each template has an associated dict of properties:
 * : List of aliases
 * : List of deprecated aliases (should no longer be used)
 * : If true, template displays a default initial capital and supports nocap
 * : If true, template displays a default final period and supports nodot and dot
 * : If true, template supports from, from2, etc. to specify regional dialects or whatever
 * : If true, template supports POS to control the part of speech of the category
 * : If present, non-language-specific portion of the category to which the page belongs (prepended with the canonical name of the language to form the actual category name); value could potentially be a list of multiple categories, but no such entries exist among the non-language-specific templates

I'm still working on the corresponding language-specific list. These templates are much messier, often work in idiosyncratic ways, and are often defined manually instead of using a function in Module:form of/templates. I'm gradually converting them and cleaning them up.

Benwing2 (talk) 02:48, 10 April 2019 (UTC)
 * Thanks, that's already been useful to help me determine that a list of form-of templates that I made would include one with capitalization. (I've written a little Python that used Pywikibot and mwparserfromhell, but I'm not as familiar with it as with Lua, JavaScript, and C.) — Eru·tuon 04:55, 10 April 2019 (UTC)

Lang-specific form-of templates
Here is my current list of lang-specific form-of templates and their aliases (if there are multiple comma-separated template names listed on a single line, the first one is the canonical name and the remainder are aliases). I haven't gotten around yet to classifying them by behavior, which is difficult in any case because each one is so idiosyncratic and my plan is to obsolete as many as possible.

ar-instance noun of
ar-verbal noun of
be-Taraškievica spelling of
bg-adj form of
bg-noun form of
bg-pre-reform
bg-verb form of
blk-past of
br-noun-mutation of,br-noun-mutated
br-noun-plural
ca-adj form of
ca-form of
ca-verb form of
caret notation of
ceb-superseded spelling of
chm-inflection of
cmn-erhua form of,zh-erhua form of
cu-Glag spelling of
cu-form of
da-e-form of
da-pl-genitive
de-du contraction
de-form-adj
de-form-noun
de-inflected form of
de-superseded spelling of,de-deprecated spelling of
de-umlautless spelling of
de-verb form of
de-zu-infinitive of
egy-alternative transliteration of,egy-alt
egy-verb form of
el-Cretan dialect form of
el-Cypriot dialect form of
el-Italiot dialect form of
el-Katharevousa form of
el-Maniot dialect form of
el-Pontian dialect form of
el-form-of-adv
el-form-of-nounadj,el-form-of-pronoun
el-form-of-verb,el-verb form of
el-monotonic form of
el-participle of
el-polytonic form of
en-archaic second-person singular of
en-archaic second-person singular past of
en-archaic third-person singular of
en-comparative of
en-ing form of
en-irregular plural of
en-past of
en-simple past of
en-superlative of
en-third-person singular of,en-third person singular of
enm-first-person singular of
enm-first/third-person singular past of
enm-inflected form of
enm-plural of
enm-plural past of
enm-plural subjunctive of
enm-plural subjunctive past of
enm-second-person singular of
enm-second-person singular past of
enm-singular subjunctive of
enm-singular subjunctive past of
enm-third-person singular of
eo-form of
eo-root of
es-adj form of
es-compound of
es-note-noun-mf
es-verb form of
es-verb form of/adverbial
es-verb form of/conditional
es-verb form of/imperative
es-verb form of/indicative
es-verb form of/participle
es-verb form of/subjunctive
es-verb form of/subtense-name
es-verb form of/subtense-pronoun
et-nom form of
et-participle of
et-verb form of
fa-adj form of,fa-adj-form
fa-form-verb
ff-fuc-form of
fi-form of
fi-infinitive of
fi-participle of
fi-verb form of
fr-post-1990
fr-pre-1990
fy-pronadv of
ga-emphatic of
ga-lenition of
gl-verb form of
gl-verb form of/conditional
gl-verb form of/doWork
gl-verb form of/error
gl-verb form of/imperative
gl-verb form of/indicative
gl-verb form of/participle
gl-verb form of/pronoun
gl-verb form of/subjunctive
gl-verb form of/subtense-name
gl-verb form of/subtense-pronoun
gmq-bot-verb-form-sup
got-compound of
got-nom form of
got-verb form of
han tu form of,vi-hantu form of
he-adj form of
he-defective spelling of
he-excessive spelling of
he-infinitive of
he-noun form of
he-prep form of
he-verb form of
hi-form-adj
hi-form-adj-verb
hi-form-noun
hi-form-verb
hit-broad transcription of
hit-transliteration of
hu-exaggerated of
hu-inflection of
hu-participle
hy-form-noun
hy-reformed
hy-traditional
ia-form of
ie-past and pp of
io-form of
is-conjugation of
is-inflection of
it-adj form of
iu-spel
ja-form of
ja-kyujitai spelling of,kyu,ja-kyu sp
ja-past of verb
ja-romanization of,ja-romanization-of
ja-te form of verb
ja-verb form of
jbo-rafsi of
jyutping reading of
ka-form of
ka-verb-form-of
ka-verbal for,ka-verbal of
ko-hanja form of,hanja form of
ko-mixed form of
ko-root of
ku-verb form of
la-praenominal abbreviation of
lb-inflected form of
liv-conjugation of
liv-inflection of
liv-participle of
lt-būdinys,lt-budinys
lt-dalyvis-1,lt-dalyvis
lt-dalyvis-2
lt-form-adj
lt-form-adj-is
lt-form-noun
lt-form-part
lt-form-pronoun
lt-form-verb
lt-padalyvis
lt-pusdalyvis
lv-adv form of
lv-comparative of
lv-definite of
lv-inflection of
lv-negative of
lv-participle of
lv-reflexive of
lv-superlative of
lv-verbal noun of
mfe-medial of,mfe-short of
mn-verb form of
morse code abbreviation
morse code for
morse code prosign
mr-form-adj
mt-prep-form
my-ICT of
nb-noun-form-def-gen
nb-noun-form-def-gen-pl
nb-noun-form-indef-gen-pl
nb-noun-form-indef-pl
nl-adj form of
nl-noun form of
nl-pronadv of
nl-verb form of
nn-verb-form of
nn-verb-form-imp
nn-verb-form-past
nn-verb-form-pastpart
nn-verb-form-pre
no-noun-form-def
no-noun-form-def-pl
ofs-nom form of
osx-nom form of
pi-sc
pinyin reading of,pinread,pinof
pt-adj form of
pt-adv form of
pt-apocopic-verb
pt-article form of
pt-cardinal form of
pt-noun form of
pt-obsolete-differential-accent
pt-obsolete-hellenism
pt-obsolete-sc
pt-obsolete-secondary-stress
pt-obsolete-silent-letter-1911
pt-obsolete-éia
pt-obsolete-ôo
pt-obsolete-ü
pt-ordinal form,pt-ordinal def
pt-pron def
pt-pronoun-with-l
pt-pronoun-with-n
pt-superseded-hyphen
pt-superseded-paroxytone
pt-superseded-silent-letter-1990
pt-verb form of
pt-verb-form-of
ro-Cyrillic of
ro-adj-form of,ro-form-adj
ro-form-noun
ro-form-verb
ro-superseded spelling of
roa-opt-noun plural of
ru-abbrev of
ru-acronym of
ru-alt-ё
ru-clipping of
ru-initialism of
ru-participle of
ru-pre-reform
sa-desiderative of,sa-desi
sa-frequentative of,sa-freq
sa-root form of
sce-verb form of
sco-past of
sco-simple past of
sco-third-person singular of
sga-verbnec of
sh-form-noun
sh-form-proper-noun
sh-verb form of,sh-form-verb
sino-vietnamese reading of
sl-form-adj
sl-form-noun
sl-form-verb,sl-verb form of
sl-participle of
sv-adj-form-abs-def
sv-adj-form-abs-def+pl
sv-adj-form-abs-def-m
sv-adj-form-abs-indef-n
sv-adj-form-abs-pl
sv-adj-form-comp
sv-adj-form-comp-pl
sv-adj-form-sup-attr
sv-adj-form-sup-attr-m
sv-adj-form-sup-pred
sv-adj-form-sup-pred-pl
sv-adv-form-comp
sv-adv-form-sup
sv-noun-form-adj
sv-noun-form-def
sv-noun-form-def-gen
sv-noun-form-def-gen-pl
sv-noun-form-def-pl
sv-noun-form-indef-gen
sv-noun-form-indef-gen-pl
sv-noun-form-indef-pl
sv-proper-noun-gen
sv-verb-form-imp
sv-verb-form-inf-pass
sv-verb-form-past
sv-verb-form-past-pass
sv-verb-form-pastpart
sv-verb-form-pre
sv-verb-form-pre-pass
sv-verb-form-prepart
sv-verb-form-pres-pass
sv-verb-form-subjunctive
sv-verb-form-sup
sv-verb-form-sup-pass
sw-adj form of
tg-adj form of,tg-adj-form
tg-form-verb
tl-superseded spelling of
tl-verb form of
tr-copulative form of
tr-inflection of
tr-possessive form of
ug-uly of
ug-uyy of
uk-pre-reform
ur-form-adj
ur-form-noun
ur-form-verb
vi-Nom form of,Nom form of,nomof
xh-combining stem of
yi-alternatively pointed form of
yi-inflected form of
yi-phonetic spelling of
yi-unpointed form of
za-sawndip form of
zh-alt-form
zh-altname,zh-alt-name
zh-altterm,zh-alt-term
zh-misspelling of,zh-misspelling
zh-old-name
zh-only used in,zh-only
zh-original
zh-short,zh-abbrev
zh-subst-char
zh-sum of parts
zh-synonym of,zh-synonym
zu-combining stem of
zu-verb inf of

Benwing2 (talk) 23:55, 13 April 2019 (UTC)
 * I've made a file of instances of these templates (121 MiB!) if you need to do any text analysis on them. — Eru·tuon 21:16, 14 April 2019 (UTC)
 * Thanks! What I actually need currently though is a list of any instances of the inflection tag "mp" in ; i.e. any cases where "mp" (possibly with spaces on either end) occurs in param 3 or greater in a call to . BTW I missed two templates in the list above (now corrected): Template:he-infinitive of (I just forgot it) and Template:fy-pronadv of (recently added). Benwing2 (talk) 21:24, 14 April 2019 (UTC)
 * Okay, the list of  containing  . There shouldn't be many cases in which   isn't a grammar label because it isn't a language code and isn't very likely to be a word, and no instances with explicitly numbered parameters include   as a grammar tag. — Eru·tuon 21:56, 14 April 2019 (UTC)
 * Thanks! Benwing2 (talk) 22:12, 14 April 2019 (UTC)
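The search described in this exchange (find calls where "mp", possibly padded with spaces, occurs in parameter 3 or greater) could be sketched in Python as below. This is a simplified illustration, not the program actually used: it only handles flat template calls with no nested templates or piped links, and the template name `inflection of` in the sample is an assumption, since the name is not preserved in the text above.

```python
import re

# Matches innermost template calls only (no nested braces inside).
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def numbered_params(call_body):
    """Split 'name|a|b|k=v' into (name, {index: value}) for numbered parameters."""
    name, *args = call_body.split("|")
    params = {}
    next_index = 1
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key.strip().isdigit():
            # Explicitly numbered parameter such as 4=mp.
            params[int(key.strip())] = value.strip()
        elif sep:
            continue  # named parameter such as t=..., not a tag slot
        else:
            params[next_index] = arg.strip()
            next_index += 1
    return name.strip(), params


def calls_with_tag(wikitext, template, tag, min_index=3):
    """Return the template calls that carry `tag` in parameter `min_index` or later."""
    hits = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name, params = numbered_params(match.group(1))
        if name != template:
            continue
        if any(index >= min_index and value == tag for index, value in params.items()):
            hits.append(match.group(0))
    return hits


# Made-up sample lines for illustration.
text = (
    "# {{inflection of|la|bonus||dat| mp }}\n"
    "# {{inflection of|la|mp||nom|s}}\n"
    "# {{inflection of|la|bonus||nom|m|p|t=mp}}\n"
)
found = calls_with_tag(text, "inflection of", "mp")
```

Only the first sample line is reported: in the second, "mp" sits in parameter 2, and in the third it is the value of a named parameter.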

Scripts scripts scripts
BTW as part of my cleanup of the lang-specific form-of templates I wrote some general scripts to rewrite templates in various ways. One of them lets you do fairly simple things like rename templates or remove or rename parameters using command-line arguments; e.g. I used the following:

to rename to  and remove the lang parameter, with a filter added saying to operate only when da, for safety's sake. Another one lets you specify complex rewrite specifications in code. An example is for rewriting to  (this latter template doesn't exist yet but it will):

 ("et-verb form of", (
   # The template code supports m=ptc and categorizes specially, but
   # it never occurs.
   "Inflection of",
   ("error-if", ("present-except", ["1", "p", "m", "t"])),
   ("set", "1", [
     "et",
     ("copy", "1"),
     "",
     ("lookup", "p", {
       "1s": ["1", "s"],
       "2s": ["2", "s"],
       "3s": ["3", "s"],
       "1p": ["1", "p"],
       "2p": ["2", "p"],
       "3p": ["3", "p"],
       "pass": "pass",
       "": [],
     }),
     ("lookup", "m", {
       "pres": "pres",
       "past": "past",
       "cond": "cond",
       "impr": "impr",
       "quot": "quot",
       "": [],
     }),
     ("lookup", "t", {
       "da": "da-infinitive",
       "conn": "conn",
       "": [],
     }),
   ]),
 )),

This will, for example, rewrite  to, but will complain and refuse to do anything if it sees an unfamiliar parameter or an unexpected value for a known parameter. I also have lots of other scripts to do things like regex-based lookups and rewrites, lists of pages in a given category or namespace or referencing a given page, etc. All of these scripts operate online, although most of them can be passed a list of pages to operate on, making it possible to interface them with scripts that search through a dump. If you're interested, I can make these scripts available. Benwing2 (talk) 22:34, 14 April 2019 (UTC)
 * Hi again! Now that I have a bot these scripts would be very useful. I made a script to swap parameters in or move a numbered parameter to a named one, and realized I might have saved some effort by using your scripts instead, because it turned out to be more complex than I thought. — Eru·tuon 19:59, 23 December 2019 (UTC)
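To make the rewrite specification quoted above more concrete, here is one way such a spec might be interpreted. This is a guess at the semantics, not Benwing2's actual script: it assumes that "error-if"/"present-except" rejects any parameter outside the listed ones, that "set" fills numbered parameters starting at the given index, that "copy" copies a parameter value, and that "lookup" maps a parameter value through a table (with "" standing for an absent parameter). The function names `apply_spec` and `expand` and the sample verb are hypothetical.

```python
def expand(item, params):
    """Expand one element of a "set" list into zero or more output values."""
    if isinstance(item, str):
        return [item]  # literal value
    op = item[0]
    if op == "copy":
        return [params[item[1]]]
    if op == "lookup":
        _, name, table = item
        value = params.get(name, "")
        if value not in table:
            raise ValueError(f"unexpected value {value!r} for parameter {name!r}")
        result = table[value]
        return [result] if isinstance(result, str) else list(result)
    raise ValueError(f"unknown item {item!r}")


def apply_spec(spec, params):
    """Apply (old_name, (new_name, *directives)) to a dict of template parameters."""
    _old_name, (new_name, *directives) = spec
    new_params = {}
    for directive in directives:
        op = directive[0]
        if op == "error-if":
            kind, allowed = directive[1]
            if kind != "present-except":
                raise ValueError(f"unknown condition {kind!r}")
            extra = sorted(set(params) - set(allowed))
            if extra:
                raise ValueError(f"unfamiliar parameters: {extra}")
        elif op == "set":
            _, start, items = directive
            values = [v for item in items for v in expand(item, params)]
            for offset, value in enumerate(values):
                new_params[str(int(start) + offset)] = value
        else:
            raise ValueError(f"unknown directive {op!r}")
    return new_name, new_params


# Shortened version of the spec quoted above.
SPEC = ("et-verb form of", (
    "Inflection of",
    ("error-if", ("present-except", ["1", "p", "m", "t"])),
    ("set", "1", [
        "et",
        ("copy", "1"),
        "",
        ("lookup", "p", {"3s": ["3", "s"], "pass": "pass", "": []}),
        ("lookup", "m", {"pres": "pres", "past": "past", "": []}),
        ("lookup", "t", {"da": "da-infinitive", "": []}),
    ]),
))

# Hypothetical input parameters, for illustration only.
name, args = apply_spec(SPEC, {"1": "tegema", "p": "3s", "m": "pres"})
```

Raising an error instead of guessing mirrors the "complain and refuse to do anything" behaviour described above.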

Your Latin>Cyrillic edits
Hi Erutuon, I appreciate your Latin>Cyrillic edits for the terms in Turkic languages.

Just wanted to ask: are you sure those terms prior to your edits were actually typed using Latin characters? Each time I took the effort to use the actual Cyrillic characters using the respective character sets. If so, then I will have those character sets corrected.

Regards, Borovi4ok (talk) 09:07, 19 April 2019 (UTC)
 * I'm quite sure. My program uses regular expressions to find words with non-Cyrillic characters and a set of replacements based on the  here to automatically replace Latin characters with Cyrillic. (I also sometimes verify using a program that I paste text into to see the names of the characters.) If you have trouble finding the characters, you can use the "Cyrillic" menu under the edit box (also available here) as a reference; all the letters there are Cyrillic except in the "Transliteration" section. — Eru·tuon 09:36, 19 April 2019 (UTC)

Thanks. I actually routinely use the "Cyrillic" menu under the edit box. So I am confused now. Can I be sure that it actually has all the correct characters in it? Borovi4ok (talk) 10:12, 19 April 2019 (UTC)
 * Yeah, I just checked the letters in the Cyrillic menu and they're all Cyrillic, except the ones in the Transliteration section. If you used the menu, I don't know how you could have been adding the Latin lookalikes. — Eru·tuon 19:57, 19 April 2019 (UTC)
 * I wish there was a way to access the edit tools when using the translation adding tool. — SGconlaw (talk) 03:51, 20 April 2019 (UTC)
 * Thanks, Erutuon! Borovi4ok (talk) 07:24, 22 April 2019 (UTC)
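The approach Erutuon describes in this thread (a regular expression to find words that mix scripts, plus a replacement table for Latin lookalikes, and a check of character names) could look roughly like this in Python. The mapping below is a small illustrative subset chosen here, not the actual replacement set, and the function names are hypothetical.

```python
import re
import unicodedata

# Latin letters mapped to visually similar Cyrillic letters (illustrative subset).
LOOKALIKES = str.maketrans({
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
})

WORD_RE = re.compile(r"\w+")
CYRILLIC_RE = re.compile(r"[\u0400-\u04ff]")
LATIN_RE = re.compile(r"[A-Za-z]")


def fix_mixed_script(text):
    """Replace Latin lookalikes, but only inside words that also contain Cyrillic."""
    def replace(match):
        word = match.group(0)
        if CYRILLIC_RE.search(word) and LATIN_RE.search(word):
            return word.translate(LOOKALIKES)
        return word
    return WORD_RE.sub(replace, text)


def character_names(word):
    """List the Unicode name of each character, to verify the script by eye."""
    return [unicodedata.name(ch) for ch in word]


# A Cyrillic word with a Latin "o" as its fifth letter.
mixed = "\u0431\u0430\u0448\u04a1o\u0440\u0442"
fixed = fix_mixed_script(mixed)
```

Restricting the replacement to words that already contain Cyrillic keeps genuinely Latin-script words untouched; Latin letters missing from the table are left as they are and would still need manual review.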

List of inflection tags by usage?
Hey ... one of the side effects of my adding a whole bunch of inflection tags is that some pages are now running out of memory. One way to attack this is to separate the tags into more and less common ones, and only load the less common set if an unknown tag is encountered. To do this I need a list of all tags by usage; is this something you can produce? Benwing2 (talk) 00:45, 21 April 2019 (UTC)
 * Yep, see here. I had a Lua script go through the uses of, convert the tags from shortcuts to full forms if possible, and count them. — Eru·tuon 01:29, 21 April 2019 (UTC)
 * Oops, I didn't parse HTML comments. But apart from that, it's okay. — Eru·tuon 01:30, 21 April 2019 (UTC)
 * Restricted the list to tags from Module:form of/data. — Eru·tuon 01:46, 21 April 2019 (UTC)
 * Thank you! Can you also make a list of all the cases of involving tags not in Module:form of/data? That way I can fix them up appropriately or add the missing tags to Module:form of/data. Benwing2 (talk) 15:38, 21 April 2019 (UTC)
 * Done. But it comes to about 3 or 4 MiB depending on the format, too large to conveniently save on-wiki. I don't do filesharing much; how do you want me to get it to you? — Eru·tuon 19:41, 21 April 2019 (UTC)
 * Hmm, you could email it to me as an attachment; you should be able to do it using the "email this user" link on the left-hand side. Another possibility is to categorize each page by the inflection tag and make a list of each inflection tag and, under the tag, just the names of the pages containing the tag; that should be much smaller. BTW there may be a bug in your script that computed the counts above; for example, you have only 66 entries listed under "prepositional" but there should be > 40,000, since there are that many Russian nouns and each one has at least a prepositional singular non-lemma form (and usually also a prepositional plural). Similarly there should be thousands of entries under "first-person", "second-person", "third-person", "animate" and "inanimate". Benwing2 (talk) 02:52, 22 April 2019 (UTC)
 * Yeah, you're right, those counts are way off. I'll see if I can fix it. I sent you an email. — Eru·tuon 03:30, 22 April 2019 (UTC)
 * Fixed. It was simply that I was skipping the very first tag in each template. 🙄 — Eru·tuon 03:48, 22 April 2019 (UTC)
 * Thanks for the email. I definitely see some tags that can be added, e.g. to better support Irish and Old Irish, as well as a lot of junk, some of which can be easily cleaned up by bot and some of which is harder to do because it's idiosyncratic. I bet though that at least 90% of the 56,980 entries can be eliminated without a lot of work. Benwing2 (talk) 04:40, 22 April 2019 (UTC)
 * Also we'll need to do another run after I finish converting the lang-specific form-of templates to generic templates; this will hugely increase the frequency of some tags. Benwing2 (talk) 04:45, 22 April 2019 (UTC)
 * 18 rules plus elimination of empty tags plus addition of a dozen or so tags to Module:form of/data2 (which will hold the less frequent tags) leads to elimination of > 94% of the cases:

Fraction of templates with bad tags = 3165 / 56980 = 5.55%
Bad tags:
other = 1138
autonomous = 314
= 125
Epic = 121
Attic = 107
copulative = 50
negative conjugation = 49
duoplural = 42
definite form = 41
resultative = 40
variant = 39
Doric = 37
unaugmented = 36
Verbal noun = 34
Passive participle = 32
inalienable = 32
possession = 30
(multiple possessions) = 30
indefinite form = 25
= 25
...
 * Figuring out what to do with the "other" tag will eliminate more than 1/3 of the remainder. Benwing2 (talk) 11:15, 22 April 2019 (UTC)
 * Every one of the "other" tags comes from Polish and corresponds to the rightmost column of e.g. abonować, which is listed in the conjugation table as "masculine animate or masculine inanimate or feminine or neuter", as opposed to "masculine personal". Suggestions for how to handle this? Should I list out 'm|an//in|and|f|and|n', or 'm|nonpersonal|and|f|and|n' (with a new 'nonpersonal' tag), or 'non-masculine-personal' (perhaps handled by a special 'non-' tag), or ...? Benwing2 (talk) 11:26, 22 April 2019 (UTC)
 * I think it would be a good idea to discuss this with Polish editors. (I'm not very familiar with Polish grammar.) — Eru·tuon 20:08, 22 April 2019 (UTC)
 * Hello Polish editors ... could you read the preceding paragraph? There are over 1,000 entries that use the "other" tag in . All of these are Polish past-tense forms like abonowałyśmy, where "other" means "not masculine personal", but this is far from clear without context. I'd like to replace the "other" tag with something more specific, do you agree? Benwing2 (talk) 14:16, 10 May 2019 (UTC)
 * This is easy because we use the "nonvirile" tag in newer entries, e.g. grałyśmy, srałyśmy. Wrzodek (talk) 17:56, 10 May 2019 (UTC)
 * Thanks! This looks easy enough to implement, just other -> nv (= nonvirile), right? Benwing2 (talk) 23:19, 10 May 2019 (UTC)
 * Yes, assuming all "other" tags are in Polish entries, this should fix it once and for all. I can't find any case where "other" could not be made into "nonvirile". Wrzodek (talk) 16:17, 11 May 2019 (UTC)
 * I made the change, using my bot. Benwing2 (talk) 03:17, 12 May 2019 (UTC)
 * Looking at the list, I believe "autonomous" is a particular verb form in Irish, so that one is legitimate. Copulative is used in Zulu and its relatives, for a special form that has the function of a copulative verb. —Rua (mew) 14:22, 10 May 2019 (UTC)
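The tag count discussed in this thread (expand shortcuts to full forms, count usage, and separate tags that are not in the data module) can be sketched in Python as follows. The shortcut table is a tiny made-up stand-in for Module:form of/data, and `count_tags` is a hypothetical name; the original count was done with a Lua script that is not shown here.

```python
from collections import Counter

# Stand-in for the shortcut table; the real data lives in Module:form of/data.
SHORTCUTS = {
    "nom": "nominative",
    "dat": "dative",
    "prep": "prepositional",
    "s": "singular",
    "p": "plural",
}
KNOWN_TAGS = set(SHORTCUTS.values())


def count_tags(tag_lists):
    """Count tag usage across templates.

    `tag_lists` holds one list of raw tags per template call.
    Returns (known, unknown) Counters keyed by the full tag name.
    """
    known = Counter()
    unknown = Counter()
    for tags in tag_lists:
        # Iterate from the very first tag; skipping it was the bug noted above.
        for tag in tags:
            tag = tag.strip()
            if not tag:
                continue  # empty tags are ignored
            full = SHORTCUTS.get(tag, tag)
            if full in KNOWN_TAGS:
                known[full] += 1
            else:
                unknown[full] += 1
    return known, unknown


# Made-up sample input.
known, unknown = count_tags([
    ["prep", "s"],
    ["prepositional", "p"],
    ["other"],
    ["dat", "", "p"],
])
```

`known.most_common()` then gives the tags in decreasing order of use, which is the ordering needed to split them into a frequent and a less frequent set.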

List ϝείδω for etymology of εἴδομαι, εἶδον, οἶδα, and ϝοράω+ϝείδω for ὁράω.
ϝείδω and ϝοράω warrant unique inclusion, as they are the common Ancient Greek ancestors of ὁράω, εἴδομαι, and εἶδον. Their existence explains the weirdness of ὁράω, εἶδον, and οἶδα, from two common verbs of origin, and warrants an exception to the usual tendency to skip reconstructed Ancient Greek forms. Indeed ϝείδω's mention in ὁράω is very useful, and instantly explains why its imperfect is ἐώρων. Wing gundam (talk) 00:29, 25 April 2019 (UTC)
 * I think these irregular forms can be explained without spelling reconstructed forms in Greek script. (What does ϝείδω have to do with ἐώρων? Perhaps you mean that ϝοράω explains ἐώρων?) — Eru·tuon 19:20, 26 April 2019 (UTC)

Grease Pit reversions
Are you sure you got each one or should I revert everything to Rua's edit of 29 minutes ago? DCDuring (talk) 22:59, 28 April 2019 (UTC)
 * Yeah, it should be good; I actually was reverting to Rua's edit over and over. — Eru·tuon
 * Oops, apparently I wasn't. Fixed. — Eru·tuon 23:03, 28 April 2019 (UTC)
 * Yes. I'd noticed no reversion after those last few. Sorry I didn't catch it and block the IP shortly after he started. DCDuring (talk) 23:05, 28 April 2019 (UTC)

and vs. // etc.
Hello. I remember awhile ago you wondered if we could convert uses of  in  to. I wrote a script to do that. It's careful only to combine things of the same type, and I have special exceptions for certain cases where combining doesn't make sense. The script also combines things like  to   and   to. A couple of issues that I'd like your input on: Benwing2 (talk) 02:19, 4 May 2019 (UTC)
 * 1) The use of   can be ambiguous in how loosely or tightly it joins. There are cases like   (in Modern Irish, which should be read as "(nominative + vocative + dative + strong-genitive) plural") and   (in Norwegian, which should be read as "(definite singular) + plural") and   (in Ancient Greek, which should be read in the obvious way). I propose to introduce the code   to bind more tightly than , so that the above three examples could be written as  ,  , and  . I'm not sure how to display this to indicate the binding, maybe nominative, vocative, dative and strong_genitive plural (with an underscore) or definite-singular and plural (with a hyphen). What do you think?
 * 2) When you have multiple "and"'s or "//"'s, sometimes the display can be confusing, e.g. dative and ablative masculine and feminine plural, which should be read as "(dative and ablative) (masculine and feminine) plural" but might be confused as "(dative) and (ablative masculine) and (feminine) plural". I wonder if we should display them differently, e.g. dative–ablative masculine–feminine plural (with en dash) or dative+ablative masculine+feminine plural (with +) or some other way. Comments?
 * BTW, I think if the proper binding can't be expressed using  and , the tag set should be split into multiple tag sets. For example, litear currently has . This could be expressed as  , but might better be expressed as  . I think this especially goes for cases like paca, which has , where the two things being joined share almost nothing; why not use ? Benwing2 (talk) 03:02, 4 May 2019 (UTC)
 * I think it would be okay not to make the display of cases like "dative and ablative masculine and feminine plural" any clearer, and to rely on language-specific context for disambiguation. The incorrect reading "(dative) and (ablative masculine) and (feminine) plural" is technically possible syntactically speaking, but someone who understands the basic grammar of the language (or even has basic understanding of the meanings of the terms) will probably know that's nonsensical and that the correct reading is "(dative and ablative) (masculine and feminine) plural", because only cases and genders can be joined by a conjunction. But of the options given, I prefer dashes (dative–ablative masculine–feminine plural) to plus signs (dative+ablative masculine+feminine plural) because they're more commonly used in good typography. Actually the dashed version is far more readable than the version with "and".
 * The other issue seems more complicated, aside from the person–number labels which feel very obvious to me because they have parallel structure. I don't really like putting unusual characters like underscores in the output, but I don't have a better idea right now. Maybe the binding of "strong genitive" and "definite singular" is something that can be left to language-specific context though. (I don't have a great sense for these particular examples though.) Using underscores with a special meaning in the template code confuses me a little, because underscores are equivalent to spaces in wikilinks and page titles and they are a character in C-ish identifiers (which don't have internal semantic structure). — Eru·tuon 03:50, 4 May 2019 (UTC)
 * Thanks for your input. I'm not wedded to underscores, my other thought is colon, e.g. . The advantage of having *some* code like this is that the underlying template code has an unambiguous interpretation (even if the output doesn't show it), which can enable various use cases. The interpretation of either underscore or colon as a separator would be inhibited if the tag contains either a link (i.e. any of the   or   or   chars) or HTML (i.e. the   or   chars). It isn't necessary to inhibit interpretation of   in this fashion because   doesn't normally occur in links or HTML (which is why I chose it); this allows things like , which occurs frequently.
 * As for your comment about the en-dash version, I agree that it's more readable than the version with "and"; maybe I'll implement this. Benwing2 (talk) 04:50, 4 May 2019 (UTC)
 * Colon does seem less confusing. I also like the idea of having the template code clearly convey the intent, though I'm not sure what non-programmers will think.
 * I had the idea of adding HTML so that JavaScript can find the output of these conjoined tags and change the way they are displayed. It would be sufficient to enclose each of the separators and the whole sequence with a class. Say,  and  . If the separator ends or begins with an ASCII space, I think the space has to be replaced with    to prevent the MediaWiki parser from moving the space outside of the HTML tag. Oddly,   (as well as aliases like  ) is replaced with an ASCII space in the HTML emitted by the parser. For , this would look roughly like html if the linking is omitted. Then JavaScript can iterate over each   element and find the child   elements and change their displayed text. — Eru·tuon 18:58, 10 May 2019 (UTC)
 * This is a good idea. I'll implement it. Benwing2 (talk) 23:17, 10 May 2019 (UTC)
 * Implemented. Benwing2 (talk) 03:08, 12 May 2019 (UTC)
 * Another cleanup task: I think that in Ancient Greek at least, all the incorrectly conjoined tags from the same category in the formation, like  , can safely be changed to  .  should include all the templates that need to be fixed. — Eru·tuon 07:33, 4 May 2019 (UTC)
 * Yup, my script already handles those as well. Benwing2 (talk) 07:50, 4 May 2019 (UTC)
 * Sweet. Oh, also your script could remove . — Eru·tuon 16:09, 4 May 2019 (UTC)
 * I'll make sure it handles that also. Benwing2 (talk) 16:14, 4 May 2019 (UTC)
 * The script has finished running. Let me know if you see anything that's wrong. BTW for a comparison between "and" and en-dashes, see User:Benwing2/billigen vs. User:Benwing2/billigen2. You can view any page in en-dash format by locally changing the return value of export.multipart_join_strategy to "en-dash" in Module:form of/functions, and previewing the page. Benwing2 (talk) 16:46, 5 May 2019 (UTC)
 * I also cleaned up the remaining entries in that my bot didn't handle and that you hadn't already fixed. Benwing2 (talk) 16:56, 5 May 2019 (UTC)
 * Thank you so much! — Eru·tuon 23:24, 5 May 2019 (UTC)
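The display options compared in this thread ("and", en dash, and the slash suggested later) can be illustrated with a short Python sketch. It assumes that a multipart tag is written with `//` between its parts and that the tags have already been expanded to full names; the function names are hypothetical, and the real logic lives in Lua in Module:form of.

```python
def join_multipart(parts, strategy="and"):
    """Join the parts of one multipart tag according to a display strategy."""
    if strategy == "and":
        if len(parts) <= 2:
            return " and ".join(parts)
        # Three or more parts: "a, b and c", as in the examples above.
        return ", ".join(parts[:-1]) + " and " + parts[-1]
    if strategy == "en-dash":
        return "\u2013".join(parts)
    if strategy == "slash":
        return "/".join(parts)
    raise ValueError(f"unknown strategy {strategy!r}")


def display_tag_set(tags, strategy="and"):
    """Render a tag set such as ['dative//ablative', 'plural'] as display text."""
    return " ".join(join_multipart(tag.split("//"), strategy) for tag in tags)


tags = ["dative//ablative", "masculine//feminine", "plural"]
with_and = display_tag_set(tags)
with_dash = display_tag_set(tags, "en-dash")
```

With "and" the result is the ambiguous "dative and ablative masculine and feminine plural" discussed above, while the en-dash strategy keeps each axis visibly grouped.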

combining adjacent calls to
Hey ... sorry to see all the vandalism on your page. I wrote a script to combine adjacent calls to into a single call with semicolon separators, and then apply combination logic when sets of inflections differ along only one axis (the same thing I already did to existing calls to  with semicolons in them). I am thinking of running it, what do you think? Benwing2 (talk) 01:20, 8 May 2019 (UTC)
 * I would be very glad if you ran that on Ancient Greek entries; I was considering starting a bot to do it because WT:ACCEL doesn't yet do it. I would merge by multiple dimensions, but since Rua disagrees, it's best not to do that without a vote; on the other hand, people are unlikely to disagree with merging by a single dimension, and the templates can later be merged by multiple dimensions if that is agreed on. — Eru·tuon 19:57, 8 May 2019 (UTC)
 * Are you dead set against having syncretism along two axes? As mentioned above, I wrote a script to combine adjacent calls to and combine syncretisms as much as possible. I first went through the latest dump and identified subsections where such combination is potentially possible (producing 442,504 subsections on 420,214 pages), and then ran my script on those subsections. The script first combines adjacent calls to  that can be combined (same language, same lemma, etc.), using , and then seeks to further combine tag sets that differ in a single dimension. Some stats after all combining is done:

Num tag sets seen = 691737
Num tag sets with 1 multipart tags = 342350 (49.49%)
Num tag sets with 0 multipart tags = 300938 (43.50%)
Num tag sets with 2 multipart tags = 48445 (7.00%)
Num tag sets with 3 multipart tags = 4 (0.00%)
Tag sets by ordered dimensions of multipart tags:
(none) = 300938 (43.50%)
case = 169362 (24.48%)
gender = 65031 (9.40%)
person = 52322 (7.56%)
mood = 47584 (6.88%)
case, gender = 44323 (6.41%)
tense-aspect = 5490 (0.79%)
person, mood = 2947 (0.43%)
number = 2146 (0.31%)
person, number = 792 (0.11%)
gender, case = 318 (0.05%)
animacy = 204 (0.03%)
voice-valence = 122 (0.02%)
state = 75 (0.01%)
case, number = 34 (0.00%)
person, tense-aspect = 7 (0.00%)
voice-valence, mood = 7 (0.00%)
unknown = 7 (0.00%)
class, case = 6 (0.00%)
state, case = 4 (0.00%)
class = 4 (0.00%)
case, gender, number = 4 (0.00%)
grammar = 3 (0.00%)
number, case = 2 (0.00%)
number, mood = 1 (0.00%)
number, gender = 1 (0.00%)
person, grammar = 1 (0.00%)
animacy, case = 1 (0.00%)
gender, number = 1 (0.00%)
 * What this means is that 691,737 tag sets were left after combinations were applied (where a "tag set" is a single grouping of tags representing a single inflection, and the semicolon separates tag sets), of which 342,350 (about half) had a single multipart tag in them (where a multipart tag is something like, i.e. it denotes syncretism along an axis), while 300,938 (43.5%) had no multipart tags, 48,445 (7%) had two multipart tags, and only 4 had three multipart tags. The rest of the info specifies the dimensions of the multipart tags: 169,362 (24.48%) of the tag sets had a single multipart tag along the case dimension; 44,323 (6.41%) of the tag sets had two multipart tags, with the earlier one along the case dimension and the later one along the gender dimension (this accounts for almost all the cases of two-axis syncretism); etc.
 * Typical examples of two-axis syncretism are like this:
 * BTW the only examples of three-axis syncretism come from Slovenian, like this:
 * Note that the above three examples rendered using en dashes (which I think looks better) are:
 * (Sorry, the multiline calls to aren't getting formatted right but you get the idea.)
 * If we are to syncretize along only one axis at a time, how should this be done? Should we first seek to minimize the number of inflection lines (hence dat m//f//n and abl m//f//n, rather than dat//abl m, dat//abl f and dat//abl n), and then choose some ordering of dimensions? If so, what should the ordering be? In general I think the two-axis syncretisms are compact and readable and help readers to know the common syncretism patterns, e.g. the dative and ablative plural are almost always the same in Latin, which is obscured by splitting them into separate dat m//f//n and abl m//f//n lines, whereas splitting them the other way obscures the fact that the dative and ablative plural in Latin adjectives almost always have the same form for all genders.
 * Benwing2 (talk) 01:33, 10 May 2019 (UTC)
 * I'm not totally against it, but clarity to the user has to come first. Keep in mind that not everything that's clear to us experienced users is also clear to new users. —Rua (mew) 09:48, 10 May 2019 (UTC)
 * I agree. Can you comment more specifically on the examples above, whether you think they are clear, and if you'd prefer to have syncretism along only one axis, how you'd prefer it done? Benwing2 (talk) 14:11, 10 May 2019 (UTC)
 * I find the hyphens clearer than using "and", because it makes it more clear which terms are on the same axis. But whether that is clear enough for everyone I can't say. Perhaps actual slashes, like in the code, are even clearer (i.e. first/third-person). This is probably something that needs more eyes. —Rua (mew) 14:13, 10 May 2019 (UTC)
 * I'll see what others have to say, but for the moment I modified the examples above to use slash instead of en-dash for joining. Benwing2 (talk) 15:59, 10 May 2019 (UTC)
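The kind of tally reported at the top of this thread can be sketched roughly as follows. This is a minimal JavaScript illustration, not the script that produced the numbers: it assumes each inflection is available as a flat list of tags in which ";" separates tag sets and "//" joins the parts of a multipart tag, and `DIMENSION_OF` is a small hypothetical stand-in for the real tag data.

```javascript
// Hypothetical tag-to-dimension table; the real data would be far larger.
const DIMENSION_OF = new Map([
  ["nom", "case"], ["acc", "case"], ["gen", "case"], ["dat", "case"], ["abl", "case"],
  ["m", "gender"], ["f", "gender"], ["n", "gender"],
]);

// Split a flat tag list into tag sets at each ";" separator.
function splitTagSets(tags) {
  const sets = [[]];
  for (const tag of tags) {
    if (tag === ";") sets.push([]);
    else sets[sets.length - 1].push(tag);
  }
  return sets.filter((set) => set.length > 0);
}

// Ordered dimensions of the multipart tags (those containing "//") in one tag set.
function multipartDimensions(tagSet) {
  return tagSet
    .filter((tag) => tag.includes("//"))
    .map((tag) => DIMENSION_OF.get(tag.split("//")[0]) || "unknown");
}

// Count tag sets per ordered dimension list; the key "" means no multipart tags.
function tallyByDimensions(tagSets) {
  const counts = new Map();
  for (const set of tagSets) {
    const key = multipartDimensions(set).join(", ");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Format a share of the total with two decimals, as in the report.
function percent(count, total) {
  return ((count / total) * 100).toFixed(2) + "%";
}

const tagSets = splitTagSets([
  "dat//abl", "m//f//n", "p", ";", "gen", "s", ";", "nom//acc", "n", "s",
]);
const counts = tallyByDimensions(tagSets);
```

Keeping the dimensions in tag order is what lets "case, gender" and "gender, case" show up as separate rows, as they do in the report.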

Javascript tooling
You seem to be a JS “poweruser”. What do you recommend for adding small JavaScript-based refactoring tools? I came across TemplateScript; is this any good? I have some Python scripts I use for formatting, but I always need to switch back to the terminal, copy and paste, etc. I'd like to streamline this. – Jberkel 15:45, 9 May 2019 (UTC)
 * Well, I created a few TemplateScript scripts, but it didn't seem quite flexible enough for everything I wanted to do, so I created User:Erutuon/scripts/CleanupButtons.js to add buttons above the textbox based on arbitrary conditions that usually have to do with the wikitext in the edit box, and have them execute callbacks when clicked. I think most of all I wanted to be able to decide when to add the buttons, rather than having a whole flock of buttons appear on every page, and to have them near the textbox so that I don't have to page up to the sidebar (since I'm using Monobook). CleanupButtons started as a part of User:Erutuon/scripts/cleanup.js, and you can see examples there. Then I moved the button-adding code because I was using it in another script and on Wikipedia. — Eru·tuon 16:45, 9 May 2019 (UTC)
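The general idea of condition-based buttons described above could be sketched like this. It is only an illustration under stated assumptions, not the actual interface of CleanupButtons.js: the names `registerButton` and `buttonsFor` and the two sample cleanups are hypothetical, and the DOM work of drawing buttons above the textbox is left out so the logic stays self-contained.

```javascript
// Each spec pairs a label with a condition on the wikitext and a callback
// that returns the cleaned-up wikitext. All names here are hypothetical.
const buttonSpecs = [];

function registerButton(label, condition, callback) {
  buttonSpecs.push({ label, condition, callback });
}

// Only the specs whose condition holds for the current wikitext get a button.
function buttonsFor(wikitext) {
  return buttonSpecs.filter((spec) => spec.condition(wikitext));
}

registerButton(
  "Trim trailing spaces",
  (text) => /[ \t]+$/m.test(text),
  (text) => text.replace(/[ \t]+$/gm, "")
);

registerButton(
  "Collapse blank lines",
  (text) => /\n{3,}/.test(text),
  (text) => text.replace(/\n{3,}/g, "\n\n")
);
```

In a real user script, each spec returned by `buttonsFor` would presumably be rendered as a button whose click handler writes the callback's result back into the edit box.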

πολύγονον
Thanks for taking a look at my addition. I admit to being a little out of my depth when it comes to some of the finer details like the declension- I basically combined information from an entry starting with πολύ- and one ending with -γονον after checking the LSJ at Perseus and the Gaffiot entry. I also checked as many of the forms as I could get the word study tool at Perseus to show me, though for some reason I couldn't get the genitive to display.

I was wondering if we have any reference template for pages from the. It seems to be an alphabetized condensation of from the Middle Ages, but it has very nice illustrations, and it's viewable online here. If we do, it would be nice to link to folio 121 for this entry. Chuck Entz (talk) 05:31, 11 May 2019 (UTC)
 * Well, you got the declension and so on right. There are probably hardly any nouns in -ον that don't belong to the -ον, -ου neuter second declension.
 * I'm pretty sure there isn't such a reference template; I didn't see it while recategorizing entries on plant names, and it doesn't seem to be in Category:Ancient Greek reference templates either. — Eru·tuon 23:52, 11 May 2019 (UTC)

WT:NEWS
Although, https://www.unicode.org/versions/Unicode12.1.0/ did come out in May, specifically for 令和 :p —Suzukaze-c◇◇ 02:39, 14 May 2019 (UTC)
 * Ooh, interesting... that character. — Eru·tuon 02:45, 14 May 2019 (UTC)

rookie's question 2
Dear Erutuon! May I bother you again... it is not urgent. I am writing this little module: if a Greek word begins with x, x, x letters, then write article την. I do not know exactly how to write them. I know I should not use commata, and that they need U+ codes (I have them) and something like local gsub = mw.ustring.gsub. Is there a module where I can see examples? I've looked at transliteration modules, but they substitute letters, which is a bit different. --sarri.greek (talk) 11:52, 15 May 2019 (UTC)
 * The function that you want for testing that a term begins with a Greek letter is mw.ustring.find. (string.find does not always work for Greek letters because it looks at bytes and Greek letters are two or three bytes long in UTF-8.) It returns a number (actually two numbers, but that doesn't matter in the code that you showed me) if the letter was found or nil if it was not, so it can be used in the protasis of an if-statement (  or   if you want to explicitly convert to a boolean). To check if a term begins with α, you can use  . To check if a term begins with one of multiple characters, put them in square brackets:   checks if   begins with a lowercase vowel letter. ^ at the beginning of the pattern forces the pattern to match only at the beginning of the term, so   returns nil but   returns a number.
 * To avoid having to list a bunch of letters with diacritics, you can decompose the term with mw.ustring.toNFD before using mw.ustring.find. When decomposed, for instance ά (U+03AC GREEK SMALL LETTER ALPHA WITH TONOS) becomes ά (U+03B1 GREEK SMALL LETTER ALPHA, U+0301 COMBINING ACUTE ACCENT), and   will return a number while   returns nil.
 * I'm not sure if there is a good Greek module for this type of thing, but I hope this long post helps. I can give a module with examples if you need it. — Eru·tuon 19:52, 15 May 2019 (UTC)
 * ow this is wonderful: you are a great teacher. I will practice with all the instructions you gave me.  Your previous help with the module that recognizes affixes, is a great hit!! We are very grateful. --sarri.greek (talk) 01:47, 16 May 2019 (UTC)
 * I will experiment with accented letters -which will be very useful-, but in the module I will do the easy thing and reverse the rule: I will state which letters do NOT get the article την (they are just β, γ, δ, θ, φ, χ, λ, μ, ν, ρ, σ, ζ). Thank you!! --sarri.greek (talk) 01:57, 16 May 2019 (UTC)
 * ! itttt works! (tests are all ok) THANK you teacher: now I can do all declensions! --sarri.greek (talk) 12:49, 16 May 2019 (UTC)
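For comparison, the same two ideas (anchoring the match at the start of the term, and decomposing first so that accented letters need not be listed) can be sketched in JavaScript. This is an analogue of the Lua approach discussed in this thread, not the module that was actually written; the function names are made up for the example, and the set of initials is copied from the list of letters given above as not taking την.

```javascript
// Initial letters that, per the list given in the thread, do NOT take την.
const NO_TIN_INITIALS = new Set("βγδθφχλμνρσζ");

// First letter of the term with diacritics split off and case folded.
// NFD turns e.g. ά (U+03AC) into α (U+03B1) followed by U+0301.
function firstBaseLetter(term) {
  const decomposed = term.normalize("NFD").toLowerCase();
  return Array.from(decomposed)[0]; // undefined for an empty string
}

// ^ anchors the character class to the beginning of the term.
function beginsWithVowel(term) {
  return /^[αεηιουω]/u.test(term.normalize("NFD").toLowerCase());
}

// The reversed rule: every initial except the listed ones takes την.
function takesTin(term) {
  const first = firstBaseLetter(term);
  return first !== undefined && !NO_TIN_INITIALS.has(first);
}
```

Note that this follows the single-letter list literally and does not look at the second letter of the word.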

Akkadian IPA
Hello, could you help me out with Akkadian traditional transcription and IPA? I could use a template that could convert the transcription to IPA. Luckily, it's pretty straightforward. Each letter has a single correspondence except for e, which would have to be input manually. – Tom 144 (𒄩𒇻𒅗𒀸) 22:09, 26 May 2019 (UTC)

Franc-Comtois
Cheers, I made a request for Franc-Comtois. --Lo Ximiendo (talk) 03:40, 3 June 2019 (UTC)
 * You left the flag at sixty-eight pixels.
 * Hi, when discussing this gadget, please ping me at MediaWiki_talk:Gadget-WiktCountryFlags.css to get my attention. — Eru·tuon 16:48, 3 June 2019 (UTC)

How do I find a diff I know only by number and wiki?
I know a specific edit number for a WP edit that allegedly triggered a ban of a veteran user. I'd like to see it and the context and judge for myself. I don't know what page was being edited, nor the date. If you don't know how to do this, do you have any idea where I can look? DCDuring (talk) 12:03, 12 June 2019 (UTC)
 * If I understand correctly, you can look up the diff by entering . You don't need the page name because all edit numbers on a wiki are unique. If you then want to look at the history for more context, you can note the date, click the History tab, and enter the date to view edits around that time. — Eru·tuon 18:16, 12 June 2019 (UTC)
 * Thanks. It worked perfectly. I am right now finding the confrontation that probably led to the ban. DCDuring (talk) 19:14, 12 June 2019 (UTC)

CAT:E
Lots of errors in documentation pages of translit modules, which you seem to have introduced. Benwing2 (talk) 05:01, 28 June 2019 (UTC)
 * Sorry, fixed. It was a design flaw in Module:array. — Eru·tuon 06:11, 28 June 2019 (UTC)

Your miracles
! THANK YOU. What you have taught me at this module, I applied here annnddd it works wonderfully! (λύση, gen.sg). Ι will expand now! You are my hero. sarri.greek (talk) 05:44, 25 July 2019 (UTC)

Module:fi-pronunciation
I'm creating a new module designed to implement a template that would replace, and. I was planning to name this Module:fi-pronunciation, Template:fi-pronunciation (after I realized was taken for pronouns). However, Module:fi-pronunciation is used too, by a module of yours that seems to be an unused template meant to replace (?) Module:fi-IPA. Mind if I (eventually) take the name for my module? — surjection ⟨?⟩ 20:00, 3 August 2019 (UTC)
 * That'd be fine. I created Module:fi-pronunciation by mistake when I didn't notice there was already Module:fi-IPA, so it serves no purpose. — Eru·tuon 00:12, 4 August 2019 (UTC)

Example sentences in usage notes
I pressed ENTER too fast and now my message in the edit summary of appears rude. But what I mean is that such sentences do not look marked enough, not enough. There are also usage examples on new lines with various templates, for example in, but that seems excessive and I imagine the templates are abused this way. None of the methods I know is satisfactory. Are there better ones? Fay Freak (talk) 22:12, 11 August 2019 (UTC)
 * Personally I prefer to put the notes themselves on lines without bullets and put examples in a bulleted list below. That's demonstrated, where the previous state is very weird HTML-wise, since it contains single-item unordered lists that contain the usage note and then series of dd tags created by  containing the usage examples. Having notes in paragraph tags and examples in unordered lists (as in the linked diff) makes sense HTML-tag-wise, though I can't speak for whether it looks good or not.
 * In general if the examples are inline, as in, and in Latin script, I put them in italics rather than quotation marks, like (and put a gloss or translation in quotes, if it's provided). That might make them stand out more visually, though it does not make it clear that the two consecutive examples are separate as the quotation marks do. — Eru·tuon 22:36, 11 August 2019 (UTC)

Module:User:Erutuon/Wonderfool
You made a Wonderfool Module? That's so lame. --Gibraltar Rocks (talk) 15:38, 15 August 2019 (UTC)
 * You're welcome! I'm glad you like it. — Eru·tuon 16:10, 15 August 2019 (UTC)
 * I was trying to have my "revenge" by making an Erutuon Module, but I soon realised I still haven't learned Scribunto. So essentially, I lost the game. --Gibraltar Rocks (talk) 16:38, 15 August 2019 (UTC)

RE
I was under influence of măceș. When I pass a multiple-word term as third parameter (altdisplay parameter) of the normal linking templates and use square brackets to link the separate terms the diacritic strip does not run, so I added. Though I could pass the same thing with the desired effect to the second parameter so it does not make sense to use the third, something else does not make sense either. I remember I had this problem unrelated to Bulgarian, I think it was Latin diacritics did not get stripped in such an environment – I only now see the pattern, and yep the test code works with Latin content; before, because of the described error I thought that ѝ does not get stripped because of special handling, so people can link ѝ. But how do people link ѝ anyway if the diacritic is stripped? It’s another thing somewhere here that does not make sense. (Arguably, the page should not exist, but the content abide on и with the diacritic in the headerlines.) And I do remember that there was that discussion about stuff removed from Arabic-script links, differently in Arabic and Persian, I remember the mechanism left much to desire. Fay Freak (talk) 06:31, 21 August 2019 (UTC)
 * Right now the Bulgarian entry-name replacements do not allow linking to ѝ. (As you pointed out, entry-name replacements can be sidestepped by putting links in alt parameters, because alt parameters are not modified in any way. Then the links also don't point to a language section unless they are explicitly written that way: .) Perhaps they should be refined to leave the accent on this word. Without hardcoding anything in Module:languages, that could be done by replacing the word ѝ with a placeholder, removing grave accents, then putting ѝ back again. Hacky, but it would work. The other option is moving the Bulgarian entry from ѝ to и – if the word is usually spelled without the grave accent, outside of teaching materials or dictionaries. — Eru·tuon 07:12, 21 August 2019 (UTC)
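The placeholder trick described above can be sketched in JavaScript as follows. This is an illustration only, not the code in Module:languages: the function name, the choice of a private-use character as the placeholder, and the rule that ѝ counts as "the word" only when no letter or combining mark is adjacent to it are all assumptions made for the example.

```javascript
const I_GRAVE = "\u045D";     // ѝ, CYRILLIC SMALL LETTER I WITH GRAVE
const PLACEHOLDER = "\uE000"; // private-use character, assumed absent from input

function stripGraveAccents(text) {
  // 1. Protect ѝ (precomposed or decomposed) when it stands alone as a word.
  const protectedText = text.replace(
    /(?<![\p{L}\p{M}])(?:\u045D|\u0438\u0300)(?![\p{L}\p{M}])/gu,
    PLACEHOLDER
  );
  // 2. Decompose, drop every combining grave accent, then recompose.
  const stripped = protectedText
    .normalize("NFD")
    .replace(/\u0300/g, "")
    .normalize("NFC");
  // 3. Put the protected word back.
  return stripped.split(PLACEHOLDER).join(I_GRAVE);
}
```

A grave accent on any other letter, or on и inside a longer word, is still removed; only the standalone word survives.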

Overriding Skt. adjective templates?
Hello - I added inflection templates to अल्प, but am not sure how to add the irregular masc. nom. pl. in -e. Do you know how to override the adjective templates? Hölderlin2019 (talk) 23:33, 28 August 2019 (UTC)
 * Sorry, I don't know much about Sanskrit templates. Perhaps JohnC5 would know? — Eru·tuon 23:40, 28 August 2019 (UTC)
 * So, I did not make the template sa-decl-adj-mfn, which is just the templates sa-decl-noun-m, sa-decl-noun-f, and sa-decl-noun-n concatenated together. It would be possible to add parameters like m_nom_s, m_gen_s, etc., which punch through to the nom_s, gen_s parameters of the sa-decl-noun-m template inside. Regardless, this is not how I wanted to build this template since Sanskrit adjectives sometimes decline differently from the nouns. So... yeah. —*i̯óh₁n̥C[5] 05:01, 29 August 2019 (UTC)

2 things
Hi :)

Could you help with this?
 * 1) review and commit my change to TranslationAdder.js removing the balancer buttons and reliance on trans-mid. I have used it daily since the change and have not seen any problems after removing it.
 * 2) Add me to this list for me to be able to use JWB.--So9q (talk) 10:24, 9 September 2019 (UTC)
 * Your change to the gadget looks okay, so I'll copy it to the gadget page.
 * I'm just an interface admin, so I can't edit AutoWikiBrowser/CheckPage. You'll have to get the attention of a real admin (sysop). — Eru·tuon 18:07, 9 September 2019 (UTC)

Lua memory usage
Hi, I found this via your common.js: User:Erutuon/scripts/simpleTranslations.js. It contains this:  for Latin-script terms with just lang, term, and gender, to reduce Lua memory usage, using JavaScript 

Is this still relevant? If yes, would it not be a good idea to improve the TranslationAdder.js to insert these for da, no, nb, etc.? WDYT?

I saw that some pages have sub-pages /translations to work around the Lua memory issue. Can massive use of t-simple avoid that?--So9q (talk) 10:40, 9 September 2019 (UTC)
 * No, the translation adder shouldn't use t-simple. It's just a workaround on pages that are in CAT:E because they are using too much Lua memory. And t-simple doesn't always reduce memory enough to remove the error messages; that's why there are translation subpages. — Eru·tuon 17:05, 9 September 2019 (UTC)

Community Insights Survey
Share your experience in this survey

Hi ,

The Wikimedia Foundation is asking for your feedback in a survey about your experience with and Wikimedia. The purpose of this survey is to learn how well the Foundation is supporting your work on wiki and how we can change or improve things in the future. The opinions you share will directly affect the current and future work of the Wikimedia Foundation.

Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.

This survey is hosted by a third-party and governed by this privacy statement (in English).

Find more information about this project. [mailto:surveys@wikimedia.org Email us] if you have any questions, or if you don't want to receive future messages about taking this survey.

Sincerely, RMaung (WMF) 14:34, 9 September 2019 (UTC)

Context deprecation and red message
In context, I restored the version that does not show the long red message. The point of deprecation as opposed to deletion is to make page histories legible. I did that after I noticed in page histories illegibility that I did not expect to be there, and then found the source of the illegibility.

I understand this was an attempt to prevent people from using the template. There is a better way, preserving history legibility: create an edit filter that is going to prevent people from saving an entry that contains a deprecated template. No one created such a filter yet and I don't know why; I fear I do not have enough user rights to edit these filters.

In any case, we have deprecation under control via Category:Pages using deprecated templates, which now contains 4 pages. I am cleaning up the category once in a while, and I remember similar counts. It is very manageable. With the edit filter, it would be even easier. --Dan Polansky (talk)
 * Not a bad approach to the problem. So much edit history is virtually unusable because of deprecation. DCDuring (talk) 14:47, 12 September 2019 (UTC)
 * I'm also generally in favor of keeping histories legible, but got a bit carried away so I added the error message. Since you are keeping an eye on the category, it makes sense to remove it. I do like the idea of an edit filter for frequently used deprecated templates, but I'm not an admin either. — Eru·tuon 00:15, 13 September 2019 (UTC)

Administrator?
You do a lot of valuable work with templates and modules. Would you consider becoming an administrator? — SGconlaw (talk) 11:51, 13 September 2019 (UTC)
 * Good idea. You would have access to more things. We wouldn't make you do more patrolling. DCDuring (talk) 13:10, 13 September 2019 (UTC)
 * Not that I would mind if we had more people patrolling... —Μετάknowledge discuss/deeds 16:51, 13 September 2019 (UTC)
 * I'm grateful for what he does. I run into vandalism that he's undone all the time. Chuck Entz (talk) 22:23, 13 September 2019 (UTC)
 * I'm surprised Erutuon is not an admin already! —AryamanA (मुझसे बात करें • योगदान) 18:50, 14 September 2019 (UTC)
 * He's been offered the position before: see here. 31.173.87.215 18:54, 14 September 2019 (UTC)
 * I refused before, but I guess I'd be willing now if there's something I could do with the admin tools. Perhaps protecting vandalized modules and templates and moving pages. — Eru·tuon 19:31, 14 September 2019 (UTC)
 * Great! Let me see if I can figure out how to nominate you. (Unless someone else wants to jump in and do it first ...) — SGconlaw (talk) 19:51, 14 September 2019 (UTC)
 * Done. Please endorse the nomination. 31.173.83.164 12:15, 15 September 2019 (UTC)
 * Oh, thanks, 31.173.83.164! Erutuon, you need to indicate your acceptance on the voting page. — SGconlaw (talk) 14:46, 15 September 2019 (UTC)

Erroneous conversion to t-simple
Hi, I just discovered that these entries have been converted by you to t-simple because of the Lua memory bug but in a way that does not show the information about gender. This is correct: --So9q (talk) 11:33, 16 September 2019 (UTC)
 * Ouch. Good catch. I'm going to have to figure out if it's better to make parameter 3 be gender, or convert these to use g and change my script. — Eru·tuon 18:24, 16 September 2019 (UTC)
 * Census of parameters in from the latest dump:
 * Since 3 is so common (because of me no doubt),  the gender in either 3 or g. I also checked and there was only one instance with both 3 and g, which I . — Eru·tuon 20:28, 16 September 2019 (UTC)
 * Nice! Thank you, again, again :)--So9q (talk) 20:56, 16 September 2019 (UTC)
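A parameter census of the kind mentioned above could be approximated along these lines. This is a naive sketch, not the tool that was run on the dump: `censusParameters` is a hypothetical name, and the pattern only handles template calls without nested templates or piped links inside them.

```javascript
// Count how often each parameter name occurs in calls to one template.
// Unnamed parameters are numbered 1, 2, 3, ... in order of appearance.
function censusParameters(wikitext, templateName) {
  const counts = new Map();
  const escaped = templateName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "\\{\\{\\s*" + escaped + "\\s*\\|([^{}]*)\\}\\}",
    "g"
  );
  for (const match of wikitext.matchAll(pattern)) {
    let positional = 0;
    for (const part of match[1].split("|")) {
      const eq = part.indexOf("=");
      const key = eq === -1 ? String(++positional) : part.slice(0, eq).trim();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

const sample = "{{t-simple|fr|mot|m}} {{t-simple|de|Wort|g=n}}";
const census = censusParameters(sample, "t-simple");
```

On the sample, the gender appears once as parameter 3 and once as g, which is the kind of split the census is meant to expose.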

English at top
Concerning this do you have a link to a policy or vote stating this norm? I found nothing in wt:EL and other style pages I looked at.--So9q (talk) 08:05, 18 September 2019 (UTC)
 * From ELE: "en"

- Priority is given to Translingual: this heading includes terms that remain the same in all languages. This includes taxonomic names, symbols for the chemical elements, and abbreviations for international units of measurement; for example Homo sapiens, He (“helium”), and km (“kilometre”). English comes next, because this is the English Wiktionary. After that come other languages in alphabetical order. Giorgi Eufshi (talk) 10:43, 18 September 2019 (UTC)
 * OK, that makes sense. --So9q (talk) 11:18, 18 September 2019 (UTC)

Admin
Congratulations! Chuck Entz (talk) 13:01, 30 September 2019 (UTC)
 * Yeah, you are awesome and admin --Vealhurl (talk) 17:52, 10 October 2019 (UTC)
 * Indeed, congrats! — SGconlaw (talk) 20:16, 10 October 2019 (UTC)

wikt:majolica n.
Re your reversion, removal of images: The word majolica has been dogged with confusion since it is used for two distinctly different products in different countries in different periods of time. All other dictionaries than Wiktionary define it inaccurately or omit one sense of the word. Hard to believe but true. The two products, the two meanings of majolica, the two majolicas are visibly different. I feel the deleted images assist understanding and warrant an exception to the 'minimal images' rule. Davidmadelena (talk) 23:10, 15 October 2019 (UTC)
 * I have no objection to illustrating the two definitions – it's just not clear to me why so many images are needed. Why wouldn't two images, one for each definition, be enough? (This is an honest question – I hadn't heard of majolica before the entry showed up in my possibly incorrect headers cleanup page.) If you could find two images that clearly illustrate the differences in the two techniques, that would be ideal. To allow people to see more images, you can create a page on Wikimedia Commons (see c:Category:Majolica) and link it from the entry using . — Eru·tuon 23:33, 15 October 2019 (UTC)
 * Overnight I had reached the same conclusion: two images to clearly illustrate the difference. Done, and thanks, much better now.Davidmadelena (talk) 10:15, 16 October 2019 (UTC)
 * I think you mean :) Eru (talk) 16:36, 16 October 2019 (UTC)
 * My confusing signature is to blame... — Eru·tuon 16:39, 16 October 2019 (UTC)

Removing control chars
Some of these should not be removed, but rather replaced with an em dash, e.g.. Equinox ◑ 21:25, 18 October 2019 (UTC)
 * Oh, yeah, that makes sense. I'll go and clean up after myself. — Eru·tuon 21:36, 18 October 2019 (UTC)

Template:t-simple
Regarding, I thought the whole point of translation subpages was that they would avoid Lua memory problems without the need for the clumsy t-simple template. That's why I've been from them. If you're readding it though, then we're working at cross purposes. —Mahāgaja · talk 09:40, 24 October 2019 (UTC)
 * I switched translations in fire/translations to because it was running out of memory. In general I'm in favor of having translation subpages use,  if they can without running out of memory; if fire/translations can be switched back (maybe I should make a script for this), it should be. — Eru·tuon 16:07, 24 October 2019 (UTC)
 * Good heavens, you're right: it was running out of memory. That's kind of appalling. But I agree that using t-simple in that case is unavoidable. —Mahāgaja · talk 19:50, 24 October 2019 (UTC)

Thank you
I'm extremely new to Lua. Having a solid background in JavaScript has helped me transition, but I appreciate the improvements you've offered. I just wanted to tell you that I've been working on a major update to the module script, which I've been editing offline because...Wiktionary's editor isn't as convenient as EditPad for indentation, regular expressions text search and replacement, etc.

Some background information: I know ideally, if I can get more people to help me out with Marshallese maintenance on Wiktionary (and on Wikipedia, where I'm mostly responsible for it, too), I can't just treat scripts like something I can write and maintain unilaterally. But for now, the script is still very much in flux, not just in the state of code but in the wisdom of coding decisions, etc. For instance, I think I made a huge mistake embedding separate MED vs. Choi vs. Willson IPA symbols, because they don't actually represent different dialects, but merely different published researchers' occasionally conflicting phonological analyses of the language. Honestly, the state of Marshallese linguistics publications can be a bit of a mish-mash of different researchers doing their own things and not always agreeing on conventions, which has led to me occasionally having to get a tad...creative. Lately I've been asking for more peer review on w:Talk:Marshallese language to help improve the occasionally confused and OR-prone state of the article and pronunciation templates, and the scripting I write here is something I hope can eventually be used there as well where appropriate. That effort on Wikipedia, like this script, and the About Marshallese proposal, are still all very much a work in progress, and for the most part I've had to maintain it all myself, and inadequate peer review means the mistakes I make tend to become the decisive word in how the wikis describe the language, sometimes for years on end until someone (or myself) notices the problem.

So thank you for your help with scripting and setting up some simple test cases, etc. While I'm still improving the script offline, I've made note of your improvements and am trying to add them in the offline editing before I submit and test features of a new update, all while trying not to break currently deployed invocations in the process. - Gilgamesh~enwiki (talk) 08:03, 31 October 2019 (UTC)
 * Glad that my tinkering was appreciated. I just encountered some module errors due to outdated input in and provided more informative module errors, and then possibly made the errors useless by removing u from the supported characters. (All the erroring instances had u.) About Marshallese still needs updating though.  — Eru·tuon 17:18, 2 November 2019 (UTC)
 * Thanks again. And yeah, my bad.


 * Again, some background, and what motivated me to make such drastic changes today: When writing Marshallese templates on Wikipedia and Wiktionary years ago, I devised an ASCII-based symbol system loosely based on the MED phoneme transcription developed by Byron W. Bender and used in the Marshallese-English Dictionary.  But in my effort to simplify it into an ASCII-inputtable system, I changed Bender's a e ẹ i notation to a e o u, since at the time we were treating the  phonemes as underspecified for backness or roundedness—which is true, they are underspecified for that, but at the time we were representing the phonemes using central vowel symbols .  But in the most recent discussion at w:Talk:Marshallese language where I asked for review by other editors to improve the quality of the language's representation and to reduce original research, it was agreed that only one of the published linguists had represented the phonemes with central vowel symbols at all, and that was Choi (1992).  No one else used his ad hoc system, and it excluded one of the vowels altogether, representing only three.  Other published researchers had either phonetically represented the vowels only as allophones, or echoed Bender's half century of Marshallese research using front vowel symbols (instead of central vowel symbols) to represent the underlying phonemes, which meant that the a e o u notation used before had come to make even less logical sense now.  We agreed to change the way the article on Wikipedia represents the phonology.  Many of those edits are still pending—I've been focusing most of my edits so far on Wiktionary because it will be most affected by these changes.
Anyway, it was observed that before Bender started using a e ẹ i, he represented them in his earlier works from 1968 and 1969 as a e & i using an ampersand instead of ẹ, and I realized that since those four characters are still ASCII, they're as good as any symbols to represent those phonemes in modules and templates.  I changed every instance I could find in the word entries, and I checked Category:E to check for stragglers, but at the time there weren't any, so I thought I'd gotten them all.  Obviously, it seems I missed two of them.


 * But yes, the use of "o" and "u" as symbols are in the process of being retired, and I edited the parse function to no longer recognize them when I thought I'd at least already updated all the examples in the word entries. (I still need to edit About Marshallese, and the examples in talk pages have the lowest priority at the moment.)


 * This time, I was quick to incorporate your most recent changes to the module code in my offline editing copy. But I admit...I don't understand the syntax text:gsub or what that snippet of code does.  I didn't know Lua at all before a couple of weeks ago, and I've adapted to writing it much more quickly than I imagined possible, but that's thanks to where I've been able to convert my equivalent JavaScript knowledge.  Though I understand your error-checking edits were to diagnose straggling "u" as the culprit, I don't fully understand what your added code actually does in regards to error message reporting.  Could you please explain it, if possible?  When I've gotten errors from the module, I've mainly just been browsing the stack trace and the line numbers of where the error was generated. - Gilgamesh~enwiki (talk) 18:52, 2 November 2019 (UTC)
 * Thanks for the further explanation.
 * I learned JavaScript (and C) after learning Lua, so I can try to explain the colon syntax by comparison with JavaScript. In JavaScript,  is a method call and passes an implicit , equal to  , to the method. In Lua,   is the closest equivalent; it passes   as the first argument to the method.   would call the method with no arguments. The functions in the library are available when a string value is indexed (via the   field in the metatable for strings), so if   is a string,   gives a function equal to  , and   is equivalent to  , and is analogous to   in JavaScript.   would fail to pass   as the first argument to the function, so is equivalent to  :   is the string, and   is the Lua pattern. (Lua will throw a runtime error because the replacement value is required: "string/function/table expected".) In JavaScript, it would be sort of similar to do.
 * The error messages I added were to avoid the incomprehensible error for indexing of a  value for   and such indexings. If   and , then accessing   will cause the error "attempt to index field '?' (a nil value)" because   is   (there is no value indexed by  ) and   values can't be indexed in vanilla Lua. So I added a check that will prevent the "indexing of  " error message, since I like error messages to be somewhat understandable (even though average users can't fix them). The error message might be wrong, since I was writing it quickly, and it's possible the check is no longer needed, if the module ensures that the transcription has correct phonotactics or syntax before that point. — Eru·tuon 20:29, 2 November 2019 (UTC)
 * Thank you. I didn't even know that calling syntax was possible in Lua, but it looks elegant.  I'm tempted to use it more. - Gilgamesh~enwiki (talk) 06:52, 3 November 2019 (UTC)
 * So, to be clear, arg:func is syntactic sugar for func(arg), right? And arg:func(a, b, c) is equivalent to func(arg, a, b, c)? - Gilgamesh~enwiki (talk) 07:12, 3 November 2019 (UTC)
 * Apparently it's not quite that simple... But I'd love to understand it. - Gilgamesh~enwiki (talk) 08:13, 3 November 2019 (UTC)
 * No, any old local or global variable can't be accessed with method syntax. For  to work, indexing   (or  ) has to yield a function. So, setting   as a field in a table with   enables it to be used as a method:  . (The same can be done by setting the metatable for the table:  .)
 * In the Scribunto variety of Lua, we can only modify the fields or metatables of tables. As mentioned, strings have a metatable that allows using the functions in the  library as methods, but it can't be modified. — Eru·tuon 15:35, 3 November 2019 (UTC)
 * I see... - Gilgamesh~enwiki (talk) 23:01, 3 November 2019 (UTC)
 * Well, I hope I'm making sense. Methods in JavaScript and Lua are pretty similar apart from the  thing and the difference between prototypes and metatables. — Eru·tuon 17:24, 4 November 2019 (UTC)
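The method-call rules discussed in this thread can be sketched in plain Lua. This is only an illustration under stated assumptions: the names <tt>greet</tt>, <tt>obj</tt>, <tt>proto</tt> and <tt>other</tt> are made up for the example, and none of it is taken from Module:mh-pronunc.

```lua
-- Strings share a metatable whose __index is the string library,
-- so text:gsub(...) is the same call as string.gsub(text, ...).
local text = "banana"
local viaMethod = text:gsub("a", "o")           -- keeps only the first return value
local viaLibrary = string.gsub(text, "a", "o")
print(viaMethod, viaLibrary)

-- A plain local function (hypothetical) is not reachable with method syntax ...
local function greet(self, name)
	return self.greeting .. ", " .. name
end

-- ... until it is stored as a field of the table being indexed,
local obj = { greeting = "Hello", greet = greet }
print(obj:greet("world"))    -- same as obj.greet(obj, "world")

-- or made reachable through the table's metatable.
local proto = { greet = greet }
local other = setmetatable({ greeting = "Hi" }, { __index = proto })
print(other:greet("world"))  -- same as proto.greet(other, "world")
```

In other words, <tt>arg:func(a, b)</tt> only works when indexing <tt>arg</tt> with the key <tt>"func"</tt> yields a function; it is not shorthand for calling an arbitrary variable named <tt>func</tt>.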

Also, if you don't mind my asking, are there any thoughts or critiques you could offer on how I structure the module code, the things I'm doing in the functions, etc.? I'm trying not to make my code too convoluted, but I'm also consciously aware I'm exercising some degree of feature creep. And when I realized you were also exporting the internal conversion functions, I changed the export naming convention so that all such functions are prefixed with an underscore to indicate they are internal functions not intended for normal exported use rather than the actual exports functions. - Gilgamesh~enwiki (talk) 19:00, 2 November 2019 (UTC)
 * In regard to design, it would be simpler (at least conceptually, and for the testcases module) if the transcription-generating functions took a string and yielded a string, rather than an array of strings. Then multiple transcriptions can be handled by applying the functions multiple times. And it would be consistent with to have the separate inputs in numbered parameters, rather than separate them with commas in a single parameter, and to bracket them separately: for instance,  instead of  yielding  as the phonemic transcription instead of . But this might complicate  or the module, so you should be the one to decide. — Eru·tuon 18:51, 5 November 2019 (UTC)

Okay, so to be clear...calling <tt>gsub</tt> with <tt>tbl</tt> is equivalent to <tt>function(match) return tbl[match] or match end</tt>? I thought if the item wasn't in the table, it might return <tt>nil</tt> or something, which is why I wrote it as a function that returns the item <tt>or match</tt>. Also, I noticed you replaced all those substitutions with <tt>"("..V..")(ː*)%1"</tt>. I was honestly not aware it was possible to reference a capture within the same pattern. - Gilgamesh~enwiki (talk) 20:40, 4 November 2019 (UTC)
 * Yes, that's correct. Similarly, if a function supplied to  returns   for a particular match, no change will be made to that match. For instance, both   and   return  . (Whereas in JavaScript if you do   you get  . Heh.) — Eru·tuon 20:55, 4 November 2019 (UTC)
 * I appreciate what you've further done with the testcases, in making tests appear on the main module's page itself. And since I really didn't write any of the testcases script and am not sure what to change without breaking it, I should probably let you know that the MED/Choi/Willson stuff is not coming back.  I don't know what I was thinking, putting linguists' conflicting vowel symbols in pronunciation sections as if they were different dialects&mdash;that was really unwise of me to begin with. - Gilgamesh~enwiki (talk) 12:10, 5 November 2019 (UTC)
 * In case you haven't noticed, I've made the testcases on Module:mh-pronunc/documentation to compare the outputs of Module:mh-pronunc and Module:mh-pronunc/sandbox. In each of the table cells for which the sandbox module differs, its output is shown below the output of the main module. — Eru·tuon 16:13, 6 November 2019 (UTC)
 * Yes, I noticed. It helps.  Though I still don't quite understand how you're getting that word list programmatically, as it hasn't seemed to have updated since I added new word entries on the wiki. - Gilgamesh~enwiki (talk) 16:36, 6 November 2019 (UTC)
 * The list of pages and template inputs isn't automatically updated; I generated it from this list of all templates, which I made two days ago with Pywikibot. I can regenerate it soon if you like. — Eru·tuon 16:41, 6 November 2019 (UTC)
 * Oh. Okay, that makes sense. - Gilgamesh~enwiki (talk) 17:24, 6 November 2019 (UTC)
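For reference, the <tt>gsub</tt> behavior confirmed earlier in this thread can be sketched with the standard <tt>string</tt> library. The table <tt>tbl</tt> and the sample strings are hypothetical, chosen only to show that a missing table entry or a <tt>nil</tt> return leaves the match untouched, and that a capture can be referenced inside the same pattern.

```lua
-- Hypothetical replacement table, for illustration only.
local tbl = { a = "1", b = "2" }

-- Table replacement: a match with no entry in the table is left unchanged.
local withTable = string.gsub("abc", "%a", tbl)

-- Equivalent function replacement. The "or match" fallback is optional,
-- because a nil (or false) result also leaves the match unchanged.
local withFunction = string.gsub("abc", "%a", function(match)
	return tbl[match]
end)

print(withTable, withFunction)  -- both are "12c"

-- A back-reference such as %1 can appear inside the pattern itself:
-- this matches a letter followed by a second copy of the same letter.
local collapsed = string.gsub("aabbcd", "(%a)%1", "%1")
print(collapsed)  -- "abcd"
```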

Since you've been helping me maintain the module code, I thought I should let you know that I made some major changes to the code structure. I wrote a new local function,, to help reduce boilerplate in the source, since   is called a lot and I wanted to streamline it. - Gilgamesh~enwiki (talk) 23:55, 13 November 2019 (UTC)
 * I like it. You might want to take a look at applying the useful behavior of the function replacement value. I think it makes the code more readable. — Eru·tuon 21:01, 14 November 2019 (UTC)

My  function may not have been as wise as I once thought. Though it makes code more elegant to read, it can actually make it harder to debug, because errors that occur inside anonymous functions don't seem to report their line numbers if they generate an error, which in a long batch makes it harder to determine where the error came from. I may find myself restructuring code again, but if a lot of sequential  calls are necessary, I think I'd rather reduce the length of some variable names, because the sheer amount of boilerplate can be awful. - Gilgamesh~enwiki (talk) 00:55, 18 November 2019 (UTC)
 * Hmm, this should be an improvement. However, if you aren't aware, you can click the Lua error to get a backtrace (assuming JavaScript is working). — Eru·tuon 05:20, 18 November 2019 (UTC)
 * If I adopt a  mechanism again, I'll look into it. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)

I just noticed a strange abundance of words in the table spelt "About Marshallese", with six different phonological forms. :) Also, been adding more words up to moments ago. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)
 * Yeah, I wasn't sure if you had gotten all the new transcriptions, so I ran the Pywikibot script. It prints the contents of the transclusions of in About Marshallese as well as in entries; then I have to remove the unwanted titles. I added a list of titles to exclude so that in the future the unwanted titles can be automatically removed. Perhaps alternative spelling entries could just be soft redirects using, without any definition or pronunciation (because both of those are the same for all spellings). I changed  to an alternative spelling entry for  based on something you said in the Wikipedia discussion, but am not sure about the others. — Eru·tuon 17:18, 19 November 2019 (UTC)
 * Yeah, the orthography takes a while to get a feel for. I'm still learning new mini-rules about it, especially recently since I started writing that script.  Of the examples at the top of my head, where Bender phonemes are otherwise identical...
 * over or .   isn't difficult from there.
 * over, etc. The Marshallese new orthography, strictly speaking, has no Y.
 * over . The new orthography has no G, either.  Just AĀBDEIJKLĻMM̧NŅN̄OO̧ŌPRTUŪW.
 * over, and similar examples.
 * over, and similar examples.
 * I'm not 100% sure whether or  should be considered primary.  I'm guessing, because  unusually spells out an epenthetic vowel that the new orthography largely avoids.
 * Between spaces, hyphens and unspaced unhyphenated compound words, there's really no difference in pronunciation, so just one can be picked from multiple. Multiple words undergo assimilations in uninterrupted speech, and individual morphemes of words can be enunciated as needed.  The logic of that is...a work in progress;  I'm still trying to reconcile the differences between normal vowels and epenthetic vowels when they neighbor glide consonants {y h w}.  Anyway, I'd probably go with unhyphenated words or hyphenated ones, and hyphenated words over spaced words.
 * Note overall that as I've written vowel simplifications into the module, I've largely been following orthographic norms in deciding which surface vowel to express. And I've been trying to leave notes as to "{this} is [that], not [that]", etc.
 * And thank you again. :) - Gilgamesh~enwiki (talk) 19:49, 19 November 2019 (UTC)
 * And over . - Gilgamesh~enwiki (talk) 22:35, 19 November 2019 (UTC)

Efficiency
I may have significantly increased the module's execution time, which may be extending table load times. I changed it so that  is actually (pretty much unconditionally) called twice and the duplicate result discarded. This is for  mode (variable name subject to change), to satisfy inconsistencies between the way Bender (1968) and Willson (2003) described the language, and the more careful pronunciations prescribed by Naan (2014). Basically, in careful mode, the nasal consonant cluster assimilations are avoided, there's a handful more cases where clusters have epenthesis instead of assimilation, and the behavior of epenthetic vowels neighboring glides has changed. I don't necessarily see an inconsistency in including both, since most languages (including English) have words or phrases that differ notably in pronunciation when spoken more rapidly or more slowly, and can change how people perceive the word in their own speech. Compare "" vs., where some people primarily speak it as two syllables, and some (like me) say it as one syllable. - Gilgamesh~enwiki (talk) 20:02, 20 November 2019 (UTC)
 * Yes, execution time is definitely way up according to the "Lua time usage" measurement (at the bottom of the edit page). According to the profile as I am writing this, 4160 ms (85.2%) of that is . It's not a very efficient function because it's implemented using PHP regex and calls go over the Lua–PHP boundary. Sometimes the number of calls can be reduced by generalizing the patterns (regexes) and using a function replacement. — Eru·tuon 20:17, 20 November 2019 (UTC)
 * By the way, I like how the "careful" mode avoids assimilations. Assuming is a native word, it seems strange for the r to be assimilated into a ņ, when the only reason for the r to be in the spelling is if it is sometimes pronounced. Otherwise, it should be . Similarly with, which could be , though since it's a loanword and the j might be needed to represent the original s, it's not very strong evidence against assimilation. — Eru·tuon 20:44, 20 November 2019 (UTC)
 * Youch... So would it actually be more efficient to pass a function substitutor argument than a string substitutor argument?  I'm all for increasing the efficiency of the script by whatever practical means available.  It is also my very first Lua script.
 * And yes...Marshallese orthography has always been a strange creature. The new orthography since the 1970s is not purely phonemic, obviously, if you compare it with Bender's phonemes, but is designed so that syllables in isolation are reasonably easy to learn how to pronounce once you learn which sound each letter stands for, and is something foreigners (most of whose languages do not have vertical vowel systems) can more easily learn to pronounce.  Native speakers of the language already know words in isolation, and know how to string them together into compound words and sentences, so their orthography can simply string together morphemes and allow epenthesis, sandhi, assimilations, etc. to take their natural course.  In this way, it also preserves the morphemic structure and thus more of the etymology of words, in an orthographic approach also preferred in languages like French and Icelandic.   is a compound name of two morphemes:   "lagoon beach" and  "wave".  If you simply write the assimilations and write it Aņņo, the etymology is relatively more obscured.  What seems to be relatively new to the equation is learning how to pronounce words as they are written in a stable orthography already provided.  This means that some consonant clusters that were previously routinely assimilated, may now be enunciated more carefully by people who have learnt to read and write at school.  Spellings like kw increasingly are no longer taken as single consonant phonemes, but as sequences of k and w.  Two-syllable words like io̧kwe may instead come to be analyzed as three-syllable words because of how they are written.  rn is pronounced as two different consonants because it is written that way.  I've seen evidence of these trends in the pronunciation guides prescribed by Naan (2014), my discovery of which led me to rethink how to write the Lua module.  
I honestly can't say I know how realistic these "careful" pronunciations are among native Marshallese speakers (some of it may well be more artificial than not), but it certainly seems to be increasingly how Marshallese is taught, at least in a college environment.  If only we had more access to more native Marshallese speakers, but internet access is too expensive and unreliable for most of the population.  (I'm impressed that the undersea fiberoptic cable connecting Majuro to Guam manages to span the Marianas Trench.) - Gilgamesh~enwiki (talk) 22:09, 20 November 2019 (UTC)
 * I just noticed you made changes to the script. I haven't fully assessed the changes yet, but I've seen just enough to pique my interest. - Gilgamesh~enwiki (talk) 22:31, 20 November 2019 (UTC)
 * Yeah, I think a function substitution can be more efficient. The function replacement handling assimilation is slightly faster, if the "Lua time usage" figures for the "before" and "after" versions of the module are accurate. (But sometimes the figures vary unpredictably. Greater differences are less likely to be the result of chance.) It means only one  call to handle all assimilations, and perhaps the overhead of calling a function for every series of two consonants is less than the overhead of multiple calls to  . I think that's plausible because of all that PHP has to do for each   call.
 * I didn't realize was a compound (naturally, since I'm pretty ignorant). That does provide an explanation for the spelling, even if there's assimilation. — Eru·tuon 22:35, 20 November 2019 (UTC)
 * Is it all right if I rename the substitutor function's variable names? Not just because I generally start non-constant variable names with a lowercase letter, but   already exists as a separate higher scope variable, and using a different variable name may reduce the risk of variable name confusion and make the code more readable.
 * And s'fine. A lot of common Marshallese morphemes are only two letters long, and there was no Wiktionary Marshallese entry for  yet anyway. - Gilgamesh~enwiki (talk) 22:42, 20 November 2019 (UTC)
 * Yeah, the variable name duplication is not a good idea. I noticed it and was displeased. I do prefer somewhat descriptive variable names over "a, b, c, d" though. — Eru·tuon 22:47, 20 November 2019 (UTC)
 * I tend to think of captures as  as a sequence of captures, and easier on the eyes than letter-numbering them like , etc.  Anyway, I think I know what you're trying to accomplish.  Your code broke some of the (as of yet unused)   logic, but what you're doing here looks very, very clever and I think I know how to take it and run with it with other parts of the code. - Gilgamesh~enwiki (talk) 22:58, 20 November 2019 (UTC)
 * Well, the variable names  were abbreviations of "consonant 1", "articulation 1", "consonant 2", "articulation 2" (though that's not completely accurate terminology, since it's more like primary and secondary articulation), so more descriptive than either   or  .  — Eru·tuon 23:03, 20 November 2019 (UTC)
 * I've thought of it:  .  It helps that neither X nor Y are in the standard new orthography.  And when I realized what you were doing, I rewrote your function.  May I demonstrate...? - Gilgamesh~enwiki (talk) 23:36, 20 November 2019 (UTC)
 * Ahh, that's much more readable! — Eru·tuon 01:20, 21 November 2019 (UTC)
 * Thanks. :D And I'm not even done yet. You gave me the idea, and I'm running with it.  About to try another edit. - Gilgamesh~enwiki (talk) 02:14, 21 November 2019 (UTC)
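The idea of folding several substitutions into one call with a function replacement can be sketched as follows. Everything here is an assumption made for illustration: the three "assimilation" rules are invented, the input is plain ASCII, and the standard <tt>string</tt> library is used so that the sketch runs outside Scribunto. The timing gain reported in this thread concerns the Unicode-aware <tt>gsub</tt> and is not measured by this example.

```lua
-- Invented rules: { cluster, result }. Not the real Marshallese rules.
local rules = { { "nm", "mm" }, { "np", "mp" }, { "tk", "kk" } }

-- One gsub call per rule: the call count grows with the rule count.
local function assimilateSeparately(text)
	local calls = 0
	for _, rule in ipairs(rules) do
		text = string.gsub(text, rule[1], rule[2])
		calls = calls + 1
	end
	return text, calls
end

-- Index the rules by cluster so a single callback can look them up.
local lookup = {}
for _, rule in ipairs(rules) do
	lookup[rule[1]] = rule[2]
end

-- One generalized pattern and a function replacement: a single gsub call.
-- Returning nil for an unlisted cluster (such as "nk") leaves it unchanged.
local function assimilateAtOnce(text)
	local result = string.gsub(text, "([nt])([mpk])", function(c1, c2)
		return lookup[c1 .. c2]
	end)
	return result, 1
end

print(assimilateSeparately("anma anpa atka anka"))
print(assimilateAtOnce("anma anpa atka anka"))
```

Both functions give the same text for this input; the second makes one pass over the string instead of one pass per rule. Note that the two approaches can diverge when one rule's output feeds another rule, so the generalized pattern has to be checked against the rule set it replaces.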

In response to your question, "Why did the epenthetic vowel disappear between the p and the k in Āneeļļapkaņ?", the pattern is not matching the when   is called the second time, because  is not changed when   is called the first time, and is matched both times. Here is a technique for cases like this that also allows  to be called only once. (Gah, in the edit summary I meant to say "getting the surrounding consonants with ", not " ".) — Eru·tuon 02:54, 21 November 2019 (UTC)
 * Your solution with the  and   indices was clever.  (I renamed them   and  .)  It all...seems to work now.  Now let's see if I can rewrite the logic of another expensive regex batch without breaking it too badly.
 * Oh, and...the table's Rālik vs. Ratak logic seems reversed. When both forms are the same, it shows two table cells.  But when the forms differ, it only shows the Rātak form. - Gilgamesh~enwiki (talk) 03:13, 21 November 2019 (UTC)
 * How much time do you think was shaved off the module's execution, comparing right after I added "careful" mode to when we rewrote this regex batch? - Gilgamesh~enwiki (talk) 03:15, 21 November 2019 (UTC)
 * Whoops, fixed the logic. Glad you spotted it.
 * It is apparently somewhat faster; I previewed Module:mh-pronunc/documentation three times with the old version and the new version, and got 5.3 or 5.4 or 7.1 seconds and 4.5 or 4.6 or 3.0 seconds respectively. Significant variation, so it's hard to say just how much faster, but there wasn't overlap. The number of calls to  in Module:mh-pronunc in the generation of the testcases table (counted thus) has been reduced from 228,294 to 156,516.
 * We should probably be editing Module:mh-pronunc/sandbox to avoid changing transcriptions in entries (and avoid asking the server to update pages).... — Eru·tuon 07:18, 21 November 2019 (UTC)
 * So, edit sandbox for experimental code, and the main module for stable milestones? Yeah, I can see how that's a good idea. - Gilgamesh~enwiki (talk) 13:36, 21 November 2019 (UTC)

I've been considering an alternative approach to programming the phonetic algorithm. As it currently stands, the regex approach is effective in thoroughly processing the input text, but it's also proven a lot less efficient than I predicted. Putting more logic into substitutor functions improves the performance somewhat, but in a process where regex replaces matches one by one, it's not as practical in making necessary adjustments to vowels that were already replaced. For example, this existing code: "" Unlike other logic that replaces text based on what already exists to the match's left-hand side, this replacement can only be made if the stable value of the vowel on the right is already known. This is how I earlier solved the problem so that its phonetics were properly displayed as  instead of. In a more optimized approach, that could be fixed in a second regex pass, but I think I have a better idea&mdash;I just don't know beforehand how practical it will be.

Basically, my idea is, instead of relying so much on regex, just parse the input text and represent its data as a doubly linked list of table objects, where each node represents either a consonant or a vowel. Code could loop through the link nodes, make changes in them informed by nodes that come before or after, and can make secondary changes to previous node data as needed. Then, when the linked list is done being manipulated, convert it back to text.

But can this all be done in Lua using only linked lists and logic, more efficiently than batches of regex replacements can do it? - Gilgamesh~enwiki (talk) 18:46, 22 November 2019 (UTC)
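A minimal sketch of the linked-list idea described above, to make the question concrete. It is not the module's code: the vowel set, the two rules, and the function names <tt>toList</tt>, <tt>applyRules</tt> and <tt>toText</tt> are all hypothetical, and the input is assumed to be ASCII with one node per byte.

```lua
-- Illustrative only: a made-up vowel set and made-up rules, ASCII input.
local VOWELS = { a = true, e = true, i = true, o = true, u = true }

-- Build a doubly linked list with one node per character.
local function toList(text)
	local head, tail
	for char in string.gmatch(text, ".") do
		local node = { value = char, isVowel = VOWELS[char] == true, prev = tail }
		if tail then
			tail.next = node
		else
			head = node
		end
		tail = node
	end
	return head
end

-- Walk the list once; each rule may inspect or rewrite neighbouring nodes.
local function applyRules(head)
	local node = head
	while node do
		-- Look ahead: "n" becomes "m" before "p".
		if node.value == "n" and node.next and node.next.value == "p" then
			node.value = "m"
		end
		-- Look behind: "w" turns a preceding vowel, already visited, into "o".
		if node.value == "w" and node.prev and node.prev.isVowel then
			node.prev.value = "o"
		end
		node = node.next
	end
	return head
end

-- Convert the list back to text with a single concatenation.
local function toText(head)
	local parts = {}
	local node = head
	while node do
		parts[#parts + 1] = node.value
		node = node.next
	end
	return table.concat(parts)
end

print(toText(applyRules(toList("anpaw"))))  -- "ampow"
```

The second rule is the kind of change that is awkward in a left-to-right replacement pass, because it revises a node that has already been processed.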
 * I'm not sure, but I think it could end up being faster because the overhead of many  calls is considerable. It could also reduce memory because fewer intermediate strings would be created. But I'm speculating.
 * I haven't done anything quite like this; the closest thing is the pair of functions  in Module:grc-utilities and   in Module:grc-utilities. The former processes Greek characters into "tokens" (sub-sequences, mainly to handle diphthongs and single vowels correctly), and uses objects to represent the characteristics of the Greek characters, and the latter processes the tokens to create a transliteration. Not super elegant, but my version of the tokenization function was much faster than the previous one, probably because it got rid of most of the calls to   functions.
 * Using a doubly linked list is an interesting idea. It could be more elegant, though I can't imagine all the details of how it could work. — Eru·tuon 03:24, 24 November 2019 (UTC)
 * Well, practically any grc script has to be easier to maintain than the pre-Scribunto version, which I wrote back in the day. That was such a beast... - Gilgamesh~enwiki (talk) 14:37, 24 November 2019 (UTC)
 * Wait...you said  functions were inefficient.  Does that include  ? - Gilgamesh~enwiki (talk) 14:40, 24 November 2019 (UTC)
 * is noticeably inefficient when there are many calls, for instance when you iterate through strings using . In the previous version of the tokenization function,   was called up to about three times for every code point in the string. My impression is that that explained most of the inefficiency in the old version of the function, though it's not a great testcase because the old and new versions are so different. The overhead is probably not as noticeable in the function replacement in Module:mh-pronunc though, where it currently has only 2,028 calls, as opposed to 115,872 for   to create the testcases table. (And I guess   probably has greater overhead.) It's not so inefficient that the function should be avoided altogether.
 * I should say, the module is already efficient enough in entries (it looks like takes about a twentieth of a second in entries), so don't feel obligated to remodel it for that reason at least. (Not to discourage you from rewriting it if you want to – I do quite a bit of random rewriting of modules for various reasons.) — Eru·tuon 23:08, 24 November 2019 (UTC)
 * It's not just Wiktionary I have to think about. I want to also be able to migrate the code to Wikipedia.  Most WP articles where it would be relevant might need the entry only once, but not on articles like  where there are Marshallese names provided for all the notable islets and many of them are notable, but most not notable enough to get separate articles of their own.  And some of these islands have two or three separate Marshallese names depending on context.  Obviously, being WP, pronunciations aren't embedded in the same format as Template:mh-ipa-rows, and perhaps that means fewer functions called, but   would certainly be called multiple times in an article like that.  I'd rather not add that much extra load time there. - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)
 * Also, as I've tried to write linked list code, I'm realizing that I'm still creating a beast of a different kind: Far fewer , but immensely more bloated code.  I get the impression that functions like   are so expensive because the strings are probably encoded in UTF-8, but logic required to seek codepoint indices&mdash;or worse, conceivably to convert between UTF-8 and UTF-16 and back&mdash;may involve a lot of overhead if called often enough (I'm not sure which, if any of these things, is actually being done).  Obviously we're working with a lot of Unicode text and the data needs to be preserved in that format.
 * I wonder...what if I completely redesign the internal code format (returned by  and passed to the other internal functions) to use only ASCII surrogates and byte-based string functions for the text-crunching, and then convert them to Unicode forms to represent their final forms?  Are there also byte-based functions available for regex that are more efficient? - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)
 * I just had a thought. Many calls to   can be expensive, right?  But most of the time I only need a single Unicode character.  What if I...split a string into an array of characters first, and just reference the array's indices?  No dynamic linear behavior involved in retrieving an indexed Unicode code point from a byte string. - Gilgamesh~enwiki (talk) 02:00, 25 November 2019 (UTC)
 * Hm, yeah, maybe some Wikipedia articles could invoke the module enough to noticeably increase Lua time usage. There are quite a few words in that could have IPA transcriptions.
 * I certainly hope  doesn't do any conversion between UTF-8 and UTF-16. That would be madness. I found that the implementation of calls  in PHP, which calls, but I didn't figure out what it does to UTF-8.
 * The byte-based functions are the  library functions (the ones that can be called as methods on strings). They are much more efficient because they call directly into C and don't have to deal with UTF-8 or Unicode categories. But using ASCII replacements for the Unicode characters sounds like a bit of a pain; it could make the intermediate forms a bit harder to understand.
 * Yeah, using an array of characters should be cheaper if you're calling  to get multiple characters from the same string. To be super cheap, I would use  :  . — Eru·tuon 05:38, 25 November 2019 (UTC)
 * I'm increasingly wondering if UTF-16 isn't involved under the hood at all. But then, Unicode code point operations on UTF-8 data still means that the functions cannot know in advance which byte index contains which code point index, which means that it has to measure from the start of the string.  That means linear behavior, and that isn't much better than converting the whole string to UTF-16.
 * Anyway, the string-to-character-array code I had in mind was, called only once before a major   operation whose substitutor function would have otherwise needed   multiple times per match.  I hadn't considered your   approach before, but it looks interesting&mdash;might there be a way to expand it to work with three- and four-byte UTF-8 code points?
 * And yeah, trying to find an ASCII-based surrogate code has proven...challenging, to the point I think maybe I won't do it. I tried to design a Unicode-to-ASCII-to-Unicode cipher mostly based on, but it had its constraints, and a lot of X-SAMPA sequences use two or more ASCII characters where Unicode IPA would only use one code point.  It's fortunate I'm pretty knowledgeable in X-SAMPA, which greatly improved since I wrote an offline JS utility (downloadable here) that automatically converts X-SAMPA input to IPA as you type.  (I wrote it several years ago, and my coding conventions have certainly improved since then, so don't be too horrified if you view source.  If I could write the identical utility today, there would be so many things I'd change.  But I digress.)  So, to try to come up with a one-code-point-to-one-character cipher, I had to think of ways to simplify some sequences.   already has a one-to-one conversion with  , but when writing regex sequences,   would have to become  , so I could just replace it with   instead.  The secondary articulations is where it gets trickier, as the equivalents of  are  .  Since I only use  as a final phonetic presentation form, I could conceivably just use  , but it's again complicated where the X-SAMPA equivalent of  is  .  Lots of these little things call for lots of little simplifications, until you get to the point where the internal string   has a pseudo-X-SAMPA appearance of  , and...I end up kinda not wanting to go that route anymore.  Regex and the algorithm can already get complex enough without making the internal IPA so much harder to read. - Gilgamesh~enwiki (talk) 16:27, 25 November 2019 (UTC)
 * Oh, just now realized that your  does support three- and four-byte code points. - Gilgamesh~enwiki (talk) 16:36, 25 November 2019 (UTC)
 * Wait, your example code just grows an array by assigning new indices to the end of it? That seems bad to me from a JS background, where an array becomes much more inefficient unless you grow it with  .  You sure that doesn't hurt array storage efficiency on the JIT side?  (Or does Scribunto/Lua not use a JIT anyway?)  I'd probably find myself writing it with  's Lua equivalent,  . - Gilgamesh~enwiki (talk) 16:41, 25 November 2019 (UTC)
 * Huh... Okay, then, your approach is better. :) - Gilgamesh~enwiki (talk) 16:44, 25 November 2019 (UTC)
 * Hm, is it generally safe (and hopefully performs better) to use byte-string-based regex functions on UTF-8 strings in situations where it doesn't have to care how the Unicode code points are encoded? UTF-8 searches, UTF-8 replacements, etc.  It seems to me like it would only really get unsafe if you tried to mix non-ASCII characters into single-character regex logic (  etc.), as it would test for the byte rather than the codepoint.  But stuff like simple substring replacements and multi-character captures   could be fine even with UTF-8 code points included. - Gilgamesh~enwiki (talk) 17:02, 25 November 2019 (UTC)
 * isn't any more efficient than . As mentioned in the link, it's actually slower because of the two meanings that   has (  vs.  ). Scribunto doesn't use LuaJIT. It would probably improve performance to allocate the entire array at once with , but that requires knowing the number of code points and having a function that can return that many nils.
 * Yep, those are two cases in which the  library doesn't work with multi-byte characters; also several of the character classes like   are Unicode-dependent in the   library. I wrote a little about this at  and created Module:User:Erutuon/patterns, which contains a function that tests whether a pattern will match correctly (according to UTF-8 and Unicode semantics) in the   library functions.
 * I imagine that converting UTF-8 to UTF-16 and back requires memory allocation, so there should be a significant performance penalty if  is implemented that way. Certainly indexing UTF-8 by code point is slower than byte indexing, but I imagine with this decoding technique it could be fairly fast. — Eru·tuon
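The byte-based techniques discussed in this thread can be sketched with the standard <tt>string</tt> library alone. This is an assumption-laden illustration rather than the code that was posted: the function name <tt>toCodePoints</tt> and the sample word are made up, the pattern is one common way to split well-formed UTF-8, and it does not validate its input.

```lua
-- Split a UTF-8 string into an array of code points using byte-based gmatch.
-- Each match is one non-continuation byte followed by any continuation
-- bytes (128-191), so two-, three- and four-byte sequences are covered.
local function toCodePoints(text)
	local chars, n = {}, 0
	for char in string.gmatch(text, "[^\128-\191][\128-\191]*") do
		n = n + 1
		chars[n] = char  -- grow the array by assigning to the next index
	end
	return chars
end

-- "\196\129" is the UTF-8 encoding of U+0101 (a with macron).
local word = "\196\129ne"
local chars = toCodePoints(word)
print(#word, #chars)  -- 4 bytes, 3 code points

-- Safe with byte-based functions: a plain substring replacement.
local replaced = string.gsub(word, "\196\129", "a")
print(replaced)  -- "ane"

-- Not safe: a multi-byte character inside a bracket set is tested byte
-- by byte. U+0100 ("\196\128") shares its first byte with U+0101, so the
-- set below reports a match even though U+0101 is absent.
local falsePositive = string.find("\196\128", "[\196\129]")
print(falsePositive)  -- 1
```

Once the array exists, <tt>chars[i]</tt> is a constant-time lookup, which avoids repeatedly scanning the string from the start to find the i-th code point.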

I've given the theoretical Unicode-to-ASCII-pseudo-X-SAMPA cipher more thought, and I believe if I were to use it, it would look something like this: Because, on second thought,  is rather hard to read, but then, so is. These are internal formats, not display formats (even the internal IPA is pseudo-IPA), and at least X-SAMPA is well documented enough for a pseudo-X-SAMPA approach to be viable. I'm still working with code ideas offline. - Gilgamesh~enwiki (talk) 21:23, 26 November 2019 (UTC)

I've tried a variety of coding approaches, and I'm realizing there may be no real substitute for batches of regex. Regexp can be written fairly concisely, and the more bloated code becomes, the harder it is to read. And after multiple attempted rewrites, I've found that I've stopped writing comments to reduce mental gear-shifting. Well-written code doesn't need many comments anyway. I just want to write something that balances readability with efficiency. Fortunately, I've had decent success with the pseudo-X-SAMPA approach in concept, and I can minimize the use of UTF-8 regex functions and rely more on faster functions like. (At least I hope it's faster...) - Gilgamesh~enwiki (talk) 08:16, 2 December 2019 (UTC)
 * This revision does seem to be noticeably more efficient than this: about 1.7 seconds versus 2.7 or so. Since some of that is the less efficient Module:mh-pronunc, I guess the sandbox module takes 1.7 - 2.7 / 2, or 0.4 seconds. But there is a tradeoff between efficiency and readability. 20:34, 2 December 2019 (UTC)
 * I wonder...how are Lua's regular expressions functions implemented? ,  , etc.  I cringe to think that the engine has to compile a new regex edifice every time the regex code is passed to one of these functions.  I hope they are at least being cached between calls, either in an internal hashtable or attached to the internalized pattern strings themselves. - Gilgamesh~enwiki (talk) 02:08, 3 December 2019 (UTC)
 * Since Lua patterns are so much simpler than proper regular expressions, they're just interpreted. You can see the pattern-interpreting function used by all of the -library pattern-matching functions, except   when the   flag is set, here. — Eru·tuon 04:15, 3 December 2019 (UTC)
 * I see... I hadn't considered that.  Keeping it simple means implementing it simple. - Gilgamesh~enwiki (talk) 04:27, 3 December 2019 (UTC)
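The difference between an interpreted pattern and a plain search can be seen with standard `string.find`, whose fourth argument turns off pattern interpretation. A minimal sketch with a made-up input string:

```lua
local text = "a.b"

-- Interpreted as a pattern, "." matches any character, so the first match
-- is at position 1.
local pattern_start = string.find(text, ".")

-- With the fourth argument (plain) set to true, "." is a literal dot and the
-- pattern interpreter is bypassed.
local plain_start = string.find(text, ".", 1, true)
```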

I finished writing the new draft and ironing out the bugs, and replaced the non-sandbox version with it. How does the performance compare now with the previous version? - Gilgamesh~enwiki (talk) 21:32, 5 December 2019 (UTC)
 * Wow! Considerably faster for the whole testcases table: less than half a second. — Eru·tuon 22:52, 5 December 2019 (UTC)
 * Seems like a winner, then. And the code is readable?  The pseudo-X-SAMPA isn't too much trouble?  I had to deviate significantly for some symbols, like   which do not represent their conventional X-SAMPA counterparts, for the sake of being more regex-pattern-friendly and single-character-friendly.  The way I use them,   is actually,   is ,   and   are transitional representations of unsurfaced and surfaced glides,   is &#123;yi'y&#125; ,   is &#123;'yiy&#125; ,   is  (  isn't as readably regex-friendly),   is a dotless  that is friendlier to IPA tie bars, and   is the diacritic .  Otherwise (unless I've forgotten any), the symbols are the same as their X-SAMPA counterparts (or  -notated forms thereof), which are mostly the same as their IPA counterparts when they are plain Latin lowercase letters.  The system works well.  (Right now, in edit preview, it complains that  is invalid IPA, but the choice is really just to keep the tie bar from hovering so much higher than over other pairs of vowels when  is present&mdash; vs. .  If it proves problematic, it can be reverted to &mdash;I just wanted to polish the presentation a bit, which makes a difference with certain IPA typefaces like Gentium and certain browsers like Firefox.) - Gilgamesh~enwiki (talk) 01:35, 6 December 2019 (UTC)
 * It looks pretty readable to me, since I'm familiar with a fair amount of X-SAMPA.
 * An alternative to using the dotless i would be to use &#x035C; (U+035C COMBINING DOUBLE BREVE BELOW) if either of the two vowels is i: . I prefer that because the dotless i confuses me: it looks somewhat like, and I think I'm used to seeing the dot when there's a tie bar. The equals sign could be converted to the tie character above or below before the rest of the ASCII characters at the end. — Eru·tuon 04:40, 6 December 2019 (UTC)
 * That is a very good point. I think I'll do what you suggest. - Gilgamesh~enwiki (talk) 04:49, 6 December 2019 (UTC)
 * You know, it has been my conventional wisdom for decades that regular expressions are one of the slowest devices in scripting, and that practically any other conventional means of parsing text is preferable for speed. But that isn't always true, is it?  At least, not in Lua.  In some cases,   actually seems faster than trying to do the same thing procedurally, even if you try to do it all with arrays of one-character strings.  These calls are actually a lot faster than I gave them credit for&mdash;I knew they would be faster than , but not that they might actually be faster than my attempts to do the same thing procedurally.  I suppose it also helps that, this time, I eliminated most throwaway lookup tables, and instead generate them only once and cache them.
 * All that said...I still kinda hate Lua. Too many  s and  s and not enough curly braces, and arrays starting at   instead of   is consistently maddening.  I miss JavaScript.  Would love to write modules in modern JS. - Gilgamesh~enwiki (talk) 05:09, 6 December 2019 (UTC)
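The point about generating lookup tables once and reusing them can be sketched with `string.gsub` taking a table as its replacement. The mapping and function name below are invented for illustration and are not taken from Module:mh-pronunc.

```lua
-- Built once when the module is loaded, then reused by every call.
local vowel_map = { a = "A", e = "E", i = "I", o = "O", u = "U" }

local function upper_vowels(text)
    -- With a table as the replacement, gsub looks up each match as a key;
    -- a missing key (nil) leaves the matched text unchanged.
    -- The outer parentheses drop gsub's second return value (the count).
    return (string.gsub(text, "%a", vowel_map))
end
```

Because the table lives outside the function, no throwaway table is allocated per call.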

I made a small change that could significantly improve performance, at least for some regex replacements, but I don't know how well. The change is:
 local function string_gsub2(text, pattern, subst)
     local result = text
     result = string.gsub(result, pattern, subst)
     -- If it didn't change the first time, it won't change the second time.
     if result ~= text then
         result = string.gsub(result, pattern, subst)
     end
     return result
 end
Still looking for small ways I can improve efficiency. - Gilgamesh~enwiki (talk) 19:44, 21 January 2020 (UTC)
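As a usage sketch, the helper from the post is repeated here so the snippet runs on its own, together with a made-up input showing why a second pass can matter: `string.gsub` matches never overlap, so a match that shares a character with the previous one is only found on the next pass.

```lua
local function string_gsub2(text, pattern, subst)
    local result = string.gsub(text, pattern, subst)
    -- If nothing changed the first time, a second pass cannot change anything.
    if result ~= text then
        result = string.gsub(result, pattern, subst)
    end
    return result
end

-- The two "aXa" occurrences share the middle "a", so one pass replaces only
-- the first of them.
local once = string.gsub("aXaXa", "aXa", "a-a")
local twice = string_gsub2("aXaXa", "aXa", "a-a")
```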

toMOD
I wrote a simple new function,, that I need tested, perhaps with a new column in the table. It converts standard orthographic spelling to the format used by the Marshallese-English Online Dictionary, converting ĻļM̧m̧ŅņN̄n̄O̧o̧ to ḶḷṂṃṆṇÑñỌọ. This has potential applications in Marshallese reference templating, where a word in standard orthographic spelling can be automatically converted to MOD's spelling so that references can link directly to dictionary entry anchors on that site without us needing to directly embed a differently-spelt word in the external link. No such template has been written yet. It may be a good idea for each row of the "term" column and a potential MOD column to share a table cell where the forms have identical spelling. And, in any event, the separate MOD spelling should probably not link to a Wiktionary entry with that spelling, as it is and always was a non-standard alteration to Marshallese orthography which is largely limited to the MOD, Naan and associated media intended for offline distribution to available computers in the Marshall Islands. I imagine that, if the standard orthography were considered friendlier to older Windows and Mac computers and their available font rendering, MOD and Naan would be using the standard orthography out of the box, but for the time being they are what they are. - Gilgamesh~enwiki (talk) 07:44, 10 December 2019 (UTC)
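A conversion of this kind might look roughly like the sketch below. It is not the module's actual code: the name `to_mod` is hypothetical, and it assumes that Ļ, ļ, Ņ and ņ arrive precomposed while M̧, m̧, N̄, n̄, O̧ and o̧ arrive as a base letter plus a combining mark. Byte escapes are used so the source stays plain ASCII.

```lua
-- Each pair is { standard orthography, MOD spelling }, as UTF-8 bytes.
local to_mod_pairs = {
    { "\196\187",  "\225\184\182" }, -- U+013B to U+1E36 (L with dot below)
    { "\196\188",  "\225\184\183" }, -- U+013C to U+1E37
    { "M\204\167", "\225\185\130" }, -- M + U+0327 to U+1E42
    { "m\204\167", "\225\185\131" }, -- m + U+0327 to U+1E43
    { "\197\133",  "\225\185\134" }, -- U+0145 to U+1E46
    { "\197\134",  "\225\185\135" }, -- U+0146 to U+1E47
    { "N\204\132", "\195\145" },     -- N + U+0304 to U+00D1
    { "n\204\132", "\195\177" },     -- n + U+0304 to U+00F1
    { "O\204\167", "\225\187\140" }, -- O + U+0327 to U+1ECC
    { "o\204\167", "\225\187\141" }, -- o + U+0327 to U+1ECD
}

local function to_mod(text)
    -- None of these byte sequences contain pattern magic characters, so they
    -- can be passed to gsub directly.
    for _, pair in ipairs(to_mod_pairs) do
        text = string.gsub(text, pair[1], pair[2])
    end
    return text
end
```

Text in a different normalization form would need to be normalized first; that step is left out of this sketch.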
 * That is a useful function to have. I think it would be useful to display the MOD spelling in the entry, unlinked – that would allow people to search for the MOD spelling ( and find the entry, provided there's no entry for a homograph of the MOD spelling. — Eru·tuon 22:09, 10 December 2019 (UTC)
 * I thought most modern browsers allow Ctrl-F text searches that recognize letters and ignore diacritics. Right now I press Ctrl-F and type unmarked "lon" and it finds both of those words you just mentioned.  However, just displaying the MOD spelling in the entry might be doable...might need some new templates.  But I think I've been hesitant to dive into new Marshallese entry templating design too soon when there are still so many aspects of the language's grammar I don't fully understand.  For instance, all Marshallese adjectives are verbs, and beyond suspecting that adjectives are stative verbs (equivalent to English "to be &lt;adjective&gt;"), I don't know what else that actually means.  Yet for now, a Marshallese entry template doesn't have to be complicated&mdash;it can just redirect to the standard entry template, but display the MOD spelling as an alternate where they differ.
 * By the way, I've not yet figured out how to display actual wiki markup using Scribunto/Lua&mdash;everything I print out seems to be the same as the contents of &lt;nowiki&gt;&lt;/nowiki&gt;. If I knew how to write scripts that generate more complex wiki markup output, I might be able to migrate more of the functionality of  to a template.
 * It also occurs to me that Module:mh-pronunc is getting big, at over 30K now. Conventional wisdom suggests splitting it up into multiple scripts that can be imported into each other as needed, but then a multi-file project isn't as simple to mirror at Wikipedia.  (A copy exists at wikipedia:Module:mh-pronunc, and its comment at the top links back here.)  So maybe, the most portable, reusable portions could be maintained as one script, and more site-specific applications can be separate scripts that can stay on this wiki.  For instance, mh-ipa-rows is useful at Wiktionary but not so much at Wikipedia. - Gilgamesh~enwiki (talk) 03:04, 11 December 2019 (UTC)
 * Oh, by search I mean the search engine for Wiktionary. Right now is the 17th result in the search for, but if it is displayed in one of the templates, it should be higher in the results. I was thinking the MOD spelling could be displayed in the pronunciation template, but that isn't quite appropriate, and anyway alternative spelling entries probably need a MOD spelling, but might not have a pronunciation template. Probably the template that displays the MOD spelling should be placed in the Alternative forms section.
 * I've maintained a sort-of mirrored version of a set of Wiktionary modules on Wikipedia (Module:Unicode data), but the Wikipedia and Wiktionary versions have drifted apart in some ways; it's tedious copying the source code. It might be easier with a Pywikibot script, but I can't edit the Wikipedia module anymore because it's been template-protected. — Eru·tuon 04:05, 11 December 2019 (UTC)
 * I didn't realize that's what you meant&mdash;I put it in (newly-created and under-featured) for now.  At least the MOD spelling is being displayed, though.  And I don't think it may be the best idea to put the MOD spelling in an alternative forms section, because it may prompt a naive third-party editor to turn the unlinked term into a linked term and create a word entry.  My concern is that it may motivate an unnecessary duplication of many entries with the non-standard orthographic variants.  It also doesn't help that some sources for the language write Marshallese words without any diacritics, and it seems  was created from one of these sources as an unknowing duplicate of . - Gilgamesh~enwiki (talk) 08:05, 11 December 2019 (UTC)

If I may ask, could you please update the table? I was updating it manually, but then I added so many new entries that I got behind. Most of the new entries are words that start with &mdash;demonyms, mainly. - Gilgamesh~enwiki (talk) 05:08, 15 December 2019 (UTC)
 * Done. And finally the script is fully automatic: it reads the "excluded titles" list and updates the list of template input without me copy-pasting anything. — Eru·tuon 09:59, 15 December 2019 (UTC)
 * Thank you. What do you think of the state of the script and entries now?  It's still only a tiny selection of the language, but I've been trying to steadily add more words.  I'll also try to add words of phonological interest that help continue to refine the script. - Gilgamesh~enwiki (talk) 11:01, 15 December 2019 (UTC)

Overhauling Template:mh-head
Marshallese doesn't have all the complex noun cases of an agglutinative language, but it does have some inflected forms, and would seem to be the appropriate place to list these. I have an idea of what I want to accomplish, but it may require some additional Scribunto/Lua API I'm not that familiar with, since I think template-only logic would become unnecessarily bloated. I was wondering if you could help me write such a template and backing script. I need to figure out how vanilla creates its inflection list and handles the appropriate automatic categories with language-sensitive sorting keys, and how I can extend or replicate that in a script, with possibilities like default inflected forms, more than one of the same kind of inflected form, etc. I can conceptualize what I want to achieve, but API-wise I'm in over my head. - Gilgamesh~enwiki (talk) 02:14, 24 December 2019 (UTC)

I think I found some resources to start with, chiefly Module:headword. - Gilgamesh~enwiki (talk) 18:02, 24 December 2019 (UTC)
 * Yeah, the language-specific headword-line modules call  in Module:headword and if necessary   in Module:utilities to format extra categories that don't begin with the language name. In the Marshallese module there could be a main function that generates the MOD spelling and it can call one of the   to handle part-of-speech-specific stuff. I'm not sure what is a good module to base the Marshallese one on though. Much of Module:eo-headword is probably understandable because the morphology is simple at least. — Eru·tuon 19:52, 24 December 2019 (UTC)
 * Now that I understand the technical aspects better of implementing the template, I realize I still need a better understanding of the grammar, so I'll put it off for the time being. After all, I'm sure there may be all sorts of unforeseen errors in the Wiktionary entries that could be remedied with a better understanding of both Marshallese grammar and the MOD entry structure. - Gilgamesh~enwiki (talk) 05:04, 25 December 2019 (UTC)
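The structure suggested in this thread, a main function that works out the MOD spelling and then hands over to a part-of-speech-specific function, could be sketched as a dispatch table. Every name below is hypothetical, and the `data` table is a simplified stand-in rather than the format Module:headword actually expects.

```lua
-- Hypothetical part-of-speech handlers, keyed by part-of-speech name.
local pos_functions = {}

pos_functions["nouns"] = function(data)
    table.insert(data.categories, "Marshallese nouns")
end

pos_functions["verbs"] = function(data)
    table.insert(data.categories, "Marshallese verbs")
end

-- to_mod is passed in as a stand-in for the real spelling converter.
local function build_headword_data(term, pos, to_mod)
    local data = { head = term, mod_spelling = to_mod(term), categories = {} }
    local handler = pos_functions[pos]
    if handler then
        handler(data) -- part-of-speech-specific additions
    end
    return data
end
```

Adding a part of speech then only means adding one entry to `pos_functions`, without touching the main function.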

Distributive verbs
I think sometimes I forget just how much technical work you do here at Wiktionary, beyond just helping me with a Marshallese module. I created a new category, Category:Marshallese distributive verbs, but shows this category is not supported. What would be involved in creating new grammar categories? - Gilgamesh~enwiki (talk) 13:45, 14 January 2020 (UTC)

Some brief background: Marshallese distributive verbs basically modify a noun or verb with the rough inflected meaning of "there are a lot of [something]s." This particular grammatical form is demonstrated extensively in example sentences throughout the Marshallese-English Online Dictionary. - Gilgamesh~enwiki (talk) 13:53, 14 January 2020 (UTC)
 * The "distributive verbs" category should only be added to the category system (Module:category tree/poscatboiler/data/lemmas probably) if it's going to be used in other languages and the meaning is roughly the same for all of them – meaning if there are distributive verbs in another language with a different meaning, that doesn't allow us to have a single description for every language's distributive verbs category. At least to start with, it can have manual content. — Eru·tuon 23:38, 15 January 2020 (UTC)
 * That seems logical. Since I'm not specifically aware of distributive verbs being in any other language, I couldn't guarantee they would mean the same thing in those languages.  As it is, Marshallese already uses at least a few relatively exotic grammatical forms that only one or a few other languages use&mdash;for instance, besides Category:Marshallese noun construct forms, there's only Category:Hebrew noun construct forms as subcategories of Category:Noun construct forms by language.  Then there's also adjective verbs, which I initially categorized as Category:Marshallese adjectives, but then wondered if they shouldn't be better in Category:Marshallese stative verbs (there are no adjectives that are not verbs), when in reality these grammatical categories don't always easily fit in the existing conventional hierarchy, and I'm not proficient enough in the language myself to make confident decisions about their placement, and I fear I may be introducing errors that might have to be fixed in bulk at a later date. - Gilgamesh~enwiki (talk) 06:28, 16 January 2020 (UTC)

Wow, you are a busy bee. I think I have even greater respect for what you do here than I did even just 24 hours ago. As much as I would appreciate your continued feedback in my ongoing endeavors, I can still wait. - Gilgamesh~enwiki (talk) 23:28, 15 January 2020 (UTC)

Bug
There's a bug in the module's debug table, most noticeable with words whose Bender spellings start with "yiy" and a vowel. In line with references explaining how Marshallese words can be enunciated phoneme by phoneme, I'm testing an experimental enunciate-mode, where short prosodic breaks are inserted in the middle of consonant clusters. The problem is...the International Phonetic Alphabet specifies these as pipe characters |. I already tried hard-coding  in the module output, but it only looks like. So now I'm using a normal pipe character, but there's a bug in the way the module's debug table displays it. What's only displaying should actually be displaying  - Gilgamesh~enwiki (talk) 19:03, 16 January 2020 (UTC)
 * Fixed, in the testcases module, by escaping the pipes. They are part of template syntax, and in this case the stuff before the pipe was being treated as attributes for the table cell. — Eru·tuon 19:14, 16 January 2020 (UTC)
 * Thank you. :) - Gilgamesh~enwiki (talk) 19:25, 16 January 2020 (UTC)
 * Just FYI: it's unnecessary to ping someone on their talk page, because they already get a notification just from someone else editing their talk page. Chuck Entz (talk) 04:11, 17 January 2020 (UTC)
 * Ahh, good to know. - Gilgamesh~enwiki (talk) 06:37, 21 January 2020 (UTC)
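Escaping pipes before text goes into a wikitable cell might look like the sketch below. How the testcases module actually does it is not shown in this thread; the function name and the choice of the `&#124;` entity are assumptions.

```lua
-- Hypothetical helper: replace literal pipes so the wikitext parser does not
-- read the text before a pipe as table cell attributes.
local function escape_pipes(text)
    -- "|" is not a magic character in Lua patterns, and the replacement
    -- contains no "%", so no further escaping is needed.
    return (string.gsub(text, "|", "&#124;"))
end

-- The leading pipe is table syntax; the escaped one is content.
local cell = "| " .. escape_pipes("a|b")
```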

Ratak and Rālik specific word categories
How do I set this up? So things work in, and so forth. I know similar categories exist for Category:Indian English, Category:New Zealand English, etc. The and  dialects of Marshallese are mutually intelligible, and differ mainly by some regular variations in pronunciation reflex, and some vocabulary differences. But many of the different forms are often still written differently depending on dialect. For instance, "good" is the common stem,  is the Rālik reflex, and  is the Ratak reflex, but in both dialects the prothetic vowel vanishes if the stem takes a bare vowel prefix:   ( + ) means "good person." I want to start making articles for the stem forms, and have their dialect reflex entries (by spelling) automatically categorized through,  /  , etc. I should add that I don't know if the dialects themselves have supplemental language codes, the same way Tosk Albanian is "als" (Albanian, South) and Gheg Albanian is "aln" (Albanian, North).

I'm not sure what to name the categories, though&mdash;"Rālik Marshallese"? "Rālik dialect Marshallese"? "Rālik Chain Marshallese"? I'm not sure what the most stable nomenclature would be. In the Marshallese-English Online Dictionary, they're also frequently just called "Dial. W" and "Dial E.", since Rālik ("sunset") is the western chain and Ratak ("sunrise") is the eastern chain, but the two dialects' native isogloss line still runs between the two chains themselves.

I should probably additionally add...I'm not 100% sure that I know what I'm doing. It's one thing to know how templating and scripting languages work (which I increasingly know), and another thing entirely to know how existing templates and scripts are set up so I can extend them for specific editing needs. - Gilgamesh~enwiki (talk) 01:14, 20 January 2020 (UTC)
 * Categories for most language varieties are added to entries via Module:labels/data/subvarieties. You can add definitions for the labels and  there, with categories and linked display text if desired. Personally, I like the shorter category name: "Rālik Marshallese". The category page can explain what it means. It looks like there aren't ISO codes for Rālik and Ratak, but if they might be referred to in etymologies (for instance, {{temp|der|en|  prefix to the NEC parameters in the URL, to avoid collisions, and it's traditional to use hyphens in class names rather than underscores. I've made the script use   instead of a custom function.
 * I loaded the scripts, and some of the translation links are colored; but clicking the links doesn't show the NEC. Maybe I broke User:So9q/new-entry-creator.js when I edited it? — Eru·tuon 20:12, 4 November 2019 (UTC)
 * I just tested and it still works for me clicking translation links. Although for now CreateTranslation.js only supports fetching the first PoS. There is a bug with lang=code not being set also.--So9q (talk) 16:30, 6 November 2019 (UTC)
 * Oh, it's working now for me too. That's odd. — Eru·tuon 17:07, 6 November 2019 (UTC)

Adding aliases to Module:family tree
You've done a lot of work on this. Now that we have aliases for etymology languages, I'd like to display them, either in the family tree or in an info box, similar to what we have with. Maybe we should have for etymology language categories; currently these categories, when they exist, aren't standardized in name or contents. Benwing2 (talk) 05:40, 15 November 2019 (UTC)
 * I've thought of creating a template for etymology language categories, but I got hung up over an unresolved issue. At the moment, many etymology language categories just have a category for the canonical name (Category:Attic Greek), though there is also Category:Kölsch Central Franconian corresponding to Kölsch . Entries are added to the categories using and . Ideally lemmas and non-lemma forms would be in different categories, but I didn't know how to do that. It would be weird to have to specify lemmas or non-lemma forms in, like having  or  display as "(Epic)" but add different categories, and I didn't know how to accommodate that in Module:labels and couldn't think of another good way to add the categories. So I never came up with any kind of action plan. Maybe this issue doesn't have to be solved right away though. — Eru·tuon 19:52, 15 November 2019 (UTC)
 * One possibility is to allow etymology languages in, which knows about the POS and hence whether it's a lemma or not. The only other way I can think of without having the POS or lemma status marked explicitly in is for  to look through the page text, which is expensive and likely error-prone. Benwing2 (talk) 18:11, 16 November 2019 (UTC)

Χαῖρε! On 21st century Wiktionary we shouldn't perpetuate the biases of 19th century Englishmen; Doric is real Ancient Greek! Not a subdialect of Attic...

 * Χαῖρε, hello, nice to (virtually) meet you...


 * With regard to recent edits on ἅρπα I wasn't sure where to post this, I was just responding specifically vis-à-vis the Doric Greek morphology of ἅρπα but ran long touching on the broader subject of Greek dialects and their inclusion on Wiktionary, so I'll post this full comment on your talk page too...

Inqvisitor (talk) 08:24, 16 November 2019 (UTC)
 * Hi, it looks like your post in WT:RFVN is substantially the same. In future, please post in just one place. You can bring my attention to the post by including a link to my user page (Erutuon). That will send me a notification. — Eru·tuon 09:04, 16 November 2019 (UTC)

On the reversal of my edit on the article on ışık
You reverted my edit on the page ışık. Why is that? The declension adds nothing to the article (the nominative declension is the word itself and the accusative declension is already given in the  template: "ışık (definite accusative ışığı, plural ışıklar)"). In my opinion, the templates  and   shouldn't be used anywhere on Wiktionary as they provide no information that   doesn't already provide but only bloat the site. --Fytcha (talk) 18:16, 6 December 2019 (UTC)
 * There are a lot more forms in the table than just the definite accusative and the plural, but they are hidden by default. You've got to click two "more" buttons on the right side of the table to see them. — Eru·tuon 18:22, 6 December 2019 (UTC)

Another Rustacean :)
I noticed that you are working in Rust. It has become my favourite language recently, although for Wiktionary bot work I still use Python. —Rua (mew) 11:01, 9 December 2019 (UTC)
 * I've become quite fond of it as well, and now often miss features like return values from blocks and match blocks when programming in Lua. — Eru·tuon 19:36, 9 December 2019 (UTC)
 * I'm interested in things you dislike about Rust. I looked at it a while ago, and there was a lack of libs for doing standard stuff (talking to a database etc.), but that's probably changed in the meantime. - Jberkel 00:26, 10 December 2019 (UTC)
 * Yeah, the development is going pretty fast. Not just the language itself, but library infrastructure as well. —Rua (mew) 10:14, 10 December 2019 (UTC)

If you ever have time
I hate to bother you all the time. If you ever have time, could you check el:Module:sarritest The only person in el.wikt who knew Lua is now a 'vanished' user. sarri.greek (talk) 00:00, 11 December 2019 (UTC) Thank you so much! sarri.greek (talk) 18:48, 11 December 2019 (UTC)
 * Let me know if you need any more help or further explanation. — Eru·tuon 18:51, 11 December 2019 (UTC)
 * The basic ideas of Lua, I cannot grasp. I have tried all kinds of combinations of the words 'local', 'frame', but I cannot make the collective function.main work. It is just an exercise, it is not important.
 * One general question, if I may: When we have a module which produces declensions automatically like el:Module:κλίση/el/ουσιαστικό, is it better/preferable to do all the paradigms IN the Module? Or create wikitext Templates with the parameters for the endings? They are so many! and the Module page becomes so long! sarri.greek (talk) 16:08, 13 December 2019 (UTC)
 * It turns out I had reversed the logic for getting . That's not uncommon with me.
 * Do you mean separate templates for each declension? I suppose either way works, but I like to be able to edit all the paradigms at once and compare them, so having them in a single module helps. For Ancient Greek, the module is Module:grc-decl/decl/staticdata/paradigms. If each is in a separate template, then there are more pages to edit. — Eru·tuon 19:04, 13 December 2019 (UTC)
 * Thank you SO much. For the many pages of paradigmata: I was worried about what is best for ...errr... you call some actions 'expensive' or bad, or not good. I will study the examples you have shown me. sarri.greek (talk) 19:09, 13 December 2019 (UTC)
 * Ahh, I see. I'm not sure which is least expensive in memory and Lua processing time. — Eru·tuon 19:20, 13 December 2019 (UTC)

Req
Hi Erutuon. Can you run a bot to do this:
 * moving translations with ku code and Latin script to kmr code and Northern Kurdish dialect
 * moving translations with ku code and Arabic script to ckb code and Central Kurdish dialect

also this:
 * changing translations with ku code and Latin script to kmr code
 * changing translations with ku code and Arabic script to ckb code

also we shouldn't allow ppl to add translations with ku code; they should use Kurdish dialects codes (kmr, ckb, ...) instead of using ku code directly. Thanks.--Calak (talk) 16:50, 13 December 2019 (UTC)
 * Hmm, I know how to identify scripts, but don't have a method to modify translations yet. I can at least make a list to start with. — Eru·tuon 08:13, 14 December 2019 (UTC)
 * Oh, no! You don't need to modify translations, you should change "ku" code to "ckb" or "kmr" per its script.--Calak (talk) 11:15, 14 December 2019 (UTC)
 * Right, by modifying translations I mean moving translations from "Kurdish" to "Northern Kurdish" etc. while using the correct format (the first diff). For that, it would be nice to have a method that would move translation x from language a to language b and format everything correctly. It seems complicated though. Perhaps someone else has worked this out already. But I might be able to change language codes easily (the second diff). — Eru·tuon 22:25, 14 December 2019 (UTC)
 * OK. How about preventing people from using ku code in translations? Can you add code (in TranslationAdder gadget) to do this?--Calak (talk) 16:19, 15 December 2019 (UTC)
 * Hmm, perhaps the TranslationAdder could suggest inserting the translation under,  , or   instead of  ? I might be able to figure out how to do that but I've mostly stayed away from that gadget because its code confuses me. — Eru·tuon 09:14, 17 December 2019 (UTC)
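The script-based choice requested in this thread (Latin script to kmr, Arabic script to ckb) could be sketched as below. This is a simplification and not an existing bot or gadget function: the name is hypothetical, only the basic Arabic block U+0600-U+06FF is checked, and other Kurdish varieties written in Arabic script are not distinguished.

```lua
-- Hypothetical helper: pick a replacement for the "ku" code from the script
-- of the translation text.
local function kurdish_code_for(text)
    -- U+0600-U+06FF are encoded in UTF-8 with lead bytes 0xD8-0xDB.
    if string.find(text, "[\216-\219][\128-\191]") then
        return "ckb"
    elseif string.find(text, "[A-Za-z]") then
        return "kmr"
    end
    return nil -- undecided: leave the entry for manual review
end
```

Returning `nil` for undecided cases keeps such a conversion from guessing at entries that contain neither script.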

It is OK Erutuon. I will be thankful if you can apply any one of them.--Calak (talk) 07:12, 21 December 2019 (UTC)

Reverted Edit
Hello, it is not an "odd alternative pronunciation". Several million people pronounce it that way, whereas the mispronunciation of "decade" has about five variants on the site for about 10 speakers. ABAlphaBeta (talk) 08:39, 17 December 2019 (UTC)
 * I'm sorry for my hasty reversion. I've restored the alternative pronunciation that you probably meant (as User:Mellohi! pointed out to me), but moved it into : . I know very little about the fine details of French pronunciation and you may be right. Words with (or ultimately derived from ) are transcribed with either  or  on Wiktionary, and while the soundfiles of  on the French Wiktionary and on Forvo has, perhaps some people pronounce it with  like  and other words because it may be as confusing for French speakers as it is for foreigners like me. — Eru·tuon 09:10, 17 December 2019 (UTC)

Deletion reasons
Hi. In October, you added "Incorrect title: a mixture of Latin- and Cyrillic-script characters". Do you think this could be merged into the existing "Bad entry title"? How do they differ? Equinox ◑ 08:05, 20 December 2019 (UTC)
 * Well, it's certainly a subtype, but I prefer to be clear since it's not always easy to see what's wrong with the title. I was thinking maybe something like "mixed script" or "incorrect lookalike characters" would work as well. At the time there was a backlog of these titles, and I was getting tired of re-entering the deletion reason since the "content: ..." bit prevented the input box history from working. But perhaps it won't be needed now that there's this abuse filter. It displays a message showing which characters are in which script, which seems to enable editors to create the entry at the right title, so there aren't any new badly titled entries to delete. — Eru·tuon 08:33, 20 December 2019 (UTC)
 * Yeah, went and removed it. — Eru·tuon 08:56, 20 December 2019 (UTC)
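The kind of check involved in catching mixed-script titles can be sketched in plain Lua. This is not the abuse filter's actual logic: the function name is hypothetical, and only ASCII Latin letters and the basic Cyrillic block U+0400-U+04FF are considered.

```lua
-- Hypothetical check: does a title mix Latin and Cyrillic letters?
local function mixes_latin_and_cyrillic(title)
    local has_latin = string.find(title, "[A-Za-z]") ~= nil
    -- U+0400-U+04FF are encoded in UTF-8 with lead bytes 0xD0-0xD3.
    local has_cyrillic = string.find(title, "[\208-\211][\128-\191]") ~= nil
    return has_latin and has_cyrillic
end

-- Latin "c", Cyrillic U+0430, Latin "t": looks like "cat" but is mixed.
local lookalike = "c\208\176t"
```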

Help needed at simple.wikt
Hi Erutuon, can you help me with the Lua Module:number list on simple.wikt? Minorax (talk) 05:10, 29 December 2019 (UTC)
 * Sure... I did fix one problem that caused a module error. — Eru·tuon 05:30, 29 December 2019 (UTC)
 * So that was the problem, forgot about that. Thank you :) Minorax (talk) 05:37, 29 December 2019 (UTC)
 * And since simple.wikt only contains English words, Module:number list/data/en isn't really needed as a subset of the module, is it possible to merge it into the main module? Minorax (talk) 05:41, 29 December 2019 (UTC)
 * It's possible, but I wouldn't recommend it. Putting data in the main module adds many lines, making it harder to edit, and if you want to keep the Simple Wiktionary module in sync with the English Wiktionary module, it will be harder to copy code. — Eru·tuon 05:51, 29 December 2019 (UTC)
 * Alright :) Minorax (talk) 05:52, 29 December 2019 (UTC)