Wiktionary:Beer parlour/2022/November

Do we really need Category:Portuguese adjective feminine forms etc.?
We have Category:Adjective feminine forms by language and Category:Adjective plural forms by language for certain languages, esp. Romance languages. Do we really need these categories? Do they add anything useful? In general we don't categorize non-lemma forms according to their inflectional properties, so I'm not sure why we're doing it here. Benwing2 (talk) 05:32, 1 November 2022 (UTC)


 * Do we really need categories by etymology?
 * The information can be useful for the collection of oddities. For example, cat:Welsh adjective plural forms collects plurals that are distinct from the masculine singular, very much a minority of Welsh adjectives.  Now, the current method of collection leaves a great deal to be desired.  One needs to know that plural forms should be categorised as 'adjective plural form' rather than 'adjective form' via the PoS headline, which is not mentioned in WT:About Welsh.  Consequently, the category is much shorter than it should be, omitting for instance.
 * In this case, better coverage would be obtained by generating a category 'Welsh adjectives with distinct plural', though there may be some awkward corner cases. A specific 'form of' template would also work, though there is the problem of training editors to choose the right template.
 * Category:Hebrew adjective feminine forms could likewise be useful, if one can restrict the display to feminines ending in taw. RichardW57m (talk) 10:59, 1 November 2022 (UTC)
 * If we want to categorize irregular / unexpected forms, it would be better to add something like "irregular" to the category names (and update the contents); as it is, Category:Portuguese adjective feminine forms, with its combination of regular singular and plural form-of soft redirects, which swamp any irregular forms that may be in there, seems kinda useless. Probably we should also rename the Welsh category something like "...irregular plural forms" or "...distinct plural forms" instead of just "...plural forms" for consistency, although if regular plural forms are identical to the singular and wouldn't be categorized at all (since we seem to in general not put "inflected form of itself" sense lines on pages), the need is less pressing. - -sche (discuss) 16:35, 1 November 2022 (UTC)
 * The Portuguese case is more difficult, but for the Hebrew case one can use a search such as:
 * incategory:"Hebrew adjective feminine forms" intitle:/...*ת/
 * Unfortunately, regular expressions seem not to support anchors at all. Let us not make the best the enemy of the good. --RichardW57m (talk) 10:14, 3 November 2022 (UTC)
 * I say delete them. may want to weigh in. Ultimateria (talk) 02:35, 2 November 2022 (UTC)
 * While we're at it, why do we split lemmas by part of speech? --RichardW57m (talk) 10:28, 3 November 2022 (UTC)
 * You mean categories like "English nouns"? I find those very useful for filtering searches. I regularly include or exclude results by part of speech category. Ultimateria (talk) 03:34, 5 November 2022 (UTC)
 * I agree with User:-sche here; categories like this are only useful if they track only irregular forms (and have the appropriate name). Tracking all forms (the vast majority of which will be regular) isn't terribly helpful. Benwing2 (talk) 02:39, 2 November 2022 (UTC)
 * I don't think I have any arguments to offer in favour of retaining them, but I agree that there are situations in Welsh and Hebrew (and probably others) where subcategories of adjective forms might be a good idea even if the general concept is discarded. embryomystic (talk) 01:42, 3 November 2022 (UTC)
 * It would be good to hear from the creators such as before we trash their work on templates and modules. --RichardW57m (talk) 10:27, 3 November 2022 (UTC)
 * Speaking only for myself — I don't have strong feelings either way. I don't actually remember doing work on templates and modules to support these categories, but whatever it was that I did, I imagine that most of it would have been needed anyway in order to show the right display text. —Ruakh TALK 19:14, 11 November 2022 (UTC)
 * You are I take it aware that the concepts of a regular Welsh plural noun and of a regular masculine Arabic plural are dubious, just like the concept of the regular perfect of a Latin 3rd conjugation verb. --RichardW57m (talk) 10:27, 3 November 2022 (UTC)
 * I am not proposing to "trash" Hebrew and Welsh template or module work. It's not even strictly necessary to eliminate all categories named "adjective feminine forms" and "adjective plural forms" etc. But I see no benefit at all to keeping these categories for Romance languages; do you? BTW there do exist regular Arabic masculine plurals (aka "sound masculine plurals"). The irregular ones you're thinking of are broken plurals, and IMO the categories should be named as such, i.e. in a language-specific manner. In fact, we do have such categories; take a look for example at Category:Arabic nouns by inflection type and you'll see a lot of them. Benwing2 (talk) 04:17, 5 November 2022 (UTC)
 * These categories are populated by templates and modules. If the categories are deleted, an implicit invitation to recreate (namely, a red link) will be sent to everyone who is shown the categories of a page being placed in them. The only way to permanently remove the categories requires changing the templates and modules.  Now, it may be simple to orphan the categories by adjusting the code invoked by, but that strikes me as a retrograde step if the categories still exist.  Once objects are no longer placed in them, these categories will be caught up in the regular slaughter of empty categories.
 * Eliminating these categories for Romance languages is extra work - and I'm not sure that French adjective plurals in -x are not of interest. (Unfortunately, anchors are currently missing from regular expressions in searches - someone should raise a Phabricator ticket to add them.) --RichardW57m (talk) 10:09, 7 November 2022 (UTC)
 * OT: Arabic sound masculine plurals are just one, circumscribed option - it's hard to describe them as the 'regular' form, except when the singular fits certain patterns of derivation, and there are also predictable broken plurals, e.g. for diminutives. --RichardW57m (talk) 11:57, 7 November 2022 (UTC)
 * I absolutely do not understand your objection concerning eliminating the Romance categories given that no one else is in favor of keeping them. "It's extra work to get rid of them" is a pretty questionable reason for keeping them (and in any case the actual work is trivial). Benwing2 (talk) 02:15, 8 November 2022 (UTC)
 * I suspect you're uttering an untruth. Not all users read the Beer Parlour every week.  Should you even expect non-editors to read the Beer Parlour at all?  You haven't even announced the threat to delete them on the category pages themselves!  And how do you propose to eliminate these categories properly?  How to restore them isn't obvious to everyone - your knowledge of the systems employed is excellent, but the systems are not well documented.  Indeed, different languages do the same thing differently. --RichardW57m (talk) 11:21, 8 November 2022 (UTC)
 * Now, some of their functionality should be addressed better. But one should put the alternative functionality in place before deleting the old. --RichardW57m (talk) 11:21, 8 November 2022 (UTC)
 * Damn you are blustery. I want to at least eliminate the Portuguese adjective categories, which are populated only partially and only when you use adj form of, and you are acting like the gatekeeper of all category changes. You haven't given a single reason why these categories are actually useful and worth the maintenance burden (which falls on people like me, not you). Please let me know why you are so desperate to keep them -- do *you* actually use them? Or is this just a sort of "nothing should ever be removed because someone might possibly find them useful"? Benwing2 (talk) 02:13, 14 November 2022 (UTC)
 * It is partly that someone went to the trouble of creating them; I'd be a lot happier if on reconsideration they accepted that it was not useful work. I'd be a lot happier if you put notices on the categories you want to get rid of alerting any who actually use them of the categories' imminent removal.  In general, these adjective categories are being generated by two routes - the -type route, as in, and by direct invocations of .  I don't see any saving in eliminating the former for Portuguese; you eliminate one occurrence of "romance_adjective_categorization," in Module:form_of/cats.  Now, there would be a saving of maintenance effort if you eliminated the categorisation of entries by inflection for gender and number, but if that is what you are proposing, say so. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)
 * Having looked into this mechanism, I am now wondering if it could actually be useful when revising inflection tables and reducing the number of forms. Several cases spring to mind for the dative singular of Pali a-stems:
 * The ending in -tthaṃ does not actually seem to be a case-ending, and one day we may be able to get rid of it. (There was no mechanism to formally challenge it.)  By that time, there may be senses of words ending in -tthaṃ claiming them as dative singulars.  We would then need to eliminate or redescribe them.  However, for this one, it might actually be quicker to search for noun and adjective forms in -tthaṃ, and rely on such forms having entries in the Roman script.
 * We may be overstating the number of masculine and neuter dative singulars in -āya. This is not a rare form for feminines, being used for several cases.  We may therefore need to revise such case forms when entered as terms.
 * At present, we distinguish Pali datives from genitives by their meaning. In Prakrit, the criterion is form, and therefore many words lack datives entirely.  If we switched Pali to the same treatment as Prakrit, what are currently described as dative/genitive case forms will have to be redescribed.
 * It would seem a shame for categorisation by inflection to have to be re-implemented for such filtering. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)
 * I am already using some Pali verb forms as the basis of categorisation, but categorising the lemma rather than the inflected form, and classifying the categories as maintenance categories. The problem is that the textbooks help one recognise a form, rather than tell one if it does not exist.  --RichardW57m (talk) 13:24, 14 November 2022 (UTC)
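The Hebrew search discussed in this thread (feminine forms ending in taw) runs into the lack of anchors in the on-wiki regular-expression search. A minimal sketch of one workaround, assuming the titles of the category members have already been obtained by some other means (the sample titles and the function names below are hypothetical, not part of any Wiktionary tooling): filter the titles locally with an end-anchored pattern.

```python
import re

# U+05EA is HEBREW LETTER TAV. \Z anchors the match at the very end of the
# title, which is the part the on-wiki intitle:/.../ search cannot express.
ENDS_IN_TAV = re.compile("\u05EA\\Z")


def ends_in_tav(title):
    """Return True if the page title ends in the letter tav."""
    return ENDS_IN_TAV.search(title) is not None


def filter_titles(titles):
    """Keep only the titles that end in tav, preserving their order."""
    return [t for t in titles if ends_in_tav(t)]


if __name__ == "__main__":
    # Hypothetical sample; a real list would come from the category itself.
    sample = [
        "\u05D9\u05E4\u05D5\u05EA",        # ends in tav
        "\u05D2\u05D3\u05D5\u05DC\u05D4",  # ends in he
    ]
    for title in filter_titles(sample):
        print(title)
```

The same approach would presumably work for other suffix-based checks raised here, such as French plurals in -x, by swapping the pattern.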

Duplicated words in Category borrowed
notifying. It is me again, about borrowed terms. Example: under Category:Greek terms borrowed from French, the members of subcategories Category:Greek learned borrowings from French, the Cat:unadapted & the Cat:obor, etc are duplicated. They appear twice, in the 2 categories. So, we cannot tell, which ones have the template. The terms under calques and semantic loans are OK, they do not appear twice. Same happens at e.g. Category:English terms borrowed from French, and so on. For Greek languages, perhaps others too, the template is very significant and distinct from the other templates. It would be great if this duplication could be avoided? Thank you. ‑‑Sarri.greek ♫ I 07:58, 1 November 2022 (UTC)
 * I guess you're requesting that template lbor does not categorize into 'DEST terms borrowed from SOURCE' but only 'DEST learned borrowings from SOURCE'? My original logic for categorizing into both is that a learned borrowing is still a borrowing, and if you remove them from the parent category, it would be easy for a new Wiktionary user to miss the fact that they also have to look in all child subcategories to find all borrowings. Also there was a vote in favor of including 'DEST terms borrowed from SOURCE' also in 'DEST terms derived from SOURCE', and this is in the spirit of that vote. OTOH I suppose this same argument could potentially be made for including all terms in all subcategories in all their parent categories, which might be undesirable. Benwing2 (talk) 02:48, 2 November 2022 (UTC)
 * No, this time I do not request any template (! I changed my mind, since en.wikt thinks differently.). As I have seen here and there, there are 2 kinds of categories:
 * 1) The index-like categories (all the members of all subcategories can be viewed there) (Probably they should have a different name too: Index:C....)
 * 2) and the 'non-index' ones which are
 * 2a) either empty, and only subcategories can be seen
 * or 2b) subcategories have their hyponyms, + we view in the general Cat the words which have no characteristic of a hyponym.
 * The above e.g. Category:Greek learned borrowings from French and the similar are a bit sloppy in the sense that there is no way to spot the {bor} = the ones that are NOT hyponyms (I have understood that in Eng. dictionaries, the {bor} is a general word and means no specific kind of borrowing). So, the structure 1. or 2b (I would love to have the 2b, because it serves other languages too, which need to separate {bor} from {lbor} {ubor} ...). I am sorry that I cannot express myself a bit better from the linguistics side of things. Thank you for your attention. ‑‑Sarri.greek ♫ I 03:00, 2 November 2022 (UTC)

Ecclesiastical Latin vs. Medieval and New Latin
For purposes of classification what's the difference between them meant to be exactly on WT? The definitions currently on the category pages are (Ecclesiastical Latin) "a form of Latin initially developed to discuss Christian thought and later used as a lingua franca by the Medieval and Early Modern upper class of Europe"; (New Latin) "a revival in the use of Latin in original, scholarly, and scientific works since c. 1375/1500"; (Medieval Latin) "a primarily written form of Latin used across Europe in the Middle Ages". The definition of Ecclesiastical Latin is the sticking point here since it makes it synonymous with, or a collective term for, Medieval and New Latin, or weirdly implies that the latter are basilects (not "upper class").

My own thought, which seems to better reflect the terms that are actually in the category and how I've used it as a label myself, is that Ecclesiastical Latin should be limited to terms with a specifically liturgical or theological bearing, especially ones that have been current in the Catholic Church up to the contemporary era (apart from the liturgy, many Catholic specialist journals were still written in Latin up to the mid-20th century). The "lingua franca" stuff should be dropped from the description—Ecclesiastical Latin is Latin used by the Church, not just "the upper class" and not specifically in medieval or early modern times. —Al-Muqanna المقنع (talk) 12:03, 1 November 2022 (UTC)


 * Do we need a category for Ecclesiastical at all? As you mention, it spans multiple periods in history. It almost amounts to a topic label, such as 'food' or 'types of potato'. Nicodene (talk) 14:07, 1 November 2022 (UTC)


 * I tend to agree actually, it would make more sense to just have straightforward chronological categories and use Category:la:Theology, Category:la:Bible, Category:la:Christianity etc. where appropriate, and maybe treat existing "Ecclesiastical Latin" labels as meaning "post-Classical". I was thinking about this when I made, which is very much a theological term but a Protestant one (the term is Calvin's and both of my Latin citations are from Lutherans)—is there "Protestant Ecclesiastical Latin", or should it just be listed as New Latin? Might be easier to avoid the question and just use Medieval/Renaissance/New with topics as appropriate. —Al-Muqanna المقنع (talk) 14:14, 1 November 2022 (UTC)
 * How much does it cost us to maintain these labels and categories? If all we get is a bit of tidiness, it doesn't seem worthwhile to suppress the information reflected in the labels and categories. Not all of our category groups are mutually exclusive and collectively exhaustive, nor should they have to be. DCDuring (talk) 15:05, 1 November 2022 (UTC)
 * The problem isn't tidiness, it's that it isn't clear what the label is actually intended to mean, and the description of the category (which is also the intro of the Wikipedia page the label links to) contradicts how it's used in practice. I don't mind if it's kept with an explanation, e.g. along the lines of my suggestion above (Latin as used by the Church, up to the contemporary age). But I am sympathetic to Nicodene's point to the extent that getting rid of the term would not actually suppress any information, since as actually used it doesn't seem to contribute anything that wouldn't be covered by a chronological + topical combination like "New Latin, theology" and the like. —Al-Muqanna المقنع (talk) 16:44, 1 November 2022 (UTC)
 * Exactly. The meaning isn't tidy.
 * It certainly doesn't contribute anything to someone not interested in what it might mean. Is it really true that all Ecclesiastical Latin is about academic theology, rather than, say, maintenance of churches, canon law, or the conduct of rituals? Has anyone knowledgeable taken a good look at how the labels are actually used? What was the source of the labels? How did the source apply them? Is "Ecclesiastical Latin" actually used only for terms used in theological discourse? Do we have anyone who respects the subject(s) enough to make an improvement on the current labels and categories? Ecclesiastical Latin seems to have had more uniformity than, say, scientific, literary, legal or medical Latins. Doesn't that add to the value of the existing label? DCDuring (talk) 22:55, 1 November 2022 (UTC)
 * A lot of it relates to law, and it's not entirely appropriate to put the word "ecclesiastical" on that. Yes, much of it obviously was used in that way by the church, but certainly not exclusively. Theknightwho (talk) 22:57, 1 November 2022 (UTC)
 * I think I get your point a little better, but I'm not concerned about e.g. the use of "Ecclesiastical Latin" in etymology sections and the like, imported from dictionaries, although those could be more precise in some cases. I am myself a specialist and I add terms that I come across in primary sources. It isn't clear to me when "Ecclesiastical Latin" should be applied to a term that is being added, or, conversely, what it means when someone else adds one, because our definition of the term is poor. I imagine for a non-specialist it would be even less helpful. So, I think it would probably be good to clarify how we are using it. If you're asking for someone knowledgeable to take a look, well, I am here and taking a look at it, hence this thread. "Ecclesiastical Latin" of course does not only apply to academic theology, hence my point above about theological or liturgical bearing and my suggestion to describe it expansively as language used in relation to Church matters and especially terms that are not obviously circumscribed by era.
 * I do disagree, as a point of fact, that "Ecclesiastical Latin seems to have had more uniformity than, say, scientific, literary, legal or medical Latins": I think FWIW that in practice precisely the opposite is true. Law Latin developed over a much shorter period, is entirely technical and constituted more of a pan-European argot because the study of law was dominated by a small number of institutions (Orléans, Bologna) from the time of the reintroduction of the Corpus Iuris Civilis. By contrast, liturgy in the Middle Ages was not developed by technicians and, before the advent of printing, Trent, and , the language of clergy reflected a much more diverse set of local practices, often developed diocese by diocese. Anyway, all this is just to say that I think we should decide on an in-house definition of Ecclesiastical Latin that can be applied with reasonable consistency and can be explained to non-specialist readers, rather than just point to or copy what's on the Wikipedia page, which is fine as it is but wasn't written with a dictionary in mind. —Al-Muqanna المقنع (talk) 00:30, 2 November 2022 (UTC)
 * It may be relevant to this discussion to note that there is an official Vatican body responsible for (among other things) creating a dictionary of neologisms for modern concepts, which are likely often not used, but are probably incorporated into the official Latin translations of Vatican documents. Most of these are not ecclesiastical terms per se, but I would think they are primarily used in ecclesiastical contexts (papal encyclicals and the like). Andrew Sheedy (talk) 04:20, 2 November 2022 (UTC)
 * That's worth noting, for sure. Our current definition of EL focuses on medieval and early modern usage, and sometimes dictionaries use it just to mean "Medieval Latin": but that's a very different beast from Latin as used by the Vatican now. I think my "era-independent" suggestion would encompass that better. —Al-Muqanna المقنع (talk) 19:46, 3 November 2022 (UTC)

Pre-Proto-Mongolic
Modern literature on Mongolic languages tends to make a distinction between Proto-Mongolic (the direct ancestor to Middle Mongolian, spoken between the 10th/11th and 13th centuries) and Pre-Proto-Mongolic, the ancestor to that language, tracing back to approximately the 5th century. Although Proto-Mongolic and Pre-Proto-Mongolic are both unattested, the distinction does still matter, as they're reconstructed by very different means: Proto-Mongolic is primarily reconstructed from extant (and attested) languages within the Mongolic family (though obviously with Turkic, Tungusic and Sino-Tibetan influence where appropriate). On the other hand, Pre-Proto-Mongolic is only possible to reconstruct externally (i.e. indirectly), from what we can infer from known/suspected contact with other language families at the time, and then cross-comparing to what we know about Proto-Mongolic + later developments.

Obviously the number of Pre-Proto-Mongolic lemmas is inevitably going to be quite small for a very long time, but I think the difference between the two is significant enough that it warrants creating a separate L2. For comparison, Pre-Proto-Mongolic would be (near-)contemporary with Old Turkic. Theknightwho (talk) 16:53, 1 November 2022 (UTC)


 * Unless there are descendants of Pre-Proto-Mongolic other than Proto-Mongolic, it seems quite shaky to reconstruct it at all. You can always give the reconstructed older forms (with appropriate references) in the etymology sections of Proto-Mongolic, there's no need to make separate lemmas for them. Thadh (talk) 17:05, 2 November 2022 (UTC)
 * @Thadh We do know that there was some influence of Pre-Proto-Mongolic during that period, which is how we are able to do any reconstructions. I would also feel uncomfortable adding reconstructions under a name not used for them outside of Wiktionary. Theknightwho (talk) 17:09, 2 November 2022 (UTC)
 * Even so, reconstructing purely on the basis of (supposed) loanwords is... eh. And I'm not saying you should add PPM lemmas under the name of PM, I'm rather referring to things like, where the earlier stage (early Proto-Finnic, or pre-Proto-Finnic, if you wish) is given in the etymology section. Same thing is also widely done for Pre-Germanic. No need to make links out of them. Thadh (talk) 17:13, 2 November 2022 (UTC)
 * @Thadh I should probably have mentioned that much of this comes from the attempted reconstruction of, which is a , of which Proto-Mongolic is only one (or is its sister family, depending on which academic you talk to). Although this is tentative (and I'm unsure quite how many actual pages we can be confident enough to create), there are certainly a small handful. Theknightwho (talk) 18:33, 4 November 2022 (UTC)
 * Usually, creating full-fledged codes for proto-languages that contain just one more descendant than another code has not provided fantastic results here on Wiktionary - take Proto-Polynesian (compared to Proto-Nuclear Polynesian) and Proto-Semitic (compared to Proto-West Semitic) - usually the former is identical to the latter and people just link the older language making the whole categorisation and lemmatisation a mess. Thadh (talk) 20:40, 4 November 2022 (UTC)
 * I'm not sure that would happen here. There aren't many PPM reconstructions, compared to the large number of reconstructions for PM. Theknightwho (talk) 21:19, 4 November 2022 (UTC)
 * Support. AG202 (talk) 05:39, 4 November 2022 (UTC)
 * I don't think it's terribly necessary if we are talking about loans from let's say Proto-Turkic into pre-Proto-Mongolic. 's example of shows how to illustrate the etymology elegantly without an additional entry for pre-Proto-Finnic. In such a case, we can include the Proto-Mongolic form among the descendants of the Proto-Turkic reconstruction, thus reciprocally linking the two forms to each other.
 * The opposite case is more interesting if let's say again Proto-Turkic borrowed from pre-Proto-Mongolic, i.e. if the Proto-Turkic form cannot be derived from Proto-Mongolic but definitely reflects an earlier form preceding the Proto-Mongolic stage (similar to pre-Grimm's law Germanic borrowings into Finnic). In the etymology of the Proto-Turkic form, we could mention the putative pre-Proto-Mongolic form and link to the Proto-Mongolic reconstruction derived from the latter, but we cannot include the Proto-Turkic reconstruction among the descendants of the Proto-Mongolic entry. In such a case (to ensure reciprocal linking), pre-Proto-Mongolic entries make sense. –Austronesier (talk) 18:36, 4 November 2022 (UTC)
 * I think we could bend the rules a little and give the Proto-Turkic descendant on the Proto-Mongolian entry with a necessary qualifier From earlier *PPM_form: in the descendants section, something like on (there are much better examples but I can't come up with one off the top of my head and I think the premise is quite clear here). Thadh (talk) 20:34, 4 November 2022 (UTC)
 * @Thadh I'm not sure I understand why that should be necessary, instead of doing it properly. Theknightwho (talk) 21:16, 4 November 2022 (UTC)
 * In practice, how many entries will we get for pre-Proto-Mongolic as donor? –Austronesier (talk) 21:20, 4 November 2022 (UTC)
 * @Austronesier I wouldn't say very many - at least not at this stage. We're probably looking at 20 reconstructions which are possible at all, which theoretically could be used on the pages for about 10 languages each. Theknightwho (talk) 21:32, 4 November 2022 (UTC)
 * That is doing it properly. Reconstructions of languages based on borrowings are very speculative, and we don't host terms that would normally have two (**) or even three (***) asterisks.
 * If we're aiming at a language with under thirty terms that can be (relatively) safely reconstructed, while having a solid reconstruction of a descendant that is also the ancestor of all its other descendants, then just adding this note to thirty lemmas out of hundreds of potential pages isn't a problem and saves space and a lot of headache.
 * If we're talking about an actual solidly reconstructed language with a lot of reconstructions, then that means that pretty much any modern Mongolic term will need to have one more code added to its etymology, and that's becoming bothersome on that end. Thadh (talk) 21:25, 4 November 2022 (UTC)
 * So your argument is that if there aren't many there's no point, and if there are lots then it's too much work? Hmm. Forgive me if I'm misunderstanding you there. Theknightwho (talk) 21:29, 4 November 2022 (UTC)
 * I'm saying if there's few then there's no point, and if there are lots it may be better to just switch to generally giving the older form instead of the newer in the reconstructions. Thadh (talk) 16:33, 6 November 2022 (UTC)
 * - Even the reconstruction of Proto-Mongolic is tentative and based upon a handful of works. There is no consensus on the reconstruction of Pre-PM and indeed the reconstruction of the Khitan sound system itself is still in its early phases. The needs of linking Turkic and Tungusic cognates and Khitan entries can be well served by the PM pages themselves. Hromi duabh (talk) 14:28, 25 November 2022 (UTC)

Apply for Funding through the Movement Strategy Community Engagement Package to Support Your Community
The Wikimedia Movement Strategy implementation is a collaborative effort for all Wikimedians. Movement Strategy Implementation Grants support projects that take the current state of a Movement Strategy Initiative and push it one step forward. If you are looking for an example or some guide on how to engage your community further on Movement Strategy and the Movement Strategy Implementation Grants specifically, you may find this community engagement package helpful.

The goal of this community engagement package is to support more people to access the funding they might need for the implementation work. By becoming a recipient of this grant, you will be able to support other community members to develop further grant applications that fit with your local contexts to benefit your own communities. With this package, the hope is to break down language barriers and to ensure community members have needed information on Movement Strategy to connect with each other. Movement Strategy is a two-way exchange; we can always learn more from the experiences and knowledge of Wikimedians everywhere. We can train and support our peers by using this package, so more people can make use of this great funding opportunity.

If this information interests you or if you have any further thoughts or questions, please do not hesitate to reach out to us as your regional facilitators to discuss further. We will be more than happy to support you. When you are ready, follow the steps on this page to apply. We look forward to receiving your application.

Best regards, Movement Strategy and Governance Team Wikimedia Foundation Mervat (WMF) (talk) 13:49, 2 November 2022 (UTC)

Braille
I propose we move Braille from Translingual to Alt Forms of the appropriate L2s and create something like. Braille entries, as they are, are a mess. @Binarystep @AG202 @Thadh, and anyone else interested. Vininn126 (talk) 14:44, 2 November 2022 (UTC)


 * That seems sensible for many entries. Can it be automated? —Justin ( koavf ) ❤T☮C☺M☯ 14:49, 2 November 2022 (UTC)


 * Isn't some Braille translingual? Maybe numeric digits, music notation, etc.? Equinox ◑ 14:57, 2 November 2022 (UTC)
 * This definitely seems true, so some translingual braille will have to stay. Vininn126 (talk) 15:30, 2 November 2022 (UTC)
 * There is already Brai-def, but that seems to be only ever used for Japanese. – Wpi31 (talk) 16:00, 2 November 2022 (UTC)
 * I'm inclined to oppose: Braille is essentially an alternative orthography never used in print media nor on the web; We don't include morse code, attested encoding mechanisms or shorthand either, and for good reason: It takes five minutes to look up the braille alphabet and you'll be able to read any braille text with the table, assuming you even manage to find a braille text that doesn't have a regular text next to it. And why on earth Unicode decided to add braille is beyond me. Thadh (talk) 17:01, 2 November 2022 (UTC)
 * Braille books exist? Vininn126 (talk) 17:28, 2 November 2022 (UTC)
 * Okay, I guess that wasn't a perfect wording, I rather meant "print media intended for visual consumption" - braille books are still intended for a very specific group of people that would probably prefer using regular text types if they could. Thadh (talk) 17:31, 2 November 2022 (UTC)
 * Of course they aren't for visual consumption, the vast majority of people reading these books can't see. I don't think I'm understanding the difference you are making. Is your argument based on the fact we should be recording printed letters as opposed to cues for other senses because these alternative "alphabets" are usually based on a visual alphabet? Somewhat relatedly, do you think what we have at ⠁ is what we should be doing? Vininn126 (talk) 17:36, 2 November 2022 (UTC)
 * The point I'm making is that braille, along with morse, shorthand etc., are specialised respellings of the regular (in English's case, Latin) orthographies. So they don't have any place in a dictionary, plain and simple: If someone seriously wants to see what a braille text says, they should use a converter, or a chart, but not a dictionary. To give some more examples of specialised respellings: binary code, hexadecimal code, UTF-codings... So no, I don't think ⠁ is something we should be doing, I'm fine with keeping the translingual entry for consistency's sake, but making language-specific entries makes no sense to me. Thadh (talk) 17:42, 2 November 2022 (UTC)
 * This is essentially the discussion from a while ago about whether we should collapse a lot of languages' letter content into translingual; ultimately the consensus from that was that we should keep them separate. I think it's rather inconsistent to have separate letter information for a in each L2 but not for various symbols such as this. Vininn126 (talk) 17:46, 2 November 2022 (UTC)
 * @Thadh Braille can be radically different from language to language and country to country though… it’s not the same as Morse code at all. You can’t look up a Braille converter for Braille in Japan for example and expect it to be the same. Also there are shorthand words made from Braille that don’t align with the letters. It feels oddly similar to the arguments made against including Sign Languages. Looking at ⠁⠉ for example, in English Braille it means “according” from the shorthand of “ac” but in Korean Braille it means, which you wouldn’t even be able to easily guess from the Korean Braille alphabet. Another example is ⠾ which differs from language to language significantly. Who knows what other shorthand Braille there are? This is actually one of the better things that Unicode has added, along with SignWriting as it can increase access significantly (who knows how Braille can interact with screen readers?) To quote English Braille: “Braille is frequently portrayed as a re-encoding of the English orthography by sighted people. However, braille is a separate writing system, not a variant of the printed English alphabet”. To label it as a respelling of a regular orthography is inaccurate. This is lexical information that’s important to users and increases accessibility and awareness of how Braille works. I support this proposal wholeheartedly. CC: @Vininn126 AG202 (talk) 21:26, 2 November 2022 (UTC)
 * See also: English Braille & American Braille. You can’t pull out a dictionary and read everything out automatically. And that’s only three Braille systems that I’ve looked into, let alone the many many more. AG202 (talk) 21:40, 2 November 2022 (UTC)
 * I hate to use the "as someone [relative clause]" formation but as someone whose mother frequently uses Braille and teaches it, this "code" stance is fairly wrong.
 * "A few shorthands" (note: this was wording used on the English Wiktionary Discord, not here) does not come close to covering the amount of contractions, multisymbol contractions, symbols, and deprecated usages in Modern English Braille. There are sixty-four (64) possible Braille cells and the amount of distinct symbols and indicators in modern English Braille far exceeds that.
 * Braille, as we know, is not a language, but it is a specialized orthography deserving of demarcation from translingual lemmata. This discussion must acknowledge that not only is there of course multilingual Braille, but there is Braille specifically designed for technical purposes, e.g. Nemeth Braille Code (used for encoding mathematical + physical notation). These technical codes (which exist in tongues beyond English) are extremely complex and cannot likely be explained away in a translingual section.
 * Again, in a Braille cell, there are sixty-four possible individual characters. Multiple Braille cells are used to represent completely different letters, contractions, and symbols in different languages. N is not exclusively a translingual page. Why should ⠳ be so?
 * I am aware, Thadh, that you yourself don't like letter pages anyway. But there is a precedent. Jodi1729 (talk) 17:10, 3 November 2022 (UTC)
 * I would like to add a clarification - when I say split, I mean just split the existing letters by language. I do not wish to imply things like transliterations of each words. If there are interesting, non predictable attestable forms of words and such then we can discuss that. Vininn126 (talk) 23:50, 2 November 2022 (UTC)
 * Support. Binarystep (talk) 07:59, 3 November 2022 (UTC)
 * Largely oppose. For the one-cell characters, they are mostly better not split by natural language.  is nice and compact - it would be disastrous to split the Bharati Braille usage by language, and I wouldn't like to split the lemma by script.  Abbreviations and logograms are possible exceptions - I wonder what multilingual Braille systems do for the word-like abbreviations.  In this case, perhaps Wiktionary should act like a reference manual and list transliterators from Braille. --RichardW57m (talk) 12:48, 4 November 2022 (UTC)
 * I concede that there may be a case for L2 Braille-system headers, such as 'Unified'. --RichardW57m (talk) 12:48, 4 November 2022 (UTC)
 * Why would it be any more disastrous than splitting for any other script? Theknightwho (talk) 13:06, 4 November 2022 (UTC)
 * The letter 'a' only has entries for languages written in the Roman alphabet, the corresponding Braille letter would have entries for every language written in Braille - you would add most of the languages of mainland south and southeast Asia. --RichardW57m (talk) 10:42, 7 November 2022 (UTC)
 * Why is that a problem, though? Theknightwho (talk) 15:05, 7 November 2022 (UTC)
 * Do you really think umpteen entries for is better than what we currently have? --RichardW57m (talk) 10:20, 9 November 2022 (UTC)
 * If they’re generally semantically different, then yes. Theknightwho (talk) 16:07, 9 November 2022 (UTC)
 * They will usually map in the first instance to "1" or the "character used to represent /a/ or the approximation thereto". --RichardW57m (talk) 17:26, 9 November 2022 (UTC)
 * So do normal letters. I take it you would support merging those into translingual as once proposed? Vininn126 (talk) 17:31, 9 November 2022 (UTC)
 * Yes, and I note a lot of letters have false precision, as exemplified by definitions like "the nineteenth letter of the Welsh alphabet". The word 'nineteenth' is false precision - it depends on whether 'j' is in the Welsh alphabet, and some such definitions have been inconsistent.  (It wasn't when I was a boy.)
 * Sheer aesthetics argue for the collapse to a single lemma in the case of Braille. --RichardW57m (talk) 11:12, 10 November 2022 (UTC)
 * That's an argument for improving the quality of those entries; not removing them. Theknightwho (talk) 16:52, 10 November 2022 (UTC)
 * Is there even multilingual braille? AG202 (talk) 13:24, 4 November 2022 (UTC)
 * Bharati Braille. RichardW57m (talk) 10:45, 7 November 2022 (UTC)
 * Thank you, I missed that in your original comment. I do wonder though, seeing how Braille systems often have contractions and shorthand, if those could differ for the languages that implement the Bharati Braille system, as you mentioned. Though I disagree with the implementation that Wiktionary should only act as a reference manual. Maybe an L2 Braille system like "Bharati Braille" would be useful, because as is, "Translingual" is not clear and has become a catch-all which is a problem. AG202 (talk) 15:23, 7 November 2022 (UTC)
 * I'm not saying that Wiktionary should act only as a reference manual. If someone is trying to decipher some Braille text, I think it is too much to hope that we will have found attestations for the Braille spelling of every English word in Braille, let alone Welsh.  What we can do is point to a transliteration service.  We might even supply them ourselves - if we list abbreviations, let alone words, we should probably offer transliterations, just as we do for other scripts, though notably on a language by language basis.  (Hmm - non-Roman targeted Brailles need two levels of transliteration - target script and Latin script.  And Bharati Braille is script-agnostic - it even supports basic Latin!) --RichardW57m (talk) 13:04, 10 November 2022 (UTC)
 * The idea of an L2 heading "Bharati Braille" has some appeal, especially if we must break translingual up. As far as I can tell, Bharati Braille has no contractions - its 'level 1' equivalent employs all codes for simply written words.  There must be some subtleties in the writing - I need to draw up a cell to letter etc. coding table. --RichardW57m (talk) 13:04, 10 November 2022 (UTC)
 * I believe that the lack of contractions in one language isn't an argument against separating other languages. Lack of a word for "bombard" in one language is not evidence to not add it in another language. Also I want to emphasize the point of the thread is not to provide transliterations, just to change the presentation of the current entries to be more consistent with other letters. There was an attempt to merge them into translingual before, ultimately leading to no change. Vininn126 (talk) 13:13, 10 November 2022 (UTC)
 * Ah, so you are just talking about Braille letters, and not other Braille characters? Note that we haven't split, whose Scandinavian meaning ("minus sign") is different to its English meaning. I will remark here that Unicode considers the Braille characters to be symbols, not letters! Unfortunately, consistency is overrated. --RichardW57m (talk) 18:15, 10 November 2022 (UTC)
 * Look at my second comment to myself above. Also, disagree on the consistency! It makes a huge difference for readers. Vininn126 (talk) 18:38, 10 November 2022 (UTC)
 * So does Unified English Braille have 26, about 51 or how many letters? Is there anywhere a Wiktionary taxonomy for entities in Braille script? It's the 26 that are amongst the most translingual! --RichardW57m (talk) 12:52, 11 November 2022 (UTC)
 * I'm not following. Could you please elaborate? Vininn126 (talk) 12:54, 11 November 2022 (UTC)
 * Not all 64 6-dot Braille cells are letters. I think I've seen ligature and logogram used, and, irrelevantly, of course there are the ten numerals which double as letters.  Decade 5 is mostly punctuation, and the right-shifted cells are mostly 'format' or similar characters.  The dotless cell does not function as a letter. --RichardW57m (talk) 14:51, 11 November 2022 (UTC)
 * (honestly, it should be split) AG202 (talk) 19:00, 10 November 2022 (UTC)
 * There are five characters in Bharati Braille (and more for Indian Urdu) whose writing includes format characters. There are also a couple of ambiguous characters - or at least, that's implicit in the documentation I can find. --RichardW57m (talk) 12:14, 11 November 2022 (UTC)
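
For readers following the "sixty-four possible cells" point made in the thread above, here is a minimal Python sketch. It relies only on the layout of the Unicode Braille Patterns block (starting at U+2800, with dots 1 to 6 stored as bits 0 to 5); the function names are illustrative and not part of any Wiktionary module. It says nothing about what a cell means in a given language, which is exactly the point under dispute.

```python
# Unicode "Braille Patterns" block starts at U+2800.
# For 6-dot braille, dot n (1-6) corresponds to bit n-1 of the offset.
BRAILLE_BASE = 0x2800


def cell_from_dots(dots):
    """Return the braille character whose raised dots are given (1-6)."""
    code = BRAILLE_BASE
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"6-dot braille has no dot {dot}")
        code |= 1 << (dot - 1)
    return chr(code)


def dots_from_cell(cell):
    """Return the sorted list of raised dots (1-6) for a 6-dot cell."""
    offset = ord(cell) - BRAILLE_BASE
    if not 0 <= offset < 64:
        raise ValueError("not a 6-dot braille cell")
    return [dot for dot in range(1, 7) if offset & (1 << (dot - 1))]


# All 2**6 = 64 six-dot cells, including the dotless (blank) cell U+2800.
six_dot_cells = [chr(BRAILLE_BASE + i) for i in range(64)]

print(len(six_dot_cells))        # number of distinct 6-dot cells
print(dots_from_cell("\u2801"))  # the cell discussed above as "⠁"
print(dots_from_cell("\u2833"))  # the cell discussed above as "⠳"
```

The sketch only enumerates and decodes dot patterns; mapping a pattern to a letter, contraction or symbol requires a separate table per Braille system.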

Frequency information in usage notes
A user removed frequency information that I added to entry supermajority:

"The term supermajority is much more common in the American corpus while qualified majority is much more common in the British corpus."

It traced to R:GNV.

The user said it belongs to context label but did not add any context label himself. This kind of procedure seems very unwiki to me.

I don't think we can fairly describe this in a context label: "Chiefly British" or "Chiefly American" do not seem to be appropriate context labels. It is not so clear what the prevalence in the corpora means; all we can do is state the prevalence and let the reader follow the GNV link to see for themselves. All it can mean is that Americans use "supermajority" to refer to their political supermajorities while the EU uses "qualified majority" to refer to what they do.

What do you think? Does the usage note do more harm than good? I find it very useful, especially when paired with a link to follow.

--Dan Polansky (talk) 13:07, 3 November 2022 (UTC)


 * If that isn’t what the word “chiefly” means, then it’s not at all clear what it is ever supposed to mean. It’s also clearly escaped your attention that I did add a context label, but it would have been helpful if you could have bothered to do it yourself.
 * Rather than putting this information in a usage note that uses 5-10 times as many words as necessary, it is much better to simply use a context label - something that we do almost everywhere else. I also wasn’t aware that “British English” and “EU English” are synonymous. Theknightwho (talk) 13:12, 3 November 2022 (UTC)
 * Per WT:EL: These notes should not take the place of context labels when those are adequate for the job. Case closed. Theknightwho (talk) 14:59, 3 November 2022 (UTC)
 * It would, however, be helpful to indicate in the entry the more common British equivalent. Andrew Sheedy (talk) 15:20, 3 November 2022 (UTC)
 * @Andrew Sheedy It’s right under the definition. Theknightwho (talk) 15:21, 3 November 2022 (UTC)
 * shows supermajority to be about 4 times as common as the other term in the American corpus. Does it make qualified majority "chiefly British"? Not to me: the term still sees very significant use in the American corpus. To me, "chiefly British" would require much smaller use in the American corpus. A problem is that we do not define anywhere what "chiefly" means numerically, something a professional dictionary would have to do. We have too many things uncodified. --Dan Polansky (talk) 18:27, 3 November 2022 (UTC)
 * If your concern is the precise meaning of the adverb "chiefly", then that is solvable by using a different adverb. I strongly suspect you're just nitpicking, though. Theknightwho (talk) 18:42, 3 November 2022 (UTC)
 * So which adverb? As I explained, my understanding of "chiefly" is different from what the data shows. The sentence I used does not suffer from that problem. I do not recall ever tagging entries as "chiefly US" or "chiefly UK" and I do not know what our guideline is for that tagging. My suspicion is that it is based on whim. The problem with crude labels is apparent in color entry, which says "color (countable and uncountable, plural colors) (American spelling) (Canadian spelling, rare)". By contrast, OED says "colour | color" and data shows "color" to be fairly common in the British corpus as of late. Crude labels do not do justice to facts and OED does a better job than we do in its "color" entry. --Dan Polansky (talk) 19:10, 3 November 2022 (UTC)
 * So you're arguing that a word can be "much more common" in one corpus without it being "chiefly" used in that corpus?
 * Your example is actually a great demonstration of why we need to take these NGram numbers with a heavy dose of salt anyway. Just because a variant is used in a corpus doesn't mean that it is actually accepted as being part of a particular variety of English by the speakers of that variety. There are other reasons why it might occur instead: spellcheckers, for instance. Theknightwho (talk) 19:22, 3 November 2022 (UTC)
 * 'a word can be "much more common" in one corpus without it being "chiefly" used in that corpus?' Yes. Kind of obvious to me.
 * The color example shows real data, not guesses and unsubstantiated opinions. OED seems to think so as well given they say "colour | color". Given the data and the OED entry, it seems that "color" is now widely accepted in the British English. Supplementary evidence could challenge that idea, but mere unsubstantiated opinions won't. --Dan Polansky (talk) 14:50, 4 November 2022 (UTC)
 * The example shows that you don't understand how to interpret raw data, and that you don't understand that the OED isn't limited to British English; you've failed to address both of these points. As a native speaker of British English, I can tell you pretty definitively that  is not "widely accepted" in British English. Other corpora do not support your point, either, given that BASE and British English 2006 contain almost 0 instances of, and UkWac Complete and GloWbE show very low usage compared to . As someone who is not a native speaker of English and who does not even live in a country where English is the dominant language, you are not in a position to make the claim that you are; especially when you've stonewalled the obvious explanation that I've already given you.
 * Stop embarrassing yourself. You seem to have absolutely no idea what you're doing, and seem to be completely incapable of accepting that the conclusions you've hastily jumped to might be flawed; often fatally so. Theknightwho (talk) 20:56, 5 November 2022 (UTC)
 * As an American, qualified majority would confuse me; I would have assumed it's the normal use of qualified + majority, which could mean anything given the context. I'd think that the use of qualified majority in US English would either be in that broader sense, or specifically talking about the EU procedures and using the language they use to describe them.--Prosfilaes (talk) 18:00, 16 November 2022 (UTC)
 * I googled for US uses of qualified majority, and after pages of British or European uses, and a few US pages talking about the EU, I found a George Washington University article that used the phrase "qualified majority": "Along with Costa Rica, Argentina, Ecuador, and Nicaragua have adopted qualified majority-runoff rules; i.e., to win outright, the leading candidate must reach a threshold, but the threshold is lower than 50 percent of the vote." That is, this US source uses "qualified majority" for almost the exact opposite of "supermajority".--Prosfilaes (talk) 18:07, 16 November 2022 (UTC)

Albanian proper noun lemmas - indefinite vs definite
I think lemmas for Albanian proper nouns should be the indefinite forms, like with common nouns, even if definite forms are more commonly used. There is no consistency and there are many duplications. So I have already created or changed indefinite forms to be lemmas and definite forms to be inflected forms only (focusing on country names for now).

Examples of pairs indefinite - definite (already checked and edited by me)

Inflection and headword templates (incomplete) currently support the indefinite form to be the lemma.

Question: not sure if indefinite forms are always easily found but they probably exist. Do all proper nouns have both forms?

Please note Albanian Wikipedia uses definite forms in article names, e.g. "India", not "Indi".

Please comment if you have preferences or knowledge of the subject, as I have been making changes, so that less rework would be required. Anatoli T. (обсудить/вклад) 23:35, 3 November 2022 (UTC)

How come Korean verb conjugation templates split between different degrees of politeness, but Japanese don't?
Compare Category:Japanese verb inflection-table templates with Category:Korean verb inflection-table templates.

I just feel like Japanese verb templates would really benefit from the addition of ます forms, etc. These wouldn't be immediately intuitive to new language learners from the Japanese verb templates as they currently stand, when they're just as essential in the appropriate contexts in Japanese as they are in Korean.

So, what do you think? Dennis Dartman (talk) 00:57, 4 November 2022 (UTC)


 * The Japanese inflection of the formal is consistent and simple. It doesn't change dependent on conjugation type, unlike Korean (even if there are commonalities in Korean). Types 1, 2 or 3 have the same formal endings - -ました, -ませ, -ません, etc. Perhaps a link or a note in the conjugation table will suffice.
 * BTW, unfortunately, the Korean templates don't handle conjugations of irregular verbs with 100% accuracy when the formal forms are lacking or the informal forms are lacking. Anatoli T. (обсудить/вклад) 01:11, 4 November 2022 (UTC)
 * Could you give an example for the wrong conjugations? @Atitarev AG202 (talk) 05:36, 4 November 2022 (UTC)
 * There are a few. The latest issues are in Module_talk:ko-conj but you can see me in the same talk page. Both good knowledge of the Korean grammar and module writing skills are required but this could be a combined effort with building cases. Suppression and manual overrides (or a different type for the copulas, special verbs) would be required. Anatoli T. (обсудить/вклад) 07:30, 4 November 2022 (UTC)


 * Because the current Japanese inflection-table templates are actually not "tables", but rather "lists". As you can notice, these templates in a single line have kanji, kana and romaji, the three items essentially constituting just one single inflected form. Thus they are 1-dimensional, which I would call "lists", and can only give a few forms before becoming almost unreadable. We would need 2-dimensional tables to have enough space for those additional ます forms.
 * While it is possible to convert these templates to 2-D table-like structures, this may increase Lua memory usage which I am not sure would be a good idea to everyone. -- Huhu9001 (talk) 03:02, 4 November 2022 (UTC)
 * Well, considering Wiktionary is apparently okay with the likes of Template:sw-conj... Dennis Dartman (talk) 03:47, 4 November 2022 (UTC)
 * Re: memory issues. The Japanese conjugation template already uses Lua memory due to invoking Module:ja repeatedly, even though the main glue of the template is not written in Lua. In my testing, removing the conjugation template from saved a little over 2 MB, out of the 52 MB limit. For comparison, both Russian conjugation templates on  combined, fully implemented using Lua, take less than 1 MB. My guess is that if the list/table were extended to have twice as many forms using its current implementation, that would require about twice as much memory. If the whole table were implemented using Lua (like the Russian ones are), the addition of more forms might incur a smaller additional cost because it wouldn't require loading the modules over and over. I haven't tested this though.
 * Anyway, memory would mainly be a concern on verb entries whose title is a single kanji character, since single-character pages tend to be the worst offenders for Lua memory overuse (due to having many language sections, each of which can be long). 98.170.164.88 04:44, 4 November 2022 (UTC)
 * I don’t think the additional workload is likely to be majorly intensive, and there are economies of scale in Lua if done right. The new Mongolian inflection template uses about 3MB (which is an inherent issue due to how the forms have to be generated, though there are about 3 times as many as Russian). Splitting off the independent genitives so that they have their own tables on the relevant pages didn’t save that much memory, despite cutting the number of forms from 117 to 53. Theknightwho (talk) 12:28, 4 November 2022 (UTC)
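
The back-of-the-envelope memory estimate from the comments above can be written out as a short Python sketch. The 2 MB figure (rounded down from "a little over 2 MB") and the 52 MB limit come from the thread; the linear scaling with the number of forms is the commenter's untested guess, not a measurement, and the names below are illustrative.

```python
LUA_MEMORY_LIMIT_MB = 52.0  # per-page Lua memory limit quoted in the thread
JA_CONJ_COST_MB = 2.0       # "a little over 2 MB", rounded down here


def share_of_limit(cost_mb, limit_mb=LUA_MEMORY_LIMIT_MB):
    """Fraction of the Lua memory limit consumed by a given cost."""
    return cost_mb / limit_mb


def scaled_cost(cost_mb, form_factor):
    """Cost after multiplying the number of forms by form_factor.

    Assumes memory grows linearly with the number of forms, which is
    the unverified guess made in the discussion.
    """
    return cost_mb * form_factor


current_share = share_of_limit(JA_CONJ_COST_MB)
doubled_share = share_of_limit(scaled_cost(JA_CONJ_COST_MB, 2))

print(f"current table: {current_share:.1%} of the limit")
print(f"twice as many forms: {doubled_share:.1%} of the limit")
```

Under these assumptions the table would go from roughly 4% to roughly 8% of the limit, which matters mostly on single-kanji pages that are already close to it.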

The Swahili conjugation template: is it unnecessarily convoluted? Could it use trimming?
Template:sw-conj is massive. Gargantuan.

And are we okay with this?

Feedback from a Swahili speaker preferred. Dennis Dartman (talk) 03:48, 4 November 2022 (UTC)


 * I recommend that you explain why this is bad, and what ideas you have got to fix it. (Feedback from someone who wants to solve problems preferred.) Equinox ◑ 03:50, 4 November 2022 (UTC)
 * I agree it seems rather unwieldy, but as long as it's collapsed by default how much of an issue is it? One benefit of having a comprehensive table is that if you come across a conjugated form and search for it, you'll find the entry for the stem.
 * I guess one drawback is that big pages take longer to load. For comparison, the table on ruka takes up about 309 kB of HTML, while the whole page (excluding JS, images, etc.) is 439 kB. The arm photograph in the Lower Sorbian section is 36.3 kB. 98.170.164.88 04:20, 4 November 2022 (UTC)
 * I already mentioned this at several places, without receiving any response. Another issue is that the table is …drumroll… woefully incomplete; there are many more relative forms than that. (For example, on the very first page of Duniani kuna watu you’ll find the form linalompeleka. We do have the entry, but this form doesn’t show in search and I certainly can’t find it anywhere in the interminable collapsible boxes in the table.)
 * Furthermore the template often generates wrong forms.
 * Anyway, most of the forms (including the lacking ones) are 100% predictable. Trying to include all forms is like including “I wouldn’t have been robbed” in the conjugation table at . What’s the point? It drowns the useful information in a sea of cruft.
 * Finally, looking up every single word in a language you don’t know the basics of is a completely pointless exercise. Try and look up all the words in “I’ll make it up to you.” This won’t tell you the meaning of the sentence at all, because you should look up instead of individual words. Someone who doesn’t know the basics of a language should learn the basics of the language rather than looking up every single string surrounded by spaces they come across. Otherwise you just get Bud Carry Without Being in Love. MuDavid 栘𩿠 (talk) 02:06, 5 November 2022 (UTC)
 * There does come a point when agglutinative languages get completely out of hand, yes. Are any of the suffixes involved derivational? Certainly for Mongolian and Turkish, voices like the causative are treated as deverbal, and participles are given their own tables. Otherwise, their verbs would also end up with hundreds/thousands of forms in their conjugation tables. My Mongolian spellcheck dictionary boasts that it can handle 1.8 billion possible inflections, and there's no genuine grammatical reason for it to stop there; just a practical one. We should take a similar view, but probably want to draw the line in a different place (e.g. "with one who is not without one that has a horse" is one for the spellchecker that we probably don't need, and that's just looking at nouns). Theknightwho (talk) 03:04, 5 November 2022 (UTC)
 * I also complained about this a few months ago, but it seems I somehow did not have the tables collapsed by default the way nearly everyone else does. If other people complain, I just want to make sure that they are not having the same problem  I once had. — Soap — 11:16, 5 November 2022 (UTC)

pinging, anybody else? I’ve been working on a proposal for new, trimmed, tables: here. There’s still work to do (see the issues I mention at the bottom of the page), but I feel I’ve progressed enough that feedback is welcome. Let me know what y’all think. MuDavid 栘𩿠 (talk) 03:38, 14 December 2022 (UTC)

Replacing bare lists of adjectives & nouns in usage notes
We currently have a number of entries which have usage notes containing (sometimes very lengthy) lists of nouns and adjectives that the main term is commonly used with. For example: at, &. In my opinion, these are extremely low-effort and of little-to-no use to a reader, given they provide zero contextual information, can't be used as signposts, and are laid out in a format that is much too dense for those who would use the info that they're trying to convey anyway. They're just unlinked blocks of text (which is particularly bad on mobile), with absolutely no information about how any of the listed terms are used with the word in question.

The largest problem, though, is that this is a major misuse of usage notes, which makes it harder to pick out any genuine usage information which is buried underneath. Usage notes are frequently one of the most important sections in an entry, given they usually contain (sometimes critically) important contextual information, which someone unfamiliar with the term needs to know in order to understand the term properly. We do not want to train our readers to skip over them, as these lists invariably will do.

Fortunately, we already have ways of displaying this kind of information: collocations and derived terms. These have several advantages, not least of which are that they're segmented off from other info (and therefore easier to parse), as well as the fact that they show how the two terms are used together. Not every adjective is in the attributive position, after all.

Given this only affects a relatively small number of entries (<50) at present, I suggest we nip this in the bud by converting all of these sections into collocations (or whatever else is appropriate in the context), and we disallow the addition of these bare lists going forward. @Dan Polansky has claimed to me that this is the "traditional" way of doing things, but if there was ever truth in that, it's obviously not how things are generally done now. Theknightwho (talk) 13:49, 4 November 2022 (UTC)


 * I used this format in the English Wiktionary for over a decade. It is more compact and the information conveyed is the same as the space-wasting format for collocations. The lists are as useful as collocations; the difference is that, instead of writing "A X, "B X" and "C X", I write A, B, and C and leave it to the reader to fill in X. This format is used by some collocation dictionaries. The format is very compact, ensuring that even a fairly long list of items takes little screen space.
 * I don't oppose anyone wanting to convert this to the space-wasteful collocation format, but it is not worth my effort and I find the more compact format preferable.
 * I find the lists very useful. They sometimes reveal deficiencies in our definitions. They are often more useful than badly chosen quotations of use, of which we have many.
 * The information content is nearly the same as with collocations, just more compact.
 * If I were a reader, I would be glad someone is actually willing to do this kind of menial work.
 * I would therefore appreciate if I were allowed to continue using that format, in part as impetus for and a recognition of the work being done, even if uninspiring menial work. It will be easy to convert the information to a collocation format using a bot later in volume if desired. --Dan Polansky (talk) 14:46, 4 November 2022 (UTC)
 * We use "Adjectives often used with" on 28 entries, "Nouns often used with" on 16, "Verbs often used with" on 6 and "Adverbs often used with" on 2. Several entries have multiple lists. If you have been using this format for over a decade, you clearly haven't been using it very often. It is certainly not a "traditional" format.
 * You also fail to account for the fact (which I have already mentioned) that we cannot use a bot to convert into collocations, because it cannot accurately predict how each of the collocated terms will fit together. In fact, you've addressed none of the issues. Please just do the work properly in the first place, instead of lying to everyone that your niche way of doing things is the de facto standard. I would also appreciate if you did not misrepresent the issue: the problem is obviously how the information is being presented, and not the fact that it is there at all. Theknightwho (talk) 14:59, 4 November 2022 (UTC)
 * Agreed with User:Theknightwho on all accounts; please use the collocation templates and section. See for instance how nice and tidy looks, much better than those plain lists that visually interfere with the actual usage notes. Furthermore, I hope that the new translation table improvements also affect collocation tables because that would allow us to display them even more concisely while not sacrificing readability. By the way, co-top has already been deployed 262 times, coi 4'902 times. If there's any standard, it's this. &mdash; Fytcha〈 T | L | C 〉 15:15, 4 November 2022 (UTC)
 * I don't like how broken looks; the repetition of the adjective feels unnecessary and a relatively short list takes so much space, using only two columns. German Wiktionary uses a compact format with plain comma-separated lists. And it is of course more typing. More space taken in the wiki code. The adjectives are not linked either in broken. The lists usually do not interfere with anything since most entries do not have usage notes. And collocations are in fact usage.
 * If forced, I will probably resort to using this unseemly co-top business, but it is really annoying. --Dan Polansky (talk) 15:23, 4 November 2022 (UTC)
 * In revision history of hopeless, I noticed there used to be my list of collocating nouns and someone has converted this to the new collocation format later. We did use to have many more of my lists before the new collocation format vote. --Dan Polansky (talk) 15:30, 4 November 2022 (UTC)
 * We used to live in huts and shit in the woods. It is equally irrelevant. Theknightwho (talk) 15:36, 4 November 2022 (UTC)
 * It supports my claim about traditional practice. The new practice is unlikely to be objectively better: German Wiktionary does not use it and Polish Wiktionary does not either, from what I remember. Some people happen to prefer this wasteful format and so do some collocation dictionaries; other collocation dictionaries don't. To liken other professional collocation dictionaries to "shitting in the woods" is outlandish. I always recommend paying attention to objective verifiable facts and contrast them to subjective preferences, whims and value statements. It would be more respectful and true to facts to recognize that people and their preferences differ and start from there. --Dan Polansky (talk) 15:41, 4 November 2022 (UTC)
 * The issue is that you've given no actual argument other than calling it wasteful, which is trivially disproven by the fact that we can put it in a collapsible box, and doesn't address the fact that not all collocations are formulaic. What other Wiktionaries do is not of any relevance, given they frequently mimic our practices. Theknightwho (talk) 15:47, 4 November 2022 (UTC)
 * "wasteful, which is trivially disproven by the fact that we can put it in a collapsible box": nonsense. It is visually wasteful once uncollapsed. Should not need to be said.
 * What other Wiktionaries and other collocation dictionaries do confirms that there is no "objectively best" way of doing it, and in fact, there's often no arguing about taste. I rest my case that we are dealing with subjective preferences, not objective facts of value. --Dan Polansky (talk)
 * Why does that matter when they can collapse and uncollapse it at will? These objections are absolutely surreal. You've not addressed a single concern raised here; you've just had a self-absorbed tantrum about having to do things a bit differently. Theknightwho (talk) 16:11, 4 November 2022 (UTC)
 * An interesting question: how many of these collocations we now have were added anew by people willing to do the menial work and how many of them are just converted collocations entered by me. This would help show how much editors take the value of collocations seriously beyond talking about them and regulating them. --Dan Polansky (talk) 15:46, 4 November 2022 (UTC)
 * Tagging @Vininn126, who is a big fan of collocations. Theknightwho (talk) 15:49, 4 November 2022 (UTC)
 * All of the collocations I add are taken from a Polish National Corpus. Sometimes it takes a very long time to do them. Vininn126 (talk) 16:00, 4 November 2022 (UTC)


 * I would think we would want these hidden by default, so that they wasted less space. Text in a show-hide bar could explain or hint at what lurked beneath. DCDuring (talk) 17:53, 4 November 2022 (UTC)
 * It really depends on the amount of collocations. When it gets close to 7 for a definition I move them from inline to the collapsible box. Sometimes I even set up multiple boxes with senses or even senseid's. Vininn126 (talk) 19:16, 4 November 2022 (UTC)
 * IMHO as soon as they take up more space than the show/hide bar, they should appear under it, collapsed. I'd use the number of columns that led to the smallest amount of vertical screen space occupied by these lists when expanded. Everyone wants to get a lot of space for their favorite content: etymology, pronunciation, citations, usage examples, etc.; now, collocations too. I still have the strong suspicion that users want definitions first and foremost. Their other interests vary. Registered users get to make appear what they want, whatever the default. DCDuring (talk) 22:29, 4 November 2022 (UTC)
 * Agree with Theknightwho and DCDuring, including when Theknightwho is being rude. MuDavid 栘𩿠 (talk) 01:36, 5 November 2022 (UTC)

The minds seem set, but for the benefit of the reader, let's consider 4 major collocation dictionaries: Very interesting. The only one that does anything like Wiktionary is Cambridge. By my subjective taste, the format chosen by Wiktionary is greatly inferior, requiring visual parsing of the same repetitive element again and again. 3 of 4 collocation dictionaries agree. Oh well. I will add that whether an adjective collocates attributively or predicatively is largely irrelevant: it is still a collocation. A "rule" can be "rigid" in the predicative position, and that is also interesting. --Dan Polansky (talk) 14:32, 5 November 2022 (UTC)


 * Stop being so fragile. You have already made it very clear that you have contempt for the concerns of other users over the way you want to do things. No need to repeat yourself. Theknightwho (talk) 14:38, 5 November 2022 (UTC)
 * The current Wiktionary format is superior because it is also flexible in its presentation. You can just change your user JS to display collocations as text lists again if you insist. It's not possible the other way around: a script can't reliably parse your unstandardized plain text lists and convert them to bulleted lists. &mdash; Fytcha〈 T | L | C 〉 14:45, 5 November 2022 (UTC)
 * And how do I customize it so that the repetitive element gets hidden, to match the presentation in the collocation dictionaries? If it at least used tilde instead of the repetitive element, that would be quite an improvement. --Dan Polansky (talk) 14:54, 5 November 2022 (UTC)
 * You could make your own template and apply it. If there were enough use or interest someone might module-ize it. DCDuring (talk) 15:07, 5 November 2022 (UTC)
 * What is that supposed to mean? That is not JavaScript. I would have to edit mainspace, wouldn't I? That's not personal customization. --Dan Polansky (talk) 15:18, 5 November 2022 (UTC)
 * You can do that by adding this line to User:Dan Polansky/common.js:  This only works if the proper template (co/coi) is used and if the term is bolded (which should be done anyway). &mdash; Fytcha〈 T | L | C 〉 15:33, 5 November 2022 (UTC)
 * Thank you; fair enough. The boldface is another bad idea: the repetitive items are obtrusive enough even without boldface. If the collocating items that vary were in boldface, that would make a little bit more sense. A problem with the customization idea is that we should provide best defaults possible. We will have to assume that the choice made is the best default from usability standpoint. I don't believe that at all and 3 dictionaries agree with me, but the minds are set, so it's what it is. --Dan Polansky (talk) 15:38, 5 November 2022 (UTC)
 * With your mindset, no-one would ever innovate. Theknightwho (talk) 15:52, 5 November 2022 (UTC)

Comments in quotations
If I need to clarify something minor in usage examples or quotations (for instance things for which English lacks a distinction), I abuse abbr to add my comment (see for a recent example). However, I'm wondering what the best approach would be for quotations where every other word merits a comment. This mainly happens when a language that is correctly written with (many) diacritics is informally written without them (see for a recent example). I want to add the correct forms with diacritics so that learners know what to look up and how it is pronounced but I don't know how to best present this information. abbr seems inadequate because it is impossible to copy from the hover text (well, apart from editing the page) while sic after every other word looks woefully ugly and disrupts the reading flow. Ideas? &mdash; Fytcha〈 T | L | C 〉 20:03, 5 November 2022 (UTC)


 * I always use block brackets [ ] - cf. the quote at . Thadh (talk) 21:53, 5 November 2022 (UTC)


 * Brackets or even another line if a whole quote needs normalizing (analogous to how translations are given on a second line) seems like the best approach. The latter might require updating the template if anyone wanted to have the template do it rather than formatting the cite "manually". If multiple words need normalizing or sicing, it seems advisable to move the brackets to the end, i.e. knot [not] sick [sic] everi [every] wurd [word], but instead knot sick everi wurd [not sic every word]. Related issue: in the 2003 quote on that page,
 * 2003 April 13, Spaima Limbricilor, “Ploua, ploua, Bombonel se oua! [It rains, it rains, Bombonel lays eggs!]”, in soc.culture.romanian, Usenet [It rains, it rains, Bombonel lays eggs!][3]:
 * it's weird that the template puts the  redundantly in two places (and not in the best place either time; I'd think it would ideally be placed outside of, but directly next to, the quoted title). - -sche (discuss) 01:20, 7 November 2022 (UTC)
 * What about using  or   for the normalized spelling? 98.170.164.88 01:24, 7 November 2022 (UTC)
 * What about languages that do need transcription or transliteration? I don't think that's desirable. Thadh (talk) 07:14, 7 November 2022 (UTC)
 * What I've done, e.g. at, is to split the transliteration line into transliteration of text as is and as normalised/corrected, separated by
 * "&lt;br&gt;&lt;span style='font-style:normal;'&gt;With ambiguities resolved:&lt;/span&gt;&lt;br&gt;"
 * or similar using angle brackets in the actual text. The path is a bit complicated - the text is stored in Module:RQ:pi:Anisongfree in variable lua and is formatted by .  This technique is useful for Lao script, where the writing is usually ambiguous, and for older Tai Tham texts, where the spelling is idiosyncratic or simply atrocious. --RichardW57m (talk) 12:55, 7 November 2022 (UTC)
 * Interesting. Maybe this tells us that we could do with an additional parameter in our templates, something along the lines of . As for [], I think a new template should be created that attaches a separate CSS class to these comments. This has the advantages that their appearance is customizable and even toggleable and also that it is always clear which [] are from the source and which were inserted by us. &mdash; Fytcha〈 T | L | C 〉 13:05, 7 November 2022 (UTC)
 * Good ideas! We need two versions of copyedited, one for the original script, and one for the transliteration, as it may sometimes be appropriate to edit the original script.  While one could copy-edit my Tai Tham instances, typical Lao-script Pali writing systems cannot make the distinctions, which is why there was pressure to encode the Buddhist Institute's additions/restorations.  In some cases, we might even want to correct the original and then resolve sandhi in the transliteration!  That makes me think we need explanatory lines saying what the change is.  --RichardW57m (talk) 13:26, 7 November 2022 (UTC)

Was this change a 'substantial or contested change' that should have required a formal vote?
In late 2014 and early 2015 a vote was held to decide whether to "[make] it official policy to delete entries which do not meet WT:CFI [...] even if there is a consensus to keep". That change was not enacted and the vote was closed "no consensus" with 7 supporting votes and 9 opposing votes (44% support).

Shortly after the vote was closed, Kephir, who entered an "abstain" in the vote, removed the template marking Criteria for inclusion as a "policy, guideline or common practices page" and instead marked it as obsolete and "not intended to be used ever again", saying "how else can you interpret [the vote]?". Shortly afterward, BD2412, who did not participate in the vote, undid the change, saying that "CFI can still be a guideline even if it is not mandatory where there is consensus for an exception". After that, BD2412 added the following passage to Criteria for inclusion:


 * In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers.

In their edit summary for the change BD2412 wrote that "[t]his is what the vote really means."

More recently, PUC removed the passage added by BD2412 saying "[it] was never approved by vote" and also explained "so that it can no longer be invoked by the likes of Dan Polansky".

For context, during the time that the passage added by BD2412 was part of Criteria for inclusion, I am aware of seven instances where it was referenced as part of a discussion (1, 2, 3, 4, 5, 6, 7)

Per a 2012 vote, "[a]ny substantial or contested changes [to CFI] require a VOTE". My question for now is "was the removal of the passage originally added by BD2412 a 'substantial or contested change' that should have required a formal vote?" If the consensus is "no", then this discussion can resolve with no further action. If the consensus is "yes", I will start a vote about whether the passage should be removed from Criteria for inclusion. I appreciate hearing everyone's thoughts and hope we can approach this question narrowly. Take care. &mdash;The Editor's Apprentice (talk) 23:33, 5 November 2022 (UTC)
 * The removal by PUC is both substantial and contested, and per policy requires a vote. I understand PUC's action: the sentence arrived into CFI without a vote about that sentence. But the sentence was in CFI for over 7 years and has become entrenched; any admin who opposed its addition could have removed it in the days, weeks and even months that followed. Plus of the sentence: it documents our widespread practice of policy overrides, supported in some cases by nearly everyone, e.g. for hot words, for which no one played the stick-to-the-rules game. Without a sentence like this, CFI would be less honest. I don't really need the sentence anyway: I can always invoke . Without policy overrides, editors must not invoke LEMMING ever again, nor "set phrase", nor "term of art", etc. We don't have Wikipedia's W:Wikipedia:IAR, and the sentence does this job in a nuanced and fairly weak manner, not so aggressive as the "Ignore all rules" phrase. And if we want to remove from CFI things that arrived without a vote and are not supported by consensus, fine, let's remove the irrational WT:COMPANY. If people want to improve the sentence and place it to other location of CFI, fine, let's do that, but as a replacement, not a removal. --Dan Polansky (talk) 07:38, 6 November 2022 (UTC)
 * It has not become entrenched, and was in fact already contested one year ago, as can be seen from the discussion I linked to above. Maybe no admin cared enough to remove it until now, and maybe some did not even notice it.
 * It's the addition of this sentence that should be submitted to a vote and approved by a 2/3 (or 60%, I don't particularly care) majority, not its removal. Submitting its removal to a vote means a 1/3 minority could have its say in what appears in the CFI. PUC – 10:10, 6 November 2022 (UTC)
 * Should WT:COMPANY be removed from CFI as not arriving into CFI via a vote? --Dan Polansky (talk) 11:03, 6 November 2022 (UTC)
 * For others, do you wish the controversial phrasebook provision in CFI to be removed from it as not arriving into CFI via a vote? --Dan Polansky (talk) 11:04, 6 November 2022 (UTC)
 * None of this is relevant to the discussion. It's just . Theknightwho (talk) 11:48, 6 November 2022 (UTC)
 * This kind of invocation of the concept of whataboutism is largely nonsense. The idea is: if one proposes to act in a way that can be questioned, one should identify the principle behind the action and examine the acceptability of the principle on a variety of specific examples. The idea is Kantian and Popperian. Here, the principle seems to be: "If part of CFI is controversial and was added to CFI without a vote, it should be removed without a vote even if it was in CFI for several years". I propose to investigate whether we want to accept the principle by applying it to a variety of cases. To apply the principle to some cases but not to others is to reject the principle.
 * PUC, are you acting on this principle or on another principle? Your answers to questions would be appreciated; I have no desire to interact with the Knight. --Dan Polansky (talk) 12:37, 6 November 2022 (UTC)
 * This is obstructionism, and an obvious attempt to muddy the water by making the discussion about something that it is not. Please stop. OP even requested that we approach this question narrowly. Theknightwho (talk) 12:43, 6 November 2022 (UTC)
 * If we are to approach the question "narrowly", then the only question is whether the change is a) substantial or b) contested. It is in fact both a) and b). Everyone should agree on that (which is glaringly obvious), make the agreement on record and move on. The core of this pickle is that a "narrow" approach seems unfair to PUC. It seems to me that there could be a meaningful interaction with PUC in which he would himself realize the approach he has taken is unworkable, and would undo his edit. --Dan Polansky (talk) 13:05, 6 November 2022 (UTC)
 * No, that isn't what it means to approach a question narrowly, and you're just trying to find ways to keep talking about things that are not relevant. You do love your false dichotomies, though. Theknightwho (talk) 13:20, 6 November 2022 (UTC)
 * I forbid myself to respond to the Knight in this thread. I allow myself to respond to PUC. --Dan Polansky (talk) 13:27, 6 November 2022 (UTC)
 * I believe this was a substantial change as well, as I have always operated under the assumption that this rule applied. I would think many editors did, since few of us were here back in 2015. Thadh (talk) 13:59, 6 November 2022 (UTC)
 * To quote what I said on Discord: "Agree it should not have been removed [as that] clause is not being evoked in every discussion anyways and plenty of entries get deleted [anyways]". If the issue is with Dan Polansky, then talk to him directly. As seen in the prior discussion there was no consensus as to whether or not that line should be deleted, and it was inappropriate to delete it, especially after having participated in the aforementioned discussion. And it was even more inappropriate to make such a sweeping change with no discussion in this forum and to target a specific user while doing so. I maintain that this, along with prior instances, is unbecoming of someone that has recently become an admin. I hope that another admin will revert this change while this discussion takes place. AG202 (talk) 15:11, 6 November 2022 (UTC)
 * See also: User_talk:PUC, pages shouldn't be deleted like this while ignoring CFI. This combined with other behaviors related to "inclusionists" is very concerning, and it's not the first time that this has been brought up either. AG202 (talk) 15:18, 6 November 2022 (UTC)
 * I see two possible types of concern:
 * genuine concern: you simply care about my (not) following proper procedure, and would have objected just as strongly if I had summarily removed a sentence of CFI you don't agree with, or summarily deleted an otherwise valid entry you personally disliked (on account of offensiveness, for example);
 * ideologically motivated concern: you don't care as much about my (not) following proper procedure as about my challenging things that support your views.
 * Which one is it here?
 * As for your quote:
 * "[that] clause is not being evoked in every discussion anyways": what's the logic there? That clause still is a bad argument. It's a good thing when a virus isn't spreading everywhere; that doesn't mean we shouldn't try to get rid of it completely.
 * "plenty of entries get deleted [anyways]": so we should start grasping at straws?
 * PUC – 20:53, 7 November 2022 (UTC)
 * I’ve called out “inclusionists” and “deletionists” alike for not following procedure and for many other reasons. Deleting a clause in CFI without discussion, however, is unheard of for me in my time here, and given that the edit summary was targeted towards a specific user, the other example I cited, and prior encounters, it raises it to a level of concern that is untenable for me at the moment. If you had edited CFI in the other “direction”, I still would not have been pleased. Though I don’t agree with everything done here, I have never gone as far as to unilaterally change something major on my own without discussion and consensus, even when I’m the only one with expertise, let alone something as powerful as CFI.
 * As for the quotes you’ve cited, the first one was in response to your edit summary where you removed the clause. When you said “Dan Polansky and the likes”, it makes it seem like it’s something that’s seeping in every discussion, when it’s not, and even if it was, you should’ve still had discussion about it here first. The second quote was less of an argument and more of a personal observation. In my time here, I haven’t really seen that clause itself save an entry enough times to warrant any sort of backlash at this rate (“a virus,” really?). I understand that you don’t like it, but that’s not how things work here as I’ve seen myself many times. There should be consensus. I’m just overall concerned that you took it upon yourself to make that change with the admin power that you recently got, almost in retaliation against described “inclusionists” when, to me, that’s not really what admin power should be for. AG202 (talk) 02:55, 8 November 2022 (UTC)

I've reverted myself. : I do hope this will be put to the vote. As you can see, the sentence is controversial, and should certainly not get a free pass imo. It's unfortunate that it's been sitting there unchallenged for so many years. PUC – 19:29, 6 November 2022 (UTC)

Since there seems to be consensus that the removal was "substantial" and/or should have followed a formal vote, I have started a formal vote, which is currently in the premature stage, to answer the question of whether the passage originally added by BD2412 should remain. Please give it a look and discuss any possible improvements or fixes on the vote's talk page. Take care. &mdash;The Editor's Apprentice (talk) 06:42, 8 November 2022 (UTC)

Let's deprecate the Thesaurus namespace
To be clear, I think there's a lot of value in giving synonyms, but I think there are some serious flaws in how we do it at the moment. I don't want to set out a detailed proposal for this without getting a sense for what the consensus is, but my overall impression is that we probably want to integrate it into the mainspace:


 * 1) Badly neglected and inconsistent. I think it's fair to say that not very many editors maintain thesaurus pages. They're inconsistently categorised (Category:English thesaurus entries is extremely incomplete), and have no standardised layout, which creates confusion for the reader. There are also wide inconsistencies as to whether we should be including the language code in the page name. None of this is desirable, and it certainly does not aid the reader. It's also obvious that the various clean-up jobs which have been done over the years on Wiktionary have bypassed the Thesaurus: the template still uses the acronym "ws" (despite the Wikisaurus name being deprecated back in 2017), there are still a bunch of interlanguage links (which have been removed everywhere else), and the templates still follow schemes that have been deprecated (e.g. they still use  ). These are all obviously fixable, but it is highly indicative of how much attention is actually paid to these pages by the majority of the editor-base (read: not much). The lists even still require manual alphabetizing, which is absurd.
 * 2) Potential clutter is easily avoidable. As with other sections which often contain lengthy lists (e.g. derived terms), there are ways of including these that don't clutter the page. The most obvious solution being to ensure that the section is collapsible.
 * 3) Better to be consistent with everything else. I can't think of a compelling reason for treating synonyms differently to everything else; particularly given that we only do this when lists of synonyms become longer. It's far more accessible (and better meets reader expectations) to treat synonyms consistently across all pages, whether the list has 3 entries or 300; just as we do with derived terms et al.
 * 4) A better model is already in use. The pages for Chinese already make use of an extensive system of modularised thesaurus templates, which can be placed on each page as necessary, and update automatically as new synonyms are added (i.e. bypassing the reason for having thesaurus pages in the first place). In the case of Chinese, these are primarily used to show dialectal distribution, but there is no reason why a similar system can't be used in a more general purpose way. You can see a bunch of these in use at 條. Note: I am not saying we should use this layout; just that the underlying system can obviously be utilised by other languages. It's also not the only possible solution, but simply an example of how we could do things better. A less radical model would be doing what we do with translations, which is to point the user to the translation section on the primary entry.

What are people's thoughts? Theknightwho (talk) 22:47, 7 November 2022 (UTC)


 * You're clearly wanting to deprecate it because User:Dan Polansky is editing tirelessly in that. Not cool. Quit trolling. Celui qui crée ébauches de football anglais (talk) 22:53, 7 November 2022 (UTC)
 * I don't care who edits the thesaurus namespace. Stop derailing the discussion. Theknightwho (talk) 23:00, 7 November 2022 (UTC)


 * One use case for a thesaurus is to try to gradually navigate to the mot juste (or a word you have forgotten). I've also used this e.g. when composing cryptic crossword clues and trying to create a convincing "surface reading". In such cases it's very useful to navigate in a thesaurus-only mode: when I see a candidate, I click it, and jump to the thesaurus page for that word, and thus get closer and closer. You see the same interface in e.g. Microsoft Word thesaurus. (We don't really have enough thesaurus root words to be able to do this, yet.) Equinox ◑ 23:03, 7 November 2022 (UTC)
 * That is a good point. I think the main problem that we have is that our thesaurus at the moment essentially just acts like an overflow, and I don't think it's likely to change anytime soon. I also suspect that any modularized implementation would allow both formats, and for greatly expanded coverage in thesaurus-only mode as well (as any input to page sections would also benefit that). Theknightwho (talk) 23:14, 7 November 2022 (UTC)
 * I agree that there is a lot of improvement to be made with respect to synonyms, and a dropdown template with a dozen words or so automatically added in seems like a great idea. But in my opinion, many thesaurus entries are so impractically long that they deserve their own pages. Do we really want the full list of synonyms in Thesaurus:drunk to display in the main entry?
 * Ioaxxere (talk) 05:39, 8 November 2022 (UTC)
 * We already include pages with very large numbers of derived terms (comparable to Thesaurus:drunk); it doesn't seem like too much of an issue to me. Just have a look at the derived terms on neuro-, where they're not too difficult to parse (remembering that most thesaurus entries won't have lots of similar-looking terms like that, either). Theknightwho (talk) 07:01, 8 November 2022 (UTC)
 * Don't forget about the Lua memory limits. Having more content on main will likely push some entries over the cliff. Could we have a mixed model, where most of the content is in Thesaurus:, but some of it could be pulled into the main space? Perhaps the most salient synonyms? – Jberkel 08:22, 8 November 2022 (UTC)
 * Good point. I almost never remove synonyms from the mainspace. In my view, the mainspace should list some of the most common synonyms and then link to the thesaurus. By contrast, I have seen some editors remove synonyms from the mainspace, which I find not so good. --Dan Polansky (talk) 08:44, 8 November 2022 (UTC)
 * Lua memory limits are a concern only on a comparatively tiny number of pages. While the issue is obviously there, it's important to remember that the back-end for labels alone is considerably more burdensome than synonyms are ever likely to be, given the size of the module tables involved. Theknightwho (talk) 09:46, 8 November 2022 (UTC)
 * Yes, concerns are now only on a small number of pages, but only because content was moved *out* of pages. I think this should be the general direction to follow, moving non-essential content out of main, either to namespaces or to Wikidata. Editing large pages is already very slow right now. – Jberkel 10:29, 8 November 2022 (UTC)
 * Including a massive list of synonyms within the entry, many of them slang or obscure, would seriously hinder writers who are just looking for a single decent word. We should provide 10-20 of the most common and useful synonyms (and a handful of antonyms) as part of a template, provided with a link to the full thesaurus page. See the layout of Google's dictionary for what I basically mean. Ioaxxere (talk) 14:28, 8 November 2022 (UTC)
 * There are plenty of options for how we could lay things out. There is no obligation to do a massive list with no additional context. Theknightwho (talk) 15:26, 8 November 2022 (UTC)
 * I agree. Your example with neuro- gave the impression that you would like a list of synonyms to be laid out as such, but that would of course be less than ideal. Ioaxxere (talk) 15:33, 8 November 2022 (UTC)
 * I guess my point was just that there's plenty of precedent for having large lists in mainspace. I'd certainly prefer them to be subdivided sensibly, though. Theknightwho (talk) 15:48, 8 November 2022 (UTC)
 * Benefits are described here:
 * Thesaurus/Benefits
 * In brief, the thesaurus helps ease maintenance of semantic lists by centralization, allows focus on a single sense or place in the semantic space, allows focus on semantic relations to the exclusion of etymology, pronunciation, etc., and provides hints where to navigate next to find other semantic lists via the "=> Thesaurus" links next to items.
 * The problems raised do not seem intractable or very serious. The greatest problem is the lack of interest of editors, but I don't expect using the mainspace would improve that very much. The work on the thesaurus involves hard and unique challenges that most editors are not interested in. The bulk of the English thesaurus was made by two people with serious interest in it; AdamBMorgan did a lot of work there. An entry to consider is Thesaurus:number with all its structure and rich content, not constrained to synonyms, hyponyms and meronyms. To form a better idea of what's involved and what the mentioned benefits mean in practice, one has to look at some of the more interesting complex non-synonymic entries. (As an aside, the voters in Votes/pl-2017-11/Restricting Thesaurus to English thought having a separate thesaurus is a good idea.) --Dan Polansky (talk) 07:25, 8 November 2022 (UTC)
 * Just compare how much attention derived terms get compared to the thesaurus. The difference is enormous. You haven't really explained why it needs to be in a separate namespace or presented any solutions to the (numerous) issues outlined, which are well-proven to be a problem judging by just how neglected and problematic the Thesaurus namespace currently is.
 * By the way, I'm going to nip in the bud any attempt to misrepresent this as being about the work involved or whether synonyms are valuable, because quite obviously I want to improve access to that, not remove it. Theknightwho (talk) 07:43, 8 November 2022 (UTC)
 * Derived terms are entirely trivial to add and figure out, requiring no skill to speak of at all; semantic relations, which emphatically are not just true synonymy, which is relatively boring and uninspiring, are a whole different beast. I recommend that readers read the page with the benefits articulated, and if anyone has any questions for me, please ask, and I will try to do my best. --Dan Polansky (talk) 08:01, 8 November 2022 (UTC)
 * What relevance does any of that have to how we present information? You're also wrong, but it simply doesn't have anything to do with the topic at hand. The thesaurus pages are in a sorry state, whichever way you slice things. Theknightwho (talk) 08:17, 8 November 2022 (UTC)
 * Let's try something different: where is the thesaurus data for Thesaurus:number to be stored? Directly in the mainspace, in number? What about Thesaurus:drunk, in drunk? Will there be templates and modules to extract the content from the mainspace entry drunk and show it in the synonym entries? --Dan Polansky (talk) 12:09, 10 November 2022 (UTC)
 * From reading the top again, the answer seems to be templates, like in some Chinese entries. So all the people who could not figure out the thesaurus will be able and willing to do essentially the same kind of information filtering, selecting, taxonomizing and sequential ordering (e.g. Thesaurus:number), just using the template namespace and template technology? Is using templates and modules really easier for non-technical mortals, perhaps semanticists, ontologists and philosophers in general, than using the markup in use in the thesaurus? And why could the same proposed templating and module technology not be used in the thesaurus namespace? Could we thus retain the namespace but use the proposed technological change, provided the change really brings more pros than cons? --Dan Polansky (talk) 12:29, 10 November 2022 (UTC)
 * People seem to have no problem doing so with everything else that's done through modules. It's not that we're all too thick to work out how to use the thesaurus, if that's what you're implying. Theknightwho (talk) 12:52, 10 November 2022 (UTC)
 * Okay, let us assume (I don't) that editing modules and templates is generally as easy and general-editor-friendly as editing the current setup of the thesaurus. How is the semantic focus to the exclusion of everything else going to be achieved, given the semantic relationships are going to be transcluded into the mainspace in some way? --Dan Polansky (talk) 13:13, 10 November 2022 (UTC)
 * It's possible to use module data for more than one purpose. This is trivially obvious. Theknightwho (talk) 16:34, 11 November 2022 (UTC)
 * I think it would have been better had you disclosed that you are the sole editor of the benefits page. By linking to a Wiktionary namespace entry where arguments are collected, people who forget to check the history are tempted to think that there is more support for your personal views than there actually is. Please don't try to make it look like something that is not the case (e.g. by writing Benefits are described here instead of "I've described the benefits here"). &mdash; Fytcha〈 T | L | C 〉 11:47, 8 November 2022 (UTC)
 * Fair point; I could have made it explicit that I am the sole author of the argumentation. However, the benefits are an exercise in argumentation and do not necessarily have objective factual validity, as is all too often the case with "benefits". Everyone has to form their own judgment. While there are some purely factually valid claims such as the listing of thesauri in other dictionaries, to what extent the factual claims are relevant or convincing is for the reader to determine. --Dan Polansky (talk) 11:54, 8 November 2022 (UTC)
 * Perhaps your views would be better-suited to your personal userspace. Theknightwho (talk) 15:25, 8 November 2022 (UTC)

Keep, but rename it to DanThoughts. --Vahag (talk) 11:07, 10 November 2022 (UTC)

new Vector 2022: wasted space
Looks like they've just added a banner prompting people to switch to Vector 2022. I tried it and the first thing I notice is that there's a lot of wasted whitespace on the right, which seems to serve no purpose at all; in addition, the left rail got wider, which combined with the wasted space on the right means the contents in the middle are a lot narrower. I gather many people will switch skins, but all the wasted space will mean we potentially need to make things significantly more vertical and less horizontal (maybe necessary anyway for mobile devices, but otherwise non-ideal). Is there a way to recover the space with some customization settings while not switching entirely back to Vector 2010? Benwing2 (talk) 04:04, 8 November 2022 (UTC)


 * Honestly it looks like a mobile version to me, but I'm just being grumpy. Vininn126 (talk) 10:11, 8 November 2022 (UTC)
 * They also somehow managed to make the title non-copyable in edit mode (again) :/ (T322725) – Jberkel 10:37, 8 November 2022 (UTC)
 * The only hope I have left after seeing that they didn't respond to well-founded criticism is that Vector 2022 is never rolled out as a default skin on en.wikt (do we have control over that?). It's patently clear that this skin was created with only Wikipedia in mind. &mdash; Fytcha〈 T | L | C 〉 11:53, 8 November 2022 (UTC)
 * Unchecking "Enable limited width mode" under "Skin preferences" recovers the space on the right, but the left side remains well padded. JeffDoozan (talk) 20:23, 8 November 2022 (UTC)
 * It might be possible to change the sidebar width. The DIV class seems to be .mw-panel but I was trying to figure it out a few days ago and for some reason it didn't work on the site, even though it did work in my HTML editor on my computer. There might be some CSS that's loading externally that's interfering with it. What I do know is that I've hidden the sidebar entirely on a private wiki where it just doesn't serve much purpose. I wouldn't want to hide the sidebar on Wiktionary, but I'd hope it is at least possible to compress it and make the font smaller since most high-volume editors won't need it very often. — Soap — 19:34, 11 November 2022 (UTC)

Including lists of notable people in a field
wants to include a list of notable philosophers in our thesaurus page. What do other people think about this? &mdash; Fytcha〈 T | L | C 〉 12:18, 8 November 2022 (UTC)
 * We do list instances in various entries in the thesaurus, and it makes sense, e.g. Thesaurus:country and Thesaurus:political party. The "instance of" relationship is well established in the thesaurus. In the discussed entry, it follows the example of Moby II and WordNet. In so far as mainspace entries should better be covered in the thesaurus, e.g. the sense for Aristotle should be covered somewhere and it naturally belongs to Thesaurus:philosopher. The choice of the notable philosophers is driven by criteria that, while arbitrary, are bound to two specific external lists, providing for maintainability. --Dan Polansky (talk) 12:26, 8 November 2022 (UTC)
 * I really don't understand the point of this proposal, at least as a proposal vs. an opportunity to troll. Most dictionaries do not include such lists. In particular, at least one dictionary with an affiliated encyclopedia does not, Merriam-Webster. Why should we be duplicating content available from a sister project? We would have at best the same content WP has as in a mere listing page or category. DCDuring (talk) 16:09, 10 November 2022 (UTC)
 * To understand the objections better, I have posed a list of analytical questions below. If you were inclined to answer some of them, that would be great. The lead question is whether all instance-of relationships are a problem or something else is a problem. As for background, I do recall your objections to our having geographic names, so I am not surprised by your opposition. I am surprised that you spend most of your time here making a vastly incomplete replica of Wikispecies, but that's your choice, not for me to judge. --Dan Polansky (talk) 16:19, 10 November 2022 (UTC)
 * Not in the least interested in encouraging any more of this. DCDuring (talk) 18:21, 10 November 2022 (UTC)
 * We do have geographic entries as per voted policy supported by a 2/3-supermajority and we do have planets. Should Mars be removed from Thesaurus:planet since it is in an instance-of relationship? And should biological taxa be removed from the mainspace since they are generally considered names of specific entities and they do not show attributive use in widely understood meaning? There is in fact no policy protection for names of taxa. Where are your principles, if any? --Dan Polansky (talk) 19:06, 10 November 2022 (UTC)
 * I can see this as a type of hyponym, and I think it makes a certain amount of sense. I wonder if a link to a category page or something would be better. Vininn126 (talk) 12:27, 8 November 2022 (UTC)
 * One thing is certain: listing all merely "notable" philosophers rather than "very notable" would become unwieldy. One can try to figure out where to draw the line, and include fewer notable instances. The choice I made was based on two reasonably short external lists, one a thesaurus, one a semantic network that we picked the semantic relationships from. If there is a shorter canonical list, we can consider using that one. A comprehensive list of notable philosophers should indeed be delegated to a category. However, including senses for specific philosophers in the mainspace is still a controversial issue, with no policy regulating the subject, so filling the category would be controversial. --Dan Polansky (talk) 12:39, 8 November 2022 (UTC)
 * I see this as a can of worms: It opens us up to potentially endless content disputes over who is or isn't a philosopher, whether someone's pet philosopher is noteworthy enough etc. with no good mechanism to determine who's in the right. We could theoretically do the heavy legwork of meticulously defining a razor-sharp demarcation such that there is no dispute possible, but OTOH we could also just not include these. Providing a list of luminaries in a field is an encyclopedia's job. I also want to remind everyone that there is currently majority (though currently not supermajority) support in an ongoing RFD to delete such a "surname-person-sense": Requests_for_deletion/English &mdash; Fytcha〈 T | L | C 〉 12:44, 8 November 2022 (UTC)
 * The thesaurus as a whole provides for potentially endless content disputes since so much of it cannot be algorithmically and deterministically regulated. Arbitrary external lists can be picked and there does not need to be any dispute. I picked two lists that do not in any way cater to my preferences but rather are "natural" picks. While Dickens is perhaps more vulnerable to poorly argued deletionist whims, Aristotle could be less so. I will also note that -ian/-ist nouns describing adherents (Platonist, Aristotelian, Marxist) are natural hyponyms of Thesaurus:philosopher, and will be probably listed anyway, and the selection problem will be the same or similar. As for the "covered by encyclopedia" argument, that alone has almost no force since a lot of dictionary content is necessarily covered by encyclopedias, and better so, e.g. names of laws, theorems and principles. Many dictionaries/networks do think this kind of content is inclusion worthy. --Dan Polansky (talk) 12:58, 8 November 2022 (UTC)
 * I strongly advise that you read WT:What Wiktionary is not. It is becoming extremely tiresome having to relitigate all the minutiae of Wiktionary because of your constant attempts at rules lawyering. There is an obvious and material difference between terms like and the philosopher Karl Marx. We focus on the former, and leave the details of the things those terms describe to Wikipedia. Theknightwho (talk) 15:35, 8 November 2022 (UTC)
 * My point is that the problem of selection criteria for surnames and -ian/-ist items is the same. If all -ian/-ist items are listed, they will be too many. And the list is fairly selective and interesting. Such a list is in fact not found in Wikipedia; you can try. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)
 * Why do we need lists for surnames or -ian/-ist terms? Both of those are already covered by categories. Theknightwho (talk) 15:57, 8 November 2022 (UTC)
 * To make the thesaurus more complete as for hyponymy. The categories do not list hyponyms of "philosopher"; they list all derivations from -ian/-ist, which will be not only such hyponyms. At the very least, one should list a few examples to remind the reader that such hyponyms exist. --Dan Polansky (talk) 16:00, 8 November 2022 (UTC)
 * What you seem to be doing is manually creating lists that could be trivially generated from categories. Theknightwho (talk) 16:02, 8 November 2022 (UTC)
 * A selection of very notable instances cannot be generated from categories. That holds true for all instances for which we have names in Wiktionary, whether people, countries, cities, rivers, mountains, etc. In my view, listing at least some instances adds value. WordNet agrees. --Dan Polansky (talk) 16:13, 8 November 2022 (UTC)
 * Notable by whose judgment? Why do we care about a list of philosophers that you personally consider noteworthy? How does any of this relate to terms? Theknightwho (talk) 16:18, 8 November 2022 (UTC)
 * By the determination of an external list, not mine. Popper is missing on the list, a scandal. The point is that exemplification is better than no exemplification. I am fine discussing how many should be there, whether 50, 100 or 200. For rivers, I picked the longest ones, that's easier and more easily measurable than notability of philosophers. But notability of philosophers is also a fact; some are much more notable than others. --Dan Polansky (talk) 16:25, 8 November 2022 (UTC)
 * So you've just copied a list you found somewhere else? Theknightwho (talk) 16:26, 8 November 2022 (UTC)
 * (outdent) The list is a union of Moby Thesaurus II and WordNet for "philosopher", and precisely the union. Nothing personal. We may choose a different standard if we wish. --Dan Polansky (talk) 16:41, 8 November 2022 (UTC)
 * Oppose. Equinox ◑ 15:27, 8 November 2022 (UTC)
 * What is the substantive argument? Consensus should be based on a combination of voting and reasoning. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)


 * Maybe you better start another vote saying "everyone must explain their votes to Dan's satisfaction". I'm not an idiot. Equinox ◑ 09:07, 25 November 2022 (UTC)


 * Oppose. Obviously not dictionary material. Theknightwho (talk) 15:29, 8 November 2022 (UTC)
 * What is the substantive argument? Consensus should be based on a combination of voting and reasoning. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)
 * Nothing "obvious" about it given WordNet disagrees and so do multiple other dictionaries that do contain biographical entries. From the point of view of an external observer with no bias, it is dictionary material in so far as it is found in dictionaries. --Dan Polansky (talk) 15:56, 8 November 2022 (UTC)
 * It's explained in point 1 of WT:What Wiktionary is not: Wiktionary is not an encyclopedia, a genealogy database, or an atlas; that is, it is not an in-depth collection of factual information, or of data about places and people. Encyclopedic information should be placed in our sister project, Wikipedia. Wiktionary entries are about words. A Wiktionary entry should focus on matters of language and wordsmithing: spelling, pronunciation, etymology, translation, concept, usage, quotations and links to related words. Theknightwho (talk) 16:01, 8 November 2022 (UTC)
 * That's fine; it's just the "instance-of" relationship, not "in-depth collection of factual information" and not "data about places and people". And the quoted passage is flawed in that it does not even recognize semantic relationships as valid content. --Dan Polansky (talk) 16:09, 8 November 2022 (UTC)
 * Oppose. We are not a short-attention-span version of WP. DCDuring (talk) 16:46, 8 November 2022 (UTC)
 * More substance please. Neither is WordNet. Exemplification is a great principle. --Dan Polansky (talk) 16:48, 8 November 2022 (UTC)
 * You have thousands of words here. Stop demanding that people give you lengthy explanations; especially when you make absolutely no effort to come to a common understanding with other users anyway. Theknightwho (talk) 18:47, 8 November 2022 (UTC)
 * Clarifying my previous comment - ultimately oppose. At most, list a category or something; it's not lexical. Vininn126 (talk) 15:36, 8 November 2022 (UTC)
 * Oppose as well, it's not lexical as Vininn stated and amounts to pure taxonomising of things rather than words. —Al-Muqanna المقنع (talk) 16:04, 8 November 2022 (UTC)
 * What does it mean "it's not lexical"? Is the "instance of" relationship a problem, or just notable people? Can countries be listed in Thesaurus:country? --Dan Polansky (talk) 16:06, 8 November 2022 (UTC)
 * Individuals aren't lexical, that's an axiom. Vininn126 (talk) 16:11, 8 November 2022 (UTC)
 * Names of individual entities (people, rivers, etc.) are words, unless they are multi-word names and thus are "lexical" and even multi-word names are as lexical as phrases. "Instance of" is a semantic relationship, as per WordNet and common sense. --Dan Polansky (talk) 16:16, 8 November 2022 (UTC)
 * And categories of specific rivers are also not "lexical"? Should they therefore be deleted as encyclopedic? --Dan Polansky (talk) 16:17, 8 November 2022 (UTC)
 * Again: why are you duplicating the function of categories, while also removing a load of info? It just means either more maintenance work, or yet another thing that will become neglected; like much of the thesaurus already is. Theknightwho (talk) 16:25, 8 November 2022 (UTC)
 * What am I "removing"? I don't recall removing anything. --Dan Polansky (talk) 16:34, 8 November 2022 (UTC)
 * Re-read what I wrote. Theknightwho (talk) 16:42, 8 November 2022 (UTC)
 * Nothing to see there. I am not removing anything; I am doing exemplification on the model of Moby II and WordNet. WordNet is an amazing role model, absolutely astounding, regardless of the flaws that it necessarily has. There is no maintenance problem: the list is frozen as a union of Moby II and WordNet. --Dan Polansky (talk) 16:46, 8 November 2022 (UTC)
 * You're unbelievable. I obviously meant that you are including a cut-down version of a list that we already have. Stop finding every excuse to miss the point. Theknightwho (talk) 18:48, 8 November 2022 (UTC)
 * Yes, exemplification means that not the complete list is included. What you are saying is "exemplification is bad", without saying why it is bad. To my mind, complete lists are uninteresting: I have no interest in looking at a comprehensive list of 10,000 philosophers, most of whom will not ring any bell in my mind. The same for rivers: I would rather be reminded of some notable instances than of the first 200 items of a comprehensive list where the only claim the items make for themselves is that they lead the alphabet. If I wanted a complete list of rivers, I could go to Wikipedia anyway, or make a Wikidata query; I don't need a dictionary for that. --Dan Polansky (talk) 18:58, 8 November 2022 (UTC)
 * There have been arguments for that! Proper nouns are inherently different. Keeping them for etymologies and other information is one reason to keep them, but they are inherently different from common nouns. Vininn126 (talk) 16:28, 8 November 2022 (UTC)
 * Are you now saying that "Amazon" is not a word? I placed a source to Appendix:Wordhood claiming otherwise, although I find the claim that it is not a word an absurdity. --Dan Polansky (talk) 16:34, 8 November 2022 (UTC)
 * I did not say that! I said they are different. Please stick to the words that I use! Vininn126 (talk) 16:37, 8 November 2022 (UTC)
 * Fine, just answer "No" and we move on. We now have that "Amazon" is a word. Now, is the "instance of" relationship between "Amazon" and "river" a relationship that is "lexical"? --Dan Polansky (talk) 16:39, 8 November 2022 (UTC)
 * Amazon IS a river specifically, but it's ONE river, which is a different relationship than a TYPE of river which can refer to many instances of it. THAT is lexical. Vininn126 (talk) 16:46, 8 November 2022 (UTC)
 * But surely "instance of" is the relationship between the meaning of words "Amazon" and "River" and therefore is "lexical" (of or pertaining to words)? Why would hyponymy be lexical and "instance of" not given both are relationships between word meanings? --Dan Polansky (talk) 16:51, 8 November 2022 (UTC)
 * These are inherently different kinds of instances. This is a singular instance, one-of-a-kind, inherently by definition. Other instances, countable or uncountable, still refer to something that can be or be shared with multiple entities - which is why proper nouns are different from non-proper nouns and why the relationship is not lexical. If the Amazon belonged to a category of rivers that behaved differently than other categories of rivers, whatever word we used to describe that category would be lexical. Vininn126 (talk) 16:57, 8 November 2022 (UTC)
 * (outdent) There is no doubt "hyponymy" and "instance of" are different relationships, as recognized by Wikidata, although WordNet confuses the two a bit. But what does it have to do with the word "lexical"? What is the definition of the word "lexical" other than "of or relating to words"? And what is the business with the word "lexical" anyway? The words "Amazon" and "river" are semantically connected, and the thesaurus relationships are semantic relationships; the word "lexical" is not used for the purpose. And why are categories allowed to do something that the thesaurus is not? --Dan Polansky (talk) 17:02, 8 November 2022 (UTC)
 * Inclined to oppose. The example of political parties is shaky as I'm sure that many of those could be subject to RFD themselves (some have already been deleted), but the example of countries isn't relevant because those are explicitly allowed by CFI, whereas philosophers are not. A lot of the entries linked don't even have an entry for the philosopher mentioned, and a few of the ones that do don't feel super notable and could be subject to RFV/RFD. AG202 (talk) 16:51, 8 November 2022 (UTC)
 * Now that's a different line of reasoning. It would mean listing countries in Thesaurus:country would be okay because the names themselves as countries are guaranteed to be included. For philosophers, I would argue that their names are going to be included in some form, e.g. as Aristotle or as Russell, so they will mostly be bluelinks, and where they would be redlinks, ws enables saying link= to disable linking.
 * I believe among the large set of all Wikipedia-notable philosophers, those relatively few listed are likely to be very notable, given the selection made by the authors of WordNet and Moby II. A different list of notable philosophers could be chosen; I am fine with that. --Dan Polansky (talk) 16:57, 8 November 2022 (UTC)


 * No. Since when are dictionaries to present the  level? There may be a ”philosophy dictionary” doing this, but only in as much as it is not a dictionary but misnomed. You also have a link to a list of philosophers on Wikipedia which has better personnel for the same job.
 * Your attempts to deduce arguments from assumptions that you put in our mouths but we have not mentioned are all beside the point and a waste. You are the only one dropping the term “word” in this thread. WT:CFI saying “all words in all languages” is not specific enough to demand inclusion of anything that you deem a word. And still we do not ascribe value to “notability” in an absolute sense—the instance may be as notable as it can be, in the context of this project it won’t be as much. Fay Freak (talk) 17:05, 8 November 2022 (UTC)
 * Ever heard of WordNet? And Moby II thesaurus? Both lexicographical works? I made neither of them. Ever heard of Wiktionary topical categories for specific rivers? Not dictionary content? As for "word", I did not introduce the word "lexical" into the discussion and "lexical" means "of or relating to words". The point is not really notability but exemplification, and to achieve exemplification, one needs to make some arbitrary cut off or choice, which for rivers may be length and for philosophers may be notability. --Dan Polansky (talk) 17:13, 8 November 2022 (UTC)
 * It seems like you are the only one who esteems it necessary to cut off and around the dictionary arbitrarily. The other editors here work on some kind of system, which you seek every opportunity to deny by introducing anything to estrange em, though its foreignness to this place be immediately discernible, supported by the observation that few have much ambition to formulate rules—but this is your personal guiding theme, others just want to write a dictionary and not a philosophy of dictionaries, which they aren’t at a loss about. Fay Freak (talk) 17:27, 8 November 2022 (UTC)
 * Well, it is not reasonable to list all rivers as instances in Thesaurus:river, so if examples are to be given on the model of WordNet and Moby II, some selection has to take place, some arbitrary cut off. I don't understand what the above is all about. How does that relate to anything that I have said above? How does my editing of the thesaurus interfere with anything that others are doing? How does it impact "work on some kind of system"? --Dan Polansky (talk) 17:48, 8 November 2022 (UTC)
 * Why do you want to manually duplicate what we can already do with categories? And why do you want to do so in a way that requires an "arbitrary cut off"? These are arguments against your approach, because they make it very clear that there is no underlying principle here other than what you've decided to hyperfocus on today. Theknightwho (talk) 18:51, 8 November 2022 (UTC)
 * To exemplify, as I write above, not just philosophers but rivers, countries, mountains, mountain ranges, etc. I want to follow WordNet's wisdom. All I hear is "exemplification is bad", with no argument to support that notion. Exemplification is not duplication of a comprehensive list, by definition. --Dan Polansky (talk) 19:04, 8 November 2022 (UTC)

This is a lost cause, but let me make the point that the thesaurus is a word finder. To find the name of a very notable philosopher is to find a word, by starting with another related word, here "philosopher". Why make the word finding function less rich? Sure, other sources such as WordNet already do the job and are one click away, but why make the "word finder" less rich in its "word finding" capacity? It has enough space on the page. --Dan Polansky (talk) 20:40, 8 November 2022 (UTC)

The list provided a tool to navigate from philosophers to the derived adjectives: you click on the name to get to the mainspace and there you see the derived adjective. For that, the philosopher does not necessarily need to have a sense in the mainspace, only an entry for the name. Thus, one can answer the question: what notable philosophers have an adjective derived from them? Without the list, there is no way to do that in Wiktionary. A category of philosophers would only be there to serve the purpose if they all had senses in the mainspace, which is controversial; the thesaurus can work without that. --Dan Polansky (talk) 21:05, 8 November 2022 (UTC)


 * DuckDuckGo and Yandex and Twitter’s and Reddit’s search functions are also word finders, most useful to find usage and discussions of terms; doesn’t mean Wiktionarians should build a search engine to be accessed by the Thesaurus namespace. It’s a word finder but not for all kinds of words and only to find these limited kinds of “words” in a specific fashion. You are not making a point but a petitio principii the whole time, defining things as what you want them to be.
 * The list was a kludge, like an improvised explosive device. But you don’t have any mission to take on this site, repurpose the tools offered here to achieve your objectives, as you can just enter other sites and employ their devices. You are acting as though there were a frontline that, to extend your influence sphere, you would have to break by any argument imaginable, not being able to desert from your position, but actually we have to cooperatively restrict ourselves for a concentrated and coordinated effort to allocate scanty manhours, which are diluted if there is no prospect of contours in the resulting work. Fay Freak (talk) 21:41, 8 November 2022 (UTC)
 * I'm about 85% sure that I agree with you, but I have to admit that I did get lost in your second paragraph. Theknightwho (talk) 21:47, 8 November 2022 (UTC)
 * What nonsense. The semantic relations employed by Wiktionary and the thesaurus are modeled on WordNet, and I am merely following WordNet's lead, making my own thoughts along the way and finding that I like the result, which is still in the revision history. There is no "repurposing" of the tool: there is use of the tool as designed by the tool maker WordNet. The above is pure rhetoric full of buzzwords and figures of speech while making no substantive argument. The claim that I am doing "my" way is absurd since I am doing the WordNet and Moby II way and perhaps I would not even come up with the idea that we should list a considerable number of instances without them. I tried to do what they are doing, already before the philosopher entry in geographic entries, and I find it cool. By contrast, it is the opposition that is doing "their" way by disregarding practice in external sources that serve as inspiration. It is all the more curious given the opposition does not spend any resources on the thesaurus and has made derogatory remarks about it. --Dan Polansky (talk) 22:09, 8 November 2022 (UTC)


 * This is an obvious no; this is what w:Category:Philosophers is for and does. - -sche (discuss) 09:52, 9 November 2022 (UTC)

In the spirit of Sisyphus, I will try to understand the problems raised or implied. Questions: I pledge to avoid responding to individuals who have been shown to produce unproductive arguments to prevent derailing the discussion. There are some individuals who have produced interesting and relevant thought and I would like to hear from them. --Dan Polansky (talk) 09:42, 10 November 2022 (UTC)
 * Is the problem with instance-of relationship? If so, planets have to go from Thesaurus:planet and countries have to go from Thesaurus:country.
 * Is the problem with cut-off on the number of instances covered? If so, specific rivers have to go from Thesaurus:river: it is not practical to list all the rivers and only a sample can be given.
 * Is the problem specifically with humans? If so, why? Why are specific humans more encyclopedic than specific rivers?
 * Is the problem with poor measurability of notability? If so, rivers could be kept in Thesaurus:river, but something would have to be done about individuals in Thesaurus:philosopher. Could we perhaps include philosophers whose names are used figuratively, as in "he is no Socrates"? We could thus exemplify without relying on notability.
 * Is the problem with including items that have no sense in the mainspace? If so, I could modify the list to include only such items: "items from Moby II and WordNet that are covered by mainspace" or "only items covered by mainspace".
 * Is the problem with duplicating Wikipedia? If so, why should we have Category:en:Rivers and why should its category structure involve the encyclopedic CAT:en:Rivers in the United States and CAT:en:Rivers in Alabama, USA?
 * I would personally be fine with, and encourage, removing the existing instances from rivers, philosophers, countries, and planets. When I say it is non-lexical, I mean that it is taxonomising referents rather than words. That is unavoidable to some extent when mapping semantic relationships, but instance-of relationships are entirely about the referents and shed virtually no light on words. Thesaurus:country, for example, would be much more useful to my mind if it had a more detailed list of terms related to countries, rather than the vast majority of the entry being a mechanical list of existing sovereign states. —Al-Muqanna المقنع (talk) 11:23, 10 November 2022 (UTC)
 * If you remove planets, the thesaurus will show no connection between Thesaurus:planet and Thesaurus:Earth and no connection between Thesaurus:country and Thesaurus:United States of America. I don't see how that disconnection can be desirable. Thesaurus:country listing countries does not prevent it from listing other terms. Granted, the country entry lists quite a lot of instances, but if it is to connect thesaurus entries that are semantically related, it has to do it, or have a separate thesaurus entry just for the purpose, e.g. Thesaurus:country/instances. --Dan Polansky (talk) 11:52, 10 November 2022 (UTC)
 * I think our readers are smart enough to understand that absence on a Thesaurus page does not mean absence of any connection whatsoever. It can be replaced by a see also link to a category, which also prevents having to manually edit the information in multiple places. —Al-Muqanna المقنع (talk) 11:55, 10 November 2022 (UTC)
 * "any connection whatsoever" is a red herring and not under discussion, e.g. phonological connections. As for "referents rather than words", semantic relations are done via relationships between referents; for instance, hyponymy is for subset relationship on referents and meronymy is on part-of relationship of referents. Thus, we have meronymy in Thesaurus:Brazil that connects the referent of Brazil to the referent of Mato Grosso. --Dan Polansky (talk) 12:01, 10 November 2022 (UTC)
 * I agree that phonological connections are a red herring and not under discussion, and if people intuitively understand that exclusion from a thesaurus page doesn't exclude phonological connections, I'm sure they can also be trusted to understand that it doesn't exclude non-lexical instance-of relationships. These various examples are not particularly impressive to me; I don't think we gain anything from using the Thesaurus namespace to detail everything that's e.g. located within a country and if that is all it's being used for we could probably do without it entirely. —Al-Muqanna المقنع (talk) 12:32, 10 November 2022 (UTC)
 * Okay, hyponymy is a subset relationship on referents, how about that? --Dan Polansky (talk) 12:41, 10 November 2022 (UTC)

Collapsing the table of contents to only show language names
For many entries, it is quite difficult to find the language one is interested in due to the table of contents being excessively long. Take for example this page and compare it to the same page on the Spanish Wiktionary.

The Spanish Wiktionary has a nice solution: The table of contents is collapsed for all languages except the dictionary's main language (Spanish). I want to propose that we do the same.

This proposal is different from the last discussion in that section names are only collapsed, not removed, and the section names of the English entry would be shown.

(On a related note: Does it not annoy others that the sections on the mobile version aren't collapsed, or that we don't have a table of contents there? It takes forever to scroll down to the section one is interested in.)

--Hvergi (talk) 09:51, 9 November 2022 (UTC)


 * This seems pretty reasonable, with the caveat that we should keep the table of contents floating to the right instead of making a big block that forces all of the actual definitions in the entry way down the page. —Justin ( koavf ) ❤T☮C☺M☯ 10:04, 9 November 2022 (UTC)


 * — excarnateSojourner (talk · contrib) 15:45, 17 November 2022 (UTC)
 * It would be useful to have some kind of count of the number of entries with multiple L2 sections, possibly differentiating lemma from non-lemma L2s.
 * I don't see any benefit to this if there is only one language in the entry, or even two or three. The benefit seems to arise only for the relatively small proportion of entries that have a large number of L2 sections. I also don't see the benefit for Translingual items, whether they be for symbols, CJKV or other characters, or taxonomic names.
 * Is there a way to address the problem in the case of entries with large numbers of L2 sections without diminishing the value of the ToC in cases where it poses no problem? DCDuring (talk) 16:18, 17 November 2022 (UTC)
 * Yes, since this would be done using a script (es:MediaWiki:Gadget-CollapsibleTOC.js) we could make it only collapse large TOCs --Hvergi (talk) 13:59, 19 November 2022 (UTC)
 * Honestly I wonder if mobile UI designers in general have been having a laugh at us for fifteen years, as the lack of scrollbars on any mobile browser that I'm aware of has been forcing us to flick, flick, flick our way through pages all this time, and it seems like such an easily solvable problem since there are quite often scrollbars in other areas of the mobile interface such as (on Android at least) the list of installed apps. The problem you mention would be more annoying to me if I wasn't dealing with the same thing on every other site already. Thanks for bringing it up, though. — Soap — 00:19, 25 November 2022 (UTC)

I was confused. You mean Vector legacy 2010 without Tabbed Languages enabled. With new Vector 2022, the TOC in the sidebar is collapsed for pages with more than 20 sections. --Vriullop (talk) 13:57, 21 November 2022 (UTC)
 * 20 sections?? I have the misfortune of studying two languages that frequently coincide with other languages in their linguistic family and are last or near-last in the alphabetic list for that family. Namely, Portuguese—which comes after Catalan, Galician, Ligurian, Old (Catalan, Galician, Ligurian), Old French, and Old Portuguese (boy, makes me feel for students of Spanish!)—and Ukrainian, which comes last in the Slavic family and almost dead-last in Cyrillic languages in general.
 * It’s excruciating when there are four fully-fleshed-out entries (including etymologies, declension tables, quotes, and related words). (I get annoyed when entries in Japanese are buried beneath “Translingual” and “Chinese”—but that’s just being curmudgeonly.) Why was the number 20 chosen, and can it be made a user-modifiable variable?
 * (I’ve found, btw, that currently it is much faster to collapse the earlier sections than it is to scroll them.)TreyHarris (talk) 19:55, 30 November 2022 (UTC)
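The idea raised in this thread, collapsing only large tables of contents and leaving the threshold adjustable, can be sketched roughly as follows. This is only an illustration under stated assumptions: it is not the code of es:MediaWiki:Gadget-CollapsibleTOC.js, and the names `TocEntry`, `collapseToc`, `minLanguages` and `keepOpen` are hypothetical. It treats the table of contents as a flat list in which level 2 marks a language heading.

```typescript
// One row of a table of contents; level 2 is a language (L2) heading.
// (Hypothetical shape, used only for this sketch.)
interface TocEntry {
  title: string;
  level: number;
}

// Return the entries that would stay visible.
// TOCs with fewer than `minLanguages` language headings are left unchanged.
// Otherwise only language names are kept, plus the subsections of the
// language named in `keepOpen`.
function collapseToc(
  entries: TocEntry[],
  minLanguages: number,
  keepOpen: string
): TocEntry[] {
  const languageCount = entries.filter(function (e) {
    return e.level === 2;
  }).length;
  if (languageCount < minLanguages) {
    return entries;
  }
  let currentLanguage = "";
  return entries.filter(function (e) {
    if (e.level === 2) {
      currentLanguage = e.title;
      return true; // language names always stay visible
    }
    return currentLanguage === keepOpen; // subsections only under keepOpen
  });
}
```

Passing the threshold as a parameter rather than hard-coding it is what would let it become a per-user setting, as asked above.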

Plato and whether concrete persons are subsenses of name senses
I changed the "Greek philosopher" sense of Plato to a subsense of the given name sense and was reverted by Dan Polansky both times. I pointed to Trump, Clinton and Hitler for analogous cases, though there are admittedly also entries like Stalin where the person sense is on the same line as the name sense and entries like Plato, Aristotle and Socrates (before I changed them) where the persons and names are on different lines entirely. Dan then, which is why I elected to instead bring attention to it in the BP (again).

I would be in favor of disallowing all person senses on pages where the set of page title words is a (potentially improper) subset of the set of name words of that person. As an example, Donald Trump should not be a permissible sense for either of the pages Donald, Trump or Donald Trump but it is okay for something entirely different such as Cheetolini. I don't think there's currently super-majority support for this so if we must include these person senses, we should at least include them in some way subordinate to the name senses (i.e. either as a subsense (which I prefer) or as is done in Stalin but certainly not as a separate sense) because that's what they are: The set of referents of the name sense of Plato is any person called Plato, which thus makes the philosopher sense merely a restriction of that, a subset of the set of referents, hence a subsense. &mdash; Fytcha〈 T | L | C 〉 21:27, 10 November 2022 (UTC)
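The subset criterion proposed above can be made concrete with a small sketch. It assumes naive whitespace tokenisation and case-insensitive comparison, neither of which the proposal specifies, and the function names are invented for illustration.

```typescript
// Split a page title or personal name into lower-cased words.
// (Whitespace tokenisation is an assumption made for this sketch.)
function words(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(function (w) {
      return w.length > 0;
    });
}

// True when every word of the page title also occurs in the person's name,
// i.e. the title words form a (possibly improper) subset of the name words.
// Under the proposal, such a page would not carry a sense for that person.
function titleIsSubsetOfName(pageTitle: string, personName: string): boolean {
  const nameWords = words(personName);
  const titleWords = words(pageTitle);
  return (
    titleWords.length > 0 &&
    titleWords.every(function (w) {
      return nameWords.indexOf(w) !== -1;
    })
  );
}
```

With the example from the proposal, the check holds for the titles Donald, Trump and Donald Trump against the name "Donald Trump", and fails for Cheetolini.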


 * I favour doing exactly what you did in the first place.
 * On a related note, I think we should introduce something similar to WP:POINT (if we don't have it already). I think we are all getting sick of having Wiktionary held hostage at this point. Theknightwho (talk) 22:03, 10 November 2022 (UTC)


 * The reason I reverted is in the edit summary: "restore the philosopher as the main sense following a long-term tradition: this is the primary activated semantic node under the symbol out of context; Plato the philosopher is extraordinarily notorious". I did so because I found the new format ugly and stupid, which of course is subjective. The nesting indentation helps nothing from a usability perspective. Some editors started to change that practice, so it is now inconsistent. The objective of your edit seems to be to doubly demote the primary semantic node by changing it to the 2nd place and indenting it at the same time; and yet, the only translation table in Plato is for the philosopher. If there is consensus for a change, fine, let's find what the consensus is and make it a policy, issue closed. And since it seems to be a matter of preference and not of being factually correct or incorrect, I think 60% should be a pass in this case; we should not be deadlocked on such issues only because we require the high standard of 2/3-supermajority and then let people fight the issues by back-and-forth in the mainspace. As an aside, having a dedicated sense in Trump for the president is user-friendly: if the user asks "what are the nicknames for Donald Trump", it is most straightforward to search for them in the Trump entry, and there they are. It would be better if the president sense were not nested and indented, though; now it looks ugly and stupid. --Dan Polansky (talk) 07:21, 11 November 2022 (UTC)
 * The issue is, I'm providing rigorous arguments for why something is a subsense, whereas you're just talking about your feelings and completely irrelevant things like translation tables. From your reply I take it that you have nothing to object to the actual logic of my argument which reduces your objection to "I acknowledge that subsenses are used correctly here but I object to their correct usage anyhow because I dislike them for subjective reasons." Is this an accurate characterization of your position? Also, judging off of WT:Subsenses and the linked discussion WT:Beer_parlour/2015/May, it seems like there is good consensus to not only keep them but to employ them more often. And while I personally don't care about WT:LEMMING much, I know that you do and I want to point out the fact that the majority of monolingual dictionaries (that I use) make frequent use of subsenses. &mdash; Fytcha〈 T | L | C 〉 12:09, 11 November 2022 (UTC)
 * The contrast is between "we should be using subsensing more often" vs. "whenever there is arguably a subsense relationship, we should indicate it by indenting and nesting even if there are only two or three sense lines and even if the subsense has priority in the sense activation list over the broader sense." Maybe there is consensus for the latter as well, I don't know. As you see from the thread title, it asks whether we should "allow in rare cases", whereas some people seem to think it should be done nearly always when possible in principle. As for lemmings, I know of no lemming that has a Plato entry done the way you propose. As for subjectivity, there is an element of subjectivity but also objectivity: I believe the notion that the philosopher sense leads the activation list is very likely to be correct. What is subjective is the assessment of what takes priority, whether semantic relations (hyponymy and the like) or activation frequency relations and usability. --Dan Polansky (talk) 13:37, 11 November 2022 (UTC)
 * On another related note, is anyone else getting sick of Dan asserting (without evidence) that whatever he prefers is always the status quo, and that it’s up to other people to overturn it? How about he accepts the burden of proof for once, given he has provided absolutely no evidence for that. As far as I can tell, it’s just a rhetorical tactic to stack the deck in his favour in every discussion. Theknightwho (talk) 15:09, 11 November 2022 (UTC)


 * His extremely long filibusters make discussions hard to follow. Equinox ◑ 15:29, 11 November 2022 (UTC)


 * I don't think Plato-person is a subsense of Plato-name, because a person is not a name. Rather, Plato-person has an instance of Plato-name; or his name (but not he himself) is an example of the name. In object-oriented programming you would never derive Person from Name. (My preference with specific people like Plato and Einstein is to put their Wikipedia links in the "See also" section, and only include them at all if they are the overwhelmingly commonest known person of that name.) Equinox ◑ 15:30, 11 November 2022 (UTC)
 * This gets into some quite tedious (literal) semantics but it's worth noting that our name senses don't (generally) define the word as referring to a name, they tend to be non-gloss definitions explaining that the word is used as a name (for people). In that case I believe it's fair to talk about instances being subsenses, though it's ultimately really a presentational issue and I don't really mind either way. —Al-Muqanna المقنع (talk) 17:54, 11 November 2022 (UTC)


 * I don't care how tedious I am, if I'm right. ("Bureaucrat Conrad, you are technically correct. The best kind of correct!") It's definitely a strange question, and actually opens up whole cans of worms: e.g. if Smith is a name, but we have a plural Smiths, then what is it a plural of? Two Smiths are two people, not two names, but we wouldn't define Smith as a person (unless it was Einstein, haha!), and then even if we did define it as a person, then the plural would usually be two of any people with the name, and not two of the defined person. I can see how this seems like boring semantic dancing, but I think it's actually a strong indicator of why a dictionary, defining words, should not get into questions of individual personalities. Equinox ◑ 05:04, 12 November 2022 (UTC)
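The object-oriented analogy in this sub-thread can be pictured with a short sketch. It is only one possible modelling, and `Person` and `bearersOf` are names made up for the purpose: a person has a name rather than being a kind of name, so "two Smiths" comes out as two people sharing one name.

```typescript
// A name is shared; each bearer is a separate person who has that name.
interface Person {
  name: string;        // the person HAS a name (composition) ...
  description: string; // ... and is not itself a kind of name
}

// All people in the list who bear the given name.
function bearersOf(name: string, people: Person[]): Person[] {
  return people.filter(function (p) {
    return p.name === name;
  });
}
```

In this picture the plural counts bearers of the name, not copies of any one defined individual.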


 * On the specific issue, like Al-Muqanna, I'm not really bothered by either presentation.
 * On the broader issue, Dan has been an obstructionist for as long as he's been here, also years before his hiatus. I'm loath to block a long-time editor who does also do some good work, but I do think we have a "one disruptive editor" problem more than a "we need new rules about POINTing" problem — no shade to TKW, rules can help in general or future cases, but rules can also be gamed and part of this user's MO is rules-lawyering, so at some point a community has to exercise discretion and block people who are not participating in the collaboratively-building-a-dictionary part of working together to build a dictionary. Seeing how many other people are also fed up with his obstructionism and filibustering independent of their feelings on the specific issues like this, I have bitten the bullet and blocked him, and repeat my block summary here in case the length gets cut off in the block log: "persistent, years-long history of disruptive editing and obstructionism; in particular, I highlight as w:WP:DE does that disruptive editing need not be "intentional. Editors may be accidentally disruptive because they [...] lack the social skills or competence necessary to work collaboratively. That the disruption occurs in good faith does not change that it is harmful"."
 * - -sche (discuss) 19:12, 11 November 2022 (UTC)
 * Further discussion at User talk:-sche. - -sche (discuss) 00:24, 12 November 2022 (UTC)
 * I want to thank -sche for their courage in this decision; I don't see myself as ever having the courage to permanently block a long-time editor. It goes without saying that I am saddened that we miss out on the good work Dan could have done in the future for this project but I also agree that the status quo would have been untenable in the long term. I want it to be known that I, being a lover of second chances, would be in favor of unblocking him, provided that he abstains from participating in Wiktionary policy making, edit wars and the like (I don't want to provide a comprehensive list here because I'm sure Dan is smart enough to figure out by himself which kinds of edits are fine and which ones aren't). — Fytcha〈 T | L | C 〉 01:05, 12 November 2022 (UTC)

Is Proto-Norse a dialect of Proto-Germanic
The differences between Proto-Norse and Proto-Germanic are pretty small. Would it perhaps be a good idea to treat Proto-Norse as a late dialect of Proto-Germanic, much like we did for Frankish? --  02:46, 12 November 2022 (UTC)
 * as the main (only?) editor of Proto-Norse. Thadh (talk) 00:44, 13 November 2022 (UTC)
 * Proto-Norse is an attested language; we have several hundred words in it. Now, the earliest Proto-Norse is so close to Proto-Germanic that it might better be classified as Proto-North-West Germanic (the common ancestor of North and West Germanic); Elmer Antonsen argues for this, and he is right in that it does not show any specifically North Germanic innovations, only common ones like *ē > ā, *-ai > ē, *-ō > -u. This would also solve the issue of a word like ᚱᚨᛇᚺᚨᚾ, which as it is is classified as Proto-Norse, even though it might just as well be "Proto-English", the two languages being almost identical at this time.
 * We further have certain innovations common to Anglo-Frisian and North Germanic, but not shared with the more southern West Germanic languages, such as the 3rd person plural present indicative *eʀun, or 2nd person plural pres. ind. *eʀt. Another one would be the collapse of the n-stem oblique conjugation into that of the accusative, as we already see in the genitive raihan above: Old English: -a, -an, -an, -an; Old Norse: -i, -a, -a, -a; Old High German: -o, -on, -en, -en. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 11:19, 13 November 2022 (UTC)
 * All the pings: --  08:51, 13 November 2022 (UTC)

because you’re looking for a word you don’t know yet is ください, you won’t find it; but searching for, you will. TreyHarris (talk) 22:32, 30 November 2022 (UTC)

IPA transcriptions for Southern British English
(NOTE: I've moved this discussion from the Tea Room as I realised too late that was completely the wrong place for it. Sorry for any Deja Vu!)

Over the past couple of years I've been following the work of Dr Geoff Lindsey (laid out on his blog posts, YouTube channel, and his book English After RP, ISBN 978-3-030-04356-8), and specifically his increasingly popular suggestion that the traditional vowel transcription of Conservative RP used to represent a modern Southern British English accent is now so out of date as to be actively misleading, confusing, and generally unhelpful; and that Gimson's intent with his RP transcription system is that it should always be revised and updated to reflect modern changes in pronunciation, which it generally-speaking hasn't been. He therefore lays out a much more consistent and also phonetically accurate transcription system (on his blog and in a half hour video, as well as in the aforementioned book) which I've noticed has gained a little traction, and has certainly received favourable opinions from highly regarded linguists in the field. As a site that should probably remain up-to-date with linguistic research and scholarly opinion, I wonder if there could ever be a consensus for Wiktionary (and presumably also Standard Southern British transcriptions in Wikipedia) to adopt these new symbols and perhaps also new terminology like "SSB" ("Standard Southern British") to replace the confusing and no longer accurate descriptor of "RP" (which largely gets its name from a set of social conditions that no longer exist). Does anyone have any strong opinions on this matter? How might such an idea be progressed were it to come about? (Sorry, I'm not a big wiki-er so I'm not sure if this is the best place to solicit general discussion on this topic). Muzer (talk) 19:59, 14 November 2022 (UTC)


 * I’m a fan of Geoff Lindsey’s; he explains many things about phonetic representations that are confusing even to native speakers and makes some good suggestions about labelling some sounds as diphthongs that are traditionally considered to be monophthongs and some as monophthongs that are traditionally considered to be diphthongs. He also rightly debunks the silly notion that a stressed schwa is impossible (though this doesn’t typically occur in SSB). Some things are a little less convincing to me, though: I’d say that ‘near levelling’ is still a minority pronunciation and the onset of the goose vowel isn’t quite as far back as he suggests for most speakers. It seems a bit exaggerated to say that the traditional PUT vowel (the capital letter omega) is nearly extinct too. His idea of transcribing the final part of the diphthong in words like ‘no’ and ‘how’ as a ‘w’ does make a lot more sense than using an omega ever did, though, even in traditional RP (U-RP). --Overlordnat1 (talk) 02:16, 15 November 2022 (UTC)


 * Cheers. Yes, I agree that some of his observations on frequency are maybe not completely right in my experience, though I would say he's probably right on the "near" front; hearing it as a monophthong is very common in my experience. I find the "PUT" vowel is also very notably different between south and north; I hear it a lot more fronted in the South, so I think I'm inclined to agree with him there too. In any case I don't think his subjective observations on frequency really affect the core of his suggested new transcription system, which I think most linguists would agree is very much sound. I've also just realised I've put this entirely in the wrong section; I will move it to the beer hall. --Muzer (talk) 01:02, 16 November 2022 (UTC)


 * I don't see why SSB could not be added alongside traditional RP. I'd still keep the latter to illustrate diachronic change. Nicodene (talk) 00:21, 17 November 2022 (UTC)
 * Yes, exactly, we should keep RP and add any modern, different pronunciations alongside it. (I'm in the process of sandboxing ideas for ways we might display older vs modern pronunciations.) - -sche (discuss) 00:51, 17 November 2022 (UTC)

en- as /ən-/ in GenAm
Splitting this off of the discussion above because it's a somewhat separate issue and that discussion is big enough as it is: /ən-/, /əm-/ was recently added to some entries like entangle, embattle (see edit history) as a GenAm pronunciation. I haven't been able to find any evidence of this being a GenAm pronunciation (and it would make the prefixes indistinct from un- for GenAm speakers who, as discussed above, pronounce un- with schwa). Do we have sources for these words starting with a schwa, either as a GenAm pronunciation or as a pronunciation in some regions or dialects we could label? (Pinging User:Whoop whoop pull up.) - -sche (discuss) 03:25, 16 November 2022 (UTC)


 * In my experience, GA speakers pronounce unstressed en- (as in entangle) as /ən-/ (stressed en- [which is generally restricted to very formal registers of GA when dealing with en- as a prefix, although it does occur in the colloquial registers in other contexts, such as Endermen] remains /ɛn-/, except in some foreign loans like encore or en garde, where it's pronounced /ɔn-/ instead), while un- is pronounced /ʌn-/ in both stressed and unstressed contexts (thus preserving, e.g., entangle/untangle as a ə/ʌ minimal pair and avoiding the collapse of the phonological distinction between the two prefixes that would, as noted, occur if un- were also pronounced with a schwa). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:42, 16 November 2022 (UTC)
 * : is that /ən-/ or is it /n̩/ due to loss of an unstressed vowel before a syllabic nasal? Chuck Entz (talk) 16:01, 16 November 2022 (UTC)
 * Examining my own speech (it being the one most immediately available to me for testing), there's definitely a /ə/ starting off that syllable, with an initial (if short) burst of air making it out through the mouth before the tongue makes contact with the alveolar ridge. In my experience, that's also how the speech of other GA speakers sounds, although with a somewhat-lesser degree of certainty there. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:25, 16 November 2022 (UTC)


 * If un- is treated as having the vowel /ə/, it would probably be a good idea to transcribe it with a secondary stress marker, which would prevent ambiguity with fully unstressed /ən/ en-. There are other reasons to think of un- as having secondary stress: I believe that, like thirteen and similar words, disyllables starting with un- can be accented on the first syllable in some cases due to stress retraction (e.g. in our audio for "unknown quantity", I don't hear much stronger stress on the second syllable of "unknown" vs the first).--Urszag (talk) 19:03, 17 November 2022 (UTC)


 * OK, I (and other editors, see the edit history of America) have been changing these back to /ɛ/ or /ɪ/ as appropriate: reduction to schwa may occur (allophonically?) for some speakers (some regions?) when the words are particularly unstressed, but it doesn't seem to be the GenAm pronunciation. Perhaps all our differing opinions over whether un- or en- (or bird, etc) has a schwa are a cautionary tale about editors unscientifically assessing personal pronunciations of things and assuming that's the phonemic value of a whole dialect, against the analysis of more scholarly and in some cases scientific sources. Perhaps the best approach is, while using the separate symbols (/ɪ/, /ɛ/, /ʌ/, /ɝ/, ...), to indicate in Appendix:English pronunciation the circumstances under which these may reduce to schwa, or the linguists who do vs don't think they reduce... - -sche (discuss) 22:50, 21 November 2022 (UTC)

Are we transcribing American English diphthongs wrong?
Going through our IPA transcriptions for lots and lots of diphthong-containing English words, I'm struck by how little most of our diphthong transcriptions have to do with how the diphthong in question is actually pronounced. I'm a native speaker of American English (born and raised in Central Massachusetts, since transplanted to Minnesota), so I'm gonna reserve judgment on our IPA transcriptions of diphthongs in other English dialects, but as regards AmE, the transcriptions in the third column of the following table seem to reflect actual pronunciation much better than the current ones (in the second column):

Should we instead be transcribing these diphthongs using the transcriptions from the third column, rather than the second? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:27, 16 November 2022 (UTC)
 * Where I live (mid-western Canada), /aɪ/ is an accurate transcription. I'm so used to seeing /aʊ/ and /oʊ/ that I'm not sure about my ability to judge whether those are accurate or not, but I think they are. I think the first sound is most accurately transcribed as /ei/ where I live (and that's how I've been transcribing it in entries...whoops). The other two suggestions seem right to me. Makes me wonder just how close to standard American my accent is, though. I thought it was closer, but maybe I haven't been perceiving a lot of differences. Andrew Sheedy (talk) 03:44, 16 November 2022 (UTC)
 * Re your last point, I feel this is why we really need [as Wikipedia would put it] "reliable sources" and not just our intuitions when deciding all these pronunciation questions, because people too often misidentify sounds under the influence of expectations (context, etc), like the undone example further up this page. - -sche (discuss) 04:29, 16 November 2022 (UTC)
 * Are you sure it's /aɪ/? Have a listen to Geoff Lindsey's clips of King Charles saying it (2nd clip on the page), I might be wrong but don't think I've heard that from Canadians. —Al-Muqanna المقنع (talk) 13:56, 16 November 2022 (UTC)
 * Hmm... I think it might actually be /ai/ for me. I was mainly focusing on the first vowel, since Whoop whoop pull up originally had /ɒi/ in the table. Andrew Sheedy (talk) 22:44, 17 November 2022 (UTC)
 * (and also pinging since it's relevant to some of the points they've made) Apologies there; I originally had the wrong lead vowel in the suggested transcription for the gyro/rice/why diphthong (damn you, Latin alpha v. reversed Latin alpha).  Should be fixed now. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:16, 18 November 2022 (UTC)
 * I once wondered the same thing about /eɪ/. I, too, used to wonder if [ɛi] might be better. But I was reminded that Dutch has an /ɛi̯/ sound, and to my ears at least it doesn't sound the same as English's /eɪ/ sound. /eɪ/ does genuinely sound like [e] with an offglide. For plenty of American English speakers, the sound even seems to be [e(ː)]. So in General American /eɪ/ is definitely /eɪ/ (again, for some, [e(ː)]).


 * I disagree with /aɪ/ actually being [ɒi]. That's simply not true. Obviously, there is what can be called an /aɪ/ - /ʌɪ/ (often [ɐɪ̯]) split for many speakers, but /aɪ/ is not [ɒi]. [EDIT: I see that the table above has been changed to suggest /ɑi/ instead of /aɪ/. For reference, the table originally suggested /ɒi/ instead of /aɪ/. That was what this segment of what I said was responding to.]


 * I also disagree that /ɔɪ/ is actually [oi].


 * I think that the transcriptions that we use for diphthongs for General American are, broadly speaking, pretty accurate. Tharthan (talk) 13:53, 16 November 2022 (UTC)
 * The offglide in raise, gyro, and koi is definitely not ɪ, though, and neither is the offglide in stow or Mao ʊ (as a matter of fact, the constructions with ɪ or ʊ as the offglide are nearly unpronounceable if you actually try to produce them, and, if you somehow do manage to do so, sound nothing like the diphthongs that they supposedly represent). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:20, 16 November 2022 (UTC)


 * I would sooner analyse the day, jade vowel as just /e/; indeed, Baruch College (City University of New York)'s pages on how to pronounce English have the vowel of say and paid as just /e/ (not even diphthongal /eɪ/, let alone /ɛi/), and the vowel of boat as just /o/; this Berkeley page, although they accept the conventional notation, similarly counts "[i] [eɪ] [ɑ] [oʊ] [u]" among the "vowels", and not among the "diphthongs" ("[aɪ] [aʊ] [ɔɪ]"). Amusingly, the page on how to say /ɔɪ/ outright says to say /o/ and then /ɪ/, seemingly acknowledging that the first element is /o/ (although I agree the second element seems more /i/-y, leading to pronunciations with a very pronounced /i/). FWIW Baruch also uses ɑɪ, ɑʊ (something I have also seen a few other reference works do) where we use aɪ, aʊ, and they retain /ʌ/ rather than Merriam-Webster's schwa for words like come. All of these sources seem rather basic and I'd like us to find even better references. - -sche (discuss) 00:47, 17 November 2022 (UTC)
 * For me, definitely [ɑɪ] and [æʊ] are closer to the truth about the first half of these diphthongs. But I disagree that the second half of any of the diphthongs is accurately represented as [i] or [u]. If anything I would say more like [ɑe] and [æo]. I think the issue that is confusing Whoop whoop pull up is that English [ɪ] and [ʊ] are much more central than cardinal [ɪ] and [ʊ], and representations like [aɪ] and [aʊ] are using the cardinal variants of these vowels even while [ɪ] and [ʊ] by themselves stand for the centralized variants used in English. Benwing2 (talk) 05:28, 18 November 2022 (UTC)
 * There is a how-now split of sorts where for me some words retain [aʊ] and others have shifted to [æʊ] with no clear pattern that I can see. It also happens before consonants, so town and gown don't rhyme. I'm not sure how widespread this is and yet again I might be just living in a bubble, but since this split likely has nothing to do with the cot/caught, horse/hoarse, etc., it could be quite widespread after all. My guess is that some of the dialects where the shift is completely to [æʊ] lent us some of their words through cultural osmosis and that is why there is no perceivable pattern. — Soap — 11:28, 18 November 2022 (UTC)
 * Well, it is always fascinating to see when different, (relatively) nearby dialects influence one another. I would be curious to read some studies—from this century—on exactly to what extent, say, Midwestern and/or Southern speakers for instance living in places close by to or right where the Midwest and the South meet have each been being influenced by the other dialect. I have personally encountered speakers who came across to me as speaking what sounded markedly like a half-Midwestern, half-Southern dialect, and I have wondered  'Where exactly is that person from?' 


 * It will be interesting to see how things develop going forward. Mind, you have to remember that even if, in the case of your dialect, there actually is what would appear to be an informal (and not even entirely conscious) split happening with /aʊ/ (where, as you describe it, some words have [æʊ] and others have what you describe as [aʊ]), it could just be a transitional state within a shift where one pronunciation is being slowly superseded by another. You'll have to see what the situation is a decade or two from now. Tharthan (talk) 02:10, 19 November 2022 (UTC)


 * In response to the corrected table: while the forms on the right seem reasonable alternatives to me, I don't find any of the forms on the left inaccurate enough to justify calling it "wrong" for American English as a whole. The forms on the right are closer to my perception of the sounds, but I am not actually sure that this corresponds to being more phonetically precise. For example, I am pretty sure that the start of my MOUTH vowel isn't actually front of center; it just sounds like an "æ" to me because my TRAP vowel has a range that covers both front and central qualities (e.g. I have a central vowel for TRAP before dark /l/), so I think a narrow phonetic transcription of my pronunciation would use [a] in both [aɫ] as in pal and [aʊ] as in cow. Likewise (in reverse), I'm not sure my PRICE vowel actually starts with a phonetically backer quality than MOUTH: it might just sound like that due to the contrast with its later trajectory, like one of those optical illusions where the same shade of gray looks either white or black depending on what color is next to it. Regarding the use of ɪ and ʊ to represent the offglides of English diphthongs, this is defended from a phonetic point of view by the linguist Mark Liberman in the blog post "The rɑɪt sɑʊnz?" (Language Log, October 2, 2010). My not very thought out reactions are that I would be comfortable with changing the transcription of the nuclei in ɪɚ and ɔɪ to [i] and [o], but uncomfortable with changing the transcription of the nuclei in eɪ and aʊ, and I'm unsure about the other proposals.--Urszag (talk) 06:53, 19 November 2022 (UTC)


 * I like or  as the second element of the diphthongs in General American phonemic transcription. The  convention seems to indicate that the second element of diphthongs is opener than the  phoneme at the beginning of a syllable (unlike the similar diphthongs in Spanish; for instance, compare  with Spanish ), but I don't think this is phonologically relevant since we don't have a contrast between more and less close offglides in diphthongs (between  and  for instance). I feel like the offglides have the phonological feature of "high" or "close" and I would reserve the exact details of just how high or close for phonetic transcription. And  as the second element of a diphthong confuses some people because the second element is usually different from the independent phonemes, which are usually not near-close near-front, but closer to , and seems to give some people the impression that  at the end of a diphthong basically mean nonsyllabic . Writing the offglides as  might lead to less confusion with the independent phonemes at least, though it might encourage some people to use weird pronunciations with very low offglides.
 * The first element of makes sense to me. The first element feels to me close enough to the goat vowel. It doesn't really match my thought vowel, which is very close or identical to the lot vowel . I feel like you'd have to have a very close and rounded caught vowel or a very low and unrounded first element of choice for the two to be similar, which might be true of some Southern accents I've heard.
 * doesn't describe my pronunciation because I don't perceive that vowel as being related to the dress vowel but rather as a closer monophthong. It looks like the Standard Southern British transcription and my face vowel starts higher and is often not very diphthongal. But  might make sense for some eastern accents and related varieties of General American.
 * The transcription of the first element of isn't true for all General American. The first element, when it is not affected by Canadian raising, can be front or back in three main configurations and I think all of them might occur in General American-type accents. There are the accents where the first element of the diphthong has the same frontness as the second element, where the first element is the same for both , and where the first and second elements have different frontness  like in your transcription. These can be schematically represented vowel-chart-style as  ,  , and  . I'm not sure which of these is more common in General American.  would make sense to me because I intuitively analyze the first element as the lot vowel, but that probably doesn't make sense for all accents.
 * I like or  because my near vowel seems to have the fleece vowel plus . I think  makes sense for the vowel of mirror in those American accents where it's distinguished from the vowel of serious, but in my accent those aren't distinguished and the merged vowel seems to belong to the tense phoneme to me, even though it is opener than the same vowel without an  after it. (I feel like most of my vowels before  are tense, so here, square, force, cure are, not .)
 * Sorry for the long post, but I hope it's useful. — Eru·tuon 00:07, 20 November 2022 (UTC)
 * @Erutuon - TL;DR: AmE diphthongs are really, really messy? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:06, 24 November 2022 (UTC)
 * Seems like it. I discovered a map in the Atlas of North American English showing some of the variation in the ow and eye diphthongs as it relates to the group of accents that they call "the North". It's on page 2 of the PDF of Chapter 14. The legend is a bit cryptic, but as I understand it, it's describing the relative frontness of the first part of the diphthong, and "F2(aw) < F2(ayV)" means that ow is backer than eye (roughly, ), "F2(aw) - F2(ayV) < 75" means ow is slightly fronter than eye (roughly ,  ), and "F2(aw) < F2(ayV)" means ow is significantly fronter than eye (roughly ,  ). So ow is backer or roughly equal to eye from the northern prairies of the US and Canada into the southern Great Lakes region, and some parts of upstate New York and New England, but the rest of the country has ow fronter than eye. I feel like some pronunciations from either group would sound GA enough, as long as the first element is open and not very rounded, and the second element of ow is rounded and not too fronted. — Eru·tuon 18:10, 5 December 2022 (UTC)
 * Owtchie. It looks like Appendix:English pronunciation's vowel section'll need a big chunk of added explanation regarding diphthongs in GA. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:12, 6 December 2022 (UTC)
 * About the only common thread I'm picking up on between all of those different realizations of the GA ow and eye diphthongs is that none of them are accurately represented by our present convention of using /aʊ aɪ/! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:16, 6 December 2022 (UTC)
 * I wouldn't say is inaccurate. It's one possible representation of a same-first-element pronunciation, but the use of  indicates that the second element of the diphthong isn't fully close, which I think is correct (haven't seen studies on this though). But it's confusing because the second element doesn't match many people's lowered and centralized pronunciations of the independent  vowels, and I think it's an unnecessary degree of detail. — Eru·tuon 15:07, 7 December 2022 (UTC)
 * @Erutuon: So what should we do with Appendix:English pronunciation regarding the highly-variable pronunciation of GA diphthongs? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 21:18, 10 December 2022 (UTC)
 * The appendix is currently describing (broad) phonemes, yes? And the various competing ways of representing /aɪ/ (etc) aren't contrastive with each other (they're competing notations / realizations of the same phoneme) ... so pick one notation for the broad phoneme (perhaps the current, traditional one, or perhaps we reach consensus for a different one) and mention competing representations and various narrow realizations in a footnote like /aɪ/'s footnote about /ʌɪ/? If specific dialects normally use a different phoneme/notation, the way e.g. Southern does (using just /a/ in certain circumstances), I've suggested mentioning that in the table or footnotes. (IMO we should try to have standards for all the dialects/lines we commonly include, since we don't want different people representing the same Boston pronunciation of a word two ways based on different sources' house notation styles any more than we want people representing the same GenAm pronunciation of the bed vowel two ways just because certain Collins dictionaries notate it /bed/...) - -sche (discuss) 22:26, 10 December 2022 (UTC)


 * For whatever it's worth, the book I mentioned in the discussion further down of /ol/, Syllable Structure: The Limits of Variation (2009) by San Duanmu (professor of linguistics at the University of Michigan), lists on page 185 various common, competing ways of notating diphthongs: ai vs aɪ,  au vs aʊ,  oi vs oɪ vs ɔɪ vs ɔi,  ei vs eɪ vs (ɛɪ),  ou vs oʊ,  and writes of these: "no one uses [ɛɪ] for British or American English, probably because the starting point is higher than [ɛ]. One might also point out that the ending point of a diphthong does not quite reach the height of [i] or [u] [...] However, it is possible that the ending target of a diphthong is a tense high vowel, and owing to the lack of time the target is not quite reached. There are two other reasons against using lax vowels. First, a diphthong is like a tense vowel, and so we should represent diphthongs with tense vowels. Second, the lax vowels [ɪ] and [ʊ] do not occur in word-final position in American English, but diphthongs can. [...] Therefore [...] for now I use tense vowels to represent diphthongs, although I shall argue later that the feature   is not contrastive in diphthongs." Some of that might be getting into higher-level theory than dictionary transcriptions normally do. - -sche (discuss) 22:26, 10 December 2022 (UTC)

/ɾ/ in GenAm
(Split out of a discussion above.) How do people feel about using /ɾ/ in place of /t/ (and /d/?) in /broad/ GenAm transcriptions of words that can exhibit flapping? My impression is that our practice has been to restrict it to [narrow] transcriptions, which seems good to me because AFAIK flapping is facultative and noncontrastive, but some entries have been changed to use it as the broad transcription, like sanity. BTW, for completeness: in the past, it has even been suggested to use /d/ (e.g. in otter), which someone in that discussion said the OED does. - -sche (discuss) 08:55, 16 November 2022 (UTC)


 * Is there any reliable source that has /ɾ/ as a separate phoneme of GenAm? I feel like I've only seen it as an allophone of /t/ or /d/. It also doesn't seem like GenAm natives internalize it as a contrasting phoneme either (considering the difficulty that arises when trying to explain the pronunciation of it in languages that actually have it as a phoneme). See also: Flapping. AG202 (talk) 13:51, 16 November 2022 (UTC)


 * Given that, a. when a word is stressed, /t/ and /d/ reappear immediately in whatever the relevant word is, b. the exact extent of flapping in speech differs from speaker to speaker; some speakers have it to the extent that they would consistently pronounce winter as [ˈwɪɾ̃ɚ] and twenty as [ˈtw̥ɛɾ̃i], others have it only to the extent that when a word with a medial /t/ or /d/ is said quickly in speech, it becomes [ɾ], and c. to AG202's point, speakers do not perceive it as a separate phoneme, I would strongly oppose using /ɾ/ in place of /t/ and /d/ in broad General American transcriptions. Tharthan (talk) 14:09, 16 November 2022 (UTC)
 * There is a contrast, in that flapped /d/ leaves a lengthening effect on a preceding vowel, while flapped /t/ does not. (A difference which does not appear, in practice, to be used as a cue for distinguishing /d/ and /t/.) See here. Nicodene (talk) 14:23, 16 November 2022 (UTC)
 * That's not the contrast being talked about though. That wouldn't lead to a difference in the representation of [ɾ] and whether or not it should be a phoneme. AG202 (talk) 14:47, 16 November 2022 (UTC)
 * Yes, it does lead to different phonemic representations of [ɾ] < /t/ and [ɾ] < /d/. Otherwise, you'd have to posit a newly phonemic vowel length contrast. Nicodene (talk) 14:56, 16 November 2022 (UTC)
 * Yes, but the initial question is whether or not /ɾ/ should be a phoneme itself in the place of /d/ or /t/. AG202 (talk) 15:10, 16 November 2022 (UTC)
 * And? This is an argument against /ɾ/. Nicodene (talk) 20:05, 16 November 2022 (UTC)
 * Ahhh apologies, I wasn't sure why that was being brought up, but it makes sense now. AG202 (talk) 14:04, 17 November 2022 (UTC)
 * a. In flapped accents, /ɾ/ is mandatory in some contexts (e.g., GA shutter/shudder) and persists even when the word is stressed, without reverting to /t/ or /d/. In some other contexts (e.g., GA winter or militaristic), flapping is nonmandatory and does sometimes (though far from always) disappear with stress.  b. The existence of minimal pairs between /ɾ/ and /d/ (and likely /t/ as well, though I haven't pinned those down yet), such as Beatty (/ˈbi.ɾi/) / beady (/ˈbi.di/), argues strongly in favor of /ɾ/ being a separate phoneme in GA. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:11, 16 November 2022 (UTC)
 * @Whoop whoop pull up Couldn't the example of Beatty vs beady be argued as /t/ with an allophone of [ɾ] vs /d/? I don't see a place where [ɾ] would have a minimal pair with /d/ where it's not clearly an allophone of /t/. (Same thing with the reverse, [ɾ] vs /t/) Also, regardless, the consonant is still the same to me in those two words, I don't pronounce it differently, both as [ɾ] (I can feel the flap happening). The distinguishing factor between the two for me (and also listening to forvo and merriam-webster) focuses on vowel length, which Nicodene discussed above. AG202 (talk) 17:06, 16 November 2022 (UTC)
 * In that particular case, I suppose you could make that argument (although I strongly suspect that, with enough digging, you'll find at least a few full ɾ/t/d minimal triplets), but the case for /ɾ/ being nonphonemic in GA still founders on the fact that there's a solid core of cases (such as shutter/shudder, duty/doody, metal/medal, etc.) where /t/ or /d/ cannot be substituted for /ɾ/ (attempting to do so makes it sound like you're imitating a foreign accent, which is a pretty surefire way of telling that that is not a way it can be pronounced in your home dialect), making any argument that /ɾ/ is merely an allophone of /t/ or /d/ untenable for those words (although there do exist other words which do exhibit facultative, rather than mandatory, flapping, such as dentist, Mediterranean, militaristic, planting, etc., for which one could make a reasonable case for /ɾ/ being allophonic rather than phonemic). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 04:55, 17 November 2022 (UTC)
 * But that does not prove that there is a phoneme /ɾ/: even if it's mandatory to use [ɾ] in some words, this can still be interpreted as a conditioned allophone rather than as a separate phoneme. Compare the standard treatment of word-initial [st] [sp] [sk] (start, spin, skin): they are analyzed as /st/ /sp/ /sk/, with the same phonemes as /t p k/ in tart, pin, kin, even though the latter set of words are pronounced with aspirated allophones [tʰ pʰ kʰ] which would sound unnatural in the first set of words (we can't use *[stʰ spʰ skʰ]), and even though the contrast between /t p k/ and /d b g/ is neutralized after word-initial /s/. It is not standard to treat unaspirated /t˭ p˭ k˭/ as their own phonemes.--Urszag (talk) 15:33, 17 November 2022 (UTC)
 * Word-initial /t/ can also be flapped, even in words like tonight per YouGlish. The initial sound there is, indisputably, an underlying /t/. A similar case can be made for the /t/ in bet versus betting. Nicodene (talk) 18:57, 17 November 2022 (UTC)
 * I'm not so sure about there being minimal triplets; the environments in which flapping can vs can't occur seem to make it difficult for there to be sequences otherwise identical in their phonemic elements like vowels and stress, where /t/ (or /d/) can flap in only one. If we discount word boundary and secondary stress differences, we might contrast "bee tea" [tʰ-] (unless it can also be flapped like the tonight example), "Beatty" [t-, ɾ-], "beady" [d-, ɾ-], but that approach would seem to equally well phonemicize things like aspiration ("hat spin", "hat's pin"), and dark l ("pool abs", "poo labs"). I don't know, perhaps we should consider phonemicizing all these things, including the vowel length differences that phonemicizing [ɾ] would require — when I look in scholarly sources, much of what I've seen so far that directly speaks against phonemicizing [ɾ] seems to amount to "it would make the description of the system more complicated, which we find less phonemic, albeit truer to the narrow phonetic realization",* but I am wary of our transcriptions straying far out away from what reliable sources seem to consider the /broad/ phonemes of American English to be, positing something like /ˈpʰiɾi/ where everyone else sees /ˈpiti/, [ˈpʰiɾi]. (That said, would someone like to add the length difference to the [narrow] transcription of rider vs writer, which is mentioned above [and below] as contrastive, but not indicated in the entries?) *For example, William Frawley, International Encyclopedia of Linguistics: 4-Volume Set, 2003, page 332, says: "Another much-discussed problem is the minimal pair writer [...] vs. rider [...]. The surface contrast is in the length of the vowel. But most analysts felt that the correct phonemicization registers the contrast in the consonant as [t] vs [d]. Comparison with the morphologically related write and ride, as well as the restricted distribution of the flap, justifies locating the contrast in the consonant. 
The phonetic forms can be derived by ordered application of the independently needed rules of vowel-length assignment and flapping. But if we proceed simply on the basis of minimal pairs, then we must phonemicize the vowel length—even though this contrast only appears before the flap, and correlates elsewhere with voicing of the following consonant." - -sche (discuss) 20:02, 17 November 2022 (UTC)
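
As a rough illustration of the ordered-rule account in the Frawley quotation, here is a toy Python sketch. The segment inventory, the list-of-segments representation, and the function names are all assumptions made up for this example, not anything from the cited source. It only shows that applying vowel lengthening before flapping turns an underlying /t/ vs /d/ contrast into a surface contrast in vowel length.

```python
# Toy derivation: underlying /t/ vs /d/ -> surface vowel-length contrast.
# Phonemic forms are lists of segments; all symbols here are illustrative
# assumptions, not a full analysis of General American.

VOICED = {"d", "b", "g", "z", "v"}
VOWELS = {"aɪ", "i", "ɛ", "æ", "ɚ"}
UNSTRESSED_VOWELS = {"ɚ"}  # simplification: flapping only before these


def lengthen_vowels(segments):
    """Rule 1: a vowel is lengthened before a voiced obstruent."""
    out = []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg in VOWELS and nxt in VOICED:
            out.append(seg + "ː")
        else:
            out.append(seg)
    return out


def flap(segments):
    """Rule 2: /t/ or /d/ becomes [ɾ] between a vowel and an unstressed vowel."""
    out = list(segments)
    for i in range(1, len(segments) - 1):
        prev_is_vowel = segments[i - 1].rstrip("ː") in VOWELS
        next_is_unstressed = segments[i + 1] in UNSTRESSED_VOWELS
        if segments[i] in {"t", "d"} and prev_is_vowel and next_is_unstressed:
            out[i] = "ɾ"
    return out


def derive(segments):
    """Apply the rules in order: lengthening first, then flapping."""
    return "".join(flap(lengthen_vowels(segments)))


writer = ["ɹ", "aɪ", "t", "ɚ"]
rider = ["ɹ", "aɪ", "d", "ɚ"]
print(derive(writer))  # ɹaɪɾɚ
print(derive(rider))   # ɹaɪːɾɚ
```

If the two rules were applied in the opposite order, both words would come out identical in this toy model, which is why the ordering matters for the argument.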
 * I think you're right to connect phonemicizing dark L and aspiration with the current discussion. I think they're on a similar level and it's quite possible that they are developing to the point of being phonemes in GenAm. However, I'm disinclined to say that these features are already part of General American. It seems more likely that specific regional accents are moving towards phonemicization, but not the "standard" accent. My skepticism about a broad transition of /ɾ/ to phonemic status is that I'm not aware of any English speakers ignorant of linguistics who perceive it as such. For instance, people teaching non-native speakers English will often use /tʰ/ or /d/ in its place in order to differentiate, for instance, the pronunciation of "latter" and "ladder". Or people sometimes use /tʰ/ and /ɾ/, which suggests that they perceive /ɾ/ as the same sound (and therefore an allophone) of /d/, which lends support to the idea of using /d/ in broad transcription. The argument against that, however, is that it would confuse many people who haven't even noticed that they pronounce pairs like "latter" and "ladder" the same way, which is also common, in my experience. This all suggests that in General American, /ɾ/ is just an allophone. The difficulty English speakers often have with flapping /ɾ/ in foreign languages, as mentioned above, is a further reason to think that /ɾ/ is not perceived as a phoneme. However, a potentially interesting data point, which may contradict what I've said, is that most of my siblings, before they learned to read, would typically render /ɾ/ as /tʰ/ when stressing the pronunciation of a word. I have heard several young children pronounce "ladder" as [ɫætʰɚ]. Andrew Sheedy (talk) 23:15, 17 November 2022 (UTC)
 * Maybe wait another couple decades and see, then? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 09:11, 20 November 2022 (UTC)

ja-noun for multiple counters
There are many Japanese nouns which have more than one counter (for example, can be counted with,  or ). Is it possible to put more than one counter in ja-noun? Also, do we need a classifier-by-language category system for Japanese? --TongcyDai (talk) 10:10, 16 November 2022 (UTC)


 * AG202 (talk) 13:46, 16 November 2022 (UTC)


 * I'm not really up on our Lua infrastructure and defer to others regarding . Separately, I think a classifier-by-language category system for Japanese might well be useful.  ‑‑ Eiríkr Útlendi │Tala við mig 18:53, 16 November 2022 (UTC)
 * Support inclusion of multiple classifiers and categorisations for Japanese and Korean and addition to Category:Nouns by classifier by language. Multiple classifiers are already supported in a number of languages, such as Thai, Khmer, etc. Please check Thai, which uses two classifiers in the headword like this: . Probably worth checking the implementation of Module:th-headword.
 * It seems Vietnamese supports only one classifier with cls.
 * BTW, what should be done in cases where a classifier used depends on the sense? These could be split into multiple headwords. Anatoli T. (обсудить/вклад) 22:14, 16 November 2022 (UTC)
 * Sometimes a Japanese word is spelled using different kanji depending on the sense. We've used  on the specific sense line(s) for those cases.  Could we do something similar for counters?  ‑‑ Eiríkr Útlendi │Tala við mig 23:32, 16 November 2022 (UTC)
 * Vietnamese supports multiple classifiers actually, with auto categorization; see, for example. PhanAnh123 (talk) 04:37, 18 November 2022 (UTC)
 * Thanks. The display there is OK, not broken but it's not like the Thai implementation where entry is categorised correctly by both Category:Thai nouns classified by กรณี and Category:Thai nouns classified by เรื่อง (both classifiers are used correctly for categorisations). So the Vietnamese solution is imperfect. The Thai entry doesn't require extra sets of brackets:, compare with the Vietnamese . Anatoli T. (обсудить/вклад) 05:09, 18 November 2022 (UTC)
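
For what it's worth, the behaviour described for the Thai headword (each classifier listed producing its own category) can be sketched in a few lines. This is plain Python rather than the Lua of the real modules, and the function name, its parameters, and the example call are hypothetical. Only the category name pattern "Category:Thai nouns classified by …" is taken from the discussion above.

```python
def classifier_categories(lang_name, classifiers):
    """Return one category per distinct classifier, keeping input order."""
    seen = set()
    categories = []
    for cls in classifiers:
        cls = cls.strip()
        if not cls or cls in seen:
            continue  # skip blanks and duplicates
        seen.add(cls)
        categories.append(f"Category:{lang_name} nouns classified by {cls}")
    return categories


# Hypothetical call with the two Thai classifiers named above.
for cat in classifier_categories("Thai", ["กรณี", "เรื่อง"]):
    print(cat)
```

A headword template for Japanese counters could follow the same pattern, taking a list rather than a single value, though how sense-dependent counters should be handled is a separate question.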

What is the point of "partial" rhyme pages?
e.g. Rhymes:English/ɛnv... -- who would find this useful? Not even a stumped poet, as far as I can see. Equinox ◑ 18:42, 16 November 2022 (UTC)


 * They're mid-points in the infrastructure for getting to rhymes if you're wondering if any words end in a certain sequence (say -envik) and you can't think of one to look up, e.g. if you're adding what you think may be the first one (something I've used when adding Arabic loanwords with unusual codas) — or if, like at present with the two -envik words listed there, we don't have English entries with rhymes to look up. You go to Rhymes:English and click on the right vowel /ɛ/, get all the rhymes starting with Rhymes:English/ɛ-, Rhymes:English/ɛn..., Rhymes:English/ɛnv..., and find out if there are -envik words. If/as we migrate away from Rhymes: pages to categories, this ability will not be lost, because Category:Rhymes:English allows much the same thing by just presenting all possible rhymes up front, which is perhaps more efficient and better at requiring that only English words (with English entries) are listed and not foreign words and music album names and such. - -sche (discuss) 19:27, 16 November 2022 (UTC)
 * Speaking from experience, those pages are also useful for looking for near rhymes of a word, when an exact rhyme doesn't work. This specific page doesn't accomplish that, but many do. Andrew Sheedy (talk) 23:17, 17 November 2022 (UTC)
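
The drill-down described above (vowel first, then progressively longer prefixes, then the full rhyme) amounts to grouping rhymes by shared prefix. A small Python sketch, assuming rhymes are stored as plain strings of IPA characters; the function name and the sample rhyme list are made up for illustration, and only /ɛnvɪk/ corresponds to the -envik example in this thread.

```python
from collections import defaultdict


def partial_rhyme_pages(rhymes):
    """Map each proper prefix of a rhyme to the rhymes that start with it."""
    pages = defaultdict(set)
    for rhyme in rhymes:
        # Proper prefixes only: the full rhyme is its own page, not a partial one.
        for end in range(1, len(rhyme)):
            pages[rhyme[:end]].add(rhyme)
    return {prefix: sorted(full) for prefix, full in pages.items()}


# Made-up sample data for illustration.
pages = partial_rhyme_pages(["ɛnvɪk", "ɛnd", "ɛnt"])
print(pages["ɛ"])    # every sample rhyme starting with /ɛ/
print(pages["ɛnv"])  # only the -envik rhyme
```

The near-rhyme use mentioned above falls out of the same structure: the rhymes listed under a shared prefix are the candidates for near rhymes.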

Etymology section for irregular non-lemmas
The standard established in Beer parlour/2016/March is that non-lemma forms shouldn't include etymologies. However, there are many cases where a form is irregular and it would be interesting to include a short etymology section that explains the origin of the irregularity (doing so in the lemma seems too verbose). One example is Portuguese fará, the third-person singular future indicative of the verb fazer, where the expected form is *fazerá. Furthermore, I find the current message left by Template:nonlemma insufficient in several cases. For instance, it would be interesting, imo, to briefly state that the verb form crerá is a regular suffixation of the lemma crer with -á, thus giving the reader the opportunity to follow not only the etymology of the lemma form, but also that of the suffix (which is interesting by itself). For instance, the example above could be summarized in a template that returned something akin to "From . For further etymology, see the corresponding lemma form." This approach also brings the benefit of categorization.

In short, I would like to have your thoughts on the following two changes:
 * 1) Irregular non-lemmas can have etymologies explaining the origin of the irregularity
 * 2) Regular non-lemmas can have short etymologies (preferably templatized) linking to the lemma form and to an affix or to the glossary term that explains the derivation.

What do you think? - Sarilho1 (talk) 11:57, 17 November 2022 (UTC)
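
To make proposal 2 concrete, a templatized etymology line might behave roughly like the following. This is a Python mock-up, not actual template or module code: the function name and parameters are hypothetical, and the category name in particular is a placeholder assumption rather than an existing category.

```python
def nonlemma_etymology(lang_name, lemma, affix):
    """Build a short etymology line plus a category for a regular non-lemma form."""
    text = (
        f"From {lemma} + {affix}. "
        "For further etymology, see the corresponding lemma form."
    )
    # Placeholder category name; the real naming scheme would need consensus.
    category = f"Category:{lang_name} non-lemma forms suffixed with {affix}"
    return text, category


# The crerá example from the proposal: lemma crer plus the suffix -á.
text, category = nonlemma_etymology("Portuguese", "crer", "-á")
print(text)
print(category)
```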
 * I support common sense in cases like this. I don't think there's any point adding special etymologies to every non-lemma form out there, but I don't think they should be banned either, and they can certainly be helpful to explain irregular forms. I'm having a hard time finding where in the discussion you link that standard was established—it seems like the discussion was about a separate issue, whether to group all non-lemmas in a single etymology section, and it doesn't come to an obvious decision—but if it is a norm it doesn't seem particularly well-enforced anyway; I noticed that despite having a very predictable morphology Hungarian does have etymologies on stuff like plurals (lakatosok etc.). —Al-Muqanna المقنع (talk) 12:35, 17 November 2022 (UTC)
 * When one inflected form has a substantially different etymology, as in cases of suppletion, we absolutely allow etymologies: see . When you have different stems for whole blocks of the paradigm, though, I believe we cover that at the lemma. Chuck Entz (talk) 15:13, 17 November 2022 (UTC)
 * Somewhat relatedly, I note that we don't have any etymology for many inflectional morphemes and don't have entries for some. DCDuring (talk) 15:51, 17 November 2022 (UTC)


 * On one hand, there've been irregular forms where I've been tempted to put etymological information about the source of the irregularity; OTOH, we so uniformly centralize information to lemmas that I don't know how many people would think to look up an inflected form instead of a lemma. This is also my concern with certain plural-only senses of words most people would be able to figure out (and hence likely to look up) the lemma/singular of; like in those cases, I think we should have little pointers between the entries, e.g. if there's more information (about was) in was than there is in be, be should say something like "see was for more on that form". - -sche (discuss) 20:07, 17 November 2022 (UTC)
 * That seems sensible to me. I had a similar concern about Latin pluralia tantum not being signposted at the "singular" forms that people might look up; I added a see also to plural-only from  after an IP tried to add definitions of minae to the second one. —Al-Muqanna المقنع (talk) 22:39, 17 November 2022 (UTC)

Osco-Umbrian language code
We should have it, for better categorization. Could be an etymology-only code to Proto-Italic. See entries like,. The name I believe should be Osco-Umbrian as that seems to be the preferred term in recent literature, with Sabellic being the older alternative, also still in great use (see ngrams). I believe Sabellian is usually the historical rather than linguistic term. The code could be. Catonif (talk) 13:52, 19 November 2022 (UTC)

As an encouragement to add this, I also note bitumen, botulus, rufus, lumbus, Vibius, bos, omentum, popina. I correct my previous statement "etymonly code to Proto Italic" with "language family code", as the parent of Oscan, Umbrian, South Picene and all the other minor Osco-Umbrian languages. Catonif (talk) 19:50, 1 December 2022 (UTC)

/ɝ/ vs /ɚ/ in GenAm
Given the disagreement about /ʌ/ vs /ə/, I want to bring up /ɝ/ vs /ɚ/. Traditionally, words like nurse, termite and turf are considered to have /ɝ/ in GenAm, but recently some users (including some who want to keep /ʌ/!) have changed /ɝ/ words to merge and unite that vowel to /ɚ/. AFAICT, the sources which support vs oppose /ʌ/ vs /ə/ are the same as support vs oppose /ɝ/ vs /ɚ/. [//dictionary.cambridge.org/us/dictionary/english/termite Cambridge], [//www.collinsdictionary.com/us/dictionary/english/termite Collins], [//www.dictionary.com/browse/termite Dictionary.com], [//www.oxfordlearnersdictionaries.com/us/definition/english/termite Oxford Learner's], and [//www.macmillandictionary.com/us/dictionary/american/termite MacMillan] all have the US pronunciation of termite as /ɝ/~/ɜr/~/ɜːr/, and though Merriam-Webster's non-IPA notation is "ˈtər-ˌmīt", they [//www.merriam-webster.com/assets/mw/static/pdf/help/guide-to-pronunciation.pdf clarify] that this means IPA [ɝ, ɚ]. So if there's no consensus to change e.g. un- to schwa, should we also be undoing the edits that changed e.g. turf to schwa? - -sche (discuss) 23:38, 20 November 2022 (UTC)
 * Although I prefer to differentiate /ʌ/ vs /ə/, I am not sure whether I prefer to use /ɝ/ vs /ɚ/ or just /ɚ/. Phonetically, I can't hear a clear difference between the vowels themselves, which makes /ɚ/ seem like the simpler option, but I do see the argument that /ɝ/ vs /ɚ/ is more consistent. Phonologically, if we do not indicate tertiary stress/stress on syllables coming after the primary stressed syllable, the use of /ɝ/ vs /ɚ/ could be helpful to indicate differences in prosody that correspond to use vs. non-use of t-flapping etc.: e.g., in the word dramaturgy. But I guess unflapped t can occur before /ɚ/ in some cases, e.g. Mediterranean, militaristic, so even in this case /ɝ/ vs /ɚ/ isn't perfectly informative.--Urszag (talk) 00:16, 21 November 2022 (UTC)
 * Given that there's neither a phonemic nor a phonetic difference between stressed and unstressed GA rhotic schwas (with the use of /ɝ/ as well as /ɚ/ being simply a notational convention to differentiate stressed from unstressed rhotic schwas, a convention rendered completely redundant by the use of ˈ and ˌ to indicate primary and non-primary stress, respectively), that the GA realization of both is [ɚ], and that GA rhotic schwas are produced by R-coloring /ə/, whereas /ɝ/ is the R-colored version of a vowel that does not exist in GA in non-rhotic form, I would argue strongly in favor of deprecating /ɝ/ for IPA transcription of GA rhotic schwas and using /ɚ/ to represent both stressed and unstressed rhotic schwas in GA, with /ɝ/ being restricted to use in IPA transcriptions of regional dialects where /ɝ/ (and /ɜ/) actually exist. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 23:47, 21 November 2022 (UTC)
 * I think their clarification amounts to "ɝ is what you will find in traditional transcriptions of American English". Nicodene (talk) 23:57, 21 November 2022 (UTC)
 * I agree that /ɚ/ or /əɹ/ should be used instead of /ɝ/ or /ɜɹ/ in General American because there's no such phonemic contrast in typical GA accents. There's probably a phonemic contrast in some accents like a stereotypical old-fashioned New York City accent (in which the stressed vowel sounds kind of like oi). But this isn't true for General American, and using two separate symbols might lead some people trying to learn General American pronunciation to believe that they need to pronounce the first syllable of murder with a more open vowel than the second syllable, which is not true, or to try to hear a difference that isn't there. — Eru·tuon 01:15, 22 November 2022 (UTC)
 * I don't have much to add to this thread, as opposed to the STRUT/COMMA thread above. I suspect the Americans who maintain the distinction are mostly nonrhotic speakers, and that within that dialect pool, the distinction might be not in the vowel height but in whether the speaker pronounces the /r/ in their otherwise nonrhotic dialect. Wikipedia suggests here that at least in NYC, the nonrhotic speakers now pronounce /r/ in words like bird. Whether this can be analyzed as due to stress or not, I don't know. — Soap — 12:04, 24 November 2022 (UTC)
 * In a sense the older pronunciation of bird with [əɪ] (coil–curl merger) was still pronouncing /r/, just with a different tongue shape. Geoff Lindsey has a blog post where the second sound file demonstrates the similarity of a bunched r to a palatal y sound. It was pretty eye-opening to me because it shows that the odd nurse vowel pronunciation could have developed just by slight changes in tongue shape from a more typical American r. — Eru·tuon 14:58, 12 December 2022 (UTC)


 * , given diff, perhaps you want to weigh in here? Personally I'd prefer to retain the info about what was originally /ɝ/ vs /ɚ/ (even if only as e.g. "older GenAm"), but my impression of the above discussion and Tea room/2022/September is that people prefer /ɚ/ for modern GenAm. - -sche (discuss) 04:23, 6 December 2022 (UTC)
 * I just reverted a change where the two should be kept distinct. If we use /ɚ/ for both full and reduced vowels, then there is no way to indicate that the second syllable in t-girl has what Webster's calls "secondary stress" except by adding a spurious stress mark, which is not a phonemic IPA transcription and would not be used in the OED, despite GA and RP having the same stress pattern. kwami (talk) 04:33, 6 December 2022 (UTC)
 * IPA has secondary-stress marking. That's what ˌ is for. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 05:41, 6 December 2022 (UTC)
 * Yes, it does. But it's supposed to mark stress, not vowel quality as in Webster's (which isn't IPA anyway). kwami (talk) 05:51, 6 December 2022 (UTC)
 * There is no vowel-quality difference here. The vowels in the first and second syllables of "murdered" are identical except for stress, and the second-syllable vowel of "t-girl" is identical in quality to both.
 * On a closely-related note, User:Kwamikagami's been revert-warring on, claiming that there's some difference in pronunciation between fur and unstressed for, when, in fact, the two are completely homophonous in GA, both being pronounced /fɚ/. (Maybe there's some distinction between the two in Kwami's personal regiolect, but no such distinction is present in GA itself.) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:17, 6 December 2022 (UTC)
 * We make the distinction in this key. That's what this discussion is about. (Especially when we don't mark stress, as with fur, marking vowel reduction becomes important.) If you want to retire the distinction, fine, but you should get consensus here first, and change the key accordingly, and only then change the transcriptions of the articles. You shouldn't edit-war over imposing your POV while the discussion is still in progress, and contradicting the key that users are referred to.
 * Also, MW distinguishes fur from the reduced pronunciation of for, so you're contradicted by at least that source. kwami (talk) 06:23, 6 December 2022 (UTC)
 * MW uses outdated pronunciations for GA, as demonstrated both here and at below. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:50, 6 December 2022 (UTC)
 * And, as for making a distinction between /ɚ/ and "/ɝ/" for GA in our pronunciation key, that's a distinction without a difference (and one that, I suspect, has been kept around in zombie form by dictionaries like MW continuing to parrot outdated pronunciations rather than updating their pronunciation keys to reflect how words are actually pronounced nowadays) - modern-day GA has absolutely no phonemic or phonetic difference in vowel quality between stressed and unstressed rhotic schwas, which is why we're discussing retiring "/ɝ/" from use for GA and why there appears to be a consensus to go ahead and retire "/ɝ/". Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:06, 6 December 2022 (UTC)


 * Re "MW distinguishes fur from the reduced pronunciation of for": do they?? They have "fər, (ˈ)fȯr  Southern also  (ˈ)fär" for for and "ˈfər" for fur, differing only in stress, which is ... what everyone is saying here, that people who write /ɝ/ are just doing so to indicate "the /ɚ/ sound, but with stress", which MW is thus indicating in a more traditional or straightforward manner (by just writing the ər sound, but with stress). (Now, this is also what people use /ʌ/ for — "the /ə/ sound, but with more stress" — so I continue to be intrigued that the sets of people who want to fold /ɝ/ into /ɚ/ and those who want to fold /ʌ/ into /ə/ differ, but nonetheless...) - -sche (discuss) 07:22, 6 December 2022 (UTC)
 * @-sche: (I strongly suspect that the reason for those two sets differing is that not everyone in favor of folding /ɝ/ into /ɚ/ uses stressed /ʌ/ to represent "the /ə/ sound, but with more stress"; a significant subset of the former category [me, for instance] use stressed /ə/ to represent a stressed neutral vowel [i.e., "the /ə/ sound, but with more stress"] and stressed /ʌ/ to represent a stressed vowel that's significantly more open (and usually somewhat more back as well) than the stressed neutral vowel represented by /ə/.) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:31, 6 December 2022 (UTC)
 * Could you provide an example of stressed /ə/ vs stressed /ʌ/? I've never seen an analysis of English like that. kwami (talk) 20:52, 7 December 2022 (UTC)
 * Yes, MW distinguishes them by marking 'fur' as having stress. But we don't mark stress, so we need to distinguish the vowels. My point was that MW is a RS that 'fur' and reduced 'for' are not homophones, so we shouldn't claim they are homophones, which is what would happen if we collapsed this distinction. I've gotten into arguments with people here who insist that monosyllables like 'fur' don't have stress, and who revert my attempts to add it. So if the Wikt convention is to not mark monosyllables for stress, we need some other remedy for distinguishing them. kwami (talk) 20:49, 7 December 2022 (UTC)
 * No we don't, because that distinction no longer exists in modern GA, MW's fossil pronunciations notwithstanding. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:59, 7 December 2022 (UTC)
 * I'm with you on wanting to transcribe stress on non-clitic monosyllables, but I don't care enough to argue with everyone else about it. — Eru·tuon 14:10, 8 December 2022 (UTC)

@-sche @Urszag @Nicodene @Erutuon @Soap @Kwamikagami: Do we have a consensus either way yet? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:37, 7 December 2022 (UTC)


 * My vote is to deprecate /ɝ/ for General American. Nicodene (talk) 21:25, 7 December 2022 (UTC)
 * I'm in favor of using /ɚ/ or /əɹ/ in place of /ɝ/ or /ɜɹ/ for General American. — Eru·tuon 14:28, 12 December 2022 (UTC)
 * OK, that makes me, Nicodene, and Erutuon in favor of deprecating /ɝ/ for GA, on the one hand, and Kwami opposed to deprecating /ɝ/ for GA, on the other., wanna weigh in? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 09:08, 16 December 2022 (UTC)
 * I have some thoughts on the matter as well.
 * In Ohio, some other nearby Midwestern states, as well as some states more southwards, /ɜ/ has been clearly established to exist as part of the inventory of vowels. I hardly think that it would be strange for General American speakers from those areas to have /ɝ/ or /ɜɹ/ in a word like her.
 * If [ɝ] or [ɜɹ] is indeed a possibility for General American speakers, discarding it in our transcriptions and uniformly using /ɚ/ or /əɹ/ might not be the best idea. Tharthan (talk) 15:41, 16 December 2022 (UTC)
 * Do you remember where you heard or read about it? I don't know what such a distinction would be like in near-General American accents. I guess older New York City English might have a distinction with the nurse vowel sounding like and the letter vowel being often just a derhoticized schwa, but that wouldn't be the difference in Ohio. Unfortunately on Wikipedia I don't remember seeing anything about the nurse-letter distinction in relation to American English except in the General American English article, which says there isn't one. — Eru·tuon 16:35, 16 December 2022 (UTC)
 * I wasn't suggesting that there exists a conscious distinction for those speakers. I was pointing out that the actual pronunciation of what is being suggested be uniformly transcribed as /ɚ/ may, at least in certain environments, be closer to [ɝ] than to [ɚ] for them.
 * As for where /ɜ/ is used by such speakers, from what I understand it is often their STRUT vowel. I have only personally ever heard it myself—for the STRUT vowel—in the speech of some Southerners. But I have read that it occurs in certain Midwestern states as well, including Ohio.
 * Anecdotally, I recall some years ago seeing a Southern speaker write a stressed "love" (in all capitals) as "lurve". In speech where /ɜ/ has been established to exist as a vowel (such as by being its STRUT vowel), "lurve" would seem to represent at the very least [lɜv], if not [lɝv]. Tharthan (talk) 01:48, 17 December 2022 (UTC)
 * I don't have a strong opinion either way. But if we use just /ɚ/ or /əɹ/ (I think I'd prefer /ɚ/) and not /ɝ/, the issue of how to transcribe unreduced/tertiary-stressed vs. reduced/fully unstressed syllables after the main stress in words like t-girl (kwami's example) or dramaturgy (my example) does not seem to be resolved yet. I don't consider stress to be a fully phonetic characteristic, so I'm not entirely satisfied with kwami's argument that we shouldn't transcribe stress in this position because it isn't phonetically present. Also, there is a contrast between "strong"/unreduced and "weak"/reduced /ɪ/ even though I don't think we transcribe them differently: e.g. I think autism and autist can be pronounced with strong /ɪ/, resulting in no flapping of the preceding /t/, but emphatic has weak /ɪ/ in the final syllable ("emphatic" example taken from John Wells's blog post strong and weak, Friday, 25 March 2011). To me, using distinct symbols for the vowels in these kinds of syllables seems like a more unwieldy way of transcribing this kind of contrast than using stress marks (as in the Oxford English Dictionary's American English transcriptions: "/ˈdrɑməˌtərdʒ/"). But if there is a consensus for kwami's position that we should not transcribe any stress on these syllables, then I can understand why /ɝ/ could be considered useful. I would like to see further discussion of this issue so we can get an answer to that at the same time.--Urszag (talk) 16:53, 22 December 2022 (UTC)
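For illustration only, the merger proposed in this thread can be sketched as a simple symbol substitution. This is a hypothetical Python helper, not code from any Wiktionary module; the function names and the sample transcription of murdered are assumptions made for the example. It shows the point kwami raises: once /ɝ/ is rewritten as /ɚ/, fur and reduced for come out identical unless a stress mark is written.

```python
# Hypothetical illustration only; these names are not from any Wiktionary module.
RHOTIC_MERGE = {"ɝ": "ɚ", "ɜɹ": "əɹ"}
STRESS_MARKS = ("ˈ", "ˌ")


def merge_rhotic_schwa(ipa: str) -> str:
    """Rewrite the stressed rhotic-vowel symbols as their schwa counterparts."""
    for old, new in RHOTIC_MERGE.items():
        ipa = ipa.replace(old, new)
    return ipa


def has_stress_mark(ipa: str) -> bool:
    """Report whether a transcription carries a primary or secondary stress mark."""
    return any(mark in ipa for mark in STRESS_MARKS)


# Sample transcriptions; the one for "murdered" is an assumed example.
examples = {"murdered": "/ˈmɝdɚd/", "fur": "/fɝ/", "for (reduced)": "/fɚ/"}
for word, ipa in examples.items():
    merged = merge_rhotic_schwa(ipa)
    print(word, ipa, "->", merged, "| stress marked:", has_stress_mark(merged))
```

In this sketch the merged forms of fur and reduced for are the same string with no stress mark, so whether they stay distinguishable depends entirely on the separate question of marking stress on monosyllables.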

/ɜ/ vs /ə/ in RP / British
I notice that Appendix:English pronunciation already had (for some time now) a note about bird/nurse words, "For RP, /əː/ is sometimes used as an alternative to /ɜː/—for example, in dictionaries of the Oxford University Press." So should we be merging these not only in GenAm but also in RP / British? Any British editors want to weigh in? Does a word like murdered have two different vowels, or one vowel that just differs by length? - -sche (discuss) 00:57, 25 November 2022 (UTC)


 * As something resembling a phoneme in British English, I've only noticed length on consonants. I think the height difference is variable, though it should be noted that the second vowel is a schwa, and thus quite variable in itself. The first vowel is long; British English fundamentally retains length, with roughly three length phones in closed syllables, as short before nominally voiced and long before nominally voiceless are about the same. The quality of the first vowel is what IPA writes as [ɘ] (close mid central), though most linguists stick with "[ɜ]", which has been declared by the IPA to be open mid central. --RichardW57 (talk) 14:56, 5 December 2022 (UTC)
 * Formally, this is similar to the traditional use of "ʌ" for what is now [ɐ], the STRUT vowel. --RichardW57 (talk) 14:56, 5 December 2022 (UTC)


 * In my pronunciation of murdered the two vowels are subtly different beyond just length, though if I force myself to pronounce both as /ə(ː)/ it doesn't sound particularly unusual. The first one is somewhere on the ɜ~ə spectrum, [ɘ] is probably right. —Al-Muqanna المقنع (talk) 15:05, 5 December 2022 (UTC)
 * Is the difference something that cannot be explained by stress? Nicodene (talk) 15:14, 5 December 2022 (UTC)
 * It is transcribed that way in the American tradition, at least with Merriam-Webster, but that's not IPA. E.g. t-girl in the thread above. The distinction typically comes up in compound words, where one of the elements loses its stress but retains a full vowel. So in t-girl the second (unstressed) vowel is the same as the first (stressed) vowel of murdered, rather than as the second. If we're going to follow the OED in using stress marks for stress, then I believe we need different symbols here. kwami (talk) 04:41, 6 December 2022 (UTC)
 * Could you please clearly explain what you're trying to say? IPA does use secondary-stress marking (that's what ˌ is for), and in GA itself the two vowels of murdered and the second-syllable vowel of t-girl are completely identical except for stress (certain regiolects do have a distinction between those vowels that goes beyond stress, but that's a feature of those specific regiolects, not of GA). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 05:41, 6 December 2022 (UTC)
 * There is no distinction in stress. Both murdered and t-girl are stressed on the first syllable. Webster's would mark the second syllable of t-girl as having "secondary stress", but that's just how they distinguish full vowels from reduced, since they don't have enough symbols for all the vowels. It's not stress, so we shouldn't mark it with the IPA stress mark. (English doesn't have secondary lexical stress, but it's common to transcribe all but the last stressed syllable as having secondary stress, because the last stressed syllable gets additional phrasal stress when the word is pronounced in isolation.) kwami (talk) 05:46, 6 December 2022 (UTC)
 * Hm? They [//www.merriam-webster.com/assets/mw/static/pdf/help/guide-to-pronunciation.pdf say] they use the secondary stress mark to mark secondary stress, and when I spot-check now, the things they mark with secondary stress seem to be things that bear some secondary stress. You say stress marks are "just how they distinguish full vowels from reduced, since they don't have enough symbols for all the vowels", but they mark secondary stress also on words like girlfriend and battlefield where the symbols they notate the second syllables with already indicate that the vowels are not reduced. - -sche (discuss) 07:49, 6 December 2022 (UTC)
 * Yes, they interpret it as stress, but that's contradicted by phoneticians as well as by other dictionaries. That pattern is typical of compound words. But according to Ladefoged (who should know the IPA) and the OED, those words don't have secondary stress. kwami (talk) 08:00, 6 December 2022 (UTC)
 * Unless they specifically state 'these words don't have secondary stress', there is no reason to interpret their transcriptions that way. Secondary stress is often simply left untranscribed, especially on the phonemic level. Nicodene (talk) 15:09, 6 December 2022 (UTC)
 * He actually says that no words in English have secondary stress, so that would include these words. kwami (talk) 20:59, 7 December 2022 (UTC)
 * The phonetic correlates of stress are often not simple to interpret. John Wells made a blog post suggesting that in his view, it's not the case that secondary stress is impossible after a primary-stressed syllable, it's just that it's optional, and he indicates that there are different traditions about how to transcribe stress in English. Speaking of the example irritating, Wells writes "Actual rhythmic beats following the main word stress accent are all pretty optional, which is why the British tradition is not to show any secondary stress in words like this: ˈɪrɪteɪtɪŋ, not *ˈɪrɪˌteɪtɪŋ. The alternative tradition, usually followed in the States and (for example) Japan, is to recognize a secondary stress on the penultimate, írritàting." irritating hamburgers (John Wells’s phonetic blog, 29 September 2009).--Urszag (talk) 15:18, 6 December 2022 (UTC)
 * According to Ladefoged, in GA that's a matter of vowel reduction, not stress. He was unable to find any phonetic indication of secondary stress: after primary stress, those are just un-reduced vowels, and before primary stress, they're just primary stress that doesn't have phrase-final intonation. (The difference between primary and secondary stress disappears when you put the word in a phrase so it's no longer phrase-final.) kwami (talk) 21:03, 7 December 2022 (UTC)
 * Out of curiosity I recorded myself and cropped the vowels: they are definitely a different quality though whether that's just because my stressed articulation of /ə/ is more open is past my pay grade. —Al-Muqanna المقنع (talk) 14:22, 6 December 2022 (UTC)

GenAm vs US in Template:accent
This was discussed years ago, but maybe with new editors and interest we can decide: what is the difference between a bare US accent label (as in doggy), and GenAm (as in body)? Do we want to standardize on one and make the other an alias, or establish an intended distinction, e.g. using "US" as an (empty? diaphonemic?) element atop a list of "GenAm; NYC; Southern US" etc pronunciations? (And "UK" atop a list of "RP, SSB, Geordie, Wales" etc?) I'm not talking about the use of US as part of a label like US which, while I think it's unnecessary — just say Southern US, or if you don't know which regions, say regional US! — is at least a different beast. - -sche (discuss) 23:06, 22 November 2022 (UTC)


 * I like the idea of using 'US' and 'UK' as empty geographic headers. Nicodene (talk) 23:38, 22 November 2022 (UTC)
 * As a Canadian, I very much don't. Canada is not part of the US, but Canadian English is part of GenAm. It would be silly to add a separate "Canada" label to almost every entry and inaccurate to just leave it as "US". Andrew Sheedy (talk) 04:49, 23 November 2022 (UTC)
 * This is the first time I've heard GenAm defined as explicitly including Canadian English. I would have previously said that Canadian English is strictly speaking something else, just similar enough to be mostly covered by GenAm transcriptions. It looks like General American English does suggest Canadian English might be included. Might it be clearer to label entries that are meant to cover both Canadian and US accents as "North American" rather than "General American"?--Urszag (talk) 06:01, 23 November 2022 (UTC)
 * I would be fine with that. I could understand someone wanting to restrict GenAm to a more specific accent or set of accents, but the fact stands that Canadian pronunciation is a subcategory of the class of accents found in the US. There's much more of a difference between different accents in the UK than between any variety of Canadian English and standard American English. Andrew Sheedy (talk) 15:59, 23 November 2022 (UTC)
 * We can simply make the header 'North America'. Nicodene (talk) 18:01, 23 November 2022 (UTC)
 * My understanding is that GenAm is a specific accent whereas US refers to all accents in the states. Vininn126 (talk) 17:52, 23 November 2022 (UTC)
 * No, General American is a continuum of accents. It is not a single unified accent. Tharthan (talk) 22:39, 23 November 2022 (UTC)
 * What is a label that "refers to all accents in the states" useful for? Surely if I put US en in the entry bird, this—although technically true, these are pronunciations found in the US—is unhelpful because I should be labelling them by where in the US each one is found. Should US try to be a diaphonemic representation of all US accents? But this is probably impossible/inadvisable, as Erutuon says. Should it be an empty placeholder? But I don't expect casual users to understand why there's a blank line; they'll probably keep using it for GenAm pronunciations as at present. Should it be an alias that displays the same as GenAm? IMO that would be most maintainable. - -sche (discuss) 02:18, 25 November 2022 (UTC)


 * I prefer the use of as a header and not a label for individual phonemic transcriptions, because we do not have a diaphonemic transcription system that accommodates all the diverse pronunciations in the US (nor should we, because it would be way too confusing) and  is sometimes placed in front of transcriptions that definitely do not describe some pronunciations in the US. For instance, the entry for  gives two incompatible US pronunciations,  and, but only the first is labeled as US; it is actually intended to represent the General American pronunciation and should be labeled with  or .  (Granted the transcription of glory will change to  with the discussion further up this page, but still there will be two US pronunciations in entries like  or , and no matter what transcription system we could devise, it would never accommodate all US accents without having way too many hard-to-remember symbols.) There are some words in which the phonemes in the different US accents probably don't need different symbols, but it's better to err on the side of being specific because most people can't reliably determine which words those are (even I'd be unsure if we really tried to transcribe the full variety of accents). — Eru·tuon 22:45, 23 November 2022 (UTC)
 * On the subject of headers, perhaps we can have three to cover the main native zones, namely British Isles (UK + Ireland), North America (US + Canada), and Oceania (Australia + New Zealand). Nicodene (talk) 05:33, 24 November 2022 (UTC)


 * Regarding the inclusion of Canadian in GenAm: while various sources (summarized on Wikipedia) do say Canadian resembles GenAm, it seems like most people do interpret "General American" (when named that way) as a US thing; Appendix:English pronunciation not only treats GenAm and Canadian as different accents but specifies quite a few differences; and entries (e.g. orange, out) specify some pronunciations as Canadian-only and others as GenAm-only. So, if we're intending to subsume Canadian into a "General North American", it might be necessary to rename along those lines to make the scope clear. But since most sources only describe US/GenAm and UK/British/RP pronunciations, if we just start using sources that describe US-GenAm to support our pronouncements about North American, it'd seem kinda fishy (source hijacking/falsification). - -sche (discuss) 02:18, 25 November 2022 (UTC)
 * With regard to out, I think the current treatment may not actually be ideal—so-called "Canadian raising" is definitely not confined to Canadian speakers, but occurs for a non-negligible number of US speakers as well. However, I think this phonemic (or incipiently phonemic) split is poorly documented, so I'm not sure how good a job Wiktionary can do at showing information about it.--Urszag (talk) 09:38, 25 November 2022 (UTC)

My preference, FWIW, would be: make "US" an alias of "GenAm" so writing US produces "", and nest NYC, Boston, Southern US, California, etc underneath GenAm (and let Canadian keep its own line whenever it's different from GenAm, otherwise combine labels when the accents have the same pronunciation, like we already do: GenAm). The UK situation may need to be handled differently (e.g. not nesting things under RP) since so few British people use RP. - -sche (discuss) 02:19, 25 November 2022 (UTC)
 * I'm in favour of this. I don't think it's necessary to specify "Canadian" alongside "GenAm" when they're the same, provided "General American" is what readers see. Andrew Sheedy (talk) 06:00, 25 November 2022 (UTC)


 * I was not aware that the way that we intend to proceed regarding Received Pronunciation had been entirely decided yet (maybe I have missed something, but I don't recall seeing anything agreed to on that), but if regional British pronunciations are not going to be placed beneath Received Pronunciation pronunciations, then I don't think that regional American pronunciations ought to be placed beneath General American pronunciations either.
 * To be clear where I stand on this, rather than making the "US" label an alias for General American, I think that we ought to either continue having our primary US pronunciations be focused on General American pronunciations, and try to avoid using the US label unless perhaps it has some qualifier (and even then, as -sche pointed out, it isn't necessary: regional pronunciations can simply be listed by the region) or we ought to have "US" be some sort of empty header, as Nicodene suggested. Tharthan (talk) 16:32, 25 November 2022 (UTC)
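As a rough sketch of the "make US an alias of GenAm" option discussed above, label resolution could look something like the following. This is hypothetical Python written for illustration; the table contents and function names are assumptions, and the actual data behind Template:accent may be organised quite differently.

```python
# Hypothetical label tables; the real Template:accent data may be organised differently.
ALIASES = {"US": "GenAm"}
DISPLAY = {
    "GenAm": "General American",
    "RP": "Received Pronunciation",
    "Canada": "Canada",
}


def resolve_label(label: str) -> str:
    """Map an alias to its canonical label, then to the text shown to readers."""
    canonical = ALIASES.get(label, label)
    return DISPLAY.get(canonical, canonical)


def combine_labels(labels: list[str]) -> str:
    """Join the display texts of several labels, dropping repeats caused by aliases."""
    seen: list[str] = []
    for label in labels:
        text = resolve_label(label)
        if text not in seen:
            seen.append(text)
    return ", ".join(seen)


print(resolve_label("US"))
print(combine_labels(["US", "GenAm", "Canada"]))
print(resolve_label("Southern US"))
```

Under this approach an entry tagged with a bare US label would display the same text as one tagged GenAm, while regional labels such as Southern US pass through unchanged; the empty-header alternative would need a different structure.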

Japanese lemma or non-lemma
Entries like そうきん or ぬいぐるみ are in Category:Japanese non-lemma forms. But given that English oeconomy and naïve are in Category:English lemmas, should those Japanese entries be moved to Category:Japanese lemmas? -- Huhu9001 (talk) 01:53, 23 November 2022 (UTC)
 * Both of the Japanese entries are soft redirects. I don't think redirection entries count as lemmata.  ‑‑ Eiríkr Útlendi │Tala við mig 19:44, 23 November 2022 (UTC)
 * Hiragana entries that don't use ja-see aren't put in the non-lemma category, though. It's also inconsistent with the way alternative spellings are handled in other languages. Binarystep (talk) 22:42, 23 November 2022 (UTC)
 * Japanese writing has wrinkles that alphabetic systems don't have. 猫 (neko, “cat”) can be written as ねこ, but ねこ is more of a phonetic guide than an "alternative spelling", not quite the same as the way that nite is an alternative spelling for night.
 * This might also have to do with different assumptions about what "lemma" means. My understanding is that "the lemma is the 'main' form of the word at Wiktionary for purposes of locating the definitions and other details".  How do you define "lemma", specifically in relation to Wiktionary data structure?
 * Pinging as a couple other JA editors off the top of my head.  ‑‑ Eiríkr Útlendi │Tala við mig 23:39, 23 November 2022 (UTC)
 * I really don't want to say so, but it is definitely not a good sign that some Wiktionarians are even trying to invent a new law that demotes kana from a legitimate writing system to a mere "phonetic guide". Do they think kana are equivalent to romaji or Chinese pinyin? Had they ever paid attention to what actual Japanese people write, they would immediately notice that "ねこ" is just as common as "猫" in casual texts, such as on social media. And all of a sudden they become "phonetic guides", presumably to show that Japanese are all idiots to be taught again and again how to pronounce "cat"? -- Huhu9001 (talk) 00:45, 24 November 2022 (UTC)
 * ": Japanese writing has wrinkles that alphabetic systems don't have. 猫 (neko, “cat”) can be written as ねこ, but ねこ is more of a phonetic guide than an 'alternative spelling', not quite the same as the way that nite is an alternative spelling for night."
 * How isn't it an alternate spelling? Hiragana spellings are actually used in running text (though not always very often), which makes them actual words rather than a mere spelling guide.
 * "This might also have to do with different assumptions about what 'lemma' means. My understanding is that 'the lemma is the 'main' form of the word at Wiktionary for purposes of locating the definitions and other details'. How do you define 'lemma', specifically in relation to Wiktionary data structure?"
 * I define a lemma as the base form of a word, without any inflections. Binarystep (talk) 05:19, 24 November 2022 (UTC)
 * @Huhu9001, @Binarystep, @Eirikr, @Fytcha: In purely Wiktionary terms, "lemma" is the non-inflected word. So 猫、ねこ、ネコ are all equally lemmas, just "Alternative forms". is a "Romanization", and as such it should be treated (see for instance ). Akkadian works exactly the same as Japanese, but fortunately scholars use the Romanization as main entry on dictionaries, so I can just give all the alternative spellings and Sumerograms in a table inside the entry page (see for example . The issue here is that Japanese editors inexplicably, and very unwisely, decided to go against what's done in every single existing Japanese dictionary, and instead of lemmatising the hiragana spelling, they went for a random mix of whatever they thought the "most common" spelling would be. The truth is that Japanese entries should be in hiragana (for Japanese and Sino-Japanese words) and katakana (for foreign words). All other spellings should have been given in the entry page. For instance:  should be the main entry, and,  and  should have been given in the entry page under something like "Common spellings".
 * All your troubles start there. The day bad decisions were made for Japanese on Wiktionary. — Sartma 【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】 09:22, 1 December 2022 (UTC)
 * I agree with what you say about lemmas but I don't see the advantage of having the main entry always at the kana lemma. Minimizing the number of clicks one has to perform to get to the information is also something we should strive for. Almost nobody is going to look up とりあつかい (1 page view in the last 30 days) in contrast to 取り扱い (14 page views in the last 30 days). Ideally, we would in some way mirror what jpdb is doing with alt forms but I don't think that's ever going to happen (primarily for technical reasons). — Fytcha〈 T | L | C 〉 16:04, 1 December 2022 (UTC)
 * @Fytcha: The advantage would be that you will always get to the lemma entry straight away, while pretty much solving the issue of "what is a lemma". It's well worth the price of one click. — Sartma 【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】 22:36, 1 December 2022 (UTC)


 * Much of the data structure you describe was inherited from before I became an active editor here. The only real changes I'm aware of from that initial base state are:
 * More effort at deduplication. Soft-redirect entries previously had more information provided in the interests of usability, but this was often manually copied and thus a maintenance challenge.
 * Changes to basic ideas about where to put 'main' entries. Initially, kanji renderings were preferred for almost everything.  For native Japonic terms, that can get quite complex: one kanji may have multiple Japonic terms, as we see at, or conversely one Japonic term might have multiple kanji spellings, as we see at .  After various discussions, the rough consensus emerged at that time to locate the 'main' entries for Japonic terms (i.e. kun'yomi) at the kana spellings, since these are more closely tied together, and to locate the 'main' entries for Sinic terms (i.e. on'yomi) at the kanji spellings, for similar reasons.
 * FWIW, the JA Wiktionary seems to put the 'main' entries for Japonic terms at the kana spellings, and for Sinic terms at the kanji spellings. Compare their Japonic-term entry at ja:きく and their Sinic-term entry at ja:高校.  One good argument for not using kana spellings as the 'main' entries for Sinic terms is the large number of homophones, which become kana homographs.  Consider ja:こうこう, or our not-quite-as-complete entry at こうこう.  ‑‑ Eiríkr Útlendi │Tala við mig 23:16, 2 December 2022 (UTC)
 * Have you considered bringing this to the ballot? I have to admit, I have also grown increasingly frustrated with some aspects of the organization of Japanese on Wiktionary. Where we place the main entry oftentimes isn't even related to the usage frequency of the different spellings. &mdash; Fytcha〈 T | L | C 〉 13:11, 6 January 2023 (UTC)
 * @Fytcha: I don't have enough faith in Wiktionary Japanese editors to try, sorry... But you'll have my support if you do! — Sartma 【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】 22:17, 6 January 2023 (UTC)

More analogies, Chinese 后天 and 儿童 are Category:Chinese lemmas. 兒童 is both Category:Korean lemmas and Category:Vietnamese lemmas. -- Huhu9001 (talk) 00:43, 24 November 2022 (UTC)


 * I invent nothing. Perhaps I expressed myself poorly?  I'll try restating.
 * In light of Lemma (morphology) at Wikipedia and our own entry at lemma, and your comment, we seem to have two senses at play here: 1) "the canonical form of an inflected word; i.e., the form usually found as the headword in a dictionary," and 2) "the base form of a word, without any inflections".  While neither Japanese 猫 nor ねこ are inflected, the "canonical" or "headword" form we have chosen here at Wiktionary is 猫, so the full entry for the intersection of the pronunciation ねこ (neko) and the kanji spelling 猫 only exists on the 猫 page.
 * As discussed separately in other threads multiple times in years past, electronic Japanese dictionaries usually support lookup by both kana and kanji (and sometimes even romaji), returning all the records that match the input string. Wiktionary's technological underpinnings do not allow this (or at least, not as we have the site currently organized), so we have had to choose one form alone for the headword, and have the other forms redirect the reader to that headword entry.
 * Perhaps a different question I should pose is, what is the use case for having the category Category:Japanese_lemmas? For that use case, however it is defined, does it make sense to include multiple different renderings of the same word (kanji, kana, romaji), even if the full entry only exists at one of these?  For our 猫 example, is there value in categorizing all three forms -- 猫, ねこ, and neko -- as "lemma entries", when the full entry only exists at 猫?  ‑‑ Eiríkr Útlendi │Tala við mig 07:57, 24 November 2022 (UTC)
 * "In light of Lemma (morphology) at Wikipedia and our own entry at lemma, and your comment, we seem to have two senses at play here: 1) 'the canonical form of an inflected word; i.e., the form usually found as the headword in a dictionary,' and 2) 'the base form of a word, without any inflections'. While neither Japanese 猫 nor ねこ are inflected, the 'canonical' or 'headword' form we have chosen here at Wiktionary is 猫, so the full entry for the intersection of the pronunciation ねこ (neko) and the kanji spelling 猫 only exists on the 猫 page."
 * My definition is more consistent with Wiktionary policy. Currently,, , and are categorized as lemmas, despite their "canonical" spellings being , , and . Simplified Chinese entries like  still go in Category:Chinese lemmas, even though they only serve to redirect to the traditional forms (in this case ). The way we handle Japanese, on the other hand, is unusual and contradictory.
 * "As discussed separately in other threads multiple times in years past, electronic Japanese dictionaries usually support lookup by both kana and kanji (and sometimes even romaji), returning all the records that match the input string. Wiktionary's technological underpinnings do not allow this (or at least, not as we have the site currently organized), so we have had to choose one form alone for the headword, and have the other forms redirect the reader to that headword entry."
 * Japanese is far from the only language where we use alternative spellings as soft redirects, but it *is* the only one (as far as I know) where we don't categorize them as lemmas.
 * "Perhaps a different question I should pose is, what is the use case for having the category Category:Japanese_lemmas? For that use case, however it is defined, does it make sense to include multiple different renderings of the same word (kanji, kana, romaji), even if the full entry only exists at one of these? For our 猫 example, is there value in categorizing all three forms -- 猫, ねこ, and neko -- as 'lemma entries', when the full entry only exists at 猫?"
 * Romaji spellings aren't lemmas. They're normally not used in running text (I've only seen one exception, which used romanized Japanese to look like fake English text), and are completely subjective, since a number of popular romanization systems exist. For instance, we romanize as, but I've been on sites that would use  or  instead, and you can even find , , or  in some places. Binarystep (talk) 20:26, 24 November 2022 (UTC)
 * I see a key difference between how multiple alphabetical spellings for a single word contrast, and how kanji vs. kana spellings contrast.
 * By way of example, English night and nite both refer to the same basic concept (as the opposite of day).  However, the latter spelling nite has specific associations due to the spelling, having to do with social register and context -- aspects that we can, and should, describe in our entry as lexically relevant information.  Likewise, color and colour refer to the same concept, and the spelling difference indicates something we can talk about lexically (in this case, regional differences).
 * However, not every English word has multiple spellings.
 * In Japanese, every word that has a kanji spelling also has a kana spelling. The existence of a kana spelling for a given word that is usually spelled in kanji is completely expected and unsurprising, and unworthy of anything beyond simply noting this in the main entry.  We have only created any such kana spelling entries at all due to the technical shortcomings of our platform.
 * The only places I can think of where using "alternative spelling" for a kana entry makes any sense are those where the kana spelling deviates from the spoken pronunciation, due to historical sound shifts in "mainstream" Tokyo Japanese. This could include instances like, understandable as a non-standard and proscribed pronunciation-based kana spelling for .  (Nota bene: some dialects of Japanese distinguish between  and  in speech, the former as  and the latter as , and speakers of these lects would never confuse these two kana renderings.  See also Yotsugana.)
 * So unlike English night and nite, or color and colour, there is nothing we can lexically say about the contrast between Japanese 猫 and ねこ.  We could talk about how kanji and kana are used in Japanese writing -- but that seems like a topic more appropriate for an encyclopedia than a dictionary, and indeed there is such content at Japanese_writing_system.  The use of the term  there is important: while  and  are two different words that both use the same script,  and  are two different scripts that both spell the same word.  This phenomenon in English only affects certain words, and involves lexically important differences between these distinct words.  Meanwhile, the phenomenon in Japanese occurs across the entirety of the written language, and involves a difference in script used to spell the same words.
 * Looking at Chinese, the contrast between simplified and traditional is similar -- this is a phenomenon that occurs across the entirety of the written language, and involves a difference in script used to spell the same words. Electronic dictionaries for Chinese (that I'm familiar with) treat simplified and traditional as the same thing: if you look up traditional-script  or simplified-script, you get the same information.  Depending on the dictionary, you might get a list of derived terms mirrored in traditional and simplified (example here).  The difference between  and  is not a difference between two distinct words, but rather a difference between two distinct scripts.
 * My understanding of lemma with regard to Wiktionary entries is that the "lemma" is the "address" where we put the main entry for that word.  The "lemma" for goes is go.  We exclude goes from the list of English lemmata, and the expectation is that users should go to the lemma entry for the full information about that word.  The entry at goes is essentially a soft redirect, pointing the reader to the lemma entry, and it is accordingly treated as a non-lemma.
 * Along those lines, I and the other Japanese editors have understood that the "lemma" for Japanese terms should be wherever the full entry goes. So the "lemma" for the Japanese word neko ("cat") is at 猫, and ねこ is a soft-redirect entry, and it is accordingly treated as a non-lemma.
 * (Side note: Frankly, I don't think we (the Japanese editors here) have paid much attention to the lemma categories, and many of our soft-redirect entries using older infrastructure like the basic or  templates categorize the entries by default in Category:Japanese lemmas.  Newer infrastructure like  or  was apparently created with more awareness of categorization, and these templates categorize entries in Category:Japanese non-lemma forms instead.)
 * If "lemma" for purposes of categorizing Japanese terms should mean "the canonical form of the headword where the main entry is located", then any soft-redirect entry should be in Category:Japanese non-lemma forms.
 * If instead "lemma" should mean something more like "the canonical form of the headword written in any script that a native speaker would use", then all kana and kanji entries in Japanese (and arguably some romaji and Arabic numeral entries) should be in Category:Japanese lemmas. However, that also makes the category useless for purposes of identifying the "main" entries.
 * ‑‑ Eiríkr Útlendi │Tala við mig 20:20, 28 November 2022 (UTC)
 * "By way of example, English night and nite both refer to the same basic concept (as the opposite of day). However, the latter spelling nite has specific associations due to the spelling, having to do with social register and context -- aspects that we can, and should, describe in our entry as lexically relevant information. Likewise, color and colour refer to the same concept, and the spelling difference indicates something we can talk about lexically (in this case, regional differences)."
 * Even in English, not all alternate spellings have specific connotations. Just look at all the various ways to spell, for instance.
 * As for the rest of your comment, everything you've mentioned isn't unique to Japanese. I'm curious what you think of our treatment of Serbo-Croatian, a language where every word can be written in Cyrillic or Latin letters. There's certainly nothing unique or "interesting" about the fact that can also be written as, yet both are in Category:Serbo-Croatian lemmas. Much like kana and kanji, these are nothing more than different scripts spelling the same word. Does that mean we should de-lemmatize half of our Serbo-Croatian entries?
 * "My understanding of lemma with regard to Wiktionary entries is that the 'lemma' is the 'address' where we put the main entry for that word. The 'lemma' for goes is go. We exclude goes from the list of English lemmata, and the expectation is that users should go to the lemma entry for the full information about that word. The entry at goes is essentially a soft redirect, pointing the reader to the lemma entry, and it is accordingly treated as a non-lemma."
 * goes is an inflected form of go, not an alternate spelling, so it's not really relevant to this discussion. Binarystep (talk) 22:02, 28 November 2022 (UTC)
 * goes was intended purely as an example of a non-lemma entry. The kind of script difference that exists in Japanese does not exist in English, so there is no direct parallel.
 * Re: Serbo-Croatian, if both Cyrillic and Latin were used in mixed texts, and readers and writers would be expected to treat them identically, then yes, I would strongly be in favor of choosing one form for the lemmata, and having the other form reduced to soft-redirection stubs that would be treated as non-lemma entries. More ideally, if the Wiktionary platform could be made to support this approach, the user could enter either form and land on the same page, showing both forms and providing a unified entry.
 * However, so far as I understand it, Serbo-Croatian texts are written in either Cyrillic or Latin, so the two are not really interchangeable. Moreover, it looks like we have full entries at both the Cyrillic and Latin spellings, neither is a soft-redirect to the other, and since both are full entries, both are sensibly treated as lemmata.  The treatment of Serbo-Croatian is not really comparable to how Japanese works.  ‑‑ Eiríkr Útlendi │Tala við mig 22:41, 28 November 2022 (UTC)
 * Can't say it is not funny to see a Japanese exceptionalism movement emerging from English Wiktionary. -- Huhu9001 (talk) 01:17, 30 November 2022 (UTC)
 * Can't say as I'm advocating for Japanese exceptionalism either. My position hinges entirely on how we define "lemma" for purposes of Wiktionary entry categorization.  This could apply just as well to Chinese: those entries that are soft redirects, such as 🇨🇬 (soft redirect to ), should presumably be handled similarly.  ‑‑ Eiríkr Útlendi │Tala við mig 04:22, 30 November 2022 (UTC)
 * I don't like the language used against you in this thread and I distance myself from that, but I agree with the substance brought forth by the other commenters: I understand lemma and non-lemma-ness to be solely defined on morphological grounds (i.e. a term is a lemma iff it abides by the morphological constraints imposed on lemmas). Lemma-ness doesn't have anything to do with main-entry-ness: 可愛い, かわいい and カワイイ are all equally much lemmas while 可愛くて, かわいくて and カワイクテ are all equally much non-lemmas, even though only one of the lemmas should have a main entry (i.e. more than a soft redirect). I also agree that this is exactly how it is handled for all other languages on Wiktionary and that Japanese is the sore thumb. &mdash; Fytcha〈 T | L | C 〉 02:26, 30 November 2022 (UTC)
 * As I understand your reply, your position is that "lemma" should mean "'main' or uninflected form of a word, regardless of script or whether that particular entry is our 'main' entry for that term".
 * With that in mind, why would you lemmatize at, not a common form for this word, and not at , also not a common form for this word?  What distinction do you draw?
 * Definitions for "lemma" that I've seen consistently describe how a term is indexed in a dictionary. This sounds to me like the "address" at which we expect to find the main entry.  Per w:Lemma, "lemma refers to the particular form that is chosen by convention to represent the lexeme."  For the Japanese word meaning "cat", for instance, the Wiktionary editors working on Japanese have chosen the form  to represent this lexeme, and we have treated this as the 'main' entry.  The Japanese editors have, so far as I'm aware, endeavored to avoid data duplication and instead consolidate the 'main' entries for each word at one specific script rendering, using templates like  at the other renderings to direct users to the 'main' entries.
 * If instead "lemma" just means "uninflected form of a word, regardless of script", what use are the " lemmas" categories for our users?  This artificially inflates the lemma counts for CJKV languages -- all Korean hanja renderings have corresponding hangul renderings; all Chinese simplified entries have corresponding traditional entries (albeit sometimes identical); all Vietnamese Hán tự entries have corresponding Vietnamese alphabet entries; and all Japanese kanji entries have corresponding hiragana, katakana, and possibly even Latin alphabet and Arabic numeral entries.  For these multi-script written languages, what word or category should we use instead of "lemma" for users to find the 'main' entries?  Serious questions.
 * One partial parallel with English has occurred to me. Some time back, we had discussions and votes on the handling of Middle and Old English words using the letter Ƿ or wynn, such as at Votes/2020-09/Removing Old English entries with wynns and Votes/2020-12/Bringing back wynn entries.  The outcome of that was that all entries using ⟨Ƿ⟩ spellings were deleted and redirected (via some software configuration rather than using  ) to the equivalent spellings using ⟨W⟩ instead, such as at .  Meanwhile, words using the letter Þ or thorn still exist, such as at  or, and apparently these are still categorized as lemma entries, in contrast to the complete removal of ⟨Ƿ⟩ entries.
 * For Middle and Old English, what utility is there in classifying an entry like 🇨🇬 as a "lemma" when the bulk of usable entry information is located at 🇨🇬 instead? And if we treat the  spelling as a "lemma", why do we not do the same for spellings like ?
 * I ask these as serious and honest questions. I do not understand how we (the wider Wiktionary editing community) intend for "lemma" categories to be used.  ‑‑ Eiríkr Útlendi │Tala við mig 18:46, 30 November 2022 (UTC)
 * your position is that "lemma" should mean "'main' or uninflected form of a word, regardless of script or whether that particular entry is our 'main' entry for that term". Yes, this is a fair summary of my position.
 * With that in mind, why would you lemmatize at, not a common form for this word, and not at, also not a common form for this word? If we accept what I've laid out thus far, there's not really a question as to why should be treated as a lemma. To answer the part about romaji, I draw a distinction because romaji is not a native script of Japanese (much like Pinyin is not a native script of Mandarin). (Even if you could attest  three times in native Japanese books, I would still be opposed to the inclusion of  as anything more than a romanization soft-redirect, just like I'd be opposed to the inclusion of spurious but perhaps attested joke spellings of English words in non-native scripts.) It all hinges on what is considered to be a predominant script of a language. Kanji and kana are, romaji isn't. The only reason why we should include romaji entries at all is because they're extremely useful for people who don't have a Japanese keyboard layout installed and because the MediaWiki software doesn't allow for a better solution (contrast this with jisho.org where you can enter romaji in the search bar even though the dictionary itself doesn't contain romaji entries).
 * Per w:Lemma, "lemma refers to the particular form that is chosen by convention to represent the lexeme." It seems that this is taken from Lemma (morphology) and reading that page makes clear that they actually agree with my position, that they also understand lemma to be defined on morphological grounds (they even explicitly lay out some of these morphological constraints that I was talking about). The word form in your quote was clearly meant to refer to morphological forms, not orthographic forms.
 * The Japanese editors have, so far as I'm aware, endeavored to avoid data duplication and instead consolidate the 'main' entries for each word at one specific script rendering, using templates like at the other renderings to direct users to the 'main' entries. I think you are still confounding lemma-ness with main-entry-ness. We all endeavor to avoid redundancy (ok, let's not talk about Serbo-Croatian) and we use soft-redirect templates in every language, but that has nothing to do with lemma-ness. is a soft-redirect because we don't want to duplicate the entirety of the content found in  but we still correctly consider it a lemma and categorize it as such. Even if we start to correctly categorize  as a lemma, it would still only ever be a soft-redirect, as is . Nothing changes in terms of duplication.
 * If instead "lemma" just means "uninflected form of a word, regardless of script", what use are the " lemmas" categories for our users? I think this is not really relevant to the issue at hand. If we come to agree what lemma means and if we further agree that we should employ one consistent definition of lemma all across Wiktionary (which we currently do, bar Japanese), the utility argument does not really change anything. I would not be opposed if we additionally wanted to create a category for a subset of lemmas, namely precisely those lemmas that contain at least one non-redirecting definition. However, this is probably not really implementable using only templates and modules and would thus require some massive botting just for upkeep which comes with its own set of problems. I cannot give you a more direct answer to your question of what the use is because I use Wiktionary in some specific ways that lead me to not have any use at all for both the present lemma categories as well as the lemma subset category that you envision.
 * I want to present two additional points to consider. Not only is Wiktionary's Japanese inconsistent in its use of the lemma categories compared to the rest of Wiktionary, it is also internally inconsistent:
 * ja-see adds the non-lemma category to an entry while ja-def doesn't. See for instance this entry: . Both templates are used in the exact same way, to provide a redirection from one spelling to another.
 * ja-see actually places the entries it is used in in the corresponding part of speech category: is categorized as Category:Japanese nouns. The issue with that is that these part of speech categories are subcategories of Category:Japanese lemmas. Transitively speaking,  is both a lemma and a non-lemma.
 * Everything I've brought up so far is solved by simply making ja-see categorize into Category:Japanese lemmas instead of Category:Japanese non-lemma forms. &mdash; Fytcha〈 T | L | C 〉 15:54, 1 December 2022 (UTC)
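The transitive contradiction described in the second point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three category names come from this thread, but the `PARENTS` mapping is a hypothetical miniature of the category tree, and the function names are invented for the example rather than taken from any Wiktionary module.

```python
# Hypothetical miniature of the category tree discussed in this thread.
# The real tree is far larger; only the relevant parent links are modelled.
PARENTS = {
    "Japanese nouns": ["Japanese lemmas"],  # POS categories sit under the lemma category
    "Japanese lemmas": [],
    "Japanese non-lemma forms": [],
}


def ancestors(category, parents=PARENTS):
    """Return the category itself plus every category reachable via parent links."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def effective_categories(direct_categories):
    """Expand an entry's direct categories to include all transitive parents."""
    result = set()
    for category in direct_categories:
        result |= ancestors(category)
    return result


def is_contradictory(direct_categories):
    """True if an entry ends up (transitively) as both a lemma and a non-lemma."""
    effective = effective_categories(direct_categories)
    return "Japanese lemmas" in effective and "Japanese non-lemma forms" in effective


# Categorization the thread attributes to ja-see on a soft-redirect noun entry.
current_behaviour = ["Japanese nouns", "Japanese non-lemma forms"]
# Categorization under the proposed change.
proposed_behaviour = ["Japanese nouns", "Japanese lemmas"]

print(is_contradictory(current_behaviour))
print(is_contradictory(proposed_behaviour))
```

Under these assumptions, the first categorization is flagged as contradictory and the second is not, which is the inconsistency the proposal is meant to remove.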


 * The case of t:ja-def is a little complicated: t:ja-def was indeed initially (2007) created as a redirect template. 11 years later in 2018, the creator of t:ja-see repurposed it as a main-entry template serving to tell t:ja-see which lines of the definitions are to be fetched and displayed on the redirect page. But the change was never fully implemented, perhaps due to lack of consensus. That's the main reason why 2 drastically different styles of redirect pages coexist on English Wiktionary. -- Huhu9001 (talk) 04:20, 2 December 2022 (UTC)


 * Thank you for bearing with me and hashing this all out. :)
 * Re: katakana forms, I am not a fan of creating these by default unless they are lexically important. If someone wants to bot-create these as soft-redirects, I suppose I would not be opposed.  That said, the katakana ↔ hiragana difference is very roughly analogous to the upper case ↔ lower case difference in bicameral alphabets – there is a one-to-one equivalency for each kana / letter.  We don't have upper-case entries for words like, even though that is a valid lemma form for the term (per the definition arrived at in this thread; while DOG exists, this has no sense line referring to a canine animal), and for similar reasons I don't think we need to have katakana entries for words like.
 * Happy to leave romanized entries out of any "lemma" categories. Still unsure of proper categorization for proposed entries like katakana, when we don't have analogous English entries like .  If your position is that  should exist and should be categorized as a lemma, then fine, I'll grant that  should be treated similarly.  But if conversely your position is that  should not exist and/or should not be categorized as a lemma, then I am unsure why we would treat  differently.
 * Re: the Wikipedia treatment of "Lemma" at w:Lemma (morphology), you mentioned that "The word form in your quote was clearly meant to refer to morphological forms, not orthographic forms." That isn't as clear to me, considering the subsection further down at w:Lemma (morphology).  My working understanding of the term "lemma" is closer to what is described in this section, where the focus is on "where the 'main' entry is located" in a dictionary.
 * By way of background, Japanese print dictionaries are indexed by pronunciation, and the headwords that are printed in that order are shown with the kanji spellings. There is no functional difference between looking up  or.
 * This particular word neko only has the one entry; a more illustrative example might be vs.  -- there are several homophones pronounced the same as, so a reader would look up the right page of the dictionary for this reading, and skim through the headwords to find the  headword and associated entry.  There is a small functional difference between looking up  and , but again the headwords are listed using the kanji spellings.
 * The key point is that Japanese dictionary entries exist at the intersection between the pronunciations (as represented by the kana spellings) and the kanji spellings.
 * The disjuncture between readings and headwords here at Wiktionary is caused by the technological shortcomings of our platform -- and hence this (at least my) confusion about what we treat as "lemma" here.
 * → I am happy to change my working definition to match this thread, as in something along the lines of "any uninflected form of a word, regardless of script, so long as it is written in a script used by native writers".
 * I'll skip the deduping comment, as that is addressed by the "what is a lemma" point above. :)
 * Re: lemma categories, I don't think I've used those for much either. I am happy to concede on this.
 * Re: ja-see vs. ja-def, in terms of template use, while both redirect the user to another entry, they are not used the same at all -- ja-def just creates an inline link to the Japanese entry for the provided string, while ja-see displays a reformatted summarization of the targeted entry with additional explanatory text.  ja-see is much more heavy-lifting, using Lua to look up the targeted entry and extract only certain relevant information.  ja-def was much simpler, using the older template architecture to just add a link to the Japanese entry for each of up to eight argument strings.
 * Incidentally, it's not clear to me why you Lua-ized ja-def. Isn't it more problematic to add Lua to this, increasing the Lua memory load?  I'm not sure that Lua is actually needed for anything this template does.
 * I'd be happy to have ja-see reworked to categorize differently.  Note that something similar would likely be needed for related templates  and.
 * Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 03:52, 6 December 2022 (UTC)
 * Apologies for the very belated reply.
 * Re: katakana forms, I would argue that all of カワイイ, CUTE and gebäude are indeed lemmas (i.e. that they satisfy the morphological constraints imposed on them) but I, of course, agree with you that the latter two are not worthy of inclusion despite being attested and despite having a slightly different nuance than cute and Gebäude (i.e. being emphasized and being sloppily typed on a computer, respectively). We regularly exclude lemmas for all sorts of reasons. However, I concede that there is a priori no difference to カワイイ and that it thus shouldn't be included. The only reason why I would still be okay with a redirect at カワイイ is because the MediaWiki software redirects the search queries CUTE and gebäude automatically but not カワイイ.
 * Re: Lemma (morphology), I think we will have to agree to disagree about our respective readings of the Wikipedia article but given that you said you're willing to change your working definition to the one presented in this thread, this is already all I could ask for anyway :)
 * Re: ja-def, I have Lua-ized it because there has been a case where there have been more than 8 arguments (can't find it right now). Just adding more slots seemed like a band-aid solution to me so I did it "the proper way". I have almost exclusively encountered this template on kana pages so I thought the increase in memory usage wouldn't make a difference. If it does, I can change it to a hybrid template that switches to module only after a certain number of arguments have been passed.
 * &mdash; Fytcha〈 T | L | C 〉 12:47, 6 January 2023 (UTC)
 * Thank you for the ping and the background, and also for slogging through this long thread with me. :)  Agreed on the above, and I'll be sure to ping you if I notice any memory impact with regard to ja-def.  Happy 2023!  ‑‑ Eiríkr Útlendi │Tala við mig 23:23, 6 January 2023 (UTC)
 * Sounds good. Happy New Year to you, too! :)&mdash; Fytcha〈 T | L | C 〉 08:11, 7 January 2023 (UTC)

I have now implemented this change:. &mdash; Fytcha〈 T | L | C 〉 13:00, 6 January 2023 (UTC)


 * Looks good! Binarystep (talk) 07:00, 10 January 2023 (UTC)

English pronunciation appendix fails to account for General American three-way back-vowel merger
Looking through the vowel table in Appendix:English pronunciation, the GA column lists the father/palm and not/boss vowels as, and the law/caught vowel as. However, this doesn't match with my experience of GenAm speakers, who, so far as I can tell, mostly use for all three (except for a few anomalous instances of persistent, mostly in foreign loans like  or ).

Should we update our English vowel chart to reflect the GA merger of all three non- open and open-mid back vowels to ? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:00, 24 November 2022 (UTC)


 * FWIW, I pronounce the initial vowels in father and not similarly, but distinct from either palm or boss. For me, the vowels in law and caught are close, but also still distinct.  ‑‑ Eiríkr Útlendi │Tala við mig 08:01, 24 November 2022 (UTC)
 * Whereas (as you might've guessed from the question) I pronounce all of those with the same vowel. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 08:17, 24 November 2022 (UTC)
 * Could you clarify how many total distinct vowels you have? Are you saying that father=not, palm=boss, and law=caught, but no two of these three sets are equal to each other? (Usually, "palm" and "boss" are grouped either with father=not (as /ɑ/) or law=caught (as /ɔ/), although not necessarily both with the same one.) --Urszag (talk) 08:37, 24 November 2022 (UTC)
 * I have eleven distinct vowels in total (if we lump R-colored vowels in with their non-R-colored versions): four front or near-front (/i/, /ɪ/, /ɛ/, and /æ/), one central (/ə/), and six back or near-back (/u/, /ʊ/, /o/, /ʌ/, /ɔ/, and /ɑ/). Of these, /o/ and /ɑ/ only show up in limited circumstances (in my 'lect, /o/ can only occur immediately prior to the approximants /l/ and /ɹ/ or as the onset of the diphthongs /ou/ and /oi/, while almost all cases of /ɑ/ are either immediately prior to /ɹ/ [a context which, in GA, as noted earlier this month, is allergic to /ɔ/ unless the two are separated by a syllable break] or in relatively-recent loanwords).  And for me, as regards their (main) vowels, father=not=palm=boss=law=caught=sorry=/ɔ/, while Nazi=bra=star=/ɑ/. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 11:07, 24 November 2022 (UTC)
 * Would you be able to give an elaboration similar to the above?--Urszag (talk) 03:14, 28 November 2022 (UTC)
 * I am not as well-versed in vowel IPA as I'd like, so I won't dive into that. While close, father ≠ not.  The "A" in father is more open, I think.  Also, palm ≠ boss, where the "L" renders the vowels quite distinct -- palm = balsam, for instance.  Meanwhile, my vowel for law is pretty "flat" or "straight", while for caught there's maybe a tiny bit of a diphthong.
 * To crib from Whoop whoop's post, I'd group the words as follows based on the vowel values for how I speak:
 * father = Nazi = bra = star = sorry
 * not
 * palm
 * boss = caught
 * law
 * ‑‑ Eiríkr Útlendi │Tala við mig 19:13, 28 November 2022 (UTC)
 * Thanks for the reply! That is definitely interesting. When you say that palm = balm and is different from boss because of the "L", do you mean that you pronounce palm, balm with a sequence of a vowel + the consonant /l/, as in ball, bald, Paul? There are more sets here than generally exist in North American phoneme inventories, so I suspect some of the distinctions you mention might be allophonic and conditioned by the surrounding sounds rather than phonemic. Another follow up question I am wondering about is whether differences you hear between father and not and between boss/caught and law might be based on the voicing of the following consonant: comparing nod, hop, block and broad, hawk, pause might shed light on that.--Urszag (talk) 19:56, 28 November 2022 (UTC)
 * For me, bomb and balm are contrastive: the "L" is definitely present, and causes me to start the vowel sound with my lips more rounded and with my tongue pulled further forward in the mouth. The difference is essentially the same as between ah and all.  FWIW, all and awl are identical for me.  For not vs. boss and caught, a better minimal pair here might be not and naught (where the latter vowel sound for me is the same as in boss and caught, and also bog for that matter).  ‑‑ Eiríkr Útlendi │Tala við mig 20:42, 28 November 2022 (UTC)


 * I was quite surprised to read further up on the page that you have /ɔ/ in father vs. /ɑ/ in bra. Descriptions or dictionary transcriptions of General American almost always use to represent the vowel in father, bra, bother, which have the same vowel phoneme for most General American English speakers. See the section "Unrounded LOT" in Phonological history of English open back vowels. ("Palm" is complicated for the separate reason that many American English speakers either restore the /l/, or have a pronunciation that is in some way affected by the presence of the letter L.) Retaining a rounded vowel in LOT seems to be a particular feature of the New England region (which would I think fit with your background); I guess that the word father, which in RP has an unrounded vowel that was irregularly lengthened to [ɑː], developed a rounded vowel in your accent by analogy with the much larger LOT set. I am from California and in my accent, there is no phonemic distinction at all between /ɒ/, /ɔ/ and /ɑ/; for me father=not=palm=boss=law=caught=Nazi=bra=star=sorry. It's possible that my merged vowel is phonetically [ɔ] before /l/, but I would certainly not use [ɔ] as the general transcription of the merged vowel. Its quality for me is much better represented by [ɑ], so I transcribe my phoneme in these words as /ɑ/. My impression has been that aside from other speakers with a three-way merger, who seem to be common and who I hear as having [ɑ] like myself, the other largest group in North America is speakers with a two-way distinction between [ɑ] in LOT and [ɒ] (conventionally transcribed as /ɔ/, although I believe it is often not really that raised, or even necessarily all that rounded) in CLOTH and THOUGHT. Although speakers with a two-way distinction are supposedly not uncommon, my personal experience has been that the only people I've encountered who say they make a distinction are older speakers like my grandparents or people I have communicated with online. 
"General American" is I think generally not defined either as having or as lacking the cot-caught merger: maintaining the distinction is a more conservative feature, but from what I can tell many American English speakers are poor at perceiving whether another American speaker has a low back merger or not, and there isn't much stigma attached to any of the various possible configurations of phonemes in this area (as opposed to some distinctive regional allophones, such as certain New York-associated pronunciations of the /ɔ/ phoneme). --Urszag (talk) 08:37, 24 November 2022 (UTC)
 * It looks like you also have a three-way /ɒ/-/ɔ/-/ɑ/ merger (and one that's more complete than my own, in fact), just with a different endpoint... which actually strengthens my point that Appendix:English pronunciation should in some way take note of the tendency of more-evolved-GA speakers towards a full three-way lowish-back-vowel-other-than-/ʌ/ merger. As regards my background, one, most descriptions of New England English that I've seen seem to focus mainly on the coastal non-rhotic vowel-merge-resistant 'lects, without much notice being taken of the apparently-less-conservative inland-New-England 'lects (the linked WP article is apparently unaware of the possibility of a full or near-full three-way low-back merger occurring with a rounded LOT vowel), and, two, before someone asks, I'm 99% certain that how I pronounce things isn't just from my family passing me an idiosyncratic way of pronouncing English, given that, so far as I can recall, my classmates in middle and high school also tended to pronounce things the way my family does.  (Interestingly, judging by, my accent is apparently actually closer to a rounded-LOT variation of Canadian English than it is to General American, at least as regards low back vowels, which is not a result I would've expected.) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 11:30, 24 November 2022 (UTC)


 * The result of the merger is /ɑ/, surely. Nicodene (talk) 23:18, 24 November 2022 (UTC)


 * I think I don't have the contrast that you're talking about between palm-not-law and bra, and am not sure I've noticed it in others or what phonetic difference the notation and  indicates exactly, though you mentioned the former vowel is rounded. Is it similar to the cardinal values (official strict phonetic pronunciation, like Polish o and Received Pronunciation father) or something else? (Forgive my skepticism that it's the cardinal value, because people tend to use phonetic symbols in a way highly conditioned by the way English is transcribed so I'm never sure.) Are there any sound files or videos that you can link to (with timestamps) that demonstrate the contrast? — Eru·tuon 22:45, 25 November 2022 (UTC)


 * Just noting for clarity, I moved "boss" (and "moth") out of the "not, wasp" line: independent of if we want to start mentioning the additional/alternate vowels words have after cot-caught-merging, "not, wasp" and "boss, moth" aren't in the same set (moth, at least, is in ; this can merge back to for speakers who merge caught to cot, but that merger changes multiple lines, so it's more clearly handled by noting as much on each line rather than making a frankenline, IMO). - -sche (discuss) 00:16, 26 November 2022 (UTC)


 * The mergers in question are the father-bother and cot-caught mergers, yes? But as others have said, the outcome of these is /ɑ/ not /ɔ/, except in dialects like Boston (and, outside GenAm, some varieties of Canadian) that have an /ɒ/ I could see someone interpreting as /ɔ/. I agree the appendix should mention the mergers, though enough varieties of GenAm don't merge cot-caught that IMO we're best off continuing to show the distinct vowels first, and mentioning the merger in a labelled way — if not in footnotes, then maybe (for e.g. law) " ɔ (with cot-caught merger: ɑ) ", like how entries put the distinct sounds first and the merged cot-caught sound next? - -sche (discuss) 07:30, 26 November 2022 (UTC)
 * I think that that makes sense. Tharthan (talk) 22:03, 26 November 2022 (UTC)
 * @-sche: The problem is, not all cot/caught-and-father/bother-merging dialects merge to ɑ; for those Northeastern and Canadian dialects that you mention (and that I evidently grew up in), the cot/caught and father/bother mergers keep things like law at /ɔ/ (which some of those dialects might well realize as [ɒ]) and, instead, merge most instances (the exceptions being where this is blocked by /ɹ/ plus in some more-recent loanwords) of /ɑ/ to /ɔ/ ([ɒ]?), which makes the proposed label (which pretty much flat-out states that all back-vowel-merging dialects merge /ɔ/ to /ɑ/ rather than the other way around) wrong. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:27, 27 November 2022 (UTC)
 * The Northeastern dialect that -sche mentioned was the Boston dialect, which traditionally does not have the father-bother merger.
 * I can believe you when you assert that certain accents in more central or western parts of Massachusetts that have both the cot-caught merger and the father-bother merger can end up with all three groups having a single, noticeably rounded vowel. But that definitely is not the case with the Boston dialect. The Boston dialect pronounces father as [ˈfaðə] and bother as [ˈbɒðə]. In the Boston dialect, a word like law also has [ɒ] because the Boston dialect does have the cot-caught merger. Tharthan (talk) 22:49, 27 November 2022 (UTC)
 * @Tharthan: Ah, that makes sense regarding the Boston accent's vowel-merger status. Regardless, there are still those inland-Northeast accents that merge all three vowels to /ɔ/ (or perhaps [ɒ]), which makes the " ɔ (with cot-caught merger: ɑ) " notation misleading (because of said inland-Northeast accents where the merger goes the other way). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 23:44, 27 November 2022 (UTC)


 * (e/c) As far as I understand, the conventional answer to that in linguistics literature, and what seems to be the current approach here, is that these dialects are distinct things, treated on separate lines like or  (etc, as applicable), distinct from GenAm. Of course, entries like thought vs cot vs caught are inconsistent about whether they are nested under GenAm yet (or labelled at all beyond the uselessly vague "US"), etc, but maybe if we can find / agree on which label(s) to use here, we can set about standardizing and fleshing out such entries. AFAICT, these dialects are generally analysed not as shifting /ɑ/ to /ɔ/ (or even necessarily having GenAm /ɔ/ at all), but as lowering caught to /ɒ/ (so that, incidentally, it sounds to speakers outside the dialects roughly like they have simply merged it into [how speakers outside the dialects would say] cot), and either also merging cot to that, or — since in some of these, like Inland North, not all speakers merge cot-caught — keeping cot distinct by fronting it to [kat]. Or, as in Boston, speakers may merge both cot and caught but still keep words like cart distinct by fronting it to [kat]. My guess is that this lowering to /ɒ/ might be what you have, too. (I've revised this comment a few times, trying to be neither oververbose nor so short it's curt or making statements without explaining them.) Brains already need to ignore a lot of variation in the sound of a vowel from speaker to speaker that just comes from voices differing, so we shockingly often also don't perceive when people make contrasts or mergers we don't, as long as we can resolve their speech into expected word-slots, i.e. unless their accents strongly differ or a particular collision creates confusion that context can't readily resolve. Hence, for example, this discussion led me to ask Americans I know whether they merge lot, thought etc, and I was surprised that some of them did and I hadn't registered it (even though I knew the merger existed). 
Likewise, it's probably how you were able to think most Americans merged these sounds. (In the past, it's what led Gilgamesh to argue most Americans merged bull and bowl like him, and it's what led Mahagaja to make the still-relevant advice that this is why it's important we look to linguistic literature that looks at formants, etc, rather than our own assessments, heh.) I suspect that if you have this regional /ɒ/, and know that the two nearest GenAm phonemes are /ɑ/ and /ɔ/, and you have a few words with canonical /ɑ/, that may be why you're interpreting your merged lot-thought vowel as /ɔ/ (even if you're comparing it to other GenAm varieties' /ɔ/, GenAm /ɔ/ is already closer to /ɒ/ than RP /ɔ/ is). - -sche (discuss) 00:31, 28 November 2022 (UTC)
 * (Palm moving to /ɔ/ for some people is probably influenced by /l/ having been present at some stage in their lect's history as Urszag says; balm and psalm can have the same /ɔ/, and salt, etc. Why father would have /ɔ/ or /ɒ/, I don't know; I'd love to see data on where that pronunciation occurs. Maybe absence of non-r-coloured/r-adjacent /ɑ/ in any(?) other words reduced the extent to which /ɑ/ was distinguished as a phoneme, until its reintroduction in learnèd or loan terms, which is where you say you have /ɑ/...?) - -sche (discuss) 02:13, 28 November 2022 (UTC)
 * The word "father" is recorded as having /ɔ/ in some accents, but I didn't know it occurred in North America. The varying vowel by dialect of "father" (/ɑ/, although common, is after all an irregular development) is in fact part of what led John Wells to choose PALM instead of FATHER as the keyword for this set; even though, in the context of North American English, I would say palm is certainly unsatisfactory. Wells mentions "if we are discussing Hiberno-English, [...] father often has not the expected aː of Armagh, Karachi, Java etc but the ɔː of THOUGHT" in his blog post "lexical sets" (John Wells’s phonetic blog, Monday, 1 February 2010).--Urszag (talk) 03:07, 28 November 2022 (UTC)
 * I am late to this discussion but I have a cot-caught distinction which is /ɑ/ vs. /ɒ/, and for me, palm has the /ɒ/ vowel without any /l/; similarly bomb /bɑm/ vs. balm /bɒm/. I grew up in Tucson but spent the first four years of my life in New Haven CT, which might explain this. Benwing2 (talk) 06:40, 3 December 2022 (UTC)

Gloss/category for non-religious Jewish terms
Arising from a chat with User:Jodi1729: we often gloss religious terms by the religion (e.g. "Christianity", "Hinduism", "Judaism"): however, in the case of Jewish people, there are a lot of very distinctive culturally Jewish terms that have got no religious connection, like, ,. Jodi and I thought that it might be appropriate to have some sort of "Jewish culture" gloss for these things. Opinions? Equinox ◑ 05:16, 25 November 2022 (UTC)


 * Agreed. 98.170.164.88 05:34, 25 November 2022 (UTC)
 * Makes sense to me. Binarystep (talk) 06:19, 25 November 2022 (UTC)
 * Disagree. It's better to be specific, since not all Jewish cultures speak Yiddish. Some better labels are Ashkenazi or Yiddishism. Ioaxxere (talk) 19:10, 25 November 2022 (UTC)


 * You're still in favour of a label, though? That's all I was saying. We can't put "Judaism" on because it isn't a religious term. But it's also a phrase that is definitely not used outside of Jewish/Yiddish-related communities. Equinox ◑ 19:20, 25 November 2022 (UTC)
 * Yep, I agree with all that Ioaxxere (talk) 19:28, 25 November 2022 (UTC)
 * Seems sensible. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:28, 27 November 2022 (UTC)

Join the Movement Charter Regional Conversation Hours

 * You can find this message translated into additional languages on Meta-wiki.

Hi all,

As most of you are aware, the Movement Charter Drafting Committee (MCDC) is currently collecting community feedback about three draft sections of the Movement Charter: Preamble, Values & Principles, and Roles & Responsibilities (intentions statement).

How can you participate and share your feedback?

The MCDC is looking forward to receiving all types of feedback in different languages from the community members across the Movement and Affiliates. You can participate in the following ways:


 * Attend the community conversation hours with MCDC members. Details about the regional community conversation hours are published here
 * Fill out a survey (optional and anonymous)
 * Share your thoughts and feedback on the Meta talk page
 * Share your thoughts and feedback on the MS Forum:
 * Preamble
 * Values & Principles
 * Roles & Responsibilities (statement of intent)
 * Send an email to: movementcharter@wikimedia.org if you have other feedback to the MCDC.

Please check the appointments of the Community consultation hours here and register for the meeting that suits your availability. The conversations will not be recorded, except for the section where participants are invited to share what they discussed in the breakout rooms. We will take notes and produce a summary report afterward.

If you want to learn more about the Movement Charter, its goals, why it matters and how it impacts your community, please watch the recording of the “Ask Me Anything about Movement Charter” sessions which took place earlier in November 2022.

Thank you for your participation.

On behalf of the Movement Charter Drafting Committee,  Mervat (WMF) (talk) 19:41, 27 November 2022 (UTC)

Updating nonlemma's documentation
I've edited the documentation of nonlemma as well as Etymology to reflect the way this template is typically used and non-lemmas are treated. I wanted to discourage the wholesale addition of nonlemma to entries that are strictly non-lemma forms; it strikes me as a misguided attempt to complete every entry when leaving the etymology section out of most non-lemma entries entirely suggests to me that they're fine the way they are. Ultimateria (talk) 05:06, 30 November 2022 (UTC)


 * I can get behind that. It has the same vibes as demoting alt forms to not include etymologies and the like. Vininn126 (talk) 17:18, 30 November 2022 (UTC)

/ol/
Should words like pole be notated /ol/ rather than /oʊl/ in GenAm? Some editors have said that something like this is the (allophonic / [narrow]) pronunciation they have, in a few discussions over the years, most recently Vininn and Whoop whoop here. Lately, I've seen it being added to entries, but in that case we need to update the appendix which AFAICT only recognizes /oʊ/+/l/. The main rationale against a switch that I can see is that outside of a few British dialects, it doesn't seem to be contrastive; the difference between row labs and roll abs is regular vs dark l rather than a change in the vowel, and the difference in the vowels of bone and bowl is perhaps not so great as to be phonemic. - -sche (discuss) 21:08, 29 November 2022 (UTC)
 * Row labs v. roll abs does involve a change in the first word's vowel, though; row labs has a diphthong before the /l/, whereas roll abs is monophthongal in that location. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:01, 6 December 2022 (UTC)


 * That would imply /o/ is a unique phoneme vs /oʊ/ which it isn't. My accent has even more extreme dark-L shenanigans, but I still notate "goal" [gɔɰ] as /gɐʉl/. Just put more realised pronunciations on pages, that's what I've been doing for cases like this where the phonemic vs phonetic pronunciation differs quite a bit. – Nixinova [T|C] 03:31, 30 November 2022 (UTC)


 * True. (Moving this to BP in hopes of more input; I initially didn't want to start yet another BP pronunciation discussion, but it needs input...) Unlike with or, where e.g. floor eight does contrast with both flaw rate and flow rate, supporting a distinction between /o/ and /ɔ/ and /oʊ/ before /ɹ/, I'm not sure if there's a phonemic contrast between /oʊ+l/ vs /ol/ (the distinction seems to reside more in the quality of l than o). - -sche (discuss) 22:15, 30 November 2022 (UTC)
 * For my accent, that difference is completely predictable as a matter of syllabification (sometimes also a difference in foot structure/secondary stress): "flow ring" vs. "flooring" is parallel to the differences I have between "key ring" and a hypothetical "keering", or pairs with /l/ like "slowly" [sloʊli] vs. "goalie" [goəɫi] or "gayly" [geɪli] and "gale-y" [geəɫi]. The pair of phonemes could be transcribed as /e/ and /o/ or /eɪ/ and /oʊ/, but I consider them just two phonemes, not four: I have no cases of tautosyllabic [oʊl] and [eɪl] so there is no contrast when syllable division is taken into account. Overall, I think my preference would be to use /o/ /e/ everywhere in phonemic transcriptions of American English.--Urszag (talk) 23:10, 30 November 2022 (UTC)
 * I wouldn't necessarily disagree (on just using /o/ and /e/ for flow, day, etc) if we switched to e.g. the /o̞/ that was suggested in a prior discussion for floor, because the sounds are different in minimal pairs like flowrate vs (some pronunciations of) fluorate. (An exhaustive search would probably find more minimal pairs.) - -sche (discuss) 22:51, 1 December 2022 (UTC)
 * That would be no better from my perspective: as I said, the reason I'd be for switching to using /o/ everywhere is because I consider them to be the same vowel phoneme. I have a phonetic difference between "flow-rate" and "fluorate", but I do not accept that as an example of a minimal pair for the vowel phoneme in the first syllable because for me the difference is entirely explained by the prosodic/syllabic structure of the word. They are like the pairs "night rate" and "nitrate" or "sea king" and "seeking", which are not pronounced completely identically despite having the same sequence of phonemes.--Urszag (talk) 01:05, 2 December 2022 (UTC)
 * I'd personally be in favor of using . In my speech they are sometimes monophthongal, sometimes closing or centering diphthongs as Urszag points out and are as well though not with the exact same patterns or degree of diphthongality. And I think it's true of some General American accents. Not sure if that is true of all General American accents, if they are influenced to some degree by a regional accent that does have strong diphthongs for, like a Southern accent. — Eru·tuon 03:25, 2 December 2022 (UTC)
 * I would personally argue for using /oʊ/ where the vowel in question is a diphthong (e.g., in Southern American accents, where "roll" actually is pronounced /ɹoʊl/, as if it were "row'l"), and /o/ where it's a monophthong (e.g., in GA itself, where "roll" is not pronounced with the "row" vowel, but, instead, with a monophthong, as /ɹol/). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:01, 6 December 2022 (UTC)


 * For example, User:Kwamikagami's been reverting to show an /oʊ/ diphthong in its pronunciation, claiming Merriam-Webster Online as authority for doing so.  The problem with this is that MW's GA pronunciations are considerably out of date and do not reflect how the word is actually pronounced in GA (hear included audio files for demonstration of same). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:54, 6 December 2022 (UTC)
 * I don't think there's a phonemic contrast between a monophthongal and diphthongal oh vowel. That was lost hundreds of years ago in mainstream English accents (like toe used to contrast with tow ). We shouldn't be transcribing the same phoneme two different ways based on allophonic differences; that can be put in phonetic [] transcriptions. Unfortunately we're already kind of transcribing an allophone by writing north-force with  rather than  in harmony with other cases of the vowel, but we don't need to add more inconsistencies.... I'd rather switch to  everywhere. — Eru·tuon 00:50, 8 December 2022 (UTC)


 * @Urszag: on one hand, the traditional use of that argument is to keep floor as /ɔ/, and to the extent we rejected the idea that the difference between floor eight and flaw rate was just allophony predictable from the syllable boundary, I'm not convinced about using it to dismiss the difference between flowrate vs fluorate... on the other hand, I concede the issue arises with all r-colored vowels (the precise vowels in /-ɑ.ɹ-/ and /-ɑɹ-/ also sound different, and as this discussion shows, so do the precise vowels used before same-morpheme /l/). I find conflicting evidence of whether, if a speaker produces the flowrate vowel where the fluorate vowel is expected, the difference is so large that it's heard as the other word / a clearly different phoneme and not merely an allophone: this writer says his father's unmerged hoarse vowel made moron (VCV) sound like a name Mo Ron rather than like moron, but apparently he couldn't hear the difference in hoarse vs horse (VCC) since he didn't understand his father saying those didn't sound the same. I suppose I could get behind representing both flowrate and fluorate and row and roll with /o/ in broad IPA as Erutuon and others have suggested, if we can agree to try and more routinely include narrow IPA showing the differences, and to continue showing the contrastive syllable breaks in these cases, not drop them as proposed recently. (But if people are disagreeing on whether the vowel in roll vs rolloff is the same vs different... well, hopefully scholarship can clarify.) 
Another interesting (not minimal, because they differ in number of syllables) pair is (where the entry currently says "/ˈsoʊɚ/") vs ; if we change to /o/ everywhere and don't mark syllable breaks, and hence change sewer to /soɚ/, it'll be confusing, given how many reference works treat /oɚ/ and /oɹ/ as interchangeable notations or notate one with the other: soɚ is the notation Merriam-Webster uses for monosyllabic sore, different from a sewing sewer, and we ourselves inconsistently notate e.g. the air sound interchangeably as /ɛɚ/, /ɛɹ/ or /ɛəɹ/ (or with other vowels than ɛ) in various entries. (So that's another example of the need to retain syllable breaks, /soɚ/.) - -sche (discuss) 20:36, 10 December 2022 (UTC)


 * Again, if you want to change how we transcribe words, get consensus to change the key first, and then change the articles. Our entry for roll, for example, shows a diphthong in GA, so your edit was inconsistent -- it would be very strange for GA to have a diphthong in roll but a monophthong in rolloff. I speak something very close to GA, and I also have a monophthong in roll, but a diphthong in rolloff. Your two pronunciations are lexically distinct for me, a verb phrase vs a noun. So, sources, plus consensus to change the key. kwami (talk) 07:08, 6 December 2022 (UTC)
 * @Kwamikagami: "Your two pronunciations are lexically distinct for me" - please explain how this is relevant? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:23, 6 December 2022 (UTC)
 * I don't understand the question. If the pronunciations of two words are different, then when transcribing them we should show them as being different. kwami (talk) 07:28, 6 December 2022 (UTC)
 * @Kwamikagami Which is what I've been doing. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:33, 6 December 2022 (UTC)
 * That doesn't clarify your question. kwami (talk) 07:56, 6 December 2022 (UTC)
 * You described the two pronunciations as "a verb phrase vs a noun", which doesn't make sense to me, given that, for me, both the verb phrase and the noun use the first, monophthongal pronunciation (the only difference between the verb phrase and the noun being the insertion of a hiatus between the two syllables in the verb phrase, and the absence of said hiatus in the noun). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 08:39, 6 December 2022 (UTC)
 * Yes, it's the hiatus that makes the difference. For me, the /l/ in the noun "rolloff" is ambisyllabic, and doesn't monophthongize the vowel, just as in the sound file at MW. In the verbal phrase, "roll" and "off" are separate words, and the /l/ does monophthongize the vowel. Or perhaps the vowel in rolloff is ambiguous, neither quite a monophthong as in roll nor a diphthong as in row, just as the syllabification is ambiguous. Either way, it would be weird to transcribe rolloff with a monophthong but roll with a diphthong. kwami (talk) 21:09, 7 December 2022 (UTC)


 * I've been trying to find sources for any of the positions people are taking here, since our personal assessments of how general Americans speak differ. So far, FWIW, I've found Matthew Gordon, in a section on the phonology of "New York, Philadelphia, and other northern cities" in Edgar Schneider, The Americas and the Caribbean (2008), page 81: "In some areas [of the Inland North], GOAT and GOAL appear with long monophthongs as they do in the Upper Midwest (see [...]) and Canada", whereas Bas Aarts, April McMahon, Lars Hinrichs, The Handbook of English Linguistics (2021), page 333, says "while the mid vowels /e/ and /o/ are indeed realized as monophthongs by some speakers of Northern British English dialects (Wells 1982; Watt 2002), these vowels are realized as diphthongs ([eɪ], [oʊ]) in American English (Labov et al. 2006)." I do see San Duanmu saying in Syllable Structure: The Limits of Variation (2009), page 185, that "The tense vowels [o] and [e] are monophthongs when they are followed by [l] or [ɚ], as in [oɚ] or, [gol] goal, [eɚ] air, and [pel] pale. They are often diphthongs in open syllables". (Conversely, Merriam-Webster mentions as an aside in their pronunciation guide that "in coastal South Carolina, Georgia, and Florida stressed \o\ is often monophthongal when final, but when a consonant follows it is often a diphthong moving from \o\ to \ə\".) I get the sense that while there may be a [narrow, phonetic] difference between the vowels of roll and row, which one has the mono- vs diphthong differs between the North and South, and they're the same /broad/ phoneme, for which /oʊ/ is the traditional notation. - -sche (discuss) 09:12, 6 December 2022 (UTC)
 * I don't have a problem transcribing it /oʊ/. Per Duanmu, I have [o] a monophthong in goal, or and air, but I wouldn't find [oʊ] to be confusing. Also, I have a diphthong in pale (quite audible, even sesquisyllabic, as happens with diphthongs before /l/). kwami (talk) 21:19, 7 December 2022 (UTC)
 * ...You have /o/ in ? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:20, 7 December 2022 (UTC)
 * Oops, sorry. I changed something after I drafted that and didn't iron out all the inconsistencies. I was referring to the claim that GA has monophthongs [o] or [e] in those words; I don't for pale but do for the rest. kwami (talk) 22:56, 7 December 2022 (UTC)
 * I'm surprised to hear that there're AmE speakers who don't have a diphthong or outright syllable break in ! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:18, 7 December 2022 (UTC)