Wiktionary talk:Language treatment/Discussions

WT:LTD redirects to this talk page, to ease using the aWa tool to archive discussions here. For more archived discussions, see the corresponding Wiktionary:-space page, Language treatment/Discussions.

Rename Old East Slavic to Old Russian (orv)
The term "Old East Slavic" must have been created to avoid offending Ukrainians and Belarusians and to please nationalists who use "Old Ukrainian" and "Old Belarusian", since "Old Russian" may imply to some that Belarusian/Ukrainian languages are derivations of Russian. However, "Old East Slavic" (Russian: ) translates into Belarusian as and into Ukrainian as  (alternative names also exist, including "Old Ukrainian"). All three terms are literally translated as "Old/Ancient Russian/Rusian", not "Old East Slavic". Note that in Ukrainian refers more specifically to Rus rather than modern Russia. (Unlike Russian and Belarusian, the Ukrainian term means "Russian" both referring to ethnicity and the country. Cf. Russian  /  and Belarusian  / ). Despite the current tensions between Russia and Ukraine, linguistically it makes much more sense to rename "Old East Slavic" to "Old Russian". Call me biased, whatever, but respectable East Slavic linguists all use "Old Russian", it's of the Ancient Rus, not modern Russia, the centre of which was in Kiev, modern Ukraine. :) --Anatoli (обсудить/вклад) 02:47, 12 March 2014 (UTC)
 * I don't think the names that modern speakers use should really count. They're bound to have some kind of national pride in them, like discussions on the Wikipedia article show. (And I'm hoping to avoid a straw man, but by the same logic we'd call Dutch "Netherlandish", German "Dutch", Greek "Hellenic", Armenian "Hayeric", Hittite "Neshite", and so on...) I think we should focus purely on what modern English-language scholarship calls the language. 03:23, 12 March 2014 (UTC)
 * Re: English language usage. Ngram, Ngram 2 can't even find "Old East Slavic (language)", not sure if I interpret this correctly or if it is possible to check the English usage via Google.
 * Being half-Russian, half-Ukrainian, I take pride in belonging to both but pride can take various ways. E.g. is sometimes translated by Ukrainians as  and some Ukrainian nationalists claim they have nothing to do with Russian but others take pride in being the cradle of all Slavs. We all know the Serbo-Croatian language story and Israeli-Palestinian conflict doesn't make Hebrew and Arabic belong to different language families. Modern Ukrainian, Russian, Belarusian are distinct languages but they are all derived from one, so your comparison is not valid.
 * In any case, Belarusians (educated), considering the name of the language, are not even trying to deny that Belarusian is derived from Old Russian (=Rusian, Old East Slavic), the language of Rus, in Ukrainian, the term is also used, without "Old", since  has this meaning ((modern) Russian language is called  in Ukrainian). Ukrainians also know that  is quite a new term, although there are some ridiculous claims about so-called "Ukr" tribes, derived directly from Proto-Slavic, bypassing Old Russian stage. I wonder what discussion you're referring to?
 * BTW, I disapprove the policy of Russian government and respect the right of Ukrainians to decide their fate but we are talking about language names and linguistics here. Bad timing for this discussion, considering the Ukrainian and Crimean crisis. --Anatoli (обсудить/вклад) 04:22, 12 March 2014 (UTC)
 * We should probably invite our Ukrainian editors to this discussion to get a broader picture. --Anatoli (обсудить/вклад) 04:49, 12 March 2014 (UTC)
 * I think this should be decided solely by the most commonly used name in English . Thus, I tentatively support "Old Russian" based on the Ngrams above, and based on a hunch that English speakers are less likely to have to look up the language if it is called "Old Russian" than if it is called "Old East Slavic". However, there needs to be a more thorough investigation into the evidence, since Ngrams are often inaccurate and/or skewed by unexpected factors. --WikiTiki89 04:55, 12 March 2014 (UTC)
 * Agreed but Google books count of "Old Russian language" is also significantly larger than "Old East Slavic language" (even though "Old Russian language" may not always have the same sense as "древнерусский язык"). I'm happy if someone else can analyse it better using Google or other sources. --Anatoli (обсудить/вклад) 05:02, 12 March 2014 (UTC)
 * Re: solely by the most commonly used name in English. We should also take into consideration the literal names in the affected languages. I think "Old East Slavic" is a result of some kind of political correctness or fairness, ignoring the fact that Ukrainian and Belarusian don't use this fairness in the name, also not note French "vieux russe", Estonian "vanavene keel", Dutch "Oudrussisch" and many Slavic names. The rest of names must have followed English Wikipedia in their namings. --Anatoli (обсудить/вклад) 05:21, 12 March 2014 (UTC)
 * I disagree, whatever the origin of its usage, if "Old East Slavic" became were shown to be used more commonly in English, then that is the name we should use. The only other factors we should consider are the ambiguity of the name (for example, if "That Language There" were the most commonly used term for some language, we probably still shouldn't use it), or possibly some other minor issues that I can't think of at the moment. --WikiTiki89 05:30, 12 March 2014 (UTC)
 * I meant it seems it was coined out of some considerations but I agree that if it were more common we should use it. I can't think of another example but in Japanese (Hanguru-go) "Hangeul language" was specifically coined, out of political correctness, to avoid giving preferences to either Korea or their languages (North and South Koreas have different names in East Asian languages, see  and ). --Anatoli (обсудить/вклад) 05:38, 12 March 2014 (UTC)
 * If you agree that if it were more common then we should use it, then isn't that the same thing as only relying on which is more common? --WikiTiki89 05:43, 12 March 2014 (UTC)
 * --Anatoli (обсудить/вклад) 05:25, 12 March 2014 (UTC),, anyone else is active?


 * It is not easy to get a sense of which name is most common in English. Chaff like "an old Russian car" skews counts of "Old East Slavic" vs "Old Russian", but if "language" is added, the number of hits drops so low that it is statistically insignificant / unreliable. However, based on manual review of a search with "in", I think "Old Russian" is the more common of the two names. - -sche (discuss) 06:25, 12 March 2014 (UTC)


 * On BGC, "Old Russian language" gets 4 relevant hits, 7 chaff hits of things like "thirty-three-year-old Russian-language specialist" or "the old [=venerable] Russian language", and 2 hits alongside "Old East Slavic language" in Library of Congress subject heading catalogues.
 * "Old East Slavic language" gets 3 relevant hits, 6 hits in LOC subject heading catalogues or printed editions of Wikipedia, and 1 hit that actually uses both terms: "the purely vernacular, Old Russian — or, more precisely, Old East Slavic — language".
 * "in Old East Slavic" gets 13 relevant hits, 2 in books that also use "Old Russian".
 * "in Old Russian" gets 32 clearly relevant hits, 25 clearly irrelevant hits (like "dressed in old Russian uniforms") and 13 hits where its not clear whether it means Old Russian or just Russian that is old, and that's just the first 7 pages: I stopped counting after that because it was clear at that point that "Old Russian" was more common than "Old East Slavic".
 * - -sche (discuss) 06:25, 12 March 2014 (UTC)


 * I support the rename to Old Russian per the following Google Scholar Data, showing that "Old Russian" is more than 10 times as common as all other names I could think of, combined. —Aɴɢʀ (talk) 19:20, 12 March 2014 (UTC)


 * "Old Russian" language – 13,200 hits
 * "Old Ukrainian" language – 847 hits
 * "Old East Slavic" language – 167 hits
 * "Old Belarusian" language – 139 hits
 * "Old Ruthenian" language – 96 hits
 * "Old Byelorussian" language – 38 hits
 * "Old East Slavonic" language – 26 hits
 * "Old Eastern Slavic" language – 9 hits
 * <tt>"Old Eastern Slavonic" language</tt> – 4 hits

“Old Russian” reflects systemic prejudices that go back to times of the empires, when Encyclopedia Britannica defined wrote about White Russians and Little Russians, some of whom were Ruthenians. These prejudices were still felt in academics and journalism when the Soviet Union broke up, and are only going away now. I don’t think Wiktionary is part of an establishment that tries to wilfully reinforces these backward practices.

Furthermore, the moment when the Russian regime is exploiting such ideas in its anti-Ukrainian propaganda is the shittiest possible time to start this discussion. —Michael Z. 2014-03-12 19:18 z 
 * I feel kind of the same about it. I can't motivate the choice with widespread usage, but I feel that "Old East Slavic" just is a better, more descriptive name for the language. So I oppose. 19:42, 12 March 2014 (UTC)


 * I agree about bad timing and with the criticism of the Russian regime and I'm very sad about the latest events - Putin, in a short period, has created a situation, which was unimaginable for hundred years - Russians fighting Ukrainians. I wish we didn't mix politics in here, though. The political split of Yugoslavia caused the artificial language split but we all know that Serbo-Croatian is one language, despite the tensions. Jewish and Arabic are still semitic languages and Ukrainian and Russian are still derived from the same source (after a big injection of Old Church Slavonic into Russian and Polish injection into Ukrainian). I'm not stating any Russian supremacy or support any name-calling. Regarding the names, I disagree that "White Russians" and "Little Russians" were a result of prejudice, just as "Белару́сь" (Бе́лая Русь) and "Малоро́ссия" are just historical words, names among others like "Great Rus", "Red Rus", "Black Rus", "Carpathian Rus". Nikolay Gogol, native of modern Ukraine, also used "Малороссия" when referring to his homeland. The "offensive" meaning (if "Малоро́ссия" ("Little Rus(sia)" sounds offensive to Ukrainians) was acquired mistakenly in the modern times. In any case,  originated on the territory of modern Ukraine by people who lived there and this is where all East Slavs originate from. I don't see or Ukrainians don't see anything offensive in terms "давньору́ська мо́ва" and "ру́ська мо́ва" when referring to Old East Slavic language but I won't insist on continuing this discussion, if it hurts someone's feelings. --Anatoli (обсудить/вклад) 02:10, 14 March 2014 (UTC)


 * Guys, call it Old Bulgarian and that's it. Though modern Bulgarian belongs to the southern branch, the early Bulgarian and Russian texts do show their being one language. Alexdubr (talk) 14:31, 17 March 2014 (UTC)
 * You are confusing two different languages. Old Bulgarian, which we call Old Church Slavonic, was also a South Slavic language, but it was used even in Russia/Ukraine/etc. as the main written and liturgical language. The main spoken language, however was Old Russian or Old East Slavic, which was an East Slavic language and the ancestor of modern Russian/Ukrainian/etc. --WikiTiki89 16:14, 17 March 2014 (UTC)


 * Result: not renamed; no consensus for renaming. - -sche (discuss) 02:16, 26 January 2015 (UTC)

Komi language
Merge Komi-Zyrian (kpv), Komi-Permyak (koi), Komi-Yodzyak (no code) with Komi (kv). Komi-Zyrian is dominating over others. Currently, they all use the same alphabet but there were differences in the past. There's very little information on the grammar differences but they are mostly considered dialects, not separate languages. Merging kv and kpv should be straightforward, Zyrian is the language of Komi Republic.-Anatoli (обсудить/вклад) 10:39, 6 April 2014 (UTC)


 * Actually, it has been planned as far as Komi-Zyrian and Komi-Permyak are concerned, see WT:LANGTREAT: "kv and kpv refer to the same lect; one will eventually be deleted.". Apparently, there are differences with Komi-Permyak but those can be labeled as "Permyak", similar Serbo-Croatian or Albanian varieties. --Anatoli (обсудить/вклад) 01:54, 7 April 2014 (UTC)


 * Added: Komi transliteration, Module:kv-translit, which handles both Zyrian and Permyak. --Anatoli (обсудить/вклад) 02:22, 7 April 2014 (UTC)


 * Komi-Permyak has a distinct written tradition and should not and cannot be merged. Your statement that Komi-Zyrian is dominating shows your pro-Russian and anti-minority POV which should never be a basis for consensus on this wiki. Komi-Zyrian can be handled under, yes. -- Liliana • 12:34, 7 April 2014 (UTC)


 * Do you have paranoia or something? I'm not Putin and not suppressing any minorities. Komi-Zyrian is a language of majority of Komi people in Komi Republic, has a much larger number of speakers and much more materials written in this variety of Komi. Komi-Permyak is used by a minority in Perm Krai and has only 63,000 speakers. By all means, I'm not forcing anyone to merge, this is only a suggestion. They are mutually comprehensible and each of them has dialects. It's possible to merge Komi varieties like it's possible to have one L2 header for Albanian, Norwegian or Serbo-Croatian, marking varieties accordingly: тӧлысь (Zyryan)/тӧлісь (Permyak), выль (Zyryan)/виль (Permyak). --Anatoli (обсудить/вклад) 12:58, 7 April 2014 (UTC)
 * That was completely rediculous and has no place on this wiki. It shows your anti-Russian bias more than Anatoli ever showed any pro-Russian bias. 13:00, 7 April 2014 (UTC)
 * Thanks. I can't help to have some Russian bias, though. This is my language, my culture. I can help with the Russian language and Russia-related topics, even if Russian may now be interesting perhaps as a "language of enemy" for some. I don't blame people for criticizing Russian politicians looking at Russia with suspicion. I don't support Russian politics and propaganda but I don't have to apologize for being Russian either. :) Certainly I shouldn't be blamed for being against minorities. Why would I add contents for minority languages, if I wanted to supress them? --Anatoli (обсудить/вклад) 13:19, 7 April 2014 (UTC)


 * There is one problem: as WT:DATACHECK reveals, "Komi-Zyrian language (kpv) has a canonical name that is not unique, it is also used by the code kv." Presumably one of the two needs to be retired (or, failing that, at least renamed). - -sche (discuss) 18:52, 2 May 2014 (UTC)
 * Komi-Zyrian is less ambiguous and clear. Although I favoured "Komi", perhaps we should retire it and leave Komi-Zyrian for kpv and kv and Komi-Permyak for koi. Komi-Zyrian is implied if Komi is used. --Anatoli (обсудить/вклад) 00:12, 3 May 2014 (UTC)
 * I switched about half of our kv entries to use kpv, and then I got distracted. I'll try to finish sometime soon. (See also User talk:-sche and Category talk:Komi language.) - -sche (discuss) 04:29, 27 May 2014 (UTC)


 * Everything that needed to be done seems to have been done. - -sche (discuss) 02:19, 26 January 2015 (UTC)

Sardinian languages
Category:Campidanese_Sardinian_language Category:Gallurese_Sardinian_language Category:Logudorese_Sardinian_language Category:Sassarese_Sardinian_language

merged into

Category:Sardinian_language

We are not supposed to treat dialects as independent languages. How are these not dialects? --Æ&#38;Œ (talk) 20:21, 6 May 2014 (UTC)


 * I support a merger of Campidanese (sro) and Logudorese (src) into Sardinian (sc), for reasons I outlined on my talk page and repeat here for others' benefit: those two lects differ from each other, quoth WP, "mostly in phonetics, which does not hamper intelligibility among the speakers". They are perhaps comparable to the dialects of Irish, with standard Sardinian (sc) existing as a unification of the dialects (again, comparable to Irish). Notably, we already include standard Sardinian (sc) — which means the additional inclusion of sro and src is quite schizophrenic.
 * There is some disagreement over whether Gallurese (sdn) and Sassarese (sdc) are dialects of Sardinian, dialects of Corsican, or languages separate from both Sardinian and Corsican. I would not merge them at this time. - -sche (discuss) 03:19, 12 May 2014 (UTC)


 * I've merged Campidanese (sro) and Logudorese (src) into Sardinian (sc). - -sche (discuss) 08:14, 17 August 2015 (UTC)

The Asu and Pare languages
(and anyone else who loves language-name conflicts), we currently call aum "Abewa" and asa "Asu". Firstly, it seems that practically nobody calls aum by that name; Wikipedia and Ethnologue both titles their entries for it "Asu", and Roger Blench, who may be the only person ever to study it, calls it "Asu" as well. As for asa, which currently occupies that name, Wikipedia and most hits on Google Books call it "Pare" (the remainder call it "Asu" or "Chasu", for the most part). As you may have guessed, there is already a language that we call "Pare", namely ppt, but this New Guinean language is also called "Akium-Pare" and (Wikipedia's choice) "Pa", which thankfully appears to be untaken. The chain of changing language names does seem rather silly, but the overall purpose of this is to move the only language out of these three that actually has any literature on it (and thus the one I just added a translation in), namely asa, to its most commonly used name. —Μετάknowledge discuss/deeds 05:40, 9 September 2015 (UTC)


 * I'm down with renaming aum away from Abewa.
 * On a balance, I'm also OK with renaming asa away from Asu. I can find a decent amount of references to "Asu": it's hard to say whether more or less than "Pare" because both terms turn up so much chaff. There is some documentation of its vocabulary available, incidentally; The Making of a Mixed Language: The Case of Ma'a/Mbugu by Maarten Mous mentions "Pare (Chasu)" muruke "sweat" and tika "lift" (in the context of their having been borrowed without change into Normal Mbugu, and then glottalized into Inner Mbugu muru'u and ti'i); Mbugu also borrowed Pare ku-kasha "hunt", Zigua ku-kala "to hunt", and Shambaa u-kalá "hunting" and ngwilizi "eagle" (source of Normal + Inner Mbugu ngwirizi, variation of l/r being dialectal in Shambaa; contrast Pare ngwirini). Isaria N. Kimambo's Political history of the Pare of Tanzania, c. 1500-1900 (1969) implies Pare and Asu are different: "Other naming procedures include the use of u- for territorial names, e.g. Upare for the Pare country; and ki- for language, e.g. Kipare for the Pare language. The only exception here is Chasu which refers to the Asu language." John D. Kesby's Rangi of Tanzania: an introduction to their culture (1981) seems to clarify, however: "in the northeast of Tanzania, [...] the people called Pare in Swahili refer to themselves as Asu". (And Maarten Mous provides a bit more detail, that Pare/Asu has at least two dialects, north and south, with the north one apparently also being called Vudee and several spelling variations thereof.)
 * As for ppt, some works refer to it as "Pari", e.g. The Abandoned Narcotic: Kava and Cultural Instability refers to "the Pari (Pa) language" (not to be confused with ). I guess we can rename it "Pa" for now, and switch to "Pari" later if something else called Pa comes up. - -sche (discuss) 05:48, 10 September 2015 (UTC)


 * Renamed as proposed: aum from Abewa to Asu, asa from Asu to Pare, ppt from Pare to Pa. - -sche (discuss) 00:09, 27 September 2015 (UTC)

Chidigo to Digo (dig)
From "Chidigo" to just "Digo", partly because we should try to purge prefixes from our language names where appropriate and partly because the latter name is vastly more used by linguists. —Μετάknowledge discuss/deeds 19:35, 5 September 2015 (UTC)


 * Support. - -sche (discuss) 16:55, 8 September 2015 (UTC)


 * Renamed. - -sche (discuss) 00:26, 27 September 2015 (UTC)

Pol Pomo
We currently call pmm "Pomo", which makes it sound like the Pomoan "language" which some people hypothesize exists, and for this reason I almost added a translation (with nested translations for Northern and Central Pomo) using the code in this way. I propose it be renamed "Pol" or "Pol Pomo", the name Wikipedia and the International Encyclopedia of Linguistics use. - -sche (discuss) 13:57, 19 June 2015 (UTC)
 * Damn, that's very rightfully confusing. Support "Pol", since that's what Wikipedia uses. —Μετάknowledge discuss/deeds 07:17, 11 August 2015 (UTC)


 * Renamed. - -sche (discuss) 22:25, 27 September 2015 (UTC)

Aramanik (aam), Aasax (aas)
The ISO merged aam "Aramanik" into aas "Aasax", saying [//www-01.sil.org/iso639-3/cr_files/2014-042.pdf here] "Aramanik [aam] is listed as a Southern Nilotic language of the Nandi group, presumably because the Aramanik people assimilated to the Nandi. The original Aramanik language was a Cushitic language (or a non-Nilotic language with heavy Cushitic overlay) usually called Aasax (Fleming 1969) and is already included in a separate Aasax [aas] entry. Maarten Mous, in A Grammar of Iraqw, also gives them as synonyms." - -sche (discuss) 22:11, 2 November 2015 (UTC)

Gallurese (sdn), Sassarese (sdc)
These are currently named "Gallurese Sardinian" and "Sassarese Sardinian", which has led to them sometimes being nested under Sardinian in translations tables, but this is erroneous because they are (transitional) dialects of Corsican spoken on Sardinia, not dialects of Sardinian. I propose to drop "Sardinian" from their names. See also WT:RFM. - -sche (discuss) 17:52, 17 August 2015 (UTC)
 * Or we could just merge them into . —Aɴɢʀ (talk) 19:09, 17 August 2015 (UTC)
 * We could. They're subject to the same LDL CFI whether we keep them independent or merge them into Corsican (as contrasted with dialects of Italian, for example, which would find themselves subject to much higher CFI if merged into <tt>it</tt>), and a merger would reduce duplication while we could still note differences with s and s... so perhaps we even should merge them. But they occupy grey areas. Gallurese is transitional between Corsican and Sardinian; Sassarese is transitional between Corsican, Sardinian and Tuscan (which I guess we consider <tt>it</tt>?). Ah, dialect continua... - -sche (discuss) 01:45, 18 August 2015 (UTC)


 * Definitely support dropping the "Sardinian" in their names; abstain on whether or not to merge them into co (but oppose any merger of anything into it). —Μετάknowledge discuss/deeds 20:50, 10 September 2015 (UTC)


 * Renamed. Not merged at this time. - -sche (discuss) 21:23, 19 October 2015 (UTC)

Merging cbk-zam (Zamboanga Chavacano) into cbk (Chavacano)
Pretty much every source I could find referred to Zamboangueño et al. as varieties or dialects, not languages. Two of them in particular make it very clear:

“The result of the study showed that while there are observable differences in certain language features beween and among the four variants, they are nonetheless, mutually intelligible with each other even among native speakers who do not have any special language training. Thus, for the purpose of this pilot study, al four variants were identified as dialects of PCS.” Sister María Isabelita O. Riego de Dios, A Pilot Study on the Dialects of Philippine Creole Spanish

“The two variants of PCS share enough distinctive differences from regular Spanish or regular Philippine usage that they must be considered historicaly related dialects of the same language” Charles O. Frake, Lexical Origins and Semantic Structure in Philippine Creole Spanish

— Ungoliant (falai) 07:19, 15 March 2014 (UTC)


 * Support. - -sche (discuss) 08:22, 2 November 2015 (UTC)
 * ✅. - -sche (discuss) 08:57, 1 February 2016 (UTC)

Kiyaka language
There are two problems here: First, we currently have the code  referred to a language we call Kiyaka, but which Wikipedia calls Yaka (without the noun class prefix) and which authors on Google Books seem to agree to call Yaka as well. There are a couple other languages sometimes called Yaka, but fortunately they all have other names that are more common and therefore there is no conflict, and the principal name of  should therefore be modified; there are very few categories associated with this one, so it should be easy to change. Secondly, the Wikipedia article states that the codes,  , and   refer to its dialects, but Glottolog seems to consider them separate languages (possibly just following the ISO rather than actually making a judgement). Therefore we should consider merging these codes, if in fact there are not enough differences (some data would be helpful). —Μετάknowledge discuss/deeds 02:16, 18 June 2015 (UTC)


 * axk currently goes by "Yaka"; if yaf were renamed "Yaka", yaf would need to use a disambiguator or axk would need to be renamed, but to what? axk&apos;s alt name of "Aka" is taken (by soh), and I haven't offhand found evidence that anyone calls it by its alt name "Beka".
 * Ethnologue, although it grants Ngoongo a separate code (noq), labels it a dialect of Yaka in its entry on Yaka. I can find a German reference stating "Eng mit den Yaka verwandt sind die Lonzo, Pelende und Suku." ("Closely related to the Yaka are the Lonzo, Pelende and Suku", the last of which WP and Ethnologue consider to speak a separate language.) A French reference says "Les Pelende ont un accent linguistique propre, mais ils s'entendent avec les Yaka, Suku, Lonzo, Luwa, Hungana, Tsamba, Ngongo, Mbala et Kongo." ("The Pelendes have their own linguistic accent, but they get along with / can understand the Yaka, Suku, Lonzo, Luwa, Hungana, Tsamba, Ngongo, Mbala and Kongo.") Another says "Le kipelênde comme le kiyaka est un dialecte du kikongo commun. Plus répandu, «Le kiyaka comprend quelques neuf dialectes distincts, présentant parfois des variantes assez considérables.»" ("The Kipelênde like Kiyaka is a dialect of the common Kikongo. More widespread, "The Kiyaka includes some nine separate dialects, sometimes with quite considerable variations.")
 * I can find a small Yaka corpus, but not any comparison of the different [might-be-]dialects.
 * A conservative approach might leave the codes separate until such time as someone comes along with words in them. - -sche (discuss) 16:20, 18 June 2015 (UTC)
 * Hmm, re names: soh is also called (Jebel) Silak, or (Jebel) Sillok; Wikipedia uses the name Sillok, but I'm not finding many resources to assess how common that is (and some refer it by a hyphenated string of dialects). If we can move soh (and perhaps should anyway), then we could move the rest down without disambiguating (so axk would be Aka, and yaf would be Yaka). —Μετάknowledge discuss/deeds 23:22, 18 June 2015 (UTC)
 * I sometimes wonder if we should start preferring disambiguators to alt names: they'd be annoying to type, but I think the mere fact that we're discussing a three-link chain of renames (language A takes B's name, B takes C's name, C takes D's name) shows how much clearer they'd be. I pity the new user who e.g. adds aja content under an ==Adja== header, and I pity the veteran user who has to notice that that has happened. In this case, I can't find evidence of soh being called Sillok, but the people who speak it and the place they live are called Sillok, so at least it wouldn't be unclear. Perhaps we could just rename axk and soh to have disambiguators, though: "Aka (Congo)" and "Aka (Sudan)". - -sche (discuss) 02:02, 19 June 2015 (UTC)


 * I suggest renaming axk and soh to "Aka (Central Africa)" and "Aka (Sudan)", and then renaming yaf as originally proposed. - -sche (discuss) 06:52, 4 July 2015 (UTC)
 * Renamed as proposed. What to do with the [might-be-]dialects remains to be determined. - -sche (discuss) 03:47, 12 July 2015 (UTC)
 * I've generally followed the guideline that we avoid such parenthetical geographic locators; were we to use them in general, it would change a great deal of our names. I know few others care, but perhaps we ought to put this to the community at large in the BP? —Μετάknowledge discuss/deeds 19:56, 28 July 2015 (UTC)
 * In this specific case, I think there is compelling reason to deviate from the general practice/guideline of preferring alt names to parentheticals even without changing that guideline: in order to use alt names here, we'd have to chain-rename ≥3 languages such that the name each one most often went by was assigned to a different one, and one of them would end up with an unattested name, which would all be extremely confusing. As for whether/how to change the general guideline: I'll think the matter through more thoroughly before I post anything in the BP. I don't think I'd propose switching to parentheticals in all cases (I think, for instance, that Pyu/Tircul and Riang/Reang use different scripts and so are unlikely to be mixed up). I would only prefer parentheticals where people would be likely to mix up which language was meant by a given name, and where the mix-up would be likely to go unnoticed (e.g. because the script was the same). - -sche (discuss) 21:23, 28 July 2015 (UTC)
 * Convenient links to previous discussions:
 * Beer parlour/2012/August
 * Beer_parlour/2013/November
 * --WikiTiki89 21:38, 28 July 2015 (UTC)

The Tonga languages
There are far more Tonga languages than anyone would want to have to deal with, but I am excluding all but two of them in this discussion for the sake of ease. We currently call toi and tog "Tonga" and "Chitonga", but both languages are called by both names, and use of alternative names seems to be vanishingly rare. Our current system is leaving me (and at least one person who tried to give a translation in one of the languages) thoroughly confused, so as much as I find them ungainly, I'd much rather we use parenthetical geographic identifiers than have to go through this madness. (Pinging as usual (and you ought to take a look at the other ones I've posted recently on this page when you get a chance, if you are so inclined.) —Μετάknowledge discuss/deeds 05:28, 20 September 2015 (UTC)
 * I'm all in favor of parenthetical disambiguators, but what should they be? Wikipedia calls toi and tog, but that seems suboptimal to me since the parentheticals aren't parallel. Ethnologue suggests the majority of toi speakers are in Zambia and all tog speakers are in Malawi, so how about "Tonga (Zambia)" and "Tonga (Malawi)"? —Aɴɢʀ (talk) 13:25, 20 September 2015 (UTC)
 * Support a rename of toi to "Tonga (Zambia)" and of tog to "Tonga (Malawi)". While we're at it, I think we prefer (do we?) to drop "ki-", "chi-", "gi-" and such African language-name prefixes, so toh could be renamed from "Gitonga" to "Tonga (Mozambique)". Happily, to is distinct as Tongan, and we don't have tnz yet, but it seems to be consistently called "Ten'edn" or "Maniq" (the latter being properly an ethnonym) by its speakers, who SIL says are totally unfamiliar with "Tonga" as the name of a language. - -sche (discuss) 23:50, 26 September 2015 (UTC)
 * I don't know if we've talked about it before, but I think in general we should avoid language-name prefixes. However, there are some exceptions; I prefer "Luganda" to "Ganda", for example, because it's far more commonly used. I'm fine with renaming Gitonga as you suggest. By the way, thanks for dealing with some of these language issues; Kikuyu and Rwanda-Rundi are still lingering on this page, so please give them some love/research when you have a chance. —Μετάknowledge discuss/deeds 04:07, 27 September 2015 (UTC)


 * ✅. - -sche (discuss) 10:08, 18 February 2016 (UTC)

Wolof
Gambian Wolof (wof) should be merged into Wolof (wo), IMO. Ethnologue says "Senegalese Wolof [wol] intelligible by speakers of Gambian Wolof but with significant enough differences to require adaptation of materials", which seems to have been their motive in splitting these lects (like so many others). - -sche (discuss) 06:51, 25 February 2016 (UTC)
 * Support. We can make Gambian Wolof a regional dialect of Wolof and tag relevant words wo. —Aɴɢʀ (talk) 08:38, 25 February 2016 (UTC)


 * Support. Good catch. —Μετάknowledge discuss/deeds 05:57, 27 February 2016 (UTC)
 * ✅ (no entries used the code). - -sche (discuss) 05:11, 29 February 2016 (UTC)

Raga or Hano
It seems to be more common to call this language "Raga", as Wikipedia does. Compare e.g. (several books mentioning the language) vs  (no relevant hits). - -sche (discuss) 09:11, 27 February 2016 (UTC)
 * That's a good way to demonstrate commonness of use. Support renaming to Raga. —Μετάknowledge discuss/deeds 01:44, 4 March 2016 (UTC)


 * Renamed. Compare also vs the same with "Hano". - -sche (discuss) 07:23, 22 March 2016 (UTC)

Sauk or Ma Manda
Currently called "Sauk", but Wikipedia, SIL publications on the language, and work by Alexandra Aikhenvald all call it "Ma Manda". A happy side effect of the move would be that we could add "Sauk" as an alias for sac, as it is probably the second most common name for that language. —Μετάknowledge discuss/deeds 06:34, 3 November 2015 (UTC)
 * Renamed per nom. Sure enough, the only entries with "Sauk" translations are referring to sac. - -sche (discuss) 09:07, 27 February 2016 (UTC)

Waray-Waray and Warray
Our current situation is to call war "Waray-Waray" and wrz "Waray"; this is not necessarily an optimal solution. Wikipedia chooses to call war "Waray" and wrz "Warray"; although "Warray" is less common than "Waray" to refer to wrz (as far as I can tell), this gives the commonest name of war to that language, which probably deserves priority due to being much more studied. At Template talk:war, you can see that the idea to rename war to "Winaray" was rejected and Liliana's choice of "Waray-Waray" won out. However, it's clear that our current situation has caused some confusion (User:DTLHS/cleanup/mismatched translation codes shows a lot of misuse of wrz when war was intended). Basically, what we have now isn't bad, but the fact is that it's resulted in mismatched codes, so we might want to try a different approach. —Μετάknowledge discuss/deeds 20:46, 14 September 2015 (UTC)
 * I'd prefer minimizing ambiguity by calling war "Waray-Waray" and wrz "Warray" so that no language at all is called by the ambiguous name "Waray". —Aɴɢʀ (talk) 14:56, 15 September 2015 (UTC)
 * To be fair I think most of the mistakes were caused when the language was renamed but the translations weren't edited, not by whoever added them in the first place. DTLHS (talk) 23:49, 26 September 2015 (UTC)
 * I've renamed wrz in the manner Angr suggested. - -sche (discuss) 09:04, 27 February 2016 (UTC)

Sura or Mwaghavul
The name "Sura" is ambiguous (as noted on Talk:am), and the name "Mwaghavul" seems to be more common (compare, ) — and is, in any case, quite common — so I propose to rename sur from "Sura" to "Mwaghavul". - -sche (discuss) 20:25, 23 August 2015 (UTC)
 * Support. —Μετάknowledge discuss/deeds 04:40, 31 August 2015 (UTC)
 * Renamed. - -sche (discuss) 05:06, 28 February 2016 (UTC)

Tingal and Tegali
In 2011, the ISO retired the code for Tingal [tie], merging it into Tegali [ras]. I think we should follow suit. notes that there is dialectal variation in Tegali, but it's not between Tegali proper and Tingal, it is rather between Tegali proper and Rashad (but even those dialects are "nearly identical"). - -sche (discuss) 06:15, 11 August 2015 (UTC)
 * ✅. - -sche (discuss) 03:26, 29 February 2016 (UTC)

Batak or Palawan Batak
-sche has pointed out that we call this language "Batak"; that is an awful idea, due to the existence of the. To reduce confusion, we should do as Wikipedia does, and call it "Palawan Batak". —Μετάknowledge discuss/deeds 06:18, 29 February 2016 (UTC)


 * @Μετάknowledge: I support this renaming. — I.S.M.E.T.A. 01:30, 4 March 2016 (UTC)
 * Support, naturally. "Palawan Batak" still sounds like it might be a Batak language, but at least it stops people from seeing one of the Batak languages and entering it into Wiktionary as bya (compare the recent rename of "Pomo" to "Pol Pomo"). - -sche (discuss) 02:03, 4 March 2016 (UTC)
 * ✅. - -sche (discuss) 07:09, 14 April 2016 (UTC)

Shuadit
Shuadit (sdt) aka Judeo-Occitan or Judeo-Provençal is most definitely not an independent language. The literature seems to be quite sure on this point: Banitt refers to it as a "langue fantôme", Vouland as a "non-langue", and Alessio as a "langue imaginaire", to quote three especially scathing francophone scholars. One main scholar is Szajkowski, who seems to have made up a great deal about it (including the name Shuadit, of which there is no evidence of use) and who "was no linguist, and his knowledge of Occitan was quite poor" (Strich and Jochnowitz). Moreover, the so-called last speaker, and a chief primary source,, was evidently a semi-speaker who was not actually fluent in the "language". Glottolog sums all this up by saying: "This entry [Shuadit] is spurious. This means either that the language denoted cannot be asserted to be/have been a language distinct from all others, or that the language denoted is covered in another entry." To the extent that anyone wants to enter the paltry Hebrew-script text, it can be done as Old Provençal or Occitan, depending on how old it is. —Μετάknowledge discuss/deeds 06:33, 14 April 2016 (UTC)


 * Good to know. Is there any evidence that it was a dialect, or was it basically just a script variant like Judeo‐French? -- Romanophile ♞ (contributions) 06:41, 14 April 2016 (UTC)
 * Pretty much just a script variant. That was a popular thing to do, because using the Latin script was seen as too associated with the Church and distant from the Jewish educational tradition. There have been many other isolated incidences of languages like Urdu and Samogitian being written in Hebrew script as well. —Μετάknowledge discuss/deeds 06:45, 14 April 2016 (UTC)


 * do you think that much of Judaeo‐Romance contains only superficial differences? Most of them are considered ‘extinct’ by Wikipedia, bearing Ladino and Judeo‐Italian. Judaeo‐Italian notwithstanding, Ladino appears to be the language of Latin Jews around the world. -- Romanophile ♞ (contributions) 07:13, 14 April 2016 (UTC)


 * I think so, but I'd like to read through the chapter on each in the Handbook and consult some other sources before deciding. I obviously like Jewish languages, but I think that the line between language and dialect is being abused, and that we should find better ways to document these. —Μετάknowledge discuss/deeds 07:20, 14 April 2016 (UTC)


 * Merge into (Old) Provençal / Occitan per nom. - -sche (discuss) 03:23, 19 April 2016 (UTC)
 * Merged. - -sche (discuss) 02:56, 28 May 2016 (UTC)

Zarphatic
This name is used rather uncommonly, and really only by Jewish philologists, not mainstream linguists. It should be changed to the much more common "Judeo-French".

Unrelated to the name change, I'm not fully sure that we should have this as a separate language. Kiwitt and Dörr (2015) say: "It should be noted, however, that the major part of linguistic data attested in Judeo-French sources is simply common Old French written in Hebrew script, with some texts showing little to no register variation in comparison with Christian Old French sources." They go on to discuss one extensive text, a biblical glossary, where only 6% of words were not attested in Christian Old French texts. Basically, this is similar to Hindi and Urdu — is it worth keeping separate? (And if you think it'd be strange to have Hebrew-script entries under ==Old French==, remember that we have Arabic-script entries under ==Afrikaans==.) —Μετάknowledge discuss/deeds 04:52, 13 April 2016 (UTC)


 * Interesting. I’d be fine with a merge or a rename. Script variants and dialectisms can simply be marked with Judeo-French. I would love to work on this dialect, but I have no idea where to find any texts. -- Romanophile ♞ (contributions) 05:08, 13 April 2016 (UTC)


 * Before commenting here, I was planning to read the Judeo-French chapter of my new copy of the Handbook of Jewish Languages, but I now realize that Metaknowledge also recently acquired this book and has probably already this chapter and probably only started this discussion because of that. So I'll just assume that his conclusion is the same that I would have drawn and say that I agree that this should be merged with Old French. --WikiTiki89 21:10, 13 April 2016 (UTC)


 * Precisely. I intend to work through the entire book, improving how we cover Jewish languages. —Μετάknowledge discuss/deeds 01:10, 14 April 2016 (UTC)


 * Rename per nom; see also [//books.google.com/ngrams/graph?content=Zarphatic%2CJudeo-French%2C&year_start=1900&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t1%3B%2CZarphatic%3B%2Cc0%3B.t1%3B%2CJudeo%20-%20French%3B%2Cc0 ngrams]. Also merge per nom; perusing the references that turn up if I just search for references that mention both lects, I find that they agree:
 * Raphael Patai, Encyclopedia of Jewish Folklore and Traditions (2015, ISBN 1317471717, page 316: "Judeo-French is Medieval (Old) French as spoken and written by French and Rhenish Jews. It differs from the other “Judeo” languages in that there were no dialectal differences between it and the Old French spoken by the non-Jews[.]"
 * Aaron D. Rubin, ‎Lily Kahn, Handbook of Jewish Languages (2015, ISBN 9004297359, page 139: "However, this term does not imply the existence of a set of linguistic features common to these sources that would allow identifying a 'Judeo-French' language or dialect distinct from the varieties of Old French encountered in Christian sources."
 * Yes, this will result in Hebrew-script Old French (alternative-form-of) entries, and that is OK. - -sche (discuss) 00:16, 14 April 2016 (UTC)


 * A note: I do think we should allow this code to be used for etymology-only uses, however. —Μετάknowledge discuss/deeds 01:10, 14 April 2016 (UTC)
 * If I remember correctly, a number of Old French words are attested first in Rashi's writings, and those writings been used in recent years by scholars of Old French to fill in gaps in knowledge of other aspects of the language. I'm sure some of our etymologies already include those Zarphatic words as Old French, so we might as well make it official. I agree we should both rename the lect and merge it with Old French. Chuck Entz (talk) 02:29, 14 April 2016 (UTC)


 * Merged. - -sche (discuss) 03:01, 28 May 2016 (UTC)

Kichwa
Hello everyone. I'm curious as to how one would go about requesting an exceptional code for the standard Kichwa language? There are several SIL/Ethnologue codes for various Kichwa dialects (Imbabura (qvi), Chimborazo (qug), Cañar Highland (qxr), etc), but not one is for the standard Kichwa language that's taught in schools and used by the government in Ecuador. There is a common Quechua code (qu) is currently used for Quechua Wiktionary and Wikipedia, but both projects are exclusively written in standard Southern Quechua. Both Kichwa and Southern Quechua are part of the Quechua II branch of the Quechuan languages, but they are different dialects with different standardized grammars and different standardized writing conventions. --Dijan (talk) 18:28, 12 May 2014 (UTC)


 * Hmm... how mutually intelligible are Kichwa and Quechua? It wasn't that long ago that the various Quechua dialects' codes were removed from Module:languages, though I'm having trouble locating the discussion(s) that led to that.
 * If Kichwa is to be included, we could either (1) pick the code of one of the Kichwa dialects and use it for all of Kichwa (the way we use gcf for all of Antillean Creole), or (2) design our own code, like qwe-kch, according to the system outlined in WT:LANGCODE, point 3.3. - -sche (discuss) 04:25, 27 May 2014 (UTC)


 * They are mutually intelligible for the most part (similar to differences between Danish and Swedish), but they are also significantly different from each other to be considered separate. Linguistically, Kichwa belongs to a different subgroup of Quechua. The grammar of Kichwa is more simplified (loss of possissive suffixes, loss of the voiceless uvular fricative, etc) and the vocabulary is affected by native languages spoken before the Incan conquest of the territory of today's Ecuador and Colombia (meaning, Quechua was imposed as a foreign language, whereas it is a native language in the regions where Southern Quechua is spoken - in southern Peru and Bolivia).
 * I was referring to designing our own code and using that as an umbrella for all the Kichwa varieties - which now all use a standardized alphabet different from the Peruvian varieties (such as Southern Quechua), but I couldn't find the procedure for it.
 * There was an attempt to create a separate Kichwa Wikipedia, but apparently no one got around to it and it got complicated as somone pointed out that an official ISO code must be requested specifically for the standard variety. And for some the problem was that it was trying to use the Chimborazo (qug) code (which is one of the most widely spoken varieties of Kichwa). Apparently the issuing of codes is very strict on Wikipedia. --Dijan (talk) 20:57, 27 May 2014 (UTC)


 * , what are your thoughts on this? The Quechua dialects themselves were merged into qu, though I can't find the discussion that led to that. - -sche (discuss) 03:08, 29 February 2016 (UTC)


 * If none of the ISO codes cover this we should create our own code. The languages seem different enough that they warrant separate treatment, at least. -- Liliana • 21:43, 29 February 2016 (UTC)


 * OK, I have added qwe-kch "Kichwa"., I'm sorry neither I nor anyone else acted upon this for so long. The contents of Category:Standard Kichwa will presumably need to be re-headered and moved into Category:Kichwa language. - -sche (discuss) 22:52, 29 February 2016 (UTC)

Persian, Dari, etc
Although LANGTREAT notes that they should be subsumed into fa, the following codes still exist in Module:languages: As indicated above, my opinion is that we should merge all of those codes into fa. Incidentally, LANGTREAT originally also banned Tajik (tg), but this was not supported by scholarship or by our own practice (we had hundreds of Tajik entries), so after two discussions, I updated the page to note that Tajik is allowed. LANGTREAT made no mention of Judeo-Persian (jpr), Bukhari (bhh), Judeo-Tat (jdt) or Tat (ttt), and past discussions of them have assumed they were separate languages, so I also updated the page to reflect that. - -sche (discuss) 05:16, 20 November 2013 (UTC) (fixed think-o)
 * pes, for "Western Persian", the most common variety of Persian. We should merge it into fa because it *is* fa.
 * prs, Eastern Persian / Dari, the variety of Persian spoken in Iran Afghanistan . Its status as a separate language, and its very name 'Dari', were products of Afghanistani politics. Not even Afghani speakers of the language call it 'Dari' or consider it separate from Persian; we shouldn't consider it separate, either.
 * aiq, Aimaq, a variety spoken by nomads in Afghanistan and Iran. It is sometimes considered specifically a variety of Dari (which is itself little more than another name for Persian, as explained). It differs from standard Persian mainly in matters of pronunciation, something we usually handle with rather than separate L2s.
 * haz, Hazaragi, another Afghan variety. WP summarizes scholarly opinion (with citations, for which look here): "The primary differences between Standard Persian and Hazaragi are the accent and Hazaragi's greater array of Mongolic loanwords. Despite these differences, Hazaragi is mutually intelligible with other regional Persian dialects."
 * deh and phv, Dehwari and Pahlavani, which it is hard to find information on because even WP simply redirects the words to "Persian".
 * The Persian lects are an interesting issue; they are on the whole pretty similar, but Persian and Tajik have separate literary and cultural traditions, and I believe Dari does too. I think it is best to keep them separate. The Jewish varieties are often written in the Hebrew script and have a separate cultural tradition, so I think it would probably be handy to keep them separate as well. All the rest probably ought to be merged into their macrolanguages, unless there are script conflicts I'm unaware of (it is, of course, easier to keep with one script per language). —Μετάknowledge discuss/deeds 05:23, 20 November 2013 (UTC)


 * The difference between Dari and Persian is not great AFAIK, there are some references on Wikipedia. Tajik should stay separate, not just because it's in Cyrillic. It's very different from Persian and has many Russian and Turkic loanwords. There is also a significant difference in phonology (vowels). Persian e, o and â are usually i, u and o in Tajik. may be able to say a bit more. --Anatoli (обсудить/вклад) 05:36, 20 November 2013 (UTC)


 * We also currently include "Parsi" (prp) and "Parsi-Dari" (prd), which Wikipedia suggests are spurious(!). - -sche (discuss) 05:58, 3 December 2013 (UTC)


 * Regarding Dari, to merge Dari into Persian, we would have to figure out what to do with transliterations. In standard Persian, ē merged with ī into what we transliterate as i (no diacritic) and ō merged with ū into what we transliterate as u. In Dari, ē and ī and ō and ū are still differentiated. Furthing complicating the problem, standard Persian e, o, ey, and ow are pronounced i, o, ay and aw in Dari, although these differences are not phonemic. The other differences are not as much of a problem, but see a brief description at Dari. --WikiTiki89 14:37, 4 December 2013 (UTC)

The followed was copied here by User:-sche from User talk:Dijan: Hi, Do you think Dari can be merged with Persian? See Requests_for_moves,_mergers_and_splits. can still be used. CC. --Anatoli (обсудить/вклад) 04:56, 4 December 2013 (UTC)
 * That has been the practice on Wiktionary. Dari should be under the Persian heading with label. That is what we do already. Take a look at فاکولته, for example. Regarding the transliteration of long vowels,  has been trying to implement a classical Persian transliteration (which is pretty much used for Dari by scholars) as a standard for all Persian entries. So far, it's been a slow and selective process. I'm not opposed to it and it can easily be indicated as the standard Wiktionary practice in the Appendix:Persian transliteration. --Dijan (talk) 06:35, 5 December 2013 (UTC)
 * Thanks. Do you mind posting your answer on Requests_for_moves,_mergers_and_splits and perhaps address Persian/Dari vowel differences raised? Where can I look at the classical transliteration? --Anatoli (обсудить/вклад) 00:18, 6 December 2013 (UTC)
 * By the way, if you happen to know anything about Parsi or Parsi-Dari (and about their relationship to Farsi and Dari), your input on that subject would also be appreciated over in the same section. :) - -sche (discuss) 00:53, 6 December 2013 (UTC)


 * I support keeping Dari (and Western Persian) under Persian heading, but I'm not sure what to do about transliteration. By the way, Eastern Persian / Dari (prs) is the variety of Persian spoken in Afghanistan, not Iran, that of Iran is the first one, Western Persian (pes). --Z 17:48, 12 December 2013 (UTC)
 * Oh, yep, I've fixed my think-o. - -sche (discuss) 21:25, 12 December 2013 (UTC)


 * I've merged pes and prs into fa. - -sche (discuss) 22:42, 27 May 2014 (UTC)
 * I've deleted "Parsi" (prp) and "Parsi-Dari" (prd). - -sche (discuss) 06:36, 5 February 2015 (UTC)


 * James Minahan says "The Hazara language, called Hazaragi, is a Farsi dialect, although the Hazaras are physically Mongol. The intermixture of the Indo-European and Mongol linguistic groups resulted in a dialect of Dari Persian that contains extensive words and forms from Farsi, Turkic, and Mongol. [...] Most Hazaras also speak Dari Persian [...] as a second language. In Iran most are bilingual, speaking both Hazaragi and Farsi. [...] Until the 1980s educated Hazaras used Farsi or Arabic as the literary language, but a movement to create a Hazaragi literary language has gained momentum." This suggests that Hazaragi is a separate language; we also already have a few translations into it. Therefore, I've kept it separate and updated WT:LT accordingly.
 * Barbara West, in the Encyclopedia of the Peoples of Asia and Oceania, writes that the Aimaq speak "Aimaq, a dialect of Dari or Afghan Persian"; Brian Williams writes that "Aimaqs also have a strong Persian admixture, and their language is Dari or Farsi", Jake Kircher writes that "Once there was a generally used, common Aimaq language but now, few seem to speak it anymore. Dialects spoken today resemble Dari (Afghan eastern Farsi) admixed with words of Mongolian and Turkic origin." In general it seems that it should be handled the same way as Dari, via lb and a.
 * - -sche (discuss) 03:26, 4 March 2016 (UTC)


 * Several references consider Pahlavani an alternative term for Pahlavi, but the 2003 International Encyclopedia of Linguistics says Pahlavani is still spoken today in a village in Afghanistan, Haji Hamza Khan, where it is "similar to Dari Persian but still distinct" (earlier editions of the IEL are explicit that this is the only village where the language is spoken). I suppose it can be left unresolved for now.
 * Dehwari is spoken by Persians in Baluchistan; several references seem to consider it to be Persian, but Denys Bray's 1934 The Brāhūī problem writes as if it and Persian are separate languages; I suppose it too can be left unresolved for now.
 * - -sche (discuss) 04:01, 4 March 2016 (UTC)

the Lega lects
We currently have separate codes for three lects considered part of the Lega macrolanguage: lea, lgm, and khx. These are reasonable to separate into two languages, lgm (which should be called Lega-Ntara) and lea (which should be called Lega-Malinga) as there is 67% mutual intelligibility, but khx is clearly a variety of lea. All this is per the treatment of the Beya dialect of lea in The Bantu Languages. —Μετάknowledge discuss/deeds 04:23, 22 November 2015 (UTC)


 * Support; merge khx into lea, with lgm remaining separate, and name everything per nom . I'm not sure why Wikipedia names the two main dialects using placenames. The full array of alternate names I encountered in (cursorily) researching the matter:
 * Mwenga Lega = Lega-Ntara / Lega Ntara (variously translated in refs as "Lower Lega", "Upper Lega" or "Eastern/Northern Lega") = Isile, Ishile, Kisile; Mwenda-Liga
 * Shabunda Lega = Lega-Malinga / Lega Malinga (variously translated in refs as "Upper Lega", "Lower Lega" or "Forest Lega" or "Western/Southern Lega") = Lega (Kilega) / Liga (Kiliga) proper; dialects: Kanu (Kikanu), Gala (Kigala), Yoma (Kiyoma), Sede (Kisede), Gonzabale, Beya (Beia), and possibly (Ki)Nyamunsange and Banagabo and Kabango and Bene
 * - -sche (discuss) 22:22, 25 November 2015 (UTC)


 * I'v merged khx into lea. However, as names go, "Lega-Shabunda" and "Lega-Mwenga" seems to be more common than "Lega-Malinga" or "Lega-Ntara". - -sche (discuss) 05:06, 29 February 2016 (UTC)


 * (Re)named "Lega-Shabunda" and "Lega-Mwenga". - -sche (discuss) 18:55, 22 March 2016 (UTC)

Kikuyu
This is a hard one, and I'm not advocating one way or the other, just wishing to raise the issue. Ngrams reveal that in 1992, "speaking Gikuyu" and "Gikuyu language" became more common than "speaking Kikuyu" and "Kikuyu language" (as well as becoming the linguistic standard), yet overall the spelling "Kikuyu" is still more common, presumably in speaking of the people, who are less obscure than their language. We follow Wikipedia in using the spelling "Kikuyu", but this spelling is clearly no longer favoured for the language (if you're curious, the "g" is to reflect the etymon, 🇨🇬). Should we change it? —Μετάknowledge discuss/deeds 19:55, 5 September 2015 (UTC)
 * I would really like to get some opinions on this. —Μετάknowledge discuss/deeds 06:01, 27 February 2016 (UTC)
 * I really don't have a strong opinion on this. Personally, I still think of the language as Kikuyu, which makes it difficult for me to come out and say "Yes, we should rename it", but the reasons you mention make it difficult for me to come out and say "No, we shouldn't rename it". So I abstain. I'll be happy if we continue to call it Kikuyu, but I won't be unhappy if we start calling it Gikuyu. (I will be unhappy if we start calling it Gĩkũyũ, though, since that really isn't an English word.) —Aɴɢʀ (talk) 08:19, 27 February 2016 (UTC)
 * I am very familiar with the English name Kikuyu for both the language and the people (especially for the language), but I have never seen Gikuyu used in English. I think Gikuyu is the native name (more properly Gĩkũyũ). —Stephen (Talk) 22:50, 27 February 2016 (UTC)


 * As you note, there are phrases where ngrams suggets Gikuyu is now more common as a name for the language — but there are also phrases (1, 2) where Kikuyu is still more common even as a language name. I'd stick with the current spelling. - -sche (discuss) 05:00, 28 February 2016 (UTC)


 * Not renamed at this time. - -sche (discuss) 23:40, 2 July 2016 (UTC)

Kristang
Judging by Ngrams, "Malaccan Creole Portuguese" is the least common name for this language; more common is "Malacca Creole Portuguese" with no n, and most common is "Kristang". In Glottolog's list of materials on it, I note that most of the modern material on it (by Baxter and Marbeck) calls it Kristang. I suggest renaming. That entails updating several entries and moving several categories. - -sche (discuss) 19:01, 20 March 2016 (UTC)
 * Support. —Μετάknowledge discuss/deeds 00:19, 2 April 2016 (UTC)
 * Kristang gets a misleadingly high number of hits because it’s also the name of the people that speaks it, and part of the synonym Papia/Papiah/Papiá Kristang. — Ungoliant (falai) 01:05, 2 April 2016 (UTC)
 * Given that you're the most knowledgeable about PT-based creoles, what would you prefer? —Μετάknowledge discuss/deeds 17:36, 3 April 2016 (UTC)
 * Kristang, but Papia Kristang or Malacca Creole Portuguese are also good options. The most important works about this language use Kristang more prominently. — Ungoliant (falai) 19:30, 3 April 2016 (UTC)


 * Done. - -sche (discuss) 07:30, 3 July 2016 (UTC)

Loreto-Ucayali Spanish
We currently have a code  for this dialect of Spanish; Wikipedia has an article for it at  which states that "Ethnologue's reasons for doing this [making a separate code for it] are poorly documented." Although it has some mild differences, it is clearly a dialect of  and should be merged into it. (There are no entries, but we should record the merger.) —Μετάknowledge discuss/deeds 17:35, 3 April 2016 (UTC)
 * Support. — Ungoliant (falai) 20:27, 3 April 2016 (UTC)
 * Merge, IMO. (The only thing that gives me pause is the difference in attestation requirements: if Amazonian Spanish isn't well documented, then treating it as a separate lect allows its entries to be held to a lower attestation requirement. But the same could be said of a lot of varieties that don't have their own codes, e.g. New Mexico and Southern Colorado Spanish, for which there are several published references.) - -sche (discuss) 00:50, 4 April 2016 (UTC)


 * ✅. - -sche (discuss) 19:23, 2 July 2016 (UTC)

Shempire Senoufo
The Senoufo lects are a mess, and we have no consistency in naming them (some are Senoufo, some Sénoufo, some without the Senoufo at all). This one,  seems not even to be a separate lect at all, but instead what Supyire   is called in Côte d'Ivoire. —Μετάknowledge discuss/deeds 20:25, 3 April 2016 (UTC)
 * Yes, Wikipedia and Omniglot agree they are the same. (The old 2003 International Encyclopedia of Linguistics said their "Relationship [was] undetermined" at that time.) As for the other Senoufo lects: I noticed one while checking translations at water, and removed "Senoufo" from its name before I added entries in it because I saw how rarely it was actually referred to with "Senoufo" in the name. Supyire too seems to be mostly referred to without "Senoufo". - -sche (discuss) 00:45, 4 April 2016 (UTC)
 * Eventually we'll have to get to renaming them. For now, I just wanted to excise duplicates. Also, thanks for the archiving. —Μετάknowledge discuss/deeds 01:55, 4 April 2016 (UTC)


 * Merged. - -sche (discuss) 19:26, 2 July 2016 (UTC)

Brythonic to Brittonic
According to Brittonic languages, the name "Brittonic" is far more common than "Brythonic", which is apparently rather outdated. We should use the more common name. —CodeCat 17:30, 12 April 2016 (UTC)
 * This is pretty funny. I'm fairly certain I used to use "Brittonic" and then you corrected me to "Brythonic" (rightfully, since it is the one we use currently), but it's humorous to me that we're now suggesting the change. I definitely think we should change it to "Brittonic". — JohnC5 18:21, 12 April 2016 (UTC)


 * What about the (unsourced) "Some authors reserve the term Brittonic for the modified later Brittonic languages after about AD 600." statement? — I.S.M.E.T.A. 14:45, 13 April 2016 (UTC)
 * Google Books Ngrams shows what looks to me like virtually a statistical tie since 1950, though Brythonic has been more common since the turn of the century. I really don't have a strong preference either way. —Aɴɢʀ (talk) 09:28, 14 April 2016 (UTC)
 * That Ngrams result is very interesting and makes me lean towards keeping it as “Brythonic.” — JohnC5 14:56, 14 April 2016 (UTC)


 * Yes, keep as “Brythonic”. — I.S.M.E.T.A. 15:27, 14 April 2016 (UTC)


 * Yes, based on the ngram, keep as "Brythonic". - -sche (discuss) 03:02, 28 May 2016 (UTC)


 * Not renamed at this time. - -sche (discuss) 04:49, 3 July 2016 (UTC)

Extinct languages of the Marañón River basin
I reckon that we should add exceptional codes for the ones that have words directly recorded from them, even though it's precious few for each.
 * Palta (separate article) could be qfa-jiv-pal, in imitation of Linguist list's jiv-pal, but given that its family assignment is not one hundred percent certain and a three-part code is annoying and abnormal, we could also settle for sai-pal. I don't know why Jivaroan's language family uses qfa rather than sai, which seems to be our default for unsorted South American languages.
 * Rabona could be sai-rab.
 * Patagón could be sai-pat.
 * Bagua could be sai-bag.
 * Copallén could be sai-cop.
 * Tabancale could be sai-tab.
 * Chirino could be sai-chi.
 * Sácata could be sai-sac.

I'm not sure I see a point in adding codes for languages where the only words are elements in toponyms or names, rather than directly recorded. However,, , , , and others can be added if there is interest. After all, ISO already has codes for European languages like Dacian that aren't much better attested, and I suppose we could have an entry or two. —Μετάknowledge discuss/deeds 05:13, 26 May 2016 (UTC)


 * Re "why Jivaroan's language family uses qfa rather than sai": presumably human error. It could be updated to "sai-jiv". Re whether to use "sai-pal" or "sai-jiv-pal": for consistency with other codes ("nai-yuc-yav", etc) and with the schema described in WT:LANG, we should use "sai-jiv-pal" if we accept that it was a Jivaroan language. But as you say, the family identification is speculative (although the evidence which does exist is consistent with it). I suppose we could use "sai-pal" to be 'safe' / 'conservative' about the family identification and get a shorter code... this also lets us add the others as "sai-" codes without worrying we ought to reassign them if we later create a family code for the families they belong to. (Indeed, we already have a code for the Cariban family which Campbell and Grondona say Patagón belonged to.) - -sche (discuss) 22:27, 26 May 2016 (UTC)
 * I think it's better not to make any assumptions for these languages' genetic affiliations. And do you want to add the languages I mentioned in my last paragraph? —Μετάknowledge discuss/deeds 04:02, 27 May 2016 (UTC)


 * I've added Palta as sai-pal, added Rabona, Patagón, Bagua, Copallén Tabancale, Chirino and Sácata, and also renamed the Jivaroan code to sai-jiv and changed Esmeralda's code from qfa-und-esm to sai-esm to fit the usual naming scheme. - -sche (discuss) 15:42, 29 June 2016 (UTC)


 * I have added Puruhá as sai-prh. There was at one point a grammar of it (though it has been lost), so we know it existed as a discrete lect. And based on personal- and place-names, words in it have been reconstructed by scholars. Even if the only words in it we can add are words in the Reconstruction: namespace, that does seem worth having a code for (a code also lets us reference it when giving the etymologies of those personal- and place-names). The other languages could probably be given codes on the same basis (if words in them have been reconstructed). - -sche (discuss) 00:56, 1 July 2016 (UTC)


 * I've also added codes and entries for Cañari, Panzaleo and Caranqui. Everything here is done, I think. - -sche (discuss) 20:34, 2 July 2016 (UTC)

Saliba languages
We currently have a language called Saliba (sbe) and one called Sáliba (slc). In my mind, having two languages' names only differ in a diacritic is not acceptable. It confuses automated programs as much as human editors. I'm not sure there are any acceptable alternative names, so I propose using geographic disambiguation for sbe as "Saliba (Papua New Guinea)". —Μετάknowledge discuss/deeds 04:15, 26 June 2016 (UTC)
 * And slc should be "Saliba (Colombia)". As a side note, Sáliba language redirects to Saliba language. —Aɴɢʀ (talk) 13:30, 26 June 2016 (UTC)
 * Alternately we could call slc "Saliva". This seems to get about 10× more Ghits for both the ethnic group and the language (though interference from is possible). --Tropylium (talk) 04:34, 27 June 2016 (UTC)


 * Indeed the current names are so confusing that I [//en.wiktionary.org/w/index.php?title=waira&type=revision&diff=37672844&oldid=37664751 got] them backwards when I added the only three entries we have in the two languages. I think that disambiguating them both with parentheticals is clearer than allowing one to keep the ambiguous name (either while renaming the other to "Saliva" or while disambiguating the other). Current practice suggests that we should rename slc to "Saliva" or "Sáliva", but I wouldn't mind if we started making more frequent use of disambiguators instread. - -sche (discuss) 08:50, 27 June 2016 (UTC)
 * My first choice is "Saliba (Colombia)", my second choice is "Sáliva". It's bad enough we have Anus language to protect from puerile vandalism without also having Saliva language. —Aɴɢʀ (talk) 09:40, 27 June 2016 (UTC)
 * Alright, I'll rename them both to use parenthetical disambiguators. PS, don't forget Category:Anal language. - -sche (discuss) 05:16, 28 June 2016 (UTC)


 * ✅, except that per previous discussion on using geographic disambiguators rather than national ones, I used "New Guinea" rather than "Papua New Guinea". (Other languages also use "New Guinea" as a parenthetical disambiguator in their canonical names, and only mention "Papua New Guinea" in alt names because SIL uses national disambiguators like that.) - -sche (discuss) 05:24, 28 June 2016 (UTC)

Loarki and Gade Lohar
Ethnologue encoded this language twice: once as lrk "Loarki" (the name its 20,000 Pakistani speakers call it), and a second time as gda "Gade Lohar", the name its ~500 speakers on the Indian side of the border call it. (The International Encyclopedia of Linguistics entry on Gade Lohar conservatively only says the languages "may be the same" as Loarki, and notes its long list of other names: Gaduliya Lohar, Lohpitta Rajput Lohar, Bagri Lohar, Bhubaliya Lohar, Lohari, Gara, Domba, Dombiali, Chitodi Lohar, Panchal Lohar, Belani, and Dhunkuria Kanwar Khati. The IEL entry on Loarki is more explicit, breaking down the population by country and countain Gade Lohar's 500 speakers as Loarki speakers, because Loarki is "probably the same as Gade Lohar in Rajasthan, India, a Rajasthani language.") I propose to merge gda into lrk. - -sche (discuss) 20:50, 11 August 2015 (UTC)


 * ✅ - -sche (discuss) 03:23, 11 July 2016 (UTC)

Dhuwal
The ISO has retired the code duj and replaced it with dwu (in the process of splitting off dwy "Dhuwaya"). If we want to follow the ISO, all our Dhuwal entries and categories need to be switched from duj to dwu, which seems like a lot of unnecessary bother. (Whoever does this should also add dwy to the module.) - -sche (discuss) 05:54, 24 February 2016 (UTC)
 * On one hand, it's unnecessary; on the other hand, I think it would be ideal to follow ISO in all cases where we don't have a well articulated reason not to do so. —Μετάknowledge discuss/deeds 06:00, 24 February 2016 (UTC)
 * Support deprecation of duj. —Μετάknowledge discuss/deeds 06:19, 29 February 2016 (UTC)


 * ✅. (I also recoded Elfdalian from the nonstandard dlc to the standard ovd, per a BP discussion.) - -sche (discuss) 02:56, 7 July 2016 (UTC)

Kiriri
What do you think about adding a code for this language, and under what name? Wikipedia describes it at ; they cite Fabre for the claim that this language is only preserved in a single brief wordlist, where it is called Kiriri (the wordlist is on page 22 (section 3.4) of this pdf). Regardless, that document does seem to be a good place to find more words for water. —Μετάknowledge discuss/deeds 03:24, 2 March 2016 (UTC)


 * There's a Kiriri attested in a single wordlist collected from an elder from the 1960s that's a Katembri language, and another Kiriri attested in a single wordlist collected from an elder from the 1960s that's a ? Well, that's confusing.
 * The fact that there's only a limited number of words is no reason not to include the language, but it would be good to avoid the ambiguous name Kiriri. How about calling them Katembri and Xukuru, like Wikipedia does? Wait, (as a minor point of curiosity,) if the wordlist is labelled Kiriri, where'd the alternative name come from?
 * The difficult part will be assigning codes, given that the family affiliation is unclear. - -sche (discuss) 03:45, 2 March 2016 (UTC)


 * Given the naming issues, I'm suddenly confused about whether I have correctly identified the wordlist being referred to. I don't know anything about any of these languages, so I feel lost (it's so much better in Austronesian, for example, where I at least feel like I have a hold on what goes where). Anyway, that naming scheme makes sense; we can use qfa codes and not worry about the families, no? —Μετάknowledge discuss/deeds 05:01, 2 March 2016 (UTC)


 * qfa is the prefix for exceptional family codes. All of our exceptional language codes which start with qfa do so because they start with a family code that starts with qfa, like qfa-ctc-col. There's been at least one case where we've created a family code for an accepted family (qfa-len, the Lencan languages) in order to use it in constructing a language code (qfa-len-slv for Salvadoran Lencan), but Wikipedia notes that scholars aren't certain what family either Kiriri belonged to, so we couldn't do that here because we couldn't accurately, confidently assign either one a family code (even an exceptional family code). I suppose we could construct codes starting with qfa-und, like qfa-und-ktm for Katembri. I wouldn't want to use bare qfa-___ (e.g. qfa-ktm for Katembri) because it would look like a family code. - -sche (discuss) 08:21, 3 March 2016 (UTC)

More languages without ISO codes, part 1
I have gone through w:Category:Languages without ISO 639-3 code but with Linguist List code (thanks, Angr), and the languages listed below still need exceptional codes. I have not listed those that have no recorded material or toponyms, or those that are treated as a dialect of another language in the linguistic literature (like ). I have put suggested codes after them, and notes where I'm unsure (please correct me if I made any mistakes). —Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC)
 * (awd-ama) ✅ (notes below)
 * (alv-ama) ✅ (notes below)
 * (awd-ana) ✅ (notes below)
 * (cba-kan) — perhaps a different name would be better ✅ as Atanques (cba-ata) (notes below)
 * (sai-ayo) ✅ (notes below)
 * (nai-bay) ✅
 * (sai-bet) ✅
 * (gme-bur) — the name "Burgundian" is unoccupied but could cause confusion ✅ as gem-bur due to uncertainy over whether or not it is East Germanic
 * Discussed below. - -sche (discuss) 21:42, 5 July 2016 (UTC)
 * (awd-cab) ✅ as Cavere (awd-cav)
 * (nai-cal) ✅
 * (sai-cat) ✅
 * (azc-caz) ✅
 * (sai-chn) ✅
 * (sai-chp) ✅
 * (sai-chr) ✅
 * (nai-chi) ✅
 * (myn-cho) — I wasn't 100% sure that this lect is a separate language ✅ (notes below) as myn-chl for maximal distinction from Ch'orti&apos;
 * (sai-chu) ✅
 * (nai-cig) ✅
 * (sai-col) ✅
 * (nai-cui) ✅
 * (awd-cus) ✅ as Kustenau (awd-kus)
 * (nai-gua) ✅ as nai-guz to be more distinctive because so many lects and families have names starting with "Gua..."
 * (sai-gue) ✅
 * (nai-jum) ✅
 * (tup-kab) ✅ as Kabishiana
 * (awd-kaw) ✅
 * (awd-man) ✅
 * (sai-muz) ✅
 * (cba-nut) ✅
 * (sai-par) ✅
 * (sai-pyg) ✅
 * (sai-qtm) ✅
 * (nai-sin) ✅
 * (nai-yup) ✅
 * (sai-yur) ✅


 * Comments:


 * Amarizana: add per nom. Alexandra Y. Aikhenvald in Languages of the Amazon cautions that "many languages in Amazonia have 'namesakes' [and] more than one group may hide behind the same name", and more than one language has been called Amarizana: "one of the clans of the Piapoco is the Amarizanes (and the name is sometimes applied to the whole group). A now extinct language, also called Amarizana, and from the same Arawak family, used to be spoken in the Meta territory of modern Colombia." Nonetheless, the only references I find unambiguously mean this language. Čestmír Loukotka, Johannes Wilbert, Classification of South American Indian Languages (1968), page 131, lists some Amarizana words alongside and hence obviously distinct from Piapoco, including nuita "head", notuy "eye", nukagi "hand", kaxü "house", sietai "water", eriepi "fire" and keybin "sun". Julian Granberry's A Grammar and Dictionary of the Timucua Language even provides some etymology, connecting Amarizana eri(-...) "fire" to Achagua eri "sun, day", Arekena ale "sun".
 * Amasi: this happens to highlight what a mess our African language family codes are. Several codes use the prefix nic- even though their most immediate superfamily is alv, e.g. nic-vco should be alv-vco. Fortunately, fixing the nic- codes should not require updating very many pages. One that is done, precedent would have us use alv-bco-... rather than alv-... (compare nai-yuc-tip, qfa-ctc-cat), although the argument in favor of a shorter code is obvious. :-/ Some words are listed in a 1973 article in Africana Marburgensia ('AM') and in a pre-draft working paper cited by WP ('B'), including bú (AM) / bu (B) "dog", ázɔ́lí (AM) / azɔle (B) "tree", ɣà-nēm (AM) / ɣanim (B) "man", ɣà-zhyī (-zhyì?) (AM) / ɣaʒɛ (B) "woman", mwɔ̄ (AM) / muɔ (B) "water".
 * Anauyá: add per nom. Also called Anauya, but the version with diacritic is more common. has uni "water" and ahiri "sun", the latter confirmed by I Simposio Antonio Tovar sobre Lenguas Amerindias: Tordesillas... (Emilio Ridruejo Alonso, ‎Mara Fuertes, ‎Carlos González-Espresati; 2003) and both are seemingly in the aforementioned Classification of South American Indian Languages, although I can't see the exact snippet.
 * Atanque(s): out of the various names WP mentions, namely "Atanque (Atanques) or Cancuamo (Kankuamo), also known as Kankwe and Kankuí", plus others I ran across (Atanke), "Atanques" seems to be most common, at least as the name of the language. ("Kankuamo" is quite common as a placename(?) that forms part of the designation of a tribe.) A 1962 article in Anthropological Linguistics has some words, including jo̱ke "gourd cup", cognate to, and mo̱ga "two", cognate to , and the 1981 Comparative Chibchan Phonology has more words (and may drop the underline from the os of those words; it is hard to see, because all words are underlined), including ji "worm", jinua "six".
 * Ayomán (rarely also Ayoman): I've added a code for Jirajaran, sai-jir, so this language's code should be sai-jir-ayo.


 * - -sche (discuss) 07:32, 4 July 2016 (UTC)
 * Can't we just do two-part codes so we don't have to feel obligated to create these horribly long ones? It wouldn't clash with all of our preëxisting practice, despite there being some precedent. Also, I'm worried that your careful work on this is going to make this RFM section far too long, and also cause you to burn out. Perhaps this should be a user page that this section links to? —Μετάknowledge discuss/deeds 07:57, 4 July 2016 (UTC)


 * Where an existing ISO family code like alv exists, I suppose we could go with two-part codes, but then what should be done with languages that have no ISO family code but instead belong to families for which we've had to create qfa- codes? I suppose they can be treated the same as they are now. But if we accept nai and sai as family codes for this purpose, I suppose that means some qfa- things like Salvadoran Lenca and Catacao can be re-coded. I will update the existing three-part non-proto-language codes if we go that route. I've started Beer_parlour/2016/July. Yes, I considered as probably should start storing long comments and information on addable vocabulary in userspace. I wouldn't worry too much about burnout; we can take time; the only reason there would be a rush to add these codes ASAP is if we wanted to add words in particular ones of them, and if we wanted to add words, we'd need to do some research to find words to add. - -sche (discuss) 17:40, 4 July 2016 (UTC)


 * Regarding Burgundian, what should we do about ? Add it, too? - -sche (discuss) 01:48, 5 July 2016 (UTC)
 * I'm not sure. I assumed that it could be subsumed under fr, as the Oïl languages usually are, but we probably ought to address it separately. —Μετάknowledge discuss/deeds 09:01, 5 July 2016 (UTC)
 * I mean, is it separate from fr? I don't know. In any case, I guess on further consideration it doesn't stop us from adding the Germanic language as "Burgundian", because if the Romance language needs to be added, it can be Bourguignon. (And since they're from different (sub)families, it should be easy to tell which one was meant if someone enters a word from one incorrectly as the other.) I'll collect information about them at User:-sche/Burgundian (About Bourguignon). - -sche (discuss) 16:50, 5 July 2016 (UTC)


 * Btw, I found and added another one we were missing, Macoris, attested in one word (baeza) and some placenames. - -sche (discuss) 06:27, 5 July 2016 (UTC)
 * Oh, there are tons more. This is only the low-hanging fruit; a lot more languages with paltry data are waiting to be dealt with. I'm avoiding the Bantu ones for now, because pretty much all of them are in dialect continua and probably should be left alone unless good scholarship on their mutual intelligibility can be found (which I suppose I should go about finding). There's a pile of Australian ones that I'll get around to listing at some point (I thought maybe I'd give you time to digest all this first), and then even more messier ones from South America. —Μετάknowledge discuss/deeds 09:01, 5 July 2016 (UTC)
 * FWIW, re 'time to digest', I'd feel free to post any others you have the data to post (except the ones you mention are parts of dialect continua ... might as well leave them alone, as long as some part of the continuum has a code, although if no part does, then we should probably rectify that). There's no harm in it sitting around on the site unattended-to for a while, whereas letting it sit on one's computer sometimes (at least for me) means forgetting where one put it. (I can no longer find the information I thought I had collected on the separability vs mergeability of Haida dialects.) - -sche (discuss) 16:50, 5 July 2016 (UTC)


 * Regarding Ch'olti': The oldest stage of the language (written in so-called hieroglyphs), sometimes confusingly just called "Ch'olti'" but more often called by other names, has its own code emy. The Colonial- and post-Colonial-era stage is considered distinct from emy, and also considered distinct from the more recent stages of Ch'orti', e.g. one reference says "In the Mayan classificatory tradition, the Ch'olti' language, as recorded in the 1695 grammar of Pedro Moran, is generally held to be related to but separate from the modern language of Ch'orti' (see Kaufman's 1976 classification, for example)." Post-Epigraphic-era Ch'olti' and modern-era Ch'orti' (caa) are theoretically distinguished from each other as different branches of Eastern Ch'olan (and from ctu, as it is a Western Ch'olan language), but the size of the difference between Ch'olti' and Ch'orti' is hard to ascertain, especially because, quoth WP, "the post-colonial stage of the language is only known from a single manuscript written between 1685 and 1695" (as afore-mentioned). For that matter, the size of the difference between Epigraphic Ch'olti'an and Colonial Ch'olti' is not obvious to me; Søren Wichmann, The Linguistics of Maya Writing (2004), page 271, says "In this section we show how Classic Ch'olti'an became seventeenth-century Ch'olti'. The chief grammatical difference between the grammars of Classic Ch'olti'an and Ch'olti' is the difference between straight- and split-ergativity." As an example, mi "father" is used in Classic and Colonial Ch'olti'(an) and in Ch'orti'. Nonetheless, given that the corpus of post-emy Ch'olti' is small and well-defined, it shouldn't be that hard to include it separately from emy and caa. - -sche (discuss) 21:42, 5 July 2016 (UTC)


 * I also added Wanham. - -sche (discuss) 22:07, 10 July 2016 (UTC)


 * I've split the discussion so that ones that are done can be archived. - -sche (discuss) 19:28, 10 July 2016 (UTC)

Yokutsan languages
We seem to have this as a single macrolanguage, yok, despite the fact that the constituent lects seem to constitute at least a few languages. -sche added some entries, but it's all tagged by (dia)lect, so it will be easy to separate them. I think we should retire yok and replace it with exceptional codes to reduce confusion, but I am not sure what those divisions should be. —Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC)


 * Yes, the difficulty is in deciding how to divide it. Christopher Loether cautions that although e.g. "Gayton (1948) listed 26 named groups or 'tribelets'[,] many of these named groups speak dialects which are nearly identical phonologically, lexically and syntactically, while others speak varieties which are indeed quite distinct from their neighbours' speech. Kroeber [...] divided [what he considered a single] language into two main divisions: Valley and Foothill Yokuts. He further divided the Valley Branch into Northern and Southern, and the Foothill branch into Kings, Tule-Kaweah, Poso, and Buena Vista. Newman (1944: 5-3) agreed with Kroeber's analysis of a single Yokuts language and stated that his data corroborated Kroeber's dialect divisions."
 * Kroeber lists 20+ dialects, of which 21 are named in [[ˀilik']]. Wikipedia has a tree/bush diagram from Whistler and Golla of 23-28 dialects, including all of those 21 plus Koyeti, Merced "(?)", Noptinte (Nopchinche(s), Nopthrinthre(s), Nopṭinṭe, Nopthrinte, Noptinci), Yachikumne a.k.a. Chulamni, Lower San Joaquin Yokuts, and Lakisamni "(?)", and Tawalimni. (Several have multiple names, e.g. Ayticha is also called Kocheyali as well as Ayitcha; Palewyami is also Altinin and Poso Creek Yokuts in addition to Paleuyami. And Hometwoli is also Taneshach?)


 * Kroeber's and Newman's division is:
 * Valley
 * 1. Northern Valley
 * 2. Southern Valley
 * Foothill
 * 3. Kings River (including Gashowu)
 * 4. Tule-Kaweah
 * 5. Poso [Creek]
 * 6. Buena Vista
 * Whereas, Whistler and Golla's division is:
 * 1. Poso [Creek] (Palewyami)
 * General Yokuts
 * 2. Buena Vista (Tulamni, Hometwoli)
 * Nim-Yokuts
 * 3. Tule-Kaweah (Wikchamni, Yawdanchi)
 * Northern Yokuts
 * 4. Kings River (Chukaimina, Michahay, Ayticha, Choynimni)
 * 5. Gashowu
 * Valley Yokuts
 * 6. Southern Valley (Yawelmani, Tachi, etc)
 * 7. Northern Valley (Chukchansi, Kechayi, etc)
 * 8. Far Northern Valley (misc dialects)


 * For Yawelmani and Chukchansi, decent resources exist; in addition to those two, Wikchamni and Tachi are also being taught according to WP, and in addition to those four, Choinimni and Kechayi also have at least some speakers according to WP.
 * WP says the Yokutsan family consists of "half a dozen" languages, but evidently not the six just named, because those six leave out several major branches that Kroeber, Newman, and Whistler and Golla all agree on.
 * I suggest we create a family code nai-yok for the Yokutsan languages, and then distinguish the following branches which Kroeber, Newman, and Whistler and Golla all consider distinguishable, without splitting them further at this time:
 * Palewyami (nai-ply — or putting the y at the start so the codes sort together and are more apparently connected — on further thought, nah ) a.k.a. Poso a.k.a. Poso Creek
 * Buena Vista Yokuts (nai-bvy) a.k.a. Tulamni-Hometwoli
 * Tule-Kaweah Yokuts (nai-tky) a.k.a. Wikchamni, Yawdanchi
 * Kings River Yokuts (nai-kry) a.k.a. Choinimni, etc
 * Gashowu (nai-gsy), which Kroeber and Whislter/Golla agree is intermediate between Kings River and Northern Valley, though Kroeber considers it ultimately/genetically Kings
 * Southern Valley Yokuts (nai-svy) a.k.a. Yawelmani, Tachi, etc
 * Northern Valley Yokuts (nai-nvy) a.k.a. Chukchansi, Kechayi, etc
 * Delta Yokuts (nai-dly) a.k.a. Far Northern Valley Yokuts
 * A more conservative approach would keep Southern Valley Yokuts, Northern Valley Yokuts, and Delta Yokuts together as "Valley Yokuts", but Delta Yokuts is relatively divergent. - -sche (discuss) 22:16, 3 July 2016 (UTC)


 * Pinging in case you have insight or input on this Californian language family. - -sche (discuss) 04:01, 5 July 2016 (UTC)
 * Not much, I'm afraid. A couple of decades ago I read everything I could find on the Wikchamni, who used to live in the area where my brother lives now. I read all of the sources you mentioned above, but I was more interested in the ethnobotany of the Yokuts than their languages, per se, and that was a long time ago.
 * On a side note, I remember one of my professors at UCLA back in the 80s saying that Yawelmani was one of the best-understood languages in the world at the time from a theoretical perspective, because so many linguists had been publishing papers on it- it was sort of the linguistic equivalent of a model organism. Chuck Entz (talk) 04:25, 5 July 2016 (UTC)


 * ✅ [//en.wiktionary.org/w/index.php?title=Wiktionary%3ALanguage_treatment&type=revision&diff=39061881&oldid=39061591 ]. - -sche (discuss) 06:51, 15 July 2016 (UTC)

various places where WT:LANGTREAT and Module:languages are inconsistent
Module:languages includes codes from the individual members of several language/dialect groups which WT:LANGTREAT says, without citing any discussion, should be merged. Something needs to change: either WT:LANGTREAT should be updated to note that the individual varieties are allowed, or their codes should be removed from the module. The following language/dialect groups are affected: (Note 1: whenever the merging of a particular dialect group had been discussed and that discussion was cited by LANGTREAT, I simply updated the module.) (Note 2: Haida, Cree and Kalenjin face the same issue; I expect to write about them later.) - -sche (discuss) 05:16, 20 November 2013 (UTC)

the Gondi lects
Stephen Tyler (not the singer), in his oft-cited works on Gondi, states "Though I have no real evidence, the general pattern seems to be for geographically adjacent Koya and Gondi populations to speak different, but mutually intelligible Gondi dialects. Where these populations are geographically non-contiguous, the dialects are not mutually intelligible. This same pattern probably prevails among all Gondi dialects." WP says "The more important dialects are Dorla, Koya, Maria, Muria, and Raj Gond." Ethnologue, meanwhile, as encoded only two varieties, ggo (Southern Gondi) and gno (Northern Gondi). Should we deprecate those two codes? Or deprecate the macro-code gon and recognise those dialects? Or allow all three? - -sche (discuss) 05:16, 20 November 2013 (UTC)
 * Merged (whereas, the ISO, while retaining gon, split ggo into two new codes which have not been added). - -sche (discuss) 06:27, 24 February 2016 (UTC)

Merging Malay
I propose to merge the following Malay lects into Malay [ms]: These are all mere dialects of Malay with no written tradition and perfectly mutually intelligible. Even Ethnologue says they should be considered dialects of Malay rather than separate languages. -- Liliana • 23:39, 28 February 2016 (UTC)
 * Jakun [jak]
 * Orang Kanaq [orn]
 * Orang Seletar [ors]
 * Temuan [tmw]


 * Support. - -sche (discuss) 02:36, 29 February 2016 (UTC)


 * Support. — I.S.M.E.T.A. 01:32, 4 March 2016 (UTC)


 * ✅. - -sche (discuss) 04:20, 3 July 2016 (UTC)

Ngadha
This language is far more commonly called "Ngadha" than "Ngad'a"; the latter spelling is so rare that when I was trying to verify our "Ngad'a" translation of water using that spelling for the language name, I couldn't find any references at all (they all spell it "Ngadha"). This rename entails moving a few categories and updating a handful of entries. - -sche (discuss) 02:32, 29 February 2016 (UTC)
 * Support. —Μετάknowledge discuss/deeds 02:35, 29 February 2016 (UTC)


 * ✅. Should Eastern Ngad'a be merged? - -sche (discuss) 04:47, 3 July 2016 (UTC)

Bontoc
We currently include both the macrolanguage Bontoc (code: bnc) and its dialects, particularly Eastern Bontok (not even using the same spelling, you notice! code: ebk) which we have about ~35 translations in. IMO, it rarely makes sense to include both a macrolanguage and also all of its dialects; we should usually have one or the other but not both. Ethnologue says the dialects are "reportedly similar", as if they split bnc into dialects in 2010 without without knowing enough about them to tell whether they were similar or distinct. The International Encyclopedia of Linguistics considers Central Bontoc to be only 56% intelligible with Eastern Bontoc, which is only a few percentage points better than the intelligibility of the various Bontocs with Ilocano, suggesting that at least Central and Eastern Bontocs, if not the others, are different languages. Our ~15 "Bontoc" (bnc) entries seem to be Central (Igorot) Bontoc and could be relabelled accordingly if we deprecated bnc. - -sche (discuss) 07:48, 29 February 2016 (UTC)


 * Relabelled "Central Bontoc" or "Igorot Bontoc"? And is it "Bontoc" or "Bontok"? Whatever the details, I support the idea of reducing the macrolanguage Bontoc to an etymology-only language in favour or having translations and entries for the various Bontoc languages. — I.S.M.E.T.A. 01:30, 4 March 2016 (UTC)


 * "Central Bonto(c|k)" is more common than "Igorot Bonto(c|k)" or "Bontok Igorot", and [//books.google.com/ngrams/graph?content=Bontok%2CBontoc&year_start=1880&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t1%3B%2CBontok%3B%2Cc0%3B.t1%3B%2CBontoc%3B%2Cc0 "Bontoc" is more common] than "Bontok". I've tweaked the canonical spelling of Central Bontoc (lbk) accordingly; I suppose the other Bontocs which are currently spelled with k should also be updated. - -sche (discuss) 01:58, 4 March 2016 (UTC)


 * "Cental Bontoc" it is, then. — I.S.M.E.T.A. 02:14, 4 March 2016 (UTC)


 * A number of works refer to "the Bontoc language" without specifying which of the Bontoc languages they mean, and we couldn't easily include words from these works if we deprecated bnc; there are even books like Clapp's Vocabulary of the Igorot Language as Spoken by the Bontok Igorots which conflate all the languages of the Igorot people (perhaps understandably, given the point above that the Bontocs and e.g. Ilocano are equally different from each other). However, if we accept that being barely halfway mutually intelligible makes Central vs Eastern Bontoc separate languages, then we're not losing anything of quality by not following (and not being able to easily add content from) books that fail to distinguish such different lects. - -sche (discuss) 03:01, 4 March 2016 (UTC)


 * Quite. Though, there will be occasions when it will not possible to work out easily from which of the Bontoc languages a given term in a borrowing language will have been derived. — I.S.M.E.T.A. 03:28, 4 March 2016 (UTC)


 * ✅ - -sche (discuss) 04:15, 15 July 2016 (UTC)

Matanawi
Another language we need a code for, presumably sai-mat. Is there a more efficient way to find languages we've missed than my current method, which is simply happening upon them? —Μετάknowledge discuss/deeds 08:33, 28 June 2016 (UTC)
 * I suppose you could go through w:Category:Languages without ISO 639-3 code but with Linguist List code. —Aɴɢʀ (talk) 13:54, 28 June 2016 (UTC)


 * ✅ - -sche (discuss) 04:51, 15 August 2016 (UTC)


 * Btw, Native South Americans: Ethnology of the Least Known Continent lays out the case from Nimuendaju (who documented Mura) that Rivet, at least, if not others, was too hasty in grouping this with Mura. Nimuendaju considers Mat. and Mur to be isolates. - -sche (discuss) 05:12, 15 August 2016 (UTC)

The Rwanda-Rundi lects
We currently separate, , , , , and , which can all be treated as a single language and often are; their separation seems largely political. The Wikipedia page catalogues the differences between rw and rn: they are pretty minor, and a lot seem to have to do with regular spelling differences and tones, which are not even reflected in the standard orthography and could thus be relegated to pronunciation sections. To quote Zorc and Nibagwire's Kinyarwanda and Kirundi comparative grammar (2007): "The terms dialect and language are used loosely in everyday communication. In linguistic terms, the two are bound together in the same definition: a language consists of all the dialects that are connected by a chain of mutual intelligibility. Thus, if a person from Bronx, New York can speak with someone from Mobile, Alabama, and these two can converse with someone from Sydney, Australia without significant misunderstandings, then they all form part of the English language. Kigali [the capital of Rwanda] and Bujumbura [the capital of Burundi] are similarly connected within a chain of dialects that collectively make up the Rwanda-Rundi language." Kimenyi's A Relational Grammar of Kinyarwanda (1980) explains that: "This language [Kinyarwanda, rw] is very close to both Kirundi [rn], the national language of Burundi, and Giha [haq], a language spoken in western Tanzania. The three languages are really dialects of a single language, since they are mutually intelligible to their respective speakers." That seems like a strong case for merger to me, although I'd like to see if any academic sources disagree. (Pinging User:-sche as well, just to try it out.) —Μετάknowledge discuss/deeds 01:52, 17 December 2013 (UTC)
 * I got the ping, thanks. :) I've just been busy. I'll look into this more closely soon, but on the face of it, it does seem like we could merge them. (And that reminds me that en.Wikt really needs to have a discussion about merging Nynorsk and Bokmal. It's bizarre that we manage to accept that Drents and Twents — and, to use the example above, the English of Alabama and the English of Australia — are not separate languages, but haven't managed to accept that Nynorsk and Bokmal aren't. But that's for another discussion...) - -sche (discuss) 06:04, 19 December 2013 (UTC)
 * But cf. Appendix:Swadesh lists for Bantu languages, which shows large differences between the two. -- Liliana • 14:41, 14 February 2014 (UTC)
 * I was just talking to a native speaker today to get their perspective on this. They said that the vocabulary varies a lot dialectally, but not along national lines, and it's still easy to understand people on the other side. They claimed that the biggest differences were in the tones, but that's not even marked in the orthography. I think that's a pretty strong ase for merger. —Μετάknowledge discuss/deeds 19:22, 28 December 2015 (UTC)


 * R. David Zorc and ‎Louise Nibagwire have an entire Comparative Grammar (2007, ISBN 1931546320 devoted to the differences between the two, which Google unfortunately only shows snippets of, including the TOC which lays out spelling differences, noun class differences, "word pairs, one matched, the other completely different", and false friends, as well as dialect-specific tonal marking. However, vocabulary differences and false friends exist even between English dialects (luck out), and tonal marking and other pronunciation differences which aren't reflected in the orthography can be handled in pronunciation sections, as with Iranian Persian vs Dari. The only thing that gives me pause is the point that dialects on the extremes of the continuum "may not" be intelligible with each other per WP, but then, if the dialects aren't split up along national lines / along the same lines as the codes, then that's not a good argument against merger. - -sche (discuss) 00:14, 7 March 2016 (UTC)
 * Elena Zinovʹevna Dubnova, in The Rwanda Language (1984), page 15, writes: "A. Coupez maintains that "as a matter of fact, Rwanda, Rundi and Ha ( and maybe other languages spoken east of the latter two areas) are so close that they can be regarded as dialects of one language" (17. p. 11). According to the other view they are closely related but still different languages (11, p. 26)." Other sources concur, "In linguistic terms, Kirundi and Kinyarwanda are mutually intelligible dialects of the Rwanda-Rundi language." Merged under the code rw. - -sche (discuss) 19:17, 14 August 2016 (UTC)

Huarpean languages
These are three extinct languages of Argentina that lack ISO codes, but two of them have recorded material (the third, Puntano, seems pointless to add). The only problem is that some linguists consider these to be dialects of the same language, although that is debated and cannot be satisfactorily resolved with the limited preserved lexica from each. I would prefer we follow es.wiktionary's lead in adding separate codes for Allentiac (sai-all) and Millcayac (sai-mil). —Μετάknowledge discuss/deeds 05:36, 28 June 2016 (UTC)
 * ✅. - -sche (discuss) 06:11, 15 August 2016 (UTC)

renaming Atong
I suggest renaming References seem to prefer the spelling "Atong" to "A'tong" for aot, and Wikipedia says "The correct spelling Atong is based on the way the speakers themselves pronounce the name of their language. There is no glottal stop in the name and it is not a tonal language." - -sche (discuss) 04:37, 29 July 2016 (UTC)
 * ato from Atong to Atong (Africa) or Atong (Cameroon)
 * aot from A'tong to Atong (Asia) or Atong (India) if we accept the non-inclusion in the latter name of the border areas of Bangladesh where it is also spoken


 * I support renaming the languages to "Atong (Cameroon)" and "Atong (India)", respectively; in the latter, "India" can be taken to refer to the subcontinent, rather than the country. — I.S.M.E.T.A. 18:55, 31 July 2016 (UTC)


 * ✅. - -sche (discuss) 20:36, 15 August 2016 (UTC)

More languages without ISO codes, part 2
{—Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC))


 * (cba-dor) ✅
 * Glottolog mentions a dialect Chánguena (Changuina); Linguist List gives separate codes to Chumulu and Gualaca. Do we know how distinct these are? - -sche (discuss) 07:31, 6 July 2016 (UTC)
 * see below
 * (cba-dui) ✅
 * (sai-ewa) ✅
 * Sorry it took so long before I added this language... I just had to get around Duit. - -sche (discuss) 22:40, 6 August 2016 (UTC)
 * (sai-gay) ✅
 * Do we need a code for the fourth of the languages Oramas covers (in the same work where he covers Gayon, Ayaman, and Jirajira), Ajagua (Axagua, Jagua)? Loukotka says: "once spoken on the Tocuyo River near Carera, state of Lara, Venezuela. [A. Espinosa (Vazquez de Espinosa) 1948, p. 35, only two words: Oramas 1916, pp. 49-57, only patronyms.]" - -sche (discuss) 23:02, 10 August 2016 (UTC)
 * ✅. This Ajagua is to be distinguished from Achagua, which is also called Ajagua and Achawa, but is spoken 1500+ kilometers away. - -sche (discuss) 05:07, 15 August 2016 (UTC)
 * (sai-hau) ✅
 * (cba-hue) ✅
 * (sai-jir) — if we're not doing three-part codes, then this will need to be sai-jrj to avoid clashing with the family code ✅
 * (sai-kat) ✅
 * Cf User talk:-sche, where a few other needed additions are mentioned. This language is also known as Catrimbi or Kariri de Mirandela, or Kiriri, and is described by Loukotka as the "lost language of the ancient mission of Saco dos Morcegos, now the city of Mirandela". AFAICT, we should add this language (which is also known as Kariri), we already have Xukuru (which is also known as Kirirí and Kirirí-Xokó), and we should possibly add Xukuru-Kariri (postscript: ✅), which is also known as Xocó. Keeping them all straight is going to be difficult and is complicated by the fact that each is known only from short words elicted from elders in 1961. - -sche (discuss) 20:21, 14 July 2016 (UTC)
 * ✅. - -sche (discuss) 22:10, 6 August 2016 (UTC)
 * (sai-nat) ✅
 * (sai-oto) ✅
 * (ine-pae) ✅
 * (sai-pam) ✅
 * (sai-pur) ✅
 * I have also added . - -sche (discuss) 19:38, 16 August 2016 (UTC)
 * (sai-san)
 * ✅ but without the accent, since English-language sources seem to drop it at least as often as they retain it. Cabrera (1929) is said to record a few words and Serrano (1945) five more. I've also added a code for Comechingón / Comechingon / Comechingona. - -sche (discuss) 17:51, 15 August 2016 (UTC)
 * (sai-sec) ✅
 * Btw, Matthias Urban lays out a number of arguments that Spruce's semi-well-known wordlist is definitely Sechura and not Tallan. - -sche (discuss) 06:03, 14 August 2016 (UTC)
 * (awd-she)
 * Also called Shebaye, Shebayo, per Campbell, American Indian Languages: The Historical Linguistics of Native America; he says David Payne "adduces persuasive evidence from the scant fifteen words recorded in extinct Shebayo (Shebaye) of Trinidad to show that it belongs with the Caribbean group (for example, it appears to have da- ‘my’, and these languages are the only ones which have an alveolar stop and not a nasal for 'first person singular')". - -sche (discuss) 05:04, 14 August 2016 (UTC)
 * ✅ as "Shebayo", the most common spelling. Douglas MacRae Taylor, Languages of the West Indies (1977), page 15, has: The Shebayo list, taken from De Laet's Novus Orbis, is as follows (divergent spellings found in different editions are shown in parentheses): heia (heja) 'pater', hamma 'mater', wackewijrrij 'caput', wackenoely (wackenoey) 'auris', noeyerri (noeyerii) 'oculus', wassibaly (wassi) 'nasus', darrymaily 'os', wadacoely 'dentes', watabaye 'crura', wackehyrry 'pedes', ataly 'arbor', hoerapallii 'arcus', hewerry 'sagittae', kyrizyrre 'luna', and wecoelije 'sol'. - -sche (discuss) 19:22, 15 August 2016 (UTC)
 * (sai-tal) ✅
 * (sai-tpr) ✅
 * (azc-tat) ✅
 * (sai-teu) ✅
 * (cba-vot) [[Image:Symbol abstain vote.svg|20px|link=]] Not done
 * Loukotka says nothing of Voto is attested (which is ironic, because the subfamily is apparently named after it). The Indigenous Languages of South America: A Comprehensive Guide considers it a variety of Rama, possibly based on the fact that the Ramas were also called Votos. - -sche (discuss) 05:04, 14 August 2016 (UTC)
 * Glottolog doesn't list it or have any resources on it, either. Therefore, not added until content in it can be found, or at least determined to exist. - -sche (discuss) 05:45, 15 August 2016 (UTC)
 * (sai-waj) ✅ as Wayumará, a more common name
 * (also called Wayumara per several sources, and Azumara and Guimara per Loukotka)
 * (sai-wam) ✅ as Guamo, a more common name


 * Comments:
 * I've split the discussion so that ones that are done can be archived. - -sche (discuss) 19:28, 10 July 2016 (UTC)


 * As of now, with the addition of Tataviam, we include codes for 7900 languages, and will soon have codes for over 8000, including artificial languages and proto-languages. Wiktionary talk:Milestones. - -sche (discuss) 07:48, 11 July 2016 (UTC)


 * As to how different the dialects of Dorasque are: A. L. Pinart's Vocabulario Castellano-Dorasque, Dialectos Chumulu, Gualaca y Changuina gives si (and ji) as the Chumulu form(s), and gives ti as the Gualaca form and ji as the Changuina form for "water". Overall, it's hard to tell how intelligible the dialects would be; some words are quite similar (Chumulu utká, Gualaca utkál "yellow"; Chumulu katuvá, Gualaca katavá "bow"), others are very different: Chumulu sagúsaña, Gualaca θake "blue"; Chumulu sérkala, Gualaca okiyigua "" (OTOH, all three dialects use sérkala for "bench"). "Woman" is biá in Chumulu and Changuina, ωiá in Gualaca. I suppose we can follow Pinart in entering it as one language. - -sche (discuss) 01:12, 10 August 2016 (UTC)


 * I've also added a code for and . - -sche (discuss) 21:08, 13 August 2016 (UTC)


 * I've also added three Shastan languages:


 * (nai-okw): Berkeley.edu's Indian Languages project says some (&lt;100) words were recorded in the early 20th century; the Handbook of North American Indians: California refers readers to Kroeber (1925) and Voegelin (1942); Glottolog cites Victor Golla California Indian languages (2011). Kroeber says "The dialect is peculiar. Many words are practically pure Shasta; others are distorted to the very verge of recognizability, or utterly different." Victor Golla, California Indian Languages, speculates at length that Okwanchu may have been "a bilingual mix of Shasta and some other language". There was a people "whom the McCloud River Wintu considered Wintu and called Waymaq ('north people') [whom] Du Bois believed [...] were closely related to, if not identical with, the Shastan Okwanuchu; the survivor of the group whom she interviewed gave her a short vocabulary that included words of Shasta origin (Du Bois 1935:8)." These words included atsa 'water', au-u 'wood', katisuk 'bring'. Golla also says "Okwanuchu speech may also be attested in [eighteen] words identified as 'Wailaki on McCloud' (cf. Wintu waylaki 'north people') that Jeremiah Curtin recorded" in 1884, namely: gü'ru 'man', ki'rikega 'woman', hänumaqa 'old man', apci 'old' (in ki'rikega apci 'old woman'), ä'toqe'äqa 'young man', kewatcaq 'young woman', tse'gwa 'one', hoka 'two', qätsi 'four', tseapka 'five' = 'one hand', hukaapka 'six', tsuwara 'sun', kapqu'[r]wara 'moon', kau 'snow', atsa 'water', gri'tuma 'thunder', itsa 'rock', tarak (Golla's note: "[terak?]") 'earth'. Golla notes: "Curtin employed the BAE transcription system, in which q represents a velar fricative, not a velar stop." "Of these forms, five (man, old man, four, thunder) are not attested in any other variety of Shasta."
 * (nai-knm): The Handbook of North American Indians: California says "An unpublished Konomihu word list was collected by Angulo (1928a)." Glottolog cites Shirley Silver Shasta and Konomihu (1980), Roland B. Dixon The Shasta-Achumawi: A New Linguistic Stock, with Four New Dialects (1905), Lars J. Larsson Who Were the Konomihu? (1987). Kroeber says "Kroeber says "it is still questionable whether their speech is more properly a highly specialized aberration of Shasta or of an ancient and independent but moribund branch of Hokan from which Karok and Chimariko are descended together with Shasta. [...] Konomihu is their own name." Silver's work documents a number of words, including kihínàpxī́k "woman".
 * (nai-nrs): "the language is attested only in a few wordlists" per Berkeley.edu's Indian Languages project. Kroeber, who mentions the alternative exonym Amutahwe, says they were "perhaps rather nearest to the major group in speech, although at that their tongue as a whole must have been unintelligible to the Shasta proper. [...T]he tribe melted away without a survivor, leaving only a fragmentary vocabulary."
 * - -sche (discuss) 23:11, 15 August 2016 (UTC)

Bole
The di- in its current name, Dibole, is one of those language prefixes, but curiously enough, the most commonly used name for this language is actually Babole, with a different prefix. (Luckily, unprefixed Bole isn't used, because it's taken up already by bol.) We should rename this to Babole accordingly. —Μετάknowledge discuss/deeds 17:32, 29 June 2016 (UTC)
 * If I know anything about Bantu languages, "Babole" refers to the people who speak it rather than the language itself. Is this really what they call it? —CodeCat 18:24, 29 June 2016 (UTC)
 * Yes, that's why I noted that it was odd. You're welcome to compare and  yourself. Not much work is done on it, but when it is, it's called Babole. —Μετάknowledge discuss/deeds 22:46, 29 June 2016 (UTC)


 * Indeed, searching turns up nothing relevant, and searching  and  finds only references to the Nigerian Bole. Whereas,  is well attested. Rename. - -sche (discuss) 03:11, 30 June 2016 (UTC)


 * ✅ - -sche (discuss) 00:08, 17 August 2016 (UTC)

Gandhari
We currently have this as "Gāndhārī". This spelling is indeed used frequently in the literature, but we try to avoid difficult diacritics, and this spelling can be seen as using IAST to render the native name of the language, whereas "Gandhari" is the corresponding English. I think that switching to "Gandhari" would be the better choice. —Μετάknowledge discuss/deeds 08:52, 5 July 2016 (UTC)
 * I agree, the IAST diacritics are unnecessary. —Aryamanarora (मुझसे बात करो) 15:34, 5 July 2016 (UTC)
 * Yes, rename. - -sche (discuss) 22:16, 5 July 2016 (UTC)
 * ✅ - -sche (discuss) 00:45, 17 August 2016 (UTC)

Ersu languages
We follow the ISO in covering this as a single macrolanguage, ers. Following Yu (2012), we should keep ers as Ersu, but also create sit-tos for Tosu and sit-liz for Lizu. —Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC)
 * ✅. And wow! What a writing system — the colour of the writing changes the meaning! - -sche (discuss) 20:52, 28 August 2016 (UTC)

More languages without ISO codes, part 3
{—Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC))


 * (sit-cha) ✅
 * (tbq-gkh)
 * ✅ as sit- because there is apparently disagreement over whether Tibeto-Burman comprises a monophyletic group, and if we later remove it in favor of sit, the fewer things we have to rename, the better. - -sche (discuss) 03:13, 29 August 2016 (UTC)
 * (sem-him) ✅
 * (awd-kar) ✅ as Cariay
 * (sai-mas) ✅ (as -msk to not conflict with the Mascoian family)
 * (sem-mhr) ✅
 * (cba-cat) — I think we should call this "Catío Chibcha" instead ✅
 * (sai-opo) ✅
 * (awd-pas) ✅; also spelled Passé, Pazé
 * (sai-qmb) ✅, but see discussion below
 * (nai-tap)
 * ✅, although this is another of those languages that is attested only in two (or in this case three) disparate wordlists, by Sapper, Ricke, and Johnston, all published by Lehmann, who considers them nonetheless the same language and presents them in two columns, "Johnston" and "Sapper-Ricke". - -sche (discuss) 19:45, 18 August 2016 (UTC)
 * (nai-teq) ✅
 * (nai-wai) ✅
 * (sai-yup) ✅, see comments below


 * Comments:
 * I've split the discussion so that ones that are done can be archived. - -sche (discuss) 19:28, 10 July 2016 (UTC)

Quimbaya
Loukotka, and Campbell citing him (and Wikipedia citing Campbell), says only a single word is attested, but none of them say what that word is. Other sources, none of which strike me as reliable, provide different(!) words, of which the most common is quindio (one book has quindus, one has kindiyo, parallelling the spellings Quimbaya - Kimbaya), meaning "paradise", but other words mentioned include chascará, batatabatí, and fihisca "spirit" (the last of which may actually be from some other language). proel.org says eight words are attested, but doesn't say what they are. Several books provide placenames which may attest the language. - -sche (discuss) 23:57, 20 August 2016 (UTC)

Yupua

 * An old source (Daniel Garrison Brinton, 1898) says "The Jupua and Curetu dialects are properly one and the same, the difference which appears in their vocabularies arising simply from inequality in the ears and the orthographies of observers. This is evident by the following comparison..." and then proceeds to offer a comparison which IMO doesn't actually conclusively demonstrate anything. In any case, Cueretú / Curetu itself does not currently have a code. - -sche (discuss) 04:20, 14 July 2016 (UTC)


 * I can find a number of sources attesting Yupua words, but none seem like reliable primary sources: Peruvia Scythica: The Quichua Language of Peru mentions "wui" as the word for house, but does so as part of trying to connect a huge number of unrelated languages based on chance sound correspondances. Brinton's 1898 Studies in South American Native Languages has a list of words which he sources to Martius (surely referring to the Wörtersammlung Brasilianischer Sprachen) and compares to Curetu words sourced to Wallace (surely referring to A Narrative of Travels on the Amazon and Rio Negro), but I haven't located the Yupua words in the originals; the words Brinton gives are: blood: thik (Yupua), dü (Wallace); bow: patopai, patueipei; earth: thitta, ditta; flesh, ga'hi', se'hea'; finger, moh-asoing, mu-etshu; fire, pieri, piure; flower, pagari, bagaria; foot, göaphoe, giapa; hair, poa, phoa; hand, moho, muhu; head, co'ëre, cuilri; house, wu'i, wee; mouth, thischüh, dishi; sun, hauvä, aoué; tongue, toro, dolo; tooth, gobâckaa', gophpecuh; water, thäco, deco; woman, nomöa, nomi; he also offers the additional Yupua words hóggoa "water" (sic), göaphae "foot", ga'hi "meat", jih "jaguar", ikama "deer", jocheo' "star". Ruhlen has manapẹ "husband / man", apara "we two", ti "this", -mai- "we", tsīngeē "boy", pilo "fire", poa "feather". - -sche (discuss) 05:29, 14 July 2016 (UTC)
 * Martius confirms thäco is "water" and pieri is "fire", and has wúi "house", pohjá "feather". - -sche (discuss) 07:54, 20 August 2016 (UTC)

Merging Berawan
It does not make sense to have both the macrolanguage (lod, which the ISO retired) and the dialects (zbc, zbe, zbw). We should either merge the dialects or retire lod. However, Berawan is composed of more than three varieties, which the literature often speaks of, though there are some works that use the same coarser division as Ethnologue. - -sche (discuss) 09:57, 1 September 2016 (UTC)
 * Being a "lumper" rather than a "splitter", I support merging the dialects into the macrolanguage. Dialect labels can be added with lb as needed. —Aɴɢʀ (talk) 11:24, 1 September 2016 (UTC)


 * Oddly, Ethnologue does not provide a claim of how mutually intelligible vs unintelligible they are. Blust, in The Consonants of Long Terawan, asserts four dialects, all spoken within 25 miles of each other: Long Terawan (Ethnologue's zbw) and Batu Belah (part of zbc), and Long Teru (part of zbc) and Long Jegan (zbe); he considers e.g. Long Pata (which WP names, and which is noted by Ray's 1913 Languages of Borneo) to be a variant of one of these, though he is nonspecific as to which one. When I compare Blust's Terawan data and Burkhardt's The historic evolvement of true triphthong phonemes in Long Jegan Berawan, the differences are not large, especially considering the obvious differences in style of notation: e.g. Blust has Terawan gitoh "hundred", Burkhardt has Jegan getoʔ (Blust noted some uncertainty in his transcription of vowels); Terawan dimmeh "five", Jegan dimmiᵊy; T buluh "feathers or body hair", J bullǝw, T iciw "day", J iciᵊw, T iko "tail", J eko, T puté "white", J pote, T depeh "fathom", J dǝppiǝ̈. A larger difference is Terawan lebbih "two" vs Jegan duβiᵊy, T puʔ "head hair", J pǝuk, T manoʔ "bird", J manǎuk. The only entry we have in the dialects is pi "water", which is (per our entry) the same in all of them. It does seem that these could be merged. - -sche (discuss) 18:19, 1 September 2016 (UTC)


 * ✅. - -sche (discuss) 05:33, 11 September 2016 (UTC)

Sanglechi [sgy] and Warduji [wrd]
It appears from that these names are synonymous. As "Sanglechi" is vastly more common, I move that we remove Warduji. —Μετάknowledge discuss/deeds 06:12, 6 July 2016 (UTC)
 * ✅. - -sche (discuss) 17:17, 18 September 2016 (UTC)

Even more languages without ISO codes, part 1
—Μετάknowledge discuss/deeds 04:54, 6 July 2016 (UTC)


 * (qfa-cap) [[Image:Symbol abstain vote.svg|20px|link=]] Not done
 * → I would prefer qfa-und-cap und-cap to avoid it appearing to be a family (qfa-) code, or ine-cap... or und-cap ^ ? And are words in the language attested? WP says no texts survive. - -sche (discuss) 06:08, 11 July 2016 (UTC)  - -sche (discuss) 17:57, 14 August 2016 (UTC)
 * I'm fine with three-part codes for a couple rare lects. As for attestation, I chose to put this language on the list chiefly based on The Expositor's Bible, vol. 35, p. 218, where it says: "Thus the ancient Cappadocian language is discussed and a lexicon of it compiled in a monograph which appeared in the Museum of the Evangelical school at Smyrna (1880-84), pp. 47—265." —Μετάknowledge discuss/deeds 06:28, 11 July 2016 (UTC)
 * The monograph in question seems to have been written by ; however, I'm having a very hard time tracking it down… — I.S.M.E.T.A. 15:13, 12 July 2016 (UTC)
 * Let's not add this, at least until we find positive evidence that it needs to be added. In addition to the statement that no texts survive, I see scholars such as Holger Pedersen savaging claims (by Karolidis, Kretschmer, et al) to have found relics of the language in Cappadocian Greek. - -sche (discuss) 18:13, 14 August 2016 (UTC)
 * (sai-chn)
 * ✅ as sai-cno since -chn was taken. In addition to the catechism, Fitz-Roy (1839) has three words, and Jose Garcia (1889) has three words, per Loukotka. - -sche (discuss) 06:18, 17 August 2016 (UTC)


 * (cel-ive)
 * Not done per discussion below, though I suppose an etymology code could be added. - -sche (discuss) 07:53, 9 September 2016 (UTC)
 * (sai-jko) ✅
 * For a language that's known only from a single wordlist, this sure has a lot of names... Jeikó, Geicó, Jeicó, Jaikó, Geikó, Yeikó, all of those with unaccented 'o', and Eyco... - -sche (discuss) 02:02, 20 August 2016 (UTC)
 * (und-kas) ✅
 * (sai-mal)
 * (sai-mar)
 * (qfa-und-mim) — another wordlist language lacking a good name
 * Presumably we also need, so I suggest und-mmd and und-mmn rather than -mim to keep them maximally distinct (even if Nachtigal's Mimi is considered Maban). - -sche (discuss) 17:51, 6 August 2016 (UTC)
 * ✅. - -sche (discuss) 21:48, 6 August 2016 (UTC)
 * (nai-nao) ✅
 * (nai-per) ✅


 * (art-dam) — this might be better off considered as a natural language instead so it can be in mainspace (--Meta)
 * If we want Damin in mainspace, I'd rather consider it a dialect of Lardil than a separate language. —Aɴɢʀ (talk) 20:43, 13 August 2016 (UTC)
 * Hmm, this is an interesting case. On the one hand, they seem to be mutually unintelligible, and speakers seem to refer to them as separate languages, and outside linguists seem to treat them as separate things (compare Eskayan)... and we do consider even such very similar, often-overlapping things as Norwegian and a slightly different orthography of Norwegian to be entirely separate languages. On the other hand, Damin is said — by researchers with access to its full vocabulary, as opposed to access to only a limited surviving corpus — to have a vocabulary of "perhaps no more than 250 words in all", which makes it hard to consider it a language, and does make it seem similar to pandanus avoidance registers or especially thick cant or jargon. According to Wikipedia, some markers are different, but others are the same, e.g. genitive -kan and future -ur. I suppose treating it as a dialect of sorts does make the most sense. - -sche (discuss) 17:35, 14 August 2016 (UTC)
 * Not done per the above discussion. - -sche (discuss) 08:51, 9 September 2016 (UTC)

also this: - -sche (discuss) 05:21, 29 August 2016 (UTC)


 * Mamluk-Kipchak (see Kiptchak mamelouk) (trk-mam? -mmk?) ✅
 * Water is ; the word also means 'tempering (of a sword)'. 'Ebçi' (apparently containing the occupational suffix '-çi'; ç = č) is 'woman', mentioned in one military treatise when it says that Indian swords are "useful only for hanging on the neck of a woman who cannot give birth to a son". 'sovuq' is 'cold' and 'sol' is 'left': andın songra sol egining ūzārā salgıl taqı boynuñ ūzārā tezgindūrgil "after that, place it over your left shoulder and make it rotate over your neck". See Munytu'l-Ghuzāt: a 14th-century Mamluk-Kipchak military treatise and Vocabulaire arabe-kiptchak de l'époque de l'État mamelouk. - -sche (discuss) 00:47, 31 August 2016 (UTC)

splitting rGyalrong
Based on their limited mutual intelligibility, and following the references on these languages (see Rgyalrong languages' bibliography), we should split jya, and its category and any entries, into four languages: - -sche (discuss) 05:11, 29 August 2016 (UTC)
 * Situ (sometimes called Eastern rGyalrong) (sit-sit?)
 * Japhug (sit-jap?)
 * Tshobdun (Caodeng, Sidaba) (sit-tsh)
 * Zbu (Rdzong'bur, Showu, Sidaba) (sit-zbu)


 * ✅. - -sche (discuss) 04:44, 4 October 2016 (UTC)

Merge Qashqai and Sonqor into Azeri?
Sonqor Turkic is an Oghuz lect spoken in northwestern Iran. The Turkic Languages by Fuchs Christian et al speaks of it, Qashqai/Kashkay (code qxq) and Afshar as "deviating" dialects of Azeri. The Oxford Handbook of Inflection, edited by Matthew Baerman, speaks of Sonqor as "a minority language of Iran heavily influenced by Kurdish". (The Handbook provides some Sonqor example words: ušaġ-ækæ-le' "child-SPEC-PL" "those children" (compare ušaġ to Azeri uşaq), mincuġ-ækæ-re "bead-SPEC-ABL" "of those pearls" (Azeri muncuq-), šéʕr-eke-sin-ne "poem-SPEC-POSS-ABL" "about that poem by him" (Azeri şeir-).) Other sources note that "Sonqor and others" "transitional dialects". How should Sonqor and Qashqai be handled? Ethnologue claims there are "significant differences [between North Azeri and] South Azerbaijani [azb] in phonology, lexicon, morphology, syntax, and loanwords", but we merged them (but not qxq) after this discussion, based on references saying "[the] dialects of Azerbaijani do not differ substantially. Speakers of various dialects normally do not have problems understanding each other" (heck, even Azeri and Turkish speakers do not have that much difficulty understanding one another). Gilles Authier, in New strategies for relative clauses in Azeri and Apsheron Tat, in Clause Linkage in Cross-Linguistic Perspective (2012, ISBN 3110280698, similarly says "There is an almost perfect mutual intelligibility between Azeri and Kashkai speakers. I tested this personally by submitting recordings to both audiences." It sounds to me as if Kashkay should be merged into Azeri, and as if we don't need a code for Sonqor. However, I'm pinging our only recently active editor who speaks some Azeri, User:123snake45. - -sche (discuss) 03:33, 11 September 2016 (UTC)
 * As you noted, only political/orthographic considerations compel us to keep Turkish and Azeri apart. I'd keep all these merged in Azeri. While we're at it, have we ever discussed whether the name "Azeri" should be changed to "Azerbaijani"? —Μετάknowledge discuss/deeds 03:46, 11 September 2016 (UTC)
 * BGC Ngrams shows a statistical dead heat between the two over the years. I personally prefer the shorter name. —Aɴɢʀ (talk) 12:51, 12 September 2016 (UTC)
 * I do not speak anything Turkic; but I keep getting the general impression we make too little use of the "X is a dialect of Y" functionality. This would be well-suited for handling cases where one variety is mostly intelligible with a bigger standard language, but has its share of its own quirks (loanwords, occasional diverging phonetics, etc). --Tropylium (talk) 13:32, 12 September 2016 (UTC)


 * ✅. - -sche (discuss) 05:06, 4 October 2016 (UTC)

Belize Kriol
The name we use, "Belize Kriol English", is certainly among the more uncommon names of this language. Much better would be just "Belize Kriol"; there might also be argument to be made for the more dated term "Belizean Creole" (which is the title of the Wikipedia entry). There is one entry which would have to be changed. —Μετάknowledge discuss/deeds 20:25, 3 April 2016 (UTC)


 * On Ngrams and in the 'raw' Google Books results, and when I check academic sites on Google and academic papers (  — most results for most non-famous languages' names are academic papers, dictionaries, etc), the most common name is "Belizean Creole" (which [//books.google.com/ngrams/graph?content=Belize+Kriol%2CBelizean+Kriol%2CBelizean+Creole%2CBelize+Creole&year_start=1970&year_end=2008&corpus=15&smoothing=3&share=&direct_url=t1%3B%2CBelizean%20Creole%3B%2Cc0%3B.t1%3B%2CBelize%20Creole%3B%2Cc0 doesn't seem dated]), then "Belize Creole", with "Belize Kriol" bringing up the rear. Approximately the same pattern holds when I page through the results looking only for primary reference works about this language in particular, as opposed to works that just mention it. There is some interference from the fact that "Belizean Creole" and "Belize Creole" are apparently also used to name the people who speak it and not just the language, but I suggest a rename to "Belizean Creole". - -sche (discuss) 18:27, 20 August 2016 (UTC)


 * Renamed to "Belizean Creole". - -sche (discuss) 17:40, 11 September 2016 (UTC)

Even more languages without ISO codes, part 2
—Μετάknowledge discuss/deeds 04:54, 6 July 2016 (UTC)

, since you are knowledgeable of Celtic languages: do you agree that the and  should be added? Should "Ivernic" be called that, or another name? And do you know if any words in it are attested? (It's not clear to me if ond and fern, which Wikipedia mentions, are Ivernic words, or words borrowed by some other language from (potentially slightly different) Ivernic words.) - -sche (discuss) 20:18, 12 August 2016 (UTC)
 * (cel-gal) — I thought we had this already; is it subsumed under something else? ✅
 * Celtiberian, perhaps? (But WP says they are distinct.) Or was its name "Hispano-Celtic" reminding you of Hiberno-Scottish Gaelic (ghc), which we deleted and subsumed into Irish and Scottish? - -sche (discuss) 06:57, 6 July 2016 (UTC)
 * Gallaecian and Ivernic
 * , AFAIK "Ivernic" is unattested; ond and fern are Old Irish words which Cormac mac Cuilennáin (who lived in the 9th century) believed to be of "Ivernic" origin. I don't think we need to add it. Gallaecian, on the other hand, is attested. If it were to be subsumed under anything else, it would be Celtiberian (xce), but I'd rather keep it separate. —Aɴɢʀ (talk) 20:38, 13 August 2016 (UTC)


 * (qfa-isa)
 * → I would prefer qfa-und-isa und-isa to avoid it appearing to be a family (qfa-) code, or ine-isa. - -sche (discuss) 06:08, 11 July 2016 (UTC)  - -sche (discuss) 17:57, 14 August 2016 (UTC)

Here are other languages we might need codes for: - -sche (discuss) 05:21, 29 August 2016 (UTC)


 * , a small mixed language (350 speakers, all also fluent in Warlpiri). I'm not sure if we should have this or not.
 * It's gotten a lot of press, but I lean toward saying it's not worth including — mixed languages are always a little hard to justify, and one that's just come into being is covered by Warlpiri and English sections well enough, I reckon. —Μετάknowledge discuss/deeds 05:30, 4 October 2016 (UTC)

Bushi [buc]
Apparently, we merged all the Malagasy dialects except this one. I don't see any reason to keep it separate at this point (and I'm not even sure that this oversight merits an RFM). —Μετάknowledge discuss/deeds 06:19, 28 September 2016 (UTC)


 * It apparently has its own orthography and is spoken on a different island, which might have motivated a non-merger, although I can find no discussion of it so it seems most likely that it simply went entirely unnoticed. Reconstruction:Proto-Austronesian/daNum and water make use of it. The only references I can find that mention it do speak of it as a dialect of Malagasy. Merged. - -sche (discuss) 01:24, 4 October 2016 (UTC)

renaming Pu Xian [cpx]
I propose renaming  to either "Pu-Xian", "Puxian", "Pu-Xian Min", or "Puxian Min". (at least remove the space, per ). I do not know which name is most common, but Wikipedia's article is titled. —suzukaze (t・c) 09:46, 15 August 2016 (UTC)


 * On Google Books, "Puxian" and "Pu-Xian" seem to be the most common names, about equally common: but on Google Scholar, about 3.5 times as many works use "Puxian" as "Pu-Xian", so I will rename the language to "Puxian". "Pu Xian" does get a few hits. Incorporating "Min" ("Puxian Min") seems to be uncommon. Another attested spelling (not common) is "Putian". Wikipedia says Xinghua and Hinghwa are additional alt names. - -sche (discuss) 13:50, 1 November 2016 (UTC)
 * ✅. - -sche (discuss) 13:57, 1 November 2016 (UTC)

Renaming Timne
Searches for "speak X" and "X language" on Google Books reveal that [tem], which we currently call "Timne", is referred to as "Temne" far more frequently than by any other spelling. The native spelling, (or the ASCII-friendly Themne), sees extremely limited use. See also Temne language. —Μετάknowledge discuss/deeds 05:40, 4 February 2017 (UTC)


 * I support changing what we call the language to . — I.S.M.E.T.A. 15:32, 15 February 2017 (UTC)


 * ✅. - -sche (discuss) 17:56, 8 March 2017 (UTC)

Some more missing South American languages, 1
Here are a few more South American languages for which we could add codes: - -sche (discuss) 21:18, 16 August 2016 (UTC)
 * Cacán (Kakán, Diaguita) was documented, but the document was then lost, so it seems unnecessary to add.
 * Update: ✅: Some words are recorded elsewhere and collected by Nardi (1979). - -sche (discuss) 14:28, 29 August 2016 (UTC)
 * (sai-crd), of which [//books.google.com/books?id=Ew9IAQAAMAAJ Revista de la Universidad Nacional de Cordoba], volume 6, includes two quite disparate wordlists, with "sun" alternately given as Hope or Aaam, "moon" as pethara or kesha, "water" as something illegible (nhanan?) or gioi. The [//books.google.com/books?id=PvgrAAAAYAAJ Viagem a Curitiba e Província de Santa Catarina] also has some words, including goio "water". Greenberg, from who knows where, sources ope as "sun", oron as "new moon", and also e.g. čama "animal", puara "chest", teke "belly", bo, ambo "tree", ke "wood, fire", tong, ton "nape of the neck" (compare Greenberg's Puri thong), uka, høka "stone" (compare Puri ukhua), tima "love/want/enjoy" (compare Puri tamathi), baj "to live".
 * (sai-men), attested in a Vocabulario Menien by Príncipe von Wied. He wrote in German, was translated with only a few alterations into French, and then was translated badly with a number of serious errors into Portuguese (says Loukotka, giving some examples from all three editions). Words (agreed on by the German and French editions) include hioi "wax", koinin [sic not h-] "child", and kohira [sic not c-] "red".

More languages without ISO codes, part 4
{—Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC))


 * (cba-cor) [[Image:Symbol abstain vote.svg|20px|link=]] Not added; see notes below
 * Incidentally, there is also a, although we could possibly enter that as Miskito Coast Creole (<tt>bzk</tt>) plus a context label. - -sche (discuss) 00:19, 10 August 2016 (UTC)
 * (sai-cum) [[Image:Symbol abstain vote.svg|20px|link=]] Not done yet; see notes below
 * (sai-gch) [[Image:Symbol abstain vote.svg|20px|link=]] Not done yet; see notes below
 * (sai-gnc) [[Image:Symbol abstain vote.svg|20px|link=]] Not done yet; see notes below
 * Curiously (and unhelpfully, but ultimately irrelevantly), the Real diccionario de la vulgar lengua guanaca (Joaquín Meza, 2008) is not a dictionary of this language, but (according to its description) of vulgar Salvadoran Spanish. - -sche (discuss) 21:06, 12 August 2016 (UTC)
 * (paa-kwn) [[Image:Symbol abstain vote.svg|20px|link=]] ✅; see notes below
 * (awd-mar), (awd-wai) — considered the same language by some sources ✅
 * (sai-mue) ✅
 * (awd-yum) ✅


 * Comments:
 * I've split the discussion so that ones that are done can be archived. - -sche (discuss) 19:28, 10 July 2016 (UTC)


 * Note that some discussion of Yumana, Mariaté (awd-mrt), Wainumá, Wiriná, Guinau, Baré, and Marawá is found on User talk:-sche. - -sche (discuss) 04:40, 21 November 2016 (UTC)

Corobici

 * Whether Corobici existed as a separate language, and whether words in it are attested, is unclear to me (and apparently to a lot of writers).


 * The 1911 Indian Languages of Mexico and Central America says:
 * "Gomara (1: 264; 2:457) and Herrera mention a tribe (the Corobici) which seems to be identical with the Mangue (or Chorotega). The latter author says (II, dec. 3, 121), "Hablaban en Nicaragua cinco Lenguas diferentes, Coribici, que lo hablan mucho en Chuloteca," etc. <tt>[= "They spoke in Nicaragua five different languages: Corobici, who speak a lot in Chuloteca,"]</tt> Nevertheless, Peralta thinks the Coribici were the ancestors of the Guatuso (see below). It would seem that Mangue is a comprehensive term precisely equivalent to Chorotega, properly used, that is, to include the Chiapanecan element in this region &mdash; Choluteca, Dirian, and Orottinan."
 * However, (code: <tt>mom</tt>) and Corobicí are in different language families, so an equation of the two is very suspect.
 * Other sources quote Gomara (sic, not Herrera as above) as saying "There are in Nicaragua five very different languages: Corobici, which is praised much;" etc.


 * Native-languages.org, based on who knows what, identifies Corobicí as <tt>bzd</tt>. They give four words, which I would not trust without other confirmation: tun "man", si "water", unsat "earth" and guá "house". (For comparison, Wikipedia gives the following Rama words: kiikna "man", kumaa "woman", nguu "house", sii "water".)
 * The 2013 Native Languages of the Americas, volume 2, citing Mason, envisions a "Rama-Corobici" family which includes the two languages "Rama" and "Guatuso (incl. Corobici)"; the Classification and Index of the World's Languages by Charles Frederick Voegelin and Florence Marie Voegelin also includes Corobici in Guatuso. Guatuso is also called Maléku Jaíka, and Ethnologue says it's specifically not intelligible with <tt>bzd</tt>.
 * However, the 1950 Handbook of South American Indians says "Guatuso, with its variety Corobici or Corbesi, and Rama with its dialect Melchora, are obviously very different from each other and from other Central American Chibchan languages, and Mason (1940) was evidently in error in making a Rama-Corobici subfamily." It goes on to place Guatuso and Rama in separate (sub)families.
 * Comparative Chibchan Phonology (1981), speaking about someone who has "gives a small sample of" Corobici, says "The 'Corobici' words are [actually] words from the dialect of Rama that was spoken in the region of Upala, Costa Rica, up to the 1920s. Really, not a single word of the language of the Corobicies (an extinct group) was recorded". An 1890s report by Daniel Brinton to the US Congress agrees, "Nothing remains of the Corobicies or Corvesies except the name Corobici or Curubici" used in placenames.
 * Other authors say the Corobici were the ancestors of both the Rama and Guatuso. Tozzer says "The modern Guatuso are probably descendants of, and synonymous with, the ancient Corobici." Lothrop says "It is generally assumed that the Rama were once a tribe identical in language and speech with the Corobici."
 * - -sche (discuss) 21:55, 9 August 2016 (UTC)


 * Therefore, not added. - -sche (discuss) 22:00, 2 November 2016 (UTC)

Cumana
It's hard to find information on this Chapacuran language, firstly because it's not discussed much in the literature, and secondly because it has the same name, in the same range of spellings, as the Cariban language of the Cumanagotos (cuo). Some sources speak of it as a dialect of Abitana (Wanham), and Loukotka's wordlist is quite similar to his Abitana wordlist; indeed, the two are more similar than some wordlists of the same language (e.g., of Peba) are to each other. Glottolog only has resources on Kuyubi / Kaw Ta Yo, and although the equation of that with Cumana is non-obvious, it would at least be a less confusing name (any spelling of Cumana is likely to be misinterpreted as cuo). - -sche (discuss) 03:36, 31 August 2016 (UTC)

Guachí

 * There may be two lects called Guachí. Wikipedia places a Guachí / Wachí, which it considers possibly Guaicuruan, in Argentina. Glottolog places a possibly-Guaicuruan Guachí in southwestern Brazil, in Mato Grosso do Sul, where Opaie is also spoken. The index to Čestmír Loukotka, Johannes Wilbert, Classification of South American Indian Languages (1968), says they describe Guachí is two places: pages 51-52, which I can't see, and page 66, which says: "Opaie or Ofaie-Chavante - spoken on the Ivinhema, Pardo and Nhandui Rivers in the Brazilian state of Mato Grosso, now by only a few individuals. [Nimuendaju in Ihering 1912, pp. 256-260; Nimuendaju 1932b, pp. 567-573; Ribeiro ms.a.] The so-called language 'Guachi' on the Vaccaria River in the same state is only a dialectal version of Opaie. [Nimuendaju 1932b, pp. 567-573.]" Opaie is not Guaicuruan. It is possible that pages 51-52 describe a different lect. It is separately possible that the more recent (2004) Guaicurú no, macro-Guaicurú sí: Una hipótesis sobre la clasificación de la lengua Guachí (Mato Grosso do Sul, Brasil) knows more than Nimuendaju in 1932 and Loukotka in 1968 about the separateness and familiar relationships of Guachí. [//www.adilq.com.ar/MACRO-GUAICURU-MACRO-JE.pdf] lists two Guachí words, [iava] ‘diente(s)’ and [iacté] ‘pierna’. - -sche (discuss) 04:46, 11 July 2016 (UTC)

Guanaca
I'm having trouble finding evidence that this lect was recorded. Greenberg says "In Paezan, we find Guambiana, Guanaca, and Totoro with the noun plurals -ele, -el, and -le, respectively", but existence of the Paezan family has been discredited and it's not clear where Greenberg gets his Guanaca data; Glottolog apparently doesn't list the lect and I have not yet found anyone other than Greenberg who mentions words in it. Loukotka cites Castillo y Orozco's 1877 Vocabulario páez-castellano as a source for the lect, but I haven't found it in that work; maybe it's under an alternative name or a spelling I couldn't think of to check? I've found the spellings Guanaca, Guanáca, Guanaco, Guanuco, Guanukco(?), Wuanaka mentioned or used in various sources. Huanca and Huanuco are names of a variety of Quechua (whereas WP says Guanaca is "perhaps Barbacoan"). - -sche (discuss) 15:51, 18 September 2016 (UTC)

Kuwani
I'm ambivalent about this. WP says it's known only from one wordlist "and even its exact location is unknown. [And] Smits and Voorhoeve (1998) assumed it to be equivalent to Kalabra". There are, as WP notes, lexical differences, but compare other languages attested (with more certainty) in multiple wordlists that show great differences, including Tapachultec and Coroado Puri. OTOH, we could include it and just merge it later if more information came out that it was Kalabra. - -sche (discuss) 01:44, 20 August 2016 (UTC)
 * With wordlist-only languages that thus have small, finite lexica, it's best to be conservative and keep them separate when there's no clear case for merger. —Μετάknowledge discuss/deeds 04:39, 13 September 2016 (UTC)
 * OK, added. - -sche (discuss) 22:00, 2 November 2016 (UTC)

Baïnounk Gubëeher
This obscure lect does seem like a separate language, but wasn't described until Cobbinah (2013). The Wikipedia article follows his lead and calls it thus, but it seems odd for us to do so, given that all the other Bainouk languages are given names starting with Bainouk. We also need to decide on a code, perhaps. —Μετάknowledge discuss/deeds 00:19, 2 April 2016 (UTC)


 * Added. Wikipedia says Gubëeher (which I can also find referred to as that single word, or with the alternative "prefixes"/first-elements Nyun Gubëeher or Nun Gubëeher) is merely closely related to the two Bainouks, so it's probably tolerable that the spelling is Baïnounk here and Bainouk there. Incidentally, WP citing Glottolog says bcb is a duplicate code for an alternative name of bab. Should they be merged? and should we drop the "Bainouk"/"Baïnounk" prefixes? - -sche (discuss) 03:25, 13 September 2016 (UTC)


 * Incidentally, there's a book Le gúbaher, parler baïnouck de Djibonker by Noël Bernard Biagui which apparently uses yet another spelling. - -sche (discuss) 14:52, 18 September 2016 (UTC)

Antigua and Barbuda Creole English
We currently call this "Antigua and Barbuda Creole English" — names containing <tt> and </tt> are suspect, in my opinion. Wikipedia calls it "Leeward Caribbean Creole English". The most common name in literature seems to be "Antiguan Creole", which is what I suggest renaming it. - -sche (discuss) 03:22, 19 April 2016 (UTC)
 * The country is called Antigua and Barbuda, so I don't see anything suspicious about the name per se. Nevertheless, I prefer Wikipedia's name, since it's also spoken in Montserrat, Saint Kitts and Nevis, and Anguilla. —Aɴɢʀ (talk) 17:53, 19 April 2016 (UTC)
 * I concur with Angr, although I have hesitations about changing it. The use of "Antiguan Creole" seems to be mainly by scholars who are just studying the lect spoken on that island and not worrying about what is and isn't considered the same language. —Μετάknowledge discuss/deeds 04:12, 26 June 2016 (UTC)


 * Meh, left as-is. - -sche (discuss) 06:45, 27 March 2017 (UTC)

Kela-Yela language
This language got encoded twice, once as Kela [kel] and once as Yela [yel], based on its two names in two different provinces of DR Congo. Wikipedia opts for "Kela-Yela" as the compromise name, and I can't find a better one, unfortunately. (This also opens up "Kela" for [kcl], which we currently call "Kala", though that name seems to be less commonly used.) —Μετάknowledge discuss/deeds 06:00, 4 October 2016 (UTC)


 * I've merged yel into kel. - -sche (discuss) 04:20, 27 March 2017 (UTC)

Sena language merger
Sena is mutually intelligible with Chichewa, so it really shouldn't be considered a separate language, but that's probably not worth a merger, as the speakers seem to disagree. What the speakers do seem to agree with is that there is one Sena language, despite it having gotten two ISO codes, seh in Mozambique and swk in Malawi, which we call "Sena" and "Malawi Sena" respectively. We should probably just do away with swk. —Μετάknowledge discuss/deeds 23:48, 8 October 2016 (UTC)


 * I've merged swk into seh. No action taken with regard to Chichewa at this time. - -sche (discuss) 04:15, 27 March 2017 (UTC)

Old Uighur language (oui) to Old Uyghur
Spelling is inconsistent (we spell the modern language Uyghur). DTLHS (talk) 23:28, 26 December 2016 (UTC)


 * Renamed. It seems to be as common or perhaps more common to spell it with a y, although searchign is complicated by chaff from the SOP phrases "old Uighur", "old Uyghur". - -sche (discuss) 01:53, 28 March 2017 (UTC)

Chimwiini
This is pretty decidedly too distinct to be considered a dialect; I was just reading a grammatical description and it's heavily different from Swahili. Maho, Kisseberth, and others seem to agree that it is a separate language. As for the name, it seems that "Chimwiini" (sadly, with the language prefix) is a better option than "Bravanese", used at Bravanese dialect. I suppose the code could be bnt-mwi. , I noticed this because of muke; what other Chimwiini entries did you add? —Μετάknowledge discuss/deeds 03:28, 25 March 2017 (UTC)
 * I know what your reasoning is for choosing  as the language code, but I think that it's better if the code matches the English name, thus   or  . —CodeCat 18:56, 25 March 2017 (UTC)


 * I agree it merits separation; indeed, I see articles going back to at least the 1960s pointing out that "Certainly Bravanese and standard Swahili are not mutually intelligible" (Morris Goodman's 1967 Prosodic Features of Bravanese, J. of African Languages 6). And yes, for better or worse codes have tended to be named based on the English names. Did you search for "Mwini" with one i? I find roughly comparable numbers of Google Books hits for <tt>"Mwini" Swahili</tt>, <tt>"Chimwiini" Swahili</tt> and <tt>"Bravanese" Swahili</tt>. (Ngrams, catching I-don't-know how much chaff, finds Mwini to be the most common spelling recently, followed by Bravanese — the most common spelling until recently — and then Mwiini, with Chimwiini not common enough for it to plot.) It looks like we could avoid the prefix if you wanted, though I defer to you if you checked more literature than the cross-section Google has digitized. Google Scholar has ~120 articles on <tt>"Chimwiini" Swahili</tt> compared to ~90 for <tt>"Mwini" Swahili</tt>, some of which are the irrelevant last name Mwini, and also ~90 for <tt>"Bravanese" Swahili</tt>, but many are bibliographies mentioning the same few articles. Other spellings I see, besides Chimwiini, Bravanese, Mwiini, and Mwini, include Chimwini, Chimini, and Brava. - -sche (discuss) 23:37, 25 March 2017 (UTC)
 * In my own files, and likely elsewhere, the Chichewa word gets in the way. But in any case, most of the works I see dedicated to the language that aren't on the older end use "Chimwiini". Anyway,   is fine. —Μετάknowledge discuss/deeds 23:43, 25 March 2017 (UTC)


 * I've added the code . Btw, I didn't add muke: that was User:Muke themself all the way back in 2006. :-p - -sche (discuss) 03:06, 27 March 2017 (UTC)
 * I'm shocked that it wasn't part of your water-and-woman effort. I guess I'll add the word for water now. —Μετάknowledge discuss/deeds 03:25, 27 March 2017 (UTC)

Aramaic
The same language in two different scripts with two different literary standards, and yet each quite similar to each other. I think I see a pattern. —Μετάknowledge discuss/deeds 07:49, 21 December 2012 (UTC)


 * "Arc" should only be used for "Imperial Aramaic" (aka "Official Aramaic"), ideally written in the Old Aramaic script rather than Hebrew. Current usage does not reflect that though and there are a whole mix of dialects intertwined within the "arc" code, so that one at least should be split. "Syc" should stay as it is. --334a (talk) 08:04, 21 December 2012 (UTC)


 * I think SIL has done a really bad job with the classification of Aramaic languages. ARC was an umbrella code that was used to describe all later Aramaic varieties in ISO-639-2 in ISO 639-3 they introduced SYC for classical Syriac, by far the most widespread form of literary Aramaic.--Rafy (talk) 17:38, 28 December 2012 (UTC)

East Frisian lects
This is an issue that needs to be resolved soon: there are 90 module errors related to it.

First let me lay out the linguistic details:

East Frisian, a language of Lower Saxony in Germany, is, along with West Frisian in the Netherlands and North Frisian in Germany, one of the Frisian languages. It has historically consisted of two dialect groups: one near the Ems River, and the other near the Weser River. The Weser dialects are all now extinct, with the Wursten Frisian dialect surviving into at least the 1700s, and the last speaker of Wangerooge Frisian dying in 1953. Although most of the Ems dialects died out by the 18th century, Saterland Frisian has survived to the present day. The extinct dialects were absorbed into Low German to become East Frisian Low Saxon.

The ISO has typically made a massive mess out of the Germanic languages, and they really screwed up in this case- until recently, it was impossible to tell if their code for "East Frisian", frs, referred to Frisian East Frisian or to East Frisian Low Saxon. If the first were true, it would overlap with Saterland Frisian, stq. If the second were true, it would be just another of the Low German lects that we decided earlier to treat as German Low German, nds-de (not Dutch Low Saxon, nds-nl, in spite of having "Saxon" in the name). Because of this ambiguity, the frs code was pressed into service for Frisian East Frisian in upwards of 140 etymologies and translations (there might have been entries, too, but I have no way to tell).

A few weeks ago, I mentioned to User:-sche that the online description of another lect had been updated. In the process of checking this out, he discovered that frs was now unambiguously described as East Frisian Low Saxon, and thus redundant to nds-de. After making the usual checks of the categories and changing uses of frs that he knew about, he removed frs from the data module. Unfortunately, East Frisian is mostly only mentioned in etymologies as cognates, and cognates don't show up in any of the categories, nor do redlinked translations. User:Leasnam (who added most of the frs references in the first place), -sche and I have been able to whittle it down from 137 entries in Category:Pages with module errors, to the present 90 just by getting rid of unnecessary ones and by changing recognizable instances of Saterland Frisian and East Frisian Low Saxon to the correct language codes.

As I see it, there are two halfway-decent options:
 * 1) Merge all of Frisian East Frisian into Saterland Low Frisian, stq, since the latter is the only surviving dialect of the former.
 * 2) Create an exception code for Frisian East Frisian, such as gmw-efr or gmw-fre.

There's also the possibilty of restoring frs as Frisian East Frisian, but that would put us in direct contradiction to the ISO standard, and leave things open for all kinds of confusion.

The first probably fits the linguistic facts best, but the second may be more practical, at least in the short run.

I've found no references on East Frisian Low Saxon, and very little on Saterland Frisian (there's a Saterland Frisian Wiktionary, but most of the remaining terms aren't mentioned there). There is, at least, one good dictionary of Frisian East Frisian available online. Those with better sources apparently don't have the time to work on this right now. Chuck Entz (talk) 02:41, 11 April 2015 (UTC)
 * I don't think restoring frs is a good idea for the confusion you mentioned. I think option 1 is the best; we would then treat Saterland Frisian as the main dialect of East Frisian. Option 2 would just introduce more ambiguity, since one language would suddenly become a part of another. This is why we eliminated the Low German varieties in the first place. —CodeCat 17:28, 11 April 2015 (UTC)
 * Yes, it wouldn't make sense to have both "East Frisian" and "Saterland Frisian" (they are not sufficiently distinct), so option 2 would only make sense if we changed instances of stq to the new pan-East-Frisian code and -ed them. But we do need to recognize that there are inclusible East Frisian words which are not Saterland Frisian (namely, all the words and forms that we know — from records — existed in the non-Saterland varieties of East Frisian). Precedent exists of us repurposing codes to refer to slightly more things than the ISO intended them to refer to, e.g. we use gcf to refer to both gcf and acf and we used en to refer to both en-proper and hwc (Hawai'ian Creole English) and pld (Polari). Hence, I would add "East Frisian", "Eastern Frisian" etc as alternative names of stq, and then change all remaining uses of frs to stq to solve the module errors. Then, at leisure, we can go back through the affected pages and specify, whenever possible, which precise dialect they are from. (It may look a bit ugly to have e.g. "Wursten Saterland Frisian foo" or "Saterland Frisian foo ", but it's probably the best we can do, since changing the canonical name of stq to "East Frisian" would just invite people to become confused about it and misuse it again.) - -sche (discuss) 16:58, 13 April 2015 (UTC)
 * By the way, with all these module errors on highly visited pages, we can't really wait for this to go through RFM procedure. I think whoever sees this next, if (s)he has the time, should implement -sche's temporary solution. (I myself will do it if I can get my work done in meatspace quickly enough.) —Μετάknowledge discuss/deeds 06:36, 16 April 2015 (UTC)

Weyto language
This language's Wikipedia article helpfully explains that this extinct language of Ethiopia is unattested. However, since Amharic is an LDL and wordlists of the Weyto dialect of Amharic have been made that contain words purported to derive from it, we should make it an etymology-only language code. —Μετάknowledge discuss/deeds 02:23, 4 April 2017 (UTC)
 * I wonder why the SIL/ISO assigned a full code in the first place to an unattested extinct language of unclear family association that is not even certain to have existed... - -sche (discuss) 03:16, 4 April 2017 (UTC)
 * I dunno! But after my efforts above on this page to identify new codes that need adding, I reckon I need to do some work identifying codes that need removing. —Μετάknowledge discuss/deeds 03:22, 4 April 2017 (UTC)


 * ✅. - -sche (discuss) 06:59, 12 May 2017 (UTC)

Dupaningan Agta
At present, Dupaningan Agta is the language evoked by language code duo, but Dupaninan Agta is also a valid spelling. User:Mar vin kaiser has been adding entries in this language under Dupaninan Agta. —Stephen (Talk) 09:57, 5 April 2017 (UTC)
 * The only reason though why I used Dupaninan Agta instead of Dupaningan Agta is because that's the spelling that came out of Wiktionary when I inputted duo. --Mar vin kaiser (talk) 10:18, 5 April 2017 (UTC)
 * That's true: duo yields "Dupaninan Agta", as listed at Module:languages/data3/d. —Aɴɢʀ (talk) 11:38, 5 April 2017 (UTC)
 * Which, in turn, is because that's the spelling the Ethnologue/SIL/ISO used when we copied codes over, years ago. But the spelling with g seems slightly more common. Should it be made the canonical spelling, with the other one retained as another name? The names without "Agta" (‎i.e. "Dupaninan", ‎"Dupaningan") should also be, since they are sometimes encountered. - -sche (discuss) 17:04, 5 April 2017 (UTC)


 * I have updated things so that the more common spelling is the one used in all the entries and translations tables and categories now: Dupaningan Agta. I'm sorry if this causes you to have to relearn how to type the header, etc. The "joys" of muscle memory! :-p - -sche (discuss) 07:27, 10 May 2017 (UTC)

Gail/Gayle (gic)
The is a gay argot lexicon that can be used in English or Afrikaans. Words should be under whatever language they are found in, as argots are not independent languages, and this code should be removed. We have precedent for excluding gay argots based on our deletion of the code for (discussion, most of which is off-topic). —Μετάknowledge discuss/deeds 06:09, 6 July 2016 (UTC)
 * Removed. It can be reinstated as an etymology-only code if the need arises. —Μετάknowledge discuss/deeds 02:52, 16 May 2017 (UTC)

the Kewa lects
I propose to merge [kjy] "South Kewa"/"Erave Kewa", [kjs] "East Kewa" and [kew] "West Kewa"/"Pasuma Kewa" into [kew] as "Kewa". AFAICT most literature treats Kewa as a single language, and the only effect having three codes has had upon us so far is that our Kewa content is duplicated under several headers (as in ipa and utyali). There seems to be a far more marked difference between normal Kewa and its pandanus avoidance register than between South, East and West. - -sche (discuss) 06:00, 11 August 2015 (UTC)
 * ✅. - -sche (discuss) 05:37, 14 May 2017 (UTC)

(Proto-)Western Malayo-Polynesian into (Proto-)Malayo-Polynesian

 * [pqw-pro] → [poz-pro]

As discussed earlier this year. Western Malayo-Polynesian is a solely geographic group, it is not recognized by our language categorization system, and a proto-language appendix seems to be superfluous.

In addition to the mentioned durian issue, we have currently 13 Proto-WMP lemmas, with a breakdown as follows: --Tropylium (talk) 15:03, 29 June 2015 (UTC)
 * 10 entries are fully identical to corresponding Proto-MP entries (e.g. =,  = )
 * 2 entries are reconstructed from very scarce data, and the most likely situation is that they were just randomly lost in Central-Eastern MP.
 * 1 entry is, per the cited source (Blust's dictionary), probably a late Wanderwort originating in Malay(ic) and not inherited.
 * Even if there are regional differences, you don't have to resort to separate protolanguages to explain them: for one thing, the substratum languages encountered off of Southeast Asia had to have been vastly different from those farther east. As for animal (and to a lesser extent plant) names, there's the matter of the Wallace Line and other such boundaries: the farther east you go, the fewer non-marine Asian species you find. By the time you get to Polynesia, the only flightless land animals (New Zealand is an exception, or course) are human-transported creatures such as pigs, chickens, dogs and rats, and the only widely-distributed plants are those with seeds that can drift on the currents, or Polynesian canoe plants- if you don't have wild beasts, you're not likely to preserve inherited names for them. Chuck Entz (talk) 02:30, 30 June 2015 (UTC)


 * I've started to merge these. - -sche (discuss) 05:14, 29 May 2016 (UTC)


 * ✅. I've moved the remaining 8 PWMP appendices and updated the entries that referred to PWMP in their etymologies. - -sche (discuss) 09:39, 15 May 2017 (UTC)

Renaming Kxoe
There's a bit of noise on BGC, but it seems that "Khwe" is more common that "Kxoe" for [xuu], and continues to be used in modern books. See Khwe language. —Μετάknowledge discuss/deeds 06:12, 4 February 2017 (UTC)


 * Perusing Glottolog's large bibliography for this language, I note that Kxoe predominated and Kxoe was little used until about the year 2000, after which Khwe has predominated and Kxoe has been little used. Wikipedia uses Khwe and mentions that it is the preferred spelling. Renamed. - -sche (discuss) 07:51, 22 May 2017 (UTC)

More languages without ISO codes, part 5
{—Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC))


 * (sit-mor) ✅
 * (sai-nak) — not entirely sure this is separate from Krenak
 * The Encyclopedia of the World's Endangered Languages and The Indigenous Languages of South America: A Comprehensive Guide treat this as an alternative name of Krenak or a dialect. Not added at this time. - -sche (discuss) 22:47, 12 May 2017 (UTC)
 * (ath-nic) ✅
 * (tbq-plg) ✅
 * (sit-prn) ✅
 * (sai-sin) ✅
 * Also called Cenúfana, Zenúfana, Cinifaná, Sinufana, Sinú, Cenú, Zenú; dialect names Fincenú, Finzenú, Pancenú, and supposedly Sutagao. Added; see my comments below. - -sche (discuss) 05:21, 11 May 2017 (UTC)
 * (map-sor) — I can't actually find any material on this one
 * All I can find is the claim that it had 3 speakers. Not added at this time. - -sche (discuss) 22:47, 12 May 2017 (UTC)
 * (nai-taw) ✅
 * The evidence for this language is quite sketchy — it's only known from one wordlist by someone who may not have spoken Tawasa at all — but it's extinct if it existed at all. I suppose it can be added and then its (the wordlist's) entries' etymologies and those of Timucua and Muskogean entries' etymologies can discuss each other. - -sche (discuss) 15:51, 18 September 2016 (UTC)
 * Added. - -sche (discuss) 04:15, 1 May 2017 (UTC)
 * (sai-wai) ✅
 * (also called Goytacaz per Loukotka, Goyataca per Campbell, Guaitacá, Waitacá per others, and a host of other names)
 * (aus-wul) ✅
 * (sai-yar) — this name may cause confusion with Suyá suy, which also referred to by this name
 * ✅ Glottolog syas it ias attested and definitively a different family than 'suy' - -sche (discuss) 22:45, 23 May 2017 (UTC)


 * Comments:
 * I've split the discussion so that ones that are done can be archived. - -sche (discuss) 19:28, 10 July 2016 (UTC)
 * I've added Pai-lang and Nicola. - -sche (discuss) 06:31, 30 March 2017 (UTC)
 * I've added Moran. - -sche (discuss) 04:15, 1 May 2017 (UTC)

Sinúfana
The variety of names this language has and their homography with the names of the people and the place they lived, has made it hard to search for information on this language. And some scholars say there is nothing to find: The Indigenous Languages of South America (edited by Lyle Campbell and Verónica Grondona) quotes "Adelaar and Muysken (2004: 624): [Sinu] 'cannot be classified for absence of data.' Loukotka (1968: 257) grouped Zenú (Senú) with the Chocó Stock, though nothing was known of the language." However, I've been poking around for quite a while now, and finally found a copy of A. Oyuela-Caycedo's full article in the Handbook of South American Archaeology (edited by Helaine Silverman, William Isbell), which I'd previously only been able to see a page of. On the page I'd seen, Oyuela-Caycedo says "the descendants of the Sinú have lost their language, making it impossible to classify them in terms of known linguistic families (Adelaar and Muysken 2004). However, taking toponyms into consideration the area seems to have been occupied by Chibchan speakers." On the next page, however, Oyuela-Caycedo goes on to discuss Spanish records and mentions that the "name or title" of the Finzenú chief was "Tota", with which clue I tracked down Jaime Alberto Castro Núñez's Historia de la medicina en Córdoba: notas preliminares (2002), mentioning a few other words and names: "A la llegada de los cappunia, como los indios llamaban en su lengua a los españoles, la cacica de Finzenú era Tota; el Zenúfana era Nutibara y el Panzenú era Yapé." "At the arrival of the cappunia, as the Indians called in their language the Spanish, the chief of Finzenu was Tota, that [of] Zenúfana was Nutibara and that [of] Panzenú was Yapé." (Other mentioned names are Anunaybe and Quinuchu.) The ambiguity over whether Tota was a proper name or title appears to be original; I find what seems to be a copy of Sebastián de Benalcázar's writings, saying "[...] Tota, nombre que no sabemos si era el de su cargo o propio de la última reina de este riquísimo pueblo". - -sche (discuss) 05:21, 11 May 2017 (UTC)

Even more languages without ISO codes, part 3
04:54, 6 July 2016 (UTC)


 * (sai-and)
 * I've added the code and a word in it. Ethnologue confuses it with cbc, and it also needs to be distinguished from Andoque (ano). - -sche (discuss) 01:18, 24 May 2017 (UTC)
 * (sai-bae) ✅
 * (sai-ctq) ✅
 * Australian languages
 * (aus-alu) ✅
 * (aus-and) ✅
 * (aus-ang) ✅
 * (aus-bra) ✅ (as Barranbinya)
 * (aus-brm) ✅
 * (aus-uwi) ✅


 * other codes
 * (xgn-kha)
 * The Mongolic lect spoken by the long-bilingual Steppe Tungus. Juha Janhunen has described and documented it in several works. 'Water' is ůhu(n) (ůhů(n)); compare Classical Mongolian 'usun' vs modern standard Mongolian 'us', 'usu'. - -sche (discuss) 23:55, 30 August 2016 (UTC)
 * Added; see also the discussion of it at Beer parlour/2016/November, where another user proposed it for a full code. Uses of the etymology-only code bua-xmn can now be updated to xgn-kha. - -sche (discuss) 04:42, 27 March 2017 (UTC)

Alungul, Angkula
Alungul and Angkula are very similar (but, OTOH, so are other related languages), with Barry Alpher mentioning the following words: Alungul oth:o, Angkula otho "liver" (but e.g. Ikarranggal otho is also similar); Alungul mbolvm, Angkula ombolvm "mosquito" (but also e.g. Ogunyjan ombolvm); Alungul orormv, Angkula otil "nape" (the latter "likely a loan from Uw-Olkola odel", and other languages use different words); Alungul obmo(gng), Angkula obmu "nose" (but also e.g. Athima ubmu); Alungul atïl, Angkula atï "see" (Ikarranggal ara); Alungul amadhv, Angkula amadh "shin" (Athima amadhv); Alungul anggul, Angkula angkul "tooth" (Athima angkul, Ogunyjan enggul). Many words were even recorded from the same informant, with one scholar writing "West recorded considerable material in Ogh Alungul and Ogh Anggula from the now deceased Mr Jack Burton, but subsequently l was able to record and transcribe not only a few fragments more of these but a quantity of another dialect." It is difficult to say if they should be treated as one language, and what it would be called. They have separate codes for now. They could always be merged later. - -sche (discuss) 18:35, 25 May 2017 (UTC)

Other comments
I also added a code for Yangkaal, which Ethnologue had conflated into  for some reason. - -sche (discuss) 03:25, 26 May 2017 (UTC)

Languages of France
For discussion about the addition of codes for Angevin (roa-ang), Champenois (roa-cha), Lorrain (roa-lor), and Franc-Comtois (roa-fcm), Orléanais (roa-orl), Poitevin-Saintongeais (roa-poi), and Tourangeau (roa-tou), and discussion of Bourbonnais and Berrichon, Mayennais and Sarthois and Percheron, see Beer parlour/2017/May. - -sche (discuss) 21:57, 1 June 2017 (UTC)

Renaming Standard Moroccan Amazigh (zgh)
The preferred form in the ISO 639-2 is: Moroccan Amazigh.
 * I have moved this from Wiktionary talk:Language treatment/Discussions, the out-of-the-way place where finished discussions are archived. - -sche (discuss) 18:00, 24 June 2017 (UTC)
 * In general, we do try to avoid using "Standard" in language names (e.g. we have "Estonian", not "Standard Estonian"), so a rename would be good, and "Moroccan Amazigh" does seem to be more common than "Moroccan Tamazight". - -sche (discuss) 18:00, 24 June 2017 (UTC)
 * Support per -sche. This language is currently our only one using "Standard" of all the coded languages with categories. —Μετάknowledge discuss/deeds 14:33, 27 June 2017 (UTC)


 * Support I guess, but the ISO 639-3 name is actually "Standard Moroccan Tamazight". —Aɴɢʀ (talk) 18:48, 27 June 2017 (UTC)
 * You can find in the official Request for new language code:

"Name(s) of language (English): (Required)
 * Moroccan Amazigh, Amazigh, Common Moroccan Amazigh, Standard Moroccan Amazigh, Moroccan Berber

If giving variant names, indicate preferred form first Name(s) of language (French):
 * Amazighe marocain, amazighe, amazighe marocain commun, amazighe marocain normalisé, berbère marocain, amazighe standard.

If giving variant names, indicate preferred form first Reference where found:
 * The French name is used by Institut royal de la culture amazighe (IRCAM). Amazighe is the name found in the French version of the constitution approved by a referendum on the 1st of July 2011.
 * http://lematin.ma/Events/discours-royal/constitution-referundum.pdf (article 5) and http://www.sgg.gov.ma/bo5952F.pdf?cle=42 (Official Government Gazette)
 * The qualificative "marocain"/"Moroccan" is used to make sure it does not refer to a specific Moroccan dialect of Berber [tzm] (Moroccan being broader than the Central Atlas qualitificative used for [tzm]) or to the Berber/Amazigh family as a whole [ber] (Moroccan being narrower than the whole family’s coverage area).

Name(s) of language (indigenous):
 * ⵜⴰⵎⴰⵣⵉⵖⵜ". ‒ YesIn (discuss) 03:08, 5 September 2017 (UTC)


 * Renamed to "Moroccan Amazigh". —Μετάknowledge discuss/deeds 02:39, 2 October 2017 (UTC)

Renaming Banggarla (bjb)
Judging by Google Books and Scholar searches, and based on the references cited by Glottolog and Wikipedia, the most common name for this language in recent literature (since the 1990s?) seems to be Barngarla, while the most common name historically/overall (still found in some recent references) is Parnkalla. Some old references have been updated from Parnkalla to Barngarla, e.g. Mark Clendon's Clamor Schürmann’s Barngarla grammar: A commentary on the first section (where the referred-to original had Parnkalla), which argues based on recordings as well as etymology that the name is /parnkarla/, with the /ŋ/-form Banggarla as a northern dialect form "or even an exonymic pronunciation". Banggarla, and Barngala with no second 'r' and Parnkarla with 'P' and two 'r's, seem relatively uncommon, and many other spellings exist (see Glottolog). I suggest a rename to Barngarla, or Parnkalla (this entails moving the categories). (Fr.Wikt has "banggarla" and "barngala" as separate languages, but this seems to be an oversight and I will let them know.) - -sche (discuss) 15:43, 26 June 2017 (UTC)
 * I support a rename to something, with "Barngarla" being my preference, but we seem to vacillate in general on whether we should use the more traditional spelling or the one that is becoming the standard. Compare "Kikuyu" vs "Gikuyu". —Μετάknowledge discuss/deeds 19:22, 26 June 2017 (UTC)
 * Renamed. - -sche (discuss) 03:14, 9 July 2017 (UTC)

East Franconian (vmf), High Franconian (gmw-hfr)
Noting here that gmw-hfr was retired and vmf restored after becoming usable; see Beer parlour/2017/November. - -sche (discuss) 23:45, 29 December 2017 (UTC)

Splitting Monguor into Mangghuer and Mongghul
This was already discussed at Beer_parlour/2016/December, but seeing as it's been a few days I've decided to use the circumstance of this being the preferred place for such requests to bump the topic. Crom daba (talk) 05:19, 12 December 2016 (UTC)
 * Support. —Aɴɢʀ (talk) 18:39, 12 December 2016 (UTC)

, what would happen to words in older literature such as de Smedt / Mostaert 1933 or Todaeva 1973 which are not clearly marked as being either Minhe or Huzhu? Now they can be added under Monguor. Someone can later label them as Mangghuer or Mongghul using. After splitting, a lot of useful stuff from older sources will hang in the air. --Vahag (talk) 14:49, 22 December 2016 (UTC)
 * Most sources are identifiable as either Mangghuer or Mongghul, we could specify which resource contains what in the about page. A bigger problem for me is how do we even enter data from the old sources? Do we put in Todaeva's Cyrillic and Mostaert's pre-IPA phonetic symbols or do we transcribe it into Pinyin? I wasn't around long enough to see what the precedent here is. Crom daba (talk) 22:08, 22 December 2016 (UTC)
 * If you can specify which resource contains what in the about page, then I support the split. Otherwise, I would like to have three codes, like we do with ku (Kurdish macrolanguage), kmr (Northern Kurdish variety), ckb (Central Kurdish variety).
 * As for entering words from older sources, you can normalize them into Pinyin, as long as the rules of normalization are clearly defined. Look at what I have done with Udi at WT:AUDI. --Vahag (talk) 06:56, 23 December 2016 (UTC)
 * After some research, it appears that correspondence of Pinyin spellings (as written by Dpal-ldan-bkra-shis et al) and older attestations is non-trivial, so I will put off transcribing anything which isn't already in Pinyin, at least until I see an example of it supporting the full range of dialectal and historical Monguor variation. Crom daba (talk) 00:16, 24 December 2016 (UTC)


 * I added codes for Mangghuer (xgn-mgr) and Mongghul (xgn-mgl). I suggest that we move as much content as possible to the new codes and update the orthography as we go, and that will give us an idea of whether or not it is feasible to retire the macrolanguage code yet. - -sche (discuss) 00:58, 2 June 2017 (UTC)

Splitting Evenki and Solon
Solon is a language spoken by a Tungusic people living in Manchuria and Inner Mongolia, they consider themselves Evenks, but their language is somewhat different and is not usually counted among Evenki dialects in literature. We follow Ethnologue in categorizing it as Evenki, but most Russian Tungusic literature (Tsintsius, Vasilevich, ...) counts it as a separate language and I too think this would be preferable.

Also worth mentioning is that we have the language of Oroqen as a separate language already for whom Janhunen claims are " in fact much closer to the "Ewenki proper" (i.e., the Evenks of Siberia) than the Solon are" (quote from Wikipedia). Crom daba (talk) 08:37, 7 August 2017 (UTC)


 * Sadly (as the lack of response shows), I think you may be the only editor with knowledge of Tungusic. The English-language works I can find discussing the lects, while mostly general rather than specialist, also seem to mostly speak of them as separate languages, even though Wikipedia redirects "Solon language" to Evenki. I see that Solon currently has an etymology-only code ("evn-sol"); do you want it to have a "full" code and its own header and entries like дяви, ? That seems reasonable, although the code should be "tuw-sol", I think, to fit the customary naming scheme describe in WT:Languages; if that's OK, ping me to add it or add it yourself to Module:languages/datax (and then update the entries that refer to it and remove the etymology-only code "evn-sol"). - -sche (discuss) 06:44, 29 December 2017 (UTC)
 * Yes thank you, I'd like it to have its own header . Crom daba (talk) 17:02, 29 December 2017 (UTC)
 * ✅: [//en.wiktionary.org/w/index.php?title=Module:etymology_languages/data&diff=prev&oldid=48340758], [//en.wiktionary.org/w/index.php?title=Module:languages/datax&diff=prev&oldid=48340749]. - -sche (discuss) 23:17, 29 December 2017 (UTC)

Renaming (A)Ngas, anc
A pretty clear case. We call it "Angas", but the name "Ngas" has been more common for quite some time now. Compare with  for an example. —Μετάknowledge discuss/deeds 20:53, 14 August 2017 (UTC)


 * Yes, perusing those searches and Glottolog's list of reference works about it, I see that some books do still use "Angas" but "Ngas" does seem to have been more common for at least a couple decades. (And you have access to better resources on African languages than I do and I trust your judgment.) I recall that we renamed the related Mwaghavul (from "Sura") only a couple years ago. Renamed. - -sche (discuss) 06:15, 29 December 2017 (UTC)

Removing Jorto, jrt
An anon has pointed out that is apparently a spurious invention. There is a wordlist, and a case could be made for including it, but given that we chose to exclude the, I don't see how this is any different. —Μετάknowledge discuss/deeds 00:57, 28 December 2017 (UTC)


 * Remove before it lays eggs. Palaestrator verborum sis loquier 🗣 01:07, 28 December 2017 (UTC)


 * Exclude. Remove it from Module:languages/data3/j and add it to WT:LT. —Mahāgaja <small style="font-size:85%;">(formerly Angr) · talk 13:14, 28 December 2017 (UTC)


 * ✅. - -sche (discuss) 23:39, 29 December 2017 (UTC)

Even more languages without ISO codes, part 4
—Μετάknowledge discuss/deeds 04:54, 6 July 2016 (UTC)


 * (poz-bal)
 * ✅ - -sche (discuss) 00:35, 12 January 2018 (UTC)
 * (sdv-har)
 * ✅ as nub-har (nub is the "closer", and also more pronuncable and hence memorable, family code) - -sche (discuss) 06:43, 19 January 2018 (UTC)
 * (sai-mlb) ✅
 * (awd-mar) ✅
 * (sai-myn)
 * ✅ as sai-mys to avoid similarity to myn - -sche (discuss) 00:46, 12 January 2018 (UTC)
 * (sai-mcn) ✅, but distinguishing which words are Mocana and which are Malibu may be non-trivial
 * (sai-qng) ✅ as nai-qng (Mexico is in North America, right?)
 * (sai-ram) ✅
 * (sai-trr) ✅

Others:


 * (nai-hit). 'Water' is 'oki' or 'oke' (Thomas Noxon Toomey, Analysis of a Text in the Apalachi Language, 1917, in notes).
 * - -sche (discuss) 04:34, 31 August 2016 (UTC)
 * ✅. - -sche (discuss) 07:14, 19 January 2018 (UTC)


 * Alazapa/Alasapa/Pinto (nai-ala), a fragmentarily attested language of northern Mexico, scarcely described in English and not that much better described in Spanish. Considered to be related to Quinigua and Cotoname by Norman A. McQuown's 1968 Handbook of Middle American Indians (volume 5: Linguistics, page 100), it is sometimes identified with or considered a dialect of Coahuilteco, apparently as part of the former belief that the "the Coahuiltecans belonged to a single language family and that the Coahuiltecan languages were related to the Hokan languages of California, Arizona, and Baja California. Most modern linguists, however, discount this theory for lack of evidence and believe that the Coahuiltecans were diverse in both culture and language. At least seven different languages are known to have been spoken[.]" Some of the scholars who responded to and followed up on del Hoyo's vocabulary of Quinigua provide a few Alazapa words, like axi "tobacco" (compared to Karuk úuh "tobacco", Esselen ka'a "tobacco"). - -sche (discuss) 03:48, 6 June 2017 (UTC)
 * I'm away from my books at the moment, but I seem to remember Yuman languages having something along the lines of /up/ for tobacco. Chuck Entz (talk) 04:01, 6 June 2017 (UTC)
 * Yes, Cocopa has ˀu·p "tobacco", and ˀu·p xyay "smoke tobacco". Quechan/Yuma itself has axta/ak’sa’ for a tobacco pipe; in a short search, I didn't find the [Quechan] word for tobacco itself, but the list I found the Alazapa word in was comparing it to mostly words for "tobacco" but some words for "pipe". (I updated the spelling of the Karuk and Esselen words to the spellings given in dedicated works on those languages.) - -sche (discuss) 04:30, 6 June 2017 (UTC)
 * ✅ - -sche (discuss) 19:48, 16 June 2017 (UTC)

Ossetian: RFM discussion: December 2012–January 2018

 * Template:os

The Digor and Iron dialects of Ossetian seem quite different, and already many (most?) of our entries distinguish which is meant. It seems to me that there is a fair chance that the two are separate enough to deserve being called different languages here. —Μετάknowledge discuss/deeds 04:46, 19 December 2012 (UTC)
 * I've seen them referred to as separate languages before, but there's still some debate over that. Doesn't matter to me. But would there still be plain Ossetian language entries or would all be sorted into the new languages? There are some that aren't labelled as either Iron or Digor.Word dewd544 (talk) 17:58, 20 December 2012 (UTC)


 * Iron is by far the more common dialect, and the literary Ossetian language is based on Iron. However, Digor is different enough that it could be considered a separate language. The main things against it here are the relatively small number of speakers and that it does not yet have a written standard, as far as I know. But there is now a Digor dictionary out there, and it’s probably just a matter of time before Digor develops a literary standard of its own. I think it’s unlikely that we will get enough Digor contributions to make a difference, but it is always possible that someone will start entering words from a Digor dictionary. The Digor language code is oss-dig. We could use os for Ossetian proper (and Iron), and oss-dig for Digor. —Stephen (Talk) 02:20, 21 December 2012 (UTC)
 * In that case, I support. — Ungoliant (Falai) 03:40, 21 December 2012 (UTC)


 * A standard language based on one widely-spoken dialect, and another lect sometimes considered a dialect and sometimes considered its own language? This reminds me of Tosk vs Gheg Albanian: some references say they're mutually unintelligible separate languages, speakers say their differences present no impediment to communication. Unfortunately, we lack speakers of the Ossestian lects, and the dictionary of Digor is said to waffle, the author calling it a language and the editor calling it a dialect. Stephen is probably right that it's just a matter of time before Digor develops its own standard (and merits separation as much as Luxembourgish and Limburgish do from each other and from German); OTOH, Wiktionary, like Wikipedia, is not a crystal ball. My preference would be to wait and not split them for now. If we do split them, I agree with using for Iron (compare  and ), and we should devise an exceptional code for Digor that fits our usual naming scheme (<tt>ira-odg</tt> or <tt>ira-dig</tt>), rather than using Linguist List's ersatz "oss-dig". - -sche (discuss) 03:42, 21 December 2012 (UTC)


 * Etymology-only codes have been created for Digor and Iron, per the last part of this January 2018 WT:ES thread. - -sche (discuss) 03:21, 10 January 2018 (UTC)


 * Not merged at this time; cf the WT:ES discussion; but especially if the developments with regard to resources in the lects over the past few years suggest they should, in fact, have separate codes, please feel free to reopen (or start a new, non-stale) discussion. - -sche (discuss) 14:47, 12 January 2018 (UTC)

Renaming (Tshi)Luba, lua
This language is currently called "Tshiluba", which is a really awful choice. First of all, tshi- is that good ol' language prefix that we often try not to have in language names (which I think is in modern orthography), and there are in fact two Luba languages (the other is lu "Luba-Katanga"). To avoid confusion, we rightfully give neither the name Luba, but this is not much better, and we should rename it to "Luba-Kasai", as Wikipedia does. —Μετάknowledge discuss/deeds 19:28, 27 November 2015 (UTC)


 * On the one hand, we do try to avoid prefixes. On the other hand, "Luba-Kasai" seems to more often be a placename and an ethnonym than a language name, and "Tshiluba" seems to be about twice as common. - -sche (discuss) 21:15, 8 January 2016 (UTC)


 * The issue is that, AFAICT, "Tshiluba" is more commonly used because it refers to both Luba languages! This is not so much about prefixes so much as the issue of the name being exceedingly ambiguous in its referent. —Μετάknowledge discuss/deeds 01:28, 9 January 2016 (UTC)


 * OK, renamed. - -sche (discuss) 21:19, 14 January 2018 (UTC)

Taishanese and Teochew

 * I think Taishanese dialect of Cantonese and Teochew dialect of Min Nan both need language codes. They are covered by Chinese pronunciation modules but theire transliterations is very different from Cantonese and Min Nan accordingly. E.g. in "yon3 bit2" is not standard Jyutping and "ing5 big4" is Teochew, not POJ. . Perhaps these subdialects needs nesting in translations and numbered tone marks (also Gan, Jin, Xiang) also need superscripts, just like Cantonese Jyutping. --Anatoli T. (обсудить/вклад) 11:32, 12 November 2016 (UTC)
 * typo fix→ —suzukaze (t・c) 12:05, 13 November 2016 (UTC)
 * Thanks, suzukaze! --Anatoli T. (обсудить/вклад) 12:08, 13 November 2016 (UTC)
 * They seem to have language codes already ( for Taishanese and   for Teochew); at least they work in the etymology templates. — justin(r)leung { (t...) 00:58, 14 November 2016 (UTC)
 * Thanks but if I try add a translation using 'yue-tai' I get the error:
 * Lua error in Module:languages/templates at line 28: The language code 'yue-tai' is not valid.: Lua error in Module:parameters at line 110: The parameter "<strong class" is not used by this template. --Anatoli T. (обсудить/вклад) 06:15, 14 November 2016 (UTC)
 * Yes, those are etymology-only codes. They would standardly need different codes if we are to treat them as full languages, though. The point is that you guys need to decide what status you want these lects to have. —Μετάknowledge discuss/deeds 06:24, 14 November 2016 (UTC)
 * If there's a need to enter them into translations tables (on account of their different transliteration and, according to Wikipedia, sometimes different vocabulary), they could be given full codes, which as Meta notes would be named a little differently (using the system described in WT:LANG): zhx-tai and zhx-teo. I await Wyang's input. As an aside, we should consider taking a Chinese approach to Arabic, i.e. not have separate headers for each dialect, but retain the option of listing each dialect's pronunciation and maybe having each dialect in translations tables. - -sche (discuss) 16:36, 14 November 2016 (UTC)

Thanks all. Will further nesting open a Pandora's box of nesting subdialects if we do it like this? (pls note new rows for Taishanese and Teochew):

There's some work for translation adder as well. --Anatoli T. (обсудить/вклад) 21:27, 15 November 2016 (UTC)
 * Chinese:
 * Cantonese: 鉛筆, 铅笔
 * Taishanese: 鉛筆, 铅笔
 * Gan: 鉛筆, 铅笔
 * Hakka: 鉛筆, 铅笔
 * Jin: 鉛筆, 铅笔
 * Mandarin: ,
 * Min Dong: 鉛筆, 铅笔
 * Min Nan:, 铅笔
 * Teochew: 鉛筆, 铅笔
 * Wu: 鉛筆, 铅笔
 * Xiang: 鉛筆, 铅笔

In this case I would really prefer: where extracts and displays the simplified form from the entry, as well as extracts all the readings of the word from the entry, and (if on a computer) displays the readings on hover-over or (if on a mobile device) something. Additional topolect-specific translations can be added as:
 * Chinese:
 * Chinese:
 * Cantonese:
 * Mandarin:

Wyang (talk) 21:34, 15 November 2016 (UTC)
 * We'd need a full-blown zh-t that can serve the function of linking to zh.wikt if an entry exists, and you'd also need to run a bot to update those every now and then (or convince Ruakh to do it). This would imply some changes to the translation adder as well, and possibly other gadgets and bits of code lying around. In short, that's a big jump from what Anatoli proposed. It does sound like a good idea, though, if you want to put the requisite work into it. —Μετάknowledge discuss/deeds 00:10, 16 November 2016 (UTC)


 * so, do we need to add language codes for these varieties (in which case, please add them), or are they adequately covered by ? - -sche (discuss) 23:19, 11 January 2018 (UTC)


 * I think having full language codes for these would be useful in translation tables. What thinkest thou? Wyang (talk) 13:12, 12 January 2018 (UTC)
 * Thank you for pinging me but I'm not familiar with Cantonese and Banlamgu and their regional speeches and I currently have no idea about this... Dokurrat (talk) 13:35, 12 January 2018 (UTC)
 * Thank you for your frankness... :) Wyang (talk) 13:42, 12 January 2018 (UTC)


 * Support. — justin(r)leung { (t...) 13:35, 12 January 2018 (UTC)


 * OK, thank you all for your input. ✅. :) - -sche (discuss) 14:44, 12 January 2018 (UTC)


 * Hmm, having translations is nice but do we really want Category:Teochew lemmas? —suzukaze (t・c) 23:18, 13 January 2018 (UTC)
 * I presume such categories would be populated the same way as Category:Min Nan lemmas (and Category:Old Chinese lemmas, etc), i.e. the entries use the consolidated "Chinese" L2 header... at which point, having Category:Teochew lemmas seems no better or worse to have than Category:Min Nan lemmas. (There are some other languages where it could be good to do similar, e.g. Arabic and possibly Romani.) - -sche (discuss) 23:38, 13 January 2018 (UTC)
 * That's true. It seems weird though, because Teochew is Min Nan, and Taishanese is Cantonese. —suzukaze (t・c) 05:18, 14 January 2018 (UTC)

Diegueño
I thought I might dig out a few references I have and add a few entries in one of the lects that go by this name, but I'm a bit confused by the way we have the language codes set up.

Diegueño is the name that anthropologists have traditionally used for the language originally spoken around Mission San Diego in the southwestern corner of California. Older references referred to it as single language covering most of San Diego County, California and northern Baja California, Mexico, but I always understood it to be at least three languages: Just to confuse things, Kumeyaay is sometimes used to refer to all three, and there are some scholars who have merged Tipay and Kumeyaay into a single language that they call Tipai. Then there's Kamia, which has been used in older literature for a variety of groups who all seem to have been Diegueño of one sort or another. The ISO has a single code, dih (which we call Kumiai) and our lone entry using that code is the 'Iipay 'aa word for water. That would make sense if we treated all of Diegueño as one language, but we have an exception code for Tipai: nai-tip, and a single entry. As far as I know, no one currently considers Ipai to be part of Kumiai unless they consider Tipai to be part of it, too- hence my confusion.
 * 1) Northern Diegueño, known to its speakers as 'Iipay 'aa, and generally called Ipai in the literature
 * 2) Central Diegueño, now called Kumeyaay or Kumiai
 * 3) Southern Digueño, now calley Tipai or Tipay

I've only studied Ipai (perhaps I should say "dabbled in"), so I can't judge for myself how different the lects are from each other. Based on what little I have read, it would seem to me that we have just a few credible options: I would recommend against using dih for anything but the single-language option- this isn't a macro-language with a standard or prestige lect, and it would probably just confuse things. Chuck Entz (talk) 06:12, 19 June 2017 (UTC)
 * 1) Treat Diegueño as a single language, keeping dih and retiring nai-tip
 * 2) Treat Ipai as separate, but merge the rest of Diegueño into Tipai
 * 3) Treat each of the three lects as independent, preferably all with exception codes (nai-ipa, nai-kum and nai-tip, perhaps?).


 * (After digging into the history of the codes, I've refreshed myself that) I added the code for Tipai in [//en.wiktionary.org/w/index.php?title=Module%3Alanguages%2Fdatax&type=revision&diff=37420153&oldid=37413124 diff], so I could add [//en.wiktionary.org/w/index.php?title=xu&type=revision&diff=37420166&oldid=36345923 diff], after seeing that we called dih "Kumiai" and taking that to mean that it referred to the Central dialect and the others needed codes. (Apparently, the ISO/Ethnologue's actual reason for calling it "Kumiai" is that some people use "Kumiai" instead of "Diegueño" as the cover term, as you note.) I probably didn't add a code for the Northern variety at that time because I didn't want to bother figuring out what to call it ("'Iipay 'aa" struck me as a suboptimal name; do we have other names with spaces in them where the part after the space isn't capitalized?) at a time when no-one was coming forward with content that needed to be added in it. :p
 * Victor Golla, California Indian Languages (2011, ISBN 0520266676, page 120, says: "While Kroeber (1925) and others treated the Kamia as a Diegueño subgroup, there is no firm evidence in support of this approach, although the name they are known by [Kamia] appears to be a variant of Kumeyaay (Langdon 1975a). With this possible exception, all of the groups definitely known to have spoken varieties of Diegueño were located west of the present San Diego-Imperial County line or in Baja California west of the Sierra de Juarez. [...] Although Ipai and Tipai are to some extent mutually intelligible, they show numerous differences in vocabulary and structure (for a comparison of Mesa Grande Ipai and Jamul Tipai see A. Miller 2001:359-363) and have sometimes been treated as separate languages. Winter [...] judged [Tipai] to be no closer to (Northern) Diegueño than to Cocopa. The most recent classification (Langdon 1991; Miller 2001:1-4) distinguishes [Ipai, Tipai and Kumeyaay]."
 * Amy Miller's referred-to work, A Grammar of Jamul Tipay, says "A comparison of descriptions of Mesa Grande [...] with the results of my own research reveals that differences between Mesa Grande and Jamul pervade the phonology, lexicon, derivational morphology, inflectional morphology, syntactic morphology, syntax, and discourse."
 * OTOH, Tipai Ethnographic Notes: A Baja California Indian Community (2001, ISBN 0879191449, edited by Langdon et al, says: "These Mexican Diegueno, who call themselves Ipai or Tipai 'people,' cannot be described as now forming a tribe; they are a group of Indian families speaking mutually intelligible dialects (Northern and Southern) of a language[.]"
 * - -sche (discuss) 18:51, 24 June 2017 (UTC)
 * Amy Miller has a comparison of Ipai and Tipai in her work cited above; I have put a shorter comparison of various words at User:-sche/Diegueño. - -sche (discuss) 20:19, 24 June 2017 (UTC)


 * I have split (and retired) dih into three codes as proposed above. - -sche (discuss) 22:02, 19 January 2018 (UTC)

Renaming Azeri to Azerbaijani
I propose to rename the language name Azeri to Azerbaijani. The reasons are (1) to distinguish it from the Iranian language, (2) Ethnologue, Glottolog and Wikipedia call the language Azerbaijani, (3) Azerbaijani is closer the native form Azərbaycan dili, (4) Azeri has pejorative connotations in Armenian. , currently the most active Azerbaijani contributor, agrees. --Vahag (talk) 14:01, 24 January 2018 (UTC)
 * Support. Allahverdi Verdizade (talk) 15:52, 24 January 2018 (UTC)
 * Support. And to summon evidence more appropriate to our dictionarying efforts, Google Books searches for "speaking X" or "the X language" have invariably returned more hits with "Azerbaijani" than with "Azeri" when I have tried them. —Μετάknowledge discuss/deeds 17:22, 24 January 2018 (UTC)
 * Pinging several potentially interested users for more opinions: . --Vahag (talk) 15:23, 25 January 2018 (UTC)
 * Support. It took me some getting used to the English "Azeri" at Wiktionary, which also has negative connotations in Russian and many Azerbaijanis speak Russian. is pejorative for  in Russian. --Anatoli T. (обсудить/вклад) 15:28, 25 January 2018 (UTC)
 * Support. Azeri is a little "vague" while Azerbaijani is kind of long and sounds only limited to Azerbaijan, but I support the change, "Azeri" is used for people rather than the language itself in Turkish as well. --Anylai (talk) 17:33, 25 January 2018 (UTC)
 * Support --Z 20:06, 25 January 2018 (UTC)
 * Support for the reasons Vahag and Meta outlined. Someone with a bot will need to implement the rename, because at least three thousand entries will need to be updated, and that's just the ones where the L2 header needs to be changed; in other entries, translations tables and descendants lists and etymologies where the name is spelled out will need updating. - -sche (discuss) 23:51, 25 January 2018 (UTC)
 * Support. I never liked that we used that name. PseudoSkull (talk) 00:16, 26 January 2018 (UTC)
 * Looks like this is uncontroversial, so ✅. I made a request at the Grease Pit for a rename by a bot. --Vahag (talk) 17:52, 9 February 2018 (UTC)

Has anyone modified the tool that assists in adding translations to translation tables so that it uses the word “Azerbaijani”? — SGconlaw (talk) 04:57, 10 February 2018 (UTC)
 * That happens automatically. DTLHS (talk) 05:04, 10 February 2018 (UTC)
 * Oh, good. — SGconlaw (talk) 05:54, 10 February 2018 (UTC)

Removing spurious Dororo, Guliguli, Yarsun, and Wares
In 1953, Lanyon-Orgill provided short wordlists of two languages he called Dororo and Guliguli. His lists are so similar to Kazukuru that subsequent scholars have suspected they are dialects, if not alternative transcriptions, if not jokes. Karen Davis, in A grammar of the Hoava language, Western Solomons, notes "there was no one in present day Kusaghe who had heard of [Dororo or Guliguli], and Lanyon-Orgill does not identify his informant. As one of the names of the dialects, Guliguli, can mean 'masturbate' in Hoava-Kusaghe, I have my doubts about the existence of this language." Michael Dunn and Malcolm Ross expand on Davis's point in their 2007 article Is Kazukuru really non-Austronesian, with the view (accepted also by e.g. Harald Hammarström and Sebastian Nordhoff, in Melanesian Languages on the Edge of Asia: Challenges for the 21st Century) that if the lects were real, they were the same language as Kazukuru, which I propose to merge them into. (Sample words in Kazukuru, Guliguli, and Dororo: pito, bito, bito "arrow"; vinovo, vino, bino "banana"; viniti, vini, vinitini "body"; minata, minate, minate "die"; meta, mata, mata / meta "eye"; rano, rano, rano "head"; muni, moni, muni / moni "night".) - -sche (discuss) 06:44, 12 May 2017 (UTC)
 * merge Dororo [drr] and Guliguli [gli] into [kzk]
 * ✅. - -sche (discuss) 21:03, 19 January 2018 (UTC)

As noted by Hammarström, Melanesian Languages on the Edge of Asia: Challenges for the 21st Century, the existence of a Yarsun language seems have been based on the confusion of language names with place names which is not uncommon in Papua (Yarsun is near Anus Island, and that's not a joke); "no such language is attested". Likewise with Wares. - -sche (discuss) 06:44, 12 May 2017 (UTC)
 * remove Yarsun [yrs] and Wares [wai]
 * ✅. - -sche (discuss) 23:26, 29 December 2017 (UTC)

RFM discussion: July 2016–January 2018
This Pama-Nyungan lect seems to have been given an exceptional code with no discussion, for its use at one transwikied entry, ngargee. It's by no means clear that we should be giving it a separate code rather than treating it as a dialect of Woiwurrung wyi; Wikipedia claims they are 90% mutually intelligible, but doesn't cite that claim directly. —Μετάknowledge discuss/deeds 04:54, 6 July 2016 (UTC)


 * I noticed this a while ago, but left it alone because I could not at the time find a reference (i.e. outside WP) that confirmed that they were mutually intelligible, probably because the huge variety of spellings made searching for information difficult. However, I can now find references that suggest we should be merging more than just these two. Leigh Boucher and ‎Lynette Russell's Settler Colonial Governance in Nineteenth-Century Victoria (2015, ISBN 1925022358, page 8, speaks of "the Woiwurrung (Wurundjeri), Boonwurrung, Wathaurung, Taungurong and Dja Dja Wurrung [being] mutually intelligible languages that share up to 80 per cent of their terminology." A paper by Barry Blake and Julie Reid on Sound Change in Kulin, in the La Trobe Working Papers in Linguistics, v 6-8 (1993), speaks of a single Kulin language, with "material available on three dialects: Boonwurrung (B), Woiwurrung (W) and Thagungwurrung (T)" (emphasis mine). Dja Dja Wurrung = dja, and Taungurong / Thagungwurrung = dgw, and Wathaurung = wth, all of which we currently treat as separate. suggests that at least the four eastern ones, if not also Wathaurung, could be merged. - -sche (discuss) 06:48, 6 July 2016 (UTC)
 * I've merged Bunurong; the others still need to be merged. - -sche (discuss) 16:52, 18 September 2016 (UTC)
 * Bunurong was merged at the time this discussion was current. The discussion has since gone stale, and I am going to leave the others alone (unmerged). - -sche (discuss) 06:26, 29 January 2018 (UTC)

RFM discussion: January–March 2018
Here is a document summarising the changes to ISO codes that have been made in 2017. I have gone through all the approved request documents and briefly summarised their findings and my conclusions about what I propose we should do. —Μετάknowledge discuss/deeds 10:36, 31 January 2018 (UTC)

Deprecated codes

 * Mosiro [mwy]
 * Only a clan name, confirmed with fieldwork. I concur with removal.


 * Ndaktup [ncp]
 * Merge into Kwaja [kdz], confirmed with fieldwork. I concur with removal.


 * Lyons Sign Language [lsg]
 * Apparently spurious. I concur with removal.


 * Mediak [mwx]
 * Only a clan name, confirmed with fieldwork. I concur with removal.


 * ✅ I also agree, and have removed them all, adding Mosiro and Mediak and several other things as alt names of  (our spelling of the canonical name, Okiek, seems more common than Wikipedia's Ogiek), and adding Ndaktup and several other forms as alt names of  . - -sche (discuss) 16:56, 31 January 2018 (UTC)

Added codes

 * Puebla Mazatec [pbm]
 * Confirmed with fieldwork, although differences seem to be mainly phonological. I abstain on creation.
 * Tentatively added. - -sche (discuss) 03:55, 7 March 2018 (UTC)


 * Gyalsumdo [gyo]
 * Confirmed with fieldwork and dictionary creation is underway. I concur with creation.
 * ✅, tentatively with Latn, Deva and Tibt scripts listed, based on the two scripts Ethnologue lists for broader Manang [which WP thinks it is a dialect of] and how I see linguists documenting it. - -sche (discuss) 23:35, 7 February 2018 (UTC)
 * But is this consistent with how we treat other Tibetan/Tibetic varieties? E.g.  is subsumed into , somewhat like with Chinese varieties. This may need further discussion, maybe separate from this big list of codes, with Wyang. - -sche (discuss) 23:58, 7 February 2018 (UTC)
 * I realised this when reading your comment at Talk:གསར་པ. We decided on a Chinese-style merger, but this compounds the lack of interest and expertise that has plagued documenting anything other than Lhasa Tibetan (which is admittedly the only one I have formally studied either). —Μετάknowledge discuss/deeds 00:35, 8 February 2018 (UTC)


 * Magɨ (Madang Province) [gkd]
 * Request document has no content; the case seems to be based on Daniels (2016), which is the only documentation. The name should presumably be simply "Magɨ". I concur with creation.
 * Or even "Magi" (although this is also an alt name of Mailu). And, aha, so this is a separate language from  Magiyi/Magɨyi; I wondered about that back when I was looking at , since they're both sparsely-documented Sogeram languages of Madang. - -sche (discuss) 17:43, 31 January 2018 (UTC)
 * I don't think we need to be ɨ-averse when Daniels uses it with that spelling exclusively, and anyone else who publishes on it will presumably follow suit. —Μετάknowledge discuss/deeds 18:47, 31 January 2018 (UTC)
 * ✅. - -sche (discuss) 23:26, 7 February 2018 (UTC)


 * Malawian Sign Language [lws]
 * Confirmed with fieldwork and dictionary creation is underway. I concur with creation.
 * ✅. - -sche (discuss) 22:21, 8 February 2018 (UTC)


 * Ngen [gnj]
 * Confirmed with a Swadesh list as worthy to be split from Beng [nhb]. I cannot find any documentation after a quick search, nor do I find confirming fieldwork. I abstain on creation.
 * Glottlog, citing Anna Maloletnyaya's 2014 Brief presentation of Ngen language, agrees that "Ngen, a Southeastern Mande language of the Ben-Gban group not intelligible to Beng". Therefore, added. - -sche (discuss) 03:55, 7 March 2018 (UTC)


 * Western Armenian [hyw]
 * Request document is chiefly concerned with social needs (e.g. a new Wikipedia). There are significant dialectal differences, but Western and Eastern Armenian are known to be mutually intelligible and have been successfully treated as [hy] at Wiktionary (though there has been little to no prior discussion). There is currently a parallel discussion of this issue in the Beer parlour. I disagree with creation.
 * The BP discussion will take care of this (and seems to be reaching the conclusion that the code should not be added). - -sche (discuss) 23:27, 7 February 2018 (UTC)


 * Mel-Khaonh [hkn]
 * Confirmed with fieldwork; the name is that of the two constituent lects. I concur with creation.
 * ✅. - -sche (discuss) 06:07, 11 February 2018 (UTC)


 * Mankiyali [nlm]
 * Confirmed with fieldwork, although there is no published documentation. I concur with creation.
 * ✅. - -sche (discuss) 06:07, 11 February 2018 (UTC)


 * Tetserret [tez]
 * Confirmed with fieldwork and well documented. I concur with creation.
 * ✅ - -sche (discuss) 07:40, 10 February 2018 (UTC)


 * Nzadi [nzd]
 * Confirmed with fieldwork and has a published grammar. I concur with creation.
 * ✅ - -sche (discuss) 07:40, 10 February 2018 (UTC)


 * Cuitlatec [cuy]
 * Extinct, but documented and confirmed as an isolate by Lyle Campbell, so it can't be a dialect of anything else. I concur with creation.
 * And now I've noticed that we already have this, as [nai-cui]. Excellent, that will just have to be switched over. —Μετάknowledge discuss/deeds 06:34, 10 February 2018 (UTC)
 * ✅ —Μετάknowledge discuss/deeds 03:17, 11 February 2018 (UTC)


 * On the subject of languages we could include: Jedek, just documented (too recently to have been included in this page of ISO codes). (Our own incomparable Stephen G Brown has already created a Wikipedia article for it, .) - -sche (discuss) 22:21, 8 February 2018 (UTC)
 * It is one of many very closely related lects (although geographically displaced from the closest ones to itself). I skimmed the paper, but it really wasn't easy to make a quick determination about whether we should include it. —Μετάknowledge discuss/deeds 06:32, 10 February 2018 (UTC)

Name changes

 * Helambu Sherpa [scp]
 * Changed to Hyolmo, based on the current name being a misnomer (the speakers are not Sherpas and the language is not closely related to Sherpa) and preference of native speakers. I find that Yolmo is in fact the most-used spelling on Google Books and Google Scholar, so I suggest we use that name instead.
 * ✅ (Yolmo). - -sche (discuss) 05:25, 13 February 2018 (UTC)


 * Dzodinka [add]
 * Changed to Lidzonka, based on preference of native speakers. This name has almost never been used in the linguistic literature, whereas Dzodinka has and the request document itself admits that "both are correct". I disagree with the name change.
 * Ergo, not renamed. (But struck, for the sake of tracking which ones have been looked at.) - -sche (discuss) 05:25, 13 February 2018 (UTC)


 * Shixing [sxg]
 * Changed to Shuhi, based on local usage; the requester actually uses the name Xumi when documenting the language. All other use in the linguistic literature appears to follow the original name, Shixing. I disagree with the name change.
 * Ha, weird. ❌ at this time. - -sche (discuss) 05:25, 13 February 2018 (UTC)


 * Irigwe [iri]
 * Changed to Rigwe, based on the current name being erroneous. What little linguistic literature there is seems to have fully shifted over. I support the name change.
 * ✅. - -sche (discuss) 16:50, 22 February 2018 (UTC)


 * Palor [fap]
 * Changed to Paloor, based on the language's new orthographic norms. There is very little linguistic literature on it, most of it in French, and it is difficult to tell what name is or will be more common. I abstain on the name change.
 * The new spelling may well take hold, but it hasn't yet (and we have no contributors in the language who might clamor to use the new spelling when adding entries), so, not renamed at this time. We should reexamine this in a few years. - -sche (discuss) 16:50, 22 February 2018 (UTC)


 * Australian Sign Language [asf]
 * Changed to Auslan, based on popular and linguistic usage. This does seem to be more common as the standard name, and is preferred by native speakers. I support the name change.
 * Has been done, apparently.

The name of [khm] was also changed from "Khmer, Central" to "Khmer", but we have chosen to exclude this code, so it does not affect us.

—Μετάknowledge discuss/deeds 10:36, 31 January 2018 (UTC)


 * Where should discussions like this be archived? — SGconlaw (talk) 03:40, 22 February 2018 (UTC)
 * They are archived to Language treatment/Discussions. If you're asking so you can archive them, you'd be better off leaving them for -sche. —Μετάknowledge discuss/deeds 03:48, 22 February 2018 (UTC)
 * *Thumbs up* — SGconlaw (talk) 03:57, 22 February 2018 (UTC)
 * Oh, no, the more people who know how to do this kind of thing, the better! :) You just have to be sure all the requests have been handled (done/accepted, or rejected). But yes, WT:LTD is the catch-all for anything that doesn't have a more logical place to be archived; and then, if there's a change in how we treat a language (e.g. a language has been split between two codes), update WT:LT and link to the discussion. If a language code has been removed (e.g. merged into something else), I find that it's helpful (and prevents accidental re-creation) to leave a  in the module (as I did for e.g. "frc" and am slowly doing for other codes). - -sche (discuss) 16:42, 22 February 2018 (UTC)

RFM discussion: March 2018
Our spelling for the name of the is simply much less common, on Google, Google Books, or in the literature. Note that we will also need to change the name of Proto-Songhai [son-pro] if this changes. —Μετάknowledge discuss/deeds 01:10, 14 March 2018 (UTC)


 * ✅. - -sche (discuss) 18:12, 17 March 2018 (UTC)

Proto-Sarmato-Alanic [Old Ossetic]
Copied from WT:Etymology scriptorium/2018/April.

OK, let me try this again. I'd like to have 🇨🇬  renamed to Sarmato-Alanic and have entries use. Alanic and Sarmatian (which has no language code otherwise) occupy a dialect continuum, and neither might be the direct ancestor of 🇨🇬 or 🇨🇬. Alanic would then be made into an etymology-only code,. Alternatively, a new language code could be created,, and   removed all together, but I think the former the better option. --Victar (talk) 20:30, 3 April 2018 (UTC)
 * There are a couple problems here that I see just from a brief reading of the discussion. You want us to change to a name that is very rarely used instead of a more common one, and also switch to considering it a protolanguage rather than an attested one, despite the fact that it is actually attested (yes, we do that for Proto-Norse, but it is not ideal and is in large part because, as is relevant here, we try to use the most commonly used names where possible). —Μετάknowledge discuss/deeds 01:19, 5 April 2018 (UTC)
 * Thanks for the reply, . 99% of the entries I'll be entering will be reconstructions, and most will be derived from Ossetian, not Alanic or Sarmatian borrowing. I think there is a ton of precedence for using alternative names for codes on wikt, but I'll concede that I can't think of any example of using the language code of a dialect to refer to a whole dialect family. I'm not opposed to using  instead, but I do still think then the   code should be discontinued, because Alanic, Sarmatian and Proto-Ossetic should all be under the same code. --Victar (talk) 04:13, 5 April 2018 (UTC)
 * To give an example of what I had in mind for formatting descendant trees:

* Sarmato-Alanic
 * Alanic:
 * Sarmatian:
 * Ossetian:
 * --Victar (talk) 04:22, 5 April 2018 (UTC)
 * The proportion that are reconstructions is irrelevant unless it's 100%. When we have mainspace entries, we should avoid assigning them to protolanguages (which are technically hypotheses) wherever possible. We always try to use to the most common, unambiguous name possible, and if you know of any exceptions, we should see if they ought to be fixed. Basically, I think you're conflating the needs of descendant trees and the criteria for determining what ought to be a separate language. Bear in mind that regardless of what codes and names are, you can always structure descendant trees to show distinct dialects or sublects (Crom daba has done this quite fruitfully). —Μετάknowledge discuss/deeds 04:48, 5 April 2018 (UTC)
 * , I think you're missing the point of my need. I want to create reconstructions of Proto-Ossetic and Sarmatian. Sarmatian and Alanic are well established as two separate dialects. --Victar (talk) 04:59, 5 April 2018 (UTC)
 * And if they're dialects, they shouldn't have separate codes. Remember, you can still give them separate lines and reconstructions, when and where those are supported by scholarly sources, regardless of the situation with codes. —Μετάknowledge discuss/deeds 06:18, 5 April 2018 (UTC)
 * Exactly,, which is why Alanic shouldn't have its own code. --Victar (talk) 07:37, 5 April 2018 (UTC)
 * Then why did your example above have them with two different codes? —Μετάknowledge discuss/deeds 16:22, 5 April 2018 (UTC)
 * Sorry, I tried to indicate  and   where etymology-only codes by the dash in them, but I guess that wasn't clear (though I did make that point in my opening statement) .  We also have   for Old/Proto-Ossentic, which we could used instead for parent of Alanic/Sarmatian/Ossentic. We currently list it below , and I've always considered it a stage between, yet MultiTree seems to use it as their catch-all. Again though, I'm not opposed to using a new   code. --Victar (talk) 17:34, 5 April 2018 (UTC)
 * We create etymology-only codes for etymology sections. If there's a language that derives terms from both Alanic and Sarmatian with a meaningful difference between the two, then those codes should exist, but we shouldn't create them just for descendant lists, which can be freely formatted. —Μετάknowledge discuss/deeds 18:39, 5 April 2018 (UTC)


 * Here's a different idea. Alanic and Sarmatian are both barely attested; from my brief reading of the literature, it seems to be unclear whether or not they represent dialects or fully separate speech communities, and whether they represent the ancestor of Ossetian or a close relative (and given the timescales over which they are attested, they cannot be the same thing as a protolanguage, which is a hypothesis of the most recent common ancestor). The resultant action would be to have separate codes for Alanic, Sarmatian, and Proto-Ossetic, with the former two only in mainspace (in original script, e.g. Greek) and the latter only in Reconstruction space (in normalised form). As always, you can format descendant lists however you like. Does that seem like it would make sense? —Μετάknowledge discuss/deeds 18:39, 5 April 2018 (UTC)
 * No, I don't agree with that solution. Whether they exhibit the same exact timeframe is irrelevant and labeling Sarmatian and Ossetic as Alanic is inaccurate. What my sources are reconstructing is a common ancestor of all three of these dialectal branches. See https://ibb.co/jFEnSH. --Victar (talk) 19:01, 5 April 2018 (UTC)
 * Your response is confusing; I did not suggest labelling Sarmatian and Ossetic as Alanic (in fact, I suggested separating all three), and I was under the impression that Proto-Ossetic is the unattested ancestor of all these languages. —Μετάknowledge discuss/deeds 19:30, 5 April 2018 (UTC)
 * Am I understanding this correctly that it is similar to the problem that he have had/are having with Sanskrit and the Prakrits, namely that there is a dialectal continuum between Sarmatian, Alanic, and the unattested ancestor of Ossetian? —*i̯óh₁n̥C[5] 19:24, 5 April 2018 (UTC)
 * Not exactly. Sarmatian, Alanic and Ossentic are all largely unattested dialects form a single language of the Middle Iranian period. So what I'm suggesting is that we unify them under a single code and name, and have the dialects differentiated only by etymology-only codes. I'm recommending the name Sarmato-Alanic, which is what I mostly see in literature when referring to them as a whole, but if I had to choose to unify them under one name out of the three, it would be Ossetic, being the only one with modern descendents. What code we use, be it a repurposed one, or a new one, doesn't matter to me. --Victar (talk) 20:26, 5 April 2018 (UTC)
 * Would you support the idea of unifying Sarmatian and Alanic under Old Ossetic  (as per MultiTree) with both reconstructed and mainspain entries, and making   an etymology only code? That should resolve your Proto-Norse argumentment. --Victar (talk) 22:37, 5 April 2018 (UTC)
 * I see that the name "Old Ossetic" is broadly attested (when spelt correctly), and many sources seem to equate it with Alanic, but that doesn't mean Sarmatian should necessarily be merged as well. If they are indeed dialects as you claimed, then that would be perfectly fine. WP cites EB for the following: "The languages of the Scytho-Sarmatian inscription may represent dialects of a language family of which Modern Ossetian is a continuation, but does not simply represent the same language at an earlier time." If that is true, then Sarmatian should be kept separate. —Μετάknowledge discuss/deeds 23:29, 5 April 2018 (UTC)
 * , there is no code for Sarmatian. If we put Alanic under Old Ossetic as part of a dialectal continuum, Sarmatian, as a dialect thought to be very similar to Alanic, should unequivocally be included. Otherwise it defeats the point. --Victar (talk) 23:49, 5 April 2018 (UTC)
 * I would be in favor of Old Ossetic for the continuum of Alanic and Ossetic and, if it can be demonstrated as true, Sarmatian. In what way can we adjudicate this Sarmatian situation? As mentioned before, I'm getting a little bit frustrated at continuously running across this "continuum problem" (attested languages descending from unattested near neighbors). There's been a fair amount of research saying that the phylogeny of language change tends to be binary in nature, but that depends on how you look at language continua versus dialectally diverse super-languages. I'd be interested to think about the principled use of language continua in our language data (like  or the like), not just the "substrata" we use in the etymology-only language data. The question is in the utility of such a demarcation, but the inherent assumption of our current n-ary (or perhaps my theoretically binary) branching system tends to omit this subtlety of language change because frequently these continua are not protolanguages and exist clearly in the data... I dunno. —*i̯óh₁n̥C[5] 00:07, 6 April 2018 (UTC)
 * Ignoring John's tangent... I know there is no code for Sarmatian. That's immaterial; we can make one if we deem it necessary. You claim that Sarmatian is very similar to Alanic, to the point of being a continuum; I know little about this, but found a scholarly source that claims otherwise. Can you respond to that with actual evidence? —Μετάknowledge discuss/deeds 00:16, 6 April 2018 (UTC)
 * "Sarmatian and Alanic represent a dialect continuum" and "it is difficult to draw the line between Sarmatian and Alanic".
 * "Ossetic [...] is the last remnant of the essentially unknown Middle Iranian dialect area that included Sarmatian, and is said to descend from Alanic."
 * "[Ossetic] is the sole surviving descendant of the Northeast Iranian dialects of the ancient Scythians and Sarmatians and medieval Alans".
 * "Deine klare linguistische Scheidung zwischen Sarmatisch und Alanisch aufgrund der Materiallage nicht möglich ist"
 * Even if Alanic and Sarmatian were divergent enough to call separate languages, that distinction isn't apparent in the little material we have, so to reconstruct them separately at this time would be folly. --Victar (talk) 01:20, 6 April 2018 (UTC)
 * Thanks, that seems like good evidence for merger. I am now satisfied with having "Old Ossetic" as an L2 header with categorising context labels for the dialects. I would like to wait a couple of days just in case anyone raises an objection, so please ping me with a reminder. Also, please clarify if there are any etymology sections that need to distinguish between the dialects; if not, we can dispense with etymology-only codes and simply remove . —Μετάknowledge discuss/deeds 04:44, 6 April 2018 (UTC)
 * Thanks, that seems like good evidence for merger. I am now satisfied with having "Old Ossetic" as an L2 header with categorising context labels for the dialects. I would like to wait a couple of days just in case anyone raises an objection, so please ping me with a reminder. Also, please clarify if there are any etymology sections that need to distinguish between the dialects; if not, we can dispense with etymology-only codes and simply remove . —Μετάknowledge discuss/deeds 04:44, 6 April 2018 (UTC)

, if you have a moment, I would appreciate you making these changes. Thanks. --Victar (talk) 03:10, 16 April 2018 (UTC)
 * , could you please respond to the query in the last sentence of my last comment? —Μετάknowledge discuss/deeds 04:39, 16 April 2018 (UTC)
 * , yes, need the etymology-only codes as well, not for the linguist distinction, really, but for the historical one, i.e. the names of Alanic kings. --Victar (talk) 04:44, 16 April 2018 (UTC)
 * ✅. —Μετάknowledge discuss/deeds 05:33, 16 April 2018 (UTC)
 * Changes look good. Thanks, ! --Victar (talk) 05:35, 16 April 2018 (UTC)

RFM discussion: July 2019
I propose that we treat Proto-Tibeto-Burman as an etymology-only variant of Proto-Sino-Tibetan as it appears that the consensus among linguists in the field is that the set of non-Sinitic Sino-Tibetan languages is not monophyletic. (For a parallel, we treat Proto-Baltic as an etymology-only variant of Proto-Balto-Slavic for the same reason.) By the same token, I propose that we change all languages and families currently listed as  to   instead. Wyang has said on his talk page that he isn't opposed to the idea, and I don't know who else here is working on Sino-Tibetan issues. What do other people think? —Mahāgaja · talk 14:32, 19 July 2019 (UTC)
 * Three days with no response; I'm doing it now. —Mahāgaja · talk 19:05, 22 July 2019 (UTC)

RFM discussion: July–August 2018
Currently the canonical name of ira-wnj is Wanji which is also used by wbi. One of them should be changed. We also have wny which is Wanyi. DTLHS (talk) 18:27, 22 July 2018 (UTC)


 * (Also similar: wdd Wanzi / Wandji.) I've renamed ira-wnj to "Vanji", which is the spelling Wikipedia uses anyway. We don't have much content in either language, just a few entries in descendant trees for the Iranian one and a few translations for the Bantu one, but yes, it causes problems (starting with conflated categories) if they have the same name. - -sche (discuss) 15:08, 24 July 2018 (UTC)
 * I was wondering what the heck happened to this. --Victar (talk) 22:43, 3 August 2018 (UTC)

RFM discussion: September–December 2019
Searches for "speak X" and "X language" show that the spelling is much more common than, which is what we currently use. —Μετάknowledge discuss/deeds 20:59, 12 September 2019 (UTC) ✅ —Mahāgaja · talk 10:09, 24 October 2019 (UTC)
 * Support. —Mahāgaja · talk 16:41, 16 September 2019 (UTC)
 * Support — Eru·tuon 00:05, 3 October 2019 (UTC)
 * I've moved all the categories to the new spelling, but it would be great if someone with a bot could change all L2 headers from ==Inupiak== to ==Inupiaq==. —Mahāgaja · talk 10:12, 24 October 2019 (UTC)
 * Also all instances of "Inupiak" in Translation sections (including Inupiak inside t-simple). —Mahāgaja · talk 22:31, 24 October 2019 (UTC)
 * Has this been done, and if not, can someone with a bot make it so? —Μετάknowledge discuss/deeds 07:24, 24 December 2019 (UTC)
 * I don't know if there are any L2 heading left using the old spelling, but there are definitely still translation sections calling it "Inupiak". I don't have a bot (or the remotest idea how to write one);, is this something one of y'all's bots would be interested in doing? —Mahāgaja · talk 08:02, 24 December 2019 (UTC)
 * Happily all the "Inupiak" headers were caught quickly because they showed up on my incorrect headers page. I went and fixed all cases in translation sections with JWB because there were less than a hundred. — Eru·tuon 10:15, 24 December 2019 (UTC)
 * Glad to hear it! I'm still hoping a bot runner will take care of the translations sections. —Mahāgaja · talk 15:10, 24 December 2019 (UTC)
 * Erutuon in his last comment: "I went and fixed all cases in translation sections". —Μετάknowledge discuss/deeds 18:26, 24 December 2019 (UTC)
 * Oh right. Never mind. —Mahāgaja · talk 18:52, 24 December 2019 (UTC)

RFM discussion: August–September 2018
We have language codes for both   and  , despite being dialects of . I'd like to merge them under either code and rename the canonical name to. --Victar (talk) 22:31, 3 August 2018 (UTC)
 * Support. The phrasing in Liljegren (2016) that you linked to leads me to think that perhaps bsh is the code we should keep, and xvi the one we should retire, although it's wholly arbitrary. Thanks for paying attention to Nuristani, in any case. —Μετάknowledge discuss/deeds 23:07, 3 August 2018 (UTC)
 * Kata-vari is the larger of the two by at least five-fold, which on one hand would make it the logical encompassing one, but  is actually named for its eastern subdialect Bashgali (taken from Dardic), making it a somewhat inaccurate code to begin with, but I don't really care either way. Here is Strand's tree. --Victar (talk) 23:39, 3 August 2018 (UTC)
 * And here is from Strand's 1974 paper, back when he used to refer to Kati as being the parent language of all three dialects: "Kati (Bašgalī) has three major dialects: Katə́viri, Kamvíri, and Mumvíri. Katə́viri is spoken by members of the Katə́ tribe. It is divided into two major subdialects: Western Katə́viri and Eastern Katə́viri. Western Katə́viri is further subdivided into the dialects of Ramgə́l, Kulám, Ktívi (Kantivo), and Pə́řuk (Papruk)". I'm also fine keeping it named Kati, despite it perhaps being outdated, if that's preferable to others. --Victar (talk) 00:05, 4 August 2018 (UTC)


 * Support the merger. As for the name: on one hand, "Kati" seems to be about twice as common even when I search for the names together with "Nuristani" to filter out the New Guinea-area language, but on the other hand the absolute number of words that mention either name is small, so if "Kati" is dated and "Kamkata-viri" is preferred these days, and "Kamkata-viri" would also make clearer to readers, etc that we're considering all the dialects under that one header (and that we're not talking about, Muyu / Kati), then go ahead and use "Kamkata-viri", (with the other names as alt names). - -sche (discuss) 00:58, 7 August 2018 (UTC)

Done. --Victar (talk) 16:57, 5 September 2018 (UTC)

RFM discussion: December 2019–January 2020
Currently "Ruund", which is also is on WP, but "Ruwund" is more common in the literature. —Μετάknowledge discuss/deeds 07:20, 24 December 2019 (UTC)


 * Renamed. Searching the site, the only pages I see using "Ruund" are categories (in family trees) and Reconstruction pages (using desc), which should update automatically; I think there is, therefore, nothing that needs changing(?) besides the three modules I just changed. - -sche (discuss) 08:35, 11 January 2020 (UTC)

RFM discussion: March–May 2020
We currently call this "Mpuono"; "Mbuun" is much more common in the (admittedly limited) scholarly literature. For example, compare and. —Μετάknowledge discuss/deeds 05:16, 10 March 2020 (UTC)
 * Renamed. —Μετάknowledge discuss/deeds 23:26, 12 May 2020 (UTC)

RFM discussion: July 2016–January 2020
Hi, I'm not sure whether this is the right venue for this discussion, but I would like to bring up Dungan, which is spoken in Central Asia. According to Wikipedia, Ethnologue, and Glottolog, this is a Chinese language, specifically Mandarin. There are only 20 entries in Wiktionary that are for Dungan, and all of them are Mandarin words, with some from the Gansu and Shaanxi dialects. The difference, however, is that, Dungan is written in Cyrillic. In Wiktionary, all Chinese dialects are merged into one single Chinese entry, and pronunciations are listed. Shouldn't we do that, or at least partially, for Dungan? Please feel free to comment. Thanks. --Mar vin kaiser (talk) 14:54, 16 July 2016 (UTC)


 * Re venue: this is indeed a venue where discussions of merging/splitting language codes and categories and entries take place. I tend to put my "biggest" proposals (ones that needed votes in the past, or that concern major or controversial languages that I suspect will need votes) in the WT:BP, but here is OK.
 * About Chinese and Votes/pl-2014-04/Unified_Chinese intentionally left lects that don't use Hanzi separate from Unified Chinese, but I don't know if that was because Chinese editors felt they should never be merged, or just felt that merging them would be difficult and best attempted after everything else had been merged. It would obviously be possible to merge Dungan and other such lects if Chinese editors wanted to; we have plenty of other languages which use multiple scripts (e.g. Afrikaans). However, the various Chinese lects which are distinctive to the point of potentially being not-mutually-intelligible when spoken were able to be unified here because they share a written form in which they are theoretically mutually intelligible. If Dungan is potentially not intelligible with lects from other areas (lects that differ from Mandarin enough that speakers don't understand it without study) in either speech or writing, then what would be the basis for unifying them? - -sche (discuss) 15:42, 16 July 2016 (UTC)
 * Actually, Shaanxi and Gansu Mandarin is mutually intelligible with Dungan. Furthermore, a large majority of Dungan vocabulary is from Chinese, which therefore, has Chinese equivalent entries written in Chinese characters. There are Russian and Turkic vocabulary. My suggestion is to leave Dungan loanwords from Russian and Turkic as written in Cyrillic, and merge the Dungan Chinese words with Chinese entries, and perhaps leaving the Cyrillic entry of those words like how Chinese pinyin and Japanese romaji are left. --Mar vin kaiser (talk) 15:53, 16 July 2016 (UTC)
 * Also, I speak Mandarin, and I tried listening to Dungan videos in Youtube. They're actually understandable for the most part. As in I can write down what they're saying in Chinese, except for some words though (presumably Russian and Turkic loanwords). --Mar vin kaiser (talk) 16:04, 16 July 2016 (UTC)


 * I am not opposed to this, but it would require that Dungan orthography be incorporated into the relevant Chinese templates, so 's aid and support will be critical. —Μετάknowledge discuss/deeds 16:21, 16 July 2016 (UTC)
 * I support this. Cyrillic Dungan forms can be added to, under Mandarin. Wyang (talk) 00:25, 17 July 2016 (UTC)
 * The main caveats are that Cyrillic is apparently the standard script for Dungan, unlike with Pinyin or Romaji, and there may be some vocabulary that only exists in Cyrillic. I suppose the writing systems for Hokkien might be analogous, though. Chuck Entz (talk) 01:07, 17 July 2016 (UTC)
 * This doesn't feel right to me. I think it would be like folding Maltese into Arabic, or merging Hindi and Urdu. I foresee a lot of complaints from anons if entries like дянхуа and شِيَوْ عَر دٍ have a ==Chinese== heading, and I would find it disconcerting myself, too. And what would the definition then say? zh Cyrillic script? I think readers would find that more confusing than helpful. And then what about the Russian and Turkic loanwords that don't exist in China? They would have to have full definitions without a link to a Hanzi entry, and that would probably baffle readers even more, despite the zh tag. —Aɴɢʀ (talk) 14:14, 17 July 2016 (UTC)
 * Hindi and Urdu should be merged — they are separate for political reasons. Remember, we allow for Afrikaans entries in Arabic script, Old French entries in Hebrew script, and other odd happenstances of historical script usage. We can continue to use the Dungan header for words only existing in Cyrillic form, just like I believe we do for Min Nan. —Μετάknowledge discuss/deeds 16:10, 17 July 2016 (UTC)


 * Undecided for now. There are pros and cons. Cyrillic and Arabic spellings could potentially be added to each Mandarin standard pinyin syllable (non-standard could also be considered if confirmed). Multisyllabic only for confirmed ones.
 * Mandarin pinyin (with tone marks and monosyllabic tone numbered syllables), Min Nan POJ, zhuyin characters have not been "unified" under the Chinese umbrella for various reasons. Some are described above.--Anatoli T. (обсудить/вклад) 07:39, 18 July 2016 (UTC)
 * I think it would be more convenient for editors if Dungan were part of unified Chinese, since it would be easier to edit. It would just feel like too repetitive if I made a new Dungan entry that technically already has an equivalent Chinese entry. --Mar vin kaiser (talk) 08:32, 18 July 2016 (UTC)
 * Should I bring this somewhere else to a vote whether Dungan should be merged into the unified Chinese? --Mar vin kaiser (talk) 16:08, 22 July 2016 (UTC)


 * if you still support folding Dungan into Chinese, how best could that be accomplished? Make the attested Cyrillic (and Arabic?) forms soft redirects...? (Would the L2 header of e.g. фонзы be "Chinese" or remain "Dungan"?) What is to be done if, as several users worry above, there are loanwords that don't have Han-character representations? - -sche (discuss) 04:48, 21 January 2018 (UTC)


 * I'm now also concerned about the complexity of this... a possible solution is to have link to the Dungan word, but keep the Dungan L2 heading, and treat Dungan as a full language with the Cyrillic entry showing full usage examples, etc. and linking back to the Hanzi form, perhaps on the headword line. Wyang (talk) 07:03, 21 January 2018 (UTC)
 * : Perhaps attested Dungan Cyrillic and Arabic syllables and multisyllabic words could be allowed into ? E.g. and  (not sure if the latter is right) in ? Each or almost each standard pinyin syllable should have a corresponding Dungan Cyrillic syllable, so monosyllabic entries may have them by default. --Anatoli T. (обсудить/вклад) 07:28, 21 January 2018 (UTC)
 * I think it is simple. For special words w/o hanzi, do it like the 🇨🇬 entry, otherwise do it like the 🇨🇬 entry.
 * doesn't seem problematic to me.
 * Cyrillic quotations are welcome on the hanzi entries. —suzukaze (t・c) 07:45, 21 January 2018 (UTC)
 * (I just noticed, phōng-kó uses the Chinese header. I think it should use the Min Nan header, similar to how pinyin entries use the Mandarin header. But that's a separate discussion, I think. —suzukaze (t・c) 07:46, 21 January 2018 (UTC))
 * We had a discussion on this but I don't remember where. I'm OK to use the Chinese L2 header on all romanised entries if they link to (and consequently, have a written form in) hanzi. --08:03, 21 January 2018 (UTC)
 * I have made some trials to Module:zh-pron. But currently there're some problem:

--Zcreator (talk) 21:17, 23 January 2018 (UTC)
 * 1) As Cyrillic script does not imply tone and merges some consonant, the Cyrillic form is generated from pinyin-like romanisation instead. We need  About Dungan to document it. Also, current pinyin-like romanisation include some redundant vowels.
 * 2) IPA value are from Wikipedia and needs checking.
 * 3) I don't know how neutral tone work, nor whether there're tone sandhi or erhua in Dungan.
 * 4) Some values can not be generated by current pinyin-like romanisation. e.g. чў=q+u (not ü) which is not a valid Pinyin syllable, and ңыйлу (="ng+er+y" lou).
 * 5) We need a module to transcript Cyrillic Dungan entries.


 * My thoughts:
 * Instead of, it should be for consistency.
 * I don't think we should make a "pinyin". I think we should input the original Cyrillic directly, annotated with tones (somehow).
 * There is definitely erhua.
 * —suzukaze (t・c) 05:13, 24 January 2018 (UTC)
 * I have removed pinyin-like romanisation. However point 2 and 3 still needs to be solved.--Zcreator (talk) 09:05, 24 January 2018 (UTC)


 * I oppose the use of tones in the transliteration for Dungan if it's reintroduced. The Cyrillic spelling have no tones and there should be no tone numbers or marks in the romanisation. Further, the transliteration should show what's actually written, not matching the Mandarin pinyin, e.g. is actually "yüyan" (without a space), not "yu3 yan2" but perhaps, there's no need to provide any additional transliteration. --Anatoli T. (обсудить/вклад) 10:56, 24 January 2018 (UTC)
 * We have tone and length annotations for every other language whose orthography does not reflect relevant phonemic differences. Why should this be different? Korn &#91;kʰũːɘ̃n&#93; (talk) 12:08, 24 January 2018 (UTC)
 * Just to be clear, I was talking about the transliteration, not IPA. Anatoli T. (обсудить/вклад) 20:19, 24 January 2018 (UTC)
 * I don't think the Cyrillic in Dungan should be referred to as a transliteration, but as a script of its own. Therefore, I agree with you that tones shouldn't be put in the Cyrillic writing precisely because it is a script, and it is written without the tone, but the IPA should definitely provide the tone, since the script traditionally doesn't write it down. --Mar vin kaiser (talk) 06:30, 25 January 2018 (UTC)
 * I don't think Atitarev is referring to the Cyrillic as transliteration. He's talking about the romanization of the Cyrillic. I found out from Omniglot that there are two conventions used for indicating tones: -, ъ, ь or I, II, III. We should probably adopt one of these two. — justin(r)leung { (t...) 07:05, 25 January 2018 (UTC)
 * I see. Then it's settled then, I guess. For cognates with other Chinese languages, a unified Chinese entry will be used with the Cyrillic entry still existing the same way POJ entries in Min Nan still exist. The Dungan portion of the pronunciation would then contain the Cyrillic entry or the Cyrillic script with tone marks? What do you think? Maybe the Cyrillic script with tone marks, but it would link to the Cyrillic entry without tone marks? --Mar vin kaiser (talk) 07:20, 25 January 2018 (UTC)

The module can't handle variants, separated by commas, e.g. dg=Җун1гуй1,Җун1гуә2 in. --Anatoli T. (обсудить/вклад) 14:01, 25 January 2018 (UTC)
 * I found a new source for Dungan [here]. It shows more dialects and more tones compared to the Russian source. --Mar vin kaiser (talk) 16:32, 13 February 2018 (UTC)


 * My understanding is that this is all dealt with, although the Dungan pronunciations are still labelled as "experimental" — is there a reason for that? And can this discussion now be archived? —Μετάknowledge discuss/deeds 03:56, 21 January 2020 (UTC)
 * There are too many unanswered questions, such as tone sandhi, erhua and that (now missing but I have a copy) dunganDictionary.html dictionary talks about more tones, e.g. Ia and Ib, not implemented. User:Justinrleung might have a Chinese source, which has more details about the pronunciation. The Russian dictionary could be used for vocabulary but its pronunciation description is simplistic. --Anatoli T. (обсудить/вклад) 04:21, 21 January 2020 (UTC)
 * The dictionary has been revamped (here), but I think some of the information isn't there any more (like excerpts from other sources like the Russian dictionary). — justin(r)leung { (t...) 04:37, 21 January 2020 (UTC)
 * If anyone needs it, I can email the original dictionary (the HTML file and its stylesheet). The internal references are broken (when you try to click on the up arrow to see also). I've got the Dungan-Russian dictionary (pdf) as well - it's less user-friendly and you can't copy most of Dungan terms. --Anatoli T. (обсудить/вклад) 05:19, 21 January 2020 (UTC)

RFM discussion: March–May 2020
We currently refer to the other languages most commonly referred to as "Lele" with parenthetical geographic disambiguators (🇨🇬, 🇨🇬, 🇨🇬). I suggest we do the same for [lel] and change "Bashilele" to "Lele (Congo)". There is an argument that "Lele (Congo)" should be "Lele (Democratic Republic of the Congo)", but that seems too long and none of these languages are spoken in the Republic of the Congo, so I think it's safe. —Μετάknowledge discuss/deeds 05:30, 10 March 2020 (UTC)
 * Renamed. —Μετάknowledge discuss/deeds 23:33, 12 May 2020 (UTC)

RFM discussion: March–May 2020
We currently call [snq] "Chango", which is quite infrequently used. We could go for "Isangu" (with the language prefix), because there is another Bantu language called Sangu — or we could bite the bullet and go for geographical disambiguation with "Sangu (Gabon)" for [snq] and "Sangu (Tanzania)" for [sbp], which is what Wikipedia does, and is probably least misleading. —Μετάknowledge discuss/deeds 04:29, 31 March 2020 (UTC)
 * Renamed. —Μετάknowledge discuss/deeds 04:41, 11 May 2020 (UTC)

RFM discussion: March–May 2020
This is a case where a single language spoken in two countries has been given two codes. Mushungulu [xma] is the dialect of Zigula spoken in Somalia. Maarten Mous says: "I interviewed some of the refugees [from Somalia into the Zigula homeland in Tanzania]. They claim there is no difference between the way they speak Zigula and the Zigula of those who never left the area; the [Tanzanian] Zigula speakers present agreed to this. Noting down some expressions, I had the same impression when comparing this to my Zigula field notes from 1994." —Μετάknowledge discuss/deeds 19:45, 31 March 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 04:49, 11 May 2020 (UTC)

RFM discussion: March–May 2020
Currently, the code [bwt] is called "Bafaw-Balong" in an attempt to cover the two dialects. However, these "dialects" are quite divergent, though fairly closely related. Maho (2009) separates them, as does the Linguistic Atlas of Cameroon. Emmanuel Chia suggests that heavy borrowing has confused linguists into considering them the same language in The Bafaw Language (Bantu A10) (which treats Bafaw and not Balong, considering it completely separate). I suggest that we rename [bwt] to "Bafaw" and make a new code for Balong, maybe. —Μετάknowledge discuss/deeds 20:12, 31 March 2020 (UTC)
 * Renamed [bwt] and added [bnt-bal]. —Μετάknowledge discuss/deeds 05:06, 11 May 2020 (UTC)

RFM discussion: March–May 2020
The Boma-Dzing languages are a bit of a mess, as Dzing was given a code but none of its sister languages were initially. Lwel is considered a distinct language by Maho (2009) and by, and seems to have a grammar (Éléments de grammaire morphologique de la langue lwel) that I can't find anywhere, but citations to it seem like a distinct language. I suggest a new code,. —Μετάknowledge discuss/deeds 21:02, 31 March 2020 (UTC) As for Mpiin, it's also considered distinct by Maho (2009) (and, I should add, these opinions are upheld by Van de Velde et al. (ch.2, by Hammarström)), and a selection of words in Koni Muluwa (2010) seem similar to, but consistently distinct, from other Boma-Dzing languages. I suggest a new code,. —Μετάknowledge discuss/deeds 05:44, 1 April 2020 (UTC)
 * Both added. —Μετάknowledge discuss/deeds 05:29, 11 May 2020 (UTC)

RFM discussion: April–May 2020
This code for "Mengisa" is listed as a duplicate code by WP. Van de Velde et al. (ch.2, by Hammarström) say: "The Eton that most Mengisa now speak is not linguistically remarkable and therefore we count it as an Eton variety". (The Mengisa originally spoke [leo].) —Μετάknowledge discuss/deeds 00:46, 1 April 2020 (UTC)
 * Support. - -sche (discuss) 06:15, 6 April 2020 (UTC)


 * Removed. —Μετάknowledge discuss/deeds 05:16, 11 May 2020 (UTC)

RFM discussion: April–May 2020
Van de Velde et al. (ch.2, by Hammarström) say: "are commonly enumerated separately but the recent comparison by Raharimanantsoa (2012) [which I cannot find online] shows that such a distinction is untenable, wherefore we count them as the same". I am agnostic as to which code to choose, but the language should be called "Central Teke". —Μετάknowledge discuss/deeds 00:55, 1 April 2020 (UTC)
 * Merged into [nzu]. —Μετάknowledge discuss/deeds 06:33, 11 May 2020 (UTC)

RFM discussion: March–May 2020
Almost everybody calls it "Tiv", except us (we have "Tivi"). Compare and  (which has almost nothing except Library of Congress categorisation), for example. —Μετάknowledge discuss/deeds 06:10, 30 March 2020 (UTC)
 * Support. —Mahāgaja · talk 05:48, 1 April 2020 (UTC)
 * Support. - -sche (discuss) 06:16, 6 April 2020 (UTC)


 * Renamed. —Μετάknowledge discuss/deeds 04:37, 11 May 2020 (UTC)

RFM discussion: March–May 2020
Currently "Augila"; many spellings exist, but all modern linguistic work on the language in English (and most of the material on the town itself) refer to it as "Awjila". —Μετάknowledge discuss/deeds 23:10, 19 March 2020 (UTC)
 * There are a bunch of language renames on this page that I'd like to do (this being one of them). If I just ignore the categories, will your regular bot runs create the new ones and delete the old, or should I move the categories instead? —Μετάknowledge discuss/deeds 04:17, 11 May 2020 (UTC)
 * I do a bot run every 3 days to create new categories, but it doesn't delete old ones. I have periodically deleted old categories listed in CAT:Empty categories, but I don't do it on a regular basis. I *think* all the old categories will eventually end up on that page. It's probably better to move the categories and manually fix the lemmas belonging to the renamed languages to have the new language name in their headers. For languages with very few lemmas it's probably not a big deal to do it manually; otherwise, let me know and I'll do it by bot. Benwing2 (talk) 04:29, 11 May 2020 (UTC)
 * Yeah, none of them are a pain in the ass individually, I was just hopeful due to how many languages there are to rename. (Awjila only has one entry, but I had to move 10 categories for it!) Thanks for the offer, I'll let you know if any have a significant number of entries. —Μετάknowledge discuss/deeds 04:36, 11 May 2020 (UTC)


 * Renamed. —Μετάknowledge discuss/deeds 04:36, 11 May 2020 (UTC)

RFM discussion: March–May 2020
The name "Kota" is commonly used for two very small languages, so this seems like another good opportunity to use parenthetical disambiguators rather than force one to use a name with a language prefix, rarely encountered in the literature. I suggest that we rename [kfe] from "Kota" to "Kota (India)", and [koq] from "Ikota" to "Kota (Gabon)". —Μετάknowledge discuss/deeds 05:35, 10 March 2020 (UTC)
 * Both renamed. —Μετάknowledge discuss/deeds 23:36, 12 May 2020 (UTC)

RFM discussion: April–May 2020
Van de Velde et al. (ch.2, by Hammarström) say that "studies in the field emphasise that Lalia is simply a variety in the Bangando area with no special status vis-a-vis other Bongando varieties". While we're at it, we currently call [nxd] "Longandu", which seems to be one of the rarer names. In the tradition of lopping off language prefixes and matching the usual final vowel, I suggest we rename it to "Ngando". —Μετάknowledge discuss/deeds 01:21, 1 April 2020 (UTC)
 * Merged. As for the renaming, the issue is more complex than I had realised: there is also [ngd], which we perplexingly call "Bagandou". We could change the parentheticals later, but for now I'm going with "Ngando (Congo)" for [nxd] and "Ngando (Central African Republic)" for [ngd]. —Μετάknowledge discuss/deeds 21:38, 11 May 2020 (UTC)

RFM discussion: April–May 2020
We currently call this "Longandu". I assume the reason for avoiding "Ngandu" was originally potential conflict with [bgf], which we call "Bangandu". Given that "Bangandu" seems like a perfectly fine name, I suggest we rename [ngc] to "Ngandu". —Μετάknowledge discuss/deeds 01:27, 1 April 2020 (UTC)
 * Well, I seem to have gotten these confused while attempting to clean them up, which is as good an argument as any for better names. We actually call [ngc] "Lingombe", from which we should remove the language prefix. Unfortunately, that brings it into potential conflict with the poorly named "Bangando-Ngombe" [nmj], variously considered a dialect of Bangandu [bgf]. I am instead renaming it to "Ngombe (Congo)" for maximal clarity, and will create a new thread to discuss [nmj]. —Μετάknowledge discuss/deeds 23:17, 11 May 2020 (UTC)

RFM discussion: April–May 2020
Van de Velde et al. (ch.2, by Hammarström) say: "the only source on this language (Hackett and van Bulck 1956:74) has it as a dialect of Nyanga-li [nyc]." If that's the sole source, I have no idea why it ever got a code. —Μετάknowledge discuss/deeds 01:33, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 23:32, 11 May 2020 (UTC)

RFM discussion: April–May 2020
Van de Velde et al. (ch.2, by Hammarström) say: "Pelende and Lonzo denote political rather than ethnolinguistic sub-entities of Yaka [yaf] ... as explicit linguistic data, whenever available, confirms". —Μετάknowledge discuss/deeds 01:38, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 23:33, 11 May 2020 (UTC)

RFM discussion: April–May 2020
Van de Velde et al. (ch.2, by Hammarström) say: "the field research of Kraal (2005 1–7) finds this distinction [between Ndonde and Makonde] untenable from a linguistic and ethnographic point of view". "Makonde" [kde] is the usual term. —Μετάknowledge discuss/deeds 02:02, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 23:36, 11 May 2020 (UTC)

RFM discussion: April–May 2020
Ehret and Kinsman (1981) "Shona Dialect Classification and Its Implications for Iron Age History in Southern Africa" have helpful trees of the Shona dialects, which confirm that [twl], [mxc], [twx] are all mutually intelligible to a high degree with the other dialects of Shona [sn]. (In fact, [twl] and [twx] are subdialects of the larger dialects that most Shona dictionaries treat, and [mxc] isn't even a valid clade, but merely a geographic convenience.) —Μετάknowledge discuss/deeds 02:17, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 23:38, 11 May 2020 (UTC)

RFM discussion: April–May 2020
The Bantu is certainly not the same as Zulu, as should be clear just from the sample in the WP article. Unfortunately, our current "Lala language" will have to be moved to make way. In the emerging tradition of geographic parentheticals, I suggest that [nrz] be renamed to "Lala (New Guinea)" and a new code, maybe [bnt-lal], be created for "Lala (South Africa)". (As a side note, there are also Lala-Roba and Lala-Bisa, which thankfully have hyphenated disambiguators.) —Μετάknowledge discuss/deeds 02:33, 1 April 2020 (UTC)
 * [nrz] renamed and [bnt-lal] created. —Μετάknowledge discuss/deeds 21:07, 11 May 2020 (UTC)

RFM discussion: April–May 2020
There's a lengthy SIL report on the Nyiha languages, and skimming it suggests that the Malawian ([nyr]) and Tanzanian ([nih]) dialects are mutually intelligible. Chances are that the codes haven't been used to separate the two countries' dialects in a consistent way anyway, because they've just been called "Shinyiha" (with the language prefix) and "Nyiha" (without) — the latter is what we should use. —Μετάknowledge discuss/deeds 05:21, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 22:15, 12 May 2020 (UTC)

RFM discussion: April–May 2020
We currently call [kmw] "Kikumu"; the language is most commonly referred to as "Komo", but we already have [xom] "Komo", which is not easily disambiguated (it's spoken in three countries and belongs to an obscure group). We can still do better than "Kikumu", though — I suggest we lop off the language prefix and call it "Kumu", which is more common than the name we currently use. —Μετάknowledge discuss/deeds 06:10, 1 April 2020 (UTC)
 * Renamed. —Μετάknowledge discuss/deeds 22:16, 12 May 2020 (UTC)

RFM discussion: April–May 2020
We currently call this "Oluwanga"; we should drop the language prefix, as we normally do and as matches the scholarly literature, and call it "Wanga". (Note that we need to reconsider the Luhya codes altogether, but that can be dealt with later.) —Μετάknowledge discuss/deeds 20:23, 1 April 2020 (UTC)
 * Support removing the prefix. Comparing to, I notice that although the tribe seems to be mostly called the Wanga, the language is not infrequently "Hanga". (Another alt name is "Luwanga"). - -sche (discuss) 06:04, 6 April 2020 (UTC)


 * Renamed. —Μετάknowledge discuss/deeds 22:23, 12 May 2020 (UTC)

RFM discussion: April–May 2020
We currently call [blv] "Bolo", which refers to a peripheral dialect of this language. This SIL report says that a conference decided on "Kibala", and Angola's Institute of National Languages followed suit (the alternative suggestion of "Kibala-Ngoya" having been rejected due to its historically derogatory usage). I suggest we do the same and use "Kibala". —Μετάknowledge discuss/deeds 20:47, 1 April 2020 (UTC)
 * Renamed. —Μετάknowledge discuss/deeds 23:10, 12 May 2020 (UTC)

RFM discussion: April–May 2020
We currently call [syx] "Shamay", which is a rather rare name for it. WP uses "Samay", but I suggest we follow Van de Velde et al. (ch.2, by Hammarström) and use "Osamayi". —Μετάknowledge discuss/deeds 20:57, 1 April 2020 (UTC)
 * Renamed to "Osamayi". —Μετάknowledge discuss/deeds 23:12, 12 May 2020 (UTC)

RFM discussion: April–May 2020
Van de Velde et al. (ch.2, by Hammarström) and indeed every linguistic source I can find has this as synonymous with or a dialect of Mashi [mho]. I can only see a bit of the relevant part in Laranjo Medeiros (1981) VaKwandu on BGC, where he seems rather confused about the language in general, but there may be useful information there if someone can see it (or if anyone still has academic libraries operating in their part of the world). —Μετάknowledge discuss/deeds 21:16, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 23:17, 12 May 2020 (UTC)

RFM discussion: April–May 2020
The Yauma are ethnographically distinct, but speak a Mbunda dialect according to Fleisch (2000). I see a journal article in 1970 that says Yauma is synonymous with "newer Mbunda" (in contrast with "Old Mbunda"), and I'm not really sure what's going on there, but it also supports a merger. I cannot find Yauma mentioned in the usual sources otherwise. —Μετάknowledge discuss/deeds 21:32, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 23:21, 12 May 2020 (UTC)

RFM discussion: December 2017
"Southwestern Fars" is a really awful name. Fars is not a macrolanguage as the name would suggest, but a province of Iran, and there are other lects spoken in southwestern Fars. We would do much better to use the unambiguous name "Kuhmareyi", as used by Wikipedia. —Μετάknowledge discuss/deeds 19:44, 9 December 2017 (UTC)
 * , who has been adding entries in it, may have an opinion. —Μετάknowledge discuss/deeds 20:53, 9 December 2017 (UTC)
 * hi, Metaknowledge. At the beginning when I wanted to create the category for the language I searched online to see if I can find Kuhmareyi. I could not find online sources mentioning Kuhmareyi other than Wikipedia. Online sources mostly mention Southwestern Fars. I know Fars is a province. ethnologue and some other sites use this name. I am trying to find the name Kuhmareyi in the book that I have, A treasury of the Dialectology of fars. I should also mention that I am very cautious in adding the new words and double check the words and have not recorded some words that can not be found in other sources. However the book is written by a professional linguist and is funded or guided by Iranian Academy of Persian Language and literature. I think I saw Kuhmareyi in one online source. I have not found the name "Kuhmareyi" So far but the name seems correct.--Eeranee (talk) 21:09, 9 December 2017 (UTC)
 * It sounds like you're being a very conscientious editor, so thank you! Do you think "Southwestern Fars" really is the most used name? If so, I guess we'd be wrong to do otherwise. —Μετάknowledge discuss/deeds 21:29, 9 December 2017 (UTC)
 * I can find online sources in Persian using Kuhmareyi which might not be reliable but I could not find a source written in English using Kuhmareyi. If anyone thinks kuhmareyi is more correct we can use it--Eeranee (talk) 22:06, 9 December 2017 (UTC)


 * Searching Google Books for both names, I find nothing that would help us much to decide one way or the other: nothing using "Kuhmareyi", and only a couple of general reference works on world languages that mention "Southwestern Fars" in giant lists of languages, probably just copying Ethnologue. (Wikipedia curiously says "the southwestern [Fars] dialects can be divided into three families of dialects according to geographical distribution and local names: Southwestern (Lori), South-central (Kuhmareyi) and Southeastern (Larestani)", as if calling it "southwestern" might be slightly confusing/misnomial.), do either of you have a preference or knowledge of which name is more common or appropriate? - -sche (discuss) 19:19, 17 December 2017 (UTC)
 * I don't know anything about this. --Vahag (talk) 18:38, 18 December 2017 (UTC)

RFM discussion: May–June 2020
Currently called "Chuwabu". There are quite a lot of spellings, influenced by English, Swahili, Portuguese, and Africanist notation, and none of them are unquestionably more common, but this shows that the case for "Chuabo" is the best, which is supported by paging through the Google Books results as well. —Μετάknowledge discuss/deeds 17:11, 5 May 2020 (UTC)


 * Support. - -sche (discuss) 01:39, 12 May 2020 (UTC)


 * Renamed. —Μετάknowledge discuss/deeds 21:33, 15 June 2020 (UTC)

RFM discussion: May–June 2020
We currently call [nmj] "Bangando-Ngombe". It is variously considered a dialect of Bangandu [bgf]: Wikipedia cites Glottolog to support a claim that the code is "spurious", and the Atlas linguistique de l'Afrique centrale calls it "une variété très voisine" to Bangandu. Without a clearer statement about mutual intelligibility or specific fieldwork, I am hesitant to merge it. However, the current name, which hyphenates a (differently spelt!) version of the macrolanguage with the dialect's name, is clearly undesirable. I suggest we rename it to "Ngombe (Central African Republic)" (the national parenthetical serving to disambiguate from [ngc] "Ngombe (Congo)"). —Μετάknowledge discuss/deeds 23:24, 11 May 2020 (UTC)


 * Seems reasonable; go for it. - -sche (discuss) 01:38, 12 May 2020 (UTC)


 * Renamed. —Μετάknowledge discuss/deeds 22:38, 15 June 2020 (UTC)

RFM discussion: June 2018–June 2020
We currently call [mgs] "Nyasa", and it came to my attention due to the entry, which User:Wikitiki89 mistakenly created as a "Nyasa" entry, based on an obsolete dictionary of Chichewa that calls it "Kiniassa". Nowadays, Nyasa is usually a name for a group of closely related languages including Chichewa but WP claims it is also used for mjh. Following WP, we could call it "Manda (Tanzania)", rendered problematic by the two other possibilities, which we call "Manda" and "Australian Manda" (!). I propose that we go all in for national disambiguation and make those parenthetical as well, as "Manda (India)" and "Manda (Australia)". —Μετάknowledge discuss/deeds 06:18, 28 June 2018 (UTC)
 * Just for the record, I did a lot of research to determine what the correct language code was for the language described in that dictionary, even referring to this map (source) and strongly considering "mjh". Thanks for finally sorting it out! Note also, there is a "Nyasa" translation at water, which does not align with Chichewa "madzi" (which the dictionary I used gave as "madsi"). --WikiTiki89 15:07, 28 June 2018 (UTC)
 * Next time, try Google instead of poring over maps.  I was already aware of this dictionary (and its miserable orthography), but searching "Rebman Kiniassa" gets you the Wikipedia article for, which in turn tells you this is Chichewa. As for the "Nyasa" translation, masi... I really don't know what language that should be. The word for "water" in mgs is . —Μετάknowledge discuss/deeds 17:16, 28 June 2018 (UTC)
 * I did Google quite a bit, and I probably did look at the Chichewa Wikipedia article, but didn't trust that it was accurately citing the dictionary. --WikiTiki89 17:21, 28 June 2018 (UTC)


 * WP says Australian Manda (zma) is also called "Menhthe", but the only published resources on it I could find used Manda. To distinguish it and [mgs] and [mha], the proposal of [mgs]="Manda (Tanzania)" and [mha]="Manda (India)" and [zma]="Manda (Australia)" sounds good. (As an aside, there is also a "Ma Manda" language.) - -sche (discuss) 03:11, 21 January 2020 (UTC)


 * All three renamed. —Μετάknowledge discuss/deeds 21:30, 15 June 2020 (UTC)

RFM discussion: June 2018–June 2020
Tangentially relevant to the discussion of languages named "Nyasa" above, mjh is not only one of those, but is also called "Mwera" (a name currently occupied by mwe, which can't even by distinguished by a parenthetical, because both languages are from Tanzania!). Maho's Guthrie List, the standard list used by Bantuists, calls it "Mbamba Bay Mwera", but the only hits for that string on Google Books are of that very list. WP chooses to simplify this as just "Mbamba Bay", which is both the most unambiguous option, and also an option that seems to be used only there. I'm really unsure what to call it — anything but our current name of "Nyanza", which refers to the lake and offers only more confusion. —Μετάknowledge discuss/deeds 06:27, 28 June 2018 (UTC)
 * Any thoughts on this (or the above discussion)? —Μετάknowledge discuss/deeds 03:19, 17 January 2020 (UTC)
 * Can they be [called by whatever name is most common, even though homographic to another language, and then] distinguished by linguistic family, like WT:RFM? Say, "Mwera (Nyasa)" vs "Mwera (Ruvuma)" or whatever? Btw, we should standardize on putting the family in parentheses (where we also put geographic disambiguators), which will require renaming some languages that were named with the family first, like Papuan Mor (but not Sepik Iwam, which is apparently normally called that, to distinguish it from the other Iwam that is also Sepik). - -sche (discuss) 07:05, 17 January 2020 (UTC)
 * I don't know anything about these languages, but if indeed both  and   are usually called Mwera and both are spoken in Tanzania, then I'd support "Mwera (Nyasa)" and "Mwera (Ruvuma)". —Mahāgaja · talk 07:29, 17 January 2020 (UTC)


 * I bring good tidings! Van de Velde et al. (ch.2, by Hammarström) report that "the examination by Ebner (1955: 41–43) shows the language to be a variant of Chewa/Nyanja on the opposite side of the lake". We no longer have to worry about the name, because we can merge [mjh] into [ny]. —Μετάknowledge discuss/deeds 01:55, 1 April 2020 (UTC)
 * Merged. —Μετάknowledge discuss/deeds 06:54, 15 June 2020 (UTC)

RFM discussion: January 2020
I'd like to address merging Lepontic with Gaulish because it is already included in the English article provided and the terms are frequently the same except for some differences in endings development without any necessary declension change. Comparing this to Latin, as it is merged with Old Latin, this may also be extended to Gaulish and Lepontic. Furthermore, Cisalpine Gaulish is already encompassed by Gaulish even though it was written in the Old Italic alphabet just like Lepontic. HeliosX (talk) 19:55, 15 January 2020 (UTC)
 * Having read both and, I'm not convinced they're similar enough to warrant merging. —Mahāgaja · talk 21:06, 15 January 2020 (UTC)
 * I generally push for a conservative treatment for extinct languages with small, finite corpora. We can afford to over-split a little, if that even is the case here, and be no worse off for doing so. —Μετάknowledge discuss/deeds 03:13, 17 January 2020 (UTC)

RFM discussion: February 2020
This helpful source strongly suggests that early texts written are sufficiently distinct from Omagua to be considered a separate language. I don't know much about it, but if this is right, we need a new code for Old Omagua, maybe. . —Μετάknowledge discuss/deeds 06:24, 14 February 2020 (UTC)


 * Meh. I don't object, but I'm not convinced it's necessary. We're talking about a time difference less than that between modern and Middle English, many of the differences seem to be in whether various morphemes remain attested into the present in various positions (or, conversely, whether modern ones happened to be found in the sparse early corpus), spelling variations are clearly attributable to texts being written by potentially non-fluent outsiders vs modern linguists and natives, and modern Omagua is small and dying. (Fox is another language which underwent changes in a relatively short time period that might be considered substantial, but it seems to still be treated as one language.) It seems to me like we could adequately, and perhaps even more sensibly, handle the differences by labelling words and spellings obsolete where necessary, lemmatizing the modern forms (and conversely labelling words/forms as modern developments in cases where we can tell that they are as opposed to that they just weren't attested in the old texts). - -sche (discuss) 21:12, 18 February 2020 (UTC)


 * Some languages do change a lot faster than others due to social pressures (hence why glottochronology doesn't really work). I found the differences in morphology and lexicon significant, especially in the hopes that someday, someone will build the infrastructure for (modern) Omagua here. But if you're not convinced, I'm fine with shelving this; I certainly won't be working on Tupian stuff myself. —Μετάknowledge discuss/deeds 03:11, 19 February 2020 (UTC)

RFM discussion: April 2017–August 2020

 * [vms] —Μετάknowledge discuss/deeds 04:15, 4 April 2017 (UTC)
 * Charles Grimes says in Spices from the East: Papers in Languages of Eastern Indonesia: "This speech variety has been extinct since 1974, when the last speaker died. No clues other than the name of a stream east of Kayeli called Moksela, give any indication as to where it was spoken or what it was like. If it was spoken from the stream by that name eastward, then chances are likely that it was also a variety of the Kayeli language. People in the Kayeli area remember nothing more than the name of the language, who in the community spoke it " (I cannot view beyond this in the Google Books preview.)
 * "...who in the community spoke it before they died, and that it was somehow different enough to have its own identity." is the rest of the sentence, I managed to coax Google into telling me. The name seemed familiar, as if it had been in one of the wordlists I've been looking at recently, but I just went back over them and searched through various other sources and indeed the only mentions of it I find all just say it's extinct and not recorded; how sad. Removed. - -sche (discuss) 08:19, 10 May 2017 (UTC)


 * In this vein, Makolkol [zmh] is claimed to be extinct (per Wurm 2003, after having 7 speakers in 1988) and apparently unattested (per Stebbins 2010). Harald Hammarström and Sebastian Nordhoff accept this conclusion in Melanesian Languages on the Edge of Asia, but it may be a cautionary tale instead, because an article in LoopPNG from 2016 says five Makolkol still live, and even provides words(!), saying it is related to Simbali: "mam, meaning father, and nan, meaning mother". A 2005 article in Anthropological Linguistics (volume 47, page 77) agrees on the relation to Simbali: " Makolkol (extinct),  is locally understood to have been a 'mixed language' combining Simbali and Nakanai (an Austronesian language on the northern side of New Britain)." I suppose the code should be left alone for now, pending further data. (There were widely varying estimates of how many speakers it had earlier in the 20th century, and fanciful tales of who they were, "headhunters" or "giants" who "lived in trees" and who no white person had survived meeting at first.) - -sche (discuss) 08:43, 10 May 2017 (UTC)
 * (Ergo, code kept.) - -sche (discuss) 01:30, 6 August 2020 (UTC)


 * Maramba
 * Also Maramba (myd) ? (And many more at need to be checked, but some are not spurious, like Ammonite.) - -sche (discuss) 09:51, 3 June 2017 (UTC)
 * myd has been retired by the ISO and hence now also by us. - -sche (discuss) 07:47, 23 February 2019 (UTC)

RFM discussion: July 2016–August 2020
This next batch is of languages from lists other than Ethnologue and LinguistList. As before, I've tried to vet them all beforehand, but I will have doubtlessly made some mistakes. NB if you want to find more: I've avoided dealing with most of the Loloish languages, because all the literature seems to be in Chinese. —Μετάknowledge discuss/deeds 04:54, 6 July 2016 (UTC)


 * (sem-cha)
 * ✅ - -sche (discuss) 02:36, 26 March 2018 (UTC)
 * (sai-cva) ✅
 * (nic-dam)
 * Created as dmn-dam, and entries added based on . —Μετάknowledge discuss/deeds 21:07, 5 August 2020 (UTC)
 * (trk-dkh) ✅
 * (cau-ers)
 * Wikipedia suggests this might be only attested vi placenames. I have seen or created codes in some similar cases, and am willing to create one here, if there are not objections... - -sche (discuss) 00:17, 2 August 2020 (UTC)
 * No! Fake language invented by charlatans. The ridiculous Wikipedia article should be deleted. --Vahag (talk) 03:11, 2 August 2020 (UTC)
 * (sai-gam)
 * ✅. Appendix:Gamela word list can now be moved into mainspace... - -sche (discuss) 04:39, 1 August 2020 (UTC)
 * (cau-jek)
 * Added. - -sche (discuss) 00:17, 2 August 2020 (UTC)
 * Removed following [//en.wiktionary.org/w/index.php?title=User_talk:-sche&oldid=59895318#Caucasian_languages discussion on my talk page] (merged into kry). - -sche (discuss) 07:04, 2 August 2020 (UTC)
 * (son-kaa) — perhaps should be kept merged in Zarma
 * Retracted. I found a SIL report that reports high lexical similarity and intelligibility within the Southern Songhay lects, and that the term "Kaado" is often seen as derogatory. —Μετάknowledge discuss/deeds 06:42, 2 August 2020 (UTC)


 * Australian languages
 * (aus-dap)
 * Wikipedia and various of its cited references and authorities say this and Tulua are the same language, but we don't have a code for Tulua ether, so something should still be added. Wikipedia suggests Tulua is a more-used name,s perhaps aus-tul? (Other names I spot in literature: Toolooa, Dulua, Dapil, Narung, Dandan. AIATSIS also lists Gureng Gureng, Goeng, Koreng Goreng, and Taribulung, which however seem to result from confusing it with .) Wikipedia also, confusingly, says it is extinct but that the Australian Priority Languages Support Project has it as a priority as an endangered language "where there are living speakers". A short search turned up A Dictionary of Non-Scientific Names of Freshwater Crayfishes (1994), which has "wunmeen PARASTACIDAE, "crayfish," prbably = Cherax gladstonensis. [Australia: Queensland; Boyne River, Toolooa or Dandan Aboriginal tribe (=Tulua Aboriginal tribe of Tindale, 1974: 186 [...])] Curr, 1887(III):124, Sta. 161." The cited work, the very old report by Curr (who doesn't provide the tribe name per se, and says the list was given to him in an illegible hand), indeed lists that as a word, and "water" as doomoo and "fire" as wi. - -sche (discuss) 18:53, 1 August 2020 (UTC)
 * ✅ (as aus-tul). - -sche (discuss) 19:20, 1 August 2020 (UTC)
 * (aus-guw)
 * ✅ - -sche (discuss) 23:03, 22 April 2018 (UTC)
 * (aus-mbi)
 * ✅, although AIATSIS says that this is Ethnologue's ... but that is a cluster of lects, of which several already have separate codes... although the ISO and Glottolog seem to split them differently. Messy. - -sche (discuss) 19:20, 1 August 2020 (UTC)
 * (aus-ngk)
 * ✅. - -sche (discuss) 19:30, 1 August 2020 (UTC)


 * Tasmanian languages
 * Western Tasmanian:


 * Northwestern:, Pirapa (aus-pee) ✅
 * Southwestern: (aus-too) ✅


 * Northern Tasmanian:


 * Northern Tasmian,, Tommeeginnee (aus-tom) ✅
 * (aus-psl) ✅


 * Eastern Tasmanian:
 * Oyster Bay (Big River, Paredarerme/Paritarami, Lairmairrener, Lemerina)? - -sche (discuss) ✅ as aus-par
 * Little Swanport? - -sche (discuss) ✅ as aus-lsw


 * (aus-set)
 * (aus-bit) ✅ as aus-bru

, back when I suggested these Australian languages, I included the codes for the Tasmanian languages that Bowern (2012) teased out of various wordlists. At the time, I was ignorant of the fact that there is an ISO code, xtz, for a language called "Tasmanian", and we have a few words in it. There was no single Tasmanian language, so I think this code should be retired and the words sorted into their respective languages by Bowern's scheme. —Μετάknowledge discuss/deeds 05:28, 3 May 2017 (UTC)
 * comments

Other needed codes
Here are other languages we might need codes for: - -sche (discuss) 05:21, 29 August 2016 (UTC)


 * Indanga (Kɔlɔmɔnyi, Kɔlɛ, Kasaï Oriental) (bnt-ind?)
 * It lacks a Wikipedia article but is documented by Jacobs, Texte et lexique indanga (2002). fr.Wikt already has a word from it. OTOH, fr.WP considers it a regional variant of Tetela. And fr.Wikt does have a tendency to treat dialects as language, also splitting e.g. Alsatian German from Alemannic German, Hoanya from Papora, etc. - -sche (discuss) 05:21, 29 August 2016 (UTC)
 * Well, it's definitely part of the dialect continuum known in Guthrie as C.70, which has 8 ISO codes that cover it rather poorly (this is a typical situation with Bantu languages, which really need their own overhaul at some point). I see that its word for "water" is bash in that reference, rather different than Tetela proper ashi. We have to draw lines somewhere, and I can't figure out where Indanga would be merged, so I suppose a new code is in order. —Μετάknowledge discuss/deeds 05:30, 4 October 2016 (UTC)
 * ✅ —Μετάknowledge discuss/deeds 01:58, 5 April 2018 (UTC)


 * Many sources seem to accept the existence of three Naish "languages" and we have codes for the other two (though Na (called here "Narua") lacks a WP article, as the editors there seem unconvinced by its separateness). ✅ (as tbq-laz) —Μετάknowledge discuss/deeds 01:58, 5 April 2018 (UTC)
 * Many sources seem to accept the existence of three Naish "languages" and we have codes for the other two (though Na (called here "Narua") lacks a WP article, as the editors there seem unconvinced by its separateness). ✅ (as tbq-laz) —Μετάknowledge discuss/deeds 01:58, 5 April 2018 (UTC)

RFM discussion: April–July 2020
Van de Velde et al. (ch.2, by Hammarström) say: "Dombe is a derogatory nickname for Tonga found in Hwange district of Zimbabwe". This raises an issue with [toi], which we currently call "Tonga (Zambia)" (to distinguish it from "Tonga (Malawi)" and "Tonga (Mozambique)") — it's spoken in Zimbabwe too. The WP article is called "Tonga (Zambia and Zimbabwe)", which is awfully awkward name for a language spoken by 1.5 million people. I think we may have to cut our losses and stick with "Tonga (Zambia)" even after the merger. —Μετάknowledge discuss/deeds 01:51, 1 April 2020 (UTC)
 * , anyone have feelings about "Tonga (Zambia)" rather than "Tonga (Zambia and Zimbabwe)"? —Μετάknowledge discuss/deeds 20:59, 11 May 2020 (UTC)
 * Reminder that I'm still looking for input on this one. When you see a country name in a parenthetical disambiguation in a language's canonical name, do you assume that the language is exclusively spoken in that country? —Μετάknowledge discuss/deeds 06:35, 15 June 2020 (UTC)
 * At least I would assume it's spoken primarily in Zambia, as in maybe at least 67% of speakers are in Zambia. But why not use subfamilies instead of countries as disambiguators? [toi]/[dov] could be "Tonga (Botatwe)", [toh] could be "Tonga (Southern Bantu)", and [tog] could be "Tonga (Nyasa)". —Mahāgaja · talk 06:41, 15 June 2020 (UTC)
 * "Primarily" is certainly correct here. But the idea of using subfamilies is an intriguing suggestion that I hadn't considered. My only fear is that those subfamilies are not very well known by any terms, as such classification is relatively recent, and to someone unfamiliar with the classification system, the word "Botatwe" would be meaningless and the concept of "Southern Bantu" would be ambiguous. —Μετάknowledge discuss/deeds 06:51, 15 June 2020 (UTC)
 * I would personally prefer the more accurate label Tonga (Zambia and Zimbabwe), even if it's a bit clumsy. But I'm not opposed to keeping it as Tonga (Zambia) if you prefer. What I'm confused about is why you want to merge [dov] and [toi]. The chapter you're referencing splits [dov] as a separate language Toka-Leya, which suggests that language isn't mutually intelligible with Tonga. Smashhoof (Talk · Contributions) 17:25, 15 June 2020 (UTC)
 * Thank you for checking! I read that footnote as meaning that the "Toka-Leya dialects" were to be thought of as dialects of Tonga, but I see that Hammarström does give the sum of Toka, Leya, and "Dombe" its own line as a language. I went to check Hachipola (1991), who did a study on eight Tonga-group lects, including Toka, and says: " Toka informants all insisted that they were all legitimately Tonga. Valley speakers in particular consider their speech as the 'purer' form of Tonga of which Plateau is the corrupted version [MK: note that Plateau is standard Zambian Tonga]. The speakers of Plateau, Valley and Toka can also understand speakers of Ila, but with some difficulty. However, conversation can be carried out between, say, a Plateau speaker and an Ila speaker each using their own speech form. This is also true of Lenje on the one hand and Plateau on the other ". I glanced through the lexical items in the second half of the dissertation and found that these three lects were quite similar, and Toka might be closer to Valley Tonga than to Plateau Tonga. I'm really not sure where to draw the line, especially since we (by default) consider Valley and Plateau to both be [toi]. What do you think? —Μετάknowledge discuss/deeds 19:17, 15 June 2020 (UTC)
 * I'm not sure what "Valley" refers to specifically. But I would follow Hammarström (and Glottolog) by classifying Toka, Leya, and "Dombe" as [dov]. Smashhoof (Talk · Contributions) 20:50, 19 June 2020 (UTC)


 * Renamed [dov] as "Toka-Leya". —Μετάknowledge discuss/deeds 21:03, 25 July 2020 (UTC)

RFM discussion: August 2020
Ashraaf is the sister language to Somali, and currently included in its code despite being mutually unintelligible. Wikipedia has an entry at, despite the fact that they quote two sources that make the case for it being its own language, and their references use the usual spelling with a double a. It is somewhat poorly documented (there is no dedicated grammar), but multiple scholarly sources include vocabulary. I suggest the code cus-ash. —Μετάknowledge discuss/deeds 01:33, 11 August 2020 (UTC)


 * I am inclined to agree/support; Green and Jones' piece outright says, elsewhere from the snippet WP quotes, that "despite all classifications of Ashraaf treating it as a dialect of Somali, our Marka speakers have intimated to us that both Marka/Somali and Marka/Maay intelligibility presents a challenge" and "despite the fact that some classifications treat Ashraaf as a dialect of Somali, Marka and Somali appear not to have a high degree of mutual intelligibility"; they say plurals are almost all different from Somali, and singulars also show some differences. (They mention the otherNames = "Af-Ashraaf", and the varieties {"Marka, Lower Shabelle"} and "Shingani", saying — as you probably saw, but as I will mention here for easier findability later — that for Shingani there is one 'theoretical article' on 'theme construction', Ajello 1984, one short grammatical sketch, Moreno 1953, and one book of pedagogical material, Abo 2007, and for Marka there is an unpublished grammatical sketch, Lamberti 1980, and an article on verb inflection, Ajello 1988.) And I see that Maay already has a separate code, reasonably (the IEL says standard Somali is "difficult or unintelligible to Maay and Digil speakers" unless they've learned it). The only thing that gives me pause is that, as they admit, many prior (albeit non-Ashraaf-specific) classifications have viewed Somali as a single language, as WP claims speakers also view it, despite the lack of mutual intelligibility. - -sche (discuss) 20:46, 11 August 2020 (UTC)
 * That's a matter of linguistic nationalism, coupled with an unfortunate habit of scholars like Lamberti to count everything spoken by people who identify as belonging to Somali clans as "Somali dialects". Maay, Jiddu, Garre, and many others were in this boat, but have all gotten ISO codes, except for Ashraaf. —Μετάknowledge discuss/deeds 21:07, 11 August 2020 (UTC)


 * Added (as cus-ash with an initial entry at ilig; you may want to add diacritic stripping rules to the code and gender to the entry). - -sche (discuss) 04:20, 18 August 2020 (UTC)

RFM discussion: March 2014 – August 2015
I think that linguists consider these to be dialects of Finnish, so that would make these pluricentric standards of a single language. I don't know if keeping them separate would hold any value? 14:05, 15 March 2014 (UTC)


 * Let's ping our active Finnish speakers to see if they have input: User:Hekaheka and User:Makaokalani. 23:16, 23 March 2014 (UTC) (updated - -sche (discuss) 06:09, 6 April 2014 (UTC))
 * The impression I get from the example at Meänkieli is that the differences are very minor, no more than there might be between Croatian and Serbian. I notice systematic loss of -d- and Finnish -ts- corresponds to -tt- in Meänkieli. They definitely look mutually intelligible. Kven looks a little more different, but it might also just be the spelling; I don't know how hard it would be to the average Finnish speaker. 23:26, 23 March 2014 (UTC)
 * I know maybe a dozen words of Finnish, so I can't judge for myself, but the impression I get from the Wikipedia articles is that there's an equal or greater range of variation between dialects in Finland as there is with these dialects- if these dialects were on the other side of the Finnish border, they would probably be considered just part of the normal dialectal variation (I'm sure there are some differences due to their isolation from the influence of standard Finnish, as well). They have special status because they're in Sweden and Norway surrounded by Swedish and Norwegian. Chuck Entz (talk) 23:53, 23 March 2014 (UTC)
 * Finnish wasn't even a single language to begin with originally. There's several dialect groups that form a continuum, but it's not easy to draw clear lines. Savonian (eastern) dialects for example might well be closer to Karelian (considered a separate language) than they are to western Finnish. 00:10, 24 March 2014 (UTC)


 * My impression is the same as Chuck's, that these could be merged. By my (quick) count, we have 11 Meänkieli entries and 14 English entries with Meänkieli translations, and 19 Kven entries and 8 entries with Kven translations. - -sche (discuss) 02:45, 24 March 2014 (UTC)
 * For more information see Finnish dialects and also Peräpohjola dialects. The map to the right may also help. 03:33, 24 March 2014 (UTC)

I somehow missed this discussion when it was active, but better later than never. I have the following comments: --Hekaheka (talk) 12:27, 25 January 2015 (UTC)
 * The map is outdated. There's practically no Finnish-speaking population left in the areas which were annexed by the Soviet Union during and after the WWII. The map on the right is more up-to-date.
 * There's some Ingrian population left in the St. Petersburg area, but their number and share of population (less than 0,5‰ in Leningrad oblast) is drastically reduced due to 1) inflow of Russians to St. Petersburg, 2) Stalin's terror in the 1930's and 3) emigration to Finland between 1990 and 2011.
 * I'm not sure of Kven-speakers, but the speakers of Meänkieli tend to be quite strong in their opinion that they are not Finnish-speakers. It is probably true that if the border were in another place, Meänkieli would be considered a Finnish dialect. But then again, it would hardly be the same language as it is today - it would have preserved less archaic features and there would be much less Swedish influence in it. If ISO regards it a language, how could we be wiser?
 * Meänkieli is an official minority language in Sweden, and is regarded as distinct from Finnish which also has a (separate) minority language status there.
 * "Finnish wasn't even a single language to begin with originally." -- Show me one that was!


 * Let's take a look at our current 15 Meänkieli and 20 Kven lemmas:
 * Meänkieli:
 * Six words indistinguishable from Standard Finnish
 * Two words indistinguishable in shape from Standard Finnish but with dialect-specific meanings
 * Four words with some phonetic peculiarities specific to Northern dialects
 * Two words widespread across Finnish dialects
 * One word that might be specific to the variety, or might be one of the previous
 * Kven:
 * Seven words indistinguishable from Standard Finnish
 * Seven words widespread across Finnish dialects
 * Five words with some phonetic peculiarities specific to Northern dialects
 * One narrow-distribution loanword from Norwegian
 * So yes, . We could well treat these as Finnish dialects, though I think to account for any local neologisms and such, they would deserve categories of their own under Category:Regional Finnish. --Tropylium (talk) 19:55, 28 February 2015 (UTC)
 * I've merged Kven into Finnish, relabelling the handful of Kven entries we had, except nelje and kahðeksen, yhðeksen and yhðeksentoista, which don't seem to be attested in any language. (kahdeksen and yhdeksen do seem to be attested as regional variants of the usual Finnish terms.) - -sche (discuss) 05:16, 18 August 2015 (UTC)

RFM discussion: April 2018–October 2019
From ‘Molise Croatian’ to ‘Molise Slavic’. It seems most scholars in the field, excluding some in Croatia itself, have been switching to the latter name; see, for instance, the papers of Krzysztof Borowski or the more recent works of Walter Breu. The endonym is or, without any specific ethnonational designation; the other names are apparently recent impositions. For this see for instance Sujoldžić 2004: "Along with the institutional support provided by the Italian government and Croatian institutions based on bilateral agreements between the two states, the Slavic communities also received a new label for their language and a new ethnic identity — Croatian — and there have been increasing tendencies to standardize the spoken idiom on the basis of Standard Croatian. It should be stressed, however, that although they regarded their different language as a source of prestige and self-appreciation, these communities have always considered themselves to be Italians who in addition have Slavic origins and at best accept to be called Italo-Slavi, while the term »Molise Croatian« emerged recently as a general term in scientific and popular literature to describe the Croatian-speaking population living in the Molise." Information about current scholarly usage is given by Walter Breu in the request for an ISO code here: "Slavomolisano: In scientific work, this name is predominantly used, either in its Italian form or in translations. As the language is 'genetically' affiliated to the Serbo-Croatian macrolanguage with its dialectal continuum and the problems of its segmentation, a denomination, referring to one of the individual Standard languages of this group, e.g. Croatian or Bosnian, should be avoided, the more so, as its individual character is mainly due to the language contact with Italian and its dialects, especially that of Lower Molise." Ethnologue, Glottolog, and SIL (as well as Wikipedia) all followed the ISO’s lead and list the language under ‘Slavomolisano’, the Italian form of ‘Molise Slavic’ (which would also be fine). AFAIK I’m the only contributor to Wiktionary in this language to date. — Vorziblix (talk · contribs) 19:57, 15 April 2018 (UTC)
 * If most scholars as well as Ethnologue, Glottolog, SIL, and Wikipedia all call it Slavomolisano, shouldn't we do the same, rather than call it Molise Slavic? —Mahāgaja <small style="font-size:85%;">(formerly Angr) · talk 11:46, 19 April 2018 (UTC)
 * Sure, that would also be fine. The two are used pretty much interchangeably in scholarly publications (so the ISO note says ‘either in its Italian form or in translations’). It’s a bit of an odd situation, given the most prominent scholar in the field (Breu) submitted the language under ‘Slavomolisano’ to the ISO (hence the adoption by all the other organizations) but uses ‘Molise Slavic’ in his own recent English publications. But if you think it’s preferable to directly follow Ethnologue and the others I wouldn’t object. — Vorziblix (talk · contribs) 12:57, 21 April 2018 (UTC)
 * I’ll start switching this over to ‘Slavomolisano’ one of these days if no one has any objections. — Vorziblix (talk · contribs) 03:04, 24 September 2019 (UTC)
 * Done. — Vorziblix (talk · contribs) 16:05, 17 October 2019 (UTC)

RFM discussion: March 2019–February 2021
I believe that Category:Nama language should be renamed to Category:Khoekhoe language. The Nama are an ethnic group, but other ethnic groups like the Damara, Haiǁom, and ǂĀkhoe also speak the same language. It's more accurate and inclusive to call it Khoekhoe. Smashhoof (talk) 18:44, 18 March 2019 (UTC)
 * On Google Books, "Nama language" and "Khoekhoe language" are about equally common. This is misleading, however, because many of the hits refer one of multiple Khoekhoe languages, as the term is used somewhat more broadly by some authors. Further evidence of this is supplied by "speak Nama" being much more common than "speak Khoekhoe" on Google Books. If we restrict ourselves to linguistic literature, "Nama language" is significantly more common than "Khoekhoe language".
 * In short, we do sometimes use a less common name for a language in order to disambiguate or avoid a name widely considered offensive. The name "Nama" is less inclusive, but also slightly less ambiguous, and seems to be the most common name in usage, pace Wikipedia. —Μετάknowledge discuss/deeds 18:52, 18 March 2019 (UTC)


 * The native name is Khoekhoegowab (literally "Khoekhoe language"). "Nama language" may have been more common in the past, but the standardized language today is called "Khoekhoe" or "Khoekhoegowab". The dictionary I have says that "Khoekhoegowab is the Language of mainly the Damara, Haiǁom and Nama." In The Khoesan Languages (2013, Routledge Language Family series), they exclusively refer to the language as Khoekhoe; however, they distinguish between Namibian Khoekhoe (Nama/Damara) and Haiǁom/ǂĀkhoe. "Khoekhoe(gowab)" does seem to be a more accurate and preferred term to me. Smashhoof (talk) 21:43, 18 March 2019 (UTC)
 * If you search a digital copy of The Khoesan Languages, you'll see that different authors use different terminology in the book. The sections concerning the language in question call it [well, usually just Kh. for short], which is far less common than either  or  and little used in non-scholarly English. —Μετάknowledge discuss/deeds 00:58, 19 March 2019 (UTC)
 * Looking in my copy of The Khoesan Languages, I see a few usages of "Khoekhoegowab", but "Khoekhoe" seems to be used more. "Khoekhoe (N.Kh.) is strictly a suffixing language...", "Khoekhoe categorizes nouns according to ...". The whole morphology section on the language seems to use "Khoekhoe". The syntax section uses "N.Kh." for "Namibian Khoekhoe(gowab)". Regardless, since the standard language is called Khoekhoegowab, I think it would be best to call it Khoekhoe, as that seems to be equivalent and more common than the full Khoekhoegowab in English. Also, Wikipedia uses the term Khoekhoe (see Khoekhoe language), so it would also make sense to keep the same terminology between wikis. Though, that article does use the term "Nama" more often than "Khoekhoe", which is a bit odd given the title is "Khoekhoe language". Smashhoof (talk) 02:49, 19 March 2019 (UTC)


 * I asked someone who knows more about this and they said that Khoekhoegowab is the standard language, taught in schools and used in media, but Nama/Damara is the colloquial spoken language. Locals refer to it as Damaranama, Namadamara, Namataal, or Namagowab. Smashhoof (talk) 23:06, 18 March 2019 (UTC)


 * ✅: Category:Khoekhoe language. —Μετάknowledge discuss/deeds 02:05, 22 February 2021 (UTC)

RFM discussion: September–November 2020
I would like to propose splitting HWC (Hawaiian Pidgin) off of EN (English) due to the following reasons:

1. The discussion in Template talk:hwc is outdated. It took place back in 2012/2013. This is before the US Census recognized Pidgin as a separate language in 2015 (2-3 years later). As such, the general linguistic view of HWC's status may have changed as a result of federal recognition.

2. Many of the arguments these people used were flawed (at least in my opinion).

- They implied Hawaiian Pidgin shouldn't be classified as a creole due to its high levels of mutual intelligibility with Standard English, even going as far as to compare it with "gangsta slang" from NYC or LA. The problem with this argument is that intelligibility doesn't define a creole. Creoles are naturalized pidgins, and pidgins are: "an amalgamation of two disparate languages, used by two populations having no common language as a lingua franca to communicate with each other, lacking formalized grammar and having a small, utilitarian vocabulary and no native speakers". That perfectly defines the origins of Hawaiian Pidgin. It formed as a result of many immigrant groups learning English. In fact, Tok Pisin has a very similar origin.

- The central claim that Hawaiian Pidgin can be understood by English speakers is only a half-truth to begin with. Many English speakers say they understand "Pidgin", but that's because the majority of Pidgin speakers mix a lot of their speech with Standard English, even more so in formal contexts. Full-on Pidgin can be a tough nut to crack. Furthermore, it's interesting that they use the argument of mutual intelligibility on Wiktionary, the same website that lists Bulgarian and Macedonian as separate languages. Those two languages are more similar to each other than Standard English is to Hawaiian Pidgin, yet for some reason, the BUL-MAC languages are separated meanwhile HWC is listed as dialect of EN.

3. Even in Wiktionary, it gets referred to as a "creole language". In the page Hawaiian Pidgin, it literally says "A creole language based in part on English used by most "local" residents of Hawaii. It's kind of odd that Wiktionary defines it as a creole language, but doesn't actually list it as one.

These are my main reasons why I do think Wiktionary should split hwc off of en. — Coastaline (talk) 00:25, 11 September 2020 (UTC)
 * I started that discussion in 2012, and I don't really agree with how I approached it then. Hopefully I can do a better job now. Your first and third points aren't really relevant here: the political status of hwc doesn't have any practical bearing on its linguistic status, and our entry has no authority over our language treatment; it could well be changed. But your second point gets to the crux of the matter, and I am no longer certain how to proceed. Ultimately, we need to identify texts or recordings that we can agree are "pure" hwc, as opposed to being mixed with (colloquial) English. You mention Bulgarian and Macedonian; one reason it's easy to separate them is that there are lots of texts in each, clearly belonging to one or the other. Mutual intelligibility may be one of our best guides if we can't even decide what to put in which category. —Μετάknowledge discuss/deeds 02:45, 11 September 2020 (UTC)
 * My point wasn’t that the political status changes the linguistic status. I tried to say that federal recognition was a result of linguists changing their attitudes towards HWC, since the government wouldn’t randomly recognize it out of nowhere if it weren’t for linguistic perception. That’s why I thought the discussion needed to be revamped because there’s likely been a change in linguistic consensus from 2012 to 2020, if we’re seeing different developments like this.
 * And fair point about the third argument. But regarding the second one, I do think there are texts that are in pure hwc, such as the Hawaiian Pidgin Bible (Da Jesus Book). The book “Pidgin Grammar: An Introduction to the Creole English of Hawaii” also showcases pure Pidgin (although the explanations are in English). — Coastaline (talk) 04:27, 11 September 2020 (UTC)
 * Just to clarify: even if there's consensus to reinstate Hawaiian Pidgin as a separate language, the template hwc will not be recreated, because that's not how we work with languages anymore. Rather, an entry for  will be added to Module:languages/data3/h. I don't know anything about HP beyond what Wikipedia tells me, but if the entire Bible is written in the language illustrated by File:Hawaii Pidgin crop.jpg, then I do think it's appropriate to call that a separate language rather than a variety of English. It may not be as different from standard English as Bislama is (as Meta said 8 years ago), but I do get the impression that it's about as different from it as Jamaican Creole is, one big difference being that Jamaican spells things phonetically while Hawaiian sticks closer to standard English orthography ( vs. HP ), which makes HP easier to read for English speakers. —Mahāgaja · talk 15:24, 11 September 2020 (UTC)
 * Poking around the copy Google Books has snippets of, it does seem to all be like that, including e.g. "But wen da police guys wen come to da jail, dey neva find Jesus guys ova dea. So da police guys wen go back an tell da leadas: 'we wen find da jail door still yet stay lock, an da security guard standing by da doors, but wen we wen open um, nobody stay inside.' Wen da leada guys hear dat, da captain fo da temple guard an da main priest guys wen come mix up, an dey wen tink plenny wat goin happen." (I also found a book called "Pidgin to Da Max" which mentions a number of loanwords from Samoan, etc, and has its own usexes like "aalas dollahs means no mo' kala", and suggests the pronunciation of some words differs from English even when the spelling does not.) On one hand, it seems mostly intelligible, and there are substantially less intelligible dialects of English that we clearly treat as such; on the other hand, I see the argument above that it has a different origin. I have no strong opinion, but will say that on a practical level we do a poor job of covering most things we've merged into English, often having only a little vocabulary, little information on e.g. how verbs conjugate, and few usexes, since putting "wen da police guys wen come to da jail, dey neva find peopo" as a usex under police (as long as Hawaiian Pidgin is considered English) would probably be frowned on as inappropriately niche. - -sche (discuss) 09:42, 12 September 2020 (UTC)
 * Thank you for sharing your thoughts. I’d just like to mention that earlier, Metaknowledge said Bulgarian and Macedonian are separate languages since both lects can be easily identified in writing. But I do in fact believe Pidgin can be identified as well. Sure, not all Pidgin books spell things the same way but it generally keeps basic rules (such as “where” being “wea” and “the kind” being “da kine”). Pidgin writing as a result should be easily identifiable from American English. It’s kind of like Neapolitan, since it’s easy to identify it from Standard Italian but not everyone spells its words the same way, even among those who speak the same dialect of Neapolitan. Much like Pidgin, it’s not standardized but it still can be differentiated. — Coastaline (talk) 01:06, 13 September 2020 (UTC)

HWC Recognition (take 2)
, In September, I requested that Wiktionary should add hwc as a language. However, the discussion died down and the status quo continued. Once again, I'd like to propose recognizing hwc as a separate code from en. The main counter-argument to my first post was that there's no agreed-upon way of separating pure Pidgin from Standard English. I actually disagree with this claim, because material in pure Pidgin does exist. The entire new testament was translated into Hawaiian Pidgin, and they're actually close to finishing the old testament. If you go to a Hawaii bookstore or search in Google Books, you can find more literature in pure Pidgin. The spelling tends to be uniform for most common words, with "where" being "wea" or "the" being "da". "Hamajang" and "da kine" are also uniform, and so are most Pidgin-exclusive terms. Spelling variations may occur but I don't think it stops it from being listed as a language. Wiktionary lists Neapolitan as one, yet spelling in that language is not standardized and many variations of the same word/pronounciation occur.

So what are your thoughts? Also, in case there's any confusion, Coastaline was my previous account. I switched to this current one since I recently recovered it. ― Haimounten (talk) 21:15, 16 October 2020 (UTC)
 * These discussions can sometimes take a while. I'd encourage you to merge this into the original conversation and ping people who contributed, rather than creating a new section, which may confuse people new to this conversation. I think I now lean toward reinstating the code, but I still want to hear others' positions. —Μετάknowledge discuss/deeds 21:44, 16 October 2020 (UTC)
 * I apologize. I will merge it with the previous discussion. ― Haimounten (talk) 22:24, 16 October 2020 (UTC)
 * I’m going to ping others just in case this discussion got missed.  — Haimounten (talk) 01:57, 3 November 2020 (UTC)


 * It's a grey area: much of the vocabulary is either identical to, or readily intelligible to a speaker of, standard English, and because Wiktionary is a dictionary (with entries for individual words) rather than a collection of long texts written in the lect (the way a hwc.Wikisource or even hwc.Wikipedia would be), differences in things other than vocabulary (such as grammar) are relatively less prominent / significant. However, I'm inclined to treat it as its own language (a) because of the argument that its origin suggests it's a distinct lect rather than a variant of English, and because treating it as its own thing would be consistent with how we treat Jamaican Creole or West African pidgin(s) as their own lects, and (b) because we're not meaningfully covering it as English, and I can't see us starting to: I can't see anyone putting up with wen da police guys wen come to da jail, dey neva find peopo as a usex on police, and we don't do a good job of indicating when a "general English" word is also used in a dialect, so while peopo might have its own entry with en, it's unclear how anyone would know that e.g. police was used in Pidgin, since we wouldn't label it en. And we have someone who wants to add content in it and it does have its own ISO code, which we're no longer sure of our decision to retire. - -sche (discuss) 05:26, 3 November 2020 (UTC)
 * I'm still in favor of including hwc as a separate language, if for no other reason than that "wen da police guys wen come to da jail, dey neva find peopo" is not a sentence of English, not even a nonstandard variety. —Mahāgaja · talk 12:48, 4 November 2020 (UTC)
 * OK, I [//en.wiktionary.org/w/index.php?title=Module%3Alanguages%2Fdata3%2Fh&type=revision&diff=61037531&oldid=60437522 restored] it, under code hwc and name "Hawaiian Creole" (since Ngrams suggests that's more common, and WP suggests it's also more correct, than Hawaiian Pidgin). I will say, though, there are things less "English" than "wen da police..." which we treat as, like various dialects of England. (To pluck two examples out of the texts in the EDD, "A dooat like t'thowts a bin ower gyversum an hankeran eftre it", "Lükee zee tü 'er, 'er'th agot a rat! My eymers, 'ow 'er shak'th 'n!" These are far from the most extreme examples, just two examples I found in a quick search.) - -sche (discuss) 17:57, 4 November 2020 (UTC)
 * (Notifying that the code has been restored.) - -sche (discuss) 17:59, 4 November 2020 (UTC)
 * I would not necessarily be opposed to creating language codes for some traditional dialects of England as they are certainly as different from standard English as Scots and Yola are. —Mahāgaja · talk 19:22, 4 November 2020 (UTC)

RFM discussion: October 2018–February 2021
We currently call this "Nhengatu", but is where we've put our actual lemma for the language name, and it does seem to be more common in English per BGC. —Μετάknowledge discuss/deeds 19:09, 21 October 2018 (UTC)
 * Support. - -sche (discuss) 17:59, 14 November 2018 (UTC)


 * ✅ —Μετάknowledge discuss/deeds 00:23, 22 February 2021 (UTC)

RFM discussion: August–October 2020
The names of most languages which belong to the Northern branch of the Jê family as they are now are not the ones actually in use (either by linguists or by their speakers), and for a couple of languages even Glottolog (let alone Ethnologue) gets it right. I suggest:
 * renaming Suyá (suy) as Kĩsêdjê (of which Suyá should be an alias);
 * renaming Kayapó (txu) as Mẽbêngôkre (of which Kayapó and Xikrin are two dialects) — if this is deemed too diacritic-heavy, Mẽbengokre is also acceptable, but Mẽbêngôkre is the standard in the field nowadays;
 * renaming Pará Gavião (gvp) as Parkatêjê;
 * renaming Apinayé (apn) as Apinajé;
 * creating codes at least for Pykobjê and Tapayuna;
 * creating a code for Proto-Northern Jê, which would be a daughter of Proto-Jê and an ancestor of all the aforementioned languages as well as Krahô and Canela.

I would also be happy to know if I can make any of the above changes myself without being an admin (I am new to Wiktionary so not everything is clear to me yet). Degoiabeira (talk) 01:19, 23 August 2020 (UTC)


 * In answer to the last question, on a technical level you can't make these changes without being an admin or Template Editor, since they require changing Module:languages. But at least three admins have eyes on this now (User:Mahagaja, User:Metaknowledge, and me) so any changes that need to be made will get made after discussion here. :)
 * As for the suggested renames: AFAICT Suyá is much more common than Kĩsêdjê (or Kisedje) overall, though most occurrences refer to the tribe, and many of the occurrences which refer to the language are from 25+ years ago, as you brought up on Mahagaja's talk page. Even then, judging by, the two names seem roughly equally(?) common in books from the last 25 years. On one hand, Kĩsêdjê being the autonym speaks in favor of it; on the other hand, Suyá being historically more common speaks in favor of it. (From the perspective of ease of input, both names contain diacritics, so neither is per se easy to type.) The situation with Kayapó / Mẽbêngôkre is much the same.
 * For Parkatêjê, it seems like that name might indeed be most common, aided by the fact that there are so many alternatives ("Gavião Perkatêjê", bare "Gavião", "Gavião do Pará", etc) that no one name is that common AFAICT.
 * For Apinayé, judging by both books and scholarly journal articles, Apinayé seems to be the most common name even in recent sources, and it is also most common among the sources Glottolog lists which are specifically about the language and which use one spelling or the other in their names, so that is fine as-is and should not be renamed.
 * I will look into Pykobjê and Tapayuna.
 * Creating a code for Proto-Northern Jê seems like a good idea (there are reconstructions of it), especially if you will be adding entries in it or mentioning it in etymologies of other words. :)
 * Pinging User:Ungoliant_MMDCCLXIV, in case you have anything to add.
 * - -sche (discuss) 07:41, 29 August 2020 (UTC)
 * I assume the other admins I mentioned above are following this, having participated in it when it was on the user talk page of one of them, but I'm going to re-ping User:Ungoliant MMDCCLXIV because I recall that pings must(?) be added in the same line as signatures in order to work. - -sche (discuss) 07:48, 29 August 2020 (UTC)
 * The mention doesn't have to be in the same line as the signature, but mw:Manual:Echo seems to say that it has to be in a chunk of added lines that contains a signature. — Eru·tuon 19:22, 31 August 2020 (UTC)
 * Not much more than what you’ve researched yourself. Ultimately each language will have to be researched individually to find its ideal canonical name. I do recall reading that our canonical name of a native Brazilian language was considered an ethnic slur by the people who speak it. I think it was one with a bird’s name, maybe Gavião or Urubu Kaapor.
 * I’ll add that I believe the main criterion in choosing a canonical name should be the name used in English-language media only. If multiple English-language names exhibit roughly the same amount of use, only then should its status as an autonym or not be considered as a “tiebreaker” (as should practical concerns such as the non-use of diacritics). — Ungoliant (falai) 22:21, 29 August 2020 (UTC)
 * Thanks for looking into it. The situation with Kayapó vs. Mẽbêngôkre is arguably not the same, however: the label Kayapó has been ambiguously used for designating both (i) a specific Mẽbêngôkre-speaking people (that is, to the exclusion of the Xikrin) as well as the dialect they speak and (ii) as a synonym of Mẽbêngôkre (that is, encompassing both the Kayapó and the Xikrin). In this sense, Mẽbêngôkre is preferrable because it helps avoid this ambiguity. Kayapó and Xikrin could be listed as varieties of Mẽbêngôkre (perhaps as different between each other as are Quebec French and Hexagonal French). Regarding the remaining comments, I realize I might be an interested party (I've been doing comparative and some descriptive work on Northern Jê for ~6 years and I've heard too many linguists' and native speakers' comments regarding their preferred ways to spell the names of these languages), so I guess I'll just wait for a consensus among unbiased Wiktionarians to form. I can also confirm that the Proto-Northern Jê reconstructions you might have seen out there are most probably mine and yes, I would be willing to add the respective entries (guess it's not a problem as long as the reconstructions are published in peer-reviewed outlets). Degoiabeira (talk) 18:43, 4 September 2020 (UTC)


 * I got a bit distracted, but returning to this, I added a code for the family Northern Jê, sai-nje, and a code for the reconstructed language Proto-Northern Jê, sai-nje-pro. You may already be familiar with how to use them, I don't know, but if not: reconstructions in it would go the Reconstruction namespace (Reconstruction:Proto-Algonquian/askyi is an example entry in that language), and are mentioned in etymologies in the way you see in e.g. hàki, while the family could can be used if you don't know the precise language (as in sagamité). - -sche (discuss) 10:12, 12 September 2020 (UTC)
 * I've also added a code for Tapayuna, sai-tap and as sai-pyk. - -sche (discuss) 10:41, 12 September 2020 (UTC)
 * Brilliant, thank you! Degoiabeira (talk) 04:56, 5 October 2020 (UTC)

RFM discussion: March–July 2021
Hi, I propose renaming Chuvantsy to Chuvan. The pro of this renaming is the consistency between the WM projects (both and Wikidata favour this name). Furthermore it's a more simple name, that is easily deductible from. I don't personally see any downside to this, since all the primary (descriptive) sources on this language are written in Russian, and the term Chuvan seems to be just as, if not more, often in use as Chuvantsy. Thadh (talk) 23:43, 22 March 2021 (UTC)
 * Renamed to "Chuvan". —Μετάknowledge discuss/deeds 23:13, 21 July 2021 (UTC)

Judeo-Arabic
(See Category_talk:Arabic_language.) - -sche (discuss) 04:24, 3 October 2021 (UTC)

RFM discussion: August 2020–July 2021
It's time to start chipping away at the mess that is our Berber codes. Wikipedia considers the to be spoken in three oases: Sokna, Fogaha, and Tmessa. According to van Putten, the claim of any Berber variety spoken in Tmessa is wholly unsubstantiated. That leaves Sokna and Fogaha, which were each documented by different Italian scholars using their problematic style of notation, but do seem to be significantly different nonetheless. I suggest that we add a code ber-fog for Fogaha (variously spelled "Fuqaha", "Foqaha"; the definite article is often tacked on, including by its only documenter Paradisi, who wrote "El-Fogaha", but this seems contrary to one's expectations when it comes to alphabetisation). —Μετάknowledge discuss/deeds 07:39, 10 August 2020 (UTC)


 * Support. (I can even find such spellings as "Fojaha", "Al-Fojaha", "Fodjaha",...) Off of the immediate topic, but re Tmessa: I can find quite a few overviews which mention Tmessa, but AFAICT none provide detail about it. The Oxford Handbook of African Languages (Rainer Vossen, Gerrit J. Dimmendaal, 2020), page 285, says it is merely undocumented, not nonexistent, as part of a lengthy lament about Berber ("an unconvincing classification has been provided by Aikhenvald and Militarev (1991), [...] at many points this classification seems to be arbitrary. Moreover, at points it classifies dialects which are fully undocumented (e.g. Tmessa in Libya) or which are not Berber at all (e.g. Tadaksahak, which is Northern Songhay, and the Kufra oases, which are Teda-speaking). Unfortunately, some of the main lines of this classification have been taken over by the Ethnologue"). I reckon these changes are fine, though, since if any documentation subsequently emerges attesting Tmessa, it'll either show that it should be (re)included in an existing code or given its own. - -sche (discuss) 21:28, 11 August 2020 (UTC)


 * Here's a blogpost that explains the deal with Tmessa. As for the spellings with (d)j, they merely betray people who have made a faulty transcriptional assumption because they can't actually read the Arabic script... —Μετάknowledge discuss/deeds 22:17, 11 August 2020 (UTC)


 * Added. —Μετάknowledge discuss/deeds 06:06, 22 July 2021 (UTC)

RFM discussion: August 2020–August 2021
The current state of our Berber (Amazigh) codes is the ISO codes without any modifications made to their scheme, depsite its many problems. The Berber languages have comparable diversity to Romance, but specialists are distinctly uncomfortable with the idea of dividing up what is mostly a dialect continuum into over 25 languages, as Ethnologue does, despite the fact that nobody disputes this practice for Romance (even though on grounds of mutual intelligibility, it's hard to say why, say, Galician and Portuguese should be considered separate). For the most part, I think we should follow the Romance practice and be (mild) splitters; in cases like Tashelhit and Central Atlas Tamazight, you wouldn't gain much by merging (as most entries would still be on separate pages), and the remarkable internal dialectal diversity in these "languages" means that good coverage will still require a great deal of dialect-tagging. I have worked through the Berber classification used by Kossmann (2013) below, using his numbered "blocks", with my notes on changes to our scheme that I recommend.

Feedback would be appreciated, especially on the naming of North Saharan Berber. Note that I intend to give etymology-only codes for all the dialects that end up getting merged. —Μετάknowledge discuss/deeds 22:04, 10 August 2020 (UTC)
 * 1) Two languages, Zenaga [zen] and Tetserret [tez], both separate from the continuum and each other. No changes needed.
 * 2) The Tuareg group, which has a macrolanguage code [tmh], but also codes for Tahaggart [thv], Tawallammat [ttq], Tayart [thz], Tamasheq [thq]. Tamasheq is meant to encompass both Adagh and Taneslemt, and the sole sedentary Tuareg lect, Ghat (spoken in Libya) is missing a code. I struggled to find data on mutual intelligibility; Adam (2017) indicated that speakers from Ghat had a lot of trouble understanding non-Libyan dialects, but that this might be a recent phenomenon. There is a written Tuareg standard using Tifinagh, although it does not indicate vowels and does not seem widely used any more. Clearly, the current state is redundant, and I think collapsing into a single "Tuareg" would be workable, but perhaps not ideal.
 * 3) South-Central Moroccan group, Tashelhit [shi] and Central Atlas Tamazight [tzm], both written in Tifinagh. There is also a poorly defined Judeo-Berber [jbe], which I don't want to get into (but reminds me of the issues with Judeo-Arabic). These are all fine, but ISO also gave a politically-motivated code for "Standard Moroccan Tamazight" [zgh], which is a redundant literary koiné that I have already posted a request to delete above.
 * 4) Northwestern Moroccan group, Ghomara [gho] and Senhaja de Srair [sjs]. No changes needed.
 * 5) The Zenatic group, and here is where things get messy.
 * 6) *The existing codes in the West are Tarifit [rif], Chenoua [cnu], Tachawit [shy]. Missing are Iznasen (sometimes considered "Eastern Riffian", so presumably merged under [rif]) and Eastern Central Atlas Tamazight (a kind of contact language between [tzm] and [rif], presumably merged under [tzm]).
 * 7) *In the Moroccan and Algerian oases, Tagargrent [oua], Tugurt/Temacine [tjo], Taznatit [grr], Tumzabt [mzb], and Tidikelt [tia] (which is almost undocumented!) all have codes, thus excluding Sud-oranais (principally the Figuig dialect, which has a grammar). Mutual intelligibility in this cluster of dialects is known to be high, and this seems ridiculously oversplit; I would recommend one code, although I'm not sure what to call it. Ethnologue calls it "Mzab–Wargla", which are only two of the dialects and is obviously not great, while Kossmann calls it "Northern Saharan oasis" Berber, which is highly clunky, and my "North Saharan Berber" is a protologism.
 * 8) *In the East, there is a single code for Sened [sds], an extinct dialect in Tunisia, and no codes for the extant Tunisian varieties (principally Djerbi), nor for the closely related Zuwara dialect over the border in Libya. I recommend that [sds] be repurposed as "Tunisian Berber", and a new code be added for Zuwara (also known as "Zuaran"), which has a grammar by Mitchell; this could be considered oversplitting, but it would be odder for "Tunisian Berber" to have entries that are almost all from Libya.
 * 9) The remaining languages of the East belong to various blocks, and are mostly covered appropriately: Kabyle [kab], Nafusi [jbn], Siwi [siz], Sokna [swn], Ghadames [gha], and Awjila [auj]. The only one missing is (the presumably extinct, and thus finite) Fogaha, which I recommended for a new code in its own section above.


 * Re North Saharan Berber: I think naming it "Northern Saharan Berber" is OK; we're not inventing the grouping, it's a descriptive name, and it's similar to and simply trims an excess word off of an existing name used in literature. - -sche (discuss) 21:06, 11 August 2020 (UTC)
 * Reviewing Tuareg further, I see that Heath (2005) refuses to take a stand: "it is very difficult to decide whether we are dealing with a single "Tuareg" language (with many dialects), or two or more languages (each with some internal dialectal variation)." That said, he endorses the four-way distinction (where "Tahaggart" should be renamed to "Tamahaq" to include the ), supports the bundling of Tamasheq, and makes no mention of Ghat, presumably lumping it into Tamahaq as well. The 4 lects with codes all have good dictionaries (ttq and thz share a dictionary, but all forms are marked for which lect they belong to), and 3 of the 4 have grammars, so they won't be hard to sort out or attest. As a result, I now definitively lean toward retiring the macrolanguage and keeping the split codes. —Μετάknowledge discuss/deeds 06:26, 26 August 2020 (UTC)
 * I still haven't dealt with this, and I found reason to merge Tuareg instead of splitting it. Sudlow (2008) says: "Tamasheq has split into three major dialects which the Tamasheq themselves refer to as ‘sha’, ‘za’ and ‘ha’, or Tamasheq, Tamajeq, and Tamahaq." His grammar itself supports the utility of that view, as it treats two dialects spoken together (often in a mixed form) in Burkina Faso, which would have to go under two different language codes in the splitter scenario. We would have to dialect-tag our entries carefully, of course. —Μετάknowledge discuss/deeds 23:33, 8 March 2021 (UTC)
 * Kossmann, who is a leading Berberist, offers this judgement on Tuareg: "In spite of important differences, it would be exaggerated to consider these variants distinct languages." This makes me feel very secure in merging them, so I will finally get all of this in working order soon. —Μετάknowledge discuss/deeds 19:41, 24 August 2021 (UTC)


 * I have made all the changes recommended in this section. I chose [mzb] as the code for Northern Saharan Berber, and preserved the Tuareg dialect codes as etymology-only codes (adding an etymology-only code [ber-ght] for Ghat). I added Zuwara as [ber-zuw]. I also merged Judeo-Berber into Tashelhit, following our practice with Judeo-Arabic (see Category:Judeo-Berber). —Μετάknowledge discuss/deeds 05:37, 25 August 2021 (UTC)

RFM discussion: July–October 2021
We currently call this language "Bangi Me"; the only grammar of it, and most recent scholarly work, uses the spelling "Bangime" instead. —Μετάknowledge discuss/deeds 19:14, 14 July 2021 (UTC)


 * Support. (I notice Appendix:Bangime word list already uses the unspaced form, unlike the entry Bangi Me and categories.) - -sche (discuss) 04:57, 15 July 2021 (UTC)


 * ✅ —Μετάknowledge discuss/deeds 07:27, 2 October 2021 (UTC)

RFD discussion: August–October 2021
Currently there are 2, but none are sourced. ·~  dictátor · mundꟾ  19:03, 26 August 2021 (UTC)


 * That alone is not a reason to delete them. Perhaps will comment. —Μετάknowledge discuss/deeds 03:30, 6 September 2021 (UTC)
 * So long as the child entries are on the Sanskrit entry, I don't mind if they're deleted, not because they're unsourced, but because Proto-Dardic is controversial. -- 06:33, 6 September 2021 (UTC)
 * Delete per
 * Beer_parlour/2020/August
 * Also, Reconstruction:Proto-Dardic/múkʰa- is poorly formatted. Kutchkutch (talk) 11:25, 6 September 2021 (UTC)
 * To clarify, I did not nominate them for deletion simply because of them being unsourced, but because of the language: seeing that some Proto-Indo-Aryan entries have also been deleted lately. ·~   dictátor · mundꟾ  06:56, 7 September 2021 (UTC)


 * As far as the descendants of categorized Proto-Dardic entries are concerned, there are no more any instances of a Sanskrit entry wanting Dardic descendants. We have thus far spotted 4 Proto-Dardic entries ([1], [2], [3], [4]), and all of them are prepared for deletion. ·~   dictátor · mundꟾ  13:57, 6 September 2021 (UTC)
 * Delete Svārtava2 • 14:01, 6 September 2021 (UTC)
 * . ·~   dictátor · mundꟾ  15:35, 27 September 2021 (UTC)
 * Delete. Those entries are unnecessary and involve complete guesswork. We can do without them. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 16:11, 27 September 2021 (UTC)
 * Does this mean that we should not be reconstructing Proto-Dardic at all? If so, the real solution would be to remove the protolanguage code entirely — is that desired? Also, before they can be deleted, someone has to move the descendants. —Μετάknowledge discuss/deeds 22:05, 8 October 2021 (UTC)
 * @Metaknowledge I would be supportive of that. No problem in removing it. I'll move the descendants. Svartava2 (talk) 02:54, 9 October 2021 (UTC)
 * The descendants had already been moved one month ago, but I am not sure if we should get rid of the lang code, as it’s used in the Descendants section. While the Proto-Dardic entries are to be deleted, should we keep the reconstructed forms in the Descendants section?  has removed the terms without prior discussion.  ·~   dictátor · mundꟾ  06:48, 9 October 2021 (UTC)
 * I did see when I thought of moving the descendants that they were already moved. We can simply show  in the descendants, name of the family (cat:Dardic languages) instead of . Svartava2 (talk) 06:54, 9 October 2021 (UTC)
 * Even if the protolanguage code is removed and the reconstructed forms in the Descendants section are not needed, the  label would still be necessary to separate those languages from other Sanskrit descendants. Kutchkutch (talk) 13:23, 9 October 2021 (UTC)
 * Proto Dardic did most likely exist but there aren't enough published sources to cite while creating reconstructions. The study in the area is quite murky, with some authors even confusing them with Nuristani languages (Nuristani languages are geographically close to the Dardic languages but form a separate subfamily within Indo-Iranian unlike Dardic which is part of the Indo-Aryan subfamily). So yeah, Wiktionary would presently lose nothing by removing the  code. It's like having Proto-Bengali-Assamese or Proto-Tamil-Kannada. If some good reliable sources that deal in Proto-Dardic become available later on, we can always add the code back. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴)  13:02, 23 October 2021 (UTC)


 * I have removed the Proto-Dardic language from the module and its entries. Note to archiver: please archive this discussion to WT:LTD. —Μετάknowledge discuss/deeds 20:38, 23 October 2021 (UTC)
 * The errors in Category:Pages with module errors that need to be addressed. Should every instance of the code wiki in descendants trees be replaced by wiki or should this stage be removed altogether? Kutchkutch (talk) 12:43, 25 October 2021 (UTC)
 * @Kutchkutch for now it's ok to change all these to, so that for is little change (just   and recons term/  removed) and because we wouldn't have to change the ordering like if there is "" before Gujarati we'd have to move the, say, Kashmiri descendant after Gujarati one in alphabetical order; hence its easier to be done by bot (most of these replacements were done by ). From now it's up to the editor what to do; like with   — diff and   are both valid and neither is wrong. However I do think that if there is only one desc like it would be better to show that one separate without label wiki. Svartava2 (talk) 15:09, 25 October 2021 (UTC)

RFM discussion: April 2020–March 2021
An absurd language name apparently copied from glottologue. This is not even attestable, nor its synonym Levantine Bedawi Arabic, and no native would ever use this to create an entry. “Bedawi” is = bedouin. The “characteristics” in the Wikipedia article  linked are in every or many Arabic dialects. There are difference in urban and desert-dweller speech everywhere but one sorts these speeches under the regiolects, so it would be Egyptian Arabic, South Levantine Arabic, South Levantine Arabic, Najdi Arabic. Fay Freak (talk) 13:11, 5 April 2020 (UTC)
 * Removed. —Μετάknowledge discuss/deeds 01:39, 29 March 2021 (UTC)

RFM discussion: April 2020–March 2021
Unattestable or SOP language name fudged by the for-profit language database ; the linked Wikipedia pages will always stay stubs. The  sorts this under Algerian Arabic. There is a continuum with Moroccan Arabic. Pinging following our considerations at Talk:زرودية; nobody has ever thought such a category. One can attest the term “Saharan Arabic” but this is of course not meant to denote one language. As distinguished from Hassānīya Arabic one could need codes eastwards for some dialects spoken in the Sahara and Sahel, like for Malian and Nigerien Arabic – Chadian Arabic we have as. Fay Freak (talk) 13:31, 5 April 2020 (UTC)
 * I think the idea here is that Algerian Arabic as spoken in Algiers belongs on a continuum that speakers in the southern and western desert of Algeria (inclusing many who are not ethnically Arab) do not belong on. I don't see the use of this code, though, because that speech definitely counts as Hassaniya. (As for the other countries on the fringe, we seem to count Nigeria and Cameroon under Chadian Arabic, which is less than ideal, but then again, I think that our whole system needs to be more like Chinese to be functional at all.) —Μετάknowledge discuss/deeds 18:18, 5 April 2020 (UTC)


 * Removed. —Μετάknowledge discuss/deeds 01:41, 29 March 2021 (UTC)

RFM discussion: April 2020–March 2021
Hardly attestable language name. Occurrences for it turn out as “Saʿiidi Arabic” for example. “Upper Egyptian Arabic” (I don’t see use of “Upper Egypt Arabic”, but this is a detail) is sometimes posited, variously ill-defined because of superstratum influence from the Nile Delta, but this is usually not accepted as distinct language and can well be an sum-of-parts term; Arabic Wikipedia clearly says Ṣaʿīdīy Arabic is “within the Egyptian Arabic dialects”. There are but some isoglosses dividing Arabic-speaking Egypt latitudinally, but so one can shed urban and bedouin speech and Muslim and Christian speech. English Wikipedia says “the realisation of /q/ as [ɡ] is retained (normally realised in Egyptian Arabic as [ʔ]” but this is the speech of the desert vs. the speech in the largest cities and [ɡ] for can be found in northern Egyptian bedouins. Most natives of Upper Egypt would use the code for Egyptian Arabic, and I would too sometimes with label “Upper Egypt” or more specific labels. I have created the single entry in “Saidi Arabic” and only because there was this code. Fay Freak (talk) 13:54, 5 April 2020 (UTC)
 * Removed, with a new category Category:Upper Egyptian Arabic. —Μετάknowledge discuss/deeds 01:43, 29 March 2021 (UTC)

RFM discussion: April 2020–March 2021
The categories are ill-defined; the second one is unattestable. is a dialectologically complicated area – the material is even partially collected outside of Yemen, as for instance the Dictionary of Post-Classical Yemeni Arabic 1990 is surveyed in Israel – with many isoglosses and the number of distinct languages is multiplied at will if one is oblivious of SOP designations of lects so for this reason, apart from nobody being able to safely use these categories, it is not wise to distinguish at the L2 level already. If you look into the Dialect Atlas of North Yemen and Adjacent Areas published by Brill four years ago and probably also you will forget those categories and won’t be able to map content onto the codes. Apart from that the distinction is inconsistent with the fact that we have a code for “Judeo-Yemeni Arabic”  – as if Jewish speech in Yemen were one language while Muslim speech were three! I just mention here that I find “Judeo-Arabic” and its sub-languages suspicious, it could all be only Classical Arabic or Arabic dialects with some peculiarities written in Hebrew script, remaining from a time when Wiktionary did not use  for this purpose. The Jews might deal with it themselves. Fay Freak (talk) 14:23, 5 April 2020 (UTC)
 * I pinged you in a separate discussion above about Judeo-Arabic. As for Yemen... it is true that the codes are oversplit, but it is also true that the WAD attests to the fact that when Hejazi and Omani speakers disagree about a word, chances are that North Yemen agrees with the Hejaz or Egypt, and that South Yemen agrees with Oman. The existence of Piamenta's dictionary gives me hope for treating Yemen as a unit, but I don't know how he does it (do you have a PDF?). —Μετάknowledge discuss/deeds 18:33, 5 April 2020 (UTC)
 * Nay, I do not. Fay Freak (talk) 18:36, 5 April 2020 (UTC)
 * I find this “South Yemen” and “North Yemen” part difficult and think that one needs an elevation profile of Southwest Arabia to dissect the Arabic dialects of and close to Yemen. Fay Freak (talk) 18:41, 5 April 2020 (UTC)
 * Deboo's Jemenitisches Wörterbuch is another good example of treating all Yemeni varieties together, with scrupulous dialect-tagging. —Μετάknowledge discuss/deeds 01:54, 29 March 2021 (UTC)


 * Merged. I have arbitrarily chosen [ayn] as the new code for "Yemeni Arabic". —Μετάknowledge discuss/deeds 01:54, 29 March 2021 (UTC)

Adding Bwala
This doesn't seem to merit a discussion, but I am putting this here for posterity: I am adding Bwala, a previously undocumented Bantu language, as [bnt-bwa] based on Bollaert et al. (2021). —Μετάknowledge discuss/deeds 20:05, 21 December 2021 (UTC)

RFM discussion: August 2021
We are currently treating Yavapai  as a separate language from Havasupai-Walapai-Yavapai , but some of our yuf entries (e.g. ) are tagged as Yavapai. Is there any difference between nai-yav and the Yavapai dialect of yuf? If not, can we merge nai-yav into yuf? And if we do want them separate, can we rename yuf "Havasupai-Hualapai" (as per Wikpiedia) or "Havasupai-Walapai" rather than keeping a different language's name in there? We only have one entry for nai-yav, namely ke. seems to be the only person who's dealt with these languages in the past. —Mahāgaja · talk 11:07, 30 August 2021 (UTC)


 * Interesting; I dug into the edit histories, and I added Yavapai in [//en.wiktionary.org/w/index.php?title=Module:languages/datax&diff=next&oldid=37420153 February 2016], at which time I also added [//en.wiktionary.org/w/index.php?title=ke&type=revision&diff=37420185&oldid=37119324 a word]. I added [//en.wiktionary.org/w/index.php?title=ha&type=revision&diff=38012019&oldid=37703424 Walapai]-and [//en.wiktionary.org/w/index.php?title=%27ha&oldid=38012016 Yavapai]-specific content under the trinitarian name Havasupai-Walapai-Yavapai a couple months later; I must've been looking to add the Walapai content, saw we had a code for Havasupai-Walapai-Yavapai, since Stephen G Brown had previously added Havasupai-Walapai-Yavapai content like T:yuf-personal pronouns and [//en.wiktionary.org/w/index.php?title=ma&type=revision&diff=20471984&oldid=20227324 entries for] [//en.wiktionary.org/w/index.php?title=nya&type=revision&diff=20471964&oldid=20283850 those pronouns] without dialectal labelling, and forgot about the Yavapai-specific code. Wikipedia has Havasupai-Walapai and Yavapai as distinct but similar languages (a 1978 United States Indian Claims Commission decision says "the Walapai and Havasupai [...] spoke a language and had a culture very similar to the Yavapai"); OTOH, Thomas Sebeok's Native Languages of the Americas, volume 1 (2013), page 467, cites them as an example of ethnic groups that fought "even though [...they] spoke the same Yuman language". Christopher Moseley's Encyclopedia of the World's Endangered Languages (2008) calls the language "Upland Yuman" ("Upland Yuman is a Yuman language"), spoken by the "Hualapai, the Havasupai, and the Yavapai, the last traditionally divided into four regional subtribes. Each community speaks a distinct variety, with the Yavapai varieties forming a well defined dialect, although all varieties are mutually intelligible with little difficulty." I suppose they should be merged. - -sche (discuss) 18:53, 30 August 2021 (UTC)
 * OK, I moved our one nai-yav entry to yuf, deleted all the nai-yav categories, and deleted nai-yav from the language modules. —Mahāgaja · talk 10:39, 31 August 2021 (UTC)

RFD discussion: November 2021–January 2022
Knaanic hasn't been definitively proven to exist as a distinct language yet, and its few attestations (and entries here) seem indistinguishable from standard Slavic languages in Hebrew transcription. Here's a pretty good article by Dovid Katz on the topic. airy—zero (talk) 03:22, 4 November 2021 (UTC)
 * Knaanic is a useful holding category for the scattered attestations of West Slavic words in Hebrew script, which have to be included in the dictionary some way or another. This discussion doesn't belong at RFDO, because there is no chance we will delete all those entries, which pass CFI and definitely belong here. The question is whether we want to merge Knaanic into another language code, presumably Old Czech, although in all honesty, I rather doubt that all the Knaanic attestations can really be called Old Czech in good faith. The difficulty of deciding where they ought to go, and whether people would even think to look for them there, is why Knaanic exists as a separate language here in the first place. —Μετάknowledge discuss/deeds 06:11, 4 November 2021 (UTC)
 * Oh, okay — thanks for the explanation! airy—zero (talk) 12:57, 4 November 2021 (UTC)

RFD-resolved. If you want to merge lects, propose something in WT:BP. &mdash; Fytcha〈 T | L | C 〉 12:13, 22 January 2022 (UTC)

RFM discussion: January–February 2022
@Fish bowl, @Solarkoid, @Lunabunn, @Echo Heo (feel free to tag anyone else). Currently, Early Modern Korean (EMK) is listed as an etymology-only language, limiting its usage as a header and leading all EMK terms to be listed under the "Korean" header with an "Early Modern" label. However, through several discussions on the matter (thank you @ Tibidibi), it's been made more and more clear, that it needs to be separated out from Modern Korean in order for its coverage to be done well. Here are a few reasons (feel free to make any corrections if needed): Overall, having Early Modern Korean as its own L2 would greatly improve its coverage if done well, make life easier for the Koreanic editor community, make our content clearer and more precise for readers, and finally put Wiktionary in line with Koreanic Linguistics as a whole. AG202 (talk) 09:06, 23 January 2022 (UTC)
 * 1) Contemporary Korean Linguistics and Korean dictionaries almost consistently separate out EMK as a separate language. For Korean linguistics done in Korean, the word  is used exclusively for it, and it's analyzed as its own. As for English sources, Cho, Sungdai; Whitman, John (2019), Korean: A Linguistic Introduction and Lee, Ki-Moon; Ramsey, S. Robert (2011), A History of the Korean Language along with several other resources also make the distinction.
 * 2) Early Modern Korean used a rather different orthography compared to Korean with letters like  and  becoming obsolete after that time period, leading Modern Korean speakers without training in the language to struggle to comprehend it.
 * 3) There are terms that are solely used in EMK such as  that deserve to have their own space and coverage rather than thrown under the "Korean" header and left alone.
 * 4) Currently our etymologies are unclear in terms of showing the development of words from Middle Korean (MK) to EMK to Modern Korean (MoK) such as in, or often skipping EMK altogether due to the lack of clarity. Additionally, with words such as , we'd be able to show more clearly the relation between EMK terms and their dialectal descendants.
 * 5) Separating EMK out will allow us to more easily create pronunciation & transcription modules for the language (such as what was done with Jeju in Module:jje-pron & Module:jje-translit), rather than the somewhat inconsistent transcriptions that have been used in the past or attempting to include all the EMK use-cases within the current system.


 * @AG202Strong support on all points. As a way of division, I suggest that:
 * The split between Middle Korean and EMK is generally a hard line at 1600, but the few texts that use MK-only characters may be treated as MK.
 * The split between EMK and modern Korean is more fluid.
 * Texts made by foreign learners of the language in the late nineteenth century and onwards are considered modern Korean, despite being written in an EMK orthography, because they are often the first attestations of archetypally modern Korean words not attested in EMK, such as and.
 * , the language of “modernizing” texts from the period from c. 1890 to 1910, is considered modern Korean despite having EMK orthography, because they are the first attestations of many of the Japanese orthographic borrowings (wasei kango) totally absent in EMK but a defining feature of modern Korean vocabulary.
 * “Traditional” texts from the 1890–1910 period, such as personal letters of provincial people less affected by Westernizing trends or traditional novels untouched by Western and Japanese influence, are still considered EMK. The quotation at, despite the late date, is a typically EMK material and would hence be quoted at.
 * For IPA, I suggest aiming at a 1750 pronunciation, with the assumption that the vowel qualities were the same as in MK.--Tibidibi (talk) 09:39, 23 January 2022 (UTC)

Being that it's been three weeks, the only active EMK editor is in strong support, and there hasn't been any opposition (along with tagging all of the Korean editors in the working group), I'd like to close this as RFM-Split. ,, if you have the time, please. Edit: Apologies for forgetting to sign, and thank you @ J3133, for pinging them for me. AG202 (talk) 01:07, 13 February 2022 (UTC)
 * ✅. Entries need to be moved over manually though &mdash; S URJECTION / T / C / L / 11:32, 14 February 2022 (UTC)

RFM discussion: April–May 2022
According to this ISO change request (Kulon-Pazeh Change Request Documentation), Kulon (uon) and Pazeh (pzh) are now separate languages. How do we update Wiktionary to add data specifically for pzh from now on (given that data is nonexistent for the long-extinct Kulon language). I'm in the process of adding lemmas and example sentences for several of the western Formosan languages. Kangtw (talk) 19:57, 3 April 2022 (UTC)

Chuck Entz (talk) 04:20, 4 April 2022 (UTC)
 * Given that all (uun)-lemmas are actually Pazeh lemmas, the easiest thing would be to just to rename "Kulon-Pazeh" to "Pazeh". –Austronesier (talk) 19:06, 6 April 2022 (UTC)
 * Well, and to recode entries to use the right code. (Is there really no data on Kulon?) Other ISO code changes discussed at Beer_parlour/2022/March btw. - -sche (discuss) 22:16, 6 April 2022 (UTC)


 * Follow-up: Etymology scriptorium/2022/May. - -sche (discuss) 17:52, 7 May 2022 (UTC)
 * See that discussion for more, but: I added uon and pzh; once instances of uun are updated, it can be removed. - -sche (discuss) 20:38, 8 May 2022 (UTC)

RFM discussion: Moving Finno-Ugric families to Uralic families
We don't recognize Finno-Ugric as a valid family; just Uralic. Hence  is a valid code, but   isn't. Nevertheless, we're using   as the prefix for four branches:   for Finnic,   for Mordvinic,   for Permic, and   for Ugric. I propose we use  for these instead, thus moving as follows: At the same time, we should move the codes for the corresponding protolanguages: as well as the code for the etymology-only lect Proto-Finno-Permic: I suppose we can keep  as an etymology-only variant of   if it's important. What do others think? —Aɴɢʀ (talk) 14:29, 23 December 2016 (UTC)
 * That seems like a lot of disruption for a small theoretical benefit: we've always used codes like aus, cau, nai and sai that we don't recognize as families for making exception codes, so it's not a huge violation of our naming logic. In this case, though, it looks to me like we don't recognize fiu more because it's too much like urj, not because it's invalid, per se (though I don't know a lot on the subject). We do have gmw-fri rather than gem-fri, for instance. Of course, I'd rather follow those who actually work in this area- especially . Chuck Entz (talk) 20:59, 23 December 2016 (UTC)
 * I have no opinion on moving around family codes either way (it doesn't seem they actually come up much, whatever they are), but if we start moving around the proto-language codes, I would like to suggest simple two-part codes. Proto-Samic and Proto-Samoyedic are already  and , so is there any reason we couldn't make do with e.g.  ,  ,   etc.?
 * Also, as long as we're on this topic, at some point we are going to need the following:
 * Proto-Mansi:
 * Proto-Khanty:
 * Proto-Selkup:
 * No rush though, since so far we do not even have separate codes for their subdivisions. The only distinction that comes up in practice is distinguishing Northern Khanty from Eastern Khanty (Mansi and Selkup only have one main variety that is not extinct or nearly extinct). --Tropylium (talk) 10:53, 24 December 2016 (UTC)
 * There is, actually, a reason: our exception codes are designed to avoid conflict with the ISO 639 codes, so they start with an existing ISO 639 code or a code in the qaa-qtz range set aside by ISO 639 for private use. fin is one of the codes for the Finnish language. fpr and ugr are apparently unassigned- for now. As for the three proto-language codes, those don't need a family prefix because they already start with an ISO 639 code. Chuck Entz (talk) 13:06, 24 December 2016 (UTC)
 * The only codes in the qaa-qtz range we actually use are  as a prefix for otherwise unclassified families and   for Sahaptin (a macrolanguage that wasn't given an ISO code of its own), right? —Aɴɢʀ (talk) 07:55, 26 December 2016 (UTC)
 * Right. And I didn't notice that we used qot although it is not an ISO code; it seems we followed Linguist List in using it. For consistency, I suggest changing it to fit our usual scheme, so nai-spt or similar (nai-shp is already in use as the family code). - -sche (discuss) 05:10, 27 December 2016 (UTC)


 * Support, for consistency. fiu is different from nai, because fiu is [a supposedly genetic grouping which is] agreed to be encompassed by a higher-level genetic family which also has an ISO code (urj), and that code can be used if we drop fiu. nai and sai are placeholders rather than genetic groupings, and they're useful ones, because If we dropped them we'd had to recode everything as qfa- (and might conceivably run out of recognizable/mnemonic codes at that point). - -sche (discuss) 05:10, 27 December 2016 (UTC)
 * Support per -sche, both the main issue being suggested here, as well as recoding Sahaptin. —Μετάknowledge discuss/deeds 05:57, 19 January 2017 (UTC)


 * I've recoded Sahaptin and all the Finno-Ugric lects except fiu-fin-pro which requires moving a lot of categories, which I will get to later. - -sche (discuss) 17:20, 8 March 2017 (UTC)
 * I can volunteer to carry out the move from fiu-fin-pro to urj-fin-pro if there's still consensus that it should be done (which there is, as far as I can tell). &mdash; S URJECTION / T / C / L / 09:19, 12 January 2023 (UTC)

Now there are lots of module errors in Cat:E as a result of these language code changes. It might be easiest to fix them by bot. , what do you think? — Eru·tuon 22:45, 9 March 2017 (UTC)
 * I'll see what I can do. DTLHS (talk) 23:09, 9 March 2017 (UTC)
 * I've done a bunch of them- I think the reconstructions should be fixed by hand. DTLHS (talk) 23:22, 9 March 2017 (UTC)
 * Three incorrect language codes remain, I think: . I couldn't figure out what   should be; it seems to refer to Proto-Finno-Permic, but I searched various language data modules and didn't find a match. Is there someone who can look through and fix the remaining module errors that relate to incorrect language codes?, , ? — Eru·tuon 04:52, 11 March 2017 (UTC)
 * Proto-Finno-Permic is an etymology-only language (and a kind of a legacy concept) that we encode as a variety of Proto-Uralic, if that helps. --Tropylium (talk) 13:58, 11 March 2017 (UTC)
 * The categories were easy to deal with: you just change the derivcatboiler to auto cat and the template plugs in the correct language code, if it exists. That also makes it a quick way to check whether there is a correct language code. by the time I finished that, there were only a dozen or so entries left in CAT:E due to everyone else's efforts, so I finished off the remainder by hand. It would have been easier if there hadn't been hundreds of other module errors cluttering up CAT:E- yet another reason for you to be more careful. Chuck Entz (talk) 21:10, 11 March 2017 (UTC)
 * I apologise for not catching and fixing those uses at the time I renamed the codes. I searched all pages on the site for each of the old codes, and some pages turned up [including pages where the codes were used inside some templates, and I fixed those pages], so I forgot to also do an "insource:" search to catch other uses inside templates like . We so rarely change language codes compared to changing language names. - -sche (discuss) 23:36, 11 March 2017 (UTC)
 * Moved, some earlier, but Finnic is now also in process of being moved from  to  . &mdash; S URJECTION  / T / C / L / 17:38, 14 January 2023 (UTC)

RFM discussion: New Permic codes
Hi, I would like to request three new codes: Furthermore, the newly made  should be given as the ancestor of   and. The writing system  should be removed from both these languages.
 * for
 * for Old Komi (Writing systems:  and  ; Family:  )
 * for (Writing system:  ; Ancestor:  ; Family:  )

The specifics of the codes' namings can be tweaked if there are any concerns. Pinging. Thadh (talk) 20:18, 16 November 2021 (UTC)
 * We already have  for the Old Permic script. Are you requesting that we change its canonical name to "Anbur"? —Mahāgaja · talk 08:04, 17 November 2021 (UTC)
 * Oh, couldn't find it for some reason... No, in that case it's okay. Thadh (talk) 08:20, 17 November 2021 (UTC)
 * Done. Thadh (talk) 13:17, 15 January 2023 (UTC)

RFM discussion: Mari phylogeny
It seems Eastern Mari is treated as an ancestor of Western Mari. Historically, that makes little sense though, because both are different written standards of the same continuum, so this is about akin to setting Nynorsk as a descendant of Bokmål, Komi-Permyak as a descendant of Komi-Zyrian or setting Livvi as a descendant of Karelian. I think we should set both as direct descendants of Proto-Uralic, or, alternatively, create a code for Proto-Mari. I know too little about the history of Mari languages to say anything sensible on that topic. Anyway, pinging some editors that might be interested,. Thadh (talk) 18:03, 20 February 2022 (UTC)
 * Previous discussions at Beer parlour/2013/September and Language treatment/Discussions. —Mahāgaja · talk 18:25, 20 February 2022 (UTC)


 * I changed the one line in Module:languages/data3/m that derived mrj from chm. AFAICT this is all that needed to be changed? - -sche (discuss) 04:31, 1 June 2022 (UTC)
 * Yeah, that was the minimum requirement. I thought maybe if there were any solid Proto-Mari reconstructions, we could add a code for that, but seeing as nobody answered it's better to leave that for the future. Thadh (talk) 07:18, 1 June 2022 (UTC)

Making a family code
Would anyone oppose moving  to mean "Mari" (as a family), creating   for Proto-Mari and adding the iso-intended code   for Eastern Mari? If I understand correctly, has volunteered to handle the bot jobs if needed. Thadh (talk) 18:52, 7 November 2022 (UTC)
 * No objection from me. —Mahāgaja · talk 18:57, 7 November 2022 (UTC)
 * No objection from me. Looking into the history, it seems we used the macrolanguage code chm for the standard variety (Eastern/Standard Mari) because of this discussion years ago, so I'll ping User:Atitarev who was involved in that old discussion. - -sche (discuss) 19:23, 13 November 2022 (UTC)
 * No objection from me either. Anatoli T. (обсудить/вклад) 22:03, 13 November 2022 (UTC)
 * I'll begin the move probably later today, since there are no objections. &mdash; S URJECTION / T / C / L / 08:57, 11 January 2023 (UTC)
 * Move ongoing. Please archive this discussion to Wiktionary talk:Language_treatment/Discussions, under the heading "Mari phylogeny". &mdash; S URJECTION / T / C / L / 11:34, 11 January 2023 (UTC)

RFM discussion: January 2023
The is a small Sino-Tibetan language spoken in Bhutan. We currently omit the umlaut from the name, but I suspect this is simply because ISO language names don't have diacritics in them. On the other hand, academic texts such as A Grammar of Kurtöp and An Overview of Kurtöp Morphophonemics do use it, and I think we should follow that trend. Theknightwho (talk) 15:57, 4 January 2023 (UTC)


 * Support. Thadh (talk) 16:58, 4 January 2023 (UTC)
 * Support. Vininn126 (talk) 17:01, 4 January 2023 (UTC)
 * Support. — Fenakhay ( حيطي · مساهماتي ) 17:03, 4 January 2023 (UTC)

RFM moved. Theknightwho (talk) 10:51, 14 January 2023 (UTC)
 * —Mahāgaja · talk 11:28, 6 January 2023 (UTC)

RFM discussion: November–December 2022
Literature on the form of common Mongolic spoken between the 13th and 16th centuries increasingly uses the term "Middle Mongol" rather than "Middle Mongolian". This is primarily because it was the ancestor to several extant Mongolic languages, not all of which are spoken by people who we would usually call Mongolians (though they are sometimes considered part of the - especially historically). For example, Buryat (spoken in Buryatia, just north of Mongolia) and Kalmyk (spoken in Kalmykia, around 4,000km to the west of Mongolia). There are several others.

I also feel that this would (slightly) reduce the faulty implication that Mongolian is the "primary" descendant of Middle Mongol, too. To give a comparison, using the name "Middle Mongolian" is roughly equivalent to using the name "Old Russian" for Old East Slavic: obviously less than ideal.

This is supported by Glottolog, and I can provide recent academic sources using the term if needed. Theknightwho (talk) 18:08, 4 November 2022 (UTC)


 * Support, though these proposals are usually done in WT:RFM. AG202 (talk) 18:15, 4 November 2022 (UTC)
 * Hromi duabh (talk) 14:32, 25 November 2022 (UTC)
 * Just wondering when we're changing Old Portuguese to Old Galician-Portuguese for similar reasons...--Ser be être 是 talk/stalk 21:16, 10 December 2022 (UTC)

RFM moved. Theknightwho (talk) 15:26, 23 December 2022 (UTC)

RFM discussion: July 2022–June 2023
As we're moving towards the creation of a more robust coverage of Yoruboid languages, see: agutan for a rough start, we'll need some changes made earlier rather than later. Those are the changes that we need for now, and hopefully we can continue increasing the coverage of these languages. AG202 (talk) 20:30, 6 July 2022 (UTC)
 * "Olukumi" is the name often, if not most used, to describe the language, unsure why Ethnologue and SIL went with "Ulukwumi". Some sources: Entry for Olùkùmi in the Olùkùmi Talking Dictionary, Olukumi Bilingual Dictionary, Ngram.
 * Lucumi (luq) should be changed to have its name have the accent mark on the "i" becoming "Lucumí", and then being that it is a liturgical language derived from Yoruba, it should have Yoruba as its ancestor.
 * Additionally, there's the case of Isekiri (its), which looks as if it's spelled "Itsekiri" the most in English, including by the main driver of teaching the language nowadays "Learn Itsekiri". The spelling "Isekiri" seems to come from the spelling in the language which involves a, but being as that does not exist in English, it's often been adapted as a "ts". At the moment, I am leaning towards the "Itsekiri" spelling, though I would like more input if possible.


 * AG202 (talk) 15:12, 21 July 2022 (UTC)
 * Sorry ooo, I don't usually pay attention to the notifications from Wiktionary and Wikipedia, I always think that it'll be difficult to use. About Ìṣẹkírì, in my opinion I think that you write it as Ìtsẹkírì because it isn't normal for them to write it with "ṣ" like us Yorùbá do. Egbingíga (talk) 02:44, 16 September 2022 (UTC)


 * I went ahead and changed Olukumi, per the persuasive evidence above. Not sure whether Lucumí really needs the accent; ngrams might suggest Lucumi was recently overtaken by Lucumí but it's not as overwhelming as with Ulukwumi not even getting enough hits to plot. - -sche (discuss) 23:25, 23 July 2022 (UTC)
 * @-sche Thank you for the changes! Hmmm for Lucumí, I'm still leaning towards including the accent, but I see your point. I do wish that there were more folks active here to comment on this though. AG202 (talk) 19:52, 13 August 2022 (UTC)
 * @-sche, seeing as though I'm just now seeing which had unanimous support, I'd think it'd be good to go ahead and make the change for Itsekiri. AG202 (talk) 15:19, 11 September 2022 (UTC)
 * I think Lucumí should be used as it more accurately transcribes how it is said Egbingíga (talk) 02:52, 16 September 2022 (UTC)
 * With this, I'm going to push forward with the changes. AG202 (talk) 22:26, 5 March 2023 (UTC)
 * @Benwing2, is it possible that we could move forward on these ones as well? The Olukumi change has already been made, but the others have not. AG202 (talk) 22:16, 21 March 2023 (UTC)
 * Isekiri &rarr; Itsekiri is now ✅. &mdash; S URJECTION / T / C / L / 12:17, 30 May 2023 (UTC)
 * Thanks! Lucumi &rarr; Lucumí is also now ✅. Closing this discussion since I think everything was covered, and will archive in a week. AG202 (talk) 03:11, 1 June 2023 (UTC)

RFM discussion: October 2019–August 2020
Our canonical name for  is "Fang (Guinea)", which is unfortunate since it isn't spoken in. It's spoken primarily in Gabon and Equatorial Guinea. I'd recommend calling it "Fang (Gabon)" since there seem to be more speakers in Gabon than in Eq.G. and since the name of Gabon is shorter. —Mahāgaja · talk 11:53, 17 October 2019 (UTC) I'd recommend calling it "Fang (Equatorial Guinea)" since according to Ethnologue there are more speakers in that country than in Gabon. —Mahāgaja · talk 12:09, 22 October 2019 (UTC)


 * The fact that there are even some speakers in Cameroon (according to Wikipedia) but it's not the same as "Fang (Cameroon)" is icing on the confusion-cake... and they're both Bantoid languages, so disambiguating by family doesn't help, and both spoken in Central Africa, so we can't disambiguate by mere region as we sometimes do. - -sche (discuss) 16:00, 24 October 2019 (UTC)
 * They are different branches of Bantoid, though. We could call [fak] "Fang (Beboid)" and [fan] "Fang (Bantu)". —Mahāgaja · talk 22:04, 24 October 2019 (UTC)
 * Ah, true; I saw that Wikipedia classified as "Western Beboid" but caveated that it was not necessarily a valid family, but if Beboid overall is valid, then that works and is clearer (IMO) than picking just one of the countries to list. I would support renaming them in that way. The only other languages that come to mind which are disambiguated by family are "Austronesian Mor" and "Papuan Mor" both spoken in West Papua), which are mentioned on WT:LANG; probably we should change those to "Mor (Austronesian)" and "Mor (Papuan)" for consistency. - -sche (discuss) 18:00, 25 October 2019 (UTC)


 * Moved to Fang (Bantu) language and Fang (Beboid) language. —Mahāgaja · talk 18:46, 1 November 2019 (UTC)


 * I was going to rename Austronesian and Papuan Mor to use the 'parenthetical, postpositive' naming format for consistency with the Fangs and with how languages with country-name disambiguators are named, but I notice we also have e.g. Austronesian and Papuan Gimi, Austronesian and Sepik Mari (and some other Maris), and several other such languages, and I don't have time to rename all of those, so consistency will have to wait. (There is also "Sepik Iwam", but it appears to actually get called that, to distinguish it from the other Sepik language called Iwam which is spoken in the same place and belongs to the same Iwam subfamily of Upper Sepik. Confusing!) - -sche (discuss) 08:41, 28 November 2019 (UTC)


 * The reason I haven't archived this is that several other languages, mentioned above, still need to be renamed to fit the format of the other languages. I or someone else just need(s) to find the time... - -sche (discuss) 01:48, 6 August 2020 (UTC)

RFM discussion: May–September 2023
Currently, the way they are treated makes no sense: Proto-Huitoto-Ocaina is handled as the parent of Proto-Witotoan and nothing else. In our word list, Proto-Witotoan is used to denote Proto-Bora-Witoto, which is a macroreconstruction that is very speculative. I propose we merge these into one language, Proto-Witotoan, which seems to be the more common term.

Notifying. Thadh (talk) 15:32, 9 May 2023 (UTC)


 * No objection here. —Mahāgaja · talk 15:36, 9 May 2023 (UTC)


 * If I recall and understand correctly (?) the current setup is based on your request here, where I noted the issue of the broader/older grouping having been set in some entries as the child of a smaller/younger grouping. No objection to changing it. I see that a handful of Murui Huitoto entries use Proto-Witotoan in their etymologies, but these should be unaffected by removing Boran from its scope. - -sche (discuss) 16:53, 9 May 2023 (UTC)
 * Oof, I didn't even recall that discussion - I guess I was a bit too hasty, sorry. If there are no further objections, I'll just manually remove and/or change the reconstructions from the Murui Huitoto etymologies and fix this. Thadh (talk) 17:05, 9 May 2023 (UTC)
 * Merged. Thadh (talk) 20:34, 8 September 2023 (UTC)

RFM discussion: August 2015
Ethnologue has assigned codes to some but not all of the varieties of West African Pidgin English, and we in turn have incorporated some (e.g. pcm) but not all (e.g. not gpe) of those codes. As WP notes, the "contemporary English-based pidgin and creole languages are so similar that they are sometimes grouped together under the name 'West African Pidgin English'" (a name which also denotes their predecessor which developed in the 1700s). WP's examples are illustrative, particularly in that its Ghanaian and Nigerian Pidgin English examples are identical. I propose to merge at least the following three varieties into wes, renaming it "West African Pidgin English": We could also discuss whether or not to merge Sierra Leone Krio (kri, which WP notes its often mistaken for English slang due to its similarity to English, but which has a somewhat distinct alphabet), Pichinglis / Fernando Po Creole (fpe), and Liberian Kreyol / Liberian Pidgin English (lir). - -sche (discuss) 21:11, 11 August 2015 (UTC)
 * 1) Ghanaian Pidgin English (gpe)
 * 2) Nigerian Pidgin English (pcm)
 * 3) Cameroonian Pidgin English (wes)
 * The question is a very complex one. Firstly (but of least importance), scholars are divided on which lects have creolised and which have not, but it is generally agreed upon that at least some of the language you mentioned are not pidgins, which would make the name "West African Pidgin English" somewhat of a misnomer (the more neutral name "Wes-Kos" have been suggested as an alternative, but even linguists haven't fully adopted it). Secondly, all these lects are remarkably similar on a lexical level, but that's unsurprising; after all, they resulted from separate but very similar language contact events, and then probably modified each other (one scholar posits that Krio and Cameroonian Pidgin English relexified each other to some degree after pidginisation). The similarities are also obscured by the fact that there is nothing close to an agreed orthography for most of these, and pronunciation does differ a bit across West Africa. Linguistically, I'd probably merge them all, but practically that may not be the best decision. I know we have entries in pcm, but probably next to nothing for the rest, and if somebody wants to add them, given how each lect is very neatly assigned to a certain West African country, at least it won't be confusing for them to do so. Conclusion: the literature is schizophrenic, the lects mutually intelligible, and the existing situation remarkably unproblematic. Therefore I abstain. —Μετάknowledge discuss/deeds 21:19, 16 August 2015 (UTC)

look

RFM discussion: July 2016–December 2020
I see no evidence that this exists as a separate language, and move that it be merged with tr. The literature which references it seems to describe the dialect of Turkish which may be spoken by Gagauz people in the Balkan Peninsula. —Μετάknowledge discuss/deeds 20:17, 3 July 2016 (UTC)


 * Wikipedia, citing Ethnologue, insists that Balkan Gagauz Turkish, Gagauz, and Turkish are all separate, and a few sources do seem to take that view, e.g. Cem Keskin, Subject agreement-dependency of accusative case in Turkish, or, Jump-starting grammatical machinery (2009) speaks of "Balkan Gagauz Turkish, Gagauz, Turkish, Iraqi Turkmen, North and South Azerbaijani, Salchuq, Aynallu, Qashqay, Khorasan Turkic, Turkmen, Oghuz Uzbek, Afshar, and possibly Crimean Tatar". Other references speak of Balkan Gagauz Turkish as a variety of Gagauz, e.g. James Minahan's Encyclopedia of the Stateless Nations says "The Gagauz speak a Turkic language [...] also called Balkan Gagauz or Balkan Turkic, [which] is spoken in two major dialects, Central and Southern, with the former the basis of the literary language. Other dialects [include] Maritime Gagauz" (which comports with Gagauz's list of its dialects). Matthias Brenzinger's Language Diversity Endangered also treats Balkan Gagauz "or slightly misleading, Balkan Turkic" in his entry on Gagauz, but says it that the Balkan "varieties might deserve the status of outlying languages but very little information is available about them." (A few generalist references seem to subsume all  into  .) I would leave them all separate, pending more conclusive evidence that they should be merged. - -sche (discuss) 23:58, 3 July 2016 (UTC)
 * I think there's some confusion about what exactly we're talking about, and whether it's Gagauz or Turkish. Just because they use the term "Balkan Gagauz Turkish" doesn't mean that they're referring to the language with ISO 639-3 code bgx. When I look at who's citing the references listed for bgx at Glottolog, Manević (the reference for its classification) is cited in papers clearly talking about the dialects of tr. These are the only actual words attributed to this lect that I can find. —Μετάknowledge discuss/deeds 00:33, 4 July 2016 (UTC)


 * , on the subject of Turkic languages spoken in Europe, do you know anything about this one, and about its differences or similarity to Gagauz and standard Turkish? - -sche (discuss) 01:08, 11 May 2017 (UTC)
 * I'm not previously familiar with this dispute, but here are a few handbooks on the topic:
 * Menges in The Turkic Languages and Peoples has the following slightly complicated quote (p. 11): "The Turkic languages spoken farthest west are the Balkanic dialects of Osman and Gagauz in Bosnia, Bulgaria and Macedonia. These seem to form two groups, one of possibly pre-Osman origin, and a later Osman one. To the former belong the Gaǯaly in Deli-Orman (Eastern Bulgaria), who, according to V. A. Moškov, are descended from the Päčänäg, Uz, and Torci (?), the Surguč, numbering about 7000 people in the district (vilājät) of Edirnä, who call themselves Gagauz. In Moškov's opinion, they, too, go back to the Päčänägs (?) and the Macedonian Gagauz; they number ca. 4000 people in southeastern Macedonia." — It seems clear that some group(s) corresponding to "Balkan Gagauz" is being identified here, but I am not even sure how to parse the sentence structure; e.g. are "Uz" and "Torci" some of the pre-Osman Turkic groups, or some of the alleged ancestors of the Gaǯaly? ("Osman" is, of course, Turkish.)
 * Hendrik Boeschoten in a classificatory chapter in Routledge's The Turkic Languages mentions that "a few speakers [of Gagauz] in northern Bulgaria, Romania and Greece, adhere to the Orthodox faith, and have their own history." This again seems to refer to "Balkan Gagauz", but with no indication of being its own language.
 * So far I would gather from this that "Balkan Gagauz" is at most a sister language of "non-Balkan Gagauz", and perhaps indeed just a different dialect group (perhaps one whose features are not reflected in written standard Gagauz). But the Manević 1954 paper would be more informative on this topic, if anyone wants to hunt it down. --Tropylium (talk) 11:55, 11 May 2017 (UTC)


 * Here's an old, unresolved issue that could benefit from Turkicist eyes. —Μετάknowledge discuss/deeds 23:59, 8 September 2018 (UTC)
 * I think Balkan Gagauz should be merged with gag, especially since it contains no entries. The few terms that would be specific for Gagauz spoken outside of the traditional Gagauz area in Moldova/Romania/Bulgaria can be dealt with within gag entries. The only thing is that some etymologies of other Turkic languages sometimes refer to Balkan Gagauz instead of Gagauz, because editors didn't know the difference between two. Otherwise I don't see any problems with merging them two.
 * On the other hand, Gagauz should definitely NOT be merged with Turkish, that is pretty obvious to me.Allahverdi Verdizade (talk) 05:09, 9 September 2018 (UTC)
 * This is a hard question, I can offer only guesswork.
 * I can't find any good maps for the distribution of Gagauz and (Muslim) Turks proper in the Balkans, most don't show Balkan Gagauz at all although we know they exist at least in Bulgaria and Macedonia.
 * It seems that they are not easily separated geographically from Muslim Turks although they presumably live in different localities. I'm guessing this means that their languages ("Balkan Gagauz Turkish" and "Rumelian Turkish") could be the same, although maybe only the latter call their language "Turkish", so I guess that they (would?) use Standard Turkish in education and administration.
 * This would be a good argument to merge Balkan Gagauz into Turkish, except that this paper shows that Balkan Turkic (if this really is a single language) is quite distinct from Anatolian Turkish and perhaps worth considering a different language. Baskakov also considers Balkan Turkish and (Moldovan) Gagauz to form a clade within Oghuz and Anatolian Turkish and Azerbaijani to form another. Crom daba (talk) 21:35, 30 September 2018 (UTC)
 * , can you find anything in Turkish on the possible differences between Balkan Gagauz and Rumelian Turkish? Crom daba (talk) 21:38, 30 September 2018 (UTC)
 * Merge / delete it. The distribution of the name, the way it is “mentioned”, points towards it being a ghost language. The name is not attestable as used by anyone having particular information about it; nobody can add anything under it either in such a situation where it is a content-filled concept for nobody. Its alleged synonyms “Balkan Turkish” and “Rumelian Turkish” show it is just an SOP term for Turkish as spoken on the Balkans respectively Rumelia, i.e. remnant speakers of the Ottoman rule. German Balkantürkisch, distinguished from Türkeitürkisch as a regiolect. Fay Freak (talk) 13:38, 2 December 2020 (UTC)

look

RFM discussion: July 2016
Maridan [zmd], Maridjabin [zmj], Marimanindji [zmm], Maringarr [zmt], Marithiel [mfr], Mariyedi [zmy], Marti Ke [zmg]: should these be merged? References speak of a singular Marrithiyel language. - -sche (discuss) 21:30, 20 July 2016 (UTC)

RFM discussion: February–March 2017
This is not a separate language at all, it's just English with different grammar and some loanwords, but other than that it's completely intelligible with standard English. As such, it should be moved to Category:Chinese English. -- Pedrianaplant (talk) 15:19, 8 February 2017 (UTC)
 * That's not at all the impression I get from . It seems to be a distinct language to me, as much as any other English-based pidgin. —Aɴɢʀ (talk) 16:45, 8 February 2017 (UTC)
 * We did delete Hawaiian Pidgin English in the past though (see Template talk:hwc). I don't see how this case is any different. -- Pedrianaplant (talk)
 * I know we did, but I didn't participate in that discussion (only 3 people did), and I disagree with it too, probably even more strongly than I disagree with merging cpi. —Aɴɢʀ (talk) 17:02, 8 February 2017 (UTC)


 * Basically, this is a terminological problem. There may have been a true pidgin in each of these cases, but it has not been recorded. What is called a pidgin in many descriptive works is instead a dialect of English that is very easy to understand, nothing like the real English-based pidgins and creoles that I have studied. If you look at the actual quotations used to support lemmas in Chinese Pidgin English, you find that it is Chinese English. Support merge, but leave [cpi] as an etymology-only code. —Μετάknowledge discuss/deeds 23:16, 8 February 2017 (UTC)


 * At least some texts seem very distinct, to the point of unintelligibility; consider "Joss pidgin man chop chop begin" (Whedon's translator begins chopping things? or "god's businessman begins right away"?). On the other hand, other sentences given by Wikipedia are quite intelligible...and possibly not attestable under the stricter CFI to which English is subject. I'm not sure what to do. (Our short previous discussion also didn't reach a firm resolution.) - -sche (discuss) 17:46, 8 March 2017 (UTC)
 * I mean, I use and  in English normally (having grown up in a fairly Chinese environment likely has something to do with that)... and I think that was chosen as an especially extreme example. —Μετάknowledge discuss/deeds 03:32, 25 March 2017 (UTC)

RFM discussion: November 2018–December 2023

 * 1) Pray somebody add {"Narb"} to Module:languages/data3/x after line 1026 for xna. (Otherwise mentions of words in it are shown in slanted letters.)
 * Added. DTLHS (talk) 03:17, 14 November 2018 (UTC)
 * It seems that even MediaWiki:Common.css needs a new class for Narb added, to get css; Sarb is there and has it, Narb is not there. If the mention of a North Arabian word in works then it is complete. Also I see that in Module:scripts/data Narb does not have css while Sarb has. Fay Freak (talk) 14:43, 15 November 2018 (UTC)
 * Good catch. I've updated Common.css and Mobile.cc and set it to display rtl. Sadly, it seems there are no fonts that display it. If you or I could find a good image of what the letters are supposed to look like, I might have time to make a basic font iff the letters don't have to be joined the way they do in Arabic. - -sche (discuss) 22:08, 18 November 2018 (UTC)
 * I as an Archfag recently had a great update three weeks ago that adds displaying support for Old North Arabian, amongst other things like which improved Arabic and Syriac script rendering everywhere. gucharmap calls the name of the font by “Noto Sans Old North Arabian”, which I find in the filelist of the noto-fonts package. Fay Freak (talk) 22:29, 18 November 2018 (UTC)
 * 1) I think everything under Category:Old North Arabian script languages should be “Ancient North Arabian” (xna), it is to wonder that Dadanitic (sem-dad), Hismaic (sem-his), Safaitic (sem-saf), Taymanitic (sem-tay), Dumaitic (sem-dum), Hasaitic (sem-has), Thamudic (sem-tha) are separate languages on Wiktionary (some also with no script assigned). (Prolly someone went through some lects and added all he found.) Those lects are at a level of attestion or study where it does not even matter whether they are dialects or languages, and “” is even a collective term for any of the Ancient North Arabian lects not further classified. Many inscriptions cannot be classified unto more specific lects anyway (you know, people also were nomads and wrote graffiti here and there) and they can only be entered as “Ancient North Arabian”. With words being found randomly and in concise consonantal writing I don’t see why one would pursue separation other than by stating the find spot.
 * 2) Also, “Qatabanian” (xqt), “Sabaean” (xsa), “Minaean” (inm), “Harami” (xha, redirects to “Minaean” on Wikipedia), Hadrami (xhd) – likewise otiose distinctions, regarding form and amount of attestion of Epigraphic South Arabian, as the name says only epigraphically attested, without any vowels –, have been unpopular in use already, entries and etymologies use the header “Old South Arabian” (sem-srb). I suggests to cross out those. Etymology-only is possible so one can use those in  when in an individual case a word is known to be attested as of one of the dialects. North Arabian epigraphy categorization is more complex and it is better anyway to mention in each etymology where a lexeme has been encountered.
 * 3)  (sem-him), as an attested language, is rather mythical because the Ḥimyarites wrote Sabaean. Wikipedia mentions “three Himyaritic texts”, at the same time in the Encyclopedia of Arabic Language and Lingustics s.v. we read about two: “It is not even possible to establish whether they were written in the same language. The first text dates from around 100 C.E. and the second from around 300 C.E.” And about the secondary material from Early Medieval Arabs: “It is easy to see that quotations from Himyaritic offer very different readings according to the manuscripts.” Or according to others, mentioned in the EALL, Ḥimyarite is the same as Arabic, only with peculiar features (which might as well derive from Arabicized transmission, or later language fusion or whatever, much that could fool us). It could be grouped with those  if this category held languages from Antiquity.
 * 4)  is according to, it’s most eminent scholar, one language with twelve dialects; others share this view. The material for this language, particularly by Leslau across his works, only lists words as “Gurage”, without qualifying if they are “Inor”, “Mesqan” or some other Gurage, so on Wiktionary one cannot simply give “Gurage” words (which has recently been done in Semitic comparisons by abusing the code of the largest dialect Sebat Bet Gurage, in spite of the source saying “Gurage”). The following dialects I find on en.Wiktionary as languages: Kistane/Soddo (gru), Mesqan (mvz), Sebat Bet Gurage (sgw), Silt'e (stv), Inor (ior), Muher (sem-mhr), Mesmes (mys), Chaha (sem-cha), Wolane (wle), Zay (zwa); some of these are considered subdialects of Sebat Bet Gurage. There are more I don’t find on Wiktionary. It’s perhaps like with the Aramaic dialects yore or the Low German dialects today. People publish Westphalian dictionaries but it’s still Low German and so treated by Wiktionary. I suspect that instead of holding controversial subdivisions deriving from Ethnologue we should, holding to the sources, keep the Wiktionary-language level higher. The source for a certain word can be further qualified by labels as with Coptic. I mean that with language, unlike with biological taxonomy, one cannot simply assume that distinctiveness of a taxon is ascertained by experiments and then authoritatively published in some reference. As the individual forms are described in this dictionary, one must weigh if the data allows distinction at all. Currently it looks to me that hence Gurage must be lumped; I don’t know if, with new data or emerging different literary standards, separating the lects with separate codes will later be convenient (the increase in language material will be disappointing and unlikely someone will come and add Gurage in thousands of entries anyway, let’s be realistic), but I doubt that it would be comfortable. See also Why is Old Novgorodian a separate language in Wiktionary? This is the question: Is the difference in data enough to justify separation? The actual language-dialect distinction does not matter, it must be seen functionally, for dictionary purposes, for dictionary purposes. And if linguists publish material as “Gurage” the distinction is probably not good for Wiktionary headers. Isn’t it out of scope of Wiktionary to distinguish lect clusters when they are generally unwritten and chiefly written by and variously lumped and splitted by linguists? That’s a difficult question. Also I fear that such distinctions might be precisely the cause why nobody comes and pours out his rich Gurage knowledge. An adept would not be sure to distinguish, pendulating between two extremes, not witting if he should split as much as he can by all kinds of criteria or if to standardize and to abstract. To help though first all mentioned codes need the Ge'ez and Latin script both assigned, and the macrolanguage created. Maybe there will be late order from early ambiguity. Though I would perhaps do the order by lumping and labelling by location, were I that certain aficionado.
 * The obese Wiktionary:List of languages currently comprising 8055 lects needs cuts however. Fay Freak (talk)
 * This discussion really belongs at rfm, because that's where we normally discuss changes to whether or how we recognize a language. The Grease pit is for discussing how to implement something along those lines- not whether it should be implemented. The other option would be at the Beer parlour, but this seems like something that would benefit from the more specialized focus of rfm. Chuck Entz (talk) 03:39, 14 November 2018 (UTC)
 * Good distinction. I hesitated at 4:13 AM where to put it because of the mixed content. Moved. Fay Freak (talk) 14:16, 14 November 2018 (UTC)


 * Some prior discussion of Thamudic et al is on Category talk:Hismaic language; IIRC they were separated because literature does mention them as distinct entities, but if they were very similar or often treated as one language, and especially if there's difficulty in assigning specific texts to specific ones due to similarity, that would be an argument for reversing that decision and going back to the conservative approach of treating them all as one language with 'dialect'/'region' labels where appropriate. (As to the venue, yes, these discussions tend to happen on RFM for quirky historical reasons — originally the discussions entailed actually merging or splitting language templates — although some have proposed the Beer Parlour as a more logical venue. There are minor benefits and drawbacks to either venue; this venue does have the advantage that discussions stay on the page until resolved.) - -sche (discuss) 17:20, 14 November 2018 (UTC)
 * I avoided Beer Parlour because I thought it is only for matters already affecting people, but it would not affect anyone we know now. Fay Freak (talk) 14:43, 15 November 2018 (UTC)
 * Who is likely to have access to resources on Africa's Semitic languages that could help judge what to do with Gurage? User:Metaknowledge, User:Wikitiki89? Wikipedia insists "The Gurage languages do not constitute a coherent linguistic grouping", which seems incompatible with merging them. William A. Shack, in his book on The Gurage, writes that "each Gurage dialect is usually understood only by its own speakers, and there is a rough correlation between the contiguity of dialect groups and the extent to which their dialects are mutually intelligible." (Steven Danver, in his (general-focus) encyclopedia, says "the languages of the different groups of Ethiopian Gurage are seldom mutually intelligible.") Marvin Lionel Bender, in his 1976 Language in Ethiopia, says "Although seventeen varieties of Gurage dialects are listed, mutual intelligibility reduces this to four languages and three dialect clusters as follows (Hetzron classification): Gogot, Misqan, Muxir, Soddo  East Gurage (Inneqor, Silti, Urbareg, Weleni, Zway)  Central West Gurage (Chaha, Gumer, Gura Izha)  Peripheral West Gurage (Ener, Geto, Indegegn, Innemor)" However, his very next sentence is: "Gogot, Muxir, Soddo comprise a geographical (non-genetic) grouping of non-mutually-intelligible languages known as 'North Gurage'", all of which seeems to suggest that merging all of the Gurages would not be sound. - -sche (discuss) 17:28, 14 November 2018 (UTC)
 * The cited grouping of course adds to the confusion. Three languages, but four dialects clusters, not mentioning their intersections? Well, we will not find out how one should see them without deep-diving. But the question is which direction Wiktionary should go: likely the current division is not correct. Should Wiktionary just add all possible splits so they can be cleaned up later when someone would commit himself to add the whole Gurage and judge about which distinctions are most convenient or should we have one macro-code because distinction is hopeless? The reason why I have even mentioned Gurage is that for example Leslau’s Etymological Dictionary of Geʿez which I like to use just gives words as “Gurage”, which sounds like there is a common vocabulary. Fay Freak (talk) 14:43, 15 November 2018 (UTC)
 * Perhaps you can deduce from Leslau's literature list which Gurage language he gets his data from? He seems to have written an etymological dictionary of Gurage as well, presumably its foreword could clear things up.
 * His own field studies. I hade linked his Etymological Dictionary of Gurage (“according to ” etc.). Fay Freak (talk) 15:23, 17 November 2018 (UTC)
 * As a volunteer project (run on fancy), we really have no other choice than to wait for someone to investigate the matter deeply and order the languages in a manner that facilitates their lexicographical work.
 * Maybe we need non-genetic language group categories and ways to give forms in unindentified languages belonging to language groups. Crom daba (talk) 15:49, 15 November 2018 (UTC)


 * : A bit late, but here are my responses to the three outstanding problems (your #2–4):
 * It is fairly evident that Ancient North Arabian is not a single language, and I advocate that sem-xna be abolished rather than the specific language codes; read Al-Jallad (2018), "What is Ancient North Arabian?". He sees Safaitic (which he has written a grammar of) and Hismaic as being of the same continuum as Old Arabic, but they are obviously too distinct from Classical Arabic for lexicographical purposes. He supports the distinctness of the others as languages, and of the various "Thamudic" lects. Based on Al-Jallad, I would prefer we split Thamudic B, C, D, etc as necessary; each language will have a very small corpus, but it seems like the most honest way to do it, and if more inscriptions are found, the lettered Thamudic wastebaskets will probably get their own names as the others did.
 * Old South Arabian is also not a single language, though Sabaean was the standard that the other lects imitated, and I advocated that sem-srb be abolished as well. Multhoff (2019) in The Semitic Languages makes the case for four distinct languages: Sabaean, Minaean, Qatabanian, and Hadrami. She makes no mention, however, of Harami. Macdonald (2000), "Reflections on the linguistic map of pre-Islamic Arabia" explains that "Harami" is a name given to a few Sabaean texts that seem to have been contaminated by other Semitic languages, which is not at all an unusual feature and not unique to that site, so I suggest we remove that code.
 * As for Himyaritic, I now think I was wrong to include it. There are three texts often attributed to it, but see Stein (2008), "The ‘Himyaritic’ Language in Pre-Islamic Yemen", which makes a strong argument to consider these as simply very late examples of Sabaean, which is indisputably the language of the other texts of the region in that script.
 * Finally, for Gurage, the chief problem is that some scholars follow Hetzron in saying that Gurage is polyphyletic, in which case lumping would be committing a grave error (and the same charge has been levelled for Aramaic, with perhaps more evidence). Meyer (2011) in the International Handbook does seem to support the unity of Gurage, and treats the lects together, which gives me hope for lumping, but he is unwilling to commit to whether they should be considered dialects or languages. I think your Gurage-adding genius is mythical, so we have to choose which is least bad: many languages with scanty coverage, because their forms may be similar to forms entered under a different L2 header; or one Gurage language with decent coverage, but many forms that are not marked for what dialect they belong to and therefore a poor resource. I hesitantly support merger, given those choices. —Μετάknowledge discuss/deeds 03:13, 10 August 2020 (UTC)
 * An addendum: "Hadrami" is a terrible name for xhd, and invites confusion with Hadrami Arabic. Wikipedia uses "Hadramautic", but N-grams and a quick literature review suggests that "Hadramitic" is more common. again (yes, I know I'm pestering, but I don't want to move forward on all this alone, both because I am fallible and because some of these, particularly splitting OSA, would require a bit of work, although in that case there is an online corpus that will help immensely). —Μετάknowledge discuss/deeds 02:40, 17 August 2020 (UTC)


 * Re North Arabian: Many works I browsed through speak of Old North Arabian as a unit with dialects, but also carefully specify what lects (including Thamudic B vs C, etc) words are attested in. Some imply, in their presentations, that a large number of words are identical between dialects, at least in the sample of vocabulary that they're treating (e.g., the pronouns treated in Roger D. Woodard, The Ancient Languages of Syria-Palestine and Arabia (2008), pages 197-198), though this seems to be because the authors are presenting 'normal', normalized and romanized forms, given Al-Jallad's evidence that words (even the supposedly distinctive definite article) varied not just among dialects but even within the writings of individual speakers. The native script also loses many possible differences in pronunciation, but then, we are a written, writing-based dictionary. I find slightly more works speaking of "Ancient North Arabian dialects" than "Ancient North Arabian languages", and the fact that some authors have argued the varieties are the same language not only as each other but even as Arabic itself does suggest a high degree of similarity (or that the scholars in question are lumpers). As we're dealing with small, extinct and apparently clearly delineated corpora, it seems like the conservative approach of treating each under its own L2 could be better, and we could retire xna ... unless we need it as a wastebasket for unsorted things, which Al-Jallad (and Fay Freak, above) suggests we would. (Bah, It's messy business, deciding what's a language and what's a dialect...) I will try to dig into the rest later. - -sche (discuss) 04:10, 19 August 2020 (UTC)

Well myself I have added Sabaean, Minaean, Qatabanian entries meanwhile, understanding and quoting a few inscriptions, although apart from some occasional features I noticed little how such an inscription can be classified as either, other than by provenience or rulers or gods mentioned—but that must be due to my blasé comparative approach that also makes me read Romance without recognizing the individual language. So somehow the volition to a merge is gone, though the lumping codes “Old South Arabian” and “Old North Arabian” must be kept for inscriptions no one has classified. Both are useful.

For Himyaritic, however, nothing is left. As here said already, the three alleged Himyaritic inscriptions don’t even need to be in the same language, and they aren’t even from anything to be called Ḥimyar (there are “Lesser Himyarites” and “Greater Himyarites” and the ethnic identity is fragmentary, too, by the way). In the “Critical Reevaluation” of the Ḥimyaritic language – cited by Wikipedia on one does not know what for: their “undeciphered-k language” header recently introduced is surely a made-up term, oddly suggesting that  these inscriptions are yet another language when those “k-language” inscriptions are exactly those otherwise claimed for Himyaritic, so we see Wikipedia editors had no clue and phantasize together languages due to their disdain for primary sources – helpfully includes a map, also coming to the conclusion “we have no reason to assume the existence of some “non-Ṣayhadic” language in pre-Islamic Yemen that was spoken besides the (Late) Sabaic idiom known from the inscriptions.” That from the fact that “Himyaritic” words typically given from Arabic sources are all also found in Sabaic, and the grammar found in the three inscriptions, including the prefixed instead of postfixed article which is only found in two of them, is too either found in Sabaean or can well be ascribed to their being poetry, which is also the reason for their being poorly understood. Many Arabic poems are also hard to understand and mostly helped by the copious material for the language which is not the case for languages with so limited a corpus, like Old South Arabian. Even in the, Latin prose, not all passages are of discoverable meaning.

What would hinder man though to add understood words with quotes from the ominous inscriptions as Sabaean? Or anything from Arabic sources transmitted as Himyaritic instead of Arabic as Sabaean? For there is no evidence for it being a particular language. You see, from the corpus-based standpoint Wiktionary takes Himyaritic must go. Nothing can get the header “Himyaritic”, it can only be mentioned at Sabaean or Old South Arabian entries that Himyaritic nature is suggested by those who have come to believe in this extraordinary claim for which extraordinary evidence is not provided. Fay Freak (talk) 04:18, 6 August 2021 (UTC)
 * I went on and moved our only “Ḥimyaritic” entry after that famous sentence to Yemeni Arabic in which the word for “gold” turns out otherwise known, and to be nothing else than Classical Arabic  meaning “refined” and therefore gold, while Old South Arabian could not have developed such sense, so it is clear the famous quote one has been so inept to classify is at best only macaronic Sabaean-Yemenite Arabic. It is well put by Marijn van Putten:
 * The Arab grammarians were interested in describing correct usage of language of Classical Arabic. It is quite clear that Himyaritic (and by extension Yemeni Arabic) did not fall in the category of 'correct usage'. Within this context, it is of course not surprising that anything that is "wrong" and from Yemen might be denoted as Himyaritic. This would then include both varieties of Yemeni Arabic and some surviving vestiges of Ancient South Arabian. Fay Freak (talk) 04:59, 13 September 2021 (UTC)
 * Now also in a new article by Koutchoukali like communis opinio, though his blogs transpire by him stalking Wiktionary: later Muslim historians would refer to anything related to South Arabia’s pre-Islamic history as “Himyaritic,” all memory of its other states having passed away. Fay Freak (talk) 01:01, 18 September 2021 (UTC)


 * So are we gonna delete Himyaritic now? I promise there is nothing to add in it, and I have ever been averse to use the code, as on بطيخ; everything one needs to know about it is in the dictionary entry Himyaritic. Everything else in this five-year old topic is resolved:
 * Old South Arabian and Ancient North Arabian are both necessary evils in addition to individual languages of the Old South Arabian and Ancient North Arabian families because inscriptions are all over the place and sometimes not informative enough for specific identification, so we can’t remove the languages like we have removed Nahuatl, but like for Nahuatls the language treatments use to be sharp. Gurage nobody will reliably untangle in the near future; we may have a family under sem-eth for the “Gurage cluster”. Fay Freak (talk) 03:07, 6 December 2023 (UTC)
 * OK, per the discussion above I have removed Himyaritic. - -sche (discuss) 14:43, 7 December 2023 (UTC)
 * I struck the thread then. Fay Freak (talk) 00:52, 8 December 2023 (UTC)

RFM discussion: September 2018–February 2024
Any objections to me renaming 🇨🇬  (4 entries) and 🇨🇬   (0 entries) to  and, respectively? Arawak is easily confused with the Arawak/Arawakan proto language and family, and Carib is one of two often confounded languages, the and the. --Victar (talk) 04:03, 6 September 2018 (UTC)
 * No objection to renaming Arawak, but I'm not sure about Kalhiphona, which seems to be quite rare even on a Google web search, and which seems to invite as much possible confusion (in its various spellings) with the various spellings of Garifuna as it avoids with other "Carib"s. - -sche (discuss) 06:56, 19 September 2018 (UTC)

Arawak → Lokono ✅. Island Carib → Kalinago (rather than Kalhiphona, after further discussion at ) ✅. — Vorziblix (talk · contribs) 20:38, 2 February 2024 (UTC)

RFM discussion: July–August 2021
The decision to deprecate Lenape as a canonical language needs to be revisited. It seems to me that removing Lenape as a canonical language is like removing French as a canonical language and instead having only Quebecois and Parisian. Classifying Wënami and Munsee as seperate canonical languages is a narrow linguistic differentiation that quickly spills into the political. In a 2013 discussion (archived at Category talk:Unami language) the difference between Wënami amd Munsee was compared to English and Scotts. This is a faulty comparison because the political and cultural situation is in no way analogous (and language versus dialect is always a deeply political not just academic subject). The continuum of dialects once spoken in Southern New York, New Jersey, Delaware, and Pennsylvania were on the brink of disappearing forever. What we are witnessing now is a process of standardization that could revive the Lenape language. I hope the Wiktionary community would not want to contribute to fracturing such efforts. Unnecessary rigidity about what is to be considered a canonical language could jeopardize such standardization and revival. I strongly urge everyone to reconsider having a canonical language category for Lenape, and specifying the dialect of origin for specific terms, usages, and grammatical conventions. Note that all orthographies for Lenape are relatively recent inventions based on various European languages, and that standardization here is more than warranted. Andreas.b.olsson (talk) 05:17, 29 July 2021 (UTC)
 * Chuck Entz (talk) 05:45, 29 July 2021 (UTC)
 * Also pinging as a participant in the previous discussion. —Mahāgaja · talk 06:06, 29 July 2021 (UTC)


 * This is tricky. Some people want to document each language as such, especially with an eye to their past as mutually unintelligible languages (Marianne Mithun, The Languages of Native North America, 2001, page 331; Siebert even seems to suggest Munsee was more closely related or similar to Mahican than to Unami) with extensive differences in orthography and phonology (e.g. in having l vs r in their reflexes of PA's *r/*l phoneme). (Archaeological as well as linguistic evidence suggests the distinction between the two groups goes back to prehistoric times.) Merging northerly Munsee into the now more dominant southern Unami could complicate documentation of that critically endangered language, which some people are studying and learning from its remaining speakers in Canada. On the other hand, the most prominent revitalization efforts in the United States, based mostly on southern Unami, do seem to speak of Lenape as a single language (and FWIW, Canadian Munsee efforts, albeit with a different spelling, also seem to speak of a Lunaape language), apparently aiming to standardize the two into one language with an eye towards keeping it alive into the future. We have to think carefully about what to do here. - -sche (discuss) 17:52, 29 July 2021 (UTC)
 * I have nothing to contribute beyond the wish for some precision in the etymologies of toponyms, which can easily be accomplished with qualifiers or labels without complicating the creation of good language entries. DCDuring (talk) 18:33, 29 July 2021 (UTC)


 * You say on Talk:mochipwis that "I'm a learner of modernized Lenape, a language in the process of being revived. I am not an expert in the differences between Wënami versus Munsee. What I do know is that there are no first-language speakers of either dialect left." This goes to the heart of the issue, I think: Wiktionary does not only cover modern languages as they are currently spoken, but also covers languages that existed in the past. Although (as noted in the earlier discussion) I initially created "Lenape" content under a unified code, following the "lumper" approach of the US revivalists, the differences between the lects are (as Chuck noted in that same disscusion) historically extensive, to the point that the modern linguistic literature I've been able to find that says anything about their intelligibility accepts Mithun's statement that they were mutually unintelligible, which militates against combining them. (OTOH, if modern speakers are trying to merge them, that militates in favor of a merger.) Wiktionary does merge e.g. many Sinitic languages even when they're mutually unintelligible when spoken (and conversely we split e.g. Bokmal and Nynorsk even though they're not merely mutually intelligible but the same language), so either approach could be made to work. - -sche (discuss) 19:29, 29 July 2021 (UTC)
 * What about recognizing all three codes? We could have Lenape (either using  or creating a new code for it, e.g. , so as to keep   for the family) for the modern language undergoing revitalization, and also have   and   for the historical stages. —Mahāgaja · talk 19:57, 29 July 2021 (UTC)
 * Adding a link here to the important Lënape Talking Dictionary maintained by the Tribe of Delaware Indians. It should be recognized that the code  (for Delaware) is potentially problematic and controversial. If I say in New York City “I speak a little Delaware” I’m likely to get quizzical looks. But if I say “I’m learning Lënape” an educated resident of the city is likely to remember social studies classes in elementary school, contextualize, and understand what I mean. Wouldn’t it be better to have a new code for modernized Lënape that contains the letters in Lënape? Andreas.b.olsson (talk) 21:56, 29 July 2021 (UTC)
 * We use the ISO 693-3 codes for languages, so that's out of our control. They're not going to change the code; they've refused for www, which has some significant real-life problems. In practice, the codes may sometimes look like abbreviations, but in theory, there's no necessary connection; it's just a three letter code.--Prosfilaes (talk) 01:30, 30 July 2021 (UTC)
 * there is no ISO designation for the Lënape variants. So what language does the ISO code  stand for? Old Wënami? Mondern Lënape? Munsee dialects still spoken in Moraviantown? Andreas.b.olsson (talk) 02:50, 30 July 2021 (UTC)
 * In the ISO itself,  stands for a macrolanguage called Delaware whose individual languages are Munsee   and Unami   (see https://iso639-3.sil.org/code/del). We're not obliged to use ISO's names, though, so we are free to call   Lenape or Lënape rather than Delaware. Don't get hung up on the letters used for the code; they don't have to bear any relation to the name of the language (  stands for Old Armenian, for example, and the code for Mapudungun is  ). —Mahāgaja · talk 07:40, 30 July 2021 (UTC)
 * Fair enough about the abstraction. I did not know that ISO had a more complex sub-categorization. Thank you for providing the link and helping my knowledge about these standardizations grow. My concern was related to the fact that  is derived from Delaware – the European name for the region – and not Lënapehokink, the Lënape name. However, as far as I understand the Lënape are themselves comfortable with the syncretic designation and retain it in their name (Nation of Delaware Indians). I should therefore not assume any offending connotation in the designation  . I’m trying to reach out to the Oklahoma branch of the Lënape. Let’s see what they say about this issue. As a dominant constituency they should have a say about this actively evolving linguistic situation. On an additional note, there is a significant difference in the orthography being actively taught by the mainly Munsee branch in Moraviantown Canada.Andreas.b.olsson (talk) 08:49, 30 July 2021 (UTC)

I’m not sure there is anything wrong in the abstract splitting of the dialects/languages. It’s the naming in Wiktionary for these lects that does not reflect how these lects are referred to in actuality (where there is unfortunately ambiguity). A proper and respectful name is needed for the following variants: The modern variant in Moraviantown seems to be referred to alternatively as Lunaape and Munsee Language. The modern variant based on Rementer et al work is consistently referred to as Lënape or alternatively without the schwa (Lenape). Andreas.b.olsson (talk) 13:06, 30 July 2021 (UTC)
 * Variant currently taught at Talking Lënape Dictionary and extensively documented by Jim Rementer et al in recent history (late 20th century).
 * Variant currently taught in Moraviantown where there seems to be wide variations between the last handful of fist-language speakers. I’m not sure any of them are still alive but the community seems to be sticking with their own historical orthographical variant rather than using Rementer’s et al. revised system (the one I’m learning and using).
 * Historical variant as recorded in the time of David Zeisberger (18th century).
 * How about the following naming:
 * Lënape
 * Lunaape
 * Old Lenape
 * The distance between the lects spoken by the Unami and Munsee tribes at the time of Zeisberger and prior seems largely speculative. Andreas.b.olsson (talk) 16:27, 30 July 2021 (UTC)
 * For the purpose of historical linguistics Old Lenape can be divided into:
 * Southern Unami
 * Northern Unami
 * Munsee
 * Note here that Unami seems to be a term derived from historical linguistics. I believe it should not be used to denote any of the revitalized lects currently in use. However, the Stock-Bridge Munsee do refer to their Lunaape variant in English as the “Munsee language”. Andreas.b.olsson (talk) 14:54, 31 July 2021 (UTC)
 * Unless the Stockbridge-Munsee Community object, I would suggest using Lunaape as the name of their orthographic variant and dialect.Andreas.b.olsson (talk) 15:04, 31 July 2021 (UTC)

Here is what I propose in terms of naming, categorization, and ISO encoding. LNP and LNE are not used yet by ISO 639-3. Andreas.b.olsson (talk) 15:29, 1 August 2021 (UTC) North and South Unami can be treated as minor variants (i.e. dialects). Munsee is treated as distinct, leaving the question open about whether it was more closely related to Mohican. Andreas.b.olsson (talk) 15:35, 1 August 2021 (UTC)
 * 1) Old Lenape:
 * 2) Munsee:
 * 3) Unami:
 * 4) Lënape:
 * 5) Lunaape:
 * We can't be making up our own three-letter codes, but we can put them after a family code, separated by a hyphen. So if we wanted to, we could in theory create,  , etc. —Mahāgaja · talk 10:08, 2 August 2021 (UTC)
 * Why can’t we be making up codes if they are not used? This can be in conjunction with ra request to ISO that these lects be added with those codes. It’s one thing to request a change, another the treat the ISO 639 as extensional. With its increased influence Wiktionary ought have some influence on ISO. Andreas.b.olsson (talk) 16:40, 2 August 2021 (UTC)
 * We can't make up codes because ISO might assign them to some other languages in the future. I think you might be overestimating Wiktionary's influence; also, the ISO committee would take a whole lot of convincing before they agreed to two more codes. Considering you started out this discussion arguing against treating Unami and Munsee as two different languages, it seems like it would be quite a stretch to convincingly argue that they are, in fact, five different languages. From everything you've explained above, I'm starting to think we should recognize only  as a canonical language and reassign   and   to be etymology-only codes for subvarieties of  ; if necessary, we can add more etymology-only codes (since these are local to Wiktionary and have nothing to do with ISO) for other chronolects and regiolects. —Mahāgaja · talk 18:38, 2 August 2021 (UTC)
 * I think your low-balling Wiktionary’s growing influence, and a few people’s ability to influence seemingly immobile institutions. My point has evolved based on ’s input about linguistic historicity and the strong difference between orthography used by those who learn Lunaape versus Lënape. Importantly, there is extremely strong merit in treating the language recorded by Zeisberger et al in the 18th century as a different language. It is grammatically quite distant from Lënape and Lunaape. Note that the,  , and   seem based on the historical study by Moravian missionaries (who referred to the language as “Delawarische Sprache”). There needs to be a clear distinction between these historical lects (which I’m referring to as “Old Lenape”) and the languages spoken today. Andreas.b.olsson (talk) 03:26, 3 August 2021 (UTC)
 * Seeming immobile to whom? They've seemed relatively responsive to me; they offer a final decision on most requests in an annual wrapup. I certainly question any need to bluster ahead and then expect SIL to follow; that seems likely to be counterproductive.--Prosfilaes (talk) 09:31, 4 August 2021 (UTC)
 * I would also note here that the German entry in Wikipedia is titled “Delawarisch” and talks about it first as a single language, and then alternatively two languages depending on one’s view. The equivalent English entry is “Delaware languages”. However, in both entries the focus is on the language(s) studied by Moravian missionaries et al, and then linguists studying those initial European studies (which haphazardly invented wildly different orthographies based on the authors own mother tongues). is right: it’s complicated. Here are the facts:
 * Old Lenape is grammatically very different from modern variants.
 * There are two very different modern orthographical variants.
 * If this were Norwegian, you would without hesitation have Gammelnorsk, Bokmål, and Nynorsk. Andreas.b.olsson (talk) 04:01, 3 August 2021 (UTC)
 * So if we are going to base decisions on precedent – which one ought to - then the way Wiktionary has approached the orthographically distinct Norwegian Bokmål and Nynorsk speaks for having distinct language codes for Lënape and Lunaape regardless of whether they should be treated as the same language or not. Unfortunately,  really in the end refers to Old Lenape since the ISO further subdivides it into Unami and Munsee, which are really references to the language(s) studied by Zeisberger’s et al and those who continue to study their studies. To use it for Lënape and Lunaape would be confusing and improper.Andreas.b.olsson (talk) 04:25, 3 August 2021 (UTC)
 * This is not Norwegian, and there's no other language that is presented like Norwegian. The closest are Ottoman Turkish and modern Turkish and Urdu and Hindi, and in both of those cases you're looking at different scripts, Arabic and Latin or Devanagari plus vocabulary divergence. One has to look at all the precedent, not just one example.--Prosfilaes (talk) 09:31, 4 August 2021 (UTC)
 * On a last note for tonight, the codes  and   proposed by  for Lënape and Lunaape could be a reasonable compromise.Andreas.b.olsson (talk) 04:34, 3 August 2021 (UTC)


 * IMO, as someone who as added a good portion on the Munsee entries, unless someone can demonstratively show the need, I would say the way things are set up right now is perfectly fine. More work should be made to show the linguistic differences before any new codes are created. This all seems like hypothetical conjecture to me. -- 05:25, 3 August 2021 (UTC)
 * Having looked closer at Zeisberger’s work, I should say that how distant Old Lenape is from Lënape and Lunaape is uncertain to me. There seems to have been a lot of simplification in how verbs are used, with certain modes falling greatly out of favor (e.g the subordinate mode). The issue is complicated by the very different orthography used by Zeisberger, who used German inspired phonological conventions. If Wiktionary were to want to document these changes though, it ought to leave a space for Old Lenape. The issue reminds me of Swedish, which went through a drastic change in orthography in the late 19th and early 20th century.
 * I think what I get back to is that Unami is not the proper L2 name for a language. The proper name for this language is Lënape. So even if we keep the coding structure in tact, the name of that lect needs to be changed (Unami => Lënape). Andreas.b.olsson (talk) 09:24, 3 August 2021 (UTC)
 * Out of curiosity, how come you chose Munsee as the name of the L2 lect and not Lunaape? I see that the community in Moraviantown uses either designation. Nonetheless, what made you chose one over the other? If you are a member of the Stockbridge-Munsee community, please help me better understand and my sincerest apologies for potentially overstepping. Andreas.b.olsson (talk) 09:36, 3 August 2021 (UTC)
 * We (Wiki/Wikt) base much of our language nomenclature and classifications on the recommendations of Ethnologue.. -- 19:15, 3 August 2021 (UTC)

I guess another thing that bothers me is that the code  – however abstract in the mind of the Wiktionary community – carries the stigma of a split. It’s continued use will cause continued sclerosis. It’s does not seem a term that all parties – Lënape in Oklahoma, Munsee in Canada, and enthusiasts like myself in New York City – can all rally around. It seems to me that there is something quite new going on, an amalgamation and standardization that should be captured in vivo by Wiktionary. Andreas.b.olsson (talk) 10:14, 3 August 2021 (UTC)

(Sorry, I was busy/distracted.) If we could reach out to Lenape folks, including Munsee speakers or learners, for clarification of whether they consider the lects one language (in particular, if Munsee consider their language to be the same as Unami, or if it is only the US-based folks running the Unami-based Lenape Talking Dictionary who name it as if it speaks for all Lenape), that'd provide useful evidence: if speakers want to consider the lects one language, it'd be evidence that we should merge (and distinguish with qualifiers, etc); otherwise, the current setup is fine. It seems some of the objection comes down to just the names, preferring "Lenape" to "Unami"; in another discussion Andreas notes that speakers would sooner say the equivalent of "I speak Lenape" than "I speak Unami", but this is not evidence that they are one language (consider various Dogon languages, which are all Dogons though not all mutually intelligible, including whose speakers consider themselves Dogon who "seem unaware that [their language] is not mutually intelligible with any Dogon language"). Simply renaming "Unami" to "Unami Lenape" (and "Munsee" to "Munsee Lenape") is an option, although we usually try to use the most commonly used names for languages, which may be the current names. The shape of the codes is a non-issue since they're mostly not visible to readers, only editors. This discussion does make me realize that they're reader-facing in the names of certain categories, and that this is probably confusing, but that's a general issue not specific to Lenape. (How many people looking at our Khanty content would guess that the list of berries is at "Category:kca:Berries"? Shouldn't we always use language names?) There's no linguistic basis for introducing more than one new code (e.g. a five-way split!), as floated above. - -sche (discuss) 02:24, 5 August 2021 (UTC)
 * I’m the newcomer here . Though I have used Wiktionary for a long time, I did not prior to this participate in its maintenance. I will defer to you all on the best back-end encoding, and whether proliferating codes for anticipated future linguistic differentiating makes little sense. My objection – and this was not clear to me before either – seems reduced to the fact that “Lënape” was stripped from how the lect(s) are referred to. I think there is sufficient online evidence to make the affirmation that the so called Unami branch is almost exclusively referred to as Lënape. Therefore I think it would be warranted to make this change as soon as possible. Having looked at quite a few of the “Unami” entries it seems to me that they are almost all derived from the work of the Talking Lënape Dictionary project. Hence the issue of the distance between the lect recorded by Zeisberger et al versus the more recent collected speech samples of the Talking Lënape team is mute, and can be resolved later. Note though this may mean having to encode for another lect at some point in the future (especially since the spelling used by the Moravian missionaries was based on German and very different). I agree fully that the Munsee community should have a say in whether to treat their lect as the same as that taught by the Talking Lënape Dictionary project. It should be noted though that it can be factually stated that they have to date used a very different orthography in lessons and materials they have posted online. Andreas.b.olsson (talk)

,, can we split out the decision about the naming for the lect with the code   and rename it from Unami to Lënape? There is overwhelming evidence that Lënape should be the proper name of the language, and not Unami. The code issue and whether to treat the Munsee branch as a separate language (and what to call it) can be dealt with separately and later.
 * ,, asking again about splitting out the issue of the language's proper name and making an initial narrower decision. I would like to continue building out Wiktionary's Lënape entries, but first I would like to make sure the language currently referred to as "Unami" is referred to by its proper name. Andreas.b.olsson (talk) 13:01, 12 August 2021 (UTC)
 * I have no objection (other than that it's hard to type) to renaming  Lënape. —Mahāgaja · talk 13:10, 12 August 2021 (UTC)
 * AFAICT "Lenape" is a far more common name than "Lënape", so renaming "Lenape" to "Lënape" does not seem appropriate. And because Unami is only one variety of Lenape, renaming Unami to Lenape/Lënape would be confusing. The difficulty seems to be that the main US-based language project produces its Talking Dictionary seemingly based on Unami but named as if it's one overarching "Lenape"/"Lënape" language. (I know this kind of thing has come up before, where a prominent dictionary of what it considers "a language" doesn't specify which of two actually-distinct lects its words are, or conflates them; I will track down other examples if necessary, but it seems tangential. ) I don't want to stand in the way of a Native language revitalization effort, so the above-discussed option of leaving Unami and Munsee for the historical (and historically not mutually intelligible) varieties and making Lenape a language code (rather than a family code) for the "unified" language has some appeal, but it would entail duplicating a lot of content which is currently Unami. There is also the option of renaming Unami to "Unami Lenape" to get the "Lenape" name in there. - -sche (discuss) 19:19, 30 August 2021 (UTC)
 * Again, I agree -- renaming Unami to Lenape would simply cause confusion whilst adding no benefit and renaming it to "Unami Lenape" does not serve to disambiguate. -- 20:13, 30 August 2021 (UTC)
 * I’m not sure I understand for whom this would be confusing . I agree that Lenape may be simpler to type than Lënape. Anyway, the schwa indicator is optional in the dominant modern dialect of Lënape. I have a deep hunch here that Linnaean taxonomic orthodoxy and perpetuation of erroneous past anthropological claims are at fault for the strict linguistic distinction between Unami and Munsee. The more I study Lënape, the fewer distinctions I see in the various dialects except for in the choice of orthography (which noticeably stems from quite recent European attempts to record an unwritten language). Andreas.b.olsson (talk)

I suspect that Wiktionary is perpetuating a European construct erected between the late 18th and early 20th century. Of course, these faulty past errors of European linguists have had real effect, seemingly causing “two languages” to appear to exist. I need to further study the Munsee branch to substantiate my claims.

I continue to believe Unami is a culturally and linguistically inappropriate term for any dialect of Lënape. If the language spoken by the descendants of the Unami tribe of the Lënape were to have any other name than Lenape – which I insist is the appropriate designation – it might be Delaware. This is a name I believe the (predominantly) Unami descendants living in Oklahoma accept and sometimes use in English despite its European origins.

Wanishi Andreas.b.olsson (talk)

RFM discussion: January–February 2024
Resurrecting an old discussion on the same subject above, I propose we rename Arawak and Island Carib  to  and, respectively. As User:Victar noted in that discussion, Arawak is easily confused with the Arawak/Arawakan family. (Indeed, I have run into many etymologies where someone has mislabelled a word from a different Arawakan language as Arawak). Island Carib, meanwhile, is (1) not a Cariban language, making etymological discussions occasionally confusing; (2) no longer generally called by that name, since the people are now officially called the Kalinago by the Dominican government; (3) easily confused with the unrelated 'true' Carib language, which we call Galibi Carib.

I would also add this latter language (the one we currently call Galibi Carib ) to the discussion, and suggest it should be renamed either Kari'na or Kari'nja, names that are both more common in the current literature and closer to the native name ( in Courtz’s orthography). (The most common name for this language in the literature is actually simply Carib, but this may itself be confusing.) — Vorziblix (talk · contribs) 15:56, 3 January 2024 (UTC)
 * : Obviously still support renaming both to Lokono and Kalinago, respectively. I can't speak to Galibi Carib as I'm not familiar with the academic literature. -- 16:22, 5 January 2024 (UTC)
 * I hope that the entries Arawak, Island Carib, Cariban, Kalinago, Galibi Carib, Kari'na, Kari'nia, and Carib have or will have all the definitions and attestation to help relatively normal users through this. DCDuring (talk) 18:54, 5 January 2024 (UTC)
 * I’ve added most of those missing entries (and improved one or two others), and have also changed Galibi Carib to Kari'na, since I am actively working on that language and would rather make the change while our coverage is still small. I will leave the discussion to run for a few more weeks before changing the others, and we can always also revert/change Kari'na to something else if the disucssion turns that way. — Vorziblix (talk · contribs) 13:42, 16 January 2024 (UTC)
 * All three of these moves are now complete. The change away from ‘Arawak’ came not a moment too soon; a large amount of references to ‘Arawak’ in our entries were entirely wrong and needed to be corrected to ‘Arawakan’, and indeed there may be some remaining references to   that are still wrong. — Vorziblix (talk · contribs) 06:42, 2 February 2024 (UTC)

Arawak → Lokono, Island Carib → Kalinago, and Galibi Carib → Kari'na ✅. — Vorziblix (talk · contribs) 20:38, 2 February 2024 (UTC)

RFM discussion: January–March 2024
Since the Mansi language represented in Wiktionary isn't a thing, since it is several languages compiled under one name and code, I would like to propose a split. It would be beneficial to split it along the 4 main dialects, (the ones still extant) Northern, Eastern and (extinct) Western, Southern. The subdialectal markings can be represented with labels, already in place in the current category for Mansi. Ewithu (talk) 18:19, 9 January 2024 (UTC)


 * ✅Ewithu (talk) 13:22, 9 March 2024 (UTC)

RFM discussion: January–March 2023
"Pomeranian" is essentially a term for the family consisting of the Kashubian and Slovincian languages. In fact, Pomeranian (or its Polish counterpart, język pomorski) has always been used as a synonym of Kashubian (Polish język kaszubski), as witnessed by the 1893 dictionary called Słownik języka pomorskiego czyli kaszubskiego ("Dictionary of the Pomeranian a.k.a. Kashubian language").

There are no written records of an ancestor of both Kashubian and Slovincian, and any attestation of a Pomeranian lect will automatically fall into either of the two categories according to our current handling. However, the two languages do share an ancestor, and this ancestor did influence other languages. As such, it seems only logical to set this language as a proto-language, and rename it to Proto-Pomeranian in reference to its being the unattested ancestor of a language family, rather than being an attested language.

Pinging. Thadh (talk) 13:55, 9 January 2023 (UTC)


 * . Vininn126 (talk) 13:59, 9 January 2023 (UTC)
 * . Sławobóg (talk) 14:10, 9 January 2023 (UTC)
 * . // Silmeth @talk 15:09, 9 January 2023 (UTC)
 * . Gnosandes ✿ (talk) 16:47, 9 January 2023 (UTC)
 * . —Mahāgaja · talk 08:24, 11 January 2023 (UTC)
 * . — Fenakhay ( حيطي · مساهماتي ) 11:56, 11 January 2023 (UTC)
 * I have looked into usage and it appears that this ancestor—not only in English, but also Russian and other current tongues of linguistic science—can only be called “Pomeranian” in the same way as Proto-Slavic can be called “Slavic” language, and it the same way as “Slavic” is any Slavic language, or “Turkic” any Turkic language etc., “Pomerian” is any language of the said group. Fay Freak (talk) 12:35, 11 January 2023 (UTC)
 * ✅. Currently there is no family corresponding to Proto-Pomeranian and the code for this language is 'zlw-pom', which is exceptional in lacking the '-pro' suffix normally given to proto-languages. I'm thinking we should add a 'Pomeranian' family with code 'zlw-pom' and give Proto-Pomeranian the code 'zlw-pom-pro'. Thoughts? Benwing2 (talk) 07:25, 28 February 2023 (UTC)
 * Yes, absolutely. zlw-pom should be a family code and zlw-pom-pro the protolanguage code. —Mahāgaja · talk 08:13, 28 February 2023 (UTC)
 * Can you take a look at Reconstruction:Proto-Slavic/žarъ? @ZomBear listed a non-reconstructed descendant for Proto-Pomeranian and I'm not quite sure how to fix this as I'm not sure where that term comes from. Benwing2 (talk) 23:18, 28 February 2023 (UTC)
 * @Benwing2 error corrected by adding * (in ). Pomeranian term I met here — -- ZomBear (talk) 03:16, 1 March 2023 (UTC)
 * BTW I renamed the codes. Benwing2 (talk) 23:24, 28 February 2023 (UTC)
 * Obviously, there is no “Proto-Pomeranian”. There are no reconstructions, no comparisons of paradigmatic morphology, and so on. Gnosandes ❀ (talk) 08:41, 1 March 2023 (UTC)
 * The little reality of the field of study does not equal the irreality of the language itself. Subbranch proto forms are often left unreconstructed for being too small a fish when a larger one is available. Fay Freak (talk) 22:28, 4 March 2023 (UTC)

RFM discussion: January–February 2024
As it stands, Proto-Pomeranian is next to useless and I don't see it becoming any more useful. There is a single lemma that we could treat as Pre-Kashubian. I don't see any reconstructions coming in the future as a lot of linguistics consider Slovincian a dialect of Kashubian (this needs further analysis and it's something I'm working on). The next highest level that linguists agree on is something like Pomeranian-Polabian, they share a lot of sound changes in common, but I think it would make more sense to treat descendants sections with something like:

* Pomeranian-Polabian:

**

**

**

I propose we remove Proto-Pomeranian and merge with Kashubian. Vininn126 (talk) 12:42, 30 January 2024 (UTC)


 * Support This was worth a try but didn't turn out well. Thadh (talk) 12:53, 30 January 2024 (UTC)
 * There is no such thing as "Pomeranian-Polabian", hard no to that. Sławobóg (talk) 14:06, 30 January 2024 (UTC)
 * And to the other, more pressing issue? Vininn126 (talk) 14:43, 30 January 2024 (UTC)
 * I support removing Proto-Pomeranian, adding it was a mistake. We have nothing to make it work. It is probably not different from Proto-Polish and stuff like that. Sławobóg (talk) 15:00, 30 January 2024 (UTC)
 * . I wrote a long time ago that "Proto-Pomeranian" is nonsense. I also now write that "Pomeranian-Polabian" is nonsense. ɶLerman (talk) 18:28, 30 January 2024 (UTC)
 * I'm going to speedy delete this tomorrow - I think everyone who has an opinion has spoken. As to the grouping, I have a new idea that I will bring up elsewhere. Vininn126 (talk) 13:27, 5 February 2024 (UTC)

Deleted. Vininn126 (talk) 13:42, 6 February 2024 (UTC)
 * : It turns out there were at least half a dozen entries linking to Proto-Pomeranian words in either the etymology or the descendants. There were a couple of those where the proto-form was followed by its alleged descendants, which I fixed by simply replacing it with the word "Pomeranian:". The rest need to be dealt with. Chuck Entz (talk) 04:56, 7 February 2024 (UTC)
 * @Chuck Entz I thought I had caught them - can you provide a list? Vininn126 (talk) 08:12, 7 February 2024 (UTC)
 * @Vininn126 See CAT:E. Benwing2 (talk) 09:33, 7 February 2024 (UTC)
 * Ah, those hadn't popped yet when I checked, guess the server needed time to refresh. Vininn126 (talk) 09:37, 7 February 2024 (UTC)
 * : Sometimes it takes a few days (widely transcluded modules can take langer- there are still errors popping up from a mistake someone made and immediately fixed last week).
 * A better method is to use  in Special:Search with the option for searching all namespaces checked. The single remaining one is in the Reconstruction namespace (There's also the archived rfv discussion for the reconstruction you deleted, but I'm not sure what to do about that). Chuck Entz (talk) 15:42, 7 February 2024 (UTC)
 * @Chuck Entz I don't see any reconstruction? Vininn126 (talk) 15:47, 7 February 2024 (UTC)
 * Reconstruction:Proto-Slavic/vydati Chuck Entz (talk) 15:52, 7 February 2024 (UTC)

RFM discussion: February–March 2024
A long-standing issue with Ob-Ugric Languages as a whole (which I tried pointing out with the request for splitting Mansi as well) is that they are categorized under one language, while they could be considered separate languages, one fact is due to mutual unintelligibility.

But this talk is considering Khanty languages. We could do the way that it is on glottolog with the Khantic being the root of the all other Khanty languages. You can also find other sources for this on the List of languages of Russia (v2023) (in russian) site with the 143-146 numbers on the list

@Nyuhn I know you've only started to work on the Khanty language not so long ago, but I especially would love to hear your thoughts on this matter :] Ewithu (talk) 13:02, 6 February 2024 (UTC)


 * @Thadh Can you comment (or ping the appropriate editors, if you know them)? You seem to know a lot about obscure languages. Benwing2 (talk) 03:28, 7 February 2024 (UTC)
 * I'm not read-up on Mansi and Khanty and don't have the time to at the moment, but maybe may have a clue. Thadh (talk) 08:37, 7 February 2024 (UTC)
 * Yes almost all western researchers have been on board for decades now with "Khanty" as being at least three languages. Russian scholars are catching up to it too (e.g. a study was recently advertized as "Khanty dialects found to differ more than Slavic languages"). The minimal division would be Northern Khanty, Southern Khanty, Eastern Khanty as still used by e.g. Salminen in Routledge's recent (2023) The Uralic Languages, 2nd Edition handbook. In terms of practicality for someplace like Wiktionary, there's three or four essentially different literary standards for Northern (Kazym, Shuryshkar, Obdorsk; also the by now little used Middle Ob) and two or three for Eastern (Surgut, Vakh-Vasyugan, nascently Salym) which each could be treated as different languages too. The division of Northern is not highly in line with what comparative studies would have to say (many field doculects could not be nicely fit to it) but for anything documented from literary use, this would save a lot of "(Vakh dialect)" "(Obdorsk dialect)" comments. --Tropylium (talk) 12:48, 7 February 2024 (UTC)
 * If the literary standards represent mutually intelligible dialects I would suggest keeping them unified. This is somewhat similar to the situation with Galician, where there are two competing literary standards ("standard" and "reconstructionist", with the former made to look like Spanish and the latter to look like Portuguese), and we do use labels to handle the differences (and if the same spelling is used for both, there are two verb tables under Conjugation). The main issue I see with creating separate languages is it leads to duplication as you can't simply point the word in one literary standard to another using alt form. Benwing2 (talk) 21:18, 7 February 2024 (UTC)
 * Compare also Occitan, handled as a single language with six or so dialects. Benwing2 (talk) 21:21, 7 February 2024 (UTC)
 * I don't know too much about Galician nor Occitian, but if the vocabulary difference would be our biggest problem between the Khanty dialects, we would be ok to keep it unified.
 * But since between Khanty dialects, even the grammar differs. I don't know how we could handle the different ways nouns decline or verbs conjugate if we keep it unified. Ewithu (talk) 22:10, 7 February 2024 (UTC)
 * @Ewithu What I mean is, split based on mutual intelligibility, hence maybe a 3-way split or however-many-way split, but not split dialects just because they have different literary standards, if the literary standards refer to mutually intelligible dialects. Grammar also differs among Galician and Occitan standards and it's handled just fine. Benwing2 (talk) 22:49, 7 February 2024 (UTC)
 * Coptic is another example where we handle multiple (4) literary standards under a single L2. Benwing2 (talk) 22:56, 7 February 2024 (UTC)
 * Komi is splitted into three languages, Udmurt is in two, Selkup is also in two, Mari is in three (should be Northwestern one two by the way), Koibal has its own section despite being essentially a Kamassian dialect. I think splitting it into big chunks would really help reduce messiness, especially if there are also going to be dialects written down.
 * Two of the Mansi languages are extinct, to me it just feels wrong to mix everything together. Kaarkemhveel (talk) 23:27, 7 February 2024 (UTC)
 * @Kaarkemhveel I am not proposing mixing everything together but splitting based on mutual intelligibility. Glottolog for example has 5 Khantyic languages and 3 Mansic languages. What I'm concerned about is splitting every proposed literary standard into its own microlanguage; this would give us 8+ Khanty languages and I don't know how many Mansi ones. Things like "Koibal has its own section despite being essentially a Kamassian dialect" IMO should not be considered precedents for micro-splitting. Benwing2 (talk) 00:57, 8 February 2024 (UTC)
 * Right, that sound reasonable. In my head at least it looks like Northern/Eastern/(Salym?)/Southern Khanty and for Mansi either Nothern/Eastern/Western/Southern or Northern/Eastern/Western&Southern (though the last suggestion based rather on their current use status).
 * Kaarkemhveel (talk) 06:29, 8 February 2024 (UTC)

Dear EnWiktionary bosses! I must keep you informed that so called 'Khanty', 'Koryak', 'Selkup', 'Nenets' and some other languages which are used to be considered as united languages have never been a matter of fact of the reality but only a product of early-Stalin national policy. Speaking about Khanty subfamily, the main sources have been already provided above. Khanty subfamily consists of at least 4 languages: Vakh-Vasyugan with Vakh literary norm; Surgut with its literary norm; Southern with no actual written use afaik; and Northern with previously used Middle Ob literary norm and currently used Kazym, Shuryshkary and Lower Ob literary norms. All literary standards I mentioned have, for instance, schoolbooks presently being released, some of them I have in paper on my right hand. The languages of Khanty group (as well as Sami to give an understandable parallel) are not mutually intelligible. For instance, Northern Khanty morphology is very close to the Northern Mansi one with 3-4 noun cases while Vakh-Vasyugan has 10-13 cases like Southern Selkup which is believed to be the remaining of the archaic state because of the close contact with Selkups. So people applying for splitting Khanty just range things as the things are, moreover they are ready to work. Grigoriy Korotkih (talk) 09:50, 8 February 2024 (UTC)


 * @Grigoriy Korotkih Hi Grigoriy, thanks for your comments. I don't think anyone is objecting to splitting Khanty or Mansi, it's just a case of working out what the splits will be. Benwing2 (talk) 21:42, 8 February 2024 (UTC)

Khanty languages could be considered separate languages. Most of the dictionaries describing them through dialects or subdialects (говоры). Those division already presented in the Wiktionary using regional categories.

How it would be beneficial to the Wiktionary to have three Khanty L-2 sections on the "тур" page instead of one?

Also we will need to somehow handle translations and etymology links from other languages, that refer to unspecified Khanty. Nyuhn (talk) 09:23, 9 February 2024 (UTC)


 * Should we merge East Slavic languages because they have words like нога or зуб then? Don’t different Khanty lects have different inflection paradigms, pronunciation, even use sometimes? Speaking of refering to "unspecified Khanty": wouldn’t it be beneficial to the Wiktionary to specify it, rather than pile everything into "Khanty"? Kaarkemhveel (talk) 12:47, 10 February 2024 (UTC)

So in light of all that has been discussed, we should be good to split both languages, according to the cardinal directions of the North East South, and West. We can use labels to further specify which dialect it came from. Naturally each separate language (represented by the 4 cardinal directions) will have to (if possible to find) have an alt form section to make switching between dialects (represented by the river they lived beside traditionally) easier. Support or Oppose? Ewithu (talk) 09:59, 24 February 2024 (UTC)


 * @Ewithu Northern/Southern/Eastern/Western seems correct for Mansi but for Khanty maybe it should be Northern/Southern/Eastern per Wikipedia; Western Khanty appears to be a larger grouping consisting of Southern and Northern. I agree with the idea of identifying dialects by rivers. (We also ended up doing this for the CAT:Ye'kwana language, inspired by the Khanty and Mansi situation, as using river names was the only unambiguous and accurate way of referring to different dialects that we could come up with.) Benwing2 (talk) 10:20, 24 February 2024 (UTC)
 * @Benwing2 Alright, that's sounds good. Now we just need to get to it haha Ewithu (talk) 13:52, 26 February 2024 (UTC)
 * @Ewithu I think what we need to do is as follows (for Khanty, similarly for Mansi):
 * Assign language codes, and  to the split-out languages.
 * Assign a temporary family code to the new Khantyic (Khantic?) family; see below.
 * Set up tracking for all uses of as a language code. (This will make it easier to find the references to this code so they can be changed.)
 * Split the current kca-* templates into per-new-language templates. Thankfully there are only 4 templates. I assume the pronouns listed in kca-table-ppron are Kazym pronouns. I'm not sure if all the Khantyic languages use the same set of cases; if not we need to modify the new declension templates to have the appropriate sets of cases in them.
 * Split the Khanty modules. There are only three of them: Module:kca:Dialects, Module:labels/data/lang/kca and Module:kca-translit. I think the translit module can stay as-is.
 * Move all the lemmas and non-lemma forms to the new languages, and change template references accordingly.
 * Change all references to language to use the appropriate code. This will principally be in Etymology, Translations and Descendants sections.
 * Delete the Khanty language from Module:languages/data/3/k.
 * Repurpose the code for the Khantyic family.
 * Benwing2 (talk) 03:41, 27 February 2024 (UTC)
 * @Ewithu Steps 1-3 are done. Tracking can be found here for Khanty: Special:WhatLinksHere/Template:tracking/languages/kca. There is also tracking for Mansi here: Special:WhatLinksHere/Template:tracking/languages/mns. I have not done the corresponding split for Mansi yet because Glottolog only has a three-way split for Mansic (Southern, Northern, Central) instead of a four-way split (Southern, Northern, Eastern and Western), preferring to consider East Mansi and West Mansi as dialects of Central Mansi. This means we need a bit more discussion. Benwing2 (talk) 05:05, 28 February 2024 (UTC)
 * @Benwing2 Wonderful Thank you!
 * So I would say split it 4 way, because on one hand, the western dialect is extinct, while the eastern (is also extinct in theory, but not proven yet) is extant, and western has a lack of Cyrillic dictionaries or sources (except that one Gospel of Mathew written in Lower Lozva most likely), while Eastern even has audio pronunciations on Lingvodoc. While I would also say keep it as Central Mansi, because those two languages have very few resources but I don't this is a really valid argument. Ewithu (talk) 06:17, 28 February 2024 (UTC)
 * @Ewithu My general take is we should go by mutual intelligibility unless there's really a good reason to do it otherwise. Since Glottolog seems to think the two varieties are mutually intelligible (at least that's what I assume by the Central grouping) and there are few resources overall, I think it's easier to have fewer splits; we can always split later if the need arises. But I don't feel super strongly about it; maybe someone else can chime in. Benwing2 (talk) 06:23, 28 February 2024 (UTC)
 * In any case let me know if you need help with the Khanty splitting effort (in particular anything best done by bot). Benwing2 (talk) 06:24, 28 February 2024 (UTC)
 * @Benwing2 Minor point, but are we sure about the name "Khantyic"? It sounds really odd to me, and I think "Khantic" is more common going by GBooks. Theknightwho (talk) 06:51, 28 February 2024 (UTC)
 * @Theknightwho Good point. I used "Khantyic" because this is what Glottolog uses. Benwing2 (talk) 07:06, 28 February 2024 (UTC)
 * I can't actually confirm your assertion about Google Books; "Khantic" does seem to occur more often but most of the hits seem to be garbage. Benwing2 (talk) 07:09, 28 February 2024 (UTC)
 * @Benwing2 I was just going by the number of confirmed hits I could see. I'll have a proper look later today. Theknightwho (talk) 12:21, 28 February 2024 (UTC)
 * Can't we just call it "Khanty languages"? Everyone calls the protolanguage "Proto-Khanty", nobody calls it "Proto-Khantyic" or "Proto-Khantic". &mdash; S URJECTION / T / C / L / 13:42, 28 February 2024 (UTC)
 * I agree with this.
 * Also steps 4 and 5 are done for Khanty. Also redirected the reference templates according o their language used. Ewithu (talk) 14:29, 28 February 2024 (UTC)
 * @Surjection I renamed Khantyic -> Khanty. I see you split all the lemmas and carried out step 9 above but step 7 isn't done. Benwing2 (talk) 22:03, 28 February 2024 (UTC)
 * 7 should be done. A few cases were left with etymology templates that used  as a source, but those should work, as it now means "from one of the Khanty languages". Most references were updated; some others were indiscriminately changed to   with attention templates, as I was requested to do. &mdash; S URJECTION  / T / C / L / 22:06, 28 February 2024 (UTC)
 * @Surjection See e.g. роман. There is an error in the Descendants section due to the use of language code . See the tracking link above for all remaining references. Benwing2 (talk) 22:12, 28 February 2024 (UTC)
 * Done. The remaining instances in mainspace at least are valid uses of the family code. I'll check the other namespaces (at least the ones that matter) as well. &mdash; S URJECTION / T / C / L / 22:22, 28 February 2024 (UTC)
 * OK thanks. Benwing2 (talk) 22:26, 28 February 2024 (UTC)
 * If the differences in grammar, phonology and morphology of Eastern Mansi and Western Mansi are not too large, then the fact one is little-attested and extinct is all the more reason to keep the two together. Thadh (talk) 09:49, 28 February 2024 (UTC)
 * @Thadh Alright, keep 'em one. Could you do the honors if you are free? Also should we add Proto-Mansi and Proto-Khanty as well? Ewithu (talk) 09:54, 28 February 2024 (UTC)
 * @Ewithu I have added the Mansi family along with Northern/Southern/Central Mansi languages. Benwing2 (talk) 22:35, 28 February 2024 (UTC)
 * Alright thank you! I'll move the templates to the Northern one. Ewithu (talk) 22:39, 28 February 2024 (UTC)
 * Considering how often etymological sources distinguish between Western and Eastern Mansi though, we may want to consider adding them as etymology-only languages. &mdash; S URJECTION / T / C / L / 08:13, 29 February 2024 (UTC)
 * @Surjection Sounds good, I'll go ahead and do that. Benwing2 (talk) 08:15, 29 February 2024 (UTC)
 * ✅ Into: Eastern Khanty, Northern Khanty, Southern Khanty, Central Mansi, Northern Mansi, Southern Mansi Ewithu (talk) 13:29, 9 March 2024 (UTC)

RFM discussion: February 2024
Hi, this is a proposal to add three new language codes, relating to a small family of (-adjacent) languages which have been dubbed the (from Chinese 蔡龍語支), spoken in western Guizhou.

very kindly created an expansive vocabulary list of Greater Bai Macro-Bai languages several years ago, which covers all of these in detail, including a number of sources (all of which are in Chinese, unfortunately). Theknightwho (talk) 18:59, 3 February 2024 (UTC)
 * (蔡家話)
 * The only extant member of the family, with approximately 1,000 speakers in western Guizhou, with the numbers having declined from over 10,000 in the 1980s. Just over 950 terms are listed in the vocabulary list, including those of the dialects spoken in Shuicheng and Yangjiazhai.
 * (龍家語)
 * Extinct from around 2010. Just under 600 terms are listed in the vocabulary list, including those of the dialects spoken in Anshun and Dafang).
 * (盧人語)
 * Extinct from sometime between 1960 and 1980. 61 terms are listed in the vocabulary list.


 * No objections from me although I know little-to-nothing about these languages. Benwing2 (talk) 03:52, 7 February 2024 (UTC)
 * . What should their ancestor/family be listed as? – wpi (talk) 17:28, 11 February 2024 (UTC)
 * @Wpi Macro-Bai, and then giving the core Bai languages (i.e. the ones we already have language codes for) their own language family within that simply called "Bai languages". I've checked, and I can't find any evidence the term "Greater Bai" is actually used outside of Wiktionary or Wikipedia, so we should definitely fix that. Theknightwho (talk) 17:17, 16 February 2024 (UTC)
 * Created. Theknightwho (talk) 02:05, 17 February 2024 (UTC)

RFM discussion: February 2024
Currently, we use the code  for Northern Bai, classifying it as one of the s. However, in 2014,   was split into Panyi Bai and Lama Bai. Lama Bai was given the new code, while Panyi Bai retained the old code. At some point since then we added Lama Bai to our list of languages, but we never renamed  to Panyi Bai, which seems to have been an oversight. From the documents, it seems like they still do form a "Northern Bai" grouping, however. Theknightwho (talk) 02:25, 17 February 2024 (UTC)

We only have two lemmas for Northern Bai, so I don't think this would be a difficult job. Theknightwho (talk) 02:25, 17 February 2024 (UTC)


 * @Theknightwho . Benwing2 (talk) 02:56, 17 February 2024 (UTC)

Split. Having checked the source, both current lemmas should be under Panyi Bai. Theknightwho (talk) 15:35, 24 February 2024 (UTC)

RFM discussion: March 2024
On the back of the recent split of Min Nan into a family, I thought it would be a good idea to consider renaming the various branches of Min to the names usually used in English. With the exception of Coastal Min and Inland Min, the names we currently use are transliterations from Mandarin, whereas most sources - including academic sources - prefer to translate them into English. I have no doubt we only do this because the ISO uses the transliterated names, for whatever reason. As such, I propose the following: It's difficult to use Google Ngrams with this, since there are a ton of false positives, but the general trend is very strongly towards the English names historically, with the transliterations increasing in popularity in recent publications. However, taking a look at Google Books results, a lot of the transliterations come from poor quality sources, or in many cases texts that simply contain the ISO language names. Google Scholar is somewhat more helpful, though, and shows a clear trend towards the English names even with very recent papers. Theknightwho (talk) 18:02, 4 March 2024 (UTC)
 * : Min Bei &rarr; Northern Min
 * : Min Dong &rarr; Eastern Min
 * : Min Nan &rarr; Southern Min (note: currently being converted into a family)
 * : Min Zhong &rarr; Central Min
 * : Puxian &rarr; Puxian Min (edit: added per the discussion below)


 * Support; this is also consistent with general practice here at Wiktionary to prefer English terminology rather than native-language terminology. Benwing2 (talk) 00:15, 5 March 2024 (UTC)
 * this is what i normally do outside of wiktionary anyways — 義順 (talk) 21:21, 7 March 2024 (UTC)
 * Neutral / Support —Fish bowl (talk) 05:43, 10 March 2024 (UTC)

Question: Should cpx "Puxian" also be moved to "Pu–Xian Min" (cf. Pu–Xian Min?) —Fish bowl (talk) 05:43, 10 March 2024 (UTC)
 * @Fish bowl Oppose use of em-dash or en-dash in the name of the language. Benwing2 (talk) 05:54, 10 March 2024 (UTC)
 * Yeah that's sensible. I copy-and-pasted it from Wikipedia and didn't notice it was there. Imagine it's an ASCII hyphen. —Fish bowl (talk) 05:58, 10 March 2024 (UTC)
 * Not sure here. Wikipedia uses dashes in the names of many languages (e.g. most of the ) where Glottolog uses closed compounds. Glottolog uses the form Pu-Xian with a hyphen not en-dash (although Omniglot uses Puxian). In general I prefer less punctuation than more so my gut would say Puxian but properly you'd have to look at scholarly sources to see which one is most common. Benwing2 (talk) 06:19, 10 March 2024 (UTC)
 * It is because the name 莆仙 Pu(-)Xian is formed from two placenames, 莆田 Putian and 仙遊 Xianyou. This is also the case for the Yue dialects listed on Wikipedia. —Fish bowl (talk) 06:24, 10 March 2024 (UTC)
 * @Fish bowl "Puxian Min" seems to be a lot more common than "Pu-Xian Min" in English-languages sources, but I agree it should be renamed as well. I'll add it to the list at the top. Theknightwho (talk) 06:34, 10 March 2024 (UTC)


 * . (Also the Bei, Dong, Nan, Zhong are not native anyway; they're Mandarin). — justin(r)leung { (t...) 18:17, 10 March 2024 (UTC)

Moved, given this isn't controversial. Theknightwho (talk) 14:12, 12 March 2024 (UTC)

RFM discussion: February 2022–March 2024

 * 1) The family tree looks weird.
 * 2) Old Chinese is subsumed under ==Chinese==.

—Fish bowl (talk) 06:34, 6 February 2022 (UTC)


 * I wonder if we should push it even further and have Proto-Sino-Tibetan be the ancestor? Even Old Chinese being the ancestor of Old Chinese may be weird. — justin(r)leung { (t...) 06:54, 6 February 2022 (UTC)

I note that Category:English language similarly belongs to Category:Middle English language, although a major difference in Wiktionary treatment is that ==English== does not cover Category:Middle English language or Category:Old English language. —Fish bowl (talk) 11:19, 7 February 2022 (UTC)


 * @Fish bowl I'm not a Chinese editor, but from a outside perspective, that'd feel more weird to see (how can Chinese be the ancestor of Old Chinese?). It's also make the Chinese lects go like "Chinese -> Old Chinese -> Middle Chinese -> lect", which seems more confusing to me. Honestly, at this rate. I'd just remove Chinese from the family tree entirely with its current treatment. Or, at least make it on the same level as Old Chinese, rather than an ancestor. AG202 (talk) 04:33, 4 March 2022 (UTC)


 * I don't see a problem with the current setup...? As AG says, making modern Chinese an ancestor of Old Chinese would be weird (and wrong). I think the current setup is fine...? The only awkwardness is that we group all kinds of Chinese under one L2. I'm going to &lt;s&gt;strike&lt;/s&gt; this as no action taken so it can be archived soon, since the discussion is very old and this page is very large, but you can reopen it if you object... - -sche (discuss) 05:33, 31 March 2024 (UTC)

RFM discussion: December 2023–February 2024
Has been discussed many times and was also agreed up to add them like here and here

The proposal was because there are tons of terms which are restricted to certain branches, sometime being only attested in like 4 languages of 1 branch ( for example) in a family of 80+ languages with 4 branches but are reconstructed to the proto stage, R:dra:DL has many reconstructions for the inner branches. R:dra:Southworth mentions many cases where BK reconstructs terms to PD when they are restricted to certain branches like PSD or PCD further saying it should be reconstructed only to the proto branch only. AleksiB 1945 (talk) 13:21, 21 December 2023 (UTC)


 * Support. Theknightwho (talk) 23:06, 30 December 2023 (UTC)

Created (several days ago now): we have: This is to properly align with the format our Proto-Dravidian entries have been following anyway. Theknightwho (talk) 00:03, 24 February 2024 (UTC)
 * Proto-Central Dravidian
 * Proto-North Dravidain
 * Proto-South Dravidian
 * Proto-South Dravidian I  (covering what we formerly called South Dravidian)
 * Proto-South Dravidian II  (covering former South-Central Dravidian)