Wiktionary talk:Language treatment

moving dialect codes to the etyl namespace
For cases where we consider the macrolanguage to be the individual language and the subdivisions to be dialects, I think we should move the subdivision language code templates to the etyl: namespace. Similar to language families & other dialects, this is where we house codes that should only be used in Etymologies and not as valid L2 languages. Sound fine? --Bequw → ¢ • τ 21:59, 20 January 2010 (UTC)

Treatment by SIL
I thought it might be interesting to post SIL's (the Registration Authority for ISO 639-3) criteria for determining whether language varieties are dialects or distinct languages. They can be found on their Change Request Form (page 3): "For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature (traditional or written), a common writing system, the views of users concerning the relationship between language and identity, and other factors. The following basic criteria are followed:
 * Two related varieties are normally considered varieties of the same language if users of each variety have inherent understanding of the other variety (that is, can understand based on knowledge of their own variety without needing to learn the other variety) at a functional level.
 * Where intelligibility between varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both understand can be strong indicators that they should nevertheless be considered varieties of the same language.
 * Where there is enough intelligibility between varieties to enable communication, the existence of well-established distinct ethnolinguistic identities can be a strong indicator that they should nevertheless be considered to be different languages."
We are of course independent of these, but they may be useful nonetheless. --Bequw → ¢ • τ 21:58, 21 January 2010 (UTC)
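For anyone who finds it easier to read as logic, the three quoted criteria can be sketched roughly as a decision function. This is only an illustration: the names `Intelligibility` and `same_language` are made up here, the real judgments weigh "other factors" that a boolean cannot capture, and the fallback of treating varieties as different when no criterion applies is an assumption, not something the quoted text states.

```python
from enum import Enum


class Intelligibility(Enum):
    """Coarse, hypothetical levels of inherent mutual understanding."""
    NONE = 0
    MARGINAL = 1
    FUNCTIONAL = 2


def same_language(intelligibility,
                  shared_literature_or_identity=False,
                  distinct_identities=False):
    """Rough reading of the three quoted criteria.

    Returns True if two varieties would normally count as one language.
    """
    if intelligibility is Intelligibility.FUNCTIONAL:
        # Criterion 1: functional inherent understanding means one language,
        # unless criterion 3 applies (well-established distinct identities).
        return not distinct_identities
    if intelligibility is Intelligibility.MARGINAL:
        # Criterion 2: a common literature or a shared identity with a
        # central variety can still unite marginally intelligible varieties.
        return shared_literature_or_identity
    # Assumption: with no inherent intelligibility, treat as different.
    return False
```

For example, `same_language(Intelligibility.FUNCTIONAL, distinct_identities=True)` gives `False`, which is the "nevertheless different languages" case in the third criterion.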

Allowing macro and non-standard dialects
Sometimes (as with Latvian and Estonian) we treat the subdivisions of a macrolanguage as individual languages, but we use the macrolanguage name/code in place of the "standard" dialect name/code. I just added this option to the table. Are there other macrolanguages where this is the case (possibly Arabic and Malay)? --Bequw → ¢ • τ 17:09, 23 January 2010 (UTC)

Aramaic
Apparently some have been treating "Jewish Babylonian Aramaic" (aka "Talmudic Aramaic", code= ) as a variety of Aramaic. Does anyone know if this is standard, or if this is true of other ISO 639-3 coded Aramaic varieties? --Bequw → τ 18:34, 8 February 2010 (UTC)
 * "Apparently" should now link to an archive. — msh210 ℠ 18:52, 15 February 2010 (UTC)

Chinese
I'd like to update the Chinese entry. Is there any way to just write in plain English, without passing through a template? Mglovesfun (talk) 16:41, 25 March 2010 (UTC)


 * Use the templates, please; because they standardize the possible texts, and standardization is good. Another way to contribute to the page is typing here what you need, so I may update the table. --Daniel. 14:15, 22 April 2010 (UTC)


 * I've changed the table to a regular wikitable so that anyone can edit it and so that it can handle more complex situations and the presence of deleted codes. Cheers, - -sche (discuss) 21:05, 23 May 2013 (UTC)

Aramaic redux
Because at least one RFM is ongoing(?), I'll list this here rather than on the main page: oar (Old Aramaic, up to 700 BCE) is not used, as it has been superseded by arc and syc. tmr (Jewish Babylonian Aramaic, circa 200-1200 CE) is not used, as it has been superseded by arc and etyl:tmr. - -sche (discuss) 00:11, 16 July 2013 (UTC)

Montagnais/Innu
Currently, some main-namespace pages use Montagnais/Innu's language code (probably mostly in translations tables) while a few use other Cree dialects' language codes. Innu is different enough from Cree that Innu is regularly considered side-by-side with (rather than subordinated under) Cree; e.g. the Linguistic Atlas of Canada speaks of "different Cree and Innu dialects". OTOH, they're not that different, and splitting them at the L2 level would raise questions of what to do with e.g. Naskapi. I'm curious whether we should (a) allow Innu its own L2, (b) merge it completely into Cree, or (c) leave it subordinated under / merge into cr at the L2 level, but let it keep its code (it currently still has one, as no-one ever deleted it) so that it can be used in translations tables (like the Romani lects' codes). The translations could be nested under Cree/cr, or could be separate, sorted under M or I depending on which name we end up using for the lect. - -sche (discuss) 22:26, 20 July 2013 (UTC)

RFM 1
This is an old, old mistake in ISO. Both codes refer to the very same language, namely the Frisian dialect spoken in Saterland, which is an Eastern Frisian dialect. I have no idea how that was overlooked, but it means the two codes should be merged somehow. I'd prefer, since that one is in 639-2. -- Liliana • 14:24, 17 October 2011 (UTC)
 * Should the language name be "East Frisian" or "Saterland Frisian"? I'd prefer to use the code "frs", but the name "Saterland Frisian". - -sche (discuss) 19:08, 17 October 2011 (UTC)
 * To me it seems like Saterland Frisian is the most common name, so we should probably use that. -- Liliana • 19:14, 17 October 2011 (UTC)
 * Alright, frs = Saterland Frisian it is, but is in fact widely used — someone will need to replace it by bot. - -sche (discuss) 23:50, 23 October 2011 (UTC)
 * Or a really bored person like me needs to spend an hour or two. -- Liliana • 00:12, 24 October 2011 (UTC)
 * But what about etymologies involving Eastern Frisian, at the time it still existed? With no code, how should they be entered? Or even, how should the etymologies that already exist be fixed? —CodeCat 11:23, 24 October 2011 (UTC)
 * If it warrants a distinction, it should get one of these constructed codes. It isn't covered by the code frs anyway, which ISO classifies as a "living" language, not an extinct one. -- Liliana • 12:13, 24 October 2011 (UTC)
 * East Frisian isn't really extinct strictly, but the only surviving instance of it is now called Saterland Frisian. —CodeCat 16:11, 24 October 2011 (UTC)
 * This doesn't explain why ISO assigned two codes to one language. We do not have that for any other language of the world. Using frs for a different language than what ISO intended would make a precedent case, and almost certainly require a vote. Another problem is that the current name "East Frisian" is really confusing, since there's an (unrelated) Low German dialect which is also called East Frisian. So in any case, you would have to sort out the erroneous uses. -- Liliana • 16:15, 24 October 2011 (UTC)
 * I agree with Liliana, we need a separate code of our own for non-Saterland varieties of East Frisian (or we need to clearly indicate that we are using "frs" to refer to a language other than the one the ISO refers to as "frs"). If a word is derived from a variety of East Frisian other than the one the ISO calls "stq", it cannot be derived from what the ISO calls "frs", because "frs" is living, and the only living East Frisian lect is "stq". - -sche (discuss) 00:34, 26 October 2011 (UTC)

Proposed additions / clarifications
These are all from translation tables, which I will edit to reflect consensus for any of these cases:
 * Macro languages:
 * Chinese: dng, ltc, och
 * Sorbian: dsb, hsb
 * Apache: apw, apm, apj, apl, apk
 * Sami: smn, smj, sms, sma, se
 * Frisian: fy, ofs, frr
 * Berber: shi
 * Marquesan: mrq, mrm


 * Dialects / script group:
 * sq: als does not exist any more, change to just Tosk
 * cop: Bohairic, Sahidic, Fayyumic
 * lt: Aukštaitian
 * ms: Rumi, Jawa
 * sc: Nugorese
 * tly: Asalemi, Anbarani, Masali
 * sh: Cyrillic, Roman, Arebica, Latin
 * arc: Hebrew, Syriac
 * ks: Arabic, Devanagari
 * cu: Cyrillic, Glagolitic
 * ro: mo no longer exists; Latin, Cyrillic
 * os: Digor, Iron
 * kea: Badiu, São Vicente, ALUPEC, Sotavento, Barlavento, Santo Antão
 * az: Cyrillic, Roman, Perso-Arabic, Arabic, Persic
 * avd: Vidari
 * egy: Archaic Egyptian, Old Egyptian, Middle Egyptian, Late Egyptian
 * tt: Cyrillic, Roman
 * lad: Roman, Hebrew, Latin
 * pa: Gurmukhi, Shahmukhi (has its own code?)
 * nso: Sepedi
 * vot: Roman, Cyrillic
 * rom: table says that rmc, rmf, rml, rmn, rmo, rmw, rmy are deprecated but they still exist in the languages module
 * kw: Kernewek Kemmyn
 * be: Cyrillic, Roman, Narkamaŭka, Taraškievica, Tarashkevitsa
 * tg: Cyrillic, Persic, Roman
 * ug: Persic, Roman, Cyrillic, Perso-Arabic
 * uz: Cyrillic, Roman, table says that uzn and uzs are deprecated but they still exist in the languages module
 * zza: Persic, Roman
 * ko: South, North
 * fia: Fadicca, Kenzi
 * cr: some codes are deprecated but still in languages module
 * lmo: Eastern, Western, Milanese
 * ms: Rumi, Jawi, Latin, Arabic
 * la: New Latin
 * pi: Burmese, Devanagari, Latin


 * Other:
 * ar: xaa, mey
 * fr: frm, fro
 * de: ksh, gsw
 * nds: deprecated but still in languages module, add nds-de, nds-nl
 * mn: cmg
 * es: osp
 * hy: xcl
 * pnb: pa
 * id: ace, ban, bjn, bug, jv, mad, mak, min, nia, sas, su
 * ga: sga, pgl
 * fy: stq
 * arc: syc
 * tt: crh
 * ko: oko, okm
 * rom: rmq
 * pl: zlw-opl

I apologize if this is in an inconvenient format; rearrange it as you like. DTLHS (talk) 00:44, 20 August 2013 (UTC)
 * Nice. Some additional things that I noticed after a quick read: okm should be under ko, pgl should be under ga, zlw-opl should be under pl, there are tons of missing Arabic sublects that should be under ar, and grc (and possibly some other lects) should be under el. —Μετάknowledge discuss/deeds 02:40, 20 August 2013 (UTC)
 * grc is already under el on the page. What Arabic sublects aren't in my list or the existing table? DTLHS (talk) 03:08, 20 August 2013 (UTC)
 * Never mind. Only mt, which shouldn't be under ar anyway (well, linguistically it should, but not sociopolitically). —Μετάknowledge discuss/deeds 04:04, 20 August 2013 (UTC)

Use title text for the language names?
A lot of the language codes in the table don't have a name next to them, but if we added the name it would become very hard to see. Would it be useful to turn it into title text, so that the name is shown when you hover the mouse over the code? 19:36, 25 August 2013 (UTC)
 * Hmm. One downside to that is that it would no longer be possible (would it?) to hit Ctrl+F and search the page for a particular dialect's name. Given that one of the reasons this page exists is so that people can see if the reason we don't have a code is because we've merged it into something else (vs we just haven't added it yet), that's a significant downside. - -sche (discuss) 05:15, 26 February 2014 (UTC)

RFC discussion: May 2013
This is causing some script errors because some of the codes have since been deleted. I'm not sure what to do about that. 13:34, 23 May 2013 (UTC)


 * It needs to be redesigned so that the table can contain/mention codes that have been deleted, for the reason you mention and several other reasons. - -sche (discuss) 19:03, 23 May 2013 (UTC)


 * I've started redoing the table. - -sche (discuss) 19:30, 23 May 2013 (UTC)

List of codes the ISO has retired
This was previously at User:-sche/retired codes, but I think it is useful to have it in the Wiktionary: namespace. - -sche (discuss) 23:53, 21 December 2014 (UTC)

Retired codes which were not used on Wiktionary in February 2014
Codes which were retired from the ISO and which were not used on Wiktionary as of February of 2014. (Since then, several other codes which were retired from the ISO by that date have also been retired on Wiktionary; see the following sections.)


 * 1) nln — Durango Nahuatl - split into [azd] Eastern Durango Nahuatl and [azn] Western Durango Nahuatl
 * 2) noo — Nootka - split into [dtd] Ditidaht and [nuk] Nootka / Nuu-chah-nulth
 * 3) unp — Worora - split into Worrorra [wro] and Unggumi [xgu]
 * 4) wiw — Wirangu - split into Wirangu [wgu] and Nauo [nwo]
 * 5) aay — Aariya - retired as nonexistent
 * 6) acc — Cubulco Achí - merged with [acr]
 * 7) aex — Amerax - retired as nonexistent
 * 8) agp — Paranan - split into Pahanan Agta [apf] and Paranan [prf] (the lects are extremely similar, but only due to convergence; their different grammar reveals their different origins)
 * 9) ahe — Ahe - merge into Kendayan / Salako [knx]
 * 10) aiz — Aari - split into Aari [aiw] (new identifier) and Gayil [gyl]
 * 11) akn — Amikoana - retired as nonexistent
 * 12) amd — Amapá Creole - retired as nonexistent
 * 13) arf — Arafundi - split into three languages: Andai [afd]; Nanubae [afk]; Tapei [afp]
 * 14) atf — Atuence - retired as nonexistent
 * 15) auv — Auvergnat - merge into Occitan (post 1500) [oci]
 * 16) ayx — Ayi (China) - merge into Anong [nun] as duplicate / nonexistent as separate lect
 * 17) azr — Adzera - split into three languages: Adzera [adz] (new identifier), Sukurum [zsu] and Sarasira [zsa]
 * 18) baz — Tunen - split into Tunen [tvu] and Nyokon [nvo]
 * 19) bcx — Pamona - split into Pamona [pmf] (new identifier) and Batui [zbt]
 * 20) bgh — Bogan - merge into Bugan [bbh] as duplicate
 * 21) bhk — Albay Bicolano - split into Buhi'non Bikol [ubl]; Libon Bikol [lbl]; Miraya Bikol [rbl]; West Albay Bikol [fbl]
 * 22) bii — Bisu - split into Bisu [bzi] (new identifier) and Laomian [lwm]
 * 23) bjq — Southern Betsimisaraka Malagasy (later granted another code which we ourselves retired)
 * 24) bkb — Finallig
 * 25) bke — Bengkulu
 * 26) blu — Hmong Njua
 * 27) bnh — Banawá
 * 28) boc — Bakung Kenyah
 * 29) bqe — Navarro-Labourdin Basque
 * 30) bsd — Sarawak Bisaya
 * 31) bsz — Souletin Basque
 * 32) btb — Beti (Cameroon)
 * 33) bvs — Belgian Sign Language
 * 34) bwv — Bahau River Kenyah
 * 35) bxt — Buxinhua
 * 36) byu — Buyang
 * 37) cbm — Yepocapa Southwestern Cakchiquel
 * 38) ccx — Northern Zhuang
 * 39) ccy — Southern Zhuang
 * 40) chs — Chumash   -   Extinct
 * 41) cit — Chittagonian
 * 42) cjr — Chorotega   -   Extinct
 * 43) cka — Khumi Awa Chin
 * 44) ckc — Northern Cakchiquel
 * 45) ckd — South Central Cakchiquel
 * 46) cke — Eastern Cakchiquel
 * 47) ckf — Southern Cakchiquel
 * 48) cki — Santa María De Jesús Cakchiquel
 * 49) ckj — Santo Domingo Xenacoj Cakchiquel
 * 50) ckk — Acatenango Southwestern Cakchiquel
 * 51) ckw — Western Cakchiquel
 * 52) cmk — Chimakum   -   Extinct
 * 53) cnm — Ixtatán Chuj
 * 54) cru — Carútana   -   Extinct
 * 55) cti — Tila Chol
 * 56) cun — Cunén Quiché
 * 57) daf — Dan
 * 58) dap — Nisi (India)
 * 59) dat — Darang Deng
 * 60) dha — Dhanwar (India)
 * 61) dkl — Kolum So Dogon
 * 62) drh — Darkhat
 * 63) drw — Darwazi
 * 64) dyk — Land Dayak - actually a family
 * 65) elp — Elpaputih - "Lack of information may have cause Elpaputih to be considered different from [amq] (Amahai) and [plh] (Paulohi) "
 * 66) eml — Emiliano-Romagnolo - split into Emilian [egl] and Romagnol [rgn]
 * 67) eni — Enim - merge into [pse] Central Malay
 * 68) eur — Europanto - constructed - hoax / joke
 * 69) fiz — Izere
 * 70) flm — Falam Chin
 * 71) fri — Western Frisian
 * 72) gav — Gabutamon - merge into Domung [dev]
 * 73) gen — Geman Deng - merge into Miju-Mishmi [mxj] as duplicate
 * 74) ggh — Garreh-Ajuran - split between Borana [gax] and Somali [som] (!)
 * 75) ggm — Gugu Mini - extinct - nonexistent
 * 76) gmo — Gamo-Gofa-Dawro - split into three languages: Gamo [gmv], Gofa [gof], and Dawro [dwr]
 * 77) gsc — Gascon - merge into Occitan (post 1500) [oci]
 * 78) hsf — Southeastern Huastec
 * 79) hva — San Luís Potosí Huastec
 * 80) itu — Itutang
 * 81) ixi — Nebaj Ixil
 * 82) ixj — Chajul Ixil
 * 83) jai — Western Jacalteco - merge with Popti' [jac]
 * 84) jap — Jaruára - merge into [jaa] Jamamadí
 * 85) jar — Jarawa (Nigeria) - split into Gwak [jgk] and Bankal [jjr]
 * 86) kds — Lahu Shi
 * 87) knh — Kayan River Kenyah
 * 88) kob — Kohoroxitari
 * 89) krg — North Korowai
 * 90) krq — Krui
 * 91) kxg — Katingan
 * 92) leg — Lengua
 * 93) lmm — Lamam
 * 94) lms — Limousin
 * 95) lmt — Lematang
 * 96) lnc — Languedocien
 * 97) lnt — Lintang
 * 98) mbg — Northern Nambikuára
 * 99) mdo — Southwest Gbaya
 * 100) mhh — Maskoy Pidgin
 * 101) mhv — Arakanese
 * 102) miv — Mimi
 * 103) mja — Mahei
 * 104) mld — Malakhel
 * 105) mly — Malay (- language) - actually a family
 * 106) mms — Southern Mam
 * 107) mob — Moinba
 * 108) mof — Mohegan-Montauk-Narragansett - extinct
 * 109) mol — Moldavian
 * 110) mpf — Tajumulco Mam
 * 111) mqd — Madang
 * 112) mst — Cataelano Mandaya
 * 113) mtz — Tacanec
 * 114) muw — Mundari
 * 115) mvc — Central Mam
 * 116) mvj — Todos Santos Cuchumatán Mam
 * 117) myq — Forest Maninka
 * 118) myt — Sangab Mandaya
 * 119) mzf — Aiku
 * 120) nbf — Naxi
 * 121) nfg — Nyeng
 * 122) nfk — Shakara
 * 123) nhj — Tlalitzlipa Nahuatl
 * 124) nhs — Southeastern Puebla Nahuatl
 * 125) nky — Khiamniungan Naga
 * 126) nlr — Ngarla
 * 127) nxj — Nyadu
 * 128) occ — Occidental - constructed - merge into Interlingue [ile] as Duplicate
 * 129) ogn — Ogan - merge into [pse] Central Malay
 * 130) ope — Old Persian - ancient - merge into Old Persian (ca. 600-400 B.C.) [peo] as duplicate
 * 131) ork — Orokaiva - split into Orokaiva [okv] (new identifier), Aeka [aez] and Hunjara-Kaina Ke [hkk]
 * 132) paj — Ipeka-Tapuia
 * 133) pbz — Palu
 * 134) pec — Southern Pesisir
 * 135) pen — Penesak
 * 136) pgy — Pongyong
 * 137) plm — Palembang
 * 138) poa — Eastern Pokomam
 * 139) pob — Western Pokomchí
 * 140) poj — Lower Pokomo
 * 141) pou — Southern Pokomam
 * 142) ppv — Papavô
 * 143) prv — Provençal
 * 144) pun — Pubian
 * 145) puz — Purum Naga
 * 146) quj — Joyabaj Quiché
 * 147) qut — West Central Quiché
 * 148) quu — Eastern Quiché
 * 149) qxi — San Andrés Quiché
 * 150) rae — Ranau
 * 151) rjb — Rajbanshi
 * 152) rmr — Caló
 * 153) rws — Rawas
 * 154) sap — Sanapaná
 * 155) sdd — Semendo
 * 156) sdi — Sindang Kelingi
 * 157) sgl — Sanglechi-Ishkashimi
 * 158) sic — Malinguat
 * 159) skl — Selako
 * 160) slb — Kahumamahon Saluan
 * 161) srj — Serawai
 * 162) stc — Santa Cruz
 * 163) suf — Tarpia
 * 164) suh — Suba
 * 165) sul — Surigaonon
 * 166) sum — Sumo-Mayangna
 * 167) suu — Sungkai
 * 168) szk — Sizaki
 * 169) tkk — Takpa
 * 170) tle — Southern Marakwet
 * 171) tlz — Toala'
 * 172) tmx — Tomyang
 * 173) tnf — Tangshewi
 * 174) tnj — Tanjong
 * 175) tot — Patla-Chicontla Totonac
 * 176) ttx — Tutong 1
 * 177) tzb — Bachajón Tzeltal
 * 178) tzc — Chamula Tzotzil
 * 179) tze — Chenalhó Tzotzil
 * 180) tzs — San Andrés Larrainzar Tzotzil
 * 181) tzt — Western Tzutujil
 * 182) tzu — Huixtán Tzotzil
 * 183) tzz — Zinacantán Tzotzil
 * 184) ubm — Upper Baram Kenyah
 * 185) vky — Kayu Agung
 * 186) vlr — Vatrata
 * 187) vmo — Muko-Muko
 * 188) wgw — Wagawaga
 * 189) wit — Wintu
 * 190) wre — Ware
 * 191) xah — Kahayan
 * 192) xkm — Mahakam Kenyah
 * 193) xmi — Miarrã
 * 194) xsk — Sakan - ancient
 * 195) xst — Silt'e
 * 196) xuf — Kunfal
 * 197) yib — Yinglish - merged into English
 * 198) yio — Dayao Yi
 * 199) ymj — Muji Yi
 * 200) ypl — Pula Yi
 * 201) ypw — Puwa Yi
 * 202) yus — Chan Santa Cruz Maya - merge with Yucateco [yua]
 * 203) yuu — Yugh - Yugh [yuu] is a duplicate of Yug [yug]
 * 204) ywm — Wumeng Yi - merge into Wusa Yi [ywu], renamed Wumeng Nasu (cf. 2007-038)
 * 205) yym — Yuanjiang-Mojiang Yi - split into Southern Nisu [nsd] and Southwestern Nisu [nsv]
 * 206) ztc — Lachirioag Zapotec - merge into Yatee Zapotec [zty]
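A list like this could also be kept in machine-readable form, so that a retired code can be resolved to whatever replaced it. The following is a minimal sketch using a handful of entries copied from the list above; the names `RETIRED` and `successors` are hypothetical and are not part of Module:languages or of any ISO tooling.

```python
# Sample of retired codes from the list above: code -> (fate, replacement codes).
RETIRED = {
    "nln": ("split", ["azd", "azn"]),   # Durango Nahuatl
    "acc": ("merge", ["acr"]),          # Cubulco Achí
    "aay": ("nonexistent", []),         # Aariya
    "eml": ("split", ["egl", "rgn"]),   # Emiliano-Romagnolo
    "gsc": ("merge", ["oci"]),          # Gascon
    "yuu": ("merge", ["yug"]),          # Yugh, a duplicate of Yug
}


def successors(code):
    """Return (fate, replacement codes) for a retired code.

    Returns None if the code is not in the sample table, which here means
    only "not listed", not "still valid".
    """
    return RETIRED.get(code)
```

With this table, `successors("eml")` returns `("split", ["egl", "rgn"])`, and a code that is not listed returns `None`.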

Retired codes which have been discussed since February 2014
Please see Language treatment/Discussions (Beer parlour/2014/February) and Language treatment/Discussions (Beer parlour/2014/March).

Retired codes which are still used on Wiktionary
Some codes which were retired from the ISO but which are still used on Wiktionary. (This list is not necessarily comprehensive.) Some codes in the list have been discussed, and these have been intentionally retained: sh "Serbo-Croatian", gio "Gelao", kzh "Dongolawi" / "Kenuzi-Dongola", mnt "Maykulan". Meanwhile, these have not yet been discussed.

List of ISO 639 codes absent from Wiktionary
Most of the 7865 codes present in ISO 639 are present on Wiktionary; most of those which are not are recorded on WT:LT. The only ones which have slipped between those two cracks are these, which should be investigated and discussed in the coming weeks. In many cases, the exclusion is likely nothing more than an oversight; in some cases, it's clearly because a naming conflict prevented importation of the codes back when Wiktionary bot-imported ISO 639 en masse (something we can now solve with disambiguators): (This list is complete as of August 2015, before the 2015 change requests were finalized. Notes and misc.) - -sche (discuss) 15:47, 11 August 2015 (UTC)
 * 1) cek — Eastern Khumi Chin - a dialect of cnk (Khumi Chin)
 * 2) dda — Dadi Dadi
 * 3) dgw — Daungwurrung
 * 4) dja — Djadjawurrung
 * 5) deq — Dendi (Central African Republic) - presumably failed to be included because of the naming conflict with ddn — Dendi (Benin)
 * 6) dmd — Madhi Madhi (Muthimuthi)
 * 7) dth — Adithinngithigh - compare rrt, which is said to be a different language
 * 8) dty — Dotyali
 * 9) gku — ǂUngkue
 * 10) gll — Garlali
 * 11) gpe — Ghanaian Pidgin English - probably to be combined with other African Pidgin English (see RFM)
 * 12) gwm — Awngthim
 * 13) gmz — Mgbo
 * 14) hna — Mina (Cameroon) - presumably failed to be included because of the naming conflict with myi — Mina (India), which, however, is spurious
 * 15) ihw — Bidhawal - a dialect of/with unn
 * 16) jan — Jandai
 * 17) jbi — Badjiri - possibly not even Karnic; cf my notes about ekc above and on User:-sche/retired codes
 * 18) jbk (Barikewa) and jmw (Mouwase) — varieties of Omati/Mini, said to be quite divergent from each other: but we should either have mgx or have jbk+jmw, not all three
 * 19) jbw — Yawijibaya
 * 20) jgk — Gwak
 * 21) jjr — Bankal
 * 22) jms — Mashi (Nigeria)
 * 23) jog — Jogi
 * 24) jui — Ngadjuri
 * 25) kbn — Kare (Central African Republic)
 * 26) kmf — Kare (Papua New Guinea)
 * 27) kol — Kol (Papua New Guinea)
 * 28) myi — Mina (India) (see hna)
 * 29) nmx — Nama (Papua New Guinea)
 * 30) npg — Ponyo-Gongwang Naga
 * 31) nqy — Akyaung Ari Naga
 * 32) nsf — Northwestern Nisu
 * 33) ntx — Tangkhul Naga (Myanmar)
 * 34) nwg — Ngayawung
 * 35) nxk — Koki Naga
 * 36) oke — Okpe (Southwestern Edo)
 * 37) okx — Okpe (Northwestern Edo)
 * 38) olk — Olkol
 * 39) orc — Orma
 * 40) pnl — Paleni
 * 41) ptq — Pattapu
 * 42) sfe — Eastern Subanen
 * 43) sgj — Surgujia - Suraji, Surguja, Surgujia-Chhattisgarhi, Surjugia
 * 44) sim — Mende (Papua New Guinea)
 * 45) sng — Sanga (Democratic Republic of Congo)
 * 46) sox — Swo
 * 47) spb — Sepa (Indonesia)
 * 48) tcl — Taman (Myanmar) - (extinct)
 * 49) tgj — Tagin
 * 50) tgz — Tagalaka - (extinct)
 * 51) tjl — Tai Laing
 * 52) tmn — Taman (Indonesia)
 * 53) tnz — Tonga (Thailand)
 * 54) tst — Tondi Songway Kiini
 * 55) xsn — Sanga (Nigeria)
 * 56) xud — Umiida - (extinct)
 * 57) xun — Unggaranggu - (extinct)
 * 58) xyy — Yorta Yorta
 * 59) yhs — Yan-nhaŋu Sign Language - signed by 10 people, not that distinct from ygs (exclude?)
 * 60) ykn — Kua-nsi
 * 61) yku — Kuamasi
 * 62) ysg — Sonaga
 * 63) yxy — Yabula Yabula - (extinct)
 * Codes in the above list which have been added to Module:languages or WT:LT or otherwise dealt with have been struck. - -sche (discuss) 03:11, 21 August 2016 (UTC)
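The naming conflicts mentioned above (two codes whose names differ only in a parenthetical disambiguator) are easy to detect mechanically. Here is a rough sketch over a few names taken from the list; `SAMPLE`, `base_name` and `name_conflicts` are hypothetical names, and this is not how the original bot import worked as far as this page records.

```python
import re
from collections import defaultdict

# A few code -> name pairs taken from the list above.
SAMPLE = {
    "deq": "Dendi (Central African Republic)",
    "ddn": "Dendi (Benin)",
    "hna": "Mina (Cameroon)",
    "myi": "Mina (India)",
    "kbn": "Kare (Central African Republic)",
    "kmf": "Kare (Papua New Guinea)",
    "dty": "Dotyali",
}


def base_name(name):
    """Strip a trailing parenthetical disambiguator, if there is one."""
    return re.sub(r"\s*\([^)]*\)$", "", name)


def name_conflicts(names):
    """Group codes whose names collide once disambiguators are removed."""
    groups = defaultdict(list)
    for code, name in names.items():
        groups[base_name(name)].append(code)
    # Keep only the base names shared by more than one code.
    return {base: sorted(codes) for base, codes in groups.items() if len(codes) > 1}
```

Running `name_conflicts(SAMPLE)` groups ddn with deq under "Dendi", hna with myi under "Mina", and kbn with kmf under "Kare", while "Dotyali" is left out because it has no conflict.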

Bidhawal
The ISO added a code for Bidhawal, which we never got around to adding. That seems to be OK; Robert M. W. Dixon says in Australian Languages: Their Nature and Development (2002, ISBN 0521473780) that "Bidhawal appears not to constitute a separate language, but rather to be the most eastern dialect of Q, Muk-thang (or Kurnai). The grammatical forms given by Mathews for Bidhawal are almost identical to those for Muk-thang, as are most of the verbs and a good proportion of nouns." - -sche (discuss) 03:02, 21 August 2016 (UTC)

Treatment of reconstructed languages?
We merged Proto-Finno-Ugric and Proto-Finno-Permic into Proto-Uralic, and Proto-Baltic into Proto-Balto-Slavic. The original languages remain as etymology codes. Should this be mentioned here? —CodeCat 18:48, 21 August 2015 (UTC)


 * Sure. Maybe in a separate table, though? Since those aren't cases where we deprecated, split, or broadened an ISO code, but rather cases where we assigned a code of our own devising and then went "wait, on second thought, nah". - -sche (discuss) 19:10, 21 August 2015 (UTC)

Akan and its subdivisions
As for Akan, we can currently find that both the macrolanguage and its subdivisions are treated as languages, though Category:Fanti language and Category:Twi language were merged previously. It seems that we have to modify the description. How's that? --Eryk Kij (talk) 22:53, 26 May 2016 (UTC)


 * [//en.wiktionary.org/w/index.php?title=Wiktionary%3ALanguage_treatment&type=revision&diff=38606792&oldid=37476310 Like so]; thanks for pointing out that this page still needed to be updated. - -sche (discuss) 23:21, 26 May 2016 (UTC)

ISO code changes 2018
Some codes have been merged or retired following the ISO's 2018 code changes; these changes are not necessarily recorded on WT:LT because the codes in question were not just merged or retired by Wiktionary but by the ISO. See Beer parlour/2019/February for a list. - -sche (discuss) 00:07, 24 February 2019 (UTC)

Scope of the page
I think we need to be more specific in terms of what we mean by language treatment. It should only apply to how languages are treated for entry making, but not anywhere else. For example, we allow many etymology-only languages in etymologies. See Wiktionary talk:About Chinese — justin(r)leung { (t...) 04:52, 9 June 2020 (UTC)
 * Oh my~ got to extirpate the remnants of truth from Wiktionary! This is just about pretending Chinese is a language when we know it's a macrolanguage. Just treat Chinese like any other macrolanguage group. So sad. --Geographyinitiative (talk) 05:05, 9 June 2020 (UTC)
 * Many macrolanguages are treated the same way Chinese is. Have you even read the page? — justin(r)leung { (t...) 05:21, 9 June 2020 (UTC)
 * If I may inquire, what other macrolanguage group is treated like Chinese is treated on Wiktionary? If you can give me a good answer on this, I could be much more convinced that the current system for covering Chinese languages on Wiktionary is not a disaster. --Geographyinitiative (talk) 05:24, 9 June 2020 (UTC)
 * Zhuang is one among many. Please see the main page - any macrolanguage in the table that is marked with "Only the macrolanguage is treated as a language" would be the same situation (more or less). — justin(r)leung { (t...) 05:35, 9 June 2020 (UTC)
 * A rhetorical question: how many times should the sad geography troll be spoiled with responses before he stops treating Wiktionary and its editors as a complete disaster? --Anatoli T. (обсудить/вклад) 07:20, 9 June 2020 (UTC)
 * If I may ask, what are the different Zhuang languages? Are there any other macrolanguage groups not associated with Chinese characters or influenced by Chinese politics that are not split up by language? I think every language should have its own header on Wiktionary, don't you? Atitarev, please don't hate me man! I am bringing a perspective that represents the opinions of many others and I am trying to make honest inquiries about really important things. There were no Wade Giles or Tongyong Pinyin derived geo terms before I came here, and I helped add an important perspective which was being neglected. I am a 'troll' because I bring an outsider perspective, but I am not a troll because I am actively working and negotiating to make the dictionary better with tangible results. Geographyinitiative (talk) 22:43, 20 June 2020 (UTC)
 * You don't bring anything new. All valid forms are welcome and nobody blocked any language or any script or dialect or transliteration scheme. Yes, that includes Wade-Giles, Tongyong Pinyin, Min Nan in Chinese characters and Min Nan in POJ. Your conspiracy theories have no grounds at all. Bring your perspective, but don't poison people's minds about the achievements of this site. You don't raise any awareness; everyone is aware of what's out there. You just don't want to see it. You sling dirt around, then apologise or start praising people, which I find hypocritical. You talk a lot about your own achievements but nobody else does that here; this is called narcissism. If there is not enough coverage of something, then there were not enough contributors. Languages are somewhat like currencies. If the value of the currency of a small third-world country is low, nobody is interested in it, but the people of that country have to use it. Even when you do add Wade-Giles or Tongyong Pinyin, you present it as opposition to Mandarin and Hanyu Pinyin domination, which you blame this site for. You don't accept the reality, but it's still someone's fault, isn't it? And you keep blaming someone and no-one in particular for that. Everything is doable and achievable. If you want to make the distinction between Min Nan and Mandarin, just do it within the existing infrastructure. Nobody stops you from defining specific senses, usage examples, etc. If you want to add alternative English spellings or varieties of Chinese, go ahead, just do it in a positive way. Stop blaming everyone or the site. You just turn people away from your cause. All work is welcome if it doesn't break agreed conventions or rules. In short, add your Wade-Giles forms, POJ, Min Nan, whatever, but start making sense and stop attacking pinyin, Mandarin, this site, etc.
 * The Zhuang situation is a good example of a macrolanguage, but it's harder to demonstrate at Wiktionary as Zhuang coverage here is very low. The unified approach for languages other than Chinese is better demonstrated by Serbo-Croatian, which combines two to four different standards, depending on how you count (Croatian, Serbian, Bosnian and Montenegrin), two scripts (Cyrillic and Roman/Latin), and two major dialects (Ekavian and Ijekavian, plus Kajkavian). I don't want to cause more trolling from you, but the Serbo-Croatian "unification" had much stronger opposition and hate. You can imagine the passions after the Yugoslav war, where language identity was a reason to be shot at or imprisoned. Nevertheless, at Wiktionary, the scientific and technical reasoning prevailed over hate. Don't imagine for a second that Chinese varieties and Serbo-Croatian standards and dialects are comparable. No way. They are not. Chinese varieties are mostly not mutually comprehensible. However, the rationale for the unified approach was presented and it won. You won't achieve anything by whingeing and trolling with negative messages. Yes, I consider your mentioning at every opportunity that this site may be a complete disaster, or similar, to be trolling. --Anatoli T. (обсудить/вклад) 23:37, 20 June 2020 (UTC)
 * Zhuang lects. (Whether these pronunciations actually belong on the entry is a different matter) —Suzukaze-c (talk) 23:39, 20 June 2020 (UTC)
 * Let me mull it over a little more. However, I think that it would be wildly difficult to reach the conclusion "the Chinese macrolanguage header, including all language in modern China's borders from Cangjie to to-day, should be portrayed as equivalent to the Danish, Norwegian, Sweedish, English, German, French etc headers, implying they are all equally "languages"." I would that in my worst case scenario some kind of further disclaimer should be added automatically to every page that has the "Chinese" header so we know it means "any Chinese characters used in China since the Shang dynasty til today, including numerous unintelligible dialects with independent Wikipedia versions". What an expansive header it is! --Geographyinitiative (talk) 00:20, 22 June 2020 (UTC)
 * Wiktionary doesn't have to apologise on every page for how it works. The votes on Chinese and Serbo-Croatian unifications defined the dictionary policies, which are or should be mentioned on the appropriate About pages. If it's too hard to accept, which is understandable, you have two options: 1. get a new vote and win it or 2. leave, which was the case with some unhappy Croatian and Serbian editors. We haven't had a precedent in my memory of unhappy Chinese editors wanting to reverse the change and leaving; you may be the first. If you decide to stay, my personal advice is that you have to stop complaining at every opportunity in talk pages or edit summaries, bite the bullet and contribute in your favourite area, including enhancing dialectal coverage, and strive to make it work, so that all promises in the vote to adequately cover all Chinese varieties are kept. --Anatoli T. (обсудить/вклад) 00:53, 22 June 2020 (UTC)

Stale but unresolved discussions of languages to add or remove
Because they are so stale, I am unilaterally moving these off the RFM page because that page has grown too massive (800,000 bytes) to be usable; however, because they are unresolved, I don't want to hide them away on WT:LTD... so here they go... - -sche (discuss) 06:52, 28 December 2023 (UTC)

RFM discussion: July 2016–October 2020
This next batch is of languages from lists other than Ethnologue and LinguistList. As before, I've tried to vet them all beforehand, but I will have doubtlessly made some mistakes. NB if you want to find more: I've avoided dealing with most of the Loloish languages, because all the literature seems to be in Chinese. —Μετάknowledge discuss/deeds 04:54, 6 July 2016 (UTC)


 * (tbq-alp) — perhaps should be named Yiqing
 * (map-alt)
 * (omq-anz) — hard to say how different it is, but it's extinct, so a finite lexicon
 * (azc-aut)
 * I have yet to find content in this lect to judge how different it is from our other Nahuatls. - -sche (discuss) 04:32, 24 May 2017 (UTC)
 * (map-ave)
 * Wikipedia (and Lyle Campbell, Anna Belew, Cataloguing the World's Endangered Languages, 2018) says this is dix "Dixon Reef". Is it not? (Or if it is, should the name associated with that code be changed?) - -sche (discuss) 20:10, 1 August 2020 (UTC)
 * (tbq-ban)
 * (tbq-cha)
 * (sit-dam)
 *  (ira-day)
 * I've asked some of our editors of Iranian languages for input. - -sche (discuss) 00:17, 2 August 2020 (UTC)
 * Based on feedback there, not added at this time, although I note that content in the language seems to exist, which suggests we would eventually need to figure out a header to include it under. - -sche (discuss) 20:44, 2 August 2020 (UTC)
 * (crp-joo)
 * (alv-kas)
 * (aav-kas) — questionable whether this is a separate language
 * (urj-kya)
 * Perhaps better prm-kya? Also while I am not convinced treating the Komi varieties as separate languages altogether is the best solution, as long as we do so, we might moreover need . --Tropylium (talk) 18:44, 11 July 2016 (UTC)
 * (crp-kur)

Australian languages

 * (aus-bug)
 * (aus-dyi)
 * (aus-gul)
 * (aus-gun)
 * (aus-kth)
 * (aus-kur)
 * (aus-mir)
 * has an ISO code (to be added), see BP - -sche (discuss) 04:17, 14 October 2020 (UTC)
 * (aus-ngr)
 * (aus-ngg)
 * (aus-ngu)
 * has an ISO code (to be added), see BP - -sche (discuss) 04:17, 14 October 2020 (UTC)
 * (aus-wom)
 * (aus-wpa)
 * (aus-yim)

Tasmanian and other

 * Northeastern Tasmanian:


 * Northeastern, (aus-pye) ✅
 * alt names/varieties: Plangermaireener, Plangamerina, Cape Portland, Ben Lomond, Pipers River
 * North Midlands, (aus-tye) — Bowern considers this a dialect; perhaps we should just trust her
 * now has an ISO code which should be added instead, see BP shortly - -sche (discuss) 04:27, 14 October 2020 (UTC)
 * (aus-lbt) — the worst name in Bowern's set!
 * I'm not sure... the very language is "reconstructed" by Bowern on the assumption that three wordlists (of which only two make it into the name) attest the same language, although apparently none of the three bothered to name the language. The chance that someone "would run across [a word in] it and want to know what it means" seems nonexistent. If we wanted to host the wordlists, we could do that in an appendix or on Wikisource. - -sche (discuss) 16:09, 9 August 2016 (UTC)
 * Bowern's methods are scientific; but I would feel better if more than one scholar was saying there was one language in this set of wordlists, the way that for e.g. Port Sorrell, Dixon & Crowley and Glottolog agree that there is a unit/lect there. - -sche (discuss) 16:55, 4 June 2017 (UTC)
 * and what of "Norman Tasmanian"? - -sche (discuss)


 * Here is another language we might need a code for: Ma(') Pnaan (poz-map?), also known by the exonyms Punan Malinau and Punan Segah, a language of Borneo / East Kalimantan, summarized by Antonia Soriente here and elsewhere. Compare the other things listed at . - -sche (discuss) 05:21, 29 August 2016 (UTC)

RFM discussion: August 2016
Here are a few more North American languages for which we could add codes:
 * (nai-ako). WP says it is attested certainly in two words in Spanish records (Yegsa "Spaniard[s]", which Swanton suggests is similar to Atakapa yik "trade" + ica[k] "people"; and the female name Quiselpoo), and possibly in more words in a wordlist by Jean Béranger in 1721 (if the wordlist is not some other language).

Labrador Inuit Pidgin French, less often called Belle-Isle Pidgin, was spoken in Labrador from the late 1600s (probably since before the 1660s, but first written down in 1694) until at least the mid 1760s, based on Inuktitut, French, Basque, Montagnais, and possibly Spanish and Breton. Louis-Jacques Dorais, An Inuit Pidgin around Belle-isle Strait (1996; with reference to "Clermont - Martijn 1980; Dorais 1980; Bakker 1988"), covers the records:
 * Louis Jolliet recorded words at Baie Saint-Louis in 1694, including the 'greeting' thou tcharacou, saying the latter word is "peace", which Dorais says is "corroborated by two other sources, from 1717 (characoua [...]) and 1720 (characo [...]). But a text from 1743 (Privy Council 1927: 3284), written by the French merchant Louis Fornel, gives to characo the meaning 'war'." Thou is probably from . The other word could be Basque txarrakoa "bad", thus "are you bad?".
 * Le Cour in 1742 records some more words: bons camaras "good comrades", tous camaras "all comrades", capitaine "captain", kellanoré (which Dorais says "seems to be Le Cour's [or the pidgin's?] rendering of Inuktitut kinaunali 'but who is he?'"), the personal name Amargo (a rendering of Amaqqut "Wolves"), rénombek "bead" (probably a loanword), maumek "file" (probably a loanword), monkoumek "knife" (probably a loanword from Montagnais mukuma:n, as spelled in Marguerite Ellen MacKenzie, Towards a Dialectology of Cree-Montagnais-Naskapi).
 * Louis Fornel in 1743 recorded more: tout camara "all comrades", troquo balena "let us trade whale" (from French troquons!), non characo "no war" (sic, per Fornel).
 * Jens Haven wrote other words in 1764-5: makagua "peace" (perhaps from Basque bake[a] "peace" plus a suffix -koa), kutta (French couteau "knife"), memek "to drink" (from Inuktitut imiq "drinking water").
 * Few references discuss the lect and it is difficult to judge whether it is really a language or just something like broken French or like Spanglish (which I think we exclude), but the fact that the Inuit apparently changed the meaning and even part of speech of words in their own language when speaking pidgin suggests it is more on the pidgin-language side of that continuum than the code-switching side.

and from South America: - -sche (discuss) 04:04, 16 August 2016 (UTC)
 * (crp-abp). Wikipedia has a sample. The Atlas of Languages of Intercultural Communication, citing Bakker, says it was spoken from at least 1580 (and perhaps as early as 1530s) through 1635, and "only a few phrases and less than 30 words attributable to Basque were written down" (though apparently more words, attributable to other sources, were also recorded).
 * (Cuauchichil, Quauhchichitl, Chichimeca) (nai-gch or, if Guachí is added as sai-gch, perhaps nai-gcl to prevent the two similarly-named lects from being mixed up by a typo in only the initial n vs s), apparently sparsely attested.
 * Concho (nai-cnc). The Handbook of North American Indians, volume 10, says "three words of Concho [...] were recorded in 1581 [and] look like they may be [...] Uto-Aztecan".
 * Jumano (Humano, Jumana, Xumana, Chouman, Zumana, Zuma, Suma, and Yuma) (nai-jmn). The Handbook says "It has been established that the Jumano and Suma spoke the same language. Three words have been recorded" of it.
 * Peba / Peva (sai-peb), said by Erben to more properly be called Nijamvo, Nixamvo. Spoken in "the department of Loreto" in Peru. Attested in wordlists by Erben and Castelnau, which Loukotka provides, and which disagree with each other substantially: munyo (Erben) / money (Castelnau) "canoe, small boat"; nero (E) / yuna (C) "demon"; nebi (E) / nemey (C) "jaguar"; teki (E) / tomen-lay (C) "one"; manaxo (E) / nomoira (C) "two"; etc. I would even consider that one might not be the same language as the other... what's with these languages that survive in disparate wordlists? lol.
 * possibly Saynáwa: fr.Wikt grants a code to this variety of, described here (see also ).


 * Support all except possibly Akokisa. I think it's a dialect of Atakapa, and that the wordlist is very likely not being linked correctly. That said, it's so few words, that there's no real reason not to accept it as a separate language, just to be conservative about it. —Μετάknowledge discuss/deeds 04:08, 16 August 2016 (UTC)
 * Good point about Akokisa. (I am reminded that you had mentioned its dialectness earlier; sorry I forgot!) The wordlist, labelled only with a tribal name per WP, is possibly plain Atakapa, but Yegsa is supposedly recorded as specifically Akokisa; OTOH that doesn't rule out that Akokisa is a dialect. Indeed, M. Mithun's Languages of Native North America treats as dialects Akokisa, Eastern ("the most divergent, [...] known from a list of 287 entries") and Western ("the best documented. Gatschet recorded around 2000 words and sentences, as well as texts [...] Swanton recorded a few Western forms", all published in 1932 in a dictionary). I suppose the benefit to treating it as a dialect would be that we could context-label Yegsa and Quiselpoo as and then Béranger's forms as  without needing to agonize over which header to put them under. - -sche (discuss) 15:31, 16 August 2016 (UTC)

RFM discussion: April 2017–October 2020
The following languages have ISO codes, but those codes should be removed, as there is no linguistic material that can be added to Wiktionary. This list is taken from Wikipedia's list of unattested languages, but I have excluded languages which are not definitively extinct (and thus which may have material become available). If there was any reliable source I could find corroborating the WP article's claim of lack of attestation, it is given after the language. —Μετάknowledge discuss/deeds 04:15, 4 April 2017 (UTC)


 * [aga]
 * Unclear if it even existed per The Indigenous Languages of South America: A Comprehensive Guide (Campbell and Grondona).


 * [bpb] (the Wikipedia article has a discussion of the conflation of this unattested language with Pasto, which needs a code; for clarity, I think this [bpb] should be retired and an exceptional code made explicitly for Pasto)
 * Retired, following the ISO, see Beer parlour/2020/October. Content, if needed for migration to a Pasto code, was m["bpb"] = { "Barbacoas", "Q2669202", "sai-bar", otherNames = {"Pasto"}, scripts = Latn, } - -sche (discuss) 06:23, 14 October 2020 (UTC)


 * [dek]


 * [giy]
 * AIATSIS has the following to say: "According to Ian Green (2007 p.c.), this language probably died before the 1920's and neighbouring groups in the Daly claim it was the language of Peron Island which was linguistically and perhaps culturally distinctive from the nearby mainland societies. Black & Walsh (1989) say that this may or may not have been a dialect of Wadiginy N31." —Μετάknowledge
 * The 1992 International Encyclopedia of Linguistics, v. 1, p. 337, says "Giyug: 2 speakers reported in 1981, in the Peron Islands in Anson Bay, southwest of Darwin." The 2003 edition repeats the claim that "2 speakers remain". Wikipedia says it's extinct and unattested, but Glottolog, although having no resources on it, suggests it's not extinct. Might be best to leave it alone for now. - -sche (discuss) 01:13, 6 August 2020 (UTC)


 * [wma] (We call this "Mawa"; if removed, [mcw] Mahwa can be renamed to the evidently more common spelling "Mawa".)
 * Removed, and mcw renamed. Glottolog had only one reference to support the existence of Mawa, Temple (1922), which does not even include a section under that header. There may be confusion with the section on the "Marawa", but that does not even mention what language those people speak. (Temple also knows very little about linguistics; while skimming through, I found that Margi (a Chadic language) was said to be similar to the languages of South Africa.) —Μετάknowledge discuss/deeds 01:39, 6 August 2020 (UTC)


 * [nbg]
 * Appendix I in The Indo-Aryan Languages records this language as being a subdialect of Dhundari [dhd] and the 1901 Indian Census concurs; this is at odds with its description as an unattested Dravidian language, but the geographical specifications seem to match up.


 * [nrx]
 * AIATSIS says: "Harvey (PMS 5822) treats Ngomburr as a dialect of Umbukarla N43, but in Harvey (ASEDA 802), it is listed as a separate language." Nicholas Evans confirms in The Non-Pama-Nyungan Languages of Northern Australia that it is unattested.


 * [tme]


 * [tka]


 * [waf]


 * [was]
 * Unclassified due to its absence of data per The Indigenous Languages of South America: A Comprehensive Guide (Campbell and Grondona).

RFM discussion: May 2017–October 2020
Geoffrey Hull, director of research for the Instituto Nacional de Linguística in East Timor, notes (in a 2004 Tetum Reference Grammar, page 228) that "the alleged Atauran Papuan language called 'Adabe' is a case of the mistaken identity of Raklungu," a dialect (along with Rahesuk and Resuk) of Wetarese. He notes (in The Languages of East Timor, Some Basic Facts) that only Wetarese is spoken on the island, and Studies in Languages and Cultures of East Timor likewise says "The three Atauran dialects—with the northernmost of which the dialect of nearby Lirar is mutually intelligible—are unquestionably Wetarese, and not dialects of Galoli, as Fox and Wurm suggest for two of them (n. 32). The same authors refer (ibidem) to a supposedly Papuan language of Atauro, the existence of which appears to be entirely illusory." (The error appears to have originated not with Fox and Wurm but with Antonio de Almeida in 1966.) - -sche (discuss) 01:45, 31 May 2017 (UTC)
 * remove Adabe [adb]
 * We could repurpose the code into one for those three Atauran varieties of Malayo-Polynesian Wetarese, Rahesuk, Resuk, and Raklu Un / Raklungu (the last of which Ethnologue does list as an alt name of adb, despite their erroneous family assignment of it), perhaps under the name "Atauran Wetarese" for clarity. - -sche (discuss) 01:52, 31 May 2017 (UTC)

Glottolog makes the case that this is spurious. - -sche (discuss) 07:57, 31 May 2017 (UTC)
 * remove Agaria [agi]

Arma (aoh) is also said to be "a possible but unattested extinct language"; I am trying to see if that means it is entirely unattested, or if there are personal/ethnic/place names, etc. - -sche (discuss) 09:45, 3 June 2017 (UTC)
 * Arma
 * Removed, see Beer_parlour/2020/October. - -sche (discuss) 06:18, 14 October 2020 (UTC)

The VU Amsterdam report linked to here seems to indicate that one lect has been given multiple codes, and that "Jair" at least is spurious. Further research wouldn't hurt. —Μετάknowledge discuss/deeds 00:24, 3 October 2019 (UTC)

Splitting Selkup
After a while of deliberation with Kaarkemhveel and two other future Selkup editors, we have come to the conclusion that it's best to split Selkup into two codes: Northern Selkup (sel-nor) and Southern Selkup (sel-sou) [the exact form of the codes is up for debate], which will both be part of the Selkup family (sel).

These two dialect areas are so different that treating them as a single language would be too bothersome. All subdialects are going to be marked with labels, and provided as languages in descendants sections (much like the two Karelian proper varieties are, or the Zyrian dialects).

The two branches are often named as different: Glottolog splits Selkup into "Kety-Central-Southern Selkup" (Southern) and "Taz-Turukhan" (Northern); The Oxford Guide to the Uralic Languages also shows a split between "Northern Selkup" and "Tomsk region Selkup" (p.778). A few more examples of papers that do this include Wurm (1997), Budzisch (2015), Vorobeva et al. (2017)...

There is precedent for treating these as different languages: ELP splits the family into three full-fledged languages. On the pages there is the following reasoning for this split: "The three main varieties of Selkup have traditionally been counted as dialects of a single language; their differences are, however, comparable to those between, for instance, Ket, Yug, and Pumpokol". The Russian institute RAN also splits Selkup into Northern and Southern, as two full-fledged languages.

So, does anyone have an issue with this split? Thadh (talk) 11:04, 19 June 2023 (UTC)


 * No opposition, as there are clear differences, both lexical and cultural. Tollef Salemann (talk) 11:14, 19 June 2023 (UTC)
 * The Wikipedia article also mentions a Central Selkup. What are you doing with that one? Does it belong to Southern Selkup? —Mahāgaja · talk 14:03, 19 June 2023 (UTC)
 * Yes, that one will then be handled as Southern Selkup, just like it is by the above sources. Thadh (talk) 14:12, 19 June 2023 (UTC)


 * No opposition on this much: Northern Selkup is by now clearly distinct from non-Northern and has its own literary standard. Bridging historical data exists but would probably be better handled in Proto-Selkup entries anyway, since just about all of it is field records and not direct literary use by the speaker community.
 * Depending on how work on non-Northern Selkup develops, further division could be eventually meaningful too. The other recent handbook, Routledge's The Uralic Languages, Second Edition discusses things from a primarily tripartite Southern / Central / Northern perspective and notes that, though the sharpest modern boundary is Central vs. Northern, the most taxonomically significant difference is Southern vs. {Central, Northern}. I believe currently Southern is better-documented than Central, but the latter is what still has some attempts at literary usage and revival. --Tropylium (talk) 14:48, 19 June 2023 (UTC)
 * Done. Cleanup is ongoing. Thadh (talk) 20:01, 28 June 2023 (UTC)

RFM discussion: December 2023–January 2024
Pinging (possibly) interested users; as always, feel free to ping more.

I propose we split Carpathian and Pannonian Rusyn into two codes ( and   respectively, in line with their ISO 639-3 codes), and then set Old Slovak as the ancestor of Pannonian Rusyn. I have made a list of typical Slavic developments on User:Thadh/Rusyn and given both a Pannonian Rusyn form (from Ramač 1995, Српско-русински речник) and a Carpathian Rusyn form (from Kercha 2012, Словник русько-русинськый). I think this proves beyond much of a doubt that Pannonian Rusyn belongs to the West Slavic group, and specifically to the Slovak dialects, while Carpathian Rusyn is part of the East Slavic group. This is also a view that is supported by many scholars. Thadh (talk) 13:28, 14 December 2023 (UTC)


 * Sławobóg (talk) 13:36, 14 December 2023 (UTC)
 * @Thadh would it be possible to add an Eastern Slovak column to your tables (presumably the variety of Slovak that Pannonian Rusyn would be closest to) for comparison? I'm not sure how much extra work that would be, but if it's not a huge amount, it would be helpful. Chernorizets (talk) 13:44, 14 December 2023 (UTC)
 * Unfortunately, I don't have an Eastern Slovak dictionary at hand, but if anyone does, they're encouraged to add the forms! Thadh (talk) 13:59, 14 December 2023 (UTC)
 * . The reflexes are clear, there are language codes, and it's the right moment to do this as Rusyn isn't highly developed yet, so splitting will be easier. Vininn126 (talk) 13:49, 14 December 2023 (UTC)
 * Thanks for pinging me. I don't have enough background in Rusyn to wager a strong opinion here. Benwing2 (talk) 21:39, 14 December 2023 (UTC)
 * @Thadh: Does Pannonian Rusyn completely lack native pleophony (polnoglasie), or are the cases it has all late borrowings? E.g. Pannonian/Carpathian vs  and  vs . If yes, then it looks like it can't belong to the East Slavic languages. I tentatively support, but I don't have much knowledge of Pannonian. Anatoli T. (обсудить/вклад) 22:31, 14 December 2023 (UTC)
 * Yes (which is kind of the point). Similarly reflexes of PS palatals, strong yers, and other things. Everything points to Pannonian being West Slavic and Carpathian being East Slavic. Thadh (talk) 22:34, 14 December 2023 (UTC)
 * @Thadh: I see, thanks. I have yet to digest other differences.
 * Pannonian examples and  kind of contradict the overall differences, no? Anatoli T. (обсудить/вклад) 23:26, 14 December 2023 (UTC)
 * The language has been influenced quite intensively by Czech, Ruthenian (> Ukrainian/Rusyn), Hungarian and Serbo-Croatian for the last two hundred years, so some inconsistencies due to borrowings are expected. For гарло, this might be a language-specific innovation (I can imagine grdl- and -rdl- overall not being a very easy cluster, and for this specific example Slovincian also does some simplification). дороги is undoubtedly a borrowing though. Thadh (talk) 23:36, 14 December 2023 (UTC)
 * @Thadh: I think it's worth addressing possible loanwords for your case (e.g. дороги, etc.). Compare English, which has more Romance words than native words, and Korean, which has more Sinitic words than native ones; that doesn't change which language family they belong to. Those languages are well described, though, whereas for Pannonian Rusyn we need to make it explicit, IMO, in case someone questions it. Anatoli T. (обсудить/вклад) 23:49, 14 December 2023 (UTC)
 * @Atitarev I think the words chosen are unlikely to have been borrowed. Or at least there are enough that are unlikely to have been borrowed that it's even more unlikely that we chose only borrowed words. Vininn126 (talk) 09:49, 15 December 2023 (UTC)

It's been a month and there's been overall support for this. I'm going to mark this thread as closed and lang codes for Carpathian and Pannonian should be assigned. Vininn126 (talk) 12:49, 14 January 2024 (UTC)