Wiktionary:Votes/2022-08/Regional and Obsolete variations as LDL's

Regional and Obsolete variations as LDL's
Voting on: Changing WT:CFI as follows:

Current text:

For languages well documented on the Internet, three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. For all other spoken languages that are living, only one use or mention is adequate, subject to the following requirements:


 * the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
 * each entry should have its source(s) listed on the entry or citation page, and
 * a box explaining that a low number of citations were used should be included on the entry page (such as by using the template).

Proposed text:

For languages well documented on the Internet ("WDLs"), three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in dialects of WDLs, one citation showing clear use restricted to the dialect is the minimum, or one mention in an agreed-upon reference work subject to the below requirements is adequate. For terms in historical lects in WDLs as well as extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. What is considered a dialect or historical lect should be determined on a case-by-case basis within a community of editors for a language and listed on the appropriate language considerations page.

For all other spoken languages that are living (known as limited documentation languages or "LDLs"), only one use or mention is adequate, subject to the following requirements:
 * the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
 * each entry should have its source(s) listed on the entry or citation page, and
 * a box explaining that a low number of citations were used should be included on the entry page (such as by using the template).

Rationale for the proposal
Treating all variants of a language as well documented can make it impossible to document other lects such as dialects. This proposal is intended to allow for easier documentation of dialects or other such lects (such as certain historical forms of speech), which often have only one linguistic work done on them.

Schedule:
 * Vote starts: 00:00, 6 September 2022 (UTC)
 * Vote ends: 23:59, 12 October 2022 (UTC)
 * Vote created: Vininn126 (talk) 09:31, 30 August 2022 (UTC)

Discussion:
 * Beer parlour/2022/August

Support

 * 1)  as the proposer. Vininn126 (talk) 00:00, 6 September 2022 (UTC)
 * Varieties that don't conform to the written standard are far less documented on the internet. Tajoshu (talk) 01:38, 6 September 2022 (UTC)
 * : Unfortunately, you are ineligible to vote (second requirement; 19 edits by the start time of the vote). J3133 (talk) 05:20, 6 September 2022 (UTC)
 * 1)  A useful change, probably of limited applicability, but of great importance in those few cases where it is required. This, that and the other (talk) 06:34, 7 September 2022 (UTC)
 * 2)  --Vahag (talk) 09:12, 7 September 2022 (UTC)
 * 3)  It can definitely be problematic sometimes, but it is better to deal with these problems than to give up on hundreds of LDL's forms. Tashi (talk) 09:38, 7 September 2022 (UTC)
 * And how do you propose to deal with these problems? Ad hoc deletions via RFD contrary to voted-on policy? --Dan Polansky (talk) 09:41, 7 September 2022 (UTC)
 * 1)  – more balanced, scraps the false dichotomy between languages of extensive and limited documentation. By its very quality of a language having a large presence, that makes us expect great attestability, it has varieties of limited documentation within itself, incongruous with their current or historical linguistic reality. Fay Freak (talk) 11:04, 7 September 2022 (UTC)
 * 2)  Hythonia (talk) 17:31, 8 September 2022 (UTC)
 * 3)  as someone interested in dialectal and archaic language. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 09:15, 9 September 2022 (UTC)
 * 4) . —Svārtava (talk) • 02:56, 16 September 2022 (UTC)
 * 5) . AG202 (talk) 15:24, 29 September 2022 (UTC)
 * 6) . --沈澄心✉ 09:04, 4 October 2022 (UTC)
 * 7)  - it's not fair for dialectal regionalisms spoken by a small number of people to be held to a higher standard simply because they're part of a large language. Theknightwho (talk) 09:42, 6 October 2022 (UTC)
 * 8) . Binarystep (talk) 09:44, 6 October 2022 (UTC)
 * 9)  DovaModaal (talk) 15:55, 6 October 2022 (UTC)
 * 10)  --Skiulinamo (talk) 09:03, 12 October 2022 (UTC)

Oppose

 * 1)  A single reference could be fine, but a single use is too weak a form of evidence, I think. The vote offers no supporting material with examples and counterexamples, so it is hard to say what we gain and what we lose by allowing criteria so lenient. One could argue that the same argument applies to extinct languages as well, and it does, which is why I opposed the vote for extinct languages as well. The requirement of 3 quotations is there to ensure some minimum independence, extrapolation from at least 3 data points; here, the proposed extrapolation is from a single data point. --Dan Polansky (talk) 07:55, 7 September 2022 (UTC)
 * Concerning extinct & minority languages, I think that you are too strict. In some cases, there may not be 3 independent sources. In other cases, there may be 3 independent speaker sources but these speakers were consulted by the same linguist (thus presumably not independent as I understand the Wiktionary criteria) – furthermore, many publications by linguists do not list the speaker sources (they might be indicated in the linguist's unpublished fieldwork notes, but one might have to travel to a particular university to consult them).
 * For instance, if I recall correctly, most of GL Trager's information on the Taos language (which I added to Wiktionary years ago) relies on a single speaker (which was mostly necessary since Taos culture has considered it taboo to discuss Taos culture & language with outsiders; his consultant talked to Trager at great personal risk). JP Harrington does have some attestations of Taos words but they often differ from Trager's forms, and Harrington transcribed words more phonetically & sort of predated the phonemic theory of the US structuralists.
 * Basically, if you require 3 independent sources, you will exclude many languages. Although it's not your intention, this will introduce a bias against including minority languages. Ish ishwar (talk) 23:09, 27 September 2022 (UTC)
 * The distinction between a single mention in a reliable source and a single use in a sentence is important. I do not require 3 independent reliable sources: "A single reference could be fine". Taos words covered by Trager in his linguistic works written in English are by definition mentioned and not used by him. To trace a Taos word to Trager is to give a reference. About dialects, the problem of single use is even worse (for a Taos sentence, you at least know it is in Taos) and was discussed by me below with my valid concerns dismissed as contrarian. The concerns raised by -sche seem similar to mine: how does a single use reveal the dialect of the word? --Dan Polansky (talk) 07:09, 28 September 2022 (UTC)
 * 1)  The Dictionary of Finnish Dialects will have nearly 400000 entries, many with more than a dozen senses. A general dictionary maintained by the same source has 100000 entries. This proposal comes with a risk of introducing massive amounts of unattested, potentially incorrect, entries, and therefore requires particularly strong proof of necessity. I have seen no such thing. brittletheories (talk) 05:51, 9 September 2022 (UTC)
 * That is why there is a clause to allow for a community to decide whether or not something should be trusted. If this dictionary you provided isn't trustworthy, it shouldn't be included in the list of acceptable materials. On the other hand, there is a ton of (real) information out there about dialects that is uncovered or not very covered that should be. It all depends on what a group decides is acceptable or not. Vininn126 (talk) 07:55, 9 September 2022 (UTC)
 * It is the most trustworthy source on Finnish dialects. However, with potentially millions of definitions, there are going to be mistakes. The only way to combat them is consistent attestation criteria, which this vote aims to ravage. brittletheories (talk) 08:47, 9 September 2022 (UTC)
 * It aims to give editors nuance - it's not like it has to be an all or nothing case. It gives editors the ability to opt-in certain resources to do so. You could add a clause saying "this resource is reputable but not reputable enough to be used as a single source." If it's not trustworthy enough to be used for the purposes of this vote, fine, the vote allows for that. Vininn126 (talk) 08:50, 9 September 2022 (UTC)
 * I will note parenthetically that while editors can restrict sources for single mentions, no such provision has been made for single uses, AFAICS. --Dan Polansky (talk) 08:53, 9 September 2022 (UTC)
 * As with any quote, editors have the right to call into question whether the given quote is representative of the definition it is supporting - the quote should be clearly supporting a specific dialect - if other similar ones can be found supporting other dialects, then the quote can be questioned. Still within the vote. Vininn126 (talk) 08:55, 9 September 2022 (UTC)
 * I have no idea how a quote is going to reveal that the term in question is restricted to a dialect: do we have some example quotations to demonstrate the idea? --Dan Polansky (talk) 09:04, 9 September 2022 (UTC)
 * You mustn't have looked very hard. Compare any quoted dialectal term as it is. Vininn126 (talk) 09:07, 9 September 2022 (UTC)
 * It should not be hard for those who claim knowledge to give us some examples. --Dan Polansky (talk) 09:10, 9 September 2022 (UTC)
 * come correct
 * strambang
 * Vininn126 (talk) 09:13, 9 September 2022 (UTC)
 * Great, thanks. Now, strambang has "an way that a geed en sich a wap in the niddick that strambang a het es head agin the clovel, an made a bump in es brow". That sentence has suspiciously many odd forms. But how do I know, just by looking at the sentence, which word forms are restricted to a dialect? Not all of them are, obviously. Some kind of additional background knowledge is required, I suppose, and that is the kind of background knowledge used by the makers of authoritative references.
 * For come correct, the quotations do not look suspect in any way, so it is not clear how the dialect is identified. --Dan Polansky (talk) 09:21, 9 September 2022 (UTC)
 * You do you, Dan. If you enjoy being a contrarian, have at it. Vininn126 (talk) 09:25, 9 September 2022 (UTC)
 * Let me just note that strambang traces to an authoritative source, which seems to be the safest option to disclose whether something belongs to a dialect; meaning extraction is another matter. The proliferating love of unrestrained original research is not an unconditional good. --Dan Polansky (talk) 09:33, 9 September 2022 (UTC)
 * 1)  I lament that holding dialects to WDL standards excludes swathes of poorly attested ones, but I think a change this broad just eliminates the 3-cite standard altogether. How do we distinguish an obscure rapper's personal nonce in a song sung in AAVE / MLE / etc from a dialectism? Any nonce will be argued to be a dialectism, since even if a work isn't markedly Geordie or whatever, its author is from somewhere, so it's a work of American, British, Indian English, etc., and not every dialectal work spels evrithing difrentli, some dialectisms are e.g. an otherwise-standard sentence by a Southern author using y'all or using garden peculiarly. (This vote will also get used to bring back all the Spenser and Early Modern English nonces that failed RFV.) If the aim is to stop treating Middle Polish as a WDL, it'd be better to have a discussion specifically about that, like the line about Arabic was tweaked to only refer to Modern Standard Arabic, and to vote on other specific things, rather than allowing an open-ended kettle of fish. I know Vininn worried about giving people who don't know/edit a language too much power to vote on it, but I think this does the same thing. The proposer says each community of editors will work the issues out to not include wrong things, but do we even all agree who counts as the community of editors knowledgeable about a lect without it coming down to one editor rancorously striking another's votes? 
In many cases we have no active editors of a dialect, so RFVs can only be decided by the general userbase, where we already see some people vote to include everything (SOPs, typos, misanalyses of which parts of an expression are idiomatic or what POS they are, ...); at RFD, certain users throw irrelevant WT:IDIOM tests at everything and hope one sticks; I do not share the confidence that they'll start to exclude things that should be excluded; rather, especially now that the single use can be a tweet, I think this will be used to reduce our standard from three cites to one cite for almost anything (especially if, as discussed in some of the pre-vote discussion, some users want to consider jargons to be like dialects). - -sche (discuss) 15:01, 9 September 2022 (UTC)
 * And I mean, I'm so sympathetic to the plight of poorly-attested dialects that there's a word I was going to give as an example instead of y'all but which I decided not to mention because it only has one cite (from an academic work which quotes a valid use and then explains it and its peculiarity to the dialect) and so without this vote it'd probably fail RFV (unless there are web examples) — I agree the current CFI are a problem! I just worry this is way too inclusive of way too much. - -sche (discuss) 15:01, 9 September 2022 (UTC)
 * Some secondary thoughts:
 * 1) You say that when it comes to individual lects, communities of editors should decide, as stated in the vote...
 * 2) You worry about small communities, but in reality most languages with small communities are LDL's anyways, which only need one quote or citation to pass CFI anyway...
 * I fail to see how this vote really affects your concerns... Vininn126 (talk) 22:50, 9 September 2022 (UTC)
 * Re "You say that when it comes to individual lects, communities of editors should decide": I'm sorry I was unclear, what I was trying to say is that while this vote/proposal seems to think that when it comes to individual lects, communities of editors who speak the lects should decide what terms are dialectal terms and what standards they should be subject to, I think that's not always possible (if there simply is no community of editors for that dialect). As it happens, I'm also not sure if it's desirable even when it is possible; obviously information from those most familiar with the topic can be helpful, but we're making decisions about general criteria for inclusion, so it seems reasonable for the general community to be able to give input on that decision. - -sche (discuss) 20:02, 23 September 2022 (UTC)
 * We just weigh the likelihoods. This can naturally reintroduce the requirement of more than one occurrence to prove the dialectism, as the reasons for why something is not well found must be considered. We should exercise some restraint in “arguing”. This has always applied to the textual situation, too: The reading of a word must be most probable. This goes particularly for Arabic already mentioned as it is known that the old dictionaries contain chains of textual corruptions, e.g. most meanings claimed for as outlined in the journal piece cited there. Fay Freak (talk) 23:08, 10 September 2022 (UTC)
 * 1)  for the reasons described by -sche. Otherwise, the rules are unclear to me, a non-anglophone: I do not understand whether, for dialects, the needed source is a reference to an official or well-known dictionary or institute, whether citation means a text from a published book, or whether it means a text from some internet site. Thank you. ‑‑Sarri.greek ♫ I 15:47, 11 September 2022 (UTC)
 * 2) I don't see where words used by a small community should be excluded unless they can be defined as dialectal.--Prosfilaes (talk) 23:59, 22 September 2022 (UTC)
 * 3)  reluctantly. I do agree it should be easier to include words from poorly attested dialects of major languages, but the proposed wording is IMO too broad. —Mahāgaja · talk 07:58, 23 September 2022 (UTC)
 * 4)  the wording, but support in principle. I do see that allowing laxer attestation criteria for dialects is beneficial, but I must say that I agree with Mahagaja that the criteria are too broad. In particular, it allows dialect terms to be even more prone to nonce words when compared to LDLs, but our rules on LDLs are already very lax. I counter-propose that they should simply be given the same treatment as LDLs, or something different through further discussion. – Wpi31 (talk) 12:50, 23 September 2022 (UTC)
 * What difference are you seeing between the proposed treatment of LDLs and the proposed treatment of dialect terms? It seems to me that this vote will treat them effectively identically, subject to being able to identify that a use is dialectal. This, that and the other (talk) 11:55, 5 October 2022 (UTC)
 * (ping did not work unfortunately) the difference is a minor issue, see the talk page where I mentioned it earlier. In the proposed wording, the dialectal terms, when cited via the "showing clear use restricted to the dialect" clause, would not be subject to the LDL requirements. I understand that the intention is to treat them identically, but the wording does not reflect this properly (and it is ambiguous). – Wpi31 (talk) 10:01, 6 October 2022 (UTC)
 * 1) .  I support weaker standards for dialects, but not this.  Perhaps two durable uses and a mention in a reputable dialect dictionary. I don't see any value in adding every letter combination found in Exmoor Courtship as a "word" of the essentially oral West Country dialect of English.  (An example chosen because I satisfied the RFV for the obsolete word .) Vox Sciurorum (talk) 13:57, 6 October 2022 (UTC)
 * : 2 Uses 1 Mention. Here's the question I always ask myself when I look at these citation and quotation questions: What would an authoritative dictionary of English do in my (or Wiktionary's) position? I am afraid that the authoritativeness of Wiktionary as a dictionary enterprise could be damaged by editors making entries relying on "one citation showing clear use restricted to the dialect" as the minimum. Two uses one mention as Vox Sciurorum says would be better in my mind; sorry if I'm being obstructive. --Geographyinitiative (talk) 01:11, 7 October 2022 (UTC)
 * 1)  I would support the 2 uses and a mention criteria discussed by others or a 1 use criterion if the use is required to be from a pre-approved source, as this proposal would do for mentions. I also disapprove of the idea that "[w]hat is considered a dialect or historical lect should be determined on a case-by-case basis within a community of editors for a language". As others have said, drawing this boundary is fraught if not sometimes impossible. Additionally, I think all editors should be able to participate in such discussions, and at the end the strength and level of informedness of each editor can be weighed. To that end, I do think discussions for any similar policy about how a lect should be classified should happen in the Beer Parlour so that the fact that they are occurring is relatively clear and public.  —The Editor's Apprentice (talk) 19:35, 12 October 2022 (UTC)

Abstain

 * 1) . Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:04, 25 September 2022 (UTC)
 * : I very much support the premise, but I agree with Mahagaja that we need some objective or at least more firm criteria for making a variety LDL. Thadh (talk) 11:19, 1 October 2022 (UTC)
 * See the talk page for input. I wish I had this input back when I asked for it in the Beer Parlor when I made the vote, but here we are. Since most people are on board with the idea but would like to see changes in the details, let's figure those out. Vininn126 (talk) 11:31, 1 October 2022 (UTC)

Decision
No consensus 14-10-2 GreyishWorm (talk) 01:01, 13 October 2022 (UTC)