Wiktionary talk:Votes/2022-08/Regional and Obsolete variations as LDL's

Grammar
The part “For terms in dialects and obsolete senses in said languages, extinct languages,” seems to have a grammatical problem. MuDavid 栘𩿠 (talk) 00:40, 31 August 2022 (UTC)


 * You are right. I thought there was a third element! Thanks. Vininn126 (talk) 08:22, 31 August 2022 (UTC)

Proposal for clarification of meaning
I’ve been rereading the proposal, and there are some places where I think it’s not completely clear. For example, do the three bullet points not apply to dialects and obsolete senses? (If they don’t, doesn’t the community of editors for that language need to agree? Could I concoct my own reference work, agree with myself, and start adding crap to Wiktionary?) And the “dialects and obsolete senses” are of WDLs, not of extinct languages, aren’t they?

What about:

For languages well documented on the Internet, three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in dialects of said languages, one citation showing clear use restricted to the label is the minimum, or one mention in an agreed-upon reference work subject to the below requirements is adequate. For obsolete senses in said languages as well as extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. For all other spoken languages that are living, only one use or mention is adequate, subject to the following requirements:
 * the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
 * each entry should have its source(s) listed on the entry or citation page, and
 * a box explaining that a low number of citations were used should be included on the entry page (such as by using the template).

— MuDavid 栘𩿠 (talk) 01:53, 1 September 2022 (UTC)


 * I can get behind this - it's clearer and in the spirit of the proposal. Vininn126 (talk) 08:30, 1 September 2022 (UTC)

Obsolete senses
I couldn't support broadening the obsolete sense CFI for English at least. What we treat as Modern English is very well attested right from the beginning (1500), and essentially any word that appears once in one of these old texts would become includable. It would also allow us to create an entry for every term used in Spenser - not sure if that's a good thing or a bad thing, but it seems like a side-effect which at least ought to be discussed.

I could certainly support the dialect part of the vote, although I would like more clarity on what is a "dialect". Is American English a dialect of English? Surely not. But then, say, Belizean English might well be in the spirit of the new dialect-as-LDL rule.

I wonder if the vote should be split into a dialect part and an obsolete-sense part? This, that and the other (talk) 05:38, 2 September 2022 (UTC)


 * @This, that and the other One issue with this (per the Beer Parlor discussion) is that Middle Polish (1500-1750) is considered a variation of Modern Polish (to avoid lots of duplication) but is not well attested. How might we word this so as to exclude regular obsolete varieties but include others?


 * As to determining what is a language or dialect, that should really be up to a language community, as it depends on the language and is not part of the scope of this vote. I do not think you can ever codify what is or isn't a language in that way, as it is and should be handled on a case-by-case basis. Vininn126 (talk) 09:03, 2 September 2022 (UTC)
 * Couldn't you then decide to treat Middle Polish as a "dialect" of Polish for the purposes of this part of CFI? Then we don't need the obsolete-sense part of the vote at all... This, that and the other (talk) 09:55, 2 September 2022 (UTC)
 * I think the issue is more just the use of the term dialect - would replacing "obsolete" with "historical lect" be better?


 * I propose the text "For terms in dialects of said languages, one citation showing clear use restricted to the label is the minimum, or one mention in an agreed-upon reference work subject to the below requirements is adequate. For terms in historical lects in said languages as well as extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. What is considered a dialect or historical lect should be determined on a case-by-case basis within a community of editors for a language and listed on the appropriate language considerations page." Vininn126 (talk) 09:59, 2 September 2022 (UTC)
 * Yes, I'd be on board with that. I think it more closely mirrors the intent and avoids having to deal with the obsolete sense issue. This, that and the other (talk) 05:34, 3 September 2022 (UTC)
 * Hmmm, couldn't it be argued that Early Modern English (15th to 17th century) is a historical lect, then? For the record, I support this change, but knowing this website, I know that that'll come up in the future. AG202 (talk) 09:18, 4 September 2022 (UTC)
 * Something for the English community to decide. Vininn126 (talk) 14:54, 4 September 2022 (UTC)


 * The point about American English is similar to what I worried about in the BP, that any nonce can be asserted to be a term in the dialect of the region its author is from (American English or whatever specific state, British English or whatever region, etc). The proposed text does not have safeguards / guidance on how to determine what is a dialect; I appreciate the proposal above which adds some ... but I also wonder what the right balance is between deferring to a language community vs having policies like "how many citations are needed" be decided by the overall community. This vote would offload some (hitherto voted-on) criteria for inclusion from the CFI page to not-voted-on About pages. I hate to be reluctant about this, because I'm sympathetic to the idea of giving little-attested dialects an easier time, but the devil is in the details. I suppose my preferred way of doing things would be that a language community proposes "treat Polish dialects and Middle Polish as a LDL" but that is to be voted on by everyone in a vote like this, perhaps even in an omnibus vote ("treat X, Y, and Z dialects of Polish; A, B, and C dialects of French; and D, E, and F dialects of Italian as LDLs") but nonetheless one about specific lects and not something open-ended. But I understand the reasons to keep it open-ended and adaptable, and avoid "why is this vote only allowing Middle Polish and A, B, C French dialects but not my favourite dialect of Spanish no-one thought of until just now?!". Meh. - -sche (discuss) 06:25, 3 September 2022 (UTC)
 * I appreciate that many people wonder about the same thing - that this proposal does not deal with what IS or IS NOT a dialect/historical lect - and to be honest, as I have said, I think that is outside of the scope of this vote. Making each one a vote would be one approach, but I think that would be inviting too many people who are not knowledgeable on a given lect to have an opinion on it. I also think that if someone tries to make an assertion, a future editor will be able to defer to this, saying "we need to establish that the reference you are working to is about specifically X lect." So while it does push the responsibility of that somewhere else, I think that is the best solution in this given case. Furthermore, the current CFI already pushes similar things off onto the community of editors: "the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention". If your issue is with that, then I believe that is a separate issue unrelated to my proposal. Vininn126 (talk) 08:53, 3 September 2022 (UTC)

Clearer Phrasing and Formatting of the Proposal
I recognize the vote has already started, and I regret that I am only making this suggestion now, but I honestly did not expect it to get going so soon, which is my bad. That aside, I think the currently proposed addition of new text makes an already clunky section more clunky. The following is my proposal for an alternative phrasing and formatting, which I tried to word as similarly as possible while still improving it, with a lean towards improvement.

Terms used in the contemporary, standard variety of a language that is also well documented on the Internet require a minimum of three citations in which the term is used to be included in Wiktionary.

Here, the terms "contemporary" and "standard" stand in contrast to what is considered a dialect or historical variety of a language. The specifics of what is a dialect or historical variety for the purposes of this policy should be determined on a case-by-case basis within the community of editors for a language. The results should be listed on the appropriate language considerations page.

Terms in the dialects of a language well documented on the Internet can be included if there is


 * one citation showing clear use restricted to the dialect, or
 * one mention in a previously agreed-upon reference work.

The communities of editors for languages well documented on the Internet should maintain a list of the reference works that can be used to support a term's inclusion based on a single mention.

Terms in historical varieties of languages well documented on the Internet, as well as in extinct languages, can be included if there is


 * one use in a source from the period the variety was used, or
 * one mention in a previously agreed-upon reference work, as described above.

Terms in living languages with limited documentation on the Internet can be included if there is one citation including use or mention of the term. The entries for terms in these languages should have their sources listed on the entry or citation page. A box explaining that fewer than three citations support the term's inclusion should be included on the entry page (such as by using the template).

As others have mentioned, I think the use of "dialect" in this proposal is not great. The phrases "minority variety" (used particularly in the sociological sense), "non-standard variety", "proscribed variety", "subordinated variety", or "jargon" (both in the "technical terminology" and "characteristic language" senses) probably serve our purposes better. I also replaced uses of the term "lect" with "variety", since "lect" is an uncommon jargon word which presents a barrier to new and non-academic users. To everyone reading this, please let me know what your thoughts are. Take care. —The Editor's Apprentice (talk) 04:13, 6 September 2022 (UTC)

Rewording
The main request I am seeing is for clearer wording. What sort of wording would convince you? Based on the BP discussion, I was hoping that a safeguard, having people opt in to various things (sources, the lects themselves), would help. Since we all want the same thing, can we come up with a way that will satisfy everyone? Vininn126 (talk) 14:01, 23 September 2022 (UTC)
 * I'm not sure, but I do feel like a distinction should be made (somehow, but how?) between well-documented dialects and less-documented dialects, parallel to well-documented languages vs. less-documented languages. For English, for example, I feel that Southern U.S. English and AAVE are both well documented and so should be held to the same attestation standards as standard English, while traditional dialects of England (e.g. the West Country dialect that the sentence using strambang is in) are less well documented. Or maybe we should just be more generous in defining traditional dialects of England as separate languages, the way we already do with Scots. I certainly would say that "an way that a geed en sich a wap in the niddick that strambang a het es head agin the clovel, an made a bump in es brow" is no more comprehensible to someone familiar only with standard English than an equivalent sentence in Scots would be. But I do anticipate enormous headaches in trying to establish a consensus on what is considered a "WDD" and what is considered an "LDD", or alternatively, which dialects should be granted language status at Wiktionary and which ones shouldn't. —Mahāgaja · talk 14:11, 23 September 2022 (UTC)
 * This is an issue we already deal with with regard to WDL's and LDL's. One way of determining this would be to create a BP discussion where people can chime in. It seems like a lot of red tape but I am unsure of any other solution. Vininn126 (talk) 14:15, 23 September 2022 (UTC)
 * As I've said, I don't see how one citation would suffice for the attestation criteria, and I've proposed that they should simply be subject to the same requirements as LDLs. One thing that is really confusing and baffling to me is the fact that the titles of this vote and the original BP discussion suggest "LDL", yet the wording in the proposed text does not.
 * I do agree that a better way to approach this would be to distinguish well-documented dialects and less-documented dialects, instead of only seeing the language as a whole. There are certain cases where a dialect is relatively better documented than the language proper, e.g. for Cantonese (which I mainly work on), the coverage on Hong Kong Cantonese is probably more or less similar to some of the smaller WDLs, but it is much less on Cantonese proper (i.e. Guangzhou dialect). IMO we should redo the WT:WDL list and take into account the dialects of the languages on that page, rather than only listing the (standard) languages. (In fact the list is already more or less doing this by mentioning specifically both Norwegians, Standard Written Chinese, and Standard Indonesian.) Obviously this will lead to a considerable number of disputes over which dialects are considered well-documented, but I think it will help improve our coverage of rarer dialectal terms in the long run.
 * Perhaps the elephant in the room is the WDL/LDL policy itself. As I've mentioned in a brief discussion on WT:Discord, the gap between the attestation criteria of the two is quite large. This means we will see heated debates on a language (or dialect, assuming we implement the WDD/LDD changes) if it sits somewhere near the boundary of the two, as the policies for them are drastically different. I think there should be more levels to reflect the documentation of languages more precisely - English has much more coverage on the Internet than all the other WDLs, yet they are all subject to the same attestation criteria; meanwhile among the LDLs there are some with a considerable amount of coverage on one end of the spectrum, and (nearly) extinct languages with only one water/translations entry on the other end. It might be better if these issues are brought to BP for further discussion. – Wpi31 (talk) 15:50, 23 September 2022 (UTC)
 * How does the wording not actually make them LDL's? LDL's only need one mention already - so needing one citation for them here would be the exact same criterion.
 * A lot of people have mentioned the WDL/LDL distinction as it is - perhaps that needs to be sorted out before something like this is brought to a vote, because a good portion of criticisms related to the vote are actually about LDL's as a whole - outside the scope of the vote. Vininn126 (talk) 16:01, 23 September 2022 (UTC)
 * Minor quibble: in the case of dialects, the "subject to the below requirements" part applies only to the mentions clause, not to citations, whereas LDLs require both mentions and citations to be "subject to the following requirements".
 * That being said, dialects being treated the same as LDLs is only at the bare minimum for me (i.e. I would want to see something slightly stricter, but this treatment is fine), based on the fact that we want to have laxer rules for dialects but there are only two levels of attestation criteria, while we don't want to complicate things by introducing something new. – Wpi31 (talk) 17:59, 23 September 2022 (UTC)


 * Part of my concern is, in line with Mahagaja's comment, what dialects are covered. The "least bad" approach I can think of is to vote on specific ones like we specifically name WDLs. That way we could ease requirements for Middle Polish but maybe not Early Modern English or the dialect called American English. (Is Australian English a dialect we want to reduce requirements for? NZ English? Probably people should decide up-front. Do we even have editors familiar with every dialect of every WDL that a general vote like this would cover, to help us determine which are well-documented? Maybe we don't need to vote on some until they actually come up.) This means the broader community may be involved in deciding whether to reduce requirements for a dialect, but we did that with the decision on WDLs / LDLs, too. Another part of my concern was how to distinguish an author's nonce from a dialectal term. I considered suggesting requiring both a use and a reference work that says the term is dialectal, but some dictionaries consider any term they can to be a part of the language/dialect they cover without any apparent effort to determine if it's a nonce, so that wouldn't actually help there, and conversely dictionaries don't exist for some other dialects of some WDLs, so the requirement would block those dialects. So maybe I shall just accept that some nonces are the cost of getting good dialectal terms in; hopefully editorial judgement can help us if the 'Chimpmania' editor figures out how to start making short Usenet posts in Scots, lol. - -sche (discuss) 19:44, 23 September 2022 (UTC)
 * The nonce thing is an issue with LDL's anyway - so that's an issue regarding LDL's, not necessarily treatment of dialects. Vininn126 (talk) 22:08, 23 September 2022 (UTC)


 * There's no magic wording issue. Separate languages are important historically and linguistically. I might even make an exception for early modern forms of languages where the documentation is scarce, like English and French up to 1600 or 1700, and Albanian up to 1900. But pretty much by definition, every word that might be deleted under WDL rules and kept under LDL belongs to at least one small lect, and I see no reason to cut these lines between some lects and other lects. Why should keeping a word used only in 19th century Georgia depend on whether we can shoehorn it into a dialect or not?--Prosfilaes (talk) 22:58, 25 September 2022 (UTC)
 * It is not whether the word or definition belongs to a lect, but whether it is unique to that lect. So perhaps it is a wording issue, as that message was not conveyed, it seems. Vininn126 (talk) 07:39, 26 September 2022 (UTC)
 * That does not address my concerns. I do not see the value in splitting out a bunch of dialects from modern English or modern French or modern Spanish and treating them differently.--Prosfilaes (talk) 22:46, 26 September 2022 (UTC)
 * Then I think we just hold completely different values, and that said, I do not think this vote really affects that. It only affects our requirements for documenting them, not for splitting them. So you are right, it does not address your concerns, because your concerns were never even related to the vote. Vininn126 (talk) 08:51, 27 September 2022 (UTC)