Wiktionary talk:Votes/pl-2018-12/Lemming principle into CFI

Rationale
Rationale for having such a principle: Using the principle, we can speed up the RFD discussions, freeing our mental energies for expanding the dictionary for our readers. At worst, we end up including items that are redundant; accuracy is not at stake since the term has to be attested as per the proposed wording.

As for the impact of the proposal, two groups of terms are affected:
 * 1) Terms that we would create and keep anyway, after an RFD discussion and deliberation. For these terms, there is no difference in inclusion, but there is a difference in RFD administration cost: an RFD may not be created at all where it otherwise would be, or, when an RFD is created, it may be closed swiftly. Also, someone considering creating an entry might give up for fear of deletion, whereas the lemming principle could quickly confirm to the prospective creator that the effort is worthwhile.
 * 2) Terms that a discussion would determine to be sum of parts, but that we would include because of the lemmings. For those on the inclusion side, this may be an added benefit. For those on the deletion side, this may be a disadvantage.

Rationale for having the principle in CFI: Having the text in CFI is most convenient for a newbie looking for criteria for inclusion, and this is a criterion for inclusion. Furthermore, the lemming principle is arguably not a test of idiomaticity, and therefore its presence in Idioms that survived RFD is not ideal.

Why is WordNet excluded: WordNet is as respectable as any dictionary, but its purpose is to create a semantic network and aid automated reasoning and understanding of text. Therefore, WordNet is liable by design to include all too many sum-of-parts entries.

--Dan Polansky (talk) 09:58, 27 December 2018 (UTC)
 * Expanded. --Dan Polansky (talk) 13:51, 27 December 2018 (UTC)

"At least one" is not enough IMO
No individual dictionary deserves that much trust enshrined in policy! Equinox ◑ 09:10, 27 December 2018 (UTC)
 * "At least two independent"? --Dan Polansky (talk) 09:20, 27 December 2018 (UTC)
 * Made it "at least two"; I left "independent" out since that would make it too long, and is kind of implied. --Dan Polansky (talk) 09:27, 27 December 2018 (UTC)
 * I agree. I create entries without hesitation when I find them in three dictionaries (say Macmillan, Cambridge, Longman); a recent example would be . I'm slightly more hesitant with only two, but I generally find it acceptable. However, with just one, I'm instantly wondering "why is this not covered by other dictionaries? Is it an oversight on their part, or is it simply not really an idiom? Is it SOP?" in particular seems prone to include SOP terms that I don't want to see here. Per utramque cavernam 10:19, 27 December 2018 (UTC)

How to gauge dictionary quality?
We would want to exclude things like spammy clickbait dictionary sites based on copying other sources etc., and focus on professional, respectable ones, I assume. Equinox ◑ 09:11, 27 December 2018 (UTC)
 * Good point. "at least one respectable general monolingual dictionary, e.g."? --Dan Polansky (talk) 09:20, 27 December 2018 (UTC)
 * Added "respectable". --Dan Polansky (talk) 09:27, 27 December 2018 (UTC)
 * Would "professionally edited" be better? Or "notable", a term used by Wikipedia? --Dan Polansky (talk) 09:44, 27 December 2018 (UTC)
 * I often find  (which is not the same as the OED) and  to be helpful too, especially the first of the three. Per utramque cavernam
 * I added Longman and Cambridge; note, however, that the list was not intended to be exhaustive as indicated by the use of "e.g.". I now made it even more explicit by the use of "include but are not restricted to". --Dan Polansky (talk) 11:07, 27 December 2018 (UTC)


 * "Respectable" seems to open another can of worms: respected by whom? I think I would prefer something along the lines of "professionally published". Basically I would include works created by academically qualified people and sold for money, probably not other works. Equinox ◑ 12:15, 27 December 2018 (UTC)
 * I have now used "professionally published". I tend to agree that "respectable" was less than ideal. --Dan Polansky (talk) 12:37, 27 December 2018 (UTC)

I don’t trust any dictionary, and I don’t trust native speakers or those who strive to be native speakers by focussing on fewer than five languages. They include many set phrases that are sums of parts, because of their unbalanced view of their language. Hence the double entendre of the word “sophistication”. And the business of lexicographers is not respectable but dirty. This new motion also once again tries to weaken Wiktionary’s character as a secondary source, and again privileges English and other imperialist languages that teem with dictionaries. Fay Freak (talk) 12:59, 27 December 2018 (UTC)
 * This is only about an override of the sum of parts requirement; attestation is still required regardless of what other dictionaries do. True enough, if this passes, the result will be more beneficial for English since it has more dictionaries, but it can benefit many other languages as well. --Dan Polansky (talk) 13:12, 27 December 2018 (UTC)
 * This is not different from what I said. I understood that this is only about an override of the sum of parts requirement. But I doubt that consulting other dictionaries to override it gives a competitive advantage. It would perhaps give them too much influence: editors would work with other dictionaries in mind, assuming idiomaticity and defining terms accordingly because other dictionaries assumed them idiomatic. Basically this vote creates a fiction of idiomaticity. Though it is intended procedurally, it can easily operate like an irrebuttable presumption. Fay Freak (talk) 13:26, 27 December 2018 (UTC)
 * The vote rationale states speeding up RFD as a benefit, not competitive advantage. Further, one of the above posts suggests that monolingual native speakers are more likely to see terms in their native tongue as more than sum of parts, whereas from my experience, they tend to see SOPness where non-natives see idiomaticity or other kind of inclusion-worthiness. --Dan Polansky (talk) 13:37, 27 December 2018 (UTC)
 * If neither dictionaries nor speakers define what a language is, what does? The imperialists? - TheDaveRoss  13:16, 27 December 2018 (UTC)
 * Nobody does. The suffices, and the whole usage counts. Formally giving the competitors additional weight is to be avoided, or one loses oneself. Fay Freak (talk)
 * But speakers are who use the language. - TheDaveRoss  13:43, 27 December 2018 (UTC)
 * Trivial. And editors look at which of the speakers’ uses are lexical. This vote breaks this principle. It tells us to look at what other editors do. A corrosion of principle. Fay Freak (talk) 13:51, 27 December 2018 (UTC)
 * We should not be "gauging dictionary quality" at all. In general we have very little insight into the process other dictionaries use to decide what to include. We need to make our own criteria. I intend to oppose strongly. DTLHS (talk) 16:08, 27 December 2018 (UTC)
 * We can assume that, whatever the process the dictionaries use, the intent is largely to make the dictionary useful and appealing to users. And whatever the process, it is designed and administered by paid professionals. --Dan Polansky (talk) 17:11, 27 December 2018 (UTC)
 * I don't assume that. And people get paid to do lots of stupid things. Maybe they included a word because they in turn were using other dictionaries as sources. We have no idea. DTLHS (talk) 17:34, 27 December 2018 (UTC)
 * Let me emphasize that I would consider it unwise to include unattested words only because they are in other dictionaries. The proposal requires attestation as usual, and only lifts the requirement of idiomaticity. --Dan Polansky (talk) 18:19, 27 December 2018 (UTC)

"should"?
"An attested term that is included in at least two respectable general monolingual dictionaries should be included.": I'm not sure I agree with that modal. Why not ? Per utramque cavernam 12:11, 27 December 2018 (UTC)
 * WT:CFI uses should throughout. Software specifications often use shall for binding requirements and should for mere recommendations, but that is not the custom of CFI. Can suggests to me "can but does not have to". --Dan Polansky (talk) 12:40, 27 December 2018 (UTC)


 * "Can" seems correct, then. We're not implementing a policy that says Wiktionary must add every eligible lemming. Equinox ◑ 17:18, 27 December 2018 (UTC)
 * The intent is that two lemmings ensure inclusion of an attested term, without editor discretion to override the inclusion. Of course, that does not mean there is a deadline for creation or anything. Compare "A term should be included if it's likely that someone would run across it and want to know what it means." --Dan Polansky (talk) 17:22, 27 December 2018 (UTC)
 * How is it going to work out? Will the entries satisfying that condition be entirely protected from RFD and speedily kept if nominated? Per utramque cavernam 17:43, 27 December 2018 (UTC)
 * I take the override note back. People can override CFI to the extent they feel free to override CFI. However, the added specification does not expressly provide for override. Thus, if someone votes "delete" despite two lemmings, that would be against CFI, and therefore a CFI override. Similarly, those who have invoked WT:LEMMING in RFD so far were making a CFI override, albeit one that appeared supported by the consensus in the linked 2014 discussion.
 * Understood. Per utramque cavernam 17:54, 27 December 2018 (UTC)
 * I think this discussion is getting involved, and there really isn't anything truly complicated. The proposal uses the same tense/modality as CFI's "A term should be included if it's likely that someone would run across it and want to know what it means", and that's it. --Dan Polansky (talk) 17:50, 27 December 2018 (UTC)

Scope
This vote lists only English dictionaries. Are we making English into a limited documentation language? Is this supposed to apply to all languages? DTLHS (talk) 17:32, 27 December 2018 (UTC)
 * This applies to all languages, and the wording does not say otherwise. I considered mentioning Duden, but then omitted it to make the wording and choice of dictionaries simpler; the dictionary list is there to give an idea of what kind of dictionaries we are talking of. As for LDL, the proposal does not lift the requirement of attestation, merely the requirement of idiomaticity; the whole LDL business is that LDL are not required to be attested in use. But I am not sure I understand the above comment about LDL. --Dan Polansky (talk) 17:41, 27 December 2018 (UTC)
 * I added Duden and DRAE to prevent any doubt. --Dan Polansky (talk) 17:54, 27 December 2018 (UTC)

Possibility of replacing defs with ?
I don't think we should allow other dictionaries to override our independent determination of idiomaticity, but I agree that for the sake of competition, we should include everything multiple professional dictionaries include. I think it would be a good idea to allow RFD votes to replace a definition included in other dictionaries with if we decide that it is SOP. We could allow it to be used as a translation target and have any other information regularly included in an entry, while at the same time making it clear to users that it is simply the sum of its parts. (If we ever get to the point of including a collocations namespace or section in entries, these could be converted to collocations.) Andrew Sheedy (talk) 06:16, 2 January 2019 (UTC)
 * First, let me point out that the proposal as worded does not change Wiktionary's notion of "idiomaticity"; rather, it allows inclusion of certain attested terms even when they are not idiomatic or appear not to be. Second, I for one don't like &lit, but think it might be a good idea to include a "sum of parts" label before a definition when someone feels it is a sum of parts; the definition may select only some senses from the component words, and there is utility in that selection; letting the reader search for the pertinent senses themselves would be a disservice. --Dan Polansky (talk) 20:01, 2 January 2019 (UTC)

"general monolingual dictionaries"
Hm, not all languages are blessed with the existence of a monolingual dictionary. (Am I being nitpicky?) —Suzukaze-c◇◇ 09:17, 5 January 2019 (UTC)
 * You are right. The phrase "general monolingual dictionaries" is from Beer_parlour/2014/January. We could design a broader lemming principle that would drop the "monolingual" requirement. The thing is, the principle as proposed is automatic, with no tentativeness specified, and therefore the specification is rather conservative. You can see how many opposes there are against the current principle; making the principle looser, and thus even more inclusive, would generate even more opposition, I fear. The principle is meant to benefit languages lucky enough to have monolingual dictionaries, and there are quite a few. If consensus arises to make the principle broader, it can be made so in another vote. We would probably want to make it broader only for languages that do not have two monolingual dictionaries in the first place. --Dan Polansky (talk) 10:18, 5 January 2019 (UTC)