Wiktionary:Votes/2020-09/Misspellings and alternative spellings

Misspellings and alternative spellings
Background: The Spellings section of CFI is cluttered and imprecise. It does not link to our policy page for alternative forms. Specifically regarding misspellings, it allows common ones, but does not establish criteria for what constitutes a common or rare misspelling. Misspellings are frequently challenged and sent to RFD where they are individually debated based on users' personal criteria. These debates are sometimes lengthy and contentious, which points to the lack of a specific policy.

Current CFI text:

Misspellings, common misspellings and variant spellings: Rare misspellings should be excluded while common misspellings should be included. There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct.

It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.

Combining characters (like ́|this) should exist as main-namespace redirects to their non-combining forms (like ´|this) if the latter exist.

Proposed text:

Rare misspellings should be excluded while common misspellings should be included. A misspelling is an erroneous spelling resulting from cognitive error (i.e., not knowing how to spell a word), but not from input error (e.g., typos). Editors have not yet reached a consensus as to whether typos should be included alongside misspellings. A misspelling is considered common if it appears at a ratio of 1 misspelling to 5000 accepted correct spellings in Google Books search results, and rare if it occurs less often. The ratio should be determined by an Advanced search specifying the language of the term. For languages without Advanced search support, challenges should be considered individually at RFD. For formatting of misspelling entries, see Misspellings.

Misspellings are not to be confused with alternative forms and spellings, which are considered correct spellings in different regions, or have been considered correct in the past. For policy regarding alternative forms, see Forms and spellings.

Combining characters (like ́|this) should exist as main-namespace redirects to their non-combining forms (like ´|this) if the latter exist.

Rationale:
 * Establishing which misspellings are common and which are rare will expedite the RFD process, saving time and energy currently spent debating the merits of many individual misspellings.
 * The 1:5000 ratio allows many common spelling errors, such as ei-ie transpositions.
 * Google Books is the largest freely searchable collection of published material for many languages, so it gives us the best indicator of a term's frequency.
 * The sentence establishing the scope of the section and the vote determining its name is removed as irrelevant.
 * The paragraph about typos and the paragraph about language academies are removed as uninformative and irrelevant to the criteria regarding spellings and misspellings.
 * The paragraph about alternative spellings is simplified and directs to a more in-depth policy page.
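
As a rough illustration of the proposed 1:5000 criterion, the following Python sketch classifies a misspelling from two hit counts. The function name `classify_misspelling` and the example counts are hypothetical and not part of the proposal. The counts are assumed to come from a language-restricted Google Books search done by hand; nothing here queries Google Books.

```python
from fractions import Fraction

# Threshold taken from the proposed text: 1 misspelling per 5000 correct spellings.
COMMON_THRESHOLD = Fraction(1, 5000)


def classify_misspelling(misspelling_hits: int, correct_hits: int) -> str:
    """Return "common" if the misspelling reaches the 1:5000 ratio, else "rare".

    Both arguments are hit counts supplied by the caller (hypothetical inputs).
    """
    if misspelling_hits < 0:
        raise ValueError("misspelling_hits must not be negative")
    if correct_hits <= 0:
        raise ValueError("correct_hits must be positive to form a ratio")
    # Exact rational arithmetic avoids floating-point rounding at the boundary.
    ratio = Fraction(misspelling_hits, correct_hits)
    return "common" if ratio >= COMMON_THRESHOLD else "rare"


# Made-up counts: 3 in 12000 is 1:4000 (common); 2 in 12000 is 1:6000 (rare).
print(classify_misspelling(3, 12000))
print(classify_misspelling(2, 12000))
```

The sketch treats a ratio of exactly 1:5000 as common, which is one reading of "rare if it occurs less often". It also shows a concern raised by voters below: for a word with fewer than 5000 hits, a single misspelled hit is enough to pass the threshold.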

Schedule:
 * Vote starts: 00:01, 25 September 2020 (UTC)
 * Vote ends: 23:59, 25 October 2020 (UTC)
 * Vote created: Ultimateria (talk) 19:27, 18 September 2020 (UTC)

Discussion:
 * Beer parlour/2020/September
 * Wiktionary talk:Votes/2020-09/Misspellings and alternative spellings
 * Votes/pl-2014-04/Keeping common misspellings
 * User_talk:Dan_Polansky/2013
 * Beer_parlour/2018/February
 * Votes/2019-03/Excluding typos and scannos

Support

 * 1) Ultimateria (talk) 00:01, 25 September 2020 (UTC)
 * 2) —Μετάknowledge discuss/deeds 05:35, 26 September 2020 (UTC)
 * 3) Imetsia (talk) 18:53, 26 September 2020 (UTC)
 * 4) فين أخاي (talk) 02:51, 27 September 2020 (UTC)
 * 5) Eventually we will include any frequent spelling, i.e. the right one(s) (according to the norm, but that's implicit), the common misspellings and the spelling variations; in other words, everything but rare misspellings. In that sense I strongly agree, except for the following two things. A rate of 1/5000 is too low for ordinary writing (if one has seen the word occurred 5000 times, the number of common misspellings seen will be far higher than 1), and thus some very rare spellings may pass this rule. As for Google Books, on Wiktionary we like to rely (when possible) on recent publications, and recently published books are always checked for orthography and written by people who generally have very good orthography. Therefore, on Google Books, misspellings are never found in recent books (or only at a rate lower than 1/5000), which makes the search useless. In conclusion, this rate should be higher and applied on a broader scale, while searching for misspellings in recent publications on Google Books is irrelevant. Malku H₂n̥rés (talk) 18:48, 29 September 2020 (UTC)
 * Based on your comments I think you meant to vote "oppose". Ultimateria (talk) 22:14, 29 September 2020 (UTC)
 * Actually I agree on the basis that we should include this form but not that form, which is the principal idea, but even if it represents most of the comment, these are two minor disagreements upon a way among others to make this distinction. Malku H₂n̥rés (talk) 05:49, 30 September 2020 (UTC)
 * 6) Seems an improvement overall. A minor quibble: "input error (i.e., typos)" should say "e.g." not "i.e." (since typos are just one form of input error, along with scannos and voice recognition errors). I hope we will not exclude common deliberate sensational spellings like . Equinox ◑ 14:50, 2 October 2020 (UTC)
 * Fixed e.g. (surely that's not controversial). I believe those spellings are sufficiently covered as not "resulting from cognitive error". Ultimateria (talk) 16:20, 2 October 2020 (UTC)
 * 7) İʟᴀᴡᴀ–Kᴀᴛᴀᴋᴀ (talk) (edits) 17:13, 2 October 2020 (UTC)
 * 8) Jberkel 15:14, 7 October 2020 (UTC)

Oppose

 * 1) While a better process for determining whether misspellings are "common" or "rare" may have its advantages (though I don't think it's a good idea to have a firm, unappealable qualitative cutoff), this proposal is seriously vitiated as it currently stands: Hazarasp (parlement · werkis) 02:19, 26 September 2020 (UTC)
    * Google Books is hardly comprehensive or complete. We may end up in a position of having to reject misspellings that are clearly prevalent.
    * We would be putting ourselves at the mercy of Google Books' OCR (optical character recognition), which is far from consistently reliable. For example, within Shakespeare's First Folio, the OCR is so useless that passages cannot be meaningfully searched for at all. The chance of it running into serious problems increases as texts get older, meaning that we would in effect be hardwiring a recency bias into Wiktionary's processes.
    * Additionally, Google Books is not static; books can be and are added or removed from it, and there is a good chance that its OCR algorithm gets modified on a regular basis. This means that any decision made at WT:RFV for misspellings would lack finality. While this is already a problem, by implementing this proposal, we would be magnifying it and baking it into the system.
    * As written, the proposal allows for any and all typos to be included within Wiktionary (as long as one can gather three attestations for them). This is clearly undesirable; ideally, there should be a process for excluding rare typos as there is for misspellings. This is especially ill-advised given that it is impossible to reliably distinguish typos from misspellings.
    * Hazarasp (parlement · werkis) 02:19, 26 September 2020 (UTC)
 * I'm not seeing how "the proposal allows for any and all typos to be included within Wiktionary (as long as one can gather three attestations for them)": in fact it explicitly states there is no consensus about whether to include typos at all. An earlier draft of this proposal, discussed on talk, did change existing practice regarding typos to be more inclusive, by including ones which were as common as common misspellings, but I've seen no draft of this proposal (including the current one) which allows typos with only three attestations. - -sche (discuss) 05:34, 26 September 2020 (UTC)
 * A careful reading of the proposal turns up no mentions of any attestation requirements for typos, and none exist elsewhere within WT:CFI. Therefore, if this proposal is adopted, typos will default to the basic WT:CFI attestation requirements (three attestations within a year etc.). The attestation requirements for misspellings can't be considered to cover typos, given that misspellings are defined so as to exclude typos ("A misspelling is an erroneous spelling resulting from cognitive error... ...not from input error (i.e., typos)"). The bit you mentioned about the lack of consensus wrt. typos is irrelevant. Hazarasp (parlement · werkis) 08:25, 26 September 2020 (UTC)
 * "Therefore, if this proposal is adopted, typos will default to...": how so? This proposal doesn't change how typos are treated, so unless typos are already allowed with only three citations... - -sche (discuss) 19:03, 27 September 2020 (UTC)
 * It does (probably unintentionally), since our current treatment of typos is governed by portions of WT:CFI that would be superseded by this proposal if it were adopted. Hazarasp (parlement · werkis) 02:53, 28 September 2020 (UTC)
 * 2) – Guitarmankev1 (talk) 02:47, 26 September 2020 (UTC)
 * 3) I support most of the changes (removing excess and unneeded bits, simplifying the rest), and as I said on talk, would support a version of this proposal which made only those changes (dropping the sentences from "A misspelling is considered [...]" to "[...] considered individually at RFD"). But I'm not sure 1:5000 is a good threshold for commonness, and I am inclined to agree with the editors who've said codifying reliance on a specific Google Advanced Search product into CFI is a bad idea. - -sche (discuss) 05:40, 26 September 2020 (UTC)
 * No ratio is perfect, as any choice is arbitrary. Do you have a better suggestion? —Μετάknowledge discuss/deeds 07:25, 29 September 2020 (UTC)
 * 4) It has stayed contradictory, and it still kowtows to Google. Fay Freak (talk)
 * 5) The ratio for determining is not very comprehensible or convincing, as it seems to me, and Google Books is quite limited too. HeliosX (talk) 21:08, 28 September 2020 (UTC)
 * 6) Dentonius (my politics | talk) 03:52, 2 October 2020 (UTC)
 * 7) I like that the proposal removes the stuff about language academies, because I am not sure that linguists should consider these bodies authoritative. But I agree with Hazarasp about Google Books. Actually I think that Google search results can be different depending on your location. —Internoob 23:02, 4 October 2020 (UTC)
 * 8) Oppose the fixed criterion "A misspelling is considered common if it appears at a ratio of 1 misspelling to 5000 accepted correct spellings in Google Books search results, and rare if it occurs less often." While this data can be taken into account, we should not be beholden to such a test. (For some tangential discussion about Google numbers, see Tea_room/2020/September.) Mihia (talk) 23:14, 4 October 2020 (UTC)
 * 9) Oppose, but I would support if the text identified by -sche was removed and the comma before "or" in the second paragraph was deleted. This, that and the other (talk) 03:57, 5 October 2020 (UTC)
 * 10) Oppose because of the reference to Google, which is not reliable for these purposes. Andrew Sheedy (talk) 17:04, 7 October 2020 (UTC)
 * 11) I object to giving Google Books the power to dictate Wiktionary. Second, not all words have even been used 5000 times. Third, the importance of a text can mean that a rarer alternative spelling can be significant. For example, Zola spells goyau as goyot. The current proposal would cause goyot to be deleted. However, more people have read Germinal than a textbook on constructing a coal-mine in French from the nineteenth century. I'd much rather have glosses that explain the rationale for inclusion than ban significant alternative spellings that do not meet an arbitrary criterion. Languageseeker (talk) 15:16, 11 October 2020 (UTC)
 * Your example is an alternative spelling, not a misspelling, so it would not be affected by this policy. —Μετάknowledge discuss/deeds 16:54, 16 October 2020 (UTC)
 * 12) Per Andrew Sheedy; I would have supported the proposal otherwise. ←₰-→  Lingo Bingo Dingo (talk)  10:14, 18 October 2020 (UTC)
 * 13) per Internoob. --Droigheann (talk) 08:20, 24 October 2020 (UTC)

Abstain

 * 1)  --Robbie SWE (talk) 12:00, 6 October 2020 (UTC)
 * 2)  --DannyS712 (talk) 04:11, 8 October 2020 (UTC)

Decision

 * Failed 8–13. —Mahāgaja · talk 07:43, 27 October 2020 (UTC)