Wiktionary talk:Votes/2020-09/Misspellings and alternative spellings

Misspellings in extinct languages
I'd like our policy on misspellings to allow extinct languages to list misspellings, e.g. CAT:Old Irish misspellings and CAT:Gothic misspellings. These are often the only attested form of a word, but they're clearly slips of the pen on the part of the scribe writing the manuscript. —Mahāgaja · talk 22:00, 18 September 2020 (UTC)
 * Allowing those is the status quo, and this vote does nothing to modify that. As it says, "For languages without Advanced search support, challenges should be considered individually at RFD." —Μετάknowledge discuss/deeds 23:36, 18 September 2020 (UTC)

Help:Misspellings?
We seem to have a page Help:Misspellings, which seems like a misuse of the Help: namespace. As I understand it, Help: is for help on technical issues regarding using Mediawiki software. Shouldn't it be moved to Misspellings (which is currently a redirect to the Help: page)? And does this vote take it into consideration? —Mahāgaja · talk 22:00, 18 September 2020 (UTC)
 * I didn't take this page into account, so thank you for bringing it up. I agree that it should be moved to the Wiktionary namespace, and also categorized as a think tank policy. If this vote passes, I'll update the page. I'd like CFI to link to it, so at the end of the first paragraph, I'll add "For formatting of misspelling entries, see Help:Misspellings." Ultimateria (talk) 01:04, 19 September 2020 (UTC)
 * Let's just move the namespace now, and have CFI link to Misspellings. —Μετάknowledge discuss/deeds 01:13, 19 September 2020 (UTC)
 * I moved the page and added the sentence to the proposal. Ultimateria (talk) 02:04, 19 September 2020 (UTC)

Misspellings vs typos
As I have no doubt already mentioned several times elsewhere, the present wording is confusing for dropping in the word "typo" without any explanation of whether or how it might differ from "misspelling". In some previous discussions, the word "typo" has been used for an accidental slip, and "misspelling" for a wrong spelling that someone uses believing it to be correct. Whatever terminology we use, there is in my view an important distinction between these two things, and potentially in the way that we treat them, and this needs to be reflected or mentioned in the CFI wording. Mihia (talk) 22:26, 18 September 2020 (UTC)
 * There is a meaningful theoretical distinction between manual and cognitive errors, but we cannot always determine which has occurred in a misspelt text, and this standard of frequency does not force us to make any distinction. Perhaps the issue here is that the wording should specify that "misspelling" is being used to mean any kind of spelling considered to be an error. —Μετάknowledge discuss/deeds 23:39, 18 September 2020 (UTC)
 * How about after the first sentence, "For the purpose of inclusion, a misspelling is considered an erroneous spelling resulting from cognitive error or input error (i.e., typos and scannos), but not a deliberately erroneous spelling, e.g.,, Internet slang form of ." Ultimateria (talk) 00:50, 19 September 2020 (UTC)
 * I support that. —Μετάknowledge discuss/deeds 01:13, 19 September 2020 (UTC)
 * I don't support including typos and scannos at all. Sadly, despite a 12-7 majority in favour of excluding these, a vote a while back was deemed "no consensus". Are we now saying that for inclusion purposes we should treat accidental typos/scannos exactly the same as misspellings that the writer believes to be correct?? I would be very unhappy to see that codified in the CFI. As far as your wording is concerned, I think some readers may not understand "cognitive error" and "input error", and it also can easily be read as meaning that "typos and scannos" applies to "cognitive error or input error", possibly with "typos" being "cognitive errors" and "scannos" being "input errors", which presumably is not the intention. I think this could be explained in simpler/clearer language. By the way, saying in the CFI that "misspellings" include typos is also at odds with the semi-convention that has arisen elsewhere in Wiktionary discussions. Mihia (talk) 09:42, 19 September 2020 (UTC)
 * I think that distinction should be the subject of a future vote, not this one. I've made the language more clear based on your feedback, so thank you. Ultimateria (talk) 17:33, 19 September 2020 (UTC)
 * Thanks, the proposed new wording around "cognitive error" and "input error" is much clearer now. As far as the distinction between typos (accidental slips) and misspellings (ignorance of correct spelling) is concerned, yes, the existing CFI wording is unclear, but it can be understood as trying to make a distinction between the two, or at least not ruling this out. Your proposed wording makes it explicit that they should be treated identically, so in this sense the distinction is already the (or a) subject of this vote, as it seems to me. Mihia (talk) 18:55, 19 September 2020 (UTC)
 * Suggestion. Perhaps you would consider offering two options, one as now, and the other to the effect that for these purposes "misspellings" does not include typos/scannos, and that presently there is no consensus on how to treat the latter. (Personally I think a 12-7 majority is a good enough consensus, but there you go. In any case, I don't think it should lead to a statement or reinforcement of more or less the exact opposite in the CFI.) Mihia (talk) 21:15, 19 September 2020 (UTC)
 * I thought typos and scannos aren’t even misspellings, they are an aliud, because he who typed it did not spell: He did not realize what he wrote; if he had spelled, gone letter by letter, he would not have written it. You would need another hypernym. On the other hand claiming “deliberately erroneous spellings” are not misspellings while at the same time claiming there are misspellings by cognitive error is contradictory, since the cognitive error includes specifically that someone, while realizing what he writes, reckons that he writes correctly while he actually doesn’t – his deliberation was misinformed, while the other deliberation disdained the correct information about how something is written. But the latter are misspellings, so-called deliberate ones. Fay Freak (talk) 19:55, 19 September 2020 (UTC)


 * I agree with Mihia's suggestion of having two options, and suggest the second option be the proposed 'new' text but with everything from " For the purpose of inclusion, a misspelling [...]" to "[...] individually at RFD. " dropped, because (1) identifying typos as misspellings and explicitly including them on the same basis is contentious, given the majority support for not doing that in the previous vote, which could well sink this one, (2) the proposed 1:5000 frequency cutoff is probably contentious, and (3) explicit reliance solely on Google Books is contentious (as seen below). - -sche (discuss) 05:42, 20 September 2020 (UTC)


 * Why are scannos mentioned at all? Is not the very nature of a scanno that it does not occur in the text but only in e.g. Google's scan and digitization of the text, while the actual page has the correct spelling? So on what basis would a scanno be included here? If it is so common in Google's notions of what the pages say that it appears more frequently than 1:5000 in Ngrams despite not appearing on the actual pages of Google Books? - -sche (discuss) 05:48, 20 September 2020 (UTC)
 * Okay, I've made the misspelling/typo distinction. I won't bother giving two choices, because I wasn't actually trying to take a stance on that issue. I think the new language is more neutral and accurate. I've also removed any mention of scannos. The idea was to illustrate what "input error" meant, but I didn't consider the difference between typos and scannos. Thank you both for your suggestions. Ultimateria (talk) 01:14, 21 September 2020 (UTC)

Scroogle
I strongly oppose including any dependence on any Google service. Relying additionally on Google’s language detection is absurd (“Advanced search specifying the language of the term”). I also doubt one can correctly calculate a “ratio of 1 misspelling to 5000 accepted correct spellings”. You also don’t know what Google does behind the scenes; their algorithms are not designed for our purposes and not open-source. The results about whether something is rare will be random, if not arbitrarily decided by the way an algorithm works because they have personalized the results or something like that. This vote substitutes gut-feel and common-sense and experience and expectations of editors about what should be included, which aren’t that bad, with a black hole of randomness, which is worse. Fay Freak (talk) 19:55, 19 September 2020 (UTC)


 * ^ —Suzukaze-c (talk) 21:20, 19 September 2020 (UTC)
 * I agree that Google services are not ideal, but in order to include misspellings based on frequency, we need an external metric to determine frequency. Google Books is the logical choice because it's the largest searchable collection of published materials, not to mention it's the de facto metric already. As long as misspellings are allowed, I believe the only alternative to an external metric is a concrete number of citations, which would likely raise the threshold to include virtually any misspelling. Ultimateria (talk) 00:35, 21 September 2020 (UTC)
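
For reference, the arithmetic behind the "1 misspelling to 5000 accepted correct spellings" ratio discussed in this section can be sketched as below. This is only an illustration under stated assumptions: the function name, the idea that raw occurrence counts are available from whatever corpus is chosen, the reading that a misspelling qualifies when it is at least as frequent as 1 per 5000 correct spellings, and the treatment of the exact boundary as passing are all hypothetical and not taken from the proposal text.

```python
from fractions import Fraction

# Assumed cutoff taken from the discussion: 1 misspelling per 5000 correct spellings.
THRESHOLD = Fraction(1, 5000)


def meets_frequency_threshold(misspelling_count: int, correct_count: int) -> bool:
    """Return True if the misspelling occurs at least once per 5000 correct spellings.

    Both arguments are hypothetical raw occurrence counts from some corpus.
    The boundary case (exactly 1:5000) is treated as passing, which is an
    assumption rather than something the proposal states.
    """
    if misspelling_count < 0 or correct_count < 0:
        raise ValueError("counts must be non-negative")
    if correct_count == 0:
        # No occurrences of the correct spelling: the ratio is undefined,
        # so this sketch conservatively reports that the threshold is not met.
        return False
    # Fraction keeps the comparison exact and avoids floating-point rounding.
    return Fraction(misspelling_count, correct_count) >= THRESHOLD
```

A call such as `meets_frequency_threshold(3, 10000)` compares 3/10000 against 1/5000 exactly. The sketch says nothing about how reliable the underlying counts are, which is the concern raised above.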