Talk:toleratie

toleratie
Wrong spelling of tolerantie. Should be deleted. 15:50, 10 July 2013 (UTC)
 * Keep as an obsolete spelling or a common misspelling. This Dutch spelling is plentifully attested in Google books. The search in Dutch books of toleratie finds 90 hits, while the search in Dutch books for tolerantie finds 54,100 hits. The ratio of the two numbers is 601, which suggests a common misspelling to me; compare to the ratio of "conceive" vs. "concieve". --Dan Polansky (talk) 18:13, 10 July 2013 (UTC)
 * A ratio of 1:601 does not suggest a common misspelling to me at all. I'd call that rather rare. —Angr 18:57, 10 July 2013 (UTC)
 * Then find a couple of misspellings that you consider common and determine their frequency ratio. Then you will have a data-based or fact-based idea instead of an opinion not based on actual knowledge. If you want to see the homework I have done, you can have a look at User talk:Dan Polansky. --Dan Polansky (talk) 19:05, 10 July 2013 (UTC)

Anyway, here is a relevant snippet: --Dan Polansky (talk) 19:09, 10 July 2013 (UTC)
 * To judge from your data both here and on your talk page, I'd say a misspelling needs a frequency ratio < 100 to be considered common. —Angr 19:28, 10 July 2013 (UTC)
 * Thus, of the examples listed in the above table, only "condensor" is a common misspelling per your assessment. My assessment differs: "recieve" is a prototypical common misspelling by my lights, and it has a frequency ratio of 1874. In Google web search, "recieve" has 32,700,000 hits. Furthermore, I see no reason to believe that copyedited Google books material should contain common misspellings at ratios less than 100. Be that as it may, you still have not listed your prototypical common misspellings with their frequency ratios. --Dan Polansky (talk) 19:46, 10 July 2013 (UTC)
 * Yes, and of the ones listed on your talk page, "referencable", "experiencable", "influencable", "sequencable", "idiosyncracy", and "supercede" are. As for what I consider to be common misspellings, that's a bit hard to judge, but two words I often misspell myself are separate and existent. I'm not sure how to read Google's Ngrams, but if I've read them correctly, then this says the ratio of seperate to separate is about 1:1030, while this says the ratio of existant to existent is about 1:52. So does that mean I have to increase my maximum frequency ratio for misspellings to be considered common? Not at all; it means seperate isn't as common a misspelling as I thought, so if someone were to nominate it for deletion, I'd vote delete. I'm surprised that existant is so much more common, though, and I wonder whether the French and Latin words are perhaps showing up in the results despite the "from the corpus: English" setting. Maybe the French and Latin words are showing up in quotes inside otherwise English-language texts. At any rate, the results are making me very skeptical of Google Ngram Viewer as a reliable linguistic corpus analysis tool. There are so many corpora of written English out there, surely someone has analyzed some of them following proper statistical procedure to estimate the frequency of various misspellings. —Angr 21:15, 11 July 2013 (UTC)
 * What is the basis of your choice of the frequency ratio in copyedited Google books of 100 as a threshold? What possible factual observation could shake your choice of that ratio? What about the results is making you very skeptical, as per your statement above? Given your doubt, have you considered looking at other corpora, such as COCA, BNC or even the World Wide Web? --Dan Polansky (talk) 15:41, 12 July 2013 (UTC)
 * "What possible factual observation could shake your choice of that ratio?" Probably nothing; any spelling that occurs less than 1% of the time the word is used is simply too rare to be called "common". I would much prefer we use a real corpus like COCA or BNC, but there too I would want the frequency threshold to be at around 1%. —Angr 15:29, 13 July 2013 (UTC)
 * Do you agree with the following statement? 'Any spelling that occurs less than 1% of the time the word is used in a copyedited corpus is simply too rare to be called a "common misspelling".' Is there any further reasoning or evidence that you could provide in support of that statement? Let me emphasize that we are talking about common misspellings, not common spellings. --Dan Polansky (talk) 18:25, 13 July 2013 (UTC)
 * Yes, I agree with that statement. My reasoning is based on sense 3 of common: "Found in large numbers or in a large quantity". If we're going to call something an alternative but correct spelling, it had better occur far more often than 1% of the time. —Angr 18:40, 13 July 2013 (UTC)
 * Yes, correct spelling. But we are discussing threshold for common misspelling, not common correct spelling. The threshold is not for an alternative spelling tag to be used but rather for common misspelling to be included. --Dan Polansky (talk) 19:23, 13 July 2013 (UTC)
 * But we're still calling them common misspellings. I don't think a misspelling is common unless it slips past a copyeditor at least 1% of the time. I really don't think that's an unreasonably high threshold. —Angr 13:09, 14 July 2013 (UTC)
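For reference, the arithmetic behind the figures debated above can be sketched as follows. This is only an illustration: the hit counts are the ones quoted in this discussion, while the function names and the `max_ratio` parameter are hypothetical, and the cutoff of 100 is just the threshold proposed in this thread, not an agreed criterion.

```python
def frequency_ratio(standard_hits: int, variant_hits: int) -> float:
    """Return how many times more often the standard spelling occurs than the variant."""
    return standard_hits / variant_hits


def is_common_by_threshold(ratio: float, max_ratio: float = 100.0) -> bool:
    """Apply the disputed rule of thumb: 'common' only if the ratio is below max_ratio.

    max_ratio=100 corresponds to the roughly 1% threshold proposed above (an assumption,
    not settled policy).
    """
    return ratio < max_ratio


# Google Books hit counts quoted in this discussion (Dutch books).
ratio = frequency_ratio(standard_hits=54_100, variant_hits=90)
print(round(ratio))                           # 601
print(is_common_by_threshold(ratio))          # False under the 100 cutoff
print(is_common_by_threshold(ratio, 2000.0))  # True under a looser, hypothetical cutoff
```

The outcome depends entirely on the chosen cutoff, which is the point in dispute: with a cutoff of 100, "recieve" (ratio 1874) would also fail the test.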


 * Well yes, Google books for lang=en for existant finds the term in many French and Latin books or snippets. Whether the results are similarly skewed for other spelling pairs can be discovered for each pair by having a glance at what the links present on the Ngram page show in Google books search. Actually, the suspiciously low ratio of around 50 suggested such a glance was worthwhile. I find it very likely that, for most investigated spelling pairs, there is no such skewing. --Dan Polansky (talk) 16:08, 12 July 2013 (UTC)
 * Wouldn't toleratie be equivalent to toleration not tolerance? Could it be a separate word, not a misspelling? Mglovesfun (talk) 20:28, 10 July 2013 (UTC)
 * And any chance it might be dated? The bulk of hits popping up at, for instance, are not terribly recent. Limiting that search to the 21st century produced only two hits. FWIW. ‑‑ Eiríkr Útlendi │ Tala við mig 21:39, 10 July 2013 (UTC)
 * As I said, "Keep as an obsolete spelling or a common misspelling." If it is not obsolete, then maybe dated. If it is a misspelling, then a common one. I don't feel qualified to decide whether this is an obsolete form, a dated form or a common misspelling. --Dan Polansky (talk) 17:40, 11 July 2013 (UTC)
 * I've never seen this word anywhere, but Mglovesfun is right, this would be "toleration" rather than "tolerance". So I don't think this can be considered a misspelling for sure, it's quite possibly another formation, which happens to be very rare. Of course we can't tell the difference in this case because they're still only one letter apart. 15:53, 12 July 2013 (UTC)
 * @DP: Obviously the frequency ratio of alt/misspelling to unmarked (prevailing) spelling alone is not sufficient evidence to choose among the classifications and presentations we use for current spellings: unmarked, "alternative spelling" and "common misspelling". Some weighting by its frequency in the corpus as a whole or by the absolute number of occurrences of the spelling in the corpus is needed. The natural log or square root of such frequencies or absolute numbers would give the right shape to a criterion curve, though it would need to be calibrated to reflect our judgement, preferences, or whims.
 * What Dutch corpora are there that reflect adequately misspellings? Is there a way to use Google searches of the web to be reasonably sure that one is counting mostly occurrences in Dutch text? Can we create a template for each language that would provide a way of having consistent searches for this purpose? DCDuring TALK  16:35, 12 July 2013 (UTC)
 * If you can create a compelling multi-factor method, that's great. In the absence of a presented specific alternative method complete with factor weights, the presented single-factor method using a reasonably reliable and already copyedited corpus is compelling to me. You can constrain a Google books search by language, which I have done for Dutch; the results seem reasonably reliable overall, with some skewings and glitches. --Dan Polansky (talk) 16:50, 12 July 2013 (UTC)
 * Accordingly, Delete as too rare a misspelling, if misspelling it is. DCDuring TALK 17:09, 12 July 2013 (UTC)
 * After your musings about multi-factor logarithmic analysis, what is the basis for your claim of "too rare a misspelling"? What method and threshold have you used? --Dan Polansky (talk) 17:22, 12 July 2013 (UTC)
 * I cry foul: an editor requests a complicated method and, when he does not get one, votes on a whim, providing no method whatsoever. --Dan Polansky (talk) 17:38, 12 July 2013 (UTC)
 * To DCDuring: well, as I noted above, there is no way to tell if it's a misspelling or an independently formed (but rare) word. So there are no grounds for considering it a misspelling that I can see. 17:41, 12 July 2013 (UTC)


 * Keep as a non-misspelling per Mglovesfun and CodeCat. — Ungoliant (Falai) 17:48, 12 July 2013 (UTC)
 * @DP: All of our decisions on misspellings are whimsical as we have no express criteria of any kind, quantitative or otherwise. In particular, we have no accepted criteria for determining what makes a misspelling common. Accordingly, I whimsically determine that this is too rare at less than 1%. But 1% is not in particular my criterion, nor of any implicit consensus, AFAICT. I don't think that we have ever accepted a challenged misspelling with such a low frequency, but facts could prove that wrong. I was willing to contemplate other criteria using this as a test case, but it does not have the makings of a good test case, IMO. DCDuring TALK  18:12, 12 July 2013 (UTC)
 * Thank you for providing the specific tentative threshold of 1%, i.e. a frequency ratio of 100, for Google books or a similar copyedited corpus. Based on the table I have posted above, I think the threshold is eminently unreasonable. On another note, you might want to consider the analysis provided by a Dutch native speaker above (CodeCat) as an input to your vote. As for consensus, there is an implied consensus in Category:English misspellings, from which most of the items in the table have been taken. It is merely implied, but much better than anything else we have on consensus as far as the inclusion of common misspellings in Wiktionary.
 * If you want to be musing about previously challenged common misspellings, you'd better find some. Otherwise, yours is an idle speculation produced in the absence of actual knowledge. --Dan Polansky (talk) 19:28, 12 July 2013 (UTC)
 * I look forward to your offering your findings for discussion. DCDuring TALK 21:19, 12 July 2013 (UTC)
 * I was not making any claims about frequencies of previously challenged misspellings; you were: "I don't think that we have ever accepted a challenged misspelling with such a low frequency". My suspicion rests: yours is an idle speculation produced in the absence of actual knowledge. --Dan Polansky (talk) 06:24, 13 July 2013 (UTC)


 * Outcome: RFD kept: no consensus for deletion. Boldfaced keeps were mine and Ungoliant's; pro-keeping arguments were made by Mglovesfun and CodeCat; the boldfaced delete is by DCDuring and, by implication, by the unsigned nominator; pro-deletion arguments were made by Angr. --Dan Polansky (talk) 09:55, 7 December 2013 (UTC)