Wiktionary talk:Votes/pl-2007-12/Attestation criteria

allowable foreign words
The proposal reads:
 * Foreign language terms approved on a mature Wiktionary of that language may bypass attestation on English Wiktionary.

What is meant by approved? Does this mean it has passed foo.wikt's version of RFV? Or merely that the entry exists? Or that the entry has existed for a length of time? (And how long?)—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * "Approved" means that it meets foo.wikt's criteria for inclusion or has passed the process that establishes legitimacy, depending on how that wiktionary deliberates such issues. DAVilla 05:33, 3 January 2008 (UTC)


 * And how do we determine that it meets foo.wikt's CFI? Do we raise it on our RFV page and ask people to cite the foreign word according to foo.wikt's attestation criteria (and other CFI)? Or did you mean merely "'Approved' means that it has passed the process...."? If the latter, we'll lack a lot of words that never get RFVed on foo.wikt.—msh210 ℠ 17:31, 3 January 2008 (UTC)


 * We do not make such determination. That's where the "otherwise" clause kicks in and the regular rules of attestation apply. The implication is one-directional: if it has already been approved on a mature wiktionary, by attestation or whatever policy is in place, then it need not be attested here. I will try to make this clearer in the wording. DAVilla 18:43, 3 January 2008 (UTC)

The proposal reads:
 * Foreign language terms approved on a mature Wiktionary of that language may bypass attestation on English Wiktionary.

Why only on a wikt of that language? Why not on any mature wikt?—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * The point is that it is primarily the wiktionary in each language that should be responsible for determining the legitimacy of terms of that language. Native speakers would be the best capable of determining what is idiomatic with respect to that language alone. It is acceptable for another wiktionary to allow a term in a foreign language that for some reason is not allowed in the native wiktionary. For instance, Mandarin Wiktionary may allow a Hebrew term that Hebrew Wiktionary does not allow. In those cases, I believe it would be best for us to determine by our own processes whether we should also allow the Hebrew term, rather than relying on the deliberation of Mandarin Wiktionary. DAVilla 05:33, 3 January 2008 (UTC)

Does the FL rule apply to simple.wikt? (Right now, of course, the project isn't "mature", but that will, I suppose, change.) In other words, if an English word passes simple's CFI, then we don't RFV it?—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * I would say no, since simple English is not a foreign language. However, I'm not certain how that case would ever arise. We would probably allow everything that simple.wikt ever has. DAVilla 05:33, 3 January 2008 (UTC)

allowable foreign Wiktionaries
I think this is not specific enough:
 * A mature Wiktionary is one that has objective criteria for inclusion or an established process for deliberating the legitimacy of terms.

What are we considering objective? or established? or a process?—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * "Objective" means that the legitimacy of a term can be determined without subjectivity, that any reasonable individual would reach the same conclusion. Criteria need not be objective in the case that there is an established deliberation process, perhaps more precisely an approved policy that allows any contributor to bring the question of even an administrator's judgement to the full community. For instance, idiomaticity is not a fully objective criterion here, but that issue can be raised in WT:RFD. Wiktionaries with few regular contributors who are not admins usually do not need such policies.
 * Whether a Wiktionary is mature or not can be debated, and is an issue that might spill out of deliberation on specific foreign language terms. Maturity would accept terms without question, but we are more lenient for languages where the wiktionary is not mature, so the argument for maturity is not always an inclusivist argument. It's a question of trust in deferment, ensuring that subjective judgement will not be blindly carried over here. For idiomatic terms, three citations will always supersede the maturity debate. DAVilla 06:04, 3 January 2008 (UTC)

time span
I think this is opaque (to the extent I will describe):
 * In some cases additional citations may be needed, for instance, if these span fewer than three years. When lacking instances in print,[...]

Do you mean that if the three citations span fewer than three years, then additional citations will be needed, or that if the three citations span fewer than three years, then additional citations may be needed? I assume you mean the former (from indications in other parts of the proposal), but then reword it. Perhaps
 * In some cases additional citations are needed. See the paragraph on time span, below, for one example. As another, when lacking instances in print,[...]

is better?—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * I've reorganized it to incorporate the time-span section. DAVilla 07:20, 3 January 2008 (UTC)

new Wiktionaries
I think this is opaque:
 * For terms of a language where the Wiktionary is underdeveloped or does not exist, the time span and the first citation are waived.

Does this mean that only two citations are required, none from an edited, printed work? Or that three are, none from an edited, printed work?—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * It means that, rather than requiring three citations, one of each type:
 * A quotation from an edited, printed work that conveys meaning.
 * A reference in a refereed academic journal, or a quotation as above.
 * Any durably archived quotation that conveys meaning, or a reference as above.
 * we require only two citations, of type 2 and 3. Type 3 may be any durably archived work, in print or otherwise, but type 2 must be edited and printed since, apart from type 1, academic journals are obviously edited and printed.
 * So to answer your question, of two citations, one must be from an edited, printed work. On the other hand, the substitution rule could apply here, and three citations from unedited and unprinted works will suffice if they all convey meaning. DAVilla 06:31, 3 January 2008 (UTC)
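
One way to read the three types listed above is as ranked slots: each slot names the weakest kind of citation it accepts, and any stronger kind also fits. The following is a minimal sketch of that reading only. The names `Kind`, `FULL_SLOTS`, `REDUCED_SLOTS`, and `satisfies` are made up for illustration, and the substitution rule is deliberately left out.

```python
from enum import IntEnum


class Kind(IntEnum):
    """Citation kinds, strongest first (a lower value is stronger)."""
    PRINT_QUOTE = 1      # quotation from an edited, printed work that conveys meaning
    JOURNAL_MENTION = 2  # reference in a refereed academic journal
    ARCHIVED_QUOTE = 3   # any durably archived quotation that conveys meaning


# Each slot is labelled with the weakest kind it accepts.
FULL_SLOTS = [Kind.PRINT_QUOTE, Kind.JOURNAL_MENTION, Kind.ARCHIVED_QUOTE]
# Reduced requirement: the first slot is waived (assumed reading of the reply above).
REDUCED_SLOTS = [Kind.JOURNAL_MENTION, Kind.ARCHIVED_QUOTE]


def satisfies(citations, slots):
    """Return True if the strongest citations can fill every slot.

    Citations and slots are both sorted strongest first; the k-th strongest
    citation must be at least as strong as the k-th slot demands.
    Extra citations beyond the number of slots are ignored.
    """
    ranked = sorted(citations)
    needed = sorted(slots)
    if len(ranked) < len(needed):
        return False
    return all(c <= s for c, s in zip(ranked, needed))
```

Under this reading, two print quotations plus one archived quotation pass the full test, while one print quotation plus two archived quotations do not (absent substitution). For the reduced test, a print quotation plus an archived quotation passes, and two archived quotations do not.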


 * Not every refereed, academic journal is printed. Not at all. There are a number of online-only journals nowadays; see, e.g., .—msh210 ℠ 17:28, 3 January 2008 (UTC)


 * Interesting. In earlier responses, understand that quotations of use from these are strong enough to be on par with anything in print in my view. The wording in the proposal has been updated, but didn't need to change much. The three classes above still apply except that a quotation of use from a refereed academic journal would obviously be stronger than a simple reference. DAVilla 19:46, 3 January 2008 (UTC)


 * I also did not understand this as written in the proposal. The explanation above makes sense, but the short version worded in the proposal does not.  Could it be rewritten for clarity? --EncycloPetey 07:47, 3 January 2008 (UTC)


 * Please, yes! DAVilla 18:32, 3 January 2008 (UTC)

For terms of a language where the Wiktionary is not mature or does not exist, the minimum point total is 7. Can we make that 6?—msh210 ℠ 19:17, 22 January 2008 (UTC)
 * Thanks.—msh210 ℠ 19:25, 22 January 2008 (UTC)


 * Yes, that wasn't true to the previous wording. Done. DAVilla 19:26, 22 January 2008 (UTC)

the three citations
I think this is opaque:
 * Otherwise three citations are required for verification:
 * A quotation from an edited, printed work that conveys meaning.
 * A reference in a refereed academic journal, or a quotation as above.
 * Any durably archived quotation that conveys meaning, or a reference as above.

How about, instead, the following?
 * Otherwise three citations are required for verification:
 * At least one citation must be a quotation from an edited, printed work, one that conveys meaning.
 * If three such quotations are not found, then at least one citation must be a reference (mention of the word) in a refereed academic journal.
 * If the above still yields only two citations, the third may be any durably archived quotation that conveys meaning.

—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * This is a fine rewording, but it would also require rewording the reduced requirements in the talk section immediately above. I was trying to be concise without being mathematical, but that's not easy for me to do. DAVilla 06:35, 3 January 2008 (UTC)


 * Revised the second as follows, to reflect the original meaning:
 * 2. If three such quotations are not found, one citation may be a mention in a refereed academic journal.
 * However, I wonder if this should be changed, since two mentions in journals is already pretty strong. DAVilla 19:43, 3 January 2008 (UTC)


 * I do not understand this "may". If three cites of the first type are not found, then there must be one of the second type, no? (Barring adding cites to make up reduction in quality, of course.)—msh210 ℠ 23:16, 3 January 2008 (UTC)


 * No, if only two citations of the first type are found, then the third citation may be unprinted. A journal reference is not required, and in fact these are rare. DAVilla 02:24, 5 January 2008 (UTC)

the fourth and fifth citations
I think this is opaque:
 * When lacking instances in print, a citation of the last type may be substituted if there is an additional uncounted quotation that conveys the same or very closely related meaning, such as for an inflection or different part of speech with parallel meaning.

How about, instead, making it a separate paragraph, the following?
 * When lacking instances in print, a citation of the last type may be substituted if there is an additional uncounted quotation that conveys the same or very closely related meaning, such as for an inflection or different part of speech with parallel meaning. For example, if the only two print citations are references in academic journals, then the word is considered attested if an additional two citations are brought of the third type: one to satisfy the requirement for a third citation, and another to satisfy this paragraph.

—msh210 ℠ 19:33, 27 December 2007 (UTC)


 * Okay, I've made some changes. Feel free to edit directly. DAVilla 07:20, 3 January 2008 (UTC)

extinct languages
should be excluded from this policy. Not all of them are as well attested as, e.g., Ancient Greek: some words are attested only once, usually in a non-lemma form but often in an unambiguous context (e.g. a translation from a well-known source); sometimes no one even knows the meaning, or the script/tablets carrying the attestations are damaged so that only a partial reconstruction can be made (very common for cuneiform). --Ivan Štambuk 19:55, 3 January 2008 (UTC)

"may be substituted with"
Could the phrase "may be substituted with" be replaced with something better? Traditional usage has "replace A with B" = "substitute B for A", and while I'm not ordinarily a stickler for traditional usage, in this case I'm really not sure what the paragraph is trying to say, and I think fixing this phrase might help a good deal.

Better yet, it might be better to separate the whole thing into a "definitions" section, which lists (and preferably names) each type of citation, and a "criteria" section, which specifies what combinations of different types are acceptable (and preferably also indicates what combinations are considered better than others).

—Ruakh TALK 15:50, 12 January 2008 (UTC)


 * I've added two commas as a minimum, or would any of the following be clearer?
 * may substitute with an additional, uncounted quotation
 * may take its place with an additional, uncounted quotation
 * may substitute, taking its place along with an additional, uncounted quotation
 * or something else? DAVilla 23:44, 21 January 2008 (UTC)


 * Oh! The first comma explains a lot. But, I think that's a bad way to put it: it's not that a citation of another type is being substituted, with blah-blah-blah, but rather that a citation of another type and a blah-blah-blah are being substituted. All told, I think I'd replace this:

 “Attested” means a term is verified through independent instances in durably archived media. A term with clearly widespread use or with usage in a well-known classical work is automatically deemed verifiable. Otherwise three citations are required for verification: At least one citation must be a quotation that conveys meaning from an edited, printed work. If three such quotations are not found, one citation may be a mention in a refereed academic journal. If the above still yield only two citations, the third may be any durably archived quotation of use, conveying meaning. If there are insufficient instances of the first or second type, a citation of another type (second or third) may be substituted, with an additional, uncounted quotation that conveys the same or very closely related meaning. For example, if the only two print citations are mentions in academic journals, then two additional citations are needed for attestation: one to satisfy the requirement for a third citation, and another to satisfy this paragraph.


 * with this:

<dl><dd><dl><dd><dl style="background-color:#99FFCC; color: #004422"><dd> “Attested” means a term is verified through independent instances in durably archived media. The following may be used in verifying a term: <ol><li>A quotation, taken from a well-known, classical work, using the term to convey meaning.</li> <li>A quotation, taken from any edited, printed work, using the term to convey meaning.</li> <li>A quotation, taken from a refereed academic journal article, mentioning the term.</li> <li>A quotation, taken from any durably archived work, using the term to convey meaning.</li></ol> Any of the following, in approximate order of preference, is considered to constitute verification: <ol><li>Three quotations of type 2 above.</li> <li>Two quotations of type 2 above, and one of type 3 or type 4.</li> <li>A quotation of type 1 above.</li> <li>One quotation of type 2 above, one of type 3, and one of type 4.</li> <li>Two quotations of type 3 above, and two of type 4.</li> <li>Clearly widespread use. (This is essentially a fall-back: a term merits inclusion if there's consensus that it's clearly widespread, even if this can't be demonstrated with durably archived quotations.)</li></ol></dd></dl></dd></dl></dd></dl>


 * Ridiculously complicated? Yes. Debatable? Yes (especially with regards to the ordering of "A quotation of type 1 above."; your version was ambiguous with regard to that, so I picked a spot). But I think it's very clear about what counts and what doesn't, even if readers will have to scroll up and down a few times to appreciate the clarity. ;-)


 * —Ruakh <i>TALK</i> 00:45, 22 January 2008 (UTC)


 * It's more straightforward Ruakh's way, I think.
 * If I have my arithmetic right, which I probably don't, you can say something like "type 1 is 100 points, type 2 is 45 points, type 3 is 33 points, type 4 is 22 points, 100 or more points means verified." But that's crazy. Cynewulf 05:20, 22 January 2008 (UTC)
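
Cynewulf's arithmetic can be checked mechanically. The sketch below assumes the weights quoted above (100, 45, 33, 22) and the threshold of 100, and numbers the citation types as in Ruakh's list. The names `POINTS`, `THRESHOLD`, `score`, `is_verified`, and `RUAKH_COMBINATIONS` are hypothetical; this illustrates the suggestion and is not part of the proposal.

```python
# Weights suggested in the comment above, keyed by citation type (1-4).
POINTS = {1: 100, 2: 45, 3: 33, 4: 22}
THRESHOLD = 100


def score(citation_types):
    """Sum the points for a list of citation types, e.g. [2, 2, 4]."""
    return sum(POINTS[t] for t in citation_types)


def is_verified(citation_types):
    """A term counts as verified once its citations reach the threshold."""
    return score(citation_types) >= THRESHOLD


# The citation-based combinations from Ruakh's list
# ("clearly widespread use" is not a citation count, so it is omitted).
RUAKH_COMBINATIONS = [
    [2, 2, 2],     # three quotations of type 2
    [2, 2, 3],     # two of type 2 and one of type 3
    [2, 2, 4],     # two of type 2 and one of type 4
    [1],           # one quotation of type 1
    [2, 3, 4],     # one each of types 2, 3, and 4
    [3, 3, 4, 4],  # two of type 3 and two of type 4
]

for combo in RUAKH_COMBINATIONS:
    print(combo, score(combo), is_verified(combo))
```

Every combination in the list reaches the threshold, while two type-2 quotations alone (90 points) do not. The weights also admit combinations the list does not name, such as five type-4 quotations (110 points).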


 * I've majorly overhauled it, and done away with the "same or very closely related meaning" bologna, since as far as the root is concerned we shouldn't be having to cite inflections of verbs independently anyway so long as they're being used as verbs and not ambiguously as adjectives or nouns outside of gerundial phrases. DAVilla 19:03, 22 January 2008 (UTC)

Precatory language obscures true relaxation
The wording of the proposal expresses a preference for "higher quality" citations, but permits the use of "low quality" sources. Unfortunately, when push comes to shove, which is when rules really matter, only hard limits will be important. The expressed wish for higher quality quotes is likely merely to cause conflict. All the regulars will recognize the real force of the wording in the proposal, but any members of the broader population of users may be misled. DCDuring 17:00, 12 January 2008 (UTC)


 * Can you think of any more specific examples that could illustrate where the boundaries lie, so as not to mislead? For instance, saying that movie scripts do not count as edited, printed works since they are not published... or something better? DAVilla 23:33, 21 January 2008 (UTC)


 * My comment was based on my difficulty in understanding the proposal. After having re-read it a few times, I now realize that the changes would be largely irrelevant to my attestation efforts. 99% of my citations have been sourced from b.g.c. Citations from other sources are often not readily verifiable without subscriptions that I do not have (Scholar or News archives). The other sources have had some value for relative frequency (alt forms) and quick evidence for regional distribution. (US vs UK) DCDuring 00:27, 22 January 2008 (UTC)

"printed"
So we would be dispensing with all non-printed materials, except where 2 print citations are available? This raises a number of problems for me:
 * Verifiability: Anyone can claim that a cite exists on page W in a work X written by Y and published in the year ZZZZ. Identifying fraud (or honest mistakes) in such cases can be extremely difficult; on the other hand, checking online citations is usually a trivial task. Frankly I think it would be more in the interest of the project to require at least one cite to be electronically accessible in some form. Failing that we could at least require either an ISBN/ISSN number or electronic accessibility.
 * Arbitrariness: For most newspapers and magazines today offering content online, it is virtually impossible for a non-subscriber to determine what was in the print edition and what was not. In the journalistic industry today, there really is no meaningful line to be drawn between the two; online content is as likely or unlikely to be editorially vetted as any other.  This policy would also mean that most recent cites drawn from Google News Archives (for example) would have to be considered guilty until proven innocent.  That would be a significant loss for us.
 * Exclusivity: Many have argued that one of Wiktionary's greatest strengths is its ability to reflect changes in the lexicon which ordinary dictionaries cannot be bothered to notice. Making our attestation criteria so stringent as to exclude even those linguistic shifts which are readily verifiable is not in the interests of the project.
 * Further on this, we have heretofore accepted durable non-print media such as CDs and DVDs, and rightly so. DVDs (particularly those with  subtitles) are a potential gold-mine for citations of colloquial usage.  After all, most movie/TV scripts are not readily available in print (or even in editorially-controlled online editions).  I think it would be seriously detrimental to the project to exclude such material.

In general, although clarification is good I don't think it serves our interests to introduce stricter requirements for inclusion when most entries have no citations at all. It would be a pity for us to start deleting readily-verifiable terms simply because two print citations could not be provided. -- Visviva 03:23, 13 January 2008 (UTC)


 * The distinction between printed and unprinted periodicals bothers me too, but we have always required that the sources be durable, and your argument applies to only those few electronic forms that actually are durable in their own right. The correctness of verification is a more important factor than speed.
 * I will leave you to your opinion, but please note that these criteria would still allow for terms not verifiable in print, though with the stricter requirement of usually around five citations, based on their age (three not in print plus two for a very closely related meaning), when they span at least three years. DAVilla 23:16, 21 January 2008 (UTC) Edited. 70.112.121.70 17:26, 25 January 2008 (UTC)

complicated heuristic
The algorithm for determining the number of required citations is fairly complicated as given. How about phrasing it something like this:


 * Five citations spanning at least three years in durably archived media are sufficient.
 * Four citations are acceptable if ...
 * Three are sufficient if all three are from edited, printed works.
 * Two or fewer citations are never sufficient.

Rereading the original, I'm not sure I've understood it properly. Cynewulf 00:06, 22 January 2008 (UTC)


 * Never mind. What Ruakh said. Cynewulf 05:02, 22 January 2008 (UTC)

grandfather clause
Shall we have a grandfather clause that ensures an entry that passed RFV before, say, the start of 2008 can't come up for verification again under the new criteria?—msh210 ℠ 21:09, 24 January 2008 (UTC)


 * No, because sometimes things have "passed" that weren't really shown to meet CFI, and almost any valid passes would still pass under this system (the exception would be those three-Usenet-citations cases, but it's probably a good idea to revisit those anyway). -- Visviva 01:43, 25 January 2008 (UTC)


 * Agreed. The change isn't meant to be that radical, just more precise. 70.112.121.70 17:29, 25 January 2008 (UTC)

Looking much better
May I just say, I really like the way this has been revised. I think this, together with the new Citations: namespace, will help to greatly rationalize our handling of verification. -- Visviva 01:43, 25 January 2008 (UTC)

independence of news articles
The proposal reads: "...19 points, such as five newspaper articles from five presses that do not cite any of the same sources." Do we really want such a strict threshold of independence? I think "such as five newspaper articles from five authors that use the term independently of any common source" is far better.—msh210 ℠ 03:19, 27 January 2008 (UTC)


 * I thought we already had such a strict threshold. Five authors is not independent enough, since any two from the same press would have the same editor who would allow, advise, or himself write in the same wording. But if you think it's helpful, I've stripped it down a lot to only say that they should be independent, without saying how. DAVilla 14:49, 27 January 2008 (UTC)

classical
I think it is a big mistake to use "classical" without defining it. If we're redoing the attestation criteria already, we might as well do the job.—msh210 ℠ 03:19, 27 January 2008 (UTC)


 * Personally I would define it as a work which has been the subject of multiple (at least three) peer-reviewed academic studies. But that would lead to howls of outrage because it would class Harry Potter, Star Wars and The Matrix among classical works... perhaps "a work which is at least X years old and has been the subject of multiple peer-reviewed academic studies," where X is greater than 50?  -- Visviva 04:27, 27 January 2008 (UTC)


 * Your definition strikes me as suited better to "classic" than to "classical". Gulliver's Travels and The Time Machine and Little Women and maybe even To Kill A Mockingbird are all classics, but they're not classical works (by my definition, anyway). —Ruakh <i>TALK</i> 15:06, 27 January 2008 (UTC)


 * Er, well, yes, I had assumed that's what was meant, since classical in the stricter sense would only refer to words from extinct (varieties of) languages and those are -- as I understand it -- exempted by definition. For Wiktionary's purposes, I think "classical work" should be defined as "a work of such prominence that a reader would reasonably expect to be able to find every word in the dictionary," which would certainly include those mentioned above; but this is not really a hugely important point. -- Visviva 23:52, 27 January 2008 (UTC)


 * As I told EncycloPetey, I wouldn't say it has to be any more well defined than "clearly widespread". If you think of anything more definitive than "classical" please feel free to edit. However, these schemes of counting additional sources really miss the point. I am satisfied that the language used in the more popular, older works has been copied in more recent ones, and this if anything is the measure of how well-known a work is. Certainly not Star Wars nor anything newer would be of sufficient age, but if the debate arises for something in the gray area, the test is simple. The more classical and the more well-known it is, the more likely that the terms will be attestable. To date I have only seen one counter-example to this claim, and I would just as well vote on that specific term separately, than try to adjust a plain-language guideline by making it any more complex than I already have. DAVilla 15:01, 27 January 2008 (UTC)


 * I'd define it as a work that's old enough, but loved enough, that living speakers of a language are willing to stretch the definition of "the same language" in order to consider it to be written in their language. ;-)  (I'm not just mocking here — I prefer to think of Hebrew as one language going back at least as far as the Torah, but many people think that's a stretch.) —Ruakh <i>TALK</i> 15:04, 27 January 2008 (UTC)


 * I would think that, in a living language with a lot of printed literature, we do not really have to have any special treatment of "classics". Even if Shakespeare used it, if it hasn't been taken up by now, it needn't be in Wiktionary. I think we are talking about formalizing a way of saving some labor on citations that leads us to a bit of bias toward antiquarianism, which aggravates the bias caused by dependence on copyright-free corpora and early 20th C. dictionaries for entries. DCDuring <i>TALK</i> 15:50, 27 January 2008 (UTC)


 * What I said above, essentially, is that if Shakespeare used it, then it has been taken up. That's the conclusion I reached working halfway down the list EncycloPetey gave, starting with kicky-wicky. DAVilla 17:47, 27 January 2008 (UTC)


 * My belief is that works that are studied, read, and explained to people as part of standard education, that are quoted and referenced by everyone, that function as a cornerstone of a culture are more likely to survive, and their words more likely to be encountered, than other works. This is what I'm thinking when I say that we should include all of Shakespeare's words. Cynewulf 17:45, 27 January 2008 (UTC)


 * Great! Is there a way to sum that up into one word? DAVilla 17:47, 27 January 2008 (UTC)


 * Aye, there's the rub. Cynewulf 17:49, 27 January 2008 (UTC)


 * A classic (since the age of printing) is a work whose influence is so pervasive that even words used in it for the first time would easily meet our normal attestation criteria with citations from other works. For works authored before printing, any work that has been published in more than one printed edition in its language or has been translated into more than three languages. To me the discussion of post-printing "classics" is not very critical except for the labor-saving value in citations. We honor Shakespeare by exempting him from a second thought about the value of any of his words, but there are few other English authors who merit such honor. DCDuring <i>TALK</i> 18:06, 27 January 2008 (UTC)


 * Well, they wouldn't necessarily have met our existing criteria (3 uses), but they should certainly be able to meet the proposed criteria (1 use in a work more than 120 years old + 2 scholarly mentions = 10 points, or 3 mentions for more recent classics). So actually I'm not sure we need #1 at all. -- Visviva 23:52, 27 January 2008 (UTC)


 * The best I can think of to replace "classical" is "significantly old, well-known, and influential". Defining "classical" as "old work with scholarly mentions" seems somewhat circular, but I guess that's an effective and objective way to measure influence. Cynewulf 15:30, 28 January 2008 (UTC)


 * Could this be done inductively? What (English-language) works have we accepted as "classical" in the past? A list of precedents might make it possible to infer the pattern even if we can't or don't make it explicit. Is there a way to extract that information? DCDuring <i>TALK</i> 16:08, 28 January 2008 (UTC)


 * Not sure, but I believe in actuality there have been very few (if any) such cases. There is general agreement that Shakespeare qualifies, and probably a few others of similar vintage and prominence (KJV, Milton) too. In general, the vast majority of words that turn up in such sources are also attested elsewhere, at least in variant spellings.  -- Visviva 14:56, 25 February 2008 (UTC)

Inflected, derived, and alternative forms
If this goes forward in the future -- which I hope it will -- it would be nice if it also made some provision for words that are
 * a) regularly inflected forms of an attested word (facepalming);
 * b) regularly derived forms of an attested word (mollipilosity); or
 * c) imperfectly-attested alternative spellings of an attested word.

IMO, the first case should almost get a free pass (maybe +8?), and the second and third should get a significant bump (maybe +4). Whatever the bonus, it should be commutative, i.e. the same bonus should apply to a root form with an attested derivative as to a derived term with an attested root. -- Visviva 14:56, 25 February 2008 (UTC)


 * There was a similar idea on the page before the point system was introduced, but I took it out because it complicated an already complex issue, for one. If you disagree, something could be added for derived forms, but as a more important reason not to include these other points, I'm not convinced that inflected forms are given proper treatment in the first place. We don't attest tenses anyway, we attest spellings de facto, so "to neopost", "will neopost", "could neopost", "doesn't neopost", and "did neopost" are all grouped together, likewise gerunds with continuous forms, participles with past and passive, whereas attestation of "neoposts" would require specifically a third-person singular simple present construct in a non-negative frame. Considering how differently each language treats tenses, if there even are any, to me this is a bit ridiculous, and while I will accept it at present, exact spelling does not appear as part of the current CFI, nor do I wish to slide it in without a discussion that would probably be very difficult to achieve consensus on. Communication occurs at many levels, but if nothing else language is oral. Exact spellings, though they highly influence our index, are not a reflection of that. Applying these criteria to inflected forms would not be a way to let them in more easily, it would be an acknowledgement that the spelling matters more than anything else: more than speech, more than tenses and grammar labels, more than similarity in meaning. I don't mean to say that spelling isn't important. I recognize that we have to have spelling standards and that these standards should be reflected in a dictionary, but why should a word not be considered a word only because no one agrees on the spelling or because it has never been written down except by linguists? Considering that we do not even have criteria for common misspellings, which are attested terms anyway, introduction of this hard-coded logic seems premature. 
DAVilla 16:40, 25 February 2008 (UTC)


 * It may indeed be premature, but IMO resolving this issue is somewhat critical to RfV practice. Either we are citing senses of written forms, or we are citing senses of lexemes.  But perhaps this needs to be discussed separately, as you say. -- Visviva 06:13, 26 February 2008 (UTC)


 * I disagree with (a), on the grounds that I don't think we should be attesting by form, anyway. Under current policies, I would mark "flutuite" RFV passed, and give it, if we were given one cite for "flutuite", one for "flutuites", and one for "flutuiting". (This is unless there's a specific reason to question its regularity or fully-inflected-ness; for example, while babysit is well attested, I wouldn't accept it on the basis only of cites for babysitting, because it's well known that noun+participle, noun+gerund, and noun+action-noun forms are much more productive in English than noun+finite-verb forms.) However, if there's consensus that I've been misapplying CFI in this regard, then I'd support your suggestion as better than nothing. :-P  (I suspect this might be tied in to the lemma/non-lemma debate: I consider flutuite(s/d)/flutuiting to be one word with four different forms, but I suppose some editors would consider them to be four separate words.)


 * I also disagree with (b), because if undefined: gets a million hits and mollipilosity: gets two, I see no reason to give undefined: an artificial boost. I might agree with a modified form of (b), however.


 * I don't know how I feel about (c), and would like some examples, please. :-)


 * —Ruakh <i>TALK</i> 00:31, 26 February 2008 (UTC)


 * On (a), interesting; I like the idea of citing by lexemes, and have tended to do so in practice myself (mixing various tenses & forms when citing an entry). On the other hand, there is a certain elegance in requiring each entry to stand or fall on its own merits.  This would need to be a separate discussion, I guess.
 * On (b), I was thinking that both "mollipiloseness" and "mollipilosity," which are both plausible derived forms, would get the same boost. IMO there is no place for morphological purism on Wiktionary.
 * On (c), I'm momentarily at a loss for good examples. But consider, for example, nedyll wark as an archaic Scots spelling of needlework (attested only in, circa 1525); or any of the various alternative spellings that are often listed within OED entries but attested in only one or two sources.  We need to decide whether these deserve an extra bump, or should be held in Citations: space unless fully cited.  Perhaps the latter? -- Visviva 06:13, 26 February 2008 (UTC)

some discussion archived hither from elsewhere
Dan Polansky is asking me about Votes/pl-2007-12/Attestation criteria which I never really loved enough to put through. Would you be interested in salvaging it? DAVilla 09:28, 27 October 2010 (UTC)
 * I definitely want it not deleted. We were supposed, in citing works for RFV's sake, to be bearing this proposal in mind and making a note in RFV of how many points a term got, in order to see how the proposal would pan out in practice. (That was Connel's idea, but you agreed with it and pulled the vote.) Unfortunately, that lasted about, oh, a second. I'd be interested in doing that in order to see the viability of the proposed rules. —msh210℠ 15:13, 27 October 2010 (UTC)