Wiktionary talk:Criteria for inclusion/Archive 2

Multi-word entries, sums of their parts and translations
I've been thinking about a possible guideline regarding the multi-word terms. In particular, I'd like to neglect Davilla's Pawley test topic for this post, although the best solution would probably be a combination of both. See this as a possible additional test.

Some words that have been RFD'd lately I feel do merit some kind of inclusion here, whilst others don't, and the easiest way for me to determine that is to look at their translations. Example is WT:RFD.

Thinking in English-only, I don't see the merits of including this particular term, or any of the other indoor terms, as their meaning is defined by indoor. However, as my argument there described, such terms are translated in one word in at least two languages that I know of, German and Dutch, and possibly as well in more languages that I don't know of.

This rule may have a large impact, as those who know some German or Dutch will recognize, because terms like vintage car may (I don't know) be translated as one word there.

There may be two benefits:
 * Non-English entries of such terms, for instance the Dutch zaalvoetbal, can link properly to indoor football, instead of to indoor and football separately.
 * indoor football will list the correct translations for at least German and Dutch, so that users don't have to go through the process of looking up indoor, which would have the Dutch translation zaal- (a combining form), then looking up football, and then guessing how to link them, keeping in mind the various very complex rules for morphological word building in Dutch and German.

Opinions? — Vildricianus 13:24, 4 June 2006 (UTC)


 * I think there's a very close relationship between single words in other languages and what we would consider to be a single concept in English. However, other languages clearly also have concepts that do not exist in English as set phrases, such as father's older brother. To a person who speaks Chinese, mention of the word would immediately conjure images of what an older uncle might be to a younger, and the people in his own family who are associated with the word, as well as his father's friends as it turns out. To an English speaker, the phrase would have to be mentally summed, and the implications are not immediately obvious. So I don't think this could be used as an inclusive rule. I would wonder if it could be used as an exclusive rule, for instance if no other languages had a single word for skateboard wheel, or at least a term that passed the inclusive rules; essentially, if skateboard wheel isn't demonstrably a single concept in any other language, then it isn't one in English either. The inability to apply an exclusive rule like this to, say, vintage car because of some translation, would add credibility to the idea that it very well could be a single concept in English. Davilla 15:42, 5 June 2006 (UTC)


 * There have been a few debates on RFD where this might apply, particularly active volcano. Despite what I wrote above, I've been thinking that this is a pretty good criterion to fall back on, even if it isn't ever included specifically. We should definitely include last night because you can't say yesterday night in most contexts without sounding a little funny. We should probably include last year because of the translations, and it's a pretty common expression anyways. There isn't any reason I can see for keeping last financial year, but maybe I'm just trying to stir trouble. DAVilla 21:33, 15 December 2006 (UTC)


 * I think that the question of the inclusion or exclusion of an expression as a derived term should not simply be a question of whether the meaning can be derived from a composition of its constituent words, but rather if it also includes a significant degree of markedness such that the use of some other combination of words to express the same meaning would be considered unnatural to a native speaker. This conventionality can be measured by looking at the distribution of the collocation relative to the distribution of other collocations with a similar derived meaning. As I understand it we're not just building another Webster's here, but rather trying to declare a much larger, more detailed description of the human lexicon. We don't just want a laundry list of how you might express a particular meaning, but also how one would express a particular meaning. If Wiktionary is going to function well as a cross-linguistic resource, which I think it should, it needs to include the conventional. We can make another formal argument in favor of the inclusion of conventional expressions in the lexicon by considering the performance of Natural Language Processing systems. Systems that include statistically mined conventional multiword expressions in the lexicon perform significantly better at selecting a correct syntactic parse from amongst the many thousands of well-formed possibilities. This makes a lot of sense when you realize the agent, computer or human, that is listening or reading must first tokenize the input stream before interpreting it. If a collocation doesn't exist in the lexicon, then it can't be treated as a token, thus greatly (and unnaturally) increasing the combinatorial complexity of the language stream. Johnfbremerjr 10:54, 10 February, 2008


 * Please see Idioms that survived RFD, which is an attempt to import the Pawley guidelines and rationalize why the community supports some phrases and not others. DAVilla 07:34, 11 February 2008 (UTC)
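
The tokenization point made above can be illustrated with a minimal sketch. It assumes a toy lexicon and a greedy longest-match strategy; the function name `tokenize`, the `LEXICON` set and the `max_len` limit are all hypothetical choices for illustration, not anything described in the discussion or used by a particular NLP system.

```python
def tokenize(words, lexicon, max_len=4):
    """Greedy longest-match: at each position, prefer the longest
    multiword entry found in the lexicon; otherwise emit one word."""
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon:
                tokens.append(candidate)
                i += n
                break
        else:
            # No multiword entry starts here; fall back to the single word.
            tokens.append(words[i])
            i += 1
    return tokens


# Hypothetical lexicon containing a few of the collocations discussed above.
LEXICON = {"last night", "indoor football", "active volcano"}

sentence = "we watched indoor football last night".split()
print(tokenize(sentence, LEXICON))
# ['we', 'watched', 'indoor football', 'last night']
```

With the collocations in the lexicon, the six words reduce to four tokens; without them, a later stage would have to consider every way of grouping the six separate words.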

Blogs
When were "blogs" added as durably archived? They are not. All of the citations of blog sources I've seen so far have not been google archive links (therefore, not to the durably archived source.)

This seems to be quite fallacious, as Google doesn't seem to archive them.

The discussions that do mention blogs above give clear reasons for not using them (as CFI used to state) oddly, from the most inclusionist contributor Wiktionary has seen so far!

What gives? Who added "blogs" and why?

--Connel MacKenzie 10:27, 20 August 2006 (UTC)


 * Of course being durably archived is the most important aspect. If the CFI is incorrect then this needs to be changed.
 * I vaguely remember a time of revision of this article when this change might have been made, anyways it's somewhere in the history. (Wouldn't it be nice to be able to select a portion of the article and find when that text was most recently changed?) I don't think the intent was malicious but it would be nice to ask for the reasoning. One of the arguments at the time was about feeds being archived. I'm not literate enough in the technology to know if that applies to blogs. DAVilla 21:07, 21 August 2006 (UTC)


 * I'm still puzzled by this. Can we make it go away? -- Visviva 14:37, 22 June 2007 (UTC)


 * I have now removed this text. -- Visviva 02:20, 28 September 2007 (UTC)

Reconstructed languages
I object in the strongest possible terms to the unilateral imposition of 'policy' on the part of User:Robert Ullmann. Refusing repeated invitations to constructively state his position on Wiktionary_talk:Reconstructed terms, and following a failed deletion request, he just made unilateral changes to Reconstructed terms to suit his whim, without bothering to give any explanation beyond '1/2 rewrite', knowing perfectly well his changes would be controversial.

I am not seeking to impose any fixed opinion of mine, I am looking for intelligent debate among people aware of the issues involved. Robert Ullmann's suggestion has some merit, but it also has flaws, and as long as he just keeps imposing it without debate, there is no way of ironing them out. Robert Ullmann does important work on Wiktionary. But he has very idiosyncratic views on etymology and language reconstruction, and no interest in, and consequently no knowledge of, the matter. It is bad enough that he abuses his admin privileges to chastise me over alleged violation of CFI (which has still 'Semi-Official status'), but to insert such a "policy" into CFI after the fact, and after realizing that it had not in fact been there at the time he chose to chastise over it is simply wikityranny (making up your laws as you go along), indefensible under wikiquette, and unacceptable on any Wikimedia project. Let him either discuss the issue amicably, or step down from policing about it.

I do invite anyone interested in the topic to seek for a solution acceptable to everybody, but I will not put up with such bullying tactics. Dbachmann 10:48, 26 January 2007 (UTC)


 * User blocked for one week, knowingly removing CFI clarification made as result of policy vote, change reverted. Robert Ullmann 12:09, 26 January 2007 (UTC)

Oxford English Dictionary
I noticed that materteral is listed for deletion yet it appears in the OED. It is my thought that if a word is in the OED, it merits inclusion in wiktionary. What do others think? WilliamKF 19:43, 8 February 2007 (UTC)


 * The tenuous decision has been generally to allow such references as some kind of refereed academic work, even though it is no such thing. For RFV, see  which explains some of the reasons why we don't/won't/can't take everything the OED has.  The concession was made, I might add, during a dispute with Wiktionary's most infamous copyvio vandal, long before his actions were exposed as being 100% copyright violations.  If it comes to a vote, I'd vote strongly against such folly; the minute a vote passed, someone would start a bot stubbing in the OED entries, exposing WMF to certain copyright concerns.  On the other hand, if a word appears in the OED and here, but no other major dictionaries, we probably should delete it, even if it isn't a word-for-word copy.  --Connel MacKenzie 20:02, 8 February 2007 (UTC)


 * I'm not convinced by the copyvio reasoning implied in your last sentence (though I very much agree with the rest). AFAIK, OED only includes words for which it has either found prior use or prior mention (eg in earlier dictionaries).  In the former case, we can make up our own minds as to the meaning of a prior use (though it would be dodgy if we cannot find the same or different cites via a separate search).  In the latter case, copyright is less likely to be a problem, particularly since the dictionaries cited are usually >>120 yrs old.


 * In the case of materteral I see that there are at least three good b.g.c. cites, so having found its meaning, and feeling somewhat avuncular about it, I suppose I might weigh in behind it. --Eng in ear 20:28, 8 February 2007 (UTC)


 * Yes, the Oxford English Dictionary (OED) is strictly based upon giving examples quoted from literature. In terms of copyright violations, the OED in its first edition (1928) and supplements dates back to the beginning of the twentieth century and therefore would not be subject to copyright, similar to how the Encyclopedia Britannica 11th edition is used on Wikipedia. WilliamKF 20:48, 8 February 2007 (UTC)


 * Here is a link where images of out of copyright pages of the fascicles may be found and rationale. WilliamKF 22:02, 8 February 2007 (UTC)


 * Note: I'm blocking this latest sockpuppet .  --Connel MacKenzie 15:02, 13 February 2007 (UTC)

Sign languages?
Does Wiktionary in principle allow inclusion of words in sign languages? Showing the gesture shouldn't be too difficult -- stationary gestures can be shown with an image, mobile ones with a video -- but getting a gesture to be the name of an entry page might be more difficult. Angr 23:00, 10 February 2007 (UTC)
 * This might need a separate namespace because of the very different format of presentation. How do you list synonyms and antonyms, for instance?  How do you put a sign language entry into a translations table?  Besides which, there are many different sign languages, including American, British, Hungarian, and that of the American Plains Indians.  There are some websites linked from the Wikipedia article on sign language that might provide ideas.  Can you imagine what the American sign language Wiktionary would look like? --EncycloPetey 23:06, 10 February 2007 (UTC)
 * Listing synonyms and antonyms is not hard: include a picture or the like. Having an entry is what's hard: how do we include the term as the PAGENAME? See also my comment below, in this section.—msh210 ℠ 17:44, 11 February 2008 (UTC)


 * There has been a bit of discussion on this question at various times. About sign languages has some of the results of that discussion, and Information desk currently has my reply to someone else who recently asked the same question you just did, Angr. Any ideas you have would be great; probably the Beer parlour or Wiktionary talk:About sign languages would be the best place to mention them, the former especially if they are ideas on how to include SL entries.—msh210 ℠ 17:44, 11 February 2008 (UTC)

Inflected forms
I think that inflected forms should be included if they belong to two different words in the same language. I once looked at a play script in Spanish and had to pause at the word viste to figure out which verb was meant. Other examples in Spanish are fue etc., ve, and the regular siento, sienta, and siente. Russian has дне and хоре. PierreAbbat 02:25, 11 June 2007 (UTC)


 * Because spellings so easily overlap in different languages, here on en.wiktionary, we aim to include all inflected forms, not just ones that might have obvious problems. --Connel MacKenzie 07:51, 11 June 2007 (UTC)

use in a refereed journal: mathematics
I'm not sure about other fields, but in mathematics a refereed article will often have what we call "ad hoc definitions". That is, for example, the author, call him Smith, will say "let a foo subgroup be a subgroup that is finite and central". Smith then uses the word "foo" a hundred times over the course of his paper, but is never heard of again in the literature. Words like these should I think not have entries. On the other hand, sometimes Smith does the same thing, and then another author will say "Let a foo subgroup be, after Smith, a finite central subgroup" and use the term in his paper, and a third author will say "If the subgroup is foo (in the sense of Smith 2007), then..." and a fourth will say "If the subgroup is foo, then...". (This process does not occur over the span of four papers. But the progression is approximately correct.) At what point in this process does the word become acceptable in en.wikt?—msh210 18:45, 16 August 2007 (UTC)
 * (Note incidentally that the word may have been in use by Smith and his colleagues in various universities well before it was ever published. But I'm assuming for the sake of argument that we cannot attest that.)—msh210 18:45, 16 August 2007 (UTC)


 * If a word, e.g. “foo”, becomes strongly-enough associated with a particular definition that many authors begin to use the word without defining it, “foo” in that sense will naturally meet the existing CFI. Rod (A. Smith) 18:52, 16 August 2007 (UTC)
 * Well, yes, that's the "fourth" stage above. But does a "in a refereed academic journal" rule apply to any of the earlier stages, was more my question.—msh210 19:55, 16 August 2007 (UTC)


 * Well, the attestation section (actually, bulk of the CFI) was written to clarify the general rule: “A term should be included if it's likely that someone would run across it and want to know what it means.” Following the spirit of that general rule, I'd say that readers of academic journals are only likely to want to know what a given term means if the journal uses the term without defining it.  That is, I assume that the “Appearance in a refereed academic journal” part of the attestation section is present to refine the clause “someone would run across it”, not to override the “[someone would] want to know what it means” clause.  Does that make sense?  Rod (A. Smith) 20:23, 16 August 2007 (UTC)


 * Actually, that was Dmh's phrase ("if it's likely that someone would...want to know what it means,") if I recall correctly. Frankly, I don't know how that phrase escaped notice.  --Connel MacKenzie 20:39, 16 August 2007 (UTC)


 * The problem is that a word could have just one "appearance in a refereed academic journal" and be admitted, even if it were just the definition—a mere mention of existence, or even inexistence! Likewise, "usage in a well-known work" allows for literary nonces, which have been received with skepticism. We should alter CFI to say that all terms must convey meaning in three independent instances over a year, and that if disputed they must be so cited, with the exception of clearly widespread use. I cannot imagine that change eliminating anything of substance. DAVilla 13:09, 17 August 2007 (UTC)


 * I disagree. I think there is a problem here, and I think you've mostly identified it correctly, but I don't think the solution is to remove these exceptions; the exceptions serve an important purpose. For example, there are plenty of languages that simply don't have the written corpus needed for their words to meet the normal CFI; but the academic-journal exception means that if linguists (or anthropologists) publish papers about these languages (or people) and define some words, we can include those. Also, while it might be obvious to us, after looking for independent cites, that a word in Romeo and Juliet is a nonce-word, the casual reader might not find that so obvious; and while obviously it's not worthwhile to include every nonce-word in every work, some works (the King James Version of the Bible, several of Shakespeare's plays, the U.S. Constitution and Declaration of Independence, etc.) are sufficiently well-known and widely read that it does make sense to include even their nonces. —Ruakh TALK 16:37, 17 August 2007 (UTC)


 * Anthropologists and ethnolinguists are wonderful people, but they are no more reliable than lexicographers when it comes to defining words. If there are no authentic durable records of a language whatsoever -- no recordings, no transcripts -- there is simply no material for us to work with.  In this respect I think the use-mention distinction trumps any value in peer-reviewed scholarship ... so I'm inclined to agree with DAvilla that this clause no longer serves any useful purpose, and in fact contradicts our current practice. -- Visviva 02:39, 28 September 2007 (UTC)
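
The change proposed above ("three independent instances over a year") can be sketched as a simple check. This is only one reading of the proposal: it assumes "independent" means distinct sources, "over a year" means the uses span at least 365 days, and a bare definition or mention does not count as conveying meaning. The `Citation` class, its fields, and the names Jones and Lee are hypothetical; only Smith comes from the example in this thread.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    source: str            # hypothetical identifier for the author or work
    published: date
    conveys_meaning: bool  # True for a use; False for a mere mention or definition


def meets_proposed_rule(citations, min_instances=3, min_span_days=365):
    """Return True if the uses come from enough distinct sources
    and the earliest and latest uses are far enough apart."""
    uses = [c for c in citations if c.conveys_meaning]
    sources = {c.source for c in uses}
    if len(sources) < min_instances:
        return False
    dates = [c.published for c in uses]
    return (max(dates) - min(dates)).days >= min_span_days


# "foo" used a hundred times, but only in Smith's own paper: one source.
only_smith = [Citation("Smith", date(2007, 3, 1), True)] * 100

# "foo" later used by two further (hypothetical) authors.
picked_up = [
    Citation("Smith", date(2007, 3, 1), True),
    Citation("Jones", date(2008, 1, 15), True),
    Citation("Lee", date(2009, 6, 30), True),
]

print(meets_proposed_rule(only_smith))  # False
print(meets_proposed_rule(picked_up))   # True
```

Under this reading, an ad hoc definition that never leaves its original paper fails however often it is repeated there, which matches the first stage described at the top of this section.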

Personal names from languages with a non-Latin script
Having perused this I cannot fathom why the Russian name Дмитрий exists only in Latin form and Владимир exists in both scripts. I recommend strongly for names with no (original!) Cyrillic articles yet written to be moved to articles with the appropriate Cyrillic titles, if nobody minds. Is the article in Latin letters appropriate at all, since transliteration is always provided? Bogorm 08:56, 16 August 2008 (UTC)


 * I think the issue here is simply the incompleteness of Wiktionary. Дмитрий certainly should exist (and Dmitry was in some serious need of cleanup).  However, much as the two are related, the existence or lack thereof of Дмитрий should not in any way affect whether we keep Dmitry.  If it can be attested, it should be kept.  -Atelaes λάλει ἐμοί 09:08, 16 August 2008 (UTC)
 * Deleting of Dmitry is not my concern, since I am not administrator and until someone proposes it for deletion. If you do not mind, I am going to move Dmitry to Дмитрий, so that the Latin title be used as redirection. Bogorm 09:12, 16 August 2008 (UTC)
 * That would be inappropriate, since we do not use redirects that way. See Redirects. --EncycloPetey 16:48, 16 August 2008 (UTC)


 * In the future, please bear in mind that an entry requires some reformatting if you move it to a different language. -Atelaes λάλει ἐμοί 17:58, 16 August 2008 (UTC)


 * Why have you deprived the article of the Transliterations section? What do you mean under work? It is not moved to a different language - Дмитрий is the only admissible form of the Russian name and Dmitry is an independent article about the English case, though I strongly doubt that any British would opt for a purely Russian name, unless he is a fervent adherent of the current Russian president. Bogorm 18:17, 16 August 2008 (UTC)


 * I have removed the transliterations because we have a specific transliteration format for Russian words, and I don't know Cyrillic script. I put a marker on it to garner the attention of someone who knows Russian to come and add it.  It is moved to another language because you took the content of Dmitry (an English word), and moved it to Дмитрий (a Russian word), without properly reformatting it.  It was still classified as an English proper noun, as well as being in the English category for Latin and grc derivations.  -Atelaes λάλει ἐμοί 18:25, 16 August 2008 (UTC)


 * Well, I speak fluently Russian, but am not knowledgeable about formatting of proper names according to your wishes - so I shall essay to be helpful by elucidating here the transliteration but without adding it in templates, since I have not yet got accustomed to using templates (besides the quoted on my talk page): the official scientifical transliteration is "Dmitrij", but the popular for the English-speaking world is Dmitry (corroboration for my words is to be found in the article about President Medvedev, whose first name is rendered as transliteration in brackets and in the popular rendering in the title of the article). I do not know however, old Russian (Eastern Old Church Slavonic) and the question about the ancient spelling can hopefully be resolved by EncycloPetey (see below). Bogorm 19:15, 16 August 2008 (UTC)
 * I'm not sure what you mean by "official scientifical transliteration" of Russian, unless you mean the International Scholarly System, since there are several standard schemes in use just for Russian transliteration, including the system used by the Library of Congress (which would transcribe Дмитрий as "Dmitrii"). There are also different Latinizing transcription systems used in Germany and Poland that I have seen, and presumably there are many more besides. See Romanization of Russian on Wikipedia for a little more, including a table comparing 7 of the systems. The system preferred by the Russian government and the Russian Commonwealth is GOST 7.79. --EncycloPetey 20:16, 16 August 2008 (UTC)
 * Yes, I meant the first one, because it is international and present in all articles about Russia in Wikipedia as the quoted one. And regional regulations have not international jurisdiction. Bogorm 22:11, 16 August 2008 (UTC)
 * While there may be only one Cyrillic form in common current use, that is not the only spelling possible in previous centuries. Unfortunately, I have discovered that I am missing the relevant page from my copy of Nikolaj Michailovič Tupikov's Wörterbuch der Altrussischen Personennamen (Köln & Wien: Böhlau Verlag, 1989). However, Wickenden's Dictionary of Russian Names includes a large number of spellings. (Wickenden's are transliterated, but use a consistent transliteration system.) --EncycloPetey 18:26, 16 August 2008 (UTC)

Somewhat related discussion is at my talk page.—msh210 ℠ 21:09, 20 August 2008 (UTC)

Scientific nomenclature
In general, how far should we be diving into the scientific/technical names of things? I have two main questions in mind:
 * Should every (established) genus-species of organisms have an entry? ("Established" meaning there are good citations, but that's pretty easy if you consider all the scientific journals out there that use the terms without defining them.)
 * Likewise, what makes chemical names worthy of inclusion? There are so many variations; see 8-Azaguanine, where I listed all the synonyms I could find appearing in at least two independent sources. If you Google most of them, the chemical uses are buried deep in the results; however, if you use Google Scholar, they're usually all you get.
Now, I understand people don't normally use dictionaries for looking up this kind of information, but for the sake of completeness (and the limitless potential of Wiktionary), I guess my main question is where do you draw the line? Voxii 23:47, 3 March 2009 (UTC)


 * Here's my sense of the current state of things:
 * The sense of the community lately seems to be "no", that we should have genus names and species epithets but leave actual binomials and trinomials to Wikispecies. (But I would defer to EncycloPetey, DCDuring, and any others who have actually worked on this area lately.)  This has varied over time, and we do have a number of binomial names.  The not-yet-closed RFD for B. splendens may be pertinent.
 * Sum of parts hyphenated chemical names (e.g. full IUPAC names and variations thereof) are out, with a possible exception if they happen to be used outside of the chemical field. So I don't think we would want anything above "8AG" in your list.  Trade/commercial/brand names are out unless they happen to satisfy WT:CFI.  I expect that identifiers like "NSC-749" are also out, though I don't think that's ever been put to the test.  Terms like "triazologuanine" should be fine, I think, provided that they are verifiable. -- Visviva 03:11, 4 March 2009 (UTC)
 * Ok, that sounds reasonable. I figured things like "5-Amino-1,6-dihydro-7H-v-triazolo(4,5-d)pyrimidin-7-one" were out. Most of the ones that are a couple letters and numbers are usually only used by certain organizations or are something like a brand name, so I guess we don't want those either. As for the species names, I'll take a look around. I think some of the more popular ones might be worthy of inclusion. Thank you very much for your answers. Voxii 10:29, 4 March 2009 (UTC)
 * I agree that "some of the more popular ones" should be included, such as Homo sapiens, Tyrannosaurus rex, and E. coli. Angr 15:06, 6 March 2009 (UTC)


 * I think Visviva has fairly summarised current unofficial preferences.
 * I think we do a good service when we take attestable, includable vernacular (even brand/product) names and translate them into current scientific terms and find the best current WP article (and/or outside source) to link to. Sometimes dated scientific terms can be sussed out, though that is harder. WikiSpecies is already much more complete than we are ever likely to be about the structure of the taxonomic tree.
 * Getting all the one-part taxonomic names is plenty challenging. I suppose the same would be true for all the combining forms of chemical terms as well. DCDuring TALK 16:02, 6 March 2009 (UTC)