Wiktionary:Grease pit/2-level dictionary

History
It seems that all along there have been 2 camps of people working on Wiktionary: a) those who wanted to build a free, open, online dictionary of all words in all languages that works just like a traditional dictionary, such as those produced by Oxford, Webster's, or the RAE; and b) those who wanted to build a free, open, online dictionary of all words in all languages that is not constrained by tradition in any way, shape or form, and that broadens the scope to include inflections, misspellings, word combinations that might be too transparent for traditional dictionaries, video game characters, etc.

I had been thinking of such a dictionary for years before somebody came along and created Wiktionary. All along I thought the world needed a free OED or Webster's that people could add to and improve. While I expected that contributors to such a free project would be amateurs untrained in lexicography or linguistics, I also thought they would fall mostly into camp a), along with another group who wanted to promote their own invented words (protologisms) and whom we would need to dissuade. Somehow, though, I never envisaged camp b).

It's only in the last couple of days, after fighting with varying degrees of energy ever since I discovered Wiktionary, that I've realized the majority of contributors, at least up till now, are actually in camp b). How foolish I was!

So now I will no longer fight against entries that I wouldn't accept in a traditional dictionary; indeed, I will mostly stay out of debates on RFD. What I will do is point out the benefits of becoming a 2-level dictionary that works for both camps, since I feel there is a very large camp a) out there who have never looked at Wiktionary but will in the course of time.


 * I think I've been pretty vocal about my opinions for inclusion... I hope not too much so, though the CFI talk page could almost be construed as yelling. I'm interested in knowing which camp you think I fall under, or which one I'm closer to anyways. Davilla 14:15, 3 June 2006 (UTC) (re-signed)

What the Grease pit can do
So the job for me, and for anybody who understands the point I'm trying to make, is to come up with the technical means of making Wiktionary a 2-level dictionary, so that traditional users can either see just what would be in a traditional dictionary, or at least tell visually which entries and senses would be in a traditional dictionary and which are part of Wiktionary's extended scope.

Here are some of my ideas:


 * Come up with a set of templates, or something else, which can be inserted into an article at the language entry level, to mark the whole entry as an extended entry.
 * As above but to mark individual senses.
 * As above to mark entries or senses by type: encyclopedic, misspelling, transparent combination, fictional character, etc.
 * The default behaviour of these should be to do nothing at all: the typical user will see just what they always saw. But there will be a CSS class which can be accessed by some new skin or by people who choose to customize. It could be used to show types of entries by colour, or to hide types of entries (or senses).
 * More difficult problems include how to handle links to these in Related terms or the disambiguation see also section at the top of pages, or how to improve search results to indicate which results would be for extended entries. These could be solved with more advanced methods such as toolserver with its own database of which articles are marked in which way - in some ways similar to Connel's Random article by language work.
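The filtering behaviour described in the list above could be sketched roughly as follows. On the wiki itself it would be a set of marker templates plus a CSS class that a skin or personal stylesheet chooses to hide or colour; the Python below only models the underlying logic. The tag names (`encyclopedic`, `misspelling`, `transparent`, `fictional`) and the `Sense`/`visible_senses` names are hypothetical, not existing Wiktionary templates.

```python
from dataclasses import dataclass, field

# Hypothetical marker types, mirroring the kinds of extended entries listed
# above. None of these are real Wiktionary template names.
EXTENDED_TYPES = frozenset({"encyclopedic", "misspelling", "transparent", "fictional"})


@dataclass
class Sense:
    text: str
    # An empty set means an ordinary, "traditional" sense.
    tags: set[str] = field(default_factory=set)


def is_extended(sense: Sense) -> bool:
    """True if the sense carries at least one extended-scope marker."""
    return bool(sense.tags & EXTENDED_TYPES)


def visible_senses(senses: list[Sense], hidden_types=frozenset()) -> list[Sense]:
    """Return the senses a reader should see.

    With no hidden types (the default) every sense is shown, matching the
    proposal that the markers do nothing unless a reader opts in.
    """
    hidden = set(hidden_types)
    return [s for s in senses if not (s.tags & hidden)]


owl = [
    Sense("a nocturnal bird of prey"),
    Sense("(Harry Potter) a message", {"fictional"}),
]
print([s.text for s in visible_senses(owl)])                 # both senses
print([s.text for s in visible_senses(owl, {"fictional"})])  # only the bird sense
```

Keeping "show everything" as the default is what lets the typical user see no change at all.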

Rough poll
It might help to get an idea of how people feel about these issues:
 * a) I think it's all stupid and want nothing to do with it
 * b) I've always been in camp b and didn't know there was a camp a
 * c) I've always been in camp a and never understood camp b
 * d) I'm in camp a personally but always understood that most Wiktionarians were in camp b
 * e) I'm firmly in one camp but can see that Wiktionary would be much better if it worked for both camps
 * f) I'm firmly in camp b and don't want Wiktionary to change but I'm still interested in what you guys think
 * g) all kinds of other positions that haven't come to mind right now

Ok. So let's hear people's thoughts... — Hippietrail 22:40, 27 May 2006 (UTC)

Vild's thoughts
I'm for option h): I oppose polarization and antagonism, and would like to rule out any move towards such a state. Besides, none of your poll's options apply to me. I'm not certain everyone here - apart from the most idealistic among you - feels they are 100% in one camp. At least, I'm not in any, and I'll always refuse to think in such terms.

I do understand your position, Hippietrail, and of course you have way more experience here, but one thing I'd gladly avoid is creating "camps" of ideologically different contributors. Such differences of course exist, but do we really need to encourage and endorse them? Why can't we have a unified purpose, with one set of criteria and one set of guidelines? Sure, both parties will have to give up some of their ideas, and both will have to conform. That's no big deal, is it? It has worked for a couple of years now without too many problems, and even though it's becoming obvious that we'll need some adjustments to our systems in the near future, there's no reason to overhaul the most basic and fundamental reasoning behind Wiktionary.

I don't think your proposals will solve anything. I could be wrong, but I expect them to divide the community to a great extent. Fine, there may be two camps. Let's make one dictionary for both. Don't get me wrong, if it seems that such a solution is beneficial, then by all means good luck with it. But I'm highly sceptical. —Vildricianus 23:45, 27 May 2006 (UTC)


 * I was under the impression that Hippietrail's proposal is to quell the opposition between the two camps. There may not be a clear dividing line between the two camps, but at the very least we can say that some contributors are more inclusive than others. By allowing purists to mark terms "not fit for inclusion" as such, they can have their voice and can filter the resulting Wiktionary without being bothered by seeing anything from "brillig" to "undigital". Others, likewise, can enter terms that they feel are vital to define here without getting their good faith efforts trampled over. I don't see how the proposal is divisive. Rod (A. Smith) 00:06, 28 May 2006 (UTC)


 * I agree with Vild. In essence, this is much more a political problem than a technical one... technical remedies sometimes help, but not without TREMENDOUS amounts of PR and cheerleading.  --Connel MacKenzie T C 14:31, 28 May 2006 (UTC)

Yes please
I am very strongly in support of this proposal. I want Wiktionary to provide me with both kinds of dictionary. For example, "owl" means "message" in the Harry Potter books. This is way too much information if I wasn't looking for it, and when I saw it I thought "Yuck, get this fancruft out of my face". However, if I were trying to read an HP book in a foreign language, this kind of thing would be incredibly useful to me. One reason I almost always vote with camp b is precisely that it is possible to create a two-level dictionary which caters to almost any kind of need, so no-one has to be let down. Another point is that it should be very easy to get consensus on most terms and thus not require any actual voting, which is divisive and seems to have semi-random results. Kappa 01:20, 28 May 2006 (UTC)

Two-level too arbitrary, I think
My goal (though I tend to be somewhat blue-sky on this sort of thing) would be to try to describe an entire continuum of word/term notability, from absolutely-accepted to jargon to fancruft, with all sorts of gradations in between.

And, in fact, there's not a single axis here; but several:


 * technicality : from "every English speaker knows" to "particular to a field" to "very specific jargon or techspeak known only within a field"


 * formality : of usage: stilted, formal, normal, informal, slang, vulgar slang, unacceptable


 * regionality : how widely is the term known or used: worldwide, national, regional


 * notability : what group considers this term important: everyone, those in some field or large community, those in some small or very specialized community


 * minutiae : whether every derived and inflected term is exhaustively and explicitly listed (or only the "interesting" ones)

Obviously, we capture many of these characteristics already, in somewhat ad-hoc ways, generally with tags such as or  or  (which of course is just what real dictionaries tend to do, too). Me, I'd rather be more systematic about this sort of thing, but the ad-hoc tags do work pretty well. My holy grail (which may be exactly what Hippietrail was getting at) would be not a two-tier system, but one in which, rather than deleting terms which failed to meet some arbitrary, binary threshold of "notability", we could instead tag and then filter things so that the horrendously obscure or "non-notable" terms are effectively invisible to those who don't care about them.

–Scs 13:50, 28 May 2006 (UTC)


 * Minor gripe: We don't speak of notability here, except to point out to Wikipedians that this is not Wikipedia...attestation is our rule here, not notability. (More relevant comments later, when I've digested this more properly - looks good, though.)  --Connel MacKenzie T C 14:23, 28 May 2006 (UTC)
 * The CFI is an arbitrary limitation, arbitrarily set by an arbitrary group of people. It is far from immutable. It is precisely to overcome these arbitrary limitations that we are discussing these ideas.--Richardb 10:47, 29 May 2006 (UTC)


 * A few terms that I've added to Wiktionary are non-standard interpretations of English words, labeled as such. So when I first read this, tagging was an immediately obvious way that we already make some of these distinctions. Thank you for spelling out the idea more fully. -Davilla 59.112.52.124 17:20, 28 May 2006 (UTC)


 * Here's another big distinction I forgot:
 * reality : real-world terms versus terms from fiction (e.g., the recent debate over veritaserum)
 * Yet another one is
 * status : archaic / obsolete / current / neologism / protologism
 * –Scs 14:02, 29 May 2006 (UTC)
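Scs's axes suggest tagging each sense with one value per axis and letting each reader choose which values to see, instead of a single two-level switch. A minimal sketch, assuming tags are stored as an axis-to-value mapping: the axis names and values come from the lists above, while the data layout, the `problems`/`matches` helpers and the sample `TRADITIONAL_VIEW` are hypothetical.

```python
# Axes and their allowed values, taken from the lists above.
AXES = {
    "technicality": {"general", "field", "jargon"},
    "formality": {"stilted", "formal", "normal", "informal", "slang",
                  "vulgar slang", "unacceptable"},
    "regionality": {"worldwide", "national", "regional"},
    "notability": {"everyone", "large community", "small community"},
    "reality": {"real", "fictional"},
    "status": {"archaic", "obsolete", "current", "neologism", "protologism"},
}


def problems(tags: dict[str, str]) -> list[str]:
    """List tagging mistakes: unknown axes or values outside an axis's scale."""
    found = []
    for axis, value in tags.items():
        if axis not in AXES:
            found.append(f"unknown axis: {axis}")
        elif value not in AXES[axis]:
            found.append(f"bad value for {axis}: {value}")
    return found


def matches(tags: dict[str, str], view: dict[str, set[str]]) -> bool:
    """True if a sense's tags fit a reader's view.

    A view maps an axis to the values the reader wants to see. Axes the view
    does not mention are unrestricted, and a sense with no tag on an axis is
    treated as unremarkable on that axis, so it passes.
    """
    return all(axis not in tags or tags[axis] in allowed
               for axis, allowed in view.items())


# Hypothetical "traditional dictionary" view.
TRADITIONAL_VIEW = {
    "technicality": {"general", "field"},
    "reality": {"real"},
    "status": {"archaic", "obsolete", "current"},
}
```

Because the axes are independent, a reader could, for example, hide fiction-only terms while still seeing regional slang.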


 * See also Beer_parlour. —scs 00:29, 10 September 2006 (UTC) (That's as of 9/2006; eventually it'll be archived and that link will stop working.)

How is progress on the next version of Wiktionary software going?
Some of these ideas would be so much easier to implement if we had a proper database version of Wiktionary. It might be worth waiting for that next version if it's really in prospect.--Richardb 10:47, 29 May 2006 (UTC)
 * See reply below. —Vildricianus 14:18, 29 May 2006 (UTC)

Connel's thoughts
I like all that I've read here, so far. I confess that when I found Wiktionary, I assumed it was a "camp A" type dictionary. After a short time, I learned it was a "camp B" dictionary, and got with the program. (I didn't think of it in those terms, then, of course.)

I do strongly agree with Eclecticology's disapproval of the vulgar nonsense that often creeps in. The more ridiculous stuff we allow, the less likely it is that Wiktionary will ever be considered a "respectable" resource. To me, the problem has always been one of distinguishing between "valid camp B stuff" and vandalism/puerile nonsense.

OTOH, we already have branched out far away from traditional dictionaries. To a newcomer, there may be little distinction. That is why I find myself falling into "camp A" on occasion, even though I know that "camp B" is the goal that Wiktionary is heading towards.

Reading Scs's thoughts, I really like the multi-level approach. But that is clearly me thinking as a bit-head, and not as a dictionary user.

I guess the biggest problem I have is that "All Words In All Languages" is inherently incompatible with "camp A". Due to the unrecognized camp A/camp B conflict that has gone on for the past couple of years, we don't do "camp B" properly either.

I think that no matter where this conversation goes, Wiktionary should address the "All Words In All Languages" paradox. That is, if we were to ever get to that point, we would be so far from what people expect when finding a dictionary, the effect might be the same as if we included all Urbandictionary vulgarities. (And recently, even Urbandictionary seems to have expunged much of last year's nonsense.)

I don't know what the "right" answer is. Throwing up my hands and saying "time will tell" may be true, but not very helpful. I'd like to see how Scs' ideas pan out, as I think they hold the most promise.

I recognize the danger that Vild pointed out regarding the use of "camp foo" vs. "camp bar" labels, but I hope it is clear that no one falls 100% into one abstract concept or the other. It may be worthwhile to drop the labels.

--Connel MacKenzie T C 15:48, 28 May 2006 (UTC)

Richardb view
I must admit that as a non-linguist I am somewhat despairing of Wiktionary progressing anywhere positive for the "general public" user. It has a huge volume of words, but does not cover the basic words well (many of the most basic words are barely improved since the Webster importation), because everyone follows their special interest, which is not often around the most basic words. Nor does it handle the capture of new words/meanings well (variously deleting them, consigning them to a list of Protologisms, etc.). But it does have a huge number of totally obscure words in obscure languages (Old English). WikiSaurus is not exactly a roaring success yet, though it would seem some form of Word-Relationships capture is needed. --Richardb 11:18, 29 May 2006 (UTC)
 * I guess one reason for the "neglect" of the basic stuff is that it's more difficult to describe accurately? I don't think it's a big problem, though; it'll get fixed over time. Remember there's only a handful of contributors here. —Vildricianus 13:51, 29 May 2006 (UTC)

So, somehow, we need to capture everything we can (be totally inclusionist), but categorise what we capture (technicality, regional etc), so that users can tailor their view to limit the content to what they want to see normally. I don't think we can do that realistically with the current software. How close is the next version, with full database capabilities, where we can apply all these different types of dimensions to words, spellings, meanings, relationships etc ? --Richardb 11:18, 29 May 2006 (UTC)
 * What is that, "inclusionist"? Allowing things like big red bus? Or all the dubious content and spam we get daily? I believe that we can work this out, provided we put a lot of effort into it, and that a unified CFI is possible. —Vildricianus 13:51, 29 May 2006 (UTC)

In the absence of new, purpose built software, I think we will be defeated. If the new software is not really going to happen, perhaps we need to form some sort of relationship with another online dictionary which can better support our efforts. My view is that the software used by www.thefreedictionary.com would be a better starting point for us. Perhaps we could form a relationship with them (as Wikipedia seems to have done), so that we get free use of their software, and they get free use of our content. Are there other candidates we should consider ? --Richardb 11:18, 29 May 2006 (UTC)
 * The new software (you might want to take a look at http://wiktionaryz.org) is under development, yes, but I don't think it'll replace en.wiktionary completely any time soon. It might take over our non-English content, yes, but I don't recommend counting on it as a big solution to our internal problems. Not yet, that is. I'm not sure whether we should count on yet another new software proposal, though. —Vildricianus 13:51, 29 May 2006 (UTC)

I also feel we are going to end up needing some sort of voting software (which I think Urban Dictionary has) to vote on the applicability of the various tags we put onto words, e.g. is "gay = happy" a current usage or not? But even this would be complicated: a vote of 10-20 year olds would probably be 99% saying it's dated, while a vote of over-65s would probably be 75% or more still saying it's current. --Richardb 11:18, 29 May 2006 (UTC)


 * Then just make sure everyone's complete information is entered, including age, and the system will churn out the perfect tag, in this case somewhere between "dying out" and "outdated". Like "fogly" or something. :-P Davilla 13:11, 29 May 2006 (UTC)


 * Voting? No thanks! Honestly, I can't imagine anything worse than that. You want to let people decide how language is used? That's not very far from prescriptivism, right? Unless I'm mistaken, that's not what Wiktionary is about. It would be too easy for one group to dominate the other. Whether gay means happy is not up for decision or voting; it has to be examined "in the field" and proven. —Vildricianus 13:51, 29 May 2006 (UTC)


 * Sorry if I missed your joke, but to me, allowing "voting" sounds very much like descriptivism, not prescriptivism!
 * Seriously, one kind of "voting" I've thought about would be not for definition accuracy, but rather, simply for notability/usefulness/reality/existence of an entry. It would be an implicit kind of voting, based just on... hit counts.  Pages that never get viewed in N months might not be words and so might be candidates for automatic deletion or relegation to some less-visible category of "really obscure" words.
 * (With that said, I certainly concede that there would be significant difficulties. It would be easy for supporters of favorite nonce words to keep their word supported through false clicks. There might be genuine words which are so obscure that they never happen to get a visit in N months. But it's interesting to think about all the same.)
 * –scs 21:43, 31 May 2006 (UTC)
 * Pardon me if this sounds like POV pushing on my part, but I don't have the slightest idea how this would work, either by means of voting or by counting page views. The proposed system would be highly susceptible to real POV pushing and, contrary to how you view it, would be prescriptivism incarnate. I wasn't joking this time - I try to limit that to my signature. Heck, voting is such a controversial issue for policies and proposals, so let's please not consider it as a means of attesting a word's usage. Contrary to what is often displayed at WT:RFV, we do have our methods and means of doing so properly; it's just that we still need to shape and fix them according to what suits Wiktionary best. I guess we will forever be refining them, and they'll never be completely error-free, but at least it'll be a working system. —Vildricianus 21:58, 31 May 2006 (UTC)
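For reference, the mechanical part of scs's hit-count idea is small; the hard part is the objection just raised, along with scs's own point about false clicks, and this sketch does nothing to address either. It assumes a record of each page's most recent view date is available; the function name, the six-month default and the 30-day month are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def relegation_candidates(last_viewed: dict[str, Optional[date]],
                          today: date, months: int = 6) -> list[str]:
    """Return titles not viewed within the last `months` months.

    `last_viewed` maps a page title to the date of its most recent recorded
    view, or None if it has never been viewed. A month is approximated as
    30 days. The function only produces a list; what to do with the listed
    pages is exactly what is under dispute.
    """
    cutoff = today - timedelta(days=30 * months)
    return sorted(title for title, seen in last_viewed.items()
                  if seen is None or seen < cutoff)


views = {
    "owl": date(2006, 5, 30),
    "veritaserum": date(2005, 10, 1),
    "big red bus": None,
}
print(relegation_candidates(views, today=date(2006, 6, 1)))
# ['big red bus', 'veritaserum']
```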

But, in the end, maybe Wiktionary will never really work perfectly. Wikipedia works because on each topic the "experts" eventually take over and win the debates. But in Wiktionary we are all "experts" on current usage of words, and are not going to easily agree on common definitions, synonyms, even pronunciations. We can't even agree on spelling half the time! Maybe a dictionary still does need an editor/publisher to be the final arbiter, if it is going to have consistency. --Richardb 11:18, 29 May 2006 (UTC)
 * What's perfect? Do we need it to work perfectly? Do you think Wikipedia works perfectly? Far from it! I dare say we're on track - we still need to work out a huge number of bugs and hitches, but the concept is working, albeit slowly. —Vildricianus 13:51, 29 May 2006 (UTC)


 * I view WZ as a branch, not a continuation. The goal seems to be quite different. The emphasis is on translations, and not so strong on description in a given language. Wiktionary instead has the ability to describe terms in the reader's language of choice, with language-specific subtleties emphasized. I hope I am ultimately wrong about WiktionaryZ, but it will take a lot more software development to get there. The notion that a word can be described in localized terms only makes the ultimate goal of WZ laudable. But for 10,000+ languages, it may be a little too "pie-in-the-sky."


 * Yes. And very long-term. Did you know, the biggest dictionary in the world is the "Dictionary of the Dutch language" (way bigger than the OED), and it took 147 years to complete. For one language. —Vildricianus 20:28, 29 May 2006 (UTC)


 * Partnerships with entities are great, but I think the notion of using software from http://www.thefreedictionary.com/ is silly. They have amalgamation software, not data-entry/data-conversion software.


 * My long-term goal for Wiktionary has been to consolidate the level-three headings. If we can come up with a finite set of "valid" third-level headings, the ability to convert our plain-text representation to a data model is greatly improved. (Level-two headings are currently language headings, barring the occasional newbie mistake.) If we ever *do* come up with a finite set, we can then use JavaScript to enforce "better" data entry (to combat the "garbage in, garbage out" syndrome). So far, the thorniest issues I've seen are with things like "Gismu" (in the Lojban language), "Romaji" (Japanese) and "Letter"/"Character"/"Symbol" (Interlingual).


 * Add to the confusion the asinine practice of using different heading levels depending on the number of etymology descriptions, and it gets much messier. (My philosophical POV regarding etymology sections is that they should have the same style of "disambiguation" as translation sections. They should also be relegated to secondary headings, in my opinion. Currently, Wiktionary gives them superior status, even when "unknown.")


 * Will Wiktionary content ever be used in WZ? I don't know. I do hope so, but I see the current formatting problems as insurmountable obstacles to automated import there.


 * Similarly, I don't think the emphasis on POS is beneficial at all. In my philosophical POV, a ===Definitions=== section is all that is needed, with individual meanings described by tags like, , , , , etc.


 * Taking that one step farther, it would be nice to have software do the "disambiguation" of all the various other third-level headings, maybe with a mechanism similar to the "references" keyword used in Wikipedia. The ultimate goal would be to have just the definitions appear, with all other information in collapsed/hidden "collapsible" sections linked to each definition (not to a spelling). E.g. "(etymology +)", which when clicked would display that meaning's etymology, or "(translations +)", which when clicked would expand the translations for that meaning.
 * Again, I don't think that is the direction WZ is headed in. Again, I hope I am wrong in thinking so.


 * I think our best bet for developing the next generation of Wiktionary software is something that will be done, right here; not on WZ, not using freedictionary, nor anything else.


 * I think Richardb's goal of focusing on "Basic" words and drastically improving them is laudable, but is not something that I am particularly good at, myself. On one hand, all entries need improvement.  On the other, the most commonly viewed entries should be "improved" first.  It is very reasonable to assign priority to which entries to target, for that improvement.


 * --Connel MacKenzie T C 20:09, 29 May 2006 (UTC)
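Connel's point about a finite set of level-three headings can be illustrated with a checker that flags any heading outside the agreed set. He suggests enforcing this in JavaScript at edit time; the sketch below uses Python only to show the check itself. The whitelist is hypothetical (settling it is the open question), and the pattern deliberately matches only exact level-three headings, so level-four headings nested under numbered etymologies are not checked.

```python
import re

# Hypothetical whitelist of "valid" level-three headings. The real set is
# exactly what the discussion above is trying to settle.
ALLOWED_L3 = {
    "Etymology", "Pronunciation", "Noun", "Proper noun", "Verb", "Adjective",
    "Adverb", "Pronoun", "Preposition", "Conjunction", "Interjection",
    "Synonyms", "Antonyms", "Derived terms", "Related terms",
    "Translations", "See also", "References",
}

# "===Name===" on its own line; [^=\n] stops it matching "====Name====".
L3_HEADING = re.compile(r"^===([^=\n].*?)===[ \t]*$", re.MULTILINE)


def unknown_level3_headings(wikitext: str, allowed=ALLOWED_L3) -> list[str]:
    """Return level-three heading names not in `allowed`, in page order."""
    names = (m.group(1).strip() for m in L3_HEADING.finditer(wikitext))
    return [name for name in names if name not in allowed]


page = """==Lojban==
===Gismu===
...
==English==
===Noun===
# a bird
====Translations====
"""
print(unknown_level3_headings(page))  # ['Gismu']
```

A check like this only reports; whether an unknown heading is a mistake or a missing entry in the agreed set (as with "Gismu" or "Romaji") still needs a human decision.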

Widsith thoughts
The only way for a dictionary to be respected and respectable is to cite usage. That is the levelling criterion which does away with debate over what is and isn't proper or vulgar or whatever. If a word can be seen to be used we should include it and define it – and otherwise we shouldn't. Hippietrail, you want to retire from RFD discussions, but I wish you wouldn't.  I think by and large the community works well here. Debate is important and if decisions have not gone the way you would have liked, I don't think it's a good response to walk away and suggest splitting the dictionary in two. Although we argue over details, most people accept the CFI and we should use that as common ground for moving forward, not start talking about dividing content. Personally, I have my own ideas about what things should not be part of a dictionary (video game characters among them), and I will continue to argue that way in RFV discussions, and I invite those of you who disagree to argue back. I have conceded many points in the past and had my opinion changed on several matters, and I hope I've changed other people's minds about things too. We have to work towards consensus, and not abandon it in favour of division. Widsith 13:48, 29 May 2006 (UTC)


 * Well said. I think, though, that the major points of disagreement for Hippietrail are the vintage cars and such. It's not easy to determine to what extent we can/can't/should/shouldn't include such terms, but difficult is not impossible. —Vildricianus 13:56, 29 May 2006 (UTC)


 * Widsith, I think you misunderstand me. I don't at all want to "divide" Wiktionary in two. What I want is to make everybody happy and still keep it in one place. If I fail in this, then the only solution will inevitably be to really "divide" Wiktionary by forking the database to some other site, removing all but the "conservative" entries and stating policies to match. That would be a bad bad thing. My proposal is about "tagging" content, not about division. Division is what I don't want. But sooner or later, especially when we have decent browsing features, the number of people unsatisfied with all the "cruft" in the index when they're just looking for plain old everyday words will get to the point where somebody will say "We need another free dictionary without all this stuff in it". That would be division. It has happened to Wikipedia in the past and it can happen here too. Also, I'm sick of arguing. — Hippietrail 22:45, 29 May 2006 (UTC)


 * Well said, too. However, you're going to have to argue much more if you want this made clear to the broader public outside of the Grease pit. This is not "arguing", by the way, it's discussion and feedback, which is what you wanted. Your idea is quite revolutionary and may have many consequences and implications. People are conservative and afraid of drastic changes. That rule applies to Wiktionary as well. In my initial comments I was sceptical - I still am, but I also see the benefits of your proposal and the possible solutions it may offer. I'm willing to experiment with it to prove its value, but will keep in mind what I've said above. —Vildricianus 10:20, 30 May 2006 (UTC)


 * Couple comments:
 * I'm not suggesting throwing open the floodgates to any and all contributions, and dismantling RFD, but as a thought experiment: how big a problem is it if there's "cruft in the index"? Under what circumstances would people be dissatisfied with this?  Simplistically, it seems to me that an entry for some hopelessly obscure term is either useful to a user who does happen to search for it (and to whom it is therefore, almost by definition, useful), or is effectively invisible to everyone else.  By "decent browsing features" do you mean ways of scrolling through alphabetical lists of all our entries?  That seems unworkable even if we don't have cruft!  (But it is true that people might be bothered by cruft if it happened to be spelled the same as the real word they were looking for, or nearby and shown to them by some automatic "did you mean?" function.)
 * At any rate, this seems very much like a Beer Parlour topic, not one to have off in this nuts'n'bolts gearhead room, where presumably we'd talk about how to implement new mechanisms after the policies which required the new mechanisms had been decided. Trying to invent the mechanisms first, or having the mechanisms drive the policy, is arguably wrong (though of course it happens all the time, because having the tail of implementation wag the dog of policy is sometimes the best one can do).
 * Anyway, all armchair philosophizing aside, I'm in complete agreement with you, HT, that some solid tagging mechanisms will be vital so that users can get useful views into some subset of an otherwise overwhelming corpus. –scs 22:30, 31 May 2006 (UTC)
 * Mmm, I followed this reasoning for a couple of days, when I was only starting out here. There are many reasons against it; here are a few:
 * Respectability - what sort of dictionary would we be? We're trying to become a reliable source.
 * Random paging - would lose its function; cruft would quickly outnumber the good stuff.
 * Space - wiki is not paper, but wiki is not just a storage for random strings of words either, which it would become if we started to allow all this nonsense.
 * Honour! - I wouldn't feel comfortable among the cruft, but that's probably more personal. Still, I guess the majority of regulars thinks this way.
 * And many more reasons, probably Wikimedia policy as well. Your idea is really a non-starter, but I guess it has to be spelled out from time to time.
 * Besides, it's very much Grease pit talk - working out a concept, predominantly technically. Presenting it on the BP is for a later stage, I guess. Also, from what I've learned here, policy comes after the solutions and decisions have been made. First comes the technical bit, a thought or two, then someone is bold; see for instance the creation of the Grease pit. If it appears to work, it will become common practice, and at a later stage someone writes it down as policy. I know things are done completely differently at Wikipedia. —Vildricianus 22:59, 31 May 2006 (UTC)

I agree with Widsith and would also welcome Hippietrail's input, which I feel is very valuable. I am currently developing a proposal, not yet fit for public consumption, which can be seen in my playpen (if anyone can find it - I think there is a link from my talk page). If anyone is really upset with what I am proposing, please let me know. Andrew massyn 20:19, 29 May 2006 (UTC)