Wiktionary:Thesaurus/Improvements 2

''This was a place to discuss and coordinate efforts to improve Wikisaurus. It was created in April 2006, active in May 2006 and then stopped; a surge of activity appeared in May 2008.''

''Further discussions on Wikisaurus have been led in the Beer Parlour. For all discussions on Wikisaurus, see Wikisaurus.''

Subpages

 * /archive

Can we make use of bots to help populate Wikisaurus

 * 1) Perhaps the simplest, least automated bot idea, and therefore the least dangerous.
 * Check every entry for existence of word "synonym". If exists, tag with category "possibility for Wikisaurus". Then we can manually go through these and create appropriate Wikisaurus entry, and delete the tag.--Richardb 12:59, 2 April 2006 (UTC)
 * I am not sure this would be that helpful, but it might be worth a try. An alternative would just be to take the entirety of Category:Adverbs and Category:Adjectives and start with them :) - TheDaveRoss 00:52, 3 April 2006 (UTC)
 * 1) Most useful would be a bot that checked to see that entries made in wikisaurus pages had entries in the wiktionary. If they did not and no other source was cited, an error report would be generated.  This would give us a starting point for clean-up of pages without actually altering anything. Amina (sack36) 08:52, 14 June 2008 (UTC)
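
The two bot ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions: pages are plain in-memory strings keyed by title, and the function names and the link pattern are hypothetical, not part of any existing bot or wiki API. The "no other source was cited" check from the second idea is left out.

```python
import re

# Hypothetical pattern: captures the target of [[target]] or [[target|label]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def pages_mentioning_synonyms(entries):
    """Idea 1: titles of entries whose text mentions the word 'synonym'."""
    return sorted(
        title for title, text in entries.items() if "synonym" in text.lower()
    )


def missing_entry_report(thesaurus_pages, entry_titles):
    """Idea 2: for each thesaurus page, list linked words that have no entry.

    Nothing is altered; the result is only a report for manual clean-up.
    """
    report = {}
    for title, text in thesaurus_pages.items():
        linked = {match.strip() for match in LINK_RE.findall(text)}
        missing = sorted(word for word in linked if word not in entry_titles)
        if missing:
            report[title] = missing
    return report
```

Both functions only read their input, in keeping with the suggestion that the bot should give a starting point for clean-up without changing any page.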

New Format 1
TheDaveRoss is trying out some new ideas for Wikisaurus - see Wikisaurus:book, Wikisaurus:annoy, Wikisaurus:annoyed for examples.
 * Looks interesting. Any explanation ?--Richardb 09:15, 2 April 2006 (UTC)

My thoughts on what Wikisaurus needs right now are as follows: The format I have been playing around with attacks 2 and 4, and I have been working on 3 also. 1 will be contentious I am certain, so I have been leaving it alone for a bit.
 * 1) accuracy.  The stuff we have there should be as accurate as possible, if the stuff we have there isn't valid, what is the point of keeping it?
 * 2) organization/usefulness. We need to figure out a way to make sure that when someone shows up at a Wikisaurus page they can easily figure out what they came for, without having to consult other sources or read a manual of some kind.  This leads to usefulness.
 * 3) population.  The more the merrier. I think we should have a complementary page for pretty close to everything Wiktionary has, within reason.  Most adjectives, verbs and adverbs have synonyms, near synonyms, idioms etc., so there is no shortage of work to be done.
 * 4) unification.  One of the things that marks a good site rather than a bad one is the unity of the site.  People shouldn't have to reorient themselves when they navigate from one page to the next, so we should figure out one standard format for the pages and stick with it.  It should be: versatile so it can fit all the words, as easy to use as possible and good looking.

Here is how it goes: example:
 * 1) All lists are formatted in the alternating table, I think it makes them easier to read, even though it isn't the simplest to use.  It uses 4 templates,, ,  and wse simple

is constructed as follows:
 * }

wse: hrunk brief definition of the sense of hrunk

wse: krunk brief definition of the sense of krunk

wse: strunk brief definition of the sense of strunk

|} The obvious problem is that adding an entry to the middle of the list is more work than simply adding it: you have to change some templates below it. This means you have to be more familiar with the formatting in order to add an entry, which is good in some ways and bad in others.


 * Changed it all...so ignore that. - TheDaveRoss 16:20, 27 May 2006 (UTC)

The overall formatting of the page isn't that much different, I have added a section called "Related Wikisaurus entries" intended for other forms of the word ( would have and  etc.) and other related words that seem applicable. Also other words which seem applicable, has, because a "doer" is an active person.

Other than that, I put all the definitions of a single part of speech in one sense, separated by definitions. It isn't the clearest, but it was the best I could come up with. I have several sub-sections: synonyms, antonyms, idiomatic, etc. I have done away with translations because I don't think they are actually useful; translations can be found at each entry's Wikt* page, and they are more accurate there. I have made it so each synonym, antonym, etc. has a space for a sense definition, so that the user can see what each one means and implies, and get some idea of usage, rather than picking one blindly and incorrectly.

There is certainly more that can be done to the format, but I have redone a few of the smaller ones into this format, so there are several examples about how different words would look. - TheDaveRoss 00:52, 3 April 2006 (UTC)

New Format 2
This has prompted me to dabble a bit. See User:Richardb/danger for an example.--Richardb 09:15, 2 April 2006 (UTC)

I also started on Wikisaurus:weak, but found the format not so helpful, for a word with so many synonyms, and subtle differences of meaning.--Richardb 10:36, 2 April 2006 (UTC)

Scepticism
I'm somewhat sceptical about where all this is going. While a nice format is pretty, we should be more concerned with how the content fits together semantically. Each core idea needs a big picture overview, rather than just a higgledy-piggledy assortment of words. These assortments will just need to be revisited when we have a clearer grasp of the concept.

In Roget danger has its own class, 1006; it is also included in Class 971, uncertainty. Class 1006 is itself sub-divided into 5 noun, 3 verb, 8 adjective and 1 adverb sub-classes; it's a fairly modest class whose contents are included in only slightly more than one full page. I think that we would do better to fully analyze such a page to see where that leads us, and what it tells us about future structures, possible cross-referencing, and what kind of new semantic structures can be added. Eclecticology 02:16, 3 April 2006 (UTC)


 * What is the benefit of following Roget's? I don't understand why we should be concerned about how things fit together semantically, what is it exactly that you think Wikisaurus (and a thesaurus in general) is/should be for.  I don't see the point in having "core structures", but that is probably because I have a different idea of what the purpose of the project is.  Wiktionary is easy, the goal is to include all the definitions in all the languages for every term.  Wikisaurus doesn't have as clear a mandate, should we be looking to include all the synonyms and antonyms?  Should we be trying to expand vocabulary?  Should we be aiming to make a semantic web with interrelated words linked together?  Should we be trying to mirror Roget's thesaurus?  Maybe if we had a better idea of what the end goal is we would have a better idea of how to progress from here. - TheDaveRoss 03:23, 3 April 2006 (UTC)


 * That's a question for Wiktionary talk:Thesaurus considerations, where I raised it before. &mdash; Vildricianus 09:46, 5 April 2006 (UTC)


 * Wiktionary talk:Thesaurus considerations hasn't progressed much recently. But really this project, Wikisaurus improvement 1, is just an idea to attempt to give Wikisaurus a bit better reputation, not a complete new strategy. TheDaveRoss sort of kicked off the idea of providing some decent appearance to it (after I'd lambasted him for deleting too much out of Wikisaurus). Seems to me that, populist entries apart, Dave's idea might be worth pursuing, though we need to find a way to keep adding entries as easy as possible. In populist entries, why not have the top of the page as organised as possible (as we've sort of started to do), and let all the "extra" words (most of which can actually be found in literature, in Google searches etc) be clustered at the bottom. There is value in that vast list of apparent "dross". If someone who is not a native English speaker is reading a populist book which uses some of these populist words, they should be able to find them, and some useful information, in this Wiktionary. At the moment, many of these perfectly valid, though taboo, words appear, as yet, only in Wikisaurus. To just chuck them out would mean Wiktionary is less valuable.--Richardb 14:58, 5 April 2006 (UTC)

Compromise Proposal RB 2006 April 5th

 * 1) We work on finding some "structure" suitable for the majority of straight forward, uncontentious Wikisaurus entries.
 * 2) We use that same structure at the beginning of the "populist" Wikisaurus entries, and police that mostly only words with entries get into the structure.
 * 3) We allow all the unattested words (some of which are very valid, very valuable) to cluster at the bottom in only a mildly organised way.--Richardb 14:58, 5 April 2006 (UTC)

With glee I notice that even such words as uncontentious, straight forward and unattested do not have entries yet, and so, in the minds of some people, would not qualify to appear in Wikisaurus. :-)--Richardb 14:58, 5 April 2006 (UTC)
 * straightforward. ;-) &mdash; Vildricianus 15:05, 5 April 2006 (UTC)
 * attested, contentious. :-) Eclecticology 01:45, 7 April 2006 (UTC)


 * Perhaps multiple sections at the bottom would be in order. ==Nonces==, ==Protologisms==, ==Slang==, ==Phrases== and ==Unverified== might be appropriate groupings for our most commonly vandalized edited Wikisaurus entries.  As long as all entries in those sections are not wiki-linked, I think the argument could be made that they would be uncontentious.  I'd prefer them on a sub-page, myself.  --Connel MacKenzie T C 23:28, 10 April 2006 (UTC)
 * Sounds a bit complex, compared with "only in a mildly organised way". For starters, many of the people adding words in these populist areas would not necessarily know what nonce words and protologisms are. To be honest, I'm not sure I really know the difference between a nonce word, a protologism and unverified. To me they all pretty much sound the same in practice - they do not meet the criteria.--Richardb 13:56, 14 April 2006 (UTC)
 * Hiding them on a separate page will, I fear, only lead to them being re-added to the "front page". Make them available but separate, down the bottom, seems more likely to be successful.--Richardb 13:56, 14 April 2006 (UTC)

Merged other page into this
I had a previous page going - "Wiktionary:Project - Improving Wikisaurus". That page was not receiving much attention, and was essentially covering the same as this page, so I "merged" it in (with some difficulty). The talk page was transferred to the talk page of this project (unfortunately with loss of history, though mostly it was my stuff anyway), and the previous article page was moved to Wikisaurus improvements/archive of Project - Improving Wikisaurus.

Many things which can be included (2006)
There are many relationships which can and should be included in this thesaurus, I have started a list of possibilities, please add other things that ought to be included, or note things you think should not be included. - TheDaveRoss 17:32, 27 May 2006 (UTC)

Information

 * etymology (A comes from B)
 * usage
 * regional information (A is primarily used in B)
 * connotation (A often suggests B)
 * denotation (A literally means B)
 * trivia (e.g. in "crwth" crwth and cwm are the only two words which use 'w' as a vowel)

Grammatical relations:

 * forms ''Each of the various forms of the root: e.g. "annoy" has "annoyance", "annoyed", "annoying"''

Semantic relations:

 * meronym (A is part of B)
 * holonym (A has B as a part of itself)
 * hyponym (A is subordinate of B; A is kind of B)
 * troponym (verb A denotes a particular manner of doing verb B)
 * hypernym (A is superordinate of B)
 * synonym (A denotes the same as B)
 * antonym (A denotes the opposite of B)
 * heteronym (A is spelled the same as B, but is pronounced differently and means something different)
 * homonym (A is spelled and/or pronounced the same as B)
 * homograph (A is spelled the same as B)
 * homophone (A is pronounced the same as B)
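
Several of the relations above come in pairs, so recording one direction implies the other. A minimal sketch of that idea, assuming relations are stored as (A, relation, B) triples read as "A is a <relation> of B"; the table and function names are hypothetical and not an existing Wikisaurus structure:

```python
# Hypothetical table: relation name -> name of the inverse relation.
INVERSE = {
    "meronym": "holonym",
    "holonym": "meronym",
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "synonym": "synonym",   # symmetric
    "antonym": "antonym",   # symmetric
}


def with_inverses(triples):
    """Return the given (A, relation, B) triples plus their implied inverses."""
    result = set(triples)
    for a, relation, b in triples:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            result.add((b, inverse, a))
    return result
```

For instance, recording that "wheel" is a meronym of "car" would also yield that "car" is a holonym of "wheel"; relations not in the table are passed through unchanged.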

Other relations:

 * anagram (A is an anagram of B)
 * cognates (A is derived from the same word as B)
 * false cognates (A appears to be but is in fact not derived from the same word as B)

A Re-prioritization of the Wikisaurus Project (2008)
This section was mostly written by Amina (sack36) in May 2008.

Preface
Below, under proposal, I have put together a plan of action for cleaning up and making manageable the wikisaurus project. Could you read through and give me your impressions?

Proposal
In order to bring the project into a manageable form, I've set up a prioritization schedule. By following this schedule we will be able to present at least something to the general public (not to be confused with the geek public who actually--through perseverance and sideways thinking--were able to find this back alley).

Phase 1: Clean-up

 * The words that are already suggested will be cleaned up and set in their proper sequence.

Phase 2: Startup

 * only one form of each word will be used.
 * verbs: present tense, standard use (e.g. to be -> am)
 * nouns: no possessives or proper names
 * all etymology, usage, etc. we leave for later phases
 * as each section has consensus of "adequate" in the Beer hall, it will be locked and released to a more public area with a note on it that if anyone wishes to add to the project they can list their synonym for any given word on the discussion page of the appropriate page, and we will include it at our earliest convenience.
 * If they would like to join the wikisaurus project they can join the discussion and say hello in the Beer Garden. This will allow us a filter while we're setting things up.
 * Once we have a sufficient body of work finished in phase one, we will release the work to be handled in the normal wiktionary fashion.

Phase 3: TBD
This phase to be discussed in the Beer Garden forum. Amina (sack36) 02:20, 29 May 2008 (UTC)


 * I think that locking pages is not generally a good idea. The current situation on the 'Saurus is not due to the fact that anyone can edit, but the lack of any obvious criteria for what does and doesn't belong.   Of course there will be certain pages (particularly those related to sex and feces) that may require protection, but that shouldn't be the norm.  If a rigorous structure is set up and enforced, as is done in entry space, it will be easy for patrollers to identify problematic additions.  In this regard I think TDR's earlier work sets a helpful example. -- Visviva 03:27, 29 May 2008 (UTC)


 * Perhaps I misunderstand what you are saying. I went to the first of the pages and I see headwords.  So far, so good.  This is exactly what I was saying when I mentioned we would keep with the structure already established.  I clicked on one at random, and the structure that was there was way too complex for us to keep a handle on and still produce a product before 2030. It also had a very confusing introductory list at the top that was headed "Wiktionary | Wikisaurus | Sense".  There were several errors in the placement of words in categories.  I do want to incorporate nearly everything that is on the page, even keep the page almost the same as it looks, just put "coming soon" in the areas we can't handle right now. I also want to cut down on all that white space.  We need some, yes, but people get irritated when they have to scroll down to see if they're on the right page, and thesauri need to be a "quick lookup" affair. One more thing and I will end. I got to thinking of the nature of a thesaurus.  Aren't all words headwords? Amina (sack36) 20:27, 29 May 2008 (UTC)