User:Dan Polansky/Collocation

I have started adding to Wiktionary what are probably called "collocations": to nouns, adjectives that are often applied to them ("knowledge": "tacit knowledge"); to adjectives, nouns to which they often apply ("tacit": "tacit knowledge"); to adverbs, adjectives that they often modify ("morally": "morally responsible"); to adverbs, verbs that they often modify ("fast": "ride fast"); to verbs, adverbs that often modify them ("ride": "ride fast").

Example lists:
 * Adjectives often used with "knowledge": extensive, deep, superficial, theoretical, practical, useful, working, encyclopedic, public, private, scientific, tacit, explicit, general, specialized, special, broad, declarative, procedural, innate, etc.
 * Nouns to which "noxious" is often applied: substance, chemical, fume, gas, odor, plant, weed, animal, stimulus, stimulation.
 * Adjectives to which "morally" is often applied: right, wrong, good, bad, acceptable, unacceptable, responsible, correct, reprehensible, repugnant, corrupt, justified, questionable, neutral, objectionable, permissible, offensive, relevant, ambiguous.

The collocation information can be determined rather objectively, its presentation in Wiktionary entries is compact, and the information is rather useful, above all for non-native speakers, but in part also for native ones. For an adjective, the definition alone often does not make it clear which nouns it readily modifies, a class of information crucial for sentence production. Collocations are useful for verifying and correcting definitions of adjectives: a candidate definition of an adjective can be applied to various nouns and checked for how well it fits. Collocations make it easier to think clearly about adjectives; many adjectives are too abstract without the choice of a class of objects to which they are applied ("disagreeable smell" is much less ambiguous than "disagreeable person"). The collocation is often the single most interesting piece of information in an example sentence in Wiktionary; while an example sentence takes a line, a single collocation takes only one word in the list. Moreover, collocations used in example sentences are often not checked for frequency, unlike the collocations that I enter. Lists of collocations serve in part as a word finder, much like a thesaurus: they often contain synonyms or near-synonyms, but only the ones most often used; they also contain contrast sets. Collocations provide a differentiating added value to Wiktionary: as of now, most online English dictionaries do not offer collocations.

To determine which adjectives are often applied to a noun, I generate a candidate list by brainstorming and extend it by looking through the search results for the noun on the web (via Google) and in Google Books. Then, for each candidate (such as "tacit knowledge"), I determine the number of Google hits and Google Books hits. I watch for things that can skew the numbers, such as the existence of a work that has the search term in its title, and modify the search to remove the skewing. Thus, the search '"hideous strength" -"that hideous strength"' has to be used to exclude references to a work by C. S. Lewis. Then I determine the candidates with the highest number of hits, either approximately by impression or by calculating a score in a spreadsheet based on the two numbers. (The score is the sum of the two hit counts, each divided by the median of its respective value set.) I cut off the bottom of the list of candidates, and order the rest in part semantically and in part by the number of hits or the score.
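The scoring step can be sketched in Python as follows. This is only an illustration, not the actual spreadsheet: it assumes that each of the two hit counts is divided by the median of its own column before the two are added, and the function names (exclusion_query, collocation_scores, top_candidates) as well as the hit counts in the usage note are invented for the example.

```python
from statistics import median


def exclusion_query(phrase, *excluded):
    """Build a search string that quotes the phrase and excludes skewing phrases."""
    parts = ['"{}"'.format(phrase)]
    parts += ['-"{}"'.format(e) for e in excluded]
    return " ".join(parts)


def collocation_scores(hits):
    """Score candidates from a dict: phrase -> (web_hits, books_hits).

    Assumed reading of the score: web_hits / median(web column)
    plus books_hits / median(books column).
    """
    web_median = median(web for web, _ in hits.values())
    books_median = median(books for _, books in hits.values())
    if web_median == 0 or books_median == 0:
        raise ValueError("median hit count is zero; the score is undefined")
    return {
        phrase: web / web_median + books / books_median
        for phrase, (web, books) in hits.items()
    }


def top_candidates(hits, keep):
    """Return the `keep` best-scoring phrases, best first (cuts off the bottom)."""
    scores = collocation_scores(hits)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]
```

With made-up counts such as {"tacit knowledge": (400, 40), "useful knowledge": (200, 20), "purple knowledge": (10, 1)}, top_candidates(hits, 2) keeps the first two phrases and drops the third. Dividing by the median puts the web and book counts on a comparable scale, so that the much larger web numbers do not dominate the sum. The semantic ordering of the remaining candidates is still done by hand.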

Attributive application or modification ("useful knowledge") is the only kind detected by the above method; predicative application ("knowledge is useful") remains undetected.

As the location for these lists, I use usage notes.

Candidate phrases to be used in the usage notes:
 * Adjectives often applied to " ":
 * Nouns to which " " is often applied:
 * Adjectives to which " " is often applied:
 * Verbs to which " " is often applied:
 * Adjectives often used with " ":
 * Adjectives that often modify " ":

Dictionaries of collocations:
 * LTP Dictionary of Selected Collocations, Language Teaching Publications, 1997
 * Oxford Collocations Dictionary, Oxford University Press, 2002
 * Oxford Collocations Dictionary, Oxford University Press, 2007
 * Oxford Collocations Dictionary, Oxford University Press, 2009
 * English Collocations in Use, Cambridge University Press, 2008
 * A Collocation Inventory for Beginners, LAP Lambert Academic Publishing, 2009