Wiktionary:Votes/pl-2022-02/Adding Collocations

Adding Collocations
Voting on: Providing the necessary infrastructure to greatly facilitate the documentation of collocations. Concretely:
 * 1) Enforce the creation of collocation, co and coi, analogous to usex, ux and uxi but with separate categories.[1]
 * 2) Move User:Fytcha/Collocations to Collocations, thereby making it binding policy.
 * 3) Add   to WT:EL between   and   that may be used in any language.
 * 4) Add   to WT:EL between   and   that may be used in any language.
 * 5) Add the L3 section   to WT:EL between   and   with the following body:
 * Main article: Collocations

Collocations are combinations of words that occur with much higher frequency than would be expected by chance.

Collocations may either be added under the corresponding sense using coi or co (after all nyms but before all examples), or under a dedicated  header, as described in Collocations.

1: A failure of the vote as always means that the status quo ante (prior to the vote) is upheld; in particular, this means that anybody is free to create and employ collocation, co and coi as well as the categories, but nobody is forced to do so.

Rationale:
 * Collocations are indispensable linguistic information. Collocations are of major importance in language learning to familiarize oneself with the idiomatic (sense 1) and natural mode of expression of a language.
 * There is clear consensus among bilingual dictionaries on the internet that this information is very valuable. Compare our article for with dict.cc, leo.org, pons.com. Wiktionary is lagging behind; no, en.wiktionary is lagging behind, compare de:verdict (under "Charakteristische Wortkombinationen:").
 * In German, there are multiple collocation dictionaries (e.g. https://kollokationenwoerterbuch.ch/web/dict/?article_id=640548) and major monolingual dictionaries also come with their own (statistically generated) list of "typical combinations" (e.g. https://www.dwds.de/wb/einjagen). The same is true for the major Romanian dictionary DEX (see e.g. mânca -> ro:mânca): Expressions like "A-și mânca credința (sau omenia, lefteria) = a-și pierde prestigiul, cinstea, creditul." would not survive as standalone articles on en.wikt per WT:SOP. It is also true for Alemannic: Schweizerisches Idiotikon for Schrëck lists collocations such as "En Schr. īnnin", "E(n) Schr. han."
 * Contrary to our existing guidelines at WT:USEX ("Example sentences should: be grammatically complete sentences, beginning with a capital letter and ending with a period, question mark, or exclamation point." (emphasis mine)), editors already frequently provide incomplete examples - collocations (see e.g. back: "back action", "to back a letter", "to back the oars"). Thus, this vote would amend our policies to be more in line with the de facto standard.

Schedule:
 * Vote starts: 00:00, 18 March 2022 (UTC)
 * Vote ends: 23:59, 16 April 2022 (UTC)
 * Vote created: — Fytcha〈 T | L | C 〉 14:50, 26 February 2022 (UTC)

Discussion:
 * Wiktionary talk:Votes/pl-2022-02/Adding Collocations

Support

 * 1) . — Fytcha〈 T | L | C 〉 00:02, 18 March 2022 (UTC)
 * , and moral support for a concerted effort to add quality collocations for key English entries as a starting point. This, that and the other (talk) 04:51, 18 March 2022 (UTC)
 * 1)  Allahverdi Verdizade (talk) 09:55, 18 March 2022 (UTC)
 * 2)  A sensible proposal for adding valuable information. JeffDoozan (talk) 13:54, 18 March 2022 (UTC)
 * , after much work done on these. Hopefully we'll be able to develop them further in the future. Vininn126 (talk) 16:08, 18 March 2022 (UTC)
 * , I think it would be useful to have these listed on a main entry rather than having full fledged entries for them, and I've seen many other published and online dictionaries also do so. —Svārtava (t/u) • 17:02, 18 March 2022 (UTC)
 * 1)  Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:27, 18 March 2022 (UTC)
 * 2)  The afforded exemplars are convincing to try. Fay Freak (talk) 22:11, 18 March 2022 (UTC)
 * 3) . Imetsia (talk) 14:45, 20 March 2022 (UTC)
 * 4) . Pablussky (talk) 16:05, 21 March 2022 (UTC)
 * 5) . To quote :
 * "In 1933, Harold Palmer's Second Interim Report on English Collocations highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a foreign language. Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became 'less word-centred and more phrase-centred', more attention was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text corpora and intelligent corpus-querying software, making it possible to provide a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the Macmillan English Dictionary and the Longman Dictionary of Contemporary English included boxes or panels with lists of frequent collocations."
 * In other words, providing collocations is something good paper dictionaries have done for decades; surely a universal online dictionary ought to do the same. Tetromino (talk) 20:11, 22 March 2022 (UTC)
 * , excited to see this change in the books. AG202 (talk) 12:23, 23 March 2022 (UTC)
 * 1) . Thadh (talk) 14:26, 23 March 2022 (UTC)
 * 2)  – Jberkel 14:31, 23 March 2022 (UTC)
 * 3)  and thank you General Vicinity (talk) 04:59, 24 March 2022 (UTC)
 * 4)  brittletheories (talk) 09:25, 25 March 2022 (UTC)
 * 5)  Sarilho1 (talk) 10:30, 25 March 2022 (UTC)
 * 6)  —Globins (yo) 09:41, 29 March 2022 (UTC)
 * 7) . This is long overdue and I am very happy to see that this finally has a chance of passing. As an aside, I'm a bit surprised that a lot of very active and prominent users, like Equinox, -sche, DCDuring, etc. have not participated in this vote. Since I haven't been around much, is there a particular reason for that? Andrew Sheedy (talk) 22:11, 30 March 2022 (UTC)
 *  I've done a lot of work on cross-linguistic collocations over the years and find this to be an indispensable aspect of language. Kangtw (talk) 02:14, 3 April 2022 (UTC)
 * Struck: not eligible to vote per WT:VP since the account was created only today with no other edits. —Svārtava (t/u) • 03:29, 3 April 2022 (UTC)
 * 1)  - think this is an excellent and long-overdue idea. BigDom 05:11, 3 April 2022 (UTC)
 * 2)  —Mahāgaja · talk 19:30, 5 April 2022 (UTC)
 * 3)  —AryamanA (मुझसे बात करें • योगदान) 16:45, 15 April 2022 (UTC)

Oppose

 * : Can of worms. -- 04:44, 29 March 2022 (UTC)
 * Care to expand? 70.172.194.25 17:12, 29 March 2022 (UTC)
 * 1) . Collocations should be represented as either independent entries or usage examples.  ·~   dictátor · mundꟾ  17:03, 29 March 2022 (UTC)
 * See the last line of the rationale. —Svārtava (t/u) • 04:23, 30 March 2022 (UTC)
 * See the point raised by &  here.  ·~   dictátor · mundꟾ  09:32, 30 March 2022 (UTC)
 * Lexical items will be represented the same: either as independent entries or under senses as usage examples. It appears misleading by Sgconlaw to bode that “collocations [will be] gathered in a separate section”. As said I reckon it mostly a kind of markup, possibly allowing for inclusion of content that earlier would not have found a place at all due to old ambiguities and possibly causing a paradigm shift for inclusion, that I want to see how it works out: probably won’t make editors do bad. Fay Freak (talk) 00:01, 31 March 2022 (UTC)
 * but the proposal expressly allows for collocations to be gathered in a separate section, so I don't see how that is misleading. — Sgconlaw (talk) 18:24, 11 April 2022 (UTC)
 * Which is not different to the situation when they are also gathered in a separate section but with a different name and otherwise dodgy presentation to pass off as something else. Fay Freak (talk) 18:26, 11 April 2022 (UTC)
 * if that happens, then shouldn't a few be used as usage examples and the rest deleted? Not sure how the proposal helps to resolve that particular problem. — Sgconlaw (talk) 18:30, 11 April 2022 (UTC)
 * Because this is not the problem it helps to resolve, nor creates. The problem it helps to resolve is indicating which combinations are idiomatic in a language. For example you would not know that in German one says for “to make preparations”, which appears not enough for a usage example (also due to the principally not invalid doctrine that usage examples should be whole sentences). So if editors become circumventive there is something that should not be deleted but presented properly. Fay Freak (talk) 18:35, 11 April 2022 (UTC)
 * why can’t a reasonable number of such collocations be indicated as usage examples? I’m not seeing why it is necessary to have a new way of handling collocations. If our policy currently says that usage examples have to be complete sentences, that can be changed. It seems that that requirement isn’t followed anyway, because I see many usage examples that are not complete sentences. — Sgconlaw (talk) 18:46, 11 April 2022 (UTC)
 * That’s why I wanted to see what it will look like. I see more chances than dissuasion. If you don’t exactly know then you will see, and methinks it is exceedingly unlikely you will be disappointed. Fay Freak (talk) 18:52, 11 April 2022 (UTC)
 * Yes, intended or not, this proposal effectively weakens the role of usage examples, which is a good thing IMO. We should focus on quotations. – Jberkel 10:09, 30 March 2022 (UTC)
 * 1)  On further thought, I'm going to oppose. I am concerned that having a separate “Collocations” section after “Related terms” will be an invitation to editors to include numerous collocations which aren’t derived or related terms. To give an example from an entry I’ve been working on recently, I can well imagine editors adding to a “Collocations” section abject fear, abject failure, abject poverty, abject shame … On the other hand, if a modest number of collocations are to be added, I think the current practice of including them as usage examples suffices (and if our policy says that usage examples have to be complete sentences, then it is simple to change that policy). This practice also eliminates the need for the sense of a collocation to be repeated in the “Collocations” section using . — Sgconlaw (talk) 18:54, 11 April 2022 (UTC)
 * 2)  At this late stage in the voting process, where strong support is apparent, I guess this is academic. In principle I am not against a collocations section. In fact, it is a great idea. But I feel that the present proposal doesn't really explain clearly what would count as a collocation and what would not. I can only see tears and heartache ahead, with arguments abounding over whether something is a collocation or a compound term worthy of its own entry, with collocationers and anti-collocationers, and so on. Splenetic? Moi? Perhaps. But we shall see, I suppose. - Sonofcawdrey (talk) 11:50, 15 April 2022 (UTC)
 * @Sonofcawdrey Hmmm actually I agree with you here, maybe @Fytcha could illuminate on that point. I wouldn't want entries being deleted just to be thrown under a collocation header. I personally envisioned it as a place for more clarity for usage examples that already exist + a place for future collocations that wouldn't warrant becoming an entry normally. AG202 (talk) 19:42, 15 April 2022 (UTC)
 * I also agree with this point (Inqilābī posted something on the talk page to a similar effect). I simply think that the benefits outweigh the harms in this case. Imetsia (talk) 21:47, 15 April 2022 (UTC)
 * I'm not sure this is the correct place (or time?) to bring this up, but in terms of attempting to define what is or is not a collocation, scholars in linguistics have made certain headway, at least along certain lines. The following is an extract from the methodology section of a paper on creating 2-word collocation lists from a corpus (https://ris.cdu.edu.au/ws/files/47907814/PaperESPDraft.pdf):
 * Following Simpson-Vlach and Ellis (2010) and Biber et al. (1999), a phrase had to occur at a minimum of 10 times per million words to be on the final lists; however, while they applied their metric across disciplines, we set the minimum frequency criteria within each discipline. A Mutual Information (MI) threshold was also set. Simpson-Vlach and Ellis (2012) in their phrase list argue that not only is this useful for teachers who may prefer to teach by MI ranking rather than frequency, but the MI score is a better indicator of the holistic nature of the phrase. Following Hunston (2002), any phrase in which the two words had a minimum MI below three was excluded from the list (see also Durrant, 2009). Following Lei and Liu (2016) and Gardner and Davies (2013), it was further required that to be pedagogically useful a phrase had to be of generally utility in the discipline, rather than occurring only in few texts and contexts. For this, two threshold metrics were applied to the phrases. The first was the Oakes dispersion test, which computes the distribution of an item across a corpus on a scale of 0-1. An item that occurs frequently and with a high MI may nevertheless only occur in a cluster of passages in a corpus. All phrases therefore had to meet a minimum dispersion metric of 0.5, following Lei and Liu (2016). Secondly, a range criterion to ensure the phrase occurred in a minimum number of texts for the target discipline. The range chosen was that the item had to occur in at least 20% of all corpus files at the minimum frequency of 10 occurrences per million words.
 * Not saying we use this, exactly, but it gives one an idea of the types of statistical tools that can be useful. -Sonofcawdrey (talk) 03:35, 17 April 2022 (UTC)
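
The thresholds quoted in the extract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's actual pipeline: the class and function names are hypothetical, MI is computed with the common pointwise formula log2(f_xy * N / (f_x * f_y)), the dispersion score is taken as a precomputed input because the extract does not give the formula for it, and the range criterion is simplified to the share of corpus files that contain the phrase at all.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseStats:
    """Counts for one two-word phrase (hypothetical structure for illustration)."""
    pair_count: int         # occurrences of the phrase in the corpus
    first_count: int        # occurrences of the first word
    second_count: int       # occurrences of the second word
    corpus_tokens: int      # total number of tokens in the corpus
    files_with_phrase: int  # number of corpus files containing the phrase
    total_files: int        # number of files in the corpus
    dispersion: float       # precomputed dispersion score on a 0-1 scale (assumed given)


def per_million(stats: PhraseStats) -> float:
    """Frequency of the phrase per million tokens."""
    return stats.pair_count * 1_000_000 / stats.corpus_tokens


def mutual_information(stats: PhraseStats) -> float:
    """Pointwise MI: log2 of observed over expected co-occurrence."""
    observed = stats.pair_count * stats.corpus_tokens
    expected = stats.first_count * stats.second_count
    return math.log2(observed / expected)


def passes_thresholds(
    stats: PhraseStats,
    min_per_million: float = 10.0,  # frequency threshold from the extract
    min_mi: float = 3.0,            # MI threshold from the extract
    min_dispersion: float = 0.5,    # dispersion threshold from the extract
    min_range: float = 0.2,         # simplified: share of files containing the phrase
) -> bool:
    """Return True if the phrase meets all four thresholds."""
    if stats.pair_count == 0:
        return False  # MI is undefined for a phrase that never occurs
    file_share = stats.files_with_phrase / stats.total_files
    return (
        per_million(stats) >= min_per_million
        and mutual_information(stats) >= min_mi
        and stats.dispersion >= min_dispersion
        and file_share >= min_range
    )


# Made-up numbers, purely to show the calculation.
strong = PhraseStats(40, 500, 400, 2_000_000, 30, 100, 0.7)
weak = PhraseStats(40, 100_000, 50_000, 2_000_000, 30, 100, 0.7)
print(per_million(strong), round(mutual_information(strong), 2), passes_thresholds(strong))
print(round(mutual_information(weak), 2), passes_thresholds(weak))
```

With these invented counts, the first phrase clears every threshold, while the second has the same raw frequency but fails on MI because both of its words are very common on their own.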

Abstain

 * : Whatever one thinks of usexes, they can be useful in the absence of quotations. I do add them occasionally. DonnanZ (talk) 17:07, 11 April 2022 (UTC)
 * : I am left a bit unsure of how this would work in practice. Collocations are massively important, but they are also potentially infinite. "Collocations" sections have the potential to spiral out of control in the way that they're currently described. Are we distinguishing between different types of collocations? For words of the type "wood X", these can mean "X living in woods" or "X used to hold wood"; they can be locative constructions, instrumental, similitive, generally attributive, or any of various other things. And this is off the top of my head; there are much more complex examples. Not sure what bearing this vote has on the matter, but I am also very opposed to restricting the possibility of collocations existing as entries in their own right, which is crucial in the many cases where they have specific etymologies, sense-developments, or other histories that need to be explored. So no opposition to a new section in principle, as long as it doesn't restrict the possibility for collocations to exist as independent lemmas. Ƿidsiþ 05:50, 12 April 2022 (UTC)
 * 1)  I was brought here by the talk page; speaking on behalf of ignoramuses everywhere, I have difficulty reading the text of the above proposal and visualizing the results with relation to entries like nature-lover. --Geographyinitiative (talk) 17:49, 12 April 2022 (UTC)

Decision
Passes 22-4-3. Enjoy implementing the new toys. I played with them at wish, and made Template:coi. Yes, after 14 years here, I'm still on template-making level 0. Notusbutthem (talk) 00:07, 17 April 2022 (UTC)
 * Looks like dropped the mic. – Jberkel 09:22, 26 April 2022 (UTC)