Wiktionary talk:Votes/pl-2022-02/Adding Collocations

Fun stuff
Some policy legalistics: This, that and the other (talk) 05:56, 27 February 2022 (UTC)
 * Can you create WT:Collocations as a policy-ED page? I wonder if it should be merged to WT:Example sentences as WT:Example sentences and collocations to avoid a proliferation of project pages.
 * Some verbiage will have to be added to EL, surely?


 * On the former point, I think we should then prescribe that the page be created with policy-VOTE, right? Not sure if merging is the way to go, WT:USEX gives many guidelines and explains idiosyncrasies that are not applicable to WT:Collocations which in turn would elucidate things like lemmatization etc. On the latter point, yes, of course, but I didn't want to go through the tedious work of fleshing that out before starting a discussion first to get some input. In my view, the collocations sections should be similar to WT:EL, i.e. first briefly explaining what it is about, then link to WT:Collocations and give some formatting guidelines. &mdash; Fytcha〈 T | L | C 〉 10:19, 27 February 2022 (UTC)
 * Thanks, all of that makes sense.
 * As for the first point, I had in mind that you would use policy-ED for now, and it would become policy-VOTE if the vote passes. However, I'm not really in favour of immediately applying policy-VOTE to this page. It's a brand new concept and hardly settled in its finer implementation details, so policy-TT feels like an appropriate level for the informational/instructional page. This, that and the other (talk) 10:47, 27 February 2022 (UTC)
 * Makes sense; how about I create User:Fytcha/Collocations, we all collaboratively flesh it out according to our likings and then make the vote's content (in that regard) simply be the move of that page to the WT namespace? &mdash; Fytcha〈 T | L | C 〉 10:59, 27 February 2022 (UTC)
 * I'm on board with that. This, that and the other (talk) 11:01, 27 February 2022 (UTC)

phrases
All collocations should be NPs, VPs, or predicates. We don't want non-constituents, even if the headword is a non-constituent. DCDuring (talk) 18:35, 2 March 2022 (UTC)


 * But this would exclude collocations such as in one's judgement, wouldn't it? &mdash; Fytcha〈 T | L | C 〉 20:12, 2 March 2022 (UTC)
 * Yes. Would it be missed?
 * It would. Allahverdi Verdizade (talk) 19:00, 11 March 2022 (UTC)
 * My main concern is just that the collocations be coherent, as non-constituents are not. Prepositional phrases can be put off for now. We would be going where other English dictionaries have not gone before, at least at our scale. Why not try to concentrate our efforts on core word classes and phrase types and see how well we could do this? We have had failed efforts (Shorthand!) and marginal successes (Wikisaurus). Let's try to make it a success by not spreading ourselves too thin.
 * BTW, what tools are we going to use to find collocations of sufficiently high frequency? How will we handle synonyms (e.g., "get down to the nitty-gritty/brass tacks")? Both? Reference to Wikisaurus or synonyms at referenced entries? DCDuring (talk) 00:39, 3 March 2022 (UTC)
 * The three major bilingual dictionaries leo.org, dict.cc and Langenscheidt all list such phrases, which I take as evidence that they are useful for language learners. In my opinion, they also clearly fall under the purview of my first argument: they are important for familiarizing oneself with the natural mode of expression. So yes, they would be missed in my view. I am willing to compromise and to put off prepositional phrases for now, but with the eventual goal of including them too once we've gathered more experience with collocations, just like I also want to eventually include collocations for adjectives and adverbs.
 * Re tools: I personally like Merriam-Webster's "phrase" section, which is a lot closer to a collocations section than a derived terms section (many SOPs); Collins also has a collocation section but it's not that good in my experience. Apart from English, there are very useful tools for the languages that I edit (see vote page). I think we should also rely heavily on common sense in finding collocations; even if 90% of people drove a black car, that wouldn't make black car a collocation of car, even though it would appear much more often than expected by chance. &mdash; Fytcha〈 T | L | C 〉 09:03, 3 March 2022 (UTC)
 * In this context relying on "common sense" means relying on "one's own idiolect". This goes against our own practice with respect to definitions. I think it is a poor tool to rely on in this context as well. DCDuring (talk) 17:31, 3 March 2022 (UTC)
 * There is not, and there will never be, a hard and fast rule to ascertain what is and isn't a collocation, just like there is no perfect set of sufficient conditions for almost anything we talk about in our daily lives. We probably cannot define the word game, yet we can still have meaningful discourse about it and agree that Monopoly definitely is a game and the fine-structure constant definitely isn't. What's more, one person's common sense may be in parts reflective of their own idiolect, but the collective of all our editors' idiolects is a close approximation of what the language actually is. I initially thought about providing an RFD-like venue for collocations but decided against it for the time being (we need less bureaucracy, not more); we might add it at a later stage though. Lastly, I've provided various resources for collocations in other dictionaries which can serve as a basic guideline for us. With all of this, I'd say we're reasonably equipped to discriminate collocations from non-collocations. &mdash; Fytcha〈 T | L | C 〉 12:26, 6 March 2022 (UTC)

May cause paradigm shift
While the examples in other dictionaries look very reasonable and I would like to try, this will necessarily change views on what is “sum of parts” and cause controversy therein. For example, should we now add, ,  and so on to , , and ? Probably not in sections, because these pages are already unwieldy, whereas  style templates (you say “analogous to ” instead, which boils down to the same) are applicable; but then do we need the pages? You regularly won’t ever prove that pursuant to WT:JIFFY the collocation page should be kept because the term was originally only found in that: it is always doubtful in so far as the noun will naturally, by a certain likelihood and frequency, occur with a certain verb. (And the rappers seem to have endless synonyms for the indicated idea, only aware of which editors would look at the phrases differently: I gave but three for the figure.) Fay Freak (talk) 14:19, 5 March 2022 (UTC)


 * I originally proposed only allowing non-articles as collocations (Beer_parlour/2021/December) which would have sidestepped this issue I think. &mdash; Fytcha〈 T | L | C 〉 12:03, 6 March 2022 (UTC)

Header order
I just noticed that collocations will be placed between Descendants and Translations. Currently we have all the intra-language info coming before the inter-language info, but this would disrupt that order. Is there a particular reason why it is wanted to be placed in this location? Can I suggest putting Collocations after Related terms? Or even after Derived terms? This, that and the other (talk) 09:12, 12 March 2022 (UTC)


 * Good suggestion, I didn't notice there was this split (though, to be fair, "See also" also breaks it). I don't want to put them between derived and related terms because these two feel somewhat fluid to me at times but I think putting them between "Related terms" and "Descendants" is an improvement (and probably the overall best spot). &mdash; Fytcha〈 T | L | C 〉 10:03, 12 March 2022 (UTC)
 * Thanks. Yes, See also is an oddball header really. I try and avoid it where I can. This, that and the other (talk) 10:05, 12 March 2022 (UTC)

ux
I see no value, perhaps negative value, in this policy change compared to simply encouraging people to add uxi with common collocations. I do that. If it's against the rules I hadn't noticed, and we should simply relax the rules for using uxi to allow examples to be strings of phrases like

What would the proposed new templates do beyond burdening our memories? Vox Sciurorum (talk) 13:21, 12 March 2022 (UTC)
 * Yes, I'm confused as well. PUC – 13:22, 12 March 2022 (UTC)
 * coi will correctly categorize: when I'm working through a collocation dictionary (to add them to Wiktionary), I want to find all lemmas with no collocations, not all lemmas with no usage examples.
 * coi will make it easier to correctly order collocations and usexes as they can easily be differentiated in the source code.
 * coi being easily differentiable from uxi also has the benefit that anybody, even somebody who doesn't speak the language, has the ability to move excessive collocations down into the collocations section.
 * coi is machine-parsable: while I don't have concrete plans in this direction, I can definitely see why someone would want to make use of this at some point. Someone may want to generate a list of all collocations (in other articles) a word is a part of that are not present in the word's collocation list, i.e. have a program run through the database dump so it can suggest the collocations from to be added to  and.
 * It's semantically not the right thing to have only one template for two types of strings that follow wildly different policies: ux may not contain links, but collocations should of course be allowed to contain links (cf. the above ); collocations follow a lemmatization scheme; usexes may not be complete sentences etc.
 * In addition to these points, whether editors have to memorize the name of a new template (3 letters) or the change in policy that we have to make to WT:EL and WT:USEX doesn't really make a difference. Editors have to memorize and change something in any case as the status quo is untenable (large-scale breach of policy). &mdash; Fytcha〈 T | L | C 〉 14:39, 12 March 2022 (UTC)
 * Is the purpose of this proposal to allow for the addition of collocations which do not qualify as entries and so would not appear under “Derived terms”? If so, I agree with Vox that these should just be added as usage examples under relevant senses. (In fact, I didn’t know doing so was supposedly against the instructions for usage examples as they are widespread in the Wiktionary. We should just update the wording of the instructions to allow them.) The advantage of putting collocations as usage examples is that readers will know which sense of the entry is used in a collocation. If the collocations are gathered in a separate section, then it may be necessary to repeat the sense using which is cumbersome. — SGconlaw (talk) 04:34, 15 March 2022 (UTC)
 * This is clear enough for other languages (see any of our Latvian entries!) but it would look funny for English, wouldn't it?
 * An error; a blunder.
 * to make a mistake
 * to admit one's mistake(s)
 * There's no context as to why the words "to make a mistake" and "to admit a mistake" are there. They need to be labelled as a collocation somehow.
 * Fytcha is apparently proposing something like:
 * An error; a blunder.
 * Collocations: to make a mistake, to admit one's mistake(s)
 * although I'm not sure exactly how it is intended to work with the templates (@Fytcha can you clarify? A mocked up example might help.) This, that and the other (talk) 05:26, 15 March 2022 (UTC)
 * It doesn’t seem to me any more confusing than the current usage examples in the form of full sentences, but I suppose I have no strong objection if we wish to add the label “Collocations” in front. However, I don’t think we should allow for collocations to be gathered in a separate section after the definitions, as with “Derived terms” and “Related terms”. — SGconlaw (talk) 05:31, 15 March 2022 (UTC)
 * At the time of me writing the vote, I didn't have anything fancy in mind with coi, but yes, this is also an advantage of having separate templates: in the future, we'll have the freedom to change the styling of collocations at any time such as adding a label in front of them if we so wish. &mdash; Fytcha〈 T | L | C 〉 09:06, 15 March 2022 (UTC)
 * Ah, I see you intend for coi to be visually identical to uxi. Interesting. I think SGconlaw makes some good points, in that event. (I wouldn't oppose the vote just on this basis, I hasten to add.) This, that and the other (talk) 09:41, 15 March 2022 (UTC)
 * Collocations are still generally to be listed under their respective senses just like ux: User:Fytcha/Collocations: "Similar to nyms, collocations may either be placed under the corresponding sense, after all nyms but before all example sentences, " The second option, using a collocations subheader, is just there as an alternative should there be too many collocations for a given sense, in which case I personally strongly prefer having a collapsible box further down over outright removing them. &mdash; Fytcha〈 T | L | C 〉 08:58, 15 March 2022 (UTC)
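
The machine-parsability and categorization points made in this thread can be illustrated with a rough sketch. It assumes that coi is called like uxi, i.e. `{{coi|<language code>|<collocation text>}}`, and that collocation text may contain `[[links]]`; that call shape, the function names and the sample pages are all hypothetical and only serve as illustration. A real tool would read a database dump and would probably use a proper wikitext parser rather than regular expressions.

```python
import re
from collections import defaultdict

# ASSUMPTION: coi is invoked as {{coi|<lang>|<text>|...}}, mirroring uxi.
# The text group tolerates piped links such as [[mistake|mistakes]].
COI_RE = re.compile(r"\{\{coi\|([^|{}]+)\|((?:\[\[[^\]]*\]\]|[^|{}])+)")
# Captures the target of a wikilink, ignoring any label or section anchor.
LINK_RE = re.compile(r"\[\[([^\]|#]+)[^\]]*\]\]")


def extract_collocations(wikitext, lang):
    """Return the text of every coi call for the given language code."""
    return [
        text.strip()
        for code, text in COI_RE.findall(wikitext)
        if code.strip() == lang
    ]


def lemmas_missing_collocations(pages, lang):
    """Return the titles of pages that contain no coi call for `lang`."""
    return sorted(
        title
        for title, wikitext in pages.items()
        if not extract_collocations(wikitext, lang)
    )


def suggest_cross_additions(pages, lang):
    """Suggest collocations for pages that are linked from a collocation
    on another page but do not list that collocation themselves."""
    suggestions = defaultdict(list)
    for title, wikitext in pages.items():
        for collocation in extract_collocations(wikitext, lang):
            for word in LINK_RE.findall(collocation):
                word = word.strip()
                if word == title or word not in pages:
                    continue
                already_listed = extract_collocations(pages[word], lang)
                if collocation not in already_listed and collocation not in suggestions[word]:
                    suggestions[word].append(collocation)
    return dict(suggestions)


# Hypothetical miniature "dump": page title -> wikitext.
SAMPLE_PAGES = {
    "mistake": "# An error.\n#: {{coi|en|to [[make]] a [[mistake]]}}\n#: {{uxi|en|I made a mistake.}}",
    "make": "# To create.\n#: {{uxi|en|make a cake}}",
    "admit": "# To concede.\n#: {{coi|en|to [[admit]] a [[mistake]]}}",
}

if __name__ == "__main__":
    print(lemmas_missing_collocations(SAMPLE_PAGES, "en"))
    print(suggest_cross_additions(SAMPLE_PAGES, "en"))
```

Because coi and uxi are distinct template names, the usage example on the sample page for make is not counted as a collocation, which is the distinction the categorization argument relies on.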

New parameter for synonyms of collocations?

 * If one wants to mention a collocation (entry-worthy or not) which is equivalent to / synonymous with another collocation as a whole, there's currently no good way to accommodate that.

This may not be the most convincing example (I've been digging through my contributions to find others, but no luck so far, though I'm absolutely certain I've done that elsewhere), but see, interjection, usex 3:

doesn't strictly belong there: it's not a usage example / collocation of, but a synonym of bien fait pour ta gueule.

Do you think it would be worthwhile to add a parameter to ? PUC – 16:31, 3 April 2022 (UTC)

Implications of this vote
While it has been discussed that collocations will partially take over the role of usage examples, the proposal may also have an impact on entries. Many terms that some users regard as SoP may be RFD-deleted with the justification that they could be better represented as collocations, following changes in our policy due to this proposal. Looking at the backlog at WT:RFDE, it seems like disputed entries such as mink coat, nature-lover, breakthrough infection, smooth-running, many-branched, subway car, three-forked, morning grouch, two-hand sword, etc. may no longer remain as independent entries due to the new proposal. Since this proposal will drastically affect our entries, I’m letting some editors know about this vote:  ·~   dictátor · mundꟾ  22:49, 11 April 2022 (UTC)
 * Thanks for the ping- as an example could you explain how policy relative to nature-lover is changed by this vote and why it would be more likely to be removed as an entry? Thanks. --Geographyinitiative (talk) 23:20, 11 April 2022 (UTC)
 * While this vote has not explicitly stated that collocations can’t be created as entries, the idea of this vote traces back to a previous BP discussion where editors were thinking of a solution to the ongoing ‘SoP dilemma’. And of course, entries like nature-lover will eventually get RFD-deleted because only 4 editors want to keep the entry, as opposed to 9 editors who want it deleted. ·~   dictátor · mundꟾ  19:57, 12 April 2022 (UTC)
 * On the basis that Collins Dictionary and Lexis have entries for nature lover, I don't believe that either nature-lover or nature lover can ever be permanently deleted from Wiktionary because the question will always remain: why do the legitimate, authoritative English language dictionaries have this word, but the policies of the crowd-sourced internet dictionary block inclusion of this word? Me saying this is not to disparage Wiktionary; it has strengths that can make it better than those more authoritative dictionaries. But the OCLC-infused cites will always be ready and waiting on the Citations page when the inevitable successful challenge to reverse any spurious future deletion comes. There is no policy here that will ever successfully trump the actual authorities. --Geographyinitiative (talk) 20:07, 12 April 2022 (UTC)


 * On the other hand, your argument would seem to suggest that any mistake by the mainstream dictionaries must propagate to us, and we cannot be better by omitting their mistake. Aren't we descriptive, and not prescriptive? Copying another dictionary makes us always slow, lagging behind, and copying something we don't understand. Better to do our own work. Equinox ◑ 06:19, 24 April 2022 (UTC)
 * Well, they should serve as guidance, not as a necessary reference. ·~   dictátor · mundꟾ  23:52, 25 April 2022 (UTC)