Wiktionary:Votes/2014-01/Treatment of repeating letters and syllables

Treatment of repeating letters and syllables
Pursuant to the discussions linked below, I propose to amend the WT:CFI to state the following immediately after section WT:CFI, as a section with a level-3 heading:
 * Voting on: Treatment of repeating letters and syllables
 * Vote starts: 00:01, 29 January 2014 (UTC)
 * Vote ends: 23:59, 27 February 2014 (UTC)


 * Vote created: bd2412 T 19:40, 22 January 2014 (UTC)
 * Discussion:
 * Beer parlour/2013/July
 * Talk:hahaha

Support

 * 1)  TeleComNasSprVen (talk) 02:01, 2 February 2014 (UTC)
 ** Amend for meta-vote; I agree on removing the 'hard redirect' portion and I have also suggested removing the part after 'The above treatment may be overriden by consensus' as being superfluous to the current state of affairs. TeleComNasSprVen (talk) 10:30, 16 February 2014 (UTC)
 * 2)  Equinox ◑ 02:05, 2 February 2014 (UTC)
 * 3)  --WikiTiki89 02:06, 2 February 2014 (UTC)
 * 4)  But do we need to include the definition of "hard redirect"? Surely this should be on some other page than CFI, and we can link to it there. This, that and the other (talk) 06:41, 2 February 2014 (UTC)
 ** I don't see how it hurts. --WikiTiki89 06:53, 2 February 2014 (UTC)
 * 5)  (although I can see myself proposing to remove the definition of a hard redirect the next time we have a vote on removing unnecessary clutter from the CFI page) - -sche (discuss) 19:02, 2 February 2014 (UTC)
 * 6)  as nom. Cheers! bd2412 T 16:06, 10 February 2014 (UTC)
 * 7)  and I would like to second TTATO/-sche. Maybe if User:BD2412 agrees, we can count this as consensus (nobody has disagreed) and remove that bit from the version that gets copied into the CFI. —Μετάknowledge discuss/deeds 02:27, 12 February 2014 (UTC)
 ** I have no objection, and in fact that was not part of my original language. It was added by Dan Polansky, and I think that he has already expressed a willingness to see it removed. bd2412 T 05:21, 12 February 2014 (UTC)
 *** I don't actually agree with removing the clarifying bit about what "hard-redirect" is supposed to mean, so no consensus on this one. I don't believe the meaning of the term is so obvious that clarifying it harms CFI. If a supermajority wants to have the clarifying bit removed, the bit will get removed, but removing the bit is not proposed in this vote, and most voters do not comment on a proposal to remove the bit. --Dan Polansky (talk) 16:32, 14 February 2014 (UTC)
 **** Sorry, I thought I had seen you say that elsewhere, but couldn't find the conversation. Since hard redirects are used in circumstances other than these, perhaps we can merely provide a link to Redirections, or Help:Redirect. bd2412 T 15:29, 16 February 2014 (UTC)
 ***** No hard feelings. As for your proposal, neither Redirections nor Help:Redirect mentions, let alone defines, "hard redirect" or "hard-redirect", and the two pages are not policies, unlike WT:CFI. Furthermore, I don't get what is wrong about defining a term at a location that is very close to the only use of the term in CFI. Generally, I don't see anything wrong about documents defining their terms; this is what contracts usually do, as well as mathematical articles. Moreover, directing readers to other pages instead of providing them with a 15-word definition in place is bad usability, IMHO. --Dan Polansky (talk) 18:00, 16 February 2014 (UTC)
 * 8)  --Anatoli (обсудить/вклад) 03:37, 12 February 2014 (UTC)
 * 9)  Jamesjiao → T ◊ C 03:40, 12 February 2014 (UTC)
 * 10) I support this on the assumption and stipulation that, should the long word be found to be a word in another language (where it's not repetition or where it's repetition of a different word), be as common as or more common than the shorter word, or have a different or additional sense or even some interesting etymology, it not be a hard redirect. —msh210℠ (talk) 18:45, 14 February 2014 (UTC)
 ** Re "on the assumption [...that...] should the long word be found to be a word in another language [...] it [will] not be a hard redirect": yes, this is the case, hence the vote says "having no other meaning in any language". Re if the long, repetitive word is more common than the shorter word, or has an "interesting etymology": the text to be added to CFI says that the general rule of redirecting repetitions "may be overriden by consensus, for example where a variation having four repetitions is more common", but there is no automatic exception to the rule. - -sche (discuss) 18:59, 14 February 2014 (UTC)
 *** I… missed those bits somehow. I'd prefer the last paragraph of the text be more automatic ("any editor can… subject to being overridden by consensus" or something). But I suppose this is okay, too. Consider my vote unconditional. —msh210℠ (talk) 20:12, 14 February 2014 (UTC)

Oppose

 * 1)  I think this proposal has drawn some very reasonable inclusion criteria. But I ultimately think CFI should be modified to completely disallow entries for repetitive forms. I don't think entries like dinoosaur, dinooosaur, and dinoooosaur ("Why did you like the movie?" "It had dinooosaurs.") add value to Wiktionary (and I say this as someone who considers myself an inclusionist). There are infinitely more possible repetitive forms than we have proper entries, and so I think this policy change would open the door to what I view as a lot of unnecessary clutter. I also foresee it adversely impacting the usefulness of the search bar's autocomplete feature. -Cloudcuckoolander (talk) 07:56, 2 February 2014 (UTC)
 * Re: "There are infinitely more possible repetitive forms than we have proper entries, ...": Any evidence? Recall that this vote only deals with attested repetitive forms. Your "infinitely more" I read as "several times more" to make it meaningful and testable. Your example "dinooosaurs" is unattested, AFAICS. --Dan Polansky (talk) 07:58, 2 February 2014 (UTC)
 * You don't know that any given repetitive form is unattestable until you take the time to look for cites and rule it out. There are thus an infinite number of possible repetitive forms to be considered. And even if a very small fraction of those infinite possibilities turn out to be attestable, that could still mean the addition of a very large amount of what I view to be clutter to Wiktionary, and a lot of time expended trying to attest things like dinooosaur (this was just an example) when we could be making headway in other areas. -Cloudcuckoolander (talk) 08:42, 2 February 2014 (UTC)
 * First, the not-so-useful figurative language aside, there is not an infinite number of attested repetitive forms, since the number of items in all man-made records of language including the web is finite. Furthermore, you have not provided any evidence supporting your claim that there is a huge number of attested repetitive forms; I do not believe there is. --Dan Polansky (talk) 09:08, 2 February 2014 (UTC)
 * On the flipside, you're asserting the existence of a small, manageable number based on your own reasoning (i.e. without hard empirical evidence). The only way to conclusively establish anything either way would be to try to systematically attest every imaginable repetitive form from aaardvark down. I don't think that's feasible. -Cloudcuckoolander (talk) 09:43, 2 February 2014 (UTC)
 * I think the burden of proof is with those who assert existence, not non-existence--that is, existence of a huge number of entries; there being some attested repetitive entries has already been proven. If you ask that a policy be changed, you should provide more than a wild speculation, I think. --Dan Polansky (talk) 12:32, 2 February 2014 (UTC)
 * I think Cloudcuckoolander misses the point. Under the current CFI, if dinoooosaur is attested then it gets an actual entry. Under this proposal, it gets only a redirect. For the record, of course, dinoooosaur gets zero Google Books hits. bd2412 T 14:01, 2 February 2014 (UTC)
 * I acknowledge that there is currently a policy vacuum and that this proposal seeks to address that situation by setting some guidelines. That's basically what I was trying to get across with the "drawn some very reasonable criteria for inclusion" comment. But I personally don't see explicitly allowing entries for some repetitive forms as being preferable to implicitly allowing entries for all repetitive forms. I think such entries would take up space without adding any informational value to Wiktionary, and thus I think policy should be modified to disallow them entirely. I don't see redirects as a solution, either. -Cloudcuckoolander (talk) 18:19, 2 February 2014 (UTC)
 * It is always possible that a non-native English speaker will come across pleeease or hahahahaha or gooooo in a book (as they are out there) and not know what they mean, since they are different from their usual representations. This way, at least, there's an entry explaining that gooooo is an emphatic variation of go and not of goo, and explaining the actual meanings of the others. bd2412 T 22:50, 2 February 2014 (UTC)
 * 2)  Oppose The criterion of three repetitions is completely arbitrary and doesn't make sense in any form. -- Liliana • 14:22, 9 February 2014 (UTC)
 * Couldn't the same be said of the three-citations rule of CFI? Equinox ◑ 22:59, 9 February 2014 (UTC)
 * That one makes sense to determine if a word has entered the English lexicon and thus is worthy of inclusion. In practice, at least, it seems to work fairly well. But so far, no one has explained how hahaha is any more worthy of inclusion than hahahahaha. Where's the difference? -- Liliana • 23:07, 9 February 2014 (UTC)
 * The three repetitions limitation is not arbitrary, but is selected because it is very uncommon for words to have more than three repetitions of a letter or syllable in circumstances other than an emphatic form of a normal, shorter version. Note that this proposal only applies to words where there is no change in meaning with the addition of more repetitions. Under our current CFI, hahaha is included, and the current CFI would include hahahaha and hahahahaha and hahahahahaha and hahahahahahaha and hahahahahahahaha and hahahahahahahahaha and hahahahahahahahahaha and hahahahahahahahahahaha and numerous others. This proposal is intended to avoid having a large number of duplicative entries where the only difference is that a letter or syllable is repeated, with no change to the meaning of the term. The repetition of syllables is itself arbitrary, but it forms words that we would otherwise include. Of course, this is more pronounced in considering letter repetitions, such as gooo, goooo, gooooo, goooooo, gooooooo, goooooooo, and so on. Note that defaulting to the two-letter repetition would not be useful, because in a case like this, it would be to the existing word goo. Three is both the least that can be used without a likelihood of stepping on other words, and the most that is needed to indicate that the repetition is merely for emphasis. bd2412 T 16:06, 10 February 2014 (UTC)
 * Hmmmmmmmmmmmm, I originally thought it was arbitrary as well, but that seems like such a well-thought-out argument that it makes a lot of sense, especially with English and languages based on the Latin alphabet. It's extremely rare for an entry to contain more than two of the same vowel in a row and still be considered a separate word with a distinctly different meaning, in most of the European languages, and I would almost claim nonexistence. BTW, I fixed your link above. TeleComNasSprVen (talk) 08:21, 11 February 2014 (UTC)
 * 3)  How would this proposal deal with cases like goooooo, which is attested as a repetitive–emphatic form of both go and goo? It can't hard-redirect to both. IMO, this regulation requires too many exceptions, and provides too little benefit, to be worth instituting. BTW, I'm sorry I didn't raise this objection at an earlier stage, before the vote had begun. — I.S.M.E.T.A. 16:59, 23 February 2014 (UTC)
 * The proposal specifically only addresses terms "having no other meaning in any language". If this claimed use of "goooooo" can be attested, then the rule would not apply to it, and we would instead have an entry at goooooo with two definitions, one being "an emphatic form of go" and the other being "an emphatic form of goo". This has no bearing on whether we should also have entries for the single-meaning attested words, goooooal and goooooooal and goooooooooal. bd2412 T 01:50, 24 February 2014 (UTC)
 * Sorry, my mistake; I had read "having no other meaning in any other language" into the text of the proposal. You're probably right that goooooal, goooooooal, and goooooooooal (all of which are b.g.c.-attestable) are only attestable as repetitive–emphatic forms of the English word goal. But what about interjections generally? Interjections are a class of words particularly susceptible to letter-repetition for emphasis, and they also have the translinguistic property of tending to be short words. For example, consider ja and jaa, whose pages have entries in thirty-eight and three languages, respectively. In various languages these words have meanings akin to the English umm, yes, oh, and aye. Those are exactly the kinds of words you would expect to see lengthened for emphasis, and my contention is that letter-repetition is a common method of expressing that lengthening in writing, in languages that use the Latin alphabet. Consequently, one should expect to see many homographic repetitive–emphatic forms across languages (far more than within languages), all of whose entries will be exceptions to the regulation you propose. (To express this concretely, I mean, for example, that there will be pages for strings like jaaaa, jaaaaaa, jaaaaaaaaa, jaaaaaaaaaaaaaa, etc. which will be repetitive–emphatic forms in multiple languages, representing ja and/or jaa.) There is another problem with this, which is that whereas unique repetitive–emphatic forms redirect to the repetitive–emphatic form with three repetitions, non-unique repetitive–emphatic forms (soft-) redirect to the lemma form; to deal with this, will there need to be duplicated usage notes, both at the lemma's page and at the three-repetition form's?
 * As I wrote above, it is my opinion that this proposed regulation will add too much complexity (because of the large number of exceptions to it that will exist), whilst providing proportionately too little benefit. The benefit it provides (according to your post above in response to Liliana, timestamped: 16:06, 10 February 2014) is that it "avoid[s the English Wiktionary] having a large number of duplicative entries where the only difference is that a letter or syllable is repeated, with no change to the meaning of the term." I agree that entries for repetitive–emphatic forms have little value; the worst thing about them is that they create undesirable blue links and their presence clutters up the top–right-hand search bar's autocompleted-results list. However, those drawbacks are features of soft- and hard-redirect entries alike. Conversely, hard-redirect entries have the added drawbacks of being both less informative and far easier to create than soft-redirect entries (alongside the attendant problems I described in my first paragraph). If entries of any kind (hard-redirect ones included) for repetitive–emphatic forms are worth having, then they should be attested and (at least minimally) informative.
 * Consider the analogy of how to treat inflected forms. Verbs across the Romance languages share the same spellings for a lot of their conjugated forms, which is one very good reason for giving them short, minimally informative, grammatical form-of definitions which then link to the lemmata for the given languages. But why don't we just hard-redirect pages for strings, when the string in question only exists as the spelling of one form of one word in one language? For instance, why not hard-redirect pages like respondebat, which exists only as the Latin respondēbat, the third-person singular imperfect active indicative form of respondeō? Why isn't that the default regulation, with soft-redirects being created only in the (very many) exceptional cases? (I admit that this case isn't strictly analogous, because those words are usually pronounced differently from the lemma, and thus, their entries can be furnished with useful, non-duplicating pronunciatory transcriptions; although, in practice, they seldom are. Likewise, there is nothing to stop entries for repetitive–emphatic forms being given anagrams sections, for example.)
 * Please excuse the very great length of my response. I hope it doesn't get tl;dr'd by everyone. :-S  — I.S.M.E.T.A. 17:14, 24 February 2014 (UTC)
 * The question, in my view, is whether this is an improvement over the rules as they currently stand. Under the CFI as it exists right now, we should have individual identical entries for the two dozen or so attested variations of "goooal" having additional instances of the letter "o", and the similarly formed attested variations of "pleeease" and "heeelp" and "hellooo". With respect to emphatic elongations, I don't think that a hard redirect to a shorter emphatic form would be any less informative than an individual entry containing exactly identical content. bd2412 T 21:21, 24 February 2014 (UTC)
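
As a rough illustration of the rule debated in this section (a form with more than three repetitions of a letter or syllable pointing to the form with exactly three), the mapping can be sketched in Python. The function name `three_repetition_form` and the regular expression are assumptions made for this sketch; they are not taken from the proposal or from any Wiktionary tooling. The sketch also ignores the proposal's other conditions, such as attestation and "having no other meaning in any language".

```python
import re

# Hypothetical helper, not part of the proposal text.
# Matches the shortest unit (a letter or a syllable) that occurs at least
# four times in a row, so that the run can be cut down to three repetitions.
_RUN_OF_FOUR_OR_MORE = re.compile(r"(.+?)\1{3,}")


def three_repetition_form(word: str) -> str:
    """Return the word with every run of 4+ repetitions reduced to 3."""
    return _RUN_OF_FOUR_OR_MORE.sub(r"\1\1\1", word)


def would_be_redirected(word: str) -> bool:
    """True if the word differs from its three-repetition form."""
    return three_repetition_form(word) != word


if __name__ == "__main__":
    for form in ["gooooo", "hahahahaha", "goooooooal", "hahaha", "goo"]:
        print(form, "->", three_repetition_form(form))
```

In this sketch gooooo maps to gooo rather than goo, which mirrors the point made above that a two-repetition default would collide with an existing word.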

Abstain
 * 1)  --Dan Polansky (talk) 10:00, 1 February 2014 (UTC)
 * Challenge accepted, and by the way, BD2412, there's already a self-generated report at Special:ListRedirects. TeleComNasSprVen (talk) 01:36, 3 February 2014 (UTC)
 ** Any reason in particular? The current CFI defaults to having an individual entry for every attested form (e.g. gooooooal, goooooooal, gooooooooal, goooooooooal, gooooooooooal, etc.). bd2412 T 00:29, 2 February 2014 (UTC)
 * I don't believe hard-redirects for repetitive forms are better than soft-redirects. For soft-redirects, you do not need to add anything to CFI, so you can avoid making CFI longer and more complicated. I do not recall anyone explaining why having hard-redirects for those attested forms is preferable to having soft-redirects. We are planning to have soft-redirects for a plethora of obsolete forms (such as those of the word knowledge); we are having soft-redirects for inflected forms; we are having soft-redirects for romanizations; I do not see why the repetitive forms should get a different treatment. --Dan Polansky (talk) 07:22, 2 February 2014 (UTC)
 * Hard redirects do not need to pass RFV, and therefore we can go and create hard redirects for everything from gooooal to gooooooooooooooooooooooooooooooooal without verifying that each one of them actually exists. Not to mention that none of these spellings will ever waste anyone's time being nominated for RFV. --WikiTiki89 07:33, 2 February 2014 (UTC)
 * Not really. The vote only treats attested forms, as per its text. So unattested forms can still be sent to RFV and fail. At least one of the supporters of the vote wanted to have the repetitive forms completely removed from Wiktionary, real or not, attested or not. --Dan Polansky (talk) 07:39, 2 February 2014 (UTC)
 * Since when do we RFV hard redirects? --WikiTiki89 07:45, 2 February 2014 (UTC)
 * I do not know of any precedent of either RFV-failed or RFV-passed hard-redirect. However, I see no reason to protect hard-redirects from RFV. Creating hard-redirects for unattested "goo{n}al" for a large number of n such as 1...1000 seems a truly silly and useless idea, and I oppose this. In general, I oppose creation of unattested hard-redirects. The thing is, hard-redirects are anomalies. Today's CFI does not treat them in any particular way. I still see no reasoning supporting the claim that hard-redirects are a good thing; if they were really exempt from RFV, that would make them a truly bad thing in my eyes.
 * Be it as it may, this vote only treats attested forms, as per its text; it says nothing of how we should treat unattested forms. I still do not see you acknowledging as much. --Dan Polansky (talk) 07:54, 2 February 2014 (UTC)
 * This vote allows us to create hard redirects despite the form being attested. And you can't create a "goo...oal" with 1000 o's because the maximum entry name length will stop you before then. --WikiTiki89 07:59, 2 February 2014 (UTC)
 * Do you acknowledge that this vote does not treat unattested forms? --Dan Polansky (talk) 08:05, 2 February 2014 (UTC)
 * Yes, but as you said, CFI doesn't treat unattested hard redirects at all anyway. The only hard redirect rules I know of are (1) that they need to make sense and (2) they should not get in the way of other words that meet CFI. --WikiTiki89 08:07, 2 February 2014 (UTC)
 * Why do you state something that this vote does not treat as an advantage brought about by this vote? --Dan Polansky (talk) 08:11, 2 February 2014 (UTC)
 * It is an advantage brought about by this vote. We no longer need to differentiate between attested and unattested forms that meet the other requirements. --WikiTiki89 08:15, 2 February 2014 (UTC)
 * I find your response implausible, but I will drop this particular item, since I did as much as I feel I could to make the point and get the message home.
 * On the desirability of including unattested repetitive forms (of which this vote does not treat): Only a tiny fraction of repetitive forms are attested. If hard-redirects of unattested repetitive forms were allowed, this would lead to an inclusion of a hard-to-overview gigantic list of items, including the repetitive forms of unattested "dinoosaur". The creators of these unattested redirects could just pick a vowel in any of a huge variety of words and repeat it at whim. So I find it hard to believe you find an indiscriminate inclusion of hard redirects for unattested repetitive forms a good thing. --Dan Polansky (talk) 08:23, 2 February 2014 (UTC)
 * But since there's still no policy on unattested hard redirects, you can also delete any that are not useful. --WikiTiki89 08:41, 2 February 2014 (UTC)
 * A nice turn the other way around.
 * In any case, let it be on record that, AFAIK, the common practice is to exclude unattested hard-redirects; I know of no counter-example, that is, no included unattested hard-redirect. --Dan Polansky (talk) 08:48, 2 February 2014 (UTC)
 * Generally, no one knows much about what kinds of hard redirects exist, because people rarely stumble upon them. --WikiTiki89 08:56, 2 February 2014 (UTC)
 * My hypothesis that, as of now, there are no unattested hard redirects is a strongly universally quantified hypothesis, and thus amenable to Popperian falsification. Thus, go ahead, find a single counter-example, and I will have no other option than to stand corrected. --Dan Polansky (talk) 09:11, 2 February 2014 (UTC)
 * Well, I've found a couple below, now you go check to make sure. TeleComNasSprVen (talk) 10:30, 16 February 2014 (UTC)
 * Probability says there has to be one. But the situation is complicated by you being able to delete any example I find, not to mention me being able to create an example at will. --WikiTiki89 09:14, 2 February 2014 (UTC)
 * Re: "Probability says there has to be one." It does not. Probability does not say that there necessarily is one. If you create a new one at your will, it will obviously not count, since I expressly said "..., as of now,...". Aren't you ashamed of your lame lawyering, trying to find a loophole even when there is none? I don't think I will write any more answers to your posts in this thread. --Dan Polansky (talk) 09:22, 2 February 2014 (UTC)
 * I'm not trying to find a loophole, I'm just explaining why it's a waste of my time to look for a hard redirect that would not pass RFV. Not least because determining that something does not pass RFV usually requires RFVing it. --WikiTiki89 09:27, 2 February 2014 (UTC)
 * Hogwash. A form with zero Google web hits is unattested; no need to go for RFV. --Dan Polansky (talk) 09:32, 2 February 2014 (UTC)
 * That's even more unlikely to find than something that does exist on the web, but without enough valid citations. --WikiTiki89 09:46, 2 February 2014 (UTC)
 * I bet that a bot could be programmed to find all of the hard redirects in Wiktionary, test them against Google Books, and generate a report. bd2412 T 14:04, 2 February 2014 (UTC)
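
The report suggested in the last comment could look something like the following sketch. Everything here is hypothetical: `zero_hit_redirects` is an invented name, the hit-count lookup is passed in as a plain function because no real Wiktionary or Google Books API is assumed, and the sample counts are placeholders apart from the zero for dinoooosaur, which is the figure quoted earlier on this page. Following the argument above, only forms with zero hits are flagged; a non-zero count says nothing about whether a form would pass RFV.

```python
from typing import Callable, Iterable, List


def zero_hit_redirects(
    titles: Iterable[str],
    count_hits: Callable[[str], int],
) -> List[str]:
    """Return, in sorted order, the redirect titles with zero hits.

    `count_hits` is a stand-in for whatever corpus lookup a real bot
    would use; it is an assumption of this sketch, not a real API.
    """
    return sorted(title for title in titles if count_hits(title) == 0)


# Placeholder data for illustration only. The count for "gooooo" is made up;
# the zero for "dinoooosaur" follows the figure quoted in the discussion.
sample_hits = {"dinoooosaur": 0, "gooooo": 5}

if __name__ == "__main__":
    for title in zero_hit_redirects(sample_hits, sample_hits.get):
        print("no hits:", title)
```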

Decision

 * Passes with 10 voters in support, 3 in opposition, and 1 abstention. That means 71% of the users who edited this page supported the proposed amendment of CFI, 21% opposed it, and 7% abstained; alternatively, if (per our usual practice) abstentions are not counted, 77% of users supported the proposal and 23% opposed it. - -sche (discuss) 23:53, 17 March 2014 (UTC)
 * The first counting should not be provided at all, since it is irrelevant; it is confusing the reader to start a list of countings with one that is irrelevant, and introduce the relevant one with "alternatively". --Dan Polansky (talk) 12:07, 22 March 2014 (UTC)
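
For readers who want to check the two ways of counting given in the decision, the arithmetic can be reproduced with a short Python sketch. The function name `tally` and the rounding to whole percentages are assumptions of this example; the vote counts (10, 3, 1) come from the decision above.

```python
from typing import Dict, Tuple


def tally(support: int, oppose: int, abstain: int) -> Dict[str, Tuple[int, ...]]:
    """Compute rounded percentages with and without abstentions."""
    everyone = support + oppose + abstain
    cast = support + oppose  # abstentions left out, per the usual practice
    return {
        "with_abstentions": tuple(
            round(100 * n / everyone) for n in (support, oppose, abstain)
        ),
        "without_abstentions": tuple(
            round(100 * n / cast) for n in (support, oppose)
        ),
    }


if __name__ == "__main__":
    result = tally(support=10, oppose=3, abstain=1)
    print(result["with_abstentions"])     # (71, 21, 7)
    print(result["without_abstentions"])  # (77, 23)
```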