Wiktionary talk:Votes/pl-2022-01/Handling of citations that do not meet our current definition of permanently archived

Online-only sources
@Kiwima Regarding the line “if sources are online only, media explicitly allowed by a vote on the RFV page lasting at least two weeks from the time the term is cited”, I am interpreting this as: someone brings online-only quotations for a term at RFV, which would not be accepted by the "durably archived" standard; so editors vote on whether to accept them ⇒ if the online-only source is good and legitimate and not utter nonsense, then editors vote support for it and it is accepted as a source. I slightly dislike this, because at the RFV page it may not get the attention it needs and the vote may just pass without anyone caring. I think a better solution might be a BP discussion to allow any given website (so that it gets more eyes), and allowing all websites from Google News search. —Svārtava [t•c•u•r] 04:28, 6 January 2022 (UTC)
 * If there is a better place to hold such votes, I am happy to accommodate it. We could even create a new page for such votes. I proposed the RFV page, because that is where words are challenged and then deleted because the sources are all online, in spite of the words being obviously in common use. I believe that most people who care about whether words that have already been added are actually used visit the RFV page at least semi-regularly. I am not sure that BP gets more eyes than RFV, and yet another BP discussion is unlikely to move us any further to resolving this issue than the previous ones have. Kiwima (talk) 00:17, 7 January 2022 (UTC)
 * @Kiwima What about allowing the websites in google news search? —Svārtava [t•c•u•r] 03:21, 7 January 2022 (UTC)
 * I am certainly happy with that, but it does not completely solve the problem. Terms like PMV, sniddy, Talibro, etc. are in fairly common use, but within different internet subcultures. Kiwima (talk) 20:45, 7 January 2022 (UTC)
 * @Kiwima I mean, allowing all google news search websites + any other website(s) approved through a vote on RFV page. I think that would be an improvement too, and news websites are almost never utter nonsense, and the other websites can be voted upon. —Svārtava [t•c•u•r] 03:07, 8 January 2022 (UTC)

Vague
What was formerly a reasonably objective process will instead cater to the whims of the moment. We can mass import definitions from Urban Dictionary. Better to have a standard that distinguishes professionally edited material from keyboard spew. For professionally edited material, like newspapers before they downsized their editorial staff, if it's been on the internet consistently and findably for 20 years it's probably good enough despite being vulnerable to EMP. For the latter -- Twitter, self-published books, online comments sections -- we should adapt the "clearly in widespread use" rule, perhaps including mentions in reliable sources as evidence of use. Vox Sciurorum (talk) 18:34, 6 January 2022 (UTC)
 * While the current process is objective, it is becoming more and more inadequate as written material becomes more and more online-only in blogs, e-magazines, etc. We already use a voting procedure in requests for deletion, and no one complains that THAT "caters to the whims of the moment". I very much doubt anyone is going to support mass importation of definitions from Urban Dictionary - there is no way such a vote would pass. As we are supposedly a descriptivist, not a prescriptivist, dictionary, we want samples of language as it is used, not as professional editors think it should be used. We basically never use the "clearly widespread use" rule, because no one seems to agree on what that means, beyond words that no one would ever challenge. Kiwima (talk) 00:11, 7 January 2022 (UTC)


 * I see the "clearly widespread use" rule as precisely that: preventing (bad-faith, trollish) challenges of things like or ! Equinox ◑ 15:26, 12 January 2022 (UTC)


 * @Vox Sciurorum I have to say I too was initially nervous about the idea of turning RFV from a relatively objective process to one that is more consensus-based, but the more I think about it, the more I realise we will never find a truly objective solution that properly gets to the heart of this problem. So I'm definitely willing to give this proposal a go. You're always welcome to vote against "keyboard spew" in the individual discussions if this passes - I'm certain you wouldn't be alone in doing that, and in time, people will get a feel for which online sources are considered acceptable and which are not. This, that and the other (talk) 00:59, 9 January 2022 (UTC)


 * With the current formulation, “print, or”, we have to debate whether this is an exclusive or, that is whether there is media that is not print and not agreed per word but still allowable. The first case is the interpretation of “permanently recorded” according to the materialization theory. I emphasized durability as an institutional guarantee. E.g. if a song is, as usual nowadays, distributed across various streaming platforms (paid and non-paid), this is durable for contemporary music measures. But it wasn’t “printed”. Except, ehm, we understand “printed” as widely, as exemplary, for the obligation of a publisher towards its author to copy and disseminate (vervielfältigen und verbreiten) a work.
 * But okay, all is relaxed if it can be voted away.
 * It is strange however that in the current formulation this vote works only “if sources are online only”—not mixed printed online, say one “traditional” printed word plus more?
 * A more concerning issue is that you might need more objective criteria, as Vox Sciurorum’s “standard”, for when a foreign language does not even have the people to vote. Fay Freak (talk) 01:48, 9 January 2022 (UTC)

Efficiency and consistency
I'm happy to see this proposal. It's definitely an improvement over the status quo.

I'm foggy about how to interpret "the time the term is cited". Is that the time the debated citation was added to the mainspace page? It seems like you would want the debate to be allowed to last two weeks from when the RFV was created, instead, since the RFV may be created quite a while after the citation was added to the entry.

A more serious concern is that this proposal implies that every Internet-sourced word would need its own vote to stay on Wiktionary, presuming anybody cares enough to get the RFV process started. That's probably a lot of time spent arguing over substantially the same issues, and opens the possibility of really inconsistent decisions, since different people will show up to different RFV votes. Better to have a preestablished policy that applies to all words, or at least a whole class of words, such as Internet slang in well-documented languages. —Kodiologist (t) 04:08, 10 January 2022 (UTC)


 * I second this, particularly the second half. Rather than fixed criteria about what is "permissible" for attestation, we would have a system of ad hoc votes on every new web-source citation. And because RFVs are routinely archived, there is no easy way to cite prior votes as precedent to justify a new citation in question, unless someone tracks down the archived pages. Finally, RFV is already overloaded, and new votes on each individual web citation would exacerbate the problem. Imetsia (talk) 20:49, 10 January 2022 (UTC)

Redrafting
@Kiwima as I said above, I am in favour of the principle being espoused here, but upon closer inspection I've realised that the drafting of the proposed policy text is lacking, to the extent that I can't support this vote as it stands. In particular, the proposed new sentence of policy text is somewhat contradictory with the paragraph under the "Attestation" heading that's already in CFI. If the vote passes, CFI will look like this:
 * Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived. We do not quote other Wikimedia sites (such as Wikipedia), but we may use quotations found on them (such as quotations from books available on Wikisource). When citing a quotation from a book, please include the ISBN.
 * Allowable media
 * Media is considered allowable if it appears in print, or, if sources are online only, media explicitly allowed by a vote on the RFV page lasting at least two weeks from the time the term is cited.

Here we have the paragraph already in CFI listing three allowable categories of sources: Usenet, books and magazines, and audio and video. Then the new policy text groups sources into two different, mutually exclusive categories: print and online-only. The second paragraph seems to allow any print sources - even things like tourist brochures and supermarket catalogues which are not presently accepted - while it is confusingly silent on widely available songs, movies and TV shows, which are presently accepted.

I would suggest amending the vote so it proposes the following change:
 * Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source.
 * As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Other online-only sources may also contribute towards attestation requirements if editors come to an agreement through a discussion lasting at least two weeks from the time the term is cited.
 * Print media such as books and magazines will also do, particularly if their contents are indexed online.
 * Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived.
 * We do not quote other Wikimedia sites (such as Wikipedia), but we may use quotations found on them (such as quotations from books available on Wikisource). When citing a quotation from a book, please include the ISBN.

My proposed text has two changes from what is currently in the vote - (a) removed the requirement for the discussion to be held at RFV. Perhaps we might want to globally allow, let's say, whitehouse.gov - that discussion would happen at BP. (b) changed "vote" into "agreement" (could be "consensus" too); like RFD it does not need to be a strict vote.

Sorry for the long message! This, that and the other (talk) 14:17, 10 January 2022 (UTC)


 * Sounds good to me. Kiwima (talk) 21:02, 11 January 2022 (UTC)
 * I would suggest removing the line "When citing a quotation from a book, please include the ISBN" or at least changing it so that it is non-mandatory. In practice, a lot of quotes lack an ISBN, and not even Google Books always provides the ISBN, so it's not always easy or possible to find. Also, when do you plan to start the vote? —Svārtava [t•c•u•r] 15:11, 12 January 2022 (UTC)
 * That is just copied from the current text, and is not a requirement. Kiwima (talk) 22:24, 12 January 2022 (UTC)
 * How about changing another line: "As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Quotations from websites appearing in Google news search are also considered valid." or something similar? The websites from Google News search are legitimate 99% of the time, and I would not want every such news website to be voted on when it is without doubt legitimate. —Svārtava [t•c•u•r] 15:16, 12 January 2022 (UTC)
 * That sounds okay to me, but may be more controversial. Kiwima (talk) 22:24, 12 January 2022 (UTC)