Wiktionary talk:Votes/pl-2012-08/Citations from WebCite

I suggest that you transclude this vote on WT:V so it actually gets noticed by a fair amount of people before it starts. --Μετάknowledge discuss/deeds 15:12, 26 August 2012 (UTC)

E-mail
Didn’t you get that e-mail from the owner of WebCite about how many pages are taken down? You should paste it in this vote’s rationale. — Ungoliant (Falai) 15:37, 26 August 2012 (UTC)
 * I've already linked to that discussion and also stated the essence of it in the proposal. Spinning Spark  16:19, 26 August 2012 (UTC)


 * Regardless, I agree with Ungoliant. I think you should put the numbers on the proposal page. There are many people who will not read that far. --BB12 (talk) 17:02, 26 August 2012 (UTC)

In other words . ..
So, as I understand it, this proposal is to allow citations from arbitrary web-sites, but with some extra hoops (higher numeric threshold; and have to register the web-page with WebCite). Is that correct? —Ruakh TALK 16:27, 26 August 2012 (UTC)
 * The proposal is to allow citations from WebCite, as written on the tin. All other requirements remain as before. Spinning Spark  22:19, 26 August 2012 (UTC)
 * Right, but as I understand it, "citations from WebCite" means "citations from arbitrary web-sites", right? I mean, are there web-sites that WebCite will refuse to archive? (O.K., yes, there are some; they respect "do-not-cache" and "no-archive" flags. But broadly speaking, it seems like allowing WebCite approximately means allowing arbitrary web-pages. No?) —Ruakh TALK 22:29, 26 August 2012 (UTC)
 * Wiktionary already allows arbitrary shit -- from usenet. I don't particularly approve of that, but it is not in any way relevant to this proposal. Spinning Spark  22:40, 26 August 2012 (UTC)
 * So, I take it that's a "yes"? —Ruakh TALK 22:45, 26 August 2012 (UTC)

Placement in CFI and first sentence
I'm glad to see you have proposed this!

Would this be a better fit for the attestation section of the CFI?

Also, for the first sentence, how about this: Two citations from WebCite (and other archives the community may deem of similar status) count as one durably archived citation.

--BB12 (talk) 17:06, 26 August 2012 (UTC)


 * Sounds ok to me. Spinning Spark  22:19, 26 August 2012 (UTC)


 * I'm hesitant to change the wording of your proposal. Is that what you would like me to do? --BB12 (talk) 21:04, 27 August 2012 (UTC)
 * I'm fine with either wording, go ahead and change it. Spinning Spark  22:25, 27 August 2012 (UTC)

WebCite's outages
I have some serious doubts about this site's durability, mainly because of technical limitations. For about a month in 2009, and again for a shorter period of time in 2011, WebCite had major outages that rendered all links to it inaccessible. This would make Wiktionary's citation system especially precarious. --Μετάknowledge discuss/deeds 23:05, 26 August 2012 (UTC)
 * Ok, so it went down, but at least someone cared enough about it to get it back online. Libraries sometimes close for maintenance (annoyingly, my town library did recently), but that doesn't mean that the books inside them are not still durably archived. Apparently, the main problem was Wikipedia letting bots loose on it, which caused them to move to better servers. That's all good news, and if Wikipedia can no longer succeed in breaking it then probably no one can. Note also that WebCite claims to share data with digital partners (see here), so in principle citations could be rescued even if WebCite died and went to heaven. Spinning Spark  10:45, 27 August 2012 (UTC)
 * Yes, but those citations would not be digitally archived any more than any other website, so the point would be moot. Google's servers are more impressive by an order of magnitude, but the fact that an outage happened again after it was "fixed" makes me think that there's no long-term reason that we can rely on it being fixed. --Μετάknowledge discuss/deeds 05:01, 28 August 2012 (UTC)


 * MK, would it be more acceptable to you if a disclaimer template were required? --BB12 (talk) 05:13, 28 August 2012 (UTC)
 * It's not that. I trust their citations. It's just that I don't trust their servers. --Μετάknowledge discuss/deeds 05:15, 28 August 2012 (UTC)

Vote posting
This proposal still hasn't been posted on the voting page as MK suggested at the top of this discussion page. I don't know if that means another seven days should be added or not, but it does need to be posted! --BB12 (talk) 00:48, 1 September 2012 (UTC)
 * No, it has been posted. Take a look. --Μετάknowledge discuss/deeds 01:00, 1 September 2012 (UTC)


 * Good grief! I'd looked at that page two or three times over the past week. Thank you for the correction! --BB12 (talk) 01:41, 1 September 2012 (UTC)

Opposition
I am probably going to oppose. I do admit that the "in permanently recorded media" criterion of current CFI serves as a proxy criterion for a different criterion: that the citation is from material that has been subject to copyediting. While the custom has it that Usenet quotations have the same weight as quotations from materials with ISBN or ISSN, that should not IMHO be the case. That opens the question of whether I am taking a prescriptivist stance. I do not know. I am not sure I want to include diacritic-free Czech spellings ("kocka" instead of "kočka"), which are plentifully available on the world wide web, but not in printed books. I do not know what other consequences the proposed regulation would have. We might give it a try, but if so, I would like the factor to be 10 world wide web quotations to 1 ISBN or ISSN quotation. Even that factor could be low. --Dan Polansky (talk) 13:59, 1 September 2012 (UTC)


 * Ditto. —Ruakh TALK 14:07, 1 September 2012 (UTC)
 * I think that at that point, it's a language's editors' decision. If you don't want an entry for "kocka" to exist (and I don't think it should), then you say so at WT:ACS, and whenever one of those appears, it'll get deleted. --Μετάknowledge discuss/deeds 16:23, 1 September 2012 (UTC)


 * I raised this very point in the original bp discussion ("I can't help feeling that the durability issue is not really at the heart of the objection and what really is at issue here is that we want to exclude random, poor quality webpages from the CFI requirements"). I was assured at the time that that was not the case and only durability was an issue, so I am quite pissed that after spending significant time on this proposal it is now being shot down for that very reason. Do you really support the situation where illiterate garbage on usenet has a higher credibility than an informative specialist website carefully maintained by qualified individuals (railwaysurgery.org, for instance)? Wiktionary has already let in the worst crap on the internet by allowing usenet without qualification. What is really needed here is a quality criterion in CFI. Trying to control the situation by shutting the door on every source that might contain some bad stuff will, if taken to its inevitable conclusion, leave Wiktionary with no citable sources at all. Spinning Spark  18:03, 1 September 2012 (UTC)
 * I might oppose based, among other things, on Spinningspark's comment "Wiktionary already allows arbitrary shit". I find that a bit like saying murder already exists in society, so a few more murders won't hurt. Mglovesfun (talk) 18:12, 1 September 2012 (UTC)
 * Except this isn't murder. Chillax. We're just trying to cite more real terms around here. --Μετάknowledge discuss/deeds 18:14, 1 September 2012 (UTC)
 * @Μετάknowledge: Are you sitting down? O.K., good. Then please read Analogy. It's a subtle concept, but once you understand it, it'll totally change your world. —Ruakh TALK 18:22, 1 September 2012 (UTC)
 * I think you know what I meant. That is, I thought you would. Evidently, this needs explanation. Comparing citations to murders is a rhetorical device that I find rather excessive. --Μετάknowledge discuss/deeds 20:02, 1 September 2012 (UTC)
 * Hey, if you can pretend to misunderstand Mglovesfun's comment, then turnabout is fair play. As you very well know, no one is comparing citations to murders; Mglovesfun was simply saying that the existence of some acknowledged evils does not justify the creation of new ones. —Ruakh TALK 20:29, 1 September 2012 (UTC)
 * Yeah, well, therein lies my issue with it. GGC citations are an "evil"? It's news to me that there's any widespread sentiment to this effect. If I do not vote support, it will definitely not be due to this strange feeling that we might as well chuck all the colloquial sources of citations because they tend to have crappier orthography and grammar. --Μετάknowledge discuss/deeds 21:11, 1 September 2012 (UTC)
 * "Arbitrary shit" is an evil. Usenet is an evil to the exact extent that it's describable as "arbitrary shit". Now, we can certainly haggle over the magnitude of the evil — it's a lesser evil, in my opinion, than "chuck[ing] all the colloquial sources of citations" would be — but I think we need to acknowledge that that's what we're doing. —Ruakh TALK 21:55, 1 September 2012 (UTC)


 * @Spinningspark: I think the issue is one of ordering: before we open the door to arbitrary web-pages, we should first figure out what things we value copyediting for. I don't think anyone wants editors to waste their time citing words at RFV, only to have all their cites tossed out a few months later as insufficiently copyedited. —Ruakh TALK 18:22, 1 September 2012 (UTC)

I don't intend to go live with this vote; with this amount of opposition it is just not worth it. It might just be worthwhile, though, to test whether there is any chance of reaching consensus with the opposers on a modified proposal.

@Dan Polansky, I would be happy to see the number of quotations increased, but making it x10 is getting impractical: for a standard entry that would mean 30 citations, and no one is ever going to do that, so you might as well not bother. I am sure I could find a large number of entries where it would be impossible to find anything like 30 durably archived cites. One I dealt with recently at RfV, cob for instance, turned up 3 cites for the building method sense, but there is no way I could have found 30, even if I was willing to write them up.

@Mglovesfun, I would like some clarification of your comment "I might oppose based...on Spinningspark's comment 'Wiktionary already allows arbitrary shit.'" Is that because you feel I have insulted Wiktionary? (not my intention) Or do you think that I am trying to allow "arbitrary shit" into Wiktionary? (also not my intention)

@Ruakh and others, do you believe that Wiktionary has some process for eliminating unwanted citations on usenet from being used? If so, is there any reason the same process could not be applied to WebCite? Why can we not formalise that process into the CFI? Some have said here that the number of citations required of WebCite should be higher. I think that if this is so, the number should be higher for usenet as well, and for the same reason. Do you agree? Spinning Spark  21:02, 2 September 2012 (UTC)
 * Excellent idea. I think the criterion should be something along the lines of (and this is very much just a draft off the top of my head) "A citation counts if the source in which it appears can reasonably be considered free of spelling errors, including deliberate ones". This would also exclude portions of Flowers for Algernon, for example, which IMO is a good thing. —msh210℠ (talk) 16:27, 10 September 2012 (UTC)


 * Ok I'm a bit late, yes it was a bit of a silly, over-the-top analogy. Mglovesfun (talk) 21:07, 2 September 2012 (UTC)


 * FWIW, I agree with Dan Polansky. - -sche (discuss) 21:42, 2 September 2012 (UTC)

@SpinningSpark: There will always be opposition. It still would probably be a good idea to run the vote, but waiting for us to haggle out a new deal wouldn't be bad. The option of allowing RFV consensus to determine whether or not a certain cite is "arbitrary shit" is a little bit unwieldy, but quite possibly the best choice. --Μετάknowledge discuss/deeds 23:35, 2 September 2012 (UTC)

Either durable or not
If the sources are durable then they should count for a single citation, not half. There is little point, apart from exemplification of use, in citing something that isn't durable. Citations are required to prove that the term is legitimate, and one must be able to inspect this far into the future. If we suspect that the website might not be functional in the next half century, then we shouldn't waste our time using it for any quotations, let alone potentially all external links pointing to random instances of use. I don't mind a handful of pages out of millions being pulled. Governments can also ban books, after all. I don't mind if it cites the web at large. I will consider these as I do Usenet, treating them as unedited works and ignoring them altogether if it's not clear that the authors are independent. If the date cannot be pinned then that might come into play for timespan. But these issues have to be decided on a case by case basis, and a properly cited quotation from, say, a well known blogger gets full credit in my book if I know it will still be there regardless of the whims of the blogger or the opinions of his/her heirs. DAVilla 08:35, 9 September 2012 (UTC)
 * Re: "Citations are required to prove that the term is legitimate, and one must be able to inspect this far into the future": I disagree. First, terms are not "legitimate" or "illegitimate"; they are actually used or not. Second, even quotations from sources that are not durably archived may be sufficient evidence of a term's actually being used. Such quotations can serve as evidence as long as their source is still available. --Dan Polansky (talk) 09:39, 9 September 2012 (UTC)
 * And now we're attempting to predict the future. How can anyone know which sources will remain durable and which will not? Well, that's what we're trying to do. Mglovesfun (talk) 09:42, 9 September 2012 (UTC)


 * Fifty years?! I don't know if I can imagine websites still existing fifty years from now.... --BB12 (talk) 18:38, 9 September 2012 (UTC)


 * I feel confident that the content of some websites will still be around in some form fifty years from now. —Ruakh TALK 00:29, 10 September 2012 (UTC)


 * I think that sort of confidence is wonderful; in my opinion, however, the jury is still out as to whether civilization will outlast my lifetime, much less fifty years. --BB12 (talk) 00:35, 10 September 2012 (UTC)


 * Well, look. If the world ends in December 2012, then we only need to worry about whether a given cite will still be around until then. If civilization continues for the next million years, but a new religious movement sweeps the world in May 2017 that results in the destruction of the Web and of all content that originated on the Web . . . then we only need to worry about whether a given cite will still be around until then. Because Wiktionary itself is a web site; once our content disappears, it won't matter whether it will have used to refer to other content that will also have disappeared. —Ruakh TALK 00:49, 10 September 2012 (UTC)


 * There’s the Butlerian Jihad, which will take place in 15900. — Ungoliant (Falai) 01:00, 10 September 2012 (UTC)


 * My point is that fifty years is excessive. --BB12 (talk) 05:25, 10 September 2012 (UTC)


 * Maybe I'm missing something, but . . . aren't you the one who started talking about "fifty years"? So, rather than making up a time-frame and saying it's excessive, why not describe the time-frame that you deem appropriate? —Ruakh TALK 15:16, 10 September 2012 (UTC)


 * No, I'm not. DAVilla did. (Search on half century). --BB12 (talk) 18:24, 10 September 2012 (UTC)


 * I don’t think durability is binary like that. Nothing is 100% durable. You could track down all the copies of a book and burn them, but that would be quite a feat to pull off, and so books are durable enough for our purposes. Websites archived by WebCite aren’t as durable as books for three major reasons: 1. the author of a website can have his page removed, but according to the e-mail SpinningSpark received this is extremely rare and the page can still be inspected on an individual basis; 2. WebCite’s hardware could fail. This happened before, as MK pointed out, but all this shows is that they are backing up their data properly; 3. WebCite could go bankrupt, but according to their FAQ “WebCite® feeds its content to digital preservation partners [...]”. Clearly the makers of WebCite created good measures to take care of these issues, so even though it’s not as durable as printed stuff (and therefore should require more than 3 cites) it’s durable enough for me to consider it a valid source of citations. — Ungoliant (Falai) 01:00, 10 September 2012 (UTC)


 * I wouldn't say categorically that "books are durable enough". There are a lot of self-published books on b.g.c. that can't be found in any library, and I'm betting that many have fewer than a dozen copies. Normal, real, book books are durably archived, but some books aren't. —Ruakh TALK 01:06, 10 September 2012 (UTC)

A question for those who opposed
What sort of regulation and limits would it take for you to support allowing websites archived by WebCite as a source of citations? Or is there no way in hell you would accept it? — Ungoliant (Falai) 13:35, 24 September 2012 (UTC)


 * I'm not quite sure, but I think I would require:
 * That the current concept of "misspellings" (whereby we only include them if they're common) be somehow, cautiously, extended to other sorts of errors. (Unfortunately, this is very difficult to do while remaining descriptivist; but if everyone who uses a word uses it with one specific meaning, except for three random people on Internet fora who've confused it with a similar-sounding word, then we really don't need to include the mistake.)
 * That we require a larger time-range for Web citations than for printed matter; if the earliest or latest cite is from the Web, then the time-range between the earliest and latest should have to be, say, three years. (I'd also like this requirement to be applied to Usenet, but obviously that aspect of it wouldn't affect my vote on a WebCite proposal.)
 * —Ruakh TALK 15:15, 24 September 2012 (UTC)

An example of the usefulness of this proposal is the current RfV for chappy. We cannot find durably archived citations (at least not by online searches), but the sense is clearly in use, the uses are clearly not errors, and it is not a neologism or short-term fashion. Spinning Spark  16:56, 24 September 2012 (UTC)