Wiktionary talk:Votes/pl-2014-06/Allowing attested romanizations

Status quo
AFAIK, there is no policy forbidding attested romanizations from being included. The vote you should have created is "Forbidding attested romanizations unless they have granted an exception". --Dan Polansky (talk) 17:29, 10 June 2014 (UTC)


 * For the benefit of anyone coming here without having read the BP thread, I will note that this subject is discussed in somewhat more detail there. I and some others feel that a vote is necessary to allow romanizations, because until now Wiktionary has tagged romanizations with Template:wrongscript, and/or simply moved them to the correct (native) script and/or deleted them, unless they were specifically allowed by a vote. I have noted that votes have been accepted as necessary even for languages such as Gothic which are more often found in romanized form than in native script, and I have noted that after one vote to allow romanizations of Punic and some other languages failed, a second vote was held (and passed) before romanizations of Punic et al were allowed. (See also Template talk:romanization of.) On the other hand, Dan interprets WT:CFI as not containing an explicit ban on romanizations. The disagreement is somewhat reminiscent to me of those that occur when users argue that citing news websites / blogs / etc is acceptable because, in their interpretation, WT:CFI does not contain an explicit ban on news websites / blogs / etc. - -sche (discuss) 23:34, 10 June 2014 (UTC)
 * It's not really reminiscent. News websites are considered not to be permanently recorded media, a term used in WT:CFI. There is nothing in WT:CFI that corresponds to a ban on romanizations; in particular, there is no exclusion regulation in WT:CFI that uses a broader phrase of which romanizations would be a special case.
 * On another note, the fact that votes were created to explicitly allow romanizations is insufficient evidence for romanizations being forbidden before these votes; it is merely evidence that at least one editor deemed the inclusion of romanizations controversial, or that he intended to have the inclusion explicitly codified. Similarly, the existence of Votes/pl-2014-04/Keeping common misspellings is no evidence for the claim that, before the vote, common misspellings were excluded from Wiktionary per policy or common practice; the contrary is true; I created the vote since some editors started to vote in RFD for exclusion of common misspellings, without linking to relevant policy and contrary to previous common practice. --Dan Polansky (talk) 08:57, 14 June 2014 (UTC)

Drastic simplification
The vote creates an impression of complexity where there is rather little, IMHO. The proposal seems to say not much else but this:
 * "Romanizations shall be subject to WT:CFI, including WT:CFI and WT:CFI, rather than being excluded by default. For some languages, romanizations can be included even if unattested as long as the native-script form being romanized is attested, as per votes establishing that on a per-language basis."

--Dan Polansky (talk) 17:37, 10 June 2014 (UTC)


 * I think drastic simplification of this vote is unlikely to happen. But since multiple editors maintain that romanizations are excluded by default, the following vote should pass with their support: Votes/pl-2014-06/Excluding romanizations by default. The wording of the vote is simple by design. By contrast, Votes/pl-2014-06/Allowing attested romanizations may fail over a disagreement over wording and its implications. --Dan Polansky (talk) 07:04, 14 June 2014 (UTC)

Wording: is attested three times
The wording "is attested three times as per WT:CFI#Attestation" should ideally be improved, IMHO. It incorporates part of WT:ATTEST without incorporating other parts: it incorporates three times, without incorporating e.g. conveying meaning. Furthermore, it misleads, since being attested involves having the requisite number of independent quotations; on a strict reading, "being attested three times" does not really mean anything. IMHO, "is attested as per WT:CFI#Attestation" is the best one can do, since WT:CFI#Attestation already specifies what "attested" means. --Dan Polansky (talk) 18:22, 10 June 2014 (UTC)


 * The issues with saying "is attested as per WT:CFI" are that (a) WT:CFI considers words attested if they're used in well-known works, and moreover (b) words in most of the living languages Wiktionary covers are considered attested if they have "only one use or mention". If one wishes, as BD does, to require that romanizations of words in these languages be attested by three (or more) uses, then it is necessary to spell that requirement out, since it is higher than the requirement that would be imposed by saying "attested per WT:CFI". - -sche (discuss) 18:56, 10 June 2014 (UTC)


 * Re "conveying meaning": I'm not sure it would be wise to include wording about "use" or "meaning" in this vote. If such wording is included, and the vote passes, the users who have already expressed the opinion that romanizations are not "uses" and/or don't convey meaning will presumably continue to express that opinion and thus argue that the vote has not actually allowed some or all of the romanizations which proponents probably read it as allowing. In short, I'd like this vote to clarify the inclusion or exclusion of romanizations, and I think such wording would actually invite continued unclarity and disagreement. - -sche (discuss) 19:08, 10 June 2014 (UTC)
 * I disagree. Compare the following:
 * But, just as always when faced by crisis, the peasant sought solace in the rodnoi village and in the all-protective commune.
 * But, just as always when faced by crisis, the peasant sought solace in the gkkkkg village and in the all-protective commune.
 * I believe that the average reader will assume that the "rodnoi" in the first sentence means something (which they may need to look up in a dictionary), while the "gkkkkg" in the second sentence is meaningless gibberish. Obviously, every word in "ih'dinā l-ṣirāṭa l-mus'taqīma ṣirāṭa alladhīna anʿamta ʿalayhim ghayri l-maghḍūbi ʿalayhim walā l-ḍālīna" or "Henansheng 1937 xianyi shiling zhuangding tongji biao. Zhongguo dierlishi danganguan" conveys meaning, even if assistance is needed to understand that meaning. bd2412 T 21:19, 10 June 2014 (UTC)


 * I disagree with this criterion and I won't be supporting it, except perhaps as a stepping stone. Attestation of the romanization itself should not be required in the case that the transliteration follows a regular and established scheme, such as IAST for Sanskrit. I don't see the practical value in requiring attestation for transliterations separately; it would increase maintenance for us and not serve any purpose to our users to exclude them. After all, does Wiktionary's quality really improve in any way by excluding them? That should really be our primary concern. The criteria we use for Gothic or Japanese are much more workable. 21:21, 10 June 2014 (UTC)
 * There's a separate vote proposed for Sanskrit that doesn't include this provision. However, to the extent that common lay romanizations exist (and do not necessarily follow "a regular and established scheme"), their inclusion should be CFI attestation-based. Including romanizations of certain languages, Cyrillic languages for example, seems to engender more opposition. bd2412 T 21:41, 10 June 2014 (UTC)
 * For Cyrillic I imagine the main problem is that although transliteration standards exist, there are many of them and even then people widely use nonstandard schemes too. Even Wiktionary uses a nonstandard transliteration for some Russian words... 22:56, 10 June 2014 (UTC)
 * Here we go again. None of the "standard transliterations" is used in Russian-foreign language dictionaries. Languages like Russian and Greek are considered easy, in terms of the script. If a transliteration is used, as in textbooks or phrasebooks, it is phonetic, not literal, and is always customised for specific books, and so is the Wiktionary transliteration for Russian. There is no need for romanised Russian or Greek entries, anyway. Specific language policies, common practice and common sense dictate that e.g. Russian is written in Cyrillic, Greek uses the Greek alphabet, Hindi is written in Devanagari and Arabic in Arabic script, not in Roman letters. --Anatoli (обсудить/вклад) 23:37, 10 June 2014 (UTC)
 * We aren't looking to add these because these words need some system of transliteration (at least, that's not my thinking). The whole purpose of maintaining an attestation requirement is so that our entries reflect words as they are used in the real world. In other words, we have entries because readers will see things that look like words that they would reasonably expect that a dictionary would helpfully define for them. I say, let's be helpful. bd2412 T 00:49, 11 June 2014 (UTC)
 * For a simple Russian word like, you will find xorošó/xorošo, horošó/horošo, khorošó/khorošo, khoroshó/khorosho, or more phonetically also appearing in books, phrasebooks - kharasho or harasho. They are all unnecessary, they are not in the native script and we don't cater for all possible transliterations for each language in proper, native script entries, so these are not even searchable.
 * I'm worried about the future state of Wiktionary: if we allow various transliterations as entries, it looks like it's going to be at the expense of native scripts. Users and editors will simply believe that it's OK to write in Roman letters in any language, as was the case with Pinyin and Romaji entries. The active proponents of these introductions are not even actively working in foreign languages, such as Sanskrit, and not planning to, only worried about their romanisations. It's a worrying trend that arguments are used that Sanskrit should be written in Roman letters, not Devanagari. Search functionality can be improved; for that we don't need mass-introduction of romanised entries in languages normally not written in Roman. --Anatoli (обсудить/вклад) 01:06, 11 June 2014 (UTC)
 * Correct me if I'm wrong, but uses in dictionaries and phrasebooks would not count as uses for the CFI, would they? If there is a "worrying trend" to address, it seems to me that it is the practice of authors in the world generally romanizing these scripts. We are merely here to provide definitions for words used by authors. It might be just as worrying that we include common misspellings, which might encourage people to think that these spellings are legitimate, or that we include grammatically disfavored constructions like ain't and eye dialect spellings like cuméquié. I would also note that requiring strict attestation (which the vast majority of our entries do not have) means that we will not be autogenerating these entries, but that editors who wish to make them will have to make them based on finding the attestations first. I do not anticipate a flood of new entries based on these criteria; rather, I expect that very common transliterations and transliterations that are confusingly similar to existing words will be the ones to be made. With respect to these romanizations, would it be helpful if the entries contained a note stating that these words are not usually written in their romanized form? bd2412 T 01:56, 11 June 2014 (UTC)
 * A published phrasebook or textbook may meet CFI by the current proposal (they would also be uses, not mentions). I think hard redirects, using would be sufficient, if the existing search functionality is not helping. If a user successfully arrives at the proper script entry, I don't think they need to be further explained that the Roman spelling they used is the transliteration (usually appears in brackets or in special boxes). Common misspellings are not the same thing, that's a native feature of languages and they are common by definition. I don't quite follow your real intention, motive or interest, looking at your various posts. Are you interested in Sanskrit? Are you learning it? Do you have trouble finding entries? Is it a problem only for words that use diacritics? Any chosen Wiktionary transliteration may not cover all possible ways a transliteration of a word may appear in print, a Sanskrit term ददाति may appear as "dádāti" or as "dadāti". If a word is a proper noun, it may be even considered an English word, it may qualify for inclusion, even if it uses Sanskrit-specific diacritics. --Anatoli (обсудить/вклад) 02:18, 11 June 2014 (UTC)
 * The problem is, I'm not interested in Sanskrit at all. I am interested in being able to define words that I come across. As I have explained elsewhere, I specifically came across mahā while fixing disambiguation links on Wikipedia, and needed a Wiktionary entry to point some of those links to. I searched here, but putting "mahā" in the search engine unhelpfully took me to maha; I was only able to find mahā by using my admin bit (which most readers will not have) to look at the previously deleted article at this title. If mahā happened to be a word in any language written with Latin script, I wouldn't have found it that way either. As it happens, this is not the first transliteration for which I've seen the issue of inclusion arise. I created the entry ayubowan all the way back in 2006. I'm sure I just came across it in a book or a newspaper somewhere. It was there as a Sinhalese entry for over seven years before anyone objected, and after an RfD it was ultimately kept as an "English" word. It may well be a "regional" loanword, but I continue to find it strange that it sits here as part of the English corpus, not the Sinhalese. bd2412 T 02:48, 11 June 2014 (UTC)
 * ayubowan was found to be an English word by Wiktionarians, whether it's true or not, it's beyond the point of this discussion. The proper Sinhalese entry ආයුබෝවන් can be found either by using "ayubowan" or "āyubōvan" in the search, regardless whether ayubowan exists or not. The same story is with "mahā" -mahā, which can link you to महत्, మహా or maha (which now has ). Whether it's easy or hard to find words by their transliterations, it's a technical feature of Wiktionary. It's always harder to find shorter words, even if they're spelled in Roman letters. --Anatoli (обсудить/вклад) 03:00, 11 June 2014 (UTC)
 * Flip the ayubowan situation around, though. If that is "English", what is to prevent the inclusion of every transliteration from any language found in English running text under the theory that it is an "English" loanword? Isn't that just as bad (and apparently permissible, as of now)? bd2412 T 03:38, 11 June 2014 (UTC)
 * I don't think it's the same. English, more than other languages, absorbs words from all over the world. English speakers in Sri Lanka use "ayubowan" even if they speak English, it gives their speech a special flavour. In short, if a word penetrated English (or other language) in some form, then it's English. As I said, it's not an RFV discussion, I'm not trying to prove that "ayubowan" is an English word; if you think that it was not permissible, you can reopen the RFD/RFV. Every transliteration may only be applicable to specific non-Roman languages. It needs to be proven that that transliteration IS indeed an English loanword (without double quotes). Yes, it would be very bad, not just as bad, to report transliterations as English words. You have to be clear what you're really trying to achieve, either have L2 English entries as transliterations or another language entries (soft/hard redirects/full-blown entries). --Anatoli (обсудить/вклад) 03:51, 11 June 2014 (UTC)

Romanizations allowed by other votes
Should a technical note be added to this vote to make explicit that romanizations which are allowed subject to lower requirements (e.g. Punic romanizations and toned Pinyin, which per other votes are not required to be attested at all) are not affected by this vote and continue to be allowed subject to those lower requirements? I don't think it's necessary, but it might prevent people who don't read through the reams of RFD, BP and talk page discussion from thinking that this vote would apply attestation requirements to Punic, Pinyin, etc. PS, "lower requirements" could be changed to "other requirements" if people want to keep the above-mentioned ban on multi-syllabic toneless pinyin intact. - -sche (discuss) 22:31, 10 June 2014 (UTC)
 * There should probably be a link to the agreements that were made in the past concerning transliterations, and explicitly state that the vote only applies to the languages that are not covered by those past votes. 22:57, 10 June 2014 (UTC)
 * I don't think that much specificity is needed. (For one thing, it would require tracking down all the votes Wiktionary has had on transliterations.) And, as I stated, some people may want to overturn the previous arrangements which instituted higher requirements than this vote. - -sche (discuss) 23:38, 10 June 2014 (UTC)
 * I certainly agree that this is not a vote intended to raise the requirements over those struck in any previous discussion. bd2412 T 00:50, 11 June 2014 (UTC)

One possible format of romanization entries
I don't think votes should decide details of format, so I emphasize that this is not part of the proposal, but I couldn't think of a better place to float this thought about one possible format of romanization entries: Have a template similar to, which by default would just display "Romanization of", but which would have a parameter that could be set to specify particular romanization schemes: We'd have to be careful, though: many romanization systems have considerable overlap, so entries like [[i]] could end up looking like: ...or we could just say something like "Common romanization of" in such cases. In fact, if we were clever and wanted to, we could even make it so that it was possible to specify all those parameters, and the template just knew that if more than some number (say, 3) of schemes was specified, it should reduce the displayed text to "Common romanization of". - -sche (discuss) 03:17, 11 June 2014 (UTC)
 * → IAST romanization of foobar.
 * → Nonstandard romanization of foobar.
 * → Nonstandard romanization of foobar. (Notice where the first link goes.) (Alternatively, if we were very clever, we could make the template recognize that the combination of from=nonstandard and lang=ru should result in the link going to Informal romanizations of Russian.)
 * → ISO 1968, ISO 1995, GOST 1971, GOST 2002, ALA-LC and BGN-PCGN romanization of и.
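
A minimal sketch of the display logic floated above, written in Python purely for illustration (an actual Wiktionary template would be written in wikitext or a Lua module). The function name `romanization_label`, its parameters, and the threshold `max_listed` are hypothetical; the only behaviour taken from the comment above is the plain "Romanization of" default, the listing of named schemes, and the collapse to "Common romanization of" once more than some number (say, 3) of schemes is given.

```python
def romanization_label(term, schemes=(), max_listed=3):
    """Build the display text for a hypothetical romanization template.

    term       -- the native-script form being romanized
    schemes    -- names of the romanization schemes that yield this spelling
    max_listed -- assumed threshold; above it, the scheme list is collapsed
    """
    schemes = list(schemes)
    if not schemes:
        # Default: no scheme specified.
        return f"Romanization of {term}."
    if len(schemes) > max_listed:
        # Too many overlapping schemes to list usefully.
        return f"Common romanization of {term}."
    if len(schemes) == 1:
        listed = schemes[0]
    else:
        # Join as "A, B and C".
        listed = ", ".join(schemes[:-1]) + " and " + schemes[-1]
    return f"{listed} romanization of {term}."


print(romanization_label("foobar"))
print(romanization_label("foobar", ["IAST"]))
print(romanization_label(
    "и",
    ["ISO 1968", "ISO 1995", "GOST 1971", "GOST 2002", "ALA-LC", "BGN-PCGN"],
))
```

With six schemes, as in the [[i]] example, the last call collapses to the "Common romanization of" wording instead of listing every scheme. Where the link for "Nonstandard" should point is left out of the sketch, since that is a separate question.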


 * What's the point of envisaging all this chaos? Wouldn't it be more worthwhile to develop gadgets which allow language-specific, transliteration/transcription field-specific searches, or write reverse transliteration modules and use them in an advanced ambiguous transliteration search function? Wyang (talk) 03:54, 11 June 2014 (UTC)

merge the votes
I propose that, instead of having two separate up-or-down votes, the proposal in this vote and the proposal at [[Wiktionary:Votes/pl-2014-06/Excluding romanizations by default]] be merged into one vote that contrasts them and any other options people think of before the vote starts. That would be simpler (one vote) and cleaner (allow people to express their true opinions instead of merely voting yea or nay on the proposers' wording). —msh210℠ (talk) 17:19, 16 June 2014 (UTC)
 * Discussion has continued at [[Wiktionary talk:Votes/pl-2014-06/Excluding romanizations by default]]. —msh210℠ (talk) 23:52, 16 June 2014 (UTC)

User:-sche/svobodnyx
User:-sche/svobodnyx is more than a "modicum of information...". I strongly oppose citations and anything other than headers and links:

This below is the modicum, if we follow other (allowed) romanisation entries

svobodnyx


 * 1) Romanization of свободных.

Citations (if they are required), should be moved to citations page. --Anatoli (обсудить/вклад) 03:11, 17 June 2014 (UTC)


 * The only difference between your format and mine, AFAICT, is that mine includes citations. Because this is a vote on including attested romanizations if and only if they are attested, citations have to be present. They could be included on the citations page, with used in the entry, but that seems entirely unnecessary. - -sche (discuss) 04:41, 17 June 2014 (UTC)
 * That's what I meant - on the citations page, not the entry page, otherwise it looks like a full language entry, which it isn't. --Anatoli (обсудить/вклад) 05:04, 17 June 2014 (UTC)
 * Having the citations only on the Citations: pages as you propose is okay with me. --Dan Polansky (talk) 19:09, 18 June 2014 (UTC)
 * This is exactly what I had in mind. Here's a cite, by the way:
 * 1968, Kaufman, V. Sh. O raspoznavanii nekotoryx svojstv kontekstno-svobodnyx grammatik, I-ya Vsesojuznaja konferencija po programmirovaniju, Kiev, 1968. [Title of article reported in Computer and Automation Institute, Computational linguistics and computer languages (1969), p. 92].
 * Cheers! bd2412 T 20:38, 18 June 2014 (UTC)
 * What is your preference regarding the placement of the citations? Is requiring them to be placed on a citations page rather than in the entry OK? - -sche (discuss) 20:59, 18 June 2014 (UTC)
 * I have no preference, but I think it's a non-issue, since citations on the entry page are hidden anyway. If they are on the citations page, then the entry requires a header and link to the citations page, which actually takes up a bigger footprint. See, e.g., noctivagant, which has both. I would add that in my opinion, they serve different functions. Quotes in the entry are definitional, showing the reader how the word is typically used to elucidate its meaning. Citations on the citations page demonstrate the existence of the term for the etymological record, and (ideally) give a sense of how long it has been in use, how current it is, and how it has evolved in usage over time. We've never really hammered these things out, but that seems like common sense to me. bd2412 T 22:44, 18 June 2014 (UTC)
 * OK, I have changed the format of the example entry. - -sche (discuss) 01:25, 19 June 2014 (UTC)