Wiktionary talk:Votes/pl-2014-06/Romanization of Sanskrit

Rationale
Since the 20th century, Westerners have mostly printed Sanskrit texts in romanized form. Having entries for the romanizations will help Wiktionary-users find the original-script entries. Plenty more could be said, but that captures the gist, I think. - -sche (discuss) 14:08, 9 June 2014 (UTC)
 * I'd say it's since the 19th century, but it's important to remember (1) that Devanagari has been the de-facto standard for Sanskrit also only since the 19th century (up till then Sanskrit was written in whatever the writer's native writing system was), and (2) that it was primarily Western scholars, not native writers, who established Devanagari as the de-facto standard. So in some ways Devanagari is no less "foreign" to Sanskrit than the Latin alphabet is. —Aɴɢʀ (talk) 12:52, 10 June 2014 (UTC)
 * It's interesting that the discussion about Sanskrit goes from 1. "I need to be able to find a native spelling, so I need a romanisation entry" to 2. "Sanskrit can be/is written in Roman, anyway or there are various scripts used". Is this vote to promote the POV that Sanskrit can and should be written in Latin (i.e. Latin is also native for Sanskrit) or to help users find Devanagari entries? As for the point "Sanskrit was written in whatever the writer's native writing system was", I don't think it's 100% correct. Sanskrit was a literary language for many nations, obviously they have transliterated, quoted it in their own way, in native scripts, similar or comparable to how Latin, Chinese, Arabic or Pali were used in the past. Latin can be cited in Cyrillic, e.g. "ин вино веритас" (in vino veritas), does it make Cyrillic an alternative script for Latin? --Anatoli (обсудить/вклад) 03:20, 11 June 2014 (UTC)
 * I'm not trying to promote a POV at all, merely raise awareness of two facts: (1) the predominance of Devanagari for Sanskrit is only about 200 years old, and (2) use of Latin for Sanskrit is just as old. Older Sanskrit manuscripts were written in the local script of wherever the scribe lived (for example, if the scribe lived in the Tamil-speaking part of India, the manuscript was written in the Tamil alphabet). It wasn't just transliterations of Sanskrit used in quotes, it was the entire text. It's really nothing at all like "ин вино веритас" and much more like Serbo-Croatian, which for a long time could be written not only in Latin or Cyrillic but also in the Arabic alphabet, depending on the writer's religion. The difference is that Sanskrit was originally written in a lot more than 3 writing systems, and by now it's settled down to 1 or 2. I don't know if there are still Sanskrit-language books printed in the Latin alphabet, but certainly in the 19th and early 20th century there were. —Aɴɢʀ (talk) 12:49, 11 June 2014 (UTC)
 * Recently published material uses IAST exclusively. Usage of Devanagari for Sanskrit in the West today seems mostly tied to 1) religiously-related usage 2) continuation of 19th-century traditions using ancient textbooks. Since usage of all of the other Indic scripts for Sanskrit is allowed (since Sanskrit is written in them as well), the central issue should be "how to organize such massive duplication of content". The answer is simple: use IAST exclusively (since it's the least lossy) and generate other spellings thence. An additional argument for IAST is that it's culturally neutral and doesn't promote a Hindu-only POV for Sanskrit. --Ivan Štambuk (talk) 20:56, 30 June 2014 (UTC)

If the rationale is to “help Wiktionary-users find the original-script entries” based on “Sanskrit texts in romanized form,” presumably according to various romanization schemes, then a solution is not to create single entries “for the romanization of that word (using the scheme(s) we use for romanizing Sanskrit within entries).” This would fail unless our set of romanization schemes is equal to or greater than the set of all romanization schemes ever used.

Please go back to the drawing board with this one. A better solution would be to allow a romanization section in an entry that may list all attested and/or standardized romanizations. This idea has been proposed at least a couple of times (by me), and seemed to receive general acceptance in principle, but little enthusiasm. —Michael Z. 2014-06-16 22:07 z 


 * It may be partially possible but it's a bit involved and not everyone can do it. It may not be just enthusiasm but skills, time and priorities. Sanskrit in Devanagari is 100% phonetic, unlike some other languages that use Devanagari, so any Sanskrit transliteration can derive from the script, except for any attested but non-standard or erroneous transliterations. I now think that having various transliterations inside entries is preferable to having separate entries for them. --Anatoli (обсудить/вклад) 00:33, 17 June 2014 (UTC)
 * It's not 100% phonetic: Vedic Sanskrit uses special accent marks which we don't use in Devanagari, but which are indicated in IAST transcriptions. However there are some special Devanagari tone marks used in some works which have no equivalents in IAST or any other transcription scheme. I suspect there are similar complications with other Indic scripts that have been/are historically used for Sanskrit. --Ivan Štambuk (talk) 21:15, 30 June 2014 (UTC)

Is attestation required?
Is this a proposal to allow romanizations of all Sanskrit terms, irrespective of whether the romanization is attested? bd2412 T 14:15, 9 June 2014 (UTC)
 * I assume so. It wouldn't make as much sense otherwise, and we do the same for Gothic. 14:30, 9 June 2014 (UTC)
 * I don't see why attestation should be required. These aren't words so I don't see why they would need to be attested. Renard Migrant (talk) 14:34, 9 June 2014 (UTC)
 * How do you define "words" in a way that excludes these from being words? bd2412 T 14:40, 9 June 2014 (UTC)
 * I'd say transliterations aren't words though something can be a transliteration and a word. I was going to say because you can't use it in a sentence in the language, however I see above that there are books written in transliterated Sanskrit, so I recant my statement. Renard Migrant (talk) 15:02, 9 June 2014 (UTC)
 * I appreciate that, but still don't see how you can define words such that "transliterations aren't words". They are groups of letters that convey meaning, are they not? If someone comes across a book referring to years spent as a sādhu, and wants to know what that means, are you going to tell them with a straight face that it has no meaning because it's not a word? bd2412 T 15:24, 9 June 2014 (UTC)
 * If the book says something like "he spent ten years as a sādhu", I would argue (as I have in the BP) that "sādhu" is being used as a word in English. If the book says something like "bhaja sādhu-samāgamam", see my comment below about "vikarṇaśca". - -sche (discuss) 15:48, 9 June 2014 (UTC)


 * (after E/C; @Renard) FWIW, my comment above refers to the fact that Westerners take pre-existing (Devanagari) Sanskrit words/texts, romanize them, and print/type those romanizations because it is easier for them to print/type Latin-script characters than Devanagari ones. If one takes a line such as "अश्वत्थामा विकर्णश्च सौमदत्तिस्तथैव" and renders it "aśvatthāmā vikarṇaśca saumadattistathaiva", the extent to which one can claim "vikarṇaśca" to be a Sanskrit word used in Sanskrit (and the extent to which one can compare such "use" to the use in the previous sentence of "" as an English word in an English sentence) is debatable. Compare works printed for singers in which the lyrics the singers are to learn are printed in the IPA. Compare linguistic works which analyze phonemes and formants and print waveforms of human pronunciations of certain words. Are such waveform-pictures "uses" of the words? The picture is somewhat less clear when native speakers use Latin script to compose new text in their language, but do so only because of technical limitations. - -sche (discuss) 15:48, 9 June 2014 (UTC)
 * The question is, are readers likely to come across uses of the string of letters in running text and turn to a dictionary to find out what it means. I am skeptical that IPA lyrics and pronunciation waveforms can be found in such a state, but in any event would suggest that things that the average person would think of as words should be defined as if they were words. bd2412 T 16:14, 9 June 2014 (UTC)
 * Yes. The same is done for all other alphabetic/abugidic/syllabaric languages for which romanizations are allowed to have entries. The example I gave on RFD was that "if you discovered that one of our Gothic romanizations had 0 attestations at Google Books, Groups, etc, we would still keep it as long as it was derived from an attested native-script form according to the rules of Gothic transliteration." Phoenician is the same. Only Chinese characters are a bit different (for reasons which are hopefully obvious). - -sche (discuss) 15:48, 9 June 2014 (UTC)

What if the word is attested using a transliteration different from our module?
This seems to say that we will have entries for transliterations formed in accordance with the transliteration system used in our existing Sanskrit entries. What if there are Sanskrit words found in print for which authors have tended to use different transliterations (e.g. used a subtly different accent over a certain letter)? Would that be includable? Would it be an alternate spelling? bd2412 T 16:18, 9 June 2014 (UTC)


 * We could allow multiple romanizations, as we did with Gothic when we allowed both th and þ for 𐌸 and both hw (as in [[hwar]]) and ƕ ([[ƕar]]) for 𐍈 ([[𐍈𐌰𐍂]]). But note that some users were not happy about having multiple romanizations of Gothic! Hopefully many people will join in this discussion and give us an idea of how they feel about having multiple romanizations of Sanskrit. Also note that for Gothic, the decision was made to never include diacritics in entry titles, so e.g. 𐌵𐌹𐌽𐍉's headword line gives its romanization as qinō, and we have an entry [[qino]] but not an entry [[qinō]]. Diacritics are not contrastive in Gothic—ō and o are always romanizations of the same letter. For Sanskrit, it might make sense to allow both diacritical and diacritic-less romanizations, the way we allow both toned and toneless pinyin. But inasmuch as we don't, AFAIK, allow pinyin with just any variant diacritics, I tend to think we shouldn't allow just any diacritical variation of romanized Sanskrit, either. - -sche (discuss) 18:08, 9 June 2014 (UTC)


 * In keeping with Dan Polansky's comment below, my inclination would be to have entries that reflect attested real-world use. It may be well and good for us to provide our idea of the right romanization in the entry आढ्य, but if any number of variations of āḍya are sufficiently attested in actual use, we should have entries reflecting that actual use. bd2412 T 18:22, 9 June 2014 (UTC)
 * The current transliteration module has a small flaw, which can hopefully be fixed by an experienced Lua programmer: the anusvara. It's a temporary method (symbol used) I used in the Hindi module (please don't blame me for introducing it, I did my best) but it was copied to the modules of all Indic languages, including Sanskrit. I will describe the situation later when I'm back at my desktop. In short, the module doesn't currently produce a 100% accurate transliteration: the symbol or letter used to render anusvara depends on the following letter/sound. The current method is still acceptable, but volunteers can look up anusvara and Sanskrit transliteration. It's not very different from our current policy on transliterating anusvara in Hindi. --Anatoli (обсудить/вклад) 00:31, 10 June 2014 (UTC)


 * OK. Further details on both anusvara and candrabindu (currently transliterated as "ṁ" and "m̐" respectively by Module:sa-translit; you'll see the note "until a better method is found"). Both symbols can often be used interchangeably. The rule at Hindi_transliteration can be applied to Sanskrit as well to fix the transliteration problem. Both "ṁ" and "m̐" are literal, non-phonetic transliterations; the phonetic approach would be to use a tilde, e.g. ã, ā̃, ẽ, ĩ, ī̃, ũ, ū̃, etc. for nasalised vowels and "n", "m" and "ṅ" for the cases described at Hindi_transliteration. I think it would be easier to work with the resulting transliteration letters, rather than with the original Devanagari script. Please let me know if you want me to set up test cases or need assistance in understanding the rule. --Anatoli (обсудить/вклад) 01:17, 10 June 2014 (UTC)


 * I could try to fix this... Isn't the anusvara always written as "ṃ" here for Sanskrit (IAST)? Wyang (talk) 03:06, 10 June 2014 (UTC)


 * I guess there are different transliteration standards; I'm not an expert on IAST and I'm more familiar with Hindi. Rendering nasalisation or the lack thereof (just consonants n, m or ṅ) phonetically is usually a better idea but it's more complicated. If IAST is the preferred standard and it really requires "ṃ" in all cases, then nothing needs to be done, I think. --Anatoli (обсудить/вклад) 03:35, 10 June 2014 (UTC)
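
The two options discussed above (a literal "ṃ" everywhere versus a phonetic nasal that depends on the following sound) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration only: it is not the code of Module:sa-translit and does not reproduce the rule at Hindi_transliteration. It works on an already romanized string, treats "ṃ" as the literal anusvara, and the names `HOMORGANIC_NASAL` and `resolve_anusvara` are hypothetical. It does not handle candrabindu or nasalised vowels.

```python
import unicodedata


def _nfc(text: str) -> str:
    """Normalize to NFC so precomposed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text)


ANUSVARA = _nfc("ṃ")

# Hypothetical table: first letter of the following stop -> nasal of the same
# place of articulation. Aspirates (kh, gh, ...) start with the same letter.
HOMORGANIC_NASAL = {
    _nfc(stop): _nfc(nasal)
    for stop, nasal in [
        ("k", "ṅ"), ("g", "ṅ"),   # velars
        ("c", "ñ"), ("j", "ñ"),   # palatals
        ("ṭ", "ṇ"), ("ḍ", "ṇ"),   # retroflexes
        ("t", "n"), ("d", "n"),   # dentals
        ("p", "m"), ("b", "m"),   # labials
    ]
}


def resolve_anusvara(romanized: str) -> str:
    """Replace a literal anusvara by a nasal chosen from the next letter.

    Before anything that is not a stop (sibilants, semivowels, h, vowels,
    end of word) the literal anusvara is kept unchanged.
    """
    text = _nfc(romanized)
    out = []
    for i, ch in enumerate(text):
        if ch == ANUSVARA and i + 1 < len(text):
            out.append(HOMORGANIC_NASAL.get(text[i + 1], ANUSVARA))
        else:
            out.append(ch)
    return "".join(out)
```

If the literal "ṃ" is what IAST requires in all cases, no such step is needed; the sketch only shows why the phonetic option has to look at the following letter.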


 * FYI, the Monier-Williams dictionary, which most of our entries are copied from word for word, uses a slightly different transliteration system (ç instead of ś; sh instead of ṣ; ṛi instead of ṛ etc.). Whitney's Sanskrit Grammar and Macdonell's History of Sanskrit Literature also use ç instead of ś (but not Monier-Williams's other idiosyncrasies), so that particular substitution at least is fairly likely to be encountered by students learning about Sanskrit language or literature. —Aɴɢʀ (talk) 12:45, 10 June 2014 (UTC)
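
As a rough illustration of the variant romanizations mentioned in this section, the Python sketch below derives two alternative spellings from an IAST-style string: one applying only the three Monier-Williams substitutions named above (ś to ç, ṣ to sh, ṛ to ṛi), and one with all diacritics removed, as in the Gothic practice of titling the entry qino rather than qinō. The function names are hypothetical, the substitution list is deliberately limited to what the thread states (the "etc." is not filled in), and nothing here reflects an existing Wiktionary module.

```python
import unicodedata


def _nfc(text: str) -> str:
    """Normalize to NFC so precomposed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text)


# Only the substitutions explicitly named in the discussion.
MW_SUBSTITUTIONS = [("ś", "ç"), ("ṣ", "sh"), ("ṛ", "ṛi")]


def monier_williams_style(iast: str) -> str:
    """Apply the three named substitutions to an IAST-style string."""
    text = _nfc(iast)
    for src, dst in MW_SUBSTITUTIONS:
        text = text.replace(_nfc(src), _nfc(dst))
    return text


def strip_diacritics(romanization: str) -> str:
    """Drop combining marks, e.g. to build a diacritic-less entry title."""
    decomposed = unicodedata.normalize("NFD", romanization)
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

A lookup aid could generate such variants for each native-script entry and point them back to it; whether any of them deserves an entry of its own is the policy question under discussion, which the sketch does not settle.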

Attestation 2
I propose to expressly make a Sanskrit romanization includable if we have the corresponding native-script entry. This was done in Votes/2011-07/Pinyin entries via this: "That a pinyin entry, using the tone-marking diacritics, be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling." As a consequence, the romanization itself does not need to be attested, merely the form being romanized. --Dan Polansky (talk) 16:58, 9 June 2014 (UTC)
 * Support. — Ungoliant (falai) 17:04, 9 June 2014 (UTC)
 * Good idea. I've modified the vote's wording accordingly. - -sche (discuss) 17:29, 9 June 2014 (UTC)

Module:sa-translit
The vote should not link to Module:sa-translit as if it were part of the voted proposal. The thing proposed must be comprehensible without reading the module. --Dan Polansky (talk) 17:02, 9 June 2014 (UTC)