Module:sa-convert/documentation

This module converts Sanskrit text in Devanagari to other scripts. It is principally used in Template:sa-alt, and its function is exported in Template:sa-convert.

Example
ॐ त्र्यम्बकं यजामहे सुगन्धिं पुष्टिवर्धनम् । उर्वारुकमिव बन्धनान् मृत्योर् मुक्षीय माऽमृतात् ॥ कः खगौघाङचिच्छौजा झाञ्ज्ञोऽटौठीडडण्ढणः। तथोदधीन् पफर्बाभीर्मयोऽरिल्वाशिषां सहः॥

All Scripts

Unresolved Issues

 * Burmese:
 * Round AA also needs to be replaced with tall AA in some situations. ✅
 * Some conjuncts, such as -y-, -r-, -v-, need to be cleaned up when they occur together.
 * NGA floating င္ → င်္ ✅
 * RA repha ရ္ → ရ်္ (This never happens in Pali.) ✅
 * NYA + virama + NYA → great NYA ✅
 * SA + virama + SA → great SA ✅
 * Final virama → asat ✅
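The Burmese fix-ups marked ✅ above can be sketched as a post-processing pass over a naive transliteration that writes every conjunct with the plain virama (U+1039). This is only an illustration, not the module's actual code: the function name `fix_burmese` is hypothetical, "final" is assumed to mean "not followed by a Burmese consonant", and the tall AA and -y-/-r-/-v- clean-ups are not covered.

```python
import re

# Code points from the Unicode Myanmar block.
VIRAMA = "\u1039"     # invisible stacking virama
ASAT = "\u103a"       # visible vowel killer
NGA = "\u1004"
RA = "\u101b"
NYA = "\u1009"
GREAT_NYA = "\u100a"
SA = "\u101e"
GREAT_SA = "\u103f"
CONSONANT = "[\u1000-\u1021]"  # assumption: KA..A counts as "a consonant follows"


def fix_burmese(text):
    """Apply the listed fix-ups to a naive virama-only transliteration (sketch)."""
    # Final virama -> asat (assumed: any virama not followed by a consonant).
    text = re.sub(VIRAMA + "(?!" + CONSONANT + ")", ASAT, text)
    # NYA + virama + NYA -> great NYA; SA + virama + SA -> great SA.
    text = text.replace(NYA + VIRAMA + NYA, GREAT_NYA)
    text = text.replace(SA + VIRAMA + SA, GREAT_SA)
    # Floating NGA and RA repha: insert asat before the remaining virama.
    text = re.sub("([" + NGA + RA + "])" + VIRAMA, r"\1" + ASAT + VIRAMA, text)
    return text
```

The final-virama rule runs first so that the NGA and RA rule only ever sees a virama that is followed by another consonant.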
 * Lao:
 * Lao does not have characters for ऋ ॠ ऌ ॡ so it uses equivalent ຣິ ຣີ ລິ ລີ instead. ✅
 * Evidence? I've read that it uses ຣຶ ຣື ລຶ ລື, which would eliminate the ambiguity.
 * The "Lanexang Mon4" font already has invented characters ຤(=ฤ) ຦(=ฦ) at unassigned codepoints, but their use is not attested anywhere.
 * Khmer:
 * RA repha រ្ → robat over next consonant ៌ (This never happens in Pali.) ✅
 * Final virama → viriam ✅
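The two Khmer rules above can be sketched in the same way. Again this is an assumption-laden illustration rather than the module's code: `fix_khmer` is a hypothetical name, the input is assumed to use the coeng (U+17D2) wherever Devanagari has a virama, and a RA that is itself subscripted (preceded by a coeng) is assumed not to become a robat.

```python
import re

# Code points from the Unicode Khmer block.
COENG = "\u17d2"    # subscript-forming sign, the naive "virama"
ROBAT = "\u17cc"    # repha mark written over the following consonant
VIRIAM = "\u17d1"   # visible vowel killer
RA = "\u179a"
CONSONANT = "[\u1780-\u17a2]"


def fix_khmer(text):
    """Turn RA + coeng into robat and a final coeng into viriam (sketch)."""
    # Final virama -> viriam (assumed: any coeng not followed by a consonant).
    text = re.sub(COENG + "(?!" + CONSONANT + ")", VIRIAM, text)
    # RA + coeng + consonant -> consonant + robat, unless RA is itself a subscript.
    pattern = "(?<!" + COENG + ")" + RA + COENG + "(" + CONSONANT + ")"
    return re.sub(pattern, r"\1" + ROBAT, text)
```

The robat is emitted directly after the consonant it sits on, so any subscripts that follow that consonant stay in place.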
 * Javanese:
 * no spaces in the script (need to remove the ones that enter the module); also causes the following two issues
 * ꦾ and ꦿ for word medial conjuncts, but ꦪ and ꦫ for conjuncts that cross word boundaries, e.g.
 * -> / ꦥꦢꦾ  (pa-dya) vs.
 * -> / ꦥꦢ꧀ꦪ (pad-ya) and
 * -> / ꦥꦸꦠꦿ (pu-tra) vs.
 * -> / ꦥꦸꦠ꧀ꦫ (put-ra)
 * ꦂ for aksaras that end with r, but aren't aksara initial, e.g.
 * -> / ꦏꦂꦩ (kar-ma) and
 * -> / ꦏꦂ (kar) vs.
 * -> / ꦥꦏ꧀ꦫꦩ (pak-rama) and
 * -> / ꦏꦿꦩ (kra-ma)
 * enclosing numbers around ꧇ (꧇꧑꧙꧇ = 19). Test: त्र्य०६म्बकं ->
 * ꦘ should be used for the conjunct ज्ञ, not ꦗ꧀ꦚ. Test: ज्ञ ->
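One possible way to handle several of the Javanese points above is to apply the word-medial rules while the word boundaries are still known, and only then drop the spaces. The sketch below assumes the input is a naive transliteration with words separated by single spaces and every conjunct written as consonant + pangkon + consonant. The name `fix_javanese` is hypothetical, and the ꦂ rule for aksara-final r is not covered.

```python
import re

# Code points from the Unicode Javanese block.
PANGKON = "\ua9c0"
YA, RA = "\ua9aa", "\ua9ab"
PENGKAL, CAKRA = "\ua9be", "\ua9bf"   # medial -y- and -r- signs
JA, NYA = "\ua997", "\ua99a"
JNYA = "\ua998"                        # single letter to use for the conjunct
PADA_PANGKAT = "\ua9c7"                # mark that encloses numbers
DIGITS = "[\ua9d0-\ua9d9]+"


def fix_javanese(text):
    """Apply word-medial fix-ups per word, then join without spaces (sketch)."""
    words = []
    for word in text.split(" "):
        # Single letter instead of JA + pangkon + NYA.
        word = word.replace(JA + PANGKON + NYA, JNYA)
        # Medial signs only inside a word; a conjunct that crosses a word
        # boundary keeps the full letter because the words are still separate.
        word = word.replace(PANGKON + YA, PENGKAL)
        word = word.replace(PANGKON + RA, CAKRA)
        # Enclose each run of digits in pada pangkat.
        word = re.sub(DIGITS, lambda m: PADA_PANGKAT + m.group(0) + PADA_PANGKAT, word)
        words.append(word)
    # The script has no spaces, so the words are simply concatenated.
    return "".join(words)
```

Under these assumptions the one-word input for pa-dya ends in the medial sign, while the two-word input for pad-ya keeps pangkon plus the full letter after joining, which matches the contrast shown in the list.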
 * Balinese:
 * also no spaces, and causes the following issue
 * ◌ᬃ for syllables that end with r
 * e.g. ->  / ᬓᬃᬫ (kar-ma),  ->  / ᬓᬃ (kar)
 * enclosing numbers around ᭞ (᭞᭑᭞ = 1). Test: त्र्य०६म्बकं:
 * Bengali:
 * reverse transliteration is problematic due to being both  and.
 * in certain words (needs more research), word final in Sanskrit becomes
 * e.g. ->  instead of  (which we currently have)
 * ❌ - Khanda Ta is not used for Sanskrit: "As it is a later innovation in the development of Bengali script, khanda ta is not used in older texts, and would not normally be expected in Sanskrit-language documents."
 * depending on location of in a word, Bengali may prefer  or
 * e.g. ->   ->
 * Assamese:
 * in certain words (needs more research), word final in Sanskrit becomes
 * e.g. ->  instead of  (which we currently have)
 * ❌ - Khanda Ta is not used for Sanskrit: "As it is a later innovation in the development of Bengali script, khanda ta is not used in older texts, and would not normally be expected in Sanskrit-language documents."
 * depending on location of in a word, Assamese may prefer  or
 * e.g. ->   ->
 * Sinhala:
 * for Sanskrit, conjuncts are formed not by simply using its virama (U+0DCA) but by either abutting the consonants, encoded by the sequence  or by forming a ligature, encoded by . (The extra character is ZWJ.)  Which is used depends on the consonants, but as a general rule  forms a ligature with a consonant to either side (very like Devanagari w:repha and rakar), while formally  ligates with a preceding consonant, but in fact the glyph simply changes shape.  There is some evidence for geminate  being ය‍්ය in Sanskrit rather than ය්‍ය as in Pali.  Finally, at least one pair form a separately encoded ligature -  plus  becomes .  My best estimate so far for the combinations has been encoded in Module:sa-utilities/translit/SLP1-to-Sinh, and ultimately I believe this module and that module should share common code for the fix-up of naive transliteration that just uses U+0DCA.  ✅
 * Additionally, for the Pali and Sanskrit I can find, /e/ and /o/ do not have their length marked, but use the same symbols as the Sinhalese language uses for the short vowels. ✅
 * I have just (18/19 December 2023) added some evidence-based test cases to Module:sa-convert/testcases. Research continues to plod along.
 * Tamil:
 * Final nasals.
 * Final visarga - the Grantha visarga is used. ✅
 * Encoding of superscript digits and vowels.
 * Syllabic consonants
 * Rules for /n/ - v..
 * Alternative forms, e.g. subscript digits.