Module talk:mr-IPA

Wow, I wasn't expecting the implementation to happen so quickly! Thanks!

I was wondering what you think about the following:

✅ At what point do you think this module could be moved to Mod:mr-IPA for actual use on entries? Could that be done soon, or would the:
 * ✅ ċ, j̈, j̈h
 * syllable boundary
 * declined/inflected-form
 * issues have to be resolved first?

I thought the IPA values for most of the letters would be very similar to Hindi other than ċ, j̈, j̈h + ळ. According to the Wikipedia page :

✅ आ is /a/ instead of /aː/ or /ɑː/: /a/ is sometimes interchangeable with /ə/, so perhaps /a/ is fine. I've avoided making some articles because both versions are found: हलणे = हालणे (to move), = हारणे,  = हांबरणे, etc.

✅ इ is /i/ instead of /ɪ/, उ is /u/ instead of /ʊ/: I agree with this because it shows that they are closer to the long high vowels ई, ऊ.

✅ ए is /e/ instead of /eː/: /eː/ makes sense for Hindi since it has /ɛː/ for ऐ or for South Indian languages that contrast short and long /e/, but I see no need for ː for Marathi.

✅ न is /n̪/ instead of /n/: This looks like an unnecessary hypercorrection that makes it look like न is dental like in Sanskrit or Malayalam.

✅ श is /ɕ/ instead of /ʃ/, च (c pronunciation) is /t͡ɕ/ instead of /t͡ʃ/, ज (j pronunciation) is /d͡ʑ/ instead of /d͡ʒ/: This makes this look more like an East Asian language. Is this plausible?

Kutchkutch (talk) 04:26, 8 November 2017 (UTC)
 * Looks accurate to me; so that means Classical Sanskrit was actually identical in pronunciation to Marathi! That's totally plausible. —Aryaman (मुझसे बात करो) 15:53, 8 November 2017 (UTC)


 * Thanks for that insight! There was a time when said  (j pronunciation) was, but that's been fixed there by User:Kwamikagami as the user noted at w:Talk:Marathi_phonology. That looked really strange.
 * Dhongde & Wali says "[ʂ] does not occur in modern Marathi, its symbol ष is retained only in writing." This suggests ष should be whatever श is for ordinary use.


 * The etymology-based pattern of problematic issues first seen at Module talk:mr-translit/testcases might apply elsewhere too. Native, Sanskrit, Perso-Arabic, and English appear to be separate paradigms that may overlap or assimilate into the native paradigm. The Perso-Arabic borrowings are usually older and there are few modern borrowings so those words are the most likely to assimilate into the native paradigm. Sanskrit borrowings, of course, are for prestige and neologisms. English borrowings may also have prestige, and now almost any English word could be borrowed.

Kutchkutch (talk) 23:36, 8 November 2017 (UTC)


 * ✅ Until now the expected result for the  testcase under ह was a guess. The actual rule is in "2.6.11.3 Word-medial [h] is optionally deleted" in Dhongde & Wali and there's a related rule "2.6.3 Aspiration" for the murmured consonants. So if more accuracy is needed I might add a few testcases based on those rules. Kutchkutch (talk) 02:14, 9 November 2017 (UTC)


 * Great, I'll update the module as you add them. Wow, I used to be so bad with Lua but I'm kind of getting the hang of it. —Aryaman (मुझसे बात करो) 02:15, 9 November 2017 (UTC)


 * As always, Thanks for your willingness to help out with your abilities!
 * Since I was intrigued by the testcase, I added the additional testcases in Dhongde & Wali and explained the rules as I understood them. However, I realised at the end of the exercise that like Gujarati  there is a formal pronunciation without ह deletion that mirrors the spelling and a casual pronunciation with ह deletion.


 * The module currently shows the correct formal transcription
 * (except चेहरा since eʱ is a murmured vowel, and Marathi has no murmured vowels).


 * If /m/, /n/, /l/, /ʋ/, /ɾ/ or /d͡ʑ/, /d̪/, /b/ does not occur as C in CV₁ɦV₂, parentheses could be used to show both the formal and casual pronunciations at the same time.
 * If /m/, /n/, /l/, /ʋ/, /ɾ/ or /d͡ʑ/, /d̪/, /b/ does occur as C in CV₁ɦV₂ and V₂ ≠ /o/, then CV₁ɦV₂ → C(ʱ)V₁V₂ (V₁ = /ə/ or /a/; if V₂ ≠ /i/ or /iː/, then /ə/ → /∅/) is the rule for obtaining the casual pronunciation. Conceptually,


 * ✅ Is there any way the module can show the formal transcription and the casual transcription simultaneously?


 * ✅ As for implementation, if it's too much work or too difficult, it's fine, and the formal transcription can be used. If you know how to implement the rule and only one pronunciation can be shown perhaps only the casual transcription should be shown since the transliteration is the formal/spelling pronunciation. If you know how the module can handle formal transcription and casual transcription, that would really be innovative! Kutchkutch (talk) 07:27, 9 November 2017 (UTC)


 * I think it definitely can be done. Why not keep the casual pronunciation in phonetic ([] instead of //) transcription? Let me try to implement something. —Aryaman (मुझसे बात करो) 23:32, 9 November 2017 (UTC)


 * Again, Wow! I'm really impressed with the results. Thanks! I thought the murmur rule might be too complicated or confusing to achieve.
 * I noticed that the actual results don't have any parentheses, and perhaps that's better for readability. So should all the parentheses be removed from the expected outcomes? Their original purpose was to show that ह-deletion is optional, but that may not be necessary.
 * If were syllabified it would probably have a syllable boundary between "" and "" since "ae" is not a diphthong and "æ" is only in English words, and I thought  was working for a while. However, these are really minor things compared to the implementation of that murmur rule.


 * ✅ I was actually wondering if an IPA module is for [--] or /-/, and I think what you're saying is:
 * If a word's casual pronunciation and spelling pronunciation vary, then use [--] instead of /-/ when this module is used.
 * ✅ MOD:hi-IPA and T:hi-IPA don't appear to be adding // around the IPA transcription, but they appear in entries using them. So I haven't figured out if // is being added automatically or manually when is used in the Pronunciation section. Kutchkutch (talk) 03:57, 10 November 2017 (UTC)


 * ✅ I think I've found a way to resolve the issue shown by . Dhongde & Wali  show  सांग as 'saŋg' and say 'g' is optionally deleted so perhaps 'g' should remain even word-finally. I was assuming 'g' would be deleted based on English such as in /rɪŋ/ 'ring', but perhaps Marathi retains the 'g' and the 'ə' at the end might be an illusion caused by the word-final 'g'. By extension perhaps the optional ɦ in word-medial and word-final positions should remain as well. Kutchkutch (talk) 21:08, 10 November 2017 (UTC)


 * Hindi maintains final /g/ as well, so I was confused by that testcase. I can't say anything about /ɦ/, since I am unclear on what happens to them in Hindi too. —Aryaman (मुझसे बात करो) 21:27, 10 November 2017 (UTC)

Deployment
I'd like to get this deployed. Could you tell me the informal pronunciation variants of and  specifically? I'm a bit confused about the rules. —AryamanA (मुझसे बात करें • योगदान) 23:10, 18 February 2018 (UTC)


 * In CV₁ɦV₂ (V₁ = /ə/ or /a/ and V₂ ≠ /o/)
 * If C is /m/, /n/, /l/, /ʋ/, /ɾ/ -- murmur class
 * or C is /d͡ʑ/, /d̪/, /b/ -- aspiration class
 * CV₁ɦV₂ → CʱV₁V₂ (If V₂ ≠ /i/ or /iː/ and V₁ = /ə/ then V₁ → ∅)


 * The subsequent ह-deletion rule that I forgot to add appears to be:
 * If C is not in either of those two classes (murmur or aspiration)
 * CV₁ɦV₂ → CV₁V₂


 * For V₁ = /ə/ and V₂ = /a/, V₁ → ∅
 * ✅ तहान /t̪ə.ɦan/ → [t̪an]
 * ✅ शहाणा /ɕə.ɦa.ɳa/ → [ɕa.ɳa]


 * ✅ पाह, सहन, शहर, चेहरा, पेहलवान have V₂ → /∅/ instead. Kutchkutch (talk) 12:37, 19 February 2018 (UTC)
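The CV₁ɦV₂ rules in this comment can be sketched in Python as follows. This is only an illustration under stated assumptions, not the module's actual Lua code: the function name `casual`, the vowel inventory used for V₂, and the inputs `lə.ɦan` and `nə.ɦi` are hypothetical. The two consonant classes and the तहान and शहाणा results are taken from the discussion. Word-final ह (as in पाह) and ह before a consonant (as in चेहरा) are not covered here.

```python
import re

# Consonant classes as listed in the discussion.
MURMUR_CLASS = ["m", "n", "l", "ʋ", "ɾ"]
ASPIRATION_CLASS = ["d͡ʑ", "d̪", "b"]
CLASS_C = MURMUR_CLASS + ASPIRATION_CLASS

# Assumed vowel inventory for V2; /o/ is deliberately excluded by the rule.
V2 = ["iː", "uː", "i", "u", "a", "ə", "e"]

PATTERN = re.compile(
    "(?P<c>" + "|".join(map(re.escape, CLASS_C)) + ")?"  # optional class consonant
    + "(?P<v1>[əa])\\.?ɦ"                                 # V1, optional syllable dot, ɦ
    + "(?P<v2>" + "|".join(map(re.escape, V2)) + ")"      # V2
)


def casual(phonemic: str) -> str:
    """Derive a casual (phonetic) form from a syllabified phonemic string."""

    def repl(m: "re.Match[str]") -> str:
        c, v1, v2 = m.group("c"), m.group("v1"), m.group("v2")
        if c:
            # Murmur/aspiration class: the consonant absorbs the ɦ as ʱ.
            if v1 == "ə" and v2 not in ("i", "iː"):
                v1 = ""
            return c + "ʱ" + v1 + v2
        # Any other consonant: plain ह-deletion.
        if v1 == "ə" and v2 == "a":
            v1 = ""
        return v1 + v2

    return PATTERN.sub(repl, phonemic)


print(casual("t̪ə.ɦan"))    # तहान
print(casual("ɕə.ɦa.ɳa"))  # शहाणा
print(casual("lə.ɦan"))    # hypothetical murmur-class input
```

The syllable dot between V₁ and ɦ is dropped together with the ɦ, which matches the two examples given above; whether a boundary should be kept when both vowels survive is left open.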
 * Thanks! I'll get on it. I feel that murmur and aspiration classes are the same to be honest (both use /Cʱ/, which is murmuring, while aspiration is /Cʰ/). —AryamanA (मुझसे बात करें • योगदान) 23:35, 19 February 2018 (UTC)


 * Yes, the two classes could be merged since voiced aspiration is really breathy/murmured voice. The reason for the separation was that /d͡ʑʱ/, /d̪ʱ/, /bʱ/ could be phonetically represented with a single letter: झ, ध, भ


 * What was meant by the last line is:
 * ✅ चेहरा /t͡ɕeɦ.ɾa/ → [t͡ɕe.ɾa]
 * ✅ पाह /paɦ/ → [pa]
 * ✅ शाह /ɕaɦ/ → [ɕa]
 * for C not in the two classes


 * “There are no word-final consonant-clusters except in words borrowed from English”
 * So perhaps the ह-deletion rule could be used to avoid the coda [ɦɾ], [ɦn] caused by the schwa-dropping in:
 * ✅ शहर /ɕə.ɦəɾ/ → [ɕəɾ]
 * ✅ सहन /sə.ɦən/ → [sən]
 * or these testcases could just have /ɕə.ɦəɾ/ and /sə.ɦən/ without ह-deletion or schwa-dropping if [ɕəɾ] and [sən] are too reduced.


 * Since there are no consonant clusters in codas (except perhaps homorganic nasals like सांग), रक्त would be [ɾək.t̪ə], but it would probably need to be transliterated as rakta first.


 * Now that the module handles phonemic and phonetic IPA it could show the
 * /t̪s/ → [t͡sʰ] rule in उत्सव /ut̪.səʋ/ → [u.t͡sʰəʋ]
 * and /əʋ/ → [əu] in लवकर /ləʋ.kəɾ/ → [ləu.kəɾ], अवकाश  /əʋ.kaɕ/ → [əu.kaɕ].
 * /ʂ/ → [ɕ] Kutchkutch (talk) 05:33, 20 February 2018 (UTC)
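The three phonemic-to-phonetic substitutions listed here can be sketched as plain string rewrites. This is a hedged illustration, not the module's code: the function name `narrow` is hypothetical, and restricting /əʋ/ → [əu] to the position before a syllable boundary is an assumption inferred from the examples (word-final /əʋ/ in उत्सव stays unchanged).

```python
import re


def narrow(phonemic: str) -> str:
    """Apply the three substitutions to a syllabified phonemic string."""
    # /t̪s/ -> [t͡sʰ]; a syllable dot between the two moves before the affricate.
    s = re.sub(r"t̪(\.?)s", lambda m: m.group(1) + "t͡sʰ", phonemic)
    # /əʋ/ -> [əu], assumed to apply only before a syllable boundary.
    s = re.sub(r"əʋ(?=\.)", "əu", s)
    # /ʂ/ -> [ɕ]
    return s.replace("ʂ", "ɕ")


for word in ("ut̪.səʋ", "ləʋ.kəɾ", "əʋ.kaɕ"):
    print("/" + word + "/", "->", "[" + narrow(word) + "]")
```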

Deployment (again)
Okay, so I've moved this module to the module namespace. Following the precedent of MOD:hi-IPA, we won't show any of the aspiration/murmur assimilations in the broad transcription //, so for that purpose the module is ready. I'll be implementing the aspiration rules for narrow transcriptions, as well as all that was previously discussed, soon. I want to note some useful things I have added:
 * The nuqta can be used in respelling to indicate j̈/j̈h/ċ/ċh. This way, mr-IPA can use the Devanagari script entirely.
 * Like hi-IPA, the asterisk * can be used to force schwa insertion. —AryamanA (मुझसे बात करें • योगदान) 01:55, 23 September 2020 (UTC)
 * Also, I have removed vowel length indication in the broad transcription, since vowel length is not contrastive in Marathi. It will be present in the narrow one. —AryamanA (मुझसे बात करें • योगदान) 02:08, 23 September 2020 (UTC)
 * Thanks for your renewed interest! The development of this infrastructure does depend on your interest and how much time you have available, since you have the ability to understand both the language and coding aspects. Since isn't thorough enough on declension, I started User:Kutchkutch/mr-decl but it's nowhere near completion yet. A slow manual deployment, as you've been doing so far, is probably better for now than mass deployment by bot, since mass deployment would reveal too many weaknesses at once. The nuqta for j̈/j̈h/ċ/ċh and * to force schwa insertion are certainly useful. See here for a paper on phonology. The study refers to Pandharipande (1997) and . Unfortunately, I don't have access to Pandharipande (1997).
 * च़ and the * seem to work fine, but ज़ and झ़ appear as /d͡ʑ̈/ and /d͡ʑ̈ɦ/, which should be /d͡z/ and /d͡zʱ/
 * would be expected to be: /mət̪.səɾ/, [mə.t͡sʰəɾ] according to . /mət̪.səɾ/ is the default output, but results in /mə.t͡sɦəɾ/.
 * CC codas are not allowed according to so according to that restriction  would be /maɾ.ɡə/ instead of /maɾɡ/. Kutchkutch (talk) 08:36, 23 September 2020 (UTC)
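As a rough illustration of the nuqta respelling convention and the expected values for ज़ and झ़ mentioned above, here is a small Python sketch. It is not the module's Lua code. The names `affricates`, `PLAIN` and `WITH_NUQTA` are hypothetical, and the value /t͡s/ for च़ is an assumption (the discussion only confirms /d͡z/ and /d͡zʱ/ for ज़ and झ़).

```python
import unicodedata

NUQTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Values without the nuqta, as discussed earlier on this page.
PLAIN = {"\u091a": "t͡ɕ", "\u091c": "d͡ʑ", "\u091d": "d͡ʑʱ"}      # च, ज, झ
# Values with the nuqta; the entry for च़ is an assumption.
WITH_NUQTA = {"\u091a": "t͡s", "\u091c": "d͡z", "\u091d": "d͡zʱ"}  # च़, ज़, झ़


def affricates(respelling: str) -> list:
    """Return the IPA value of every च/ज/झ (with or without nuqta) in order."""
    # NFD splits the precomposed ज़ (U+095B) into ज followed by the nuqta.
    text = unicodedata.normalize("NFD", respelling)
    found = []
    i = 0
    while i < len(text):
        ch = text[i]
        has_nuqta = i + 1 < len(text) and text[i + 1] == NUQTA
        if ch in PLAIN:
            found.append(WITH_NUQTA[ch] if has_nuqta else PLAIN[ch])
        i += 2 if has_nuqta else 1
    return found


print(affricates("\u091c\u093c"))  # ज़
print(affricates("\u091d\u093c"))  # झ़
```

Normalising first means the precomposed and decomposed spellings of ज़ are treated the same way, which may matter if respellings are typed with different input methods.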
 * After some manual deployment, the module seems to work in most cases for the broad transcription. and any other narrow transcription issues can be addressed later. Many of the broad transcription issues can be fixed with the manual intervention tools (such as  ).
 * The asterisk * is a good way to address the issue with, , etc. especially if it's not predictable.
 * Perhaps some of the broad transcription issues that require manual intervention are due to MOD:mr-translit (see Category talk:Konkani language).


 * and are cases in which  and  would be better compared to  and .  can be manually fixed with.


 * and can be fixed with  and.


 * Many instances of word-medial ज़ and झ़ show . This can be seen with अंदाज: and अंदाजे: .  works but  doesn't work without manual syllabification. So,  →  fixes the issue. Here are examples:




 * The transliteration should have a 'a' in the second position in many words beginning with 'd'. For, this absence of the 'a' in the second position of the transliteration leads to in the automated IPA. There doesn't appear to be a fix for the IPA using the Devanagari script. Here are examples:


 * ,, , , , Kutchkutch (talk) 11:41, 24 September 2020 (UTC)


 * Thanks for all the work and addressing many of the issues. Perhaps the best characterisation of codas is in that article.


 * Consonant clusters are not allowed in coda position. Word-final consonant clusters are therefore not allowed, except in borrowings from English such as ‘silk’ or ‘test’.


 * So, English borrowings such as, in  and  would be the exceptions to the rule  ‘CC codas are not allowed’. Since modules have no way to know the etymology, perhaps there could be a hack in the Devanagari respelling to indicate English borrowings can have CC codas like the nuqta for च़, ज़ and झ़.
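One way to picture the "no CC codas" restriction together with an opt-out for English borrowings is the sketch below. It is only an assumption-laden illustration: the function name `repair_final_cluster`, the flag `allow_final_cluster`, the vowel set, and the phoneme list given for 'silk' are all hypothetical, and homorganic-nasal cases such as सांग are not treated specially.

```python
# Assumed vowel inventory for the broad transcription.
VOWELS = {"ə", "a", "i", "iː", "u", "uː", "e", "o"}


def repair_final_cluster(phonemes, allow_final_cluster=False):
    """Append a schwa when a word would otherwise end in two consonants.

    `allow_final_cluster` stands in for whatever respelling hack marks an
    English borrowing; it simply switches the repair off.
    """
    result = list(phonemes)
    ends_in_cluster = (
        len(result) >= 2
        and result[-1] not in VOWELS
        and result[-2] not in VOWELS
    )
    if ends_in_cluster and not allow_final_cluster:
        result.append("ə")
    return result


print(repair_final_cluster(["m", "a", "ɾ", "ɡ"]))
print(repair_final_cluster(["s", "i", "l", "k"], allow_final_cluster=True))
```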


 * Although words with homorganic nasals are transcribed as in Dhongde and Wali for, perhaps it would make more sense to have a word-final schwa at the end in these cases. In fact,  is transcribed as  in the Grezause paper. The existence of schwas following homorganic nasals may feel like an illusion but they're certainly there. So, if this weakness is to be emphasised, it could be transcribed with a superscript schwa . In that case:


 * क वर्ग
 * would be
 * would be
 * would be
 * would be
 * would be


 * च/च़ वर्ग
 * would be
 * would be
 * would be
 * would be


 * ट वर्ग
 * would be
 * would be
 * would be


 * त वर्ग
 * would be
 * would be
 * would be
 * would be


 * प वर्ग
 * would be
 * would be
 * would be


 * Compare : with :  and 🇨🇬.


 * Perhaps the following cases could retain the full schwa:


 * Geminates:


 * Consonant clusters:


 * ✅ In word medial position, the first consonant of a cluster is assigned to the coda position of the preceding syllable and the rest of the cluster is assigned to the onset of the next syllable
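The quoted syllabification rule can be sketched over a list of phonemes as follows. This is a minimal sketch, not the module's implementation: the function name `syllabify` and the vowel set are hypothetical, and two behaviours are assumptions rather than part of the quote, namely that a single intervocalic consonant goes to the following onset and that adjacent vowels are split by a boundary. The expected strings for the test words match transcriptions given elsewhere on this page.

```python
# Assumed vowel inventory; every other phoneme is treated as a consonant.
VOWELS = {"ə", "a", "i", "iː", "u", "uː", "e", "o"}


def syllabify(phonemes):
    """Insert syllable dots into a list of phonemes and return a string."""
    vowel_idx = [i for i, p in enumerate(phonemes) if p in VOWELS]
    breaks = set()  # positions before which a dot is inserted
    for left, right in zip(vowel_idx, vowel_idx[1:]):
        cluster_len = right - left - 1
        if cluster_len <= 1:
            # V.V or V.CV (assumption: a lone consonant is an onset).
            breaks.add(left + 1)
        else:
            # VC.C(C)V: first consonant to the coda, the rest to the onset.
            breaks.add(left + 2)
    out = []
    for i, p in enumerate(phonemes):
        if i in breaks:
            out.append(".")
        out.append(p)
    return "".join(out)


print(syllabify(["u", "t̪", "s", "ə", "ʋ"]))       # उत्सव
print(syllabify(["l", "ə", "ʋ", "k", "ə", "ɾ"]))  # लवकर
print(syllabify(["ɕ", "ə", "ɦ", "a", "ɳ", "a"]))  # शहाणा
```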


 * There is a related process for verbs with a stem-final homorganic nasal such as (not ).
 * verbal stem नोंद- + verbal suffix -णे  → नोंदणे.


 * goes into further detail on how the consonant following the nasal and before -णे is deleted in the narrow transcription:


 * The voiced non-aspirated syllable-final stops, , and  preceded by a homorganic nasal and followed by a nasal are deleted.


 * : →  (compare )
 * : →  (compare )
 * : →  (compare )
 * : →  (compare )


 * The process appears to have gone even further in Konkani from what User:Bhagadatta said at Category talk:Konkani language:
 * [For nasal + voiced stop] you get the corresponding nasal: ङ for ग and घ, म for ब and भ and so on. Following this, the voiced stop is then dropped (see the pronunciation of, ). Kutchkutch (talk) 12:15, 25 September 2020 (UTC)
 * Thank you so much for all this analysis, I have fixed all the bad errors: ज़/झ़/च़/छ़ are now handled correctly; syllabification should be okay now; final schwas now occur after clusters/geminates; y is treated as any other consonant (so we get āyte). English borrowings with final clusters will require a phonetic respelling with the virāma unfortunately, there's no easy way to automate it. I'll be starting the phonetic IPA implementation now. I wanted to ask, are there any words that are not verbs which end in -णे? If no, then I can easily implement the phonetic rules you gave at the end. If yes, verb info will have to be passed to the template. —AryamanA (मुझसे बात करें • योगदान) 04:55, 26 September 2020 (UTC)
 * Thanks again for all the work.
 * -णे
 * Although most words that end in -णे are verbs, there are a considerable number of words that end in -णे that are not verbs (or derived directly from verbs). A few examples are below. Although Dhongde and Wali doesn't explicitly restrict the rule to just verbs, only and in the following list would qualify.  is a poetic word that is usually pronounced very carefully. Therefore, this analysis indicates that some info would have to be passed to the template if there's no way to automatically detect N[,, , ]N.


 * Nouns:
 * (> Thane, 🇨🇬)
 * (> Thane, 🇨🇬)
 * (> Thane, 🇨🇬)
 * (> Thane, 🇨🇬)
 * (> Thane, 🇨🇬)


 * Other:
 * lemmatised to (🇨🇬)
 * lemmatised to (🇨🇬)
 * lemmatised to
 * lemmatised to (🇨🇬)
 * lemmatised to (> )
 * lemmatised to (> )
 * lemmatised to (> )
 * lemmatised to (> )


 * ळ is incorrectly syllabified when it is in word-medial codas (e.g. : ):
 * The verbs could be fixed by using  (for :, : , etc.)
 * and are possibly compounds involving  and  so it might be okay to use respelling with.


 * e-stem neuter nouns (e.g. ), declined adjectives (e.g. ) and all verbs are usually pronounced with a schwa at the end in their lemma forms despite the word-final character being ए. This ए is only used instead of the schwa when being pedantic. The colloquial schwa is indicated in writing with . It would be helpful if this word-final schwa could be indicated along with the pedantic ए. Only the verbs have no exceptions. Deciding whether the other parts of speech are applicable may require manual judgement. The format at could possibly be used.


 * ✅ The length of high vowels
 * The paper's conclusion indicates that high vowel length can be shown in the narrow transcription.


 * Average vowel durations for /i/ versus /iː/ are 145ms for short vowels and 238ms for long vowels with an average long to short vowel ratio of 1:1.69. Results for these two minimal pairs indicate that high short vowels are indeed shorter than long vowels.


 * The following might be considered lower priority:
 * ✅ Orthographic CʰCʰ → Phonological CCʰ in, , , and
 * Word-initial ज्ञ in, , etc. are currently showing as in the broad transcription, which would need to be changed manually if entries for those words are created.  As the word  shows, the transliteration and the broad transcription word- would be   . In a narrow transcription, word-initial ज्ञ could be represented as . Kutchkutch (talk) 13:29, 26 September 2020 (UTC)
 * The h-deletion and aspiration rules work in most cases, but here are some cases to consider:


 * Diphthongs (with the first vowel being ) would be better in the following:
 * Here are other words in which the h-deletion and aspiration rules apply:
 * V₁ =
 * [əC]: ,
 * V₁ =
 * [aC]:, ,
 * V₁ =
 * V₁ =
 * [eC]:
 * V₁ =
 * is shown as li•in/lin on page :


 * Of the remaining narrow transcription rules, the diphthongisation rules on page might be helpful since many lemmas qualify. Although it says ʻespecially in fast speechʼ, it might actually vary from  ʻwhen the pronunciation is not carefulʼ to ʻall the timeʼ. English borrowings probably apply as well.
 * The only lemmas that have appear to be Perso-Arabic words with  + word-initial इ such as  and.
 * All verbs with -व- (≈ 🇨🇬 in )
 * from : →
 * has an idiosyncratic pronunciation
 * In terms of and, the diphthongisation rules are probably after h-deletion since :  has  so it wouldn't qualify for  →.


 * Although there are some guidelines for stress on page and in the paper, not even MOD:hi-IPA has stress, so those could be considered low priority. Kutchkutch (talk) 14:08, 27 September 2020 (UTC)


 * was created to see what showing both the ए and the schwa pronunciations might look like. If there's a better way, then feel free to make the necessary changes (including deleting the template).


 * This is how transcribe words with :


 * is often divided across syllable boundaries as
 * Kutchkutch (talk) 12:56, 28 September 2020 (UTC)

h-Deletion and Murmur Rules