Module talk:hi-IPA


 * Close, but some errors. Aryamanarora (talk) 00:48, 28 October 2015 (UTC)

How’s it going?
How is the development going along? Let me know if you need help with anything. — Ungoliant (falai) 00:21, 29 October 2015 (UTC)
 * I don't know how much User:Aryamanarora knows about Lua, but I'm not good with it. The module now has issues, which are partially solved in Module:hi-translit. The inherent sound [ə], transliterated as "a", is dropped in certain positions. All the failed cases are about that: the dropping of "a" ([ə]). --Anatoli T. (обсудить/вклад) 00:38, 29 October 2015 (UTC)
 * This is actually my first time using Lua for something useful - as you can guess, I'm not very well versed in Lua. Somewhat off topic, but has anyone seen Module:pl-IPA or Module:de-IPA? Modeling the Hindi IPA module on those could fix the schwa dropping. Aryamanarora (talk) 00:41, 29 October 2015 (UTC)
 * If the transliteration module already deals with the schwa issue (and presumably others) correctly, it could be simpler to have this module call the transliteration function and generate the IPA from the transliteration. — Ungoliant (falai) 00:44, 29 October 2015 (UTC)


 * (After E/C) The rules for schwa dropping in Hindi, which are uncomplicated for a human, are different from those of the above languages. They need to be implemented in the module. One doesn't need to know the Devanagari script (which is very phonetic), only to understand the principles. The rules need to be described, e.g.
 * CāCaCa->CāCaC
 * CāCaCCa->CāCaCCa
 * Here C represents any consonant (?), ā any long vowel or any vowel other than "a", and a just the inherent (unwritten) a.
 * @Ungoliant, I agree about your point. --Anatoli T. (обсудить/вклад) 00:53, 29 October 2015 (UTC)
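
The two patterns listed above can be tried out with a small sketch. It is written in Python for readability rather than in the module's Lua, and it is not the module's actual logic: the vowel inventory, the function names, and the idea of working on a romanized string are all assumptions made for illustration. It only covers the word-final case described in this thread (a final inherent "a" is dropped after a single consonant and kept after a cluster).

```python
import unicodedata

# Simplified vowel inventory for a romanization (an assumption for this sketch).
VOWELS = set(unicodedata.normalize("NFC", "aāiīuūeo"))


def is_consonant(ch: str) -> bool:
    """Treat any letter that is not a listed vowel as a consonant."""
    return ch.isalpha() and ch not in VOWELS


def drop_final_schwa(word: str) -> str:
    """Drop a word-final inherent 'a' when it follows vowel + single consonant."""
    word = unicodedata.normalize("NFC", word)
    if (
        len(word) >= 3
        and word[-1] == "a"
        and is_consonant(word[-2])
        and word[-3] in VOWELS
    ):
        return word[:-1]
    # A final 'a' after a consonant cluster (or any other ending) is kept.
    return word


print(drop_final_schwa("CāCaCa"))   # CāCaC
print(drop_final_schwa("CāCaCCa"))  # CāCaCCa
```

Word-medial schwa dropping, which the failed testcases may also involve, is deliberately left out here.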

Failed testcases

 * I have now added another failed testcase that should be dealt with. —Μετάknowledge discuss/deeds 00:00, 19 July 2017 (UTC)
 * It is fixed now. Wyang (talk) 09:10, 19 July 2017 (UTC)

Invalid characters

 * When there are two options given, the slashes are causing an error. See e.g. उत्पन्न, where the module is putting the entry in CAT:IPA pronunciations with invalid IPA characters. —Aɴɢʀ (talk) 13:02, 14 August 2017 (UTC)
 * It is fixed now. Wyang (talk) 13:07, 14 August 2017 (UTC)

Syllabification
Per Pandey 2014. —AryamanA (मुझसे बात करें • योगदान) 18:22, 24 November 2017 (UTC)

ख़, ग़
Do you want to add a secondary pronunciation for these using /x/ and /ɣ/? Many speakers do try to speak like this when speaking "properly" or "educatedly". DerekWinters (talk) 18:44, 24 November 2017 (UTC)
 * Yeah, I was planning to. I'll try to do so right now, but beware, I'm on a phone. —AryamanA (मुझसे बात करें • योगदान) 22:33, 24 November 2017 (UTC)
 * Regarding your revert here: didn't think this was a controversial issue or would require consensus, as it already seems to be the general sentiment in this thread above (at least 's). But this is documented here. Getsnoopy (talk) 04:53, 14 September 2020 (UTC)
 * If you read it properly, User:AryamanA agreed a long time ago, back in 2017, to add a secondary reading for ख़, ग़. That is ख़ /x/ also as /kʰ/ and ग़ /ɣ/ also as /ɡ/. He recently added a similar treatment for a few other nuqta letters. "Proper" or "educated", these are alternative pronunciations. If you don't want to spend time understanding and following discussions, don't edit modules, which are used by a large number of entries. --Anatoli T. (обсудить/вклад) 06:02, 14 September 2020 (UTC)
 * It seems like your first and third sentences are in conflict. Either way, can you cite sources which claim that the "educated" pronunciation is an alternative and not the primary one? Getsnoopy (talk) 07:15, 14 September 2020 (UTC)
 * The nuqta phonemes are not native sounds to Hindi. My mother, who is highly educated, does not pronounce any of the nuqta consonants except /f/ and occasionally /z/. My grandfather, who has a Master's in Psychology, doesn't pronounce any of them. There have been few sociolinguistic studies on the nuqta phonemes, but one (that too from an Urdu perspective) is The polyphony of Urdu in postcolonial North India. in The Journal of Modern Asian Studies by Rizwan Ahmad, which finds a majority of young people surveyed in Delhi who identify as Urdu speakers do not pronounce the nuqta consonants as they are prescribed either.
 * Personally, I pronounce /f/ /z/ but not the others. Same with my father. —AryamanA (मुझसे बात करें • योगदान) 15:18, 14 September 2020 (UTC)
 * Those all seem to be anecdotes, although I don't know how having degrees in psychology or otherwise has much to do with linguistics. But I have similar anecdotes supporting the opposite side, where almost every native Hindi speaker I know pronounces all the nuqta consonants correctly except for ग़. I'd agree the sounds are not native to Sanskrit, but Standard Hindi, as we know it today, is the Khariboli dialect spoken around Delhi. This was formed from somewhere around 1200 CE, around the same time of the immense Persian influence on India as a whole and, more specifically, Khariboli. I don't know that one could convincingly argue that the incorporation of words like k͟harāb (ख़राब) and ġam (ग़म) into Standard Hindi took different paths lexicographically vs. phonetically. On the issue of whether they are required, that seems to be an issue of great debate. This is not to mention the fact that even you agree, like many in the discussions I cited, that people distinguish the nuqta in the cases of फ़ and ज़. The former has, in fact, been subject to hypercorrection where many people now pronounce words like phūl (फूल), phal (फल), and phir (फिर) as fūl (फ़ूल), fal (फ़ल), and fir (फ़िर), respectively.
 * For the purposes of this template, however, I don't think it makes much sense to have a lexicographical distinction (the nuqta) be made that doesn't manifest in the pronunciation somehow. My point is that it's odd to have the template be aware of the nuqta's role, but return the same output for the inputs of खराब and ख़राब all the same, especially given that people sneer when the फ़ and ज़ distinctions are not made. All of that notwithstanding, it's good to remind ourselves that what we're really debating is merely whether the nuqta pronunciation is displayed before the non-nuqta pronunciation. Getsnoopy (talk) 18:36, 14 September 2020 (UTC)
 * Of course, I offered anecdotes, and you responded with anecdotes--we simply don't have much in quantitative data to talk about, so this is the best we have. But I also offered a sociolinguistic study that affirms that these sounds are not as standardized as you seem to be making them out to be.
 * 1200 CE is far too early of a date for the beginning of Persian influence on Hindustani. Only in the Mughal period was a Persianized Hindustani cultivated in Delhi, and that is when these sounds could be considered anything more than marginal, and that too only in the dialect of educated elite of Delhi steeped in the Mughal power structure. And given that these sounds only came into the language through borrowing from Persian and Arabic (and now English), it's perfectly fair to say these sounds are not native to *all* Hindi speakers.
 * Yes, the only point of contention is the order. I don't think it really matters to be honest; both of them are there so no information is lost. We do not know the prevalence of these sounds across a wide sample of Hindi speakers, so any ordering is not based on quantitative data. I prefer the status quo. —AryamanA (मुझसे बात करें • योगदान) 20:22, 14 September 2020 (UTC)
 * I think you're confusing what is the standard vs. what people end up doing. This is similar to how the SI (modern metric system) is used. For example, "five kilometres" should always be written as "5 km", but you have people writing "5km", "5 KM", "5 kms", "5kms", etc. What people end up doing has a lot to do with people's access to education, etc. As Motilal Banarsidass, Michael C. Shapiro, and Harold F. Schiffman note in Language and Society in South Asia:
 * "The pronunciation of many speakers includes a number of consonants that are not part of the indigenous systems, and which have been introduced into the language through the absorption of Persian and Arabic loan words. The presence or absence of these consonants is seldom categorial within a community, and tends to be correlated with the degree of education, sex, and social background of the speaker."
 * As for the history, kingdoms with Persian as the official language have been in the Delhi region since as early as 900 CE with the Ghurid dynasty. And this is not to mention the trading between the subcontinent and the Arabs and Persians that is basically as old as the recorded history of the Indian subcontinent goes. Seeing as linguists place the development of Khariboli around 900–1200 CE, the influence of Arabic/Persian on what is now modern Hindi is at least as old, if not likely far older than that.
 * Nevertheless, I think you've misunderstood my point. I was saying that because the template displays all the information already and we're debating about order, the nuqta pronunciation should come first, since the template seems to label that which comes first as "standard". If a word explicitly has a nuqta written, then pronouncing it without a nuqta clearly can't be "standard" pronunciation. If on the other hand, a word which should have a nuqta doesn't have it written, then I take your point about it being pronounced either way. But the template, as it's programmed currently, doesn't have a repertoire of all the words that have Perso-Arabic etymological roots in order to be able to figure out which ones have alternative pronunciations and which ones don't. Since it always goes off of what the underlying Devanagari text is, it should reflect the user/editor's preference of providing the nuqta where necessary in the pronunciation. Getsnoopy (talk) 22:42, 14 September 2020 (UTC)
 * (1) That is only called "standard" in the code, not in any publicly-visible context. Nevertheless, I've renamed "standard" to "nonpersianized" for clarity in the code.
 * (2) Languages are not comparable to measurement systems at all. Wiktionary is a descriptivist dictionary, we examine language as it is, not as what anyone says it ought to be (and there is no really well defined Standard Hindi anyways; even English has no one set standard pronunciation). Furthermore, there is no well-established regulating body of Hindi. The Central Hindi Directorate does exist but their work is not prescriptivist and they are not a governing body of the language; and even if they were, we would still approach Hindi documentation from a descriptivist standpoint. —AryamanA (मुझसे बात करें • योगदान) 22:49, 14 September 2020 (UTC)
 * "Languages are not comparable to measurement systems at all." That's not entirely true and it depends on the language, as the pronunciation for Indic languages is clearly laid out by various historical texts. But even if one were to consider it to be purely descriptivist, that still supports what I'm saying. If one were to be descriptivist, then using characters without the nuqta would be the place where the pronunciation is malleable (e.g., फ can be pronounced as pha or fa) because that's exactly what happens in the real world. It makes no sense, however, to suggest that a word that explicitly has the nuqta specified is still pronounced as if the nuqta was not there. So insofar as the template is concerned, seeing as we have entries for both nuqta and non-nuqta versions of each word, the entries for words which include a nuqta should have the proper nuqta-based pronunciation listed first. Getsnoopy (talk) 17:07, 4 October 2020 (UTC)
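
For readers following the ordering debate, here is a minimal Python sketch of what "show both pronunciations" amounts to. It is not taken from the module: the function name, the `persianized_first` flag, and the two-entry mapping are assumptions for illustration, and only the ख़ /x/ ~ /kʰ/ and ग़ /ɣ/ ~ /ɡ/ pairs mentioned in this thread are covered.

```python
# Persianized phoneme -> non-Persianized fallback, limited to the two pairs
# named in this thread (a hypothetical table, not the module's own).
NONPERSIANIZED = {"x": "kʰ", "ɣ": "ɡ"}


def pronunciation_variants(ipa, persianized_first=True):
    """Return the transcription(s) to display for one IPA string."""
    plain = ipa
    for persianized, fallback in NONPERSIANIZED.items():
        plain = plain.replace(persianized, fallback)
    if plain == ipa:
        # No nuqta phoneme present, so there is only one pronunciation.
        return [ipa]
    return [ipa, plain] if persianized_first else [plain, ipa]


print(pronunciation_variants("xəɾɑːb"))
print(pronunciation_variants("xəɾɑːb", persianized_first=False))
```

Under this framing the whole disagreement reduces to the default value of one boolean, since both variants are emitted either way.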

Commas in transliteration

 * At सौ सुनार की, एक लोहार की, the module is putting the comma into the IPA, which in turn is putting the entry into Category:IPA pronunciations with invalid IPA characters. Can this be fixed, please? Thanks! —Mahāgaja (formerly Angr) · talk 14:27, 22 May 2018 (UTC)
 * Yep, fixed now. —AryamanA (मुझसे बात करें • योगदान) 14:50, 22 May 2018 (UTC)
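
How the module was actually fixed is not described here. One plausible approach, sketched in Python under that caveat, is to strip punctuation from the input before it reaches the IPA conversion. The punctuation set and the function name are assumptions.

```python
import re

# Punctuation that should never reach the IPA output (illustrative set,
# including the Devanagari danda).
PUNCTUATION = re.compile(r"[,;:!?।]")


def strip_punctuation(text: str) -> str:
    """Remove punctuation and collapse any whitespace left behind."""
    cleaned = PUNCTUATION.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_punctuation("सौ सुनार की, एक लोहार की"))
```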

Syllabification error
The module is wrongly syllabifying as gar.bh.pāt even when spelled गर्भ-पात or गर्भ॰पात. Benwing2 (talk) 01:03, 16 August 2020 (UTC)
 * Some notes on how syllabification should work:
 * Syllable will prefer morpheme boundary whenever possible.
 * If just two consonants are together, syllable boundary is in between them.
 * For CCC the special cases are:
 * Cs.C, Cś.C, Cṣ.C
 * NC.C
 * C.CC for all else
 * I will add some testcases for these kinds of clusters and then we can fix it. Thanks for noticing this. —AryamanA (मुझसे बात करें • योगदान) 02:01, 16 August 2020 (UTC)
 * Fixed the aforementioned issue. I've also added narrow IPA transcriptions in [], let me know if you see any problems. —AryamanA (मुझसे बात करें • योगदान) 18:28, 16 August 2020 (UTC)
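
The cluster rules listed in this thread can be written down as a short Python sketch. It is an illustration of the stated rules only, not the module's code: it assumes "N" means a nasal, uses a hypothetical nasal set, takes each consonant as one token, and does not implement the morpheme-boundary preference.

```python
import unicodedata


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


SIBILANTS = {_nfc(c) for c in ("s", "ś", "ṣ")}
# Assumed reading of "N" in the rule "NC.C": any nasal consonant.
NASALS = {_nfc(c) for c in ("n", "m", "ṅ", "ñ", "ṇ")}


def split_cluster(cluster):
    """Split an intervocalic cluster into (coda, onset) token lists."""
    c = [_nfc(tok) for tok in cluster]
    if len(c) <= 1:
        return [], c              # a lone consonant starts the next syllable
    if len(c) == 2:
        return c[:1], c[1:]       # C.C
    if len(c) == 3:
        if c[1] in SIBILANTS or c[0] in NASALS:
            return c[:2], c[2:]   # Cs.C, Cś.C, Cṣ.C, NC.C
        return c[:1], c[1:]       # C.CC for all else
    raise ValueError("clusters longer than three are not covered by the rules")


print(split_cluster(["n", "d", "k"]))   # (['n', 'd'], ['k'])
print(split_cluster(["r", "bh", "p"]))  # (['r'], ['bh', 'p'])
```

For गर्भ-पात the purely phonological rule gives r.bhp, so the desired garbh.pāt depends on the morpheme-boundary preference in the first rule, which would have to be applied before this function is consulted.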

Alternative reading of ŏ as ā, short ŏ, space before sentence-final punctuation

 * Just adding this topic as a reminder from Talk:कॉफी about making words like also pronounced as . --Anatoli T. (обсудить/вклад) 23:19, 16 August 2020 (UTC)

(moved from Talk:कॉफी)

Should the ŏ vowel be short? If not, what does the breve mean? Benwing2 (talk) 00:36, 16 August 2020 (UTC)
 * Nice catch, you are right that it should be short. Fixed. —AryamanA (मुझसे बात करें • योगदान) 01:40, 16 August 2020 (UTC)
 * I have to mention that I only came across these symbols romanised with breve at Wiktionary. A few textbooks, a dictionary and a phrasebook I have, don't use them at all. E.g. is given as, romanised as "kāfī". Note also that Urdu doesn't have an equivalent letter, words with ॉ are normally spelled with an alif (ا) in Urdu, e.g. . Must be a relatively new introduction. Do most such words have alternative pronunciations with a long "a" in Hindi?
 * BTW, @AryamanA, I see that no spaces are currently used before । or other sentence-final punctuation signs. My books all use spaces in texts. Possibly a new development as well? --Anatoli T. (обсудить/вклад) 04:02, 16 August 2020 (UTC)
 * I don't think it's very new. All my life, I have only seen the way without a space before. Newspapers, books, etc. And yes! These words do have an ā pronunciation as an alternative. I should add that to hi-IPA, good point. —AryamanA (मुझसे बात करें • योगदान) 04:04, 16 August 2020 (UTC)


 * Thanks, @AryamanA, I am not challenging you, just asking. I got so used to the space before the "।" character (or ?, !) that I always corrected it when I saw it in usage examples (not that I edited a lot of Hindi entries). Anyway, I trust your judgement and knowledge.
 * Thanks for considering adding alt pronunciations. So, will get four possible pronunciations automatically? Seems legit by me. --Anatoli T. (обсудить/вклад) 04:15, 16 August 2020 (UTC)

More syllabification issues
I've noticed that the syllabification is still wrong in some cases. For example, is syllabified as aṇ.ḍkoṣ when it should be aṇḍ.koṣ. Similarly, is syllabified as aṇ.ṭār.kṭi.kā when it should be aṇ.ṭārk.ṭi.kā. Basically, the module needs to take possible onsets into consideration when dividing syllables. Note that this is already done by all the various Slavic pronunciation modules I've worked on, e.g. Module:uk-pronunciation, Module:be-pronunciation, Module:ru-pron; you can use them, esp. the first two, as a guide. Benwing2 (talk) 06:03, 19 August 2020 (UTC)
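
The onset check suggested above can be sketched in Python as follows. This is not how the Slavic modules or Module:hi-IPA do it. It only shows the general idea of picking the longest cluster-final stretch that is a permitted onset. The list of permitted onsets is a made-up placeholder, and each consonant is passed as one token.

```python
# Hypothetical two-consonant onsets, for illustration only; a real list would
# have to come from a description of Hindi phonotactics.
ALLOWED_ONSETS = {("p", "r"), ("t", "r"), ("k", "r"), ("k", "y")}


def split_by_onset(cluster, allowed_onsets=ALLOWED_ONSETS):
    """Split a cluster into (coda, onset), preferring the longest valid onset."""
    cluster = list(cluster)
    for start in range(len(cluster)):
        onset = tuple(cluster[start:])
        # A single consonant is always accepted as an onset.
        if len(onset) == 1 or onset in allowed_onsets:
            return cluster[:start], list(onset)
    return [], []  # empty cluster


print(split_by_onset(["ṇ", "ḍ", "k"]))  # (['ṇ', 'ḍ'], ['k'])
print(split_by_onset(["r", "k", "ṭ"]))  # (['r', 'k'], ['ṭ'])
```

With this placeholder list, neither ḍk nor kṭ counts as an onset, so the two words mentioned above come out as aṇḍ.koṣ and aṇ.ṭārk.ṭi.kā. Note that a pure longest-onset rule would also keep a permitted pair such as tr together between vowels, which differs from the "two consonants split in the middle" rule in the earlier thread, so the two would need to be reconciled.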

Use of ॰ in अज्ञेयवाद
I'm confused as to what the ॰ is doing in अज्ञेयवाद, with the pronunciation template written as. Written as such, you get a reduced schwa instead of a full schwa (if you omit the ॰) or no vowel (if you insert a virama). Why does this happen, and what's the underlying principle? Thanks. Benwing2 (talk) 06:19, 19 August 2020 (UTC)
 * I don't know either. I just see a different result. --Anatoli T. (обсудить/вклад) 12:18, 19 August 2020 (UTC)
 * So, it marked morpheme boundaries. Currently our support of weakened schwas is pretty basic, but the obvious rule that I have implemented is that if a difficult consonant cluster ends a word, that results in a weakened schwa at the end. E.g. by itself has a weakened schwa at the end. This process also takes place in compounds, which is what this symbol is trying to represent. —AryamanA (मुझसे बात करें • योगदान) 13:52, 20 August 2020 (UTC)

Strange pronunciation with extra final schwa
I've seen some words use a strange notation. An example is, which uses the following pronunciation template call:. The second pronunciation has the noun followed by an independent a syllable. What is the purpose of that and why does it yield the following: with schwa followed by a second reduced schwa? I don't understand what this is supposed to be signifying. Benwing2 (talk) 07:35, 20 August 2020 (UTC)
 * I just suspect it's an accidental result of some changes, probably unintentional, just a guess. The intention probably was to force the final shwa to be pronounced (imitating Kannada pronunciation, which lacks shwa dropping), which we can now do with *, e.g. (hmm, but it only gives a weakened shwa in this position.) --Anatoli T. (обсудить/вклад) 10:12, 20 August 2020 (UTC)
 * Yeah, that was the old way of doing it and didn't work well, it's a mistake. I fixed it. —AryamanA (मुझसे बात करें • योगदान) 13:50, 20 August 2020 (UTC)

ur-IPA dialects
@AryamanA, @Inqilābī, @Kushalpok01, Hi all! Sorry for bothering you, but I was wondering if you guys could help me alter the ur-IPA part of this module by including the Deccani dialect ( -> ) in this, which is literally just a matter of changing /q/ -> /x/. I'm trying to include more accents and dialects (see . If you could also perhaps explain to me, how I can do this myself as my knowledge on programming is quite limited - I'd be quite grateful. Thanks! نعم البدل (talk) 02:41, 25 May 2022 (UTC)


 * ✅ Sbb1413 (he) (talk • contribs) 10:14, 28 April 2023 (UTC)
 * Hi, thank you for that, but it seems to appear for all pages using the Template:ur-IPA. I was hoping that the Deccan label would only appear if the word contained the letter 'q'. نعم البدل (talk) 23:19, 29 April 2023 (UTC)
 * IDK how to do so, but you can use "style=standard" to remove the Deccani pronunciation. Sbb1413 (he) (talk • contribs) 11:55, 30 April 2023 (UTC)
 * Sorry, I've just seen your reply. I've reverted your edit for the meantime, since it's pretty much showing a duplicate transcription across all pages. I will probably come back to this sometime soon. نعم البدل (talk) 05:54, 13 May 2023 (UTC)
 * Some eastern dialects like Bihari still retain the diphthong quality for ऐ,औ, some UP dialects close to Delhi pronounce /ɽ ɽʱ/ as [ɭ̆ ɭ̆ʱ] and some other dialects of UP and Bihar pronounce /ʃ/ as [s]. They should be included too
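
The Deccani request in this thread (swap /q/ for /x/, and show the extra line only when the word actually contains /q/) could look roughly like the Python sketch below. The function name and the label strings are hypothetical, and the real module is written in Lua.

```python
def ur_ipa_variants(ipa: str):
    """Return (label, transcription) pairs for one Urdu IPA string."""
    variants = [("Standard", ipa)]
    # Only add the dialect line when it would differ from the standard one.
    if "q" in ipa:
        variants.append(("Deccani", ipa.replace("q", "x")))
    return variants


print(ur_ipa_variants("qə.ləm"))
print(ur_ipa_variants("kɪ.tɑːb"))
```

Guarding on the presence of "q" is what avoids the duplicate transcription on every page that was reported after the first attempt.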

f, z, other Perso-Arabic phonemes and Hindi dialects
Out of all the Perso-Arabic phonemes, f and z are a lot more commonly used than the rest, especially f; some speakers even pronounce the native ph as f.

Some eastern dialects like Bihari still retain the diphthong quality for ऐ, औ; most pronounce /-Vɦ/ as [V̄]; some UP dialects close to Delhi pronounce /ɽ ɽʱ/ as [ɭ̆ ɭ̆ʱ]; and some other dialects of UP and Bihar pronounce /ʃ/ as [s]. They should be included too. AleksiB 1945 (talk) 17:08, 30 April 2023 (UTC)

/n/ न as alveolar [n] ?
It has been stated by some that /n/ in Hindi is actually alveolar and not dental (denti-alveolar) like /t/, /d/, etc. For example here. Is this correct? And shouldn't the module reflect this by showing [n] instead of [n̪]? Exarchus (talk) 14:33, 4 July 2023 (UTC)


 * To answer my own question: this paper shows that Hindi /n/, /l/ and /r/ are rather alveolar than dental, except when coarticulated with /i/, then /n/ and /l/ are denti-alveolar.
 * So I suggest removing the dental diacritics from at least /n/ and /l/, and probably also from /s/, as Ohala also gives this as 'alveolar'. Exarchus (talk) 13:27, 6 January 2024 (UTC)

Hindi schwa = [ɐ] ?
@AryamanA Hi, when I listen to the Hindi sound fragments (and you seem to have done all of them), I don't hear the 'schwa' as [ə], but as [ɐ] (it would definitely be classified as an 'a'-sound in Dutch, never as schwa). This lady sounds the same in her pronunciation of दस (das). On the other hand, when I listen to Urdu sound fragments, for example خط, I do hear [ə].

Is this a general tendency of Hindi vs. Urdu, and would it be an idea to change the Hindi phonetic (not phonological) pronunciation to [ɐ]? Exarchus (talk) 13:12, 4 January 2024 (UTC)


 * Here it is said: "The central vowel /ɐ/ is usually transcribed in IPA as /ə/, but since my sources tell me it’s more open than [ə], I’m going with /ɐ/."
 * @نعم البدل Hi, can you tell more about this? Exarchus (talk) 18:34, 4 January 2024 (UTC)
 * In Urdu, the schwa is pretty much always [ə]. In Punjabi, however, it's not rare for the schwa to turn into [ɐ], especially in one-syllable words, and that may well be the case with Hindi too, seeing as they're spoken in related regions. Aryaman could probably expand on the Hindi side of things. نعم البدل (talk) 19:12, 4 January 2024 (UTC)
 * The case with /  is quite interesting. It definitely feels more open than [ə] but I'm not sure if that's because my Urdu is quite influenced by Punjabi. نعم البدل (talk) 19:20, 4 January 2024 (UTC)
 * Thanks. I'm not even sure if Aryaman's 'a' in अक्स is [ɐ], and not simply [a]~[ä]. Exarchus (talk) 19:38, 4 January 2024 (UTC)
 * This paper, for what it's worth, also locates Hindi /ə/ at [ɐ] (but has managed to reverse 'open' and 'close' on its chart). Exarchus (talk) 20:22, 4 January 2024 (UTC)
 * It's definitely not as low as [ä] for me, since आ exists and is noticeably lower. Currently though, I would agree that the Hindi "schwa" is mid-low [ɐ]. I don't think Urdu is different in this regard; I would say it's also mid-low. I do agree that Punjabi's "schwa" is lower than Hindi's. —AryamanA (मुझसे बात करें • योगदान) 04:21, 6 January 2024 (UTC)
 * @AryamanA Then I would suggest changing the Hindi module to /ə/ = [ɐ]. When you say Urdu isn't different in this regard, I suppose you mean Urdu as spoken in India? (For Urdu I'd look first of all to the pronunciation in Pakistan.) Exarchus (talk) 09:23, 6 January 2024 (UTC)
 * Less certain about Pakistani Urdu due to its greater variation; phonetics seems to depend on the L1, e.g. in Lahore, Urdu has a Punjabi vibe to it that is not present in Karachi.
 * One thing I forgot to mention is that I believe the height of the "schwa" in Hindi is also influenced by prosody. Stressed schwas seem to be lower. I don't think this phenomenon is well-described in any academic work yet though. —AryamanA (मुझसे बात करें • योगदान) 21:35, 6 January 2024 (UTC)
 * @AryamanA Your remark about stress makes me think of Dutch, where historically the non-stressed vowels became [ə].
 * It would be interesting to add stress marks to the module, but I suppose this would be a lot of work (because the syllabification may not always be clear?) Exarchus (talk) 23:25, 6 January 2024 (UTC)

इह, ओह/एह and Word-final अय
Many speakers pronounce word-final अय as [ɛː] (e.g., विषय), so I feel that should be included. एह (e.g., मेहनत) and ओह (e.g., तोहफ़ा) are usually pronounced as [ɛh] and [ɔh] before consonants in my experience. इह is also usually pronounced as [ɛh], although most words containing इह are nowadays spelt with एह (e.g., मिहनत->मेहनत). The same is true for उह being spelt as ओह (e.g., दुहराना->दोहराना); however, the IPA module already correctly displays उह as [ɔh] for the few words still spelt the old way (e.g., कुहरा, मुहर, etc.), so I feel the same should be done for इह. ATallSteve (talk) 21:17, 29 March 2024 (UTC)