Module talk:mk-pronunciation


 * Hello. Do you have something to say about the module? Is everything handled right? Guldrelokk (talk) 21:17, 15 April 2018 (UTC)
 * Hello. How could I use this module to generate IPA transcriptions for my entries as I create them? It looks good, but I don't agree with the final stress of одвај; it is indeed final in the Skopje dialect, but as far as I know, it's initial in the standard language. I would also like to point out that the assimilation of sibilants is characteristic of rapid, natural speech, but that in slower, more solemn discourse, one would likely say [ˈbɛst͡ʃɛstɛn]. Finally, a function should be added to convert [r] to [ɾ] in intervocalic contexts except when the following vowel is stressed (this is the only case where [ɾ] is obligatory, whereas it is a free variant of [r] elsewhere), since the module already takes allophony into account, and [bara] as a phonetic realisation of, for instance, would constitute an incorrect pronunciation (except in songs, perhaps), although phonemically /bara/ is, of course, perfectly all right. Martin123xyz (talk) 21:48, 15 April 2018 (UTC)
 * Thank you. I’ve implemented /r/ allophony. You only need to include Template:mk-IPA in the entry, unless the stress is irregular, in which case you can supply the word with a stress mark as a parameter, like this: ; for  just  is enough. I marked the stress in одвај explicitly for a test, I didn’t know it is dialectal, but the entry is not affected anyway. Wouldn’t it be better to keep the assimilation as both the less obvious and the more natural pronunciation variant? Guldrelokk (talk) 23:23, 15 April 2018 (UTC)
 * Thank you for implementing the /r/ allophony. I agree with you regarding the sibilant assimilation. I will now start using the module and will notify you if any issue arises (other than unpredictable stress or syllabic /r/, as in, where you cannot tell by the orthography that the word is trisyllabic, and which is inconsistent with the bisyllabic , for instance, for which no entry exists yet) Martin123xyz (talk) 11:55, 16 April 2018 (UTC)
 * I have noticed another thing that needs to be added: /n/ shouldn't assimilate only to [ŋ] before velars (as it does in ), but also to [m] and [ɱ] before labials and labiodentals, e.g. in and . Meanwhile, /m/ and /ɲ/ do not assimilate to the place of articulation of the following consonant, e.g. /ramka/ stays [ramka], so they should not be modified. Martin123xyz (talk) 13:48, 16 April 2018 (UTC)
 * I have now noticed that is transcribed with an optional [j] at the beginning, which is incorrect. An epenthetic [j] might appear before /ɛ/ if there's another preceding vowel, i.e. in case of hiatus, but never at the beginning of a word. Martin123xyz (talk) 13:57, 16 April 2018 (UTC)
 * Everything seems fixed now. But does /m/ really not even become labiodental before /f/ or /v/? Also, can’t also be written ? I thought this is a rule, as stated in Правопис на македонскиот јазик, page 6: „Самогласното р се пишува со апостроф кога се наоѓа на почетокот на зборот, а по него следува согласка: ’рбет, ’рбетник, ’рѓа, ’рж, ’ржи, ’рскавица,’рт, ’рти, ’рчи. Овие зборови се пишуваат со апостроф и кога ним им се додава префикс што завршува на самогласка: за’рѓа, за’ржи, про’рти, но кога им се додава префикс што завршува на согласка, апостроф не се пишува: безрбетник, изрти, сржи.“ If not, then you still can supply про’рти as the parameter to get the correct pronunciation, this won’t affect the entry display. Guldrelokk (talk)
 * Thank you for the adjustments. Honestly, I don't know whether [m] becomes [ɱ] because the difference is too slight for me to perceive whereas I haven't read anything about it; the assimilation of [n] to [ɱ], on the other hand, is much more conspicuous. However, I do think that it's logical for [m] to assimilate to a following labiodental, so I suggest we include that in the algorithm. You are also right about the rule regarding the apostrophe. I will add the forms with the internal apostrophe. Martin123xyz (talk) 14:52, 16 April 2018 (UTC)
 * I have come upon a new issue: the transcription for includes too many consonants in the onset of the second syllable; the /d/ should be assigned to the previous syllable, since /dgl/ is not a possible onset in Macedonian. In fact, no sequence of the type plosive-plosive-liquid is. Martin123xyz (talk) 22:11, 16 April 2018 (UTC)
 * Yes, sorry, it was only meant to affect /t͡s/, /d͡z/ and the like. Now it’s fixed. Guldrelokk (talk) 22:27, 16 April 2018 (UTC)
 * I'm afraid that now the transcription of is incorrect: the /v/ is assigned to the first syllable but actually belongs to the second. The sequence /zv/ is not a possible coda in Macedonian. The syllabification rules need to be refined to take into account the nature of the consonants in addition to their number in a given cluster. I would fix this myself but I don't understand the code. Martin123xyz (talk) 22:37, 16 April 2018 (UTC)
 * It already did that, just in a wrong way. Изв-ршител is of course not correct, as the second syllable is left open. It seems to work fine now. Guldrelokk (talk) 23:51, 16 April 2018 (UTC)
 * Hello again. The transcription for now splits the affricate, inserting the stress mark between the plosive and the fricative, whereas the affricates are actually single consonants. Martin123xyz (talk) 10:41, 18 April 2018 (UTC)
 * Hello; the transcription for is wrong because it marks the initial /m/ as syllabic, whereas it forms part of the consonant cluster in the onset of the first syllable, whose nucleus is /o/. Martin123xyz (talk) 18:54, 19 April 2018 (UTC)
 * Yes, sure. I fixed it. Guldrelokk (talk) 21:42, 19 April 2018 (UTC)
 * Hello again. I have noticed that the syllabification for is wrong - it should be [pɔdˈmnɔʒɛstvɔ] instead of [pɔdmˈnɔʒɛstvɔ] because /dm/ is not a possible coda. Martin123xyz (talk) 20:50, 19 September 2018 (UTC)


 * Excuse me, I didn’t notice your report – should be fixed now, let me know if something else goes wrong. Guldrelokk (talk) 12:53, 28 December 2018 (UTC)
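
The /r/ allophony requested earlier in this thread can be illustrated with a minimal sketch. It is written in Python for readability and is not the module's Lua code; the function name, the five-vowel inventory and the assumption that stress is written as ˈ at the start of the stressed syllable are all hypothetical choices made for the example.

```python
import re

# Assumed vowel inventory of the transcription (hypothetical for this sketch).
VOWELS = "aɛiɔu"

def tap_intervocalic_r(ipa: str) -> str:
    """Replace [r] with [ɾ] when it stands directly between two vowels.

    If the following vowel is stressed, the stress mark (ˈ) sits between the
    first vowel and the /r/, so the pattern does not match and [r] is kept.
    """
    return re.sub(rf"(?<=[{VOWELS}])r(?=[{VOWELS}])", "ɾ", ipa)

print(tap_intervocalic_r("ˈbara"))
```

Because the rule only looks at immediate neighbours, a syllabic or preconsonantal /r/ is left untouched.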

Sorry if I stepped on your toes, I'm crap at computing and I was excited that I could do the fix myself for once! --Per utramque cavernam (talk) 14:29, 16 April 2018 (UTC)
 * That was nice of you, thank you – I was glad to see that is already fixed. Guldrelokk (talk) 14:46, 16 April 2018 (UTC)

Hello. Could you tell me how I can use the " " template to indicate a common non-standard pronunciation, e.g.  with the stress on the second syllable as the common colloquial pronunciation? Martin123xyz (talk) 07:27, 4 August 2021 (UTC)
 * I have also noticed that stress placement is not sensitive to morphemic boundaries, which results in incorrect syllabification. For example, in, the stress mark should be before the -др- sequence because that is where the root starts, whereas in , the stress mark should be between the "д" and the "р", since the prefix is под- and the root is ред-. This is partly covered in Правопис на македонскиот јазик, e.g. in section 357, but the information on syllabification is not too comprehensive.
 * Is there any way for the problem to be dealt with automatically? If not, unwary users might add " " without checking the syllabification. Perhaps the module should be adjusted such that by default, no stress is indicated, unless the user specifies a second parameter, e.g. reg=1 (for regular stress). For the other case, the stress mark can be inserted manually, as I have done at . Martin123xyz (talk) 11:38, 4 August 2021 (UTC)
 * There is also a problem with sonorant sequences in words like . The module indicates both sonorants as syllabic because they are both interconsonantal, except that once the "р" becomes syllabic, the "н" is no longer interconsonantal and therefore remains [- syllabic]. Could perhaps help with some of this since Guldrelokk seems to be inactive? Martin123xyz (talk) 09:15, 5 August 2021 (UTC)
 * Hello. Sorry, I can't help, as I don't know how to code. But perhaps or  can? Per utramque cavernam – PUC – 09:21, 5 August 2021 (UTC)
 * I've added two testcases to Module:mk-pronunciation/testcases. Are they correct? PUC – 09:31, 5 August 2021 (UTC)
 * Thank you for adding the testcases; they are correct (the expected transcription is correct and the actual transcription is wrong). Sorry to bother you; I saw that you had fixed something else for this module in 2018 so I pinged you. Martin123xyz (talk) 09:42, 5 August 2021 (UTC)
 * No worries! It's a shame that Guldrelokk is no longer around; I really liked their work here. PUC – 10:09, 5 August 2021 (UTC)
 * Another issue is the treatment of . The module indicates that the m is syllabic, but it never becomes syllabic after another sonorant. Martin123xyz (talk) 06:46, 9 August 2021 (UTC)
 * I tried to fix the issue with грнчарство and стокхолмски. Can you add more testcases, esp. for anything that seems wrong, e.g. impossible syllabifications? BTW I can try to make the module smarter with respect to prefixes like под-, but there are some issues that can't be handled automatically, e.g. AFAIK in Macedonian both под- and по- are possible prefixes, so there's no way of knowing what to do with подр-, which might be под.р- or по.др-. Benwing2 (talk) 02:41, 11 August 2021 (UTC)
 * Also we should strive to use for everything and not just use  when the module doesn't do something right. Instead the module should be fixed or the appropriate respelling used. Benwing2 (talk) 02:42, 11 August 2021 (UTC)
 * Thank you for fixing the syllabic sonorants. For the time being, I do not have any more testcases to add; was in the same category as . Since, as you point out, it is not possible for the algorithm to recognize all morphemic boundaries automatically, I suggest that you not add smart rules. Instead, I will transcribe the words with the "`" symbol to indicate the syllabic boundary, as I've done for  (see the code behind the displayed pronunciation). I agree that we should use  as much as possible, but I didn't know whether anyone would actually agree to improve it, so I started using  while waiting. Is there any way I can see a list of all the Macedonian pages with  so that I can fix them now (or in future, if further improvements to the module are required)? Martin123xyz (talk) 05:54, 11 August 2021 (UTC)
 * Could you also advise me as to the proper way to add non-standard or colloquial pronunciations? So far, I have been using the system that you can see at, which is improvised. Martin123xyz (talk) 05:54, 11 August 2021 (UTC)
 * See User:Benwing2/mk-raw-ipa for a list of all 54 pages that use . BTW I still think it might be useful to make the module smarter with respect to impossible syllabifications, which you mention currently occur. If you can create test cases for these, I can look into them. As for colloquial pronunciations, if they are just a matter of putting the stress in a different place, you can use the acute accent like (same as is done in the Russian and Bulgarian modules). For other types of pronunciations, if they can be expressed using respelling, you should do that, otherwise let me know what the issue is and I'll devise a mechanism to represent it. Benwing2 (talk) 01:35, 12 August 2021 (UTC)
 * Thank you for the list. I'll create test cases for both the words that are incorrectly syllabified and the ones that are transcribed as they should be so that the modifications don't end up fixing one problem and causing another. As for the colloquial pronunciations, I know how to indicate the stress in a different place; what I wanted to know is how I should label them as colloquial. Currently, I use italics in parentheses, but maybe there's a special template that I should use, or maybe I should add a Pronunciation 2 header. Martin123xyz (talk) 05:59, 12 August 2021 (UTC)
 * Ah, I see. You should not use Pronunciation 1 or Pronunciation 2 headers in general. Instead, use Etymology 1 or Etymology 2 headers, putting a level-4 ====Pronunciation==== section under each. If this doesn't make sense, and you want to list entries with more than one pronunciation under the same Etymology header, you should list all pronunciations under the same Pronunciation header, if necessary distinguishing them using qualifiers. I changed and  appropriately to show how it should be done. For qualifiers, generally you can use  or, which are equivalent (both of them are redirects to ). For Russian, we have an ann parameter that adds a qualifier consisting of just the spelling, with the accent marked; see  for an example.  for Bulgarian supports the same parameter, and I can add support for this to  if you want, but I'm not sure how useful it will be, since differing stresses don't normally distinguish different forms of the same word in Macedonian. Benwing2 (talk) 04:19, 13 August 2021 (UTC)
 * Thank you. In future, I'll use the qualifiers that you suggest. For the words I had in mind, separate etymology headers are not appropriate, since it's really just two pronunciations of the same word with the same meaning and origin, except that one is standard and confined to contexts like education, news broadcasts and audiobooks, and the other is nonstandard but more natural and more widely used, such that it is worth documenting it. I agree that ann would be relatively useless in Macedonian. The two stresses of are an exceptional case and only exist in the standard and some dialects; I have never even heard anyone make the distinction orally in Skopje. Have you looked at the 13 failed testcases? Martin123xyz (talk) 04:33, 13 August 2021 (UTC)
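
The sonorant-sequence problem raised in this thread (грнчарство, стокхолмски) can be sketched as follows. This is a Python illustration under stated assumptions, not the fix that was actually made to the module: the sonorant set is a guess, word-initial sonorants are ignored, and the observation that /m/ does not become syllabic after another sonorant is generalised here to all sonorants.

```python
VOWELS = set("аеиоу")
SONORANTS = set("рлљмнњ")  # assumed set; the module's own list may differ

def syllabic_sonorants(word: str) -> list[int]:
    """Return the indices of sonorants that act as syllable nuclei."""
    nuclei = []
    for i in range(1, len(word) - 1):
        ch, prev, nxt = word[i], word[i - 1], word[i + 1]
        if ch not in SONORANTS:
            continue
        if prev in VOWELS or nxt in VOWELS:
            continue  # not interconsonantal
        if prev in SONORANTS:
            continue  # follows another sonorant, so it stays non-syllabic
        nuclei.append(i)
    return nuclei

print(syllabic_sonorants("грнчарство"))
```

With these assumptions only the "р" of грнчарство is marked as a nucleus, and no sonorant of стокхолмски is.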

Yes. Thank you for adding them. Comments:

1. For things like конфликт vs. комфорен, I will probably add support for  to prevent assimilation, so you'd write. We use this character for this purpose also in Russian, Italian and various other languages.

2. For од немај-каде, I assume that all written voiced consonants are pronounced voiced before another voiced consonant across word boundaries? If so, I will fix the module to support this. For the hyphen, the module currently treats hyphen like a space because of terms like, , etc. In the case of , either I can change the module to recognize certain hyphenated suffixes like (if there are several words ending in this suffix), or we can just respell it without the hyphen.

3. For од играчка плачка, is the voiced /d/ before a vowel a property only of (and presumably also,  and certain other prepositions)? I can fix the module to recognize these prepositions specially. We currently do this in Russian, which uses the following list for prepositions as well as for hyphenated suffixes and certain other special cases:

```lua
local accentless = {
    -- class 'pre': particles that join with a following word
    pre = ut.list_to_set({'bez', 'bliz', 'v', 'vo', 'da', 'do',
        'za', 'iz', 'iz-pod', 'iz-za', 'izo', 'k', 'ko', 'mež',
        'na', 'nad', 'nado', 'ne', 'ni', 'ob', 'obo', 'ot', 'oto',
        'pered', 'peredo', 'po', 'pod', 'podo', 'pred', 'predo', 'pri', 'pro',
        's', 'so', 'u', 'čerez'}),
    -- class 'prespace': particles that join with a following word, but only
    --   if a space (not a hyphen) separates them; hyphens are used here
    --   to spell out letters, e.g. а-эн-бэ́ for АНБ (NSA = National Security
    --   Agency) or о-а-э́ for ОАЭ (UAE = United Arab Emirates)
    prespace = ut.list_to_set({'a', 'o'}),
    -- class 'post': particles that join with a preceding word
    post = ut.list_to_set({'by', 'b', 'ž', 'že', 'li', 'libo', 'lʹ', 'ka',
        'nibudʹ', 'tka'}),
    -- class 'posthyphen': particles that join with a preceding word, but only
    --   if a hyphen (not a space) separates them
    posthyphen = ut.list_to_set({'to'}),
}
```

4. For words like обновува and трамвајскиот, and in general for syllable boundaries in clusters, I think the best we can do is have a list of allowed onsets. We have such a list for Russian, which is as follows (it also includes all consonants followed by /j/, which is not listed here):

```lua
local perm_syl_onset = ut.list_to_set({
    'spr', 'str', 'skr', 'spl', 'skl',
    -- FIXME, do we want sc?
    'sp', 'st', 'sk', 'sf', 'sx', 'sc',
    'pr', 'br', 'tr', 'dr', 'kr', 'gr', 'fr', 'vr', 'xr',
    'pl', 'bl', 'kl', 'gl', 'fl', 'vl', 'xl',
    -- FIXME, do we want the following? If so, do we want vn?
    'ml', 'mn',
    -- FIXME, dž is now converted to ĝž, which will have a syllable
    -- boundary in between
    'šč', 'dž',
})
```

Benwing2 (talk) 05:02, 13 August 2021 (UTC)
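
To make the "list of allowed onsets" idea concrete, here is a minimal Python sketch of how such a list could drive the syllable division inside a consonant cluster. It is only an illustration: the onset set is a small hypothetical sample, the function name is invented, and clusters are given in a one-letter-per-consonant transliteration (so affricates are not covered).

```python
# Hypothetical sample of permitted multi-consonant onsets.
ALLOWED_ONSETS = {
    "str", "sp", "st", "sk",
    "pr", "br", "tr", "dr", "kr", "gr",
    "pl", "bl", "kl", "gl",
}

def split_cluster(cluster: str) -> tuple[str, str]:
    """Split an intervocalic consonant cluster into (coda, onset).

    The longest suffix of the cluster that is a listed onset (or a single
    consonant) goes to the following syllable; the rest closes the previous one.
    """
    for start in range(len(cluster)):
        onset = cluster[start:]
        if len(onset) == 1 or onset in ALLOWED_ONSETS:
            return cluster[:start], onset
    return "", ""  # empty cluster, i.e. two vowels in hiatus

print(split_cluster("dgl"))
```

With this sample set, /dgl/ is divided as d.gl, and /bn/ and /mv/ as b.n and m.v, which matches the divisions asked for in the reports above.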


 * 1. For комфорен and конфликт, the problem is the reverse of what you're referring to: конфликт is transcribed without assimilation because the stress mark intervenes between the /n/ and /f/, whereas it should have assimilation just like комфорен. In the test case list, the second column indicates the desired result, not the third one. As far as I can tell, there is no need for a symbol which blocks assimilation within words.
 * 2. "I assume that all written voiced consonants are pronounced voiced before another voiced consonant across word boundaries" - this is correct. As for немај-каде, there is no suffix: this is a compound word composed of an imperative verb and an adverb. It is an isolated case, so the best option would be to respell it inside the template.
 * 3. Per 2), there is no need to compile a list of prepositions. талог од кафе (coffee dregs, lit. dregs from coffee) is also pronounced with a voiced [g]. There should also be devoicing of voiced consonants before a voiceless consonant across word boundaries (e.g. for the [g] in долг пат, "long route", which is already handled by final devoicing), but not the opposite process: длабок бунар (deep well) is pronounced with a final [k], not [g]. In this respect, assimilation across word boundaries differs from assimilation within words.
 * 4. Can we have a list of forbidden onsets instead? I have been manually adding the bare template to thousands of words, and if we now add a list of allowed onsets and omit something (which is probable since some onsets occur in only one word, e.g. /vmr/ in вмрежува), it might ruin some of the transcriptions that I have already validated individually. A list of forbidden onsets, in turn, would only concern the words whose transcriptions were problematic. Since I have already respelled those, the list would not interfere with the output - it would only eliminate the necessity of more manual respellings in future. For starters, we could add /bn/ and /mv/ to the forbidden onsets. For the latter, we can more generally forbid "nasal stop + non-sonorant". However, I have doubts about this too. If a forbidden cluster becomes allowed due to the borrowing of a new foreign word, we will have to remove it from the list of forbidden clusters, which will ruin the previously validated transcriptions of words containing it. Martin123xyz (talk) 06:23, 13 August 2021 (UTC)
 * For бара преку леб погача, can you make the rule which transcribes all monosyllabic words as unstressed only apply to clitics, so that леб can receive stress? The list of clitics should include all pronominal clitics (ме, ми, те, ти, го, му, ја, ѝ, нѐ, ни, ве, ви, Ве, Ви, ги, им, се, си), all monolexical prepositions (без, од, со, на, во, покрај, околу etc.; you can look for them here), and all monolexical conjunctions (и, да, а, ама, но...) I would also like to be able to write `без if for whatever reason, stress is required (possibly in some emphatic context), i.e. I would like for the stress-stripping to be overrideable. Martin123xyz (talk) 06:39, 13 August 2021 (UTC)
 * Thanks for your comments. The intention of the list of "allowed onsets" is not to list every possible onset but to list those that are most common. For example, шк is an allowed onset but only in a few words, e.g. школа and derivatives (same in Russian). For this reason we don't list шк among the allowed onsets, meaning that the syllable division by default is ш.к not .шк, which should work for most words. As for making changes in the allowed onsets, I have set up a system in the Russian pronunciation module to have a proposed new version of the module and automatically find all the words whose pronunciation would change as a result of that new version. We can set up the same system for Macedonian, and using this we can flush out the words that would end up with a wrong pronunciation and add the appropriate respelling before pushing the new version into production. The worst that can happen by doing this is we end up with unnecessary respelling, which I can remove by bot once we stabilize the set of allowed onsets.
 * As for the list of unstressed words, I can definitely add that. Is it really the case that three-syllable prepositions like околу are completely unstressed? That seems odd to me; in Russian, only one-syllable words and a very small set of two-syllable words can be unstressed. But whatever the case is, I can definitely implement it, along with allowing for explicit stress on these words. In Russian also, unstressed prepositions sometimes get stress, in which case the following word loses it, as in по́д гору "downhill"; for this reason the Russian module has a special symbol to indicate that a normally stressed word should not get any stress. Does Macedonian have something similar? Benwing2 (talk) 07:03, 13 August 2021 (UTC)
 * Detecting all words which would be affected by a change in the module and respelling them before sending the new version into production sounds like a good idea. In that case, you can make a list of allowed onsets. As for polysyllabic clitics, not only is околу usually unstressed, it can form a single stress unit with a following word, e.g. "околу тебе" (around you) is pronounced with a single stress on the /u/, because the whole sequence behaves as a single word to which the antepenultimate stress rule is applied. A sequence like "околу градот" (around the town) can have two stresses, one on околу and another on градот, but the variant with only one stress (on градот) sounds more natural to me. With two stresses, it sounds more emphatic, possibly in an explanatory or admonitive context. I think that it is normal for fast-spoken syllable-timed languages like Macedonian to have relatively less stress than stress-timed languages like English. In "затоа што не можам" (because I am unable to), I only hear a single stress on the negation "не". I don't know what the academic literature has to say about this, and the locally published reference works on Macedonian only deal with basic cases (the French Bon Usage is up to 1700 pages, ours is at less than 400), so I am relying on my intuition. Martin123xyz (talk) 07:12, 13 August 2021 (UTC)
 * For more cases like Russian под гору and Macedonian околу тебе, see the list of rules here. However, most of these restructurings of phonological boundaries are extremely stilted and pompous-sounding, so I advise against including them in the pronunciation module. Only the following are natural: а) and ј) from Акцентски целости; points 2, 3, and 4 of a) from Проклитики, everything from Енклитики, and point 5 of a) from Акцентски низи. It should also be noted that some of the rules listed on that page explain when stress is NOT transferred; since these are default situations, I have not mentioned them here, even though they are correct and natural, e.g. б) from Акцентски низи. Since most of these rules will only concern full sentences and not lemmas or non-lemma forms, they are not a priority for the time being. They will become relevant if at some point we start adding example sentences for Macedonian, drawn from corpora, and transcribe them with the module in their entirety, even though I see that this is not done for other languages. Martin123xyz (talk) 07:26, 13 August 2021 (UTC)
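
The clitic proposal discussed above (strip stress only from listed clitics, with a way to override it) could look roughly like the following Python sketch. The clitic set is a partial sample taken from the words named in this thread, the function name is hypothetical, and the use of a leading "`" as the override follows the "`без" suggestion rather than any confirmed module behaviour.

```python
# Partial, illustrative clitic list (pronominal clitics, prepositions,
# conjunctions); the accented forms and longer prepositions are left out.
CLITICS = {
    "ме", "ми", "те", "ти", "го", "му", "ја", "ни", "ве", "ви",
    "ги", "им", "се", "си",
    "без", "од", "со", "на", "во",
    "и", "да", "а", "ама", "но",
}

def stressable(words):
    """Return (word, gets_stress) pairs for the words of a phrase.

    A word is left unstressed only if it is a listed clitic; a leading "`"
    forces stress on a clitic and is removed from the output.
    """
    result = []
    for word in words:
        if word.startswith("`"):
            result.append((word[1:], True))
        else:
            result.append((word, word.lower() not in CLITICS))
    return result

print(stressable("од играчка плачка".split()))
```

Under this scheme a monosyllabic content word such as леб keeps its stress because it is not in the list.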

вмровец
For, I can't figure out what respelling to use to prevent the schwa followed by the /r/ from fusing into a syllabic /r/, which is needed only for words like. If I use a break marker (#), stress is assigned twice. I haven't added this as a testcase because the pronunciation cannot be directly deduced from the spelling, which is irregular because the word is a suffixed acronym. Martin123xyz (talk) 12:58, 20 August 2021 (UTC)

I have found a solution: в‘м#‘#ро`вец. Martin123xyz (talk) 06:46, 24 August 2021 (UTC)

Improvements
I have resolved several of the testcases myself. I have modified the nasal assimilation rules, such that they now apply across the stress mark too. Consequently, "Унгарија" and "конфликт" are properly rendered, without any detriment to "унгарски" and "комфорен", which were already transcribed correctly.
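
A minimal Python sketch of nasal assimilation that also applies across the stress mark is given below for illustration. It is not the module's Lua code; the function name, the consonant classes and the input strings are assumptions, and /m/ is deliberately left unchanged, as requested earlier on this page.

```python
import re

def assimilate_n(ipa: str) -> str:
    """Assimilate /n/ to the place of the next consonant, even across ˈ."""
    # The optional stress mark is captured and put back after the new nasal.
    ipa = re.sub(r"n(ˈ?)(?=[kɡgx])", r"ŋ\1", ipa)  # before velars
    ipa = re.sub(r"n(ˈ?)(?=[fv])", r"ɱ\1", ipa)    # before labiodentals
    ipa = re.sub(r"n(ˈ?)(?=[pb])", r"m\1", ipa)    # before bilabials
    return ipa

print(assimilate_n("kɔnˈflikt"))
print(assimilate_n("unˈgarija"))
```

Making the stress mark an optional part of the pattern is what lets the same rule cover both "унгарски" and "Унгарија".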

I have also modified the rule regarding the optional insertion of /j/ following /i/. Specifically, I have split it into four rules:

- A written /j/ between /i/ and /aɛɔu/ is optionally omitted.

- A non-written [j] between /i/ and /aɛɔu/ is optionally added.

- A written /j/ preceding a stressed syllable is not omitted.

- A non-written [j] preceding a stressed syllable is not added.

If you wish to modify the code to make it less explicit and more efficient, make sure that it still satisfies these rules. Martin123xyz (talk) 06:41, 24 August 2021 (UTC)
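
For anyone refactoring, the four rules above can be checked against a compact reference such as this Python sketch. It assumes that an optional glide is displayed as "(j)" and that the stress mark ˈ stands at the start of the stressed syllable; both are assumptions of the example, and the input strings are made up for illustration.

```python
import re

def optional_j(ipa: str) -> str:
    """Mark [j] between /i/ and /a ɛ ɔ u/ as optional, written or not.

    A stress mark between /i/ and the glide or vowel blocks the match, so
    nothing is added or omitted before a stressed syllable.
    """
    return re.sub(r"ij?(?=[aɛɔu])", "i(j)", ipa)

print(optional_j("ˈmarija"))
print(optional_j("ˈradiɔ"))
```

A single pattern covers the first two rules, and the last two follow from the fact that the pattern never spans a stress mark.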

I have also fixed "комбајн" and "мјаука" by adding /j/ to the rules regarding syllabic sonorants. The undesired result is that the module now also lets /j/ become syllabic. However, this is not a problem, since /j/ never appears, syllabically or otherwise, in the contexts which make the nasals and liquids syllabic. There are no words like Czech "jmout" or Hungarian "adj". Martin123xyz (talk) 06:51, 24 August 2021 (UTC)

I have fixed everything except unpredictable syllabic boundaries and lack of devoicing preceding a voiced consonant across a word boundary. Martin123xyz (talk) 08:00, 24 August 2021 (UTC)


 * I have also taken care of the rules regarding devoicing and lack thereof. I have decided that the module requires no further changes. Martin123xyz (talk) 08:26, 25 August 2021 (UTC)
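
The devoicing behaviour across word boundaries described earlier on this page can be summarised in a short Python sketch. It is an illustration only, not the code that was added to the module: the function name and the consonant tables are assumptions, and the input is a list of already transcribed words.

```python
# Voiced obstruent endings and their voiceless counterparts (affricates first,
# so that "d͡z" is matched before plain "z").
FINAL_DEVOICING = [
    ("d͡z", "t͡s"), ("d͡ʒ", "t͡ʃ"),
    ("b", "p"), ("d", "t"), ("ɡ", "k"), ("g", "k"), ("ɟ", "c"),
    ("v", "f"), ("z", "s"), ("ʒ", "ʃ"),
]
VOICELESS_INITIALS = set("ptkcfsʃx")

def devoice_word_finals(words):
    """Devoice a word-final obstruent before a pause or a voiceless consonant.

    Before a vowel or a voiced consonant the written voicing is kept, and a
    voiceless final consonant is never voiced.
    """
    out = []
    for i, word in enumerate(words):
        following = words[i + 1].lstrip("ˈ") if i + 1 < len(words) else ""
        if not following or following[0] in VOICELESS_INITIALS:
            for voiced, voiceless in FINAL_DEVOICING:
                if word.endswith(voiced):
                    word = word[: -len(voiced)] + voiceless
                    break
        out.append(word)
    return out

print(devoice_word_finals(["ˈtalɔg", "ɔd", "ˈkafɛ"]))
```

The asymmetry is visible in the table: there is a voiced-to-voiceless mapping but no reverse one, so длабок бунар keeps its final [k].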


 * I just noticed a new problem; see "-pɫ̩ɔ-" in . --Gorec (talk) 20:19, 8 October 2021 (UTC)
 * Thank you for pointing this out. The problem was caused by the placement of "`" after the consonant cluster. If one wishes to use the symbol "`" or "/" (the two are equivalent) to indicate which syllable is stressed, one should either place it after the stressed vowel, in which case the stress mark will appear before the consonant cluster or within the consonant cluster, according to the individual rules the module uses to determine the onset; or place it within the consonant cluster, i.e. between two consonants, in which case the stress mark will appear in exactly the same place as the "`". Thus, writing "воздухопло/вец" will place the stress before the [pɫ], and writing "воздухоп/ловец" will give [pˈɫ]. Of course, one can also use Cyrillic letters with an acute diacritic, which is how the module was initially intended to work. I added a rule transcribing "`" as a stress mark only so that I could add transcriptions faster, because copy-pasting accented Cyrillic letters slowed me down. I subsequently added "/" as an alternative because I don't have "`" on my Cyrillic keyboard. Thus, "`" and "/" are only auxiliary elements in respellings and have some shortcomings. Martin123xyz (talk) 09:47, 9 October 2021 (UTC)
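
The two placements of the auxiliary stress symbol described above can be told apart mechanically, as in this Python sketch. It is a hypothetical helper, not part of the module; it assumes a single marker per respelling and returns what kind of position the marker occupied.

```python
VOWELS = set("аеиоуАЕИОУ")

def parse_stress_marker(respelling: str):
    """Return (plain_word, index, kind) for a respelling with "`" or "/".

    kind is "vowel" when the marker follows the stressed vowel (index points
    at that vowel), "explicit" when the stress mark should be printed exactly
    before plain_word[index], and "default" when there is no marker.
    """
    text = respelling.replace("/", "`")  # the two symbols are equivalent
    pos = text.find("`")
    if pos == -1:
        return text, None, "default"
    plain = text[:pos] + text[pos + 1:]
    if pos > 0 and text[pos - 1] in VOWELS:
        return plain, pos - 1, "vowel"
    return plain, pos, "explicit"

print(parse_stress_marker("воздухопло/вец"))
print(parse_stress_marker("воздухоп/ловец"))
```

In the first case the onset rules still decide where the stress mark is printed; in the second it is fixed between "п" and "л".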