Module talk:sla-noun/data

Accents
Can you make this module conform to the new accent symbols as well? At the moment, it's putting tildes on and  in some of the forms. The module should take into account the length of the neoacute syllable in AP b.

Also, at the moment the module requires the nominative singular form to be specified explicitly when an accent is provided. In most cases this can be generated automatically if the accent paradigm is known: in AP c the accent is always a double grave or circumflex on the first syllable of the stem; in AP b it is always a grave or tilde on the last syllable of the stem; and in AP a it is a grave on the stem if the stem has one syllable. Only when an AP a noun has a multisyllabic stem can the module not infer the accent position, and only then does the accent need to be passed as an argument to the template. It may even make sense to explicitly forbid supplying the accented form as an argument when the module can infer it from the accent paradigm and page name, to avoid conflicting or nonsensical forms. —Rua (mew) 10:48, 19 May 2019 (UTC)
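As a rough illustration of the inference rule above, here is a minimal Lua sketch. It is not the module's actual code: the function name `infer_nom_sg_accent`, the stem representation (a list of syllable tables with a `long` flag) and the mark names are hypothetical placeholders. It returns the index of the accented stem syllable and the mark to use, or nil where the position cannot be predicted.

```lua
-- Hypothetical sketch, not the module's real code.
-- `stem` is a list of syllables; each is a table whose `long` field is true
-- for historically long vowels and liquid diphthongs.
-- Marks are returned as names instead of combining characters.
local function infer_nom_sg_accent(ap, stem)
  local n = #stem
  if n == 0 then
    return nil -- nonsyllabic stem: nothing on the stem to accent
  end
  if ap == "a" then
    if n == 1 then
      return 1, "grave" -- monosyllabic AP a: acute, written with a grave
    end
    return nil -- multisyllabic AP a: unpredictable, must be supplied
  elseif ap == "b" then
    -- neoacute on the last stem syllable: tilde if long, grave if short
    return n, stem[n].long and "tilde" or "grave"
  elseif ap == "c" then
    -- circumflex on the first stem syllable: inverted breve if long,
    -- double grave if short
    return 1, stem[1].long and "inverted_breve" or "double_grave"
  end
  error("unknown accent paradigm: " .. tostring(ap))
end
```

For example, `infer_nom_sg_accent("b", {{long = false}, {long = true}})` returns `2, "tilde"`. Using mark names rather than combining characters sidesteps UTF-8 handling, which in Scribunto would normally go through `mw.ustring`.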
 * Will do. Can you clarify again what exactly needs changing besides what you mentioned above? Benwing2 (talk) 10:56, 19 May 2019 (UTC)
 * I'm not sure what you mean. I think this is all that needs changing? You can use the function I wrote in Module:sla-headword to check if the accents are correct. —Rua (mew) 11:01, 19 May 2019 (UTC)
 * Just checking what exactly you want me to do. I gather you want this:
 * Convert tilde to single grave when placing it on a short vowel (= e o ь ъ when not occurring in liquid diphthongs). (Should it also convert circumflex to double grave on short vowels?)
 * Infer the accent of the nominative singular if possible and not specified, according to your description above. I don't want to forbid supplying the nominative singular, e.g. for use on test pages where the page name doesn't correspond to the lemma, but it might make sense to throw an error if the wrong type of accent is supplied given the paradigm.
 * Benwing2 (talk) 15:48, 19 May 2019 (UTC)
 * Yes, that's pretty much it. To recap:
 * Tilde and grave are the long and short neoacute respectively, and occur in the stem-final accented forms of AP b.
 * Inverted breve and double grave are long and short circumflex respectively (at least according to some systems of nomenclature), and occur in the initial-accent forms of AP c.
 * In AP a, the stem must be long/acute if it has a single syllable. In multisyllabic AP a words, the accent is on one of the syllables, but this cannot be predicted. It is usually acute, but it could be on a non-initial short syllable (stem-internal Dybo's law) or even have a short or long neoacute on a non-final syllable of the stem (Dybo + Ivšić).
 * —Rua (mew) 16:43, 19 May 2019 (UTC)
 * Just to verify, inverted breve and double grave are always allophones of each other, with the inverted breve occurring only on long vowels and liquid diphthongs, and the double grave occurring only on short monophthongs? This is not what is currently implemented: the inverted breve can occur on short monophthongs in some circumstances, while the double grave can never occur on long vowels and liquid diphthongs and is always converted to an inverted breve in those circumstances. Note also that there's only one reference to inverted breve in Module:sla-noun/data, which is here:
 * This is specifically intended for nouns like, so they get accusative singular *brъ̑vь where the accent has to go onto the stem suffix because the base stem is nonsyllabic; otherwise you would get a double grave on the first syllable of the stem (but there don't appear to be any syllabic accent-c consonant-stem nouns in our current corpus). If you think this is wrong and should be brъ̏vь because short vowels can never bear the inverted breve, then we can simplify a lot of code. Benwing2 (talk) 19:58, 19 May 2019 (UTC)
 * Yes, they are in complementary distribution. The long circumflex or long neoacute on original short vowels is another Leiden thing, which we also got rid of. So that form should be . —Rua (mew) 20:00, 19 May 2019 (UTC)
 * The odder thing about is that it has a long-short alternation, and therefore also an acute-short alternation. That's not unique to this noun, though, it occurs with many verbs ending in a diphthong (or sometimes a long vowel) too, e.g. *pluti ~ *plovǫ "swim", *tęti ~ *tьnǫ "cut", *derti ~ *dьrǫ "tear", *biti ~ *bьjǫ "beat". I wonder what accent paradigms such verbs appear in, given that half the paradigm might have an acute and the other half not (thus being half in AP a and half in AP b?). —Rua (mew) 20:05, 19 May 2019 (UTC)
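The normalization agreed on in this thread (tilde and inverted breve only on long nuclei, grave and double grave only on short ones, with e o ь ъ counting as short outside liquid diphthongs) might be sketched in Lua as follows. The names `normalize_mark` and `is_short` and the mark names are hypothetical. A plain grave on a long nucleus is left alone because it also writes the acute.

```lua
-- Hypothetical sketch. Original short vowels, which count as short
-- only when they are not part of a liquid diphthong.
local SHORT_VOWELS = { e = true, o = true, ["ь"] = true, ["ъ"] = true }

local function is_short(vowel, in_liquid_diphthong)
  return SHORT_VOWELS[vowel] == true and not in_liquid_diphthong
end

-- Make the accent mark agree with the length of its nucleus:
-- tilde/grave = long/short neoacute,
-- inverted breve/double grave = long/short circumflex.
local function normalize_mark(mark, vowel, in_liquid_diphthong)
  local short = is_short(vowel, in_liquid_diphthong)
  if short and mark == "tilde" then
    return "grave"
  elseif short and mark == "inverted_breve" then
    return "double_grave"
  elseif not short and mark == "double_grave" then
    return "inverted_breve"
  end
  -- a grave on a long nucleus is left alone: it also writes the acute
  return mark
end
```

Under this sketch, the inverted breve on the short yer of *brъ̑vь would be normalized to a double grave, matching the conclusion above.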

Chuck, forgive the errors; we are working through them. Rua, can you look at the errors? The accents on *voľa, *koža and *žęďa are probably correct (the special *voľa type of accentuation) and I need to handle them, but the others may be errors. Benwing2 (talk) 23:37, 19 May 2019 (UTC)
 * The stem neoacute on *voľa, *koža and *žęďa should now be allowable. Benwing2 (talk) 00:28, 20 May 2019 (UTC)
 * I've fixed (had the wrong AP),  (given with a grave by Derksen, but probably in error),  (was written with an incorrect letter). The accent at  is probably correct and it's the module that's wrong; according to  all stem-accented AP c forms must be circumflex or short, and can't be acute. The accent at  may actually be incorrect for this reason, but Derksen gives no accent or AP for this word.  looks like an error in the accent or AP, but it's missing from Derksen so I have nothing to check it with. —Rua (mew) 07:52, 20 May 2019 (UTC)
 * Olander gives AP c for . —Rua (mew) 08:51, 20 May 2019 (UTC)
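A check that throws an error when the wrong type of accent is supplied for a paradigm, as suggested earlier in the thread, could look roughly like this Lua sketch. The table encodes the rules stated here (AP b: short or long neoacute; AP c: short or long circumflex on the stem, never acute; AP a: acute or, in Dybo/Ivšić cases, neoacute). The names are hypothetical, and the sketch does not model the special *voľa-type accentuation.

```lua
-- Hypothetical sketch: marks permitted on an accented stem syllable per AP.
-- "grave" covers both the acute (long) and the short neoacute.
local ALLOWED_STEM_MARKS = {
  a = { grave = true, tilde = true },                  -- acute, or neoacute via Dybo/Ivšić
  b = { grave = true, tilde = true },                  -- short / long neoacute
  c = { double_grave = true, inverted_breve = true },  -- short / long circumflex, never acute
}

local function check_stem_mark(ap, mark, form)
  local allowed = ALLOWED_STEM_MARKS[ap]
  if not allowed then
    error("unknown accent paradigm: " .. tostring(ap))
  end
  if not allowed[mark] then
    error(string.format("accent '%s' not allowed on the stem in AP %s (form %s)",
      tostring(mark), tostring(ap), tostring(form)))
  end
  return true
end
```

Calling it through `pcall` lets a test page report the offending form instead of aborting.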

More on accents
While doing some editing on the module, I noticed that in some of the forms, AP a is marked as having an unknown accent. But AP a has the same accent in all forms, so once it's known for one form it's known for all of them.

I also noticed that the instrumental plural ending is given with a long ī, and with an accent this becomes a long í and ý. We no longer include the ´ symbol, so this would have to become ĩ and ỹ, but that's the symbol for neoacute which cannot occur on a final syllable. Why is this ending being marked as long in the first place? Jasanoff gives them with acutes, so they should be ì and ỳ. —Rua (mew) 18:36, 19 May 2019 (UTC)
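Since an AP a noun has the same stem accent in every form, an accent known for one form can be copied to the others. A minimal Lua sketch, assuming a hypothetical `forms` table that maps slot names to records with an optional `accent = { index = ..., mark = ... }` field; it raises an error if two forms disagree.

```lua
-- Hypothetical sketch: in AP a the stem accent is the same in every form,
-- so an accent known for one slot can be copied to all the others.
local function propagate_ap_a_accent(forms)
  local known
  for slot, form in pairs(forms) do
    local acc = form.accent
    if acc then
      if known and (acc.index ~= known.index or acc.mark ~= known.mark) then
        error("AP a forms disagree on the stem accent (slot " .. tostring(slot) .. ")")
      end
      known = acc
    end
  end
  if known then
    for _, form in pairs(forms) do
      if not form.accent then
        -- copy rather than share, so later edits to one slot don't leak
        form.accent = { index = known.index, mark = known.mark }
      end
    end
  end
  return forms
end
```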
 * The long instrumental plural ending comes from the Leiden School. I don't remember the details any more but it might have something to do with the outcomes in Slovenian, which is supposed to preserve reflexes of vowel length in final inflectional syllables. I think I took the accents of many of these paradigms from Verweij 1994 "Quantity Patterns of Substantives in Czech and Slovak" because that was the only reference I could find at the time that listed all the paradigms in detail including the accents. Verweij is definitely of the Leiden School. Unfortunately, I can't get a complete copy of this paper; Google only has every other page, and I don't have access to JSTOR or a copy saved on my local computer. If Jasanoff differs, I would follow him. Also feel free to fix up the AP a forms. Note that I already implemented the change from tilde -> grave on short accents; I'll do the remaining changes shortly. Benwing2 (talk) 19:20, 19 May 2019 (UTC)
 * Jasanoff, by virtue of being in the traditional camp, doesn't reconstruct distinctive length at all, so the macron is purely a Leiden thing, as User:Guldrelokk noted at Wiktionary talk:About Proto-Slavic. Kapović explicitly states that all final syllables shorten throughout Slavic. This must be after the application of Ivšić's law, which retracts from older final circumflex vowels. The single instrumental plural form that I can find in Kapović's paper,, has no length indicated on the i. —Rua (mew) 19:30, 19 May 2019 (UTC)
 * I have implemented all the things you asked of me. You might get errors from the accent-checking behavior that's now in place. If so, please let me know if there is a bug in the accent-checking algorithm. Benwing2 (talk) 22:02, 19 May 2019 (UTC)
 * I've noticed more problems., ,  appear with macrons on the first syllable, but that's invalid. The macrons in AP b should always appear on the last syllable of the stem, just like the tildes, and of course only when that syllable was historically long. In this case, the syllable was short, so there should be no macron. One two-syllable AP b stem with a long first syllable  seems to be fine, for unknown reasons. —Rua (mew) 08:11, 20 May 2019 (UTC)
 * Looking more at, it appears that the form with macron is not generated by the template but rather given directly as an argument. Derksen gives the form with the macron on the first syllable, but how that's supposed to work I don't know. The macron is supposed to reflect the position of the accent prior to Dybo's law, but in a case like this, if the accent was indeed on the first syllable originally, the accent would not have been moved onto the ending but rather onto the last syllable of the stem. That would result in an unusual pattern with accent remaining on the strong yer in the nominative singular, but retracted from that yer in its weak forms in the other cases due to Ivšić's law. I suppose this could be considered AP b, but there'd be no actual ending-accented forms in this case. On the other hand, Russian does have the expected end accent in the genitive, which can't be reconciled with the macron on the first syllable that Derksen gives. It's a bit of a mystery. —Rua (mew) 08:19, 20 May 2019 (UTC)
 * I've stumbled upon Olander's Common Slavic accentological word list (which now has a template, ), which gives accent paradigms but not actual accent placements in the lemma. He gives AP b for, like Derksen. However, given that the traditional school to which Olander belongs does not consider unaccented length a thing, there wouldn't be macrons in his reconstructions anyway. —Rua (mew) 08:25, 20 May 2019 (UTC)
 * I wouldn't be so sure the macrons are wrong; I think in the Leiden school, accentual movements skip over weak yers. Benwing2 (talk) 12:35, 20 May 2019 (UTC)
 * I'm not sure if that's true of Dybo's law, though. After all, it does advance the accent onto a final weak yer, only for it to be retracted again by Ivšić's law. I'd like to know more about what the Leiden linguists have to say about this. In any case, the macrons in the neuter o-stems definitely are wrong based on Derksen, especially which doesn't even have a yer. —Rua (mew) 12:40, 20 May 2019 (UTC)
 * I fixed kry, but now bry gives an error. BTW I found the following comment that I left concerning macrons on liquid diphthongs:

-- (1) Original short vowels e o ь ъ can't get a macron. Per Derksen 2008,
--     this also includes liquid diphthongs, which normally behave like
--     long vowels; cf. 'borzdà' "furrow" in class b, where you expect the
--     preceding vowel to be long if possible. However, we go against
--     Derksen in this respect when the first vowel is e or o because Czech,
--     Slovak and Polish show clear length distinctions (or reflections thereof)
--     in original pre-tonic syllables in class b vs. c. (Serbo-Croat reflects
--     length in both classes but this can be a later development due to
--     analogy.) Per Kortlandt, the metathesis of liquid diphthongs preceded
--     Dybo's law and (probably) the shortening of pre-tonic vowels.
 * Benwing2 (talk) 12:49, 20 May 2019 (UTC)
 * The metathesis must have preceded the split of short and long vowels by quality. Given that oR and eR metathesize to Ra and Rě, these must have had the same quality at the time of the change. This is probably one of the earliest dialectal changes that can be recognised for Slavic. The common use of liquid diphthongs for Proto-Slavic in this case is a bit like a placeholder for the various outcomes in the dialects. Whether such sequences could have length distinctions is probably equally a dialectal question due to the differences in syllable structure. It would be expected that the dialects with metathesis + early lengthening (i.e. south and southern west Slavic) allow length distinctions, since Ra and Rě are just like any other historically long vowel at that point. —Rua (mew) 13:34, 20 May 2019 (UTC)
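The macron constraints discussed in this section can be summarized in a small Lua predicate. It combines Rua's rule that AP b macrons go only on the last stem syllable (which Benwing2 questions for stems with weak yers) with the module comment quoted above: no macron on original short e o ь ъ, and liquid diphthongs take one only when their vowel is e or o. The function name and arguments are hypothetical, and any vowel outside the short set is assumed to be historically long.

```lua
-- Hypothetical sketch: may stem syllable `index` (of `n`) in an AP b noun
-- carry a macron? `vowel` is the syllable's vowel; `in_liquid_diphthong`
-- is true for eR/oR/ьR/ъR sequences.
local SHORT_VOWELS = { e = true, o = true, ["ь"] = true, ["ъ"] = true }

local function ap_b_macron_allowed(index, n, vowel, in_liquid_diphthong)
  if index ~= n then
    return false -- AP b macrons only on the last stem syllable
  end
  if in_liquid_diphthong then
    -- the module (against Derksen) allows eR/oR, but not ьR/ъR
    return vowel == "e" or vowel == "o"
  end
  -- assumption: anything not in the short set was historically long
  return not SHORT_VOWELS[vowel]
end
```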