Wiktionary talk:About Proto-Uralic

Transcription
A preliminary proposal. This closely follows what we have at Proto-Uralic language, though since that page has been heavily edited by me, it largely reflects my own preferences.


 * Consonants
 * Nasals *m *n *ń *ŋ.
 * Older UPA *η (eta) or a digraph *ng for the velar nasal should not be used.
 * Several Hungarian etymology sections have, bizarrely, substituted for this the IPA symbol for the retroflex nasal (*ɳ).
 * The "palatal nasal" *ŋ́, suggested by some researchers, is not phonemic and should not be marked either (it is proposed solely to explain certain issues of Finnic phonology). Some sources may confuse this with *ń.
 * The retroflex nasal *ṇ is obsolete and should be replaced with *n.
 * Stops/affricates *p *t *č *ć *k.
 * *ć is disputed and any reconstruction with this instead of *ś should be, at minimum, cited exactly.
 * Sibilants *s *š *ś.
 * Some old sources include a fourth, *š́, which is obsolete and has not been supported in literature in decades; it should be replaced by *ś.
 * Retroflexion for *č *š is not necessarily accepted and the notation *č̣ *ṣ̌ should not be used.
 * Semivowels *j *w.
 * Some sources use Americanist *y for the former, or older UPA *β for the latter.
 * Labiodental *v for the latter is obsolete.
 * Liquids *l *r.
 * The retroflex lateral *ḷ is obsolete and should be replaced with *l.
 * The palatal lateral *ĺ is disputed, has not been defended in decades, and probably should not be used. Depending on the case, etymologies using this may have to be rejected, or either the plain lateral or the palatal spirant could be substituted.
 * The spirants are a contentious area. I don't have an immediate preference.
 * The traditional UPA notation is *δ *δ́ *γ (using Greek delta and gamma).
 * IPA-fied *ð *ðʲ *ɣ is sometimes used but if we do not use IPA symbols for palatalization elsewhere, "*ðʲ" would stand out.
 * A closer-to-ASCII system is *d *ď *x.


 * Vowels
 * First syllable: *a *ä *e *ë *i *o *u *ü.
 * The 8-vowel system used in the most detailed reconstruction systems so far.
 * *a is sometimes also written *å, but this seems superfluous.
 * *ë is sometimes also written *ï, *e̮ or *i̮. The latter two are more UPA-compliant, the former two exist as single entities in Unicode which I see as a point in their favor.
 * Long vowels *ē *ī *ō *ū are obsolete for PU and should not be used.
 * Second syllable: *-a *-ä *-i.
 * This partly implies vowel harmony. I suggest a distinction between *-a and *-ä chiefly to be able to include roots traditionally reconstructed as *i-a (though these are traditionally also considered "Finno-Ugric" rather than "Uralic").
 * There is still no clear consensus on what "*-i" was.
 * The traditional notation is *-e.
 * A vowel harmonic transcription uses *-i (front) versus *-ï/-i̮/-ɨ (back).
 * The most recent paper on the topic argues for *-ə. I personally find this convincing, but it has yet to catch on.
 * In light of current literature using mostly just *-i, this seems like the best option to use on WT.

--Tropylium (talk) 19:50, 1 December 2013 (UTC)
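The consonant replacements listed in the proposal above can be sketched as a small normalizer. This is only an illustration, not an existing Wiktionary module: the function name `normalize_consonants` and the table `SIMPLE` are hypothetical, and the sketch covers only the cases where the proposal names an explicit replacement (retroflex dot removed, *š́ to *ś, eta to *ŋ, *y to *j, *β and *v to *w). The spirants, the disputed *ć and *ĺ, and the vowels are left untouched because they need case-by-case judgment.

```python
import unicodedata

DOT_BELOW = "\u0323"  # combining dot below: retroflex mark in *ṇ *ḷ *č̣ *ṣ̌
CARON = "\u030C"      # combining caron, as in š
ACUTE = "\u0301"      # combining acute, as in ś

# One-to-one symbol replacements named in the proposal (assumed mapping).
SIMPLE = {
    "\u03B7": "\u014B",  # older UPA eta -> ŋ
    "\u03B2": "w",       # older UPA beta -> w
    "y": "j",            # Americanist y -> j
    "v": "w",            # obsolete labiodental v -> w
}


def normalize_consonants(form: str) -> str:
    """Rewrite obsolete consonant notation into the proposed one."""
    # Decompose so that diacritics become separate combining characters.
    s = unicodedata.normalize("NFD", form)
    # Drop retroflex marking: ṇ -> n, ḷ -> l, č̣ -> č, ṣ̌ -> š.
    s = s.replace(DOT_BELOW, "")
    # Fourth sibilant š́ (s + caron + acute) -> ś (s + acute).
    s = s.replace("s" + CARON + ACUTE, "s" + ACUTE)
    # Single-symbol substitutions.
    s = "".join(SIMPLE.get(ch, ch) for ch in s)
    # Recompose so that the result uses precomposed letters where possible.
    return unicodedata.normalize("NFC", s)
```

For instance, `normalize_consonants("*ṇ")` gives `"*n"`. Because the dot below is removed everywhere, the sketch assumes that no symbol in the target notation uses it, which holds for the inventory proposed above.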


 * From what I've seen, I agree with your proposal (for the most part). Concerning the spirants, I believe that we should use *d and *ď. This is not only because there is still some disagreement among Uralic scholars on the actual phonetic character of these consonants, but also because Sammallahti uses this notation (I think). Similarly, there is still disagreement on the phonetic character of *x. Most scholars agree that it should be a voiced velar fricative (ɣ); however, some would still argue for a voiceless velar fricative. For neutrality, we should use *x, especially since PIE reconstructions on Wiktionary still use *h₁, *h₂, and *h₃, even if a majority of scholars agree on their phonetic character (*h, *χ, and *ɣʷ, respectively). We could even use *d₁, *d₂, and *x to go along with that...


 * In the case of "*-i", I think that we should use the harmonic transcription. One theory I've seen suggests that vowel harmony was only kept in certain dialects of Proto-Uralic (esp. more eastern dialects). In the more westerly dialects, it simply became a schwa. This would make sense, and it also explains why "*-i" diverges so frequently. The traditional use of "*-e" is somewhat misleading, so I might also argue for *-ə, as this emphasizes its harmonic implications, while still covering the possibility of it being a schwa. Plus, it simply looks neater than "ɜ", which is often used as well. Honestly, I think we need more input on this.


 * FWIW "*-ɜ" is not the same as *-e/*-i/*-ə. It is UPA transcription indicating a vowel of unknown quality. This problem frequently comes up if cognates from the key languages for the reconstruction of 2nd syllable vowels are not available. --Tropylium (talk) 14:23, 16 June 2014 (UTC)
 * For *ɤ or *ɯ, I believe that the best option is *ë. Again, there's still disagreement on whether it's a mid back unrounded vowel or a close back unrounded vowel, and this serves as a more neutral solution. Even though it leans to the former, this is what most Uralic scholars agree on.


 * --Lisztrachmaninovfan (talk) 20:56, 1 December 2013 (UTC)
 * I'm not sure about using ë in initial syllables. It makes the vowel system look kind of asymmetrical. How is this vowel reflected in the various languages? I presume it becomes õ in Southern Finnic, and e in Northern Finnic, but what about the other languages? 18:59, 26 February 2014 (UTC)
 * Proto-Uralic *ë has actually nothing to do with Proto-Finnic *ë. For those interested in the later development of PU *ë (which is in full detail as complicated as anything else involving Uralic vowel history), I've written a very brief recap of the research history on my blog. --Tropylium (talk) 14:23, 16 June 2014 (UTC)

Distribution
Given the traditionally assumed highly stratified family tree of Uralic, one problem that comes up is what we could and could not call a Proto-Uralic root.

Out of the traditional intermediate subgroups, at least Volgaic, Finno-Volgaic, and Finno-Samic (/"Finno-Lappic") can be considered obsolete by now. This leaves the "deep" groups: Finno-Permic, Finno-Ugric, and Ugric. (In theory, Ob-Ugric is also a question, but given Wiktionary's current rate of work on Mansi and Khanty, I don't think we'll need to worry about it any time soon.)

FP and FU, as reconstructed entities, are essentially identical to Proto-Uralic, and whether they ever existed as separate entities is disputed. I am not sure if we require separate appendices for these. At least in the case of words clearly traceable to Proto-Uralic, this would be completely useless ("PU *kala > PFU *kala > PFP *kala"). However, we already have some appendix pages such as Appendix:Proto-Finno-Ugric/ńeljä, complete with attached part-of-speech etc. categories. Should these be left in place and their model followed? Or should these be moved to Appendix:Proto-Uralic/ńeljä, etc? I lean towards the latter. Perhaps a category such as "Proto-Uralic words of Finno-Ugric distribution" could be established as a neutral point of view between "FU is not a clade" and "FU is a clade".

Ugric seems to be a moot point, as no generally accepted reconstruction exists, and practically no proposals either. However, Hungarian etymologies not infrequently refer to Proto-Ugric when no cognates elsewhere in Uralic are known. I am unsure what should be done with these. Leave reconstructions unmentioned and simply list the cognates? Source each reconstruction? (Category:Hungarian terms derived from Proto-Ugric could of course remain in place.)

--Tropylium (talk) 02:01, 12 July 2014 (UTC)
 * I think taking out some of the ambiguous intermediate steps is a good idea. We can label terms as dialectal as necessary. 02:04, 12 July 2014 (UTC)
 * After four months+ of no further objections, I think I shall go ahead and convert Proto-Finno-Ugric entries to regional Proto-Uralic ones in terms of formatting. --Tropylium (talk) 20:59, 24 October 2014 (UTC)
 * What would be done with etymologies? Essentially you're eliminating Proto-Finno-Ugric from Wiktionary as a distinct language. That's probably something that would need a bit of wider scrutiny, as not everyone might like the idea that they can no longer use their favourite language name, or the name used in a particular source. —CodeCat 21:06, 24 October 2014 (UTC)
 * What do we do with language name synonyms or partial synonyms currently? The issue is quite similar. --Tropylium (talk) 21:55, 24 October 2014 (UTC)
 * We choose one and use it everywhere and don't use the other name at all. We also have special etymology-only codes, though. —CodeCat 22:00, 24 October 2014 (UTC)

The non-low vowel
There was some discussion above about the vowel denoted as ë or ï, both in first and subsequent syllables. I'm still not quite sure which of the two symbols is preferable, but I'm leaning towards e/ë, at least for non-initial syllables. The reason is that this vowel is widely reflected as a non-high vowel in the descendants. I haven't studied all the Uralic languages in much detail, but in Finnic it's e except finally, in Samic there is evidence of it having been no higher than ɪ, and Hungarian also clearly reflects a mid vowel (ë in endings is reflected as o, it appears). For initial syllables it's more varied, with ë reflected as i in Hungarian but as a in Finnic.

Therefore, I think that e/ë is the better symbol to use, for these reasons:
 * Its phonetic range was clearly between mid and high, probably in free variation, so it's somewhat pointless to try to pin it down.
 * If it was indeed a schwa, these two letters are already much more commonly used to denote a schwa in many languages. It's a more generic letter.
 * As a symbol, it's neutral with respect to height, standing between Finnic's low vowel and Hungarian's high vowel.
I also think we should use the same symbol for both initial and subsequent syllables. This is based on Samic, where the two vowels develop identically. Not sure about the other languages though. —CodeCat 00:29, 14 January 2015 (UTC)
 * The idea that the unstressed non-open vowel was specifically */ï/ is not entirely generally accepted. It's based on two premises:
 * Proto-Uralic had full-blown vowel harmony.
 * The unstressed vowel system was maximally differentiated, and so contrasted open vs. close vowels.
 * These lead to reconstructing the unstressed vowel system as /a ä ï i/. If an unstressed /ï/ existed, one ought to assume also a stressed /ï/.
 * Still, there are at least two ways to go about doing this: either reanalyzing *ë as *ï, or reanalyzing *i-a as */ï-a/.
 * Moreover, premise #2 has been criticized as based on a misunderstanding of vowel system typology: the full phoneme paradigms of all languages include close vowels, but it is entirely possible for them to be absent from unstressed syllables. An unstressed [i] or [e] or [ə] or [ɤ] could be identified as an archiphoneme representing the neutralization of all non-open vowels.
 * Premise #1 has also been questioned, in which case something like /a i/ or /a e/ could be reconstructed. Graphically this would be the simplest option, so we'd have e.g. *pesa 'nest', *päla 'side', *pese 'to wash', *pëse 'mitten', *poske 'cheek'. But I guess adding in vowel harmony (and so *pesä, *pälä, *pësë, *poskë) would not be a major change, no.
 * Also, note on historical development: Samic reflects stressed *ë as *uo, i.e. the same as *a, and not the same as unstressed "*ë". You may be thinking instead of *e in words like ?*were > *vërë 'blood'. Hungarian, then, does not retain the original A/E contrast of unstressed vowels, so the value of its evidence here is unclear (moreover, modern stressed and unstressed ë/ö/o come from Old Hungarian i/ü/u, which were probably something like [ɪ]/[ʏ]/[ʊ]). It's been proposed that all unstressed vowels merged as *[ə] in the Ugric era.
 * Hungarian does have a mid-low contrast in endings and suffixes though, so if that's not inherited I wonder where it came from.
 * Concerning words with i-a, is the first syllable reflected as a in Finnish and uo in Samic as well? I'm asking because I wonder if the harmony of the second syllable entirely depends on the first. If i was retained as i in this case (and therefore was a true front vowel), then the a in the second syllable was independent and contrasted with ä. —CodeCat 01:21, 15 January 2015 (UTC)
 * What's the current consensus on the use of *-ə in PU? -- 23:25, 11 February 2019 (UTC)
 * No PU entries use it as far as I am aware, so it would be considered reduced from something else (but PU/UPA has symbols for unclear vowels as well). — surjection ⟨?⟩ 07:52, 12 February 2019 (UTC)
 * Yes, I'm aware that no entries currently use *-ə. Please see discussions above and . -- 08:37, 12 February 2019 (UTC)
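The step from the non-harmonic notation (*pesa, *päla, *pëse, *poske) to the harmonic one (*pesä, *pälä, *pësë, *poskë) discussed in this thread can be sketched as follows. This is a rough illustration under stated assumptions: the function name `harmonize` is hypothetical, the front/back classes follow the 8-vowel system from the Transcription section, and harmony is assumed to be fully determined by the first-syllable vowel. That last assumption is exactly what is questioned above, so roots traditionally reconstructed as *i-a would be wrongly fronted by this sketch.

```python
import unicodedata

# First-syllable vowel classes (assumed from the proposed 8-vowel system).
FRONT = {"\u00E4", "e", "i", "\u00FC"}  # ä e i ü
BACK = {"a", "\u00EB", "o", "u"}        # a ë o u
VOWELS = FRONT | BACK


def harmonize(root: str) -> str:
    """Make non-initial a/e agree in backness with the first vowel."""
    # Use precomposed letters so that ä and ë are single characters.
    s = unicodedata.normalize("NFC", root)
    first = None  # first-syllable vowel, once found
    out = []
    for ch in s:
        if first is None:
            if ch in VOWELS:
                first = ch
            out.append(ch)
        elif ch == "a" and first in FRONT:
            out.append("\u00E4")  # *pesa -> *pesä
        elif ch == "e" and first in BACK:
            out.append("\u00EB")  # *poske -> *poskë
        else:
            out.append(ch)
    return "".join(out)
```

With the examples from the thread, `harmonize("*pesa")` gives `"*pesä"` and `harmonize("*poske")` gives `"*poskë"`, while *pese and *kala are returned unchanged.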

d and ď
Are these symbols really that widely used for Uralic? All I've ever seen is δ and δ́. —CodeCat 20:18, 20 January 2015 (UTC)
 * Not especially widely I suppose, mostly in Finland in recent decades (or some variant such as *d *j, and reserving *y for the semivowel), but as User:Lisztrachmaninovfan notes above, this might be the most neutral notation that has a precedent in usage. I don't think anyone has been using the traditional notation in a while, either.
 * δ́ also has the usability problem of being difficult to visually distinguish from δ in several fonts, and I would not want to switch to a digraph like δ´ or a different palatalization notation like δʲ either. --Tropylium (talk) 22:17, 20 January 2015 (UTC)
 * Ok, but if we are using it for Uralic, we should be using it for all of its descendants too? —CodeCat 00:18, 21 January 2015 (UTC)
 * I'm not sure what you mean. The point is not using ‹d ď› to transcribe, it is that we don't know what the realization of *d and *ď was. Aside from dental spirant values, various researchers have proposed e.g. , , . --Tropylium (talk) 21:36, 30 January 2015 (UTC)
 * I was thinking specifically about Proto-Samic, where δ is used as well. Personally I don't like some of the special UPA symbols. They're confusing to the wider number of people who know IPA, and the use of Greek letters instead of similar-looking IPA letters is especially silly. So my personal preference is for ð. But like you said, it's not clear what the phonetic value is, so in that respect it makes less sense to choose an IPA symbol. However, choosing d is equally deceiving because it is also an IPA symbol. So I propose using đ. It's not an IPA symbol, so it's abstract enough to convey the intended meaning. —CodeCat 21:42, 30 January 2015 (UTC)
 * Moving from *δ to *ð for Proto-Samic should be mostly OK. If we're going to be entertaining entirely novel schemes, I've used in my own writing elsewhere the convention *δ = *d₁ and *δ́ = *d₂, following the notation for PIE laryngeals. --Tropylium (talk) 04:22, 31 January 2015 (UTC)

Dialect issues
We treat Mari and Komi as pluricentric languages, not as minifamilies of two languages, and so it is unclear to me what benefit there is to "distinguishing" identical forms in the two standards from each other. Strikes me as akin to doing something like "cognate to British English fish, American English fish, Australian English fish." In terms of encoding, at least for Mari this would be straightforward, since we have the top-level code chm in use.

About 90% of all sources I've seen do a good job at distinguishing Erzya and Moksha from each other, so that should not be a major problem.

Nenets is a relatively straightforward case too: most sources either list Tundra Nenets reflexes only, or make sure to mention if a form is Tundra or Forest Nenets. The only problem is that Wiktionary currently does not distinguish TN and FN in any way, as far as I can tell!

Khanty, Mansi and Selkup are messier problems, especially given that several dialects are extinct or moribund and unwritten. I guess we can for now stick mainly to data from literary Northern Mansi / Northern Khanty / Northern Selkup. --Tropylium (talk) 21:29, 30 January 2015 (UTC)
 * The important part is that they have separate codes and names, and therefore can have separate language sections on a page. We can't use to link to both of them at the same time, so it's necessary to link to each one in turn.
 * My note regarding Mordvin is about what to do when someone writes just "Mordvin" as the language. I don't know whether that's Erzya or Moksha or if there is no way to tell.
 * We could create a distinct code for Forest Nenets, if that's necessary? —CodeCat 21:35, 30 January 2015 (UTC)
 * Perhaps. Forest and Tundra Nenets are usually considered to be more distinct from each other than Forest vs. Tundra Enets. --Tropylium (talk) 21:43, 30 January 2015 (UTC)

Order of descendants
Since the great majority of speakers of Uralic languages (and, hence, of people reading these pages) are either Hungarians or Finns, I wonder if these languages should be moved up to more prominence.

In terms of similarity between the languages, the most logical order would be a more strongly east-to-west (or west-to-east) order with Hungarian relocated between Mansi and Permic, and Finnic relocated between Samic and Mordvinic. On the other hand, we could just start with:
 * Ugric
 * Hungarian
 * Mansi
 * Khanty
 * Samoyedic

which would leave Hungarian prominently at the top and Finnic prominently at the bottom. --Tropylium (talk) 01:25, 1 February 2015 (UTC)
 * I don't think it's a fair assumption that the people reading these pages will be looking for their own language. In fact, the chances are high that they will end up on the page from their own language, so they'll be wanting to see other languages than their own. —CodeCat 01:40, 1 February 2015 (UTC)