User:PhanAnh123/notes

Mon-Khmer vs. Austroasiatic
Since the concept of "Mon-Khmer" (Munda vs. the rest) is now almost universally rejected, native words from Austroasiatic languages ought to be edited accordingly once an actual reconstruction of Proto-Austroasiatic comes out, hopefully in the not-too-distant future. Entries for reconstructed forms created based on Shorto (2006) can then be redirected.

Sino-Vietnamese and Nôm readings
When it comes to Sino-Vietnamese morphemes, it seems that nowadays the Northern forms dominate to a great extent, with non-Northern forms such as, and  becoming much less used or disappearing completely. On the other hand, forms such as, , are still maintained, and many compounds involving them are no longer considered dialectal, existing alongside Northern , ,.

There is also the interesting triplet, ,. Not much can be said about the former two, as their distribution was already all over the place when the Latin alphabet started to become widely used during the colonial period, but it seems that thực was quite a bit more common in Northern texts. thực and thật are now the two main forms seen in compounds, while thiệt is still relevant in colloquial Southern speech.

Also peculiar is the case of and: while the adjectival hảo (< ) is the Northern form, the verbal hiếu (< ) is Southern.

As far as I know, there is no comprehensive study of the dialectal differences in Sino-Vietnamese morphemes, which is unfortunate. Considering how Sino-centric the study of Vietnamese was (and to some extent still is), I'm surprised no one has taken up the task of examining the body of Vietnamese Latin-script texts written in the late 19th to early 20th centuries, when dialectal features in written texts were even more pervasive than they are today.

This applies to texts not written in Latin script too, of course: the character in a text by a Southern writer would almost certainly not have been pronounced  as it is often transliterated these days, but. In the same way, although we can't be absolutely sure given his life and the fact that his mother was a Northerner, it's possible that the character 𠊛 in the first line of Truyện Kiều would have been read as, not , by the author himself.

Tones in reduplicative "suffixes"
In Vietnamese, there seems to be a noticeable tendency for "suffixes" in rime replacement reduplication patterns to bear tone C. Discounting, which is not subject to tonal assimilation, in Appendix:Vietnamese reduplication only three "suffixes" (-ăn in B tones, -ang in A tones, -âp in B/D tones [this one is a bit of a cheat since it's a checked rime]) do not bear tone C. The other five "suffixes" all bear tone C.


 * -oi reduplication (for which I can only think of four examples:, , , ) also bears tone C, so that's a potential sixth.
 * -ung seems to be a recently emerged pattern; it appears to be somewhat productive and also bears tone C, so that's a seventh.

Is there an explanation for this?

South Central Vietnamese
The dialects spoken in Đà Nẵng, Quảng Nam, and Quảng Ngãi are notable for their numerous vowel and rime shifts. From my very limited interaction with speakers of these dialects, I can say that the phonetic realizations of vowels in these dialects are extremely different from the rest of Vietnamese, with and  bouncing around everywhere, and with labial-velar allophones being weaker or absent. There have been some preliminary studies on the phonetics of these dialects available in English, as well as a long dissertation by Tooyama  in Japanese.

I am not sure whether these dialects should be classified as belonging to the Central or Southern dialect region, as they share many features with both the North Central and Southern dialects, but they sure blow my mind every time I interact with a speaker of one.

Thanh Hoá
Needless to say, there is no such thing as a single "Thanh Hoá dialect". In the northern half of the province, a variety of Northern dialects are spoken, while Central dialects are spoken in the south. Some of the features commonly associated with the Southern dialects seem to originate from this province, including the merger of the two C tones (the hỏi-ngã merger) and the transphonologization of the morpheme into the hỏi tone used in pronouns (e.g., , etc.). As with some other coastal Northern dialects, the merger has taken place in some coastal areas of northern Thanh Hoá (although with ongoing dialect leveling, such forms might disappear within the next 100 years), so that there are  for mainstream,  for mainstream ,  for mainstream.

Also needless to say, these dialects are horribly under-researched.

Vowels
In Northern and Southern Vietnamese, each of the three vowels has two reflexes:, ,  (note that, like Modern Vietnamese, Proto-Vietic only had a length contrast for  and , so the length of  was not phonemic but phonetic, and it was probably pronounced no differently from Modern Vietnamese ; Ferlus assumed that they were automatically phonetically short before ). If we go from Proto-Vietic, there is one way to predict whether these vowels would diphthongize:
 * They didn't diphthongize when word-final or when followed by  and .

But other than that, it is essentially random, and even the above rule has exceptions: has a diphthong, while  has a conservative monophthong, although both ended in  at the Proto-Vietic stage.

What we do know for sure is that the North Central dialects mostly escaped diphthongization and usually maintain the conservative monophthongs.

The Northern dialects can be defined as having undergone these innovations in vocalism:
 * The lowering of in closed syllables to either modern  (,, , ,  (interjection), etc.) or  (, , , etc.). Southern dialects usually escaped lowering to  (, ,  are some exceptions) but were either influenced by or adopted the lowering to  in many words, which either replaced the unlowered forms or created doublets; the North Central dialects escaped both. There are also cases where all dialects escaped lowering (, , etc.).
 * The diphthongization of to  in word-final position. North Central dialects seem largely unaffected by this shift; Southern dialects were affected to a large extent, but the monophthong remains in some items ( instead of,  instead of ).
 * The late shift of and , which has mostly been undone, at least in writing. Forms like  for current,  for current ,  for current ,  for , were widely attested in Northern texts. The Southern dialects were also affected, although less so; Central dialects were not widely written during the colonial period, but it can probably be assumed that they were unaffected.

Ngã-nặng "alternation"
It is fairly well known that some North Central dialects (not all, probably not even the majority) exhibit a merger of the tones ngã (C2) and nặng (B2) into one tone that is usually perceived as nặng. I wonder if this has anything to do with pairs like -, -. The question, then, is whether there were cases of borrowing from the dialects with this merger into the "mainstream" dialects without it.

Biggest mystery in Vietnamese linguistics
... Why is the Sino-Vietnamese reading of  and ? Note that other characters within the same phonetic series, such as (> ),  (> ),  (> ), have the expected readings.
 * Intense contamination by (presumably native) ?
 * Taboo avoidance? (the standard ad hoc explanation)
 * Simply "misreading"? (whatever that is)

Lenited/plain vs. aspirated pairs
The innovative aspirated forms are a marked feature of the North Central dialects, and some of them are also present in the Southern dialects. One curious thing about these is that they are almost all verbs or adjectives, with / being the only noun.

Note that not all of the aspirated forms occur in all North Central dialects; rather, this is a compilation from various sources. Some of these are orthographic inferences from forms given in IPA in Nguyễn Thị Thuỷ's (2022) paper on the Cao Lao Hạ lect; for example, "to scratch" was originally given as and rendered as  here.

Potentially also vs., though this is really uncertain.

Words for animals across registers
Pretty much every language with a written tradition has at least three speech registers: colloquial, formal, and literary (many languages have more, and unwritten languages can still have multiple formal registers). There seems to be some preference, depending on register, as to whether to include the "category word" in the names of certain animals.
 * Fish: for fish, the "category word" seems very dominant in the colloquial and formal registers, while in literature it's often omitted.


 * Snakes: interestingly, the colloquial and literary registers often omit the category word for disyllabic snake names. It is almost always present in formal language, no matter the length of the snake name. Words like, , etc. always have the category word across all registers.


 * Birds: certainly the most difficult one to pin down. I personally often omit, but many people don't. Some names are exclusively literary, for example, , while colloquially and formally it's just.

The origin of the Vietnamese "general" inanimate classifier
The extremely common classifier is a weird case. It's often considered a loan from the well-known Sinitic classifier, with which it shares a notable phonetic similarity, and for some time I also considered that to be the origin of the Vietnamese word.

However, this etymon is present not only in Vietnamese but also in other Vietic languages: Muong Bi, Cuối Chăm keː³, Mày kɛ⁴, Rục kɛ⁴ (Nguyễn Văn Lợi, 1993), all used as classifiers. The dominant and general trend in Vietic languages when it comes to vowel shifts is diphthongization, not monophthongization (there is monophthongization in the Southern dialects of Vietnamese, but it is obviously unrelated), so the written Vietnamese form and the Muong Bi form can be taken as innovative. The most likely original vowel was *eː, as preserved intact in Cuối Chăm (cf. Cuối Chăm keː¹ vs. written Viet. ). Now I think  is out of the picture because of the vowel mismatch. That doesn't mean I think the Chinese morpheme had no semantic influence on the Vietnamese word (on the contrary, I absolutely think it did), but what's the next most likely etymology? A native Vietnamese speaker would probably connect it with (> homophonous, and ), and I think that's not bad either: "female" is usually connected with "great, main" in Vietic, so I can see how it developed.

Anyway, I think there's a third option: the demonstrative "that" in Austroasiatic languages ( > , ke and the like). The Katuic cognate of Vietnamese "thorn" is  > , so the vowel correspondence is not a problem. If (big if) this is true, I don't think the item still had the meaning "that" at the Proto-Vietic stage; it might have been just some kind of "focus/topic" marker (the thing there, you there). As seen in Northern Middle Vietnamese and modern Muong Bi, apart from classifying inanimate things, it also marks/marked some animals, indicating that its function, at least at the Proto-Viet-Muong stage, used to be broader than it currently is in Vietnamese. This might not be the most compelling argument for the "that"/focus marker hypothesis, but I do think it points in that direction. Also, I am not sure whether the modern use of the Vietnamese word as a focus marker placed before another classifier is a trace of this possible old usage, although I lean towards it not being one.

Tuệ Tĩnh
Tuệ Tĩnh wrote some formulaic poems, mostly in Chinese but with the first line providing the Vietnamese translation of the medicinal ingredient:

The first line means "The is colloquially (i.e. in Vietnamese) called ".

In, the first line is:

The last character is 巴⿱例 ( + ), a compound phonogram (or phonogram with double phonetics, called ). It is a small puzzle why "sky" was written with as the phonetic in semantophonograms or as a phonetic in compound phonograms, instead of something more suitable like ; if Shimizu was correct, poems like this were altered later by scribes in order to avoid the  in the name of Lê Lợi (黎利). After all, Tuệ Tĩnh died quite a while before Lê Lợi's ascension to the throne.

His Nam dược quốc ngữ phú (南薬國語賦) is a work in Vietnamese that also lists the Vietnamese names of many medicinal ingredients along with their Chinese translations.

Northern features in the early Vietnamese texts
The early Vietnamese texts include Phật thuyết đại báo phụ mẫu ân trọng kinh (佛說大報父母恩重經), Cư trần lạc đạo phú (居塵樂道賦), Đắc thú lâm tuyền thành đạo ca (得趣林泉成道歌), Giáo tử phú (敎子賦), Thiền tông khoá hư ngữ lục (禅宗課虚語録), Nam dược quốc ngữ phú (南薬國語賦), and Quốc âm thi tập (國音詩集). These texts are all obviously written in Northern dialects, and some observations can be made:
 * The reciprocal marker is always the innovative, not the conservative (and modern North Central).
 * The demonstrative system is obviously of the Northern-Southern type, not the Central type with, , , etc.
 * Specifically, the chief distal demonstrative in all of these is (most often spelled with the phonogram ); not even  (the chief modern Southern distal demonstrative) shows up in any of these, let alone North Central.
 * Taking Phật thuyết as the earliest, diphthongization had already taken place in (spelled  < ). Later texts do show, but this can be attributed either to the use of the graph to write homophonous  or to a Central loan (yes, modern Northern-Southern  is a Central loan).
 * Similarly but more straightforwardly, "to enter" (modern Northern ) was spelled with or other characters with  as phonetic, clearly indicating diphthongization; Central-Southern  started showing up much later, spelled as  (SV: ).
 * Common Viet-Muong final liquids became  and were lost after mid and high front vowels due to a phonotactic constraint (phonemic  are absent from all Vietnamese dialects), while in a number of North Central dialects these became  and thus escaped the loss:  for  vs. some modern North Central dialects,  for  vs. modern North Central.
 * Building on the previous point, obviously became  after other vowels too, but was preserved after these vowels:  (phonetic ) for  vs. some modern North Central dialects, , etc.
 * was very suspiciously absent, although  also barely had any attestations, so it was probably just seen as uncourtly to use direct third-person singular pronouns.
 * More ad hoc, but and  were spelled with graphs like  and  respectively ( and  in Phật thuyết), showing cluster simplification, while the North Central dialects at the time almost certainly still had a cluster with  for these words.

Series of demonstrative and interrogative pronouns
"Place" series, attributive. At an earlier stage, these always had to modify a noun, hence, ,. is probably originally a member and variant of, cf. , .
"Place" series, nominal. Can be used on their own, hence, ,. disappeared from common use some time before the colonial period, but survived for a while as part of ; it is now fully obsolete.
"Manner" series. disappeared some time in the 20th century, while  continues in common use in all dialects. is probably also a member.
"Extent" series.
"Extent" series.
 * n-series: //,, , , ,
 * đ-series:, , , ,
 * r-series: /,, , ,
 * b-series:, ,
 * v-series (variant of the b-series?): ,

appears to be a stray. All of these series have at least one "proximal": /,, /, , whose nuclei all certainly go back to a high front vowel, with the Central  and  preserving the vowel as is; they also all have an A tone. Four series have an interrogative pronoun/question particle:, , , whose nuclei go back to a high or mid back vowel, with  preserving the vowel as is; they also all have an A tone. might in fact be part of the r-series: it was extensively spelled with, indicating earlier ; if so, it and might technically just be variants of each other.

Same chart, but with Nôm characters:

Nôm texts
Nôm texts don't contain only semantophonograms; they were usually a mix of phonograms and semantophonograms, and of course Sino-Vietnamese elements written with good ol' Chinese characters. Some pure semantograms also appeared occasionally. The choice of whether to use phonograms or semantophonograms was entirely up to the writer: which spellings they preferred, how much ambiguity they felt they could afford when using phonograms, and of course, conventions (yes, Nôm characters as a whole were unstandardized, but that doesn't mean there weren't conventions). This is the same sort of deal as the choice of whether to use phonograms or semantograms for Old Japanese writers: some logic, but a lot of whims. Later Nôm texts show a tendency to use more semantophonograms, although phonograms did not go away and many words were still spelled predominantly with phonograms, similarly to how modern Japanese mostly uses phonograms for personal names and place names only, with content words spelled mostly with semantograms and kana (which are descendants of phonograms).

Here are some lines that were spelled almost entirely with phonograms (and regular Chinese characters for Sinitic elements): (Nguyễn Trãi, QÂTT)

(semantic + phonetic ) is the only semantophonogram here, or maybe  as a whole was used as a phonogram, per Miyake (2003).

On the other hand, this line was spelled with only semantophonograms: (Nam quốc phương ngôn tục ngữ bị lục)

Most often, it's a mix of both: (Nguyễn Trãi, QÂTT)

Demonstratives, personal pronouns, question words, and final particles, as well as some common adjectives and verbs, were all chiefly spelled with phonograms; that is to say, the more common a word was, the more likely it was to be spelled with phonograms. For example, the particle  was spelled with  the vast majority of the time, the common classifier  with ,  with ,  with ,  with ,  with. Some common verbs were usually spelled with semantophonograms, however, like with  (this one shows the power of convention: it's unlikely that literati could misread  if it was spelled  all the time, but because 咹 became the conventional character, it was used).

Vịnh Hoa/Vân Yên tự phú
This work is attributed to (1254-1334). There seem to be two versions available on the Internet, this (1) and this (2). I've only compared the beginning of each, but there are very obvious differences between the two. Also, shouldn't 認𫀅 be read as instead of nhận xem?
 * The title is 詠雲煙寺賦 in (1) and 詠花煙寺賦 in (2)
 * tiên is spelled 仙 in (1) and 僊 in (2)
 * "hundred" is spelled in (1) and  (⿰林百) in (2)
 * the first "strange" is spelled in (1) and  in (2) (both have 𨔍 for the second "strange")

Initial *k- and *h- in Vietic vs. null initial
Only these few examples. Likely an innovation in Vietic, with no clear conditioning environment.

More complicated than *k-, which seems obviously innovative. In "to sniff" and "foul-smelling" at least, Proto-Vietic *h- could potentially be a reflex of pre-Proto-Vietic *sʔ-, while for "(that >) 3sg.", the various forms with h- might indicate a grammatical function.

There is also at least one case of post-Proto-Vietic innovation for *h-: Vietnamese, Cuối Chăm hɐːm² vs. Arem ʔæːm, Rục təŋʔaːm¹, Pacoh , etc.