Wiktionary talk:About Proto-Sino-Tibetan

Proto-Sino-Tibetan phylogeny
For page creation:

Etymology
*


 * 1)

Descendants

 * Old Chinese
 * Middle Chinese
 * Modern Mandarin
 * Beijing
 * Jin
 * Wu
 * Shanghai
 * Xiang
 * Gan
 * Hakka
 * Cantonese
 * Guangzhou
 * Min
 * Amoy
 * Kamarupan
 * North Assam
 * Tani
 * Deng
 * Kuki-Chin
 * Peripheral Chin
 * Northern Chin
 * Southern Plains Chin
 * Central Chin
 * Maraic
 * "Old Kuki"
 * "Naga"
 * Northern Naga
 * Central Naga (Ao Group)
 * Angami-Pochuri Group
 * Zeme Group
 * Tangkhulic
 * Meithei
 * Mikir
 * Mru
 * Bodo-Garo = Barish
 * Chairel
 * Himalayish
 * Tibeto-Kanauri
 * Western Himalayish
 * Bodic
 * Tibetan
 * Lepcha
 * Tamangic
 * Dhimal
 * Newar
 * Mahakiranti
 * Kham-Magar-Chepang-Sunwar
 * Kiranti
 * Eastern Kiranti = Rai
 * Western Kiranti
 * Tangut-Qiang
 * Tangut
 * Qiangic
 * rGyalrongic
 * Jingpho-Nung-Asakian
 * Jingpho
 * Nungic
 * Asakian
 * Tujia
 * Lolo-Burmese-Naxi
 * Lolo-Burmese
 * Burmish
 * Loloish
 * Northern Loloish
 * Central Loloish
 * Southern Loloish
 * Naxi
 * Karenic
 * Bai

60.240.101.246 00:03, 23 December 2012 (UTC)

Wildcards
Per the Grease Pit, apparently some current PST entries are "lemmatized" in an ugly enough form to break templates/modules.

This looks more like lack of editor attention than a necessary problem. Considering the above example — Reconstruction:Proto-Sino-Tibetan/p(r)an/t ~ b(r)an/t — the function of all these extra symbols is not to be a part of the reconstruction per se, it's to highlight that it's not clear what the reconstruction should be. The intended set of reconstructions to consider seems to be {pan, pat, pran, prat, ban, bat, bran, brat}. This is of course a bit long to be a lemma by itself, but even with the conventions on show, it seems completely arbitrary why this should be abbreviated as "p(r)an/t ~ b(r)an/t" and not as "p/b(r)an/t", "p/b(r)an ~ p/b(r)at" or "p(r)an ~ p(r)at ~ b(r)an ~ b(r)at".
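To make the point about equivalent abbreviations concrete, here is a minimal sketch that expands such a lemma into the set of forms it stands for. It is not an existing template or module. The function name `expand` and the reading of the notation are assumptions: "~" separates whole variants, "(x)" marks an optional segment, and "/" alternates exactly one character on either side.

```python
import re
from itertools import product


def expand(lemma: str) -> list[str]:
    """Expand a shorthand lemma into the individual forms it abbreviates.

    Assumed (hypothetical) reading of the notation:
      "~"   separates whole variants
      "(x)" marks x as optional
      "a/b" alternates single characters adjacent to the slash
    """
    forms: list[str] = []
    for variant in lemma.split("~"):
        variant = variant.strip()
        # Each token is an optional group, a slash alternation, or one literal character.
        tokens = re.findall(r"\(([^()]*)\)|(.(?:/.)+)|(.)", variant)
        choices = []
        for optional, alternation, literal in tokens:
            if alternation:
                choices.append(alternation.split("/"))
            elif literal:
                choices.append([literal])
            else:
                # Optional group: either absent or present.
                choices.append(["", optional])
        for combo in product(*choices):
            form = "".join(combo)
            if form not in forms:  # keep first-seen order, drop duplicates
                forms.append(form)
    return forms


print(expand("p(r)an/t ~ b(r)an/t"))
```

Under these assumptions, "p(r)an/t ~ b(r)an/t", "p/b(r)an/t" and "p(r)an ~ p(r)at ~ b(r)an ~ b(r)at" all expand to the same eight forms, which is why the choice between them looks arbitrary.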

Also, instead of any of this playing around with slashes and tildes, I would suggest capital letters as a first recourse on this kind of disambiguation (e.g. "P(r)an"), followed by superscripts (e.g. "p(r)an¹").

At some point we should also be divvying up the given reflexes according to which reconstruction variant they point to (compare e.g. Proto-Uralic), but that's surely less pressing. --Tropylium (talk) 11:19, 8 May 2017 (UTC)


 * “Ugly”? You should provide this feedback to the Sino-Tibetan Etymological Dictionary and Thesaurus instead – the best Sino-Tibetan etymology resource there is at the moment. This “playing around with slashes and tildes” is not a “kind of disambiguation”; this phenomenon is called allofamy. Instead of the rebuke here, the templates/modules need to be fixed to take into account the fact that not every proto-language reconstruction works like European languages. Wyang (talk) 11:33, 8 May 2017 (UTC)
 * I know it's allofamy. Whether allofamy is a form of disambiguation is a more philosophical question, but what I question is to what extent it needs to be explicitly spelled out in lemmatization, and if so, by what means?
 * At minimum, if we do want to keep doing this, the conventions should be explicitly spelled out on this About page, not left for the reader to decrypt by themselves. --Tropylium (talk) 12:46, 8 May 2017 (UTC)

Concerns about STEDT
I am deeply concerned by the choice of STEDT as the default reconstruction, given the numerous methodological issues detailed by Laurent Sagart, Nathan Hill, and Hannes Fellner, especially regarding "allofams" and "word families", which prevent its users from recognizing wanderwords, analogy, contamination, and other forces of linguistic change, while allowing preconceived notions of which words belong to a "word family" to prevent the consideration of other possibilities. While there are some lexical items for which the signal is so great that such noise is irrelevant, the vast majority do not belong to that set.

While STEDT is claimed to be the best Sino-Tibetan etymological resource that currently exists, one must keep in mind that etymology is a science. Science runs on explicitness. The lack of a set of explicit sound changes from STEDT's proposed proto-forms to the reflexes in modern languages makes one wonder if such sound changes exist, and if not, STEDT can hardly be called a scientific resource, and so should not be considered an etymological resource, let alone the best one that exists. If that is the case, and I hope I have made the case, then presenting its proto-forms as Proto-Sino-Tibetan reconstructions is pseudoscientific. Given Wiktionary is supposed to be committed to presenting scientific data, the removal of STEDT's "Proto-Sino-Tibetan" should be done as soon as possible.

P.S. My arguments are largely taken from the following papers:
 * Fellner, H.A., & Hill, N.W. (2019). Word families, allofams, and the comparative method. Cahiers de Linguistique Asie Orientale, 48(2), 91–124. https://doi.org/10.1163/19606028-04802001
 * Fellner, H.A., & Hill, N.W. (2019). Word families, allofams, and the comparative method. Cahiers de Linguistique Asie Orientale, 48(2), 159–172. https://doi.org/10.1163/19606028-04802006
Vampyricon (talk) 15:47, 15 June 2023 (UTC)


 * I share your concerns here. I would also want to point out that what we have been calling Proto-Sino-Tibetan is actually what STEDT calls Proto-Tibeto-Burman, which is supposed to exclude Chinese. I believe removing STEDT reconstructions wholesale might be best to resolve all these issues. However, I do have one concern then; do we just not provide a reconstruction at the PST level? Would we ever provide a PTB level reconstruction? — justin(r)leung { (t...) 22:46, 26 June 2023 (UTC)
 * @Justinrleung Upon rereading my sources, I believe I have overstated my case. While STEDT does not provide good reconstructions, Hill and Fellner have praised it for the work put into collecting likely related words, and so these pages need not be removed entirely. (Though perhaps an "experimental" tag à la Dungan would be appropriate.)
 * But with that out of the way, given that there are no Proto-Sino-Tibetan reconstructions, as even STEDT doesn't supply one, the best case one could make for these pages is to move them wholesale to Reconstruction:Proto-Tibeto-Burman. However, given the dubiousness of the Extra-Sinitic grouping as a whole (which I take to be equal to Tibeto-Burman, though do correct my misunderstanding if any exists), due to the lack of demonstrated linguistic innovations, I would consider the various subgroup reconstructions to be the best we can do.
 * I think an appropriate model here would be the Afroasiatic family. The two main reconstructions disagree on almost every major point, so instead of providing these Proto-Afroasiatics, each branch has its reconstruction, e.g. Reconstruction:Proto-Semitic/ṯalāṯ-, in which their etymologies are pointed out: "Perhaps cognate with Egyptian ḫmtw, as are *ṯamāniy- and ḫmnw." For example, a hypothetical Proto-Sinitic *s.rum page would have an etymology section that states "Compare Written Tibetan གསུམ gsum, Tangut 𘕕 *sọ¹ or *so, Burmese သုံး /θóʊɴ/…" etc. (I also have issues with how all Sinitic languages are lumped under "Chinese", but I digress.)
 * I think removing STEDT reconstructions should be a high priority, but their wordlists can stay given a warning that they may not be true cognates. I have to emphasize that PST reconstruction is at a very early stage, so the best we can do for many lexical items is to point at possible cognates. To me, STEDT's confidence (and its supporters') is unwarranted. Vampyricon (talk) 03:32, 13 July 2023 (UTC)
 * I think it is useful to have at least some central way of relating words with the same STEDT reconstruction. I suggest we consider an external link to the STEDT from the etymology section?
 * Note that our appendix does include Chinese descendants, and WT:About Proto-Sino-Tibetan rather implies that we should attempt to take them into consideration. STEDT is our starting point, which is not to say that updating it might not be rather like going from Brugmannian Proto-Indo-European to including the 3-laryngeal theory. I do recall claims that non-Sinitic branches have lost the contrast between a certain pair of proto-vowels.
 * The terms "Sino-Tibetan" and "Tibeto-Burman" are ambiguous. The first can be taken to imply acceptance of a division into Sinitic and "Tibeto-Burman", and the latter to include Sinitic or to not include Sinitic. The expressions of different views on terminology can be quite vehement. --RichardW57m (talk) 13:12, 2 October 2023 (UTC)
 * I don't think you're in this thread but @Benwing2 mentioned that it seems non-Chinese branches have also preserved the vowel. Either way, I don't think we are in a position to propose a new reconstruction, since we are documentarians, not innovators. At least, I assume we have something similar to Wikipedia's original research prohibition. Vampyricon (talk) 02:43, 12 October 2023 (UTC)
 * @Justinrleung It also seems that the main force here arguing for the use of STEDT has become inactive, so perhaps they can be moved? I am not sure what the procedures here are. Vampyricon (talk) 15:10, 18 July 2023 (UTC)