User:Ivan Štambuk/TODO

Words to inspect

 * - Paleo-Balkan substratum in Slavic, or borrowed from Romanian as it claims?
 * , - find exact Iranian source or approximate etymons
 * - Ottoman Turkish çanta: or German < Latin  < Ancient Greek  < Akkadian  < Sumerian  ?
 * < Turkic kūinig < Old Chinese < steppe word < Sumero-Akkadian source whence also Latin cuneus: for cuneiform???

Policies to discuss/make

 * Uncertainty about how to lemmatize Sanskrit verbs. The classical dhātu-based approach seems much better to me, now that I've found out that quite a few (important!) verbal roots exhibit several conjugation classes simultaneously. Also, Rigvedic forms commonly cited in etymological dictionaries sometimes never appear in Classical Sanskrit.
 * How to treat Vedic endings which became obsolete in post-Pāṇinian times? Either 1) create a separate set of "Vedic declension" templates, or 2) overload the existing ones with type=Vedic, or 3) overload on a per-case basis, covering only actually attested forms.
 * Illiterate Pinyin without tone marks appears to merit inclusion; how about the actually phonemic pitch accent marks in my mother tongue, which educated folks use in actual writing (indicating e.g. genitive singular with a circumflex or macron on the last vowel), and which are not written in the official orthography? WT:CFI boils "standard orthography" down to attestation evidence.
 * Lots of folks use the term "cognate to" when in fact both words under discussion were simply borrowed from a common source. A manual of style for ===Etymology=== would include:
 * The classical languages Sanskrit, Latin and Ancient Greek linking cognate etymons to one another, plus the Englisc/English cognate when present (this is English Wiktionary after all). Old Persian, Avestan & Anatolian, Grabar, OCS, OHG, OE, ON, OI etc. would constitute a freely open layer of mutual mention.
 * Suggest avoiding diachronic mismatch, i.e. mentioning a Latin or Hittite cognate in the entry for a modern Russian or Pashto word is a no-go.
 * Discuss controversial macrofamilies. Some potential cognates within the Nostratic framework look really cute (e.g. Latin - especially when the lack of PIE cognates is obvious). It is of utmost importance to emphasize that these are potential rather than actual cognates.
 * For big families (Slavic, Germanic, II) it is pointless to repeat a list of cognates in the same PIE subfamily and in other branches, and to discuss the more detailed etymology, in every individual language's ===Etymology=== section. Instead, these should preferably appear in Appendix:, as I've been doing with the Proto-Slavic stuff.
 * Provide well-formatted examples to serve as role models.
 * All extinct/ancient languages that were not described by native grammarians (i.e. all the ones besides la/sa/grc) require special care in lemmatization. If the evidence is scarce enough (i.e. no verb/noun is attested in all inflected forms), only attested forms should merit inclusion. For those more abundantly attested and almost completely predictable (like Avestan and OCS), lemmatizing into unattested base forms might be OK. General guidelines should be set somewhere, with per-language override policies.
 * Discuss with the Hebrew/Arabic/Aramaic folks how to mention Semitic cognates. Investigate the genetic-dialectological relationship between Maltese and MSA.