User:RichardW57/WIP

=Work in Progress=

Intobesa Repair
This job fixes generally poor formatting (the documentation on Wiktionary is hard enough for a native English speaker) and sometimes outrageous modifications. It was requested at WT:RFC, in the October 2022 section. His edits are recorded at Special:Contributions/咽頭べさ. The list is pruned to omit discussion and earlier edits of the same page.

A blank cell for Status/TODO should be interpreted as 'Review'. Ideally, when the job is done, the Status will be 'OK'.

Mon Encoding
Mon is bedevilled by an inconsistency in encoding. Renderers vary in what they will accept. There are three different specifications as to what sequences they will accept, as shown in the table below. The three specifications are:
 * 1) Table 16.4 of the Unicode Specification (denoted USMB in the table below), which strictly speaking only applies to modern Burmese.
 * 2) Unicode Technical Note 11, which is not formally endorsed by the Unicode Consortium.
 * 3) the Microsoft Typography guide for font creators, designated MS in the table below. This is somewhat corrupt. With seven asterisks added to make the structure plausible, it gives the structure of the complex syllable as
 * [K] [VS]
 * (H  [VS]) (As)
 * [MY [As]]
 * [MR]
 * []
 * (VPre) (VAbv)* (VBlw) (A) [DB [As]]
 * (VPst [MH] (As)* (VAbv)* (A)* [DB [As]])
 * (PT < [A] [DB] [As] | [As] [A] > )
 * (V)* [J]

Alternative specifications and rendering results are given below:

Many of the differences arise from the treatment of ဲ either as a final consonant, like anusvara, as in the MS and UTN-11 rule sets, or as a vowel above, like ိ, as in the USMB rules. The former treatment is preferred on Wiktionary. As one mode of using Wiktionary is to cut and paste words from text to Wiktionary, the encodings required by USMB have stub senses (see WT:Votes/pl-2022-07/Stubifying alternative forms) defined using.
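The practical consequence is that two encodings of what looks like the same word are different strings. The following is a minimal Python sketch, not taken from any of the three specifications: it assumes, purely for illustration, that one rule set puts ဲ before a below-vowel such as ု and the other puts it after. Both marks have canonical combining class 0, so Unicode normalisation does not reorder them and the two spellings stay distinct.

```python
import unicodedata

AI = "\u1032"  # ဲ MYANMAR VOWEL SIGN AI
U = "\u102f"   # ု MYANMAR VOWEL SIGN U
KA = "\u1000"  # က MYANMAR LETTER KA, an arbitrary base consonant

# Hypothetical pair of encodings that differ only in the order of the marks.
ai_first = KA + AI + U
u_first = KA + U + AI


def same_after_nfc(a: str, b: str) -> bool:
    """Return True if NFC normalisation makes the two strings identical."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Show the character names and canonical combining classes of the two marks.
for ch in (AI, U):
    print(unicodedata.name(ch), unicodedata.combining(ch))

# Report whether normalisation alone would merge the two spellings.
print(same_after_nfc(ai_first, u_first))
```

Because normalisation cannot merge such pairs, each encoding needs its own entry, which is where the stub senses come in.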

Chakma inflections tests
Chakma script for Pali:


Mon WS Inflection
Manual claimed Mon script declension with no transliteration fixes:

Manual claimed Mon declension with no transliteration fixes, with a transliteration override and a demonstration of a single form being targeted for manually specified transliteration:

Example of subst being abused to show Roman equivalents instead of transliteration, and to make the equivalents into links:

Manual claimed Mon aorist conjugation with transliteration fixes:

Writing Systems
In principle, homorganic nasals might be written using niggahita instead of the explicit nasal consonant, but I have not yet encountered systematic use. This would affect verbal inflection. Normative spelling generally rejects it; it does occasionally turn up as the main spelling, but then it affects stems rather than affixes.

Latn
The dominant writing system for the Roman script is IAST and its variants. The variations affecting the inflections are:


 * 1) ṃ v. ṁ v. ŋ v. n
 * 2) Mark for vowel length - macron, acute, circumflex or grave.

At present, so far as I am aware, we only use ṃ and macron. I had thought that there was a ukase to that effect, but I can't find it.
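A minimal Python sketch of folding the variants listed above into the ṃ-and-macron style. The function name `to_house_style` is hypothetical, and the mapping is an assumption for illustration, not an existing Wiktionary module. Plain n is left alone, because it cannot be told apart from the ordinary dental nasal without knowing the word.

```python
import unicodedata

# Combining marks sometimes used for vowel length instead of the macron.
LENGTH_MARKS = {"\u0301", "\u0302", "\u0300"}  # acute, circumflex, grave
MACRON = "\u0304"
VOWELS = "aiuAIU"


def to_house_style(word: str) -> str:
    """Fold niggahita and length-mark variants to ṃ and macron (sketch)."""
    text = unicodedata.normalize("NFC", word)
    # ṁ (U+1E41) and ŋ (U+014B) are treated as variants of ṃ (U+1E43).
    text = text.replace("\u1e41", "\u1e43").replace("\u014b", "\u1e43")
    out = []
    for ch in unicodedata.normalize("NFD", text):
        # Only rewrite a length mark that directly follows a vowel letter,
        # so that letters such as ṅ or ñ are left untouched.
        if ch in LENGTH_MARKS and out and out[-1] in VOWELS:
            out.append(MACRON)
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(to_house_style("buddha\u1e41"))
print(to_house_style("r\u00e2j\u00e2"))
```

Working on the decomposed form keeps the rule for length marks independent of which precomposed letters happen to exist.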

I would want to mix forms in a single inflection table, though listing them as a footnote would also be appropriate.

Thai
There are two major writing systems for the Thai script - with and without implicit vowels. For the former, there are older variations involving the use of yamakkan and wanchakan, but these do not need to be addressed for now.

Although some inflection tables mix the two systems for feminine forms, I now think that it is better to give separate tables for nominals. (There are currently technical problems with transliteration in mixing them for masculine and neuter forms.)

Deva
There is no indication of any variation that affects the inflection.

Beng
The only likely variation lies in the representation of  in inflection. We may have a choice between Bengali BA, RA WITH MIDDLE DIAGONAL and RA WITH LOWER DIAGONAL. If this happens, I would favour putting them in the same table for verbs. On balance, I also favour mixing them for nominals.

Sinh
We currently have decent attestation only for touching letters (with some conjunct exceptions). We might need at some point to support visible AL-LAKUNA.

I would favour mixing them in the same inflection table.

Brah
As we do not classify Ashokan Prakrit as Pali, there do not seem to be any writing system variations to be concerned with.

Khmr
There is as yet no indication of any variation that affects the inflection. However, in the long term Khom texts should be investigated, as the script as a whole distinguishes an unencoded vowel like โ from the dependent vowel OO of the Khmer script. If there were a distinction, I would favour mixing them.

Lana
One distinction affects inflection: that between round AA and long AA, which is currently handled by the parameter aa to the inflection templates. The occurrence of  is so far rare enough that it is included only where attested. So far, it does not seem worth splitting inflection tables by writing system. There is potentially an issue with the aorist 3rd plural in -iṃsu, but for now more examples are needed.

Mymr
There are at least six writing systems. There is a possibility of conflict between round AA and long AA, but the mechanism for the Lana script will also handle that. The writing systems are Burmese, Mon, 'old' Shan, 'new' Shan, Khamti Shan (Unicode proposal L2/08-276) and Tai Laing (Unicode proposal L2/12-012). Thai Mon should be reviewed for differences; off the top of my head, there may be an issue with -ss-.

A preliminary character chart is here. While combining some of Shan, Khamti Shan and Tai Laing will sometimes be possible, it looks too infrequent to justify significant effort in enabling it. They will rarely mix with other variants. The chart needs to be reviewed, and it looks as though there may be other consonant correspondences.

Two vowels in inflections show variation: -ā and -ī. For the former we can simply extend the use of aa, with the values 'round', 'tall', 'both' and 'shan'. For the latter, we can do something similar with ii. Thinking ahead, we can footnote the alternatives as Burmese and as Mon when we select both.
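A minimal Python sketch of how the four values of aa might select the vowel sign. The value names come from the paragraph above; the function `aa_variants`, the table `AA_FORMS` and the order of the two forms under 'both' are assumptions for illustration, not the behaviour of the existing templates.

```python
ROUND_AA = "\u102c"  # ာ MYANMAR VOWEL SIGN AA
TALL_AA = "\u102b"   # ါ MYANMAR VOWEL SIGN TALL AA
SHAN_AA = "\u1083"   # ႃ MYANMAR VOWEL SIGN SHAN AA

# Hypothetical mapping from the value of aa to the signs to generate.
AA_FORMS = {
    "round": [ROUND_AA],
    "tall": [TALL_AA],
    "both": [ROUND_AA, TALL_AA],
    "shan": [SHAN_AA],
}


def aa_variants(stem: str, aa: str = "round") -> list:
    """Return the spellings of stem + -ā selected by the aa parameter."""
    if aa not in AA_FORMS:
        raise ValueError(f"unknown value for aa: {aa!r}")
    return [stem + sign for sign in AA_FORMS[aa]]


# With 'both', one table cell would hold two alternative spellings.
print(aa_variants("\u1000", "both"))
```

A parameter ii could follow the same pattern with its own table of signs.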

However, the consonant differences are different enough to overwhelm this system. The letters r, s, n and h vary between the writing systems, and I propose that we use a writing system parameter (possibly ws, possibly even sc) to distinguish between the Shan sensu lato writing systems. I'm inclined to identify them by the codes for the vernaculars: shn, kht and tjl. Shan has exhibited two writing systems for Pali, one stacked and another non-stacked. The wording of descriptions gives the impression that Khamti and Tai Laing Pali do not stack consonants, but I have seen the misspelt ꩬံပုက္တႃꩬး (for saṃbuddhassa - L2/20-162), so there may be similar variations in the others. I therefore propose to have a parameter such as stack to control this dimension.
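A minimal Python sketch of the two proposed parameters working together. The names ws and stack and the three codes come from the proposal above; the function `join_cluster` is hypothetical, and writing a non-stacked cluster with a visible ASAT is an assumption for illustration. The per-system letters for r, s, n and h are deliberately not modelled here.

```python
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA: stacks the following consonant
ASAT = "\u103a"    # MYANMAR SIGN ASAT: visible vowel killer

# Codes of the vernaculars, used to identify the writing system.
SHAN_SYSTEMS = {"shn", "kht", "tjl"}


def join_cluster(first: str, second: str, ws: str, stack: bool) -> str:
    """Join two consonants as a stacked or a non-stacked cluster (sketch)."""
    if ws not in SHAN_SYSTEMS:
        raise ValueError(f"unknown writing system: {ws!r}")
    joiner = VIRAMA if stack else ASAT
    return first + joiner + second


KA = "\u1000"  # က
TA = "\u1010"  # တ

# The stacked form is the က္တ sequence seen in the quoted spelling.
print(join_cluster(KA, TA, ws="kht", stack=True))
print(join_cluster(KA, TA, ws="kht", stack=False))
```

Keeping stack separate from ws lets any of the three systems appear in either form, which matches the uncertainty over whether Khamti and Tai Laing Pali stack.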

Laoo
There seem to be many writing systems. The existence of the following is demonstrated or asserted:

 * a) Buddhist Institute alphabet with implicit vowels (an abugida).
 * b) Buddhist Institute alphabet without implicit vowels (technically an alphabet).
 * c) Lao repertoire with the following options:
 ** Use ຣ or not for ?
 ** Use ຢ or not for
 ** Use ດບ for clusters?
 ** Use cancellation mark for 'un-Lao' clusters? (May need resolution sv v. tv)
 * d) System using nuktas.

As with Thai, inflection tables for the abugidic spelling are best kept separate. At present I am merging the others for nominals where the stems are the same. For verbs, I am splitting the tables on the basis of the writing of  - NYO v. YO.

Bharati Braille
represents comma, apostrophe and avagraha.

(pointing) starts some letter symbols, and modifies some letters, both in Indic scripts and for Urdu.

(caps) lengthens syllabic consonants.