Appendix talk:Old Cyrillic script

Unicode 5.1 additions
Outline of the changes and additions (PDF)
 * Proposal to encode additional Cyrillic characters in the BMP of the UCS

Discussion about changes to existing omega with titlo and uk (PDF)
 * On Cyrillic Letter Omega with Titlo and on Cyrillic Letter Uk

Code charts (PDF)
 * Cyrillic 0400–04FF
 * Cyrillic Supplement 0500–052F
 * Cyrillic Extended-A 2DE0–2DFF
 * Cyrillic Extended-B A640–A69F

Alan Wood's test pages
 * Cyrillic
 * Cyrillic Supplement
 * Cyrillic Extended-A
 * Cyrillic Extended-B

Character inventory
Letters useful in Old Church Slavonic, Old East Slavic, Church Slavonic [other languages?], for use in. Please make corrections and add relevant comments. Code points used for OCS-style orthography are highlighted.

Missing characters will be clearly visible if your system has the latest version (5.1) of SIL's Unicode BMP Fallback Font installed. Font samples are as rendered by the browser, with BukyVede upright and italic, Code2000, DejaVu Sans, DejaVu Serif, Dilyana, Kliment Std, Lazov, Menaion, and RomanCyrillic Std fonts applied.

Fonts
Cyrillic character coverage of released fonts.

Fonts
Well, I have all the fonts listed inside installed, and quite a few others too, but I still can't see a single one of these new characters. If anyone knows of a font supporting these Unicode ranges, please announce it here. --Ivan Štambuk 17:08, 8 April 2008 (UTC)


 * I don't think there is one yet.


 * Dilyana has appropriate glyphs for the existing range, including e/je, ze with tail, jery with backstroke and no dot, digraph uk, old glyphs for ci and červ, and jus glyph in the ja position.


 * It seems to have just a few of the new ones in its inventory when I inspect it using Apple Font Book, but they don't work for me. —Michael Z. 17:42, 8 April 2008 (UTC)


 * By the way, the font list is simply Dilyana, plus a copy of the standard list for  as fallback.  There may be other suitable Slavistics fonts to add, but we should confirm that they all use the same glyphs for the same letters.  Hopefully we get a full Unicode 5.1 font before too long. —Michael Z. 18:01, 8 April 2008 (UTC)


 * I opened Dilyana in the "Charmap" utility on Windoze (start->run "charmap"), and it does have something, but in the Private Use Area at U+E028, U+E029 and so on, which makes it somewhat useless. --Ivan Štambuk 19:18, 8 April 2008 (UTC)
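For anyone who wants to check whether a glyph found in a font sits in the Private Use Area rather than at a standard code point, here is a minimal sketch. It assumes Python 3 and its bundled `unicodedata` module; the function name `is_private_use` is only illustrative.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the character lies in a Unicode private-use range."""
    # General category "Co" covers the BMP Private Use Area (U+E000-U+F8FF)
    # as well as the two supplementary private-use planes.
    return unicodedata.category(ch) == "Co"


# The two code points reported for Dilyana, plus a Unicode 5.1 letter for contrast.
for cp in (0xE028, 0xE029, 0xA650):
    print(f"U+{cp:04X} private use: {is_private_use(chr(cp))}")
```

Text entered with private-use code points only displays as intended in the one font that defines them, which is why such glyphs are of little use for entries.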

Found some fonts at Obshtezhitie's Resources page. I'll add a link, but I don't think I can do much more about it in the next week or two. —Michael Z. 2008-04-16 21:05 Z 


 * All the characters in the table look good now, except Menaion seems to have a Che-Zhivie ligature glyph for the letter Ksi. —Michael Z. 2008-04-16 21:37 Z 

RomanCyrillic Std and Kliment Std fonts (both version 3.00) appear to have all of the new characters, but with Latin-type letterforms, not the pre-Petrine Cyrillic style. The combining characters are present, but they don't seem to all work as expected on my Mac.


 * Broad omega with combining titlo:
 * Kliment Std: Ꙍ҃ ꙍ҃
 * RomanCyrillic Std: Ꙍ҃ ꙍ҃
 * Bukyvede: Ꙍ҃ ꙍ҃
 * Broad omega with combining pokrytie:
 * Kliment Std: Ꙍ҇ ꙍ҇
 * RomanCyrillic Std: Ꙍ҇ ꙍ҇
 * Bukyvede: Ꙍ҇ ꙍ҇
 * Broad omega with combining psili pneuma and pokrytie:
 * Kliment Std: Ꙍ҆҇ ꙍ҆҇
 * RomanCyrillic Std: Ꙍ҆҇ ꙍ҆҇
 * Bukyvede: Ꙍ҆҇ ꙍ҆҇
 * A with combining be and pokrytie:
 * Kliment Std: Аⷠ҇ аⷠ҇
 * RomanCyrillic Std: Аⷠ҇ аⷠ҇
 * Bukyvede: Аⷠ҇ аⷠ҇
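To make it easier to compare how each font handles the samples above, the sequences can be spelled out as explicit code points. This is only a sketch: it assumes the samples are built from U+A64C (broad omega), U+0483 (titlo), U+0486 (psili pneumata), U+0487 (pokrytie) and U+2DE0 (combining be), and that Python 3's `unicodedata` is available. The names `SEQUENCES` and `describe` are made up for illustration.

```python
import unicodedata

# Capital-letter samples from the list above, written as explicit code points.
SEQUENCES = {
    "broad omega + titlo": "\uA64C\u0483",
    "broad omega + pokrytie": "\uA64C\u0487",
    "broad omega + psili pneumata + pokrytie": "\uA64C\u0486\u0487",
    "a + combining be + pokrytie": "\u0410\u2DE0\u0487",
}


def describe(text):
    """List each code point in text with its Unicode name and combining class."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


for label, seq in SEQUENCES.items():
    print(label)
    for code, name, combining_class in describe(seq):
        print(f"  {code} {name} (combining class {combining_class})")
```

A non-zero combining class marks a character that should stack on the preceding base letter, so a font that shows it as a separate glyph is not positioning the mark.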

Possibly of interest, they also have "transliterating" Cyrillic glyphs for the Glagolitic range. —Michael Z. 2008-05-20 06:24 z 

yery
- this dude. With Stephen's advice, while adding OCS entries, I abandoned the usage of ligature 'ы' and used digraph 'ъі' instead. Yery was indeed a digraph in both Glagolitic and early Cyrillic (and therefore assumed to have represented a diphthong; in contemporary Latin text it was written as "ui"). However, now I see inside n3194.pdf "YERU WITH BACK YER" U+A650 (Ꙑ) precisely corresponding to this one (both 'ъ' and 'і', but without the dot).

So what to do now? To relocate all the entries using 'ъі' to 'ꙑ', I'd rather wait until an actual font comes up supporting these, and in the meantime we could continue writing pro  and so on. --Ivan Štambuk 17:19, 8 April 2008 (UTC)


 * I suggest holding off on any content changes until we have access to a free font, and we know that it will work on Windows and Mac.


 * For display, it may already be acceptable to use Dilyana font by adding to templates which use  . On this chart, it displays correct glyphs for everything, although jotified a has to be constructed from і and a, and I can't generate a vertical uk. Unfortunately, the letters are still very small, despite my adding an HTML <big> element, so I think it would require the font size to be increased.


 * Both ы and digraph ъі display correctly. The only problem is that a jus is displayed for я. —Michael Z. 17:54, 8 April 2008 (UTC)


 * By the way, Unicode 5.1 recommends ignoring the existing ambiguous letter uk, and using о+у for the digraph, and the new monograph uk for that form. —Michael Z. 19:29, 8 April 2008 (UTC)


 * OK, so we'll continue with the current practice of using 'ъі' for yery, 'я' for "iotified a" and ambiguous & instead of a digraph & - all with sc=Cyrs and possibly with a head= parameter for that would use 'іа' instead of 'я' so that at least the headword looks right, until a real font appears. It wouldn't be much of a problem to rename a few hundred entries en masse (a bot could do it rather trivially) once it becomes possible to do so.
 * What a mess. --Ivan Štambuk 19:38, 8 April 2008 (UTC)


 * At least things are finally improving. That sounds right, except I think we can shift to the recommended handling of Uk.


 * It's safe to start using the two-letter digraph о+у, &, and to convert existing ambiguous ѹ, &. At least it will have a predictable appearance.


 * The ambiguous Uk U+0479 renders as a digraph in my versions of Chrysanthi, Arial, Buckvica, Code2000, Dilyana, Geneva, Helvetica, Lucida Grande, Microsoft Sans, Thryomanes, and Unicode5, but as a monograph in DejaVu Sans and Titus Cyberbit. Unfortunately, the letter U (U+0443) renders as a vertical ligature in a number of Slavist fonts, so these can be considered less than ideal under Unicode 5.1 unless they are updated: Drevnerusskij, Evangelie UCS, Feofan UCS, Indycton UCS, Irmologion Caps, Irmologion UCS, Kirillica Nova Unicode, Pochaevsk, Psaltyr, Slavjanic, Triodon Caps, Triodon UCS, and Zlatoust UCS.


 * Is it correct that earlier orthographies only used the digraph? Can it be used in the spelling of all words, in all languages?  Is use of the monograph only significant in direct quotations and attestations? —Michael Z. 22:55, 8 April 2008 (UTC)


 * Well, I think that consistency is the most important thing, and that we should continue using the non-digraph version and switch to the digraph at the same time as we switch to the new 5.1 characters. A bot could do the change all in one sweep, and the resulting redirects would then have to be manually checked for orphaning (there are a number of wikified OCS entries in translation tables and appendices, and also OCS declension templates).
 * I think that Uk was always a digraph in OCS manuscripts. Later Church Slavonic and vernacular texts used a myriad of custom modifications and styles of the entire alphabet. I have no idea how to handle those special cases. All the entries should merit inclusion in the format in which they were attested. However, for the sake of uniformity all the entries should appear in some "standard format", ignoring irrelevant details - such as ignoring the three different 'i' characters and using just 'и' for OCS. Other attestations can appear in the ===Alternative spellings=== section.
 * Those fonts that render U+0443 as a vertical ligature should be kept out of, especially when 5.1 fonts appear.
 * There's still no About Old Church Slavonic. I'll look to add to it currently used practice to make some things clear.
 * Of course, Old East Slavic category is almost empty, and if you want to start adding content to it you'd better start using digraph Uk from the start, and there'll be no inconsistencies for you to worry about. --22:02, 9 April 2008 (UTC)

Introducing Unicode 5.1 support in old Cyrillic
With Unicode 5.1 there are now several alternate code points representing some letters. We need to choose which to use for normalized or “canonical” spellings. As Ivan S. suggested above, attested spellings should still be represented, and attested alternate forms should appear.

A few questions remain.

Here's my summary of alternate forms in Unicode, their current usage in Wiktionary, and the issues; please add and correct this list.


 * jest’: е, є—є is used
 * dzelo: ѕ, ꙅ, ꙃ—ѕ is used
 * zemlja: з, ꙁ—з is used, ꙁ is the earlier form, and some manuscripts used both side-by-side
 * iže: і, ї—і is used
 * on: о, ѻ, ꙩ, ꙫ, ꙭ, ꙮ—о is used
 * uk: о+у, ꙋ, ѹ—ѹ (digraph character) is used, but Unicode now recommends о+у
 * ot: ѡ, ꙍ, ѿ—ѡ is used
 * jery: ꙑ, ы, ъ+і—ъ+і is used
 * ju: ю, ꙕ—ю is used
 * ja: ꙗ, я, і+а—я is used, but the glyph is incorrect, including in most Slavonic fonts; the use of the two forms does not overlap in historical documents, so they could be considered the same letter
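The summary above could also be kept as a small lookup table, for example to flag which spellings contain a form other than the one currently used. This is a rough sketch only: the table `VARIANTS` simply restates the list, the function name `non_preferred_forms` is hypothetical, and the plain substring test will also flag an unrelated і followed by а, or о followed by у.

```python
# Per-letter variants restated from the list: (form currently used, other forms).
VARIANTS = {
    "jest": ("є", ["е"]),
    "dzelo": ("ѕ", ["ꙅ", "ꙃ"]),
    "zemlja": ("з", ["ꙁ"]),
    "iže": ("і", ["ї"]),
    "on": ("о", ["ѻ", "ꙩ", "ꙫ", "ꙭ", "ꙮ"]),
    "uk": ("ѹ", ["оу", "ꙋ"]),
    "ot": ("ѡ", ["ꙍ", "ѿ"]),
    "jery": ("ъі", ["ꙑ", "ы"]),
    "ju": ("ю", ["ꙕ"]),
    "ja": ("я", ["ꙗ", "іа"]),
}


def non_preferred_forms(word):
    """Return (letter name, variant) pairs for non-preferred forms found in word."""
    lowered = word.lower()
    found = []
    for name, (_used, alternates) in VARIANTS.items():
        for alt in alternates:
            # Substring test, so multi-letter forms such as "оу" are caught too.
            if alt in lowered:
                found.append((name, alt))
    return found
```

A spelling for which this returns an empty list uses only the forms marked "is used" above; anything else is a candidate for an alternative-forms link.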

My thoughts:


 * The normalized forms of words are already represented adequately and unambiguously, so there is no critical need for changes.
 * To be able to attest historical sources accurately, we will need to start adding alternative forms using Unicode 5.1 characters, which will display incorrectly for some readers. It's okay to start adding these links now, as long as a normalized entry exists.
 * To fully support the new Unicode standard in canonical forms, three issues need to be addressed first:
 * Оу Digraph ѹ should be transitioned to the recommended о+у.  This doesn't require Unicode 5.1 support, is easier to edit, and looks identical.   I believe this can be done ad-hoc or all at once with a bot at any time, and it's okay to start using this form for new entries.
 * Ꙑ  The equivalent modern letter is ы, but it has a slightly different glyph in almost all modern and Slavonic fonts. ъ+i looks right, and is etymologically equivalent, but lacks the functional advantage of keeping different-language entries together.  When support for Unicode 5.1 is good enough, we may as well transition to the new character.
 * Ꙗ  The direct equivalent modern letter is я, but its glyph is significantly different.  і+а looks better, but using it gives no functional advantage. When support for Unicode 5.1 is good enough, we should transition to the new character.
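As a rough illustration of how a bot might stage the three transitions above, here is a sketch that applies the uk change on its own and the two Unicode 5.1 changes only when asked. Everything here is an assumption for discussion, not an existing bot: the names `UK_DIGRAPH`, `UNICODE_5_1`, `normalize_title` and `plan_moves` are made up, the capital digraph is mapped to О+у as in the heading above, and every я is treated as iotified a, which would only be safe on Old Cyrillic entries.

```python
# Stage 1: needs no Unicode 5.1 support.
UK_DIGRAPH = {
    "\u0478": "\u041E\u0443",  # Ѹ -> О + у
    "\u0479": "\u043E\u0443",  # ѹ -> о + у
}

# Stage 2: only once Unicode 5.1 font support is judged good enough.
UNICODE_5_1 = {
    "\u042A\u0406": "\uA650",  # Ъ + І -> Ꙑ
    "\u044A\u0456": "\uA651",  # ъ + і -> ꙑ
    "\u042F": "\uA656",        # Я -> Ꙗ
    "\u044F": "\uA657",        # я -> ꙗ
}


def normalize_title(title, use_5_1=False):
    """Apply the stage 1 replacements, and stage 2 as well if use_5_1 is set."""
    table = dict(UK_DIGRAPH)
    if use_5_1:
        table.update(UNICODE_5_1)
    for old, new in table.items():
        title = title.replace(old, new)
    return title


def plan_moves(titles, use_5_1=False):
    """Return (old title, new title) pairs for the titles that would change."""
    moves = []
    for title in titles:
        new_title = normalize_title(title, use_5_1)
        if new_title != title:
            moves.append((title, new_title))
    return moves
```

Producing the list of moves first, rather than renaming directly, leaves room to check the resulting redirects by hand before anything is changed.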

When is support good enough to start using new characters in normalized entries? With the right fonts installed, I haven't seen any display problems even though my Mac OS only supports Unicode 5.0. For a number of other non-Latin writing systems, a reader must download non-standard fonts already, and a student of old Slavic languages should install them anyway. I doubt that such fonts will become standard in operating systems, no matter how long we wait.

Since several decent Unicode 5.1 Slavonic fonts are freely available, I have no objection to transitioning now. But I also don't mind holding off if there are objections. —Michael Z. 2008-11-06 21:05 z