User talk:Conrad.Irwin/Transliterator.php


 * ''Many more details can be found at http://mediawiki.org/wiki/Extension:Transliterator.''

This is a MediaWiki extension, written in PHP, that performs automatic transliteration wherever that is possible. I am told that this is possible for many languages, certainly for Armenian, Korean and Greek; Serbian/Serbo-Croatian should also be possible. For languages where automatic transliteration is not possible, the extension simply cannot help.

The approach is deliberately simple: the extension is given a list of rules (those for Armenian and several other languages are listed below) and transliterates a string by matching the rules against it, trying the longest rules first.
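The longest-match strategy can be sketched in Python as follows. This is an illustrative re-implementation, not the extension's PHP code: it assumes rules are plain string-to-string pairs, the function name `transliterate` is hypothetical, and it assumes that characters with no matching rule are copied through unchanged.

```python
def transliterate(text: str, rules: dict[str, str]) -> str:
    """Greedy longest-match transliteration (hypothetical sketch)."""
    longest = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            # No rule matched at this position: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# A few of the Armenian rules listed below.
armenian = {"ո": "o", "ւ": "w", "ու": "u", "զ": "z"}
print(transliterate("ուզ", armenian))  # uz (not "owz": the two-letter rule wins)
```

Because matching is greedy by length, the order in which rules are listed does not matter; only their length does.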

To use the extension, you simply call "" and it will give you "xačakrac’ aršavank’". The extension has been designed for use in generic templates, so it can be called without first checking whether rules exist for a language; it performs that check internally, much more efficiently than a template could.

For example, a template like term might use the form. This would give the following:
 * fr: => Ouzbékistan
 * hy: => Ուզբեկստան (Uzbekstan)
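The template behaviour above can be mimicked with a short Python sketch. It assumes, as the `fr:` line suggests, that asking for a language without rules yields an empty string. The registry `RULES`, the functions `transliterate` and `term`, and the use of `hy` as the Armenian code are illustrative assumptions, not the extension's actual interface. Longest-match is done here with a regular-expression alternation sorted longest-first, which picks the same rule at each position.

```python
import re

# Hypothetical registry: language code -> rule table (only the Armenian
# rules needed for this example are included).
RULES = {
    "hy": {"Ու": "U", "ու": "u", "զ": "z", "բ": "b", "ե": "e",
           "կ": "k", "ս": "s", "տ": "t", "ա": "a", "ն": "n"},
}


def transliterate(lang: str, text: str) -> str:
    """Return the transliteration of text, or "" if lang has no rules."""
    rules = RULES.get(lang)
    if not rules:
        return ""
    # Longest keys first, so the alternation prefers multi-letter rules.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rules[m.group(0)], text)


def term(lang: str, word: str) -> str:
    """Generic-template behaviour: add "(translit)" only when one exists."""
    tr = transliterate(lang, word)
    return f"{word} ({tr})" if tr else word


print(term("fr", "Ouzbékistan"))  # Ouzbékistan
print(term("hy", "Ուզբեկստան"))   # Ուզբեկստան (Uzbekstan)
```

Compiling the pattern on every call keeps the sketch short; a real implementation would build it once per language.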

The syntax for defining rules is fairly simple, and rules can be specified in terms of either letters or NFD (Unicode Normalization Form D, fully decomposed) code points. Most languages should use the letter form, in which "á" and "a" are different, unrelated characters; for some languages, such as Korean, it is useful to be able to match against the decomposed form instead.
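The difference between the two forms can be seen with Python's standard `unicodedata` module. In the sketch below, `apply_rules` is a hypothetical single-code-point substitution used only for illustration: in letter form "é" is one code point (U+00E9) that a rule for "e" never touches, whereas after NFD normalization it becomes "e" followed by U+0301 COMBINING ACUTE ACCENT, so the "e" rule applies and the accent is left over.

```python
import unicodedata


def apply_rules(text: str, rules: dict[str, str]) -> str:
    """Replace each code point that has a rule; leave the others unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)


rules = {"e": "E"}
word = "caf\u00e9"  # "café" with a precomposed é

letter_form = apply_rules(word, rules)
nfd_form = apply_rules(unicodedata.normalize("NFD", word), rules)

print(ascii(letter_form))  # 'caf\xe9'  (é untouched)
print(ascii(nfd_form))     # 'cafE\u0301'  (e matched, accent left over)
```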

Rules for transliteration of Armenian (Hübschmann-Meillet) (hy) ա => a բ => b գ => g դ => d ե => e զ => z է => ē ը => ə թ => tʿ ժ => ž ի => i լ => l խ => x ծ => c կ => k հ => h ձ => j ղ => ł ճ => č մ => m յ => y ն => n շ => š ո => o չ => čʿ պ => p ջ => ǰ ռ => ṙ ս => s վ => v տ => t ր => r ց => cʿ ւ => w ու=> u փ => pʿ ք => kʿ և => ew օ => ō ֆ => f
 * 1) lowercase

Ա => A Բ => B Գ => G Դ => D Ե => E Զ => Z Է => Ē Ը => Ə Թ => Tʿ Ժ => Ž Ի => I Լ => L Խ => X Ծ => C Կ => K Հ => H Ձ => J Ղ => Ł Ճ => Č Մ => M Յ => Y Ն => N Շ => Š Ո => O Չ => Čʿ Պ => P Ջ => J̌ Ռ => Ṙ Ս => S Վ => V Տ => T Ր => R Ց => Cʿ Ւ => W Ու => U ՈՒ => U Փ => Pʿ Ք => Kʿ Օ => Ō Ֆ => F
 * 1) uppercase
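The rule lists on this page are written as whitespace-separated `source => target` pairs. A minimal Python parser for that textual format might look like this; the regular expression and the name `parse_rules` are assumptions based on how the lists appear here, not the extension's own parser. It tolerates a missing space before `=>` (as in `ու=> u`), but it does not handle rules with an empty target, such as some of the Korean ones below.

```python
import re

# A rule is a run of non-space characters, "=>", then a run of non-space
# characters; the spaces around "=>" are optional.
RULE = re.compile(r"(\S+?)\s*=>\s*(\S+)")


def parse_rules(line: str) -> dict[str, str]:
    """Turn "a => b c => d ..." into {"a": "b", "c": "d", ...}."""
    return dict(RULE.findall(line))


print(parse_rules("ա => a թ => tʿ ու=> u"))
# {'ա': 'a', 'թ': 'tʿ', 'ու': 'u'}
```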

Rules for transliteration of Belarusian (Scientific transliteration) (be) а => a б => b в => v г => h ґ => g д => d е => e ё => ë ж => ž з => z і => i й => j к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ў => ŭ ф => f х => x ц => c ч => č ш => š ы => y ь => ’ э => è ю => ju я => ja ѣ => ě А => A Б => B В => V Г => H Ґ => G Д => D Е => E Ё => Ë Ж => Ž З => Z І => I Й => J К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ў => Ŭ Ф => F Х => X Ц => C Ч => Č Ш => Š Ы => Y Ь => ’ Э => È Ю => Ju Я => Ja Ѣ => Ě
 * 1) lowercase
 * 1) uppercase

Rules for transliteration of Bulgarian (Scientific transliteration) (bg) а => a б => b в => v г => g д => d е => e ж => ž з => z и => i й => j к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ф => f х => h ц => c ч => č ш => š щ => št ъ => ǎ ь => j ю => ju я => ja ѫ => ǫ ѣ => ě ѧ => ę
 * 1) lowercase

А => A Б => B В => V Г => G Д => D Е => E Ж => Ž З => Z И => I Й => J К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ф => F Х => H Ц => C Ч => Č Ш => Š Щ => Št Ъ => Ǎ Ь => J Ю => Ju Я => Ja Ѫ => Ǫ Ѣ => Ě Ѧ => Ę
 * 1) uppercase

Rules for transliteration of Georgian (ISO 9984) (ka) ა => a ბ => b გ => g დ => d ე => e ვ => v ზ => z თ => t’ ი => i კ => k ლ => l მ => m ნ => n ო => o პ => p ჟ => ž რ => r ს => s ტ => t უ => u ფ => p’ ქ => k’ ღ => ḡ ყ => q შ => š ჩ => č’ ც => c’ ძ => j წ => c ჭ => č ხ => x ჯ => ǰ ჰ => h
 * 1) lowercase

Rules for transliteration of Gothic (got) 𐌰 => a 𐌱 => b 𐌲 => g 𐌳 => d 𐌴 => e 𐌵 => q 𐌶 => z 𐌷 => h 𐌸 => þ 𐌹̈ => ï 𐌹 => i 𐌺 => k 𐌻 => l 𐌼 => m 𐌽 => n 𐌾 => j 𐌿 => u 𐍀 => p 𐍁 => 90 𐍂 => r 𐍃 => s 𐍄 => t 𐍅 => w 𐍆 => f 𐍇 => x 𐍈 => ƕ 𐍉 => o 𐍊 => 900

Rules for transliteration of Greek (el) α => a  ά  => á αι => ai άι => ai  αϊ => ai  αυ => av αυθ => afth αυκ => afk αυξ => afx αυπ => afp αυσ => afs αυς => afs αυτ => aft αυφ => aff αυχ => afch αυψ => afps αυ$ => af αύ => áv αύθ => áfth αύκ => áfk αύξ => áfx αύπ => áfp αύσ => áfs αύς => áfs αύτ => áft αύφ => áff αύχ => áfch αύψ => áfps αύ$ => áf άυ => áy αϋ => aÿ β  => v  γ  => g  γγ => ng  γξ => nx  γκ => gk  γχ => nch δ => d  ε  => e έ  => é ει => ei έι => ei  εϊ => ei  ευ => ev ευθ => efth ευκ => efk ευξ => efx ευπ => efp ευσ => efs ευς => efs ευτ => eft ευφ => eff ευχ => efch ευψ => efps ευ$ => ef εύ => év εύθ => éfth εύκ => éfk εύξ => éfx εύπ => éfp εύσ => éfs εύς => éfs εύτ => éft εύφ => éff εύχ => éfch εύψ => éfps εύ$ => éf έυ => éy εϋ => eÿ ζ  => z  η  => i  ή  => í ηυ => iv ηυθ => ifth ηυκ => ifk ηυξ => ifx ηυπ => ifp ηυσ => ifs ηυς => ifs ηυτ => ift ηυφ => iff ηυχ => ifch ηυψ => ifps ηυ$ => if ηύ => ív ηύθ => ífth ηύκ => ífk ηύξ => ífx ηύπ => ífp ηύσ => ífs ηύς => ífs ηύτ => íft ηύφ => íff ηύχ => ífch ηύ$ => íf ήυ => íy ηϋ => iÿ θ  => th  ι  => i  ί  => í ϊ => ï ΐ => í κ => k  λ  => l  μ  => m  ^μπ => b μπ => mp ν  => n  ντ => nt  ξ  => x  ο  => o  ό  => ó οι => oi όι => oi  οϊ => oi  ου => ou  όυ => óy οϋ => oÿ π  => p  ρ  => r  σ  => s ς  => s  τ  => t  υ  => y ύ  => ý ϋ => ÿ ΰ => ý υι => yi φ  => f  χ  => ch  ψ  => ps  ω  => o ώ  => ó
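The Greek rules use two markers not found in the other tables: `^` (as in `^μπ => b`) and `$` (as in `αυ$ => af`). Assuming they mean "only at the start of a word" and "only at the end of a word" respectively (an interpretation, not something documented here), longest-match can be extended as in the following Python sketch. The function name and the letters-only definition of a word boundary are also assumptions.

```python
def transliterate_anchored(text: str, rules: dict[str, str]) -> str:
    """Longest-match transliteration where "^x" matches only at the start of a
    word and "x$" only at the end of a word (an assumed interpretation)."""
    compiled = []
    for key, value in rules.items():
        need_start = key.startswith("^")
        need_end = key.endswith("$")
        start = 1 if need_start else 0
        stop = len(key) - 1 if need_end else len(key)
        compiled.append((key[start:stop], need_start, need_end, value))
    # Longest bodies first; for equal lengths, try anchored variants first.
    compiled.sort(key=lambda r: (len(r[0]), r[1] + r[2]), reverse=True)

    out, i = [], 0
    while i < len(text):
        word_start = i == 0 or not text[i - 1].isalpha()
        for body, need_start, need_end, value in compiled:
            j = i + len(body)
            if not text.startswith(body, i):
                continue
            if need_start and not word_start:
                continue
            if need_end and j < len(text) and text[j].isalpha():
                continue
            out.append(value)
            i = j
            break
        else:
            out.append(text[i])  # no rule applies: copy unchanged
            i += 1
    return "".join(out)


greek = {"μ": "m", "π": "p", "^μπ": "b", "μπ": "mp", "α": "a", "υ": "y",
         "αυ": "av", "αυ$": "af", "ρ": "r", "τ": "t"}
print(transliterate_anchored("μπαρ", greek))   # bar
print(transliterate_anchored("ταμπα", greek))  # tampa
print(transliterate_anchored("ταυ", greek))    # taf
```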

Rules for transliteration of Kazakh (QazAqparat) (kk) а => a ә => ä б => b в => v г => g ғ => ğ д => d е => e ё => yo ж => j з => z и => ï й => y к => k қ => q л => l м => m н => n ң => ñ о => o ө => ö п => p р => r с => s т => t у => w ұ => u ү => ü ф => f х => x һ => h ц => c ч => ç ш => ş щ => şş ъ => ” ы => ı і => i ь => ’ э => é ю => yu я => ya
 * 1) lowercase

А => A Ә => Ä Б => B В => V Г => G Ғ => Ğ Д => D Е => E Ё => Yo Ж => J З => Z И => Ï Й => Y К => K Қ => Q Л => L М => M Н => N Ң => Ñ О => O Ө => Ö П => P Р => R С => S Т => T У => W Ұ => U Ү => Ü Ф => F Х => X Һ => H Ц => C Ч => Ç Ш => Ş Щ => Şş Ъ => ” Ы => I І => I Ь => ’ Э => É Ю => Yu Я => Ya
 * 1) uppercase

Beginnings of rules for Korean (revised romanization published in 2000) (ko)
 * 1) Single letters taken from http://cpansearch.perl.org/src/KAWASAKI/Lingua-KO-Romanize-Hangul-0.20/lib/Lingua/KO/Romanize/Hangul.pm
 * 2) It needs some special cases for certain adjacent characters, but I cannot decipher the documentation, and the Perl code above seems to replace characters only in circumstances where they cannot appear.

&#x1100; => g &#x1101; => kk &#x1102; => n &#x1103; => d &#x1104; => tt &#x1105; => r &#x1106; => m &#x1107; => b &#x1108; => pp &#x1109; => s &#x110A; => ss &#x110B; => &#x110C; => j &#x110D; => jj &#x110E; => ch &#x110F; => k &#x1110; => t &#x1111; => p &#x1112; => h &#x1161; => a &#x1162; => ae &#x1163; => ya &#x1164; => yae &#x1165; => eo &#x1166; => e &#x1167; => yeo &#x1168; => ye &#x1169; => o &#x116A; => wa &#x116B; => wae &#x116C; => oe &#x116D; => yo &#x116E; => u &#x116F; => wo &#x1170; => we  &#x1171; => wi &#x1172; => yu &#x1173; => eu &#x1174; => ui &#x1175; => i &#x11A7; => &#x11A8; => g &#x11A9; => kk &#x11AA; => ks &#x11AB; => n &#x11AC; => nj &#x11AD; => nh &#x11AE; => d &#x11AF; => r &#x11B0; => rg &#x11B1; => rm &#x11B2; => rb &#x11B3; => rs &#x11B4; => rt &#x11B5; => rp &#x11B6; => rh &#x11B7; => m &#x11B8; => b &#x11B9; => bs &#x11BA; => s &#x11BB; => ss &#x11BC; => ng &#x11BD; => j &#x11BE; => c &#x11BF; => k &#x11C0; => t &#x11C1; => p &#x11C2; => h
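 * 1) Initial consonants (U+1100 to U+1112)
 * 1) Vowels (U+1161 to U+1175)
 * 1) Final consonants (U+11A7 to U+11C2)
 * 2) The first "final" entry, U+11A7, is not a real final consonant: it is the base value of Unicode's Hangul decomposition, where index 0 means "no final consonant", so it never appears in NFD output.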
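Because precomposed Hangul syllables decompose canonically under NFD into an initial, a vowel and an optional final jamo, the code-point rules above can be applied to ordinary Korean text after normalization. A Python sketch using only a handful of the listed rules (the name `romanize` and the rule subset are illustrative):

```python
import unicodedata

# A subset of the jamo rules listed above.
JAMO_RULES = {
    "\u1100": "g", "\u110B": "", "\u1112": "h",  # initial consonants
    "\u1161": "a", "\u116E": "u",                # vowels
    "\u11A8": "g", "\u11AB": "n",                # final consonants
}


def romanize(text: str) -> str:
    # NFD splits each precomposed syllable, e.g. U+D55C into U+1112 U+1161 U+11AB.
    jamo = unicodedata.normalize("NFD", text)
    return "".join(JAMO_RULES.get(ch, ch) for ch in jamo)


print(romanize("\ud55c\uad6d"))  # hangug
print(romanize("\uc544"))        # a  (the initial U+110B is silent)
```

The first result shows why the special cases mentioned above are needed: the Revised Romanization of 한국 is "Hanguk", because a final ㄱ is written k at the end of a word or before a consonant, whereas a table that maps each jamo independently always produces g.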

Rules for transliteration of Macedonian (ISO/R 9:1968) (mk) а => a б => b в => v г => g д => d ѓ => ǵ е => e ж => ž з => z ѕ => dz и => i ј => j к => k л => l љ => lj м => m н => n њ => nj о => o п => p р => r с => s т => t ќ => ḱ у => u ф => f х => h ц => c ч => č џ => dž ш => š
 * 1) lowercase

А => A Б => B В => V Г => G Д => D Ѓ => Ǵ Е => E Ж => Ž З => Z Ѕ => Dz И => I Ј => J К => K Л => L Љ => Lj М => M Н => N Њ => Nj О => O П => P Р => R С => S Т => T Ќ => Ḱ У => U Ф => F Х => H Ц => C Ч => Č Џ => Dž Ш => Š
 * 1) uppercase

Rules for transliteration of Old Armenian (Hübschmann-Meillet) (xcl) ա => a բ => b գ => g դ => d ե => e զ => z է => ē ը => ə թ => tʿ ժ => ž ի => i լ => l խ => x ծ => c կ => k հ => h ձ => j ղ => ł ճ => č մ => m յ => y ն => n շ => š ո => o չ => čʿ պ => p ջ => ǰ ռ => ṙ ս => s վ => v տ => t ր => r ց => cʿ ւ => w ու=> u փ => pʿ ք => kʿ և => ew օ => ō ֆ => f
 * 1) lowercase

Ա => A Բ => B Գ => G Դ => D Ե => E Զ => Z Է => Ē Ը => Ə Թ => Tʿ Ժ => Ž Ի => I Լ => L Խ => X Ծ => C Կ => K Հ => H Ձ => J Ղ => Ł Ճ => Č Մ => M Յ => Y Ն => N Շ => Š Ո => O Չ => Čʿ Պ => P Ջ => J̌ Ռ => Ṙ Ս => S Վ => V Տ => T Ր => R Ց => Cʿ Ւ => W Ու => U ՈՒ => U Փ => Pʿ Ք => Kʿ Օ => Ō Ֆ => F
 * 1) uppercase

Rules for transliteration of Old Church Slavonic, Cyrillic and Glagolitic (cu) а => a А => A б => b Б => B в => v В => V г => g Г => G д => d Д => D є => e Є => E ж => ž Ж => Ž ѕ => dz Ѕ => Dz ꙃ => dz Ꙃ => Dz з => z З => Z ꙁ => z Ꙁ => Z и => i И => I і => i І => I ї => i ћ => ǵ Ћ => Ǵ к => k К => K л => l Л => L м => m М => M н => n Н => N о => o О => O п => p П => P р => r Р => R с => s С => S т => t Т => T оу => u Оу => U ѹ => u Ѹ => U ф => f Ф => F х => x Х => X ѡ => ō Ѡ => Ō ц => c Ц => C ч => č Ч => Č ш => š Ш => Š щ => št Щ => Št ъ => ŭ Ъ => Ŭ ꙑ => y Ꙑ => Y ъи => y ЪИ => Y ъі => y ЪІ => Y ь => ĭ Ь => Ĭ ѣ => ě Ѣ => Ě ю => ju Ю => Ju я => ja Я => Ja ꙗ => ja Ꙗ => Ja ѥ => je Ѥ => Je ѧ => ę Ѧ => Ę ѩ => ję Ѩ => Ję ѫ => ǫ Ѫ => Ǫ ѭ => jǫ Ѭ => Jǫ ѯ => ks Ѯ => Ks ѱ => ps Ѱ => Ps ѳ => θ Ѳ => Θ ѵ => ü Ѵ => Ü Ѽ => O! ѿ => ot Ѿ => Ot Ⰰ => a ⰰ => a Ⰱ => b ⰱ => b Ⰲ => v ⰲ => v Ⰳ => g ⰳ => g Ⰴ => d ⰴ => d Ⰵ => e ⰵ => e Ⰶ => ž ⰶ => ž Ⰷ => dz ⰷ => dz Ⰸ => z ⰸ => z Ⰹ => i ⰹ => i Ⰺ => i ⰺ => i Ⰻ => i ⰻ => i Ⰼ => ǵ ⰼ => ǵ Ⰽ => k ⰽ => k Ⰾ => l ⰾ => l Ⰿ => m ⰿ => m Ⱀ => n ⱀ => n Ⱁ => o ⱁ => o Ⱂ => p ⱂ => p Ⱃ => r ⱃ => r Ⱄ => s ⱄ => s Ⱅ => t ⱅ => t Ⱆ => u ⱆ => u Ⱇ => f ⱇ => f Ⱈ => x ⱈ => x Ⱉ => ot ⱉ => ot Ⱊ => p ⱊ => p Ⱋ => št ⱋ => št Ⱌ => c ⱌ => c Ⱍ => č ⱍ => č Ⱎ => š ⱎ => š Ⱏ => ŭ ⱏ => ŭ Ⱐ => ĭ ⱐ => ĭ Ⱑ => ě ⱑ => ě Ⱓ => ju ⱓ => ju Ⱔ => ę ⱔ => ę Ⱕ => ę ⱕ => ę Ⱖ => jo ⱖ => jo Ⱗ => ję ⱗ => ję Ⱘ => ǫ ⱘ => ǫ Ⱙ => jǫ ⱙ => jǫ Ⱚ => θ ⱚ => θ Ⱛ => ü ⱛ => ü Ⱝ => a ⱝ => a Ⱞ => m ⱞ => m
 * 1) Cyrillic
 * 1) Glagolitic

Rules for transliteration of Phoenician (phn) 𐤀 => ʾ 𐤁 => b 𐤂 => g 𐤃 => d 𐤄 => h 𐤅 => w 𐤆 => z 𐤇 => ḥ 𐤈 => ṭ 𐤉 => y 𐤊 => k 𐤋 => l 𐤌 => m 𐤍 => n 𐤎 => s 𐤏 => ʿ 𐤐 => p 𐤑 => ṣ 𐤒 => q 𐤓 => r 𐤔 => š 𐤕 => t

Rules for transliteration of Russian (Scientific transliteration) (ru) а => a б => b в => v г => g д => d е => e ё => ë ж => ž з => z и => i й => j к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ф => f х => x ц => c ч => č ш => š щ => šč ъ => ” ы => y ь => ’ э => è ю => ju я => ja і => i ѣ => ě ѳ => f ѵ => i
 * 1) lowercase

А => A Б => B В => V Г => G Д => D Е => E Ё => Ë Ж => Ž З => Z И => I Й => J К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ф => F Х => X Ц => C Ч => Č Ш => Š Щ => Šč Ъ => ” Ы => Y Ь => ’ Э => È Ю => Ju Я => Ja І => I Ѣ => Ě Ѳ => F Ѵ => I
 * 1) uppercase

Rules for transliteration of Tajik (developed specifically for Wiktionary) (tg) а => a б => b в => v г => g ғ => ġ д => d е => e ё => yo ж => ž з => z и => i ӣ => ī й => y к => k қ => q л => l м => m н => n о => o п => p р => r с => s т => t у => u ӯ => ū ф => f х => x ҳ => h ч => č ҷ => j ш => š ъ => ʾ э => è ю => ju я => ja А => A Б => B В => V Г => G Ғ => Ġ Д => D Е => E Ё => Yo Ж => Ž З => Z И => I Ӣ => Ī Й => Y К => K Қ => Q Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ӯ => Ū Ф => F Х => X Ҳ => H Ч => Č Ҷ => J Ш => Š Ъ => ʾ Э => È Ю => Ju Я => Ja
 * 1) lowercase
 * 1) uppercase

Rules for transliteration of Ugaritic (uga) 𐎀 => ả 𐎁 => b 𐎂 => g 𐎃 => ḫ 𐎄 => d 𐎅 => h 𐎆 => w 𐎇 => z 𐎈 => ḥ 𐎉 => ṭ 𐎊 => y 𐎋 => k 𐎌 => š 𐎍 => l 𐎎 => m 𐎏 => ḏ 𐎐 => n 𐎑 => ẓ 𐎒 => s 𐎓 => ʿ 𐎔 => p 𐎕 => ṣ 𐎖 => q 𐎗 => r 𐎘 => ṯ 𐎙 => ġ 𐎚 => t 𐎛 => ỉ 𐎜 => ủ 𐎝 => ś

Rules for transliteration of Ukrainian (Scientific transliteration) (uk) а => a б => b в => v г => h ґ => g д => d е => e є => je ж => ž з => z и => y і => i й => j ї => ji к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ф => f х => x ц => c ч => č ш => š щ => šč ь => ’ ю => ju я => ja ѣ => ě ё => ë э => è ы => y ѳ => f ѵ => i ѧ => ę А => A Б => B В => V Г => H Ґ => G Д => D Е => E Є => Je Ж => Ž З => Z И => Y І => I Й => J Ї => Ji К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ф => F Х => X Ц => C Ч => Č Ш => Š Щ => Šč Ь => ’ Ю => Ju Я => Ja Ѣ => Ě Ё => Ë Э => È Ы => Y Ѳ => F Ѵ => I Ѧ => Ę
 * 1) lowercase
 * 1) uppercase
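Most of the uppercase tables above mirror the lowercase ones: the source letter is upper-cased and the first letter of the target is capitalized (ю => ju gives Ю => Ju). The Python sketch below derives such rules, which could also serve to check hand-written uppercase tables for slips. The function name is hypothetical, and multi-letter sources get both a title-case key and an all-caps key, as the Armenian table does with Ու and ՈՒ.

```python
def derive_uppercase(lower_rules: dict[str, str]) -> dict[str, str]:
    """Derive uppercase rules from lowercase ones (hypothetical helper)."""
    upper = {}
    for key, value in lower_rules.items():
        target = value[:1].upper() + value[1:]
        upper[key[:1].upper() + key[1:]] = target  # title case, e.g. "Ու"
        upper[key.upper()] = target                # all caps, e.g. "ՈՒ"
    return upper


print(derive_uppercase({"ю": "ju", "ѵ": "i", "ու": "u"}))
# {'Ю': 'Ju', 'Ѵ': 'I', 'Ու': 'U', 'ՈՒ': 'U'}
```

Targets without an uppercase form, such as ’ for ь, are left unchanged, so exceptions still need hand-written rules.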