Wiktionary:About Contemporary Arabic

This is an attempt to standardize the documentation of "Contemporary Arabic" terms, i.e. terms in the contemporary Arabic varieties. See WT:About Arabic for considerations relating to Modern Standard Arabic, and WT:About Arabic/Egyptian for a page discussing transliteration of Egyptian Arabic.

Because this page is a think-tank draft, some of its proposals are nonstandard or disagreeable. The most nonstandard and disagreeable ones are marked with the text [DISCUSS], but everything else on the page is open for discussion and editing as well.

Note that, while the goal of "unified Arabic" on Wiktionary is a good one to work toward (compare Chinese, with only one language header for all concerned varieties), this page operates under the assumption that each individual dialect or dialect group will be listed under its own heading.

Verbs

 * In verbs' headword lines, we follow up the standard past-tense lemma (e.g. ) with the 3sg.m nonpast subjunctive (e.g. ). For varieties where the "subjunctive" is not a distinct verb form, we instead use the least-inflected 3sg.m nonpast form accordingly. (Some current entries for Levantine or Egyptian instead show the present indicative, which has an unnecessary b- prefix.)
 * A North Levantine example:
 * Verb-conjugation tables list all possible conjugations, but cross out or highlight (in red) the implausible or unused ones.
 * Verb-conjugation tables list their respective varieties' "new passives", which are typically derived from or by analogy with the old active-voice reflexive verb forms (e.g.  ), as passive-voice conjugations. (Obviously, this doesn't apply to varieties that have preserved the internal-voweling passives.)
 * In entries for verbs that are both "new passives" and fossilized MSA loans of the aforementioned verb forms, e.g., we use an Etymology 1 heading for the latter and secondary etymologies for their passive-voice meanings.

Not verbs

 * For prefixes, and for prepositions that normally require object-pronoun suffixes, the main entry shows the base form (with no suffix) alongside an inflection-table template. If morphological conditions cause the morpheme to change form, then there will be another suffixless entry for the other form with the text "[condition] form of [main entry]." (e.g. ب/في in Levantine)
 * [DISCUSS] This works fine for examples like that can function without a pronoun suffix, but it starts to get awkward at e.g., and it gets really awkward around  and . Instead, perhaps the 3sg.m form should be the default for prepositions that require a suffix, giving Levantine , , and . The entries could list these definitions along with a qualifier that this is the 3sg.m form.

Arabic script

 * We do not represent with ئ if it is the first radical or if it is word-initial but preceded by a prefix. Instead, we only use إ. This gives لإنو  for some varieties' "because" (formerly under  as the lemma, which is not preferable), متإكد  "sure of", etc., but keeps   as is.
 * We spell the 3pl subject-conjugation suffix, also 1pl in the Maghreb, as — not . This isn't necessarily any "more correct", but it's an equally- or even a more-accepted spelling in most regions, so there's nothing wrong with preferring it.
 * The 3sg.m object suffix is spelled ـو in varieties where both (1) it is pronounced and (2) it is already customary among speakers to write it as ـو. Otherwise, it is spelled ـه.
 * Regarding verb forms incorporating the t-prefix, in varieties that elide its original /ta-/ vowel: if the resulting initial consonant cluster is allowed as is, like in Levantine, then the verb should not be written with an initial alif. But if the variety in question adds a short prosthetic vowel before this new cluster, as in Egyptian , then it is written with a hamza-less ا.
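
The two decision rules above (the 3sg.m object suffix and the t-prefix alif) can be sketched as small functions. This is only an illustration, not an existing Wiktionary module: the function names and boolean parameters are hypothetical, and the pronunciation condition for the suffix is passed in as a plain flag rather than being computed.

```python
def object_suffix_spelling(pronunciation_condition_met: bool,
                           waw_is_customary: bool) -> str:
    """Return the Arabic-script spelling of the 3sg.m object suffix.

    Both parameters are hypothetical flags describing a variety:
    condition (1) on pronunciation and condition (2) on writing custom.
    """
    # The waw spelling requires BOTH conditions; otherwise fall back to ha.
    if pronunciation_condition_met and waw_is_customary:
        return "ـو"
    return "ـه"


def t_prefix_initial_letter(elides_ta_vowel: bool,
                            adds_prosthetic_vowel: bool) -> str:
    """Return the letter written before a t-prefixed verb form ('' for none)."""
    if not elides_ta_vowel:
        # The rule only concerns varieties that elide the /ta-/ vowel.
        return ""
    # A prosthetic vowel before the new cluster is written as a hamza-less alif;
    # if the cluster is allowed as is, nothing is added.
    return "ا" if adds_prosthetic_vowel else ""
```

For instance, a variety that allows the initial cluster as is gets no alif (`t_prefix_initial_letter(True, False)` returns an empty string), while one that adds a prosthetic vowel gets a bare ا.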

Transcription
As a rule of thumb, IPA transcription enumerates all recognizably distinct pronunciations, whereas Romanization aims to represent all possible pronunciations using as few transliterations as possible (by way of polyphony), and preferably only one transliteration.

All manners of transcription

 * [DISCUSS] and  should be transcribed as "VCC" rather than as "VVC". Morphologically they're VVC, as is plainly proven by a number of things[1], but on the surface they're pronounced and analyzed as VCC. Proof/notes:
 * The Arabic suffix has shifted in Egyptian Arabic into something resembling . One would expect  were the original sequence analyzed as VVC.
 * Urban North Levantine varieties show something that very much appears to be a distinction between the two: historic and  undergo h-elision into  and, which aren't intuitively interpretable as  and , whereas historic  and  remain  and . This gives, for example,  vs. . The other explanation would be that there's a hiatus there, but hiatus has always been verboten in Arabic to my knowledge — and native speakers in Lebanon consistently Romanize these as  and , rarely as  and.
 * Lastly, writing these patterns in this way also lets us match both Wiktionary's current MSA-Romanization scheme and the Arabic-script convention of writing a shadda on the semivowel consonant.
 * [1] ...the "number of things" being:
 * Dialectal " pl. ", or " pl. ", or " pl. ", etc. — the long ā in the plural is necessarily derived from a corresponding long vowel in the singular
 * The fact that can never be stressed in a heavy syllable in Lebanese Arabic, yet the word  has no issue with the first syllable being stressed, indicating that underlyingly it's not  but
 * We DO NOT represent word-final orthographic long vowels as phonemic (or phonetic) long vowels. They are phonetically pronounced and phonemically analyzed as short vowels — heck, they're even pronounced short in MSA, but at least declining to represent MSA pronunciation has a basis in diachronics. There is zero reason to do the same for contemporary Arabic.
 * The exception is in single-syllable words such as, all of which are monosyllabic in various lects and thus free to alternate between having a long and a short final vowel.
 * We always double underlyingly geminate consonants, which some may not double when the gemination isn't obvious in speech. and  for Levantine "2pl want", for example, not  and . This also goes for morphologically-doubled word-final consonants.
 * If, in some variety, there appears to be a word-initial vowel that mysteriously isn't preceded by a glottal stop when pronounced, then it really is a semivowel and we write it as such. (Currently, the Gulf Arabic entry for is in violation of this: the imperfective is spelled ikubb rather than ykubb.)
 * Stress can still be determined automatically if a long vowel is in a word's final three syllables or if a word is monosyllabic. However, we represent stress explicitly using an accent in the absence of these conditions, as it can be unpredictable otherwise. (Current Romanizations on Egyptian Arabic entries represent stress unconditionally, which looks cluttered — especially when the accent stacks on top of a vowel-length macron — and should not be necessary.)
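
As a rough sketch of the stress rule in the last point, the check for whether a Romanization needs an explicit accent could look like the following. It assumes the word has already been split into syllables (syllabification itself is not handled) and that long vowels are written with a macron; the function names and the sample syllables are made up for illustration.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def has_long_vowel(syllable: str) -> bool:
    """True if the syllable contains a macron-marked (long) vowel."""
    # NFD splits precomposed letters such as "ā" into "a" + combining macron,
    # so both precomposed and combining spellings are detected.
    return COMBINING_MACRON in unicodedata.normalize("NFD", syllable)


def needs_stress_accent(syllables: list[str]) -> bool:
    """True if stress must be marked explicitly under the rule above."""
    if len(syllables) <= 1:
        # Monosyllables (and empty input) never need an accent.
        return False
    # Stress is predictable when a long vowel sits in the final three syllables.
    return not any(has_long_vowel(s) for s in syllables[-3:])
```

Under these assumptions, `needs_stress_accent(["ki", "tāb"])` is false because of the long vowel, while `needs_stress_accent(["ka", "tab"])` is true, so only the latter would carry an accent.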

IPA only

 * Again, IPA transcriptions enumerate all common and phonemically distinct variants within the given dialect or dialect group, where "common" is probably best defined as "recognizably present either in more than one country or all throughout a single country". This does leave out affectations like Lebanese rounding of post-emphatic, which (although recognizable) is a minority phenomenon in the big picture.

Romanization only
Our Romanization scheme is based on Wiktionary's current MSA Romanization, with the following exceptions and notes.


 * [DISCUSS] Use IPA, not their Hans Wehr equivalents. Not only are the apostrophic Hans Wehr marks hard to see and to distinguish from one another, but their use also feeds the West-centric misconception that the glottal stop and the pharyngeal approximant/fricative are "not really consonant sounds" and do not deserve their own letters. Besides,   are derived from the same apostrophes that Hans Wehr uses.
 * If this is implemented, we can actually retain and have it serve a good purpose: representing a word-initial glottal stop that is elidible, but only in some of the varieties that the transliteration concerns. For example, some (but not all) North Levantine speakers have pronouns like  rather than, but because both variants exist, "wʾana" could be used to bridge the gap. It would replace the clunkier "wʔana or wana".
 * [DISCUSS] On this topic, something needs to be done about Hans Wehr . Maybe. It's bad enough in MSA because it suggests a false pronunciation, but it's even worse now because there's certainly a phonemic contrast between and ; there may even be varieties with minimal pairs for the two! However, introducing an unnecessary orthographic contrast violates the "as few transliterations as possible" goal of Romanization, so it's best to discuss what exactly is warranted here. If distinguishing the two is a good idea, then the standard solution is presumably to use U+0323 COMBINING DOT BELOW (◌̣) to create, but that's quite ugly and likely to render inconsistently across systems, so this will need to be discussed as well. (It's currently used in the Gulf entry for , if we want to see it in action.) A better workaround may be  (combining macron below) or  (combining dot above).
 * [DISCUSS] In pursuit of transliteration-homogenization, original Arabic * and * should be represented using and, because whether or not they monophthongize varies wildly depending on region and speaker. This way,  can be read as any of , , ,  without needing to write all four possibilities explicitly.
 * If this is implemented, then and  can contrastingly be used for the same monophthongs when they don't come from historic diphthongs (e.g., not ), and specifically when said monophthongs can't diphthongize under any conditions whatsoever in any variety. (For example, the  in the loanword بنطلون "pants" (compare English pantaloons) can diphthongize in Lebanon, despite not being from a historic diphthong, and so the word might ideally be transliterated  rather than .)
 * An imaala'd (or otherwise affected) alif is still an alif, and the presence of affectations like raising can vary greatly within any given region. We therefore represent it using invariably.
 * Short vowels:
 * Certain varieties, particularly Saudi and Bedouin lects, have a phonemic distinct from  and typically corresponding to a damma. We represent it using a  in Romanization, too. Similarly, some varieties like Moroccan Arabic have a phonemic contrast between  and, and so we use the same characters appropriately in Romanization.
 * Otherwise, we use for all kasras and  for all dammas.
 * [DISCUSS] Current Gulf Arabic entries use IPA in Romanization for a kasra. Should this be kept...? Or replaced with  as below?
 * [DISCUSS] Levantine varieties have many, many words where the vowel can and often does alternate between with no semantic effect. Perhaps find a single symbol to transliterate this vowel with, e.g. a dotless.
 * Use for the epenthetic high vowel some varieties use to break up consonant clusters, not . However, this should be used sparingly. Some potentially acceptable contexts for it...
 * when the epenthetic is stressed as in Gulf Arabic (see the "ghəwa" phenomenon, although it's best modeled here by verbs such as يشربه jəšerba(h): insertion of an epenthetic vowel followed by a stress shift to it)
 * in usage-example transliterations, to preserve the flow of a sentence (e.g. )
 * as a syllabification aid? (e.g. ; the first suggests a syllabic /s/)
 * At the end of a word, use and  for the corresponding historic long vowels. Use  for the 3sg.m suffix in varieties where it's pronounced as such.
 * [DISCUSS] The feminine suffix, historically *, varies in its modern pronunciation between and  depending on lect and phonological environment. Should a word ending in it be simply transcribed as "[...]a or [...]e"? (That seems to run counter to the goal of flattening out variation here.) Or should a single character be used to represent it, and if so, what character?  looks handy.
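
To illustrate the [DISCUSS] proposal above of keeping ʾ for an elidible word-initial glottal stop, here is a minimal sketch that expands one such polyphonic transliteration into the explicit readings it stands for. The function name is hypothetical, and it treats every ʾ in the string as elidible together, which is a simplification.

```python
ELIDIBLE_HAMZA = "\u02be"  # ʾ, the retained Hans Wehr mark
GLOTTAL_STOP = "\u0294"    # ʔ, the IPA glottal stop


def expand_elidible_hamza(translit: str) -> list[str]:
    """Return the explicit readings covered by one transliteration."""
    if ELIDIBLE_HAMZA not in translit:
        # Nothing elidible: the transliteration already has a single reading.
        return [translit]
    pronounced = translit.replace(ELIDIBLE_HAMZA, GLOTTAL_STOP)
    elided = translit.replace(ELIDIBLE_HAMZA, "")
    return [pronounced, elided]
```

With the example from the text, `expand_elidible_hamza("wʾana")` yields the two readings "wʔana" and "wana", which is exactly the pair the single spelling is meant to replace.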

Dialect specifics
Ideally, contributors experienced in individual dialects will expand this section over time, since each dialect warrants more specific considerations than the generalities above.

Egyptian Arabic

 * The conventional Egyptian spelling of Arabic ـة is ـه, so main entries in Egyptian Arabic go under a title with ـه. There can then be a separate page titled with the ـة spelling, containing an Egyptian Arabic definition that uses the "alternative form of" template. (We also follow the first part when determining Arabic spelling in quotes, usage examples, etc.)
 * Ditto for when it represents.
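
The titling convention in the first point can be sketched as follows. This is a hypothetical helper, not an existing tool: it assumes an unvocalized spelling (no diacritics after the final letter) and only handles the ـة case, and the sample word in the comment is purely illustrative.

```python
TA_MARBUTA = "\u0629"  # ة
HA = "\u0647"          # ه


def egyptian_entry_titles(spelling: str):
    """Return (main_title, alternative_title) for an Egyptian Arabic entry.

    alternative_title is None when the spelling does not end in ta marbuta.
    """
    if spelling.endswith(TA_MARBUTA):
        # Main entry goes under the ha spelling; the ta-marbuta spelling
        # becomes the separate "alternative form" page.
        return spelling[:-1] + HA, spelling
    return spelling, None


# Illustrative input only: مدرسة -> main title مدرسه, alternative page مدرسة
main_title, alt_title = egyptian_entry_titles("مدرسة")
```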

Levantine Arabic

 * The conventional Levantine spelling of Arabic ـة is not ـه. This means that Levantine entries and writings on Wiktionary do not use ـه when representing Arabic ـة.
 * Additionally, Levantine ـة is typically pronounced as when imaala applies, not as . Therefore, we never, ever transcribe or Romanize it using, particularly to avoid confusion in transliteration with word-final.
 * However, North Levantine is one of the varieties, mentioned above, where it is customary to spell the 3sg.m object suffix as ـو rather than as ـه. We therefore spell it ـو for such entries on Wiktionary.
 * Following are Levantine's "word-initial hamza" rules, which are descriptions of pronunciation that we refer to when spelling (in Arabic) and transcribing (in Romanization or IPA).
 * Imperative Form I verbs invariably start with a glottal stop when constructed on . If instead constructed on, there is never a glottal stop.
 * The past tense and imperative of any verb whose perfective starts with a kasra (namely, verbs of Form VII and higher) do not start with a glottal stop. Instead, they start with a two-consonant cluster. For example,.
 * However, the verbal nouns of such verbs do start with a glottal stop. For example,, which differs from Modern Standard Arabic . (The first-person present subjunctive does as well, of course.)
 * Any other word, if it sounds like it starts with a glottal stop, always does. This includes pronouns such as and.
 * The of the word for "what" can sometimes cause elision of a following  in common collocations, such as  and . However, this is a unique case of elision, and it's not reason enough to write the base words as, for example, اسم and especially اخبار. (What is reason enough to write اسم that way is its history, but that's another story.)