User:Dan Polansky/Compound

This properly sourced article is about compounds in linguistics.

Definition
Compounds are easy to define approximately as words made from words. They are hard to define exactly, especially since it is difficult to distinguish compounds from phrases.

The following definitions of "compound" can be found:
 * (1) A word composed of multiple words.
 * (2) A word composed of multiple independent words or combining forms of words.
 * (3) A word composed of multiple free morphemes.
 * (4) A lexeme composed of multiple stems.
 * (5) A noun, adjective or verb composed of multiple words or parts of words.
 * (6) A sequence of multiple words that act as a single word.
 * (7) A word or word sequence consisting of multiple parts that captures a specific concept, whether the parts are words or affixes.

The definitions have different implications:
 * The definitions clearly requiring compounds to be words are (1), (2) and (3).
 * Definitions (2) and (4) are a technical refinement of (1). "lexeme" is a fancy synonym of "word". The use of "stem" or "combining form" is required e.g. for Czech, where slunovrat is based on slunce and vrátit, composed of slun- and vrat-, which are word stems, not words.
 * Definition (5) seems broken: "noun, adjective or verb" has to cover typographically multiple words to cover English open compounds, but if it does, then it covers phrases like "cat sitting on the mat", not a compound.
 * Definition (6) is unclear: it is not clear what it means for something to "act as a single word". By saying "act", it allows compounds to be not single words.
 * Definition (7) requires compounds to capture a specific concept, which seems to suggest that compounds are not sum of parts. This cannot be so: many German compounds are sum of parts. Furthermore, it includes affixing under compounding; this makes sense for inflected languages: vysokoškolský is a compound but requires suffix -ský to be formed. A requirement for a compound proper, in contrast to lesní, could be that at least two of its parts are words. The case of vysokoškolský shows that definitions (1), (2), (3), (4), (5) and (6) are cross-linguistically inadequate: they work for English since hyphenated adjectival compounds take no suffix.

Demarcation
Compounds need to be distinguished from the following:
 * Affixed words, e.g. blueness. There may be ambiguity: is German "aufholen" made from prefix "auf-" or word "auf"? Is English overcome made from prefix "over-" or word "over"? Furthermore, a Merriam-Webster compound guide includes affixing under compounding. Another source indicates the distinction between compounding and affixing has been treated as problematic in the literature.
 * Free non-compound phrases, e.g. green house (house that is green) or cat that is on the mat. The phrase school bus traffic stop laws looks to some like a compound, but credentialed sources usually do not give such an example.
 * Proverbs, e.g. all roads lead to Rome
 * Phrasal verbs. Some non-credentialed sources give phrasal verbs "carry over" and "break up" as example compounds. However, credentialed sources usually do not give such examples. On the other hand, when English phrasal verbs are considered to be single words, they meet the definitions of compounds. Still, sources usually do not define English phrasal verbs as words but rather as phrases.

Part of speech
A compound's part of speech can be noun, adjective or verb. Examples are "bus stop", "self-centered" and "windsurf".

Detection criteria or tests
Compounds written with spaces present a special problem for detection.

The Cambridge Grammar of the English Language (CGEL) mentions stress, orthography, meaning, and productivity as playing a role in distinguishing compounds from non-compound phrases. CGEL calls the non-compound noun phrases "composite nominals". A further test is "coordination and modification": parts of non-compounds can "enter separately into relations of coordination and modification".

Abdel Rahman Mitib Altakhaineh lists orthography, stress, modification, compositionality, displacement, insertion, referentiality, coordination, replacement of the second element by a pro-form, ellipsis, and inflection and linking element as tests.

Livio Gaeta & Davide Ricca consider compounds to be morphological objects, independent of their lexical status.

Wordhood
Since multiple sources define compounds as words, being a word is a criterion. However, a distinction needs to be made between "orthographic word", "phonological word" and other notions of word. Compounds are not necessarily orthographic words, as shown e.g. by the two-word compound "high school". The term "lexical item" is broader than most notions of word, since it includes proverbs. Another notion is "morphological word". While wordhood is usually a requirement, it is not a simple test but rather depends on a multitude of simpler tests.

Cross-linguistic uniformity or universality of notion
It may be difficult to arrive at a single universal cross-linguistic set of operational tests of compoundhood. "What can be said more generally for the different languages and language families of Europe is that the potential correspondence between compounds and MWEs cannot be described in a uniform way, since it is multifaceted and manifests itself in very different ways."

From the same source in more detail: "The first, very simple observation is that all languages examined here have morphological compounds. However, it turned out that the compounds in these languages do not all share the same defining properties. While lexical (compound) stress, headedness (either right or left), inseparability and debarment of word-internal inflection, recursiveness, and linking elements are generally considered essential criteria for the definition of compound, in particular from a German(ic) perspective, all of them also emerged as problematic in at least one language, or as non-existent. Thus, it seems that there is no universal definition of compound. Rather, as pointed out by Ralli (2013b: 184): 'What makes a compound morphological should be defined on a language-specific basis, since languages vary with respect to the realization of their morphological features and the use of morphologically-proper units.'"

Unity among different linguists within the same language
Even within a single language, different treatments of compounds can be found in the literature, resulting in different classifications of candidate compounds as true compounds or not.

French is a language for which some linguists count multi-word phrases such as pomme de terre as compounds. Some linguists go so far as to claim French has no true compounding at all.

German does not seem to suffer from this kind of disunity.

The Italian linguistic tradition is divided over constructions such as zuppa di verdure.

Spanish is another language with varying treatment: multi-word phrases león marino and paquete bomba were regarded as compounds by some but not others.

Spelling or orthography
Words written solid or hyphenated are easier to recognize as compounds. Word sequences written with spaces present a problem: not every such sequence is a compound. For instance, "cat that is on the mat" is not a compound, whereas "high school" is a compound. Britannica's article on compounding gives no example of an open compound, implying it does not consider open compounds to be compounds.

Spelling tests work well for some languages:
 * For German, all compounds are written without spaces, and writing them with spaces is a rare error.
 * In Czech and Slovak, all compounds are spelled as one word, while syntactic phrases are spelled as separate words.
 * Finnish: "As a general rule, Finnish compounds are written without space between the constituents"
 * Greek: "Greek compounds display solid spelling, contrary to phrases, just as in German."
 * In Polish, most compounds are spelled as one word without a hyphen, but there are exceptions such as Bośnia-Hercegowina and czarno-biały.
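The spelling test can be sketched as a simple classifier of the written form. This is only an illustration: the function name is hypothetical and comes from none of the cited sources, and the test looks at nothing but spaces and hyphens.

```python
def orthographic_form(term: str) -> str:
    """Classify a candidate term by spelling alone.

    Returns "open" (contains a space), "hyphenated" or "closed" (written solid).
    Hypothetical helper for illustration; it knows nothing about morphology.
    """
    term = term.strip()
    if " " in term:
        return "open"
    if "-" in term:
        return "hyphenated"
    return "closed"


for candidate in ["Kürbissuppe", "czarno-biały", "high school", "cat that is on the mat"]:
    print(candidate, "->", orthographic_form(candidate))
```

For a language such as German, an "open" result speaks against compoundhood. For English it decides nothing: "high school" and "cat that is on the mat" both come out as "open", which is exactly the problem described above.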

Morphology
"Compounds are the output of morphology, while MWEs [multi-word expressions] are the output of syntax. [...] The property of being morphological implies that an item is the output of some morphological schema or rule, which is different from a syntactic schema or rule."

"in contrast to German it seems much more difficult to provide clear criteria for morphological compounds as opposed to MWEs in French, Spanish, and Italian."

Phonology
English open compounds have a distinctive phonology. Britannica distinguishes compounds from word groups or phrases by "stress, juncture, or vowel quality or by a combination of these".

In Romance languages, "compounds and MWEs are basically stressed in the same way".

Meaning and sum of parts
Some sources indicate compounds are not sum of parts. A Czech encyclopedia says compounds usually have a meaning different from the base words. However, being more than a sum of parts is not a necessary condition: German compounds such as Tanzschule and Zirkusschule are often sum of parts in that their meaning can be obtained from the meaning of the component words. English bookshop is sum of parts as well. Moreover, it is not a sufficient condition either: idiomatic proverbs are not compounds.

Separate inflection
Consisting of separately inflected parts is one test of non-compoundhood for highly inflected languages. It works only for some of them:
 * Czech, Slovak
 * German
 * Polish
 * Danish, Swedish

The test has no value for English and Chinese.

A plausible objection was that some Latin compounds have separately inflected parts. Some other languages:
 * In Spanish, some items considered compounds show separate inflection of parts.
 * In Icelandic, there is compound-internal inflection.

Even if it is not a universal cross-linguistic test, its utility for languages to which it applies is confirmed by sources.

Linking element
Presence of a linking element may indicate compoundhood in some languages. Thus, in German, Liebesbrief contains s. However, this is not a necessary condition in German, as Konzertreise shows.

"(Native) linking elements, [...], do not exist in French and Italian."
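For the German examples Liebesbrief and Konzertreise, the role of the linking element can be sketched as follows. This is a minimal illustration under stated assumptions: the four-word lexicon and the function name are hypothetical, and only the linking element -s- is considered.

```python
# Toy lexicon (an assumption for illustration, not a real word list).
LEXICON = {"liebe", "brief", "konzert", "reise"}
# Candidate linking elements; "" stands for "no linking element".
LINKERS = ("", "s")


def split_with_linker(compound, lexicon=LEXICON, linkers=LINKERS):
    """Return all (first, linker, second) analyses of a two-part compound."""
    word = compound.lower()
    analyses = []
    for i in range(1, len(word)):
        first, rest = word[:i], word[i:]
        if first not in lexicon:
            continue
        for linker in linkers:
            second = rest[len(linker):]
            if rest.startswith(linker) and second in lexicon:
                analyses.append((first, linker, second))
    return analyses


print(split_with_linker("Liebesbrief"))   # analysis with linking element "s"
print(split_with_linker("Konzertreise"))  # analysis with no linking element
```

Both words receive an analysis, one with the linker and one without, which matches the point that a linking element may indicate compoundhood but is not required for it.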

Norms and prescriptions
Some sources for some languages prescribe compounds to be written without spaces. The prescriptions become descriptions in so far as language users align with them. Some cases:
 * Dutch: "Dutch orthography requires compounds to be written without an internal space."
 * German: "Die Wörter Kürbissuppe, Zwiebelkuchen und Hairstudio werden nach deutschen Wortbildungsregeln zusammengeschrieben."

Examples in various languages
Example compounds in various languages:
 * Ancient Greek: dermatology, democracy, pyromania, rhododendron, that is, δερματολογία, δημοκρατία, πυρομανία, ῥοδόδενδρον
 * Chinese: 大褂儿
 * Czech: zeměpis, olejomalba, vysokoškolský
 * Danish: fyrværkerigrund, bankrådgivning, kulturkløft
 * Dutch: jonggetrouwd, tandextractie, boerenzóon, koningszoon
 * English: rowboat, high school, devil-may-care, crime-prone, grass-green, sky-blue, air-quote, dry-burn
 * Estonian: lutipudel, riisipuder, noortööline
 * Finnish: lentokoneonnettomuus, kesäyö, märkäpuku, metsäyhtiö
 * French: timbre-poste, essuie-glace
 * German: Kürbissuppe, Zwiebelkuchen, Hairstudio, Handelsvertrag, Affenhaus, Frischluft
 * Greek: χαρτόκουτο, κεφαλόσκαλο, εθιμοτυπικός, κρυφοκοιτάζω
 * Hebrew: beyt sefer (בית ספר)
 * Hungarian: városháza, kisautó, kőkemény, tojásfehérje
 * Icelandic: gufubátur, Norðausturatlantshafsfiskveiðinefndin
 * Italian: pescecane, cavatappi, criminologo, trasporto latte, poeta pittore
 * Latin: aequilibrium, multilateralis, carnivora
 * Polish: czerwono-czarni, listopad, językoznawstwo, czcigodny, zmartwychwstały, drobnoustrój
 * Russian: glubokomyslie, lesostep, zvukorežisser, senouborka
 * Sanskrit: rājapūruṣāḥ, rāmakṛṣṇau
 * Slovak: svetonázor
 * Spanish: coliflor, coche cama, bocacalle, telaraña
 * Swedish: livbåt, livbåtsbesättning, flickebarn, människokärlek

Non-compound phrase examples
The following items are non-compound phrases. Whether they are more than sum of parts should not matter, as argued in section Meaning and sum of parts.
 * Danish: røget laks, stor begivenhed
 * Dutch: rode wijn, rijk versierd, koffie zetten
 * English: piece of cake, dry cough, grass slug, hit the road, green card (card that is green), heavy smoker, kick the bucket
 * German: weich wie Butter, schwarzer Tee, rotes Kraut, Spanisches Rohr, kalter Krieg
 * Greek: psixrós pólemos, zóni asfalías
 * Polish: kontrola jakości, karma dla zwierząt, numer telefonu, pasta do zębów
 * Russian: novaja kniga, myľnaja opera, sredstva massovoj informacii
 * Swedish: röda hund, hög hatt, ymnig grönska, duka bordet

Long compounds
Some languages tend to form long compounds, consisting of 3 or more word bases. Some examples:
 * German: Aufmerksamkeitsdefizitsyndrom, Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
 * Finnish: aasiantupsuhäntäpiikkisika
 * Hungarian: ezerkilencszázkilencvenkilenc, jövedelemegyenlőtlenség, kompromisszumképtelenség

Lists of compounds
A fairly extensive list of example English compounds is given in a non-native bachelor thesis written in English, sourced from English sources.

Very long lists of compounds are available in Wiktionary categories such as Category:German compound terms. However, these are unreliable and subject to miscategorization.

Long compounds can be found in syllable-count categories such as Category:Finnish 11-syllable words, Category:German 9-syllable words, Category:Polish 9-syllable words and Category:Russian 11-syllable words. Not all of the members need to be compounds.

Neo-classical compounds
Some sources classify the likes of historiography, chromatography and immunological as "neo-classical compounds". They are defined as "words consisting of two or more free morphemes (of Latin or Ancient Greek) which are bound, not free, in the modern language concerned, such as English biology."

Machine translation
Translating closed compounds (those written solid, with no spaces or hyphens) is a relevant problem for machine translation from languages forming long compounds such as German. These languages form a huge number of transparent long closed compounds, for which it is impractical to maintain a translation dictionary. While breaking these compounds up into components is fairly easy for humans, it is non-trivial for machines. A sum-of-parts translation consists in breaking the compound into components and translating the components separately. An example of ambiguity is German "vereinbart", which is properly analyzed as a participle of "vereinbaren", but a machine could analyze it as Verein + Bart. (However, even the machine could note that vereinbart is not capitalized and that it is therefore not a noun. Still, the principle remains.)
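The ambiguity can be reproduced with a naive dictionary-based splitter. The sketch below is an illustration only: the three-entry lexicon, the part-of-speech labels and both function names are hypothetical, and the capitalization filter additionally assumes that the last component determines the part of speech of a German compound. It does not describe how any actual translation system works.

```python
# Toy lexicon with part-of-speech labels (an assumption for illustration).
TOY_LEXICON = {"verein": "noun", "bart": "noun", "vereinbart": "participle"}


def segmentations(word, lexicon):
    """Return every way to cut `word` into one or more lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for tail in segmentations(word[i:], lexicon):
                results.append([head] + tail)
    return results


def filter_by_capitalization(token, candidates, lexicon):
    """Drop noun analyses for lowercase tokens (German nouns are capitalized).

    Assumes the final part of an analysis decides its part of speech.
    Sentence-initial capitalization is ignored in this sketch.
    """
    if token[:1].isupper():
        return candidates
    return [c for c in candidates if lexicon[c[-1]] != "noun"]


candidates = segmentations("vereinbart", TOY_LEXICON)
print(candidates)  # both the Verein + Bart split and the unsplit participle
print(filter_by_capitalization("vereinbart", candidates, TOY_LEXICON))
```

Without the filter the splitter returns both analyses; with it, only the participle survives for the lowercase token. A capitalized token, for instance at the start of a sentence, would keep both, so the ambiguity remains in principle.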

Compound term
The phrase "compound term" can be found in reference to compounds in linguistics, but seems rare. One user of the term is Dimković-Telebaković, who includes "vertical take-off and landing aircraft" as an example, which would not be considered to be an English compound by many linguists.

What makes compounds interesting
What follows are personal remarks.

Czech and German compounds are interesting morphologically: they show that N+N and A+N can be combined to produce a single morphological word with the first part uninflected, often with the use of a linking element (-o- in Czech, -s- in German). Also interesting is how often word stems are used to produce the result: there is no word drap but there is mrakodrap from mrak and dráp-at. There is nothing interesting about syntactically produced A+N phrases; in Czech, there are no N+N phrases. German is interesting in the massive productiveness of its compounding. If compounding were to include syntactic composition, there would be nothing notable about its productiveness: syntactic composition is naturally hugely productive, producing short and long phrases, clauses and complete sentences. It would be meaningless to say that compounding in Czech is much less productive than in German.

In English, compounds are much less interesting. The closed and hyphenated ones are of note. What is also of note is the possibility of N+N phrases, that is, that a noun can modify a noun. Also interesting are the hyphenated attributive forms of noun phrases acting as adjectives. There are no morphological markers of compounds (no lack-of-inflection test, no linking element), just orthographic, phonological and some others. Neo-classical compounds often use the interfix -o- inherited from Ancient Greek, but these are not classified as compounds in Wiktionary. It would be possible to decide to exclude open compounds from the category of English compounds, to make it more interesting, in keeping with Britannica. However, this would be at odds with treatment in many linguistic sources.

Having the category of compounds dominated by the content of the category of multi-word terms is useless for many languages: we can look at the category of multi-word terms and see approximately the same content instead. What is interesting is the true morphological compounding.