Talk:open compound

Demarcation of the notion
The notion of an open compound is confusing. What, in English, counts as an open compound?
 * If a compound is defined as a word, how can open compounds be compounds? They are not words, typographically speaking. Are they words, phonologically speaking?
 * A Google paper definition is this: "A compound is a lexeme that consists of more than one stem. Informally, a compound is a combination of two or more words that function as a single unit of meaning." Here we see the requirement of "single unit of meaning", but only in the second informal definition. This requirement may be implied in the "lexeme" part of the formal definition.
 * Let's suppose that school bus counts, although its being a single unit of meaning can be questioned.
 * Does school bus traffic stop laws count? It is sum of parts and thereby fails the single-unit-of-meaning criterion.
 * Does any adjective-noun phrase count, sum of parts or not? Seems counter-intuitive to me. If all such phrases count as compounds, is not the notion of a compound too diluted?
 * vocabulary.com gives full moon as an example of an open compound, supporting adjective-noun open compounds.
 * If "black hole" is a single unit of meaning, it is a compound by Google's definition.
 * Since "green leaf" is not a single unit of meaning (it is sum of parts), it would not be a compound. But something being a compound depending on semantics and not on syntax seems weird to me.
 * Czech "černá díra" (black hole) is a single unit of meaning, and it would be a compound by Google's definition, but that is rejected by Czech linguistics.
 * Does any noun-noun phrase count, sum of parts or not?
 * Closed compound sunshine is a compound despite not acting as a single unit of meaning; sun shine would be sum of parts.
 * coalmine is a compound, but is coal mine a compound despite being sum of parts?
 * This throws doubt on the single-unit-of-meaning non-syntactic criterion.
 * Do phrasal verbs such as look up count? Our example hang out suggests so. Does that not make the notion too diluted? yourdictionary.com seems to think so by giving carry over as an example. And a phrasal verb meets Google's definition of a compound, by acting as a single unit of meaning.

In Czech, černá díra (black hole) is not a compound, while černokněžník is. The latter is no open compound. I would argue there are no open compounds in Czech. A Czech source is czechency.org:Kompozitum, and is consistent with this interpretation. This contradicts Google's semantic non-syntactic definition.

In German, I don't think schwarzes Loch counts as a compound (Kompositum). I would argue there are no open compounds in German. German, being a Germanic language like English, provides an interesting window on English: English linguistics possibly allows the notion of an open compound since it corresponds to the German notion of closed compound to a large extent. Our Category:German compound terms is currently mostly consistent with this interpretation but for exceptions such as Anmaßung der Autorenschaft, which is arguably a misclassification. A German source is Duden:Komposition: Zusammenschreibung, Getrenntschreibung, Bindestrich, and is consistent with this interpretation. This contradicts Google's semantic non-syntactic definition. See also Leerzeichen in Komposita, which is sourced and says that compounds are written solid. The German article covers solid writing of compounds in some other languages as well.

As for Finnish, I don't know it, so I cannot access Finnish sources. However, our translation table says "but note that Finnish has no open compounds". Finnish, like German, is noted for its long closed compounds. Leerzeichen in Komposita says "werden Komposita überwiegend zusammengeschrieben" ("compounds are predominantly written solid"), without source.

Input for other languages with sources would be welcome.

--Dan Polansky (talk) 09:40, 28 September 2022 (UTC)
 * Expanded. --Dan Polansky (talk) 15:03, 1 October 2022 (UTC)
 * @Dan Polansky Stop making unilateral changes without consensus. Theknightwho (talk) 16:13, 1 October 2022 (UTC)
 * You are referring to my fix in schwarzes Loch and in other German entries. You have no idea what you are talking about; check the provided sources. --Dan Polansky (talk) 16:17, 1 October 2022 (UTC)
 * @Dan Polansky I said you should stop making unilateral changes without consensus, which holds weight irrespective of the material facts. Your continuous arrogance is antithetical to the entire project. Theknightwho (talk) 16:19, 1 October 2022 (UTC)
 * Making obviously necessary corrections based on solid understanding of sources does not require a previous discussion. This is just ignorant harassment. --Dan Polansky (talk) 16:25, 1 October 2022 (UTC)
 * No, it isn't. It's holding you to the standards that you are expected to abide by, which (unfortunately for you) means we can't assume your opinion is automatically correct. The fact you call that harassment is galling. Theknightwho (talk) 16:27, 1 October 2022 (UTC)

I am not a linguistic scientist, and I hesitate to even comment here at all given the existing rancor, but I will just add some thoughts that are definitely relevant to the ontology of what an open compound is. (1) One cannot credibly allege that the open, hyphenated, and solid alternative orthographic forms of terms such as the following constitute fundamentally different phenomena; it is obvious that they are a unitary phenomenon whose orthographic representation solidifies over time: street car/street-car/streetcar, coal mine/coal-mine/coalmine, motor car/motor-car/motorcar, and countless others. Multiple canonical usage guides agree that such compounds tend to solidify over time (orthographically); for example, Bernstein 1965 says it, as I recall. One thing that is obvious about many of them is that they may begin as mere noun phrases of collocational type but they become something more than that, and it is clearly the same kind of thing (i.e., morphologic phenomenon) that [modern, standard] German mandates solidification for although [modern, standard] English does not. In other words, the two languages clearly apply different orthographic rules to the same underlying morphologic phenomena. English indisputably makes a special case of unisyllabic components, though, as multisyllabic components are usually forced to remain open, not solid. This is why terms like motor car and coal mine often tend toward being motorcar and coalmine over time but shoe factory and motor vehicle are never allowed to become *shoefactory and *motorvehicle in [standard] English [unlike in German]; but that doesn't make them a different morphologic phenomenon from the aforementioned ones; rather, it is an artifactual difference of [standard] orthography (regarding conceptualization) to represent the same underlying morphologic engine under the hood.
And when one reads 19th-century American literature, one encounters hyphenated compound noun stylings that today would be either open or solid (depending on the term), which illustrates that English speakers were groping their way toward orthographic standardization on a spectrum of completeness (from less unified to more unified). In fact I'm pretty sure I remember reading a discussion about Webster's Second versus Webster's Third that talked about how that class of hyphenation was dialed back for the newer edition, to reflect how standard had been evolving since the earlier edition was edited. Anyhow, the last points I wanted to add here for now were these: (1) Anyone who misapprehends that coalmine and streetcar and icepick are compound nouns but coal mine and street car and ice pick are not is more misguided than they think themselves to be, and I don't mean that as an attack at all, just an objective fact that happens to be an unknown unknown to them. (2) The unconscious minds of speakers of natural languages know rules that their conscious minds do not know, and that is largely what linguistic science is all about: uncovering those and converting them to consciously formulated statements. It is why we all can speak fluently in our native languages despite not being linguists and not knowing linguistics. This phenomenon is also touched on at the tail end of this thread. And conceptualization is a [surprisingly] challenging matter of both morphology (independent of orthography, although most non-linguists do not realize that, and can hardly even believe it when told) and orthography (which is more [inherently] arbitrary than many people realize). Regards, Quercus solaris (talk) 04:27, 2 October 2022 (UTC)
 * The above implies that what makes a compound in English is morphology. But if we believe the source that says that green house (house that is green) is not a compound while greenhouse is, that cannot be so: both items have the same morphology. Other than that, the above mostly reaffirms the existence of open compounds without advancing distinguishing criteria. The questions I raised are left without an answer. No online source is quoted or linked; only a single offline source is mentioned. The question whether and why school bus traffic stop laws is a compound is left unanswered. The shifts in spelling may accompany shifts in phonology, and that may matter; the claim that street car was necessarily a compound before any shifts in spelling or phonology happened is left without proof or source. The claim that the morphology of English and German compounding is fundamentally the same underneath is left without proof and is not plausible: German has a sharp contrast between schwarzes Loch and Schwarzwald, which the putative spelling Schwarz Wald (sometimes found in English sources) would not change: if the first part does not inflect, the whole is plausibly a compound. The German demarcation is clear without knowing the spelling; the English one isn't. Word boundary in Czech and German is largely unproblematic and no source was provided to say otherwise. At least, we seem to agree that compounds are words and that this is a key distinguishing criterion. --Dan Polansky (talk) 07:05, 2 October 2022 (UTC)


 * Yes, I do not disagree about certain things that you point out or mention. But not all of them. It is quite right that green house (semantic notion: a house that is green in color) is not a compound noun, and it is not the same thing as a greenhouse. But that illustrates that one of the things that is going on here is what linguists call zero. There are various kinds of zero (e.g., zero inflection, zero copula, zero article, others). The underlying difference between green house and greenhouse and the difference between ice pick and icepick are not the same difference even though their orthographic difference is the same. There are underlying factors being reduced to zero (no visible sign), but they still exist. This is also analogous to how homography happens and to how lossiness in data compression happens. But the zero does not negate the existence of different kinds of underlying differences whose differences from each other are masked/latent/hard to see plainly because not signaled. As the Wikipedia article that concerns notes, there are various factors involved in (and linguistic tests for) where word boundaries will be declared, and phonemics and semantics and syntax and morphology and orthography can be involved in multivariate ways; and furthermore, I will add, the etic-versus-emic difference is operative too. (Aside: You point out the difference between online and offline references and seem to imply (by doing so) that the latter are inherently inferior, but that assertion, if it was meant, is unsound.) You said "The above implies that what makes a compound in English is morphology", but no, that is not wholly what the above is saying.
It is saying that semantics and morphology and orthography are mixed together in complex ways, both in constituting the underlying identity/definition of compound nouns and in reacting to it (which orthography does to some degree), and it can change over time and it can also vary across the minds of speakers simultaneously in ways that are latent and do not prevent successful communication. There is a mistaken interpretation in the line "the claim that street car was necessarily a compound before any shifts in spelling or phonology happened is left without proof or source", and the subtle quibble with that interpretation has to do with the dynamism over time and the inter-speaker duality. This has to do with what I said about "One thing that is obvious about many of them is that they may begin as mere noun phrases of collocational type but they become something more than that [i.e., a morphologic phenomenon]." What that means is that when people first started calling streetcars "street cars" most of them probably did not consider that utterance a compound noun (rather, they viewed it as a noun adjunct+noun = adjective+noun = noun phrase), but eventually the concept of a streetcar evolved into a concept whose name was a compound noun ("something more") in many minds. However, the zero-orthographic-signal difference between the two parsings of street car as noun adjunct+noun and street car as compound noun allows different speakers to have different underlying notions, with the difference being latent, even though they understand each other. This is part of why this topic is so challenging. It also has to do with why the alternative orthographic forms for a single compound noun persistently coexist in usage over decades. And also with why people haggle over which terms are solely sum-of-parts or are "something more" idiomatically. 
Natural language can hold dualities simultaneously in some ways without failing in communication and yet also without signaling them explicitly (zero). I am not a linguist but I remain convinced that the ideas written here are indeed operating with regard to this topic. It also reminds me of what the author of the chapter on grammar in the latest CMOS says about parts of speech (speaking of giving you a reference that is available online). He explains that humans can't even agree on the definition of them. Some of the phenomena that linguistics tries to unravel and elucidate are indeed quite complex. That's why this topic of compound nouns in English is as thorny as it is, and why none of us here have short, easy, unassailable answers to it yet. Quercus solaris (talk) 07:57, 2 October 2022 (UTC)


 * By the way, regarding the noun phrase school bus traffic stop laws, many English speakers would describe it as comprising noun adjunct+noun adjunct+noun, and within that paradigm, each noun adjunct can be either a [single-word] noun or an open compound or other noun phrase. Thus open compounds and even other noun phrases can serve as noun adjuncts inside other (larger) noun phrases. Other examples would be airport coffee shop Wi-Fi and book store bathroom policies, where airport and coffee shop are both noun adjuncts, and book store and bathroom are both noun adjuncts, and furthermore, speakers can disagree until the cows come home whether book store is a sum-of-parts noun adjunct+noun unit or it is a compound noun, bookstore. English holds dualities in coexistence via zero-signal, and some kinds of parsing differences (between minds) can exist without communicative failure. Quercus solaris (talk) 08:30, 2 October 2022 (UTC)
 * All these words and yet we are no wiser about how to distinguish a compound street car from a non-compound street car, or a compound car key from a non-compound car key, if there is such a distinction at all. There is some zero invisible difference and that's it. My sources improve upon this by mentioning phonology, quite plausible, and the form being a single unit of meaning, quite implausible given proverbs and given "violin teacher" being allegedly a compound. No further online sources we could use are provided; "giving you a reference that is available online": why do we get no link? Instead, we get more of the sort of word overflow that we find e.g. in the usage notes in developed country.
 * The question about school bus traffic stop laws was whether it is a compound: a lot of talk about it, but no answer and no source.
 * We will need to classify English adj-noun and noun-noun phrases as either compounds or non-compounds. We still don't know how to do it. --Dan Polansky (talk) 08:41, 2 October 2022 (UTC)


 * TL;DR I do hear what you're saying, and I don't dispute the frustrating nature of the problem. And I'll stop trying to help soon, to spare you the annoyance. But I'll just sum up by saying that I think you are trying to find the master key to something that humans don't yet have a master key to. I realize that any dictionary needs operational definitions so that it can impose concrete rules on which terms are headwords, for example, and what part of speech they are assigned, and that is the practical application that you are trying to solve. But I still think that the theory to inform that application is elusive; we won't find a on it anytime soon and therefore dictionary editors will continue to have to haggle over the assignments for individual terms, case by case (e.g., SOP in eye of beholder; and mere noun phrase versus compound noun). I realize that that probably seems like a bad answer. If you find a better one, good on you, and I will learn from your findings. You could also try asking a professional linguistic scientist and see if they'll engage in answering. I suspect that they won't provide any master key, and that they'll write more words than you're willing to read. Don't bother reading the rest if you don't want to: This effort is facing the gap between traditional grammar and linguistics, as well as the gap between description and prescription. It is a huge gap that humans still don't have all the answers to yet, despite trying really hard for many decades to develop unified theories that completely solve for precisely what natural language is and exactly how it works. Traditional grammar does not actually entirely understand the underlying phenomena that it believes that it models (in other words, it develops models that get some good degree of fit but not perfect), although plenty of its practitioners have assumed that it does. 
The CMOS chapter link is trivial to find from the CMOS TOC, but here it is anyway: https://www.chicagomanualofstyle.org/book/ed17/part2/ch05/toc.html . Regarding the exposition on terminology of countries, I stand by it as a cogent solution that works successfully in the area that I work in (which includes reducing usage disputes enough for people to create STM publications that stand up to others' usage scrutiny without crumpling); nonetheless, I guess I have finally been taught that Wiktionary cannot be a place for such solutions (although I wish it could be), because most people think they're inscrutable. But they're necessary in the field that I work in, and they do have analytic integrity and semantic sense (albeit lost on many people). Quercus solaris (talk) 16:06, 2 October 2022 (UTC)
 * Unhelpful. Meanwhile, another editor has posted notes on the CGEL treatment of compounds and how to distinguish them from free phrases. He said much more in far fewer words. That is productive: work with sources and work toward specific tests instead of claiming tests are impossible. Linguists have been searching for such tests, and the objective is to discover what they found, not to claim there is nothing to be found. Some things found:
 * How is 'compound noun' defined in CGEL?, english.stackexchange.com
 * Why do grammars claim that adjective+adjective is always a morphological compound and never a syntactic construction?, english.stackexchange.com
 * With the help of the input from the other editor, I found this:
 * Compound Nouns and Noun Phrases by Michaela Bartušová, theses.cz
 * It reports on some of the tests proposed by English-speaking linguists. wet day is not a compound while small talk is, for instance. More could be found, by those who have something substantive to say about the matter and about sources. --Dan Polansky (talk) 16:53, 2 October 2022 (UTC)

Phonological criteria
Multiple sources use phonological criteria to define "compound": --Dan Polansky (talk) 11:29, 28 September 2022 (UTC)
 * Britannica's article on compounding states: "They differ from word groups or phrases in stress, juncture, or vowel quality or by a combination of these. Thus, already differs from all ready in stress and juncture, cloverleaf from clover leaf in stress, and gentleman from gentle man in vowel quality, stress, and juncture." Boldface mine.
 * thoughtco.com: "Typically a compound begins as a kind of cliché, two words that are frequently found together, as are air cargo or light colored. If the association persists, the two words often turn into a compound, sometimes with a meaning that is simply the sum of the parts (light switch), sometimes with some sort of figurative new sense (moonshine). The semantic relationships of the parts can be of all kinds: a window cleaner cleans windows, but a vacuum cleaner does not clean vacuums. We can be sure we have a compound when the primary stress moves forward; normally a modifier will be less heavily stressed than the word it modifies, but in compounds, the first element is always more heavily stressed." They actually quote Kenneth G. Wilson. Boldface mine.
 * englishclub.com: "Compound nouns tend to have more stress on the first word. In the phrase "pink ball", both words are equally stressed (as you know, adjectives and nouns are always stressed). In the compound noun "golf ball", the first word is stressed more (even though both words are nouns, and nouns are always stressed). Since "golf ball" is a compound noun we consider it as a single noun and so it has a single main stress - on the first word. Stress is important in compound nouns. For example, it helps us know if somebody said "a GREEN HOUSE" (a house which is painted green) or "a GREENhouse" (a building made of glass for growing plants inside)." Boldface mine.

Separate inflection criterion
For highly inflected languages such as Czech and German, one test of compoundhood is this: if the parts of the phrase inflect separately, the phrase is not a compound.

For Czech, černý kníže is not a compound (černý inflects separately) while černokněžník is a compound (černo- does not inflect).

For German, schwarzes Loch is not a compound (schwarzes inflects separately) while Schwarzwald is a compound (Schwarz- does not inflect).

For English, this test does not work: in black hole, black does not inflect for gender, case or plurality, so there is no telling.

For Polish, the test works: is not a compound, while  is.

For Russian, the test works: is not a compound, while  is.

Translations of black hole yield examples for more languages, especially Slavic ones.

Hungarian is interesting: the entry says fekete does not inflect so one may somewhat plausibly claim this is an open compound.

--Dan Polansky (talk) 07:05, 2 October 2022 (UTC)

History of the notion
The notion of "open compound" can be traced in Google Books to the second half of the 20th century; see the quotations in the entry. By contrast, quotations of "compound word" can be found from as early as the 18th century, giving examples of compound words but none of open compounds.

Even today, Britannica's article on compounding seems to exclude open compounds from the notion of a compound by 1) giving no example of an open compound and 2) by stating: "They differ from word groups or phrases in stress, juncture, or vowel quality or by a combination of these. Thus, already differs from all ready in stress and juncture, cloverleaf from clover leaf in stress, and gentleman from gentle man in vowel quality, stress, and juncture." Thus, "cloverleaf" is a compound while "clover leaf" is not. While Britannica's demarcation seems phonological, it is also syntactic, allowing closed and hyphenated compounds. --Dan Polansky (talk) 10:59, 28 September 2022 (UTC)

Examples
Some example open compounds from sources:

Adjective-noun:
 * full moon, high school, real estate, first aid, hot dog

Noun-noun:
 * ice cream, blood pressure, violin teacher, web page, voice mail

Phrasal verbs:
 * cave in, break up

Long lists of compounds are available here, in a thesis written in English, in "II.Practical Part":
 * Compound Nouns and Noun Phrases by Michaela Bartušová, theses.cz

References: --Dan Polansky (talk) 14:28, 1 October 2022 (UTC)
 * Updated. --Dan Polansky (talk) 16:38, 2 October 2022 (UTC)