Template:topic cat data submodule documentation

Introduction
This is the documentation page for the main data module for the Module:category tree/topic cat category tree subsystem, as well as for its submodules. Collectively, these modules handle generating the descriptions and categorization for topic pages such as Category:en:Birds, Category:es:France and Category:zh:State capitals of Germany, and the corresponding non-language-specific pages such as Category:Birds, Category:France and Category:State capitals of Germany. (All other categories handled through the auto cat system are handled by the Module:category tree/poscatboiler subsystem.)

The main data module at Module:category tree/topic cat/data does not contain data itself, but rather imports the data from its submodules, and applies some post-processing.


 * To find which submodule implements a specific category, use the search box on the right.
 * To add a new data submodule, copy an existing submodule and modify its contents. Then add its name to the list of submodules at the top of Module:category tree/topic cat/data.

Per-language and umbrella categories
The topic cat system internally makes a distinction based on which languages a category applies to:
 * 1) Per-language categories. These are of the form langcode:label (e.g. Category:es:Birds and Category:de:States of the United States). Here, langcode is the language code of a recognized full Wiktionary language (see WT:LOL for the list of all such languages and their codes), and label is a topic, generally one that can apply to multiple languages. The intended category contents are terms in the language in question that are either related to, instances of or types of the topic in question (depending on the type of category; see below). Associated with each per-language category is an umbrella category; see below. The following restrictions apply to per-language categories:
   * The language mentioned by langcode must currently be a full language, not an etymology-only language. (Etymology-only languages include lects such as Provençal, considered a variety of Occitan, and Biblical Hebrew, considered a variety of Hebrew. See here for the list of such lects.)
   * The category label specified by label as found in the category name always begins with a capital letter, whether or not the underlying form of the label is capitalized (contrast Category:en:Birds with Category:en:France). Internally, this is different, and the internal form of a label begins with a lowercase or uppercase letter as appropriate (birds but France).
 * 2) Umbrella categories. These are of the form label, i.e. a bare category label. As with per-language categories, this label is always capitalized in the category name, regardless of the underlying form of the label. Examples are Category:Birds, Category:France and Category:State capitals of Germany. Umbrella categories serve to group all the per-language categories for a particular topic. They also serve to group more specific subcategories, e.g. under Category:Birds can be found Category:Birds of prey, Category:Freshwater birds, Category:Columbids (which includes doves and pigeons), etc. as well as Category:Eggs and Category:Feathers. Umbrella categories should not normally directly contain any terms.

Unlike for the poscatboiler system, language-specific categories do NOT currently exist. These would be topics that only make sense for a given language or small set of languages, and which are allowed for that language or those languages. Currently, all topics are cross-language even if in practice they don't make sense except in conjunction with a subset of languages; but this may change in the future.
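
The naming rules above can be sketched in plain Lua. This is only an illustration, not the module's actual code: the helper names ucfirst and category_name are hypothetical, and the ASCII-only capitalization is a simplifying assumption.

```lua
-- Hypothetical helper: uppercase the first character (ASCII only, for illustration).
local function ucfirst(s)
	return (s:gsub("^%l", string.upper))
end

-- Hypothetical helper: build a category name from an internal label.
-- Pass nil as langcode to get the umbrella category.
local function category_name(langcode, label)
	local name = ucfirst(label)
	if langcode then
		return "Category:" .. langcode .. ":" .. name
	end
	return "Category:" .. name
end

print(category_name("en", "birds"))   -- Category:en:Birds
print(category_name("en", "France"))  -- Category:en:France
print(category_name(nil, "birds"))    -- Category:Birds
```

The internal label keeps its natural case (birds, France); only the category name forces an initial capital.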

Category types
In addition to the above distinction, the topic cat system divides categories according to the category type, which specifies the relationship between the category and the members of that category:
 * 1) Related-to categories  contain terms that are semantically related to the category topic. For example, Category:en:Chess contains terms such as checkmate, rank (a row on a chessboard), endgame, en passant, Grandmaster, etc. "Related to" is a nebulous criterion, and as a result the terms in the category should be related to the category as directly as possible, to avoid the category becoming a grab bag of random terms.
 * 2) Name  categories contain terms that are names of individual, specific instances of the category. For example, Category:Chess openings contains names of specific openings, such as Ruy Lopez and Sicilian Defense. Even more clearly, Category:Moons of Jupiter contains names of individual moons that orbit the planet Jupiter.
 * 3) Type categories contain terms for types of the entity described by the category name. For example, Category:Checkmate patterns contains types of checkmates, such as ladder mate and smothered mate. Even more clearly, Category:Hobbyists contains terms for types of hobbyists, such as oenophile (a wine enthusiast), numismatist (a coin collector), etc. (If this were a name category, it would contain names of specific, presumably famous, hobbyists, something that would probably not be dictionary-worthy material.)
 * 4) Set  categories are used when the distinction between names and types of a given topic may not always be clear, but the overall membership is still well-defined. For example, Category:Heraldic charges contains terms for components of coats of arms, e.g. bend sinister (a diagonal band from lower left to upper right), fleur-de-lis (a stylized image of a lily, as is commonly associated with New Orleans) and quatrefoil (a symmetrical shape made from the outline of four circles).
 * 5) Grouping  categories are higher-level categories that are used only to group more specific categories and should not contain elements themselves (but nevertheless sometimes do). An example is Category:Industries, which contains subcategories devoted to particular industries (e.g. Category:Banking, Category:Mining, Category:Music industry, Category:Oil industry, etc.).
 * 6) Top-level categories are special high-level categories that list all the categories of one of the above types, and which are always named List of TYPE categories (where TYPE is the category type), e.g. Category:List of related-to categories (listing all the "related-to" umbrella categories) or Category:es:List of name categories (listing all the Spanish name-type categories). The number of top-level categories is fixed.

Note that name, type and set categories are conceptually similar to each other, in that each contains terms that have an is-a relationship with the topic in question, whereas related-to categories express a weaker sort of relation between term and topic, merely asserting that the term is in some way "related" or "pertinent" to the topic in question. For this reason, when creating new topics, you should always strive to create name, type or set topics whenever possible, and avoid related-to topics unless there is no alternative and you're convinced the topic is really necessary. Before creating a related-to category:
 * 1) Consider whether there is another category already in existence that will cover this semantic space.
 * 2) Consider whether you can convert the category to a name, type or set category.
 * 3) Investigate whether there needs to be a category for the semantic concept at all (in particular, abstract concepts often do not merit related-to categories).
 * 4) Make sure there are enough terms to fill up this category in at least two languages (one of which should be English). What qualifies as "enough" varies a bit from topic to topic but generally should be at least 10.
 * 5) Make sure the terms you add or consider adding to this category are directly related to the topic at hand. Do not add terms merely because the term contains the name of the topic in it (e.g. if you create a category about bricks, do not add terms like brick house, thick as a brick or yellow brick road merely because they have the word "brick" in them; instead, use the ===Related terms=== section of the brick lemma to include these terms).

Note also that name, type and set categories typically use the plural in their topic name, while related-to categories often use the singular. This is not a hard and fast rule, however, and there are exceptions in both directions. If it's not obvious what type of category a given topic refers to, consider making this explicit in the topic name rather than using a bare noun. (In the future, all, or at least most, topic categories may be named in such a fashion.)

Adding, removing or modifying categories
A sample entry is as follows (in this case, found in Module:category tree/topic cat/data/History):

labels["ancient history"] = {
	type = "related-to",
	description = "default",
	parents = {"history"},
}

This generates the description and categorization for all per-language categories of the form Category:langcode:Ancient history (e.g. Category:en:Ancient history) as well as for the umbrella category Category:Ancient history (see above for the definition of per-language and umbrella categories).

The meaning of this snippet is as follows:
 * The label itself needs to use the proper case (uppercase or lowercase) for its first letter, even though the label as it appears in the category name is always capitalized, consistent with the principle that category names begin with a capital letter. In this case, the label is lowercase, and other labels that reference it need to use the same casing (as in the example below). By contrast, a label naming a specific region, such as China in the example below, is capitalized because toponyms are capitalized in English.
 * The type field specifies the category type, as described above. This label is a "related-to" category.
 * The description field gives the description text that will appear when a user visits the category page. Certain special values are recognized, including default, which generates a default description. The value of the default description depends on the label's name, the language of the category, and the label's type. In this case, it is built from the language name (dropped for the umbrella category), standard wording for related-to categories, and the label. See Descriptions below for more information on specifying descriptions.
 * The parents field gives the labels of the parent categories. Here, the category specifies a single parent, history. This means that a category such as Category:en:Ancient history will have Category:en:History as its parent. An additional top-level list parent will automatically be added (in this case Category:en:List of related-to categories) as well as the umbrella parent Category:Ancient history.
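
The parent categories described in the last bullet can be sketched as follows. This is an illustration only and does not reproduce the module's real implementation; the function names ucfirst and parent_categories are hypothetical.

```lua
-- Hypothetical helper: uppercase the first character (ASCII only).
local function ucfirst(s)
	return (s:gsub("^%l", string.upper))
end

-- Illustrative only: list the parent categories of a per-language category,
-- following the description in the text above.
local function parent_categories(langcode, label, data)
	local result = {}
	-- Explicit parents from the label entry.
	for _, parent in ipairs(data.parents) do
		table.insert(result, "Category:" .. langcode .. ":" .. ucfirst(parent))
	end
	-- Automatically added top-level list for the category type.
	table.insert(result, "Category:" .. langcode .. ":List of " .. data.type .. " categories")
	-- Automatically added umbrella category.
	table.insert(result, "Category:" .. ucfirst(label))
	return result
end

local entry = { type = "related-to", description = "default", parents = {"history"} }
for _, cat in ipairs(parent_categories("en", "ancient history", entry)) do
	print(cat)
end
-- Category:en:History
-- Category:en:List of related-to categories
-- Category:Ancient history
```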

Another example follows:

labels["places in Romance of the Three Kingdoms"] = {
	type = "name",
	displaytitle = "places in Romance of the Three Kingdoms",
	description = "=places in ",
	parents = {"Romance of the Three Kingdoms", "China"},
}

This is a subcategory of Romance of the Three Kingdoms (a 14th-century Chinese historical novel) and accordingly specifies Romance of the Three Kingdoms as the parent, along with China (note the capitalization, in accordance with the principles laid out above). A description is given explicitly, preceded by = (which in this case prepends "names for specific" to the description). The displaytitle field is also set so that the name of the work is italicized.

Category label fields
The following fields are recognized for the object describing a label:


 * type: The type of the label ("related-to", "name", "type", "set", "grouping" or "toplevel"), as described above. Mandatory. It is possible to specify multiple comma-separated types, for "mixed" categories that can contain more than one type of term. For example, the label for flags currently specifies several types because it contains a mixture of terms related to flags (e.g. flagpole and grommet), terms for individual flags (e.g. Star-Spangled Banner) and terms for types of flags (e.g. prayer flag, flag of convenience). Mixed categories are strongly dispreferred and should be split into separate per-type categories.


 * description: A plain English description for the label. This should generally be no longer than one sentence. Place additional, longer explanatory text in the additional field described below, and put Wikipedia and similar boxes in the topright field described below so that they are correctly right-aligned with the description. Template invocations and special template-like references such as {{PAGENAME}} and {{{langname}}} will be expanded appropriately; see Template substitutions in field values below. Certain values are handled specially, including default (and its variants) and phrases preceded by an = sign, as explained in more detail under Descriptions below.


 * parents: A table listing one or more parent labels of this label. This controls the parent categories that the category is contained within, as well as the chain of breadcrumbs appearing across the top of the page (see below).
   * An item in the table can be either a single string (the parent label), or a table containing (at least) two elements, one specifying the parent label name and the other specifying the sort key to use when sorting the category within that parent. The default sort key is the category's label.
   * If a parent label begins with Category: it is interpreted as a raw category name, rather than as a label name. It can still have its own sort key as usual.
   * The first listed parent controls the category's parent breadcrumb in the chain of breadcrumbs at the top of the page. (The breadcrumb of the category itself is determined by the breadcrumb setting, as described below.)


 * breadcrumb: The text of the last breadcrumb that appears at the top of the category page.
   * By default, it is the same as the category label, with the first letter capitalized.
   * The value can be either a string, or a table containing two elements, one specifying the breadcrumb text and the other usable to disable the automatic capitalization of the breadcrumb text that normally happens.
   * Note that the breadcrumbs collectively are the chain of links that serve as a navigation aid for the hierarchical organization of categories. For example, a category like Category:en:Ancient Near East will have a breadcrumb chain similar to "Fundamental » All languages » English » All topics » History » Ancient history » Ancient Near East", where each breadcrumb is a link to a category at the appropriate level. The last breadcrumb here is "Ancient Near East", and its text is controlled by this field.


 * displaytitle: Apply special formatting such as italics to the category page title, as with the DISPLAYTITLE magic word (see mw:Help:Magic words). The same formatting is also applied to breadcrumbs, descriptions and other mentions of the label in formatted text. The value is either a string (which should be the formatted label) or a Lua function to generate the formatted category title. The Lua function is passed two parameters: the raw label (without any preceding language code) and the language object of the category's language (or nil for umbrella categories). It should return the appropriately formatted label. If the value of this field is a string, template invocations and special template-like references such as {{PAGENAME}} and {{{langname}}} will be expanded appropriately; see Template substitutions in field values below. See Module:category tree/topic cat/data/Culture for examples of its use.


 * topright: Introductory text to display right-aligned, before the edit and recent-entries boxes on the right side. This field should be used for Wikipedia and other similar boxes. Template invocations and special template-like references such as {{PAGENAME}} and {{{langname}}} are expanded appropriately, just as with description; see Template substitutions in field values below. Compare the preceding field, which is similar to topright but used for left-aligned text placed above the description.


 * preceding: Introductory text to display directly before the text in the description field. The difference between the two is that description text will also be shown in the list of children categories shown on the parent category's page, while the preceding text will not. For this reason, use preceding instead of description for {{also}} hatnotes and similar text, and keep description relatively short. Template invocations and special template-like references such as {{PAGENAME}} and {{{langname}}} are expanded appropriately, just as with description; see Template substitutions in field values below. Compare the topright field, which is similar to preceding but is right-aligned, placed above the edit and recent-entries boxes.


 * additional: Additional text to display directly after the text in the description field. The difference between the two is that description text will also be shown in the list of children categories shown on the parent category's page, while the additional text will not. For this reason, use additional instead of description for long explanatory notes, See also references and the like, and keep description relatively short. Template invocations and special template-like references such as {{PAGENAME}} and {{{langname}}} are expanded appropriately, just as with description; see Template substitutions in field values below.


 * wp: Display a box linking to a Wikipedia entry in the upper right corner. The value can either be true to link to an entry that is the same as the label; a string, to link to that entry; or a list of strings or true, to generate multiple boxes, one per list item. For example, a label whose wp value is true gets a box linking to the Wikipedia entry of the same name, while a label whose wp value is a string gets a box linking to the entry named by that string.


 * wpcat: Display a box linking to a Wikipedia category in the upper right corner. This is similar to wp except that the link is to a category (the generated entry or entries is/are prepended with Category:). For example, a label whose wpcat value is true gets a box linking to the Wikipedia category of the same name.


 * commonscat: Display a box linking to a Wikimedia Commons category in the upper right corner. This is similar to wpcat except that the link is to Wikimedia Commons instead of Wikipedia. For example, if the label racquet sports has commonscat set to true, a box will be generated that links to Category:Racquet sports on Wikimedia Commons.


 * topic: Text indicating the topic being handled by this category. This appears in the auto-generated "additional" message following the description, which indicates what type this category is (based on the type field) and what sorts of terms should go into it. This does not normally need to be specified, as it's derived directly from the label. But it is useful e.g. for the label types of planets, which sets a shorter topic, because the auto-generated "additional" message already contains wording about types, and using the label directly would result in redundant text. Template invocations and special template-like references such as {{PAGENAME}} and {{{langname}}} are expanded appropriately, just as with description; see Template substitutions in field values below. The value of this field can also be one of the default keywords, which will be expanded appropriately based on the label.


 * umbrella: A table describing the umbrella category that collects all per-language categories associated with this label. The umbrella category is named using the label, without any language prefix. For example, for the label ancient history, the umbrella category is named Category:Ancient history, and is a parent category (in addition to any categories specified using parents) of Category:en:Ancient history, Category:fr:Ancient history and all other per-language categories for this label. This table contains the following fields:
   * description: A plain English description for the umbrella category. By default, it is derived from the description field of the label itself by removing language references (specifically, {{{langname}}}, {{{langcode}}}, {{{langcat}}} and {{{langlink}}}) and adding "This category concerns the topic:" before the result. Text is automatically added to the end indicating that this category is an umbrella category that only contains other categories, and does not contain pages describing terms.
   * breadcrumb: The last breadcrumb in the chain of breadcrumbs at the top of the category page; see above. By default, this is the category label.
   * The remaining fields of this table work like the same-named fields on regular category pages; see above.


 * umbrella_parents: The same as the parents subfield of the umbrella field.
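
As a rough illustration of the two forms a parents item can take (a bare string, or a table with a label name and a sort key), the sketch below normalizes both into one shape. The subfield names name and sort are assumptions made for this example, as is the function name normalize_parent; check an existing data submodule for the names actually in use.

```lua
-- Assumption: a table-valued parent uses the keys `name` and `sort`.
-- These key names are chosen for illustration only.
local function normalize_parent(item, label)
	if type(item) == "string" then
		-- Bare string: the sort key defaults to the category's own label.
		return { name = item, sort = label }
	end
	return { name = item.name, sort = item.sort or label }
end

local p1 = normalize_parent("history", "ancient history")
local p2 = normalize_parent({ name = "history", sort = "ancient" }, "ancient history")
print(p1.name, p1.sort)  -- history   ancient history
print(p2.name, p2.sort)  -- history   ancient
```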

Template substitutions in field values
Template invocations can be inserted in the text of description, parents (both name and sort key) and the other field values noted in the field descriptions above, and will be expanded appropriately. In addition, the following special template-like invocations are recognized and replaced by the equivalent text:
 * {{PAGENAME}}: The name of the current page. (Note that two braces are used here instead of three, as with the other parameters described below.)
 * {{{langname}}}: The name of the language that the category belongs to. Not recognized in umbrella fields.
 * {{{langcode}}}: The code of the language that the category belongs to (e.g. en for English, de for German). Not recognized in umbrella fields.
 * {{{langcat}}}: The name of the language's main category, which adds "language" to the regular name. Not recognized in umbrella fields.
 * {{{langlink}}}: A link to the language's main category. Not recognized in umbrella fields.
 * {{{umbrella_msg}}}: The message normally at the end of the description for umbrella categories, indicating that the category contains no terms but only subcategories.
 * {{{topic}}}: The value of the topic field (or the topic subfield of the umbrella field for umbrella categories), if specified; else, the value of displaytitle (if specified) or the label, with "the" added if the description is default with the or a variant containing with the.
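
The placeholder replacement described in this section can be approximated in a few lines of Lua. This is a sketch under stated assumptions, not the module's code: it assumes the triple-brace placeholders are written {{{langname}}} and {{{langcode}}}, handles only those two, and uses a plain table with name and code keys as a stand-in for the real language object. The function name substitute is hypothetical.

```lua
-- Illustrative only: replace {{{langname}}} and {{{langcode}}} in a string.
-- `lang` is a plain table standing in for the real language object.
local function substitute(text, lang)
	local values = { langname = lang.name, langcode = lang.code }
	return (text:gsub("{{{([%w_]+)}}}", function(key)
		-- Returning nil leaves an unrecognized placeholder untouched.
		return values[key]
	end))
end

print(substitute("{{{langname}}} terms related to history ({{{langcode}}}).",
	{ name = "French", code = "fr" }))
-- French terms related to history (fr).
```

Using a function as the replacement avoids having to escape any percent signs that might occur in a language name.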

Descriptions
The description field is of one of three types:
 * 1) An English sentence, ending in a period.
 * 2) A phrase preceded by = and not ending in a period.
 * 3) The value default or one of its variants, such as default with the.

If preceded by =, the description is generated from the specified phrase by prepending {{{langname}}} (which is replaced with the language name) followed by standard type-dependent text, and appending a period. For name categories, for instance, the prepended text is currently "names for specific", as in the Romance of the Three Kingdoms example above. For a category such as Category:fr:Biblical characters, the phrase is expanded in this way into a full sentence beginning with the language name.

Note that no standard text is provided for top-level categories, all of which include a custom description.
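
The = expansion can be sketched as below. This is a simplified illustration rather than the real implementation: the function name expand_description and the type_prefix table are hypothetical, the language name is inserted directly instead of via a placeholder, and the table holds only the one prefix quoted in this documentation ("names for specific" for name categories).

```lua
-- Only the "name" prefix is taken from the documentation; other types
-- would need their own entries.
local type_prefix = { name = "names for specific" }

-- Illustrative only: expand a description of the form "=phrase".
local function expand_description(desc, langname, cat_type)
	local phrase = desc:match("^=(.*)$")
	if not phrase then
		-- A full sentence is used as-is.
		return desc
	end
	local prefix = type_prefix[cat_type]
	if not prefix then
		error("no standard text known for type: " .. tostring(cat_type))
	end
	return langname .. " " .. prefix .. " " .. phrase .. "."
end

print(expand_description("=places in Romance of the Three Kingdoms", "French", "name"))
-- French names for specific places in Romance of the Three Kingdoms.
```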

If default or one of its variants is used as the description, a default description is generated as if the description consisted of = prepended to the label, except that the word "the" might be added to the beginning of the label, and the words in the label might be wikilinked. Specifically:
 * 1) If the description is of the form default with the (or a form combining it with other variants), the word "the" is prefixed to the label.
 * 2) If the description is of the form default wikify (or a related form), the label is linked to Wikipedia. If the label ends in an -s, the label is linked to a Wikipedia entry based on the singular form of the label (which converts -ies to -y; converts -xes, -ches or -shes, respectively, to -x, -ch or -sh; and otherwise just removes -s), unless the description is default no singularize or a related form, in which case the label is linked unchanged.
 * 3) Otherwise, the code attempts to link the entire label or the individual words of the label to Wiktionary terms, as follows:
   * a) If the label ends in -s and no singularize is not specified in the description, and the singular form of the label (generated according to the algorithm described just above) is a Wiktionary term, the label is linked to that term. Note that "is a Wiktionary term" simply means that a page of this name exists; the code does not currently check to see whether there is an English entry or whether the term is a lemma.
   * b) Otherwise, if the label itself is a Wiktionary term, the label is linked to that term.
   * c) Otherwise, the label is split into individual words, and each word is checked to see if a page named according to that word exists. If so, the individual words are linked to their corresponding Wiktionary entries; otherwise, the label is left unlinked. Note that the last word is handled specially if it ends in -s and no singularize is not found in the description, in that the code first attempts to link the word to its singular equivalent, falling back to the word itself if the singular equivalent doesn't name a Wiktionary term.

For example, the label video games will be linked as [[video game|video games]] because the page video game exists, but the label Arabian deities will be linked as [[Arabian]] [[deity|deities]] because neither Arabian deity nor Arabian deities exists as a page. The use of no singularize is needed with labels such as linguistics, comics and humanities, because their respective singular forms linguistic, comic and humanity exist as Wiktionary pages.
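
The singularization rule quoted above (-ies to -y; -xes, -ches and -shes to -x, -ch and -sh; otherwise drop the final -s) is simple enough to write out. The following is a sketch of that rule as stated in the text, not a copy of the module's code; the function name singularize is hypothetical.

```lua
-- Illustrative only: singularize a label according to the rule in the text.
local function singularize(label)
	if label:find("ies$") then
		return (label:gsub("ies$", "y"))
	elseif label:find("xes$") or label:find("ches$") or label:find("shes$") then
		return (label:gsub("es$", ""))
	elseif label:find("s$") then
		return (label:gsub("s$", ""))
	end
	return label
end

print(singularize("Arabian deities"))  -- Arabian deity
print(singularize("video games"))      -- video game
print(singularize("linguistics"))      -- linguistic
```

The last line shows why some labels need singularization suppressed: the rule is purely mechanical and turns linguistics into linguistic.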

Finally, note that the components of a default-type description (with the, wikify and no singularize) can be given in any order if more than one of them needs to be specified.

Handlers
It is also possible to have handlers that can handle arbitrarily-formed labels, e.g. political subdivisions of PLACE for any PLACE (categories such as Category:tg:Political subdivisions of the United Arab Emirates) or DIVISIONS of PLACE for any DIVISIONS and PLACE (e.g. Category:fr:Counties of South Korea or Category:pt:Municipalities of Tocantins, Brazil). Currently, handlers exist only in the toponym-handling code in Module:category tree/topic cat/data/Places and in Module:category tree/topic cat/data/Names. As an example, the following is the handler for SCRIPT letter names:

table.insert(handlers, function(label)
	-- Match labels such as "Latin letter names" and capture the script name.
	local script = label:match("^(.*) letter names$")
	if script then
		local sc = require("Module:scripts").getByCanonicalName(script)
		if sc then
			-- Prefer an appendix page for the script; fall back to Wikipedia.
			local script_page
			local appendix = ("Appendix: %s script"):format(script)
			local appendix_title = mw.title.new(appendix)
			if appendix_title and appendix_title.exists then
				script_page = appendix
			else
				script_page = "w:" .. sc:getWikipediaArticle()
			end
			local link = ("[[%s|%s script]]"):format(script_page, script)
			return {
				type = "name",
				description = ("{{{langname}}} terms that serve as names for letters and symbols directly based on letters, " ..
					"such as ligatures and letters with diacritics, of the %s."):format(link),
				parents = {"letter names"},
			}
		end
	end
end)

The handler is passed a single argument (the label), checks whether the passed-in label has a recognized form, and if so, returns an object that follows the same format as described above for directly-specified labels. In this case, the handler makes sure the given script name specifies an actual script, and constructs an appropriate link for the script, depending on whether an appendix page for the script exists (falling back to Wikipedia).

NOTE: The handler needs to be prepared to handle both umbrella categories and per-language categories. The label is passed in as it appears in the category; this means the handler may need to handle both uppercase-initial and lowercase-initial variants of the label. (For this handler, this isn't an issue because the script always appears uppercased.) One way to do that is to convert the label to lowercase-initial before further processing, for example using mw.getContentLanguage():lcfirst(label).
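
A handler that normalizes the case of the first letter might look like the sketch below. Everything in it is hypothetical: the "types of X" label pattern is invented for illustration, and the local lcfirst helper is an ASCII-only stand-in so that the example runs outside MediaWiki (on-wiki, the Scribunto method mw.getContentLanguage():lcfirst() also handles non-ASCII letters).

```lua
local handlers = {}

-- ASCII-only stand-in for mw.getContentLanguage():lcfirst().
local function lcfirst(s)
	return (s:gsub("^%u", string.lower))
end

-- Hypothetical handler for labels of the form "types of X".
table.insert(handlers, function(label)
	-- Accept both "Types of planets" and "types of planets".
	label = lcfirst(label)
	local thing = label:match("^types of (.+)$")
	if thing then
		return {
			type = "type",
			description = "default",
			parents = {thing},
		}
	end
	-- Returning nothing signals that this handler does not apply.
end)

local data = handlers[1]("Types of planets")
print(data.type, data.parents[1])  -- type   planets
```

Lowercasing the first letter is only safe when the recognized pattern starts with an ordinary word; a label that begins with a proper noun would need different treatment.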

Note also that if a handler is specified, the module should return a table holding both the label and handler data; see the above modules.