Module:data consistency check/documentation

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Checks performed
For multiple data modules:
 * Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
 * Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
 * Each name in the list of other names must appear only once.
 * , if present, must be an array.
 * A Wikidata item ID must be a positive integer or a string starting with  and ending with decimal digits.
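
The last two checks in the list above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the function names are hypothetical, and the string prefix is assumed to be `Q` (the prefix Wikidata uses for item IDs), since the text above does not state it. The pattern used here also requires that nothing but digits follows the prefix, which may be stricter than the real check.

```lua
-- Hypothetical helper: returns true if `id` looks like a Wikidata item ID.
-- Assumption: the string form is "Q" followed only by decimal digits.
local function is_valid_wikidata_id(id)
	if type(id) == "number" then
		-- Must be a positive whole number.
		return id > 0 and id % 1 == 0
	elseif type(id) == "string" then
		return id:find("^Q%d+$") ~= nil
	end
	return false
end

-- Hypothetical helper: returns each name that occurs more than once
-- in the array `names`, listed once, in order of first repetition.
local function find_duplicates(names)
	local seen, duplicates = {}, {}
	for _, name in ipairs(names) do
		if seen[name] == 1 then
			table.insert(duplicates, name)
		end
		seen[name] = (seen[name] or 0) + 1
	end
	return duplicates
end
```

A checker built this way would report an error for every entry where `is_valid_wikidata_id` returns false or `find_duplicates` returns a non-empty array.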

The following must be true of the data used by Module:languages:
 * Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
 * The canonical name (field ) must be present and must not be the same as the canonical name of another language.
 * If field  is not , it must be a valid Wikidata item ID.
 * If field  or   is given and not , it must be a valid family code.
 * If field  or   is given and not , it must be an array, and each string in the array must be a valid script code.
 * If  is given, it must be an array, and each string in the array must be a valid language or etymology language code.
 * If  is given, it must be a valid family code.
 * If  is given, it must be one of the recognised values.
 * If  is given, it must be a table that contains either two arrays (  and  ) or a string  or both.
 * If  is given, it may either be a string, or a table that in turn contains either two arrays (  and  ) or a string.
 * If  or   is given, the   array must be at least as long as the   array.
 * If  is given, it must form a valid Lua string pattern when placed between square brackets with   before it. (It should match all characters regularly used in the language, but that cannot be tested.)
 * If  is set,   must also be set, because there must be a transliteration module that can override manual transliteration.
 * If  is present, it must be.
 * Have no data keys besides these: lua.
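
The "no data keys besides these" rule amounts to comparing each entry against a whitelist. A minimal sketch, assuming each entry is a plain Lua table; the function name and the keys in the usage example are hypothetical and are not the module's real key list:

```lua
-- Hypothetical helper: returns a sorted array of the keys in `data`
-- that do not appear in the array `allowed_keys`.
local function find_unknown_keys(data, allowed_keys)
	-- Turn the whitelist into a set for constant-time lookup.
	local allowed = {}
	for _, key in ipairs(allowed_keys) do
		allowed[key] = true
	end

	local unknown = {}
	for key in pairs(data) do
		if not allowed[key] then
			table.insert(unknown, key)
		end
	end
	-- Keys may be numbers or strings, so compare their string forms.
	table.sort(unknown, function(a, b)
		return tostring(a) < tostring(b)
	end)
	return unknown
end

-- Usage with made-up keys: `example_typo` is not on the whitelist.
local unknown = find_unknown_keys(
	{ [1] = "Example", example_list = {}, example_typo = true },
	{ 1, "example_list" }
)
```

Here `unknown` contains only `"example_typo"`. The same helper would serve the equivalent rule for etymology-only languages, families and scripts, each with its own whitelist.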

Checks not performed:
 * If  is present, it should be the name of a module, and this module should contain a   function that takes a pagename (and optionally a language code and script code) as arguments.
 * If  is a string, it should be the name of a module, and this module should contain a   function that takes a pagename (and optionally a language code and script code) as arguments.
 * If  or   is a table and contains a field , the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation.

These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or lua attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
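
The "all the codes and no more" requirement is a comparison in both directions. The following is a rough sketch under a simplifying assumption: both inputs are reduced to plain tables mapping a code to a canonical name, which is not necessarily how the real data modules are laid out. The function name is hypothetical.

```lua
-- Hypothetical helper: compares the source data with a lookup table.
-- `data` and `lookup` both map language code -> canonical name.
-- Returns a sorted array of problem descriptions (empty if consistent).
local function compare_lookup(data, lookup)
	local problems = {}
	-- Every code in the data must be in the lookup with the same name.
	for code, name in pairs(data) do
		if lookup[code] == nil then
			table.insert(problems, "missing from lookup: " .. code)
		elseif lookup[code] ~= name then
			table.insert(problems, "name mismatch: " .. code)
		end
	end
	-- The lookup must not contain codes that the data lacks.
	for code in pairs(lookup) do
		if data[code] == nil then
			table.insert(problems, "not in data: " .. code)
		end
	end
	table.sort(problems)
	return problems
end
```

The reverse table (canonical name to code) can be checked the same way with the keys and values swapped.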

The following must be true of the data used by Module:etymology languages:
 * must be given.
 * must be given and must be a valid language, family or etymology-only language code.
 * If  is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
 * Have no data keys besides these: lua.

Codes in Module:families data must:
 * Have, which must not be the same as the canonical name of another family.
 * If  is given, it must be a valid family code.
 * Have at least one language or subfamily belonging to it.
 * Have no data keys besides these: lua.
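
The requirement that every family has at least one language or subfamily can be checked by collecting all family codes that something points to and then looking for families that were never mentioned. A minimal sketch; the field name `family` and the function name are assumptions made for illustration, as is the flat table layout:

```lua
-- Hypothetical helper: returns a sorted array of family codes that have
-- no language and no subfamily belonging to them.
-- Assumption: each language or family entry stores its parent family
-- code in a field called `family`.
local function find_empty_families(families, languages)
	local has_member = {}
	for _, language in pairs(languages) do
		if language.family then
			has_member[language.family] = true
		end
	end
	for code, family in pairs(families) do
		-- A family listed as its own parent does not count as a member.
		if family.family and family.family ~= code then
			has_member[family.family] = true
		end
	end

	local empty = {}
	for code in pairs(families) do
		if not has_member[code] then
			table.insert(empty, code)
		end
	end
	table.sort(empty)
	return empty
end
```

The analogous check for scripts would collect every script code listed by some language and report the scripts that no language uses.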

Codes in Module:scripts data must:
 * Have.
 * Have at least one language that lists it as one of its scripts.
 * Have a  pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets. (It should match all characters in the script, but that cannot be tested.)
 * Have no data keys besides these: lua.
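
One way to test whether a character list forms a valid Lua string pattern when placed between square brackets is to attempt a match inside `pcall` and see whether the pattern compiler raises an error. This is a sketch only: the function name is hypothetical, and it uses the standard `string.find`, whereas the real module may rely on a Unicode-aware library instead.

```lua
-- Hypothetical helper: returns true if "[" .. chars .. "]" is accepted
-- as a Lua pattern, false if the pattern matcher raises an error.
local function is_valid_character_set(chars)
	-- Matching against the empty string is enough: a malformed set is
	-- rejected while the pattern is parsed, before any character is compared.
	local ok = pcall(string.find, "", "[" .. chars .. "]")
	return ok
end
```

For example, `is_valid_character_set("a-z")` returns true, while `is_valid_character_set("%")` returns false, because the trailing `%` escapes the closing bracket and leaves the set unterminated. The check only proves that the pattern can be parsed; it cannot show that the set covers the right characters.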