Template talk:taxlink

Future development
This template or a derived version may be used to put entries with multipart taxonomic names into distinct categories for each of their missing components. DCDuring TALK 02:54, 9 September 2012 (UTC)

Use with Template:term
Using taxlink as the SECOND, NOT FIRST, parameter of term allows taxlink's functionality to be combined with almost all of that of term. Use as the second parameter prevents error messages from the Luacization of term and provides the user with a link to the Wikispecies entry, if any, or to a Wiktionary entry once created. DCDuring TALK 08:46, 16 June 2013 (UTC)

RFC discussion: August 2013–April 2014
This is creating masses of red-linked categories whose meaning I don't understand, for example Category:Entries missing taxonomic name Cornus sericea. Two things. First, why would we want to divide these up by specific taxonomic names (in this example Cornus sericea) instead of putting everything in Category:Entries missing taxonomic name? Second, the categorization shouldn't work in all namespaces. Definitely not the user namespace. Appendix probably. But not all namespaces. Mglovesfun (talk) 21:00, 25 August 2013 (UTC)
 * This approach finds the missing taxonomic names that are commonly used, based on occurrences of taxlink, to focus new-entry creation on those. Usually, in addition to those so categorized, there are also some number of other entries that use the missing taxonomic name unlinked. One good way to clean up the missing (red) categories is to add the taxonomic names that are missing and then add the missing ordinary links to those taxonomic name entries, also cleaning up the superfluous occurrences of taxlink. I'd be glad to provide the entry starter that I use for species names to anyone interested. If you take a look at Special:WantedCategories you will see that a large portion of the missing categories at the bottom of the first page and thereafter are for missing taxonomic names. I have started at the top and have added entries for all of those with five or more items in the category.
 * It works in User namespace because I have some lists of species in my user space that are not worth making an appendix for. I suppose I could just work from those lists as they include the species of most interest to me for one reason or another.
 * You may also note that there are several thousands of entries with red categories that have nothing to do with taxonomic names. DCDuring TALK 00:19, 26 August 2013 (UTC)
 * Now that I have finally had adequate success with bot runs that cover this, I have eliminated this categorization from taxlink. This should be shown in the next run of Special:WantedCategories, c. 12/16-17. DCDuring TALK 15:38, 15 December 2013 (UTC)


 * ✅ DCDuring TALK 00:21, 2 April 2014 (UTC)

When to use taxlink
I seem to be using this template in the wrong places. Where is it appropriate to use this template?
 * I just noticed this. Do you still have the problem? DCDuring TALK 13:55, 3 October 2016 (UTC)
 * Thank you for the reply. I have not been contributing much in Wiktionary over several months and do not remember what this was about. If I remember, I will ping you. —BoBoMisiu (talk) 00:13, 4 October 2016 (UTC)

To be implemented
Please use these where warranted in anticipation of the implementation.

The optional parameter "nospe=1" should be used if there is no current link at Wikispecies. It deactivates linking. If this is used, there should be another link to Wikispecies, as under an External links header, to a higher taxon that includes the taxon of the headword.

The optional parameter "obs=1" should be used to indicate that there is positive indication that the taxon is obsolete. This typically arises in the etymology of a higher taxon which is derived by suffixation to such an obsolete genus name. This will also automatically deactivate linking to Wikispecies.

Experiment (inactive)
Experimentally, this template categorizes each templated item itself in a category which includes the name of the missing item. This permits the use of Special:WantedCategories to count all the entries that have templated use of the name to help speed the creation of entries for the most missed names. For now it is restricted to genus names. In the near future, it will be switched to family names, then species names, cycling through those until only single-page wants remain. With the creation of these additional entries it will probably be useful to cycle through more than once, before adding other levels of taxonomic names, such as orders, tribes, subfamilies etc.

noshow?
Could someone explain in the documentation what noshow does? I see it used in entries, but am not sure whether to use it or not because I don't know what it's for. — Eru·tuon 17:41, 12 November 2016 (UTC)
 * The parameter populates a maintenance category Category:Entries using the taxlink template. I use the category to keep track of taxonomic entries that are changed, without having to further clutter my watchlist. I also use it to check on vernacular-name entries that use taxlink, mostly to see whether the name is spelled correctly and is the one currently applicable. As a result, there is no harm and some benefit from not including it. Among other things I learn how little interest there is in editing taxonomic name entries. DCDuring TALK 14:24, 13 November 2016 (UTC)

Non-italicized elements of nomenclatures
Would it be possible to tweak the template so that some elements of a nomenclature can be specified not to be italicized? For example, at, ''Pinus nigra subsp. salzmannii'' should appear as ''Pinus nigra'' subsp. ''salzmannii''. However, does not achieve that effect. — SGconlaw (talk) 18:12, 29 July 2017 (UTC)
 * Module:italics can generate the correct italicization automatically, so maybe it should be invoked to generate the displayed form if the rank is species, subspecies, variety, form, and whatever else is italicized. — Eru·tuon 18:27, 29 July 2017 (UTC)
 * How would that be done within templates and ? DCDuring (talk) 19:52, 29 July 2017 (UTC)
 * OK, I have no idea how to invoke a module. See my pathetic attempts in the history of the template. — SGconlaw (talk) 19:58, 29 July 2017 (UTC)
 * I think I fixed it. See my testcases. — Eru·tuon 19:59, 29 July 2017 (UTC)
 * Thanks! — SGconlaw (talk) 20:00, 29 July 2017 (UTC)
 * As to invoking modules: the word directly after  is the function name, which you place in the first parameter of : . So you don't include the   part. — Eru·tuon 20:03, 29 July 2017 (UTC)
 * You can see how I inserted it. If the rank is specified as species, subspecies, variety, use the module with . (I modified Module:italics to make it possible.) — Eru·tuon 20:03, 29 July 2017 (UTC)
 * It should also be specified as italics for genus, subgenus, form, section, subsection (the latter four respectively abbreviated subg., f., sect., subsect.). There may be others, but I'll try to follow the pattern as I discover more. There are many, many thousands of improper non-italicized instances of.
 * And thanks for doing the work. I hadn't noticed (remembered?) when you first created the module. DCDuring (talk) 20:11, 29 July 2017 (UTC)
 * I saw where to add such not-to-be-italicized text (etc.) in module:italics. DCDuring (talk) 20:19, 29 July 2017 (UTC)
 * Excellent. I added those ranks to the "switch" statements in this template. — Eru·tuon 20:37, 29 July 2017 (UTC)
 * You were a participant in the discussion during which I created Module:italics, back in October. But I don't think it ever got implemented anywhere till now. — Eru·tuon 21:13, 29 July 2017 (UTC)
 * Glad to have helped you guys find a use for it! — SGconlaw (talk) 21:16, 29 July 2017 (UTC)
 * See Category:Entries using missing taxonomic names, the subcategories of which show the various taxonomic ranks and rank-like words used in . There are some which are sufficiently rare that I don't remember or never determined the proper display. DCDuring (talk) 22:12, 29 July 2017 (UTC)
 * Another complication is that ALL taxa of viruses are italicized, except for Virus itself. It is governed by a separate body: the.
 * And lastly - probably also most annoyingly - defaulted to italics, so that all uses of  above genus rank are enclosed in wikitext ' s. As a result, at present all taxa above the rank of genus appear italicized if used in . See Cyprinodontidae for some examples. I think we should allow for the legacy behavior until we make all the changes using AWB or, better, a bot. We could code  for proper functioning with the legacy bad practice. Does it pay to deprecate the old approach and have another template to cleanly implement the new, more desirable approach? DCDuring (talk) 22:33, 29 July 2017 (UTC)
 * Regarding virus names, perhaps a parameter 1 would be a good idea. Regarding the switch to new behavior, how about creating with the auto-italicization and reverting  back to its old behavior? — Eru·tuon 22:37, 29 July 2017 (UTC)
 * One other aspect of the taxonomic codes is that they would like taxonomic names to be italicized when in normal text and in a contrasting text style like simple non-italic when in italicized text. I've never gotten the implications for embedding  with no other text in format templates that would have other text normally appear in italics.
 * I was hoping there was some way of having my cake and eating it too and also eating it tomorrow. That is, I would like to be able to use as the name for the currently coded template AND not have to change template names or behavior for the current uses of  (also no new parameters). As previously implemented,  was relying on using wikitext italic formatting as operating compatibly both inside and outside . All of the usage of  that I consider to have been proper work this way. All super-generic taxa have depended on it. I don't see how this could be done and strongly suspect that it is logically impossible. But sometimes I am surprised, even awestricken, by ingenious software solutions. DCDuring (talk) 23:05, 29 July 2017 (UTC)
 * Wait, wait. Could instances of the template for supergeneric taxa retain the legacy approach, with the new behavior limited to genera and below? I see that viruses would require a new parameter.  could be the new proper coding. I add most of the new instances of  and monitor the rest so effective replacement of the  by  for new entries. DCDuring (talk) 23:20, 29 July 2017 (UTC)
 * Yeah, you can use to determine which ranks use which output format. That's what I've already done to determine which ones should use Module:italics. So I gather the old behavior was that every rank was italicized in the template, and un-italicized by surrounding the template with italics ? I suppose then I could just add italicization as the default (for those not using the module). — Eru·tuon 23:39, 29 July 2017 (UTC)
 * YES.
 * I sort of knew that my crude template methodology would eventually bite me. In fact, it had already bitten me in that de-italicizing doesn't work inside most templates that italicize by default. I've often wondered why such templates should override formatting embedded in their text or, at least, my template-implemented formatting. DCDuring (talk) 00:54, 30 July 2017 (UTC)


 * At [[Lonicera japonica]] Hypernyms I found "subsect." instead of "subsect." DCDuring (talk) 00:41, 29 November 2017 (UTC)
 * Fixed. Problem with the pattern (regular expression). — Eru·tuon 01:18, 29 November 2017 (UTC)
 * Thanks for the prompt attention given. I'm in the middle of removing the two-apostrophe wikiformatting surrounding super-generic taxa that use and fixing other formatting problems while I'm at it. DCDuring (talk) 04:09, 29 November 2017 (UTC)
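
The italicization rule discussed in this thread (italicize the name parts, leave rank abbreviations such as "subsp." upright) can be illustrated with a small Python sketch. This is not the code of Module:italics, which is written in Lua and is not reproduced here; the function name `italicize_taxon` and the abbreviation list are assumptions made for illustration, based only on the ranks named above.

```python
# Rank abbreviations that stay upright inside an otherwise italic name.
# Assumed list, taken from the ranks mentioned in this thread; it is not
# the actual pattern used by Module:italics.
RANK_ABBREVIATIONS = {"subg.", "sect.", "subsect.", "subsp.", "ssp.", "var.", "f."}


def italicize_taxon(name):
    """Wrap a taxonomic name in wikitext italics, leaving rank abbreviations upright."""
    pieces = []  # finished chunks of output
    run = []     # consecutive words that share one pair of italic markers
    for word in name.split():
        if word in RANK_ABBREVIATIONS:
            if run:
                pieces.append("''" + " ".join(run) + "''")
                run = []
            pieces.append(word)  # abbreviation is emitted without italics
        else:
            run.append(word)
    if run:
        pieces.append("''" + " ".join(run) + "''")
    return " ".join(pieces)


print(italicize_taxon("Pinus nigra subsp. salzmannii"))
```

For the example raised at the top of the thread, this yields ''Pinus nigra'' subsp. ''salzmannii''. Virus names, where every rank is italicized, would need a separate switch and are not handled in this sketch.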

Certain ways of showing taxonomic names of hybrids
See [[Orchidinae]] for problematic display of nothogenera. The display contains a spurious " ' '" and is bold (linked problems). I haven't looked at nothospecies etc. yet, so don't undertake too much. DCDuring (talk) 16:30, 31 December 2017 (UTC)

Cultivar, variety
I believe you did the template correctly, but this template is insufficient. Apart from it not recognizing cv., var., conv., which can be added to the template code, it seems that cultivar names like Vitis vinifera ssp. vinifera Sultana syn. Sultanina should not be italicized, while variety and convariety names should be italicized, eg. Prunus domestica ssp. insititia var. pomariorum alias Prunus domestica ssp. insititia convar. pomariorum. In the end this template will have to be modulized. Fay Freak (talk) 09:49, 17 November 2020 (UTC)
 * Fortunately, these are few and are likely to remain few for quite some time. Estimates are that there are 8.7 million species of plants and animals (Fungi? Chromists? Protozoa? Bacteria? Archaebacteria? Viruses?). 1.2 million of the plant and animal species have been described. We have about 5,300 species from all 7 of these "kingdoms". I do not expect that we will have more than 100 cultivar names in the next decade in the normal course of things (ie, no contests or dares). DCDuring (talk) 21:20, 24 February 2024 (UTC)

Convert to Lua
Since the template is already invoking Lua to handle the italics, I don't think there's any performance or memory penalty for converting the whole function to Lua. Additionally, using Lua, it's not "expensive" to check the existence of the page, so it would no longer be necessary to manually remove taxlink when the page exists. See User:JeffDoozan/taxlink for a potential Lua-based replacement with lots of tests to verify that the old and new templates function identically. I've tried to add real usages of all possible parameter combinations, but please add any additional tests if you think it could be missing something. Using Lua opens the possibility of adding some more advanced features such as checking that Translingual exists on the target page, in case anything like that would be worthwhile. What do you think? JeffDoozan (talk) 17:26, 24 February 2024 (UTC)
 * Most importantly, I would like to keep the ability to compile lists of "wanted" taxonomic and English vernacular name pages ordered by the number of incoming links. Now, I run Perl scripts against the XML dumps from time to time and add the most "wanted", counting the number of taxlinks or verns for each name. The automatic formatting is fine to have, but I would still need a way to indicate which taxlinks were no longer "wanted" because the "wants" had been fulfilled. Yes, there are Latin and German capitalized noun pages that are homonyms for taxonomic names. I don't know about other languages. I should do something similar with epinew and epilang to make sure that the most common missing ones are added.
 * If I were to think big, I would also like to run taxlink items against accepted taxonomic names and synonyms from the Catalogue of Life to catch some of the spelling errors and changes of acceptance, both from 'accepted' to 'synonym' and vice versa. DCDuring (talk) 21:10, 24 February 2024 (UTC)
 * Are you saying that right now you use the existence of taxlink in the XML dumps as a signal that it's a "wanted" entry so if we start keeping taxlink around after the target entry exists, you won't be able to use that tool anymore? Did you write that script yourself? If so, could you adapt it to build a hash of all the page names that include ==Translingual== while it's scanning the XML and use that to ignore "completed" taxlinks? If not, I've already written a bunch of scripts that generate reports from every XML dump, I could easily write another one to generate a report of wanted taxlinks.
 * If we switch to the Lua template, it can verify that the target page exists and includes ==Translingual== or ==English==, which seems easier than doing something manually like epinew or epilang (I'm not familiar with those templates or the process of using them).
 * Validating taxlinks against the Catalog of Life dataset would be possible using offline scripts, since they offer downloads of the database. Thinking even bigger, it might be possible to use their database to detect italicized species names and apply taxlink so that all of our species links are nicely labeled. JeffDoozan (talk) 22:20, 24 February 2024 (UTC)
 * I count the instances of Taxonomic name to get "wanted" pages, ordered by number of wants. Similarly for vern. Once upon a time I could use Special:Wantedpages for a few, but now those pages are clogged with so many items that will remain red for this century. I have outlasted most of those who provided technical help, so I like something that is simple enough so that I can run the procedure myself with my very limited technical chops. DCDuring (talk) 22:51, 24 February 2024 (UTC)
 * That's reasonable. Do you have the technical chops to adjust your script to account for pages that exist? If not, would you be open to help adjusting your existing script so you can continue to run it yourself? I understand that this template is a central part of a very long-established process that you've successfully run for many years. It's amazing that there are 46,000+ taxlinks that you've almost single-handedly added. From the perspective of a relative newbie, it seems like it would be a win for the project if we can retain the information that you put into every taxlink, with the added bonus of saving you from having to manually remove it. JeffDoozan (talk) 23:22, 24 February 2024 (UTC)
 * As for validating against Catalog of Life: it has its uses, but every taxonomic source has its own version of what's accepted. There are taxonomic codes that spell out in great detail how to determine if a name is validly published, spelled correctly, etc., and how to decide which name has priority. Beyond that, it takes expertise in the taxonomy of the particular field to determine whether a particular name is describing the same taxon as another name, and advances in the science (especially the advent of molecular biology) have tended to overturn those judgments at an astonishing rate in recent decades. Comparing the taxonomy of Wikispecies and Wikipedia shows quite a bit of disagreement, and most of the reference works that give taxonomic names for specific terms are completely out of date in comparison to those two. I have just enough background and just enough references bookmarked to figure out what the outdated names are referring to most of the time, so I help out where I can. DCD doesn't have any background, but he still manages to come up with something useful and worthwhile. Chuck Entz (talk) 04:54, 25 February 2024 (UTC)
 * I only intend for CoL to be used for spell-checking and not even to be the definitive source for that. I have learned which taxonomic databases tend to be more authoritative and/or current: PoWO, WoRMS, LPSN, ICTV, MycoBank, ~MSW, ~NCBI. WP tends to be better than Wikispecies, but Wikispecies is more convenient for copypasta. DCDuring (talk) 16:02, 27 February 2024 (UTC)
 * Thank you for the clarification and explanation, Chuck. I should have known that with something as huge as taxon names that's been ongoing for so many years that it wouldn't be just DC doing all the work (just, as you say much of the hard work!). JeffDoozan (talk) 16:39, 25 February 2024 (UTC)
 * I would probably just rely on links being blue. I believe that 46K is the number of pages with taxlink. Many pages have more than one instance. There are about 96,000 distinct taxonomic names in instances of taxlink. I'm lucky if I get 50 entries added in a month. When others help, they invariably add stub entries and mostly for items with few incoming links. I try to improve our existing entries to make them less stubby, which is much more demanding than removing 500 instances of taxlink in a month. I try to add images, gender, etymology, hyponyms, hypernyms, derived terms (mostly for genera), links to external databases (not just sister projects), etc.
 * As 100K taxonomic names (of all ranks, all kingdoms) is a small percentage of the 1.2MM described species of plants and animals (no fungi, chromists, protozoa, archaebacteria, bacteria, and viruses), you can see that some prioritization is essential.
 * What might be useful would be to enclose all instances of taxonomic names that we have entries for (ie, that use taxon) in taxlink. It should be easy to extract the required info from the entries, even stubby ones. The instances of conflicting rank (Is a taxon a class, an order, a suborder, an infraorder, a clade, etc.?) would need to be manually reconciled. DCDuring (talk) 15:57, 25 February 2024 (UTC)
 * "I would probably just rely on links being blue" I'm not sure what you're referring to, can you clarify this?
 * By which I mean that, when it put the wikiformatted list on one of my user subpages, those items that already had entries would appear as blue links, so I would direct my attention to the red ones. DCDuring (talk) 20:03, 25 February 2024 (UTC)
 * "What might be useful would be to add enclose all instances of taxonomic names that we have entries for (ie, that use taxon) in taxlink. It should be easy to extract the required info from the entries, even stubby ones." This is interesting, can you give me an example of what this would look like? JeffDoozan (talk) 16:39, 25 February 2024 (UTC)
 * For each taxonomic name for which we have an entry, taxon and the entry it appears on have the required information to (re)generate taxlink (or a simpler format-only version). For each such entry PAGENAME has parameter 1, the taxonomic name, and parameter 1 of taxon has what is needed for parameter 2 of taxlink. Once the properly parameterized instance of taxlink is created, it can replace all instances of the taxonomic name, whatever their formatting or linking.
 * For the vast majority of pages with taxon there is only one instance of taxon. For most of those that have more than one, there is probably a real ambiguity that requires manual attention. The ambiguities are usually one of three kinds:
 * The taxon has been used with different ranks, eg, class, subclass, order, but consisting of at least approximately the same organisms, requiring a decision as to what its default presentation should be.
 * The taxon has been applied to different sets of organisms, eg, plants, animals, fungi, protists, chromists, bacteria, archaebacteria, viruses.
 * Two or more instances of taxon referring to the same set of organisms differ only in where they are placed (eg, different families or a family vs. order), but have the same 'rank'. This case may be addressed in code where the instances of taxon appear in different subsenses, but it is almost certainly true that this would not be worth the extra coding.
 * Another complication is that all taxa, of any rank, in Archaebacteria, Bacteria, and Virus are italicized. At present, I am not sure that we have systematically presented content in all of these. In principle, there should be a named parameter in taxon that is set as "i=1" for all of these. I could make an effort to add "i=1" where it is needed using wording in the definition or in hypernyms or hyponyms. DCDuring (talk) 20:03, 25 February 2024 (UTC)
 * So, just to take a random example, the page Capra aegagrus hircus contains a single taxon: subspecies. Using that data we can generate taxlink (or equivalent) as Capra aegagrus hircus and then use that to replace the text "Capra aegagrus hircus" (with or without italics or links) anywhere it appears in the mainspace? This sounds almost too good to be true! With over 20,000 pages containing taxon, even after discarding any pages with multiple taxons, this should make it relatively painless to re-apply many of the taxlinks. What exactly should be done in the case of Archaebacteria, Bacteria, and Virus? Can they use taxlink as-is, or would it need to be expanded with something like an i=1 flag? JeffDoozan (talk) 21:37, 25 February 2024 (UTC)
 * That is how it could work for the most common cases. Archaea, Bacteria, and Viruses need i=1, but the current version of taxlink doesn't do anything with that parameter. Though in principle we could derive the proper italics from data that would appear in a full taxonomic entry, many of the entries lack the key data, so my manually adding "i=1" to taxon for those kingdoms is quicker than completely filling out the entries. I have been working on some of the virus and archaea entries to make sure that they have i=1 in taxon and taxoninfl (the headword template). I have begun working on bacteria with the same objective, but came across some major name changes that have slowed the process. There are only a couple of hundred more to work on on my current list, but there may be an equal number that are not on that list and would take longer to identify. Not having the italics would be sad for the higher taxa in these three kingdoms, but would not take long to correct manually. DCDuring (talk) 02:24, 26 February 2024 (UTC)
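
The "wanted" list workflow described earlier in this thread (count taxlink instances per name, then skip names whose page already has a ==Translingual== section) can be sketched in Python as follows. This is not the Perl script mentioned above, which is not shown on this page. The function name `wanted_taxa` and the input shape (a dict of page title to wikitext, as might be built while scanning an XML dump) are assumptions for illustration.

```python
import re
from collections import Counter

# Matches {{taxlink|NAME|...}} and captures NAME (parameter 1).
TAXLINK_RE = re.compile(r"\{\{taxlink\|([^|{}]+)")
# Matches a level-2 "Translingual" heading on a line of its own.
TRANSLINGUAL_RE = re.compile(r"^==\s*Translingual\s*==\s*$", re.MULTILINE)


def wanted_taxa(pages):
    """pages maps page title -> wikitext; returns (name, count) pairs, most wanted first."""
    # Titles that already have a Translingual section count as fulfilled "wants".
    fulfilled = {title for title, text in pages.items() if TRANSLINGUAL_RE.search(text)}
    counts = Counter()
    for text in pages.values():
        for name in TAXLINK_RE.findall(text):
            name = name.strip()
            if name not in fulfilled:
                counts[name] += 1
    return counts.most_common()


# Illustrative data only; these page texts are made up for the example.
sample_pages = {
    "dogwood": "{{taxlink|Cornus sericea|species}} and {{taxlink|Cornus|genus}}",
    "osier": "{{taxlink|Cornus sericea|species}}",
    "Cornus": "==Translingual==\n===Proper noun===\n",
}
print(wanted_taxa(sample_pages))
```

With this approach taxlink could stay in place after the target entry is created, since fulfilled names drop out of the count on their own.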

As of the 2/20/2024 XML dump, there are 20,836 taxons on 20,160 different pages. 626 pages have two taxons, and no pages have more than two. The only taxon that appears outside of English or Translingual is ꠍꠁꠔꠦ ꠝꠞꠣ. Here's a list of all the pages with multiple taxons in case that's helpful for you. And here's a long list of all taxons, with the regenerated taxlink and a large sample of the ~38,000+ proposed string replacements where the bot can replace an existing string with a taxlink. This is still a work in progress as it currently matches taxons inside filenames and other places where we don't want to make replacements.
 * Thanks a lot. I will try to take a look after I get back from a dental visit today. DCDuring (talk) 15:45, 27 February 2024 (UTC)
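
The regeneration step discussed in this thread (the page name supplies parameter 1 of taxlink, and parameter 1 of taxon supplies parameter 2) can be sketched as below. This is only an illustration, not the bot's actual code; `regenerate_taxlink` is a hypothetical name, and the extra parameters in the sample taxon call are made up. Pages with zero or several taxon calls are skipped, matching the suggestion that ambiguous pages need manual attention.

```python
import re
from typing import Optional

# Captures parameter 1 (the rank) of each {{taxon|...}} call.
# The trailing pipe keeps {{taxoninfl|...}} from matching.
TAXON_RE = re.compile(r"\{\{taxon\|([^|{}]+)")


def regenerate_taxlink(page_title: str, wikitext: str) -> Optional[str]:
    """Build a taxlink call for a page, or return None if the page is ambiguous."""
    ranks = [rank.strip() for rank in TAXON_RE.findall(wikitext)]
    if len(ranks) != 1:
        return None  # no taxon, or more than one: leave for manual review
    return "{{taxlink|%s|%s}}" % (page_title, ranks[0])


# Sample entry text is illustrative, not copied from the real page.
sample = "{{taxoninfl|i=1}}\n# {{taxon|subspecies|species|Capra aegagrus}}"
print(regenerate_taxlink("Capra aegagrus hircus", sample))
```

The resulting string could then serve as the replacement for plain, italicized, or linked occurrences of the same name elsewhere.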

For the proposed replacements, I limited it to taxons that contain a space to avoid matching terms that are probably not taxons like This, Paris, Arizona and Satan. I'm not sure if there's a good way to determine whether or not to convert a single word to a taxlink - maybe only if it's already linked and in italics and is on a line containing specific keywords? I haven't reviewed the proposed replacements to see if there are any bad two-word combinations that shouldn't be replaced with taxlink.
 * What might work in many circumstances is to detect which capitalized single-word candidate terms have only a Translingual L2. DCDuring (talk) 15:45, 27 February 2024 (UTC)
 * @User:JeffDoozan. I finally looked at your proposed replacement. You apparently selected parameter 2 instead of parameter 1. That is:
 * Rosa canina
 * species
 * should yield Rosa canina, not Rosa canina. DCDuring (talk) 23:43, 27 February 2024 (UTC)
 * Good catch. All of the previous links are now updated with the correct data. JeffDoozan (talk) 01:28, 28 February 2024 (UTC)
 * This is a good idea. I included single-word taxons and filtered out taxons with multiple L2s so there are now ~45,000 proposed replacements. I updated the sample with the new results. JeffDoozan (talk) 21:31, 27 February 2024 (UTC)
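
The single-word filter suggested above (accept a capitalized one-word candidate only if its page has a Translingual L2 and nothing else) might look like the following sketch. The function names are hypothetical and the sample page texts are invented for the example.

```python
import re

# A level-2 heading is a line of the form ==Name==; the [^=] guard rejects
# level-3 headings such as ===Noun===.
L2_RE = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def language_sections(wikitext):
    """Return the names of all level-2 (language) sections on a page."""
    return [name.strip() for name in L2_RE.findall(wikitext)]


def is_translingual_only(wikitext):
    """True when Translingual is the page's one and only language section."""
    return language_sections(wikitext) == ["Translingual"]


# Invented page texts: a name with other language sections is rejected.
paris_like = "==Translingual==\n===Proper noun===\n\n==English==\n===Proper noun===\n"
genus_like = "==Translingual==\n===Proper noun===\n"
print(is_translingual_only(paris_like), is_translingual_only(genus_like))
```

This rejects pages like Paris that carry other language sections, at the cost of also skipping genuine taxa that happen to share a page with a homonym.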

What should taxlink do if i=1? Skip the call to italics.i and just italicize the entire string?
 * AFAIK at this time there are no non-italicized elements in accepted taxonomic names above the rank of genus, except for those in Virus, Bacteria, and Archaebacteria. DCDuring (talk) 16:07, 27 February 2024 (UTC)

You mentioned earlier that you add a lot of hyper/hyponyms. Would some sort of taxhyper and taxhypo templates be helpful? If so, how should they work? JeffDoozan (talk) 22:38, 26 February 2024 (UTC)
 * Possibly. If entry-existence testing is really cheap, some repetitive typing could be eliminated. Even if it is expensive, 'subst'ing such a template would usually save keystrokes and only be expensive once, not every time the entry page is loaded. DCDuring (talk)
 * I believe checking entry-existence is cheap enough that we can use it without worrying about memory or speed. I tested a page with > 5,000 unique Lua taxlinks referencing pages that exist (which I think is a slightly more expensive check than pages that don't exist, since it returns some data about the page) and it worked fine. If it turns out to be too expensive on certain pages, it's easy to adjust. It may even be possible to check existence or do other aggressive validation only on the page preview and not on the live page, which could be helpful. JeffDoozan (talk) 21:31, 27 February 2024 (UTC)
 * I have gone through (all/most/many?) of the entries in the "kingdoms" Archaea, Bacteria, and Virus and included "i=1" in taxoninfl in all of those that ought to be italicized, but would not ordinarily be done by taxlink, ie, those above the rank of genus. It was something that needed to be done just to fix the entries. There are obsolete/archaic/dated "taxa" that are arguably for archaea, bacteria, and viruses (note lower case). As they apparently predate the official codes for these kingdoms, italics should not apply. If I am wrong, it wouldn't be too hard to correct the dozen or so entries of this type.
 * This means that taxoninfl is a rather reliable indicator of suprageneric taxa that should be italicized. DCDuring (talk) 23:30, 27 February 2024 (UTC)


 * @User:JeffDoozan
 * "Unsafe": Any link from a Translingual-only entry to one of these is very, very, very likely to be safe. There is a possibility that links originating in an Etymology section of a taxonomic entry could be to a proper noun (proper noun of a personal name (any language), placename (any language), or name of classical Latin individual, family, deity) or to a German noun.
 * Multiple taxons: Since what we most need to do is get italics right, any of these where all of the taxons in a given entry show only generic or subgeneric ranks can be assigned the highest rank, usually 'genus'. This is particularly true where the taxons appear as subsenses and have the same rank (parameter 1). Similar rules can apply where the link originates from a Translingual L2 that is only Archaea, Bacteria, or Virus (where i=1 is appropriate) or where there is no Archaea, Bacteria, or Virus (where i=1 would be wrong). These latter things may be too complicated at this time. DCDuring (talk) 15:54, 28 February 2024 (UTC)
 * @User:DCDuring
 * Good idea, matching unsafe taxlinks in all non-etymology sections of Translingual entries successfully adds a taxlink for Paris on lutetianus plus 90 other fixes that would otherwise have been missed.
 * Can you give me a list of what strings are generic or subgeneric ranks, ordered from highest to lowest? For reference, all of the existing ranks in our taxons (as of 2/20 data export) are "clade; class; cohort; division; epifamily; family; form; form classification; form genus; genus; grandorder; group; ichnogenus; informal group; infraclass; infradivision; infrakingdom; infraorder; infraphylum; kingdom; magnorder; morph; morphological group; nothogenus; nothospecies; oogenus; order; parvorder; phylum; section; series; serovar; species; strain; subclass; subdivision; subfamily; subgenus; subgroup; subkingdom; suborder; subphylum; subsection; subspecies; subtribe; superclass; superfamily; supergroup; superorder; superphylum; supertribe; taxon; tribe; variety"
 * The ordering is: genus, subgenus, section, subsection, species, subspecies, variety. What matters most are genus, subgenus, section, and subsection because some of our entries have them as one-part short forms where they can appear on the same page as a genus name. DCDuring (talk) 02:44, 29 February 2024 (UTC)
 * I think we're close to being able to run this. There will be a new data export this weekend that will include all of your recent i=1 work, so the generated taxlinks will have all of that information. Can we use taxlink, or is it better for your process to use a new template? If you'd prefer a new template, what would you like it to be named? JeffDoozan (talk) 22:19, 28 February 2024 (UTC)
 * It would be simpler for me if we used a new template that differed in only a few characters from taxlink, eg, taxfmt. You see, the idea that I wouldn't be visiting entries that have taxlink as the taxonomic name enclosed in the template is made an entry is not realistic. I don't want to have to run all instances of taxlink against all instances of taxon and/or taxoninfl whenever I am generating a frequency-weighted list of "wanted" taxonomic names. The creating of new taxonomic entries for items that are "taxlinked" is a kind of special-purpose watchlist that bypasses the capacity problems of regular watchlists. DCDuring (talk) 02:44, 29 February 2024 (UTC)
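
The rule proposed in this thread for pages with multiple taxons (when every rank on the page is generic or lower, assign the highest one) can be sketched using the ordering given above. `default_rank` is a hypothetical helper; returning None for any rank outside the ordering is an assumption that leaves such pages for manual reconciliation.

```python
# Ordering from highest to lowest, as listed in this thread.
RANK_ORDER = ["genus", "subgenus", "section", "subsection",
              "species", "subspecies", "variety"]


def default_rank(ranks):
    """Return the highest rank if all ranks appear in RANK_ORDER, else None."""
    if not ranks or any(rank not in RANK_ORDER for rank in ranks):
        return None  # empty, or includes a rank above genus: needs manual review
    # The smallest index in RANK_ORDER is the highest rank.
    return min(ranks, key=RANK_ORDER.index)


print(default_rank(["section", "genus"]))
print(default_rank(["genus", "family"]))
```

Since all ranks in this ordering are italicized, picking the highest one is mainly a way to get a single, consistent second parameter for the generated template call.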

I switched taxlink to Lua and, as a stress-test, previewed Ixora with the HTML comments removed from the big list of taxlinks. Even with 532 taxlinks, it renders at the same speed and, more importantly, works fine even with > 500 taxlinks on the same page. I also created taxfmt and applied it to tinami, petrello, plum, 癩菌, and duck. Please review those diffs and let me know if you see anything that should be adjusted before I apply taxfmt to more pages. JeffDoozan (talk) 16:48, 2 March 2024 (UTC)
 * At tinami you did not apply taxfmt to 7 taxa that were genera, all seemingly unambiguous cases.
 * At duck you missed Dendrocygna.
 * At 癩菌 you applied a redundant "i=1".
 * The other two seemed fine as far as your work is concerned, but, as usual, there are plenty of other things that need changing, like missing parentheses (by my lights, not policy) and missing vern templates for the numerous redlink vernacular names at tinami. No automatable remedy for that short of mining WP, Wikispecies, Wikidata. And polysemy is common in 'vernacular' names. DCDuring (talk) 01:48, 3 March 2024 (UTC)
 * @User:JeffDoozan Have I neglected to answer any questions that you've asked? Do you have new ones? I like your tests and I can't think of other good ones, but I'll not be too surprised if we uncover a good number (but a small percentage) of problems. There do seem already to be things not formatted as I think they should be, like the peculiar fragment in one of the last names in tinami. There was also a missing space before a following sp. or spp., and that's just in the five entries so far.
 * BTW, I really like that we have picked up a lot of unlinked taxa. I also really look forward to, in effect, spell-checking our taxa against Catalogue of Life and applying taxlink to any instances of those that are not linked. I suspect that my big list of taxlinked taxa has a pretty good number of no-longer-accepted synonyms as well as misspellings and misformattings, especially among the unique ones. I regularly find such in reviewing entries, often derived from old dictionaries. DCDuring (talk) 03:57, 3 March 2024 (UTC)


 * Thank you for the very careful review of the edits. I didn't expect you to also check for places where it didn't make changes or I would have picked smaller pages. Since this is going to make a lot (56,000+) of replacements, I made the matching very, very conservative and will progressively loosen it on multiple runs. The missing fixes on duck and tinami were part of an overabundance of caution on my part where it would discard any potential 'short' matches if a 'longer' match existed on the page: ie, it would never replace Dendrocygna if Dendrocygna guttata also exists on the page. With careful matching of balanced opening/closing italics and brackets, this is no longer needed.
 * On the first run, it won't make any replacements inside templates,  links, html comments or nowiki tags. It will also not make a replacement if the "matched" text contains an unequal number of opening and closing brackets, or if the number of "'" at the start of the match does not equal the number of "'" at the end of the match. Rejected matches are logged here.
 * After the first pass fixes the "easy" matches and nobody complains that it broke their pages, I'll run it again to make replacements inside a manually curated list of templates, but only when it is replacing an italicized link.
 * If that goes well, I'll let it make replacements inside the allowed templates when it's replacing text that is either linked or italicized. This should still be pretty conservative, but has the potential to cause unexpected problems, so I'll pay closer attention to these matches.
 * Finally, when it's down to a number of matches that can be manually reviewed, I'll let it make replacements inside any template when what it's replacing is an italicized link.
 * After that, I'll rely on you to let me know if there are other places where it should make automatic matches.
 * The current list of allowed templates is . Matches inside these templates are not included in the error report as rejected matches. If you look through the error report and notice other templates where it should always be safe to make replacements, please let me know. Likewise, if you think any templates on my list might not be safe for unsupervised replacements, please let me know.
 * Questions:
 * Did the bot introduce formatting mistakes in the sample edits? I don't see anything related to the peculiar fragment in one of the last names in tinami. There was also a missing space before a following sp. or spp in the bot's edits.
 * I misremembered: it was at petrello. DCDuring (talk) 15:37, 3 March 2024 (UTC)
 * I still don't see any errors in the bot's edit on petrello. I can see that you made some edits afterwards to add taxlink where the bot didn't match because "sp." or "cf." was inside the italicized text, and if there are easy rules for handling that, I can add them to the bot. Is there anything wrong about the changes the bot made that I need to fix before running it on more pages? If not, I'll run it on a larger sample of 50-100 pages and we can look through that for any other concerns. JeffDoozan (talk) 17:03, 3 March 2024 (UTC)
 * They were just instances where the bot didn't do everything I would do when editing. Locating and correcting various bits of non-conformity would be useful to one trying to bring these to a uniform high standard. DCDuring (talk) 21:27, 3 March 2024 (UTC)
 * On 癩菌 it applied "i=1" because the taxon on Mycobacterium leprae contains "i=1". Is there additional logic it needs to use to decide whether or not to use "i=1"?
 * There is no harm from leaving it for species or genus, except that it provides a bad model for other entries. If parm2 is genus or lower (genus, subgenus, section, subsection, species, subspecies, form, variety), then "i=1" is unnecessary. If parm2 is subgenus, section, subsection, species, subspecies, form, or variety, then it is potentially harmful, as "i=1" is supposed to override the formatting logic that is rank-dependent. DCDuring (talk) 15:37, 3 March 2024 (UTC)
 * I've fixed this: it will no longer add "i=1" if parm2 is subgenus, section, subsection, species, subspecies, form, or variety. JeffDoozan (talk) 17:03, 3 March 2024 (UTC)
 * The same thing applies to genus, though there is no risk of outright harm. I'm sure that I will be occasionally searching for instances of presence or absence of "i=1" and don't need unnecessary distractions or exclusions. DCDuring (talk) 21:27, 3 March 2024 (UTC)
 * Good catch, I added "genus" to the list of exclusions. JeffDoozan (talk) 22:02, 3 March 2024 (UTC)
 * Is it correct that when "i=1" is passed to taxlink and taxfmt, the templates should just blindly add  to the start and end of the text provided, without needing to call Module:italics and without doing any validation or making any other changes to the provided string?
 * Yes, which explains previous answer. AFAICR, "i=1" is necessary only for suprageneric ranks in Archaea, Bacteria, and Viruses. DCDuring (talk) 15:37, 3 March 2024 (UTC)
 * Are there any cases where  should exist, or can I have it delete the "wrapping" l template when completely replacing the contents of l? Does the same answer apply to m? Are there any other templates that can be removed if they're just wrapping taxfmt?
 * I can't think of a case where it should appear within l. If m were only used to convey 'mention' vs. 'use', which was often the norm, then it would be necessary to allow m to wrap around taxfmt and taxlink. The current discussion (BP?, TR?) shows that folks don't count on it as anything but a formatting tool. The logic behind italics for taxonomic names is that they are supposed to contrast with the surrounding matrix of text. (As suprageneric names were often beyond the reach of the taxonomic codes, they were and are not italicized. The newer codes, for Viruses and Prokaryotes, do apply to higher ranks.) Mentions are also supposed to supply contrast with the matrix text. So in cases where our normal formatting of the matrix puts it in italics, eg, ux, syn, and a to-be-contrasted taxonomic name is embedded within, then it should, in principle, not be italicized. I don't really see how we can follow that rule, especially since many such instances would have relatively little (or no) matrix. IOW, no m wrapper either. I get a headache trying to think through all the other templates, where my own practice has been inconsistent. DCDuring (talk) 15:37, 3 March 2024 (UTC)
 * JeffDoozan (talk) 14:46, 3 March 2024 (UTC)
 * @User:JeffDoozan I appreciate how careful and incremental your approach is. I think we are ready for your proposed next implementation of 50-100. DCDuring (talk) 21:27, 3 March 2024 (UTC)
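Two of the rules settled in the exchange above can be sketched in Python: the first-pass rejection test (unequal bracket counts, or a different number of "'" at the start and end of the match) and the decision on whether to add "i=1". This is an illustration under assumptions, not the bot's actual code: the function names are hypothetical, and treating "brackets" as the pairs `[]`, `{}`, and `()` is a guess at what the bot counts.

```python
# Ranks for which the thread agreed "i=1" should not be added.
NO_I1_RANKS = {
    "genus", "subgenus", "section", "subsection",
    "species", "subspecies", "form", "variety",
}

# Assumption: these are the bracket pairs the bot compares.
BRACKET_PAIRS = (("[", "]"), ("{", "}"), ("(", ")"))


def is_rejected_match(text):
    """Return True if a matched span fails the first-pass safety checks."""
    # Reject if any bracket type has unequal opening and closing counts.
    for open_ch, close_ch in BRACKET_PAIRS:
        if text.count(open_ch) != text.count(close_ch):
            return True
    # Reject if the run of apostrophes at the start differs in length
    # from the run of apostrophes at the end.
    leading = len(text) - len(text.lstrip("'"))
    trailing = len(text) - len(text.rstrip("'"))
    return leading != trailing


def should_add_i1(rank, taxon_has_i1):
    """Add "i=1" only if the taxon entry has it and the rank is not excluded."""
    return taxon_has_i1 and rank.strip().lower() not in NO_I1_RANKS
```

For example, `is_rejected_match("''Dendrocygna")` is true because the italics are not closed, while `should_add_i1("species", True)` is false because species is on the exclusion list.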

50 random pages

 * @User:DCDuring Ok, I ran the bot on 50 random pages: 사탕수수, aho, ajao, anyphaenid, arctiídeo, bronze-winged jacana, bulletwood, clearwing, crowfoot, Cuculus, dromaeognathous, euonymus, giải, go the way of the dodo, goldenberry, Hare Bay, härkägemssi, horsehair worm, huitti, jawar, kaukázusi nyírfajd, leszczyna, lobeliaceous, madrone, manteldier, Metriopelia, mogyoróhagyma, monkfish, pelick, pionbìn, piroterio, pokomk, pteridaceous, pyyjuoksija, regnbågsforell, rimoportulae, sandsnäppa, santalaceous, sarcastic fringehead, sigillaria, Sternanis, taklobo, törö, tragulídeo, typhi, zygenid, корюшка, чернуха, اغو آغاجی, قنبر. I ran this in "aggressive" mode so it replaced some template and File: parameters (never file names, only parameters). I carefully reviewed each edit before letting the bot make it and I didn't see anything suspicious, but if you have time, please double check a handful of those at random and let me know if you see any problems. If not, I can run the first pass of easy fixes. Note: this doesn't make any changes inside *nym sections, pending discussion of whether or not it's reasonable to make some more specialized templates. JeffDoozan (talk) 22:02, 3 March 2024 (UTC)


 * Right off the bat: 사탕수수 - It categorizes the taxon as English. But all taxa are Translingual. Should be easy to fix in the module, but it looks bad for both of us to make such a rookie mistake. Especially for someone who is using the orange-link gadget. DCDuring (talk) 02:20, 4 March 2024 (UTC)
 * aho - We should have the confidence to remove the now-redundant wikitext italics that surround taxfmt as in the Tokelauan L2: 
 * I made it bolder in removing surrounding italics when enclosed in parentheses and in lists . JeffDoozan (talk) 17:16, 4 March 2024 (UTC)
 * bronze-winged jacana -  did not get taxfmt DCDuring (talk) 02:45, 4 March 2024 (UTC).
 * Cuculus None of the hyponyms, all of them blue links, got taxfmt: Cuculus clamosus, Cuculus crassirostris, Cuculus gularis, Cuculus lepidus, Cuculus micropterus, Cuculus optatus, Cuculus rochii, Cuculus saturatus, Cuculus solitarius
 * The other six of the first ten seemed OK. DCDuring (talk) 03:00, 4 March 2024 (UTC)
 * Metriopelia - none of the taxa in Hypernyms (Columbinae) or Hyponyms Metriopelia aymara, Metriopelia ceciliae, Metriopelia melanoptera, Metriopelia morenoi got taxfmt
 * - It was deliberately ignoring everything in the *nyms sections in case we came up with a better template for them, but on second thought it's better to just replace everything now. If we come up with a good template, it'll be easier to replace a list of taxlink and taxfmt templates than wikitext.
 * Sternanis OK. IOW: 7/12, really 8/12 good; no dealbreaking blunders.
 * I've looked at 12/50. I'll try to look at more tomorrow AM. DCDuring (talk) 03:12, 4 March 2024 (UTC)
 * Thank you for the careful review. I think I've fixed everything you mentioned so far. Is the m replacement in the Etymology of safe? Let me know if you see anything else that should be adjusted or if you'd like me to run this on some more pages. JeffDoozan (talk) 17:16, 4 March 2024 (UTC)
 * I've been looking just now.
 * In huitti "Zapornia." did not get taxfmt. Was it the "."?
 * rimoportulae only one taxon formatted inside quotation template, of several present, but not linked.
 * pteridaceous you hadn't formatted within suffix. I added taxfmt manually and it worked fine.
 * Every other one seemed OK. DCDuring (talk) 21:02, 4 March 2024 (UTC)
 * Zapornia is a redlink, likewise Synedra, Nanofrustulum, Opephora, Staurosirella and Staurosira in rimoportulae. I think they would have to use taxlink, not taxfmt so they get counted in your reports.
 * I'll add suffix to the list of allowed templates. I removed the quote-* templates from the safe list, because I don't think we want to match inside titles, only passages, and I need to figure out how best to handle that.
 * I cleaned up and filtered the list of rejected matches and it's pretty useful now for finding spots where we can try to make the bot smarter and also for finding spots that might need some human editing.
 * Here are 20 more edits on pages with 8-10 fixes per page: acarus, alligator, ash, ber, cinnamon, elk, grape, man-of-war, orange, primrose, rail, roach, robin, saffron, squid, swan, violet, wax, white, wood. JeffDoozan (talk) 02:58, 5 March 2024 (UTC)
 * I keep on wishing that we would be already getting to the point of adding taxlink, not just to linked instances of redlinked taxa, but also unlinked ones. DCDuring (talk) 12:51, 5 March 2024 (UTC)
 * I'm not looking at these very carefully any more, because I haven't found much wrong in the earlier ones. DCDuring (talk) 13:46, 5 March 2024 (UTC)
 * If you're getting bored of looking at good edits, that's a good sign that we're ready to go. The first pass will make 98,787 replacements on 53,535 pages, replacing linked and/or italicized text and making wholesale replacements of some templates like tax name but otherwise avoiding any changes in sensitive areas like templates, comments, nowiki tags, Image:/File: links. I'm going to do a test run of 5,000 pages, which should be enough to catch someone's attention just in case it breaks something. If nobody complains in the next few days, I'll let it do the rest of the first pass and then repeat the same procedure for the second pass.
 * If you have any clever ideas for converting redlinks or other text to taxlink, it wouldn't be too hard to adapt the bot. It just needs a good list of taxon names and ranks. We might also be able to do something similar with vern. JeffDoozan (talk) 15:01, 5 March 2024 (UTC)
 * Extracting a list of redlinks and unlinked terms for names with initial caps; not in German, or Latin L2s; and surrounded by "''" (wikitext for italics) might generate more taxonomic names. That list might still require manual review, but together with our existing list of taxa enclosed in taxlink and taxfmt should give us a good list for scanning through all taxonomic Hypernyms templates; definitions; etymology, hyponyms, hypernyms, coordinate terms, and usage notes sections for italicized names. The third section (viewing "|" as section delimiter) of file/image wikitext is another good place to look for untemplated taxonomic names for a master list. That third section of file/images should have a good number of taxa needing taxfmt. DCDuring (talk) 18:19, 5 March 2024 (UTC)
 * Adding taxlink using the data from existing taxlink parameters is easy and results in about 4,000 fixes. See Euphorbioideae, Heterokontophyta, Cynodontia, millet, languri, wheat midge, Penaeidae, curruca, cefalofo, pino for a sample.
 * Will check soon.
 * Extracting redlinks, italicized words, and italics/bold items in the third section of all file/images that start with a capital letter and occur outside of "Latin" and "German" entries and don't match the name of a taxon referenced by an existing taxlink generates 63,000+ possible taxons. If you can use that to make a list of names and ranks, I can have the bot apply taxlink to them automatically. JeffDoozan (talk) 15:27, 6 March 2024 (UTC)
 * I'll have to work on that later.
 * User:JeffDoozan/lists/bad taxlinks lists some existing taxlinks that might need manual cleanup. JeffDoozan (talk) 15:47, 6 March 2024 (UTC)
 * Love the bad taxlinks list. I have marked a good number as correct as is (marked with !) and others are stricken as corrected. There are several names of the form Genus (Subgenus). Those are correct, but the templates should format them with italics around both Genus and Subgenus, but not the parentheses. Also, there seem to be some names that must be typed using a font that looks like a standard ASCII, but is not. DCDuring (talk) 18:49, 6 March 2024 (UTC)
 * I have gone through both bad taxlinks lists. I have corrected ones with some kind of input error. Where there are two ranks: there are a few DO NOT USE, there are more where I say you should assume the rank given. The ones marked with ! are correct. DCDuring (talk) 00:49, 7 March 2024 (UTC)
 * @User:JeffDoozan You seem to have been adding taxlink where it should be taxfmt. [Or so it seemed.] And the categorization into Category:Entries with redundant template: taxlink isn't working in taxlink. Seems OK; I might have been hallucinating. DCDuring (talk) 03:14, 7 March 2024 (UTC)
 * Glad it was just a false alarm! I took the opportunity to fix a bug where User: pages were being incorrectly classified into Category:Entries with redundant template: taxlink. AFAIK, everything should be working as expected now. JeffDoozan (talk) 04:33, 7 March 2024 (UTC)
 * @User:JeffDoozan At zizania there are no instances of taxlink, but 2 of taxfmt. Among the hidden categories is Category:Entries using missing taxonomic name (genus), which should only be generated by taxlink. DCDuring (talk) 15:43, 7 March 2024 (UTC)
 * Only taxlink will categorize; taxfmt now does nothing more than format the text. JeffDoozan (talk) 16:27, 7 March 2024 (UTC)
 * Thanks. We will have to give some thought to useful/essential categories for taxfmt. One thing I have already seen is use of taxfmt where there is no Wiktionary entry. I have seen it from a non-bot contributor! This should, at least, put the entry into a category like Category:Entries with template taxfmt that should have taxlink instead (Better name welcome). DCDuring (talk) 16:49, 7 March 2024 (UTC)
 * Really a filter would be better, but I don't know whether a filter can handle that kind of entry/L2 existence test. DCDuring (talk) 16:52, 7 March 2024 (UTC)
 * Do you mean an edit filter, or some sort of filter in the template itself? I don't know much about the former, but I suspect parsing the template parameters and validating the existence of a referenced page goes beyond what edit filters can do. JeffDoozan (talk) 23:07, 7 March 2024 (UTC)
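The check discussed above (taxfmt used where no Wiktionary entry exists, so taxlink would be the better choice) could also be done offline against a data export rather than in an edit filter. The following is a minimal sketch under stated assumptions: the taxon name is taken to be the first positional parameter of taxfmt, `existing_entries` is a hypothetical set of page titles known to have a taxonomic entry, and the function name is made up for illustration.

```python
import re

# Assumption: taxfmt is invoked as {{taxfmt|Name|rank|...}} with the
# taxon name as the first positional parameter and no nested templates.
TAXFMT_RE = re.compile(r"\{\{\s*taxfmt\s*\|\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def taxfmt_names_without_entry(wikitext, existing_entries):
    """List names wrapped in taxfmt that have no entry in `existing_entries`."""
    names = [match.group(1) for match in TAXFMT_RE.finditer(wikitext)]
    return [name for name in names if name not in existing_entries]


# Usage: pages returning a non-empty list are candidates for a cleanup
# category or for switching the template to taxlink.
sample = "{{taxfmt|Zizania|genus}} and {{taxfmt|Zizania aquatica|species}}"
missing = taxfmt_names_without_entry(sample, {"Zizania"})
```

A real check would also need to confirm that the existing page has a Translingual section, which a plain title lookup like this one does not do.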