User talk:Pengo/Archive 0

--Connel MacKenzie 19:16, 1 December 2006 (UTC)

User:Pengo/Latin/Top 1000
Ummm... When you say "Interlingua", do you mean "Translingual"? Interlingua is a constructed language like Esperanto; Translingual means "used in many languages". --EncycloPetey 03:11, 6 September 2011 (UTC)
 * Oops you're right. I must have seen it on some entries and gotten mixed up. Pengo 03:14, 6 September 2011 (UTC)

I also notice that you're not grouping forms of the same word, so that, , and are listed separately, as are  and  or  and. I wonder if it would be better to organize the list to group these, since (1) it will make entry creation simpler for people using the list, and (2) epithets can change from one form to another when new combinations result from synonymy or splitting (if the new genus is of a different grammatical gender). --EncycloPetey 03:21, 6 September 2011 (UTC)
 * Hmm.. Might be tricky. I'll see what I can do. --Pengo 03:49, 6 September 2011 (UTC)
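
One rough way to do the grouping suggested above is to key each epithet on its stem after stripping the ending. This is only a minimal sketch in Python, assuming that just the -us/-a/-um adjective forms need merging; third-declension pairs such as -is/-e, and nouns that merely happen to end in -a, would need extra handling. The function names are made up for illustration.

```python
from collections import defaultdict

# Endings of first/second-declension adjectives (masculine, neuter, feminine).
ENDINGS = ("us", "um", "a")

def stem(epithet):
    """Strip a -us/-um/-a ending; return the word unchanged if none matches."""
    for ending in ENDINGS:
        if epithet.endswith(ending) and len(epithet) > len(ending) + 1:
            return epithet[: -len(ending)]
    return epithet

def group_forms(epithets):
    """Map each stem to the sorted list of forms that share it."""
    groups = defaultdict(set)
    for word in epithets:
        groups[stem(word)].add(word)
    return {s: sorted(forms) for s, forms in groups.items()}

groups = group_forms(["apiculatus", "apiculata", "apiculatum", "luctuosa", "gondii"])
```

Here the three forms of apiculatus end up under one key, while gondii is left alone because it has none of the listed endings.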


 * Hi there. I've had a go at User:Pengo/Latin/Top 100. You might like to rebuild it sometime (ideally without the colours!). Cheers. SemperBlotto 17:23, 14 December 2011 (UTC)


 * Thanks so much! Pengo 04:44, 15 December 2011 (UTC)

apiculatus (apiculata)
Hi there. I think that this must be correct, but there is just the outside chance that it is related to apicula: (a little bee). The images that I have found look more like hats than bees though. Cheers. SemperBlotto 16:42, 21 December 2011 (UTC)


 * Seems right from my evidence: "Abrupt, short point". Full reply on your talk. Pengo 20:24, 21 December 2011 (UTC)

luctuosa
We have a standard Latin entry for this (see luctuosus:), but I don't see a zoological meaning. (ditto plagiata:)

By the way:- 1) I am planning on continuing this project until at least the end of the year, but sometime I am going back to work on Italian words. 2) I am adding epithets based on surnames as Translingual rather than Latin terms. You can find them at Category:Translingual adjectives. Most of them do not specify a particular naturalist - you might be able to improve those that don't. Cheers SemperBlotto 12:13, 23 December 2011 (UTC)


 * Hi SemperBlotto!


 * Thanks for all your work! Glad you're still going for a while longer. I've been working on writing up a piece about the lists on WikiProject Tree of Life to try to attract some editors to this project. Also I've been slowly increasing my confidence in editing here to try adding to the new entries too, or to try create easier entries (like compound words similar to existing entries).


 * Yep, I noticed the Translingual entries. Makes sense. It could be argued that the other entries that are used "exclusively as a taxonomic epithet" could be considered translingual as well. Though in the end I'm not too concerned one way or the other. I know I've argued about the translingual vs Latin thing before, but my main worry has been more about editors being discouraged from creating new entries by the lack of guidelines on which heading to choose rather than which heading gets used over the other.


 * For luctuosus, the only fitting common name I could find was that of the Widow Skimmer (Libellula luctuosa), which has black bands on its wings that might look like the black lace veil of someone in mourning. Similarly, the Black-and-white Seedeater could be construed to be wearing mourning dress, as could the White-flanked Antwren. The frog Bokermannohyla luctuosa looks like it's depressed here. I suspect that facial expression isn't uncommon for frogs, but then there's at least three luctuosa frog species. So I guess the taxonomic meaning is an anthropomorphic version of the Latin meaning, something like "appearing to be in mourning dress" or "appearing to be sad".


 * Plagiata/plagiatus I'm not sure about. It might relate to "plagiarism" (which seems to be the precise meaning in Italian at least). Which would make me think it would be used for "false" species (though I can find no evidence of the word "false" specifically being used in association with any plagiata species). Still, perhaps it's used for species which look like better known ones, but that's purely a guess. Pengo 00:58, 24 December 2011 (UTC)

hyointestinalis
This one is from the current edition of Nature, rather than your lists. It is from a nasty bacterium. I understand the "intestinalis" part, but what has it got to do with the hyoid bone? SemperBlotto 17:40, 24 December 2011 (UTC)


 * My guess would be it's horseshoe shaped, but my guess would be wrong.


 * I found a paper where they describe this one, and it says Hyos = pig (from Greek), intestinalis = associated with intestines. Hyosintestinalis = associated with pig intestines. The paper is here: first page, half way down right column (starting with "Streptoccus intestinalis sp. nov.")


 * Merry Christmas Pengo 22:06, 24 December 2011 (UTC)


 * Well done. Buon natale. No more questions till after Christmas. SemperBlotto 22:17, 24 December 2011 (UTC)

gondii
I can't think of a sensible definition of this. Any ideas? SemperBlotto 19:52, 27 December 2011 (UTC) p.s. not yet in your lists!


 * Looks like you found the meaning already. "Named for the gundi rodent," or "gundi (possessive)"? or something, I guess? Gundi is from "dialectal Arabic" according to random house/dictionary.com. I don't know if gondi/gondii is just a different transliteration. Pengo 21:43, 27 December 2011 (UTC)

A resource!
Have you seen ? I found it when looking for the meaning of thetaiotaomicron:. SemperBlotto 17:05, 28 December 2011 (UTC)


 * Wow, that's quite a bizarre collection (especially the "puns"). The site should help with the stranger entries I hope. My favourite so far: "Erythroneura ix Myers (leafhopper) This was Myers' 9th species of Erythroneura." Pengo 17:38, 28 December 2011 (UTC)

macrantha
I have defined this as "having large flowers". But could it mean having large thorns (Greek akanthus as in pyracanthus)? SemperBlotto 10:12, 29 December 2011 (UTC)


 * I can't find anything definitive, so going off Google image search the pictures of macrantha look like large flowers, while micrantha shows small flowers (though you've defined it as small anthers? Which could be right too?). If acantha is for thorns, I'm guessing large thorns would be macracantha (27) or gigacantha (2) or megacantha (9) or macroacantha (3). See also polyantha. Pengo 22:49, 29 December 2011 (UTC)

Anatomical Latin
Hi there. Just a note to say that not all "Latin" terms in otherwise English texts are botanical or zoological. See weitbrechti: and ligamentosus: as examples. Cheers. SemperBlotto 08:19, 9 January 2012 (UTC)

knaggsiella
I have taken the liberty of adding this one, even though it gets a single hit, because the epithetical person was a distant relative of mine (same surname). He wrote a couple of very good books on entomology for young people (I'm waiting for a cheap version to appear on Amazon). I wonder why they used the -iella suffix instead of just -i or -ii? SemperBlotto 11:09, 23 January 2012 (UTC) p.s. His son (Henry Valentine Knaggs) was more interesting (I have one of his books)


 * Very cool :) I completely don't understand the different suffixes used in Latin or in these epithets except for the conventions that I think I've learned working on this. Looks like -iella is pretty popular within the Caryocolum genus though Pengo 12:01, 23 January 2012 (UTC)

Catalogue of Life
Hi there. I would like to use this source to get a list of all phyla, classes etc (probably not much further down) to see which ones we haven't got. But the "Browse taxonomic classification" screen won't let me copy/paste the drop-down lists. Is there a better way than copy/pasting each entry? SemperBlotto 17:11, 23 January 2012 (UTC)


 * It's a bit tricky. The queries get complex when traversing taxonomic levels and I haven't abstracted that process (yet). Can we just copy from WikiSpecies? Added bonus of already being wikified. E.g. Phyla from Animalia:

Acanthocephala - Annelida - Arthropoda - Brachiopoda - Bryozoa - Chaetognatha - Chordata - Cnidaria - Ctenophora - Cycliophora - Dicyemida - Echinodermata - Echiura - Entoprocta - Gastrotricha - Gnathostomulida - Hemichordata - Kinorhyncha - Loricifera - Micrognathozoa - Mollusca - Monoblastozoa - Myxozoa - Myzostomida - Nematoda - Nematomorpha - Nemertea - Onychophora - Orthonectida - Phoronida - Placozoa - Platyhelminthes - Porifera - Priapulida - Rotifera - Sipuncula - Tardigrada - Xenacoelomorpha


 * If that doesn't work, let me know and I'll give it a go. I'm actually starting a new job today so won't be able to dedicate as much time to this, but I want to at least finish making a new list of epithets (and another for genera). (todo list: I still haven't included synonyms, and I want to combine -a -um -us etc entries this time, etc). I gotta run. Cheers Pengo 19:54, 23 January 2012 (UTC)

pongo
I notice that, in your popular primates section, you do not include itself. We only have other language definitions. What should we actually do for cases of species with the name X x (Rattus rattus: and the like)? They seem to be the archetype of the genus - the rattiest rat, most orangutang-like orangutang etc. SemperBlotto 08:32, 26 January 2012 (UTC)
 * p.s. I'm wondering why the orchid Bulbophyllum paniscus: is a small chimpanzee. SemperBlotto 09:57, 26 January 2012 (UTC)
 * p.p.s. And I'm wondering why the insects Melambrotus papio: and Nemopterella papio: are baboon-like. SemperBlotto 10:28, 26 January 2012 (UTC)
 * p.p.p.s. Many of the red links in your Tetrapod section are due to piping one word to another. (e.g. Aves => Bird) SemperBlotto 10:49, 26 January 2012 (UTC)
 * p.p.p.p.s Do you think that my assumption in is correct? (no more questions for a while, I'll do something else) SemperBlotto 12:03, 26 January 2012 (UTC)


 * Hi SemperBlotto! Always good to hear from you.
 * All I know about Rattus rattus etc is they're called tautonyms. According to List of tautonyms, exact tautonyms only exist in zoology but not botany. I don't think tautonyms have a special significance, though often they seem to (like Rattus rattus and Gorilla gorilla). Might be nice to start a category for them, but don't think there's anything else special that needs to be done.
 * Simia capuchin is a type species which I found on Capuchin monkey, which is how it got on the list even though COL doesn't have it listed.
 * I only left out pongo because it only seems to be used as a genus and not an epithet (though looking now it gets used as an epithet in Tanais pongo -- a crustacean). I've been systematically ignoring the genera so far, but want to get to them at some point. I think they could be created in some sort of automated fashion (other than the etymology), and I haven't gotten around to thinking about that too hard as yet. But please feel free to create pages for the genus or whatever else I haven't explicitly linked.
 * I have no idea why paniscus is used for both chimps and orchids. Maybe the orchid is named for Pangkal Pinang (that's a stab in the dark). Looks like it's had a few different names though. Looks like there's a few other similarly named orchids (Bulbophyllum pan, B. pandanetorum, B. pandurella, B. pantoblepharon). Bulbophyllum is a huge genus. No idea really, sorry.
 * Melambrotus papio.. I dunno.. maybe related to Papilio? Or, as it is an owlfly, which have "with large bulging eyes". Maybe its eyes are red and resemble a baboon's butt? I dunno. I'll have another look at these tomorrow. ... Nemopterella papio is apparently published in "Neuroptera-Planipennia. The Lace-wings of Southern Africa. 6. Family Nemopteridae. Pp. 290-501 in South African Animal Life, B. Hanström, P. Brinck and G. Rudebec, eds. Vol. 13. Swedish Natural Science Research Council, Stockholm." So if you wanted to track that down it might have the answer :)
 * Oops didn't notice the pipes when I copy-pasted that Tetrapod bit. I've unpiped it now and kept both halves.
 * "apella = small ape" seems reasonable but would be good to confirm it somewhere. (Can't find anything on this but I've added the ancient greek meaning)
 * Sorry couldn't be more helpful. I might try asking on some forums at some point

Pengo 20:12, 26 January 2012 (UTC)

cuspidatum
I don't think that binomial species names should be italicized in Latin entries. Do you know whether the standard in taxonomy is to italicize them in Latin species descriptions? I think that only if such is the standard should they be capitalized. OTOH, even if the standard is not for such italicization in Latin, it is at least arguable that the Translingual entries for species names should be. I have always focused on the practice of italicization of species names in English and do not know whether their italicization is the practice in all languages or all languages that have a Roman alphabet. Do you? DCDuring TALK 18:47, 7 February 2012 (UTC)


 * I know in English it's standard to italicize them everywhere. I don't know if there's any special standard for dictionary entries, and I did notice others don't use italics either on Wiktionary, so I haven't been 100%. But the only time I know that you don't usually italicize scientific names (Genera, binomial, or trinomial) is on a typewriter, in which case you underline them instead. I have no clue about other languages though, so let's have a look.


 * I tried Lion (Panthera leo) for my sample set because it's probably the most popular species article on Wikipedia and I figured it would have a lot of translations. I clicked through to all the various non-Roman-script languages to see what was used on different Wikipedias. These are my results:
 * Italics: kbd, am, ab, ar, av, ba, zh-min-nan, be, be-x-old, bg, bxr, cv, cs, ru, cu, el, jp, he (18 languages)
 * No italics: arc, ka (2 languages)


 * It's a pretty quick survey and it's not the most definitive thing ever, but it seems pretty likely that italics are generally used across almost all languages. I didn't count languages where the scientific name appeared in the taxobox or references but never in main body of text (though I only noticed one other example of non-italics in a taxobox, and references were generally italicized too—although possibly due to copy-pasting). The ignored languages (due to not using it in the text) were: bn, bo, pnb, ps, ta, hy, hi, gu, got, fa, ha. The Latin article (la) also used italics.


 * So at this point I reckon they should probably be italicized unless there's a special objection to do with dictionary formatting. Pengo 07:38, 9 February 2012 (UTC)
 * Thanks for doing the research. At least now we know of the apparent validity of italicization in many languages. I still don't see why they should be italicized in Latin entries under the logic (admittedly a prescriptive consideration) we have that such binomials are grammatically Latin terms. I can't find use of "Panthera leo" in texts that reflect usage after the modern standardization of taxonomic names. Linnaeus didn't italicize, of course. DCDuring TALK 15:49, 9 February 2012 (UTC)
 * BTW, because of your research, this discussion might be useful in Category talk:Taxonomic names or a Wiktionary page. DCDuring TALK 15:52, 9 February 2012 (UTC)


 * Seems like something for the beer parlour. In the end I don't really know if they should be considered translingual "descendants" or Latin "derived terms", or what. Anyway a wider discussion would be helpful. Pengo 04:14, 10 February 2012 (UTC)

spathifer
Can't figure this one. Any ideas? SemperBlotto (talk) 07:32, 12 October 2012 (UTC)


 * Great to see you still making progress! Spathifer seems to mean spade-bearing or spatula-bearing or "broad two-edged sword" bearing. "Spathe" might also be from Greek origin, the word σπάθη, which refers to "a flat blade used by weavers for striking the threads of the woof home, so as to make the web close". Or spathe of a plant. I got these answers by asking on Reddit, see all the responses: . Pengo (talk) 19:53, 12 October 2012 (UTC)

xylostella (and xylostellus, xylostellum)
Looks like it ought to refer to woody stars, but Plutella xylostella: is named the diamondback moth:. Any ideas? SemperBlotto (talk) 10:39, 19 October 2012 (UTC)


 * Hey SemperBlotto, I tracked down the moth. Xylostella refers to Lonicera xylosteum, a plant on which it is "supposed to" have fed:

"The specific scientific name appended to that of Plutella (namely cruciferarum) is a very appropriate one, as it well describes the habits of the caterpillar in mainly feeding on cruciferous plants, namely those of the Cabbage kind. The word Xylostella appended to Cerostoma, by which synonym this moth was formerly known, refers to the habit it was, in early days of observation, supposed to have of feeding on the Lonicera Xylosteum, L., the upright or two-flowered honeysuckle, a shrub to be found in thickets, and more especially in Sussex."
 * —Eleanor Anne Ormerod Simpkin, Marshall, Hamilton, Kent & Co., 1891 - Beneficial insects (full text)


 * Why Lonicera xylosteum is called that I'm not sure. Pengo (talk) 02:17, 26 November 2013 (UTC)

New list of wanted taxa
See User:DCDuring/MissingTaxa. It is based on two or more occurrences of a taxon enclosed in taxlink. It is just another listing to consider. I will update it from time to time, now that I know how.

I am also trying to do lists of specific epithets that are missing in wiktionary entries. See Category:Species entry using missing Translingual specific epithet and Category:Species entry using missing Latin specific epithet. Your listings, as valuable as they are, overwhelmed me. DCDuring TALK 23:22, 2 October 2013 (UTC)


 * Nice one. My lists need a heap of tidying up and maintenance, which I haven't been able to dedicate time to for a while. If you scroll down to the red links, User:Pengo/Latin/Most Common Epithets 1 still contains the most "important" missing epithets, if you're interested. I never finished putting together a list of the most searched for species though. Pengo (talk) 02:47, 3 October 2013 (UTC)

Advice sought
In my efforts to make a useful set of entries for taxonomic names, I am focusing on those that correspond to normal-language entries, ie, not Translingual. Inserting taxlink is a somewhat labor-intensive way of achieving that. I have already done searchbox searches for the words species and genus and added it wherever it was appropriate. I have tried looking at categories like Category:en:Fruits etc.

Whenever a new taxon is added, I add taxlink wherever appropriate to any redlinked taxonomic name contained in entries that contain the taxon and plain links to at least the first use in an L2. In doing this, I notice that there are many entries which offer no obvious clue that they contain a taxonomic name. That means that I would have to do a more all-encompassing search to find them. I can't really search for "all" taxonomic names or rather it is silly to try as there are millions, most of which are of interest only to specialists to whom Wiktionary is not a plausible resource. So, I would like to exclude all subgeneric names and all specific epithets from my searches. That means all the words I am looking for should be capitalized in Wiktionary and effectively all are single words. I can get a usable list of all entry titles in Wikispecies, which could be reduced to the one-word titles. The intersection of WS one-word titles and capitalized words appearing in definitions worded in English in Wiktionary (excluding sentence-initial capitals and those from abbreviation entries if that is feasible and speeds things up), removing duplicates, would be a valuable list for subsequent use:
 * 1) all redlinks are candidates for addition;
 * 2) all blue links should be linkified at least once in each L2 in which they appear in a definition or list.
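
The intersection described above comes down to a few set operations once the three lists exist. What follows is only a sketch in Python: it assumes the Wikispecies titles, the capitalized words found in definitions, and the existing Wiktionary entry titles have already been pulled out as plain lists of strings, and the function name is hypothetical.

```python
def classify_taxa(wikispecies_titles, capitalized_words, wiktionary_titles):
    """Split taxon names used in definitions into missing and existing entries."""
    # Keep only one-word Wikispecies titles (drops binomials such as "Panthera leo").
    one_word = {title for title in wikispecies_titles if " " not in title}
    # Names that are both Wikispecies titles and used in Wiktionary definitions.
    used = one_word & set(capitalized_words)
    existing = set(wiktionary_titles)
    redlinks = sorted(used - existing)   # candidates for new entries
    bluelinks = sorted(used & existing)  # candidates for linkification
    return redlinks, bluelinks

redlinks, bluelinks = classify_taxa(
    ["Felidae", "Panthera leo", "Canidae", "Ursidae"],
    ["Felidae", "Canidae", "France"],
    ["Felidae", "France"],
)
```

Building sets also removes duplicates, so no separate de-duplication pass is needed.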

Some of this I can do or can learn to do with dump processing using Perl or Python (recommendations about language?)

Linkification would seem botworthy. I know nothing about bots. What course of action do you recommend?

What is an efficient way to produce the list of capitalized words not sentence-initial? Is this likely to take hours, days? (I wish I had installed 64-bit Windows so I could use more than 4 Gb of RAM.)
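
For what it is worth, one crude way to pick out capitalized words that are not sentence-initial is a regular expression that only accepts a capital when a lowercase letter, comma, or semicolon plus a space comes immediately before it. The following is a sketch in Python under that assumption; it will miss words inside templates or links, all-caps abbreviations, and capitals that follow a bracket or quotation mark.

```python
import re

# A capitalized word preceded by "<lowercase letter, comma or semicolon><space>",
# which rules out words at the very start of a line or right after a full stop.
MID_SENTENCE_CAPITAL = re.compile(r"(?<=[a-z,;] )[A-Z][a-z]+\b")

def mid_sentence_capitals(text):
    """Return capitalized words that do not start a sentence, in order of appearance."""
    return MID_SENTENCE_CAPITAL.findall(text)

found = mid_sentence_capitals(
    "Any member of the family Felidae. Cats are common in France, Inc."
)
```

In this sample "Any" and "Cats" are skipped as sentence-initial, while "Felidae", "France" and "Inc" are kept, so the output still needs filtering against a list of known taxa.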

Am I missing some alternative? Am I thinking about this wrong?

Please answer at your convenience. If you think I should try someone else, please make a recommendation. DCDuring TALK 19:28, 14 November 2013 (UTC)


 * Hi DCDuring. Interesting problem. The processing/bot work I've done, I've used Python and a MediaWiki API/library (WikiTools) and worked from database dumps. I don't know how familiar you are with programming, but even if you are experienced, it can quickly become a very time-consuming and involved process to get anything working well. Things that might seem like they should take hours can end up taking days, so I don't really recommend it unless you really want to learn the skills, or want to invest time in creating an automated fix, (and if you do, then I'd recommend Python. 4GB is not an important limit, as you never keep any entire database in memory at once). However, there is an alternative...
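
To illustrate the point about memory, here is a minimal sketch of reading a dump one page at a time with Python's standard library. It is not the script I actually used. It assumes the usual MediaWiki export layout (page elements containing a title and a revision text), and the namespace URL in the sample is only illustrative.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Drop the '{namespace}' prefix that export files put on every tag."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(dump_file):
    """Yield (title, wikitext) pairs one page at a time from an export XML stream."""
    title, text = None, ""
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # discard the finished page's contents before moving on

# Tiny stand-in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>cat</title><revision><text>A member of the Felidae.</text></revision></page>
  <page><title>Felidae</title><revision><text>A taxonomic family.</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real dump the same function can be given an open file object instead of the in-memory sample, for example one returned by `bz2.open(path, "rb")` for a compressed download.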


 * I've just had a play with AutoWikiBrowser to see if it would be feasible to use for this, and while it doesn't really seem designed specifically for this sort of thing, it has some powerful tools around filtering and processing lists of pages and I think it could do the job.


 * Here's the rough steps I've worked out in AWB. Partly recording this for myself, so let me know if it doesn't make sense at all or if I've solved the wrong problem. Note that I haven't used AWB before, and also I've never really used wikispecies (surprisingly enough). But here are the steps I've worked out to create a list of candidate pages for adding taxlink templates to. User:Pengo/candidates-sample has a few pages that I've gotten from this process if you want to skip to the results and see how well it works or doesn't. (There seem to be some false positives which already have taxlink templates.)


 * Notes: It's a little rough (mainly the part where you capture pages from Wikispecies at the start), and you might need to find a better way to do that, but this will get you started. Also note that actually replacing links with taxlinks is not automated here (and would probably require a custom program, e.g. written in Python, or major changes to AWB).


 * download, install, and run AWB
 * Get a list of pages from wikispecies:
 * Options | Preferences | Site | Project: "species"
 * Turn on filtering of non-article space pages: List | Remove non-main space (be sure that it's checked)
 * Choose source: "What links here", and What links to: "Eukaryota" [Note: this isn't the best way to pick pages. You might need to find a better way. Wikispecies doesn't seem to use categories. The limit seems to be 25000 pages. Might be a preference somewhere to increase.]
 * Click "Make list" (now you should have 25,000 pages, which may or may not be eukaryotes and will be of all different classifications)
 * Save the list: right click on the list and "Save list". Let's call it "25000-wikispecies-pages.txt"
 * Find which ones are redlinks on Wiktionary.
 * Open up 25000-wikispecies-pages.txt in a text editor (e.g. notepad++) and paste it into a sandbox page on Wiktionary (e.g. User:DCDuring/sandbox). I've put the first 200 here to test this. (I can't find a way to do this step directly within AWB, but there might be a way)
 * Clear your list in AWB: Select an item, Ctrl-A, Del
 * Options | Preferences | Site | Project: "wiktionary"
 * Source: "Links on page (only redlinks)", Links on: "User:DCDuring/sandbox", "Make List" (in my test, I get a list of 177 pages which are redlinks of the 200)
 * You might want to backup (save) this list now (in case you need to redo the next step, etc)
 * Find where those redlinks are found on Wiktionary:
 * Select all items in the list (select an item, Ctrl-A)
 * Right click on list | "Add selected to list from ..." | "What links here"
 * hit "Delete" (to remove the redlinks, so you're left with only the candidate pages. I got 23 pages in my test sample.)
 * Save this list: Right click on the list and "Save list" (e.g. "candidate-list.txt")
 * Go through the pages on this final list and add taxlinks:
 * You can right-click items and choose "Open in browser"; or you can copy "candidate-list.txt" into a sandbox page (see example: User:Pengo/candidates-sample)
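
For anyone who would rather script it, the AWB steps above boil down to two operations: take the Wikispecies titles that have no Wiktionary page (the redlinks), then find the Wiktionary pages that link to them. A rough Python sketch follows. It assumes you already have the title lists and a mapping from each page to the titles it links to (for example from a dump); all names here are hypothetical.

```python
def candidate_pages(wikispecies_titles, wiktionary_titles, links_from):
    """Return {page: [redlinked taxa it mentions]} for pages worth editing.

    links_from maps a Wiktionary page title to the set of titles it links to.
    """
    # Titles known to Wikispecies but with no Wiktionary entry yet.
    redlinks = set(wikispecies_titles) - set(wiktionary_titles)
    candidates = {}
    for page, targets in links_from.items():
        hits = redlinks & set(targets)
        if hits:
            candidates[page] = sorted(hits)
    return candidates

candidates = candidate_pages(
    ["Felidae", "Canidae", "Ursidae"],
    ["Felidae", "cat", "dog", "bear"],
    {"cat": {"Felidae"}, "dog": {"Canidae"}, "bear": {"Ursidae", "Canidae"}},
)
```

Unlike the AWB route this is not limited to 25,000 pages, but it only sees pages that already link to the missing title, not plain-text mentions.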


 * Let me know if that's useful.


 * As for linkifying blue taxlinks.... the taxlink template already adds the hidden category Category:Entries with redundant taxonomic template (taxlink). It probably wouldn't be difficult to make a regex in AWB to clean these up automatically, but the category is currently empty so perhaps someone already does that? Pengo (talk) 16:17, 15 November 2013 (UTC)
 * Thanks. It is especially useful to consider AWB as a tool. I empty out Category:Entries with redundant taxonomic template (taxlink) as items appear in it, several times a day, so that the run I do on every dump (every two weeks or so) for items enclosed in taxlink has a very high yield (>99%) of redlinks. It's also a way of checking on edits of entries that use taxonomic names. I also have installed new additions listing tables on the bigger taxonomic name categories, like species, genus, and family to see what's going on.
 * One thing that I did not express well is that my highest priority is to find uses of taxonomic names in Wiktionary that are not entries in Wiktionary and are now not enclosed in taxlink (and often not even in plain redlinks), especially from L2 sections other than Translingual. I am trying hard to focus effort on terms that have been useful in the definiens for vernacular language entries so that Translingual does not become an island and so contributors in all languages will come to appreciate the value of linking to Translingual entries, preferably with taxlink. Almost anything manual toward this end seems incredibly tedious. The reason I was interested in a list of capitalized English words (not sentence-initial) that are not now Wiktionary English entries is that there are many ways of extracting candidates from that list, including affixes, especially suffixes like -idae, -inae, -oidea, -eae, -ida, -ales, as well as comparison with Wikispecies and, possibly, Wikipedia. Some of the resulting lists will have very high yields and will be relatively uniform so that copy and paste could be repeated, requiring little further typing.
 * Looking at the yield from the sample, I drew conclusions. Most significantly Translingual entries do not have a particularly high yield that cannot be exceeded by searches for See also headers (which I have tried in the past, before focusing on non-Translingual entries). The method you propose would work pretty well if it excluded Translingual entries, which are better handled by other means and which often have no connection to any vernacular names. DCDuring TALK 21:38, 15 November 2013 (UTC)

Hi again DCDuring. Hmm. That does sound like a good candidate for a Python script, but I'll persist with AWB and see how far we can get.

I tried making a list just of non-sentence Capitals found in glosses. It doesn't seem like a very useful list by itself as you might expect. e.g. word contains Christian, livre contains France, cat contains Inc, English, Unix and Felidae (so I guess that's one good match). Also, it's tricky to match just mid-sentence capital words without losing a bunch of candidates, such as words inside 'l' templates, etc.

Adding suffixes (like you suggest) seems more useful. Originally I tried to keep the in-sentence-capital check but it was too messy and lost too many candidates (about 43% of -idae pages) so I've just stuck with any capitalized words ending in -idae. Also makes it much easier, as you don't have to worry so much about avoiding the first word of the sentence, or templates, which would complicate things a lot when doing things with regexes. Here's my list for pages containing capitalized words ending -idae: User:Pengo/candidates-idae (complete list of results, 1528 entries). Excludes -idae words already in taxlink templates (which would give it 252 more entries). Here's how I got it in AWB (requires downloading a database dump):


 * Options | Preferences | Site | Project: "wiktionary"
 * Tools | Database Scanner
 * Database | Database file: enwiktionary-20131101-pages-articles-multistream.xml (Download from dumps.wikimedia.org and extract, e.g. with 7zip)
 * Namespace | Content: [X] Content
 * Text:
 * Contains:
 * (Change idae to whatever ending you want to search for)
 * Not contains:
 * [X] Regex
 * [X] Case sensitive
 * [X] Multiline
 * Title:
 * [X] Not contains:
 * [X] Regex
 * [X] Case sensitive
 * (Removes entries ending in -id, e.g. diplodocoid, capitellid, nummulitid. In this example they probably made up most of the results)
 * Start
 * Save list

Some notes on the regex:
 * change "idae" to whatever ending you want.
 * delete to include taxlinks too
 * delete  to only include taxlinks (untested)
 * delete  to not restrict search to glosses, for example, to match -edo (which has matches under the   level)
 * There's an AWB Regex Tester which lets you find what word was matched on a page. You need to paste the source code of the page into it. E.g. http://i.imgur.com/sbRnwLY.png
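
Outside AWB, roughly the same search can be sketched in Python. This is not the regex from the scanner settings above, just a simple stand-in: it assumes taxlink calls look like {{taxlink|Name|rank}} and removes them before looking for capitalized words with the chosen ending. It also returns the matched words themselves rather than page titles.

```python
import re

# Assumed template shape: {{taxlink|Name|rank}} with no nested templates.
TAXLINK = re.compile(r"\{\{taxlink\|[^{}]*\}\}")

def find_taxa(wikitext, suffix="idae", skip_taxlinked=True):
    """Return capitalized words ending in `suffix`, optionally ignoring taxlinks."""
    if skip_taxlinked:
        wikitext = TAXLINK.sub("", wikitext)
    pattern = re.compile(r"\b[A-Z][a-z]+" + re.escape(suffix) + r"\b")
    return pattern.findall(wikitext)

gloss = "# Any cat of the family Felidae, unlike the {{taxlink|Canidae|family}} or a capitellid."
untagged = find_taxa(gloss)
everything = find_taxa(gloss, skip_taxlinked=False)
```

Requiring an initial capital is what keeps lowercase words like capitellid out, and changing `suffix` covers the other endings.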

Also, note AWB has a List Comparer (under Tools), which might be useful for comparing lists from Wikispecies with lists from Wiktionary. However, it might be difficult to get the matched -idae words into a list (rather than the page titles that contain matches) without a little scripting (in Python etc).

I hope that's looking more promising/helpful.

Taxlinks search-and-replace:

To swap taxlink with normal links with search-and-replace: (I've never used AWB to actually edit pages).

Taxlink to normal link:
 * Search:
 * Replace:
 * e.g. replaces " nest" with " ".

Plain text (ending in -idae) to taxlink: I haven't looked into this much though. E.g. it works if the -idae word is plain text (not a link or in a template). I'm not sure if AWB has options for automatically handling more broad cases (haven't looked into it fully).
 * Find:
 * Replace:.
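
As a stand-in for the two search-and-replace pairs above, here is a sketch of both directions using Python's re module. It assumes the simple form {{taxlink|Name|rank}} and, for the second direction, that every -idae word is a family. The patterns are my own guesses for illustration, not the ones entered in AWB, and the results would still need checking by hand.

```python
import re

# {{taxlink|Name|rank}} -> [[Name]]
TAXLINK_TO_LINK = re.compile(r"\{\{taxlink\|([^|{}]+)\|[^{}]*\}\}")

# A bare capitalized -idae word that is not directly inside [[...]] or a template argument.
BARE_FAMILY = re.compile(r"(?<![\[|{])\b([A-Z][a-z]+idae)\b(?![\]|}])")

def taxlink_to_link(wikitext):
    """Replace taxlink templates with plain wikilinks."""
    return TAXLINK_TO_LINK.sub(r"[[\1]]", wikitext)

def bare_family_to_taxlink(wikitext):
    """Wrap plain-text -idae names in a taxlink template."""
    return BARE_FAMILY.sub(r"{{taxlink|\1|family}}", wikitext)

linked = taxlink_to_link("a {{taxlink|Felidae|family}} nest")
tagged = bare_family_to_taxlink(
    "member of the Felidae, not [[Canidae]] or {{taxlink|Ursidae|family}}"
)
```

The lookbehind and lookahead stop the second pattern from touching names that are already linked or already inside a taxlink, so running it twice leaves the text unchanged.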

You might also be able to use these search-and-replace regexes with a Regex browser plugin or wikipedia plugin if such a thing exists. Pengo (talk) 04:53, 16 November 2013 (UTC)

The yield of improvement opportunities for these looks pretty high based on a sample of 20.
 * 1) 14 taxlinks in 3 entries
 * 2) some links missing
 * 3) some formatting (no italics above genus, italics for genus and lower)

Those are the most common gaps in my previous manual experience as well. To these must be added other common mistakes, such as use of no-longer-current taxa (eg, entries from Webster 1913) and attributing to a taxonomic name a wrong ranking (eg, "the *order Accipitridae"). Both of these could be helped by dump processing using large lists.
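
The wrong-ranking case in particular lends itself to a mechanical check, because several ranks have standard endings (-idae, -inae and -oidea for zoological families, subfamilies and superfamilies; -aceae and -ales for botanical families and orders). A rough sketch in Python, assuming the definition names the rank immediately before the taxon; the suffix table is deliberately small and the function names are hypothetical.

```python
import re

# Standard rank endings; a heuristic only, since other names can share these endings.
SUFFIX_RANK = {
    "oidea": "superfamily",
    "idae": "family",
    "inae": "subfamily",
    "aceae": "family",
    "ales": "order",
}
RANKS = ("superfamily", "subfamily", "family", "order")
# Matches phrases like "order Accipitridae": a rank word followed by a capitalized name.
CLAIM = re.compile(r"\b(" + "|".join(RANKS) + r")\s+([A-Z][a-z]+)\b")

def expected_rank(name):
    """Guess the rank from the ending, or return None if the ending is not listed."""
    for suffix, rank in SUFFIX_RANK.items():
        if name.endswith(suffix):
            return rank
    return None

def rank_mismatches(text):
    """Return (name, claimed rank, expected rank) for each suspicious pairing."""
    problems = []
    for claimed, name in CLAIM.findall(text):
        expected = expected_rank(name)
        if expected and expected != claimed:
            problems.append((name, claimed, expected))
    return problems

problems = rank_mismatches("A bird of the order Accipitridae.")
```

Anything flagged this way is only a candidate for review, since an entry may be quoting an older classification on purpose.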

But an approach focused on affixes will not cover a large fraction of taxa, most significantly genera and species, which tend to be the levels most used in vernacular name definiens, which are the focus of my current interests.


 * 1) An AWB-based approach looks good for all character-level corrections and for inserting taxlink or other templates, but I haven't used it. I will try it.
 * 2) I suppose there are also cases where a bot might work, but the risk of error is higher without human assistance. I like to think this could be done, but it is not in the cards right now because of my lack of requisite skills.
 * 3) A list (especially a list that is the intersection of multiple lists) is useful for entries with multiple problems or deficiencies, which characterizes the typical Translingual entry relative to what can be readily achieved. Such lists can be produced:
 ** a) By inserting templates (eg,, ) and using categories, search, and transclusion lists
 ** b) By dump processing or
 ** c) By using the search box.

Virtually everything I have done is with lists, though I have found that long lists get shorter, but not much shorter, over time. Short-to-medium lists seem more motivating to me. I'd like to use machines to make lists that are both short and contain entries that are clearly worth the effort.

The choice of methods to be used relative to the broad goal of general improvement of a class of entries is not obvious.

I also consulted with Ruakh, who thinks it is not ridiculous to compare the entire content of Wiktionary with, say, the list of all pagetitles from Wikispecies to generate lists like those under discussion. He is a Perl user who offered to do some of this himself. I'd like to see how far I can get without recourse to his offer. If I can't get far, I'll ask him to do it for a "hard" case and see if I can use his work as a model for other cases, both simpler and not too much more complicated. DCDuring TALK 20:28, 19 November 2013 (UTC)
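As a rough illustration of the comparison described above (in Python, not Ruakh's Perl), the sketch below scans entry text for capitalised words or binomials and keeps those that appear in a set of taxon page titles. It assumes the entry text and the title set have already been extracted from dumps; the function names and the sample data are made up for the example.

```python
import re

# A capitalised word, optionally followed by one lowercase word (a possible binomial).
CANDIDATE_RE = re.compile(r"[A-Z][a-z]+(?: [a-z]+)?")


def taxa_mentioned(wikitext, taxon_titles):
    """Return the taxon titles that appear in wikitext, sorted."""
    found = set()
    for match in CANDIDATE_RE.finditer(wikitext):
        candidate = match.group(0)
        if candidate in taxon_titles:
            found.add(candidate)
        else:
            # Fall back to the first word alone (e.g. a genus name).
            first_word = candidate.split(" ")[0]
            if first_word in taxon_titles:
                found.add(first_word)
    return sorted(found)


def entries_to_review(entries, taxon_titles):
    """Map each entry title to the taxa it mentions, skipping entries with none."""
    result = {}
    for title, wikitext in entries.items():
        taxa = taxa_mentioned(wikitext, taxon_titles)
        if taxa:
            result[title] = taxa
    return result
```

Intersecting the result with other lists (for example, entries lacking taxlink) would give the shorter, higher-yield lists mentioned above.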


 * Yep, probably not ridiculous to generate such lists. Was hoping I could get further just using AWB, but you're right that it's only useful for taxa above genus/species level, and even then it's not as successful as I would have hoped. Yep, any search/replace I've suggested would definitely need human assistance.


 * One way to 'reduce' the lists I was looking at previously was to prioritise them by the popularity of the associated articles on Wikipedia (via stats.grok.se, e.g. the two top 10 lists I had at the top of this ), but it took quite a bit of data wrangling to work out. I might try and pick this up again sometime now that I have a little spare time again. Pengo (talk) 02:42, 25 November 2013 (UTC)
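The prioritising step could look something like the following minimal sketch. It assumes the page-view counts have already been collected into a dictionary; fetching them is not shown, and the function name and sample figures are invented for illustration.

```python
def prioritise(titles, view_counts, limit=10):
    """Return the most-viewed titles first; ties are broken alphabetically.

    view_counts is assumed to map a title to its Wikipedia page views;
    titles without a count are treated as having zero views.
    """
    ranked = sorted(titles, key=lambda title: (-view_counts.get(title, 0), title))
    return ranked[:limit]
```

For example, `prioritise(["b", "a", "c"], {"a": 5, "c": 9}, limit=2)` keeps only `c` and `a`, in that order.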

Template:la-decl-adj-table-m+f+n/documentation
Hello Pengo. You've done a good job with that documentation; it was well overdue. — I.S.M.E.T.A. 16:10, 21 November 2013 (UTC)

remove incomplete or misplaced quotation
Hi. Just for info, you can use for these - i.e. a quotation by that particular author is being sought, and hasn't been added yet. Probably best to remove the Chaucer one anyway, though, because it wouldn't be modern English. Equinox ◑ 15:38, 6 December 2013 (UTC)
 * Thanks. Looks like Chaucer is considered "the first poet laureate", rather than actually using the word himself in Middle English AFAICT. (So any quote would be better suited to poet laureate anyway.) Pengo (talk) 18:46, 6 December 2013 (UTC)

crimewave
Are you sure that the crimewave is the increase, and not the result of the increase? If so, "a crimewave is going on" would seem to mean "it's increasing more and more all the time", which seems odd. Equinox ◑ 00:56, 6 January 2014 (UTC)


 * Hmm I see your issue, however without "increase" it could mean any short period within a background of high crime would be counted as a "crime wave". I'll have another go at rewording. Pengo (talk) 01:05, 6 January 2014 (UTC)

Common missing words
Send me the "-ings" list if you like (or the entire list!). Equinox ◑ 14:40, 11 March 2014 (UTC)


 * I've been looking at your "missing" lists again. (I added a lot of the "non-" words today.) I tend to feel that the hyphenated forms are secondary, purely because they can be easily decomposed into "non" + whatever, whereas without a hyphen it might not be clear where the word is to be split for parsing. Would you object to me deleting entries from your pages where only the hyphenless form has been created, or do you insist on the creation of both? Equinox ◑ 00:09, 30 January 2015 (UTC)


 * Please be my guest and delete them. The source material (google ngrams) tends to get the hyphenation wrong so it's up to your discretion. I included both just so you can see which entries already exist. 02:21, 30 January 2015 (UTC)

tordilio and tordillo
Hi. In the tordilio and tordillo, there's an L. at the end after the scientific name. Do you think it means something? --Type56op9 (talk) 14:34, 3 February 2015 (UTC)
 * Yup, that's for Carl Linnaeus, as a botanist author citation. Pengo (talk) 19:59, 3 February 2015 (UTC)
 * Thanks. Do you think it should stay in the entry? --Type56op9 (talk) 09:25, 4 February 2015 (UTC)
 * I'm not really sure to be honest. I mostly try to leave it to others to decide how to layout entries :) Personally, I'd prefer to see Linnaeus acknowledged in the etymology of the [currently non-existent] Tordylium maximum entry instead. Could try asking in the tea room for an opinion? If you feel strongly that it's confusing, then go ahead and remove it, and maybe try to add some other detail to make up for it. Pengo (talk) 09:35, 4 February 2015 (UTC)
 * OK, I removed Linnaeus's tag, gave it a translation, and added that to hartwort. Seems OK to me. Time to move on. --Type56op9 (talk) 09:39, 4 February 2015 (UTC)

umm...taxa question?
So... I just found out about this whole Catalogue of Life thing (as you may have noticed), and I feel a bit weird adding such long entries. I am, however, only adding accepted names from the CoL, which has halved the size of the descendants section. Is this what I should be doing? Should I be adding just a couple token taxa or all the accepted names or all the possible species on the CoL (accepted name, synonym for, previously accepted name, etc.)? If it is the last option, I don't know that I have the patience unless someone exports them into a list for me. Just curious. —JohnC5 11:26, 17 February 2015 (UTC)


 * Good question. I don't have a definite answer. Mostly I just find the things that seem like they need entries, and I don't know what to do with them after that. But if that's the part that's killing you, I can have a go at generating the descendant lists when I get a chance.
 * Generally, most Wiktionary entries I've seen have only included a handful of select binomial descendants, so I've been impressed with your more complete lists. (It hadn't actually occurred to me that they could be put in a collapsible box where they wouldn't take so much screen real estate, even though I must have seen it done before). I have no idea if there's a particular accepted way to list them though or how completely to do so. (I had figured you had it worked out)
 * CoL's "accepted" and "provisionally accepted" names are generally good from what I've seen. The synonyms are variable in usefulness by their nature. Sometimes a synonym is a misspelling that was only ever used once, while another might have been an accepted name for decades, or still be accepted by another source. Many synonyms are also missing from CoL. If I generate lists, I might just include the synonyms which have been seen in numerous books (via google books ngram data), which isn't perfect but might be good enough to stop too many never-seen synonyms.
 * If I had to choose one way or the other, I'd say leaving out synonyms would be preferred to including all of them. I guess it's really meant to be up to how well attested they are.
 * Worth briefly noting, in case you really want to be a completionist, there are other/bigger databases of species names (e.g. EOL and gni.globalnames.org), but I've stuck to only CoL for now. By its own measurements CoL is only "84% complete". The main area I've noticed it's lacking is in extinct/fossil taxa.
 * If I generate the lists, do you think it's acceptable to divide up the descendants into kingdoms (eg plant/animal/bacteria) to break up long lists, or should I keep them as a straight sorted list? Also should I annotate the synonyms to include what they're synonyms for? e.g. "*Helix aspera (=Cornu aspersum)" or leave that for the Helix aspera page when it's created? Pengo (talk) 13:00, 17 February 2015 (UTC)
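To make the two questions above concrete, here is a small sketch of one possible layout: names grouped by kingdom, sorted within each group, and synonyms annotated in the "(=accepted name)" style of the Helix aspera example. The record shape (name, kingdom, accepted name or None) and the function name are assumptions for illustration, not the CoL export format.

```python
from collections import defaultdict


def format_descendants(records):
    """Format (name, kingdom, accepted_name_or_None) records as grouped list text."""
    by_kingdom = defaultdict(list)
    for name, kingdom, accepted in records:
        # Synonyms carry the accepted name in brackets; accepted names stand alone.
        line = f"*{name}" if accepted is None else f"*{name} (={accepted})"
        by_kingdom[kingdom].append((name, line))

    output = []
    for kingdom in sorted(by_kingdom):
        output.append(f"{kingdom}:")
        # Sort by name so each kingdom is a straight alphabetical list.
        output.extend(line for _, line in sorted(by_kingdom[kingdom]))
    return "\n".join(output)
```

Dropping the kingdom argument from the grouping would give the single straight sorted list instead, so either answer is cheap to produce.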


 * Very cool. I am somewhat of a completionist when it comes to entry writing. If you created descendant tables that had titles like "Translingual descendants (Animalia)", "Translingual descendants (Plantae)", etc., I think that would be awesome. Also, does previously accepted name mean obsolete name, and if so, should we use the fancy  parameter, even if it has yet to be implemented? I assume that parameter will add one of the classy ‡ marks after the name, which pleases me just to contemplate. But, if it is not too much trouble to generate these lists, particularly if they are preformatted into the  and tables, we could start really rolling these babies out. We could even go back and update existing entries. What do you say? Also, and I don't want to heap too much on you, but if you could check to ensure that all entries which already exist don't use  but instead a normal link, I'm sure DC would greatly appreciate that. As for the synonym thing, I have no idea. That's up to you. If I knew more about taxonomy, I'd add entries myself, but Latin and etymology are more my style. —JohnC5 20:52, 17 February 2015 (UTC)
 * "Translingual descendants (Animalia)" works. Cool.
 * I guess when a name is considered a synonym it's obsolete. Tagging obsolete should be ok, though basically I'd have to apply it to anything that is a synonym (other relevant categories aren't consistently applied in COL). So it would be applied to names that were never widely used too. Hopefully COL doesn't disagree with other authorities/sources on what's a synonym too often. Also I'm not sure if there's ever a time when "archaic" should be used instead?
 * There's a few corners I've cut with the lists so far, which I might try to clean up if I'm making stuff for the main space of Wiktionary, e.g. I haven't given any special treatment to the rarely used COL categories of "ambiguous syn" and "misapplied name", nor attempted to remove junk entries like "Tilapia sp".
 * There's also a question of subspecific names, which I've ignored so far, but I guess descendants of "gracilis" should include Lycalopex gymnocercus gracilis (and Spiranthes lacera var. gracilis too).
 * A few of the most common epithets can have thousands of species that use them (e.g. gracilis: 2334 accepted species names, 4779 when including synonyms, or 5671 when including subspecific "gracilis" too, and there'll be even more from sources other than CoL). So I don't know how we could handle those, if we need some kind of appendix or something.
 * The double dagger for synonyms isn't standard btw, I just picked it as a footnote kind of symbol to avoid filling the page with (synonym) (and partly in analogy to the single dagger (†), which is standard for extinct taxa)
 * Doing something to automate addition/removal has been on the todo list for a while, but I haven't really looked at it yet.
 * Anyway, I'll see what I can do. Pengo (talk) 01:19, 18 February 2015 (UTC)
 * Thanks so much for your help. I say just ignore subspecific names. This is a dictionary, not a taxonomic database. If the  stuff is too complicated, just ignore it. In truth, anything you do would be fine with me. :) Would you be posting these to a page where I could find them or something of that nature? —JohnC5 01:34, 18 February 2015 (UTC)
 * Yep, probably best to start simple anyway. Oh yeah, I forgot to ask.. Say I made a species list for "mariana, marianum, marianus".. How many of the other forms should be automatically included? (in this case the others are marianae, marianii, mariani, marianarum, marianiae, marjana, marianorum, marian). I'm guessing the genitive forms (-ae, -i) are safe to include too? Though there might be a chance the -i form refers to an unrelated eponym (à la marianii)? And sometimes -ae forms seem to get their own entries (eg syringae). And is it safe to swap I's for J's or is there a chance they would make different words? The simplest would be to include everything and let someone (you) delete the bad (unrelated) entries, but if the bad entries are minimized then the bot could just add the list to a page automatically. Pengo (talk) 04:25, 18 February 2015 (UTC)
 * I say you can include -us, -a, -um, -i, -ae, -arum, and -orum. For the i ~ j alternation, we normally create another entry that just links to the first (e.g. iunceus ~ junceus). As for extensions of the stem (-ii, -iae, etc.) I don't know how to treat that. In most cases, I imagine these would come from misunderstandings of how Latin declension works. In some cases they might come from alternative forms, but most of the time, I expect they are just wrongly created. In the case of wrongly created forms, add them with the rest. If it becomes clear to me that some of the forms belong to a different word, I'd just remove them. —JohnC5 05:02, 18 February 2015 (UTC)
 * Still a work in progress, but the basics are done: User:Pengo/common_epithets/desc Pengo (talk) 10:43, 18 February 2015 (UTC)
 * That looks spectacular! You are going to re-sort them alphabetically at some point, I assume, so that they are not divided by epithet? I have been waiting for you to finish before starting on the entries you put on WT:WE, if you don't mind. Once I add a list to an entry, I'll probably delete it from that page to save space (once you have the program up and running, of course). —JohnC5 11:01, 18 February 2015 (UTC)
 * Oops, it was meant to be sorted already :) Ok, I'll post up ones for those epithets in a minute. Yes, consider this one a temporary list, and delete them / blank the page when done. Pengo (talk) 11:32, 18 February 2015 (UTC)

Coolio! Unfortunately, I'm about to go to sleep, but I shall work on these soon! —JohnC5 11:33, 18 February 2015 (UTC)
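The form-grouping rule agreed in this thread (endings -us, -a, -um, -i, -ae, -arum, -orum, with j treated as a spelling of i) can be sketched roughly as below. This is an illustration only, not the program behind the /desc page; the function names are hypothetical, and the leftover forms (stem extensions such as -ii and -iae) are returned separately so a human can decide what to do with them.

```python
# Endings named in the discussion above.
ENDINGS = ("us", "a", "um", "i", "ae", "arum", "orum")


def epithet_forms(stem):
    """Return the regular forms built from a stem, e.g. 'marian' -> 'marianus', ..."""
    return [stem + ending for ending in ENDINGS]


def split_observed_forms(stem, observed):
    """Split observed epithets into (regular forms, forms left for review).

    A form counts as regular if, after replacing j with i, it equals
    one of the forms generated from the stem.
    """
    wanted = {form.replace("j", "i") for form in epithet_forms(stem)}
    regular = sorted(w for w in observed if w.replace("j", "i") in wanted)
    review = sorted(w for w in observed if w.replace("j", "i") not in wanted)
    return regular, review
```

With the mariana example from above, marjana lands in the regular group alongside mariana, while marianii, marianiae and marian are set aside for review.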

Mario Mariani et al and epithets ending in ensis and ense
We could gather even more of the low-hanging fruit by getting the most common epithets ending in ensis and ense, which are often toponyms.

Also, Wikispecies has entries in main space for authorities, many of whom have lent their name to epithets. Wikispecies has such entries for many not graced with any WP article or even mention. DCDuring TALK 05:04, 21 February 2015 (UTC)


 * Hmm.. I feel like many of the most common -ensis terms have been done. Of my "top 1000" epithets, only 4 are red-linked -ensis terms:


 * sitchensis, sitchense
 * halepensis, halepense
 * , barbadensis
 * benghalensis, benghalense


 * You can find more by hitting Ctrl-F and searching for "ensis" at User:Pengo/common_epithets/missing. If inroads get made there, I can try making a more specific list. But there's also this endless list.


 * What are you thinking for the authorities? How are you thinking you could use that information? I stumbled across books before that are dedicated just to describing the people behind eponymous species... Pengo (talk) 06:01, 21 February 2015 (UTC)
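A more specific -ensis/-ense list could be pulled from any list of epithets with something like the sketch below, which pairs up the two forms of the same stem in the style of the four pairs listed above. The function name is hypothetical and the input is assumed to be a plain list of lowercase epithets.

```python
def ensis_groups(epithets):
    """Group epithets ending in -ensis or -ense by their shared stem.

    Returns a list of sorted form lists, ordered by stem.
    """
    by_stem = {}
    for epithet in epithets:
        for ending in ("ensis", "ense"):
            if epithet.endswith(ending):
                stem = epithet[: -len(ending)]
                by_stem.setdefault(stem, set()).add(epithet)
                break  # an epithet matches at most one of the two endings
    return [sorted(by_stem[stem]) for stem in sorted(by_stem)]
```

Stems that come back with only one form show where the other gender form is absent from the input list.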