User:Connel MacKenzie/Tasks and projects

Pending tasks

 * 1) For a given word, provide "prev" and "next" words alphabetically in English
 * 2) JS Preload buttons (row of icons)
 * 3) figure out way to overlay text on blank button
 * 4) See if single button can be stretched to all buttons' width so only 1 image is downloaded
 * 5) JS Translation column balancing to minimize vertical pixels used
 * 6) JS Language cleanup button - just call Anabel's toolserver tool.
 * 7) JS Magic inflection button (nah, do as automatic from preload choice?  Nah, do as magic stem buttons)
 * 8) JS Welcome and welcomeip, pediawelcome, userwarn buttons
 * 9) JS Categories to bottom of page, but above transwikis (sorted!)
 * 10) JS "subst:" button (find "nearest" "{{".)
 * 11) JS Auto 2 column the selected lines
 * 12) JS Auto 3 column the selected lines
 * 13) JS Auto 4 column the selected lines
 * 14) JS Link magic buttons: (convert all links on page)
 * 15) For all page links, add E, 0, +, H, T, D, P, M, W, R links (edit, hist, talk, del, prot, move, whatlinkshere, related changes)
 * 16) For all user page and user talk page links, add U, T, +, C, BL, B, UL links (user, talk, talk_new, contribs, block log, block, user list/name)
 * 17) For all talk, Wikt: & help: pages, add E, 0, +, H, P, R links (edit, edit sec 0, edit_new, hist, prot, related)
 * 18) re-run GutenBot on top 1,000 entries adapting Hippietrail's formatting suggestions (to an extent)
 * 19) Run one bot to remove the current doobers
 * 20) Rerun the rankings for top 3,000.
 * 21) User talk:Connel MacKenzie
 * 22) Bot runs: (get these scheduled more regularly - each XML dump)
 * 23) ComparaBot,
 * 24) SuperlBot,
 * 25) TheCheatBot,
 * 26) PluralsBot,
 * 27) ThirdPersBot,
 * 28) PastBot
 * 29) User talk:TheDaveRoss
 * 30) Populate WT:RFDA from enwiktionary-latest-meta-history.xml
 * 31) rewrite English Random redirector in PHP, off toolserver db, not XML dumps (someday.  Might be easier to load GT.M MUMPS on toolserver.)
 * 32) Rewrite auto-column balancer (especially for the "names" indexes) in Javascript so I'm no longer the only one doing it.  (Or not, if previous is satisfactory.)  Remember "Zemena" (from ru: wiki) for text-area selection.
 * 33) Devise category vandalism page to identify all category removals.
 * 34) Work out better method of reporting "top-40" languages vs. dewikified languages (make a finite list!)  Update: Hippietrail and Stephen are doing this, so I don't have to.  Yay!  Wonder what ever became of this.
 * 35) Get official clarification from Proj. Gut. regarding Webster's 1913.  Done: July 2006...A-OK.
 * 36) Import Webster's 1913.
 * 37) Preload templates rewrite (each individual template.)
 * 38) Javascript-ize the preload templates to determine from approximate suffix which to auto-load
 * 39) Javascript add buttons for each of the preload templates as a row of small buttons (above or to the right of the current edit-box buttons.)  Better might be a list of preload templates (horizontal) saying what "suffixed word" the template is for.
 * 40) Review Help: namespace and double the current content.  Let Pathoschild do this?
 * 41) JS buttons for "history-ize all" and "edit-ize all" links on page.  (Hrm, maybe my js should just do that for all wikilinks, no matter what - add a "h", "e" and "t" links to them?  For userpages, "t", "+", "c"?  Perhaps also WhatLinksHere?  Delete?  Protect?  Move?  Watch/Unwatch?)  Bookmarklets are the way to go for these, right?
 * 42) Import these into Wiktionary, somehow.
 * 43) FmtTransBot: 'bot add the {{temp|trans-top}}/{{temp|trans-mid}}/{{temp|trans-bottom}} to translation sections, and balance columns of entries that do (only change if unbalanced by three or more?)
 * 44) Import Dictionary of the Chinook Jargon
 * 45) Create a Javascript parser, to create a dict object by parsing by headers, arranging sub-objects by blocks of text (that have sub-sub objects as blocks of text) to correspond to the TOC of an entry.
 * 46) Create a Javascript "re-arranger" to sort headings as per ELE, based on the parsing from previous item in this list.
 * 47) Get small XML list of entries from all other language Wiktionaries, subtract the entries that exist on en.wikt.
 * 48) One by one, download all the other language wiktionaries, traverse all entries (once?) and find all that have ==English== or {{-en-}}.
 * 49) User:ComparBot next in line?
 * 50) Help Hippietrail refine "personalsidebar.js".
 * 51) Move User:Connel MacKenzie/custom.js and User:Connel MacKenzie/Preferences to MediaWiki: namespace.  Half done 12/2006.
 * 52) Random page selection enhancements: by PartOfSpeech
 * 53) Find the MediaWiki: page displayed at login, and add most of {{temp|welcome}} to it, so the first time someone logs in they have at least a hint.
 * 54) Perhaps think about something similar for anon's 1st page view.  (Set a cookie.)
 * 55) Rewrite spellchecker to use en.Wiktionary's list of English words (minus redirects, minus slang, minus colloquial, minus misspellings) plus the "list of common misspellings" as a stop list...instead of using "ispell" as it does now.
 * 56) Add Wiktionary jargon (e.g. WT:BP, protologism, POS, etc.) to spellcheck's list of "custom" allowable terms.  Also add Wikipedia jargon, since it is being used more heavily there.
 * 57) Get the more generic things (spellcheck, keypad) out of WT:PREF and into MB proper.
 * 58) Remove [Citations], replace with (all) [Subpages].  <-- for me only?
 * 59) Wrap User:Dmcdevit/monobook.js into WT:PREFS
 * 60) Check status of http://download.wikimedia.org/ every four hours, send e-mail whenever enwiktionary/latest changes.  Lowest priority...even if I don't check, others nag me.  1/2007
 * 61) Add User:Annabelleke/Transtool link to WT:PREFS.
 * 62) Fix /patrol.js to work with main.js for scrunched up Special:Recentchanges display (keep sub-arrays and look through all edits to a term marking them all at once.)
 * 63) rewrite /patrol.js to honor enhanced layout, or SemperBlotto will never use it.  Mark all for a user.  Mark all pages that have since been edited by a whitelisted user sysop (Because it should have just been tagged for speedy, cleanup, verification.  Or cleaned up.)
 * 64) Fix ALT-C for the 'new' namespaces added a few months ago.
 * 65) Finish cleanup (+intos) for Special:Prefixindex/Template:new en & MediaWiki:Noexactmatch
 * 66) Javascript bookmarklets for Special:Checkuser
 * 67) honor the new enhancements
 * 68) export as a comma-separated values (CSV) file/table
 * 69) expand all usernames to  {{vandal|username}}  (and {{temp|proxyip2}}.)
 * 70) Monobook.js: MOVE to Common.js (only applicable pieces/parts?  Or all?)
 * 71) Monobook.js: Fix the Edittools thing to be dynamic (get code I wrote for bs: back in here!)
 * 72) Monobook.js: In Categories box at bottom of screen, hide all "Translation to be checked ..." categories.  Make it a WT:PREFS perhaps?
 * WT:PREFS: pull in User:Pill/monobook.js
 * 1) FIX STATISTICS
 * 2) Fork Werdnabot-Archiver.pl for RFC then RFD then RFV.
 * 3) Fix WT:PREFS to not show sysop things if there is no [delete] tab present on prefs page.  Add VOA uber-mega rollback thing to sysop section.
 * 4) Add a silly amount more WT:PREFS for randompage by language, externals (google, artfl, etc.)
 * 5) Add a "[Check minimum formatting]" button (via WT:PREF, like [check spelling],) that checks for a level two language heading, at least one level three heading (from the approved POS headings in WT:ELE), the pagename in bold after the approved pos heading OR an inflection template, at least one "#" definition line, and no "#" lines anywhere else.
 * 6) Special:Linksearch
 * 7) Study up on Meatball
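
The "[Check minimum formatting]" item above spells out its rules precisely enough to sketch. This is only a rough illustration, not the WT:PREFS code: the function name, the short POS list, and the shortcut of treating any line starting with "{{" inside a POS section as an inflection template are all assumptions.

```javascript
// Hypothetical sketch. POS_HEADINGS is a small illustrative subset,
// not the full list of headings approved in WT:ELE.
const POS_HEADINGS = ["Noun", "Verb", "Adjective", "Adverb", "Proper noun"];

function checkMinimumFormatting(pageName, wikitext) {
  const problems = [];
  let sawLanguage = false;   // any level-two heading, e.g. ==English==
  let sawPos = false;        // any approved level-three POS heading
  let inPos = false;         // true until the next heading of any level
  let sawHeadword = false;   // bold pagename or a template line in a POS section
  let definitionCount = 0;   // "#" lines inside a POS section
  let strayHashLines = 0;    // "#" lines anywhere else

  for (const line of wikitext.split("\n")) {
    const heading = /^(={2,})\s*([^=].*?)\s*\1\s*$/.exec(line);
    if (heading) {
      const level = heading[1].length;
      if (level === 2) sawLanguage = true;
      inPos = level === 3 && POS_HEADINGS.includes(heading[2]);
      if (inPos) sawPos = true;
      continue;
    }
    if (line.startsWith("#")) {
      if (inPos) definitionCount++; else strayHashLines++;
      continue;
    }
    // Assumption: a "{{...}}" line in a POS section is an inflection template.
    if (inPos && (line.includes("'''" + pageName + "'''") || line.startsWith("{{"))) {
      sawHeadword = true;
    }
  }

  if (!sawLanguage) problems.push("no level-two language heading");
  if (!sawPos) problems.push("no approved level-three POS heading");
  if (sawPos && !sawHeadword) problems.push("no bold pagename or inflection template after the POS heading");
  if (definitionCount === 0) problems.push('no "#" definition line');
  if (strayHashLines > 0) problems.push('"#" lines outside a POS section');
  return problems;
}
```

An empty array means the entry passes. The headword check here is page-wide rather than per POS section, which is looser than the task describes.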

Stalk for poor formatting

 * 1) Special:Contributions/Eclectophile
 * 2) Special:Contributions/Hailey C. Shannon
 * 3) Special:Contributions/80.129.147.4
 * 4) Special:Contributions/Mammal
 * 5) Special:Contributions/Fastifex sockpuppet?
 * 6) Special:Contributions/Scooteristi POV edits, formatting
 * 7) Special:Contributions/Kevlar67
 * 8) Special:Contributions/Elwikipedista adding linebreaks to critical templates!
 * 9) Special:Contributions/82.96.100.100 Dictionary of vulgar tongue.
 * 10) Special:Contributions/Brya Pending community ban on Wikipedia for POV craploading, using his Wiktionary craploading as supporting evidence.
 * 11) Special:Contributions/82.159.136.238 fmt
 * 12) Special:Contributions/Waltter Manoel da Silva wten fmt
 * 13) Special:Contributions/67.85.170.171 bogus?
 * 14) Special:Contributions/87.203.85.123 need someone to check this "Greek"
 * 15) Special:Contributions/Tubedogg Trademarks (all?)
 * 16) Special:Contributions/Anemos copyvio vandal
 * 17) Special:Contributions/Ruricolist copyvios & protologisms
 * 18) Special:Contributions/216.51.229.2 Might be OK
 * 19) Special:Contributions/Zondor bad transwikis
 * 20) Special:Contributions/216.220.231.226 Wonderfool?
 * 21) Special:Contributions/Sgeureka copious partial transwikis (garbage WP wanted deleted, indiscriminately?)
 * 22) Special:Contributions/Qubit Predates all formatting - sketchy quotations
 * 23) Special:Contributions/Richontaban Style seems cut-n-paste; probable copyvios.
 * 24) Special:Contributions/Scotwriter English?
 * 25) Special:Contributions/Curtisweyant Copyvio (attributed W1913!)
 * 26) Special:Contributions/96.229.184.69 Questionable entries only - good for mass RFDing?

Inactive/resolved tasks

 * 1) re-rsync w/Project Gutenberg (done ~4/2/2006)
 * 2) import newly sync'd files (done ~4/4/2006)
 * 3) find entries with  but no  (done 08:19, 7 April 2006 (UTC))
 * 4) Bad cruft after headings, on same line.  (done 17:34, 11 April 2006 (UTC))
 * 5) re-rank for Frequency counts, regenerate Frequency lists.  (done 06:54, 16 April 2006 (UTC))
 * 6) experimental English random page (done 05:22, 23 April 2006 (UTC))
 * 7) experimental "top several languages" random page (done 07:03, 25 April 2006 (UTC))
 * 8) Javascript/monobook.js
 * 9) Top row links (doneish 09:48, 22 April 2006 (UTC))
 * 10) [0] and [+] (done.  and [P].  09:48, 22 April 2006 (UTC))
 * 11) Bot one-time runs:
 * 12) TheCheatBot run, (done 08:19, 26 May 2006 (UTC))
 * 13) Add a "0" button to the right of the current "+" button (edit section zero on all pages) (done a couple weeks ago.  01:34, 17 May 2006 (UTC))
 * 14) User:Kipmaster/to format (done 00:39, May 1, 2006)
 * 15) Template talk:janoun  (done?  16:41, 2 May 2006 (UTC))
 * 16) commons:User:ConnelBot - 1) auto d/l all images on COM:RFD, convert, add translucent red "X" overlay, re-upload.  2) Run toolserver:CheckUsage on each item, and auto append to each section of COM:RFD the results from that tool. (OBE - WT:CT 07:07, 12 June 2006 (UTC))
 * 17) URGENT: Tighten down the did-you-mean code.  Add &rdfrom="". 06:13, 7 July 2006 (UTC)
 * 18) experimental random-cleanup word (done 06:13, 29 April 2006 (UTC)) 06:13, 7 July 2006 (UTC)
 * 19) For Transwiki Namespace, add a link to main namespace entry (to see if it is blue or red) and a link to the transwiki log for PAGENAME...maybe even a link to the appropriate section of transwikilog#UCFIRST. 06:13, 7 July 2006 (UTC)
 * Category:Wikisaurus:Book / Category talk:Wikisaurus:Book inactive, experiment removed, talk pages left until the method is revived by someone.
 * 1) Refactor monobook.js into 20+ separate .js files, so I can pick and choose (and others can as well!)  (Excellent progress so far.  01:15, 22 September 2006 (UTC))
 * 2) JS Sidebar links  (done 09:48, 22 April 2006 (UTC))
 * 3) JS functions (row of buttons)  01:15, 22 September 2006 (UTC)
 * 4) JS HTML to Unicode (nah, run as bot since pywikipediabot has that in Wikipedia.py.) OBE 12/2006
 * 5) JS Language dewikifying button (nah, keep as it is.  12/2006)
 * 6) JS (Bah.  Useless.  12/2006.)  Insert buttons for most common headings (auto determine parent heading level and add one.)
 * 7) English
 * 8) Noun
 * 9) Verb
 * 10) Translations
 * 11) JS Check URL, if it contains ".5B" and/or ".5D" (after "#") remove them and reload page. (done quite some time ago 01:17, 22 September 2006 (UTC))
 * 12) User talk:Connel MacKenzie OBE: done at server level from now on.  01:17, 22 September 2006 (UTC)
 * 13) Various "bad user page" lists.  (Started at: User talk:TheDaveRoss 15:46, 12 April 2006 (UTC).)  Give SemperBlotto the "two"s next.  (Done a while ago 12/2006.)
 * 14) quick-format User:Taxman's terms for import.  (mostly done, 18:26, 9 June 2006 (UTC))
 * 15) Refine auto-column balancing (with kerning) for Special:Requested articles.  (And WikiSaurus)  Convert to Caché Server Page so task of cutting-and-pasting can be delegated to someone else.  OBE 11/2006 - Requested pages have been mostly cleared, all lists are now manageable sizes.
 * 16) further automate the list of links for WT:BPA... OBE 1/2007: switching to Werdnabot archiving.
 * 17) Add links near the interwiki links for all Special: sub-pages so I don't have to double page load during slow times.  Hrm, the namespace of  is what seems to have goofed it up...hardcode "Wikipedia:" for those links.  Done 11/2006.
 * 18) Push harder for pseudo-namespaces that were approved last year and never turned on.  Done.  Namespaces.  01:15, 22 September 2006 (UTC)
 * 19) Turn on the magic-auto redirects for myself, for internal redlinks (as I did for everyone in Monobook.js for external links.)  Hell no!  12/2006.
 * 20) Add ^wikisbst to monthly run; fix recursion to work correctly (i.e. more than two levels of transclusion.)  Done 12/2006.
 * 21)  (e.g. "invisible"    + Monobook.js code to do the lookup + Bot to tag XML dump entries + RCBot to tag new entries.  Also find the international version that Ec mentioned somewhere.  OBE 2/2007: Hippietrail's extension.
 * 22)  txt = txt.replace(/Morobashi/gi, "Morohashi"); (as a 'bot run)  Done 9/2006.
 * 23) For editing Requested entries*, write a JS function that checks all "* Wiktionary:List of idioms.)  OBE fall 2006 - list now under control: SB bluelinked most, someone else cleared list.
 * 24) Add sidebar random thingy for Special:Randompage/Transwiki.  Done fall 2006.
 * 25) Add sidebar link to www.urbandictionary to filter out cruft faster.  Done fall 2006.
 * 26) More: Finish cleanup of bs:MediaWiki:Monobook.js from zh:MediaWiki:Monobook.js  Done May/June 2006.
 * 27) Start refactoring Mediawiki:Monobook.js to adapt the most excellent code at zh:MediaWiki:Monobook.js, as well as my stuff at bs:MediaWiki:Monobook.js.  Duplicate 12/2006
 * 28) Detect edited doobers from irc://irc.wikimedia.org/en.wiktionary and check them off from the cleanup list. Without a more reliable way to live-mirror the recentchanges, this can never work.  12/2006.
 * 29) Build daily summary lists from irc://irc.wikimedia.org/en.wiktionary then Special:Export them as XML, then re-import that subset to keep my copy relatively up-to-date.
 * 30) Build English indexes OBE: Kipcool did it.  Yay! 01:15, 22 September 2006 (UTC)
 * 31) Build 'new entry' index by year/month.  OBE Fall 2006 - way, way, way too many entries.
 * 32) Fix WT:PREFS to consolidate cookies (URGENT)  Lower priority now: only breaks IE.  No one can realistically Wiki, using IE, so it doesn't matter.
 * 33) Fix WT:PREFS' spellchecker (http://tools.wikimedia.org/~cmackenzie/spellcheck.php) to move to hemlock.knams.  Done 10/2006.
 * 34) Add combobox to /keypad.js to allow (re)selection of that thing.  Done 9/2006.
 * 35) Get Wikipedia bot for clearing w:CAT:MtW tested and run.  2/3rds done 12/2006...
 * 36) Add banner atop RFD, RFDO, RFV, RFC  similar to BP/ID/TR/GP banner.  21:22, 3 January 2007 (UTC)
 * 37) Use the cookie code (/monobook.js) to "remember" the last selected thinggy in Edittools, and/or add a button.  Then, if toolbar exists, getCookie, and preselect that item.  Done summer 2006.
 * 38) /patrol.js page whitelisting 17:28, 5 January 2007 (UTC)
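
Column balancing comes up repeatedly in the lists above (translation sections, the "names" indexes, requested articles). A minimal sketch of the line-count version, assuming each entry takes one line and ignoring kerning and wrapped lines entirely; the function names and the threshold default are hypothetical, with the threshold taken from the "unbalanced by three or more?" note.

```javascript
// Hypothetical helpers: split by line count only.
function balanceColumns(lines, columnCount) {
  const columns = [];
  const base = Math.floor(lines.length / columnCount);
  const extra = lines.length % columnCount; // the first `extra` columns get one more line
  let start = 0;
  for (let i = 0; i < columnCount; i++) {
    const size = base + (i < extra ? 1 : 0);
    columns.push(lines.slice(start, start + size));
    start += size;
  }
  return columns;
}

// True when the two columns differ by `threshold` lines or more.
function isUnbalanced(left, right, threshold = 3) {
  return Math.abs(left.length - right.length) >= threshold;
}

// Wrap a flat list of translation lines in the two-column templates.
function toTransTable(lines) {
  const [left, right] = balanceColumns(lines, 2);
  return ["{{trans-top}}", ...left, "{{trans-mid}}", ...right, "{{trans-bottom}}"].join("\n");
}
```

The same `balanceColumns` call with 3 or 4 as the column count would cover the "Auto 3 column" and "Auto 4 column" buttons, given suitable column-break markup.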

Current mania

 * Rebuilding historic archives of WT:RFV, WT:RFD, WT:RFDO.

Full XML dump 1/16/2008
It is fucking amazing that a 300 GB drive is now so inadequate (i.e. full.) I've finished doing all the housekeeping tasks I can think of. Granted, over 100GB came from the digital camera. I might zap those all down to lower resolution. (But every time I've done that in the past, I've regretted it.)

I can't really get rid of any of the Project Gutenberg stuff. Starting on collocations is set back again though...no disk...no research.

The current full-history XML dump is >17 GB uncompressed. The "current pages" dump (including all the weird namespaces) is less than 700 MB. The next thing I can do, I suppose, is more aggressively thwack older "current pages" downloads that I've been saving for comparison (and because they ain't so big, compressed.)

The fact that I haven't had a complete backup for over two years now is troublesome.

Now, being massively space-constrained poses particular problems. Piping the decompression to my analysis tool, I can read and parse entries/revisions and save the ones of interest. Saving one copy of each WT:RFV revision is somewhere between five and six times larger than simply saving all current revisions of all pages (still running.) I had to write my own buffering XML parser to get it as selective as I need it. :-(    But what's a terabyte or 500 among 'pedia friends.
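
For illustration only, the general shape of such a buffering filter might look like the sketch below. It is not the parser described above: `createRevisionFilter` and its callback are made-up names, and it assumes the dump's `<title>` and `<revision>` tags carry no attributes and that markup inside revision text is XML-escaped.

```javascript
// Hypothetical sketch: feed it decompressed dump text chunk by chunk; it
// hands each complete <revision> element of the wanted page to onRevision
// and discards everything else, so memory use stays near one revision.
function createRevisionFilter(wantedTitle, onRevision) {
  let buffer = "";
  let currentTitle = null;
  return function feed(chunk) {
    buffer += chunk;
    for (;;) {
      const m = /<(title|revision)>/.exec(buffer);
      if (!m) {
        // Nothing of interest: keep a short tail in case a tag is split across chunks.
        buffer = buffer.slice(-"<revision>".length);
        return;
      }
      const close = "</" + m[1] + ">";
      const end = buffer.indexOf(close, m.index);
      if (end === -1) {
        // Incomplete element: drop what precedes it and wait for more input.
        buffer = buffer.slice(m.index);
        return;
      }
      const element = buffer.slice(m.index, end + close.length);
      buffer = buffer.slice(end + close.length);
      if (m[1] === "title") {
        currentTitle = element.slice("<title>".length, -"</title>".length);
      } else if (currentTitle === wantedTitle) {
        onRevision(element);
      }
    }
  };
}
```

In Node this could be wired to standard input, with the decompressor piped in from the shell, by calling `feed` from a `process.stdin.on("data", ...)` handler after `process.stdin.setEncoding("utf8")`.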

Parsing those apart to get the last REVID in which a particular section appeared has nasty challenges of its own. Page-blanking vandals in the past mean that I have to compare the first couple lines of those sections to eliminate some of the duplicates. Other terms actually have been listed numerous times. Section headings themselves weren't consistently wikified for a long time. (Inconsistently - great.) Long blocks moved from RFV to RFD or RFD to RFV sometimes were softlinked, sometimes not. Long blocks archived to talk pages often were copied to three or four places. Determining if the target page existed when the section was removed (or five minutes later) is a different kind of challenge - I have to pull the deletion log for that page from the live wiki ('cause I don't have the space available to build up a full revision copy here.)
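
A rough sketch of the matching idea, under loud assumptions: revisions arrive in chronological order as `{ id, text }` objects, each listing is a level-two section, and a listing is identified by its de-wikified heading plus its first two non-blank lines. All names here are hypothetical, and the moved-block and deletion-log problems are not addressed at all.

```javascript
// Hypothetical helpers for tracking the last revision each listing appeared in.
function normalizeHeading(raw) {
  // Treat "[[foo]]", "[[foo|bar]]" and plain "foo" as the same term.
  return raw.replace(/\[\[([^\]|]*)(?:\|[^\]]*)?\]\]/g, "$1").trim();
}

function splitSections(wikitext) {
  const sections = [];
  let current = null;
  for (const line of wikitext.split("\n")) {
    const m = /^==\s*([^=].*?)\s*==\s*$/.exec(line); // level-two headings only
    if (m) {
      current = { term: normalizeHeading(m[1]), body: [] };
      sections.push(current);
    } else if (current) {
      current.body.push(line);
    }
  }
  return sections;
}

function lastAppearances(revisions, fingerprintLines = 2) {
  const last = new Map(); // key: term + opening lines, value: { term, revid }
  for (const rev of revisions) {
    for (const section of splitSections(rev.text)) {
      const opening = section.body
        .filter(line => line.trim() !== "")
        .slice(0, fingerprintLines)
        .join("\n");
      // A later revision with the same key overwrites the earlier revid.
      last.set(section.term + "\n" + opening, { term: section.term, revid: rev.id });
    }
  }
  return [...last.values()];
}
```

A blanked revision contributes no sections, so a section restored after blanking just gets a newer revid instead of showing up as a duplicate, while a term relisted with a different opening gets its own record. The weak spot is that editing a listing's opening lines makes it look like a new listing.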

Actually, I probably do. Eliminating non-NS:0 stuff, I might have room for all history. If WT:RFV full history import ever finishes, I guess I'll see. OK, so all revisions of WT:RFV come to just slightly (a couple dozen MB) less than a gigabyte. Ugh...RFD was renamed from its original name - more customization to pick that one up next round. Grrrr.

--Connel MacKenzie 06:20, 7 February 2008 (UTC)