User:Scsbot/wikised documentation

Because wikised is a framework for performing many sorts of editing tasks, there are several pieces of information it must be given on startup to tell it what to do. It has to know which wiki to make changes to and who to log in as. It has to know which actual editing script to run and whether there are any check scripts. And it needs a list of pages to edit.

This information can be given to wikised either on its command line or via a couple of configuration files.

We'll start with a quick list of all the command-line flags, followed by some more detailed explanations of what they mean and how they're used.

Command-line flags

 * -host h : set host of wiki
 * -url u : set full base URL of wiki
 * -editsummary m : set edit summary message
 * -minor : mark edits as minor
 * -data f : set driver/data file (list of pages to edit, + parameters)
 * -okaytocreate : okay to create new pages
 * -mustcreate : don't overwrite old pages (must create new)
 * -sleep s : sleep for s seconds between edits
 * -precheckscript s : set pre-check script
 * -postcheckscript s : set post-check script
 * -editscript s : set main edit script
 * -filter : edit script is a pure filter
 * -checkdiffs : check diffs
 * -ninsert n : expected number of inserted lines
 * -ndelete n : expected number of deleted lines
 * -user u : set username to log in as
 * -pass p : password for that username
 * -loginconfig c : use predefined login configuration c (sets host, url, and user)
 * -f s : use master setup script s (sets edit script and all related parameters)
 * -help : print a brief help message
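To make the flag list concrete, here is a minimal Python sketch of a parser that accepts the same single-dash options. It is an illustration only, not wikised's actual code. The function name `build_parser` and the `dest` names are hypothetical. The trailing positional argument models the rule, described under Discussion, that a master script may also be given as the first plain argument.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the wikised command-line flags."""
    p = argparse.ArgumentParser(prog="wikised", add_help=False)
    p.add_argument("-host")                      # wiki host h
    p.add_argument("-url")                       # full base URL
    p.add_argument("-editsummary")
    p.add_argument("-minor", action="store_true")
    p.add_argument("-data")                      # driver/data file
    p.add_argument("-okaytocreate", action="store_true")
    p.add_argument("-mustcreate", action="store_true")
    p.add_argument("-sleep", type=float)         # seconds between edits
    p.add_argument("-precheckscript")
    p.add_argument("-postcheckscript")
    p.add_argument("-editscript")
    p.add_argument("-filter", action="store_true")
    p.add_argument("-checkdiffs", action="store_true")
    p.add_argument("-ninsert", type=int)
    p.add_argument("-ndelete", type=int)
    p.add_argument("-user")
    p.add_argument("-pass", dest="password")     # 'pass' is a Python keyword
    p.add_argument("-loginconfig")
    p.add_argument("-f", dest="master")          # master setup script
    p.add_argument("-help", action="help")
    # A first plain argument is also accepted as the master script.
    p.add_argument("master_pos", nargs="?")
    return p
```

Exact matches take priority in argparse, so `-filter` is not mistaken for `-f` with the value `ilter`.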

Discussion
To specify which wiki to connect to, use either the -host or the -url option. If you use -host to specify a host h, the script assumes that the base URL is http://h/w/index.php. Alternatively, you can specify the full base URL explicitly with -url (in which case you don't need -host).
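The host-to-URL rule can be written as a small function. This is a sketch: `base_url` is a hypothetical name. The text doesn't say which option wins when both are given, so letting -url take precedence is an assumption.

```python
from typing import Optional


def base_url(host: Optional[str] = None, url: Optional[str] = None) -> str:
    """Resolve the wiki's base URL from -url or -host."""
    if url:
        # ASSUMPTION: an explicit -url wins; -host is then unnecessary.
        return url
    if host:
        # -host h implies the conventional MediaWiki script path.
        return f"http://{host}/w/index.php"
    raise ValueError("specify -host or -url")
```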

You must also specify a username (-user) and password (-pass). If you don't want to put the password on the command line, omit -pass and the script will prompt for it.
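A prompt-if-missing password lookup might look like this. `resolve_password` is a hypothetical helper. It uses Python's `getpass`, which reads the password without echoing it. The prompt function is a parameter so it can be swapped out in tests.

```python
import getpass
from typing import Callable, Optional


def resolve_password(user: str, password: Optional[str],
                     prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Use the -pass value if given; otherwise ask for it interactively."""
    if password is not None:
        return password
    return prompt(f"Password for {user}: ")
```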

The cornerstone of the process is the edit script, specified with -editscript. By default, this script is invoked as


 * script filename [ parms ]

where filename is a temporary file containing the wikitext of the page being edited, and parms is a list of any additional parameters (see below). However, if the -filter flag is specified, the script will be invoked as


 * script [ parms ]

and it receives the current wikitext on its standard input and is expected to write the edited text to its standard output.
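Here is a hypothetical Python edit script that supports both calling conventions. The edit itself, appending a "* Rhymes:" line built from the page's parameters, is invented for illustration.

The two modes work as follows:

- **Filter mode** matches the text: wikitext arrives on stdin, and the edited text goes to stdout.
- **File mode** is partly an assumption. It presumes wikised reads the edited text back from the same temporary file, which the text above doesn't state.

```python
import sys
from typing import List


def edit(text: str, params: List[str]) -> str:
    """Hypothetical edit: append one line built from the page's parameters."""
    if not params:
        return text                      # nothing to add for this page
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "* Rhymes: " + " ".join(params) + "\n"   # invented content


def main(argv: List[str], filter_mode: bool) -> None:
    if filter_mode:
        # -filter: wikitext arrives on stdin, edited text goes to stdout
        sys.stdout.write(edit(sys.stdin.read(), argv))
    else:
        # default: argv[0] is the temporary file holding the wikitext.
        # ASSUMPTION: wikised reads the edited text back from that file.
        path, params = argv[0], argv[1:]
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(edit(text, params))


if __name__ == "__main__":
    # Set filter_mode to match whether wikised is run with -filter.
    main(sys.argv[1:], filter_mode=True)
```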

Typically the bot is run to perform edits on many pages. The set of pages to edit is determined by a driver script, an ordinary text file consisting of one or more columns. The first column is the name of the page to edit; any remaining columns are parameters passed to the edit script when editing that particular page. For example, when adding rhymes, the driver script might look like

 bag   -æɡ
 dawn  -ɔːn
 fawn  -ɔːn
 thorn -ɔː(r)n
 ...

You specify the name of this driver file with -data.
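Reading a driver file amounts to splitting each line into a page name and its parameters. The sketch below makes two assumptions not stated in the text: columns are separated by runs of whitespace, and blank lines are skipped. `parse_driver` is a hypothetical name.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_driver(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (page, params) pairs: first column is the page, the rest are params."""
    for raw in lines:
        line = raw.strip()
        if not line:                  # ASSUMPTION: blank lines are ignored
            continue
        page, *params = line.split()  # ASSUMPTION: whitespace-separated columns
        yield page, params
```

With whitespace splitting, a page title that itself contains spaces would need a different column separator. This sketch doesn't handle that case.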

You can provide an edit summary for the bot to use with -editsummary. You can have the bot mark its edits as minor with -minor.

Normally, it is an error if the bot tries to edit a page that does not exist. If your script might create new pages (perhaps because it is specifically an upload script), use -okaytocreate, which suppresses the error when a page does not exist. If you're uploading pages which might already exist and which you do not want to overwrite, specify -mustcreate, which causes an error if the page does exist.
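The three cases reduce to a small decision rule. In this sketch, `check_create` is hypothetical. Treating -mustcreate as also permitting creation of a missing page is an assumption; it follows from the flag's purpose but isn't stated outright.

```python
def check_create(page_exists: bool, okaytocreate: bool, mustcreate: bool) -> None:
    """Raise if the page's existence conflicts with the creation flags."""
    if page_exists and mustcreate:
        raise RuntimeError("page exists, but -mustcreate forbids overwriting it")
    # ASSUMPTION: -mustcreate implies that creating a missing page is allowed.
    if not page_exists and not (okaytocreate or mustcreate):
        raise RuntimeError("page does not exist; use -okaytocreate to allow creation")
```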

If there is a pre-check script, it is specified with the -precheckscript option. The pre-check script is invoked without arguments, with the (unedited) wikitext presented on standard input.
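Here is a hypothetical pre-check script. It only lets the edit proceed if the page already has an English section. The text above doesn't say how a check reports failure. This sketch assumes a nonzero exit status means "don't edit this page".

```python
import sys


def precheck(text: str) -> bool:
    """Hypothetical check: the unedited page must have an ==English== header."""
    return "==English==" in text.replace(" ", "")


if __name__ == "__main__":
    # ASSUMPTION: wikised treats a nonzero exit status as a failed check.
    sys.exit(0 if precheck(sys.stdin.read()) else 1)
```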

If the -checkdiffs option is present, wikised effectively presses the "Show changes" button and inspects the diff output. The analysis is currently rudimentary: just a count of the number of lines inserted and deleted. Specify the expected number of inserted lines with -ninsert and the expected number of deleted lines with -ndelete. (It is also possible for the edit script to communicate a variable, per-edit ninsert value back to wikised.)
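The insert/delete count can be approximated locally. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in for the MediaWiki diff that wikised actually inspects, so counts may differ for some edits. `count_changes` and `diffs_ok` are hypothetical names. A replaced line counts as one deletion plus one insertion.

```python
import difflib
from typing import Tuple


def count_changes(old: str, new: str) -> Tuple[int, int]:
    """Return (inserted, deleted) line counts between two wikitext versions."""
    a, b = old.splitlines(), new.splitlines()
    inserted = deleted = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            inserted += j2 - j1
    return inserted, deleted


def diffs_ok(old: str, new: str, ninsert: int, ndelete: int) -> bool:
    """True when the edit changed exactly the expected number of lines."""
    return count_changes(old, new) == (ninsert, ndelete)
```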

If there is a post-check script, it is specified with the -postcheckscript option. The post-check script is invoked without arguments, with the (edited) wikitext presented on standard input.
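A post-check script can catch an edit that damaged the page. This hypothetical example uses a crude heuristic: the edited text must contain as many `{{` as `}}`. As with the pre-check sketch, signalling failure through a nonzero exit status is an assumption.

```python
import sys


def postcheck(text: str) -> bool:
    """Hypothetical check: template braces in the edited text are balanced."""
    return text.count("{{") == text.count("}}")


if __name__ == "__main__":
    # ASSUMPTION: a nonzero exit status tells wikised the check failed.
    sys.exit(0 if postcheck(sys.stdin.read()) else 1)
```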

To avoid overwhelming the Wikimedia servers, the script normally waits a minute between page edits. This delay can be configured with the -sleep option.
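The throttled edit loop might look like the sketch below. `run_edits` is a hypothetical name, and the sleep function is injectable so the loop can be tested without waiting. The sketch pauses only between consecutive edits, not after the last one. That is a design choice, not something the text specifies.

```python
import time
from typing import Callable, Iterable


def run_edits(pages: Iterable[str], edit_page: Callable[[str], None],
              delay: float = 60.0,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Edit each page in turn, pausing `delay` seconds between edits."""
    first = True
    for page in pages:
        if not first:
            sleep(delay)          # throttle: never edit back-to-back
        edit_page(page)
        first = False
```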

To streamline specifying the several different wikis and logins you might use with this script, you can use the -loginconfig option, which automatically sets the host, base URL, username, and password based on a small database of frequently used values.
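A login "database" can be as simple as a table keyed by configuration name. Everything named here is a placeholder: `LOGIN_CONFIGS`, its entries and `apply_loginconfig`. A stored password of `None` stands for "prompt at startup". Letting explicitly given options override the stored values is also an assumption.

```python
from typing import Dict, NamedTuple, Optional


class Login(NamedTuple):
    host: str
    url: str
    user: str
    password: Optional[str]   # None: prompt at startup instead


# Hypothetical entries: a home test wiki and the live Wiktionary.
LOGIN_CONFIGS: Dict[str, Login] = {
    "test": Login("localhost", "http://localhost/w/index.php", "Scsbot", None),
    "live": Login("en.wiktionary.org", "http://en.wiktionary.org/w/index.php",
                  "Scsbot", None),
}


def apply_loginconfig(name: str, explicit: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fill host, url, user and password from a named config.

    ASSUMPTION: options given explicitly (non-None) override the config.
    """
    settings: Dict[str, Optional[str]] = dict(LOGIN_CONFIGS[name]._asdict())
    settings.update({k: v for k, v in explicit.items() if v is not None})
    return settings
```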

To streamline the specification of all the information pertaining to a particular edit task, you can use a "master script". A master script is specified with the -f option, or as the first (plain) argument on the command line.

So there are potentially three different configuration scripts you might want to set up: the login config script, the master script, and the driver script. These three scripts are distinct: the login config script controls which wiki is logged in to, the master script controls which editing task is performed, and the driver script controls which specific pages are edited.

These three things are separated because I want to be able to control them separately. I want to be able to test a new edit script on my home test wiki and then, once it's working, without changing anything else, perform exactly the same edits on the live Wiktionary. Then, once I've performed some edits for one set of pages, I want to be able (again, without changing anything else) to run exactly the same edits on another (perhaps larger) set of pages.

Master script syntax
The master script is a set of keyword/value pairs, one per line. The keywords are:


 * host : set the host, just like the -host command-line option
 * baseurl : set the base url, just like the -url command-line option
 * driverscript : set the data driver script, just like the -data command-line option
 * reason : set the edit summary message, just like the -editsummary command-line option
 * minoredit : indicate minor edits, just like the -minor command-line option
 * okaytocreate : allow page creation, just like the -okaytocreate command-line option
 * mustcreate : force new page creation, just like the -mustcreate command-line option
 * delay : set the delay between edits, just like the -sleep command-line option
 * checkscript : set the pre-check script, just like the -precheckscript command-line option
 * editscript : set the edit script, just like the -editscript command-line option
 * filter : indicate that the edit script is a filter, just like the -filter command-line option
 * postcheckscript : set the post-check script, just like the -postcheckscript command-line option
 * checkdiffs : request checking diffs, just like the -checkdiffs command-line option
 * expecteddelete : set the expected number of deletions, just like the -ndelete command-line option
 * expectedinsert : set the expected number of insertions, just like the -ninsert command-line option
 * username : set the user name, just like the -user command-line option
 * password : set the password, just like the -pass command-line option
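Here is a sketch of a parser for this keyword/value format. `parse_master` is a hypothetical name. The exact line syntax isn't spelled out above, so the sketch assumes the following:

- The keyword and value are separated by whitespace.
- The five yes/no keywords may appear alone, meaning "on".
- Blank lines are ignored.
- Unknown keywords are rejected.

```python
from typing import Dict, Union

KNOWN = {"host", "baseurl", "driverscript", "reason", "minoredit",
         "okaytocreate", "mustcreate", "delay", "checkscript", "editscript",
         "filter", "postcheckscript", "checkdiffs", "expecteddelete",
         "expectedinsert", "username", "password"}
BOOLEAN = {"minoredit", "okaytocreate", "mustcreate", "filter", "checkdiffs"}
INTEGER = {"expectedinsert", "expecteddelete"}

Value = Union[str, int, float, bool]


def parse_master(text: str) -> Dict[str, Value]:
    """Parse 'keyword value' lines (ASSUMED syntax) into a settings dict."""
    settings: Dict[str, Value] = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        parts = raw.strip().split(None, 1)   # keyword, then the rest of the line
        if not parts:
            continue                         # blank line
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key not in KNOWN:
            raise ValueError(f"line {lineno}: unknown keyword {key!r}")
        if key in BOOLEAN:
            settings[key] = True             # presence switches the option on
        elif key in INTEGER:
            settings[key] = int(value)
        elif key == "delay":
            settings[key] = float(value)     # seconds between edits
        elif value:
            settings[key] = value
        else:
            raise ValueError(f"line {lineno}: {key!r} needs a value")
    return settings
```

Splitting only once keeps multi-word values such as an edit summary intact (`reason Add rhymes`).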