Wiktionary:Votes/bt-2023-09/User:KamusiBot for bot status

User:KamusiBot for bot status
Nomination: I hereby request the Bot flag for User:KamusiBot for the following purposes:

I (User:tbm) have done a lot of supervised QA fixes recently with the help of Python scripts that show me a diff. Some of these should have been run under a bot account and I have more cleanups pending that really ought to be done with a bot account.

I've documented the kind of QA fixes I've done plus some example edits (manually or supervised with the help of scripts).

Current
I'd like to run the following tasks:


 * Apply more whitespace and cosmetic fixes
 * Normalize entries
 * Convert Arabic roots to . I'll coordinate with Fenakhay on this.
 * Standardize category template usage on for Swahili

Future
Possible tasks for the future:


 * More cleanups and fixes. I have a long list of issues
 * Add information from other language Wiktionary (e.g. hyphenation patterns)
 * Create Swahili forms (verb forms, noun plurals, etc)
 * Generate Swahili entries from Google Docs (to work with volunteers who are not familiar with Wiktionary)

Approach
I'll follow a conservative approach since my scripts don't take every corner case into account (yet?):


 * 1) To start with, for the majority of edits, I'll have scripts generate a patch file (diff) which I'll review manually.  The bot can then apply the changes.
 * 2) For edits where I'm reasonably sure that a pattern covers them fully, I will make automatic edits (after sufficient testing). For example, Arabic root conversion based on the regex  . (This is the simplest case; there are harder ones.)
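
A minimal sketch of what step 1 could look like, using only the Python standard library. This is not the actual bot code: the function names `cosmetic_fix` and `make_patch` and the specific whitespace rules are assumptions made for illustration, and the step that applies a reviewed patch through Pywikibot is not shown.

```python
import difflib
import re


def cosmetic_fix(wikitext: str) -> str:
    """Hypothetical cleanup: strip trailing whitespace, collapse blank-line runs."""
    lines = [line.rstrip() for line in wikitext.splitlines()]
    # Collapse three or more consecutive newlines into exactly two,
    # then drop any trailing blank lines.
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).rstrip("\n")
    return cleaned + "\n" if cleaned else ""


def make_patch(title: str, old: str, new: str) -> str:
    """Return a unified diff for one page, or an empty string if nothing changed."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{title}",
        tofile=f"b/{title}",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Example page text with trailing spaces and too many blank lines.
    old_text = "==Swahili==  \n\n\n\n===Noun===\nkamusi\n"
    print(make_patch("kamusi", old_text, cosmetic_fix(old_text)))
```

Writing the diff to a file before anything is saved keeps the human review and the bot edit as two separate steps, so a pattern that misfires is caught in the patch rather than on the live page.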

Mistakes
I make mistakes but I promise to clean up after myself. Some mistakes I've made so far:


 * Arabic root conversion didn't take into account pages with two different roots and created two boxes with the same root.
 * An off-by-one error that meant a stray character remained.
 * I can't remember what went wrong but I introduced a syntax error when trying to fix a hyphenation pattern.
 * Not paying enough attention during supervised editing and removing "duplicate" words from non-English text where they are probably not duplicates.
 * Removing a duplicate word from a quote where the duplicate word is in the original (maybe we need to add "sic"?)
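
The last two mistakes suggest that repeated words should be flagged for review rather than removed outright. A rough sketch of such a check follows. The name `find_duplicate_words`, the `english_only_section` flag, and the choice to skip lines starting with `#*` (quotations) or `#:` (usage examples) are assumptions for illustration, not the rules the bot actually uses.

```python
import re

# Matches a word immediately repeated, e.g. "the the" (case-insensitive).
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Assumed markers for lines that reproduce a source verbatim and
# should therefore never be auto-edited.
PROTECTED_PREFIXES = ("#*", "#:")


def find_duplicate_words(wikitext: str, english_only_section: bool):
    """Yield (line_number, word) for repeated words worth a manual look."""
    if not english_only_section:
        # Reduplication is legitimate in many languages; report nothing.
        return
    for number, line in enumerate(wikitext.splitlines(), start=1):
        if line.lstrip().startswith(PROTECTED_PREFIXES):
            continue
        for match in DUPLICATE_WORD.finditer(line):
            yield number, match.group(1)


if __name__ == "__main__":
    sample = "# a thing that that works\n#* He said the the end\n# fine line"
    for number, word in find_duplicate_words(sample, english_only_section=True):
        print(f"line {number}: repeated word {word!r}")
```

The function only reports candidates; whether a given repetition is an error (or needs a "sic") stays a human decision.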

Source code
I use Python and Pywikibot. I intend to clean up my code and publish it under a FOSS license on GitHub (famous last words, I know, but I'll do it).

Naming
I picked "KamusiBot" rather than "tbmBot" because I wanted to run the bot with a friend. He's busy but I like the name (Kamusi is Swahili for dictionary). Erutuon runs ToilBot so UserBot doesn't seem like a hard requirement.

Schedule:
 * Vote starts: 13:39, 23 September 2023 (UTC)
 * Vote ends: 23:59, 30 September 2023 (UTC)
 * Vote created: tbm (talk) 13:39, 23 September 2023 (UTC)

Discussion:
 * Wiktionary talk:Votes/bt-2023-09/User:KamusiBot for bot status

Support

 * 1)  John Cross (talk) 07:31, 24 September 2023 (UTC) I appreciated the open approach and all the context provided. John Cross (talk) 07:31, 24 September 2023 (UTC)
 * 2)  Vininn126 (talk) 11:06, 24 September 2023 (UTC)
 * 3)  You seem careful and I like your approach, but I'd advise you not to completely automate changes based on language code mismatches like diff and diff. Mismatched language codes are often a sign that someone has copied and pasted that data from someplace else and, if they didn't review the language code, they may not have reviewed the rest of the data. In short, these will need some sort of manual review, often by someone familiar with the specific language, because it's not safe to assume that simply changing the language code is enough to correct it, nor is it safe to assume that it's incorrect and delete it. -JeffDoozan (talk) 14:42, 24 September 2023 (UTC)
 * Yes, absolutely. I will continue to make such edits with my regular account in a supervised manner. I just want to move cleanups and similar edits to the bot. tbm (talk) 06:49, 25 September 2023 (UTC)
 * I'll also add, you removed synonyms in different languages in some cases. This is probably not a good idea; the synonym info is potentially useful and should be moved instead of just deleted. Benwing2 (talk) 07:34, 28 September 2023 (UTC)
 * 4)  I second what JeffDoozan said. Megathonic (talk) 17:52, 24 September 2023 (UTC)
 * 5)  P. Sovjunk (talk) 02:21, 25 September 2023 (UTC)
 * 6)  — Fenakhay ( حيطي · مساهماتي ) 20:39, 25 September 2023 (UTC)
 * 7)  Fay Freak (talk) 21:47, 25 September 2023 (UTC)
 * 8)  Jberkel 21:54, 25 September 2023 (UTC)
 * 9)  Netizen3102 (talk) 22:18, 27 September 2023 (UTC)

Abstain

 * 1)  I don't feel I have enough context on how this bot will work to know whether to support it. It would help to see some of the code. (Also FYI I have already written a script to templatize and standardize topic categories to C; see .) Benwing2 (talk) 07:46, 28 September 2023 (UTC)
 * 2)  DonnanZ (talk) 18:31, 30 September 2023 (UTC)

Decision
Passed 9-0-2. I'll add the flag. - SURJECTION / T / C / L / 12:00, 1 October 2023 (UTC)