User talk:Jberkel

Catalan pronunciations
Hi, just a note to be careful when adding Catalan pronunciations. For example, you added a pronunciation of to esquetx, which is wrong (it should be ) and unlikely in any case, since  generally only occurs with inheritances and some old borrowings, and esquetx is a recent borrowing from English. I have documented the sources of pronunciation in the documentation to ca-IPA; in particular, only trust the DCVB for Balearic pronunciations and don't trust cawikt at all. Benwing2 (talk) 02:34, 28 January 2024 (UTC)


 * Ok, I thought cawikt was fairly reliable. Btw, thanks for your great work on the Catalan corner! Jberkel 10:42, 28 January 2024 (UTC)

Statistics
Hi Jberkel, would you like to do another update of the statistics? Your last one already dates back to 1 July. Yes, I know it takes a lot of time and computing power, but I think we would all simply like to know again. :) Steinbach (talk) 17:18, 22 February 2024 (UTC)


 * @Steinbach Hi, I'd like to do this regularly, but there are still data problems with the HTML dumps: T305407. The last reasonably complete data are from last July. The WMF people are working on it, but somehow it is taking forever; I keep asking about it :( Jberkel 17:42, 22 February 2024 (UTC)
 * @Steinbach Fresh stats are available… Jberkel 00:53, 5 June 2024 (UTC)

HTML Dump
Hi, I saw your posts complaining about the lack of HTML dumps as I had the same issue. I ended up creating my own HTML dump using the API to rapidly download millions of entries. I used the 20240220 XML dump as a base so that the two dumps would include exactly the same revisions. Note that the same wikitext can produce different HTML code at different points in time, so I can't guarantee that the page looks exactly as it did at the time of the XML dump.


 * Pages included: non-redirects in namespaces 0 (main) and 118 (reconstruction)
 * Number of lines: 7,952,575
 * Time generated: February 20, 2024, 7:49:52 PM to February 22, 2024, 1:16:18 AM (EST)
 * Uncompressed size: 112,213,194,308 bytes
 * Compressed size: 5,482,140,342 bytes

Would you be interested in the code or the dump itself?

Ioaxxere (talk) 20:05, 22 February 2024 (UTC)


 * @Ioaxxere Lol, I'm close to starting a project myself, given the glacial progress on the WMF side. Yes, I'm interested, how did you get the HTML, how long does it take? Is it the Parsoid rendered version which is used in the HTML dumps? If you want we can join forces and run it as a community project. Jberkel 09:44, 23 February 2024 (UTC)

The script works by grabbing HTML data using a revision ID. For example: https://en.wiktionary.org/w/api.php?action=parse&oldid=65853771&format=json. I'm not sure what parser is used but it seems to correspond with "view page source" in my browser. Here is the code:

Then I verified the output with this code:

Which produced:

These correspond with pages in the XML dump that have recently been deleted.

I don't have the time/resources to generate these on a regular basis, but you're welcome to adapt this code for your purposes!

Ioaxxere (talk) 19:56, 23 February 2024 (UTC)
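
The script itself is not reproduced above, so the following is only a minimal sketch of the approach described (one `action=parse` request per revision ID), not Ioaxxere's actual code. The function names `build_parse_url` and `extract_html` are hypothetical. The sketch assumes the usual JSON shape of a parse response, where the rendered HTML sits under `parse.text` (as a `{"*": ...}` mapping in the default format), and it assumes that a response carrying an `error` key, for example for a deleted revision, should simply be skipped.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the post above.
API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def build_parse_url(revision_id: int) -> str:
    """Build the action=parse URL for a single revision ID."""
    params = {"action": "parse", "oldid": revision_id, "format": "json"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_html(payload: dict):
    """Return the rendered HTML from a decoded parse response.

    Returns None when the API reported an error (assumption: such
    revisions, e.g. recently deleted pages, are skipped).
    """
    if "error" in payload:
        return None
    text = payload["parse"]["text"]
    # Default JSON format wraps the HTML as {"*": "..."}; newer format
    # versions return a plain string, so both shapes are accepted here.
    return text["*"] if isinstance(text, dict) else text


if __name__ == "__main__":
    print(build_parse_url(65853771))
```

Fetching is deliberately left out so the sketch runs offline. A real run would request each URL (with a descriptive User-Agent and some rate limiting), decode the JSON, and write one line per page.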


 * Oh god, I just realized that adding  to the API query gives *far* better data. Time to rerun... Ioaxxere (talk) 20:09, 23 February 2024 (UTC)
 * Cool, thanks! We could run it on WMF infrastructure. Great to see that 50 lines of Python yield better results than the WMF's buzzword soup of Kafka, DAGs and what have you… How long does it take to do a full run? Jberkel 15:20, 26 February 2024 (UTC)
 * Never mind, you already had it in your post: almost 2 days… :) Jberkel 15:57, 26 February 2024 (UTC)
 * Even if the WMF some day manage to produce useful dumps again, we'll still need wiki-specific namespaces such as Reconstruction, so it'll be useful to have some way of generating them ourselves. Jberkel 15:58, 26 February 2024 (UTC)

ScribuntoUnit vs. UnitTests
I just discovered there are two unit testing frameworks here, Module:UnitTests used by everyone but you, and Module:ScribuntoUnit used by you. The former is older than the latter, so I'm not sure why you imported the latter from Wikipedia, but I think we should consolidate. Can you think about converting your unit tests to use Module:UnitTests? Benwing2 (talk) 20:34, 10 March 2024 (UTC)


 * Hi, just wondering if you got my msg. Can you at least clarify why you imported and started using Module:ScribuntoUnit in preference to our own module? BTW I just discovered a third unit test framework, Module:QFQ/UnitTests, used only on Module:mnw-translit. Benwing2 (talk) 07:43, 14 March 2024 (UTC)
 * Hi @Benwing2, sorry had short Wiktionary hiatus. It's been a long time (~ 10 years), but I think when I first looked at Module:UnitTests it was a spaghetti mess and didn't have the features I wanted. That's probably no longer the case, and I agree it's better to standardize on one framework. Jberkel 09:27, 15 March 2024 (UTC)

catalogue raisonné
Wwoww, Jberkel, you're fast. Wanted to cite the same Guardian passage here, and it was already there ... MistaPPPP (talk) 12:55, 19 March 2024 (UTC)

Apologies
I need to apologise to you as well: my simple edit to my archaic paragraph about certain 'etymologies that discredit Wiktionary' completely disrupted the edit section, including your contribution. There should really be a mechanism in place to stop this from happening, since any innocent editor could well make a similar mistake, and if it is not detected quickly, as both Surjection and I did, it could cause linguistic mayhem! Regards, Andrew Andrew H. Gray 11:40, 29 March 2024 (UTC)

On ass...
What Doyle said was about this:

https://en.m.wiktionary.org/wiki/arse#English

Here, ass is another way of spelling arse (as in dumb). Lunatone3000 (talk) 22:24, 4 April 2024 (UTC)

The reputation system
You mentioned this in a beer parlour comment about "the reputation system, for good or ill".

The reputation system is for ill.

There are editors like me whose behavior is scrutinized. And people are willing to make inaccurate claims about how many or few productive edits I've

Then there are other editors who have almost no ability at all to get along with other editors or admit wrongdoing. But, because they're perceived as being essential to the project, it's unacceptable to question their opinions or behavior. Purplebackpack89 13:46, 5 June 2024 (UTC)


 * I'd say there's a mix of different people finding problems with your edits: editors who had already mentally "blacklisted" you (Equinox, putting you in the "moron" box), WF (creating RFDs "for the lulz" to create havoc), and more level-headed/diplomatic editors who see real CFI/process-related issues. As -sche pointed out, because there are so many different editors involved, it's difficult to conclude that *all* of them are here to harass you. And because this has been going on for years, patience/good will/faith is running low… Jberkel 14:59, 5 June 2024 (UTC)
 * "Because there are so many different editors involved" makes it feel like I'm being harassed regardless of why they are doing it. Perhaps unwittingly, Equinox name-calling and WF/Denazz trolling made it harder for somebody like Benwing to legit address my edits. Knightwho is somewhere in between. While he may also legit want to clean up the project, he has a long and well-documented history of being confrontational. And the other problem is that Benwing and Knight could've maybe noticed that I felt put upon at the moment and maybe waited, say, a couple of weeks until things had died down. There wasn't anything they were doing that had to be addressed immediately. They didn't do that. Purplebackpack89 16:37, 5 June 2024 (UTC)

Wanted
User:Jberkel/lists/wanted hasn't bin updated4a while. Can we get it bac, pls? Denazz (talk) 22:28, 5 June 2024 (UTC)


 * now iz bac. zorry for ze inconviniance caused. Jberkel 09:24, 6 June 2024 (UTC)

List user subpages
Many of the various long lists on user subpages of yours seem to have served their purpose and/or to no longer be in active use. Also, the same term often appears on multiple subpages, differing only by when they were compiled. The result of this is that using "&sort=incoming_links_desc" in the searchbox to find entries relatively important to other Wiktionary entries does not give a good list. My user pages have had the same effect. I have consequently used to disable entire subpages. If you are too busy, let me know which pages are important (or what rule to follow to determine importance) so I could disable the right pages, if there are any. You are not the only one with such subpages, but yours are the ones I most notice. DCDuring (talk) 22:23, 17 July 2024 (UTC)


 * Are you referring to the wanted entries lists? Yes, they should probably be deleted, but I haven't had the time to submit them all for deletion (needs to be automated, there are so many of them). Maybe some admin with scripting skills can delete them directly? Jberkel 22:26, 17 July 2024 (UTC)
 * Yes, them's the ones. Did you want to extract any of the redlinks in any of them? DCDuring (talk) 23:00, 17 July 2024 (UTC)
 * I believe, based on a simple test on made-up subpages of mine, that you can delete all the subpages of a top subpage at once by deleting the top subpage. That wouldn't take long. I don't think adminship is required. I was wrong. It seems to be as you said. DCDuring (talk) 23:09, 17 July 2024 (UTC)
 * Could you please mass-delete the old wanted entries lists (and dependent data modules)? Perhaps everything before 2024. Jberkel 07:07, 18 July 2024 (UTC)
 * @Jberkel Can you supply me with a list (at least in schematic form, it doesn't have to include every single file)? Benwing2 (talk) 07:32, 18 July 2024 (UTC)
 * every list has two pages:
 * User:Jberkel/lists/wanted/YYYYMMDD/[lang-code]
 * User:Jberkel/lists/wanted/YYYYMMDD/[lang-code]/data
 * Language codes are in User:Jberkel/lists/wanted/languages.
 * Timestamps to delete:
 * 20230701, 20230601, 20230301, 20230201, 20230101, 20221001, 20220820, 20220601, 20220501, 20220401, 20220320, 20220301, 20220120, 20220101, 20211201, 20211101, 20211001, 20210901, 20210801, 20210701, 20210601, 20210501, 20210401, 20210101, 20201101, 20200401, 20200201, 20200120, 20200101, 20191201, 20191101, 20191020, 20191001, 20190901, 20190801, 20190701, 20190620, 20190601, 20190501, 20190420, 20190401.
 * + for each timestamp, the overview page: User:Jberkel/lists/wanted/YYYYMMDD
 * – Jberkel 13:33, 18 July 2024 (UTC)
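
The page scheme above can be expanded into a concrete list of titles mechanically. The following is a minimal sketch, not a deletion script: the function name `wanted_list_titles` is hypothetical, and the language codes in the example are placeholders, since the real ones live in User:Jberkel/lists/wanted/languages. It only builds titles; it does not check that a page exists or delete anything.

```python
BASE = "User:Jberkel/lists/wanted"


def wanted_list_titles(timestamps, lang_codes):
    """Expand timestamps and language codes into page titles.

    For each timestamp this yields the overview page, then for each
    language code the list page and its /data subpage.
    """
    titles = []
    for ts in timestamps:
        titles.append(f"{BASE}/{ts}")  # overview page
        for code in lang_codes:
            titles.append(f"{BASE}/{ts}/{code}")       # list page
            titles.append(f"{BASE}/{ts}/{code}/data")  # data module page
    return titles


if __name__ == "__main__":
    # "ca" and "de" are placeholder codes for illustration only.
    for title in wanted_list_titles(["20230701", "20230601"], ["ca", "de"]):
        print(title)
```

Not every timestamp necessarily has a list for every language, so the output is a superset to be filtered against existing pages before any mass deletion.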