User:LA2

LA2 is the username for Lars Aronsson, Sweden. See w:user:LA2.

da:User:LA2 de:User:LA2 no:User:LA2 ru:User:LA2 sv:User:LA2 uk:User:LA2

{| style="font-size:75%;line-height:75%"
| colspan=3 | For my cut-and-paste convenience:
|- valign=top
|
Translations

 * Swedish: sv

Verb
|}

Diary
August 26, 2020: I start working on Appendix:Swedish corpus, based on my 2017 presentation.

June 2017: I submit a proposal for a presentation at the Wikimedia Central and East European conference in Warsaw in September. It is approved.

May 2017: I start to contribute to Ukrainian Wiktionary (my user page).

December 14, 2015: CodeCat is renaming several Swedish inflection templates for no apparent reason, leaving bewilderment and fatigue. For example, sv-noun-reg-er becomes sv-infl-noun-c-er.

October 2014: I start to contribute actively to Russian Wiktionary (my user page).

May 4, 2013: Should read sometime:
 * Ladislav Zgusta, Manual of Lexicography (1971; foreword signed 1968) Google Books
 * C.C. Berg (professor at Leiden), Report on the Need for Publishing Dictionaries which do not to-date exist (booklet, between 1960 and 1962, published by CIPSH, Conseil International de la philosophie et des sciences humaines)

February 2013: I start to contribute actively to Danish Wiktionary (my user page).

January 24, 2013: I introduce and category:Swedish compounds with maskin, as used for displaying Derived terms in maskin. -- Bad idea.

November 19, 2012: Fun photo gallery: 10 Swedish words you won’t find in English: orka, harkla, hinna, blunda, mysa, vabba, duktig, jobbig, gubbe/gumma, mormor/farmor/morfar/farfar (actually 14).

August 27, 2012: I give up all hope about the Norwegian entries in en.wiktionary. Please remind me to stay away if any discussion should come up again.

April 18, 2011: To do: handgemäng, hägn, ohägn, hugnad, misshällighet

April 7, 2011: All the words from this article about common translation errors should be incorporated into Wiktionary.

April 3, 2011: I think I'm done with Swedish form entries for now. When the new XML dump arrived 20110402, Wiktionary contained 87,651 Swedish words. After parsing the XML dump I was able to generate 1521 new Swedish form entries. I have the machinery in place to fill in the missing form entries after each new dump. Now we need to expand the 20,000 Swedish gloss entries to a full Swedish vocabulary. But can that work be automated? How do we add the next 20,000 gloss entries without spending 3 minutes on each? (1000 hours or 25 weeks of full-time work)
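The estimate in parentheses can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the 40-hour working week is an assumption, not something stated in the entry.

```python
# Back-of-the-envelope check of the workload estimate.
ENTRIES = 20_000          # gloss entries still to be written
MINUTES_PER_ENTRY = 3     # time spent on each entry
HOURS_PER_WEEK = 40       # assumption: one full-time working week

total_hours = ENTRIES * MINUTES_PER_ENTRY / 60
total_weeks = total_hours / HOURS_PER_WEEK

print(f"{total_hours:.0f} hours, {total_weeks:.0f} weeks")  # 1000 hours, 25 weeks
```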

March 20, 2011: When spannen is the definite singular of spann (bucket) and definite plural of spann (set of horses), I'd like to indicate in the form entry which sense belongs to which form. Perhaps "senseid" is the way to do this. Both the form templates and the declension/conjugation templates would have to take the sense ID as an extra parameter. This would be a major change to the 80,000 existing Swedish entries.

March 18, 2011: I create Appendix:Swedish verbs.

March 10, 2011: The new XML database dump shows 80,000 Swedish entries, yet another giant leap forward. My simple script for generating missing form entries has evolved into one that reads the declension and conjugation table template calls and works out which form entry templates should be called from where. For example in ande should generate  in the page anden. If this form entry template call is found, fine. If not, the wanted form entry is saved to a file that a modified version of pagefromfile.py can read. If the page doesn't exist, it is created. If it exists, a ==Swedish== entry is appended at the bottom. If a Swedish entry already exists, because "anden" is also the definite form of and, this is logged and I have to edit the existing Swedish entry manually. At least for now, this happens a lot. In some cases, a verb form entry is also an adjective form. In some cases, the form entry exists but uses another template (form of, plural of, ...) or no template at all. Right now I have a backlog of 8,000 entries to go through, or 10 percent of the existing stock. Maybe I should automate the addition of adjective form entries to Swedish entries that don't have an adjective subheading already ... done.
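The decision steps described above (found, create, append, or log for manual editing) can be sketched in Python roughly as follows. This is not the actual script: the function name `plan_form_entry`, the returned labels, and the template name `hypothetical-form-of` are all made up for illustration.

```python
import re

def plan_form_entry(page_text, template_call):
    """Decide what to do with one wanted form entry.

    page_text is the current wikitext of the target page, or None if the
    page does not exist; template_call is the wanted form-entry call.
    """
    if page_text is None:
        return "create"    # page is missing: create it
    if template_call in page_text:
        return "ok"        # the wanted call is already there
    if re.search(r"^==Swedish==\s*$", page_text, flags=re.MULTILINE):
        return "manual"    # a Swedish section exists: log it for manual editing
    return "append"        # add a ==Swedish== section at the bottom

# Hypothetical template call, standing in for the real form-entry template.
call = "{{hypothetical-form-of|ande}}"
print(plan_form_entry(None, call))                            # create
print(plan_form_entry("==Swedish==\n===Noun===\n# x", call))  # manual
```

Returning a label instead of editing directly keeps the logging of the "manual" cases separate from the page writes.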

March 2, 2011: The most commonly used Norwegian templates are: (733 calls),  (351),  (221),  (178),  (125),  (101),  (97),  (87),  (85),  (76),  (73),  (71),  (68),  (54),  (51),  (48),  (47),  (41),  (40),  (39),  (33),  (32),  (32),  (30),  (26),  (24),  (23),  (22),  (21),  (18),  (17),  (16),  (15),  (13),  (13),  (12),  (12),  (12),  (11),  (11),  (10),  (10),  (10),  (9),  (9),  (9),  (8),  (8),  (8),  (7),  (7),  (6),  (6),  (6),  (6),  (6),  (6),  (6),  (6),  (5),  (5),  (5).

February 27, 2011: I don't speak French or Italian, but when I saw all these form entries (mostly created by Keenebot2 and SemperBlottoBot) for verbs using the primitive, I started to substitute them to the more structured. See Template talk:conjugation of. I have made the following translations of parameters:

February 25, 2011: In the XML database dump of 2011-02-05, the most common headings for Swedish entries (compare August 21, 2010) are:

As a comparison, the most common headings for all languages (not counting the L2 headings for the language names themselves) are:

The most common combinations and sequences for Swedish sections are:

February 8, 2011: English Wiktionary now contains more Swedish entries (78,985) than Swedish Wiktionary (76,119). The overlap is only 34,178 entries. Swedish Wiktionary has more gloss definitions and English Wiktionary has more form entries, many created by LA2-bot.

February 6, 2011: I should try to incorporate as much as possible of Wikipedia:Swedish Wikipedians' notice board/Terminology into Wiktionary.

February 4, 2011: I set up and create some entries that refer to it.

January 30, 2011: I set up and create some entries that refer to it, mostly in Category:sv:Government.

January 20, 2011: How to extract a list of Swedish headwords from the Swedish Wiktionary:

wget -O - "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikilang=sv&wikifam=.wiktionary.org&basecat=Svenska&basedeep=5&templates=&mode=al&go=Search&format=csv&userlang=en" | awk '-F\t' '$1==0 {print $2}' | tr _ ' ' | LC_COLLATE=sv_SE.utf8 sort

January 10, 2011: How to extract a list of Swedish headwords:

wget -O - "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikilang=en&wikifam=.wiktionary.org&basecat=Swedish+language&basedeep=5&templates=&mode=al&go=Search&format=csv&userlang=en" | awk '-F\t' '$1==0 && $3!="Translation_requests_(Swedish)" && $3!="Translations_to_be_checked_(Swedish)" && $3!~/derived_from_Swedish/ {print $2}' | tr _ ' ' | LC_COLLATE=sv_SE.utf8 sort
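The awk filter in the command above can also be written in Python, which may be easier to extend. The sketch below assumes, as the awk expressions suggest, that each row is tab-separated with the namespace number first, the page title second and the category third; the sample rows are invented. It sorts by code point, not by Swedish collation as `LC_COLLATE=sv_SE.utf8 sort` does.

```python
EXCLUDED = {
    "Translation_requests_(Swedish)",
    "Translations_to_be_checked_(Swedish)",
}

def headwords(tsv_text):
    """Return sorted page titles from rows of namespace, title, category."""
    words = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] != "0":
            continue  # keep rows from the main namespace only
        category = fields[2] if len(fields) > 2 else ""
        if category in EXCLUDED or "derived_from_Swedish" in category:
            continue  # skip maintenance and etymology categories
        words.append(fields[1].replace("_", " "))
    return sorted(words)  # plain code-point order, not Swedish collation

# Invented sample rows in the assumed format.
sample = (
    "0\thund\tSwedish_nouns\n"
    "0\tgå_an\tSwedish_verbs\n"
    "14\tSwedish_nouns\tSwedish_language\n"
    "0\tbil\tTranslation_requests_(Swedish)\n"
)
print(headwords(sample))  # ['gå an', 'hund']
```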

November 19, 2010: I import from sv.wikipedia.

November 15, 2010: I think there are now 20,000 Swedish entries in en.wiktionary.org, which is twice as many as at the beginning of this year. This has been achieved mainly by adding form entries. Statistics here. I have added more word forms, based on word frequency lists (see corpus coverage in the August 31 entry below). I have focused less on including all definitions and all forms for every word. What I have tried to do is to create links between the entries, so compounds link to their component words. Hopefully, this will attract more users who then start to fill in the missing definitions (second usage of words) and forms. This philosophy, known as eventualism, is similar to creating stub articles in Wikipedia, hoping that later users will fill in more facts. I'm not a general subscriber to that idea, but it can be a useful approach in the early stages of a project. A useful Swedish dictionary probably needs 120,000 basic forms (and half a million form entries), which is ten times more than en.wiktionary has today and five times more than sv.wiktionary has.

September 18, 2010: There are 51,318 pages that call, or. The page with most translations is be (607 translations), followed by you (447), set (438), love (421). Halfway down the list we find words like toner and toadstool (4 translations each). The most translated words that don't yet have any Swedish translation (or where the translations didn't use these templates in the database dump of 2010-09-12) are: judge (161), 下 (156), heat (154), jump (153), spread (141), stroke (140), proper (137), cry (131), behind (130), desire (126), nose (125), round (123), article (122), double (121), taste (117), end (117), situation (116), shut up (116), male (116), Albanian (116), draft (112), chest (112), e-mail (110), truth (108), storm (108), squeeze (105), same (105), job (105), exit (105), 牛 (104), cheap (103), steer (102), prayer (100), entry (100), cinema (100), split (99), Gypsy (99), care (99), waste (98), sole (97), hook (97), chat (97), welcome (96), believe (96), coach (95), short (94), bend (94), herd (91), finish (91), sit (90), return (90), pickle (90), drill (90), dragon (90), cum (90), cherry (90), butt (90), British (90), masculine (88), correct (88), icon (87), gun (87), gentleman (87), freedom (87), beginning (87), separate (86), Moon (86), account (86), justice (85), I'm Jewish (85), definition (85), puzzle (84), atmosphere (84), corner (83), Macedonian (81), lime (81), lady (80), decline (80), damn (80), cardinal (79), plague (78), interest (78), dash (78), auxiliary (78), study (77), newspaper (77), hi (77), criminal (77), cement (77), bundle (77), bug (77), appropriate (77), agree (77), vacuum (76), swarm (76), reach (76), poetry (76), late (76), harmony (76), custom (76), chip (76), certainly (76), authority (76), rear (75), pumpkin (75), discharge (75), silk (74), dinner (74), crash (74), Commonwealth of Independent States (74), cheat (74), accept (74), walnut (73), transfer (73), grain (73), ceremony (73), abate (73), victim (72), vagina (72), type (72), prophet (72), increase (72), contact (72), constitution (72), constellation (72), budget (72), application (72), soldier (71), plot (71), painting (71), crew (71), brass (71), thunder (70), roast (70), psychology (70), communism (70), brake (70), witch (69), saddle (69), neighbour (69), vault (68), shallow (68), perfume (68), particle (68), harvest (68), electronic (68), coral (68), camp (68), amount (68), odd (67), occupation (67), how much (67), device (67), chamber (67), bust (67), association (67), airplane (67), track (66), stab (66), spice (66), pomegranate (66), crust (66), comfort (66), aeroplane (66), random (65), plough (65), no way (65), married (65), foundation (65), execution (65), channel (65), breath (65), arrest (65), studio (64), Myanmar (64), fail (64), enter (64), dish (64), actual (64), abrupt (64), wizard (63), Vladimir (63), substantial (63), splinter (63), reply (63), purple (63), paddle (63), nucleus (63), notice (63), illusion (63), how are you (63), deliver (63), dairy (63), counterfeit (63), blackmail (63), arrive (63), wardrobe (62), stuff (62), seat (62), not at all (62), deliberate (62), cylinder (62), crop (62), advertisement (62), zone (61), tower (61), source (61), sexuality (61), litter (61), gravity (61), fill (61), composition (61), business (61), bully (61), asshole (61), trial (60), sponge (60), sigh (60), resolution (60), orthography (60), mount (60), Java (60), implement (60), hood (60), half (60), habit (60), forever (60), anyway (60). Of course there can also be many definitions of be or you that don't have Swedish translations.

September 7, 2010: Some Unix/Linux shell commands:

When all of the above are combined, I get a list of all words occurring in the Swedish example sentences, sorted by frequency. And so I can check that Wiktionary provides explanations for all or most of them. The Swedish example sentences constitute an 84 kbyte e-text, having 13,255 words of which 4819 are unique. Wiktionary has Swedish entries for 71.1 percent of the occurrences. This is rather low. Part of the explanation is that some text is in English, because the example sentences are incorrectly formatted and contain templates and URLs.
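A frequency list and an occurrence-based coverage figure of this kind can be computed in Python along the following lines. This is a sketch, not the shell pipeline itself: the function names, the tokenization rule (a word is a run of letters) and the sample sentence are assumptions.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens; a token is a run of letters."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def occurrence_coverage(freqs, dictionary):
    """Share of word occurrences whose word form is in the dictionary."""
    total = sum(freqs.values())
    covered = sum(n for word, n in freqs.items() if word in dictionary)
    return covered / total if total else 0.0

# Invented example text and a tiny stand-in for the set of existing entries.
freqs = word_frequencies("Hunden jagar katten. Katten jagar musen.")
entries = {"jagar", "katten"}

print(freqs.most_common(2))                 # [('jagar', 2), ('katten', 2)]
print(occurrence_coverage(freqs, entries))  # 4 of 6 occurrences are covered
```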

September 4, 2010: Inserting the templates l and t:

 python replace.py -family:wiktionary -lang:en -xml:enwiktionary.xml -summary:"l:sv, t:sv" -regex -recursive \
   '\[\[#Swedish\|([^\]]+)\]\]' '' \
   '\[\[([^#\|\]]+)#Swedish\|[^\]]*\]\]' '' \
   '(\* *Swedish:.*?)\[\[([^\]]*)\]\]' '\1\2' \
   '(\* *Swedish:.*?){{l\|sv\|' '\1{{t|sv|' \
   '(\* *Swedish:.*?{{t[^}]*)}} {{([cfmnp](\|[cfmnp])*}})' '\1|\2'

August 31, 2010: The Swedish Bible of 1917 contains 769,316 words of text, using a vocabulary of 26,990 words and word forms, including some capitalized words at the beginning of sentences. Of this vocabulary, 3802 words or 14 % have Swedish entries in en.wiktionary. However, since these 14 % contain many of the most common words, they make up 74 % of the text. This number (74 %) is the definition of the dictionary's coverage of this corpus of text. If you pick a random page, line and word in the Bible, there's a 74 % chance that the word has a Swedish entry here. 74 % is a very low coverage for a dictionary, and a sign that we have a very long way to go.

Here's how it works on the first two verses: ''i begynnelsen skapade gud himmel och jord. och jorden var öde och tom, och mörker var över djupet, och guds ande svävade över vattnet.'' (Genesis 1:1-2) Of these 24 words, 5 are "och", 2 are "var", 2 are "över". These three words alone make up 9 of the 24 words, or 37.5% of the text.
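The same count can be reproduced in Python. The sketch below pretends that the dictionary contains only the three words "och", "var" and "över", to show how a small share of the vocabulary can cover a much larger share of the text; the variable names are arbitrary.

```python
from collections import Counter

verses = (
    "i begynnelsen skapade gud himmel och jord. och jorden var öde och tom, "
    "och mörker var över djupet, och guds ande svävade över vattnet."
)
tokens = [word.strip(".,") for word in verses.split()]
freqs = Counter(tokens)

known = {"och", "var", "över"}   # stand-in for the dictionary's entries
covered = sum(freqs[word] for word in known)

print(len(tokens), len(freqs))   # 24 running words, 18 distinct word forms
print(covered / len(tokens))     # 0.375: share of the text that is covered
print(len(known) / len(freqs))   # share of the vocabulary, about 0.17
```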

(The Wikipedia corpus used here contains some garbage that will never be covered by the dictionary, e.g. Wikipedia user names, occasional talk pages in English, and some remaining wiki markup, so the coverage percentage will inevitably be lower. It's still interesting to have a really large corpus to study.)

(* No database dump exists for 2010-12-31, but a preliminary dictionary was extracted.)

(** Dictionary generated by category wget. See diary entry for January 10, 2011.)

August 28, 2010: I think it would be helpful to know how common a word is. This can be determined by computing its rank in some large body of text, putting the most frequent word ("the" for English, "och" for Swedish) at position 1. This is what template {{temp|rank}} does (for example, able has rank 391), but I think a logarithmic scale would be more informative than a linear one. Color graphics could indicate how "hot" a word is, but with the cool and neutral black, white and light-blue appearance of Wiktionary, the colors must be restricted to a very small area:

▲ rank 8 ▲ rank 64 ▲ rank 512 ▲ rank 4096 ▲ rank 32,768 ▲ rank 262,144
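The six marks above are successive powers of 8, so one way to place a word on such a scale is to round its rank up to the next power of 8. A minimal Python sketch, where the function name `rank_level` and the choice to round up are assumptions:

```python
def rank_level(rank, base=8):
    """Return the smallest n with base**n >= rank (rank rounded up on a log scale)."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    level, bound = 0, 1
    while bound < rank:   # integer arithmetic avoids floating-point log errors
        bound *= base
        level += 1
    return level

print(rank_level(391))      # 3: rank 391 falls between 64 and 512
print(rank_level(262_144))  # 6: the last mark on the scale above
```

With this mapping a color or symbol only has to distinguish a handful of levels, whatever the size of the corpus.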

August 21, 2010: Many open issues:
 * So far, only 10,000 entries in Swedish. Redefining templates is easier now than after many more entries have been created.
 * How should templates be named? Is the -reg-/-irreg- part of the name really necessary? Can we do with fewer templates and shorter names?
 * How do we create entries for all inflected forms? Can this be automated?
 * Can conjugation/declension tables handle passive verbs? Subjunctives? All adjectives?
 * Should template parameters be standardized? Now they are different everywhere: 2=, stem=, sg-def-gen=
 * Can templates support irregular verbs, so avgå, tillstå can be based on gå, stå?
 * Can templates support prefixed and suffixed words, e.g. "gå an/gick an" smarter than today?
 * Should templates for Swedish words be standardized across languages of Wiktionary?
 * Old spelling (elf/älf/älv) can be handled, but how should we handle giva/ge, hava/ha?

The most common headings in Swedish sections are:

 10969 Swedish       533 Derived terms          72 Compounds          37 Ordinal number
  6402 Noun          319 Adverb                 72 Abbreviation       31 Conjunction
  2618 Pronunciation 251 Usage notes            63 Cardinal number    25 Proverb
  1705 Verb          251 Antonyms               58 Conjugation        22 Verb form
  1520 Related terms 214 Alternative spellings  54 Idiom              22 Descendants
  1300 Adjective     100 Etymology 2            52 References         17 Etymology 3
  1247 Proper noun   100 Etymology 1            51 Preposition        16 Hypernyms
  1013 Etymology      96 Inflection             48 Phrase             14 Homophones
   995 See also       88 Interjection           41 Alternative forms  12 Hyponyms
   789 Synonyms       83 Pronoun                39 Suffix             11 Phrases

The most common heading structures are listed below. "((" means heading level 2.

 3158 ((Swedish(Noun)))                              57 ((Swedish(Pronunciation;Noun(See also))))
  831 ((Swedish(Proper noun)))                       56 ((Swedish(Pronunciation;Noun(Derived terms))))
  660 ((Swedish(Verb)))                              47 ((Swedish(Abbreviation)))
  565 ((Swedish(Pronunciation;Noun)))                45 ((Swedish(Pronunciation;Adjective(Related terms))))
  505 ((Swedish(Adjective)))                         43 ((Swedish(Noun;Verb)))
  290 ((Swedish(Noun(Related terms))))               42 ((Swedish(Pronunciation;Noun;Verb)))
  206 ((Swedish(Etymology;Noun)))                    41 ((Swedish(Verb(See also))))
  168 ((Swedish(Noun(Synonyms))))                    37 ((Swedish(Alternative spellings;Proper noun)))
  168 ((Swedish(Noun(See also))))                    34 ((Swedish(Pronunciation;Noun(Synonyms))))
  156 ((Swedish(Pronunciation;Verb)))                34 ((Swedish(Pronunciation;Adverb)))
  142 ((Swedish(Pronunciation;Noun(Related terms)))) 34 ((Swedish(Alternative spellings;Noun(Related terms))))
  131 ((Swedish(Pronunciation;Adjective)))           33 ((Swedish(Phrase)))
  121 ((Swedish(Verb(Related terms))))               32 ((Swedish(Adjective(See also))))
  112 ((Swedish(Etymology;Proper noun)))             29 ((Swedish(Adjective;Noun)))
  101 ((Swedish(Proper noun(Related terms))))        28 ((Swedish(Pronunciation;Verb(See also))))
   81 ((Swedish(Adjective(Related terms))))          28 ((Swedish(Etymology;Noun(Related terms))))
   73 ((Swedish(Adverb)))                            27 ((Swedish(Etymology;Verb)))
   72 ((Swedish(Pronunciation;Verb(Related terms)))) 27 ((Swedish(Etymology;Adjective)))
   72 ((Swedish(Etymology;Pronunciation;Noun)))      26 ((Swedish(Verb(Synonyms))))
   62 ((Swedish(Noun(Derived terms))))               26 ((Swedish(Interjection)))
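The parenthesized notation can be produced from the headings of a language section with a short Python function. This is a sketch of one possible way to do it, not the script that generated the counts above; the function name and the heading regular expression are assumptions.

```python
import re

# A heading line such as ===Noun===; the number of "=" signs gives the level.
HEADING = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def heading_structure(wikitext):
    """Render the headings of a section as nested parentheses.

    Each "(" opens one heading level, ";" separates sibling headings.
    """
    parts = []
    depth = 0  # level of the previous heading, i.e. number of open parentheses
    for match in HEADING.finditer(wikitext):
        level, title = len(match.group(1)), match.group(2)
        if level > depth:
            parts.append("(" * (level - depth))
        else:
            parts.append(")" * (depth - level) + ";")
        parts.append(title)
        depth = level
    parts.append(")" * depth)
    return "".join(parts)

section = "==Swedish==\n===Pronunciation===\n* IPA\n===Noun===\n# x\n====See also====\n* y\n"
print(heading_structure(section))  # ((Swedish(Pronunciation;Noun(See also))))
```

Counting the resulting strings over all Swedish sections, for example with `collections.Counter`, would give a table like the one above.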

Starting to introduce ====Declension==== and ====Conjugation==== on a big scale will change this pattern.

It seems I have a bot command that works:

 python replace.py -family:wiktionary -lang:en -cat:'Swedish verbs' -summary:'Conjugation heading' -regex -dotall \
   '(===Verb===\s*({{infl[^\n]*}})?\s*)({{sv-verb-(irreg|reg-)[^\n]*}}\s*)(([^-=\[][^\n]*\n\s*)*)'   '\1\5====Conjugation====\n\3' \
   '(====Verb====\s*({{infl[^\n]*}})?\s*)({{sv-verb-(irreg|reg-)[^\n]*}}\s*)(([^-=\[][^\n]*\n\s*)*)' '\1\5=====Conjugation=====\n\3'
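To see what the first replacement pair does, it can be tried on a small piece of wikitext with Python's `re` module. The sample entry below is invented, and the template call `{{sv-verb-reg-ar|kall}}` is a hypothetical name that merely fits the pattern. The conjugation table is moved below the definition lines and gets its own heading.

```python
import re

PATTERN = re.compile(
    r"(===Verb===\s*({{infl[^\n]*}})?\s*)"     # group 1: heading and inflection line
    r"({{sv-verb-(irreg|reg-)[^\n]*}}\s*)"     # group 3: conjugation table call
    r"(([^-=\[][^\n]*\n\s*)*)",                # group 5: following definition lines
    re.DOTALL,
)
REPLACEMENT = r"\1\5====Conjugation====\n\3"

# Invented sample entry; the sv-verb-reg-ar call is a hypothetical template.
before = (
    "===Verb===\n"
    "{{infl|sv|verb}}\n"
    "{{sv-verb-reg-ar|kall}}\n"
    "# to call\n"
    "\n"
    "===Synonyms===\n"
)
after = PATTERN.sub(REPLACEMENT, before)
print(after)
```

The definition lines in group 5 stop at the first line that starts with "-", "=" or "[", so the next heading is left where it was.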

August 20, 2010: In the database dump of 2010-08-12, there were 6341 calls to templates named sv-. Kinds are conj = conjugation table for verbs, decl = declension table for adjectives and nouns, form = referring from an inflected form to the main entry, infl = one-liner inflection pattern.

August 19, 2010: There are currently 81 templates named sv-... (too many for my taste), having the following parts of their names: