Module talk:bg-verb

expand accelerated vn
Hi. Can "vn" be expanded to "verbal noun" for accelerated entry purposes? e.g. -> ? --Anatoli T. (обсудить/вклад) 06:55, 29 April 2020 (UTC)
 * I changed it to read "vnoun", which is a recognized abbreviation in for "verbal noun" (and hence will display as "verbal noun"). Hopefully that will work. Benwing2 (talk) 00:30, 30 April 2020 (UTC)

display of reflexive verbs
(moved from Talk:къпя)

Is there a way to display the reflexive conjugation without including се in the links? Ultimateria (talk) 17:29, 29 April 2020 (UTC)
 * Could you explain what you mean and what it should look like in your opinion? --Anatoli T. (обсудить/вклад) 22:30, 29 April 2020 (UTC)
 * In the second table rather than link to, output   instead. I haven't compared every form between the two tables, but any one that's just X + се would fail RFD, right? I think the table at se la ramener handles reflexive forms in an ideal way. Ultimateria (talk) 23:19, 29 April 2020 (UTC)
 * Ultimateria, I originally implemented exactly what you proposed but then changed it to the current scheme. This was because of cases like, which exist only as reflexives. Currently uses the old inflection table, which does link e.g. just , and in fact that entry exists with the definition "Feminine indefinite past active aorist participle of гордея се". But that seems wrong to me; гордя́ла by itself is not the feminine singular indefinite aorist participle of , it's гордя́ла се that's the actual form. Ultimately I think it shouldn't matter so much whether a reflexive is written as one word or two words, what should matter is whether the reflexive construction is SOP or not. In fact this is the logic we use in Russian, where we don't generally include reflexive verbs that are solely the SOP passive or reflexive of the equivalent non-reflexive verb. Benwing2 (talk) 00:50, 30 April 2020 (UTC)
 * Yes, we need to discuss and make clearer policies on reflexive verbs.
 * I have started a new related discussion Beer_parlour/2020/April. Please join.
 * I disagree on Russian passive reflexive verbs. Since the particles -ся and -сь are always written together, they are considered words and lemmas, e.g. and . They definitely pass CFI. --Anatoli T. (обсудить/вклад) 00:59, 30 April 2020 (UTC)

warnings when trying to convert old conjugation templates
Please see User:Benwing2/convert-bg-conj-warnings. I wrote a script to convert old bg-conj-* uses to. Out of 929 verbs, it output 148 warnings. 34 of those are due to the need to specify the type of verbal noun, but most of them are due to errors of various sorts in the existing headword or conjugation templates. Help is appreciated in fixing some of them up. Benwing2 (talk) 02:28, 30 April 2020 (UTC)
 * Wow. There are a lot of warnings. Aspects and stresses are in dictionaries but it will take some time to add/fix them. Leave this page there, so that we can work on it. BTW, all verbs in -ирам are biaspectual and stressed on -и́рам. This dictionary https://slovored.com/accent/ is probably faster for checking accents. --Anatoli T. (обсудить/вклад) 02:37, 30 April 2020 (UTC)
 * I'm still not 100% comfortable with the numbers representing conjugation classes. Any hints or documentation would be helpful, but I am learning.
 * I have just fixed it has FOUR conjugation tables for impf, pf and for reflexive forms. Does that look OK? Maybe  the class and aspect should be displayed on top of the table as with Russian verbs, such ? --Anatoli T. (обсудить/вклад) 02:56, 30 April 2020 (UTC)
 * See my comment there, it should be OK to use  for the aspect in, so you'll only need two tables. Benwing2 (talk) 02:58, 30 April 2020 (UTC)
 * See Template:bg-conj/documentation, I added a section on all the numbered classes and their characteristics. I'll add class and aspect to the top of the table. Benwing2 (talk) 03:27, 30 April 2020 (UTC)

Verbal nouns
Do verbal nouns normally have plurals? Currently I'm listing plurals for all verbal nouns but somehow I have the feeling mostly they don't exist, e.g. for do  and  exist? Also does have a verbal noun? Neither nor  are listed in RBE and seem rare at best. Benwing2 (talk) 03:50, 1 May 2020 (UTC)
 * I think the short answer is yes, all verbal nouns have or should have plurals, rare or even theoretical (at worst). No harm in including them, IMO. --Anatoli T. (обсудить/вклад) 03:57, 1 May 2020 (UTC)
 * Theoretically they do. However, they are declined like -ие stems (except in the rare cases where they have been reanalyzed as nt-stem neuters). For example, both (of native origin) vs  (of Church Slavonic/Russian origin) have  as plural. PS The verbal noun of  is, but it is obsolete. The CS form  is standard. PS2 There is no such word as . It would have been the verbal noun of an intensive  <  if it existed. Instead, a long ablauted  is attested. Безименен (talk) 11:00, 1 May 2020 (UTC)

past passive participle of зная/знам, позная
What are the past passive participles (PPP's) of зная/знам and позная? chitanka says the PPP of зная/знам is зна́ен, but lists it as зна́ян (the expected form). RBE doesn't mention any irregular PPP for зная. All three agree that the PPP of позная is позна́т. Benwing2 (talk) 16:43, 2 May 2020 (UTC)
 * I get almost no hits for "знаян" in Google books for Bulgarian texts (mixing with other Bulgarian only words). It must be irregular and "знаен" is the correct form. --Anatoli T. (обсудить/вклад) 03:34, 3 May 2020 (UTC)
 * **знаян should be a mistake, probably generated by a generator function. This would have been the expected past passive participle of *znajati > **знаям, **знаяш, **зная. The past participle of is, while the one of  /from an athematic declension -м, -ш, -т/0, etc. see for comparison/ is technically  . This form has been lost in the standard language, but its derivative  is used nonetheless. Безименен (talk) 09:56, 3 May 2020 (UTC)

Non-lemma forms on same page as corresponding lemma
I'm trying to clean up ===Pronunciation 1=== headers. Generally I believe we should split by Etymology, not pronunciation. There's no problem having multiple pronunciations in a single etymology section; we can just use 1 as appropriate. This is what we do for Russian. However, this brings up an issue: Many of the cases with ==Pronunciation 1==, ==Pronunciation 2== are cases like or  where the same page lists both a lemma and some non-lemma forms that belong to the same lemma but are pronounced differently. When I wrote a bot to generate Russian non-lemma forms, my policy was to skip generating non-lemma forms that would end up on the same page as the corresponding lemma, even if pronounced differently. Do we want such non-lemma forms? If not I will gladly delete them as it will clean up the lemma pages. Benwing2 (talk) 03:32, 3 May 2020 (UTC)
 * I noticed this as well and I believe I removed one non-lemma. I think we should do what you suggest (no non-lemma for the same term on the lemma page) but we'll have to add a pronunciation with annotations. There are cases when the forms have the same stress but the pronunciation is still different, typically when "а" (both stressed and unstressed) is pronounced as "ъ" (or "я" as "ьъ" - a non-existent combination but I hope you know what I mean). --Anatoli T. (обсудить/вклад) 03:40, 3 May 2020 (UTC)
 * Yes, I know what you're referring to. Verbs ending in stressed а́ or я́ will have lemma forms where the а́ is pronounced as ъ́ (and similarly for я́) but will often have 2nd/3rd singular aorist forms that are spelled the same with the same stress but pronounced as written. However, if we follow the policy of not having such non-lemma forms on the same page, this won't be an issue because the non-lemma forms won't be included. (Maybe though in this case they should be included to make it clear how to pronounce the non-lemma forms; not sure.) Benwing2 (talk) 04:01, 3 May 2020 (UTC)
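
The skip policy described above (do not generate a non-lemma form whose page would be the lemma's own page) can be sketched roughly as follows. This is only an illustration, not the actual bot code: the function names are hypothetical, it assumes stress is marked with the combining acute accent (U+0301) and that page titles are the forms with that accent removed, and the sample forms of лежа́ are illustrative.

```python
ACUTE = "\u0301"  # combining acute accent used as the stress mark


def page_title(form: str) -> str:
    """Return the assumed page title: the form with stress marks stripped."""
    return form.replace(ACUTE, "")


def forms_to_generate(lemma: str, forms: list[str]) -> list[str]:
    """Keep only the forms that would NOT land on the lemma's own page."""
    lemma_page = page_title(lemma)
    return [form for form in forms if page_title(form) != lemma_page]


# Illustrative data: лежа́ (lemma), лежи́ш (2sg present), лежа́ (2sg/3sg aorist).
lemma = "лежа" + ACUTE
forms = ["лежи" + ACUTE + "ш", "лежа" + ACUTE]
to_create = forms_to_generate(lemma, forms)  # the aorist form is skipped
```

Under this policy the aorist form never gets its own non-lemma entry, so the pronunciation difference mentioned above would have to be annotated on the lemma page instead.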

Deleting гордееш, гордее, etc.
I'm thinking maybe we should delete non-lemma forms like гордееш, гордее. These are currently defined as non-lemma forms of гордея се but they aren't really what they claim to be. гордееш isn't the 2nd singular present indicative of гордея се; rather the correct form is гордееш се. It's true that other words may intervene between гордееш and се (at least I think), and I think it's even possible for the се to go before гордееш. But гордееш can only occur when се also occurs. Alternatively, move гордееш etc. to гордееш се etc. and clean up the entry; that can be done by bot but it's more work. Benwing2 (talk) 04:05, 3 May 2020 (UTC)
 * Reflexive verb forms are going to be a pain in the neck. User:-sche suggested using the forms without the се as . I think it's not a bad idea. I think we'll have to come up with a format for forms both with and without the reflexive particle: for reflexive verbs that are only used with се, for verbs used both with and without it, and for each inflected form. --Anatoli T. (обсудить/вклад) 04:16, 3 May 2020 (UTC)
 * I could potentially add a flag to to indicate whether to link reflexive forms as a whole or link the non-reflexive portion of them. As for things like гордееш are you suggesting it get defined as  and define горде́еш се as the 2nd singular present indicative of горде́я се? Benwing2 (talk) 04:22, 3 May 2020 (UTC)


 * in my view, maybe, to avoid duplications:
 * should have, something like see on the definition line and  in the Conjugation section.
 * should have, the proper definition and a proper conjugation. Perhaps the same should be done for verbs where both transitive and reflexive forms exist.
 * Yes, the inflected forms like should have only in.
 * Yes to all inflected forms with се like but I don't know how to deal with the positioning of the particle, perhaps a usage note, a superscript saying that it may take various positions in a clause? --Anatoli T. (обсудить/вклад) 11:18, 3 May 2020 (UTC)


 * Yeah, especially if other words can intercede between the parts, this seems like exactly the kind of thing soft redirects like were designed for, because we can reasonably expect that language learners (and even people who aren't learning the language but are trying to work out a particular string of text, something I do sometimes!) will look up individual words. Whether горде́еш should point to горде́еш се or to the lemma is a good question; perhaps we should create a new template that just says "See [lemma]" (or something). This would be intelligible as long as the lemma then had a conjugation table in which one could find горде́еш contained in the string/form горде́еш се. (I've also been a proponent of Template:used in phrasal verbs, and of sense-line links whenever a sense of x only exists as the x or in the plural, like on message.) - -sche (discuss) 16:15, 3 May 2020 (UTC)

full compound paradigms
Can one of the native speakers here review User:Benwing2/test-bg-conj-full? Here I have full paradigms that expand out all the compound forms. I have a few questions: Thanks! Benwing2 (talk) 01:59, 4 May 2020 (UTC)
 * 1) Under ща, the future perfect indicative has two forms, e.g.  and  for the first person masculine singular positive. The corresponding tables for съм and бъда don't list the form with бъда as an auxiliary. Is this correct or are the forms with бъда as an auxiliary valid and just left out?
 * 2) The tables for бъда and съм have a gap where "renarrative present and past perfect" should be. This gap is filled in the ща table; the only gap in that table is "dubitative present and past perfect". Is the "renarrative present and past perfect" really missing for бъда and съм?
 * 3) Similarly, the tables for бъда and съм have gaps under "dubitative aorist" and "conclusive present and past perfect" which are filled in the ща table. Are these gaps real?
 * 4) Finally, съм (but not бъда or ща) has a gap under "dubitative present and imperfect". Is this gap real?
 * Also can you answer the same questions about gaps and alternative forms (involving бъда as an auxiliary) for a regular verb like or ? Benwing2 (talk) 02:02, 4 May 2020 (UTC)
 * I don't see formal mistakes. Some forms of бъда sound unnatural to me, but I guess they exist theoretically - e.g. the definite forms of the l-participle: билият, билата, билото. Standard Bulgarian uses -ият, -ата, -ото to say the former, the past.
 * PS The dubitative aorist (бил съм бил) of 'бъда'/'съм' theoretically exists. It's identical to the present, though, so I can't think of a proper example that illustrates it. I'm actually not sure that it is correct to distinguish past from present for something that is doubted/uncertain. I think the 3p. sg. forms should be бил е бил, била е била, било е било, etc instead of бил бил, била била, било било (but I'm not sure).
 * PS2 Probably, the safest option is to copy the tables from instead of relying on human proof-checking. It's hard to grasp all these forms out of context even for a native speaker. Безименен (talk) 09:41, 4 May 2020 (UTC)
 * Can you say ще бъ́да гово́рил "I will have spoken", ще бъ́да крал "I will have stolen", etc. for the future perfect in place of ще съм гово́рил, ще съм крал, etc.? Benwing2 (talk) 13:21, 5 May 2020 (UTC)
 * Chitanka only lists the indicative and renarrative forms, and only for бъда, not for съм. It says the renarrative present/past perfect can be either бил бил or бил е бил, and similarly the renarrative aorist either бил or бил е, and the renarrative present/imperfect either бъдел or бъдел е, and the renarrative future/future-in-the-past either щял да бъде or щял е да бъде, and the renarrative future perfect/future-perfect-in-the-past either щял да е бил or щял е да е бил. Benwing2 (talk) 13:38, 5 May 2020 (UTC)
 * Regarding the construction ще бъда + л-participle: it sounds unnatural. The closest to what you ask is ще бъда + н/т-participle, which however conveys passive sense (I will/would be + perfect participle...), not active future perfect. For example:

Ще бъда наказан/нахокан, ако ... /I would be punished/scolded, if .../ Ще бъда подготвен ... /I will be prepared/
 * I would say it doesn't exist... at least not as an alternative to ще съм + л-participle. Безименен (talk) 15:20, 5 May 2020 (UTC)

Template:bg-verb-form and Template:bg-verb form of are gone
I eliminated both of these by bot. All uses of have been replaced with. was replaced with the following: The specialized headword templates are used for categorization. All of them categorize into Category:Bulgarian verb forms as well as another category (Category:Bulgarian participles, Category:Bulgarian participle forms, Category:Bulgarian verbal nouns or Category:Bulgarian noun forms, respectively) and also categorize into a specific participle category such as Category:Bulgarian past active aorist participles.
 * 1) Finite verb forms are replaced with.
 * 2) "Lemma" participles (the masculine singular indefinite) are replaced with, with the type(s) of participle specified in args 2, 3, ... for categorization purposes.
 * 3) Other participle forms are replaced with.
 * 4) "Lemma" verbal nouns (the masculine singular indefinite) are replaced with.
 * 5) Other verbal noun forms are replaced with.

Finite verb forms use the ==Verb== header, participles and participle forms use the ==Participle== header, and verbal nouns and verbal noun forms use the ==Noun== header. Benwing2 (talk) 13:16, 5 May 2020 (UTC)
 * Thanks for the changes. I was wondering why participles had header ==Verb== until now. Безименен (talk) 15:23, 5 May 2020 (UTC)

Error at ям
Hi. Sorry, pinging again, in case you missed it. Please see Talk:ям. The third person plural is яда́т, not ямт. I am clueless about this module and how to fix it. --Anatoli T. (обсудить/вклад) 00:09, 8 May 2020 (UTC)
 * Fixed. Benwing2 (talk) 04:04, 8 May 2020 (UTC)
 * Thank you! --Anatoli T. (обсудить/вклад) 04:10, 8 May 2020 (UTC)
 * Was also broken for дам and знам; it's now fixed in a general way that should take care of all such verbs. Benwing2 (talk) 04:15, 8 May 2020 (UTC)

Imperative of дойда, зайда, подойда, придойда
Chitanka indicates the imperative of дойда as regular дойди́, and says there's no imperative plural. But says the imperative is ела́, plural ела́те. What is correct? For now I've given до́йда the imperative ела́ pl. ела́те, but left за́йда, подо́йда and придо́йда with regular imperatives зайди́/зайде́те, подойди́/подойде́те, придойди́/придойде́те. Benwing2 (talk) 20:17, 10 May 2020 (UTC)


 * https://uraa.eu/BG-newspaper/глагол/ says the following: "От глагола дойда повелителната форма е дойди, дойдете, но вместо нея употребяваме ела, елате." - "the imperatives of дойда are дойди, дойдете but we use ела, елате instead of them" I wonder if дойди, дойдете are considered rare or wrong. --Anatoli T. (обсудить/вклад) 23:06, 10 May 2020 (UTC)


 * The imperative forms are not wrong, but they are indeed somewhat rare. Most people use the Greek borrowing to say come here. I can think of the threat:

Дойди ми и ... 'Come at me and ...' (e.g. ... ще ти видя сметката ~ '... I'll resolve my business with you' = I'll beat you/annihilate you/sth of this sort)
 * as an example of . The imperative of is clearly attested in Зайди, зайди, ясно слънце. Безименен (talk) 08:57, 11 May 2020 (UTC)
 * Thank you. We should include and  as alternatives of  and  in  but label them as rare. --Anatoli T. (обсудить/вклад) 09:02, 11 May 2020 (UTC)

Subjective and objective case references are incorrect
@Benwing2 @Kiril kovachev today, the acceleration tags of this module render certain Bulgarian participle forms as being in the subjective case or objective case, I assume because the endings change depending on whether the participle is part of the subject or object of a sentence. That, however, is grammatically incorrect - this is not a case distinction (no such cases have ever existed in Bulgarian), but rather a distinction within the definite article, which has two flavors, called "full" and "short". A different nomenclature calls them the subject definite article and object definite article, which would work here.

In other words, I'd change:

to:

A similar change should be made for the noun declension. Thoughts?

Thanks,

Chernorizets (talk) 09:13, 30 August 2023 (UTC)


 * @Chernorizets What is the difference? Surely the definite article only changes its form depending on whether it takes on a subjective or objective role (which seems to come very close to English pronoun distinctions, e.g. who/whom having subjective and objective forms)? Could you give an example of a participle where this is currently stated incorrectly and what the mistake is? Kiril kovachev (talk・contribs) 12:29, 30 August 2023 (UTC)
 * @Kiril kovachev when the template is passed "sbjv" and "objv", it adds "subjective" or "objective" to the inflection, which are clickable links to articles about the subjective case and objective case, respectively. Here is the link for "objective", which means "any case that is neither nominative nor vocative". Those happen to be the only cases in Bulgarian. Note that "case" here refers to grammatical case.
 * My proposed change replaces "subjective" with "subject" and "objective" with "object", which are used by with the usual meaning of subject and object of a verb. That's what we're really trying to convey here, and it corresponds to how we all learned the rules for when to use the full definite article vs the short one. It's just the correct nomenclature. Chernorizets (talk) 22:04, 30 August 2023 (UTC)
 * @Chernorizets Oh, okay, so it's just that. Sorry I didn't get it the first time, but yeah, I support this. I suppose we would require a script to edit these in the many entries they appear in? Kiril kovachev (talk・contribs) 22:11, 30 August 2023 (UTC)
 * @Kiril kovachev the first order of business would be to fix this here and in Module:bg-nominal. I don't think we have a ton of those entries created for either nouns or verbs, but I've seen more for nouns, so I'd run a script to fix that first, followed by any non-lemma verb forms we might have created entries for. Chernorizets (talk) 22:24, 30 August 2023 (UTC)
 * @Chernorizets Okay, sounds good. Do you know how the sbjv/objv labels currently work? I imagine these are expanded into their full names somewhere, but where is that? If there's already such a label as sbj/obj then I guess that doesn't matter; otherwise we'd have to create it, wherever that's to be done. Kiril kovachev (talk・contribs) 14:07, 31 August 2023 (UTC)
 * @Kiril kovachev all of those labels are defined in the documentation of . The ones I recommend we start using already exist. I'm happy to make the change if no one has concerns about it. Chernorizets (talk) 20:35, 31 August 2023 (UTC)
 * @Chernorizets Seems fine to me, I've no problem with it. Why not make it anyway, since we aren't changing any actual page contents, so even if anyone comes to disagree, the relevant code can be reverted and the change is immediately undone? No harm done, I think :) Kiril kovachev (talk・contribs) 21:06, 31 August 2023 (UTC)
 * @Kiril kovachev done. Chernorizets (talk) 21:30, 31 August 2023 (UTC)
 * @Chernorizets Nice one. ^^^ Let's wait and see; maybe we could post in the beer parlour or something about this, but I don't know if it's even that controversial. I guess it wouldn't hurt...? Kiril kovachev (talk・contribs) 21:31, 31 August 2023 (UTC)
 * @Kiril kovachev IMO it doesn't rise to Beer Parlour-level visibility, but I wouldn't try to stop anyone from posting there if they saw fit. Chernorizets (talk) 21:49, 31 August 2023 (UTC)
 * @Chernorizets Fine by me either way. @Benwing2 @Bezimenen @SimonWikt what do you think about this change in general? Kiril kovachev (talk・contribs) 23:56, 31 August 2023 (UTC)
 * @Chernorizets @Kiril kovachev This is fine by me. I have heard that some dialects of Bulgarian have more cases, e.g. the dative (and there are definitely archaic expressions with case endings in them in standard Bulgarian), and treating the subject/object distinction in the definite article as a case remnant seems reasonable on the surface but I don't know Bulgarian grammar in detail. Benwing2 (talk) 00:01, 1 September 2023 (UTC)
 * @Benwing2 the forms of the definite article are an orthographic convention, not case remnants. Some of the early codifiers of Bulgarian advocated for there to be just one version, but the current distinction won out. The definite article in Bulgarian comes from Proto-Slavic and OCS words for "this" - ta/to/tъ - that became fused with the nouns. Chernorizets (talk) 00:24, 1 September 2023 (UTC)
 * @Chernorizets What I mean by "case remnant" is from a synchronic perspective in the written language, they behave like cases; whether they derive from Old Bulgarian case forms is irrelevant (compare the "new cases" in a language like Bengali, which are not derived from the earlier case system, which has entirely disappeared [in contrast to languages like Hindi], but which synchronically are cases nevertheless). Benwing2 (talk) 00:28, 1 September 2023 (UTC)
 * @Benwing2 I'm comfortable saying that it's unlikely users of the dictionary would find references to the subjective/objective case in Bulgarian online or offline resources, so it's an academic rather than a practical argument IMO. To a casual reader, subject & object are clearer in meaning than subjective/objective, which without clicking on the links could easily sound like their everyday meanings, rather than the grammatical ones. Besides, Bulgarian does have evidential verb forms, which express subjective belief in whether something did or did not happen, so again I think it's a distracting set of terms to use in the contexts where they're being used today. Chernorizets (talk) 00:44, 1 September 2023 (UTC)
 * @Chernorizets OK, that is fine with me. Benwing2 (talk) 00:45, 1 September 2023 (UTC)
 * @Kiril kovachev, @Chernorizets I don't think that I have used these yet, but it makes sense to me.
 * Does that imply that the definitions of пълен член, непълен член and possibly others, should also be changed?
 * SimonWikt (talk) 09:50, 1 September 2023 (UTC)
 * @SimonWikt I would follow the nomenclature in and translate these as "subject definite article" and "object definite article", respectively. Chernorizets (talk) 20:22, 1 September 2023 (UTC)
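
For anyone scripting the follow-up cleanup of already-created entries, the tag swap discussed in this thread could look something like the sketch below. It is a rough illustration only: "sbjv" and "objv" are the tags named above, while the replacement codes "sbj" and "obj" are an assumption taken from the question earlier in the thread (the labels actually adopted are not spelled out here), and the function name is hypothetical.

```python
# Old tags come from the discussion; the new codes are assumed for illustration.
TAG_REPLACEMENTS = {
    "sbjv": "sbj",  # "subjective" case -> subject (full) definite article
    "objv": "obj",  # "objective" case  -> object (short) definite article
}


def retag(tags: list[str]) -> list[str]:
    """Return a copy of an inflection-tag list with the old case tags swapped."""
    return [TAG_REPLACEMENTS.get(tag, tag) for tag in tags]


old_tags = ["def", "sbjv", "m", "s"]
new_tags = retag(old_tags)  # all other tags pass through unchanged
```

Because the mapping only touches the two tags in question, running it twice over the same entry is harmless.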

Regarding accelerator tags for verbal participles
Hey @Benwing2,

I have two questions about the way participle entries are created today via acceleration, by clicking inside a Bulgarian conjugation table. Right now, the POS header they get is "Verb" rather than "Participle". I notice that languages make this choice differently - I've seen English entries put past participles in a "Verb" section (e.g. melted), and I've seen Polish past participles using "Participle" (e.g. topiony). Is there a specific reason to use "Verb" when "Participle" is available?

Secondly, Bulgarian conjugation tables include the different person/number forms for participles. As in other languages, Bulgarian participles can be used as adjectives - sometimes acquiring new senses, sometimes not - in which case those person/number forms would also show up in the adjective declension table, with different acceleration tags from those inside the verb conjugation table. Is this desirable? If we were to actually create those non-lemma forms, they'd end up having a "Verb" (or "Participle") section and an "Adjective" section, and I wonder if that's what we want.

E.g. if you take, its past passive participle "closed" in the feminine definite form would be. When I try to put it in a sentence, I come up with adjectival uses, or maybe substantivized adjectives:
 * Затворената врата е червена. - The closed door is red.
 * Затворената вкъщи заради COVID гледаше телевизия. - The [one, f.] who had been confined (lit. closed) at home due to COVID was watching TV.

What provides the distinction between the adjectival and participle "manifestations" of a non-lemma form like that? The only evidence for distinction I can think of is the way the prescriptive Bulgarian Language Dictionary doesn't list participles as headwords, unless they have adjectival senses that aren't immediately apparent from the main verb.

Thanks,

Chernorizets (talk) 04:57, 23 November 2023 (UTC)

@Chernorizets Hi. I believe the correct way to handle participles is to use a ==Participle== header and a 'participle' (or sometimes 'past participle') part of speech in head. I have corrected any occurrences of ==Verb== for participles to ==Participle==. If the accelerators generate "verb", that should be fixed. As for your second question, are you asking about duplication between derived forms of participles (e.g. feminine singular) and the corresponding derived forms of adjectives generated from those participles? I think the tendency in that case is to create two separate POS entries, one for the derived adjective and one for the derived participle, even though it may be duplicative. But this isn't totally worked out, I think, and maybe there's a better way. For example, maybe we should implement the ability to have two POS's listed under a single header, something like this:

Participle,Adjective


I think I proposed something of this sort in the past, specifically with regards to Arabic, and received some positive feedback. Benwing2 (talk) 07:25, 23 November 2023 (UTC)

Verbal nouns are not handled correctly in several ways
The current treatment of Bulgarian verbal nouns in Wiktionary is incorrect in several ways:


 * they get categorized as non-lemma forms, presumably by Module:bg-headword adding them to Category:Bulgarian verb forms. This is incorrect. Bulgarian verbal nouns are regular nouns, possessing the categories of gender, number and definiteness common to all Bulgarian nouns. Bulgarian monolingual explanatory dictionaries have them as headwords. This is especially useful since they can have more than one noun sense, and/or more than one applicable English translation. For example:
 * the verbal noun of is, which can mean either "eating" (the action) or "a meal".
 * the verbal noun of is, which can mean either "traveling" (as in "traveling is a good way to learn about the world") or "a trip, a journey" (as in "next year I'm planning a trip to France").
 * since they are regular nouns, their declensions should be given via in their corresponding entries, and each declined form should refer to the base noun form. Instead, we use  for declined forms, leading to a ton of entries like  which claim to be verbal rather than nominal inflections.
 * handling declension via in a verbal noun's entry also has the advantage of specifying correct pluralization. Despite the claim of Bulgarian conjugation tables that verbal nouns have plurals in -ния and -нета, only some verbal nouns use both. The usual plural is in -ния (as in, not ), but for example the plural of  in the sense of "a meal" is , not . Some verbal nouns are uncountable.
 * The practice seen in and  of having two noun POS sections - one for  and another which lists the actual English senses - is incorrect. It's true that some verbal nouns would benefit from separate POS sections - like  - if they have senses that e.g. require different pluralization, and/or aren't simply "the action of the verb". However, others like  have all of their senses derive from "the action of the verb". There is nothing special about  that precludes relevant senses being listed under it.

To fix these various issues, I want to make several changes:
 * Module:bg-headword: remove the statement that adds verbal nouns to the "Bulgarian verb forms" category. It's sufficient that they end up in "Bulgarian verbal nouns", which is a subcategory of "Bulgarian nouns". That's also the pattern in e.g. Polish.
 * Module:accel/bg: ensure that we handle the case of accelerated creation of verbal nouns in the following way:
 * the POS header should be "Noun"
 * the headword should use, even though I'm not sure we need it over simply.
 * the definition should simply use, as in Polish, vs. the current "indefinite masculine singular indicative etc" stuff.
 * the declension will be given by with a comment to verify pluralization.
 * this module (bg-verb): there are three options:
 * change the accel tags on verbal noun declined forms to point to the verbal noun, rather than the verb, so one would get identical results to creating the forms from within the table generated by.
 * make everything but the base form of the verbal noun unclickable (no accel tags, no links)
 * remove everything but the base form of the verbal noun, and optionally move it to its own row in the table.

With these changes in place, we'll need some cleanup actions:
 * (bot job?): modify verbal noun form entries (e.g. ) to from.
 * delete.

Let me know if you have any questions or feedback.

Thanks,

Chernorizets (talk) 08:37, 12 December 2023 (UTC)


 * @Chernorizets: I would be happy to take on the bot task today if we're going ahead with this. I was quite surprised to read that verbal nouns are meant to be lemmas, because on the other hand, aren't they themselves an "inflection" of a verb? And yet they also have their own declension, so the current treatment is understandable to an extent. But in response to all your points, I much agree this would be a helpful change. Kiril kovachev (talk・contribs) 12:55, 12 December 2023 (UTC)
 * Check out and  in which I try the transformation you suggested.
 * First of all, my current approach to the task is as follows. We have a large pre-analyzed repository of data available from kaikki.org, which has parsed all the Bulgarian entries. I filtered out a specific subset of this data, and then built up a map from verbs to their verbal nouns. Some have more than 1, which is accounted for: you can see the data here.
 * We iterate through pages which have the problem, and check what the situation is: if it's "indefinite singular verbal noun", we remove the "indefinite singular" parts and keep just the "verbal noun" part.
 * If it's an inflection of a verbal noun, the verb that was previously considered to be the lemma of the inflection is used as the index into the lookup table, which provides the new lemma (the verbal noun); this becomes the parameter of, and then the "verbal noun" argument is removed.
 * You can see the code that does this here. Let me know of any feedback ^^ Kiril kovachev (talk・contribs) 20:36, 12 December 2023 (UTC)
 * @Kiril kovachev thanks for working on this, I'll take a look as soon as I can. Fun fact about your two example diffs - is a non-existent plural of  :-) The only plural is . This is a separate problem - there are likely a number of non-lemma forms for vnouns that we'll need to delete. We'd have to create a list of all the plural ones in -та and manually check it. This would go under "cleanup actions" in my list above. Chernorizets (talk) 21:28, 12 December 2023 (UTC)
 * @Chernorizets Funnily, I see the Chinese Wiktionary also has these forms when I tried looking it up for attestations (none found, of course). And worse still, they still have our old IPA in place :-)
 * About listing out the terms for checking, as well, I can compile this from Ben's list below. Kiril kovachev (talk・contribs) 13:05, 13 December 2023 (UTC)
 * @Chernorizets Okay, more important update. I just made User:Kiril kovachev/Verbal nouns to check, but it's a lot of work — 1313 verbal noun forms ending in -та. Is there any way we can cut down our search space? Check ones ending only in -нета, for example? (I don't know, that might exclude other kinds of error...)
 * Separately, is брания ever stressed as ? Not heard this one, but apparently that's a verbal noun of . Kiril kovachev (talk・contribs) 18:05, 13 December 2023 (UTC)
 * Also, are these bugs also present in the conjugation tables, or just an artefact created back in the day? We can use the data again to see which forms actually exist, which may help provided the tables are now largely correct and only the ghost pages remain. Kiril kovachev (talk・contribs) 19:45, 13 December 2023 (UTC)
 * Otherwise, the diffs do look exactly like what I have in mind. Chernorizets (talk) 21:32, 12 December 2023 (UTC)
 * @Chernorizets Also, what should we do about entries like ? This has a definition that reads ; should we just change it to ? Kiril kovachev (talk・contribs) 19:41, 12 December 2023 (UTC)
 * @Kiril kovachev I'm proposing that we simply use, and similarly for other verbs. You can see it in action in e.g. . Chernorizets (talk) 21:17, 12 December 2023 (UTC)
 * @Chernorizets Ah, my mistake, I thought you meant to remove somehow, rather than . That makes sense, I'll make the relevant change to the code. Kiril kovachev (talk・contribs) 21:21, 12 December 2023 (UTC)
 * Also, @Benwing2: would it be possible to briefly add a tracking page for all entries using which contain "vnoun" for Bulgarian? I can't edit the module so I couldn't do this myself.
 * There might be a better way of doing it, but all I need for this task is to access all the verbal nouns and their forms in some way. Note that doesn't show up in any verbal noun category, so this is sadly not enough by itself, I think. Kiril kovachev (talk・contribs) 19:55, 12 December 2023 (UTC)
 * @Kiril kovachev I don't know if this helps you, but a lot (maybe all?) of the existing verbal noun declined forms use in the headword. Chernorizets (talk) 21:19, 12 December 2023 (UTC)
 * @Chernorizets The problem is simply knowing what entries need to be edited, in an efficient way, more so than being able to identify if any given entry needs to be. Currently my script iterates over all kinds of sundry entries before eventually finding a verbal noun form, but that's quite cumbersome — it took maybe a minute to find a single valid entry. (It has to fetch and parse the page text every time we want to check whether it's what we want.) So I would also be happy with just a word list. That actually reminds me I could try to figure out the list using the kaikki.org data I mentioned above. Thanks for prompting me with this idea! Kiril kovachev (talk・contribs) 21:24, 12 December 2023 (UTC)
 * Also, I realize I could have just used the "what links here" off of that template you suggested. Facepalm. Kiril kovachev (talk・contribs) 12:56, 13 December 2023 (UTC)
 * @Kiril kovachev There is already tracking on a per-tag but not per-tag+lang basis; see Special:WhatLinksHere/Template:tracking/inflection of/tag/vnoun. I filtered that list down to only those terms with Cyrillic characters and then filtered out any containing non-Bulgarian or ending in  or . The result is here: User:Benwing2/vnoun-mostly-bulgarian There may be a few false positives but all the Bulgarian terms should be there. Benwing2 (talk) 06:16, 13 December 2023 (UTC)
 * @Benwing2 Great, thanks! I'll try it on this list with a few more entries later today. Kiril kovachev (talk・contribs) 12:58, 13 December 2023 (UTC)
 * for some examples. Everything looks to be working, for both classes of problem ( to and  to ). Kiril kovachev (talk・contribs) 20:06, 13 December 2023 (UTC)
 * @Chernorizets @Benwing2, would you both be happy for me to run this on the rest of the entries?
 * I now did another 30 runs and everything is still okay. Kiril kovachev (talk・contribs) 19:44, 16 December 2023 (UTC)
 * @Kiril kovachev a few of the verbs in the JSON file are perfective, and as such should not have verbal nouns in -не:
 * : no vnoun (*денене)
 * : no vnoun (*усмихнене)
 * not sure if there are others, but I didn't see any
 * These verbs' conjugation tables probably need to be fixed, and these non-existent vnouns need to be deleted (along with their inflected forms). Otherwise, the changes look good and you should go ahead. Chernorizets (talk) 20:25, 16 December 2023 (UTC)
 * @Kiril kovachev actually, where in your code is the change from to  for the base form (indef sg)? Chernorizets (talk) 21:01, 16 December 2023 (UTC)
 * @Chernorizets Sorry, the current version of the code is different from the one I put on GitHub previously. I just now pushed it if you want to see. Kiril kovachev (talk・contribs) 21:07, 16 December 2023 (UTC)
 * @Kiril kovachev No objections. Benwing2 (talk) 21:40, 16 December 2023 (UTC)
 * @Kiril kovachev ship it! :-) Chernorizets (talk) 21:48, 16 December 2023 (UTC)
 * @Benwing2, @Chernorizets: it's now done! Kiril kovachev (talk・contribs) 22:56, 16 December 2023 (UTC)
 * @Benwing2 @Kiril kovachev I've now implemented changes in Module:accel/bg to create entries for verbal nouns per my proposal in the original post - пържене is an example of what that looks like. Unless there are any concerns, I'm going to go ahead with the other changes I wrote about. Chernorizets (talk) 01:47, 14 December 2023 (UTC)
 * @Chernorizets @Kiril kovachev Thanks for doing this. It brings verbal nouns in line with how participles are normally structured. Note that I did some cleanups to existing verbal noun entries a while ago, which introduced the headword templates bg-verbal noun and bg-verbal noun form; I may have changed things to include 'vnoun' but your approach is better. Benwing2 (talk) 02:02, 14 December 2023 (UTC)
 * @Benwing2 to keep things simple, I've suppressed acceleration in the verb conjugation table for all but the indefinite singular (base) form of verbal nouns. Per my original post, there are a couple of reasons for this:
 * verbal noun inflected forms should have the verbal noun as the acceleration lemma, not the verb. I'm disinclined to "recover" the base noun from an inflected form in Module:accel/bg, because 1) that kind of processing doesn't naturally belong in an acceleration module, and 2) sometimes the stress moves in the plural, in cases hardcoded in Module:bg-verb. I don't want to duplicate that information.
 * the conjugation table as-is contains verbal noun inflected forms that may not exist, such as plurals in -та, or plurals in general. It also hardcodes verbal noun alternatives for several verbs. I'm not convinced that verbal noun inflections should be in the conjugation table in the first place, because - unlike participles - they don't retain verbal grammatical categories in the modern language. If someone really insists on keeping anything beyond the base form of the verbal noun in the conj table, we'd need to at least provide template-level control for vnoun plurals (if they differ from the -ния default).
 * All in all, I went for the simplest change that ensures people don't create verbal noun entries that need fixing. My inclination would be to keep only the base form of verbal nouns in verb conj tables, and remove the slots for the declension.
 * Thanks,
 * Chernorizets (talk) 04:28, 14 December 2023 (UTC)
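
For readers following the bot discussion above, the two rewrite cases Kiril describes (base form vs. inflected form of a verbal noun) can be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function name `retarget`, the table `VERB_TO_VNOUNS`, the tag abbreviations `indef`/`def`/`s`, and the choice to skip verbs with more than one verbal noun are all assumptions made for the sketch. Only the `vnoun` tag and the verb-to-verbal-noun lookup idea come from the thread.

```python
# Hypothetical lookup table: verb lemma -> its verbal noun(s).
# The real data was derived from kaikki.org; this entry is illustrative.
VERB_TO_VNOUNS = {"пържа": ["пържене"]}

VNOUN_TAG = "vnoun"
# Assumed tag abbreviations for "indefinite singular" (the base form).
BASE_FORM_TAGS = {"indef", "s"}


def retarget(lemma, tags, table=VERB_TO_VNOUNS):
    """Return (new_lemma, new_tags) for one form-of definition.

    Returns None when the entry needs manual review (no unique
    verbal noun is known for the verb).
    """
    if VNOUN_TAG not in tags:
        # Not a verbal noun definition: leave it alone.
        return lemma, list(tags)

    rest = [t for t in tags if t != VNOUN_TAG]

    if not rest or set(rest) == BASE_FORM_TAGS:
        # Base form: drop "indefinite singular", keep only "verbal noun".
        # The lemma stays the verb.
        return lemma, [VNOUN_TAG]

    # Inflected form: the verb indexes the table, and the verbal noun
    # becomes the new lemma; the "verbal noun" tag is removed.
    candidates = table.get(lemma, [])
    if len(candidates) != 1:
        return None  # assumption: ambiguous or unknown cases are skipped
    return candidates[0], rest
```

Calling `retarget("пържа", ["def", "s", "vnoun"])` retargets the definition to the verbal noun and drops the `vnoun` tag, while a definition with no `vnoun` tag passes through unchanged. Working on already-parsed (lemma, tags) pairs keeps the sketch independent of any particular template syntax.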

Enhanced support for Bulgarian impersonal verb constructions
Hi @Benwing2 and @Kiril kovachev,

I promise this is going to be the last time I spam you this year. I'd like to modify this module to support a kind of impersonal verb construction that's common in Bulgarian. It looks like a reflexive, but it has a dative pronoun, for example:


 * &rarr;
 * &rarr;
 * &rarr;
 * &rarr; &rarr;

The normative Bulgarian dictionary gives these forms with the 1st person singular dative pronoun, although you can use any dative pronoun, e.g. . In terms of conjugation, it's similar to other impersonal verbs:
 * imperfect:
 * future:
 * etc

I need to do some research though on whether some of the tenses and participles (always) make sense for those:
 * the present active participle (*спящ ми се) and the adverbial participle (*спейки ми се) don't make sense. I'm not sure whether they make sense for impersonal verbs in general.
 * the aorist sounds ungrammatical; people would use the imperfect. "Feeling sleepy" doesn't have an implied beginning and end, which is what the aorist conveys.
 * for the same reason, there is an evidential form using the past imperfect participle -, but no such form with the past aorist participle -.

Adding support for these forms comes with the caveat that we should be judicious as to which ones we include, since there's basically a form corresponding to any "I feel like -ing", e.g. . However, for common ones like and  I feel like it would be a beneficial addition. This type of construction is very common both in writing and in speech. The alternative would be to add a section to the nascent Bulgarian verbs appendix I'm working on, with the downside that few people tend to discover the appendices.

Thoughts and suggestions are, as always, welcome!

Thanks,

Chernorizets (talk) 04:02, 15 December 2023 (UTC)


 * @Chernorizets This sounds fine to me and don't worry about spamming me with long messages :) ... I'd much rather have more engagement than less with this site. Consistent with the principle that the lemma should be listed in the conjugation, I'd suggest a syntax something like this: where the underscores in place of spaces aid in parsing (otherwise the code would likely interpret just се as the lemma). Note that some other languages have modules that support similar features, e.g. the Portuguese verb module supports insertion of the reflexive pronoun either before (proclisis), after (enclisis) or in the middle of (mesoclisis) an inflected term; a postprocessing step adds the reflexive pronoun, but in the conjugation step, a special character is added to mark the mesoclisis insertion point, which gets removed in a later stage if not needed. Also, most verb modules have a  function that is called just before adding a form to check whether the addition should be skipped (e.g. due to it being an impersonal verb). It looks like that function isn't present in this module, probably because it was one of the earliest modules I wrote, but you could consider adding it. (For similar reasons, this module doesn't use the full machinery of Module:inflection utilities, but rolls its own for some portions; I actually developed some of the machinery in Module:inflection utilities based on the Bulgarian modules). If you have any questions about how the code works, please feel free to ask. Benwing2 (talk) 04:59, 15 December 2023 (UTC)
 * @Chernorizets I feel the same as Ben about activity; I like it when we sit down and discuss what progress can still be made with our infrastructure. Would it be possible, perhaps, @Benwing2, to auto-detect a dative reflexive the same way the reflexive conjugation is auto-detected? If we just put that check first (i.e. check for a "ми се" suffix before just "се"), then would that make the treatment of the argument itself a bit simpler? I assume we'd want to generate all these forms in a separate conjugation table, like we do with reflexive and non-reflexive pairs as well. Kiril kovachev (talk・contribs) 16:11, 15 December 2023 (UTC)
 * @Kiril kovachev stay tuned - I'll try to get some more grammatical info about these forms, and maybe impersonal verbs in general, in order to figure out what conj table(s) we need. I've already sent a question over to the BG language institute to get more clarity on some of the details. Chernorizets (talk) 22:37, 15 December 2023 (UTC)
 * @Chernorizets Thanks for the enquiry — I look forward to what they have to say. Kiril kovachev (talk・contribs) 22:55, 15 December 2023 (UTC)
 * @Kiril kovachev Yes, what you propose should actually work, without the need for underscores. I didn't realize that the normal multiword support isn't enabled for Bulgarian verbs; with that support in place, you'd need to use an underscore in place of a space to group parts of a lemma (this is done with de-conj, for example, to handle verbs like kennen lernen, because of the multiword support, cf. aus.schneiden und ein.fügen<> for ausschneiden und einfügen). Benwing2 (talk) 23:37, 15 December 2023 (UTC)
 * @Benwing2 That's actually a separate issue that I was wondering about, which would help us out in a few entries such as — if I remember correctly, the Latin module has support for phrases with both verbs and invariable components, which would be a good upgrade if we were able to do that to. What would be necessary technically for that kind of change? If we keep up the current system (and implement the auto-detection as I suggested above), will this make it harder to implement multi-word support later? Kiril kovachev (talk・contribs) 00:08, 16 December 2023 (UTC)
 * @Kiril kovachev isn't ям бой more of an idiomatic expression (Phrase) than a Verb? I do see a number of English entries that treat things like that as verbs (e.g. take a breather), so I'm not sure. If you want to point the reader to the conjugation of the main verb, you could use, as is done in ближа си раните. Chernorizets (talk) 05:56, 16 December 2023 (UTC)
 * @Chernorizets I don't know what categorization we want for it part-of-speech wise, since I would say it's a "verbal phrase", so really, both would be applicable. I would defend "Verb" over "Phrase" simply because this tells the reader more about how the phrase works (as it could also be a noun phrase, adverbial, etc.).
 * Thanks for informing me about . I didn't know about that. If any verb phrase can be redirected like this, I suppose there's no need to specially conjugate the phrase itself. Kiril kovachev (talk・contribs) 19:35, 16 December 2023 (UTC)
 * @Kiril kovachev Yeah there used to be similar templates for Russian multiword noun terms but I removed them in favor of indicating the actual declension, which otherwise can be confusing esp. if there is more than one declined component (e.g. adjective plus noun). For multiword verbal expressions what we typically do in Spanish and Italian is conjugate them in the headword (the Spanish and Italian verb headwords list the first singular present, first singular preterite and past participle), and otherwise not include a full conjugation table. Benwing2 (talk) 21:43, 16 December 2023 (UTC)
 * @Benwing2 Okay, that sounds like a good solution. If we need that, although for the phrasal verbs we have so far the redirect may even be enough, we can work on extending with those parameters as well, perhaps (or even allow them to be auto-generated by the module itself via an argument?), but for now even doing nothing might be fine. Thanks for your help, Kiril kovachev (talk・contribs) 23:34, 16 December 2023 (UTC)
 * @Kiril kovachev You will find for example that es-verb takes the same parameters as es-conj and auto-generates the principal parts specified above. We could make bg-verb work that way as well; it would take the same argument as bg-conj in 1, and take additional parameters to specify the aspectually-paired verb(s). Not necessary to make that change now but something to consider in the future. Benwing2 (talk) 00:03, 17 December 2023 (UTC)
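
The auto-detection idea discussed earlier in this thread (check for a "ми се" suffix before plain "се") can be sketched as below. This is a minimal illustration in Python rather than the module's Lua, and the function name `split_clitics` and the lemma "спи ми се" are assumptions chosen for the example; nothing here reflects how Module:bg-verb is actually implemented.

```python
# Clitic suffixes to look for, longest first. The order matters:
# " ми се" also ends in " се", so it has to be tested first.
CLITIC_SUFFIXES = (" ми се", " се")


def split_clitics(lemma):
    """Split a lemma into (verb, clitics).

    clitics is "ми се" for the dative impersonal construction,
    "се" for a plain reflexive, and "" otherwise.
    """
    for suffix in CLITIC_SUFFIXES:
        if lemma.endswith(suffix):
            return lemma[: -len(suffix)], suffix.strip()
    return lemma, ""
```

With the longer suffix listed first, a lemma such as "спи ми се" is classified as dative impersonal rather than as a reflexive of "спи ми", while "гордея се" is still detected as a plain reflexive and "къпя" is returned unchanged.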