User talk:Chernorizets/bg-top-5000-bnc

Download
Hi @Chernorizets, do you have this list in CSV or another downloadable format? SimonWikt (talk) 13:44, 22 August 2023 (UTC)


 * @SimonWikt you can download the original frequency lists from here: https://dcl.bas.bg/bulnc/en/dostap/retchnitsi/. All I've done is some filtering to exclude words with non-Cyrillic characters and uppercase letters. I used the GENERAL dictionary, which combines all the others. Chernorizets (talk) 20:41, 22 August 2023 (UTC)
 * Thanks @Chernorizets
 * I dumped your page into a spreadsheet and went from there.
 * On closer inspection, it is not quite what I was hoping for. Some words that I consider everyday aren't even on the list!
 * This seems to be typical of all the 'Top lists' that I can find: they all seem to be based on sources such as Wikipedia, film subtitles, the news, and books, not on normal everyday life.
 * Thanks anyway 😀
 * SimonWikt (talk) 05:53, 23 August 2023 (UTC)
 * @SimonWikt there might be other corpora of Bulgarian text that include a more representative sample - this particular corpus is what's available from BAS at the moment. The  wordlist is probably denser in everyday vocabulary, based on its description - you can take a look at that one if you'd like. Chernorizets (talk) 06:51, 23 August 2023 (UTC)
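
For anyone who wants a CSV directly, the filtering described above (dropping words with non-Cyrillic characters or uppercase letters) could be sketched roughly as follows. This is only an illustration, not the script actually used for the page: the tab-separated `word<TAB>count` input layout, the function names `filter_entries` and `to_csv`, and the exact set of accepted letters are all assumptions, so check them against the real files from BAS before relying on it.

```python
import csv
import io
import re

# Assumption: "lowercase Cyrillic" means the letters а-я plus ѝ.
# Adjust the character class if the real lists need other letters.
LOWER_CYRILLIC = re.compile(r"[а-яѝ]+")


def filter_entries(lines):
    """Yield (word, count) pairs whose word is entirely lowercase Cyrillic.

    Assumes each line looks like "word<TAB>count" (hypothetical layout).
    Lines that do not fit that shape are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        word, count = parts
        if LOWER_CYRILLIC.fullmatch(word) and count.isdigit():
            yield word, int(count)


def to_csv(entries):
    """Return the entries as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(entries)
    return buf.getvalue()


# Small made-up sample; real input would be read from a downloaded file.
sample = ["и\t100\n", "София\t50\n", "ok\t30\n", "на\t90\n", "e-mail\t5\n"]
rows = list(filter_entries(sample))
csv_text = to_csv(rows)
```

Capitalised words such as proper nouns and anything containing Latin letters or punctuation are dropped, so only the rows for "и" and "на" survive from the sample.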