Wiktionary talk:Frequency lists/Japanese/Wikipedia2013

有 and 昼 each appear twice in the list. Infofarmer (talk) 14:17, 31 January 2014 (UTC)

Is this a mistake? Should it be corrected? The page mentions it, but doesn't say whether it's an error. 73.181.13.156 04:48, 22 July 2020 (UTC)

This is not a list of words by frequency. It is a list of word spellings by frequency. 並びに is the kanji spelling of ならびに. The two appear separately in the list, at ranks 4476 and 5563 respectively, indicating that the kanji spelling is predominant for this word. In an actual word frequency list, the word would occupy a higher rank, because the counts of its spellings would be combined into a single count. Another example pair is など (rank 20) and its kanji spelling 等 (rank 2262). 70.79.163.252 16:33, 14 August 2020 (UTC)
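
To illustrate the difference between spelling counts and word counts, here is a minimal Python sketch of how spellings could be merged into one count per word. All numbers are made up for illustration and are not taken from the Wikipedia2013 list, which is discussed here only in terms of ranks. The names `spelling_counts`, `spelling_to_word` and `merge_spellings` are hypothetical, and the mapping assumes that every 等 stands for など, which is a simplification, since 等 has other readings.

```python
from collections import Counter

# Hypothetical per-spelling counts; NOT real data from the list.
spelling_counts = {
    "など": 5000,
    "等": 40,
    "並びに": 20,
    "ならびに": 15,
}

# Hypothetical mapping from each spelling to a canonical word.
# Assumes every 等 is a spelling of など, which is a simplification.
spelling_to_word = {
    "など": "など",
    "等": "など",
    "並びに": "ならびに",
    "ならびに": "ならびに",
}


def merge_spellings(counts, mapping):
    """Sum the counts of all spellings that map to the same word."""
    merged = Counter()
    for spelling, n in counts.items():
        # Spellings missing from the mapping are kept as their own word.
        merged[mapping.get(spelling, spelling)] += n
    return merged


word_counts = merge_spellings(spelling_counts, spelling_to_word)

# Words ordered from most to least frequent after merging.
ranked = word_counts.most_common()
```

With these invented numbers, ならびに ends up with the combined count of both of its spellings, so in a true word list it could only rank at or above the position of its most frequent spelling.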

Another problem, which also indicates that this is a list of spellings and not of words, is ambiguity. The list includes kanji and kanji compounds that have multiple readings, without indicating which word is meant. If we had the original texts from which the entries were harvested, it would in many cases be clear which word is meant, either from the context or possibly from furigana. For instance, does the word 機 ("hata", loom) really occur so often that it reaches position 63? No; most of the occurrences are probably the "ki" reading, where it is not a word but a suffix denoting some kind of machine or instrument. The rest are other uses, possibly including 機 as a stand-alone word in contexts where it is clear that the topic is not looms. The algorithm simply did not recognize suffix uses that do not appear as entries in its word dictionary. 70.79.163.252 16:33, 14 August 2020 (UTC)