Wiktionary:Frequency lists/Japanese2015 10000

This list was drawn from the complete dump of the Japanese Wikipedia on April 22, 2015, which was run through the morphological analyzer MeCab. It lists lemmas, not inflected forms, and homographs are counted as a single lemma. The lemma shown may not be the most common form of the word. Morphological suffixes such as -ta (the past suffix) are counted as independent lemmas.
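As a rough illustration of lemma counting as described above, the sketch below tallies base forms from analyzer output. It assumes MeCab's default output with the IPADIC feature layout, where the base form is the seventh comma-separated feature; the page does not say which dictionary was used, and `SAMPLE` and `count_lemmas` are hypothetical names, not the script that produced this list.

```python
from collections import Counter

# Hypothetical analyzer output for "食べた", assuming the IPADIC layout:
# surface<TAB>POS,sub1,sub2,sub3,conj. type,conj. form,base form,reading,pron.
SAMPLE = """\
食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS
"""

def count_lemmas(mecab_output):
    """Count base forms (lemmas) rather than surface forms."""
    counts = Counter()
    for line in mecab_output.splitlines():
        if not line or line == "EOS":  # skip blanks and sentence markers
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        # Fall back to the surface form when no base form is given ("*").
        has_base = len(fields) > 6 and fields[6] != "*"
        lemma = fields[6] if has_base else surface
        counts[lemma] += 1
    return counts

lemma_counts = count_lemmas(SAMPLE)
```

Here the inflected form 食べ is counted under its lemma 食べる, while the past suffix た receives its own entry, matching the treatment of suffixes described above.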

Wikipedia markup was removed with wp2txt, and the count ignores punctuation and words written in Latin characters (rōmaji). In total, 669,419,716 occurrences of 2,610,776 lemmas were counted.
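One way such a filter could look is sketched below. The exact rule used for this list is not documented here, so the criteria are assumptions: a token is skipped if it consists only of punctuation or symbol characters, or if it contains any Latin letter (including full-width forms). The function names are hypothetical.

```python
import unicodedata

def is_punctuation(token):
    """True if every character is Unicode punctuation (P*) or a symbol (S*)."""
    return all(unicodedata.category(ch)[0] in ("P", "S") for ch in token)

def is_latin(token):
    """True if any character is a Latin letter, judged by its Unicode name."""
    return any("LATIN" in unicodedata.name(ch, "") for ch in token)

def is_counted(token):
    """Assumed rule: keep non-empty tokens that are neither punctuation nor Latin."""
    return bool(token) and not is_punctuation(token) and not is_latin(token)
```

Under these assumptions, `is_counted("食べる")` is true, while `is_counted("。")` and `is_counted("rōmaji")` are both false.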

See also the list of lemmas ranked 10,001–20,000. The full lemma count is available in TSV (tab-separated) format here. A count of inflected word forms is also available.
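A minimal sketch of how ranked ranges such as 1–10,000 and 10,001–20,000 might be extracted from the TSV file follows. It assumes two columns per row, lemma then count; the actual column layout of the file is not stated here. The counts in `SAMPLE_TSV` are made up for illustration, and `ranked_slice` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in an assumed "lemma<TAB>count" layout, deliberately unsorted.
SAMPLE_TSV = "は\t200\nの\t300\nを\t150\nに\t250\n"

def ranked_slice(tsv_text, first, last):
    """Return (lemma, count) pairs ranked first..last (1-based, inclusive)."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    counts = [(lemma, int(count)) for lemma, count in rows]
    # Highest count first, so rank 1 is the most frequent lemma.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[first - 1:last]

top_two = ranked_slice(SAMPLE_TSV, 1, 2)
next_two = ranked_slice(SAMPLE_TSV, 3, 4)
```

With the real file, `ranked_slice(text, 1, 10000)` and `ranked_slice(text, 10001, 20000)` would correspond to the two ranges mentioned above, provided the column assumption holds.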