Appendix talk:Vocabulary lists

Introduction
This set of Wiktionary appendices serves as a supplement to the various comparative lexical databases currently available online:


 * IDS
 * ASJP
 * RefLex
 * TransNewGuinea.org
 * Chirila
 * STEDT
 * MKED
 * ACD
 * IELex
 * Starling

Abacaxeiro (talk) 12:41, 8 October 2020 (UTC)

Word list digitization methods
Wiktionary's vocabulary lists were digitized using different methods. The first lists were published on Wiktionary in the mid-2010s, and most of the lists were published from about 2019 to 2021. For documentation purposes, and to check which lists need further proofreading, I just did a survey of how each list was digitized. Note that in some cases, a mixture of different methods was used. Thanks to everyone who let me know.

Explanatory key:
 * Already digitized online: Copied and adapted from online databases and similar existing digital sources.
 * PDF extraction: "Born-digital" PDFs that do not need OCR. Not all PDFs are in Unicode, so oftentimes character conversion was required.
 * OCR: All OCR was done using ABBYY FineReader versions 11, 14, and 15, and all OCR texts were very carefully proofread. The program's multi-view panels allowed for easy proofreading.
 * Manual typing: Some of the manually typed lists may need to be re-checked for accuracy.

Abacaxeiro (talk) 07:03, 2 July 2021 (UTC)