User:Matthias Buchmeier/trans-en-es.awk

Dictionaries from translations sections
Below is a gawk script that creates wikified bilingual dictionaries from the translations sections of the database dump.

Usage:

 * 1) Download the database dump (enwiktionary-DATE-pages-articles.xml.bz2) from here.
 * 2) Copy the code below to trans-en-es.awk.

On Linux

 * 1) Enter the following command, where language is the language name (e.g. "Spanish") and iso-code is the corresponding ISO language code (e.g. "es" for Spanish), both written as they appear in the wiki-code of the translations sections:
 * * bzcat enwiktionary-DATE-pages-articles.xml.bz2 | gawk -v LANG=language -v ISO=iso-code -v REMOVE_WIKILINKS="y" -f trans-en-es.awk | sort -s -d -k 1,1 -t"{" > OUTPUT-FILE
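To illustrate how the -v options in the command above reach the script, here is a minimal sketch. It is not the logic of trans-en-es.awk: the function name filter_translations and the two sample lines are hypothetical, and the sample lines only assume the usual shape of a translation entry in the wiki-code ("* Spanish: {{t+|es|...}}"). Plain awk is used so the sketch also runs where gawk is not installed.

```shell
# Hypothetical helper: keep only the translation lines for one language.
# $1 = language name (e.g. Spanish), $2 = ISO code (e.g. es).
filter_translations() {
  # -v assigns awk variables before the program starts, which is how
  # LANG and ISO become visible inside the script.
  awk -v LANG="$1" -v ISO="$2" '
    # Keep lines that start with "* <language>:" and contain "|<iso>|".
    index($0, "* " LANG ":") == 1 && index($0, "|" ISO "|") > 0 { print }
  '
}

# Sample input (made up for illustration):
printf '%s\n' '* Spanish: {{t+|es|perro|m}}' '* French: {{t+|fr|chien|m}}' |
  filter_translations Spanish es
# prints: * Spanish: {{t+|es|perro|m}}
```

Both values have to match the wiki-code exactly, which is why the language name and the ISO code are passed separately.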

On MS-Windows

 * 1) Decompress enwiktionary-DATE-pages-articles.xml.bz2.
 * 2) Run the following command from the command prompt (DOS window), where language is the language name (e.g. "Spanish") and iso-code is the corresponding ISO language code (e.g. "es" for Spanish), both written as they appear in the wiki-code of the translations sections:
 * * gawk -v LANG=language -v ISO=iso-code -v REMOVE_WIKILINKS="y" -f trans-en-es.awk enwiktionary-DATE-pages-articles.xml > OUTPUT-FILE
 * 3) Optionally sort the OUTPUT-FILE with whatever program is at hand.

Dictionaries from non-English language sections
This is a gawk script that creates wikified bilingual dictionaries from the foreign language (FL) sections of the database dump.

Usage:
bzcat enwiktionary-DATE-pages-articles.xml.bz2|gawk -v LANG=foreign-language -v REMOVE_WIKILINKS="y" -f trans-FL-en.awk|sort -s -d -k 1,1 -t"{">OUTPUT-FILE
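The sort stage at the end of the pipeline can be tried on its own. The sketch below assumes output lines in which the headword is followed by "{", as the -t"{" separator suggests; the three sample lines and the function name sort_entries are made up for illustration and may not match the script's real output format. LC_ALL=C is added here only to make the result independent of the locale.

```shell
# Hypothetical wrapper around the sort options used in the pipeline.
sort_entries() {
  # -t"{"   : split each line into fields at "{"
  # -k 1,1  : compare only the first field (the headword)
  # -d      : dictionary order (only blanks and alphanumerics count)
  # -s      : stable sort, so lines with equal headwords keep input order
  LC_ALL=C sort -s -d -k 1,1 -t"{"
}

printf '%s\n' 'dog {v} :: seguir' 'cat {n} :: gato' 'dog {n} :: perro' |
  sort_entries
# cat {n} :: gato
# dog {v} :: seguir
# dog {n} :: perro
```

Because of -s, the two "dog" lines stay in their original relative order instead of being reordered by the text after the headword.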


 * Currently supported foreign languages: Italian, Spanish, French, Finnish, Latin, German, Dutch