WritEMe - Lexicon Database

The WritEMe Lexicon database is an experiment in data mining for cuneiform sources, with special reference to Sumerian texts from the third millennium BCE. It originated from the need to automatically identify the texts most relevant to the scope of the project, i.e., those most closely related to writing and accounting. A very large digitized corpus of cuneiform texts is readily available via the Cuneiform Digital Library Initiative (CDLI), but its size makes it difficult to navigate. Manually searching for all Sumerian terms for writing and accounting is a tedious and time-consuming task. A tailored data mining approach can save considerable time and thereby speed up research. This is accomplished by a script that parses well-formed transliterations according to ATF standards. The script in turn relies on a dictionary of selected Sumerian words and spellings, which can be obtained through the electronic Pennsylvania Sumerian Dictionary (ePSD2).
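The approach described above can be illustrated with a minimal Python sketch. It is not the project's actual script: the function names, the three-entry lexicon, and the sample text IDs are assumptions made for illustration only. It assumes the usual ATF layout, in which a line beginning with `&` opens a new text, lines beginning with `#`, `@` or `$` carry metadata, structure or state remarks, and numbered lines carry the transliteration.

```python
import re

# Hypothetical mini-lexicon (spelling -> gloss); the real dictionary
# would be derived from ePSD2 and be far larger.
LEXICON = {
    "dub": "tablet",
    "dub-sar": "scribe",
    "nig2-ka9": "account",
}

# A transliteration line: a line label ending in ".", then the text.
TEXT_LINE = re.compile(r"^\S+\.\s+(.*)$")


def tokenize(line):
    """Split a transliteration line into words, dropping damage and
    uncertainty marks such as [ ] # ? ! < >."""
    cleaned = re.sub(r"[\[\]#?!<>]", "", line)
    return cleaned.split()


def find_terms(atf, lexicon=LEXICON):
    """Return {text_id: [matched spellings]} for a string of ATF."""
    hits = {}
    current = None
    for raw in atf.splitlines():
        line = raw.strip()
        if line.startswith("&"):
            # Header such as "&P000001 = Example 1" opens a new text.
            current = line[1:].split("=")[0].strip()
            hits[current] = []
            continue
        # Skip blank lines and metadata/structure/state lines.
        if current is None or not line or line[0] in "#@$":
            continue
        match = TEXT_LINE.match(line)
        if match:
            hits[current].extend(
                tok for tok in tokenize(match.group(1)) if tok in lexicon
            )
    return hits


# Invented sample input; the IDs do not refer to real CDLI records.
SAMPLE = """&P000001 = Example 1
#atf: lang sux
@tablet
@obverse
1. 1(disz) dub-sar
2. [nig2]-ka9# ak
$ rest broken
&P000002 = Example 2
#atf: lang sux
@tablet
@obverse
1. 5(disz) udu
"""

if __name__ == "__main__":
    print(find_terms(SAMPLE))
```

Texts with at least one match are candidates for closer study, while texts with an empty list can be set aside. Matching whole words against the lexicon keeps the sketch simple; a real script would also need to handle spelling variants and inconsistent transliterations.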

For further information on how to use this database, please refer to the README section.

All input data come from the Cuneiform Digital Library Initiative (CDLI). No effort has been made to fix minor inconsistencies in the transliterations, which may result in glitches during the tokenization step.

The database rests on dedicated scripts for the processing of CDLI transliterations. The scripts analyse input data, create the database structure and populate it. The source code is available on GitHub.
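As a rough illustration of the "create the structure and populate it" step, the sketch below stores matched spellings per text in a database. The choice of SQLite, the table and column names, and the sample IDs are all assumptions; the source does not state which database engine or schema the project uses.

```python
import sqlite3


def build_database(hits, path=":memory:"):
    """Create a hypothetical two-table schema and fill it from
    a {text_id: [spellings]} mapping."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS texts (
            id TEXT PRIMARY KEY
        );
        CREATE TABLE IF NOT EXISTS attestations (
            text_id  TEXT NOT NULL REFERENCES texts(id),
            spelling TEXT NOT NULL
        );
        """
    )
    for text_id, spellings in hits.items():
        conn.execute("INSERT OR IGNORE INTO texts (id) VALUES (?)", (text_id,))
        conn.executemany(
            "INSERT INTO attestations (text_id, spelling) VALUES (?, ?)",
            [(text_id, s) for s in spellings],
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Invented input; the IDs do not refer to real CDLI records.
    conn = build_database({"P000001": ["dub-sar", "nig2-ka9"], "P000002": []})
    # Rank texts by the number of attested lexicon terms.
    rows = conn.execute(
        "SELECT text_id, COUNT(*) AS n FROM attestations "
        "GROUP BY text_id ORDER BY n DESC"
    ).fetchall()
    print(rows)
```

Keeping attestations in their own table makes it straightforward to rank texts by how many relevant terms they contain, which is the kind of query the project's stated goal suggests.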