From the website:
- Annotated with our latest NLP tools (part-of-speech tagger 1.9, tokenizer 4.1.0); the language tagger and lemmatizer include lexical entries from the Database and Dictionary of Greek Loanwords in Coptic (DDGLC)
- Now contains the morph layer (annotating compound words and Coptic morphs such as ⲣⲉϥ-, ⲙⲛⲧ-, ⲁⲧ-)
- Visualizations for linguistic analysis
Search and queries
For searches and queries in our ANNIS database, we recommend searching this corpus's normalized words with regular expressions. This captures instances of the desired word that are still embedded in a Coptic bound group, i.e., instances our tokenizer may have missed.
Most of the hits are accurate (e.g., ⲥⲟⲧⲙ, ⲁⲧⲥⲱⲧⲙ, ⲣⲁⲧⲥⲱⲧⲙ, ⲣⲉϥⲥⲱⲧⲙ), but some Coptic bound groups were not tokenized properly (e.g., ⲉⲡⲥⲱⲧⲙ, ⲙⲁⲣⲟⲩⲥⲱⲧⲙ). We expect accuracy to improve as we add more texts to our corpora that have been machine-annotated and then manually corrected.
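To illustrate why a wildcard regular expression catches words that a plain search misses, here is a minimal Python sketch that filters a list of normalized words. It assumes, as ANNIS does for regex searches, that the pattern must match the whole annotation value, which is mimicked with `re.fullmatch`. The helper `find_hits`, the sample word list, and the pattern `.*ⲥ[ⲱⲟ]ⲧⲙ.*` are hypothetical and chosen for illustration only; this is not the query behind the hits listed above.

```python
import re
from typing import Iterable, List


def find_hits(norm_values: Iterable[str], pattern: str) -> List[str]:
    """Return the normalized words that the pattern matches in full.

    re.fullmatch anchors the regex to the whole value (assumed here to
    mirror ANNIS behavior), so a bare word matches only itself, and a
    leading/trailing ".*" is needed to find the word inside a longer,
    untokenized bound group.
    """
    regex = re.compile(pattern)
    return [w for w in norm_values if regex.fullmatch(w)]


# Hypothetical sample: the bare verb, the hits discussed above, and one
# unrelated word (ⲛⲟⲩⲧⲉ) that should never match.
norms = ["ⲥⲱⲧⲙ", "ⲥⲟⲧⲙ", "ⲁⲧⲥⲱⲧⲙ", "ⲣⲁⲧⲥⲱⲧⲙ", "ⲣⲉϥⲥⲱⲧⲙ",
         "ⲉⲡⲥⲱⲧⲙ", "ⲙⲁⲣⲟⲩⲥⲱⲧⲙ", "ⲛⲟⲩⲧⲉ"]

# Without wildcards, only the exact bare form is found.
exact = find_hits(norms, "ⲥⲱⲧⲙ")
# With wildcards (and both vowel spellings), embedded forms are found too.
embedded = find_hits(norms, ".*ⲥ[ⲱⲟ]ⲧⲙ.*")

print(exact)
print(embedded)
```

The trade-off is precision: the wildcard pattern also returns untokenized bound groups such as ⲉⲡⲥⲱⲧⲙ, so the hits still need to be checked by hand.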
Reading by individual chapter
You can also read these documents and see the linguistic analysis visualizations at data.copticscriptorium.org/urn:cts:copticLit:nt. The first documents you will see (Gospel of Mark, 1 Corinthians) are manually annotated. Scroll down to "New Testament," which is the full, machine-annotated Sahidica New Testament. Click "Chapter" to read each chapter as normalized Coptic, with the English translation shown as a pop-up when you hover your cursor over the text. Click "Analytic" to see the normalized Coptic, part-of-speech analysis, and English translation for each chapter. Please keep in mind that the English translation is the World English Bible, a free, open-access New Testament translation; it was not translated directly from the Coptic.
Note: we know our server is slow to generate the documents for this corpus. They may take several minutes to load, so please be patient. For faster access, use ANNIS, where visualizations for reading the chapters are available by clicking on the corpus and then on the visualizations icon.
We hope this corpus is useful to researchers.