Open Data Armenia

[EN] Awesome list: a toolkit for text analysis of the Armenian language

- The Eastern Armenian National Corpus Electronic Library provides a full view of works by classical authors (these books are in the public domain because their authors died more than 70 years ago). The corpus contains 4,547,379 words from 104 books by 12 authors.

- Named entity recognition. pioNER: training data for Armenian NER built from Wikipedia. The corpus provides a gold-standard dataset alongside automatically generated annotated datasets for Armenian. Along with the datasets, 50-, 100-, 200-, and 300-dimensional GloVe word embeddings trained on a collection of Armenian texts from Wikipedia, news, blogs, and encyclopedias have been released.

- The Polyglot library for Python supports language detection, named entity extraction (using Wikipedia data), morphological analysis, transliteration, and sentiment analysis for Armenian.

- Kevin Bougé Stopword Lists Page includes the Armenian language.

- Ranks NL Stopword Lists Page includes the Armenian language.
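
To give a feel for how a stopword list like the ones above is typically used, here is a minimal sketch in plain Python. It is only an illustration: the eight stopwords are a tiny hand-picked subset chosen for this example (not the contents of either list), and the names `ARMENIAN_STOPWORDS`, `tokenize`, and `remove_stopwords` are hypothetical helpers, not part of any library mentioned here.

```python
import re

# Illustrative subset only (assumption): real stopword lists are much longer.
ARMENIAN_STOPWORDS = {"և", "ու", "է", "են", "որ", "այս", "մեջ", "համար"}

# Armenian letters in Unicode: U+0531-U+0556 (uppercase) and
# U+0561-U+0587 (lowercase, including the ligature "և").
TOKEN_RE = re.compile(r"[\u0531-\u0556\u0561-\u0587]+")


def tokenize(text):
    """Lowercase the text and return runs of Armenian letters as tokens."""
    return TOKEN_RE.findall(text.lower())


def remove_stopwords(text, stopwords=ARMENIAN_STOPWORDS):
    """Return the tokens of `text` that are not in the stopword set."""
    return [token for token in tokenize(text) if token not in stopwords]


if __name__ == "__main__":
    # "Armenia and Georgia are countries"
    print(remove_stopwords("Հայաստանը և Վրաստանը երկրներ են"))
```

In practice you would load the full list from one of the pages above into the set instead of hard-coding it; the regular-expression tokenizer is a simplification and does not handle punctuation-specific rules or morphology.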

If you know of other useful tools and guides, please share them with us!

Image author: Aparna Melaput

#opendata #armenia #language #tools #digitalhumanities

1.8K views · Kseniia Orlova, 10:19


Natural Language Processing can not only enhance our communication and language knowledge but also strengthen historical studies.

In their study "Geolocation and Named Entity Recognition in Ancient Texts: A Case Study about Ghewond’s Armenian History", Marcella Tambuscio and Tara Lee Andrews apply Named Entity Recognition (NER) to Ghewond’s Armenian History. This makes it easier to draw the ‘big picture’ of Armenian history in that period and to match historical toponyms with their contemporary counterparts. The outcomes and the reproducible, validated results of applying the model are published on GitHub. We have also added them to our data catalog.

We believe that such studies are going to become more common, making ancient texts more available to a wider public and to the professional community. Tell us if you are aware of similar efforts in the field!

#opendata #armenia #history #language

249 views · Valeria Babayan, edited 08:17
