Open Data Armenia
Open Data Armenia news channel. English/Armenian/Russian

Join chat at https://t.iss.one/opendataamchat
[EN] Awesome list: a toolkit for text analysis of the Armenian language

- Eastern Armenian National Corpus Electronic Library provides a full view of works by classical authors (these books are in the public domain because their authors died more than 70 years ago). The corpus contains 4,547,379 words from 104 books by 12 authors.

- Named entity recognition. pioNer: training data for Armenian NER built from Wikipedia. The project provides a gold-standard corpus and automatically generated annotated datasets, used with GloVe-based models for Armenian. Along with the datasets, 50-, 100-, 200-, and 300-dimensional GloVe word embeddings trained on a collection of Armenian texts from Wikipedia, news, blogs, and encyclopedias have been released.

- The Polyglot library for Python supports language detection, named entity extraction (using Wikipedia data), morphological analysis, transliteration, and sentiment analysis for Armenian.

- Kevin Bougé Stopword Lists Page includes the Armenian language.

- Ranks NL Stopword Lists Page includes the Armenian language.
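
To show how a stopword list like the ones above is typically used, here is a minimal Python sketch that tokenizes Armenian text and drops stopwords. It is an illustration under stated assumptions, not the method of any tool listed here: `SAMPLE_STOPWORDS` is a tiny hypothetical set chosen for the example (it is not copied from the Kevin Bougé or Ranks NL lists), and the function names are our own.

```python
import re

# Armenian letters in Unicode: uppercase U+0531-U+0556, lowercase U+0561-U+0587.
ARMENIAN_WORD = re.compile(r"[\u0531-\u0556\u0561-\u0587]+")

# Hypothetical mini stopword set, for illustration only.
# In practice, load a full list from one of the stopword pages instead.
SAMPLE_STOPWORDS = {"և", "է", "որ", "ու", "այս", "մեջ"}


def tokenize(text):
    """Return lowercase Armenian word tokens, skipping punctuation and other scripts."""
    return [word.lower() for word in ARMENIAN_WORD.findall(text)]


def remove_stopwords(tokens, stopwords=SAMPLE_STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]


# "Yerevan is the capital of Armenia and the largest city."
text = "Երևանը Հայաստանի մայրաքաղաքն է և ամենամեծ քաղաքը։"
tokens = tokenize(text)
content_words = remove_stopwords(tokens)
print(content_words)
```

To work with a full list, replace `SAMPLE_STOPWORDS` with a set read from a downloaded file. Such lists are often distributed as plain text with one word per line, but check the format of the file you download.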

If you know of other useful tools and guides, please share that knowledge with us!

Image author Aparna Melaput

#opendata #armenia #language #tools #digitalhumanities
Natural Language Processing can not only enhance our communication and language knowledge, but also strengthen historical studies.

Marcella Tambuscio and Tara Lee Andrews, in their study "Geolocation and Named Entity Recognition in Ancient Texts: A Case Study about Ghewond’s Armenian History", apply Named Entity Recognition (NER) to Ghewond’s Armenian History. This helps to draw the ‘big picture’ of Armenian history in that period and to match historical toponyms with their contemporary counterparts. The outcomes and the reproducible, validated results of applying the model are published on GitHub. We have also added them to our data catalog.

We believe that such studies will become more common, making ancient texts more accessible to the wider public and to the professional community. Tell us if you are aware of similar efforts in the field!

#opendata #armenia #history #language