Project

A special focus is on less-resourced European languages, including Estonian, Slovenian, Croatian, Finnish, Latvian, ...

Be the first to try out our free AI solutions for less-represented languages: inquire about our tools and let us help you find the right solution!

SOLUTIONS

EMA enables journalists, editors and researchers to automate text analysis processes, including:

COMMENT MODERATION

Analyse and filter user comments (e.g., by detecting hate speech)

KEYWORD EXTRACTION

Extract important keywords and identify persons and organisations from articles to generate tags

ARTICLE GENERATION

Generate text from raw data to produce articles for specific topics

Watch a video from the Conference on AI Technology for the Media Industry, where the EMBEDDIA tools were presented:

TECHNOLOGY

Our solutions are based on state-of-the-art natural language processing using neural embeddings (deep learning) and transfer learning. This allows us to train language models and solutions with less data, which makes them practical for less-resourced languages such as Estonian, Slovenian, Croatian, Finnish and Latvian.

HOW TO USE OUR TOOLS (ACCESS POINTS)?

Selected tools are available as ready-to-use pre-trained models, accessible through APIs or as Docker images.
For a larger set of tools we provide the source code, so that developers and scientists can train the models on their own data.
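
As a rough illustration of the API route, the sketch below builds (but does not send) a JSON request to a tool running locally in a Docker container. The base URL, the `/extract_keywords/` route and the `text` field are hypothetical placeholders, not the documented interface of any EMBEDDIA tool; check the repository of the tool you deploy for its real endpoint and payload.

```python
import json
import urllib.request

# Hypothetical values: replace them with the address and route documented
# by the tool you actually deploy.
BASE_URL = "http://localhost:5000"
ROUTE = "/extract_keywords/"


def build_request(text, base_url=BASE_URL, route=ROUTE):
    """Build a POST request carrying the article text as JSON (assumed payload shape)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + route,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Tallinn city council approved the new budget on Monday.")
print(request.get_method(), request.full_url)

# To send it once a container is running:
#     with urllib.request.urlopen(request, timeout=10) as response:
#         print(json.load(response))
```

Building the request separately from sending it makes the payload easy to inspect before any service is available.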

Web application with GUI (Texta Toolkit)

EMBEDDIA tools' APIs (GitHub)

Docker (GitLab)

DEMO

Want to try some of our tools quickly and easily online? Click below!

Start journey!

TOOLS EXPLORER

| Name | Functionality | Description | Trainable for other languages | Languages available off-the-shelf | Licence |
|---|---|---|---|---|---|
| TNT-KID | Keyword extraction | A system for automatic keyword extraction; must be trained on a corpus of articles with human-assigned keywords. Pre-trained models are also available for a range of languages; see the other TNT-KID entries in this table. | yes | en, et, hr, lv | MIT |
| TNT-KID (EN) | Keyword extraction | A system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. Pre-trained version with API, for English. | no | en | MIT |
| TNT-KID (HR) | Keyword extraction | A system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. Pre-trained version with API, for Croatian; annotators were 24sata editors. | no | hr | MIT |
| TNT-KID (LV) | Keyword extraction | A system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. Pre-trained version with API, for Latvian; annotators were Latvian Delfi staff. | no | lv | MIT |
| TNT-KID (ET) | Keyword extraction | A system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. Pre-trained version with API, for Estonian; annotators were Ekspress Meedia staff. | no | et | MIT |
| TNT-KID (SI) | Keyword extraction | A system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. Pre-trained version with API, for Slovenian; annotators were Slovenian journalists. | no | sl | MIT |
| BERT Multilingual | Keyword extraction | A system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. Pre-trained version with API, for all languages, fine-tuned on articles manually annotated by the staff of 24sata, Latvian Delfi and Ekspress Meedia, and by Slovenian journalists. | yes | hr, lv, sl, et, ru, en | MIT |
| TEXTA Tagger | General text classification | A general toolkit for producing supervised text classifiers based on user-defined search results. | yes | all | GPLv3 |
| TEXTA Bert Tagger | General text classification | A supervised method for classifying texts using BERT models. | yes | all | GPLv3 |
| TEXTA MLP | Multilingual preprocessor | A multilingual processing tool incorporating several approaches for extraction of named entities (people, organizations, locations); works for several languages and different types of text. | yes | en, et, hr, lt, fi | GPLv3 |
| NER API | Named entity recognition | Provides an API giving easy access to the NER BERT BiLSTM models (see the NER BERT BiLSTM entry). | yes | hr, sl, fi, ru, sv, lv, lt, et | MIT |
| NER BERT BiLSTM | Named entity recognition | A Named Entity Recognition system based on a standard BiLSTM+CNN+CRF architecture that uses additional types of embeddings to improve performance. | yes | hr, sl, fi, ru, sv, lv, lt, et | MIT |
| COMMENTSUM | Comment summarization | Provides an API giving easy access to COMMENTSUM: a multilingual summarization tool that returns the most informative sentences from a set of user comments. | yes | en, de, sl, hr | MIT |
| COMMENTSUM | Comment summarization | A multilingual summarization tool that returns the most informative sentences from a set of user comments. | yes | en, de, sl, hr | MIT |
| Scalable and interpretable semantic shift detection | Semantic shift detection | Detects changes in word meaning over time, across genres or across different media sources, and visualizes the differences, using representations derived from a BERT model. | yes | en, la, de, sv | MIT |
| News sentiment | Sentiment analysis | Labels a news article as positive, negative, or neutral based on its sentiment, using a fine-tuned BERT model. | yes | sl, hr | MIT |
| Language variety classification | Language variety classification | Detects which language variety or dialect a text is written in, using neural networks combined with linguistic features. | yes | hr, sr, bs, de, ar, pt, es, en | MIT |
| Dutch author profiling | Dutch cross-genre gender classification | Detects the likely gender (male vs. female) of the author of a short text, using a range of linguistic features. | yes | nl | MIT |
| English and Spanish author profiling | Gender and bot vs human classification | Identifies whether a short text was written by a bot or a human, and (for humans) their gender (male vs. female), using a range of linguistic features. Pre-trained version with API, in English and Spanish. | yes | en, es | MIT |
| English and Spanish author profiling | Gender and bot vs human classification | Identifies whether a short text was written by a bot or a human, and (for humans) their gender (male vs. female), using a range of linguistic features. | yes | en, es | MIT |
| Semantic shift detection by averaging contextual embeddings | Semantic shift detection | Detects changes in word meaning over time, and how translations change over time, in different languages, using representations derived from a BERT model. | yes | en, sl | MIT |
| Multi-lingual semantic shift detection | Semantic shift detection | Detects changes in word meaning over time, using representations derived from a BERT model. | yes | en, la, de, sv | MIT |
| NEL Filter | Named entity linking filter | A post-processing filter that improves the accuracy of a Named Entity Linking tool by using heuristics and data from Wikidata and DBpedia. | yes | fi, en, fr, de | MIT |
| NER FEDA | Named entity recognition | A Named Entity Recognition system created to train models using multiple datasets, regardless of whether they use the same tagsets or not. This can improve performance by sharing what the datasets have in common, while still tagging documents using a variety of tagsets. | yes | ru, sl, bg, pl, cs, uk | MIT |
| NER Multitask | Named entity recognition | A Named Entity Recognition system that explores multiple simple methods for improving the performance of trained models. | yes | hr, sl, fi, ru, sv, lv, lt, et | MIT |
| Multilingual Entity Linking | Named entity linking | Links named entities (names of organizations, places and people) to entries in Wikidata, which in turn can be linked to Wikipedia. | yes | hr, sl, fi, ru, sv, lv, lt, et, fr, de, en | Apache 2.0 |
| Stacked NER | Named entity recognition | A multilingual Named Entity Recognition system based on fine-tuned BERT models. | yes | hr, sl, fi, ru, sv, lv, lt, et, fr, de, en | MIT |
| Cross-lingual article linking | Article linking | Takes as input an article in one language and a list of candidate articles in another language, and outputs the candidate article that is most similar to the query. | yes | et, lv | MIT |
| Dynamic multilingual topic modelling | Topic evolution tracking | Takes as input thematically aligned documents in two or more languages divided into time slices, and outputs topics for every time slice and language that show the evolution of a topic for that language. | yes | en, de, fi, sv | MIT |
| Tweet BERT-iment | Sentiment analysis | Labels a short text (e.g. a tweet) as positive, negative, or neutral based on its sentiment, using a fine-tuned BERT model. | yes | hr, en, ru, sl | MIT |
| Fake-News spreaders detection on Twitter | Fake news detection | Detects users likely to be spreading fake news, based on a collection of their texts, using a range of linguistic features. | yes | en, es | MIT |
| Detection of COVID-19 related Fake-News | Fake news detection | Detects users likely to be spreading fake news, based on a collection of their texts, using stacked neural networks. | yes | en | MIT |
| RaKUn API | Keyword extraction | API giving easy access to RaKUn: extracts keywords from text by turning the text into a graph in which the most important nodes mostly turn out to be keywords. Does not need any training (it is unsupervised), so it can be used for any language. | no | all | GPL3 |
| RaKUn | Keyword extraction | Extracts keywords from text by turning the text into a graph in which the most important nodes mostly turn out to be keywords. Does not need any training (it is unsupervised), so it can be used for any language. | no | all | GPL3 |
| COVID-NLG | News article generation | Uses template-based natural language generation to describe salient statistics about the COVID-19 situation on a national level. The use of template-based processing gives a very strong guarantee that the output is factually correct, but limits the fluency of the produced text. | no | en | |
| Eurostat NLG | News article generation | Uses template-based natural language generation to produce reports about several Eurostat statistics, most prominently consumer price index data for European countries, given the country the report should focus on and the language the report should be in. Also supports some statistical datasets beyond Eurostat, but reports about those can only be generated in English and Finnish. The use of template-based methods ensures the output is highly accurate, but limits the fluency of the output. | no | en, fi, hr, ru, sl, et | |
| Comment analysis NLG | Comment reporting | Analyses a set of (online) news comments using various other EMBEDDIA text analysis tools, and then produces a brief natural language report designed to give a general understanding of what is being discussed, and how. | no | en | |
| Simple Sentiment Analysis | Sentiment analysis | Provides the CardiffNLP sentiment analysis model (NOT developed within EMBEDDIA) as an online microservice that can be used by the Comment Analysis NLG tool. | no | all | |
| Comment Moderation | Comment moderation | Detects whether newspaper comments should be blocked by moderators, using a fine-tuned BERT model. Can output the moderation policy rule which a blocked comment contravenes, given a suitable dataset/model. | yes | all | MIT |
| Comment Moderation API (EN) | Comment moderation | Detects whether newspaper comments should be blocked by moderators, using a fine-tuned BERT model. | no | all | MIT |
| Comment Moderation API (EN/HR/SL) | Comment moderation | Detects whether newspaper comments should be blocked by moderators, using a fine-tuned BERT model. | no | en, hr, sl | MIT |
| Comment Moderation API (EN/ET) | Comment moderation | Detects whether newspaper comments should be blocked by moderators, using a fine-tuned BERT model. | no | en, et | MIT |
| Comment Moderation API (EN/DE/HR/SL/ET) | Comment moderation | Detects whether newspaper comments should be blocked by moderators, using a fine-tuned BERT model. | no | all | MIT |
| Comment Topic Modelling API | Comment topic modelling | Analyses newspaper comments in terms of the topics they discuss, using a version of the Embedded Topic Model; topics are automatically detected and can be supplied as lists of keywords. | no | hr | MIT |
| TeMoCo/TeMoTopic | Visualisation | Visualizes changes in topic distribution and associated keywords in a document or collection of articles. | yes | all | MIT |
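
To make the graph-based idea behind unsupervised keyword extraction more concrete, here is a minimal sketch. It is not RaKUn's actual algorithm: the co-occurrence window, the tiny stopword list and the use of weighted degree as the importance score are all assumptions chosen to keep the example short.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real tools use fuller lists).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOPWORDS]


def build_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                graph[left][right] += 1
                graph[right][left] += 1
    return graph


def top_keywords(text, k=3, window=2):
    """Rank nodes by weighted degree, a simple stand-in for node importance."""
    graph = build_graph(tokenize(text), window)
    scores = {node: sum(neighbours.values()) for node, neighbours in graph.items()}
    # Sort by score (descending), then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


article = (
    "The city council approved the budget. The budget funds schools, "
    "and the council will review the budget next year."
)
print(top_keywords(article, k=2))
```

Because nothing here is trained, the same function runs on text in any language, which is the property the unsupervised tools in the table rely on; only the stopword list is language-specific in this sketch.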