nlpIrish

Irish NLP Dataset Descriptions

This is a collection of descriptions, sources and extraction instructions for Irish-language text datasets for natural language processing (NLP) research.

Would you like to add to or collaborate on this collection? Great! Head up to the About section to see how to contribute 👌

This site is hosted on GitHub and built using the fabulous fastpages.

Parallel Corpora*

In order of dataset size (but remember, the number of lines of text doesn't equal quality!):

*Sizes as of June 2020, word count defined as space-separated tokens
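
As a rough illustration of the word-count definition above, here is a minimal sketch in Python. The function name `count_words` and the sample sentences are hypothetical and not part of this collection's tooling. It also assumes that any run of whitespace (not only a single space) separates tokens, so punctuation stays attached to the word it follows.

```python
def count_words(lines):
    """Count whitespace-separated tokens across an iterable of text lines.

    Assumption: str.split() with no arguments is used, so tabs and
    repeated spaces are treated the same as a single space.
    """
    return sum(len(line.split()) for line in lines)


# Hypothetical sample lines, for illustration only.
sample = ["Dia duit, a chara!", "Conas atá tú?"]
print(count_words(sample))  # 4 tokens + 3 tokens = 7
```

Counts produced this way will differ from those of a linguistically aware tokenizer, which would normally split punctuation from words.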

Monolingual Irish Corpora

  • tbd

Task-specific Corpora

  • tbd

ALL DATASET DESCRIPTIONS 👇