Awesome Bible NLP

A curated list of resources dedicated to Biblical Natural Language Processing

Contribute your favorite Biblical NLP resource by raising a pull request! Please read the contribution guidelines before raising a pull request.

Machine Translation

Audio

  • Snow Mountain Dataset: Open-licensed and formatted dataset of audio recordings of the Bible in low-resource Indian languages.

Original Languages

  • Macula Hebrew | Greek: Open-licensed and curated dataset of the Bible in Hebrew and Greek with various connected meta resources (e.g., syntax trees, glosses, semantic roles).
  • Bible word alignments for multiple languages: Openly licensed word alignments for Bibles, including both automatic and manually corrected alignments.

Tokenizers

  • utoken: Universal tokenizer with a Python API and a command-line interface; also tested on Biblical text.

Romanizers

  • uroman: Universal romanizer that converts text in any Unicode script to Roman (Latin) script.

Toolkits

  • SIL Machine | Python version | JavaScript version: Toolkit for various NLP operations on Biblical content, with particular support for Paratext projects.
  • Wildebeest: Investigate, repair, and normalize text for a wide range of character-level issues. Tested especially on Biblical content.

Made with ❤️ by the PABNLP community