This page provides the Universal Declaration of Human Rights (UDHR) corpus. The original PDF files were retrieved from the Office of the High Commissioner for Human Rights (OHCHR), and the texts were extracted from them. If you use this corpus in your work, please cite the OHCHR and the following paper:
Tommi Vatanen, Jaakko J. Väyrynen and Sami Virpioja (2010) Language identification of short text segments with n-gram models. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pages 3423-3430. European Language Resources Association (ELRA). [BibTeX]
For questions, please contact jaakko.j.vayrynen@aalto.fi.
Please let us know.
We are not aware of any copyright restrictions on the UDHR material. The OHCHR asks that anyone who reproduces UDHR translations or materials credit its website as the source by providing a link.
The whole corpus can be downloaded from the links below. PDF files are copies of the originals from the OHCHR web site. Text files have been extracted in UTF-8 with the pdftotext command in Linux and manually checked.
Type | Languages | File | Size |
---|---|---|---|
pdf | 372 | udhr_pdf_20100325.tar.gz | 70 MB |
txt | 281 | udhr_txt_20100325.tar.gz | 1.1 MB |
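
As the text files are distributed as a gzipped tar archive in UTF-8, they can be read directly without unpacking to disk. The following is a minimal sketch in Python, using only the standard library. The function name `read_udhr_texts` is hypothetical, and the internal directory layout of the archive is not described on this page, so the sketch simply returns every regular file it finds, keyed by its path inside the archive.

```python
import tarfile


def read_udhr_texts(archive_path):
    """Return a dict mapping each archive member name to its decoded text.

    Assumption: every regular file in the archive is UTF-8 text, as stated
    for the txt archive above. The member names depend on the archive's
    internal layout, which is not documented here.
    """
    texts = {}
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories, links, and other non-file entries.
            if not member.isfile():
                continue
            with archive.extractfile(member) as handle:
                texts[member.name] = handle.read().decode("utf-8")
    return texts
```

Calling `read_udhr_texts("udhr_txt_20100325.tar.gz")` on the downloaded archive should give one entry per text file. If a file turns out not to be valid UTF-8, `decode` raises `UnicodeDecodeError`, which is probably the desired behaviour when checking the corpus.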