The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0

From National Research Council Canada

Alternative title: Le corpus parallèle inuktitut – anglais du Hansard du Nunavut 3.0
Download
  1. (NUNAVUT-HANSARD-INUKTITUT-ENGLISH-PARALLEL-CORPUS-3.0.1.TGZ, 201 MB)
DOI: https://doi.org/10.4224/40001819
Author: Eric Joanis (1); Rebecca Knowles (1); Roland Kuhn (1); Samuel Larkin (1); Patrick Littell (1); Chi-kiu Lo (1); Darlene Stewart (1); Jeffrey Micher (2)
Name affiliation
  1. National Research Council Canada. Digital Technologies
  2. US Army Research Laboratory
Format: Text
Type: Dataset
Subject: parallel corpus; machine translation; sentence alignment; indigenous languages; low-resource languages
Abstract
Publication date
Publisher: National Research Council Canada
Copyright statement
  • For the text of this corpus: © Legislative Assembly of Nunavut 1999-2020
  • For the scripts and documentation accompanying this corpus: © Her Majesty in Right of Canada 2020
Licence: CC-BY-4.0
Terms of use
  • We chose the license CC-BY-4.0 because it allows derivative works, such as improved sentence alignments or trained machine translation systems, while requiring that any derivative work convey our full Copyright, disclaimer and license statement, and a description of changes you make. Please include this LICENSE file and the accompanying README file in any derivative work of this work.
  • You are encouraged to work with this corpus to train machine translation systems or any other NLP technology.
  • You are encouraged to work with this corpus to improve on our alignments: that is why we provide not only the aligned corpus, but also the raw text in one-paragraph-per-line format, and the gold standard evaluation alignments.
  • One restriction applies: this corpus and any derivatives you make of it should not be used to misrepresent what a Member of the Legislative Assembly of Nunavut has said, or for any defamatory purpose.
  • Any work using this corpus should cite the following paper: Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart and Jeffrey Micher. The Nunavut Hansard Inuktitut-English Parallel Corpus 3.0 with Preliminary Machine Translation Results. Submitted to LREC 2020
Related publication
Note: The alignment of the corpus and the accompanying documentation and scripts were produced by the National Research Council of Canada.
DISCLAIMER: This corpus and any derivative work created from it do not constitute an official transcript or translation of the debates of the Legislative Assembly of Nunavut, and cannot be construed as evidence of what its members have said. To consult the official transcripts, please visit https://www.assembly.nu.ca/.
Collection: NRC Research Data
Record identifier: c7e34fa7-7629-43c2-bd6d-19b32bf64f60
Record created: 2020-01-20
Record modified: 2020-01-28