Development of a Guarani - Spanish Parallel Corpus

Type

Conference paper

Year

2020

Place of publication

Marseille, France

Publisher

European Language Resources Association

Pages

2629

Abstract

This paper presents the development of a Guarani-Spanish parallel corpus aligned at the sentence level. The Guarani sentences in the corpus are written in Jopara, the dialect of Guarani spoken in Paraguay, which is based on Guarani grammar but may include Spanish loanwords or neologisms. The corpus contains around 14,500 sentence pairs, extracted from web sources and aligned using a semi-automatic process, totalling 228,000 Guarani tokens and 336,000 Spanish tokens.

Authors

Gustavo Giménez Lugo

Pedro Amarilla

Adolfo Ríos

Luis Chiruzzo

Citekey

chiruzzo-EtAl:2020:LREC

Publication URL

https://www.aclweb.org/anthology/2020.lrec-1.320
