Development of a Guarani - Spanish Parallel Corpus

Type
Conference paper
Year
2020
Place published
Marseille, France
Publisher
European Language Resources Association
Pages
2629
Abstract

This paper presents the development of a Guarani - Spanish parallel corpus with sentence-level alignment. The Guarani sentences of the corpus use the Jopara Guarani dialect, the dialect of Guarani spoken in Paraguay, which is based on Guarani grammar and may include several Spanish loanwords or neologisms. The corpus has around 14,500 sentence pairs aligned using a semi-automatic process, containing 228,000 Guarani tokens and 336,000 Spanish tokens extracted from web sources.

Authors

Gustavo Giménez Lugo
Pedro Amarilla
Adolfo Ríos
Luis Chiruzzo
Citekey
chiruzzo-EtAl:2020:LREC
Keywords