Celtic Language Technology Workshop 2019

At MT Summit XVII, Dublin City University, Dublin. Monday 19th August 2019

Proceedings

The proceedings are now available on the ACL Anthology: https://www.aclweb.org/anthology/volumes/W19-69/.

Programme

(09.00-10.30)	Morning Session 1
09.00-09.10	Opening Remarks
	Micheál Ó Conaire, Department of Culture, Heritage and the Gaeltacht
09.10-10.05	Invited talk by Kelly Davis
	Free(ing) Speech Corpora and STT Models with Common Voice and Deep Speech [slides]
10.05-10.20	Speech technology and Argentinean Welsh [slides]
	Elise Bell
(10.30-11.00)	Break
(11.00-12.30)	Morning Session 2
11.00-11.20	Embedding Welsh to English MT in a private company [slides]
	Myfyr Prys and Dewi Bryn Jones
11.20-11.35	Leveraging backtranslation to improve machine translation for Gaelic languages [slides]
	Meghan Dowling, Teresa Lynn and Andy Way
11.35-11.55	Improving full-text search results on dúchas.ie using language technology [slides]
	Brian Ó Raghallaigh, Kevin Scannell and Meghan Dowling
11.55-12.15	Adapting Term Recognition to an Under-Resourced Language: the Case of Irish [slides]
	John P. McCrae and Adrian Doyle
12.15-12.30	Unsupervised multi-word term recognition in Welsh [slides]
	Irena Spasić, David Owen, Dawn Knight and Andreas Artemiou
(12.30-14.00)	Lunch
(14.00-15.00)	Afternoon Session 3
14.00-14.20	Development of a Universal Dependencies treebank for Welsh [slides]
	Johannes Heinecke and Francis M. Tyers
14.20-14.40	Universal dependencies for Scottish Gaelic: syntax [slides]
	Colin Batchelor
14.40-15.00	A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles [slides]
	Adrian Doyle, John P. McCrae and Clodagh Downey
(15.00-15.30)	Afternoon break
(15.30-17.45)	Afternoon Session 4
15.30-16.30	Invited talk by Claudia Soria
	BLaRKing at minority language speakers: the Digital Language Survival Kit as a speaker-centered approach to digital development of minority languages. [slides]
16.30-16.50	Code-switching in Irish tweets: A preliminary analysis [slides]
	Teresa Lynn and Kevin Scannell
16.50-17.10	A Green Approach for an Irish App (Refactor, reuse and keeping it real) [slides]
	Monica Ward, Maxim Mozgovoy and Marina Purgina
17.10-17.30	Community Discussion

Invited speakers

Claudia Soria is a researcher at CNR-ILC. She has a background in computational linguistics, with a focus on language resources across their entire life-cycle, from creation to representation to evaluation. She is one of the authors of LMF (Lexical Markup Framework), an ISO standard for the representation of computational lexicons. Her current research interests revolve around the use of technology, and Language Technology in particular, for the protection and valorisation of linguistic diversity. Her other current interests include the use and usability of regional and minority languages on social media, the ethnolinguistic vitality of the regional and minority languages of Italy, and the creation of lexico-conceptual resources for archiving traditional knowledge. She coordinated an Erasmus+ project, "The Digital Language Diversity Project", and a research project in cooperation with the Polish Academy of Sciences, "Protection of the linguistic heritage. A comparison of attitudes towards linguistic diversity in Poland and Italy". She is currently vice-director of the European Language Equality Network (ELEN) and a member of the UNESCO Board of Experts on Multilingualism in Cyberspace. On the activist side, she works to spread awareness of Italy's linguistic diversity, encouraging the use and re-appropriation of autochthonous languages.

Kelly Davis has many irons in the fire. He studied Mathematics and Physics at MIT, then went on to do graduate work in Superstring Theory/M-Theory. He then jumped ship, coding at a startup that eventually went public in the late 1990s. When the bubble burst, he returned to academia and joined the Max Planck Institute for Gravitational Physics, where he worked on software systems used to simulate black hole mergers. Jumping ship yet again, he went back into industry, writing 3D rendering software at Mental Images/NVIDIA. When that lost its charm, he founded an NLU (natural language understanding) group at the startup 42, which created a system, based on IBM's Watson, able to answer general knowledge questions. After a brief stint as Director of Machine Learning at another Berlin startup, he joined Mozilla, where he now leads the machine learning group.

Venue and registration

The Helix, DCU, Dublin.

Call for papers

Research innovations in Language Technology and Computational Linguistics in recent years have given us a wealth of modern language processing tools and resources for many languages. Basic language tools such as spelling and grammar checkers, interactive systems like Siri, and resources like the Trillion Word Corpus all fit together to produce products and services that enhance our daily lives.

Until relatively recently, languages with smaller numbers of speakers had largely not benefited from attention in this field. However, modern techniques are making it easier to create language tools and resources from fewer existing resources and in less time. As a result, many lesser-spoken languages are making their way into the digital age through the provision of language technologies and resources.

The Celtic Language Technology Workshop (CLTW) series of workshops provides a forum for researchers interested in developing NLP (Natural Language Processing) resources and technologies for Celtic languages. As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.

This will be the third Celtic Language Technology Workshop (CLTW), this time co-located with MT Summit XVII in Dublin, Ireland.

Our workshop welcomes theoretical and practical submissions on any Celtic language (Irish, Welsh, Scottish Gaelic, Manx, Cornish or Breton) that contribute to research in machine translation, automated language processing, or language and speech technologies and resources. With Ireland's recent progress in machine translation (particularly in public administration) and steps towards combining speech processing and machine translation for Welsh, there is much scope in this forum for sharing best practices and for learning from the experience of working with limited resources. We particularly encourage studies that address either practical applications with a human in the loop or the lack of resources available for a given language.

Topics of interest for the CLTW include but are not limited to:

machine translation
corpus development and analysis
treebanking
speech technology
evaluation methods
language resources
syntax
semantics
parsing
lexicons
phonology
morphological analysis
ontologies
terminology
knowledge representation
computer-assisted language learning (CALL)
digital humanities

Important dates

15th March: First CfP
~~10th May: Submission deadline~~
~~24th May: Extended deadline~~
14th June: Acceptance notification
9th July: Camera-ready deadline

Instructions for authors

Full papers must not exceed ten pages, plus unlimited pages for references, and must be formatted according to the MT Summit 2019 style guide.

Short papers should be up to five pages with unlimited pages for references.

All papers will be rigorously reviewed for novelty and impact, and published in the workshop proceedings. The papers that best suit poster presentation will be presented as posters and the rest as talks.

Submitted papers must be in PDF. To allow for blind reviewing, please do not include author names and affiliations in the paper, and avoid obvious self-references. Papers must be submitted via the EasyChair system.

Organisers

Teresa Lynn, Dublin City University
Delyth Prys, Bangor University
Colin Batchelor, Royal Society of Chemistry
Francis M. Tyers, Indiana University and Higher School of Economics

Programme committee

Andrew Carnie, University of Arizona
Annie Foret, Université Rennes 1
Arantza Díaz de Ilarraza, University of the Basque Country
Brian Davis, Maynooth University
Brian Ó Raghallaigh, Fiontar/Dublin City University
Elaine Uí Dhonnchadha, Trinity College Dublin
Jeremy Evas, Cardiff University and Welsh Government
John Judge, ADAPT Centre, Dublin City University
John McCrae, INSIGHT Centre, University of Galway
Kepa Sarasola, University of the Basque Country
Kevin Scannell, Saint Louis University
Monica Ward, Dublin City University
Montse Maritxalar, University of the Basque Country
Claudia Soria, National Research Council of Italy
Nancy Stenson, University College Dublin
Pauline Welby, CNRS/University College Dublin
William Lamb, University of Edinburgh

Previous workshops

The first Celtic Language Technology Workshop took place at COLING in 2014, again at DCU. The second one was held at JEP-TALN in 2016 in Paris.

Acknowledgements

We thank Mozilla and the Department of Culture, Heritage and the Gaeltacht for their support.

Colin Batchelor
2019-08-20