Maud Ehrmann

EPFL CDH DHI DHLAB
INN 116 (Bâtiment INN)
Station 14
1015 Lausanne

Expertise

With a background in both natural language processing (NLP) and the humanities, my expertise is in the area of historical and multilingual NLP, with particular focus on historical document processing, information extraction, named entity processing, multilingual and historical resource creation, NLP system evaluation, and large-scale infrastructure. In recent years, I have worked and coordinated work on these topics in research projects at the intersection of computer science and cultural heritage - an interdisciplinary setting in which I have often acted as an intermediary between computer scientists, humanities scholars, engineers, and representatives of cultural heritage institutions.

Highlights:

impresso. Media Monitoring of the Past. How can newspaper archives help us understand the past, and how can they be explored? This large-scale, impact-driven project aims to enable critical mining of newspaper archives by integrating robust content mining and innovative data visualisation and exploration into a powerful user interface that can support digital scholarship.
The HIPE Evaluation Campaigns. What is the ability of machines to recognise and disambiguate entities (e.g. people, places, organisations) in multilingual historical documents? The series of HIPE shared tasks aims to assess and advance the development of robust, adaptable and transferable approaches to named entity processing in historical documents to foster efficient semantic indexing of digitised cultural heritage collections. See the HIPE-2020 and HIPE-2022 websites, the HIPE-eval GitHub organisation, the HIPE-2022 dataset, and the DHLAB web page.

Maud Ehrmann is a research scientist and lecturer at the Digital Humanities Laboratory of the Ecole Polytechnique Fédérale de Lausanne. She holds a PhD in Computational Linguistics from the Paris Diderot University (Paris 7) and has been engaged in a large number of scientific projects centred on information extraction and text analysis, for both present-time and historical documents. Before joining the DHLAB, she was at the Linguistics Computing Laboratory at the Sapienza University of Rome, where she worked on the BabelNet resource and contributed to the LIDER project (2013-2014). Prior to that, she worked at the European Commission's Joint Research Centre in Ispra, Italy, as a member of the OPTIMA unit (now the Text and Data Mining unit), which develops innovative and application-oriented solutions (Europe Media Monitor) for retrieving and extracting information from the Internet, with a focus on high multilinguality (2009-2013). Previously, she worked at the Xerox Europe Research Centre in Grenoble, France (now Naver Labs Europe) in the Parsing and Semantics unit, first as a PhD candidate supported by a CIFRE grant (2005-2008), then as a post-doctoral researcher (2008-2009). There, her research focused mainly on the automatic processing and fine-grained analysis of entities of interest, specifically named entities and temporal expressions.

Education

PhD in Computational Linguistics

2008, Paris 7 Diderot University, LaTTICE laboratory

Master in Computational Linguistics

2004, University of Lorraine, France

Master in General Linguistics

2003, University of Lorraine, France

Bachelor in History

2002, University of Lorraine, France

Bachelor in Comparative Literature

2001, University of Lorraine, France

Selected publications

Named Entity Recognition and Classification in Historical Documents: A Survey

Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet.
Published in ACM Computing Surveys (accepted)

Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

Maud Ehrmann, Matteo Romanello, Sven Najem-Meyer, Antoine Doucet, Simon Clematide.
Published in CLEF 2022 proceedings

Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers

Maud Ehrmann, Matteo Romanello, Alex Flückiger, Simon Clematide.
Published in CLEF 2020 proceedings

Language Resources for Historical Newspapers: the Impresso Collection

Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman.
Published in LREC 2020

Teaching & PhD

Courses

Historical Document and Media Processing

DH-400

This course introduces historical document processing, focusing on concepts and methods that enable the transformation of digitised materials into searchable information. Grounded in machine learning and document processing, it also covers data curation and copyright considerations.