Maud Ehrmann


Scientist

maud.ehrmann@epfl.ch +41 21 693 19 31

https://orcid.org/0000-0001-9900-2193


EPFL CDH DHI DHLAB
INN 116 (Bâtiment INN)
Station 14
1015 Lausanne

EPFL > CDH > DHI > DHLAB

Web site: https://dhlab.epfl.ch

EPFL > CDH > CDH-SHS > SHS-ENS

EPFL > CDH > CDH-SG > CDH-IT


Fields of expertise

With a background in both natural language processing (NLP) and the humanities, my expertise is in historical and multilingual NLP, with a particular focus on historical document processing, information extraction, named entity processing, multilingual and historical resource creation, NLP system evaluation, and large-scale infrastructure. In recent years, I have carried out and coordinated work on these topics in research projects at the intersection of computer science and cultural heritage, an interdisciplinary setting in which I have often acted as an intermediary between computer scientists, humanities scholars, engineers, and representatives of cultural heritage institutions.

Highlights:

impresso. Media Monitoring of the Past. How can newspaper archives help us understand the past? How can they be explored? This large-scale, impact-driven project aims to enable critical mining of newspaper archives by integrating robust content mining and innovative data visualisation and exploration into a powerful user interface that can support digital scholarship.

The HIPE Evaluation Campaigns. What is the ability of machines to recognise and disambiguate entities (e.g. people, places, organisations) in multilingual historical documents? The series of HIPE shared tasks aims to assess and advance the development of robust, adaptable and transferable approaches to named entity processing in historical documents to foster efficient semantic indexing of digitised cultural heritage collections. See the HIPE-2020 and HIPE-2022 websites, the HIPE-eval GitHub organisation, the HIPE-2022 dataset, and the DHLAB web page.

Biography

Maud Ehrmann is a research scientist and lecturer at the Digital Humanities Laboratory of the Ecole Polytechnique Fédérale de Lausanne. She holds a PhD in Computational Linguistics from the Paris Diderot University (Paris 7) and has been engaged in a large number of scientific projects centred on information extraction and text analysis, for both present-time and historical documents. Before joining the DHLAB, she was at the Linguistics Computing Laboratory at the Sapienza University of Rome, where she worked on the BabelNet resource and contributed to the LIDER project (2013-2014). Prior to that, she worked at the European Commission’s Joint Research Centre in Ispra, Italy, as a member of the OPTIMA unit (now Text and Data mining unit), which develops innovative and application-oriented solutions (Europe Media Monitor) for retrieving and extracting information from the Internet with a focus on high multilinguality (2009-2013). Previously, she worked at the Xerox Europe Research Centre in Grenoble, France (now Naver Labs Europe) in the Parsing and Semantics unit, first as a PhD candidate supported by a CIFRE grant (2005-2008), then as a post-doctoral researcher (2008-2009). There, her research focused mainly on the automatic processing and fine-grained analysis of entities of interest, specifically named entities and temporal expressions.

Publications

Infoscience publications

Towards Chapterisation of Podcasts: Detection of Host and Structuring Questions in Radio Transcripts

M. Piguet

2024-03-28

Advisor(s): M. Ehrmann

Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet. Named Entity Recognition and Classification in Historical Documents: A Survey. ACM Computing Surveys (accepted).
Maud Ehrmann, Matteo Romanello, Sven Najem-Meyer, Antoine Doucet, Simon Clematide. Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents. CLEF 2022 proceedings.
Maud Ehrmann, Matteo Romanello, Alex Flückiger, Simon Clematide. Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers. CLEF 2020 proceedings.
Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman. Language Resources for Historical Newspapers: the Impresso Collection. LREC 2020.

Infoscience publications

Towards Chapterisation of Podcasts: Detection of Host and Structuring Questions in Radio Transcripts

Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study

impresso Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers

Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers

From Archival Sources to Structured Historical Information: Annotating and Exploring the "Accordi dei Garzoni"

Computational Approaches to Digitised Historical Newspapers (Dagstuhl Seminar 22292)

Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

Digitised Historical Newspapers: A Changing Research Landscape (Introduction)

Digitised Newspapers – A New Eldorado for Historians? Reflections on Tools, Methods and Epistemology

Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

ECCE: Entity-centric Corpus Exploration Using Contextual Implicit Networks

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

HIPE-2022 Shared Task Named Entity Datasets

Automatic table detection and classification in large-scale newspaper archives

Named Entity Recognition and Classification in Historical Documents: A Survey

Explorer la presse numérisée : le projet Impresso

Datasets and Models for Historical Newspaper Article Segmentation

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

CLEF-HIPE-2020 Shared Task Named Entity Datasets

Historical Newspaper Content Mining: Revisiting the impresso Project's Challenges in Text and Image Processing, Design and Historical Scholarship

The impresso system architecture in a nutshell

Impresso Named Entity Annotation Guidelines (CLEF-HIPE-2020)

CLEF-HIPE-2020 - Shared Task Participation Guidelines

Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers

Language Resources for Historical Newspapers: the Impresso Collection

Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers

Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers

Survey of digitized newspaper interfaces (dataset and notebooks)

The Past, Present and Future of Digital Scholarship with Newspaper Collections

Historical newspaper semantic segmentation using visual and textual features

Beyond Keyword Search: Semantic Indexing and Exploration of Large Collections of Historical Newspapers

Index-Driven Digitization and Indexation of Historical Archives

Historical Newspaper User Interfaces: A Review

Named Entity Processing for Historical Texts

Linked Lexical Knowledge Bases: Foundations and Applications

JRC-Names: Multilingual Entity Name variants and titles as Linked Data

From Documents to Structured Data: First Milestones of the Garzoni Project

Diachronic Evaluation of NER Systems on Old Newspapers

Navigating through 200 years of historical newspapers

Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms

Named Entity Resources - Overview and Outlook

A Method for Record Linkage with Sparse Historical Data

Les entités nommées pour le traitement automatique des langues


Teaching & PhD

Teaching

Courses

Historical Document and Media Processing
