Fields of expertise
Over the past few years I have worked on, and coordinated work on, these topics in research projects at the intersection of computer science and cultural heritage. In various interdisciplinary settings I have drawn on my backgrounds in both NLP and the humanities, and have often acted as an intermediary between computer scientists, humanities scholars, engineers and representatives of cultural heritage institutions.
impresso. Media Monitoring of the Past. How can newspaper archives help understand the past? How to explore them? This large-scale, impact-driven project aims to enable critical mining of newspaper archives by integrating robust content mining and innovative data visualisation and exploration into a powerful user interface that can support digital scholarship.
The HIPE Evaluation Campaigns. What is the ability of machines to recognise and disambiguate entities (e.g. people, places, organisations) in multilingual historical documents? The series of HIPE shared tasks aims to assess and advance the development of robust, adaptable and transferable approaches to named entity processing in historical documents to foster efficient semantic indexing of digitised cultural heritage collections. See the HIPE-2020 and HIPE-2022 websites, the HIPE-eval GitHub organisation, the HIPE-2022 dataset, and the DHLAB web page.
Biography
Maud Ehrmann is a research scientist and lecturer at the Digital Humanities Laboratory (DHLAB) of the Ecole Polytechnique Fédérale de Lausanne. She holds a PhD in Computational Linguistics from the Paris Diderot University (Paris 7) and has been engaged in a large number of scientific projects centred on information extraction and text analysis, both for present-time and historical documents. Before joining the DHLAB, she was at the Linguistics Computing Laboratory at the Sapienza University of Rome, where she worked on the BabelNet resource and contributed to the LIDER project (2013-2014). Prior to that, she worked at the European Commission’s Joint Research Centre in Ispra, Italy, as a member of the OPTIMA unit (now the Text and Data Mining unit), which develops innovative and application-oriented solutions (Europe Media Monitor) for retrieving and extracting information from the Internet with a focus on high multilinguality (2009-2013). Previously, she worked at the Xerox Europe Research Centre in Grenoble, France (now Naver Labs Europe) in the Parsing and Semantics unit, first as a PhD candidate supported by a CIFRE grant (2005-2008), then as a post-doctoral researcher (2008-2009). There, her research focused mainly on the automatic processing and fine-grained analysis of entities of interest, specifically named entities and temporal expressions.
Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents. Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum. 2022. Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 Sept 2022.
DOI : 10.5281/zenodo.6979577.
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents. Advances in Information Retrieval. 2022-04-05. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022. p. 347-354.
DOI : 10.1007/978-3-030-99739-7_44.
HIPE-2022 Shared Task Named Entity Datasets
Named Entity Recognition and Classification in Historical Documents: A Survey
ACM Computing Surveys
Datasets and Models for Historical Newspaper Article Segmentation
Historical Newspaper Content Mining: Revisiting the impresso Project's Challenges in Text and Image Processing, Design and Historical Scholarship. DH2020 Book of Abstracts. 2020. Digital Humanities Conference (DH), Ottawa, Canada, July 20-24, 2020.
DOI : 10.5281/zenodo.4641894.
The impresso system architecture in a nutshell
CLEF-HIPE-2020 - Shared Task Participation Guidelines
Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers. CLEF 2020 Working Notes. Conference and Labs of the Evaluation Forum. 2020-10-21. 11th Conference and Labs of the Evaluation Forum (CLEF 2020), [Online event], 22-25 September, 2020.
DOI : 10.5281/zenodo.4117566.
Language Resources for Historical Newspapers: the Impresso Collection. Proceedings of the 12th Language Resources and Evaluation Conference. 2020-05-11. 12th International Conference on Language Resources and Evaluation (LREC), Marseille, France, May 11-16, 2020. p. 958-968.
DOI : 10.5281/zenodo.4641902.
Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers. Experimental IR meets multilinguality, multimodality, and interaction. 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22–25, 2020, Proceedings. 2020-09-15. 11th International Conference of the CLEF Association - CLEF 2020, Thessaloniki, Greece, September 22–25, 2020. p. 288-310.
DOI : 10.1007/978-3-030-58219-7_21.
Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers. Advances in Information Retrieval. ECIR 2020. 2020-04-08. ECIR 2020: 42nd European Conference on Information Retrieval, Lisbon, Portugal, April 14-17, 2020. p. 524-532.
DOI : 10.1007/978-3-030-45442-5_68.
Survey of digitized newspaper interfaces (dataset and notebooks)
The Past, Present and Future of Digital Scholarship with Newspaper Collections. DH 2019 Book of Abstracts. 2019-07-09. Digital Humanities Conference, Utrecht, July 2019.
Beyond Keyword Search: Semantic Indexing and Exploration of Large Collections of Historical Newspapers. Digital Humanities in the Nordic Countries, Copenhagen, Denmark, March 2019.
Named Entity Processing for Historical Texts
Linked Lexical Knowledge Bases Foundations and Applications
2017. Vol. 43, num. 2.
DOI : 10.1162/COLI_r_00289
From Documents to Structured Data: First Milestones of the Garzoni Project
Diachronic Evaluation of NER Systems on Old Newspapers. Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016). 2016. 13th Conference on Natural Language Processing (KONVENS 2016), Bochum, Germany, September 19-21, 2016. p. 97-107.
Navigating through 200 years of historical newspapers. 2016. International Conference on Digital Preservation (IPRES), Bern, Switzerland, October 3-6, 2016.
Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). 2016. 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, May 2016.
Named Entity Resources - Overview and Outlook. Proceedings of the 9th International Conference on Language Resources and Evaluation. 2016. 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, May 2016.
A Method for Record Linkage with Sparse Historical Data. 2016. Digital Humanities Conference 2016, Krakow, Poland, July 11-16, 2016.
Les entités nommées pour le traitement automatique des langues. ISTE editions.
Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet. Named Entity Recognition and Classification in Historical Documents: A Survey. ACM Computing Surveys (accepted).
Maud Ehrmann, Matteo Romanello, Sven Najem-Meyer, Antoine Doucet, Simon Clematide. Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents. CLEF 2022 proceedings.
Maud Ehrmann, Matteo Romanello, Alex Flückiger, Simon Clematide. Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers. CLEF 2020 proceedings.
Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman. Language Resources for Historical Newspapers: the Impresso Collection.
Teaching & PhD
Humanities and Social Sciences Program