Maud Ehrmann
EPFL CDH DHI DHLAB
INN 116 (INN Building)
Station 14
1015 Lausanne
+41 21 693 19 31
Website: https://dhlab.epfl.ch
Education
2008: Paris 7 Diderot University, LaTTICE laboratory
2004: University of Lorraine, France
2003: University of Lorraine, France
2002: University of Lorraine, France
2001: University of Lorraine, France
Representative publications
Named Entity Recognition and Classification in Historical Documents: A Survey
Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet.
Published in ACM Computing Surveys (accepted).
Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
Maud Ehrmann, Matteo Romanello, Sven Najem-Meyer, Antoine Doucet, Simon Clematide.
Published in CLEF 2022 proceedings.
Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers
Maud Ehrmann, Matteo Romanello, Alex Flückiger, Simon Clematide.
Published in CLEF 2020 proceedings.
Language Resources for Historical Newspapers: the Impresso Collection
Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman.
Published in LREC 2020.
Exploring Large Vision-Language Models for Historical Newspaper Segmentation
2025. Advisor(s): M. Ehrmann; F. Kaplan; P. I. Conti; E. Boros
Data Visualization Dashboard For Large-Scale Data Processing Monitoring And Quality Control
2025. Advisor(s): M. Ehrmann; P. I. Conti
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents
Sustainability and Empowerment in the Context of Digital Libraries - 26th International Conference on Asia-Pacific Digital Libraries, ICADL 2024, Proceedings. 2025. 26th International Conference on Asia-Pacific Digital Libraries, Bandar Sunway, Malaysia, 2024-12-04 - 2024-12-06. p. 54-66. DOI: 10.1007/978-981-96-0865-2_5.
Towards Chapterisation of Podcasts: Detection of Host and Structuring Questions in Radio Transcripts
2024. Advisor(s): M. Ehrmann
Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024). 2024. The 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, St Julian's, Malta, March 22, 2024. p. 133-159.
impresso Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers
Frontiers in Big Data
2023
Vol. 6 (Visualizing Big Culture and History Data). DOI: 10.3389/fdata.2023.1249469
Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers
2023. Advisor(s): M. Ehrmann; E. Boros; M. Duering; F. Kaplan
From Archival Sources to Structured Historical Information: Annotating and Exploring the "Accordi dei Garzoni"
Apprenticeship, Work, Society in Early Modern Venice; Abingdon: Routledge, Taylor & Francis Group, 2023.
DOI: 10.4324/9781003197195-6.
Computational Approaches to Digitised Historical Newspapers (Dagstuhl Seminar 22292)
2023
Digitised Historical Newspapers: A Changing Research Landscape (Introduction)
Digitised Newspapers – A New Eldorado for Historians?; Berlin, Boston: De Gruyter Oldenbourg, 2022.
Digitised Newspapers – A New Eldorado for Historians? Reflections on Tools, Methods and Epistemology
Berlin: De Gruyter, 2022.
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
Advances in Information Retrieval. 2022. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022. p. 347-354. DOI: 10.1007/978-3-030-99739-7_44.
Automatic table detection and classification in large-scale newspaper archives
2022. Advisor(s): M. Ehrmann; S. Clematide; F. Kaplan
ECCE: Entity-centric Corpus Exploration Using Contextual Implicit Networks
WWW ’22 Companion. 2022. The Web Conference (WWW'22), Lyon, France, April 25-29, 2022. p. 1-4. DOI: 10.1145/3487553.3524237.
HIPE-2022 Shared Task Named Entity Datasets
2022.
Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum. 2022. 13th Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 Sept 2022. DOI: 10.5281/zenodo.6979577.
Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, September 5–8, 2022, Proceedings. 2022. 13th Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 September 2022. p. 423-446. DOI: 10.1007/978-3-031-13643-6_26.
Explorer la presse numérisée : le projet Impresso
Revue Historique Vaudoise
2021
Vol. 129/2021.
Named Entity Recognition and Classification in Historical Documents: A Survey
ACM Computing Surveys
2021
Vol. 56, num. 2.
Datasets and Models for Historical Newspaper Article Segmentation
2021.
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
Journal of Data Mining & Digital Humanities
2021
Vol. 2021, num. Special Issue on HistoInformatics: Computational Approaches to History. DOI: 10.5281/zenodo.4065271
Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers
CLEF 2020 Working Notes. Conference and Labs of the Evaluation Forum. 2020. 11th Conference and Labs of the Evaluation Forum (CLEF 2020), online event, 22-25 September 2020. DOI: 10.5281/zenodo.4117566.
Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers
Experimental IR meets multilinguality, multimodality, and interaction. 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22–25, 2020, Proceedings. 2020. 11th International Conference of the CLEF Association - CLEF 2020, Thessaloniki, Greece, September 22–25, 2020. p. 288 -. DOI: 10.1007/978-3-030-58219-7_21.
Language Resources for Historical Newspapers: the Impresso Collection
Proceedings of the 12th Language Resources and Evaluation Conference. 2020. 12th International Conference on Language Resources and Evaluation (LREC), Marseille, France, May 11-16, 2020. p. 958-968. DOI: 10.5281/zenodo.4641902.
Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers
Advances in Information Retrieval. ECIR 2020. 2020. ECIR 2020: 42nd European Conference on Information Retrieval, Lisbon, Portugal, April 14-17, 2020. p. 524-532. DOI: 10.1007/978-3-030-45442-5_68.
CLEF-HIPE-2020 Shared Task Named Entity Datasets
2020.
The impresso system architecture in a nutshell
2020
Impresso Named Entity Annotation Guidelines (CLEF-HIPE-2020)
2020
CLEF-HIPE-2020 - Shared Task Participation Guidelines
2020
Historical Newspaper Content Mining: Revisiting the impresso Project's Challenges in Text and Image Processing, Design and Historical Scholarship
DH2020 Book of Abstracts. 2020. Digital Humanities Conference (DH), Ottawa, Canada, July 20-24, 2020. DOI: 10.5281/zenodo.4641894.
Historical Newspaper User Interfaces: A Review
[Proceedings of the 85th IFLA General Conference and Assembly]. 2019. 85th IFLA General Conference and Assembly, Athens, Greece, 24-30 August 2019. p. 1-24. DOI: 10.5281/zenodo.3404155.
Named Entity Processing for Historical Texts
2019.
The Past, Present and Future of Digital Scholarship with Newspaper Collections
DH 2019 Book of Abstracts. 2019. Digital Humanities Conference, Utrecht, July 2019.
Historical newspaper semantic segmentation using visual and textual features
2019. Advisor(s): M. Ehrmann; S. Ares Oliveira; S. Clematide
Index-Driven Digitization and Indexation of Historical Archives
Frontiers in Digital Humanities
2019
Vol. 6, num. 1-16. DOI: 10.3389/fdigh.2019.00004
Beyond Keyword Search: Semantic Indexing and Exploration of Large Collections of Historical Newspapers
Digital Humanities in the Nordic Countries, Copenhagen, Denmark, March 2019.
Survey of digitized newspaper interfaces (dataset and notebooks)
2019.
JRC-Names: Multilingual Entity Name variants and titles as Linked Data
Semantic Web
2017
Vol. 8, num. 2. DOI: 10.3233/SW-160228
Linked Lexical Knowledge Bases Foundations and Applications
Computational Linguistics
2017
Vol. 43, num. 2. DOI: 10.1162/COLI_r_00289
Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). 2016. 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, May 2016.
Named Entity Resources - Overview and Outlook
Proceedings of the 10th International Conference on Language Resources and Evaluation. 2016. 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, May 2016.
A Method for Record Linkage with Sparse Historical Data
2016. Digital Humanities Conference 2016, Krakow, Poland, July 11-16, 2016.
From Documents to Structured Data: First Milestones of the Garzoni Project
DHCommons
2016
num. 2.
Diachronic Evaluation of NER Systems on Old Newspapers
Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016). 2016. 13th Conference on Natural Language Processing (KONVENS 2016), Bochum, Germany, September 19-21, 2016. p. 97-107.
Navigating through 200 years of historical newspapers
2016. International Conference on Digital Preservation (IPRES), Bern, Switzerland, October 3-6, 2016.
Les entités nommées pour le traitement automatique des langues
ISTE editions, 2015.
Teaching and PhD
Courses
Historical Document and Media Processing
DH-400
This course introduces historical document processing, i.e. the concepts and methods for turning digitised material into searchable information. Grounded in machine learning and document processing, it also covers data curation and copyright.