Jean-Marc Odobez

Senior Scientist
EPFL STI IEM LIDIAP
ELD 241 (Bâtiment ELD)
Station 11
1015 Lausanne
Web site: https://idiap.epfl.ch/
Fields of expertise
Activity analysis, human behavior understanding, human communication, interaction modeling.
Biography
Dr Jean-Marc Odobez (MSc 1990, PhD 1994) received his PhD from Rennes University in 1994 for a dissertation carried out at INRIA. After 5 years as an assistant professor at the Université du Maine, France, he joined Idiap, where he is now Head of the Perception & Activity Understanding Group, adjunct faculty at EPFL in the School of Engineering, and a member of the Electrical Engineering Doctoral committee (EDEE). His research focuses on the design of multimodal perception systems, rooted in computer vision, statistical machine learning, deep learning, and the social sciences, for activity and behavior recognition and for modeling and understanding human-human and human-robot interaction. Application domains include human health assessment, social robotics, and media content analysis. He has published more than 50 refereed journal papers and 160 refereed conference papers in his research field. He has been the principal investigator of more than 16 European and Swiss projects, and has worked on 10 tech transfer projects with SMEs. He holds several patents in computer vision, and is a cofounder of Klewel SA (www.klewel.ch) and of Eyeware SA (eyeware.tech), a tech company in eye tracking and attention modeling. He is an IEEE member and an associate editor of the Machine Vision and Applications journal. He regularly serves as area chair for the ICMI, ICCV, CVPR, and ECCV conferences.
Education
PhD
Computer Science
Rennes I University
1994
Engineering Degree
ENST Bretagne (Ecole Nationale d'Ingénieur des Télécoms de Bretagne)
1990
Publications
Infoscience publications
Journal Articles
Robust Unsupervised Gaze Calibration Using Conversation and Manipulation Attention Priors
ACM Transactions on Multimedia Computing, Communications, and Applications
2022-01-01
DOI : 10.1145/3472622
Active Learning of Bayesian Probabilistic Movement Primitives
IEEE Robotics and Automation Letters
2021
DOI : 10.1109/LRA.2021.3060414
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation
IEEE/ACM Transactions on Audio, Speech, and Language Processing
2021
DOI : 10.1109/TASLP.2021.3060257
A Differential Approach for Gaze Estimation
IEEE Transactions on Pattern Analysis and Machine Intelligence
2021
DOI : 10.1109/TPAMI.2019.2957373
WatchNet++: efficient and accurate depth-based network for detecting people attacks and intrusion
Machine Vision and Applications
2020-06-17
DOI : 10.1007/s00138-020-01089-y
Multi-scale sequential network for semantic text segmentation and localization
Pattern Recognition Letters
2020
DOI : 10.1016/j.patrec.2019.11.001
Efficient Convolutional Neural Networks for Depth-Based Multi-Person Pose Estimation
IEEE Transactions on Circuits and Systems for Video Technology
2020
DOI : 10.1109/TCSVT.2019.2952779
Improving speech embedding using crossmodal transfer learning with audio-visual data
Multimedia Tools and Applications
2019-06-01
DOI : 10.1007/s11042-018-6992-3
HeadFusion: 360 degrees Head Pose Tracking Combining 3D Morphable Model and 3D Reconstruction
IEEE Transactions on Pattern Analysis and Machine Intelligence
2018-11-01
DOI : 10.1109/TPAMI.2018.2841403
How to Tell Ancient Signs Apart? Recognizing and Visualizing Maya Glyphs with CNNs
ACM Journal on Computing and Cultural Heritage (JOCCH)
2018
DOI : 10.1145/3230670
Maya Codical Glyph Segmentation: A Crowdsourcing Approach
IEEE Transactions on Multimedia
2018-03-01
DOI : 10.1109/TMM.2017.2755985
Theories and Models of Teams and Groups
Small Group Research
2017
DOI : 10.1177/1046496417722841
Analyzing and Visualizing Ancient Maya Hieroglyphics Using Shape: from Computer Vision to Digital Humanities
Digital Scholarship in the Humanities
2017
DOI : 10.1093/llc/fqx028
Extracting Maya Glyphs from Degraded Ancient Documents via Image Segmentation
Journal on Computing and Cultural Heritage
2017
DOI : 10.1145/2996859
CRF-Based Context Modeling for Person Identification in Broadcast Videos
Frontiers in ICT: Computer Image Analysis
2016
DOI : 10.3389/fict.2016.00009
Gaze Estimation in the 3D Space Using RGB-D Sensors
International Journal of Computer Vision
2016
DOI : 10.1007/s11263-015-0863-4
Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
2016
DOI : 10.1109/TPAMI.2016.2537340
Evaluating Shape Representations for Maya Glyph Classification
ACM Journal on Computing and Cultural Heritage
2016
DOI : 10.1145/2905369
Combining dynamic head pose-gaze mapping with the robot conversational state for attention recognition in human-robot interactions
Pattern Recognition Letters
2015
DOI : 10.1016/j.patrec.2014.10.002
Klewel Webcast: from Research to Growing Company
IEEE Multimedia
2015
DOI : 10.1109/MMUL.2015.80
Multimedia Analysis and Access of Ancient Maya Epigraphy
IEEE Signal Processing Magazine
2015
DOI : 10.1109/MSP.2015.2411291
Leveraging Colour Segmentation for Upper-Body Detection
Pattern Recognition
2014
DOI : 10.1016/j.patcog.2013.12.014
Conference Papers
ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour
2023-01-01. IEEE/CVF International Conference on Computer Vision (ICCV) , Paris, FRANCE , OCT 02-06, 2023. p. 20878-20889.DOI : 10.1109/ICCV51070.2023.01914.
A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings
2022-01-01. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , New Orleans, LA , Jun 18-24, 2022. p. 5037-5046.DOI : 10.1109/CVPRW56347.2022.00552.
Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction
2021-01-01. Interspeech Conference , Brno, CZECH REPUBLIC , Aug 30-Sep 03, 2021. p. 506-510.DOI : 10.21437/Interspeech.2021-1769.
Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers
2021-01-01. IEEE/CVF International Conference on Computer Vision (ICCVW) , ELECTR NETWORK , Oct 11-17, 2021. p. 2276-2284.DOI : 10.1109/ICCVW54120.2021.00257.
Visual Focus of Attention Estimation in 3D Scene with an Arbitrary Number of Targets
2021-01-01. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , ELECTR NETWORK , Jun 19-25, 2021. p. 3147-3155.DOI : 10.1109/CVPRW53098.2021.00352.
Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation
2020. IEEE/RSJ International Conference on Intelligent Robots and Systems , Las Vegas, NV, USA , 24 Oct.-24 Jan. 2021.DOI : 10.1109/IROS45743.2020.9340695.
Unsupervised Representation Learning for Gaze Estimation
2020-01-01. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , ELECTR NETWORK , Jun 14-19, 2020. p. 7312-7322.DOI : 10.1109/CVPR42600.2020.00734.
The MuMMER Data Set for Robot Perception in Multi-party HRI Scenarios
2020-01-01. 29th IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) , ELECTR NETWORK , Aug 31-Sep 04, 2020. p. 1294-1300.DOI : 10.1109/RO-MAN47096.2020.9223340.
ManiGaze: a Dataset for Evaluating Remote Gaze Estimator in Object Manipulation Situations
2020. Symposium on Eye Tracking Research and Applications , Stuttgart, Germany .DOI : 10.1145/3379156.3391369.
Improving Few-Shot User-Specific Gaze Adaptation via Gaze Redirection Synthesis
2019. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , Long Beach, CA , Jun 16-20, 2019. p. 11929-11938.DOI : 10.1109/CVPR.2019.01221.
A Deep Learning Approach for Robust Head Pose Independent Eye Movements Recognition from Videos
2019. 2019 ACM Symposium on Eye Tracking Research & Applications . p. 5.DOI : 10.1145/3314111.3319844.
Adaptation of Multiple Sound Source Localization Neural Networks with Weak Supervision and Domain-Adversarial Training
2019. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Brighton, ENGLAND , May 12-17, 2019. p. 770-774.DOI : 10.1109/ICASSP.2019.8682655.
Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model
2018. 15th European Conference on Computer Vision (ECCV) , Munich, GERMANY , Sep 08-14, 2018. p. 456-474.DOI : 10.1007/978-3-030-11012-3_35.
WatchNet: Efficient and Depth-based Network for People Detection in Video Surveillance Systems
2018. IEEE International Conference on Advanced Video and Signal-based Surveillance , Auckland, NEW ZEALAND , Nov 27-30, 2018. p. 109-114.DOI : 10.1109/AVSS.2018.8639165.
Facing Employers and Customers: What Do Gaze and Expressions Tell About Soft Skills?
2018. 17th International Conference on Mobile and Ubiquitous Multimedia , Cairo, EGYPT , Nov 25-28, 2018. p. 121-126.DOI : 10.1145/3282894.3282925.
Investigating Depth Domain Adaptation for Efficient Human Pose Estimation
2018. 15th European Conference on Computer Vision (ECCV) , Munich, GERMANY , Sep 08-14, 2018. p. 346-363.DOI : 10.1007/978-3-030-11012-3_28.
UNICITY: A depth maps database for people detection in security airlocks
2018. IEEE International Conference on Advanced Video and Signal-based Surveillance Workshop .DOI : 10.1109/AVSS.2018.8639152.
Leveraging Convolutional Pose Machines for Fast and Accurate Head Pose Estimation
2018. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Madrid, SPAIN , Oct 01-05, 2018. p. 1089-1094.DOI : 10.1109/IROS.2018.8594223.
Real-time Convolutional Networks for Depth-based Human Pose Estimation
2018. IEEE/RSJ International Conference on Intelligent Robots and Systems , Madrid, SPAIN , Oct 01-05, 2018. p. 41-47.DOI : 10.1109/IROS.2018.8593383.
A Differential Approach for Gaze Estimation with Calibration
2018. 29th British Machine Vision Conference.
Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization
2018. Proceedings of Interspeech , Hyderabad, INDIA , Aug 02-Sep 06, 2018. p. 2257-2261.DOI : 10.21437/Interspeech.2018-1685.
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
2018. Proceedings of Interspeech , Hyderabad, INDIA , Aug 02-Sep 06, 2018. p. 312-316.DOI : 10.21437/Interspeech.2018-1269.
Deep Neural Networks for Multiple Speaker Detection and Localization
2018. 2018 IEEE International Conference on Robotics and Automation (ICRA) , Brisbane, AUSTRALIA , May 21-25, 2018. p. 74-79.DOI : 10.1109/ICRA.2018.8461267.
Supervised Gaze Bias Correction for Gaze Coding in Interactions
2017. ECEM COGAIN Symposium. p. 3.
Active Online Anomaly Detection using Dirichlet Process Mixture Model and Gaussian Process Classification
2017. IEEE Winter Conference on Applications of Computer Vision (WACV) , Washington . p. 615-623.DOI : 10.1109/WACV.2017.74.
Towards the Use of Social Interaction Conventions as Prior for Gaze Model Adaptation
2017.DOI : 10.1145/3136755.3136793.
A Domain Adaptation Approach to Improve Speaker Turn Embedding Using Face Representation
2017. ACM International Conference on Multimodal Interaction , Glasgow, Scotland . p. 411–415.DOI : 10.1145/3136755.3136800.
Improving speaker turn embedding by crossmodal transfer learning from face embedding
2017. ICCV Workshop on Computer Vision for Audio-Visual Media . p. 428-437.DOI : 10.1109/ICCVW.2017.58.
Towards large scale multimedia indexing
2017. 15th International Workshop on Content-Based Multimedia Indexing . p. 18.DOI : 10.1145/3095713.3095732.
Shape Representations for Maya Codical Glyphs: Knowledge-driven or Deep?
2017. 15th International Workshop on Content-Based Multimedia Indexing . p. 32.DOI : 10.1145/3095713.3095746.
Robust and Accurate 3D Head Pose Estimation through 3DMM and Online Head Model Reconstruction
2017. Proceedings of the 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017) . p. 711-718.DOI : 10.1109/Fg.2017.90.
Unsupervised Interpretable Pattern Discovery in Time Series Using Autoencoders
2016. IAPR Int. Workshops on Structural and Syntactic Pattern Recognition (SSPR) . p. 427–438.DOI : 10.1007/978-3-319-49055-7_38.
Towards building an attentive artificial listener: on the perception of attentiveness in audio-visual feedback tokens
2016. Proceedings of the 18th ACM International Conference on Multimodal Interaction , Tokyo, Japan . p. 21-28.DOI : 10.1145/2993148.2993188.
Training on the Job: Behavioral Analysis of Job Interviews in Hospitality
2016. Proceedings of the 18th ACM International Conference on Multimodal Interaction . p. 84-91.DOI : 10.1145/2993148.2993191.
EUMSSI team at the MediaEval Person Discovery Challenge 2016
2016. MediaEval Benchmarking Initiative for Multimedia Evaluation, Hilversum, Netherlands.
Long-Term Time-Sensitive Costs for CRF-Based Tracking by Detection
2016. 2nd Workshop on Benchmarking Multi-target Tracking: MOTChallenge 2016 , Amsterdam . p. 43-51.DOI : 10.1007/978-3-319-48881-3_4.
Transferring Neural Representations for Low-dimensional Indexing of Maya Hieroglyphic Art
2016. ECCV Workshop on Computer Vision for Art Analysis , Amsterdam . p. 842–855.DOI : 10.1007/978-3-319-46604-0_58.
Efficient and Accurate Tracking for Face Diarization via Periodical Detection
2016. International Conference on Pattern Recognition, Cancun, Mexico.
Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media
2016. ACM Multimedia , Amsterdam . p. 202-206.DOI : 10.1145/2964284.2967211.
Assessing a Shape Descriptor for Analysis of Mesoamerican Hieroglyphics: A View Towards Practice in Digital Humanities
2016. Digital Humanities Conference (DH), Krakow.
Ancient Maya Writings as High-Dimensional Data: a Visualization Approach
2016. Digital Humanities (DH), Krakow.
EUMSSI team at the MediaEval Person Discovery Challenge
2015. Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany.
Head Nod Detection from a Full 3D Model
2015.DOI : 10.1109/ICCVW.2015.75.
Deciphering the Silent Participant. On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions
2015. International Conference on Multimodal Interaction , Seattle, Washington, USA .DOI : 10.1145/2818346.2820759.
Who Will Get the Grant ? A Multimodal Corpus for the Analysis of Conversational Behaviours in Group
2014. International Conference on Multimodal Interaction, Understanding and Modeling Multiparty, Multimodal Interactions Workshop , Istanbul, Turkey .DOI : 10.1145/2666242.2666251.
The MAAYA Project: Multimedia Analysis and Access for Documentation and Decipherment of Maya Epigraphy
2014. Digital Humanities Conference, Lausanne.
Is That a Jaguar? Segmenting Ancient Maya Glyphs via Crowdsourcing
2014. ACM Int. Workshop on Crowdsourcing for Multimedia , Orlando . p. 37–40.DOI : 10.1145/2660114.2660117.
Automated Bobbing and Phase Analysis to Measure Walking Entrainment
2014. IEEE International Conference on Image Processing (ICIP), Paris . p. 4186-4190.DOI : 10.1109/ICIP.2014.7025850.
Automatic Maya Hieroglyph Retrieval Using Shape and Context Information
2014. ACM MM . p. 1037–1040.DOI : 10.1145/2647868.2655044.
Theses
Regularization Techniques for Low-Resource Machine Translation
Lausanne, EPFL, 2023.DOI : 10.5075/epfl-thesis-10128.
On matching data and model in LF-MMI-based dysarthric speech recognition
Lausanne, EPFL, 2023.DOI : 10.5075/epfl-thesis-9681.
Learning and optimization of anticipatory feedback controllers for robot manipulation
Lausanne, EPFL, 2023.DOI : 10.5075/epfl-thesis-9850.
Novel Methods For Detection And Analysis Of Atypical Aspects In Speech
Lausanne, EPFL, 2023.DOI : 10.5075/epfl-thesis-9785.
Visual Scene Understanding for Transportation: From Detecting Objects To Relationships
Lausanne, EPFL, 2022.DOI : 10.5075/epfl-thesis-8994.
Memory of Motion for Initializing Optimization in Robotics
Lausanne, EPFL, 2022.DOI : 10.5075/epfl-thesis-9717.
Auditory externalization of a remote microphone signal
Lausanne, EPFL, 2022.DOI : 10.5075/epfl-thesis-8959.
Learning strategies and representations for intuitive robot learning from demonstration
Lausanne, EPFL, 2021.DOI : 10.5075/epfl-thesis-9368.
Modeling and Inferring Attention between Humans or for Human-Robot Interactions
Lausanne, EPFL, 2021.DOI : 10.5075/epfl-thesis-8509.
Efficient Depth-based Deep Learning Methods for Multi-Party Pose Estimation
Lausanne, EPFL, 2021.DOI : 10.5075/epfl-thesis-8429.
Deep Learning Approaches for Auditory Perception in Robotics
Lausanne, EPFL, 2021.DOI : 10.5075/epfl-thesis-8000.
Computational methods for live heart imaging with speed-constrained microscopes
Lausanne, EPFL, 2021.DOI : 10.5075/epfl-thesis-7969.
Active illumination and computational Methods for Temporal and Spectral Super-Resolution Microscopy
Lausanne, EPFL, 2020.DOI : 10.5075/epfl-thesis-7844.
Accurate Nod and 3D Gaze Estimation for Social Interaction Analysis
Lausanne, EPFL, 2020.DOI : 10.5075/epfl-thesis-7284.
Multimodal person recognition in audio-visual streams
Lausanne, EPFL, 2019.DOI : 10.5075/epfl-thesis-9442.
Visual speech recognition
Lausanne, EPFL, 2018.DOI : 10.5075/epfl-thesis-8799.
Visual Analysis of Maya Glyphs via Crowdsourcing and Deep Learning
Lausanne, EPFL, 2017.DOI : 10.5075/epfl-thesis-8058.
Robust Eye Tracking Based on Adaptive Fusion of Multiple Cameras
Lausanne, EPFL, 2017.DOI : 10.5075/epfl-thesis-7933.
Optimized Coding Strategies for Interactive Multiview Video
Lausanne, EPFL, 2015.DOI : 10.5075/epfl-thesis-6843.
3D Gaze Estimation from Remote RGB-D Sensors
Lausanne, EPFL, 2015.DOI : 10.5075/epfl-thesis-6680.
Book Chapters
MAAYA: Multimedia Methods to Support Maya Epigraphic Analysis
Arqueologia computacional: Nuevos enfoques para el analisis y la difusion del patrimonio cultural; INAH-RedTDPC,2017.
Reports
An Attention Mechanism for Deep Q-Networks with Applications in Robotic Pushing
2021
Improving Few-Shot User-Specific Gaze Adaptation via Gaze Redirection Synthesis
2019
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
2018
Deep Neural Networks for Multiple Speaker Detection and Localization
2018
Supervised Gaze Bias Correction for Gaze Coding in Interactions
2017
Robust and Accurate 3D Head Pose Estimation through 3DMM and Online Head Model Reconstruction
2017
Real-time Multiple Head Tracking Using Texture and Colour Cues
2017
Maya Codical Glyph Segmentation: A Crowdsourcing Approach
2017
Teaching & PhD
Teaching
Electrical and Electronics Engineering