Babak Falsafi
Full Professor
babak.falsafi@epfl.ch +41 21 693 55 92 http://parsa.epfl.ch/~falsafi
EPFL IC IINFCOM PARSA
INJ 233 (Bâtiment INJ)
Station 14
1015 Lausanne
+41 21 693 55 92
+41 21 693 13 93
Office: INJ 233
EPFL > IC > IINFCOM > PARSA
Web site: https://parsa.epfl.ch/
EPFL > IC > IC-SIN > SIN-ENS
Web site: https://sin.epfl.ch
EPFL > IC > IC-SSC > SSC-ENS
Web site: https://ssc.epfl.ch
EPFL > SB > SB-SMA > SMA-ENS
Web site: https://sma.epfl.ch/
Biography
Babak is a Professor in the School of Computer and Communication Sciences and the founder of EcoCloud, an industrial/academic consortium at EPFL investigating scalable, sustainable information technology. He has made numerous contributions to computer system design and evaluation, including a scalable multiprocessor architecture that was prototyped by Sun Microsystems (now Oracle), snoop filters incorporated into multi-socket x86 servers and IBM BlueGene supercomputers, spatial and temporal memory streaming techniques that appear in ARM cores, and computer system performance evaluation methodologies that have been in use by AMD, HP and Google PerfKit. He has shown that hardware memory consistency models are neither necessary (in the 1990s) nor sufficient (a decade later) to achieve high performance in servers. These results eventually led to fence speculation in modern CPUs. His work on cloud-native CPUs laid the foundation for the first generation of Cavium ARM server CPUs, ThunderX. He is a recipient of an NSF CAREER award, IBM Faculty Partnership Awards, and an Alfred P. Sloan Research Fellowship. He is a fellow of ACM and IEEE.
NEWS
Online services are stuck in memory and DRAM is not scaling. AstriFlash at HPCA'23 presents a system to serve data directly out of Flash, reducing memory cost by 20x and meeting ms-scale SLOs for online services at 95% of the throughput achieved with DRAM.
Network bandwidth is projected to grow at 20% a year for a decade thanks to optics. Logic density is lagging behind at 15% a year and slowing down, resulting in a "datacenter tax". Optimus Prime, a data transformation accelerator, NebuLA, a hardware-terminated network stack, and Cerebros, an RPC processor, are examples of how to mitigate the datacenter tax in the post-Moore era. Great to see that Google has followed up with their own data transformation accelerator in 2022.
See our paper on "Rebooting Virtual Memory with Midgard" for a novel approach to future-proof virtual memory. Here is a news snippet.
Numerical training of DNNs is converging on fixed point with orders of magnitude improvement in logic, memory, power and bandwidth. See our blog.
RESEARCH
Data has emerged as a currency for modern society, and datacenters are now the backbone of IT, offering large-scale cloud services at low cost by exploiting economies of scale. With silicon efficiency scaling having dwindled since 2004 and silicon density scaling (Moore's Law) slowing down, future digital platforms will rely on heterogeneous logic and memory to allow for IT scalability. Meanwhile, the demand for large-scale cloud services has grown dramatically faster than conventional silicon scaling, making IT platform scalability a grand challenge. Future platforms will need hand-in-hand collaboration between application domain experts and platform designers to improve scalability. With many online services being in-memory and the minimum communication latency between the farthest nodes being microseconds, future server platforms will go through revolutionary changes in architecture and systems to enable seamless aggregation of logic and memory resources across nodes, breaking the conventional abstraction layers. Babak's research and educational activities center around post-Moore server design. He investigates techniques to address these challenges in the context of the following projects:
- CloudSuite: A Benchmark Suite for Scale-Out Workloads
- ColTraIn: Co-Located Training and Inference DNN Accelerators
- HARNESS: Heterogeneous Architectures for Next-Generation Server Systems
- Midgard: Future-Proofing Virtual Memory
- QFlex: Fast, Full-System Open-Source Server Simulation/Emulation
- VISA: Cloud-Native CPUs
Selected Talks
Integration, Specialization and Approximation: the "ISA" of Post-Moore Servers
HPCA Keynote, 2022.
Post-Moore AI Infrastructure
Facebook SysML Talk, 2021.
Post-Moore Server Architecture
ICS Keynote, 2020 (Video on YouTube!).
Server Architecture for the Post-Moore Era
HotDC Keynote, 2017.
Publications
Infoscience publications
Secure Interface Design Leveraging Hardware/Software Support
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.
What's Missing in Agile Hardware Design? Verification!
Journal of Computer Science and Technology. 2023-07-01. DOI : 10.1007/s11390-023-0005-3.
Scale-out Systolic Arrays
ACM Transactions on Architecture and Code Optimization. 2023-06-01. DOI : 10.1145/3572917.
Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.
Rebooting Virtual Memory with Midgard
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.
Imprecise Store Exceptions
2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.
SecureCells: A Secure Compartmentalized Architecture
2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921-2939. DOI : 10.1109/SP46215.2023.00125.
Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30-46. DOI : 10.1145/3567955.3567957.
Roles of Clinical Features and Chest CT in Predicting the Outcomes of Hospitalized Patients with COVID-19 Developing AKI
Iranian Journal of Kidney Diseases. 2023-01-01. DOI : 10.52547/ijkd.7241.
An optimal preconditioned FFT-accelerated finite element solver for homogenization
Applied Mathematics and Computation. 2023-01-30. DOI : 10.1016/j.amc.2023.127835.
AstriFlash: A Flash-Based System for Online Services
2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.
Efficient Meso-Scale Modeling of Alkali-Silica-Reaction Damage in Concrete
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-9591.
Elimination of ringing artifacts by finite-element projection in FFT-based homogenization
Journal of Computational Physics. 2022-03-15. DOI : 10.1016/j.jcp.2021.110931.
Hardware and Software Support for RPC-Centric Server Architecture
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.
Algorithms for Efficient and Robust Distributed Deep Learning
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.
Equinox: Training (for Free) on a Custom Inference Accelerator
2021-10-18. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.
Cerebros: Evading the RPC Tax in Datacenters
2021-10-18. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407-420. DOI : 10.1145/3466752.3480055.
Hardware-Software Co-Design of an RPC Processor
Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.
Rebooting Virtual Memory with Midgard
2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.
Data transformer apparatus
US11748254 ; US2022327048 ; WO2021037341 . 2021.
Exploiting Errors for Efficiency: A Survey from Circuits to Applications
ACM Computing Surveys. 2020-06-01. DOI : 10.1145/3394898.
ColTraIn: Co-located DNN training and inference
Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.
The NEBULA RPC-Optimized Architecture
2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May, 30th - June, 3rd 2020. p. 199-212. DOI : 10.1109/ISCA45697.2020.00027.
Optimus Prime: Accelerating Data Transformation in Servers
2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203-1216. DOI : 10.1145/3373376.3378501.
SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
2020.
Distributed Logless Atomic Durability with Persistent Memory
2019-10-16. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.
SMoTherSpectre: Exploiting Speculative Execution through Port Contention
2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785–800. DOI : 10.1145/3319535.3363194.
Analog Neural Networks with Deep-submicron Nonlinear Synapses
IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.
RPCValet: NI-Driven Tail-Aware Balancing of µs-Scale RPCs
2019-04-15. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19, Providence, Rhode Island, USA, April 13-17, 2019. p. 35-48. DOI : 10.1145/3297858.3304070.
Mitigating Load Imbalance in Distributed Data Serving with Rack-Scale Memory Pooling
ACM Transactions on Computer Systems. 2019-04-01. DOI : 10.1145/3309986.
Design Guidelines for High-Performance SCM Hierarchies
2018-10-01. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.
Atomic object reads for in-memory rack-scale computing
US10929174 ; US2018173673 . 2018.
Training DNNs with Hybrid Block Floating Point
2018-01-01. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.
Network-Compute Co-Design for Distributed In-Memory Computing
Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.
LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching
2018. Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18, Williamsburg, VA, USA, March 24th – March 28th, 2018. p. 489-502. DOI : 10.1145/3173162.3173211.
Near-Memory Address Translation
2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303-317. DOI : 10.1109/Pact.2017.56.
Near-Memory Address Translation
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.
Fat Caches For Scale-Out Servers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.
Rack-Scale Memory Pooling for Datacenters
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.
The Mondrian Data Engine
2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.
Unified prefetching into instruction cache and branch target buffer
US9996358 ; US2017090935 . 2017.
FPGAs versus GPUs in Data centers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.
Unlocking Energy
2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393-406.
The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.
SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing
2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.
Near-Memory Data Services
IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.
An Analysis of Load Imbalance in Scale-out Data Serving
2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367–368. DOI : 10.1145/2896377.2901501.
Towards Near-Threshold Server Processors
2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7-12.
Scale-out non-uniform memory access
US9734063 ; US2015242324 . 2015.
Asynchronous memory access chaining
Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.
Confluence: unified instruction supply for scale-out servers
2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166-177. DOI : 10.1145/2830772.2830785.
Accelerators for Data Processing
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.
Memory Systems and Interconnects for Scale-Out Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.
Multi-Gigabyte On-Chip DRAM Caches for Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.
Shared Frontend for Manycore Server Processors
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.
Sort vs. Hash Join Revisited for Near-Memory Execution
2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Sort vs. Hash Join Revisited for Near-Memory Execution
5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Manycore Network Interfaces for In-Memory Rack-Scale Computing
2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.
Network-on-chip using request and reply trees for low-latency processor-memory communication
US9703707 ; US2014156929 . 2014.
Big Data
IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25-37. DOI : 10.1109/MICRO.2014.51.
Architectural Support to Accelerate Fine-Grain Program Monitoring
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.
BuMP: Bulk Memory Access Prediction and Streaming
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545-557. DOI : 10.1109/MICRO.2014.44.
Towards stable cloud performance
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.
A Case for Specialized Processors for Scale-Out Workloads
IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.
A Primer on Hardware Prefetching
Morgan & Claypool.
Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models
2014.
FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108-119. DOI : 10.1109/HPCA.2014.6835922.
Scale-Out NUMA
2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.
DeSyRe: On-demand system reliability
Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.
Multi-Grain Coherence Directory
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.
Meet the Walkers: Accelerating Index Traversals for In-Memory Databases
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.
SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.
TOP PICKS FROM THE 2012 COMPUTER ARCHITECTURE CONFERENCES Introduction
IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.
Scale-Out Processors
Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache
2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404–415. DOI : 10.1145/2485922.2485957.
BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring
2012.
Dark Silicon Accelerators for Database Indexing
2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip
2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.
NOC-Out: Microarchitecting a Scale-Out Processor
2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.
Optimizing Data-Center TCO with Scale-Out Processors
IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.
Dark Silicon Accelerators for Database Indexing
Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
Scale-Out Processors
2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.
Scale-Out Processors
2012.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.
Reliability in the Dark Silicon Era
2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V-V.
Proactive Instruction Fetch
2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152-162. DOI : 10.1145/2155620.2155638.
Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware
2011.
Toward Dark Silicon in Servers
IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips
2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.
Cuckoo Directory: A Scalable Directory for Many-Core Systems
2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271-284. DOI : 10.1145/1736020.1736051.
TurboTag: Lookup Filtering to Reduce Coherence Directory Power
2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377-382. DOI : 10.1145/1840845.1840929.
Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures
IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.
Making Address-Correlated Prefetching Practical
IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors
2009. p. 195-201. DOI : 10.1109/PRDC.2009.39.
Flexible Hardware Acceleration for Instruction-Grain Lifeguards
IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.
Teaching & PhD
Teaching
Computer Science
Mathematics
Communication Systems