Babak Falsafi
EPFL IC IINFCOM PARSA
INJ 233 (INJ Building)
Station 14
1015 Lausanne
+41 21 693 55 92
+41 21 693 13 93
EPFL › IC › IINFCOM › PARSA
Website: https://parsa.epfl.ch/
EPFL › IC › IC-SIN › SIN-ENS
Website: https://sin.epfl.ch
EPFL › IC › IC-SSC › SSC-ENS
Website: https://ssc.epfl.ch
EPFL › SB › SB-SMA › SMA-ENS
Website: https://sma.epfl.ch/
Awards and honors
2015
2012
Alfred P. Sloan Foundation
2004
A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs
ACM Transactions on Architecture and Code Optimization. 2025. DOI: 10.1145/3760782.
QFlex 3.0: Fast and Accurate ARM Server Simulation
2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.
Single-Address-Space FaaS with Jord
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI: 10.1145/3695053.3731108.
Avant-Garde: Empowering GPUs with Scaled Numeric Formats
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI: 10.1145/3695053.3731100.
Constrained bit allocation for neural networks
2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.
Rethinking IOMMU for Future IO Devices
2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.
Effective Interplay Between Sparsity and Quantization: from Theory to Practice
2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.
UrbanTwin: An urban digital twin for climate action
EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.
Silicon Efficiency in Post-Moore Servers
2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.
Secure Interface Design Leveraging Hardware/Software Support
Lausanne, EPFL, 2024. DOI: 10.5075/epfl-thesis-9975.
Electrical-Level Fault-Injection Attacks on FPGA-Based Systems
Lausanne, EPFL, 2024. DOI: 10.5075/epfl-thesis-10315.
Server Architecture from Enterprise to Post-Moore
IEEE Micro. 2024. DOI: 10.1109/MM.2024.3418975.
What's Missing in Agile Hardware Design? Verification!
Journal of Computer Science and Technology. 2023. DOI: 10.1007/s11390-023-0005-3.
Scale-out Systolic Arrays
ACM Transactions on Architecture and Code Optimization. 2023. DOI: 10.1145/3572917.
Imprecise Store Exceptions
2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI: 10.1145/3579371.3589087.
Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI: 10.1145/3567955.3567957.
Rebooting Virtual Memory with Midgard
Lausanne, EPFL, 2023. DOI: 10.5075/epfl-thesis-8864.
SecureCells: A Secure Compartmentalized Architecture
2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI: 10.1109/SP46215.2023.00125.
AstriFlash: A Flash-Based System for Online Services
2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI: 10.1109/HPCA56546.2023.10070955.
Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs
Lausanne, EPFL, 2023. DOI: 10.5075/epfl-thesis-9918.
Algorithms for Efficient and Robust Distributed Deep Learning
Lausanne, EPFL, 2022. DOI: 10.5075/epfl-thesis-8980.
Hardware and Software Support for RPC-Centric Server Architecture
Lausanne, EPFL, 2022. DOI: 10.5075/epfl-thesis-8017.
Equinox: Training (for Free) on a Custom Inference Accelerator
2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI: 10.1145/3466752.3480057.
Cerebros: Evading the RPC Tax in Datacenters
2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI: 10.1145/3466752.3480055.
Data transformer apparatus
US11748254; US2022327048; WO2021037341. 2021.
Rebooting Virtual Memory with Midgard
2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI: 10.1109/ISCA52012.2021.00047.
Hardware-Software Co-Design of an RPC Processor
Lausanne, EPFL, 2021. DOI: 10.5075/epfl-thesis-7217.
Exploiting Errors for Efficiency: A Survey from Circuits to Applications
ACM Computing Surveys. 2020. DOI: 10.1145/3394898.
ColTraIn: Co-located DNN training and inference
Lausanne, EPFL, 2020. DOI: 10.5075/epfl-thesis-10265.
Optimus Prime: Accelerating Data Transformation in Servers
2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI: 10.1145/3373376.3378501.
The NEBULA RPC-Optimized Architecture
2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI: 10.1109/ISCA45697.2020.00027.
SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
2020.
Distributed Logless Atomic Durability with Persistent Memory
2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI: 10.1145/3352460.3358321.
SMoTherSpectre: Exploiting Speculative Execution through Port Contention
2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI: 10.1145/3319535.3363194.
Analog Neural Networks with Deep-submicron Nonlinear Synapses
IEEE Micro. 2019. DOI: 10.1109/MM.2019.2931182.
Design Guidelines for High-Performance SCM Hierarchies
2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI: 10.1145/3240302.3240310.
Atomic object reads for in-memory rack-scale computing
US10929174; US2018173673. 2018.
Network-Compute Co-Design for Distributed In-Memory Computing
Lausanne, EPFL, 2018. DOI: 10.5075/epfl-thesis-8749.
Training DNNs with Hybrid Block Floating Point
2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.
Near-Memory Address Translation
2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI: 10.1109/Pact.2017.56.
FPGAs versus GPUs in Data centers
IEEE Micro. 2017. DOI: 10.1109/MM.2017.19.
Rack-Scale Memory Pooling for Datacenters
Lausanne, EPFL, 2017. DOI: 10.5075/epfl-thesis-7612.
Unified prefetching into instruction cache and branch target buffer
US9996358; US2017090935. 2017.
Near-Memory Address Translation
Lausanne, EPFL, 2017. DOI: 10.5075/epfl-thesis-7875.
Fat Caches For Scale-Out Servers
IEEE Micro. 2017. DOI: 10.1109/MM.2017.32.
The Mondrian Data Engine
2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI: 10.1145/3079856.3080233.
Unlocking Energy
2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.
The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI: 10.1145/2987550.2987577.
Towards Near-Threshold Server Processors
2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.
An Analysis of Load Imbalance in Scale-out Data Serving
2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI: 10.1145/2896377.2901501.
Near-Memory Data Services
IEEE Micro. 2016. DOI: 10.1109/MM.2016.9.
SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing
2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI: 10.1109/MICRO.2016.7783709.
Asynchronous memory access chaining
Proceedings of the VLDB Endowment. 2015. DOI: 10.14778/2856318.2856321.
Sort vs. Hash Join Revisited for Near-Memory Execution
5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Sort vs. Hash Join Revisited for Near-Memory Execution
2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Memory Systems and Interconnects for Scale-Out Servers
Lausanne, EPFL, 2015. DOI: 10.5075/epfl-thesis-6682.
Accelerators for Data Processing
Lausanne, EPFL, 2015. DOI: 10.5075/epfl-thesis-6710.
Confluence: unified instruction supply for scale-out servers
2015. The 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI: 10.1145/2830772.2830785.
Multi-Gigabyte On-Chip DRAM Caches for Servers
Lausanne, EPFL, 2015. DOI: 10.5075/epfl-thesis-6631.
Shared Frontend for Manycore Server Processors
Lausanne, EPFL, 2015. DOI: 10.5075/epfl-thesis-6669.
Manycore Network Interfaces for In-Memory Rack-Scale Computing
2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI: 10.1145/2749469.2750415.
Towards stable cloud performance
Lausanne, EPFL, 2014. DOI: 10.5075/epfl-thesis-6261.
Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models
2014.
FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI: 10.1109/HPCA.2014.6835922.
A Case for Specialized Processors for Scale-Out Workloads
IEEE Micro. 2014. DOI: 10.1109/MM.2014.41.
Scale-Out NUMA
2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI: 10.1145/2541940.2541965.
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI: 10.1109/MICRO.2014.51.
Architectural Support to Accelerate Fine-Grain Program Monitoring
Lausanne, EPFL, 2014. DOI: 10.5075/epfl-thesis-6257.
Big Data
IEEE Micro. 2014. DOI: 10.1109/MM.2014.65.
A Primer on Hardware Prefetching
Morgan & Claypool, 2014.
BuMP: Bulk Memory Access Prediction and Streaming
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI: 10.1109/MICRO.2014.44.
Multi-Grain Coherence Directory
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI: 10.1145/2540708.2540739.
Scale-Out Processors
Lausanne, EPFL, 2013. DOI: 10.5075/epfl-thesis-5906.
DeSyRe: On-demand system reliability
Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI: 10.1016/j.micpro.2013.08.008.
SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI: 10.1145/2540708.2540732.
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache
2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI: 10.1145/2485922.2485957.
TOP PICKS FROM THE 2012 COMPUTER ARCHITECTURE CONFERENCES Introduction
IEEE Micro. 2013. DOI: 10.1109/MM.2013.65.
Meet the Walkers: Accelerating Index Traversals for In-Memory Databases
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI: 10.1145/2540708.2540748.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.
Dark Silicon Accelerators for Database Indexing
2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
NOC-Out: Microarchitecting a Scale-Out Processor
2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI: 10.1109/MICRO.2012.25.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.
Scale-Out Processors
2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI: 10.1145/2366231.2337217.
BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring
2012.
Scale-Out Processors
2012.
Optimizing Data-Center TCO with Scale-Out Processors
IEEE Micro. 2012. DOI: 10.1109/MM.2012.71.
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems. 2012. DOI: 10.1145/2382553.2382557.
Dark Silicon Accelerators for Database Indexing
Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip
2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI: 10.1109/ICCD.2012.6378637.
Proactive Instruction Fetch
2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI: 10.1145/2155620.2155638.
Toward Dark Silicon in Servers
IEEE Micro. 2011. DOI: 10.1109/MM.2011.77.
Reliability in the Dark Silicon Era
2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips
2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.
Cuckoo Directory: A Scalable Directory for Many-Core Systems
2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI: 10.1109/HPCA.2011.5749726.
Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware
2011.
Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures
IEEE Micro. 2010. DOI: 10.1109/MM.2010.22.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI: 10.1145/1736020.1736051.
Making Address-Correlated Prefetching Practical
IEEE Micro. 2010. DOI: 10.1109/MM.2010.21.
TurboTag: Lookup Filtering to Reduce Coherence Directory Power
2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI: 10.1145/1840845.1840929.
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI: 10.1145/1534916.1534925.
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors
2009. p. 195 - 201. DOI: 10.1109/PRDC.2009.39.
Spatio-Temporal Memory Streaming
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI: 10.1145/1555754.1555766.
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI: 10.1145/1555754.1555779.
Practical Off-chip Meta-data for Temporal Memory Streaming
2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI: 10.1109/HPCA.2009.4798239.
Flexible Hardware Acceleration for Instruction-Grain Lifeguards
IEEE Micro Top Picks. 2009. DOI: 10.1109/MM.2009.6.
Flexible hardware acceleration for instruction-grain program monitoring
2008. The 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI: 10.1109/ISCA.2008.20.
Temporal instruction fetch streaming
2008. The 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI: 10.1109/MICRO.2008.4771774.
Workshop on Transactional Computing (TRANSACT 2008) - Introduction
ACM SIGPLAN Notices. 2008. DOI: 10.1145/1402227.1402233.
A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs
2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI: 10.1145/1344671.1344684.
Temporal streams in commercial server applications
2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI: 10.1109/IISWC.2008.4636095.
Predictor virtualization
2008. The 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI: 10.1145/1346281.1346301.
PAI: A lightweight mechanism for single-node memory recovery in DSM servers
2007. p. 298 - 305. DOI: 10.1109/PRDC.2007.53.
Multi-bit error tolerant caches using two-dimensional error coding
2007. p. 197 - 209. DOI: 10.1109/MICRO.2007.19.
Database Servers on Chip Multiprocessors: Limitations and Opportunities
2007.
To Share or Not To Share?
2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.
An Analysis of Database System Performance on Chip Multiprocessors
2007.
Last-touch correlated data streaming
2007. p. 105 - 115. DOI: 10.1109/ISPASS.2007.363741.
PROTOFLEX: FPGA-accelerated hybrid functional simulator
2007. DOI: 10.1109/IPDPS.2007.370516.
Scheduling threads for constructive cache sharing on CMPs
2007. p. 105 - 115. DOI: 10.1145/1248377.1248396.
Mechanisms for store-wait-free multiprocessors
2007. p. 266 - 277. DOI: 10.1145/1250662.1250696.
ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development
2006.
Coarse-grain coherence tracking: RegionScout and region coherence arrays
IEEE Micro. 2006. DOI: 10.1109/MM.2006.8.
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation. 2006. DOI: 10.1145/1147224.1147225.
Spatial Memory Streaming
2006. p. 252 - 263. DOI: 10.1109/ISCA.2006.38.
Simulation sampling with live-points
2006. p. 2 - 12. DOI: 10.1109/ISPASS.2006.1620785.
The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors
2006.
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems. 2006. DOI: 10.1145/1152649.1152653.
Dynamic feature selection for hardware prediction
Journal of Systems Architecture. 2006. DOI: 10.1016/j.sysarc.2004.12.007.
Reunion: Complexity-effective multicore redundancy
2006. p. 223 - 234. DOI: 10.1109/MICRO.2006.42.
Parallel depth first vs. work stealing schedulers on CMP architectures
2006. DOI: 10.1145/1148109.1148167.
Log-based architectures for general-purpose monitoring of deployed code
2006. p. 63 - 65. DOI: 10.1145/1181309.1181319.
Temporal Streaming of Shared Memory
2005. p. 222 - 233. DOI: 10.1109/ISCA.2005.50.
A case for asymmetric-cell cache memories
IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI: 10.1109/TVLSI.2005.850127.
TurboSMARTS: Accurate microarchitecture simulation sampling in minutes
2005. p. 408 - 409. DOI: 10.1145/1064212.1064278.
ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"
2005. p. 609 - 616. DOI: 10.1109/ICCD.2005.90.
Accelerating Database Operations Using a Network Processor
2005.
Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs
Journal of Parallel and Distributed Computing. 2005. DOI: 10.1016/j.jpdc.2004.11.011.
Understanding the performance of concurrent error detecting superscalar microarchitectures
2005. p. 13 - 18. DOI: 10.1109/ISSPIT.2005.1577062.
DBmbench: fast and accurate database workload representation on modern microarchitecture
2005. p. 254 - 267. DOI: 10.1145/1105634.1105653.
TRUSS: A Reliable, Scalable Server Architecture
IEEE Micro. 2005. DOI: 10.1109/MM.2005.122.
Store-Ordered Streaming of Shared Memory
2005. p. 75 - 86. DOI: 10.1109/PACT.2005.37.
The Third International Workshop on Power-Aware Computer Systems. Revised Papers.
2004.
The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers
2004.
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture
Performance Evaluation Review. 2004. DOI: 10.1145/1054907.1054914.
SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data
2004.
Accurate and complexity-effective spatial pattern prediction
2004. p. 276 - 287.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
2004.
TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes
2004.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
IEEE Micro. 2004. DOI: 10.1109/MM.2004.72.
Memory coherence activity prediction in commercial workloads
2004. p. 37 - 45. DOI: 10.1145/1054943.1054949.
An Evaluation of Stratified Sampling of Microarchitecture Simulations
2004.
Efficient resource sharing in concurrent error detecting superscalar microarchitectures
2004. p. 257 - 268. DOI: 10.1109/MICRO.2004.19.
Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches
2003.
Near-optimal precharging in high-performance nanoscale CMOS caches
2003. p. 67 - 78. DOI: 10.1109/MICRO.2003.1253184.
The Second International Workshop on Power-Aware Computer Systems. Revised Papers.
2003.
Implicitly-multithreaded processors
2003. p. 39 - 50. DOI: 10.1145/859618.859624.
Speculative Sequential Consistency with Little Custom Storage
Journal of Instruction-Level Parallelism. 2003.
Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
2002. p. 151 - 161. DOI: 10.1109/HPCA.2002.995706.
Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication
Theory of Computing Systems. 2002. DOI: 10.1007/s00224-002-1054-6.
Speculative sequential consistency with little custom storage
2002. p. 179 - 188. DOI: 10.1109/PACT.2002.1106016.
Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy
2002.
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery
2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI: 10.1109/MICRO.2001.991120.
Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation
2001.
Reducing set-associative cache energy via way-prediction and selective direct-mapping
2001. p. 54 - 65.
JETTY: Filtering snoops for reduced energy consumption in SMP servers
2001. p. 85 - 96. DOI: 10.1109/HPCA.2001.903254.
Dead-block prediction & dead-block correlating prefetchers
2001. p. 144 - 154. DOI: 10.1109/ISCA.2001.937443.
The First International Workshop on Power-Aware Computer Systems. Revised Papers.
2001.
Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor
2001. p. 368 - 380. DOI: 10.1145/377792.377863.
An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches
2001. p. 147 - 157. DOI: 10.1109/HPCA.2001.903259.
Reducing leakage in a high-performance deep-submicron instruction cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI: 10.1109/92.920821.
Reference idempotency analysis: A framework for optimizing speculative execution
2001. p. 2 - 11. DOI: 10.1145/379539.379547.
The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.
2000.
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI: 10.1109/LPE.2000.876763.
Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor
2000.
Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator
IEEE Concurrency. 2000. DOI: 10.1109/4434.895100.
Dynamic Feature Selection for Hardware Prediction
2000.
Address partitioning in DSM clusters with parallel coherence controllers
2000. p. 47 - 56. DOI: 10.1109/PACT.2000.888330.
Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters
2000. p. 79 - 88. DOI: 10.1145/341800.341811.
Low-Overhead and High-Performance Implementations of Sequential Consistency
2000.
Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache
2000.
Selective, accurate, and timely self-invalidation using last-touch prediction
2000. p. 139 - 148. DOI: 10.1109/ISCA.2000.854385.
Memory sharing predictor: the key to a speculative coherent DSM
1999. p. 172 - 183. DOI: 10.1109/ISCA.1999.765949.
Is SC+ILP=RC?
1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI: 10.1109/ISCA.1999.765948.
Cacheable Interface Control Registers for High Speed Data Transfer
US5951657. 1999.
Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols
1999. p. 182 - 192. DOI: 10.1109/HPCA.1999.744362.
Is SC + ILP = RC?
ACM SIGARCH Computer Architecture News. 1999. DOI: 10.1145/307338.300993.
Sirocco: cost-effective fine-grain distributed shared memory
1998. p. 40 - 49. DOI: 10.1109/PACT.1998.727144.
Fine-grain Access Control for Distributed Shared Memory
Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.
Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator
1997.
Scheduling communication on an SMP node parallel machine
1997. p. 128 - 138. DOI: 10.1109/HPCA.1997.569649.
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation. 1997. DOI: 10.1145/244804.244808.
Reactive NUMA: A design for unifying S-COMA and CC-NUMA
1997. p. 229 - 240. DOI: 10.1145/264107.264205.
Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations
1996.
When does Dedicated Protocol Processing Make Sense?
1996.
Coherent network interfaces for fine-grain communication
1996. p. 247 - 258. DOI: 10.1145/232973.232999.
Cost/performance of a parallel computer simulator
1994. p. 173 - 182.
Application-specific protocols for user-level shared memory
1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI: 10.1109/SUPERC.1994.344301.
Mechanisms for Cooperative Shared Memory
CMG Transactions. 1994. DOI: 10.1145/173682.165151.
Fine-grain access control for distributed shared memory
1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI: 10.1145/195470.195575.
Kernel support for the Wisconsin Wind Tunnel
1993. p. 73 - 89.
Mechanisms for cooperative shared memory
1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI: 10.1145/165123.165151.
Component Labeling Algorithms on an Intel iPSC/2 Hypercube
1990. p. 159 - 164.A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs
ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.QFlex 3.0: Fast and Accurate ARM Server Simulation
2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.Single-Address-Space FaaS with Jord
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.Avant-Garde: Empowering GPUs with Scaled Numeric Formats
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.Constrained bit allocation for neural networks
2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.Rethinking IOMMU for Future IO Devices
2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.Effective Interplay Between Sparsity and Quantization: from Theory to Practice
2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.UrbanTwin: An urban digital twin for climate action
EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.Silicon Efficiency in Post-Moore Servers
2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.Secure Interface Design Leveraging Hardware/Software Support
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.Electrical-Level Fault-Injection Attacks on FPGA-Based Systems
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.Server Architecture from Enterprise to Post-Moore
IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.What's Missing in Agile Hardware Design? Verification!
Journal Of Computer Science And Technology. 2023. DOI : 10.1007/s11390-023-0005-3.Scale-out Systolic Arrays
Acm Transactions On Architecture And Code Optimization. 2023. DOI : 10.1145/3572917.Imprecise Store Exceptions
2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.
Rebooting Virtual Memory with Midgard
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.
SecureCells: A Secure Compartmentalized Architecture
2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.
AstriFlash: A Flash-Based System for Online Services
2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.
Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.
Algorithms for Efficient and Robust Distributed Deep Learning
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.
Hardware and Software Support for RPC-Centric Server Architecture
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.
Equinox: Training (for Free) on a Custom Inference Accelerator
2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.
Cerebros: Evading the RPC Tax in Datacenters
2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.
Data transformer apparatus
US11748254; US2022327048; WO2021037341. 2021.
Rebooting Virtual Memory with Midgard
2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.
Hardware-Software Co-Design of an RPC Processor
Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.
Exploiting Errors for Efficiency: A Survey from Circuits to Applications
ACM Computing Surveys. 2020. DOI : 10.1145/3394898.
ColTraIn: Co-located DNN training and inference
Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.
Optimus Prime: Accelerating Data Transformation in Servers
2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.
The NEBULA RPC-Optimized Architecture
2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.
SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
2020.
Distributed Logless Atomic Durability with Persistent Memory
2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.
SMoTherSpectre: Exploiting Speculative Execution through Port Contention
2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.
Analog Neural Networks with Deep-submicron Nonlinear Synapses
IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.
Design Guidelines for High-Performance SCM Hierarchies
2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.
Atomic object reads for in-memory rack-scale computing
US10929174; US2018173673. 2018.
Network-Compute Co-Design for Distributed In-Memory Computing
Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.
Training DNNs with Hybrid Block Floating Point
2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.
Near-Memory Address Translation
2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.
FPGAs versus GPUs in Data centers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.
Rack-Scale Memory Pooling for Datacenters
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.
Unified prefetching into instruction cache and branch target buffer
US9996358; US2017090935. 2017.
Near-Memory Address Translation
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.
Fat Caches For Scale-Out Servers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.
The Mondrian Data Engine
2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.
Unlocking Energy
2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.
The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.
Towards Near-Threshold Server Processors
2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.
An Analysis of Load Imbalance in Scale-out Data Serving
2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.
Near-Memory Data Services
IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.
SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing
2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.
Asynchronous memory access chaining
Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.
Sort vs. Hash Join Revisited for Near-Memory Execution
2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Memory Systems and Interconnects for Scale-Out Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.
Accelerators for Data Processing
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.
Confluence: unified instruction supply for scale-out servers
2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.
Multi-Gigabyte On-Chip DRAM Caches for Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.
Shared Frontend for Manycore Server Processors
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.
Manycore Network Interfaces for In-Memory Rack-Scale Computing
2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.
Towards stable cloud performance
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.
Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models
2014.
FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.
A Case for Specialized Processors for Scale-Out Workloads
IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.
Scale-Out NUMA
2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.
Architectural Support to Accelerate Fine-Grain Program Monitoring
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.
Big Data
IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.
A Primer on Hardware Prefetching
Morgan & Claypool, 2014.
BuMP: Bulk Memory Access Prediction and Streaming
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.
Multi-Grain Coherence Directory
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.
Scale-Out Processors
Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.
DeSyRe: On-demand system reliability
Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.
SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache
2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.
Top Picks from the 2012 Computer Architecture Conferences: Introduction
IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.
Meet the Walkers: Accelerating Index Traversals for In-Memory Databases
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.
Dark Silicon Accelerators for Database Indexing
2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
NOC-Out: Microarchitecting a Scale-Out Processor
2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.
Scale-Out Processors
2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.
BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring
2012.
Scale-Out Processors
2012.
Optimizing Data-Center TCO with Scale-Out Processors
IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.
Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip
2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.
Proactive Instruction Fetch
2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.
Toward Dark Silicon in Servers
IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.
Reliability in the Dark Silicon Era
2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips
2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.
Cuckoo Directory: A Scalable Directory for Many-Core Systems
2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.
Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware
2011.
Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures
IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.
Making Address-Correlated Prefetching Practical
IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.
TurboTag: Lookup Filtering to Reduce Coherence Directory Power
2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors
2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.
Spatio-Temporal Memory Streaming
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.
Practical Off-chip Meta-data for Temporal Memory Streaming
2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.
Flexible Hardware Acceleration for Instruction-Grain Lifeguards
IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.
Flexible hardware acceleration for instruction-grain program monitoring
2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.
Temporal instruction fetch streaming
2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.
Workshop on Transactional Computing (TRANSACT 2008) - Introduction
ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.
A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs
2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.
Temporal streams in commercial server applications
2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.
Predictor virtualization
2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.
PAI: A lightweight mechanism for single-node memory recovery in DSM servers
2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.
Multi-bit error tolerant caches using two-dimensional error coding
2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.
Database Servers on Chip Multiprocessors: Limitations and Opportunities
2007.
To Share or Not To Share?
2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.
An Analysis of Database System Performance on Chip Multiprocessors
2007.
Last-touch correlated data streaming
2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.
PROTOFLEX: FPGA-accelerated hybrid functional simulator
2007. DOI : 10.1109/IPDPS.2007.370516.
Scheduling threads for constructive cache sharing on CMPs
2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.
Mechanisms for store-wait-free multiprocessors
2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.
ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development
2006.
Coarse-grain coherence tracking: RegionScout and region coherence arrays
IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.
Spatial Memory Streaming
2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.
Simulation sampling with live-points
2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.
The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors
2006.
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.
Dynamic feature selection for hardware prediction
Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.
Reunion: Complexity-effective multicore redundancy
2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.
Parallel depth first vs. work stealing schedulers on CMP architectures
2006. DOI : 10.1145/1148109.1148167.
Log-based architectures for general-purpose monitoring of deployed code
2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.
Temporal Streaming of Shared Memory
2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.
A case for asymmetric-cell cache memories
IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.
TurboSMARTS: Accurate microarchitecture simulation sampling in minutes
2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.
ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"
2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.
Accelerating Database Operations Using a Network Processor
2005.
Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs
Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.
Understanding the performance of concurrent error detecting superscalar microarchitectures
2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.
DBmbench: fast and accurate database workload representation on modern microarchitecture
2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.
TRUSS: A Reliable, Scalable Server Architecture
IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.
Store-Ordered Streaming of Shared Memory
2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.
The Third International Workshop on Power-Aware Computer Systems. Revised Papers.
2004.
The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers
2004.
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture
Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.
SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data
2004.
Accurate and complexity-effective spatial pattern prediction
2004. p. 276 - 287.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
2004.
TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes
2004.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.
Memory coherence activity prediction in commercial workloads
2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.
An Evaluation of Stratified Sampling of Microarchitecture Simulations
2004.
Efficient resource sharing in concurrent error detecting superscalar microarchitectures
2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.
Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches
2003.
Near-optimal precharging in high-performance nanoscale CMOS caches
2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.
The Second International Workshop on Power-Aware Computer Systems. Revised Papers.
2003.
Implicitly-multithreaded processors
2003. p. 39 - 50. DOI : 10.1145/859618.859624.
Speculative Sequential Consistency with Little Custom Storage
Journal of Instruction-Level Parallelism. 2003.
Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.
Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication
Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.
Speculative sequential consistency with little custom storage
2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.
Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy
2002.
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery
2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.
Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation
2001.
Reducing set-associative cache energy via way-prediction and selective direct-mapping
2001. p. 54 - 65.
JETTY: Filtering snoops for reduced energy consumption in SMP servers
2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.
Dead-block prediction & dead-block correlating prefetchers
2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.
The First International Workshop on Power-Aware Computer Systems. Revised Papers.
2001.
Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor
2001. p. 368 - 380. DOI : 10.1145/377792.377863.
An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches
2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.
Reducing leakage in a high-performance deep-submicron instruction cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.
Reference idempotency analysis: A framework for optimizing speculative execution
2001. p. 2 - 11. DOI : 10.1145/379539.379547.
The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.
2000.
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.
Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor
2000.
Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator
IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.
Dynamic Feature Selection for Hardware Prediction
2000.
Address partitioning in DSM clusters with parallel coherence controllers
2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.
Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters
2000. p. 79 - 88. DOI : 10.1145/341800.341811.
Low-Overhead and High-Performance Implementations of Sequential Consistency
2000.
Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache
2000.
Selective, accurate, and timely self-invalidation using last-touch prediction
2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.
Memory sharing predictor: the key to a speculative coherent DSM
1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.
Is SC+ILP=RC?
1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.
Cacheable Interface Control Registers for High Speed Data Transfer
US5951657. 1999.
Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols
1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.
Is SC + ILP = RC?
ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.
Sirocco: cost-effective fine-grain distributed shared memory
1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.
Fine-grain Access Control for Distributed Shared Memory
Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.
Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator
1997.
Scheduling communication on an SMP node parallel machine
1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.
Reactive NUMA: A design for unifying S-COMA and CC-NUMA
1997. p. 229 - 240. DOI : 10.1145/264107.264205.
Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations
1996.
When does Dedicated Protocol Processing Make Sense?
1996.
Coherent network interfaces for fine-grain communication
1996. p. 247 - 258. DOI : 10.1145/232973.232999.
Cost/performance of a parallel computer simulator
1994. p. 173 - 182.
Application-specific protocols for user-level shared memory
1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.
Mechanisms for Cooperative Shared Memory
CMG Transactions. 1994. DOI : 10.1145/173682.165151.
Fine-grain access control for distributed shared memory
1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.
Kernel support for the Wisconsin Wind Tunnel
1993. p. 73 - 89.
Mechanisms for cooperative shared memory
1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.
Component Labeling Algorithms on an Intel iPSC/2 Hypercube
1990. p. 159 - 164.
Teaching and PhD
Current PhD students
Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari
Supervised the EPFL theses of
Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud
Co-supervised the EPFL theses of
Stanko Novakovic, Tao Lin, Atri Bhattacharyya
Courses
Advanced multiprocessor architecture
CS-471
Multiprocessors are the foundation of computing systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (GPU, TPU, etc.), and datacenters. It includes a research project on post-Moore multiprocessors and datacenters.
Parallelism and concurrency in software
CS-302
From sensors and smartphones to the world's largest datacenters and supercomputers, parallelism and concurrency are ubiquitous in modern computing. Platforms also offer many forms of parallel and concurrent execution.
Topics on Datacenter Design
CS-728