Babak Falsafi
EPFL IC IINFCOM PARSA
INJ 233 (INJ Building)
Station 14
1015 Lausanne
+41 21 693 55 92
+41 21 693 13 93
Office: INJ 233
EPFL › IC › IINFCOM › PARSA
Website: https://parsa.epfl.ch/
EPFL › IC › IC-SIN › SIN-ENS
Website: https://sin.epfl.ch
EPFL › IC › IC-SSC › SSC-ENS
Website: https://ssc.epfl.ch
EPFL › SB › SB-SMA › SMA-ENS
Website: https://sma.epfl.ch/
Awards and distinctions
2015
2012
Alfred P. Sloan Foundation
2004
A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs
ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.
QFlex 3.0: Fast and Accurate ARM Server Simulation
2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.
Single-Address-Space FaaS with Jord
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.
Avant-Garde: Empowering GPUs with Scaled Numeric Formats
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.
Constrained bit allocation for neural networks
2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.
Rethinking IOMMU for Future IO Devices
2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.
Effective Interplay Between Sparsity and Quantization: from Theory to Practice
2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.
UrbanTwin: An urban digital twin for climate action
EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.
Silicon Efficiency in Post-Moore Servers
2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.
Server Architecture from Enterprise to Post-Moore
IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.
Secure Interface Design Leveraging Hardware/Software Support
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.
Electrical-Level Fault-Injection Attacks on FPGA-Based Systems
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.
What's Missing in Agile Hardware Design? Verification!
Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.
Scale-out Systolic Arrays
ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.
Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.
Rebooting Virtual Memory with Midgard
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.
SecureCells: A Secure Compartmentalized Architecture
2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.
Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.
AstriFlash: A Flash-Based System for Online Services
2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.
Imprecise Store Exceptions
2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.
Hardware and Software Support for RPC-Centric Server Architecture
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.
Algorithms for Efficient and Robust Distributed Deep Learning
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.
Equinox: Training (for Free) on a Custom Inference Accelerator
2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.
Cerebros: Evading the RPC Tax in Datacenters
2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.
Data transformer apparatus
US11748254 ; US2022327048 ; WO2021037341 . 2021.
Rebooting Virtual Memory with Midgard
2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.
Hardware-Software Co-Design of an RPC Processor
Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.
Exploiting Errors for Efficiency: A Survey from Circuits to Applications
ACM Computing Surveys. 2020. DOI : 10.1145/3394898.
Optimus Prime: Accelerating Data Transformation in Servers
2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.
SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
2020.
The NEBULA RPC-Optimized Architecture
2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.
ColTraIn: Co-located DNN training and inference
Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.
Distributed Logless Atomic Durability with Persistent Memory
2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.
Analog Neural Networks with Deep-submicron Nonlinear Synapses
IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.
SMoTherSpectre: Exploiting Speculative Execution through Port Contention
2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.
Design Guidelines for High-Performance SCM Hierarchies
2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.
Training DNNs with Hybrid Block Floating Point
2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.
Atomic object reads for in-memory rack-scale computing
US10929174 ; US2018173673 . 2018.
Network-Compute Co-Design for Distributed In-Memory Computing
Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.
Rack-Scale Memory Pooling for Datacenters
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.
Fat Caches For Scale-Out Servers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.
Unified prefetching into instruction cache and branch target buffer
US9996358 ; US2017090935 . 2017.
Near-Memory Address Translation
2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.
FPGAs versus GPUs in Data centers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.
The Mondrian Data Engine
2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.
Near-Memory Address Translation
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.
Towards Near-Threshold Server Processors
2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.
Near-Memory Data Services
IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.
SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing
2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.
An Analysis of Load Imbalance in Scale-out Data Serving
2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.
Unlocking Energy
2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.
The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.
Shared Frontend for Manycore Server Processors
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.
Sort vs. Hash Join Revisited for Near-Memory Execution
2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Multi-Gigabyte On-Chip DRAM Caches for Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.
Accelerators for Data Processing
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.
Confluence: unified instruction supply for scale-out servers
2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.
Manycore Network Interfaces for In-Memory Rack-Scale Computing
2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.
Sort vs. Hash Join Revisited for Near-Memory Execution
5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Asynchronous memory access chaining
Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.
Memory Systems and Interconnects for Scale-Out Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.
A Case for Specialized Processors for Scale-Out Workloads
IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.
A Primer on Hardware Prefetching
Morgan & Claypool, 2014.
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.
Towards stable cloud performance
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.
Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models
2014.
FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.
Architectural Support to Accelerate Fine-Grain Program Monitoring
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.
Big Data
IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.
BuMP: Bulk Memory Access Prediction and Streaming
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.
Scale-Out NUMA
2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.
SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.
Meet the Walkers: Accelerating Index Traversals for In-Memory Databases
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.
Multi-Grain Coherence Directory
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.
TOP PICKS FROM THE 2012 COMPUTER ARCHITECTURE CONFERENCES Introduction
IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache
2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.
Scale-Out Processors
Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.
DeSyRe: On-demand system reliability
Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.
BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring
2012.
Scale-Out Processors
2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.
Scale-Out Processors
2012.
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.
Dark Silicon Accelerators for Database Indexing
Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip
2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.
Optimizing Data-Center TCO with Scale-Out Processors
IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.
Dark Silicon Accelerators for Database Indexing
2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
NOC-Out: Microarchitecting a Scale-Out Processor
2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips
2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.
Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware
2011.
Proactive Instruction Fetch
2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.
Reliability in the Dark Silicon Era
2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.
Cuckoo Directory: A Scalable Directory for Many-Core Systems
2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.
Toward Dark Silicon in Servers
IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.
TurboTag: Lookup Filtering to Reduce Coherence Directory Power
2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.
Making Address-Correlated Prefetching Practical
IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.
Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures
IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors
2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.
Practical Off-chip Meta-data for Temporal Memory Streaming
2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.
Spatio-Temporal Memory Streaming
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.
Flexible Hardware Acceleration for Instruction-Grain Lifeguards
IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.
Temporal instruction fetch streaming
2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.
A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs
2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.
Workshop on Transactional Computing (TRANSACT 2008) - Introduction
ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.
Temporal streams in commercial server applications
2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.
Flexible hardware acceleration for instruction-grain program monitoring
2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.
Predictor virtualization
2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.
Last-touch correlated data streaming
2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.
An Analysis of Database System Performance on Chip Multiprocessors
2007.
Mechanisms for store-wait-free multiprocessors
2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.
PROTOFLEX: FPGA-accelerated hybrid functional simulator
2007. DOI : 10.1109/IPDPS.2007.370516.
Scheduling threads for constructive cache sharing on CMPs
2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.
Database Servers on Chip Multiprocessors: Limitations and Opportunities
2007.
PAI: A lightweight mechanism for single-node memory recovery in DSM servers
2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.
To Share or Not To Share?
2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.
Multi-bit error tolerant caches using two-dimensional error coding
2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.
Spatial Memory Streaming
2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.
The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors
2006.
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.
Simulation sampling with live-points
2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.
Dynamic feature selection for hardware prediction
Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.
Reunion: Complexity-effective multicore redundancy
2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.
Parallel depth first vs. work stealing schedulers on CMP architectures
2006. DOI : 10.1145/1148109.1148167.
ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development
2006.
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.
Coarse-grain coherence tracking: RegionScout and region coherence arrays
IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.
Log-based architectures for general-purpose monitoring of deployed code
2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.
Accelerating Database Operations Using a Network Processor
2005.
Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs
Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.
Store-Ordered Streaming of Shared Memory
2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.
DBmbench: fast and accurate database workload representation on modern microarchitecture
2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.
TRUSS: A Reliable, Scalable Server Architecture
IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.
TurboSMARTS: Accurate microarchitecture simulation sampling in minutes
2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.
Temporal Streaming of Shared Memory
2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.
A case for asymmetric-cell cache memories
IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.
ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"
2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.
Understanding the performance of concurrent error detecting superscalar microarchitectures
2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.
SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data
2004.
Accurate and complexity-effective spatial pattern prediction
2004. p. 276 - 287.
Memory coherence activity prediction in commercial workloads
2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.
The Third International Workshop on Power-Aware Computer Systems. Revised Papers.
2004.
An Evaluation of Stratified Sampling of Microarchitecture Simulations
2004.
The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers
2004.
TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes
2004.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
2004.
Efficient resource sharing in concurrent error detecting superscalar microarchitectures
2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture
Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.
Speculative Sequential Consistency with Little Custom Storage
Journal of Instruction-Level Parallelism. 2003.
Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches
2003.
Implicitly-multithreaded processors
2003. p. 39 - 50. DOI : 10.1145/859618.859624.
Near-optimal precharging in high-performance nanoscale CMOS caches
2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.
The Second International Workshop on Power-Aware Computer Systems. Revised Papers.
2003.
Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy
2002.
Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.
Speculative sequential consistency with little custom storage
2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.
Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication
Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.
JETTY: Filtering snoops for reduced energy consumption in SMP servers
2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.
Reference idempotency analysis: A framework for optimizing speculative execution
2001. p. 2 - 11. DOI : 10.1145/379539.379547.
The First International Workshop on Power-Aware Computer Systems. Revised Papers.
2001.
Dead-block prediction & dead-block correlating prefetchers
2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.
An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches
2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.
Reducing set-associative cache energy via way-prediction and selective direct-mapping
2001. p. 54 - 65.
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery
2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.
Reducing leakage in a high-performance deep-submicron instruction cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.
Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation
2001.
Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor
2001. p. 368 - 380. DOI : 10.1145/377792.377863.
Dynamic Feature Selection for Hardware Prediction
2000.
Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator
IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.
Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters
2000. p. 79 - 88. DOI : 10.1145/341800.341811.
Address partitioning in DSM clusters with parallel coherence controllers
2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.
Low-Overhead and High-Performance Implementations of Sequential Consistency
2000.
Selective, accurate, and timely self-invalidation using last-touch prediction
2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.
Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache
2000.
The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.
2000.
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.
Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor
2000.
Cacheable Interface Control Registers for High Speed Data Transfer
US5951657 . 1999.
Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols
1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.
Memory sharing predictor: the key to a speculative coherent DSM
1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.
Is SC + ILP = RC?
ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.
Is SC+ILP=RC?
1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.
Sirocco: cost-effective fine-grain distributed shared memory
1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.
Fine-grain Access Control for Distributed Shared Memory
Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.
Reactive NUMA: A design for unifying S-COMA and CC-NUMA
1997. p. 229 - 240. DOI : 10.1145/264107.264205.
Scheduling communication on an SMP node parallel machine
1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.
Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator
1997.
When does Dedicated Protocol Processing Make Sense?
1996.
Coherent network interfaces for fine-grain communication
1996. p. 247 - 258. DOI : 10.1145/232973.232999.
Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations
1996.
Cost/performance of a parallel computer simulator
1994. p. 173 - 182.
Fine-grain access control for distributed shared memory
1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.
Mechanisms for Cooperative Shared Memory
CMG Transactions. 1994. DOI : 10.1145/173682.165151.
Application-specific protocols for user-level shared memory
1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.
Kernel support for the Wisconsin Wind Tunnel
1993. p. 73 - 89.
Mechanisms for cooperative shared memory
1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.
Component Labeling Algorithms on an Intel iPSC/2 Hypercube
1990. p. 159 - 164.A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs
ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.QFlex 3.0: Fast and Accurate ARM Server Simulation
2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.Single-Address-Space FaaS with Jord
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.Avant-Garde: Empowering GPUs with Scaled Numeric Formats
2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.Constrained bit allocation for neural networks
2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.Rethinking IOMMU for Future IO Devices
2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.Effective Interplay Between Sparsity and Quantization: from Theory to Practice
2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.UrbanTwin: An urban digital twin for climate action
EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.Silicon Efficiency in Post-Moore Servers
2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.Server Architecture from Enterprise to Post-Moore
IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.Secure Interface Design Leveraging Hardware/Software Support
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.Electrical-Level Fault-Injection Attacks on FPGA-Based Systems
Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.What's Missing in Agile Hardware Design? Verification!
Journal Of Computer Science And Technology. 2023. DOI : 10.1007/s11390-023-0005-3.Scale-out Systolic Arrays
Acm Transactions On Architecture And Code Optimization. 2023. DOI : 10.1145/3572917.Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.Rebooting Virtual Memory with Midgard
Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.SecureCells: A Secure Compartmentalized Architecture
2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.
Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.
Imprecise Store Exceptions
2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.
AstriFlash: A Flash-Based System for Online Services
2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.
Hardware and Software Support for RPC-Centric Server Architecture
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.
Algorithms for Efficient and Robust Distributed Deep Learning
Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.
Equinox: Training (for Free) on a Custom Inference Accelerator
2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.
Cerebros: Evading the RPC Tax in Datacenters
2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.
Data transformer apparatus
US11748254 ; US2022327048 ; WO2021037341 . 2021.
Hardware-Software Co-Design of an RPC Processor
Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.
Rebooting Virtual Memory with Midgard
2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.
Exploiting Errors for Efficiency: A Survey from Circuits to Applications
ACM Computing Surveys. 2020. DOI : 10.1145/3394898.
Optimus Prime: Accelerating Data Transformation in Servers
2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.
SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
2020.
The NEBULA RPC-Optimized Architecture
2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.
ColTraIn: Co-located DNN training and inference
Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.
Distributed Logless Atomic Durability with Persistent Memory
2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.
Analog Neural Networks with Deep-submicron Nonlinear Synapses
IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.
SMoTherSpectre: Exploiting Speculative Execution through Port Contention
2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.
Design Guidelines for High-Performance SCM Hierarchies
2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.
Training DNNs with Hybrid Block Floating Point
2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.
Network-Compute Co-Design for Distributed In-Memory Computing
Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.
Atomic object reads for in-memory rack-scale computing
US10929174 ; US2018173673 . 2018.
Rack-Scale Memory Pooling for Datacenters
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.
Fat Caches For Scale-Out Servers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.
Unified prefetching into instruction cache and branch target buffer
US9996358 ; US2017090935 . 2017.
FPGAs versus GPUs in Data centers
IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.
Near-Memory Address Translation
Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.
Near-Memory Address Translation
2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.
The Mondrian Data Engine
2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.
Towards Near-Threshold Server Processors
2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.
Near-Memory Data Services
IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.
SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing
2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.
An Analysis of Load Imbalance in Scale-out Data Serving
2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.
The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.
Unlocking Energy
2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.
Shared Frontend for Manycore Server Processors
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.
Sort vs. Hash Join Revisited for Near-Memory Execution
2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Multi-Gigabyte On-Chip DRAM Caches for Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.
Accelerators for Data Processing
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.
Confluence: unified instruction supply for scale-out servers
2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.
Manycore Network Interfaces for In-Memory Rack-Scale Computing
2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.
Sort vs. Hash Join Revisited for Near-Memory Execution
5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.
Memory Systems and Interconnects for Scale-Out Servers
Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.
Asynchronous memory access chaining
Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.
A Case for Specialized Processors for Scale-Out Workloads
IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.
A Primer on Hardware Prefetching
Morgan & Claypool, 2014.
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.
FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.
Big Data
IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.
BuMP: Bulk Memory Access Prediction and Streaming
2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.
Scale-Out NUMA
2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.
Towards stable cloud performance
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.
Architectural Support to Accelerate Fine-Grain Program Monitoring
Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.
Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models
2014.
SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.
Meet the Walkers: Accelerating Index Traversals for In-Memory Databases
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.
Multi-Grain Coherence Directory
2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.
TOP PICKS FROM THE 2012 COMPUTER ARCHITECTURE CONFERENCES Introduction
IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache
2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.
Scale-Out Processors
Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.
DeSyRe: On-demand system reliability
Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.
BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring
2012.
Scale-Out Processors
2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.
Scale-Out Processors
2012.
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.
Dark Silicon Accelerators for Database Indexing
Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip
2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.
Optimizing Data-Center TCO with Scale-Out Processors
IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.
Dark Silicon Accelerators for Database Indexing
2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.
NOC-Out: Microarchitecting a Scale-Out Processor
2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips
2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.
Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware
2011.
Proactive Instruction Fetch
2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.
Cuckoo Directory: A Scalable Directory for Many-Core Systems
2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.
Toward Dark Silicon in Servers
IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.
Reliability in the Dark Silicon Era
2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.
TurboTag: Lookup Filtering to Reduce Coherence Directory Power
2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.
Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures
IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.
Making Address-Correlated Prefetching Practical
IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.
Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors
2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.
Practical Off-chip Meta-data for Temporal Memory Streaming
2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.
Flexible Hardware Acceleration for Instruction-Grain Lifeguards
IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.
Spatio-Temporal Memory Streaming
2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.
Workshop on Transactional Computing (TRANSACT 2008) - Introduction
ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.
Temporal streams in commercial server applications
2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.
Flexible hardware acceleration for instruction-grain program monitoring
2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.
A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs
2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.
Temporal instruction fetch streaming
2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.
Predictor virtualization
2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.
Last-touch correlated data streaming
2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.
An Analysis of Database System Performance on Chip Multiprocessors
2007.
Mechanisms for store-wait-free multiprocessors
2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.
PROTOFLEX: FPGA-accelerated hybrid functional simulator
2007. DOI : 10.1109/IPDPS.2007.370516.
Scheduling threads for constructive cache sharing on CMPs
2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.
PAI: A lightweight mechanism for single-node memory recovery in DSM servers
2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.
Database Servers on Chip Multiprocessors: Limitations and Opportunities
2007.
Multi-bit error tolerant caches using two-dimensional error coding
2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.
To Share or Not To Share?
2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.
Spatial Memory Streaming
2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.
The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors
2006.
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.
Simulation sampling with live-points
2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.
Dynamic feature selection for hardware prediction
Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.
Reunion: Complexity-effective multicore redundancy
2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.
ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development
2006.
Coarse-grain coherence tracking: RegionScout and region coherence arrays
IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.
Log-based architectures for general-purpose monitoring of deployed code
2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.
Parallel depth first vs. work stealing schedulers on CMP architectures
2006. DOI : 10.1145/1148109.1148167.
Accelerating Database Operations Using a Network Processor
2005.
Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs
Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.
Store-Ordered Streaming of Shared Memory
2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.
DBmbench: fast and accurate database workload representation on modern microarchitecture
2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.
TRUSS: A Reliable, Scalable Server Architecture
IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.
Understanding the performance of concurrent error detecting superscalar microarchitectures
2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.
A case for asymmetric-cell cache memories
IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.
Temporal Streaming of Shared Memory
2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.
TurboSMARTS: Accurate microarchitecture simulation sampling in minutes
2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.
ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"
2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.
Accurate and complexity-effective spatial pattern prediction
2004. p. 276 - 287.
Memory coherence activity prediction in commercial workloads
2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.
An Evaluation of Stratified Sampling of Microarchitecture Simulations
2004.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.
The Third International Workshop on Power-Aware Computer Systems. Revised Papers.
2004.
Efficient resource sharing in concurrent error detecting superscalar microarchitectures
2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.
The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers
2004.
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture
Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.
SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data
2004.
TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes
2004.
Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth
2004.
Speculative Sequential Consistency with Little Custom Storage
Journal of Instruction-Level Parallelism. 2003.
Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches
2003.
Implicitly-multithreaded processors
2003. p. 39 - 50. DOI : 10.1145/859618.859624.
Near-optimal precharging in high-performance nanoscale CMOS caches
2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.
The Second International Workshop on Power-Aware Computer Systems. Revised Papers.
2003.
Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy
2002.
Speculative sequential consistency with little custom storage
2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.
Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.
Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication
Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.
JETTY: Filtering snoops for reduced energy consumption in SMP servers
2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.
Reference idempotency analysis: A framework for optimizing speculative execution
2001. p. 2 - 11. DOI : 10.1145/379539.379547.
The First International Workshop on Power-Aware Computer Systems. Revised Papers.
2001.
Dead-block prediction & dead-block correlating prefetchers
2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.
An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches
2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.
Reducing set-associative cache energy via way-prediction and selective direct-mapping
2001. p. 54 - 65.
Reducing leakage in a high-performance deep-submicron instruction cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.
Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor
2001. p. 368 - 380. DOI : 10.1145/377792.377863.
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery
2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.
Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation
2001.
Dynamic Feature Selection for Hardware Prediction
2000.
Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator
IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.
Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters
2000. p. 79 - 88. DOI : 10.1145/341800.341811.
Address partitioning in DSM clusters with parallel coherence controllers
2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.
Low-Overhead and High-Performance Implementations of Sequential Consistency
2000.
Selective, accurate, and timely self-invalidation using last-touch prediction
2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.
Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache
2000.
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.
The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.
2000.
Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor
2000.
Cacheable Interface Control Registers for High Speed Data Transfer
US5951657 . 1999.
Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols
1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.
Memory sharing predictor: the key to a speculative coherent DSM
1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.
Is SC+ILP=RC?
1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.
Is SC + ILP = RC?
ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.
Sirocco: cost-effective fine-grain distributed shared memory
1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.
Fine-grain Access Control for Distributed Shared Memory
Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.
Reactive NUMA: A design for unifying S-COMA and CC-NUMA
1997. p. 229 - 240. DOI : 10.1145/264107.264205.
Scheduling communication on an SMP node parallel machine
1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.
Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator
1997.
When does Dedicated Protocol Processing Make Sense?
1996.
Coherent network interfaces for fine-grain communication
1996. p. 247 - 258. DOI : 10.1145/232973.232999.
Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations
1996.
Cost/performance of a parallel computer simulator
1994. p. 173 - 182.
Fine-grain access control for distributed shared memory
1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.
Mechanisms for Cooperative Shared Memory
CMG Transactions. 1994. DOI : 10.1145/173682.165151.
Application-specific protocols for user-level shared memory
1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.
Kernel support for the Wisconsin Wind Tunnel
1993. p. 73 - 89.
Mechanisms for cooperative shared memory
1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.
Component Labeling Algorithms on an Intel iPSC/2 Hypercube
1990. p. 159 - 164.
Teaching and PhD
Current PhD students
Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari
Supervised the EPFL theses of
Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud
Co-supervised the EPFL theses of
Stanko Novakovic, Tao Lin, Atri Bhattacharyya
Courses
Advanced multiprocessor architecture
CS-471
Multiprocessors are the foundation of computing systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (GPUs, TPUs, etc.), and datacenters. It includes a research project on post-Moore multiprocessors and datacenters.
Parallelism and concurrency in software
CS-302
From sensors to smartphones to the world's largest datacenters and supercomputers, parallelism and concurrency are ubiquitous in modern computing. Platforms also support many forms of parallel and concurrent execution.
Topics on Datacenter Design
CS-728