Babak Falsafi

EPFL IC IINFCOM PARSA
INJ 233 (Bâtiment INJ)
Station 14
1015 Lausanne

Expertise

Computer architecture, datacenter systems, cloud-native server architecture.
Babak is a Professor in the School of Computer and Communication Sciences and the founder of EcoCloud, an industrial/academic consortium at EPFL investigating scalable sustainable information technology. He has made numerous contributions to computer system design and evaluation including a scalable multiprocessor architecture which was prototyped by Sun Microsystems (now Oracle), snoop filters incorporated into multi-socket x86 servers and IBM BlueGene supercomputers, spatial and temporal memory streaming that appear in ARM cores, and computer system performance evaluation methodologies that have been in use by AMD, HP and Google PerfKit . He has shown that hardware memory consistency models are neither necessary (in the 90's) nor sufficient (a decade later) to achieve high performance in servers. These results eventually led to fence speculation in modern CPUs. His work on cloud-native CPUs laid the foundation for the first generation of Cavium ARM server CPUs, ThunderX. He is a recipient of an NSF CAREER award, IBM Faculty Partnership Awards, and an Alfred P. Sloan Research Fellowship. He is a fellow of ACM and IEEE.

NEWS

Online services are stuck in memory and DRAM is not scaling. AstriFlash at HPCA'23 presents a system to serve data directly out of Flash, reducing memory cost by 20x and meeting ms-scale SLO for online services at 95% of throughput compared to DRAM.
Network bandwidth is projected to grow at 20% a year for a decade thanks to optics. Logic density is lagging behind at 15% a year and slowing down resulting a "datacenter tax". Optimus Prime a data transformation accelerator, NebuLA a hardware-terminated network stack, and Cerebros an RPC processor are examples of how to mitigate the datacenter tax in the post-Moore era. Great to see that Google has followed up with their own data transformation accelerator in 2022.
See our paper on "Rebooting Virtual Memory with Midgard" for a novel approach to future-proof virtual memory. Here is a news snippet.
Numerical training of DNNs is converging on fixed point with orders of magnitude improvement in logic, memory, power and bandwidth. See our blog.

RESEARCH

Data has emerged as a currency for modern society and datacenters are now the backbone of IT offering large-scale cloud services at low costs benefiting from and exploiting the economies of scale. With silicon efficiency scaling having dwindled since 2004 and silicon density scaling, Moore's Law, slowing down, future digital platforms will rely on heterogeneous logic and memory to allow for IT scalability. Meanwhile, the demand for large-scale cloud services has grown dramatically faster than conventional silicon scaling making IT platform scalability a grand challenge. Future platforms will need hand-in-hand collaboration of application domain experts and platform designers to improve scalability. With many online services being in-memory and the minimum communication latency between the farthest nodes being microseconds, future server platforms will go through revolutionary changes in architecture and systems to enable seamless aggregation of logic and memory resources across nodes, breaking the conventional abstraction layers. Babak's research and educational activities center around post-Moore server design.
He investigates techniques to address these challenges in the context of the following projects:
  • CloudSuite: A Benchmark Suite for Scale-Out Workloads
  • ColTraIn: Co-Located Training and Inference DNN Accelerators
  • HARNESS: Heterogeneous Architectures for Next-Generation Server Systems
  • Midgard: Future-Proofing Virtual Memory
  • QFlex: Fast, Full-System Open-Source Server Simulation/Emulation
  • VISA: Cloud-Native CPUs

Selected Talks

Awards

Elected Fellow of Association for Computing Machinery (ACM)

2015

Elected Fellow of the Institute of Electrical and Electronics Engineers

2012

Sloan Research Fellowship

Alfred P. Sloan Foundation

2004

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (. (Nematollahi zadeh) MahaniH. FalahatiS. DarabiA. Javadi-NezhadY. Oh  et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. LinA. AnsariA. ChakrabortyB. EryilmazY. Li  et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Single-Address-Space FaaS with Jord

Y. LiA. BhattacharyyaM. KumarA. BhattacharjeeYoav Etsion  et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. GilD. HaS. B. HarmaM. K. YoonB. Falsafi  et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. BoudouhS. B. HarmaA. MahmoudB. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. KumarY. LiY. EtsionA. BhattacharjeeA. Basu  et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. HarmaA. ChakrabortyE. KostenokD. MishinD. Ha  et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. ConstantinescuV. KartschY. NakatsukaP. WieseP. Orbanovik  et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. AnsariS. LinA. ChakrabortyM. AlianB. Eryilmaz  et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. PayerB. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Server Architecture from Enterprise to Post-Moore

B. FalsafiM. FerdmanB. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. FalsafiM. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal Of Computer Science And Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. YuzugulerC. SonmezM. DrumondY. OhB. Falsafi  et al.

Acm Transactions On Architecture And Code Optimization. 2023. DOI : 10.1145/3572917.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. FalsafiM. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Rebooting Virtual Memory with Midgard

S. Gupta / B. FalsafiA. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

AstriFlash: A Flash-Based System for Online Services

S. GuptaY. OhL. YanM. J. SutherlandA. Bhattacharjee  et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Imprecise Store Exceptions

S. GuptaY. LiQ. KangA. BhattacharjeeB. Falsafi  et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. SutherlandB. FalsafiA. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

SecureCells: A Secure Compartmentalized Architecture

A. BhattacharyyaF. HofhammerY. LiS. GuptaA. Sánchez Marín  et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. JaggiB. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. FalsafiA. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De OliveiraL. CoulonA. Pourhabibi ZarandiA. C. YüzügülerB. Falsafi  et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi ZarandiM. J. SutherlandA. DaglisB. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Data transformer apparatus

A. Pourhabibi ZarandiS. GuptaH. KassirM. SutherlandZ. Tian  et al.

US11748254 ; US2022327048 ; WO2021037341 . 2021.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Rebooting Virtual Memory with Midgard

S. GuptaA. BhattacharyyaY. OhA. BhattacharjeeB. Falsafi  et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-MarbellA. AlaghiM. CarbinE. DarulovaL. Dolecek  et al.

Acm Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. PicorelS. A. S. KohroudiZ. YanA. BhattacharjeeB. Falsafi  et al.

2020

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi ZarandiS. GuptaH. KassirM. J. SutherlandZ. Tian  et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

The NEBULA RPC-Optimized Architecture

M. SutherlandS. GuptaB. FalsafiV. MaratheD. Pnevmatikatos  et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May, 30th - June, 3rd 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. FalsafiM. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

Distributed Logless Atomic Durability with Persistent Memory

S. GuptaA. DaglisB. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. YüzügülerF. ÇelikM. P. Drumond Lages De OliveiraB. FalsafiP. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. BhattacharyyaA. SandulescuM. NeugschwandtnerA. SorniottiB. Falsafi  et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CSS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Design Guidelines for High-Performance SCM Hierarchies

D. UstiugovA. DaglisJ. Picorel ObandoM. J. SutherlandE. Bugnion  et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Training DNNs with Hybrid Block Floating Point

M. DrumondT. LinM. JaggiB. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, CANADA, Dec 02-08, 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. FalsafiE. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Atomic object reads for in-memory rack-scale computing

A. DaglisB. R. GrotB. Falsafi

US10929174 ; US2018173673 . 2018.

FPGAs versus GPUs in Data centers

B. FalsafiB. DallyD. SinghD. ChiouJ. J. Yi  et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Unified prefetching into instruction cache and branch target buffer

B. FalsafiI. C. KaynakB. R. Grot

US9996358 ; US2017090935 . 2017.

Fat Caches For Scale-Out Servers

S. VolosD. JevdjicB. FalsafiB. Grot

Ieee Micro. 2017. DOI : 10.1109/MM.2017.32.

Near-Memory Address Translation

J. PicorelD. JevdjicB. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, SEP 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

The Mondrian Data Engine

M. P. Drumond Lages De OliveiraA. DaglisN. MirzadehD. UstiugovJ. Picorel Obando  et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. BugnionB. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. DaglisD. UstiugovS. NovakovicE. BugnionB. Falsafi  et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Towards Near-Threshold Server Processors

A. PahlevanJ. Picorel ObandoA. Pourhabibi ZarandiD. RossiM. Zapater Sancho  et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Unlocking Energy

B. FalsafiR. GuerraouiJ. Picorel ObandoV. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

Near-Memory Data Services

B. FalsafiM. StanK. SkadronN. JayasenaY. Chen  et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

An Analysis of Load Imbalance in Scale-out Data Serving

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. DaglisS. NovakovicE. BugnionB. FalsafiB. Grot

2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Confluence: unified instruction supply for scale-out servers

C. KaynakB. GrotB. Falsafi

2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Asynchronous memory access chaining

O. KocberberB. FalsafiB. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. MirzadehY. O. KoçberberB. FalsafiB. Grot

5th Workshop on Architectures and Systems for Big Data ( ASBD 2015 ), Portland, Oregon, USA, June 13, 2015.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. FalsafiB. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. MirzadehO. KocberberB. FalsafiB. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

BuMP: Bulk Memory Access Prediction and Streaming

S. VolosJ. PicorelB. FalsafiB. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. JevdjicG. H. LohC. KaynakB. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Big Data

B. FalsafiB. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

A Primer on Hardware Prefetching

B. FalsafiT. F. Wenisch

Morgan & Claypool, 2014.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. VlachosS. FytrakiP. B. GibbonsM. A. KozuchB. Falsafi

2014

Towards stable cloud performance

D. Novakovic / B. FalsafiD. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. FytrakiE. VlachosO. KocberberB. FalsafiB. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

A Case for Specialized Processors for Scale-Out Workloads

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

TOP PICKS FROM THE 2012 COMPUTER ARCHITECTURE CONFERENCES Introduction

B. FalsafiG. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. KaynakB. GrotB. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. KocberberB. GrotJ. PicorelB. FalsafiK. Lim  et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

DeSyRe: On-demand system reliability

I. SourdisC. StrydisA. ArmatoC. S. BouganisB. Falsafi  et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Multi-Grain Coherence Directory

J. ZebchukB. FalsafiA. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. JevdjicS. VolosB. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

Dark Silicon Accelerators for Database Indexing

O. KoçberberB. FalsafiK. LimP. RanganathanS. Harizopoulos

Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. MilojevicS. IdgunjiD. JevdjicE. OzerP. Lotfi-Kamran  et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

Scale-Out Processors

P. Lotfi-KamranB. GrotM. FerdmanS. VolosO. Kocberber  et al.

2012

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. FytrakiO. KocberberE. VlachosJ. B. SartorB. Grot  et al.

2012

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. VolosC. SeiculescuB. GrotN. Khosro PourB. Falsafi  et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Dark Silicon Accelerators for Database Indexing

O. KocberberB. FalsafiK. LimP. RanganathanS. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Scale-Out Processors

P. Lotfi-KamranB. GrotM. FerdmanS. VolosO. Kocberber  et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-KamranB. GrotB. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Optimizing Data-Center TCO with Scale-Out Processors

B. GrotD. HardyP. Lotfi-KamranB. FalsafiC. Nicopoulos  et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

Proactive Instruction Fetch

M. FerdmanC. KaynakB. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

2011

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Toward Dark Silicon in Servers

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. FerdmanP. Lotfi-KamranK. BaletB. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. SeiculescuS. VolosN. Khosro PourB. FalsafiG. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-KamranM. FerdmanD. CrisanB. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Making Address-Correlated Prefetching Practical

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. VlachosM. L. GoodsteinM. A. KozuchS. ChenB. Falsafi  et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. ChenM. KozuchP. B. GibbonsM. RyanT. Strigkos  et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Spatio-Temporal Memory Streaming

S. SomogyiT. F. WenischA. AilamakiB. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. ChungM. K. PapamichaelE. NurvitadhiJ. C. HoeB. Falsafi  et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. GoldB. FalsafiJ. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. FalsafiP. Felber

Acm Sigplan Notices. 2008. DOI : 10.1145/1402227.1402233.

Temporal streams in commercial server applications

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Flexible hardware acceleration for instruction-grain program monitoring

S. ChenM. KozuchT. StrigkosB. FalsafiP. B. Gibbons  et al.

2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Predictor virtualization

I. BurceaS. SomogyiA. MoshovosB. Falsafi

2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Temporal instruction fetch streaming

M. FerdmanT. F. WenischA. AilamakiB. FalsafiA. Moshovos

2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. ChungE. NurvitadhiJ. C. HoeB. FalsafiK. Mai

2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Mechanisms for store-wait-free multiprocessors

T. F. WenischA. AilamakiB. FalsafiA. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. KimJ. C. SmolensB. FalsafiJ. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

An Analysis of Database System Performance on Chip Multiprocessors

N. HardavellasI. PandisR. JohnsonN. MancherilA. Ailamaki  et al.

2007.

Last-touch correlated data streaming

M. FerdmanB. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

To Share or Not To Share?

R. JohnsonN. HardavellasI. PandisN. MancherilS. Harizopoulos  et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. ChungE. NurvitadhiJ. C. HoeB. FalsafiK. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Scheduling threads for constructive cache sharing on CMPs

S. ChenP. B. GibbonsM. KozuchV. LiaskovitisA. Ailamaki  et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

Multi-bit error tolerant caches using two-dimensional error coding

J. KimN. HardavellasK. MaiB. FalsafiJ. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. HardavellasI. PandisR. JohnsonN. MancherilA. Ailamaki  et al.

2007.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. ChungJ. C. HoeB. Falsafi

2006.

Reunion: Complexity-effective multicore redundancy

J. C. SmolensB. T. GoldB. FalsafiJ. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Statistical sampling of microarchitecture simulation

R. E. WunderlichT. F. WenischB. FalsafiJ. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Dynamic feature selection for hardware prediction

A. FernR. GivanB. FalsafiT. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. CantinJ. E. SmithM. H. LipastiA. MoshovosB. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Log-based architectures for general-purpose monitoring of deployed code

S. ChenB. FalsafiP. B. GibbonsM. KozuchT. C. Mowry  et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. KimC.-L. OoiR. EigenmannB. FalsafiT. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. LiaskovitisS. ChenP. B. GibbonsA. AilamakiG. E. Blelloch  et al.

2006. DOI : 10.1145/1148109.1148167.

Spatial Memory Streaming

S. SomogyiT. F. WenischA. AilamakiB. FalsafiA. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

Simulation sampling with live-points

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. GoldJ. C. SmolensB. FalsafiJ. C. Hoe

2006.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. ShaoA. AilamakiB. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. SmolensK. JangwooJ. C. HoeB. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

Accelerating Database Operations Using a Network Processor

B. T. GoldA. AilamakiL. HustonB. Falsafi

2005.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. FalsafiD. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

A case for asymmetric-cell cache memories

A. MoshovosB. FalsafiF. N. NajmN. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Temporal Streaming of Shared Memory

T. F. WenischS. SomogyiN. HardavellasJ. KimA. Ailamaki  et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

TRUSS: A Reliable, Scalable Server Architecture

B. T. GoldJ. KimJ. C. SmolensE. ChungV. Liaskovitis  et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Store-Ordered Streaming of Shared Memory

T. F. WenischS. SomogyiN. HardavellasJ. KimC. Gniady  et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. ParkA. MoshovosB. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. WunderlichT. F. WenischB. FalsafiJ. C. Hoe

2004.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. SmolensB. T. GoldJ. KimB. FalsafiJ. C. Hoe  et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. SmolensJ. KimJ. C. HoeB. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minute

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. FalsafiT. Vijaykumar

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. WenischS. SomogyiN. HardavellasJ. KimC. Gniady  et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. SmolensB. GoldJ. KimB. FalsafiJ. C. Hoe  et al.

2004.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. FalsafiT. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. HardavellasS. SomogyiT. F. WenischR. E. WunderlichS. Chen  et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Accurate and complexity-effective spatial pattern prediction

C. F. ChenS.-H. YangB. FalsafiA. Moshovos

2004. p. 276 - 287.

Memory coherence activity prediction in commercial workloads

S. SomogyiT. F. WenischN. HardavellasJ. KimA. Ailamaki  et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. YangB. Falsafi

2003.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. FalsafiT. Vijaykumar

2003

Implicitly-multithreaded processors

I. ParkB. FalsafiT. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. YangB. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Speculative Sequential Consistency with Little Custom Storage

C. GniadyB. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. LaiB. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. YangM. D. PowellB. FalsafiT. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep- Submicron Cache Energy

S.-H. YangB. Falsafi

2002.

Speculative sequential consistency with little custom storage

C. GniadyB. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. FalsafiT. Vijaykumar

2001

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2001

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. PowellS.-H. YangB. FalsafiK. RoyT. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. RayJ. C. HoeB. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Dead-block prediction & dead-block correlating prefetchers

A.-C. LaiC. FideB. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. OoiS. W. KimI. ParkR. EigenmannB. Falsafi  et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. MoshovosG. MemikB. FalsafiA. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. PowellA. AgarwalT. N. VijaykumarB. FalsafiK. Roy

2001. p. 54 - 65.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. KimC.-L. OoiR. EigenmannB. FalsafiT. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. LaiB. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Address partitioning in DSM clusters with parallel coherence controllers

I. PragaspathyB. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Gated-Vdd: a circuit technique to reduce leakage in deep- submicron cache memories

M. D. PowellS.-H. YangB. FalsafiK. RoyT. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. FalsafiM. Lauria

2000

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. KimC.-L. OoiI. ParkR. EigenmannB. Falsafi  et al.

2000

Dynamic Feature Selection for Hardware Prediction

A. FernR. GivanB. FalsafiT. N. Vijaykumar

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. MukherjeeS. K. ReinhardtB. FalsafiM. LitzkowM. D. Hill  et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. LaiB. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. GniadyB. Falsafi

2000.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2000

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. LaiB. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. FalsafiD. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Is SC + ILP = RC?

C. GniadyB. FalsafiT. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Is SC+ILP=RC?

C. GuiadyB. FalsafiT. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. WoodS. K. ReinhardtS. S. MukherjeeB. FalsafiM. D. Hill  et al.

US5951657 . 1999.

Sirocco: cost-effective fine-grain distributed shared memory

I. SchoinasB. FalsafiM. D. HillJ. R. LarusD. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. FalsafiD. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Fine-grain Access Control for Distributed Shared Memory

I. SchoinasB. FalsafiA. R. LebeckS. K. ReinhardtJ. R. Larus  et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Modeling cost/performance of a parallel computer simulator

B. FalsafiD. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. MukherjeeS. K. ReinhardtB. FalsafiM. LitzkowS. Huss-Lederman  et al.

1997.

Scheduling communication on an SMP node parallel machine

B. FalsafiD. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. SchoinasB. FalsafiM. D. HillJ. R. LarusC. E. Lukas  et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. FalsafiD. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. MukherjeeB. FalsafiM. D. HillD. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Application-specific protocols for user-level shared memory

B. FalsafiA. R. LebeckS. K. ReinhardtI. SchoinasM. D. Hill  et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Fine-grain access control for distributed shared memory

I. SchoinasB. FalsafiA. R. LebeckS. K. ReinhardtJ. R. Larus  et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Cost/performance of a parallel computer simulator

B. FalsafiD. A. Wood

1994. p. 173 - 182.

Mechanisms for Cooperative Shared Memory

D. A. WoodS. ChandraB. FalsafiM. D. HillJ. R. Larus  et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. ReinhardtB. FalsafiD. A. Wood

1993. p. 73 - 89.

Mechanisms for cooperative shared memory

D. A. WoodS. ChandraB. FalsafiM. D. HillJ. R. Larus  et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. FalsafiR. Miller

1990. p. 159 - 164.

Infoscience

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (. (Nematollahi zadeh) MahaniH. FalahatiS. DarabiA. Javadi-NezhadY. Oh  et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. LinA. AnsariA. ChakrabortyB. EryilmazY. Li  et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Single-Address-Space FaaS with Jord

Y. LiA. BhattacharyyaM. KumarA. BhattacharjeeYoav Etsion  et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. GilD. HaS. B. HarmaM. K. YoonB. Falsafi  et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. BoudouhS. B. HarmaA. MahmoudB. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. KumarY. LiY. EtsionA. BhattacharjeeA. Basu  et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. HarmaA. ChakrabortyE. KostenokD. MishinD. Ha  et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. ConstantinescuV. KartschY. NakatsukaP. WieseP. Orbanovik  et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. AnsariS. LinA. ChakrabortyM. AlianB. Eryilmaz  et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. PayerB. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Server Architecture from Enterprise to Post-Moore

B. FalsafiM. FerdmanB. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. FalsafiM. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal Of Computer Science And Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. YuzugulerC. SonmezM. DrumondY. OhB. Falsafi  et al.

Acm Transactions On Architecture And Code Optimization. 2023. DOI : 10.1145/3572917.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. FalsafiM. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Rebooting Virtual Memory with Midgard

S. Gupta / B. FalsafiA. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

AstriFlash: A Flash-Based System for Online Services

S. GuptaY. OhL. YanM. J. SutherlandA. Bhattacharjee  et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Imprecise Store Exceptions

S. GuptaY. LiQ. KangA. BhattacharjeeB. Falsafi  et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. SutherlandB. FalsafiA. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

SecureCells: A Secure Compartmentalized Architecture

A. BhattacharyyaF. HofhammerY. LiS. GuptaA. Sánchez Marín  et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. JaggiB. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. FalsafiA. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De OliveiraL. CoulonA. Pourhabibi ZarandiA. C. YüzügülerB. Falsafi  et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi ZarandiM. J. SutherlandA. DaglisB. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Data transformer apparatus

A. Pourhabibi ZarandiS. GuptaH. KassirM. SutherlandZ. Tian  et al.

US11748254 ; US2022327048 ; WO2021037341 . 2021.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Rebooting Virtual Memory with Midgard

S. GuptaA. BhattacharyyaY. OhA. BhattacharjeeB. Falsafi  et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-MarbellA. AlaghiM. CarbinE. DarulovaL. Dolecek  et al.

Acm Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. PicorelS. A. S. KohroudiZ. YanA. BhattacharjeeB. Falsafi  et al.

2020

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi ZarandiS. GuptaH. KassirM. J. SutherlandZ. Tian  et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

The NEBULA RPC-Optimized Architecture

M. SutherlandS. GuptaB. FalsafiV. MaratheD. Pnevmatikatos  et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May, 30th - June, 3rd 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. FalsafiM. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

Distributed Logless Atomic Durability with Persistent Memory

S. GuptaA. DaglisB. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. YüzügülerF. ÇelikM. P. Drumond Lages De OliveiraB. FalsafiP. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. BhattacharyyaA. SandulescuM. NeugschwandtnerA. SorniottiB. Falsafi  et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CSS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Design Guidelines for High-Performance SCM Hierarchies

D. UstiugovA. DaglisJ. Picorel ObandoM. J. SutherlandE. Bugnion  et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Training DNNs with Hybrid Block Floating Point

M. DrumondT. LinM. JaggiB. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, CANADA, Dec 02-08, 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. FalsafiE. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Atomic object reads for in-memory rack-scale computing

A. DaglisB. R. GrotB. Falsafi

US10929174 ; US2018173673 . 2018.

FPGAs versus GPUs in Data centers

B. FalsafiB. DallyD. SinghD. ChiouJ. J. Yi  et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Unified prefetching into instruction cache and branch target buffer

B. FalsafiI. C. KaynakB. R. Grot

US9996358 ; US2017090935 . 2017.

Fat Caches For Scale-Out Servers

S. VolosD. JevdjicB. FalsafiB. Grot

Ieee Micro. 2017. DOI : 10.1109/MM.2017.32.

Near-Memory Address Translation

J. PicorelD. JevdjicB. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, SEP 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

The Mondrian Data Engine

M. P. Drumond Lages De OliveiraA. DaglisN. MirzadehD. UstiugovJ. Picorel Obando  et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. BugnionB. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. DaglisD. UstiugovS. NovakovicE. BugnionB. Falsafi  et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Towards Near-Threshold Server Processors

A. PahlevanJ. Picorel ObandoA. Pourhabibi ZarandiD. RossiM. Zapater Sancho  et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Unlocking Energy

B. FalsafiR. GuerraouiJ. Picorel ObandoV. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

Near-Memory Data Services

B. FalsafiM. StanK. SkadronN. JayasenaY. Chen  et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

An Analysis of Load Imbalance in Scale-out Data Serving

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. DaglisS. NovakovicE. BugnionB. FalsafiB. Grot

2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Confluence: unified instruction supply for scale-out servers

C. KaynakB. GrotB. Falsafi

2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Asynchronous memory access chaining

O. KocberberB. FalsafiB. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. MirzadehY. O. KoçberberB. FalsafiB. Grot

5th Workshop on Architectures and Systems for Big Data ( ASBD 2015 ), Portland, Oregon, USA, June 13, 2015.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. FalsafiB. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. MirzadehO. KocberberB. FalsafiB. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

BuMP: Bulk Memory Access Prediction and Streaming

S. VolosJ. PicorelB. FalsafiB. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. JevdjicG. H. LohC. KaynakB. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Big Data

B. FalsafiB. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

A Primer on Hardware Prefetching

B. FalsafiT. F. Wenisch

Morgan & Claypool, 2014.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. VlachosS. FytrakiP. B. GibbonsM. A. KozuchB. Falsafi

2014

Towards stable cloud performance

D. Novakovic / B. FalsafiD. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. FytrakiE. VlachosO. KocberberB. FalsafiB. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

A Case for Specialized Processors for Scale-Out Workloads

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

TOP PICKS FROM THE 2012 COMPUTER ARCHITECTURE CONFERENCES Introduction

B. FalsafiG. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. KaynakB. GrotB. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. KocberberB. GrotJ. PicorelB. FalsafiK. Lim  et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

DeSyRe: On-demand system reliability

I. SourdisC. StrydisA. ArmatoC. S. BouganisB. Falsafi  et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Multi-Grain Coherence Directory

J. ZebchukB. FalsafiA. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. JevdjicS. VolosB. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

Dark Silicon Accelerators for Database Indexing

O. KoçberberB. FalsafiK. LimP. RanganathanS. Harizopoulos

Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. MilojevicS. IdgunjiD. JevdjicE. OzerP. Lotfi-Kamran  et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

Scale-Out Processors

P. Lotfi-KamranB. GrotM. FerdmanS. VolosO. Kocberber  et al.

2012

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. FytrakiO. KocberberE. VlachosJ. B. SartorB. Grot  et al.

2012

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. VolosC. SeiculescuB. GrotN. Khosro PourB. Falsafi  et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Dark Silicon Accelerators for Database Indexing

O. KocberberB. FalsafiK. LimP. RanganathanS. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Scale-Out Processors

P. Lotfi-KamranB. GrotM. FerdmanS. VolosO. Kocberber  et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-KamranB. GrotB. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Optimizing Data-Center TCO with Scale-Out Processors

B. GrotD. HardyP. Lotfi-KamranB. FalsafiC. Nicopoulos  et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

Proactive Instruction Fetch

M. FerdmanC. KaynakB. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

2011

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Toward Dark Silicon in Servers

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. FerdmanP. Lotfi-KamranK. BaletB. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. SeiculescuS. VolosN. Khosro PourB. FalsafiG. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-KamranM. FerdmanD. CrisanB. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Making Address-Correlated Prefetching Practical

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. VlachosM. L. GoodsteinM. A. KozuchS. ChenB. Falsafi  et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. ChenM. KozuchP. B. GibbonsM. RyanT. Strigkos  et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Spatio-Temporal Memory Streaming

S. SomogyiT. F. WenischA. AilamakiB. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. ChungM. K. PapamichaelE. NurvitadhiJ. C. HoeB. Falsafi  et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. GoldB. FalsafiJ. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. FalsafiP. Felber

Acm Sigplan Notices. 2008. DOI : 10.1145/1402227.1402233.

Temporal streams in commercial server applications

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Flexible hardware acceleration for instruction-grain program monitoring

S. ChenM. KozuchT. StrigkosB. FalsafiP. B. Gibbons  et al.

2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Predictor virtualization

I. BurceaS. SomogyiA. MoshovosB. Falsafi

2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Temporal instruction fetch streaming

M. FerdmanT. F. WenischA. AilamakiB. FalsafiA. Moshovos

2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. ChungE. NurvitadhiJ. C. HoeB. FalsafiK. Mai

2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Mechanisms for store-wait-free multiprocessors

T. F. WenischA. AilamakiB. FalsafiA. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. KimJ. C. SmolensB. FalsafiJ. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

An Analysis of Database System Performance on Chip Multiprocessors

N. HardavellasI. PandisR. JohnsonN. MancherilA. Ailamaki  et al.

2007.

Last-touch correlated data streaming

M. FerdmanB. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

To Share or Not To Share?

R. JohnsonN. HardavellasI. PandisN. MancherilS. Harizopoulos  et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. ChungE. NurvitadhiJ. C. HoeB. FalsafiK. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Scheduling threads for constructive cache sharing on CMPs

S. ChenP. B. GibbonsM. KozuchV. LiaskovitisA. Ailamaki  et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

Multi-bit error tolerant caches using two-dimensional error coding

J. KimN. HardavellasK. MaiB. FalsafiJ. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. HardavellasI. PandisR. JohnsonN. MancherilA. Ailamaki  et al.

2007.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. ChungJ. C. HoeB. Falsafi

2006.

Reunion: Complexity-effective multicore redundancy

J. C. SmolensB. T. GoldB. FalsafiJ. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Statistical sampling of microarchitecture simulation

R. E. WunderlichT. F. WenischB. FalsafiJ. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Dynamic feature selection for hardware prediction

A. FernR. GivanB. FalsafiT. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. CantinJ. E. SmithM. H. LipastiA. MoshovosB. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Log-based architectures for general-purpose monitoring of deployed code

S. ChenB. FalsafiP. B. GibbonsM. KozuchT. C. Mowry  et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. KimC.-L. OoiR. EigenmannB. FalsafiT. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. LiaskovitisS. ChenP. B. GibbonsA. AilamakiG. E. Blelloch  et al.

2006. DOI : 10.1145/1148109.1148167.

Spatial Memory Streaming

S. SomogyiT. F. WenischA. AilamakiB. FalsafiA. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

Simulation sampling with live-points

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. GoldJ. C. SmolensB. FalsafiJ. C. Hoe

2006.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. ShaoA. AilamakiB. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. SmolensK. JangwooJ. C. HoeB. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

Accelerating Database Operations Using a Network Processor

B. T. GoldA. AilamakiL. HustonB. Falsafi

2005.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. FalsafiD. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

A case for asymmetric-cell cache memories

A. MoshovosB. FalsafiF. N. NajmN. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Temporal Streaming of Shared Memory

T. F. WenischS. SomogyiN. HardavellasJ. KimA. Ailamaki  et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

TRUSS: A Reliable, Scalable Server Architecture

B. T. GoldJ. KimJ. C. SmolensE. ChungV. Liaskovitis  et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Store-Ordered Streaming of Shared Memory

T. F. WenischS. SomogyiN. HardavellasJ. KimC. Gniady  et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. ParkA. MoshovosB. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. WunderlichT. F. WenischB. FalsafiJ. C. Hoe

2004.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. SmolensB. T. GoldJ. KimB. FalsafiJ. C. Hoe  et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. SmolensJ. KimJ. C. HoeB. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minute

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. FalsafiT. Vijaykumar

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. WenischS. SomogyiN. HardavellasJ. KimC. Gniady  et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. SmolensB. GoldJ. KimB. FalsafiJ. C. Hoe  et al.

2004.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. FalsafiT. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. HardavellasS. SomogyiT. F. WenischR. E. WunderlichS. Chen  et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Accurate and complexity-effective spatial pattern prediction

C. F. ChenS.-H. YangB. FalsafiA. Moshovos

2004. p. 276 - 287.

Memory coherence activity prediction in commercial workloads

S. SomogyiT. F. WenischN. HardavellasJ. KimA. Ailamaki  et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. YangB. Falsafi

2003.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. FalsafiT. Vijaykumar

2003

Implicitly-multithreaded processors

I. ParkB. FalsafiT. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. YangB. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Speculative Sequential Consistency with Little Custom Storage

C. GniadyB. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. LaiB. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. YangM. D. PowellB. FalsafiT. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep- Submicron Cache Energy

S.-H. YangB. Falsafi

2002.

Speculative sequential consistency with little custom storage

C. GniadyB. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. FalsafiT. Vijaykumar

2001

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2001

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. PowellS.-H. YangB. FalsafiK. RoyT. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. RayJ. C. HoeB. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Dead-block prediction & dead-block correlating prefetchers

A.-C. LaiC. FideB. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. OoiS. W. KimI. ParkR. EigenmannB. Falsafi  et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. MoshovosG. MemikB. FalsafiA. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. PowellA. AgarwalT. N. VijaykumarB. FalsafiK. Roy

2001. p. 54 - 65.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. KimC.-L. OoiR. EigenmannB. FalsafiT. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. LaiB. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Address partitioning in DSM clusters with parallel coherence controllers

I. PragaspathyB. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Gated-Vdd: a circuit technique to reduce leakage in deep- submicron cache memories

M. D. PowellS.-H. YangB. FalsafiK. RoyT. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. FalsafiM. Lauria

2000

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. KimC.-L. OoiI. ParkR. EigenmannB. Falsafi  et al.

2000

Dynamic Feature Selection for Hardware Prediction

A. FernR. GivanB. FalsafiT. N. Vijaykumar

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. MukherjeeS. K. ReinhardtB. FalsafiM. LitzkowM. D. Hill  et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. LaiB. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. GniadyB. Falsafi

2000.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2000

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. LaiB. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. FalsafiD. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Is SC + ILP = RC?

C. GniadyB. FalsafiT. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Is SC+ILP=RC?

C. GuiadyB. FalsafiT. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. WoodS. K. ReinhardtS. S. MukherjeeB. FalsafiM. D. Hill  et al.

US5951657 . 1999.

Sirocco: cost-effective fine-grain distributed shared memory

I. SchoinasB. FalsafiM. D. HillJ. R. LarusD. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. FalsafiD. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Fine-grain Access Control for Distributed Shared Memory

I. SchoinasB. FalsafiA. R. LebeckS. K. ReinhardtJ. R. Larus  et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Modeling cost/performance of a parallel computer simulator

B. FalsafiD. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. MukherjeeS. K. ReinhardtB. FalsafiM. LitzkowS. Huss-Lederman  et al.

1997.

Scheduling communication on an SMP node parallel machine

B. FalsafiD. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. SchoinasB. FalsafiM. D. HillJ. R. LarusC. E. Lukas  et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. FalsafiD. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. MukherjeeB. FalsafiM. D. HillD. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Application-specific protocols for user-level shared memory

B. FalsafiA. R. LebeckS. K. ReinhardtI. SchoinasM. D. Hill  et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Fine-grain access control for distributed shared memory

I. SchoinasB. FalsafiA. R. LebeckS. K. ReinhardtJ. R. Larus  et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Cost/performance of a parallel computer simulator

B. FalsafiD. A. Wood

1994. p. 173 - 182.

Mechanisms for Cooperative Shared Memory

D. A. WoodS. ChandraB. FalsafiM. D. HillJ. R. Larus  et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. ReinhardtB. FalsafiD. A. Wood

1993. p. 73 - 89.

Mechanisms for cooperative shared memory

D. A. WoodS. ChandraB. FalsafiM. D. HillJ. R. Larus  et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. FalsafiR. Miller

1990. p. 159 - 164.

Teaching & PhD

PhD Students

Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari

Past EPFL PhD Students

Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud

Past EPFL PhD Students as codirector

Stanko Novakovic, Tao Lin, Atri Bhattacharyya

Courses

Advanced multiprocessor architecture

CS-471

Multiprocessors are basic building blocks for all computer systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (e.g., GPU, TPU), and datacenters. It includes a research project on multiprocessors and post-Moore era datacenters.

Parallelism and concurrency in software

CS-302

From sensors,to smart phones,to the world's largest datacenters and supercomputers, parallelism & concurrency is ubiquitous in modern computing.There are also many forms of parallel & concurrent execution in modern platforms with varying degrees of ease of programmability,performance & efficiency.

Topics in Machine Learning Systems

CS-723

This course will cover the latest technologies, platforms and research contributions in the area of machine learning systems. The students will read, review and present papers from recent venues across the systems for ML spectrum.

Topics on Datacenter Design

CS-728

Modern datacenters with thousands of servers and multi-megawatt power budgets form the backbone of our digital universe. ln this course, we will survey a broad and comprehensive spectrum of datacenter design topics from workloads, to server architecture and infrastructure.