Babak Falsafi

EPFL IC IINFCOM PARSA
INJ 233 (Bâtiment INJ)
Station 14
1015 Lausanne

Expertise

Computer architecture, datacenter systems, cloud-native server architecture.
Babak is a Professor in the School of Computer and Communication Sciences and the founder of EcoCloud, an industrial/academic consortium at EPFL investigating scalable, sustainable information technology. He has made numerous contributions to computer system design and evaluation, including a scalable multiprocessor architecture that was prototyped by Sun Microsystems (now Oracle), snoop filters incorporated into multi-socket x86 servers and IBM BlueGene supercomputers, spatial and temporal memory streaming techniques that appear in ARM cores, and computer system performance evaluation methodologies that have been used by AMD, HP and Google PerfKit. He has shown that hardware memory consistency models are neither necessary (in the 1990s) nor sufficient (a decade later) to achieve high performance in servers. These results eventually led to fence speculation in modern CPUs. His work on cloud-native CPUs laid the foundation for the first generation of Cavium ARM server CPUs, ThunderX. He is a recipient of an NSF CAREER award, IBM Faculty Partnership Awards, and an Alfred P. Sloan Research Fellowship. He is a fellow of ACM and IEEE.

NEWS

Online services are stuck in memory and DRAM is not scaling. AstriFlash at HPCA'23 presents a system to serve data directly out of Flash, reducing memory cost by 20x while meeting ms-scale SLOs for online services at 95% of the throughput achieved with DRAM.
Network bandwidth is projected to grow at 20% a year for a decade thanks to optics. Logic density is lagging behind at 15% a year and slowing down, resulting in a "datacenter tax". Optimus Prime, a data transformation accelerator; NebuLA, a hardware-terminated network stack; and Cerebros, an RPC processor, are examples of how to mitigate the datacenter tax in the post-Moore era. Great to see that Google has followed up with their own data transformation accelerator in 2022.
See our paper on "Rebooting Virtual Memory with Midgard" for a novel approach to future-proof virtual memory. Here is a news snippet.
Numerical training of DNNs is converging on fixed point with orders of magnitude improvement in logic, memory, power and bandwidth. See our blog.

RESEARCH

Data has emerged as a currency for modern society, and datacenters are now the backbone of IT, offering large-scale cloud services at low cost by exploiting economies of scale. With silicon efficiency scaling having dwindled since 2004 and silicon density scaling (Moore's Law) slowing down, future digital platforms will rely on heterogeneous logic and memory to allow for IT scalability. Meanwhile, the demand for large-scale cloud services has grown dramatically faster than conventional silicon scaling, making IT platform scalability a grand challenge. Future platforms will need hand-in-hand collaboration between application domain experts and platform designers to improve scalability. With many online services being in-memory and the minimum communication latency between the farthest nodes being on the order of microseconds, future server platforms will go through revolutionary changes in architecture and systems to enable seamless aggregation of logic and memory resources across nodes, breaking the conventional abstraction layers. Babak's research and educational activities center around post-Moore server design.
He investigates techniques to address these challenges in the context of the following projects:
  • CloudSuite: A Benchmark Suite for Scale-Out Workloads
  • ColTraIn: Co-Located Training and Inference DNN Accelerators
  • HARNESS: Heterogeneous Architectures for Next-Generation Server Systems
  • Midgard: Future-Proofing Virtual Memory
  • QFlex: Fast, Full-System Open-Source Server Simulation/Emulation
  • VISA: Cloud-Native CPUs

Selected Talks

Awards

Elected Fellow of the Association for Computing Machinery (ACM)

2015

Elected Fellow of the Institute of Electrical and Electronics Engineers

2012

Sloan Research Fellowship

Alfred P. Sloan Foundation

2004

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (Nematollahi zadeh) Mahani, H. Falahati, S. Darabi, A. Javadi-Nezhad, Y. Oh et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

Single-Address-Space FaaS with Jord

Y. Li, A. Bhattacharyya, M. Kumar, A. Bhattacharjee, Y. Etsion et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. Lin, A. Ansari, A. Chakraborty, B. Eryilmaz, Y. Li et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. Gil, D. Ha, S. B. Harma, M. K. Yoon, B. Falsafi et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. Boudouh, S. B. Harma, A. Mahmoud, B. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. Kumar, Y. Li, Y. Etsion, A. Bhattacharjee, A. Basu et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. Harma, A. Chakraborty, E. Kostenok, D. Mishin, D. Ha et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. Constantinescu, V. Kartsch, Y. Nakatsuka, P. Wiese, P. Orbanovik et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. Ansari, S. Lin, A. Chakraborty, M. Alian, B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Server Architecture from Enterprise to Post-Moore

B. Falsafi, M. Ferdman, B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer, B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler, C. Sonmez, M. Drumond, Y. Oh, B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

Imprecise Store Exceptions

S. Gupta, Y. Li, Q. Kang, A. Bhattacharjee, B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya, F. Hofhammer, Y. Li, S. Gupta, A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland, B. Falsafi, A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

AstriFlash: A Flash-Based System for Online Services

S. Gupta, Y. Oh, L. Yan, M. J. Sutherland, A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi, A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi, B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi, A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira, L. Coulon, A. Pourhabibi Zarandi, A. C. Yüzügüler, B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi, M. J. Sutherland, A. Daglis, B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Data transformer apparatus

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. Sutherland, Z. Tian et al.

US11748254; US2022327048; WO2021037341. 2021.

Rebooting Virtual Memory with Midgard

S. Gupta, A. Bhattacharyya, Y. Oh, A. Bhattacharjee, B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel, S. A. S. Kohroudi, Z. Yan, A. Bhattacharjee, B. Falsafi et al.

2020

The NEBULA RPC-Optimized Architecture

M. Sutherland, S. Gupta, B. Falsafi, V. Marathe, D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. J. Sutherland, Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi, M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta, A. Daglis, B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler, F. Çelik, M. P. Drumond Lages De Oliveira, B. Falsafi, P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov, A. Daglis, J. Picorel Obando, M. J. Sutherland, E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Training DNNs with Hybrid Block Floating Point

M. Drumond, T. Lin, M. Jaggi, B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, December 2-8, 2018.

Atomic object reads for in-memory rack-scale computing

A. Daglis, B. R. Grot, B. Falsafi

US10929174; US2018173673. 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi, E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

FPGAs versus GPUs in Data centers

B. Falsafi, B. Dally, D. Singh, D. Chiou, J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Near-Memory Address Translation

J. Picorel, D. Jevdjic, B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, September 9-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi, I. C. Kaynak, B. R. Grot

US9996358; US2017090935. 2017.

Fat Caches For Scale-Out Servers

S. Volos, D. Jevdjic, B. Falsafi, B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion, B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Towards Near-Threshold Server Processors

A. Pahlevan, J. Picorel Obando, A. Pourhabibi Zarandi, D. Rossi, M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Unlocking Energy

B. Falsafi, R. Guerraoui, J. Picorel Obando, V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

Near-Memory Data Services

B. Falsafi, M. Stan, K. Skadron, N. Jayasena, Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, B. Grot

2015. 42nd International Symposium on Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Confluence: unified instruction supply for scale-out servers

C. Kaynak, B. Grot, B. Falsafi

2015. The 48th International Symposium on Microarchitecture, Waikiki, Hawaii, December 5-9, 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Asynchronous memory access chaining

O. Kocberber, B. Falsafi, B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi, B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Towards stable cloud performance

D. Novakovic / B. Falsafi, D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

A Primer on Hardware Prefetching

B. Falsafi, T. F. Wenisch

Morgan & Claypool, 2014.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Big Data

B. Falsafi, B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic, G. H. Loh, C. Kaynak, B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos, J. Picorel, B. Falsafi, B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos, S. Fytraki, P. B. Gibbons, M. A. Kozuch, B. Falsafi

2014

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic, S. Volos, B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi, G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Multi-Grain Coherence Directory

J. Zebchuk, B. Falsafi, A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak, B. Grot, B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

DeSyRe: On-demand system reliability

I. Sourdis, C. Strydis, A. Armato, C. S. Bouganis, B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

Dark Silicon Accelerators for Database Indexing

O. Kocberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos, C. Seiculescu, B. Grot, N. Khosro Pour, B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki, O. Kocberber, E. Vlachos, J. B. Sartor, B. Grot et al.

2012

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot, D. Hardy, P. Lotfi-Kamran, B. Falsafi, C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

Proactive Instruction Fetch

M. Ferdman, C. Kaynak, B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman, P. Lotfi-Kamran, K. Balet, B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu, S. Volos, N. Khosro Pour, B. Falsafi, G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2011

Toward Dark Silicon in Servers

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos, M. L. Goodstein, M. A. Kozuch, S. Chen, B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Making Address-Correlated Prefetching Practical

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran, M. Ferdman, D. Crisan, B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Spatio-Temporal Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, M. K. Papamichael, E. Nurvitadhi, J. C. Hoe, B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold, B. Falsafi, J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen, M. Kozuch, P. B. Gibbons, M. Ryan, T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Temporal streams in commercial server applications

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Temporal instruction fetch streaming

M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi, P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Predictor virtualization

I. Burcea, S. Somogyi, A. Moshovos, B. Falsafi

2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons et al.

2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Scheduling threads for constructive cache sharing on CMPs

S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

To Share or Not To Share?

R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril, S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

Last-touch correlated data streaming

M. Ferdman, B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim, J. C. Smolens, B. Falsafi, J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim, N. Hardavellas, K. Mai, B. Falsafi, J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung, J. C. Hoe, B. Falsafi

2006.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis, S. Chen, P. B. Gibbons, A. Ailamaki, G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Dynamic feature selection for hardware prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Simulation sampling with live-points

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Spatial Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold, J. C. Smolens, B. Falsafi, J. C. Hoe

2006.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens, B. T. Gold, B. Falsafi, J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin, J. E. Smith, M. H. Lipasti, A. Moshovos, B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, K. Jangwoo, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao, A. Ailamaki, B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold, J. Kim, J. C. Smolens, E. Chung, V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Accelerating Database Operations Using a Network Processor

B. T. Gold, A. Ailamaki, L. Huston, B. Falsafi

2005.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Infoscience

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (Nematollahi zadeh) Mahani, H. Falahati, S. Darabi, A. Javadi-Nezhad, Y. Oh et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. Lin, A. Ansari, A. Chakraborty, B. Eryilmaz, Y. Li et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Single-Address-Space FaaS with Jord

Y. Li, A. Bhattacharyya, M. Kumar, A. Bhattacharjee, Y. Etsion et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. Gil, D. Ha, S. B. Harma, M. K. Yoon, B. Falsafi et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. Boudouh, S. B. Harma, A. Mahmoud, B. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. Kumar, Y. Li, Y. Etsion, A. Bhattacharjee, A. Basu et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. Harma, A. Chakraborty, E. Kostenok, D. Mishin, D. Ha et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. Constantinescu, V. Kartsch, Y. Nakatsuka, P. Wiese, P. Orbanovik et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. Ansari, S. Lin, A. Chakraborty, M. Alian, B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

Server Architecture from Enterprise to Post-Moore

B. Falsafi, M. Ferdman, B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer, B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler, C. Sonmez, M. Drumond, Y. Oh, B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

Imprecise Store Exceptions

S. Gupta, Y. Li, Q. Kang, A. Bhattacharjee, B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland, B. Falsafi, A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

AstriFlash: A Flash-Based System for Online Services

S. Gupta, Y. Oh, L. Yan, M. J. Sutherland, A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi, A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya, F. Hofhammer, Y. Li, S. Gupta, A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi, B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi, A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira, L. Coulon, A. Pourhabibi Zarandi, A. C. Yüzügüler, B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi, M. J. Sutherland, A. Daglis, B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Rebooting Virtual Memory with Midgard

S. Gupta, A. Bhattacharyya, Y. Oh, A. Bhattacharjee, B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Data transformer apparatus

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. Sutherland, Z. Tian et al.

US11748254; US2022327048; WO2021037341. 2021.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel, S. A. S. Kohroudi, Z. Yan, A. Bhattacharjee, B. Falsafi et al.

2020

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi, M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

The NEBULA RPC-Optimized Architecture

M. Sutherland, S. Gupta, B. Falsafi, V. Marathe, D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. J. Sutherland, Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta, A. Daglis, B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler, F. Çelik, M. P. Drumond Lages De Oliveira, B. Falsafi, P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov, A. Daglis, J. Picorel Obando, M. J. Sutherland, E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Training DNNs with Hybrid Block Floating Point

M. Drumond, T. Lin, M. Jaggi, B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, December 2-8, 2018.

Atomic object reads for in-memory rack-scale computing

A. Daglis, B. R. Grot, B. Falsafi

US10929174; US2018173673. 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi, E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

FPGAs versus GPUs in Data centers

B. Falsafi, B. Dally, D. Singh, D. Chiou, J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Near-Memory Address Translation

J. Picorel, D. Jevdjic, B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, September 9-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion, B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi, I. C. Kaynak, B. R. Grot

US9996358; US2017090935. 2017.

Fat Caches For Scale-Out Servers

S. Volos, D. Jevdjic, B. Falsafi, B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Unlocking Energy

B. Falsafi, R. Guerraoui, J. Picorel Obando, V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

Near-Memory Data Services

B. Falsafi, M. Stan, K. Skadron, N. Jayasena, Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Towards Near-Threshold Server Processors

A. Pahlevan, J. Picorel Obando, A. Pourhabibi Zarandi, D. Rossi, M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, B. Grot

2015. 42nd International Symposium on Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Confluence: unified instruction supply for scale-out servers

C. Kaynak, B. Grot, B. Falsafi

2015. 48th International Symposium on Microarchitecture, Waikiki, Hawaii, December 5-9, 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, Y. O. Koçberber, B. Falsafi, B. Grot

5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Asynchronous memory access chaining

O. Kocberber, B. Falsafi, B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi, B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Towards stable cloud performance

D. Novakovic / B. Falsafi, D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos, J. Picorel, B. Falsafi, B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos, S. Fytraki, P. B. Gibbons, M. A. Kozuch, B. Falsafi

2014

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

A Primer on Hardware Prefetching

B. Falsafi, T. F. Wenisch

Morgan & Claypool, 2014.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Big Data

B. Falsafi, B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic, G. H. Loh, C. Kaynak, B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic, S. Volos, B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi, G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Multi-Grain Coherence Directory

J. Zebchuk, B. Falsafi, A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

DeSyRe: On-demand system reliability

I. Sourdis, C. Strydis, A. Armato, C. S. Bouganis, B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak, B. Grot, B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Dark Silicon Accelerators for Database Indexing

O. Koçberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Dark Silicon Accelerators for Database Indexing

O. Kocberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot, D. Hardy, P. Lotfi-Kamran, B. Falsafi, C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos, C. Seiculescu, B. Grot, N. Khosro Pour, B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki, O. Kocberber, E. Vlachos, J. B. Sartor, B. Grot et al.

2012

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

Proactive Instruction Fetch

M. Ferdman, C. Kaynak, B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu, S. Volos, N. Khosro Pour, B. Falsafi, G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2011

Toward Dark Silicon in Servers

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman, P. Lotfi-Kamran, K. Balet, B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos, M. L. Goodstein, M. A. Kozuch, S. Chen, B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran, M. Ferdman, D. Crisan, B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Making Address-Correlated Prefetching Practical

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen, M. Kozuch, P. B. Gibbons, M. Ryan, T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Spatio-Temporal Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, M. K. Papamichael, E. Nurvitadhi, J. C. Hoe, B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold, B. Falsafi, J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

Temporal streams in commercial server applications

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Temporal instruction fetch streaming

M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi, P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons et al.

2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Predictor virtualization

I. Burcea, S. Somogyi, A. Moshovos, B. Falsafi

2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Scheduling threads for constructive cache sharing on CMPs

S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

To Share or Not To Share?

R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril, S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim, J. C. Smolens, B. Falsafi, J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim, N. Hardavellas, K. Mai, B. Falsafi, J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Last-touch correlated data streaming

M. Ferdman, B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung, J. C. Hoe, B. Falsafi

2006.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin, J. E. Smith, M. H. Lipasti, A. Moshovos, B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis, S. Chen, P. B. Gibbons, A. Ailamaki, G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Simulation sampling with live-points

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Spatial Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold, J. C. Smolens, B. Falsafi, J. C. Hoe

2006.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Dynamic feature selection for hardware prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens, B. T. Gold, B. Falsafi, J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, K. Jangwoo, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao, A. Ailamaki, B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold, J. Kim, J. C. Smolens, E. Chung, V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Accelerating Database Operations Using a Network Processor

B. T. Gold, A. Ailamaki, L. Huston, B. Falsafi

2005.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Teaching & PhD

PhD Students

Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari

Past EPFL PhD Students

Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud

Past EPFL PhD Students as codirector

Stanko Novakovic, Tao Lin, Atri Bhattacharyya

Courses

Advanced multiprocessor architecture

CS-471

Multiprocessors are basic building blocks for all computer systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (e.g., GPU, TPU), and datacenters. It includes a research project on multiprocessors and post-Moore era datacenters.

Parallelism and concurrency in software

CS-302

From sensors, to smartphones, to the world's largest datacenters and supercomputers, parallelism and concurrency are ubiquitous in modern computing. There are also many forms of parallel and concurrent execution in modern platforms, with varying degrees of programmability, performance, and efficiency.

Topics in Machine Learning Systems

CS-723

This course will cover the latest technologies, platforms and research contributions in the area of machine learning systems. The students will read, review and present papers from recent venues across the systems for ML spectrum.

Topics on Datacenter Design

CS-728

Modern datacenters with thousands of servers and multi-megawatt power budgets form the backbone of our digital universe. In this course, we will survey a broad and comprehensive spectrum of datacenter design topics, from workloads to server architecture and infrastructure.