Babak Falsafi

EPFL IC IINFCOM PARSA
INJ 233 (Bâtiment INJ)
Station 14
1015 Lausanne

Awards and distinctions

2015

2012

Alfred P. Sloan Foundation

2004

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (Nematollahi zadeh) Mahani, H. Falahati, S. Darabi, A. Javadi-Nezhad, Y. Oh et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. Lin, A. Ansari, A. Chakraborty, B. Eryilmaz, Y. Li et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Single-Address-Space FaaS with Jord

Y. Li, A. Bhattacharyya, M. Kumar, A. Bhattacharjee, Yoav Etsion et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. Gil, D. Ha, S. B. Harma, M. K. Yoon, B. Falsafi et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. Boudouh, S. B. Harma, A. Mahmoud, B. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. Kumar, Y. Li, Y. Etsion, A. Bhattacharjee, A. Basu et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. Harma, A. Chakraborty, E. Kostenok, D. Mishin, D. Ha et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. Constantinescu, V. Kartsch, Y. Nakatsuka, P. Wiese, P. Orbanovik et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. Ansari, S. Lin, A. Chakraborty, M. Alian, B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer, B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Server Architecture from Enterprise to Post-Moore

B. Falsafi, M. Ferdman, B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler, C. Sonmez, M. Drumond, Y. Oh, B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya, F. Hofhammer, Y. Li, S. Gupta, A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi, A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland, B. Falsafi, A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

Imprecise Store Exceptions

S. Gupta, Y. Li, Q. Kang, A. Bhattacharjee, B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

AstriFlash: A Flash-Based System for Online Services

S. Gupta, Y. Oh, L. Yan, M. J. Sutherland, A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi, B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi, A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi, M. J. Sutherland, A. Daglis, B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira, L. Coulon, A. Pourhabibi Zarandi, A. C. Yüzügüler, B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Data transformer apparatus

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. Sutherland, Z. Tian et al.

US11748254 ; US2022327048 ; WO2021037341 . 2021.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Rebooting Virtual Memory with Midgard

S. Gupta, A. Bhattacharyya, Y. Oh, A. Bhattacharjee, B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel, S. A. S. Kohroudi, Z. Yan, A. Bhattacharjee, B. Falsafi et al.

2020

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi, M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. J. Sutherland, Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

The NEBULA RPC-Optimized Architecture

M. Sutherland, S. Gupta, B. Falsafi, V. Marathe, D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta, A. Daglis, B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler, F. Çelik, M. P. Drumond Lages De Oliveira, B. Falsafi, P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov, A. Daglis, J. Picorel Obando, M. J. Sutherland, E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Training DNNs with Hybrid Block Floating Point

M. Drumond, T. Lin, M. Jaggi, B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.

Atomic object reads for in-memory rack-scale computing

A. Daglis, B. R. Grot, B. Falsafi

US10929174 ; US2018173673 . 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi, E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

FPGAs versus GPUs in Data centers

B. Falsafi, B. Dally, D. Singh, D. Chiou, J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Near-Memory Address Translation

J. Picorel, D. Jevdjic, B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, September 9-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi, I. C. Kaynak, B. R. Grot

US9996358 ; US2017090935 . 2017.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion, B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Fat Caches For Scale-Out Servers

S. Volos, D. Jevdjic, B. Falsafi, B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

Towards Near-Threshold Server Processors

A. Pahlevan, J. Picorel Obando, A. Pourhabibi Zarandi, D. Rossi, M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Near-Memory Data Services

B. Falsafi, M. Stan, K. Skadron, N. Jayasena, Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Unlocking Energy

B. Falsafi, R. Guerraoui, J. Picorel Obando, V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Confluence: unified instruction supply for scale-out servers

C. Kaynak, B. Grot, B. Falsafi

2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, B. Grot

2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Asynchronous memory access chaining

O. Kocberber, B. Falsafi, B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi, B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos, J. Picorel, B. Falsafi, B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Big Data

B. Falsafi, B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic, G. H. Loh, C. Kaynak, B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Towards stable cloud performance

D. Novakovic / B. Falsafi, D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos, S. Fytraki, P. B. Gibbons, M. A. Kozuch, B. Falsafi

2014

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

A Primer on Hardware Prefetching

B. Falsafi, T. F. Wenisch

Morgan & Claypool, 2014.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic, S. Volos, B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

Multi-Grain Coherence Directory

J. Zebchuk, B. Falsafi, A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi, G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

DeSyRe: On-demand system reliability

I. Sourdis, C. Strydis, A. Armato, C. S. Bouganis, B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak, B. Grot, B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos, C. Seiculescu, B. Grot, N. Khosro Pour, B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Dark Silicon Accelerators for Database Indexing

O. Kocberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot, D. Hardy, P. Lotfi-Kamran, B. Falsafi, C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki, O. Kocberber, E. Vlachos, J. B. Sartor, B. Grot et al.

2012

Proactive Instruction Fetch

M. Ferdman, C. Kaynak, B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2011

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu, S. Volos, N. Khosro Pour, B. Falsafi, G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Toward Dark Silicon in Servers

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman, P. Lotfi-Kamran, K. Balet, B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos, M. L. Goodstein, M. A. Kozuch, S. Chen, B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Making Address-Correlated Prefetching Practical

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran, M. Ferdman, D. Crisan, B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Spatio-Temporal Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen, M. Kozuch, P. B. Gibbons, M. Ryan, T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, M. K. Papamichael, E. Nurvitadhi, J. C. Hoe, B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold, B. Falsafi, J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

Temporal streams in commercial server applications

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons et al.

2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi, P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Temporal instruction fetch streaming

M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Predictor virtualization

I. Burcea, S. Somogyi, A. Moshovos, B. Falsafi

2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

Scheduling threads for constructive cache sharing on CMPs

S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim, J. C. Smolens, B. Falsafi, J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

To Share or Not To Share?

R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril, S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim, N. Hardavellas, K. Mai, B. Falsafi, J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Last-touch correlated data streaming

M. Ferdman, B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens, B. T. Gold, B. Falsafi, J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung, J. C. Hoe, B. Falsafi

2006.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis, S. Chen, P. B. Gibbons, A. Ailamaki, G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Dynamic feature selection for hardware prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin, J. E. Smith, M. H. Lipasti, A. Moshovos, B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Simulation sampling with live-points

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Spatial Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold, J. C. Smolens, B. Falsafi, J. C. Hoe

2006.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold, J. Kim, J. C. Smolens, E. Chung, V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao, A. Ailamaki, B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, K. Jangwoo, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Accelerating Database Operations Using a Network Processor

B. T. Gold, A. Ailamaki, L. Huston, B. Falsafi

2005.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Silicon Efficiency in Post-Moore Servers

A. Ansari, S. Lin, A. Chakraborty, M. Alian, B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer, B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Server Architecture from Enterprise to Post-Moore

B. Falsafi, M. Ferdman, B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler, C. Sonmez, M. Drumond, Y. Oh, B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya, F. Hofhammer, Y. Li, S. Gupta, A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi, A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland, B. Falsafi, A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

Imprecise Store Exceptions

S. Gupta, Y. Li, Q. Kang, A. Bhattacharjee, B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

AstriFlash: A Flash-Based System for Online Services

S. Gupta, Y. Oh, L. Yan, M. J. Sutherland, A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi, A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi, B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi, M. J. Sutherland, A. Daglis, B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira, L. Coulon, A. Pourhabibi Zarandi, A. C. Yüzügüler, B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Data transformer apparatus

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. Sutherland, Z. Tian et al.

US11748254; US2022327048; WO2021037341. 2021.

Rebooting Virtual Memory with Midgard

S. Gupta, A. Bhattacharyya, Y. Oh, A. Bhattacharjee, B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel, S. A. S. Kohroudi, Z. Yan, A. Bhattacharjee, B. Falsafi et al.

2020

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi, M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. J. Sutherland, Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

The NEBULA RPC-Optimized Architecture

M. Sutherland, S. Gupta, B. Falsafi, V. Marathe, D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May, 30th - June, 3rd 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta, A. Daglis, B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler, F. Çelik, M. P. Drumond Lages De Oliveira, B. Falsafi, P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov, A. Daglis, J. Picorel Obando, M. J. Sutherland, E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Training DNNs with Hybrid Block Floating Point

M. Drumond, T. Lin, M. Jaggi, B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi, E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Atomic object reads for in-memory rack-scale computing

A. Daglis, B. R. Grot, B. Falsafi

US10929174; US2018173673. 2018.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi, I. C. Kaynak, B. R. Grot

US9996358; US2017090935. 2017.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion, B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Fat Caches For Scale-Out Servers

S. Volos, D. Jevdjic, B. Falsafi, B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

FPGAs versus GPUs in Data centers

B. Falsafi, B. Dally, D. Singh, D. Chiou, J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Near-Memory Address Translation

J. Picorel, D. Jevdjic, B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

Towards Near-Threshold Server Processors

A. Pahlevan, J. Picorel Obando, A. Pourhabibi Zarandi, D. Rossi, M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Near-Memory Data Services

B. Falsafi, M. Stan, K. Skadron, N. Jayasena, Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Unlocking Energy

B. Falsafi, R. Guerraoui, J. Picorel Obando, V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Confluence: unified instruction supply for scale-out servers

C. Kaynak, B. Grot, B. Falsafi

2015. The 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi, B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Asynchronous memory access chaining

O. Kocberber, B. Falsafi, B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, B. Grot

2015. 42nd International Symposium in Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Towards stable cloud performance

D. Novakovic / B. Falsafi, D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos, J. Picorel, B. Falsafi, B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Big Data

B. Falsafi, B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic, G. H. Loh, C. Kaynak, B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Scale-Out NUMA

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

A Primer on Hardware Prefetching

B. Falsafi, T. F. Wenisch

Morgan & Claypool, 2014.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos, S. Fytraki, P. B. Gibbons, M. A. Kozuch, B. Falsafi

2014

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic, S. Volos, B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

Multi-Grain Coherence Directory

J. Zebchuk, B. Falsafi, A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi, G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

DeSyRe: On-demand system reliability

I. Sourdis, C. Strydis, A. Armato, C. S. Bouganis, B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak, B. Grot, B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Dark Silicon Accelerators for Database Indexing

O. Kocberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki, O. Kocberber, E. Vlachos, J. B. Sartor, B. Grot et al.

2012

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos, C. Seiculescu, B. Grot, N. Khosro Pour, B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot, D. Hardy, P. Lotfi-Kamran, B. Falsafi, C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2011

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman, P. Lotfi-Kamran, K. Balet, B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu, S. Volos, N. Khosro Pour, B. Falsafi, G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Proactive Instruction Fetch

M. Ferdman, C. Kaynak, B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Toward Dark Silicon in Servers

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos, M. L. Goodstein, M. A. Kozuch, S. Chen, B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Making Address-Correlated Prefetching Practical

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran, M. Ferdman, D. Crisan, B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold, B. Falsafi, J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen, M. Kozuch, P. B. Gibbons, M. Ryan, T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, M. K. Papamichael, E. Nurvitadhi, J. C. Hoe, B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Spatio-Temporal Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Temporal streams in commercial server applications

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons et al.

2008. The 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Temporal instruction fetch streaming

M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2008. The 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

Predictor virtualization

I. Burcea, S. Somogyi, A. Moshovos, B. Falsafi

2008. The 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2008. 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi, P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

Scheduling threads for constructive cache sharing on CMPs

S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

To Share or Not To Share?

R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril, S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim, J. C. Smolens, B. Falsafi, J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

Last-touch correlated data streaming

M. Ferdman, B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim, N. Hardavellas, K. Mai, B. Falsafi, J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens, B. T. Gold, B. Falsafi, J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung, J. C. Hoe, B. Falsafi

2006.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Dynamic feature selection for hardware prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Simulation sampling with live-points

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin, J. E. Smith, M. H. Lipasti, A. Moshovos, B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Spatial Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold, J. C. Smolens, B. Falsafi, J. C. Hoe

2006.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis, S. Chen, P. B. Gibbons, A. Ailamaki, G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold, J. Kim, J. C. Smolens, E. Chung, V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao, A. Ailamaki, B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, K. Jangwoo, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

Accelerating Database Operations Using a Network Processor

B. T. Gold, A. Ailamaki, L. Huston, B. Falsafi

2005.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Teaching and PhD

Current PhD students

Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari

Supervised the EPFL theses of

Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud

Co-supervised the EPFL theses of

Stanko Novakovic, Tao Lin, Atri Bhattacharyya

Courses

Advanced multiprocessor architecture

CS-471

Multiprocessors are the foundation of computing systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (GPU, TPU, etc.), and datacenters. It includes a research project on post-Moore multiprocessors and datacenters.

Parallelism and concurrency in software

CS-302

From sensors and smartphones to the world's largest datacenters and supercomputers, parallelism and concurrency are ubiquitous in modern computing. Platforms also offer many forms of parallel and concurrent execution.