Babak Falsafi

EPFL IC IINFCOM PARSA
INJ 233 (INJ Building)
Station 14
1015 Lausanne

Awards and distinctions

2015

2012

Alfred P. Sloan Foundation

2004

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (Nematollahi zadeh) Mahani; H. Falahati; S. Darabi; A. Javadi-Nezhad; Y. Oh et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

Single-Address-Space FaaS with Jord

Y. Li; A. Bhattacharyya; M. Kumar; A. Bhattacharjee; Y. Etsion et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. Lin; A. Ansari; A. Chakraborty; B. Eryilmaz; Y. Li et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. Gil; D. Ha; S. B. Harma; M. K. Yoon; B. Falsafi et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. Boudouh; S. B. Harma; A. Mahmoud; B. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. Kumar; Y. Li; Y. Etsion; A. Bhattacharjee; A. Basu et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. Harma; A. Chakraborty; E. Kostenok; D. Mishin; D. Ha et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. Constantinescu; V. Kartsch; Y. Nakatsuka; P. Wiese; P. Orbanovik et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. Ansari; S. Lin; A. Chakraborty; M. Alian; B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer; B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi; M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

Server Architecture from Enterprise to Post-Moore

B. Falsafi; M. Ferdman; B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler; C. Sonmez; M. Drumond; Y. Oh; B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland; B. Falsafi; A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

Imprecise Store Exceptions

S. Gupta; Y. Li; Q. Kang; A. Bhattacharjee; B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi; M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya; F. Hofhammer; Y. Li; S. Gupta; A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi; A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

AstriFlash: A Flash-Based System for Online Services

S. Gupta; Y. Oh; L. Yan; M. J. Sutherland; A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi; A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi; B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi; M. J. Sutherland; A. Daglis; B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira; L. Coulon; A. Pourhabibi Zarandi; A. C. Yüzügüler; B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Rebooting Virtual Memory with Midgard

S. Gupta; A. Bhattacharyya; Y. Oh; A. Bhattacharjee; B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Data transformer apparatus

A. Pourhabibi Zarandi; S. Gupta; H. Kassir; M. Sutherland; Z. Tian et al.

US11748254 ; US2022327048 ; WO2021037341. 2021.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell; A. Alaghi; M. Carbin; E. Darulova; L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel; S. A. S. Kohroudi; Z. Yan; A. Bhattacharjee; B. Falsafi et al.

2020

The NEBULA RPC-Optimized Architecture

M. Sutherland; S. Gupta; B. Falsafi; V. Marathe; D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May, 30th - June, 3rd 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi; S. Gupta; H. Kassir; M. J. Sutherland; Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi; M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta; A. Daglis; B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya; A. Sandulescu; M. Neugschwandtner; A. Sorniotti; B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler; F. Çelik; M. P. Drumond Lages De Oliveira; B. Falsafi; P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov; A. Daglis; J. Picorel Obando; M. J. Sutherland; E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi; E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Training DNNs with Hybrid Block Floating Point

M. Drumond; T. Lin; M. Jaggi; B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.

Atomic object reads for in-memory rack-scale computing

A. Daglis; B. R. Grot; B. Falsafi

US10929174 ; US2018173673. 2018.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion; B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

FPGAs versus GPUs in Data centers

B. Falsafi; B. Dally; D. Singh; D. Chiou; J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi; I. C. Kaynak; B. R. Grot

US9996358 ; US2017090935. 2017.

Near-Memory Address Translation

J. Picorel; D. Jevdjic; B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira; A. Daglis; N. Mirzadeh; D. Ustiugov; J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Fat Caches For Scale-Out Servers

S. Volos; D. Jevdjic; B. Falsafi; B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic; A. Daglis; E. Bugnion; B. Falsafi; B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Near-Memory Data Services

B. Falsafi; M. Stan; K. Skadron; N. Jayasena; Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

Unlocking Energy

B. Falsafi; R. Guerraoui; J. Picorel Obando; V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic; A. Daglis; E. Bugnion; B. Falsafi; B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis; D. Ustiugov; S. Novakovic; E. Bugnion; B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Towards Near-Threshold Server Processors

A. Pahlevan; J. Picorel Obando; A. Pourhabibi Zarandi; D. Rossi; M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh; Y. O. Koçberber; B. Falsafi; B. Grot

5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh; O. Kocberber; B. Falsafi; B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi; B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis; S. Novakovic; E. Bugnion; B. Falsafi; B. Grot

2015. 42nd International Symposium on Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Asynchronous memory access chaining

O. Kocberber; B. Falsafi; B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Confluence: unified instruction supply for scale-out servers

C. Kaynak; B. Grot; B. Falsafi

2015. 48th Annual IEEE/ACM International Symposium on Microarchitecture, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki; E. Vlachos; O. Kocberber; B. Falsafi; B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos; J. Picorel; B. Falsafi; B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos; S. Fytraki; P. B. Gibbons; M. A. Kozuch; B. Falsafi

2014

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

A Primer on Hardware Prefetching

B. Falsafi; T. F. Wenisch

Morgan & Claypool, 2014.

Big Data

B. Falsafi; B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic; G. H. Loh; C. Kaynak; B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Towards stable cloud performance

D. Novakovic / B. Falsafi; D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman; A. Adileh; O. Kocberber; S. Volos; M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. Novakovic; A. Daglis; E. Bugnion; B. Falsafi; B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

Multi-Grain Coherence Directory

J. Zebchuk; B. Falsafi; A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

DeSyRe: On-demand system reliability

I. Sourdis; C. Strydis; A. Armato; C. S. Bouganis; B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi; G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic; S. Volos; B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak; B. Grot; B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber; B. Grot; J. Picorel; B. Falsafi; K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

Scale-Out Processors

P. Lotfi-Kamran; B. Grot; M. Ferdman; S. Volos; O. Kocberber et al.

2012

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki; O. Kocberber; E. Vlachos; J. B. Sartor; B. Grot et al.

2012

Scale-Out Processors

P. Lotfi-Kamran; B. Grot; M. Ferdman; S. Volos; O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman; A. Adileh; O. Kocberber; S. Volos; M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic; S. Idgunji; D. Jevdjic; E. Ozer; P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

Dark Silicon Accelerators for Database Indexing

O. Koçberber; B. Falsafi; K. Lim; P. Ranganathan; S. Harizopoulos

Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot; D. Hardy; P. Lotfi-Kamran; B. Falsafi; C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

Dark Silicon Accelerators for Database Indexing

O. Kocberber; B. Falsafi; K. Lim; P. Ranganathan; S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos; C. Seiculescu; B. Grot; N. Khosro Pour; B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran; B. Grot; B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman; A. Adileh; O. Kocberber; S. Volos; M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu; S. Volos; N. Khosro Pour; B. Falsafi; G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Proactive Instruction Fetch

M. Ferdman; C. Kaynak; B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Toward Dark Silicon in Servers

N. Hardavellas; M. Ferdman; B. Falsafi; A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman; A. Adileh; O. Kocberber; S. Volos; M. Alisafaee et al.

2011

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman; P. Lotfi-Kamran; K. Balet; B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas; M. Ferdman; B. Falsafi; A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

Making Address-Correlated Prefetching Practical

T. F. Wenisch; M. Ferdman; A. Ailamaki; B. Falsafi; A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos; M. L. Goodstein; M. A. Kozuch; S. Chen; B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran; M. Ferdman; D. Crisan; B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold; B. Falsafi; J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung; M. K. Papamichael; E. Nurvitadhi; J. C. Hoe; B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Spatio-Temporal Memory Streaming

S. Somogyi; T. F. Wenisch; A. Ailamaki; B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas; M. Ferdman; B. Falsafi; A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen; M. Kozuch; P. B. Gibbons; M. Ryan; T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch; M. Ferdman; A. Ailamaki; B. Falsafi; A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi; P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Temporal instruction fetch streaming

M. Ferdman; T. F. Wenisch; A. Ailamaki; B. Falsafi; A. Moshovos

2008. 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

Temporal streams in commercial server applications

T. F. Wenisch; M. Ferdman; A. Ailamaki; B. Falsafi; A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung; E. Nurvitadhi; J. C. Hoe; B. Falsafi; K. Mai

2008. 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen; M. Kozuch; T. Strigkos; B. Falsafi; P. B. Gibbons et al.

2008. 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Predictor virtualization

I. Burcea; S. Somogyi; A. Moshovos; B. Falsafi

2008. 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch; A. Ailamaki; B. Falsafi; A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim; J. C. Smolens; B. Falsafi; J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

Scheduling threads for constructive cache sharing on CMPs

S. Chen; P. B. Gibbons; M. Kozuch; V. Liaskovitis; A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung; E. Nurvitadhi; J. C. Hoe; B. Falsafi; K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim; N. Hardavellas; K. Mai; B. Falsafi; J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

To Share or Not To Share?

R. Johnson; N. Hardavellas; I. Pandis; N. Mancheril; S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas; I. Pandis; R. Johnson; N. Mancheril; A. Ailamaki et al.

2007.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas; I. Pandis; R. Johnson; N. Mancheril; A. Ailamaki et al.

2007.

Last-touch correlated data streaming

M. Ferdman; B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold; J. C. Smolens; B. Falsafi; J. C. Hoe

2006.

Simulation sampling with live-points

T. F. Wenisch; R. E. Wunderlich; B. Falsafi; J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Spatial Memory Streaming

S. Somogyi; T. F. Wenisch; A. Ailamaki; B. Falsafi; A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin; J. E. Smith; M. H. Lipasti; A. Moshovos; B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung; J. C. Hoe; B. Falsafi

2006.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen; B. Falsafi; P. B. Gibbons; M. Kozuch; T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich; T. F. Wenisch; B. Falsafi; J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim; C.-L. Ooi; R. Eigenmann; B. Falsafi; T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Dynamic feature selection for hardware prediction

A. Fern; R. Givan; B. Falsafi; T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis; S. Chen; P. B. Gibbons; A. Ailamaki; G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens; B. T. Gold; B. Falsafi; J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch; S. Somogyi; N. Hardavellas; J. Kim; C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Accelerating Database Operations Using a Network Processor

B. T. Gold; A. Ailamaki; L. Huston; B. Falsafi

2005.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao; A. Ailamaki; B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold; J. Kim; J. C. Smolens; E. Chung; V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens; K. Jangwoo; J. C. Hoe; B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

A case for asymmetric-cell cache memories

A. Moshovos; B. Falsafi; F. N. Najm; N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Temporal Streaming of Shared Memory

T. F. Wenisch; S. Somogyi; N. Hardavellas; J. Kim; A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch; R. E. Wunderlich; B. Falsafi; J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi; D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park; A. Moshovos; B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens; J. Kim; J. C. Hoe; B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

Accurate and complexity-effective spatial pattern prediction

C. F. Chen; S.-H. Yang; B. Falsafi; A. Moshovos

2004. p. 276 - 287.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich; T. F. Wenisch; B. Falsafi; J. C. Hoe

2004.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch; R. E. Wunderlich; B. Falsafi; J. C. Hoe

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens; B. Gold; J. Kim; B. Falsafi; J. C. Hoe et al.

2004.

Memory coherence activity prediction in commercial workloads

S. Somogyi; T. F. Wenisch; N. Hardavellas; J. Kim; A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi; T. Vijaykumar

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi; T. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas; S. Somogyi; T. F. Wenisch; R. E. Wunderlich; S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch; S. Somogyi; N. Hardavellas; J. Kim; C. Gniady et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens; B. T. Gold; J. Kim; B. Falsafi; J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang; B. Falsafi

2003.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi; T. Vijaykumar

2003

Speculative Sequential Consistency with Little Custom Storage

C. Gniady; B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Implicitly-multithreaded processors

I. Park; B. Falsafi; T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang; B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang; B. Falsafi

2002.

Speculative sequential consistency with little custom storage

C. Gniady; B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai; B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang; M. D. Powell; B. Falsafi; T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim; C.-L. Ooi; R. Eigenmann; B. Falsafi; T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell; A. Agarwal; T. N. Vijaykumar; B. Falsafi; K. Roy

2001. p. 54 - 65.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi; T. Vijaykumar

2001

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos; G. Memik; B. Falsafi; A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi; S. W. Kim; I. Park; R. Eigenmann; B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang; M. D. Powell; B. Falsafi; K. Roy; T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell; S.-H. Yang; B. Falsafi; K. Roy; T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2001

Dead-block prediction & dead-block correlating prefetchers

A.-C. LaiC. FideB. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. RayJ. C. HoeB. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. MukherjeeS. K. ReinhardtB. FalsafiM. LitzkowM. D. Hill  et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Dynamic Feature Selection for Hardware Prediction

A. FernR. GivanB. FalsafiT. N. Vijaykumar

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. FalsafiM. Lauria

2000

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. KimC.-L. OoiI. ParkR. EigenmannB. Falsafi  et al.

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. PowellS.-H. YangB. FalsafiK. RoyT. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. GniadyB. Falsafi

2000.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. YangM. D. PowellB. FalsafiK. RoyT. N. Vijaykumar

2000

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. LaiB. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Address partitioning in DSM clusters with parallel coherence controllers

I. PragaspathyB. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. LaiB. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. WoodS. K. ReinhardtS. S. MukherjeeB. FalsafiM. D. Hill  et al.

US5951657 . 1999.

Is SC+ILP=RC?

C. GniadyB. FalsafiT. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Is SC + ILP = RC?

C. GniadyB. FalsafiT. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. LaiB. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. FalsafiD. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Sirocco: cost-effective fine-grain distributed shared memory

I. SchoinasB. FalsafiM. D. HillJ. R. LarusD. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Fine-grain Access Control for Distributed Shared Memory

I. SchoinasB. FalsafiA. R. LebeckS. K. ReinhardtJ. R. Larus  et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Scheduling communication on an SMP node parallel machine

B. FalsafiD. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. MukherjeeS. K. ReinhardtB. FalsafiM. LitzkowS. Huss-Lederman  et al.

1997.

Modeling cost/performance of a parallel computer simulator

B. FalsafiD. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. FalsafiD. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. SchoinasB. FalsafiM. D. HillJ. R. LarusC. E. Lukas  et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. FalsafiD. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. MukherjeeB. FalsafiM. D. HillD. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Fine-grain access control for distributed shared memory

I. SchoinasB. FalsafiA. R. LebeckS. K. ReinhardtJ. R. Larus  et al.

1994. ASPLOS'94. 6th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Application-specific protocols for user-level shared memory

B. FalsafiA. R. LebeckS. K. ReinhardtI. SchoinasM. D. Hill  et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Mechanisms for Cooperative Shared Memory

D. A. WoodS. ChandraB. FalsafiM. D. HillJ. R. Larus  et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Cost/performance of a parallel computer simulator

B. FalsafiD. A. Wood

1994. p. 173 - 182.

Mechanisms for cooperative shared memory

D. A. WoodS. ChandraB. FalsafiM. D. HillJ. R. Larus  et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. ReinhardtB. FalsafiD. A. Wood

1993. p. 73 - 89.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. FalsafiR. Miller

1990. p. 159 - 164.

Silicon Efficiency in Post-Moore Servers

A. AnsariS. LinA. ChakrabortyM. AlianB. Eryilmaz  et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. PayerB. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. FalsafiM. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

Server Architecture from Enterprise to Post-Moore

B. FalsafiM. FerdmanB. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. YuzugulerC. SonmezM. DrumondY. OhB. Falsafi  et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. SutherlandB. FalsafiA. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

Imprecise Store Exceptions

S. GuptaY. LiQ. KangA. BhattacharjeeB. Falsafi  et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

AstriFlash: A Flash-Based System for Online Services

S. GuptaY. OhL. YanM. J. SutherlandA. Bhattacharjee  et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. FalsafiM. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

SecureCells: A Secure Compartmentalized Architecture

A. BhattacharyyaF. HofhammerY. LiS. GuptaA. Sánchez Marín  et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Rebooting Virtual Memory with Midgard

S. Gupta / B. FalsafiA. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. FalsafiA. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. JaggiB. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi ZarandiM. J. SutherlandA. DaglisB. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De OliveiraL. CoulonA. Pourhabibi ZarandiA. C. YüzügülerB. Falsafi  et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Rebooting Virtual Memory with Midgard

S. GuptaA. BhattacharyyaY. OhA. BhattacharjeeB. Falsafi  et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Data transformer apparatus

A. Pourhabibi ZarandiS. GuptaH. KassirM. SutherlandZ. Tian  et al.

US11748254 ; US2022327048 ; WO2021037341 . 2021.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-MarbellA. AlaghiM. CarbinE. DarulovaL. Dolecek  et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. PicorelS. A. S. KohroudiZ. YanA. BhattacharjeeB. Falsafi  et al.

2020

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. FalsafiM. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

The NEBULA RPC-Optimized Architecture

M. SutherlandS. GuptaB. FalsafiV. MaratheD. Pnevmatikatos  et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May, 30th - June, 3rd 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi ZarandiS. GuptaH. KassirM. J. SutherlandZ. Tian  et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

Distributed Logless Atomic Durability with Persistent Memory

S. GuptaA. DaglisB. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. YüzügülerF. ÇelikM. P. Drumond Lages De OliveiraB. FalsafiP. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. BhattacharyyaA. SandulescuM. NeugschwandtnerA. SorniottiB. Falsafi  et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Design Guidelines for High-Performance SCM Hierarchies

D. UstiugovA. DaglisJ. Picorel ObandoM. J. SutherlandE. Bugnion  et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. FalsafiE. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Training DNNs with Hybrid Block Floating Point

M. DrumondT. LinM. JaggiB. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.

Atomic object reads for in-memory rack-scale computing

A. DaglisB. R. GrotB. Falsafi

US10929174 ; US2018173673 . 2018.

Fat Caches For Scale-Out Servers

S. VolosD. JevdjicB. FalsafiB. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. BugnionB. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

The Mondrian Data Engine

M. P. Drumond Lages De OliveiraA. DaglisN. MirzadehD. UstiugovJ. Picorel Obando  et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

FPGAs versus GPUs in Data centers

B. FalsafiB. DallyD. SinghD. ChiouJ. J. Yi  et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Unified prefetching into instruction cache and branch target buffer

B. FalsafiI. C. KaynakB. R. Grot

US9996358 ; US2017090935 . 2017.

Near-Memory Address Translation

J. PicorelD. JevdjicB. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Near-Memory Data Services

B. FalsafiM. StanK. SkadronN. JayasenaY. Chen  et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

Unlocking Energy

B. FalsafiR. GuerraouiJ. Picorel ObandoV. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

An Analysis of Load Imbalance in Scale-out Data Serving

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. DaglisD. UstiugovS. NovakovicE. BugnionB. Falsafi  et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Towards Near-Threshold Server Processors

A. PahlevanJ. Picorel ObandoA. Pourhabibi ZarandiD. RossiM. Zapater Sancho  et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. MirzadehO. KocberberB. FalsafiB. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. FalsafiB. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. DaglisS. NovakovicE. BugnionB. FalsafiB. Grot

2015. 42nd International Symposium on Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Asynchronous memory access chaining

O. KocberberB. FalsafiB. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Confluence: unified instruction supply for scale-out servers

C. KaynakB. GrotB. Falsafi

2015. The 48th International Symposium, Waikiki, Hawaii, December 05-09, 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

A Case for Specialized Processors for Scale-Out Workloads

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. FytrakiE. VlachosO. KocberberB. FalsafiB. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

Scale-Out NUMA

S. NovakovicA. DaglisE. BugnionB. FalsafiB. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

BuMP: Bulk Memory Access Prediction and Streaming

S. VolosJ. PicorelB. FalsafiB. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Towards stable cloud performance

D. Novakovic / B. FalsafiD. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. VlachosS. FytrakiP. B. GibbonsM. A. KozuchB. Falsafi

2014

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

A Primer on Hardware Prefetching

B. FalsafiT. F. Wenisch

Morgan & Claypool, 2014.

Big Data

B. FalsafiB. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. JevdjicG. H. LohC. KaynakB. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. KocberberB. GrotJ. PicorelB. FalsafiK. Lim  et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

Multi-Grain Coherence Directory

J. ZebchukB. FalsafiA. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

DeSyRe: On-demand system reliability

I. SourdisC. StrydisA. ArmatoC. S. BouganisB. Falsafi  et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. FalsafiG. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. JevdjicS. VolosB. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. KaynakB. GrotB. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Scale-Out Processors

P. Lotfi-KamranB. GrotM. FerdmanS. VolosO. Kocberber  et al.

2012

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. FytrakiO. KocberberE. VlachosJ. B. SartorB. Grot  et al.

2012

Scale-Out Processors

P. Lotfi-KamranB. GrotM. FerdmanS. VolosO. Kocberber  et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. MilojevicS. IdgunjiD. JevdjicE. OzerP. Lotfi-Kamran  et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

Optimizing Data-Center TCO with Scale-Out Processors

B. GrotD. HardyP. Lotfi-KamranB. FalsafiC. Nicopoulos  et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

Dark Silicon Accelerators for Database Indexing

O. KocberberB. FalsafiK. LimP. RanganathanS. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. VolosC. SeiculescuB. GrotN. Khosro PourB. Falsafi  et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-KamranB. GrotB. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. SeiculescuS. VolosN. Khosro PourB. FalsafiG. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. FerdmanP. Lotfi-KamranK. BaletB. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

Proactive Instruction Fetch

M. FerdmanC. KaynakB. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Toward Dark Silicon in Servers

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. FerdmanA. AdilehO. KocberberS. VolosM. Alisafaee  et al.

2011

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

Making Address-Correlated Prefetching Practical

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. VlachosM. L. GoodsteinM. A. KozuchS. ChenB. Falsafi  et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-KamranM. FerdmanD. CrisanB. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. GoldB. FalsafiJ. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. ChungM. K. PapamichaelE. NurvitadhiJ. C. HoeB. Falsafi  et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Spatio-Temporal Memory Streaming

S. SomogyiT. F. WenischA. AilamakiB. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. HardavellasM. FerdmanB. FalsafiA. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. ChenM. KozuchP. B. GibbonsM. RyanT. Strigkos  et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. FalsafiP. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Temporal instruction fetch streaming

M. FerdmanT. F. WenischA. AilamakiB. FalsafiA. Moshovos

2008. The 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

Predictor virtualization

I. BurceaS. SomogyiA. MoshovosB. Falsafi

2008. The 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Temporal streams in commercial server applications

T. F. WenischM. FerdmanA. AilamakiB. FalsafiA. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. ChungE. NurvitadhiJ. C. HoeB. FalsafiK. Mai

2008. 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Flexible hardware acceleration for instruction-grain program monitoring

S. ChenM. KozuchT. StrigkosB. FalsafiP. B. Gibbons  et al.

2008. The 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

Mechanisms for store-wait-free multiprocessors

T. F. WenischA. AilamakiB. FalsafiA. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. KimJ. C. SmolensB. FalsafiJ. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

Scheduling threads for constructive cache sharing on CMPs

S. ChenP. B. GibbonsM. KozuchV. LiaskovitisA. Ailamaki  et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. ChungE. NurvitadhiJ. C. HoeB. FalsafiK. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Multi-bit error tolerant caches using two-dimensional error coding

J. KimN. HardavellasK. MaiB. FalsafiJ. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

To Share or Not To Share?

R. JohnsonN. HardavellasI. PandisN. MancherilS. Harizopoulos  et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. HardavellasI. PandisR. JohnsonN. MancherilA. Ailamaki  et al.

2007.

An Analysis of Database System Performance on Chip Multiprocessors

N. HardavellasI. PandisR. JohnsonN. MancherilA. Ailamaki  et al.

2007.

Last-touch correlated data streaming

M. FerdmanB. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. GoldJ. C. SmolensB. FalsafiJ. C. Hoe

2006.

Simulation sampling with live-points

T. F. WenischR. E. WunderlichB. FalsafiJ. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Spatial Memory Streaming

S. SomogyiT. F. WenischA. AilamakiB. FalsafiA. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. CantinJ. E. SmithM. H. LipastiA. MoshovosB. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. ChungJ. C. HoeB. Falsafi

2006.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. LiaskovitisS. ChenP. B. GibbonsA. AilamakiG. E. Blelloch  et al.

2006. DOI : 10.1145/1148109.1148167.

Log-based architectures for general-purpose monitoring of deployed code

S. ChenB. FalsafiP. B. GibbonsM. KozuchT. C. Mowry  et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

Statistical sampling of microarchitecture simulation

R. E. WunderlichT. F. WenischB. FalsafiJ. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. KimC.-L. OoiR. EigenmannB. FalsafiT. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Dynamic feature selection for hardware prediction

A. FernR. GivanB. FalsafiT. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Reunion: Complexity-effective multicore redundancy

J. C. SmolensB. T. GoldB. FalsafiJ. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Store-Ordered Streaming of Shared Memory

T. F. WenischS. SomogyiN. HardavellasJ. KimC. Gniady  et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Accelerating Database Operations Using a Network Processor

B. T. GoldA. AilamakiL. HustonB. Falsafi

2005.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. ShaoA. AilamakiB. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

TRUSS: A Reliable, Scalable Server Architecture

B. T. GoldJ. KimJ. C. SmolensE. ChungV. Liaskovitis  et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, K. Jangwoo, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Teaching and PhD

Current PhD students

Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari

Supervised the EPFL theses of

Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud

Co-supervised the EPFL theses of

Stanko Novakovic, Tao Lin, Atri Bhattacharyya

Courses

Advanced multiprocessor architecture

CS-471

Multiprocessors are the foundation of computing systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (GPUs, TPUs, etc.), and datacenters. It includes a research project on post-Moore multiprocessors and datacenters.

Parallelism and concurrency in software

CS-302

From sensors to smartphones to the world's largest datacenters and supercomputers, parallelism and concurrency are ubiquitous in modern computing. Platforms also offer many forms of parallel and concurrent execution.