Babak Falsafi

EPFL IC IINFCOM PARSA
INJ 233 (Bâtiment INJ)
Station 14
1015 Lausanne

Awards and distinctions

2015

2012

Alfred P. Sloan Foundation

2004

A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs

N. (Nematollahi zadeh) Mahani, H. Falahati, S. Darabi, A. Javadi-Nezhad, Y. Oh et al.

ACM Transactions on Architecture and Code Optimization. 2025. DOI : 10.1145/3760782.

QFlex 3.0: Fast and Accurate ARM Server Simulation

S. Lin, A. Ansari, A. Chakraborty, B. Eryilmaz, Y. Li et al.

2025. ARM-based General-Purpose Computing: Software-Hardware Co-Optimization for Performance Acceleration, Tokyo, Japan, 2025-06-21.

Single-Address-Space FaaS with Jord

Y. Li, A. Bhattacharyya, M. Kumar, A. Bhattacharjee, Y. Etsion et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 694 - 707. DOI : 10.1145/3695053.3731108.

Avant-Garde: Empowering GPUs with Scaled Numeric Formats

M. Gil, D. Ha, S. B. Harma, M. K. Yoon, B. Falsafi et al.

2025. The 52nd Annual International Symposium on Computer Architecture, Tokyo, Japan, 2025-06-21 - 2025-06-25. p. 153 - 165. DOI : 10.1145/3695053.3731100.

Constrained bit allocation for neural networks

S. Boudouh, S. B. Harma, A. Mahmoud, B. Falsafi

2025. Machine Learning for Computer Architecture and Systems 2025, Tokyo, Japan, 2025-06-21.

Rethinking IOMMU for Future IO Devices

M. Kumar, Y. Li, Y. Etsion, A. Bhattacharjee, A. Basu et al.

2025. 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 2025-03-30 - 2025-04-03.

Effective Interplay Between Sparsity and Quantization: from Theory to Practice

S. B. Harma, A. Chakraborty, E. Kostenok, D. Mishin, D. Ha et al.

2025. The Thirteenth International Conference on Learning Representations, Singapore, 2025-04-24 - 2025-04-28.

UrbanTwin: An urban digital twin for climate action

D.-A. Constantinescu, V. Kartsch, Y. Nakatsuka, P. Wiese, P. Orbanovik et al.

EcoCloud Annual Event on IT Sustainability 2024, Lausanne, Switzerland, 2024-10-08.

Silicon Efficiency in Post-Moore Servers

A. Ansari, S. Lin, A. Chakraborty, M. Alian, B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer, B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

Server Architecture from Enterprise to Post-Moore

B. Falsafi, M. Ferdman, B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler, C. Sonmez, M. Drumond, Y. Oh, B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi, A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya, F. Hofhammer, Y. Li, S. Gupta, A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland, B. Falsafi, A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Imprecise Store Exceptions

S. Gupta, Y. Li, Q. Kang, A. Bhattacharjee, B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

AstriFlash: A Flash-Based System for Online Services

S. Gupta, Y. Oh, L. Yan, M. J. Sutherland, A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi, A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi, B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi, M. J. Sutherland, A. Daglis, B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira, L. Coulon, A. Pourhabibi Zarandi, A. C. Yüzügüler, B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Rebooting Virtual Memory with Midgard

S. Gupta, A. Bhattacharyya, Y. Oh, A. Bhattacharjee, B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Data transformer apparatus

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. Sutherland, Z. Tian et al.

US11748254; US2022327048; WO2021037341. 2021.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel, S. A. S. Kohroudi, Z. Yan, A. Bhattacharjee, B. Falsafi et al.

2020

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. J. Sutherland, Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi, M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

The NEBULA RPC-Optimized Architecture

M. Sutherland, S. Gupta, B. Falsafi, V. Marathe, D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta, A. Daglis, B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler, F. Çelik, M. P. Drumond Lages De Oliveira, B. Falsafi, P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov, A. Daglis, J. Picorel Obando, M. J. Sutherland, E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Atomic object reads for in-memory rack-scale computing

A. Daglis, B. R. Grot, B. Falsafi

US10929174; US2018173673. 2018.

Training DNNs with Hybrid Block Floating Point

M. Drumond, T. Lin, M. Jaggi, B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi, E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

Near-Memory Address Translation

J. Picorel, D. Jevdjic, B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, September 9-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

FPGAs versus GPUs in Data centers

B. Falsafi, B. Dally, D. Singh, D. Chiou, J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Fat Caches For Scale-Out Servers

S. Volos, D. Jevdjic, B. Falsafi, B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion, B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi, I. C. Kaynak, B. R. Grot

US9996358; US2017090935. 2017.

Near-Memory Data Services

B. Falsafi, M. Stan, K. Skadron, N. Jayasena, Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Unlocking Energy

B. Falsafi, R. Guerraoui, J. Picorel Obando, V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Towards Near-Threshold Server Processors

A. Pahlevan, J. Picorel Obando, A. Pourhabibi Zarandi, D. Rossi, M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, B. Grot

2015. 42nd International Symposium on Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Confluence: unified instruction supply for scale-out servers

C. Kaynak, B. Grot, B. Falsafi

2015. the 48th International Symposium, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Asynchronous memory access chaining

O. Kocberber, B. Falsafi, B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi, B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos, J. Picorel, B. Falsafi, B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos, S. Fytraki, P. B. Gibbons, M. A. Kozuch, B. Falsafi

2014

Towards stable cloud performance

D. Novakovic / B. Falsafi, D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

A Primer on Hardware Prefetching

B. Falsafi, T. F. Wenisch

Morgan & Claypool, 2014.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Big Data

B. Falsafi, B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic, G. H. Loh, C. Kaynak, B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Multi-Grain Coherence Directory

J. Zebchuk, B. Falsafi, A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi, G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic, S. Volos, B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

DeSyRe: On-demand system reliability

I. Sourdis, C. Strydis, A. Armato, C. S. Bouganis, B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak, B. Grot, B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos, C. Seiculescu, B. Grot, N. Khosro Pour, B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Dark Silicon Accelerators for Database Indexing

O. Kocberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot, D. Hardy, P. Lotfi-Kamran, B. Falsafi, C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki, O. Kocberber, E. Vlachos, J. B. Sartor, B. Grot et al.

2012

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Proactive Instruction Fetch

M. Ferdman, C. Kaynak, B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman, P. Lotfi-Kamran, K. Balet, B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu, S. Volos, N. Khosro Pour, B. Falsafi, G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2011

Toward Dark Silicon in Servers

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos, M. L. Goodstein, M. A. Kozuch, S. Chen, B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Making Address-Correlated Prefetching Practical

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran, M. Ferdman, D. Crisan, B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Spatio-Temporal Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen, M. Kozuch, P. B. Gibbons, M. Ryan, T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold, B. Falsafi, J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, M. K. Papamichael, E. Nurvitadhi, J. C. Hoe, B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Temporal streams in commercial server applications

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi, P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons et al.

2008. the 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2008. 16th international ACM/SIGDA symposium on Field programmable gate arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Predictor virtualization

I. Burcea, S. Somogyi, A. Moshovos, B. Falsafi

2008. the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Temporal instruction fetch streaming

M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2008. the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

Scheduling threads for constructive cache sharing on CMPs

S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

To Share or Not To Share?

R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril, S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim, J. C. Smolens, B. Falsafi, J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

Last-touch correlated data streaming

M. Ferdman, B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim, N. Hardavellas, K. Mai, B. Falsafi, J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens, B. T. Gold, B. Falsafi, J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin, J. E. Smith, M. H. Lipasti, A. Moshovos, B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis, S. Chen, P. B. Gibbons, A. Ailamaki, G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung, J. C. Hoe, B. Falsafi

2006.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold, J. C. Smolens, B. Falsafi, J. C. Hoe

2006.

Spatial Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Simulation sampling with live-points

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Dynamic feature selection for hardware prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold, J. Kim, J. C. Smolens, E. Chung, V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao, A. Ailamaki, B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

Accelerating Database Operations Using a Network Processor

B. T. Gold, A. Ailamaki, L. Huston, B. Falsafi

2005.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Silicon Efficiency in Post-Moore Servers

A. Ansari, S. Lin, A. Chakraborty, M. Alian, B. Eryilmaz et al.

2024. Workshop on Hot Topics in Ethical Computer Systems, San Diego, California, USA, 2024-04-28.

Secure Interface Design Leveraging Hardware/Software Support

A. Bhattacharyya / M. J. Payer, B. Falsafi (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-9975.

Electrical-Level Fault-Injection Attacks on FPGA-Based Systems

D. G. A. S. Mahmoud / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2024. DOI : 10.5075/epfl-thesis-10315.

Server Architecture from Enterprise to Post-Moore

B. Falsafi, M. Ferdman, B. Grot

IEEE Micro. 2024. DOI : 10.1109/MM.2024.3418975.

What's Missing in Agile Hardware Design? Verification!

B. Falsafi

Journal of Computer Science and Technology. 2023. DOI : 10.1007/s11390-023-0005-3.

Scale-out Systolic Arrays

A. C. Yuzuguler, C. Sonmez, M. Drumond, Y. Oh, B. Falsafi et al.

ACM Transactions on Architecture and Code Optimization. 2023. DOI : 10.1145/3572917.

Rebooting Virtual Memory with Midgard

S. Gupta / B. Falsafi, A. Bhattacharjee (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-8864.

SecureCells: A Secure Compartmentalized Architecture

A. Bhattacharyya, F. Hofhammer, Y. Li, S. Gupta, A. Sánchez Marín et al.

2023. 44th IEEE Symposium on Security and Privacy, San Francisco, USA, May 22-24, 2023. p. 2921 - 2939. DOI : 10.1109/SP46215.2023.00125.

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads

M. J. Sutherland, B. Falsafi, A. Daglis

2023. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), Vancouver, BC, Canada, March 25–29, 2023. p. 30 - 46. DOI : 10.1145/3567955.3567957.

Evaluating, Exploiting, and Hiding Power Side-Channel Leakage of Remote FPGAs

O. Glamocanin / B. Falsafi, M. Stojilovic (Dir.)

Lausanne, EPFL, 2023. DOI : 10.5075/epfl-thesis-9918.

Imprecise Store Exceptions

S. Gupta, Y. Li, Q. Kang, A. Bhattacharjee, B. Falsafi et al.

2023. The 50th Annual International Symposium on Computer Architecture (ISCA ’23), Orlando, FL, USA, June 17–21, 2023. DOI : 10.1145/3579371.3589087.

AstriFlash: A Flash-Based System for Online Services

S. Gupta, Y. Oh, L. Yan, M. J. Sutherland, A. Bhattacharjee et al.

2023. The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA-29), Montreal, QC, Canada, Feb 25 – March 01, 2023. DOI : 10.1109/HPCA56546.2023.10070955.

Algorithms for Efficient and Robust Distributed Deep Learning

T. Lin / M. Jaggi, B. Falsafi (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8980.

Hardware and Software Support for RPC-Centric Server Architecture

M. J. Sutherland / B. Falsafi, A. Daglis (Dir.)

Lausanne, EPFL, 2022. DOI : 10.5075/epfl-thesis-8017.

Cerebros: Evading the RPC Tax in Datacenters

A. Pourhabibi Zarandi, M. J. Sutherland, A. Daglis, B. Falsafi

2021. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18–22, 2021. p. 407 - 420. DOI : 10.1145/3466752.3480055.

Equinox: Training (for Free) on a Custom Inference Accelerator

M. P. Drumond Lages De Oliveira, L. Coulon, A. Pourhabibi Zarandi, A. C. Yüzügüler, B. Falsafi et al.

2021. 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021. DOI : 10.1145/3466752.3480057.

Hardware-Software Co-Design of an RPC Processor

A. Pourhabibi Zarandi / B. Falsafi (Dir.)

Lausanne, EPFL, 2021. DOI : 10.5075/epfl-thesis-7217.

Rebooting Virtual Memory with Midgard

S. Gupta, A. Bhattacharyya, Y. Oh, A. Bhattacharjee, B. Falsafi et al.

2021. ISCA 2021 48th International Symposium on Computer Architecture, Online conference, June 14-19, 2021. DOI : 10.1109/ISCA52012.2021.00047.

Data transformer apparatus

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. Sutherland, Z. Tian et al.

US11748254; US2022327048; WO2021037341. 2021.

Exploiting Errors for Efficiency: A Survey from Circuits to Applications

P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek et al.

ACM Computing Surveys. 2020. DOI : 10.1145/3394898.

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

J. Picorel, S. A. S. Kohroudi, Z. Yan, A. Bhattacharjee, B. Falsafi et al.

2020

Optimus Prime: Accelerating Data Transformation in Servers

A. Pourhabibi Zarandi, S. Gupta, H. Kassir, M. J. Sutherland, Z. Tian et al.

2020. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16–20, 2020. p. 1203 - 1216. DOI : 10.1145/3373376.3378501.

ColTraIn: Co-located DNN training and inference

M. P. Drumond Lages De Oliveira / B. Falsafi, M. Jaggi (Dir.)

Lausanne, EPFL, 2020. DOI : 10.5075/epfl-thesis-10265.

The NEBULA RPC-Optimized Architecture

M. Sutherland, S. Gupta, B. Falsafi, V. Marathe, D. Pnevmatikatos et al.

2020. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, May 30 - June 3, 2020. p. 199 - 212. DOI : 10.1109/ISCA45697.2020.00027.

Distributed Logless Atomic Durability with Persistent Memory

S. Gupta, A. Daglis, B. Falsafi

2019. The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), Columbus, OH, USA, October 12–16, 2019. DOI : 10.1145/3352460.3358321.

SMoTherSpectre: Exploiting Speculative Execution through Port Contention

A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi et al.

2019. The 26th ACM Conference on Computer and Communications Security - ACM CCS 2019, London, UK, November 11-15, 2019. p. 785 - 800. DOI : 10.1145/3319535.3363194.

Analog Neural Networks with Deep-submicron Nonlinear Synapses

A. C. Yüzügüler, F. Çelik, M. P. Drumond Lages De Oliveira, B. Falsafi, P. Frossard

IEEE Micro. 2019. DOI : 10.1109/MM.2019.2931182.

Design Guidelines for High-Performance SCM Hierarchies

D. Ustiugov, A. Daglis, J. Picorel Obando, M. J. Sutherland, E. Bugnion et al.

2018. 4th International Symposium on Memory Systems (MEMSYS), Old Town Alexandria, VA, USA, October 1-4, 2018. DOI : 10.1145/3240302.3240310.

Atomic object reads for in-memory rack-scale computing

A. Daglis, B. R. Grot, B. Falsafi

US10929174; US2018173673. 2018.

Training DNNs with Hybrid Block Floating Point

M. Drumond, T. Lin, M. Jaggi, B. Falsafi

2018. NeurIPS 2018 - 32nd Conference on Neural Information Processing Systems, Montreal, Canada, Dec 02-08, 2018.

Network-Compute Co-Design for Distributed In-Memory Computing

A. Daglis / B. Falsafi, E. Bugnion (Dir.)

Lausanne, EPFL, 2018. DOI : 10.5075/epfl-thesis-8749.

Near-Memory Address Translation

J. Picorel Obando / B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7875.

Near-Memory Address Translation

J. Picorel, D. Jevdjic, B. Falsafi

2017. 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, Sep 09-13, 2017. p. 303 - 317. DOI : 10.1109/Pact.2017.56.

The Mondrian Data Engine

M. P. Drumond Lages De Oliveira, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel Obando et al.

2017. The 44th International Symposium on Computer Architecture, Toronto, ON, Canada, June 24-28, 2017. DOI : 10.1145/3079856.3080233.

FPGAs versus GPUs in Data centers

B. Falsafi, B. Dally, D. Singh, D. Chiou, J. J. Yi et al.

IEEE Micro. 2017. DOI : 10.1109/MM.2017.19.

Fat Caches For Scale-Out Servers

S. Volos, D. Jevdjic, B. Falsafi, B. Grot

IEEE Micro. 2017. DOI : 10.1109/MM.2017.32.

Rack-Scale Memory Pooling for Datacenters

S. Novakovic / E. Bugnion, B. Falsafi (Dir.)

Lausanne, EPFL, 2017. DOI : 10.5075/epfl-thesis-7612.

Unified prefetching into instruction cache and branch target buffer

B. Falsafi, I. C. Kaynak, B. R. Grot

US9996358; US2017090935. 2017.

Near-Memory Data Services

B. Falsafi, M. Stan, K. Skadron, N. Jayasena, Y. Chen et al.

IEEE Micro. 2016. DOI : 10.1109/MM.2016.9.

An Analysis of Load Imbalance in Scale-out Data Serving

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM SIGMETRICS, Antibes Juan-Les-Pins, France, June 14-18, 2016. p. 367 - 368. DOI : 10.1145/2896377.2901501.

SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing

A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi et al.

2016. 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 15-19, 2016. DOI : 10.1109/MICRO.2016.7783709.

Unlocking Energy

B. Falsafi, R. Guerraoui, J. Picorel Obando, V. Trigonakis

2016. 2016 USENIX Annual Technical Conference, Denver, Colorado, USA, June 22-24, 2016. p. 393 - 406.

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2016. ACM Symposium on Cloud Computing, Santa Clara, USA, October 05-07, 2016. DOI : 10.1145/2987550.2987577.

Towards Near-Threshold Server Processors

A. Pahlevan, J. Picorel Obando, A. Pourhabibi Zarandi, D. Rossi, M. Zapater Sancho et al.

2016. Design, Automation and Test in Europe Conference (DATE '16), Dresden, Germany, March 14-18, 2016. p. 7 - 12.

Multi-Gigabyte On-Chip DRAM Caches for Servers

D. Jevdjic / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6631.

Manycore Network Interfaces for In-Memory Rack-Scale Computing

A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, B. Grot

2015. 42nd International Symposium on Computer Architecture, Portland, Oregon, USA, June 13-17, 2015. DOI : 10.1145/2749469.2750415.

Confluence: unified instruction supply for scale-out servers

C. Kaynak, B. Grot, B. Falsafi

2015. The 48th International Symposium on Microarchitecture, Waikiki, Hawaii, 05-09 December 2015. p. 166 - 177. DOI : 10.1145/2830772.2830785.

Memory Systems and Interconnects for Scale-Out Servers

S. Volos / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6682.

Accelerators for Data Processing

Y. O. Koçberber / B. Falsafi (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6710.

Asynchronous memory access chaining

O. Kocberber, B. Falsafi, B. Grot

Proceedings of the VLDB Endowment. 2015. DOI : 10.14778/2856318.2856321.

Shared Frontend for Manycore Server Processors

I. C. Kaynak / B. Falsafi, B. R. Grot (Dir.)

Lausanne, EPFL, 2015. DOI : 10.5075/epfl-thesis-6669.

Sort vs. Hash Join Revisited for Near-Memory Execution

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot

2015. 5th Workshop on Architectures and Systems for Big Data (ASBD 2015), Portland, Oregon, USA, June 13, 2015.

BuMP: Bulk Memory Access Prediction and Streaming

S. Volos, J. Picorel, B. Falsafi, B. Grot

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 13-17, 2014. p. 545 - 557. DOI : 10.1109/MICRO.2014.44.

Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models

E. Vlachos, S. Fytraki, P. B. Gibbons, M. A. Kozuch, B. Falsafi

2014

Towards stable cloud performance

D. Novakovic / B. Falsafi, D. Kostic (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6261.

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring

S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, B. Grot

2014. 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), Orlando, Florida, USA, February 15-19, 2014. p. 108 - 119. DOI : 10.1109/HPCA.2014.6835922.

A Case for Specialized Processors for Scale-Out Workloads

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

IEEE Micro. 2014. DOI : 10.1109/MM.2014.41.

Scale-Out NUMA

S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, B. Grot

2014. Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, March 1-5, 2014. DOI : 10.1145/2541940.2541965.

A Primer on Hardware Prefetching

B. Falsafi, T. F. Wenisch

Morgan & Claypool, 2014.

Architectural Support to Accelerate Fine-Grain Program Monitoring

S. Fytraki / B. Falsafi (Dir.)

Lausanne, EPFL, 2014. DOI : 10.5075/epfl-thesis-6257.

Big Data

B. Falsafi, B. Grot

IEEE Micro. 2014. DOI : 10.1109/MM.2014.65.

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

D. Jevdjic, G. H. Loh, C. Kaynak, B. Falsafi

2014. 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, December 13-17, 2014. p. 25 - 37. DOI : 10.1109/MICRO.2014.51.

Scale-Out Processors

P. Lotfi Kamran / B. Falsafi (Dir.)

Lausanne, EPFL, 2013. DOI : 10.5075/epfl-thesis-5906.

Multi-Grain Coherence Directory

J. Zebchuk, B. Falsafi, A. Moshovos

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540739.

Top Picks from the 2012 Computer Architecture Conferences: Introduction

B. Falsafi, G. H. Loh

IEEE Micro. 2013. DOI : 10.1109/MM.2013.65.

Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

D. Jevdjic, S. Volos, B. Falsafi

2013. 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013. p. 404 - 415. DOI : 10.1145/2485922.2485957.

DeSyRe: On-demand system reliability

I. Sourdis, C. Strydis, A. Armato, C. S. Bouganis, B. Falsafi et al.

Microprocessors and Microsystems - Embedded Hardware Design. 2013. DOI : 10.1016/j.micpro.2013.08.008.

Meet the Walkers: Accelerating Index Traversals for In-Memory Databases

O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim et al.

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540748.

SHIFT: Shared History Instruction Fetch for Lean-Core Server Processors

C. Kaynak, B. Grot, B. Falsafi

2013. 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, December 7-11, 2013. DOI : 10.1145/2540708.2540732.

Thermal Characterization of Cloud Workloads on a Power-Efficient Server-on-Chip

D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran et al.

2012. 30th IEEE International Conference on Computer Design, Montreal, Quebec, Canada, September 30 - October 3, 2012. DOI : 10.1109/ICCD.2012.6378637.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

S. Volos, C. Seiculescu, B. Grot, N. Khosro Pour, B. Falsafi et al.

2012. 6th International Symposium on Networks-on-Chip, Lyngby, Denmark, May 9-11, 2012.

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

ACM Transactions on Computer Systems. 2012. DOI : 10.1145/2382553.2382557.

NOC-Out: Microarchitecting a Scale-Out Processor

P. Lotfi-Kamran, B. Grot, B. Falsafi

2012. 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, December 1-5, 2012. DOI : 10.1109/MICRO.2012.25.

Dark Silicon Accelerators for Database Indexing

O. Kocberber, B. Falsafi, K. Lim, P. Ranganathan, S. Harizopoulos

2012. 1st Dark Silicon Workshop, Portland, Oregon, USA, June 10, 2012.

Optimizing Data-Center TCO with Scale-Out Processors

B. Grot, D. Hardy, P. Lotfi-Kamran, B. Falsafi, C. Nicopoulos et al.

IEEE Micro. 2012. DOI : 10.1109/MM.2012.71.

BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

S. Fytraki, O. Kocberber, E. Vlachos, J. B. Sartor, B. Grot et al.

2012

Scale-Out Processors

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber et al.

2012. 39th Annual International Symposium on Computer Architecture, Portland, Oregon, USA, June 9-13, 2012. DOI : 10.1145/2366231.2337217.

Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2012. Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, March 3-7, 2012.

Proactive Instruction Fetch

M. Ferdman, C. Kaynak, B. Falsafi

2011. 44th Annual IEEE/ACM Symposium on Microarchitecture (MICRO 2011), Porto Alegre, Brazil, December 3-7. p. 152 - 162. DOI : 10.1145/2155620.2155638.

Toward Dark Silicon in Servers

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2011. DOI : 10.1109/MM.2011.77.

Reliability in the Dark Silicon Era

B. Falsafi

2011. 17th IEEE International On-Line Testing Symposium (IOLTS), Athens, Greece, Jul 13-15, 2011. p. V - V.

Cuckoo Directory: A Scalable Directory for Many-Core Systems

M. Ferdman, P. Lotfi-Kamran, K. Balet, B. Falsafi

2011. HPCA 2011, San Antonio, Texas, USA, February 12-16, 2011. DOI : 10.1109/HPCA.2011.5749726.

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

C. Seiculescu, S. Volos, N. Khosro Pour, B. Falsafi, G. De Micheli

2011. Workshop on Energy-Efficient Design (WEED 2011), San Jose, California, USA, June 5, 2011.

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee et al.

2011

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

IEEE Micro. 2010. DOI : 10.1109/MM.2010.22.

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

E. Vlachos, M. L. Goodstein, M. A. Kozuch, S. Chen, B. Falsafi et al.

2010. ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March 13-17, 2010. p. 271 - 284. DOI : 10.1145/1736020.1736051.

Making Address-Correlated Prefetching Practical

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

IEEE Micro. 2010. DOI : 10.1109/MM.2010.21.

TurboTag: Lookup Filtering to Reduce Coherence Directory Power

P. Lotfi-Kamran, M. Ferdman, D. Crisan, B. Falsafi

2010. 16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20. p. 377 - 382. DOI : 10.1145/1840845.1840929.

Spatio-Temporal Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 69 - 80. DOI : 10.1145/1555754.1555766.

Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches

N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki

2009. 36th ACM/IEEE Annual International Symposium on Computer Architecture, Austin, TX. p. 184 - 195. DOI : 10.1145/1555754.1555779.

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

S. Chen, M. Kozuch, P. B. Gibbons, M. Ryan, T. Strigkos et al.

IEEE Micro Top Picks. 2009. DOI : 10.1109/MM.2009.6.

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

B. T. Gold, B. Falsafi, J. C. Hoe

2009. p. 195 - 201. DOI : 10.1109/PRDC.2009.39.

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, M. K. Papamichael, E. Nurvitadhi, J. C. Hoe, B. Falsafi et al.

ACM Transactions on Reconfigurable Technology and Systems. 2009. DOI : 10.1145/1534916.1534925.

Practical Off-chip Meta-data for Temporal Memory Streaming

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2009. 15th International Symposium on High-Performance Computer Architecture, Raleigh, NC. p. 79 - 90. DOI : 10.1109/HPCA.2009.4798239.

Temporal streams in commercial server applications

T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, A. Moshovos

2008. IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, September. p. 99 - 108. DOI : 10.1109/IISWC.2008.4636095.

Workshop on Transactional Computing (TRANSACT 2008) - Introduction

B. Falsafi, P. Felber

ACM SIGPLAN Notices. 2008. DOI : 10.1145/1402227.1402233.

Flexible hardware acceleration for instruction-grain program monitoring

S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons et al.

2008. The 35th Annual International Symposium on Computer Architecture (ISCA), Beijing, China, June. p. 377 - 388. DOI : 10.1109/ISCA.2008.20.

A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2008. 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, February. p. 77 - 86. DOI : 10.1145/1344671.1344684.

Predictor virtualization

I. Burcea, S. Somogyi, A. Moshovos, B. Falsafi

2008. The 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Seattle, WA, March. p. 157 - 167. DOI : 10.1145/1346281.1346301.

Temporal instruction fetch streaming

M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2008. The 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Lake Como, Italy, November. p. 1 - 10. DOI : 10.1109/MICRO.2008.4771774.

Multi-bit error tolerant caches using two-dimensional error coding

J. Kim, N. Hardavellas, K. Mai, B. Falsafi, J. C. Hoe

2007. p. 197 - 209. DOI : 10.1109/MICRO.2007.19.

Database Servers on Chip Multiprocessors: Limitations and Opportunities

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

Scheduling threads for constructive cache sharing on CMPs

S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki et al.

2007. p. 105 - 115. DOI : 10.1145/1248377.1248396.

An Analysis of Database System Performance on Chip Multiprocessors

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki et al.

2007.

PROTOFLEX: FPGA-accelerated hybrid functional simulator

E. S. Chung, E. Nurvitadhi, J. C. Hoe, B. Falsafi, K. Mai

2007. DOI : 10.1109/IPDPS.2007.370516.

Mechanisms for store-wait-free multiprocessors

T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2007. p. 266 - 277. DOI : 10.1145/1250662.1250696.

To Share or Not To Share?

R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril, S. Harizopoulos et al.

2007. 33rd International Conference on Very Large Data Bases, Vienna, Austria, September. p. 351 - 362.

PAI: A lightweight mechanism for single-node memory recovery in DSM servers

J. Kim, J. C. Smolens, B. Falsafi, J. C. Hoe

2007. p. 298 - 305. DOI : 10.1109/PRDC.2007.53.

Last-touch correlated data streaming

M. Ferdman, B. Falsafi

2007. p. 105 - 115. DOI : 10.1109/ISPASS.2007.363741.

Reunion: Complexity-effective multicore redundancy

J. C. Smolens, B. T. Gold, B. Falsafi, J. C. Hoe

2006. p. 223 - 234. DOI : 10.1109/MICRO.2006.42.

Coarse-grain coherence tracking: RegionScout and region coherence arrays

J. F. Cantin, J. E. Smith, M. H. Lipasti, A. Moshovos, B. Falsafi

IEEE Micro. 2006. DOI : 10.1109/MM.2006.8.

Parallel depth first vs. work stealing schedulers on CMP architectures

V. Liaskovitis, S. Chen, P. B. Gibbons, A. Ailamaki, G. E. Blelloch et al.

2006. DOI : 10.1145/1148109.1148167.

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

E. S. Chung, J. C. Hoe, B. Falsafi

2006.

Statistical sampling of microarchitecture simulation

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

ACM Transactions on Modeling and Computer Simulation. 2006. DOI : 10.1145/1147224.1147225.

Log-based architectures for general-purpose monitoring of deployed code

S. Chen, B. Falsafi, P. B. Gibbons, M. Kozuch, T. C. Mowry et al.

2006. p. 63 - 65. DOI : 10.1145/1181309.1181319.

The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors

B. T. Gold, J. C. Smolens, B. Falsafi, J. C. Hoe

2006.

Spatial Memory Streaming

S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, A. Moshovos

2006. p. 252 - 263. DOI : 10.1109/ISCA.2006.38.

Exploiting reference idempotency to reduce speculative storage overflow

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

ACM Transactions on Programming Languages and Systems. 2006. DOI : 10.1145/1152649.1152653.

Simulation sampling with live-points

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2006. p. 2 - 12. DOI : 10.1109/ISPASS.2006.1620785.

Dynamic feature selection for hardware prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

Journal of Systems Architecture. 2006. DOI : 10.1016/j.sysarc.2004.12.007.

Temporal Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, A. Ailamaki et al.

2005. p. 222 - 233. DOI : 10.1109/ISCA.2005.50.

TRUSS: A Reliable, Scalable Server Architecture

B. T. Gold, J. Kim, J. C. Smolens, E. Chung, V. Liaskovitis et al.

IEEE Micro. 2005. DOI : 10.1109/MM.2005.122.

Understanding the performance of concurrent error detecting superscalar microarchitectures

J. C. Smolens, K. Jangwoo, J. C. Hoe, B. Falsafi

2005. p. 13 - 18. DOI : 10.1109/ISSPIT.2005.1577062.

ReCast: Boosting tag line buffer coverage in low-power high-level caches "for free"

W.-H. Park, A. Moshovos, B. Falsafi

2005. p. 609 - 616. DOI : 10.1109/ICCD.2005.90.

DBmbench: fast and accurate database workload representation on modern microarchitecture

M. Shao, A. Ailamaki, B. Falsafi

2005. p. 254 - 267. DOI : 10.1145/1105634.1105653.

TurboSMARTS: Accurate microarchitecture simulation sampling in minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2005. p. 408 - 409. DOI : 10.1145/1064212.1064278.

A case for asymmetric-cell cache memories

A. Moshovos, B. Falsafi, F. N. Najm, N. Azizi

IEEE Transactions on Very Large Scale Integration Systems. 2005. DOI : 10.1109/TVLSI.2005.850127.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

B. Falsafi, D. A. Wood

Journal of Parallel and Distributed Computing. 2005. DOI : 10.1016/j.jpdc.2004.11.011.

Accelerating Database Operations Using a Network Processor

B. T. Gold, A. Ailamaki, L. Huston, B. Falsafi

2005.

Store-Ordered Streaming of Shared Memory

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2005. p. 75 - 86. DOI : 10.1109/PACT.2005.37.

The Third International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2004

Accurate and complexity-effective spatial pattern prediction

C. F. Chen, S.-H. Yang, B. Falsafi, A. Moshovos

2004. p. 276 - 287.

Memory coherence activity prediction in commercial workloads

S. Somogyi, T. F. Wenisch, N. Hardavellas, J. Kim, A. Ailamaki et al.

2004. p. 37 - 45. DOI : 10.1145/1054943.1054949.

An Evaluation of Stratified Sampling of Microarchitecture Simulations

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe

2004.

The Fourth International Workshop on Power-Aware Computer Systems. Revised Papers

B. Falsafi, T. Vijaykumar

2004

TurboSMARTS: Accurate Microarchitecture Simulation Sampling in Minutes

T. F. Wenisch, R. E. Wunderlich, B. Falsafi, J. C. Hoe

2004

SORDS: Just-In-Time Streaming of Temporally-Correlated Shared Data

T. F. Wenisch, S. Somogyi, N. Hardavellas, J. Kim, C. Gniady et al.

2004

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. Smolens, B. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

2004.

Efficient resource sharing in concurrent error detecting superscalar microarchitectures

J. C. Smolens, J. Kim, J. C. Hoe, B. Falsafi

2004. p. 257 - 268. DOI : 10.1109/MICRO.2004.19.

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen et al.

Performance Evaluation Review. 2004. DOI : 10.1145/1054907.1054914.

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe et al.

IEEE Micro. 2004. DOI : 10.1109/MM.2004.72.

Near-optimal precharging in high-performance nanoscale CMOS caches

S.-H. Yang, B. Falsafi

2003. p. 67 - 78. DOI : 10.1109/MICRO.2003.1253184.

Speculative Sequential Consistency with Little Custom Storage

C. Gniady, B. Falsafi

Journal of Instruction-Level Parallelism. 2003.

Performance and Energy Trade-Offs of Bitline Isolation in Nanoscale CMOS Caches

S.-H. Yang, B. Falsafi

2003.

Implicitly-multithreaded processors

I. Park, B. Falsafi, T. N. Vijaykumar

2003. p. 39 - 50. DOI : 10.1145/859618.859624.

The Second International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2003

Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay

S.-H. Yang, M. D. Powell, B. Falsafi, T. N. Vijaykumar

2002. p. 151 - 161. DOI : 10.1109/HPCA.2002.995706.

Optimizing traffic in DSM clusters: fine-grain memory caching versus page migration/replication

A.-C. Lai, B. Falsafi

Theory of Computing Systems. 2002. DOI : 10.1007/s00224-002-1054-6.

Speculative sequential consistency with little custom storage

C. Gniady, B. Falsafi

2002. p. 179 - 188. DOI : 10.1109/PACT.2002.1106016.

Gated Precharge: Using Temporal Locality of Subarrays to Save Deep-Submicron Cache Energy

S.-H. Yang, B. Falsafi

2002.

An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001. p. 147 - 157. DOI : 10.1109/HPCA.2001.903259.

Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor

C.-L. Ooi, S. W. Kim, I. Park, R. Eigenmann, B. Falsafi et al.

2001. p. 368 - 380. DOI : 10.1145/377792.377863.

Reducing leakage in a high-performance deep-submicron instruction cache

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2001. DOI : 10.1109/92.920821.

Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery

J. Ray, J. C. Hoe, B. Falsafi

2001. 34th Annual IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, December 1-5, 2001. p. 214 - 224. DOI : 10.1109/MICRO.2001.991120.

Evaluating Opportunity and Effectiveness of Cache Resizing in Reducing Energy Dissipation

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2001

The First International Workshop on Power-Aware Computer Systems. Revised Papers.

B. Falsafi, T. Vijaykumar

2001

JETTY: Filtering snoops for reduced energy consumption in SMP servers

A. Moshovos, G. Memik, B. Falsafi, A. Choudhary

2001. p. 85 - 96. DOI : 10.1109/HPCA.2001.903254.

Reference idempotency analysis: A framework for optimizing speculative execution

S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, T. N. Vijaykumar

2001. p. 2 - 11. DOI : 10.1145/379539.379547.

Dead-block prediction & dead-block correlating prefetchers

A.-C. Lai, C. Fide, B. Falsafi

2001. p. 144 - 154. DOI : 10.1109/ISCA.2001.937443.

Reducing set-associative cache energy via way-prediction and selective direct-mapping

M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, K. Roy

2001. p. 54 - 65.

Low-Overhead and High-Performance Implementations of Sequential Consistency

C. Gniady, B. Falsafi

2000.

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

A.-C. Lai, B. Falsafi

2000. p. 79 - 88. DOI : 10.1145/341800.341811.

Dynamically Resizable Instruction Cache: A Design for an Energy-Efficient and High-Performance Deep-Submicron Instruction Cache

S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, T. N. Vijaykumar

2000

Selective, accurate, and timely self-invalidation using last-touch prediction

A.-C. Lai, B. Falsafi

2000. p. 139 - 148. DOI : 10.1109/ISCA.2000.854385.

Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor

S. W. Kim, C.-L. Ooi, I. Park, R. Eigenmann, B. Falsafi et al.

2000

The Fourth International Workshop on Network-Based Parallel Computing. Communication, Architecture, and Applications. Revised Papers.

B. Falsafi, M. Lauria

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories

M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, T. N. Vijaykumar

2000. International Symposium on Low Power Electronics and Design (ISLPED), Rapallo, Italy, July. p. 90 - 95. DOI : 10.1109/LPE.2000.876763.

Dynamic Feature Selection for Hardware Prediction

A. Fern, R. Givan, B. Falsafi, T. N. Vijaykumar

2000

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, M. D. Hill et al.

IEEE Concurrency. 2000. DOI : 10.1109/4434.895100.

Address partitioning in DSM clusters with parallel coherence controllers

I. Pragaspathy, B. Falsafi

2000. p. 47 - 56. DOI : 10.1109/PACT.2000.888330.

Is SC + ILP = RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

ACM SIGARCH Computer Architecture News. 1999. DOI : 10.1145/307338.300993.

Memory sharing predictor: the key to a speculative coherent DSM

A.-C. Lai, B. Falsafi

1999. p. 172 - 183. DOI : 10.1109/ISCA.1999.765949.

Is SC+ILP=RC?

C. Gniady, B. Falsafi, T. N. Vijaykumar

1999. ISCA, Atlanta, GA, May. p. 162 - 171. DOI : 10.1109/ISCA.1999.765948.

Cacheable Interface Control Registers for High Speed Data Transfer

D. A. Wood, S. K. Reinhardt, S. S. Mukherjee, B. Falsafi, M. D. Hill et al.

US5951657. 1999.

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D. A. Wood

1999. p. 182 - 192. DOI : 10.1109/HPCA.1999.744362.

Sirocco: cost-effective fine-grain distributed shared memory

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, D. A. Wood

1998. p. 40 - 49. DOI : 10.1109/PACT.1998.727144.

Reactive NUMA: A design for unifying S-COMA and CC-NUMA

B. Falsafi, D. A. Wood

1997. p. 229 - 240. DOI : 10.1145/264107.264205.

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

S. S. Mukherjee, S. K. Reinhardt, B. Falsafi, M. Litzkow, S. Huss-Lederman et al.

1997.

Modeling cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

ACM Transactions on Modeling and Computer Simulation. 1997. DOI : 10.1145/244804.244808.

Fine-grain Access Control for Distributed Shared Memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

Distributed Shared Memory: Concepts and Systems; IEEE Computer Society Press, 1997.

Scheduling communication on an SMP node parallel machine

B. Falsafi, D. A. Wood

1997. p. 128 - 138. DOI : 10.1109/HPCA.1997.569649.

Coherent network interfaces for fine-grain communication

S. S. Mukherjee, B. Falsafi, M. D. Hill, D. A. Wood

1996. p. 247 - 258. DOI : 10.1145/232973.232999.

Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations

I. Schoinas, B. Falsafi, M. D. Hill, J. R. Larus, C. E. Lukas et al.

1996

When does Dedicated Protocol Processing Make Sense?

B. Falsafi, D. A. Wood

1996

Mechanisms for Cooperative Shared Memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

CMG Transactions. 1994. DOI : 10.1145/173682.165151.

Application-specific protocols for user-level shared memory

B. Falsafi, A. R. Lebeck, S. K. Reinhardt, I. Schoinas, M. D. Hill et al.

1994. Supercomputing '94, Washington D.C., USA, November 14-18. p. 380 - 389. DOI : 10.1109/SUPERC.1994.344301.

Fine-grain access control for distributed shared memory

I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus et al.

1994. ASPLOS'94. 6th International Conference on Architectural support for Programming Languages and Operating Systems, San Jose, CA, October. p. 297 - 306. DOI : 10.1145/195470.195575.

Cost/performance of a parallel computer simulator

B. Falsafi, D. A. Wood

1994. p. 173 - 182.

Mechanisms for cooperative shared memory

D. A. Wood, S. Chandra, B. Falsafi, M. D. Hill, J. R. Larus et al.

1993. 20th International Symposium on Computer Architecture, San Diego, CA, May. p. 156 - 167. DOI : 10.1145/165123.165151.

Kernel support for the Wisconsin Wind Tunnel

S. K. Reinhardt, B. Falsafi, D. A. Wood

1993. p. 73 - 89.

Component Labeling Algorithms on an Intel iPSC/2 Hypercube

B. Falsafi, R. Miller

1990. p. 159 - 164.

Teaching and PhD

Current PhD students

Yuanlong Li, Simla Burcu Harma, Shashwat Shrivastava, Alexandros Poupakis, Shanqing Lin, Ayan Chakraborty, Pooria Poorsarvi Tehrani, Ali Ansari

Supervised the EPFL theses of

Pejman Lotfi Kamran, Sotiria Fytraki, Dejan Novakovic, Stavros Volos, Djordje Jevdjic, Ilknur Cansu Kaynak, Yusuf Onur Koçberber, Javier Picorel Obando, Alexandros Daglis, Mario Paulo Drumond Lages De Oliveira, Arash Pourhabibi Zarandi, Mark Johnathon Sutherland, Siddharth Gupta, Ognjen Glamocanin, Dina Gamaleldin Ahmed Shawky Mahmoud

Co-supervised the EPFL theses of

Stanko Novakovic, Tao Lin, Atri Bhattacharyya

Courses

Advanced multiprocessor architecture

CS-471

Multiprocessors are the foundation of computing systems. This course covers the architecture and organization of modern multiprocessors, prevalent accelerators (GPUs, TPUs, etc.), and datacenters. It includes a research project on post-Moore multiprocessors and datacenters.

Parallelism and concurrency in software

CS-302

From sensors to smartphones to the world's largest datacenters and supercomputers, parallelism and concurrency are ubiquitous in modern computing. Platforms also offer many forms of parallel and concurrent execution.