MNM publication list

Publications of year 2026

Articles in journal or book chapters

Sergej Breiter, James D. Trotter, and Karl Fürlinger. Cache partitioning for sparse matrix–vector multiplication on the A64FX. Parallel Computing, 127:103169, 2026. doi:10.1016/j.parco.2025.103169

Abstract

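One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of applying the sector cache to sparse matrix-vector multiplication (SpMV) in the Compressed Sparse Row (CSR) format using a collection of 490 sparse matrices. A performance model based on reuse analysis is used to better understand situations in which and how the sector cache leads to improved cache reuse and to predict cache behavior. The model predicts the number of L2 cache misses within an error of 2% without cache partitioning. With sector cache enabled, depending on the configuration, the model predicts the number of L2 cache misses within 2–3% and 4–18% for sequential and parallel SpMV with 48 threads, respectively. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.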

BibTeX Entry

@article{btf26,
  author = {Sergej Breiter and James D. Trotter and Karl Fürlinger},
  title = {Cache partitioning for sparse matrix–vector multiplication on the A64FX},
  journal = {Parallel Computing},
  volume = {127},
  pages = {103169},
  year = {2026},
  doi = {10.1016/j.parco.2025.103169},
  url = {https://www.sciencedirect.com/science/article/pii/S0167819125000456},
  pdf = {https://www.sciencedirect.com/science/article/pii/S0167819125000456/pdfft?md5=0d23ebeaae7c2a4ab65d71f51763f527&pid=1-s2.0-S0167819125000456-main.pdf},
  abstract = {One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of applying the sector cache to sparse matrix-vector multiplication (SpMV) in the Compressed Sparse Row (CSR) format using a collection of 490 sparse matrices. A performance model based on reuse analysis is used to better understand situations in which and how the sector cache leads to improved cache reuse and to predict cache behavior. The model predicts the number of L2 cache misses within an error of 2\% without cache partitioning. With sector cache enabled, depending on the configuration, the model predicts the number of L2 cache misses within 2–3\% and 4–18\% for sequential and parallel SpMV with 48 threads, respectively. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.},
  issn = {0167-8191},
  keywords = {Sparse matrix–vector multiplication, A64FX, Cache partitioning, Sector cache, Performance model},
}

Various authors. Seminar Proceedings Selected Topics in Quantum Computing Winter Term 2025/2026. 2026.

BibTeX Entry

@article{sprocws25,
  author = {Various authors},
  title = {Seminar Proceedings Selected Topics in Quantum Computing Winter Term 2025/2026},
  year = {2026},
  pdf = {https://bib.nm.ifi.lmu.de/pdf/semQC25proceedings.pdf},
}

Articles in conference or workshop proceedings

Sergej Breiter, Minh Chung, Karl Fürlinger, Josef Weidendorfer, and Dieter Kranzlmüller. Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems. In SCA/HPCAsia 2026: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026), SCA/HPCAsia 2026, 2026. Association for Computing Machinery. doi:10.1145/3773656.3773692

BibTeX Entry

@inproceedings{bcfwk26,
  author = {Sergej Breiter and Minh Chung and Karl Fürlinger and Josef Weidendorfer and Dieter Kranzlmüller},
  title = {Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems},
  booktitle = {SCA/HPCAsia 2026: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026)},
  year = {2026},
  series = {SCA/HPCAsia 2026},
  publisher = {Association for Computing Machinery},
  isbn = {979-8-4007-2067-3},
  doi = {10.1145/3773656.3773692},
  pdf = {},
  address = {New York, NY, USA},
  location = {Osaka, Japan},
}

Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Last modified: Mon Feb 02 23:42:09 2026 CET