Publications of Sergej Breiter
Articles in journal or book chapters
-
Cache partitioning for sparse matrix–vector multiplication on the A64FX.
Parallel Computing, 127:103169,
2026.
doi:10.1016/j.parco.2025.103169
Publisher's Version
PDF
Supplement
Abstract
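One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of applying the sector cache to sparse matrix-vector multiplication (SpMV) in the Compressed Sparse Row (CSR) format using a collection of 490 sparse matrices. A performance model based on reuse analysis is used to better understand situations in which and how the sector cache leads to improved cache reuse and to predict cache behavior. The model predicts the number of L2 cache misses within an error of 2% without cache partitioning. With sector cache enabled, depending on the configuration, the model predicts the number of L2 cache misses within 2–3% and 4–18% for sequential and parallel SpMV with 48 threads, respectively. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.
BibTeX Entry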
@article{btf26,
  author = {Sergej Breiter and James D. Trotter and Karl Fürlinger},
  title = {Cache partitioning for sparse matrix–vector multiplication on the A64FX},
  journal = {Parallel Computing},
  volume = {127},
  pages = {103169},
  year = {2026},
  doi = {10.1016/j.parco.2025.103169},
  url = {https://www.sciencedirect.com/science/article/pii/S0167819125000456},
  pdf = {https://www.sciencedirect.com/science/article/pii/S0167819125000456/pdfft?md5=0d23ebeaae7c2a4ab65d71f51763f527&pid=1-s2.0-S0167819125000456-main.pdf},
  abstract = {One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of applying the sector cache to sparse matrix-vector multiplication (SpMV) in the Compressed Sparse Row (CSR) format using a collection of 490 sparse matrices. A performance model based on reuse analysis is used to better understand situations in which and how the sector cache leads to improved cache reuse and to predict cache behavior. The model predicts the number of L2 cache misses within an error of 2\% without cache partitioning. With sector cache enabled, depending on the configuration, the model predicts the number of L2 cache misses within 2–3\% and 4–18\% for sequential and parallel SpMV with 48 threads, respectively. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.},
  issn = {0167-8191},
  keywords = {Sparse matrix–vector multiplication, A64FX, Cache partitioning, Sector cache, Performance model},
}
-
BEAST Lab: A Practical Course on Experimental Evaluation of Diverse Modern HPC Architectures and Accelerators.
Journal of Computational Science Education, 2024(15):23-31,
3
2024.
PDF
BibTeX Entry
@article{bcef24,
  author = {Amir Raoofy and Bengisu Elis and Vincent Bode and Minh Chung and Sergej Breiter and Maron Schlemon and Dennis-Florian Herr and Karl Fuerlinger and Martin Schulz and Josef Weidendorfer},
  title = {{BEAST} {Lab:} A {Practical} {Course} on {Experimental} {Evaluation} of {Diverse} {Modern} {HPC} {Architectures} and {Accelerators}},
  journal = {Journal of Computational Science Education},
  volume = {2024},
  number = {15},
  pages = {23--31},
  year = {2024},
  pdf = {https://doi.org/10.22369/issn.2153-4136/15/1/5},
  key = {bcef24},
  month = {3},
}
Articles in conference or workshop proceedings
-
Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems.
In SCA/HPCAsia 2026: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026),
SCA/HPCAsia 2026,
2026.
Association for Computing Machinery.
doi:10.1145/3773656.3773692
Publisher's Version
PDF
BibTeX Entry
@inproceedings{bcfwk26,
  author = {Sergej Breiter and Minh Chung and Karl Fürlinger and Josef Weidendorfer and Dieter Kranzlmüller},
  title = {Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems},
  booktitle = {SCA/HPCAsia 2026: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026)},
  year = {2026},
  series = {SCA/HPCAsia 2026},
  publisher = {Association for Computing Machinery},
  isbn = {979-8-4007-2067-3/26/01},
  doi = {10.1145/3773656.3773692},
  pdf = {},
  address = {New York, NY, USA},
  location = {Osaka, Japan},
}
-
Reproducibility Report for SC25 Paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs.
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis,
SC '25,
pages 2325–2326,
2025.
Association for Computing Machinery.
doi:10.1145/3712285.3769441
Publisher's Version
PDF
Supplement
Abstract
This reproducibility report provides details about the artifact evaluation done with regard to the Artifact Description and Evaluation appendix of SC25 paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs by Yu et al. The work was done as part of the Reproducibility Initiative of SC25. The author is a member of the SC25 Reproducibility Committee.
BibTeX Entry
@inproceedings{b25,
  author = {Sergej Breiter},
  title = {Reproducibility Report for SC25 Paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  pages = {2325--2326},
  year = {2025},
  series = {SC '25},
  publisher = {Association for Computing Machinery},
  isbn = {9798400714665},
  doi = {10.1145/3712285.3769441},
  url = {https://doi.org/10.1145/3712285.3769441},
  pdf = {https://dl.acm.org/doi/pdf/10.1145/3712285.3769441},
  abstract = {This reproducibility report provides details about the artifact evaluation done with regard to the Artifact Description and Evaluation appendix of SC25 paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs by Yu et al. The work was done as part of the Reproducibility Initiative of SC25. The author is a member of the SC25 Reproducibility Committee.},
  address = {New York, NY, USA},
  keywords = {Artifact Evaluation, Reproducibility},
  location = {},
  numpages = {2},
}
-
A Profiling-Based Approach to Cache Partitioning of Program Data.
In Parallel and Distributed Computing, Applications and Technologies (PDCAT22),
pages 453-463,
4
2023.
Springer Nature Switzerland.
PDF
BibTeX Entry
@inproceedings{bcfw23,
  author = {Sergej Breiter and Josef Weidendorfer and Minh Chung and Karl Fuerlinger},
  title = {A {Profiling-Based} {Approach} to {Cache} {Partitioning} of {Program} {Data}},
  booktitle = {Parallel and Distributed Computing, Applications and Technologies (PDCAT22)},
  pages = {453--463},
  year = {2023},
  publisher = {Springer Nature Switzerland},
  pdf = {https://doi.org/10.1007/978-3-031-29927-8\_35},
  key = {bcfw23},
  month = {4},
}
-
Modelling Data Locality of Sparse Matrix-Vector Multiplication on the A64FX.
In Proceedings of the SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis,
pages 1334-1342,
11
2023.
ACM.
PDF
BibTeX Entry
@inproceedings{bft23a,
  author = {Sergej Breiter and James D. Trotter and Karl Fuerlinger},
  title = {{Modelling} {Data} {Locality} of {Sparse} {Matrix-Vector} {Multiplication} on the {A64FX}},
  booktitle = {Proceedings of the SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis},
  volume = {2023},
  pages = {1334--1342},
  year = {2023},
  publisher = {ACM},
  pdf = {https://dl.acm.org/doi/pdf/10.1145/3624062.3624198},
  key = {bft23a},
  month = {11},
}
Disclaimer:
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

