Publications of year 2026
Articles in journal or book chapters
- Sergej Breiter, James D. Trotter, and Karl Fürlinger.
Cache partitioning for sparse matrix–vector multiplication on the A64FX.
Parallel Computing, 127:103169,
2026.
doi:10.1016/j.parco.2025.103169
Publisher's Version
PDF
Supplement
Abstract
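One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of applying the sector cache to sparse matrix-vector multiplication (SpMV) in the Compressed Sparse Row (CSR) format using a collection of 490 sparse matrices. A performance model based on reuse analysis is used to better understand situations in which and how the sector cache leads to improved cache reuse and to predict cache behavior. The model predicts the number of L2 cache misses within an error of 2% without cache partitioning. With sector cache enabled, depending on the configuration, the model predicts the number of L2 cache misses within 2–3% and 4–18% for sequential and parallel SpMV with 48 threads, respectively. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.
BibTeX Entry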
@article{btf26,
  author = {Sergej Breiter and James D. Trotter and Karl Fürlinger},
  title = {Cache partitioning for sparse matrix–vector multiplication on the A64FX},
  journal = {Parallel Computing},
  volume = {127},
  pages = {103169},
  year = {2026},
  doi = {10.1016/j.parco.2025.103169},
  url = {https://www.sciencedirect.com/science/article/pii/S0167819125000456},
  pdf = {https://www.sciencedirect.com/science/article/pii/S0167819125000456/pdfft?md5=0d23ebeaae7c2a4ab65d71f51763f527&pid=1-s2.0-S0167819125000456-main.pdf},
  abstract = {One of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of applying the sector cache to sparse matrix-vector multiplication (SpMV) in the Compressed Sparse Row (CSR) format using a collection of 490 sparse matrices. A performance model based on reuse analysis is used to better understand situations in which and how the sector cache leads to improved cache reuse and to predict cache behavior. The model predicts the number of L2 cache misses within an error of 2\% without cache partitioning. With sector cache enabled, depending on the configuration, the model predicts the number of L2 cache misses within 2–3\% and 4–18\% for sequential and parallel SpMV with 48 threads, respectively. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.},
  issn = {0167-8191},
  keywords = {Sparse matrix–vector multiplication, A64FX, Cache partitioning, Sector cache, Performance model},
}
Articles in conference or workshop proceedings
- Sergej Breiter, Minh Chung, Karl Fürlinger, Josef Weidendorfer, and Dieter Kranzlmüller.
Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems.
In SCA/HPCAsia 2026: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026),
SCA/HPCAsia 2026,
2026.
Association for Computing Machinery.
doi:10.1145/3773656.3773692
Publisher's Version
PDF
BibTeX Entry
@inproceedings{bcfwk26,
  author = {Sergej Breiter and Minh Chung and Karl Fürlinger and Josef Weidendorfer and Dieter Kranzlmüller},
  title = {Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems},
  booktitle = {SCA/HPCAsia 2026: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026)},
  year = {2026},
  series = {SCA/HPCAsia 2026},
  publisher = {Association for Computing Machinery},
  isbn = {979-8-4007-2067-3},
  doi = {10.1145/3773656.3773692},
  pdf = {},
  address = {New York, NY, USA},
  location = {Osaka, Japan},
}
Disclaimer:
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

