Expansión de Ginkgo para administrar kernels reconfigurables basados en hardware
Resumen
Aunque los sistemas heterogéneos basados en aceleradores hardware son un tópico de tendencia en la comunidad HPC, explorar las ventajas y desventajas de los basados en hardware reconfigurable en librerías de álgebra lineal para sistemas de alto rendimiento, no ha sido estudiado en profundidad. Por ello, en esta investigación, nuestro objetivo es aprovechar la capacidad de reconfiguración, adaptabilidad y reducción del consumo de energía de las FPGAs para generar kernels basados en FPGAs en Ginkgo, una librería especializada de álgebra lineal de alto rendimiento para sistemas multinúcleo. Generamos 3 kernels basados en FPGA para los formatos CSR, SELLP y SELL SpMV, y obtuvimos aumentos de velocidad de al menos 10 veces respecto a los kernels basados en CPU. Además, demostramos mediante un estudio de caracterización del rendimiento que las FPGA superan a los procesadores de propósito general en términos de tiempo de cálculo.
Referencias bibliográficas
AMD. (2022a, August 4). Heterogeneous Accelerated Compute Cluster (HACC) Program. (Advanced Micro Devices, Inc) Retrieved 2023, from AMD Website: https://www.amd.com/en/corporate/university-program/aup-hacc.html
AMD. (2022b, October 7). XRT Native APIs. (Advanced Micro Devices, Inc) Retrieved 2023, from https://xilinx.github.io/XRT/master/html/xrt_native_apis.html
AMD. (2023). ROCm Software 5.3.0: HIP Documentation. (Advanced Micro Devices, Inc) Retrieved 2024, from AMD website: https://rocm.docs.amd.com/projects/HIP/en/docs-5.3.0/index.html
AMD. (2024, May 15). AMD. (Advanced Micro Devices, Inc) Retrieved 2024, from AMD Website: https://www.amd.com/en.html
Anderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J., . . . Sorensen, D. (1999). LAPACK Users' Guide (Third ed.). Philadelphia, USA: SIAM. doi:10.1137/1.9780898719604
Anzt, H., Cojean, T., Chen, Y.-C., Flegar, G., Göbel, F., Grützmacher, T., . . . Tsai, Y.-H. (2020). Ginkgo: A high performance numerical linear algebra library. Journal of Open Source Software, 5(52), 1-6, 2260. doi:10.21105/joss.02260
Anzt, H., Cojean, T., Flegar, G., Göbel, F., Grützmacher, T., Nayak, P., . . . Quintana-Ortí, E. S. (2022, March). Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing. (Z. Bai, & W. Bangerth, Eds.) ACM Transactions on Mathematical Software (TOMS), 48(1), 1-33, Article No. 2. doi:10.1145/3480935
Anzt, H., Tomov, S., & Dongarra, J. (2014, April). Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-formats on NVIDIA GPUs. Technical Report UT-EECS-14-727, University of Tennessee. Retrieved from https://icl.utk.edu/files/publications/2014/icl-utk-772-2014.pdf
Bosch, J., Tan, X., Filgueras, A., Vidal, M., Mateu, M., Jiménez-González, D., . . . Labarta, J. (2018). Application Acceleration on FPGAs with OmpSs@FPGA. In 2018 International Conference on Field-Programmable Technology (FPT), Naha, Japan, 10-14 Dec. (pp. 70-77). IEEE. doi:10.1109/FPT.2018.00021
BSC. (2016). Linear Algebra and Math Libraries. (Barcelona Supercomputing Center) Retrieved 2023, from BSC website: https://www.bsc.es/research-development/research-areas/programming-models/linear-algebra-and-math-libraries
Cppreference. (2024, October 4). RAII. Retrieved from Cppreference website: https://en.cppreference.com/w/cpp/language/raii
Davis, T. A., & Hu, Y. (2011, November). The university of Florida sparse matrix collection. ACM Transactions on Mathematical Software, 38(1), 1-25, Article 1. doi:10.1145/2049662.2049663
De Matteis, T., de Fine Licht, J., & Hoefler, T. (2020). fBLAS: streaming linear algebra on FPGA. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 9 - 19 (pp. 1-13, Article 59). Atlanta, Georgia, USA: IEEE. doi:10.5555/3433701.3433779
Dongarra, J. J., & Walker, D. W. (1995). Software Libraries for Linear Algebra Computations on High Performance Computers. SIAM Review, 37(2), 151-180. doi:10.1137/1037042
Dongarra, J., & Blackford, L. S. (1996). ScaLAPACK tutorial. In J. Waśniewski, J. Dongarra, K. Madsen, & D. Olesen, Applied Parallel Computing Industrial Computation and Optimization. Third International Workshop, PARA 1996, Lyngby, Denmark, August 18-21. Lecture Notes in Computer Science (Vol. 1184, pp. 204–215). Berlin, Heidelberg, Germany: Springer. doi:10.1007/3-540-62095-8_22
ETH Zürich. (2024). ETH Zürich. Retrieved from ETH website: https://ethz.ch/de.html
Fang, J., Mulder, Y. B., Hidders, J., Lee, J., & Hofstee, H. P. (2020, January). In-memory database acceleration on FPGAs: a survey. The VLDB Journal, 29(1), 33–59. doi:10.1007/s00778-019-00581-w
Gao, Y., & Zhang, P. (2016). A Survey of Homogeneous and Heterogeneous System Architectures in High Performance Computing. 2016 IEEE International Conference on Smart Cloud (SmartCloud), 8-20 Nov. (pp. 170-175). New York, NY, USA: IEEE. doi:10.1109/SmartCloud.2016.36
Girden, E. R. (1992). ANOVA: repeated measures. Newbury Park, CA, USA: Sage, University Paper Serires on Quantitativer Aplications in the Social Sciences, Series 07-084. doi:10.4135/9781412983419
Gonzalez, J., & Núñez, R. C. (2009, July 1). LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators. Journal of Physics: Conference Series, SciDAC 2009, 14–18 June, 180(1, 012042). doi:10.1088/1742-6596/180/1/012042
Kestur, S., Davis, J. D., & Chung, E. S. (2012). Towards a Universal FPGA Matrix-Vector Multiplication Architecture. 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines, 29 April - 1 May (pp. 9-16). Toronto, ON, Canada: IEEE. doi:10.1109/FCCM.2012.12
Khronos Group. (2024). OPenCL: Open Standard for Parallel Programming of Heterogeneous Systems. Retrieved from Khronos Group website: https://www.khronos.org/opencl/
Khronos Group. (2024). SYCL: C++ Programming for Heterogeneous Parallel Computing. Retrieved from Khronos® Group website: https://www.khronos.org/api/index_2017/sycl
Kuon, I., Tessier, R., & Rose, J. (2008). FPGA Architecture: Survey and Challenges. Foundations and Trends in Electronic Design Automation, 2(2), 135-253. doi:10.1561/1000000005
Lawson, C. L., Hanson, R. J., Kincaid, D. R., & Krogh, F. T. (1979, September). Basic Linear Algebra Subprograms for Fortran Usage. (J. R. Rice, Ed.) ACM Transactions on Mathematical Software (TOMS), 5(3), 308–323. doi:10.1145/355841.355847
NVIDIA Corporation. (2024). CUDA Toolkit. Retrieved from NVIDIA Developer website: https://developer.nvidia.com/cuda-toolkit
OpenMP. (2024). OpenMP: The OpenMP API specification for parallel programming. Retrieved from OpenMP website: https://www.openmp.org/
Podobas, A. (2014). Accelerating Parallel Computations with OpenMP-Driven System-on-Chip Generation for FPGAs. Proceedings of the 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSOC '14, September 23 - 25 (pp. 149-156). Washington, DC, USA: IEEE. doi:10.1109/MCSoC.2014.30
Sommer, L., Korinth, J., & Koch, A. (2017). OpenMP device offloading to FPGA accelerators. 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 10-12 July (pp. 201-205). Seattle, WA, USA: IEEE. doi:10.1109/ASAP.2017.7995280
Steffenel, L. A. (2019). HPC challenges for the next years: the rising of heterogeneity and its impact on simulations. CECAM Workshop: Microscopic simulations: forecasting the next two decades, April 24-26 (pp. 1-25). Toulouse, France: CECAM - Centre Européen de Calcul Atomique et Moléculaire. Retrieved from https://hal.univ-reims.fr/hal-02120029
Sun, J., Peterson, G. D., & Storaasli, O. (2007). Mapping Sparse Matrix-Vector Multiplication on FPGAs. Proceedings of the Third Annual Reconfigurable Systems Summer Institute (RSSI'07), July 17-20 (pp. 1-10). Urbana, Illinois, USA: RSSI. Retrieved from http://rssi.ncsa.illinois.edu/2007/proceedings/papers/rssi07_12_paper.pdf
Townsend, K. R. (2016). Computing SpMV on FPGAs. PhD Thesis, Iowa State University, Electrical and Computer Engineering, Ames, Iowa. doi:10.31274/etd-180810-4826
Tsoi, K. H., & Luk, W. (2010). Axel: a heterogeneous cluster with FPGAs and GPUs. Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA '10, Monterey, California, USA, February 21 - 23 (pp. 115–124). New York, NY, USA: Association for Computing Machinery. doi:10.1145/1723112.1723134
Zhang, Z., Fan, Y., Jiang, W., Han, G., Yang, C., & Jason, C. (2008). AutoPilot: A Platform-Based ESL Synthesis System. In P. Coussy, & A. Morawiec (Eds.), High-Level Synthesis: From Algorithm to Digital Circuit (pp. 99-112). Dordrecht, Netherlands: Springer. doi:10.1007/978-1-4020-8588-8_6
Zhuo, L., & Prasanna, V. K. (2005). High Performance Linear Algebra Operations on Reconfigurable Systems. In SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, 12-18 November (p. 2). Seattle, WA, USA: IEEE. doi:10.1109/SC.2005.31
Descargas
Derechos de autor 2025 Revista Colombiana de Computación

Esta obra está bajo una licencia internacional Creative Commons Atribución-NoComercial-CompartirIgual 4.0.