Extending Ginkgo to manage hardware-based reconfigurable kernels

Keywords: High-Performance Computing, Ginkgo, FPGAs, SpMV

Abstract

Although heterogeneous systems based on hardware accelerators are a trending topic in the HPC community, the advantages and disadvantages of reconfigurable-hardware accelerators in linear algebra libraries for high-performance systems have not been studied in depth. In this research, we therefore aim to exploit the reconfigurability, adaptability, and reduced energy consumption of FPGAs to generate FPGA-based kernels in Ginkgo, a specialized high-performance linear algebra library for multicore systems. We generated three FPGA-based SpMV kernels for the CSR, SELLP, and SELL formats and obtained speedups of at least 10x over the CPU-based kernels. In addition, a performance characterization study shows that FPGAs outperform general-purpose processors in terms of computation time.
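
For readers unfamiliar with the operation these kernels accelerate, the following is a minimal reference sketch of SpMV (y = A·x) over a matrix stored in CSR form, written in plain Python. It is neither the FPGA kernel from this work nor Ginkgo's implementation; the function name `csr_spmv` and the array names `row_ptrs`, `col_idxs`, and `values` are illustrative assumptions.

```python
def csr_spmv(row_ptrs, col_idxs, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch).

    row_ptrs[i]..row_ptrs[i + 1] delimits the nonzeros of row i;
    col_idxs[k] and values[k] give the column and value of nonzero k.
    All names here are hypothetical, not taken from Ginkgo.
    """
    num_rows = len(row_ptrs) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Accumulate the dot product of this sparse row with x.
        for k in range(row_ptrs[row], row_ptrs[row + 1]):
            acc += values[k] * x[col_idxs[k]]
        y[row] = acc
    return y


# Example: the 3x3 matrix
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptrs = [0, 2, 3, 5]
col_idxs = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptrs, col_idxs, values, x)
```

Note that the inner loop runs a different number of times for each row. Padded, sliced formats such as SELLP trade some extra storage for more regular row lengths, which tends to suit parallel hardware better.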

How to cite
Morales-Peña, A., & Meneses, E. (2024). Expansión de Ginkgo para administrar kernels reconfigurables basados en hardware. Revista Colombiana De Computación, 25(2), 43–58. https://doi.org/10.29375/25392115.5276

Published
2024-12-31
Section
Scientific and technological research article
