Performance optimization via register use and cache reuse on the CPU and on the GPU (using CUDA), applied to matrix multiplication with dynamically allocated matrices
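The two CPU-side ideas named above can be illustrated with a minimal sketch. It assumes square, row-major matrices of doubles; the names `matrix_alloc`, `matmul_register`, `matmul_blocked` and the tile-size parameter `bs` are hypothetical and are not taken from this repository's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an n x n row-major matrix on the heap,
   zero-initialized. Returns NULL on failure; release it with free(). */
double *matrix_alloc(size_t n) {
    return calloc(n * n, sizeof(double));
}

/* Register use: accumulate each output element in a local scalar so the
   inner loop does not store to c[i*n + j] on every iteration.
   Computes c += a * b. */
void matmul_register(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];  /* candidate for a register */
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Cache reuse: walk the matrices in bs x bs tiles so the parts of a, b
   and c touched by the inner loops can stay in cache while they are
   reused. Also keeps the scalar accumulator. Computes c += a * b. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c) {
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t jj = 0; jj < n; jj += bs) {
            size_t j_end = (jj + bs < n) ? jj + bs : n;
            for (size_t kk = 0; kk < n; kk += bs) {
                size_t k_end = (kk + bs < n) ? kk + bs : n;
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t j = jj; j < j_end; j++) {
                        double sum = c[i * n + j];
                        for (size_t k = kk; k < k_end; k++) {
                            sum += a[i * n + k] * b[k * n + j];
                        }
                        c[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

Both functions add the product into `c`, so `c` should start zeroed (as `matrix_alloc` leaves it). A reasonable tile size is one where three `bs` x `bs` tiles of doubles fit in the cache level being targeted; the best value depends on the machine and is worth measuring.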
cd cpu
make                            compiles all source files
make test                       executes the CLI
make valgrind                   runs valgrind
make clean                      removes output files

cd gpu
make compile                    compiles and links all source files
chmod +x submit.sh              makes the shell script executable
run_gpu submit.sh               uses the script to run on the GPU
vi output_filename              view performance
gcc read.c                      compiles the reader program
./a.out output_matrix.mtx       view the output matrix
make clean                      removes output files

CS 481 - High Performance Computing, Instructor: Dingwen Tao