Performance optimization of dynamically allocated matrix multiplication through register use and cache reuse, on the CPU and on the GPU (using CUDA)
cd cpu
make
compiles all source files
make test
executes the CLI
make valgrind
runs the program under Valgrind
make clean
removes output files

cd gpu
make compile
compiles and links all source files
chmod +x submit.sh
makes the shell script executable
run_gpu submit.sh
uses the script to run the job on the GPU
vi output_filename
views the performance results
gcc read.c
compiles the reader program
./a.out output_matrix.mtx
views the output matrix
make clean
removes output files

CS 481 - High Performance Computing, Instructor: Dingwen Tao