Enhancing the Matrix Transpose Operation Using Intel AVX Instruction Set Extension


General-purpose microprocessors are augmented with short-vector instruction extensions in order to simultaneously process more than one data element using the same operation. This type of parallelism is known as data-parallel processing. Many scientific, engineering, and signal processing applications can be formulated as matrix operations. Therefore, accelerating these kernel operations on microprocessors, which are the building blocks or large high-performance computing systems, will definitely boost the performance of the aforementioned applications. In this paper, we consider the acceleration of the matrix transpose operation using the 256-bit Intel Advanced Vector Extension (AVX) instructions. We present a novel vector-based matrix transpose algorithm and its optimized implementation using AVX instructions. The experimental results on Intel Core i7 processor demonstrates a 2.83 speedup over the standard sequential implementation and a maximum of 1.53 speedup over the GCC library implementation. When the transpose is combined with matrix addition to compute the matrix update, B + AT, where A and B are squared matrices, the speedup of our implementation over the sequential algorithm increased to 3.19.


Zekri A.S.

Journal/Conference Information

International Journal of Computer Science and Information Technology (IJCSIT),2014, 6(3), 67-78