Parallelization of the AES Standard Encryption Algorithm Using Multi-Core Processor
Abstract
Keeping secrets has been a human concern throughout history, and many ways of hiding confidential information have been invented. The techniques developed to keep critical data away from strangers are known as cryptography: the process of transforming data into an unreadable form and restoring it to its original form for authorized readers only. The first step is called encryption, and the second is called decryption. With industrial progress, cryptography became an important field of study, and once computers were invented it became one of the topics that most intrigued computer specialists. Algorithms to encrypt and decrypt data were created for this purpose.
In the rapidly changing world of Information Technology, users cannot ignore the size of the data they deal with. Encrypting or decrypting large data files to keep them secure can be a challenging task. The performance of the Advanced Encryption Standard (AES), the standard encryption/decryption algorithm worldwide, degrades on large data, and many attempts have been made to improve it. With the parallel processing tools now available, it is easier to search for “no-additional-cost” solutions that improve AES performance and minimize its execution time.
OpenMP threading and Single Instruction Multiple Data (SIMD) extensions, the common multicore parallel processing tools, offer a good opportunity to parallelize AES using only the resources already available inside the CPU chip. In this thesis, the Tiny 128 AES sequential implementation was adapted to read a plain text file and encrypt it, or to read an encrypted text file and decrypt it. The implementation was then parallelized using the OpenMP threading library and the Intel SIMD extensions SSE2/AVX, together with the automatic optimization techniques of the GCC compiler. The measured speedup reached 15x for encryption and 65x for decryption compared to sequential execution.
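The threading idea can be illustrated with a minimal sketch. It is not the thesis code, and it assumes that 16-byte blocks are processed independently (as in ECB mode), which the abstract does not state. The names `encrypt_block`, `encrypt_buffer_seq`, and `encrypt_buffer_omp` are hypothetical, and `encrypt_block` is only a placeholder XOR that marks where a real AES-128 block cipher would be called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16 /* AES block size in bytes */

/* Hypothetical stand-in for a real AES-128 block cipher. It only XORs the
 * block with the key; it is NOT encryption and merely marks the call site. */
static void encrypt_block(const uint8_t *in, uint8_t *out,
                          const uint8_t key[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        out[i] = in[i] ^ key[i];
}

/* Sequential baseline: blocks are processed one after another. */
void encrypt_buffer_seq(const uint8_t *in, uint8_t *out, size_t len,
                        const uint8_t key[BLOCK_SIZE])
{
    assert(len % BLOCK_SIZE == 0); /* assumes the input is already padded */
    for (size_t b = 0; b < len / BLOCK_SIZE; b++)
        encrypt_block(in + b * BLOCK_SIZE, out + b * BLOCK_SIZE, key);
}

/* OpenMP version: each block depends only on its own input and the shared
 * read-only key, so loop iterations can be split across threads. */
void encrypt_buffer_omp(const uint8_t *in, uint8_t *out, size_t len,
                        const uint8_t key[BLOCK_SIZE])
{
    assert(len % BLOCK_SIZE == 0);
    long nblocks = (long)(len / BLOCK_SIZE);

    #pragma omp parallel for schedule(static)
    for (long b = 0; b < nblocks; b++)
        encrypt_block(in + b * BLOCK_SIZE, out + b * BLOCK_SIZE, key);
}
```

With GCC, the pragma takes effect only when the file is compiled with `-fopenmp`; without that flag the loop runs sequentially and produces the same output. A static schedule is a reasonable default here because every block costs the same amount of work.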
When multicore threading was combined with SIMD extensions and GCC automatic optimization, the execution times of both the MixColumns function and the overall decryption process were better in our work than with compiler auto-vectorization alone. The MixColumns function is 15 times faster than the sequential execution and 1.5 times faster than the compiler auto-optimized version. The overall decryption algorithm is 1.12 times faster than the version optimized by the compiler.
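To show what hand-vectorizing MixColumns can look like, the following is a minimal sketch using SSE2 intrinsics, next to a scalar version for comparison. It is not the thesis implementation. It shows the forward MixColumns transform on a single 16-byte state stored column by column, and it assumes an x86 CPU with SSE2. The function names `xtime`, `xtime16`, `mix_columns_scalar`, and `mix_columns_sse2` are hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics (x86 only) */

/* Multiply one GF(2^8) element by 2 modulo the AES polynomial 0x11b. */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x >> 7) * 0x1b));
}

/* Scalar MixColumns: state[4*c + r] holds row r of column c. */
void mix_columns_scalar(uint8_t state[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *s = state + 4 * c;
        uint8_t a0 = s[0], a1 = s[1], a2 = s[2], a3 = s[3];
        uint8_t all = a0 ^ a1 ^ a2 ^ a3;
        s[0] = a0 ^ all ^ xtime(a0 ^ a1);
        s[1] = a1 ^ all ^ xtime(a1 ^ a2);
        s[2] = a2 ^ all ^ xtime(a2 ^ a3);
        s[3] = a3 ^ all ^ xtime(a3 ^ a0);
    }
}

/* Multiply 16 GF(2^8) elements by 2 at once. */
static __m128i xtime16(__m128i v)
{
    /* 0xFF in every byte whose high bit is set (negative as signed char). */
    __m128i high = _mm_cmpgt_epi8(_mm_setzero_si128(), v);
    __m128i shifted = _mm_add_epi8(v, v); /* per-byte shift left by one */
    return _mm_xor_si128(shifted, _mm_and_si128(high, _mm_set1_epi8(0x1b)));
}

/* SSE2 MixColumns: all four columns are processed in one register.
 * Rotating each 32-bit lane by 8, 16, and 24 bits lines up rows r+1, r+2,
 * and r+3 of every column with row r. */
void mix_columns_sse2(uint8_t state[16])
{
    __m128i v  = _mm_loadu_si128((const __m128i *)state);
    __m128i r1 = _mm_or_si128(_mm_srli_epi32(v, 8),  _mm_slli_epi32(v, 24));
    __m128i r2 = _mm_or_si128(_mm_srli_epi32(v, 16), _mm_slli_epi32(v, 16));
    __m128i r3 = _mm_or_si128(_mm_srli_epi32(v, 24), _mm_slli_epi32(v, 8));

    /* out_r = 2*s_r ^ 3*s_(r+1) ^ s_(r+2) ^ s_(r+3)
     *       = xtime(s_r ^ s_(r+1)) ^ s_(r+1) ^ s_(r+2) ^ s_(r+3) */
    __m128i out = _mm_xor_si128(xtime16(_mm_xor_si128(v, r1)),
                                _mm_xor_si128(r1, _mm_xor_si128(r2, r3)));
    _mm_storeu_si128((__m128i *)state, out);
}
```

Both functions give the same result for any state. The SSE2 version replaces the per-column loop with a fixed sequence of register operations, which is the kind of transformation that a compiler's auto-vectorizer may or may not find on its own.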
Student(s)
Mohammad Mahmoud Moshawrab
Supervisor(s)
Ahmed Sherif Zekri