Cryptographic applications often run more than one independent algorithm, such as encryption and authentication. This independence provides a high degree of parallelism that software can exploit and convert into instruction-level parallelism to improve overall performance on modern superscalar processors. We present fast and efficient methods of computing such pairs of functions on Intel Architecture (IA) processors using a technique called “function stitching”. Instead of computing the pair of functions sequentially, as applications and libraries do today, we replace the two function calls with a single call to a composite function that implements both algorithms. The execution time of this composite function can be made significantly shorter than the sum of the execution times of the individual functions and, in many cases, close to the execution time of the slower function.
Function stitching is best done at a very fine grain, interleaving the code for the individual algorithms at instruction-level granularity. This results in excellent utilization of the processor core's execution resources by a single thread.
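To make the idea of interleaving concrete, the following is a minimal sketch in C, not the implementation evaluated here. It assumes two toy stand-ins that are hypothetical and not cryptographically secure: `toy_encrypt` (an XOR with a simple generated key stream) and `toy_digest` (an FNV-1a style hash of the input). All function names are illustrative. The stitched version merges both loop bodies into one, so the two independent dependency chains (`ks` and `h`) are available to the core at the same time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "cipher": XOR with an LCG key stream. NOT secure. */
static void toy_encrypt(const uint8_t *in, uint8_t *out, size_t len, uint32_t key)
{
    uint32_t ks = key;
    for (size_t i = 0; i < len; i++) {
        ks = ks * 1664525u + 1013904223u;          /* key stream chain */
        out[i] = (uint8_t)(in[i] ^ (uint8_t)(ks >> 24));
    }
}

/* Hypothetical toy "authentication" pass: FNV-1a style hash. NOT a MAC. */
static uint32_t toy_digest(const uint8_t *in, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= in[i];                                /* hash chain */
        h *= 16777619u;
    }
    return h;
}

/* Sequential baseline: two separate calls, two passes over the input. */
static uint32_t encrypt_and_digest_sequential(const uint8_t *in, uint8_t *out,
                                              size_t len, uint32_t key)
{
    assert(in != NULL && out != NULL);
    toy_encrypt(in, out, len, key);
    return toy_digest(in, len);
}

/* Stitched composite: one call, with the two algorithms' instructions
 * interleaved inside a single loop. The ks chain and the h chain do not
 * depend on each other, so their operations can overlap in the pipeline. */
static uint32_t encrypt_and_digest_stitched(const uint8_t *in, uint8_t *out,
                                            size_t len, uint32_t key)
{
    assert(in != NULL && out != NULL);
    uint32_t ks = key;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        ks = ks * 1664525u + 1013904223u;          /* encrypt: step 1 */
        h ^= in[i];                                /* digest:  step 1 */
        out[i] = (uint8_t)(in[i] ^ (uint8_t)(ks >> 24)); /* encrypt: step 2 */
        h *= 16777619u;                            /* digest:  step 2 */
    }
    return h;
}
```

Both versions produce the same ciphertext and digest; only the instruction order differs. In a high-level language the compiler is free to reschedule these statements, so this sketch shows the structure of a stitched function rather than a guaranteed instruction schedule.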
We show that stitching pairs of functions together in this fine-grained manner results in excellent performance on IA processors. Compared with the best-performing sequential implementations, which are what applications use today, stitching delivers performance gains of 1.4X to 1.9X.
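As a rough sanity check on these figures, the bound below assumes that a stitched function cannot finish faster than the slower of its two components running alone. This is an idealization introduced here, not a measured result. The symbols $T_A$, $T_B$ (sequential execution times of the two functions), $T_{\text{stitched}}$, and $S$ (speedup) are notation added for illustration.

```latex
S \;=\; \frac{T_A + T_B}{T_{\text{stitched}}}
  \;\le\; \frac{T_A + T_B}{\max(T_A, T_B)}
  \;=\; 1 + \frac{\min(T_A, T_B)}{\max(T_A, T_B)}
  \;\le\; 2
```

Under that assumption the speedup approaches 2 only when the two functions take about the same time, and the reported gains of 1.4X to 1.9X sit below this ceiling.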