Benchmarks: SPIRAL Generated Programs

Historical benchmarks are available on a separate page. The benchmarks below are described in detail in the following reference:

F. Franchetti, T. M. Low, D. T. Popovici, R. M. Veras, D. G. Spampinato, J. R. Johnson, M. Püschel, J. C. Hoe, and J. M. F. Moura.
SPIRAL: Extreme Performance Portability
Proceedings of the IEEE, vol. 106, no. 11, 2018.
Special Issue on From High Level Specification to High Performance Code

Performance Results with Respect to Machine Size

FFT kernels (codelets) on a 1.1 GHz ARM Cortex-A57:

Large FFT on Intel. SPIRAL implements a scratchpad-style double-buffering scheme for better memory performance:

References

D. T. Popovici, T.-M. Low, and F. Franchetti.
Large bandwidth-efficient FFTs on multicore and multi-socket systems
In IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018.
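The scratchpad-style double buffering used for the large FFT on Intel alternates two buffers: while one block is being computed on, the next block is transferred into the other. The Python sketch below illustrates only that general ping-pong pattern and is not SPIRAL's generated code. `compute_block` and `load_block` are hypothetical stand-ins (for the per-block FFT work and the data transfer, respectively), the block size is an arbitrary assumption, and a single worker thread plays the role of the data-movement engine.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # elements per scratchpad block (arbitrary assumption)


def compute_block(buf):
    """Hypothetical per-block kernel (stand-in for FFT work): in-place 2x+1."""
    for i in range(len(buf)):
        buf[i] = 2 * buf[i] + 1


def load_block(src, buf, start):
    """Copy one block of src into a scratch buffer (the data-movement step)."""
    buf[:] = src[start:start + len(buf)]


def process_double_buffered(src, block=BLOCK):
    """Stream src through two ping-pong scratch buffers, one block at a time."""
    if len(src) % block != 0:
        raise ValueError("len(src) must be a multiple of block")
    nblocks = len(src) // block
    scratch = [[0] * block, [0] * block]  # the two scratchpad buffers
    dst = []
    cur = 0
    with ThreadPoolExecutor(max_workers=1) as mover:  # data-movement worker
        load_block(src, scratch[cur], 0)  # prologue: first block loaded up front
        for b in range(nblocks):
            nxt = 1 - cur
            pending = None
            if b + 1 < nblocks:
                # Start loading block b+1 into the idle buffer; this runs
                # concurrently with compute_block on the current buffer.
                pending = mover.submit(load_block, src, scratch[nxt], (b + 1) * block)
            compute_block(scratch[cur])
            dst.extend(scratch[cur])  # write results back to "main memory"
            if pending is not None:
                pending.result()  # next block must be in place before swapping
            cur = nxt  # swap buffer roles
    return dst


print(process_double_buffered(list(range(8))))
# prints [1, 3, 5, 7, 9, 11, 13, 15]
```

The invariant that makes the scheme safe is that the loader only ever writes the buffer the compute step is not using, and the swap waits for the pending transfer. In CPython the global interpreter lock keeps these pure-Python copies from running truly in parallel with the compute step, so the sketch shows the control structure rather than the speedup.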

FFT results on the BlueGene/P supercomputer "Intrepid" at Argonne National Laboratory; 32 racks (32R) comprise 128k cores:

References

F. Franchetti, Y. Voronenko, and G. Almasi.
Automatic generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P
In High Performance Computing for Computational Science (VECPAR), 2012.

Performance Results Across Machine Types

Batch 1D FFT on an NVIDIA GTX 480 Fermi GPU: SPIRAL vs. CUDA 4.0 cuFFT:

References

C. Angelopoulos, F. Franchetti, and M. Püschel.
Automatic Generation of the FFT Libraries for GPUs
NVIDIA Research Summit at the GPU Technology Conference, 2012.

Streaming 1D DFT₂₅₆ on a Xilinx Virtex-6 XC6VLX760 FPGA: SPIRAL vs. Xilinx LogiCORE IP library 12.0:

References

P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel.
Computer Generation of Hardware for Linear Digital Signal Processing Transforms
ACM Transactions on Design Automation of Electronic Systems, vol. 17, no. 2, 2012.

ONETEP 2×2×2 upsampling kernel with small odd-sized 3D batch FFTs on a 3.5 GHz Intel Core i7-4770K (Haswell): SPIRAL vs. FFTW and Intel MKL:

References

D. T. Popovici, F. Russell, K. Wilkinson, C.-K. Skylaris, P. H. J. Kelly, and F. Franchetti.
Generating optimized Fourier interpolation routines for density functional theory using SPIRAL
In IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2015.

Performance Based on Kernel and Application Type

Performance of SPIRAL-generated polar formatting SAR image formation on the 3.0 GHz Intel Xeon 5160, the 3.0 GHz Intel Core 2 Extreme X9650, and the 2.66 GHz Intel Core i7 920 for 16- and 100-megapixel images:

References

D. McFarlin, F. Franchetti, J. M. F. Moura, and M. Püschel.
High Performance Synthetic Aperture Radar Image Formation on Commodity Architectures
In SPIE Conference on Defense, Security, and Sensing, 2009.

Software Viterbi decoder on a 3 GHz Intel Core 2 Extreme X9650 for a range of codes: SPIRAL vs. Karn's library:

References

F. de Mesmay, S. Chellappa, F. Franchetti, and M. Püschel.
Computer Generation of Efficient Software Viterbi Decoders
In International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), ser. Lecture Notes in Computer Science, vol. 5952. Springer, 2010, pp. 353-368.

End-to-end stencil performance on a 3.4 GHz Intel Core i7-2600K for a range of stencil kernels: PLuTo/PTile [100] together with the SPIRAL backend vs. PTile plus the Intel C/C++ compiler:

References

T. Henretty, R. Veras, F. Franchetti, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan.
A stencil compiler for short-vector SIMD architectures
In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ser. ICS '13. New York, NY, USA: ACM, 2013, pp. 13-24.

Performance of (a) a rank-4 update generated with LGen vs. MKL, Intel C compiler (icc)-compiled code, and LGen with structure support disabled; (b) a single iteration of the Kalman filter generated with SLinGen vs. MKL, Eigen, and icc-compiled code. All code was tested on an Intel Core i7-2600K (Sandy Bridge microarchitecture):

(a)

(b)

References

D. G. Spampinato and M. Püschel.
A basic linear algebra compiler for structured matrices
In IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.

D. G. Spampinato, D. Fabregat-Traver, P. Bientinesi, and M. Püschel.
Program generation for small-scale linear algebra applications
In International Symposium on Code Generation and Optimization (CGO), 2018, pp. 117-127.