On this page we overview software that we developed in this project. We have three types of projects: (a) program generators for particular problem domains, (b) hand-optimized software for certain problems, and (c) tools related to performance
Open Source SPIRAL is available here under non-viral license (BSD-style license). This is an initial version, and there will be an ongoing effort to open source whole system. Please let us know which parts of SPIRAL you are most interested in. Commercial support is available via SpiralGen, Inc.
SPIRAL was developed over 20 years by the SPIRAL team under funding from DARPA (OPAL, DESA, HACMS, PERFECT, BRASS), NSF, ONR, DoD HPC, JPL, DoE, CMU SEI, Intel, Nvidia, and Merury. The open sourcing of SPIRAL is an ongoing effort. The initial open source version of SPIRAL was supported by DARPA PERFECT.
Please subscibe to spiral-info@lists.andrew.cmu.edu to stay up-to-date regarding SPIRAL updates and new releases.
In DARPA PERFECT we pursue the goal of developing technology to achieve power efficiency of 75 GFLOPS/W at 7nm for DoD-related applications by deploying algorithm/architecture co-synthesis that leverages 3D chip stacking technology.
Go to the PERFECT project page.
FFTE is a Fortran subroutine library for computing the FFT in one or more dimensions. It includes real, complex, mixed-radix, and parallel transforms. We proposed an implementation of the fast Fourier transform (FFT) targeting the ARM Scalable Vector Extension (SVE), and found that with the ARM compiler, SPIRAL-generated FFT kernels written in SVE intrinsic are up to 3.16 times faster than FFT kernels of FFTE written in Fortran and up to 5.62 times faster than SPIRAL-generated FFT kernels written in C. [From Takahashi and Franchetti, "FFTE on SVE: SPIRAL-Generated Kernels," Proceedings of HPC Asia 2020, Jan. 2020, pp. 114-122]
Go to the FFTE project page.
FFTX is the exascale follow-on to the FFTW open source discrete FFT package for executing the Fast Fourier Transform as well as higher-level operations composed of linear operations combined with DFT transforms. At the heart of FFTX is a build-time code generator, SPIRAL, that produces very high performance kernels targeted to their specific uses and platform environments.
Go to the FFTX project page.
NTTX is an FFTX-like package that leverages SPIRAL's capability of autonomous code generation and platform-based autotuning to generate high-performance code of Number Theoretic Transforms (NTTs), which are widely used to accelerate Fully Homomorphic Encryption (FHE), a post-quantum encryption scheme. Mirroring the structure of FFTW and FFTX, the NTTX package offers FFTW-style C/C++ API in line with FFTX-style code generation.
Go to the NTTX project page.
HELIX is a formally verified language and rewriting engine for generating high-performance implementations for a variety of numerical algorithms. Based on the existing SPIRAL system, HELIX adds the rigor of formal verification of its correctness using the Coq proof assistant.
Go to the HELIX project page.
In the SPIRAL quantum package, we define the primitives and directives necessary to leverage SPIRAL’s code generation framework for quantum circuit optimization. The input to our system is a circuit description written in terms of qubit transform objects. SPIRAL outputs both a simplified tensor expression representing the optimized circuit and QASM code that can be executed on a quantum computer.
Go to the Quantum Circuit Optimization page.
Since the inception of compiler research the Holy Grail has been to devise a system that provides high level abstraction (programmers express their intent as concisely as in an algorithms textbook), and an automatic system that translates these programs or specifications into executables targeting an ever-evolving landscape of platforms, extracting close-to-optimal performance on all these platforms. The SnowWhite effort works towards addressing this problem by adding a novel AI approach to compiler analysis: It introduces high level reasoning to orchestrate the complex components and enables the systems to “understand” the computation much like human experts would do.
Go to the SnowWhite System page.
In the DARPA BRASS project we propose a framework for the development
of long-lived, DARPA-relevant, complex software systems that is robust to changes in the physical and logical resources provided by their ecosystem.
Go to the BRASS project page.
High-Assurance SPIRAL is a system for the synthesis of high-assurance implementations of controllers for vehicular systems that are executed in today's and future high-performance embedded system processors.
Go to the High Assurance SPIRAL (HACMS project) page.
We are doing research on language environments that support the constructions of program generators for performance. As an example, we built a Spiral prototype n Scala using LMS (staging). Doing so, various transformations and features can be implemented elegantly.
Go to the Spiral-Scala page.
The SPIRAL Code Generator is our flagship. Completely autonomous code generation, optimization, and platform adaptation for discrete signal transforms.
Go to the code generator site.
We developed a program generator, called LGen, for basic linear algebra computations. The design is very similar to the design of Spiral for transforms (see above).
Go to LGen site.
We provide an online generator for Viterbi decoders (in software) for convolutional codes.
Go to the Viterbi decoder site.
We have implemented a measurement framework to create roofline plots with measured performance data.
Go to the roofline site.
Also note our generalization of the rofline model.
We have developed a sorting library that adapts itself to platform characteristics and to the structure or statistics to the input to achieve better performance. Go to the sorting site.
We have developed a program generator for the all-pairs shortest path problem. It generates high performance scalar and vector code tuned to a given platform.
Go to the shortest path website.
We have developed algorithms and tools that generated multiplier blocks that multiply by one or several constants "multiplierless," that is, using only additions, subtractions, and shifts.
Go to the multiplierless website.
We developed a self-adaptable, easy-to-install package for efficently computing the Walsh-Hadamard transform on single processor and SMP platforms. The design of the package is based on an FFT package by Sebastian Egner. We use this package to experiment with new optimization techniques to be incorporated into the SPIRAL system (above).
Go to the WHT site.
We have developed a flexible C implementation of the fast hierarchical backprojection algorithm to investigate performance trade-offs with respect to runtime and accuracy.
Go to the backprojection site.
We have developed a tool to obtain high resolution performance profiles for a given program. For example, the number of cache misses during the execution recorded in very small time steps.
Go to the website.
We have generated software (for the DFT and other kernels) for the Cell BE chip multiprocessor.
Go to the Cell BE site.
We have optimized the recent new sparse FFT algorithm from Hassanieh et al. at MIT.
Go to the sparse FFT site.
We have developed a generalization of the roofline model to identify performance bottlenecks.
Go to the generalized roofline page.