I kept hitting the same wall: prototype something in NumPy or PyTorch,
then rewrite it in C++ for edge deployment. The rewrite always took
longer than the original work. Eigen's fixed-size matrix API doesn't
map well to N-dimensional tensor workloads, xtensor is CPU-only with
compile-time templated types that produce unreadable errors, and neither
has GPU support on macOS. Worse, Eigen was often slower than the Python
version because
PyTorch bundles optimized BLAS while Eigen uses its own limited
implementation.
So I built Axiom to make that rewrite mechanical. The API mirrors
NumPy/PyTorch as closely as I could — same method names, broadcasting
rules, operator overloading, dynamic shapes, runtime dtypes. Code that
looks like this in PyTorch:
    scores = Q.matmul(K.transpose(-2, -1)) / math.sqrt(64)
    output = scores.softmax(-1).matmul(V)
looks like this in Axiom:
    auto scores = Q.matmul(K.transpose(-2, -1)) / std::sqrt(64.0f);
    auto output = scores.softmax(-1).matmul(V);
No mental translation. No debugging subtle API differences.
What's in the box (28k LOC):
- 100+ operations: arithmetic, reductions, activations (relu, gelu, silu,
softmax), pooling, FFT, full LAPACK linear algebra (SVD, QR, Cholesky,
eigendecomposition, solvers)
- Metal GPU via MPSGraph — all ops run on GPU, not just matmul. Compiled
graphs are cached by (shape, dtype) to avoid recompilation
- Seamless CPU ↔ GPU: `auto g = tensor.gpu();` — unified memory on Apple
Silicon avoids copies entirely
- Built-in einops: `tensor.rearrange("b h w c -> b c h w")`
- Highway SIMD across architectures (NEON, AVX2, AVX-512, SSE, WASM, RISC-V)
- Runtime dtypes via variant (readable errors, not template explosions)
- Row-major default, column-major supported via as_f_contiguous()
- Works on macOS, Linux, Windows, and WebAssembly
Performance on M4 Pro (vs Eigen with OpenBLAS, PyTorch, NumPy):
- Matmul 2048×2048: 3,196 GFLOPS (Eigen 2,911 / PyTorch 2,433)
- ReLU 4096×4096: 123 GB/s (Eigen 117 / PyTorch 70)
- FFT2 2048×2048: 14.9ms (PyTorch 27.6ms / NumPy 63.5ms)
To try it:
    git clone https://github.com/frikallo/axiom.git
    cd axiom && make release
Or add it to your CMake project via FetchContent. Example programs are in examples/.
Happy to answer questions about the internals or take feedback on the API.