Tensor programming for databases, with first class dimensions October 14, 2024
Tensor libraries like PyTorch and JAX have developed compact and accelerated APIs for manipulating n-dimensional arrays. N-dimensional arrays are kind of similar to tables in database, and this results in the logical question which is could you setup a Tensor-like API to do queries on databases that would be normally done with SQL? We have two challenges:
- Tensor computation is typically uniform and data-independent. But SQL relational queries are almost entirely about filtering and joining data in a data-dependent way.
- JOINs in SQL can be thought of as performing outer joins, which is not a very common operation in tensor computation.
However, we have a secret weapon: first class dimensions were primarily designed to as a new frontend syntax that made it easy to express einsum, batching and tensor indexing expressions. They might be good for SQL too.