API Reference
A full reference to all available SQL scalar functions, table functions, and virtual tables for sqlite-vss
.
WARNING
As a reminder, sqlite-vss
is still young, so breaking changes should be expected while sqlite-vss
is in a pre-v1 stage!
vss0
Virtual Tables
The vss0
module is used to create virtual tables that store and query your vectors. It takes inspiration from, and is very similar to the SQLite FTS5 full-text search virtual table.
Constructor Synax
create virtual table vss_xyz using vss0(
headline_embedding(384),
description_embedding(384) factory="IVF4096,Flat,IDMap2"
);
The constructor of the vss0
module takes in a list of column definitions. Currently, each column must be a vector column, where you define the dimensions of the vector as the single argument to the column name. In the above example, both the headline_embedding
and description_embedding
columns store vectors with 384 dimensions.
An optional factory=
option can be placed on individual columns. These are Faiss factory strings that give you more control over how the Faiss index is created. Consult the Faiss documentation to determine which factory makes the most sense for your use case. It's recommended that you include IDMap2
to your factory string, in order to reconstruct vectors in queries. The default factory string is "Flat,IDMap2"
, an exhaustive search index.
By contention the table name should be prefixed with vss_
. If your data exists in a "normal" table named "xyz"
, then name the vss0 table vss_xyz
.
Training
By default, the Faiss indexes storing your vectors do not require any additional training, so you can go straight to inserting data. But if you use a special factory string that requires one, like "IVF4096,Flat,IDMap2"
, then you'll have to insert training data before using your table. You can do so with the special operation='training'
constraint.
insert into vss_xyz(operation, description_embedding)
select 'training', description_embedding from xyz;
All training data is read into memory, so take care with large datasets. Not all indexes require the full dataset to train, so you can probably add a LIMIT N
clause where N
is an appropriate amount of training vectors. Note that in this example, only the description_embedding
column needs training, not the headline_embedding
column that uses the default factory.
Inserting Data
Data can be insert into vss0
virtual tables with normal INSERT INTO
operations.
insert into vss_xyz(rowid, headline_embedding, description_embedding)
select rowid, headline_embedding, description_embedding from xyz;
The vectors themselves can be any JSON or "raw bytes". The rowid is optional, but if your vss_xyz
table is linked to a xyz
table, its a good idea to use the same rowids for JOIN
s later.
JSON format
insert into vss_xyz(a_embedding)
values ('[0.1, 0.2, 0.3]');
"Raw bytes" format
insert into vss_xyz(a_embedding)
values (X'cdcccc3dcdcc4c3e9a99993e');
Transactions
In order for the data to actually insert and appear in the index, make sure to COMMIT
your inserted data. This is automatically done when using the SQLite CLI, but client libraries like Python will require explicit .commit()
calls.
Querying
vss_xyz
can be queried with SELECT
statements.
select * from vss_xyz;
In order to take advantage of the Faiss indexes for fast KNN (k nearest neighbors), use the vss_search
function.
select rowid, distance
from vss_xyz
where vss_search(
headline_embedding,
(select headline_embedding from xyz where rowid = 123)
)
limit 20
select rowid, distance
from vss_xyz
where vss_search(
headline_embedding,
vss_search_params(
(select headline_embedding from xyz where rowid = 123),
20
)
)
Here we get the 20 nearest headline embeddings to the headline_embedding value in row #123. In return we get the rowids of those similar columns, as well as the calculated distance from the query vector.
vss_search()
on SQLite 3.41 and above
Note that vss_search()
queries with limit N
only work on SQLite version 3.41 and above, due to a bug in previous versions.
On lower versions, use vss_search_params()
instead:
select rowid, distance
from vss_xyz
where vss_search(
headline_embedding,
vss_search_params(
(select headline_embedding from xyz where rowid = 123),
20
)
)
This is equivalent to the query above, just a little more verbose.
Deleting data
DELETE
operations are supported.
delete from vss_xyz where rowid between 100 and 200;
Keep in mind, small DELETE
operations are ineffiecient, so batch your inserts/deletes and wrap INSERT
s/DELETE
s in transactions whenever possible.
Shadow Table Schema
You shouldn't need to directly access the shadow tables for vss0
virtual tables, but here's the format for them. Subject to change, do not rely on this, will break in the future.
For a vss0
virtual table called xyz
, the follow shadow tables will exist:
xyz_data
- One row per "item" in the virtual table. Used to delegate and track rowid usage in the virtual table.x
is a no-op column.create table xyz_data(x);
xyz_index
- One row per column index. Stores the raw serialized Faiss index in one big BLOB.create table xyz_index(idx);
sqlite-vss
Functions
vss_version()
Returns the version string of the sqlite-vss
library.
select vss_version();
"0.1.2"
vss_debug()
Returns a debug string of various info about sqlite-vss
, including the version string, build date, and commit hash.
select vss_debug();
vss_search()
vss_search()
is a special function meant to be used in the WHERE
clause while querying a vss0
virtual table. It allows you to perform "K nearest neighbors" style searches on a vector table.
create virtual table vss_foo using vss0( bar(4) );
select rowid, distance
from vss_foo
where vss_search(bar, json('[0.1, 0.2, 0.3, 0.4]'))
limit 20;
select rowid, distance
from foo
where vss_search(bar, vss_search_params(json(''), 20));
vss_search_params()
select vss_search_params(); --
vss_range_search()
create virtual table foo using vss0( bar(4) );
select rowid, distance
from foo
where vss_range_search(
bar,
vss_range_search_params(json(''), .5)
);
vss_range_search_params()
select vss_range_search_params(); --
vss_distance_l1()
Returns the L1 distance between two vectors a
and b
. The two arguments must be vectors of the same length. Uses fvec_L1()
select vss_distance_l1(
json('[0, 0]'),
json('[5, 1]')
);
6.0
select vss_distance_l1(
json('[2, 3]'),
json('[5, 1]')
);
5.0
vss_distance_l2()
Returns the L2 distance between two vectors a
and b
. The two arguments must be vectors of the same length. Uses fvec_L2sqr()
select vss_distance_l2(
json('[0, 0]'),
json('[5, 1]')
);
25.999999999999999999
select vss_distance_l2(
json('[2, 3]'),
json('[5, 1]')
);
12.999999999999999999
vss_distance_linf()
Returns the infinity distance between two vectors a
and b
. The two arguments must be vectors of the same length. Uses fvec_Linf()
select vss_distance_linf(
json('[0, 0]'),
json('[5, 1]')
);
5.0
select vss_distance_linf(
json('[2, 3]'),
json('[5, 1]')
);
3.0
vss_inner_product()
Returns the inner product result between two vectors a
and b
. The two arguments must be vectors of the same length. Uses fvec_inner_product()
select vss_inner_product(
json('[0, 0]'),
json('[5, 1]')
);
0.0
select vss_inner_product(
json('[2, 3]'),
json('[5, 1]')
);
12.999999999999999999
vss_fvec_add()
Scalar addition on two vectors a
and b
. The two arguments must be vectors of the same length. Uses fvec_add()
select vector_to_json(
vss_fvec_add(
json('[0, 0]'),
json('[1, 2]')
)
);
'[1.0,2.0]'
select vector_to_json(
vss_fvec_add(
json('[2, 5]'),
json('[-3, 4]')
)
);
'[-1.0,9.0]'
vss_fvec_sub()
Scalar subtraction on two vectors a
and b
. The two arguments must be vectors of the same length. Uses fvec_sub()
select vector_to_json(
vss_fvec_sub(
json('[0, 0]'),
json('[1, 2]')
)
);
'[-1.0,-2.0]'
select vector_to_json(
vss_fvec_sub(
json('[2, 5]'),
json('[-3, 4]')
)
);
'[5.0,1.0]'
sqlite-vector
Functions
sqlite-vector
is much more unstable than sqlite-vss
, so expect many breaking changes to these functions.
vector_version()
Returns a version string of the sqlite-vector
library.
select vector_version();
vector_debug()
Returns a debug information of the sqlite-vector
library.
select vector_debug();
vector0()
Returns a pointer containing a struct of low-level C API functions and utilities to work with vectors. Uses SQLite's pointer passing interface. You likely do not need to use this function.
select vector0();
NULL
vector_length()
Returns the length of a vector. TODO fix.
select vector_length(json('[0.1, 0.2, 0.3]'));
vector_value_at()
Returns the float value
select vector_value_at();
vector_from_blob()
select vector_from_blob();
vector_from_json()
select vector_from_json();
vector_from_raw()
select vector_from_raw();
vector_to_blob()
select vector_to_blob();
vector_to_json()
select vector_to_json();
vector_to_raw()
select vector_to_raw();