Vector search in 7 different programming languages using SQL
2024-05-23 by Alex Garcia
As part of the sqlite-vec
project, I
provide multiple "bindings" to various programming languages and runtimes,
including:
- Python, as pip packages on PyPi
- Ruby, as gems on rubygems
- Node.js (and consequently Bun and Deno), as npm packages on npm
- Client-side JavaScript/WASM, as a "demo" npm package
- Go, as a Go module
- Rust, as a cargo crate on crates.io
- C: An "amalgamation" of a single
sqlite-vec.c
andsqlite-vec.h
file.
This article won't be about how I created the bindings — for that, see
this blog post for Python,
this one for Node.js,
and
this one for Ruby.
Instead, I want to compare all these languages together, in the specific context
of vector search with sqlite-vec
.
Here's the goal: For every programming language and runtime listed above, we are going to write a single-file script that does the following:
- Loads the
sqlite-vec
SQLite extension - Prints the SQLite version and
sqlite-vec
version, using thesqlite_version()
andvec_version()
SQL functions. - Create a
vec0
virtual table and populate it with 4-dimensional vectors. - Query the
vec0
table to find the 3 closest vectors to a query vector (KNN-style query).
For the 4-dimensional vectors, we will use this "database" of five vectors:
[0.1, 0.1, 0.1, 0.1]
[0.2, 0.2, 0.2, 0.2]
[0.3, 0.3, 0.3, 0.3]
[0.4, 0.4, 0.4, 0.4]
[0.5, 0.5, 0.5, 0.5]
With id
values of 1
, 2
, 3
, 4
, and 5
, respectively. The query vector
is:
[0.3, 0.3, 0.3, 0.3]
From eye-balling it, we can see the closest vector will always be #3 with a
distance of 0
, since both vectors have equal values. The next two closest
vectors will be #2 and #4.
Let's get into the code! For reference, these examples can be found
in the sqlite-vec
examples directory.
Feel free to skip to a specific language, or head to the bottom for some
high-level notes:
¶ Baseline: pure SQL
As a baseline, here's this logic in pure SQL:
.load ../../dist/vec0
.mode table
.header on
select sqlite_version(), vec_version();
CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4]);
INSERT INTO vec_items(rowid, embedding)
select
value ->> 0,
value ->> 1
from json_each('[
[1, [0.1, 0.1, 0.1, 0.1]],
[2, [0.2, 0.2, 0.2, 0.2]],
[3, [0.3, 0.3, 0.3, 0.3]],
[4, [0.4, 0.4, 0.4, 0.4]],
[5, [0.5, 0.5, 0.5, 0.5]]
]');
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH '[0.3, 0.3, 0.3, 0.3]'
ORDER BY distance
LIMIT 3;
And the output, using the SQLite CLI (v3.45.1
):
$ sqlite3 < demo.sql
+------------------+----------------+
| sqlite_version() | vec_version() |
+------------------+----------------+
| 3.45.1 | v0.0.1-alpha.9 |
+------------------+----------------+
+-------+-------------------+
| rowid | distance |
+-------+-------------------+
| 3 | 0.0 |
| 4 | 0.199999988079071 |
| 2 | 0.200000017881393 |
+-------+-------------------+
Fairly straightforward! There's little boilerplate, only a few dot-commands at the top to load the extension and configure the output.
One interesting note: SQLite doesn't have "vectors" or "lists" as a native
datatype. However, we can use SQLite's builtin JSON functions like
json_each()
and
->>
to help when bulk-inserting
vectors. sqlite-vec
will automatically recognize and parse vectors provided as
JSON, so we're good to go!
¶ Python
Here's the Python implementation of the "simple" demo, using
Python's builtin sqlite3
library
and the sqlite-vec
PyPi package:
Show Python code
import sqlite3
import sqlite_vec
from typing import List
import struct
def serialize_f32(vector: List[float]) -> bytes:
"""serializes a list of floats into a compact "raw bytes" format"""
return struct.pack("%sf" % len(vector), *vector)
db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
sqlite_version, vec_version = db.execute(
"select sqlite_version(), vec_version()"
).fetchone()
print(f"sqlite_version={sqlite_version}, vec_version={vec_version}")
items = [
(1, [0.1, 0.1, 0.1, 0.1]),
(2, [0.2, 0.2, 0.2, 0.2]),
(3, [0.3, 0.3, 0.3, 0.3]),
(4, [0.4, 0.4, 0.4, 0.4]),
(5, [0.5, 0.5, 0.5, 0.5]),
]
query = [0.3, 0.3, 0.3, 0.3]
db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")
with db:
for item in items:
db.execute(
"INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
[item[0], serialize_f32(item[1])],
)
rows = db.execute(
"""
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 3
""",
[serialize_f32(query)],
).fetchall()
print(rows)
And the output:
$ python demo.py
sqlite_version=3.45.1, vec_version=v0.0.1-alpha.9
[(3, 0.0), (4, 0.19999998807907104), (2, 0.20000001788139343)]
Having SQLite in the standard library for your programming language is great!
Nothing extra to install, just works out of the box. However, sometimes your
version of SQLite can "lag", especially if you haven't upgraded in a bit. And
since sqlite-vec
relies on "new" SQLite features for performance/developer
experience upgrades, it can get a little annoying.
We can insert vectors as JSON strings or in a compact binary format. In general,
the compact binary format is faster to generate. We do have a build a helper
function, serialize_f32()
, which is a bit cumbersome, but it's only a single
line.
¶ Ruby
Here's the Ruby implementation of the "simple" demo, using
the sqlite3
gem and the
sqlite-vec
gem:
Show Ruby code
require 'sqlite3'
require 'sqlite_vec'
db = SQLite3::Database.new(':memory:')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)
sqlite_version, vec_version = db.execute("select sqlite_version(), vec_version()").first
puts "sqlite_version=#{sqlite_version}, vec_version=#{vec_version}"
db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")
items = [
[1, [0.1, 0.1, 0.1, 0.1]],
[2, [0.2, 0.2, 0.2, 0.2]],
[3, [0.3, 0.3, 0.3, 0.3]],
[4, [0.4, 0.4, 0.4, 0.4]],
[5, [0.5, 0.5, 0.5, 0.5]],
]
db.transaction do
items.each do |item|
db.execute("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)", [item[0], item[1].pack("f*")])
end
end
query = [0.3, 0.3, 0.3, 0.3]
rows = db.execute(<<-SQL, [query.pack("f*")])
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 3
SQL
puts rows
And the output:
sqlite_version=3.45.3, vec_version=v0.0.1-alpha.3
3
0.0
4
0.19999998807907104
2
0.20000001788139343
In Ruby we can use .pack("f*")
to convert an array of floats into a vector
BLOB. That's nice! We can also use a heredoc for the SQL (<<-SQL
) which gave
me SQL syntax highlighting in my editor, which is neat.
¶ Node.js
Here's the Node.js implementation of the "simple" demo, using
better-sqlite3
npm package and
the sqlite-vec
npm package:
Show Node.js code
import * as sqliteVec from "sqlite-vec";
import Database from "better-sqlite3";
const db = new Database(":memory:");
sqliteVec.load(db);
const { sqlite_version, vec_version } = db
.prepare(
"select sqlite_version() as sqlite_version, vec_version() as vec_version;"
)
.get();
console.log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);
const items = [
[1, [0.1, 0.1, 0.1, 0.1]],
[2, [0.2, 0.2, 0.2, 0.2]],
[3, [0.3, 0.3, 0.3, 0.3]],
[4, [0.4, 0.4, 0.4, 0.4]],
[5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];
db.exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])");
const insertStmt = db.prepare(
"INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)"
);
const insertVectors = db.transaction((items) => {
for (const [id, vector] of items) {
insertStmt.run(BigInt(id), new Float32Array(vector));
}
});
insertVectors(items);
const rows = db
.prepare(
`
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 3
`
)
.all(new Float32Array(query));
console.log(rows);
And the output:
$ node demo.mjs
sqlite_version=3.45.3, vec_version=v0.0.1-alpha.8
[
{ rowid: 3, distance: 0 },
{ rowid: 4, distance: 0.19999998807907104 },
{ rowid: 2, distance: 0.20000001788139343 }
]
I'm a big fan of the better-sqlite3
package! The API make a lot of sense and
is easy to work with. For coverting an array of numbers to a vector BLOB, all we
need to do is wrap it with
new Float32Array()
and better-sqlite3
will bind is as a BLOB. Pretty neat!
¶ Bun
Here's the Bun implementation of the "simple" demo, using
Buns's builtin bun:sqlite
library and the
sqlite-vec
npm package:
Show Bun code
import { Database } from "bun:sqlite";
Database.setCustomSQLite("/usr/local/opt/sqlite3/lib/libsqlite3.dylib");
const db = new Database(":memory:");
//sqliteVec.load(db);
db.loadExtension("../../dist/vec0");
const { sqlite_version, vec_version } = db
.prepare(
"select sqlite_version() as sqlite_version, vec_version() as vec_version;"
)
.get();
console.log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);
const items = [
[1, [0.1, 0.1, 0.1, 0.1]],
[2, [0.2, 0.2, 0.2, 0.2]],
[3, [0.3, 0.3, 0.3, 0.3]],
[4, [0.4, 0.4, 0.4, 0.4]],
[5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];
db.exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])");
const insertStmt = db.prepare(
"INSERT INTO vec_items(rowid, embedding) VALUES (?, vec_f32(?))"
);
const insertVectors = db.transaction((items) => {
for (const [id, vector] of items) {
insertStmt.run(BigInt(id), new Float32Array(vector));
}
});
insertVectors(items);
const rows = db
.prepare(
`
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 3
`
)
.all(new Float32Array(query));
console.log(rows);
And the output:
$ bun demo.ts
sqlite_version=3.45.1, vec_version=v0.0.1-alpha.9
[
{
rowid: 3,
distance: 0,
}, {
rowid: 4,
distance: 0.19999998807907104,
}, {
rowid: 2,
distance: 0.20000001788139343,
}
]
Very similar to the Node.js demo! The only real difference is that the builtin
bun:sqlite
package is used instead of better-sqlite3
, which already has a
similar API to better-sqlite3
, so little code changes are needed. However,
bun:sqlite
will by default use the system-level SQLite library, with on MacOS
doesn't allow extensions. So I needed to use Database.setCustomSQLite()
to
point to the brew version of SQLite instead.
Although technically, Bun is compatible with Node.js, so theoretically we could also run the Node.js version with Bun without fuss. Though when I try on my x86 Mac, it fails for me:
$ bun ../simple-node/demo.mjs
dyld[11775]: missing symbol called
Killed: 9
¶ Deno (almost!)
Here's the Deno implementation of the "simple" demo, using
Deno's jsr:@db/sqlite
library and
the sqlite-vec
npm package:
Show Deno code
import { Database } from "jsr:@db/sqlite@0.11";
import * as sqliteVec from "npm:sqlite-vec@0.0.1-alpha.9";
const db = new Database(":memory:");
db.enableLoadExtension = true;
sqliteVec.load(db);
db.enableLoadExtension = false;
const [sqlite_version, vec_version] = db
.prepare("select sqlite_version(), vec_version()")
.value<[string, string]>()!;
console.log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);
const items = [
[1, [0.1, 0.1, 0.1, 0.1]],
[2, [0.2, 0.2, 0.2, 0.2]],
[3, [0.3, 0.3, 0.3, 0.3]],
[4, [0.4, 0.4, 0.4, 0.4]],
[5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];
db.exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])");
const insertStmt = db.prepare(
"INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)"
);
const insertVectors = db.transaction((items) => {
for (const [id, vector] of items) {
insertStmt.run(BigInt(id), new Uint8Array(new Float32Array(vector).buffer));
}
});
insertVectors(items);
const rows = db
.prepare(
`
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 5
`
)
.all([new Uint8Array(new Float32Array(query).buffer)]);
console.log(rows);
db.close();
And the output:
$ deno run -A --unstable-ffi demo.ts
error: Error getting response at https://registry.npmjs.org/sqlite-vec for package "sqlite-vec": invalid type: null, expected a sequence at line 1 column 916
at file:///Users/.../demo.ts:2:28
But if fails! This is because
of an obscure bug that should
fix on the next version of Deno. Long story short, I published an old version of
the sqlite-vec
NPM package with an error on the package.json
file, which
broke Deno/npm compatability. Thankfully, the Deno team put out a fix quickly,
so hopefully we don't have to wait long for Deno 1.44
!
Nonetheless, I really like Deno's SQLite client library, specifically the
jsr:@db/sqlite
package. Deno's FFI support is a little awkward, but it's
really fast and has an even nicer API than better-sqlite3
, on my opinion.
¶ Client-side JavaScript
sqlite-vec
can be compiled to WASM to be used in the browser, using the
official SQLite WASM build. There
is a sqlite-vec-wasm-demo
NPM package that contains a pre-comiled WASM build
of the SQLite library with sqlite-vec
statically compiled in, that can be used
like so:
Show WASM code
<html>
<body>
<h1>sqlite-vec demo/simple-wasm</h1>
<div id="target"></div>
<script type="module">
import {default as init} from "https://cdn.jsdelivr.net/npm/sqlite-vec-wasm-demo@latest/sqlite3.mjs";
function log(msg) {
const pre = document.querySelector('#target').appendChild(document.createElement("pre"));
pre.textContent = msg;
}
const sqlite3 = await init();
const db = new sqlite3.oo1.DB(":memory:");
const [sqlite_version, vec_version] = db.selectArray('select sqlite_version(), vec_version();')
log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);
const items = [
[1, [0.1, 0.1, 0.1, 0.1]],
[2, [0.2, 0.2, 0.2, 0.2]],
[3, [0.3, 0.3, 0.3, 0.3]],
[4, [0.4, 0.4, 0.4, 0.4]],
[5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];
db.exec('CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4]);');
const stmt = db.prepare("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)");
for(const item of items) {
stmt
.bind(1, item[0])
.bind(2, new Float32Array(item[1]).buffer)
.stepReset();
}
stmt.finalize();
const rows = db
.selectArrays(
`
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 3
`, new Float32Array(query).buffer
);
log(JSON.stringify(rows));
</script>
</body>
</html>
`
If you open that in a web browser, you will see:
sqlite_version=3.45.3, vec_version=v0.0.1-alpha.9
[[3,0],[4,0.19999998807907104],[2,0.20000001788139343]]
The JavaScript bindings are from the SQLite team, which are a bit awkward, like
new sqlite3.oo1.DB(":memory:")
. But just like the other JavaScript runtimes,
we can use new Float32Array(vector).buffer
to bind vectors as BLOBs rather
easily!
¶ Go
Here's the Go implementation of the "simple" demo, using the
mattn/go-sqlite3
CGO SQLite module and
the cgo/sqlite-vec
Go module:
Show Go code
package main
import (
"bytes"
"database/sql"
"encoding/binary"
"fmt"
"log"
sqlite_vec "github.com/asg017/sqlite-vec/bindings/go/cgo"
_ "github.com/mattn/go-sqlite3"
)
// #cgo LDFLAGS: -L../../dist
import "C"
func serializeFloat32(vector []float32) ([]byte, error) {
buf := new(bytes.Buffer)
err := binary.Write(buf, binary.LittleEndian, vector)
if err != nil {
return nil, err
}
return buf.Bytes(), nil
}
func main() {
sqlite_vec.Auto()
db, err := sql.Open("sqlite3", ":memory:")
if err != nil {
log.Fatal(err)
}
defer db.Close()
var sqliteVersion string
var vecVersion string
err = db.QueryRow("select sqlite_version(), vec_version()").Scan(&sqliteVersion, &vecVersion)
if err != nil {
log.Fatal(err)
}
fmt.Printf("sqlite_version=%s, vec_version=%s\n", sqliteVersion, vecVersion)
_, err = db.Exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")
if err != nil {
log.Fatal(err)
}
items := map[int][]float32{
1: {0.1, 0.1, 0.1, 0.1},
2: {0.2, 0.2, 0.2, 0.2},
3: {0.3, 0.3, 0.3, 0.3},
4: {0.4, 0.4, 0.4, 0.4},
5: {0.5, 0.5, 0.5, 0.5},
}
q := []float32{0.3, 0.3, 0.3, 0.3}
for id, values := range items {
v, err := serializeFloat32(values)
if err != nil {
log.Fatal(err)
}
_, err = db.Exec("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)", id, v)
if err != nil {
log.Fatal(err)
}
}
query, err := serializeFloat32(q)
if err != nil {
log.Fatal(err)
}
rows, err := db.Query(`
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 3
`, query)
if err != nil {
log.Fatal(err)
}
for rows.Next() {
var rowid int64
var distance float64
err = rows.Scan(&rowid, &distance)
if err != nil {
log.Fatal(err)
}
fmt.Printf("rowid=%d, distance=%f\n", rowid, distance)
}
err = rows.Err()
if err != nil {
log.Fatal((err))
}
}
And the output:
sqlite_version=3.45.1, vec_version=v0.0.1-alpha.3
rowid=3, distance=0.000000
rowid=4, distance=0.200000
rowid=2, distance=0.200000
I do love go, especially how easy it is to type SQL returned results. But we do
have to write our own serializeFloat32
function to convert a slice of floats
into a vector BLOB, which is annoying. And if I have to write if err != nil
one more time...
¶ Rust
Here's the Rust implementation of the "simple" demo, using the
rusqlite crate and the sqlite-vec
crate:
Show Rust code
use rusqlite::{ffi::sqlite3_auto_extension, Connection, Result};
use sqlite_vec::sqlite3_vec_init;
use zerocopy::AsBytes;
fn main() -> Result<()> {
unsafe {
sqlite3_auto_extension(Some(std::mem::transmute(sqlite3_vec_init as *const ())));
}
let db = Connection::open_in_memory()?;
let v: Vec<f32> = vec![0.1, 0.2, 0.3];
let (sqlite_version, vec_version, x): (String, String, String) = db.query_row(
"select sqlite_version(), vec_version(), vec_to_json(?)",
&[v.as_bytes()],
|x| Ok((x.get(0)?, x.get(1)?, x.get(2)?)),
)?;
println!("sqlite_version={sqlite_version}, vec_version={vec_version}");
let items: Vec<(usize, Vec<f32>)> = vec![
(1, vec![0.1, 0.1, 0.1, 0.1]),
(2, vec![0.2, 0.2, 0.2, 0.2]),
(3, vec![0.3, 0.3, 0.3, 0.3]),
(4, vec![0.4, 0.4, 0.4, 0.4]),
(5, vec![0.5, 0.5, 0.5, 0.5]),
];
println!("{x}");
db.execute(
"CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])",
[],
)?;
let mut stmt = db.prepare("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)")?;
for item in items {
stmt.execute(rusqlite::params![item.0, item.1.as_bytes()])?;
}
let query: Vec<f32> = vec![0.3, 0.3, 0.3, 0.3];
let result: Vec<(i64, f64)> = db
.prepare(
r"
SELECT
rowid,
distance
FROM vec_items
WHERE embedding MATCH ?1
ORDER BY distance
LIMIT 3
",
)?
.query_map([query.as_bytes()], |r| Ok((r.get(0)?, r.get(1)?)))?
.collect::<Result<Vec<_>, _>>()?;
println!("{:?}", result);
Ok(())
}
And the output:
$ cargo run demo.rs
Compiling sqlite-vec-demo v0.0.0 (/Users/.../simple-rust)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.85s
Running `target/debug/demo demo.rs`
sqlite_version=3.45.0, vec_version=v0.0.1-alpha.7
[0.100000,0.200000,0.300000]
[(3, 0.0), (4, 0.19999998807907104), (2, 0.20000001788139343)]
If you're okay adding the
zerocopy
crate, then coverting a
Vec<f32>
into a vector BLOB is as easy as a .as_bytes()
call! The syntax can
get a little gnarly, especially around .query_map().collect()
, but man once
you get it to work, it's extremely productive!
¶ C
Finally, here's the C implementation of the "simple" demo, using the raw SQLite
C API, and the sqlite-vec
amalgamation!
Show C code
#include "sqlite3.h"
#include "sqlite-vec.h"
#include <stdio.h>
#include <unistd.h>
#include <assert.h>
int main(int argc, char *argv[]) {
int rc = SQLITE_OK;
sqlite3 *db;
sqlite3_stmt *stmt;
rc = sqlite3_auto_extension((void (*)())sqlite3_vec_init);
assert(rc == SQLITE_OK);
rc = sqlite3_open(":memory:", &db);
assert(rc == SQLITE_OK);
rc = sqlite3_prepare_v2(db, "SELECT sqlite_version(), vec_version()", -1, &stmt, NULL);
assert(rc == SQLITE_OK);
rc = sqlite3_step(stmt);
printf("sqlite_version=%s, vec_version=%s\n", sqlite3_column_text(stmt, 0), sqlite3_column_text(stmt, 1));
sqlite3_finalize(stmt);
static const struct {
sqlite3_int64 id;
float vector[4];
} items[] = {
{1, {0.1, 0.1, 0.1, 0.1}},
{2, {0.2, 0.2, 0.2, 0.2}},
{3, {0.3, 0.3, 0.3, 0.3}},
{4, {0.4, 0.4, 0.4, 0.4}},
{5, {0.5, 0.5, 0.5, 0.5}},
};
float query[4] = {0.3, 0.3, 0.3, 0.3};
rc = sqlite3_prepare_v2(db, "CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])", -1, &stmt, NULL);
assert(rc == SQLITE_OK);
rc = sqlite3_step(stmt);
assert(rc == SQLITE_DONE);
sqlite3_finalize(stmt);
rc = sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
assert(rc == SQLITE_OK);
rc = sqlite3_prepare_v2(db, "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)", -1, &stmt, NULL);
assert(rc == SQLITE_OK);
for (unsigned long i = 0; i < sizeof(items) / sizeof(items[0]); i++) {
sqlite3_bind_int64(stmt, 1, items[i].id);
sqlite3_bind_blob(stmt, 2, items[i].vector, sizeof(items[i].vector), SQLITE_STATIC);
rc = sqlite3_step(stmt);
assert(rc == SQLITE_DONE);
sqlite3_reset(stmt);
}
sqlite3_finalize(stmt);
rc = sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
assert(rc == SQLITE_OK);
rc = sqlite3_prepare_v2(db,
"SELECT "
" rowid, "
" distance "
"FROM vec_items "
"WHERE embedding MATCH ?1 "
"ORDER BY distance "
"LIMIT 3 "
, -1, &stmt, NULL);
assert(rc == SQLITE_OK);
sqlite3_bind_blob(stmt, 1, query, sizeof(query), SQLITE_STATIC);
while(1) {
rc = sqlite3_step(stmt);
if(rc == SQLITE_DONE) break;
assert(rc==SQLITE_ROW);
sqlite3_int64 rowid = sqlite3_column_int64(stmt, 0);
double distance = sqlite3_column_double(stmt, 1);
printf("rowid=%lld distance=%f\n", rowid, distance);
}
sqlite3_finalize(stmt);
sqlite3_close(db);
return 0;
}
And the output:
$ make demo
gcc \
-DSQLITE_CORE \
-I../../ -I../../vendor \
demo.c ../../sqlite-vec.c ../../vendor/sqlite3.c \
-o demo
$ ./demo
sqlite_version=3.45.3, vec_version=v0.0.1-alpha.9
rowid=3 distance=0.000000
rowid=4 distance=0.200000
rowid=2 distance=0.200000
C is not a fun language is write and can be super unsafe. But man, it feels
really nice to have a single 1.5MB
executable with everything you need...
¶ Final thoughts
¶ SQL is the standard API across all languages
No matter the language, every single demo above ran basically the same SQL:
select sqlite_version(), vec_version();
create virtual table vec_items using vec0(embedding float[4]);
insert into vec_items(rowid, embedding)
values (?, ?);
select
rowid,
distance
from vec_items
where embedding MATCH ?
order by distance
limit 3;
The only real differences were superficial: how the SQLite client library works for that language, how to convert a list of numbers to the BLOB format, and how to handle transactions.
I think this is the real super-power of sqlite-vec
. It's rare to have a vector
search library work in so many different languages and runtimes. But it's even
more rare, perhaps unique, to have an API that's more-or-less the same across
different programming languages. That means you can create and "train" an index
in Python, serve it to users in the browser in JavaScript, in an Express server
with Node.js, or in a CLI with Rust, all without much fuss.
¶ Creating these bindings was worth it
It took me a loooooong time to figure out a good "distribution" strategy for SQLite extensions. When I first started creating SQLite extensions, I followed the pattern established by sqlean, which was to upload pre-compiled loadable extensions to Github Releases for popular platforms (MacOS, Linux, Windows, etc.).
This is great, but downloading links yourself can be quite a chore. If you want
to use sqlite-vec
in your Python package, you don't want to download and
extract a tar.gz
file: you just want to pip install
something and go along
your merry way.
So over the last few years, I've developed a few techniques to creating
"binding" packages for these SQLite extensions on popular package managers:
pypi
, npm
, rubygems
, crates.io
, and Golang modules. At this point, I've
spent more time on these distributions than on SQLite extensions themselves, but
it was worth it!
I haven't blogged about how the Rust/Go bindings work quite yet, but I hope to
soon! I also have a work-in-progress project called
sqlite-dist
that automates a lot of
these bindings, which I also plan to blog about in the near future.
¶ Is it a good idea to have a SQLite client built into your language/runtime?
Of these examples, only Python and Bun have a builtin client library for SQLite. For all other languages, it's a 3rd party package that you must install and update yourself.
One one hand, this is great! No need to install another dependency, just write and use.
On the other hand, it makes working with modern SQLite features a bit harder.
For Python, the version of SQLite that's "bundled-in" depends on which Python your using. If you're using the default MacOS Python, that uses the default MacOS SQLite library, which is outdated and blocks SQLite extensions. If you're using brew, then it might use whatever SQLite version brew has installed. If you're using Docker, I have no idea.
For Bun, there's even
a section in their bun:sqlite
docs
about this confusion. For MacOS users, the default SQLite library that Bun uses
is the outdated default MacOS SQLite build. So Bun added a
Database.setCustomSQLite("/path/to/libsqlite.dylib")
method as a workaround.
Which does work, but is an extra step you have to perform at runtime. And if you
want to run your code on other operating systems, you'll need to add a few if
statements to avoid accidentally loading in a SQLite build that doesn't exist. A
small headache!
Now compare this to all the other libraries where SQLite is offered as a third party package: Node.js, Deno, Ruby, Go, and Rust. To update to a latest version of SQLite, you usually just need to update the package with whatever package manager your language/runtime offers. Sometimes there's a delay when a new SQLite version is released and package maintainers release a new version, which is something to consider. But it's way less likely that you'll get "surprised" by code that works on one machine and not the other, and you can handle updates yourself.