Alex Garcia's Blog

Vector search in 7 different programming languages using SQL

2024-05-23 by Alex Garcia

As part of the sqlite-vec project, I provide multiple "bindings" to various programming languages and runtimes, including:

This article won't be about how I created the bindings — for that, see this blog post for Python, this one for Node.js, and this one for Ruby. Instead, I want to compare all these languages together, in the specific context of vector search with sqlite-vec.

Here's the goal: For every programming language and runtime listed above, we are going to write a single-file script that does the following:

  1. Loads the sqlite-vec SQLite extension
  2. Prints the SQLite version and sqlite-vec version, using the sqlite_version() and vec_version() SQL functions.
  3. Create a vec0 virtual table and populate it with 4-dimensional vectors.
  4. Query the vec0 table to find the 3 closest vectors to a query vector (KNN-style query).

For the 4-dimensional vectors, we will use this "database" of five vectors:

[0.1, 0.1, 0.1, 0.1]
[0.2, 0.2, 0.2, 0.2]
[0.3, 0.3, 0.3, 0.3]
[0.4, 0.4, 0.4, 0.4]
[0.5, 0.5, 0.5, 0.5]

With id values of 1, 2, 3, 4, and 5, respectively. The query vector is:

[0.3, 0.3, 0.3, 0.3]

From eye-balling it, we can see the closest vector will always be #3 with a distance of 0, since both vectors have equal values. The next two closest vectors will be #2 and #4.

Let's get into the code! For reference, these examples can be found in the sqlite-vec examples directory. Feel free to skip to a specific language, or head to the bottom for some high-level notes:

Baseline: pure SQL

As a baseline, here's this logic in pure SQL:

.load ../../dist/vec0
.mode table
.header on

select sqlite_version(), vec_version();

CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4]);

INSERT INTO vec_items(rowid, embedding)
  select
    value ->> 0,
    value ->> 1
  from json_each('[
    [1, [0.1, 0.1, 0.1, 0.1]],
    [2, [0.2, 0.2, 0.2, 0.2]],
    [3, [0.3, 0.3, 0.3, 0.3]],
    [4, [0.4, 0.4, 0.4, 0.4]],
    [5, [0.5, 0.5, 0.5, 0.5]]
  ]');

SELECT
  rowid,
  distance
FROM vec_items
WHERE embedding MATCH '[0.3, 0.3, 0.3, 0.3]'
ORDER BY distance
LIMIT 3;

And the output, using the SQLite CLI (v3.45.1):

$ sqlite3 < demo.sql
+------------------+----------------+
| sqlite_version() | vec_version()  |
+------------------+----------------+
| 3.45.1           | v0.0.1-alpha.9 |
+------------------+----------------+
+-------+-------------------+
| rowid |     distance      |
+-------+-------------------+
| 3     | 0.0               |
| 4     | 0.199999988079071 |
| 2     | 0.200000017881393 |
+-------+-------------------+

Fairly straightforward! There's little boilerplate, only a few dot-commands at the top to load the extension and configure the output.

One interesting note: SQLite doesn't have "vectors" or "lists" as a native datatype. However, we can use SQLite's builtin JSON functions like json_each() and ->> to help when bulk-inserting vectors. sqlite-vec will automatically recognize and parse vectors provided as JSON, so we're good to go!

Python

Here's the Python implementation of the "simple" demo, using Python's builtin sqlite3 library and the sqlite-vec PyPi package:

Show Python code
import sqlite3
import sqlite_vec

from typing import List
import struct


def serialize_f32(vector: List[float]) -> bytes:
    """serializes a list of floats into a compact "raw bytes" format"""
    return struct.pack("%sf" % len(vector), *vector)


db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

sqlite_version, vec_version = db.execute(
    "select sqlite_version(), vec_version()"
).fetchone()
print(f"sqlite_version={sqlite_version}, vec_version={vec_version}")

items = [
    (1, [0.1, 0.1, 0.1, 0.1]),
    (2, [0.2, 0.2, 0.2, 0.2]),
    (3, [0.3, 0.3, 0.3, 0.3]),
    (4, [0.4, 0.4, 0.4, 0.4]),
    (5, [0.5, 0.5, 0.5, 0.5]),
]
query = [0.3, 0.3, 0.3, 0.3]

db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")

with db:
    for item in items:
        db.execute(
            "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
            [item[0], serialize_f32(item[1])],
        )

rows = db.execute(
    """
      SELECT
        rowid,
        distance
      FROM vec_items
      WHERE embedding MATCH ?
      ORDER BY distance
      LIMIT 3
    """,
    [serialize_f32(query)],
).fetchall()

print(rows)

And the output:

$ python demo.py
sqlite_version=3.45.1, vec_version=v0.0.1-alpha.9
[(3, 0.0), (4, 0.19999998807907104), (2, 0.20000001788139343)]

Having SQLite in the standard library for your programming language is great! Nothing extra to install, just works out of the box. However, sometimes your version of SQLite can "lag", especially if you haven't upgraded in a bit. And since sqlite-vec relies on "new" SQLite features for performance/developer experience upgrades, it can get a little annoying.

We can insert vectors as JSON strings or in a compact binary format. In general, the compact binary format is faster to generate. We do have a build a helper function, serialize_f32(), which is a bit cumbersome, but it's only a single line.

Ruby

Here's the Ruby implementation of the "simple" demo, using the sqlite3 gem and the sqlite-vec gem:

Show Ruby code
require 'sqlite3'
require 'sqlite_vec'


db = SQLite3::Database.new(':memory:')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

sqlite_version, vec_version = db.execute("select sqlite_version(), vec_version()").first
puts "sqlite_version=#{sqlite_version}, vec_version=#{vec_version}"

db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")

items = [
  [1, [0.1, 0.1, 0.1, 0.1]],
  [2, [0.2, 0.2, 0.2, 0.2]],
  [3, [0.3, 0.3, 0.3, 0.3]],
  [4, [0.4, 0.4, 0.4, 0.4]],
  [5, [0.5, 0.5, 0.5, 0.5]],
]

db.transaction do
  items.each do |item|
    db.execute("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)", [item[0], item[1].pack("f*")])
  end
end

query = [0.3, 0.3, 0.3, 0.3]
rows = db.execute(<<-SQL, [query.pack("f*")])
  SELECT
    rowid,
    distance
  FROM vec_items
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT 3
SQL

puts rows

And the output:

sqlite_version=3.45.3, vec_version=v0.0.1-alpha.3
3
0.0
4
0.19999998807907104
2
0.20000001788139343

In Ruby we can use .pack("f*") to convert an array of floats into a vector BLOB. That's nice! We can also use a heredoc for the SQL (<<-SQL) which gave me SQL syntax highlighting in my editor, which is neat.

Node.js

Here's the Node.js implementation of the "simple" demo, using better-sqlite3 npm package and the sqlite-vec npm package:

Show Node.js code
import * as sqliteVec from "sqlite-vec";
import Database from "better-sqlite3";

const db = new Database(":memory:");
sqliteVec.load(db);

const { sqlite_version, vec_version } = db
  .prepare(
    "select sqlite_version() as sqlite_version, vec_version() as vec_version;"
  )
  .get();

console.log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);

const items = [
  [1, [0.1, 0.1, 0.1, 0.1]],
  [2, [0.2, 0.2, 0.2, 0.2]],
  [3, [0.3, 0.3, 0.3, 0.3]],
  [4, [0.4, 0.4, 0.4, 0.4]],
  [5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];

db.exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])");

const insertStmt = db.prepare(
  "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)"
);

const insertVectors = db.transaction((items) => {
  for (const [id, vector] of items) {
    insertStmt.run(BigInt(id), new Float32Array(vector));
  }
});

insertVectors(items);

const rows = db
  .prepare(
    `
  SELECT
    rowid,
    distance
  FROM vec_items
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT 3
`
  )
  .all(new Float32Array(query));

console.log(rows);

And the output:

$ node demo.mjs
sqlite_version=3.45.3, vec_version=v0.0.1-alpha.8
[
  { rowid: 3, distance: 0 },
  { rowid: 4, distance: 0.19999998807907104 },
  { rowid: 2, distance: 0.20000001788139343 }
]

I'm a big fan of the better-sqlite3 package! The API make a lot of sense and is easy to work with. For coverting an array of numbers to a vector BLOB, all we need to do is wrap it with new Float32Array() and better-sqlite3 will bind is as a BLOB. Pretty neat!

Bun

Here's the Bun implementation of the "simple" demo, using Buns's builtin bun:sqlite library and the sqlite-vec npm package:

Show Bun code
import { Database } from "bun:sqlite";
Database.setCustomSQLite("/usr/local/opt/sqlite3/lib/libsqlite3.dylib");

const db = new Database(":memory:");
//sqliteVec.load(db);
db.loadExtension("../../dist/vec0");

const { sqlite_version, vec_version } = db
  .prepare(
    "select sqlite_version() as sqlite_version, vec_version() as vec_version;"
  )
  .get();

console.log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);

const items = [
  [1, [0.1, 0.1, 0.1, 0.1]],
  [2, [0.2, 0.2, 0.2, 0.2]],
  [3, [0.3, 0.3, 0.3, 0.3]],
  [4, [0.4, 0.4, 0.4, 0.4]],
  [5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];

db.exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])");

const insertStmt = db.prepare(
  "INSERT INTO vec_items(rowid, embedding) VALUES (?, vec_f32(?))"
);

const insertVectors = db.transaction((items) => {
  for (const [id, vector] of items) {
    insertStmt.run(BigInt(id), new Float32Array(vector));
  }
});

insertVectors(items);

const rows = db
  .prepare(
    `
  SELECT
    rowid,
    distance
  FROM vec_items
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT 3
`
  )
  .all(new Float32Array(query));

console.log(rows);

And the output:

$ bun demo.ts
sqlite_version=3.45.1, vec_version=v0.0.1-alpha.9
[
  {
    rowid: 3,
    distance: 0,
  }, {
    rowid: 4,
    distance: 0.19999998807907104,
  }, {
    rowid: 2,
    distance: 0.20000001788139343,
  }
]

Very similar to the Node.js demo! The only real difference is that the builtin bun:sqlite package is used instead of better-sqlite3, which already has a similar API to better-sqlite3, so little code changes are needed. However, bun:sqlite will by default use the system-level SQLite library, with on MacOS doesn't allow extensions. So I needed to use Database.setCustomSQLite() to point to the brew version of SQLite instead.

Although technically, Bun is compatible with Node.js, so theoretically we could also run the Node.js version with Bun without fuss. Though when I try on my x86 Mac, it fails for me:

$ bun ../simple-node/demo.mjs
dyld[11775]: missing symbol called
Killed: 9

Deno (almost!)

Here's the Deno implementation of the "simple" demo, using Deno's jsr:@db/sqlite library and the sqlite-vec npm package:

Show Deno code
import { Database } from "jsr:@db/sqlite@0.11";
import * as sqliteVec from "npm:sqlite-vec@0.0.1-alpha.9";

const db = new Database(":memory:");
db.enableLoadExtension = true;
sqliteVec.load(db);
db.enableLoadExtension = false;

const [sqlite_version, vec_version] = db
  .prepare("select sqlite_version(), vec_version()")
  .value<[string, string]>()!;
console.log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);

const items = [
  [1, [0.1, 0.1, 0.1, 0.1]],
  [2, [0.2, 0.2, 0.2, 0.2]],
  [3, [0.3, 0.3, 0.3, 0.3]],
  [4, [0.4, 0.4, 0.4, 0.4]],
  [5, [0.5, 0.5, 0.5, 0.5]],
];
const query = [0.3, 0.3, 0.3, 0.3];

db.exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])");

const insertStmt = db.prepare(
  "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)"
);

const insertVectors = db.transaction((items) => {
  for (const [id, vector] of items) {
    insertStmt.run(BigInt(id), new Uint8Array(new Float32Array(vector).buffer));
  }
});

insertVectors(items);

const rows = db
  .prepare(
    `
  SELECT
    rowid,
    distance
  FROM vec_items
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT 5
`
  )
  .all([new Uint8Array(new Float32Array(query).buffer)]);

console.log(rows);

db.close();

And the output:

$ deno run -A --unstable-ffi demo.ts
error: Error getting response at https://registry.npmjs.org/sqlite-vec for package "sqlite-vec": invalid type: null, expected a sequence at line 1 column 916
    at file:///Users/.../demo.ts:2:28

But if fails! This is because of an obscure bug that should fix on the next version of Deno. Long story short, I published an old version of the sqlite-vec NPM package with an error on the package.json file, which broke Deno/npm compatability. Thankfully, the Deno team put out a fix quickly, so hopefully we don't have to wait long for Deno 1.44!

Nonetheless, I really like Deno's SQLite client library, specifically the jsr:@db/sqlite package. Deno's FFI support is a little awkward, but it's really fast and has an even nicer API than better-sqlite3, on my opinion.

Client-side JavaScript

sqlite-vec can be compiled to WASM to be used in the browser, using the official SQLite WASM build. There is a sqlite-vec-wasm-demo NPM package that contains a pre-comiled WASM build of the SQLite library with sqlite-vec statically compiled in, that can be used like so:

Show WASM code
<html>
  <body>
    <h1>sqlite-vec demo/simple-wasm</h1>

    <div id="target"></div>
    <script type="module">
      import {default as init} from "https://cdn.jsdelivr.net/npm/sqlite-vec-wasm-demo@latest/sqlite3.mjs";

      function log(msg) {
        const pre = document.querySelector('#target').appendChild(document.createElement("pre"));
        pre.textContent = msg;
      }

      const sqlite3 = await init();
      const db = new sqlite3.oo1.DB(":memory:");

      const [sqlite_version, vec_version] = db.selectArray('select sqlite_version(), vec_version();')
      log(`sqlite_version=${sqlite_version}, vec_version=${vec_version}`);

      const items = [
        [1, [0.1, 0.1, 0.1, 0.1]],
        [2, [0.2, 0.2, 0.2, 0.2]],
        [3, [0.3, 0.3, 0.3, 0.3]],
        [4, [0.4, 0.4, 0.4, 0.4]],
        [5, [0.5, 0.5, 0.5, 0.5]],
      ];
      const query = [0.3, 0.3, 0.3, 0.3];

      db.exec('CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4]);');
      const stmt = db.prepare("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)");
      for(const item of items) {
        stmt
        .bind(1, item[0])
        .bind(2, new Float32Array(item[1]).buffer)
        .stepReset();
      }
      stmt.finalize();

      const rows = db
        .selectArrays(
          `
        SELECT
          rowid,
          distance
        FROM vec_items
        WHERE embedding MATCH ?
        ORDER BY distance
        LIMIT 3
      `, new Float32Array(query).buffer
        );

      log(JSON.stringify(rows));
    </script>
  </body>
</html>
`

If you open that in a web browser, you will see:

sqlite_version=3.45.3, vec_version=v0.0.1-alpha.9
[[3,0],[4,0.19999998807907104],[2,0.20000001788139343]]

The JavaScript bindings are from the SQLite team, which are a bit awkward, like new sqlite3.oo1.DB(":memory:"). But just like the other JavaScript runtimes, we can use new Float32Array(vector).buffer to bind vectors as BLOBs rather easily!

Go

Here's the Go implementation of the "simple" demo, using the mattn/go-sqlite3 CGO SQLite module and the cgo/sqlite-vec Go module:

Show Go code
package main

import (
	"bytes"
	"database/sql"
	"encoding/binary"
	"fmt"
	"log"

	sqlite_vec "github.com/asg017/sqlite-vec/bindings/go/cgo"
	_ "github.com/mattn/go-sqlite3"
)

// #cgo LDFLAGS: -L../../dist
import "C"

func serializeFloat32(vector []float32) ([]byte, error) {
	buf := new(bytes.Buffer)
	err := binary.Write(buf, binary.LittleEndian, vector)
	if err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
func main() {
	sqlite_vec.Auto()
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var sqliteVersion string
	var vecVersion string
	err = db.QueryRow("select sqlite_version(), vec_version()").Scan(&sqliteVersion, &vecVersion)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("sqlite_version=%s, vec_version=%s\n", sqliteVersion, vecVersion)

	_, err = db.Exec("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")
	if err != nil {
		log.Fatal(err)
	}

	items := map[int][]float32{
		1: {0.1, 0.1, 0.1, 0.1},
		2: {0.2, 0.2, 0.2, 0.2},
		3: {0.3, 0.3, 0.3, 0.3},
		4: {0.4, 0.4, 0.4, 0.4},
		5: {0.5, 0.5, 0.5, 0.5},
	}
	q := []float32{0.3, 0.3, 0.3, 0.3}

	for id, values := range items {
		v, err := serializeFloat32(values)
		if err != nil {
			log.Fatal(err)
		}
		_, err = db.Exec("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)", id, v)
		if err != nil {
			log.Fatal(err)
		}
	}

	query, err := serializeFloat32(q)
	if err != nil {
		log.Fatal(err)
	}

	rows, err := db.Query(`
		SELECT
			rowid,
			distance
		FROM vec_items
		WHERE embedding MATCH ?
		ORDER BY distance
		LIMIT 3
	`, query)

	if err != nil {
		log.Fatal(err)
	}

	for rows.Next() {
		var rowid int64
		var distance float64
		err = rows.Scan(&rowid, &distance)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("rowid=%d, distance=%f\n", rowid, distance)
	}
	err = rows.Err()
	if err != nil {
		log.Fatal((err))
	}

}

And the output:

sqlite_version=3.45.1, vec_version=v0.0.1-alpha.3
rowid=3, distance=0.000000
rowid=4, distance=0.200000
rowid=2, distance=0.200000

I do love go, especially how easy it is to type SQL returned results. But we do have to write our own serializeFloat32 function to convert a slice of floats into a vector BLOB, which is annoying. And if I have to write if err != nil one more time...

Rust

Here's the Rust implementation of the "simple" demo, using the rusqlite crate and the sqlite-vec crate:

Show Rust code
use rusqlite::{ffi::sqlite3_auto_extension, Connection, Result};
use sqlite_vec::sqlite3_vec_init;
use zerocopy::AsBytes;

fn main() -> Result<()> {
    unsafe {
        sqlite3_auto_extension(Some(std::mem::transmute(sqlite3_vec_init as *const ())));
    }

    let db = Connection::open_in_memory()?;
    let v: Vec<f32> = vec![0.1, 0.2, 0.3];

    let (sqlite_version, vec_version, x): (String, String, String) = db.query_row(
        "select sqlite_version(), vec_version(), vec_to_json(?)",
        &[v.as_bytes()],
        |x| Ok((x.get(0)?, x.get(1)?, x.get(2)?)),
    )?;

    println!("sqlite_version={sqlite_version}, vec_version={vec_version}");

    let items: Vec<(usize, Vec<f32>)> = vec![
        (1, vec![0.1, 0.1, 0.1, 0.1]),
        (2, vec![0.2, 0.2, 0.2, 0.2]),
        (3, vec![0.3, 0.3, 0.3, 0.3]),
        (4, vec![0.4, 0.4, 0.4, 0.4]),
        (5, vec![0.5, 0.5, 0.5, 0.5]),
    ];
    println!("{x}");

    db.execute(
        "CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])",
        [],
    )?;
    let mut stmt = db.prepare("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)")?;
    for item in items {
        stmt.execute(rusqlite::params![item.0, item.1.as_bytes()])?;
    }

    let query: Vec<f32> = vec![0.3, 0.3, 0.3, 0.3];
    let result: Vec<(i64, f64)> = db
        .prepare(
            r"
          SELECT
            rowid,
            distance
          FROM vec_items
          WHERE embedding MATCH ?1
          ORDER BY distance
          LIMIT 3
        ",
        )?
        .query_map([query.as_bytes()], |r| Ok((r.get(0)?, r.get(1)?)))?
        .collect::<Result<Vec<_>, _>>()?;
    println!("{:?}", result);
    Ok(())
}

And the output:

$ cargo run demo.rs
   Compiling sqlite-vec-demo v0.0.0 (/Users/.../simple-rust)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.85s
     Running `target/debug/demo demo.rs`
sqlite_version=3.45.0, vec_version=v0.0.1-alpha.7
[0.100000,0.200000,0.300000]
[(3, 0.0), (4, 0.19999998807907104), (2, 0.20000001788139343)]

If you're okay adding the zerocopy crate, then coverting a Vec<f32> into a vector BLOB is as easy as a .as_bytes() call! The syntax can get a little gnarly, especially around .query_map().collect(), but man once you get it to work, it's extremely productive!

C

Finally, here's the C implementation of the "simple" demo, using the raw SQLite C API, and the sqlite-vec amalgamation!

Show C code
#include "sqlite3.h"
#include "sqlite-vec.h"
#include <stdio.h>
#include <unistd.h>
#include <assert.h>

int main(int argc, char *argv[]) {
  int rc = SQLITE_OK;
  sqlite3 *db;
  sqlite3_stmt *stmt;

  rc = sqlite3_auto_extension((void (*)())sqlite3_vec_init);
  assert(rc == SQLITE_OK);

  rc = sqlite3_open(":memory:", &db);
  assert(rc == SQLITE_OK);

  rc = sqlite3_prepare_v2(db, "SELECT sqlite_version(), vec_version()", -1, &stmt, NULL);
  assert(rc == SQLITE_OK);

  rc = sqlite3_step(stmt);
  printf("sqlite_version=%s, vec_version=%s\n", sqlite3_column_text(stmt, 0), sqlite3_column_text(stmt, 1));
  sqlite3_finalize(stmt);

  static const struct {
    sqlite3_int64 id;
    float vector[4];
  } items[] = {
    {1, {0.1, 0.1, 0.1, 0.1}},
    {2, {0.2, 0.2, 0.2, 0.2}},
    {3, {0.3, 0.3, 0.3, 0.3}},
    {4, {0.4, 0.4, 0.4, 0.4}},
    {5, {0.5, 0.5, 0.5, 0.5}},
  };
  float query[4] = {0.3, 0.3, 0.3, 0.3};


  rc = sqlite3_prepare_v2(db, "CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])", -1, &stmt, NULL);
  assert(rc == SQLITE_OK);
  rc = sqlite3_step(stmt);
  assert(rc == SQLITE_DONE);
  sqlite3_finalize(stmt);

  rc = sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
  assert(rc == SQLITE_OK);
  rc = sqlite3_prepare_v2(db, "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)", -1, &stmt, NULL);
  assert(rc == SQLITE_OK);
  for (unsigned long i = 0; i < sizeof(items) / sizeof(items[0]); i++) {
    sqlite3_bind_int64(stmt, 1, items[i].id);
    sqlite3_bind_blob(stmt, 2, items[i].vector, sizeof(items[i].vector), SQLITE_STATIC);
    rc = sqlite3_step(stmt);
    assert(rc == SQLITE_DONE);
    sqlite3_reset(stmt);
  }
  sqlite3_finalize(stmt);
  rc = sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
  assert(rc == SQLITE_OK);

  rc = sqlite3_prepare_v2(db,
    "SELECT "
    "  rowid, "
    "  distance "
    "FROM vec_items "
    "WHERE embedding MATCH ?1 "
    "ORDER BY distance "
    "LIMIT 3 "
  , -1, &stmt, NULL);
  assert(rc == SQLITE_OK);

  sqlite3_bind_blob(stmt, 1, query, sizeof(query), SQLITE_STATIC);

  while(1) {
    rc = sqlite3_step(stmt);
    if(rc == SQLITE_DONE) break;
    assert(rc==SQLITE_ROW);
    sqlite3_int64 rowid = sqlite3_column_int64(stmt, 0);
    double distance = sqlite3_column_double(stmt, 1);
    printf("rowid=%lld distance=%f\n", rowid, distance);
  }
  sqlite3_finalize(stmt);
  sqlite3_close(db);
  return 0;
}

And the output:

$ make demo
gcc \
		-DSQLITE_CORE \
		-I../../ -I../../vendor \
		demo.c ../../sqlite-vec.c ../../vendor/sqlite3.c \
		-o demo
$ ./demo
sqlite_version=3.45.3, vec_version=v0.0.1-alpha.9
rowid=3 distance=0.000000
rowid=4 distance=0.200000
rowid=2 distance=0.200000

C is not a fun language is write and can be super unsafe. But man, it feels really nice to have a single 1.5MB executable with everything you need...

Final thoughts

SQL is the standard API across all languages

No matter the language, every single demo above ran basically the same SQL:

select sqlite_version(), vec_version();

create virtual table vec_items using vec0(embedding float[4]);

insert into vec_items(rowid, embedding)
  values (?, ?);

select
  rowid,
  distance
from vec_items
where embedding MATCH ?
order by distance
limit 3;

The only real differences were superficial: how the SQLite client library works for that language, how to convert a list of numbers to the BLOB format, and how to handle transactions.

I think this is the real super-power of sqlite-vec. It's rare to have a vector search library work in so many different languages and runtimes. But it's even more rare, perhaps unique, to have an API that's more-or-less the same across different programming languages. That means you can create and "train" an index in Python, serve it to users in the browser in JavaScript, in an Express server with Node.js, or in a CLI with Rust, all without much fuss.

Creating these bindings was worth it

It took me a loooooong time to figure out a good "distribution" strategy for SQLite extensions. When I first started creating SQLite extensions, I followed the pattern established by sqlean, which was to upload pre-compiled loadable extensions to Github Releases for popular platforms (MacOS, Linux, Windows, etc.).

This is great, but downloading links yourself can be quite a chore. If you want to use sqlite-vec in your Python package, you don't want to download and extract a tar.gz file: you just want to pip install something and go along your merry way.

So over the last few years, I've developed a few techniques to creating "binding" packages for these SQLite extensions on popular package managers: pypi, npm, rubygems, crates.io, and Golang modules. At this point, I've spent more time on these distributions than on SQLite extensions themselves, but it was worth it!

I haven't blogged about how the Rust/Go bindings work quite yet, but I hope to soon! I also have a work-in-progress project called sqlite-dist that automates a lot of these bindings, which I also plan to blog about in the near future.

Is it a good idea to have a SQLite client built into your language/runtime?

Of these examples, only Python and Bun have a builtin client library for SQLite. For all other languages, it's a 3rd party package that you must install and update yourself.

One one hand, this is great! No need to install another dependency, just write and use.

On the other hand, it makes working with modern SQLite features a bit harder.

For Python, the version of SQLite that's "bundled-in" depends on which Python your using. If you're using the default MacOS Python, that uses the default MacOS SQLite library, which is outdated and blocks SQLite extensions. If you're using brew, then it might use whatever SQLite version brew has installed. If you're using Docker, I have no idea.

For Bun, there's even a section in their bun:sqlite docs about this confusion. For MacOS users, the default SQLite library that Bun uses is the outdated default MacOS SQLite build. So Bun added a Database.setCustomSQLite("/path/to/libsqlite.dylib") method as a workaround. Which does work, but is an extra step you have to perform at runtime. And if you want to run your code on other operating systems, you'll need to add a few if statements to avoid accidentally loading in a SQLite build that doesn't exist. A small headache!

Now compare this to all the other libraries where SQLite is offered as a third party package: Node.js, Deno, Ruby, Go, and Rust. To update to a latest version of SQLite, you usually just need to update the package with whatever package manager your language/runtime offers. Sometimes there's a delay when a new SQLite version is released and package maintainers release a new version, which is something to consider. But it's way less likely that you'll get "surprised" by code that works on one machine and not the other, and you can handle updates yourself.