Introducing sqlite-rembed: A SQLite extension for generating text embeddings from remote APIs

2024-07-25 by Alex Garcia

tl;dr: sqlite-rembed is a new SQLite extension for generating text embeddings from remote APIs like OpenAI, Nomic, Cohere, llamafile, Ollama, and more! It bundles its own HTTP client, so it can be used in small environments like the official SQLite CLI. It doesn't support batch embeddings yet, but can still be useful in many cases.

sqlite-rembed is a new SQLite extension I've been experimenting with, as a sister project to sqlite-vec. It connects to various 3rd party APIs to generate text embeddings.

For example, to use OpenAI's embedding service, this is all you need:

INSERT INTO temp.rembed_clients(name, options)
  VALUES ('text-embedding-3-small', 'openai');

select rembed(
  'text-embedding-3-small',
  'The United States Postal Service is an independent agency...'
); -- X'A452...01FC', Blob<6144 bytes>

Here we register a new rembed "client" named text-embedding-3-small, using the special openai option. By default, the openai option will source your API key from the OPENAI_API_KEY environment variable, and use the client name (text-embedding-3-small) as the model name.

Now, we can use the rembed() SQL function to generate embeddings from OpenAI! It returns the embeddings in a compact BLOB format, the same format that sqlite-vec uses. In this case, text-embedding-3-small returns 1536 dimensions at 4 bytes each, so a 1536 * 4 = 6144 byte BLOB is returned.
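To make the size arithmetic concrete, here is a minimal Python sketch that unpacks such a BLOB into a list of floats. It assumes the BLOB is a packed array of 4-byte little-endian floats (consistent with the 1536 * 4 = 6144 figure, though the byte order is an assumption). decode_embedding is a hypothetical helper name, not part of sqlite-rembed, and the sample BLOB is synthetic rather than real model output.

```python
import struct

def decode_embedding(blob: bytes) -> list[float]:
    """Unpack a BLOB of packed float32 values into Python floats.

    Hypothetical helper; assumes 4-byte little-endian floats.
    """
    if len(blob) % 4 != 0:
        raise ValueError("BLOB length must be a multiple of 4 bytes")
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

# Synthetic stand-in for a 1536-dimension embedding (not real model output).
fake_blob = struct.pack("<1536f", *([0.5] * 1536))
vector = decode_embedding(fake_blob)
print(len(fake_blob), len(vector))  # 6144 1536
```

In practice the BLOB would come from a query such as the select rembed(...) call shown earlier, fetched through whatever SQLite driver you use.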

And sqlite-rembed has support for other providers! Here's an example that uses Nomic's embedding API:

INSERT INTO temp.rembed_clients(name, options)
  VALUES ('nomic-embed-text-v1.5', 'nomic');

select rembed(
  'nomic-embed-text-v1.5',
  'The United States Postal Service is an independent agency...'
);

And with Cohere's embedding API:

INSERT INTO temp.rembed_clients(name, options)
  VALUES ('embed-english-v3.0', 'cohere');

select rembed(
  'embed-english-v3.0',
  'The United States Postal Service is an independent agency...'
);

Notice how you can have multiple clients, all with different names and using different API providers. Secrets are sourced from places you expect: NOMIC_API_KEY, CO_API_KEY, and so on.

If you want to manually configure which API keys to use, or change the "base URL" of a provider, you can do so with rembed_client_options():

INSERT INTO temp.rembed_clients(name, options) VALUES
  (
    'text-embedding-3-small',
    rembed_client_options(
      'format', 'openai',
      'key', :OPENAI_API_KEY -- SQL parameter to bind an API key
    )
  );
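From a host language, :OPENAI_API_KEY is bound like any other named SQL parameter. The sketch below uses Python's built-in sqlite3 module. register_openai_client is a hypothetical helper, and it assumes the connection passed in already has sqlite-rembed loaded (loading the extension is not shown here).

```python
import sqlite3

# Same statement as above, with the client name also passed as a parameter.
REGISTER_SQL = """
INSERT INTO temp.rembed_clients(name, options) VALUES
  (:name, rembed_client_options('format', 'openai', 'key', :OPENAI_API_KEY))
"""

def register_openai_client(db: sqlite3.Connection, name: str, api_key: str) -> None:
    """Register an OpenAI-format client, binding the key as a named parameter.

    Hypothetical helper: assumes sqlite-rembed is already loaded on `db`.
    """
    db.execute(REGISTER_SQL, {"name": name, "OPENAI_API_KEY": api_key})
```

Binding the key as a parameter keeps the secret out of the SQL text itself; where the key comes from (an environment variable, a secrets manager) is left to the caller.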

In total, sqlite-rembed currently has support for the following embedding providers:

OpenAI
Nomic
Cohere
Jina
MixedBread
Llamafile
Ollama

"Remote" embeddings can still be local!

sqlite-rembed stands for "SQLite remote embeddings," in contrast to its sister project sqlite-lembed that stands for "SQLite local embeddings." For sqlite-lembed, "local" means inside the same process, no external process or server needed. "Remote" in sqlite-rembed just means "outside the current process", which isn't always an outside https://... server.

You can totally run an embeddings model locally with llamafile, Ollama, or some other "OpenAI compatible" service, and point sqlite-rembed to a http://localhost:... endpoint.

Let's take llamafile as an example: follow the "Getting Started with LLaMAfiler" guide. Once up, you'll have a local embeddings server available to you at http://127.0.0.1:8080/. To use it from sqlite-rembed, register with the llamafile option:

INSERT INTO temp.rembed_clients(name, options)
 VALUES ('llamafile', 'llamafile');

.mode quote

select rembed('llamafile', 'Tennis star Coco Gauff will carry the U.S. flag...');

And that's it! Not a single byte of your data will leave your computer.

Another option is Ollama's embeddings support. Once installed, Ollama will have a constantly running server at http://localhost:11434. To use it from sqlite-rembed, register an ollama client like so:

INSERT INTO temp.rembed_clients(name, options)
  VALUES ('snowflake-arctic-embed:s', 'ollama');

select rembed('snowflake-arctic-embed:s', 'LeVar Burton talks about his changing...');

Here, snowflake-arctic-embed:s is a model I downloaded with ollama pull snowflake-arctic-embed:s. This approach is nice because the Ollama service will be constantly running in the background, and will "wake up" embedding models into memory on first request (and will unload them after 5 minutes of inactivity). Again, not a single byte of your data leaves your computer.
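Because rembed() is a regular SQL function, it can also be called once per row to fill an embeddings column, with any of the clients registered above. The following Python sketch shows that pattern. The articles table, its columns, and the embed_missing helper are all hypothetical, and the connection is assumed to already have sqlite-rembed loaded with a client registered under the given name. Since batch embeddings are not supported yet, expect roughly one request per row.

```python
import sqlite3

def embed_missing(db: sqlite3.Connection, client: str) -> int:
    """Fill articles.embedding for rows that don't have one yet.

    Hypothetical helper and schema; returns the number of rows updated.
    """
    cursor = db.execute(
        "UPDATE articles SET embedding = rembed(?, headline) "
        "WHERE embedding IS NULL",
        (client,),
    )
    db.commit()
    return cursor.rowcount
```

Restricting the update to rows where embedding IS NULL means the helper can be re-run without re-embedding text that was already processed.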

So try out sqlite-rembed today! There are pre-compiled binaries on GitHub releases, or you can pip install sqlite-rembed or npm install sqlite-rembed.