Local AI (Experimental)
⚠️ Experimental: the ai API may change before v1.0. Core functionality is available, but the API surface is not yet stable.
VeloxKit bundles llama.cpp for on-device inference. No network required, no API keys.
Requires capability: "ai"
Setup
Place model files in assets/models/:
assets/models/
├── nomic-embed-text-v1.5.Q4_K_M.gguf    ← embedding model (~80MB)
└── llama-3.2-1b-instruct.Q4_K_M.gguf    ← generation model (~900MB)
Register models in veloxkit.config.ts:
export default defineConfig({
  capabilities: ['ai', 'db'],
  ai: {
    models: [
      {
        name: 'embedder',
        path: './assets/models/nomic-embed-text-v1.5.Q4_K_M.gguf',
        type: 'embedding',
      },
      {
        name: 'generator',
        path: './assets/models/llama-3.2-1b-instruct.Q4_K_M.gguf',
        type: 'generation',
      },
    ],
  },
})
Embeddings
Generate a vector embedding for text:
import { ai } from 'veloxkit'
const embedding = await ai.embed('embedder', 'My note content here')
// → Float32Array (768 dimensions for nomic-embed-text-v1.5)
Semantic search with sqlite-vec
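The embedding returned by ai.embed above is a plain Float32Array, and semantic search works by ranking stored vectors by their distance to a query vector. sqlite-vec computes that distance natively, so the helpers below are never needed in application code. They are only a minimal sketch of what the comparison computes, and the names l2Distance and cosineSimilarity are hypothetical, not part of the VeloxKit API.

```typescript
// Hypothetical helpers for illustration only; not part of the VeloxKit API.

/** Euclidean (L2) distance: 0 means identical vectors, larger means less similar. */
function l2Distance(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}

/** Cosine similarity: 1 means same direction, 0 means orthogonal. */
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Both functions reject vectors of different lengths. The same constraint applies to the vector table: the column dimension must match the embedding model's output size.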
Store embeddings and retrieve similar documents:
import { ai, db } from 'veloxkit'
// Setup (run once in initDatabase)
// 768 is the output dimension of nomic-embed-text-v1.5; it must match the embedding model
db.execute(`
  CREATE VIRTUAL TABLE IF NOT EXISTS note_embeddings
  USING vec0(embedding FLOAT[768])
`)
// Index a note
async function indexNote(noteId: number, content: string) {
  const embedding = await ai.embed('embedder', content)
  db.execute(
    'INSERT OR REPLACE INTO note_embeddings(rowid, embedding) VALUES (?, ?)',
    [noteId, embedding]
  )
}
// Search
async function semanticSearch(query: string, limit = 5) {
  const queryEmbedding = await ai.embed('embedder', query)
  return db.query<{ id: number; title: string; distance: number }>(
    `SELECT notes.id, notes.title, ne.distance
     FROM note_embeddings ne
     JOIN notes ON notes.id = ne.rowid
     WHERE ne.embedding MATCH ?
       AND k = ?
     ORDER BY ne.distance`,
    [queryEmbedding, limit]
  )
}
Text generation
Stream a response from a local LLM:
import { ai } from 'veloxkit'
import { useState } from 'react'
// View, Pressable, and Text are assumed to come from your UI component library
function AIAssistant() {
  const [response, setResponse] = useState('')
  const [loading, setLoading] = useState(false)
  async function ask(prompt: string) {
    setLoading(true)
    setResponse('')
    try {
      const stream = ai.generate('generator', {
        prompt,
        maxTokens: 512,
        temperature: 0.7,
        systemPrompt: 'You are a helpful assistant. Be concise.',
      })
      // Append each streamed chunk as it arrives
      for await (const chunk of stream) {
        setResponse(prev => prev + chunk)
      }
    } finally {
      // Clear the loading flag even if generation throws
      setLoading(false)
    }
  }
  return (
    <View style={{ flex: 1, padding: 16, gap: 12 }}>
      <Pressable
        onPress={() => ask('Summarize my recent notes')}
        style={{ padding: '10px 16px', background: '#00A878', borderRadius: 6 }}
      >
        <Text style={{ color: '#fff' }}>Ask AI</Text>
      </Pressable>
      {loading && <Text style={{ color: '#666677' }}>Generating...</Text>}
      <Text>{response}</Text>
    </View>
  )
}
Transcription (Whisper)
⚠️ Transcription requires a Whisper GGUF model. Coming in v0.4.
// Coming in v0.4
const transcript = await ai.transcribe('whisper', audioBuffer)
Model loading
Models load lazily on first use. You can preload them:
import { ai } from 'veloxkit'
// Preload on app start to avoid first-use latency
await ai.preload('embedder')
Check load status:
const status = ai.modelStatus('embedder')
// → 'unloaded' | 'loading' | 'ready' | 'error'
Performance notes
- Embedding a short text: ~5–20ms on Apple M2
- Generating 100 tokens: ~800ms on Apple M2 (1B model, Q4)
- Models are loaded into memory once and kept loaded until the app exits
- GPU acceleration (Metal/Vulkan) is automatic when available
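The generation figure above can be turned into a rough latency budget. The sketch below assumes generation time scales linearly with token count and ignores prompt processing, model load time, and hardware differences, so treat the result as an order-of-magnitude estimate. The function names are hypothetical and not part of the VeloxKit API.

```typescript
// Hypothetical helpers; the default rate is the ~800ms per 100 tokens figure
// quoted above for a 1B Q4 model on Apple M2.

/** Estimated wall-clock time in milliseconds to generate `tokens` tokens. */
function estimateGenerationMs(tokens: number, msPer100Tokens = 800): number {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError('tokens must be a non-negative finite number')
  }
  return (tokens * msPer100Tokens) / 100
}

/** Throughput in tokens per second implied by a per-100-token latency. */
function tokensPerSecond(msPer100Tokens = 800): number {
  if (!Number.isFinite(msPer100Tokens) || msPer100Tokens <= 0) {
    throw new RangeError('msPer100Tokens must be a positive finite number')
  }
  return (100 * 1000) / msPer100Tokens
}
```

With these defaults, tokensPerSecond() gives 125 and estimateGenerationMs(512) gives 4096, so a response that uses the full maxTokens: 512 from the text generation example would take about 4 seconds on the quoted hardware. Measure on your target devices before relying on these numbers.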