ai
Experimental · Windows · macOS · Linux
On-device AI inference powered by Candle, a Rust ML framework. Apart from the one-time model download, there are no network calls, no API keys, and no data leaving the machine.
Experimental — model selection and API shape may change before v1.0.
Capability
{
  "capabilities": ["ai"]
}
Import
import { ai } from 'veloxkit'
Models
Models are downloaded from HuggingFace Hub on the first call and cached in ~/.cache/huggingface/. Subsequent calls reuse cached weights with no download.
| API | Model | Size | First-call download |
|---|---|---|---|
| ai.embed() | sentence-transformers/all-MiniLM-L6-v2 | ~22 MB | ~1 s |
| ai.generate() | Phi-2 Q4_K_M (quantized) | ~1.7 GB | ~5 min on fast connection |
| ai.transcribe() | openai/whisper-tiny | ~75 MB | ~30 s |
The first call to each API blocks until the model is downloaded. Show a loading indicator or call the API during app startup. Subsequent calls load the already-cached model in ~1 s.
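One way to show a loading indicator around a blocking first call is a small wrapper that toggles a flag before and after the task. This is a minimal sketch: `withLoading` and its parameters are hypothetical names, not part of veloxkit, and the task is passed in so the helper works with any of the three APIs.

```typescript
// Hypothetical helper (not a veloxkit API): runs an async task while
// reporting loading state through a caller-supplied setter.
async function withLoading<T>(
  task: () => Promise<T>,
  setLoading: (loading: boolean) => void,
): Promise<T> {
  setLoading(true)
  try {
    return await task()
  } finally {
    // Runs on success and on failure, so the indicator never gets stuck on.
    setLoading(false)
  }
}
```

In an app this might be called as `withLoading(() => ai.embed('warmup'), setLoading)`, where `setLoading` is whatever state setter drives your spinner.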
ai.embed(text)
Embed text into a 384-dimensional unit-normalised vector, suitable for semantic similarity search.
const vector = await ai.embed('the quick brown fox')
// → number[] (384 floats, length-1 normalized)
Returns Promise<number[]>: a 384-element float array.
Performance — ~5–20 ms per call after model load (CPU, single-threaded).
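Because the vectors are unit-normalised, the cosine similarity of two embeddings reduces to their dot product. The helper below is a sketch of that calculation; `dot` is a hypothetical name, not a veloxkit function.

```typescript
// Dot product of two equal-length vectors. For unit-length vectors this
// equals their cosine similarity (1 = same direction, 0 = orthogonal).
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}
```

With two embeddings in hand, `dot(await ai.embed(textA), await ai.embed(textB))` gives a similarity score without needing a vector database.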
Semantic search with vectorDb
import { ai, vectorDb } from 'veloxkit'
const db = await vectorDb.open('embeddings.db')
// Upsert documents
for (const doc of documents) {
  const vec = await ai.embed(doc.text)
  await db.upsert('docs', doc.id, vec, { title: doc.title })
}
// Search
const query = await ai.embed('how do I reset my password?')
const results = await db.search('docs', query, 5)
// → [{ id, score, metadata }]
ai.generate(prompt, opts?)
Generate text using Phi-2 (quantized 4-bit, CPU).
const output = await ai.generate('Summarize this text: ...', {
  maxTokens: 200, // default 200
  temperature: 0.7, // default 0.7; higher = more creative
})
console.log(output)
Options
| Option | Type | Default | Description |
|---|---|---|---|
| maxTokens | number | 200 | Maximum tokens to generate |
| temperature | number | 0.7 | Sampling temperature (0 = deterministic, 1 = very random) |
Returns Promise<string> — full generated text.
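If you build options from user input, it can help to apply the documented defaults and validate the values before calling ai.generate(). The sketch below does that. `GenerateOptions` and `resolveGenerateOptions` are hypothetical names, and the validation rules are assumptions rather than documented veloxkit behaviour.

```typescript
// Hypothetical type mirroring the options table; veloxkit may export its
// own type under a different name.
interface GenerateOptions {
  maxTokens?: number
  temperature?: number
}

// Fill in the documented defaults (200 tokens, temperature 0.7) and reject
// values that cannot be meaningful. The validation rules are an assumption.
function resolveGenerateOptions(opts: GenerateOptions = {}): Required<GenerateOptions> {
  const maxTokens = opts.maxTokens ?? 200
  const temperature = opts.temperature ?? 0.7
  if (!Number.isInteger(maxTokens) || maxTokens < 1) {
    throw new RangeError(`maxTokens must be a positive integer, got ${maxTokens}`)
  }
  if (!Number.isFinite(temperature) || temperature < 0) {
    throw new RangeError(`temperature must be a finite number >= 0, got ${temperature}`)
  }
  return { maxTokens, temperature }
}
```

Using `??` rather than `||` keeps an explicit `temperature: 0` instead of silently replacing it with the default.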
Performance — ~10–30 seconds per 200 tokens on CPU. Consider calling during background tasks or on user initiation.
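The figure above works out to roughly 0.05 to 0.15 seconds per token, which gives a quick way to set user expectations for other values of maxTokens. The sketch below scales it linearly. Linear scaling is an assumption, real timings depend on hardware and prompt length, and `estimateGenerateSeconds` is a hypothetical helper.

```typescript
// Rough latency range for ai.generate(), scaled linearly from the documented
// figure of about 10 to 30 seconds per 200 tokens on CPU. Linear scaling is
// an assumption, not a measured guarantee.
function estimateGenerateSeconds(maxTokens: number): { min: number; max: number } {
  return {
    min: (maxTokens * 10) / 200,
    max: (maxTokens * 30) / 200,
  }
}

// 150 tokens → { min: 7.5, max: 22.5 }
const estimate = estimateGenerateSeconds(150)
console.log(`Expect roughly ${estimate.min} to ${estimate.max} seconds`)
```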
// Assumes a React-style UI layer; View, Pressable, and Text come from
// your app's UI component library.
import { useState } from 'react'
import { ai } from 'veloxkit'
function GenerateButton({ prompt }: { prompt: string }) {
  const [output, setOutput] = useState('')
  const [loading, setLoading] = useState(false)
  const handleGenerate = async () => {
    setLoading(true)
    try {
      const result = await ai.generate(prompt, { maxTokens: 150 })
      setOutput(result)
    } finally {
      // Reset the flag even if generation fails
      setLoading(false)
    }
  }
  return (
    <View style={{ gap: 12 }}>
      <Pressable onPress={handleGenerate} disabled={loading}
        style={{ padding: 12, backgroundColor: '#3a3a5e', borderRadius: 8 }}>
        <Text>{loading ? 'Generating…' : 'Generate'}</Text>
      </Pressable>
      {output ? <Text style={{ fontSize: 14, lineHeight: 22 }}>{output}</Text> : null}
    </View>
  )
}
ai.transcribe(audioPath, opts?)
Transcribe an audio file to text using Whisper-tiny (CPU).
const transcript = await ai.transcribe('/recordings/meeting.wav', {
  language: 'en', // ISO 639-1 code; empty string = auto-detect (default)
})
console.log(transcript)
Options
| Option | Type | Default | Description |
|---|---|---|---|
| language | string | '' | ISO 639-1 language code (e.g. 'en', 'fr'). Empty = auto-detect |
Returns Promise<string> — plain text transcript.
Supported formats — WAV (16 kHz mono preferred), MP3, FLAC, OGG.
Performance — ~5 seconds for a 30-second clip on CPU.
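When paths come from somewhere other than a filtered file dialog (drag and drop, for example), a quick extension check against the supported formats can catch obvious mismatches before starting a transcription. This is a sketch only: `isSupportedAudio` is a hypothetical helper, it looks at the file name rather than the file contents, and treating extensions as case-insensitive is an assumption.

```typescript
// Extensions taken from the "Supported formats" list.
const SUPPORTED_AUDIO_EXTENSIONS = ['wav', 'mp3', 'flac', 'ogg']

// Hypothetical helper: checks the file extension only, not the actual
// encoding of the file.
function isSupportedAudio(path: string): boolean {
  // Take the last path segment, accepting both / and \ separators.
  const fileName = path.split(/[\\/]/).pop() ?? ''
  const dotIndex = fileName.lastIndexOf('.')
  // No dot, or a leading dot only (e.g. ".wav"), means no usable extension.
  if (dotIndex <= 0) return false
  const extension = fileName.slice(dotIndex + 1).toLowerCase()
  return SUPPORTED_AUDIO_EXTENSIONS.includes(extension)
}
```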
Pick a file and transcribe
import { ai, dialog } from 'veloxkit'
async function pickAndTranscribe() {
  const paths = await dialog.openFile({
    filters: [{ name: 'Audio', extensions: ['wav', 'mp3', 'flac', 'ogg'] }],
  })
  // Bail out if the dialog was cancelled or nothing was selected
  if (!paths || paths.length === 0) return
  const text = await ai.transcribe(paths[0], { language: 'en' })
  return text
}
Best practices
Warm up models at startup
// In your app entry point, so downloads happen here rather than mid-session
useEffect(() => {
  const warmUp = async () => {
    // Trigger model load in background
    await ai.embed('warmup').catch(() => {})
  }
  warmUp()
}, [])
Battery-aware generation
// Assumes `battery` is exported from veloxkit alongside `ai`
import { ai, battery } from 'veloxkit'
async function safeGenerate(prompt: string) {
  // The Rust layer logs a warning if on battery but doesn't block generation.
  // Check manually if you want to warn the user:
  const batt = await battery.getStatus()
  if (batt && !batt.charging && batt.level < 0.2) {
    throw new Error('Battery too low for AI inference: please plug in.')
  }
  return ai.generate(prompt)
}
All inference runs on CPU using SIMD acceleration. GPU support (CUDA / Metal) is planned for a future release. The ~/.cache/huggingface/ directory is shared with other HuggingFace tools, so models you have already downloaded via Python are not downloaded again.