
ai (Experimental) · Windows, macOS, Linux

On-device AI inference powered by Candle, a Rust ML framework. Once a model has been downloaded, inference makes no network calls, needs no API keys, and sends no data off the machine.

⚠️ Experimental: model selection and API shape may change before v1.0.

Capability

veloxkit.config.json
{
  "capabilities": ["ai"]
}

Import

import { ai } from 'veloxkit'

Models

Models are downloaded from the Hugging Face Hub on the first call to each API and cached in ~/.cache/huggingface/. Later calls reuse the cached weights without downloading again.

| API | Model | Size | First-call download |
| --- | --- | --- | --- |
| ai.embed() | sentence-transformers/all-MiniLM-L6-v2 | ~22 MB | ~1 s |
| ai.generate() | Phi-2 Q4_K_M (quantized) | ~1.7 GB | ~5 min on a fast connection |
| ai.transcribe() | openai/whisper-tiny | ~75 MB | ~30 s |
⚠️ The first call to each API blocks until its model has been downloaded. Show a loading indicator or make the call during app startup. Later calls load the cached model in ~1 s.
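The download times listed above depend on the user's connection. As a rough sketch of the arithmetic, the snippet below estimates the first-call wait so a loading indicator can show a realistic message. The helper name `estimateDownloadSeconds` and the bandwidth values are illustrative assumptions and are not part of veloxkit.

```typescript
// Hypothetical helper, not part of veloxkit: estimates how long a model
// download takes at a given sustained bandwidth.
// sizeMB: download size in megabytes; mbps: bandwidth in megabits per second.
function estimateDownloadSeconds(sizeMB: number, mbps: number): number {
  if (mbps <= 0) throw new Error('bandwidth must be positive')
  return (sizeMB * 8) / mbps // 8 bits per byte
}

// Sizes come from the model list above; the bandwidths are assumed values.
const phi2Seconds = estimateDownloadSeconds(1700, 50)  // 272 s, roughly 4.5 min
const whisperSeconds = estimateDownloadSeconds(75, 20) // 30 s
```

On slower connections the Phi-2 download can take much longer than the quoted figure, which is one reason to trigger it at startup instead of on first use.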


ai.embed(text)

Embed text into a 384-dimensional unit-normalized vector, suitable for semantic similarity search.

const vector = await ai.embed('the quick brown fox')
// → number[]  (384 floats, normalized to unit length)

Returns Promise<number[]> — 384-element float array.

Performance — ~5–20 ms per call after model load (CPU, single-threaded).
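Because the returned vectors have unit length, the cosine similarity of two embeddings reduces to their dot product. A minimal sketch follows; `dot` is a hypothetical helper written for illustration, not a veloxkit function.

```typescript
// Dot product of two equal-length vectors. For unit-length embeddings this
// equals their cosine similarity (1 = same direction, 0 = orthogonal).
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}
```

With two embeddings in hand, `dot(await ai.embed('cat'), await ai.embed('kitten'))` would give a similarity score without any further normalization.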

Semantic search with vectorDb

import { ai, vectorDb } from 'veloxkit'
 
const db = await vectorDb.open('embeddings.db')
 
// Upsert documents
for (const doc of documents) {
  const vec = await ai.embed(doc.text)
  await db.upsert('docs', doc.id, vec, { title: doc.title })
}
 
// Search
const query  = await ai.embed('how do I reset my password?')
const results = await db.search('docs', query, 5)
// → [{ id, score, metadata }]
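Search queries tend to repeat, and each `ai.embed()` call costs a few milliseconds, so it can help to cache embeddings by input text. The wrapper below is one possible sketch: `memoizeEmbed` and `EmbedFn` are hypothetical names, and the function takes the embed function as a parameter so it stays independent of veloxkit.

```typescript
type EmbedFn = (text: string) => Promise<number[]>

// Wraps an embed function so each distinct text is embedded only once.
// Caching the promise (not the resolved vector) also de-duplicates
// concurrent calls for the same text.
function memoizeEmbed(embed: EmbedFn): EmbedFn {
  const cache = new Map<string, Promise<number[]>>()
  return (text: string) => {
    let pending = cache.get(text)
    if (!pending) {
      pending = embed(text).catch((err) => {
        cache.delete(text) // do not keep failed attempts in the cache
        throw err
      })
      cache.set(text, pending)
    }
    return pending
  }
}
```

Usage would look like `const embed = memoizeEmbed(ai.embed)` followed by `await embed(query)`. The cache in this sketch is unbounded, so a long-running app would want an eviction policy.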

ai.generate(prompt, opts?)

Generate text using Phi-2 (quantized 4-bit, CPU).

const output = await ai.generate('Summarize this text: ...', {
  maxTokens:   200,   // default 200
  temperature: 0.7,   // default 0.7; higher = more creative
})
console.log(output)

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| maxTokens | number | 200 | Maximum tokens to generate |
| temperature | number | 0.7 | Sampling temperature (0 = deterministic, 1 = very random) |

Returns Promise<string> — full generated text.

Performance — ~10–30 seconds per 200 tokens on CPU. Consider calling during background tasks or on user initiation.

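import { useState } from 'react'
import { ai } from 'veloxkit'
// View, Pressable and Text are assumed to come from your app's UI components.

function GenerateButton({ prompt }: { prompt: string }) {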
  const [output, setOutput]   = useState('')
  const [loading, setLoading] = useState(false)
 
  const handleGenerate = async () => {
    setLoading(true)
    try {
      const result = await ai.generate(prompt, { maxTokens: 150 })
      setOutput(result)
    } finally {
      setLoading(false)
    }
  }
 
  return (
    <View style={{ gap: 12 }}>
      <Pressable onPress={handleGenerate} disabled={loading}
                 style={{ padding: 12, backgroundColor: '#3a3a5e', borderRadius: 8 }}>
        <Text>{loading ? 'Generating…' : 'Generate'}</Text>
      </Pressable>
      {output ? <Text style={{ fontSize: 14, lineHeight: 22 }}>{output}</Text> : null}
    </View>
  )
}
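To set expectations in the UI, the performance figure above (~10–30 seconds per 200 tokens) can be turned into a rough wait estimate. This is only a sketch: `estimateGenerateSeconds` is a hypothetical helper, and it assumes generation time scales linearly with `maxTokens`, which may not hold on every machine.

```typescript
// Hypothetical helper: scales the documented "~10-30 s per 200 tokens"
// figure linearly with maxTokens. Linear scaling is an assumption; real
// timings depend on hardware and prompt length.
function estimateGenerateSeconds(maxTokens: number): { min: number; max: number } {
  return {
    min: (maxTokens * 10) / 200,
    max: (maxTokens * 30) / 200,
  }
}

const eta = estimateGenerateSeconds(150) // { min: 7.5, max: 22.5 }
```

A component like the one above could show "This may take up to about 25 seconds" next to the button when `maxTokens` is 150.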

ai.transcribe(audioPath, opts?)

Transcribe an audio file to text using Whisper-tiny (CPU).

const transcript = await ai.transcribe('/recordings/meeting.wav', {
  language: 'en',   // ISO 639-1 code; empty string = auto-detect (default)
})
console.log(transcript)

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | '' | ISO 639-1 language code (e.g. 'en', 'fr'). Empty = auto-detect |

Returns Promise<string> — plain text transcript.

Supported formats — WAV (16 kHz mono preferred), MP3, FLAC, OGG.

Performance — ~5 seconds for a 30-second clip on CPU.

Pick a file and transcribe

import { ai, dialog } from 'veloxkit'
 
async function pickAndTranscribe() {
  const paths = await dialog.openFile({
    filters: [{ name: 'Audio', extensions: ['wav', 'mp3', 'flac', 'ogg'] }],
  })
  if (!paths || paths.length === 0) return  // dialog cancelled or nothing selected
 
  const text = await ai.transcribe(paths[0], { language: 'en' })
  return text
}
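If paths can also arrive from somewhere other than the file dialog (drag and drop, a saved setting), it may be worth checking the extension against the supported formats before calling `ai.transcribe()`. The helper below is a hypothetical sketch and only inspects the file name, not the actual audio encoding.

```typescript
// Extensions mirror the supported formats listed above.
const SUPPORTED_AUDIO = ['wav', 'mp3', 'flac', 'ogg']

// Hypothetical helper: true when the path ends in a supported audio
// extension (case-insensitive). It does not inspect the file contents.
function isSupportedAudio(path: string): boolean {
  const lastDot = path.lastIndexOf('.')
  if (lastDot < 0) return false
  const ext = path.slice(lastDot + 1).toLowerCase()
  return SUPPORTED_AUDIO.includes(ext)
}
```

A caller could then skip or reject unsupported files up front, for example `if (!isSupportedAudio(path)) return`.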

Best practices

Warm up models at startup

// In your app entry point — downloads happen here, not mid-session
useEffect(() => {
  const warmUp = async () => {
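    // Loads (and on first run downloads) the embedding model in the background.
    // Only ai.embed() is warmed here; errors are deliberately ignored.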
    await ai.embed('warmup').catch(() => {})
  }
  warmUp()
}, [])

Battery-aware generation

// `battery` is assumed to be exported from 'veloxkit' like the other bindings.
import { ai, battery } from 'veloxkit'
 
async function safeGenerate(prompt: string) {
  // The Rust layer logs a warning if on battery but doesn't block generation.
  // Check manually if you want to warn the user:
  const batt = await battery.getStatus()
  if (batt && !batt.charging && batt.level < 0.2) {
    throw new Error('Battery too low for AI inference — please plug in.')
  }
  return ai.generate(prompt)
}
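The battery rule can also be pulled out into a pure function, which makes the threshold easy to adjust and to unit-test. This is a sketch under assumptions: the `BatteryStatus` shape is inferred from the fields used in the snippet above, and `canRunInference` is a hypothetical name.

```typescript
// Assumed shape, inferred from the fields used above (charging, level 0..1).
interface BatteryStatus {
  charging: boolean
  level: number
}

// Hypothetical policy helper: allows inference when there is no battery
// information, when the device is charging, or when the charge is at or
// above the threshold.
function canRunInference(status: BatteryStatus | null, minLevel = 0.2): boolean {
  if (!status) return true
  return status.charging || status.level >= minLevel
}
```

With this in place, the check in `safeGenerate` could become `if (!canRunInference(batt)) throw new Error(...)`.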

All inference runs on CPU using SIMD acceleration. GPU support (CUDA / Metal) is planned for a future release. The ~/.cache/huggingface/ directory is shared with other Hugging Face tools, so models you have already downloaded via Python are not downloaded again.