Vision Lab
Point your camera, ask a question. A 1.6 GB vision-language model (Moondream2, with a SmolVLM-500M fallback) runs entirely on your device via WebGPU. Frames never leave the browser.
One-time setup
- Downloads the model from HuggingFace's CDN. This happens only once.
- Cached in your browser's persistent storage (Origin Private File System).
- Later visits load from the cache in about 3 seconds, with no network access needed.
- Frames are processed on-device. The only thing that leaves your browser is the model's text answer, and only when you press "Find compatible skills".
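The cache-or-download flow described in the list above might be sketched like this. It is a minimal TypeScript sketch, not this page's actual code. The function name `loadWithCache`, the `fetchBytes` callback, and the small handle interfaces are hypothetical. They are chosen so the logic can also run against an in-memory stand-in. In a browser, `root` would be the OPFS root returned by `navigator.storage.getDirectory()`. Browser support for `createWritable()` on the main thread has varied, so treat that part as an assumption.

```typescript
// Minimal structural types for the subset of the OPFS API used here (hypothetical names).
// In a browser, pass `await navigator.storage.getDirectory()` as `root`.
interface WritableLike {
  write(data: ArrayBuffer): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<{ size: number; arrayBuffer(): Promise<ArrayBuffer> }>;
  createWritable(): Promise<WritableLike>;
}
interface DirHandleLike {
  getFileHandle(name: string, opts?: { create?: boolean }): Promise<FileHandleLike>;
}

// Return cached bytes for `name` if a non-empty copy exists in OPFS;
// otherwise download once with `fetchBytes`, persist the result, and return it.
async function loadWithCache(
  root: DirHandleLike,
  name: string,
  fetchBytes: () => Promise<ArrayBuffer>,
): Promise<{ bytes: ArrayBuffer; fromCache: boolean }> {
  try {
    const cached = await root.getFileHandle(name); // rejects if the file does not exist
    const file = await cached.getFile();
    if (file.size > 0) {
      return { bytes: await file.arrayBuffer(), fromCache: true };
    }
  } catch {
    // Not found (or unreadable): treat as a cache miss and fall through to the network.
  }
  const bytes = await fetchBytes();
  const handle = await root.getFileHandle(name, { create: true });
  const writable = await handle.createWritable();
  await writable.write(bytes);
  await writable.close(); // data is committed to the file on close
  return { bytes, fromCache: false };
}
```

On a first visit this returns `fromCache: false` after one download; on later visits it returns `fromCache: true` without touching the network. A production loader would also need to guard against a half-written file surviving an interrupted download. This sketch only treats an empty file as a miss.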
Checking your browser…
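Assuming the browser check is a WebGPU capability probe (the page says the model runs via WebGPU), it could look like the minimal TypeScript sketch below. `supportsWebGPU` and `NavigatorLike` are hypothetical names. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point, and it resolves to `null` when no suitable GPU adapter is available.

```typescript
// Minimal structural type (hypothetical); in a browser, pass the global `navigator`.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// True only if the WebGPU API is exposed AND an adapter can actually be obtained.
async function supportsWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // API not exposed: older browser, or WebGPU disabled
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter != null; // null: API present but no usable GPU
  } catch {
    return false; // treat any unexpected failure as "not supported"
  }
}
```

Checking for an adapter, not just for `navigator.gpu`, matters because a browser can expose the API while still having no usable GPU.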
Bandwidth tip: the download is large. On cellular it can eat a big chunk of your data plan, so prefer Wi-Fi.
Loading model…
Connecting to HuggingFace CDN…
First load takes 1–3 minutes. Future visits skip this entirely.
Frame frozen: release to resume