Google's LiteRT-LM Adds Gemma 4 Support for On-Device Agents
Google has added Gemma 4 support to LiteRT-LM, its open-source inference framework for running large language models on local hardware. The change landed in the project's v0.10.1 release and gives developers a new path to ship multimodal, function-calling AI apps without depending on a cloud endpoint for every request.
What shipped
According to Google's release notes and project docs, LiteRT-LM now supports Gemma 4 across Android, iOS, web, desktop, and IoT targets, including Raspberry Pi. The release also introduces a new CLI, direct Hugging Face imports, automatic conversion for otherwise unsupported models, speculative decoding (sketched below), and a LiteRT-based KV cache.
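Speculative decoding is the release's main latency lever, and the idea is worth unpacking: a small draft model proposes a few tokens cheaply, then the larger target model verifies them, accepting or rejecting each one. The Python sketch below illustrates the general technique with toy stand-in distributions; it is not LiteRT-LM's API, and every name in it is hypothetical.

    import random

    # Tiny toy vocabulary. Everything here is a hypothetical illustration of
    # the speculative decoding technique, not LiteRT-LM's actual internals.
    VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

    def draft_dist(context):
        # Stand-in for a small, fast draft model: cheap but biased.
        weights = [5, 2, 2, 2, 1, 1]
        total = sum(weights)
        return [w / total for w in weights]

    def target_dist(context):
        # Stand-in for the large target model the draft approximates.
        weights = [4, 3, 2, 2, 2, 1]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(dist):
        return random.choices(range(len(dist)), weights=dist, k=1)[0]

    def speculative_step(context, k=4):
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposals = []
        ctx = list(context)
        for _ in range(k):
            q = draft_dist(ctx)
            tok = sample(q)
            proposals.append((tok, q))
            ctx.append(tok)

        # 2. Target model verifies the proposals; in a real system this is
        #    one batched pass. Accept token x with prob min(1, p[x] / q[x]).
        accepted = []
        ctx = list(context)
        for tok, q in proposals:
            p = target_dist(ctx)
            if random.random() < min(1.0, p[tok] / q[tok]):
                accepted.append(tok)
                ctx.append(tok)
            else:
                # On rejection, resample from the residual max(0, p - q),
                # renormalized, and stop the round. (A full implementation
                # also draws one bonus token when all k are accepted.)
                residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
                total = sum(residual)
                accepted.append(sample([r / total for r in residual]))
                break
        return accepted

    random.seed(7)
    tokens = [0]  # start from "the"
    while len(tokens) < 12:
        tokens.extend(speculative_step(tokens))
    print(" ".join(VOCAB[t] for t in tokens[:12]))

The win comes from verification: the target model can score all k drafted tokens in a single batched forward pass, so any nonzero acceptance rate translates directly into fewer expensive passes through the big model.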
Google positions LiteRT-LM as production infrastructure rather than a lab demo. The project README says the stack already powers on-device GenAI features in Chrome, Chromebook Plus, and Pixel Watch, while the companion AI Edge blog post frames Gemma 4 as the model family that brings agentic workflows, tool use, and audio-visual inputs to local apps.
Why it matters
The bigger story is distribution. Many open-weight model launches still assume datacenter hardware, but LiteRT-LM is aimed at phones, laptops, browsers, and embedded devices. That makes Gemma 4 more practical for developers building offline assistants, private mobile workflows, or edge tools that need low latency.
Community demos started appearing almost immediately. One video posted after the release shows Gemma 4 classifying iPhone photos locally through LiteRT-LM, a useful signal that Google's edge push is moving beyond benchmark charts and into working apps.