Google is expanding LiteRT-LM, its open-source runtime for running large language models on edge devices, with new integration paths for iOS and browser apps.

The practical update is in the project's v0.12 line. Google's LiteRT-LM README lists early preview Swift APIs for native iOS and macOS apps, early preview Web JavaScript APIs for browser-based inference, community Flutter support, and a command-line update that adds NPU support alongside existing CPU and GPU backends.

Google's developer post frames the release as a move from Android-centered edge inference toward a broader cross-platform stack. It says LiteRT-LM can run Gemma 4 workloads across mobile, desktop, web, and embedded targets, while handling runtime details such as memory management, hardware acceleration, and model-specific orchestration.

The browser piece is notable because it brings the runtime to WebGPU-backed, client-side execution. Google says its web demo ran on a MacBook Pro and that developers can see decode speeds of up to 76 tokens per second in that setup. The same post claims up to a 2.2x speedup from Multi-Token Prediction (MTP) support with Gemma 4. For Gemma 4 E2B without MTP enabled, it cites 52 tokens per second on Android GPU and 56 tokens per second on iOS Metal.
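
To put those throughput figures in perspective, decode time scales roughly as output tokens divided by tokens per second. The sketch below is illustrative arithmetic only. The 256-token reply length and the helper name `decodeSeconds` are assumptions made for this example, not values from Google's post. It ignores prefill and time to first token, and the quoted rates are "up to" numbers. The 2.2x MTP figure is left out because it is not tied here to a specific baseline.

```typescript
// Illustrative arithmetic only: converts the quoted decode rates into rough
// generation times. Ignores prefill and time to first token.

// Decode rates quoted in Google's post (tokens per second).
const quotedRates: Record<string, number> = {
  "Web demo (MacBook Pro, WebGPU)": 76,
  "Android GPU (Gemma 4 E2B, no MTP)": 52,
  "iOS Metal (Gemma 4 E2B, no MTP)": 56,
};

// Hypothetical helper: seconds needed to decode `tokens` at `tokensPerSecond`.
function decodeSeconds(tokens: number, tokensPerSecond: number): number {
  if (tokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError("tokens must be >= 0 and tokensPerSecond > 0");
  }
  return tokens / tokensPerSecond;
}

// Assumed reply length, chosen only for illustration.
const replyTokens = 256;

for (const [setup, rate] of Object.entries(quotedRates)) {
  const seconds = decodeSeconds(replyTokens, rate).toFixed(1);
  console.log(`${setup}: ~${seconds} s for ${replyTokens} tokens`);
}
```

Under those assumptions, a 256-token reply takes roughly 3.4 seconds in the web demo setup, 4.9 seconds on Android GPU, and 4.6 seconds on iOS Metal.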

The conservative read is that this is still developer infrastructure, not a finished consumer feature. But it gives app teams more ways to test private, low-latency AI features locally instead of sending every prompt to a server.
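
One way an app team might act on this is to prefer on-device inference where the browser exposes WebGPU and fall back to a server otherwise. The sketch below is a minimal illustration under stated assumptions: `chooseInferencePath` and the `"local"` / `"server"` labels are hypothetical names, and the code does not call any LiteRT-LM API. It checks only for the standard `navigator.gpu` entry point. A fuller check would also await `navigator.gpu.requestAdapter()`, which can resolve to `null` when no suitable adapter is available.

```typescript
// Minimal shape of the part of `navigator` this check needs, so the sketch
// type-checks without WebGPU type definitions installed.
interface NavigatorLike {
  gpu?: unknown;
}

type InferencePath = "local" | "server";

// Hypothetical helper: prefer local inference when the WebGPU entry point exists.
function chooseInferencePath(nav: NavigatorLike | undefined): InferencePath {
  return nav !== undefined && nav.gpu != null ? "local" : "server";
}

// Example calls with stand-in objects instead of a real browser navigator.
console.log(chooseInferencePath({ gpu: {} })); // WebGPU entry point present
console.log(chooseInferencePath(undefined));   // no navigator at all
```

In a browser, the global `navigator` object would be passed in place of the stand-ins. The two example calls print `local` and `server`.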