Google Adds Flex and Priority Service Tiers to the Gemini API
Google has introduced Flex and Priority service tiers for the Gemini API, giving developers a more explicit way to split background agent workloads from user-facing traffic without moving to a separate async architecture.
What changed
According to Google's announcement, Flex is the lower-cost option for latency-tolerant work such as background research, enrichment, or longer-running agent tasks. Google's Flex docs say the tier offers a 50% cost reduction compared with standard rates, in exchange for variable latency and best-effort availability. Notably, Flex requests still use the normal synchronous API flow rather than the Batch API's file-based job handling.
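To make the 50% figure concrete, here is a minimal arithmetic sketch in Python. Only the 50% discount comes from Google's Flex docs as cited above; the function name, the constant name, and the monthly spend figure are hypothetical and chosen purely for illustration.

```python
from decimal import Decimal

# 50% reduction versus standard rates, as stated in Google's Flex docs.
FLEX_DISCOUNT = Decimal("0.50")


def flex_cost(standard_cost: Decimal) -> Decimal:
    """Return what a job priced at `standard_cost` would cost on Flex.

    Hypothetical helper: it only applies the published discount and
    rounds to two decimal places.
    """
    discounted = standard_cost * (Decimal("1") - FLEX_DISCOUNT)
    return discounted.quantize(Decimal("0.01"))


# Illustrative number only: an assumed monthly bill for background
# agent work at standard rates.
standard_monthly = Decimal("1200.00")
flex_monthly = flex_cost(standard_monthly)
savings = standard_monthly - flex_monthly

print(f"standard={standard_monthly} flex={flex_monthly} savings={savings}")
```

The saving applies only to the share of traffic that can tolerate variable latency and best-effort availability, so the real-world effect depends on how much of a workload is actually moved to Flex.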
Priority is the opposite side of that tradeoff. Google says it is meant for interactive workloads that need stronger reliability during peak demand. The Priority docs say the tier is available to Tier 2 and Tier 3 paid users across the GenerateContent and Interactions APIs, and that overflow traffic is gracefully downgraded to Standard processing rather than failing outright.
Why it matters
Together, the two tiers make this more than a pricing tweak. Agent builders increasingly mix cheap background reasoning with time-sensitive chat or tool calls, and Google's new service_tier controls let them route both through the same API surface. The conservative read is that developers still need to test real latency and capacity behavior in production, but the product shape is clear: Google is trying to make Gemini easier to use as infrastructure for multi-speed agent systems.
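The routing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google's SDK: the service_tier name comes from the announcement, but the tier strings ("flex", "priority"), the payload shape, the placeholder model name, and the build_request helper are all hypothetical. Check the official Gemini API reference for the real field values before relying on any of it.

```python
from enum import Enum


class Workload(Enum):
    BACKGROUND = "background"    # latency-tolerant: research, enrichment, long agent tasks
    INTERACTIVE = "interactive"  # user-facing chat or time-sensitive tool calls


# ASSUMPTION: the exact tier strings are not given in the article.
TIER_FOR_WORKLOAD = {
    Workload.BACKGROUND: "flex",
    Workload.INTERACTIVE: "priority",
}


def build_request(prompt: str, workload: Workload, model: str = "example-model") -> dict:
    """Build a hypothetical request payload tagged with a service tier.

    The dict layout is an assumption for illustration; only the
    `service_tier` name itself is taken from the announcement.
    """
    return {
        "model": model,  # placeholder, not a real model ID
        "contents": prompt,
        "service_tier": TIER_FOR_WORKLOAD[workload],
    }


# Both kinds of traffic go through the same request builder.
background = build_request("Summarize these 40 filings.", Workload.BACKGROUND)
interactive = build_request("What is my order status?", Workload.INTERACTIVE)
print(background["service_tier"], interactive["service_tier"])
```

Keeping the tier choice in one lookup table means the decision about which traffic is latency-tolerant lives in a single place, which makes it easier to adjust once real latency and capacity measurements are available.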