Transforming AI Tooling: Deploying Remote MCP Servers with Google Cloud Run
As Large Language Models (LLMs) become more central to modern software systems, the need to supply them with relevant and reliable context has grown urgent. Developers often face challenges integrating external data and tools into LLM workflows, especially when trying to maintain deterministic behaviors or replicate precise calculations. Recognizing this friction, Anthropic introduced the Model Context Protocol (MCP)—a standardized way to define and serve tool interfaces for LLMs. MCP bridges models with external context using structured, type-safe requests, enabling everything from math operations to full API integrations.
Initially, MCP was mostly used locally via stdio, limiting broader adoption. But now, thanks to Google Cloud Run’s support for remote MCP servers using the streamable-http transport, a whole new architecture is unlocked. Developers can deploy these servers quickly and securely, removing the need for local hosting while taking advantage of Google’s scalable infrastructure. This evolution redefines how teams collaborate on AI tools, enabling shared endpoints with enterprise-grade security, scalability, and speed.
From Local to Global: Evolving MCP with Streamable HTTP
The Model Context Protocol works on a client-server model. At first, it supported only local interaction using stdio—ideal for simple development but impractical for distributed teams or production systems. Over time, two new transport options were introduced: Server-Sent Events (SSE) and streamable-http. While SSE remains available for legacy compatibility, the newer streamable-http transport has emerged as the recommended choice due to its robustness and its use of standard HTTP POST and GET methods.
Using streamable-http, an MCP server runs as an independent process with a single HTTP endpoint—say, https://yourdomain.com/mcp—capable of handling multiple simultaneous client connections. This allows remote clients, including AI agents or developer scripts, to query and interact with the server without running their own local versions. And by pairing this with Cloud Run’s scalable infrastructure, developers can serve thousands of concurrent tool requests with automatic load management and security controls.
Google’s documentation now includes a hands-on guide showing how to get such a server running on Cloud Run in under 10 minutes. It’s a major step forward for teams working on AI-powered applications that need to integrate deterministic logic, proprietary APIs, or shared tooling in a controlled and maintainable way.
A Math Server Example: FastMCP on Cloud Run in Minutes
To demonstrate how accessible it is to get started, Google offers a minimal example: a simple math MCP server that provides two tools—add and subtract. The server is built using FastMCP, a Python package designed to make implementing MCP endpoints quick and clean.
The process begins by creating a project directory and using uv, a high-performance Python project manager, to scaffold the project. After installing FastMCP as a dependency, developers define tool functions in a server.py script. These tools are annotated with @mcp.tool() decorators, making them discoverable by any MCP client. The server is then set to run on 0.0.0.0 using the streamable-http transport, which is required for Cloud Run compatibility, as shown in the sketch below.
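A minimal server.py in this style might look like the following sketch. The exact FastMCP API surface varies slightly across versions, so treat the host and port arguments to mcp.run() as assumptions based on the tutorial’s description rather than a verbatim copy:

```python
# server.py -- a minimal math MCP server, sketched after the tutorial's description
import os

from fastmcp import FastMCP

mcp = FastMCP("MCP Server on Cloud Run")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

@mcp.tool()
def subtract(a: int, b: int) -> int:
    """Subtract b from a and return the difference."""
    return a - b

if __name__ == "__main__":
    # Cloud Run injects the PORT environment variable; binding to 0.0.0.0
    # makes the server reachable from outside the container.
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=int(os.environ.get("PORT", 8080)),
    )
```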
Dockerizing the project is straightforward. A lightweight Python image forms the base, uv is used to install dependencies and run the server, and the container is set up for immediate deployment. Google Cloud Run supports two deployment paths: directly from source, or via a container image hosted on Artifact Registry. Both methods include the --no-allow-unauthenticated flag, a critical setting that enforces authentication for any incoming request.
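A containerization setup along these lines could look like the sketch below. The base image tag and service name are illustrative assumptions, and the gcloud command follows the deploy-from-source path:

```dockerfile
# Dockerfile -- illustrative sketch; image tags and paths are assumptions
FROM python:3.13-slim

# Copy the uv binaries from the official uv image for fast installs
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app
COPY . .

# Install project dependencies into a local virtual environment
RUN uv sync

EXPOSE 8080

# Run the MCP server through uv so it uses the project environment
CMD ["uv", "run", "server.py"]
```

```bash
# Deploy from source with authentication enforced
# (the service name and region are placeholders)
gcloud run deploy mcp-server --source . --region us-central1 --no-allow-unauthenticated
```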
By following this workflow, developers can deploy a server in minutes and avoid the complexity of provisioning and maintaining traditional infrastructure.
Secure, Scalable, and Shareable: Benefits of Remote MCP on Cloud Run
There are compelling reasons to adopt remote MCP servers hosted on Google Cloud Run. First and foremost is scalability. Because Cloud Run automatically adjusts resource allocation based on demand, the server can handle a burst of requests or idle quietly during off-hours, eliminating overprovisioning. This pay-as-you-go model is perfect for the unpredictable nature of LLM usage patterns.
Second is the centralization benefit. Teams often struggle with version drift or inconsistent behavior when everyone runs their own local tooling. A shared remote MCP server guarantees a single source of truth, reducing bugs and easing collaboration. IAM permissions allow fine-grained access control, so only authorized developers or systems can query or invoke tools.
Security is also baked in. With --no-allow-unauthenticated in place, only users or agents granted the roles/run.invoker role can access the server. This prevents abuse and ensures all tool invocations are traceable and legitimate. Cloud Run proxy tools even allow developers to run authenticated tunnels from their local machines, which is ideal for testing or hybrid workflows.
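In practice, granting access and opening a local tunnel might look like the following sketch; the service name, region, and account are placeholders:

```bash
# Grant a teammate permission to invoke the service
gcloud run services add-iam-policy-binding mcp-server \
    --region us-central1 \
    --member="user:dev@example.com" \
    --role="roles/run.invoker"

# Open an authenticated tunnel so local clients can reach the remote server
# at http://127.0.0.1:8080 without handling identity tokens themselves
gcloud run services proxy mcp-server --region us-central1 --port 8080
```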
Combined, these features turn what was once a local development protocol into a full-fledged API backend—maintainable, secure, and running on production-grade infrastructure.
Test and Extend: Building a Foundation for Agentic Applications
With your MCP server deployed, the next logical step is testing. Using FastMCP’s client interface, developers can connect to the server—typically via http://127.0.0.1:8080/mcp when proxied—and invoke tools like add or subtract. The output confirms the round-trip flow: the server advertises its tools, receives structured calls, and returns structured results. This example, while simple, illustrates the core value of MCP: deterministic, predictable extensions to the reasoning capabilities of language models.
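A test script in this spirit might look like the sketch below, assuming the Cloud Run proxy is forwarding the service to localhost port 8080; exact client return types vary by FastMCP version:

```python
# test_server.py -- a sketch of exercising the deployed server through the proxy
import asyncio

from fastmcp import Client

async def main():
    # Connects via the authenticated local tunnel opened by
    # `gcloud run services proxy`
    async with Client("http://127.0.0.1:8080/mcp") as client:
        tools = await client.list_tools()
        print("Available tools:", [tool.name for tool in tools])

        result = await client.call_tool("add", {"a": 2, "b": 3})
        print("add(2, 3) ->", result)

        result = await client.call_tool("subtract", {"a": 10, "b": 4})
        print("subtract(10, 4) ->", result)

asyncio.run(main())
```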
As you expand your tooling, you might integrate APIs that connect to CRMs, databases, external file storage, or even proprietary business logic. Each function can be exposed via MCP, letting your LLMs interact with them safely and consistently. The FastMCP library supports complex data types, tool chaining, and asynchronous calls, all of which lend themselves well to more advanced AI agent behavior.
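As a purely hypothetical illustration of those capabilities, a richer tool might return a structured record from an async lookup; the Customer type and lookup_customer function below are invented for the example:

```python
# A hypothetical richer tool: async execution with a structured return type
from dataclasses import dataclass

from fastmcp import FastMCP

mcp = FastMCP("crm-tools")

@dataclass
class Customer:
    id: str
    name: str
    balance: float

@mcp.tool()
async def lookup_customer(customer_id: str) -> Customer:
    """Fetch a customer record; a real server would await a CRM or database call."""
    # Stubbed data stands in for an external API response.
    return Customer(id=customer_id, name="Ada Lovelace", balance=42.0)
```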
Cloud Run makes it easy to version, test, and redeploy these servers without downtime. Infrastructure-as-code tools like Terraform or CI/CD pipelines can automate the full lifecycle—from development to deployment—allowing teams to iterate rapidly without compromising stability.
In many ways, this approach mirrors the architecture of modern web services. But here, instead of building for human users, you’re creating structured interfaces for intelligent systems—systems that can introspect available tools and use them dynamically.
Conclusion: A Smarter, Simpler Way to Extend LLMs
Jack Wotherspoon, a Developer Advocate at Google, wrote in his June 2025 guide that developers could get a functional MCP server up and running on Google Cloud Run in less than ten minutes. That’s not just a promise of speed—it’s a vision of a smarter development paradigm. One where context, tooling, and computation live outside the model but remain seamlessly accessible through standard, discoverable protocols.
Whether you’re prototyping a personal project or building enterprise-grade AI systems, deploying your MCP servers on Cloud Run delivers the scalability, security, and ease of use needed to take ideas into production. The combination of FastMCP and Cloud Run ensures developers spend more time building the tools their agents need and less time managing servers or reinventing connection logic.
To explore the full tutorial and try it yourself, visit Google’s guide here:
👉 Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes