Protobuf for AI Infrastructure: Building the MCP Layer

Protobuf powers AI systems at scale—ML pipelines, feature stores, model serving. Yet the tooling ecosystem is fragmented and CLI-bound. We built an MCP server that works across the entire protobuf ecosystem.

What We Do at DevExp.ai

At DevExp.ai, we build MCP servers for developer tools.

MCP—Model Context Protocol—is an open standard created by Anthropic that lets AI assistants connect to external tools. Instead of AI just giving advice ("you should run buf lint"), MCP lets AI actually call the tool, get results, and iterate.

Think of it as the difference between a colleague who gives suggestions and one who can actually do the work.

The protocol has gained serious traction. OpenAI, Google, and Microsoft have all adopted it. There are now thousands of MCP servers for everything from databases to CI/CD systems. It's becoming the standard way AI agents interact with developer tooling.
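Under the hood, MCP is JSON-RPC 2.0: the assistant sends a `tools/call` request and the server replies with structured content the assistant can read and act on. A minimal sketch of that exchange (the tool name and arguments below are illustrative, not any real server's API):

```python
import json

# An MCP client invokes a tool with a JSON-RPC 2.0 request.
# "lint_schema" and its arguments are hypothetical examples.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lint_schema",
        "arguments": {"proto": 'syntax = "proto3"; message User { string id = 1; }'},
    },
}

# The server runs the tool and returns results the assistant can
# inspect and iterate on -- closing the advise/verify loop.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "0 lint errors"}]},
}

print(json.dumps(response["result"], indent=2))
```

This request/result shape is what turns "you should run buf lint" into the AI actually running it.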

Our focus is developer experience and product-led growth. We help teams make their CLI tools accessible through AI assistants—which changes how developers discover and adopt those tools.

How We Crossed Paths with Protobuf

One of our recent projects led us to protobuf.ai.

Protocol Buffers (protobuf) is Google's language-neutral format for serializing structured data. It's been around since 2008. If you've used gRPC, Kubernetes, or most large-scale microservices architectures, you've touched protobuf—even if you didn't realize it.

What surprised us: protobuf plays a much bigger role in AI infrastructure than most people recognize.

Protobuf Is Quietly Powering AI Systems

When people talk about AI infrastructure, they mention GPUs, model architectures, training frameworks. They rarely mention serialization formats.

But look at what's underneath most AI systems at scale:

ML Pipelines

Training data needs to move efficiently between storage, preprocessing, and training. At scale, that's often protobuf—5-10x smaller than JSON, faster to parse.
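The size difference is easy to see by hand-encoding one field. This toy sketch implements protobuf's varint wire encoding with no external libraries and compares it to the JSON equivalent (a single-field message exaggerates the ratio, but the direction holds at scale):

```python
import json

def encode_varint(value: int) -> bytes:
    # Protobuf varints pack 7 bits per byte; the high bit marks continuation.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    # Key byte = (field_number << 3) | wire_type; wire type 0 is varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

# message Event { int32 user_id = 1; }  with user_id = 150
proto_bytes = encode_int_field(1, 150)              # b'\x08\x96\x01' -- 3 bytes
json_bytes = json.dumps({"user_id": 150}).encode()  # 16 bytes

print(len(proto_bytes), "vs", len(json_bytes))
```

No field names on the wire, binary integers instead of decimal strings: that is where the savings come from, multiplied across billions of records.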

Feature Stores

Features fed into ML models need consistent schemas. Change a feature definition without coordination, and model predictions break. Protobuf schemas enforce those contracts.

Model Serving

Low-latency inference APIs often use gRPC, which is built on protobuf. When milliseconds matter, JSON parsing overhead adds up.

Event Streaming

Kafka, Redpanda, Pulsar—event-driven architectures frequently use protobuf for message encoding. Schema registries manage compatibility.

Data Contracts

As organizations adopt data mesh patterns, teams need contracts between data producers and consumers. Protobuf schemas serve as those contracts.

Protobuf is connective tissue. AI systems depend on it even when "AI" isn't in the name.

The Gap: No AI-Native Tooling

The protobuf ecosystem is mature. Buf has built excellent CLI tools and a schema registry. Protoc has been reliable for years. gRPC is battle-tested.

But all of it is CLI-first. Manual. Disconnected from how developers increasingly work.

And the tooling is fragmented. You've got Buf, protoc, gRPC, Connect-RPC, Twirp, ScalaPB—each with its own CLI, config files, and conventions.

Here's the gap we noticed:

What AI assistants can do → What they couldn't do:

  • Explain protobuf syntax → Validate a schema against real linters
  • Suggest schema designs → Check if a change breaks wire compatibility
  • Describe best practices → Generate working SDKs
  • Answer documentation questions → Enforce org-specific conventions

Without tool access, AI can only advise. It can't verify. It can't act.

That's a problem when you're trying to move fast. You get a schema suggestion from AI, then have to context-switch to a terminal, run buf lint, interpret the output, go back to the AI, iterate. The feedback loop is broken.

What We Built: The Switzerland of Protocol Buffers

We built protobuf.ai as a platform-agnostic MCP server—one interface that works across the entire protobuf ecosystem.

Supported Platforms

  • Protoc: the standard protobuf compiler
  • Buf: modern build system, linting, breaking change detection
  • gRPC: high-performance RPC framework
  • Connect-RPC: browser-friendly RPC
  • Twirp: simple RPC framework with JSON support
  • ScalaPB: Scala protobuf support

Instead of learning six different CLIs, you describe what you want. The MCP server figures out which tools to call.

"Create a user service with authentication, make sure it works with both gRPC and Connect"
→ AI generates schema, validates against both platforms, reports compatibility
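The dispatch idea reduces to a uniform adapter interface per platform. A rough sketch of that shape (the adapter names and routing rule here are our illustration, not the server's actual implementation):

```python
# Hypothetical adapter registry: every platform exposes the same validate() shape,
# so one request can fan out across the ecosystem.
def validate_with_grpc(schema: str) -> dict:
    return {"platform": "grpc", "ok": "service" in schema}

def validate_with_connect(schema: str) -> dict:
    return {"platform": "connect-rpc", "ok": "service" in schema}

ADAPTERS = {"grpc": validate_with_grpc, "connect-rpc": validate_with_connect}

def check_compatibility(schema: str, platforms: list[str]) -> dict:
    # The server picks the right adapters from the request; the caller
    # never touches six different CLIs.
    return {p: ADAPTERS[p](schema)["ok"] for p in platforms}

schema = (
    'syntax = "proto3"; '
    "service UserService { rpc Login (LoginRequest) returns (LoginResponse); } "
    "message LoginRequest { string email = 1; } "
    "message LoginResponse { string token = 1; }"
)
print(check_compatibility(schema, ["grpc", "connect-rpc"]))
```

Real adapters would shell out to the actual toolchains; the point is the single entry point in front of them.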

Core Capabilities

1. Natural Language Schema Generation

Describe your service in plain English, get production-ready proto3.

"User management system with email validation and role-based access"
→ Complete message definitions, service methods, validation rules

The system extracts requirements, generates proper package structure, and applies naming conventions automatically.

2. Cross-Platform Compatibility Analysis

Before you commit to a schema, know if it works everywhere you need it.

"Will this schema work with gRPC, Connect, and Twirp?"
→ Compatibility report with per-platform scores, unsupported features flagged

The analyzer checks streaming support, null handling patterns, authentication methods—everything that varies between platforms.

3. Breaking Change Prediction

Not just detection—prediction. Know what's likely to break before you change it.

  • Identifies fields likely to evolve
  • Flags potential field number conflicts
  • Estimates impact severity
  • Suggests migration strategies

"I want to change user_id from int32 to string"
→ "This breaks wire compatibility. 3 downstream services affected.
   Here's a migration plan with dual-support period."
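The int32 → string case is breaking at the byte level: int32 uses wire type 0 (varint) while string uses wire type 2 (length-delimited), so a reader built against the old schema misreads the key byte. A stdlib-only illustration:

```python
# Field 1 as int32 (wire type 0, varint): key byte 0x08
old_encoding = bytes([0x08, 42])           # user_id = 42

# Field 1 as string (wire type 2, length-delimited): key byte 0x0A
new_encoding = bytes([0x0A, 2]) + b"42"    # user_id = "42"

def wire_type(key_byte: int) -> int:
    # The low 3 bits of the key byte carry the wire type.
    return key_byte & 0x07

# A decoder expecting wire type 0 on field 1 receives wire type 2 instead,
# so it fails or silently skips the field -- a wire compatibility break.
print(wire_type(old_encoding[0]), "->", wire_type(new_encoding[0]))
```

Hence the dual-support migration: add a new field number for the string form, populate both for a deprecation window, then retire the old one.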

4. Pattern Learning

The system learns from every schema it processes.

  • Extracts patterns from your organization's schemas
  • Suggests consistent naming based on what you've done before
  • Predicts performance characteristics (serialization time, memory, bandwidth)
  • Recommends templates for common domains (auth, payments, messaging)

This creates compounding value—the tool gets smarter as you use it.

5. Migration Planning

Moving from gRPC to Connect? Twirp to Buf? The system generates a four-phase migration plan:

  1. Compatibility analysis
  2. Schema adaptation
  3. Code generation
  4. Validation and testing

Includes risk assessment and rollback procedures.

The Registry: Semantic Schema Management

Alongside the MCP server, we built registry.protobuf.ai—a schema registry designed for AI-native workflows.

Vector-Based Semantic Search

Find schemas by describing what you need, not memorizing names.

"Find schemas related to payment processing"
→ Returns semantically similar schemas across your organization

Built on pgvector with OpenAI embeddings. The registry understands what your schemas mean, not just what they're called.
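Mechanically, semantic search is nearest-neighbor ranking over embedding vectors. A toy stdlib sketch with made-up 3-dimensional vectors (a real deployment ranks much higher-dimensional embeddings with a pgvector index instead of a Python sort):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for real model output.
schemas = {
    "payments.v1.Charge": [0.9, 0.1, 0.0],
    "billing.v1.Invoice": [0.7, 0.3, 0.2],
    "chat.v1.Message":    [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "payment processing"

ranked = sorted(schemas, key=lambda n: cosine_similarity(query, schemas[n]), reverse=True)
print(ranked)  # payments first, chat last
```

Because similarity is computed on meaning-bearing vectors rather than names, a query about "payment processing" surfaces `Charge` and `Invoice` even though neither contains the word "payment processing".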

Breaking Change Detection with Impact Analysis

When you push a schema change, the registry:

  • Identifies breaking changes
  • Traces downstream dependencies
  • Shows which services are affected
  • Generates evolution plans with deprecation strategies

Auto-Generated SDKs

Push a schema, get client libraries automatically published:

  • TypeScript → NPM
  • Python → PyPI
  • Go → GitHub releases
  • Java → Maven
  • Rust → Crates.io

GitHub Actions handles the CI/CD. You push schemas, clients get updated packages.

Confluent Schema Registry Compatibility

Drop-in replacement for existing Confluent deployments. Same API, more intelligence. Teams can migrate without rewriting clients.
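Compatibility here means existing clients keep talking to the same REST surface. A sketch of the standard register call a Confluent client issues (endpoint and body shape follow the Confluent Schema Registry API; the host and schema are placeholders):

```python
import json

# Confluent Schema Registry clients register a schema version by POSTing to
# /subjects/{subject}/versions. A compatible registry accepts the same call.
subject = "payments-value"
url = f"https://registry.example.com/subjects/{subject}/versions"  # placeholder host

body = json.dumps({
    "schemaType": "PROTOBUF",
    "schema": 'syntax = "proto3"; message Charge { string id = 1; int64 amount_cents = 2; }',
})

# A real client sends: POST {url} with Content-Type
# application/vnd.schemaregistry.v1+json and this body, and gets back
# a JSON object containing the registered schema's global id.
print("POST", url)
```

Because the request and response shapes match, pointing an existing Kafka client at the new registry is a configuration change, not a rewrite.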

Multi-Format Support

Not just protobuf—the registry handles Avro, JSON Schema, Thrift, and GraphQL. One registry for all your schema needs.

Distributed Validation Agents

For teams that need validation at scale, we built a distributed agent network using LangGraph.

Schema Analyzer Agent

  • AST-level schema analysis
  • Breaking change detection by comparing old/new versions
  • Protocol compatibility across gRPC, Connect, gRPC-Web, REST
  • AI-powered recommendations

Performance Tester Agent

  • Multi-region load testing (US, EU, Asia)
  • P95/P99 latency measurement
  • Throughput and error rate tracking
  • Threshold validation against your SLOs

Conformance Validator Agent

  • Runtime validation against protovalidate constraints
  • Service contract verification
  • Message format validation
  • Constraint-driven test generation

LangGraph Orchestration

The agents coordinate through a state machine:

Schema Analysis → Performance Testing → Conformance Validation → Results

Conditional routing means if schema analysis fails, you don't waste time on performance tests. Results aggregate across regions for unified metrics.
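That routing logic reduces to a small state machine: each stage writes into shared state, and a condition decides whether to continue or short-circuit. A plain-Python sketch of the shape (the real agents use LangGraph; the stage names mirror the pipeline above, and the checks are stand-ins):

```python
def schema_analysis(state):
    # Stand-in check; the real agent does AST-level analysis.
    state["schema_ok"] = "message" in state["schema"]
    return state

def performance_testing(state):
    state["p99_ms"] = 12  # placeholder metric from a hypothetical load test
    return state

def conformance_validation(state):
    state["conformant"] = True
    return state

def run_pipeline(state):
    state = schema_analysis(state)
    if not state["schema_ok"]:
        # Conditional routing: a failed analysis skips the expensive stages.
        state["status"] = "failed_analysis"
        return state
    state = performance_testing(state)
    state = conformance_validation(state)
    state["status"] = "complete"
    return state

print(run_pipeline({"schema": 'syntax = "proto3"; message Ping {}'})["status"])
print(run_pipeline({"schema": "not a schema"})["status"])
```

LangGraph adds what this sketch lacks: persistent state, parallel fan-out across regions, and declarative conditional edges instead of an if statement.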

Geographic Testing

APIs behave differently in different regions. The performance tester runs simultaneously from multiple locations, so you see real-world latency patterns—not just local performance.

Why This Matters for AI Infrastructure

AI systems are scaling fast. The data infrastructure underneath them needs to keep up.

Schema management becomes critical when you have:

  • Dozens of ML pipelines with different data contracts
  • Feature stores that need consistent definitions
  • Model serving endpoints across regions
  • Event-driven architectures with strict compatibility requirements

Manual schema management doesn't scale. Teams need:

  • Automated validation in the development workflow
  • Breaking change detection before deployment
  • Cross-platform compatibility without manual testing
  • Governance that doesn't slow down development

That's what we built.

The Workflow Shift

Before:

  1. Learn protobuf syntax
  2. Pick a platform (Buf? gRPC? Connect?)
  3. Install toolchains
  4. Write schemas manually
  5. Run linters, fix errors, repeat
  6. Test compatibility by deploying and hoping
  7. Generate SDKs for each language manually

After:

  1. Describe what you need
  2. AI generates, validates across platforms, iterates
  3. Push to registry
  4. SDKs auto-publish
  5. Agents validate performance and conformance

Same underlying tools. Different interface. The complexity is still there—you just don't have to manage it.

Technical Architecture

MCP Server (protobuf.ai)

  • 30+ MCP tools
  • Platform adapters for Protoc, Buf, gRPC, Connect-RPC, Twirp, ScalaPB
  • Stateless deployment on Cloudflare Workers
  • OpenAI/Anthropic for natural language processing

Registry (registry.protobuf.ai)

  • PostgreSQL + pgvector on Supabase
  • 384-dimensional embeddings via OpenAI
  • GitHub integration for SDK automation
  • Confluent-compatible API layer

Agents (protobuf-ai-agents)

  • LangGraph for orchestration
  • Express API + A2A protocol for agent communication
  • Multi-region deployment for geographic testing

Make Your Developer Tools Agent-Accessible

We help companies build MCP servers that let AI agents use their products. Whether you're building for protobuf, databases, or any developer tool—the pattern is the same.

Schedule a Demo

What's Next

  • Impact analysis: "Which services break if I change this?"
  • Migration automation: Execute migration plans, not just generate them
  • More language SDKs: Kotlin, Swift, PHP
  • IDE integration: Schema validation in VS Code, JetBrains


About DevExp.ai

We build AI-native interfaces for developer tools. If your platform has a CLI, it should probably have an MCP server. Get in touch to talk about what that looks like for your tooling.