Protobuf for AI Infrastructure: Building the MCP Layer

Protobuf powers AI systems at scale—ML pipelines, feature stores, model serving. Yet the tooling ecosystem is fragmented and CLI-bound. We built an MCP server that works across the entire protobuf ecosystem.

What We Do at DevExp.ai

At DevExp.ai, we build MCP servers for developer tools.

MCP—Model Context Protocol—is an open standard created by Anthropic that lets AI assistants connect to external tools. Instead of AI just giving advice ("you should run buf lint"), MCP lets AI actually call the tool, get results, and iterate.

Think of it as the difference between a colleague who gives suggestions and one who can actually do the work.

The protocol has gained serious traction. OpenAI, Google, and Microsoft have all adopted it. There are now thousands of MCP servers for everything from databases to CI/CD systems. It's becoming the standard way AI agents interact with developer tooling.
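Under the hood, MCP is JSON-RPC 2.0: the assistant sends a `tools/call` request and the server replies with structured content the assistant can read and act on. A minimal sketch of that exchange (the tool name and arguments below are illustrative, not any real server's API):

```python
import json

# An MCP client invokes a tool with a JSON-RPC 2.0 request.
# "lint_schema" and its arguments are hypothetical examples.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lint_schema",
        "arguments": {"proto": 'syntax = "proto3"; message User { string id = 1; }'},
    },
}

# The server runs the tool and returns results the assistant can
# inspect and iterate on -- closing the advise/verify loop.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "0 lint errors"}]},
}

print(json.dumps(response["result"], indent=2))
```

This request/result shape is what turns "you should run buf lint" into the AI actually running it.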

Our focus is developer experience and product-led growth. We help teams make their CLI tools accessible through AI assistants—which changes how developers discover and adopt those tools.

How We Crossed Paths with Protobuf

One of our recent projects led us to protobuf.ai.

Protocol Buffers (protobuf) is Google's language-neutral format for serializing structured data. It's been around since 2008. If you've used gRPC, Kubernetes, or most large-scale microservices architectures, you've touched protobuf—even if you didn't realize it.

What surprised us: protobuf plays a much bigger role in AI infrastructure than most people recognize.

Protobuf Is Quietly Powering AI Systems

When people talk about AI infrastructure, they mention GPUs, model architectures, training frameworks. They rarely mention serialization formats.

But look at what's underneath most AI systems at scale:

ML Pipelines

Training data needs to move efficiently between storage, preprocessing, and training. At scale, that's often protobuf—5-10x smaller than JSON, faster to parse.
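The size difference is easy to see by hand-encoding one field. This toy sketch implements protobuf's varint wire encoding with no external libraries and compares it to the JSON equivalent (a single-field message exaggerates the ratio, but the direction holds at scale):

```python
import json

def encode_varint(value: int) -> bytes:
    # Protobuf varints pack 7 bits per byte; the high bit marks continuation.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    # Key byte = (field_number << 3) | wire_type; wire type 0 is varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

# message Event { int32 user_id = 1; }  with user_id = 150
proto_bytes = encode_int_field(1, 150)              # b'\x08\x96\x01' -- 3 bytes
json_bytes = json.dumps({"user_id": 150}).encode()  # 16 bytes

print(len(proto_bytes), "vs", len(json_bytes))
```

No field names on the wire, binary integers instead of decimal strings: that is where the savings come from, multiplied across billions of records.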

Feature Stores

Features fed into ML models need consistent schemas. Change a feature definition without coordination, and model predictions break. Protobuf schemas enforce those contracts.

Model Serving

Low-latency inference APIs often use gRPC, which is built on protobuf. When milliseconds matter, JSON parsing overhead adds up.

Event Streaming

Kafka, Redpanda, Pulsar—event-driven architectures frequently use protobuf for message encoding. Schema registries manage compatibility.

Data Contracts

As organizations adopt data mesh patterns, teams need contracts between data producers and consumers. Protobuf schemas serve as those contracts.

Protobuf is connective tissue. AI systems depend on it even when "AI" isn't in the name.

The Gap: No AI-Native Tooling

The protobuf ecosystem is mature. Buf has built excellent CLI tools and a schema registry. Protoc has been reliable for years. gRPC is battle-tested.

But all of it is CLI-first. Manual. Disconnected from how developers increasingly work.

And the tooling is fragmented. You've got Buf, protoc, gRPC, Connect-RPC, Twirp, ScalaPB—each with its own CLI, config files, and conventions.

Here's the gap we noticed:

What AI assistants can do → What they couldn't do:

  • Explain protobuf syntax → Validate a schema against real linters
  • Suggest schema designs → Check if a change breaks wire compatibility
  • Describe best practices → Generate working SDKs
  • Answer documentation questions → Enforce org-specific conventions

Without tool access, AI can only advise. It can't verify. It can't act.

That's a problem when you're trying to move fast. You get a schema suggestion from AI, then have to context-switch to a terminal, run buf lint, interpret the output, go back to the AI, iterate. The feedback loop is broken.

What We Built: The Switzerland of Protocol Buffers

We built protobuf.ai as a platform-agnostic MCP server—one interface that works across the entire protobuf ecosystem.

Supported Platforms

  • Protoc: the standard protobuf compiler
  • Buf: modern build system, linting, breaking change detection
  • gRPC: high-performance RPC framework
  • Connect-RPC: browser-friendly RPC
  • Twirp: simple RPC framework with JSON support
  • ScalaPB: Scala protobuf support

Instead of learning six different CLIs, you describe what you want. The MCP server figures out which tools to call.

"Create a user service with authentication, make sure it works with both gRPC and Connect"
→ AI generates schema, validates against both platforms, reports compatibility
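The dispatch idea reduces to a uniform adapter interface per platform. A rough sketch of that shape (the adapter names and routing rule here are our illustration, not the server's actual implementation):

```python
# Hypothetical adapter registry: every platform exposes the same validate() shape,
# so one request can fan out across the ecosystem.
def validate_with_grpc(schema: str) -> dict:
    return {"platform": "grpc", "ok": "service" in schema}

def validate_with_connect(schema: str) -> dict:
    return {"platform": "connect-rpc", "ok": "service" in schema}

ADAPTERS = {"grpc": validate_with_grpc, "connect-rpc": validate_with_connect}

def check_compatibility(schema: str, platforms: list[str]) -> dict:
    # The server picks the right adapters from the request; the caller
    # never touches six different CLIs.
    return {p: ADAPTERS[p](schema)["ok"] for p in platforms}

schema = (
    'syntax = "proto3"; '
    "service UserService { rpc Login (LoginRequest) returns (LoginResponse); } "
    "message LoginRequest { string email = 1; } "
    "message LoginResponse { string token = 1; }"
)
print(check_compatibility(schema, ["grpc", "connect-rpc"]))
```

Real adapters would shell out to the actual toolchains; the point is the single entry point in front of them.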

Core Capabilities

1. Natural Language Schema Generation

Describe your service in plain English, get production-ready proto3.

"User management system with email validation and role-based access"
→ Complete message definitions, service methods, validation rules

The system extracts requirements, generates proper package structure, and applies naming conventions automatically.

2. Cross-Platform Compatibility Analysis

Before you commit to a schema, know if it works everywhere you need it.

"Will this schema work with gRPC, Connect, and Twirp?"
→ Compatibility report with per-platform scores, unsupported features flagged

The analyzer checks streaming support, null handling patterns, authentication methods—everything that varies between platforms.

3. Breaking Change Prediction

Not just detection—prediction. Know what's likely to break before you change it.

  • Identifies fields likely to evolve
  • Flags potential field number conflicts
  • Estimates impact severity
  • Suggests migration strategies

"I want to change user_id from int32 to string"
→ "This breaks wire compatibility. 3 downstream services affected.
   Here's a migration plan with dual-support period."
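The int32 → string case is breaking at the byte level: int32 uses wire type 0 (varint) while string uses wire type 2 (length-delimited), so a reader built against the old schema misreads the key byte. A stdlib-only illustration:

```python
# Field 1 as int32 (wire type 0, varint): key byte 0x08
old_encoding = bytes([0x08, 42])           # user_id = 42

# Field 1 as string (wire type 2, length-delimited): key byte 0x0A
new_encoding = bytes([0x0A, 2]) + b"42"    # user_id = "42"

def wire_type(key_byte: int) -> int:
    # The low 3 bits of the key byte carry the wire type.
    return key_byte & 0x07

# A decoder expecting wire type 0 on field 1 receives wire type 2 instead,
# so it fails or silently skips the field -- a wire compatibility break.
print(wire_type(old_encoding[0]), "->", wire_type(new_encoding[0]))
```

Hence the dual-support migration: add a new field number for the string form, populate both for a deprecation window, then retire the old one.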

4. Pattern Learning

The system learns from every schema it processes.

  • Extracts patterns from your organization's schemas
  • Suggests consistent naming based on what you've done before
  • Predicts performance characteristics (serialization time, memory, bandwidth)
  • Recommends templates for common domains (auth, payments, messaging)

This creates compounding value—the tool gets smarter as you use it.

5. Migration Planning

Moving from gRPC to Connect? Twirp to Buf? The system generates a four-phase migration plan:

  1. Compatibility analysis
  2. Schema adaptation
  3. Code generation
  4. Validation and testing

Includes risk assessment and rollback procedures.

The Registry: Semantic Schema Management

Alongside the MCP server, we built registry.protobuf.ai—a schema registry designed for AI-native workflows.

Vector-Based Semantic Search

Find schemas by describing what you need, not memorizing names.

"Find schemas related to payment processing"
→ Returns semantically similar schemas across your organization

Built on pgvector with OpenAI embeddings. The registry understands what your schemas mean, not just what they're called.
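Mechanically, semantic search is nearest-neighbor ranking over embedding vectors. A toy stdlib sketch with made-up 3-dimensional vectors (a real deployment ranks much higher-dimensional embeddings with a pgvector index instead of a Python sort):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for real model output.
schemas = {
    "payments.v1.Charge": [0.9, 0.1, 0.0],
    "billing.v1.Invoice": [0.7, 0.3, 0.2],
    "chat.v1.Message":    [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "payment processing"

ranked = sorted(schemas, key=lambda n: cosine_similarity(query, schemas[n]), reverse=True)
print(ranked)  # payments first, chat last
```

Because similarity is computed on meaning-bearing vectors rather than names, a query about "payment processing" surfaces `Charge` and `Invoice` even though neither contains the word "payment processing".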

Breaking Change Detection with Impact Analysis

When you push a schema change, the registry:

  • Identifies breaking changes
  • Traces downstream dependencies
  • Shows which services are affected
  • Generates evolution plans with deprecation strategies

Auto-Generated SDKs

Push a schema, get client libraries automatically published:

  • TypeScript → NPM
  • Python → PyPI
  • Go → GitHub releases
  • Java → Maven
  • Rust → Crates.io

GitHub Actions handles the CI/CD. You push schemas, clients get updated packages.

Confluent Schema Registry Compatibility

Drop-in replacement for existing Confluent deployments. Same API, more intelligence. Teams can migrate without rewriting clients.
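Compatibility here means existing clients keep talking to the same REST surface. A sketch of the standard register call a Confluent client issues (endpoint and body shape follow the Confluent Schema Registry API; the host and schema are placeholders):

```python
import json

# Confluent Schema Registry clients register a schema version by POSTing to
# /subjects/{subject}/versions. A compatible registry accepts the same call.
subject = "payments-value"
url = f"https://registry.example.com/subjects/{subject}/versions"  # placeholder host

body = json.dumps({
    "schemaType": "PROTOBUF",
    "schema": 'syntax = "proto3"; message Charge { string id = 1; int64 amount_cents = 2; }',
})

# A real client sends: POST {url} with Content-Type
# application/vnd.schemaregistry.v1+json and this body, and gets back
# a JSON object containing the registered schema's global id.
print("POST", url)
```

Because the request and response shapes match, pointing an existing Kafka client at the new registry is a configuration change, not a rewrite.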

Multi-Format Support

Not just protobuf—the registry handles Avro, JSON Schema, Thrift, and GraphQL. One registry for all your schema needs.

Distributed Validation Agents

For teams that need validation at scale, we built a distributed agent network using LangGraph.

Schema Analyzer Agent

  • AST-level schema analysis
  • Breaking change detection by comparing old/new versions
  • Protocol compatibility across gRPC, Connect, gRPC-Web, REST
  • AI-powered recommendations

Performance Tester Agent

  • Multi-region load testing (US, EU, Asia)
  • P95/P99 latency measurement
  • Throughput and error rate tracking
  • Threshold validation against your SLOs

Conformance Validator Agent

  • Runtime validation against protovalidate constraints
  • Service contract verification
  • Message format validation
  • Constraint-driven test generation

LangGraph Orchestration

The agents coordinate through a state machine:

Schema Analysis → Performance Testing → Conformance Validation → Results

Conditional routing means if schema analysis fails, you don't waste time on performance tests. Results aggregate across regions for unified metrics.
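That routing logic reduces to a small state machine: each stage writes into shared state, and a condition decides whether to continue or short-circuit. A plain-Python sketch of the shape (the real agents use LangGraph; the stage names mirror the pipeline above, and the checks are stand-ins):

```python
def schema_analysis(state):
    # Stand-in check; the real agent does AST-level analysis.
    state["schema_ok"] = "message" in state["schema"]
    return state

def performance_testing(state):
    state["p99_ms"] = 12  # placeholder metric from a hypothetical load test
    return state

def conformance_validation(state):
    state["conformant"] = True
    return state

def run_pipeline(state):
    state = schema_analysis(state)
    if not state["schema_ok"]:
        # Conditional routing: a failed analysis skips the expensive stages.
        state["status"] = "failed_analysis"
        return state
    state = performance_testing(state)
    state = conformance_validation(state)
    state["status"] = "complete"
    return state

print(run_pipeline({"schema": 'syntax = "proto3"; message Ping {}'})["status"])
print(run_pipeline({"schema": "not a schema"})["status"])
```

LangGraph adds what this sketch lacks: persistent state, parallel fan-out across regions, and declarative conditional edges instead of an if statement.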

Geographic Testing

APIs behave differently in different regions. The performance tester runs simultaneously from multiple locations, so you see real-world latency patterns—not just local performance.

Why This Matters for AI Infrastructure

AI systems are scaling fast. The data infrastructure underneath them needs to keep up.

Schema management becomes critical when you have:

  • Dozens of ML pipelines with different data contracts
  • Feature stores that need consistent definitions
  • Model serving endpoints across regions
  • Event-driven architectures with strict compatibility requirements

Manual schema management doesn't scale. Teams need:

  • Automated validation in the development workflow
  • Breaking change detection before deployment
  • Cross-platform compatibility without manual testing
  • Governance that doesn't slow down development

That's what we built.

The Workflow Shift

Before:

  1. Learn protobuf syntax
  2. Pick a platform (Buf? gRPC? Connect?)
  3. Install toolchains
  4. Write schemas manually
  5. Run linters, fix errors, repeat
  6. Test compatibility by deploying and hoping
  7. Generate SDKs for each language manually

After:

  1. Describe what you need
  2. AI generates, validates across platforms, iterates
  3. Push to registry
  4. SDKs auto-publish
  5. Agents validate performance and conformance

Same underlying tools. Different interface. The complexity is still there—you just don't have to manage it.

Technical Architecture

MCP Server (protobuf.ai)

  • 30+ MCP tools
  • Platform adapters for Protoc, Buf, gRPC, Connect-RPC, Twirp, ScalaPB
  • Stateless deployment on Cloudflare Workers
  • OpenAI/Anthropic for natural language processing

Registry (registry.protobuf.ai)

  • PostgreSQL + pgvector on Supabase
  • 384-dimensional embeddings via OpenAI
  • GitHub integration for SDK automation
  • Confluent-compatible API layer

Agents (protobuf-ai-agents)

  • LangGraph for orchestration
  • Express API + A2A protocol for agent communication
  • Multi-region deployment for geographic testing

Make Your Developer Tools Agent-Accessible

We help companies build MCP servers that let AI agents use their products. Whether you're building for protobuf, databases, or any developer tool—the pattern is the same.

Schedule a Demo

What's Next

  • Impact analysis: "Which services break if I change this?"
  • Migration automation: Execute migration plans, not just generate them
  • More language SDKs: Kotlin, Swift, PHP
  • IDE integration: Schema validation in VS Code, JetBrains


About DevExp.ai

We build AI-native interfaces for developer tools. If your platform has a CLI, it should probably have an MCP server. Get in touch to talk about what that looks like for your tooling.