LLM Tools: Scaling with MCP Architecture (Part 2)

Learn how to evolve from direct LLM tool integration to the Model Context Protocol (MCP) architecture, enabling scalable AI tools that maintain conversation context across multiple services.



Series Overview

This is Part 2 of a four-part series on building production-ready AI agents.

I hope you find the code in this series helpful! The complete implementation for this post can be found here and the final code for the project can be found here. Feel free to fork it and adapt it for your own projects.

Note: This project includes comprehensive testing with a carefully configured Jest/SWC setup for TypeScript monorepos. Testing LLM applications with MCP can be quite tricky, so if you fork this project, don’t ignore the valuable testing configuration—it includes solutions for common issues like workspace package mocking, module resolution, and proper test isolation.

Architecture Note: This series implements the MCP protocol manually rather than using the official MCP TypeScript SDK, for educational purposes and maximum control. You’ll learn exactly how MCP works under the hood, making debugging and customization easier. The patterns shown here can easily be adapted to use the official SDK if preferred.

The repository includes an AI.md file with comprehensive guidance for developers and LLMs who want to modify and extend this code as a starting point for their own projects. It covers architecture patterns, extension points, testing configuration, and production considerations.


Your real estate AI assistant from Part 1 is working great. Users love it. But then success creates new problems.

The marketing team wants their own AI tool for market analysis. The analytics team needs performance reports. The scheduling team wants property showing tools. Everyone wants to build AI features, and they all need access to the same data.

What do you do? Copy your findListings function into every project? That works until you need to update the database schema and fix it in five different places.

There’s a better approach: the Model Context Protocol (MCP). Instead of copying code everywhere, you build tools as standalone services that any AI application can use.

Think microservices, but for AI tools.

What is the MCP Pattern?

The Model Context Protocol establishes a standard way for AI applications to discover, connect to, and use external tools. At its core, MCP defines three key components:

Tool Discovery: MCP servers describe the tools they offer, including each tool’s parameters and purpose, so your AI application can discover tools dynamically without hardcoding them. The official spec exposes this through a tools/list method; our manual implementation uses a GET /tools endpoint.

Tool Execution: A standardized call interface accepts a tool name and parameters, executes the tool, and returns results in a consistent format (tools/call in the spec; a POST /tools/call endpoint in our implementation). This creates a common interface regardless of what the tool actually does.

Protocol Standards: MCP defines the JSON schemas for requests, responses, and error handling. This means any MCP-compliant server can work with any MCP client, creating true interoperability.

The power of MCP isn’t just in the protocol itself, but in the architectural patterns it enables. Instead of tightly coupled function calls, you get loosely coupled services that can evolve independently while maintaining compatibility.
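
To make these pieces concrete, here is a rough sketch of the shapes this series works with. The official spec frames discovery and execution as JSON-RPC methods (tools/list and tools/call); the interfaces below match the simplified HTTP endpoints we build later in this post.

// Shapes used by this series' simplified HTTP take on MCP (a sketch, not the official SDK types)

// GET /tools returns an array of tool descriptions
interface MCPTool {
  name: string;                          // e.g. 'findListings'
  description: string;                   // what the tool does, written for the LLM
  inputSchema: Record<string, unknown>;  // JSON Schema describing the tool's parameters
}

// POST /tools/call executes one tool
interface MCPToolCallRequest {
  name: string;
  arguments: Record<string, unknown>;
}

// Every tool returns its output (or an error) in the same envelope
interface MCPToolCallResponse {
  result?: unknown;
  error?: string;
}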

The Problem with Direct Integration

Let’s say your Part 1 assistant worked so well that now you have:

  • Customer chatbot using findListings() for searches
  • Marketing dashboard copying the same logic for analysis
  • Mobile app duplicating tools for agents
  • Analytics service with yet another copy for reporting

Each team maintains their own version. Change something? You’re updating four codebases. This is exactly what MCP solves.

But here’s the thing: jumping straight from direct function calls to full microservices is a big leap. Let’s take it step by step.

MCP Architecture Overview

Here’s how the complete MCP architecture transforms your system from tightly coupled functions to scalable, reusable services:

MCP Architecture Diagram

The architecture shows our two-phase approach: Internal MCP Patterns prepare your application for scaling by introducing tool registries and conversation history, while External MCP Services extract tools into independent servers that multiple applications can share. The MCP Client acts as the bridge, handling tool discovery and execution across all services.

Section 1: Internal MCP Patterns

Before we extract tools into separate services, let’s introduce MCP concepts within our existing application. This teaches the patterns without the complexity of HTTP and deployment.

Framework Choice: NestJS vs Fastify

Before we dive into the implementation, let’s address an important architectural decision you’ll see throughout this tutorial.

Main Application (NestJS): Our core application uses NestJS because it excels at complex enterprise applications that need:

  • Dependency injection for managing service relationships
  • Decorators for clean API definitions and validation
  • Guards and interceptors for cross-cutting concerns
  • Modular architecture for large, evolving codebases
  • Rich ecosystem of enterprise features

MCP Servers (Fastify): Our tool servers use Fastify because MCP servers should be:

  • Lightweight and fast-starting
  • Simple HTTP endpoints with minimal overhead
  • Easy to deploy and scale independently
  • Focused on a single responsibility

This isn’t about one framework being “better”; it’s about using the right tool for the job. Complex orchestration gets NestJS, and simple, focused services get Fastify.

Adding Conversation History

First, let’s make our assistant remember conversations. Real assistants need context from previous messages.

We’ll update our chat endpoint to require a userId and store conversation history:

📁 View chat request DTO on GitHub

// chat-request.dto.ts
import { IsString, IsNotEmpty } from 'class-validator';

export class ChatRequestDto {
  @IsString()
  @IsNotEmpty()
  userId: string;

  @IsString()
  @IsNotEmpty()
  userMessage: string;
}
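
For completeness, here is a hypothetical controller showing how the DTO flows into the agents service. The route name is an assumption; the AgentsService.chat signature matches the snippet shown later in this section.

// chat.controller.ts (hypothetical wiring - route name assumed)
import { Body, Controller, Post } from '@nestjs/common';
import { AgentsService } from './agents.service';
import { ChatRequestDto } from './chat-request.dto';

@Controller('chat')
export class ChatController {
  constructor(private readonly agentsService: AgentsService) {}

  @Post()
  async chat(@Body() dto: ChatRequestDto): Promise<{ reply: string }> {
    // userId ties the request to that user's conversation history
    const reply = await this.agentsService.chat(dto.userId, dto.userMessage);
    return { reply };
  }
}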

Now we need a service to manage chat history per user:

📁 View chat history service on GitHub

// chat-history.service.ts
import { Injectable } from '@nestjs/common';
// OpenRouterMessage is the { role, content } message shape this project uses with OpenRouter

@Injectable()
export class ChatHistoryService {
  // In-memory Map for demo purposes - in production, use a database like PostgreSQL or Redis
  private chatHistory: Map<string, OpenRouterMessage[]> = new Map();

  async saveChatMessage(userId: string, message: OpenRouterMessage): Promise<void> {
    if (!this.chatHistory.has(userId)) {
      this.chatHistory.set(userId, []);
    }

    this.chatHistory.get(userId)!.push(message);
    
    // Keep only last 20 messages for memory management
    // LLMs have token limits - too much history hits those limits and costs more
    const userHistory = this.chatHistory.get(userId)!;
    if (userHistory.length > 20) {
      this.chatHistory.set(userId, userHistory.slice(-20));
    }
  }

  async getChatHistory(userId: string, limit: number = 10): Promise<OpenRouterMessage[]> {
    const userHistory = this.chatHistory.get(userId) || [];
    return userHistory.slice(-limit);
  }
}

Internal MCP-Style Tool Registry

Instead of hardcoding tools in our agents service, let’s create an internal tool registry that mimics MCP patterns:

📁 View tools config on GitHub

// tools-config.ts
// MCPTool describes each tool's name, description, and JSON Schema input (see the sketch above)
export const LISTINGS_TOOLS: MCPTool[] = [
  {
    name: 'findListings',
    description: 'Find property listings based on search criteria',
    inputSchema: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' },
        state: { type: 'string', description: 'State name' },
        minBedrooms: { type: 'number', description: 'Minimum bedrooms' },
        maxPrice: { type: 'number', description: 'Maximum price' },
        status: { 
          type: 'string', 
          enum: ['Active', 'Pending', 'Sold'],
          description: 'Listing status' 
        }
      }
    }
  },
  {
    name: 'sendListingReport',
    description: 'Send email report of property listings',
    inputSchema: {
      type: 'object',
      properties: {
        listingIds: { 
          type: 'array', 
          items: { type: 'string' },
          description: 'Array of listing IDs'
        },
        recipientEmail: { 
          type: 'string', 
          description: 'Email address' 
        }
      },
      required: ['listingIds', 'recipientEmail']
    }
  }
];
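
The registry only describes tools; something still has to map a tool name back to the function that implements it. Here is a minimal sketch of that internal dispatch (the executeInternalTool helper and the import path are assumptions; the repo wires this through the agents service):

// Internal dispatch from tool name to implementation (hypothetical helper)
import { findListings, sendListingReport } from './listings.service'; // path assumed

async function executeInternalTool(name: string, args: any): Promise<any> {
  switch (name) {
    case 'findListings':
      return findListings(args); // same function as in Part 1
    case 'sendListingReport':
      return sendListingReport(args.listingIds, args.recipientEmail);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}

When we extract tools into external servers in Section 2, this same switch simply moves behind the /tools/call endpoint.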

Context-Aware Chat Flow

Now our agents service includes conversation history in every LLM call:

📁 View complete agents service on GitHub

// agents.service.ts
async chat(userId: string, userMessage: string): Promise<string> {
  // Get chat history for context
  const chatHistory = await this.chatHistoryService.getChatHistory(userId, 5);
  
  // Save user message
  await this.chatHistoryService.saveChatMessage(userId, {
    role: "user",
    content: userMessage
  });

  // Build messages with history context
  const messages: OpenRouterMessage[] = [
    { role: "system", content: TOOL_SELECTION_PROMPT },
    ...chatHistory, // Include previous conversation
    { role: "user", content: userMessage }
  ];

  // Send to LLM with full context
  const toolResponse = await this.callOpenRouter("moonshotai/kimi-k2", messages, this.tools);

  // Process the tool call: pull the call out of toolResponse, execute the matching tool
  // to get toolResult, then have the LLM turn it into a reply (full flow in the linked repo)
  const assistantResponse = await this.generateResponse(userId, userMessage, toolCall, toolResult);
  
  await this.chatHistoryService.saveChatMessage(userId, {
    role: "assistant",
    content: assistantResponse
  });

  return assistantResponse;
}

This internal approach gives us MCP-style tool discovery and conversation persistence without HTTP complexity. We can test the patterns, debug easily, and understand the benefits before extracting to external services.
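
As a quick sanity check, here is a hypothetical two-turn exchange that only works because history is replayed into the prompt:

// agentsService is the AgentsService instance from the snippets above
const userId = 'user-123';

await agentsService.chat(userId, 'Show me active 3-bedroom listings in Portland, OR under $600k');

// The follow-up never repeats the search criteria; the LLM resolves "those" from the
// previous messages that ChatHistoryService replays into the prompt.
const reply = await agentsService.chat(userId, 'Which of those has the lowest price?');
console.log(reply);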

Section 2: External MCP Services

Now let’s extract our tools into standalone MCP servers. This is where the real scaling benefits emerge.

Monorepo Structure

We’ll organize everything as a monorepo with shared packages:

llm-tools/
├── apps/
│   ├── main-app/          # NestJS application
│   ├── mcp-listings/      # Listings MCP server
│   └── mcp-analytics/     # Analytics MCP server
├── packages/
│   ├── shared-types/      # Common TypeScript types
│   └── mcp-client/        # HTTP client for MCP communication

Shared Types Package

First, we extract our types into a shared package that all services can use:

📁 View shared types package on GitHub

// packages/shared-types/src/listings.ts
export interface ListingFilters {
  status?: 'Active' | 'Pending' | 'Sold';
  city?: string;
  state?: string;
  minBedrooms?: number;
  maxPrice?: number;
}

export interface Listing {
  listingId: string;
  address: {
    street: string;
    city: string;
    state: string;
    zip: string;
  };
  price: number;
  bedrooms: number;
  bathrooms: number;
  status: 'Active' | 'Pending' | 'Sold';
}

MCP Listings Server

Our listings server implements the MCP protocol with Fastify:

📁 View MCP listings server on GitHub

// apps/mcp-listings/src/server.ts
import Fastify from 'fastify';
import cors from '@fastify/cors';
import { toolsRoutes } from './routes/tools';

const server = Fastify({ logger: true });

server.register(cors, { origin: true });
server.register(toolsRoutes);

const start = async () => {
  const port = process.env.PORT ? parseInt(process.env.PORT) : 3001;
  await server.listen({ port, host: '0.0.0.0' });
  console.log(`MCP Listings Server running on port ${port}`);
};

start();

The tools routes implement the MCP protocol:

📁 View tools routes on GitHub

// apps/mcp-listings/src/routes/tools.ts
import { FastifyInstance } from 'fastify';
import { LISTINGS_TOOLS } from '../config/tools-config';
import { findListings, sendListingReport } from '../services/listings.service'; // path assumed

export async function toolsRoutes(fastify: FastifyInstance) {
  // Tool discovery endpoint
  fastify.get('/tools', async (_, reply) => {
    reply.send(LISTINGS_TOOLS);
  });

  // Tool execution endpoint
  fastify.post('/tools/call', async (request, reply) => {
    const { name, arguments: args } = request.body as { name: string; arguments: any };

    try {
      let result: any;

      switch (name) {
        case 'findListings':
          result = await findListings(args);
          break;
        case 'sendListingReport':
          result = await sendListingReport(args.listingIds, args.recipientEmail);
          break;
        default:
          reply.code(400).send({ error: `Unknown tool: ${name}` });
          return;
      }

      reply.send({ result });
    } catch (error) {
      reply.code(500).send({
        error: `Tool execution failed: ${error.message}`
      });
    }
  });

  // Health check endpoint
  fastify.get('/health', async (_, reply) => {
    reply.send({
      status: 'ok',
      timestamp: new Date().toISOString(),
      service: 'mcp-listings'
    });
  });
}
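
Because Fastify apps can be exercised without binding to a port, these MCP endpoints are easy to cover in the repo's Jest suite. A minimal sketch using Fastify's inject API:

// tools.routes.spec.ts - sketch of an endpoint test with fastify.inject
import Fastify from 'fastify';
import { toolsRoutes } from '../src/routes/tools';

it('lists the available tools', async () => {
  const server = Fastify();
  await server.register(toolsRoutes);

  const response = await server.inject({ method: 'GET', url: '/tools' });

  expect(response.statusCode).toBe(200);
  expect(response.json().map((tool: any) => tool.name)).toContain('findListings');
});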

MCP Analytics Server

We can add additional functionality with a separate analytics server:

📁 View analytics tools config on GitHub

// apps/mcp-analytics/src/config/tools-config.ts
export const ANALYTICS_TOOLS: MCPTool[] = [
  {
    name: 'getListingMetrics',
    description: 'Get analytics data for specific listings',
    inputSchema: {
      type: 'object',
      properties: {
        listingIds: {
          type: 'array',
          items: { type: 'string' },
          description: 'Array of listing IDs'
        }
      },
      required: ['listingIds']
    }
  },
  {
    name: 'getMarketAnalysis',
    description: 'Get market trends for a specific area',
    inputSchema: {
      type: 'object',
      properties: {
        area: {
          type: 'string',
          description: 'Geographic area (e.g., "Portland, OR")'
        }
      },
      required: ['area']
    }
  }
];
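
The analytics server reuses the same Fastify route pattern as the listings server; only the switch statement changes. Here is a sketch of the handlers it might dispatch to, with mock data standing in for a real analytics backend (both function bodies are hypothetical):

// apps/mcp-analytics/src/services/analytics.service.ts (hypothetical mock implementation)
export async function getListingMetrics(listingIds: string[]) {
  // A real system would query an analytics store; mock numbers keep the example self-contained
  return listingIds.map(listingId => ({
    listingId,
    views: Math.floor(Math.random() * 500),
    saves: Math.floor(Math.random() * 50),
    daysOnMarket: Math.floor(Math.random() * 90),
  }));
}

export async function getMarketAnalysis(area: string) {
  return {
    area,
    medianPrice: 525000,
    trend: 'rising' as const,
    averageDaysOnMarket: 23,
  };
}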

MCP Client Integration

The MCP client is the bridge between your LLM application and the MCP servers. In our architecture, the main NestJS app acts as the MCP client. It handles:

  • Tool Discovery: Automatically finding what tools are available across all MCP servers
  • Protocol Translation: Converting OpenRouter tool calls into MCP server requests
  • Error Handling: Retrying failed requests and gracefully handling server failures

Think of it as your application’s “tool coordinator” - it knows where all the tools live and how to call them.

Our main app becomes an MCP client that discovers and calls tools dynamically:

// packages/mcp-client/src/mcp-client.ts
import axios from 'axios';
import { MCPTool } from '@llm-tools/shared-types'; // assumed home of the shared MCPTool type

export class MCPClient {
  constructor(private options: { baseURL: string; timeout?: number; retries?: number }) {}

  async discoverTools(): Promise<MCPTool[]> {
    const response = await axios.get(`${this.options.baseURL}/tools`);
    return response.data;
  }

  async callTool(request: { name: string; arguments: any }): Promise<any> {
    const response = await axios.post(`${this.options.baseURL}/tools/call`, request);
    return response.data;
  }

  async healthCheck(): Promise<boolean> {
    try {
      await axios.get(`${this.options.baseURL}/health`);
      return true;
    } catch {
      return false;
    }
  }
}

The main app discovers tools from multiple MCP servers on startup:

// apps/main-app/src/agents/agents.service.ts
import { Injectable, OnModuleInit } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { MCPClient } from '@llm-tools/mcp-client'; // assumed workspace package name
// Tool is the OpenRouter tool-definition shape carried over from Part 1

@Injectable()
export class AgentsService implements OnModuleInit {
  private readonly mcpClients: MCPClient[] = [];
  private tools: Tool[] = [];

  constructor(private readonly configService: ConfigService) {
    this.mcpClients = [
      new MCPClient({ baseURL: "http://localhost:3001" }), // Listings
      new MCPClient({ baseURL: "http://localhost:3002" })  // Analytics
    ];
  }

  async onModuleInit() {
    await this.discoverTools();
  }

  private async discoverTools(): Promise<void> {
    const allTools: Tool[] = [];

    // We discover tools at startup so the LLM knows what's available before any user chats.
    // Servers can add or change tools without requiring code changes in the main app.
    for (const client of this.mcpClients) {
      try {
        const mcpTools = await client.discoverTools();
        const openRouterTools = mcpTools.map(tool => ({
          type: "function",
          function: {
            name: tool.name,
            description: tool.description,
            parameters: tool.inputSchema
          }
        }));
        allTools.push(...openRouterTools);
      } catch (error) {
        console.warn(`Failed to discover tools: ${error.message}`);
      }
    }

    this.tools = allTools;
  }
}

npm Workspaces Setup

npm workspaces let us manage multiple related packages in a single repository. Instead of having separate git repos for each service, we organize everything as a monorepo where:

  • Dependencies are hoisted to the root node_modules (faster installs, less disk space)
  • Shared packages can reference each other with @llm-tools/shared-types
  • One npm install at the root installs everything
  • Cross-package development is seamless

We configure the root package.json for easy development:

{
  "workspaces": ["apps/*", "packages/*"],
  "scripts": {
    "dev": "concurrently \"npm run dev:main-app\" \"npm run dev:mcp-listings\" \"npm run dev:mcp-analytics\"",
    "dev:main-app": "npm run start:dev -w apps/main-app",
    "dev:mcp-listings": "npm run dev -w apps/mcp-listings",
    "dev:mcp-analytics": "npm run dev -w apps/mcp-analytics"
  }
}

Now npm run dev starts all services simultaneously.
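
Workspace packages reference each other with ordinary dependency entries; npm symlinks the local packages instead of fetching them from the registry. A sketch of how the main app might declare the shared packages (the versions and the mcp-client package name are assumptions):

{
  "name": "main-app",
  "dependencies": {
    "@llm-tools/shared-types": "1.0.0",
    "@llm-tools/mcp-client": "1.0.0"
  }
}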

The Benefits We’ve Achieved

This two-phase approach gives us several key benefits:

Section 1 Benefits: We learned MCP patterns without HTTP complexity, added conversation history, and prepared our architecture for extraction.

Section 2 Benefits: We gained true scalability with independent services, team ownership of different tools, and the ability to develop and deploy services independently.

Scaling Across Teams

Now different teams can own different MCP servers:

  • Real Estate Data Team owns the listings server
  • Business Intelligence Team owns the analytics server
  • Any Team can build AI applications that use both

Each server can be updated, tested, and deployed independently.

Type Safety Across Services

Our shared types package ensures consistency. Without it, each team’s service might define Listing slightly differently. The listings team adds a photos field, but the analytics team doesn’t know about it. The mobile team uses status: boolean while everyone else uses status: string. Before you know it, services can’t talk to each other:

// All services use the same types
import { Listing, ListingFilters } from '@llm-tools/shared-types';

// No schema drift between services - everyone gets the same interface
const listings: Listing[] = await findListings(filters);

When to Choose This Approach

Start with Section 1 (Internal MCP) when you want to:

  • Add conversation history to your existing AI assistant
  • Prepare for future scaling without immediate complexity
  • Learn MCP patterns in a simpler environment

Move to Section 2 (External MCP) when you have:

  • Multiple teams wanting to build AI features
  • Tools that could be reused across applications
  • Need for independent development and deployment
  • Complex tools that warrant dedicated maintenance

Production Considerations

Real deployments need a few extras beyond our examples:

Service Discovery: In production, you won’t hardcode URLs. Use environment variables or service discovery:

const mcpClients = [
  new MCPClient({ baseURL: process.env.MCP_LISTINGS_URL }),
  new MCPClient({ baseURL: process.env.MCP_ANALYTICS_URL })
];

Error Handling: MCP clients should gracefully handle server failures. In distributed systems, servers go down or become unresponsive. When this happens during a user chat, you don’t want the entire conversation to break. The fallback pattern lets you try multiple servers that might have the same tool, or gracefully degrade functionality:

private async executeTool(toolCall: ToolCall): Promise<any> {
  for (const client of this.mcpClients) {
    try {
      const response = await client.callTool({
        name: toolCall.function.name,
        arguments: JSON.parse(toolCall.function.arguments)
      });
      
      if (response.error) continue; // Try next server
      return response.result;
    } catch (error) {
      continue; // Try next server
    }
  }
  
  throw new Error(`Failed to execute tool on any MCP server`);
}

Health Monitoring: Check MCP server health and exclude failed servers from tool discovery.
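
A minimal sketch of that filtering, reusing the healthCheck method from our MCPClient (the healthyClients field is hypothetical):

private async discoverTools(): Promise<void> {
  // Probe each server first and only ask healthy ones for their tools
  const healthResults = await Promise.all(
    this.mcpClients.map(async client => ({ client, healthy: await client.healthCheck() }))
  );

  this.healthyClients = healthResults
    .filter(({ healthy }) => healthy)
    .map(({ client }) => client);

  const toolLists = await Promise.all(this.healthyClients.map(client => client.discoverTools()));
  this.tools = toolLists.flat().map(tool => ({
    type: 'function' as const,
    function: { name: tool.name, description: tool.description, parameters: tool.inputSchema },
  }));
}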

What’s Next: From Scalable to Secure to Real-time

We’ve transformed our Part 1 assistant into a scalable MCP architecture with conversation history and multiple tool servers. But production systems need more than just scalability; they need security and responsive user experiences.

In Part 3, we’ll secure our MCP servers and chat endpoints with authentication, protecting your AI tools when you ship to real users.

Then Part 4 will transform the user experience with real-time streaming responses, eliminating the silent waits that make users wonder if anything is happening.

The progression is intentional: build it, scale it, secure it, make it responsive. Each step builds on the previous while solving real problems that emerge as your AI system grows from prototype to production.

Next up: Part 3: Securing LLM Agents with Authentication - Add user management and protect your AI endpoints for real-world deployment.