Most teams pick RAG or MCP before they have decided what their AI system actually does. The choice should run the other way: three questions about the system decide the architecture. What it reads, what it acts on, and how it answers. Once those answers are honest, the pattern usually picks itself, and the remaining work is implementation, not selection.
Answered honestly, the three questions make the RAG-or-MCP debate disappear. What is left is a small set of patterns that fit the small set of shapes most real systems take.
§01 The three questions that pick the pattern
Before any architecture diagram, three questions decide most of what the system will look like.
What does it read? Static corpora that change weekly or monthly are one thing. Live state that changes by the second is another. Tickets, deploys, billing balances, and file contents are not corpora. Treating live state as a corpus is the most common architectural mistake people make when they reach for RAG by reflex.
What does it do? A system that only answers questions is a different system from one that creates issues, runs queries, or edits files. Mutations require an interface to the system being mutated. Vector stores cannot mutate anything. MCP servers can.
How does it answer? Single-turn extractive answers (give me the policy on refunds) are different from multi-step agentic ones (figure out which customer is affected, draft a credit memo, file a Linear ticket). The first wants a tight retrieval pipeline. The second wants the model to compose tools.
The three answers together pick the pattern. The five use cases below show what the picks look like.
§02 Use case · customer-facing documentation Q&A
Shape: a help center, a developer docs site, a knowledge base. Tens of thousands of pages, mostly static, with content that updates on a weekly or monthly cadence. Questions are predictable in shape, things like "how do I do X with your API" or "what does error Y mean." Answers are extractive. Latency must be tight, because nobody waits eight seconds for a documentation answer.
Pattern: pure RAG.
This is the case RAG was built for. The work goes into the retrieval pipeline. Embed the corpus with a model that handles your domain vocabulary. Chunk it at a size that preserves whole concepts (most teams default to 512 tokens and quietly regret it for technical content). Add a reranker on top of the vector store, like bge-reranker or Cohere's, both worth their cost. Tune over months. Build evals against real user questions and watch precision-at-k stay above 0.8.
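A minimal sketch of that retrieval side, assuming the sentence-transformers library and the open bge models; the two-chunk corpus and the cutoffs are stand-ins, and the eval harness is elided:

```python
# Retrieval side of a pure-RAG pipeline: embed for recall, cross-encode to
# rerank. Assumes `pip install sentence-transformers`.
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-base")

chunks = ["Refunds are issued within 14 days...", "Error Y means the token expired..."]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k_recall: int = 50, k_final: int = 5) -> list[str]:
    """Vector recall first, then rerank the candidates with the cross-encoder."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    sims = chunk_vecs @ q_vec  # cosine similarity on normalized vectors
    candidates = [chunks[i] for i in np.argsort(sims)[::-1][:k_recall]]
    scores = reranker.predict([(question, c) for c in candidates])
    return [candidates[i] for i in np.argsort(scores)[::-1][:k_final]]
```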
What you do not need here: an agent, a tool-calling loop, or MCP. The model gets retrieved chunks plus the question, generates an answer, and you are done in one round-trip. The cost is low, the latency is tight, the quality holds up, and the system is legible.
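The whole answer path fits in one call. A sketch assuming the anthropic Python SDK and the retrieve() above; the model name is illustrative:

```python
# One round-trip: retrieved chunks plus the question, no agent loop.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))  # retrieve() from the sketch above
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model choice
        max_tokens=512,
        system="Answer only from the provided documentation excerpts.",
        messages=[{
            "role": "user",
            "content": f"Documentation:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response.content[0].text
```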
The trap is reaching for an agent because the agent discourse made it sound mandatory. It is not. A well-tuned RAG pipeline on documentation will outperform an agent that has access to a search_docs tool, because the agent will sometimes choose not to search, sometimes search badly, and sometimes spin in a loop.
§03 Use case · internal company knowledge inside Claude UI
Shape: a company canon. Mission, strategy, competitors, brand voice, internal playbooks. Lives in your own product, not in Notion or Drive. Employees use Claude UI day-to-day and want company context available there. Access has to be identity-scoped, because what an account exec can see is not what an engineer can see.
Pattern: custom MCP server.
This is one of the cases where the RAG-or-MCP framing collapses immediately. There is no Anthropic-built connector that fits, because the canon is proprietary. There is no useful version of "stuff documents into a project," because the canon is structured (pages with relationships, not a pile of files). The only path Claude UI offers for live, identity-scoped access to a custom system is a custom MCP server.
The MCP surface should be small and named after the work. Three to five tools, not fifteen. Good shapes: search_canon for keyword and semantic lookup, get_canon_page for retrieval by ID, list_canon_sections for navigation, get_brand_voice for the specific high-traffic asset that has its own page. The model picks tools by their names, so vague names produce vague behavior.
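What that surface looks like as code, assuming the MCP Python SDK's FastMCP helper; the in-memory CANON dict is a stand-in for the real store:

```python
# A small, work-shaped tool surface. Assumes `pip install mcp`. The docstrings
# become the tool descriptions the model sees, so they carry real weight.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("company-canon")

CANON = {"mission": "...", "brand-voice": "..."}  # stand-in for the real store

@mcp.tool()
def search_canon(query: str) -> str:
    """Keyword and semantic search over the company canon; returns page IDs."""
    hits = [page_id for page_id in CANON if query.lower() in page_id]  # real search here
    return "\n".join(hits) or "no matches"

@mcp.tool()
def get_canon_page(page_id: str) -> str:
    """Fetch the full text of one canon page by its ID."""
    return CANON.get(page_id, "page not found")

@mcp.tool()
def get_brand_voice() -> str:
    """Return the brand-voice guide, the canon's highest-traffic asset."""
    return CANON["brand-voice"]

if __name__ == "__main__":
    mcp.run()
```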
The implementation pain is not the protocol. It is identity-scoped OAuth, server hosting that Claude UI can actually reach, and the workspace-level install that pushes the connector to every employee instead of forty manual setups. The playbook on building a custom MCP covers the mechanics.
§04 Use case · live operations assistant
Shape: an agent for SRE or support work. Read deploy status, read ticket queues, read database rows, open or close incidents, write a postmortem stub, run a SQL query against a read replica. Everything the agent looks at is live. Almost everything it does is a mutation.
Pattern: MCP only.
RAG is the wrong shape here for a reason that does not show up in framework comparisons. Vector stores are snapshots. The moment a ticket changes status or a deploy goes red, every embedding tied to it is wrong. You can rebuild embeddings on a schedule, but the schedule is always slower than the question. By the time the agent retrieves "what is the deploy status," the embedding is from ten minutes ago and the system has moved.
Live MCP servers do not have this problem because they read the source of truth each time. Linear, Datadog, GitHub, Postgres, PagerDuty, all of them expose live state through MCP. The agent strings them together. The cost is more round-trips per task. The benefit is correctness on live data and the ability to actually do things in those systems.
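The shape of a live tool, sketched with FastMCP and httpx; the internal endpoints and the token are hypothetical stand-ins for whatever your deploy and incident systems actually expose:

```python
# Freshness by construction: every call reads the source of truth, so there
# is no snapshot to go stale. Endpoints below are hypothetical.
import os
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("live-ops")

@mcp.tool()
def get_deploy_status(service: str) -> str:
    """Read the current deploy status for a service, live, uncached."""
    resp = httpx.get(
        f"https://deploys.internal.example/api/v1/status/{service}",  # hypothetical
        headers={"Authorization": f"Bearer {os.environ['DEPLOY_TOKEN']}"},
        timeout=10.0,
    )
    resp.raise_for_status()
    return resp.text

@mcp.tool()
def open_incident(title: str, severity: str) -> str:
    """Mutation: open an incident in the tracker. No vector store can do this."""
    resp = httpx.post(
        "https://incidents.internal.example/api/v1/incidents",  # hypothetical
        json={"title": title, "severity": severity},
        timeout=10.0,
    )
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    mcp.run()
```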
The architectural pattern worth borrowing from this case: when in doubt about freshness, default to MCP. Wrap a vector store only when the corpus is genuinely static.
§05 Use case · research-grade corpus search
Shape: a large, mostly static corpus, like legal filings, scientific papers, financial filings, or an internal compliance archive, where answers are not single extractive lookups. A lawyer wants to find every case in the corpus that touches a doctrine, then synthesize across them. A researcher wants to triangulate a result against five papers. Single-turn RAG misses the iterative work.
Pattern: RAG behind an MCP tool.
The retrieval pipeline is built the way you would build any RAG system. The difference is that the corpus is exposed to the agent as one or two well-named MCP tools (search_case_law, get_case_text) instead of being implicitly stuffed into a single prompt. The agent decides when to retrieve and how many times. If the first query misses, it can rephrase. If it needs to combine three retrievals, it can. If it needs to retrieve plus look at something external, it can do that too.
This is the pattern the primer called the honest answer for most production AI products. The reason is most visible in research workflows. The question often is not "find me the answer" but "help me search this space." The model needs the option to retrieve more than once, and to combine retrievals with other moves. RAG behind an MCP tool gives it both.
The cost is one extra layer of indirection and a small accuracy hit from giving the model the choice of when to call the tool (it sometimes will not call it when it should). The fix is in naming and description: tools with clear description fields that anchor the agent's instinct. "Search the case-law corpus before answering questions about legal doctrine" beats search_corpus(query: string) every time.
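Put together, a sketch of the wrap, assuming FastMCP; retrieve() is a stub for the embed-recall-rerank pipeline from the pure-RAG sketch, and the docstring is the description field doing the anchoring:

```python
# One well-named, well-described tool over the whole retrieval pipeline.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("case-law")

def retrieve(query: str) -> list[str]:
    return ["stub passage"]  # swap in the embed -> recall -> rerank pipeline

@mcp.tool()
def search_case_law(query: str) -> str:
    """Search the case-law corpus before answering questions about legal
    doctrine. Returns the top-ranked passages; rephrase and call again if
    the first pass misses."""
    return "\n\n---\n\n".join(retrieve(query))

if __name__ == "__main__":
    mcp.run()
```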
§06 Use case · mixed product knowledge plus live customer state
Shape: a customer-support assistant. Needs the static product knowledge (features, docs, policies, troubleshooting trees). Needs the live customer state (subscription tier, recent tickets, billing status, account flags). The two sets of data have different access patterns and different freshness requirements.
Pattern: both layers, separated by purpose.
The product knowledge goes into a RAG pipeline exposed as a search_product_docs MCP tool. The customer state stays behind native MCP tools that hit the live systems: get_customer, get_recent_tickets, get_subscription, get_billing_status. The agent composes across them. Pull the docs for a refund policy, then check whether this customer's subscription tier is actually eligible, then draft the response.
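A sketch of the two layers in one server, assuming FastMCP; the stub returns mark where the tuned RAG pipeline and the live billing and ticket queries plug in:

```python
# Each data shape behind the pattern it rewards: static docs behind a
# RAG-backed tool, live customer state behind direct queries.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("support-assistant")

@mcp.tool()
def search_product_docs(query: str) -> str:
    """Search the static product docs: features, policies, troubleshooting."""
    return "stub passages"  # swap in the tuned retrieval pipeline

@mcp.tool()
def get_subscription(customer_id: str) -> str:
    """Read the customer's current subscription tier, live from billing."""
    return "pro tier, active"  # swap in a live billing query

@mcp.tool()
def get_recent_tickets(customer_id: str) -> str:
    """List the customer's recent support tickets, live from the tracker."""
    return "no open tickets"  # swap in a live tracker query

if __name__ == "__main__":
    mcp.run()
```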
This is the cleanest version of "use both." The architecture is honest about which data has which shape. Product knowledge is genuinely static and rewards retrieval tuning. Customer state is genuinely live and rewards direct queries. Putting customer data into a vector store would be a category error. Putting product docs behind a SQL query would be slower and less accurate than retrieval. Each piece sits where it belongs.
The systems that get this wrong tend to put everything behind a single "knowledge" abstraction, then spend the next quarter explaining to engineering why the assistant keeps quoting yesterday's subscription state.
◆ pull quote
“Most AI systems do not need to pick. They need an architecture honest enough to put the right data behind the right pattern.”
§07 The questions to ask before you write any code
Read what the system actually does, not what it sounds like in the design doc. Then walk the three questions:
- What does it read? List the data sources. Mark each one static or live. Anything that updates more than once an hour is live for practical purposes.
- What does it do? List the actions. Anything that mutates a system anywhere belongs behind MCP, not RAG.
- How does it answer? Single-turn extractive or multi-step agentic. Be honest. Most teams overestimate how agentic their use case is and end up with an agent that should have been a retrieval pipeline.
The pattern follows. Static plus extractive plus tight latency wants pure RAG. Live plus actions wants pure MCP. Anything else, including everything most production AI systems become once they grow up, wants both, with the architecture making the seams explicit.
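The mapping, written out as a worked example; the labels mirror the checklist above and are purely illustrative, not any library's API:

```python
# The three questions as a decision table. Illustrative only.
def pick_pattern(reads: str, mutates: bool, answers: str) -> str:
    """reads: 'static' or 'live'; mutates: does it change any system?;
    answers: 'extractive' or 'agentic'."""
    if reads == "static" and not mutates and answers == "extractive":
        return "pure RAG"
    if reads == "live" and mutates:
        return "pure MCP"
    if reads == "static" and answers == "agentic":
        return "RAG behind an MCP tool"
    return "both layers, separated by purpose"

assert pick_pattern("static", False, "extractive") == "pure RAG"  # docs Q&A
assert pick_pattern("live", True, "agentic") == "pure MCP"        # live ops
assert pick_pattern("static", False, "agentic") == "RAG behind an MCP tool"
```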
The "RAG vs MCP" framing is a trap, because it forces a single answer on a system that probably needs more than one. Replace it with the three questions, and the seams take care of themselves.
- 01 Pure RAG
- ▸ large static corpus, extractive answers
- ▸ tight latency, predictable queries
- ▸ tune chunking, embeddings, reranker
- ▸ e.g. customer docs, FAQ assistants
- 02 Custom MCP
- ▸ structured data in your own app
- ▸ identity-scoped, multi-user access
- ▸ three to five well-named tools, not fifteen
- ▸ e.g. internal canon, live ops agents
- 03 RAG behind an MCP tool
- ▸ large static corpus plus agentic workflow
- ▸ model needs to retrieve more than once
- ▸ tool description anchors agent behavior
- ▸ e.g. legal corpus, research synthesis
- 04 Both layers, separated
- ▸ static docs via RAG-as-MCP-tool
- ▸ live state via native MCP servers
- ▸ architecture makes the seams explicit
- ▸ e.g. customer support, finance ops