Building an Insights AI Architecture

Framework for building an Insights AI architecture

Some enterprises that Real Story Group advises are finally moving beyond the pilot stage with Agent AI. They have real setups running in production, or they're getting very close. But the main issue stalling these rollouts is almost always architectural rather than related to the underlying LLM or prompt engineering.

After my post on the context layer, the most common question I got was where that layer actually sits in the broader stack. Many teams are struggling to figure out the division of labor. They often jump straight to the agent runtime and start tweaking prompts, while the middle layer (the plumbing that should actually be preparing the evidence) is either thin or completely missing.

Three layers

At the bottom of the stack sits Layer 1: Signals and source systems. This is where your raw inputs live. These could be trending conversations pulled from social media, creative quality scores from another system, or your historical campaign data. Vendors operating at this tier are highly skilled at collecting and scoring data within their specific domains. That isolation is useful for localized reporting, but it’s not very helpful for decision support.

Figure: Multi-layered Insights AI Architecture

Layer 2: Evidence preparation and contextualization is the support layer that connects the signals in your raw data with your execution tools. Layer 2 is where the biggest gap lies today. Rather than just passing raw data mindlessly to an LLM, this layer actively integrates sources. It normalizes ad metrics, scores trend strength against brand fit, checks for legal flags (e.g., ensuring a marketing brief doesn't unknowingly recommend restricted ingredients), attaches specific source provenance, and finally outputs a structured evidence packet.

The most critical distinction is that this middle layer focuses more on securely preparing the evidence than on generating the final insight. In our work at RSG, we see architectures fail constantly because they skip this preparation step and feed raw, uncontextualized data straight to the top layer, which then misdirects or hallucinates with extreme confidence.
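To make the idea of an evidence packet concrete, here is a minimal Python sketch of the Layer 2 steps described above: normalize metrics, score trend strength against brand fit, check for legal flags, and attach provenance. Everything in it is an assumption for illustration, including the names Signal, EvidencePacket, and build_packet, the metric mappings, the placeholder restricted term, and the choice to score a trend as a simple product. Your own normalization and policy logic will look different.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    source: str      # Layer 1 system the value came from
    metric: str      # metric name as that source reports it
    value: float
    text: str = ""   # optional free text, e.g. a trend description


@dataclass
class EvidencePacket:
    metrics: dict = field(default_factory=dict)     # canonical metric -> value
    provenance: dict = field(default_factory=dict)  # canonical metric -> source
    legal_flags: list = field(default_factory=list)
    trend_score: float = 0.0


# Hypothetical mappings; real ones come from your governed metric definitions.
CANONICAL_NAMES = {"ctr_pct": "click_through_rate", "CTR": "click_through_rate"}
PERCENT_METRICS = {"ctr_pct"}                    # reported as 2.5 rather than 0.025
RESTRICTED_TERMS = {"restricted-ingredient-a"}   # placeholder policy list


def build_packet(signals, trend_strength, brand_fit):
    """Turn raw Layer 1 signals into a structured packet for Layer 3."""
    packet = EvidencePacket()
    for s in signals:
        # Normalize the metric name and unit.
        name = CANONICAL_NAMES.get(s.metric, s.metric)
        value = s.value / 100 if s.metric in PERCENT_METRICS else s.value
        packet.metrics[name] = value
        # Attach source provenance to every metric.
        packet.provenance[name] = s.source
        # Check free text for restricted terms.
        for term in sorted(RESTRICTED_TERMS):
            if term in s.text.lower():
                packet.legal_flags.append(f"{term} (from {s.source})")
    # Assumed scoring rule: trend strength weighted by brand fit.
    packet.trend_score = trend_strength * brand_fit
    return packet
```

The point of the design is that Layer 3 only ever receives the packet. It never sees a vendor-specific metric name, and every value it cites can be traced back to a source.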

Up top sits Layer 3: Decision support and execution. Layer 3 is where your LLMs, templates, and workflow logic take that carefully prepared evidence and synthesize it into concrete outputs, such as drafting a brand-safe influencer brief or recommending a media-optimization step. Layer 3 explains the reasoning behind a recommendation and orchestrates the steps to act on it. It should never be involved in making up metric definitions, guessing if an audience trend is still current, or deciding what legal policy applies. That work firmly belongs much earlier in the stack.

Where the context model fits

If you look closely at Layer 2, that is exactly where the four context components belong: Semantic (what things mean), State (what things are right now/freshness), Policy (what is permitted), and Governance (who actually maintains all this).

You can split or merge these depending on your stack, but the names don't matter. What matters is that someone actually owns them, and the system checks them before acting. If you skip this, your architecture diagram might look plausible, but the deployment will fail. We've seen this happen enough times that it shouldn't be a surprise anymore.
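As a rough illustration of "someone owns them, and the system checks them before acting," the Python sketch below models the Semantic, State, and Policy components as checks and treats Governance as the owner attached to each one. The names ContextCheck and gate, the fields on the action, and the 24-hour freshness limit are all hypothetical, not part of any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ContextCheck:
    component: str                  # "semantic", "state", or "policy"
    owner: Optional[str]            # governance: who maintains this check
    passes: Callable[[dict], bool]  # run against the proposed action


def gate(action: dict, checks: list) -> list:
    """Return blocking problems; an empty list means the action may proceed."""
    problems = []
    for check in checks:
        if not check.owner:
            # An unowned check is treated as a governance gap, not a pass.
            problems.append(f"{check.component}: no owner")
        elif not check.passes(action):
            problems.append(f"{check.component}: failed (owner: {check.owner})")
    return problems


# Illustrative checks only; the field names on `action` are assumptions.
checks = [
    ContextCheck("semantic", "analytics team",
                 lambda a: a["metric"] in {"click_through_rate"}),
    ContextCheck("state", "data platform",
                 lambda a: a["data_age_hours"] <= 24),
    ContextCheck("policy", None,
                 lambda a: not a["legal_flags"]),
]

action = {"metric": "click_through_rate", "data_age_hours": 30, "legal_flags": []}
problems = gate(action, checks)
# The stale data fails the state check, and the unowned policy check
# is reported as a governance gap.
```

Whether you model governance as a field, a separate registry, or a review process matters less than the fact that an unowned component blocks the action instead of passing silently.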

The demo trap

I totally get the temptation to build this backward; that is, from the top down. Layer 3 is highly visible. You can spin up a chat interface, produce polished output, and impress everyone in a conference room in less than a week. At RSG, we regularly see this type of demo presented by vendors to our corporate members.

Layer 2, on the other hand, can feel like opaque data plumbing. It's taxonomy work and validation logic. You can't really demo governance mechanics to executives, so it naturally gets deferred in the work plan.

In a couple of recent engagements with large enterprise teams, we watched two teams reach the same failure point from opposite ends. One team had great signal sources but terrible normalization and zero policy enforcement. Another team tried to skip the plumbing entirely and rely on live API calls for context, only to discover that API timeouts and reliability issues were major problems. The demos for both approaches had initially looked incredibly smart, but in production they proved to be a disappointing mess.

Can't you just buy your way out?

A very common mistake here is assuming you can just buy a single "insights engine" that handles all of this end-to-end. That product simply doesn't exist, at least not yet. The market is highly componentized, and vendors who claim otherwise are often selling a strong Layer 3 built on a shallow Layer 2.

In reality, your source landscape at Layer 1 today is heavily driven by vendor licenses, so there is a buy (or “already bought”) story here. Layer 2, however, leans heavily toward a "buy-and-extend" approach. You might buy some initial data integration tooling, but you still have to do some meaningful internal building around your own normalization logic and governance constraints. Layer 3 is then applied primarily to your specific templates and use cases, pulling in platform support wherever it makes sense.

We know from hard experience that you will confront trade-offs. Stacking vendor tools gets you moving fast. Owning an internal middle layer gets you some control. But if you actually care about auditability and avoiding lock-in, Layer 2 can never be an afterthought.

A quick test for vendors

There are many vendor (buy) and consultancy/integrator (build) considerations to evaluate on this journey, and the full evaluation list is long. But the next time a vendor shows you an agent demo, start by asking these five basic questions:

1. Where exactly are metric definitions governed, and who owns those definitions?
2. How is data freshness ensured, represented, and enforced at runtime?
3. Do the policy checks execute before or after text generation?
4. Can you trace every recommendation back to the source evidence across multiple systems?
5. If one source is stale or missing, does the system refuse to answer gracefully, or does it improvise?

If you receive vague answers, you’re witnessing a Layer 3 demo without a real Layer 2 under it. Better to find out now than six months into a failed implementation.
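For a sense of what a good answer to several of these questions can look like in practice, here is a small Python sketch of an ordering where freshness and policy checks run before any text is generated, and where a stale or missing source produces a refusal rather than an improvised answer. The function name answer_or_refuse, the 24-hour freshness budget, and the shape of the evidence dictionary are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # hypothetical freshness budget


def answer_or_refuse(evidence, required_sources, policy_check, generate, now=None):
    """Run freshness and policy checks before any text generation happens."""
    now = now or datetime.now(timezone.utc)

    # Freshness is enforced at runtime, source by source.
    for source in required_sources:
        item = evidence.get(source)
        if item is None:
            return {"status": "refused", "reason": f"missing source: {source}"}
        if now - item["fetched_at"] > MAX_AGE:
            return {"status": "refused", "reason": f"stale source: {source}"}

    # Policy checks run before generation, not as a filter afterwards.
    violations = policy_check(evidence)
    if violations:
        return {"status": "refused", "reason": "policy: " + ", ".join(violations)}

    # Only now is the (caller-supplied) generator invoked, and the answer
    # carries the list of sources it was built from.
    return {
        "status": "ok",
        "text": generate(evidence),
        "sources": sorted(required_sources),
    }
```

Here policy_check and generate are supplied by the caller, so the sketch says nothing about which LLM or rules engine sits behind them. What it shows is the order of operations: the generator is never reached when a source is stale, missing, or blocked by policy.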

What you should do

The MarTech marketplace is flooded with agent claims focusing entirely on what the top layer can generate. To be fair, that’s probably also because we’re all focused on the experience tier. Real enterprise outcomes, however, hinge on what the middle layer can guarantee.

This article describes a three-layer architecture. In your case, you may end up with a different number of layers. That’s okay. Just make sure you do create an overall architecture with an explicit division of labor. However your boxes and arrows pencil out, build the middle layer with real discipline, and don't let the demo layer substitute for the required plumbing.

If you work at an end-user enterprise plotting AI and Agentic architectures, reach out. RSG rapidly helps teams distinguish what to buy, what to build, and what to govern before they scale.
