Experiment
Building Daggerheart Forge: Designing an Agentic AI System for Human-Centered Game Mastering
Daggerheart Forge began as a personal experiment to reduce the operational overhead of Game Mastering while preserving creative control. The project evolved into a larger exploration of agentic AI systems, structured validation, RAG pipelines, and human-centered AI orchestration for narrative content generation.
Why This Exists
I have a demanding full-time role, a family, and limited time for campaign preparation. Like many Game Masters, I found myself wanting to create richer content for my players while struggling to consistently find the time and mental energy required to build everything manually.
The original problem was not creativity. I already had the ideas.
The problem was operational overhead.
I wanted a way to rapidly generate campaign support material, maintain consistency with Daggerheart rules and guidelines, reduce repetitive prep work, preserve creative control, reuse and organize previously created content, and support improvisation during live sessions.
Most importantly, I did not want AI replacing the creative role of the Game Master. The goal was always augmentation, not automation.
Daggerheart Forge became an opportunity to explore what a human-centered AI system might look like when the intent was not replacing creativity, but supporting it.
The Original Assumption
The first version of the project was based on a relatively common assumption in AI experimentation: a sufficiently detailed prompt combined with a capable language model should be able to generate usable TTRPG content.
Initially, the system was extremely simple:
- local models running through Ollama
- basic generation scripts
- a single generation agent
- no structured validation
- manual review and cleanup
The idea was straightforward: write a structured prompt, generate adversaries, environments, or equipment, and use the results in game sessions.
At first, this seemed promising. Then the outputs started failing in predictable ways.
Early Failure Modes
Inconsistent Formatting
Outputs varied wildly in structure and completeness. Some content resembled usable game material. Other outputs became long instructional essays or incoherent mechanical descriptions.
D&D Contamination
Because Daggerheart is relatively new and Dungeons & Dragons dominates online TTRPG training data, the models frequently introduced D&D terminology, incompatible mechanics, incorrect rule structures, and unrelated gameplay assumptions.
This became one of the most persistent problems in the system.
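As a rough illustration of what a contamination check can look like (a minimal sketch, not the project's actual code), the function below scans generated text for a watchlist of D&D-specific terms. The term list, `DND_TERMS`, and `find_contamination` are hypothetical names and values chosen for the example; a real watchlist would need to be curated against the Daggerheart rules.

```python
import re

# Hypothetical watchlist: D&D-specific terms chosen for illustration only.
DND_TERMS = ["armor class", "saving throw", "spell slot", "challenge rating"]

# Word boundaries avoid matches inside longer words; the optional "s" catches plurals.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in DND_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def find_contamination(text: str) -> list:
    """Return the distinct watchlist terms found in text, lowercased and sorted."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text)})
```

A non-empty result can be treated as a hard failure that sends the draft back for repair rather than on to the table.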
Poor Mechanical Balance
Generated adversaries frequently contained wildly inflated values, unusable action economies, contradictory mechanics, and abilities that violated the intended design philosophy of the game. Some outputs were technically formatted correctly but completely unplayable.
Narrative Weakness
Even when the mechanics were functional, the content often lacked narrative usefulness. The generated material felt generic, instructional, emotionally flat, and disconnected from the campaign world.
The content technically existed, but it did not feel alive.
The Turning Point
The project changed significantly when I realized the problem was not simply model quality. The problem was architectural.
I was asking a single model to understand intent, generate content, validate rules, maintain lore consistency, preserve narrative quality, avoid contamination, and self-correct errors.
That was too much responsibility for a single generation layer.
The project did not improve when I asked the model to “be smarter.” It improved when the system became more structured.
From Prompting to Orchestration
The architecture evolved from a single-generation workflow into a multi-agent orchestration system with specialized responsibilities. Instead of one model trying to do everything, the system became a collaborative pipeline.

GM Intent
    ↓
Structured Prompt + Contracts
    ↓
Specialized Generator Agents
├── Adversary Generator
├── Environment Generator
├── Equipment Generator
└── NPC Generator
    ↓
Validation Layer
├── Rules Keeper
├── Lore Keeper
├── Content Repair Agent
└── Contamination Sentinel
    ↓
RAG + Semantic Retrieval
    ↓
Eval Harness
    ↓
Playable Content

Specialized Generator Agents
One of the most important architectural decisions was separating generators by content type.
Instead of a single content bot, the system introduced focused agents for adversary generation, environment generation, equipment generation, and NPC generation.
This added complexity, but it improved output consistency, iteration speed, validation accuracy, and narrative quality. Each generator could specialize around role expectations, balance constraints, formatting contracts, narrative patterns, and gameplay intent.
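One simple way to express this separation is a registry that routes each request to the generator for its content type. The sketch below is an assumption about how such routing could look, not the project's implementation; `GENERATORS`, `route`, and the stub generator functions are hypothetical, and a real generator would call a model with its own prompt and contract.

```python
from typing import Callable, Dict


# Hypothetical stand-ins: a real generator would call a model here.
def generate_adversary(intent: str) -> dict:
    return {"type": "adversary", "intent": intent}


def generate_environment(intent: str) -> dict:
    return {"type": "environment", "intent": intent}


# One entry per content type; adding a type means adding one focused generator.
GENERATORS: Dict[str, Callable[[str], dict]] = {
    "adversary": generate_adversary,
    "environment": generate_environment,
}


def route(content_type: str, intent: str) -> dict:
    """Send the GM's intent to the generator registered for this content type."""
    try:
        generator = GENERATORS[content_type]
    except KeyError:
        raise ValueError(f"no generator registered for {content_type!r}") from None
    return generator(intent)
```

Failing loudly on an unknown content type keeps a request from silently falling through to a generic, unspecialized generator.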

Validation as a First-Class System
The project improved substantially once validation became explicit rather than implied.
The system introduced deterministic validation, contracts, canonical enums, range checking, contamination detection, and structured repair workflows.
This was necessary because TTRPG systems are not purely binary. Many rules exist inside ranges, relationships, and gameplay expectations rather than strict yes/no correctness.
The solution became deterministic enforcement for structure and safety, paired with advisory review for narrative and experiential quality.
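The split between deterministic enforcement and advisory review can be sketched as a validator that returns two separate lists. Everything here is an illustrative assumption: the field names, the `Tier` enum, and the numeric limits are placeholders rather than official Daggerheart values or the project's real contract.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):  # canonical enum: any other value is rejected
    ONE = 1
    TWO = 2
    THREE = 3
    FOUR = 4


DIFFICULTY_RANGE = (8, 22)   # hypothetical hard limits, placeholder numbers
MIN_DESCRIPTION_WORDS = 8    # hypothetical advisory threshold
REQUIRED_FIELDS = ("name", "tier", "difficulty", "description")


@dataclass
class Report:
    errors: list = field(default_factory=list)      # deterministic failures that block the draft
    advisories: list = field(default_factory=list)  # soft notes left to the GM's judgment

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_adversary(draft: dict) -> Report:
    report = Report()
    for key in REQUIRED_FIELDS:
        if key not in draft:
            report.errors.append(f"missing field: {key}")
    if report.errors:
        return report  # structure is broken, so skip the value checks

    try:
        Tier(draft["tier"])
    except ValueError:
        report.errors.append(f"tier {draft['tier']!r} is not a canonical tier")

    low, high = DIFFICULTY_RANGE
    difficulty = draft["difficulty"]
    if not isinstance(difficulty, int) or not low <= difficulty <= high:
        report.errors.append(f"difficulty {difficulty!r} is outside {low}-{high}")

    if len(str(draft["description"]).split()) < MIN_DESCRIPTION_WORDS:
        report.advisories.append("description may be too thin to improvise from")
    return report
```

Only `errors` decide whether a draft passes; `advisories` travel with the content so the Game Master can act on them or ignore them.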

Keeper Agents and Editorial Oversight
The system eventually evolved into something closer to a board of editors than a simple generation workflow.
Rules Keeper
The Rules Keeper validated mechanical compliance, ranges, canonical structures, and gameplay consistency.
Lore Keeper
The Lore Keeper validated world consistency, narrative continuity, and semantic alignment.
Content Repair Agent
The Content Repair Agent handled schema repair, cleanup, formatting correction, and contract compliance adjustments.
This separation of responsibilities dramatically improved output quality.
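The "board of editors" idea can be sketched as a list of independent reviewers plus a bounded repair loop. This is a minimal sketch under stated assumptions: the reviewer checks, the `repair` logic, and the finding strings are hypothetical, and a real repair agent might re-prompt a model with the findings instead of patching fields directly.

```python
from typing import Callable, List, Tuple

Reviewer = Callable[[dict], List[str]]  # returns findings; an empty list means approved

TIER_FINDING = "tier must be between 1 and 4"
REGION_FINDING = "no region set, lore cannot be checked"


def rules_keeper(draft: dict) -> List[str]:
    return [] if draft.get("tier") in (1, 2, 3, 4) else [TIER_FINDING]


def lore_keeper(draft: dict) -> List[str]:
    return [] if draft.get("region") else [REGION_FINDING]


def repair(draft: dict, findings: List[str]) -> dict:
    """Hypothetical repair step: clamps an out-of-range integer tier, nothing else."""
    fixed = dict(draft)
    if TIER_FINDING in findings and isinstance(draft.get("tier"), int):
        fixed["tier"] = min(4, max(1, draft["tier"]))
    return fixed


def review(draft: dict, reviewers: List[Reviewer], max_repairs: int = 2) -> Tuple[dict, List[str]]:
    """Run every reviewer, repair, and re-review until clean or out of attempts."""
    def collect(current: dict) -> List[str]:
        return [finding for reviewer in reviewers for finding in reviewer(current)]

    findings = collect(draft)
    repairs = 0
    while findings and repairs < max_repairs:
        draft = repair(draft, findings)
        findings = collect(draft)
        repairs += 1
    return draft, findings
```

Any findings that survive the repair budget are returned rather than hidden, which leaves the final call with the human editor.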

RAG and Semantic Grounding
Retrieval-Augmented Generation became increasingly important as the project matured.
RAG was used for rules retrieval, semantic grounding, balance guidance, example retrieval, lore consistency, and gameplay pattern reinforcement.
Without grounding, the models drifted too easily into generic fantasy content, unrelated systems, and mechanically unstable designs.
The retrieval layer helped reinforce both system identity and gameplay philosophy.
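To make the grounding step concrete, here is a toy retrieval sketch. It ranks reference snippets by bag-of-words cosine similarity and prepends the best matches to the prompt. This is a deliberately simple stand-in: a production retrieval layer would more likely use embeddings and a vector store, and the function names and sample notes below are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _vector(text: str) -> Counter:
    """Lowercased word counts, used as a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, snippets: List[str], k: int = 2) -> List[str]:
    """Return the k snippets most similar to the query."""
    query_vec = _vector(query)
    ranked = sorted(snippets, key=lambda s: _cosine(query_vec, _vector(s)), reverse=True)
    return ranked[:k]


def grounded_prompt(intent: str, snippets: List[str], k: int = 2) -> str:
    """Prepend the retrieved reference material to the GM's request."""
    context = "\n".join(f"- {s}" for s in retrieve(intent, snippets, k))
    return f"Reference material:\n{context}\n\nRequest: {intent}"


# Hypothetical campaign notes, invented for the example.
NOTES = [
    "The marsh cult hides beneath the old lighthouse.",
    "Caravans avoid the northern pass in winter.",
    "The lighthouse keeper owes the cult a debt.",
]
```

Calling `grounded_prompt("cult ambush at the lighthouse", NOTES)` would pull in the two lighthouse notes and leave the caravan note out, so the model sees only material relevant to the request.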
Low Cognitive Load Design
One of the project’s core design goals became reducing cognitive load for Game Masters.
A good adversary or environment was not simply correct. It needed to fit naturally into the world, be quickly scannable, support improvisation, provide narrative prompts, and avoid unnecessary complexity.
Game Masters already juggle pacing, narrative, player engagement, rules interpretation, improvisation, and emotional tone. Dense or confusing content increases operational friction during live play.
Good Game Master tools should reduce cognitive load, not replace creativity.

The Tension Between Compliance and Fun
One of the most interesting lessons in the project involved the tension between strict rules compliance and experiential quality.
A small deviation from ideal balance was often acceptable if the encounter was memorable, the narrative supported it, and the scene created interesting decisions.
But inconsistency was destructive. The problem was never slight variance. The problem was unpredictability.
This reinforced an important design lesson: systems do not need perfect rigidity. They need understandable boundaries.
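One way to encode "understandable boundaries" is a three-zone check: a target range, a tolerated band around it, and everything else. The sketch below is illustrative only; `classify`, the zone names, and the numbers in the usage note are assumptions, not values from the game or the project.

```python
from typing import Tuple


def classify(value: int, target: Tuple[int, int], tolerance: int) -> str:
    """Place a value in the target range, the tolerated band around it, or outside both."""
    low, high = target
    if low <= value <= high:
        return "on_target"
    if low - tolerance <= value <= high + tolerance:
        return "acceptable_variance"
    return "out_of_bounds"
```

With a hypothetical target of `(10, 14)` and a tolerance of `2`, a value of 16 lands in `acceptable_variance` and 17 in `out_of_bounds`: small deviations pass with a note, while unpredictable ones are stopped.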
Local Models vs API Models
The project initially relied heavily on local models through Ollama using Mistral variants.
Local models were extremely valuable for experimentation, learning, rapid iteration, understanding system behavior, and architecture exploration. They helped expose validation problems, orchestration needs, prompt limitations, and structural weaknesses.
Eventually, API models significantly improved output quality and generation speed.
The move toward OpenAI API models did not eliminate the need for structure. It reinforced it. Better models still required contracts, retrieval, validation, orchestration, and explicit intent capture.
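A small interface makes this point concrete: if every backend exposes the same method, the validation gate stays identical whether the text came from a local model or a hosted API. This is a sketch under assumptions; `ModelBackend`, `StubBackend`, and `generate_checked` are hypothetical names, and the stub returns canned text instead of calling Ollama or any API.

```python
from typing import Callable, List, Protocol


class ModelBackend(Protocol):
    def complete(self, prompt: str) -> str: ...


class StubBackend:
    """Canned-reply backend for the example; a real one would call a model here."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def generate_checked(
    backend: ModelBackend,
    prompt: str,
    validate: Callable[[str], List[str]],
) -> str:
    """Apply the same validation gate no matter which backend produced the text."""
    text = backend.complete(prompt)
    problems = validate(text)
    if problems:
        raise ValueError("; ".join(problems))
    return text
```

Swapping the backend changes output quality and speed, but the contract check in `generate_checked` does not move.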
The Moment It Became Playable
The project finally started feeling genuinely usable once several systems converged: contracts, eval harnesses, RAG grounding, validation layers, specialized generators, and advisory review.
At that point, the generated content stopped feeling random and began feeling intentionally shaped.
Outputs became narratively coherent, mechanically usable, easier to improvise with, more aligned with campaign tone, and significantly lower effort to use in live sessions.
That was the moment the project shifted from interesting experiment to practical tool.
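An eval harness of the kind mentioned above can start very small: run the generator over a fixed set of intents and report the share of outputs that pass every check. The sketch below is a minimal assumption of that shape; `run_evals` and the example checks are hypothetical, and a real harness would likely track per-check failures and regressions over time.

```python
from typing import Callable, List


def run_evals(
    generate: Callable[[str], dict],
    cases: List[str],
    checks: List[Callable[[dict], bool]],
) -> float:
    """Return the fraction of cases whose output passes every check."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        output = generate(case)  # generate once per case, then apply all checks
        if all(check(output) for check in checks):
            passed += 1
    return passed / len(cases)


# Hypothetical checks for the example.
def has_name(output: dict) -> bool:
    return bool(output.get("name"))


def tier_in_range(output: dict) -> bool:
    return output.get("tier") in (1, 2, 3, 4)
```

Tracking this single pass rate across prompt, contract, or model changes gives an early signal of whether a change made the outputs more or less usable.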
What Still Needs Improvement
Cross-System Relationships
Adversaries and environments still need deeper interconnected behavior. The long-term goal is stronger systemic relationships between world state, factions, encounters, environments, and narrative consequences.
Campaign Knowledge Management
The project is increasingly growing into a combined Game Master operating system, campaign notebook, reusable content catalog, and structured worldbuilding system.
This expands the challenge from generation into knowledge architecture, retrieval design, and continuity management.
What I Would Do Differently
If restarting today, I would focus on one content type first, establish contracts earlier, build validation before expansion, avoid overloading a single generation agent, and implement retrieval and eval systems sooner.
Trying to support adversaries, equipment, and environments all at once created unnecessary complexity early in the project.
Broader Lessons
Daggerheart Forge ultimately became less about tabletop gaming and more about designing AI systems responsibly.
The project reinforced several ideas: AI systems need architecture, prompting alone is insufficient, intent capture matters, validation matters, role separation matters, retrieval matters, and human-centered constraints matter.
Most importantly, creativity does not disappear when structure is introduced. In many cases, thoughtful structure creates more room for creativity.
As both a father and a Game Master, the project helped me create more meaningful experiences with my family while also teaching me more about how AI systems succeed or fail in real-world environments.
The goal was never automation for its own sake. It was creating more room for storytelling.
Related Writing
Related essays and follow-up notes will be added as the Daggerheart Forge work continues.
