AI-Powered D&D Session Archival Pipeline
Spitwater Campaign, Session 1: Democracy via Large Artillery
GygaxBot turns raw Discord D&D session transcripts into illustrated, narrated adventure archives, fully automated.
Discord bot captures the full session transcript from the game channel
Discord.js
Local LLM analyzes the transcript and extracts 4-6 key dramatic scenes
Ollama / Llama 3.1
Each scene gets an illustration using character reference images for visual consistency
Gemini + Reference Photos
Dramatic narration via voice cloning from a reference WAV, or high-quality synthesis as fallback
Fish Speech / Kokoro TTS
Sessions are indexed into a vector database for semantic search across campaign history
ChromaDB
Character reference images are fed to the AI image generator to maintain visual consistency across all scene illustrations.
The party explores an ancient airship wreck, battles turquoise raptors, discovers a targeting system, and makes a very democratic decision involving a very large cannon.
As the party explores the airship wreck, a sudden burst of light illuminates the observation dome. Three raptors emerge from the shadows, their turquoise scales glistening in the fading light. Threx stands firm against the onslaught, but the team's coordination is put to the test as they work together to take down the attackers.
Airship Observation Dome
As they venture deeper into the wreck, the party stumbles upon a mysterious targeting system. The airship's advanced technology lies dormant, its blue energy sphere pulsating with an otherworldly glow. Crazy88's skilled hands work to repair the malfunctioning circuit board, and the team holds their breath as he rolls a 23.
Airship Observation Dome
The party makes a democratic decision to fire the cannon at Gus's oasis, motivated by his previous extortion attempts. Chad enters the coordinates with precision, and the cannon roars to life as it unleashes a devastating blast that obliterates the oasis. The recoil is so immense that the cannon destroys itself.
Airship Observation Deck
As the night watch begins, Threx takes his turn to gaze out into the darkness. Suddenly, he notices a series of dust silhouettes pointing towards the deeper wreck entrance. A figure emerges from the shadows, gesturing wildly before vanishing into thin air. Was it real or just a mirage?
Airship Wreck
As Chad takes his turn on night watch, he's tasked with keeping an eye out for any potential threats. His attention is drawn to a rust rat scurrying across the hangar floor, caught in the faint light of his lantern. The party breathes a collective sigh of relief as they realize there are no immediate dangers lurking in the shadows.
Airship Hangar
Crazy88 takes his turn on night watch, but it's a quiet and uneventful shift. He stands vigilant, scanning the shadows for any signs of danger, but the only sound is the gentle creaking of metal as the airship adjusts to its new surroundings.
Airship Wreck
The full technology stack powering automated D&D session archival.
The raw Discord transcript is sent to a locally-hosted Llama 3.1 8B model running via Ollama. A carefully crafted prompt instructs the model to identify 4-6 key dramatic moments, prioritizing combat, dramatic events, exploration, and atmospheric scenes. The model outputs structured JSON with scene titles, visual descriptions for image generation, narration scripts for TTS, character lists, and locations.
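The extraction step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the JSON field names (`title`, `visual_description`, `narration`, `characters`, `location`), the top-level `scenes` key, and the prompt wording are all assumptions chosen to mirror the outputs listed in the paragraph. The payload targets Ollama's `/api/generate` endpoint with JSON-constrained output.

```python
import json
from dataclasses import dataclass

# Hypothetical field names; the source only lists what kind of data each scene carries.
REQUIRED_KEYS = ("title", "visual_description", "narration", "characters", "location")


@dataclass
class Scene:
    title: str
    visual_description: str  # fed to the image generator
    narration: str           # fed to the TTS backend
    characters: list
    location: str


def build_ollama_payload(transcript: str) -> dict:
    """Build a request body for Ollama's POST /api/generate endpoint."""
    prompt = (
        "Identify the 4-6 key dramatic moments in this D&D session transcript. "
        "Prioritize combat, dramatic events, exploration, and atmospheric scenes. "
        'Respond with JSON shaped as {"scenes": [{"title", "visual_description", '
        '"narration", "characters", "location"}]}.\n\n' + transcript
    )
    # "format": "json" asks Ollama to constrain the output to valid JSON.
    return {"model": "llama3.1:8b", "prompt": prompt, "format": "json", "stream": False}


def parse_scenes(raw: str, min_scenes: int = 4, max_scenes: int = 6) -> list:
    """Validate the model's JSON text and convert it into Scene objects."""
    scenes = json.loads(raw).get("scenes", [])
    if not min_scenes <= len(scenes) <= max_scenes:
        raise ValueError(f"expected {min_scenes}-{max_scenes} scenes, got {len(scenes)}")
    parsed = []
    for item in scenes:
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"scene is missing fields: {missing}")
        parsed.append(Scene(**{key: item[key] for key in REQUIRED_KEYS}))
    return parsed
```

In use, the payload would be POSTed to the local Ollama server (by default `http://localhost:11434/api/generate`) and the `response` field of the reply passed to `parse_scenes`. Validating the scene count and required fields up front keeps a malformed model reply from reaching the image and TTS stages.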
Each scene's visual description is enhanced with campaign-specific style tags (western frontier aesthetic, sepia tones, warm sunset lighting) and sent to Gemini's image generation API. The system automatically finds character reference images on disk and includes them in the request, allowing Gemini to maintain visual consistency — Chad always looks like a centaur, Threx always looks like a red dragonborn. Up to 12 reference images can be included per generation. A local Flux.1 Dev backend via ComfyUI is also available as an alternative.
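The prompt enhancement and reference-image lookup might look something like the sketch below. It stops short of the Gemini call itself and only prepares the inputs. The file-naming convention (reference images whose filename starts with the character's name) and the function names are assumptions for illustration; the style tags and the 12-image cap come from the description above.

```python
from pathlib import Path

# Style tags taken from the description; per-campaign storage is an assumption.
STYLE_TAGS = ("western frontier aesthetic", "sepia tones", "warm sunset lighting")
MAX_REFERENCE_IMAGES = 12
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_image_prompt(visual_description: str, style_tags=STYLE_TAGS) -> str:
    """Append the campaign style tags to a scene's visual description."""
    return f"{visual_description.strip()} Style: {', '.join(style_tags)}."


def select_reference_images(available, characters, limit=MAX_REFERENCE_IMAGES):
    """Pick reference images whose filename starts with a character's name.

    Hypothetical convention: files are named like 'chad_front.png' or 'threx.jpg'.
    """
    wanted = [name.lower() for name in characters]
    matches = [
        path
        for path in sorted(available)
        if path.suffix.lower() in IMAGE_SUFFIXES
        and any(path.stem.lower().startswith(name) for name in wanted)
    ]
    return matches[:limit]  # never send more than the per-request cap
```

For a scene featuring Chad and Threx, `select_reference_images(list(ref_dir.iterdir()), ["Chad", "Threx"])` would return only their images, which would then be attached to the generation request alongside the prompt from `build_image_prompt`.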
Two TTS backends are available, one serving as a fallback for the other. Fish Speech handles voice cloning: drop a narrator reference WAV in the campaign directory and all narrations use that cloned voice for consistency across sessions. When no reference audio exists, Kokoro TTS provides high-quality synthesis with campaign-specific voice profiles. Both run locally on CPU and output 24 kHz WAV files, and the pipeline automatically selects the best available backend per campaign.
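The per-campaign backend selection could be as simple as checking for the reference file. A minimal sketch, assuming the reference is stored as `narrator.wav` directly inside the campaign directory (the filename, function name, and returned dictionary shape are hypothetical):

```python
from pathlib import Path

SAMPLE_RATE_HZ = 24000  # both backends are described as producing 24 kHz WAV output


def select_tts_backend(campaign_dir: Path, reference_name: str = "narrator.wav") -> dict:
    """Prefer voice cloning when a narrator reference WAV exists, else fall back."""
    reference = campaign_dir / reference_name
    if reference.is_file():
        # A reference voice is available: clone it with Fish Speech.
        return {"backend": "fish_speech", "reference_wav": reference, "sample_rate": SAMPLE_RATE_HZ}
    # No reference audio: use Kokoro TTS with the campaign's voice profile.
    return {"backend": "kokoro", "reference_wav": None, "sample_rate": SAMPLE_RATE_HZ}
```

Keeping the decision in one small function means a campaign switches to the cloned voice as soon as the reference file is dropped in, with no configuration change.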
Every session is chunked and indexed into a ChromaDB vector store using sentence transformer embeddings. This enables semantic search across all campaign history — ask "when did the party find the cannon?" and get relevant transcript sections, scene descriptions, and narration from any session. The RAG system powers both the dashboard search and a dedicated API endpoint.
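The chunking step that precedes indexing might be sketched as below. The chunk size, the line overlap, and the id and metadata schemes are assumptions, not values from the project. The function returns parallel lists in the shape that a ChromaDB collection's `add` method accepts (`ids`, `documents`, `metadatas`).

```python
def chunk_transcript(lines, session_id, max_chars=800, overlap_lines=2):
    """Group transcript lines into overlapping chunks ready for vector indexing.

    max_chars and overlap_lines are illustrative defaults, not project settings.
    """
    chunks, current = [], []
    for line in lines:
        size_with_line = sum(len(text) + 1 for text in current) + len(line) + 1
        if current and size_with_line > max_chars:
            chunks.append(current)
            # Carry the last few lines forward so context spans chunk boundaries.
            current = current[-overlap_lines:] if overlap_lines else []
        current.append(line)
    if current:
        chunks.append(current)
    return {
        "ids": [f"{session_id}-chunk-{i}" for i in range(len(chunks))],
        "documents": ["\n".join(chunk) for chunk in chunks],
        "metadatas": [{"session": session_id, "chunk": i} for i in range(len(chunks))],
    }
```

With the `chromadb` package installed, the result could be passed along as `collection.add(**batch)`, and a question such as "when did the party find the cannon?" issued with `collection.query(query_texts=[question], n_results=5)`. Overlapping chunks are a common way to avoid splitting an exchange between players across an embedding boundary.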
The entire pipeline is orchestrated by a FastAPI server. When GygaxBot sends a transcript, the server immediately returns a job ID and processes everything asynchronously: scene extraction (CPU), TTS narration (CPU, parallel), and image generation (GPU, sequential). The bot polls for completion and posts results back to Discord. The dashboard provides a browser-based view of all archived sessions with media playback.
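The job flow described above can be illustrated without any web framework. The sketch below uses plain `asyncio` rather than FastAPI, with placeholder stage functions standing in for the real LLM, TTS, and image calls; the in-memory `JOBS` registry and all function names are assumptions. It also assumes narration finishes before image generation starts, which the description does not specify. It shows the three ideas from the paragraph: return a job ID immediately, narrate scenes in parallel, and generate images one at a time.

```python
import asyncio
import uuid

JOBS = {}  # job_id -> job record; an in-memory registry is an assumption


async def extract_scenes(transcript):
    await asyncio.sleep(0)  # placeholder for the LLM call
    return [line for line in transcript.splitlines() if line.strip()]


async def narrate(scene):
    await asyncio.sleep(0)  # placeholder for a TTS call
    return f"audio:{scene}"


async def illustrate(scene):
    await asyncio.sleep(0)  # placeholder for an image-generation call
    return f"image:{scene}"


def submit(transcript):
    """Register a job and return its ID right away, before any work is done."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "transcript": transcript, "result": None}
    return job_id


async def run_pipeline(job_id):
    job = JOBS[job_id]
    job["status"] = "running"
    try:
        scenes = await extract_scenes(job["transcript"])
        # Narration: all scenes at once.
        audio = await asyncio.gather(*(narrate(scene) for scene in scenes))
        # Images: one at a time, so a single GPU is never oversubscribed.
        images = [await illustrate(scene) for scene in scenes]
        job["result"] = {"scenes": scenes, "audio": list(audio), "images": images}
        job["status"] = "done"
    except Exception as exc:
        job["status"] = "failed"
        job["error"] = str(exc)


async def demo(transcript):
    """Submit a job, then poll its status the way the bot is described as doing."""
    job_id = submit(transcript)
    task = asyncio.create_task(run_pipeline(job_id))
    while JOBS[job_id]["status"] not in ("done", "failed"):
        await asyncio.sleep(0.01)
    await task
    return JOBS[job_id]
```

Running `asyncio.run(demo("Raptors attack\nCannon fires"))` returns the finished job record. In a FastAPI server, `submit` would sit behind a POST route and the status lookup behind a GET route, with the pipeline scheduled as a background task.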