
GygaxBot

AI-Powered D&D Session Archival Pipeline

Spitwater Campaign — Session 1: Democracy via Large Artillery
Scenes Extracted: 6
AI Illustrations: 6
Audio Narrations: 6
Pipeline Time: <7m

How It Works

GygaxBot turns raw Discord D&D session transcripts into illustrated, narrated adventure archives — fully automated.

1. 📝 Transcript Capture
Discord bot captures the full session transcript from the game channel
Discord.js

2. 🧠 Scene Extraction
Local LLM analyzes the transcript and extracts 4-6 key dramatic scenes
Ollama / Llama 3.1

3. 🎨 AI Illustration
Each scene gets an illustration using character reference images for visual consistency
Gemini + Reference Photos

4. 🔊 TTS Narration
Dramatic narration via voice cloning from a reference WAV, or high-quality synthesis as fallback
Fish Speech + Kokoro TTS

5. 🔍 RAG Indexing
Sessions are indexed into a vector database for semantic search across campaign history
ChromaDB

The Party

Character reference images are fed to the AI image generator to maintain visual consistency across all scene illustrations.

Chad

Centaur Ranger
Frontier wilderness master with his mystical mammoth companion Mammo

Crazy 88

Warforged Artificer
Brilliant inventor of wild contraptions and steam-powered gadgets

Threx

Red Dragonborn Warlock
Frontier spirit pactmaker who brings fire and brimstone to the dusty trails

Session 1: Democracy via Large Artillery

The party explores an ancient airship wreck, battles turquoise raptors, discovers a targeting system, and makes a very democratic decision involving a very large cannon.

Scene 1

Raptor Ambush at Observation Dome

Threx, Ricky, Chad

As the party explores the airship wreck, a sudden burst of light illuminates the observation dome. Three raptors emerge from the shadows, their turquoise scales glistening in the fading light. Threx stands firm against the onslaught, but the team's coordination is put to the test as they work together to take down the attackers.

Airship Observation Dome

Scene 2

Targeting System Discovery

Crazy88, Threx

As they venture deeper into the wreck, the party stumbles upon a mysterious targeting system. The airship's advanced technology lies dormant, its blue energy sphere pulsating with an otherworldly glow. Crazy88's skilled hands work to repair the malfunctioning circuit board, and the team holds their breath as he rolls a 23.

Airship Observation Dome

Scene 3

Cannon Firing at Gus's Oasis

Chad, Threx

The party makes a democratic decision to fire the cannon at Gus's oasis, motivated by his previous extortion attempts. Chad enters the coordinates with precision, and the cannon roars to life as it unleashes a devastating blast that obliterates the oasis. The recoil is so immense that the cannon destroys itself.

Airship Observation Deck

Scene 4

Night Watch: Threx's Silhouette Encounter

Threx

As the night watch begins, Threx takes his turn to gaze out into the darkness. Suddenly, he notices a series of dust silhouettes pointing towards the deeper wreck entrance. A figure emerges from the shadows, gesturing wildly before vanishing into thin air. Was it real or just a mirage?

Airship Wreck

Scene 5

Rust Rat Capture by Chad's Watch

Chad

As Chad takes his turn on night watch, he's tasked with keeping an eye out for any potential threats. His attention is drawn to a rust rat scurrying across the hangar floor, caught in the faint light of his lantern. The party breathes a collective sigh of relief as they realize there are no immediate dangers lurking in the shadows.

Airship Hangar

Scene 6

Crazy88's Uneventful Watch

Crazy88

Crazy88 takes his turn on night watch, but it's a quiet and uneventful shift. He stands vigilant, scanning the shadows for any signs of danger, but the only sound is the gentle creaking of metal as the airship adjusts to its new surroundings.

Airship Wreck

Under the Hood

The full technology stack powering automated D&D session archival.

Scene Extraction — Ollama + Llama 3.1 8B

The raw Discord transcript is sent to a locally hosted Llama 3.1 8B model running via Ollama. The prompt instructs the model to identify 4-6 key dramatic moments, prioritizing combat, dramatic events, exploration, and atmospheric scenes. The model outputs structured JSON containing, for each scene, a title, a visual description for image generation, a narration script for TTS, a character list, and a location.
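
As a rough illustration of the extraction step, the sketch below posts a transcript to Ollama's local HTTP endpoint and validates the JSON that comes back. The prompt wording, the field names on `Scene`, and the helper names `parse_scenes` and `extract_scenes` are assumptions made for this example; GygaxBot's actual prompt and schema are not shown on this page.

```python
import json
import urllib.request
from dataclasses import dataclass

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt; the real one is not published here.
PROMPT_HEADER = (
    "Identify 4-6 key dramatic scenes in this D&D session transcript. "
    "Prioritize combat, dramatic events, exploration, and atmosphere. "
    'Reply with JSON shaped like {"scenes": [{"title": "...", '
    '"visual_description": "...", "narration": "...", '
    '"characters": ["..."], "location": "..."}]}.\n\nTranscript:\n'
)


@dataclass
class Scene:
    """Assumed per-scene schema, mirroring the fields listed above."""
    title: str
    visual_description: str  # fed to the image generator
    narration: str           # fed to TTS
    characters: list[str]
    location: str


def parse_scenes(raw: str, min_scenes: int = 4, max_scenes: int = 6) -> list[Scene]:
    """Validate the model's JSON reply and convert it to Scene objects."""
    data = json.loads(raw)
    scenes = [Scene(**item) for item in data["scenes"]]  # TypeError on missing/extra keys
    if not min_scenes <= len(scenes) <= max_scenes:
        raise ValueError(f"expected {min_scenes}-{max_scenes} scenes, got {len(scenes)}")
    return scenes


def extract_scenes(transcript: str, model: str = "llama3.1:8b") -> list[Scene]:
    """Ask a local Ollama server for scenes (requires Ollama to be running)."""
    payload = json.dumps({
        "model": model,
        "prompt": PROMPT_HEADER + transcript,
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "stream": False,   # return one complete response object
    }).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_scenes(json.loads(response.read())["response"])
```

Keeping validation in `parse_scenes` separate from the network call means a malformed or off-count reply can be detected and retried without touching the rest of the pipeline.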

AI Illustration — Google Gemini with Reference Images

Each scene's visual description is enhanced with campaign-specific style tags (western frontier aesthetic, sepia tones, warm sunset lighting) and sent to Gemini's image generation API. The system automatically finds character reference images on disk and includes them in the request, allowing Gemini to maintain visual consistency — Chad always looks like a centaur, Threx always looks like a red dragonborn. Up to 12 reference images can be included per generation. A local Flux.1 Dev backend via ComfyUI is also available as an alternative.
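
The prompt enrichment and reference lookup might look something like the following sketch. The directory layout (one folder per character under a references directory), the accepted file extensions, and both function names are hypothetical; only the style tags and the 12-image cap come from the description above. The call to Gemini itself is left out.

```python
from pathlib import Path

# Style tags quoted from the campaign description above.
STYLE_TAGS = ["western frontier aesthetic", "sepia tones", "warm sunset lighting"]
MAX_REFERENCE_IMAGES = 12
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of accepted formats


def build_prompt(visual_description: str) -> str:
    """Append the campaign style tags to a scene's visual description."""
    return f"{visual_description.rstrip('. ')}. Style: {', '.join(STYLE_TAGS)}."


def find_reference_images(characters: list[str], ref_dir: Path) -> list[Path]:
    """Collect reference images for the characters in a scene.

    Assumes a layout like <ref_dir>/<character name, lowercased>/<image files>.
    Characters without a folder are skipped; the result is capped at 12 paths.
    """
    found: list[Path] = []
    for name in characters:
        char_dir = ref_dir / name.lower().replace(" ", "_")
        if not char_dir.is_dir():
            continue
        for path in sorted(char_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                found.append(path)
    return found[:MAX_REFERENCE_IMAGES]
```

Truncating after collection favors characters listed first in a scene; a real implementation might instead split the budget evenly between characters.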

Voice Narration — Fish Speech + Kokoro TTS

Two TTS backends cover narration. Fish Speech handles voice cloning: drop a narrator reference WAV in the campaign directory and all narrations use that cloned voice for consistency across sessions. When no reference audio exists, Kokoro TTS provides high-quality synthesis with campaign-specific voice profiles. Both run locally on CPU and output 24 kHz WAV files, and the pipeline automatically selects the best available backend per campaign.
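
The per-campaign backend choice comes down to a file check. Here is a minimal sketch of that decision, assuming the reference clip is named `narrator_reference.wav`; the filename, the `TTSChoice` type, and the backend labels are all hypothetical. The synthesis calls into Fish Speech and Kokoro are omitted.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

REFERENCE_WAV = "narrator_reference.wav"  # hypothetical filename
SAMPLE_RATE_HZ = 24_000                   # both backends are described as emitting 24 kHz WAV


@dataclass
class TTSChoice:
    backend: str                   # "fish_speech" or "kokoro" (labels assumed)
    reference_wav: Optional[Path]  # set only when voice cloning is possible
    sample_rate_hz: int = SAMPLE_RATE_HZ


def select_tts_backend(campaign_dir: Path) -> TTSChoice:
    """Prefer voice cloning when a reference WAV exists, else fall back to Kokoro."""
    reference = campaign_dir / REFERENCE_WAV
    if reference.is_file():
        return TTSChoice(backend="fish_speech", reference_wav=reference)
    return TTSChoice(backend="kokoro", reference_wav=None)
```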

Semantic Search — ChromaDB + SentenceTransformers

Every session is chunked and indexed into a ChromaDB vector store using sentence transformer embeddings. This enables semantic search across all campaign history — ask "when did the party find the cannon?" and get relevant transcript sections, scene descriptions, and narration from any session. The RAG system powers both the dashboard search and a dedicated API endpoint.
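
To make the chunk-and-index idea concrete, the sketch below splits a transcript into overlapping word windows and stores them in a ChromaDB collection. The chunk size, overlap, collection name, and function names are assumptions, and the example relies on ChromaDB's default embedding function rather than whichever SentenceTransformers model GygaxBot actually uses. `chromadb` is imported lazily, so the chunker runs without it installed.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks


def index_session(session_id: str, transcript: str, db_path: str = "./chroma_db") -> int:
    """Chunk one session and add it to a persistent collection (needs `pip install chromadb`)."""
    import chromadb  # lazy import: only required when indexing

    chunks = chunk_text(transcript)
    if not chunks:
        return 0
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name="campaign_sessions")  # name assumed
    collection.add(
        ids=[f"{session_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"session": session_id, "chunk": i} for i in range(len(chunks))],
    )
    return len(chunks)


def search(question: str, db_path: str = "./chroma_db", n_results: int = 3) -> list[str]:
    """Return the stored chunks most similar to the question."""
    import chromadb

    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name="campaign_sessions")
    result = collection.query(query_texts=[question], n_results=n_results)
    return result["documents"][0]  # one result list per query text
```

With sessions indexed, a call such as `search("when did the party find the cannon?")` would return the closest transcript chunks. The overlap keeps a moment that straddles a chunk boundary retrievable from at least one chunk.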

Orchestration — FastAPI + Async Pipeline

The entire pipeline is orchestrated by a FastAPI server. When GygaxBot sends a transcript, the server immediately returns a job ID and processes everything asynchronously: scene extraction (CPU), TTS narration (CPU, parallel), and image generation (GPU, sequential). The bot polls for completion and posts results back to Discord. The dashboard provides a browser-based view of all archived sessions with media playback.
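
The job flow can be approximated with plain `asyncio`, as in the sketch below. The three stage functions are stand-ins that fabricate filenames rather than calling real models, the in-memory `jobs` registry is an assumption, and the FastAPI routes are left out. Running all narration before any illustration is also an assumption, since the description does not say whether the two stages overlap.

```python
import asyncio
import uuid

jobs: dict[str, dict] = {}  # in-memory job registry (assumed; a real server might persist this)


async def extract_scenes(transcript: str) -> list[str]:
    """Stand-in for LLM scene extraction: one 'scene' per non-empty line."""
    await asyncio.sleep(0)
    return [line.strip() for line in transcript.splitlines() if line.strip()]


async def narrate(scene: str) -> str:
    """Stand-in for TTS; returns a fake output filename."""
    await asyncio.sleep(0)
    return f"{scene}.wav"


async def illustrate(scene: str) -> str:
    """Stand-in for image generation; returns a fake output filename."""
    await asyncio.sleep(0)
    return f"{scene}.png"


async def run_pipeline(job_id: str, transcript: str) -> None:
    """Run all stages for one job and record the outcome in the registry."""
    job = jobs[job_id]
    try:
        scenes = await extract_scenes(transcript)
        # Narrations are launched together and awaited as a group (parallel).
        narrations = await asyncio.gather(*(narrate(s) for s in scenes))
        # Images are awaited one at a time (sequential), as described for the GPU stage.
        images = [await illustrate(s) for s in scenes]
        job.update(status="done", narrations=narrations, images=images)
    except Exception as exc:
        job.update(status="failed", error=str(exc))


def submit(transcript: str) -> str:
    """Register a job, start it in the background, and return its ID immediately.

    Must be called from inside a running event loop.
    """
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "processing"}
    # Keep a reference to the task so it is not garbage-collected mid-run.
    jobs[job_id]["task"] = asyncio.create_task(run_pipeline(job_id, transcript))
    return job_id
```

In a FastAPI server, a POST handler could call `submit` and return the job ID, while a GET handler could report `jobs[job_id]["status"]` for the bot to poll.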