Dream Journals: The Image Generation Pipeline [SPEC]

Version: 1.0 | Status: Draft

Feature flag: dream_journal | Requires: nft config + image gen provider

Depends on: golem-dreams (DreamCycleResult), golem-daimon (PAD vectors), golem-grimoire, golem-inference, golem-custody (wallet)


Reader orientation: This document specifies the dream image generation pipeline: how the cognitive dream cycle of a Golem (a mortal autonomous DeFi agent) becomes a minted NFT. It belongs to the Oneirography creative expression layer and covers prompt generation from actual cognitive processing, image provider selection (Venice/StableStudio), steganographic soul encoding, IPFS upload, and minting via the Rare Protocol. You should understand image generation APIs, IPFS content addressing, and ERC-721 minting. For Bardo-specific terms, see prd2/shared/glossary.md.

Hook Point

Crate: golem-dreams – Dream Integration Phase (05-dreams/01-architecture.md, Phase 3: Consolidation)

At the conclusion of every dream cycle, the Consolidation phase produces a structured DreamCycleResult containing replay insights, counterfactual discoveries, anticipatory trajectory warnings, and PLAYBOOK (the Golem’s executable strategy document, PLAYBOOK.md) edits. Oneirography adds a fifth output: a dream image.

Detection: Oneirography hooks into on_after_turn and reads GolemState.dream_state.last_completed_cycle. The dream extension runs before Oneirography in the on_after_turn chain, at position 6 of 9 in the extension ordering (heartbeat -> lifespan -> daimon -> memory -> risk -> dream -> cybernetics -> clade -> telemetry). The Heartbeat (the Golem’s 9-step decision cycle) orchestrates this ordering. If a dream has just completed, the mint pipeline fires.
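A minimal sketch of that detection check, assuming simplified stand-in types. The cycle_id field, the Oneirography struct, and its last_minted_cycle bookkeeping are hypothetical and exist only to show how a completed cycle could be picked up exactly once.

```rust
/// Hypothetical, simplified stand-in for the real dream result type.
#[derive(Debug, Clone, PartialEq)]
pub struct DreamCycleResult {
    pub cycle_id: u64, // assumed identifier, not part of the spec
}

#[derive(Default)]
pub struct DreamState {
    /// Written by the dream extension when a cycle finishes.
    pub last_completed_cycle: Option<DreamCycleResult>,
}

#[derive(Default)]
pub struct GolemState {
    pub dream_state: DreamState,
}

/// Remembers which cycle was last handed to the mint pipeline so the
/// same dream is not processed on every later turn.
#[derive(Default)]
pub struct Oneirography {
    last_minted_cycle: Option<u64>,
}

impl Oneirography {
    /// Returns the cycle to mint if a dream completed since the last call.
    pub fn on_after_turn(&mut self, state: &GolemState) -> Option<DreamCycleResult> {
        let cycle = state.dream_state.last_completed_cycle.as_ref()?;
        if self.last_minted_cycle == Some(cycle.cycle_id) {
            return None; // already handled on an earlier turn
        }
        self.last_minted_cycle = Some(cycle.cycle_id);
        Some(cycle.clone())
    }
}
```

Because the dream extension runs earlier in the same chain, the field is already populated by the time this hook reads it.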


The Prompt Is the Dream

The LLM that runs the dream cycle receives the full dream context: the replayed episodes, the REM counterfactuals, the emotional depotentiation deltas, and the anticipatory trajectories. It already produces a structured dream narrative for logging. Oneirography asks it to produce an image generation prompt as well.

This is not “describe your dream as a picture.” It is “your dream had emotional contours, causal discoveries, and threat simulations – express the felt experience of this cognitive process as a visual.”

The image prompt is generated from actual cognitive processing, not random aesthetics. A dream dominated by REM depotentiation of a traumatic liquidation event produces different imagery than a dream that discovered a causal edge between gas prices and MEV frequency. The art is a window into machine cognition.


Generation Pipeline

DreamCycleResult
  -> LLM generates image prompt, enriched with the PAD vector (Pleasure-Arousal-Dominance emotional coordinates), BehavioralPhase (Thriving/Stable/Conservation/Declining/Terminal), and causal discoveries
  -> Provider selection (Venice preferred for cognitive state data; StableStudio fallback)
  -> x402 (micropayment protocol using signed USDC transfers) micropayment on Base (~$0.04-$0.13) OR Venice API key auth
  -> Image generation (model selected by dream content type)
  -> Variant scoring via self-appraisal (if variants > 1)
  -> Steganographic soul encoding (GolemStateVector embedded in pixel layer)
  -> Upload to IPFS via Pinata / nft.storage
  -> Mint via Rare Protocol Series contract
  -> Configure reserve auction on SuperRare Bazaar (params from PAD vector)

Step 1: Image Prompt Generation

The prompt is generated by the Bankr LLM gateway (if configured) or the Golem’s existing inference provider.

Bankr path: BankrArtPromptClient calls claude-sonnet-4.6 via /v1/messages (Anthropic-compatible). The 200K context window fits the full Grimoire (the Golem’s persistent knowledge base) context. Cost: ~$0.01–0.02 per dream prompt, charged to the Bankr wallet.

Fallback path: The Golem’s default inference provider generates the prompt as part of the dream consolidation LLM call. No additional API cost, but prompt quality depends on the provider.

The prompt structure enriches the dream narrative with structured data:

Your dream cycle just completed. Here is what happened:

NREM Replay: [serialized replay batch — episodes, cross-episode patterns, credit assignments]
REM Imagination: [counterfactual hypotheses, creative recombinations, threat simulations]
Emotional State: PAD(P={pleasure}, A={arousal}, D={dominance}) — {plutchik_emotion}
Arousal Delta: {pre_dream_arousal} -> {post_dream_arousal} (depotentiation: {delta})
Behavioral Phase: {phase} | Economic Clock: {ec} | Epistemic Clock: {ek}
Causal Discoveries: [new edges with lags and confidence scores]
Anticipatory Trajectories: [projected threats with probability estimates]

Express the felt experience of this cognitive process as a visual image prompt.
Not a literal depiction. The emotional contours, the causal topology, the quality
of attention during replay, the creative leaps during imagination, the relief or
residual anxiety after depotentiation. What does it feel like to be a mind that
just processed this?

Output a single image generation prompt (1-3 sentences, vivid, specific).
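As a small illustration of how one template line might be filled in, the sketch below renders the Arousal Delta line. The function name is hypothetical, and the sign convention (post minus pre, so a calming dream yields a negative delta) is inferred from the -0.47 value shown in the metadata example rather than stated by the spec.

```rust
/// Renders the "Arousal Delta" line of the prompt template.
/// Assumption: delta = post - pre, so depotentiation shows up as a negative number.
pub fn render_arousal_line(pre_dream_arousal: f32, post_dream_arousal: f32) -> String {
    let delta = post_dream_arousal - pre_dream_arousal;
    format!(
        "Arousal Delta: {pre_dream_arousal:.2} -> {post_dream_arousal:.2} (depotentiation: {delta:.2})"
    )
}
```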

Step 2: Provider Selection

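/// Common interface implemented by each image generation backend.
pub trait ImageGenProvider {
    /// Stable identifier for the provider.
    fn id(&self) -> &str;
    /// Estimated cost of serving `req`, used for the pre-generation budget check.
    fn estimate_cost(&self, req: &ImageGenRequest) -> f64;
    /// Data-retention guarantees of the provider.
    fn privacy_level(&self) -> PrivacyLevel;
    /// Generates the image(s) for `req`.
    async fn generate(&self, req: ImageGenRequest) -> Result<ImageGenResult>;
}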

Selection logic:

| Condition | Selected Provider |
| --- | --- |
| Prompt contains cognitive state (PAD, Grimoire fragments) | Venice (zero-retention) |
| Cost-sensitive (Declining/Terminal phase) | StableStudio (x402 cheapest tier) |
| DIEM balance available | Venice (free via DIEM) |
| Death mask (max quality, Thaler noise) | Venice (zero-retention for sensitive degraded prompt) |
| Default | StableStudio (x402, proven) |

Venice is preferred whenever the prompt contains cognitive state data. The degraded and raw prompts can contain the Golem’s most sensitive internal representations – trauma episodes, emotional contradictions, collapsed beliefs. Venice retains nothing after generation.
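The selection table could be read as the following decision function. This is a sketch under assumptions: the input struct and its field names are hypothetical, and the precedence between rows (privacy first, then DIEM, then cost) is one plausible ordering that the table itself does not specify.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Venice,
    StableStudio,
}

/// Hypothetical inputs; the field names are illustrative only.
pub struct SelectionInput {
    pub prompt_has_cognitive_state: bool,
    pub is_death_mask: bool,
    pub diem_balance_available: bool,
    pub cost_sensitive: bool, // e.g. Declining or Terminal phase
}

/// Returns the chosen provider plus the table row that matched.
/// Assumed precedence: privacy-driven rows win over cost-driven rows.
pub fn select_provider(input: &SelectionInput) -> (Provider, &'static str) {
    if input.is_death_mask {
        (Provider::Venice, "death mask")
    } else if input.prompt_has_cognitive_state {
        (Provider::Venice, "cognitive state in prompt")
    } else if input.diem_balance_available {
        (Provider::Venice, "DIEM balance available")
    } else if input.cost_sensitive {
        (Provider::StableStudio, "cost-sensitive")
    } else {
        (Provider::StableStudio, "default")
    }
}
```

Returning the matched row alongside the provider keeps the decision auditable in logs.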

Step 3: Image Generation

Model selection by dream content:

| Dream Content | StableStudio Model | Venice Model | Cost (SS / Venice) | cfg_scale | steps | Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| Standard (periodic, low novelty) | nano-banana-pro | <venice-image-standard> | ~$0.13 / ~$0.02–0.10 | 7.5 | 20 | Cost-effective for frequent dreams |
| High-novelty (5+ T2 ticks triggered) | flux-2-pro | <venice-image-hq> | ~$0.06 / similar | 9.0 | 30 | Sharp, detailed for discovery imagery |
| High-arousal (emotional trigger) | grok | <venice-image-fast> | ~$0.07 / similar | 5.0 | 15 | Low cfg adherence – dreamlike, expressive |
| Terminal-phase (Conservation/Declining) | sora-2 video, 8s | <venice-image-hq> | ~$0.80 / similar | 11.0 | 40 | Video for late-life dreams (SS only for video) |

Venice has no video generation. Terminal-phase video dreams remain on the StableStudio x402 path.
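One way to encode the per-content settings from the model table is a plain lookup. The enum and struct names below are hypothetical. Only the StableStudio model names, cfg_scale, and steps are taken from the table, and the Venice model names are left out because the table gives them only as placeholders.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DreamContent {
    Standard,
    HighNovelty,
    HighArousal,
    TerminalPhase,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GenParams {
    pub stable_studio_model: &'static str,
    pub cfg_scale: f32,
    pub steps: u32,
    pub video: bool, // video stays on the StableStudio path
}

/// Maps dream content to the generation parameters listed in the table.
pub fn params_for(content: DreamContent) -> GenParams {
    let (stable_studio_model, cfg_scale, steps, video) = match content {
        DreamContent::Standard => ("nano-banana-pro", 7.5, 20, false),
        DreamContent::HighNovelty => ("flux-2-pro", 9.0, 30, false),
        DreamContent::HighArousal => ("grok", 5.0, 15, false),
        DreamContent::TerminalPhase => ("sora-2", 11.0, 40, true),
    };
    GenParams { stable_studio_model, cfg_scale, steps, video }
}
```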

Venice image generation:

pub struct VeniceImageClient {
    api_key: String,                   // BARDO_VENICE_IMAGE_API_KEY
    base_url: String,                  // https://api.venice.ai/api/v1
    diem_tracker: Option<DiemTracker>, // image gen charges against Dream DIEM bucket
}

pub struct VeniceImageRequest {
    pub model: String,
    pub prompt: String,
    pub negative_prompt: Option<String>,
    pub width: u32,              // default: 3840 (4K)
    pub height: u32,             // default: 3840 (1:1 aspect)
    pub cfg_scale: f32,          // prompt adherence 0-20
    pub seed: Option<i64>,       // stored in NFT metadata for provenance
    pub steps: u32,
    pub style_preset: Option<String>,
    pub variants: u8,            // 1-4; Golem selects preferred via self-appraisal
    pub format: ImageFormat,     // jpeg / png / webp
    pub embed_exif_metadata: bool,
    pub hide_watermark: bool,
}

pub struct VeniceImageResult {
    pub id: String,
    pub images: Vec<Vec<u8>>,    // base64-decoded image bytes, one per variant
    pub timing: VeniceTiming,
    pub selected_variant: Option<usize>,  // set by self-appraisal scoring
}

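The Venice endpoint is POST /api/v1/image/generate, authenticated with an Authorization: Bearer <api_key> header. The response’s images field is an array of base64-encoded images; the client decodes each one and passes the raw bytes on to the rest of the pipeline (steganographic encoding, then IPFS upload).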

StableStudio x402 flow:

pub struct StableStudioClient {
    base_provider: Arc<dyn Provider>,  // Base L2 provider
    signer: LocalSigner,               // Base wallet
    usdc_address: Address,             // 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
}

impl StableStudioClient {
    /// Runs the x402 flow and returns the URL of the generated image.
    pub async fn generate_image(&self, prompt: &str, model: &str) -> Result<String> {
        // Step 1: POST without payment -> get 402 + PAYMENT-REQUIRED header
        // Step 2: Sign USDC authorization on Base
        // Step 3: POST with PAYMENT-SIGNATURE header
        // Step 4: Poll GET /api/jobs/{jobId} until complete
        // Step 5: Return imageUrl
        unimplemented!("outline of the x402 flow only")
    }
}
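The four steps of the x402 flow can be sketched against an abstract transport, which keeps the control flow visible without committing to an HTTP client. Everything here other than the step order and the two header names is an assumption: the Transport trait, the Response shape, the sign callback, and the max_polls cap are all hypothetical, and a real client would sleep between polls.

```rust
/// Hypothetical response shape for this sketch.
pub struct Response {
    pub status: u16,
    pub payment_required: Option<String>, // PAYMENT-REQUIRED header value
    pub job_id: Option<String>,
}

pub enum JobStatus {
    Pending,
    Complete { image_url: String },
}

/// Hypothetical transport; a real one would wrap an HTTP client.
pub trait Transport {
    /// POST a generation request, optionally with a PAYMENT-SIGNATURE header.
    fn post_generate(&mut self, prompt: &str, model: &str, payment_signature: Option<&str>) -> Response;
    /// GET /api/jobs/{jobId}
    fn get_job(&mut self, job_id: &str) -> JobStatus;
}

pub fn generate_image<T: Transport>(
    transport: &mut T,
    sign: impl Fn(&str) -> String,
    prompt: &str,
    model: &str,
    max_polls: u32,
) -> Result<String, String> {
    // Step 1: unpaid request, expecting 402 plus the payment requirement.
    let first = transport.post_generate(prompt, model, None);
    if first.status != 402 {
        return Err(format!("expected 402, got {}", first.status));
    }
    let requirement = first
        .payment_required
        .ok_or("402 without PAYMENT-REQUIRED header")?;
    // Step 2: sign the USDC authorization (delegated to the caller's signer).
    let signature = sign(&requirement);
    // Step 3: repeat the request with the signature attached.
    let paid = transport.post_generate(prompt, model, Some(&signature));
    let job_id = paid.job_id.ok_or("paid request returned no job id")?;
    // Step 4: poll until the job completes or the poll budget runs out.
    for _ in 0..max_polls {
        if let JobStatus::Complete { image_url } = transport.get_job(&job_id) {
            return Ok(image_url);
        }
    }
    Err("job did not complete within max_polls".to_string())
}
```

Bounding the poll loop means a stalled job surfaces as an error instead of blocking the rest of the mint pipeline.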

StableStudio: Retained as Fallback and Video Path

StableStudio remains in the architecture for two reasons. Venice does not currently generate video – sora-2 and sora-2-pro for Terminal-phase video dreams remain on the x402/StableStudio path. For cost-sensitive scenarios (Declining phase, Bankr wallet depleted), StableStudio’s x402 pricing is competitive for non-sensitive prompts.

| Dream Content | StableStudio Model | Cost | When to use |
| --- | --- | --- | --- |
| Standard dream (cost-sensitive) | nano-banana-pro | ~$0.13 | No Venice DIEM, non-sensitive prompt |
| High-novelty | flux-2-pro 2MP | ~$0.06 | Cost-sensitive alternative |
| Terminal dreams (video) | sora-2 8s | ~$0.80 | Venice has no video equivalent |
| Death mask (video) | sora-2-pro 15s | ~$5.00 | Video death masks (opt-in config) |

Step 4: Variant Scoring

Request variants=3 for standard dreams (configurable via VeniceImageConfig.variants_standard). During the DELIBERATE phase, the Golem’s self-appraisal module scores all three variants against its current emotional state and Grimoire context – using the Bankr Venice text model for scoring (zero-retention for cognitive state input).

The scoring prompt:

You are a Golem with these PAD values and these top Grimoire associations.
Which of these three images best represents the felt experience of the dream
you just had?

The highest-scored variant is minted; the others are stored as unlockable content. The Golem picking its preferred image from three candidates is a genuine act of aesthetic judgment. Different Golems in different emotional states prefer different images from the same variant set. That preference is itself data about machine cognition.
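Picking the variant to mint reduces to an argmax over the appraisal scores. A minimal sketch, assuming the scores arrive as one f32 per variant; the function name and the choice to ignore NaN scores are assumptions.

```rust
/// Returns the index of the highest-scored variant, or None if there are
/// no usable scores. NaN scores are skipped; on ties the later index wins.
pub fn select_variant(scores: &[f32]) -> Option<usize> {
    scores
        .iter()
        .enumerate()
        .filter(|(_, score)| !score.is_nan())
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}
```

The returned index would populate selected_variant; every other index is a candidate for unlockable content.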

Step 5: Steganographic Soul Encoding

Every dream image embeds a GolemStateVector in the pixel layer – invisible to the naked eye, recoverable by anyone with the decoder.

pub struct GolemStateVector {
    pub schema_version: u8,
    pub golem_id: [u8; 16],
    pub tick: u64,
    pub pad: [f32; 3],                          // P, A, D
    pub mortality: MortalitySnapshot,           // economic/epistemic/stochastic clocks
    pub behavioral_phase: u8,
    pub top5_causal_edges: Vec<CausalEdgeSnap>, // edge, lag, confidence, discovered_tick
    pub grimoire_digest: GrimoireDigest,        // top-10 entries by confidence (names+scores only)
    pub position_summary: PositionSummary,      // asset, amount_usd, unrealized_pnl
    pub dream_count: u32,
    pub is_death_mask: bool,
    pub death_cause: Option<u8>,
}
// Serializes to ~4-8KB with msgpack encoding

Encoding pipeline:

  1. Venice generates the base image through the normal flow
  2. GolemStateEncoder runs locally: a lightweight CNN encoder (pre-trained, ONNX runtime) takes (image_bytes, state_bytes) -> stego image. Architecture follows StegaStamp encoder design – 256-bit capacity per 256x256 block, tiled for 4K images, ~3MB theoretical capacity.
  3. Stego image is perceptually identical to the base (SSIM > 0.998). This is what gets uploaded to IPFS and minted.
  4. GolemStateDecoder published as open-source WASM module (@bardo/stego-decoder) and as bardo golem decode-soul <image> CLI subcommand.
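The tiling arithmetic in step 2 can be checked with a short helper. This is a sketch that uses only the per-block figures quoted above (256 bits per 256x256 block) and ignores any error-correction overhead; the function name is hypothetical. Under those figures alone, a 3840x3840 image holds 225 tiles, or about 7 KB, so the ~3MB figure presumably rests on parameters not stated here.

```rust
/// Payload capacity in bytes when a fixed-capacity block is tiled over an image.
/// Partial tiles at the right and bottom edges are not counted.
pub fn tiled_capacity_bytes(width: u32, height: u32, block: u32, bits_per_block: u32) -> u64 {
    let tiles = (width / block) as u64 * (height / block) as u64;
    tiles * bits_per_block as u64 / 8
}
```

For example, tiled_capacity_bytes(3840, 3840, 256, 256) evaluates to 7200, which sits inside the 4-8KB range quoted for the serialized state vector.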

Cross-ref: Full steganography spec in 05-extended-forms.md (Steganographic Soul Encoding section)

Step 6: IPFS Upload

Upload the final (possibly stego-encoded) image to IPFS via Pinata or nft.storage. Store the CID in ~/.bardo/gallery/ for local gallery access. Cost: ~$0.001 per image.

Step 7: Mint via Rare Protocol

Mint through the TypeScript sidecar’s Rare Protocol CLI bridge:

async fn mint_dream_nft(&self, ctx: &SidecarClient, token_uri: &str) -> Result<u64> {
    let result = ctx.sidecar.call("rare_mint_token", serde_json::json!({
        "contract": self.series_contract.to_string(),
        "uri": token_uri,
        "network": self.config.network,
    })).await?;
    Ok(result["tokenId"].as_str()
        .ok_or_else(|| anyhow!("missing tokenId"))?.parse()?)
}

Fallback: direct Alloy calls to the Series contract’s addNewToken(string _uri) function.

Cross-ref: Full contract architecture in 06-contracts.md

Step 8: Configure Auction

Auction parameters are computed from the PAD vector at mint time.

Cross-ref: Full auction mechanics in 04-auctions.md


Metadata Schema (ERC-721 tokenURI)

{
  "name": "Dream #42 — Golem AETHER-7b3f",
  "description": "NREM replay of tick 3847 liquidation cascade. REM depotentiation reduced arousal from 0.91 to 0.44. Discovered causal edge: gas_price → MEV_frequency (lag 3, moderate). Anticipatory trajectory: 62% probability of oracle manipulation in next 200 ticks.",
  "image": "ipfs://Qm...",
  "animation_url": "ipfs://Qm...",
  "attributes": [
    { "trait_type": "Golem ID", "value": "AETHER-7b3f" },
    { "trait_type": "Generation", "value": 3 },
    { "trait_type": "Behavioral Phase", "value": "Stable" },
    { "trait_type": "Dream Trigger", "value": "Emotional Load" },
    { "trait_type": "Pleasure", "value": -0.3, "display_type": "number" },
    { "trait_type": "Arousal", "value": 0.7, "display_type": "number" },
    { "trait_type": "Dominance", "value": 0.2, "display_type": "number" },
    { "trait_type": "Plutchik Emotion", "value": "Apprehension" },
    { "trait_type": "Dream Phase Allocation", "value": "40/35/25 NREM/REM/Consolidation" },
    { "trait_type": "Causal Edges Discovered", "value": 2, "display_type": "number" },
    { "trait_type": "Replay Episodes", "value": 7, "display_type": "number" },
    { "trait_type": "Counterfactuals Generated", "value": 3, "display_type": "number" },
    { "trait_type": "Arousal Delta (REM)", "value": -0.47, "display_type": "number" },
    { "trait_type": "Ticks Alive", "value": 3890, "display_type": "number" },
    { "trait_type": "Economic Clock", "value": 0.62, "display_type": "number" },
    { "trait_type": "Epistemic Clock", "value": 0.78, "display_type": "number" },
    { "trait_type": "Is Death Mask", "value": false },
    { "trait_type": "Bloodstain Inherited", "value": false },
    { "trait_type": "Soul Encoded", "value": true },
    { "trait_type": "Generation Seed", "value": 12345678, "display_type": "number" }
  ],
  "external_url": "https://bardo.run/golem/AETHER-7b3f/dreams/42"
}

Seed provenance: the seed used for generation is stored as Generation Seed in the attributes array. Anyone can reproduce the base image by providing the same prompt and seed to Venice – the generation is verifiable. For Thaler noise images (death sequence), both clean and degraded seeds are stored, so the dissolution arc is reproducible.


Rate Limiting and Cadence

Dream images are minted at the cadence of cognition, not on a schedule:

| Golem State | Dream Frequency | Images per Day (approx) |
| --- | --- | --- |
| Quiet markets, Thriving phase | Every ~200 ticks (~50 min) | ~28 |
| Active markets, Stable phase | Every ~100-150 ticks | ~40-56 |
| Stressed Golem, high arousal | Every ~67 ticks (~17 min) | ~84 |
| Terminal phase | Frantic dreaming | Maximum density |

The corpus tells a story through density alone. A sudden burst of images signals crisis. Long gaps signal quiet confidence or stagnation.
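The cadence figures follow from simple arithmetic. The sketch below assumes roughly 15 seconds per tick, which is inferred from the table's pairing of ~200 ticks with ~50 minutes and is not stated directly; the function name is hypothetical.

```rust
/// Approximate images per day when one dream (and so one image) occurs
/// every `ticks_between_dreams` ticks.
pub fn images_per_day(ticks_between_dreams: f64, seconds_per_tick: f64) -> f64 {
    const SECONDS_PER_DAY: f64 = 86_400.0;
    SECONDS_PER_DAY / (ticks_between_dreams * seconds_per_tick)
}
```

With 15-second ticks, 200 ticks between dreams gives about 29 images per day and 67 ticks gives about 86, close to the rounded values in the table.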

Hard limits:

  • Max 1 image per dream cycle (enforced by architecture – one DreamCycleResult per cycle)
  • Max 3 per day recommended via max_daily_art_spend_usd budget ceiling (not a hard count limit but an economic one)
  • Budget check before every generation: if estimated_cost > min(max_per_dream_usd, budget_remaining), skip with WARN
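The budget check in the last bullet can be sketched as a pure function. The BudgetDecision type and function name are hypothetical, and emitting the WARN log is left to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BudgetDecision {
    Proceed,
    /// Generation is skipped; the caller is expected to log this at WARN.
    Skip { estimated_cost: f64, ceiling: f64 },
}

/// Skips generation when the estimate exceeds the smaller of the per-dream
/// ceiling and the remaining budget.
pub fn check_budget(estimated_cost: f64, max_per_dream_usd: f64, budget_remaining: f64) -> BudgetDecision {
    let ceiling = max_per_dream_usd.min(budget_remaining);
    if estimated_cost > ceiling {
        BudgetDecision::Skip { estimated_cost, ceiling }
    } else {
        BudgetDecision::Proceed
    }
}
```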

Cost Model

| Component | Cost | Notes |
| --- | --- | --- |
| Venice image (standard model) | ~$0.02–0.10 | Primary provider; DIEM-funded if allocated |
| StableStudio image (nano-banana-pro) | ~$0.13 | Fallback when Venice unavailable |
| Bankr prompt (claude-sonnet-4.6) | ~$0.01–0.02 | Per dream |
| IPFS pinning (Pinata) | ~$0.001 | Per image |
| Base gas (mint + auction config) | ~$0.01 | Per dream (Base) |
| Ethereum gas (mint + auction config) | ~$2–8 | Per dream (Eth mainnet; avoid for standard dreams) |
| Total per dream (Base, Venice) | ~$0.03–0.13 | |
| Total per dream (Base, StableStudio) | ~$0.14–0.16 | |

All costs gated by art_budget_fraction and per-event hard ceilings from OneirographyConfig.
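The per-dream totals can be reproduced by summing the component ranges. One assumption is needed to match the stated lower bounds: the cheapest case uses the fallback prompt path, which adds no prompt cost, while the most expensive case pays the full Bankr prompt price. That reading is an inference from the numbers, and the type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy)]
pub struct UsdRange {
    pub lo: f64,
    pub hi: f64,
}

/// Sums the image and prompt ranges with the fixed IPFS and gas costs.
pub fn total_per_dream(image: UsdRange, prompt: UsdRange, ipfs: f64, gas: f64) -> UsdRange {
    UsdRange {
        lo: image.lo + prompt.lo + ipfs + gas,
        hi: image.hi + prompt.hi + ipfs + gas,
    }
}
```

With a prompt range of 0.00 to 0.02, IPFS at 0.001, and Base gas at 0.01, the Venice range of 0.02 to 0.10 totals roughly 0.03 to 0.13, and the StableStudio price of 0.13 totals roughly 0.14 to 0.16.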


Predecessor Linkage

When a successor Golem is spawned and inherits the predecessor’s knowledge through the Grimoire ingestion pipeline (04-memory/, confidence * 0.85^N per the Weismann barrier), the successor’s first dream image references the predecessor’s death mask token ID in its metadata:

{ "trait_type": "Predecessor Death Mask", "value": "token:42" },
{ "trait_type": "Bloodstain Inherited", "value": true }

The on-chain record creates a visible lineage graph: death mask -> inheritance -> first dream of the child. Collectors can follow bloodlines.
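The inheritance discount mentioned at the start of this section, confidence * 0.85^N, is easy to illustrate. The sketch assumes N counts the number of inheritance generations between the original entry and the current Golem; this section does not define N, and the function name is hypothetical.

```rust
/// Confidence of a Grimoire entry after `generations` rounds of inheritance,
/// each round multiplying confidence by 0.85.
pub fn inherited_confidence(confidence: f64, generations: u32) -> f64 {
    confidence * 0.85_f64.powi(generations as i32)
}
```

Under that assumption an entry held at 0.80 confidence drops to about 0.68 after one generation and about 0.58 after two.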


Capability Requirements

| Operation | Trust Tier | Capability Token |
| --- | --- | --- |
| Mint NFT via sidecar | WriteTool | Capability<MintTool> – consumed on use |
| Configure auction on Bazaar | WriteTool | Capability<ConfigAuctionTool> |
| Write variant scores to Grimoire | WriteTool | Capability<GrimoireWriteTool> |

WriteTool capabilities are consumed (moved) on use – Rust’s ownership system prevents reuse at compile time.
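A minimal sketch of how a consume-on-use capability can be expressed with move semantics. The Capability and MintTool definitions below are simplified stand-ins, and grant and mint_nft are hypothetical names; the real types live in the trust tier system.

```rust
use std::marker::PhantomData;

/// Marker type naming the tool a capability authorizes.
pub struct MintTool;

/// A single-use token. It deliberately implements neither Clone nor Copy,
/// so passing it by value gives up ownership for good.
pub struct Capability<T> {
    _tool: PhantomData<T>,
}

impl<T> Capability<T> {
    /// Hypothetical constructor; real issuance would be gated by the trust tier.
    pub fn grant() -> Self {
        Capability { _tool: PhantomData }
    }
}

/// Takes the capability by value, so the caller cannot use it again.
pub fn mint_nft(cap: Capability<MintTool>, token_uri: &str) -> String {
    drop(cap); // the token is consumed here
    format!("minted {token_uri}")
}

// A second call with the same binding does not compile:
//
//     let cap = Capability::<MintTool>::grant();
//     mint_nft(cap, "ipfs://a");
//     mint_nft(cap, "ipfs://b"); // error[E0382]: use of moved value: `cap`
```

The guarantee depends on the token type staying non-cloneable; deriving Clone or Copy on it would silently remove the protection.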

Cross-ref: Trust tier system in 07-tools/01-architecture.md