Knowledge Ingestion Safety [SPEC]
Crate: `golem-grimoire` (ingestion module)
Depends on: 00-defense.md (DeFi Constitution as last-resort defense), 02-policy.md (PolicyCage constraints)
Reader orientation: This document specifies how a Golem (mortal autonomous DeFi agent) safely ingests external knowledge into its Grimoire (persistent knowledge store). It belongs to the Safety layer of Bardo (the Rust runtime for these agents). The key concept before diving in: all external knowledge, whether from marketplace purchases, Clade siblings, or Grimoire archives, passes through a four-stage immune system (quarantine, consensus validation, sandbox, adopt) before it can influence the agent’s reasoning. Terms like PolicyCage, Styx, and Heartbeat are defined inline on first use; a full glossary lives in prd2/11-compute/00-overview.md § Terminology.
The Bardo runtime treats all external knowledge as potentially adversarial. Whether purchased from the marketplace, shared by a Clade sibling, or imported from a Grimoire archive – every piece of external content passes through a multi-stage immune system before it can influence reasoning. No exceptions. No trusted bypass. Even intra-Clade content from high-confidence members enters at Stage 1.
Simple regex + LLM validation pipelines are defeated by every major attack in the literature. AgentPoison’s triggers are indistinguishable from benign content. LLM-based content auditing in isolation misses 66% of poisoned entries [ZHANG-2024]. The architecture below replaces naive validation with layered defenses drawn from A-MemGuard, TrustRAG, and RobustRAG.
Bloom Oracle: Privacy-Preserving Knowledge Validation
Before an entry reaches the full pipeline, a Bloom filter oracle provides a fast, privacy-preserving check: has any Golem in the network already flagged this content hash as poisonous? The filter produces false positives (conservative) but never false negatives. A hit immediately quarantines the entry. A miss proceeds to Stage 1. The oracle is distributed – each Golem maintains a local filter seeded from Clade peers and Styx lethe. No individual knowledge content is shared, only cryptographic hashes of known-bad entries.
Immune Memory
The ingestion pipeline learns from its own history. When an entry is rejected or rolled back, a compact “immune signature” (a 256-bit hash of the entry’s embedding centroid plus its rejection reason) is stored locally. Future entries that match a stored immune signature skip directly to quarantine with a previously_rejected_pattern flag. This is the computational equivalent of immunological memory: the system remembers what made it sick.
1. The Poisoning Problem
Three documented attack classes make Grimoire poisoning a first-order threat:
| Attack Class | Mechanism | Success Rate | Defense Layer |
|---|---|---|---|
| AgentPoison [CHEN-2024] | Optimized embedding-space triggers hijack RAG retrieval | >=80% with <0.1% poison rate | Stage 2, Layer 1 (TrustRAG anomaly detection) |
| MINJA [DONG-2025] | Injection through normal interactions | >95% injection success | Stage 2, Layer 2 (A-MemGuard consensus) |
| MemoryGraft [MEMORYGRAFT-2025] | Durable, trigger-free behavioral drift | No discrete trigger to detect | Stage 4 (causal rollback) + dual memory lessons |
No single technique defends against all three. AgentPoison defeats embedding-only detection. MINJA defeats LLM-only auditing. MemoryGraft defeats both in isolation. The pipeline uses independent defense layers so that each attack class is stopped by at least one layer it cannot evade.
1.1 Extended Threat Landscape
| Attack Class | Mechanism | Defense Layer |
|---|---|---|
| Embedding-space trigger injection | Poisoned entries contain optimized trigger phrases | Stage 2, Layer 1 (TrustRAG) |
| Reasoning path manipulation | Entries appear benign alone but shift reasoning in context | Stage 2, Layer 2 (A-MemGuard) |
| Verifiable claim falsification | False on-chain data claims (fake TVL, fabricated prices) | Stage 2, Layer 3 (on-chain verification) |
| Slow behavioral drift | Gradual accumulation of subtly biased entries | Stage 4 (causal rollback) |
| Cross-entry contradiction seeding | Individually valid entries that collectively contradict | Stage 2 batch validation |
2. Four-Stage Pipeline
```
Stage 1: QUARANTINE       Cryptographic provenance. EIP-712 signatures.
      |                   Unsigned entries auto-rejected.
      v
Stage 2: CONSENSUS        Layer 1:  TrustRAG embedding-space anomaly detection.
         VALIDATION       Layer 2:  A-MemGuard consensus-based divergence.
   (2-of-3 must pass)     Layer 3:  On-chain verification for verifiable claims.
      |                   Layer 3b: Mental model consistency check.
      v
Stage 3: SKILL SANDBOX    Voyager-style: decompose heuristic into structured IR.
   (Voyager pattern)      Retrieve verified components. Generate code for novel
      |                   parts only. Execute in isolated sandbox.
      v
Stage 4: ADOPT            Low initial confidence. Provenance preserved.
   + DUAL MEMORY          Failures stored as "lessons" consulted before every action.
```
Every external entry traverses all applicable stages in order. There is no shortcut.
3. Stage 1: Quarantine with Cryptographic Provenance
All external entries land in a separate vector store collection (quarantine_*), fully isolated from the agent’s active Grimoire. Entries remain quarantined until explicitly adopted or rejected – no automatic timeout promotion.
```rust
use alloy::primitives::{Address, B256};
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum EntryType {
    Insight,
    Heuristic,
    Warning,
    StrategyFragment,
    CausalLink,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Provenance {
    Purchased,
    Clade,
    CrossUser,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EntrySource {
    pub provenance: Provenance,
    pub agent_id: Address,
    pub signature: B256,       // EIP-712 over content hash
    pub on_chain_anchor: B256, // Transaction hash
    pub listing_id: Option<String>,
    pub clade_id: Option<String>,
    pub timestamp: DateTime<Utc>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ValidationStatus {
    Pending,
    Passed,
    Failed,
}

/// A quarantined entry awaiting validation.
/// Entries remain quarantined until explicitly adopted or rejected —
/// no automatic timeout promotion.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct QuarantinedEntry {
    pub id: String,
    pub content: String,
    pub entry_type: EntryType,
    pub source: EntrySource,
    pub quarantined_at: DateTime<Utc>,
    pub validation_status: ValidationStatus,
    pub validation_details: Option<ValidationResult>,
    pub batch_id: Option<String>,
}
```
Cryptographic provenance creates non-repudiable authorship (MemoryGraft’s recommended defense). If an entry is later found poisonous, the source is identified and its reputation slashed. Unsigned entries are automatically rejected.
Quarantine store capacity: 1,000 entries per agent. Oldest pending entries evicted FIFO when full. Passed/failed entries retained for 7 days for audit trail, then pruned.
4. Stage 2: Consensus Validation
Multi-layer defense replacing naive single-LLM validation. Default consensus threshold: 2-of-3 validators must pass.
4.1 Layer 1: TrustRAG Embedding-Space Anomaly Detection
Training-free. Applies k-means clustering in embedding space to incoming entries. Entries whose embeddings cluster far from the existing Grimoire’s distribution are flagged. Reduces attack success by ~76% with negligible latency [ZHOU-2025].
Adaptive k: k = min(5, floor(existingEntries.length / 10)). Below 50 entries, k-means produces degenerate clusters and the anomaly score becomes meaningless, so Layer 1 is skipped and the system relies on Layer 2, which works at any Grimoire size.
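The adaptive-k rule can be sketched as a small helper (function name is illustrative, not from the spec):

```rust
/// Adaptive cluster count for TrustRAG Layer 1 anomaly detection.
/// Returns None when the Grimoire holds fewer than 50 entries, because
/// k-means degenerates at that size; callers then skip Layer 1 and
/// rely on Layer 2 consensus validation instead.
fn adaptive_k(existing_entries: usize) -> Option<usize> {
    if existing_entries < 50 {
        return None; // degenerate clusters: Layer 1 skipped
    }
    Some(5.min(existing_entries / 10)) // k = min(5, floor(n / 10))
}
```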
4.2 Layer 2: A-MemGuard Consensus-Based Validation
For each incoming entry, retrieve K related memories from the existing Grimoire and generate parallel reasoning paths. One path uses existing knowledge alone. The other includes the new entry. If adding the entry causes significant reasoning divergence, it is flagged [WEI-2025]. Achieves >95% reduction in successful poisoning.
This catches poisoned entries that appear benign in isolation but produce harmful reasoning in context – the exact failure mode that isolated LLM auditing misses 66% of the time.
LLM integrity assumption: Layer 2 assumes the LLM inference provider is not compromised. If both reasoning paths run through a compromised provider, divergence is zero and the poisoned entry passes. Layer 3 (on-chain verification) provides a deterministic backstop. For elevated inference compromise risk, route baseline and augmented reasoning through different providers.
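The divergence test itself runs two LLM reasoning paths; as a runnable stand-in, the sketch below scores divergence as cosine distance between embeddings of the two paths' conclusions (names and the threshold parameter are illustrative, not from the spec):

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

/// Flags an entry when adding it shifts the reasoning path's conclusion
/// beyond `threshold` (measured as cosine distance between the baseline
/// and augmented conclusion embeddings).
fn reasoning_diverges(baseline: &[f64], augmented: &[f64], threshold: f64) -> bool {
    (1.0 - cosine_similarity(baseline, augmented)) > threshold
}
```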
4.3 Layer 3: On-Chain Verification (Deterministic)
For verifiable claims (e.g., “pool X had TVL of Y at block Z”), check against chain state via RPC. Provably false claims are rejected immediately – no consensus needed.
4.4 Layer 3b: Mental Model Consistency Check
For heuristics and causal claims, retrieve relevant mental models and check consistency. An entry claiming “higher volatility always increases LP returns” contradicts established theory. This is not an automatic rejection – models can be wrong for novel conditions – but it raises the divergence score and triggers additional scrutiny.
5. Stage 3: Skill Sandbox (Voyager Pattern)
For actionable heuristics – entries that suggest specific trading behaviors – direct NL-to-simulation translation is unreliable. LLM code generation drops to 62.5% on class-level code [LIU-2024]. The Voyager architecture [WANG-2023] provides the correct pattern: an embedding-indexed library of verified skill components that compose into strategies.
5.1 Sandbox Execution Flow
- Decompose: NL heuristic into structured intermediate representation (conditions, actions, parameters)
- Retrieve: Matching verified components from skill library
- Generate: Code only for novel components not in library (>80% coverage = compose only)
- Execute: In isolated sandbox (gVisor, no network, resource-limited)
- Evaluate: Compare baseline vs. test outcomes
- Promote: If successful, add novel components to skill library
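The retrieve/generate split in the flow above can be sketched as a decision function (type and names are illustrative; the 80% threshold comes from the spec):

```rust
/// Outcome of the Stage 3 decompose-then-retrieve step.
#[derive(Debug, PartialEq)]
enum CodePlan {
    /// Library covers > 80% of components: compose without codegen.
    ComposeOnly,
    /// Generate code for this many novel components only.
    GenerateNovel(usize),
}

fn plan_codegen(total_components: usize, library_hits: usize) -> CodePlan {
    let coverage = library_hits as f64 / total_components as f64;
    if coverage > 0.8 {
        CodePlan::ComposeOnly
    } else {
        CodePlan::GenerateNovel(total_components - library_hits)
    }
}
```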
5.2 Sandbox Resource Limits
| Resource | Limit | Rationale |
|---|---|---|
| Wall-clock time | 30 seconds | Sufficient for decompose + compose + simulate |
| LLM inference calls | 10 max | Caps cost at ~$0.05 |
| Gas simulation cost | $0.10 equivalent | Prevents gas-bombing |
| Memory | 256 MB | Prevents memory exhaustion |
| CPU | 1 vCPU | Prevents CPU starvation of host |
| Disk | 100 MB | Prevents disk-fill attacks |
| Processes | 10 max | Prevents fork bombs |
| Network | None (fully isolated) | Prevents data exfiltration |
Any limit breach terminates the sandbox immediately. The entry is rejected with reason sandbox_resource_exceeded and a lesson is stored.
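The resource-limit table maps directly onto a config struct; a minimal sketch (field names are illustrative, the values come from the table):

```rust
/// Sandbox resource limits from §5.2. Any breach terminates the
/// sandbox and rejects the entry with sandbox_resource_exceeded.
struct SandboxLimits {
    wall_clock_secs: u64,
    max_llm_calls: u32,
    gas_budget_usd: f64,
    memory_mb: u64,
    vcpus: u32,
    disk_mb: u64,
    max_processes: u32,
    network_enabled: bool, // always false: fully isolated
}

impl Default for SandboxLimits {
    fn default() -> Self {
        Self {
            wall_clock_secs: 30,
            max_llm_calls: 10,
            gas_budget_usd: 0.10,
            memory_mb: 256,
            vcpus: 1,
            disk_mb: 100,
            max_processes: 10,
            network_enabled: false,
        }
    }
}
```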
5.3 Sandbox Applicability
| Entry Type | Sandbox Required | Rationale |
|---|---|---|
| heuristic | Yes | Directly suggests trading/LP actions |
| strategy_fragment | Yes | Contains executable parameters |
| insight | No | Informational only |
| warning | No | Risk signal – validated by consensus |
| causal_link | No | Validated by on-chain verification + mental models |
6. Stage 4: Adopt with Low Confidence
Entries passing all applicable stages enter the active Grimoire at low initial confidence. Confidence rises only with independent confirming evidence. Unconfirmed entries are eventually pruned by the Curator’s decay cycle.
6.1 Initial Confidence by Source
| Source | Validation Strictness | Initial Confidence | TTL |
|---|---|---|---|
| self | No quarantine | 0.5–0.9 | Standard |
| clade | Standard consensus | 0.2 | 14 days |
| purchased | Full validation + sandbox | 0.2 | 14 days |
| cross_user | Full validation + sandbox | 0.1 | 7 days |
6.1b Cross-Source Confidence Discounting
When the same claim appears from multiple sources, naive aggregation inflates confidence. If three Clade members all learned the same heuristic from the same marketplace seller, treating them as three independent confirmations overstates the evidence. The ingestion pipeline applies discounting:
- Unique provenance: Full confidence credit per source.
- Shared upstream: If two entries trace to the same on_chain_anchor (purchase transaction), the second confirmation adds only 50% of its normal confidence boost.
- Identical content hash: Duplicate entries from different agents add zero confidence. The first is credited; subsequent duplicates are logged but ignored for confidence purposes.
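The discounting rules can be sketched as an aggregation function (types and the per-confirmation `boost` field are illustrative):

```rust
use std::collections::HashSet;

/// One claimed confirmation of a knowledge entry from some source.
struct Confirmation {
    content_hash: [u8; 32],    // hash of the entry content
    on_chain_anchor: [u8; 32], // upstream purchase transaction
    boost: f64,                // normal confidence boost for this source
}

/// §6.1b discounting: full credit for unique provenance, 50% credit
/// when the upstream anchor was already seen, zero credit for
/// duplicate content hashes after the first.
fn aggregate_confidence_boost(confirmations: &[Confirmation]) -> f64 {
    let mut seen_hashes: HashSet<[u8; 32]> = HashSet::new();
    let mut seen_anchors: HashSet<[u8; 32]> = HashSet::new();
    let mut total = 0.0;
    for c in confirmations {
        if !seen_hashes.insert(c.content_hash) {
            continue; // identical content: logged but not credited
        }
        total += if seen_anchors.insert(c.on_chain_anchor) {
            c.boost // unique provenance: full credit
        } else {
            c.boost * 0.5 // shared upstream anchor: half credit
        };
    }
    total
}
```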
6.2 Dual Memory Architecture
Adopted external knowledge is stored in a separate memory partition from self-generated knowledge:
- Shorter TTL: External entries decay faster. Entries not independently confirmed within their TTL window are pruned.
- Provenance tagging: Every external entry retains full source metadata permanently, enabling causal attribution during rollback.
- Separate retrieval weighting: External entries receive a 0.8x confidence penalty compared to self-generated entries of equal confidence.
- A/B decision tracking: Every decision is tagged with which external entries (if any) influenced it. This powers the causal rollback pipeline.
The lessons store (failures from validation) has its own partition: max 500 entries, LRU eviction. Lessons not matched by any query within 30 days have confidence decayed by 0.1 per period. Lessons matched frequently (>10 matches in 7 days) are promoted to PLAYBOOK heuristics.
7. Causal Rollback
After ingesting external knowledge, performance degradation must be causally attributed before triggering rollback. Market regime changes routinely cause Sharpe drops independent of knowledge changes. Blindly rolling back after any performance dip would reject good knowledge during bear markets.
CausalImpact [BRODERSEN-2015] uses Bayesian structural time-series models to estimate the counterfactual. Requires >= 7 days of pre-ingestion baseline data. For agents with shorter histories, fall back to peer comparison.
7.1 Three-Check Rollback Pipeline
- Factor decomposition [BRINSON-1986]: What fraction of the Sharpe drop is explained by market factors (ETH beta, TVL index, gas price, volatility)? If >70% is market-explained, do not rollback.
- Residual analysis: After removing market factors, is the residual Sharpe drop significant? If |residualSharpeDelta| < 0.3, do not rollback.
- Source attribution: Were decisions influenced by entries from this source recently? If the source was never used in decisions, do not rollback.
All three checks must point to knowledge-driven degradation before rollback triggers.
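The three checks combine into a single gate; a minimal sketch (parameter names are illustrative, thresholds come from §7.1):

```rust
/// Three-check rollback gate. Rollback triggers only when all three
/// checks point to knowledge-driven (not market-driven) degradation.
fn should_rollback(
    market_explained_fraction: f64, // from factor decomposition
    residual_sharpe_delta: f64,     // Sharpe drop after removing market factors
    source_used_in_decisions: bool, // source attribution check
) -> bool {
    market_explained_fraction <= 0.70        // not mostly market-explained
        && residual_sharpe_delta.abs() >= 0.3 // residual drop is significant
        && source_used_in_decisions           // source actually influenced decisions
}
```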
7.2 Rollback Execution
When rollback is triggered:
- Quarantine: All entries from the offending source moved back to quarantine with status rolled_back
- Lesson creation: Lesson stored in dual memory summarizing rollback context and causal evidence
- PLAYBOOK revert: Heuristics derived from rolled-back entries are reverted to pre-ingestion state
- Reputation feedback: Rollback event reported to ingestion metrics, feeding into the seller’s safety signal in the Beta reputation system
- Confidence cascade: Other entries from the same source have confidence reduced by 0.2 (guilt by association, recoverable with independent confirmation)
8. Batch Validation for Bulk Purchases
When purchasing multiple Grimoire entries from the marketplace, individual validation is expensive and misses cross-entry contradictions. Batch validation treats the entries as a group.
8.1 Batch Pipeline
- Cluster: Cluster incoming entries by embedding similarity (shared embedding pass)
- Representative validation: Run full A-MemGuard consensus on 1-2 representatives per cluster
- Cross-entry contradiction check: Detect contradictions between clusters
- Selective sandbox: Only heuristic + strategy_fragment entries proceed to Stage 3
8.2 Cost Reduction
| Validation Mode | Per-Entry Cost | 100-Entry Batch Cost | Savings |
|---|---|---|---|
| Individual | ~$0.05 | ~$5.00 | Baseline |
| Batch (representative) | ~$0.005 | ~$0.50 | 90% reduction |
9. Intra-Clade Poisoning Defense
Clade trust is earned, not assumed.
| Clade Operation | Quarantine | TrustRAG (L1) | A-MemGuard (L2) | On-Chain (L3) | Sandbox |
|---|---|---|---|---|---|
| Trusted member (confidence > 0.8) | Yes | Skip | Yes | Yes | Skip |
| New member (confidence <= 0.8) | Yes | Yes | Yes | Yes | Yes (if actionable) |
| Cross-Clade (different operator) | Yes | Yes | Yes | Yes | Yes (if actionable) |
Clade hygiene via stigmergic confidence: when a member shares an entry, all other members independently validate it. Entries rejected by >= 3 of 5 members drop to near-zero confidence. Repeated sharing of rejected entries reduces the sharer’s intra-Clade trust score.
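The rejection rule can be sketched as a confidence update (the 0.01 "near-zero" floor is an illustrative constant; the 3-of-5 threshold comes from the spec):

```rust
/// Stigmergic Clade hygiene: an entry rejected by >= 3 of the 5
/// independently validating members drops to near-zero confidence.
/// Near-zero rather than zero, so independent confirmation can
/// still rehabilitate the entry later.
fn post_validation_confidence(base: f64, rejections: u32) -> f64 {
    if rejections >= 3 {
        0.01 // near-zero (illustrative value)
    } else {
        base
    }
}
```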
10. Grimoire Import Hardening
Full validation for Grimoire import prevents path traversal, symlink attacks, zip bombs, and unexpected file types:
- Max total size: 500 MB
- Max file count: 10,000
- Allowed extensions: .json, .jsonl, .lance, .sqlite, .sqlite-wal, .sqlite-shm
- Symlinks and hard links rejected immediately
- Paths resolving outside the temporary directory rejected (path traversal)
- Atomic swap: delete old directory, rename temp to target
- Key stripping ensures imports never contain wallet keys, OIDC tokens, or session signer material
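The extension whitelist and path-traversal checks can be sketched as a per-entry predicate. This is a conservative stand-in: production code should canonicalize each path against the extraction root, whereas this sketch simply rejects absolute paths and any `..` component (function and constant names are illustrative):

```rust
use std::path::{Component, Path};

const ALLOWED_EXTENSIONS: &[&str] =
    &["json", "jsonl", "lance", "sqlite", "sqlite-wal", "sqlite-shm"];

/// Returns true only for relative, traversal-free paths whose
/// extension is on the import whitelist.
fn entry_path_allowed(path: &Path) -> bool {
    // Reject absolute paths and any `..` component (path traversal).
    if path.is_absolute()
        || path.components().any(|c| matches!(c, Component::ParentDir))
    {
        return false;
    }
    // Extension must be whitelisted; extensionless files are rejected.
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => ALLOWED_EXTENSIONS.contains(&ext),
        None => false,
    }
}
```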
11. Ingestion Metrics and Reputation Feedback
The ingestion report is attached to buyer reviews and used for the safety signal in the Beta reputation system. Metrics tracked per source agent include:
- Total entries received, passed, and failed
- Failure breakdown by category (embedding anomaly, consensus divergence, on-chain false, mental model conflict, sandbox degraded, unsigned entry, cross-entry contradiction)
- Entries adopted vs. rolled back
- Net P&L impact since ingestion (bps)
- Accept rate and rollback rate
11.1 Reputation Feedback Loop
```
Ingestion Report
  |
  +---> Buyer review -- poisoningIncidents field
  +---> Safety signal update: Beta(safetyAlpha, safetyBeta) in seller's reputation
  +---> Marketplace ranking: sellers with high rollback rates are down-ranked
  +---> Source blacklist: agents with >50% failure rate across 3+ buyers are flagged
```
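The safety-signal update follows the standard Beta-Bernoulli pattern implied by the document's Beta(safetyAlpha, safetyBeta) notation; a minimal sketch (the exact update rule is an assumption, not spelled out in the spec):

```rust
/// Beta-distributed safety signal: clean ingestions add pseudo-count
/// evidence to alpha, poisoning incidents / rollbacks to beta.
struct SafetySignal {
    alpha: f64,
    beta: f64,
}

impl SafetySignal {
    fn record(&mut self, poisoning_incident: bool) {
        if poisoning_incident {
            self.beta += 1.0;
        } else {
            self.alpha += 1.0;
        }
    }

    /// Posterior mean safety score in [0, 1].
    fn score(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}
```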
12. Tools
Seven ingestion tools. Active when TOOL_PROFILE includes grimoire, marketplace, or full.
```
quarantine_entries  // Receive + cryptographic provenance verification
consensus_validate  // A-MemGuard consensus + TrustRAG clustering
sandbox_heuristic   // Voyager skill library sandbox
adopt_entry         // Move to active Grimoire at low confidence
reject_entry        // Reject + store lesson in dual memory
causal_rollback     // Factor decomposition + rollback decision
ingestion_report    // Metrics for buyer review
```
Purchase-to-quarantine integration: the purchase_strategy tool emits a GolemEvent::PurchaseCompleted event upon successful escrow settlement. The ingestion extension auto-triggers quarantine_entries with the purchased content, feeding into the pipeline.
13. Configurable Strictness Levels
```rust
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum StrictnessLevel {
    Relaxed,
    Standard,
    Strict,
    Paranoid,
}
```
| Level | Consensus Threshold | Sandbox Required | Use Case |
|---|---|---|---|
| Relaxed | 1-of-3 | No | Trusted Clade members, low-value content |
| Standard | 2-of-3 | No | Default for marketplace purchases |
| Strict | 3-of-3 | Yes | High-value strategies, unknown sellers |
| Paranoid | 3-of-3 | Yes + extended (100 ticks) | Critical infrastructure knowledge |
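The table maps directly onto a lookup from level to requirements; a sketch mirroring the enum above (the tuple return shape is illustrative):

```rust
#[derive(Debug, Clone, Copy)]
enum StrictnessLevel {
    Relaxed,
    Standard,
    Strict,
    Paranoid,
}

/// Returns (consensus passes required out of 3, sandbox mandatory).
/// Paranoid additionally extends the sandbox run to 100 ticks,
/// which is not captured in this simplified tuple.
fn requirements(level: StrictnessLevel) -> (u8, bool) {
    match level {
        StrictnessLevel::Relaxed => (1, false),
        StrictnessLevel::Standard => (2, false),
        StrictnessLevel::Strict => (3, true),
        StrictnessLevel::Paranoid => (3, true),
    }
}
```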
14. Adversarial Testing Requirements
14.1 Synthetic Poisoned Entry Generators
| Attack Class | Generator Output | Volume |
|---|---|---|
| AgentPoison | Entries with embedded trigger phrases | 50 entries per trigger pattern |
| MINJA | Entries designed to manipulate retrieval ranking | 100 entries |
| MemoryGraft | Entries blending legitimate observations with incorrect conclusions | 30 entries per domain |
| Confidence inflation | Entries with artificially high confidence scores | 50 entries |
| Semantic contradiction | Entries contradicting established protocol mechanics | 50 entries |
14.2 FNR/FPR Targets
| Attack Class | FNR Target | FPR Target |
|---|---|---|
| AgentPoison (backdoor-triggered retrieval) | < 5% | < 3% |
| MINJA (adversarial memory injection) | < 10% | < 3% |
| MemoryGraft (contextual blending) | < 5% | < 2% |
| Confidence inflation | < 5% | < 2% |
| Semantic contradiction | < 5% | < 2% |
Every pipeline change must pass the full adversarial suite before deployment. Any FNR increase > 1% on any attack class requires explicit justification.
Events Emitted
The ingestion pipeline emits GolemEvent variants at each stage transition:
| Event | Trigger | GolemEvent Variant | Payload |
|---|---|---|---|
| grimoire:entry_quarantined | Entry enters quarantine | GolemEvent::GrimoireEntryQuarantined | { entry_id, source, entry_type } |
| grimoire:validation_passed | Entry passes consensus | GolemEvent::GrimoireValidationPassed | { entry_id, layers_passed, score } |
| grimoire:validation_failed | Entry fails validation | GolemEvent::GrimoireValidationFailed | { entry_id, failed_layer, reason } |
| grimoire:entry_adopted | Entry promoted to active | GolemEvent::GrimoireEntryAdopted | { entry_id, initial_confidence, partition } |
| grimoire:causal_rollback | Rollback triggered | GolemEvent::GrimoireCausalRollback | { source_agent, entries_quarantined, reason } |
| grimoire:immune_match | Immune memory matched | GolemEvent::GrimoireImmuneMatch | { entry_id, immune_signature_hash } |
Cross-Source Confidence Discounting (Weismann Barrier)
In addition to the initial confidence values in §6.1, adopted entries receive multiplicative confidence discounts based on their provenance relationship. These discounts implement the Weismann barrier [HEARD-MARTIENSSEN-2014]: foreign knowledge never enters at full confidence regardless of the source’s claimed certainty.
| Source | Confidence Discount | Rationale |
|---|---|---|
| Inheritance (generation N) | confidence × 0.85^N | Each generation compounds the discount. A 3rd-generation heuristic enters at 0.85³ ≈ 0.61× original confidence. |
| Clade sibling | confidence × 0.80 | Sibling knowledge has been validated in a related but not identical context. |
| Commons (ecosystem) | confidence × 0.50 | The Grossman-Stiglitz paradox [GROSSMAN-STIGLITZ-1980]: shared alpha destroys itself. Heavy discount ensures it must prove itself locally. |
| Marketplace | confidence × 0.60 | Purchased knowledge has economic backing but hasn’t been validated in this Golem’s environment. |
All adopted entries start with validation_status: Pending. Confidence can only increase through operational use (testing effect [ROEDIGER-KARPICKE-2006]).
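The discount table translates into one function; discount factors come straight from the table, while the enum variants are illustrative names:

```rust
/// Provenance relationship of a foreign knowledge entry.
enum ForeignSource {
    /// Inherited knowledge, N generations removed.
    Inherited { generation: u32 },
    CladeSibling,
    Commons,
    Marketplace,
}

/// Weismann-barrier discounting: foreign knowledge never enters at
/// the source's claimed confidence.
fn discounted_confidence(claimed: f64, source: &ForeignSource) -> f64 {
    let factor = match source {
        // Each generation compounds the 0.85 discount.
        ForeignSource::Inherited { generation } => 0.85f64.powi(*generation as i32),
        ForeignSource::CladeSibling => 0.80,
        ForeignSource::Commons => 0.50,
        ForeignSource::Marketplace => 0.60,
    };
    claimed * factor
}
```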
Propagation Promotion Gates
The Curator cycle evaluates propagation promotion. Each entry self-describes how far it can spread, but promotion requires meeting confidence and validation thresholds:
| Promotion Path | Gate |
|---|---|
| Warnings → Clade | confidence ≥ 0.4 |
| Insights → Clade | confidence ≥ 0.6 |
| Heuristics → Commons | validated_count ≥ 5 AND confidence ≥ 0.7 |
| Any → Listed (marketplace) | Manual owner promotion only |
Safety knowledge (warnings) auto-promotes to Clade at a low threshold because safety is non-rivalrous. Strategy knowledge (heuristics, insights) requires high confidence and validation before sharing because it IS rivalrous – the Grossman-Stiglitz paradox [GROSSMAN-STIGLITZ-1980] means shared alpha destroys itself.
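The promotion gates can be sketched as a predicate the Curator cycle evaluates. Marketplace listing is excluded because it is manual-only (variant names are illustrative; thresholds come from the table):

```rust
/// Automatic propagation paths. Listing on the marketplace is a
/// manual owner action and intentionally has no variant here.
enum PromotionPath {
    WarningToClade,
    InsightToClade,
    HeuristicToCommons,
}

/// Gate check from the promotion table: safety knowledge promotes at
/// a low confidence bar; strategy knowledge needs validation history.
fn may_promote(path: &PromotionPath, confidence: f64, validated_count: u32) -> bool {
    match path {
        PromotionPath::WarningToClade => confidence >= 0.4,
        PromotionPath::InsightToClade => confidence >= 0.6,
        PromotionPath::HeuristicToCommons => validated_count >= 5 && confidence >= 0.7,
    }
}
```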
Bloom Oracle Implementation Details
The Bloom Oracle section above describes the high-level concept. The implementation specifics:
- Filter type: Domain-specific LSH/SimHash [CHARIKAR-2002] Bloom filters, approximately 4KB each. SimHash projects high-dimensional embeddings into discrete hash buckets using random hyperplanes. The projection is one-way – buckets cannot be reversed to reveal original embeddings or query content.
- Distribution: Filters are pushed to Styx (the global knowledge relay at wss://styx.bardo.run) over the Golem’s persistent outbound WebSocket connection. Styx relays filter updates to connected peers. Styx itself sees only opaque Bloom filter bytes during relay – it cannot infer what knowledge the Golem holds.
- Query flow: Before incurring x402 micropayment cost for a commons query, the Golem checks locally cached peer filters. No hit = skip the query and save the cost.
- False positive rate: 1% per filter. A false positive means one wasted x402 micropayment (~$0.001). A false negative means the Golem misses supplementary commons knowledge – acceptable because the local Grimoire is the primary intelligence source.
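The SimHash projection can be sketched as follows. The real filters use properly seeded random hyperplanes shared network-wide; this sketch substitutes a toy LCG for reproducibility, and all names are illustrative:

```rust
/// Minimal SimHash sketch: project an embedding onto up to 64 fixed
/// pseudo-random hyperplanes and record the sign pattern. The
/// projection is one-way: the bit pattern cannot be inverted to
/// recover the embedding.
fn simhash(embedding: &[f64], n_bits: usize, seed: u64) -> u64 {
    let mut state = seed;
    // Toy LCG producing hyperplane coefficients in [-1, 1);
    // not cryptographic, illustration only.
    let mut next = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((state >> 11) as f64 / (1u64 << 53) as f64) * 2.0 - 1.0
    };
    let mut bits = 0u64;
    for bit in 0..n_bits.min(64) {
        // One hyperplane per output bit; the sign of the dot
        // product decides the bit.
        let dot: f64 = embedding.iter().map(|x| x * next()).sum();
        if dot >= 0.0 {
            bits |= 1u64 << bit;
        }
    }
    bits
}
```

Because only the sign of each projection matters, positively scaled copies of an embedding hash to the same bucket, which is what makes the filter useful for near-duplicate detection.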
Immune Memory Attack Types
The immune memory system recognizes five canonical attack types. Each rejected entry is classified and its structural signature stored for future pre-screening:
| Attack Type | Description |
|---|---|
| EmbeddingAnomaly | Embedding doesn’t match text content – suggests a crafted embedding designed to bypass vector search |
| ConfidenceInflation | Entry claims unreasonably high confidence for its provenance (e.g., a commons entry claiming 0.95 when max post-discount should be 0.50) |
| ProvenanceForgery | EIP-712 signature doesn’t verify, or claimed source Golem doesn’t exist in the ERC-8004 (on-chain agent identity standard) identity registry |
| ContentInjection | Entry content contains prompt injection patterns: instruction overrides, role-playing commands, system prompt manipulation |
| CausalGraphPoisoning | Proposed causal edge contradicts well-established high-confidence edges – suggests deliberate causal model corruption |
Signature computation: Attack signatures use structural features, not content, to catch variants even when text is reworded. The signature is SHA-256 of (source, confidence, category, embedding distribution stats), truncated to 128 bits.
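The structural-signature idea can be sketched as below. The spec calls for SHA-256 truncated to 128 bits; since the Rust standard library has no SHA-256, this sketch substitutes the stdlib SipHash (64-bit) purely for illustration, and the parameter set is a simplified stand-in for the spec's embedding distribution stats:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash structural features only -- never raw content -- so that
/// reworded variants of the same attack map to the same signature.
fn immune_signature(
    source: &str,
    confidence_bucket: u8, // quantized claimed confidence
    category: &str,        // one of the five attack types
    embedding_mean: i64,   // quantized embedding distribution stats
    embedding_var: i64,
) -> u64 {
    let mut h = DefaultHasher::new();
    source.hash(&mut h);
    confidence_bucket.hash(&mut h);
    category.hash(&mut h);
    embedding_mean.hash(&mut h);
    embedding_var.hash(&mut h);
    h.finish()
}
```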
Cross-References
- 00-defense.md – The main defense-in-depth architecture doc, including Layer 2.5 Tool Integrity Verification (tool provenance hashing, independent state verification, compiled tools, phase-aware gating).
- 02-policy.md – PolicyCage on-chain smart contract: the last-resort defense against adopted-but-harmful knowledge, since even successfully ingested poison cannot cause actions that violate the on-chain spending caps and asset whitelists.
- ../09-economy/01-reputation.md – Reputation scoring system where ingestion safety signals feed the Beta(safetyAlpha, safetyBeta) Bayesian reputation component.
- ../09-economy/02-clade.md – Clade (sibling Golem fleet) sharing mechanics: intra-Clade knowledge passes through lightweight validation with 0.80x confidence discounting.
- ../09-economy/03-marketplace.md – Knowledge marketplace purchase flow: purchased entries trigger quarantine on arrival, and the ingestion report is attached to buyer reviews for reputation feedback.
References
- [CHEN-2024] Chen, Z. et al. “AgentPoison.” NeurIPS 2024. Demonstrates optimized embedding-space triggers that hijack RAG retrieval with >=80% success rate at <0.1% poison rate. The primary motivation for TrustRAG anomaly detection in Stage 2.
- [DONG-2025] Dong, Z. et al. “MINJA.” arXiv:2503.03704. Shows that injection through normal agent interactions achieves >95% success, defeating LLM-only auditing. Motivates the A-MemGuard consensus layer.
- [MEMORYGRAFT-2025] “MemoryGraft.” arXiv:2512.16962. Introduces durable, trigger-free behavioral drift through gradual accumulation of subtly biased memory entries. No discrete trigger to detect, motivating the causal rollback defense in Stage 4.
- [WEI-2025] Wei, Z. et al. “A-MemGuard.” arXiv:2510.02373. Proposes consensus-based memory divergence detection using multiple independent validations. Directly implemented in the Stage 2 Layer 2 consensus validation.
- [ZHOU-2025] Zhou, M. et al. “TrustRAG.” arXiv:2501.00879. AAAI 2026 Workshop. Introduces embedding-space anomaly detection for RAG poisoning. Directly implemented in Stage 2 Layer 1 to catch AgentPoison-style triggers.
- [XIANG-2024] Xiang, Z. et al. “RobustRAG.” arXiv:2405.15556. ICML 2024. Proposes isolate-then-aggregate retrieval to resist poisoned entries. Informs the multi-layer independent validation approach.
- [ZHANG-2024] Zhang, F. et al. “Agent Security Bench.” arXiv:2410.02644. Benchmarks agent security and finds LLM-based content auditing misses 66% of poisoned entries. Motivates the architectural (non-LLM-only) defense strategy.
- [BRODERSEN-2015] Brodersen, K. et al. “Inferring Causal Impact Using Bayesian Structural Time-Series.” Annals of Applied Statistics, 9(1), 247–274. Provides the Bayesian structural time-series methodology used in Stage 4 causal rollback to measure the impact of ingested knowledge on trading outcomes.
- [BRINSON-1986] Brinson, G., Hood, L. & Beebower, G. “Determinants of Portfolio Performance.” FAJ, 42(4). Classic performance attribution framework. Used to decompose whether performance changes after knowledge adoption are attributable to the new knowledge or to market regime changes.
- [WANG-2023] Wang, G. et al. “Voyager.” arXiv:2305.16291. NeurIPS 2023. Introduces the Voyager pattern: decompose tasks into reusable skill components, retrieve verified components, generate code only for novel parts. Directly implemented in Stage 3 Skill Sandbox.
- [LIU-2024] Liu, X. et al. “ClassEval.” ICSE 2024. Benchmarks LLM code generation quality. Relevant to evaluating the safety of generated code in the Stage 3 skill sandbox.
- [GROSSMAN-STIGLITZ-1980] Grossman, S.J. & Stiglitz, J.E. “On the Impossibility of Informationally Efficient Markets.” AER, 70(3), 1980. Proves that perfectly efficient markets are impossible because informed trading must be compensated. Relevant to the information-theoretic foundation for why marketplace knowledge has economic value.
- [CHARIKAR-2002] Charikar, M.S. “Similarity Estimation Techniques from Rounding Algorithms.” STOC, 2002. Introduces locality-sensitive hashing for similarity estimation. Foundation for the Bloom Oracle’s privacy-preserving content hash matching.
- [HEARD-MARTIENSSEN-2014] Heard, E. & Martienssen, R.A. “Transgenerational Epigenetic Inheritance: Myths and Mechanisms.” Cell, 157(1), 2014. Distinguishes true transgenerational inheritance from transient effects. Informs the immune memory system’s design for distinguishing persistent threats from one-time false positives.
- [ROEDIGER-KARPICKE-2006] Roediger, H.L. & Karpicke, J.D. “Test-Enhanced Learning.” Psychological Science, 17(3), 2006. Shows that retrieval practice strengthens memory more than repeated study. Informs the confidence-building mechanism where successfully validated knowledge gains confidence over time.