
L1 Distributed Storage Architecture

This document describes how data is stored when running artdag on L1 (the distributed rendering layer).

Overview

L1 uses four storage systems working together:

| System | Purpose | Data Stored |
|---|---|---|
| Local cache | Hot storage (fast access) | Media files, plans, analysis |
| IPFS | Durable content-addressed storage | All media outputs |
| Redis | Coordination & indexes | Claims, mappings, run status |
| PostgreSQL | Metadata & ownership | User data, provenance |

Storage Flow

When a step executes on L1:

1. Executor produces output file
2. Store in local cache (fast)
3. Compute content_hash = SHA3-256(file)
4. Upload to IPFS → get ipfs_cid
5. Update indexes:
   - content_hash → node_id (Redis + local)
   - content_hash → ipfs_cid (Redis + local)

Every intermediate step output (SEGMENT, SEQUENCE, etc.) gets its own IPFS CID.
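The five-step flow above can be sketched in Python. This is an illustrative stand-in, not the actual L1 API: the dicts model the local cache and the two Redis indexes, and the CID is a stub (a real IPFS client returns one from its `add` call).

```python
import hashlib

def store_step_output(data: bytes, node_id: str, local_cache: dict,
                      content_index: dict, ipfs_index: dict) -> str:
    """Steps 1-5, with in-memory dicts standing in for the local
    cache, the Redis indexes, and the IPFS client."""
    local_cache[node_id] = data                        # 1-2. hot cache
    content_hash = hashlib.sha3_256(data).hexdigest()  # 3. content hash
    ipfs_cid = "Qm-" + content_hash[:16]               # 4. stub CID (a real
                                                       #    client returns one)
    content_index[content_hash] = node_id              # 5. update both indexes
    ipfs_index[content_hash] = ipfs_cid
    return content_hash
```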

Local Cache

Hot storage on each worker node:

cache_dir/
  index.json                    # Cache metadata
  content_index.json            # content_hash → node_id
  ipfs_index.json               # content_hash → ipfs_cid
  plans/
    {plan_id}.json              # Cached execution plans
  analysis/
    {hash}.json                 # Analysis results
  {node_id}/
    output.mkv                  # Media output
    metadata.json               # CacheEntry metadata
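A small helper can resolve entries in this layout to concrete paths. The helper and its key names are hypothetical; only the directory layout itself comes from the tree above.

```python
from pathlib import Path

def cache_paths(cache_dir: Path, node_id: str) -> dict:
    """Map the cache layout above to concrete per-node paths."""
    return {
        "content_index": cache_dir / "content_index.json",
        "ipfs_index": cache_dir / "ipfs_index.json",
        "output": cache_dir / node_id / "output.mkv",
        "metadata": cache_dir / node_id / "metadata.json",
    }
```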

IPFS - Durable Media Storage

All media files are stored in IPFS for durability and content-addressing.

Supported pinning providers:

  • Pinata
  • web3.storage
  • NFT.Storage
  • Infura IPFS
  • Filebase (S3-compatible)
  • Storj (decentralized)
  • Local IPFS node

Configuration:

IPFS_API=/ip4/127.0.0.1/tcp/5001  # Local IPFS daemon

Redis - Coordination

Redis handles distributed coordination across workers.

Key Patterns

| Key | Type | Purpose |
|---|---|---|
| artdag:run:{run_id} | String | Run status, timestamps, Celery task ID |
| artdag:content_index | Hash | content_hash → node_id mapping |
| artdag:ipfs_index | Hash | content_hash → ipfs_cid mapping |
| artdag:claim:{cache_id} | String | Task claiming (prevents duplicate work) |

Task Claiming

Lua scripts ensure atomic claiming across workers:

Status flow: PENDING → CLAIMED → RUNNING → COMPLETED/CACHED/FAILED
TTL: 5 minutes for claims, 1 hour for results

This prevents two workers from executing the same step.
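A minimal sketch of the claim semantics. The Lua script shown is illustrative (the actual scripts are not reproduced in this document), and the Python function models the same first-caller-wins behavior with a plain dict in place of Redis.

```python
# Illustrative Lua script: atomically claim a cache_id, or lose the race.
# A real worker would run this via redis-py's register_script() / EVAL,
# so the existence check and the SET happen atomically.
CLAIM_LUA = """
if redis.call('EXISTS', KEYS[1]) == 1 then return 0 end
redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
return 1
"""

CLAIM_TTL = 300  # claims expire after 5 minutes, per the TTLs above

def try_claim(claims: dict, cache_id: str, worker_id: str) -> bool:
    """In-memory model of the script above: the first caller wins."""
    key = f"artdag:claim:{cache_id}"
    if key in claims:
        return False          # another worker already holds the claim
    claims[key] = worker_id   # Redis equivalent: SET key worker EX 300
    return True
```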

PostgreSQL - Metadata

Stores ownership, provenance, and sharing metadata.

Tables

-- Core cache (shared)
cache_items (content_hash, ipfs_cid, created_at)

-- Per-user ownership
item_types (content_hash, actor_id, type, metadata)

-- Run cache (deterministic identity)
run_cache (
  run_id,           -- SHA3-256(sorted_inputs + recipe)
  output_hash,
  ipfs_cid,
  provenance_cid,
  recipe, inputs, actor_id
)

-- Storage backends
storage_backends (actor_id, provider_type, config, capacity_gb)

-- What's stored where
storage_pins (content_hash, storage_id, ipfs_cid, pin_type)

Cache Lookup Flow

When a worker needs a file:

1. Check local cache by cache_id (fastest)
2. Check Redis content_index: content_hash → node_id
3. Check PostgreSQL cache_items
4. Retrieve from IPFS by CID
5. Store in local cache for next hit
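The tiered lookup can be sketched with in-memory stand-ins for each tier; the function name and signature are hypothetical, but the fallback order matches the steps above.

```python
def fetch(content_hash: str, local_cache: dict, redis_ipfs_index: dict,
          pg_cache_items: dict, ipfs_store: dict):
    """Walk the tiers: local cache → Redis → PostgreSQL → IPFS."""
    if content_hash in local_cache:                # 1. fastest path
        return local_cache[content_hash]
    cid = (redis_ipfs_index.get(content_hash)     # 2. Redis ipfs_index
           or pg_cache_items.get(content_hash))   # 3. PostgreSQL cache_items
    if cid is None:
        return None                               # unknown content
    data = ipfs_store[cid]                        # 4. retrieve by CID
    local_cache[content_hash] = data              # 5. warm cache for next hit
    return data
```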

Local vs L1 Comparison

| Feature | Local Testing | L1 Distributed |
|---|---|---|
| Local cache | Yes | Yes |
| IPFS | No | Yes |
| Redis | No | Yes |
| PostgreSQL | No | Yes |
| Multi-worker | No | Yes |
| Task claiming | No | Yes (Lua scripts) |
| Durability | Filesystem only | IPFS + PostgreSQL |

Content Addressing

All storage uses SHA3-256 (quantum-resistant):

  • Files: content_hash = SHA3-256(file_bytes)
  • Computation: cache_id = SHA3-256(type + config + input_hashes)
  • Run identity: run_id = SHA3-256(sorted_inputs + recipe)
  • Plans: plan_id = SHA3-256(recipe + inputs + analysis)

This ensures:

  • Same inputs → same outputs (reproducibility)
  • Automatic deduplication across workers
  • Content verification (tamper detection)
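These identities can be computed with Python's standard hashlib. The payload encoding in `run_id` is illustrative (the actual serialization isn't specified here); only the hashed fields come from the list above.

```python
import hashlib
import json

def content_hash(file_bytes: bytes) -> str:
    """content_hash = SHA3-256(file_bytes)."""
    return hashlib.sha3_256(file_bytes).hexdigest()

def run_id(inputs: list, recipe: dict) -> str:
    """run_id = SHA3-256(sorted_inputs + recipe). Sorting the inputs
    makes run identity independent of input order, which is what lets
    run_cache deduplicate repeated runs."""
    payload = json.dumps({"inputs": sorted(inputs), "recipe": recipe},
                         sort_keys=True).encode()
    return hashlib.sha3_256(payload).hexdigest()
```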

Configuration

Default locations:

# Local cache
~/.artdag/cache           # Default
/data/cache               # Docker

# Redis
redis://localhost:6379/5

# PostgreSQL
postgresql://user:pass@host/artdag

# IPFS
/ip4/127.0.0.1/tcp/5001

See Also