# L1 Distributed Storage Architecture
This document describes how data is stored when running artdag on L1 (the distributed rendering layer).
## Overview
L1 uses four storage systems working together:
| System | Purpose | Data Stored |
|---|---|---|
| Local Cache | Hot storage (fast access) | Media files, plans, analysis |
| IPFS | Durable content-addressed storage | All media outputs |
| Redis | Coordination & indexes | Claims, mappings, run status |
| PostgreSQL | Metadata & ownership | User data, provenance |
## Storage Flow
When a step executes on L1:
1. The executor produces an output file
2. Store the file in the local cache (fast)
3. Compute `content_hash = SHA3-256(file)`
4. Upload to IPFS → get `ipfs_cid`
5. Update indexes:
   - `content_hash → node_id` (Redis + local)
   - `content_hash → ipfs_cid` (Redis + local)
Every intermediate step output (SEGMENT, SEQUENCE, etc.) gets its own IPFS CID.
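Taken together, steps 1-5 might look like the following minimal sketch (the helper names, argument list, and JSON index handling are illustrative assumptions, not artdag's actual API):

```python
# Sketch of steps 1-5 above; names are illustrative, not artdag's real API.
import hashlib
import json
from pathlib import Path

def store_step_output(output_path: Path, node_id: str, cache_dir: Path, r, ipfs_add):
    """Cache a produced file locally, push it to IPFS, and update both index layers."""
    data = output_path.read_bytes()
    content_hash = hashlib.sha3_256(data).hexdigest()         # step 3

    node_dir = cache_dir / node_id                            # step 2: local cache
    node_dir.mkdir(parents=True, exist_ok=True)
    (node_dir / output_path.name).write_bytes(data)

    ipfs_cid = ipfs_add(output_path)                          # step 4: durable copy

    r.hset("artdag:content_index", content_hash, node_id)     # step 5: Redis indexes
    r.hset("artdag:ipfs_index", content_hash, ipfs_cid)
    for name, value in (("content_index.json", node_id), ("ipfs_index.json", ipfs_cid)):
        idx_path = cache_dir / name                           # step 5: local JSON indexes
        index = json.loads(idx_path.read_text()) if idx_path.exists() else {}
        index[content_hash] = value
        idx_path.write_text(json.dumps(index, indent=2))
    return content_hash, ipfs_cid
```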
## Local Cache
Hot storage on each worker node:
```
cache_dir/
  index.json            # Cache metadata
  content_index.json    # content_hash → node_id
  ipfs_index.json       # content_hash → ipfs_cid
  plans/
    {plan_id}.json      # Cached execution plans
  analysis/
    {hash}.json         # Analysis results
  {node_id}/
    output.mkv          # Media output
    metadata.json       # CacheEntry metadata
```
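As an illustration, a worker could resolve a `content_hash` against these index files roughly as follows (a sketch; only the file names come from the layout above, the function is made up):

```python
# Sketch: resolve content_hash -> (node_id, ipfs_cid) from the local index files.
import json
from pathlib import Path

def resolve_local(cache_dir: Path, content_hash: str):
    """Return (node_id, ipfs_cid) if the hash is known locally, else (None, None)."""
    def load(name: str) -> dict:
        path = cache_dir / name
        return json.loads(path.read_text()) if path.exists() else {}

    return (
        load("content_index.json").get(content_hash),
        load("ipfs_index.json").get(content_hash),
    )
```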
## IPFS - Durable Media Storage
All media files are stored in IPFS for durability and content-addressing.
Supported pinning providers:
- Pinata
- web3.storage
- NFT.Storage
- Infura IPFS
- Filebase (S3-compatible)
- Storj (decentralized)
- Local IPFS node
Configuration:

```
IPFS_API=/ip4/127.0.0.1/tcp/5001   # Local IPFS daemon
```
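For example, adding a file through a local Kubo daemon's HTTP API could look like this sketch (the helper is not part of artdag; it assumes the daemon configured above is listening on 127.0.0.1:5001):

```python
# Sketch: add a file via Kubo's /api/v0/add endpoint and return its CID.
import requests

def ipfs_add(path: str, api_url: str = "http://127.0.0.1:5001") -> str:
    with open(path, "rb") as fh:
        resp = requests.post(f"{api_url}/api/v0/add", files={"file": fh})
    resp.raise_for_status()
    return resp.json()["Hash"]  # Kubo responds with {"Name": ..., "Hash": <CID>, "Size": ...}
```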
## Redis - Coordination
Redis handles distributed coordination across workers.
### Key Patterns
| Key | Type | Purpose |
|---|---|---|
| `artdag:run:{run_id}` | String | Run status, timestamps, Celery task ID |
| `artdag:content_index` | Hash | content_hash → node_id mapping |
| `artdag:ipfs_index` | Hash | content_hash → ipfs_cid mapping |
| `artdag:claim:{cache_id}` | String | Task claiming (prevents duplicate work) |
### Task Claiming
Lua scripts ensure atomic claiming across workers:
- Status flow: `PENDING → CLAIMED → RUNNING → COMPLETED/CACHED/FAILED`
- TTL: 5 minutes for claims, 1 hour for results
This prevents two workers from executing the same step.
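A minimal sketch of such a claim with redis-py, assuming the `artdag:claim:{cache_id}` key pattern above (the script body and `try_claim` helper are illustrative, not artdag's actual code):

```python
# Sketch: atomic claim via a server-side Lua script (claim only if the key is absent).
import redis

r = redis.Redis.from_url("redis://localhost:6379/5")

CLAIM_LUA = """
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
  return 1
end
return 0
"""
claim = r.register_script(CLAIM_LUA)

def try_claim(cache_id: str, worker_id: str, ttl_seconds: int = 300) -> bool:
    """True if this worker won the claim; losers skip the step or wait for the result."""
    return bool(claim(keys=[f"artdag:claim:{cache_id}"], args=[worker_id, ttl_seconds]))
```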
## PostgreSQL - Metadata
Stores ownership, provenance, and sharing metadata.
### Tables
```sql
-- Core cache (shared)
cache_items (content_hash, ipfs_cid, created_at)

-- Per-user ownership
item_types (content_hash, actor_id, type, metadata)

-- Run cache (deterministic identity)
run_cache (
  run_id,          -- SHA3-256(sorted_inputs + recipe)
  output_hash,
  ipfs_cid,
  provenance_cid,
  recipe, inputs, actor_id
)

-- Storage backends
storage_backends (actor_id, provider_type, config, capacity_gb)

-- What's stored where
storage_pins (content_hash, storage_id, ipfs_cid, pin_type)
```
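For example, checking whether a deterministic run already exists might be a single `run_cache` lookup, sketched here with psycopg2 (column names follow the outline above; the helper itself is an assumption):

```python
# Sketch: reuse a prior run if its deterministic run_id is already in run_cache.
import psycopg2

def find_cached_run(dsn: str, run_id: str):
    """Return (output_hash, ipfs_cid, provenance_cid), or None if the run is new."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT output_hash, ipfs_cid, provenance_cid FROM run_cache WHERE run_id = %s",
            (run_id,),
        )
        return cur.fetchone()
```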
## Cache Lookup Flow
When a worker needs a file:
1. Check local cache by cache_id (fastest)
2. Check Redis content_index: content_hash → node_id
3. Check PostgreSQL cache_items
4. Retrieve from IPFS by CID
5. Store in local cache for next hit
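A simplified sketch of this chain (every helper name is hypothetical, and the Redis/PostgreSQL steps are collapsed into "find the CID"):

```python
# Sketch of the lookup order above; helper names are illustrative, not artdag's API.
def fetch_file(cache_id, content_hash, local_cache, r, db, ipfs_get):
    data = local_cache.get(cache_id)                       # 1. local cache (fastest)
    if data is not None:
        return data

    ipfs_cid = r.hget("artdag:ipfs_index", content_hash)   # 2. Redis index
    if ipfs_cid is None:
        ipfs_cid = db.cache_item_cid(content_hash)         # 3. PostgreSQL cache_items
    data = ipfs_get(ipfs_cid)                              # 4. retrieve from IPFS by CID
    local_cache.put(cache_id, data)                        # 5. warm the local cache for the next hit
    return data
```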
## Local vs L1 Comparison
| Feature | Local Testing | L1 Distributed |
|---|---|---|
| Local cache | Yes | Yes |
| IPFS | No | Yes |
| Redis | No | Yes |
| PostgreSQL | No | Yes |
| Multi-worker | No | Yes |
| Task claiming | No | Yes (Lua scripts) |
| Durability | Filesystem only | IPFS + PostgreSQL |
## Content Addressing
All storage uses SHA3-256 (quantum-resistant):
- Files: `content_hash = SHA3-256(file_bytes)`
- Computation: `cache_id = SHA3-256(type + config + input_hashes)`
- Run identity: `run_id = SHA3-256(sorted_inputs + recipe)`
- Plans: `plan_id = SHA3-256(recipe + inputs + analysis)`
This ensures:
- Same inputs → same outputs (reproducibility)
- Automatic deduplication across workers
- Content verification (tamper detection)
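For instance, the file and run identities can be computed with Python's standard `hashlib` (a sketch; the exact canonical form artdag uses for `sorted_inputs + recipe` is an assumption here):

```python
# Sketch: SHA3-256 content addressing with the standard library.
import hashlib
import json

def content_hash(file_bytes: bytes) -> str:
    return hashlib.sha3_256(file_bytes).hexdigest()

def run_id(input_hashes, recipe):
    # Assumed canonicalisation: sorted input hashes + recipe serialised with stable key order.
    payload = "".join(sorted(input_hashes)) + json.dumps(recipe, sort_keys=True)
    return hashlib.sha3_256(payload.encode()).hexdigest()
```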
## Configuration

Default locations:

```
# Local cache
~/.artdag/cache                       # Default
/data/cache                           # Docker

# Redis
redis://localhost:6379/5

# PostgreSQL
postgresql://user:pass@host/artdag

# IPFS
/ip4/127.0.0.1/tcp/5001
```
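If overrides are needed, the defaults can be resolved from the environment along these lines (the `ARTDAG_*` variable names are hypothetical; only `IPFS_API` appears in this document):

```python
# Sketch: resolve storage endpoints, falling back to the defaults listed above.
import os
from pathlib import Path

CACHE_DIR = Path(os.environ.get("ARTDAG_CACHE_DIR", str(Path.home() / ".artdag" / "cache")))
REDIS_URL = os.environ.get("ARTDAG_REDIS_URL", "redis://localhost:6379/5")
POSTGRES_DSN = os.environ.get("ARTDAG_DATABASE_URL", "postgresql://user:pass@host/artdag")
IPFS_API = os.environ.get("IPFS_API", "/ip4/127.0.0.1/tcp/5001")
```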
## See Also
- OFFLINE_TESTING.md - Local testing without L1
- EXECUTION_MODEL.md - 3-phase execution model