
# L1 Distributed Storage Architecture

This document describes how data is stored when running artdag on L1 (the distributed rendering layer).

## Overview

L1 uses four storage systems working together:

| System | Purpose | Data Stored |
|--------|---------|-------------|
| **Local Cache** | Hot storage (fast access) | Media files, plans, analysis |
| **IPFS** | Durable content-addressed storage | All media outputs |
| **Redis** | Coordination & indexes | Claims, mappings, run status |
| **PostgreSQL** | Metadata & ownership | User data, provenance |

## Storage Flow

When a step executes on L1:

```
1. Executor produces output file
2. Store in local cache (fast)
3. Compute content_hash = SHA3-256(file)
4. Upload to IPFS → get ipfs_cid
5. Update indexes:
   - content_hash → node_id (Redis + local)
   - content_hash → ipfs_cid (Redis + local)
```

Every intermediate step output (SEGMENT, SEQUENCE, etc.) gets its own IPFS CID.
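
The storage flow above can be sketched in a few lines of Python. This is a minimal illustration, not artdag's actual code: `sha3_256_file`, `record_output`, and the `upload` callable are hypothetical names, and plain dicts stand in for the Redis and local indexes.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Tuple


def sha3_256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA3-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha3_256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_output(
    path: Path,
    node_id: str,
    upload: Callable[[Path], str],   # hypothetical: uploads to IPFS, returns a CID
    content_index: Dict[str, str],   # stand-in for content_hash -> node_id
    ipfs_index: Dict[str, str],      # stand-in for content_hash -> ipfs_cid
) -> Tuple[str, str]:
    """Steps 3-5 of the flow: hash the file, upload it, update both indexes."""
    content_hash = sha3_256_file(path)
    ipfs_cid = upload(path)
    content_index[content_hash] = node_id
    ipfs_index[content_hash] = ipfs_cid
    return content_hash, ipfs_cid
```

Passing `upload` as a callable keeps the sketch independent of any particular IPFS client or pinning provider.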

## Local Cache

Hot storage on each worker node:

```
cache_dir/
  index.json                    # Cache metadata
  content_index.json            # content_hash → node_id
  ipfs_index.json               # content_hash → ipfs_cid
  plans/
    {plan_id}.json              # Cached execution plans
  analysis/
    {hash}.json                 # Analysis results
  {node_id}/
    output.mkv                  # Media output
    metadata.json               # CacheEntry metadata
```
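
As a rough illustration of how this layout might be navigated, the sketch below resolves a `content_hash` to a media file on disk. It assumes `content_index.json` is a flat JSON object mapping `content_hash` to `node_id`; the actual file format is not specified here, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def output_path(cache_dir: Path, node_id: str) -> Path:
    """Path of the media output for a node, following the layout above."""
    return cache_dir / node_id / "output.mkv"


def find_local_output(cache_dir: Path, content_hash: str) -> Optional[Path]:
    """Resolve content_hash -> node_id -> output file, or None on a miss."""
    index_file = cache_dir / "content_index.json"
    if not index_file.exists():
        return None
    # Assumption: the index is a flat {content_hash: node_id} JSON object.
    index = json.loads(index_file.read_text())
    node_id = index.get(content_hash)
    if node_id is None:
        return None
    candidate = output_path(cache_dir, node_id)
    return candidate if candidate.exists() else None
```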

## IPFS - Durable Media Storage

All media files are stored in IPFS for durability and content-addressing.

**Supported pinning providers:**
- Pinata
- web3.storage
- NFT.Storage
- Infura IPFS
- Filebase (S3-compatible)
- Storj (decentralized)
- Local IPFS node

**Configuration:**
```bash
IPFS_API=/ip4/127.0.0.1/tcp/5001  # Local IPFS daemon
```

## Redis - Coordination

Redis handles distributed coordination across workers.

### Key Patterns

| Key | Type | Purpose |
|-----|------|---------|
| `artdag:run:{run_id}` | String | Run status, timestamps, Celery task ID |
| `artdag:content_index` | Hash | content_hash → node_id mapping |
| `artdag:ipfs_index` | Hash | content_hash → ipfs_cid mapping |
| `artdag:claim:{cache_id}` | String | Task claiming (prevents duplicate work) |

### Task Claiming

Lua scripts ensure atomic claiming across workers:

```
Status flow: PENDING → CLAIMED → RUNNING → COMPLETED/CACHED/FAILED
TTL: 5 minutes for claims, 1 hour for results
```

This prevents two workers from executing the same step.
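
To make the claim semantics concrete, here is a small in-memory model of "first claimant wins, claim expires after the TTL". It is only a single-process sketch: `ClaimStore` and `try_claim` are hypothetical names, and in the real system the check-and-set has to be atomic across processes, which is the job the Lua scripts do inside Redis.

```python
import time
from typing import Callable, Dict, Tuple

CLAIM_TTL_SECONDS = 5 * 60  # claim TTL from the status flow above


class ClaimStore:
    """In-memory stand-in for the artdag:claim:{cache_id} keys."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # cache_id -> (worker_id, expiry time)
        self._claims: Dict[str, Tuple[str, float]] = {}

    def try_claim(self, cache_id: str, worker_id: str,
                  ttl: float = CLAIM_TTL_SECONDS) -> bool:
        """Return True if this worker obtained the claim, False otherwise."""
        now = self._clock()
        holder = self._claims.get(cache_id)
        if holder is not None and holder[1] > now:
            return False  # a live claim already exists
        self._claims[cache_id] = (worker_id, now + ttl)
        return True
```

A worker would call `try_claim` before executing a step and skip the step when it returns `False`. The expiry lets another worker take over if the original claimant dies before finishing.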

## PostgreSQL - Metadata

Stores ownership, provenance, and sharing metadata.

### Tables

```sql
-- Core cache (shared)
cache_items (content_hash, ipfs_cid, created_at)

-- Per-user ownership
item_types (content_hash, actor_id, type, metadata)

-- Run cache (deterministic identity)
run_cache (
  run_id,           -- SHA3-256(sorted_inputs + recipe)
  output_hash,
  ipfs_cid,
  provenance_cid,
  recipe, inputs, actor_id
)

-- Storage backends
storage_backends (actor_id, provider_type, config, capacity_gb)

-- What's stored where
storage_pins (content_hash, storage_id, ipfs_cid, pin_type)
```

## Cache Lookup Flow

When a worker needs a file:

```
1. Check local cache by cache_id (fastest)
2. Check Redis content_index: content_hash → node_id
3. Check PostgreSQL cache_items
4. Retrieve from IPFS by CID
5. Store in local cache for next hit
```
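
The lookup order can be sketched as a fall-through over the tiers. Everything here is illustrative: `resolve` is a hypothetical function, dicts stand in for the local cache, the Redis `content_index`, and the PostgreSQL `cache_items` table, and the sketch assumes local entries can be keyed by either `cache_id` or `node_id`.

```python
from typing import Callable, Dict, Optional


def resolve(
    cache_id: str,
    content_hash: str,
    local: Dict[str, bytes],           # stand-in for the local cache
    content_index: Dict[str, str],     # stand-in for Redis: content_hash -> node_id
    cache_items: Dict[str, str],       # stand-in for PostgreSQL: content_hash -> ipfs_cid
    ipfs_fetch: Callable[[str], bytes],  # hypothetical: fetch bytes by CID
) -> Optional[bytes]:
    # 1. Local cache by cache_id (fastest).
    if cache_id in local:
        return local[cache_id]
    # 2. Redis content_index: the hash may map to a node already held locally.
    node_id = content_index.get(content_hash)
    if node_id is not None and node_id in local:
        return local[node_id]
    # 3. PostgreSQL cache_items: find the durable CID.
    ipfs_cid = cache_items.get(content_hash)
    if ipfs_cid is None:
        return None  # unknown everywhere: the step must be executed
    # 4. Retrieve from IPFS by CID.
    data = ipfs_fetch(ipfs_cid)
    # 5. Store locally so the next lookup is a step-1 hit.
    local[cache_id] = data
    return data
```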

## Local vs L1 Comparison

| Feature | Local Testing | L1 Distributed |
|---------|---------------|----------------|
| Local cache | Yes | Yes |
| IPFS | No | Yes |
| Redis | No | Yes |
| PostgreSQL | No | Yes |
| Multi-worker | No | Yes |
| Task claiming | No | Yes (Lua scripts) |
| Durability | Filesystem only | IPFS + PostgreSQL |

## Content Addressing

All identifiers are SHA3-256 digests (a 256-bit hash generally regarded as resistant to known quantum attacks):

- **Files:** `content_hash = SHA3-256(file_bytes)`
- **Computation:** `cache_id = SHA3-256(type + config + input_hashes)`
- **Run identity:** `run_id = SHA3-256(sorted_inputs + recipe)`
- **Plans:** `plan_id = SHA3-256(recipe + inputs + analysis)`

This ensures:
- Same inputs → same outputs (reproducibility)
- Automatic deduplication across workers
- Content verification (tamper detection)
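
A minimal sketch of how such identifiers might be derived with Python's `hashlib`. The formulas above do not say how `type`, `config`, and the input hashes are serialized before hashing, so the canonical JSON encoding used here is an assumption, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Iterable


def sha3_hex(data: bytes) -> str:
    """Hex-encoded SHA3-256 digest."""
    return hashlib.sha3_256(data).hexdigest()


def canonical(obj: Any) -> bytes:
    # Assumption: sorted-key, whitespace-free JSON as the canonical encoding.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_hash(file_bytes: bytes) -> str:
    return sha3_hex(file_bytes)


def cache_id(node_type: str, config: dict, input_hashes: Iterable[str]) -> str:
    # Input order is preserved: swapping inputs gives a different computation.
    payload = {"type": node_type, "config": config, "inputs": list(input_hashes)}
    return sha3_hex(canonical(payload))


def run_id(inputs: Iterable[str], recipe: str) -> str:
    # Inputs are sorted first, so their order does not affect the run identity.
    return sha3_hex(canonical({"inputs": sorted(inputs), "recipe": recipe}))
```

Whatever encoding is used, it has to be deterministic. Otherwise two workers could compute different IDs for the same work and deduplication would break.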

## Configuration

Default locations:

```bash
# Local cache
~/.artdag/cache           # Default
/data/cache               # Docker

# Redis
redis://localhost:6379/5

# PostgreSQL
postgresql://user:pass@host/artdag

# IPFS
/ip4/127.0.0.1/tcp/5001
```
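
One way these defaults could be gathered into a single settings object is sketched below. Only `IPFS_API` appears in this document as an environment variable; `ARTDAG_CACHE_DIR`, `REDIS_URL`, and `DATABASE_URL` are hypothetical names chosen for illustration, as are `StorageConfig` and `load_config`.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageConfig:
    cache_dir: Path
    redis_url: str
    database_url: str
    ipfs_api: str


def load_config(env: Optional[Mapping[str, str]] = None) -> StorageConfig:
    """Build a config from environment variables, using the defaults above."""
    env = os.environ if env is None else env
    return StorageConfig(
        # Hypothetical variable names, except IPFS_API.
        cache_dir=Path(env.get("ARTDAG_CACHE_DIR", "~/.artdag/cache")).expanduser(),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/5"),
        database_url=env.get("DATABASE_URL", "postgresql://user:pass@host/artdag"),
        ipfs_api=env.get("IPFS_API", "/ip4/127.0.0.1/tcp/5001"),
    )
```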

## See Also

- [OFFLINE_TESTING.md](OFFLINE_TESTING.md) - Local testing without L1
- [EXECUTION_MODEL.md](EXECUTION_MODEL.md) - 3-phase execution model