# L1 Distributed Storage Architecture
This document describes how data is stored when running artdag on L1 (the distributed rendering layer).
## Overview
L1 uses four storage systems working together:

| System | Purpose | Data Stored |
|--------|---------|-------------|
| **Local Cache** | Hot storage (fast access) | Media files, plans, analysis |
| **IPFS** | Durable content-addressed storage | All media outputs |
| **Redis** | Coordination & indexes | Claims, mappings, run status |
| **PostgreSQL** | Metadata & ownership | User data, provenance |
## Storage Flow
When a step executes on L1:
```
1. Executor produces output file
2. Store in local cache (fast)
3. Compute content_hash = SHA3-256(file)
4. Upload to IPFS → get ipfs_cid
5. Update indexes:
   - content_hash → node_id (Redis + local)
   - content_hash → ipfs_cid (Redis + local)
```
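The flow above can be sketched in Python. This is a minimal sketch, not artdag's API: a dict stands in for the local cache and the Redis indexes, and the IPFS upload is stubbed.

```python
import hashlib

def store_output(file_bytes: bytes, node_id: str,
                 cache: dict, content_index: dict, ipfs_index: dict) -> str:
    """Sketch of the L1 storage flow for one step output."""
    # 2. Store in local cache (a dict stands in for the cache dir)
    cache[node_id] = file_bytes
    # 3. Compute content_hash = SHA3-256(file)
    content_hash = hashlib.sha3_256(file_bytes).hexdigest()
    # 4. Upload to IPFS -> get ipfs_cid (stubbed; a real pin call goes here)
    ipfs_cid = f"cid-for-{content_hash[:8]}"
    # 5. Update the indexes (Redis hashes on L1)
    content_index[content_hash] = node_id
    ipfs_index[content_hash] = ipfs_cid
    return content_hash
```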
Every intermediate step output (SEGMENT, SEQUENCE, etc.) gets its own IPFS CID.
## Local Cache
Hot storage on each worker node:
```
cache_dir/
  index.json            # Cache metadata
  content_index.json    # content_hash → node_id
  ipfs_index.json       # content_hash → ipfs_cid
  plans/
    {plan_id}.json      # Cached execution plans
  analysis/
    {hash}.json         # Analysis results
  {node_id}/
    output.mkv          # Media output
    metadata.json       # CacheEntry metadata
```
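A small helper can resolve this layout for a given node. The function name is illustrative, not part of artdag:

```python
from pathlib import Path

def cache_paths(cache_dir: str, node_id: str) -> dict:
    """Resolve the per-node cache layout shown above."""
    root = Path(cache_dir)
    node = root / node_id
    return {
        "content_index": root / "content_index.json",
        "ipfs_index": root / "ipfs_index.json",
        "output": node / "output.mkv",       # media output
        "metadata": node / "metadata.json",  # CacheEntry metadata
    }
```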
## IPFS - Durable Media Storage
All media files are stored in IPFS for durability and content-addressing.
**Supported pinning providers:**
- Pinata
- web3.storage
- NFT.Storage
- Infura IPFS
- Filebase (S3-compatible)
- Storj (decentralized)
- Local IPFS node
**Configuration:**
```bash
IPFS_API=/ip4/127.0.0.1/tcp/5001 # Local IPFS daemon
```
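Resolving that setting from the environment might look like this (a sketch; only the `IPFS_API` variable and its default come from the configuration above):

```python
import os

# Default matches the local-daemon multiaddr above.
DEFAULT_IPFS_API = "/ip4/127.0.0.1/tcp/5001"

def ipfs_api_addr(env: dict = None) -> str:
    """Prefer IPFS_API from the environment, else the local daemon."""
    env = os.environ if env is None else env
    return env.get("IPFS_API", DEFAULT_IPFS_API)
```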
## Redis - Coordination
Redis handles distributed coordination across workers.
### Key Patterns
| Key | Type | Purpose |
|-----|------|---------|
| `artdag:run:{run_id}` | String | Run status, timestamps, celery task ID |
| `artdag:content_index` | Hash | content_hash → node_id mapping |
| `artdag:ipfs_index` | Hash | content_hash → ipfs_cid mapping |
| `artdag:claim:{cache_id}` | String | Task claiming (prevents duplicate work) |
### Task Claiming
Lua scripts ensure atomic claiming across workers:
```
Status flow: PENDING → CLAIMED → RUNNING → COMPLETED/CACHED/FAILED
TTL: 5 minutes for claims, 1 hour for results
```
This prevents two workers from executing the same step.
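The claim semantics can be sketched as a compare-and-set with expiry. On L1 this runs as an atomic Redis Lua script; here a plain dict of `(owner, expiry)` pairs stands in, so the names and structure are illustrative only:

```python
CLAIM_TTL = 300  # 5 minutes, per the claim TTL above

def try_claim(claims: dict, cache_id: str, worker_id: str, now: float) -> bool:
    """Claim cache_id for worker_id unless a live claim exists."""
    entry = claims.get(cache_id)
    if entry is not None and entry[1] > now:
        return False  # another worker holds a live claim
    claims[cache_id] = (worker_id, now + CLAIM_TTL)
    return True
```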
## PostgreSQL - Metadata
Stores ownership, provenance, and sharing metadata.
### Tables
```sql
-- Core cache (shared)
cache_items (content_hash, ipfs_cid, created_at)
-- Per-user ownership
item_types (content_hash, actor_id, type, metadata)
-- Run cache (deterministic identity)
run_cache (
    run_id,          -- SHA3-256(sorted_inputs + recipe)
    output_hash,
    ipfs_cid,
    provenance_cid,
    recipe, inputs, actor_id
)
-- Storage backends
storage_backends (actor_id, provider_type, config, capacity_gb)
-- What's stored where
storage_pins (content_hash, storage_id, ipfs_cid, pin_type)
```
## Cache Lookup Flow
When a worker needs a file:
```
1. Check local cache by cache_id (fastest)
2. Check Redis content_index: content_hash → node_id
3. Check PostgreSQL cache_items
4. Retrieve from IPFS by CID
5. Store in local cache for next hit
```
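The tiered lookup above can be sketched as follows. Dicts stand in for each backend, and steps 2-3 are collapsed into one index probe; the real code consults Redis and then PostgreSQL separately:

```python
from typing import Optional

def fetch(content_hash: str, local: dict,
          ipfs_index: dict, ipfs: dict) -> Optional[bytes]:
    """Look up a file: local cache, then indexes, then IPFS."""
    # 1. Local cache hit is fastest
    if content_hash in local:
        return local[content_hash]
    # 2-3. Resolve the CID via the indexes (Redis, then PostgreSQL on L1)
    cid = ipfs_index.get(content_hash)
    if cid is None:
        return None
    # 4. Retrieve from IPFS by CID
    data = ipfs.get(cid)
    if data is not None:
        # 5. Warm the local cache for the next hit
        local[content_hash] = data
    return data
```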
## Local vs L1 Comparison
| Feature | Local Testing | L1 Distributed |
|---------|---------------|----------------|
| Local cache | Yes | Yes |
| IPFS | No | Yes |
| Redis | No | Yes |
| PostgreSQL | No | Yes |
| Multi-worker | No | Yes |
| Task claiming | No | Yes (Lua scripts) |
| Durability | Filesystem only | IPFS + PostgreSQL |
## Content Addressing
All storage uses SHA3-256 (quantum-resistant):
- **Files:** `content_hash = SHA3-256(file_bytes)`
- **Computation:** `cache_id = SHA3-256(type + config + input_hashes)`
- **Run identity:** `run_id = SHA3-256(sorted_inputs + recipe)`
- **Plans:** `plan_id = SHA3-256(recipe + inputs + analysis)`
This ensures:
- Same inputs → same outputs (reproducibility)
- Automatic deduplication across workers
- Content verification (tamper detection)
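The run-identity hash can be sketched with the standard library. The exact serialisation of inputs and recipe is an assumption here (artdag's canonical form may differ); what carries over is that inputs are sorted before hashing, so the identity is order-independent:

```python
import hashlib
import json

def run_id(inputs: list, recipe: dict) -> str:
    """run_id = SHA3-256(sorted_inputs + recipe), canonically serialised."""
    payload = json.dumps(
        {"inputs": sorted(inputs), "recipe": recipe},
        sort_keys=True,  # canonical key order for reproducibility
    ).encode()
    return hashlib.sha3_256(payload).hexdigest()
```

Because the serialisation is canonical, the same inputs in any order produce the same `run_id`, which is what makes `run_cache` deduplication work.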
## Configuration
Default locations:
```bash
# Local cache
~/.artdag/cache # Default
/data/cache # Docker
# Redis
redis://localhost:6379/5
# PostgreSQL
postgresql://user:pass@host/artdag
# IPFS
/ip4/127.0.0.1/tcp/5001
```
## See Also
- [OFFLINE_TESTING.md](OFFLINE_TESTING.md) - Local testing without L1
- [EXECUTION_MODEL.md](EXECUTION_MODEL.md) - 3-phase execution model