Squashed 'core/' content from commit 4957443

git-subtree-dir: core
git-subtree-split: 4957443184ae0eb6323635a90a19acffb3e01d07
Author: giles
Date: 2026-02-24 23:09:39 +00:00
Commit: cc2dcbddd4
80 changed files with 25711 additions and 0 deletions

docs/EXECUTION_MODEL.md
# Art DAG 3-Phase Execution Model
## Overview
The execution model separates DAG processing into three distinct phases:
```
Recipe + Inputs → ANALYZE → Analysis Results
Analysis + Recipe → PLAN → Execution Plan (with cache IDs)
Execution Plan → EXECUTE → Cached Results
```
This separation enables:
1. **Incremental development** - Re-run recipes without reprocessing unchanged steps
2. **Parallel execution** - Independent steps run concurrently via Celery
3. **Deterministic caching** - Same inputs always produce same cache IDs
4. **Cost estimation** - Plan phase can estimate work before executing
## Phase 1: Analysis
### Purpose
Extract features from input media that inform downstream processing decisions.
### Inputs
- Recipe YAML with input references
- Input media files (by content hash)
### Outputs
Analysis results stored as JSON, keyed by input hash:
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class AnalysisResult:
    input_hash: str
    features: Dict[str, Any]

    # Audio features
    beats: Optional[List[float]]        # Beat times in seconds
    downbeats: Optional[List[float]]    # Bar-start times
    tempo: Optional[float]              # BPM
    energy: Optional[List[Tuple[float, float]]]               # (time, value) envelope
    spectrum: Optional[Dict[str, List[Tuple[float, float]]]]  # Band envelopes

    # Video features
    duration: float
    frame_rate: float
    dimensions: Tuple[int, int]
    motion_tempo: Optional[float]       # Estimated BPM from motion
```
### Implementation
```python
class Analyzer:
    def analyze(self, input_hash: str, features: List[str]) -> AnalysisResult:
        """Extract requested features from input."""

    def analyze_audio(self, path: Path) -> AudioFeatures:
        """Extract all audio features using librosa/essentia."""

    def analyze_video(self, path: Path) -> VideoFeatures:
        """Extract video metadata and motion analysis."""
```
### Caching
Analysis results are cached by:
```
analysis_cache_id = SHA3-256(input_hash + sorted(feature_names))
```
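A minimal sketch of this computation (the exact serialization, i.e. the separator and how feature names are joined, is an assumption rather than the project's canonical encoding):

```python
import hashlib

def analysis_cache_id(input_hash: str, feature_names: list[str]) -> str:
    """Deterministic cache ID for an analysis request; feature order
    must not matter, so names are sorted before hashing."""
    payload = input_hash + "|" + ",".join(sorted(feature_names))
    return hashlib.sha3_256(payload.encode()).hexdigest()
```

Requesting the same features in any order then yields the same cache ID, so repeated analyses of the same input are deduplicated.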
## Phase 2: Planning
### Purpose
Convert recipe + analysis into a complete execution plan with pre-computed cache IDs.
### Inputs
- Recipe YAML (parsed)
- Analysis results for all inputs
- Recipe parameters (user-supplied values)
### Outputs
An ExecutionPlan containing ordered steps, each with a pre-computed cache ID:
```python
@dataclass
class ExecutionStep:
    step_id: str                # Unique identifier
    node_type: str              # Primitive type (SOURCE, SEQUENCE, etc.)
    config: Dict[str, Any]      # Node configuration
    input_steps: List[str]      # IDs of steps this depends on
    cache_id: str               # Pre-computed: hash(inputs + config)
    estimated_duration: float   # Optional: for progress reporting

@dataclass
class ExecutionPlan:
    plan_id: str                # Hash of entire plan
    recipe_id: str              # Source recipe
    steps: List[ExecutionStep]  # Topologically sorted
    analysis: Dict[str, AnalysisResult]
    output_step: str            # Final step ID

    def compute_cache_ids(self):
        """Compute all cache IDs in dependency order."""
```
### Cache ID Computation
Cache IDs are computed in topological order so each step's cache ID
incorporates its inputs' cache IDs:
```python
def compute_cache_id(step: ExecutionStep, resolved_inputs: Dict[str, str]) -> str:
    """
    Cache ID = SHA3-256(
        node_type +
        canonical_json(config) +
        sorted([input_cache_ids])
    )
    """
    components = [
        step.node_type,
        json.dumps(step.config, sort_keys=True),
        *sorted(resolved_inputs[s] for s in step.input_steps),
    ]
    return sha3_256('|'.join(components))
```
### Plan Generation
The planner expands recipe nodes into concrete steps:
1. **SOURCE nodes** → Direct step with input hash as cache ID
2. **ANALYZE nodes** → Step that references analysis results
3. **TRANSFORM nodes** → Step with static config
4. **TRANSFORM_DYNAMIC nodes** → Expanded to per-frame steps (or use BIND output)
5. **SEQUENCE nodes** → Tree reduction for parallel composition
6. **MAP nodes** → Expanded to N parallel steps + reduction
### Tree Reduction for Composition
Instead of sequential pairwise composition:
```
A → B → C → D (3 sequential steps)
```
Use parallel tree reduction:
```
A ─┬─ AB ─┬─ ABCD
B ─┘      │
C ─┬─ CD ─┘
D ─┘

Level 0: [A, B, C, D]   (4 parallel)
Level 1: [AB, CD]       (2 parallel)
Level 2: [ABCD]         (1 final)
```
This reduces the number of sequential composition levels from O(N) to O(log N).
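The pairing scheme above can be sketched as a generic reduction (a hypothetical helper for illustration, not the project's actual `tree_reduction.py`):

```python
def tree_reduce(items, combine):
    """Pairwise-combine items level by level; depth is O(log N)
    instead of the O(N) chain of sequential composition."""
    levels = [list(items)]
    current = list(items)
    while len(current) > 1:
        nxt = [combine(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            nxt.append(current[-1])  # odd item carries up unchanged
        levels.append(nxt)
        current = nxt
    return levels
```

All combinations within one level are independent, so each level can be dispatched as a batch of parallel tasks.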
## Phase 3: Execution
### Purpose
Execute the plan, skipping steps with cached results.
### Inputs
- ExecutionPlan with pre-computed cache IDs
- Cache state (which IDs already exist)
### Process
1. **Claim Check**: For each step, atomically check if result is cached
2. **Task Dispatch**: Uncached steps dispatched to Celery workers
3. **Parallel Execution**: Independent steps run concurrently
4. **Result Storage**: Each step stores result with its cache ID
5. **Progress Tracking**: Real-time status updates
### Hash-Based Task Claiming
Prevents duplicate work when multiple workers process the same plan:
```lua
-- Redis Lua script for atomic claim
local key = KEYS[1]
local data = redis.call('GET', key)
if data then
    local status = cjson.decode(data)
    if status.status == 'running' or
       status.status == 'completed' or
       status.status == 'cached' then
        return 0  -- Already claimed/done
    end
end
local claim_data = ARGV[1]
local ttl = tonumber(ARGV[2])
redis.call('SETEX', key, ttl, claim_data)
return 1  -- Successfully claimed
```
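The same claim logic, mirrored in pure Python against an in-memory dict so the behavior is easy to check without a Redis server (illustration only; in production the atomicity comes from the Lua script above):

```python
import time

_claims: dict[str, dict] = {}  # in-memory stand-in for the Redis keyspace
_BUSY = {"running", "completed", "cached"}

def try_claim(cache_id: str, claim_data: str, ttl: float = 300.0) -> bool:
    """Return True if we claimed the step; False if an unexpired claim
    is already running/completed/cached (mirrors the Lua script)."""
    now = time.monotonic()
    entry = _claims.get(cache_id)
    if entry and entry["expires"] > now and entry["status"] in _BUSY:
        return False  # already claimed/done
    _claims[cache_id] = {"status": "running", "data": claim_data,
                         "expires": now + ttl}
    return True
```

The TTL matters: if a worker dies mid-step, its claim expires and another worker can pick the step up.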
### Celery Task Structure
```python
@app.task(bind=True)
def execute_step(self, step_json: str, plan_id: str) -> dict:
    """Execute a single step with caching."""
    step = ExecutionStep.from_json(step_json)

    # Check cache first
    if cache.has(step.cache_id):
        return {'status': 'cached', 'cache_id': step.cache_id}

    # Try to claim this work
    if not claim_task(step.cache_id, self.request.id):
        # Another worker is handling it, wait for result
        return wait_for_result(step.cache_id)

    # Do the work
    executor = get_executor(step.node_type)
    input_paths = [cache.get(s) for s in step.input_steps]
    output_path = cache.get_output_path(step.cache_id)
    result_path = executor.execute(step.config, input_paths, output_path)
    cache.put(step.cache_id, result_path)
    return {'status': 'completed', 'cache_id': step.cache_id}
```
### Execution Orchestration
```python
class PlanExecutor:
    def execute(self, plan: ExecutionPlan) -> ExecutionResult:
        """Execute plan with parallel Celery tasks."""
        # Group steps by level (steps at same level can run in parallel)
        levels = self.compute_dependency_levels(plan.steps)
        for level_steps in levels:
            # Dispatch all steps at this level
            tasks = [
                execute_step.delay(step.to_json(), plan.plan_id)
                for step in level_steps
                if not self.cache.has(step.cache_id)
            ]
            # Wait for level completion
            results = [task.get() for task in tasks]
        return self.collect_results(plan)
```
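`compute_dependency_levels` is not shown in the source; one plausible sketch assigns each step its longest-path depth, so every step lands one level after the deepest of its inputs (the `Step` stand-in here is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Step:  # minimal stand-in for ExecutionStep
    step_id: str
    input_steps: list = field(default_factory=list)

def compute_dependency_levels(steps):
    """Group steps into levels; all steps in one level are independent."""
    by_id = {s.step_id: s for s in steps}
    depth = {}

    def level_of(sid):
        if sid not in depth:
            inputs = by_id[sid].input_steps
            # sources get depth 0; others sit one past their deepest input
            depth[sid] = 1 + max((level_of(i) for i in inputs), default=-1)
        return depth[sid]

    levels = {}
    for s in steps:
        levels.setdefault(level_of(s.step_id), []).append(s)
    return [levels[d] for d in sorted(levels)]
```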
## Data Flow Example
### Recipe: beat-cuts
```yaml
nodes:
  - id: music
    type: SOURCE
    config: { input: true }
  - id: beats
    type: ANALYZE
    config: { feature: beats }
    inputs: [music]
  - id: videos
    type: SOURCE_LIST
    config: { input: true }
  - id: slices
    type: MAP
    config: { operation: RANDOM_SLICE }
    inputs:
      items: videos
      timing: beats
  - id: final
    type: SEQUENCE
    inputs: [slices]
```
### Phase 1: Analysis
```python
# Input: music file with hash abc123
analysis = {
    'abc123': AnalysisResult(
        beats=[0.0, 0.48, 0.96, 1.44, ...],
        tempo=125.0,
        duration=180.0,
    )
}
```
### Phase 2: Planning
```python
# Expands MAP into concrete steps
plan = ExecutionPlan(
    steps=[
        # Source steps
        ExecutionStep(step_id='music', cache_id='abc123', ...),
        ExecutionStep(step_id='video_0', cache_id='def456', ...),
        ExecutionStep(step_id='video_1', cache_id='ghi789', ...),
        # Slice steps (one per beat group)
        ExecutionStep(step_id='slice_0', cache_id='hash(video_0+timing)', ...),
        ExecutionStep(step_id='slice_1', cache_id='hash(video_1+timing)', ...),
        ...
        # Tree reduction for sequence
        ExecutionStep(step_id='seq_0_1', input_steps=['slice_0', 'slice_1'], ...),
        ExecutionStep(step_id='seq_2_3', input_steps=['slice_2', 'slice_3'], ...),
        ExecutionStep(step_id='seq_final', input_steps=['seq_0_1', 'seq_2_3'], ...),
    ]
)
```
### Phase 3: Execution
```
Level 0: [music, video_0, video_1] → all cached (SOURCE)
Level 1: [slice_0, slice_1, slice_2, slice_3] → 4 parallel tasks
Level 2: [seq_0_1, seq_2_3] → 2 parallel SEQUENCE tasks
Level 3: [seq_final] → 1 final SEQUENCE task
```
## File Structure
```
artdag/
├── artdag/
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── analyzer.py          # Main Analyzer class
│   │   ├── audio.py             # Audio feature extraction
│   │   └── video.py             # Video feature extraction
│   ├── planning/
│   │   ├── __init__.py
│   │   ├── planner.py           # RecipePlanner class
│   │   ├── schema.py            # ExecutionPlan, ExecutionStep
│   │   └── tree_reduction.py    # Parallel composition optimizer
│   └── execution/
│       ├── __init__.py
│       ├── executor.py          # PlanExecutor class
│       └── claiming.py          # Hash-based task claiming

art-celery/
├── tasks/
│   ├── __init__.py
│   ├── analyze.py               # analyze_inputs task
│   ├── plan.py                  # generate_plan task
│   ├── execute.py               # execute_step task
│   └── orchestrate.py           # run_plan (coordinates all)
├── claiming.py                  # Redis Lua scripts
└── ...
```
## CLI Interface
```bash
# Full pipeline
artdag run-recipe recipes/beat-cuts/recipe.yaml \
-i music:abc123 \
-i videos:def456,ghi789
# Phase by phase
artdag analyze recipes/beat-cuts/recipe.yaml -i music:abc123
# → outputs analysis.json
artdag plan recipes/beat-cuts/recipe.yaml --analysis analysis.json
# → outputs plan.json
artdag execute plan.json
# → runs with caching, skips completed steps
# Dry run (show what would execute)
artdag execute plan.json --dry-run
# → shows which steps are cached vs need execution
```
## Benefits
1. **Development Speed**: Change recipe, re-run → only affected steps execute
2. **Parallelism**: Independent steps run on multiple Celery workers
3. **Reproducibility**: Same inputs + recipe = same cache IDs = same output
4. **Visibility**: Plan shows exactly what will happen before execution
5. **Cost Control**: Estimate compute before committing resources
6. **Fault Tolerance**: Failed runs resume from last successful step

# IPFS-Primary Architecture (Sketch)
A simplified L1 architecture for large-scale distributed rendering where IPFS is the primary data store.
## Current vs Simplified
| Component | Current | Simplified |
|-----------|---------|------------|
| Local cache | Custom, per-worker | IPFS node handles it |
| Redis content_index | content_hash → node_id | Eliminated |
| Redis ipfs_index | content_hash → ipfs_cid | Eliminated |
| Step inputs | File paths | IPFS CIDs |
| Step outputs | File path + CID | Just CID |
| Cache lookup | Local → Redis → IPFS | Just IPFS |
## Core Principle
**Steps receive CIDs, produce CIDs. No file paths cross machine boundaries.**
```
Step input: [cid1, cid2, ...]
Step output: cid_out
```
## Worker Architecture
Each worker runs:
```
┌─────────────────────────────────────┐
│           Worker Node               │
│                                     │
│  ┌───────────┐    ┌──────────────┐  │
│  │  Celery   │────│  IPFS Node   │  │
│  │  Worker   │    │   (local)    │  │
│  └───────────┘    └──────────────┘  │
│        │                 │          │
│        │           ┌─────┴─────┐    │
│        │           │   Local   │    │
│        │           │ Blockstore│    │
│        │           └───────────┘    │
│        │                            │
│   ┌────┴────┐                       │
│   │  /tmp   │ (ephemeral workspace) │
│   └─────────┘                       │
└─────────────────────────────────────┘
                  │ IPFS libp2p
           ┌──────┴──────┐
           │ Other IPFS  │
           │   Nodes     │
           └─────────────┘
```
## Execution Flow
### 1. Plan Generation (unchanged)
```python
plan = planner.plan(recipe, input_hashes)
# plan.steps[].cache_id = deterministic hash
```
### 2. Input Registration
Before execution, register inputs with IPFS:
```python
input_cids = {}
for name, path in inputs.items():
cid = ipfs.add(path)
input_cids[name] = cid
# Plan now carries CIDs
plan.input_cids = input_cids
```
### 3. Step Execution
```python
@celery.task
def execute_step(step_json: str, input_cids: dict[str, str]) -> str:
    """Execute step, return output CID."""
    step = ExecutionStep.from_json(step_json)

    # Check if already computed (by cache_id as IPNS key or DHT lookup)
    existing_cid = ipfs.resolve(f"/ipns/{step.cache_id}")
    if existing_cid:
        return existing_cid

    # Fetch inputs from IPFS → local temp files
    input_paths = []
    for input_step_id in step.input_steps:
        cid = input_cids[input_step_id]
        path = ipfs.get(cid, f"/tmp/{cid}")  # IPFS node caches automatically
        input_paths.append(path)

    # Execute
    output_path = f"/tmp/{step.cache_id}.mkv"
    executor = get_executor(step.node_type)
    executor.execute(step.config, input_paths, output_path)

    # Add output to IPFS
    output_cid = ipfs.add(output_path)

    # Publish cache_id → CID mapping (optional, for cache hits)
    ipfs.name_publish(step.cache_id, output_cid)

    # Cleanup temp files
    cleanup_temp(input_paths + [output_path])
    return output_cid
```
### 4. Orchestration
```python
from celery import group

@celery.task
def run_plan(plan_json: str) -> str:
    """Execute plan, return final output CID."""
    plan = ExecutionPlan.from_json(plan_json)

    # CID results accumulate as steps complete
    cid_results = dict(plan.input_cids)
    for level in plan.get_steps_by_level():
        # Parallel execution within level
        tasks = []
        for step in level:
            step_input_cids = {
                sid: cid_results[sid]
                for sid in step.input_steps
            }
            tasks.append(execute_step.s(step.to_json(), step_input_cids))

        # Wait for level to complete
        results = group(tasks).apply_async().get()

        # Record output CIDs
        for step, cid in zip(level, results):
            cid_results[step.step_id] = cid

    return cid_results[plan.output_step]
```
## What's Eliminated
### No more Redis indexes
```python
# BEFORE: Complex index management
self._set_content_index(content_hash, node_id) # Redis + local
self._set_ipfs_index(content_hash, ipfs_cid) # Redis + local
node_id = self._get_content_index(content_hash) # Check Redis, fallback local
# AFTER: Just CIDs
output_cid = ipfs.add(output_path)
return output_cid
```
### No more local cache management
```python
# BEFORE: Custom cache with entries, metadata, cleanup
cache.put(node_id, source_path, node_type, execution_time)
cache.get(node_id)
cache.has(node_id)
cache.cleanup_lru()
# AFTER: IPFS handles it
ipfs.add(path) # Store
ipfs.get(cid) # Retrieve (cached by IPFS node)
ipfs.pin(cid) # Keep permanently
ipfs.gc() # Cleanup unpinned
```
### No more content_hash vs node_id confusion
```python
# BEFORE: Two identifiers
content_hash = sha3_256(file_bytes) # What the file IS
node_id = cache_id # What computation produced it
# Need indexes to map between them
# AFTER: One identifier
cid = ipfs.add(file) # Content-addressed, includes hash
# CID IS the identifier
```
## Cache Hit Detection
Two options:
### Option A: IPNS (mutable names)
```python
# Publish: cache_id → CID
ipfs.name_publish(key=cache_id, value=output_cid)
# Lookup before executing
existing = ipfs.name_resolve(cache_id)
if existing:
return existing # Cache hit
```
### Option B: DHT record
```python
# Store in DHT: cache_id → CID
ipfs.dht_put(cache_id, output_cid)
# Lookup
existing = ipfs.dht_get(cache_id)
```
### Option C: Redis (minimal)
Keep Redis just for cache_id → CID mapping:
```python
# Store
redis.hset("artdag:cache", cache_id, output_cid)
# Lookup
existing = redis.hget("artdag:cache", cache_id)
```
This is simpler than the current approach: one hash, one mapping, no content_hash/node_id confusion.
## Claiming (Preventing Duplicate Work)
Still need Redis for atomic claiming:
```python
# Claim before executing
claimed = redis.set(f"artdag:claim:{cache_id}", worker_id, nx=True, ex=300)
if not claimed:
# Another worker is doing it - wait for result
return wait_for_result(cache_id)
```
Or use IPFS pubsub for coordination.
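`wait_for_result` is referenced but never defined; a simple sketch polls the cache_id → CID mapping until the claiming worker publishes (the lookup callable, timeout, and poll interval are all assumptions):

```python
import time

def wait_for_result(cache_id, lookup, timeout=300.0, poll=0.5):
    """Poll until another worker publishes the result for cache_id.
    `lookup` is any callable returning the CID or None, e.g.
    lambda k: redis.hget("artdag:cache", k)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        cid = lookup(cache_id)
        if cid is not None:
            return cid
        time.sleep(poll)
    raise TimeoutError(f"no result published for {cache_id}")
```

The timeout should exceed the claim TTL, so an abandoned claim expires and can be re-claimed before waiters give up.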
## Data Flow Diagram
```
┌─────────────┐
│   Recipe    │
│  + Inputs   │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Planner   │
│  (compute   │
│  cache_ids) │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────┐
│         ExecutionPlan           │
│  - steps with cache_ids         │
│  - input_cids (from ipfs.add)   │
└─────────────────┬───────────────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌────────┐   ┌────────┐   ┌────────┐
│Worker 1│   │Worker 2│   │Worker 3│
│        │   │        │   │        │
│  IPFS  │◄──│  IPFS  │◄──│  IPFS  │
│  Node  │──►│  Node  │──►│  Node  │
└───┬────┘   └───┬────┘   └───┬────┘
    │            │            │
    └────────────┼────────────┘
                 │
                 ▼
          ┌─────────────┐
          │  Final CID  │
          │  (output)   │
          └─────────────┘
```
## Benefits
1. **Simpler code** - No custom cache, no dual indexes
2. **Automatic distribution** - IPFS handles replication
3. **Content verification** - CIDs are self-verifying
4. **Scalable** - Add workers = add IPFS nodes = more cache capacity
5. **Resilient** - Any node can serve any content
## Tradeoffs
1. **IPFS dependency** - Every worker needs IPFS node
2. **Initial fetch latency** - First fetch may be slower than local disk
3. **IPNS latency** - Name resolution can be slow (Option C avoids this)
## Trust Domains (Cluster Key)
Systems can share work through IPFS, but how do you trust them?
**Problem:** A malicious system could return wrong CIDs for computed steps.
**Solution:** Cluster key creates isolated trust domains:
```bash
export ARTDAG_CLUSTER_KEY="my-secret-shared-key"
```
**How it works:**
- The cluster key is mixed into all cache_id computations
- Systems with the same key produce the same cache_ids
- Systems with different keys have separate cache namespaces
- Only share the key with trusted partners
```
cache_id = SHA3-256(cluster_key + node_type + config + inputs)
```
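A sketch of the keyed computation (serialization details are assumptions; the property that matters is that different keys yield disjoint cache namespaces):

```python
import hashlib
import json

def cache_id(cluster_key, node_type, config, input_ids):
    """Mix the cluster key into the hash so only systems sharing
    the key produce matching cache IDs."""
    payload = "|".join([cluster_key, node_type,
                        json.dumps(config, sort_keys=True),
                        *sorted(input_ids)])
    return hashlib.sha3_256(payload.encode()).hexdigest()
```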
**Trust model:**
| Scenario | Same Key? | Can Share Work? |
|----------|-----------|-----------------|
| Same organization | Yes | Yes |
| Trusted partner | Yes (shared) | Yes |
| Unknown system | No | No (different cache_ids) |
**Configuration:**
```yaml
# docker-compose.yml
environment:
- ARTDAG_CLUSTER_KEY=your-secret-key-here
```
**Programmatic:**
```python
from artdag.planning.schema import set_cluster_key
set_cluster_key("my-secret-key")
```
## Implementation
The simplified architecture is implemented in `art-celery/`:
| File | Purpose |
|------|---------|
| `hybrid_state.py` | Hybrid state manager (Redis + IPNS) |
| `tasks/execute_cid.py` | Step execution with CIDs |
| `tasks/analyze_cid.py` | Analysis with CIDs |
| `tasks/orchestrate_cid.py` | Full pipeline orchestration |
### Key Functions
**Registration (local → IPFS):**
- `register_input_cid(path)` → `{cid, content_hash}`
- `register_recipe_cid(path)` → `{cid, name, version}`

**Analysis:**
- `analyze_input_cid(input_cid, input_hash, features)` → `{analysis_cid}`

**Planning:**
- `generate_plan_cid(recipe_cid, input_cids, input_hashes, analysis_cids)` → `{plan_cid}`

**Execution:**
- `execute_step_cid(step_json, input_cids)` → `{cid}`
- `execute_plan_from_cid(plan_cid, input_cids)` → `{output_cid}`

**Full Pipeline:**
- `run_recipe_cid(recipe_cid, input_cids, input_hashes)` → `{output_cid, all_cids}`
- `run_from_local(recipe_path, input_paths)` → registers + runs
### Hybrid State Manager
For distributed L1 coordination, use the `HybridStateManager` which provides:
**Fast path (local Redis):**
- `get_cached_cid(cache_id)` / `set_cached_cid(cache_id, cid)` - microsecond lookups
- `try_claim(cache_id, worker_id)` / `release_claim(cache_id)` - atomic claiming
- `get_analysis_cid()` / `set_analysis_cid()` - analysis cache
- `get_plan_cid()` / `set_plan_cid()` - plan cache
- `get_run_cid()` / `set_run_cid()` - run cache
**Slow path (background IPNS sync):**
- Periodically syncs local state with global IPNS state (default: every 30s)
- Pulls new entries from remote nodes
- Pushes local updates to IPNS
**Configuration:**
```bash
# Enable IPNS sync
export ARTDAG_IPNS_SYNC=true
export ARTDAG_IPNS_SYNC_INTERVAL=30 # seconds
```
**Usage:**
```python
from hybrid_state import get_state_manager
state = get_state_manager()
# Fast local lookup
cid = state.get_cached_cid(cache_id)
# Fast local write (synced in background)
state.set_cached_cid(cache_id, output_cid)
# Atomic claim
if state.try_claim(cache_id, worker_id):
    # We have the lock
    ...
```
**Trade-offs:**
- Local Redis: Fast (microseconds), single node
- IPNS sync: Slow (seconds), eventually consistent across nodes
- Duplicate work: Accepted (idempotent - same inputs → same CID)
### Redis Usage (minimal)
| Key | Type | Purpose |
|-----|------|---------|
| `artdag:cid_cache` | Hash | cache_id → output CID |
| `artdag:analysis_cache` | Hash | input_hash:features → analysis CID |
| `artdag:plan_cache` | Hash | plan_id → plan CID |
| `artdag:run_cache` | Hash | run_id → output CID |
| `artdag:claim:{cache_id}` | String | worker_id (TTL 5 min) |
## Migration Path
1. Keep current system working ✓
2. Add CID-based tasks ✓
- `execute_cid.py`
- `analyze_cid.py`
- `orchestrate_cid.py`
3. Add `--ipfs-primary` flag to CLI ✓
4. Add hybrid state manager for L1 coordination ✓
5. Gradually deprecate local cache code
6. Remove old tasks when CID versions are stable
## See Also
- [L1_STORAGE.md](L1_STORAGE.md) - Current L1 architecture
- [EXECUTION_MODEL.md](EXECUTION_MODEL.md) - 3-phase model

docs/L1_STORAGE.md
# L1 Distributed Storage Architecture
This document describes how data is stored when running artdag on L1 (the distributed rendering layer).
## Overview
L1 uses four storage systems working together:
| System | Purpose | Data Stored |
|--------|---------|-------------|
| **Local Cache** | Hot storage (fast access) | Media files, plans, analysis |
| **IPFS** | Durable content-addressed storage | All media outputs |
| **Redis** | Coordination & indexes | Claims, mappings, run status |
| **PostgreSQL** | Metadata & ownership | User data, provenance |
## Storage Flow
When a step executes on L1:
```
1. Executor produces output file
2. Store in local cache (fast)
3. Compute content_hash = SHA3-256(file)
4. Upload to IPFS → get ipfs_cid
5. Update indexes:
- content_hash → node_id (Redis + local)
- content_hash → ipfs_cid (Redis + local)
```
Every intermediate step output (SEGMENT, SEQUENCE, etc.) gets its own IPFS CID.
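The five-step store path, sketched with hypothetical `cache`/`ipfs`/`redis` handles (the method names are illustrative, not the project's actual interfaces):

```python
import hashlib
from pathlib import Path

def store_step_output(output_path, node_id, cache, ipfs, redis):
    """Store a freshly produced output: local cache, content hash,
    IPFS upload, then the two Redis index entries."""
    cache.put(node_id, output_path)                          # 2. local cache
    content_hash = hashlib.sha3_256(
        Path(output_path).read_bytes()).hexdigest()          # 3. content hash
    ipfs_cid = ipfs.add(output_path)                         # 4. upload to IPFS
    redis.hset("artdag:content_index", content_hash, node_id)   # 5a. hash → node_id
    redis.hset("artdag:ipfs_index", content_hash, ipfs_cid)     # 5b. hash → CID
    return content_hash, ipfs_cid
```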
## Local Cache
Hot storage on each worker node:
```
cache_dir/
  index.json             # Cache metadata
  content_index.json     # content_hash → node_id
  ipfs_index.json        # content_hash → ipfs_cid
  plans/
    {plan_id}.json       # Cached execution plans
  analysis/
    {hash}.json          # Analysis results
  {node_id}/
    output.mkv           # Media output
    metadata.json        # CacheEntry metadata
```
## IPFS - Durable Media Storage
All media files are stored in IPFS for durability and content-addressing.
**Supported pinning providers:**
- Pinata
- web3.storage
- NFT.Storage
- Infura IPFS
- Filebase (S3-compatible)
- Storj (decentralized)
- Local IPFS node
**Configuration:**
```bash
IPFS_API=/ip4/127.0.0.1/tcp/5001 # Local IPFS daemon
```
## Redis - Coordination
Redis handles distributed coordination across workers.
### Key Patterns
| Key | Type | Purpose |
|-----|------|---------|
| `artdag:run:{run_id}` | String | Run status, timestamps, celery task ID |
| `artdag:content_index` | Hash | content_hash → node_id mapping |
| `artdag:ipfs_index` | Hash | content_hash → ipfs_cid mapping |
| `artdag:claim:{cache_id}` | String | Task claiming (prevents duplicate work) |
### Task Claiming
Lua scripts ensure atomic claiming across workers:
```
Status flow: PENDING → CLAIMED → RUNNING → COMPLETED/CACHED/FAILED
TTL: 5 minutes for claims, 1 hour for results
```
This prevents two workers from executing the same step.
## PostgreSQL - Metadata
Stores ownership, provenance, and sharing metadata.
### Tables
```sql
-- Core cache (shared)
cache_items (content_hash, ipfs_cid, created_at)

-- Per-user ownership
item_types (content_hash, actor_id, type, metadata)

-- Run cache (deterministic identity)
run_cache (
    run_id,            -- SHA3-256(sorted_inputs + recipe)
    output_hash,
    ipfs_cid,
    provenance_cid,
    recipe, inputs, actor_id
)

-- Storage backends
storage_backends (actor_id, provider_type, config, capacity_gb)

-- What's stored where
storage_pins (content_hash, storage_id, ipfs_cid, pin_type)
```
## Cache Lookup Flow
When a worker needs a file:
```
1. Check local cache by cache_id (fastest)
2. Check Redis content_index: content_hash → node_id
3. Check PostgreSQL cache_items
4. Retrieve from IPFS by CID
5. Store in local cache for next hit
```
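The tiered lookup above, sketched in Python with each tier passed in as a plain mapping or callable (purely illustrative; the real code paths differ):

```python
def fetch(cache_id, content_hash, local, redis_index, pg_items, ipfs_get):
    """local: dict of cache_id/node_id → path; redis_index: dict of
    content_hash → node_id; pg_items: dict of content_hash → CID;
    ipfs_get: callable CID → local path."""
    if cache_id in local:
        return local[cache_id]                 # 1. local cache hit
    node_id = redis_index.get(content_hash)
    if node_id and node_id in local:
        return local[node_id]                  # 2. Redis index hit
    cid = pg_items.get(content_hash)           # 3. PostgreSQL cache_items
    if cid is None:
        raise KeyError(content_hash)
    path = ipfs_get(cid)                       # 4. retrieve from IPFS
    local[cache_id] = path                     # 5. warm local cache
    return path
```

After the first miss, step 5 warms the local cache, so the next request for the same `cache_id` stops at step 1.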
## Local vs L1 Comparison
| Feature | Local Testing | L1 Distributed |
|---------|---------------|----------------|
| Local cache | Yes | Yes |
| IPFS | No | Yes |
| Redis | No | Yes |
| PostgreSQL | No | Yes |
| Multi-worker | No | Yes |
| Task claiming | No | Yes (Lua scripts) |
| Durability | Filesystem only | IPFS + PostgreSQL |
## Content Addressing
All storage uses SHA3-256 (quantum-resistant):
- **Files:** `content_hash = SHA3-256(file_bytes)`
- **Computation:** `cache_id = SHA3-256(type + config + input_hashes)`
- **Run identity:** `run_id = SHA3-256(sorted_inputs + recipe)`
- **Plans:** `plan_id = SHA3-256(recipe + inputs + analysis)`
This ensures:
- Same inputs → same outputs (reproducibility)
- Automatic deduplication across workers
- Content verification (tamper detection)
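For example, run identity is order-independent because inputs are sorted before hashing (the concatenation scheme here is an assumption, not the project's canonical encoding):

```python
import hashlib

def run_id(input_hashes, recipe_text):
    """Deterministic run identity: SHA3-256(sorted_inputs + recipe)."""
    payload = "".join(sorted(input_hashes)) + recipe_text
    return hashlib.sha3_256(payload.encode()).hexdigest()
```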
## Configuration
Default locations:
```bash
# Local cache
~/.artdag/cache # Default
/data/cache # Docker
# Redis
redis://localhost:6379/5
# PostgreSQL
postgresql://user:pass@host/artdag
# IPFS
/ip4/127.0.0.1/tcp/5001
```
## See Also
- [OFFLINE_TESTING.md](OFFLINE_TESTING.md) - Local testing without L1
- [EXECUTION_MODEL.md](EXECUTION_MODEL.md) - 3-phase execution model

docs/OFFLINE_TESTING.md
# Offline Testing Strategy
This document describes how to test artdag locally without requiring Redis, IPFS, Celery, or any external distributed infrastructure.
## Overview
The artdag system uses a **3-Phase Execution Model** that enables complete offline testing:
1. **Analysis** - Extract features from input media
2. **Planning** - Generate deterministic execution plan with pre-computed cache IDs
3. **Execution** - Run plan steps, skipping cached results
This separation allows testing each phase independently and running full pipelines locally.
## Quick Start
Run a full offline test with a video file:
```bash
./examples/test_local.sh ../artdag-art-source/dog.mkv
```
This will:
1. Compute the SHA3-256 hash of the input video
2. Run the `simple_sequence` recipe
3. Store all outputs in `test_cache/`
## Test Scripts
### `test_local.sh` - Full Pipeline Test
Location: `./examples/test_local.sh`
Runs the complete artdag pipeline offline with a real video file.
**Usage:**
```bash
./examples/test_local.sh <video_file>
```
**Example:**
```bash
./examples/test_local.sh ../artdag-art-source/dog.mkv
```
**What it does:**
- Computes content hash of input video
- Runs `artdag run-recipe` with `simple_sequence.yaml`
- Stores outputs in `test_cache/` directory
- No external services required
### `test_plan.py` - Planning Phase Test
Location: `./examples/test_plan.py`
Tests the planning phase without requiring any media files.
**Usage:**
```bash
python3 examples/test_plan.py
```
**What it tests:**
- Recipe loading and YAML parsing
- Execution plan generation
- Cache ID computation (deterministic)
- Multi-level parallel step organization
- Human-readable step names
- Multi-output support
**Output:**
- Prints plan structure to console
- Saves full plan to `test_plan_output.json`
### `simple_sequence.yaml` - Sample Recipe
Location: `./examples/simple_sequence.yaml`
A simple recipe for testing that:
- Takes a video input
- Extracts two segments (0-2s and 5-7s)
- Concatenates them with SEQUENCE
## Test Outputs
All test outputs are stored locally and git-ignored:
| Output | Description |
|--------|-------------|
| `test_cache/` | Cached execution results (media files, analysis, plans) |
| `test_cache/plans/` | Cached execution plans by plan_id |
| `test_cache/analysis/` | Cached analysis results by input hash |
| `test_plan_output.json` | Generated execution plan from `test_plan.py` |
## Unit Tests
The project includes a comprehensive pytest test suite in `tests/`:
```bash
# Run all unit tests
pytest
# Run specific test file
pytest tests/test_dag.py
pytest tests/test_engine.py
pytest tests/test_cache.py
```
## Testing Each Phase
### Phase 1: Analysis Only
Extract features without full execution:
```bash
python3 -m artdag.cli analyze <recipe> -i <name>:<hash>@<path> --features beats,energy
```
### Phase 2: Planning Only
Generate an execution plan (no media needed):
```bash
python3 -m artdag.cli plan <recipe> -i <name>:<hash>
```
Or use the test script:
```bash
python3 examples/test_plan.py
```
### Phase 3: Execution Only
Execute a pre-generated plan:
```bash
python3 -m artdag.cli execute plan.json
```
With dry-run to see what would execute:
```bash
python3 -m artdag.cli execute plan.json --dry-run
```
## Key Testing Features
### Content Addressing
All nodes have deterministic IDs computed as:
```
SHA3-256(type + config + sorted(input_IDs))
```
Same inputs always produce same cache IDs, enabling:
- Reproducibility across runs
- Automatic deduplication
- Incremental execution (only changed steps run)
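A worked illustration of incremental execution: editing one node's config changes its cache ID and everything downstream, while sibling IDs (and their cached outputs) survive. The serialization in `node_id` is an assumption for illustration:

```python
import hashlib
import json

def node_id(node_type, config, input_ids):
    """Illustrative content-addressed node ID: hash of type, canonical
    config JSON, and sorted input IDs."""
    payload = (node_type + json.dumps(config, sort_keys=True)
               + "".join(sorted(input_ids)))
    return hashlib.sha3_256(payload.encode()).hexdigest()

src = node_id("SOURCE", {"input": True}, [])
seg_a = node_id("SEGMENT", {"start": 0, "end": 2}, [src])
seg_b = node_id("SEGMENT", {"start": 5, "end": 7}, [src])
seq = node_id("SEQUENCE", {}, [seg_a, seg_b])

# Edit only the second segment: seg_a's cached output is still valid,
# but seg_b and the downstream SEQUENCE get new IDs and re-execute.
seg_b2 = node_id("SEGMENT", {"start": 5, "end": 8}, [src])
seq2 = node_id("SEQUENCE", {}, [seg_a, seg_b2])
```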
### Local Caching
The `test_cache/` directory stores:
- `plans/{plan_id}.json` - Execution plans (deterministic hash of recipe + inputs + analysis)
- `analysis/{hash}.json` - Analysis results (audio beats, tempo, energy)
- `{cache_id}/output.mkv` - Media outputs from each step
Subsequent test runs automatically skip cached steps. Plans are cached by their `plan_id`, a SHA3-256 hash of the recipe, input hashes, and analysis results, so the same recipe with the same inputs always produces the same plan.
### No External Dependencies
Offline testing requires:
- Python 3.9+
- ffmpeg (for media processing)
- No Redis, IPFS, Celery, or network access
## Debugging Tips
1. **Check cache contents:**
```bash
ls -la test_cache/
ls -la test_cache/plans/
```
2. **View cached plan:**
```bash
cat test_cache/plans/*.json | python3 -m json.tool | head -50
```
3. **View execution plan structure:**
```bash
cat test_plan_output.json | python3 -m json.tool
```
4. **Run with verbose output:**
```bash
python3 -m artdag.cli run-recipe examples/simple_sequence.yaml \
-i "video:HASH@path" \
--cache-dir test_cache \
-v
```
5. **Dry-run to see what would execute:**
```bash
python3 -m artdag.cli execute plan.json --dry-run
```
## See Also
- [L1_STORAGE.md](L1_STORAGE.md) - Distributed storage on L1 (IPFS, Redis, PostgreSQL)
- [EXECUTION_MODEL.md](EXECUTION_MODEL.md) - 3-phase execution model