rose-ash/docs/EXECUTION_MODEL.md
giles cc2dcbddd4 Squashed 'core/' content from commit 4957443
git-subtree-dir: core
git-subtree-split: 4957443184ae0eb6323635a90a19acffb3e01d07
2026-02-24 23:09:39 +00:00


# Art DAG 3-Phase Execution Model
## Overview
The execution model separates DAG processing into three distinct phases:
```
Recipe + Inputs → ANALYZE → Analysis Results
Analysis + Recipe → PLAN → Execution Plan (with cache IDs)
Execution Plan → EXECUTE → Cached Results
```
This separation enables:
1. **Incremental development** - Re-run recipes without reprocessing unchanged steps
2. **Parallel execution** - Independent steps run concurrently via Celery
3. **Deterministic caching** - Same inputs always produce same cache IDs
4. **Cost estimation** - Plan phase can estimate work before executing
## Phase 1: Analysis
### Purpose
Extract features from input media that inform downstream processing decisions.
### Inputs
- Recipe YAML with input references
- Input media files (by content hash)
### Outputs
Analysis results stored as JSON, keyed by input hash:
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class AnalysisResult:
    input_hash: str
    features: Dict[str, Any]
    # Audio features
    beats: Optional[List[float]]        # Beat times in seconds
    downbeats: Optional[List[float]]    # Bar-start times
    tempo: Optional[float]              # BPM
    energy: Optional[List[Tuple[float, float]]]               # (time, value) envelope
    spectrum: Optional[Dict[str, List[Tuple[float, float]]]]  # band envelopes
    # Video features
    duration: float
    frame_rate: float
    dimensions: Tuple[int, int]
    motion_tempo: Optional[float]       # Estimated BPM from motion
```
### Implementation
```python
class Analyzer:
    def analyze(self, input_hash: str, features: List[str]) -> AnalysisResult:
        """Extract requested features from input."""

    def analyze_audio(self, path: Path) -> AudioFeatures:
        """Extract all audio features using librosa/essentia."""

    def analyze_video(self, path: Path) -> VideoFeatures:
        """Extract video metadata and motion analysis."""
```
### Caching
Analysis results are cached under a deterministic ID:
```
analysis_cache_id = SHA3-256(input_hash + sorted(feature_names))
```
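A sketch of this computation (assuming hex-encoded SHA3-256 via `hashlib`; the exact byte encoding and separator are implementation details):

```python
import hashlib

def analysis_cache_id(input_hash: str, feature_names: list) -> str:
    """Derive a deterministic cache ID for an analysis request.

    Sorting the feature names makes the ID independent of request
    order, so requesting {beats, tempo} and {tempo, beats} hits the
    same cache entry.
    """
    payload = input_hash + "".join(sorted(feature_names))
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()

# The same request in any order yields the same ID:
a = analysis_cache_id("abc123", ["beats", "tempo"])
b = analysis_cache_id("abc123", ["tempo", "beats"])
assert a == b
```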
## Phase 2: Planning
### Purpose
Convert recipe + analysis into a complete execution plan with pre-computed cache IDs.
### Inputs
- Recipe YAML (parsed)
- Analysis results for all inputs
- Recipe parameters (user-supplied values)
### Outputs
An ExecutionPlan containing ordered steps, each with a pre-computed cache ID:
```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ExecutionStep:
    step_id: str                # Unique identifier
    node_type: str              # Primitive type (SOURCE, SEQUENCE, etc.)
    config: Dict[str, Any]      # Node configuration
    input_steps: List[str]      # IDs of steps this depends on
    cache_id: str               # Pre-computed: hash(inputs + config)
    estimated_duration: float   # Optional: for progress reporting

@dataclass
class ExecutionPlan:
    plan_id: str                # Hash of entire plan
    recipe_id: str              # Source recipe
    steps: List[ExecutionStep]  # Topologically sorted
    analysis: Dict[str, AnalysisResult]
    output_step: str            # Final step ID

    def compute_cache_ids(self):
        """Compute all cache IDs in dependency order."""
```
### Cache ID Computation
Cache IDs are computed in topological order so each step's cache ID
incorporates its inputs' cache IDs:
```python
import hashlib
import json

def compute_cache_id(step: ExecutionStep, resolved_inputs: Dict[str, str]) -> str:
    """
    Cache ID = SHA3-256(
        node_type +
        canonical_json(config) +
        sorted([input_cache_ids])
    )
    """
    components = [
        step.node_type,
        json.dumps(step.config, sort_keys=True),
        *sorted(resolved_inputs[s] for s in step.input_steps),
    ]
    return hashlib.sha3_256('|'.join(components).encode()).hexdigest()
```
### Plan Generation
The planner expands recipe nodes into concrete steps:
1. **SOURCE nodes** → Direct step with input hash as cache ID
2. **ANALYZE nodes** → Step that references analysis results
3. **TRANSFORM nodes** → Step with static config
4. **TRANSFORM_DYNAMIC nodes** → Expanded to per-frame steps (or use BIND output)
5. **SEQUENCE nodes** → Tree reduction for parallel composition
6. **MAP nodes** → Expanded to N parallel steps + reduction
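One way to organize this expansion is a dispatch table keyed by node type (illustrative only; the function names and dict-based nodes here are not the actual planner API):

```python
# Hypothetical dispatch table: each expander turns one recipe node
# into one or more concrete step dicts.

def expand_source(node):
    # SOURCE: a single step whose cache ID is the input's content hash
    return [{"step_id": node["id"], "cache_id": node["input_hash"]}]

def expand_map(node):
    # MAP: one step per item; a later SEQUENCE node reduces the results
    return [
        {"step_id": f"{node['id']}_{i}", "item": item}
        for i, item in enumerate(node["items"])
    ]

EXPANDERS = {
    "SOURCE": expand_source,
    "MAP": expand_map,
    # ... ANALYZE, TRANSFORM, TRANSFORM_DYNAMIC, SEQUENCE likewise
}

def expand(node):
    return EXPANDERS[node["type"]](node)
```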
### Tree Reduction for Composition
Instead of sequential pairwise composition:
```
A → B → C → D (3 sequential steps)
```
Use parallel tree reduction:
```
A ─┬─ AB ─┬─ ABCD
B ─┘      │
C ─┬─ CD ─┘
D ─┘

Level 0: [A, B, C, D]   (4 parallel)
Level 1: [AB, CD]       (2 parallel)
Level 2: [ABCD]         (1 final)
```
This reduces the critical path from O(N) sequential compositions to O(log N) levels.
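The level construction itself can be sketched as a pairwise merge (illustrative; here string concatenation stands in for composing two clips):

```python
def reduction_levels(items: list) -> list:
    """Group items into pairwise-merge levels.

    Each level pairs up the previous level's results, so N leaves
    collapse in ceil(log2(N)) merge levels instead of N-1 sequential
    compositions.
    """
    levels = [items]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Pair adjacent items; an odd leftover passes through unmerged.
        merged = [prev[i] + prev[i + 1] for i in range(0, len(prev) - 1, 2)]
        if len(prev) % 2:
            merged.append(prev[-1])
        levels.append(merged)
    return levels

# reduction_levels(["A", "B", "C", "D"])
# → [['A', 'B', 'C', 'D'], ['AB', 'CD'], ['ABCD']]
```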
## Phase 3: Execution
### Purpose
Execute the plan, skipping steps with cached results.
### Inputs
- ExecutionPlan with pre-computed cache IDs
- Cache state (which IDs already exist)
### Process
1. **Claim Check**: For each step, atomically check if result is cached
2. **Task Dispatch**: Uncached steps dispatched to Celery workers
3. **Parallel Execution**: Independent steps run concurrently
4. **Result Storage**: Each step stores result with its cache ID
5. **Progress Tracking**: Real-time status updates
### Hash-Based Task Claiming
Prevents duplicate work when multiple workers process the same plan:
```lua
-- Redis Lua script for atomic claim
local key = KEYS[1]
local data = redis.call('GET', key)
if data then
    local status = cjson.decode(data)
    if status.status == 'running' or
       status.status == 'completed' or
       status.status == 'cached' then
        return 0  -- Already claimed/done
    end
end
local claim_data = ARGV[1]
local ttl = tonumber(ARGV[2])
redis.call('SETEX', key, ttl, claim_data)
return 1  -- Successfully claimed
```
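The claim semantics can be modeled in plain Python (a state-machine sketch only; in production the atomicity comes from Redis running the Lua script as a single operation, and SETEX's TTL lets a crashed worker's claim expire):

```python
import json

# States in which a task must not be claimed again
TERMINAL = {"running", "completed", "cached"}

def try_claim(store: dict, key: str, claim_data: str) -> int:
    """In-memory mirror of the Lua script's logic.

    Returns 1 if the claim succeeded, 0 if another worker already
    holds or finished the task. (TTL handling is omitted here;
    Redis SETEX provides it.)
    """
    data = store.get(key)
    if data is not None and json.loads(data)["status"] in TERMINAL:
        return 0  # already claimed/done
    store[key] = claim_data
    return 1

store = {}
claim = json.dumps({"status": "running", "worker": "w1"})
assert try_claim(store, "task:abc", claim) == 1  # first worker wins
assert try_claim(store, "task:abc", claim) == 0  # second worker backs off
```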
### Celery Task Structure
```python
@app.task(bind=True)
def execute_step(self, step_json: str, plan_id: str) -> dict:
    """Execute a single step with caching."""
    step = ExecutionStep.from_json(step_json)

    # Check cache first
    if cache.has(step.cache_id):
        return {'status': 'cached', 'cache_id': step.cache_id}

    # Try to claim this work
    if not claim_task(step.cache_id, self.request.id):
        # Another worker is handling it; wait for its result
        return wait_for_result(step.cache_id)

    # Do the work
    executor = get_executor(step.node_type)
    input_paths = [cache.get(s) for s in step.input_steps]
    output_path = cache.get_output_path(step.cache_id)
    result_path = executor.execute(step.config, input_paths, output_path)
    cache.put(step.cache_id, result_path)
    return {'status': 'completed', 'cache_id': step.cache_id}
```
### Execution Orchestration
```python
class PlanExecutor:
    def execute(self, plan: ExecutionPlan) -> ExecutionResult:
        """Execute plan with parallel Celery tasks."""
        # Group steps by level (steps at the same level can run in parallel)
        levels = self.compute_dependency_levels(plan.steps)
        for level_steps in levels:
            # Dispatch all uncached steps at this level
            tasks = [
                execute_step.delay(step.to_json(), plan.plan_id)
                for step in level_steps
                if not self.cache.has(step.cache_id)
            ]
            # Wait for level completion
            results = [task.get() for task in tasks]
        return self.collect_results(plan)
```
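`compute_dependency_levels` is not shown above; one way to implement it is a longest-path pass over the topologically sorted steps (a sketch, assuming each step exposes `step_id` and `input_steps` as in the schema):

```python
def compute_dependency_levels(steps):
    """Group steps so each lands one level below its deepest input.

    Steps in the same level have no dependencies on each other and
    can be dispatched concurrently. Assumes `steps` is topologically
    sorted, as ExecutionPlan guarantees.
    """
    depth = {}
    for step in steps:
        # A step's depth is one more than its deepest input; roots get 0.
        depth[step.step_id] = 1 + max(
            (depth[i] for i in step.input_steps), default=-1
        )
    levels = [[] for _ in range(max(depth.values()) + 1)]
    for step in steps:
        levels[depth[step.step_id]].append(step)
    return levels
```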
## Data Flow Example
### Recipe: beat-cuts
```yaml
nodes:
  - id: music
    type: SOURCE
    config: { input: true }

  - id: beats
    type: ANALYZE
    config: { feature: beats }
    inputs: [music]

  - id: videos
    type: SOURCE_LIST
    config: { input: true }

  - id: slices
    type: MAP
    config: { operation: RANDOM_SLICE }
    inputs:
      items: videos
      timing: beats

  - id: final
    type: SEQUENCE
    inputs: [slices]
```
### Phase 1: Analysis
```python
# Input: music file with hash abc123
analysis = {
    'abc123': AnalysisResult(
        beats=[0.0, 0.48, 0.96, 1.44, ...],
        tempo=125.0,
        duration=180.0
    )
}
```
### Phase 2: Planning
```python
# Expands MAP into concrete steps
plan = ExecutionPlan(
    steps=[
        # Source steps
        ExecutionStep(step_id='music', cache_id='abc123', ...),
        ExecutionStep(step_id='video_0', cache_id='def456', ...),
        ExecutionStep(step_id='video_1', cache_id='ghi789', ...),
        # Slice steps (one per beat group)
        ExecutionStep(step_id='slice_0', cache_id='hash(video_0+timing)', ...),
        ExecutionStep(step_id='slice_1', cache_id='hash(video_1+timing)', ...),
        ...
        # Tree reduction for sequence
        ExecutionStep(step_id='seq_0_1', input_steps=['slice_0', 'slice_1'], ...),
        ExecutionStep(step_id='seq_2_3', input_steps=['slice_2', 'slice_3'], ...),
        ExecutionStep(step_id='seq_final', input_steps=['seq_0_1', 'seq_2_3'], ...),
    ]
)
```
### Phase 3: Execution
```
Level 0: [music, video_0, video_1] → all cached (SOURCE)
Level 1: [slice_0, slice_1, slice_2, slice_3] → 4 parallel tasks
Level 2: [seq_0_1, seq_2_3] → 2 parallel SEQUENCE tasks
Level 3: [seq_final] → 1 final SEQUENCE task
```
## File Structure
```
artdag/
├── artdag/
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── analyzer.py        # Main Analyzer class
│   │   ├── audio.py           # Audio feature extraction
│   │   └── video.py           # Video feature extraction
│   ├── planning/
│   │   ├── __init__.py
│   │   ├── planner.py         # RecipePlanner class
│   │   ├── schema.py          # ExecutionPlan, ExecutionStep
│   │   └── tree_reduction.py  # Parallel composition optimizer
│   └── execution/
│       ├── __init__.py
│       ├── executor.py        # PlanExecutor class
│       └── claiming.py        # Hash-based task claiming
art-celery/
├── tasks/
│   ├── __init__.py
│   ├── analyze.py             # analyze_inputs task
│   ├── plan.py                # generate_plan task
│   ├── execute.py             # execute_step task
│   └── orchestrate.py         # run_plan (coordinates all)
├── claiming.py                # Redis Lua scripts
└── ...
```
## CLI Interface
```bash
# Full pipeline
artdag run-recipe recipes/beat-cuts/recipe.yaml \
    -i music:abc123 \
    -i videos:def456,ghi789

# Phase by phase
artdag analyze recipes/beat-cuts/recipe.yaml -i music:abc123
# → outputs analysis.json

artdag plan recipes/beat-cuts/recipe.yaml --analysis analysis.json
# → outputs plan.json

artdag execute plan.json
# → runs with caching, skips completed steps

# Dry run (show what would execute)
artdag execute plan.json --dry-run
# → shows which steps are cached vs need execution
```
## Benefits
1. **Development Speed**: Change recipe, re-run → only affected steps execute
2. **Parallelism**: Independent steps run on multiple Celery workers
3. **Reproducibility**: Same inputs + recipe = same cache IDs = same output
4. **Visibility**: Plan shows exactly what will happen before execution
5. **Cost Control**: Estimate compute before committing resources
6. **Fault Tolerance**: Failed runs resume from last successful step