Squashed 'core/' content from commit 4957443
git-subtree-dir: core git-subtree-split: 4957443184ae0eb6323635a90a19acffb3e01d07
This commit is contained in:
384
docs/EXECUTION_MODEL.md
Normal file
384
docs/EXECUTION_MODEL.md
Normal file
@@ -0,0 +1,384 @@
|
||||
# Art DAG 3-Phase Execution Model
|
||||
|
||||
## Overview
|
||||
|
||||
The execution model separates DAG processing into three distinct phases:
|
||||
|
||||
```
|
||||
Recipe + Inputs → ANALYZE → Analysis Results
|
||||
↓
|
||||
Analysis + Recipe → PLAN → Execution Plan (with cache IDs)
|
||||
↓
|
||||
Execution Plan → EXECUTE → Cached Results
|
||||
```
|
||||
|
||||
This separation enables:
|
||||
1. **Incremental development** - Re-run recipes without reprocessing unchanged steps
|
||||
2. **Parallel execution** - Independent steps run concurrently via Celery
|
||||
3. **Deterministic caching** - Same inputs always produce same cache IDs
|
||||
4. **Cost estimation** - Plan phase can estimate work before executing
|
||||
|
||||
## Phase 1: Analysis
|
||||
|
||||
### Purpose
|
||||
Extract features from input media that inform downstream processing decisions.
|
||||
|
||||
### Inputs
|
||||
- Recipe YAML with input references
|
||||
- Input media files (by content hash)
|
||||
|
||||
### Outputs
|
||||
Analysis results stored as JSON, keyed by input hash:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class AnalysisResult:
|
||||
input_hash: str
|
||||
features: Dict[str, Any]
|
||||
# Audio features
|
||||
beats: Optional[List[float]] # Beat times in seconds
|
||||
downbeats: Optional[List[float]] # Bar-start times
|
||||
tempo: Optional[float] # BPM
|
||||
energy: Optional[List[Tuple[float, float]]] # (time, value) envelope
|
||||
spectrum: Optional[Dict[str, List[Tuple[float, float]]]] # band envelopes
|
||||
# Video features
|
||||
duration: float
|
||||
frame_rate: float
|
||||
dimensions: Tuple[int, int]
|
||||
motion_tempo: Optional[float] # Estimated BPM from motion
|
||||
```
|
||||
|
||||
### Implementation
|
||||
```python
|
||||
class Analyzer:
|
||||
def analyze(self, input_hash: str, features: List[str]) -> AnalysisResult:
|
||||
"""Extract requested features from input."""
|
||||
|
||||
def analyze_audio(self, path: Path) -> AudioFeatures:
|
||||
"""Extract all audio features using librosa/essentia."""
|
||||
|
||||
def analyze_video(self, path: Path) -> VideoFeatures:
|
||||
"""Extract video metadata and motion analysis."""
|
||||
```
|
||||
|
||||
### Caching
|
||||
Analysis results are cached by:
|
||||
```
|
||||
analysis_cache_id = SHA3-256(input_hash + sorted(feature_names))
|
||||
```
|
||||
|
||||
## Phase 2: Planning
|
||||
|
||||
### Purpose
|
||||
Convert recipe + analysis into a complete execution plan with pre-computed cache IDs.
|
||||
|
||||
### Inputs
|
||||
- Recipe YAML (parsed)
|
||||
- Analysis results for all inputs
|
||||
- Recipe parameters (user-supplied values)
|
||||
|
||||
### Outputs
|
||||
An ExecutionPlan containing ordered steps, each with a pre-computed cache ID:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ExecutionStep:
|
||||
step_id: str # Unique identifier
|
||||
node_type: str # Primitive type (SOURCE, SEQUENCE, etc.)
|
||||
config: Dict[str, Any] # Node configuration
|
||||
input_steps: List[str] # IDs of steps this depends on
|
||||
cache_id: str # Pre-computed: hash(inputs + config)
|
||||
estimated_duration: float # Optional: for progress reporting
|
||||
|
||||
@dataclass
|
||||
class ExecutionPlan:
|
||||
plan_id: str # Hash of entire plan
|
||||
recipe_id: str # Source recipe
|
||||
steps: List[ExecutionStep] # Topologically sorted
|
||||
analysis: Dict[str, AnalysisResult]
|
||||
output_step: str # Final step ID
|
||||
|
||||
def compute_cache_ids(self):
|
||||
"""Compute all cache IDs in dependency order."""
|
||||
```
|
||||
|
||||
### Cache ID Computation
|
||||
|
||||
Cache IDs are computed in topological order so each step's cache ID
|
||||
incorporates its inputs' cache IDs:
|
||||
|
||||
```python
|
||||
def compute_cache_id(step: ExecutionStep, resolved_inputs: Dict[str, str]) -> str:
|
||||
"""
|
||||
Cache ID = SHA3-256(
|
||||
node_type +
|
||||
canonical_json(config) +
|
||||
sorted([input_cache_ids])
|
||||
)
|
||||
"""
|
||||
components = [
|
||||
step.node_type,
|
||||
json.dumps(step.config, sort_keys=True),
|
||||
*sorted(resolved_inputs[s] for s in step.input_steps)
|
||||
]
|
||||
return sha3_256('|'.join(components))
|
||||
```
|
||||
|
||||
### Plan Generation
|
||||
|
||||
The planner expands recipe nodes into concrete steps:
|
||||
|
||||
1. **SOURCE nodes** → Direct step with input hash as cache ID
|
||||
2. **ANALYZE nodes** → Step that references analysis results
|
||||
3. **TRANSFORM nodes** → Step with static config
|
||||
4. **TRANSFORM_DYNAMIC nodes** → Expanded to per-frame steps (or use BIND output)
|
||||
5. **SEQUENCE nodes** → Tree reduction for parallel composition
|
||||
6. **MAP nodes** → Expanded to N parallel steps + reduction
|
||||
|
||||
### Tree Reduction for Composition
|
||||
|
||||
Instead of sequential pairwise composition:
|
||||
```
|
||||
A → B → C → D (3 sequential steps)
|
||||
```
|
||||
|
||||
Use parallel tree reduction:
|
||||
```
|
||||
A ─┬─ AB ─┬─ ABCD
|
||||
B ─┘ │
|
||||
C ─┬─ CD ─┘
|
||||
D ─┘
|
||||
|
||||
Level 0: [A, B, C, D] (4 parallel)
|
||||
Level 1: [AB, CD] (2 parallel)
|
||||
Level 2: [ABCD] (1 final)
|
||||
```
|
||||
|
||||
This reduces O(N) to O(log N) levels.
|
||||
|
||||
## Phase 3: Execution
|
||||
|
||||
### Purpose
|
||||
Execute the plan, skipping steps with cached results.
|
||||
|
||||
### Inputs
|
||||
- ExecutionPlan with pre-computed cache IDs
|
||||
- Cache state (which IDs already exist)
|
||||
|
||||
### Process
|
||||
|
||||
1. **Claim Check**: For each step, atomically check if result is cached
|
||||
2. **Task Dispatch**: Uncached steps dispatched to Celery workers
|
||||
3. **Parallel Execution**: Independent steps run concurrently
|
||||
4. **Result Storage**: Each step stores result with its cache ID
|
||||
5. **Progress Tracking**: Real-time status updates
|
||||
|
||||
### Hash-Based Task Claiming
|
||||
|
||||
Prevents duplicate work when multiple workers process the same plan:
|
||||
|
||||
```lua
|
||||
-- Redis Lua script for atomic claim
|
||||
local key = KEYS[1]
|
||||
local data = redis.call('GET', key)
|
||||
if data then
|
||||
local status = cjson.decode(data)
|
||||
if status.status == 'running' or
|
||||
status.status == 'completed' or
|
||||
status.status == 'cached' then
|
||||
return 0 -- Already claimed/done
|
||||
end
|
||||
end
|
||||
local claim_data = ARGV[1]
|
||||
local ttl = tonumber(ARGV[2])
|
||||
redis.call('SETEX', key, ttl, claim_data)
|
||||
return 1 -- Successfully claimed
|
||||
```
|
||||
|
||||
### Celery Task Structure
|
||||
|
||||
```python
|
||||
@app.task(bind=True)
|
||||
def execute_step(self, step_json: str, plan_id: str) -> dict:
|
||||
"""Execute a single step with caching."""
|
||||
step = ExecutionStep.from_json(step_json)
|
||||
|
||||
# Check cache first
|
||||
if cache.has(step.cache_id):
|
||||
return {'status': 'cached', 'cache_id': step.cache_id}
|
||||
|
||||
# Try to claim this work
|
||||
if not claim_task(step.cache_id, self.request.id):
|
||||
# Another worker is handling it, wait for result
|
||||
return wait_for_result(step.cache_id)
|
||||
|
||||
# Do the work
|
||||
executor = get_executor(step.node_type)
|
||||
input_paths = [cache.get(s) for s in step.input_steps]
|
||||
output_path = cache.get_output_path(step.cache_id)
|
||||
|
||||
result_path = executor.execute(step.config, input_paths, output_path)
|
||||
cache.put(step.cache_id, result_path)
|
||||
|
||||
return {'status': 'completed', 'cache_id': step.cache_id}
|
||||
```
|
||||
|
||||
### Execution Orchestration
|
||||
|
||||
```python
|
||||
class PlanExecutor:
|
||||
def execute(self, plan: ExecutionPlan) -> ExecutionResult:
|
||||
"""Execute plan with parallel Celery tasks."""
|
||||
|
||||
# Group steps by level (steps at same level can run in parallel)
|
||||
levels = self.compute_dependency_levels(plan.steps)
|
||||
|
||||
for level_steps in levels:
|
||||
# Dispatch all steps at this level
|
||||
tasks = [
|
||||
execute_step.delay(step.to_json(), plan.plan_id)
|
||||
for step in level_steps
|
||||
if not self.cache.has(step.cache_id)
|
||||
]
|
||||
|
||||
# Wait for level completion
|
||||
results = [task.get() for task in tasks]
|
||||
|
||||
return self.collect_results(plan)
|
||||
```
|
||||
|
||||
## Data Flow Example
|
||||
|
||||
### Recipe: beat-cuts
|
||||
```yaml
|
||||
nodes:
|
||||
- id: music
|
||||
type: SOURCE
|
||||
config: { input: true }
|
||||
|
||||
- id: beats
|
||||
type: ANALYZE
|
||||
config: { feature: beats }
|
||||
inputs: [music]
|
||||
|
||||
- id: videos
|
||||
type: SOURCE_LIST
|
||||
config: { input: true }
|
||||
|
||||
- id: slices
|
||||
type: MAP
|
||||
config: { operation: RANDOM_SLICE }
|
||||
inputs:
|
||||
items: videos
|
||||
timing: beats
|
||||
|
||||
- id: final
|
||||
type: SEQUENCE
|
||||
inputs: [slices]
|
||||
```
|
||||
|
||||
### Phase 1: Analysis
|
||||
```python
|
||||
# Input: music file with hash abc123
|
||||
analysis = {
|
||||
'abc123': AnalysisResult(
|
||||
beats=[0.0, 0.48, 0.96, 1.44, ...],
|
||||
tempo=125.0,
|
||||
duration=180.0
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
### Phase 2: Planning
|
||||
```python
|
||||
# Expands MAP into concrete steps
|
||||
plan = ExecutionPlan(
|
||||
steps=[
|
||||
# Source steps
|
||||
ExecutionStep(id='music', cache_id='abc123', ...),
|
||||
ExecutionStep(id='video_0', cache_id='def456', ...),
|
||||
ExecutionStep(id='video_1', cache_id='ghi789', ...),
|
||||
|
||||
# Slice steps (one per beat group)
|
||||
ExecutionStep(id='slice_0', cache_id='hash(video_0+timing)', ...),
|
||||
ExecutionStep(id='slice_1', cache_id='hash(video_1+timing)', ...),
|
||||
...
|
||||
|
||||
# Tree reduction for sequence
|
||||
ExecutionStep(id='seq_0_1', inputs=['slice_0', 'slice_1'], ...),
|
||||
ExecutionStep(id='seq_2_3', inputs=['slice_2', 'slice_3'], ...),
|
||||
ExecutionStep(id='seq_final', inputs=['seq_0_1', 'seq_2_3'], ...),
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
### Phase 3: Execution
|
||||
```
|
||||
Level 0: [music, video_0, video_1] → all cached (SOURCE)
|
||||
Level 1: [slice_0, slice_1, slice_2, slice_3] → 4 parallel tasks
|
||||
Level 2: [seq_0_1, seq_2_3] → 2 parallel SEQUENCE tasks
|
||||
Level 3: [seq_final] → 1 final SEQUENCE task
|
||||
```
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
artdag/
|
||||
├── artdag/
|
||||
│ ├── analysis/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── analyzer.py # Main Analyzer class
|
||||
│ │ ├── audio.py # Audio feature extraction
|
||||
│ │ └── video.py # Video feature extraction
|
||||
│ ├── planning/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── planner.py # RecipePlanner class
|
||||
│ │ ├── schema.py # ExecutionPlan, ExecutionStep
|
||||
│ │ └── tree_reduction.py # Parallel composition optimizer
|
||||
│ └── execution/
|
||||
│ ├── __init__.py
|
||||
│ ├── executor.py # PlanExecutor class
|
||||
│ └── claiming.py # Hash-based task claiming
|
||||
|
||||
art-celery/
|
||||
├── tasks/
|
||||
│ ├── __init__.py
|
||||
│ ├── analyze.py # analyze_inputs task
|
||||
│ ├── plan.py # generate_plan task
|
||||
│ ├── execute.py # execute_step task
|
||||
│ └── orchestrate.py # run_plan (coordinates all)
|
||||
├── claiming.py # Redis Lua scripts
|
||||
└── ...
|
||||
```
|
||||
|
||||
## CLI Interface
|
||||
|
||||
```bash
|
||||
# Full pipeline
|
||||
artdag run-recipe recipes/beat-cuts/recipe.yaml \
|
||||
-i music:abc123 \
|
||||
-i videos:def456,ghi789
|
||||
|
||||
# Phase by phase
|
||||
artdag analyze recipes/beat-cuts/recipe.yaml -i music:abc123
|
||||
# → outputs analysis.json
|
||||
|
||||
artdag plan recipes/beat-cuts/recipe.yaml --analysis analysis.json
|
||||
# → outputs plan.json
|
||||
|
||||
artdag execute plan.json
|
||||
# → runs with caching, skips completed steps
|
||||
|
||||
# Dry run (show what would execute)
|
||||
artdag execute plan.json --dry-run
|
||||
# → shows which steps are cached vs need execution
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
1. **Development Speed**: Change recipe, re-run → only affected steps execute
|
||||
2. **Parallelism**: Independent steps run on multiple Celery workers
|
||||
3. **Reproducibility**: Same inputs + recipe = same cache IDs = same output
|
||||
4. **Visibility**: Plan shows exactly what will happen before execution
|
||||
5. **Cost Control**: Estimate compute before committing resources
|
||||
6. **Fault Tolerance**: Failed runs resume from last successful step
|
||||
Reference in New Issue
Block a user