The Python fallback path was reading amplitude directly from the effect dict
instead of checking dynamic_params first, as the CUDA kernel path does.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
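A minimal sketch of the lookup order described in the amplitude fix above: prefer a dynamic value, then fall back to the static effect dict. The helper name resolve_param and the idea that dynamic_params is a plain dict passed alongside the effect are assumptions for illustration, not the project's actual code.

```python
def resolve_param(effect, dynamic_params, name, default=0.0):
    """Return a parameter value, preferring the dynamic (per-frame) one.

    `effect` and `dynamic_params` are assumed to be plain dicts; this
    helper is hypothetical and only illustrates the lookup order.
    """
    # Dynamic values (e.g. audio-driven) win when they are present.
    if dynamic_params and name in dynamic_params:
        return dynamic_params[name]
    # Otherwise fall back to the static value stored on the effect.
    return effect.get(name, default)
```

With this order, a ripple effect whose amplitude is bound to an audio feature picks up the per-frame value, and effects without a binding keep their static setting.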
- Zoom now driven by audio energy via core:map-range
- Ripple amplitude reads from dynamic_params in sexp_to_cuda
- Crossfade transition with zoom in/out effect
- Move git clone before COPY in Dockerfile for better caching
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
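For the audio-driven zoom above, a range-mapping primitive can be sketched as a clamped linear remap. The exact semantics of core:map-range are not stated in this log, so the signature and the clamping behaviour below are assumptions.

```python
def map_range(value, in_lo, in_hi, out_lo, out_hi, clamp=True):
    """Linearly remap `value` from [in_lo, in_hi] to [out_lo, out_hi].

    Hypothetical stand-in for a map-range primitive; clamping to the
    output range is an assumption.
    """
    if in_hi == in_lo:
        # Degenerate input range: avoid division by zero.
        return out_lo
    t = (value - in_lo) / (in_hi - in_lo)
    if clamp:
        t = max(0.0, min(1.0, t))
    return out_lo + t * (out_hi - out_lo)
```

For example, map_range(energy, 0.0, 1.0, 1.0, 1.5) would turn a normalized audio energy into a zoom factor between 1.0 and 1.5.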
When running with --pool=solo, an event loop may already be running.
In that case, run async coroutines on a thread pool instead.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
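One way to sketch the event-loop handling above, using only the standard library. The helper name run_coro_sync is hypothetical. When a loop is already running in the calling thread, the coroutine runs on a worker thread with its own loop, and the caller blocks until it finishes.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_coro_sync(coro):
    """Run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running here, so asyncio.run() would raise.
    # Run the coroutine on a worker thread with a fresh event loop.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

Note that the second branch blocks the calling thread, and therefore the already running loop, until the coroutine completes.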
Downstream code expects arrays with .flags attribute, not GPUFrame.
Extract the underlying gpu/cpu array before returning.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
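The unwrapping step above might look roughly like this. The GPUFrame internals are not shown in this log, so the stand-in class and its gpu/cpu attribute names are assumptions made purely for illustration.

```python
class FakeGPUFrame:
    """Stand-in for the project's GPUFrame; attribute names are assumed."""

    def __init__(self, gpu=None, cpu=None):
        self.gpu = gpu
        self.cpu = cpu


def unwrap_frame(frame):
    """Return an array-like object that exposes `.flags`."""
    if hasattr(frame, "flags"):
        # Already a raw array (NumPy and CuPy arrays both expose .flags).
        return frame
    # Prefer the GPU array to avoid a needless device-to-host copy.
    for attr in ("gpu", "cpu"):
        inner = getattr(frame, attr, None)
        if inner is not None:
            return inner
    raise TypeError(f"cannot unwrap {type(frame).__name__}")
```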
The fallback path was passing raw numpy/cupy arrays to GPU functions
that expect GPUFrame objects with .cpu property.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The audio playback path was being resolved during parsing, when the database
may not be ready, causing a fallback to a non-existent path. It is now resolved
lazily when the stream starts, matching how the audio analyzer works.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
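The lazy resolution described above can be sketched as a small wrapper that defers the lookup until first use. LazyAudioPath and the resolver callable are hypothetical names; the real lookup presumably goes through the database.

```python
class LazyAudioPath:
    """Defer resolving a path until it is first needed, then cache it."""

    def __init__(self, resolver):
        # `resolver` is any zero-argument callable returning the path;
        # it is NOT called at construction (i.e. at parse) time.
        self._resolver = resolver
        self._path = None

    def get(self):
        if self._path is None:
            self._path = self._resolver()
        return self._path
```

Parsing would construct the wrapper, and the stream start would call get(), by which point the database is expected to be ready.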
Replace slow scipy.ndimage operations with custom CUDA kernels:
- gpu_rotate: AFFINE_WARP_KERNEL (< 1ms vs 20ms for scipy)
- gpu_blend: BLEND_KERNEL for fast alpha blending
- gpu_brightness/contrast: BRIGHTNESS_CONTRAST_KERNEL
- Add gpu_zoom, gpu_hue_shift, gpu_invert, gpu_ripple
Preserve GPU arrays through pipeline:
- Updated _maybe_to_numpy() to keep CuPy arrays for GPU primitives
- Primitives detect CuPy arrays via __cuda_array_interface__
- No unnecessary CPU round-trips between operations
New jit_compiler.py contains all CUDA kernels with FastGPUOps
class using ping-pong buffer strategy for efficient in-place ops.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
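The ping-pong buffer strategy mentioned above can be illustrated on the CPU with two preallocated buffers that swap roles after every operation. This is a generic sketch, not the FastGPUOps implementation: the class, the function names, and the (src, dst) operation signature are assumptions.

```python
class PingPongBuffers:
    """Two preallocated buffers that alternate as source and destination."""

    def __init__(self, make_buffer):
        self._buffers = [make_buffer(), make_buffer()]
        self._src = 0

    @property
    def src(self):
        return self._buffers[self._src]

    @property
    def dst(self):
        return self._buffers[1 - self._src]

    def swap(self):
        self._src = 1 - self._src


def apply_chain(buffers, ops):
    """Run each op as op(src, dst), swapping roles after every step."""
    for op in ops:
        op(buffers.src, buffers.dst)
        buffers.swap()
    # After the final swap, the latest result is in `src`.
    return buffers.src
```

Because each operation reads one buffer and writes the other, a chain of any length needs only two allocations and no intermediate copies.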
- New GPUHLSOutput class for direct GPU-to-NVENC encoding
- RGB→NV12 conversion via CUDA kernel (no CPU transfer)
- Uses PyNvVideoCodec for zero-copy GPU encoding
- ~220fps vs ~4fps with CPU pipe approach
- Automatically used when PyNvVideoCodec is available
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
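As a CPU reference for the RGB→NV12 step above, the following sketch shows the NV12 layout: a full-resolution Y plane followed by a half-resolution plane of interleaved U and V samples. The log does not say which coefficients the CUDA kernel uses, so the common BT.601 limited-range integer approximation and the 2x2 averaging for chroma are assumptions.

```python
def rgb_to_nv12(rgb, width, height):
    """Convert packed 8-bit RGB bytes (row-major) to NV12 bytes."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    if len(rgb) != width * height * 3:
        raise ValueError("unexpected RGB buffer size")

    out = bytearray(width * height * 3 // 2)
    uv_base = width * height  # the UV plane starts after the Y plane

    def pixel(x, y):
        i = (y * width + x) * 3
        return rgb[i], rgb[i + 1], rgb[i + 2]

    # Y plane: one luma sample per pixel.
    for y in range(height):
        for x in range(width):
            r, g, b = pixel(x, y)
            out[y * width + x] = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16

    # UV plane: one interleaved (U, V) pair per 2x2 block of pixels.
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            rs = gs = bs = 0
            for dy in (0, 1):
                for dx in (0, 1):
                    r, g, b = pixel(bx + dx, by + dy)
                    rs, gs, bs = rs + r, gs + g, bs + b
            r, g, b = rs // 4, gs // 4, bs // 4
            j = uv_base + (by // 2) * width + bx
            out[j] = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
            out[j + 1] = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128

    return bytes(out)
```

A GPU kernel would compute the same per-pixel arithmetic in parallel; a reference like this is mainly useful for checking the kernel's output on small frames.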
Use devel image for compilation, runtime for final image.
Keeps image smaller while enabling NVDEC decode.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Build decord with -DUSE_CUDA=ON for true NVDEC hardware decode
- Use DLPack for zero-copy transfer from decord to CuPy
- Frames stay on GPU throughout: decode -> process -> encode
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace decord (CPU-only pip package) with PyNvCodec which provides
direct NVDEC access. Frames decode straight to GPU memory without
any CPU transfer, eliminating the memory bandwidth bottleneck.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Install decord in GPU Dockerfile for hardware video decode
- Update GPUVideoSource to use decord with GPU context
- Decord decodes on GPU via NVDEC, avoiding CPU memory copies
- Falls back to FFmpeg pipe if decord unavailable
- Enable STREAMING_GPU_PERSIST=1 for full GPU pipeline
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
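The fallback rule above (decord if available, otherwise the FFmpeg pipe) can be sketched as an availability probe. The function name and the backend labels are hypothetical; the probe is injectable so the choice can be tested without either library installed.

```python
import importlib.util


def pick_decoder(is_available=None):
    """Return "decord" when the module can be imported, else "ffmpeg-pipe"."""
    if is_available is None:
        # Default probe: check that the module can be found, without
        # actually importing it.
        def is_available(name):
            return importlib.util.find_spec(name) is not None
    return "decord" if is_available("decord") else "ffmpeg-pipe"
```

A real selector would probably also confirm that the GPU context can be created, since an importable package does not guarantee working NVDEC support.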
- CIDVideoSource now uses GPUVideoSource when GPU is available
- Enables CUDA hardware decoding for video sources
- Should significantly improve rendering performance
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add -movflags +faststart to move moov atom to start
- Add -fflags +genpts for proper timestamp generation
- Fixes jerky playback and video/audio desync
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
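A sketch of where the two flags above sit on an FFmpeg command line. -fflags +genpts is a format flag for the input and goes before -i, while -movflags +faststart is a muxer option for the MP4 output. The helper name and the use of stream copy (-c copy) are assumptions; the project's actual command is not shown in this log.

```python
def build_remux_cmd(src, dst):
    """Build an ffmpeg argument list for a faststart MP4 remux."""
    return [
        "ffmpeg", "-y",
        "-fflags", "+genpts",       # input: generate missing timestamps
        "-i", src,
        "-c", "copy",               # assumption: remux without re-encoding
        "-movflags", "+faststart",  # output: move the moov atom to the start
        dst,
    ]
```

The list can be passed directly to subprocess.run(cmd, check=True).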
- Add on_playlist_update callback to IPFSHLSOutput
- Pass callback through StreamInterpreter to output
- Update database with playlist CID as segments are created
- Enables live HLS redirect to IPFS before rendering completes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
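The callback wiring above can be sketched as an output object that notifies a listener whenever the playlist gets a new CID. PlaylistPublisher and playlist_changed are hypothetical stand-ins. Only the on_playlist_update parameter name comes from this log, and its real signature may differ.

```python
class PlaylistPublisher:
    """Hypothetical stand-in for an HLS output that republishes its playlist."""

    def __init__(self, on_playlist_update=None):
        # Optional callable invoked with the new playlist CID.
        self.on_playlist_update = on_playlist_update
        self.playlist_cid = None

    def playlist_changed(self, playlist_cid):
        """Record the latest playlist CID and notify the listener, if any."""
        self.playlist_cid = playlist_cid
        if self.on_playlist_update is not None:
            self.on_playlist_update(playlist_cid)
```

The caller would pass a function that stores the CID in the database, so that a live redirect can point at the newest playlist while rendering is still in progress.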
- Add ipfs_playlist_cid column to pending_runs schema with migration
- Add pool guards to critical database functions (RuntimeError if not initialized)
- Add update_pending_run_playlist() function for streaming
- Update streaming task to save playlist CID to DB for HLS redirect
- Change database error handling from warning to raising exception
Errors should fail fast and explicitly, not be silently swallowed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
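The pool guard above amounts to a fail-fast check before any query. The class below is a hypothetical illustration: only the RuntimeError behaviour is taken from this log, and the names are assumptions.

```python
class Database:
    """Hypothetical holder for a connection pool with a fail-fast guard."""

    def __init__(self):
        self.pool = None  # set by an initialization step, not shown here

    def require_pool(self):
        """Return the pool, or raise instead of silently doing nothing."""
        if self.pool is None:
            raise RuntimeError("database pool is not initialized")
        return self.pool
```

Calling require_pool() at the top of every database function turns a missing initialization into an immediate, explicit error rather than a swallowed warning.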
init_db() was creating new pools without checking if one already exists,
causing "too many clients already" errors under load. Added early return
if pool is already initialized and set explicit pool limits (min=2, max=10).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
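The early return in init_db() above can be sketched as follows. The pool factory is injected so the example does not depend on a specific driver. The create_pool(dsn, min_size=..., max_size=...) call shape and the PoolHolder name are assumptions, and only the limits (min=2, max=10) come from this log.

```python
class PoolHolder:
    """Create a connection pool at most once and reuse it afterwards."""

    def __init__(self, create_pool):
        # `create_pool` is an async factory, assumed to accept
        # (dsn, min_size=..., max_size=...).
        self._create_pool = create_pool
        self.pool = None

    async def init(self, dsn):
        if self.pool is not None:
            # Already initialized: do not open a second pool.
            return self.pool
        self.pool = await self._create_pool(dsn, min_size=2, max_size=10)
        return self.pool
```

If several coroutines can call init() concurrently, an asyncio.Lock around the check would also be needed. This sketch leaves it out.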
Templates now prefer /ipfs/{cid} over /cache/{cid}/raw when
run.ipfs_cid is set. This fixes playback for content that exists
on IPFS but not on the local API server cache.
Also fixed field name: run.output_ipfs_cid -> run.ipfs_cid to match
database schema.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
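The URL preference above, expressed as a plain function for illustration. The real logic lives in the templates. The function name and the cache_cid parameter are assumptions, and the two URL shapes are the ones quoted in the commit.

```python
def media_url(cache_cid, ipfs_cid=None):
    """Prefer the IPFS URL when an IPFS CID is set."""
    if ipfs_cid:
        return f"/ipfs/{ipfs_cid}"
    # No IPFS CID recorded: fall back to the local API server cache.
    return f"/cache/{cache_cid}/raw"
```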
HLS outputs were including the full audio track instead of trimming
it to match the video duration, causing the video to freeze while the
audio continued playing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>