Replace slow scipy.ndimage operations with custom CUDA kernels:
- gpu_rotate: AFFINE_WARP_KERNEL (< 1ms vs 20ms for scipy)
- gpu_blend: BLEND_KERNEL for fast alpha blending
- gpu_brightness/contrast: BRIGHTNESS_CONTRAST_KERNEL
- Add gpu_zoom, gpu_hue_shift, gpu_invert, gpu_ripple
Preserve GPU arrays through pipeline:
- Updated _maybe_to_numpy() to keep CuPy arrays for GPU primitives
- Primitives detect CuPy arrays via __cuda_array_interface__
- No unnecessary CPU round-trips between operations
New jit_compiler.py contains all CUDA kernels with FastGPUOps
class using ping-pong buffer strategy for efficient in-place ops.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Build decord with -DUSE_CUDA=ON for true NVDEC hardware decode
- Use DLPack for zero-copy transfer from decord to CuPy
- Frames stay on GPU throughout: decode -> process -> encode
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace decord (CPU-only pip package) with PyNvCodec which provides
direct NVDEC access. Frames decode straight to GPU memory without
any CPU transfer, eliminating the memory bandwidth bottleneck.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Install decord in GPU Dockerfile for hardware video decode
- Update GPUVideoSource to use decord with GPU context
- Decord decodes on GPU via NVDEC, avoiding CPU memory copies
- Falls back to FFmpeg pipe if decord unavailable
- Enable STREAMING_GPU_PERSIST=1 for full GPU pipeline
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
streaming_gpu.py was being loaded on GPU nodes but had no PRIMITIVES dict,
causing audio-beat, audio-energy etc. to be missing. Now imports and
includes all primitives from the CPU streaming.py module.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add IPFSHLSOutput class that uploads segments to IPFS as they're created
- Update streaming task to use IPFS HLS output for distributed streaming
- Add /ipfs-stream endpoint to get IPFS playlist URL
- Update /stream endpoint to redirect to IPFS when available
- Add GPU persistence mode (STREAMING_GPU_PERSIST=1) to keep frames on GPU
- Add hardware video decoding (NVDEC) support for faster video processing
- Add GPU-accelerated primitive libraries: blending_gpu, color_ops_gpu, geometry_gpu
- Add streaming_gpu module with GPUFrame class for tracking CPU/GPU data location
- Add Dockerfile.gpu for building GPU-enabled worker image
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>