Commit Graph

382 Commits

Author SHA1 Message Date
giles
b773689814 Fix fast_ripple signature in test 2026-02-04 09:52:30 +00:00
giles
2d20a6f452 Add fused-pipeline primitive and test for compiled CUDA kernels
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
2026-02-04 09:51:56 +00:00
giles
8b9309a90b Fix f-string brace escaping in ripple effect CUDA code
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
2026-02-04 09:49:52 +00:00
giles
3b964ba18d Add sexp to CUDA kernel compiler for fused pipeline
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 09:46:40 +00:00
giles
4d95ec5a32 Add profiling to stream interpreter to find bottleneck
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 09:32:53 +00:00
giles
ad1d7893f8 Integrate fast CUDA kernels for GPU effects pipeline
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Replace slow scipy.ndimage operations with custom CUDA kernels:
- gpu_rotate: AFFINE_WARP_KERNEL (< 1ms vs 20ms for scipy)
- gpu_blend: BLEND_KERNEL for fast alpha blending
- gpu_brightness/contrast: BRIGHTNESS_CONTRAST_KERNEL
- Add gpu_zoom, gpu_hue_shift, gpu_invert, gpu_ripple

Preserve GPU arrays through pipeline:
- Updated _maybe_to_numpy() to keep CuPy arrays for GPU primitives
- Primitives detect CuPy arrays via __cuda_array_interface__
- No unnecessary CPU round-trips between operations

New jit_compiler.py contains all CUDA kernels with FastGPUOps
class using ping-pong buffer strategy for efficient in-place ops.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:53:46 +00:00
giles
75f9d8fb11 Configure GPU encoder for low-latency (no B-frames)
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:40:07 +00:00
giles
b96e8ca4d2 Add playlist_url property to GPUHLSOutput
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:37:21 +00:00
giles
8051ef9ba9 Fix GPUHLSOutput method name (write not write_frame)
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:34:51 +00:00
giles
3adf927ca1 Add zero-copy GPU encoding pipeline
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- New GPUHLSOutput class for direct GPU-to-NVENC encoding
- RGB→NV12 conversion via CUDA kernel (no CPU transfer)
- Uses PyNvVideoCodec for zero-copy GPU encoding
- ~220fps vs ~4fps with CPU pipe approach
- Automatically used when PyNvVideoCodec is available

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:32:43 +00:00
giles
9bdad268a5 Fix DLPack: use frame.to_dlpack() for decord→CuPy zero-copy
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:10:18 +00:00
giles
1cb9c3ac8a Add DLPack debug logging to diagnose zero-copy
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:06:19 +00:00
giles
36c4afeb84 Add LD_PRELOAD for libnvcuvid in Dockerfile
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
2026-02-04 01:56:40 +00:00
giles
b6292268fa Add NVDEC headers and libnvcuvid stub for decord build
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
2026-02-04 01:53:37 +00:00
giles
3a02fca7fd Add FFmpeg dev headers for decord build
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
2026-02-04 01:52:21 +00:00
giles
c4004b3f5d Multi-stage Dockerfile for decord CUDA build
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Use devel image for compilation, runtime for final image.
Keeps image smaller while enabling NVDEC decode.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:51:00 +00:00
giles
41adf058bd Build decord from source with CUDA for GPU video decode
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- Build decord with -DUSE_CUDA=ON for true NVDEC hardware decode
- Use DLPack for zero-copy transfer from decord to CuPy
- Frames stay on GPU throughout: decode -> process -> encode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:50:14 +00:00
giles
b7e3827fa2 Use PyNvCodec for true zero-copy GPU video decode
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
Replace decord (CPU-only pip package) with PyNvCodec which provides
direct NVDEC access. Frames decode straight to GPU memory without
any CPU transfer, eliminating the memory bandwidth bottleneck.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:47:03 +00:00
giles
771fb8cebc Add decord for GPU-native video decode
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- Install decord in GPU Dockerfile for hardware video decode
- Update GPUVideoSource to use decord with GPU context
- Decord decodes on GPU via NVDEC, avoiding CPU memory copies
- Falls back to FFmpeg pipe if decord unavailable
- Enable STREAMING_GPU_PERSIST=1 for full GPU pipeline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:17:22 +00:00
giles
ef4bc24eda Use GPUVideoSource for hardware-accelerated video decoding
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- CIDVideoSource now uses GPUVideoSource when GPU is available
- Enables CUDA hardware decoding for video sources
- Should significantly improve rendering performance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:03:16 +00:00
giles
0bd8ee71c7 Fix MP4 mux for web playback: add faststart and genpts
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- Add -movflags +faststart to move moov atom to start
- Add -fflags +genpts for proper timestamp generation
- Fixes jerky playback and video/audio desync

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:32:45 +00:00
giles
9151d2c2a8 Update IPFS playlist CID in database during streaming for live HLS
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- Add on_playlist_update callback to IPFSHLSOutput
- Pass callback through StreamInterpreter to output
- Update database with playlist CID as segments are created
- Enables live HLS redirect to IPFS before rendering completes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:23:49 +00:00
giles
ed5ef2bf39 Add ipfs_playlist_cid to pending_runs and fail-fast on DB errors
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- Add ipfs_playlist_cid column to pending_runs schema with migration
- Add pool guards to critical database functions (RuntimeError if not initialized)
- Add update_pending_run_playlist() function for streaming
- Update streaming task to save playlist CID to DB for HLS redirect
- Change database error handling from warning to raising exception

Errors should fail fast and explicitly, not be silently swallowed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:02:18 +00:00
giles
bbcb79cc1e Fix database connection pool leak in init_db()
init_db() was creating new pools without checking if one already exists,
causing "too many clients already" errors under load. Added early return
if pool is already initialized and set explicit pool limits (min=2, max=10).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 23:01:43 +00:00
giles
11bcafee55 Use IPFS URLs for video/image playback when available
Templates now prefer /ipfs/{cid} over /cache/{cid}/raw when
run.ipfs_cid is set. This fixes playback for content that exists
on IPFS but not on the local API server cache.

Also fixed field name: run.output_ipfs_cid -> run.ipfs_cid to match
database schema.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 22:15:09 +00:00
giles
b49d109a51 Fix HLS audio duration to match video with -shortest flag
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
HLS outputs were including full audio track instead of trimming
to match video duration, causing video to freeze while audio
continued playing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 22:13:08 +00:00
giles
9096824444 Redirect /hls/stream.m3u8 to IPFS playlist when available
When the local HLS playlist doesn't exist, check for IPFS playlist
CID in pending/completed run and redirect to the IPFS gateway.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 22:09:19 +00:00
giles
fe6730ce72 Add dev infrastructure improvements
Some checks are pending
GPU Worker CI/CD / test (push) Waiting to run
GPU Worker CI/CD / deploy (push) Blocked by required conditions
- Central config with logging on startup
- Hot reload support for GPU worker (docker-compose.gpu-dev.yml)
- Quick deploy script (scripts/gpu-dev-deploy.sh)
- GPU/CPU frame compatibility tests
- CI/CD pipeline for GPU worker (.gitea/workflows/gpu-worker.yml)
- Standardize GPU_PERSIST default to 0 across all modules

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:56:40 +00:00
giles
6ea39d633b Bake GPU/IPFS settings into Dockerfile
Settings in Dockerfile override swarm service env vars.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:48:03 +00:00
giles
0847b733a9 Fix GPU worker config: disable GPU persistence, use cluster gateway
- STREAMING_GPU_PERSIST=0 until all primitives support GPU frames
- IPFS_GATEWAY_URL points to cluster's public gateway

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:47:19 +00:00
giles
7009840712 Add auto GPU->CPU conversion at interpreter level
Convert GPU frames/CuPy arrays to numpy before calling primitives.
This fixes all CPU primitives without modifying each one individually.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:40:37 +00:00
giles
92eeb58c71 Add GPU frame conversion in color_ops
All color_ops primitives now auto-convert GPU frames to numpy,
fixing compatibility with geometry_gpu primitives.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:38:10 +00:00
giles
2c1728c6ce Disable GPU persistence by default
GPU persistence returns CuPy arrays but most primitives expect numpy.
Disable until all primitives support GPU frames.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:24:45 +00:00
giles
6e0ee65e40 Fix streaming_gpu.py to include CPU primitives
streaming_gpu.py was being loaded on GPU nodes but had no PRIMITIVES dict,
causing audio-beat, audio-energy etc. to be missing. Now imports and
includes all primitives from the CPU streaming.py module.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:20:23 +00:00
giles
3116a70c3e Fix IPFS upload: sync instead of background task
The background IPFS upload task was running on workers that don't have
the file locally, causing uploads to fail silently. Now uploads go to
IPFS synchronously so the IPFS CID is available immediately.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:17:22 +00:00
giles
09d5359725 Re-enable GPU queue routing after image rebuild
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 20:38:37 +00:00
giles
4930eb99ad Temp: disable GPU queue for testing IPFS HLS streaming
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 20:25:50 +00:00
giles
86830019ad Add IPFS HLS streaming and GPU optimizations
- Add IPFSHLSOutput class that uploads segments to IPFS as they're created
- Update streaming task to use IPFS HLS output for distributed streaming
- Add /ipfs-stream endpoint to get IPFS playlist URL
- Update /stream endpoint to redirect to IPFS when available
- Add GPU persistence mode (STREAMING_GPU_PERSIST=1) to keep frames on GPU
- Add hardware video decoding (NVDEC) support for faster video processing
- Add GPU-accelerated primitive libraries: blending_gpu, color_ops_gpu, geometry_gpu
- Add streaming_gpu module with GPUFrame class for tracking CPU/GPU data location
- Add Dockerfile.gpu for building GPU-enabled worker image

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 20:23:16 +00:00
giles
5bc655f8c8 Include rendering status in runs list 2026-02-03 00:44:53 +00:00
giles
8f4d6d12bc Use fragmented mp4 for streamable output while rendering 2026-02-03 00:41:07 +00:00
giles
82599eff1e Add fallback to format duration and debug logging for VideoSource 2026-02-03 00:36:36 +00:00
giles
a57be27907 Add live video streaming for in-progress renders 2026-02-03 00:27:19 +00:00
giles
9302283e86 Add admin token support to runs list endpoint 2026-02-03 00:23:14 +00:00
giles
487acdd606 Fix global declaration placement in streaming task 2026-02-03 00:16:49 +00:00
giles
6b2991bf24 Fix database event loop conflicts in streaming task 2026-02-03 00:14:42 +00:00
giles
3ec045c533 Add _stream_time and skip() to CIDVideoSource 2026-02-03 00:07:33 +00:00
giles
3bff130e57 Add path property to CIDVideoSource 2026-02-03 00:06:10 +00:00
giles
414cbddd66 Fix VideoSource import path 2026-02-03 00:02:23 +00:00
giles
89b2fd3d2e Add debug logging to resolve_asset 2026-02-03 00:00:30 +00:00
giles
3d5a08a7dc Don't overwrite pre-registered primitives in _load_primitives 2026-02-02 23:59:38 +00:00