
art-dag/celery


Commit Graph

Author	SHA1	Message	Date
giles	ad1d7893f8	Integrate fast CUDA kernels for GPU effects pipeline	2026-02-04 02:53:46 +00:00

Replace slow scipy.ndimage operations with custom CUDA kernels:
- gpu_rotate: AFFINE_WARP_KERNEL (< 1 ms vs 20 ms with scipy)
- gpu_blend: BLEND_KERNEL for fast alpha blending
- gpu_brightness/contrast: BRIGHTNESS_CONTRAST_KERNEL
- Add gpu_zoom, gpu_hue_shift, gpu_invert, gpu_ripple
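The commit does not include the kernel source, so the following is only a CPU reference sketch of the per-pixel math such kernels perform. It assumes an affine warp done by inverse mapping with nearest-neighbour sampling, and a plain linear alpha blend on uint8 frames. The names rotate_reference and blend_reference are hypothetical, not functions from the repository. In a CUDA kernel, the per-pixel arithmetic would typically run once per output pixel in its own thread.

```python
import numpy as np


def rotate_reference(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate an HxW or HxWxC image about its centre (hypothetical reference).

    Uses inverse mapping: for every output pixel, compute which source pixel
    lands there. Positive angles turn the image clockwise as displayed,
    because row indices grow downward.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)

    # Offsets of every output pixel from the image centre.
    ys, xs = np.indices((h, w), dtype=np.float64)
    dy, dx = ys - cy, xs - cx

    # Inverse rotation gives the source coordinate for each output pixel.
    src_x = cos_t * dx + sin_t * dy + cx
    src_y = -sin_t * dx + cos_t * dy + cy

    # Nearest-neighbour sampling; pixels that map outside the source stay 0.
    sx = np.rint(src_x).astype(np.int64)
    sy = np.rint(src_y).astype(np.int64)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)

    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def blend_reference(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Linear alpha blend of two uint8 frames: (1 - alpha) * a + alpha * b."""
    mixed = (1.0 - alpha) * a.astype(np.float32) + alpha * b.astype(np.float32)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
```

Nearest-neighbour sampling keeps the sketch short; a production warp kernel may well interpolate bilinearly instead.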

Preserve GPU arrays through pipeline:
- Update _maybe_to_numpy() to keep CuPy arrays for GPU primitives
- Primitives detect CuPy arrays via __cuda_array_interface__
- Avoid unnecessary CPU round-trips between operations
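A minimal sketch of the detection step, assuming only that GPU arrays expose the standard __cuda_array_interface__ attribute (CuPy arrays do). The functions is_gpu_array and maybe_to_numpy, and the consumer_is_gpu flag, are hypothetical stand-ins; the real _maybe_to_numpy() signature is not shown in the commit.

```python
from typing import Any

import numpy as np


def is_gpu_array(x: Any) -> bool:
    """True if x exposes the CUDA Array Interface (CuPy arrays do)."""
    return hasattr(x, "__cuda_array_interface__")


def maybe_to_numpy(x: Any, consumer_is_gpu: bool) -> Any:
    """Hypothetical stand-in for the pipeline's _maybe_to_numpy().

    Device arrays are returned untouched when the next primitive runs on the
    GPU, so consecutive GPU operations never round-trip through host memory.
    A device-to-host copy happens only when a CPU primitive needs the data.
    """
    if is_gpu_array(x):
        if consumer_is_gpu:
            return x  # stay on the device
        return x.get()  # cupy.ndarray.get() copies to a numpy.ndarray
    return np.asarray(x)  # host data is returned as a NumPy array
```

One advantage of testing for the protocol attribute rather than isinstance(x, cupy.ndarray) is that the check needs no CuPy import, so the same code path can run on CPU-only workers.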

New jit_compiler.py contains all CUDA kernels, plus a FastGPUOps
class that uses a ping-pong buffer strategy for efficient in-place ops.
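The ping-pong idea can be sketched with two preallocated NumPy buffers that swap source and destination roles after every operation. PingPongBuffers, invert and flip_horizontal are hypothetical illustrations, not the FastGPUOps API; with CuPy the same pattern would pass the idle buffer as the kernel's output argument.

```python
from typing import Callable, List

import numpy as np

# An op reads `src` and writes its result into `dst` without allocating.
Op = Callable[[np.ndarray, np.ndarray], None]


class PingPongBuffers:
    """Two preallocated frame buffers that alternate as source and destination."""

    def __init__(self, frame: np.ndarray) -> None:
        self.bufs: List[np.ndarray] = [frame.copy(), np.empty_like(frame)]
        self.cur = 0  # index of the buffer holding the latest result

    @property
    def result(self) -> np.ndarray:
        return self.bufs[self.cur]

    def apply(self, op: Op) -> "PingPongBuffers":
        src = self.bufs[self.cur]
        dst = self.bufs[1 - self.cur]
        op(src, dst)  # write into the idle buffer
        self.cur = 1 - self.cur  # dst now holds the latest result
        return self


def invert(src: np.ndarray, dst: np.ndarray) -> None:
    np.subtract(255, src, out=dst)  # assumes uint8 frames


def flip_horizontal(src: np.ndarray, dst: np.ndarray) -> None:
    # Safe because src and dst are always different buffers.
    np.copyto(dst, src[:, ::-1])
```

A naive in-place mirror kernel would overwrite pixels that other threads have not read yet; alternating between two buffers sidesteps that hazard while allocating only two frames up front, however many operations are chained.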

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

1 commit