JAX via XLA produces identical output on CPU and GPU. Previously
CUDA hand-written kernels were preferred on GPU, causing visual
differences vs the JAX CPU fallback. Now JAX is always used first,
with legacy CuPy/GPUFrame as fallback only when JAX is unavailable.
Also adds comprehensive CLAUDE.md for the monorepo.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When CUDA fused kernels aren't available, the fused-pipeline primitive
now uses JAX ops (jax_rotate, jax_scale, jax_shift_hue, etc.) instead
of falling back to one-by-one CuPy/GPUFrame operations. Legacy GPUFrame
path retained as last resort when JAX is also unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>