JAX via XLA produces identical output on CPU and GPU. Previously
CUDA hand-written kernels were preferred on GPU, causing visual
differences vs the JAX CPU fallback. Now JAX is always used first,
with legacy CuPy/GPUFrame as fallback only when JAX is unavailable.
Also adds comprehensive CLAUDE.md for the monorepo.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>