Replace decord, whose pip wheel is CPU-only, with PyNvCodec, which
provides direct NVDEC access. Decoded frames stay in GPU memory, so
raw frames no longer have to be copied from host to device, removing
that transfer as a bandwidth bottleneck.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>