Part 2 of my Mojo GPU-puzzles series dives into the workhorse kernels of DL: sliding-window pooling, halo-aware 1-D/2-D convolutions, warp-level prefix sums, and more. Lots of diagrams + runnable kernels; builds directly on Part 1 (https://shubhamg.in/posts/2025-07-06-gpu-puzzles-p1.html). Feedback & perf tips welcome!