MinBlockPerCu Class — pytorch Architecture
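Architecture documentation for MinBlockPerCu in launch_kernel_pt.hpp from the pytorch codebase. In the source shown below, MinBlockPerCu is an int non-type template parameter of the kentry_pt kernel entry point rather than a class. It is passed as the second argument to __launch_bounds__ when CK_TILE_USE_LAUNCH_BOUNDS is enabled.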
Source Code
aten/src/ATen/native/transformers/hip/flash_attn/ck/launch_kernel_pt.hpp lines 11–22
template <int MinBlockPerCu, typename Kernel, typename... Args>
#if CK_TILE_USE_LAUNCH_BOUNDS
__launch_bounds__(Kernel::kBlockSize, MinBlockPerCu)
#endif
__global__ void kentry_pt(Args... args)
{
#if (defined(__gfx90a__) || defined(__gfx942__) || defined(__gfx950__))
    Kernel{}(args...);
#else
    CUDA_KERNEL_ASSERT(false && "Fatal! Attempting to call a CK SDPA kernel on unsupported hardware");
#endif
}
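The dispatch pattern in kentry_pt can be illustrated with a host-only sketch that builds with an ordinary C++ compiler. It is not the PyTorch or CK implementation. The names AddKernel, kentry_sketch, run_sketch, SKETCH_GLOBAL and SKETCH_LAUNCH_BOUNDS are hypothetical, and the two macros expand to nothing in place of the real GPU attributes. The sketch only shows that MinBlockPerCu is a compile-time integer and that the kernel type is default-constructed and invoked with the forwarded arguments.

```cpp
#include <cassert>

// Hypothetical stand-ins for __global__ and __launch_bounds__; they expand to
// nothing so this sketch compiles on the host without a GPU toolchain.
#define SKETCH_GLOBAL
#define SKETCH_LAUNCH_BOUNDS(max_threads, min_blocks)

// Hypothetical kernel functor. The entry point only needs a static kBlockSize
// member and a call operator from its Kernel type.
struct AddKernel {
    static constexpr int kBlockSize = 256;
    void operator()(const int* a, const int* b, int* out) const { *out = *a + *b; }
};

// Same shape as kentry_pt: MinBlockPerCu is an int template parameter, and the
// argument pack is passed by value to a default-constructed Kernel.
template <int MinBlockPerCu, typename Kernel, typename... Args>
SKETCH_LAUNCH_BOUNDS(Kernel::kBlockSize, MinBlockPerCu)
SKETCH_GLOBAL void kentry_sketch(Args... args)
{
    Kernel{}(args...);
}

// Calls the entry point on the host and returns the value the functor wrote.
int run_sketch()
{
    int a = 40, b = 2, out = 0;
    // MinBlockPerCu (2 here) and the kernel type are given explicitly;
    // the Args pack is deduced from the call arguments.
    kentry_sketch<2, AddKernel>(&a, &b, &out);
    assert(out == a + b);
    return out;
}
```

In the real header the kernel body is additionally guarded by the gfx90a, gfx942 and gfx950 architecture macros, with CUDA_KERNEL_ASSERT firing on any other target. The sketch leaves that guard out.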