MinBlockPerCu Class — pytorch Architecture

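Architecture documentation for MinBlockPerCu in launch_kernel_pt.hpp from the pytorch codebase. Although indexed here as a class, MinBlockPerCu is the integer (non-type) template parameter of the kentry_pt kernel entry point, where it is passed as the second argument to __launch_bounds__.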

Source Code

aten/src/ATen/native/transformers/hip/flash_attn/ck/launch_kernel_pt.hpp lines 11–22

template <int MinBlockPerCu, typename Kernel, typename... Args>
#if CK_TILE_USE_LAUNCH_BOUNDS
__launch_bounds__(Kernel::kBlockSize, MinBlockPerCu)
#endif
    __global__ void kentry_pt(Args... args)
{
#if (defined(__gfx90a__) || defined(__gfx942__) || defined(__gfx950__))
    Kernel{}(args...);
#else
    CUDA_KERNEL_ASSERT(false && "Fatal! Attempting to call a CK SDPA kernel on unsupported hardware");
#endif
}
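
The entry point above is a thin compile-time dispatcher: it default-constructs the Kernel functor and forwards the argument pack to it, while MinBlockPerCu and Kernel::kBlockSize are consumed only at compile time. The following is a minimal host-only sketch of that pattern, not the PyTorch or CK implementation. AddKernel, kentry_host, and run_sketch are hypothetical names introduced for illustration, and the GPU-specific __global__ and __launch_bounds__ attributes and the architecture guard are left out so that it builds with an ordinary C++ compiler.

```cpp
#include <cassert>

// Hypothetical functor standing in for a CK tile kernel (assumption: real
// kernels expose a static kBlockSize and a call operator, as the excerpt implies).
struct AddKernel {
    static constexpr int kBlockSize = 256;

    void operator()(const int* a, const int* b, int* out) const
    {
        *out = *a + *b;
    }
};

// Host-side analogue of kentry_pt. MinBlockPerCu is a compile-time int,
// not a type; on a device build it would feed __launch_bounds__.
template <int MinBlockPerCu, typename Kernel, typename... Args>
void kentry_host(Args... args)
{
    static_assert(MinBlockPerCu > 0, "occupancy hint must be positive");
    static_assert(Kernel::kBlockSize > 0, "block size must be positive");
    Kernel{}(args...); // same forwarding call as in the excerpt
}

// Instantiates the entry point with MinBlockPerCu = 2 and runs it once.
int run_sketch()
{
    int a = 2, b = 3, out = 0;
    kentry_host<2, AddKernel>(&a, &b, &out);
    assert(out == 5);
    return out;
}
```

Only Kernel needs to be named after MinBlockPerCu in the template argument list; the Args pack is deduced from the call, which is why the integer parameter comes first in the excerpt's signature as well.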
