NoOpWarpIteratorScale Class — PyTorch Architecture
Architecture documentation for the NoOpWarpIteratorScale class, defined in mma_from_smem.h in the PyTorch codebase.
Entity Profile
Source Code
aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/mma_from_smem.h lines 270–299
template <typename TensorRef>
class NoOpWarpIteratorScale {
 public:
  // In pipelined+multistage MMA implementations we keep an array of fragments.
  // If we aren't using scaling we don't want to waste registers on fragments
  // of scale elements, so ideally this would be sized 0.
  // Since arrays of zero-sized objects are not allowed, using size as 1.
  // The compiler will most likely wipe it out anyways.
  using Fragment = cutlass::Array<char, 1>;

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale() {}

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale(TensorRef const&, int) {}

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale& add_tile_offset(
      typename TensorRef::TensorCoord const&) {
    return *this;
  }

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale& operator++() {
    return *this;
  }

  CUTLASS_DEVICE
  void load(Fragment&) const {}
};