NoOpWarpIteratorScale Class — PyTorch Architecture
Architecture documentation for the NoOpWarpIteratorScale class, defined in mma_from_smem.h in the PyTorch codebase.
Entity Profile
Source Code
aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/mma_from_smem.h lines 270–299
template <typename TensorRef>
class NoOpWarpIteratorScale {
 public:
  // In pipelined+multistage MMA implementations we keep an array of fragments.
  // If we aren't using scaling we don't want to waste registers on fragments
  // of scale elements, so ideally this would be sized 0.
  // Since arrays of zero-sized objects are not allowed, using size as 1.
  // The compiler will most likely wipe it out anyways.
  using Fragment = cutlass::Array<char, 1>;

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale() {}

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale(TensorRef const&, int) {}

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale& add_tile_offset(
      typename TensorRef::TensorCoord const&) {
    return *this;
  }

  CUTLASS_HOST_DEVICE
  NoOpWarpIteratorScale& operator++() {
    return *this;
  }

  CUTLASS_DEVICE
  void load(Fragment&) const {}
};