dot_with_fp32_arith_main_loop_no_bfdot Function — PyTorch Architecture
Architecture documentation for the dot_with_fp32_arith_main_loop_no_bfdot function template in ReducedPrecisionFloatGemvFastPathKernel.cpp from the PyTorch codebase.
Source Code
aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp lines 287–303
template <typename T>
C10_ALWAYS_INLINE auto
dot_with_fp32_arith_main_loop_no_bfdot(
    const T* vec1,
    const T* vec2,
    int64_t len) {
  vec::VectorizedN<float, kF32RegistersPerIteration> sum(0);
  const auto len_aligned = len & ~(kF32ElementsPerIteration - 1);
  for (int j = 0; j < len_aligned ; j += kF32ElementsPerIteration) {
    const auto* vec1_ = vec1 + j;
    const auto* vec2_ = vec2 + j;
    c10::ForcedUnroll<kF32RegisterPairsPerIteration>{}([vec1_, vec2_, &sum](auto k) C10_ALWAYS_INLINE_ATTRIBUTE {
      dot_with_fp32_arith_main_inner_loop_no_bfdot(vec1_, vec2_, sum, k);
    });
  }
  return reduce(sum);
}
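The loop structure above (round the length down to a multiple of the per-iteration element count, accumulate into several independent partial sums, then reduce once at the end) can be illustrated with a scalar sketch. This is not the PyTorch implementation: the function name dot_main_loop_sketch and the constants kLanes, kRegisters, and kElementsPerIteration are hypothetical stand-ins, plain float arrays replace the vectorized registers, and the sketch assumes the per-iteration element count is a power of two, which the bit-mask in the excerpt also relies on.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the kernel's constants. The real values depend on
// the target's vector width and are not shown in the excerpt.
constexpr std::int64_t kLanes = 4;      // floats per simulated register
constexpr std::int64_t kRegisters = 2;  // simulated accumulator registers
constexpr std::int64_t kElementsPerIteration = kLanes * kRegisters;

// The mask trick below only works for power-of-two chunk sizes.
static_assert((kElementsPerIteration & (kElementsPerIteration - 1)) == 0,
              "chunk size must be a power of two");

// Dot product over the largest prefix of the inputs whose length is a multiple
// of kElementsPerIteration. Trailing elements are deliberately ignored here,
// so a caller would need to process them separately.
inline float dot_main_loop_sketch(const float* a, const float* b, std::int64_t len) {
  // One partial sum per lane, zero-initialized (plays the role of `sum`).
  std::array<float, static_cast<std::size_t>(kElementsPerIteration)> acc{};

  // Round len down to a multiple of the chunk size.
  const std::int64_t len_aligned = len & ~(kElementsPerIteration - 1);

  for (std::int64_t j = 0; j < len_aligned; j += kElementsPerIteration) {
    // In the real kernel this inner step is force-unrolled and vectorized.
    for (std::int64_t k = 0; k < kElementsPerIteration; ++k) {
      acc[static_cast<std::size_t>(k)] += a[j + k] * b[j + k];
    }
  }

  // Horizontal reduction of the partial sums (plays the role of `reduce(sum)`).
  float total = 0.0f;
  for (float v : acc) {
    total += v;
  }
  return total;
}
```

With these assumed constants, a call with len = 10 processes only the first 8 elements, and any len below 8 yields 0. Keeping separate partial sums per lane is what lets the real loop body run as independent vector operations, at the cost of a summation order that differs from a strictly sequential dot product.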