ReluFused Template Parameter — PyTorch Architecture
Architecture documentation for the ReluFused non-type template parameter of do_bn_compute in QuantizedOpKernels.cpp from the PyTorch codebase. ReluFused is a compile-time bool that selects whether a ReLU is fused into the quantized batch-norm kernel.
Entity Profile
Source Code
aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp lines 2383–2412
template <typename T, bool ReluFused>
inline void do_bn_compute(
    typename T::underlying* X_ptr,
    typename T::underlying* Y_ptr,
    Vectorized<float> & fake_scale,
    Vectorized<float> & in_zp_vec,
    Vectorized<float> & scale_neg_zp_premul,
    int64_t out_zero_point,
    Vectorized<T> & out_zero_point_v,
    float* alpha,
    float* beta,
    int64_t vec_num,
    int64_t kVLen
) {
  using Vec = Vectorized<T>;
  auto vals_q = Vec::loadu(X_ptr);
  // Fake scale of 1.0 here, should not affect performance (FMA in place of sub)
  auto vals_dq = vals_q.dequantize(fake_scale, in_zp_vec, scale_neg_zp_premul);
  for (const auto idx : c10::irange(vec_num)) {
    auto alpha_v = Vectorized<float>::loadu(alpha + idx * kVLen);
    auto beta_v = Vectorized<float>::loadu(beta + idx * kVLen);
    vals_dq[idx] = vec::fmadd(alpha_v, vals_dq[idx], beta_v);
  }
  // Fake scale again
  auto outputs_q = Vec::quantize(vals_dq, /*scale=*/1.0f, out_zero_point, /*inverse_scale=*/1.0f);
  if constexpr (ReluFused) {
    outputs_q = outputs_q.maximum(out_zero_point_v);
  }
  outputs_q.store(Y_ptr, vec_num * kVLen);
}
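The per-element arithmetic that the vectorized kernel above performs can be sketched in scalar form. This is a hypothetical standalone helper, not code from ATen; it assumes a quint8-style tensor (underlying uint8_t) and saturating requantization. Because the scale is faked as 1.0 on both sides, dequantization reduces to a zero-point shift, the batch-norm transform is a single multiply-add with the folded alpha/beta coefficients, and the fused ReLU is a clamp at the output zero point in the quantized domain:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch of do_bn_compute's per-element math (not the
// actual ATen implementation). ReluFused mirrors the kernel's compile-time
// template parameter.
template <bool ReluFused>
uint8_t bn_compute_scalar(uint8_t x, int64_t in_zero_point,
                          int64_t out_zero_point, float alpha, float beta) {
  // Dequantize with a fake scale of 1.0: just subtract the input zero point.
  float x_dq = static_cast<float>(static_cast<int64_t>(x) - in_zero_point);
  // Fused multiply-add with the folded batch-norm coefficients.
  float y = alpha * x_dq + beta;
  // Requantize with a fake scale of 1.0: round and add the output zero point.
  int64_t y_q = static_cast<int64_t>(std::nearbyint(y)) + out_zero_point;
  if (ReluFused) {
    // ReLU in the quantized domain: values below the output zero point
    // represent negative reals, so clamping at the zero point is max(y, 0).
    y_q = std::max(y_q, out_zero_point);
  }
  // Saturate to the quint8 range.
  return static_cast<uint8_t>(std::clamp<int64_t>(y_q, 0, 255));
}
```

Clamping at `out_zero_point_v` rather than at zero is the key trick: in an affine quantization scheme the output zero point encodes the real value 0.0, so `maximum(out_zero_point_v)` implements ReLU without ever leaving the integer domain.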