ReluFused Template Parameter — PyTorch Architecture
Architecture documentation for the ReluFused non-type template parameter of do_bn_compute in QuantizedOpKernels.cpp from the PyTorch codebase. ReluFused is a compile-time bool that selects whether a ReLU is fused into the quantized batch-norm kernel.
Entity Profile
Source Code
aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp lines 2383–2412
template <typename T, bool ReluFused>
inline void do_bn_compute(
    typename T::underlying* X_ptr,
    typename T::underlying* Y_ptr,
    Vectorized<float> & fake_scale,
    Vectorized<float> & in_zp_vec,
    Vectorized<float> & scale_neg_zp_premul,
    int64_t out_zero_point,
    Vectorized<T> & out_zero_point_v,
    float* alpha,
    float* beta,
    int64_t vec_num,
    int64_t kVLen
) {
  using Vec = Vectorized<T>;
  auto vals_q = Vec::loadu(X_ptr);
  // Fake scale of 1.0 here, should not affect performance (FMA in place of sub)
  auto vals_dq = vals_q.dequantize(fake_scale, in_zp_vec, scale_neg_zp_premul);
  for (const auto idx : c10::irange(vec_num)) {
    auto alpha_v = Vectorized<float>::loadu(alpha + idx * kVLen);
    auto beta_v = Vectorized<float>::loadu(beta + idx * kVLen);
    vals_dq[idx] = vec::fmadd(alpha_v, vals_dq[idx], beta_v);
  }
  // Fake scale again
  auto outputs_q = Vec::quantize(vals_dq, /*scale=*/1.0f, out_zero_point, /*inverse_scale=*/1.0f);
  if constexpr (ReluFused) {
    outputs_q = outputs_q.maximum(out_zero_point_v);
  }
  outputs_q.store(Y_ptr, vec_num * kVLen);
}
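The per-element arithmetic that the vectorized kernel above performs can be sketched in scalar form. This is a hypothetical standalone helper, not code from ATen; it assumes a quint8-style tensor (underlying uint8_t) and saturating requantization. Because the scale is faked as 1.0 on both sides, dequantization reduces to a zero-point shift, the batch-norm transform is a single multiply-add with the folded alpha/beta coefficients, and the fused ReLU is a clamp at the output zero point in the quantized domain:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch of do_bn_compute's per-element math (not the
// actual ATen implementation). ReluFused mirrors the kernel's compile-time
// template parameter.
template <bool ReluFused>
uint8_t bn_compute_scalar(uint8_t x, int64_t in_zero_point,
                          int64_t out_zero_point, float alpha, float beta) {
  // Dequantize with a fake scale of 1.0: just subtract the input zero point.
  float x_dq = static_cast<float>(static_cast<int64_t>(x) - in_zero_point);
  // Fused multiply-add with the folded batch-norm coefficients.
  float y = alpha * x_dq + beta;
  // Requantize with a fake scale of 1.0: round and add the output zero point.
  int64_t y_q = static_cast<int64_t>(std::nearbyint(y)) + out_zero_point;
  if (ReluFused) {
    // ReLU in the quantized domain: values below the output zero point
    // represent negative reals, so clamping at the zero point is max(y, 0).
    y_q = std::max(y_q, out_zero_point);
  }
  // Saturate to the quint8 range.
  return static_cast<uint8_t>(std::clamp<int64_t>(y_q, 0, 255));
}
```

Clamping at `out_zero_point_v` rather than at zero is the key trick: in an affine quantization scheme the output zero point encodes the real value 0.0, so `maximum(out_zero_point_v)` implements ReLU without ever leaving the integer domain.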