interleave2 Function - PyTorch Architecture

Architecture documentation for the interleave2<double> function template specialization in vec256.h from the PyTorch codebase.

Source Code

aten/src/ATen/cpu/vec/vec256/vec256.h lines 203–225

template <>
std::pair<Vectorized<double>, Vectorized<double>> inline interleave2<double>(
    const Vectorized<double>& a,
    const Vectorized<double>& b) {
  // inputs:
  //   a = {a0, a1, a2, a3}
  //   b = {b0, b1, b2, b3}

  // swap lanes:
  //   a_swapped = {a0, a1, b0, b1}
  //   b_swapped = {a2, a3, b2, b3}
  auto a_swapped =
      _mm256_permute2f128_pd(a, b, 0b0100000); // 0, 2.   4 bits apart
  auto b_swapped =
      _mm256_permute2f128_pd(a, b, 0b0110001); // 1, 3.   4 bits apart

  // group cols crossing lanes:
  //   return {a0, b0, a1, b1}
  //          {a2, b2, a3, b3}
  return std::make_pair(
      _mm256_permute4x64_pd(a_swapped, 0b11011000), // 0, 2, 1, 3
      _mm256_permute4x64_pd(b_swapped, 0b11011000)); // 0, 2, 1, 3
}
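
The two-step permutation above can be hard to follow from the immediates alone. The following is a minimal scalar sketch of the same data movement, not the ATen implementation: `Vec4`, `permute2f128_model`, `permute4x64_model`, and `interleave2_model` are hypothetical names introduced here for illustration, and a `std::array<double, 4>` stands in for a 256-bit register. It assumes the documented immediate encodings of `_mm256_permute2f128_pd` (one 4-bit selector per 128-bit half) and `_mm256_permute4x64_pd` (one 2-bit index per 64-bit element), and needs no AVX2 support to compile.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a 256-bit register holding four doubles.
using Vec4 = std::array<double, 4>;

// Scalar model of _mm256_permute2f128_pd. Each nibble of imm controls one
// 128-bit half of the result: bits [1:0] select the source half
// (0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high) and bit 3 zeroes the half.
inline Vec4 permute2f128_model(const Vec4& a, const Vec4& b, std::uint8_t imm) {
  Vec4 r{};
  for (std::size_t half = 0; half < 2; ++half) {
    const std::uint8_t ctrl = (imm >> (4 * half)) & 0xF;
    if (ctrl & 0x8) {
      continue; // this half stays zero
    }
    const Vec4& src = (ctrl & 0x2) ? b : a;
    const std::size_t base = (ctrl & 0x1) ? 2 : 0;
    r[2 * half] = src[base];
    r[2 * half + 1] = src[base + 1];
  }
  return r;
}

// Scalar model of _mm256_permute4x64_pd. Result element i is taken from the
// source element indexed by bits [2i+1 : 2i] of imm.
inline Vec4 permute4x64_model(const Vec4& v, std::uint8_t imm) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = v[(imm >> (2 * i)) & 0x3];
  }
  return r;
}

// Mirrors the structure of the listing above, using the same immediates.
inline std::pair<Vec4, Vec4> interleave2_model(const Vec4& a, const Vec4& b) {
  // 0b0100000 == 0x20: low half from a.low, high half from b.low.
  const Vec4 a_swapped = permute2f128_model(a, b, 0b0100000); // {a0, a1, b0, b1}
  // 0b0110001 == 0x31: low half from a.high, high half from b.high.
  const Vec4 b_swapped = permute2f128_model(a, b, 0b0110001); // {a2, a3, b2, b3}
  // 0b11011000 picks elements 0, 2, 1, 3 in that order.
  return std::make_pair(
      permute4x64_model(a_swapped, 0b11011000),  // {a0, b0, a1, b1}
      permute4x64_model(b_swapped, 0b11011000)); // {a2, b2, a3, b3}
}
```

Calling `interleave2_model` with a = {0, 1, 2, 3} and b = {10, 11, 12, 13} yields the pair {0, 10, 1, 11} and {2, 12, 3, 13}, which matches the layout described in the comments of the original listing. The "4 bits apart" remark in the source refers to the two selectors of `_mm256_permute2f128_pd` sitting in separate nibbles of the immediate.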
