deinterleave2 Function: PyTorch Architecture
Architecture documentation for the deinterleave2<double> function template specialization in vec256.h from the PyTorch codebase.
Source Code
aten/src/ATen/cpu/vec/vec256/vec256.h lines 255–277
template <>
std::pair<Vectorized<double>, Vectorized<double>> inline deinterleave2<double>(
    const Vectorized<double>& a,
    const Vectorized<double>& b) {
  // inputs:
  //   a = {a0, b0, a1, b1}
  //   b = {a2, b2, a3, b3}

  // group cols crossing lanes:
  //   a_grouped = {a0, a1, b0, b1}
  //   b_grouped = {a2, a3, b2, b3}
  auto a_grouped = _mm256_permute4x64_pd(a, 0b11011000); // 0, 2, 1, 3
  auto b_grouped = _mm256_permute4x64_pd(b, 0b11011000); // 0, 2, 1, 3

  // swap lanes:
  //   return {a0, a1, a2, a3}
  //          {b0, b1, b2, b3}
  return std::make_pair(
      _mm256_permute2f128_pd(
          a_grouped, b_grouped, 0b0100000), // 0, 2.   4 bits apart
      _mm256_permute2f128_pd(
          a_grouped, b_grouped, 0b0110001)); // 1, 3.   4 bits apart
}
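The two permute steps in the listing above can be traced without AVX2 hardware using a scalar model. The sketch below is illustrative only and is not part of PyTorch: `Vec4`, `permute4x64`, `permute2f128`, and `deinterleave2_model` are hypothetical names, and the `permute2f128` model ignores the zeroing bits (bits 3 and 7) of the immediate, which the original code does not set.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical stand-in for Vectorized<double>: four doubles in one 256-bit register.
using Vec4 = std::array<double, 4>;

// Scalar model of _mm256_permute4x64_pd:
// output element i takes the source element indexed by bits [2i+1 : 2i] of imm.
Vec4 permute4x64(const Vec4& src, unsigned imm) {
  Vec4 out{};
  for (int i = 0; i < 4; ++i) {
    out[i] = src[(imm >> (2 * i)) & 0x3];
  }
  return out;
}

// Scalar model of _mm256_permute2f128_pd (zeroing bits 3 and 7 are ignored).
// Each selector picks one 128-bit half: 0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high.
// Bits [1:0] of imm choose the low half of the result, bits [5:4] the high half.
Vec4 permute2f128(const Vec4& a, const Vec4& b, unsigned imm) {
  auto pick = [&](unsigned sel, int j) -> double {
    const Vec4& src = (sel & 0x2) ? b : a;
    return src[((sel & 0x1) ? 2 : 0) + j];
  };
  const unsigned lo = imm & 0x3;
  const unsigned hi = (imm >> 4) & 0x3;
  return Vec4{pick(lo, 0), pick(lo, 1), pick(hi, 0), pick(hi, 1)};
}

// Mirrors the structure of deinterleave2<double> using the scalar models above.
std::pair<Vec4, Vec4> deinterleave2_model(const Vec4& a, const Vec4& b) {
  // Group within each register: {a0, b0, a1, b1} -> {a0, a1, b0, b1}.
  const Vec4 a_grouped = permute4x64(a, 0b11011000);
  const Vec4 b_grouped = permute4x64(b, 0b11011000);
  // Combine halves across the two registers.
  return std::make_pair(
      permute2f128(a_grouped, b_grouped, 0b00100000),   // a.low,  b.low
      permute2f128(a_grouped, b_grouped, 0b00110001));  // a.high, b.high
}
```

Feeding the model `a = {a0, b0, a1, b1}` and `b = {a2, b2, a3, b3}` yields `{a0, a1, a2, a3}` in the first element of the pair and `{b0, b1, b2, b3}` in the second, matching the comments in the original source.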