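transpose_mxn Function Template - pytorch Architecture
Architecture documentation for the transpose_mxn<float, 8, 8> function template specialization in vec256_float.h from the pytorch codebase. The specialization loads eight rows of eight floats from src, transposes them in registers via transpose_block, and stores the eight result rows to dst.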
Source Code
aten/src/ATen/cpu/vec/vec256/vec256_float.h lines 792–825
template <>
inline void transpose_mxn<float, 8, 8>(
    const float* src,
    int64_t ld_src,
    float* dst,
    int64_t ld_dst) {
  // load from src to registers
  at::vec::VectorizedN<float, 8> input;
  // a: a0  a1  a2  a3  a4  a5  a6  a7
  // b: b0  b1  b2  b3  b4  b5  b6  b7
  // c: c0  c1  c2  c3  c4  c5  c6  c7
  // d: d0  d1  d2  d3  d4  d5  d6  d7
  // e: e0  e1  e2  e3  e4  e5  e6  e7
  // f: f0  f1  f2  f3  f4  f5  f6  f7
  // g: g0  g1  g2  g3  g4  g5  g6  g7
  // h: h0  h1  h2  h3  h4  h5  h6  h7
  int i;
#ifndef __msvc_cl__
#pragma unroll
#endif
  for (i = 0; i < 8; i++) {
    input[i] = _mm256_loadu_ps(&src[i * ld_src]);
  }

  transpose_block(input);

  // store from registers to dst
#ifndef __msvc_cl__
#pragma unroll
#endif
  for (i = 0; i < 8; i++) {
    _mm256_storeu_ps(&dst[i * ld_dst], input[i]);
  }
}
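
The load and store loops above imply simple indexing semantics: row i of the source starts at src[i * ld_src], and row i of the result starts at dst[i * ld_dst]. The following is a minimal scalar sketch of that behavior, assuming transpose_block performs a plain in-register 8x8 transpose (its body is not shown in the listing). The name transpose_8x8_reference is hypothetical and is not part of ATen; a function like this may be handy as a baseline when checking the vectorized path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference (not an ATen function).
// Reads an 8x8 block whose rows are ld_src floats apart and writes its
// transpose into a block whose rows are ld_dst floats apart:
//   dst[j * ld_dst + i] = src[i * ld_src + j]   for 0 <= i, j < 8
inline void transpose_8x8_reference(
    const float* src,
    std::int64_t ld_src,
    float* dst,
    std::int64_t ld_dst) {
  assert(src != nullptr && dst != nullptr);
  for (int i = 0; i < 8; ++i) {
    for (int j = 0; j < 8; ++j) {
      // Element (i, j) of the source becomes element (j, i) of the result.
      dst[j * ld_dst + i] = src[i * ld_src + j];
    }
  }
}
```

Because the leading dimensions are passed separately, the 8x8 block may be a tile inside a larger row-major matrix: only eight floats per row are read or written, and any elements beyond column 7 in each row are left untouched.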