transpose_mxn Class — pytorch Architecture

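Architecture documentation for transpose_mxn<float, 8, 8>, a function template specialization (not a class) in vec256_float.h from the PyTorch codebase. It loads an 8x8 tile of floats from src into eight 256-bit vectors, transposes the tile in registers with transpose_block, and stores the result to dst. ld_src and ld_dst are the row strides of the source and destination buffers, measured in elements.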

Source Code

aten/src/ATen/cpu/vec/vec256/vec256_float.h lines 792–825

template <>
inline void transpose_mxn<float, 8, 8>(
    const float* src,
    int64_t ld_src,
    float* dst,
    int64_t ld_dst) {
  // load from src to registers
  at::vec::VectorizedN<float, 8> input;
  // a: a0  a1  a2  a3  a4  a5  a6  a7
  // b: b0  b1  b2  b3  b4  b5  b6  b7
  // c: c0  c1  c2  c3  c4  c5  c6  c7
  // d: d0  d1  d2  d3  d4  d5  d6  d7
  // e: e0  e1  e2  e3  e4  e5  e6  e7
  // f: f0  f1  f2  f3  f4  f5  f6  f7
  // g: g0  g1  g2  g3  g4  g5  g6  g7
  // h: h0  h1  h2  h3  h4  h5  h6  h7
  int i;
#ifndef __msvc_cl__
#pragma unroll
#endif
  for (i = 0; i < 8; i++) {
    input[i] = _mm256_loadu_ps(&src[i * ld_src]);
  }

  transpose_block(input);

  // store from registers to dst
#ifndef __msvc_cl__
#pragma unroll
#endif
  for (i = 0; i < 8; i++) {
    _mm256_storeu_ps(&dst[i * ld_dst], input[i]);
  }
}
