cpu_gather_expanded_index_kernel Function — PyTorch Architecture
Architecture documentation for the cpu_gather_expanded_index_kernel function template in ScatterGatherKernel.cpp from the PyTorch codebase.
Entity Profile
Source Code
aten/src/ATen/native/cpu/ScatterGatherKernel.cpp lines 817–854
template <typename scalar_t>
void cpu_gather_expanded_index_kernel(const Tensor& result, const Tensor& _index, const Tensor& self) {
  Tensor index = _index.to(ScalarType::Long);
  const int64_t* index_data = index.const_data_ptr<int64_t>();
  scalar_t* result_data = result.data_ptr<scalar_t>();
  const scalar_t* self_data = self.const_data_ptr<scalar_t>();

  const int64_t M = ensure_nonempty_size(result, 0);
  const int64_t N = ensure_nonempty_size(self, 0);
  const int64_t K = index.numel() / M;

  const int64_t index_upper_bound = N;

  using Vec = vec::Vectorized<scalar_t>;
  int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / K);
  at::parallel_for(0, M, grain_size, [&](int64_t begin, int64_t end) {
    for (const auto m : c10::irange(begin, end)) {
      scalar_t* result_ptr = result_data + m * K;
      int64_t index = index_data[m];
      TORCH_CHECK(index >= 0 && index < index_upper_bound,
                  "index ", index,
                  " is out of bounds for dimension ", 0,
                  " with size ", index_upper_bound);
      const scalar_t* self_ptr = self_data + index * K;
      int64_t d = 0;
      for (; d < K - (K % Vec::size()); d += Vec::size()) {
        Vec out_vec = Vec::loadu(self_ptr + d);
        out_vec.store(result_ptr + d);
      }
#if !defined(_MSC_VER) && !defined(COMPILING_FOR_MIN_SIZE)
# pragma unroll
#endif
      for (; d < K; d++) {
        result_ptr[d] = self_ptr[d];
      }
    }
  });
}
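The kernel above treats the problem as a row gather: for each of the M output rows, it reads one index, bounds-checks it against N (the number of rows in self), and copies K contiguous elements from the selected self row, using vectorized loads/stores for the main body and a scalar loop for the tail. The following is a minimal, dependency-free sketch of that same access pattern in plain C++; the function name `gather_rows` and the use of `std::vector` are illustrative assumptions, not part of the PyTorch code, and the SIMD main loop is replaced by a simple scalar copy.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical scalar sketch of the row-gather pattern above:
// result has M rows of K elements; row m is copied from self row index[m].
// self is an N x K matrix stored row-major in a flat vector.
std::vector<float> gather_rows(const std::vector<float>& self,
                               const std::vector<int64_t>& index,
                               int64_t N, int64_t K) {
  const int64_t M = static_cast<int64_t>(index.size());
  std::vector<float> result(static_cast<size_t>(M * K));
  for (int64_t m = 0; m < M; ++m) {
    const int64_t idx = index[m];
    // Mirrors the TORCH_CHECK bounds test in the kernel.
    if (idx < 0 || idx >= N) {
      throw std::out_of_range("index is out of bounds for dimension 0");
    }
    // Contiguous row copy; the real kernel does this with
    // vec::Vectorized loads/stores plus a scalar tail loop.
    for (int64_t d = 0; d < K; ++d) {
      result[m * K + d] = self[idx * K + d];
    }
  }
  return result;
}
```

Because each output row depends only on its own index entry, the rows are independent, which is what lets the real kernel split the [0, M) range across threads with `at::parallel_for`; the grain size `GRAIN_SIZE / K` keeps the per-task work roughly constant regardless of row width.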