cpu_gather_expanded_index_kernel Function — PyTorch Architecture
Architecture documentation for the cpu_gather_expanded_index_kernel function template in ScatterGatherKernel.cpp from the PyTorch codebase.
Entity Profile
Source Code
aten/src/ATen/native/cpu/ScatterGatherKernel.cpp lines 817–854
template <typename scalar_t>
void cpu_gather_expanded_index_kernel(const Tensor& result, const Tensor& _index, const Tensor& self) {
  Tensor index = _index.to(ScalarType::Long);
  const int64_t* index_data = index.const_data_ptr<int64_t>();
  scalar_t* result_data = result.data_ptr<scalar_t>();
  const scalar_t* self_data = self.const_data_ptr<scalar_t>();

  const int64_t M = ensure_nonempty_size(result, 0);
  const int64_t N = ensure_nonempty_size(self, 0);
  const int64_t K = index.numel() / M;

  const int64_t index_upper_bound = N;

  using Vec = vec::Vectorized<scalar_t>;
  int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / K);
  at::parallel_for(0, M, grain_size, [&](int64_t begin, int64_t end) {
    for (const auto m : c10::irange(begin, end)) {
      scalar_t* result_ptr = result_data + m * K;
      int64_t index = index_data[m];
      TORCH_CHECK(index >= 0 && index < index_upper_bound,
                  "index ", index,
                  " is out of bounds for dimension ", 0,
                  " with size ", index_upper_bound);
      const scalar_t* self_ptr = self_data + index * K;
      int64_t d = 0;
      for (; d < K - (K % Vec::size()); d += Vec::size()) {
        Vec out_vec = Vec::loadu(self_ptr + d);
        out_vec.store(result_ptr + d);
      }
#if !defined(_MSC_VER) && !defined(COMPILING_FOR_MIN_SIZE)
# pragma unroll
#endif
      for (; d < K; d++) {
        result_ptr[d] = self_ptr[d];
      }
    }
  });
}
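The kernel above treats the problem as a row gather: for each of the M output rows, it reads one index, bounds-checks it against N (the number of rows in self), and copies K contiguous elements from the selected self row, using vectorized loads/stores for the main body and a scalar loop for the tail. The following is a minimal, dependency-free sketch of that same access pattern in plain C++; the function name `gather_rows` and the use of `std::vector` are illustrative assumptions, not part of the PyTorch code, and the SIMD main loop is replaced by a simple scalar copy.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical scalar sketch of the row-gather pattern above:
// result has M rows of K elements; row m is copied from self row index[m].
// self is an N x K matrix stored row-major in a flat vector.
std::vector<float> gather_rows(const std::vector<float>& self,
                               const std::vector<int64_t>& index,
                               int64_t N, int64_t K) {
  const int64_t M = static_cast<int64_t>(index.size());
  std::vector<float> result(static_cast<size_t>(M * K));
  for (int64_t m = 0; m < M; ++m) {
    const int64_t idx = index[m];
    // Mirrors the TORCH_CHECK bounds test in the kernel.
    if (idx < 0 || idx >= N) {
      throw std::out_of_range("index is out of bounds for dimension 0");
    }
    // Contiguous row copy; the real kernel does this with
    // vec::Vectorized loads/stores plus a scalar tail loop.
    for (int64_t d = 0; d < K; ++d) {
      result[m * K + d] = self[idx * K + d];
    }
  }
  return result;
}
```

Because each output row depends only on its own index entry, the rows are independent, which is what lets the real kernel split the [0, M) range across threads with `at::parallel_for`; the grain size `GRAIN_SIZE / K` keeps the per-task work roughly constant regardless of row width.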