binary_kernel_reduce_lastdim Function - pytorch Architecture
Architecture documentation for the binary_kernel_reduce_lastdim function template in Reduce.h from the PyTorch codebase.
Source Code
aten/src/ATen/native/cpu/Reduce.h lines 290–308
template <typename reduce_func_t>
void binary_kernel_reduce_lastdim(TensorIteratorBase& iter, reduce_func_t reduce_op) {
  auto shape = iter.shape();
  int64_t dim_size = shape[0];
  int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / dim_size);
  TensorIterator sub_iter(iter);
  // create sub iterator to parallel on all non-reduce-dims
  sub_iter.narrow(0, 0, 1);
  auto loop = [&](char** data, const int64_t* strides, int64_t size) {
    char* out = data[0];
    char* in = data[1];
    for (int64_t i = 0; i < size; ++i) {
      reduce_op(out, in, dim_size);
      out += strides[0];
      in += strides[1];
    }
  };
  sub_iter.for_each(loop, grain_size);
}
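The listing above depends on ATen types, so the pattern can be hard to try in isolation. The following is a minimal standalone sketch of the same idea, not the PyTorch implementation: it walks `size` output slots, calls a `reduce_op(out, in, dim_size)` once per slot, and advances both pointers by byte strides. The names `kGrainSize`, `rows_per_chunk`, `reduce_rows`, `sum_row`, and `row_sums` are hypothetical, the grain-size constant is an assumed value, and the loop runs sequentially where the real code hands the loop to `sub_iter.for_each` for parallel execution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for at::internal::GRAIN_SIZE; the value is an assumption.
constexpr int64_t kGrainSize = 32768;

// Mirrors the grain_size expression: each work item covers dim_size elements,
// so the number of items per chunk shrinks as the reduced dimension grows.
inline int64_t rows_per_chunk(int64_t dim_size) {
  assert(dim_size > 0);
  return std::max<int64_t>(1, kGrainSize / dim_size);
}

// Same shape as the `loop` lambda: data[0] is the output, data[1] the input,
// and strides are measured in bytes.
template <typename reduce_func_t>
void reduce_rows(char** data, const int64_t* strides, int64_t size,
                 int64_t dim_size, reduce_func_t reduce_op) {
  char* out = data[0];
  char* in = data[1];
  for (int64_t i = 0; i < size; ++i) {
    reduce_op(out, in, dim_size);
    out += strides[0];
    in += strides[1];
  }
}

// Example reduce_op: sums n contiguous floats into a single output float.
inline void sum_row(char* out, char* in, int64_t n) {
  const float* src = reinterpret_cast<const float*>(in);
  float acc = 0.0f;
  for (int64_t j = 0; j < n; ++j) {
    acc += src[j];
  }
  *reinterpret_cast<float*>(out) = acc;
}

// Sums each row of a contiguous rows x cols matrix (sequential, no threading).
inline std::vector<float> row_sums(std::vector<float> input,
                                   int64_t rows, int64_t cols) {
  assert(static_cast<int64_t>(input.size()) == rows * cols);
  std::vector<float> output(static_cast<size_t>(rows), 0.0f);
  char* data[2] = {reinterpret_cast<char*>(output.data()),
                   reinterpret_cast<char*>(input.data())};
  const int64_t elem = static_cast<int64_t>(sizeof(float));
  // Output advances one element per row; input advances one full row.
  const int64_t strides[2] = {elem, cols * elem};
  reduce_rows(data, strides, rows, cols, sum_row);
  return output;
}
```

Calling `row_sums` on a 2 x 3 matrix reduces each row of three contiguous values into one output element, which is the role `reduce_op` plays for the innermost dimension in the original function.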