Skip to content

Conversation

@xin-zhang2
Copy link
Member

No description provided.

@xin-zhang2 xin-zhang2 requested a review from yingsu00 January 27, 2026 14:22
@xin-zhang2 xin-zhang2 force-pushed the PartitionedOutput1.0-benchmark branch from 18b20cf to 216a2fa Compare January 27, 2026 16:03

void run(const RowVectorPtr& vector, int32_t numPartitions) {
folly::BenchmarkSuspender suspender;
// roundRobinPartitionFunction(vector, numPartitions, partitions_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the commented out line, or make the distribution a dimension as well

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a non-uniform distribution to test the skew case.

const auto vectorCopy = std::static_pointer_cast<RowVector>(
BaseVector::copy(*vector, bm->getPool()));
suspender.dismiss();
bm->run(vectorCopy, 10000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10000 partitions?
#partitions also need to be one test dimension. The value of intcan be {4, 8, 16, 32, 64, 128, 256, 512, 1024}

} // namespace

class PartitionedVectorBenchmark : public VectorTestBase {
protected:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the field members to the end of the class.

return std::dynamic_pointer_cast<RowVector>(batch);
}

void prepareBuffers(vector_size_t numRows, int32_t numPartitions) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge this function with calculatePartitionOffsets(). They are always called consecutively, and the reader has to jump into prepareBuffers() from the main logic to check what it does, because it's rather a very generic description. Merging it into calculatePartitionOffsets() would be easier for the reader, because "calculatePartitionOffsets" is self-explanatory.

rawEndPartitionOffsets[i] = offset;
}
endPartitionOffsets_->setSize(numPartitions * sizeof(vector_size_t));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did you set beginPartitionOffsets_?

}

void calculatePartitionOffsets(int32_t numPartitions) {
std::vector partitionRowCounts(numPartitions, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local partitionRowCounts vector is not needed. We can do the calculation in place in endPartitionOffsets_.

vector_size_t offset = 0;
for (int32_t i = 0; i < numPartitions; ++i) {
offset += partitionRowCounts[i];
rawEndPartitionOffsets[i] = offset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge line 127 and 128 into one line.

types.reserve(numColumns);

for (int i = 0; i < numColumns; ++i) {
types.push_back(typeSelection[i % typeSelection.size()]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes repeating patterns that optimize caching.
Better shuffle types once.

BufferPtr topRowOffsets_;
BufferPtr beginPartitionOffsets_;
BufferPtr endPartitionOffsets_;
BufferPtr swappingBuffer_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never allocated or sized before use.

folly::runBenchmarks();
bm.reset();
return 0;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add the run output as a comment at the end of the file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants