
LordFPL/docker-gitlab 0

GitLab dockerized on top of nginx/mysql stack http://www.damagehead.com/docker-gitlab/

LordFPL/docker-gitlab-ci 0

Dockerfile to build a GitLab CI container image.

LordFPL/docker-openfire 0

Dockerfile to create a Docker container image for Openfire.

LordFPL/elabftw-docker-nosql 0

Install elabftw in a container. You need to link it to a database.

LordFPL/elabimg 0

elabftw in a docker container

LordFPL/gitlabhq 0

Open source software to collaborate on code

LordFPL/gogs 0

Gogs is a painless self-hosted Git service.

LordFPL/haproxy 0

HAProxy related stuff: scripts, configs, etc...

LordFPL/hugo 0

A Fast and Flexible Static Site Generator built with love by spf13 in GoLang

LordFPL/logstash-filters 0

A collection of grok patterns that match applications provided by Adaptive Computing

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

@huang-b, Congratulations, you get 6600 in this PR, and your total score is 6600 in hptc challenge program.

<details> <summary>Details</summary>

Tip : None

Warning: The pull request merged, huang-b got the score. But it seems linked issue not picked.

cc: Mentor @qw4990

</details>

huang-b

comment created time in 2 hours

push event pingcap/tidb

huang-b

commit sha c9288d246c99073ff04304363dc7234d9caa5090

executor: add unit test and benchmark for shuffle merge join (#21360)


push time in 2 hours

PR merged pingcap/tidb

executor: add unit test and benchmark for shuffle merge join rewarded sig/execution status/LGT2 status/can-merge


Issue Number: close #14441

What is changed and how it works?

What's Changed: add unit test and benchmark for shuffle merge join

How it Works: The table below compares the performance of the original merge join with the shuffle version (sort before merge join).

| InnerRows | OuterRows | Inline | Original performance | Performance with 2 workers |
|---|---|---|---|---|
| 300000 | 300000 | false | 1,110,762,991 ns/op | 1,004,792,871 ns/op |
| 300000 | 300000 | true | 613,705,766 ns/op | 453,343,274 ns/op |
| 3000 | 300000 | false | 529,891,578 ns/op | 317,548,501 ns/op |
| 3000 | 300000 | true | 276,733,256 ns/op | 204,575,329 ns/op |
| 30 | 300000 | false | 553,624,831 ns/op | 509,290,244 ns/op |
| 30 | 300000 | true | 287,897,600 ns/op | 241,498,285 ns/op |
| 300000 | 330000 | false | 921,101,314 ns/op | 592,612,987 ns/op |
| 300000 | 330000 | true | 622,737,161 ns/op | 350,011,341 ns/op |

Check List

Tests

  • Unit test
  • Integration test

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • add unit test and benchmark for shuffle merge join
+451 -27

8 comments

2 changed files

huang-b

pr closed time in 2 hours

issue closed pingcap/tidb

Investigate and propose parallel sort-merge join

Description

In this PR(https://github.com/pingcap/tidb/pull/14238), we introduce the shuffle executor. We may use this shuffle executor to implement the parallel sort-merge join by nesting some sort and merge-join executors into a shuffle executor:

ShuffleExec
    - MergeJoinExec
        - SortExec
    - MergeJoinExec
        - SortExec
    ...

Compared with the hash-join we implement now, whose building phase is not parallel, it may utilize more CPU resources to get better performance.
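The nesting described above can be illustrated with a minimal, self-contained sketch. It is not TiDB's executor code: `Row`, `Pair`, `split`, `sortMergeJoin`, and `parallelJoin` are hypothetical names, and the splitter is assumed to be a plain modulo on an integer join key. Each worker plays the role of one `SortExec` + `MergeJoinExec` branch under the shuffle.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Row is a hypothetical stand-in for an executor row: a join key plus a payload.
type Row struct {
	Key int
	Val string
}

// Pair is one joined output row.
type Pair struct {
	Key          int
	Inner, Outer string
}

// split assigns every row to one of n partitions by key, so equal keys
// from both inputs always land in the same partition.
func split(rows []Row, n int) [][]Row {
	parts := make([][]Row, n)
	for _, r := range rows {
		p := ((r.Key % n) + n) % n
		parts[p] = append(parts[p], r)
	}
	return parts
}

// sortMergeJoin sorts both sides by key, then merges them and emits the
// cross product of each run of equal keys.
func sortMergeJoin(inner, outer []Row) []Pair {
	sort.SliceStable(inner, func(i, j int) bool { return inner[i].Key < inner[j].Key })
	sort.SliceStable(outer, func(i, j int) bool { return outer[i].Key < outer[j].Key })
	var out []Pair
	i, j := 0, 0
	for i < len(inner) && j < len(outer) {
		switch {
		case inner[i].Key < outer[j].Key:
			i++
		case inner[i].Key > outer[j].Key:
			j++
		default:
			k := inner[i].Key
			jEnd := j
			for jEnd < len(outer) && outer[jEnd].Key == k {
				jEnd++
			}
			for ; i < len(inner) && inner[i].Key == k; i++ {
				for x := j; x < jEnd; x++ {
					out = append(out, Pair{k, inner[i].Val, outer[x].Val})
				}
			}
			j = jEnd
		}
	}
	return out
}

// parallelJoin runs one sort + merge join per partition concurrently
// and concatenates the per-worker results.
func parallelJoin(inner, outer []Row, workers int) []Pair {
	in, ou := split(inner, workers), split(outer, workers)
	results := make([][]Pair, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			results[w] = sortMergeJoin(in[w], ou[w])
		}(w)
	}
	wg.Wait()
	var out []Pair
	for _, r := range results {
		out = append(out, r...)
	}
	return out
}

func main() {
	inner := []Row{{2, "a"}, {1, "b"}, {2, "c"}}
	outer := []Row{{2, "x"}, {3, "y"}, {1, "z"}}
	fmt.Println(len(parallelJoin(inner, outer, 2))) // prints 3
}
```

Because both the sort and the merge run inside each worker, the sort phase is parallelized as well, which is the property the issue contrasts with the non-parallel build phase of the hash join.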

  • [x] Modify the original shuffle to let it support multiple datasources and pass all existing tests
  • [x] Introduce a new variable to control concurrency of merge_join
  • [x] Support shuffle + merge_join
  • [ ] Add some correctness tests and benchmarks for shuffle + merge_join

Score

6600

Mentor

@qw4990 (Slack ID: Zhang Yuanjia)

Recommend Skills

  • Golang

closed time in 2 hours

qw4990

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

/run-all-tests

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

/merge

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

Reward success.

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

/reward 6600

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

The linked issue's balance is not enough, current balance is 0.

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

/reward 6600

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

This PR does not have any linked issue.

<details> <summary>Details</summary>

Tip : You need to ensure that the link description follows the following template:

Issue Number: #xxx

Issue Number: close #xxx

About issue link, there is a trace issue.

Warning: None </details>

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

/reward 6600

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add benchmark for partitionRangeSplitter

@JinLingChristopher, Congratulations, you get 6600 in this PR, and your total score is 6600 in hptc challenge program.

JinLingChristopher

comment created time in 2 hours

push event pingcap/tidb

Ling Jin

commit sha b87849868c2ad8bd536750049628df6a528782fb

executor: add benchmark for partitionRangeSplitter (#21363)


push time in 2 hours

PR merged pingcap/tidb

executor: add benchmark for partitionRangeSplitter rewarded sig/execution status/LGT2


What problem does this PR solve?

Issue Number: close #20651

Problem Summary:

This is the 4th PR for the issue "Parallelize stream aggregation executor by using shuffle executor". It adds a simple benchmark to test the performance of partitionRangeSplitter.

The result shows that it is faster than partitionHashSplitter for a sorted data source, but the overall performance of shuffled stream aggregation is worse than the sequential version.
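To illustrate why a range-based splitter can be cheaper than a hash-based one on sorted input, here is a minimal sketch. It is not the code of partitionRangeSplitter or partitionHashSplitter: `hashSplit` and `rangeSplit` are hypothetical helpers, keys are assumed to be plain integers, and the round-robin assignment of key groups is an assumption made for this example.

```go
package main

import "fmt"

// hashSplit computes a worker index for every row from its key.
// A plain modulo stands in for a real hash function in this sketch.
func hashSplit(keys []int, workers int) []int {
	out := make([]int, len(keys))
	for i, k := range keys {
		out[i] = ((k % workers) + workers) % workers
	}
	return out
}

// rangeSplit assumes the keys are already sorted. It only compares each
// key with the previous one and moves on to the next worker when the key
// changes, so every run of equal keys stays on a single worker.
func rangeSplit(sortedKeys []int, workers int) []int {
	out := make([]int, len(sortedKeys))
	w := 0
	for i, k := range sortedKeys {
		if i > 0 && k != sortedKeys[i-1] {
			w = (w + 1) % workers
		}
		out[i] = w
	}
	return out
}

func main() {
	keys := []int{1, 1, 2, 2, 2, 3, 4, 4}
	fmt.Println(hashSplit(keys, 2))  // [1 1 0 0 0 1 0 0]
	fmt.Println(rangeSplit(keys, 2)) // [0 0 1 1 1 0 1 1]
}
```

Both variants keep rows with the same key on the same worker, which a per-worker stream aggregation needs. The range variant replaces the per-row hash with a comparison against the previous key, which only works when the input arrives sorted.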


What's Changed:

  • executor: add simple benchmark for stream aggregation, use range splitter as default.

Benchmarks:

| rows | no concurrency | concurrency 2 | concurrency 4 |
|---|---|---|---|
| 10000 | 4312926 ns/op | 3147944 ns/op | 2511416 ns/op |
| 100000 | 46287871 ns/op | 30565550 ns/op | 19906485 ns/op |
| 1000000 | 688690770 ns/op | 402929275 ns/op | 230093672 ns/op |
| 10000000 | 12842102225 ns/op | 7839787310 ns/op | 3884926789 ns/op |

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • No release note
+38 -23

5 comments

3 changed files

JinLingChristopher

pr closed time in 2 hours

issue closed pingcap/tidb

Parallelize stream aggregation executors by using shuffle executors

Description

In this PR(https://github.com/pingcap/tidb/pull/14238), we introduce the shuffle executor. Like https://github.com/pingcap/tidb/issues/14441, we may use this shuffle executor to implement parallel stream aggregation by nesting some sort and stream-agg executors into a shuffle executor:

ShuffleExec
    - StreamAgg
        - Sort
    - StreamAgg
        - Sort
    ...

It can utilize more CPU resources to get better performance.
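As a rough illustration of the plan shape above, the sketch below partitions rows by key, lets each worker sort its partition and run a streaming SUM over it, and then concatenates the results. It is an assumption-laden toy, not TiDB's implementation: `Row`, `streamSum`, and `parallelStreamSum` are hypothetical names, keys are integers, and the final sort exists only to make the output order deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Row is a hypothetical (group key, value) pair.
type Row struct {
	Key int
	Val int
}

// streamSum expects rows sorted by Key and emits one (Key, sum) row per
// run of equal keys, keeping only the current group in memory.
func streamSum(sorted []Row) []Row {
	var out []Row
	for _, r := range sorted {
		if n := len(out); n > 0 && out[n-1].Key == r.Key {
			out[n-1].Val += r.Val
		} else {
			out = append(out, r)
		}
	}
	return out
}

// parallelStreamSum mimics Shuffle -> (Sort -> StreamAgg) per worker.
func parallelStreamSum(rows []Row, workers int) []Row {
	// Shuffle: equal keys always go to the same partition.
	parts := make([][]Row, workers)
	for _, r := range rows {
		p := ((r.Key % workers) + workers) % workers
		parts[p] = append(parts[p], r)
	}
	results := make([][]Row, workers)
	var wg sync.WaitGroup
	for w := range parts {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			part := parts[w]
			sort.Slice(part, func(i, j int) bool { return part[i].Key < part[j].Key })
			results[w] = streamSum(part)
		}(w)
	}
	wg.Wait()
	var out []Row
	for _, r := range results {
		out = append(out, r...)
	}
	// Sorted here only so the example output is deterministic.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	rows := []Row{{1, 10}, {2, 5}, {1, 7}, {3, 1}, {2, 2}}
	fmt.Println(parallelStreamSum(rows, 2)) // [{1 17} {2 7} {3 1}]
}
```

Because no group spans two partitions, the per-worker aggregates are already final and need no second merge step.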

  • [x] Introduce a new variable to control concurrency
  • [x] Support shuffle + stream_agg and add some tests and benchmarks simply
  • [ ] Add more tests and benchmarks about shuffle + stream_agg
  • [x] Support range_spliter

Score

6600

Mentor

@qw4990 (Slack ID: Zhang Yuanjia)

Recommend Skills

  • Golang

closed time in 2 hours

qw4990

Pull request review comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

 type mergeJoinTestCase struct {
 	childrenUsedSchema [][]bool
 }
+func prepare4MergeJoinWithSort(tc *mergeJoinTestCase, innerDS, outerDS *mockDataSource) *MergeJoinExec {

Of course, already merged.

huang-b

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add benchmark for partitionRangeSplitter

Reward success.

JinLingChristopher

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add benchmark for partitionRangeSplitter

/reward 6600

JinLingChristopher

comment created time in 2 hours

issue comment pingcap/tidb

Parallelize stream aggregation executors by using shuffle executors

Pick up success.

qw4990

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add benchmark for partitionRangeSplitter

This PR's linked issue is not picked.

JinLingChristopher

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add benchmark for partitionRangeSplitter

/reward 6600

JinLingChristopher

comment created time in 2 hours

pull request comment pingcap/tidb

executor: add benchmark for partitionRangeSplitter

/run-all-tests

JinLingChristopher

comment created time in 2 hours

pull request comment pingcap/tidb

ddl: fix duplicate entry message report by add index (#21241)

/run-all-tests

ti-srebot

comment created time in 2 hours

pull request comment pingcap/tidb

executor: Add the HashAggExec runtime information (#20577)

@qw4990, @XuHuaiyu, @crazycs520, PTAL.

ti-srebot

comment created time in 3 hours

Pull request review comment pingcap/tidb

executor: add unit test and benchmark for shuffle merge join

 type mergeJoinTestCase struct {
 	childrenUsedSchema [][]bool
 }
+func prepare4MergeJoinWithSort(tc *mergeJoinTestCase, innerDS, outerDS *mockDataSource) *MergeJoinExec {

Can we merge this function with prepare4ShuffleMergeJoinWithSort?

huang-b

comment created time in 3 hours

started ckardaris/ucollage

started time in 4 hours

PR opened pingcap/tidb

executor: bench for range spliter. WIP


What problem does this PR solve?

Issue Number: close #xxx <!-- REMOVE this line if no issue to close -->

Problem Summary:

What is changed and how it works?

Proposal: xxx <!-- REMOVE this line if not applicable -->

What's Changed:

How it Works:

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • No release note
+22 -7

0 comments

3 changed files

pr created time in 6 hours

issue comment pingcap/tidb

Leverage Apache Arrow to boost performance and embrace warehouse/big data ecosystems

@ilovesoup @innerr can you explain what you mean by encoding? I would accept that for a format like Parquet, but in Arrow the data is quite "plain" in memory, and the only thing that roughly counts as some sort of encoding is the validity bitmap. That, though, is extremely optimized, at least in the C++ implementation.
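For readers unfamiliar with the validity bitmap mentioned here, the following is a minimal sketch of the idea: values sit in a plain contiguous array, and a separate bit-packed array records which slots are non-null. It does not use the Arrow library; `Int64Column` and its methods are hypothetical names for this example. The bit layout (bit `i` stored in byte `i/8`, least-significant bit first, 1 meaning valid) follows the Arrow columnar format.

```go
package main

import "fmt"

// Int64Column is a hypothetical, simplified column: plain values plus a
// validity bitmap with one bit per slot (1 = valid, 0 = null).
type Int64Column struct {
	values   []int64
	validity []byte
}

// NewInt64Column allocates n slots, all initially null.
func NewInt64Column(n int) *Int64Column {
	return &Int64Column{
		values:   make([]int64, n),
		validity: make([]byte, (n+7)/8),
	}
}

// Set stores v in slot i and marks the slot as valid.
func (c *Int64Column) Set(i int, v int64) {
	c.values[i] = v
	c.validity[i/8] |= 1 << (uint(i) % 8)
}

// SetNull clears the validity bit of slot i.
func (c *Int64Column) SetNull(i int) {
	c.validity[i/8] &^= 1 << (uint(i) % 8)
}

// IsValid reports whether slot i holds a non-null value.
func (c *Int64Column) IsValid(i int) bool {
	return c.validity[i/8]&(1<<(uint(i)%8)) != 0
}

func main() {
	col := NewInt64Column(10)
	col.Set(0, 7)
	col.Set(9, 3)
	for i := 0; i < 10; i++ {
		fmt.Printf("slot %d valid=%v\n", i, col.IsValid(i))
	}
}
```

The values themselves are stored without any transformation, which is the sense in which the comment calls the in-memory data "plain".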

zz-jason

comment created time in 6 hours
