
azat-archive/blog 9

Simple blog using Symfony 2.0 DEV

azat/dot_files 4

vim/gdb/tmux/bash/readline

azat/libevent 3

UNSTABLE FORK

azat-archive/akLib 2

akLib - Small PHP lib. DEPRECATED (OLD CRAP).

azat-archive/bonnie 1

Fork of http://www.coker.com.au/bonnie++/experimental/. PATCHED.

azat-archive/hadoop-io-sequence-reader 1

Apache Hadoop's SequenceFile Wet Reader in C++. Don't use it for "production" purposes.

azat/clang 0

Fork of clang with static-analyzer for libevent patches

azat/ClickHouse 0

UNSTABLE FORK

Pull request comment ClickHouse/ClickHouse

Add round-robin support for clickhouse-benchmark

OK. But why is it needed?

So latency and other metrics will be accounted for in the summary, which may be useful for benchmarking the throughput of the cluster.

azat

comment created time in 13 hours

Pull request review comment ClickHouse/ClickHouse

Experiment with sharing file descriptors

+#pragma once
+
+#include <map>
+#include <mutex>
+
+#include <Core/Types.h>
+#include <Common/ProfileEvents.h>
+#include <IO/OpenedFile.h>
+
+
+namespace ProfileEvents
+{
+    extern const Event OpenedFileCacheHits;
+    extern const Event OpenedFileCacheMisses;
+}
+
+namespace DB
+{
+
+
+/** Cache of opened files for reading.
+  * It allows to share file descriptors when doing reading with 'pread' syscalls on readonly files.
+  * Note: open/close of files is very cheap on Linux and we should not bother doing it 10 000 times a second.
+  * (This may not be the case on Windows with WSL. This is also not the case if strace is active. Neither when some eBPF is loaded).
+  * But sometimes we may end up opening one file multiple times, that increases chance exhausting opened files limit.
+  */
+class OpenedFileCache
+{
+private:
+    using Key = std::pair<std::string /* path */, int /* flags */>;
+
+    using OpenedFileWeakPtr = std::weak_ptr<OpenedFile>;
+    using Files = std::map<Key, OpenedFileWeakPtr>;
+
+    Files files;
+    std::mutex mutex;
+
+public:
+    using OpenedFilePtr = std::shared_ptr<OpenedFile>;
+
+    OpenedFilePtr get(const std::string & path, int flags)
+    {
+        Key key(path, flags);
+
+        std::lock_guard lock(mutex);
+
+        auto [it, inserted] = files.emplace(key, OpenedFilePtr{});
+        if (!inserted)
+            if (auto res = it->second.lock())
+                return res;
+
+        OpenedFilePtr res
+        {
+            new OpenedFile(path, flags),
+            [key, this](auto ptr)
+            {
+                {
+                    std::lock_guard another_lock(mutex);
+                    files.erase(key);
+                }
+                delete ptr;
+            }
+        };

@alexey-milovidov Thanks for pointing that out, I missed that bit (I was expecting a metric for the cache, not for the OpenedFile, since if the latter is later used somewhere else, you will not be able to know the cache size at runtime).

alexey-milovidov

comment created time in 13 hours
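The metric being discussed in this thread could be kept at the cache level. A minimal sketch of such hit/miss accounting, with plain std::atomic counters standing in for ClickHouse's ProfileEvents (the CacheCounters name is illustrative):

```cpp
#include <atomic>

// Illustrative stand-in for ProfileEvents::OpenedFileCacheHits /
// OpenedFileCacheMisses: two counters bumped on every cache lookup.
struct CacheCounters
{
    std::atomic<unsigned long> hits{0};
    std::atomic<unsigned long> misses{0};

    // 'found' is whether the weak_ptr lock on the cached entry succeeded.
    void record(bool found)
    {
        if (found)
            ++hits;
        else
            ++misses;
    }
};
```

Counting at the cache (rather than on the file object) keeps the hit ratio observable even if OpenedFile later gains other users.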

PullRequestReviewEvent

Pull request review comment ClickHouse/ClickHouse

Experiment with sharing file descriptors

+#pragma once
+
+#include <map>
+#include <mutex>
+
+#include <Core/Types.h>
+#include <Common/ProfileEvents.h>
+#include <IO/OpenedFile.h>
+
+
+namespace ProfileEvents
+{
+    extern const Event OpenedFileCacheHits;
+    extern const Event OpenedFileCacheMisses;
+}
+
+namespace DB
+{
+
+
+/** Cache of opened files for reading.
+  * It allows to share file descriptors when doing reading with 'pread' syscalls on readonly files.
+  * Note: open/close of files is very cheap on Linux and we should not bother doing it 10 000 times a second.
+  * (This may not be the case on Windows with WSL. This is also not the case if strace is active. Neither when some eBPF is loaded).
+  * But sometimes we may end up opening one file multiple times, that increases chance exhausting opened files limit.

It will help on some systems where increasing the files limit is harder, e.g. on developers' Mac OS machines.

Why is it harder there?

I think it will be beneficial, see the example in #25994.

Well, to me the complexity does not look like it is worth it.

It will also help to make shared pool for asynchronous reads.

Agree, although I would make it only for asynchronous reads then.

alexey-milovidov

comment created time in 13 hours

PullRequestReviewEvent

Pull request comment ClickHouse/ClickHouse

Add round-robin support for clickhouse-benchmark

What's the purpose of it?

This option will just change the output: instead of differentiating hosts from each other, statistics will be accounted for all hosts in summary, i.e.

before:

127.1:9000, queries 1, QPS: 42.519, RPS: 42.519, MiB/s: 0.000, result RPS: 42.519, result MiB/s: 0.000.
127.2:9000, queries 1, QPS: 24.075, RPS: 24.075, MiB/s: 0.000, result RPS: 24.075, result MiB/s: 0.000.

after:

127.1:9000,127.2:9000, queries 2, QPS: 48.036, RPS: 48.036, MiB/s: 0.000, result RPS: 48.036, result MiB/s: 0.000.

So latency and other metrics will be accounted for in the summary, which may be useful for benchmarking the cluster.

AFAIU the initial mode was added for perf tests.

Note that hosts are queried sequentially (while one host is queried others are not).

Yes, but with --concurrency you can run multiple workers.

azat

comment created time in 13 hours

Pull request review comment ClickHouse/ClickHouse

Add round-robin support for clickhouse-benchmark

 class Benchmark : public Poco::Util::Application
                 for (const auto & conn : connections)
                 {
                     if (!connection_description.empty())
-                        connection_description += ",";
+                        connection_description += ", ";

This will make it impossible to distinguish the list of hosts from the other columns, since the other columns are separated by ", " too, i.e.:

Before:

127.1:9000,127.2:9000, queries 2, QPS: 48.036, RPS: 48.036, MiB/s: 0.000, result RPS: 48.036, result MiB/s: 0.000.

After:

127.1:9000, 127.2:9000, queries 2, QPS: 48.036, RPS: 48.036, MiB/s: 0.000, result RPS: 48.036, result MiB/s: 0.000.

azat

comment created time in 13 hours

PullRequestReviewEvent

delete branch azat/ClickHouse

delete branch : bench-hang-on-EMFILE-fix

delete time in 14 hours

delete branch azat/ClickHouse

delete branch : part_log-fix-event_time_microseconds

delete time in 14 hours

Pull request review comment ClickHouse/ClickHouse

Connection pool factory.

+#include <Client/ConnectionPool.h>
+
+#include <boost/functional/hash.hpp>
+
+namespace DB
+{
+
+ConnectionPoolPtr ConnectionPoolFactory::get(
+    unsigned max_connections,
+    String host,
+    UInt16 port,
+    String default_database,
+    String user,
+    String password,
+    String cluster,
+    String cluster_secret,
+    String client_name,
+    Protocol::Compression compression,
+    Protocol::Secure secure,
+    Int64 priority)
+{
+    Key key{
+        max_connections, host, port, default_database, user, password, cluster, cluster_secret, client_name, compression, secure, priority};
+
+    std::unique_lock lock(mutex);
+    auto it = pool.find(key);
+    ConnectionPoolPtr ret;
+    if (it != pool.end())
+        ret = it->second.lock();
+
+    if (!ret)
+    {
+        ret = std::make_shared<ConnectionPool>(
+            max_connections,
+            host,
+            port,
+            default_database,
+            user,
+            password,
+            cluster,
+            cluster_secret,
+            client_name,
+            compression,
+            secure,
+            priority);
+        if (it == pool.end())

Seems that a dangling ConnectionPoolWeakPtr will be left in the pool, since you don't use a custom deleter that would clean it up (not sure that it is a problem here, but it looks better to avoid such things).

amosbird

comment created time in 21 hours
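The concern above can be reproduced in isolation: without a custom deleter, a weak_ptr entry stays in the map after its last shared_ptr owner is gone. A small sketch (the Pool alias and count_dangling helper are hypothetical; int stands in for a connection pool object):

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// A map of weak_ptr entries, as in the reviewed factory; int stands in
// for ConnectionPool.
using Pool = std::map<std::string, std::weak_ptr<int>>;

// Counts entries whose object has already been destroyed. Without a
// custom deleter (or a periodic sweep), these entries dangle forever.
std::size_t count_dangling(const Pool & pool)
{
    std::size_t n = 0;
    for (const auto & entry : pool)
        if (entry.second.expired())
            ++n;
    return n;
}
```

Each distinct key can leave at most one stale entry, which is why this may be tolerable in practice, as the comment notes.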

PullRequestReviewEvent

Pull request review comment ClickHouse/ClickHouse

Experiment with sharing file descriptors

+#pragma once
+
+#include <map>
+#include <mutex>
+
+#include <Core/Types.h>
+#include <Common/ProfileEvents.h>
+#include <IO/OpenedFile.h>
+
+
+namespace ProfileEvents
+{
+    extern const Event OpenedFileCacheHits;
+    extern const Event OpenedFileCacheMisses;
+}
+
+namespace DB
+{
+
+
+/** Cache of opened files for reading.
+  * It allows to share file descriptors when doing reading with 'pread' syscalls on readonly files.
+  * Note: open/close of files is very cheap on Linux and we should not bother doing it 10 000 times a second.
+  * (This may not be the case on Windows with WSL. This is also not the case if strace is active. Neither when some eBPF is loaded).
+  * But sometimes we may end up opening one file multiple times, that increases chance exhausting opened files limit.

@alexey-milovidov Are you sure that the complexity is worth it? Increasing RLIMIT_NOFILE does not look like a real problem nowadays.

alexey-milovidov

comment created time in a day

Pull request review comment ClickHouse/ClickHouse

Experiment with sharing file descriptors

+#pragma once
+
+#include <map>
+#include <mutex>
+
+#include <Core/Types.h>
+#include <Common/ProfileEvents.h>
+#include <IO/OpenedFile.h>
+
+
+namespace ProfileEvents
+{
+    extern const Event OpenedFileCacheHits;
+    extern const Event OpenedFileCacheMisses;
+}
+
+namespace DB
+{
+
+
+/** Cache of opened files for reading.
+  * It allows to share file descriptors when doing reading with 'pread' syscalls on readonly files.
+  * Note: open/close of files is very cheap on Linux and we should not bother doing it 10 000 times a second.
+  * (This may not be the case on Windows with WSL. This is also not the case if strace is active. Neither when some eBPF is loaded).
+  * But sometimes we may end up opening one file multiple times, that increases chance exhausting opened files limit.
+  */
+class OpenedFileCache
+{
+private:
+    using Key = std::pair<std::string /* path */, int /* flags */>;
+
+    using OpenedFileWeakPtr = std::weak_ptr<OpenedFile>;
+    using Files = std::map<Key, OpenedFileWeakPtr>;

Maybe a hashtable is a better holder?

alexey-milovidov

comment created time in a day
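A hashtable holder would mean std::unordered_map keyed by the (path, flags) pair, and the standard library provides no std::hash for std::pair, so a small combiner is required. A sketch (the combiner mirrors boost::hash_combine; PairHash and the int value type are illustrative):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hash for the (path, flags) key; std::unordered_map cannot hash
// std::pair out of the box.
struct PairHash
{
    std::size_t operator()(const std::pair<std::string, int> & key) const
    {
        // Combiner in the style of boost::hash_combine.
        std::size_t seed = std::hash<std::string>{}(key.first);
        seed ^= std::hash<int>{}(key.second) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

// int stands in for the cached weak_ptr<OpenedFile> value.
using Files = std::unordered_map<std::pair<std::string, int>, int, PairHash>;
```

For a cache this small the O(log n) of std::map is rarely the bottleneck, so this is a micro-optimization either way.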

Pull request review comment ClickHouse/ClickHouse

Experiment with sharing file descriptors

+#pragma once
+
+#include <map>
+#include <mutex>
+
+#include <Core/Types.h>
+#include <Common/ProfileEvents.h>
+#include <IO/OpenedFile.h>
+
+
+namespace ProfileEvents
+{
+    extern const Event OpenedFileCacheHits;
+    extern const Event OpenedFileCacheMisses;
+}
+
+namespace DB
+{
+
+
+/** Cache of opened files for reading.
+  * It allows to share file descriptors when doing reading with 'pread' syscalls on readonly files.
+  * Note: open/close of files is very cheap on Linux and we should not bother doing it 10 000 times a second.
+  * (This may not be the case on Windows with WSL. This is also not the case if strace is active. Neither when some eBPF is loaded).
+  * But sometimes we may end up opening one file multiple times, that increases chance exhausting opened files limit.
+  */
+class OpenedFileCache
+{
+private:
+    using Key = std::pair<std::string /* path */, int /* flags */>;
+
+    using OpenedFileWeakPtr = std::weak_ptr<OpenedFile>;
+    using Files = std::map<Key, OpenedFileWeakPtr>;
+
+    Files files;
+    std::mutex mutex;
+
+public:
+    using OpenedFilePtr = std::shared_ptr<OpenedFile>;
+
+    OpenedFilePtr get(const std::string & path, int flags)
+    {
+        Key key(path, flags);
+
+        std::lock_guard lock(mutex);
+
+        auto [it, inserted] = files.emplace(key, OpenedFilePtr{});
+        if (!inserted)
+            if (auto res = it->second.lock())
+                return res;
+
+        OpenedFilePtr res
+        {
+            new OpenedFile(path, flags),
+            [key, this](auto ptr)
+            {
+                {
+                    std::lock_guard another_lock(mutex);
+                    files.erase(key);
+                }
+                delete ptr;
+            }
+        };

Worth adding some metric (to ensure that it will not grow infinitely).

P.S. Interesting implementation of a cache via weak_ptr.

alexey-milovidov

comment created time in a day
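The pattern being remarked on can be reduced to a standalone sketch: the map stores weak_ptr, and a custom deleter on the handed-out shared_ptr erases the map entry when the last user releases it, so the cache holds only live objects. FakeFile is a hypothetical stand-in for DB::OpenedFile, and the cache must outlive every pointer it hands out:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for DB::OpenedFile: just remembers its path.
struct FakeFile
{
    std::string path;
    explicit FakeFile(std::string p) : path(std::move(p)) {}
};

class FileCache
{
    using Key = std::pair<std::string /* path */, int /* flags */>;
    std::map<Key, std::weak_ptr<FakeFile>> files;
    std::mutex mutex;

public:
    std::shared_ptr<FakeFile> get(const std::string & path, int flags)
    {
        Key key(path, flags);
        std::lock_guard<std::mutex> lock(mutex);

        auto [it, inserted] = files.emplace(key, std::weak_ptr<FakeFile>{});
        if (!inserted)
            if (auto res = it->second.lock())
                return res;  // hit: share the already "opened" object

        // Miss (or expired entry): create anew. The custom deleter drops
        // the map entry, so expired weak_ptrs never accumulate.
        std::shared_ptr<FakeFile> res(
            new FakeFile(path),
            [key, this](FakeFile * ptr)
            {
                {
                    std::lock_guard<std::mutex> erase_lock(mutex);
                    files.erase(key);
                }
                delete ptr;
            });
        it->second = res;
        return res;
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return files.size();
    }
};
```

The size() accessor doubles as the kind of metric the review asks for: the map's size equals the number of live cached objects.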

PullRequestReviewEvent

Pull request comment ClickHouse/ClickHouse

Rework SELECT from Distributed optimizations

@alexey-milovidov maybe you can take a look, please?

azat

comment created time in a day

Pull request review comment ClickHouse/ClickHouse

Fix bad cast

 std::optional<Blocks> evaluateExpressionOverConstantCondition(const ASTPtr & nod
         for (const auto & conjunct : dnf)
         {
-            Block block(conjunct);
+            Block block;
+
+            for (const auto & elem : conjunct)
+            {
+                if (!block.has(elem.name))
+                {
+                    block.insert(elem);
+                }
+                else
+                {
+                    /// Conjunction of condition on column equality to distinct values can never be satisfied.

@alexey-milovidov the behavior has been changed because of this hunk.

SELECT count(*) FROM remote('127.{1,2}', system.one, dummy) WHERE dummy = 1 AND dummy = 0 SETTINGS optimize_skip_unused_shards=1;

Without:

[p1.azat] 2021.07.24 06:41:06.482136 [ 3871 ] {ba1a6a15-576f-4883-9a4c-e81e60bab557} <Debug> StorageDistributed (remote): Skipping irrelevant shards - the query will be sent to the following shards of the cluster (shard numbers): [2]

With this patch:

[p1.azat] 2021.07.24 06:37:01.399750 [ 22034 ] {0aa22c9f-63e8-4630-b951-0d0b76a02c10} <Debug> StorageDistributed (remote): Skipping irrelevant shards - the query will be sent to the following shards of the cluster (shard numbers): []

So there are zero shards to send the query to, hence nothing more is done. Such queries do not look like real user queries, so maybe we should not try to optimize them, or return any node if zero shards match. What do you think?

alexey-milovidov

comment created time in a day
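For context, the rule the hunk implements is that within one conjunct, the same column equated to two distinct constants can never hold (as in dummy = 1 AND dummy = 0 above), so such a conjunct matches no shards. A minimal sketch of that rule, with conditions simplified to (column, constant) pairs and illustrative names:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns false if any column appears with two different constants,
// i.e. the conjunction of equalities cannot be satisfied.
bool conjunct_satisfiable(const std::vector<std::pair<std::string, int>> & conditions)
{
    std::map<std::string, int> seen;
    for (const auto & [column, value] : conditions)
    {
        auto [it, inserted] = seen.emplace(column, value);
        if (!inserted && it->second != value)
            return false;  // e.g. dummy = 1 AND dummy = 0
    }
    return true;
}
```

Under optimize_skip_unused_shards, dropping such a conjunct is what shrinks the shard list to [] in the log above.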

PullRequestReviewEvent

delete branch azat/ClickHouse

delete branch : skip_unavailable_shards-excessive-attempts

delete time in a day

PR opened ClickHouse/ClickHouse

Fix event_time_microseconds for REMOVE_PART in system.part_log

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix event_time_microseconds for REMOVE_PART in system.part_log

Fixes: #20027 (cc @bharatnc )

+30 -15

0 comment

3 changed files

pr created time in 2 days

push event azat/ClickHouse

Azat Khuzhin

commit sha 00e208342142f6c38d5aeca571b58b13b5b860cd

Fix event_time_microseconds for REMOVE_PART in system.part_log

view details

push time in 2 days

create branch azat/ClickHouse

branch : part_log-fix-event_time_microseconds

created branch time in 2 days

Pull request review comment ClickHouse/ClickHouse

Avoid hanging clickhouse-benchmark if connection fails (i.e. on EMFILE)

+#!/usr/bin/env bash
+# shellcheck disable=SC2086
+
+CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
+# shellcheck source=../shell_config.sh
+. "$CURDIR"/../shell_config.sh
+
+# NOTE: Tests with limit for number of opened files cannot be run under UBsan.
+#
+# UBsan needs to create pipe each time it need to check the type:

Here is an RFC patch to address this in UBsan - https://reviews.llvm.org/D106586

azat

comment created time in 2 days

PullRequestReviewEvent

Pull request comment ClickHouse/ClickHouse

Avoid hanging clickhouse-benchmark if connection fails (i.e. on EMFILE)

AST fuzzer (UBSan) — Lost connection to server. See the logs.

Nothing in the logs. Maybe even the watchdog had been KILLed?

Integration tests (asan) — fail: 2, passed: 1552, flaky: 0
Integration tests (release) — fail: 2, passed: 1548, flaky: 5
Integration tests (thread) — fail: 2, passed: 1550, flaky: 2

test_materialize_mysql_database is a known flaky test (#26673)

Testflows check — failed: 147, passed: 2029, other: 173

Known problem (the error message for JOIN had been changed)

azat

comment created time in 2 days

push event azat/ClickHouse

Azat Khuzhin

commit sha dac13793e7c3a056de7811cc4884b6f0c0bf3b8b

Disable 01955_clickhouse_benchmark_connection_hang under UBsan

UBsan does not works reliable when the RLIMIT_NOFILE exceeded.

view details

push time in 3 days

push event azat/ClickHouse

Alexander Kuzmenkov

commit sha 61a01782a6fe3223728a1b225c7dbb85615a3b1d

fix lagInFrame for nullable types

view details

Denny Crane

commit sha 62d653583ebbc140ef0bace28c82f119cae408f1

Update settings.md compression level

view details

Denny Crane

commit sha 71f96dcb2a339b19dd6b1e975e857027250cdf0e

Update settings.md compression level

view details

Alexander Kuzmenkov

commit sha a197511a96c89f22c417b84009b0f2e77c20ea00

fixes

view details

Alexey Milovidov

commit sha 17ce71ce3dfb22bb54a5911968f058a596e547c8

Fix flaky test 01509_check_many_parallel_quorum_inserts

view details

Alexey Milovidov

commit sha d8150237009887333122fef40913d1e9627a9577

Fix one source of flaky tests

view details

Alexey Milovidov

commit sha f62fc8c43d32abd1eb78cef59b0da436ff3357bb

Fix another cause of flakiness

view details

Alexey Milovidov

commit sha b8728047a89114046ae1613847ebdcce66df9bc4

Remove some code, more C++ way

view details

Azat Khuzhin

commit sha 7b7e8acf4f2556f635a5373efeb148b660cc4eab

Fix excessive connect attempts with skip_unavailable_shards

Before this patch the query was sent from RemoteBlockInputStream::readPrefix() and also from RemoteBlockInputStream::read(). And since in case of skip_unavailable_shards=1 connection errors are ignored, it tries to do x2 connect attempts. Fix this, but removing RemoteBlockInputStream::readPrefix().

Fixes: #26511

view details

su-houzhen

commit sha 412fb28d4f706d2d3bfd73f2c264261ab581f999

Update index.md

view details

Alexander Kuzmenkov

commit sha dadf1e192e242949d4eeca59bb93063fe28e8dfb

fix whitespace

view details

Alexander Kuzmenkov

commit sha 6ae40317239bd8da459da6f8fb6475cf9e53e30f

Merge pull request #26521 from ClickHouse/aku/lag-in-frame-nullable

fix lagInFrame for nullable types

view details

alexey-milovidov

commit sha 66d3b534a6539bd962a6d2d3277a85fc144323c4

Merge pull request #26620 from ClickHouse/more-cpp

Remove some code, more C++ way

view details

alexey-milovidov

commit sha 4b51ec3a8335d72c812a403b9dbd305835a7160b

Merge pull request #26619 from ClickHouse/fix-flaky-test-23

Fix one possible cause of tests flakiness

view details

Maksim Kita

commit sha f32a2806ee25b0b700b52c706b70ab8d616b60ca

Merge pull request #26664 from su-houzhen/patch-1

Update index.md

view details

Ivan Blinkov

commit sha e87580c89cdb4b738a8675a93fa1b08683538274

Update README.md

view details

Denny Crane

commit sha 6b3c788214909aa655cf6f4b6e523befeb7c6428

Update settings.md

view details

Denny Crane

commit sha fa69ad56ad9f6b7c9abe5cb0d7f276e9cbf342fb

Update settings.md

view details

Vladimir

commit sha 576b4078047bf49329571687440e60dca7d3f054

Support conditions in JOIN ON section (#24420)

* Try to enforce table identification in CollectJoinOnKeysMatcher
* Support filtering conditions in JOIN ON for HashJoin
* Correct handle non equi join
* Update test 00878_join_unexpected_results
* Join on filters calculated as one row before join
* Do not lookup key in hash join if condition for row is not hold
* better
* Support filtering conditions in JOIN ON for MergeJoin
* Support Nullable mask in JOIN ON section
* Fix style in Interpreters/TableJoin.cpp
* Change return type of getColumnAsMask in join_common to ColumnPtr
* Handle Nullable(Nothing) type in JOIN ON section, add test cases
* Fix type cast JoinCommon::getColumnAsMask
* Check type if conditions in JOIN ON section, support functions
* Update tests with JOIN ON
* Style changes, add comments for conditions in JOIN ON section
* Add test cases for join on condtions
* JOIN ON key1 = key2 AND (cond1 OR cond2)
* Remove CollectJoinOnKeysVisitor has_join_keys
* Add test cases for join on nullable/lc conditions
* Fix style
* Change error code 48 to 403 in join on tests
* Fix whitespace

view details

Alexey Boykov

commit sha b285b1820ebbd676b3ca925180ba99021d85b6ca

Merge pull request #26616 from den-crane/patch-19

Doc. Documentation for compression level

view details

push time in 3 days

push event azat/ClickHouse

Azat Khuzhin

commit sha f17ca450ac991603e6400c7caef49c493ac69739

Avoid hanging clickhouse-benchmark if connection fails (i.e. on EMFILE)

view details

push time in 4 days