profile
Nick Cameron (nrc)
@pingcap · Christchurch, New Zealand · https://www.ncameron.org
Software engineer at PingCAP; @rust-lang core team.

nrc/derive-new 264

derive simple constructor functions for Rust structs

nrc/apr-intro 63

An alternate introduction to the APR book

nrc/callgraph.rs 26

Callgraphs for Rust programs

nrc/find-work 23

find something Rusty to work on

GSam/rust-refactor 19

Rust refactoring project

nrc/box-error 3

A library for error handling using boxed errors

nrc/clyde 3

wip

nrc/cargo-edit 1

A utility for managing cargo dependencies from the command line.

nrc/chalk 1

A PROLOG-ish interpreter written in Rust, intended eventually for use in the compiler

pull request comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

There are a couple of outstanding requests for more comments, otherwise lgtm.

sticnarf

comment created time in 2 hours

Pull request review comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

+// Copyright 2020 TiKV Project Authors. Licensed under Apache-2.0.
+
+//! The concurrency manager is responsible for concurrency control of
+//! transactions.
+//!
+//! The concurrency manager contains a key table in memory.
+//! Transactional commands can acquire key mutexes from the concurrency manager
+//! to ensure serializability. Lock information can be also stored in the
+//! manager and reading requests can check if these locks block the read.

I think I find this confusing because the difference between what latches do and what the concurrency manager does is subtle. It would be good for the comment here to reflect what is implemented now, rather than what will be implemented in the future.

sticnarf

comment created time in 2 hours

pull request comment tikv/tikv

txn: Change the order of `ResolveLock`'s read and write phase

/bench +sysbench

longfangsong

comment created time in 3 hours

pull request comment tikv/tikv

makefile: Add `make doc` command

/merge

andylokandy

comment created time in 11 hours

Pull request review comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

+// Copyright 2020 TiKV Project Authors. Licensed under Apache-2.0.
+
+use super::lock_table::LockTable;
+
+use parking_lot::Mutex;
+use std::{mem, sync::Arc};
+use tokio::sync::{Mutex as AsyncMutex, MutexGuard as AsyncMutexGuard};
+use txn_types::{Key, Lock};
+
+/// An entry in the in-memory table providing functions related to a specific
+/// key.
+pub struct KeyHandle {
+    pub key: Key,
+    table: LockTable,
+    mutex: AsyncMutex<()>,
+    lock_store: Mutex<Option<Lock>>,
+}
+
+impl KeyHandle {
+    pub fn new(key: Key, table: LockTable) -> Self {
+        KeyHandle {
+            key,
+            table,
+            mutex: AsyncMutex::new(()),
+            lock_store: Mutex::new(None),
+        }
+    }
+
+    pub async fn lock(self: Arc<Self>) -> KeyHandleGuard {
+        // Safety: `_mutex_guard` is declared before `handle_ref` in `KeyHandleGuard`.
+        // So the mutex guard will be released earlier than the `Arc<KeyHandle>`.
+        // Then we can make sure the mutex guard doesn't point to released memory.
+        let mutex_guard = unsafe { mem::transmute(self.mutex.lock().await) };
+        KeyHandleGuard {
+            _mutex_guard: mutex_guard,
+            handle: self,
+        }
+    }
+
+    pub fn with_lock<T>(&self, f: impl FnOnce(&Option<Lock>) -> T) -> T {
+        f(&*self.lock_store.lock())
+    }
+}
+
+impl Drop for KeyHandle {
+    fn drop(&mut self) {
+        self.table.remove(&self.key);
+    }
+}
+
+/// A `KeyHandle` with its mutex locked.
+pub struct KeyHandleGuard {
+    // It must be declared before `handle_ref` so it will be dropped before
+    // `handle_ref`.
+    _mutex_guard: AsyncMutexGuard<'static, ()>,
+    handle: Arc<KeyHandle>,

Should add a note that it is unsafe to mutate handle to point at a different KeyHandle
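For illustration, a sketch of the kind of note being requested, written against the `KeyHandleGuard` shown in the hunk above; the exact wording (and whether the field ends up named `handle`) is up to the author:

```rust
pub struct KeyHandleGuard {
    // It must be declared before `handle` so it will be dropped before `handle`.
    _mutex_guard: AsyncMutexGuard<'static, ()>,
    // Suggested note: `_mutex_guard` borrows from the `KeyHandle` behind this
    // `Arc`, with its lifetime transmuted to `'static`, so it is unsafe to
    // mutate `handle` to point at a different `KeyHandle` while the guard is alive.
    handle: Arc<KeyHandle>,
}
```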

sticnarf

comment created time in a day

Pull request review comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

+// Copyright 2020 TiKV Project Authors. Licensed under Apache-2.0.
+
+use super::lock_table::LockTable;
+
+use parking_lot::Mutex;
+use std::{mem, sync::Arc};
+use tokio::sync::{Mutex as AsyncMutex, MutexGuard as AsyncMutexGuard};
+use txn_types::{Key, Lock};
+
+/// An entry in the in-memory table providing functions related to a specific
+/// key.
+pub struct KeyHandle {
+    pub key: Key,
+    table: LockTable,
+    mutex: AsyncMutex<()>,
+    lock_store: Mutex<Option<Lock>>,
+}
+
+impl KeyHandle {
+    pub fn new(key: Key, table: LockTable) -> Self {
+        KeyHandle {
+            key,
+            table,
+            mutex: AsyncMutex::new(()),
+            lock_store: Mutex::new(None),
+        }
+    }
+
+    pub async fn lock(self: Arc<Self>) -> KeyHandleGuard {
+        // Safety: `_mutex_guard` is declared before `handle_ref` in `KeyHandleGuard`.
+        // So the mutex guard will be released earlier than the `Arc<KeyHandle>`.
+        // Then we can make sure the mutex guard doesn't point to released memory.
+        let mutex_guard = unsafe { mem::transmute(self.mutex.lock().await) };
+        KeyHandleGuard {
+            _mutex_guard: mutex_guard,
+            handle: self,
+        }
+    }
+
+    pub fn with_lock<T>(&self, f: impl FnOnce(&Option<Lock>) -> T) -> T {
+        f(&*self.lock_store.lock())
+    }
+}
+
+impl Drop for KeyHandle {
+    fn drop(&mut self) {
+        self.table.remove(&self.key);
+    }
+}
+
+/// A `KeyHandle` with its mutex locked.
+pub struct KeyHandleGuard {
+    // It must be declared before `handle_ref` so it will be dropped before
+    // `handle_ref`.

typo: handle_ref should be handle
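In other words, the comment should read (a sketch only, matching the fields shown in the earlier hunk):

```rust
    // It must be declared before `handle` so it will be dropped before
    // `handle`.
    _mutex_guard: AsyncMutexGuard<'static, ()>,
    handle: Arc<KeyHandle>,
```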

sticnarf

comment created time in a day

Pull request review comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

+// Copyright 2020 TiKV Project Authors. Licensed under Apache-2.0.
+
+//! The concurrency manager is responsible for concurrency control of
+//! transactions.
+//!
+//! The concurrency manager contains a key table in memory.
+//! Transactional commands can acquire key mutexes from the concurrency manager
+//! to ensure serializability. Lock information can be also stored in the
+//! manager and reading requests can check if these locks block the read.

Could you add a comment about how this interacts with latches? I think they are both needed for now?

sticnarf

comment created time in a day

Pull request review comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

+// Copyright 2020 TiKV Project Authors. Licensed under Apache-2.0.
+
+//! The concurrency manager is responsible for concurrency control of
+//! transactions.
+//!
+//! The concurrency manager contains a key table in memory.
+//! Transactional commands can acquire key mutexes from the concurrency manager
+//! to ensure serializability. Lock information can be also stored in the
+//! manager and reading requests can check if these locks block the read.
+
+mod key_handle;
+mod lock_table;
+
+pub use self::key_handle::{KeyHandle, KeyHandleGuard};
+pub use self::lock_table::LockTable;
+
+use std::{
+    mem::{self, MaybeUninit},
+    sync::{
+        atomic::{AtomicU64, Ordering},
+        Arc,
+    },
+};
+use txn_types::{Key, Lock, TimeStamp};
+
+// TODO: Currently we are using a Mutex<BTreeMap> to implement the handle table.
+// In the future we should replace it with a concurrent ordered map.
+// Pay attention that the async functions of ConcurrencyManager should not hold
+// the mutex.
+#[derive(Clone)]
+pub struct ConcurrencyManager {
+    max_read_ts: Arc<AtomicU64>,
+    lock_table: LockTable,
+}
+
+impl ConcurrencyManager {
+    pub fn new(latest_ts: TimeStamp) -> Self {
+        ConcurrencyManager {
+            max_read_ts: Arc::new(AtomicU64::new(latest_ts.into_inner())),
+            lock_table: LockTable::default(),
+        }
+    }
+
+    pub fn max_read_ts(&self) -> TimeStamp {
+        TimeStamp::new(self.max_read_ts.load(Ordering::SeqCst))
+    }
+
+    /// Acquires a mutex of the key and returns an RAII guard. When the guard goes
+    /// out of scope, the mutex will be unlocked.
+    ///
+    /// The guard can be used to store Lock in the table. The stored lock
+    /// is visible to `read_key_check` and `read_range_check`.
+    pub async fn lock_key(&self, key: &Key) -> KeyHandleGuard {
+        self.lock_table.lock_key(key).await
+    }
+
+    /// Acquires mutexes of the keys and returns the RAII guards. The order of the
+    /// guards is the same with the given keys.
+    ///
+    /// The guards can be used to store Lock in the table. The stored lock
+    /// is visible to `read_key_check` and `read_range_check`.
+    pub async fn lock_keys(&self, keys: impl Iterator<Item = &Key>) -> Vec<KeyHandleGuard> {
+        let mut keys_with_index: Vec<_> = keys.enumerate().collect();
+        // To prevent deadlock, we sort the keys and lock them one by one.
+        keys_with_index.sort_by_key(|(_, key)| *key);
+        let mut result: Vec<MaybeUninit<KeyHandleGuard>> = Vec::new();
+        result.resize_with(keys_with_index.len(), || MaybeUninit::uninit());
+        for (index, key) in keys_with_index {
+            result[index] = MaybeUninit::new(self.lock_table.lock_key(key).await);
+        }
+        #[allow(clippy::unsound_collection_transmute)]
+        unsafe {
+            mem::transmute(result)
+        }
+    }
+
+    /// Checks if there is a memory lock of the key which blocks the read.
+    /// The given `check_fn` should return false iff the lock passed in
+    /// blocks the read.
+    ///
+    /// It will also updates the max_read_ts.
+    pub fn read_key_check<E>(
+        &self,
+        key: &Key,
+        ts: TimeStamp,
+        check_fn: impl FnOnce(&Lock) -> Result<(), E>,

Why do we need to take check_fn? Rather than just considering a key locked if the lock exists?

sticnarf

comment created time in a day

Pull request review comment tikv/tikv

txn: use in-memory lock table and avoid getting timestamp from PD in prewrite

 pub struct MvccTxn<S: Snapshot, P: PdClient + 'static> {
     // collapse continuous rollbacks.
     collapse_rollback: bool,
     pub extra_op: ExtraOp,
-    pd_client: Arc<P>,
+    _pd_client: Arc<P>,
+    concurrency_manager: ConcurrencyManager,

Could you add some comments to these fields - what they are for, how locks need to be maintained, when they should be checked and released.

sticnarf

comment created time in a day

PR closed tikv/tikv

WIP implement check secondary locks component/storage component/transaction sig/transaction status/WIP

Signed-off-by: Nick Cameron nrc@ncameron.org


What problem does this PR solve?

WIP - blocked on https://github.com/pingcap/kvproto/pull/657 and implementing a test

Issue Number: cc #8316

Problem Summary: Implements a command for checking the status of async commit locks

What is changed and how it works?

Proposal: xxx

What's Changed: Adds command boilerplate, plus implementation in process.rs/txn.rs for CheckSecondaryLocks.

Tests

  • Unit test - TODO

Release note

No release note (partial implementation only)

+229 -19

1 comment

11 changed files

nrc

pr closed time in a day

pull request comment tikv/tikv

WIP implement check secondary locks

closed in favour of #8337

nrc

comment created time in a day

push event pingcap/tidb

mmyj

commit sha 0887dc6c562d26f89acf673b5e63bf8bdd4decb7

util: add checksumWriter/Reader interface to support evaluate checksum (#18649)

view details

Nick Cameron

commit sha 235c119f0094b1926dba3df8e2ff0fd367948f14

Merge branch 'master' into update-pd-20200803

view details

push time in a day

PR closed pingcap/kvproto

pdpb: revert "pdpb: include down/pending peers in scan region response (#646)" bug

This reverts #646 since it breaks backwards compatibility and prevents updating TiDB:

➜  tidb git:(master) ✗ make
CGO_ENABLED=1 GO111MODULE=on go build  -tags codes  -ldflags '-X "github.com/pingcap/parser/mysql.TiDBReleaseVersion=v4.0.0-beta.2-872-g03003538d-dirty" -X "github.com/pingcap/tidb/util/versioninfo.TiDBBuildTS=2020-08-03 03:25:04" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitHash=03003538d1d7a1762e02fc62365e4832f9df4dd7" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitBranch=master" -X "github.com/pingcap/tidb/util/versioninfo.TiDBEdition=Community" ' -o bin/tidb-server tidb-server/main.go
# github.com/pingcap/pd/v4/client
../../gopath/pkg/mod/github.com/pingcap/pd/v4@v4.0.0-rc.2.0.20200714122454-1a64f969cb3c/client/client.go:562:24: cannot use resp.GetRegions() (type []*pdpb.Region) as type []*metapb.Region in return argument
make: *** [server] Error 2

PTAL @disksing @rleungx

+417 -841

2 comments

2 changed files

nrc

pr closed time in a day

pull request comment pingcap/kvproto

pdpb: revert "pdpb: include down/pending peers in scan region response (#646)"

The TiDB build issue is addressed in https://github.com/pingcap/tidb/pull/18938. AIUI, the back-compat issue will not be a problem at runtime because of the region_metas field, and kvproto is not a published crate, so I think we can close this for now.

nrc

comment created time in a day

pull request comment pingcap/tidb

*: update kvproto, PD, BR and unistore dependencies

/merge

sticnarf

comment created time in a day

pull request comment tikv/tikv

cmd: tikv-ctl upgrade to futures 0.3

/merge

ekexium

comment created time in a day

issue comment tikv/sig-transaction

Tracking issue: async commit

The current blocker on increment 1 is implementing checking of secondary locks in TiDB (https://github.com/pingcap/tidb/pull/18467, though it needs significant changes). That is blocked on updating kvproto in TiDB due to back-compat issues, fixed by https://github.com/pingcap/kvproto/pull/659

nrc

comment created time in 2 days

Pull request review comment tikv/tikv

cmd: tikv-ctl upgrade to futures 0.3

 engine_rocks = { path = "../components/engine_rocks" }
 engine_traits = { path = "../components/engine_traits" }
 fs2 = "0.4"
 futures = "0.1"

do we still need the futures 0.1 dep?

ekexium

comment created time in 2 days

pull request comment tikv/tikv

cmd: tikv-ctl upgrade to futures 0.3

/merge

ekexium

comment created time in 2 days

PR opened pingcap/kvproto

pdpb: revert "pdpb: include down/pending peers in scan region response (#646)" bug

This reverts #646 since it breaks backwards compatibility and prevents updating TiDB:

➜  tidb git:(master) ✗ make
CGO_ENABLED=1 GO111MODULE=on go build  -tags codes  -ldflags '-X "github.com/pingcap/parser/mysql.TiDBReleaseVersion=v4.0.0-beta.2-872-g03003538d-dirty" -X "github.com/pingcap/tidb/util/versioninfo.TiDBBuildTS=2020-08-03 03:25:04" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitHash=03003538d1d7a1762e02fc62365e4832f9df4dd7" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitBranch=master" -X "github.com/pingcap/tidb/util/versioninfo.TiDBEdition=Community" ' -o bin/tidb-server tidb-server/main.go
# github.com/pingcap/pd/v4/client
../../gopath/pkg/mod/github.com/pingcap/pd/v4@v4.0.0-rc.2.0.20200714122454-1a64f969cb3c/client/client.go:562:24: cannot use resp.GetRegions() (type []*pdpb.Region) as type []*metapb.Region in return argument
make: *** [server] Error 2

PTAL @disksing @rleungx

+417 -841

0 comments

2 changed files

pr created time in 2 days

create branch nrc/kvproto

branch : back

created branch time in 2 days

pull request comment tikv/tikv

makefile: Add `make doc` command

/merge

andylokandy

comment created time in 2 days

issue comment tikv/sig-transaction

Tracking issue: async commit

First increment

Async commit protocol is implemented and functional. Implementation (especially in TiKV) may not be high performance. There may be correctness issues with edge cases.

Deliverable: can demo async commit with real workloads. Can reliably run sysbench benchmark

Deadline: 2020-08-07

nrc

comment created time in 2 days

pull request comment tikv/tikv

txn: Mark a flag when commit record collides with a protected rollback in write_cf

Can we change the name of the check_overlay_rollback feature to something more declarative? Either 'async_commit' or 'allow_duplicate_ts' or something similar.

MyonKeminta

comment created time in 2 days

issue comment tikv/sig-transaction

How to change write CF?

cc https://github.com/tikv/tikv/pull/8349

nrc

comment created time in 2 days

pull request comment tikv/tikv

makefile: Add `make doc` command

/merge

andylokandy

comment created time in 2 days

pull request comment tikv/sig-transaction

Create weekly-2020-07-30.md

merged manually

cfzjywxk

comment created time in 2 days

push event tikv/sig-transaction

cfzjywxk

commit sha 9e267a647be9b9c23eb3c257d2f2d992b091de70

Create weekly-2020-07-30.md

view details

push time in 2 days

push event cfzjywxk/sig-transaction

Nick Cameron

commit sha 9de2d4260ebe3af4b68e45da86740e22b0749857

Update meetings/minutes/weekly-2020-07-30.md

Co-authored-by: Zejun Li <me@bobotu.dev>

view details

push time in 5 days

pull request comment tikv/tikv

txn: Change the order of `ResolveLock`'s read and write phase

I made a mistake... The resolve lock request has a high priority. It uses the other thread pool and can't affect other requests.

By 'the other thread pool' do you mean the scheduler's high priority pool, or something else? Do you know why it has high priority? I assume both the read and write phases are high priority in the current implementation.

longfangsong

comment created time in 5 days

Pull request review comment tikv/tikv

txn: Move Command's read or write process to their own file

 impl Debug for Command {
         self.command_ext().fmt(f)
     }
 }
+
+pub trait ReadCommand<S: Snapshot>: CommandExt {
+    fn process_read(self, snapshot: S, statistics: &mut Statistics) -> Result<ProcessResult>;
+}
+
+pub struct StorageToWrite<'a, S: Snapshot, L: LockManager, P: PdClient + 'static> {
+    snapshot: S,
+    lock_mgr: &'a L,
+    pd_client: Arc<P>,
+}
+
+pub struct WritingContext<'a> {
+    extra_op: ExtraOp,
+    statistics: &'a mut Statistics,
+    pipelined_pessimistic_lock: bool,
+}

How about WriteContext for the second and just passing the first as individual args? (Sorry for going back and forth, now that the struct is written out, it doesn't seem much like a coherent abstraction).

longfangsong

comment created time in 5 days

Pull request review comment tikv/tikv

txn: Move Command's read or write process to their own file

 impl Debug for Command {
         self.command_ext().fmt(f)
     }
 }
+
+pub trait ReadCommand<E: Engine>: CommandExt {
+    fn process_read(
+        &mut self,
+        snapshot: E::Snap,
+        statistics: &mut Statistics,
+    ) -> Result<ProcessResult>;
+}
+
+pub trait WriteCommand<S: Snapshot, L: LockManager, P: PdClient + 'static>: CommandExt {

For the second error, I think you just need to add type args to Command, i.e., something like ... for Command<S, L, P>. I think the first one is solved by putting the type args on the trait rather than on a method?
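A minimal standalone sketch of that shape, i.e. type parameters declared on the trait and introduced again on the impl; the traits and the `Prewrite` type here are toy stand-ins, not TiKV's real definitions:

```rust
trait Snapshot {}
trait LockManager {}

// The type parameters live on the trait itself, not on the method.
trait WriteCommand<S: Snapshot, L: LockManager> {
    fn process_write(self, snapshot: S, lock_mgr: &L);
}

struct Prewrite;

// They are then introduced again on the impl, as in `impl<S, L> ... for Prewrite`.
impl<S: Snapshot, L: LockManager> WriteCommand<S, L> for Prewrite {
    fn process_write(self, _snapshot: S, _lock_mgr: &L) {
        // the real write logic would go here
    }
}
```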

longfangsong

comment created time in 5 days

push event rust-dev-tools/cargo-src

dependabot[bot]

commit sha 5a9038121edeae78ab235c62db41e13dc136b4d1

Bump elliptic from 6.4.1 to 6.5.3

Bumps [elliptic](https://github.com/indutny/elliptic) from 6.4.1 to 6.5.3.
- [Release notes](https://github.com/indutny/elliptic/releases)
- [Commits](https://github.com/indutny/elliptic/compare/v6.4.1...v6.5.3)

Signed-off-by: dependabot[bot] <support@github.com>

view details

Nick Cameron

commit sha 328f783ac40245c298ee6c1e9654f4d4e8d46e7b

Merge pull request #271 from rust-dev-tools/dependabot/npm_and_yarn/elliptic-6.5.3 Bump elliptic from 6.4.1 to 6.5.3

view details

push time in 6 days

PR merged rust-dev-tools/cargo-src

Bump elliptic from 6.4.1 to 6.5.3 dependencies

Bumps elliptic from 6.4.1 to 6.5.3. Notable commits include "signature: prevent malleability and overflows" and "utils: leak less information in getNAF()"; the full commit list is at https://github.com/indutny/elliptic/compare/v6.4.1...v6.5.3


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



+3 -3

0 comments

1 changed file

dependabot[bot]

pr closed time in 6 days

push event nrc/xmas-elf

Samuel Tardieu

commit sha 56e47fba70309cf046d8087ff6c4157813020e97

Check length of both header parts

A truncated ELF file would make the library panic. This can be illustrated by the following sequence in a Unix shell:

$ dd if=/bin/sh of=bogus.elf bs=16 count=1
$ cargo run bogus.elf
thread 'main' panicked at 'index 64 out of range for slice of length 16',
  src/header.rs:28:23

view details

Nick Cameron

commit sha 5778f73f5729856010f5ca5c86fa904b3def203c

Merge pull request #56 from samueltardieu/fix-crashes Check length of both header parts

view details

push time in 6 days

PR merged nrc/xmas-elf

Check length of both header parts

A truncated ELF file would make the library panic. This can be illustrated by the following sequence in a Unix shell:

$ dd if=/bin/sh of=bogus.elf bs=16 count=1
$ cargo run bogus.elf
thread 'main' panicked at 'index 64 out of range for slice of length 16',
  src/header.rs:28:23
+8 -0

0 comments

1 changed file

samueltardieu

pr closed time in 6 days

Pull request review comment tikv/tikv

txn: Move Command's read or write process to their own file

 impl Debug for Command {
         self.command_ext().fmt(f)
     }
 }
+
+pub trait ReadCommand<E: Engine>: CommandExt {
+    fn process_read(
+        &mut self,
+        snapshot: E::Snap,
+        statistics: &mut Statistics,
+    ) -> Result<ProcessResult>;
+}
+
+pub trait WriteCommand<S: Snapshot, L: LockManager, P: PdClient + 'static>: CommandExt {

I think doing nothing and returning None is fine.

longfangsong

comment created time in 7 days

push event tikv/sig-transaction

ekexium

commit sha f33ed9bf0f112ec3a23889fe22c84adaea032b1b

add some elaboration in tikv doc

Signed-off-by: ekexium <ekexium@gmail.com>

view details

ekexium

commit sha 8a50a9f25a6af1970aa70d296e4eab1121d55d0d

minor change

Signed-off-by: ekexium <ekexium@gmail.com>

view details

ekexium

commit sha e077ae04d5b02556f36e6d60766c73dda33b7714

resolve comments

Signed-off-by: ekexium <ekexium@gmail.com>

view details

ekexium

commit sha d440da8ecbadaae697534d68a94222c51dd7723d

minor change

Signed-off-by: ekexium <ekexium@gmail.com>

view details

ekexium

commit sha 091243ba9dd162f7431afaad20443acb12162f2f

remove one section

Signed-off-by: ekexium <ekexium@gmail.com>

view details

Nick Cameron

commit sha 1eb2b02fde6db267645ade49cb23558fdcb1644f

Merge pull request #40 from ekexium/doc/txn Add some elaboration in TiKV doc

view details

push time in 7 days

issue comment tikv/tikv

Fix Clippy warnings

You don't have to fix everything! 😀 If you have a PR that fixes some of them, that would be good to land.

nrc

comment created time in 8 days

issue comment tikv/tikv

Fix Clippy warnings

Hmm, it's a good question! In this case, using the iterator version means the compiler can probably elide a bunch of bounds checks as well as guarantee that we never get an out-of-bounds error, so there is a reason for Clippy's recommendation other than simplicity (though I agree this version is not simpler).

In general I might try and refactor out a variable or a function to make it simpler.

In this case I would probably use:

for (group_index, <item>) in group_by_columns[..self.group_by_exps.len()].into_iter().enumerate() {

(and I'd be even happier if I didn't need group_index). I might factor out group_by_columns[..self.group_by_exps.len()] into a variable to make things clearer.
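For illustration, a toy compilable version of that refactor; the function name, types, and values here are invented, only the loop shape mirrors the suggestion above:

```rust
fn handle_groups(group_by_columns: &[String], group_by_exps_len: usize) {
    // Factor the slice out into a variable so the loop header stays readable.
    let group_by_cols = &group_by_columns[..group_by_exps_len];
    for (group_index, col) in group_by_cols.iter().enumerate() {
        // Use `col` directly instead of indexing, so bounds checks can be elided.
        println!("group {}: {}", group_index, col);
    }
}
```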

nrc

comment created time in 8 days

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:++![architecture](transaction-handling-newbie-perspective/architecture.svg)++Though client is not part of TiKV, it is also an important to read some code in it to understand how a request is handled. ++There're many implements of client, and their process of sending a request is similiar, we'll take [client-rust](https://github.com/TiKV/client-rust) as an example here.++Basically, TiKV's transaction system is based on Google's [Percolator](https://research.google/pubs/pub36726/), you are recommended to read some material about it before you start reading this.++### Begin++You'll need a client object to start a transaction.++The code which creates a transaction is [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L28), you can see the client includes a `PdRpcClient`, which is responsible for communicate with the pd component.++And then you can use [`Client::begin`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L51) to start an transaction.++```rust, no_run+pub async fn begin(&self) -> Result<Transaction> {+	let timestamp = self.current_timestamp().await?;+	Ok(self.new_transaction(timestamp))+}+```++Firstly, we'll need to get a time stamp from pd, and then we'll create a new `Transaction` object by using current timestamp.++If you dive into `self.current_timestamp` , you'll find out that in fact it will put a request into [`PD::tso`](https://github.com/pingcap/kvproto/blob/da0b8ff0603cbedc90491042e835f114537ccee8/proto/pdpb.proto#L23) [rpc](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/pd/timestamp.rs#L66)'s param stream and receive a logical timestamp from the result stream.++The remote fuction `Tso` it defined is [here](https://github.com/pingcap/pd/blob/4971825321cf9dbf15b38f19ec5a9f8f27f4ffeb/server/grpc_service.go#L74) in pd.++### Single point read++We can use `Transaction::get`  to get value for a certain key.++This part of code is [here](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L71):++```rust, no_run+pub async fn get(&self, key: impl Into<Key>) -> Result<Option<Value>> {+	let key = key.into();+	self.buffer.get_or_else(key, |key| {+		new_mvcc_get_request(key, self.timestamp).execute(self.rpc.clone())+	}).await+}+```++We'll try to read the local buffered key first. And if the local buffered key does not exist, a `GetRequest` will be sent to TiKV.++You may have known that TiKV divide all the data into different regions, and each replica of some certain region is on its own TiKV node, and pd will manage the meta infomation about where are the replicas for some certain key is.++The code above seems doesn't cover the steps which decide which TiKV node should we send the request to. But that's not the case. 
The code which do these jobs is hidden under [`execute`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/request.rs#L29), and you'll find the code which tries to get the TiKV node [here](https://github.com/tikv/client-rust/blob/b7ced1f44ed9ece4405eee6d2573a6ca6fa46379/src/pd/client.rs#L42) , and it is called by `retry_response_stream` [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/request.rs#L48):++```rust, no_code+fn store_for_key(+        self: Arc<Self>,+        key: &Key,+    ) -> BoxFuture<'static, Result<Store<Self::KvClient>>> {+        self.region_for_key(key)+            .and_then(move |region| self.map_region_to_store(region))+            .boxed()+    }+```++Firstly, it will use grpc call [`GetRegion`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L41) in `region_for_key` to find out which region is the key in.++The remote fuction `GetRegion` it defined is [here](https://github.com/pingcap/pd/blob/6dab049720f4c4e1a91405806fc1fa6517928589/server/grpc_service.go#L416) in pd.++And then we'll use grpc call [`GetStore`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L31) in `map_region_to_store` to find out the major replica of region.++The remote fuction `GetStore` it defined is [here](https://github.com/pingcap/pd/blob/2b56a4c5915cb4b8806629193fd943a2e860ae4f/server/grpc_service.go#L171) in pd.++Finally we'll get a `KvRpcClient` instance, which represents the connection to a TiKV replica.++Then let's back to `retry_response_stream`, next function call we should pay attention to is  `store.dispatch`, it calls grpc function [`KvGet`](https://github.com/pingcap/kvproto/blob/5f564ec8820e3b4002930f6f3dd1fcd710d4ecd0/proto/tikvpb.proto#L21).++And finally we reach the code in TiKV's repo. In TiKV, the requests are handled by [`Server` struct](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/server.rs#L49) , and the `KvGet` will be handled by `future_get` [here](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/service/kv.rs#L1136).++Firstly we'll read the value for a key by using [`Storage::get`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/mod.rs#L213).++`get` function is a little bit long, we'll ignore `STATIC` parts for now, and we'll get:++```rust, no_run+pub fn get(&self, mut ctx: Context, key: Key,+    start_ts: TimeStamp) -> impl Future<Item = Option<Value>, Error = Error> {+    const CMD: CommandKind = CommandKind::get;+    let priority = ctx.get_priority();+    let priority_tag = get_priority_tag(priority);++    let res = self.read_pool.spawn_handle(+        async move {+            // The bypass_locks set will be checked at most once. 
`TsSet::vec` is more efficient+            // here.+            let bypass_locks = TsSet::vec_from_u64s(ctx.take_resolved_locks());+            let snapshot = Self::with_tls_engine(|engine| Self::snapshot(engine, &ctx)).await?;+            let snap_store = SnapshotStore::new(snapshot, start_ts,+                        ctx.get_isolation_level(),+                        !ctx.get_not_fill_cache(),+                        bypass_locks,+                        false);+            let result = snap_store.get(&key, &mut statistics)+                    // map storage::txn::Error -> storage::Error+                    .map_err(Error::from);+            result+        },+        priority,+        thread_rng().next_u64(),+    );+    res.map_err(|_| Error::from(ErrorInner::SchedTooBusy))+        .flatten()+}+```++This function will get a `snapshot`, and then construct a `SnapshotStore` by using the `snapshot`, and then call `get` on this `SnapshotStore`, and finally get the data we need.++The `bypass_locks` part is a tricky optimize related to [large transaction](https://pingcap.com/blog/large-transactions-in-tidb/), see [this pr](https://github.com/tikv/tikv/pull/5798).++Then we'll view the code of `SnapshotStore::get`, you'll see that in fact it consturcted a [`PointGetter`](https://github.com/tikv/tikv/blob/4ac9a68126056d1b7cf0fc9323b899253b73e577/src/storage/mvcc/reader/point_getter.rs#L133), and then call the `get` method on `PointGetter`:++```rust, no_run+pub fn get(&mut self, user_key: &Key) -> Result<Option<Value>> {+    if !self.multi {+        // Protect from calling `get()` multiple times when `multi == false`.+        if self.drained {+            return Ok(None);+        } else {+            self.drained = true;+        }+    }++    match self.isolation_level {+        IsolationLevel::Si => {+            // Check for locks that signal concurrent writes in Si.+            self.load_and_check_lock(user_key)?;+        }+        IsolationLevel::Rc => {}+    }++    self.load_data(user_key)+}+```++As we can see, if the required `isolation_level` is `Si`, we need to check whether there's any locks which may conflict with current get. If we find some, we'll return a  `KeyIsLocked` error:++```rust, no_run+fn load_and_check_lock(&mut self, user_key: &Key) -> Result<()> {+    self.statistics.lock.get += 1;+    let lock_value = self.snapshot.get_cf(CF_LOCK, user_key)?;++    if let Some(ref lock_value) = lock_value {+        self.statistics.lock.processed += 1;+        let lock = Lock::parse(lock_value)?;+        if self.met_newer_ts_data == NewerTsCheckState::NotMetYet {+            self.met_newer_ts_data = NewerTsCheckState::Met;+        }+        lock.check_ts_conflict(user_key, self.ts, &self.bypass_locks)+            .map_err(Into::into)+    } else {+        Ok(())+    }+}+```++And then we'll use `PointGetter`'s `load_data`  method to load the value.++Now we have the value in `GetResponse`, but the client still need to resolve the locked keys. This will still be handled in  `retry_response_stream`.

Maybe add something like "if the key is locked, the client needs ..."

longfangsong

comment created time in 8 days

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:++![architecture](transaction-handling-newbie-perspective/architecture.svg)++Though client is not part of TiKV, it is also an important to read some code in it to understand how a request is handled. ++There're many implements of client, and their process of sending a request is similiar, we'll take [client-rust](https://github.com/TiKV/client-rust) as an example here.++Basically, TiKV's transaction system is based on Google's [Percolator](https://research.google/pubs/pub36726/), you are recommended to read some material about it before you start reading this.++### Begin++You'll need a client object to start a transaction.++The code which creates a transaction is [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L28), you can see the client includes a `PdRpcClient`, which is responsible for communicate with the pd component.++And then you can use [`Client::begin`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L51) to start an transaction.++```rust, no_run+pub async fn begin(&self) -> Result<Transaction> {+	let timestamp = self.current_timestamp().await?;+	Ok(self.new_transaction(timestamp))+}+```++Firstly, we'll need to get a time stamp from pd, and then we'll create a new `Transaction` object by using current timestamp.++If you dive into `self.current_timestamp` , you'll find out that in fact it will put a request into [`PD::tso`](https://github.com/pingcap/kvproto/blob/da0b8ff0603cbedc90491042e835f114537ccee8/proto/pdpb.proto#L23) [rpc](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/pd/timestamp.rs#L66)'s param stream and receive a logical timestamp from the result stream.++The remote fuction `Tso` it defined is [here](https://github.com/pingcap/pd/blob/4971825321cf9dbf15b38f19ec5a9f8f27f4ffeb/server/grpc_service.go#L74) in pd.++### Single point read++We can use `Transaction::get`  to get value for a certain key.++This part of code is [here](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L71):++```rust, no_run+pub async fn get(&self, key: impl Into<Key>) -> Result<Option<Value>> {+	let key = key.into();+	self.buffer.get_or_else(key, |key| {+		new_mvcc_get_request(key, self.timestamp).execute(self.rpc.clone())+	}).await+}+```++We'll try to read the local buffered key first. And if the local buffered key does not exist, a `GetRequest` will be sent to TiKV.++You may have known that TiKV divide all the data into different regions, and each replica of some certain region is on its own TiKV node, and pd will manage the meta infomation about where are the replicas for some certain key is.++The code above seems doesn't cover the steps which decide which TiKV node should we send the request to. But that's not the case. 
The code which do these jobs is hidden under [`execute`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/request.rs#L29), and you'll find the code which tries to get the TiKV node [here](https://github.com/tikv/client-rust/blob/b7ced1f44ed9ece4405eee6d2573a6ca6fa46379/src/pd/client.rs#L42) , and it is called by `retry_response_stream` [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/request.rs#L48):++```rust, no_code+fn store_for_key(+        self: Arc<Self>,+        key: &Key,+    ) -> BoxFuture<'static, Result<Store<Self::KvClient>>> {+        self.region_for_key(key)+            .and_then(move |region| self.map_region_to_store(region))+            .boxed()+    }+```++Firstly, it will use grpc call [`GetRegion`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L41) in `region_for_key` to find out which region is the key in.++The remote fuction `GetRegion` it defined is [here](https://github.com/pingcap/pd/blob/6dab049720f4c4e1a91405806fc1fa6517928589/server/grpc_service.go#L416) in pd.++And then we'll use grpc call [`GetStore`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L31) in `map_region_to_store` to find out the major replica of region.++The remote fuction `GetStore` it defined is [here](https://github.com/pingcap/pd/blob/2b56a4c5915cb4b8806629193fd943a2e860ae4f/server/grpc_service.go#L171) in pd.++Finally we'll get a `KvRpcClient` instance, which represents the connection to a TiKV replica.++Then let's back to `retry_response_stream`, next function call we should pay attention to is  `store.dispatch`, it calls grpc function [`KvGet`](https://github.com/pingcap/kvproto/blob/5f564ec8820e3b4002930f6f3dd1fcd710d4ecd0/proto/tikvpb.proto#L21).++And finally we reach the code in TiKV's repo. In TiKV, the requests are handled by [`Server` struct](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/server.rs#L49) , and the `KvGet` will be handled by `future_get` [here](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/service/kv.rs#L1136).++Firstly we'll read the value for a key by using [`Storage::get`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/mod.rs#L213).++`get` function is a little bit long, we'll ignore `STATIC` parts for now, and we'll get:++```rust, no_run+pub fn get(&self, mut ctx: Context, key: Key,+    start_ts: TimeStamp) -> impl Future<Item = Option<Value>, Error = Error> {+    const CMD: CommandKind = CommandKind::get;+    let priority = ctx.get_priority();+    let priority_tag = get_priority_tag(priority);++    let res = self.read_pool.spawn_handle(+        async move {+            // The bypass_locks set will be checked at most once. 
`TsSet::vec` is more efficient+            // here.+            let bypass_locks = TsSet::vec_from_u64s(ctx.take_resolved_locks());+            let snapshot = Self::with_tls_engine(|engine| Self::snapshot(engine, &ctx)).await?;+            let snap_store = SnapshotStore::new(snapshot, start_ts,+                        ctx.get_isolation_level(),+                        !ctx.get_not_fill_cache(),+                        bypass_locks,+                        false);+            let result = snap_store.get(&key, &mut statistics)+                    // map storage::txn::Error -> storage::Error+                    .map_err(Error::from);+            result+        },+        priority,+        thread_rng().next_u64(),+    );+    res.map_err(|_| Error::from(ErrorInner::SchedTooBusy))+        .flatten()+}+```++This function will get a `snapshot`, and then construct a `SnapshotStore` by using the `snapshot`, and then call `get` on this `SnapshotStore`, and finally get the data we need.++The `bypass_locks` part is a tricky optimize related to [large transaction](https://pingcap.com/blog/large-transactions-in-tidb/), see [this pr](https://github.com/tikv/tikv/pull/5798).++Then we'll view the code of `SnapshotStore::get`, you'll see that in fact it consturcted a [`PointGetter`](https://github.com/tikv/tikv/blob/4ac9a68126056d1b7cf0fc9323b899253b73e577/src/storage/mvcc/reader/point_getter.rs#L133), and then call the `get` method on `PointGetter`:++```rust, no_run+pub fn get(&mut self, user_key: &Key) -> Result<Option<Value>> {+    if !self.multi {+        // Protect from calling `get()` multiple times when `multi == false`.+        if self.drained {+            return Ok(None);+        } else {+            self.drained = true;+        }+    }++    match self.isolation_level {+        IsolationLevel::Si => {+            // Check for locks that signal concurrent writes in Si.+            self.load_and_check_lock(user_key)?;+        }+        IsolationLevel::Rc => {}+    }++    self.load_data(user_key)+}+```++As we can see, if the required `isolation_level` is `Si`, we need to check whether there's any locks which may conflict with current get. If we find some, we'll return a  `KeyIsLocked` error:++```rust, no_run+fn load_and_check_lock(&mut self, user_key: &Key) -> Result<()> {+    self.statistics.lock.get += 1;+    let lock_value = self.snapshot.get_cf(CF_LOCK, user_key)?;++    if let Some(ref lock_value) = lock_value {+        self.statistics.lock.processed += 1;+        let lock = Lock::parse(lock_value)?;+        if self.met_newer_ts_data == NewerTsCheckState::NotMetYet {+            self.met_newer_ts_data = NewerTsCheckState::Met;+        }+        lock.check_ts_conflict(user_key, self.ts, &self.bypass_locks)+            .map_err(Into::into)+    } else {+        Ok(())+    }+}+```++And then we'll use `PointGetter`'s `load_data`  method to load the value.++Now we have the value in `GetResponse`, but the client still need to resolve the locked keys. This will still be handled in  `retry_response_stream`.++#### Resolve locks++First, we'll use `take_locks` to take the locks we met, and then we'll use `resolve_locks` to try to resolve them:++We find all the locks which are expired, and resolve them one by one. 
++Then we'll get `lock_version`'s corresponding `commit_version` (might be buffered), and use it to send `cleanup_request`.++It seems that using `CleanupRequest` directly is deprecated after 4.0 , then we'll simply igonre it.++And then it is the key point: [`resolve_lock_with_retry`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/lock.rs#L74), this function will construct a  `ResolveLockRequest`, and send it to TiKV to execute.++Let's turn to TiKV's source code, according to whether the `key` on the request is empty, `ResolveLockRequest` will be casted into `ResolveLockReadPhase` + `ResolveLock` or `ResolveLockLite`. The difference between those two is that `ResolveLockLite` will only handle the locks `Request` ask for resolve, while `ResolveLock` will resolve locks in a whole region.++The handling of `ResolveLock` has 2 parts: the read phase is [here](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/process.rs#L122), which is resposible for read out the locks and construct the write phase command, and the write phase is [here](https://github.com/TiKV/TiKV/blob/82d180d120e115e69512ea7f944e93e6dc5022a0/src/storage/txn/process.rs#L775), which is responsible for the release work.++These two code part uses `MvccTxn` and `MvccReader`, we'll explain them later in another article.++[Comments](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/commands/resolve_lock.rs#L17) here gives a good intruduction of what `ResolveLock` do.++And then we can go back to client-rust's `resolve_locks`, and continue with the other `expired_locks`.++And then, the result value is returned. (Finally!)++Let's summerize the process with a dataflow diagram.++![single-point-get-dfd](transaction-handling-newbie-perspective/single-point-get-dfd.svg)++### Scan++On the client side, scan is almost the same as single point get, except that it sends a [`KvScan`](https://github.com/pingcap/kvproto/blob/5f564ec8820e3b4002930f6f3dd1fcd710d4ecd0/proto/tikvpb.proto#L22) grpc call instead of `KvGet`.++And on the TiKV side, things are a little different, firstly, the request will be handled by [`future_scan`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/service/kv.rs#L1161), and then [`Storage::scan`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/mod.rs#L443),and finally we'll find out the function which really do the job is a [`Scanner`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/mvcc/reader/scanner/mod.rs#L171), and we'll cover this part in another document. ++### Write++In fact, write just write to local buffer. 
All data modifications will be sent to TiKV on commit.++### Commit++Now comes the most interesting part: commit, just like what I mentioned, commit in TiKV is based on [Percolator](https://research.google/pubs/pub36726/), but there are several things that are different:++- [Percolator](https://research.google/pubs/pub36726/) depends on BigTable's single row transaction, so we must implement something alike by ourselves in TiKV.+- We need to support pessimistic transaction.+  - This introduce some other problems such as dead lock.++So let's see how TiKV deal with these things.++#### Client++From the client side, the commit process is easy, you can see we use a [`TwoPhaseCommitter`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L249) to do the commit job, and what it does is just as the [Percolator](https://research.google/pubs/pub36726/) paper says: [`prewrite`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L278), [`commit_primary`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L293) and finally [`commit_secondary`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L310).++#### AcquirePessimisticLock++This one does not exists in client-rust for now, so you have to read TiDB's code [here](https://github.com/pingcap/tidb/blob/3748eb920300bd4bc0917ce852a14d90e8e0fafa/store/tikv/pessimistic.go#L58).++Basically, it sends a `PessimisticLockRequest` to TiKV, and TiKV will handle it [here](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/process.rs#L397), it just run `MvccTxn::acquire_pessimistic_lock` for each key to lock, which just put a lock on the key, the lock is just like the lock used in prewrite in optimistic transaction, the only differece is its type is `LockType::Pessimistic`.++And the it returns whether the lock is successful. If not, it will also [return the lock to wait for](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/process.rs#L447).++#### Prewrite++On TiKV side, the prewrite process happens [here in `process_write_impl`](https://github.com/tikv/tikv/blob/4a75902f266fbbc064f0c19a2a681cfe66511bc3/src/storage/txn/process.rs#L557).++The first few lines of code (`if rows > FORWARD_MIN_MUTATIONS_NUM` part) is not covered by the [`TiKV Source Code Reading blogs`](https://pingcap.com/blog-cn/tikv-source-code-reading-12/). I guess it means:++```+if there's no "write" record in [mutations.minKey, mutation.maxKey] {+	skip_constraint_check = true;+  scan_mode = Some(ScanMode::Forward)+}+```++As far as I understand, it just provides a optimized way of checking the "write" column, see [tikv#5846](https://github.com/tikv/tikv/pull/5846) for details.++And no matter whether this branch is taken, we'll construct a `MvccTxn` , and then use it to do the prewrite job for each mutation the client sent to the TiKV server.++The [`MvccTxn::prewrite`](https://github.com/tikv/tikv/blob/4a75902f266fbbc064f0c19a2a681cfe66511bc3/src/storage/mvcc/txn.rs#L563) function just do what the [Percolator](https://research.google/pubs/pub36726/) describes: check the `write` record in `[start_ts, ∞]` to find a newer write (this can be bypassed if `skip_constraint_check` is set, we can ignore this check safely in situations like import data). 
And then check whether the current key is locked at any timestamp. And finally use [`prewrite_key_value`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/mvcc/txn.rs#L207) to lock the key and write the value in.++##### Latches++Just as I mentioned, there's no such things like "single row transaction" in TiKV, so we need another way to prevent the key's locking state changed by another transaction during `prewrite`.++TiKV use [`Latches`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/latch.rs#L125) to archieve this, you can consider it as a Map from key('s hashcode) to mutexes. You can lock a key in the `Latches` to prevent it be used by other transactions.++The latches is used in [`try_to_wake_up`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/scheduler.rs#L335) , this is called before each command is executed, it will lock all the latches the commands used.++![prewrite-dfd](transaction-handling-newbie-perspective/prewrite-dfd.svg)++#### PrewritePessimistic++[`PrewritePessimistic`'s handling](https://github.com/tikv/tikv/blob/3a4a0c98f9efc2b409add8cb6ac9e8886bb5730c/src/storage/txn/process.rs#L624) is very similiar to `Prewrite`, except it:++- doesn't need to read the write record for checking conflict+- downgrade the pessimistic lock to optimistic lock after prewrite+- needs to prevent deadlock

Could you give an example of how deadlock can happen and why it isn't a problem for optimistic transactions please

longfangsong

comment created time in 8 days

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

##### Deadlock handling

TiKV uses deadlock detection to prevent deadlocks.

The deadlock detector is made up of two parts: the [`LockManager`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/lock_manager/mod.rs#L49) and the [`Detector`](https://github.com/tikv/tikv/blob/3a4a0c98f9efc2b409add8cb6ac9e8886bb5730c/src/server/lock_manager/deadlock.rs#L467).

Basically, these two maintain a *directed acyclic graph* of the transactions and the locks they require; if adding a node would break the "acyclic" rule, a potential deadlock has been detected. A separate doc will be added to describe the [`LockManager`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/server/lock_manager/mod.rs#L49).

#### (Do) Commit

After `prewrite` is done, the client does the commit work: first it commits the primary key, then the secondary ones. Both kinds of commit are represented by the `Commit` command and handled [here](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/txn/process.rs#L454).

In the commit process we just use [`MvccTxn::commit`](https://github.com/tikv/tikv/blob/bf716a111fde9fe8da56f8bd840c53d80c395525/src/storage/mvcc/txn.rs#L681) to commit each key, which works much like [Percolator](https://research.google/pubs/pub36726/) describes.

We also collect the released locks and use them to [wake up the waiting pessimistic transactions](https://github.com/tikv/tikv/blob/17e75b6d1d1a8f1fb419f8be249bc684b3defbdb/src/storage/txn/process.rs#L513).

### Rollback

#### (Optimistic) Rollback

On the client side, [rollback](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L327) just constructs a `BatchRollbackRequest` and sends it to the server.

Could you explain what data is sent from the client to the server, please?

longfangsong

comment created time in 8 days
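
To make the deadlock detector description above a bit more concrete: detection boils down to maintaining a wait-for graph between transactions and refusing any edge that would close a cycle. A minimal sketch, with invented types that do not mirror TiKV's `Detector`:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical wait-for graph: each transaction maps to the transactions
// whose locks it is currently waiting for.
struct WaitForGraph {
    edges: HashMap<u64, HashSet<u64>>,
}

impl WaitForGraph {
    fn new() -> Self {
        WaitForGraph { edges: HashMap::new() }
    }

    fn add_wait(&mut self, waiter: u64, holder: u64) {
        self.edges.entry(waiter).or_default().insert(holder);
    }

    // Returns true if adding `waiter -> holder` would create a cycle,
    // i.e. a potential deadlock.
    fn would_deadlock(&self, waiter: u64, holder: u64) -> bool {
        // Walk from `holder`; if we can reach `waiter`, the new edge closes a cycle.
        let mut stack = vec![holder];
        let mut seen = HashSet::new();
        while let Some(txn) = stack.pop() {
            if txn == waiter {
                return true;
            }
            if seen.insert(txn) {
                if let Some(next) = self.edges.get(&txn) {
                    stack.extend(next.iter().copied());
                }
            }
        }
        false
    }
}

fn main() {
    let mut graph = WaitForGraph::new();
    graph.add_wait(1, 2); // txn 1 waits for txn 2
    graph.add_wait(2, 3); // txn 2 waits for txn 3
    // If txn 3 now had to wait for txn 1, we would have 3 -> 1 -> 2 -> 3.
    assert!(graph.would_deadlock(3, 1));
    println!("deadlock detected as expected");
}
```

The real `Detector` is more involved, but the core check is this cycle test.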

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

#### Prewrite

On the TiKV side, the prewrite process happens [here in `process_write_impl`](https://github.com/tikv/tikv/blob/4a75902f266fbbc064f0c19a2a681cfe66511bc3/src/storage/txn/process.rs#L557).

The first few lines of code (the `if rows > FORWARD_MIN_MUTATIONS_NUM` part) are not covered by the [TiKV Source Code Reading blogs](https://pingcap.com/blog-cn/tikv-source-code-reading-12/). I guess it means:

```
if there's no "write" record in [mutations.minKey, mutations.maxKey] {
    skip_constraint_check = true;
    scan_mode = Some(ScanMode::Forward)
}
```

As far as I understand, it just provides an optimized way of checking the "write" column, see [tikv#5846](https://github.com/tikv/tikv/pull/5846) for details.

@youjiali1995 PTAL

longfangsong

comment created time in 8 days
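
The constraint check that `skip_constraint_check` bypasses is, at heart, "is there a committed write on this key with a commit timestamp at or after my `start_ts`". A toy model of that per-key lookup, using a plain `BTreeMap` in place of the real write column family (all names here are invented):

```rust
use std::collections::BTreeMap;

// Hypothetical flattened "write" column: (key, commit_ts) -> start_ts of the
// transaction that wrote it.
type WriteColumn = BTreeMap<(Vec<u8>, u64), u64>;

// Fail the prewrite if any write on `key` committed at or after `start_ts`.
fn has_newer_write(writes: &WriteColumn, key: &[u8], start_ts: u64) -> bool {
    writes
        .range((key.to_vec(), start_ts)..=(key.to_vec(), u64::MAX))
        .next()
        .is_some()
}

fn main() {
    let mut writes = WriteColumn::new();
    writes.insert((b"k1".to_vec(), 10), 5); // k1 was committed at ts 10

    assert!(has_newer_write(&writes, b"k1", 8)); // write conflict: 10 >= 8
    assert!(!has_newer_write(&writes, b"k1", 20)); // no newer write, ok to prewrite
    println!("write-conflict check behaves as expected");
}
```

When the pre-scan in the quoted pseudocode finds no write records at all in the mutation key range, this per-key lookup can be skipped, which is what the `FORWARD_MIN_MUTATIONS_NUM` branch is optimizing.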

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

### Write

In fact, write just write to local buffer. All data modifications will be sent to TiKV on commit.
In fact, write just write to local buffer. All data modifications will be sent to TiKV on prewrite.
longfangsong

comment created time in 8 days
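
The entry above is about the client-side write path: before two-phase commit starts, a put or delete only lands in the transaction's local buffer, and the buffered mutations are shipped to TiKV in the prewrite request. A sketch of the idea (the names are invented and much simpler than client-rust's actual buffer):

```rust
use std::collections::BTreeMap;

// A hypothetical per-transaction write buffer. `None` marks a deletion.
#[derive(Default)]
struct TxnBuffer {
    mutations: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl TxnBuffer {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.mutations.insert(key.to_vec(), Some(value.to_vec()));
    }

    fn delete(&mut self, key: &[u8]) {
        self.mutations.insert(key.to_vec(), None);
    }

    // Reads consult our own buffered writes before falling back to TiKV.
    fn get_local(&self, key: &[u8]) -> Option<&Option<Vec<u8>>> {
        self.mutations.get(key)
    }

    // At commit time the buffered mutations become the prewrite request.
    fn drain_for_prewrite(self) -> Vec<(Vec<u8>, Option<Vec<u8>>)> {
        self.mutations.into_iter().collect()
    }
}

fn main() {
    let mut buf = TxnBuffer::default();
    buf.put(b"k1", b"v1");
    buf.delete(b"k2");
    assert_eq!(buf.get_local(b"k1"), Some(&Some(b"v1".to_vec())));

    let mutations = buf.drain_for_prewrite();
    println!("sending {} mutations in the prewrite request", mutations.len());
}
```

This is also why the quoted `Transaction::get` earlier in the doc consults `self.buffer` before issuing a `GetRequest`.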

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

#### PrewritePessimistic

[`PrewritePessimistic`'s handling](https://github.com/tikv/tikv/blob/3a4a0c98f9efc2b409add8cb6ac9e8886bb5730c/src/storage/txn/process.rs#L624) is very similar to `Prewrite`, except that it:

- doesn't need to read the write record to check for conflicts
- downgrades the pessimistic lock to an optimistic lock after prewrite

why do we do this?

longfangsong

comment created time in 8 days
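To make the quoted `Prewrite` description above more concrete, here is a minimal, self-contained Rust sketch of the two Percolator-style checks it performs (newer committed write, existing lock). This is an illustration only, not TiKV's actual code: `MemStore`, its fields, and `PrewriteError` are hypothetical stand-ins for the real `CF_WRITE`/`CF_LOCK` column families and error types, and the value write and rollback handling are omitted.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical in-memory stand-ins for the CF_WRITE and CF_LOCK column families.
#[derive(Default)]
struct MemStore {
    // key -> (commit_ts -> start_ts of the committed write)
    writes: HashMap<Vec<u8>, BTreeMap<u64, u64>>,
    // key -> start_ts of the transaction currently holding the lock
    locks: HashMap<Vec<u8>, u64>,
}

#[derive(Debug)]
enum PrewriteError {
    WriteConflict { latest_commit_ts: u64 },
    KeyIsLocked { lock_owner_start_ts: u64 },
}

impl MemStore {
    /// Percolator-style prewrite check for one mutation:
    /// 1. reject if a newer write was committed at or after our start_ts,
    /// 2. reject if the key is locked by another transaction,
    /// 3. otherwise record the lock (writing the value is omitted here).
    fn prewrite(
        &mut self,
        key: &[u8],
        start_ts: u64,
        skip_constraint_check: bool,
    ) -> Result<(), PrewriteError> {
        if !skip_constraint_check {
            if let Some(commits) = self.writes.get(key) {
                if let Some((&commit_ts, _)) = commits.range(start_ts..).next_back() {
                    return Err(PrewriteError::WriteConflict { latest_commit_ts: commit_ts });
                }
            }
        }
        match self.locks.get(key) {
            Some(&owner) if owner != start_ts => {
                Err(PrewriteError::KeyIsLocked { lock_owner_start_ts: owner })
            }
            _ => {
                self.locks.insert(key.to_vec(), start_ts);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut store = MemStore::default();
    // A write committed at ts=15 (started at ts=10).
    store.writes.entry(b"k1".to_vec()).or_default().insert(15, 10);
    // A transaction starting at ts=12 sees the newer write: write conflict.
    println!("{:?}", store.prewrite(b"k1", 12, false));
    // A transaction starting at ts=20 passes the checks and takes the lock.
    println!("{:?}", store.prewrite(b"k1", 20, false));
}
```

In the real code the same two checks are serialized per key by the scheduler's latches described in the quote, since the check and the lock write must appear atomic.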

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

…From the client side, the commit process is easy, you can see we use a [`TwoPhaseCommitter`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L249) to do the commit job, and what it does is just as the [Percolator](https://research.google/pubs/pub36726/) paper says: [`prewrite`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L278), [`commit_primary`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L293) and finally [`commit_secondary`](https://github.com/tikv/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L310).

#### AcquirePessimisticLock

You should explain why the client wants to acquire the locks.

longfangsong

comment created time in 8 days
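For the commit flow quoted above (prewrite everything, commit the primary, then commit the secondaries), a simplified ordering sketch may help. It is not client-rust's `TwoPhaseCommitter`: the `KvClient` trait and its methods are hypothetical, and region batching, retries and lock resolution are all omitted.

```rust
// Hypothetical RPC surface; the real committer batches by region and retries.
trait KvClient {
    fn prewrite(&self, primary: &[u8], keys: &[Vec<u8>], start_ts: u64) -> Result<(), String>;
    fn commit(&self, keys: &[Vec<u8>], start_ts: u64, commit_ts: u64) -> Result<(), String>;
    fn get_ts(&self) -> u64;
}

fn two_phase_commit(client: &dyn KvClient, mut keys: Vec<Vec<u8>>, start_ts: u64) -> Result<(), String> {
    let primary = keys
        .first()
        .cloned()
        .ok_or_else(|| "nothing to commit".to_string())?;

    // Phase 1: prewrite every key, pointing each lock at the primary key.
    client.prewrite(&primary, &keys, start_ts)?;

    // Phase 2a: commit the primary at a fresh timestamp. Once this succeeds the
    // transaction is logically committed, even if the client crashes afterwards.
    let commit_ts = client.get_ts();
    client.commit(&[primary.clone()], start_ts, commit_ts)?;

    // Phase 2b: commit the secondaries. Failures here are repaired lazily by
    // readers via ResolveLock, using the primary key as the source of truth.
    keys.retain(|k| k != &primary);
    client.commit(&keys, start_ts, commit_ts)
}
```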

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

…Let's turn to TiKV's source code, according to whether the `key` on the request is empty, `ResolveLockRequest` will be casted into `ResolveLockReadPhase` + `ResolveLock` or `ResolveLockLite`. The difference between those two is that `ResolveLockLite` will only handle the locks `Request` ask for resolve, while `ResolveLock` will resolve locks in a whole region.

'casted' should be 'cast', but I think it is a bit inaccurate since it is not a type cast. Maybe 'turned into' or 'converted into'.

longfangsong

comment created time in 8 days
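The quoted split between `ResolveLockLite` and `ResolveLockReadPhase` + `ResolveLock` boils down to whether the request carries explicit keys. A rough sketch of that dispatch follows; the command and request types here are hypothetical mirrors of the relevant kvproto fields, not TiKV's actual command constructors.

```rust
// Hypothetical mirror of the relevant ResolveLockRequest fields.
struct ResolveLockRequest {
    start_version: u64,
    commit_version: u64, // 0 means "roll back the transaction"
    keys: Vec<Vec<u8>>,
}

enum Command {
    // Scans the region for locks belonging to start_version, then releases them.
    ResolveLockReadPhase { start_version: u64, commit_version: u64 },
    // Releases only the given keys; no region scan is needed.
    ResolveLockLite { start_version: u64, commit_version: u64, keys: Vec<Vec<u8>> },
}

fn into_command(req: ResolveLockRequest) -> Command {
    if req.keys.is_empty() {
        Command::ResolveLockReadPhase {
            start_version: req.start_version,
            commit_version: req.commit_version,
        }
    } else {
        Command::ResolveLockLite {
            start_version: req.start_version,
            commit_version: req.commit_version,
            keys: req.keys,
        }
    }
}
```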

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

…And then we can go back to client-rust's `resolve_locks`, and continue with the other `expired_locks`.

I'm not sure what this means

longfangsong

comment created time in 8 days

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

…And then, the result value is returned. (Finally!)

The client must resend the get message

longfangsong

comment created time in 8 days
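As the comment notes, the quoted text glosses over the retry: after resolving the blocking locks the client resends the get. A minimal sketch of that loop, using a hypothetical `KvClient` interface rather than client-rust's real `retry_response_stream`:

```rust
struct LockInfo {
    primary: Vec<u8>,
    start_ts: u64,
}

enum GetOutcome {
    Value(Option<Vec<u8>>),
    Locked(Vec<LockInfo>),
}

trait KvClient {
    fn kv_get(&self, key: &[u8], ts: u64) -> GetOutcome;
    // Would send ResolveLock requests, checking each lock's status via its primary key.
    fn resolve_locks(&self, locks: &[LockInfo]);
}

/// Retry the read until it is no longer blocked by locks.
fn get_with_resolve(client: &dyn KvClient, key: &[u8], ts: u64) -> Option<Vec<u8>> {
    loop {
        match client.kv_get(key, ts) {
            GetOutcome::Value(v) => return v,
            GetOutcome::Locked(locks) => {
                // Resolve (or wait out) the blocking locks, then resend the get.
                client.resolve_locks(&locks);
            }
        }
    }
}
```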

Pull request review comment tikv/sig-transaction

Transaction Handling Newbie Perspective

…#### Resolve locks

First, we'll use `take_locks` to take the locks we met, and then we'll use `resolve_locks` to try to resolve them:

You should clarify that this is the client doing the actions, not TiKV. Also, what is `take_locks`?

longfangsong

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:++![architecture](transaction-handling-newbie-perspective/architecture.svg)++Though client is not part of TiKV, it is also an important to read some code in it to understand how a request is handled. ++There're many implements of client, and their process of sending a request is similiar, we'll take [client-rust](https://github.com/TiKV/client-rust) as an example here.++Basically, TiKV's transaction system is based on Google's [Percolator](https://research.google/pubs/pub36726/), you are recommended to read some material about it before you start reading this.++### Begin++You'll need a client object to start a transaction.++The code which creates a transaction is [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L28), you can see the client includes a `PdRpcClient`, which is responsible for communicate with the pd component.++And then you can use [`Client::begin`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L51) to start an transaction.++```rust, no_run+pub async fn begin(&self) -> Result<Transaction> {+	let timestamp = self.current_timestamp().await?;+	Ok(self.new_transaction(timestamp))+}+```++Firstly, we'll need to get a time stamp from pd, and then we'll create a new `Transaction` object by using current timestamp.++If you dive into `self.current_timestamp` , you'll find out that in fact it will put a request into [`PD::tso`](https://github.com/pingcap/kvproto/blob/da0b8ff0603cbedc90491042e835f114537ccee8/proto/pdpb.proto#L23) [rpc](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/pd/timestamp.rs#L66)'s param stream and receive a logical timestamp from the result stream.++The remote fuction `Tso` it defined is [here](https://github.com/pingcap/pd/blob/4971825321cf9dbf15b38f19ec5a9f8f27f4ffeb/server/grpc_service.go#L74) in pd.++### Single point read++We can use `Transaction::get`  to get value for a certain key.++This part of code is [here](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L71):++```rust, no_run+pub async fn get(&self, key: impl Into<Key>) -> Result<Option<Value>> {+	let key = key.into();+	self.buffer.get_or_else(key, |key| {+		new_mvcc_get_request(key, self.timestamp).execute(self.rpc.clone())+	}).await+}+```++We'll try to read the local buffered key first. And if the local buffered key does not exist, a `GetRequest` will be sent to TiKV.++You may have known that TiKV divide all the data into different regions, and each replica of some certain region is on its own TiKV node, and pd will manage the meta infomation about where are the replicas for some certain key is.++The code above seems doesn't cover the steps which decide which TiKV node should we send the request to. But that's not the case. 
The code which do these jobs is hidden under [`execute`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/request.rs#L29), and you'll find the code which tries to get the TiKV node [here](https://github.com/tikv/client-rust/blob/b7ced1f44ed9ece4405eee6d2573a6ca6fa46379/src/pd/client.rs#L42) , and it is called by `retry_response_stream` [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/request.rs#L48):++```rust, no_code+fn store_for_key(+        self: Arc<Self>,+        key: &Key,+    ) -> BoxFuture<'static, Result<Store<Self::KvClient>>> {+        self.region_for_key(key)+            .and_then(move |region| self.map_region_to_store(region))+            .boxed()+    }+```++Firstly, it will use grpc call [`GetRegion`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L41) in `region_for_key` to find out which region is the key in.++The remote fuction `GetRegion` it defined is [here](https://github.com/pingcap/pd/blob/6dab049720f4c4e1a91405806fc1fa6517928589/server/grpc_service.go#L416) in pd.++And then we'll use grpc call [`GetStore`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L31) in `map_region_to_store` to find out the major replica of region.
And then we'll use grpc call [`GetStore`](https://github.com/pingcap/kvproto/blob/d4aeb467de2904c19a20a12de47c25213b759da1/proto/pdpb.proto#L31) in `map_region_to_store` to find out the leader of the region.
longfangsong

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:++![architecture](transaction-handling-newbie-perspective/architecture.svg)++Though client is not part of TiKV, it is also an important to read some code in it to understand how a request is handled. ++There're many implements of client, and their process of sending a request is similiar, we'll take [client-rust](https://github.com/TiKV/client-rust) as an example here.++Basically, TiKV's transaction system is based on Google's [Percolator](https://research.google/pubs/pub36726/), you are recommended to read some material about it before you start reading this.++### Begin++You'll need a client object to start a transaction.++The code which creates a transaction is [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L28), you can see the client includes a `PdRpcClient`, which is responsible for communicate with the pd component.++And then you can use [`Client::begin`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L51) to start an transaction.++```rust, no_run+pub async fn begin(&self) -> Result<Transaction> {+	let timestamp = self.current_timestamp().await?;+	Ok(self.new_transaction(timestamp))+}+```++Firstly, we'll need to get a time stamp from pd, and then we'll create a new `Transaction` object by using current timestamp.++If you dive into `self.current_timestamp` , you'll find out that in fact it will put a request into [`PD::tso`](https://github.com/pingcap/kvproto/blob/da0b8ff0603cbedc90491042e835f114537ccee8/proto/pdpb.proto#L23) [rpc](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/pd/timestamp.rs#L66)'s param stream and receive a logical timestamp from the result stream.++The remote fuction `Tso` it defined is [here](https://github.com/pingcap/pd/blob/4971825321cf9dbf15b38f19ec5a9f8f27f4ffeb/server/grpc_service.go#L74) in pd.++### Single point read++We can use `Transaction::get`  to get value for a certain key.++This part of code is [here](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/transaction/transaction.rs#L71):++```rust, no_run+pub async fn get(&self, key: impl Into<Key>) -> Result<Option<Value>> {+	let key = key.into();+	self.buffer.get_or_else(key, |key| {+		new_mvcc_get_request(key, self.timestamp).execute(self.rpc.clone())+	}).await+}

I think this code snippet is a bit confusing since it focusses on the internals of the client, rather than the transaction protocol. Maybe it is better to show the protobuf definition for Get?

longfangsong

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:++![architecture](transaction-handling-newbie-perspective/architecture.svg)++Though client is not part of TiKV, it is also an important to read some code in it to understand how a request is handled. ++There're many implements of client, and their process of sending a request is similiar, we'll take [client-rust](https://github.com/TiKV/client-rust) as an example here.++Basically, TiKV's transaction system is based on Google's [Percolator](https://research.google/pubs/pub36726/), you are recommended to read some material about it before you start reading this.++### Begin++You'll need a client object to start a transaction.++The code which creates a transaction is [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L28), you can see the client includes a `PdRpcClient`, which is responsible for communicate with the pd component.++And then you can use [`Client::begin`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L51) to start an transaction.++```rust, no_run+pub async fn begin(&self) -> Result<Transaction> {+	let timestamp = self.current_timestamp().await?;+	Ok(self.new_transaction(timestamp))+}+```++Firstly, we'll need to get a time stamp from pd, and then we'll create a new `Transaction` object by using current timestamp.++If you dive into `self.current_timestamp` , you'll find out that in fact it will put a request into [`PD::tso`](https://github.com/pingcap/kvproto/blob/da0b8ff0603cbedc90491042e835f114537ccee8/proto/pdpb.proto#L23) [rpc](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/pd/timestamp.rs#L66)'s param stream and receive a logical timestamp from the result stream.++The remote fuction `Tso` it defined is [here](https://github.com/pingcap/pd/blob/4971825321cf9dbf15b38f19ec5a9f8f27f4ffeb/server/grpc_service.go#L74) in pd.++### Single point read++We can use `Transaction::get`  to get value for a certain key.

I would explain that 'point read' means reading a single value (since this is a guide for newbies, they might not know this)

longfangsong

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:++![architecture](transaction-handling-newbie-perspective/architecture.svg)++Though client is not part of TiKV, it is also an important to read some code in it to understand how a request is handled. ++There're many implements of client, and their process of sending a request is similiar, we'll take [client-rust](https://github.com/TiKV/client-rust) as an example here.++Basically, TiKV's transaction system is based on Google's [Percolator](https://research.google/pubs/pub36726/), you are recommended to read some material about it before you start reading this.++### Begin++You'll need a client object to start a transaction.++The code which creates a transaction is [here](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L28), you can see the client includes a `PdRpcClient`, which is responsible for communicate with the pd component.++And then you can use [`Client::begin`](https://github.com/tikv/client-rust/blob/07194c4c436e393358986b84daa2ad1e41b4886c/src/transaction/client.rs#L51) to start an transaction.++```rust, no_run+pub async fn begin(&self) -> Result<Transaction> {+	let timestamp = self.current_timestamp().await?;+	Ok(self.new_transaction(timestamp))+}+```++Firstly, we'll need to get a time stamp from pd, and then we'll create a new `Transaction` object by using current timestamp.++If you dive into `self.current_timestamp` , you'll find out that in fact it will put a request into [`PD::tso`](https://github.com/pingcap/kvproto/blob/da0b8ff0603cbedc90491042e835f114537ccee8/proto/pdpb.proto#L23) [rpc](https://github.com/TiKV/client-rust/blob/fe765f191115d5ca0eb05275e45e086c2276c2ed/src/pd/timestamp.rs#L66)'s param stream and receive a logical timestamp from the result stream.++The remote fuction `Tso` it defined is [here](https://github.com/pingcap/pd/blob/4971825321cf9dbf15b38f19ec5a9f8f27f4ffeb/server/grpc_service.go#L74) in pd.

I would not use so much detail when describing the client. I think it will go out of date very quickly. Better just to state the abstract steps, like 'get a timestamp from PD'.

longfangsong

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Transaction Handling Newbie Perspective

+# Transaction Handling Process++This article will introduce how transaction requests are handled in TiKV.++The urls in this article refers to the code which performs certain operation.++In a system which consists of TiDB and TiKV, the architecture looks like this:

The TiDB bit doesn't really matter; this is the architecture no matter which client is used, it is just that in this diagram the client happens to be TiDB.

longfangsong

comment created time in 8 days

PullRequestEvent

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Command {         }     } +    pub fn read_command_mut<E: Engine>(&mut self) -> &mut dyn ReadCommand<E> {

I would leave it as just self for now and we can deal with the dispatch issue later. We could use Box rather than & since I'm sure we can't leave the commands on the stack in any case.
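For illustration, the boxed version might look something like this (just a sketch; the method name is made up and the match arms are elided):

```rust, no_run
// Sketch only: consume the command and return an owned trait object
// instead of borrowing it as &mut dyn ReadCommand<E>.
impl Command {
    pub fn into_read_command<E: Engine>(self) -> Box<dyn ReadCommand<E>> {
        match self {
            Command::MvccByKey(cmd) => Box::new(cmd),
            Command::MvccByStartTs(cmd) => Box::new(cmd),
            // ... the remaining read commands ...
            _ => panic!("not a read command"),
        }
    }
}
```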

longfangsong

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.++TODO: further elaboration++### All timestamps are unique++This no longer holds. 1PC and Async commit could break this guarantee.++Multiple transactions may have identical commit timestamps. However one transaction must have distinct start_ts and commit_ts.++### Reads never fail+++Reads never fail in the Read Committed level. It will always read the latest committed version.
Reads never fail in the read committed level. The client will always read the most recent committed version.
ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.++TODO: further elaboration++### All timestamps are unique++This no longer holds. 1PC and Async commit could break this guarantee.++Multiple transactions may have identical commit timestamps. However one transaction must have distinct start_ts and commit_ts.++### Reads never fail+++Reads never fail in the Read Committed level. It will always read the latest committed version.++Read requests can return Err in the Snapshot Isolation level if the key is locked with `lock_ts` < `read_ts`. The the client will try to resolve the lock and retry until it succeeds.++### TiKV nodes do not communicate with each other, only with a client

To clarify, they don't communicate at the transaction layer. Nodes within the same Raft group do communicate to ensure consensus at the Raft level.

ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.++TODO: further elaboration++### All timestamps are unique++This no longer holds. 1PC and Async commit could break this guarantee.++Multiple transactions may have identical commit timestamps. However one transaction must have distinct start_ts and commit_ts.

start_ts of all transactions must still be unique too

ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.++TODO: further elaboration++### All timestamps are unique++This no longer holds. 1PC and Async commit could break this guarantee.++Multiple transactions may have identical commit timestamps. However one transaction must have distinct start_ts and commit_ts.++### Reads never fail+++Reads never fail in the Read Committed level. It will always read the latest committed version.++Read requests can return Err in the Snapshot Isolation level if the key is locked with `lock_ts` < `read_ts`. The the client will try to resolve the lock and retry until it succeeds.++### TiKV nodes do not communicate with each other, only with a client++TiKV instances do not have to know each other. ++During the execution of transaction or raw kv requests, a TiKV instance will not need information from other TiKV instances. +This is guaranteed by the partitioning pattern that TiKV uses. +The whole span of data is divided into regions. +Each TiKV instance will only accept requests involving data lying in its regions, which should be guaranteed by the client.++### The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions++A TiKV instance does not have to know the topology. The client makes sure any request is sent to the right TiKV node that owns the data involved in the request.++The design decouples transaction logic and physical data distribution. It makes shceduling more flexible and elastic. +Imagine a redistribution of regions among a TiKV cluster that does not require any downtime or maintainance to either clients or TiKV instances.+PD as the scheduler can ask TiKV to redistribute regions, and send the latest region info to clients.++The overhead caused by such decoupling is extra network communication. Though clients must acquire regions' and TiKV stores' addresses from PD, these information be cached locally. 
If topology changes, client may failed some request and retry to refresh its cache. A long-live client should suffer little from it.++### If committing the primary key succeeds, then committing the secondary keys will never fail.++Even if it fails, the lock of a secondary key contains information of its primary key. Any transactions that meets the lock can recognize its state by reading the primary key and help commit the secondary key.
Even if the commit message sent to the secondary key fails, the lock of a secondary key contains information about its primary key. Any transaction that meets the lock can recognize its state by reading the primary key and help commit the secondary key.
ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.++TODO: further elaboration++### All timestamps are unique++This no longer holds. 1PC and Async commit could break this guarantee.

To clarify, I think this does still hold unless the user opts in to async commit.

ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.++TODO: further elaboration++### All timestamps are unique++This no longer holds. 1PC and Async commit could break this guarantee.++Multiple transactions may have identical commit timestamps. However one transaction must have distinct start_ts and commit_ts.++### Reads never fail+++Reads never fail in the Read Committed level. It will always read the latest committed version.++Read requests can return Err in the Snapshot Isolation level if the key is locked with `lock_ts` < `read_ts`. The the client will try to resolve the lock and retry until it succeeds.

I would clarify:

Read requests can return `KeyError` in the snapshot isolation level if the key is locked with `lock_ts` < `read_ts`. Then the client can try to resolve the lock and retry until it succeeds.
ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  TODO for each: why? implications, benefits -* Timestamps are supplied by the client, TiKV does not get them directly from PD (exception: CDC).-* All timestamps are unique.-* Reads never fail (due to the transaction protocol, there might be network issues, etc. which cause failure).-  - A locking read in a pessimistic transaction may block.-  - A non-locking read will never block.-* TiKV nodes do not communicate with each other, only with a client.-* The transaction layer does not know about region topology, in particular, it does not treat regions on the same node differently to other regions.-* If committing the primary key succeeds, then committing the secondary keys will never fail.+### Timestamps are supplied by the client++This decision benefits "user experience", performance and simplicity.++First, it gives users more control over the order of concurrent transactions.++For example, a client commits two transactions: T1 and then T2. +If timestamps are supplied by the user, it can assure that T1 won't read any effects of T2 if T1's timestamp is smaller than T2's.+While if we let TiKV get the timestamp, the user cannot get this guarantee because the order of processing T1 and T2 is nondeterministic.++Second, it simplifies the system. Otherwise we have to let TiKV maintain states of all active transactions. ++Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volumn of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.
Third, it is beneficial for performance. Letting TiKV maintain states of active transactions would lead to extra network communication. Large volume of transactions could overburden TiKV server. In addition, GC of inactive transactions is a problem.

"Letting TiKV maintain states of active transactions would lead to extra network communication" I don't think this is true - it should require at most the same amount of network communication, but would allow some optimisations which would reduce network overheads.

ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  ## Optimistic and pessimistic transactions -TiKV supports two transaction models. Optimistic transactions were implemented first and often when TiKV folks don't specify optimistic or pessimistic, they mean optimistic by default. In the optimistic model, reads and writes are built up locally. All writes are sent together in a prewrite. During prewrite, all keys to be written are locked. If any keys are locked by another transaction, return to client. If all prewrites succeed, the client sends a commit message.+TiKV supports two transaction models. Optimistic transactions were implemented first and often when TiKV folks don't specify optimistic or pessimistic, they mean optimistic by default. In the optimistic model, reads and writes are built up locally. All writes are sent together in a prewrite. During prewrite, all keys to be written are locked. If any keys are locked by another transaction, return to the client. If all prewrites succeed, the client sends a commit message. -Pessimistic transactions are the default in TiKV since 3.0.8. In the pessimistic model, there are *locking reads* (from `SELECT ... FOR UPDATE` statements), these read a value and lock the key. This means that reads can block. SQL statements which cause writes, lock the keys as they are executed. Writing to the keys is still postponed until prewrite. Prewrite and commit works+Pessimistic transactions are the default in TiKV since 3.0.8. In the pessimistic model, there are *locking reads* (from `SELECT ... FOR UPDATE` statements), these read a value and lock the key. This means that reads can block. SQL statements which cause writes, lock the keys as they are executed. Writing to the keys is still postponed until the prewrite. Prewrite and commit works

'the' is incorrect here. I can't even explain why, English just sucks.

ekexium

comment created time in 8 days

Pull request review commenttikv/sig-transaction

Add some elaboration in TiKV doc

 TODO  ## Optimistic and pessimistic transactions -TiKV supports two transaction models. Optimistic transactions were implemented first and often when TiKV folks don't specify optimistic or pessimistic, they mean optimistic by default. In the optimistic model, reads and writes are built up locally. All writes are sent together in a prewrite. During prewrite, all keys to be written are locked. If any keys are locked by another transaction, return to client. If all prewrites succeed, the client sends a commit message.+TiKV supports two transaction models. Optimistic transactions were implemented first and often when TiKV folks don't specify optimistic or pessimistic, they mean optimistic by default. In the optimistic model, reads and writes are built up locally. All writes are sent together in a prewrite. During prewrite, all keys to be written are locked. If any keys are locked by another transaction, return to the client. If all prewrites succeed, the client sends a commit message.

'the' is not needed here; if we do use it, then it should be 'return them to the client'.

ekexium

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Command {         }     } +    pub fn read_command_mut<E: Engine>(&mut self) -> &mut dyn ReadCommand<E> {

I see, so then why can't we use self rather than &mut self?

longfangsong

comment created time in 8 days

pull request commenttikv/tikv

txn: Change the order of `ResolveLock`'s read and write phase

@youjiali1995 actually, could you explain the performance constraints, please? Do reads, writes, or both cause pressure? When do we worry about contention? How should we optimally handle multiple resolve lock requests? How should we ideally prioritise resolve locks compared to other transactions? Am I correct that GC only uses ResolveLocks, but 'normal' transactions might use both lite and non-lite resolve locks? Furthermore, in that case we don't distinguish between resolve locks caused by GC and those caused by other transactions?

longfangsong

comment created time in 8 days

pull request commenttikv/tikv

txn: Change the order of `ResolveLock`'s read and write phase

I'm thinking is it a good idea to do it because if there are many locks in one tikv, I'm afraid the scheduler-pool will be exhausted.

@youjiali1995 do you mean that since we don't wait for the write phases to finish before doing more reads, we will generate so many read and write commands that we won't have time to do other work?

longfangsong

comment created time in 8 days

pull request commenttikv/tikv

txn: Change the order of `ResolveLock`'s read and write phase

This is a great PR, but it is difficult for me to get my head around, so I'd like to play around with the code a little bit (which is why I haven't reviewed yet). I'll do so today.

longfangsong

comment created time in 8 days

push eventlongfangsong/tikv

Edward Elric

commit sha 792dca542ef2f6712a8c622803a245ee26c9b56c

copr: remove nullable signature for expression (#8340) Signed-off-by: Edward Elric <sasuke688848@gmail.com>

view details

Nick Cameron

commit sha 032d9df32911bae78c5cb79ffe083a780c94f1b9

Merge branch 'master' into reorder-resolve-lock-rw

view details

push time in 8 days

PR closed pingcap/kvproto

Reviewers
Add a key error field to CheckSecondaryLocks

Was missing from the initial PR.

PTAL @sticnarf @youjiali1995

+614 -557

5 comments

2 changed files

nrc

pr closed time in 8 days

pull request commentpingcap/kvproto

Add a key error field to CheckSecondaryLocks

Having written the actual code, I also don't think this is necessary. Closing for now, sorry for the noise.

nrc

comment created time in 8 days

push eventlongfangsong/tikv

Edward Elric

commit sha 792dca542ef2f6712a8c622803a245ee26c9b56c

copr: remove nullable signature for expression (#8340) Signed-off-by: Edward Elric <sasuke688848@gmail.com>

view details

Nick Cameron

commit sha 6b71043bd1585327404ee2bbc27ac9111b043c70

Merge branch 'master' into refactor

view details

push time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 fn extract_lock_from_result<T>(res: &StorageResult<T>) -> Lock {     } } -pub(super) struct WriteResult {+pub struct WriteResult {

Same with WriteResult.

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Debug for Command {         self.command_ext().fmt(f)     } }++pub trait ReadCommand<E: Engine>: CommandExt {+    fn process_read(+        &mut self,+        snapshot: E::Snap,+        statistics: &mut Statistics,+    ) -> Result<ProcessResult>;+}++pub trait WriteCommand<S: Snapshot, L: LockManager, P: PdClient + 'static>: CommandExt {

I think rather than introduce two new traits, it is better to add the process_* methods to the CommandExt trait and have them do nothing in the case that the command does not read or does not write. That might mean we have to return Option<Result<_>>, rather than just a Result.
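To show what I mean, roughly (a sketch only; the generics and the exact parameter lists would need adjusting to match the real code):

```rust, no_run
// Sketch: default methods on the existing trait instead of two new traits.
// Commands that don't read (or don't write) simply inherit the None default.
pub trait CommandExt {
    // ... existing items (tag, ts, write_bytes, ...) ...

    fn process_read<S: Snapshot>(
        &mut self,
        _snapshot: S,
        _statistics: &mut Statistics,
    ) -> Option<Result<ProcessResult>> {
        None
    }

    fn process_write<S: Snapshot, L: LockManager>(
        &mut self,
        _snapshot: S,
        _lock_mgr: &L,
    ) -> Option<Result<WriteResult>> {
        None
    }
}
```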

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Command {         }     } +    pub fn read_command_mut<E: Engine>(&mut self) -> &mut dyn ReadCommand<E> {

Looking at this code, it seems to me that Command should just be a trait, rather than an enum; then we could skip these methods and the as_ref/as_mut methods too. Maybe we could get rid of TypedCommand too? If this is a lot of work, let's leave it for another PR.
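Roughly what I'm picturing (very much a sketch; I haven't thought through object safety or where TypedCommand would fit):

```rust, no_run
// Sketch: Command as a trait rather than an enum. The scheduler would then
// hold a Box<dyn Command<E::Snap>> and call methods on it directly, with no
// need for read_command_mut or the as_ref/as_mut accessors.
pub trait Command<S: Snapshot>: Send {
    fn tag(&self) -> CommandKind;
    fn process_read(
        self: Box<Self>,
        snapshot: S,
        statistics: &mut Statistics,
    ) -> Result<ProcessResult>;
    // ... write path, ts(), write_bytes(), etc. ...
}
```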

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl<E: Engine, L: LockManager, P: PdClient + 'static> Scheduler<E, L, P> {      /// Processes a read command within a worker thread, then posts `ReadFinished` message back to the     /// `Scheduler`.-    fn process_read(self, snapshot: E::Snap, task: Task, statistics: &mut Statistics) {+    fn process_read(self, snapshot: E::Snap, mut task: Task, statistics: &mut Statistics) {         fail_point!("txn_before_process_read");         debug!("process read cmd in worker pool"; "cid" => task.cid);          let tag = task.cmd.tag(); -        let pr = match process_read_impl::<E>(task.cmd, snapshot, statistics) {+        let pr = match task

We can use map_err here rather than matching.
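Something along these lines (sketch only; depending on what the match actually does, unwrap_or_else may be the better fit):

```rust, no_run
// Sketch: collapse the match into a combinator chain. map_err is enough if
// the match only converts the error type; unwrap_or_else fits if the error
// has to become a ProcessResult::Failed, as shown here.
let pr = task
    .cmd
    .read_command_mut::<E>()
    .process_read(snapshot, statistics)
    .unwrap_or_else(|e| ProcessResult::Failed { err: e.into() });
```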

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 fn find_mvcc_infos_by_key<S: Snapshot>( #[cfg(test)]

Can we move the tests to the commands too?

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 // Copyright 2018 TiKV Project Authors. Licensed under Apache-2.0. -use std::sync::Arc;-use std::time::Duration;-use std::{mem, thread, u64};--use engine_traits::CF_WRITE;-use kvproto::kvrpcpb::{Context, ExtraOp, LockInfo};-use pd_client::PdClient;-use tikv_util::collections::HashMap;-use txn_types::{Key, Value};--use crate::storage::kv::{Engine, ScanMode, Snapshot, Statistics, WriteData};+use crate::storage::kv::{Snapshot, WriteData}; use crate::storage::lock_manager::{self, Lock, LockManager, WaitTimeout}; use crate::storage::mvcc::{-    has_data_in_range, Error as MvccError, ErrorInner as MvccErrorInner, Lock as MvccLock,-    MvccReader, MvccTxn, ReleasedLock, TimeStamp, Write, MAX_TXN_WRITE_SIZE,-};-use crate::storage::txn::{-    commands::{-        AcquirePessimisticLock, CheckTxnStatus, Cleanup, Command, Commit, MvccByKey, MvccByStartTs,-        Pause, PessimisticRollback, Prewrite, PrewritePessimistic, ResolveLock, ResolveLockLite,-        ResolveLockReadPhase, Rollback, ScanLock, TxnHeartBeat,-    },-    sched_pool::*,-    Error, ErrorInner, ProcessResult, Result,+    Error as MvccError, ErrorInner as MvccErrorInner, Lock as MvccLock, MvccReader, ReleasedLock,+    TimeStamp, Write, };+use crate::storage::txn::{Error, ErrorInner, ProcessResult, Result}; use crate::storage::{-    types::{MvccInfo, PessimisticLockRes, PrewriteResult, TxnStatus},     Error as StorageError, ErrorInner as StorageErrorInner, Result as StorageResult, };+use kvproto::kvrpcpb::Context;+use txn_types::{Key, Value};  // To resolve a key, the write size is about 100~150 bytes, depending on key and value length. // The write batch will be around 32KB if we scan 256 keys each time. pub const RESOLVE_LOCK_BATCH_SIZE: usize = 256; -const FORWARD_MIN_MUTATIONS_NUM: usize = 12;--pub(super) fn process_read_impl<E: Engine>(-    mut cmd: Command,-    snapshot: E::Snap,-    statistics: &mut Statistics,-) -> Result<ProcessResult> {-    let tag = cmd.tag();-    match cmd {-        Command::MvccByKey(MvccByKey { ref key, ref ctx }) => {-            let mut reader = MvccReader::new(-                snapshot,-                Some(ScanMode::Forward),-                !ctx.get_not_fill_cache(),-                ctx.get_isolation_level(),-            );-            let result = find_mvcc_infos_by_key(&mut reader, key, TimeStamp::max());-            statistics.add(reader.get_statistics());-            let (lock, writes, values) = result?;-            Ok(ProcessResult::MvccKey {-                mvcc: MvccInfo {-                    lock,-                    writes,-                    values,-                },-            })-        }-        Command::MvccByStartTs(MvccByStartTs { start_ts, ctx }) => {-            let mut reader = MvccReader::new(-                snapshot,-                Some(ScanMode::Forward),-                !ctx.get_not_fill_cache(),-                ctx.get_isolation_level(),-            );-            match reader.seek_ts(start_ts)? 
{-                Some(key) => {-                    let result = find_mvcc_infos_by_key(&mut reader, &key, TimeStamp::max());-                    statistics.add(reader.get_statistics());-                    let (lock, writes, values) = result?;-                    Ok(ProcessResult::MvccStartTs {-                        mvcc: Some((-                            key,-                            MvccInfo {-                                lock,-                                writes,-                                values,-                            },-                        )),-                    })-                }-                None => Ok(ProcessResult::MvccStartTs { mvcc: None }),-            }-        }-        // Scans locks with timestamp <= `max_ts`-        Command::ScanLock(ScanLock {-            max_ts,-            ref start_key,-            limit,-            ref ctx,-            ..-        }) => {-            let mut reader = MvccReader::new(-                snapshot,-                Some(ScanMode::Forward),-                !ctx.get_not_fill_cache(),-                ctx.get_isolation_level(),-            );-            let result = reader.scan_locks(start_key.as_ref(), |lock| lock.ts <= max_ts, limit);-            statistics.add(reader.get_statistics());-            let (kv_pairs, _) = result?;-            let mut locks = Vec::with_capacity(kv_pairs.len());-            for (key, lock) in kv_pairs {-                let mut lock_info = LockInfo::default();-                lock_info.set_primary_lock(lock.primary);-                lock_info.set_lock_version(lock.ts.into_inner());-                lock_info.set_key(key.into_raw()?);-                lock_info.set_lock_ttl(lock.ttl);-                lock_info.set_txn_size(lock.txn_size);-                locks.push(lock_info);-            }--            tls_collect_keyread_histogram_vec(tag.get_str(), locks.len() as f64);--            Ok(ProcessResult::Locks { locks })-        }-        Command::ResolveLockReadPhase(ResolveLockReadPhase {-            ref mut txn_status,-            ref scan_key,-            ref ctx,-            ..-        }) => {-            let mut reader = MvccReader::new(-                snapshot,-                Some(ScanMode::Forward),-                !ctx.get_not_fill_cache(),-                ctx.get_isolation_level(),-            );-            let result = reader.scan_locks(-                scan_key.as_ref(),-                |lock| txn_status.contains_key(&lock.ts),-                RESOLVE_LOCK_BATCH_SIZE,-            );-            statistics.add(reader.get_statistics());-            let (kv_pairs, has_remain) = result?;-            tls_collect_keyread_histogram_vec(tag.get_str(), kv_pairs.len() as f64);--            if kv_pairs.is_empty() {-                Ok(ProcessResult::Res)-            } else {-                let next_scan_key = if has_remain {-                    // There might be more locks.-                    kv_pairs.last().map(|(k, _lock)| k.clone())-                } else {-                    // All locks are scanned-                    None-                };-                Ok(ProcessResult::NextCommand {-                    cmd: ResolveLock::new(-                        mem::take(txn_status),-                        next_scan_key,-                        kv_pairs,-                        ctx.clone(),-                    )-                    .into(),-                })-            }-        }-        _ => panic!("unsupported read command"),-    }-}- #[derive(Default)]-struct ReleasedLocks 
{+pub struct ReleasedLocks {

We could move ReleasedLocks to commands/mod.rs and keep it private
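For example (sketch, fields elided), in commands/mod.rs:

```rust, no_run
// Sketch: defined in commands/mod.rs without `pub`, so it is still visible to
// the command submodules (children can see their parent's private items) but
// not to the rest of the crate.
struct ReleasedLocks {
    // ... same fields as the current definition ...
}
```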

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Debug for Command {         self.command_ext().fmt(f)     } }++pub trait ReadCommand<E: Engine>: CommandExt {

Why parameterize over the Engine? It seems we only need E::Snap, so this should parameterize over Snapshot like WriteCommand does.
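i.e. something like this (sketch), mirroring WriteCommand:

```rust, no_run
// Sketch: take the snapshot type directly, like WriteCommand does.
pub trait ReadCommand<S: Snapshot>: CommandExt {
    fn process_read(
        &mut self,
        snapshot: S,
        statistics: &mut Statistics,
    ) -> Result<ProcessResult>;
}
```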

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Command {         }     } +    pub fn read_command_mut<E: Engine>(&mut self) -> &mut dyn ReadCommand<E> {

Where is the copy? I see that we convert to a trait object, but it seems to only be a reference copy?

longfangsong

comment created time in 8 days

Pull request review commenttikv/tikv

txn: Move Command's read or write process to their own file

 impl Debug for Command {         self.command_ext().fmt(f)     } }++pub trait ReadCommand<E: Engine>: CommandExt {+    fn process_read(+        &mut self,+        snapshot: E::Snap,+        statistics: &mut Statistics,+    ) -> Result<ProcessResult>;+}++pub trait WriteCommand<S: Snapshot, L: LockManager, P: PdClient + 'static>: CommandExt {+    fn process_write(+        &mut self,+        snapshot: S,+        lock_mgr: &L,+        pd_client: Arc<P>,+        extra_op: ExtraOp,+        statistics: &mut Statistics,+        pipelined_pessimistic_lock: bool,

I think we can probably factor these args out into a struct (or maybe two structs; I feel like the snapshot, lock manager, and pd client are some abstraction over the storage layer, and the rest are some kind of context).
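For example, something roughly like this (all the names here are made up, just to show the split):

```rust, no_run
// Sketch of a possible split; none of these names exist in the PR.
// The first struct groups the storage-layer handles, the second the
// per-invocation context.
pub struct WriteBackend<'a, S: Snapshot, L: LockManager, P: PdClient + 'static> {
    pub snapshot: S,
    pub lock_mgr: &'a L,
    pub pd_client: Arc<P>,
}

pub struct WriteContext<'a> {
    pub extra_op: ExtraOp,
    pub statistics: &'a mut Statistics,
    pub pipelined_pessimistic_lock: bool,
}

pub trait WriteCommand<S: Snapshot, L: LockManager, P: PdClient + 'static>: CommandExt {
    fn process_write(
        &mut self,
        backend: WriteBackend<'_, S, L, P>,
        context: WriteContext<'_>,
    ) -> Result<WriteResult>;
}
```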

longfangsong

comment created time in 8 days

push eventyoujiali1995/tikv

Iosmanthus Teng

commit sha 055c5750bebaad1b673035382df172fc13a534a4

copr: fix index out of range in analyze column (#8298) Signed-off-by: iosmanthus <myosmanthustree@gmail.com>

view details

5kbpers

commit sha d889068d178c246711f05196c30a1b202804c6d6

cdc: support to sink old value (#8201) * raftstore: support to output old value to CmdObserver Signed-off-by: 5kbpers <tangminghua@pingcap.com> * cdc: support to output old value Signed-off-by: 5kbpers <tangminghua@pingcap.com> * close ExtraOp when unsubcribing region Signed-off-by: 5kbpers <tangminghua@pingcap.com> * make format Signed-off-by: 5kbpers <tangminghua@pingcap.com> * fix test Signed-off-by: 5kbpers <tangminghua@pingcap.com> * remove unuseful log Signed-off-by: 5kbpers <tangminghua@pingcap.com> * revert cdc/Cargo.toml Signed-off-by: 5kbpers <tangminghua@pingcap.com> * integration test Signed-off-by: 5kbpers <tangminghua@pingcap.com> * fix test build for protobuf Signed-off-by: 5kbpers <tangminghua@pingcap.com> * make clippy Signed-off-by: 5kbpers <tangminghua@pingcap.com> * address comments Signed-off-by: 5kbpers <tangminghua@pingcap.com> * make clippy Signed-off-by: 5kbpers <tangminghua@pingcap.com> * address comments Signed-off-by: 5kbpers <tangminghua@pingcap.com> * address comments Signed-off-by: 5kbpers <tangminghua@pingcap.com>

view details

庄天翼

commit sha f168104d3e678a9f83c9a7581110fffd93ad6e92

copr: remove nullable signature for expression like (#8331) Signed-off-by: TennyZhuang <zty0826@gmail.com>

view details

phosphorus

commit sha a4f38fb45195f11576e1d888cab8212b5d43b1d7

copr: remove nullable signature for expression (#8333) Signed-off-by: Phosphorus15 <steepout@qq.com>

view details

Tianxiao Shen

commit sha bf716a111fde9fe8da56f8bd840c53d80c395525

copr: remove nullable signature for math expressions (#8335) Signed-off-by: xxchan <xxchan@sjtu.edu.cn>

view details

Lei Zhao

commit sha 307e27edc4a99a8bd7d5de0067a520d43276896f

storage: remove useless codes (#8342) Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

view details

Edward Elric

commit sha 792dca542ef2f6712a8c622803a245ee26c9b56c

copr: remove nullable signature for expression (#8340) Signed-off-by: Edward Elric <sasuke688848@gmail.com>

view details

Nick Cameron

commit sha 5720da80dc16ba86942011d73d8ebc263bf07590

Merge branch 'master' into remove-assert

view details

push time in 8 days

pull request commenttikv/tikv

storage: remove useless codes

/merge

youjiali1995

comment created time in 9 days

push eventtikv/client-rust

Nick Cameron

commit sha a6414c0aa544d249682682e24d8678c2a1d1f2bd

Move last_connected time from PD cluster to RetryClient Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha f16df2dbc5c5f2a0c0350e2976649b165711ec62

Make tso crate-private Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha bc9ae7103edbca9d0ab8ddf6b64a14bd796df096

Refactor pd clients Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha a70aa203b0e8ae58f239950a5b746813d85014a3

Move some functionality for pd cluster up a layer Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha d110ad1ecb1deb3380702b3c68990a29a78a5cdd

Move region and store from common to store crate Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha 92a1649e487c16401dfda57101834772791ff4dc

Tidy up Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha 9d9dca410312566f3f65361b708df5439f3aea32

Merge pull request #158 from nrc/refactor Refactor PD crate to make it more independent

view details

push time in 9 days

PR merged tikv/client-rust

Refactor PD crate to make it more independent

This PR refactors the PD crate and pd/retry to make the PD crate lower level and more independent. That in turn let me move some stuff out of the common crate. The tricky bit was refactoring retry to work with async/await. I have converted a bunch of code from futures to async/await along the way.

PTAL @sticnarf @ekexium

+304 -300

1 comment

17 changed files

nrc

pr closed time in 9 days

push eventnrc/rustaceans.org

Dan Bruder

commit sha 5cd7fa0bd0a2cf1681d44e943a4988bb93ea1410

Update danbruder.json

view details

Nick Cameron

commit sha 69971ba464bbd3dffe589a5b4ef374f655ed1d94

Merge pull request #641 from danbruder/patch-1 Update danbruder.json email and notes

view details

push time in 9 days

issue commenttikv/tikv

Async Commit

@abbccdda yes - if TiDB attempts to read and finds a lock, it will block until the lock times out or is committed/rolled back. If it times out, then TiDB starts the recovery procedure, at the end of which the transaction is committed or rolled back, and the value reported to the client.

In the happy case, as soon as all writes are visible, TiDB sends a commit message, so only clients who try to read before that message happens will go through the recovery procedure.
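In pseudo-code, the reader's path is roughly the following (just a sketch of the flow; all names are illustrative, not real TiDB or TiKV APIs):

```rust, no_run
// Pseudo-code sketch of the read path described above; names are made up.
fn read(key: Key, read_ts: TimeStamp) -> Value {
    loop {
        match tikv_get(&key, read_ts) {
            Ok(value) => return value,
            // The lock's TTL has expired: run the recovery procedure, which
            // ends with the transaction committed or rolled back, then retry.
            Err(ReadError::Locked(lock)) if lock.expired() => recover(lock),
            // The lock is still live: wait, either for the commit message or
            // for the TTL to expire, then retry.
            Err(ReadError::Locked(_)) => backoff(),
            Err(other) => panic!("unexpected error: {:?}", other),
        }
    }
}
```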

zhangjinpeng1987

comment created time in 9 days

PR opened tikv/tikv

WIP implement check secondary locks component/storage component/transaction sig/transaction status/WIP

Signed-off-by: Nick Cameron nrc@ncameron.org

What problem does this PR solve?

WIP - blocked on https://github.com/pingcap/kvproto/pull/657 and implementing a test

Issue Number: cc #8316

Problem Summary: Implements a command for checking the status of async commit locks

What is changed and how it works?

What's Changed: Adds command boilerplate, plus implementation in process.rs/txn.rs for CheckSecondaryLocks.

Tests

  • Unit test - TODO

Release note

No release note (partial implementation only)

+229 -19

0 comment

11 changed files

pr created time in 9 days

create barnchnrc/tikv

branch : async3

created branch time in 9 days

PR opened pingcap/kvproto

Reviewers
Add a key error field to CheckSecondaryLocks

Was missing from the initial PR.

PTAL @sticnarf @youjiali1995

+614 -557

0 comment

2 changed files

pr created time in 9 days

create barnchnrc/kvproto

branch : check-err

created branch time in 9 days

push eventnrc/client-rust

longfangsong

commit sha f02bdf08204bad943071bdbfe56ca416a45a6559

remove compact code Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 0845b893d028cefa9bb65a768c245965a9f5133a

fix ci Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 0fccfed5cd7ca12977ccb307401b316970bf0371

specify rust toolchain Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha d6ef4db3f2fa5e960d9e90b134d8253266860025

add nightly toolchain Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 519b360c65eeca7917a93b493f422831318269e5

add nightly toolchain Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha f8510f960241f1ba6cac78b40869db6645628f50

Merge remote-tracking branch 'upstream/master' into remove-compact # Conflicts: # Cargo.toml # src/kv_client/mod.rs # tikv-client-pd/src/cluster.rs

view details

longfangsong

commit sha 484d0bb0a3d7c12fc0031cc9f6185ad3be80e686

remove compact code Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha fa157d440c327177425b81e7d6114e7649ddf08c

merge master into this Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 325ca2c9028b6a4688e6142be96520f45d857671

Remove rust-toolchain and unnecessary restrictions to rust version in .travis.yml Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 100ae95a21c372799ee16ea21bd7099b0e3a7c6d

Update dependencies Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 913da196f0be6c7e91e49a3cac8930f9117af299

Remove unnecessary file Thanks to @nrc Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

Nick Cameron

commit sha 07194c4c436e393358986b84daa2ad1e41b4886c

Merge pull request #149 from longfangsong/remove-compact Remove compat

view details

Nick Cameron

commit sha a6414c0aa544d249682682e24d8678c2a1d1f2bd

Move last_connected time from PD cluster to RetryClient Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha f16df2dbc5c5f2a0c0350e2976649b165711ec62

Make tso crate-private Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha bc9ae7103edbca9d0ab8ddf6b64a14bd796df096

Refactor pd clients Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha a70aa203b0e8ae58f239950a5b746813d85014a3

Move some functionality for pd cluster up a layer Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha d110ad1ecb1deb3380702b3c68990a29a78a5cdd

Move region and store from common to store crate Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

Nick Cameron

commit sha 92a1649e487c16401dfda57101834772791ff4dc

Tidy up Signed-off-by: Nick Cameron <nrc@ncameron.org>

view details

push time in 9 days

push event tikv/client-rust

longfangsong

commit sha f02bdf08204bad943071bdbfe56ca416a45a6559

remove compact code Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 0845b893d028cefa9bb65a768c245965a9f5133a

fix ci Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 0fccfed5cd7ca12977ccb307401b316970bf0371

specify rust toolchain Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha d6ef4db3f2fa5e960d9e90b134d8253266860025

add nightly toolchain Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 519b360c65eeca7917a93b493f422831318269e5

add nightly toolchain Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha f8510f960241f1ba6cac78b40869db6645628f50

Merge remote-tracking branch 'upstream/master' into remove-compact # Conflicts: # Cargo.toml # src/kv_client/mod.rs # tikv-client-pd/src/cluster.rs

view details

longfangsong

commit sha 484d0bb0a3d7c12fc0031cc9f6185ad3be80e686

remove compact code Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha fa157d440c327177425b81e7d6114e7649ddf08c

merge master into this Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 325ca2c9028b6a4688e6142be96520f45d857671

Remove rust-toolchain and unnecessary restrictions to rust version in .travis.yml Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 100ae95a21c372799ee16ea21bd7099b0e3a7c6d

Update dependencies Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

longfangsong

commit sha 913da196f0be6c7e91e49a3cac8930f9117af299

Remove unnecessary file Thanks to @nrc Signed-off-by: longfangsong <longfangsong@icloud.com>

view details

Nick Cameron

commit sha 07194c4c436e393358986b84daa2ad1e41b4886c

Merge pull request #149 from longfangsong/remove-compact Remove compat

view details

push time in 9 days

issue closed tikv/client-rust

Upgrade gRPC and remove futures compat code

gRPC-rs now supports futures 0.3, so if we upgrade we shouldn't need the compatibility shims that convert futures from 0.1 to 0.3.
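
As a hedged illustration of the kind of shim that becomes unnecessary (the function names and payload types below are made up for the sketch; only the futures::compat::Future01CompatExt adapter is a real API, provided by the futures 0.3 crate's "compat" feature, with futures 0.1 assumed to be a dependency renamed to futures01):

// Before the upgrade: the RPC layer hands back a futures 0.1 future, so it
// has to be adapted with `.compat()` before it can be awaited.
use futures::compat::Future01CompatExt;

async fn call_with_shim<F>(rpc_01: F) -> Result<Vec<u8>, ()>
where
    F: futures01::Future<Item = Vec<u8>, Error = ()>,
{
    // Convert the 0.1 future into a std::future::Future and await it.
    rpc_01.compat().await
}

// After the upgrade: the RPC layer returns a std::future directly, so the
// adapter (and the extra futures 0.1 dependency) can be deleted.
async fn call_without_shim<F>(rpc_03: F) -> Result<Vec<u8>, ()>
where
    F: std::future::Future<Output = Result<Vec<u8>, ()>>,
{
    rpc_03.await
}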

closed time in 9 days
