Michael Sproul michaelsproul @sigp Sydney, Australia https://sproul.xyz/ Building Ethereum 2.0 at Sigma Prime. I like types, specs and proofs.

michaelsproul/aus_senate 25

Implementation of the Australian senate voting algorithm

michaelsproul/autozfs 13

Auto-mount ZFS external hard drives (macOS)

michaelsproul/bullshit 11

Simple horoscope generator in Python

eth2-clients/slashing-protection-interchange-tests 5

Tests for the slashing database interchange format

michaelsproul/bulk-pay-aus 4

Make batch payments from Australian banks

michaelsproul/colourcat 4

Tool for distinctly colouring numbers, hashes and identifiers in log files

michaelsproul/arduino-power-monitor 2

Monitor household power usage with an Arduino.

michaelsproul/.files 1

Configuration files and symlink deploy script

delete branch sigp/lighthouse

delete branch : fix-eth1

delete time in 13 hours


Pull request review comment sigp/lighthouse

Ensure eth1 deposit/chain IDs are used from YamlConfig

```diff
 impl YamlConfig {
             genesis_slot: chain_spec.genesis_slot,
             far_future_epoch: chain_spec.far_future_epoch,
             base_rewards_per_epoch: chain_spec.base_rewards_per_epoch,
-            deposit_chain_id: chain_spec.deposit_chain_id,
-            deposit_network_id: chain_spec.deposit_network_id,
+            deposit_chain_id: self.deposit_chain_id,
+            deposit_network_id: self.deposit_network_id,
```

These should go in a different section above (their own section, I think), rather than under the "Constants, not configurable" heading IMO

paulhauner

comment created time in 15 hours


issue comment Alethio/eth2stats-client

Eth2Stats client using 1.5GB memory

I'm running v0.14.0 now with Lighthouse v0.3.1 and there are no longer any issues (Lighthouse's op pool is under control as of v0.3.1).

michaelsproul

comment created time in 15 hours

delete branch sigp/lighthouse

delete branch : clippy-tidy

delete time in 18 hours

issue opened sigp/lighthouse

Reduce validator client duty traffic

Description

A user on Discord is reporting their VC falling behind on duties when running with a high number of validators (1k), particularly when running the VC and BN on different machines. They're seeing errors like this:

beacon node:
19:33:44.010 WARN Error processing HTTP API request       method: GET, path: /eth/v1/validator/duties/proposer/18548, status: 400 Bad Request, elapsed: 2.045724ms

validator:
19:33:44.326 ERRO Failed to download validator duties     error: Failed to get proposer indices: ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: requested epoch is 18548 but only current epoch 18549 is allowed", stacktraces: [] }), service: duties

19:33:44.327 WARN Skipping block production for expired slot, info: Your machine could be overloaded, notification_slot: 593567, current_slot: 593568, service: block

I suspect the cause of the issue is the (serial) loading of individual duties every slot, here:

https://github.com/sigp/lighthouse/blob/eba51f0973e0636e34d608ebdf501b64a6696a3d/validator_client/src/duties_service.rs#L592-L610

We could improve the situation by requesting duties less often (not as safe), in bulk (not sure if this is supported by the standard API), or in parallel. More thought and testing required.
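As a rough illustration of the "in parallel" option, a minimal sketch assuming the `futures` crate; `fetch_duty` and `Duty` are hypothetical stand-ins for the real call in `duties_service.rs`, not the actual Lighthouse API:

```rust
// Hypothetical sketch: fire all per-validator duty requests concurrently
// instead of awaiting them one at a time.
use futures::future::join_all;

#[derive(Debug)]
struct Duty {
    validator_index: u64,
    epoch: u64,
}

// Stand-in for the real HTTP request to the beacon node.
async fn fetch_duty(validator_index: u64, epoch: u64) -> Result<Duty, String> {
    Ok(Duty { validator_index, epoch })
}

async fn fetch_all_duties(indices: &[u64], epoch: u64) -> Vec<Result<Duty, String>> {
    // All requests are in flight at once, so total latency is roughly the
    // slowest single request rather than the sum of all of them.
    join_all(indices.iter().map(|&i| fetch_duty(i, epoch))).await
}
```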

Version

Lighthouse v0.3.x (presumably)

created time in 18 hours


issue comment sigp/lighthouse

Investigate flock-based lock files

For the record, Teku deliberately designed our lock files to be compatible with Lighthouse's so the locking works across clients.

Wow, I didn't know this!

We might be able to remain mostly compatible, in that Lighthouse could delete its lock files in the best case, and use the OS lock in the same way Teku uses the PID.

michaelsproul

comment created time in 19 hours

started ConsenSys/eth2.0-dafny

started time in 2 days

issue opened sigp/lighthouse

Investigate flock-based lock files

At the moment we use a hand-rolled implementation of lock files that requires the process to exit cleanly in order for the locks to get cleaned up. I suspect we could achieve better guarantees and better UX using file locking primitives provided by the OS. The most common and widely-supported syscall seems to be flock, which allows a process to lock a file exclusively, releasing it only once the file is closed (which happens regardless of how the process exits). This means we'd move from a paradigm of checking if lock files exist on start-up, to acquiring locks on files that may already exist, and we would no longer need to delete the lock files on shutdown.

There are a few Rust crates providing high-level cross-platform wrappers over syscalls like flock which I think we should investigate:

  • https://docs.rs/fs3/0.5.0/fs3/trait.FileExt.html
  • https://docs.rs/file-lock/1.1.20/file_lock/index.html
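For illustration, a minimal sketch of the flock-style approach using the fs3 crate from the first link; the function and error handling here are assumptions, not proposed Lighthouse code:

```rust
use fs3::FileExt;
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

fn acquire_lock(path: &Path) -> io::Result<File> {
    // The lock file may already exist from a previous (possibly crashed) run;
    // with flock that's fine -- what matters is whether anyone currently holds
    // the lock, not whether the file is present.
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;
    // Fails immediately if another process holds the lock. The OS releases the
    // lock when `file` is dropped or the process exits, however that happens.
    file.try_lock_exclusive()?;
    Ok(file) // keep this handle alive for as long as the lock is needed
}
```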

created time in 2 days

PR opened sigp/lighthouse

Update slashing protection interchange to v5

Proposed Changes

Update the slashing protection interchange format to v5 in preparation for finalisation as part of an EIP.

Also, add some more tests and update the commit hash for https://github.com/eth2-clients/slashing-protection-interchange-tests to include the new generated tests.

+90 -28

0 comment

5 changed files

pr created time in 4 days

push event sigp/lighthouse

Michael Sproul

commit sha 73ed5ffe2b175ac9106d4b4a21b4e88661a698cf

Update slashing protection interchange to v5

view details

push time in 4 days

create branch sigp/lighthouse

branch : interchange-v5

created branch time in 4 days

delete branch eth2-clients/slashing-protection-interchange-tests

delete branch : ci

delete time in 4 days

delete branch eth2-clients/slashing-protection-interchange-tests

delete branch : more-tests

delete time in 4 days

push event eth2-clients/slashing-protection-interchange-tests

Michael Sproul

commit sha 359085be9da6e5e19644977aa45947bcec5d99de

Add more v5 tests (#1)

view details

push time in 4 days

PR merged eth2-clients/slashing-protection-interchange-tests

Add more v5 tests

Add a test using multiple validators, and a test with signing_root

+2 -0

0 comment

2 changed files

michaelsproul

pr closed time in 4 days

PR opened eth2-clients/slashing-protection-interchange-tests

Add more v5 tests

Add a test using multiple validators, and a test with signing_root

+2 -0

0 comment

2 changed files

pr created time in 4 days

create branch eth2-clients/slashing-protection-interchange-tests

branch : more-tests

created branch time in 4 days

push event eth2-clients/slashing-protection-interchange-tests

Michael Sproul

commit sha 7726d14667b44b7fc00c4abe960ff8ae0621ae9c

Add schema checks to CI

view details

push time in 4 days

create branch eth2-clients/slashing-protection-interchange-tests

branch : ci

created branch time in 4 days

pull request comment sigp/lighthouse

Version bump to 0.3.1

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

With the conflict resolved, this might be able to go in a batch with #1811

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Adds colour help to bn and vc subcommands

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

I couldn't get this into a 3-way rollup with #1810 and #1812; there must be a merge conflict or something. Once they've merged to master we can try this again.

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Adds colour help to bn and vc subcommands

There's something wrong with CI on this PR; the debug build had stalled. If you force-push a new head in a bit we can try it again.

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Update to latest libp2p

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Increase UPnP logging and decrease batch sizes

All together now...

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

bors r-

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Update to latest libp2p

bors r-

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Increase UPnP logging and decrease batch sizes

bors r-

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Increase UPnP logging and decrease batch sizes

bors r-
bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Update to latest libp2p

bors r-
bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

bors r-

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

bors r+

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Discovery v5.1

Going to try to clear the queue

bors r-

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Adds colour help to bn and vc subcommands

bors retry

AgeManning

comment created time in 4 days

pull request comment sigp/lighthouse

Version bump to 0.3.1

Need to update Cargo.lock (running make lint is the fastest way I think)

AgeManning

comment created time in 4 days

delete branch sigp/lighthouse

delete branch : temp-states

delete time in 4 days

pull request comment michaelsproul/rust_radix_trie

TrieKey support for Vec<T> of integer types

No worries, you did well!

jakobdalsgaard

comment created time in 4 days

pull request comment michaelsproul/rust_radix_trie

TrieKey support for Vec<T> of integer types

Done, v0.2.1 contains this commit :tada: https://crates.io/crates/radix_trie/0.2.1

jakobdalsgaard

comment created time in 4 days

push event michaelsproul/rust_radix_trie

Michael Sproul

commit sha 21446ae3ef3c7b9916aadf195f630daa1da25874

Release 0.2.1

view details

push time in 4 days

pull request comment michaelsproul/rust_radix_trie

TrieKey support for Vec<T> of integer types

Yeah that's great! Thank you! I'll publish a new release now

jakobdalsgaard

comment created time in 4 days

push event michaelsproul/rust_radix_trie

Jakob Dalsgaard

commit sha d4ab984d77f5c7fe5b02d549497ae09546b71eb0

TrieKey support for Vec<T> of integer types (#63)

view details

push time in 4 days

PR merged michaelsproul/rust_radix_trie

TrieKey support for Vec<T> of integer types

I wanted to use your implementation with TrieKeys of u32 and, due to Rust orphan rules, could not implement this in my own crate; there is an open issue on creating support for TrieKey of Vec<T> for T being arbitrary types -- which of course would be nice. In the meantime, would this solution be worth adding?

+18 -0

0 comment

1 changed file

jakobdalsgaard

pr closed time in 4 days

push event eth2-clients/slashing-protection-interchange-tests

Michael Sproul

commit sha a3dc016dcccc9fd36e8066ff357d3d0617c9125a

Update README example to v5

view details

Michael Sproul

commit sha 7bf7e08129ff3c9c99e52494756d522549d33dd7

Add schema and validator

view details

push time in 4 days

push event eth2-clients/slashing-protection-interchange-tests

Michael Sproul

commit sha a301696891c0e6adfac32d710687859673da1238

Bump tests to v5

view details

push time in 4 days

pull request comment sigp/lighthouse

Implement database temp states to reduce memory usage

This is ready for review now. I'm quite happy with the locking solution for now -- it's general enough to be useful for other concurrency issues if we encounter them, but doesn't require us to rewrite everything that touches the DB.

I've also disabled the --max-skip-slots flag by default. The risk here is that long forks could waste our CPU cycles and disk space, but only within limits (and the disk issue can likely be addressed by more aggressive pruning, as in #1782).

michaelsproul

comment created time in 4 days

push event sigp/lighthouse

Michael Sproul

commit sha 467de4c8d0bab31bcbbe2d830af31fd3ecedda75

Add docs for slashing protection (#1760) ## Proposed Changes * Add documentation about slashing protection, including how to troubleshoot issues and move between clients. * Add an error message if the validator client is started with 0 validators. Previously it would hit an error relating to the slashing protection database not existing, which wrongly pushed people towards using the unsafe `--init-slashing-protection` flag.

view details

blacktemplar

commit sha 8248afa7932cb7eb89fd2f5cfded9a77078a6ed8

Updates the message-id according to the Networking Spec (#1752) ## Proposed Changes Implement the new message id function (see https://github.com/ethereum/eth2.0-specs/pull/2089) using an additional fast message id function for better performance + caching decompressed data.

view details

blacktemplar

commit sha a0634cc64f5938f0a9ad75894b9e73f595919f70

Gossipsub topic filters (#1767) ## Proposed Changes Adds a gossipsub topic filter that only allows subscribing and incoming subscriptions from valid ETH2 topics. ## Additional Info Currently the preparation of the valid topic hashes uses only the current fork id but in the future it must also use all possible future fork ids for planned forks. This has to get added when hard coded forks get implemented. DO NOT MERGE: We first need to merge the libp2p changes (see https://github.com/sigp/rust-libp2p/pull/70) so that we can refer from here to a commit hash inside the lighthouse branch.

view details

Pawan Dhananjay

commit sha aadbab47cc7d04fa61bb0750cda3e21d4c86d527

Doc fixes (#1762) ## Issue Addressed N/A ## Proposed Changes Minor doc fixes. Adds a section on custom data directories. Co-authored-by: Michael Sproul <micsproul@gmail.com>

view details

Pawan Dhananjay

commit sha 97be2ca295f4c60f1a970bc5e291730359adf806

Simulator and attestation service fixes (#1747) ## Issue Addressed #1729 #1730 Which issue # does this PR address? ## Proposed Changes 1. Fixes a bug in the simulator where nodes can't find each other due to 0 udp ports in their enr. 2. Fixes bugs in attestation service where we are unsubscribing from a subnet prematurely. More testing is needed for attestation service fixes.

view details

Herman Junge

commit sha d7b9d0dd9f4e35680515c0cd0ed437d4e1ad625a

Implement matches! macro (#1777) Fix #1775

view details

blacktemplar

commit sha 6ba997b88e2b7ba12d02598b7a73c4479c34adbf

add direction information to PeerInfo (#1768) ## Issue Addressed NA ## Proposed Changes Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.

view details

Michael Sproul

commit sha 703c33bdc729570942314239b339678628cc149c

Fix head tracker concurrency bugs (#1771) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.

view details

divma

commit sha 2acf75785c316a8ab9745976f075a2b271406773

More sync updates (#1791) ## Issue Addressed #1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager

view details

Paul Hauner

commit sha 02d94a70b7fa50d6a6b3d7343e6ad8a3ad9470f3

Allow VC to start without any validators (#1779) ## Issue Addressed NA ## Proposed Changes - Don't exit early if the VC is without any validators. - When there are no validators, always create the slashing database (even without `--init-slashing-protection`).

view details

realbigsean

commit sha fdb9744759245f1df5df4d5b25a27bbfef70aebe

use head slot instead of the target slot for the not_while_syncing fi… (#1802) ## Issue Addressed Resolves #1792 ## Proposed Changes Use `chain.best_slot()` instead of the sync state's target slot in the `not_while_syncing_filter` ## Additional Info N/A

view details

realbigsean

commit sha 628891df1d29920173284f89200b661d4aa2987b

fix genesis state root provided to HTTP server (#1783) ## Issue Addressed Resolves #1776 ## Proposed Changes The beacon chain builder was using the canonical head's state root for the `genesis_state_root` field. ## Additional Info

view details

Paul Hauner

commit sha e1eec7828b479d1c8f7d30456aaae898d6e25c81

Fix error in VC API docs (#1800) ## Issue Addressed NA ## Proposed Changes - Ensure the `description` field is included with the output (as per the implementation). ## Additional Info NA

view details

divma

commit sha 668513b67ee2bbe1f6eb93832a6340d0a084d0b8

Sync state adjustments (#1804) check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads

view details

Daniel Schonfeld

commit sha 8f86baa48d232c2ec9a13adec5b5526955ff6d10

Optimize attester slashing (#1745) ## Issue Addressed Closes #1548 ## Proposed Changes Optimizes attester slashing choice by choosing the ones that cover the most amount of validators slashed, with the highest effective balances ## Additional Info Initial pass, need to write a test for it

view details

realbigsean

commit sha a3552a4b7003066e350db89a12ab32fcb504074b

Node endpoints (#1778) ## Issue Addressed `node` endpoints in #1434 ## Proposed Changes Implement these: ``` /eth/v1/node/health /eth/v1/node/peers/{peer_id} /eth/v1/node/peers ``` - Add an `Option<Enr>` to `PeerInfo` - Finish implementation of `/eth/v1/node/identity` ## Additional Info - should update the `peers` endpoints when #1764 is resolved Co-authored-by: realbigsean <seananderson33@gmail.com>

view details

Michael Sproul

commit sha 1ebcdf3407d7e7194c84f09258bad00d751b5599

Update to spec v1.0.0-rc.0 and BLSv4

view details

Kirk Baird

commit sha fbdf5e95a5ecb788717fc4b75537537811382186

Update blst and milagro_bls subgroup checking (#1793) * Update blst and milagro_bls subgroup checking Signed-off-by: Kirk Baird <baird.k@outlook.com> * cargo fmt Signed-off-by: Kirk Baird <baird.k@outlook.com>

view details

Michael Sproul

commit sha 083db9c94f9f2f035624c3ee2f3ad520005ada14

Remove hardcoded v0.12.x testnets

view details

push time in 4 days

delete branch sigp/lighthouse

delete branch : refine-pool-pruning

delete time in 5 days

push event sigp/lighthouse

Kirk Baird

commit sha dd34beccc00564c018105538e0668b2dd94fb9d8

Update blst and milagro_bls subgroup checking (#1793) * Update blst and milagro_bls subgroup checking Signed-off-by: Kirk Baird <baird.k@outlook.com> * cargo fmt Signed-off-by: Kirk Baird <baird.k@outlook.com>

view details

push time in 5 days

delete branch sigp/lighthouse

delete branch : spec-v1.0.0-rc-bls-subgroup

delete time in 5 days

PR merged sigp/lighthouse

Update blst and milagro_bls subgroup checking

Issue Addressed

n/a

Proposed Changes

Subgroup checks are added for blst and updated in milagro_bls.

Subgroup checks will now be done:

  • during deserialisation for PublicKeys
  • during verification for Signatures/AggregateSignatures

Additional Info

We can no longer safely aggregate Signatures that have not been verified. The only place Signature aggregation currently occurs is during Attestation aggregation which will only be done if the signature is verified.

Currently we are pointing to our own fork of blst again to expose the subgroup checking function. We should update back to supranational/blst after this PR is merged https://github.com/supranational/blst/pull/35.

+35 -79

0 comment

5 changed files

kirk-baird

pr closed time in 5 days


pull request comment sigp/lighthouse

Refine op pool pruning

Thank you!

bors r+

michaelsproul

comment created time in 5 days

Pull request review comment sigp/lighthouse

Refine op pool pruning

```diff
 impl<T: EthSpec> OperationPool<T> {
     }
 
     /// Prune if validator has already exited at the last finalized state.
-    pub fn prune_voluntary_exits(&self, finalized_state: &BeaconState<T>) {
+    pub fn prune_voluntary_exits(&self, finalized_state: &BeaconState<T>, spec: &ChainSpec) {
         prune_validator_hash_map(
             &mut self.voluntary_exits.write(),
-            |validator| validator.is_exited_at(finalized_state.current_epoch()),
+            |validator| validator.exit_epoch != spec.far_future_epoch,
             finalized_state,
         );
     }
 
     /// Prune all types of transactions given the latest finalized state and head fork.
```

Hmm, true. I'll save the CI mins and just merge tho

michaelsproul

comment created time in 5 days


push event sigp/lighthouse

divma

commit sha 2acf75785c316a8ab9745976f075a2b271406773

More sync updates (#1791) ## Issue Addressed #1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager

view details

Paul Hauner

commit sha 02d94a70b7fa50d6a6b3d7343e6ad8a3ad9470f3

Allow VC to start without any validators (#1779) ## Issue Addressed NA ## Proposed Changes - Don't exit early if the VC is without any validators. - When there are no validators, always create the slashing database (even without `--init-slashing-protection`).

view details

realbigsean

commit sha fdb9744759245f1df5df4d5b25a27bbfef70aebe

use head slot instead of the target slot for the not_while_syncing fi… (#1802) ## Issue Addressed Resolves #1792 ## Proposed Changes Use `chain.best_slot()` instead of the sync state's target slot in the `not_while_syncing_filter` ## Additional Info N/A

view details

realbigsean

commit sha 628891df1d29920173284f89200b661d4aa2987b

fix genesis state root provided to HTTP server (#1783) ## Issue Addressed Resolves #1776 ## Proposed Changes The beacon chain builder was using the canonical head's state root for the `genesis_state_root` field. ## Additional Info

view details

Paul Hauner

commit sha e1eec7828b479d1c8f7d30456aaae898d6e25c81

Fix error in VC API docs (#1800) ## Issue Addressed NA ## Proposed Changes - Ensure the `description` field is included with the output (as per the implementation). ## Additional Info NA

view details

divma

commit sha 668513b67ee2bbe1f6eb93832a6340d0a084d0b8

Sync state adjustments (#1804) check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads

view details

Daniel Schonfeld

commit sha 8f86baa48d232c2ec9a13adec5b5526955ff6d10

Optimize attester slashing (#1745) ## Issue Addressed Closes #1548 ## Proposed Changes Optimizes attester slashing choice by choosing the ones that cover the most amount of validators slashed, with the highest effective balances ## Additional Info Initial pass, need to write a test for it

view details

realbigsean

commit sha a3552a4b7003066e350db89a12ab32fcb504074b

Node endpoints (#1778) ## Issue Addressed `node` endpoints in #1434 ## Proposed Changes Implement these: ``` /eth/v1/node/health /eth/v1/node/peers/{peer_id} /eth/v1/node/peers ``` - Add an `Option<Enr>` to `PeerInfo` - Finish implementation of `/eth/v1/node/identity` ## Additional Info - should update the `peers` endpoints when #1764 is resolved Co-authored-by: realbigsean <seananderson33@gmail.com>

view details

Michael Sproul

commit sha c1220c6c4e5c298234332474688b86613c210ece

Implement temp states to reduce memory usage

view details

Michael Sproul

commit sha ffe991fb1d67057758b7cf521ee6b199b3c08189

Bump database schema to v2

view details

Michael Sproul

commit sha f7566da3841a398ec3104aec033d6b7ea1014fd7

Implement garbage collection for temp states

view details

Michael Sproul

commit sha 9a2fb7edf280fead9df962af9833847110ec903e

Cleanups

view details

Michael Sproul

commit sha 9cadf5b83b5cacb4a036f0fa50f9c65832c99e93

Add transaction locking to fix race condition

view details

Michael Sproul

commit sha eeef4a13849e202ef7f0cd0d3d41dceab90f0e35

Turn off --max-skip-slots by default

view details

push time in 5 days

pull request comment sigp/lighthouse

Refine op pool pruning

Ready for review now

michaelsproul

comment created time in 5 days

push event sigp/lighthouse

Michael Sproul

commit sha 8ebfb0018565ab37292ec29a119cefeaac070e91

Add transaction locking to fix race condition

view details

push time in 5 days

push event sigp/lighthouse

Michael Sproul

commit sha d34bb98ad83031124329a088722381f02bd0f3ad

Refine op pool pruning

view details

push time in 5 days

pull request comment sigp/lighthouse

Refine op pool pruning

Fixing CI locally now

michaelsproul

comment created time in 5 days


pull request comment sigp/lighthouse

Refine op pool pruning

Running on my Medalla node, this pruned the attestation pool from 1528950 attestations down to 3266 :sweat_smile:

michaelsproul

comment created time in 5 days

PR opened sigp/lighthouse

Refine op pool pruning

Issue Addressed

Closes #1769 Closes #1708

Proposed Changes

Tweaks the op pool pruning so that the attestation pool is pruned against the wall-clock epoch instead of the finalized state's epoch. This should reduce the unbounded growth that we've seen during periods without finality.

Also fixes up the voluntary exit pruning as raised in #1708.

+25 -17

0 comment

2 changed files

pr created time in 5 days

create branch sigp/lighthouse

branch : refine-pool-pruning

created branch time in 5 days

issue comment sigp/lighthouse

Remove pubkey cache file

From Discord:

CRIT Failed to start beacon node             reason: Unable to open persisted pubkey cache: ValidatorPubkeyCacheFileError("InconsistentIndex { expected: Some(79843), found: 0 }")

Reportedly occurred after OOM.

michaelsproul

comment created time in 5 days

pull request comment sigp/lighthouse

Implement database temp states to reduce memory usage

Another option could be to only allow a single atomic tx to be prepared at once (e.g., by passing some Mutex token around).

Yeah, I was thinking of something along these lines, just a lock around that critical section. I'll experiment with it today.
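A minimal sketch of that locking idea, with illustrative types standing in for the real store (not actual Lighthouse code):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Default)]
struct Store {
    // Token serialising all "check then commit" sequences.
    txn_lock: Mutex<()>,
    // Stand-in for the real on-disk database.
    states: Mutex<HashMap<u64, Vec<u8>>>,
}

impl Store {
    fn store_state_if_missing(&self, state_root: u64, state: Vec<u8>) {
        // Hold the token for the whole critical section so another thread
        // can't interleave its existence check with our commit.
        let _token = self.txn_lock.lock().unwrap();
        let mut db = self.states.lock().unwrap();
        if !db.contains_key(&state_root) {
            // In the real store this is "prepare atomic txn" followed by "commit".
            db.insert(state_root, state);
        }
        // `_token` dropped here, releasing the lock for the next writer.
    }
}
```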

michaelsproul

comment created time in 5 days

pull request comment sigp/lighthouse

Implement database temp states to reduce memory usage

Another change that should go into this PR: increasing the max skip slots to ♾, or deleting the flag entirely, but I think we may as well keep it

michaelsproul

comment created time in 5 days

PR opened sigp/lighthouse

Implement database temp states to reduce memory usage

Issue Addressed

Closes #800 Closes #1713

Proposed Changes

Implement the temporary state storage algorithm described in #800 (a rough sketch follows the list below). Specifically:

  • Add DBColumn::BeaconStateTemporary, for storing 0-length temporary marker values.
  • Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
  • Add a garbage collection process to delete leftover temporary states on start-up.
  • Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)
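A toy sketch of the flow above, using in-memory maps in place of the real hot database (all names are illustrative, not the actual store types):

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for the hot database: states keyed by root, plus the set of
/// roots still carrying the temporary marker.
#[derive(Default)]
struct HotDb {
    states: HashMap<u64, Vec<u8>>,
    temporary: HashSet<u64>,
}

impl HotDb {
    /// Store an intermediate state immediately, flagged as temporary.
    fn store_temporary(&mut self, root: u64, state: Vec<u8>) {
        self.states.insert(root, state);
        self.temporary.insert(root);
    }

    /// Block import succeeded: clear the temporary flags so the states persist.
    fn finalise_block_import(&mut self, roots: &[u64]) {
        for root in roots {
            self.temporary.remove(root);
        }
    }

    /// Start-up garbage collection: anything still flagged temporary is a
    /// leftover from an interrupted import, so delete the flag and the state.
    fn garbage_collect(&mut self) {
        let leftover: Vec<u64> = self.temporary.drain().collect();
        for root in leftover {
            self.states.remove(&root);
        }
    }
}
```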

Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

Race 1: Permanent state marked temporary

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

  1. Thread 1 checks if state s already exists, and seeing that it doesn't, prepares an atomic commit of (s, s_temporary_flag).
  2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and deleting the temporary flag.
  3. Thread 1 is (finally) scheduled again, and marks s as temporary with its transaction.
  4a. The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state s will fail... hope it isn't on the main chain! Alternatively, (4b) happens...
  4b. Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient: state s will disappear temporarily, but will come back once thread 1 finishes running.

I hope that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know.

This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. LMDB only allows one R/W transaction at a time.

Race 2: Temporary state returned from get_state

I don't think this race really matters, but in load_hot_state, if another thread stores a state between when we call load_state_temporary_flag and when we call load_hot_state_summary, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than not having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.

+312 -77

0 comment

11 changed files

pr created time in 6 days

create branch michaelsproul/lighthouse

branch : demo-memory-exhaustion

created branch time in 6 days

push event sigp/lighthouse

Michael Sproul

commit sha d21611bb0cdbe8a6d97cd364a09197ac3c9a846d

Cleanups

view details

push time in 6 days

create branch sigp/lighthouse

branch : temp-states

created branch time in 6 days

Pull request review comment sigp/lighthouse

Optimize attester slashing

```diff
 impl TestingAttesterSlashingBuilder {
             );
             let message = attestation.data.signing_root(domain);
 
-            for validator_index in validator_indices {
+            for validator_index in indices_to_sign {
```

Oh yeah, actually that makes a lot of sense. Those tests are weird (horrible)

danielschonfeld

comment created time in 6 days


Pull request review comment sigp/lighthouse

Optimize attester slashing

```diff
 impl TestingAttesterSlashingBuilder {
             );
             let message = attestation.data.signing_root(domain);
 
-            for validator_index in validator_indices {
+            for validator_index in indices_to_sign {
```

This could be simplified to attestation.attesting_indices, but other than that, looks great!

danielschonfeld

comment created time in 6 days


started Aeledfyr/deepsize

started time in 6 days

issue comment sigp/lighthouse

Memory usage increases significantly during non-finality

I'm working on this as part of #800 and #1769. Once those issues are resolved we can reassess memory usage to see if it's within acceptable limits and close this issue.

emansipater

comment created time in 6 days

issue comment sigp/lighthouse

Large enum variant in store

I've fixed this while working on #800

paulhauner

comment created time in 6 days

issue comment sigp/lighthouse

Excessive pooled attestation count

Our attestation pruning is too conservative: it only deletes relative to the finalized state's epoch, but could safely use the current epoch IMO. I.e. prune attestations with targets older than or equal to current_epoch.saturating_sub(2).
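A sketch of that rule as a retain predicate, with Epoch as a bare u64 purely for illustration:

```rust
// Illustrative only: keep an attestation iff its target epoch is newer than
// current_epoch - 2 (wall-clock), i.e. prune targets <= current_epoch - 2.
type Epoch = u64;

fn keep_attestation(target_epoch: Epoch, current_epoch: Epoch) -> bool {
    target_epoch > current_epoch.saturating_sub(2)
}
```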

I'll raise a PR

paulhauner

comment created time in 7 days

delete branch sigp/lighthouse

delete branch : head-tracker-fix

delete time in 7 days

create branch michaelsproul/lighthouse

branch : temp-states

created branch time in 7 days

issue opened sigp/lighthouse

Remove pubkey cache file

Description

The pubkey cache file is unnecessary when we have a perfectly good database available. I think it would reduce complexity and the likelihood of error if it were removed and rolled into the database. Its separate handling has been the source of bugs, e.g. #1680.

This also links into the locking timeout issue #1096.

created time in 8 days

issue opened sigp/lighthouse

Remove head tracker in favour of fork choice

Description

The head tracker contains redundant information that is already present in the fork choice block DAG. This is undesirable as it means there isn't just "one source of truth", and the two data structures need to be kept in sync across concurrent executions, which is highly non-trivial (see #1771 for more)

Steps to resolve

  • Write code to derive the list of heads from fork choice
  • Remove the HeadTracker struct from the beacon chain
  • Port the migration code to use the head data from the fork choice
  • Work out how to reconcile pruning of fork choice with pruning of the database. Possible approaches:
    • Mark blocks deleted from disk as deleted in fork choice, until they are cleaned up by fork choice's own pruning mechanism
    • Mirror the pruning of the on-disk database in the fork choice struct (i.e. prune the two structures together)

created time in 8 days

pull request comment sigp/lighthouse

Fix head tracker concurrency bugs

Oh yeah, I'll create the issue for deleting the head tracker now

bors r+

michaelsproul

comment created time in 8 days

issue comment sigp/lighthouse

Potential memory exhaustion vector

I'm coming round to the idea of the garbage collection of temp states as the only way to delete states created as temporary. Otherwise two threads could race, and if one reverts, it might delete a state required by the other. A flow something like:

  1. Thread 1 atomically stores state s and flag s_temp.
  2. Thread 2 wants to store s, but seeing it there already just stages the deletion of s_temp later
  3. Thread 1 errors, and commits the "on drop" revert transaction -- deleting s and s_temp
  4. Thread 2 succeeds, re-deleting s_temp. However, the state that it wanted stored, s, is also gone :frowning: #rekt

With the garbage collection approach, we just leave the temp states there if the block import fails, and they either get cleaned up by future blocks which make them non-temporary, or at startup by iterating through the temp states column and deleting all the entries and their corresponding states (before allowing any block importer to start running in parallel).

All this concurrency has me like :exploding_head:

paulhauner

comment created time in 8 days

push event sigp/lighthouse

Michael Sproul

commit sha b2145f85f8ae713d3c2a29adf980105a23230494

Fix clippy

view details

push time in 8 days

pull request comment sigp/lighthouse

Allow VC to start without any validators

This was an oversight on my part, and I'm mostly in favour of merging; I'm just a bit hesitant about creating an empty slashing protection DB which users could mistake for their real DB during the v0.3 migration.

A cleaner solution might be to make the slashing protection DB optional in the validator store, initialising it to None if 0 validators are provided on start-up, and then creating it via the API if necessary. What do you think?

paulhauner

comment created time in 8 days

pull request comment sigp/lighthouse

Fix head tracker concurrency bugs

@paulhauner I ended up not persisting fork choice, because I think there's no invariant violated if the head tracker and fork choice have different sets of blocks. The main reason we were careful about persisting the head and fork choice was related to the head block root, which I (re)discovered is completely unused (I've opened #1784 to track this). This led to several simplifications -- no more locking the head in persisted_head_and_fork_choice :open_mouth:, and no more threading PersistedBeaconChain through to migration, only to overwrite the head tracker field.

Let me know if you think differently, but I couldn't find any places where we assume block in head_tracker --> block in fork_choice or vice versa.

As it stands I think the main invariants we uphold are:

block in head_tracker --> block in database

block in fork_choice --> block in database

and we used to try to maintain:

persisted_beacon_chain in database --> persisted_beacon_chain.canonical_head_block_root in fork_choice

although we don't any longer.

michaelsproul

comment created time in 8 days

push event sigp/lighthouse

Michael Sproul

commit sha 7f8ddd78c6021d60ae136884b54d69cf45c88910

Clean up PersistedBeaconChain a bit

view details

Michael Sproul

commit sha 26ff3bfdc3b996415873c8aa585c379af25f1e17

Clean up head tracker persistence

view details

push time in 8 days

issue opened sigp/lighthouse

Remove canonical_head_block_root from PersistedBeaconChain

Description

In #1639 the canonical_head_block_root field of PersistedBeaconChain was rendered obsolete by the use of fork choice to derive the head block on startup. We intended to remove it entirely when we did the breaking schema change for v0.3.0, but that PR (#1638) got closed and forgotten about :frowning_face:

As a practice migration, I think we should remove the field entirely in a future release, i.e. automatically update the user's database from the old schema (with the block root) to the new (without) on startup.

created time in 8 days

push event sigp/lighthouse

Michael Sproul

commit sha 2c1b9574aa6ad76053ade786fa36ed7d55ed736b

Persist fork choice after pruning

view details

push time in 8 days

pull request comment sigp/lighthouse

Optimize attester slashing

The tests so far look great! Thanks!

I'm not sure I understand this. Explain please?

Calling ctxt.attester_slashing([1, 2, 3, 4]) will just create a slashing with attestation_1.indices: [1, 2, 3, 4] and attestation_2.indices: [1, 2, 3, 4]. It would also be nice to test with non-identical sets like attestation_1.indices: [1, 2, 3], attestation_2.indices: [3, 4, 5], i.e. a slashing that only slashes val 3, but includes attestations signed by {1, 2} and {4, 5}. It would require mucking around with TestingAttesterSlashingBuilder to get it to take two lists of indices. We could write the more generic function that takes two lists, then implement the current double function in terms of it to minimise breakage of other code (trust me these builders are a PITA). We don't need to do that for this PR, but it'd be great if you felt like it

Also, is there a way to test different effective balances?

Yep! It's a bit dodgy, but I think mutating them on the ctxt state after it's created is probably the easiest way to do it, like:

let mut ctxt = TestContext::new();
ctxt.state.validators[i].effective_balance = 17_000_000_000;
// Make sure you borrow state _after_ doing the mutations
let (op_pool, state, spec) = (&ctxt.op_pool, &ctxt.state, &ctxt.spec);

If something mucks up, it might be prudent to call state.drop_all_caches(), as technically we shouldn't be mutating the state like this ;)

danielschonfeld

comment created time in 8 days

Pull request review comment sigp/lighthouse

Optimize attester slashing

```diff
 impl<T: EthSpec> OperationPool<T> {
 
         // Set of validators to be slashed, so we don't attempt to construct invalid attester
         // slashings.
-        let mut to_be_slashed = proposer_slashings
+        let to_be_slashed = proposer_slashings
             .iter()
             .map(|s| s.signed_header_1.message.proposer_index)
             .collect::<HashSet<_>>();
 
-        let epoch = state.current_epoch();
-        let attester_slashings = self
+        let coverage: Vec<AttesterSlashing<T>> = self
             .attester_slashings
             .read()
             .iter()
-            .filter(|(slashing, fork)| {
-                if *fork != state.fork.previous_version && *fork != state.fork.current_version {
-                    return false;
-                }
-
-                // Take all slashings that will slash 1 or more validators.
-                let slashed_validators =
-                    get_slashable_indices_modular(state, slashing, |index, validator| {
-                        validator.is_slashable_at(epoch) && !to_be_slashed.contains(&index)
-                    });
-
-                // Extend the `to_be_slashed` set so subsequent iterations don't try to include
-                // useless slashings.
-                if let Ok(validators) = slashed_validators {
-                    to_be_slashed.extend(validators);
-                    true
-                } else {
-                    false
-                }
-            })
-            .take(T::MaxAttesterSlashings::to_usize())
             .map(|(slashing, _)| slashing.clone())
             .collect();
```

Oh right, there's a slightly tricky borrow issue to get past here. The self.attester_slashings.read() call returns a lock guard that needs to be held until maximum_cover is called. You can achieve this by assigning it to a local variable like this:

        let reader = self.attester_slashings.read();
        let relevant_attester_slashings = reader.iter().flat_map(|(slashing, fork)| {
            if *fork == state.fork.previous_version || *fork == state.fork.current_version {
                AttesterSlashingMaxCover::new(slashing, &to_be_slashed, state, spec)
            } else {
                None
            }
        });

        let attester_slashings = maximum_cover(
            relevant_attester_slashings,
            T::MaxAttesterSlashings::to_usize(),
        );

I also added the fork check back in, as that will be relevant when we hard fork

danielschonfeld

comment created time in 8 days
