Alexander Borzunov (borzunov), Yandex Research (@yandex-research)
Building hivemind for @learning-at-home // ex-research engineer at Yandex Self-Driving, ex-intern at Facebook

learning-at-home/hivemind 841

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

borzunov/cpmoptimize 172

🚀 🐍 Optimizes Python bytecode that calculates linear recurrences, reducing the time complexity from O(n) to O(log n)

DestructiveVoice/DestructiveFarm 145

📢 🔒 Exploit manager for attack-defense CTF competitions

borzunov/bit-torrent 121

📁 🌎 BitTorrent client built with Python + asyncio

borzunov/remoteink 90

📖 🖥️ Turns PocketBook E-Ink reader into a computer monitor

borzunov/plusplus 81

Enables increment operators in Python with a bytecode hack

borzunov/dontasq 31

⚡🐍 Extends built-in Python collections with LINQ-style methods

borzunov/alice_scripts 29

👩 📜 Easy way to make skills for Alice (Russian voice assistant)

slava-sh/messenger 9

Toy messaging platform

borzunov/timus-charts 5

Adds charts to Timus Online Judge profiles

started v-iashin/SpecVQGAN

started time in 4 hours

started CompVis/taming-transformers

started time in 5 hours

Pull request review event

started openai/DALL-E

started time in a day

issue comment learning-at-home/hivemind

Authorization protocol for a moderated Hivemind network

Not completely; see the comment above. Previously, the task was blocked on implementing the averager over libp2p. Now, I see two options:

  1. Sign all averager messages as described in the protocol above (this involves signing only a hash of the whole message, so the performance overhead may turn out to be negligible); see the sketch after this list.
  2. If option 1 turns out to be too slow, come up with a solution involving some group auth keys stored in a DHT.
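
For reference, option 1 could look roughly like the sketch below (hypothetical names, using Ed25519 from the cryptography package; this is not hivemind's actual implementation): only a fixed-size SHA-256 digest of the serialized message is signed, so the asymmetric-crypto cost does not grow with message size.

    # Hypothetical sketch, not hivemind's actual API: sign and verify
    # only a 32-byte digest of the serialized averager message.
    import hashlib

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    def sign_message(payload: bytes) -> bytes:
        digest = hashlib.sha256(payload).digest()  # 32 bytes for any payload size
        return private_key.sign(digest)

    def verify_message(payload: bytes, signature: bytes) -> bool:
        digest = hashlib.sha256(payload).digest()
        try:
            public_key.verify(signature, digest)
            return True
        except InvalidSignature:
            return False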
borzunov

comment created time in 2 days

issue commentlearning-at-home/hivemind

Averaging is extremely slow in some setups

The problem was due to congestion in connections with asymmetric bandwidth, in particular ACK compression: libp2p went too long without receiving ACKs and started to retransmit older TCP segments, making the situation even worse.

yhn112

comment created time in 2 days

started webdataset/webdataset

started time in 2 days

started ilanschnell/bitarray

started time in 3 days

started huggingface/paper-style-guide

started time in 4 days

push eventlearning-at-home/hivemind

Aleksandr Borzunov

commit sha b1bec423a44400978182d306cfc304b4b3d6701f

Fix type annotation

view details

push time in 5 days

push eventlearning-at-home/hivemind

Aleksandr Borzunov

commit sha 495b8e904df5127549055640ae93f8f35b92094c

Simplify Servicer._make_rpc_caller()

view details

push time in 5 days

push eventlearning-at-home/hivemind

Your Name

commit sha d20f221368c2c186d58d5b2728ad8258226f07d5

Simplify Servicer._make_rpc_caller()

view details

push time in 5 days

push eventlearning-at-home/hivemind

Your Name

commit sha 5b803be73786e983fc68dca42648686abfd8753c

Simplify Servicer._make_rpc_caller()

view details

push time in 5 days

Pull request review event

Pull request review comment learning-at-home/hivemind

Implement simplified all-reduce for asymmetric TCP connections

 class AllReduceRunner(ServicerBase):
       (the actual number of values by peer will be nearly proportional, but there are no exact guarantees)
     :param modes: AveragingMode for each peer in ordered_peer_ids (normal, client-only or auxiliary)
     :param gathered: additional user-defined data collected from this group
-    :param kwargs: additional paramters (e.g. part_size_bytes) will be passed to TensorPartContainer
+    :param kwargs: additional parameters (e.g. part_size_bytes) will be passed to TensorPartContainer
+    :note: full mode peers send and receive tensor parts concurrently, assuming full-duplex TCP stream. In turn,
+      non-averaging peers will only receive results after they finished sending, which helps them avoid congestion
+      in case of asymmetric high-latency connections, avoiding issues such as ACK compression.
    :note: Full-mode peers send and receive tensor parts concurrently, assuming a full-duplex TCP stream. In turn,
      non-averaging peers receive results only after they finish sending, which helps them avoid
      throughput issues in case of asymmetric high-latency connections (e.g. ACK congestion).

Minor fixes to the comment such as:

  • ACK compression -> ACK congestion
  • helps them avoid congestion <...>, avoiding <...>
  • English tenses
justheuristic

comment created time in 5 days
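
The delayed-results pattern discussed in this review can be condensed into a short asyncio sketch (hypothetical names, not the actual hivemind code): a half-duplex peer buffers its outgoing parts while the incoming stream is still open and starts yielding them only once receiving has finished.

    # Minimal sketch of "delay results until receiving finishes";
    # the names here are hypothetical, not hivemind's actual API.
    from typing import AsyncIterator, Awaitable, Callable

    async def respond_half_duplex(
        incoming: AsyncIterator[bytes],
        process_part: Callable[[bytes], Awaitable[bytes]],
    ) -> AsyncIterator[bytes]:
        buffered = []
        async for part in incoming:    # receive-only phase
            buffered.append(await process_part(part))
        for result in buffered:        # send-only phase: the upload starts
            yield result               # only after the download is over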

Pull request review event

Pull request review comment learning-at-home/hivemind

Implement simplified all-reduce for asymmetric TCP connections

 async def rpc_aggregate_part(
         elif request.code == averaging_pb2.PART_FOR_AVERAGING:
             try:
                 sender_index = self.sender_peer_ids.index(context.remote_id)
-                async for msg in self._accumulate_parts_streaming(achain(as_aiter(request), stream), sender_index):
-                    yield msg
+
+                if not self.should_delay_results(context.remote_id):
+                    async for msg in self._accumulate_parts_streaming(achain(as_aiter(request), stream), sender_index):
+                        yield msg
+
+                else:
+                    done_receiving = asyncio.Event()
+                    delayed_results = asyncio.Queue()
+
+                    async def _accumulate_parts():
+                        inputs_aiter = attach_event_on_finished(achain(as_aiter(request), stream), done_receiving)
+                        async for msg in self._accumulate_parts_streaming(inputs_aiter, sender_index):
+                            delayed_results.put_nowait(msg)
+                        delayed_results.put_nowait(None)
+
+                    accumulate_task = asyncio.create_task(_accumulate_parts())
+
+                    await done_receiving.wait()
+
+                    while True:
+                        next_result = await delayed_results.get()
+                        if next_result is None:
+                            break
+                        yield next_result
+                    await accumulate_task

This code seems equivalent to the following, doesn't it?

                    delayed_results = []
                    async for msg in self._accumulate_parts_streaming(achain(as_aiter(request), stream), sender_index):
                        delayed_results.append(msg)
                    for msg in delayed_results:
                        yield msg
justheuristic

comment created time in 5 days

Pull request review comment learning-at-home/hivemind

Implement simplified all-reduce for asymmetric TCP connections

 def group_size(self):
     def _get_peer_stub(self, peer: PeerID) -> StubBase:
         return self._servicer_type.get_stub(self._p2p, peer, namespace=self._prefix)

+    def should_delay_results(self, peer_id: PeerID) -> bool:
+        return self.peer_fractions[self.ordered_peer_ids.index(peer_id)] == 0

Should we keep an option to enable bidirectional streaming even for client-mode peers?

justheuristic

comment created time in 5 days

Pull request review event

started EleutherAI/project-menu

started time in 6 days

PR opened learning-at-home/dalle

Add hivemind backend

Runs on a part of LAION with:

CUDA_VISIBLE_DEVICES=XXX python train_dalle.py \
  --image_text_folder https://YYY.tar --wds jpg,txt --truncate_captions \
  --distr_backend hivemind --batch_size_per_step 4 --target_batch_size 128 --target_group_size 2 \
  --initial_peers ZZZ

The --initial_peers argument may be omitted on the first peer.

[Screenshot: training run, taken 2021-10-15 at 23:40]

+59 -0

0 comments

3 changed files

pr created time in 7 days

PR closed lucidrains/DALLE-pytorch

[removed]

Removed.

+59 -0

0 comments

3 changed files

borzunov

pr closed time in 7 days

push eventlearning-at-home/dalle

Aleksandr Borzunov

commit sha bc625ca883e919b609e6516264eebf2e6d9339fd

Add hivemind backend

view details

push time in 7 days

PR opened lucidrains/DALLE-pytorch

Add hivemind backend
+61 -0

0 comments

3 changed files

pr created time in 7 days

create branch learning-at-home/dalle

branch : hivemind-backend

created branch time in 7 days

push eventlearning-at-home/dalle

push time in 7 days

push eventlearning-at-home/dalle

Aleksandr Borzunov

commit sha d8994262febf7843a442101215b635dfe84c863b

Add hivemind backend

view details

push time in 7 days

push eventlearning-at-home/hivemind

Alexander Borzunov

commit sha 91d1d31796c656d6f85793665d749fb60d6ed93d

Fix minor issues in documentation (#392)

  • Refer to Discord in docs
  • Highlight bibtex syntax
  • Update macOS compatibility info
  • Make bibtex formatting consistent
  • Make PyPI badge blue instead of orange
  • Remove link to the Learning@home homepage
  • Update log format in examples/albert/README.md

view details

push time in 8 days

delete branch learning-at-home/hivemind

delete branch : refer-to-discord-in-docs

delete time in 8 days