If you are wondering where this site's data comes from, please visit https://api.github.com/users/justheuristic/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
justheuristic · YSDA · Moscow, Russia · https://github.com/learning-at-home/hivemind · building the hivemind

ddtm/dl-course 105

DL Course Materials

dbaranchuk/memory-efficient-maml 66

Memory efficient MAML using gradient checkpointing

dbaranchuk/learning-to-route 48

Code for ICML2019 paper: Learning to Route in Similarity Graphs

bigscience-workshop/Megatron-DeepSpeed 42

Ongoing research training transformer language models at scale, including: BERT & GPT-2

justheuristic/as_a_service 7

A simple module that turns your batch-parallel function into a background service. Useful for RL/DL experiments.

justheuristic/Anton-python-training 3

A temporary repo used to train Anton in the ways of python

justheuristic/GoTo-workshops 1

GoTo Hack: workshop preparations

justheuristic/AgentNet 0

Deep Reinforcement Learning library for humans

create branch learning-at-home/hivemind

branch : bzn

created branch time in 7 hours

push event learning-at-home/hivemind

justheuristic

commit sha b5c06bda864742488d852c8ffbb2a9a48d6904da

isort

view details

push time in 9 hours

push event learning-at-home/hivemind

justheuristic

commit sha 33e4afe3600b1008f1002d33a21d527d85873f73

isort

view details

push time in 9 hours

push event learning-at-home/hivemind

justheuristic

commit sha 512cadc34e12a9dc3af4cb2c67264782054d1009

and now its black

view details

push time in 9 hours

Pull request review comment learning-at-home/hivemind

FP16 support, a few patches from sahajbert

     def load_state_from_peers(self, **kwargs):
                 try:
                     self.averager.load_state_from_peers(timeout=self.load_state_timeout, **kwargs)
                     break
+                except KeyboardInterrupt:
+                    raise

This would previously cause a deadlock if the user sent a KeyboardInterrupt to CollaborativeOptimizer during load_state_from_peers.
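A minimal sketch of the retry pattern this diff introduces (the surrounding while-loop, function name, and logger are assumptions for illustration, not hivemind's exact code):

    import logging

    logger = logging.getLogger(__name__)

    def load_state_with_retries(averager, timeout: float, **kwargs) -> None:
        # keep retrying the state download, but let Ctrl+C propagate immediately
        while True:
            try:
                averager.load_state_from_peers(timeout=timeout, **kwargs)
                break
            except KeyboardInterrupt:
                # re-raise instead of swallowing the interrupt; otherwise the caller
                # keeps waiting for a download that never finishes (the deadlock above)
                raise
            except BaseException as e:
                logger.exception(f"Failed to load state from peers: {e}, retrying ...")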

justheuristic

comment created time in 9 hours

PullRequestReviewEvent

Pull request review comment learning-at-home/hivemind

FP16 support, a few patches from sahajbert

     def accumulated_grads(self) -> Iterator[torch.Tensor]:
         """local gradient accumulators"""
         if self.reuse_grad_buffers:
             yield from self._grad_buffers()
-        elif self._grads is None:
-            with torch.no_grad():
-                self._grads = [
-                    torch.zeros_like(grad, device=self.accumulate_grads_on) for grad in self._grad_buffers()
-                ]
-        yield from self._grads

This would actually be an error with reuse_grad_buffers=True, but it worked because no one asked for more than len(grad_buffers) elements.
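A hedged sketch of why the old generator only failed lazily (the class below is a stand-in for illustration, not the actual CollaborativeOptimizer):

    from typing import Iterator, List, Optional

    import torch

    class GradAccumulatorSketch:
        def __init__(self, reuse_grad_buffers: bool, buffers: List[torch.Tensor]):
            self.reuse_grad_buffers = reuse_grad_buffers
            self._buffers = buffers
            self._grads: Optional[List[torch.Tensor]] = None

        def accumulated_grads(self) -> Iterator[torch.Tensor]:
            # old control flow: with reuse_grad_buffers=True the elif is skipped and
            # execution falls through to the final yield with self._grads still None
            if self.reuse_grad_buffers:
                yield from self._buffers
            elif self._grads is None:
                self._grads = [torch.zeros_like(buf) for buf in self._buffers]
            # fails only if a consumer asks for an element *past* len(self._buffers);
            # generators are lazy, so zip() over equal-length iterables never triggered it
            yield from self._grads

    sketch = GradAccumulatorSketch(reuse_grad_buffers=True, buffers=[torch.zeros(3)])
    gen = sketch.accumulated_grads()
    print(next(gen))  # fine: yields the reused buffer
    # a further next(gen) would raise TypeError, since it falls through to `yield from None`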

justheuristic

comment created time in 9 hours

PullRequestReviewEvent
PullRequestReviewEvent

Pull request review comment learning-at-home/hivemind

FP16 support, a few patches from sahajbert

     def accumulated_grads(self) -> Iterator[torch.Tensor]:
         """local gradient accumulators"""
         if self.reuse_grad_buffers:
             yield from self._grad_buffers()
-        elif self._grads is None:
-            with torch.no_grad():
-                self._grads = [
-                    torch.zeros_like(grad, device=self.accumulate_grads_on) for grad in self._grad_buffers()
-                ]
-        yield from self._grads
+            return
+        else:
+            if self._grads is None:
+                self._grads = [torch.zeros_like(grad, device=self.accumulate_grads_on) for grad in self._grad_buffers()]
+            yield from self._grads

     @torch.no_grad()
     def accumulate_grads_(self, batch_size: int):
         """add current gradients to grad accumulators (if any)"""
         if self.reuse_grad_buffers:
-            return  # user is responsible for accumulating gradients in .grad buffers
-        alpha = float(batch_size) / self.batch_size_per_step
-        for grad_buf, grad_acc in zip(self._grad_buffers(), self.accumulated_grads()):
-            grad_acc.add_(grad_buf.to(grad_acc.device), alpha=alpha)
+            # user is responsible for accumulating gradients in .grad buffers
+            assert batch_size == self.batch_size_per_step, "Custom batch size is not implemented for reuse_grad_buffers"
+        else:
+            alpha = float(batch_size) / self.batch_size_per_step
+            for grad_buf, grad_acc in zip(self._grad_buffers(), self.accumulated_grads()):
+                grad_acc.add_(grad_buf.to(grad_acc.device), alpha=alpha)

     @torch.no_grad()
     def apply_accumulated_grads_(self, scale_by: Optional[float] = None):
-        if self.reuse_grad_buffers:

This previously caused a bug where peers with reuse_grad_buffers=True were not scaled by scale_by. As a result, they would have larger gradients and dominate the reuse_grad_buffers=False peers.
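A hedged sketch of the corrected scaling step, written here as a standalone function (the real method operates on the optimizer's own buffers and may differ in detail):

    from typing import List, Optional

    import torch

    @torch.no_grad()
    def apply_accumulated_grads_(grad_buffers: List[torch.Tensor],
                                 grad_accumulators: List[torch.Tensor],
                                 reuse_grad_buffers: bool,
                                 scale_by: Optional[float] = None) -> None:
        # peers with separate accumulators first copy them back into the .grad buffers
        if not reuse_grad_buffers:
            for grad_buf, grad_acc in zip(grad_buffers, grad_accumulators):
                grad_buf.copy_(grad_acc.to(grad_buf.device))
        # scale every peer's gradients; before the fix, an early return for
        # reuse_grad_buffers=True skipped this step, so those peers kept larger,
        # unscaled gradients and dominated the average
        if scale_by is not None:
            for grad_buf in grad_buffers:
                grad_buf.mul_(scale_by)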

justheuristic

comment created time in 9 hours

PR opened learning-at-home/hivemind

FP16 support, a few patches from sahajbert

New features:

  • CollaborativeOptimizer can now combine fp16=True and reuse_grad_buffers=True with a special scaler
  • CollaborativeOptimizer peers with reuse_grad_buffers=True and reuse_grad_buffers=False can now co-exist
  • CollaborativeOptimizer peers with and without AMP can now co-exist

The new behavior of CollaborativeOptimizer with fp16 (a usage sketch follows this list):

  • grad_scaler=None: regular fp32 behavior
  • reuse_grad_buffers=False with GradScaler: works as usual; un-scales each tensor independently before accumulation and does not affect the internal optimizer
  • reuse_grad_buffers=True with GradScaler: calling scaler.step(opt) raises an error explaining that HivemindGradScaler is required
  • reuse_grad_buffers=False with HivemindGradScaler: applies unscale/update only around the global optimizer step
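A hedged usage sketch for the combinations above (HivemindGradScaler and the optimizer flags are taken from this description; the loop is an illustration, not the library's exact API):

    from typing import Iterable

    import torch
    from torch.cuda.amp import autocast

    def local_training_steps(model: torch.nn.Module,
                             dataloader: Iterable,
                             opt,           # hypothetical CollaborativeOptimizer(fp16=True, reuse_grad_buffers=True)
                             scaler) -> None:  # HivemindGradScaler, per the description above
        for batch in dataloader:
            with autocast():
                loss = model(**batch).loss
            scaler.scale(loss).backward()
            # a plain torch.cuda.amp.GradScaler would raise here, asking for HivemindGradScaler
            scaler.step(opt)
            scaler.update()
            # no manual zero_grad(): with reuse_grad_buffers=True the .grad buffers double
            # as accumulators and are reset by the optimizer after a global step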
+122 -24

0 comments

2 changed files

pr created time in 9 hours

create branch learning-at-home/hivemind

branch : fp16

created branch time in 9 hours

push event learning-at-home/hivemind

justheuristic

commit sha 4a9bc92cd18bd1860b4cf01d053082a3ce4e76f1

Implement weights as part of the allreduce protocol, not matchmaking (#384)

* implement parts as part of the allreduce protocol, not matchmaking
* remove metadata field from AveragingData (unused)

Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>

view details

push time in 13 hours

delete branch learning-at-home/hivemind

delete branch : allreduce_weights

delete time in 13 hours

PR merged learning-at-home/hivemind

Implement weights as part of the allreduce protocol, not matchmaking

This PR allows specifying allreduce weights in AllReduceRunner, instead of gathering them during matchmaking.

This will allow peers to use their actual batch size in both DPU and advanced matchmaking (aka @yhn112-style matchmaking); a toy sketch of the weighting follows below.

[WIP] implement advanced matchmaking as a working example
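The weighting itself amounts to a weighted mean over peer tensors; a toy sketch in plain PyTorch (not the AllReduceRunner API):

    from typing import Sequence

    import torch

    def weighted_average(parts: Sequence[torch.Tensor], weights: Sequence[float]) -> torch.Tensor:
        # each peer's tensor part contributes in proportion to its weight, e.g. its
        # actual local batch size, instead of a value fixed during matchmaking
        total = float(sum(weights))
        return sum((w / total) * part for w, part in zip(weights, parts))

    # example: peers with local batch sizes 32 and 96 averaging the same parameter tensor
    parts = [torch.ones(4), torch.zeros(4)]
    print(weighted_average(parts, weights=[32.0, 96.0]))  # tensor([0.2500, 0.2500, 0.2500, 0.2500])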

+34 -31

1 comment

5 changed files

justheuristic

pr closed time in 13 hours

push event learning-at-home/hivemind

justheuristic

commit sha 2c28b822cad8c807313b8fedf3deb230a5ffa8bd

Update hivemind/proto/averaging.proto

Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>

view details

push time in 13 hours

push event yandexdataschool/Practical_DL

justheuristic

commit sha 4dd7dec33d0c28386ab724f50d946ef5d7e67546

publish week3

view details

justheuristic

commit sha e7d410791907d93c3eb8b9d24af14bb1b053a862

Merge branch 'fall21' of github.com:yandexdataschool/Practical_DL into fall21

view details

push time in a day

push event yandexdataschool/Practical_DL

mkspopov

commit sha c1460de74aabfce41241f777fb67e3524c8197e1

Update homework03_part2_autoencoders_basic.ipynb

Small fixes

view details

justheuristic

commit sha ab35adedcb5c6fd5d83a9381d56445500c53f2b4

Merge pull request #93 from mkspopov/patch-1

Update homework03_part2_autoencoders_basic.ipynb

view details

push time in a day

push event yandexdataschool/Practical_DL

Seva Konyakhin

commit sha 8ae7cb9011984f6c9870577707f0345595252f9a

[homework03] a small fix in discriminator loss

view details

justheuristic

commit sha 457a46f18513b934b6755151a21b8c9adf452e70

Merge pull request #95 from sevakon/spring21

[homework03] a small fix in discriminator loss

view details

push time in a day

push event yandexdataschool/Practical_DL

justheuristic

commit sha 1ff3eab0afa5dc654c6995e01a95a027bf88ac57

typo

view details

push time in a day

push event yandexdataschool/Practical_DL

justheuristic

commit sha a5823961b2706ba950234b9854a9145975a2472b

Update README.md

view details

push time in a day

push event learning-at-home/hivemind

Denis Mazur

commit sha b44236972a22c4e371ec9c31be0cadd8d6f03391

Fix pickle vulnerability (#386)

view details

Alexander Borzunov

commit sha d809e303c55276bb63055a58a3ef1b925977d38f

Remove arguments with default values from example instructions (#388)

* Remove arguments with default values from example instructions
* Reorder arguments for free-tier GPU trainers

view details

justheuristic

commit sha 12e790cab1e6d980e57ff4c9bb20e71322a77f12

Merge branch 'master' into allreduce_weights

view details

push time in a day

Pull request review comment learning-at-home/hivemind

Implement weights as part of the allreduce protocol, not matchmaking

 async def _generate_input_for_peer(self, peer_index: int) -> AsyncIterator[avera
             code=averaging_pb2.PART_FOR_AVERAGING,
             group_id=self.group_id,
             tensor_part=first_part,
+            metadata=self._weight_binary,

done so

justheuristic

comment created time in a day

PullRequestReviewEvent

push event learning-at-home/hivemind

justheuristic

commit sha c6fa3248e1312a4d4a1055a428a5b997d5b0120a

double trouble

view details

push time in a day

push event learning-at-home/hivemind

Alexander Borzunov

commit sha e9a5c9a8d37d27a1676f0c828da9676be53ced40

review

view details

push time in a day

push event yandexdataschool/nlp_course

justheuristic

commit sha 2f2a04e03144f0f77fe569cc55ef3cb3c6593100

minor

view details

push time in 2 days

push event learning-at-home/hivemind

Alexander Borzunov

commit sha d809e303c55276bb63055a58a3ef1b925977d38f

Remove arguments with default values from example instructions (#388)

* Remove arguments with default values from example instructions
* Reorder arguments for free-tier GPU trainers

view details

justheuristic

commit sha 894d5d49d577cabf882adf405ccab8029d0b1fad

Merge branch 'master' into colab_with_large_part_size

view details

push time in 2 days