
byzhang/blaze 4

A fork of blaze-lib

byzhang/byzhang-graphlab 1

Automatically exported from code.google.com/p/byzhang-graphlab

byzhang/3D-convolutional-speaker-recognition 0

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

byzhang/alchemydatabase 0

Automatically exported from code.google.com/p/alchemydatabase

byzhang/AlphaGo 0

A replication of DeepMind's 2016 Nature publication, "Mastering the game of Go with deep neural networks and tree search," details of which can be found on their website.

byzhang/ardb 0

A Redis-protocol-compatible NoSQL database; it supports multiple storage engines as backends, such as Google's LevelDB, Facebook's RocksDB, and OpenLDAP's LMDB.

byzhang/arrayfire 0

ArrayFire: a general purpose GPU library.

byzhang/autojsoncxx 0

A header-only library and a code generator to automatically translate between JSON and C++ types

started google/latexify_py

started time in 3 days

started ElementAI/N-BEATS

started time in 2 months

started rosuH/EasyWatermark

started time in 2 months

started couler-proj/couler

started time in 2 months

Pull request review comment wangkuiyi/gotorch

Create design.md

# GoTorch

This document explains the motivations and critical design challenges of GoTorch.

## GoTorch and Go+

GoTorch includes a Go binding of the C++ core of PyTorch, known as libtorch. There are many language bindings of libtorch, including Rust and Haskell. However, according to our survey, most Python users don't feel programming in Rust, Haskell, or Julia, is more efficient than in Python. So, language binding does not make much sense alone.

The complete story of GoTorch includes Go+, a language whose syntax is as concise as Python, but its compiler generates Go programs. Programming deep learning systems in Go+ is hopefully as efficiently as in Python, and Go+ translates the work into Go source code, which compiles into native code running on servers and mobile devices, including phones, pads, and self-driving cars.

In addition to the Go binding of libtorch, GoTorch also includes the other two layers of functionalities of PyTorch provided in Python -- torch.nn.functional, and torch.nn.

## Layers of Functionalities

Generally, PyTorch provides three layers of APIs, not all of which are in libtorch.

1. The finest-grained layer is in libtorch -- about 1600 native functions, each is a fundamental operation in mathematics or its corresponding gradient operation. Each native function has CPU and GPU implementations. By linking libtorch with github.com/pytorch/xla, we get an additional implementation for Google TPU.

1. A higher-level abstraction is in the Python package torch.nn.functional, which provides functions defined in Python and calls native functions in C/C++.

1. The highest layer is modules; each is a Python class with a method forward defining the forward computation process and data members that can store states.

## Tensors and Garbage Collection

libtorch includes the C++ definition of the fundamental data type `at::Tensor` and native functions that operate it.

The key design feature of the tensor is automatic garbage collection (GC). In C++, the class `at::Tensor` contains only one data member, a smart pointer to the real tensor object. This smart pointer performs reference count-based GC, which frees a tensor once its reference count gets zero. Comparing with Go and Java's GC, which runs the mark-and-sweep algorithm, reference count reacts

Compared to Go and Java’s GC which run the mark-and-sweep algorithm, reference count reacts ...

wangkuiyi

comment created time in 2 months

Pull request review comment wangkuiyi/gotorch

Create design.md

The key design feature of the tensor is automatic garbage collection (GC). In C++, the class `at::Tensor` contains only one data member, a smart pointer to the real tensor object. This smart pointer performs reference count-based GC, which frees a tensor once its reference count gets zero. Comparing with Go and Java's GC, which runs the mark-and-sweep algorithm, reference count reacts instantly but cannot handle the case of cyclic-dependency.

PyTorch programmers access `at::Tensor` from the Python binding. Python's GC uses reference counts like `shared_ptr`. For completeness, Python runs mark-and-sweep from time to time to handle cyclic-dependencies.

Go provides an asynchronous API, `runtime.GC()`, to trigger GC and returns immediately without waiting for the completion of GC. If all tensors are in CPU memory, this mechanism works; however, in deep learning, we would prefer to host tensors in GPU memory, which is a precious resource. We prefer to free tensors immediately when they are out-of-use so that the next iteration can create new tensors in GPU.

### Synchronize Go's GC

We started with inventing new GC mechanisms in the library, including adding a global reference count table. However, after trying several strategies, we noticed that we could customize Go's GC for the tensor type in GoTorch specifically to make it synchronous, or able to wait till the completion of GC before returning.

The basic idea behind the design is the categorization of tensors by different purposes in deep learning:

1. model parameters -- created before, updated during, and freed after the train loop,
1. buffers -- with lifespan similar to model parameters but used to BatchNorm to keep statistics of input data, and
1. intermediate results -- including those generated during the forward and backward pass in each step of the train loop.

The customized Go GC mechanism doesn't handle the first two categories, which is the topic of the next section, Porting Modules.

To handle intermediate results, GoTorch users need to call `gotorch.GC()` at the beginning of each train loop step. The first job of `gotorch.GC()` is to mark that all tensors generated since then, which are considered intermediate results, are subject to the customized GC. After the train loop, users are supposed to call `gotorch.FinishGC()` to unset the mark.

why do users need to call FinishGC? All masked tensors should have been recycled when exiting loops?

wangkuiyi

comment created time in 2 months

Pull request review comment wangkuiyi/gotorch

Create design.md

The key design feature of the tensor is automatic garbage collection (GC). In C++, the class `at::Tensor` contains only one data member, a smart pointer to the real tensor object. This smart pointer performs reference count-based GC, which frees a tensor once its reference count gets zero. Comparing with Go and Java's GC, which runs the mark-and-sweep algorithm, reference count reacts instantly but cannot handle the case of cyclic-dependency.

Is cyclic-dependency an issue for smart pointers?

wangkuiyi

comment created time in 2 months


started connorferster/handcalcs

started time in 2 months
