Frédéric Bastien (nouiz), NVIDIA, Montréal

NervanaSystems/neon (3853 stars)

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

dmlc/dlpack (487 stars)

Common in-memory tensor structure

Epistimio/orion (205 stars)

Asynchronous distributed hyperparameter optimization

lupoglaz/TorchProteinLibrary (94 stars)

PyTorch library of layers acting on protein representations

inducer/compyte (51 stars)

A common set of compute primitives for PyCUDA and PyOpenCL

glorotxa/DeepANN (39 stars)

Theano-based deep ANN learning code

benanne/nervana_theano (35 stars)

A rudimentary wrapper around the fast Maxwell kernels for GEMM and convolution operations provided by nervanagpu

abergeron/ccw_tutorial_theano (25 stars)

Common Code Workflow tutorial on Theano

mzoehr/Theano (19 stars)

Theano patch

issue comment: Theano/Theano

matplotlib and keyring should be added to configuration files

Theano isn't maintained anymore. It was forked and is continued by another group: https://github.com/aesara-devs/aesara Maybe you want to check whether the issue is still present in their fork and, if so, report it there.

But from a quick look at the matplotlib issue, we should probably remove that code. I do not understand why that code is there.

The other script is optional, so we shouldn't force everybody to install that optional dependency.
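The pattern described in that comment can be sketched as follows. This is a minimal illustration, not the actual Theano code; `load_optional` and `plot_history` are hypothetical names chosen for the example:

```python
import importlib

def load_optional(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Hypothetical feature that needs the optional package: it fails with a
# clear message only when actually used, so plain installs never require
# matplotlib to be present.
def plot_history(values):
    plt = load_optional("matplotlib.pyplot")
    if plt is None:
        raise RuntimeError("plot_history() requires the optional matplotlib package")
    plt.plot(values)
```

With this guard, the package stays out of the required configuration files and only users of the optional feature need it installed.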

yuluc123

comment created time in 10 days

issue comment: NVIDIA/tensorflow

compile from source can not git clone cudnn_frontend_archive

Here is a diff that fixes the issue:

diff --git a/tensorflow/workspace2.bzl b/tensorflow/workspace2.bzl
index 08a69af50cd..693e1018cb7 100644
--- a/tensorflow/workspace2.bzl
+++ b/tensorflow/workspace2.bzl
@@ -158,10 +158,10 @@ def _tf_repositories():
     new_git_repository(
         name = "cudnn_frontend_archive",
         build_file = "//third_party:cudnn_frontend.BUILD",
-        patches = ["//third_party:cudnn_frontend_header_fix.patch"],
-        patch_args = ['-p1'],
-        commit = "e9ad21cc61f8427bbaed98045b7e4f24bad57619",
-        remote = "https://oauth2:J64G8MymaUmqNKG_N3rR@gitlab-master.nvidia.com/cudnn/cudnn_frontend.git"
+        # patches = ["//third_party:cudnn_frontend_header_fix.patch"],
+        # patch_args = ['-p1'],
+        commit = "73210a9",
+        remote = "https://github.com/NVIDIA/cudnn-frontend.git",
     )
aixi

comment created time in 19 days

pull request comment: tensorflow/tensorflow

Fix TF 2.7 release notes

OK, so I updated this PR to be as it was at the start, modifying only the 2.7 branch.

@googlebot I consent. My commit uses an email that is in my GitHub account, so it should work as before. But my commit name is "Frederic Bastien", while my GitHub account name is "Frédéric Bastien"; there are special characters that I converted in the git commit. Did the Google bot start checking the name? If so, I'll check whether I can use the special characters in my git commit.

nouiz

comment created time in 20 days

push event: nouiz/tensorflow

TensorFlow Release Automation

commit sha 2763b3dda3f73138e333c09bb59f3abdb82e50b7

Insert release notes place-fill

view details

TensorFlow Release Automation

commit sha 9b658003fcf9dd96af992ce6ee84e404e0720e05

Update version numbers to 2.7.0-rc0

view details

Mihai Maruseac

commit sha 0a3c5b0b7b72ab85addb8b17e42f3748f3a5b054

Merge pull request #52126 from tensorflow-jenkins/version-numbers-2.7.0rc0-29745 Update version numbers for TensorFlow 2.7.0-rc0

view details

pranve

commit sha fcf8099f530145a2034691991d1fb7cda71b1b91

Update Release.md Hi... Made the edits to maintain the consistency in format

view details

Mihai Maruseac

commit sha e6d19cda0aa85c3bb239ed262957bc8bdb5e6417

Update RELEASE.md

view details

Mihai Maruseac

commit sha 266fd9d040517f135a669f71e23a45372544b345

Update RELEASE.md

view details

Mihai Maruseac

commit sha ec9a6a0de9c7e6bffb95229d0189938098d97422

Update RELEASE.md

view details

Mihai Maruseac

commit sha 3fbfce56db250bfb84cd0b9480a3ec9a68f4cca6

Update RELEASE.md

view details

Mihai Maruseac

commit sha 54b2cadbe00277a4719375c30ec37bcd0cd7bdd5

Merge pull request #52120 from tensorflow-jenkins/relnotes-2.7.0rc0-12384 Update release notes for TensorFlow 2.7.0

view details

pranve

commit sha d8b96ab2453e074f655e5fde0b02edb50862260e

merging to r2.7

view details

Mihai Maruseac

commit sha 55f0a1e302f2307983b76767f30e7bfb36162835

Merge pull request #52189 from pranve/r2.7 Mark fail test as nopip..merging to r2.7

view details

pranve

commit sha 085e9db7b56a4322650ec8a2db18bcd668c5e0cb

Removes dependency of resource_var.h in stateful_random_ops_cpu_gpu.h…merge to 2.7

view details

Mihai Maruseac

commit sha 512c6219cf67dd11123156507e13225500f46eb7

Merge pull request #52191 from pranve/r2.7 Removes dependency of resource_var.h in stateful_random_ops_cpu_gpu.h…merge to 2.7

view details

Rahul Joshi

commit sha a858115d24af256d7517ff54f8dba327370abc6d

Fix MacOS TF build by reverting an LLVM commit locally. Revert https://github.com/llvm/llvm-project/commit/33e1713a00a5291e5de658d0eb0aebdbf1d3aa03 which seems to break macOS CPU TF build. PiperOrigin-RevId: 399933600 Change-Id: I2b4b25ff6b558687e29778649b195594a34c0f0d

view details

Mihai Maruseac

commit sha ac826525fa751b5556ce40cc5f6dc30804fdeeb9

Merge pull request #52202 from tensorflow/cherry-pick-llvm-mac-os-fix Fix MacOS TF build by reverting an LLVM commit locally.

view details

Katherine Wu

commit sha e3a672333b56f6281f2e7a40fc06b2233f0c5663

Remove use of `eval` when evaluating the input example. Use `ast.eval_literal` instead which safely evaluates the expression. PiperOrigin-RevId: 400012249 Change-Id: I5ff98608ea2d736d093aa488af723ff4f6707e02

view details

Rick Chao

commit sha b32dac6e425940064932a235b767bbf5973b6f26

PSv2: Disable parameter_server_strategy_v2_test on macos to unblock 2.7 release. PiperOrigin-RevId: 400280224 Change-Id: Iec94fb2cf05e8ed2fa1e2dac111470c32b75b60d

view details

Mihai Maruseac

commit sha ce35e5c3a8efdb8161c6a85c8fb9ffb5bbdc9ffd

Merge pull request #52247 from tensorflow/cherrypick2-15e88399ce014ed7dbcf6d540225ddf8a3baf5b4-on-r2.7 PSv2: Disable parameter_server_strategy_v2_test on macos to unblock 2…

view details

Yong Tang

commit sha f8691f9998083e535c9515977f814aceadd14293

Bump tensorflow-io-gcs-filesystem to 0.21.0 This PR bumps tensorflow-io-gcs-filesystem to 0.21.0 so that TF 2.7 branch cut can carry the latest change which includes the fix that removes the extra build directory in the python pip package (https://github.com/tensorflow/io/pull/1497) Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

view details

Mahmoud Abuzaina

commit sha 0d28bfb1e4a3bfd69730ac92f731bdf07f782258

Setting op device for eager op

view details

push time in 20 days

pull request comment: tensorflow/tensorflow

Fix TF 2.7 release notes

CLA bot is confused when doing cherrypicks.

I didn't cherry-pick anything here.

There's a caveat here, we'll add a new commit on the branch after the release, but probably we can handle this the same as #52971

Can you confirm that I need to make this commit on top of master and not on the release branch? I updated this PR accordingly, and changed the RELEASE.md files instead of README.md.

nouiz

comment created time in 20 days

push event: nouiz/tensorflow

TensorFlower Gardener

commit sha 89f19778770ee76c01cbb555c244fcd7b96b4d74

Merge pull request #52516 from offscale:tensorflow.compiler.xla.service.computation_layout.cc PiperOrigin-RevId: 404413619 Change-Id: I7ba26e499bea0025d8c027c9cf09bc3b990a4a40

view details

Terry Heo

commit sha 256d07198ff1b0fac9b6b48c206a1367847202ad

tflite: Add Subgraph::RemoveUnusedInputs() API This API removes unused inputs of the subgraph by checking inputs of each node. Also added tensorflow/lite:subgraph_test. PiperOrigin-RevId: 404419363 Change-Id: If40344f09bdb0f6b849e45111e74495a664cc413

view details

Thai Nguyen

commit sha 2145b1da6f70da8cb27d57359dde59450ee33dab

Add INT8 support for Tile op PiperOrigin-RevId: 404419588 Change-Id: I67d7116feefa69bc41ebacf00ca7e2b7b06eebaf

view details

Roman Dzhabarov

commit sha a1320ec1eac186da1d03f033109191f715b2b130

[NFC] Remove unused AnnotateOutputShapes method. PiperOrigin-RevId: 404420835 Change-Id: I81bcd510db8373a0c79fce2cd9d772b2fe7cbaab

view details

Terry Heo

commit sha 770d9ef305f997c1f2cc1ab846ebd0771c07d441

Refactor inputs/outputs handling of WHILE op With the new Subgraph::RemoveUnusedInputs() API, we can remove unnecessary inputs of COND subgraph. Copying inputs and outputs of subgraphs are changed as the following. [Original loop] 1. eval cond 2. copy cond inputs to body inputs 3. eval body 4. copy body outputs to cond inputs [While with static outputs] 1. eval cond 2. copy body outputs to body inputs 3. eval body 4. copy body outputs to cond inputs (fast) [While with dynamic outputs] 1. eval cond 2. copy WHILE op outputs to body inputs 3. eval body 4. copy body outputs to cond inputs (fast) 5. copy body outputs to WHILE op outputs This change improves latency for static outputs cases. And no latency increase is observed with dynamic outputs cases. And regardless of the outputs type, the memory usage is improved since it doesn't need to keep unused inputs with COND subgraph. PiperOrigin-RevId: 404424553 Change-Id: I6aaee2e89ef46cc1ca2db15ae79eb5c7b97ea235

view details

Haoliang Zhang

commit sha 4b593da433b90dd06b8c2e89b6a4f8796b2a4d5f

Change batch size to 100 in on-device training colab. PiperOrigin-RevId: 404424786 Change-Id: I333deafa5413cf0ed4a30d477acd6c15b2fecbcd

view details

Taehee Jeong

commit sha dcacb7a29fd9c3a008248f2956b8a8b409208d05

Fix failing test. Increasing steps seems to resolves the issue. PiperOrigin-RevId: 404425153 Change-Id: I6e1b58be9bb13c902c663a988bcf965e688d40fb

view details

Ethan Kim

commit sha 32750390e54bb3d5128122efbc2d860a9ec15349

Update quantization accuracy documentation PiperOrigin-RevId: 404428668 Change-Id: I68798fc86d273b008adedbe0cae274565eb4a4ae

view details

Mihai Maruseac

commit sha d921175cb96f9c41a57ea624464efb477e6c25a4

Disabling some tests as they fail with numpy 1.20 PiperOrigin-RevId: 404431010 Change-Id: I0d206707d7ea15b284b2f0c26827feec8e2cfd7a

view details

A. Unique TensorFlower

commit sha 6a6772c6784997df7140afb386366df758f88e3d

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 404438366 Change-Id: Ica04022f4345b6a75d6a7f111c6178cd973aa94d

view details

TensorFlower Gardener

commit sha 1180d389b41314e9715336b4b7b194140292d5c8

Merge pull request #52511 from offscale:tensorflow.compiler.xla.service.algebraic_simplifier.cc PiperOrigin-RevId: 404452101 Change-Id: I1a2527208abd4a11b3ee5b4f194dd95f085625ff

view details

TensorFlower Gardener

commit sha d755170f3c9291f9d127cf127d8afea9d27b1c10

Merge pull request #52432 from offscale:tensorflow.compiler.xla.service.hlo_alias_analysis_test.cc PiperOrigin-RevId: 404452734 Change-Id: I7205e35b3747b54066b0ea360ca3ddb0e6ab6ed0

view details

TensorFlower Gardener

commit sha d8f7d5f1b6ee3b8f669703469bf5b0f58b7ec65e

Merge pull request #52514 from offscale:tensorflow.compiler.xla.service.ar_crs_combiner.cc PiperOrigin-RevId: 404453834 Change-Id: I7e973a621cd68f700c990d03f7f1bd56b2040c8c

view details

Christian Sigg

commit sha 498cb7becc38ad7aa36c0656e4a34aae3211ba53

Sanitize location names so that they don't need to be quoted, which would crash the LLVM PTX backend. PiperOrigin-RevId: 404454586 Change-Id: Ie5fe4db1a3f3dd93fbdcbf2016af77a43a6d7abf

view details

TensorFlower Gardener

commit sha f65e5d4fc7a1493b78212e0414e0781be616b0ee

Merge pull request #52497 from offscale:tensorflow.compiler.xla.client.lib.prng.cc PiperOrigin-RevId: 404455831 Change-Id: Icf9fd1e26b80b844e8bbca5c7ee1107cecae8d74

view details

TensorFlower Gardener

commit sha 4d8bc85b8156c5d3f7495dc176ab59978ae9b0d3

Merge pull request #52513 from offscale:tensorflow.compiler.xla.service.allocation_tracker.cc PiperOrigin-RevId: 404457050 Change-Id: Ida0dcea8afe0dceaf0fce1cfcb9fd69d0071dce9

view details

TensorFlower Gardener

commit sha 5c0d6435660eceac5f18696b45ac5591d0f427eb

Merge pull request #52498 from offscale:tensorflow.compiler.xla.client.lib.quantize_test.cc PiperOrigin-RevId: 404461389 Change-Id: I34534df45665851e02cf09420bc7a3356a28c1ec

view details

A. Unique TensorFlower

commit sha 1c2009bfa5e93882675b2802fffc7bec6c559d52

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 404468798 Change-Id: If1a7eb48da9e3d7f95c923612838334cd2986174

view details

Christian Sigg

commit sha 9ca9403102e2ca46f9642695d0e011f2f83de68e

Add `tensorflow/compiler/xla/service/gpu/tests:all` which currently pass to `bef_executable_tests` suite. PiperOrigin-RevId: 404469193 Change-Id: I33bd070c37a34a69c5bb505b5bc434588e60590a

view details

A. Unique TensorFlower

commit sha 97a291ba92525e0b7da000d04f51473c94335df8

Update GraphDef version to 925. PiperOrigin-RevId: 404471699 Change-Id: Idde6724dd5881e0d027fdd39bf30e83d140f6e1c

view details

push time in 20 days

pull request comment: tensorflow/tensorflow

Fix TF 2.7 release notes

@googlebot I consent.

I already have the CLA signed. I'm not sure why this was asked.

nouiz

comment created time in 20 days

pull request comment: tensorflow/tensorflow

Add a note about cudaMallocAsync in the release note of TF2.7

Thanks for catching this. Here is the fix to the release branch: https://github.com/tensorflow/tensorflow/pull/52990

nouiz

comment created time in 20 days

PR opened tensorflow/tensorflow

Fix TF 2.7 release notes

Fix https://github.com/tensorflow/tensorflow/pull/52670/. The change is in TF 2.7, not 2.6.

@mihaimaruseac

+1834 -2286

0 comments

73 changed files

pr created time in 20 days

create branch: nouiz/tensorflow

branch : fbastien_2.7notes_v2

created branch time in 20 days

pull request comment: tensorflow/tensorflow

Add a note about cudaMallocAsync in the release note of TF2.7

This doesn't seem to be included in 2.7 release notes here: https://github.com/tensorflow/tensorflow/releases/tag/v2.7.0

Any idea why?

nouiz

comment created time in 23 days

delete branch nouiz/tensorflow

delete branch : fbastien_2.7notes

delete time in a month

pull request comment: tensorflow/tensorflow

Add a note about cudaMallocAsync in the release note of TF2.7

Should I port that to the master branch?

nouiz

comment created time in a month

push event: nouiz/tensorflow

Frederic Bastien

commit sha 313aca451e6b5391883747505c854cb073b2cab4

Add a note about cudaMallocAsync

view details

push time in a month

create branch: nouiz/tensorflow

branch : fbastien_2.7notes

created branch time in a month

pull request comment: tensorflow/tensorflow

XLA persistent compilation cache

I rebased due to a conflict. I implemented the code changes you suggested, plus a few other changes, in 6 commits. I still need to add a test.

bas-aarts

comment created time in a month

push event: bas-aarts/tensorflow

Dimitris Vardoulakis

commit sha 6fac8aa97d8017fca901dbfd69b89d1c505cd8ba

Add a setter for the shape_size function. PiperOrigin-RevId: 400048718 Change-Id: I4736dceb7cb642a7162cde9eed301a19862fc242

view details

Edward Loper

commit sha 92e84b48f6e2bed5be6088192216048fea63cba8

Fix out-of-bounds memory error in tf.ragged.cross shape inference when it is called with invalid inputs. PiperOrigin-RevId: 400049589 Change-Id: Icd535f4c6b96b3c926befb70a0e44e6d2b004c0a

view details

Geoffrey Martin-Noble

commit sha 00a79ee9871cc8d48bd45d63703a7dd2b45f42a1

Roll forward of "[MHLO] Shape inference and further verification for static and dynamic gather" Rolling forward after fixing some broken internal tests that were using invalid gathers. PiperOrigin-RevId: 400061054 Change-Id: I5f787baf801c1e91408d08940646cc5a6cbd9a14

view details

Peng Wang

commit sha 6abd271bfa8f0a8d22d71897b4b62318174e51f7

Sets the force_same_dtype argument to False for the maybe_promote_tensors call in tensor_equals and tensor_not_equals. Also changes force_same_dtype's default to False, since force_same_dtype=True is a dangerous behavior (e.g. causing silent downcasting). PiperOrigin-RevId: 400061590 Change-Id: I0c155aa95f5acb23d70ba735c84601ab405bb5b3

view details

Geoffrey Martin-Noble

commit sha f8a17851a9bbafe46fb6af81ec8b2f4e8fa8041a

Lowering of general mhlo.gather to linalg This is a complete lowering of the gather op and all its weirdnesses. Accounting for all these makes the lowering pretty fiddly and also means we miss more efficient representations for the common special cases. Some of those are already covered by the lowering via mhlo.torch_index_select, but this is something that should be revisited. In the meantime this gets us a (hopefully) correct lowering for the full API. In addition to the lit tests here, I also spot-checked the output of running this through IREE against `jax.lax.gather`. Fixes https://github.com/tensorflow/mlir-hlo/issues/16 PiperOrigin-RevId: 400063077 Change-Id: Icec8c50d891206a1a45c588a07a11f6038fb4bd0

view details

Meghna Natraj

commit sha 137658a79fc50e80d393bb4c0dd5bbf3b9c3855b

Validate random ops `dtype` argument and update documentation. PiperOrigin-RevId: 400066999 Change-Id: Id47fea80a88e99e98c539a28abeec2a318d7c14b

view details

A. Unique TensorFlower

commit sha 328d945250c02bf72e672c93599e98c42beace96

Update Argument/Fetch nodes in graphdef exporter to make sure node name is canonicalized (e.g. no :0 in suffix). PiperOrigin-RevId: 400069029 Change-Id: I40a686f715f9c61d38f0d71ed91f843987002c2e

view details

A. Unique TensorFlower

commit sha 7fe0df313afa0d1e5fccf7f419c668fbff222dc3

Integrate LLVM at llvm/llvm-project@a21c557955c6 Updates LLVM usage to match [a21c557955c6](https://github.com/llvm/llvm-project/commit/a21c557955c6) PiperOrigin-RevId: 400069688 Change-Id: I5fc992172571e0bddbbad473d432971ba6f15f30

view details

A. Unique TensorFlower

commit sha b8c4fd53226a2b145fce4af0287a42af0b3b3fe9

Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/ad9981d394f5ccc784d6273e51aa41c38b7cf727. PiperOrigin-RevId: 400071284 Change-Id: I971a8bc1fdc2d21117c3191dfbe334afb088f12c

view details

A. Unique TensorFlower

commit sha f062e905d10e4d8135d4889820d6a08d5c90b32c

[TFLite/MLIR] Avoids crash in ReduceWhileOperands. PiperOrigin-RevId: 400071680 Change-Id: Ic8d50ed93f9b820dc63c9a40e5bb1e4906ab1fe4

view details

Peng Wang

commit sha 1e6f23c33aac24829df3ad9dc53f6f82d69c42b2

Renames UpdateVariableAndFill_Philox_Arg::not_used to state_var_guard, since it's used by the CPU functor specialization. PiperOrigin-RevId: 400075357 Change-Id: Ib0a6572a01eda038895f3bd8518e5441a1c9ca68

view details

Ken Franko

commit sha f8d0a057d6b2deb0b95e909e859b5eef0d1a9c98

Move OutsideCompiledToHostLaunchPass above TFFunctionalControlFlowToRegions. TFFunctionalControlFlowToRegions adds a ToBool op which could lose the _xla_outside_compilation attribute. Using the launch representation for outside compilation preserves the outside compilation behavior. PiperOrigin-RevId: 400075910 Change-Id: I0240ac8d70ce8f8f916ea34155791936b9a62dea

view details

Peng Wang

commit sha b79d673a9af74048b970cf1d64effda017c1c803

Adds name scope and name for the tf.Variable created inside tf.random.Generator. PiperOrigin-RevId: 400078035 Change-Id: I0cd5ee962e4aa973e11aa1c3bec8fcce06577c9e

view details

A. Unique TensorFlower

commit sha 3991c43707db3db2b8afcb43314dd5a8083b55d3

Factor allocate-and-fallback logic out of `DoConvolveWithExecutionPlan`. PiperOrigin-RevId: 400082570 Change-Id: I24325abfa577daa4bd3e087da21e1572164ae58f

view details

A. Unique TensorFlower

commit sha 4af137a8fa955b5c3873afedbaf4ebb2362531d4

Sets the force_same_dtype argument to False for the maybe_promote_tensors call in tensor_equals and tensor_not_equals. Also changes force_same_dtype's default to False, since force_same_dtype=True is a dangerous behavior (e.g. causing silent downcasting). PiperOrigin-RevId: 400085486 Change-Id: I9869c69544b34d26d2a3311121439f42934b13b7

view details

Taehee Jeong

commit sha af4115fb4dc1adc8cd49c729f4331dedf95797f3

Add tfds update command to quantization debugger colab. The default tensorflow_datasets version for current colab runtime is 4.0.1,and it has known issue with imagenet_v2 dataset which has been resolved. (https://github.com/tensorflow/datasets/issues/2888) Updated accuracy numbers and added note that this numbers might change in the future. PiperOrigin-RevId: 400101225 Change-Id: I2fadd6a02fc0cb5856ce358ec032fcf07a193311

view details

A. Unique TensorFlower

commit sha 36533d8cf16964ec21982d8521b23b95f0911058

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 400104177 Change-Id: I327059d3cf930c83ab654c564081075803499a8d

view details

Taehee Jeong

commit sha 119917fde87c241c1a5582ae9157c692409a5e96

Remove warnings regarding type inference during the build Ops with tensor list output can't determine output type correctly, so the first output is set to tfr.tensor_list, and subsequent outputs don't have correct type. Changed the python code to get the tensor_list typed ouput first, and then unpack them one by one. PiperOrigin-RevId: 400104350 Change-Id: Ia0e1146c99dbb3a61c16ccc3704256ed11bb48d7

view details

Duncan Riach

commit sha 7e00ae0d4ae1d8b1736584cf8b79c809cd8c445d

[determinism] Add tests for tf.nn.ctc_loss

view details

A. Unique TensorFlower

commit sha 23372629592065433ab9867b4723cdad34488e99

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 400120090 Change-Id: I9b3e9dc4de1fc654a253433e3a36b0b714a30408

view details

push time in a month

push event: bas-aarts/tensorflow

Frederic Bastien

commit sha f0117c64a248acb76845b2e972fa550bd17d927e

Fix compilation due to rebase

view details

Frederic Bastien

commit sha 14bd2bf75dc4fc5710566191d4911e672a2d6bd8

Dump correctly the LLVM_IR to the cache.

view details

Frederic Bastien

commit sha 35ac2c073d6d598ec39b55c65db2a145658721a0

Add to the in memory cache too.

view details

Frederic Bastien

commit sha 16884b9b3f943df5b8f63a223b46e6efb95df536

Refactoring to encapsulate the cache internal.

view details

Frederic Bastien

commit sha 0dbee4fb8b14afd07318e8e8d3f9ad541f3923f2

make more method private

view details

Frederic Bastien

commit sha f7ed339bfdaf25dec0be88ef7b9500d968fd8880

Code simplification

view details

push time in a month

CommitCommentEvent

issue comment: tensorflow/tensorflow

How can I clear GPU memory in tensorflow 2?

Hi, I understand that memory problems aren't great. Just a few notes:

  • cudaMallocAsync isn't meant to help with the problem you encounter. It only helps in some other cases.
  • You need a TF nightly build to use cudaMallocAsync for now. A working version will be available in the next TF release.

About your issue: I tried what you describe years ago with software other than TF. It was hard to do because of Python. The only reliable way I found is to have a bash script that launches all the jobs, or to launch subprocesses one after the other. You said that you used subprocess, but are you sure it wasn't multiple threads? They aren't the same thing, and in your case that could make a big difference. Also make sure that you read the subprocess return value.
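The workflow suggested in that comment can be sketched as follows. This is a hedged illustration, not code from the issue; `run_job` and `train.py` are placeholder names. Each job runs as a real child process, so the GPU memory it allocated is released when the process exits, and the return code is checked explicitly:

```python
import subprocess
import sys

def run_job(args):
    """Run one job as a separate Python process and check its return code."""
    result = subprocess.run([sys.executable, *args],
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"job {args} failed: {result.stderr.strip()}")
    return result.stdout

# Launch jobs one after the other; GPU memory is freed between them because
# each child process (unlike a thread) owns and releases its own CUDA context.
# Example (hypothetical script name):
#   for cfg in ["--lr=0.1", "--lr=0.01"]:
#       run_job(["train.py", cfg])
```

Threads, by contrast, share one process and one CUDA context, so memory allocated by a framework in one thread is not returned to the driver when the thread finishes.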

HristoBuyukliev

comment created time in 2 months

push event: bas-aarts/tensorflow

Trent Lo

commit sha 3fc05031be502a10df755a18fe5a0f46553dce1c

[XLA:GPU] Skip mlir for testUpdateVariableWithCompileTimeConstMemoryUsage.

view details

Mehdi Amini

commit sha 08f33029d5d944de66bd99d62441302a61d231b7

Make the tf_cc_binary friendly to build_cleaner (NFC) This is doing two things: 1) Registering the macro with build_cleaner using `register_extension_info` 2) Using the req_dep=... tag to instruct build_cleaner to ignore the MKL macro, which is not detected as needed. register_extension_info isn't available in OSS as far as I can tell, this will be a no-op there. PiperOrigin-RevId: 395970288 Change-Id: I8fcf4741f045f84d1bc6b6fa57119931ce1f77e8

view details

Edward Loper

commit sha fc0d404021d5cdb71c72c5f55847b79335160322

ExtensionType: when building the constructor, if any Tensor fields have default values, then do *not* convert them to tensors -- wait to do so until the constructor is run. (We might be in a different graph context when the type is created vs when the constructor is run). PiperOrigin-RevId: 395987098 Change-Id: Idc2efe3ab3a1fe04ab719fbe784843115c4428c1

view details

Changhui Lin

commit sha b3a1bcccbd95310a5118bd0e5f6eb1513cb582c8

Update to use the new op tfrt_gpu.module.function, and remove the alias for ModuleFunctionOp. PiperOrigin-RevId: 395987675 Change-Id: I11eb2bcd1f65640985087929b3523193f1d0114a

view details

Karim Nosir

commit sha 40344cb7acb69c9e81d7601ec4808aeb97af4cf9

Disable MaxPool 2D when padding is SAME and filter > 12, we see issues with larger filters. Add test coverage for max pool op. PiperOrigin-RevId: 395987802 Change-Id: I08fab2065a373081973c909e2156839d896e70a5

view details

A. Unique TensorFlower

commit sha 549a0de78209761cedc4647c3df701c9c188d11f

Add CreateUnowned() to resource manager API. This CL is part changes for adding serialization support to refcounting ResourceHandle. Ref-counting resource handles will add names to resource manager via CreateUnowned() to support serialization. This will be enabled in a follow up CL. PiperOrigin-RevId: 395988040 Change-Id: I0d74b46acc71f63e7efd415622176025cc2e42e3

view details

A. Unique TensorFlower

commit sha a60caf0a0098cd6d19139dcab9102ba59b82c891

Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/93b60a08b3f5d8ab43c714541934166fd9dafbc1. PiperOrigin-RevId: 395990376 Change-Id: I3b3dec687e90a27c8523342fa69d11be9f956b86

view details

Faizan Muhammad

commit sha f3b316fd45751e399a978da893558afa81f8917f

Refactor Cache Key Encoding PiperOrigin-RevId: 395991839 Change-Id: Ib5e12f10f3a18341312adfd7496d654458e7ea67

view details

Jun Xu

commit sha 45967265f19ab5dc5f303686ec4e9dc57d3b994e

Output a single startup warning if the environment doesn't support AutoGraph. PiperOrigin-RevId: 395996266 Change-Id: I72e92b9ba7826d632ce942fe3c3e458fc713f217

view details

A. Unique TensorFlower

commit sha c7126a56a0cb504d24c0f070965005963765a958

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 395999996 Change-Id: I423b663c03ed54a044baf96b75175120e83e7022

view details

George Karpenkov

commit sha b4a4a98b4cb13a69b003e7776399111df6ea1e02

[XLA:GPU] [NFC] Remove FusionLayoutAnalysis usage from reduction emitter It's inconsistent with GetShapeFromTensorType(mlir::Value) which is also used, and returns an incorrect value when more constraints are added in LayoutAssignment. Let's just use GetShapeFromTensorType(mlir::Value) for now, and move to FusionLayoutAnalysis once it's correct and can be used everywhere. PiperOrigin-RevId: 396006121 Change-Id: Ic3241843df2fc1861a9c8a422724cf1f17b5933d

view details

Rahul Joshi

commit sha 9dbacd1f44431f636be76a0482dbf7a6db8ff1ff

[XLA:GPU][NFC] Rename test in preparation for using it to test more e2e SPMD compilation PiperOrigin-RevId: 396006132 Change-Id: I59f91d5622d71e96b923f1a5c1da2ce0e0a8d572

view details

A. Unique TensorFlower

commit sha 29e60573d456925c4ae028118078d636257590e7

Fix Apple cc_library layering check violations (ios) PiperOrigin-RevId: 396008438 Change-Id: I3c9e325060b8a4d5d4964d3d7e096ec571e006eb

view details

TensorFlower Gardener

commit sha ae5760bae1b2cee89b2fd64bfce4b099f849827a

Merge pull request #51866 from nouiz:upstream-kExp PiperOrigin-RevId: 396010343 Change-Id: I2fe37b03f58c6c86a37787bf090af02619e818c7

view details

Aden Grue

commit sha be692cdb8dbd0ab86c402a072d7a4791fc4bc918

[NFC] Remove 'RunHloPasses' from 'RunHloPassesAndBufferAssignement' There's no reason to bundle these things together when 'RunHloPasses' is already part of the public API. It just complicates the API unnecessarily. Also there is only one caller! PiperOrigin-RevId: 396012296 Change-Id: I30d57189c03e146945d67f213747a8087e28b62c

view details

TensorFlower Gardener

commit sha 93d8e92b25404fab5386f4a3e0f41a1305f821e3

Merge pull request #51753 from kvignesh1420:graduate-choose-from-datasets PiperOrigin-RevId: 396018830 Change-Id: I7fb6570b81072ea6906e70efca945dd93f78f95f

view details

A. Unique TensorFlower

commit sha 86a17da25db6fe478cbe2f9cd06ddad95ecfb13e

Remove unused field nested_while_level PiperOrigin-RevId: 396020342 Change-Id: Ibf9cc4fb4b96afc0ac4a1490c905d369f45f7afd

view details

Marcello Maggioni

commit sha b04a39b8f131be71c5e6cac2eaedac97915e8eec

[XLA] Add an option to allow to compile with sharding propagation propagating sharding to outputs. This allows some frameworks (JAX in this case) to perform something akin to "querying" sharding of values when they split computations into multiple smaller ones that feed each other. By compiling the computations to be composed with this flag SPMD will derive the sharding of the output that then can be used to set automatically by the framework the sharding of the input of the next computation after the composition. Currently a full compilation needs to happen to generate the executable with the binary, but we can can create in the future a fast-path where we create an empty executable only with metadata to save compile time. PiperOrigin-RevId: 396022035 Change-Id: I5d2e8b6b45b1bf197ed4bba07323b33b67d75806

view details

A. Unique TensorFlower

commit sha e22fa88445ba50fca02dee5f3dd643952dacbf0e

Output a single startup warning if the environment doesn't support AutoGraph. PiperOrigin-RevId: 396022896 Change-Id: I894972502103af535118d5d396f7eb9e04bd811e

view details

A. Unique TensorFlower

commit sha e8d587050caba8601a3f39752d3ee7aeb8caa482

...Androidx Migration Internal refactor... PiperOrigin-RevId: 396025610 Change-Id: I26aa9bcbc92244b5703f3cdd36bfb14c70444e78

view details

push time in 2 months

pull request comment: tensorflow/tensorflow

XLA persistent compilation cache

@gbaned this is strange. @bas-aarts has already made many PRs: https://github.com/tensorflow/tensorflow/pulls?q=is%3Apr+author%3Abas-aarts+is%3Aclosed So he must have signed the CLA.

He started this PR and I'm finishing it. Could that have confused the bot?

Anyway, there was a merge conflict, so I just rebased. Maybe that will fix the bot?

bas-aarts

comment created time in 2 months

PullRequestReviewEvent

Pull request review comment: tensorflow/tensorflow

XLA persistent compilation cache

 void WarnIfBadDriverJITVersion();

 // Returns the directory containing nvvm libdevice files.
 string GetLibdeviceDir(const HloModuleConfig& hlo_module_config);
+
+// Persistent compilation cache.
+// This cache stores .ptx and .cubin files to be used by subsequent
+// compilations. The cache is a directory of files. The file name is a hash of
+// the file content. All files are read (disk IO) and stored in memory when the
+// cache is constructed. When an uncached compilation occurs, the result is
+// written (disk IO) to the cache directory immediately. Autotuning is currently
+// non-deterministic, so a few executions might be required to populate the
+// cache.
+//
+// Deployment:
+// For best performance, keep the cache small (per model), containing only the
+// binaries needed for this execution. In that scenario, after cache creation,
+// there will be no disk IO.
+class PersistentCompilationCache

This would request a big rewrite of the cache. This would complicate the files structure too. Currently we only cache the value and the hash of the key. Not the full key. So when reading the cache, we can't populate the structure you suggest.

What you propose will move the complexity from understanding the in memory caching scheme to the file caching scheme itself. It doesn't seem a good win. As it seems to me to just move the complexity around and it is a non-trivial change, I do not like this suggestion.

I added more comments to explain it better. So it will help clarify the current hashing mechanism. Which was your main point.

I also did a small refactoring so there is only one place to edit when we change the ptxas parameters. This will also help with maintaining the code.

What do you think of this new version? Does it explain the caching structure well?
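To make the scheme under discussion concrete, here is a minimal sketch of a content-addressed persistent cache along the lines described in the review comment: entries are stored as files named by a hash of the lookup key, the whole directory is loaded into memory at construction, and writes go through to disk immediately. This is a hypothetical simplification for illustration, not the actual XLA implementation; the class and method names are invented.

```python
import hashlib
import os


class PersistentCompilationCache:
    """Sketch of a content-addressed on-disk cache (hypothetical names).

    Files are named by the SHA-256 hash of the key; only the value and
    the key hash are stored, never the full key. The directory is read
    once at construction, so cache hits afterwards need no disk IO.
    """

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        # Read every cached file once; later lookups are memory-only.
        self._mem = {}
        for name in os.listdir(cache_dir):
            with open(os.path.join(cache_dir, name), "rb") as f:
                self._mem[name] = f.read()

    @staticmethod
    def _hash(key: bytes) -> str:
        return hashlib.sha256(key).hexdigest()

    def get(self, key: bytes):
        # Returns None on a cache miss.
        return self._mem.get(self._hash(key))

    def put(self, key: bytes, value: bytes):
        name = self._hash(key)
        self._mem[name] = value
        # Write through to disk immediately, as the comment describes.
        with open(os.path.join(self.cache_dir, name), "wb") as f:
            f.write(value)
```

Because only the key hash survives on disk, a reader can enumerate the cached values but cannot reconstruct the original keys — which is exactly why the directory cannot be reorganized into a richer structure without a rewrite.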

bas-aarts

comment created time in 2 months

push eventbas-aarts/tensorflow

Frederic Bastien

commit sha ca8fe6a5c4bf8abd869a930790e95d36a401e418

[NFC] Create the list of ptxas parameters at only 1 place. This will help make sure the XLA cache key doesn't diverge from the compilation code.

Frederic Bastien

commit sha ead6a9e493948031b130facd79a199a0dcf058b0

Better explanation of how the cache works.

push time in 2 months
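The [NFC] commit above centralizes the ptxas parameter list so the XLA cache key cannot diverge from the compilation code. A minimal sketch of that pattern follows; all names here are hypothetical and the flags are illustrative, not the real ptxas invocation used by XLA.

```python
import hashlib


def ptxas_flags(opt_level: int) -> list:
    """The single place where ptxas parameters are assembled."""
    return ["-O%d" % opt_level, "--warn-on-spills"]


def cache_key(ptx: str, opt_level: int) -> str:
    # The key hashes the same flag list the compiler will use, so any
    # change to ptxas_flags() automatically invalidates stale entries.
    payload = ptx + " ".join(ptxas_flags(opt_level))
    return hashlib.sha256(payload.encode()).hexdigest()


def compile_command(ptx_path: str, opt_level: int) -> list:
    # Both call sites consume ptxas_flags(); editing one list updates both.
    return ["ptxas"] + ptxas_flags(opt_level) + [ptx_path]
```

With the flags built in one function, there is no second copy of the parameter list that could silently drift out of sync with the key computation.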

delete branch nouiz/tensorflow

delete branch : upstream_master_tf2_pow3

delete time in 2 months

push eventnouiz/tensorflow

Yong Tang

commit sha 8a0c82f95115967868f066fc02eaf36ccb6a1233

Fix crash of tf.image.pad_to_bounding_box with large input value. This PR tries to address one of the issues raised in 46890 where tf.image.pad_to_bounding_box will crash with large input value. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Wang,Quintin

commit sha 9cdfea554bbad1aa06aa8ee2a35a1036324d6a99

fix While Op segfault when CPU build with PluggableDevice, add UT

Srinivasan Narayanamoorthy

commit sha bf345528b02c87dbb40c1f67ae01289813b97354

Changes to not rewrite conv_grad ops to MKL when explicit padding is enabled.

Srinivasan Narayanamoorthy

commit sha 3b60cee018001d59f2fe3681b81ba82cfaa9fe0a

Enabling simple heuristic based tuning for innerproduct primitive

Wang Yanzhang

commit sha b7ec38fbbf77905e213dcee2e37daaa23d9d22eb

[PluggableDevice] TensorList support with DEVICE_DEFAULT Use stream apis to enable the TensorList with DEVICE_DEFAULT.

Wang Yanzhang

commit sha 8223427497c8fdb1a31a0601a222b2a8b20dee2a

fix the clang-format code style

Wang Yanzhang

commit sha 7d63dc69467ae53fb0eb64bfaeae73b561058340

fix: optimize CopyTensor for performance

Wang Yanzhang

commit sha 1a9d9d399a2e2481eb7baf3324546cce6a7a6aad

fix: DCHECK call func first

Jonathan Dekhtiar

commit sha a31025b53cfecd7dea666fa694b847582ef17da4

ConvertReduce with added INT32 support

Tamas Bela Feher

commit sha 6cd3e3c67c0e6bfdb739302db42027efa4a30bee

Load and select optimization profiles for static engine

Srinivasan Narayanamoorthy

commit sha e89042202822efac0ca2b37a288f1af73e61c36b

review comments.

Srinivasan Narayanamoorthy

commit sha 68b26f85122fa634c9bcd05b46e69b0b9f0149bf

minor change.

George Karpenkov

commit sha 131e7c7eed43597e0b2c7676c43a46133389d492

[TF2XLA] [NFC] Move broadcasting helpers from tf2xla to XLA client This allows usage from XLA codebase. To minimize disturbance from a single CL, leave the forwarding shim in place for now. PiperOrigin-RevId: 398065372 Change-Id: Ifb316e0f8e3620be373f5bf2a09981a7e1c90e0e

Justin Lebar

commit sha 76d3d71fdb9ad11add614ec6127f370b663da6dd

Give cudnn/cublas custom-calls better names in HLO. Instead of naming these instructions "custom-call.42", name them e.g. "cublas-gemm.42". PiperOrigin-RevId: 398069934 Change-Id: I9a90731daa7fa482dba696f562defb0ef77c81aa

Sagun Bajra

commit sha d4ae671934b6f72ed0ffc740f809d05a94c8985d

Improve consistency of error messages across the matmul_op. PiperOrigin-RevId: 398071755 Change-Id: Ic9cc8eefc97b76df03eb672b3debfa9b0f47fec4

Roman Dzhabarov

commit sha 14b5d8db269503063b879d7c9c80f1ab7e1c6b70

[NFC] Remove unused functionality to run Grappler on saved model. PiperOrigin-RevId: 398072114 Change-Id: I4b22caeda32313f227d5e40b6717090362059c03

A. Unique TensorFlower

commit sha 8b47a4475720f279dd8adbc2aae5058855bf3777

Improving error messages for variable_scope.py PiperOrigin-RevId: 398072522 Change-Id: I632505586cdd1dc4d742b51853f3d7105736d685

A. Unique TensorFlower

commit sha 5acd3e941cccbf80b359db9f99795d047bad0b26

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 398075109 Change-Id: I230b3a75bc28861b129c69b9fa7ac63453968fa8

Edward Loper

commit sha 10f321a49238ff8dc8ba8d83ebe0f2bcef1ca83f

Add the `BatchableExtensionType` subclass, which can be used to define extension types that support APIs that make use of batching, such as `tf.data.Dataset`, `tf.keras`, and `tf.map_fn`. PiperOrigin-RevId: 398076402 Change-Id: Ia8eedadb5d575af319df20332524b7be16129105

A. Unique TensorFlower

commit sha 54efe33af30cbde3fe6c774df6f5a77980cbbc02

[XLA:SPMD] Pass through parameter sharding into repleated layer body. PiperOrigin-RevId: 398080911 Change-Id: I7c44f2484286d392ab9c28165c24e6b632b7edff

push time in 2 months
