If you are wondering where the data of this site comes from, please visit https://api.github.com/users/kaixih/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Kaixi Hou kaixih @NVIDIA Santa Clara https://kaixih.github.io/ Deep learning, GPUs, High performance computing

kaixih/gpu_unified_cache 4

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

kaixih/cudnn_migration 3

Migrate your cuDNN v7 legacy APIs to cuDNN v8 frontend APIs

kaixih/addons 0

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons

kaixih/benchmarking_cmp_op 0

Benchmarking comparison operations of different data types, i.e., int, float, and double

kaixih/config_files 0

Commonly used config files in linux

kaixih/dl_samples 0

Code samples to use deep learning frameworks, libraries.

kaixih/horovod 0

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

started OUCMachineLearning/OUCML

started time in a month

pull request comment tensorflow/tensorflow

Fix the shape inference function for FusedBatchNormGradEx

Thanks for the reminder. The root cause is that there are only 5 outputs when num_side_inputs == 0 for FusedBatchNormGradEx. So, we need to skip setting output 5 when no side input tensor is detected. It is fixed in the new commit. PTAL. @cheshire
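The rule described in the comment above (outputs 0-4 always exist; output 5, side_input_backprop, only when a side input is present) can be sketched in Python. This is an illustrative model only, not TensorFlow's actual C++ shape-inference code; the helper names are made up:

```python
# Hypothetical sketch of the FusedBatchNormGradEx output rule discussed above.
# Not TensorFlow code; function and variable names are illustrative.

def fused_batch_norm_grad_ex_num_outputs(num_side_inputs: int) -> int:
    """Outputs 0-4 always exist; output 5 (side_input_backprop) exists
    only when a side input is present."""
    base_outputs = 5
    return base_outputs + (1 if num_side_inputs > 0 else 0)

def set_output_shapes(num_side_inputs: int, x_shape):
    """Shape inference must skip output 5 when there is no side input,
    otherwise a check like `output(i).IsSet() 5` fails at graph build time."""
    channel_shape = x_shape[-1:]  # per-channel outputs (NHWC assumed here)
    shapes = {0: x_shape, 1: channel_shape, 2: channel_shape,
              3: channel_shape, 4: channel_shape}
    if num_side_inputs > 0:
        shapes[5] = x_shape  # side_input_backprop matches the input shape
    return shapes
```

With num_side_inputs == 0 the returned mapping has no entry for output 5, which is the behavior the fix restores.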

kaixih

comment created time in 2 months

push event kaixih/tensorflow

Rohit Santhanam

commit sha 28db1e8e3b2c3b99ef72c90c054a56ae77eba3c4

Proper fix for XLA unit test seg fault issue. This was root caused to be a divergence between the CUDA and HIP conv_canonicalizer implementation.

view details

A. Unique TensorFlower

commit sha 72ac61dbed0ea43dce96431b2b72aa0a7bc83a93

Fix stray brackets in documentation. PiperOrigin-RevId: 386013169 Change-Id: I0251cf534566842c1405db03744beceb51e4e6ad

view details

Jian Li

commit sha ce4a1ec7597bdaab139e89ba942fa87ecdf931b5

Switch macro to template for guarding divide by zero in Div kernel. PiperOrigin-RevId: 386021628 Change-Id: I7fc0f5e097524162b3ff34211ce11fda38fefa95

view details

TensorFlower Gardener

commit sha f225635f61cababc398b0073eca94ec198729702

Merge pull request #49735 from Tessil:toupstream/int8_int16x8_reduce_prod PiperOrigin-RevId: 386024020 Change-Id: I65b8ae56df04e844377fd1f6d6e9ce5f0808f034

view details

Andrew Audibert

commit sha 47d06e96f423a0f7bc2b41242a43d5764f954777

[tf.data] Refactor shuffle implementation. This refactor breaks the previously monolithic GetNext method into several smaller methods. In addition to making the `shuffle` implementation more readable, this refactor lays the groundwork for supporting "shuffle all", where the shuffle buffer is resized to shuffle an entire epoch of data. PiperOrigin-RevId: 386026101 Change-Id: I73ce487258a06ecedf7783d650d67ca3b4b4dcc8

view details

Peng Wang

commit sha 517f66b1e9a72f77c7086acb3bd8cc01a8c055b1

Merges StatelessRandomGetAlg's functionality into the V2 stateless RNG ops, so that the former is no longer needed (whose "IsStateful" mark causes problems). Also repurposes the value 3 of the `Algorithm` enum to mean "auto-selection". PiperOrigin-RevId: 386026171 Change-Id: Icca6c1e3551be2c906c00a3a4f89896e6d8308bb

view details

Juanli Shen

commit sha 429a11315ee72fe19d7e43d2c70782773cada793

Remove obsolete TODO PiperOrigin-RevId: 386029673 Change-Id: Ic04114f6943447e29728134ab7d33fb924214416

view details

Taehee Jeong

commit sha 2dae99df07acaa2e680b9ecbb47820e41e08104c

Remove quantization parameters from input and output values After the decomposition is finished, quantization parameters are already decomposed into proper scaling ops in the graph, thus quantization parameters are no longer needed, nor can be understood by the TF graph. So the quantization information is removed from input and output tensors, and only the storage data type is used for the tensors' element type. PiperOrigin-RevId: 386029733 Change-Id: Ic8ea799d5d8df66d7539acfcad995bb450d6c71f

view details

Juanli Shen

commit sha cc0536a1cbc003b1e6dbc5c2bd28839cbb8096ef

Remove `tensorflow::SessionOptions` from `SavedModel::Options` Saved model should not expose API as `tensorflow::SessionOptions`. All the values should be set inside of saved model. PiperOrigin-RevId: 386037545 Change-Id: I7dc73f93a559ee29e55583cbd931e0f01cb07095

view details

A. Unique TensorFlower

commit sha d84bf6f51338bb5231d3e3741c74c07a6d417b4b

Add more context to tf.function error message when the node is available. Second attempt after the windows error rollback. Thanks cheshire@ for point to me tf_stack.cc. The internals have been reworked based on tf_stack.cc. 1. Add a Tensorflow-jit Traceback. Use the full source file like Python stack trace. (no common prefix removal) 2. Include the full source file ref in node defined_at, which allowed me to remove the common prefix logic, which is duplicated between C++. There is comment in error_impl.py hinting at doing this change after Python 2.x support is dropped, which we already do. 3. Match the input node index with that of the graphs. 4. profiler shall unpack the frame objects immediately to avoid redundant string-fying. PiperOrigin-RevId: 386038406 Change-Id: Ie0f549d41973eb6d573a816688a562d604faeeab

view details

Michael Banfield

commit sha c887954b02ec22c5cfbff1e123c0b0c350efd932

Internal clean up. PiperOrigin-RevId: 386040106 Change-Id: Ic29638ad7498514aa6a74229341aa80d1ec123f8

view details

Justin Lebar

commit sha 65627b449f1d4ec19aa7b7ae86154b936049aab7

Fix two bugs in int8 fused convs with the frontend API. - We were using the output pointer as the side-input pointer (copy-paste bug). - We need to say that the bias buffer is "vectorized", otherwise the int8x4 conv is rejected by cudnn. With this change we can now enable the new API for fused int8 convolutions. PiperOrigin-RevId: 386040593 Change-Id: I3b163eba3d5c387f1d934eb0b79fad2ca0e6b579

view details

Chuanhao Zhuge

commit sha cc98caef8cb9a316d6e88095ac55094a342abb21

Gracefully handle encoding WAV files with zero length. Removes checks for positive frames and non-nullptr audio in EncodeAudioAsS16LEWav. When creating audio summaries for zero length tensors, the backing data pointer for the tensor of samples is nullptr which causes EncodeAudioAsS16LEWav to return an InvalidArgument. When the number of frames to encode is zero, it is ok for the audio pointer to be null. TESTED: - unit test PiperOrigin-RevId: 386041567 Change-Id: I327f07574d49dcfe93c5588e0626bf4f7e12fdd8

view details

Katherine Wu

commit sha b2ac24daa7fde93fb138e0bc53164684be00fad7

Fix `None` argument error when taking grads with loaded custom gradients. The fix is simple - replace all None types with zeros. See tf.UnconnectedGradients for info about the `None` gradients. PiperOrigin-RevId: 386043023 Change-Id: Ica97170286f376733aa0ce6f231d077f6cdb4830

view details

Monica Song

commit sha c5f963c5eba429c00322d2544ba15b055ace6d33

Fix tiny typo in scope name in docstring. PiperOrigin-RevId: 386043159 Change-Id: Ib199ae6378f54b8520f9140a8ea1ca9298337a80

view details

Amit Patankar

commit sha b6cb1c746aaaf6f8588b22836c498f86a6927b1c

Delete Python 3.6 jobs for testing and releases. This concludes our Python 3.6 support. PiperOrigin-RevId: 386043635 Change-Id: If117e240f0112eda1a6756c90a2696b6cec87321

view details

A. Unique TensorFlower

commit sha a7b2c26eb51c03a5acecc673c63473476e16a3f6

Use third-party benchmark library instead of in-house implementation PiperOrigin-RevId: 386045704 Change-Id: Ibf551fa07de6b9639a65cba7a012019473d77e86

view details

Amit Patankar

commit sha ee16c86696684663eac4f6213a799042e4d349ad

Delete gpu_pip_on_cpu jobs. PiperOrigin-RevId: 386045946 Change-Id: Ia99977567fd2420e0c558a5c198a7f24aaed187f

view details

A. Unique TensorFlower

commit sha bbdf3e7f5cf1778ffcc663a7b15657ee2637ee77

Adds method to AsyncBufferTransferManager to query PjRtDevice. PiperOrigin-RevId: 386049432 Change-Id: I164522e32f7979bfb43116e1d672ab998a40dab2

view details

Matt Watson

commit sha 17d2c1fd84be170826201210230b3c9ccbe4b3f8

Disable flaking test PiperOrigin-RevId: 386059070 Change-Id: I64d0ef58305cdc1a99bcd21f6e9eff81f2305af1

view details

push time in 2 months

pull request comment tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

Yes, it sounds like an execution plan lifetime issue. Let me reply in the email thread.

kaixih

comment created time in 2 months

pull request comment tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

Yes, I just checked that engine index 11 (or the "new algorithm" in your comment) is an IMPLICIT_PRECOMP_GEMM_XXX engine, meaning it's just one of many IMPLICIT_PRECOMP_GEMM engines used inside the legacy algorithm IMPLICIT_PRECOMP_GEMM.

For the identity activation issue, I think the cudnn fusion still has limitations in supporting an arbitrary no-op within the supported fusion patterns (but, as mentioned in the email, the plan is to have more comprehensive support for runtime fusion for NHWC, at least for this particular pattern).

Also, if your observation holds that ConvBiasAdd+No-op actually works when using engine 11 via the ConvBiasAddRelu pattern (sorry, I couldn't confirm it right now), I think it would be a good idea to file a bug against cudnn to support this variant pattern (i.e., allowing the Relu to be an Identity/No-Op) via an update to the heuristics/fallback, which IMO seems straightforward.

kaixih

comment created time in 2 months

pull request comment tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

Hi Justin, I did a quick check and it seems the problem is that the bias tensor doesn't get vectorized: https://gist.github.com/jlebar/a63d912cf20a422772e39f5c6d82c8c8#file-gistfile0-txt-L3002-L3009 (The vectorCount and vectorDim lines are missing for this tensor.)

Could you please check if the bias tensor is correctly vectorized? For example, here the vector_size should be 4: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/cuda/cuda_dnn.cc#L3583
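The check suggested above can be modeled with a small Python sketch. This is a hypothetical illustration of the invariant (vector size 4 for int8x4, with vectorCount/vectorDim present on the descriptor), not the actual cuda_dnn.cc code; all names here are made up:

```python
# Hedged sketch of the bias-descriptor vectorization check discussed above.
# Not TensorFlow/cuDNN code; `vector_size_for` and the dict-based descriptor
# are illustrative stand-ins.

def vector_size_for(data_type: str) -> int:
    """Expected vectorization width for vectorized int8 layouts."""
    return {"int8x4": 4, "int8x32": 32}.get(data_type, 1)

def is_bias_vectorized(descriptor: dict, data_type: str) -> bool:
    """A descriptor missing vectorCount/vectorDim (as in the linked gist)
    would not be accepted by the vectorized int8 fused-conv engines."""
    expected = vector_size_for(data_type)
    if expected == 1:
        return True  # nothing to check for non-vectorized types
    return (descriptor.get("vectorCount") == expected
            and "vectorDim" in descriptor)
```

For the int8x4 case in the comment above, the expected vector size is 4, and a descriptor without vectorCount/vectorDim fails the check.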

kaixih

comment created time in 2 months

pull request comment tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

Can you try TF_CPP_VMODULE=cuda_dnn=4 to output the logs? Then, at least we can get an idea of the conv shape. The cudnn logs would also help us repro the issue if we know it is a cudnn issue.

May I ask if this is for a normal convolution or a fused conv?

kaixih

comment created time in 2 months

PullRequestReviewEvent

Pull request review comment tensorflow/tensorflow

Support Cudnn Frontend Errata Filter

 class CudnnFilterDescriptor {
   SE_DISALLOW_COPY_AND_ASSIGN(CudnnFilterDescriptor);
 };
+// The errata sheet (JSON format) for marking the cudnn engines that might be
+// buggy. For example, we don't want the engine 999 of forward convolution:
+// R"({ "version" : 1,
+//      "rules"   : [
+//        { "rule_id"             : "ConvFwd_eng999",
+//          "operation"           : "ConvFwd",
+//          "engine"              : 999,
+//          "knob"                : [],
+//          "cudnn_version_start" : 8000,
+//          "cudnn_version_end"   : -1
+//        }
+// ]})"
+// We skip eng0 in the static filter because they are too slow. Additionally,
+// users can specify an additional errata JSON file via
+// CUDNN_ERRATA_JSON_FILE at runtime.
+absl::optional<json> CudnnExecutionPlanEngineFilter(bool is_static) {

Done. PTAL.

kaixih

comment created time in 2 months

push event kaixih/tensorflow

Kaixi Hou

commit sha 1c27154c20deea333c44c73d60f662366a795cb4

Separate functions

view details

push time in 2 months

push event kaixih/tensorflow

Blake Hechtman

commit sha 491ea166528922468d5b5b7b826f42df44e3b88f

[XLA] Do not reserve large hash maps when there are many small computations. PiperOrigin-RevId: 380641027 Change-Id: I51ae2adcaefbc94a139eac4a5cafb5df724cb280

view details

Andrew Audibert

commit sha 4878ad9bb9a33637e62e340040b7bb0ed4506e4f

[tf.data] Improve error message when dataset serialization fails. Previously we would get an error along the lines of NotFoundError: Resource localhost/hash_table_6d5754e2-c5bd-4c7f-8f99-5aa8782dd5a8/ResourceBase does not exist. Now the error looks like FailedPreconditionError: Serialization error while trying to register a dataset with tf.data service. The dataset may depend on a resource located on a different device. To address this, call `register_dataset` from the device with the resource, then use `from_dataset_id` to create per-device datasets. Original error: Invalid argument: Trying to access resource hash_table_8945247c-dee3-4c71-a49b-1d9d8aa5f13c located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:CPU:1 To get the information about which device the resource is being accessed from, this CL plumbs the OpKernelContext through the SerializationContext. PiperOrigin-RevId: 380641324 Change-Id: I995dccb2766ac526c1154e3e8627e30e7cf7630a

view details

Yujing Zhang

commit sha a4553f8ad74f5ad486057da400fb8e101f12e09f

PR #49173: [Crash fix] Fix cudaMallocAsync crashes. Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49173 The first commit fixes #48869. The second commit fixes another follow up crashes when using TF_GPU_ALLOCATOR=cuda_malloc_async. The 2 fixes are: - The Allocator API have... PiperOrigin-RevId: 380643089 Change-Id: I06f04d8b2d8ed6b08b91f94123a5e9e8a1681793

view details

Yuefeng Zhou

commit sha f07a03050cbaab476808e35eb9d06dd9a809ca89

When creating remote eager context, use the collective_executor_mgr from env_. Add a test for running remote collective ops. PiperOrigin-RevId: 380645544 Change-Id: If99a85cfd36362c0a546f7be7253d8d77addbd53

view details

Prakalp Srivastava

commit sha e8fd64732f55c3f576f6ad64b1bdf60b256f8b54

Modify MLIR bridge 1st phase metrics to use counter(s) instead of boolean to collect metrics at graph-level granularity. PiperOrigin-RevId: 380646836 Change-Id: Ie03116c9558197142dea0f7f74dd4e468d40846b

view details

xiaohong1031

commit sha 96d1f254e41f82bd9460e79b690063b346b12c79

disable 2 tests due to MKL backend does not support

view details

A. Unique TensorFlower

commit sha 4f9f14b20b03db683d313d2fdb642550dcb5cd8d

Integrate LLVM at llvm/llvm-project@186f2ac612ad Updates LLVM usage to match [186f2ac612ad](https://github.com/llvm/llvm-project/commit/186f2ac612ad) PiperOrigin-RevId: 380647265 Change-Id: Icc30dd2cd2d2ed5f4356e73e6000f4d71c4512a0

view details

Yunxing Dai

commit sha c8d061f3b072d1f4a06c8ab2d3c18b78a0237ac3

Add dynamic partition op in TF2XLA. PiperOrigin-RevId: 380647331 Change-Id: I5d2a1c4afea0de28476595d4fb667d543c2612eb

view details

Dan Moldovan

commit sha 91032c64d026916153469eda761d02e27cdfa858

Use a more robust verification method for signatures, one which doesn't itself blow up when arguments involved don't support Pythonic equality testing. PiperOrigin-RevId: 380649914 Change-Id: I2927ba72c948eaf0c814e32e35fc1ab7e89faf4e

view details

A. Unique TensorFlower

commit sha a16a64127058043344d7cc3cad525ed05e324d4d

Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/06724be44ad2941e1400847216c1d55f1bacd7b9. PiperOrigin-RevId: 380653100 Change-Id: Id64d6b52862ca697a8438dee2e30aabc85584fda

view details

Fergus Henderson

commit sha 41c7d05b70ac1b731e2e8512f510acadf0529b3f

(lite) Document requirements on object lifetimes in the C API. In particular, document the lifetime requirements for the following function parameters: - TfLiteModelCreate's "model_data" parameter - TfLiteInterpreterResizeInputTensor's "input_dims" parameter - TfLiteInterpreterOptionsAddCustomOp's "model_data" parameter - TfLiteInterpreterOptionsSetOpResolver's "op_resolver_user_data" parameter PiperOrigin-RevId: 380656809 Change-Id: I35042d3bfd74b84b65cd2ad1cc67121b2a721f7b

view details

Peter Hawkins

commit sha 1778c85be001ee2bed2b0893017d563c2539b765

[XLA:Python] Add a .live_buffers() on XLA:Python Device objects. Adds an easy and public method to access the list of live buffers on a device from JAX. PiperOrigin-RevId: 380661911 Change-Id: Ia1d9e90086a2651455fd9c893d1ab8b0078bb8b4

view details

Karim Nosir

commit sha 35c860464ae81dd790ba2a714bbd90cdc5e69166

Explicitly construct std::tuple, looks like no implicit constructor from initializer list PiperOrigin-RevId: 380662646 Change-Id: Ie7dfd971db981bf97e8c835548a04ff8c5c1938a

view details

Jared Duke

commit sha b078592bf032530e4c8c5144e479ca882d48053d

Build infra change for TFLite ARM configurations PiperOrigin-RevId: 380673322 Change-Id: Iff84b374275e0d6d61c1d7236cdf4a4966efccf5

view details

Monica Song

commit sha 110671bca1b5dfb1dcdeae7e3c765f2c72aa59c9

Add migration block for tf.compat.v1.saved_model.load PiperOrigin-RevId: 380676521 Change-Id: I49aeaf7d8f0e7b2a1d60ed800854bd28befe77bb

view details

Andrew Audibert

commit sha f8bf5ab0114466fe0c1c5a9ba2713bd9c44ab9b0

[tf.data] Add support for serializing resource variables. In particular, this enables datasets that depend on resource variables to be used with tf.data service. PiperOrigin-RevId: 380677174 Change-Id: I6b8dae0a9f7802ec411d1f21675b632ee680e5fe

view details

Jeremy Meredith

commit sha 5800172962ce5410121908589bc9685547492eac

Improving C++ Op codegen to move the "name" argument to the end with a default nullptr, and avoiding a temporary array for 0-output Ops. PiperOrigin-RevId: 380677456 Change-Id: Idf59e652806a98d34199f761648384da9ece6abe

view details

Karim Nosir

commit sha 75e7f7a88d1de06b1f90175bcf5d41f4bd81d3fc

Condition disabling variable lifting based on the flag for enabling native resource variables. PiperOrigin-RevId: 380679278 Change-Id: I17384b37ed1bc5cf554ca4feebfd5e01cfff2a77

view details

Jingyue Wu

commit sha 77d8298b1b6b00ea39e0e0dd81bb6df4cc41962b

Expose xla::RemoveDynamicDimension as a TF op. PiperOrigin-RevId: 380680165 Change-Id: I45558ae0f8f4e8bc6df0ffa7692e2702c5aa767b

view details

A. Unique TensorFlower

commit sha 882f0950edaa00d1c420253a24585f0a264f32d7

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 380681185 Change-Id: Idbf9418842af37027a93e3382516a2a1a3c29ed7

view details

push time in 2 months

push event kaixih/tensorflow

Kaixi Hou

commit sha 187b877b4b42a40b904b3bdfe45827cdab555adf

Fix the shape inference function for fused grad ex

view details

push time in 2 months

PR opened tensorflow/tensorflow

Fix the shape inference function for FusedBatchNormGradEx

Related to https://github.com/tensorflow/tensorflow/pull/49277

This PR changes FusedBatchNormGradEx to use a different shape inference function to set output 5.

cc. @nluehr

+35 -1

0 comment

3 changed files

pr created time in 2 months

create branch kaixih/tensorflow

branch : fix_fused_bn_grad_shape_upstream

created branch time in 2 months

pull request comment tensorflow/tensorflow

Re-enable Fused BatchNorm + Add + Activation for the backprop

This now has the following failure: Failure: F0709 00:43:15.026466 2393 shape_inference.cc:114] Check failed: output(i).IsSet() 5 for {{node fused_batch_norm_grad}} = _FusedBatchNormGradEx[T=DT_HALF, U=DT_FLOAT, activation_mode="Relu", data_format="NHWC", epsilon=0.1, is_training=true, num_side_inputs=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](output_grad_cast, input_cast, scale, fused_batch_norm:3, fused_batch_norm:4, fused_batch_norm:5, offset, fused_batch_norm)

Emm... This should be straightforward to fix. Originally, I shared the FusedBatchNormGradShape function with the new fused op, but that function doesn't set output 5 (side_input_backprop). Let me work on a fix.

kaixih

comment created time in 2 months

pull request comment tensorflow/tensorflow

Support Cudnn Frontend Errata Filter

Changed to use the cache for the json handle and reverted the previous related commit. PTAL. @timshen91

kaixih

comment created time in 3 months

push event kaixih/tensorflow

azazhu

commit sha 5ec6611130968662d55afad694a4a476eb3dcea5

[MLIR][DISC] pattern conversion from tf2mhlo: add ConvertPrintOp

view details

azazhu

commit sha 2763b86949ab363eac0c8554ffa08edf387913fa

rebase

view details

azazhu

commit sha c5a15bb9adf41333663c13cebe045f5f52764d81

rebase

view details

azazhu

commit sha 7ebbbd2ee8f2b503a5e0fba7b837c56ac0c39a7f

Merge branch 'master' into feature/disc_porting_tf_to_mhlo_pat_part6

view details

azazhu

commit sha 7feb6c5f3834e62333c8074a79ccafaa94f5d72a

Merge branch 'master' into feature/disc_porting_tf_to_mhlo_pat_part6

view details

azazhu

commit sha 86cabd32aeadb79e328b5819fb416d9d901eb08f

Merge branch 'github_master' into feature/disc_porting_tf_to_mhlo_pat_part6

view details

Deven Desai

commit sha e7a449380238a03dddd2ff01a67205194e24bd09

[ROCm] Enable unique op on the ROCm platform.

view details

Wenyi Zhao

commit sha 410262f963c6ff25875355d1b7885472fc89f84e

[MLIR][DISC] legalize disc to llvm This PR provides conversion patterns to convert `disc_ral.dispatch_op` and `gpu.launch_op` to its disc-compatible llvm format.

view details

Wenyi Zhao

commit sha 3838e485168ce10569b3022f33225a9722073efb

fix

view details

TensorFlow Release Automation

commit sha 6a5ff7beeaa5003994595e9eea487d2a5907f8de

Prepare stub for 2.7.0 release notes

view details

geetachavan1

commit sha 630cc637cb28621b5fd5bc9607790fb015a918e8

Update RELEASE.md

view details

azazhu

commit sha 9feaa42e4e9bf136de3bdff16831adabba7a2303

Merge branch 'github_master' into feature/disc_porting_tf_to_mhlo_pat_part6

view details

Alexander Bosch

commit sha c0c21ffda6011b29fc6ab0d71f558067cf5e51f6

Added Userange Analysis for Buffers.

view details

Alexander Bosch

commit sha d02dd23ec0162527a1645fe89f82f58523486b8b

Addressed reviewers comments.

view details

Alexander Bosch

commit sha cff3b2655c253b37a90ea81881c0f4fa2a2575d4

Addressed a missed comment

view details

Alexander Bosch

commit sha c6a63081390d772a474e836b46d6384f3876d44d

Changed formatter to LLVM.

view details

Alexander Bosch

commit sha d059665a32c5f62e7a4ef18639947ca8964249a4

Addressed reviewers comments.

view details

Alexander Bosch

commit sha a4030cf0bc9e321e489cc6ab215ec0cfc5f5cbe0

Resolved conflicts.

view details

Philipp Hack

commit sha 638140a045d35acc978115919d6d3b846e73cb3b

Implements device-independent fake quantization functors and adds unit tests for symmetric fake quantization.

view details

Peter Kasting

commit sha a132ffe90b165523ff7596cbb57138156a33ddb4

Fix -Wunreachable-code-aggressive. Bug: chromium:1066980

view details

push time in 3 months

Pull request review comment tensorflow/tensorflow

Support Cudnn Frontend Errata Filter

 class CudnnFilterDescriptor {
   SE_DISALLOW_COPY_AND_ASSIGN(CudnnFilterDescriptor);
 };
+// The errata sheet (JSON format) for marking the cudnn engines that might be
+// buggy. For example, we don't want the engine 999 of forward convolution:
+// R"({ "version" : 1,
+//      "rules"   : [
+//        { "rule_id"             : "ConvFwd_eng999",
+//          "operation"           : "ConvFwd",
+//          "engine"              : 999,
+//          "knob"                : [],
+//          "cudnn_version_start" : 8000,
+//          "cudnn_version_end"   : -1
+//        }
+// ]})"
+// We intentionally return an empty string for now. Alternately, users can also
+// specify an additional errata JSON file via CUDNN_ERRATA_JSON_FILE at runtime.
+std::string CudnnExecutionPlanEngineFilter() {
+  static std::string filter_str = "";

Sure, could you point me to the commit you are referring to? I just want to confirm whether we want to exclude eng0 from all three conv directions: ConvFwd, ConvBwdFilter, ConvBwdData.

kaixih

comment created time in 3 months

PullRequestReviewEvent

Pull request review comment tensorflow/tensorflow

Support Cudnn Frontend Errata Filter

 GetFirstWorkingExecutionPlan(
   cudnn_frontend::filter(engine_config, filtered_configs, generic_filter_fn);
   cudnn_frontend::filter(fallback_list, filtered_configs, generic_filter_fn);
+  auto fn = []() { return true; };
+  json json_handle_static;
+  json json_handle_runtime;
+  std::string errata_str = CudnnExecutionPlanEngineFilter();
+  bool use_static_errata = false;
+  if (errata_str != "") {
+    use_static_errata = true;
+    json_handle_static = json::parse(errata_str);

Do you mean something like a static variable?

kaixih

comment created time in 3 months

PullRequestReviewEvent

pull request comment tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

So, for the frontend API, we are still working on if/how we should expose the reordering things to users.

At the risk of distracting from the PR: I'd like the ability to say "I have already reordered it" to be exposed in the frontend API.

Ideally in XLA we will do the reordering ourselves. That is, we'd like you to tell us what the reordering is (or I guess we'll figure it out by observing what ReorderFilterAndBias does). Then we'll teach XLA to do this reordering itself.

By doing so, we'll be able to fuse the reordering operation into other ops, hopefully making it free.

Exactly, we would like to expose this feature for advanced users like XLA who know what's going on with the manual filter/bias reordering. For now, the API will just assume the filter/bias has already been reordered if we simply change the vectorCount to 32 in the cudnn code sample, or if we enable int8x32 in this PR.

kaixih

comment created time in 3 months

push event kaixih/tensorflow

Kaixi Hou

commit sha b954a503ff5bdcfa589e4c18f64e60c1075d0949

Cleanup changes no longer needed

view details

push time in 3 months

pull request comment tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

@awpr Thanks for your comments. Most of your feedback is resolved. PTAL.

For the int8x32 support, the IMMA kernels require reordered filter/bias (more can be found at https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnReorderFilterAndBias). So, for the frontend API, we are still working out whether/how we should expose the reordering to users. That is why the v0.4 release notes only contain int8x4 samples.

kaixih

comment created time in 3 months

push event kaixih/tensorflow

Kaixi Hou

commit sha 12a1b7f08d2ca7bdddc7ceeffae6b57b8774695c

Refactor Batch/FilterDescriptor

view details

Kaixi Hou

commit sha 1925ebab916250dd3c1c4b8993cb1f119ea3191b

Change range

view details

push time in 3 months

PR opened tensorflow/tensorflow

Cudnn frontend v0.4: Support of int8x4, API updates

Cudnn frontend v0.4 released: https://github.com/NVIDIA/cudnn-frontend/releases/tag/v0.4

Accordingly, this PR does the following things:

  • Support vector data types (currently only int8x4);
  • Change setDataType() in OperationBuilder to setComputePrecision();
  • Update setAlpha/Beta to deduce the data type from the compute precision instead of from the given alpha/beta.

cc. @nluehr

+246 -217

0 comment

4 changed files

pr created time in 3 months

push event kaixih/tensorflow

Kaixi Hou

commit sha 0cf3afd528ac1ee6d99526707b3d1d0451afcff9

Disable the int8x32 for cudnn frontend

view details

push time in 3 months

PR opened tensorflow/tensorflow

Support Cudnn Frontend Errata Filter

Cudnn frontend v0.4 introduces a new API that allows users to filter out undesired engines: https://github.com/NVIDIA/cudnn-frontend/releases/tag/v0.4. Users can either hard-code a list of those engines or provide a JSON file via CUDNN_ERRATA_JSON_FILE at runtime. This feature could greatly improve the debugging process.

This PR integrates this feature by allowing both a hard-coded errata list (intentionally blank for now) and a runtime list. An example of using the runtime list:

# cat /home/tmp/sample_errata.json
{ "version" : 1,
  "rules"   : [
    { "rule_id"             : "ConvFwd_eng1",
      "operation"           : "ConvFwd",
      "engine"              : 1,
      "knob"                : [],
      "cudnn_version_start" : 8000,
      "cudnn_version_end"   : -1
    }
  ]
}

# CUDNN_ERRATA_JSON_FILE=/home/tmp/sample_errata.json TF_CPP_VMODULE=cuda_dnn=4 python -u conv2d_tf2_func.py
...
2021-07-01 20:50:21.828423: I tensorflow/stream_executor/cuda/cuda_dnn.cc:4489] Exclude engine (runtime): ConvFwd_eng1_k2=3_k3=0
2021-07-01 20:50:21.828626: I tensorflow/stream_executor/cuda/cuda_dnn.cc:4489] Exclude engine (runtime): ConvFwd_eng1_k2=0_k3=0
...
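The exclusion semantics of such a rule can be sketched in Python. This mirrors only the JSON sample above, under the assumption that a rule applies when the operation and engine match and the cuDNN version falls in [cudnn_version_start, cudnn_version_end), with -1 meaning no upper bound; it is not the cudnn-frontend implementation, and the version-range interpretation is an assumption:

```python
import json

# Hedged sketch of the errata-filter semantics shown above. The rule-matching
# logic (and the half-open version-range assumption) is illustrative, not the
# actual cudnn-frontend C++ code.
ERRATA = json.loads("""
{ "version" : 1,
  "rules"   : [
    { "rule_id"             : "ConvFwd_eng1",
      "operation"           : "ConvFwd",
      "engine"              : 1,
      "knob"                : [],
      "cudnn_version_start" : 8000,
      "cudnn_version_end"   : -1
    }
  ]
}
""")

def engine_excluded(errata, operation, engine, cudnn_version):
    """Return True if any errata rule excludes this (operation, engine)
    combination for the given cuDNN version."""
    for rule in errata["rules"]:
        if rule["operation"] != operation or rule["engine"] != engine:
            continue
        if cudnn_version < rule["cudnn_version_start"]:
            continue
        end = rule["cudnn_version_end"]
        if end != -1 and cudnn_version >= end:
            continue  # past the rule's upper bound
        return True
    return False
```

For the sample file, ConvFwd engine 1 is excluded on cuDNN 8.x, matching the "Exclude engine (runtime): ConvFwd_eng1..." log lines above.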

cc. @nluehr

+104 -10

0 comment

3 changed files

pr created time in 3 months

create branch kaixih/tensorflow

branch : cudnn_frontend_errata_upstream

created branch time in 3 months

create branch kaixih/tensorflow

branch : cudnn_frontend_vect_upstream

created branch time in 3 months