Adrian Kuegel (akuegel), Google Germany GmbH

akuegel/toolchains

Bazel toolchain configurations used across the TensorFlow ecosystem

pull request comment tensorflow/tensorflow

[ROCm]: Updating Matrix Solve Op to use rocSolver

Your change has now been rolled forward again.

stevenireeves

comment created time in 7 days

pull request comment tensorflow/tensorflow

[ROCm]: Updating Matrix Solve Op to use rocSolver

Will do, but it will probably happen tomorrow because I am waiting for internal approval of the modified change.

stevenireeves

comment created time in 8 days

pull request comment tensorflow/tensorflow

[ROCm]: Updating Matrix Solve Op to use rocSolver

The change got reverted because some internal target included cusolver.h and was broken by this change. But I will roll it forward with the fix.

stevenireeves

comment created time in 8 days

PR opened tensorflow/toolchains

Bump ROCM RBE container

The new container has ROCM 4.3.1 instead of ROCM 4.2

+1 -1

0 comments

1 changed file

pr created time in 9 days

push event akuegel/toolchains

Adrian Kuegel

commit sha 02e31ef390f686d8e2247bfbf1c9b81c39ee01f0

Bump ROCM RBE container

The new container has ROCM 4.3.1 instead of ROCM 4.2

push time in 9 days

pull request comment tensorflow/tensorflow

[XLA] Enable pointwise row vectorization for small row.

The newly added test is failing to build because you don't have a dependency for the header include:

#include "tensorflow/compiler/xla/error_spec.h"

nouiz

comment created time in 24 days

pull request comment tensorflow/tensorflow

[XLA] Add back: lift bitcast PR

Any idea what is happening?

Sorry for the delay; the problem is that the process is complicated. I approved the PR in Gerrit when it was imported from GitHub, so I was not allowed to also approve it when it was sent out internally for review (it has to be approved by someone else). Additionally, the test from which I had extracted the reproducer no longer worked by now, so I couldn't verify myself that the issue is completely fixed, and I warned the reporter of the issue that the changelist would land again. Since they didn't react, @cheshire approved the changelist and it got committed.

nouiz

comment created time in 24 days

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

So @Artem-B has figured out why we had the older driver version when running the test. A tensorrt target was loading libcuda.so before we could load the cuda compat package. This is fixed now, so the test should actually also be running fine internally :)

nouiz

comment created time in a month

pull request comment tensorflow/tensorflow

fix datatypes in cwise and gather ops

Yeah, let's close this. We still have a plan for adding support for int64 index values with the new JIT mode: essentially, it would JIT-compile with 64-bit indexing if it sees a tensor where this is needed.

kushanam

comment created time in a month

PR closed tensorflow/tensorflow

fix datatypes in cwise and gather ops (labels: cla: yes, size:S, comp:core)

The cwise_ops_gpu_common and gather_functor kernels deduce loop variable types from the input template datatype. This causes overflows and consequently results in illegal CUDA memory access issues. Sample code to reproduce is added below.

import tensorflow as tf
tf.compat.v1.disable_eager_execution()

n = 13417677  # 13417676 works
h = 160
x = tf.compat.v1.get_variable('x', [n, h])
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())

+26 -13

16 comments

4 changed files

kushanam

pr closed time in a month

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

@Artem-B is there a way to check whether we indeed are running our tests with cuda_compat?

My answer above applies only to the build/test of the internal version of TF, not to the OSS one. I'm a bit out of date on the state of OSS test infrastructure and don't know what we use there ATM.

In general, OSS builds would need to have cuda_compat installed within the container they are running in. My guess is that it's not. I can check tomorrow.

Sorry, I should have clarified: this test is failing internally with the output I posted above (seemingly indicating that the driver is not new enough). In OSS, the test runs successfully.
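One way to answer the "are we actually running with cuda_compat" question for a given job is to look at which libcuda.so the process ends up mapping. Below is a minimal Python sketch, assuming a Linux container and that the compat package's driver stub lives under a path containing "compat" (both are assumptions for illustration, not details from this thread):

import ctypes

def loaded_libcuda_paths():
    # On Linux, /proc/self/maps lists every file mapped into the process,
    # including whichever libcuda.so was actually picked up.
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            if "libcuda.so" in line:
                paths.add(line.split()[-1])
    return sorted(paths)

if __name__ == "__main__":
    ctypes.CDLL("libcuda.so.1")  # force the driver library to load
    for path in loaded_libcuda_paths():
        marker = " (compat package?)" if "compat" in path else ""
        print(path + marker)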

nouiz

comment created time in a month

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

@akuegel Did the internal test fail? If so, what is the new error message?

During my work hours yesterday the change hadn't been imported yet; I read that there were issues with the tool that does the import/export, possibly caused by GitHub problems. Today I see that the change was imported, and the test failed again, this time with an insightful error message:

gpu_cudamallocasync_allocator.cc:136] Disable cuda_malloc_async or update your CUDA driver to a version compatible with CUDA 11.2 or higher. We detected a version compatible with: 11000

So the problem is indeed that we don't have a recent enough driver. Either the cuda_compat package isn't being used, or it doesn't help here. @Artem-B is there a way to check whether we indeed are running our tests with cuda_compat?
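The 11000 in that log line is the driver API's version encoding (major * 1000 + minor * 10), i.e. the driver only advertises CUDA 11.0 while 11020 (CUDA 11.2) would be needed. As an aside, here is a minimal sketch of how one could read that value outside of TensorFlow through the driver API entry point cuDriverGetVersion via ctypes; the libcuda.so.1 path is an assumption about the environment, and none of this is code from the PR:

import ctypes

# Load the NVIDIA driver API library and ask which CUDA version it supports.
cuda = ctypes.CDLL("libcuda.so.1")
version = ctypes.c_int()
ret = cuda.cuDriverGetVersion(ctypes.byref(version))
assert ret == 0, "cuDriverGetVersion returned CUresult %d" % ret
# The encoding matches the number in the log: 11000 -> 11.0, 11020 -> 11.2.
major, minor = version.value // 1000, (version.value % 1000) // 10
print("driver supports CUDA %d.%d (raw value %d)" % (major, minor, version.value))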

nouiz

comment created time in a month

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

@akuegel Can you check in the log if there is any mention that the GPU is enabled and working in that job?

I updated this PR with extra information in that error message. Mostly, the error is that some arguments to the function aren't valid, so I print them. The only argument that I think could fail is the device id. Can you run the test with that extra error message?

With this additional debug statement, I can see "On device: 0", and then: Failed to get device attribute: CUDA error: invalid argument (CUDA_ERROR_INVALID_VALUE). Is it possible that a certain minimum driver version is needed to run this test successfully?

@Artem-B Do you know which driver version we are currently running internally?
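One plausible explanation for this particular failure mode: the cudaMallocAsync path queries a device attribute that only CUDA 11.2+ drivers know about (memory-pool support), and an older driver rejects the unknown attribute enum, which could surface exactly as CUDA_ERROR_INVALID_VALUE. A hedged sketch that probes this outside of TensorFlow; the numeric enum value and the libcuda.so.1 path are assumptions, so check cuda.h before relying on them:

import ctypes

# Assumed value of CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED; verify against cuda.h.
CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED = 115

cuda = ctypes.CDLL("libcuda.so.1")
assert cuda.cuInit(0) == 0  # the driver API must be initialized first

value = ctypes.c_int()
ret = cuda.cuDeviceGetAttribute(
    ctypes.byref(value), CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED, 0)  # device 0
if ret == 0:
    print("memory pools supported:", bool(value.value))
else:
    # On a pre-11.2 driver, an invalid-argument style error would show up here.
    print("cuDeviceGetAttribute failed with CUresult", ret)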

nouiz

comment created time in a month

Pull request review comment tensorflow/tensorflow

[XLA] Add back: lift bitcast PR

 tf_cc_test(
     ],
 )
+cc_library(
+    name = "fusion_bitcast_lift",
+    srcs = ["fusion_bitcast_lift.cc"],
+    hdrs = ["fusion_bitcast_lift.h"],
+    deps = [
+        "//tensorflow/compiler/xla:shape_util",
+        "//tensorflow/compiler/xla/service:hlo",
+        "//tensorflow/compiler/xla/service:hlo_casting_utils",

The "//tensorflow/compiler/xla/service:hlo_casting_utils" target was removed yesterday; depending on "//tensorflow/compiler/xla/service:hlo" is enough to get access to the hlo_casting_utils.h header. Can you please remove this line? This should also show up as a compile error if you sync to HEAD.

nouiz

comment created time in a month

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

I tried with our CUDA 11.2 and 11.3 containers and the test passes. Hopefully it was an unrelated error that won't happen with the new approval.

It did happen with the new approval: same error, and only when executing your newly added test. Unfortunately it looks definitely related. I believe this error would also have been triggered by the first merge had we been running with a CUDA version >= 11.2, but at that time your new code was essentially disabled and therefore couldn't trigger it. I don't know of any other difference (except the newer CUDA version). Can you make sense of the CUDA error that is returned? This is line 133 of gpu_cudamallocasync_allocator.cc. In which cases would such an error be returned? Maybe we can narrow it down through that.

@cheshire any idea what else could be different, such that it fails internally but not externally?

nouiz

comment created time in a month

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

I tried the commit in this PR and the gpu_device_test passed. I rebased it and pushed the rebased version, and it still passes. The GitHub Linux CI passed too. So I'm not able to reproduce any error here.

Any idea what would be different in your system that would make it fail?

Internally we are running with CUDA 11.3; in open source it is CUDA 11.2 (as far as I can see). Otherwise I think there is no difference. The line I quoted as failing is triggered in the test that you are adding in this PR. It did not fail at the time when your PR was first merged, but I think at that time we were not using CUDA 11.3 yet. So this could indeed be the reason.

nouiz

comment created time in a month

pull request comment tensorflow/tensorflow

Add back "PR #49173: [Crash fix] Fix cudaMallocAsync crashes."

@cheshire Note, the problem is that the internal cuda_asan test fails. I do not have any information about it, so I can't fix it. I'm also not able to reproduce it.

The error is actually a different one: F0817 14:52:42.569874 7691 gpu_cudamallocasync_allocator.cc:133] Failed to get device attribute: CUDA error: invalid argument (CUDA_ERROR_INVALID_VALUE)

This happens without cuda_asan. I guess that in the meantime the surrounding code has changed, and you will have to adapt your change to that. I couldn't reproduce the cuda_asan error, so I guess that is fixed now.

nouiz

comment created time in a month

issue comment tensorflow/tensorflow

ROCM: Segmentation fault late in build process

@deven-amd Deven, could you maybe help here? In the ROCM RBE build, this seems to work fine.

supermar1010

comment created time in a month