cublas runtime error on torch.bmm() with CUDA10 and RTX2080Ti

🐛 Bug

When running torch.bmm(a, b) on CUDA tensors, I get the following error:

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/

I'm running the appropriate CUDA for the card (plenty of other models have trained fine). The same CUDA version on a machine with a 1080Ti works fine.

To Reproduce

Here is a minimal script to reproduce the problem:

import torch

x = torch.randn(32, 60, 60)
torch.bmm(x, x)  # no problem without cuda

x = x.cuda()
(x ** 2).sum()   # no problem with other cuda operations
torch.bmm(x, x)  # crash here


Collecting environment information...
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: CentOS Linux release 7.4.1708 (Core)
GCC version: (GCC) 6.3.0
CMake version: version

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 410.72
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.15.4
[pip3] torch==1.1.0
[pip3] torchsample==0.1.2
[pip3] torchvision==0.2.1
[conda] cuda80 1.0 0 soumith
[conda] mkl 2017.0.3 0
[conda] pytorch 0.2.0 py36h53baedd_4cu80 [cuda80] soumith
[conda] torch 1.1.0 <pip>
[conda] torchsample 0.1.2 <pip>
[conda] torchvision 0.1.9 py36h7584368_1 soumith
[conda] torchvision 0.2.1 <pip>
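One detail in the report above may be worth checking: PyTorch was built against CUDA 9.0.176, while the system runtime is CUDA 10.0.130 and the GPU is an RTX 2080 Ti (compute capability 7.5, which CUDA toolkits before 10.0 cannot target). The thread does not confirm that this is the cause, so the following is only a rough diagnostic sketch. The helper names and the small lookup table are illustrative assumptions, not part of PyTorch; only `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_capability()` are real PyTorch calls.

```python
# Hypothetical helper (not a PyTorch API): compare the CUDA version a PyTorch
# binary was built with against the minimum toolkit release that can target
# the GPU's compute capability.

# Illustrative, incomplete table: compute capability -> first CUDA release
# able to target it.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
}


def parse_cuda_version(version):
    """Turn a string such as '9.0.176' into the tuple (9, 0)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def build_supports_capability(build_cuda, capability):
    """Return True/False, or None if the capability is not in the table."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_cuda_version(build_cuda) >= required


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        build_cuda = torch.version.cuda  # CUDA version PyTorch was built with
        capability = torch.cuda.get_device_capability(0)
        print("built with CUDA:", build_cuda)
        print("device capability:", capability)
        print("build can target device:",
              build_supports_capability(build_cuda, capability))
```

With the values from this report, `build_supports_capability("9.0.176", (7, 5))` returns `False`, whereas the 1080 Ti case, `("9.0.176", (6, 1))`, returns `True`. If the check fails on your machine, installing a PyTorch binary built for CUDA 10.0 would be the natural thing to try.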


Answer from mrshenli:

@pbloem please let us know if @ngimel's reply solves the problem.


