
CUDA required when loading a TorchScript with map_location='cpu'

🐛 Bug

I'm exporting TorchScript models from a machine equipped with CUDA. When I try to run a model loaded with map_location='cpu' on my MacBook (which at some point had an external GPU but doesn't anymore), I get the following error (full trace below):

Cannot initialize CUDA without ATen_cuda library.

To Reproduce

Minimal code sample

import torch

model = torch.jit.load('checkpoint-10000-embedding.torchscript',
                       map_location='cpu')
model.eval()

x = torch.ones(1, 3, 224, 224)
y = model(x)

The TorchScript file is here; it is simply a MobileNetV2 fine-tuned on GPU.
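For context, a minimal sketch of how such a script was presumably exported (the tiny stand-in model, layer choices, and filename below are illustrative placeholders, not the actual training code): when the model and any tensors used inside forward() live on the GPU at trace time, the tracer records the concrete device, so torch.device("cuda:0") ends up hard-coded in the serialized graph.

```python
import torch
import torch.nn as nn

# Hypothetical export script illustrating how the checkpoint was presumably
# produced. If CUDA is available, the model is traced on the GPU, and any
# device transfer executed inside forward() is recorded with the concrete
# device ("cuda:0"), which then survives serialization.
device = "cuda" if torch.cuda.is_available() else "cpu"  # guard for CPU-only boxes
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).to(device).eval()
example = torch.ones(1, 3, 224, 224, device=device)
traced = torch.jit.trace(model, example)
traced.save("checkpoint-embedding.torchscript")  # stand-in filename
```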

Running this raises:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-38-7e4fd521c4f1> in <module>
      6 
      7 x = torch.ones(1, 3, 224, 224)
----> 8 y = model(x)

/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

RuntimeError: 
Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason.  The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols.  You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library. (initCUDA at ../aten/src/ATen/detail/CUDAHooksInterface.h:58)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x11cf4ae77 in libc10.dylib)
frame #1: at::CUDAHooksInterface::initCUDA() const + 129 (0x1192dfe71 in libcaffe2.dylib)
frame #2: at::Context::lazyInitCUDA()::'lambda'()::operator()() const + 30 (0x1192c3ace in libcaffe2.dylib)
frame #3: std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*)) + 139 (0x7fff5e67a6a8 in libc++.1.dylib)
frame #4: at::LegacyDeviceTypeInit::initCUDA() const + 148 (0x1192c38f4 in libcaffe2.dylib)
frame #5: std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*)) + 139 (0x7fff5e67a6a8 in libc++.1.dylib)
frame #6: at::LegacyTypeDispatch::initForDeviceType(c10::DeviceType) + 176 (0x1192c16a0 in libcaffe2.dylib)
frame #7: at::LegacyTypeDispatch::getNonVariableType(c10::Backend, c10::ScalarType) + 46 (0x1192c11fe in libcaffe2.dylib)
frame #8: at::getType(c10::TensorOptions) + 168 (0x1192bfa88 in libcaffe2.dylib)
frame #9: at::native::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool) + 575 (0x11951441f in libcaffe2.dylib)
frame #10: at::TypeDefault::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool) const + 37 (0x119852705 in libcaffe2.dylib)
frame #11: torch::autograd::VariableType::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool) const + 978 (0x122e44c12 in libtorch.1.dylib)
frame #12: std::__1::__function::__func<torch::jit::(anonymous namespace)::$_489, std::__1::allocator<torch::jit::(anonymous namespace)::$_489>, int (std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&)>::operator()(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 255 (0x1233fff6f in libtorch.1.dylib)
frame #13: torch::jit::InterpreterStateImpl::runImpl(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 245 (0x12345c6a5 in libtorch.1.dylib)
frame #14: torch::jit::InterpreterStateImpl::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 28 (0x1234546ac in libtorch.1.dylib)
frame #15: torch::jit::(anonymous namespace)::ExecutionPlan::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) const + 42 (0x12343a35a in libtorch.1.dylib)
frame #16: torch::jit::GraphExecutorImpl::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 5008 (0x1234312c0 in libtorch.1.dylib)
frame #17: torch::jit::script::Method::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 306 (0x118e004c2 in libtorch_python.dylib)
frame #18: pybind11::object torch::jit::invokeScriptMethodFromPython<torch::jit::script::Method>(torch::jit::script::Method&, torch::jit::tuple_slice, pybind11::kwargs) + 52 (0x118dfffe4 in libtorch_python.dylib)
frame #19: void pybind11::cpp_function::initialize<torch::jit::script::initJitScriptBindings(_object*)::$_23, pybind11::object, pybind11::args, pybind11::kwargs, pybind11::name, pybind11::is_method, pybind11::sibling>(torch::jit::script::initJitScriptBindings(_object*)::$_23&&, pybind11::object (*)(pybind11::args, pybind11::kwargs), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 269 (0x118dffd3d in libtorch_python.dylib)
frame #20: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3372 (0x118954f6c in libtorch_python.dylib)
frame #21: _PyMethodDef_RawFastCallDict + 472 (0x10ef882f1 in Python)
frame #22: _PyCFunction_FastCallDict + 44 (0x10ef87941 in Python)
frame #23: _PyObject_Call_Prepend + 150 (0x10ef889ed in Python)
frame #24: PyObject_Call + 136 (0x10ef87d9b in Python)
frame #25: slot_tp_call + 92 (0x10efc5610 in Python)
frame #26: PyObject_Call + 136 (0x10ef87d9b in Python)
frame #27: _PyEval_EvalFrameDefault + 7162 (0x10f015e6d in Python)
frame #28: _PyEval_EvalCodeWithName + 1867 (0x10f01d6d3 in Python)
frame #29: _PyFunction_FastCallDict + 441 (0x10ef878c1 in Python)
frame #30: _PyObject_Call_Prepend + 150 (0x10ef889ed in Python)
frame #31: slot_tp_call + 71 (0x10efc55fb in Python)
frame #32: _PyObject_FastCallKeywords + 358 (0x10ef87af4 in Python)
frame #33: call_function + 746 (0x10f01ce20 in Python)
frame #34: _PyEval_EvalFrameDefault + 6594 (0x10f015c35 in Python)
frame #35: _PyEval_EvalCodeWithName + 1867 (0x10f01d6d3 in Python)
frame #36: PyEval_EvalCode + 51 (0x10f0141d0 in Python)
frame #37: builtin_exec + 554 (0x10f011c6e in Python)
frame #38: _PyMethodDef_RawFastCallKeywords + 495 (0x10ef886f2 in Python)
frame #39: _PyCFunction_FastCallKeywords + 44 (0x10ef87c8e in Python)
frame #40: call_function + 636 (0x10f01cdb2 in Python)
frame #41: _PyEval_EvalFrameDefault + 6594 (0x10f015c35 in Python)
frame #42: gen_send_ex + 244 (0x10ef93735 in Python)
frame #43: _PyEval_EvalFrameDefault + 17203 (0x10f0185a6 in Python)
frame #44: gen_send_ex + 244 (0x10ef93735 in Python)
frame #45: _PyEval_EvalFrameDefault + 17203 (0x10f0185a6 in Python)
frame #46: gen_send_ex + 244 (0x10ef93735 in Python)
frame #47: _PyMethodDef_RawFastCallKeywords + 590 (0x10ef88751 in Python)
frame #48: _PyMethodDescr_FastCallKeywords + 81 (0x10ef8cf95 in Python)
frame #49: call_function + 801 (0x10f01ce57 in Python)
frame #50: _PyEval_EvalFrameDefault + 6414 (0x10f015b81 in Python)
frame #51: function_code_fastcall + 112 (0x10ef88068 in Python)
frame #52: call_function + 753 (0x10f01ce27 in Python)
frame #53: _PyEval_EvalFrameDefault + 6594 (0x10f015c35 in Python)
frame #54: function_code_fastcall + 112 (0x10ef88068 in Python)
frame #55: call_function + 753 (0x10f01ce27 in Python)
frame #56: _PyEval_EvalFrameDefault + 6414 (0x10f015b81 in Python)
frame #57: _PyEval_EvalCodeWithName + 1867 (0x10f01d6d3 in Python)
frame #58: _PyFunction_FastCallDict + 441 (0x10ef878c1 in Python)
frame #59: _PyObject_Call_Prepend + 150 (0x10ef889ed in Python)
frame #60: PyObject_Call + 136 (0x10ef87d9b in Python)
frame #61: _PyEval_EvalFrameDefault + 7162 (0x10f015e6d in Python)
frame #62: _PyEval_EvalCodeWithName + 1867 (0x10f01d6d3 in Python)
frame #63: _PyFunction_FastCallKeywords + 225 (0x10ef87c53 in Python)
:
operation failed in interpreter:
  running_mean49 = _120.running_mean
  running_var49 = _120.running_var
  _121 = getattr(_0, "18")
  _122 = getattr(_121, "0").weight
  _123 = getattr(_121, "1")
  weight50 = _123.weight
  bias50 = _123.bias
  running_mean50 = _123.running_mean
  running_var50 = _123.running_var
  _124 = torch.to(CONSTANTS.c0, torch.device("cuda:0"), 6, False, False)
         ~~~~~~~~ <--- HERE
  mean = torch.detach(_124)
  _125 = torch.to(CONSTANTS.c1, torch.device("cuda:0"), 6, False, False)
  std = torch.detach(_125)
  _126 = torch.sub(x, torch.view(mean, [1, -1, 1, 1]), alpha=1)
  input = torch.div(_126, torch.view(std, [1, -1, 1, 1]))
  input0 = torch._convolution(input, _2, None, [2, 2], [1, 1], [1, 1], False, [0, 0], 1, False, False, True)
  input1 = torch.batch_norm(input0, weight, bias, running_mean, running_var, False, 0.10000000000000001, 1.0000000000000001e-05, True)
  input2 = torch.hardtanh_(input1, 0., 6.)
  input3 = torch._convolution(input2, _5, None, [1, 1], [1, 1], [1, 1], False, [0, 0], 32, False, False, True)
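The `torch.to(CONSTANTS.c0, torch.device("cuda:0"), ...)` line above is the key: map_location='cpu' remaps the storages of the serialized tensors, but device literals recorded in the traced code are left untouched, so the interpreter still tries to initialize CUDA at that instruction. A quick way to spot such literals before hitting the runtime error is to print the script's recovered source; a diagnostic sketch using a trivially traced module:

```python
import torch

# Diagnostic sketch: inspect a traced module's recovered source and search
# it for hard-coded device literals. A module traced entirely on CPU
# contains none; the MobileNetV2 trace above would show
# torch.device("cuda:0") in its forward().
m = torch.jit.trace(torch.nn.ReLU(), torch.ones(2))
print(m.code)            # the recovered forward() source
print("cuda" in m.code)  # False for a CPU-traced module
```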

Expected behavior

The model loaded with map_location='cpu' should run entirely on CPU, without requiring the CUDA library.

Environment

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: Could not collect

OS: Mac OSX 10.14.5
GCC version: Could not collect
CMake version: version 3.14.4

Python version: 3.7
Is CUDA available: No
CUDA runtime version: 9.2.148
GPU models and configuration: Could not collect
Nvidia driver version: 1.1.0
cuDNN version: Probably one of the following:
/usr/local/cuda/lib/libcudnn.7.dylib
/usr/local/cuda/lib/libcudnn.dylib
/usr/local/cuda/lib/libcudnn_static.a

Versions of relevant libraries:
[pip3] numpy==1.16.3
[pip3] torch==1.1.0
[pip3] torchvision==0.2.2.post3
[conda] Could not collect

Answer from notsimon:

Yes, that's correct: this model came from a trace without a prior CPU conversion. Thanks for your help!
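Given that, a sketch of the workaround implied by the discussion: move the model to CPU before tracing, so any recorded device transfers reference the CPU. The tiny stand-in model and filename below are illustrative, not the actual MobileNetV2 code.

```python
import torch
import torch.nn as nn

# Workaround sketch: trace on CPU so no CUDA device is baked into the graph.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())  # stand-in for MobileNetV2
model = model.cpu().eval()                            # move to CPU *before* tracing
example = torch.ones(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("checkpoint-cpu.torchscript")             # illustrative filename

# Loading on a CUDA-less machine now works as expected.
loaded = torch.jit.load("checkpoint-cpu.torchscript", map_location="cpu")
y = loaded(example)
print(y.shape)  # torch.Size([1, 8, 222, 222])
```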
