Getting error "RuntimeError: CUDA error: device-side assert triggered" in epoch 0 on WaveGrad vocoder training

First things first: thanks @nmstoker for providing the great gatherup tool :+1:

  • I'm trying to train a WaveGrad vocoder model for use with our pretrained taco2 Thorsten model (LJSpeech structure).
  • I'm training on an NVIDIA Xavier AGX machine (Ubuntu, aarch64).
  • I copied "wavegrad_libritts.json" and adjusted it to match our taco2 audio settings.
  • I've uploaded the taco2 config (probably irrelevant for WaveGrad training?) and the vocoder config used for training: wavegrad-thorsten-conf.zip

Details

During epoch 0 I get the following error:

python ./TTS/bin/train_vocoder_wavegrad.py --config_path ./TTS/vocoder/configs/wavegrad_thorsten.json 
 > Using CUDA:  True
 > Number of GPUs:  1
   >  Mixed precision is enabled
 > Git Hash: ac46c3f
 > Experiment folder: /home/thorsten/___prj/tts/models/vocoder/wavegrad/mozilla/wavegrad-model-output/wavegrad-thorsten-November-30-2020_01+03PM-ac46c3f
 > Loading wavs from: /home/thorsten/___prj/tts/datasets/thorsten-de_v02/
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > stats_path:/home/thorsten/___prj/tts/models/vocoder/wavegrad/mozilla/taco2-files/taco2-scale_stats.npy
 | > hop_length:256
 | > win_length:1024
 > Generator Model: wavegrad
 > WaveGrad has 15827106 parameters

 > EPOCH: 0/10000

 > TRAINING (2020-11-30 13:04:15) 
/media/nvidia/WD_NVME/PyTorch/JetPack_4.4/GA/pytorch-v1.6.0/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [47,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
 ! Run is removed from /home/thorsten/___prj/tts/models/vocoder/wavegrad/mozilla/wavegrad-model-output/wavegrad-thorsten-November-30-2020_01+03PM-ac46c3f
Traceback (most recent call last):
  File "./TTS/bin/train_vocoder_wavegrad.py", line 504, in <module>
    main(args)
  File "./TTS/bin/train_vocoder_wavegrad.py", line 401, in main
    epoch)
  File "./TTS/bin/train_vocoder_wavegrad.py", line 116, in train
    noise, x_noisy, noise_scale = model.compute_y_n(x)
  File "/home/thorsten/___prj/tts/models/vocoder/wavegrad/mozilla/lib/python3.6/site-packages/TTS-0.0.7+ac46c3f-py3.6-linux-aarch64.egg/TTS/vocoder/models/wavegrad.py", line 110, in compute_y_n
    noise_scale = l_a + torch.rand(y_0.shape[0]).to(y_0) * (l_b - l_a)
RuntimeError: CUDA error: device-side assert triggered
(mozilla) thorsten@nvidia-agx:~/___prj/tts/models/vocoder/wavegrad/mozilla/TTS$ 

Platform OS

  • Linux

Python Environment

  • Python 3.6.9

  • Virtual env: Venv / virtualenv

Package Installation

  • TTS installed from source on GitHub <br> <details> <summary>Click to see package list (package count: 116)</summary><br>

    :package: Package list from Pip

    Package Version
    absl-py 0.11.0
    astroid 2.4.2
    astunparse 1.6.3
    attrdict 2.0.1
    attrs 20.3.0
    audioread 2.1.9
    bokeh 1.4.0
    cachetools 4.1.1
    cardboardlint 1.3.0
    certifi 2020.11.8
    cffi 1.14.4
    chardet 3.0.4
    click 7.1.2
    clldutils 3.5.4
    colorama 0.4.4
    colorlog 4.6.2
    commonmark 0.9.1
    confuse 1.3.0
    csvw 1.8.1
    cycler 0.10.0
    Cython 0.29.21
    dataclasses 0.7
    decorator 4.4.2
    docopt 0.6.2
    filelock 3.0.12
    Flask 1.1.2
    future 0.18.2
    gast 0.3.3
    gatherup 0.0.4
    gdown 3.12.2
    german-transliterate 0.1.3
    google-auth 1.23.0
    google-auth-oauthlib 0.4.2
    google-pasta 0.2.0
    grpcio 1.33.2
    h5py 2.10.0
    idna 2.10
    importlib-metadata 3.1.0
    importlib-resources 3.3.0
    inflect 5.0.2
    isodate 0.6.0
    isort 4.3.21
    itsdangerous 1.1.0
    Jinja2 2.11.2
    joblib 0.17.0
    Keras-Preprocessing 1.1.2
    kiwisolver 1.3.1
    lazy-object-proxy 1.4.3
    librosa 0.7.2
    llvmlite 0.31.0
    Markdown 3.3.3
    MarkupSafe 1.1.1
    matplotlib 3.3.3
    mccabe 0.6.1
    nose 1.3.7
    num2words 0.5.10
    numba 0.48.0
    numpy 1.18.5
    oauthlib 3.1.0
    opt-einsum 3.3.0
    packaging 20.7
    phonemizer 2.2.1
    Pillow 8.0.1
    pip 20.2.4
    pkg-resources 0.0.0
    prompt-toolkit 3.0.8
    protobuf 3.14.0
    pyasn1 0.4.8
    pyasn1-modules 0.2.8
    pycparser 2.20
    Pygments 2.7.2
    pylint 2.5.3
    pyparsing 2.4.7
    pysbd 0.3.3
    PySocks 1.7.1
    python-dateutil 2.8.1
    pyworld 0.2.12
    PyYAML 5.3.1
    questionary 1.8.1
    regex 2020.11.13
    requests 2.25.0
    requests-oauthlib 1.3.0
    resampy 0.2.2
    rfc3986 1.4.0
    rich 8.0.0
    rsa 4.6
    scikit-learn 0.23.2
    scipy 1.4.1
    segments 2.1.3
    setuptools 50.3.2
    six 1.15.0
    SoundFile 0.10.3.post1
    tabulate 0.8.7
    tensorboard 2.4.0
    tensorboard-plugin-wit 1.7.0
    tensorboardX 2.1
    tensorflow 2.3.0+nv20.9
    tensorflow-estimator 2.3.0
    termcolor 1.1.0
    threadpoolctl 2.1.0
    toml 0.10.2
    torch 1.6.0
    tornado 6.1
    tqdm 4.54.0
    TTS 0.0.7+ac46c3f
    typed-ast 1.4.1
    typing-extensions 3.7.4.3
    umap-learn 0.4.6
    Unidecode 0.4.20
    uritemplate 3.0.1
    urllib3 1.26.2
    wcwidth 0.2.5
    Werkzeug 1.0.1
    wheel 0.35.1
    wrapt 1.12.1
    zipp 3.4.0

</details>

- generated at 13:31 on Nov 30 2020 using Gather Up tool :gift:

mozilla/TTS

Answer by nmstoker

I realise each case can be different, but I'm pretty sure it went okay for me using WaveGrad in TTS with the LR at 1e-4. I did hit some minor problems when continuing training, although those don't look like what you have here.

For the original error, "CUDA error: device-side assert triggered", I think you may be able to get more detail if you run it again with the environment variable CUDA_LAUNCH_BLOCKING="1" set.
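As a rough sketch of that suggestion: CUDA kernels are launched asynchronously, so the Python line shown in the traceback may not be the one that actually triggered the assert. Setting CUDA_LAUNCH_BLOCKING=1 makes launches synchronous, so the error should surface closer to the failing call. The helper names below (`blocking_env`, `build_command`, `run_training`) are hypothetical and only for illustration. The script and config paths are the ones from the log above.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # The variable must be set before the process initialises CUDA,
    # which is why it is passed to a fresh subprocess here.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_command(config_path, script="./TTS/bin/train_vocoder_wavegrad.py"):
    """Build the same training command as in the log (script path taken from the issue)."""
    return [sys.executable, script, "--config_path", config_path]


def run_training(config_path):
    """Hypothetical wrapper: relaunch training with CUDA_LAUNCH_BLOCKING=1."""
    return subprocess.run(build_command(config_path), env=blocking_env(), check=False)
```

Calling `run_training("./TTS/vocoder/configs/wavegrad_thorsten.json")` from the TTS checkout should be equivalent to prefixing the original shell command with `CUDA_LAUNCH_BLOCKING=1`. Training will run slower in this mode, so it is only meant for debugging.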
