
Loss instability during training

I have already trained a WaveGlow model from scratch on the LJ Speech dataset, and everything worked well during training.

I am now trying to train a new model on a private dataset that contains only 2 hours of speech. A few clips are shorter than segment_length=16000 (roughly 10 out of 2300 clips). Training is performed in FP32 and, apart from batch_size=24, I use the same hyperparameters as in config.json.
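
A minimal sketch of how such short clips can be counted, assuming 16-bit PCM WAV files and a file list such as train_files.txt (the name is only an example):

```python
# Hedged sketch: count clips shorter than segment_length, assuming WAV files
# and a file list like WaveGlow's train_files.txt (the path is an example).
from scipy.io import wavfile

SEGMENT_LENGTH = 16000  # samples, same value as segment_length in config.json

with open("train_files.txt") as f:
    paths = [line.strip() for line in f if line.strip()]

short_clips = []
for path in paths:
    sample_rate, data = wavfile.read(path)   # data: numpy array of samples
    if data.shape[0] < SEGMENT_LENGTH:
        short_clips.append((path, data.shape[0], sample_rate))

print(f"{len(short_clips)} of {len(paths)} clips are shorter than {SEGMENT_LENGTH} samples")
for path, n, sr in short_clips:
    print(f"  {path}: {n} samples ({n / sr:.2f} s at {sr} Hz)")
```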

The training loss decreases slowly over the first 120k iterations (which amounts to a lot of epochs on such a small dataset), but further training leads to two kinds of failure:

  • NaN loss: I tracked det(W) at each layer to check whether the determinant was crossing from positive to negative values, but when the loss becomes NaN all determinants are strictly positive. I don't know which other term in the loss could cause the NaN (a sketch for checking the other loss terms follows this list).
  • Jump in the loss: The loss jumps from negative to positive values, keeps diverging over the following iterations, and the model forgets what it has learned.
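
A minimal sketch of how the individual loss terms can be inspected when the NaN appears, assuming the model's forward returns (z, log_s_list, log_det_W_list) as in the reference WaveGlowLoss; the log_s warning threshold is arbitrary:

```python
import torch

def check_loss_terms(model_output, sigma=1.0):
    """Print which WaveGlow loss term is non-finite at the current step."""
    z, log_s_list, log_det_W_list = model_output
    z_term = torch.sum(z * z) / (2 * sigma * sigma)
    log_s_total = sum(torch.sum(log_s) for log_s in log_s_list)
    log_det_W_total = sum(torch.sum(w) for w in log_det_W_list)

    for name, value in (("z**2 term", z_term),
                        ("log_s total", log_s_total),
                        ("log_det_W total", log_det_W_total)):
        if not torch.isfinite(value):
            print(f"non-finite loss term: {name} = {value.item()}")

    # Per-flow breakdown: a coupling layer whose log_s explodes is a possible
    # NaN source even when every det(W) stays strictly positive.
    for i, log_s in enumerate(log_s_list):
        if not torch.isfinite(log_s).all():
            print(f"flow {i}: log_s contains NaN/Inf")
        elif log_s.max().item() > 20:  # arbitrary warning threshold
            print(f"flow {i}: max log_s = {log_s.max().item():.1f}")
```

Calling this only when torch.isfinite(loss) fails keeps the overhead negligible.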

I have already used this private dataset with WaveNet and everything worked well, which suggests that the dataset is not corrupted.

I tried decreasing the learning rate, but the instability persists. Any insight into these problems would be greatly appreciated.

NVIDIA/waveglow

Answer from nshmyrev

So the majority of the clips won't be longer than 16000 ms (16 s). So why does the default config have segment_length=16000?

16000 is in samples, not milliseconds, so at a 16 kHz sampling rate the minimum clip length is 1 second; anything shorter will be padded with zeros.
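
To make the padding behaviour concrete, the data loader effectively does the following to every clip (a minimal sketch paraphrasing the logic of the reference mel2samp.py, with simplified names):

```python
# Hedged sketch of how each clip is cropped or zero-padded to segment_length,
# paraphrasing the reference mel2samp.py (variable names simplified).
import random
import torch
import torch.nn.functional as F

def get_segment(audio: torch.Tensor, segment_length: int = 16000) -> torch.Tensor:
    """audio: 1-D tensor of samples; returns exactly segment_length samples."""
    if audio.size(0) >= segment_length:
        # Long enough: take a random window of segment_length samples.
        start = random.randint(0, audio.size(0) - segment_length)
        return audio[start:start + segment_length]
    # Too short: pad the tail with zeros (silence) up to segment_length.
    return F.pad(audio, (0, segment_length - audio.size(0)), "constant")
```

Note that the padded tail is pure silence.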

See also https://github.com/NVIDIA/waveglow/issues/95 on how segment_length can cause NaNs.

