Loss instability during training

I have already trained a WaveGlow model from scratch using LJ Speech dataset and everything worked well during training.

I am now trying to train a new model on a private dataset that contains only 2 hours of speech. Some clips are shorter than segment_length=16000 (approximately 10 clips out of 2300). Training is performed in FP32 and, apart from batch_size=24, I use the same hyper-parameters as in config.json.
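As a side note, the fraction of too-short clips is easy to verify up front. A minimal sketch (the helper name `count_short_clips` is hypothetical; WaveGlow's data loader zero-pads such clips rather than rejecting them):

```python
import wave

def count_short_clips(paths, segment_length=16000):
    """Count WAV files with fewer than `segment_length` samples.

    Hypothetical diagnostic helper: clips this short get zero-padded
    by the training data loader, so it is worth knowing how many
    there are before training.
    """
    short = 0
    for path in paths:
        with wave.open(path, "rb") as wf:
            if wf.getnframes() < segment_length:
                short += 1
    return short
```

With ~10 short clips out of 2300, padding alone is unlikely to dominate training, but it is worth ruling out.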

Training loss decreases slowly over the first 120k iterations (which represents a lot of epochs for my small dataset), but further iterations lead to two kinds of failure:

  • NaN loss: I tracked det(W) at each layer to check whether the determinant was crossing between positive and negative values, but when the loss becomes NaN, all determinants are strictly positive. I don't know which other term in the loss could cause a NaN.
  • Jump in the loss: the loss jumps from negative values to positive ones, keeps diverging at subsequent iterations, and the model forgets what it learned.
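For the NaN case: the WaveGlow negative log-likelihood has three terms, not one. Beyond the z²/(2σ²) term and the invertible-convolution log|det W| term that was already tracked, there is a −Σ log s term from the affine coupling layers, which can go non-finite even when every det(W) stays positive. A minimal sketch (pure Python, function name and flat-list inputs are assumptions, not the repository's actual code) that decomposes the loss and reports which term went bad:

```python
import math

def check_loss_terms(z, log_det_W, log_s, sigma=1.0):
    """Decompose a WaveGlow-style NLL into its three terms.

    z: flat list of latent values, log_det_W: per-layer log|det W|,
    log_s: flat list of coupling-layer log-scales.  Returns the
    (per-element) loss and the names of any non-finite terms.
    A diagnostic sketch, not the exact implementation in glow.py.
    """
    terms = {
        "z_term": sum(v * v for v in z) / (2.0 * sigma * sigma),
        "logdet_term": -sum(log_det_W),
        "log_s_term": -sum(log_s),
    }
    bad = [name for name, v in terms.items() if not math.isfinite(v)]
    loss = sum(terms.values()) / max(len(z), 1)
    return loss, bad
```

Logging each term separately at every iteration usually pinpoints whether the coupling-layer scales, rather than det(W), are the source of the NaN.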

I have already trained WaveNet on this private dataset and everything worked well, which suggests that the dataset is not corrupted.

I tried decreasing the learning rate, but the instability persists. Any insight into these problems would be greatly appreciated.


Answer (nshmyrev)

"so the majority of the clips won't be longer than 16000 ms (16 s). So why does the default config have segment_length=16000?"

16000 is in samples, not milliseconds. So the minimum clip length is 1 second; anything shorter will be zero-padded.

Consider also how segment_length causes NaNs.
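The unit point above can be sketched as follows: segment_length counts samples, so its duration depends on the sampling rate (the 16 kHz rate used as the default below is an assumption matching the "1 second" figure; LJ Speech itself uses 22050 Hz, where 16000 samples is about 0.73 s). Clips shorter than that are padded with zeros:

```python
def segment_duration(segment_length=16000, sample_rate=16000):
    """Duration in seconds of a segment of `segment_length` samples."""
    return segment_length / sample_rate

def pad_to_segment(audio, segment_length=16000):
    """Zero-pad a clip (list of samples) up to `segment_length`.

    Sketch of the assumed data-loader behavior for short clips;
    clips already long enough are returned unchanged.
    """
    if len(audio) < segment_length:
        audio = audio + [0.0] * (segment_length - len(audio))
    return audio
```

Long runs of padded zeros in a training segment are one plausible way very short clips could contribute to numerical trouble, which is worth checking against the ~10 short clips in the dataset.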

