Loss instability during training
I have already trained a WaveGlow model from scratch on the LJ Speech dataset, and everything worked well during training.
I am now trying to train a new model on a private dataset that contains only 2 hours of speech. Some audio clips are shorter than
segment_length=16000 (approximately 10 clips out of the 2300 in the dataset). This training is performed in FP32 and, except for
batch_size=24, I use the same hyperparameters as the ones in
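To see how many clips fall under the threshold, a quick check like the following can be used (a sketch: the sampling rate of 22050 Hz matches the LJ Speech config, and the duration values are placeholders, not the actual private dataset):

```python
# Sketch: flag clips shorter than segment_length, given durations in seconds.
# sampling_rate=22050 matches the LJ Speech config; adjust for your dataset.

def is_too_short(duration_s, segment_length=16000, sampling_rate=22050):
    """True when the clip has fewer samples than one training segment."""
    return int(duration_s * sampling_rate) < segment_length

# Example durations (placeholders):
durations = [0.5, 0.7, 1.2, 3.4]
short = [d for d in durations if is_too_short(d)]
```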
Training loss decreases slowly during 120k iterations (which represents a lot of epochs for my small dataset), but further iterations lead to two types of errors, including the loss becoming NaN.
I monitored det(W) at each layer to check whether the determinant was crossing between positive and negative values, but when the loss becomes NaN, all determinants are strictly positive. I don't really know which other term in the loss could cause a NaN issue.
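The determinant check described above can be sketched like this (NumPy stand-ins for the per-layer invertible 1x1 conv weights; in the real model you would pull each W from the model's invertible-convolution modules and use torch.det instead):

```python
import numpy as np

def det_signs(weights):
    """Sign of det(W) per layer; a flip between checkpoints means det(W)
    crossed zero, sending the log|det(W)| loss term toward -inf."""
    return [float(np.sign(np.linalg.det(W))) for W in weights]

# Hypothetical stand-ins for 12 flows with 8-channel 1x1 convs:
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(12)]
signs = det_signs(weights)
```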
I have already trained WaveNet on this private dataset and everything worked well, which suggests that the dataset is not corrupted.
I tried decreasing the learning rate, but the instability persists. Any insight or help in understanding these problems would be greatly appreciated.
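One standard mitigation (not from this thread, just a common guard against runaway losses) is to skip the optimizer step whenever the loss is non-finite, so a single bad batch cannot write NaNs into the weights:

```python
import math

def safe_step(loss_value, apply_update):
    """Apply the update only when the loss is finite; returns whether it ran.
    apply_update stands in for optimizer.step() in a real training loop."""
    if not math.isfinite(loss_value):
        return False  # skip the batch instead of corrupting the weights
    apply_update()
    return True
```

In PyTorch the equivalent check is typically `torch.isfinite(loss)` before calling `loss.backward()` and `optimizer.step()`.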
Answer from nshmyrev:
so the majority of the clips won't be longer than 16000 ms (16 s). So why does the default config have segment_length=16000?
16000 is in samples, not milliseconds. So the minimum clip length is about 1 second; anything shorter will be zero-padded.
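What the data loader does with that threshold can be sketched as follows (a simplified NumPy version of the behavior described above: random-crop clips longer than the segment, zero-pad shorter ones; not the actual WaveGlow code):

```python
import numpy as np

def pad_or_crop(audio, segment_length=16000, rng=None):
    """Return exactly segment_length samples: random crop if the clip is
    longer, zero-pad at the end if it is shorter."""
    n = len(audio)
    if n >= segment_length:
        rng = rng or np.random.default_rng()
        start = int(rng.integers(0, n - segment_length + 1))
        return audio[start:start + segment_length]
    out = np.zeros(segment_length, dtype=audio.dtype)
    out[:n] = audio
    return out
```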
See also https://github.com/NVIDIA/waveglow/issues/95 on how segment_length can cause NaNs.