Loss instability during training
I have already trained a WaveGlow model from scratch on the LJ Speech dataset, and everything worked well during training.
I am now trying to train a new model on a private dataset that contains only 2 hours of speech. Some audio clips are shorter than
segment_length=16000 (approximately 10 clips out of the 2300 in the dataset). This training is performed in FP32 and, except for
batch_size=24, I use the same hyperparameters as the ones in
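To see how many clips fall under the threshold, a quick check like the following can be used (a sketch: the sampling rate of 22050 Hz matches the LJ Speech config, and the duration values are placeholders, not the actual private dataset):

```python
# Sketch: flag clips shorter than segment_length, given durations in seconds.
# sampling_rate=22050 matches the LJ Speech config; adjust for your dataset.

def is_too_short(duration_s, segment_length=16000, sampling_rate=22050):
    """True when the clip has fewer samples than one training segment."""
    return int(duration_s * sampling_rate) < segment_length

# Example durations (placeholders):
durations = [0.5, 0.7, 1.2, 3.4]
short = [d for d in durations if is_too_short(d)]
```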
Training loss decreases slowly during 120k iterations (which represents a lot of epochs for my small dataset), but further iterations lead to two types of errors, including the loss becoming NaN.
I monitored det(W) at each layer to check whether the determinant was crossing between positive and negative values, but when the loss becomes NaN, all determinants are strictly positive. I don't really know which other term in the loss could cause a NaN issue.
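The determinant check described above can be sketched like this (NumPy stand-ins for the per-layer invertible 1x1 conv weights; in the real model you would pull each W from the model's invertible-convolution modules and use torch.det instead):

```python
import numpy as np

def det_signs(weights):
    """Sign of det(W) per layer; a flip between checkpoints means det(W)
    crossed zero, sending the log|det(W)| loss term toward -inf."""
    return [float(np.sign(np.linalg.det(W))) for W in weights]

# Hypothetical stand-ins for 12 flows with 8-channel 1x1 convs:
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(12)]
signs = det_signs(weights)
```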
I have already trained WaveNet on this private dataset and everything worked well, which suggests that the dataset is not corrupted.
I tried decreasing the learning rate, but the instability persists. Any insight or help in understanding these problems would be greatly appreciated.
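One standard mitigation (not from this thread, just a common guard against runaway losses) is to skip the optimizer step whenever the loss is non-finite, so a single bad batch cannot write NaNs into the weights:

```python
import math

def safe_step(loss_value, apply_update):
    """Apply the update only when the loss is finite; returns whether it ran.
    apply_update stands in for optimizer.step() in a real training loop."""
    if not math.isfinite(loss_value):
        return False  # skip the batch instead of corrupting the weights
    apply_update()
    return True
```

In PyTorch the equivalent check is typically `torch.isfinite(loss)` before calling `loss.backward()` and `optimizer.step()`.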
Answer from nshmyrev:
so the majority of the clips won't be longer than 16000 ms (16 s). So why does the default config have segment_length=16000?
16000 is in samples, not milliseconds. So the minimum clip length is about 1 second; anything shorter will be zero-padded.
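What the data loader does with that threshold can be sketched as follows (a simplified NumPy version of the behavior described above: random-crop clips longer than the segment, zero-pad shorter ones; not the actual WaveGlow code):

```python
import numpy as np

def pad_or_crop(audio, segment_length=16000, rng=None):
    """Return exactly segment_length samples: random crop if the clip is
    longer, zero-pad at the end if it is shorter."""
    n = len(audio)
    if n >= segment_length:
        rng = rng or np.random.default_rng()
        start = int(rng.integers(0, n - segment_length + 1))
        return audio[start:start + segment_length]
    out = np.zeros(segment_length, dtype=audio.dtype)
    out[:n] = audio
    return out
```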
See also https://github.com/NVIDIA/waveglow/issues/95 on how segment_length can cause NaNs.