If you are wondering where the data of this site comes from, please visit https://api.github.com/users/speechbrain/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

speechbrain/speechbrain 3027

A PyTorch-based Speech Toolkit

speechbrain/speechbrain.github.io 284

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, including speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

speechbrain/HyperPyYAML 11

Extensions to YAML syntax for better Python interaction

speechbrain/kaldi 3

This is the official location of the Kaldi project.

speechbrain/simple_site 1

Minimal tutorial on making a simple website with GitHub Pages

issue opened speechbrain/speechbrain

wav_len and src mismatch under multiple GPUs

Hi,

I want to train the default LibriSpeech Transformer model on the TED-LIUM dataset. Only the audio processing pipeline and the size of the Transformer are changed; all other configurations remain the same.

Issue

I run train.py with --data_parallel_backend, which means two GPUs.

I got this error:

  File "/home/anaconda3/envs/sb/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/sb/lib/python3.8/site-packages/speechbrain/nnet/attention.py", line 416, in forward
    output, attention = self.att(
  File "/home/anaconda3/envs/sb/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/sb/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 978, in forward
    return F.multi_head_attention_forward(
  File "/home/anaconda3/envs/sb/lib/python3.8/site-packages/torch/nn/functional.py", line 4283, in multi_head_attention_forward
    assert key_padding_mask.size(1) == src_len
AssertionError

It seems the mask length does not match the source length, so I checked it.

    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig
        tokens_bos, _ = batch.tokens_bos
        # Add augmentation if specified
        ......
        # compute features
        ......
        # forward modules
        src = self.modules.CNN(feats) 
        enc_out, pred = self.modules.Transformer(
            src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index
        )

Before wav_lens is fed into self.modules.Transformer, printing it gives tensor([0.3970, 0.2246, 1.0000, 0.1892], device='cuda:0') (the max length of src is 508).

When I checked wav_lens again in TransformerASR.forward():

        (src_key_padding_mask,
         tgt_key_padding_mask,
         src_mask,
         tgt_mask,
        ) = self.make_masks(src, tgt, wav_len, pad_idx=pad_idx)

wav_len is split into two parts, one per GPU:

  • tensor([0.3970, 0.2246], device='cuda:0')
  • tensor([1.0000, 0.1892], device='cuda:1')

Therefore, the shape of src_key_padding_mask under cuda:0 is [2, 202].

I know multi-GPU problems are always complicated.

If you have any idea why this error occurs, could you let me know?

Possible solutions

  1. If I run train.py on only one GPU or on the CPU, no error occurs.
  2. If I add wav_len = wav_len / wav_len.max() before self.make_masks(), no error occurs.

Is the second solution reasonable? I only know that wav_len is a relative length, but I don't know what effect the second solution has on the Transformer.
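
For what it's worth, here is a minimal sketch (not the actual SpeechBrain make_masks code) of why the shapes disagree and why the re-normalization helps; the helper relative_to_padding_mask and the 508-frame length are illustrative assumptions.

import torch

def relative_to_padding_mask(wav_len, src_len_frames):
    # Recover absolute lengths (in frames) from the relative lengths.
    abs_len = torch.round(wav_len * src_len_frames).long()
    # length_to_mask-style default: the mask is only as wide as the longest
    # utterance seen on this replica, not as wide as src itself.
    max_len = int(abs_len.max())
    positions = torch.arange(max_len)
    return positions.unsqueeze(0) >= abs_len.unsqueeze(1)  # True marks padding

# Replica cuda:0 after --data_parallel_backend scatters the batch of 4 into 2 + 2:
wav_len = torch.tensor([0.3970, 0.2246])
print(relative_to_padding_mask(wav_len, 508).shape)  # torch.Size([2, 202]) != src_len 508

# Workaround 2: re-normalize per replica so the longest item maps back to src_len.
wav_len = wav_len / wav_len.max()
print(relative_to_padding_mask(wav_len, 508).shape)  # torch.Size([2, 508]) == src_len

Whether the per-replica re-normalization is safe likely depends on whether anything downstream still treats wav_len as relative to the full-batch maximum; running under DDP, where each process builds its own complete batch, usually avoids the split altogether.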

Thanks so much.

created time in 3 hours

issue comment speechbrain/speechbrain

How long can the longest speech transcribed with a Transformer be?

In practice, for vanilla attention-based models, it is better to transcribe files that are close to the average length of the training dataset.

Jeremiah0425

comment created time in 3 hours

pull request comment speechbrain/speechbrain

Add language identification recipe using the Voxlingua107 dataset

Thank you, let me try it now. Yes, it is on the shared filesystem because the dataset is bigger than the local SSD. It could indeed be a bottleneck, especially on Lustre filesystems. However, I did some experiments in the past on the same filesystem using webdataset, and it wasn't a big bottleneck. I guess Aku has some experience to share with that as well.

On Mon, 20 Sept 2021 at 11:33, Tanel Alumäe ***@***.***> wrote:

There was indeed a problem with shard-000165.tar. Everything should be fixed now. Sorry about that.

Regarding training time: are the shards on a local disk or network share? If they are on a network drive, it could become a bottleneck, even when using this tar-file based reading which should be much more network-friendly than reading individual wav files.


alumae

comment created time in 8 hours

pull request comment speechbrain/speechbrain

Add language identification recipe using the Voxlingua107 dataset

There was indeed a problem with shard-000165.tar. Everything should be fixed now. Sorry about that.

Regarding training time: are the shards on a local disk or network share? If they are on a network drive, it could become a bottleneck, even when using this tar-file based reading which should be much more network-friendly than reading individual wav files.
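
For readers unfamiliar with the shard setup, here is a minimal sketch of tar-shard streaming with the webdataset package; the shard pattern and key names are assumptions, not the recipe's actual pipeline.

import webdataset as wds

# Each shard is a plain tar file holding many utterances, so the filesystem
# sees a few large sequential reads instead of millions of small file opens.
shards = "/path/to/voxlingua107_shards/train/shard-{000000..000170}.tar"
dataset = (
    wds.WebDataset(shards)      # streams the tar shards sequentially
    .decode(wds.torch_audio)    # decodes audio entries into (tensor, sample_rate)
    .to_tuple("wav", "json")    # assumed key names inside each shard
)
for audio, meta in dataset:
    pass  # hand the decoded samples to the training loop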

alumae

comment created time in 8 hours

issue comment speechbrain/speechbrain

Language id performance

The phenomenon is the same as when I trained on CommonLanguage. Actually, I printed out the ids and listened to the audio; there is indeed nothing wrong with the utterances that report NaN errors.

So is there no explanation for this phenomenon?

lhanzl

comment created time in 8 hours

issue comment speechbrain/speechbrain

Language id performance

The phenomenon is the same as when I trained on CommonLanguage. Actually, I printed out the ids and listened to the audio; there is indeed nothing wrong with the utterances that report NaN errors.

lhanzl

comment created time in 8 hours

issue comment speechbrain/speechbrain

Language id performance

Which model are you training, and on which data? Yes, try to see if reducing gradient_clipping is enough. In my experience, sometimes there are some sentences in the dataset that systematically cause NaN issues. If this is the case, try to print the sentence ids (the ids variable) and remove the critical sentences.

I'm training an ECAPA-TDNN embedding model with VoxCeleb1+2 plus my own data.
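
A minimal sketch of the id-logging suggestion quoted above; check_loss is a hypothetical helper, not part of SpeechBrain, and would be called from the recipe's compute_objectives with the batch ids.

import torch

def check_loss(loss, ids):
    # Log the utterance ids whenever the loss turns non-finite, so sentences
    # that systematically cause NaNs can be located and removed from the data.
    if not torch.isfinite(loss).all():
        print(f"Non-finite loss for utterance ids: {list(ids)}")
    return loss

# Example with a NaN loss for the second utterance of the batch:
check_loss(torch.tensor([0.8, float("nan")]), ids=["utt_001", "utt_002"])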

lhanzl

comment created time in 12 hours

issue comment speechbrain/speechbrain

Language id performance

Which model are you training, and on which data? Yes, try to see if reducing gradient_clipping is enough. In my experience, sometimes there are some sentences in the dataset that systematically cause NaN issues. If this is the case, try to print the sentence ids (the ids variable) and remove the critical sentences.

On Mon, 20 Sept 2021 at 07:10, asr-lord ***@***.***> wrote:

After 5 epochs of training my own model, following https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxCeleb/SpeakerRec, I get the NaN error.

Is it fixed, @mravanelli (https://github.com/mravanelli)? Or do I need to change the seed and add gradient_clipping: 5.0 in the hparams file?


lhanzl

comment created time in 12 hours

issue comment speechbrain/speechbrain

Language id performance

After 5 epochs of training my own model, following https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxCeleb/SpeakerRec, I get the NaN error.

Is it fixed, @mravanelli? Or do I need to change the seed and add gradient_clipping: 5.0 in the hparams file?
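
For context, a minimal sketch of what a gradient-clipping threshold such as 5.0 does in a plain PyTorch loop; this is illustrative only, and how the recipe wires the hparam internally may differ.

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
# Rescale gradients so their global norm never exceeds 5.0, which limits the
# damage a single outlier batch can do and often helps avoid NaN losses.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()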

lhanzl

comment created time in 12 hours

pull request comment speechbrain/speechbrain

Add language identification recipe using the Voxlingua107 dataset

I spoke too soon. It looks like there is an issue in one of the shards:

tarfile.ReadError: ("unexpected end of data @ <_io.BufferedReader name='/miniscratch/ravanelm/voxlingua107/voxlingua107_shards//train/shard-000165.tar'>", <_io.BufferedReader name='/miniscratch/ravanelm/voxlingua107/voxlingua107_shards//train/shard-000165.tar'>, {'__url__': '/miniscratch/ravanelm/voxlingua107/voxlingua107_shards//train/shard-000165.tar', '__worker__': '(1, 4)', '__rank__': 'None', '__nodeinfo__': "('cn-c017', 94144)"})

alumae

comment created time in a day

pull request comment speechbrain/speechbrain

Add language identification recipe using the Voxlingua107 dataset

Thank you @alumae. It seems to be working now. I'm training a model to see if I can replicate your numbers. According to the documentation, an epoch should take 40 minutes on an NVIDIA A100. On my side, it looks like it takes 4 hours on an RTX8000. The difference seems too large to me. What do you think?

alumae

comment created time in a day

started speechbrain/speechbrain

started time in a day

push event speechbrain/speechbrain

Aku Rouhe

commit sha 80aa2fa10df27a7aa72af3e7ff4ae0a3996ef443

Fix run_shell for non UTF-8 encoded output

view details

Mirco Ravanelli

commit sha 5a86e6a9c762bfadbca5bc41d153bb27a6576d3e

Merge pull request #996 from Gastron/issue-994-oserror-decode

Fix Issue #994 / Fix run_shell for non UTF-8 encoded output

view details

push time in a day

PR merged speechbrain/speechbrain

Fix Issue #994 / Fix run_shell for non UTF-8 encoded output (ready to review)

This should fix issue #994.
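
A minimal sketch of one common way to make a run_shell-style helper tolerate output that is not valid UTF-8; this is illustrative only and not necessarily the actual patch in this PR.

import subprocess

def run_shell(cmd):
    completed = subprocess.run(cmd, shell=True, capture_output=True)
    # errors="replace" substitutes undecodable bytes instead of raising
    # UnicodeDecodeError when the command's output is not valid UTF-8.
    return completed.stdout.decode("utf-8", errors="replace")

# A byte sequence that is not valid UTF-8 decodes cleanly with the handler:
print(b"caf\xe9".decode("utf-8", errors="replace"))  # prints "caf�"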

+29 -5

1 comment

3 changed files

Gastron

pr closed time in a day

pull request comment speechbrain/speechbrain

Fix Issue #994 / Fix run_shell for non UTF-8 encoded output

Thank you for the fix @Gastron!

Gastron

comment created time in a day

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain.github.io

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

started speechbrain/speechbrain

started time in 2 days

push event speechbrain/speechbrain

Parcollet Titouan

commit sha 8e03cf1c51735edfdc089b38ffba3d809003dd1d

wav2vect -> wav2vec

view details

push time in 2 days

started speechbrain/speechbrain

started time in 3 days