From b21925aca04da924aa3303ab9e7fcccda5fe7321 Mon Sep 17 00:00:00 2001 From: Rishik Reddy <36101925+RISHIKREDDYL@users.noreply.github.com> Date: Tue, 6 Feb 2024 16:50:10 +0530 Subject: [PATCH] Update README.md (#2389) --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 890d7d8a9..3aa8c2cfc 100644 --- a/README.md +++ b/README.md @@ -146,7 +146,7 @@ The results will be saved in the `output_folder` specified in the YAML file. ## 🎙️ Speech/Audio Processing | Tasks | Datasets | Technologies/Models | | ------------- |-------------| -----| -| Speech Recognition | [AISHELL-1](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1), [CommonVoice](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice), [DVoice](https://github.com/speechbrain/speechbrain/tree/develop/recipes/DVoice), [KsponSpeech](https://github.com/speechbrain/speechbrain/tree/develop/recipes/KsponSpeech), [LibriSpeech](https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech), [MEDIA](https://github.com/speechbrain/speechbrain/tree/develop/recipes/MEDIA), [RescueSpeech](https://github.com/speechbrain/speechbrain/tree/develop/recipes/RescueSpeech), [Switchboard](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Switchboard), [TIMIT](https://github.com/speechbrain/speechbrain/tree/develop/recipes/TIMIT), [Tedlium2](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Tedlium2), [Voicebank](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Voicebank) | [CTC](https://www.cs.toronto.edu/~graves/icml_2006.pdf), [Tranducers](https://arxiv.org/pdf/1211.3711.pdf?origin=publication_detail), [Tranformers](https://arxiv.org/abs/1706.03762), [Seq2Seq](http://zhaoshuaijiang.com/file/Hybrid_CTC_Attention_Architecture_for_End-to-End_Speech_Recognition.pdf), [Beamsearch techniques for CTC](https://arxiv.org/pdf/1911.01629.pdf),[seq2seq](https://arxiv.org/abs/1904.02619.pdf),[transducers](https://www.merl.com/publications/docs/TR2017-190.pdf)), [Rescoring](https://arxiv.org/pdf/1612.02695.pdf), [Conformer](https://arxiv.org/abs/2005.08100), [Branchformer](https://arxiv.org/abs/2207.02971), [Hyperconformer](https://arxiv.org/abs/2305.18281), [Kaldi2-FST](https://github.com/k2-fsa/k2) | +| Speech Recognition | [AISHELL-1](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1), [CommonVoice](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice), [DVoice](https://github.com/speechbrain/speechbrain/tree/develop/recipes/DVoice), [KsponSpeech](https://github.com/speechbrain/speechbrain/tree/develop/recipes/KsponSpeech), [LibriSpeech](https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech), [MEDIA](https://github.com/speechbrain/speechbrain/tree/develop/recipes/MEDIA), [RescueSpeech](https://github.com/speechbrain/speechbrain/tree/develop/recipes/RescueSpeech), [Switchboard](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Switchboard), [TIMIT](https://github.com/speechbrain/speechbrain/tree/develop/recipes/TIMIT), [Tedlium2](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Tedlium2), [Voicebank](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Voicebank) | [CTC](https://www.cs.toronto.edu/~graves/icml_2006.pdf), [Tranducers](https://arxiv.org/pdf/1211.3711.pdf?origin=publication_detail), [Transformers](https://arxiv.org/abs/1706.03762), [Seq2Seq](http://zhaoshuaijiang.com/file/Hybrid_CTC_Attention_Architecture_for_End-to-End_Speech_Recognition.pdf), [Beamsearch techniques for CTC](https://arxiv.org/pdf/1911.01629.pdf),[seq2seq](https://arxiv.org/abs/1904.02619.pdf),[transducers](https://www.merl.com/publications/docs/TR2017-190.pdf)), [Rescoring](https://arxiv.org/pdf/1612.02695.pdf), [Conformer](https://arxiv.org/abs/2005.08100), [Branchformer](https://arxiv.org/abs/2207.02971), [Hyperconformer](https://arxiv.org/abs/2305.18281), [Kaldi2-FST](https://github.com/k2-fsa/k2) | | Speaker Recognition | [VoxCeleb](https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxCeleb) | [ECAPA-TDNN](https://arxiv.org/abs/2005.07143), [ResNET](https://arxiv.org/pdf/1910.12592.pdf), [Xvectors](https://www.danielpovey.com/files/2018_icassp_xvectors.pdf), [PLDA](https://ieeexplore.ieee.org/document/6639151), [Score Normalization](https://www.sciencedirect.com/science/article/abs/pii/S1051200499903603) | | Speech Separation | [WSJ0Mix](https://github.com/speechbrain/speechbrain/tree/develop/recipes/WSJ0Mix), [LibriMix](https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriMix), [WHAM!](https://github.com/speechbrain/speechbrain/tree/develop/recipes/WHAMandWHAMR), [WHAMR!](https://github.com/speechbrain/speechbrain/tree/develop/recipes/WHAMandWHAMR), [Aishell1Mix](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Aishell1Mix), [BinauralWSJ0Mix](https://github.com/speechbrain/speechbrain/tree/develop/recipes/BinauralWSJ0Mix) | [SepFormer](https://arxiv.org/abs/2010.13154), [RESepFormer](https://arxiv.org/abs/2206.09507), [SkiM](https://arxiv.org/abs/2201.10800), [DualPath RNN](https://arxiv.org/abs/1910.06379), [ConvTasNET](https://arxiv.org/abs/1809.07454) | | Speech Enhancement | [DNS](https://github.com/speechbrain/speechbrain/tree/develop/recipes/DNS), [Voicebank](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Voicebank) | [SepFormer](https://arxiv.org/abs/2010.13154), [MetricGAN](https://arxiv.org/abs/1905.04874), [MetricGAN-U](https://arxiv.org/abs/2110.05866), [SEGAN](https://arxiv.org/abs/1703.09452), [spectral masking](http://staff.ustc.edu.cn/~jundu/Publications/publications/Trans2015_Xu.pdf), [time masking](http://staff.ustc.edu.cn/~jundu/Publications/publications/Trans2015_Xu.pdf) | -- GitLab