# Dhruva IITM ASR

## How to get the models

### For Indian Languages
Store each model and its dictionary file in the folder `asr/1/models`. The available languages are listed at `https://asr.iitm.ac.in/models`.
To download a model (Bengali shown here), run `wget https://asr.iitm.ac.in/SPRING_INX/models/fine_tuned/SPRING_INX_data2vec_aqc_Bengali.pt`.
To download a dictionary (Urdu shown here), run `wget https://asr.iitm.ac.in/SPRING_INX/models/dictionaries/SPRING_INX_Urdu_dict.txt`.
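
The two example URLs suggest a file naming pattern per language. The following is a minimal sketch of a download helper, assuming that pattern holds for every language on the models page (only the Bengali model and Urdu dictionary URLs are confirmed above). The names `model_urls` and `download_language` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base paths taken from the two example URLs in this README.
MODEL_BASE = "https://asr.iitm.ac.in/SPRING_INX/models/fine_tuned"
DICT_BASE = "https://asr.iitm.ac.in/SPRING_INX/models/dictionaries"


def model_urls(language: str) -> tuple[str, str]:
    """Return (model_url, dictionary_url) for a language such as 'Bengali'.

    Assumption: every language follows the naming seen in the examples.
    """
    model = f"{MODEL_BASE}/SPRING_INX_data2vec_aqc_{language}.pt"
    dictionary = f"{DICT_BASE}/SPRING_INX_{language}_dict.txt"
    return model, dictionary


def download_language(language: str, dest: str = "asr/1/models") -> list[Path]:
    """Download the model and dictionary for one language into dest."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in model_urls(language):
        out = target / url.rsplit("/", 1)[-1]
        if not out.exists():  # skip files that are already present
            urlretrieve(url, out)
        saved.append(out)
    return saved
```

Calling `download_language("Bengali")` from the repository root would then place both files in `asr/1/models`. Check the language spelling against the models page before relying on the generated URLs.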


### For English Whisper
Place the model files in `asr/1/whisper_models`:
* Install `faster-whisper` (for example, `pip install faster-whisper`).
* In a Python interpreter, import `faster_whisper` and run `model = faster_whisper.WhisperModel('large-v3', device='cuda', compute_type="int8", device_index=1, download_root='/path/to/repo/asr/1/whisper_models')`, replacing `/path/to/repo` with the path to this repository.
* This downloads the model files and stores them in that custom location.
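
The interpreter steps above can also be wrapped in a small script. This is a minimal sketch, not the repository's own tooling: `whisper_download_root` and `fetch_whisper` are hypothetical helper names, and the `WhisperModel` arguments are the ones from the command above, with `device` and `device_index` exposed as parameters.

```python
from pathlib import Path


def whisper_download_root(repo_root: str) -> str:
    """Return the folder where the Whisper model files should be stored."""
    return str(Path(repo_root) / "asr" / "1" / "whisper_models")


def fetch_whisper(repo_root: str, device: str = "cuda", device_index: int = 1):
    """Instantiate WhisperModel once so the files land in download_root.

    The defaults mirror the README command; device_index=1 selects the
    second GPU, so pass 0 on a single-GPU machine.
    """
    # Imported here so the path helper works without faster_whisper installed.
    import faster_whisper

    return faster_whisper.WhisperModel(
        "large-v3",
        device=device,
        device_index=device_index,
        compute_type="int8",
        download_root=whisper_download_root(repo_root),
    )
```

For example, `fetch_whisper("/path/to/repo")` would download `large-v3` into `/path/to/repo/asr/1/whisper_models` if it is not already there.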

## To create the environment
* Install `conda-pack`.
* Create a new Python 3.10 conda environment.
* Clone `https://github.com/Speech-Lab-IITM/data2vec-aqc` (the `master` branch) and `git apply` the patch `aqc.patch`.
* Build the wheel with `python setup.py bdist_wheel`.
* Likewise, clone `https://github.com/Spijkervet/torchaudio-augmentations` and build its wheel.
* Likewise, clone `https://github.com/flashlight/sequence` and build its wheel.
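* Install the dependencies and the built wheels with `pip install git+https://github.com/kpu/kenlm.git fast_pytorch_kmeans tensorboardX flashlight-text soundfile torchaudio data2vec-aqc/dist/fairseq-0.12.2-cp310-cp310-linux_x86_64.whl sequence/dist/flashlight_sequence-0.0.0+91e2b0f.d20240210-cp310-cp310-linux_x86_64.whl torchaudio-augmentations/dist/torchaudio_augmentations-0.2.4-py3-none-any.whl faster-whisper`. The wheel filenames depend on your build (the `flashlight_sequence` name appears to embed a commit hash and build date), so adjust them to match the files in each `dist` folder.
* Finally, use `conda-pack` to save the environment.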
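
Because the wheel filenames in the `pip install` step vary between builds, it may help to collect them from the `dist` folders instead of typing them by hand. This is a minimal sketch, assuming the three repositories were cloned side by side into one working directory under the folder names used in the command above; `pip_install_args` is a hypothetical helper.

```python
from pathlib import Path

# Packages installed by name or URL, copied from the pip command above.
INDEX_PACKAGES = [
    "git+https://github.com/kpu/kenlm.git",
    "fast_pytorch_kmeans",
    "tensorboardX",
    "flashlight-text",
    "soundfile",
    "torchaudio",
    "faster-whisper",
]

# Cloned repositories whose dist/ folder holds a locally built wheel.
WHEEL_REPOS = ["data2vec-aqc", "sequence", "torchaudio-augmentations"]


def pip_install_args(workdir: str = ".") -> list[str]:
    """Build the `pip install` argument list from the wheels found on disk."""
    args = ["pip", "install", *INDEX_PACKAGES]
    for repo in WHEEL_REPOS:
        wheels = sorted((Path(workdir) / repo / "dist").glob("*.whl"))
        if not wheels:
            raise FileNotFoundError(f"no wheel found in {repo}/dist; build it first")
        args.append(str(wheels[-1]))  # last wheel in lexicographic order
    return args
```

Inside the activated environment, the result could be passed to `subprocess.run(pip_install_args(), check=True)`. If a `dist` folder holds several wheels, remove the stale ones first, since the sketch simply picks the last name in sorted order.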