Disfluency identification is a fundamental natural language processing (NLP) task that improves the accuracy and fluency of spoken language processing applications such as automatic speech recognition (ASR), machine translation, dialog systems, and language understanding. Disfluencies are interruptions, hesitations, or corrections in spoken language that can degrade the performance and usability of such applications.

This shared task focuses on disfluency identification in six Indian languages: Hindi, Kannada, Bengali, Telugu, Tamil, and Marathi. Its primary objective is to advance the development of disfluency identification models and systems for Indian languages. Participants are encouraged to build text-based disfluency detection models that automatically identify disfluencies in transcribed speech for the specified languages.

Participants will be provided with a corpus of transcribed speech and text data in Hindi, Kannada, Bengali, Telugu, Tamil, and Marathi (10 hours of transcribed speech per language). The dataset includes a mix of scripted, synthetic, and spontaneous speech, representing domains such as news, interviews, conversations, technical and educational content, and more. The dataset has been annotated with disfluency labels, including hesitations, repetitions, corrections, and other common disfluency types.
The subtask is defined as follows:
The subtask will be scored using standard evaluation metrics, including accuracy, precision, recall, F1-score, and BLEU score. Submissions will be ranked by F1-score.
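As an illustration of the ranking metric, the sketch below computes precision, recall, and F1-score for a single sentence. It assumes a token-level labeling in which each token is tagged either `D` (disfluent) or `O` (fluent); this tag set, the function name `precision_recall_f1`, and the toy labels are hypothetical and are not taken from the official data format or scoring script.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], pred: Sequence[str], positive: str = "D"
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for the `positive` label.

    Assumption: gold and predicted labels are aligned one-to-one per token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy example (hypothetical labels): one correct hit, one miss, one false alarm.
gold = ["O", "D", "D", "O", "O", "O"]
pred = ["O", "D", "O", "O", "D", "O"]
print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

The official evaluation may aggregate differently (for example, over spans or over the whole test set rather than per sentence), so this should be read only as a sanity check for local development.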
Training Data: Part 1 and Part 2 to be released. The password for the data download will be shared after registration. By clicking download, you agree to the data license and the shared task rules.
To register for participation in the shared task, please fill out this form.
Please consult the Shared Task website for the official dates. All submission deadlines are 11:59 PM IST (UTC+5:30).
Event | Date |
---|---|
Shared Task Announcement | November 14, 2023 |
Registration Open | November 14, 2023 |
Training Data Released | November 20, 2023 |
Deadline for Registration | |
Training Data - 2 Released | December 01, 2023 |
Test Set Release (Blind) | |
System Runs Due | |
Preliminary System Reports Due in SoftConf | |
Notification for Report Acceptance | |
Camera Ready Due | |
Participant Presentations at ICON 2023 | December 17, 2023 |
Prizes will be awarded to the top-performing participants or teams.
1st Prize: 20K INR
2nd Prize: 15K INR
3rd Prize: 10K INR

For further information about this task and dataset, please contact: