The Language Technologies Research Centre (LTRC) at the International Institute of Information Technology, Hyderabad (IIITH) is pleased to announce a shared task on dependency parsing for seven Indian languages: Hindi, Kannada, Bengali, Telugu, Malayalam, Marathi, and Urdu.
Dependency Parsing is an essential NLP task that identifies the grammatical structure of a sentence and the relationships between its constituent words. It is also beneficial for natural language processing applications such as machine translation, dialog systems, summarization, and question answering systems.
The main goal of this shared task is to develop multilingual neural dependency parsing models for Indian languages. In addition to dependency relations, the annotated data contains rich linguistic information such as part-of-speech tags, chunk tags, and morphological information (root, lexical category, gender, number, person, case, vibhakti marker). We also want to advance research on incorporating linguistic properties into neural architectures for downstream tasks like dependency parsing.
All participants are required to create a neural multilingual model that can parse a sentence in any of the languages.
Participants will be provided with a corpus of dependency-annotated data for Hindi, Kannada, Bengali, Telugu, Malayalam, Marathi, and Urdu. The dependency annotations are based on the Paninian dependency framework. Guidelines for the dependency labels will be released along with the data. The data will be released in two formats: SSF (Shakti Standard Format) and CoNLL.
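As a rough illustration of working with the CoNLL release, the sketch below reads head indices and dependency labels from tab-separated text. It assumes the common 10-column CoNLL layout, with HEAD in column 7 and DEPREL in column 8; the released files may use a different column layout, so check them before relying on this. The function name `read_conll` and the sample tokens and labels are hypothetical.

```python
def read_conll(text):
    """Return a list of sentences, each a list of (head, label) pairs.

    Assumption: 10 tab-separated columns per token line, with HEAD in
    column 7 and DEPREL in column 8, and a blank line between sentences.
    """
    sentences = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Skip comment lines, if the files contain any.
            continue
        cols = line.split("\t")
        current.append((int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical two-token sentence; tokens and labels are illustrative only.
sample = (
    "1\tram\tram\tn\tNNP\t_\t2\tk1\t_\t_\n"
    "2\tgayA\tjA\tv\tVM\t_\t0\tmain\t_\t_\n"
)
print(read_conll(sample))
```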
The dependency parsers will be evaluated using the labeled attachment score (LAS), the percentage of tokens that are assigned both the correct head and the correct dependency label. The evaluation will compare each model's predictions against a blind test set.
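For reference, LAS is conventionally computed as the fraction of tokens whose predicted head and predicted dependency label both match the gold annotation. The sketch below is a minimal illustration of that definition, not the official scorer: the function name, the input representation (one list of (head, label) pairs per sentence), and the example labels are assumptions, and the official evaluation may treat details such as punctuation differently.

```python
def labeled_attachment_score(gold, predicted):
    """Fraction of tokens with both the correct head and the correct label.

    gold, predicted: lists of sentences; each sentence is a list of
    (head_index, dependency_label) pairs, one pair per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted have different sentence counts")
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence length mismatch")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head and g_label == p_label:
                correct += 1
    return correct / total if total else 0.0


# Illustrative labels only; the third token has the right head but the wrong label.
gold = [[(2, "k1"), (0, "main"), (2, "k2")]]
pred = [[(2, "k1"), (0, "main"), (2, "k1")]]
print(f"LAS = {labeled_attachment_score(gold, pred):.3f}")  # LAS = 0.667
```

A token with the correct head but the wrong label counts as an error under LAS, which is why the example scores 2 out of 3.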
To register for the shared task, please fill out this form.
Prizes will be awarded to the top-performing participants or teams.