• Unsupervised Technical Domain Terms Extraction 2020
Test Data Released

Unsupervised Technical Domain Terms Extraction (TermTraction - 2020)



Automatic extraction of domain-specific terms based on relevant domain concepts for a given document is a highly challenging task. Domain term extraction can be defined as the automated process of identifying terminology in domain-specific texts. Domain terms are textual spans that represent or describe the domain. Participants are asked to develop one or more unsupervised systems that automatically identify the technical terms in a given English text (a document). Such a document provides information about a specific technical domain such as Computer Science, Physics, Life Science, or Law.

Domain-dependent machine translation needs to pay special attention to the domain configuration, especially to the domain terms. Using domain terms as a feature in machine translation (MT) systems has been shown to benefit overall translation adequacy.

For this task, participants will be provided with untagged domain-specific corpora and can use this monolingual data to develop their unsupervised domain term extraction systems. Participants must use only the provided monolingual domain corpora for the task, but may use available tools (e.g., a POS tagger or a morphological analyzer) to improve their systems.

Goals of the shared task:

  • To develop a language processing tool that potentially impacts research and downstream applications such as Machine Translation, Summarization, and Question Answering.
  • To provide the community with a new dataset and boost research on technical domains.
  • For downstream tasks (e.g., Machine Translation), technical domain term identification is an important step: it determines the list of domain terms for a given input text, so that the Machine Translation system can subsequently choose its resources.



Task Details

Unsupervised Technical Domain Terms Extraction (TermTraction - 2020) - English

The Task is defined as follows:

  • Given domain-specific documents, participants have to develop an unsupervised algorithm to identify the list of technical terms of that domain. The technical domains are as follows:
    • BioChemistry
    • Communication
    • Computer_science
    • Law
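
As a rough illustration of what an unsupervised system for this task might look like, the sketch below ranks candidate n-grams from one domain corpus by their frequency, weighted by how rare they are across the other domain corpora. This is only a minimal baseline under stated assumptions, not the organizers' method: the stopword list, the function names (`candidates`, `extract_terms`), and the scoring formula are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = frozenset(
    "a an and are as at be by can for from has have if in is it of on or "
    "that the this to we which will with you".split()
)

def candidates(text, max_len=3):
    """Yield lowercase n-grams (1..max_len words) that neither start nor end with a stopword."""
    # Punctuation is dropped, so n-grams may cross sentence boundaries in this sketch.
    tokens = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)

def extract_terms(corpora, domain, top_k=5):
    """Rank candidates of one domain by frequency, weighted by rarity across all domains."""
    counts = {name: Counter(candidates(text)) for name, text in corpora.items()}
    n_domains = len(corpora)
    scores = {}
    for term, tf in counts[domain].items():
        # Number of domain corpora in which the candidate occurs at least once.
        df = sum(1 for c in counts.values() if term in c)
        scores[term] = tf * math.log((1 + n_domains) / df)
    # Sort by score (descending), then alphabetically so the output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

# Toy corpora, invented for illustration; the real input is the provided domain data.
corpora = {
    "chem": "carbon monoxide is ionized . carbon monoxide can be oxidized",
    "law": "the court can be petitioned . the court is in session",
}
print(extract_terms(corpora, "chem", top_k=3))
```

Contrasting domains against each other uses only the provided monolingual corpora, which is consistent with the task rules; a POS tagger could replace the stopword filter to restrict candidates to noun phrases.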

Evaluation

The subtask will be scored using standard evaluation metrics, including accuracy, precision, recall, and F1-score. Submissions will be ranked by F1-score.
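
For orientation, precision, recall, and F1-score over term lists could be computed roughly as in the sketch below. It assumes exact matching on lowercased, whitespace-trimmed terms; the official scorer's matching criterion is not stated here and may differ. Accuracy is left out because it is not defined from the two term sets alone. The function name `score_terms` and the example terms are hypothetical.

```python
def score_terms(predicted, gold):
    """Return (precision, recall, f1) for predicted vs. gold term lists, compared as sets."""
    # Assumption: terms match only if identical after lowercasing and trimming.
    pred = {t.strip().lower() for t in predicted}
    ref = {t.strip().lower() for t in gold}
    tp = len(pred & ref)  # terms that are both predicted and in the gold list
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up example lists, not taken from the task's gold annotations.
predicted = ["carbon monoxide", "stretching frequency", "structure", "donor"]
gold = ["Carbon Monoxide", "stretching frequency", "nickel tetra carbonyl"]
print(score_terms(predicted, gold))
```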

Corpus

Download
Test Data Download

- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Example: Domain - Chemistry (domain terms are in green)

We are not going to that , remove it completely , but nevertheless this is an indication that , NO plus is going to be a poorer donor , compared to carbon monoxide . So , this drastic reduction in the stretching frequency can only happen if you have , a large population of the anti - bonding orbitals of NO plus . And it has got a structure , which is very similar , a structure which is very similar to the structure of nickel tetra carbonyl . You will see that , while carbon monoxide is ionized with 15 electron volts , if you supply 15 electron volts ,carbon monoxide can be oxidized or ionized .

Source : Web


Registration

To register for participation in the shared tasks, please fill out this form.


Important Dates

Please consult the Shared Task website for official dates for the Shared Tasks. All submission deadlines are 11:59 PM IST (UTC+5:30).

Event                                         Date
Shared Task Announcement                      Oct 07, 2020
Registration Open                             Oct 07, 2020
Data Released                                 Oct 14, 2020
Deadline for Registration                     Nov 08, 2020 (originally Oct 30, 2020)
Test Set Release (Blind)                      Nov 10, 2020 (originally Nov 02, 2020)
System Runs Due                               Nov 18, 2020 (originally Nov 10, 2020)
Preliminary System Reports Due in SoftConf    Nov 28, 2020 (originally Nov 20, 2020)
Notification for Acceptance                   Dec 03, 2020
Camera Ready Due                              Dec 05, 2020
Participant Presentations at ICON 2020        TBD


Contact

For further information about this task and dataset, please contact:

  • Contact: termtraction2020@googlegroups.com




Organizing Committee

  • Dipti Misra Sharma (IIIT-Hyderabad)
  • Asif Ekbal (IIT-Patna)
  • Karunesh Arora (C-DAC, Noida)
  • Sudip Kumar Naskar (Jadavpur University)
  • Dipankar Ganguly (C-DAC, Noida)
  • Sobha L (AUKBC-Chennai)
  • Radhika Mamidi (IIIT-Hyderabad)
  • Sunita Arora (C-DAC, Noida)
  • Pruthwik Mishra (IIIT-Hyderabad)
  • Vandan Mujadia (IIIT-Hyderabad)



Follow us: https://twitter.com/termtraction2020

© 2020 LTRC, IIIT-Hyderabad
