Test Data Released

TechDOfication 2020 : Technical DOmain Identification



For this task, participants are asked to develop systems that automatically identify the technical domain of a given text (a short passage) in a specified language (English, Bangla, Gujarati, Hindi, Malayalam, Marathi, Tamil, or Telugu). Such a text carries information about a specific coarse-grained technical domain (SubTask-1*), such as Computer Science, Physics, Life Science, or Law, or about a fine-grained subdomain (SubTask-2*) within Computer Science, such as Operating Systems, Computer Networks, or Databases.

Goals of the shared task:

  • To develop a language processing tool that can benefit research and downstream applications such as Machine Translation, Summarization, and Question Answering.
  • To provide the community with a new dataset and boost research on technical domains.
  • For downstream tasks (e.g., Machine Translation), technical domain identification would be the first step: it determines the domain of a given input text, and Machine Translation can then choose its resources according to the identified domain. The task can also be viewed at a coarse-grained or fine-grained level, depending on the requirement.
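
The two-step flow described in the last goal (identify the domain, then let Machine Translation pick its resources) might be sketched as follows. This is only an illustration: the resource paths, the fallback, and the keyword-based `identify_domain` stand-in are hypothetical and are not part of the shared task.

```python
# Hypothetical resource table: the paths and the fallback are illustrative only.
MT_RESOURCES = {
    "che": "models/mt-chemistry",
    "cse": "models/mt-computer-science",
    "law": "models/mt-law",
}
DEFAULT_RESOURCE = "models/mt-general"


def identify_domain(text: str) -> str:
    """Stand-in for a trained domain classifier (keyword lookup, for illustration)."""
    keywords = {"orbital": "che", "compiler": "cse", "statute": "law"}
    lowered = text.lower()
    for word, label in keywords.items():
        if word in lowered:
            return label
    return "unknown"


def select_mt_resource(text: str) -> str:
    """Step 1: identify the domain. Step 2: pick the matching MT resource."""
    domain = identify_domain(text)
    return MT_RESOURCES.get(domain, DEFAULT_RESOURCE)
```

In a real pipeline the keyword lookup would be replaced by a system built for one of the subtasks; the lookup-with-fallback shape stays the same.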



Task Details

Subtask-1a Coarse-grained Domain Classification - English

The subtask is defined as follows:

  • Given a short document consisting of English text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Chemistry (che)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Law (law)
    • Math (math)
    • Physics (physics)
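
As a point of reference for this kind of classification, a very small baseline could look like the sketch below: a multinomial naive Bayes over whitespace tokens with add-one smoothing. The choice of naive Bayes, the class name, and the toy training sentences are assumptions made for illustration; they are not the organizers' method and the sentences are not taken from the task corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDomainClassifier:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration; not taken from the task corpus.
train_texts = [
    "the stretching frequency of carbon monoxide",
    "electron density in anti bonding orbitals",
    "the court interpreted the statute",
    "the contract was void under the act",
    "the derivative of a polynomial function",
    "a matrix has eigenvalues and eigenvectors",
]
train_labels = ["che", "che", "law", "law", "math", "math"]

model = NaiveBayesDomainClassifier().fit(train_texts, train_labels)
```

Calling `model.predict(passage)` returns one of the label codes seen during training.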

Evaluation

SubTask 1a will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.
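
The four metrics can be computed from gold and predicted labels as in the sketch below. The task description does not say how per-class scores are averaged, so macro-averaging (the unweighted mean over classes) is an assumption here, and the function name `evaluate` is hypothetical.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return accuracy and macro-averaged precision, recall and F1."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted, but the gold label differs
            fn[g] += 1  # g was the gold label, but it was missed
    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,  # macro average (assumed)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Because the averaging scheme is not specified, scores from this sketch may differ from the official ranking if the organizers use micro or weighted averaging.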

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Example : Domain - Chemistry (SubTask - 1*)

We are not going to that , remove it completely , but nevertheless this is an indication that , NO plus is going to be a poorer donor , compared to carbon monoxide . So , this drastic reduction in the stretching frequency can only happen if you have , a large population of the anti - bonding orbitals of NO plus . And it has got a structure , which is very similar , a structure which is very similar to the structure of nickel tetra carbonyl . You will see that , while carbon monoxide is ionized with 15 electron volts , if you supply 15 electron volts , carbon monoxide can be oxidized or ionized . So , from cobalt the electron density has flown into the nitric oxide pi star orbitals and so you have reduction in the stretching frequency .

Source : Web

Subtask-1b Coarse-grained Domain Classification - Bangla

The subtask is defined as follows:

  • Given a short document consisting of Bangla text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Management (mgmt)
    • Physics (physics)

Evaluation

SubTask 1b will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-1c Coarse-grained Domain Classification - Gujarati

The subtask is defined as follows:

  • Given a short document consisting of Gujarati text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Management (mgmt)
    • Physics (physics)

Evaluation

SubTask 1c will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-1d Coarse-grained Domain Classification - Hindi

The subtask is defined as follows:

  • Given a short document consisting of Hindi text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Management (mgmt)
    • Physics (physics)
    • Math (math)
    • Other (other)

Evaluation

SubTask 1d will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-1e Coarse-grained Domain Classification - Malayalam

The subtask is defined as follows:

  • Given a short document consisting of Malayalam text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)

Evaluation

SubTask 1e will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-1f Coarse-grained Domain Classification - Marathi

The subtask is defined as follows:

  • Given a short document consisting of Marathi text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Physics (physics)

Evaluation

SubTask 1f will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-1g Coarse-grained Domain Classification - Tamil

The subtask is defined as follows:

  • Given a short document consisting of Tamil text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Management (mgmt)
    • Physics (physics)
    • Other (other)

Evaluation

SubTask 1g will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-1h Coarse-grained Domain Classification - Telugu

The subtask is defined as follows:

  • Given a short document consisting of Telugu text, participants have to develop an algorithm that identifies which of the following domains it belongs to:
    • Bio-Chemistry (bioche)
    • Communication Technology (com_tech)
    • Computer Science (cse)
    • Management (mgmt)
    • Physics (physics)
    • Math (math)

Evaluation

SubTask 1h will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Subtask-2a Fine-grained Domain Classification - Computer Science

The subtask is defined as follows:

  • Given a short document consisting of English text, participants have to develop an algorithm that identifies which of the following sub-domains it belongs to:
    • Artificial Intelligence (ai)
    • Algorithm (algo)
    • Computer Architecture (ca)
    • Computer Networks (cn)
    • Database Management system (dbms)
    • Programming (pro)
    • Software Engineering (se)

Evaluation

SubTask 2a will be scored using standard evaluation metrics, including accuracy, precision, recall and F1-score. The submissions will be ranked by F1-score.

Corpus

Download
Test Data Download
- The password for the data download will be shared after registration.
- By clicking download, you agree to the data license and the shared task rules.

Example : Sub-Domain - Artificial Intelligence (SubTask - 2a)

Now we come to a point when two rules are stating that it is a mammal , one with a confidence of 0.72 and another with a confidence of 0.21 . Combination of evidences is what we already saw , certainty factor of e1 and e2 is min of certainty factor of e1 and certainty factor of e2 or disjunction is max of e1 and e2 . If it be certainly false then probability of h by e is 0 then MB is 0 , MD is 1 and certainty factor is minus 1 . And if there is some other rule , for example there is a rule which is something like this , if streptococcus then XYZ and the confidence the certainty factor of that is 0.8 . If we take the rule R1 then I will infer mammal with a confidence of 0.72 .But the second rule which has got in itself 0.21 will be added to this but before adding it should be multiplied with 1 minus MB ( h , e1 ) the same old formula we did here .

Source : Web


Registration

To register for participation in the shared tasks, please fill out this form.


Important Dates

Please consult the Shared Task website for the official dates. All submission deadlines are at 11:59 PM IST (UTC+5:30).

Event                                         Original Date   Revised Date
Shared Task Announcement                      Oct 07, 2020
Registration Open                             Oct 07, 2020
Data Released                                 Oct 14, 2020
Deadline for Registration                     Oct 30, 2020    Nov 08, 2020
Test Set Release (Blind)                      Nov 02, 2020    Nov 10, 2020
System Runs Due                               Nov 10, 2020    Nov 18, 2020
Preliminary System Reports Due in SoftConf    Nov 20, 2020    Nov 28, 2020
Notification for Acceptance                   Dec 03, 2020
Camera Ready Due                              Dec 05, 2020
Participant Presentations at ICON 2020        TBD
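
For participants outside India, a deadline can be converted to UTC as in the sketch below. It assumes the deadline is read as 11:59 PM IST at the UTC+5:30 offset stated above; the helper name `deadline_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed offset of +5:30 from UTC (no daylight saving time).
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def deadline_utc(year, month, day):
    """Return 11:59 PM IST on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=IST)
    return local.astimezone(timezone.utc)
```

For example, `deadline_utc(2020, 11, 18)` gives the UTC instant for the later of the two dates listed for System Runs Due.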


Contact

For further information about this task and dataset, please contact:

  • Contact: techdofication2020@googlegroups.com




Organizing Committee

  • Dipti Misra Sharma (IIIT-Hyderabad)
  • Asif Ekbal (IIT-Patna)
  • Karunesh Arora (C-DAC, Noida)
  • Sudip Kumar Naskar (Jadavpur University)
  • Dipankar Ganguly (C-DAC, Noida)
  • Sobha L (AUKBC-Chennai)
  • Radhika Mamidi (IIIT-Hyderabad)
  • Sunita Arora (C-DAC, Noida)
  • Pruthwik Mishra (IIIT-Hyderabad)
  • Vandan Mujadia (IIIT-Hyderabad)



Follow us: https://twitter.com/techdofication2020

© 2020 LTRC, IIIT-Hyderabad
