						CRF CHUNKING 1.4 (CHUNKING)
						--------------------------
Malayalam CRF Chunker: Chunking involves identifying simple noun phrases, verb groups, adjectival
phrases, and adverbial phrases in a sentence. This means identifying the boundaries
of the chunks and labelling each chunk.

The chunker has two modes: training (training mode) and testing
(perform or run mode).

In training mode, we train the chunker on manually chunk-annotated data and
generate a trained model (e.g. 120k_mal_chunker.model).

In perform/run mode, we use the trained model above to tag the input sentence. Both training
and testing use the BIO format internally.
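
To illustrate the BIO format: a label starting with B- begins a chunk, I- continues
the current chunk, and O marks a token outside any chunk. The sketch below groups
BIO-labelled lines back into chunks. It assumes one token per line with the chunk
label in the last column; the function name bio_to_chunks, the placeholder tokens
(w1, w2, ...) and the tags in the example are illustrative and not taken from the module.

```shell
# bio_to_chunks: read "token ... BIO-label" lines on stdin and
# print one chunk per line as LABEL:token token ...
# (Hypothetical helper; column layout is an assumption.)
bio_to_chunks() {
    awk '
        # Print the chunk collected so far, if any, and reset the buffer.
        function emit() { if (n > 0) { print label ":" words; n = 0; words = "" } }
        # B-XX closes the previous chunk and opens a new one labelled XX.
        $NF ~ /^B-/ { emit(); label = substr($NF, 3); words = $1; n = 1; next }
        # I-XX extends the chunk that is currently open.
        $NF ~ /^I-/ && n > 0 { words = words " " $1; n++; next }
        # Anything else (O label, blank line) closes the open chunk.
        { emit() }
        END { emit() }
    '
}
```

For example, piping the four lines "w1 JJ B-NP", "w2 NN I-NP", "w3 VM B-VGF" and
". SYM O" through bio_to_chunks prints "NP:w1 w2" followed by "VGF:w3".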

Requirements:
------------
Operating System		:    LINUX/UNIX system

Compiler/Interpreter/Libraries	:    Perl, SSF APIs and a C++ compiler (gcc 3.0 or higher)

We assume that the CRF toolkit (CRF++) is installed on your system.
If it is not installed, download and install it from
http://crfpp.sourceforge.net/.

The CRF++ version should be 0.42 or higher.

For installation on Linux, please refer to the file INSTALL.
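
A quick way to confirm that the required tools are on the PATH is sketched below.
The function name missing_tools is illustrative; crf_learn and crf_test are the
command-line programs installed by CRF++. The sketch only checks that the commands
exist, not their versions.

```shell
# missing_tools: print the name of each given command that is not on PATH.
# (Hypothetical helper, not part of the chunker module.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running "missing_tools perl g++ crf_learn crf_test" prints nothing when everything
is installed, and otherwise lists the commands that still need to be installed.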

Directory Structure:
--------------------

chunker
     |
     |    
     |---data_bin (data bin files)
     |     |
     |     |---mal/120k_mal
     |     |
     |     |---mal/training_wx.txt
     |
     |
     |---reference_data (contains the reference input and output)
     |     |--mal (contains the reference input and output)
     |
     |---tests (contains the reference input and output)
     |     |
     |     |--mal (contains the reference input and output)
     |
     |---doc (documentation files of the chunker)
     |
     |---README (How to run/use the module)
     |
     |---INSTALL (How to install in sampark directory structure)
     |
     |---ChangeLog (version information)
     |
     |---Makefile (first-level makefile for copying the module source into the Sampark system)
     |
     |---Makefile.stage2 (2nd level make file for actual installation/copying in the bin and data_bin directory)
     |
     |---chunker_run.sh (to run the chunker module)
     |
     |---chunker_train.sh (to train the chunker module)
     |
     |---chunker.sh (for the use of dashboard spec file)
     |
     |---chunker.spec (individual chunker module run with dashboard)
     |
     |---ssf2tnt_pos.pl (convert ssf to tnt format)
     |
     |---convert_biotossf.pl (convert bio to ssf format)




How to Use:
-----------

Run the following steps in order. $1 is the input file in SSF format; $setu is
assumed to point to the Sampark installation root.

1. perl $setu/bin/sl/chunker/mal/common/ssf2tnt_pos.pl $1 > chunkinput_pos.tnt

2. crf_test -m $setu/data_bin/sl/chunker/mal/240K_mal_chunker.model chunkinput_pos.tnt > chunker_out.tnt

3. perl $setu/bin/sl/chunker/mal/common/convert_biotossf.pl < chunker_out.tnt
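
The three steps above can be chained in a single shell function. This is a minimal
sketch, assuming $setu is set and that the script paths and model name are exactly
those shown in the steps; the function name run_chunker and the use of a temporary
directory are illustrative, and the shipped chunker_run.sh may work differently.

```shell
# run_chunker: hypothetical wrapper around the three documented steps.
# Usage: run_chunker <ssf_input_file>   (result is written to stdout)
run_chunker() {
    input=${1:-}
    if [ -z "$input" ] || [ ! -r "$input" ]; then
        echo "usage: run_chunker <ssf_input_file>" >&2
        return 1
    fi
    tmpdir=$(mktemp -d) || return 1

    # Step 1: convert the SSF input to the column (TnT) format.
    perl "$setu/bin/sl/chunker/mal/common/ssf2tnt_pos.pl" "$input" \
        > "$tmpdir/chunkinput_pos.tnt" || { rm -rf "$tmpdir"; return 1; }

    # Step 2: label the tokens with the trained CRF model.
    crf_test -m "$setu/data_bin/sl/chunker/mal/240K_mal_chunker.model" \
        "$tmpdir/chunkinput_pos.tnt" \
        > "$tmpdir/chunker_out.tnt" || { rm -rf "$tmpdir"; return 1; }

    # Step 3: convert the BIO-labelled output back to SSF.
    perl "$setu/bin/sl/chunker/mal/common/convert_biotossf.pl" \
        < "$tmpdir/chunker_out.tnt"
    status=$?

    rm -rf "$tmpdir"
    return $status
}
```

Keeping the intermediate files in a temporary directory avoids overwriting
chunkinput_pos.tnt and chunker_out.tnt when several inputs are processed from the
same working directory.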



#################################
For queries or suggestions, mail
jisha.jayan@iiitmk.ac.in
#################################