			Simple Parser for Hindi Version 1.1.1
			-------------------------------------

Simple Parser:
--------------

Parsing is the process of assigning grammatical labels to each chunk/constituent in a sentence. Identifying these labels (karaka and non-karaka relations) helps many applications, such as word sense disambiguation (WSD) and named entity recognition (NER). Parsing approaches include rule-based, statistical, and transformation-based methods; this module uses a rule-based approach built on the Paninian Dependency Framework.

Requirements:
------------
Operating System		:    Linux/UNIX

Compiler/Interpreter/Libraries	:    Perl and the SSF API



For installation on Linux, please refer to the file INSTALL.


Directory Structure:
--------------------

simple_parser_hin-1.1.1
     |
     |---tests (contains the developer input and output)
     |
     |---val_data (contains the reference input and output)
     |
     |---doc (documentation files of simple_parser_hin)
     |
     |---README (how to run/use the module)
     |
     |---INSTALL (how to install into the Sampark directory structure)
     |
     |---ChangeLog (version information)
     |
     |---Makefile (first-level makefile that copies the module source into the Sampark system)
     |
     |---Makefile.stage2 (second-level makefile that performs the actual installation into the bin and data_bin directories)
     |
     |---simple_parser_hin_run.sh (runs the simple parser module)
     |
     |---simple_parser_hin.sh (used by the dashboard spec file)
     |
     |---simple_parser.spec (spec file for running the simple parser module with the dashboard)
     |


How to Use
----------

0. The simple parser for Hindi works on SSF sentences in wx format. To run it on UTF-8 (utf) input, first convert the input as follows:
	(a) Download the font convertor module.
	(b) Install the convertor module by following its README and INSTALL files.
	(c) For an SSF sentence in utf format, run the following command (specify the language with the '--slang' option):
	perl $setu/bin/sys/common/convertor/convertor.pl --path=$setu/bin/sys/common/convertor --stype=ssf --slang=hin -s utf -t wx < $setu/bin/sl/simple_parser/hin/tests/simple_parser1.in

	For a plain text file, run the following command:
	perl $setu/bin/sys/common/convertor/convertor.pl --path=$setu/bin/sys/common/convertor --stype=text --slang=hin -s utf -t wx < $setu/bin/sl/simple_parser/hin/tests/simple_parser1.in
	(d) The output is in wx format. Use it as input to the simple parser for Hindi.
	(e) To convert the parser's wx output back to utf, run the command from (c) with the source and target formats swapped (-s wx -t utf).


1. To run the simple parser with a rule file:
	perl $setu/bin/sl/simple_parser/common/simple_parser.pl --path=$setu/bin/sl/simple_parser --rulefile=$setu/data_bin/sl/simple_parser/hin/rules_new_tagset.txt --logFile=$setu/bin/sl/simple_parser/simple_parser.log --input=$setu/bin/sl/simple_parser/hin/tests/simple_parser1.in --sLang=hin

   * Sample input and output files are provided in the tests directory.
   * For the source language (--sLang), give the first three letters of the language name in lowercase. For example, for Hindi use --sLang=hin.

2. To run the simple parser without a rule file:
	perl $setu/bin/sl/simple_parser/common/simple_parser.pl --path=$setu/bin/sl/simple_parser --logFile=$setu/bin/sl/simple_parser/simple_parser.log --input=$setu/bin/sl/simple_parser/hin/tests/simple_parser1.in --sLang=hin

3. To run using the shell script:
	sh $setu/bin/sl/simple_parser/hin/simple_parser_hin_run.sh $setu/bin/sl/simple_parser/hin/tests/simple_parser1.in

4. To display help:
	perl $setu/bin/sl/simple_parser/common/simple_parser.pl --help
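The usage steps above can be wrapped in a small shell helper. The sketch below is hypothetical: run_simple_parser, compare_with_reference, and the DRY_RUN variable are not part of the module, and the script only reuses the command line from usages 1 and 2. It assumes that $setu points to the Sampark root, that simple_parser_hin_run.sh writes its result to standard output, and that each tests/NAME.in has its reference output in tests/NAME.out.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the module) around the commands above.
# Assumption: $setu points to the Sampark installation root.

# run_simple_parser INPUT [RULEFILE]
# Runs simple_parser.pl on INPUT, an SSF file in wx format. --rulefile is added
# only when RULEFILE is given (usage 1 vs. usage 2). If DRY_RUN is non-empty,
# the command line is printed instead of executed.
run_simple_parser() {
    sp_input=$1
    sp_rules=${2:-}
    sp_base="$setu/bin/sl/simple_parser"

    # Build the argument list in the same order as the README commands.
    set -- perl "$sp_base/common/simple_parser.pl" "--path=$sp_base"
    if [ -n "$sp_rules" ]; then
        set -- "$@" "--rulefile=$sp_rules"
    fi
    set -- "$@" "--logFile=$sp_base/simple_parser.log" \
        "--input=$sp_input" "--sLang=hin"

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# compare_with_reference DIR COMMAND [ARGS...]
# Runs "COMMAND ARGS... DIR/NAME.in" for every NAME.in in DIR and compares its
# standard output byte for byte with DIR/NAME.out (assumed naming).
# Prints PASS/FAIL per file; returns 1 if any output differs.
compare_with_reference() {
    cr_dir=$1
    shift
    cr_status=0
    for cr_in in "$cr_dir"/*.in; do
        [ -e "$cr_in" ] || continue   # no .in files: the glob stays unexpanded
        cr_ref="${cr_in%.in}.out"
        if [ ! -e "$cr_ref" ]; then
            echo "SKIP $(basename "$cr_in") (no reference output)" >&2
            continue
        fi
        if "$@" "$cr_in" | cmp -s - "$cr_ref"; then
            echo "PASS $(basename "$cr_in")"
        else
            echo "FAIL $(basename "$cr_in")"
            cr_status=1
        fi
    done
    return "$cr_status"
}
```

For example, to check the Hindi module against its test files with the shell script from usage 3:

	compare_with_reference "$setu/bin/sl/simple_parser/hin/tests" sh "$setu/bin/sl/simple_parser/hin/simple_parser_hin_run.sh"

Running DRY_RUN=1 run_simple_parser FILE [RULEFILE] prints the exact parser command without executing it, which is a quick way to confirm the paths before a real run.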

#################################
Authors:
	Mridul Gupta, Vineet Yadav
	LTRC
	IIIT-Hyderabad.

Send any queries or suggestions to:

mridulgupta@students.iiit.ac.in
vineetyadav@students.iiit.ac.in

#################################