Commit 2dd0e277 authored by priyank's avatar priyank

sampark-kan-hin


25-march-15 Avinash Kumar Singh<kumaravinashsingh@gmail.com>, tara
* Version 1.0.6
(A) Analyze Modules (sl)
1. tokenizer-indic-1.9
2. morph-kan-2.5
3. postagger-crf-kan-1.5
4. chunker-crf-kan-1.3
5. pruning-1.9
6. guess-morph 1.0
7. pickonemorph-1.1
8. repair-kan-2.4.2
9. headcomputation-1.8
10. vibhakticomputation-2.3.4
11. simple-parser-kan-1.1.1
(B) Transfer Modules (sl_tl)
1. transfergrammer-2.2
2. mwe-2.4.2
3. lexicaltransfer-3.2.4
4. translitration-3.4
(C) Generation Modules(tl)
1. infinity2root 1.1
2. agreementfeature 1.4
3. interchunk-hin-1.4
4. vibhaktispliter-2.3
5. agrdistribution-hin-1.4
6. defaultsfeature-hin-1.1
7. wordgen-hin-2.1
8. post-processor 1.2
18-September-13 Avinash Kumar Singh<kumaravinashsingh@gmail.com>
* Version 1.0.5
updated modules are:
1. tokenizer-indic-1.9
2. morph 2.4
3. guess-morph 1.0
4. failsafe 1.0
5. repair 1.0
6. transferEngine 2.2
7. infinity2root 1.1
8. agreementfeature 1.4
9. wordgenerator 1.92
10. post-processor 1.0
18-August-12 Avinash Kumar Singh<kumaravinashsingh@gmail.com>
* Version 1.0.4
the pipeline has the following modules along with versions
(A) Analyze Modules (sl)
1. Tokenizer-0.9922
2. morph-kan-2.5
3. postagger-crf-kan-1.4
4. chunker-crf-kan-1.3
5. pruning-1.7
6. pickonemorph-1.1
7. headcomputation-1.6
8. vibhakticomputation-2.1
9. simple-parser-kan-1.1.1
(B) Transfer Modules (sl_tl)
1. transfergrammer-2.2
2. lexicaltransfer-3.2.4
3. translitration-3.4
(C) Generation Modules(tl)
1. vibhaktispliter-2.3
2. interchunk-hin-1.5
3. intrachunk-hin-1.1
4. agrdistribution-hin-1.2
5. defaultsfeature-hin-1.1
6. wordgen-hin-1.9
-------------------------------------------------------------------------------
Installing and usage steps of ILMT System
-------------------------------------------------------------------------------
A) How to set up the pipeline on Linux (Fedora Core 8)
1) untar the translation system
tar -xvzf sampark-kan-hin-1.0.6.tgz
mv sampark-kan-hin-1.0.6 sampark
2) export the environment variable setu
First of all, ensure that you have set the $setu environment variable to the sampark path.
If you have not set the setu environment variable, set it by adding the following
line to your .bashrc:
export setu=/home/kan2hin/sampark
(assuming the user is kan2hin and you untarred the translation system in the HOME directory).
You will have to run 'source .bashrc' to make this change effective.
Test that the path is set with 'env | grep setu'.
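The setup steps above can be condensed into a shell session. The kan2hin user and the HOME-directory install are the running example from the text, so adjust the paths to your installation; mkdir stands in for the tar/mv steps so the sketch is self-contained:

```shell
# Unpack the system and point $setu at it (paths follow the example above;
# mkdir stands in for: tar -xvzf sampark-kan-hin-1.0.6.tgz && mv sampark-kan-hin-1.0.6 sampark).
mkdir -p "$HOME/sampark"
export setu="$HOME/sampark"   # normally added to ~/.bashrc and activated with: source .bashrc
env | grep '^setu='           # verify the variable is set
```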
B) How to use?
1. change paths in two places in the spec file:
$installation_path/sampark/bin/sys/slang_tlang/slang_tlang_setu.spec
- first change in the spec file:
in the %GLOBAL% section, change the <ENV>$setu path accordingly.
- second change in the spec file:
in the <OUTPUT_DIR> line, change the output directory path accordingly.
Here $installation_path is the path where the translation system is installed;
slang stands for source language, tlang stands for target language.
e.g. if the user is kan2hin and installation_path is /home/kan2hin
spec file location : /home/kan2hin/sampark/bin/sys/kan_hin/kan_hin_setu.spec
1-first change : <ENV>$setu=/home/kan2hin/sampark
2-second change : <OUTPUT_DIR>/home/kan2hin/OUTPUT.tmp
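The two spec edits can also be scripted with sed. The file below is a stand-in containing just the two affected lines, and the replacement values are the example ones from the text:

```shell
# Create a two-line stand-in for the spec file (real file: kan_hin_setu.spec).
spec=kan_hin_setu.spec.example
printf '%s\n' '<ENV>$setu=/home/setu/sampark' '<OUTPUT_DIR>OUTPUT.tmp' > "$spec"
# first change: the <ENV>$setu line; second change: the <OUTPUT_DIR> line
sed -i -e 's|^<ENV>\$setu=.*|<ENV>$setu=/home/kan2hin/sampark|' \
       -e 's|^<OUTPUT_DIR>.*|<OUTPUT_DIR>/home/kan2hin/OUTPUT.tmp|' "$spec"
cat "$spec"
```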
2. run Dashboard command from the terminal
Dashboard.sh
if Dashboard is running for the first time, it asks for settings
3. In Dashboard GUI
3.1 for Dashboard setting
set Dashboard path (/usr/share/Dashboard)
click Apply, then close and restart Dashboard to make the settings effective.
Note : this step is required only on the first Dashboard run.
3.2 for ILMT Setting
In Setting Menu click ILMT setting
Setting->ILMT Setting
3.2.1 browse your system path
e.g.
/home/kan2hin/sampark
3.2.2 set the output directory path
e.g.
/home/kan2hin/output
3.2.3 choose source language and target language
e.g.
source language : kan
target language : hin
3.3 for translation
copy/paste the input text into the input pane (left pane) and run the translation:
Run -> Translation (or alternatively click on green button in tool bar)
(Note : for more on Dashboard usage, please read DashboardManual.pdf in the doc directory)
SEG Group, ILMT-Project
IIIT-Hyderabad-500032
Any queries or suggestions mail to :
1. Mr. Pawan Kumar <hawahawai@rediffmail.com>
2. Mr. Rashid Ahmad <rashid101b@gmail.com>
3. Mr. Rambabu <rams123kl@gmail.com>
4. Mr. Avinash Kumar Singh <kumaravinashisingh@gmail.com>
05-08-2013 Kunal Sachdeva <kunal.sachdeva@students.iiit.ac.in>,<kunal.sachdeva90@gmail.com>
*Version 1.5
1. The end-of-sentence marker is no longer kept as a separate chunk (version 1.4 was separating it into a different chunk).
2. NULL chunks have been removed.
24-06-2013 Kunal Sachdeva <kunal.sachdeva@students.iiit.ac.in>
*Version 1.4
1. New version of the chunker with improvement in accuracy.
25-11-2008 Avinesh PVS, Rashid Ahmad <avinesh@students.iiit.ac.in>,<rashid101b@gmail.com>
* Version 1.3
The problem regarding the feature structure has been solved.
sample tests in tests/probcase1.rin
20-10-2008 Avinesh PVS, Rashid Ahmad <avinesh@students.iiit.ac.in>,<rashid101b@gmail.com>
* Version 1.2
1. Renamed some files and directories.
2. Added some test data in the tests/hin directory and the SRS in the doc directory.
3. Modified README, Makefile and Makefile.stage2
10-10-2008 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.1
1. Modifications as per the sampark directory structure.
10-10-2008 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.0
1. Basic implementation of the CRF-based chunker, with Makefiles, installation
files etc.
Basic Installation
==================
1. Create a new directory with some name (e.g. mkdir new_dir)
2. Update ~/.bash_profile (.bashrc file for ubuntu users) with a new environment variable called 'setu'
(export setu="PATH OF NEW_DIR")
3. source ~/.bash_profile
4. `cd' to the directory containing the package's source code.
5. Type `make' to copy the package source to $setu/src/sl/crf_chunker-1.5
6. `cd' to the system wide directory $setu/src/sl/crf_chunker-1.5
7. Type `make install' to install the programs, data files and
documentation in the sampark system wide directory.
(Refer Chapter-6 in the ilmtguidlines_0.3)
8. You can remove the program binaries and object files from the
source code directory by typing `make clean'.
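The eight steps above can be sketched as a shell session; new_dir is the example name from step 1, and the make steps are shown as comments since they need the actual package source:

```shell
mkdir -p "$HOME/new_dir"           # step 1
export setu="$HOME/new_dir"        # step 2: normally via an export line in ~/.bash_profile
                                   # step 3: source ~/.bash_profile
# cd <package-source>; make        # steps 4-5: copies the source to $setu/src/sl/crf_chunker-1.5
# cd $setu/src/sl/crf_chunker-1.5  # step 6
# make install                     # step 7: installs programs, data files and documentation
# make clean                       # step 8: optional cleanup of binaries and object files
echo "setu=$setu"
```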
For how to use chunker, please refer to the file README.
CRF CHUNKING 1.5 (CHUNKING)
--------------------------
Hindi CRF Chunker : Chunking involves identifying simple noun phrases, verb groups, adjectival
phrases, and adverbial phrases in a sentence. This involves identifying the boundaries
of chunks and marking their labels.
The chunker has two parts: one is training (training mode) and the other is testing
(perform or run mode).
In training mode, we train the chunker using manually chunk-annotated data and
generate a trained model (e.g. 300k_hin_chunker.model).
In perform/run mode, we use the trained model to tag the input sentence. Both training
and testing internally use the BIO format.
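The BIO encoding mentioned above tags each token as Beginning a chunk, Inside it, or Outside any chunk. A small invented example (English tokens for readability; the chunk labels are illustrative):

```shell
# word  POS  chunk-tag: B-X opens a chunk of type X, I-X continues it, O is outside.
bio='Ram NNP B-NP
is VBZ B-VG
playing VBG I-VG
. SYM O'
# number of chunks = number of B- tags
chunks=$(printf '%s\n' "$bio" | awk '$3 ~ /^B-/ {n++} END {print n}')
echo "$chunks chunks"
```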
Requirements:
------------
Operating System : LINUX/UNIX system
Compiler/Interpreter/Library(s): Perl, SSF APIs and a C++ compiler (gcc 3.0 or higher)
We assume the CRF toolkit is installed on your system.
If the CRF toolkit is not installed, download and install it
from the http://crfpp.sourceforge.net/ site.
The CRF version should be 0.42 or higher.
For installation on Linux, please refer to the file INSTALL.
Directory Structure:
--------------------
chunker
|
|
|---data_bin (data bin files)
| |
| |---hin/300k_hin_chunker.model
|
|---data_src
| |
| |---hin/training_wx.txt
|
|
|
|---tests (contains the reference input and output)
| |
| |--hin (contains the reference input and output for Hindi as source language)
|
|---doc (documentation files of the chunker)
|
|---README (How to run/use the module)
|
|---INSTALL (How to install in sampark directory structure)
|
|---ChangeLog (version information)
|
|---Makefile (first level make file for copying the module source in sampark system)
|
|---Makefile.stage2 (2nd level make file for actual installation/copying in the bin and data_bin directory)
|
|---chunker_run.sh (to run the chunker module)
|
|---chunker_train.sh (to train the chunker module)
|
|---chunker.sh (for the use of dashboard spec file)
|
|---chunker.spec (individual chunker module run with dashboard)
|
|---ssf2tnt_pos.pl (convert ssf to tnt format)
|
|---convert_biotossf.pl (convert bio to ssf format)
How to Use??
------------
1. perl $setu/bin/sl/chunker/common/ssf2tnt_pos.pl $1 > chunkinput_pos.tnt
2. crf_test -m $setu/data_bin/sl/chunker/hin/300k_hin_chunker.model chunkinput_pos.tnt > chunker_out.tnt
3. perl $setu/bin/sl/chunker/common/convert_biotossf.pl < chunker_out.tnt
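Spelled out as a dry run (the commands are only printed here, since executing them needs crf_test and the installed model; $1 from the listing is replaced by a placeholder input.ssf):

```shell
setu=${setu:-/home/kan2hin/sampark}   # example path if $setu is unset
step1="perl $setu/bin/sl/chunker/common/ssf2tnt_pos.pl input.ssf > chunkinput_pos.tnt"
step2="crf_test -m $setu/data_bin/sl/chunker/hin/300k_hin_chunker.model chunkinput_pos.tnt > chunker_out.tnt"
step3="perl $setu/bin/sl/chunker/common/convert_biotossf.pl < chunker_out.tnt"
printf '%s\n' "$step1" "$step2" "$step3"
```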
#################################
Author: Avinesh PVS
LTRC
IIIT Hyderabad
Any queries or suggestions mail to
avinesh@students.iiit.ac.in
(or)
avinesh.pvs@gmail.com
#################################
perl $setu/bin/sl/chunker/common/ssf2tnt_pos.pl $1 > chunkinput_pos.tnt
crf_test -m $setu/data_bin/sl/chunker/kan/chunker_kan_264K.model chunkinput_pos.tnt > chunker_out.tnt
perl $setu/bin/sl/chunker/common/convert_biotossf.pl < chunker_out.tnt
%SPEC_FILE%
#
# Generated Dashboard Specification file
#
# This file gives the specifications to run the system. It contains:
#DO NOT change the naming convention of any of the sections.
%GLOBAL%
#
# Global variables
#
# Root directory of the system
<ENV>$mt_iiit=/usr/share/Dashboard
<ENV>$setu=/home/setu/sampark
<ENV>$src=$setu/src
<ENV>$bin=$setu/bin
<ENV>$data_bin=$setu/data_bin
<ENV>$data_src=$setu/data_src
<ENV>$val_data=$setu/val_data
# Other variables used in the generation of executable
# type=int, char, char*
<VAR>$slang=tel
<VAR>$tlang=hin
<VAR>$stlang=tel_hin
# API for PERL,C language
<API lang=perl>$mt_iiit/lib/shakti_tree_api.pl
<API lang=perl>$mt_iiit/lib/feature_filter.pl
<API lang=C>$mt_iiit/c_api_v1/c_api_v1.h
# READER,PRINTER function for language PERL
<READER lang=perl>read
<PRINTER lang=perl>print_tree_file
# READER,PRINTER function for language C
<INCLUDE lang=C>stdio.h
<INCLUDE lang=C>stdlib.h
<READER lang=C>read_ssf_from_file
<PRINTER lang=C>print_tree_to_file
# Output directory for storing temporaries (relative path to current dir)
#<OUTPUT_DIR>OUTPUT.tmp
#<OUTPUT_DIR>$val_data/system/$stlang
<OUTPUT_DIR>/home/setu/module_inout/chunker.tmp
# Run in SPEED or DEBUG or USER mode
<MODE>DEBUG
#<MODE>SPEED
%SYSTEM%
# Each module should have a unique identifying name (i.e. unique value for the second column)
# -----------------------------------------------
# Source Language Analyzer Modules (SL)
# -----------------------------------------------
# chunker
1 chunker $setu/bin/sl/chunker/chunker.sh dep=<START> intype=1 lang=sh
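For reference, a hedged sketch of how a second analyzer module would chain after the chunker in %SYSTEM% via the dep= column (the pruning line below is hypothetical; only the chunker line appears in this spec):

```text
1 chunker $setu/bin/sl/chunker/chunker.sh dep=<START> intype=1 lang=sh
2 pruning $setu/bin/sl/pruning/pruning.sh dep=chunker intype=1 lang=sh
```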
perl $setu/bin/sl/chunker/common/ssf2tnt_pos.pl $1 > chunkinput_pos.tnt
crf_test -m $setu/data_bin/sl/chunker/kan/chunker_kan_264K.model chunkinput_pos.tnt > chunker_out.tnt
perl $setu/bin/sl/chunker/common/convert_biotossf.pl < chunker_out.tnt
perl /var/www/html/sampark/system/kan_hin/sampark/bin/sl/chunker/common/ssf2tnt_pos.pl $1 > chunkinput_pos.tnt
crf_test -m /var/www/html/sampark/system/kan_hin/sampark/data_bin/sl/chunker/kan/chunker_kan_264K.model chunkinput_pos.tnt > chunker_out.tnt
perl /var/www/html/sampark/system/kan_hin/sampark/bin/sl/chunker/common/convert_biotossf.pl < chunker_out.tnt
#! /usr/bin/perl
# Report Bugs to prashanth@research.iiit.ac.in
#
# Usage : perl convert-BIOtoSSF.pl < bio.txt > ssf.txt
#
#
my $line = "";
my $startFlag = 1;
my $wno = 1;
my $prevCTag = "";
my $error = "";
my $lno = 0;
my $sno = 1;
my $cno=0;
#scan each line from standard input
while($line = <STDIN>)
{
$lno ++;
if($line =~ /^\s*$/)
{ # start of a sentence
print "\t))\t\t\n";
print "</Sentence>\n\n";
$startFlag = 1;
$wno = 1;
$prevCTag = "";
$sno ++;
next;
}
if($startFlag == 1)
{
print "<Sentence id=\"$sno\">\n";
}
chomp($line);
my @cols = split(/\s+/,$line);
if($cols[3] =~ /^B-(\w+)/)
{
my $ctag = $1;
if($prevCTag ne "O" && $startFlag == 0)
{
print "\t))\t\t\n";
$wno++;
}
$cno++;
print "$cno\t((\t$ctag\t\n";
$wno=1;
$prevCTag = $ctag;
}
elsif($cols[3] =~ /^O/)
{
if($prevCTag ne "O" && $startFlag == 0)
{
print "\t))\t\t\n";
$wno++;
}
$prevCTag = "O";
}
if($cols[3] =~ /I-(\w+)/ )
{ # check for inconsistencies; does not form a chunk if there are inconsistencies
my $ctag = $1;
if($ctag ne $prevCTag)
{
$error =$error . "Inconsistency of Chunk tag in I-$ctag at Line no:$lno : There is no B-$ctag to the prev. word\n";
}
}
$cols[2]=~s/___/ /g;
print "$cno.$wno\t$cols[0]\t$cols[1]\t$cols[2]\n";
$wno ++;
$startFlag = 0;
}
#!/usr/bin/perl
sub ssf2tnt_pos{
my $line;
while ($line = <>)
{
chomp($line);
if($line=~/<\/S/)
{
print "\n";
next;
}
if($line =~ /^\s*$/) # if the line has all space characters
{
print "\n";
next;
}
$line=~s/[ ]+/___/g;
my ($att1,$att2,$att3,$att4) = split (/[\t]+/, $line);
if($att1 =~ /^</ || $att2 eq "((" || $att2 eq "))") #unwanted lines (markup and chunk brackets)
{
next;
}
else
{
print $att2,"\t",$att3,"\t",$att4,"\n";
}
}
}
&ssf2tnt_pos;
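What ssf2tnt_pos does, in miniature: drop the markup and chunk-bracket lines, keep the word/POS/fs columns, and emit a blank line at each sentence end. A simplified awk equivalent on a tiny invented sentence (the space-to-___ replacement of the full script is omitted):

```shell
tnt=$({ printf '<Sentence id="1">\n'
        printf '1\t((\tNP\t\n'
        printf '1.1\tRam\tNNP\t<fs af=x>\n'
        printf '\t))\t\t\n'
        printf '</Sentence>\n'
      } | awk -F'\t' '
        /<\/S/                   { print ""; next }   # sentence end -> blank separator
        $1 ~ /^</                { next }             # other markup lines
        $2 == "((" || $2 == "))" { next }             # chunk brackets
        { print $2 "\t" $3 "\t" $4 }')
printf '%s\n' "$tnt"
```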
<Sentence id="1">
1 wAjA JJ <fs af='wAjA,Avy,,,,,,'>|<fs af='wAjA,n,n,sg,,d,0,0'>|<fs af='wAjA,n,n,sg,,d,0,0'>
2 usirugalYu NN <fs af='usirugalYu,n,n,sg,,d,0,0'>|<fs af='usirugalYu,n,n,sg,,d,0,0'>|<fs af='usiru,n,n,pl,,d,0,0'>|<fs af='usiru,n,n,pl,,d,0,0'>
3 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
4 mirimiriminuguva VM <fs af='mirimiriminugu,v,any,any,any,,va,va'>
5 hallu NN <fs af='hallu,n,n,sg,,d,0,0'>|<fs af='hallu,n,n,sg,,d,0,0'>
6 nimma PRP <fs af='nInu,pn,,pl,1,o,a,a'>|<fs af='nInu,pn,,pl,1,o,a,a'>
7 vyakwiwvavannu NN <fs af='vyakwiwva,n,n,sg,,o,annu,annu'>|<fs af='vyakwiwva,n,n,sg,,o,annu,annu'>
8 beVlYaguwwaveV VM <fs af='beVlYagu,v,n,pl,3,,uww,uww'>|<fs af='beVlYagu,v,n,pl,3,,uww,uww'>
9 . SYM <fs af='.,punc,,,,,,'>
</Sentence>
<Sentence id="1">
1 (( NP
1.1 wAjA JJ <fs af='wAjA,Avy,,,,,,'>|<fs af='wAjA,n,n,sg,,d,0,0'>|<fs af='wAjA,n,n,sg,,d,0,0'>
1.2 usirugalYu NN <fs af='usirugalYu,n,n,sg,,d,0,0'>|<fs af='usirugalYu,n,n,sg,,d,0,0'>|<fs af='usiru,n,n,pl,,d,0,0'>|<fs af='usiru,n,n,pl,,d,0,0'>
))
2 (( CCP
2.1 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
))
3 (( VGF
3.1 mirimiriminuguva VM <fs af='mirimiriminugu,v,any,any,any,,va,va'>
))
4 (( NP
4.1 hallu NN <fs af='hallu,n,n,sg,,d,0,0'>|<fs af='hallu,n,n,sg,,d,0,0'>
))
5 (( NP
5.1 nimma PRP <fs af='nInu,pn,,pl,1,o,a,a'>|<fs af='nInu,pn,,pl,1,o,a,a'>
5.2 vyakwiwvavannu NN <fs af='vyakwiwva,n,n,sg,,o,annu,annu'>|<fs af='vyakwiwva,n,n,sg,,o,annu,annu'>
))
6 (( VGF
6.1 beVlYaguwwaveV VM <fs af='beVlYagu,v,n,pl,3,,uww,uww'>|<fs af='beVlYagu,v,n,pl,3,,uww,uww'>
6.2 . SYM <fs af='.,punc,,,,,,'>
))
</Sentence>
<Sentence id="1">
1 hallugalYiMxa NN <fs af='hallu,n,n,pl,,o,iMxa,iMxa'>|<fs af='hallu,n,n,pl,,o,iMxa,iMxa'>
2 nimma PRP <fs af='nInu,pn,,pl,1,o,a,a'>|<fs af='nInu,pn,,pl,1,o,a,a'>
3 AwmaviSvAsa NN <fs af='AwmaviSvAsa,n,n,sg,,d,0,0'>|<fs af='AwmaviSvAsa,n,n,sg,,d,0,0'>
4 saha RP <fs af='saha,Avy,,,,,,'>
5 heVccAguwwaxeV VM <fs af='heVccAgu,v,n,sg,3,,uww,uww'>
6 . SYM <fs af='.,punc,,,,,,'>
</Sentence>
<Sentence id="1">
1 (( NP
1.1 hallugalYiMxa NN <fs af='hallu,n,n,pl,,o,iMxa,iMxa'>|<fs af='hallu,n,n,pl,,o,iMxa,iMxa'>
))
2 (( NP
2.1 nimma PRP <fs af='nInu,pn,,pl,1,o,a,a'>|<fs af='nInu,pn,,pl,1,o,a,a'>
2.2 AwmaviSvAsa NN <fs af='AwmaviSvAsa,n,n,sg,,d,0,0'>|<fs af='AwmaviSvAsa,n,n,sg,,d,0,0'>
2.3 saha RP <fs af='saha,Avy,,,,,,'>
))
3 (( VGF
3.1 heVccAguwwaxeV VM <fs af='heVccAgu,v,n,sg,3,,uww,uww'>
3.2 . SYM <fs af='.,punc,,,,,,'>
))
</Sentence>
<Sentence id="1">
1 namma PRP <fs af='nAnu,pn,,pl,1,o,a,a'>|<fs af='nAnu,pn,,pl,1,o,a,a'>
2 vasadugalYu NN <fs af='vasadu,n,n,pl,,d,0,0'>|<fs af='vasadu,n,n,pl,,d,0,0'>
3 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
4 hallugalYa NN <fs af='hallu,n,n,pl,,o,a,a'>|<fs af='hallu,n,n,pl,,o,a,a'>
5 maxyeV NN <fs af='maxyeV,n,n,sg,,d,0,0'>|<fs af='maxyeV,n,n,sg,,d,0,0'>
6 byAktIriyAgalYu NN <fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>
7 iruwwaveV VM <fs af='iru,v,n,pl,3,,uww,uww'>
8 . SYM <fs af='.,punc,,,,,,'>
</Sentence>
<Sentence id="1">
1 (( NP
1.1 namma PRP <fs af='nAnu,pn,,pl,1,o,a,a'>|<fs af='nAnu,pn,,pl,1,o,a,a'>
1.2 vasadugalYu NN <fs af='vasadu,n,n,pl,,d,0,0'>|<fs af='vasadu,n,n,pl,,d,0,0'>
))
2 (( CCP
2.1 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
))
3 (( NP
3.1 hallugalYa NN <fs af='hallu,n,n,pl,,o,a,a'>|<fs af='hallu,n,n,pl,,o,a,a'>
3.2 maxyeV NN <fs af='maxyeV,n,n,sg,,d,0,0'>|<fs af='maxyeV,n,n,sg,,d,0,0'>
3.3 byAktIriyAgalYu NN <fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>
))
4 (( VGF
4.1 iruwwaveV VM <fs af='iru,v,n,pl,3,,uww,uww'>
4.2 . SYM <fs af='.,punc,,,,,,'>
))
</Sentence>
<Sentence id="1">
1 ivugalYu NN <fs af='ivugalYu,pn,,sg,1,o,0,0'>|<fs af='ixu,pn,,sg,1,o,0,0'>|<fs af='ixu,pn,,pl,1,o,0,0'>
2 hallugalYannu NN <fs af='hallu,n,n,pl,,o,annu,annu'>|<fs af='hallu,n,n,pl,,o,annu,annu'>
3 koVlYakAgi NN <fs af='koVlYakAgi,Avy,,,,,,'>|<fs af='koVlYaku,n,n,sg,,o,Agi,Agi'>|<fs af='koVlYaku,n,n,sg,,o,Agi,Agi'>
4 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
5 usirugalYannu NN <fs af='usirugalYu,n,n,sg,,o,annu,annu'>|<fs af='usirugalYu,n,n,sg,,o,annu,annu'>|<fs af='usiru,n,n,pl,,o,annu,annu'>|<fs af='usiru,n,n,pl,,o,annu,annu'>
6 xurgaMXayukwavAgi RB <fs af='xurgaMXayukwavAgi,unk,,,,,,'>
7 mAdibiduwwaveV VM <fs af='mAdibidu,v,n,pl,3,,uww,uww'>
8 . SYM <fs af='.,punc,,,,,,'>
</Sentence>
<Sentence id="1">
1 (( NP
1.1 ivugalYu NN <fs af='ivugalYu,pn,,sg,1,o,0,0'>|<fs af='ixu,pn,,sg,1,o,0,0'>|<fs af='ixu,pn,,pl,1,o,0,0'>
1.2 hallugalYannu NN <fs af='hallu,n,n,pl,,o,annu,annu'>|<fs af='hallu,n,n,pl,,o,annu,annu'>
))
2 (( NP
2.1 koVlYakAgi NN <fs af='koVlYakAgi,Avy,,,,,,'>|<fs af='koVlYaku,n,n,sg,,o,Agi,Agi'>|<fs af='koVlYaku,n,n,sg,,o,Agi,Agi'>
))
3 (( CCP
3.1 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
))
4 (( NP
4.1 usirugalYannu NN <fs af='usirugalYu,n,n,sg,,o,annu,annu'>|<fs af='usirugalYu,n,n,sg,,o,annu,annu'>|<fs af='usiru,n,n,pl,,o,annu,annu'>|<fs af='usiru,n,n,pl,,o,annu,annu'>
))
5 (( RBP
5.1 xurgaMXayukwavAgi RB <fs af='xurgaMXayukwavAgi,unk,,,,,,'>
))
6 (( VGF
6.1 mAdibiduwwaveV VM <fs af='mAdibidu,v,n,pl,3,,uww,uww'>
6.2 . SYM <fs af='.,punc,,,,,,'>
))
</Sentence>
<Sentence id="1">
1 namma PRP <fs af='nAnu,pn,,pl,1,o,a,a'>|<fs af='nAnu,pn,,pl,1,o,a,a'>
2 vasadugalYu NN <fs af='vasadu,n,n,pl,,d,0,0'>|<fs af='vasadu,n,n,pl,,d,0,0'>
3 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
4 hallugalYa NN <fs af='hallu,n,n,pl,,o,a,a'>|<fs af='hallu,n,n,pl,,o,a,a'>
5 maxyeV NN <fs af='maxyeV,n,n,sg,,d,0,0'>|<fs af='maxyeV,n,n,sg,,d,0,0'>
6 byAktIriyAgalYu NN <fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>
7 iruwwaveV VM <fs af='iru,v,n,pl,3,,uww,uww'>
8 . SYM <fs af='.,punc,,,,,,'>
</Sentence>
<Sentence id="1">
1 (( NP
1.1 namma PRP <fs af='nAnu,pn,,pl,1,o,a,a'>|<fs af='nAnu,pn,,pl,1,o,a,a'>
1.2 vasadugalYu NN <fs af='vasadu,n,n,pl,,d,0,0'>|<fs af='vasadu,n,n,pl,,d,0,0'>
))
2 (( CCP
2.1 mawwu CC <fs af='mawwu,Avy,,,,,,'>|<fs af='mawwu,n,n,sg,,d,0,0'>|<fs af='mawwu,n,n,sg,,d,0,0'>
))
3 (( NP
3.1 hallugalYa NN <fs af='hallu,n,n,pl,,o,a,a'>|<fs af='hallu,n,n,pl,,o,a,a'>
3.2 maxyeV NN <fs af='maxyeV,n,n,sg,,d,0,0'>|<fs af='maxyeV,n,n,sg,,d,0,0'>
3.3 byAktIriyAgalYu NN <fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>|<fs af='byAktIriyA,n,n,pl,,d,0,0'>
))
4 (( VGF
4.1 iruwwaveV VM <fs af='iru,v,n,pl,3,,uww,uww'>
4.2 . SYM <fs af='.,punc,,,,,,'>
))
</Sentence>
GUESS_MORPH 1.0
-------------------------
GUESS_MORPH (TELUGU)
---------------------
The task of guess_morph is to map the gender of the source language into the target language. The input for the agreement is the root word along with its feature structure. guess_morph then reads the condition and deletes the rest of the features.
Requirements:
------------
Operating System : LINUX/UNIX system
Compiler/Interpreter/Librarie(s): Perl and SSF API's
For installation on Linux, please refer to the file INSTALL.
Directory Structure:
--------------------
guess_morph
|
|---tests (contains the reference input and output)
|
|---README (How to run/use the module)
|
|---INSTALL (How to install in sampark directory structure)
|
|---ChangeLog (version information)
|
|---Makefile (first level make file for copying the module source in sampark system)
|
|---Makefile.stage2 (2nd level make file for actual installation/copying in the bin and data_bin directory)
|
|---guess_morph_run.sh (to run the module)
|
|---guessmorph_tel.sh (for dashboard spec file)
|
|---guessmorph_tel.pl (main file of the tel generator)
How to Use??
------------
1. perl $setu/bin/sl/guess_morph/guessmorph.pl guessmorph.rin
* Sample input and output files are provided in the tests dir, namely guess_morph.rin and guess_morph.rout
#################################
Author: Christopher M
CALTS
HCU,Hyderabad
Any queries or suggestions mail to
efthachris@gmail.com
#################################
while($line=<>)
{
chomp($line);
($id,$token,$pos,$fs)=split(/\t+/,$line);
chomp($fs);
if($pos eq "FRAGP"){
$pos=~s/FRAGP/BLK/g;
}
if($fs=~/\|/)
{
$fs=~s/<fs af=\'|\'>//g;
if(($fs) && ($fs=~/\|/))
{
($fs1,$fs2,$fs3,$fs4)=split(/\|/,$fs);
($root1,$lcat1,$g1,$n1,$p1,$c1,$tam1,$suff1)=split(/,/,$fs1);
($root2,$lcat2,$g2,$n2,$p2,$c2,$tam2,$suff2)=split(/,/,$fs2);
if(($suff1 eq "0_o")&&($suff2 eq "lo"))
{
$fs1 ="" ;
}
}
$fst1= "$id\t$token\t$pos\t<fs af='$fs2'>";
$fst2= "$id\t$token\t$pos\t<fs af='$fs1'>|<fs af='$fs2'>";
if($fs1 eq "") {
print "$fst1\n";
}
else {
print "$fst2\n";
}
}
else {
print "$id\t$token\t$pos\t$fs\n";
}
}
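The pruning rule in the loop above, isolated: when a token carries two alternative analyses and the first ends in suffix 0_o while the second ends in lo, only the second survives. A self-contained sketch using the same sample analyses as the test data:

```shell
# Two alternative analyses separated by '|'; the suffix is the last af= field.
fs="<fs af='xx,n,m,sg,2,,0_o,0_o'>|<fs af='xx,n,m,sg,3,,lo,lo'>"
kept=$(printf '%s' "$fs" | awk -F'|' '($1 ~ /0_o/ && $2 ~ /lo/) { print $2; next } { print }')
printf '%s\n' "$kept"
```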
perl $setu/bin/sl/guess_morph/guess-morph.pl $1
perl /var/www/html/sampark/system/kan_hin/sampark/bin/sl/guess_morph/guess-morph.pl $1
1 (( FRAGP <fs af='xx,n,m,sg,2,,0_o,0_o'>|<fs af='xx,n,m,sg,3,,lo,lo'>
1.1 temp NNP <fs af='xx,n,m,sg,2,,0_o,0_o'>|<fs af='xx,n,m,sg,3,,lo,lo'>
))
1 ((
1.1 temp NNP <fs af='xx,x,m,sg,3,,,>
))
'Shakti Standard Format of Feature Structure':
E.g.: <af=ram,n,m,s,3,0,,>|<af=ram,v,m,s,3,0,,/attr1=val1/attr2=val2/attr3=<attr4=val4|<attr5=val5>>>
'Representation of the Structure':
A Feature structure is represented by a hash.
All the attributes in the feature structure make the keys of the hash.
The value of the key is a reference to a OR node (i.e. an array).
Each field of the OR node (array) contains either reference to a hash (i.e. another feature structure)
or a value.
When the user calls read_FS($string), they get a reference to an OR node (i.e. a collection of feature structures).
An OR node can have one or more feature structures:
<af=,,,,/a1=v1/a2=v2>|<af=,,,,/b1=w1/b2=w2>
$ORNode is reference to an array
$ORNode => -- --
| V1 |
| V2 | Vn = { Reference to Feature Structure (Ref to a hash) (or) string }
| .. |
| Vn |
-- --
$FS is a reference to a hash
$FS =>
-- --
| attr1 val1 |
| attr2 val2 | valn = { $ORNode }
| ... |
| attrn valn |
-- --
#% read_FS($string) --> reference to an OR node (array).
$string is the feature structure (to be loaded) in the new shakti standard format.
reference to an OR node (array) is returned.
#% read_FS_old($string) --> reference to an OR node (array).
$string is the feature structure in the old shakti standard format.
reference to an OR node (array) is returned.
#% get_values($featurePath,$FSreference) --> An OR node (array) containing the values of that attribute.
$FSreference is the reference to an OR node (array) of the feature structure.
An OR node is returned containing all possible values for that specified feature path.
#% get_values_2($featurePath,$FSreference) --> An array containing the matched values.
$FSreference is the reference to a single feature structure.
An OR node is returned containing all possible values for that specified feature path.
#% get_attributes($FSReference) -> array containing the attributes for that feature structure
$FSReference is the reference to a single feature structure (i.e. reference to a hash ).
array containing the attributes for that feature structure is returned.
#% get_path_values($attr,$fs) --> 2D array of values and paths.
$fs is the reference to an OR node (array) with one or more Feature Structure present in it.
A 2D array of values and paths is returned.
field 0 of that array contains the path.
field 1 of that array contains the corresponding value. <Changes made to the value here will not be reflected>.
#% get_path_values_2($attr,$fs) --> 2D array of values and paths.
$fs is the reference to a single feature structure. (i.e. reference to a hash)
a 2D array of values and paths is returned.
field 0 of that array contains the path.
field 1 of that array contains the corresponding value. <Changes made to the value here will not be reflected>.
#% copyFS($fs) --> Reference of a new FS
$fs is the reference to a single feature structure
A reference to a new copied feature structure is returned.
#% add_attr_val($featurePath,$value,$FSReference) --> -nil-
$FSReference is the reference to an OR node (array).
$value is a reference to an OR node (array) which will be the value of the attribute mentioned by $featurePath.
Nothing is returned.
#% add_attr_val_2($featurePath,$value,$FSReference) --> -nil-
$FSReference is the reference to a single feature structure (i.e reference to a hash).
$value and $featurePath have the same meaning as specified above.
#% update_attr_val($featurePath,$val,$FSReference) --> -nil-
$FSReference is the reference to an OR Node (array)
The value specified by the feature path will be changed to the new val ($val)
$val is the reference to another OR node (array).
#% update_attr_val_2($featurePath,$val,$FSReference) --> -nil-
$FSReference is the reference to a single feature structure (Reference to a hash).
The value specified by the feature path will be changed to the new val ($val)
$val is the reference to another OR node (array).
#% del_attr_val($featurePath,$FSReference)
$FSReference is the reference to an OR node (array).
Deletes the value of the attribute specified by $featurePath.
#% del_attr_val_2($featurePath,$FSReference)
$FSReference is the reference to a single feature structure (i.e. reference to a hash).
Deletes the value of the attribute specified by $featurePath.
#% unify($fs1,$fs2) --> $fs3;
$fs1 and $fs2 are references to two OR nodes containing one or more feature structures,
$fs3 is either -1 or a reference to a new OR node of feature structures.
-1 is returned in the case that the feature structures cannot be unified.
Though provision has been provided for passing OR nodes as the arguments,
each OR node passed in the argument should
refer to only one feature structure.
#% unify_2($fs1,$fs2) --> $fs3;
$fs1 and $fs2 are references to two feature structures (i.e. reference to hashes),
$fs3 is either -1 or a reference to a new OR node of feature structures.
-1 is returned in the case that the feature structures cannot be unified.
#% merge($fs1,$fs2) --> -nil-
$fs1 and $fs2 are references to OR nodes containing multiple possible feature structures.
Changes the values of $fs1 to those of $fs2 for all the common attributes of fs1 and fs2
Rest of the values of $fs1 are left untouched.
#% merge_2($fs1,$fs2) --> -nil-
$fs1 and $fs2 are references to single feature structures.
Changes the values of $fs1 to those of $fs2 for all the common attributes of fs1 and fs2
Rest of the values of $fs1 are left untouched.
#% load_hash() --> Reference to a hash
Loads the string passed into a hash
Reference to that hash is returned.
#% printFS_SSF($fs) --> -nil-
$fs is a reference to an OR node containing one or more feature structures.
prints the structure in the standard format.
#% printFS_SSF_2($fs) --> -nil-
$fs is a reference to a single feature structure (i.e. reference to a hash)
prints the structure in the standard format.
#% make_string($FSReference) --> -$string-
$FSReference is the reference to an OR node.
$string is the feature structure in the standard format.
#% make_string_2($FSReference) --> -$string-
$FSReference is the reference to a single feature structure (i.e reference to a hash).
$string is the feature structure in the standard format.
#% prune_FS($featurePath,$fieldNumber,$FSReference) --> +1/-1
$FSReference is the reference to an OR node.
Deletes the feature structure or value from the OR node (array) , which is the value of the attribute specified by
$featurePath.
+1 indicates successful completion of the function
-1 indicates that such a feature path does not exist.
#% prune_FS_2($featurePath,$fieldNumber,$FSReference) --> +1/-1
$FSReference is the reference to a single feature structure.
Deletes the feature structure or value from the OR node (array) , which is the value of the attribute specified by
$featurePath.
+1 indicates successful completion of the function
-1 indicates that such a feature path does not exist.
#% get_fs_reference($ref_to_array,$index_feature_structure)
$ref_to_array is the reference to an OR node (array).
$index_... is the field you want from that array.
#% get_num_fs($ref_to_array) --> number of feature structures in that reference passed (Or values also)
$ref_to_array is the reference to an OR node (array).
#% printFS_SSF_old($fs) --> -nil-
$fs is a reference to an OR node.
prints the feature structure in the old shakti format.
#% make_string_old($fs) --> -$string-
$fs is a reference to an OR node.
makes the feature structure in the old shakti format.
And that string is returned.
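The positional af= fields used throughout can be unpacked directly; the field names below follow the guess_morph splitter shown earlier (root, lcat, gender, number, person, case, tam, suffix), applied to the example string from above:

```shell
af='ram,n,m,s,3,0,,'
# Split the comma-separated af= value into its positional attributes.
IFS=',' read -r root lcat gender number person gcase tam suffix <<EOF
$af
EOF
printf 'root=%s lcat=%s gender=%s\n' "$root" "$lcat" "$gender"
```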
'API for SSF'
'-----------'
'REPRESENTATION'
'--------------'
'Index Num F1 F2 F3 F4'
'----- --- -- -- -- --'
(0) 11 0 (( SSF
(1) 3 1 (( NP f1
(2) 1 1.1 Ram NNP f1.1
(3) ))
(4) 6 2 (( VGADV f2
(5) 4 2.1 (( VG f2.1
(6) 1 2.1.1 is VBZ f2.1.1
(7) 1 2.1.2 playing VBG f2.1.2
(8) ))
(9) ))
(10) ))
'The above Data-structure (array) would be stored in @_TREE_'
#% Reads the file into the data-structure @_TREE_
#% &read ([$filename]) --> -nil-
#% Prints the data-structure
#% &print_tree( [$tree] ) -nil-
#% &print_node($node,[$tree]) -nil-
#% Gets the children nodes
#% &get_children( $node , [$tree] ) -> @children_nodes;
#% To get children of root, $node = 0;
#% Gets the Leaf nodes
#% &get_leaves( [$tree] ) -> @leaf_nodes;
#% Gets the Leaf nodes
#% &get_leaves_child($node, [$tree] ) -> @leaf_nodes of that node;
#% Get the nodes which have a particular field-value.
#% &get_nodes( $fieldnumber , $value , [$tree] ) -> @required_nodes
#% Get the nodes which have a particular field-value.
#% &get_nodes_pattern( $fieldnumber , $value , [$tree] ) -> @required_nodes
#% Deletes a node
#% &delete_node( $node , [$tree] )
#% Create a parent for a sequence of nodes
#% &create_parent( $node_start , $node_end , $tag , [$tree] );
#% Delete the parent but keep the children
#% &delete_layer ( $node , [$tree] )
#% Creates a new tree
#% &create_tree; -> $empty_tree;
#% Only SSF as the parent will be there.
#% &add_tree($tree, $sibling_tree, $direction(0/1), [$tree]) -> -nil-
#% &add_node ( $tree , $sibling_node , $direction (0/1) ,[$tree]) -> $index_node
#% Gets all the fields of a given leaf/node
#% &get_fields ( $node , [$tree] ) -> ($zeroth,$first,$second,$third,$fourth)
#% Get a particular field of a leaf/node
#% &get_field ( $node , $fieldnumber , [$tree] ) -> $value_of_field
#% Modify a particular field of a leaf/node
#% &modify_field( $node , $fieldnumber , $value , [$tree] )
#% Copy a node as another tree
#% &copy ( $node ) -> $tree
#% If entire tree has to be copied, $node = 0
#% Move a node to a particular place
#% &move_node( $node , $node2 , $direction , [$tree] )
#% $direction = 0 if before the sibling, 1 if after the sibling
#% Copy the entire tree
#% copy_tree ( [$tree] ) -> $tree2
#% Gets the parent of a node
#% &get_parent( $node , [$tree] ) -> $parent_node
#% Gets the next sibling
#% &get_next_node( $node , [$tree] ) -> $next_node
#% Gets the previous sibling
#% &get_previous_node( $node , [$tree] ) -> $previous_node
#% Adds a leaf before/after a node
#% &add_leaf( $node , $direction[0/1] , $f2 , $f3, $f4)
#% Changes Old Shakti representation to New Shakti Representation
#% &change_old_new($Tree) -> -nil-
#% Changes the new Shakti Representation to the Old Shakti Representation
#% &change_new_old($Tree) -> -nil-
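The per-node field layout shown in the example above can be sketched with a few lines of plain Perl. This is a minimal illustration only: it does not use the API, and the tab-separated columns are an assumption about the input format (the API's get_fields returns the same address/token/tag/feature-structure fields).

```perl
#!/usr/bin/perl
# Minimal sketch of the SSF field layout (not part of the API itself).
use strict;
use warnings;

# One chunk of the example sentence, tab-separated: address, token, tag, fs
my @ssf = (
    "1\t((\tNP\tf1",
    "1.1\tRam\tNNP\tf1.1",
    "\t))",
);

my @tree;
for my $line (@ssf) {
    my @fields = split /\t/, $line;   # [0]=address, [1]=token, [2]=tag, [3]=fs
    push @tree, \@fields;
}

print $tree[1][2], "\n";              # prints: NNP
```

In the real API the same information is reached through &get_fields($node, [$tree]) rather than by indexing the array directly.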
9a10
>
12,14d12
< #$SSF_API = $ENV{'SSF_API'};
<
< #require "$SSF_API/feature_filter.pl";
49,50d46
< # All if loop can be change to if else and some
< #variables(like visible,flags etc) are used
129a126
> #print STDERR "End TB $pnum\n";
175,179c172,174
< # Litha Changes
< # Orignal Statement
< # open(OUT, ">tmp/sentSSF.$$") or die("could not open to write\n");
< my @sent = "";
< my $j = 0;
---
> print STDERR "$pnum-$sentnum-$cur_sent_id\n";
>
> open(OUT, ">tmp/sentSSF.$$") or die("could not open to write\n");
183,185c178
< # Litha Changes
< # Orignal Statement
< #close(OUT);
---
> close(OUT);
188,191c181
< # Litha Changes
< # Orignal Statement
< #$tRee = &read("tmp/sentSSF.$$");
< $tRee = &read(\@sent);
---
> $tRee = &read("tmp/sentSSF.$$");
193a184
> # $sentf = 0;
198a190,191
> # $pf = 0;
> # $sentnum = 0;
216,220c209
< # Litha Changes
< # Orignal Statement
< # open(OUT, ">tmp/sentSSF.$$") or die("could not open to write\n");
< my @sent = "";
< my $j = 0;
---
> open(OUT, ">tmp/sentSSF.$$") or die("could not open to write\n");
225,228c214
< # Litha Changes
< # Orignal Statement
< # print OUT "$all_lines[$i]\n";
< @sent[$j++]= "$all_lines[$i]\n";
---
> print OUT "$all_lines[$i]\n";
236,238c222
< # Litha Changes
< # Orignal Statement
< # close(OUT);
---
> close(OUT);
241,244c225
< # Litha Changes
< # Orignal Statement
< # $tRee = &read("tmp/sentSSF.$$");
< $tRee = &read(\@sent);
---
> $tRee = &read("tmp/sentSSF.$$");
304,305c285,286
< print "\ntb_num is available.But user is not providing the tb_num: $tb_no\n";
< print "sentence_id: $sent_id is present in the tb_num: $tb_no \n";
---
> print "\ntb is available.But user is not providing the tb_no:\n";
> print "\ntb_ num: $tb_no\tSentence_id: $sent_id"."\n\n";
308,309c289,300
< print "\ntb_ num: $tb_no\tSentence_id: $sent_id"."\n\n";
<
---
> if($tb_no == $pnum)
> {
> print "\ntb_ num: $tb_no\tSentence_id: $sent_id"."\n\n";
> }
> else{
> if($pnum)
> {
> print "\ntb_num is available, tb_num= $pnum";
> print "but user is not providing tb_num";
> print "\nSentence_id: $sent_id"."\n\n";
> }
> }
315c306
< if(!$paraf)
---
> if(($paraf) && ($pnum == 0))
317c308
< print "\nError : tb_num is not available.But user is providing the tb_num:\n\n";
---
> print "\nError : tb is available.But user is not providing the tb_no:\n";
319c310
< elsif(($paraf) && ($pnum == 0))
---
> elsif($paraf && $sentf)
321,322c312
< print "\nError : tb_num is available.But user is not providing the tb_num.\n";
< print "Sentence_id: $sent_id is not present in the all tb_nums: \n\n"
---
> print "Sentence_id is not present in the tb_no : $pnum\n\n";
326c316
< print "\nSentence_id: $sent_id is not present in the tb_num : $pnum\n\n";
---
> print "\nError : tb is not available.But user is providing the tb_no:\n\n";
453,454c443,444
<
< # Litha Changes up to end of this function
---
>
> # Litha Changes
474a465
> # Sriram Changes
489a481,482
> #print "<Sentence id=\"$sentcount\">\n";
> #print "<Sentence id=\"".join('==',@{$para->[0]->{'sent_Ids'}})."\">\n";
526,527c519,520
< # Litha Changes upto end of the program
< print OUT "$StoryRef->[0]->{\"first_line\"}$StoryRef->[0]->{\"second_line\"}$StoryRef->[0]->{\"third_line\"}$StoryRef->[0]->{\"meta\"}";
---
>
> print OUT "$StoryRef->[0]->{\"first_line\"}\n\n$StoryRef->[0]->{\"second_line\"}\n\n$StoryRef->[0]->{\"third_line\"}\n\n$StoryRef->[0]->{\"meta\"}\n";
532,538c525,533
< my $paras = $StoryRef->[$i];
< if($paras->[0]->{'body_visible'} == 1)
< {
< print OUT "<body>\n\n";
< }
< my $paracount = &get_paracount($paras);
< for(my $j = 1; $j <= $paracount; $j++)
---
> print OUT "<body>\n\n";
> my $para = $StoryRef->[$i];
>
> # INformation is there in $StoryRef->[1]->[$paranum]->[0]->
> my $segment = $para->[0]->{'segment'};
> my $bullet = $para->[0]->{'bullet'};
> my $lang = $para->[0]->{'language'};
>
> for(my $j = 1; $j <= $para->[0]->{'numSens'}; $j++)
540,552c535,543
< my $para = $paras->[$j];
< my $segment = $para->[0]->{'segment'};
< my $bullet = $para->[0]->{'bullet'};
< my $lang = $para->[0]->{'language'};
< if($para->[0]->{'para_visible'} == 1)
< {
< print OUT "<tb number=\"$j\" segment=\"$segment\" bullet=\"$bullet\">\n";
< }
< if($para->[0]->{'text_visible'} == 1)
< {
< print OUT "<text>\n";
< }
< for(my $k = 1; $k <= $para->[0]->{'numSens'}; $k++)
---
> my $sentences = $para->[$j];
> # print OUT "<p>\n";
>
> print OUT "<tb number=\"$j\" segment=\"$segment\" bullet=\"$bullet\">\n";
> #print OUT "<tb number=\"$j\" segment=\"yes\" bullet=\"yes\">\n";
> print OUT "<text>\n";
> #print OUT "<tb>\n";
>
> for(my $k = 1; $k <= $sentences->[0]; $k++)
555,560c546,548
< if($para->[0]->{'sent_visible'} == 1)
< {
< print OUT "<Sentence id=\"".$para->[0]->{'sent_Ids'}->[$k]."\">\n";
< close(OUT);
< }
< &print_tree_file(">>$outfile", $para->[$k]);
---
> print OUT "<Sentence id=\"$sentcount\">\n";
> close(OUT);
> &print_tree_file(">>$outfile", $sentences->[$k]);
562,565c550
< if($para->[0]->{'sent_visible'} == 1)
< {
< print OUT "</Sentence>\n";
< }
---
> print OUT "</Sentence>\n";
567,575c552,554
< if($para->[0]->{'text_visible'} == 1)
< {
< print OUT "</text>\n";
< print OUT "<foreign language=\"select\" writingsystem=\"LTR\"></foreign>\n";
< }
< if($para->[0]->{'para_visible'} == 1)
< {
< print OUT "</tb>\n";
< }
---
> print OUT "</text>\n";
> print "<foreign language=\"select\" writingsystem=\"LTR\"></foreign>\n";
> print OUT "</tb>\n";
577,580d555
< if($paras->[0]->{'body_visible'} == 1)
< {
< print OUT "</body>\n";
< }
582c557,560
< print OUT "$StoryRef->[0]->{\"last_line\"}";
---
>
> print OUT "</body>\n";
> print OUT "</document>\n";
>
630,639c608,614
< # Litha Changes
< # Orignal Statement
< # my $filename;
< # $filename=$_[0];
< # if($filename)
< # {
< # open(stdin,$filename) or die $!."\n";
< # }
< my $sent_ref;
< $sent_ref=shift;
---
> my $filename;
>
> $filename=$_[0];
> if($filename)
> {
> open(stdin,$filename) or die $!."\n";
> }
650,653c625
< # Litha Changes
< # Orignal Statement
< #while(<stdin>)
< foreach (@$sent_ref)
---
> while(<stdin>)
27-02-2013 Kunal Sachdeva <kunal.sachdeva@students.iiit.ac.in>,<kunal.sachdeva90@gmail.com>
* Version 1.8
1. Problem regarding FRAGP chunk has been fixed. Test cases are in the tests folder.
16-11-2012 Kunal Sachdeva <kunal.sachdeva@students.iiit.ac.in>,<kunal.sachdeva90@gmail.com>
* Version 1.7
1. Problem regarding forward slash and angular brackets has been fixed. Test cases are in the tests folder.
17-7-2009 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.6
1. Name computation has been separated, as per Tree Banking standards.
20-1-2009 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.4
1. Problems regarding NST fixed. Test cases are in tests/error1.rin.
18-10-2008 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.3
1. Problems regarding single-word NP chunks fixed. Test cases are in
tests/beng/headcomputation3/4/5.rin
22-09-2008 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.2
1. Include Makefile for copying and installation.
2. Improved the directory structure as per ilmt_guideline_0.3, except application logging and code review.
20-09-2008 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.1
1. Bug regarding a chunk tag NP with tokens PRP and PSP.
2. Initially PSP was made the head of the chunk (error).
3. Now PRP is made the head of the chunk.
04-07-2008 Avinesh PVS <avinesh@students.iiit.ac.in>,<avinesh.pvs@gmail.com>
* Version 1.0
1. Baseline version.
Basic Installation
==================
1. Create a new directory with some name (e.g. mkdir new_dir)
2. Update ~/.bash_profile with a new environment variable called 'setu'
(export setu="PATH OF NEW_DIR")
3. source ~/.bash_profile
4. Type `make' to copy the package source to $setu/src/sl/headcomputation-1.8
5. `cd' to the system wide directory $setu/src/sl/headcomputation-1.8
6. Type `make install' to install the programs, data files and
documentation in the sampark system wide directory.
7. You can remove the program binaries and object files from the
source code directory by typing `make clean'.
HEAD COMPUTATION 1.8 (HEAD COMPUTATION)
---------------------------------------
HEAD COMPUTATION :
------------------
Head computation is the task of identifying the heads of noun and verb
groups (chunks). These heads provide the information needed for further
processing of the sentence according to the Paninian theory.
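As a toy illustration of the idea, an NP head can be picked by scanning the chunk's tokens from right to left for a noun-tagged token, ignoring trailing postpositions. This is a deliberately simplified sketch: the real src/get_head_np.pl works on SSF trees, rewrites feature structures, and also handles PRP/PSP and naming cases.

```perl
#!/usr/bin/perl
# Simplified NP head-selection sketch (a toy version of the module's
# right-to-left scan; the real code operates on SSF trees via the API).
use strict;
use warnings;

# (token, POS) pairs of one NP chunk: "galI meM" (noun + postposition)
my @chunk = ( [ 'galI', 'NN' ], [ 'meM', 'PSP' ] );

sub np_head {
    my @toks = @_;
    for my $t ( reverse @toks ) {     # rightmost noun-tagged token wins
        return $t->[0] if $t->[1] =~ /^NN/;
    }
    return $toks[-1][0];              # fallback: last token of the chunk
}

print np_head(@chunk), "\n";          # prints: galI
```

The same right-to-left scan, generalized over the tag prefix (NN, V, J, CC, RB), is what the copy_head_np routine below implements on real chunks.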
Requirements:
------------
Operating System : LINUX/UNIX system
Compiler/Interpreter/Libraries: Perl and the SSF APIs
For installation on Linux, please refer to the file INSTALL.
Directory Structure:
--------------------
headcomputation
|
|---src (functional source code of the headcomputation)
|
|---API (SSF API's)
|
|---tests (contains the reference input and output)
|
|---doc (documentation files of the headcomputation)
|
|---README (How to run/use the module)
|
|---INSTALL (How to install in sampark directory structure)
|
|---ChangeLog (version information)
|
|---Makefile (first level make file for copying the module source in sampark system)
|
|---Makefile.stage2 (2nd level make file for actual installation/copying in the bin and data_bin directory)
|
|---headcomputation_run.sh (to run the headcomputation module)
|
|---headcomputation.sh (for the use of dashboard spec file)
|
|---headcomputation.spec (individual headcomputation module run with dashboard)
|
|---headcomputation.pl (main file of headcomputation)
#################################
Author: Avinesh PVS
LTRC
IIIT Hyderabad
For any queries or suggestions, mail to
avinesh@students.iiit.ac.in avinesh.pvs@gmail.com
#################################
HEAD COMPUTATION 1.8 (HEAD COMPUTATION)
---------------------------------------
HEAD COMPUTATION :
------------------
Head computation is the task of identifying the heads of noun and verb
groups (chunks). These heads provide the information needed for further
processing of the sentence according to the Paninian theory.
How to Use?
------------
1. perl $setu/bin/sl/headcomputation/headcomputation.pl --path=$setu/bin/sl/headcomputation --input=$setu/bin/sl/headcomputation/tests/headcomputation.rin
*Sample input and output files are provided in the tests dir, namely headcomputation.rin and headcomputation.rout
2. perl $setu/bin/sl/headcomputation/headcomputation.pl --help
to display the help
To INSTALL
-----------
To install, please check the INSTALL file.
#################################
Author: Avinesh PVS
LTRC
IIIT Hyderabad
For any queries or suggestions, mail to
avinesh@students.iiit.ac.in avinesh.pvs@gmail.com
#################################
#!/usr/bin/perl
use Getopt::Long;
GetOptions("help!"=>\$help,"path=s"=>\$head_home,"input=s"=>\$input,"output=s"=>\$output);
print "Unprocessed by Getopt::Long\n" if $ARGV[0];
foreach (@ARGV) {
print "$_\n";
exit(0);
}
if($help eq 1)
{
print "Head Computation - Head Computation Version 1.8\n(30th May 2009)\n\n";
print "usage : ./run-headCompute.pl --path=/home/headComputation-1.8 [-i inputfile|--input=\"input_file\"] [-o outputfile|--output=\"output_file\"] \n";
print "\tIf the output file is not mentioned then the output will be printed to STDOUT\n";
exit(0);
}
if($head_home eq "")
{
print "Please Specify the Path as defined in --help\n";
exit(0);
}
my $src=$head_home . "/src";
require "$head_home/API/shakti_tree_api.pl";
require "$head_home/API/feature_filter.pl";
require "$src/copy_np_head.pl";
require "$src/copy_vg_head.pl";
require "$src/single_quote_changeName-0.1.pl";
if ($input eq "")
{
$input="/dev/stdin";
}
&read_story($input);
$numBody = &get_bodycount();
for(my($bodyNum)=1;$bodyNum<=$numBody;$bodyNum++)
{
$body = &get_body($bodyNum,$body);
# Count the number of Paragraphs in the story
my($numPara) = &get_paracount($body);
#print STDERR "Paras : $numPara\n";
# Iterate through paragraphs in the story
for(my($i)=1;$i<=$numPara;$i++)
{
my($para);
# Read Paragraph
$para = &get_para($i);
# Count the number of sentences in this paragraph
my($numSent) = &get_sentcount($para);
# print STDERR "\n $i no.of sent $numSent";
#print STDERR "Para Number $i, Num Sentences $numSent\n";
#print $numSent."\n";
# Iterate through sentences in the paragraph
for(my($j)=1;$j<=$numSent;$j++)
{
#print " ... Processing sent $j\n";
# Read the sentence which is in SSF format
my($sent) = &get_sent($para,$j);
#print STDERR "$sent";
# print "check--\n";
# &print_tree($sent);
# Get the nodes of the sentence (words in our case)
#Copy NP head
# &AddID($sent);
&make_chunk_name($sent);
&copy_np_head($sent,$head_home);
#Copy NP VG head
&copy_vg_head($sent,$head_home);
}
}
}
if($output eq "")
{
&printstory();
}
if($output ne "")
{
&printstory_file("$output");
}
perl $setu/bin/sys/common/printinput.pl $1 > headcomputationinput
perl $setu/bin/sl/headcomputation/headcomputation.pl --path=$setu/bin/sl/headcomputation --input=headcomputationinput
%SPEC_FILE%
#
# Generated Dashboard Specification file
#
# This file gives the specifications to run the system. It contains:
#DO NOT change the naming convention of any of the sections.
%GLOBAL%
#
# Global variables
#
# Root directory of the system
<ENV>$mt_iiit=/usr/share/Dashboard
#<ENV>$setu=/home/setu/sampark
<ENV>$setu=/home/setu/ilmtmodules/modules/headcomputation/headcomputation-1.8/setu
<ENV>$src=$setu/src
<ENV>$bin=$setu/bin
<ENV>$data_bin=$setu/data_bin
<ENV>$data_src=$setu/data_src
<ENV>$val_data=$setu/val_data
# Other variables used in the generation of executable
# type=int, char, char*
<VAR>$slang=tel
<VAR>$tlang=hin
<VAR>$stlang=tel_hin
# API for PERL,C language
<API lang=perl>$mt_iiit/lib/shakti_tree_api.pl
<API lang=perl>$mt_iiit/lib/feature_filter.pl
<API lang=C>$mt_iiit/c_api_v1/c_api_v1.h
# READER,PRINTER function for language PERL
<READER lang=perl>read
<PRINTER lang=perl>print_tree_file
# READER,PRINTER function for language C
<INCLUDE lang=C>stdio.h
<INCLUDE lang=C>stdlib.h
<READER lang=C>read_ssf_from_file
<PRINTER lang=C>print_tree_to_file
# Output directory for storing temporaries (relative path to current dir)
#<OUTPUT_DIR>OUTPUT.tmp
#<OUTPUT_DIR>$val_data/system/$stlang
<OUTPUT_DIR>/home/setu/module_inout/prune.tmp
# Run in SPEED or DEBUG or USER mode
<MODE>DEBUG
#<MODE>SPEED
%SYSTEM%
# Each module should have a unique identifying name (i.e. unique value for the second column)
# -----------------------------------------------
# Source Language Analyzer Modules (SL)
# -----------------------------------------------
# Head computation
1 headcomputation $setu/bin/sl/headcomputation/headcomputation.sh dep=<START> intype=1 lang=sh
perl $setu/bin/sl/headcomputation/headcomputation.pl --path=$setu/bin/sl/headcomputation -i $1
perl /var/www/html/sampark/system/kan_hin/sampark/bin/sys/common/printinput.pl $1 > headcomputationinput
perl /var/www/html/sampark/system/kan_hin/sampark/bin/sl/headcomputation/headcomputation.pl --path=/var/www/html/sampark/system/kan_hin/sampark/bin/sl/headcomputation --input=headcomputationinput
##!/usr/bin/perl
# For the details please see get_head.pl
sub copy_np_head
{
my $sent=$_[0];
my $vibh_home=$_[1];
my $src=$vibh_home . "/src";
require "$vibh_home/API/shakti_tree_api.pl";
require "$vibh_home/API/feature_filter.pl";
require "$src/get_head_np.pl";
&copy_head_np("NP",$sent,$vibh_home);
&copy_head_np("JJP",$sent,$vibh_home);
&copy_head_np("CCP",$sent,$vibh_home);
&copy_head_np("RBP",$sent,$vibh_home);
&copy_head_np("BLK",$sent,$vibh_home);
&copy_head_np("NEGP",$sent,$vibh_home);
&copy_head_np("FRAGP",$sent,$vibh_home);
&copy_head_np("NULL__CCP",$sent,$vibh_home);
&copy_head_np("NULL__NP",$sent,$vibh_home);
#&print_tree();
} #End of Sub
1;
#!/usr/bin/perl
#for details please check get_head.pl
sub copy_vg_head
{
my $sent=$_[0];
my $vibh_home = $_[1];
my $src=$vibh_home . "/src";
require "$vibh_home/API/shakti_tree_api.pl";
require "$vibh_home/API/feature_filter.pl";
require "$src/get_head_vg.pl";
&copy_head_vg("VGF",$sent,$vibh_home);
&copy_head_vg("VGNF",$sent,$vibh_home);
&copy_head_vg("VGINF",$sent,$vibh_home);
&copy_head_vg("VGNN",$sent,$vibh_home);
&copy_head_vg("NULL__VGNN",$sent,$vibh_home);
&copy_head_vg("NULL__VGF",$sent,$vibh_home);
&copy_head_vg("NULL__VGNF",$sent,$vibh_home);
}
1;
#!/usr/bin/perl
sub copy_head_np
{
my ($pos_tag)=$_[0]; #array which contains all the POS tags
my ($sent)=$_[1]; #array in which each line of input is stored
my $vibh_home = $_[2]; #stores the path
my %hash=();
if($pos_tag =~ /^NP/)
{
$match = "NN"; #Modified in version 1.4
#For NST
}
if($pos_tag =~ /^V/ )
{
$match = "V";
}
if($pos_tag =~ /^JJP/ )
{
$match = "J";
}
if($pos_tag =~ /^CCP/ )
{
$match = "CC";
}
if($pos_tag =~ /^RBP/ )
{
$match = "RB";
}
my @np_nodes = &get_nodes(3,$pos_tag,$sent);#gives the nodes at which each pos_tag tag matches(index of chunk start)
for($i=$#np_nodes;$i>=0;$i--)
{
my (@childs)=&get_children($np_nodes[$i],$sent);#gives the nodes(index) at which childs(words in a chunk) are found
$j = $#childs;
while($j >= 0)
{
#$f1=node id in decreasing order
#$f2=tokens(words) in dec order
#$f3=word tags
#$f4=feature structure
# print "$childs[$j]"."\n"; "--"."@sent"."\n";
my($f0,$f1,$f2,$f3,$f4)=&get_fields($childs[$j],$sent);
$word=$f2;
# print "--".$f4,"---\n";
$f4=~s/\//&sl/;
my ($x,$f4)=split(/</,$f4);
my ($f4,$x)=split(/>/,$f4);
$f4=~s/</&angO/;
$f4=~s/>/&angC/;
$f4="<".$f4.">";
# print "3 start head>>".$f4."<<\n";
my $fs_ref = &read_FS($f4);
# print "3 end head\n";
my @name_val = &get_values("name", $fs_ref);
#print "$word"."\n";
if($f3 eq "PRP") ##to make sure that the pronouns are identified correctly
{
$f3 = "NN";
}
if($f3 eq "WQ") ##to make sure that question words are treated like nouns
{
$f3 = "NN";
}
if($f3=~/^$match/)
{
if($hash{$f2} eq "")
{
$hash{$word}=1;
}
elsif($hash{$f2} ne "")
{
$hash{$word}=$hash{$word}+1;
}
$id=$hash{$word};
my ($x,$y)=split(/>/,$f4);
$x =~ s/ name=[^ >]+//;
if($id==1)
{
$att_val="$word";
}
elsif($id!=1)
{
$att_val="$word"."_"."$id";
}
#$new_fs = $x." head=\"$name_val[0]\">";
$new_fs = $x." head=$name_val[0]>";
#my $new_head_fs=$x." name=\"$att_val\">";
#&modify_field($childs[$j],4,$new_head_fs,$sent);
last;
}
elsif($j == 0)
{
my($f0,$f1,$f2,$f3,$f4)=&get_fields($childs[$#childs],$sent);
#-----------------modifications to handle PRP and PSP case------------------
$change=$#childs;
$f4=~s/\//&sl/;
my ($x,$f4)=split(/</,$f4);
my ($f4,$x)=split(/>/,$f4);
$f4=~s/</&angO/;
$f4=~s/>/&angC/;
$f4="<".$f4.">";
while(1)
{
if($f3 eq "PSP" or $f3 eq "PRP")
{
$change=$change-1;
if($childs[$change] eq "") ##Modifications per Version 1.3
{ ##To handle NP chunks with single PSP
$change=$change+1; ##
last; ##
}
($f0,$f1,$f2,$f3,$f4)=&get_fields($childs[$change],$sent);
}
else
{
last;
}
}
$new_fs = $f4;
$word=$f2;
my $fs_ref = &read_FS($f4);
my @name_val = &get_values("name", $fs_ref);
if($hash{$f2} eq "")
{
$hash{$word}=1;
}
elsif($hash{$f2} ne "")
{
$hash{$word}=$hash{$word}+1;
}
$id=$hash{$word};
#--------------------------------------------------------------------------------
my ($x,$y)=split(/>/,$f4);
$x =~ s/ name=[^ >]+//;
if($id==1)
{
$att_val="$word";
}
elsif($id!=1)
{
$att_val="$word"."_"."$id";
}
#$new_fs = $x." head=\"$name_val[0]\">";
$new_fs = $x." head=$name_val[0]>";
#my $new_head_fs=$x." name=\"$att_val\">";
#&modify_field($childs[$change],4,$new_head_fs,$sent);
}
$j--;
}
($f0,$f1,$f2,$f3,$f4) = &get_fields($np_nodes[$i],$sent);
if($f4 eq '')
{
##print "1check ---$new_fs\n";
&modify_field($np_nodes[$i],4,$new_fs,$sent);
($f0,$f1,$f2,$f3,$f4) = &get_fields($np_nodes[$i],$sent);
$fs_ptr = &read_FS($f4,$sent);
#print "---x--$x\n";
#&add_attr_val("name",$head_att_val,$fs_ptr,$sent);
($f0,$f1,$f2,$f3,$f4) = &get_fields($np_nodes[$i],$sent);
#print "2check ---$f4\n";
}
else
{
$fs_ptr = &read_FS($f4,$sent);
$new_fs_ptr = &read_FS($new_fs,$sent);
&merge($fs_ptr,$new_fs_ptr,$sent);
$fs_string = &make_string($fs_ptr);
&modify_field($np_nodes[$i],4,$fs_string,$sent);
($f0,$f1,$f2,$f3,$f4) = &get_fields($np_nodes[$i],$sent);
$fs_ptr = &read_FS($f4,$sent);
#&add_attr_val("name",$head_att_val,$fs_ptr,$sent);
#&modify_field($np_nodes[$i], 4, $head_att_val,$sent);
}
}
#print "hiii--\n"
#&print_tree();
#print "hiii\n";
}
1;
#!/usr/bin/perl
#&AddID($ARGV[0]);
sub copy_head_vg
{
my($pos_tag) = $_[0]; #array which contains all the POS tags
my($sent) = $_[1]; #array in which each line of input is stored
my $vibh_home = $_[2]; #stores the path
require "$vibh_home/API/shakti_tree_api.pl";
require "$vibh_home/API/feature_filter.pl";
my %hash=();
if($pos_tag =~ /^NP/)
{
$match = "N";
}
if($pos_tag =~ /^V/ )
{
$match = "V";
}
if($pos_tag =~ /^JJP/ )
{
$match = "J";
}
if($pos_tag =~ /^CCP/ )
{
$match = "CC";
}
if($pos_tag =~ /^RBP/ )
{
$match = "RB";
}
@np_nodes = &get_nodes(3,$pos_tag,$sent);
for($i=$#np_nodes; $i>=0; $i--)
{
my(@childs) = &get_children($np_nodes[$i],$sent);
$j = 0;
while($j <= $#childs)
{
#$f1=node id in decreasing order
#$f2=tokens(words) in dec order
#$f3=word tags
#$f4=feature structure
my($f0,$f1,$f2,$f3,$f4) = &get_fields($childs[$j],$sent);
$word=$f2;
$f4=~s/\//&sl/;
my ($x,$f4)=split(/</,$f4);
my ($f4,$x)=split(/>/,$f4);
$f4=~s/</&angO/;
$f4=~s/>/&angC/;
$f4="<".$f4.">";
if($f3 =~ /^$match/)
{
$new_fs = $f4;
my $fs_ref = &read_FS($f4); #feature structure is sent to function where all the categories are dealt
my @name_val = &get_values("name", $fs_ref);
if($hash{$f2} eq "")
{
$hash{$word}=1;
}
elsif($hash{$f2} ne "")
{
$hash{$word}=$hash{$word}+1;
}
$id=$hash{$word};
my ($x,$y)=split(/>/,$f4);
$x =~ s/ name=[^ >]+//;
if($id==1)
{
$att_val="$word";
}
elsif($id!=1)
{
$att_val="$word"."_"."$id";
}
#$new_fs = $x." head=\"$name_val[0]\">";
$new_fs = $x." head=$name_val[0]>";
#my $new_head_fs=$x." name=\"$att_val\">";
#&modify_field($childs[$j],4,$new_fs,$sent);
last;
}
elsif($j == 0)
{
my($f0,$f1,$f2,$f3,$f4) = &get_fields($childs[$#childs],$sent);
$word=$f2;
$f4=~s/\//&sl/;
my ($x,$f4)=split(/</,$f4);
my ($f4,$x)=split(/>/,$f4);
$f4=~s/</&angO/;
$f4=~s/>/&angC/;
$f4="<".$f4.">";
my $fs_ref = &read_FS($f4);
my @name_val = &get_values("name", $fs_ref);
if($hash{$f2} eq "")
{
$hash{$word}=1;
}
elsif($hash{$f2} ne "")
{
$hash{$word}=$hash{$word}+1;
}
$id=$hash{$word};
my ($x,$y)=split(/>/,$f4);
$x =~ s/ name=[^ >]+//;
if($id==1)
{
$att_val="$word";
}
elsif($id!=1)
{
$att_val="$word"."_"."$id";
}
#$new_fs = $x." head=\"$name_val[0]\">";
$new_fs = $x." head=$name_val[0]>";
#my $new_head_fs=$x." name=\"$att_val\">";
#&modify_field($childs[$#childs],4,$new_fs,$sent);
}
$j++;
}
($f0,$f1,$f2,$f3,$f4) = &get_fields($np_nodes[$i],$sent);
if($f4 eq '')
{
&modify_field($np_nodes[$i],4,$new_fs,$sent);
}
else
{
$fs_ptr = &read_FS($f4,$sent);
$new_fs_ptr = &read_FS($new_fs,$sent);
&merge($fs_ptr,$new_fs_ptr,$sent);
$fs_string = &make_string($fs_ptr,$sent);
&modify_field($np_nodes[$i],4,$fs_string,$sent);
}
}
}
1;
#!/usr/bin/perl
#use strict;
sub make_chunk_name
{
my($i, @leaves, $new_fs, @tree, $line, $string, $file, @lines, @string2, $string_ref1, $string1, $string_name);
$input = $_[0];
my %hash_index;
my %hash_chunk;
my @final_tree;
#&read_story($input);
my @tree = &get_children(0, $input);
my $ssf_string = &get_field($tree[0], 3, $input);
if($ssf_string eq "SSF")
{
@final_tree = &get_children(1, $input);
}
else
{
@final_tree = @tree;
}
my $k; my ($index, $count, $index_chunk) = (0, 0, 0);
@tree = &get_children($s,$input);
foreach $i(@final_tree)
{
$string = &get_field($i, 4,$input);
@leaves = &get_children($i,$input);
my $string_fs = &read_FS($string, $input);
foreach $m(@leaves)
{
$string1 = &get_field($m, 4,$input);
$string_fs1 = &read_FS($string1, $input);
$new_fs = &make_string($string_fs1, $input);
&modify_field($m, 4, $new_fs, $input);
}
}
foreach $i(@final_tree)
{
my $count_chunk=0;
$index_chunk++;
$string = &get_field($i, 4, $input);
$string_fs = &read_FS($string, $input);
my @old_value_name = &get_values("name", $string_fs, $input);
#print @old_value_name,"\n";
if($old_value_name[0]=~/\'/ or $old_drel[0]=~/\"/)
{
$old_value_name[0]=~s/\'//g;
$old_value_name[0]=~s/\"//g;
}
my @chunk = &get_field($i, 3, $input);
for ($ite1=1; $ite1<$index_chunk; $ite1++)
{
my $actual_chunk_name = $hash_chunk{$ite1};
my @chunk_name_split = split(/__/, $actual_chunk_name);
if($chunk_name_split[0] eq $chunk[0])
{
$count_chunk++;
}
}
my @chunk1;
if($count_chunk == 0)
{
$hash_chunk{$index_chunk} = "$chunk[0]"."__1";
$chunk1[0] = $chunk[0];
}
else
{
$new_count_chunk = $count_chunk+1;
$chunk1[0] = "$chunk[0]"."$new_count_chunk";
$hash_chunk{$index_chunk} = "$chunk[0]"."__$new_count_chunk";
}
foreach $m_drel(@final_tree)
{
my $string_child = &get_field($m_drel, 4, $input);
my $string_fs_child = &read_FS($string_child, $input);
my @old_drel = &get_values("drel", $string_fs_child, $input);
my @old_dmrel = &get_values("dmrel", $string_fs_child, $input);
my @old_reftype = &get_values("reftype", $string_fs_child, $input);
my @old_coref = &get_values("coref", $string_fs_child, $input);
#my @old_attr = &get_attributes($string_fs_child, $input);
if($old_drel[0]=~/\'/ or $old_drel[0]=~/\"/)
{
$old_drel[0]=~s/\'//g;
$old_drel[0]=~s/\"//g;
}
if($old_dmrel[0]=~/\'/ or $old_dmrel[0]=~/\"/)
{
$old_dmrel[0]=~s/\'//g;
$old_dmrel[0]=~s/\"//g;
}
if($old_reftype[0]=~/\'/ or $old_reftype[0]=~/\"/)
{
$old_reftype[0]=~s/\'//g;
$old_reftype[0]=~s/\"//g;
}
if($old_coref[0]=~/\'/ or $old_coref[0]=~/\"/)
{
$old_coref[0]=~s/\'//g;
$old_coref[0]=~s/\"//g;
}
my @old_drel_name = split(/:/, $old_drel[0]);
my @old_dmrel_name = split(/:/, $old_dmrel[0]);
my @old_reftype_name = split(/:/, $old_reftype[0]);
my @old_coref_name = split(/:/, $old_coref[0]);
if(($old_drel_name[1] eq $old_value_name[0]) && ($old_drel_name[1] ne ""))
{
my @new_drel;
$new_drel[0] = "$old_drel_name[0]:$chunk1[0]";
&del_attr_val("drel", $string_fs_child, $input);
# &add_attr_val("drel", \@new_drel, $string_fs_child, $input);
}
if(($old_dmrel_name[1] eq $old_value_name[0]) && ($old_dmrel_name[1] ne ""))
{
my @new_dmrel;
$new_dmrel[0] = "$old_dmrel_name[0]:$chunk1[0]";
&del_attr_val("dmrel", $string_fs_child, $input);
# &add_attr_val("dmrel", \@new_dmrel, $string_fs_child, $input);
}
if(($old_reftype_name[1] eq $old_value_name[0]) && ($old_reftype_name[1] ne ""))
{
my @new_reftype;
$new_reftype[0] = "$old_reftype_name[0]:$chunk1[0]";
&del_attr_val("reftype", $string_fs_child, $input);
# &add_attr_val("reftype", \@new_reftype, $string_fs_child, $input);
}
if(($old_coref_name[0] eq $old_value_name[0]) && ($old_coref_name[0] ne ""))
{
my @new_coref;
$new_coref[0] = $chunk1[0];
&del_attr_val("coref", $string_fs_child, $input);
# &add_attr_val("coref", \@new_coref, $string_fs_child, $input);
}
# my $name_attribute_chunk = &make_string($string_fs_child, $input);
# &modify_field($m_drel, 4, $name_attribute_chunk, $input);
}
&del_attr_val("name", $string_fs, $input);
# &add_attr_val("name", \@chunk1, $string_fs, $input);
# my $name_fs_chunk = &make_string($string_fs, $input);
# &modify_field($i, 4, $name_fs_chunk, $input);
my $string1 = &get_field($i, 4, $input);
my $attr = &read_FS($string1, $input);
#my @attribute_array = &get_attributes($attr, $input);
#$count=@attribute_array;
#print $count, "\n";
}
foreach $i(@final_tree)
{
$string = &get_field($i, 4, $input);
@leaves = &get_children($i, $input);
foreach $m(@leaves)
{
$count=0;
$index++;
$string2 = &get_field($m, 4, $input);
$string_fs2 = &read_FS($string2, $input);
my @token = &get_field($m, 2, $input);
for ($ite=1; $ite<$index; $ite++)
{
my $actual_name = $hash_index{$ite};
my @name_split = split(/__/, $actual_name);
if($name_split[0] eq $token[0])
{
$count++;
}
}
if($count == 0)
{
my @token1;
$token1[0] = $token[0];
&del_attr_val("name", $string_fs2, $input);
&add_attr_val("name", \@token1, $string_fs2, $input);
my $name_fs = &make_string($string_fs2, $input);
&modify_field($m, 4, $name_fs,$input);
$hash_index{$index} = "$token[0]"."__1";
}
else
{
$new_count = $count+1;
my @new_token = "$token[0]"."$new_count";
&del_attr_val("name", $string_fs2, $input);
&add_attr_val("name", \@new_token, $string_fs2,$input);
my $name_fs = &make_string($string_fs2,$input);
&modify_field($m, 4, $name_fs, $input);
$hash_index{$index} = "$token[0]"."__$new_count";
}
}
}
}
1;
<Sentence id="11">
1 (( NP
1.1 hamalAvara NN <fs af='hamalAvara,n,m,pl,3,d,0,0' posn='10' name='हमलावर'>
))
2 (( NP
2.1 galI NN <fs af='galI,n,f,sg,3,o,0,0' posn='20' name='गली'>
2.2 meM PSP <fs af='meM,psp,,,,,,' posn='30' name='में'>
))
3 (( NP
3.1 motarasAikileM NN <fs af='motarasAikila,n,f,pl,3,d,0,0' posn='40' name='मोटरसाइकिलें'>
))
4 (( JJP
4.1 KadZI JJ <fs af='KadZA,adj,f,pl,,,,' posn='50' name='खड़ी'>
))
5 (( VGNF
5.1 kara VM <fs af='kara,v,any,any,any,,0,0' posn='60' name='कर'>
))
6 (( NP
6.1 raNabIra NN <fs af='raNabIra,n,m,sg,3,o,0,0' posn='70' name='रणबीर'>
))
7 (( NP
7.1 ( SYM <fs af=',punc,,,,,,' posn='80' name='('>
7.2 48 QC <fs af='48,num,any,any,,any,,' posn='90' name='४८'>
7.3 ) SYM <fs af=',punc,,,,,,' posn='100' name=')'>
))
8 (( FRAGP
8.1 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='110' name='के'>
))
9 (( NP
9.1 Gara NN <fs af='Gara,n,m,sg,3,o,0,0' posn='120' name='घर'>
9.2 meM PSP <fs af='meM,psp,,,,,,' posn='130' name='में2'>
))
10 (( VGF
10.1 Gusa VM <fs af='Gusa,v,any,any,any,,0,0' posn='140' name='घुस'>
10.2 gae VAUX <fs af='jA,v,m,pl,any,,yA1,yA1' posn='150' name='गए'>
))
11 (( BLK
11.1 . SYM <fs af='।,punc,,,,,,' posn='160' name='।'>
))
</Sentence>
<Sentence id="2">
1 (( NP
1.1 unhoMne PRP <fs af='vaha,pn,any,sg,3h,o,ne,ne' posn='10' name='उन्होंने'>
))
2 (( VGF
2.1 kahA VM <fs af='kaha,v,m,sg,any,,yA,yA' posn='20' name='कहा'>
))
3 (( CCP
3.1 ki CC <fs af='ki,avy,,,,,,' posn='30' name='कि'>
))
4 (( CCP
4.1 agara CC <fs af='agara,avy,,,,,,' posn='40' name='अगर'>
))
5 (( NP
5.1 sarakAra NN <fs af='sarakAra,n,f,sg,3,d,0,0' posn='50' name='सरकार'>
))
6 (( NP
6.1 eka QC <fs af='eka,num,any,any,,any,,' posn='60' name='एक'>
6.2 alaga JJ <fs af='alaga,adj,any,any,,o,,' posn='70' name='अलग'>
6.3 rAjya NN <fs af='rAjya,n,m,sg,3,o,0,0' posn='80' name='राज्य'>
))
7 (( NP
7.1 ( SYM <fs af='(,punc,,,,,,' posn='90' name='('>
7.2 goraKAlEMda NN <fs af='goraKAlEMda,n,m,sg,3,o,0,0' posn='100' name='गोरखालैंड'>
7.3 ) SYM <fs af='),punc,,,,,,' posn='110' name=')'>
))
8 (( FRAGP
8.1 kI PSP <fs af='kA,psp,f,sg,,o,,' posn='120' name='की'>
))
9 (( NP
9.1 unakI PRP <fs af='vaha,pn,f,sg,3h,o,kA,kA' posn='130' name='उनकी'>
))
10 (( NP
10.1 asalI JJ <fs af='asalI,adj,any,any,,o,,' posn='140' name='असली'>
10.2 mAMga NN <fs af='mAMga,n,f,sg,3,o,0,0' posn='150' name='मांग'>
10.3 ko PSP <fs af='ko,psp,,,,,,' posn='160' name='को'>
))
11 (( VGNN
11.1 mAnane VM <fs af='mAna,v,any,any,any,o,nA,nA' posn='170' name='मानने'>
11.2 ko PSP <fs af='ko,psp,,,,,,' posn='180' name='को2'>
))
12 (( JJP
12.1 wEyAra JJ <fs af='wEyAra,adj,any,any,,,,' posn='190' name='तैयार'>
))
13 (( VGF
13.1 nahIM NEG <fs af='nahIM,avy,,,,,,' posn='200' name='नहीं'>
13.2 hE VM <fs af='hE,v,any,sg,3,,hE,hE' posn='210' name='है'>
))
14 (( CCP
14.1 wo CC <fs af='wo,avy,,,,,,' posn='220' name='तो'>
))
15 (( RBP
15.1 Pira RB <fs af='Pira,adv,,,,,,' posn='230' name='फिर'>
))
16 (( NP
16.1 unheM PRP <fs af='vaha,pn,any,sg,3h,o,ko,ko' posn='240' name='उन्हें'>
))
17 (( NP
17.1 parvawIya JJ <fs af='parvawIya,adj,any,any,,o,,' posn='250' name='पर्वतीय'>
17.2 kRewra NN <fs af='kRewra,n,m,sg,3,o,0,0' posn='260' name='क्षेत्र'>
))
18 (( NP
18.1 xArjiliMga NN <fs af='xArjiliMga,n,m,sg,3,o,0,0' posn='270' name='दार्जिलिंग'>
18.2 ko PSP <fs af='ko,psp,,,,,,' posn='280' name='को3'>
))
19 (( NP
19.1 janajAwIya JJ <fs af='janajAwIya,adj,any,any,,d,,' posn='290' name='जनजातीय'>
19.2 xarjA NN <fs af='xarjA,n,m,sg,3,d,0,0' posn='300' name='दर्जा'>
))
20 (( VGF
20.1 xenA VM <fs af='xe,v,m,sg,any,,nA,nA' posn='310' name='देना'>
20.2 hogA VAUX <fs af='ho,v,m,sg,3,,gA,gA' posn='320' name='होगा'>
))
21 (( BLK
21.1 . SYM <fs af='।,punc,,,,,,' posn='330' name='।'>
))
</Sentence>
<Sentence id="6">
1 (( NP
1.1 GIsiMga NN <fs af='GIsiMga,n,m,sg,3,o,0,0' posn='10' name='घीसिंग'>
1.2 ne PSP <fs af='ne,psp,,,,,,' posn='20' name='ने'>
))
2 (( NP
2.1 saMviXAna NN <fs af='saMviXAna,n,m,sg,3,o,0,0' posn='30' name='संविधान'>
2.2 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='40' name='के'>
))
3 (( NP
3.1 anucCexa NN <fs af='anucCexa,n,m,sg,3,d,0,0' posn='50' name='अनुच्छेद'>
3.2 371 NN <fs af='371,n,m,sg,3,o,0,0' posn='60' name='३७१'>
))
4 (( NP
4.1 ( SYM <fs af='(,punc,,,,,,' posn='70' name='('>
4.2 je NN <fs af='je,n,m,sg,3,o,0,0' posn='80' name='जे'>
4.3 ) SYM <fs af='),punc,,,,,,' posn='90' name=')'>
))
5 (( FRAGP
5.1 kA PSP <fs af='kA,psp,m,sg,,d,,' posn='100' name='का'>
))
6 (( NP
6.1 viroXa NN <fs af='viroXa,n,m,sg,3,d,0,0' posn='110' name='विरोध'>
))
7 (( VGF
7.1 kiyA VM <fs af='kara,v,m,sg,any,,yA,yA' posn='120' name='किया'>
))
8 (( BLK
8.1 . SYM <fs af='।,punc,,,,,,' posn='130' name='।'>
))
</Sentence>
<Sentence id="1">
2 (( NP <fs af='viswAra,n,m,sg,,o,,' name="PS">
2.1 viswAra NN <fs af='viswAra,n,m,sg,,o,,'>
2.2 ke PSP <fs af='kA,n,m,sg,,d,7,'>
2.3 sAWa NST <fs af='sAWa,adv,,,,,,'>
2.4 hI RP <fs af='hI,adv,,,,,,'>
))
</Sentence>
<Sentence id="2">
1 (( NP
1.1 lAla JJ <fs af='lAla,adj,any,any,,,any,'>
1.2 ilA NN <fs af='ilA,n,m,sg,,o,,'>
1.3 ke PSP <fs af='kA,n,m,sg,,d,7,'>
1.4 pAsa NST <fs af='pAsa,adv,,,,,,'>
1.5 sWiwa JJ <fs af='sWiwa,n,m,sg,,o,,'>
1.6 cAzxanI NN <fs af='cAzxanI,n,f,sg,,o,,'>
))
</Sentence>
<Sentence id="3">
1 (( NP
1.1 cAroM QC <fs af='cAra,num,m,pl,3,d,,'>
1.2 ora NST <fs af='ora,nst,f,,3,d,,'>
))
</Sentence>
<Sentence id="4">
1 (( NP <fs af='Age,nst,m,,3,d,,'>
1.1 usa PRP <fs af='vaha,pn,any,sg,3,o,ke,'>
1.2	se	PSP	<fs af='se,psp,any,sg,3,o,ke,'>
1.3	Age	NST	<fs af='Age,nst,m,,3,d,,'>
))
</Sentence>
<Sentence id="1">
1 (( NP <fs af='viswAra,n,m,sg,,o,,' head=viswAra name="PS">
1.1 viswAra NN <fs af='viswAra,n,m,sg,,o,,' name=viswAra>
1.2 ke PSP <fs af='kA,n,m,sg,,d,7,' name=ke>
1.3 sAWa NST <fs af='sAWa,adv,,,,,,' name=sAWa>
1.4 hI RP <fs af='hI,adv,,,,,,' name=hI>
))
</Sentence>
<Sentence id="2">
1 (( NP <fs af='cAzxanI,n,f,sg,,o,,' head=cAzxanI>
1.1 lAla JJ <fs af='lAla,adj,any,any,,,any,' name=lAla>
1.2 ilA NN <fs af='ilA,n,m,sg,,o,,' name=ilA>
1.3 ke PSP <fs af='kA,n,m,sg,,d,7,' name=ke>
1.4 pAsa NST <fs af='pAsa,adv,,,,,,' name=pAsa>
1.5 sWiwa JJ <fs af='sWiwa,n,m,sg,,o,,' name=sWiwa>
1.6 cAzxanI NN <fs af='cAzxanI,n,f,sg,,o,,' name=cAzxanI>
))
</Sentence>
<Sentence id="3">
1 (( NP <fs af='ora,nst,f,,3,d,,' head=ora>
1.1 cAroM QC <fs af='cAra,num,m,pl,3,d,,' name=cAroM>
1.2 ora NST <fs af='ora,nst,f,,3,d,,' name=ora>
))
</Sentence>
<Sentence id="4">
1 (( NP <fs af='vaha,pn,any,sg,3,o,ke,' head=usa>
1.1 usa PRP <fs af='vaha,pn,any,sg,3,o,ke,' name=usa>
1.2 se PSP <fs af='se,psp,any,sg,3,o,ke,' name=se>
1.3 Age NST <fs af='Age,nst,m,,3,d,,' name=Age>
))
</Sentence>
<Sentence id="1">
1 (( NP
1.1 BArawIya JJ <fs af='BArawIya,n,m,sg,,o,,'>
1.2 saMskqwi NN <fs af='saMskqwi,n,f,sg,,o,,'>
1.3 meM PSP <fs af='meM,pn,,,,,,'>
))
2 (( NP
2.1 parvoM NN <fs af='parva,n,m,pl,,d,,'>
2.2 kA PSP <fs af='kA,n,m,sg,,o,7,'>
))
3 (( NP
3.1 viSeRa JJ <fs af='viSeRa,n,m,sg,,o,,'>
3.2 sWAna NN <fs af='sWAna,n,m,sg,,o,,'>
))
4 (( VGF
4.1 hE VAUX <fs af='hE,v,any,sg,2,,,hE'>
4.2	.	SYM	<fs af='।,punc,,,,,,'>
))
</Sentence>
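The parse trees above are in Shakti Standard Format (SSF): each token line carries an address, the token, its POS tag, and a feature structure whose `af` attribute packs lemma, category, gender, number, person, case, vibhakti, and TAM in a fixed comma-separated order. A minimal sketch of reading one such token line (the function name and field keys are illustrative, not part of the pipeline):

```python
import re

# Parse one SSF token line of the form:
#   "1.1 viswAra NN <fs af='viswAra,n,m,sg,,o,,'>"
# The af='' attribute order (lemma,cat,gend,num,pers,case,vib,tam)
# follows the usual SSF convention, assumed from the data above.
def parse_ssf_token(line):
    addr, token, pos, fs = line.strip().split(None, 3)
    m = re.search(r"af='([^']*)'", fs)
    af_values = m.group(1).split(',') if m else []
    keys = ['lemma', 'cat', 'gend', 'num', 'pers', 'case', 'vib', 'tam']
    return {'addr': addr, 'token': token, 'pos': pos,
            'af': dict(zip(keys, af_values))}

sample = "1.1 viswAra NN <fs af='viswAra,n,m,sg,,o,,'>"
info = parse_ssf_token(sample)
print(info['token'], info['pos'], info['af']['lemma'])  # viswAra NN viswAra
```

Chunk boundary lines (`(( NP` / `))`) and `<Sentence>` tags would be handled separately by a full SSF reader; this sketch only covers the leaf token lines.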
# Run the Kannada morphological analyzer on the input file ($1) and print its output
$setu/bin/sl/morph/kan/morph.out --logfilepath morph.log --pdgmfilepath $setu/data_bin/sl/morph/kan/ --uwordpath $setu/data_bin/sl/morph/kan/dict_final --dictfilepath $setu/data_bin/sl/morph/kan/dict/ -ULDWH --inputfile $1 --outputfile morphoutput
cat morphoutput