The examples below illustrate the rules written for the Hindi simple parser. Follow these examples to write rules for other languages.

Rule 1:
VGF	vib=X	NP	vib=ne	drel=k1	dep=X	mult=1	cost=5	verb=vfn

((raama ne))_k1 khaanaa khaayaa.

VGF is the parent chunk.
Its constraint is that its TAM can be anything, as denoted by 'X'.

NP is the child chunk. Its constraint is that the vibhakti (or case marker) should be 'ne'. The drel is k1.
dep=X means that the rule does not depend on the presence of any other dependency relation.
mult=1 means that its multiplicity is one within the sentential clause.
cost=5 means that, among all rules, this rule has the highest priority. Priority is given from 1 to 5. If cost=0, then there is no priority attached to the rule.
verb=vfn means that the child chunk attaches itself to the finite verb (VGF) in the sentential clause.
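
The rule line above can be read as nine tab-separated fields. The following is a minimal Python sketch of splitting such a line into named parts. The function name parse_rule and the dictionary keys for the first four fields are illustrative assumptions, not part of the parser itself.

```python
def parse_rule(line):
    """Split one tab-separated rule line into a dictionary (illustrative only)."""
    fields = line.rstrip().split("\t")
    parent, parent_cons, child, child_cons = fields[:4]
    rule = {
        "parent": parent,                    # parent chunk tag, e.g. VGF
        "parent_constraints": parent_cons,   # e.g. vib=X
        "child": child,                      # child chunk tag, e.g. NP
        "child_constraints": child_cons,     # e.g. vib=ne
    }
    # The remaining fields are key=value pairs: drel, dep, mult, cost, verb.
    # Splitting on the first '=' only keeps a value such as '>1' intact.
    for field in fields[4:]:
        key, _, value = field.partition("=")
        rule[key] = value
    return rule


rule1 = parse_rule("VGF\tvib=X\tNP\tvib=ne\tdrel=k1\tdep=X\tmult=1\tcost=5\tverb=vfn")
print(rule1["drel"], rule1["cost"], rule1["verb"])
```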

Rule 2:
VGF	vib=tam__ko	NP	vib=ko	drel=k1	dep=X	mult=1	cost=5	verb=vfn 

((raama ko))_k1 kaama karanaa padaa.

The finite verb's TAM agrees with the chunk that has the 'ko' vibhakti.
vib=tam__ko means that the TAM of the parent verb must be present as a key in the list called 'tam', with the value 'ko'. For example:
nA_pada+yA	ko
nA_cAhe+yA_WA	ko
........
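
A sketch of how a constraint such as vib=tam__ko could be checked, assuming the list is stored as tab-separated key/value lines like the two entries shown above. The function names load_list and tam_matches are hypothetical, and 'wA_hE' is used only as an example of a TAM that is not in the list.

```python
def load_list(lines):
    """Read 'key<TAB>value' lines into a dictionary."""
    table = {}
    for line in lines:
        parts = line.split("\t")
        if len(parts) == 2:
            table[parts[0].strip()] = parts[1].strip()
    return table


def tam_matches(constraint, verb_tam, lists):
    """Check a constraint of the form 'vib=<list>__<value>' against a verb's TAM."""
    _, _, spec = constraint.partition("=")
    list_name, _, wanted = spec.partition("__")
    # The TAM must be a key of the named list and map to the wanted value.
    return lists.get(list_name, {}).get(verb_tam) == wanted


lists = {"tam": load_list(["nA_pada+yA\tko", "nA_cAhe+yA_WA\tko"])}
print(tam_matches("vib=tam__ko", "nA_pada+yA", lists))  # True
print(tam_matches("vib=tam__ko", "wA_hE", lists))       # False
```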

Rule 3:
VGF	vib=X	NP	vib=0&&list=nom__pronoun	drel=k1	dep=X	mult=1	cost=4	verb=vfn

((mein))_k1 khaanaa khaa rahaa huun.

All the pronouns in nominative (vib=0) should be k1.

Note:
If there is more than one constraint for a parent or child chunk, separate the constraints with '&&'.
eg: vib=0&&list=nom__pronoun
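
A rough sketch of evaluating such a constraint string: every '&&'-separated part must hold, 'X' matches anything, and '|' separates alternative values (as in vib=se|xvArA|kA_xvArA). The chunk representation (a dictionary with 'head' and 'vib' keys), the function name and the contents of the 'nom' list are assumptions made for illustration. List-backed vib values such as tam__ko are left out for brevity.

```python
def check_constraints(constraints, chunk, lists):
    """Return True if the chunk satisfies every '&&'-separated constraint."""
    for part in constraints.split("&&"):
        key, _, value = part.partition("=")
        if key == "vib":
            # 'X' matches anything; '|' separates alternatives.
            if value != "X" and chunk["vib"] not in value.split("|"):
                return False
        elif key == "list":
            # 'name__value': the head word must be a key of list 'name'
            # with that value; a bare 'name' only requires membership.
            name, sep, wanted = value.partition("__")
            entries = lists.get(name, {})
            if chunk["head"] not in entries:
                return False
            if sep and entries[chunk["head"]] != wanted:
                return False
    return True


lists = {"nom": {"mein": "pronoun"}}  # hypothetical list content
pronoun = {"head": "mein", "vib": "0"}
noun = {"head": "raama", "vib": "0"}
print(check_constraints("vib=0&&list=nom__pronoun", pronoun, lists))  # True
print(check_constraints("vib=0&&list=nom__pronoun", noun, lists))     # False
```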

Rule 4:
VGF	vib=X	NP	vib=0	drel=k1	dep=X	mult=1	cost=1	verb=vfn

((raama))_k1 khaanaa khaa rahaa hei.
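
A nominative pronoun satisfies both Rule 3 (cost=4) and Rule 4 (cost=1). The sketch below shows one way the cost field could be used to choose between matching rules. It assumes that a higher cost wins and that cost=0 rules rank lowest. The function name and the rule dictionaries are hypothetical.

```python
def pick_rule(matching_rules):
    """Return the matching rule with the highest cost (priority)."""
    return max(matching_rules, key=lambda rule: rule["cost"])


rule3 = {"name": "Rule 3", "drel": "k1", "cost": 4}
rule4 = {"name": "Rule 4", "drel": "k1", "cost": 1}
print(pick_rule([rule4, rule3])["name"])  # Rule 3
```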

Rule 5:
VG	list=k2__trans	NP	vib=ko	drel=k2	dep=k1	mult=1	cost=0	verb=vnfn

((raama))_k1 ((phala ko))_k2 khaa rahaa hei.

VG means that the parent chunk can be VGF, VGNF, VGNN, etc., but the verb closest to the child chunk will be the parent. That verb should be a transitive verb. Hence, the verb should be present in the list 'k2', which is a list of transitive verbs.
In the list, each verb is a key and its value is 'trans'. Hence list=k2__trans in the parent constraints.

dep=k1 means that k2 will be identified only if k1 is present in the sentential clause.
verb=vnfn means that the child can attach itself to a non-finite or finite verb, depending on which verb is closest to the child chunk.

eg:
raama phala ko khaate hue skuula gayaa.

'phala ko' attaches to 'khaate hue' which is a non-finite verb.
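
The difference between verb=vfn and verb=vnfn can be sketched as follows. The sketch assumes that "closest" means the first verb chunk to the right of the child (Hindi being verb-final) and that chunks are given as (tag, text) pairs. Both are assumptions, and the names VERB_TAGS and find_parent are hypothetical.

```python
# Verb chunk tags each mode may attach to (assumed from the description above).
VERB_TAGS = {
    "vfn": {"VGF"},
    "vnfn": {"VGF", "VGNF", "VGNN", "VGINF"},
}


def find_parent(chunks, child_index, verb_mode):
    """Return the index of the first allowed verb chunk after the child, or None."""
    allowed = VERB_TAGS[verb_mode]
    for index in range(child_index + 1, len(chunks)):
        if chunks[index][0] in allowed:
            return index
    return None


chunks = [
    ("NP", "raama"),
    ("NP", "phala ko"),
    ("VGNF", "khaate hue"),
    ("NP", "skuula"),
    ("VGF", "gayaa"),
]
print(chunks[find_parent(chunks, 1, "vnfn")][1])  # khaate hue
print(chunks[find_parent(chunks, 1, "vfn")][1])   # gayaa
```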

Rule 6:
VG	vib=tam__p_1&&list=k2__trans	NP	vib=0	drel=k2	dep=X	mult=1	cost=0	verb=vnfn

raama ke dvaaraa ((kaama))_k2 kiya gayaa.

k2 is the chunk with '0' vibhakti when the TAM of the parent verb is passive. Here, yA_gayA is a passive TAM. It is present in the TAM list as a key with the value p_1. Also, the parent verb should be a transitive verb.

Rule 7:
VG	vib=X	NP	vib=se|xvArA|kA_xvArA	drel=k3	dep=X	mult=1	cost=0	verb=vnfn  

raama ((caakuu se))_k3 phala kaaTataa hei.

A chunk with 'se', 'xvArA' or 'kA_xvArA' vibhakti is k3. This is the default rule for the k3 relation.

Rule 8:
VG	list=k4__ko	NP	vib=ko	drel=k4	dep=X	mult=1	cost=0	verb=vnfn

raama ne ((mohana ko))_k4 kitaaba dii.

Chunk with 'ko' vibhakti with its parent verb involving a recipient/beneficiary is k4. The list of recipient/beneficiary verbs is stored in 'k4' as keys with values 'ko'.

Rule 9:
VG	list=k4__se	NP	vib=se	drel=k4	dep=X	mult=1	cost=0	verb=vnfn  

raama ne ((mohana se))_k4 baata kahii.

Chunk with 'se' vibhakti with its parent verb involving a recipient/beneficiary is k4. The list of recipient/beneficiary verbs is stored in list 'k4' as keys with values 'se'.

Rule 10:
VG	vib=X	NP|VGNF|VGNN|VGINF	vib=kA_lie	drel=rt	dep=X	mult=1	cost=0	verb=vnfn  

raama pustaka ((khariidane ke lie))_rt baazaara gayaa.

Any NP, VGNF, VGNN or VGINF chunk with 'ke lie' vibhakti is 'rt'.

Note:
Morph merges vibhaktis kA/ke/kI into kA. Hence ke_lie becomes kA_lie.
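
When writing rules, the surface form of a vibhakti therefore has to be converted to the merged form. A small sketch of that mapping, assuming only the first component of an underscore-joined vibhakti is affected (the function name is hypothetical):

```python
def normalize_vibhakti(vib):
    """Map a leading kA/ke/kI to kA, as the morph output does."""
    parts = vib.split("_")
    if parts[0] in ("kA", "ke", "kI"):
        parts[0] = "kA"
    return "_".join(parts)


print(normalize_vibhakti("ke_lie"))  # kA_lie
print(normalize_vibhakti("kI_ora"))  # kA_ora
print(normalize_vibhakti("se"))      # se
```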

Rule 11:
VGF	vib=X	NP	vib=kA_pAsa|kA_sAmane	drel=k7p	dep=X	mult=>1	cost=0	verb=vfn  

raama ((pustakaalaya ke paas))_k7p khadaa hei.

Multiplicity is greater than 1 because k7p can occur more than one time within a sentential clause.
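
The effect of the mult field can be sketched as a simple check before a relation is assigned. This assumes the parser keeps a list of the relations already assigned in the current clause. The function name and that list are hypothetical.

```python
def multiplicity_allows(mult, drel, assigned):
    """Return True if relation 'drel' may be assigned once more in this clause."""
    if mult == ">1":
        # The relation may occur any number of times.
        return True
    # Otherwise 'mult' is the maximum number of occurrences.
    return assigned.count(drel) < int(mult)


assigned = ["k1", "k7p"]
print(multiplicity_allows(">1", "k7p", assigned))  # True
print(multiplicity_allows("1", "k1", assigned))    # False
print(multiplicity_allows("1", "k2", assigned))    # True
```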

Rule 12:
NP	vib=X	NP|JJP|VGNF|VGNN|VGINF	vib=vAlA|vAlI	drel=nmod	dep=X	mult=1	cost=0	verb=vnfn

raama ne ((neele ranga vaalii))_nmod shirt khariidii.
raama ko ((khaane vaalii))_nmod cheezein bahuta pasanda hein.

Rule 13:
NP	vib=X	NP|JJP|VGNF|VGNN|VGINF	vib=kA	drel=r6	dep=X	mult=>1	cost=0	verb=vnfn  

((raama kaa))_r6 bhaai mohana ghara chalaa gayaa.

Multiplicity is greater than 1 because r6 can occur more than one time within a sentential clause.

Rule 14:
VGF	vib=X	NP	vib=kA_pahale	drel=k7t	dep=X	mult=>1	cost=0	verb=vfn  

raama ((raviivaara ke pahale))_k7t hii ghara chalaa gaya.

Multiplicity is greater than 1 because k7t can occur more than one time within a sentential clause.

Rule 15:
VG	vib=X	NP	vib=sAWa|kA_sAWa	drel=ras	dep=X	mult=1	cost=0	verb=vnfn  

raama ((mohana ke saatha))_ras skuula gayaa.

Rule 16:
VGF	vib=X	NP|VGNF|VGNN|VGINF	vib=para|meM	drel=k7	dep=X	mult=>1	cost=0	verb=vfn  

raama ((skuula main))_k7 padhataa hei.

Multiplicity is greater than 1 because k7 can occur more than one time within a sentential clause.

Rule 17:
VGF	vib=X	NP	list=place	drel=k7p	dep=X	mult=>1	cost=0	verb=vfn

raama ((upara))_k7p rahataa hei.

A list of generic location nouns exists.

Rule 18:
VGF	vib=X	NP	list=time	drel=k7t	dep=X	mult=>1	cost=0	verb=vfn  

raama ((subaha))_k7t skuula jaataa hei.

A list of generic temporal nouns exists.

Rule 19:
VG	vib=X	RBP	vib=NULL|se	drel=adv	dep=X	mult=>1	cost=0	verb=vnfn  

raama ((tezi se))_adv skuula gayaa.

Adverbial chunk with 'se' or 'NULL' vibhakti is adv.
Note:
If '0' vibhakti is present instead of NULL, then make the change in the rule file accordingly.

Rule 20:
VG	vib=X	NP	vib=ora|kA_waraPZa|kA_ora	drel=rd	dep=X	mult=>1	cost=0	verb=vnfn  

raama ne ((ghara kii ora))_rd dekhaa.

Rule 21:
VG	vib=X	NP|VGNF|VGNN|VGINF	vib=kA_kAraNa|kA_vajaha|kA_kArana_se|kA_vajaha_se	drel=rh	dep=X	mult=>1	cost=0	verb=vnfn

raama ((baarisha ke kaarana))_rh ghara se baahara nahi nikala paayaa.

Rule 22:
VG	vib=X	VGNF|VGINF	vib=kara|wA_huA	drel=vmod	dep=X	mult=>1	cost=0	verb=vnfn 

raama phala ((khaata huaa))_vmod skuula chalaa gayaa.

Non-finite verb chunk with 'kara' or 'wA_huA' TAM is vmod.
Note:
For noun chunks and verb chunks, case markers and TAM are stored in the 7th field of the feature structure. The API extracts them using the keyword 'vib'. The parser reads this keyword from the parent and child constraint fields of the rule file. Hence, use the keyword 'vib' in the rule file to refer to the seventh field.
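
As an illustration only, the lookup of the seventh field might look like the sketch below. It assumes the feature structure is available as a comma-separated string of fields. The sample string and the function name get_vib are hypothetical and do not show the actual API.

```python
def get_vib(feature_fields):
    """Return the 7th comma-separated field (case marker or TAM), or None."""
    fields = feature_fields.split(",")
    return fields[6] if len(fields) > 6 else None


# Hypothetical feature string for the chunk 'raama ne'.
print(get_vib("raama,n,m,sg,3,o,ne,0"))  # ne
```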
 

Rule 23:
VG	list=k2__trans	NP	vib=0	drel=k2	dep=k1	mult=1	cost=0	verb=vnfn

((raama))_k1 ((phala))_k2 khaa rahaa hei.

Noun chunk with '0' vibhakti is k2. Also, for k2 to be identified, k1 should already be marked.
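
The dep field can be pictured as a precondition on the relations already marked in the clause. A minimal sketch, assuming those relations are tracked in a set (the function name and the set are hypothetical):

```python
def dependency_satisfied(dep, assigned):
    """'X' means no dependency; otherwise 'dep' must already be marked."""
    return dep == "X" or dep in assigned


print(dependency_satisfied("k1", {"k1"}))  # True: k2 can now be identified
print(dependency_satisfied("k1", set()))   # False: k1 has not been marked yet
print(dependency_satisfied("X", set()))    # True
```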

Rule 24:
VGF	vib=tam__hE	NP|JJP	vib=0	drel=k1s	dep=k1	mult=1	cost=0	verb=vfn  

raama ((doctor))_k1s hei.

Noun chunk (or adjective chunk, JJP) with '0' vibhakti is k1s. The TAM of the parent verb should be hE. Also, for k1s to be identified, k1 should already be marked.

Rule 25:
VG	vib=X	NP	vib=se&&list=place	drel=k5	dep=X	mult=1	cost=0	verb=vnfn  

raama ((yahaan se))_k5 chalaa gaya.

All generic location nouns with 'se' vibhakti are k5.

Rule 26:
VG	list=k5__se	NP	vib=se	drel=k5	dep=X	mult=1	cost=0	verb=vnfn  

pattaa ((peda se))_k5 giraa.

The parent verb should belong to a list of motion verbs. 'giranaa' is a motion verb. It is present in the list 'k5' which is the list of motion verbs. The noun chunk should have 'se' vibhakti.