Channel: Adobe Community : Unanswered Discussions - Acrobat

How can I find words that span from the end of one line to the next line in a PDF?


I am using Adobe Acrobat X Pro for our form development and maintenance. I am writing an Acrobat JavaScript batch script that reads through all the words, runs a spell check, and reports the misspelled words in an Excel sheet. Because I run this script in batch mode over more than 1000 PDFs, I am getting many words joined together. When I looked into those PDFs, the words look fine on the page: one word sits at the end of a line at the right margin and the next word starts the following line. Since there is no space between them, they are extracted as a single word, and the spell check fails on it.

 

I used wordf = this.getPageNthWordQuads(i,j) to get the coordinates where the word begins and ends. When I look closely, the values describe a single rectangle that does not span across lines. I compared the coordinates for a regular word and for a word that spans two lines, and they look the same in both cases.
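One way to test this, sketched under assumptions: the Acrobat JavaScript API reference describes getPageNthWordQuads as returning a list of quads, each quad being eight numbers (four x,y corner pairs). If a word joined across a line break comes back as one quad per line fragment, the vertical position of the quads would reveal it. Whether Acrobat actually returns more than one quad for these joined words is an assumption to verify on a few known cases. The helper name quadsSpanLines and its tolerance parameter are hypothetical.

```javascript
// Hypothetical helper: decide whether a word's quads cover more than one text line.
// `quads` is assumed to be an array of quads, each an array of 8 numbers
// (four x,y corner pairs), in the shape returned by getPageNthWordQuads.
function quadsSpanLines(quads, tolerance) {
    if (!quads || quads.length === 0) {
        return false;
    }
    // Allowed vertical difference (in points) before two quads count as separate lines.
    var tol = (typeof tolerance === "number") ? tolerance : 1;

    // Vertical center of a quad: average of the four y values (odd indices).
    function centerY(q) {
        return (Number(q[1]) + Number(q[3]) + Number(q[5]) + Number(q[7])) / 4;
    }

    var first = centerY(quads[0]);
    for (var k = 1; k < quads.length; k++) {
        if (Math.abs(centerY(quads[k]) - first) > tol) {
            return true; // a later quad sits at a different height on the page
        }
    }
    return false;
}

// Inside Acrobat the call would look like:
//   var spans = quadsSpanLines(this.getPageNthWordQuads(i, j));
```

Averaging the y values avoids depending on the order in which the four corners are listed. Comparing st[1] with st[3] on the flattened string only looks at two corners of the first quad, so it would not notice a second quad on the next line.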

 

I think I am screwed: I have 8000 such words and no idea how to separate them from the actual misspelled words.

 

Please help. Let me know if there is any class or method I can call that will give me the end of a line, or whether I need to go down to a lower layer to find this split.

 

The addAnnot method somehow marks the words using these coordinates. Please help me understand how this works. Thanks.
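As far as the Acrobat JavaScript API reference describes it, text markup annotations (Highlight, Underline, StrikeOut, Squiggly) take a quads property, and the value returned by getPageNthWordQuads can be passed to it directly, with each quad marking one rectangle. A minimal sketch, assuming that behavior: makeHighlightProps is a hypothetical helper name, and the addAnnot call is left as a comment because it only runs inside Acrobat.

```javascript
// Hypothetical helper: build the property object for a text markup annotation.
function makeHighlightProps(pageIndex, quads, note) {
    return {
        page: pageIndex,    // 0-based page index
        type: "Highlight",  // a text markup type that is positioned by quads
        quads: quads,       // list of quads; each one outlines a rectangle to mark
        contents: note      // text stored with the annotation
    };
}

// Inside Acrobat the object would be passed to addAnnot, for example:
//   this.addAnnot(makeHighlightProps(i, this.getPageNthWordQuads(i, j), "Possible misspelling"));
```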

 

 

// For all pages
for (var i = 0; i < this.numPages; i++)
{
    var pg = i + 1; // 1-based page number for the report
    var numWords = this.getPageNumWords(i);

    // For all the words on the page
    for (var j = 0; j < numWords; j++)
    {
        var word = this.getPageNthWord(i, j);
        // Spell check: checkWord returns null when the word is spelled correctly
        var ckWord = spell.checkWord(word);
        var jn = 0;
        var ml = 0;
        var coords = "";

        if (ckWord != null)
        {
            // Misspelled word found: get its quad coordinates
            var wordf = this.getPageNthWordQuads(i, j);
            var st = wordf.toString().split(",");

            // Report the first eight coordinates, each offset by 8
            for (var k = 0; k < 8; k++)
            {
                coords += "\t st[" + k + "] " + (parseInt(st[k], 10) - 8);
            }

            // NOTE: cWord and csword are not defined in this excerpt
            if (cWord == csword)
            {
                jn = 1;
            }
            // Flag words whose first two corners have different y values
            if (st[1] != st[3])
            {
                ml = 1;
            }
        }
        else
        {
            ml = 2; // spelled correctly
        }

        // One tab-separated report line per word
        dataLine += "\r\n" + this.documentFileName
            + "\t" + word
            + "\t" + pg
            + "\t" + j
            + "\t" + ml
            + "\t" + jn
            + coords;
        ck = 1;
    }
}

