Extract text from a defined rectangular area on a page

Foxit Quick PDF Library includes a range of functions for extracting text from PDF files, but most of them extract text from an entire page. The extraction functions with “area” in their names let you specify a rectangular area from which to extract text. The key function when working with a document loaded into memory is SetTextExtractionArea; the direct access (DA) equivalent is DASetTextExtractionArea.

Sample code demonstrating the use of the regular and DA functions for extracting text from a portion of the page is shown below:

SetTextExtractionArea with GetPageText

DPL.LoadFromFile(@"Sample.pdf", "");
DPL.SetOrigin(1); // Sets 0,0 coordinate position to top left of page, default is bottom left
DPL.SetTextExtractionArea(35, 35, 229, 30); // Left, Top, Width, Height
string ExtractedContent = DPL.GetPageText(8);
Console.WriteLine(ExtractedContent);

DASetTextExtractionArea with ExtractFilePageText

SetOrigin cannot be used with DASetTextExtractionArea, so the 0,0 coordinate stays at the bottom left of the page. This means the top parameter must be measured up from the bottom of the page rather than down from the top. The page in this example is 792 points high, so subtracting the top value of 35 used above from 792 gives 757 points.

DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
ExtractedContent = DPL.ExtractFilePageText(@"Sample.pdf", "", 1, 8);
Console.WriteLine(ExtractedContent);
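
The top-coordinate adjustment described above is plain arithmetic, so it can be wrapped in a small helper. The following is a minimal sketch only: the ExtractionArea class and the TopFromBottomLeft method are hypothetical names that are not part of Foxit Quick PDF Library, and the page height of 792 points is the value assumed in this article (the height of a US Letter page). Check the actual height of your own pages before relying on it.

```csharp
using System;

public static class ExtractionArea
{
    // Hypothetical helper, not a library function.
    // Converts a top edge measured down from the top of the page (top-left origin)
    // into the equivalent value measured up from the bottom (bottom-left origin).
    public static double TopFromBottomLeft(double pageHeight, double topFromTopLeft)
    {
        return pageHeight - topFromTopLeft;
    }

    public static void Main()
    {
        double pageHeight = 792; // assumed page height in points, as in this article
        double top = TopFromBottomLeft(pageHeight, 35);
        Console.WriteLine(top); // prints 757
    }
}
```

The result can then be passed as the Top argument of DASetTextExtractionArea in place of the hard-coded 757.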

DASetTextExtractionArea with DAExtractPageText

int fileHandle = DPL.DAOpenFile(@"Sample.pdf", "");
int pageRef = DPL.DAFindPage(fileHandle, 1);
DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
ExtractedContent = DPL.DAExtractPageText(fileHandle, pageRef, 8);
Console.WriteLine(ExtractedContent);
DPL.DACloseFile(fileHandle); // Release the file handle when finished

With these functions, Foxit Quick PDF Library gives you precise control over which text is extracted from a page.

Updated on April 9, 2017
