How to Extract & Search for Text with Foxit PDF SDK (.NET)

Text Page

Foxit PDF SDK provides APIs to extract, select, search, and retrieve text in PDF documents. The text content of a PDF page is represented by a TextPage object, which is bound to that specific page. The TextPage class can retrieve information about the text on a page, such as a single character, a single word, or the text within a specified character range or rectangle. It can also be used to construct objects of other text-related classes that perform further operations on the text or access specific information in it:

  • To search for text in the text contents of a PDF page, construct a TextSearch object with a TextPage object.
  • To access text that represents hypertext links, construct a PageTextLinks object with a TextPage object.

Example:

How to extract text from a PDF page

using System;
using foxit.common;
using foxit.pdf;
...
// Assuming PDFPage page has been loaded and parsed, and that writer is an
// open System.IO.TextWriter (for example, a StreamWriter).
using (var text_page = new TextPage(page, (int)TextPage.TextParseFlags.e_ParseTextNormal))
{
    // Read every character on the page in a single call.
    int count = text_page.GetCharCount();
    if (count > 0)
    {
        String chars = text_page.GetChars(0, count);
        writer.Write(chars);
    }
}
...
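
The example above assumes that a parsed PDFPage is already available. As a rough sketch of how that prerequisite might be met, the following program initializes the library, loads a document, parses its first page, and builds a TextPage from it. The license strings and the file name input.pdf are placeholders. The calls Library.Initialize, PDFDoc.Load, PDFDoc.GetPage, and PDFPage.StartParse are not shown in this article; they are written here as they appear in the SDK's .NET samples, so check them against the API reference for your SDK version.

```csharp
using System;
using foxit.common;
using foxit.pdf;

class LoadPageSketch
{
    static void Main()
    {
        // Placeholder license values: replace with the ones from your SDK package.
        string sn = "<your sn>";
        string key = "<your key>";
        if (Library.Initialize(sn, key) != ErrorCode.e_ErrSuccess)
        {
            Console.WriteLine("Library initialization failed.");
            return;
        }

        try
        {
            // "input.pdf" is a hypothetical path used only for illustration.
            using (PDFDoc doc = new PDFDoc("input.pdf"))
            {
                // Pass null when the document has no password.
                if (doc.Load(null) != ErrorCode.e_ErrSuccess)
                {
                    Console.WriteLine("The document could not be loaded.");
                    return;
                }

                using (PDFPage page = doc.GetPage(0))
                {
                    // The page must be parsed before a TextPage is constructed from it.
                    page.StartParse((int)PDFPage.ParseFlags.e_ParsePageNormal, null, false);

                    using (var text_page = new TextPage(page, (int)TextPage.TextParseFlags.e_ParseTextNormal))
                    {
                        Console.WriteLine("Characters on page 0: " + text_page.GetCharCount());
                    }
                }
            }
        }
        finally
        {
            // Release the library on every exit path after a successful initialization.
            Library.Release();
        }
    }
}
```

The try/finally block is a design choice of this sketch: it keeps Library.Release paired with Library.Initialize even when loading fails.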

How to select the text within a rectangular area of a PDF page

using System;
using foxit.common;
using foxit.common.fxcrt;
using foxit.pdf;
...
// Assuming PDFPage page has been loaded and parsed.
RectF rect = new RectF(100, 50, 220, 100);
using (TextPage text_page = new TextPage(page, (int)TextPage.TextParseFlags.e_ParseTextNormal))
{
    // Get the text that lies inside the rectangle.
    String str_text = text_page.GetTextInRect(rect);
}
...

Foxit PDF SDK provides APIs to search for text in a PDF document, an XFA document, a text page, or a PDF annotation’s appearance. It offers functions to configure a search, run it, and read the results:

  • To specify the search pattern and options, use TextSearch.SetPattern, TextSearch.SetStartPage, TextSearch.SetEndPage, and TextSearch.SetSearchFlags. SetStartPage and SetEndPage are only useful for a text search in a PDF document.
  • To perform the search, use TextSearch.FindNext or TextSearch.FindPrev.
  • To get the search results, use the TextSearch.GetMatchXXX() functions, for example TextSearch.GetMatchRects.

Example:

How to search for a text pattern in a PDF document

using System;
using foxit.common;
using foxit.common.fxcrt;
using foxit.pdf;
...
// Assuming PDFDoc doc has been loaded.
using (TextSearch search = new TextSearch(doc, null))
{
    // Search every page of the document.
    int start_index = 0;
    int end_index = doc.GetPageCount() - 1;
    search.SetStartPage(start_index);
    search.SetEndPage(end_index);
    String pattern = "Foxit";
    search.SetPattern(pattern);
    Int32 flags = (int)TextSearch.SearchFlags.e_SearchNormal;
    search.SetSearchFlags(flags);
    int match_count = 0;
    while (search.FindNext())
    {
        // Rectangles that cover the current match on its page.
        RectFArray rect_array = search.GetMatchRects();
        match_count++;
    }
}
...
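
The text above mentions that a search can also run on a single text page. A minimal sketch of that variant follows, assuming a TextSearch constructor that accepts a TextPage and the result accessors GetMatchStartCharIndex and GetMatchSentence; none of these three are shown in this article, so confirm them in the API reference before relying on them. The class and method names PageSearchSketch and CountMatches are hypothetical.

```csharp
using System;
using foxit.common;
using foxit.pdf;

static class PageSearchSketch
{
    // Counts the matches of pattern on one page that has already been loaded and parsed.
    public static int CountMatches(PDFPage page, string pattern)
    {
        using (var text_page = new TextPage(page, (int)TextPage.TextParseFlags.e_ParseTextNormal))
        using (var search = new TextSearch(text_page)) // assumed TextPage overload
        {
            // SetStartPage and SetEndPage are omitted: they only apply to a document search.
            search.SetPattern(pattern);
            search.SetSearchFlags((int)TextSearch.SearchFlags.e_SearchNormal);

            int match_count = 0;
            while (search.FindNext())
            {
                // Report where the match starts and the sentence that contains it.
                Console.WriteLine("Match at character {0}: {1}",
                    search.GetMatchStartCharIndex(), search.GetMatchSentence());
                match_count++;
            }
            return match_count;
        }
    }
}
```

A call such as PageSearchSketch.CountMatches(page, "Foxit") would then return the number of matches found on that page.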

In a PDF page, text that represents a hypertext link to a website or other internet resource, or an email address, is stored in the same way as ordinary text. Before processing a text link, first call PageTextLinks.GetTextLink to get a TextLink object.

Example:

using foxit.common;
using foxit.pdf;
...
// Assuming PDFPage page has been loaded and parsed.
// Get the text page object.
TextPage text_page = new TextPage(page, (int)TextPage.TextParseFlags.e_ParseTextNormal);
PageTextLinks page_textlinks = new PageTextLinks(text_page);
// index selects which text link on the page to retrieve; set it before this call.
TextLink text_link = page_textlinks.GetTextLink(index);
string str_url = text_link.GetURI();
...
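
The example above reads a single link at a caller-supplied index. As a sketch of how every link on a page might be listed, the following helper loops over the links, assuming that PageTextLinks.GetTextLinkCount returns the number of text links and that valid indexes run from 0 to that count minus one. GetTextLinkCount is not shown in this article, and the names TextLinkSketch and PrintLinks are hypothetical.

```csharp
using System;
using foxit.common;
using foxit.pdf;

static class TextLinkSketch
{
    // Prints the URI of every text link on a page that has already been loaded and parsed.
    public static void PrintLinks(PDFPage page)
    {
        using (var text_page = new TextPage(page, (int)TextPage.TextParseFlags.e_ParseTextNormal))
        {
            PageTextLinks page_textlinks = new PageTextLinks(text_page);

            // Assumed accessor for the number of text links found on the page.
            int count = page_textlinks.GetTextLinkCount();
            for (int index = 0; index < count; index++)
            {
                TextLink text_link = page_textlinks.GetTextLink(index);
                Console.WriteLine("{0}: {1}", index, text_link.GetURI());
            }
        }
    }
}
```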
Updated on October 23, 2019
