Extract text from PDFs as a text block list

Foxit Quick PDF Library provides an extensive API for programmatically extracting text from PDF files. Output options range from plain text to a formatted CSV string that includes the font, size and style of the text.

The API now also includes functions that extract text as text blocks, which can be easier to manage and parse. For each text block you can retrieve the text itself as well as its bounds, font, color and size.

The full range of text extraction functions can be found in our online reference for extraction functions.

Here’s some C# sample code that demonstrates how to use some of these text block functions. It assumes DPL is an already initialized instance of the library:

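DPL.LoadFromFile(@"C:\Program Files (x86)\Debenu\PDF Library\DLL\GettingStarted.pdf", "");

// Holds the eight bound values of a text block (bound indexes 1 to 8).
double[] box = new double[8];
for (int i = 1; i <= DPL.PageCount(); i++)
{
    // Select the page so that the extraction runs against page i.
    DPL.SelectPage(i);

    // Extract the text blocks on the selected page; the return value identifies the text block list.
    int id = DPL.ExtractPageTextBlocks(4);
    for (int f = 1; f <= DPL.GetTextBlockCount(id); f++)
    {
        double fontSize = DPL.GetTextBlockFontSize(id, f);
        string fontName = DPL.GetTextBlockFontName(id, f);

        // Bound indexes are 1-based; store them in a 0-based array.
        for (int j = 1; j <= 8; j++)
        {
            box[j - 1] = DPL.GetTextBlockBound(id, f, j);
        }

        string text = DPL.GetTextBlockText(id, f);

        Console.WriteLine("Text Block List ID: " + id + ", Block: " + f);
        Console.WriteLine(text);
        Console.WriteLine("Font Name: " + fontName);
        Console.WriteLine("Font Size: " + fontSize);
        Console.WriteLine("Text Block Bounds:");
        foreach (var item in box)
        {
            Console.WriteLine(item.ToString());
        }
        Console.WriteLine(Environment.NewLine);
    }
    // Free the text block list once the page has been processed.
    DPL.ReleaseTextBlocks(id);
    // Pause until input is received before moving on to the next page.
    Console.Read();
}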
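
The eight bound values are often easier to work with as a single rectangle. The following is a minimal sketch of one way to do that, not part of the library itself. It assumes the eight values are four (x, y) corner pairs in the order x1, y1, x2, y2, x3, y3, x4, y4, and that they have been copied into a zero-based array of exactly eight elements. The TextBlockBounds class and the ToRectangle method are hypothetical names used only for illustration.

```csharp
using System;

public static class TextBlockBounds
{
    // Hypothetical helper, not a library function.
    // Assumption: bounds holds four (x, y) corner pairs in the order
    // x1, y1, x2, y2, x3, y3, x4, y4 (zero-based array of eight values).
    public static (double MinX, double MinY, double MaxX, double MaxY) ToRectangle(double[] bounds)
    {
        if (bounds == null || bounds.Length != 8)
        {
            throw new ArgumentException("Expected exactly eight bound values.", nameof(bounds));
        }

        double minX = double.MaxValue, minY = double.MaxValue;
        double maxX = double.MinValue, maxY = double.MinValue;

        // Even indexes hold x values, odd indexes hold y values.
        for (int i = 0; i < 8; i += 2)
        {
            minX = Math.Min(minX, bounds[i]);
            maxX = Math.Max(maxX, bounds[i]);
            minY = Math.Min(minY, bounds[i + 1]);
            maxY = Math.Max(maxY, bounds[i + 1]);
        }

        return (minX, minY, maxX, maxY);
    }
}
```

For example, calling TextBlockBounds.ToRectangle with the values 10, 20, 110, 20, 110, 32, 10, 32 yields a rectangle from (10, 20) to (110, 32). Taking the minimum and maximum of each axis means the result does not depend on which corner comes first, and it still gives an enclosing rectangle if the text is rotated.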

Updated on March 19, 2019

Tagged: tips and tricks sample code
