Package org.apache.pdfbox.tools
Class ExtractText
java.lang.Object
org.apache.pdfbox.tools.ExtractText
This is the main program that parses a PDF document and transforms it into text.
Field Summary
Fields
Modifier and Type                                    Field
private static final String                          ALWAYSNEXT
private static final String                          CONSOLE
private static final String                          DEBUG
private boolean                                      debugOutput
private static final String                          ENCODING
private static final String                          END_PAGE
private static final String                          HTML
private static final String                          IGNORE_BEADS
private static final org.apache.commons.logging.Log  LOG
private static final String                          PASSWORD
private static final String                          ROTATION_MAGIC
private static final String                          SORT
private static final String                          START_PAGE
private static final String                          STD_ENCODING
Constructor Summary
Constructors
Modifier  Constructor    Description
private   ExtractText()  Private constructor.
Method Summary
Modifier and Type              Method and Description
private void                   extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, Writer output, boolean rotationMagic, boolean alwaysNext)
(package private) static int   getAngle(TextPosition text)
static void                    main(String[] args)
                               Infamous main method.
void                           startExtraction(String[] args)
                               Starts the text extraction.
private long                   startProcessing(String message)
private void                   stopProcessing(String message, long startTime)
private static void            usage()
                               This will print the usage requirements and exit.
Field Details
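LOG
private static final org.apache.commons.logging.Log LOG

PASSWORD
private static final String PASSWORD
See Also: Constant Field Values

ENCODING
private static final String ENCODING
See Also: Constant Field Values

CONSOLE
private static final String CONSOLE
See Also: Constant Field Values

START_PAGE
private static final String START_PAGE
See Also: Constant Field Values

END_PAGE
private static final String END_PAGE
See Also: Constant Field Values

SORT
private static final String SORT
See Also: Constant Field Values

IGNORE_BEADS
private static final String IGNORE_BEADS
See Also: Constant Field Values

DEBUG
private static final String DEBUG
See Also: Constant Field Values

HTML
private static final String HTML
See Also: Constant Field Values

ALWAYSNEXT
private static final String ALWAYSNEXT
See Also: Constant Field Values

ROTATION_MAGIC
private static final String ROTATION_MAGIC
See Also: Constant Field Values

STD_ENCODING
private static final String STD_ENCODING
See Also: Constant Field Values

debugOutput
private boolean debugOutput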
Constructor Details
ExtractText
private ExtractText()
Private constructor.
Method Details
main
public static void main(String[] args) throws IOException
Infamous main method.
Parameters:
args - Command line arguments: any options, followed by a reference to the input file.
Throws:
IOException - if there is an error reading the document or extracting the text.
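The args description above leaves the expected layout implicit. The following is a minimal, self-contained sketch of how options could be separated from file references. It assumes an "[options] <inputfile> [output-text-file]" layout and flag strings such as "-password" and "-startPage" for the PASSWORD, START_PAGE, END_PAGE and SORT constants; their actual values are not shown on this page. The ArgsSketch and ParsedArgs names are hypothetical, and a missing input file raises an exception where ExtractText itself would call usage().

```java
import java.util.ArrayList;
import java.util.List;

public class ArgsSketch {
    // Hypothetical flag values: the real constants' values are not documented here.
    static final String PASSWORD = "-password";
    static final String START_PAGE = "-startPage";
    static final String END_PAGE = "-endPage";
    static final String SORT = "-sort";

    // Hypothetical holder for the parsed options and file references.
    static final class ParsedArgs {
        String password = "";
        int startPage = 1;
        int endPage = Integer.MAX_VALUE;
        boolean sort = false;
        String pdfFile;
        String textFile; // optional output file; null if not given
    }

    static ParsedArgs parse(String[] args) {
        ParsedArgs p = new ParsedArgs();
        List<String> positional = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if (a.equals(PASSWORD)) {
                p.password = requireValue(args, ++i, a);
            } else if (a.equals(START_PAGE)) {
                p.startPage = Integer.parseInt(requireValue(args, ++i, a));
            } else if (a.equals(END_PAGE)) {
                p.endPage = Integer.parseInt(requireValue(args, ++i, a));
            } else if (a.equals(SORT)) {
                p.sort = true;
            } else {
                positional.add(a); // anything else is treated as a file reference
            }
        }
        // The input file is mandatory; ExtractText would print usage() and exit here.
        if (positional.isEmpty() || positional.size() > 2) {
            throw new IllegalArgumentException("expected <inputfile> [output-text-file]");
        }
        p.pdfFile = positional.get(0);
        p.textFile = positional.size() == 2 ? positional.get(1) : null;
        return p;
    }

    // Returns args[i], failing if a flag that needs a value is the last token.
    private static String requireValue(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException("missing value for " + flag);
        }
        return args[i];
    }

    public static void main(String[] args) {
        ParsedArgs p = parse(new String[] {"-startPage", "2", "-sort", "report.pdf"});
        System.out.println(p.pdfFile + " from page " + p.startPage + ", sort=" + p.sort);
    }
}
```

Unrecognized tokens are treated as file references in this sketch; a stricter parser would reject unknown flags instead.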
startExtraction
public void startExtraction(String[] args) throws IOException
Starts the text extraction.
Parameters:
args - the command-line arguments.
Throws:
IOException - if there is an error reading the document or extracting the text.
extractPages
private void extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, Writer output, boolean rotationMagic, boolean alwaysNext) throws IOException
Throws:
IOException
startProcessing
private long startProcessing(String message)
stopProcessing
private void stopProcessing(String message, long startTime)
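startProcessing and stopProcessing are listed without descriptions. Their signatures, together with the debugOutput field, suggest a timing pair: one returns a start timestamp, the other reports the elapsed time. The sketch below shows only that assumed pattern; the TimingSketch class, the injected PrintStream, the message format and the use of System.nanoTime() are illustrative choices, not PDFBox code.

```java
import java.io.PrintStream;

public class TimingSketch {
    // Assumed counterpart of ExtractText.debugOutput: messages print only when true.
    private final boolean debugOutput;
    // Destination for timing messages (System.err in a command-line tool).
    private final PrintStream out;

    TimingSketch(boolean debugOutput, PrintStream out) {
        this.debugOutput = debugOutput;
        this.out = out;
    }

    // Prints the start message when debugging and returns a monotonic start timestamp.
    long startProcessing(String message) {
        if (debugOutput) {
            out.println(message);
        }
        return System.nanoTime();
    }

    // Prints the message followed by the seconds elapsed since startTime, when debugging.
    void stopProcessing(String message, long startTime) {
        if (debugOutput) {
            double seconds = (System.nanoTime() - startTime) / 1_000_000_000.0;
            out.println(message + seconds + " seconds");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimingSketch timer = new TimingSketch(true, System.err);
        long start = timer.startProcessing("Loading document");
        Thread.sleep(50); // stand-in for loading and extracting a document
        timer.stopProcessing("Time for extraction: ", start);
    }
}
```

System.nanoTime() is used here because it is monotonic; wall-clock time from System.currentTimeMillis() can jump when the system clock is adjusted, which would distort the measured interval.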
getAngle
static int getAngle(TextPosition text)
usage
private static void usage()
This will print the usage requirements and exit.