Class PreflightParser

java.lang.Object
  org.apache.pdfbox.pdfparser.BaseParser
    org.apache.pdfbox.pdfparser.COSParser
      org.apache.pdfbox.pdfparser.PDFParser
        org.apache.pdfbox.preflight.parser.PreflightParser

Field Summary

protected PreflightContext ctx
protected DataSource dataSource
static final Charset encoding
    Defines a one-byte encoding that has no specific encoding in the UTF-8 charset.
protected PreflightDocument preflightDocument
protected ValidationResult validationResult

Fields inherited from class org.apache.pdfbox.pdfparser.COSParser:
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, source, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX, xrefTrailerResolver

Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser:
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, STREAM_STRING, T

Constructor Summary

PreflightParser(File file)
    Constructor.
PreflightParser(File file, ScratchFile scratch)
    Constructor.
PreflightParser(String filename)
    Constructor.
PreflightParser(String filename, ScratchFile scratch)
    Constructor.
PreflightParser(DataSource dataSource)
    Constructor.
PreflightParser(DataSource dataSource, ScratchFile scratch)
    Constructor.

Method Summary

protected void addValidationError(ValidationError error)
    Adds the error to the ValidationResult.
protected void addValidationErrors
protected void checkEndstreamKeyWord()
    'endstream' must be preceded by an EOL.
protected void checkPdfHeader()
    Checks that the PDF header matches the rules of the PDF/A specification.
protected void checkStreamKeyWord()
    'stream' must be followed by <CR><LF> or only <LF>.
protected void createContext()
    Creates a validation context.
protected void createPdfADocument(Format format, PreflightConfiguration config)
protected static ValidationResult createUnknownErrorResult()
    Creates an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR).
PDDocument getPDDocument()
    This will get the PD document that was parsed.
PreflightDocument getPreflightDocument()
protected void initialParse()
    The initial parse first parses only the trailer, the xrefstart and all xref tables to obtain a pointer (offset) to each of the PDF's objects.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
    Searches for the last occurrence of pattern within the buffer.
private boolean nextIsEOL()
void parse()
    This will parse the stream and populate the COSDocument object.
void parse(Format format)
    Parses the given file and checks whether it conforms to the given format.
void parse(Format format, PreflightConfiguration config)
    Parses the given file and checks whether it conforms to the given format.
protected COSArray parseCOSArray()
    This will parse a PDF array object.
protected COSName parseCOSName()
    This will parse a PDF name from the stream.
protected COSStream parseCOSStream(COSDictionary dic)
    Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check the rules on the 'stream' and 'endstream' keywords.
protected COSString parseCOSString()
    Checks that the hexadecimal string contains an even number of hexadecimal characters.
protected COSBase parseDirObject()
    Calls BaseParser.parseDirObject() and checks the limit ranges for Float, Integer and the number of Dictionary entries.
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj)
    This will parse the next object from the stream and add it to the local state.
protected boolean parseXrefTable(long startByteOffset)
    Same as COSParser.parseXrefTable(long), with additional controls: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header uses a single white space as separator, and so on.

Methods inherited from class org.apache.pdfbox.pdfparser.COSParser:
checkPages, getAccessPermission, getDocument, getEncryption, getStartxrefOffset, isCatalog, isLenient, parseDictObjects, parseFDFHeader, parseObjectDynamically, parsePDFHeader, parseTrailerValuesDynamically, parseXref, rebuildTrailer, retrieveTrailer, setEOFLookupRange, setLenient

Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser:
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces

Field Details

encoding
Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. This avoids an unexpected error when the encoding is Cp5816.

dataSource

validationResult

preflightDocument

ctx

Constructor Details

PreflightParser(File file)
Constructor.
Parameters:
  file
Throws:
  IOException - if there is a reading error.

PreflightParser(File file, ScratchFile scratch)
Constructor.
Parameters:
  file
  scratch
Throws:
  IOException - if there is a reading error.

PreflightParser(String filename)
Constructor.
Parameters:
  filename
Throws:
  IOException - if there is a reading error.

PreflightParser(String filename, ScratchFile scratch)
Constructor.
Parameters:
  filename
  scratch
Throws:
  IOException - if there is a reading error.

PreflightParser(DataSource dataSource)
Constructor. This constructor is slower than the File and filename constructors because it creates a temporary file.
Parameters:
  dataSource - the data source
Throws:
  IOException - if there is a reading error.

PreflightParser(DataSource dataSource, ScratchFile scratch)
Constructor. This constructor is slower than the File and filename constructors because it creates a temporary file.
Parameters:
  dataSource - the data source
  scratch
Throws:
  IOException - if there is a reading error.

Method Details

createUnknownErrorResult
Creates an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR).
Returns:
  the ValidationResult instance.

addValidationError
Adds the error to the ValidationResult. If validationResult is null, an instance is created first, using the isWarning flag of the ValidationError to decide whether the ValidationResult is flagged as valid.
Parameters:
  error

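The lazy-creation rule above can be sketched as follows. This is a minimal illustration, not the PDFBox implementation: SketchError, SketchResult and ErrorCollector are hypothetical stand-ins for ValidationError, ValidationResult and the parser's validationResult field, and the sketch assumes that adding any non-warning error marks the result as invalid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ValidationError: an error code plus a warning flag.
final class SketchError {
    final String code;
    final boolean warning;

    SketchError(String code, boolean warning) {
        this.code = code;
        this.warning = warning;
    }
}

// Hypothetical stand-in for ValidationResult: a validity flag plus the collected errors.
final class SketchResult {
    private boolean valid;
    private final List<SketchError> errors = new ArrayList<>();

    SketchResult(boolean valid) {
        this.valid = valid;
    }

    void addError(SketchError error) {
        // Assumption: any non-warning error makes the result invalid.
        if (!error.warning) {
            valid = false;
        }
        errors.add(error);
    }

    boolean isValid() {
        return valid;
    }

    List<SketchError> getErrors() {
        return errors;
    }
}

// Hypothetical holder mirroring the parser's validationResult field.
final class ErrorCollector {
    SketchResult validationResult; // null until the first error arrives

    void addValidationError(SketchError error) {
        if (validationResult == null) {
            // Create the result lazily; if the first error is only a warning, it starts out valid.
            validationResult = new SketchResult(error.warning);
        }
        validationResult.addError(error);
    }
}
```

With this design, a document that only ever produces warnings keeps a valid result, while the first real error flips it to invalid and stays that way.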
addValidationErrors

parse
Description copied from class: PDFParser
This will parse the stream and populate the COSDocument object. This will close the keystore stream when it is done parsing.
Overrides:
  parse in class PDFParser
Throws:
  InvalidPasswordException - If the password is incorrect.
  IOException - If there is an error reading from the stream or corrupt data is found.

parse
Parses the given file and checks whether it conforms to the given format.
Parameters:
  format - format that the document should follow (default: Format.PDF_A1B)
Throws:
  IOException

parse
Parses the given file and checks whether it conforms to the given format.
Parameters:
  format - format that the document should follow (default: Format.PDF_A1B)
  config - configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
Throws:
  IOException

createPdfADocument
Throws:
  IOException

createContext
protected void createContext()
Creates a validation context. This context is assigned to the PreflightDocument.

getPDDocument
Description copied from class: PDFParser
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
Overrides:
  getPDDocument in class PDFParser
Returns:
  The document at the PD layer.
Throws:
  IOException - If there is an error getting the document.

getPreflightDocument
Throws:
  IOException

initialParse
Description copied from class: PDFParser
The initial parse first parses only the trailer, the xrefstart and all xref tables to obtain a pointer (offset) to each of the PDF's objects. It can handle linearized PDFs, which have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
Overrides:
  initialParse in class PDFParser
Throws:
  InvalidPasswordException - If the password is incorrect.
  IOException - If something went wrong.

checkPdfHeader
protected void checkPdfHeader()
Checks that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment containing the PDF version; version 1.0 does not conform to the PDF/A specification. The second line must be a comment containing at least four bytes with values greater than 127 (0x80 or above).

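A minimal sketch of the two header rules, operating on the raw bytes at the start of a file. PdfAHeaderCheck is a hypothetical helper, not part of PDFBox; it assumes the version is written as "%PDF-1." followed by a single digit and simply counts the high bytes on the second line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper sketching the two PDF/A header rules; not PDFBox code.
final class PdfAHeaderCheck {

    // Returns human-readable problems; an empty list means both rules pass.
    static List<String> check(byte[] file) {
        List<String> problems = new ArrayList<>();

        // Rule 1: line one is "%PDF-1.x"; assumption: x is a single digit, and 1.0 is rejected.
        int firstEnd = lineEnd(file, 0);
        String first = new String(file, 0, firstEnd, StandardCharsets.ISO_8859_1);
        if (!first.matches("%PDF-1\\.[0-9]")) {
            problems.add("first line is not a %PDF-1.x header");
        } else if (first.equals("%PDF-1.0")) {
            problems.add("PDF 1.0 does not conform to PDF/A");
        }

        // Rule 2: line two is a comment with at least four bytes greater than 127.
        int secondStart = skipEol(file, firstEnd);
        int secondEnd = lineEnd(file, secondStart);
        if (secondStart >= secondEnd || file[secondStart] != '%') {
            problems.add("second line is not a comment");
        } else {
            int highBytes = 0;
            for (int i = secondStart + 1; i < secondEnd; i++) {
                if ((file[i] & 0xFF) > 127) {
                    highBytes++;
                }
            }
            if (highBytes < 4) {
                problems.add("second line has fewer than four bytes greater than 127");
            }
        }
        return problems;
    }

    // Index of the first CR or LF at or after start, or file.length if none.
    private static int lineEnd(byte[] file, int start) {
        int i = start;
        while (i < file.length && file[i] != '\r' && file[i] != '\n') {
            i++;
        }
        return i;
    }

    // Skips one end-of-line marker (CR LF, CR or LF) starting at i.
    private static int skipEol(byte[] file, int i) {
        if (i < file.length && file[i] == '\r') {
            i++;
        }
        if (i < file.length && file[i] == '\n') {
            i++;
        }
        return i;
    }
}
```

For example, calling check on the ISO-8859-1 bytes of "%PDF-1.4\n%\u00E2\u00E3\u00CF\u00D3\n" returns an empty list, while the same input with "%abcd" on the second line reports one problem.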
parseXrefTable
Same as COSParser.parseXrefTable(long), with additional controls: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header must use a single white space as separator, and so on.
Overrides:
  parseXrefTable in class COSParser
Parameters:
  startByteOffset - the offset to start at
Returns:
  false on parsing error
Throws:
  IOException - If an IO error occurs.

parseCOSStream
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check the rules on the 'stream' and 'endstream' keywords, using checkStreamKeyWord() and checkEndstreamKeyWord().
Overrides:
  parseCOSStream in class COSParser
Parameters:
  dic - dictionary that goes with this stream.
Returns:
  parsed pdf stream.
Throws:
  IOException - if an error occurred reading the stream, such as a problem reading the length attribute, the stream not ending with 'endstream' after the data is read, or the stream being too short.

checkStreamKeyWord
'stream' must be followed by <CR><LF> or only <LF>.
Throws:
  IOException

checkEndstreamKeyWord
'endstream' must be preceded by an EOL.
Throws:
  IOException

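Both keyword rules come down to inspecting the bytes next to the keyword. StreamKeywordCheck is a hypothetical sketch, not PDFBox code; it assumes the caller already knows where the keyword sits in the buffer and passes that position in.

```java
// Hypothetical byte-level checks for the 'stream'/'endstream' rules; not PDFBox code.
final class StreamKeywordCheck {

    // 'stream' must be followed by CR LF or by LF alone.
    // afterKeyword is the index of the first byte after "stream".
    static boolean validAfterStream(byte[] buf, int afterKeyword) {
        if (afterKeyword < buf.length && buf[afterKeyword] == '\n') {
            return true; // LF alone
        }
        return afterKeyword + 1 < buf.length
                && buf[afterKeyword] == '\r'
                && buf[afterKeyword + 1] == '\n'; // CR LF; a lone CR is rejected
    }

    // 'endstream' must be preceded by an EOL marker (CR or LF).
    // keywordStart is the index of the 'e' in "endstream".
    static boolean validBeforeEndstream(byte[] buf, int keywordStart) {
        if (keywordStart <= 0 || keywordStart > buf.length) {
            return false;
        }
        byte prev = buf[keywordStart - 1];
        return prev == '\r' || prev == '\n';
    }
}
```

The asymmetry is deliberate: after 'stream' a lone CR is not accepted, whereas before 'endstream' either EOL byte is enough.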
nextIsEOL
Throws:
  IOException

parseCOSArray
Description copied from class: BaseParser
This will parse a PDF array object.
Overrides:
  parseCOSArray in class BaseParser
Returns:
  The parsed PDF array.
Throws:
  IOException - If there is an error parsing the stream.

parseCOSName
Description copied from class: BaseParser
This will parse a PDF name from the stream.
Overrides:
  parseCOSName in class BaseParser
Returns:
  The parsed PDF name.
Throws:
  IOException - If there is an error reading from the stream.

parseCOSString
Checks that the hexadecimal string contains an even number of hexadecimal characters. Once this is done, resets the offset to the beginning of the string and calls BaseParser.parseCOSString().
Overrides:
  parseCOSString in class BaseParser
Returns:
  The parsed PDF string.
Throws:
  IOException - If there is an error reading from the stream.

parseDirObject
Calls BaseParser.parseDirObject() and checks the limit ranges for Float, Integer and the number of Dictionary entries.
Overrides:
  parseDirObject in class BaseParser
Returns:
  The parsed object.
Throws:
  IOException - if there is an error during parsing.

parseObjectDynamically
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
Description copied from class: COSParser
This will parse the next object from the stream and add it to the local state. It is reduced to parsing an indirect object.
Overrides:
  parseObjectDynamically in class COSParser
Parameters:
  objNr - object number of the object to be parsed
  objGenNr - object generation number of the object to be parsed
  requireExistingNotCompressedObj - if true, the object to be parsed must be defined in the xref (comment: null objects may be missing from the xref) and it must not be a compressed object within an object stream (this is used to avoid getting stuck in a loop in a malicious PDF)
Returns:
  the parsed object (which is also added to the document object)
Throws:
  IOException - If an IO error occurs.

lastIndexOf
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches for the last occurrence of pattern within buf. The lookup starts before endOff and goes back to 0.
Overrides:
  lastIndexOf in class COSParser
Parameters:
  pattern - pattern to search for
  buf - buffer to search pattern in
  endOff - offset (exclusive) where the lookup starts
Returns:
  start offset of pattern within buf, or -1 if pattern could not be found

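A simple reference version of the backwards search, handy for checking the contract described above: the match must end before endOff, and the scan moves back toward offset 0. BackwardSearch is hypothetical and deliberately naive (O(n*m)); it is not the PDFBox implementation. It compares bytes to chars directly, which assumes an ASCII pattern such as "%%EOF".

```java
// Hypothetical naive backwards pattern search; not the PDFBox implementation.
final class BackwardSearch {

    // Returns the start offset of the last occurrence of pattern that ends
    // at or before endOff (exclusive), or -1 if there is none.
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        int limit = Math.min(endOff, buf.length);
        for (int start = limit - pattern.length; start >= 0; start--) {
            int i = 0;
            // Byte-to-char comparison is only safe for ASCII patterns.
            while (i < pattern.length && buf[start + i] == pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return start;
            }
        }
        return -1;
    }
}
```

For example, on the bytes of "%%EOF\nxref\n%%EOF\n", searching for "%%EOF" with endOff set to the buffer length returns 11 (the second marker), and with endOff = 11 it returns 0.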