A Compare Profile for PDFC contains parameters and settings for the comparison of documents. Different Compare Profiles can lead to very different comparison results. Therefore, it may be necessary to adjust or optimize them for certain comparison scenarios.
Compare Profiles that are provided by i-net PDFC or shared by the administrator cannot be changed by the user. These profiles can, however, be duplicated and then customized for your own needs. With administrative permissions (the "configuration" permission), a Compare Profile can be shared with other users on the server.
Note: A shared configuration can't be unshared; it has to be deleted.
The Compare Profiles provided by i-net PDFC can be activated or deactivated in the configuration under Comparison > Profiles.
In the footer of the Compare Profile window you can manage Compare Profiles. The currently open profile can be duplicated, exported, published and deleted. In addition, settings for a Compare Profile can be imported.
Note: In this way, for example, Compare Profiles can be exported from the i-net PDFC GUI and imported into the server, and vice versa. Exported Compare Profiles can be edited outside the application, which makes it possible to set values that the configuration interface does not allow, for example headers and footers that are more than 100 pixels high. The possible settings can be found in the respective tables.
The following settings are optional and only used for managing the Compare Profiles. These properties have no effect on the comparison.
Property Name | Description |
---|---|
PROFIL_NAME | A unique name to differentiate the profiles. |
PROFIL_DESCRIPTION | A descriptive text about the profile |
Each profile can be stored in an external file by clicking Export at the bottom of the profile panel. The export files are portable and can be used with any type of i-net PDFC installation: GUI, API and Server.
To import a profile, create a custom profile and select it as the active one. The Import label will then be available at the bottom of the profile configuration page. Click this label to select a file to import. Alternatively, drag and drop a profile XML file into the panel to load the settings.
The imported profile will replace all settings of the current profile.
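As an illustration of editing an exported profile outside the application, the following Python sketch changes the two management properties. It assumes, without confirmation from this page, that the export file uses the Java XML properties layout (a `properties` root element with `entry key="..."` children). The sample content and the helper names `set_property` and `read_properties` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical export content; the <properties>/<entry> layout is an assumption.
SAMPLE_PROFILE = """<properties>
  <entry key="PROFIL_NAME">Default</entry>
  <entry key="PROFIL_DESCRIPTION">Exported from the GUI</entry>
</properties>"""


def set_property(root, key, value):
    """Set the entry with the given key, creating it if it is missing."""
    for entry in root.findall("entry"):
        if entry.get("key") == key:
            entry.text = value
            return
    ET.SubElement(root, "entry", {"key": key}).text = value


def read_properties(root):
    """Return all entries as a plain dict of key -> text."""
    return {e.get("key"): (e.text or "") for e in root.findall("entry")}


root = ET.fromstring(SAMPLE_PROFILE)
set_property(root, "PROFIL_NAME", "Invoices")
set_property(root, "PROFIL_DESCRIPTION", "Profile for invoice comparisons")
profile = read_properties(root)
print(profile)
```

Check a real export file before relying on this layout; if it differs, only the parsing part of the sketch needs to change.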
The behavior and the precision of i-net PDFC can be specified using configuration properties. These configuration properties are stored in the file config.xml.
After installation, the installation folder contains the config.xml with default values. You can change the default values by editing this file. If the file does not exist, i-net PDFC uses the default values.
Note: The graphical user interface uses the file .pdfc in the current user's home directory instead of the config.xml file in the installation directory. You can edit this file in the same way to perform special fine-tuning, but this is not necessary in most cases.
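The lookup described above can be sketched as a small Python function. This is only an illustration of the rule as stated here, not code from i-net PDFC. The function name `resolve_config` and the `gui` flag are hypothetical, and returning `None` for a missing `.pdfc` file is an assumption, since the text only describes the fallback for config.xml.

```python
import tempfile
from pathlib import Path


def resolve_config(install_dir, home_dir, gui):
    """Return the settings file to read, or None to fall back to defaults."""
    if gui:
        # The GUI reads .pdfc from the user's home directory.
        candidate = Path(home_dir) / ".pdfc"
    else:
        # Other installation types read config.xml from the installation folder.
        candidate = Path(install_dir) / "config.xml"
    return candidate if candidate.is_file() else None


# Demonstration with temporary directories instead of a real installation.
with tempfile.TemporaryDirectory() as tmp:
    install = Path(tmp) / "install"
    home = Path(tmp) / "home"
    install.mkdir()
    home.mkdir()
    (install / "config.xml").write_text("<properties/>")
    api_config = resolve_config(install, home, gui=False)
    gui_config = resolve_config(install, home, gui=True)
    api_name = api_config.name if api_config else None
```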
A profile basically contains the settings for comparison mode, element comparison types and filters to be used. Each filter or comparison type may have additional options to fine-tune the feature.
Property Name | Description |
---|---|
CONTINUOUS_COMPARE | This value defines which comparison engine is used. Currently available are 'CONTINUOUS' and 'STRICT'. The default value is: CONTINUOUS. |
The following property only works in the comparison mode CONTINUOUS.
Property Name | Description | Default | Range |
---|---|---|---|
CONTINUOUS_DETECT_PAGES | Specifies whether the continuous comparison can be split instead of comparing all content at once. If set to a value greater than zero, this specifies how many pages may be added or inserted before the comparison fails to match the content. The larger this value, the more precise the comparison will be. On the downside, a large value increases the memory consumption. If this value is set to zero, all content will be compared at once. This gives the optimum result at the cost of the maximum memory requirement. | 5 | 0 - 2147483647 |
The following properties only work in the comparison mode STRICT.
Property Name | Description | Default | Range |
---|---|---|---|
TOLERANCE_PAGE_LEFTCORNER | Specifies the maximum number of pixels that the left or top margin of a page (that is, the upper left corner of all elements) can differ before it is viewed as a difference. | 3 | 0 - 100 |
TOLERANCE_PAGE_RATIO | Specifies the tolerance for the aspect ratio of the PDF page. | 0.01 | 0 - 1 |
TOLERANCE_PAGE_SIZE | Specifies the maximum number of pixels that the width or height of a page can differ before it is viewed as a difference. | 2 | 0 - 100 |
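To make the page tolerances concrete, here is a minimal Python sketch that applies the default values from the table to two page sizes given in pixels. It is not the i-net PDFC implementation: the function name `page_differs`, the way the aspect ratio is computed, and the inclusive comparison at the limit are all assumptions.

```python
TOLERANCE_PAGE_SIZE = 2      # pixels, default from the table above
TOLERANCE_PAGE_RATIO = 0.01  # default from the table above


def page_differs(size_a, size_b,
                 size_tol=TOLERANCE_PAGE_SIZE,
                 ratio_tol=TOLERANCE_PAGE_RATIO):
    """Return True if two (width, height) page sizes exceed a tolerance."""
    (w1, h1), (w2, h2) = size_a, size_b
    # Largest deviation of width or height, in pixels.
    size_diff = max(abs(w1 - w2), abs(h1 - h2))
    # Deviation of the aspect ratio width / height (assumed definition).
    ratio_diff = abs(w1 / h1 - w2 / h2)
    return size_diff > size_tol or ratio_diff > ratio_tol


same = page_differs((595, 842), (596, 843))   # 1 pixel off in each direction
wider = page_differs((595, 842), (600, 842))  # 5 pixels wider
```

With the defaults, the first pair stays within both tolerances, while the second pair exceeds the 2 pixel size tolerance.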
Deprecated: these properties will be removed in a future version.
The following properties configure the output of i-net PDFC and the logging. These profile settings are deprecated and will be removed soon. Use the Settings or the CommandLine arguments instead.
Property Name | Description |
---|---|
CREATE_DIFFIMAGES | Specifies if a PNG image with the marked difference will be created for each pair of pages that contains differences. Possible values are: false, first, second and true. This creates the difference image for none, the first, the second or both files. The default value is: false. |
CREATE_ORIGIMAGES | Specifies if a PNG image with the original content will be created for each compared page. The default value is: false. |
CREATE_XORIMAGES | Creates a (negated) XOR image for any pair of pages with differences. The image will be stored as a PNG in the differences directory of the current comparison. If CREATE_DIFFIMAGES is enabled as well, the XOR image will be drawn onto the image created by CREATE_DIFFIMAGES between the two actual page images. The default value is: false. |
IMAGE_SCALE_FACTOR | Defines a scale factor for the generated images (original and difference images). The default value is: 1, i.e. no scaling. |
LOG_FILE | Specifies the file where logged information is to be stored. If a file is specified, the logging is written to the file, otherwise the logging is written to the console. Default is empty, logging to the console. |
LOG_LEVEL | Specifies the logging level. Available values: OFF switches the output off completely; ERROR logs error messages; WARN contains all messages of the ERROR level and additionally reports irregularities during the execution; INFO (default) contains all messages of the WARN level and additionally describes settings and environment attributes; ALL displays the maximum information during the PDFC execution, including any debug information. |
MAX_ERRORS_PER_FILE | Sets the maximum number of differences for the console or log output. All further differences will be counted but not shown in detail. The default value is: 100. Use the value -1 for unlimited. |
EXPORT_PDF_ALWAYS | Specifies whether the PDF export function should create a file for any comparison (value true) or only in case of differences (value false). The default value is: false. |
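The effect of MAX_ERRORS_PER_FILE as described in the table can be pictured with a short Python sketch. The function `limit_report` is hypothetical and only mirrors the described behavior (details for the first differences, a count for the rest, -1 for unlimited); it is not taken from i-net PDFC.

```python
def limit_report(differences, max_errors=100):
    """Split differences into those shown in detail and a count of the rest."""
    if max_errors < 0:
        # -1 means unlimited: everything is shown in detail.
        shown = list(differences)
    else:
        shown = list(differences[:max_errors])
    hidden_count = len(differences) - len(shown)
    return shown, hidden_count


shown, hidden = limit_report(["d1", "d2", "d3", "d4", "d5"], max_errors=3)
all_shown, none_hidden = limit_report(["d1", "d2", "d3"], max_errors=-1)
```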
Filters are an optional feature for the continuous mode. They help to remove redundant elements from the comparison and to overcome the issue that PDFs may not contain any information about the original text layout. Please note that these filters may not be exactly correct in every single case. Finding the original layout of a document depends heavily on the content of the document. The chance of correctly detecting a header rises with the number of pages available. It is therefore recommended to use the desktop or web application of i-net PDFC when activating filters, since they allow you to review the result of each filter.
Filters can be activated by adding them to the FILTERS property:
Property Name | Description |
---|---|
FILTERS | Specifies a comma-separated list of filters that will be executed before the actual comparison. |
If all available filter plugins are installed and activated in the configuration (server only), the following filter keys are available:
The continuous compare mode distinguishes between four types of content: text words, lines / shapes, images and annotations. Each of these types can be excluded from the comparison.
Types can be included or excluded using the COMPARE_TYPES property:
Property Name | Description |
---|---|
COMPARE_TYPES | Specifies a comma-separated list of types that will be included in the comparison. Default is 'TEXT, LINE, IMAGE, ANNOTATION' |
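A comma-separated value such as the COMPARE_TYPES default can be checked before it is written into a profile. The following Python sketch assumes that the four names from the default value are the only valid entries and that surrounding spaces are ignored; both points are assumptions, and `parse_compare_types` is a hypothetical helper.

```python
KNOWN_TYPES = ("TEXT", "LINE", "IMAGE", "ANNOTATION")


def parse_compare_types(value):
    """Split a comma-separated COMPARE_TYPES value and reject unknown names."""
    types = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [t for t in types if t not in KNOWN_TYPES]
    if unknown:
        raise ValueError(f"unknown compare types: {unknown}")
    return types


default_types = parse_compare_types("TEXT, LINE, IMAGE, ANNOTATION")
text_and_images = parse_compare_types("TEXT,IMAGE")
```

Leaving a type out of the list, as in the second call, corresponds to excluding it from the comparison.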
The TEXT type includes all text elements like words, numbers, punctuation and list items. The text comparison can be modified using the following properties:
Property Name | Description |
---|---|
DOCUMENT_LANGUAGE | This value defines the language for all text recognition plugins. If the configured language doesn't match the actual language of the document, the recognition errors will increase significantly. If the required language is not available, please have a look at the OCR help page on how to install further languages. The default value is 'auto-detect' in which case i-net PDFC will try to detect the language from the native text elements in the document, if any. In case the language cannot be detected, the client language or English will be used. |
TEXT_ALIGN_RATIO | This value is the maximum allowed y-jitter for the text line identification. It is relative to the text height of the respective line. This value can be used to compensate rounding errors of different PDF generators. The default value is 0.15 |
COMPARE_TEXT_STYLES | A comma-separated list defining which text properties of matched words to compare. Available values are SIZE, COLOR, FONT, STYLES, ROTATION. The default value is 'true' which compares all properties. |
TOLERANCE_TEXT_SIZE | This property defines the tolerated difference in the text size as a ratio. It's only relevant in case COMPARE_TEXT_STYLES is set to true. The default value is 0.05 |
TOLERANCE_COLOR | Defines the maximum color difference per RGB or HSB channel for all paints. The value is the absolute difference for HSB and absolute * 255 for RGB. This value is used for both the text and the line comparison. The default value is: 0.01 which is 1% |
COMPARE_TEXT_CASE_SENSITIVE | This switch toggles the case sensitivity of the text comparison. If set to 'false', all text elements will be compared as lower case. This causes the comparison to run slightly slower and to take some more memory. The conversion to lower case will be performed using the default localization of the runtime. The default value is 'true' |
TOLERANCE_UNDERLINE_LENGTH | Specifies the maximum difference in percent by which the length of underlines may differ before it is viewed as a difference. The default value is: 0.1. The range is 0.0 - 10.0. This value is only used in the STRICT comparison mode. |
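TEXT_ALIGN_RATIO is relative to the text height, so the absolute jitter allowed grows with the font size. The Python sketch below shows that arithmetic for the default of 0.15. The function `same_text_line` and the use of a single baseline coordinate per word are simplifying assumptions, not a description of the actual line detection.

```python
TEXT_ALIGN_RATIO = 0.15  # default from the table above


def same_text_line(y_a, y_b, text_height, ratio=TEXT_ALIGN_RATIO):
    """Treat two words as part of one line if their y offset is small enough."""
    allowed_jitter = ratio * text_height
    return abs(y_a - y_b) <= allowed_jitter


# For a text height of 10, the allowed jitter is 0.15 * 10 = 1.5.
close_enough = same_text_line(100.0, 101.0, text_height=10.0)
too_far = same_text_line(100.0, 102.0, text_height=10.0)
```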
The LINE type includes all graphical elements except images. The line and shape comparison can be modified using the following properties:
Property Name | Description | Default | Range |
---|---|---|---|
COMPARE_LINE_STYLES | If set to 'true', the styles of all matched lines and shapes will be checked as well. This will compare the color, stroke and thickness of all lines. | 'true' | |
TOLERANCE_LINE_POSITION | Specifies the maximum number of pixels that the position of a line or curve can differ per axis before it is viewed as a difference. | 3 | 0 - 100 |
TOLERANCE_LINE_SIZE | Specifies the maximum number of pixels that the length of a line can differ in total before it is viewed as a difference. | 2 | 0 - 100 |
TOLERANCE_LINE_THICKNESS | Specifies the maximum difference in stroke thickness of two lines or curves (measured in pt) before it is viewed as a difference. | 1 | 100 |
TOLERANCE_COLOR | Defines the maximum color difference per RGB or HSB channel for all paints. The value is the absolute difference for HSB and absolute * 255 for RGB. This value is used for both the text and the line comparison. | 0.01 (1%) | 0.0 - 1.0 |
TOLERANCE_BOX_ROUND_EDGES | Specifies the maximum number of pixels (1 pixel is approximately 0.265mm) that a control point of a quadratic Bézier curve may differ in total before it is viewed as a difference. | 3 | 0 - 10 |
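For RGB colors, the TOLERANCE_COLOR description implies a per-channel limit of tolerance * 255, which is 2.55 for the default of 0.01. The following Python sketch shows that calculation for colors with channels from 0 to 255. The function name `rgb_within_tolerance` and the inclusive comparison at the limit are assumptions.

```python
TOLERANCE_COLOR = 0.01  # default from the table above (1%)


def rgb_within_tolerance(color_a, color_b, tolerance=TOLERANCE_COLOR):
    """Compare two (r, g, b) tuples with 0-255 channels, channel by channel."""
    limit = tolerance * 255  # 2.55 for the default tolerance
    return all(abs(a - b) <= limit for a, b in zip(color_a, color_b))


nearly_equal = rgb_within_tolerance((10, 10, 10), (12, 10, 10))  # off by 2
different = rgb_within_tolerance((10, 10, 10), (13, 10, 10))     # off by 3
```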
The IMAGE type includes all images. Note that comparing images may have a noticeable impact on performance. The image comparison can be modified using the following properties:
Property Name | Description | Default | Range |
---|---|---|---|
TOLERANCE_IMAGE_DISTANCE | Specifies the maximum number of pixels that the position of an image can differ before it is viewed as a difference. | 3 | 0 - 10 |
TOLERANCE_IMAGE_PIXEL_VALUE | Specifies the maximal allowed discrepancy of pixel values (Double) before it is viewed as a difference. | 0.05 | 0.0 - 1.0 |
TOLERANCE_IMAGE_SIZE | Specifies the maximum difference in percent that the area spanned by an image may differ before it is viewed as a difference. | 0.1 | 0.0 - 1.0 |
USE_PIXEL_MEDIUM_VALUE | This property of the image comparison specifies whether i-net PDFC should compare the medium values instead of single-pixel values. | 'true' | |
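When profile files are edited by hand, values can end up outside the ranges listed in the table. The Python sketch below checks a set of image settings against those ranges. The ranges are copied from the table; the helper `out_of_range` and the idea of passing the settings as a dictionary of strings are assumptions for illustration.

```python
# (minimum, maximum) per property, copied from the table above.
IMAGE_RANGES = {
    "TOLERANCE_IMAGE_DISTANCE": (0, 10),
    "TOLERANCE_IMAGE_PIXEL_VALUE": (0.0, 1.0),
    "TOLERANCE_IMAGE_SIZE": (0.0, 1.0),
}


def out_of_range(settings):
    """Return the names of settings whose value lies outside its range."""
    problems = []
    for name, value in settings.items():
        if name in IMAGE_RANGES:
            low, high = IMAGE_RANGES[name]
            if not low <= float(value) <= high:
                problems.append(name)
    return problems


problems = out_of_range({
    "TOLERANCE_IMAGE_DISTANCE": "3",
    "TOLERANCE_IMAGE_PIXEL_VALUE": "0.05",
    "TOLERANCE_IMAGE_SIZE": "1.5",
})
```

The same pattern can be extended with the ranges from the other property tables.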
Property Name | Description |
---|---|
COMPARE_ANNOTATIONS_DETAILED | Specifies whether annotation differences will be fully detailed like any other difference in the document (true). Default is 'false' |