Comparison Profile

A Compare Profile for PDFC contains the parameters and settings for comparing documents. Different Compare Profiles can produce very different results for the same pair of documents. It may therefore be necessary to adjust or optimize a profile for certain comparison scenarios.

Manage Compare Profiles

In the footer of the Compare Profile window you can manage Compare Profiles. The currently open profile can be duplicated, exported, published/unpublished and deleted. In addition, the settings of a Compare Profile can be imported.

Note: In this way, for example, Compare Profiles can be exported from the i-net PDFC GUI and imported into the server. This also works in the other direction. Exported Compare Profiles can be edited outside the application, which makes it possible to use settings that cannot be set in the configuration interface, for example headers and footers that are more than 100 pixels high. The possible settings can be found in the respective tables.

Default Profiles and Publishing

The default compare profiles provided by i-net PDFC can be activated or deactivated in the server configuration under Comparison > Profiles.

Any user with administrative permissions or the permission 'Manage Users and Groups' can publish a user profile either for all users or for selected users or groups. Once a profile is published, it will appear alongside the default profiles for any user who has access to the profile. The publishing state of a profile is displayed in the list of available profiles unless the profile is shared with everyone. This mechanism can, for instance, be used to augment the default profiles of the server.

Published profiles can be unpublished by any user with administrative permissions or the permission 'Manage Users and Groups'. They can only be modified by the owner or an administrator.

To customize a default profile or a write-protected shared profile, duplicate it to create a new writable profile.

Import/Export of profiles

Each profile can be stored to an external file by clicking on Export at the bottom of the profile panel. The export files are portable and can be used for any type of i-net PDFC installation - GUI, API and Server.

To import a profile, create a custom profile and select it as the active one. The Import label will then be available at the bottom of the profile configuration page. Click this label to select a file to import. Alternatively, drag and drop a profile XML file into the panel to load the settings.

The imported profile replaces all settings of the current profile.

Profile settings

A profile basically contains the settings for comparison mode, element comparison types and filters to be used. Each filter or comparison type may have additional options to fine-tune the feature.

Comparison mode

This option has the biggest impact on the comparison. Here are the differences between the default comparison mode and the "strict mode":

Default mode: Allows for parts of the document to be matched even when one part is located further down the document due to an inserted paragraph or element. Places an emphasis on the continuous flow of content being the same, as opposed to look/location on each individual page.

Strict mode: Each part of the document must be lined up in the same position in both documents in order to be seen as matching. This means that if a paragraph is inserted in the one document, all content underneath this paragraph will be seen as different to the other document since it will have moved. Places an emphasis on both location of elements AND content.
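The difference between the two modes can be illustrated with a small Python sketch. It is not how i-net PDFC is implemented: the function names are hypothetical, documents are reduced to lists of paragraph strings, and the flow-based matching uses Python's standard difflib as a stand-in for whatever matching the product performs.

```python
from difflib import SequenceMatcher


def strict_matches(left, right):
    """Position-based matching: element i can only match element i."""
    return [(i, i) for i, (a, b) in enumerate(zip(left, right)) if a == b]


def flow_matches(left, right):
    """Flow-based matching: equal runs may match even when shifted."""
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


# Hypothetical documents: the second one has an inserted paragraph.
left = ["Intro", "Terms", "Pricing", "Contact"]
right = ["Intro", "New paragraph", "Terms", "Pricing", "Contact"]

strict_result = strict_matches(left, right)
flow_result = flow_matches(left, right)

print(strict_result)  # only the paragraph before the insertion lines up
print(flow_result)    # content after the insertion is still matched
```

With the position-based approach, everything after the inserted paragraph counts as different because it has moved. The flow-based approach still pairs the shifted paragraphs.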

Filters and optimizations

i-net PDFC offers various specialized optimizations for comparing content of specific kinds. You can turn these optimizations on and off at will.

Compared Types

Text comparison

The text comparison includes all text elements such as words, numbers, punctuation and list elements. i-net PDFC will determine these elements as required and according to the rules of the system language. Such text will be compared element by element, so even if only a single character is changed, the whole word will be marked as different. This is because a slight change could be a typo, or it could completely change the meaning of the word. Textual content will be compared in natural reading order. This order may differ from the word order specified in the document, since some generators (especially for PDF) do not produce a meaningful word order.
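As a rough illustration of element-wise comparison, the following Python sketch splits text into word and punctuation tokens and reports whole tokens as changed. The tokenizer and the function names are assumptions made for this example. The real element detection in i-net PDFC follows language rules that are not reproduced here.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split into word/number tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def changed_elements(old, new):
    """Return (kind, old_tokens, new_tokens) for every non-equal region."""
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes


# Two swapped characters change the whole word, and here also its meaning.
result = changed_elements("The form is due today.", "The from is due today.")
print(result)
```

The two swapped characters are not reported individually. The complete word is the unit of difference.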

Deviation tolerance for text

The deviation tolerance for text sets the maximum allowed y-jitter for the text line identification. It is relative to the text height of the respective line. This value can be used to compensate rounding errors of different PDF generators.

This property defines the tolerated difference in the text size as a ratio. It is only relevant if COMPARE_TEXT_STYLES is set to true and the strict comparison mode is used.
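The idea of a y-jitter tolerance relative to the text height can be sketched in Python as follows. This is an assumption-laden illustration, not the product's line identification: the function name, the tuple layout and the rule of comparing against the first word of the current line are all invented for the example.

```python
def group_into_lines(words, tolerance):
    """Group words into text lines.

    words: list of (text, baseline_y, text_height) in reading order.
    A word joins the current line if its baseline deviates from the
    line's first word by at most tolerance * text_height of that word.
    """
    lines = []
    for text, y, height in words:
        if lines:
            _, line_y, line_height = lines[-1][0]
            if abs(y - line_y) <= tolerance * line_height:
                lines[-1].append((text, y, height))
                continue
        lines.append([(text, y, height)])
    return [[word[0] for word in line] for line in lines]


# Hypothetical words whose baselines differ slightly due to rounding.
words = [
    ("Total", 100.0, 10.0),
    ("amount", 100.4, 10.0),
    ("due", 99.7, 10.0),
    ("Next", 112.0, 10.0),
]

tight = group_into_lines(words, tolerance=0.0)
relaxed = group_into_lines(words, tolerance=0.05)

print(tight)
print(relaxed)
```

With a tolerance of 0, the rounding differences split one visual line into three. A tolerance of 0.05 (half a unit at a text height of 10) keeps it together.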

Case sensitive comparison

If set to false, all text elements will be compared as lower case. This causes the comparison to run slightly slower and to use slightly more memory. The conversion to lower case is performed using the default localization of the runtime. The default value is 'true'.
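A minimal Python sketch of the effect of this switch is shown below. The function name is hypothetical, and Python's str.lower() is not locale-dependent, so it only approximates the locale-based conversion described above.

```python
def elements_equal(a, b, case_sensitive=True):
    """Compare two text elements, optionally ignoring case."""
    if case_sensitive:
        return a == b
    # Extra conversion step: this is where the additional time and
    # memory of the case-insensitive comparison would come from.
    return a.lower() == b.lower()


print(elements_equal("Invoice", "INVOICE"))
print(elements_equal("Invoice", "INVOICE", case_sensitive=False))
```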

Check text size

Verifies that the text size is the same in both documents.

Check text color

Verifies that the text color is the same in both documents.

Check font names

Verifies that the font names are the same in both documents.

Check text styles

Verifies that the text styles are the same in both documents.

Check non-semantic white spaces

Check for changes in white spaces and line breaks that are not semantically relevant. A common example is the removal of a white space between a word and the adjacent comma. Such changes are merely stylistic and do not change the meaning of the content. Thus these changes belong to the category 'Modified Styles'.
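One way to picture this classification is the Python sketch below. It assumes that a change is non-semantic when both texts are identical after collapsing white space and removing spaces before punctuation. This rule and the function names are assumptions for illustration. The actual rules used by i-net PDFC are not documented here.

```python
import re


def normalize(text):
    """Collapse white space and line breaks, drop spaces before punctuation."""
    text = re.sub(r"\s+", " ", text.strip())
    return re.sub(r" (?=[,.;:!?])", "", text)


def classify_change(old, new):
    """Return a coarse category for the difference between two texts."""
    if old == new:
        return "Unchanged"
    if normalize(old) == normalize(new):
        return "Modified Styles"   # only non-semantic white space differs
    return "Modified Content"


print(classify_change("Hello , world", "Hello, world"))
print(classify_change("Hello,\nworld", "Hello, world"))
print(classify_change("Hello, world", "Hello, word"))
```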

Language

If you are going to use an optical character recognition filter such as 'Extract Text', i-net PDFC needs to know the language of the document. If the language analyzer plugin is available, you may choose 'Auto-detect' to let the analyzer detect the language automatically. If there is no such plugin, or if there are no native text elements in the document, you have to set the language explicitly. If the selected or detected language doesn't match the document language, the text recognition rate will be very poor.

If the language of the document is missing in the selection, please install this language manually. Further details can be found on the OCR help page.

Line and shapes comparison

Lines and shapes can be compared as well. This will compare each and every line in the document for differences. It is recommended to leave this option off unless necessary, since small movements and extra space can cause lines to be placed at different positions, leading to a multitude of detected differences. You can additionally decide whether line styles (such as dashed vs. dotted lines) are to be compared, and define the tolerance level for differences in line sizes (length/thickness). The tolerance levels here are measured in pixels: for example, a tolerance of "20 pixels" for the size would cause a line which is 50 pixels wide to be seen as identical to a line which is 30 pixels wide, but as different from a line which is 29 pixels wide.
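The pixel example above amounts to a simple absolute-difference check, sketched here in Python. The function name is hypothetical and the sketch covers a single size value only, not the full line comparison.

```python
def sizes_match(size_a, size_b, tolerance):
    """Two sizes match if they differ by at most `tolerance` pixels."""
    return abs(size_a - size_b) <= tolerance


# Values taken from the example in the text: tolerance of 20 pixels.
print(sizes_match(50, 30, tolerance=20))  # difference of 20 pixels
print(sizes_match(50, 29, tolerance=20))  # difference of 21 pixels
```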

Deviation tolerance for lines

In graphical interfaces, the tolerance slider adjusts the size, location (strict mode only) and thickness tolerance.

Image comparison

The image comparison of i-net PDFC will compare all images of a document according to their visual appearance. The comparison can be configured to tolerate color differences to some extent. For overlapping, connected or clipped images, the comparer will only take the visible pixels into account.

Note that the image comparison may have a notable impact on the comparison performance.
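A color tolerance that considers only visible pixels could look roughly like the following Python sketch. The data layout (nested lists of RGB tuples, with None for pixels that are clipped or covered), the per-channel tolerance and the function name are assumptions for this example and do not describe the product's internal algorithm.

```python
def differing_fraction(image_a, image_b, color_tolerance):
    """Fraction of visible pixels whose colors differ beyond the tolerance.

    Images are equally sized 2D lists of (r, g, b) tuples.
    None marks a pixel that is not visible (clipped or covered).
    """
    compared = 0
    differing = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a is None and pixel_b is None:
                continue  # hidden in both images, ignore
            compared += 1
            if pixel_a is None or pixel_b is None:
                differing += 1  # visible in only one image
            elif any(abs(ca - cb) > color_tolerance
                     for ca, cb in zip(pixel_a, pixel_b)):
                differing += 1
    return differing / compared if compared else 0.0


# Hypothetical 2x2 images with one hidden pixel.
image_a = [[(255, 0, 0), (0, 0, 0)], [None, (10, 10, 10)]]
image_b = [[(250, 0, 0), (0, 0, 0)], [None, (10, 10, 40)]]

print(differing_fraction(image_a, image_b, color_tolerance=0))
print(differing_fraction(image_a, image_b, color_tolerance=8))
print(differing_fraction(image_a, image_b, color_tolerance=30))
```

Visiting every pixel in this way also hints at why image comparison can be expensive for large or numerous images.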

Deviation tolerance for images

In graphical interfaces, the tolerance slider adjusts the tolerance for color, size and location (strict mode only).

Property Name: Detailed View
Description: Specifies whether i-net PDFC should compare images in blocks and show the differing blocks in the result when the difference is under 50%. This option increases resource consumption. The default value is false.

Annotation comparison

Annotations are optional and often editable content, usually in PDF documents. Since annotations are not part of the primary content of the document, they are ignored by the comparison by default. With this option you can choose to compare annotations as well.

Property Name: Detailed View
Description: Differences in annotations will by default be summarized into one marker per different annotation. To get distinct markers for each difference in a comparison, please activate this option.