When comparing documents, i-net PDFC aims to find the smallest possible number of differences. As a result, words can be recognized as identical even when they appear on both pages only by chance and do not belong to the same context. Typical examples are common words such as 'and' or 'the', and punctuation marks.
To reduce such unwanted hits within large replacements, select the option "Combine large text differences" in the filter area of the profile. The comparison then combines large replacements even if they contain a small number of identical elements.
The following example shows the result for a text replacement with the "Text only" profile:
Document 1 | Document 2 |
---|---|
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. | Nam liber tempor cum soluta nobis eleifend option. |
The word 'tempor' is recognized as equal, although the whole paragraph has been replaced and the word therefore appears in a different context. The "Combine large text differences" option can be used to correct this 'hit':
Document 1 | Document 2 |
---|---|
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. | Nam liber tempor cum soluta nobis eleifend option. |
Combining differences is a heuristic procedure, so erroneous combinations are possible, but very unlikely. In any case, content that actually differs is never marked as 'equal'. There is only a small chance that elements that are actually equal are marked as different.
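
The effect can be illustrated with a small Python sketch. It does not reproduce the actual algorithm of i-net PDFC, which is not described here. It uses the standard `difflib` module for a word-level diff of the example text. The function `combine_large_differences` and its thresholds `max_equal` and `min_changed` are hypothetical names and values, chosen only for illustration.

```python
from difflib import SequenceMatcher

DOC1 = ("Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam "
        "nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam "
        "erat, sed diam voluptua.").split()
DOC2 = "Nam liber tempor cum soluta nobis eleifend option.".split()


def word_diff(a, b):
    """Return a word-level diff as a list of (tag, words_a, words_b) tuples."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def combine_large_differences(ops, max_equal=1, min_changed=4):
    """Merge a short 'equal' run into its neighbours if both are large differences.

    Illustrative heuristic only; the thresholds are assumptions.
    """
    def size(op):
        # Size of a difference: the longer of its two sides, in words.
        return max(len(op[1]), len(op[2]))

    result = []
    i = 0
    while i < len(ops):
        op = ops[i]
        is_short_equal = op[0] == "equal" and len(op[1]) <= max_equal
        has_neighbours = bool(result) and i + 1 < len(ops)
        if (is_short_equal and has_neighbours
                and result[-1][0] != "equal" and ops[i + 1][0] != "equal"
                and size(result[-1]) >= min_changed
                and size(ops[i + 1]) >= min_changed):
            # Fold previous difference, the chance match, and the next
            # difference into one single replacement.
            prev, nxt = result[-1], ops[i + 1]
            result[-1] = ("replace",
                          prev[1] + op[1] + nxt[1],
                          prev[2] + op[2] + nxt[2])
            i += 2
        else:
            result.append(op)
            i += 1
    return result


ops = word_diff(DOC1, DOC2)
combined = combine_large_differences(ops)
print([tag for tag, _, _ in ops])
print([tag for tag, _, _ in combined])
```

In this sketch, the plain diff treats 'tempor' as an equal element between two replacements, while the combined result reports the whole paragraph as a single replacement. A combined difference can only grow, so the sketch never marks differing content as equal; at worst, a genuinely equal word is absorbed into a difference.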