public class SimpleDiff extends Object implements DiffAlgorithm
Overview of Algorithm
by bwm
The algorithm is optimised for situations where the input sequences have few repeated objects. If it is given input with many repeated objects it will report suboptimal changes. However, given appropriate input, it is fast, and linear in memory usage.
The algorithm consists of the following steps:
The first step is to compute a an equivalence set for the input data. The equivalence set is computed from objects that are in the original input sequence
eq(x) = the index of the first occurence of x in the original sequence.
With this equivalence function, the algorithm can compare integers rather than strings, which is considerably more efficient.
The second step is to compute the datastructure on which the algorithm will operate. Having computed the equivalence function in the previous step, we can compute two arrays where indx[i] = eqs(orig[i]) and jndx[i] = eqs(rev[i]). The algorithm can now operate on indx and jndx instead of orig and rev. Thus, comparisons are then on O(int == int) instead of O(Object.equals(Object)).
The algorithm now matches indx and jndx. Whilst indx[i] == jndx[i] it skips matching objects in the sequence. In seeking to match objects in the input sequence it assumes that each object is likely to be unique. It uses the known characteristics of the unique equivalence function. It can tell from the eq value if this object appeared in the other sequence at all. If it did not, there is no point in searching for a match.
Recall that the eq function value is the index earliest occurrence in the orig sequence. This information is used to search efficiently for the next match. The algorithm is perfect when all input objects are unique, but degrades when input objects are not unique. When input objects are not unique an optimal match may not be found, but a correct match will be.
Having identified common matching objects in the orig and revised sequences, the differences between them are easily computed.
Delta
,
Modifications: 27/Apr/2003 bwm Added some comments whilst
trying to figure out the algorithm 03 May 2003 bwm Created this
implementation class by refactoring it out of the Diff class to enable
plug in difference algorithms
Constructor and Description 

SimpleDiff() 
Modifier and Type  Method and Description 

protected Map 
buildEqSet(Object[] orig,
Object[] rev)
create a
Map from each common item in orig and rev to the
index of its first occurrence in orig 
protected int[] 
buildIndex(Map eqs,
Object[] seq,
int NF)
build a an array such each a[i] = eqs([i]) or NF if eqs([i]) undefined

Revision 
diff(Object[] orig,
Object[] rev)
Compute the difference between original and revised sequences.

protected int 
scan(int[] ndx,
int i,
int target) 
protected int scan(int[] ndx, int i, int target)
public Revision diff(Object[] orig, Object[] rev) throws DifferentiationFailedException
diff
in interface DiffAlgorithm
orig
 The original sequence.rev
 The revised sequence to be compared with the original.DifferenciationFailedException
 if the diff could not be computed.DifferentiationFailedException
 if the diff could not be computed.protected Map buildEqSet(Object[] orig, Object[] rev)
Map
from each common item in orig and rev to the
index of its first occurrence in origorig
 the original sequence of itemsrev
 the revised sequence of itemsCopyright © 2003–2019 The Sakai Foundation. All rights reserved.