The algorithm is slightly different from the one used by Google Refine: the replacement of extended Western characters is done in the third step rather than as the last step. This is mostly so that sorting works properly, since accented characters left in place during the sort would be ordered after all plain ASCII letters.
- change all characters to their lowercase representation
- remove all punctuation, whitespace, and control characters
- normalize extended western characters to their ASCII representation
- obtain all n-grams of the string
- sort the n-grams and remove duplicates
- join the sorted n-grams back together
```javascript
var fingerprint = // returns arispari
```
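The steps above can be sketched in plain JavaScript as follows. This is a minimal illustration, not this library's API: the function name `ngramFingerprint` and the default `n = 2` are hypothetical (the default is chosen to match the `arispari` example). Two further assumptions are labeled in the comments: "punctuation, whitespace, and control characters" is treated as anything that is not a Unicode letter or digit, and step 3 is approximated with NFD decomposition plus removal of combining marks, so characters without a decomposition, such as `ß` or `ø`, pass through unchanged.

```javascript
// Hypothetical sketch of the n-gram fingerprint steps; not this library's API.
function ngramFingerprint(input, n = 2) {
  // 1. Lowercase the whole string.
  let s = input.toLowerCase();

  // 2. Remove punctuation, whitespace, and control characters.
  //    Assumption: keep only Unicode letters (\p{L}) and digits (\p{N}).
  s = s.replace(/[^\p{L}\p{N}]/gu, "");

  // 3. Normalize extended Western characters to ASCII *before* sorting.
  //    Assumption: NFD splits "á" into "a" + U+0301, then combining marks are dropped.
  s = s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

  // 4. Collect every contiguous character n-gram.
  const grams = [];
  for (let i = 0; i + n <= s.length; i++) {
    grams.push(s.slice(i, i + n));
  }

  // 5. Remove duplicates and sort (default sort compares UTF-16 code units).
  const unique = [...new Set(grams)].sort();

  // 6. Join the sorted n-grams back together.
  return unique.join("");
}
```

With this sketch, `ngramFingerprint("Paris")` and `ngramFingerprint("Páris!")` both yield `"arispari"`. If the normalization ran last instead, the bigrams of `"páris"` would sort as `is`, `pá`, `ri`, `ár` (the accented `á` orders after every ASCII letter), giving `"ispariar"` after normalization, which shows why the step order matters.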