In the topic "distance measures of text attributes", Neil McGuigan wrote:
... if you're trying to calculate the distance between terms, and not documents, then I would look into the Levenshtein Edit Distance, which, I believe, is not (yet) implemented in RapidMiner.
The Levenshtein distance is included in an open source library I found on the net.
SimMetrics is an open-source, extensible library of similarity and distance metrics, e.g. Levenshtein distance, L2 distance, cosine similarity and Jaccard similarity. It provides float-based similarity measures between strings, as well as the typical unnormalised metric output. http://www.aktors.org/technologies/simmetrics/index.html
It is intended for researchers in information integration (II) and other related fields. It includes a range of similarity measures from a variety of communities, including statistics, DNA analysis, artificial intelligence, information retrieval, and databases.
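For anyone who wants to see what the two kinds of output look like before pulling in a library, here is a minimal Python sketch of the Levenshtein distance (the unnormalised count of edits) and a similarity scaled to the range 0 to 1. This is not SimMetrics code: the function names `levenshtein` and `similarity` are made up for illustration, and dividing by the length of the longer string is only an assumed normalisation, which may differ from what SimMetrics does.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance into [0, 1], where 1.0 means identical.
    Normalising by the longer length is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), so under the assumed normalisation the similarity is 1 - 3/7.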
How to install the SimMetrics library on Microsoft SQL Server: http://anastasiosyal.com/POST/2009/01/11/18.ASPX?