Topic: [SOLVED] Levenshtein Edit Distance and other string similarity measures
Retegniw
Newbie
*
Posts: 38


« on: March 17, 2013, 08:49:12 AM »

In the topic "distance measures of text attributes", Neil McGuigan wrote:

Quote
... if you're trying to calculate the distance between terms, and not documents, then I would look into the Levenshtein Edit Distance, which I believe, is not (yet) implemented in RapidMiner.

The Levenshtein distance is included in an open source library I found on the net.

Quote
SimMetrics is an open source extensible library of Similarity or Distance Metrics, e.g. Levenshtein Distance, L2 Distance, Cosine Similarity, Jaccard Similarity etc etc. SimMetrics provides a library of float based similarity measures between String Data as well as the typical unnormalised metric output.
It is intended for researchers in information integration, II, and other related fields. It includes a range of similarity measures from a variety of communities, including statistics, DNA analysis, artificial intelligence, information retrieval, and databases.
http://www.aktors.org/technologies/simmetrics/index.html

Source code:
http://sourceforge.net/projects/simmetrics/

Documentation:
http://www.coli.uni-saarland.de/courses/LT1/2011/slides/stringmetrics.pdf

How to install SimMetrics library on Microsoft SQL Server:
http://anastasiosyal.com/POST/2009/01/11/18.ASPX?

Regards

Roland
« Last Edit: March 19, 2013, 03:17:53 PM by RWingerter »
Marius
Administrator
Hero Member
*****
Posts: 1794



« Reply #1 on: March 18, 2013, 02:02:53 PM »

Hi Roland,

thanks for the input. Unfortunately, the quoted library is released under the GPL, which is not compatible with the licensing model of our Enterprise Edition, so it won't be integrated into the core of RapidMiner. It should nevertheless be possible to create an extension that integrates the library, but that won't get a high priority on our roadmap.
Of course, the community is free to implement a custom extension, which can also be published on the rapid-i marketplace.


Best regards,
Marius
Retegniw
Newbie
*
Posts: 38


« Reply #2 on: March 18, 2013, 04:33:39 PM »

Hi Marius,

thanks for your feedback. It would be really nice if we could use the edit distance in RapidMiner. Unfortunately, writing Java code is beyond my abilities, although the algorithm looks simple enough, cf.

http://en.wikipedia.org/wiki/Levenshtein_distance#Computing_Levenshtein_distance

Regards

Roland
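
For reference, the dynamic-programming approach described on the Wikipedia page linked above can be sketched in a few lines. This is only a minimal Python illustration, not RapidMiner or SimMetrics code; the function name `levenshtein` is made up for this example, and it assumes unit cost for each insertion, deletion and substitution.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between strings a and b.

    Illustrative sketch: every insertion, deletion and substitution costs 1.
    """
    # Keep the shorter string in the inner loop so the row buffers stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b. For the empty prefix it is just j.
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.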

awchisholm
Sr. Member
****
Posts: 393


« Reply #3 on: March 19, 2013, 02:34:17 PM »

Hello

You could always use R: the 'vwr' package contains a function for this.


Andrew

Retegniw
Newbie
*
Posts: 38


« Reply #4 on: March 19, 2013, 03:15:56 PM »

Thank you, Andrew. That's good to know.

Roland