I have received a couple of suggestions via email about how best to compare two text strings. In my most recent post I discussed the String::Compare module. Since then I have found some additional modules that do similar things, and I did my own personal test and review of each of them along with a few additional concepts. The modules and concepts I looked at are String::Similarity, Text::Levenshtein, String::Approx, Text::Similarity, Algorithm::HowSimilar, and a compressed-file concept.

It is important to remember that I am trying to compare music album titles and artists. These are short strings and may contain non-alphanumeric characters.

As I discussed in my last post, String::Compare was a great concept and certainly a good starting point, but it has a few flaws. The major flaw I found was that if string1 contained "&" and string2 contained "and", the match score would be low, generally below the level I would accept as an automatic match.

String::Similarity is my suggestion for short string comparison. Stated simply, the Levenshtein method counts how many changes are required to make string1 identical to string2, so "Foo" and "Fou" require one change and "Foo" and "Bar" require three. This module returns a value between 0 and 1, with 1 meaning identical.

Text::Levenshtein uses the same method, but it returns the distance, that is, the number of edits required to make the strings identical. This is not as useful for me, since I want to compare things on a percentage basis and this module returns the actual number of changes as an integer.

String::Approx is designed to work with strings and allows you to set specific parameters, such as the tolerance for different types of differences. This module is more complex than what I need.

Text::Similarity is almost identical to String::Similarity, except that it is designed to take two complete files and compare them. While very cool, this is not what I am looking for; I just need to compare two short strings.

Algorithm::HowSimilar uses the Unix diff algorithm to calculate the difference between two items. The major drawback is that if the strings have different lengths, the results start getting a little fuzzy.

The compressed-file concept was discussed on Perl Monks and, per the author, was discussed in Scientific American at one point. The idea is that the compressed versions of the two text files should have similar sizes, and an analysis of the resulting compressed files should reveal their similarity. It is a very interesting concept that I am sure works well, but as someone commented, it would not work very well for album titles because they are so short.

So far String::Similarity has worked well for me. I have found that a score above 95% is always correct, 85% to 95% is probably correct, 80% to 85% is a toss-up, and anything below 80% is almost always wrong. There are many other modules out there that also compute similarity; I just have not made it to testing them yet. One last module that I found while browsing is WordNet::Similarity. It uses a dictionary to determine how related two words are: while straight text comparison would tell you that "dog" and "cat" are not even close to similar, WordNet would recognize that they are both animals.
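To make the edit counting described for String::Similarity and Text::Levenshtein concrete, here is a minimal Python sketch. It is not a port of either Perl module: the function names are hypothetical, the 0-to-1 score uses an assumed normalization (1 minus the edit distance divided by the longer string's length) rather than String::Similarity's own formula, and `normalize_title` is a hypothetical pre-processing step for the "&" versus "and" problem. `classify` simply encodes the score thresholds reported in this post.

```python
# Sketch only: hypothetical helpers, not the API of any Perl module.


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep a match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as more
    edits are needed. Not String::Similarity's actual formula."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def normalize_title(s: str) -> str:
    """Hypothetical clean-up: lowercase, map "&" to "and", collapse spaces."""
    s = s.lower().replace("&", " and ")
    return " ".join(s.split())


def classify(score: float) -> str:
    """Buckets based on the thresholds reported in the post."""
    if score > 0.95:
        return "correct"
    if score >= 0.85:
        return "probably correct"
    if score >= 0.80:
        return "toss-up"
    return "probably wrong"


if __name__ == "__main__":
    print(levenshtein("Foo", "Fou"))  # 1
    print(levenshtein("Foo", "Bar"))  # 3
    a, b = "Simon & Garfunkel", "Simon and Garfunkel"
    raw = similarity(a, b)
    cleaned = similarity(normalize_title(a), normalize_title(b))
    print(round(raw, 2), classify(raw))          # 0.84 toss-up
    print(round(cleaned, 2), classify(cleaned))  # 1.0 correct
```

With these assumed definitions, "Simon & Garfunkel" versus "Simon and Garfunkel" needs three edits over 19 characters, so it scores about 0.84 before normalization and 1.0 after, which shows why mapping "&" to "and" before comparing can matter for album and artist names.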
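The compressed-file concept can be made concrete with the normalized compression distance (NCD), NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. NCD is one standard formulation of the idea; whether it matches the exact analysis from the Perl Monks discussion is an assumption. This Python sketch uses `zlib` as the compressor, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # zlib at its highest compression level stands in for "the compressor"
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: lower means more similar.
    Ideally near 0 for identical inputs and near 1 for unrelated ones;
    real compressors only approximate this."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog " * 50
    doc_b = "lorem ipsum dolor sit amet consectetur adipiscing elit " * 50
    print(ncd(doc_a, doc_a))  # lower: doc_a adds little new information to itself
    print(ncd(doc_a, doc_b))  # higher: the two texts share little structure

    # Short album titles: little repetition for the compressor to exploit
    print(ncd("Abbey Road", "Abbey Rd"))
    print(len(zlib.compress(b"Abbey Road", 9)))  # compare with the 10-byte input
```

On the two long, repetitive sample texts, the identical pair scores lower than the unrelated pair. For inputs as short as album titles, the compressor has almost no repetition to exploit and zlib's fixed header and checksum bytes are large relative to the input, which is consistent with the commenter's doubt about using this approach for titles.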