
Fuzzy Matching

Fuzzy matching (Wikipedia) is a technique for finding strings that match less than 100%. It makes it possible to find strings that are very similar but not identical. For example, if you have already translated a very similar sentence, it may be more useful to reuse the existing translation than to translate the new sentence from scratch. Consider the following sentence.

This is a sample

Suppose you have already translated it into German and French, and you now have three other sentences to translate.

This is one sample
This is another sample
These are samples

All three sentences are similar to the original sentence, and their meaning is similar or even the same. Fuzzy matching can calculate a match percentage for each string. The following table shows the fuzzy percentage of each string compared to the original string.

Sentence                  Fuzzy percent   Description
This is one sample        83%             The match is close. The translation of the original string can be reused here.
This is another sample    73%             The match is quite close. The translation of the original string can be reused if slightly modified.
These are samples         58%             The match is quite far. The translation of the original string may be reused if modified.
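
The percentages above can be approximated with a short sketch. The code below assumes a score based on the Levenshtein edit distance, normalized by the length of the longer string. This is an assumption made for illustration only, because this page does not describe the algorithm Soluling actually uses. The names levenshtein and fuzzy_percent are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_percent(a: str, b: str) -> float:
    """Assumed scoring: 100% minus the edit distance relative to the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


original = "This is a sample"
for candidate in ("This is one sample", "This is another sample", "These are samples"):
    print(f"{candidate}: {fuzzy_percent(original, candidate):.0f}%")
```

Under this assumed scoring the three sentences come out at about 83%, 73%, and 59%. The last value is one point away from the 58% in the table, so Soluling's engine presumably rounds or scores slightly differently.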

Fuzzy matching lets you reuse a much wider range of existing sentences than a perfect match does. However, whenever you use a translation provided by fuzzy matching, take great care and check the translation carefully before accepting it. For example, take a look at the following two sentences:

This is a very good bike
This is a very bad bicycle

Fuzzy matching gives a 73% match for those two sentences, even though their meanings are very different.

Soluling's translation memory uses fuzzy matching and segmentation to find the best translations. Fuzzy matching is also used when importing data. In both cases, there are two fuzzy options that you can set to configure the fuzzy engine for your needs. The options are percent and filter.

Percent

This specifies the threshold percentage. All matches that are equal to or greater than this limit are counted. The default value is 80%. It makes the fuzzy engine pass only those strings that are very similar to the string needed. You can increase this limit all the way to 100%. A value closer to 100% gives more accurate matches, but the possibility of finding at least one match decreases. If you lower the limit, you will get more matches, but they will be less accurate.
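
The effect of the threshold can be sketched as follows. The scoring function is the same assumed edit-distance score used for illustration elsewhere, not Soluling's documented algorithm, and find_matches with its threshold parameter is a hypothetical helper rather than part of any Soluling API.

```python
def fuzzy_percent(a: str, b: str) -> float:
    """Assumed scoring: Levenshtein distance normalized by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return 100.0 * (1 - previous[-1] / longest)


def find_matches(query: str, sources: list[str], threshold: float = 80.0):
    """Return (percent, source) pairs at or above the threshold, best first."""
    scored = [(fuzzy_percent(query, source), source) for source in sources]
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


memory = ["This is one sample", "This is another sample", "These are samples"]

# The default limit of 80% keeps only the closest sentence.
print(find_matches("This is a sample", memory))
# Lowering the limit to 70% returns more matches, but less accurate ones.
print(find_matches("This is a sample", memory, threshold=70.0))
```

With the limit at 80%, only "This is one sample" passes. At 70%, "This is another sample" is returned as well, and at 100% nothing in this small memory matches.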

Filter

Soluling's fuzzy engine uses a very efficient fuzzy matching algorithm. Still, if your translation memory contains tens of thousands of strings, it might take too long to compare all strings. This is why the application uses filtering. This is a special algorithm that quickly calculates if a string might have a close match to the search string. Filtering can greatly reduce the time needed for finding suitable matches. There are three levels of filtering: low, medium, and high. Medium is the default value, and it performs moderate filtering. If you want the fuzzy engine to work faster, set the filtering to high. If speed is not an issue for you, set the filtering to low.
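
One way such a quick pre-check could work is sketched below. It relies on the fact that the edit distance between two strings is never smaller than the difference in their lengths, so a candidate whose length is too different cannot reach the threshold and can be skipped without a full comparison. This is only an illustration under the assumption of an edit-distance-based score. How Soluling's filter works internally, and how the low, medium, and high levels map to it, is not described here. The names length_upper_bound and prefilter are hypothetical.

```python
def length_upper_bound(a: str, b: str) -> float:
    """Best percentage an edit-distance score could reach, judged by length alone."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    # At least (longest - shortest) edits are needed, so the score cannot
    # exceed 100 * shortest / longest.
    return 100.0 * min(len(a), len(b)) / longest


def prefilter(query: str, candidates: list[str], threshold: float = 80.0) -> list[str]:
    """Keep only candidates that could still reach the threshold."""
    return [c for c in candidates if length_upper_bound(query, c) >= threshold]


candidates = ["This is one sample", "This is another sample", "These are samples", "OK"]
# Only the survivors need the expensive full fuzzy comparison.
print(prefilter("This is a sample", candidates))
```

Here "This is another sample" and "OK" are dropped at the 80% limit because their lengths alone rule them out. The remaining two strings still need the full comparison to decide whether they really match.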

To make fuzzy matching faster, make sure you install Soluling on a computer with at least two CPUs. Soluling can utilize multiple CPUs when calculating fuzzy matches.