Varicent ELT Help Center

Fuzzy Matcher

Fuzzy Matcher is a fuzzy-matching tool that you can use to recognize matching data between two data sets. Use the Match Level slider to specify your desired degree of fuzziness. When Match Level is set to 100%, fuzzy matching is case-sensitive. At any lower setting, it is not case-sensitive.

Note

A lower Match Level means higher fuzziness and a broader scope for matching data. A higher Match Level means lower fuzziness and a more exact match when comparing data. Set the Match Level slider to 100% for an exact, case-sensitive match.
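The effect of the Match Level setting can be illustrated with a minimal Python sketch. This is not the algorithm Varicent ELT uses, which is not described here. It assumes the standard library's difflib.SequenceMatcher as a stand-in similarity score, and the function names similarity and is_match are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, as a percentage (0-100)."""
    # Assumption: difflib's ratio stands in for the tool's real scoring method.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_match(a: str, b: str, match_level: int) -> bool:
    """Decide whether two values match at the given Match Level (0-100)."""
    if match_level >= 100:
        # At 100 the comparison is exact and case-sensitive.
        return a == b
    # Below 100 case is ignored, and a lower level accepts looser matches.
    return similarity(a, b) >= match_level


print(is_match("Acme Corp", "acme corp", 100))        # differs only by case
print(is_match("Acme Corp", "acme corp", 90))         # case ignored below 100
print(is_match("Acme Corp", "Acme Corporation", 70))  # looser level, broader scope
```

In this sketch, lowering the level from 100 first removes case sensitivity and then progressively accepts values that differ by more characters.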

Input and output

This tool takes two data sets as input. You select the data source to use for matching, and Varicent ELT uses that data to match and join the two data sets. One input is the "messy", imperfect data, in which multiple rows may need to be matched to a single row in the other data source. The other input is the data set that contains unique IDs and serves as an "answer key".

When configuring the tool, the Source column is used to match the "messy" data to the "answer key" data. The Match column helps train the tool to correctly match the rows.

This tool joins the two data sets using a many-to-one method. Each row from the first input is mapped to at most one row from the second input, while a single row from the second input can be matched by many rows from the first. The output also adds a Similarity Measure column that shows the likelihood of the match being correct.
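The many-to-one behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the function fuzzy_join and its parameters messy_col and key_col are hypothetical, difflib.SequenceMatcher stands in for the real scoring method, and keeping unmatched rows in the output with an empty Similarity Measure is an assumption.

```python
from difflib import SequenceMatcher


def fuzzy_join(messy_rows, key_rows, messy_col, key_col, match_level):
    """Join each messy row to at most one answer-key row (many-to-one)."""
    output = []
    for row in messy_rows:
        best_key, best_score = None, 0.0
        for key in key_rows:
            # Assumption: case-insensitive difflib ratio, scaled to 0-100.
            score = SequenceMatcher(
                None, row[messy_col].lower(), key[key_col].lower()
            ).ratio() * 100
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= match_level:
            # Only the single best-scoring answer-key row is joined.
            output.append(
                {**row, **best_key, "Similarity Measure": round(best_score, 1)}
            )
        else:
            # Assumption: unmatched rows are kept without answer-key columns.
            output.append({**row, "Similarity Measure": None})
    return output


messy = [{"name": "Jon Smith"}, {"name": "John Smith"}, {"name": "Zzz"}]
answer_key = [
    {"customer_id": 1, "customer": "John Smith"},
    {"customer_id": 2, "customer": "Mary Jones"},
]

for joined in fuzzy_join(messy, answer_key, "name", "customer", match_level=80):
    print(joined)
```

Both "Jon Smith" and "John Smith" map to the same answer-key row, which is what makes the join many-to-one, while "Zzz" maps to no row at all.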

Note

The order of the conditions does not affect the end result of your match.

When to use this tool

Use this tool when you want to detect matching data between two data sets.

Usage example

Let's say you have two data sets. The first contains transaction data for purchases made by customers. This information is entered manually, so some rows contain misspellings or missing information. The second data set contains customer or account information. This data set is the "answer key", where each row represents a unique customer ID with no missing or incorrect information.

Without cleaning up the transaction data set, Varicent ELT would treat a misspelled name and a correctly spelled name as two different customers. We want to use the Fuzzy Matcher to fuzzy-match those rows to the same customer ID. When you run the tool, it fuzzy-matches each row in one data set to a maximum of one row in the other data set.
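The difference between exact and fuzzy grouping in this scenario can be sketched in Python. The customer names, IDs, and amounts below are made-up sample data, the function resolve_customer_id is hypothetical, and the standard library's difflib.get_close_matches is used as a stand-in for the tool's matching, with an assumed cutoff of 0.8.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical "answer key": one row per unique customer.
customers = {"Acme Corporation": "C001", "Globex Inc": "C002"}

# Hypothetical manually entered transactions, including misspellings.
transactions = [
    ("Acme Corporation", 120.0),
    ("Acme Corporaton", 80.0),
    ("Globex Inc", 50.0),
    ("Globx Inc", 25.0),
]


def resolve_customer_id(name, cutoff=0.8):
    """Return the ID of the closest customer name, or None if nothing is close."""
    matches = get_close_matches(name, list(customers), n=1, cutoff=cutoff)
    return customers[matches[0]] if matches else None


exact_totals = defaultdict(float)  # grouped by the name exactly as typed
fuzzy_totals = defaultdict(float)  # grouped by the fuzzy-matched customer ID
for name, amount in transactions:
    exact_totals[name] += amount
    fuzzy_totals[resolve_customer_id(name)] += amount

print(len(exact_totals), "customers when grouping by exact name")
print(dict(fuzzy_totals))
```

Grouping by the exact name splits each customer's purchases across the correct and misspelled spellings, while grouping by the matched ID combines them under one customer.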

Tip

For a more detailed explanation using a practical example, create a Fuzzy Matcher Example blueprint from the Apps tab.