Varicent ELT Help Center

Smart Matcher

Abstract

The Smart Matcher is a fuzzy-matching tool that you can use to train a model to recognize matching data between two data sets.

Input and output

This tool takes two data sets. The top input is the imperfect data, which can contain multiple rows that you want to match to a single row in the bottom input. You can replace the top data set in later exports. The bottom input is the data set that contains unique IDs, and it is the data used to match and join the two data sets. In practical terms, the top input is "messy" data that has incorrect or missing information in some columns, and the bottom input serves as the "answer key".

When configuring the tool, the Target column is used to match the "messy" data to the "answer key" data. The Matching columns help train the tool to correctly match the rows.

This tool joins the two data sets using a many-to-one method. Each row from the top input is mapped to, at most, one row from the bottom input. The output also adds a Probability column that shows the likelihood of the match being correct.
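The many-to-one behavior described above can be pictured with a short Python sketch. This is not how the Smart Matcher is implemented: it swaps the trained model for a plain string-similarity ratio from Python's standard difflib module, and the function names, the matched_id and score columns, and the 0.6 threshold are all assumptions made for illustration. The score here is a similarity ratio, not the tool's Probability value.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 string similarity; missing values are treated as empty text."""
    return SequenceMatcher(None, (a or "").lower(), (b or "").lower()).ratio()


def match_many_to_one(messy_rows, key_rows, target, id_column, threshold=0.6):
    """Map each messy row to at most one answer-key row (hypothetical helper)."""
    output = []
    for row in messy_rows:
        best_key, best_score = None, 0.0
        # Compare the target column against every answer-key row, keep the best.
        for key in key_rows:
            score = similarity(row.get(target), key.get(target))
            if score > best_score:
                best_key, best_score = key, score
        # Below the (assumed) threshold, the row stays unmatched.
        matched = best_key is not None and best_score >= threshold
        output.append({
            **row,
            "matched_id": best_key[id_column] if matched else None,
            "score": round(best_score, 2),
        })
    return output


# Illustrative data only: the top input is "messy", the bottom is the "answer key".
messy = [
    {"name": "Jon Smith", "amount": 120},
    {"name": "John Smith", "amount": 80},
    {"name": "Zzz", "amount": 5},
]
answer_key = [
    {"customer_id": 1, "name": "John Smith"},
    {"customer_id": 2, "name": "Mary Jones"},
]

matched_rows = match_many_to_one(messy, answer_key, target="name", id_column="customer_id")
for r in matched_rows:
    print(r)
```

Several messy rows can point at the same answer-key row, but no messy row ever points at more than one, which is the many-to-one shape of the join. A row with no sufficiently similar candidate keeps an empty match.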

When to use this tool

Use this tool when you want to train a model to fuzzy-match data sets. The tool gets better at matching as you add more data and build the pipe.

Usage example

Let's say you have two data sets. The first contains transaction data of purchases made by customers. This information is entered manually, so some rows contain misspellings or missing information. The second data set contains customer or account information. This data set is the "answer key" where each row represents a unique customer ID with no missing or incorrect information.

Without cleaning up the transaction data set, Varicent ELT would treat a misspelled name and a correctly spelled name as two different customers. We want to use the Smart Matcher to fuzzy-match those rows to the same customer ID. When you run the tool, it fuzzy-matches each row in the top data set to, at most, one row in the bottom data set.
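To see why the misspelling matters, here is a small Python sketch that totals purchases first by the raw name and then by a fuzzy-matched customer ID. The customer names, IDs, and amounts are made up, and difflib.get_close_matches from the standard library stands in for the trained model, so treat it as an illustration of the problem rather than of the tool itself.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical "answer key": correctly spelled name -> unique customer ID.
customers = {"Acme Corp": "C001", "Globex": "C002"}

# Hypothetical manually entered transactions; one name is misspelled.
transactions = [
    ("Acme Corp", 100),
    ("Acme Crop", 50),
    ("Globex", 75),
]

# Totals by raw name: the misspelling shows up as a separate customer.
raw_totals = defaultdict(int)
for name, amount in transactions:
    raw_totals[name] += amount

# Totals by customer ID after mapping each name to its closest known customer.
by_customer = defaultdict(int)
for name, amount in transactions:
    closest = get_close_matches(name, list(customers), n=1, cutoff=0.6)
    if closest:  # at most one match per transaction row
        by_customer[customers[closest[0]]] += amount

print(dict(raw_totals))
print(dict(by_customer))
```

Grouping by the raw name yields three apparent customers, while grouping by the matched ID folds the misspelled row into the correct account.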

Tip

For a more detailed explanation using a practical example, create a Smart Matcher Example blueprint from the Apps tab.