Regressor
Train the model to predict continuous or ranged numeric columns, such as height, age, sales, and so on.
Important
The Regressor tool is only available to users on Varicent Advanced Algorithm Library plans. If you are interested in using this tool, please contact your Varicent Customer Success Manager.
This tool evaluates all of the available regression models and picks the best one.
Note
The speed and quality slider is a spectrum rather than a choice between two options. This setting determines the number of machine learning models considered. Models that do not support the training data are automatically excluded.
This tool adds two columns to your data: a prediction and a probability. The probability is the model's certainty about the prediction.
When you run the tool, the data is automatically split: 80% of the data is used for training. The remaining 20% is used for testing. Each model being considered is trained and evaluated to select the one with the best score. This is done 5 times to predict the test values (the 20% of your data). The final score is the average of all 5 scores.
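The evaluation procedure described above can be sketched in plain Python. This is only an illustration, not Varicent's implementation. It assumes that the five rounds are the five folds of a 5-fold cross-validation, so each row lands in the 20% test set exactly once; the document does not state how the splits are drawn. The function names, the mean-predictor stand-in model, and the use of mean absolute error as the score are all hypothetical.

```python
import random
from statistics import mean


def five_fold_splits(n_rows, seed=0):
    """Yield (train_idx, test_idx) pairs; each test fold holds about 20% of the rows."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]
    for k in range(5):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_mean_model(train_targets):
    """Hypothetical stand-in model: always predicts the mean of the training targets."""
    avg = mean(train_targets)
    return lambda row: avg


def average_score(rows, targets, fit=fit_mean_model):
    """Train on 80%, score on the held-out 20%, repeat for all 5 folds, average."""
    scores = []
    for train_idx, test_idx in five_fold_splits(len(rows)):
        model = fit([targets[i] for i in train_idx])
        predictions = [model(rows[i]) for i in test_idx]
        actual = [targets[i] for i in test_idx]
        scores.append(mean_absolute_error(actual, predictions))
    return mean(scores), scores


# Made-up data for illustration only.
rows = list(range(20))
targets = [2.0 * x + 1.0 for x in rows]
final_score, fold_scores = average_score(rows, targets)
```

In this sketch, `final_score` is the average of the five entries in `fold_scores`. When several candidate models are compared, the same loop would run once per model and the model with the best average would win.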
Tip
You can configure this tool without using the configuration menu.
In the menu, start typing the first few letters of the tool name and press Tab to auto-complete. Then start typing the name of the column you want to use and press Tab to auto-complete.
When to use this tool
Use this tool when you want to predict numeric values.
Regressors solve continuous value problems. For example, if the values in the target column are 1 and 6, the predicted answer could be 5, a value that does not appear in the training data.
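To make the idea of a continuous prediction concrete, here is a minimal sketch that fits a straight line by ordinary least squares. It is not one of the models the Regressor tool necessarily uses. The helper `fit_line`, the single input column, and the data values are all hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: the only target values seen are 1 and 6.
xs = [0.0, 10.0]
ys = [1.0, 6.0]

slope, intercept = fit_line(xs, ys)

# Predict for a new input; the result need not equal any training target.
prediction = slope * 8.0 + intercept
```

Here `prediction` is 5.0, which lies between the two target values seen in training. A classifier, by contrast, could only answer with one of the labels it was trained on.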
Performance measures
You can select a model performance measure that dictates how Varicent ELT selects a winning model. The performance measures that you can choose from are:
Automatic: Select this measure to let Varicent ELT choose for you.
Pearson Correlation: How correlated the predictions are to the actual labels being predicted. The higher, the better (maxes at 1).
Mean Absolute Error: How close each guess is to the actual value on average.
Root Mean Squared Error: How close each guess is to the actual value on average. A square operation is used to punish larger differences more.
Symmetric Mean Absolute Percentage Error: An accuracy measure based on percentage (or relative) errors. Relative error is the absolute error divided by the magnitude of the exact value, and the result is scaled to a range of 0 to 2.
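The four measures above can be written out in a few lines of plain Python. This is a sketch for illustration, not the code Varicent ELT runs. The function names are hypothetical, and the SMAPE formula shown is the common symmetric form that is bounded by 0 and 2; the exact variant used by the product is not stated here.

```python
import math


def pearson_correlation(actual, predicted):
    """Linear correlation between predictions and actual values (at most 1)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mean_absolute_error(actual, predicted):
    """Average absolute difference; 0 is a perfect score."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like MAE, but squaring penalizes large misses more; 0 is perfect."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def smape(actual, predicted):
    """Assumed symmetric form: |a - p| / ((|a| + |p|) / 2), averaged; range 0 to 2."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += abs(a - p) / denom if denom else 0.0
    return total / len(actual)


# Made-up values: three exact predictions and one large miss.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
corr = pearson_correlation(actual, predicted)
sym = smape(actual, predicted)
```

With these values, the mean absolute error is 1.0 while the root mean squared error is 2.0, which shows how a single large miss weighs more heavily under the squared measure.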
How to read the data in this tool
In the row viewer, there are three tabs: Data, Stats and Tool.
The Data tab consists of your imported data. View all of the imported data in one spot.
The Stats tab consists of the statistics for your data. View all of the top values for each column.
The Tool tab is a visualization of additional insight into the tool and the data. The following columns are available:
Column importance: Displays the columns in order of importance.
Smart excluded: Displays the columns that don't predict the target column.
Full details
In the row viewer, on the Tool tab, there is a feature called Full details. Click to expand and explore more information about your model.
Configuration
Use the following configuration options to configure the Regressor tool.
Go to the Pipes module from the side navigation bar.
From the Pipes tab, click an existing pipe to open, or create a new pipe. To create a new pipe, read the Creating a pipe documentation.
In your pipe, add your data source.
Click + Tool.
In the Tools modal, search for Regressor. Click + Add Tool.
Tip
You can also find the Regressor tool in the Learn section.
Connect the tool to your dataset.
In the configuration pane, enter the following information:
Table 74. Regressor tool configuration
Field
Description
Train type
Select the train type that you want to use to train your data.
Target column
Select the column that you want the Regressor tool to predict.
Advanced section
Performance Measure
Select the type of performance measure to use:
Automatic: Let Varicent make the decision for you.
Pearson Correlation: How correlated the predictions are to the actual labels being predicted. The higher, the better (maxes at 1).
Mean Absolute Error: How close each guess is to the actual value on average. This is a golf metric (lower is better; best is 0).
Root Mean Squared Error: How close each guess is to the actual value on average. A square operation is used to punish larger differences more. This is a golf metric (lower is better; best is 0).
Symmetric Mean Absolute Percentage Error: An accuracy measure based on percentage (or relative) errors. Relative error is the absolute error divided by the magnitude of the exact value, and the result is scaled to a range of 0 to 2. This is a golf metric (lower is better; best is 0).
Speed versus Quality slider
Use the slider to indicate whether the Regressor should prioritize speed or quality when it runs.
Exclude columns
Select the column(s) that you want to exclude from the Regressor.
Smart exclude
Select this option to have Smart Exclude identify and automatically exclude columns that don’t help predict the target column after you build.