Using Rules and Decision Trees Settings


When you start working with a new database, it is hard to guess which settings will suit your case best. The right settings also depend on your aim: you may want only a few rules with the highest probability, or all rules that describe more than ten cases. The settings will differ between these two cases. Here are some recommendations on how to start:

  • It is better to start with high values for the "Probability" and "Rule cases" settings. If only a few rules are obtained, or none at all, lower these values and create the rules again.
  • The more rules or decision tree nodes are created, the longer it takes to analyse and output them. This is why low values for "Probability" and "Rule cases" result in longer analysis.
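
The "start high, then lower" approach from the recommendations above can be sketched in Python. This is only an illustration on a toy data set, not ESTARD Data Miner's own algorithm or API: the functions `mine_rules` and `relax_until_rules`, the sample records and the settings schedule are all hypothetical, and a "rule" here is assumed to be a single condition `field = value` that predicts a class value.

```python
from collections import Counter


def mine_rules(records, class_field, min_probability, min_cases):
    """Toy stand-in for a rule miner (hypothetical, not ESTARD's algorithm).

    Finds one-condition rules "field = value -> class" that cover at least
    min_cases records and hold with at least min_probability.
    """
    condition_counts = Counter()  # records matching "field = value"
    joint_counts = Counter()      # records matching "field = value" and the class
    for record in records:
        label = record[class_field]
        for field, value in record.items():
            if field == class_field:
                continue
            condition_counts[(field, value)] += 1
            joint_counts[(field, value, label)] += 1

    rules = []
    for (field, value, label), cases in joint_counts.items():
        probability = cases / condition_counts[(field, value)]
        if cases >= min_cases and probability >= min_probability:
            rules.append((field, value, label, cases, probability))
    return rules


def relax_until_rules(records, class_field, schedule):
    """Try settings from strictest to loosest; stop at the first that yields rules."""
    for min_probability, min_cases in schedule:
        rules = mine_rules(records, class_field, min_probability, min_cases)
        if rules:
            return (min_probability, min_cases), rules
    return None, []


# Hypothetical sample data with a True/False class field.
records = [
    {"plan": "basic", "region": "north", "churned": "True"},
    {"plan": "basic", "region": "south", "churned": "True"},
    {"plan": "basic", "region": "north", "churned": "False"},
    {"plan": "premium", "region": "south", "churned": "False"},
    {"plan": "premium", "region": "north", "churned": "False"},
    {"plan": "premium", "region": "south", "churned": "False"},
]

# Strictest settings first, then progressively lower values.
schedule = [(0.9, 4), (0.9, 3), (0.6, 2)]
settings, rules = relax_until_rules(records, "churned", schedule)
print(settings, rules)

# Lower settings produce more rules, and so more output to review.
loose_rules = mine_rules(records, "churned", 0.6, 2)
print(len(rules), len(loose_rules))
```

On this toy data the strictest settings produce no rules, the next step produces one, and the lowest settings produce several. This mirrors the trade-off described above: lower values give more rules and more output to analyse.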

Example of using settings

Suppose you want to analyse a database with 40 000 records. As the class field, you have selected a field that contains the values True/False. In this case, if you set "Rule cases" to 5, you would probably get thousands of rules that describe small data patterns. With such settings you will create overfitted profiles.
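
A quick calculation shows why such a low value is risky. The sketch below uses only the numbers from the example; the variable names are illustrative. It assumes, for simplicity, that rules do not share records, which real rules may do.

```python
total_records = 40_000
min_rule_cases = 5

# Share of the whole table that one minimal rule has to describe.
share_per_rule = min_rule_cases / total_records

# Upper bound on the number of rules if no two rules share a record.
max_disjoint_rules = total_records // min_rule_cases

print(f"{share_per_rule:.4%}")  # 0.0125%
print(max_disjoint_rules)       # 8000
```

A rule that has to describe only 0.0125% of the table can easily reflect noise rather than a real pattern.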

If it is hard to decide what value to set for "Minimum Number of Cases for a Rule", check "Classes Statistics", select the smallest value in the "Met In" column, and use it as the "Minimum Number of Cases for a Rule". Of course, if that smallest value is very small compared to the number of records, it is better to use a higher value (for example, if you have 40 000 records in a table and the smallest value in "Met In" is 1, it is better to ignore that value).
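
The same selection can be sketched in Python, assuming that the "Met In" column holds the number of records in which each class value occurs. The function `suggest_min_cases` and its `min_share` cut-off (below which a class is treated as too rare to use) are hypothetical, and the class counts are made up for illustration.

```python
from collections import Counter


def suggest_min_cases(class_values, min_share=0.001):
    """Return the smallest per-class record count, skipping very rare classes.

    min_share is an assumed cut-off: classes that occur in a smaller share
    of the records are ignored. Returns None if every class is too rare.
    """
    met_in = Counter(class_values)  # class value -> number of records
    total = len(class_values)
    eligible = [count for count in met_in.values() if count / total >= min_share]
    return min(eligible) if eligible else None


# Hypothetical class field for a table with 40 000 records.
class_values = ["False"] * 30_000 + ["True"] * 9_999 + ["Unknown"] * 1

starting_value = suggest_min_cases(class_values)
print(starting_value)  # 9999
```

The single "Unknown" record is ignored because it is tiny compared to the 40 000 records, so the suggestion comes from the next smallest class. Treat the result as a starting value to adjust, not as a final setting.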

The "Minimum rule probability" setting also directly influences the number of rules and, as a result, the time needed to create and output them. It is also recommended to start with higher values for this setting, for example 50%-90%. This value can also be correlated with the value in the "Met In %" column on the "Classes Statistics" page.

For decision tree creation, it is better to start with the minimum values in the settings and then keep experimenting with them, adjusting the minimum number of cases for a rule.

Try repeating the query with different settings until you obtain all the necessary combinations of data.