Step by Step Guide


To create your first Statistics, Decision Rules and Decision Trees, follow these simple steps:

Loading and setting your data

Load the Learning database

The first step in the data mining process is selecting the target database. It can be any database containing information you would like to use for data mining. This database is called the Learning database and can be loaded from the "Database" page or from the main menu. A wizard guides you through loading the database in a few steps.

Check the ID field

After you load the Learning database, the list of tables and the details of their fields are displayed on the "Database" page. If your database contains only one table, you don't need an ID field.

Analysing two or more tables together. ID field

If you've selected a database with several tables, you might want to analyse dependencies between these tables. In that case you need a field that links the records in these tables. Such a field should contain only unique values (each value should occur only once), it can be of any type, and it should appear in every table you want to analyse. This field is called the ID field. The program sets it automatically after you load a database, if it detects a suitable field. To change the ID field, select the necessary table from the "Main Table" menu and the field from the "ID Field" menu.

If your tables do not have ID fields, you can analyse each of them separately.
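The requirements for an ID field can be checked before loading the data. The following is a minimal Python sketch, not the detection logic the program itself uses. The function name, the table layout (a dictionary of row lists) and the sample field names are assumptions made for illustration.

```python
def is_valid_id_field(tables, field):
    """Return True if `field` appears in every table and its values are unique there."""
    for rows in tables.values():
        values = [row.get(field) for row in rows]
        # The field must be present in every record of the table
        if any(value is None for value in values):
            return False
        # Each value may occur only once within the table
        if len(set(values)) != len(values):
            return False
    return True


# Hypothetical sample data: two tables linked by "cust_id"
tables = {
    "customers": [{"cust_id": 1, "city": "Oslo"}, {"cust_id": 2, "city": "Riga"}],
    "orders": [{"cust_id": 1, "total": 40}, {"cust_id": 2, "total": 15}],
}

print(is_valid_id_field(tables, "cust_id"))
print(is_valid_id_field(tables, "city"))
```

In this sample, "cust_id" qualifies because it is unique in both tables, while "city" does not because it is missing from the "orders" table.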

Understanding comments to Table fields

The "Field Type", "% of Unique Values" and "Comments" columns in the list of table fields are helpful for further analysis.

"Field type" value contains the type of field in the database. It is necessary to know the field type if you have problems with setting your first query. For example, you are using Excel file with several columns, including "Sales" column, which should contain numeral values. Let's suppose that after selecting this column as the "class/examined field" you find out that controls for intervals creation are not displayed. Instead of it you see the list of values found in the column. And one of them is not numeral zero ("0"), but text "None". In case you have such uncleaned data, you will see that the program detected that the "Sales" column contains Text data, and not Numeral, and for good results data should be cleaned - "None" should be replaced by zero("0").

"% of Unique Values" contains the approximate % of unique values found in the column. If the field contains too much unique values, in most cases it might not give good results for analysis. Such fields are marked with grey in the controls on the "Initial Query" page. These fields are not recommended to be used for analysis, but it is important to remember that in your case using such field might reveal some unique information for you.

"Comments"- depending on "% of Unique values", this column will contain information about which functions the seleceted field can perform, for example, whether it can be the ID field, or whether it's not recommended for use in analysis.                       

Your first query

There are three types of queries in ESTARD Data Miner (EDM): the initial Statistics query, the Rules query and the Decision Tree query.

To perform a Rules query or a Decision Tree query, you first have to perform the initial Statistics query.

To create rules or trees you can either use the Query Wizard, in which case the Statistics query is performed automatically, or go to the "Initial Query" page and set everything manually. The difference between the two methods is that the Query Wizard performs the Statistics query every time, so the total data mining time is longer than when you use the controls on the "Initial Query" page.

Here we explain how to work with the "Initial Query" page.

Select the examined "class" field

To select the examined field, scroll the "Manage Fields" list and double click a field, or drag and drop it to the "Examined Field/Class" control.

Class values are all the unique values found in the examined field. For example, if a field contains only two values, "True" and "False", there are two classes in this field.

The best way to get valuable results from data mining is to analyze the field that contains the key information in a record. It can be of any type, but it cannot be the ID field. Besides, Text fields with a high share of unique values will probably not give good results. For example, a "Customer name" field containing the names of a company's customers will probably contain lots of unique values, one per customer. If you set the "Customer name" field as the class field, you will receive rules that are equivalent to the records themselves, that is, a description of every single customer. This doesn't apply to numeric fields, because for this type of field the classes are created manually and you can create as many intervals as you need.

Creating numerical classes

Creating a numerical class means creating intervals you want to analyze.

By default the program creates 2 equal intervals. You can use them or create your own. To change the number of intervals, change the "Intervals" value and click the "check" button.

By default the first interval is selected and can be adjusted. Its minimum and maximum values are shown in the "From" and "To" controls. Enter the desired values into these fields and click the "check" button to adjust the interval.

Intervals you create should not intersect. For example, intervals "1..10" and "11..20" do not intersect, while "1..10" and "6..20" do. If you create intersecting intervals, you will be asked to change the entered value. Use the corresponding buttons to add intervals to the list or delete them.

To decide which intervals to analyze, use the "Class Field Values" chart, which shows how many values fall into equal intervals.

If you choose a numeric field as the class field, all intervals you create are selected for analysis by default. Otherwise you are shown a list of all unique values found in the target field. Mark the values you would like to analyse.
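The intersection rule can be expressed as a short check. The Python sketch below treats each interval as a closed range (both ends included), which matches the "1..10" and "11..20" example. How the program itself treats interval boundaries is not stated here, so this is an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def intersecting_pairs(intervals):
    """Return every pair of closed intervals (start, end) that share a value."""
    return [
        (a, b)
        for a, b in combinations(intervals, 2)
        # Two closed ranges overlap when each one starts before the other ends
        if a[0] <= b[1] and b[0] <= a[1]
    ]


print(intersecting_pairs([(1, 10), (11, 20)]))
print(intersecting_pairs([(1, 10), (6, 20)]))
```

An empty result means the set of intervals is acceptable. Any pair that is returned would have to be adjusted.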
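The kind of information such a chart conveys can be reproduced with a simple count per equal-width interval. The sketch below is only an illustration of the idea, not the program's charting code. The function name, the number of bins and the sample values are assumptions.

```python
def count_in_equal_intervals(values, bins=2):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if width == 0:
            index = 0  # all values are identical
        else:
            # The maximum value is placed in the last interval
            index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical values of a numeric class field
sales = [1, 2, 3, 10, 11, 20]
print(count_in_equal_intervals(sales, bins=2))
```

Intervals that hold very few values may be worth merging, and crowded ones may be worth splitting, before the query is run.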

Select fields to be analysed

After selecting the class field, you need to select at least one other field from the "Manage Fields" list. To add a field, double click it in the "Manage Fields" list, or drag and drop it to the "Fields List". You can also use the popup menu or the buttons below the list to manage the fields list.

Fields marked in grey have a high % of unique values and are not recommended for use. Nevertheless, such fields might give you important results, so it is important to repeat your analysis several times before you finish.

Click the "Recommended" button to select all recommended fields automatically.

 
 
Start your first request and view its results

Click the "Process Statistics Query" button on the "Initial Query" page. A notification of a successful query will be displayed.

Now you can view statistics on the "Statistics" page and create rules and decision trees.

Viewing statistics

After obtaining statistics you can view the following data:

  • Classes Statistics - all values found in the class field, with their count and percentage.
  • Fields Statistics - all values found in the fields selected for analysis (except the class field and ID fields), with the number of cases for each value.
  • Exceptions - unexpected groups of data found in fields.
  • Field Values and Class - dependencies between values in the class field and the other selected fields.
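As an illustration of what "Classes Statistics" contains, the count and percentage of each class value can be computed as in the Python sketch below. It only mirrors the description in the list above, and the function name and sample data are assumptions.

```python
from collections import Counter


def class_statistics(class_values):
    """Map each class value to (number of records, percentage of all records)."""
    counts = Counter(class_values)
    total = len(class_values)
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}


# Hypothetical class field with two classes
print(class_statistics(["True", "False", "True", "True"]))
```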
 
 
 
 
 
 
 
 
 
Create Profiles (Rules) and view results

Before creating rules, make sure you've created Statistics. Then check the current rules query settings (on the "Initial Query" or "Query Parameters" page). It is better to start with higher values and then repeat rules creation with lower settings. The first time you might get no rules at all, or only a small number of rules, but if you are working with a large database with many thousands of records, this greatly improves performance and helps you avoid "overfitted" rules.

For example, suppose you want to analyse a database with 40 000 records, and the class field you've selected contains the values True/False. If you set "Minimum Number of Cases for a Rule" to 5, you would probably get thousands of rules that describe small data patterns. If it is hard to decide what value to set for "Minimum Number of Cases for a Rule", check "Classes Statistics", find the smallest value in the "Met In" column and use it as the "Minimum Number of Cases for a Rule".

The "Minimum rule probability" setting also directly influences the number of rules and, as a result, the time needed to create and output them. It is recommended to start with higher values for this setting too, for example 50%-90%.
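The effect of the two settings can be pictured as a filter over candidate rules. The Python sketch below is a simplified model, not the program's rule generation algorithm. The rule representation (a dictionary with "cases" and "probability" keys), the function names and all numbers are hypothetical.

```python
def suggested_min_cases(class_counts):
    """Pick the smallest class size, as suggested for 'Minimum Number of Cases for a Rule'."""
    return min(class_counts.values())


def filter_rules(rules, min_cases, min_probability):
    """Keep only rules that cover enough cases and are probable enough."""
    return [
        rule
        for rule in rules
        if rule["cases"] >= min_cases and rule["probability"] >= min_probability
    ]


# Hypothetical class sizes for a True/False class field with 40 000 records
class_counts = {"True": 31000, "False": 9000}

# Hypothetical candidate rules
rules = [
    {"class": "True", "cases": 12000, "probability": 0.82},
    {"class": "False", "cases": 5, "probability": 1.0},      # tiny pattern
    {"class": "False", "cases": 9500, "probability": 0.40},  # too uncertain
]

threshold = suggested_min_cases(class_counts)
print(threshold)
print(filter_rules(rules, threshold, 0.5))
```

With strict thresholds only the first rule survives. Lowering either value lets more, and smaller, patterns through.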

Switch to the "Query Parameters" tab. Select classes and fields you would like to use for rules creation.

Start rules creation by pressing the "Create Rules" button.

The program will automatically switch to the "Rules" page after creating the rules.

Viewing rules

All rules are grouped by class and sorted by length. In the "Rule Details" form you can view details for each rule, or sort rules by length, probability and cases. After sorting, rules remain grouped by class. On this page you can also view the number of rules for each class, the undefined classes and the overall number of rules. You can also create a report or export the rules to a .TXT file.

Create Decision Tree and view results

Before creating the decision tree, make sure you've created Statistics. Use the default settings for your first decision tree. If the "Minimum number of cases for a branch" value is set to "1", this setting is turned off. If necessary, you can change this value to cut off branches that describe small data patterns. Switch to the "Query Parameters" tab. Select the classes and fields you would like to use for decision tree creation.

Start decision tree creation by pressing the "Create Decision Tree" button.

The program will automatically switch to the "Decision Tree" page after creating the tree. On this page you can view the obtained decision tree, create a report or export the tree to a .TXT file.
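The "Minimum number of cases for a branch" setting can be thought of as pruning. The Python sketch below shows the idea on a hypothetical nested-dictionary tree. It is an assumption about the general principle, not a description of how the program builds or stores its trees.

```python
def prune(node, min_cases):
    """Drop every branch that covers fewer than `min_cases` records."""
    kept = [
        prune(child, min_cases)
        for child in node.get("children", [])
        if child["cases"] >= min_cases
    ]
    return {**node, "children": kept}


# Hypothetical tree: one large branch and one small data pattern
tree = {
    "label": "root",
    "cases": 100,
    "children": [
        {"label": "income = high", "cases": 97, "children": []},
        {"label": "income = low", "cases": 3, "children": []},
    ],
}

print([child["label"] for child in prune(tree, 5)["children"]])
```

With a minimum of 1 every branch that covers at least one record is kept, which corresponds to the setting being turned off.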

Using BI Functions: WHAT-IF

Use Decision Rules and Decision Trees for WHAT-IF analysis

Once you've created rules or decision trees, you can use them not only for reports and analysis but also for fast WHAT-IF analysis of a case.

To perform WHAT-IF analysis, go to the "WHAT-IF" page. You will see the list of fields you've selected for the initial query.

For example, suppose you've created rules describing clients with a high risk of bad debt. Now you can analyse information about a new client with just a few clicks: enter the data you have about the new customer into the corresponding fields, make sure that the "Use Rules" option is marked in the "Analyse Case Options" dialog (opened from the "WHAT-IF" page) and press the "Analyse" button. If some rules correspond to the entered data, they are displayed in the list below, grouped by the classes they describe. You will then know how much the new client has in common with clients with a high risk of bad debt. You can also view charts that present the results graphically.
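Conceptually, this kind of WHAT-IF analysis compares one new case with the conditions of every rule and groups the matching rules by class. The following Python sketch shows that idea under simplifying assumptions: each rule is a hypothetical dictionary with a "class" and exact-match "conditions", which is not necessarily how the program represents rules.

```python
from collections import defaultdict


def analyse_case(case, rules):
    """Group the rules whose conditions all hold for `case` by the class they describe."""
    by_class = defaultdict(list)
    for rule in rules:
        conditions = rule["conditions"]
        if all(case.get(field) == value for field, value in conditions.items()):
            by_class[rule["class"]].append(rule)
    return dict(by_class)


# Hypothetical rules about bad-debt risk
rules = [
    {"class": "high risk", "conditions": {"late_payments": "yes"}},
    {"class": "low risk", "conditions": {"late_payments": "no"}},
    {"class": "high risk", "conditions": {"late_payments": "yes", "region": "north"}},
]

new_client = {"late_payments": "yes", "region": "south"}
print(analyse_case(new_client, rules))
```

Here only the first rule matches the new client, so the result contains a single "high risk" entry.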

Use Decision Rules and Decision Trees to search a database or test obtained data

You can also apply the obtained rules and decision trees to search for target records in a database. To continue the example with clients with a high risk of bad debt: after creating rules or decision trees, go to the "Search a Database" page and make sure that the "Use Rules" option is selected in the options dialog. Then load the target database. Press the "Search" button on the "Search a Database" page and select the rules you want to use for the search. To receive records described by any of the selected rules, mark the "Join Rules with OR" option. To receive only records described by every selected rule, select the "Join Rules with AND" option. Now press the "Process Query" button. You will see a list of the matching records and their number.

If you don't receive any records, try changing the list of rules or mark the "Join Rules with OR" option.
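The difference between the two join options can be illustrated with a small Python sketch. It models each rule as a hypothetical dictionary of exact-match conditions. The function names and sample records are assumptions, and the program's own matching may be more elaborate.

```python
def matches(rule, record):
    """A record is described by a rule when every condition of the rule holds."""
    return all(record.get(field) == value for field, value in rule.items())


def search(records, rules, join="OR"):
    """Select records described by any rule (OR) or by every rule (AND)."""
    combine = any if join == "OR" else all
    return [r for r in records if combine(matches(rule, r) for rule in rules)]


# Hypothetical target database and selected rules
records = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "low"},
]
rules = [{"age": "young"}, {"income": "low"}]

print(len(search(records, rules, join="OR")))
print(len(search(records, rules, join="AND")))
```

The OR join returns all three records, while the AND join returns only the one record that satisfies both rules. This is why switching to OR is a useful first step when a search returns nothing.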