DATA MINING TECHNIQUES


What are the most widely used data mining techniques?

 

Classification

Classification is probably the most widely used data mining technique.

Most decision-making models are based on classification methods. These techniques, also called classifiers, categorise data (or entities) into pre-defined classes.

The use of classification algorithms involves a training set consisting of pre-classified examples. In the tax audit domain, the two classes could be compliant filings versus non-compliant filings, and the training set would be assembled from historical audits. The classifier calibration algorithm uses the pre-classified examples to determine a set of parameters required for proper discrimination between the classes. The algorithm then encodes these parameters into a model called a classifier. Once such a classifier is calibrated, it can assign new filings to either of the classes.
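The calibrate-then-assign workflow described above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular product: it uses a nearest-centroid classifier as a stand-in for "the classifier", and the two features (`declared_income_ratio`, `late_filings`) and all sample values are hypothetical.

```python
from statistics import mean

def calibrate(training_set):
    """Compute one centroid per class from (features, label) pairs."""
    by_class = {}
    for features, label in training_set:
        by_class.setdefault(label, []).append(features)
    # The "parameters" of this toy model are the per-class feature means.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_class.items()
    }

def classify(model, features):
    """Assign features to the class whose centroid is nearest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical historical audits: (declared_income_ratio, late_filings)
training_set = [
    ((0.95, 0), "compliant"),
    ((0.90, 1), "compliant"),
    ((0.55, 3), "non-compliant"),
    ((0.60, 4), "non-compliant"),
]

model = calibrate(training_set)            # calibration on pre-classified examples
new_label = classify(model, (0.58, 3))     # assignment of a new filing
```

Real classifiers, such as the decision trees or logistic regression models mentioned in this section, learn richer parameters, but the two phases (calibrate on labelled history, then assign new cases) are the same.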

Many algorithms can be used for classification, including decision trees, neural networks, and logistic regression.

With this technique, the data mining tool learns from examples or from stored data (data warehouses, databases, and so on) how to partition or classify objects. An object here can be a physical item, an action, or any other information that can be formalised. As a result, the data mining software formulates classification rules.

  • Example - customer database

    • Question - Is the customer a loyal one?

    • Typical rule formulated -
      if PURCHASED = monthly and PROFIT > $5000 and INCIDENTS = 0 then CUSTOMER_TYPE = LOYAL
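Once formulated, a rule like the one above can be applied directly to new records. The following Python sketch shows one way to do that; the function name and the fallback label `"OTHER"` are assumptions, since the rule itself names only the LOYAL class.

```python
def customer_type(purchased, profit, incidents):
    """Apply the example classification rule to one customer record."""
    if purchased == "monthly" and profit > 5000 and incidents == 0:
        return "LOYAL"
    # Assumption: the rule does not say what non-matching customers are called.
    return "OTHER"

# Hypothetical customer records: (purchase frequency, profit, incidents)
print(customer_type("monthly", 6200, 0))
print(customer_type("yearly", 9000, 0))
```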

 

Clustering (Segmentation)

Clustering is a data mining technique used to discover and explore groupings within data or entities. Clustering approaches are mainly used for segmentation; for example, clustering can be used to identify polluted soil areas. A clustering method partitions entities into distinct groups, also called “segments”. The main difference between classification and clustering is that clustering structures data without knowing anything about the classes, while classification assigns new observations to classes that are known a priori.

Cluster analysis is an exploratory, often visual, method that helps in understanding the structure of the data.
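To make the contrast with classification concrete, here is a minimal Python sketch of clustering with no class labels at all. It uses a basic one-dimensional k-means loop, which is just one of many clustering algorithms; the soil readings and the starting centres are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Partition values into len(centers) segments with a basic k-means loop."""
    centers = list(centers)
    segments = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centre.
        segments = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            segments[nearest].append(value)
        # Update step: move each centre to the mean of its segment.
        centers = [
            sum(segment) / len(segment) if segment else center
            for segment, center in zip(segments, centers)
        ]
    return segments, centers

# Hypothetical contaminant readings from soil samples (arbitrary units).
readings = [1.0, 1.2, 0.9, 8.1, 7.9, 8.4]
segments, centers = k_means_1d(readings, centers=[0.0, 10.0])
```

The algorithm is never told which samples are "polluted"; it only discovers that the readings fall into two groups, and interpreting those groups is left to the analyst.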

 

Association

Association rules are a basic type of pattern or regularity found in transactional data. This data mining technique has its origins in traditional retail marketing, where it is used to discover affinities between items that occur within a particular shopping trip (for example, which items typically co-occur in a shopping basket). Hence, an alternative name for this type of analysis is “market-basket analysis”.

From a set of transaction data (for example tax filings, or insurance claims), association rules can discover characteristics within a transaction that imply the presence of other characteristics in the same transaction. For two sets of characteristics X and Y, an association rule is usually denoted as X → Y to convey that the presence of the characteristic X in a transaction frequently implies the presence of characteristic Y.

With the help of association methods, data mining software creates rules that associate one attribute of a relation with another. Set-oriented approaches make discovering these rules very efficient.

  • Example - customer database in a supermarket

    • 56% of customers who purchase Article1 also purchase Article2

      56% is the confidence factor of the rule.
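The confidence factor of a rule can be computed directly from transaction data: it is the share of transactions containing the first item set that also contain the second. The Python sketch below shows this calculation together with support (the share of all transactions containing a given item set). The baskets are hypothetical and much smaller than a real supermarket database.

```python
def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, x, y):
    """Fraction of baskets containing x that also contain y."""
    x, y = set(x), set(y)
    with_x = [basket for basket in baskets if x <= basket]
    if not with_x:
        return 0.0
    return sum(y <= basket for basket in with_x) / len(with_x)

# Hypothetical shopping baskets, one set of articles per transaction.
baskets = [
    {"Article1", "Article2"},
    {"Article1", "Article2", "Article3"},
    {"Article1"},
    {"Article2", "Article3"},
    {"Article1", "Article3"},
]

rule_confidence = confidence(baskets, {"Article1"}, {"Article2"})
rule_support = support(baskets, {"Article1", "Article2"})
```

In this toy data, four baskets contain Article1 and two of those also contain Article2, so the rule has a confidence of 50%.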

 

Sequence/Temporal

Sequential patterns involve mining frequently occurring patterns of activity over a period of time. In many situations, not only may the coexistence of items within a transaction be important (which would be discovered by association rules algorithms), but also the order in which those items appear across ordered transactions, and the amount of time between transactions (which would be discovered by sequential pattern detection algorithms). Thus, sequential pattern detection methods are similar to association rules, except that they look for patterns across time (as opposed to patterns within transactions). This could be a pattern that represents a sequence of tax filings over time, or a sequence of purchases over time, etc.

What distinguishes sequence rules from the other data mining methods is the temporal factor.
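A minimal Python sketch of the idea, under simplifying assumptions: each customer has a list of time-stamped purchases, and we count how many customers bought one item before another. Real sequential pattern algorithms handle longer sequences and time gaps; the customer histories here are hypothetical.

```python
from collections import Counter

def sequential_pairs(histories):
    """Count, per customer, each ordered pair (a, b) where a occurs before b."""
    counts = Counter()
    for events in histories.values():
        # Sort by timestamp so that order reflects time, not input order.
        ordered = [item for _, item in sorted(events)]
        seen_pairs = set()
        for i, first in enumerate(ordered):
            for later in ordered[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)  # each customer counts once per pair
    return counts

# Hypothetical purchase histories: customer -> [(day, item), ...]
histories = {
    "c1": [(1, "printer"), (3, "ink"), (9, "paper")],
    "c2": [(2, "printer"), (4, "ink")],
    "c3": [(5, "ink"), (6, "printer")],
}

pair_counts = sequential_pairs(histories)
```

An association rule would treat "printer and ink" identically for all three customers, whereas the ordered counts separate the two customers who bought the printer first from the one who bought ink first.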
 
(c) 2004-2007 ESTARD.
ALL RIGHTS RESERVED.