Find out the meanings of the most important data mining terms.
| Antecedent |
The antecedent is one part of an association. When an association is
determined between two events, the first event is called the
antecedent. For example, in "If a customer buys pizza, he also buys beer
or lemonade in 89% of cases", "buys pizza" is the antecedent. |
| Association
|
The aim of an association algorithm is to create rules that describe how often events have
occurred together. For example, "When customers purchase a PC,
they also purchase a monitor 87% of the time." Such rules are
usually reported with a confidence value, the share of cases in which
the relationship holds (87% in this example). |
| Backpropagation
|
Backpropagation is a neural network training method. The network's weights are adjusted until the difference between the actual output and the desired output is minimal. |
| Classification
|
Classification is the process of determining which predefined
class an example belongs to. For example, given one class of loyal
and reliable clients and another class for all other clients,
identify from the data on a new client which class he or she belongs to. |
| Classification tree
|
Classification trees assign examples to predefined classes, that is,
they predict the value of a categorical variable. |
|
Data |
Data is information, facts, figures and values. Data can be collected through observation, measurement, experiments, surveys, etc.
|
|
Data Mining |
An information extraction activity. The goal of data mining is to
discover new knowledge by revealing hidden facts
contained in databases. Data mining methods include
statistical analysis, neural networks, machine learning, modelling and database technologies.
The result of data mining is often a set of decision rules
used to predict future results. |
|
Data Warehouse |
A data warehouse is a collection of data, structured and designed
for further querying, decision making, data mining and knowledge
discovery. |
| Decision Trees
|
A tree-like way of representing a collection of hierarchical
rules. Each rule is a node of the tree at which a decision has to be
made. Final nodes represent classes or values.
|
|
Deduction |
Deduction infers information that is a logical consequence
of the processed data.
|
|
Discrete Data |
Discrete data consists of a finite set of values. |
|
Leaf Node |
A leaf node is a node that cannot be split further. It represents a class, a group or a decision in a decision tree. |
|
Node |
A node in a Decision or Classification Tree. A node represents a
decision, grouping or condition for further splitting. |
|
Pattern Recognition |
Pattern recognition is a process of data classification. The process
is usually based on statistical information extracted from data.
The aim of pattern recognition is to detect relationships between
variables. Data mining techniques use automatic pattern discovery
to detect complicated non-linear relationships in data. The patterns
are usually extracted from measurement and observation data. |
|
Prediction |
Prediction (also called regression) is similar to classification.
The only difference is that in prediction the target attribute is
not discrete but continuous. Prediction is performed
to estimate the numerical value of the target attribute for selected
examples. |
|
Predictive Models |
Examples of predictive models are decision trees, neural networks, logistic regression and rule-based models. A predictive
model is used for scoring. The characteristics of an example are passed to the model as an input vector, and the model
returns an
output score. |
|
Training data |
This data is used to create or train a model. |
|
Validation |
Validation is the testing of previously obtained decision rules, decision trees or other models on data that differs from the
training data. |
|
Validation Set |
A validation set is a part of the data used for data mining. The data set is usually divided into two parts: a training (also
called learning) data set and a validation data set. While the training data set is used for building a model, the validation
set is used to test the model's efficacy. |
|
Variance |
Variance is a measure of the dispersion of values around their average: the mean of the squared deviations from it. |
|
Visualisation |
Visualisation is the process of displaying data graphically. Data can be represented as graphs, charts, models, etc. |
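
The percentages quoted in the Antecedent and Association entries above can be read as the confidence of a rule: among the cases that contain the antecedent, the share that also contain the other event. The following Python sketch shows one way such a figure might be computed. The basket data and the function name `rule_confidence` are made up for illustration and do not come from any particular data mining tool.

```python
# Hypothetical shopping baskets; the items echo the pizza example
# from the Antecedent entry. The data is invented for illustration.
baskets = [
    {"pizza", "beer"},
    {"pizza", "lemonade"},
    {"pizza", "beer", "lemonade"},
    {"pizza"},
    {"bread", "beer"},
]


def rule_confidence(baskets, antecedent, consequent):
    """Share of baskets containing every antecedent item that also
    contain at least one consequent item ("beer or lemonade")."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        # The antecedent never occurs, so the rule has no support.
        return 0.0
    matching = [b for b in with_antecedent if consequent & b]
    return len(matching) / len(with_antecedent)


confidence = rule_confidence(baskets, {"pizza"}, {"beer", "lemonade"})
print(f"confidence = {confidence:.0%}")  # prints "confidence = 75%"
```

Here four baskets contain pizza and three of them also contain beer or lemonade, so the rule "if pizza, then beer or lemonade" holds in 75% of the cases for this toy data.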