Clustering in Data Mining

Clustering is the process of grouping a set of abstract objects into classes of similar objects. The data objects in a cluster can be treated as one group.
In cluster analysis, we first partition the data set into groups based on data similarity and then assign labels to the groups.

Let's understand this with an example. Suppose we are a marketing manager with a tempting new product to sell. We are sure the product will bring an enormous profit, as long as it is sold to the right people. So how can we tell who is best suited for the product from our company's huge customer base? Clustering the customers by similarity, for example by purchasing patterns, gives us groups that can each be assessed and targeted separately.

A Good Clustering Algorithm Aims For:

High intra-cluster similarity: the data objects inside a cluster are similar to one another.
Low inter-cluster similarity: the data objects in one cluster are not similar to the objects in other clusters.
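
One way to make these two aims concrete is to compare average distances, treating a small distance as high similarity. The sketch below is only an illustration: the toy points, the use of Euclidean distance, and the helper names mean_intra_distance and mean_inter_distance are assumptions, not part of any particular library.

```python
from itertools import combinations, product
from math import dist


def mean_intra_distance(cluster):
    """Average Euclidean distance over all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_inter_distance(cluster_a, cluster_b):
    """Average Euclidean distance over all pairs taken from two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D toy data: two tight, well-separated groups.
cluster_a = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9)]
cluster_b = [(8.0, 8.0), (8.3, 7.7), (7.9, 8.4)]

intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

# For a good clustering, both intra values should be much smaller than inter_ab.
print(intra_a, intra_b, inter_ab)
```

If the intra-cluster averages come close to the inter-cluster average, the grouping is probably not separating the data in a useful way.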

What is a Cluster?

A subset of objects such that the distance between any two objects in the cluster is less than the distance between any object in the cluster and any object outside it.
A connected region of a multidimensional space with a comparatively high density of objects.

What is Clustering in Data Mining?

The process of grouping a set of abstract objects into classes of similar objects.
The process of partitioning a set of data objects into meaningful subclasses called clusters.
The data objects in a cluster can be treated as one group.

Applications of Cluster Analysis in Data Mining

It helps in classifying documents on the web for information discovery.
Cluster analysis is widely used in data analysis, market research, pattern recognition, and image processing.
It can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations.
It is also used in outlier detection applications such as detection of credit card fraud.
It helps marketers find distinct groups in their customer base and characterize them by purchasing patterns.

Why is Clustering used in Data Mining?

A clustering algorithm may give the best results on one type of data set but fail or perform poorly on others. The following requirements help in judging whether an algorithm suits the data at hand.

Scalability

Scalability in clustering means that as we increase the number of data objects, the time to perform clustering should grow roughly in line with the complexity order of the algorithm.
For example, each iteration of K-means clustering needs about n × k distance computations, where n is the number of objects in the data set and k is the number of clusters. For a fixed k and a fixed number of iterations, the running time is therefore O(n). If we increase the number of data objects tenfold, the time taken to cluster them should also increase roughly ten times, which is a linear relationship. If the time grows much faster than that, the implementation is probably doing unnecessary work.
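
The linear relationship can be checked on a small scale. The sketch below is a minimal, assumed implementation of Lloyd-style K-means (not a reference implementation) that counts distance evaluations instead of measuring wall-clock time, because the count is deterministic. The function names kmeans and random_points, the seeds, and the fixed iteration count are all illustrative choices. With k and the number of iterations held fixed, ten times as many points should cost ten times as many distance evaluations.

```python
import random
from math import dist


def kmeans(points, k, iterations, seed=0):
    """Plain K-means with a fixed iteration count.

    Returns (centroids, labels, number_of_distance_evaluations).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    labels = [0] * len(points)
    evaluations = 0
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            distances = [dist(p, c) for c in centroids]
            evaluations += k
            labels[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels, evaluations


def random_points(n, seed=1):
    """Hypothetical test data: n uniform random points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


_, _, small = kmeans(random_points(1_000), k=3, iterations=5)
_, _, large = kmeans(random_points(10_000), k=3, iterations=5)
print(small, large, large / small)
```

In practice the number of iterations needed for convergence can itself change with the data, so measured running time is usually only approximately linear.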

Interpretability

The outcomes of clustering should be interpretable, comprehensible, and usable.

Discovery of clusters with arbitrary shape

The clustering algorithm should be able to find clusters of arbitrary shape. It should not be limited to distance measures that tend to find only small spherical clusters.
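
To illustrate what a non-spherical cluster looks like, the sketch below builds two concentric rings and groups points that are connected through chains of close neighbours. This is a simplified connectivity-based approach, loosely in the spirit of density-based methods but without their noise handling. The function names ring and connected_clusters, the ring sizes, and the eps threshold are assumptions chosen for the example.

```python
from math import cos, sin, pi, dist


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius around the origin."""
    return [
        (radius * cos(2 * pi * i / n), radius * sin(2 * pi * i / n))
        for i in range(n)
    ]


def connected_clusters(points, eps):
    """Give the same label to points linked by a chain of neighbours within eps."""
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already reached from an earlier point
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Inner ring (radius 1) and outer ring (radius 5) share the same centre.
points = ring(1.0, 40) + ring(5.0, 40)
labels = connected_clusters(points, eps=1.0)
print(sorted(set(labels[:40])), sorted(set(labels[40:])))
```

Both rings are centred on the origin, so a method that assigns each point to the nearest of two centroids would split the plane along a straight line and could not separate the inner ring from the outer one.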

Ability to deal with different types of attributes

The algorithm should be applicable to any kind of data, such as interval-based (numeric) data, binary data, and categorical data.
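
One common way to compare records with mixed attribute types is to compute a per-attribute dissimilarity between 0 and 1 and average the results, an idea similar to Gower's dissimilarity. The sketch below is a minimal illustration under stated assumptions: the customer records, the attribute names, the age range, and the function mixed_distance are all hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity in [0, 1] for two records (dicts).

    numeric_ranges maps each numeric attribute to its (min, max) in the data
    set. Attributes not listed there are treated as binary or categorical
    and compared by equality.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            low, high = numeric_ranges[key]
            total += abs(a[key] - b[key]) / (high - low)  # scaled difference
        else:
            total += 0.0 if a[key] == b[key] else 1.0  # simple mismatch
    return total / len(a)


# Hypothetical customer records: one numeric, one binary, one categorical attribute.
ranges = {"age": (18, 78)}
alice = {"age": 30, "owns_car": True, "segment": "retail"}
bob = {"age": 36, "owns_car": True, "segment": "retail"}
carol = {"age": 60, "owns_car": False, "segment": "wholesale"}

d_alice_bob = mixed_distance(alice, bob, ranges)
d_alice_carol = mixed_distance(alice, carol, ranges)
print(d_alice_bob, d_alice_carol)
```

Scaling the numeric attribute by its range keeps it from dominating the binary and categorical attributes, which can only contribute 0 or 1.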

Ability to deal with Noisy Data

Databases contain data that is noisy, missing, or incorrect. Some algorithms are sensitive to such data and may produce poor-quality clusters.
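
A small numeric illustration of this sensitivity: a cluster centre computed as a mean moves a long way when a single erroneous value is added, while a median barely moves. The values below are made up for the example, and the comparison is only a sketch of why centre definitions matter, not a full clustering algorithm.

```python
from statistics import mean, median

# Hypothetical one-dimensional cluster, then the same cluster with one bad record.
clean = [10, 11, 12, 13, 14]
noisy = clean + [500]

mean_shift = abs(mean(noisy) - mean(clean))        # how far the mean moves
median_shift = abs(median(noisy) - median(clean))  # how far the median moves

print(mean_shift, median_shift)
```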

High Dimensionality

The clustering tool should be able to handle not only low-dimensional data but also high-dimensional data spaces.
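
One reason high-dimensional data is harder to cluster is that distances tend to become more alike as the number of dimensions grows, so "near" and "far" are less distinct. The sketch below measures this on uniform random data; the function name relative_contrast, the sample size, the seed, and the chosen dimensions are assumptions made for illustration.

```python
import random
from math import dist


def relative_contrast(dimensions, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


low_dim = relative_contrast(2)
high_dim = relative_contrast(500)

# A smaller value means the nearest and farthest points are harder to tell apart.
print(low_dim, high_dim)
```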
