Data Mining: Classification VS Clustering (cluster analysis)

Standard

For someone who is new to Data mining, classification and clustering can seem similar because both data mining algorithms essentially “divide” the datasets into sub-datasets; But there is difference between them and this blog-post, we’ll see exactly that:

CLASSIFICATION CLUSTERING
  • We have a Training set containing data that have been previously categorized
  • Based on this training set, the algorithms finds the category that the new data points belong to
  • We do not know the characteristics of similarity of data in advance
  • Using statistical concepts, we split the datasets into sub-datasets such that the Sub-datasets have “Similar” data
Since a Training set exists, we describe this technique as Supervised learning Since Training set is not used, we describe this technique as Unsupervised learning
Example:We use training dataset which categorized customers that have churned. Now based on this training set, we can classify whether a customer will churn or not. Example:We use a dataset of customers and split them into sub-datasets of customers with “similar” characteristics. Now this information can be used to market a product to a specific segment of customers that has been identified by clustering algorithm

If you want to learn about Data Mining, check out the “free Book in PDF format: Mining the massive data-sets”.

Advertisements

12 thoughts on “Data Mining: Classification VS Clustering (cluster analysis)

  1. Ann

    So the biggest difference between the two is that classification is predetermined while cluster isn’t? And the sub-datasets with cluster, are they similar with each other or within each other? Thanks!

    Like

    • Right, segments are pre-determined for classification.

      If you do a hands-on session on clustering then that might help you with your second question – Let me know if you have any questions.

      Like

      • Md. Shahidul Islam

        Dear , I want to analysis on semi supervised data streams. By using a windows approach I want to establish this. So please , can you help me ? How can I do this ? Please let me suggest some tips.

        Thanks.
        Md. Shahidul Islam

        Like

Thank this author by sharing the article on social media. If you have any questions or comments, please leave a reply below:

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s