[video] Data Science is not NEW – it’s just that we live in a VERY special time!

  • Data Analysis is NOT new
  • Data Mining is NOT new
  • Predictive Analytic is NOT new
  • Machine Learning is NOT new
  • Statistics is NOT new
  • And Data Science is NOT new

So what’s new?

  • The rate at which data is produced.
  • The variety in Data that’s being produced.
  • The “amount” of data that’s being produced.

And we did not have Tools and Techniques before – But now we do! Indeed, We live in a VERY special time!

Here’s a nice 5 minute video titled “Data Science: Beyond Intuition”.

Link to video: http://vimeo.com/48456421  AND Thanks Ryan Swanstrom for sharing!


Back to basics: What is the difference between Data Analysis and Data Mining?


What is the difference between Data Analaysis and Data Mining:

1) One view is that: Data Mining is one particular form of Data Analysis.

difference between data mining and data analysis

One of the reason I researched about the difference between Data Analysis and Data Mining because I find that the terms are used Interchangeably and now I know why. It’s because Data Mining is considered as a particular form of Data Analysis.

2) I found other view that says:

Data Analysis is meant to support decision-making, support conclusions & Highlight note-worthy information. So when “Analyzing data” – we know what we want; we want answers to support our hypothesis; we want data in summarized form to highlight useful information.


Data Mining is meant for “Knowledge discovery” and “predictions”. So when “Mining data” – we look for undefined insights; We want the data to tell us something we didn’t knew before; We want to find patterns in the data that we had not anticipated.







Data Mining: Classification VS Clustering (cluster analysis)


For someone who is new to Data mining, classification and clustering can seem similar because both data mining algorithms essentially “divide” the datasets into sub-datasets; But there is difference between them and this blog-post, we’ll see exactly that:

  • We have a Training set containing data that have been previously categorized
  • Based on this training set, the algorithms finds the category that the new data points belong to
  • We do not know the characteristics of similarity of data in advance
  • Using statistical concepts, we split the datasets into sub-datasets such that the Sub-datasets have “Similar” data
Since a Training set exists, we describe this technique as Supervised learning Since Training set is not used, we describe this technique as Unsupervised learning
Example:We use training dataset which categorized customers that have churned. Now based on this training set, we can classify whether a customer will churn or not. Example:We use a dataset of customers and split them into sub-datasets of customers with “similar” characteristics. Now this information can be used to market a product to a specific segment of customers that has been identified by clustering algorithm

If you want to learn about Data Mining, check out the “free Book in PDF format: Mining the massive data-sets”.