- Data Analysis is NOT new
- Data Mining is NOT new
- Predictive Analytics is NOT new
- Machine Learning is NOT new
- Statistics is NOT new
- And Data Science is NOT new
So what’s new?
- The rate at which data is produced.
- The variety of data being produced.
- The sheer “amount” of data being produced.
And before, we did not have the tools and techniques to handle this, but now we do! Indeed, we live in a VERY special time!
Here’s a nice 5-minute video titled “Data Science: Beyond Intuition”.
Link to video: http://vimeo.com/48456421. Thanks to Ryan Swanstrom for sharing!
What is the difference between Data Analysis and Data Mining?
1) One view is that Data Mining is one particular form of Data Analysis.
One of the reasons I researched the difference between Data Analysis and Data Mining is that I kept finding the two terms used interchangeably, and now I know why: Data Mining is considered a particular form of Data Analysis.
2) Another view I found says:
Data Analysis is meant to support decision-making, support conclusions and highlight noteworthy information. So when “analyzing data”, we know what we want: we want answers that support our hypothesis, and we want the data in summarized form so that useful information stands out.
Data Mining is meant for “knowledge discovery” and “predictions”. So when “mining data”, we look for insights we have not defined in advance: we want the data to tell us something we didn’t know before, and we want to find patterns in the data that we had not anticipated.
For someone who is new to Data Mining, classification and clustering can seem similar, because both kinds of data mining algorithm essentially “divide” a dataset into sub-datasets. But there is a difference between them, and in this blog post we’ll see exactly what it is:
| Classification | Clustering |
|---|---|
| We have a training set containing data that have already been categorized. | We do not know in advance which characteristics make data points similar. |
| Based on this training set, the algorithm finds the category that each new data point belongs to. | Using statistical concepts, we split the dataset into sub-datasets such that each sub-dataset contains “similar” data. |
| Since a training set exists, we describe this technique as supervised learning. | Since no training set is used, we describe this technique as unsupervised learning. |
| Example: We use a training dataset in which customers are labeled by whether or not they have churned. Based on this training set, we can classify whether a new customer will churn. | Example: We use a dataset of customers and split it into sub-datasets of customers with “similar” characteristics. This information can then be used to market a product to a specific customer segment identified by the clustering algorithm. |
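To make the contrast concrete, here is a minimal sketch in Python. It assumes invented toy customer records with two features (monthly spend and number of support calls); the names `classify`, `k_means`, `training_set` and `customers`, and all the numbers, are hypothetical. The post doesn't name specific algorithms, so k-nearest neighbours stands in for classification and k-means for clustering; both are common choices, not the only ones. The classifier is given the labelled training set, while k-means receives the same points with the labels stripped off and has to find the groups on its own.

```python
import math
import random

# Hypothetical toy data: each customer is (monthly_spend, support_calls).
# The numbers are invented purely to illustrate the two techniques.

# --- Classification (supervised): every training point already has a label ---
training_set = [
    ((20.0, 5.0), "churned"),
    ((25.0, 6.0), "churned"),
    ((22.0, 4.0), "churned"),
    ((80.0, 1.0), "stayed"),
    ((75.0, 0.0), "stayed"),
    ((90.0, 2.0), "stayed"),
]


def classify(point, training, k=3):
    """k-nearest neighbours: label a new point by majority vote of the
    k labelled training points closest to it (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# --- Clustering (unsupervised): the same points, but with no labels at all ---
customers = [features for features, _ in training_set]  # labels thrown away


def k_means(points, k, iterations=10, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ended up empty.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    print(classify((23.0, 5.0), training_set))  # churned
    centroids, clusters = k_means(customers, k=2)
    for centroid, members in zip(centroids, clusters):
        print(centroid, members)
```

With well-separated toy data like this, k-means ends up with the same two groups that the labels describe. On real data the clusters need not line up with any existing label, which is what makes clustering useful for discovering customer segments nobody defined in advance.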
If you want to learn more about Data Mining, check out the free book (available as a PDF) “Mining of Massive Datasets”.