Presented at #sqlpass summit 2015.
A quick blog post to let you know about a #sqlpass webinar on 1/15.
Description: The world is becoming more efficient. Today, seventy percent of the companies that graced the Fortune 1000 list a mere decade ago have vanished. Agility and survival are a function of innovation, culture, and the ability to predict the future. To that end, data analytics offers a lifeline, a means of survival that will drive productivity and continue to disrupt and redefine business. However, the resources available to today’s business leaders sit on two vastly different ends of the spectrum: on the one hand, highly technical academic resources, and on the other, largely fluffy overviews of value propositions and potential. The state of the industry shouldn’t be surprising. The same dynamics played out in the early years of the internet. Software providers, technical leaders, and consulting firms greatly benefit from mystifying the world of data analytics into something incomprehensible. That lack of conceptual understanding is incredibly risky and drives up the cost of analytics initiatives. This webcast aims to bridge the gap between technical data scientists and business leaders. Ultimately, this understanding will help to:
- Connect the strategic goals of business leaders with the capabilities of technical advisers
- Focus investments and initiatives within analytics and technology
- Distill immensely complex subject matter into comprehensible examples
- Accelerate the path to value and increase the ROI of analytics initiatives
Alex is a Predictive Analytics Architect in the Oil and Gas industry with a passion for distilling complexity into insights and evangelizing data science. His work has been featured on KDNuggets and he was recognized by DataScienceCentral as a top 180 blogger in 2014.
I hope to see you there!
In this post, I’ll list a few examples from various industries to help you differentiate between business intelligence and data science problems.
Some time back, I blogged about the “Business Analytics Continuum”, and in that post we saw that every organization has data, but organizations use their business data at different levels depending on their maturity. Excel (or another transactional reporting tool) is usually the starting point for any organization: it helps them see WHAT happened. They then advance to the next stage, where they gain the ability to slice and dice their data to find out WHY, and this capability is usually delivered using business intelligence tools and techniques. Once the data culture spreads, thanks to a successful business intelligence project, they soon start to outgrow their business intelligence capabilities by asking questions that need predictive capabilities. This is the advanced analytics and data science stage. To that end, here are 5 examples to help you differentiate between business intelligence and data science problems:
|Business Intelligence.(WHAT & WHY)||Data Science & advanced analytics.|
||Can you predict bike rentals on an hourly basis?|
||Can you predict the credit risk of the customer during contract negotiations stage?|
|Customer relationship management||
||Can you predict customer churn?|
||Can you predict whether a scheduled flight will be delayed by more than 15 minutes?|
||Can you classify a customer feedback comment into “positive”, “negative” or “neutral”?|
I hope this helps!
Insider’s Introduction to Microsoft Azure Machine Learning (AzureML)
Thu, Sep 18, 2014 12:00 PM – 1:00 PM EDT
Microsoft has introduced a new technology for developing analytics applications in the cloud. The presenter has an insider’s perspective, having actively provided feedback to the Microsoft team which has been developing this technology over the past 2 years. This session will 1) provide an introduction to the Azure technology including licensing, 2) provide demos of using R version 3 with AzureML, and 3) provide best practices for developing applications with Azure Machine Learning.
Mark is a consultant who provides enterprise data science analytics advice and solutions. He uses Microsoft Azure Machine Learning, Microsoft SQL Server Data Mining, SAS, SPSS, R, and Hadoop (among other tools). He works with Microsoft Business Intelligence (SSAS, SSIS, SSRS, SharePoint, Power BI, .NET). He is a SQL Server MVP and has a research doctorate (PhD) from Georgia Tech.
Hope to see you there!
Business Analytics Virtual Chapter’s Co-Leader
Classification algorithms are commonly used to build predictive models. Here’s what they do (simplified!):
Now, here’s the difference between Multi Class and Two Class:
If your test data needs to be classified into two classes, then you use a two-class classification model.
1. Is it going to Rain today? YES or NO
2. Will the buyer renew his soon-to-expire subscription? YES or NO
3. What is the sentiment of this text? Positive OR Negative
As you can see from the examples above, the test data needs to be classified into two classes.
Now, look at example #3: what is the sentiment of the text? What if you also want an additional class called “neutral”? Now there are three classes, and we’ll need to use a multi-class classification model. So, if your test data needs to be classified into more than two classes, then you use a multi-class classification model.
1. Sentiment analysis of customer reviews? Positive, Negative, Neutral
2. What is the weather prediction for today? Sunny, Cloudy, Rainy, Snow
I hope the examples helped. Next time you have to choose between multi-class and two-class classification models, ask yourself: does the problem ask you to predict two classes, or more than two? Based on the answer, you can pick your model.
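The question above (two classes, or more?) can be sketched in a few lines of Python. This is only an illustration: the function name classification_type is made up for this post, and it simply counts the distinct labels in your data.

```python
def classification_type(labels):
    """Return "two-class" or "multi-class" based on the distinct labels seen.

    Hypothetical helper for illustration; it only counts distinct classes.
    """
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("a classification problem needs at least two classes")
    return "two-class" if len(distinct) == 2 else "multi-class"


# Labels taken from the examples in this post
rain_labels = ["YES", "NO", "YES", "NO"]
sentiment_labels = ["Positive", "Negative", "Neutral", "Positive"]

print(classification_type(rain_labels))
print(classification_type(sentiment_labels))
```

The rain labels have two distinct values, so they call for a two-class model; the sentiment labels have three, so they call for a multi-class model.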
Example: Azure Machine Learning (AzureML) studio’s classifier list:
I hope this helps!
In this post, I’ll explain why the “Naive Bayes” machine learning algorithm has the word Naive in its name.
So here is the short answer:
It “assumes” that the features are independent. (In other words: there’s no relation between the features that are used to build the model.)
Let’s go a little deeper:
First up, a few basic pointers.
> It’s a machine learning algorithm used for classification
> It’s based on Bayesian Statistics.
> You can read about it here: http://en.wikipedia.org/wiki/Naive_Bayes_classifier
Now, what does it mean to say that it is Naive because it assumes that the features are independent?
Let’s take an example:
Suppose you are building a “credit card approval” model based on Income and CreditScore.
(Side note: for those who do not know what a credit score is, here you go: http://en.wikipedia.org/wiki/Credit_score_in_the_United_States)
And you have the following columns in the training data (Note: in machine learning, think of these columns as features)
Here the features are Income & CreditScore and the target of the classification model is Approved.
In the real world, there’s some relation between “income” and “creditscore”. Agree? Great! But Naive Bayes doesn’t think so. Let me reiterate the point of this blog post and see if it makes more sense now: it assumes that the features are “independent”, and that’s why it is Naive!
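To make the independence assumption concrete, here is a minimal sketch in Python. The training rows are made up for illustration, and so are the function names (train, score, predict). There is also no smoothing, so an unseen feature value gives a score of zero. The key line is the loop in score: each feature’s probability is multiplied in on its own, as if Income and CreditScore had nothing to do with each other.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (Income, CreditScore, Approved). Made up for illustration.
rows = [
    ("high", "good", "yes"),
    ("high", "good", "yes"),
    ("high", "poor", "no"),
    ("low", "good", "yes"),
    ("low", "poor", "no"),
    ("low", "poor", "no"),
]


def train(rows):
    """Count how often each class occurs and how often each feature value occurs per class."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def score(features, label, class_counts, feature_counts):
    """P(class) times the product of P(feature value | class), one feature at a time."""
    p = class_counts[label] / sum(class_counts.values())
    for i, value in enumerate(features):
        # The "naive" step: every feature contributes independently of the others.
        p *= feature_counts[(i, label)][value] / class_counts[label]
    return p


def predict(features, class_counts, feature_counts):
    """Pick the class with the highest score."""
    return max(class_counts, key=lambda c: score(features, c, class_counts, feature_counts))


class_counts, feature_counts = train(rows)
print(predict(("high", "good"), class_counts, feature_counts))
print(predict(("low", "poor"), class_counts, feature_counts))
```

Notice that the model never looks at Income and CreditScore together. A real implementation would also add smoothing and usually work with log probabilities, which this sketch leaves out.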
I hope this helps. Your comments are very welcome!
Weka is a popular, free, open source machine learning tool. In this post, I’ll note the steps that I took to install it on a Windows machine:
1. Search “Download Weka”. As of today, the URL is http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. The page offers several options for downloading Weka. Choose one based on your:
– Machine configuration (x86 vs x64)
– Java version and the corresponding Weka version
So let’s check that:
3. To check the Java version installed on your computer, open up a command prompt and type java -version
Let’s see if it’s compatible with the Weka version:
As you can see, the version of Weka that I’ll be installing requires Java 1.7 and I already have that, so for my machine I selected the option:
Click here to download a self-extracting executable without the Java VM
Also remember to check the operating system type (x86 vs x64) and download the corresponding version of Weka.
4. After downloading, install it. I left all the options default.
5. After successful installation, I launched Weka by going to:
start > all programs > weka 3.6.9 > weka 3.6.9
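If you prefer, the checks from steps 2 and 3 (Java version and machine type) can also be scripted. Below is a small Python sketch, assuming Python is installed on the machine. The function names are made up for this post. It relies on the fact that java -version prints its banner to stderr, and it returns None when Java is not on the PATH.

```python
import platform
import re
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version` and return the version string, or None if Java is missing."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None  # java could not be started (for example, it is not on the PATH)
    # The banner normally goes to stderr; stdout is checked too, just in case.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java version:", installed_java_version())
    # Typically something like "AMD64" or "x86_64" on a 64-bit machine
    print("Machine type:", platform.machine())
```

Compare the printed Java version and machine type with the requirements listed on the Weka download page before picking an installer.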
That’s about it for this post.
Do you know about the Jeopardy! quiz show where a computer named Watson was able to beat world champions? No? Go watch it! Yes? Nice! Isn’t it a feat at least as grand as the one achieved by Deep Blue (the chess computer)?
I am always interested in how such advanced computers were built. In the case of Watson, it’s fascinating how technologies such as natural language processing, machine learning and artificial intelligence, backed by massive compute and storage power, were able to beat two human world champions. As a person interested in analytics and Big Data, I would classify this technology under Big Data and advanced data analytics, where a computer analyzes lots of data to answer a question asked in natural language. It also uses advanced machine learning algorithms. To that end, if you’re interested in getting an overview of what went into building Watson, watch this:
If you’re as amazed as I am, consider sharing what amazed you about this technology in the comments section.
For the past couple of months, one of the things I have thought about is “What is the difference between machine learning and data mining?” I studied data mining and advanced data mining concepts at both the undergraduate and graduate level, and recently I started learning about machine learning via Coursera.org, so I was curious to know the difference between the two inter-related fields. After spending time understanding what machine learning is, here’s what I am thinking:
When I learned data mining, the focus was on taking a data set and using one or more algorithms to detect patterns in it. Now that I am studying machine learning, we’re asked to write algorithms (and build models). So to me, data mining seems to deal with the practical aspects of putting machine learning algorithms to use.
When I took data mining courses, I didn’t write algorithms. Instead, I learned what different data mining algorithms can do and what kind of patterns each algorithm helps us find. In the machine learning class, my focus is on learning how to write the algorithms (build the model) and optimize them so that they predict well.
Also, in machine learning the goal is clear: the questions are mostly like “Build a model from past data that predicts X”. In contrast, I remember that for our graduate-level class, my professor gave our team a data set of “fatal accident data” and said “Go play with it!”
These were my experiences. What are your experiences with data mining and machine learning, and how do you differentiate between these two fields, which are similar in more than one way?