[VIDEO] Microsoft’s vision for “Advanced analytics” (presented at #sqlpass summit 2015)


Presented at #sqlpass summit 2015.


Top two key techniques to analyze data:


There are many techniques to analyze data. In this post, we’re going to talk about two techniques that are critical for good data analysis: benchmarking and segmentation. Let’s talk a bit more about them:

1. Benchmarking

Benchmarking means that when you analyze your numbers, you compare them against some point of reference. This helps you quickly add context to your analysis and assess whether a number is good or bad. This is super important! It adds meaning to your data!

Let’s look at an example. The CEO wants to see revenue numbers for 2014, and an analyst is tasked with creating this report. If you were the analyst, which report do you think would resonate more w/ the CEO? Left or right?

[Image: Benchmarking data providing context in analysis]

I hope the above example helped you understand the importance of providing context w/ your data.

Now, let’s briefly talk about where you get the data for a benchmark.

There are two main sources: 1) internal & 2) external.

The example that you saw above used an internal source as a benchmark.

An example of an external benchmark is subscribing to industry news/data so that you understand how your business is doing compared to similar businesses. If your business sees a huge spike in sales, you need to know whether it’s just your business or an industry-wide phenomenon. For instance, in Q4 most e-commerce sites see a spike in their sales; they can understand what’s driving it only by looking at industry data and realizing that it’s shopping season!
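To make the idea concrete, here is a minimal Python sketch of benchmarking a number against an internal reference (last year) and an external one (industry growth). All figures and names (`benchmark`, `industry_growth_pct`) are hypothetical and are not taken from the chart above.

```python
def benchmark(current, reference):
    """Return the absolute and percentage change of current vs. a reference value."""
    change = current - reference
    pct_change = change / reference * 100
    return change, pct_change


# Hypothetical figures in $M (assumptions for illustration only).
revenue_2014 = 12.0
revenue_2013 = 10.0          # internal benchmark: last year's revenue
industry_growth_pct = 25.0   # external benchmark: assumed industry-wide growth

change, pct = benchmark(revenue_2014, revenue_2013)

# A bare number has no context; the benchmark tells you whether it is good or bad.
print(f"2014 revenue: ${revenue_2014:.1f}M ({pct:+.1f}% vs. 2013)")
print(f"Growth relative to industry: {pct - industry_growth_pct:+.1f} points")
```

With these made-up numbers, the internal benchmark looks positive while the external one shows the business growing more slowly than its industry, which is exactly the kind of context a single revenue figure hides.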

Now, let’s shift gears and talk about technique #2: Segmentation.

2. Segmentation

Segmentation means that you break your data into categories (a.k.a. segments) for analysis. So why would you want to do that? Looking at the data at an aggregated level is certainly helpful and helps you figure out the direction for your analysis. But the real magic & the most powerful insights usually come from analyzing the segments (or subsets of the data).

Let’s look at an example.

Let’s say the CEO of a company looks at profitability numbers. He sees $6.5M, which is $1M greater than last year’s, so that’s great news, right? But does that mean everything is fine and there’s no scope for optimization? Well, you can only find that out if you segment your data. So he asks his analyst to look at the data for him. The analyst goes back and, after some experimentation & interviews w/ business leaders, finds an interesting insight by segmenting the data by customer & sales channel! He finds that even though the company is profitable, there is a huge opportunity to optimize profitability for customer segment #1 across all sales channels (especially channel #1, where there’s a $2M+ loss!). Here’s a visual:

[Image: Segmentation data: improve profitability of low-margin service offerings and customers]
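The same kind of breakdown can be sketched in a few lines of Python. The rows below are hypothetical: they were chosen only so that the total matches the $6.5M in the example and customer segment #1 loses more than $2M in channel #1. The individual figures are assumptions, not the data behind the visual.

```python
from collections import defaultdict

# Hypothetical profit in $M per (customer segment, sales channel).
profit_rows = [
    ("Segment 1", "Channel 1", -2.1),
    ("Segment 1", "Channel 2", -0.4),
    ("Segment 2", "Channel 1", 3.0),
    ("Segment 2", "Channel 2", 2.5),
    ("Segment 3", "Channel 1", 2.0),
    ("Segment 3", "Channel 2", 1.5),
]

# Aggregated view: a single healthy-looking number.
total = sum(profit for _, _, profit in profit_rows)

# Segmented view: profit per customer segment.
by_segment = defaultdict(float)
for segment, _, profit in profit_rows:
    by_segment[segment] += profit

# Drill down to the loss-making segment/channel combinations.
losses = [row for row in profit_rows if row[2] < 0]

print(f"Total profit: ${total:.1f}M")
for segment, profit in sorted(by_segment.items()):
    print(f"{segment}: ${profit:+.1f}M")
for segment, channel, profit in sorted(losses, key=lambda row: row[2]):
    print(f"Loss-making: {segment} / {channel}: ${profit:+.1f}M")
```

The total alone says the company is profitable; only the per-segment and per-channel rows reveal where the losses sit.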

I hope that helps to show that segmentation is a very important technique in data analysis!

Conclusion:

In this post, we saw the segmentation & benchmarking techniques that you can apply in your daily data analysis tasks!

Examples to help you differentiate between Business Intelligence and Data Science problems:


In this post, I’ll list a few examples from various industries to help you differentiate between business intelligence and data science problems.

Some time back, I blogged about the “Business Analytics Continuum”, and in that post we saw that every organization has data, but organizations use their business data at different levels depending on their maturity. Excel (or another transactional reporting tool) is usually the starting point for any organization; it helps them see WHAT happened. They then advance to the next stage, where they get the capability to slice and dice their data to find out WHY; this capability is usually delivered using Business Intelligence tools & techniques. Once the data culture spreads, thanks to a successful Business Intelligence project, they soon start to outgrow their business intelligence capabilities by asking questions that need predictive capabilities. This is the advanced analytics and Data Science stage. To that end, here are 5 examples to help you differentiate between business intelligence and data science problems:

Bike Rentals
  Business Intelligence (WHAT & WHY):
  1. How many bikes did we rent in Q3 2014? How does that compare to Q3 2013?
  2. What is the trend of total bike rentals at week level? Can you break it down by geography?
  Data Science & advanced analytics:
  Can you predict bike rentals on an hourly basis?

Credit Risk
  Business Intelligence (WHAT & WHY):
  1. How many customers have a credit risk of ‘C’?
  2. Can you rank customers that have a credit risk of ‘C’ by their payments due amount?
  Data Science & advanced analytics:
  Can you predict the credit risk of the customer during the contract negotiation stage?

Customer relationship management
  Business Intelligence (WHAT & WHY):
  1. How many account cancellations occurred this year (broken down by month and customer segment)?
  2. How does the percentage of account cancellations this year compare to that of the previous year?
  Data Science & advanced analytics:
  Can you predict customer churn?

Flight Delays
  Business Intelligence (WHAT & WHY):
  1. What is the trend of % of flights delayed this year?
  2. Can you break down flight delays this year by their reasons?
  Data Science & advanced analytics:
  Can you predict whether a scheduled flight will be delayed by more than 15 minutes?

Customer feedback
  Business Intelligence (WHAT & WHY):
  1. What is the customer satisfaction % trend this year?
  2. What is the customer satisfaction % broken down by customer segments and product segments?
  Data Science & advanced analytics:
  Can you classify a customer feedback comment into “positive”, “negative” or “neutral”?

I hope this helps!

PASS Business Analytics VC: 7 Ideas on Encouraging Advanced Analytics by Mark Tabladillo #sqlpass


Thu, Jul 17, 2014 12:00 PM – 1:00 PM EDT


Abstract:
Many companies are starting or expanding their use of data mining and machine learning. This presentation covers seven practical ideas for encouraging advanced analytics in your organization.

Bio:
Mark Tabladillo is a Microsoft MVP and SAS expert based in Atlanta, GA. His Industrial Engineering doctorate (including applied statistics) is from Georgia Tech. Today, he helps teams become more confident in making actionable business decisions through the use of data mining and analytics. Mark provides training and consulting for companies in the US and around the world. He has spoken at major conferences including Microsoft TechEd, PASS Summit, PASS Business Analytics Conference, Predictive Analytics World, and SAS Global Forum. He tweets @marktabnet and blogs at http://marktab.net.

REGISTER HERE: bit.ly/PASSBAVC071714

Hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader

Resource: Introduction to Data Science by Prof Bill Howe, UW


The Introduction to Data Science course taught by Bill Howe just started on the Coursera platform. Having taken the Data Intensive Computing in Cloud course at UW taught by Prof Bill Howe, I can say that this course would be a great resource too!

Check it out: https://www.coursera.org/course/datasci


PASS Business Analytics Conference Keynote Day #2


The work of Dr. Steven Levitt (the “Indiana Jones of economics” & author of Freakonomics) involves finding insights from data. In the keynote, he shared some of the interesting & fun insights that he has found in data.

One example from Dr. Levitt: according to the data, it was 7 times more dangerous to sell crack in Chicago than to be in combat in Iraq. https://twitter.com/markvsql/status/322707949158006786

He also talked about other insights, which can also be found in his book Freakonomics. After getting the audience fascinated by what analyzing data can do, he moved on to his real-world experiences of analyzing data for businesses and tied all these fascinating insights back to some tips he had for the audience. Here is a brief recap of the tips he shared:

> “Ideas don’t come out of the blue. Almost always ideas come out of the data” – Dr. Steven Levitt

> “You guys are the future. What you’re doing is the key to a business’ success or failure.”

> Experiment & Test Hypothesis using DATA

> Misconceptions can cripple you. Let the data speak, even when it might be difficult

> Most important people = who understand and know what to do with data, not those who pretend they know the answer.

> Dr. Levitt: without data any biz will be left behind, must experiment and accept failure

*Above text is linked to tweets.

That’s about it for this post. What do you think about the tips that Dr Levitt shared?

 

How to start Analyzing Twitter Data w/ R?


Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. Listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

[video] Data Science is not NEW – it’s just that we live in a VERY special time!

  • Data Analysis is NOT new
  • Data Mining is NOT new
  • Predictive Analytics is NOT new
  • Machine Learning is NOT new
  • Statistics is NOT new
  • And Data Science is NOT new

So what’s new?

  • The rate at which data is produced.
  • The variety in Data that’s being produced.
  • The “amount” of data that’s being produced.

And we did not have the tools and techniques before, but now we do! Indeed, we live in a VERY special time!

Here’s a nice 5 minute video titled “Data Science: Beyond Intuition”.

Link to video: http://vimeo.com/48456421. And thanks to Ryan Swanstrom for sharing!

Visualizing dataset of 2 million+ passwords:


I found a data-set of passwords on DataScienceCentral: “Password and hijacked email dataset for you to test your data science skills”. For fun, I played with the data-set for an hour or so:

1) Password Length vs Frequency

[Image: password length vs. frequency]

2) Percentage of passwords having at least one special character vs passwords having no special character:

[Image: passwords that have a special character vs. the ones that don’t]

3) Percentage of passwords that have at least one number, one letter & one special character AND a length of 8 or more.

Answer: 1.4856%

Let’s compare passwords of length 8 or more (69.302%) vs. passwords of length 8 or more that combine letters, numbers & special characters (1.485%):

[Image: passwords having a combination of letters, numbers and special characters]

That’s about it for now – it was fun!

 

And for those interested, here are a few behind-the-scenes technical details:

Tools I used:

1. Excel & 2. SQL Server

Note: I first tried using Google Refine to augment the data, but it crashed on me, so I thought of using SQL Server and T-SQL. If Excel 2010 supported 2+ million rows, I would not have needed SQL Server. Anyhow, the tool used is not important here.

Initial state:

2 million passwords in a .txt file.

Information I appended to the data-set using TSQL:

1. Length of password

2. Has Alphabets?

[a-zA-Z]

3. Has Numbers?

[0-9]

4. Has special Characters?

[^a-zA-Z0-9]

Plus a few others derived from #2, #3 & #4, like “has alphabets + numbers + special characters?”
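For readers who want to try the same derivations without SQL Server, here is a minimal Python sketch that applies the three character classes listed above. It is a stand-in for the T-SQL used in the post, not that code itself; the function names and the sample passwords are made up for illustration. Note that `[^a-zA-Z0-9]` treats anything that is not an ASCII letter or digit (including spaces and accented letters) as a special character.

```python
import re

# The three patterns listed above.
HAS_ALPHA = re.compile(r"[a-zA-Z]")
HAS_DIGIT = re.compile(r"[0-9]")
HAS_SPECIAL = re.compile(r"[^a-zA-Z0-9]")


def password_features(password):
    """Derive the appended columns (length and three flags) for one password."""
    return {
        "length": len(password),
        "has_alpha": bool(HAS_ALPHA.search(password)),
        "has_digit": bool(HAS_DIGIT.search(password)),
        "has_special": bool(HAS_SPECIAL.search(password)),
    }


def is_strong(features):
    """Length of 8 or more with at least one letter, one number and one special character."""
    return (
        features["length"] >= 8
        and features["has_alpha"]
        and features["has_digit"]
        and features["has_special"]
    )


# Made-up sample; the real data-set has 2 million+ rows in a .txt file.
sample = ["123456", "password", "Tr0ub4dor&3", "qwerty12", "p@ss1"]
features = [password_features(p) for p in sample]
strong_pct = 100 * sum(is_strong(f) for f in features) / len(sample)
print(f"{strong_pct:.1f}% of the sample passwords meet all four criteria")
```

To run this over the full file, you could read it line by line and call `password_features` on each line with the trailing newline stripped.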

That’s about it for the technical details. Ping me if interested!

 

Where can we find datasets that we can play with for Business Intelligence, Data Mining, Data Analysis Projects?


Update 1st August: I found this too: UCI Machine Learning Repository http://archive.ics.uci.edu/ml/

Update 12 Nov 2012: I found this! Link to 400 datasets! http://www.datawrangling.com/some-datasets-available-on-the-web

Update 19 Dec 2012: Lynn Langit has a list here: http://lynnlangit.wordpress.com/public-datasets/

——————————————-

Recently, on the SQL Server Data Mining Forum, I answered a question about where to find data-sets for a Business Intelligence project.

Apart from the AdventureWorks and Contoso data-sets, there are places where you can download data-sets to play with for your Business Intelligence, Data Mining or Data Analysis projects.

Here is the list of data-sets that I have collected:

1. KDnuggets: Datasets for Data Mining

2. Quora: Where can I get large datasets open to the public?

3. Windows Azure Data Market

4. National Council of Teachers of Mathematics

5. Introduction to Data Science: Data Sets

6. Hilary Mason’s Data-Set Bundle: https://bitly.com/bundles/hmason/1 (also featured in the Quora link that I shared earlier)

7. And if you can’t find the data-set, ask for it here: http://getthedata.org/

Have I missed anything? Do comment! I’ll add the link with due credit.