PASS September 2014 Outstanding Volunteer

Dan English and I received the “PASS Outstanding Volunteer” award for our work with the Business Analytics Virtual Chapter. Thanks & congrats, Dan. It’s great to have you on the virtual chapter’s leadership team :)

Dan English's BI Blog

Just a couple of weeks ago I received an email notifying me that I had been nominated as one of the PASS Outstanding Volunteers for September 2014! The official email stated that Paras Doshi, the PASS Business Analytics Virtual Chapter co-lead, and I were selected for our excellent work with the Business Analytics Virtual Chapter.

The official list of PASS Outstanding Volunteer nominations is here –> Outstanding Volunteers

Dan English and Paras Doshi – Dan and Paras are model Virtual Chapter leaders. They have taken the BA VC to a new level in terms of activity and audience and have done wonderful, innovative work when it comes to posting session recordings on YouTube. Through their efforts they promote growth for PASS in terms of audience and the range of content offered – not just SQL server, but the holistic data platform. They have gone from being completely new VC…

Back to basics: Design your Business Intelligence system to have the lowest-level data even if it’s not asked for!

Here’s a scenario:

A Business Intelligence (BI) system for Sales is being developed at a company. Here are the events that occur:

1) Based on the requirements, it is documented that the business needs to analyze sales numbers by product, month, customer & employee.

2) While designing the system, IT learns that the data is stored at the invoice level. Since the requirements document doesn’t say anything about having details down to the invoice level, they decide to aggregate the data before bringing it into their system.

3) They develop the BI system within the time frame and send it to the business for data validation.

4) Business analysts start looking at the BI system and find some numbers that don’t look right for a few products. They need to see the invoices for those products to make sure that the data is right, so they ask IT for invoice-level data.

5) IT realizes that even though the business had not explicitly requested invoice-level data, they do NEED the lowest-level data! It’s crucial for passing data validation. Also, IT talks with the business analysts and finds out that they may sometimes need to drill down to the lowest-level data to find insights that are hidden at the aggregate level.

6) So IT decides to rework their solution. This increases the timeline & budget set for the project. Not only that, they have lost the opportunity to gain the confidence of the business by missing the budget and timeline.

7) They learn to “Design your BI system to have the lowest-level data even if it’s not asked for!” and decide never to make this mistake again!

To conclude: it’s important to include the lowest-level data in your BI system even if it’s not explicitly requested. This will save you time & build your credibility as a Business Intelligence developer/architect.
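To make the point concrete, here is a minimal Python sketch (not the system from the scenario) of what keeping the lowest grain buys you. The invoice rows, field names, and the two helper functions are all hypothetical. The idea is that the aggregate view is computed from invoice-level rows on demand, so the rows behind any suspicious number are still there to inspect.

```python
from collections import defaultdict

# Hypothetical invoice-level rows: the lowest grain available in the source system.
invoices = [
    {"invoice_id": "INV-1", "product": "X", "month": "2014-01", "customer": "Acme", "employee": "Lee", "amount": 100.0},
    {"invoice_id": "INV-2", "product": "X", "month": "2014-01", "customer": "Acme", "employee": "Lee", "amount": 250.0},
    {"invoice_id": "INV-3", "product": "Y", "month": "2014-01", "customer": "Bolt", "employee": "Kim", "amount": 75.0},
]


def sales_by(rows, *dimensions):
    """Aggregate invoice amounts by the requested dimensions (e.g. product, month)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


def drill_down(rows, **filters):
    """Return the invoice-level rows behind an aggregate number."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


# The aggregate view the requirements asked for.
summary = sales_by(invoices, "product", "month")

# The invoice-level detail the analysts need during data validation.
detail = drill_down(invoices, product="X", month="2014-01")
```

If only `summary` had been loaded, the `drill_down` step would be impossible without going back to the source system.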

Business Intelligence Dashboard for Inventory Management at a Manufacturing Organization

Mockup:

The BI system allows analysts & operational specialists to drill down to the lowest-level data available, but here’s a dashboard for executives & senior managers:

Inventory Management Business Intelligence Manufacturing

Back to basics: Multi-class classification vs. two-class classification

Classification algorithms are commonly used to build predictive models. Here’s what they do (simplified!):

Machine Learning Predictive Algorithms analytics Introduction

Now, here’s the difference between multi-class and two-class:

If your test data needs to be classified into two classes, then you use a two-class classification model.

Examples:

1. Is it going to rain today? YES or NO

2. Will the buyer renew their soon-to-expire subscription? YES or NO

3. What is the sentiment of this text? Positive or Negative

As you can see from the examples above, the test data needs to be classified into two classes.

Now, look at example #3: What is the sentiment of this text? What if you also want an additional class called “Neutral”? Now there are three classes, and we’ll need to use a multi-class classification model. So, if your test data needs to be classified into more than two classes, then you use a multi-class classification model.

Examples:

1. Sentiment analysis of customer reviews? Positive, Negative, Neutral

2. What is the weather prediction for today? Sunny, Cloudy, Rainy, Snow

I hope the examples helped. Next time you have to choose between multi-class and two-class classification models, ask yourself: does the problem ask you to predict two classes, or more than two? Based on that, pick your model.
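The question above can also be written down as a tiny check. This is only an illustrative sketch: `classification_type` is a hypothetical helper, not part of any library. It counts the distinct labels in your data and tells you which family of model the problem calls for.

```python
def classification_type(labels):
    """Return 'two-class' or 'multi-class' based on the number of distinct labels."""
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct classes to classify")
    return "two-class" if len(distinct) == 2 else "multi-class"


# Will it rain today? Only two possible answers.
rain = classification_type(["YES", "NO", "NO", "YES"])

# Sentiment with a neutral class: three possible answers.
sentiment = classification_type(["Positive", "Negative", "Neutral", "Positive"])
```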

Example: Azure Machine Learning (AzureML) studio’s classifier list:

Azure Machine Learning classifiers list

I hope this helps!

Business Intelligence Dashboard for Quality Managers

Business Goal:

Quality managers needed to understand the patterns in quality test results data across all plants.

Summary:

– The solution involved creating a Business Intelligence system that gathered data from multiple plants. I was involved in mentoring the IT team, development, and end-user training for a Business Intelligence dashboard that used SQL Server Analysis Services as its data source.

– Dashboard development involved multiple checkpoint meetings with business leaders, since this was the first time they had a chance to visualize quality test results data consolidated from multiple plants. Since they were new to data visualization, I prepared 3-4 relevant visualization templates in advance to kick off each meeting.

Mockup:

(It is intended to look generic since I can’t discuss details. Also, drill-down capabilities were added to the dashboard to go down to the lowest granularity if needed.)

Quality Test Results Dashboard

Business Intelligence Dashboard for Plant Managers (operations focused):

Business goal:

Plant managers needed a centralized automated solution that helped them monitor key metrics (operations focused) to help them better manage manufacturing plants.

Technical Summary:

– Work with the plant managers to identify key metrics & calculations to be displayed on dashboard

– Work with the IT managers to identify data source systems.

– Develop the dashboard using SQL Server Reporting Services. (Built iteratively, with three checkpoint meetings with plant managers, while working with IT and business analysts to ensure data integrity.)

– Develop drill-down reports to see detailed data at the plant and machine level.

Mockup:

Plant Managers dashboard operations manufacturing

Back to basics: Continuous vs. discrete variables and their importance in data visualization

Take a look at the following chart. Do you see any issues with it?

month trend chart line chart string to date

Notice that the month values are shown as “distinct” values instead of “continuous” values, which misleads the person looking at the chart. Agree? Great! You already know instinctively what continuous and discrete values are; we just need to put labels on what you already know.

In the example used above, the “Sales Date” is a date & time, which is a continuous value: you can never state the “exact” time that an event occurred…1/1/2008 22 hours, 15 minutes, 7 seconds, 5 milliseconds…and it goes on…it’s continuous.

But let’s say you wanted to see Number of Units Sold vs. Product Name. Now that’s countable, isn’t it? You can say that we sold 150 units of Product X and 250 units of Product Y. In this case, Units Sold is a discrete value.

The chart shown above was treating Sales Date as discrete values, hence the confusion…let’s fix it, now that you know the difference between continuous and discrete variables:

Statistics Discrete Continuous Variable Data Visualization
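The post doesn’t say which tool the charts were built in, so here is a tool-independent Python sketch of the same idea. It assumes the month values arrive as text in a month/year format; the sample figures are made up. Treated as strings, the months are just labels and sort alphabetically. Parsed into dates, they fall into place along a continuous time axis.

```python
from datetime import datetime

# Hypothetical monthly sales keyed by month labels stored as text.
sales = {"10/2008": 90, "1/2008": 120, "2/2008": 80}

# As strings, the labels are discrete categories: October lands before February.
as_strings = sorted(sales)

# Parsed into dates, the same labels sort chronologically.
as_dates = sorted(datetime.strptime(label, "%m/%Y") for label in sales)
```

Most charting tools make the same distinction once the column has a date type rather than a text type.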

Conclusion:

To develop effective data visualizations, it’s important to understand the data types of your data. In this post, you saw the difference between continuous and discrete variables and their importance in data visualization.

Business Intelligence Dashboard project for a Business Leader

Business Goal:

Design and develop a dashboard that lets a business leader keep an eye on the health of the multiple business units under his leadership.

In other words, the dashboard should provide a one-stop shop for executives to monitor the health of their business unit(s). It’s analogous to a car’s dashboard, which helps the driver monitor important performance indicators while driving and alerts them to things such as “check engine” and “oil level” warnings. Dashboards use features like key performance indicators (KPIs), interactive data visualizations and drill-down capability to create an immersive user experience for an executive.

Technical Summary:

– Work with the business leader to identify the key metrics he needed to see on the dashboard to keep an eye on the health of the business units.

– Work with the IT leaders of each business unit to map the available data and come up with a (consistent) formula for each metric needed by the business leader.

– Develop the dashboard. (Built iteratively, with two checkpoint meetings with the business leader, while working with business analysts to make sure the data is right.)

– Develop drill-down reports for each metric for each business unit to see detailed data plus trends.

Mockup

(I can’t write about the role of the business leader or the metrics displayed because of non-disclosure agreements, so this mockup may look generic, but that’s intentional.)

Business Leader Dashboard

Analyzing user survey data using Text Mining techniques for a Startup

Business Summary:

The startup needed a way to extract insight from 80,000 words (120 pages) of user survey data.

Technical Summary:

I used text mining techniques to deliver the analysis in 1,100 words (1.5 pages), down from 80,000 words (120 pages). That’s roughly 72x shorter than the original.

The analysis gave insight into a key competitive advantage that they had against established players in the market.
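The post doesn’t name the specific techniques used, so the following is only a generic illustration of one of the simplest text mining steps: counting the most frequent terms across free-text responses after dropping common stop words. The sample responses, the stop-word list, and the `top_terms` helper are all made up for this sketch.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "i", "but"}


def top_terms(responses, n=3):
    """Return the n most frequent non-stop-word terms across all responses."""
    words = []
    for text in responses:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(w for w in tokens if w not in STOP_WORDS)
    return Counter(words).most_common(n)


# Made-up survey responses standing in for the real data.
responses = [
    "The app is fast and the support is fast to respond",
    "Support is friendly",
    "Fast setup, friendly support",
]

terms = top_terms(responses, n=3)
```

Even a frequency table like this can point to themes worth reading in full, which is how a large body of text gets condensed into a short analysis.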

Text mining analytics