Let’s look at four datasets that have nearly identical summary statistics:
Here’s the DATA:
Here are their statistical properties:
| Property | Value |
|---|---|
| Mean of x in each case | 9 (exact) |
| Variance of x in each case | 11 (exact) |
| Mean of y in each case | 7.50 (to 2 decimal places) |
| Variance of y in each case | 4.122 or 4.127 (to 3 decimal places) |
| Correlation between x and y in each case | 0.816 (to 3 decimal places) |
| Linear regression line in each case | y = 3.00 + 0.500x |
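The figures in the table can be checked with a few lines of Python. This is a minimal sketch: the data values are the ones published for Anscombe's quartet, and the helper name `summarize` is my own choice for illustration, not part of any library.

```python
from math import sqrt

# Anscombe's quartet. Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics from the table for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares slope
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four rows print almost the same numbers, which is exactly why the summary statistics alone cannot tell the datasets apart.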
They look identical, don’t they? But let’s visualize the data:
Only by visualizing the data could we understand and appreciate the differences between the datasets. Looking at just the statistical properties made them appear similar. Moral of the story: visualize your data! Graph the data in addition to investigating its statistical properties.
Source: Anscombe’s quartet
One of the key things I’ve learned is the importance of differentiating between the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:
Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis”, then you need reports. But you do not necessarily have to do data analysis if you want to do data reporting.
From a process standpoint, here’s how you can visualize Data Reporting and Data Analysis:
Let’s think about this for a moment: Why do we need “analysis”?
We need it because TOOLS are really great at generating data reports, but it takes a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of looking at the data points and translating them into business insights is the core of Data Analysis. Here’s how it looks visually:
Note that after performing data analysis, we have information such as trends and insights, action items or recommendations, and the estimated impact on the business. This information is what creates business value.
Data Reporting ≠ Data Analysis
I was playing with a time series data set in Excel 2010 and learned how to add a trend-line. In this blog post, I’ll share how I added it.
First up, how is a trend-line useful? Here are a few answers:
– It helps us see how data is changing over time, in other words, it helps us find “trends”
– It helps us forecast the future.
With that, here is the chart without a trend-line:
Now let’s add the trend-line so you can see for yourself how it makes “trends” easier to spot. Here are the steps:
1. Select the line > right-click > Add Trendline
2. Configure the trend-line options
3. Optionally, change the line style (I did)
4. And here’s the chart with the trend-line
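To make the idea behind the steps above concrete outside of Excel, here is a minimal Python sketch of what a linear trend-line computes: a least-squares straight line through the points, which can then be extended to forecast future periods. The function names and the sample values are hypothetical, and numbering the periods 1, 2, 3, ... is an assumption made for illustration.

```python
def linear_trend(values):
    """Fit y = intercept + slope * period by least squares.

    Periods are numbered 1, 2, 3, ... (an assumption for this sketch).
    """
    n = len(values)
    periods = range(1, n + 1)
    mean_x = sum(periods) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in periods)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead):
    """Extend the fitted trend-line past the end of the series."""
    slope, intercept = linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(1, periods_ahead + 1)]


# Hypothetical monthly values, invented for illustration only.
monthly_values = [120, 132, 128, 141, 150, 147, 158, 166]

slope, intercept = linear_trend(monthly_values)
print(f"trend: y = {intercept:.2f} + {slope:.2f} * period")
print("next 3 periods:", [round(v, 1) for v in forecast(monthly_values, 3)])
```

A positive slope means the series is trending upward. The forecast simply continues the same straight line, so it is only as good as the assumption that the trend stays linear.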
In this post, we saw how to add a trend-line to a time series chart in Excel 2010.
If you work with a statistical analysis tool, you may have had to classify your data into one of the following categories: Nominal, Ordinal, Interval, Ratio
Here is what each term means:
| Nominal | Simply names or labels (sets of characters) | Example: full name, fruits, cars, etc. |
| Ordinal | Nominal + the values have an order | Example: small, medium, big |
| Interval | Ordinal + the intervals between values are equal | Example: temperature on the Fahrenheit scale: 10, 20, 30, etc. |

Note that 40°F is not twice as warm as 20°F, so multiplication does not make sense on Interval data. But addition and subtraction work. Which brings us to the next point: Ratio

| Ratio | Interval + multiplication makes sense | Example: weight: 60 kg, 120 kg; 120 kg = 2 × 60 kg |
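The difference between Interval and Ratio data can be sketched in Python by changing units: a meaningful comparison should give the same answer no matter which unit is used. This is an illustrative sketch only. The helper names, the `SIZE_ORDER` list, and the rounded conversion factor are my own choices.

```python
import math


def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor


# Ordinal: values can be ranked, but the gaps between them are undefined.
SIZE_ORDER = ["small", "medium", "big"]
sizes = ["big", "small", "medium", "small"]
sorted_sizes = sorted(sizes, key=SIZE_ORDER.index)

# Interval: ratios of values change with the unit, so "twice as warm"
# has no meaning. Ratios of differences, however, are stable.
ratio_f = 40 / 20
ratio_c = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)
diff_ratio_f = (40 - 20) / (30 - 20)
diff_ratio_c = (
    (fahrenheit_to_celsius(40) - fahrenheit_to_celsius(20))
    / (fahrenheit_to_celsius(30) - fahrenheit_to_celsius(20))
)

# Ratio: there is a true zero, so ratios of values survive a unit change.
ratio_kg = 120 / 60
ratio_lb = kg_to_lb(120) / kg_to_lb(60)

print("sorted sizes:", sorted_sizes)
print("40F/20F:", ratio_f, "same temperatures in Celsius:", round(ratio_c, 3))
print("difference ratios:", diff_ratio_f, round(diff_ratio_c, 3))
print("120kg/60kg:", ratio_kg, "same weights in lb:", round(ratio_lb, 3))
```

The temperature ratio changes when the unit changes, while the weight ratio does not. That is the practical test for whether multiplication "makes sense" on a variable.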
I hope the examples are of help when you are working with statistical analysis tools and need to categorize the data.