There’s been a growing interest in Hadoop & Big Data. Here’s the proof:

Standard

I like to keep an eye on technology trends. One of the ways I do that is by subscribing to leading magazines. I may not always read an entire article, but I definitely read the headlines to see what the industry is talking about. During the last 12 months or so I have seen a lot of buzz around Big Data, and I thought to myself: it would be nice to see a trend line for Big Data. Taking it a step further, I was also interested in seeing whether there is a correlation between the growing trends in “Hadoop” and “Big Data”. I also wanted to see how they compare with terms like Business Intelligence, Business Analytics and Data Science. With this in mind, I turned to Google Trends to quickly create a trend report.

Here’s the report:

Big Data Hadoop Business Intelligence

Here are some observations:

1) There’s a correlation between the trends of Big Data and Hadoop. In fact, it looks like growing interest in Hadoop fueled interest in “Big Data”.

2) The trend lines of Big Data and Hadoop overtook that of Business Intelligence in Oct 2012 and Sep 2012 respectively.

3) The trend line of Business Intelligence is declining.

4) There seems to be a steady increase in the trend lines for Business Analytics and Data Science.
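One way to put a number on observation 1 would be a Pearson correlation between two interest series. The sketch below is only illustrative: the values are made-up placeholders, not figures taken from the Google Trends report, and `pearson` is a helper name chosen for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up placeholder values (0-100 interest index), NOT real Google Trends data.
hadoop = [10, 14, 19, 25, 33, 41, 52]
big_data = [5, 9, 15, 24, 36, 50, 68]

r = pearson(hadoop, big_data)
print(f"correlation = {r:.3f}")
```

A value close to 1 would only say that the two lines move together. It would not by itself show that one trend fueled the other.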

And here’s the Google Trends report URL: http://www.google.com/trends/explore#q=Big%20Data%2C%20Hadoop%2C%20Business%20Intelligence%2C%20Business%20Analytics%2C%20Data%20Science&cmpt=q

What do you think about these trends?


Microsoft StreamInsight: “The type or namespace name ComplexEventProcessing does not exist in the namespace Microsoft”


In this blog post, I’ll document how I solved the error “The type or namespace name ComplexEventProcessing does not exist in the namespace Microsoft”. Here are the steps:

1. I browsed through other errors/warnings as well – I was also missing assemblies from Reactive Extensions and so I added them first.

2. For my scenario, I had installed StreamInsight 2.0 successfully on my machine but I downloaded the sample that needed assemblies from StreamInsight 2.1 – notice the version mismatch here? That was the problem!

3. One of the messages said “Could not locate assembly Microsoft.ComplexEventProcessing version = 21.0.0.0”. The version = 21.0.0.0 part suggested that I needed the assemblies from StreamInsight 2.1.

4. So I downloaded “Microsoft® SQL Server® StreamInsight 2.1” and installed it. And it worked!

5. FYI: I found the Microsoft.ComplexEventProcessing assembly on my machine at C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.ComplexEventProcessing\*
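To check which version of the assembly is actually installed, a small script could read the version out of the folder names under that GAC path. This is a rough sketch, not an official tool: it assumes the usual .NET 4 GAC folder naming of `v4.0_<version>_<culture>_<token>`, and the folder name used in the example is a hypothetical placeholder.

```python
import re
from pathlib import Path

# Assumed folder-name layout inside the .NET 4 GAC: v4.0_<version>_<culture>_<token>
GAC_FOLDER = re.compile(r"^v[\d.]+_(?P<version>\d+(?:\.\d+){3})_")

def assembly_version(folder_name):
    """Return the assembly version embedded in a GAC folder name, or None."""
    match = GAC_FOLDER.match(folder_name)
    return match.group("version") if match else None

def installed_versions(gac_dir):
    """List versions found under one assembly's GAC directory (empty if absent)."""
    root = Path(gac_dir)
    if not root.is_dir():
        return []
    found = (assembly_version(p.name) for p in root.iterdir() if p.is_dir())
    return sorted(v for v in found if v)

# Hypothetical folder name; the token after the double underscore is a placeholder.
print(assembly_version("v4.0_21.0.0.0__0123456789abcdef"))
print(installed_versions(
    r"C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.ComplexEventProcessing"
))
```

If the list does not contain the version named in the error message (21.0.0.0 in my case), that points to the same version mismatch described in step 2.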

That’s about it for this post. I hope it helps someone who is having issues with finding the assembly with the right version number to get started working with StreamInsight.

How I installed StreamInsight 2.0 on my demo machine:


I installed StreamInsight 2.0 on my demo machine today and so I thought I would document the process.

Before we begin, here are a few references to existing documents on the interwebs. The official installation documentation is here: http://msdn.microsoft.com/en-us/library/ee378749.aspx and an introduction to the concepts of StreamInsight is available from the following resources:

1)      MSDN documentation: http://msdn.microsoft.com/en-us/library/ee391416.aspx

2)      Pluralsight: http://blog.pluralsight.com/2012/01/17/free-streaminsight-training/

3)      SQLServerCentral article: http://www.sqlservercentral.com/articles/StreamInsight/69208/

4)      A Channel 9 video pointed out by Johan Ahlen: http://joinsights.com/2011/05/22/great-streaminsight-presentation-by-torsten-grabs/

Now, here are the steps that I took to install StreamInsight on my demo machine:

1. I located the StreamInsight installer inside the SQL Server 2012 Developer edition setup that I had:

1 Installing streaminsight sql server developer edition

Note that even though StreamInsight is licensed with SQL Server, it is different “software” that solves different technical problems. StreamInsight does not have dependencies on SQL Server; it is a separate install.

2. On the Instance Configuration page, I added “StreamInsightInstaller” as the instance name. This is the first instance of StreamInsight that I am installing on my demo machine.

3. On the next dialog box, I added the product key that I have for SQL Server Developer edition. You also have the option to activate a 180-day trial.

4. Then specify the StreamInsight service and group settings.

5. Click Install on the next dialog box.

6. You also need to install SQL Server Compact. To do that, I navigated to C:\Program Files\Microsoft StreamInsight 2.0\Redist

Note that if you have chosen the x64 version, you have to install the x86 version of SQL Server Compact first and then the x64 version of SQL Server Compact.

2 stream insight install sql serve compact edition

7. Installation is complete at this point.

8. If the StreamInsight Service is not started, then go to services and start it.

3 install streaminsight service not started

In services: Right click > Start:

4 windows service streaminsight start

9. Now you can run the samples. To access them, go to Start > All Programs > StreamInsight Samples

5. stream insight samples installation

Conclusion

In this blog-post, we saw how to install StreamInsight 2.0 on your machine.

Three V’s of Big Data with Example:


In this blog-post, we will look at the Three V’s of Big Data with examples:

1. Volume:

TBs, PBs and ZBs of data get created:

From the webinar “How to Walk The Path from BI to Data Science: An interview with Michael Driscoll, data scientist and CEO of Metamarkets” – A global surge in Data

2. Velocity:

The speed at which information flows.

Example: 50 Million tweets per day!

twitter 50 million tweets per day

(This is back in Nov. of 2010 – the number must have increased!)

UPDATE 23 Nov 2012: Wikipedia now says 340 million tweets per day!

twitter 2012 340 million tweets per day
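To get a feel for what those daily totals mean as a speed, here is a quick back-of-the-envelope conversion to tweets per second. It only divides the two figures quoted above by the number of seconds in a day, so it gives an average rate and says nothing about peaks; `per_second` is just a helper name for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day):
    """Average events per second for a given daily total."""
    return events_per_day / SECONDS_PER_DAY

# Daily totals quoted in this post (2010 figure and the 2012 update).
for label, tweets_per_day in [("2010", 50_000_000), ("2012", 340_000_000)]:
    print(f"{label}: about {per_second(tweets_per_day):,.0f} tweets per second")
```

That works out to roughly 579 tweets per second for the 2010 figure and roughly 3,935 per second for the 2012 figure.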

3. Variety:

All types of data are now being captured, whether in a structured format or not.

Example: Text from PDFs, emails, social network updates, voice calls, web traffic logs, sensor data, click streams, etc.

data variety big data

Image courtesy

And this may be followed by other V’s like V for Value.

Conclusion:

In this blog-post, we saw Three V’s of Big Data with Example.

Related Posts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

A Social Media Analytics Sample Dashboard in Excel Powered by PowerPivot.


I found a great sample Dashboard on Social Media Analytics in Excel that is powered by PowerPivot. Here’s the screenshot of the Dashboard.

excel powerpivot twitter social media analytics dashboard 1

Here are the steps if you want to download and play with the Dashboard:

  1. Install the Power Pivot add-in.
  2. Download the “Analytics for Twitter” Excel sample (powered by PowerPivot). Link: http://www.microsoft.com/en-us/download/details.aspx?id=26213
  3. It creates an “Analytics for Twitter” Excel file on the Desktop. Open it.
  4. The dashboard is powered by the data it pulls into Power Pivot.
  5. You can change the search queries:
    a. Edit the default search terms.
    b. Refresh the data.
    c. See the updated Dashboard!

That’s about it. And here’s a YouTube video showing some features in this sample:

Conclusion:
In this blog-post, I shared a great sample dashboard built on top of PowerPivot model.

 

Getting started with HDInsight (a.k.a. Microsoft’s Big Data Hadoop platform) on a local Windows machine!


Microsoft recently announced HDInsight on Windows Server, so it’s a good chance to play with its public preview! Currently there are two ways you can run HDInsight: 1) via Windows Azure, 2) on your local Windows machine.

In this blog-post, I will show you step by step how to install HDInsight on a local Windows machine. For the purpose of this blog-post, I am going to show it on Windows 7, but it is also supported on Windows Server 2008 R2.

download hadoop on windows machine hdinsight

Note that the ideal audience for this blog-post is a developer who wants to kick the tires of Hadoop on a Windows machine to see what it can do! If I had wanted to target Hadoop administrators, I would have shown how to do it on Windows Server and also how to manage the Hadoop cluster with System Center. But for this blog-post, I am going to target developers so that they can get started playing with Hadoop on a Windows machine! With that, here are the steps to install Hadoop (HDInsight) on Windows 7:

1) Open Web Platform Installer. Download and install it if you haven’t yet.

2) Search for Hadoop.

install hadoop windows via web platform installer

3) Install it!

4) You should get a message saying that it successfully installed it!

5) Do you see a Microsoft HDInsight Dashboard ICON on your Desktop? Yes? Great! Open it!

windows hadoop big data dashboard

6) And here’s the IIS Manager showing the site that hosts the above Dashboard. I just wanted to show this to folks who might not see the Dashboard at http://localhost:8085/

IIS windows hadoop local host site port 8085

7) That’s about it for this post. If you want to continue learning, check out the “documentation” link at the bottom of the Hadoop Dashboard, which is: http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows-en-us.aspx

Conclusion:

In this blog-post, we saw how to install HDInsight (Microsoft’s Hadoop) on a local Windows machine.

Related Articles:

Who on earth is creating “Big data”?

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

How to Install Microsoft HDInsight Server Hadoop on Windows 8 Professional

Crunch more than 1 million rows in Excel 2010 with free addin called Power Pivot!


Lately, I have been talking to a few business folks who do their own data analysis in Excel (2010), and sometimes they run into the Excel 2010 limit of about 1 million rows (1,048,576 per worksheet). When I hear that, I talk about Power Pivot, what it can do and what it cannot, and they are just amazed that there’s a FREE add-in that will help them crunch more than 1 million rows!

happy suprised business user excel power pivot

Image courtesy

You can explore more about this amazing add-in here: http://www.microsoft.com/en-us/bi/powerpivot.aspx
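Before reaching for Power Pivot, it can help to check whether an export actually exceeds the worksheet limit. The sketch below counts the records in a CSV and compares the count with 1,048,576 (2^20), the per-worksheet row limit in Excel 2007 and later. The function names and the tiny in-memory sample are assumptions made for illustration.

```python
import csv
import io

EXCEL_MAX_ROWS = 2 ** 20  # 1,048,576 rows per worksheet in Excel 2007 and later

def count_rows(lines):
    """Count CSV records from any iterable of text lines (file object, list, ...)."""
    return sum(1 for _ in csv.reader(lines))

def fits_in_worksheet(row_count):
    """True if the rows (header included) fit in a single Excel worksheet."""
    return row_count <= EXCEL_MAX_ROWS

# Tiny in-memory sample standing in for a real export file.
sample = io.StringIO("customer,amount\nA,10\nB,20\n")
rows = count_rows(sample)
print(rows, fits_in_worksheet(rows))
```

For a real file you would pass an open file object instead, for example `open("export.csv", newline="")`, where `export.csv` is a hypothetical file name.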

And read more about the benefits of PowerPivot:

Top 5 Ways PowerPivot Helps Excel Pros

PowerPivot? But I use pivot tables in Excel

Back to basics: What is the difference between Data Analysis and Data Mining?


What is the difference between Data Analysis and Data Mining?

1) One view is that: Data Mining is one particular form of Data Analysis.

difference between data mining and data analysis

One of the reasons I researched the difference between Data Analysis and Data Mining is that I find the terms are used interchangeably, and now I know why: Data Mining is considered a particular form of Data Analysis.

2) I found another view that says:

Data Analysis is meant to support decision-making, support conclusions and highlight noteworthy information. So when “analyzing data”, we know what we want: we want answers to support our hypothesis, and we want data in summarized form to highlight useful information.

While

Data Mining is meant for “knowledge discovery” and “predictions”. So when “mining data”, we look for undefined insights: we want the data to tell us something we didn’t know before, and we want to find patterns in the data that we had not anticipated.

Sources:

http://www-stat.stanford.edu/~jhf/ftp/dm-stat.pdf

http://stats.stackexchange.com/questions/5026/what-is-the-difference-between-data-mining-statistics-machine-learning-and-ai

http://stats.stackexchange.com/questions/1521/data-mining-and-statistical-analysis

http://en.wikipedia.org/wiki/Data_analysis

 

Data Mining: Classification VS Clustering (cluster analysis)


For someone who is new to Data Mining, classification and clustering can seem similar because both data mining algorithms essentially “divide” the datasets into sub-datasets. But there is a difference between them, and in this blog-post we’ll see exactly that:

CLASSIFICATION
  • We have a training set containing data that have been previously categorized
  • Based on this training set, the algorithm finds the category that the new data points belong to
  • Since a training set exists, we describe this technique as supervised learning
  • Example: We use a training dataset that categorizes customers who have churned. Based on this training set, we can classify whether a customer will churn or not.

CLUSTERING
  • We do not know the characteristics of similarity of data in advance
  • Using statistical concepts, we split the datasets into sub-datasets such that the sub-datasets have “similar” data
  • Since a training set is not used, we describe this technique as unsupervised learning
  • Example: We use a dataset of customers and split them into sub-datasets of customers with “similar” characteristics. This information can be used to market a product to a specific segment of customers that has been identified by the clustering algorithm.
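To make the contrast concrete, here is a small sketch in plain Python. The post does not name specific algorithms, so a nearest-centroid classifier and a basic k-means stand in for “classification” and “clustering” purely for brevity. The customer numbers are made up, and all function names are chosen for this example.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

# --- Classification (supervised): labels are known in advance -------------
def train_nearest_centroid(labeled):
    """labeled: list of (point, label). Returns {label: centroid}."""
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}

def classify(model, point):
    """Assign the label whose centroid is closest to the new point."""
    return min(model, key=lambda label: dist(model[label], point))

# --- Clustering (unsupervised): no labels, only the number of groups ------
def kmeans(points, k, iterations=10):
    """Plain k-means seeded with the first k points. Returns cluster ids."""
    centers = list(points[:k])
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda i: dist(centers[i], p))
                      for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = centroid(members)
    return assignment

# Made-up customers: (monthly spend, support calls). Labels exist only for training.
training = [((20, 9), "churned"), ((25, 8), "churned"),
            ((80, 1), "stayed"), ((90, 2), "stayed")]
model = train_nearest_centroid(training)
print(classify(model, (30, 7)))

unlabeled = [(20, 9), (80, 1), (25, 8), (90, 2), (85, 1)]
print(kmeans(unlabeled, k=2))
```

The classifier returns a label that already existed in the training set, while k-means only returns group numbers. Deciding what each group means (for example, a segment to market to) is left to whoever reads the result.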

If you want to learn about Data Mining, check out the “free Book in PDF format: Mining the massive data-sets”.

Who on earth is creating “Big data”?


With all the news about “Big Data”, I had a question:

Where does Big Data come from?

So I researched and here are the Big Data “Sources” that I found:

1. Enterprise data (emails, Word documents, PDFs, etc.)

2. Transactions

3. Social Media

4. Sensor Data

5. Public Data (energy, world resource, labor statistics etc)

where does Big Data Come from / Big Data Sources

Am I missing anything? Please feel free to point those out!