Neologism is the new challenge for IT professionals. Here’s why:

What is Neologism?

Neologism means the coining or use of new words, and I believe it’s one of the challenges faced by IT professionals. Nowadays, we spend our time and energy trying to get our heads around “new terms/words/trends”.

Let’s take a couple of examples:

Some time back, we had cloud computing. Nowadays, it’s Big Data. In my mind, Big Data has been coined to mean the following technologies/techniques in different contexts:

[Image: Big Data | Unstructured | External | Text | Public Data]

Note: The above image is just for illustration purposes. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And neologism is a challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what it “exactly” means.

2) It means different things in different contexts.

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving target. The definition used today will change in the future, so we always need a “working” definition for these terms.

Now, don’t get me wrong: it’s fun trying to figure out what it all means and gauging whether it matters to me and my organization! What do you think? As a person in Information Technology, do you think that neologism is one of the challenges we face? Consider leaving a reply in the comment section!

Related Articles:

Want to learn about Big Data? Read O’Reilly’s book “Planning for Big Data”

Quote for Big Data / Data Science / Data Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Things I shared on Social Media Networks during Nov 12 – Dec 31 (2012)

Big Data: The Coming Sensor Data Driven Productivity Revolution http://bit.ly/TQAPsW

Check out some nice getting-started tutorials on the beyondrelational site: http://bit.ly/RVVHRV

Complexity is your enemy. Any fool can make something complicated. It is hard to make something simple – Richard Branson

— via Paras Doshi – Blog http://on.fb.me/WAQ5ky

“The success of companies like Google, Facebook, Amazon, and Netflix, not to mention Wall Street firms and industries from manufacturing to retail and healthcare, is increasingly driven by better tools for extracting meaning from very large quantities of data,” says Tim O’Reilly.

Nice collection of 20+ videos on the topic of “Data Science”: http://bit.ly/WMkZqc

Nice collection of videos by the Berkeley School of Information: http://bit.ly/Tf1yAD #Information #Data

Just found Facebook’s data team’s page: http://on.fb.me/ToYILO

Nice heat map of stocks, via V Talk Tech (A Parth Acharya Blog): http://on.fb.me/SfBbvF

What’s the biggest fear about cloud computing? via Windows Azure http://on.fb.me/VjIiHR

Resource: Presentations from the Sentiment Analysis Symposium http://bit.ly/VtPH3B

If I switched to the newest “holiday” theme on WordPress, this is how it would look: http://on.fb.me/UEuyFr

Nice! Code School now offers the R programming language! I have been playing with R for a while now and definitely want to learn more. Here’s the link to learn R: http://bit.ly/VEAnkZ

Interesting tool from Google to optimize and analyze web page speeds: http://bit.ly/HTubNC

Performed #sentiment #Analysis on #starbucks Twitter data using #R! It was fun! http://on.fb.me/Z3qLo8

In 2002, The Data Warehousing Institute estimated that data quality problems cost U.S. businesses more than $600 billion a year. Over the past 10 years, this number has likely grown. http://bit.ly/TPT9r3

Reading: Business Analytics vs Business Intelligence? http://bit.ly/YUtJwx

Big data is a nickname for the recent increase in largely external and unstructured business and consumer information. How are businesses across industries harnessing traditional enterprise information management functions and systems to translate big data into useful business intelligence? http://www.deloitte.com/view/en_US/us/Services/additional-services/deloitte-analytics-service/217c19e69249b310VgnVCM2000003356f70aRCRD.htm

For business analytics professionals: 12 webcasts on Jan 30th 2013 http://bit.ly/RUFsZ3 #sqlpass #analytics #24hop

Some nice insights about how to build an Internet platform, from the founder of Zipcar: http://bit.ly/Yco6IP

Let’s connect and converse on any of these people networks!

[Links: Paras Doshi Blog on Facebook, Twitter, Google Plus, LinkedIn]

See what went into building WATSON, an advanced machine learning & natural language processing system powered by Big Data!

Do you know about the Jeopardy! quiz show where a computer named Watson beat the world champions? No? Go watch it! Yes? Nice! Isn’t it a feat as grand as the one achieved by Deep Blue (the chess computer), if not grander?

I am always interested in how such advanced computers are built. In the case of Watson, it’s fascinating how technologies such as natural language processing, machine learning and artificial intelligence, backed by massive compute and storage power, were able to beat two human world champions. As a person interested in analytics and Big Data, I would classify this technology under Big Data and Advanced Data Analytics, where a computer analyzes lots of data to answer a question asked in natural language. It also uses advanced machine learning algorithms. If you’re interested in getting an overview of what went into building Watson, watch this:

If you’re as amazed as I am, consider sharing what amazed you about this technology in the comment section!

Microsoft StreamInsight: “The type or namespace name ComplexEventProcessing does not exist in the namespace Microsoft”

In this blog post, I’ll document how I solved the error “The type or namespace name ComplexEventProcessing does not exist in the namespace Microsoft”. Here are the steps:

1. I browsed through the other errors/warnings as well. I was also missing assemblies from Reactive Extensions, so I added them first.

2. In my scenario, I had successfully installed StreamInsight 2.0 on my machine, but the sample I downloaded needed assemblies from StreamInsight 2.1. Notice the version mismatch? That was the problem!

3. One of the messages said “Could not locate assembly Microsoft.ComplexEventProcessing version = 21.0.0.0”. The version = 21.0.0.0 suggested that I needed the assemblies from StreamInsight 2.1.

4. So I downloaded “Microsoft® SQL Server® StreamInsight 2.1” and installed it. And it worked!

5. FYI: I found the Microsoft.ComplexEventProcessing assembly on my machine at C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.ComplexEventProcessing\*

That’s about it for this post. I hope it helps anyone who is having trouble finding the assembly with the right version number to get started with StreamInsight.

Three V’s of Big Data with Examples:

In this blog post, we will look at the Three V’s of Big Data, with examples:

1. Volume:

The terabytes (TB), petabytes (PB) and zettabytes (ZB) of data that get created:

[Image: “A global surge in Data”, from the webinar “How to Walk The Path from BI to Data Science: An interview with Michael Driscoll, data scientist and CEO of Metamarkets”]
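To get a feel for these units: each step from TB to PB to EB to ZB is a factor of 1,000, so a zettabyte is a billion terabytes. A small sketch that expresses a raw byte count in the largest fitting unit, assuming decimal (SI) units where 1 KB = 1,000 bytes (binary units such as TiB use 1,024 instead); the function name is only for illustration:

```javascript
// Express a byte count using the largest decimal (SI) unit, stepping up by 1,000 each time.
// Assumption: 1 KB = 1,000 bytes; binary units (KiB, TiB, ...) would use 1,024.
const UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"];

function formatBytes(bytes) {
  let value = bytes;
  let unit = 0;
  while (value >= 1000 && unit < UNITS.length - 1) {
    value /= 1000;
    unit += 1;
  }
  // Round to two decimals and drop trailing zeros (1.50 -> 1.5).
  return `${Number(value.toFixed(2))} ${UNITS[unit]}`;
}

console.log(formatBytes(1.5e15)); // 1.5 PB
console.log(formatBytes(1e21));   // 1 ZB
```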

2. Velocity:

The speed at which information flows.

Example: 50 million tweets per day!

[Image: Twitter, 50 million tweets per day]

(This figure is from November 2010; the number must have increased since!)

UPDATE 23 Nov 2012: Wikipedia now says 340 million tweets per day!

[Image: Twitter in 2012, 340 million tweets per day]
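Velocity is easier to picture as a rate. Assuming the tweets are spread evenly across the day (real traffic is burstier, so peaks run higher), the two daily figures above work out to roughly 579 and 3,935 tweets per second:

```javascript
// Average tweets per second for a tweets-per-day figure, assuming an even spread over the day.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function tweetsPerSecond(tweetsPerDay) {
  return tweetsPerDay / SECONDS_PER_DAY;
}

console.log(Math.round(tweetsPerSecond(50e6)));  // 579  (Nov 2010 figure)
console.log(Math.round(tweetsPerSecond(340e6))); // 3935 (2012 figure)
```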

3. Variety:

All types of data are now being captured, whether they are in a structured format or not.

Examples: text from PDFs, emails, social network updates, voice calls, web traffic logs, sensor data, click streams, etc.

[Image: data variety in Big Data]

Image courtesy

And these may be followed by other V’s, such as V for Value.

Conclusion:

In this blog post, we looked at the Three V’s of Big Data, with examples.

Related Posts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Hadoop on Azure’s JavaScript Interactive Console has basic graphing functions:

Hadoop on Azure’s JavaScript console has basic graphing functions: bar, line and pie charts. I think this is great because it gives you an opportunity to visualize data that’s in HDFS directly from the Interactive JavaScript Console! Here’s a screenshot:

[Screenshot: bar and line graphs in the Hadoop on Azure JavaScript console]

In the console, I ran the help("graph") command to see how I can use these functions:

Draw a graph of data
    graph.bar(data, options)      Bar graph
    graph.line(data, options)     Line graph
    graph.pie(data, options)      Pie chart

Parameters
    data (array)        Array of data objects
    options (object)    Options object, with
        x (string)              Property to use for x-axis values
        y (string)              Property to use for y-axis values
        title (string)          Graph title
        orientation (number)    x-axis label orientation in degrees
        tickInterval (number)   x-axis tick interval
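Going by the help output above, graph.bar takes an array of objects plus an options object whose x and y fields name the properties to plot. The sketch below shows that shape with a few sample word counts. Outside the console there is no graph object, so the code falls back to a hypothetical stand-in that only logs its arguments; inside the console the real graph.bar would be used instead. I have not verified how the console handles every option, so treat this as a sketch of the data shape rather than a tested recipe.

```javascript
// Data for graph.bar: an array of objects, one object per bar.
var wordCounts = [
  { word: "hadoop", count: 3 },
  { word: "on", count: 3 },
  { word: "azure", count: 2 }
];

// Options as listed by help("graph"): x and y name the properties to plot.
var options = { x: "word", y: "count", title: "Word counts", orientation: 45 };

// Use the console's graph object when it exists; otherwise a hypothetical stand-in
// that prints each bar as "label: value" and returns the number of bars.
var graphApi = typeof graph !== "undefined" ? graph : {
  bar: function (data, opts) {
    for (var i = 0; i < data.length; i++) {
      console.log(data[i][opts.x] + ": " + data[i][opts.y]);
    }
    return data.length;
  }
};

graphApi.bar(wordCounts, options);
```

The snippet sticks to var and plain functions, the same ES5 style as the WordCount sample, in case the console’s JavaScript engine is an older one.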

Conclusion:

In this blog post, I showed that Hadoop on Azure’s JavaScript Interactive Console has basic graphing functions.

Related articles:

How to Load Twitter data into Hadoop on Azure cluster and then analyze it via Hive add-in for Excel?

In this blog post, we will:

1. Upload Twitter Text Data into Hadoop on Azure cluster

2. Create a Hive Table and load the data uploaded in step 1 to the Hive Table

3. Analyze data in Hive via Excel Add-in

Before we begin, I assume that you have access to Hadoop on Azure, have your sample data ready (don’t have any? learn from a blog post), are familiar with the Hadoop ecosystem and know your way around the Hadoop on Azure Dashboard.

Now, Here are the steps involved:

Step 1: Upload Twitter Text Data into Hadoop on Azure cluster

1. Have the data to be uploaded ready! I am simply going to copy and paste the file from my host machine to the RDP’ed machine, which in this case is the Hadoop on Azure cluster.

For the purpose of this blog post, I have a text file containing 1,500 tweets:

[Screenshot: the Twitter text data file to upload to Hadoop on Azure]

2. Open a web browser > go to your cluster in Hadoop on Azure

3. RDP into your Hadoop on Azure cluster

[Screenshot: Remote Desktop into the Hadoop on Azure cluster]

4. Copy and paste the file. It’s a small data file, so this approach works for now.

[Screenshot: uploading the Twitter text data to the Hadoop on Azure cluster]

Step 2: Create a Hive Table and load the data uploaded in step 1 to the Hive Table

1. Stay on the machine that you connected to via Remote Desktop (RDP).

2. Open the Hadoop command line (you’ll see an icon on your desktop)

3. Switch to Hive:

[Screenshot: writing Hive commands in Hadoop on Azure]

4. Run the following Hive commands:

DROP TABLE IF EXISTS TweetSampleTable;

CREATE TABLE TweetSampleTable (
id string,
text string,
favorited string,
replyToSN string,
created string,
truncated string,
replyToSID string,
replyToUID string,
statusSource string,
screenName string
);

LOAD DATA LOCAL INPATH 'C:\apps\dist\examples\data\tweets.txt' OVERWRITE INTO TABLE TweetSampleTable;

Note that for the purpose of this blog post, I’ve chosen string as the data type for all fields. The right choice depends on the data that you have; if I were building a real solution, I would spend more time choosing the right data types.

Step 3: Analyze data in Hive via Excel Add-in

1. Switch to the Hadoop on Azure Dashboard

2. Go to the Hive Console and run the SHOW TABLES command to verify that tweetsampletable exists.

[Screenshot: showing all tables in Hive on Hadoop on Azure]

3. If you haven’t already, download and install the Hive ODBC Driver from the Downloads section of your Hadoop on Azure Dashboard.

4. I set up an ODBC connection to Hive by following the instructions here: How To Connect Excel to Hadoop on Azure via HiveODBC (en-US)

5. After that, open Excel. I have Excel 2010 (64-bit).

6. Switch to the Data tab > Hive Pane

7. Choose the Hive connection > select the table > select the columns > and off you go!

You have Hive data in Excel!

[Screenshot: Hive data in Excel via the Hive add-in on Hadoop on Azure]

Now go analyze!

Conclusion:

In this blog post, we saw how to load Twitter data into a Hadoop on Azure cluster and then analyze it via the Hive add-in for Excel.

Visualizing MapReduce Algorithm with WordCount Example:

In this blog post, we will visualize how the MapReduce algorithm performs a word count on a text input:

First of all, for all the programmers out there, here is the code (JavaScript):

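var map = function (key, value, context) {
    // Split the input line on any non-letter character
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        // Skip the empty strings produced by consecutive separators
        if (words[i] !== "") {
            // Emit (word, 1); lowercasing makes "Hadoop" and "hadoop" the same key
            context.write(words[i].toLowerCase(), 1);
        }
    }
};
var reduce = function (key, values, context) {
    // Add up all the 1s emitted for this word
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    // Emit (word, total count)
    context.write(key, sum);
};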

Courtesy: Microsoft Hadoop on Azure Samples

Now, let’s visualize this using an example.

Suppose the text is “Hadoop on Azure sample Hadoop is on Windows Azure Hadoop is on Windows server”. Then this is how you can think of what happens to your input as it is processed first by the Map function and then by the Reduce function:

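INPUT:

Hadoop on Azure sample
Hadoop is on Windows Azure
Hadoop is on Windows server

MAP (one (word, 1) pair per word; map lowercases every word):

Line 1: (hadoop, 1) (on, 1) (azure, 1) (sample, 1)
Line 2: (hadoop, 1) (is, 1) (on, 1) (windows, 1) (azure, 1)
Line 3: (hadoop, 1) (is, 1) (on, 1) (windows, 1) (server, 1)

REDUCE (pairs grouped by word, then the values summed):

hadoop 3
on 3
azure 2
sample 1
is 2
windows 2
server 1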
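To see the whole flow without a cluster, the two functions can be driven by a small local harness: it runs map() on each line of the example text, groups the emitted pairs by word (the shuffle step Hadoop performs between the two phases), and then calls reduce() once per word. This is a single-process sketch for illustration, not how the Hadoop on Azure runtime works; runWordCount and the stand-in context and iterator objects are hypothetical, and map/reduce are the sample functions above.

```javascript
// Sketch: run the WordCount map and reduce functions locally, in one Node.js process.
// The context/iterator objects are hypothetical stand-ins that mimic only write(),
// hasNext() and next(); they are not the Hadoop on Azure runtime.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

function runWordCount(lines) {
  // Map phase: call map() once per input line and collect every (word, 1) pair.
  const emitted = [];
  const mapContext = { write: (k, v) => emitted.push([k, v]) };
  lines.forEach((line, index) => map(index, line, mapContext)); // the key is unused by map()

  // Shuffle phase: group values by word (real Hadoop also sorts the keys; skipped here).
  const groups = new Map();
  for (const [word, count] of emitted) {
    if (!groups.has(word)) groups.set(word, []);
    groups.get(word).push(String(count)); // values kept as text; reduce() parses them
  }

  // Reduce phase: hand each word's values to reduce() through a hasNext()/next() iterator.
  const result = {};
  const reduceContext = { write: (k, v) => { result[k] = v; } };
  for (const [word, values] of groups) {
    let position = 0;
    const iterator = {
      hasNext: () => position < values.length,
      next: () => values[position++],
    };
    reduce(word, iterator, reduceContext);
  }
  return result;
}

const counts = runWordCount([
  "Hadoop on Azure sample",
  "Hadoop is on Windows Azure",
  "Hadoop is on Windows server",
]);
console.log(JSON.stringify(counts));
// {"hadoop":3,"on":3,"azure":2,"sample":1,"is":2,"windows":2,"server":1}
```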

Conclusion:

In this blog post, we visualized how the MapReduce algorithm operates on a WordCount example.

Things I shared on Social Media Networks during Oct 19 – Nov 11

The goal of this series is to recap the conversations I’m having on social networks, because I do not want my blog readers to miss them. So here is the recap of the last three weeks:

1. I was at SQL PASS 2012!

[Image: Paras Doshi at SQL PASS 2012]

2. A nice Dashboard!

[Image: Metro-style Business Intelligence dashboard on Windows 8]

3. Learn to build an Enterprise Information Management system using SSIS, DQS and MDS:

http://parasdoshi.com/2012/11/07/resource-learn-to-build-a-enterprise-information-management-system-using-data-quality-services-master-data-services-and-sql-server-integration-services/

[Image: Enterprise Information Management system using SSIS, DQS and MDS]

4. Fake Data!

5. I reached 2,000 points on MSDN!

6. A nice video by Jeremy Howard on Predictive Analytics:

7. A nice data visualization via the Data Mining add-in for Excel

[Image: data visualization via the Data Mining add-in for Excel]

8. Get started with Hadoop on Windows 7/Server!

Download here: http://parasdoshi.com/2012/10/27/getting-started-with-hdinsight-a-k-a-microsofts-big-data-hadoop-platform-on-local-windows-machine/

Demo Here: http://parasdoshi.com/2012/11/02/end-to-end-demo-hadoop-hdinsight-hive-excel-power-view-azure-data-market/

[Image: Hadoop on Windows 7/Server]

9. I was at Give Camp 2012! If you do not know about “Give Camp”, then you should check it out!

Here’s last year’s (2011) post: http://parasdoshi.com/2011/10/24/i-gave-back-at-dallas-givecamp-and-why-i-think-every-software-professional-should-consider-doing-so-too/

[Image: Give Camp 2012]


End to End Demo: Hadoop (HDInsight) + Hive + Excel + Power View + Azure Data Market

A great end-to-end demo shown by Microsoft at the Strata Conference 2012:

About the demo:

Scenario: Analyze web logs of an online bike store.

Tools demonstrated:

Hadoop (Get started with HDInsight)

Hive

Excel 2013

Power View

Azure Data Market

A dashboard in Power View showing the correlation between discount campaigns and traffic:

This is a mash-up of data from Hadoop (traffic) and data from SQL Server (discount campaigns).

[Image: end-to-end demo of Microsoft Hadoop (HDInsight)]

Conclusion

In this blog post, I shared an awesome demo of HDInsight. Check it out!