Tableau: Data Cleaning for Geographic Maps


Data cleaning is a major part of any analytics or data-visualization undertaking. If data cleaning is ignored, it leads to inaccurate reporting and thus suboptimal business decisions.

To that end, if you create a geographic map in Tableau, please check the accuracy of your data by going to:

Menu Bar > Map > Edit Locations

Let me give you some examples:

I had “State/Province” as the geographic role for my variable, and when I created a geographic map, it didn’t show anything for New York State! See before:

[Image: geographic map before data cleaning]

So what did I do?

I navigated to Menu Bar > Map > Edit Locations:

[Image: Edit Locations dialog for State]

So I fixed it!

[Image: corrected location in Tableau]

And After:

[Image: geographic map after data cleaning]

Note that New York State is now lit up!

In the past, I’ve also entered latitude and longitude when needed. That was when Tableau was not able to recognize a few US cities and marked them as “ambiguous”, so I entered the latitude and longitude to clean the data:

[Image: Edit Locations dialog for City]

Conclusion:

In this post, I described how you should check the data accuracy of a Tableau Geographic Map.


Received President’s Volunteer Service Award!


Received the President’s Volunteer Service Award for the year 2012!

Letter from US President Barack Obama:

[Image: letter from Barack Obama, President’s Volunteer Service Award]

Lapel Pin:

[Image: lapel pin, President’s Volunteer Service Award]

Certificate:

[Image: certificate, President’s Volunteer Service Award]

Related Notes:

– Check out Give Camp. It’s a great way for IT professionals to give back to society. If you do not have a Give Camp in your country, you can always start one :)

Met revered APJ Abdul Kalam

My First Five Toastmasters International club speeches:


In this post, I want to share the titles of the first five Toastmasters speeches that I delivered over the past few months:

1. Ice Breaker

2. Glossophobia (Fear of Public Speaking)

3. Get ready to outsource your computing

4. What’s it like for folks who live below the poverty line?

5. What’s it like to start a startup?

Related Post:
Half way through Toastmaster’s competent communicator manual
Want to practice public speaking? Join Toastmaster’s!

How many websites in USA exceed the data collection limitations of Google Analytics?


A little bit of background:

– I was researching the limitations of Google Analytics.

– After reading the limitations, I wanted to know: how many websites in the USA exceed the limitations of Google Analytics?

So here’s the short answer:

Only 108 sites exceed this limitation

(as of today)

And Here’s the long answer:

Limitations of Google Analytics. Here’s the URL: http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983

And I am quoting from the above URL:

Data Collection limit: You should not send more than 10 million hits per month. If you exceed this limit, there is no assurance that the excess hits will be processed.
Data Freshness limit: Sending more than 200,000 visits per day to Google Analytics will result in your reports being refreshed only once per day

To take it further, I wanted to know how many websites in the USA get more than 10 million hits per month. It turns out only 108 websites in the US get that much traffic.
Source: http://www.quantcast.com/top-sites/US?jump-to=108

So from a data collection limit standpoint, only these 100-odd sites would exceed the limitations of Google Analytics.

To put things in perspective: MySpace.com does not exceed the Google Analytics data collection limit:

[Image: MySpace can use Google Analytics]

Conclusion

Just knowing about the data collection limit was not that interesting, but once I combined it with data from another source, it seemed very interesting to me! Anyhoo, in this post, I shared:

> Limitations of Google Analytics

> An answer to: how many websites in the USA exceed the limitations of Google Analytics?

[UPDATE Feb 10th 2013] I made a mistake in correlating data from Quantcast and Google Analytics. Lesson learned: double-check the units when comparing data from two different sources.

Florin Dumitrescu pointed out that Quantcast uses people/month while Google uses hits/month, and the two may NOT always be the same. Sorry about this.
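
To make the unit mismatch concrete, here is a minimal Python sketch. The 10 million hits/month figure is the Google Analytics limit quoted in this post; the visitor count and the hits-per-person values are made-up numbers for illustration only, not Quantcast data, and the function names are hypothetical.

```python
# Google Analytics data collection limit quoted in this post (hits per month).
GA_MONTHLY_HIT_LIMIT = 10_000_000


def estimated_monthly_hits(people_per_month: int, hits_per_person: float) -> float:
    """Convert an audience figure (people/month) into hits/month.

    hits_per_person is an assumption: one person can generate many hits,
    so people/month and hits/month are not interchangeable.
    """
    return people_per_month * hits_per_person


def exceeds_ga_limit(people_per_month: int, hits_per_person: float) -> bool:
    """True if the estimated hits/month is above the data collection limit."""
    return estimated_monthly_hits(people_per_month, hits_per_person) > GA_MONTHLY_HIT_LIMIT


if __name__ == "__main__":
    # Hypothetical site with 2 million people/month (illustrative number only).
    people = 2_000_000
    for hits_per_person in (1, 3, 6):
        print(hits_per_person, exceeds_ga_limit(people, hits_per_person))
```

With these made-up numbers, the same site stays under the limit at 1 or 3 hits per person but crosses it at 6, which is why a people/month ranking cannot directly answer a hits/month question.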

Neologism is the new challenge for IT professionals, Here’s why:


What is Neologism?

Neologism means the coining or use of new words, and I believe it’s one of the challenges faced by IT professionals. Nowadays, we put our time and energy into trying to get our heads around “new terms/words/trends”.

Let’s take a couple of examples:

Some time back, we had cloud computing. Nowadays, it’s Big Data. In my mind, Big Data has been coined to mean the following technologies/techniques in different contexts:

[Image: Big Data, Unstructured, External, Text, Public Data]

Note: The above image is for illustration purposes only. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And neologism is a challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what it “exactly” means.

2) It means different things in different contexts.

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving ball. The definition used today will change in the future. So we always need a “working” definition for these terms.

Now, don’t get me wrong: it’s fun trying to figure out what it all means and trying to gauge whether it matters to me and my organization or not! What do you think? As a person in Information Technology, do you think that neologism is one of the challenges we face? Consider leaving a reply in the comment section!

Related Articles:

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Playing w/ the Occupational Employment Statistics Data-Set:


I found some data-sets on Occupational Employment Statistics on the Bureau of Labor Statistics site, and I played with them to see if I could find something interesting:

A few things about the data and visualizations that I am going to share:

  • US only
  • I downloaded the national-level data, but there’s also state-level data available if you’re interested in drilling down.
  • The reports that you see were created after I got a chance to “clean” the data-set a bit and create a data model that suited basic reporting on top of it.
  • For this blog post, I am going to play w/ May 2010 & 2011 data.
  • With the help of the original data-set, you can drill down to get statistics about a particular job category if you want. For this blog post, I am going to share visualizations that correspond to job categories.
  • Click on the images to see the higher-resolution versions.

With that, here are some visualizations:

1) Job Category VS mean hourly salary:

[Image: job category vs. mean hourly salary, Bureau of Labor Statistics]

2) Job Category VS number of employees:

[Image: job category vs. number of employees, Bureau of Labor Statistics]

3) Scatter Plot:

X Axis: Number of employees

Y Axis: Wage (Mean Hourly Salary May 2011)

Size of Bubble: Wage (Mean Hourly Salary May 2011)

*Note: This may not be the best approach to create the Scatter Plot as I have used the same value (Mean Hourly Salary May 2011) twice – But since I was just playing w/ it, I went with what I had in the model.

Here’s the visualization:

[Image: scatter plot, number of employees vs. mean hourly wage, May 2011 employment statistics]

Some of the things I observed:

1) I belong to an industry (Computer and Mathematical occupations) which has a relatively high mean hourly wage.

2) There are few people working in “farming, fishing & forestry occupations”, and they do not get paid much.

3) There are lots of people working in “office administrative support occupations”, and they do not get paid much either.

4) Management Occupations, Legal Occupations and computer & mathematical occupations have relatively higher mean hourly wages.

Conclusion:

In this post, I played w/ Occupational Employment statistics data-sets and shared some visualizations.

Prepare yourselves for ‘Capped Data Plans’ VS ‘growing cloud computing adoption’ battle.


I love cloud! And no, I am not a marketer. I am a technologist, and I love cloud after minus-ing the marketing mumbo-jumbo/hype. And I would like cloud to emerge victorious. But I see a roadblock, and that’s a problem. And… I like pointing at problems (Oops!) I like coming up with creative solutions to problems.

So, in this post, I am going to talk about a problem, errrr, probable solution(s) to a potential problem that can adversely affect our lifestyle. To be specific, it can drastically increase the money we pay for Internet access in the future. With growing cloud computing adoption, we will consume more bandwidth, since we will have a LOT of data in the cloud. The more data we transfer between the cloud and our devices, the more bandwidth we use.

Now, today, it’s not a problem. At least not in the USA, as we can live with “LIMITED” mobile data plans (wireless cellular things like 4G, 3G, etc.) combined with “UNLIMITED” home data plans (the wired ones: yes, the Internet made up of fiber cables). So in the future, “IF” we still have UNLIMITED home data plans around, this battle may fortunately never happen.

Wait… but what if there is no UNLIMITED plan?

I know it’s scary. Year by year, Internet usage would keep doubling while the limit (cap) on data usage would be halved.

Wait… that sounds like an inverse Moore’s law!

And what if home data plans looked like this: $2 for every GB? At my current usage, I would be paying a little less than 80 dollars. And I can’t imagine my data usage if all my data is in the cloud. What if I do not have local storage and instead have all my data on some “cloud storage”? I can imagine a bill of $200 per month. And that’s a problem. You think so too?
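
Here is a rough Python sketch of that back-of-the-envelope math. The $2/GB rate and the $80 and $200 bills come from this post; the 40 GB starting usage is my approximation of “a little less than 80 dollars”, and the yearly doubling is the scary scenario described above, not a forecast. The function names are made up for illustration.

```python
PRICE_PER_GB = 2.0  # dollars per GB, the hypothetical rate used in this post


def monthly_bill(usage_gb: float, price_per_gb: float = PRICE_PER_GB) -> float:
    """Bill in dollars for a pay-per-GB home data plan."""
    return usage_gb * price_per_gb


def usage_for_bill(bill_dollars: float, price_per_gb: float = PRICE_PER_GB) -> float:
    """Monthly usage in GB that would produce the given bill."""
    return bill_dollars / price_per_gb


def projected_usage(start_gb: float, years: int) -> float:
    """Usage after `years` if it doubles every year (assumed scenario)."""
    return start_gb * 2 ** years


if __name__ == "__main__":
    print(usage_for_bill(80))    # GB behind a bill of about $80
    print(usage_for_bill(200))   # GB behind the imagined $200 bill
    # Assumed 40 GB today, doubling each year for 3 years.
    print(monthly_bill(projected_usage(40, 3)))
```

So the imagined $200 bill corresponds to 100 GB a month, and with yearly doubling from roughly 40 GB that level is passed within two years.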

No!? Unrealistic?! Home data plans cannot be capped?! Here’s an article: http://money.cnn.com/2011/05/17/technology/netflix_canada/index.htm that talks about the challenge that Netflix faced in Canada because of “capped home data plans”. In fact, Netflix had to offer a video-quality setting that could help Canadian customers “save” on data usage. Any-who, if home data plans were capped in Canada, I can say this may happen in the USA and other countries too.

And if you still think that capped home data plans are not a possibility, I would like to point out that, not long ago, we had UNLIMITED data plans through wireless cellular services, and now we do not. (And I know, Sprint offers unlimited data plans. But could that not change too?)

So what can we do about it?

– Spread the word. Prepare yourselves for the battle! Contact Imp. People. Contact Gov. etc, etc, etc. (Ok, I tweeted, now what?! Any probable solution?)

Ideally, data should not be capped. (And if it does happen, we can run an Occupy Comcast campaign! Ok, sorry. Could not resist it!)

Realistically, how about a reasonable capped home data plan? What’s reasonable? Well, I do not like 2 GB and 5 GB limits. That’s way too small. On the other hand, a capped data plan of 1 TB is too high. It’s virtually unlimited. How about something in between that is reasonable for both sides: we, THE CLOUD USERS, and the Internet providers (ISPs)? That is what I think. You may have a different perspective on it; if you do, go ahead and post your views as comments.

Joining University of Washington’s certificate in cloud computing program!


I’ll be joining the web version of the University of Washington’s certificate in cloud computing program.

I have been researching cloud for a while now, and to me it seems like the right time to study it in more detail. What’s more, it’s the University of Washington, so the faculty teaching the courses are top-notch. I hope to learn from the veterans and awesome academicians via this certificate program.

Program Information: http://www.pce.uw.edu/certificates/cloud-computing.html

 

Related posts:

http://parasdoshi.com/2011/09/27/proof-that-the-word-cloud-computing-has-replaced-distributed-computing-2/

http://parasdoshi.com/2011/08/01/presented-on-what-mobile-devices-plus-cloud-computing-mean-for-the-real-world-at-ignite-ahmedabad/

http://parasdoshi.com/2011/07/07/cloud-computing-is-awesome-defining-cloud-computing-the-urban-dictionary-style/

India can win 50 gold medals at olympics!


It was 11th August 2008, the day on which India’s Abhinav Bindra won India’s first-ever individual Olympic gold medal, in the 10m air-rifle event. He has made us Indians proud and has ended a long drought. Bravo, Abhinav!

Though this news makes you happy, the overall performance of India at the Olympics is not at all encouraging.

I have no official statistics with me, but there is little doubt that on a per capita basis India finishes last in any Olympic tally. Dead last. Commentators have blamed our Olympic shortfall on the lack of training infrastructure in India, the low priority we put on sports as a society, and the Indian physique. But I believe this is not entirely true. Here’s why:

Whatever it is that keeps India from Olympic gold, it certainly isn’t a shortfall in the Indian gene pool. Take any Olympic event and you can see that India is rich in future champions, but only if we look in the right places.

If you have visited an Indian construction site, you will have found many potential gymnastics champions. I recently saw a man walking across thin planks on the 15th floor of a building. He did this with no safety equipment. Bring that man down 200 feet and put him on a four-foot-high balance beam, and surely he will dance to Bollywood tunes on the beam. India could easily field a gymnastics team from our construction sites.

Now, take the rickshaw pullers. Living on a high-protein, high-carbohydrate diet of dal and roti, these men transport riders in 40-degree weather for 14 hours a day, seven days a week. Take the best among them and train them for the Tour de France as preparation for the Olympics. Who do you think would have a harder time switching places: the rickshaw pullers or Lance Armstrong? My bet… Mr. Lance…

Our fishermen sail three months of the year in the hardest monsoon conditions. Brutal rain doesn’t hamper these rough-and-tough sailors. Take the best of these men and train them for three years, and they will be formidable Olympic contenders.

Likewise, we can bet on the men and women working at construction sites. They can be seen lifting pounds of brick and dirt onto their heads… watch their motion… they can win us weightlifting medals.

Each year, the best American athletes go to train in Colorado, 5,000 feet above sea level, as high altitude makes the lungs stronger. If we take our best and train them in Leh, 14,000 feet above sea level, imagine what supermen and superwomen we may develop.

The Indian countryside and cities are replete with future Olympians. These men and women train daily in the hardest of conditions on simple diets. Without a doubt, with proper training and infrastructure, these men and women will add to our Olympic golds.

It’s a shame that the world has missed 100 years of Indian Olympic prowess; it would be a travesty if the world missed another 100 years.

[This Article was Published in ISTE nirma’s Annual Magazine]

Chandrayaan Budget : 400 crores!! can India afford a Moon Mission?


For the Moon Mission, a space vehicle (Chandrayaan) will orbit the Moon to collect various information. Its cameras and sensors will be looking for elements such as radon-222, lead-210, magnesium, aluminium, silicon, calcium, iron, uranium and thorium. These minerals are what everyone from the US to Japan is looking for on the Moon. With earthly minerals and energy resources fast running out, scientists believe that by the middle of the 21st century, man should be able to mine the Moon and use its abundant helium-3 as a fuel for generating energy.

The Moon is estimated to have about one million tonnes of helium-3, which scientists believe could be the fuel of the future and which is scarce on Earth. It is estimated that 40 tonnes of helium-3 is enough to meet the total power requirement of the US for one year!! One area that seems to be under serious consideration is mining helium-3 and bringing it to Earth for use in fusion reactors to generate electricity.
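
To put those two estimates side by side, here is a small Python sketch. Both figures are the ones quoted in this article; the calculation assumes, purely for illustration, that US demand stays constant and that the entire lunar reserve could be recovered, neither of which is claimed by the article.

```python
MOON_HELIUM3_TONNES = 1_000_000  # estimated lunar helium-3 reserve (from this article)
US_ANNUAL_NEED_TONNES = 40       # helium-3 needed to power the US for one year (from this article)


def years_of_supply(reserve_tonnes: float, annual_need_tonnes: float) -> float:
    """Years a reserve would last at a constant annual consumption."""
    return reserve_tonnes / annual_need_tonnes


if __name__ == "__main__":
    print(years_of_supply(MOON_HELIUM3_TONNES, US_ANNUAL_NEED_TONNES))
```

Under those simplifying assumptions, the estimated reserve works out to about 25,000 years of US power needs.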

Once believed to be a wasteland, the lunar surface is now believed to be capable of supporting life. After the American mission ‘Clementine’ found traces of water-ice at the Moon’s poles, scientists guesstimate that more than 10 billion tonnes of ice exist there. This lunar ice can be used to produce lunar water, which could be used to irrigate lunar land or for cooking. As scientists say, that is all fiction. But the lunar ice can be used as fuel (by breaking it into hydrogen and oxygen, which in liquid form are among the most powerful chemical propellants) for communication satellites in orbit between the Moon and Earth.

All this will take years to fructify, but advanced countries are already developing technology towards such ends. International treaties prohibit any country from colonizing the Moon, but once mining of the Moon becomes technologically feasible, India is convinced that the pioneers will work out treaties to suit their own interests. So we should be there to stake a claim when the others land their bulldozers!!

Published in ISTE’s Nirma University Chapter’s annual magazine.