First Impression: Google’s BigData offering called BigQuery


As part of an assignment for the University of Washington’s (UW) cloud class, I played with BigQuery, Google’s BigData offering, and I am writing this blog post to share what I think of it. Please note that the views are my own and do not represent those of the instructor or my fellow students at UW. Also, I am not a BigData “expert”. Think of me as a student trying to get my head around the various offerings out there, so if you feel otherwise about what I have written, just let me know in the comments section. Anyhow, read along to see what I think of BigQuery:

First up what is BigQuery?

It’s a platform for analyzing your data (lots of it) by running SQL-like queries. And it really is SQL-like, so if you come from the SQL world like me, you will have no trouble getting up and running in seconds with the help of the nicely written documentation.

Another point to consider is that even though it’s only SQL-like, you can analyze a considerable number of rows in a few seconds. For example, I played with a sample dataset (called gsod) that has 115M rows. In my experiments, simple computations like max and mean (avg) came back in less than a couple of seconds, and slightly more complex queries with WHERE, JOIN and GROUP BY clauses took around 5-6 seconds. Your results may vary depending on the type of query you run, but the BOTTOM LINE is that it is FAST. That’s good news!
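BigQuery itself can’t run on your laptop, so here is a stand-in that shows what “SQL-like” means in practice. It is a minimal sketch using Python’s built-in sqlite3, not BigQuery. The table name gsod_toy, its columns, and the four rows are made up for illustration and are not the real gsod schema. BigQuery’s own dialect also differs in details such as how tables are referenced. The point is only the shape of the queries: a simple aggregate first, then a WHERE plus GROUP BY.

```python
import sqlite3

# Toy stand-in for a weather table like the gsod sample.
# Table name, column names and values are hypothetical, for illustration only.
rows = [
    ("STN-A", 2009, 51.2, 88.0),
    ("STN-A", 2010, 53.4, 91.5),
    ("STN-B", 2009, 40.1, 75.2),
    ("STN-B", 2010, 42.7, 79.9),
]


def run_queries(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE gsod_toy (station TEXT, year INTEGER, "
            "mean_temp REAL, max_temp REAL)"
        )
        conn.executemany("INSERT INTO gsod_toy VALUES (?, ?, ?, ?)", rows)

        # Simple aggregates over the whole table (max and mean).
        simple = conn.execute(
            "SELECT MAX(max_temp), AVG(mean_temp) FROM gsod_toy"
        ).fetchone()

        # A slightly more complex query: filter with WHERE, then GROUP BY.
        grouped = conn.execute(
            "SELECT station, AVG(mean_temp) FROM gsod_toy "
            "WHERE year >= 2010 GROUP BY station ORDER BY station"
        ).fetchall()
    finally:
        conn.close()
    return simple, grouped


simple, grouped = run_queries(rows)
print(simple)
print(grouped)
```

If you can write these two queries, you can read most of the BigQuery examples in the documentation.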

BigQuery is Fast!

But what bothers me is this: how am I supposed to “UPLOAD” lots of data to the Google cloud? It takes time, right? I guess that’s an issue with every cloud-based BigData offering. But here’s what I am thinking: if your data is already in the cloud, e.g. Amazon’s or Microsoft’s, doesn’t it make sense to run your analytics on Amazon’s or Microsoft’s cloud instead of porting your data to Google’s?

[Sidenote: I like that Hadoop on Azure allows Amazon S3 as a data source. Nice move!]

My concern: the time spent uploading a truckload of data to Google’s cloud just so that we can use it with BigQuery.

And even if your data is already in the Google App Engine (GAE) datastore, you’ll have to upload it to BigQuery separately. (Source)
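To put the upload concern into numbers, here is a back-of-the-envelope estimate in Python. It is a best-case sketch: it assumes a steady upstream link at the stated speed and ignores protocol overhead and retries. The 10 and 100 Mbit/s figures are hypothetical examples, not measurements. The optional compression ratio reflects that gzipped CSV is much smaller to send; Michael Manoochehri’s comment further down reports roughly 10 GB shrinking to ~1 GB.

```python
def upload_hours(size_gb: float, link_mbps: float, compression_ratio: float = 1.0) -> float:
    """Rough, best-case upload time in hours.

    size_gb: raw data size in decimal gigabytes (10**9 bytes)
    link_mbps: sustained upstream bandwidth in megabits per second (10**6 bits/s)
    compression_ratio: raw size / compressed size (1.0 means uncompressed)
    """
    if size_gb < 0 or link_mbps <= 0 or compression_ratio <= 0:
        raise ValueError("size must be >= 0; bandwidth and ratio must be > 0")
    megabits = (size_gb / compression_ratio) * 8 * 1000  # 1 GB = 8,000 Mbit
    seconds = megabits / link_mbps
    return seconds / 3600


# Hypothetical link speeds, 350 GB of raw CSV, with and without ~10:1 gzip.
for mbps in (10, 100):
    raw = upload_hours(350, mbps)
    gz = upload_hours(350, mbps, compression_ratio=10)
    print(f"{mbps:>4} Mbit/s: raw {raw:6.1f} h, gzipped (~10:1) {gz:5.1f} h")
```

At 10 Mbit/s, 350 GB of raw CSV works out to roughly 78 hours versus under 8 hours once gzipped at ~10:1, which is why compressing before uploading matters so much.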

Zooming out for a moment, I feel the goal of BigQuery was to offer an easy-to-use BigData platform, and I feel that’s what they have delivered:

An easy-to-use, easy-to-set-up “Hadoop + Hive”-like offering.

[Update, Aug 20th 2012: I have been thinking about it more, and I realized that BigQuery is more about serving real-time BigData scenarios, while Hadoop/Hive/MapReduce is more about batch-oriented analysis, which is great if you need to pre-process tons and tons of data.]

But this “easiness” means that it is NOT as advanced as a Hadoop installation (or Hadoop on Azure, or Amazon’s Elastic MapReduce). Then again, it’s easier and faster to get started with BigQuery. I guess it just depends on what you are trying to achieve, and based on that you’ll have to figure out which is the right tool for your scenario. No generic answer here, sorry!

And BTW, BigQuery supports only CSV. Talk about Variety (one of the V’s of BigData)! Let’s not get into that. I just wanted to point it out, because if you’re looking to analyze datasets that cannot be converted to CSV for running SQL-like queries on top of them, then BigQuery is not for you.
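If your data is nested (for example, JSON records), one common workaround is to flatten it before loading. The sketch below uses only the Python standard library. It flattens nested objects into underscore-joined column names and stores arrays as JSON strings. The field names in the sample records are hypothetical. Note what happens to the tags array: it survives only as an opaque string you can’t easily query, which is exactly the kind of data a CSV-only loader handles poorly.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level dict with underscore-joined keys.

    Lists are stored as JSON strings: the row is kept, but its structure is lost.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    rows = [flatten(r) for r in records]
    # Union of all columns in first-seen order, so rows with missing fields still fit.
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical nested records.
records = [
    {"id": 1, "user": {"name": "ana", "city": "Seattle"}, "tags": ["a", "b"]},
    {"id": 2, "user": {"name": "bo"}},
]
text = records_to_csv(records)
print(text)
```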

Conclusion:

Try out BigQuery. It’s easy to get started, and it’s powerful if SQL-like queries are all you need to analyze your data. If you are a BigData enthusiast, expert or student, it’ll be a nice exercise to mentally compare other BigData offerings with BigQuery.

If you decide to try BigQuery or have already tried it out, I’d love to hear what you think of it. Please leave a comment!

UPDATE (based on Michael Manoochehri’s comment): I didn’t mean to imply that uploading data to BigQuery is prohibitively expensive, because I know it’s NOT! Here is the result Michael Manoochehri shared: “As a test I once ingested about 350 Gb of CSV data (split into 10gb raw files, then I gzipped each one into ~1Gb). I ingested the entire batch using the bq command line tool, and had the entire dataset in BigQuery in just a few hours. I agree that it’s not 100% trivial to move 300 Gb of data from a local cluster into Google’s cloud – but it’s not really that difficult.”
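The preparation step Michael describes (split the raw data into fixed-size files, then gzip each one) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not his actual script. It assumes one CSV record per line (no newlines inside quoted fields). A header row, if present, stays only in the first chunk. The chunk naming scheme is hypothetical. It only prepares the files; the upload and the bq load step are not shown. For his ~10 GB raw chunks you would pass max_bytes=10 * 10**9. The demo at the end uses a tiny synthetic file and chunk size.

```python
import gzip
import os
import tempfile


def split_and_gzip(src_path, out_dir, max_bytes):
    """Split a large CSV into gzipped chunks of at most ~max_bytes (uncompressed) each.

    Assumes one record per line, so splitting on line boundaries never breaks a row.
    Returns the list of chunk paths written, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    paths, out, written = [], None, 0
    try:
        with open(src_path, "rb") as src:
            for line in src:
                # Start a new chunk when adding this line would exceed the size limit.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    # Hypothetical naming scheme: <base>_part0000.csv.gz, ...
                    path = os.path.join(out_dir, f"{base}_part{len(paths):04d}.csv.gz")
                    out = gzip.open(path, "wb")
                    paths.append(path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return paths


def read_gz(path):
    with gzip.open(path, "rb") as f:
        return f.read()


# Demo on a small synthetic file (made-up rows) with a tiny chunk size.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "demo.csv")
original = b"".join(f"{i},station{i % 3},{i * 1.5}\n".encode() for i in range(1000))
with open(demo_src, "wb") as f:
    f.write(original)
parts = split_and_gzip(demo_src, os.path.join(demo_dir, "chunks"), max_bytes=2000)
roundtrip = b"".join(read_gz(p) for p in parts)
print(len(parts), "chunks; round trip OK:", roundtrip == original)
```

Splitting on line boundaries keeps every chunk a valid CSV file on its own, so each one can be loaded independently.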

[Update, Aug 20th 2012: If you are interested in the mechanics behind BigQuery, search for the “Google Dremel whitepaper”. It’s an amazing read.]

 


4 thoughts on “First Impression: Google’s BigData offering called BigQuery”

  1. I don’t know if you actually tried uploading data to Google’s Cloud, then ingested this staged data into BigQuery, but it’s not really as potentially prohibitive as you imply. You can do both steps using a batch process. As a test I once ingested about 350 Gb of CSV data (split into 10gb raw files, then I gzipped each one into ~1Gb). I ingested the entire batch using the bq command line tool, and had the entire dataset in BigQuery in just a few hours. I agree that it’s not 100% trivial to move 300 Gb of data from a local cluster into Google’s cloud – but it’s not really that difficult.


    • You’re right. I have not ingested data into BigQuery; I stated it based on my experience uploading datasets to AWS and Azure. And yes, it’s not difficult, it’s very easy. It’s just that uploading data takes time. Let me update the blog post to make that clear :) I am sorry if I implied that it is difficult. That was not my intention.

