Presented at #sqlpass summit 2015.
I was at the HP Big Data Conference last week, and I heard something during the keynote that’s worth sharing with you.
As data and analytics professionals, we spend a lot of our time finding insights, trends and patterns in data, but the keynote speaker (Ken Rudin, Facebook) encouraged everyone to take that a step further: think about driving impact based on those insights. It’s a simple yet powerful idea! Over the past few months, I have started working closely with decision makers and helping drive impact rather than just “handing off” insights.
Don't strive for actionable insights but focus on taking it to next level: drive impact – Ken Rudin #HPBigData2015
— Paras Doshi (@paras_doshi) August 12, 2015
I hope that helps! Just wanted to share that with you. What do you think?
R is a popular tool among data scientists because it’s just like a Swiss Army knife (or maybe more!) for them!
Analogy credit: Tapping the Data Deluge with R by Jeffrey Breen
Some time back I worked on a research project that involved writing some R code. We were searching for tools and ways to pull data from multiple social networks, perform text analysis and create effective data visualizations. R seemed like a great tool, so I was looking for a book or guide that would teach me the fundamentals I needed to get a few R-related things done. One of the books that I used often during the research project was “R in a Nutshell”. I didn’t read it cover to cover, but it was a great reference book for me. I would read online guides and other books, and then combine that with information from this book to get stuff done. The section I liked the most was on data visualization, which included some great code snippets for creating effective data visualizations using the ggplot2 library. I used to take code snippets from this book and apply them to the data sets that I had.
Also, I liked that the book has some end-to-end examples that cover the entire life cycle of data analysis and statistical analysis.
I recommend this book as a “reference” for someone who has started working with R.
I received a copy of this book as part of O’Reilly’s Blogger program. Thanks, O’Reilly! If you are a blogger, you should check out that program!
Think of a “continuum” as something you start and never stop improving upon. In my mind, the Business Analytics Continuum is a continuous investment of resources to take business analytics capabilities to the next level. So what are these levels? Douglas McDowell explained this concept in a recent post here. It was great food for thought for me, and hence I am posting about this concept here.
Here is the visual representation of the concept:
And I would encourage you to read the entire post and other posts in the series here: PASS BAC Preview Series: Business Analytics Defined
— Paras Doshi (@paras_doshi) January 30, 2013
And now, the 12 one-hour sessions, ranging from data visualization and predictive analytics to Big Data, are online for you to watch! They also serve as a “trailer” for what you can expect at the PASS Business Analytics conference!
Here’s the URL: http://passbaconference.com/Sessions/SneakPeeks.aspx
— 24hop (@pass24hop) February 8, 2013
And I was following some of these sessions live on the event day – and I can tell you, these sessions are great resources!
Also, I participated in the Twitter contest (by Microsoft BI) that was happening along with the event, and this is what I got for my win!
That’s about it for this post. Enjoy the recordings!
This is a quick post: I just want to share a command to upload local data to HDFS using the Hadoop Command Line.
The command looks like:
> hadoop fs -copyFromLocal input.txt input/SqrtJob/input.txt
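If you want to script this upload instead of typing it, here is a minimal Python sketch. It only assembles the same `hadoop fs -copyFromLocal` command shown above and prints it; it runs the command only if you turn off the dry run. The function names (`build_copy_from_local`, `upload`) are hypothetical helpers made up for this illustration, not part of Hadoop.

```python
import subprocess


def build_copy_from_local(local_path, hdfs_path):
    """Return the argument list for `hadoop fs -copyFromLocal`."""
    return ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_path]


def upload(local_path, hdfs_path, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    args = build_copy_from_local(local_path, hdfs_path)
    print(" ".join(args))
    if not dry_run:
        # Assumption: `hadoop` is on the PATH. On Windows the launcher is
        # a .cmd script, so you may need shell=True or the full path.
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(args, check=True)
    return args


# Same file and destination as the command above (dry run only).
upload("input.txt", "input/SqrtJob/input.txt")
```

Keeping the dry run as the default lets you check the exact command before anything is copied.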
- Hadoop on Windows: How to Browse the Hadoop Filesystem? (parasdoshi.com)
- Microsoft HDInsight Preview for Windows: How to create a directory in Hadoop File System? (parasdoshi.com)
- Microsoft HDInsight Preview for Windows: How to use Sqoop to load data into HDFS from SQL Server? (parasdoshi.com)
- There’s been a growing interest in Hadoop & Big Data, Here’s the Proof: (parasdoshi.com)
- Five big data predictions for 2013 (strata.oreilly.com)
Download Link Here:
(if you need the .ppt version of this talk, please contact me via http://parasdoshi.com/contact/)
In this post, I want to point out that HDInsight (Hadoop on Windows) comes with sample datasets (log files) that you can load as follows:
1. Open the Hadoop Command Line and navigate to c:\Hadoop\GettingStarted
2. Execute the following command:
powershell -ExecutionPolicy unrestricted -F importdata.ps1 w3c
After you have successfully executed the command, you can see the sample files in the /w3c/input folder:
Conclusion: In this post, we saw how to load some data into the Hadoop on Windows file system to get started. Your comments are very welcome.
Official Resource: http://gettingstarted.hadooponazure.com/loadingData.html
In this post, we’ll see how to use Sqoop to load data into HDFS from SQL Server.
With that, here are the steps:
1. Make sure you have the Microsoft® HDInsight Preview for Windows installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7
2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine
Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!
3. Before we begin, decide on three things:
3a. The username and password that Sqoop will use to log in to the SQL Server database. If you create a new username and password, test it via SSMS before you proceed.
3b. The table that you want to load into HDFS.
In my case, it’s this table:
3c. The target directory in HDFS where the data will be loaded. In my case, it’s /user/data/sqoopstudent1. You can create it with the command: hadoop fs -mkdir /user/data/sqoopstudent1
[to learn about how to create directory, read: How to create a directory in Hadoop File System? ]
4. Now let’s start the Hadoop Command Line (can you see the icon on the Desktop? Yes? Great! Open that!)
5. Navigate to: c:\Hadoop\sqoop-1.4.2\bin>
*This path may change in future, but navigate to the bin folder under the SQOOP_HOME.
6. Run the dir command to see the various files under this directory.
You can also run sqoop help for more information on the command that we are about to run.
7. Now here’s the command to Load data from SQL Server to HDFS:
c:\Hadoop\sqoop-1.4.2\bin>sqoop import --connect "jdbc:sqlserver://localhost;database=UniversityDB;username=sqoop;password=**********" --table student --target-dir /user/data/sqoopstudent1 -m 1
8. After successfully running the above command, let’s browse the file in HDFS!
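The import command from step 7 has several parts that are easy to mistype: the JDBC connection string, the table, the target directory and the number of map tasks. Here is a minimal Python sketch that assembles those parts into the same command. The helper names (`build_jdbc_url`, `build_sqoop_import`) and the placeholder password are assumptions made up for this illustration; the option names are the ones used in step 7.

```python
def build_jdbc_url(host, database, username, password):
    """Build a SQL Server JDBC URL in the same shape as in step 7."""
    return (
        f"jdbc:sqlserver://{host};database={database};"
        f"username={username};password={password}"
    )


def build_sqoop_import(jdbc_url, table, target_dir, mappers=1):
    """Return the argument list for `sqoop import`."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # -m sets the number of map tasks; 1 means a single task.
        "-m", str(mappers),
    ]


# Placeholder password, hypothetical: replace it with your own.
url = build_jdbc_url("localhost", "UniversityDB", "sqoop", "CHANGE_ME")
command = build_sqoop_import(url, "student", "/user/data/sqoopstudent1")

# Print the command without exposing the password.
print(" ".join(command).replace("CHANGE_ME", "**********"))
```

With -m 1 Sqoop uses a single map task, so it does not need a column to split the work on. That keeps things simple for a small table like this one.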
That’s about it for this post!
Thanks to Aviad Ezra, who answered my question on this MSDN thread: An error while trying to use Sqoop on HDInsight to import data from SQL server to HDFS
In this post, we saw how to load data into Hadoop from SQL Server using Sqoop (SQL-to-Hadoop).
- How to Load Twitter data into Hadoop on Azure cluster and then analyze it via Hive add-in for excel?
- Visualizing MapReduce Algorithm with WordCount Example:
- End to End Demo: Hadoop (HDInsight) + Hive + Excel + Power View + Azure Data Market
- Microsoft® HDInsight Preview for Windows: How to create a directory in Hadoop File System?