Just as AWS has Elastic MapReduce with a pre-installed Hadoop ecosystem, and Google has Apache Hadoop with the bdutil tool to deploy and configure it, Azure has recently launched HDInsight, its own Hadoop provisioning service on the Azure cloud. It is built on the Hortonworks Data Platform (HDP) distribution of Hadoop.
Let’s get started and explore the capabilities it offers.
1. Log in to the manage.windowsazure.com portal, click the HDInsight service, and then click CREATE AN HDINSIGHT CLUSTER.
2. Select Hadoop and provide details such as the cluster name, the password, and the number of Hadoop nodes you want.
3. If you don’t have Azure PowerShell, you can download it from here.
If you are using PowerShell for the first time, you will need to give it your Azure account credentials using this command:
Next, tell PowerShell the name of the HDInsight cluster you just created, using this command:
HDInsight ships with some sample data, and the PowerShell CLI has its own cmdlets for running Hadoop-related commands. Here is an example you can try:
Invoke-Hive "select country, state, count(*) as records from hivesampletable group by country, state order by records desc limit 5;"
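The two commands referred to in step 3 are not reproduced in this post. As a rough sketch, and assuming the classic (Service Management) Azure PowerShell module that was current when HDInsight launched, the whole session probably looks something like the following. Add-AzureAccount and Use-AzureHDInsightCluster are my best guess at the cmdlets meant here, and the cluster name is a made-up placeholder, so check both against your installed module before relying on them.

```powershell
# Sketch only: assumes the classic (Service Management) Azure PowerShell module.
# "my-hdinsight-cluster" is a hypothetical placeholder; substitute your own cluster name.
$clusterName = "my-hdinsight-cluster"

# Assumed cmdlet: opens a sign-in prompt and stores your Azure account credentials.
Add-AzureAccount

# Assumed cmdlet: makes the named cluster the target of later Invoke-Hive calls.
Use-AzureHDInsightCluster $clusterName

# The sample query from this post: the five country/state pairs with the most rows
# in hivesampletable.
Invoke-Hive "select country, state, count(*) as records from hivesampletable group by country, state order by records desc limit 5;"
```

Keeping the cluster name in a variable means you only change one line when you point the same session at a different cluster.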
4. On the HDInsight screen in manage.windowsazure.com, you will see a button named “Query console”. Click it and enter your HDInsight username and password.
There are some pre-built solutions, such as Twitter trend analysis, that run on sample data provided by HDInsight. They are pretty simple: just follow the instructions that come with each one, and you will quickly learn how to use HDInsight for your own application's needs.
The HDInsight service launched recently and still needs a lot of improvement. Even so, it is very easy to use, and it has taken away my pain of setting up a Hadoop cluster on my laptop or a multi-node cluster on virtual machines. I am confident that HDInsight has a bright future. Good job, Microsoft.