Thursday, October 31, 2013

Getting a portable Hadoop environment

Before we start learning the individual components of the Hadoop ecosystem, it is good to have a portable Hadoop environment. There are various options available, but after trying some of them, I find the Hortonworks Sandbox virtual machine image for Oracle VirtualBox the easiest way to get started with the Hadoop ecosystem.
Everything you need is freely available and fairly simple to set up. Follow these steps if you are new to downloading and installing software.
  1. Download Oracle VirtualBox from here.
I have Microsoft Windows 7 on my laptop, so I need VirtualBox for Windows hosts, as shown in the image below.
 
 
 
  1. In the previous step you downloaded the VirtualBox-4.3.0-89960-Win.exe file. Now run it and follow the on-screen instructions. In fact, you just need to click Next three times, Yes once, Install once and finally Finish. That’s it: you now have powerful virtualization software on your laptop.
  2. After installation, Oracle VirtualBox should start automatically and look like the image shown below. You can close it now; when you need it again, you can find it in your Start menu.

  1. After installing Oracle VirtualBox on Microsoft Windows 7, I found that a new network connection had been set up. This network prevented me from connecting to the internet with my data card. If you face a similar problem, you need to disable this network; the instructions below show how.
Open Control Panel->Network and Internet->Network and Sharing Center->Change Adapter Settings. You should see a network connection like the one below. Right-click it and choose Disable. You may have to reconnect to the internet afterwards.
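For readers comfortable with a command prompt, Windows' built-in `netsh` tool can disable the same adapter. The Python sketch below only assembles and prints the command unless you ask it to run; it assumes the adapter carries VirtualBox's default name, `VirtualBox Host-Only Network` (check the real name in Change Adapter Settings first), and actually executing it needs an Administrator prompt.

```python
import subprocess
from typing import List

# Assumption: VirtualBox's default host-only adapter name; yours may differ.
ADAPTER_NAME = "VirtualBox Host-Only Network"


def disable_adapter_cmd(name: str = ADAPTER_NAME) -> List[str]:
    """Build the netsh command that disables the named network adapter."""
    return ["netsh", "interface", "set", "interface", name, "admin=disabled"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires Administrator rights


if __name__ == "__main__":
    run(disable_adapter_cmd())
```

To turn the adapter back on later, the same command with `admin=enabled` should do it.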

  1. Now you are ready to set up a virtual machine in Oracle VirtualBox. For our purpose, you need to download the Hortonworks Sandbox appliance from here.



  1. You have now downloaded the Hortonworks+Sandbox+2.0+VirtualBox.ova file.
  2. Start Oracle VirtualBox and select File->Import Appliance, as shown in the image below.




  1. A dialog box appears. Click the browse button and select the Hortonworks+Sandbox+2.0+VirtualBox.ova file you downloaded from the Hortonworks website.



  1. Click the Next button, and then the Import button when the next dialog box appears. The virtual machine import will start and complete in a few minutes.
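The same import can be scripted with `VBoxManage`, the command-line front end that ships with VirtualBox. This is a minimal sketch under two assumptions you may need to adjust: `VBoxManage` is on your PATH (on Windows it lives in the VirtualBox installation folder, which is usually not on PATH by default), and the `.ova` file was saved to your Downloads folder.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumption: the appliance was saved to the current user's Downloads folder.
DEFAULT_OVA = Path.home() / "Downloads" / "Hortonworks+Sandbox+2.0+VirtualBox.ova"


def import_appliance_cmd(ova: Path) -> List[str]:
    """Build the VBoxManage equivalent of File->Import Appliance."""
    return ["VBoxManage", "import", str(ova)]


if __name__ == "__main__":
    cmd = import_appliance_cmd(DEFAULT_OVA)
    print(" ".join(cmd))
    # Only perform the import when the script is called with --run.
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
```

Printing first and running only on request keeps a mistyped path from starting a long import by accident.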


  1. After completion, you will see Hortonworks Sandbox 2.0 listed in Oracle VirtualBox, as shown below.
  2. You will notice that the virtual machine is configured with 2048 MB of memory. This means the guest OS, once started, will consume 2 GB of your RAM. If your laptop has 3 GB of RAM or more, you don’t need to worry. If it has less than that, click the System button just above and reduce the memory to 1 GB.
  3. To start the virtual machine, click the Start button. Once it boots, your portable Hadoop environment is running, presented as a familiar Linux terminal interface.
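The memory rule of thumb and the Start button can also be expressed with `VBoxManage`: `modifyvm --memory` changes the base memory (only while the VM is powered off) and `startvm` boots it. This sketch assumes the VM is registered under the name `Hortonworks Sandbox 2.0` (run `VBoxManage list vms` to confirm) and simply encodes the 3 GB threshold described above; `choose_memory_mb` is a hypothetical helper, not part of VirtualBox.

```python
import subprocess
import sys
from typing import List

# Assumption: name under which the appliance was registered after import.
VM_NAME = "Hortonworks Sandbox 2.0"


def choose_memory_mb(host_ram_gb: float) -> int:
    """Keep the default 2048 MB on hosts with 3 GB+ RAM, otherwise drop to 1024 MB."""
    return 2048 if host_ram_gb >= 3 else 1024


def set_memory_cmd(vm_name: str, memory_mb: int) -> List[str]:
    """VBoxManage equivalent of System -> Base Memory; the VM must be powered off."""
    return ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)]


def start_vm_cmd(vm_name: str, headless: bool = False) -> List[str]:
    """VBoxManage equivalent of the Start button; headless runs without a window."""
    cmd = ["VBoxManage", "startvm", vm_name]
    if headless:
        cmd += ["--type", "headless"]
    return cmd


if __name__ == "__main__":
    # Example: a laptop with 2 GB of RAM gets a 1024 MB guest.
    cmds = [set_memory_cmd(VM_NAME, choose_memory_mb(2)), start_vm_cmd(VM_NAME)]
    for cmd in cmds:
        print(" ".join(cmd))
        if "--run" in sys.argv[1:]:
            subprocess.run(cmd, check=True)
```

A headless start can be handy once you connect to the sandbox over SSH instead of the console window.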


  1. To shut down the virtual machine, click ACPI Shutdown as shown in the screen below.
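ACPI Shutdown sends the guest the same signal as pressing a physical power button, so the operating system can shut down cleanly. From a script, `VBoxManage controlvm ... acpipowerbutton` does the same. The sketch below again assumes the VM is registered as `Hortonworks Sandbox 2.0` and that `VBoxManage` is on your PATH.

```python
import subprocess
import sys
from typing import List

# Assumption: name under which the appliance was registered after import.
VM_NAME = "Hortonworks Sandbox 2.0"


def acpi_shutdown_cmd(vm_name: str) -> List[str]:
    """VBoxManage equivalent of ACPI Shutdown: a virtual power-button press."""
    return ["VBoxManage", "controlvm", vm_name, "acpipowerbutton"]


if __name__ == "__main__":
    cmd = acpi_shutdown_cmd(VM_NAME)
    print(" ".join(cmd))
    # Only send the signal when the script is called with --run.
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
```

Unlike `controlvm ... poweroff`, which cuts power immediately, the ACPI signal gives the guest's services a chance to stop gracefully.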