Wednesday, February 9, 2011

Hadoop/Mahout - setting up a development environment

This post explains how to set up a development environment for Hadoop and Mahout.

Prerequisites: you need the Mahout and Hadoop sources (see previous posts).

On a development machine
1) Download the Helios release of Eclipse, for example eclipse-java-helios-SR1-linux-gtk-x86_64.tar.gz,
and save it locally. Extract the archive using:
 tar xvzf *.gz

2) Install the Map/Reduce Eclipse plugin:
cd eclipse/plugins/
wget https://issues.apache.org/jira/secure/attachment/12460491/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar

3) Follow the directions at
http://m2eclipse.sonatype.org/installing-m2eclipse.html
to install the Maven plugin (m2eclipse) in Eclipse.

4) In Eclipse, choose File -> Import -> Maven project, select the Mahout root directory, and click Finish.
You will see a list of all subprojects. Press OK and wait for compilation to finish.
If everything went smoothly, all subprojects should compile without errors.
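
If the import in step 4 fails, it can help to confirm first that the sources build outside Eclipse. Below is a minimal sketch, assuming Maven (mvn) is installed and on your PATH. The function name build_mahout and its argument are hypothetical names chosen for this example.

```shell
# Build a Maven checkout from the command line.
# $1: root directory of the Mahout sources (hypothetical argument).
build_mahout() {
  dir="$1"
  # The root of a Maven project contains a pom.xml; bail out early if it is missing.
  if [ ! -f "$dir/pom.xml" ]; then
    echo "no pom.xml found in $dir" >&2
    return 1
  fi
  # Skip the unit tests so the first build finishes sooner.
  (cd "$dir" && mvn -DskipTests clean install)
}
```

Call it with the directory that holds your checkout, for example build_mahout ~/mahout (the path is only an example). If this build fails as well, the problem is in the sources or the Maven setup rather than in Eclipse.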

5) Select the Map/Reduce view -> Map/Reduce Locations tab -> Edit Hadoop locations.
In the General tab, add a location name (just a name to identify this configuration), then the host and
port for the Map/Reduce master (default port 50030 using the EC2 configuration described in previous posts) and the DFS master (default port 50070) -> Finish
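
If Eclipse cannot connect to the location defined in step 5, a quick check is whether the two ports are reachable from the development machine at all. The sketch below assumes bash and the coreutils timeout command are available (as on most Linux systems). The function names port_open and check_hadoop_ports are made up for this example, and the port numbers are simply the ones quoted in step 5.

```shell
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's /dev/tcp pseudo-device; timeout keeps a filtered port
# from hanging the check.
# $1: host name or IP address, $2: port number.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print the status of the two ports from step 5 for the given host.
# $1: host name or IP address of the Hadoop master.
check_hadoop_ports() {
  for p in 50030 50070; do
    if port_open "$1" "$p"; then
      echo "$1:$p reachable"
    else
      echo "$1:$p NOT reachable"
    fi
  done
}
```

Run check_hadoop_ports with the public host name of your master node. On EC2, a port reported as not reachable often just means that the security group does not allow it from your machine.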
