Adventures in Web Development: Hadoop


Saturday, November 2, 2013

Hadoop 2.x : jar file location for wordcount example

The jar files in Hadoop 2.x have moved from where they were in Hadoop 1.x. I found that the following command

javac -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar -d wordcount_classes myWordCount.java

will allow you to compile the standard wordcount example code. The common jars are in $HADOOP_HOME/share/hadoop/common/, the mapreduce jars are in $HADOOP_HOME/share/hadoop/mapreduce/, and the common library jars are in $HADOOP_HOME/share/hadoop/common/lib/.

This post is in answer to this stackoverflow question:

http://stackoverflow.com/questions/19488894/compile-hadoop-2-2-0-job

(or set your classpath like this

export CLASSPATH=$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

and compile like this:

javac -classpath $CLASSPATH -d myWordCountClasses myWordCount.java

)
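The version number appears several times in the classpath above, so it can be easier to build the string from variables. This is a minimal sketch, assuming a POSIX shell. The function name wordcount_classpath and its arguments are hypothetical helpers, not part of Hadoop, and the commons-cli version (1.2) is the one used in the 2.2.0 command above and may differ in other releases.

```shell
# Hypothetical helper (not part of Hadoop): print the compile classpath
# for the wordcount example.
#   $1 = Hadoop install directory, $2 = Hadoop version (e.g. 2.2.0)
# Assumption: commons-cli is at version 1.2, as in the 2.2.0 command above.
wordcount_classpath() {
    home=$1
    version=$2
    printf '%s:%s:%s\n' \
        "$home/share/hadoop/common/hadoop-common-$version.jar" \
        "$home/share/hadoop/mapreduce/hadoop-mapreduce-client-core-$version.jar" \
        "$home/share/hadoop/common/lib/commons-cli-1.2.jar"
}

# Example use (some javac versions need the -d directory to exist already):
#   mkdir -p wordcount_classes
#   javac -classpath "$(wordcount_classpath "$HADOOP_HOME" 2.2.0)" \
#         -d wordcount_classes myWordCount.java
```

Only the version argument should need to change when moving to another 2.x release, provided the jar layout stays the same.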

Wednesday, October 30, 2013

Hadoop 2 on Ubuntu on Azure.

This is to be read in conjunction with http://ac31004.blogspot.co.uk/2013/10/installing-hadoop-2-on-mac_29.html

Fire up an Azure Ubuntu server and ssh to it.

Install a Java JDK (run this as root, or prefix it with sudo):
apt-get install default-jdk

On your home machine, download a copy of Hadoop and secure copy it to the Azure machine (your username and machine name will be different):
scp hadoop-2.2.0.tar.gz user@Hadoopmachine.cloudapp.net:

Unzip it and untar it
gunzip hadoop-2.2.0.tar.gz
tar xvf hadoop-2.2.0.tar

You'll still need to set up the env variables
export JAVA_HOME=/usr/lib/jvm/default-java
export HADOOP_INSTALL=/home/user/hadoop-2.2.0
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

Also add JAVA_HOME and HADOOP_INSTALL, and change the path, in /etc/environment. See http://trentrichardson.com/2010/02/10/how-to-set-java_home-in-ubuntu/ for details.
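If you would rather keep the exports in a per-user file, one option is to append them to a shell startup file. The sketch below assumes a POSIX shell and that ~/.profile is read at login on your machine. append_once is a hypothetical helper name, not a standard command. It only adds a line if that exact line is not already in the file, so running the setup twice does not duplicate entries.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
#   $1 = line to add, $2 = file to add it to
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches the whole line, -F treats it as a fixed string, -q is silent
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use, assuming ~/.profile is read by your login shell:
#   append_once 'export JAVA_HOME=/usr/lib/jvm/default-java' "$HOME/.profile"
#   append_once 'export HADOOP_INSTALL=/home/user/hadoop-2.2.0' "$HOME/.profile"
#   append_once 'export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin' "$HOME/.profile"
```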

After setting up core-site.xml and hdfs-site.xml you'll need to make the namenode and datanode directories (these must match the paths you put in hdfs-site.xml):

mkdir -p /home/hadoop/yarn/namenode
mkdir /home/hadoop/yarn/datanode

Everything else should be the same.

Tuesday, October 29, 2013

Installing Hadoop 2 on a Mac

I've had a lot of trouble getting Hadoop 2 and YARN running on my Mac. There are some tutorials out there but they are often for beta and alpha versions of the Hadoop 2.0 family. These are the steps I used to get Hadoop 2.2.0 working on my Mac running OS X 10.9.

Note: watch for version differences in this blog. It was written for Hadoop 2.2.0. We are currently on 2.6.2, so the version number will need to be changed throughout.

Get hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/

Make sure JAVA_HOME is set. If you have Java 6 on your machine:
export JAVA_HOME=`/usr/libexec/java_home -v1.6`
(Note: your Java version should be 1.7 or 1.8, in which case change 1.6 in the command to match.)

point HADOOP_INSTALL to the hadoop installation directory
export HADOOP_INSTALL=/Applications/hadoop-2.2.0

And set the path
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

You can test that hadoop is found with
hadoop version

make sure ssh is set up on your machine:
system preferences -> sharing -> remote login is ticked

try:
ssh <username>@localhost

where <username> is the name you used to log on.

In $HADOOP_INSTALL/etc/hadoop these are the conf files I changed.

core-site.xml

 <configuration>  
 <property>  
   <name>fs.default.name</name>  
   <value>hdfs://localhost:9000</value>  
  </property>  
 </configuration>

hdfs-site.xml

 <configuration>  
 <property>  
   <name>dfs.replication</name>  
   <value>1</value>  
  </property>  
  <property>  
   <name>dfs.namenode.name.dir</name>  
   <value>file:/Users/Administrator/hadoop/namenode</value>  
  </property>  
  <property>  
   <name>dfs.datanode.data.dir</name>  
   <value>file:/Users/Administrator/hadoop/datanode</value>  
  </property>  
 </configuration>

Make the directories for the namenode and datanode data (note the file above and the mkdir below will need to reflect where you want to store the files, I've stored mine in the home directory of the Administrator user on my Mac).

mkdir -p /Users/Administrator/hadoop/namenode
mkdir -p /Users/Administrator/hadoop/datanode
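Because the paths in hdfs-site.xml and the mkdir commands have to agree, one way to keep them in step is to generate both from a single base directory. This is only a sketch, assuming a POSIX shell. hdfs_site_xml is a hypothetical helper name, and the property names and values are the ones shown in the file above.

```shell
# Hypothetical helper: print an hdfs-site.xml whose storage paths live
# under one base directory.
#   $1 = base directory (absolute path)
hdfs_site_xml() {
    base=$1
    cat <<EOF
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$base/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$base/datanode</value>
  </property>
</configuration>
EOF
}

# Example use, assuming the conf files are in $HADOOP_INSTALL/etc/hadoop:
#   base=/Users/Administrator/hadoop
#   mkdir -p "$base/namenode" "$base/datanode"
#   hdfs_site_xml "$base" > "$HADOOP_INSTALL/etc/hadoop/hdfs-site.xml"
```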

hadoop namenode -format

yarn-site.xml

 <configuration>  
 <!-- Site specific YARN configuration properties -->  
 <property>  
 <name>yarn.resourcemanager.address</name>  
 <value>localhost:8032</value>  
 </property>  
 <property>  
 <name>yarn.nodemanager.aux-services</name>  
 <value>mapreduce_shuffle</value>  
 </property>  
 </configuration>

start-dfs.sh
start-yarn.sh
jps

should give
9430 ResourceManager
9325 SecondaryNameNode
9513 NodeManager
9225 DataNode
9916 Jps
9140 NameNode

If not, check the log files. If the datanode has not started and you get an incompatible IDs error, stop everything, delete the datanode directory and recreate it.
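Comparing the jps listing against the five expected daemons by eye is easy to get wrong, so the check can be sketched as a small filter. This assumes a POSIX shell with awk. missing_daemons is a hypothetical helper name, and the daemon names are the ones in the listing above.

```shell
# Hypothetical helper: read jps output on stdin and print the name of each
# expected daemon that is missing. Prints nothing if all five are present.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # NameNode does not match SecondaryNameNode
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || printf '%s\n' "$daemon"
    done
}

# Example use:
#   jps | missing_daemons
```

Any name it prints is a daemon whose log file is worth looking at first.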

Try an ls:
hadoop fs -ls

if you get

ls: `.': No such file or directory

then there is no home directory in the hadoop file system. So

hadoop fs -mkdir /user
hadoop fs -mkdir /user/<username>
where <username> is the name you are logged onto the machine with.

now change to $HADOOP_INSTALL directory and upload a file

hadoop fs -put LICENSE.txt

finally try a mapreduce job:

cd share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount LICENSE.txt out
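To sanity-check the result, it can help to compute the same counts locally and compare a few lines. The sketch below assumes a POSIX shell with awk and that the example job splits its input on whitespace. local_wordcount is a hypothetical helper name, and the part-r-00000 file name in the usage comment assumes the job ran with a single reducer.

```shell
# Hypothetical helper: count whitespace-separated tokens on stdin and print
# "word<TAB>count" lines sorted by word in byte order.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' | LC_ALL=C sort
}

# Example use, from $HADOOP_INSTALL:
#   local_wordcount < LICENSE.txt | head
#   hadoop fs -cat out/part-r-00000 | head
```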
