Tuesday, October 29, 2013

Installing Hadoop 2 on a Mac

I've had a lot of trouble getting Hadoop 2 and yarn 2 running on my MAC.  There are some tutorials out there but they are often for
beta and alpha versions of the hadoop 2.0 family.  These are the steps I used to get Hadoop 2.2.0 working on my MAC running OSX 10.9


Get hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/

make sure JAVA_HOME is set (if you have Java 6 on your machine):
export JAVA_HOME=`/usr/libexec/java_home -v1.6`

point HADOOP_INSTALL to the hadoop installation directory
export HADOOP_INSTALL=/Applications/hadoop-2.2.0

And set the path
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

You can test hadoop is found with
hadoop -version

make sure ssh is set up on your machine:
system preferences -> sharing -> remote login is ticked

try:
ssh @localhost

where is the name you used to logon.

in $HADOOP_INSTALL/etc these are the conf files I changed.

core-site.xml

 <configuration>  
 <property>  
   <name>fs.default.name</name>  
   <value>hdfs://localhost:9000</value>  
  </property>  
 </configuration>  


hdfs-site.xml

 <configuration>  
 <property>  
   <name>dfs.replication</name>  
   <value>1</value>  
  </property>  
  <property>  
   <name>dfs.namenode.name.dir</name>  
   <value>file:/Users/Administrator/hadoop/namenode</value>  
  </property>  
  <property>  
   <name>dfs.datanode.data.dir</name>  
   <value>file:/Users/Administrator/hadoop/datanode</value>  
  </property>  
 </configuration>  


Make the directories for the namenode and datanode data (note the file above and the mkdir below will need to reflect where you  want to store the files, I've stored mine in the home directory of the Administrator user on my Mac).

mkdir -p /Users/Administrator/hadoop/namenode
mkdir -p /Users/Administrator/hadoop/datanode

hadoop namenode -format

yarn-site.xml
 <configuration>  
 <!-- Site specific YARN configuration properties -->  
 <property>  
 <name>yarn.resourcemanager.address</name>  
 <value>localhost:8032</value>  
 </property>  
 <property>  
 <name>yarn.nodemanager-aux-services</name>  
 <value>madpreduce.shuffle</value>  
 </property>  
 </configuration>  


start-dfs.sh
start-yarn.sh
jps

should give
9430 ResourceManager
9325 SecondaryNameNode
9513 NodeManager
9225 DataNode
9916 Jps
9140 NameNode

if not check log files.  If data node note started and  you get incompatible id's error, stop everything delete datanode directory and recreate
datanode directory

try  a ls
hadoop fs -ls

if you get

ls: `.': No such file or directory

then there is no home directory in the hadoop file system.  So

hadoop fs -mkdir /user
hadoop fs -mkdir /user/<username>
where is the name you are logged onto the machine with.

now change to $HADOOP_INSTALL directory and upload a file

hadoop fs -put LICENSE.txt


finally try a mapreduce job:

cd share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.2.0 wordcount LICENSE.txt out

14 comments:

  1. Thanks for the great post. It really helped me get started. Since I ran into a few problems while following your directions, I thought I'd post the problem and solutions here in case they are useful to anyone else.


    Problem 1
    ------------
    When executing 'hadoop version', I would get a error. I apologize that I didn't capture the exact error syntax but the gist was that the hadoop was complaining about the location of java_home.

    Solution 1
    ------------
    Instead of using

    export JAVA_HOME=`/usr/libexec/java_home -v1.6`

    I added the following to my .bash_profile file:

    export JAVA_HOME="$(/usr/libexec/java_home)"



    Problem 2
    ------------
    When I perform an operation on the file system, I'd get errors that read "Unable to load realm info from SCDynamicStore".

    Solution 2
    ------------
    I added the following line to the bottom of the hadoop-env.sh file:

    export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

    I also added the following to the yarn-env.sh file:

    YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"



    Hope this helps!

    ReplyDelete
    Replies
    1. Thanks for your solutions. I recon problem 1 is because the syntax I had explicitly sets the java version to 1.6 which might not have been installed on your system. Thanks for the answer to problem 2

      Delete
  2. This is really essential information for web developers who are in the beginning stage of website developing.
    Web Designing Companies India | Web Development Companies

    ReplyDelete
  3. Thanks for your post!

    I have been trying to configure it this whole Sunday afternoon, but whatever tutorial I try, I keep getting the following error:

    ======

    13/12/01 18:44:32 INFO mapreduce.Job: Job job_1385919832889_0001 failed with state FAILED due to: Application application_1385919832889_0001 failed 2 times due to AM Container for appattempt_1385919832889_0001_000002 exited with exitCode: 127 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)


    .Failing this attempt.. Failing the application.
    13/12/01 18:44:32 INFO mapreduce.Job: Counters: 0
    =====


    Any ideas?

    ReplyDelete
    Replies
    1. So I get this error when trying to run the wordcount example.

      Delete
    2. Solved it. I had some leftovers from other tutorials in my config files. Make sure to only make the changes in this tutorial fellow mac users! :)

      Delete
    3. Tackled it. I had a few remains from different excercises in my config indexes. Make a point to just make the progressions in this excercise individual mac clients! :)

      http://www.theweb77.com/simple-website/

      Delete
    4. Hadoop is a open source framework which is written in java by apche
      software foundation.Hadoop Tutorial

      Delete
  4. Awesome! Worked perfectly for me! Thanks!

    ReplyDelete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
  6. Much obliged concerning your answers. I recon issue 1 is on the grounds that the punctuation I had unequivocally sets the java form to 1.6 which may not have been introduced on your framework. A debt of gratitude is in order regarding the reply to issue.



    best website design//Mobile Apps N Webs Development

    ReplyDelete
  7. Thanks for great article. All worked for me - and this was the first time I tried to get Hadoop up and running.

    ReplyDelete
  8. This website is very helpful for the students who need info about the Hadoop courses.i appreciate for your post. thanks for shearing it with us. keep it up.
    Hadoop Training in hyderabad

    ReplyDelete
  9. I hope this information of installing process would be as best reference to install the hadoop for the people.I really grateful to this blog for updating useful things.
    Web Designing Companies Bangalore | Website Development Company Bangalore

    ReplyDelete