Tuesday, October 29, 2013

Installing Hadoop 2 on a Mac

I've had a lot of trouble getting Hadoop 2 and YARN running on my Mac.  There are some tutorials out there, but they are often for
alpha and beta versions of the Hadoop 2.0 family.  These are the steps I used to get Hadoop 2.2.0 working on my Mac running OS X 10.9.

Note:  watch for version differences in this blog.  It was written for Hadoop 2.2.0; we are currently on 2.6.2, so the version number will need to be changed throughout.

Get hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/

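make sure JAVA_HOME is set. The -v flag picks a specific Java version; the line below selects Java 6, which is what I had when this was written:
export JAVA_HOME=`/usr/libexec/java_home -v1.6`
(Note: your Java version should now be 1.7 or 1.8, so change the -v value to match, or drop -v to use the default JDK.)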

point HADOOP_INSTALL to the hadoop installation directory
export HADOOP_INSTALL=/Applications/hadoop-2.2.0

And set the path
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
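
The exports above only last for the current terminal session. Here is a rough sketch for making them permanent, assuming a bash login shell that reads ~/.bash_profile. The helper names append_once and persist_hadoop_env are made up for this example, the install path is the one used above, and the JAVA_HOME line leaves out -v so it follows whatever default JDK is installed (pin a version if you need one).

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already there.
# (Hypothetical helper, not part of Hadoop or OS X.)
append_once() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# persist_hadoop_env PROFILE_FILE: write the three exports from this post
# into PROFILE_FILE. Safe to run more than once. (Hypothetical helper.)
persist_hadoop_env() {
  profile=$1
  # Single quotes keep $(...) and $PATH literal so they are evaluated at login.
  append_once "$profile" 'export JAVA_HOME="$(/usr/libexec/java_home)"'
  append_once "$profile" 'export HADOOP_INSTALL=/Applications/hadoop-2.2.0'
  append_once "$profile" 'export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin'
}
```

Call persist_hadoop_env "$HOME/.bash_profile" once and open a new terminal to pick up the settings.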

You can test that hadoop is found with
hadoop version
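
If that fails, it helps to see which piece is missing. A small sketch, assuming the variables above are exported in the current shell; report_tool is a hypothetical helper, not something Hadoop ships.

```shell
# report_tool NAME: say whether NAME resolves on the current PATH.
# Returns 0 when found, 1 otherwise. (Hypothetical helper.)
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 -> $(command -v "$1")"
  else
    echo "missing: $1 is not on PATH"
    return 1
  fi
}

echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
echo "HADOOP_INSTALL=${HADOOP_INSTALL:-<not set>}"

# bin/ provides hadoop; sbin/ provides the start-*.sh scripts.
for tool in java hadoop start-dfs.sh start-yarn.sh; do
  report_tool "$tool" || true
done
```

Anything reported as missing usually means the PATH line was not run in this terminal or HADOOP_INSTALL points at the wrong directory.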

make sure ssh is set up on your machine:
System Preferences -> Sharing -> Remote Login is ticked

try:
ssh <username>@localhost

where <username> is the name you used to log on.

The conf files are in $HADOOP_INSTALL/etc/hadoop. These are the ones I changed.

core-site.xml

 <configuration>  
 <property>  
   <name>fs.defaultFS</name>  
   <value>hdfs://localhost:9000</value>  
  </property>  
 </configuration>  


hdfs-site.xml

 <configuration>  
 <property>  
   <name>dfs.replication</name>  
   <value>1</value>  
  </property>  
  <property>  
   <name>dfs.namenode.name.dir</name>  
   <value>file:/Users/Administrator/hadoop/namenode</value>  
  </property>  
  <property>  
   <name>dfs.datanode.data.dir</name>  
   <value>file:/Users/Administrator/hadoop/datanode</value>  
  </property>  
 </configuration>  


Make the directories for the namenode and datanode data. The paths in the file above and in the mkdir commands below must match and should point to wherever you want to store the files; I've stored mine in the home directory of the Administrator user on my Mac.

mkdir -p /Users/Administrator/hadoop/namenode
mkdir -p /Users/Administrator/hadoop/datanode
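
Since hdfs-site.xml and the mkdir commands have to agree on the same paths, one option is to generate both from a single value. This is only a sketch: write_hdfs_site is a hypothetical helper, it overwrites the file you point it at, and the data root you pass in is your choice.

```shell
# write_hdfs_site DATA_ROOT CONF_FILE
# Creates DATA_ROOT/namenode and DATA_ROOT/datanode, then writes a
# single-node hdfs-site.xml to CONF_FILE that points at those directories.
# (Hypothetical helper; it overwrites CONF_FILE.)
write_hdfs_site() {
  data_root=$1
  conf_file=$2
  mkdir -p "$data_root/namenode" "$data_root/datanode"
  cat > "$conf_file" <<EOF
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$data_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$data_root/datanode</value>
  </property>
</configuration>
EOF
}
```

For example, write_hdfs_site "$HOME/hadoop" "$HADOOP_INSTALL/etc/hadoop/hdfs-site.xml" would use a hadoop directory in your own home. Run it before formatting the namenode.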

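Then format the namenode (in Hadoop 2 the hdfs command replaces the deprecated "hadoop namenode" form):
hdfs namenode -format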

yarn-site.xml
 <configuration>  
 <!-- Site specific YARN configuration properties -->  
 <property>  
 <name>yarn.resourcemanager.address</name>  
 <value>localhost:8032</value>  
 </property>  
 <property>  
 <name>yarn.nodemanager.aux-services</name>  
 <value>mapreduce_shuffle</value>  
 </property>  
 </configuration>  


start-dfs.sh
start-yarn.sh
jps

should give
9430 ResourceManager
9325 SecondaryNameNode
9513 NodeManager
9225 DataNode
9916 Jps
9140 NameNode
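
Rather than comparing the list by eye, the check can be scripted. A minimal sketch, assuming the five daemons listed above are the ones you expect; missing_daemons is a hypothetical helper that reads jps output on stdin.

```shell
# missing_daemons: read `jps` output on stdin and print the name of every
# expected Hadoop daemon that does not appear in it, one per line.
# (Hypothetical helper; the daemon list is the one shown in this post.)
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>", so compare against the whole second field.
    # This keeps NameNode from matching SecondaryNameNode.
    printf '%s\n' "$jps_output" \
      | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$daemon"
  done
}

# Only query jps when the JDK tool is actually available.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

No output means all five daemons are up; any name printed is one to look for in the logs.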

If not, check the log files (in $HADOOP_INSTALL/logs by default).  If the datanode has not started and you get an incompatible IDs error, stop everything, delete the datanode directory, and recreate it.
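
The datanode reset can be sketched as below. reset_dir is a hypothetical helper, and DATANODE_DIR is assumed to be the dfs.datanode.data.dir path from hdfs-site.xml. Be aware that this throws away every HDFS block stored on the machine, which is only reasonable on a scratch single-node setup like this one.

```shell
# reset_dir DIR: delete DIR and everything in it, then recreate it empty.
# Refuses an empty argument or "/" so a typo cannot wipe the wrong place.
# (Hypothetical helper.)
reset_dir() {
  dir=$1
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to reset '$dir'" >&2
    return 1
  fi
  rm -rf -- "$dir"
  mkdir -p -- "$dir"
}

# Assumption: same datanode path as in hdfs-site.xml above.
DATANODE_DIR="${DATANODE_DIR:-/Users/Administrator/hadoop/datanode}"

# Only touch anything when the Hadoop scripts are on PATH.
if command -v stop-dfs.sh >/dev/null 2>&1; then
  stop-yarn.sh
  stop-dfs.sh
  reset_dir "$DATANODE_DIR"
  start-dfs.sh
  start-yarn.sh
fi
```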

try an ls
hadoop fs -ls

if you get

ls: `.': No such file or directory

then there is no home directory in the hadoop file system, so create one:

hadoop fs -mkdir /user
hadoop fs -mkdir /user/<username>
where <username> is the name you are logged on to the machine with.
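
The same thing can be done without typing the user name by hand. A short sketch, assuming the HDFS user name matches your local login name (the default when security is not enabled); HDFS_HOME is just a variable name picked for this example.

```shell
# Assumption: the HDFS user name is the same as the local login name.
HDFS_HOME="/user/$(whoami)"

if command -v hadoop >/dev/null 2>&1; then
  # -p creates /user as well when it is missing and does nothing if the
  # directory already exists.
  hadoop fs -mkdir -p "$HDFS_HOME"
  hadoop fs -ls /user
else
  echo "hadoop not found on PATH; would create $HDFS_HOME"
fi
```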

now change to the $HADOOP_INSTALL directory and upload a file:

hadoop fs -put LICENSE.txt


finally try a mapreduce job:

cd share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount LICENSE.txt out
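
Because the jar name changes with every Hadoop release, it can be easier to look it up than to type the version. The sketch below does that and then prints the start of the result; find_examples_jar is a hypothetical helper, and the part-r-00000 file name assumes the job ran with a single reducer, which is the default.

```shell
# find_examples_jar DIR: print the path of the first
# hadoop-mapreduce-examples-<version>.jar found in DIR, or fail.
# (Hypothetical helper.)
find_examples_jar() {
  for jar in "$1"/hadoop-mapreduce-examples-*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no hadoop-mapreduce-examples jar in $1" >&2
  return 1
}

# Only run the job when hadoop is on PATH and HADOOP_INSTALL is set.
if command -v hadoop >/dev/null 2>&1 && [ -n "${HADOOP_INSTALL:-}" ]; then
  if examples_jar=$(find_examples_jar "$HADOOP_INSTALL/share/hadoop/mapreduce"); then
    hadoop jar "$examples_jar" wordcount LICENSE.txt out
    # Assumption: one reducer, so the counts land in a single part file.
    hadoop fs -cat out/part-r-00000 | head -n 20
  fi
fi
```

The job refuses to write into an output directory that already exists, so if out is left over from an earlier run, remove it first with hadoop fs -rm -r out.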

Comments:

  1. Thanks for the great post. It really helped me get started. Since I ran into a few problems while following your directions, I thought I'd post the problem and solutions here in case they are useful to anyone else.


    Problem 1
    ------------
    When executing 'hadoop version', I would get an error. I apologize that I didn't capture the exact error text, but the gist was that hadoop was complaining about the location of java_home.

    Solution 1
    ------------
    Instead of using

    export JAVA_HOME=`/usr/libexec/java_home -v1.6`

    I added the following to my .bash_profile file:

    export JAVA_HOME="$(/usr/libexec/java_home)"



    Problem 2
    ------------
    When I performed an operation on the file system, I'd get errors that read "Unable to load realm info from SCDynamicStore".

    Solution 2
    ------------
    I added the following line to the bottom of the hadoop-env.sh file:

    export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

    I also added the following to the yarn-env.sh file:

    YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"



    Hope this helps!

    Replies
    1. Thanks for your solutions. I reckon problem 1 is because the syntax I had explicitly sets the Java version to 1.6, which might not have been installed on your system. Thanks for the answer to problem 2.

  3. Thanks for your post!

    I have been trying to configure it this whole Sunday afternoon, but whatever tutorial I try, I keep getting the following error:

    ======

    13/12/01 18:44:32 INFO mapreduce.Job: Job job_1385919832889_0001 failed with state FAILED due to: Application application_1385919832889_0001 failed 2 times due to AM Container for appattempt_1385919832889_0001_000002 exited with exitCode: 127 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)


    .Failing this attempt.. Failing the application.
    13/12/01 18:44:32 INFO mapreduce.Job: Counters: 0
    =====


    Any ideas?

    Replies
    1. So I get this error when trying to run the wordcount example.

    2. Solved it. I had some leftovers from other tutorials in my config files. Make sure to only make the changes in this tutorial, fellow Mac users! :)

  4. Awesome! Worked perfectly for me! Thanks!

  7. Thanks for great article. All worked for me - and this was the first time I tried to get Hadoop up and running.
