STEPS TO RUN WORD COUNT MAP-REDUCE

If you already have Hadoop installed (i.e. the Hadoop tar file has already been extracted), you can skip the installation steps below.

Installation:

==> Replace "user" with the appropriate user name on your system.
terminal> cd /home/user

==> Download the Hadoop tarball from the location below.
You can either paste the link below into a web browser, which starts the download, or use the wget command to download the Hadoop tar file.

terminal> wget http://redrockdigimark.com/apachemirror/hadoop/common/stable/hadoop-2.9.1.tar.gz
http://redrockdigimark.com/apachemirror/hadoop/common/stable/hadoop-2.9.1.tar.gz

==> Extract (untar) the file using the command below:
terminal> tar -xzf hadoop-2.9.1.tar.gz

==> Use the command below to check whether the files were successfully extracted (the cd only succeeds if the bin directory exists):
terminal> cd hadoop-2.9.1/bin

==> The ls command below lists all the Hadoop-related commands in the bin directory:
terminal> ls

==> The command below lists the files in the current directory (without further configuration Hadoop runs in standalone mode and uses the local filesystem):
terminal> /home/user/hadoop-2.9.1/bin/hadoop fs -ls

You can also set up Hadoop in pseudo-distributed mode, but to get started the steps above are sufficient.

Creating the Jar to Run

Here I use IntelliJ IDEA to show how to create the jar, but you can use any IDE (IntelliJ and Eclipse are the ones most frequently used by organisations).

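  • Create an IntelliJ Maven project, and add the Hadoop client libraries (for example the org.apache.hadoop:hadoop-client artifact, version 2.9.1) as a dependency in pom.xml so that the Hadoop classes used by the code compile.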
  • Create a package named "org.myorg.hadoop" in the project.
  • Create a new class named "WordCount".
  • Copy and paste the code from the location below into "WordCount.java":
    • https://worldofhadoopbigdata.blogspot.com/2018/07/hadoop-map-reduce-word-count-java-code.html
  • To create the jar, go to:
    • File -> Project Structure -> Artifacts
    • Click the plus sign (+) -> JAR -> From modules with dependencies
    • Set the main class to org.myorg.hadoop.WordCount
    • Click OK.
    • All the settings are now in place, so next we build the jar.
    • Click Build -> Build Artifacts
    • Under Action, click Build.
  • This creates the jar at:
    • ProjectLocation/out/artifacts/{Project_name}_jar/{Project_name}.jar
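The linked WordCount.java runs on Hadoop, so it is hard to step through without a cluster. To see what its map and reduce steps do, here is a minimal plain-Java sketch that simulates the map, shuffle/sort and reduce phases in memory. It is not the code from the linked post: the class name LocalWordCount is hypothetical, words are assumed to be split on whitespace (what java.util.StringTokenizer does by default), and List.of needs Java 9 or later.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration of word-count MapReduce, not the linked WordCount.java.
public class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token (assumed tokenization).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by word, keeping keys sorted like Hadoop's sort step.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    // Runs the three phases over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello world");
        // Prints "word<TAB>count", one per line, in sorted word order.
        for (Map.Entry<String, Integer> e : wordCount(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running it prints hadoop 1, hello 2 and world 1 (tab-separated, one per line). On a real cluster Hadoop performs the shuffle and sort between the map and reduce tasks for you; the sketch only makes those phases visible.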

Running the Jar

Now that the jar has been created, run the command below to start the MapReduce job:
  • /home/user/hadoop-2.9.1/bin/hadoop jar /path_to/{Project_name}.jar input_file_location output_file_location
  • In the command above:
    • the first parameter, input_file_location, is the input file (or directory) on which the word count is performed
    • the second parameter, output_file_location, is the directory where the final output is stored. It must not exist yet: Hadoop refuses to run the job if the output directory is already there.
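Because Hadoop will not overwrite an existing output directory, a quick pre-flight check can save a failed run. Below is a minimal sketch, assuming standalone mode where both paths live on the local filesystem; the class name JobArgsCheck and its messages are hypothetical, and for HDFS paths you would need Hadoop's own FileSystem API instead of java.nio.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the two job arguments (local filesystem only).
public class JobArgsCheck {

    // Returns null when the arguments look usable, otherwise a description of the problem.
    static String validate(String[] args) {
        if (args.length != 2) {
            return "usage: <input_file_location> <output_file_location>";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return "input does not exist: " + input;
        }
        // Hadoop fails the job if the output directory already exists.
        if (Files.exists(output)) {
            return "output already exists (delete it or choose a new name): " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        System.out.println(problem == null ? "arguments OK" : problem);
    }
}
```

If a previous run left output behind, remove that directory (or pick a new name) before running the job again.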

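To check the output, use the command below:

/home/user/hadoop-2.9.1/bin/hadoop fs -ls output_file_location

This directory contains the final output of the word count. To print the counts themselves, you can run /home/user/hadoop-2.9.1/bin/hadoop fs -cat output_file_location/part-*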
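The output directory typically holds one or more part-* files, each containing lines of the form word<TAB>count, plus an empty _SUCCESS marker file. Below is a minimal sketch for reading those counts back into Java, assuming standalone mode (local filesystem) and Hadoop's default tab separator; the class name ReadWordCountOutput and the default path are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reader for word-count output (assumes "word<TAB>count" lines).
public class ReadWordCountOutput {

    // Parses "word<TAB>count" lines into a sorted map; lines without a tab are skipped.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.merge(word, count, Long::sum);
        }
        return counts;
    }

    // Reads every part-* file in a local output directory; the glob skips _SUCCESS.
    static Map<String, Long> readDir(Path outputDir) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                parse(Files.readAllLines(part)).forEach((w, c) -> counts.merge(w, c, Long::sum));
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "output_file_location" is a placeholder; pass the real directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "output_file_location");
        readDir(dir).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

Counts for the same word are summed across files, which only matters if you merge output from several runs; within a single job each word appears in exactly one part file.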
