STEPS TO RUN WORD COUNT MAP-REDUCE
If you already have Hadoop installed (that is, you have extracted the Hadoop tar file), skip the installation step below.
Installation:
==> replace "user" with the appropriate user name on your system
terminal> cd /home/user
==> download the tarball from the location below. Either paste the link into a web browser, which starts the download, or use the wget command to fetch the Hadoop tar file.
http://redrockdigimark.com/apachemirror/hadoop/common/stable/hadoop-2.9.1.tar.gz
terminal> wget http://redrockdigimark.com/apachemirror/hadoop/common/stable/hadoop-2.9.1.tar.gz
==> untar the file using the command below
terminal> tar -xzf hadoop-2.9.1.tar.gz
==> use the command below to check whether the files have been successfully extracted
terminal> cd hadoop-2.9.1/bin
==> the ls command below lists all the Hadoop-related commands
terminal> ls
==> the command below shows the files in the current directory. Note that the hadoop launcher lives in the bin directory.
terminal> /home/user/hadoop-2.9.1/bin/hadoop fs -ls
You can also install Hadoop in pseudo-distributed mode, but to begin with the steps above are sufficient.
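Before moving on, a quick sanity check of the extracted install can save some debugging later. The sketch below assumes the tarball was extracted to /home/user/hadoop-2.9.1 and that Java is installed and visible to Hadoop; the HADOOP_HOME variable and the check_hadoop function are names chosen here for illustration.

```shell
#!/bin/sh
# Assumed install location; replace "user" with your own user name.
HADOOP_HOME="${HADOOP_HOME:-/home/user/hadoop-2.9.1}"

# Report whether the hadoop launcher exists under the given install directory.
check_hadoop() {
    if [ -x "$1/bin/hadoop" ]; then
        echo "found: $1/bin/hadoop"
        # Prints the Hadoop version; this fails if Java cannot be located.
        "$1/bin/hadoop" version
    else
        echo "missing: $1/bin/hadoop"
        return 1
    fi
}

check_hadoop "$HADOOP_HOME" || echo "check the extraction path and try again"
```

If the launcher is found but the version command fails, the usual suspect is a missing or wrong JAVA_HOME setting.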
Creating Jar to be run
I am using IntelliJ (an IDE) here to show how to create the jar, but you can use any IDE. (IntelliJ and Eclipse are the ones most frequently used by organisations.)
- Create an IntelliJ Maven project.
- Create a package named "org.myorg.hadoop" in the project.
- Create a new class named "WordCount".
- Copy and paste the code from the location below into "WordCount.java":
- https://worldofhadoopbigdata.blogspot.com/2018/07/hadoop-map-reduce-word-count-java-code.html
- Now, to create the jar, go to:
- File -> Project Structure -> Artifacts
- Click on the plus sign -> Jar -> From modules with dependencies
- Provide the main class as org.myorg.hadoop.WordCount
- Click OK.
- All the settings are now done. Next, build the jar:
- Click on Build -> Build Artifacts
- Click on the Build action
- This creates a jar at the location:
- ProjectLocation/out/artifacts/{Project_name}_jar/{Project_name}.jar
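Once the build finishes, it can be worth confirming that the jar records the main class in its manifest, since the run command does not name a class explicitly. The sketch below assumes the unzip tool is available; JAR_PATH and show_main_class are hypothetical names, and the default path is only a placeholder for your own jar location.

```shell
#!/bin/sh
# Hypothetical variable; point it at the jar produced by the build.
JAR_PATH="${JAR_PATH:-ProjectLocation/out/artifacts/Project_name_jar/Project_name.jar}"

# Print the Main-Class entry from the jar manifest, if there is one.
show_main_class() {
    if [ ! -f "$1" ]; then
        echo "jar not found: $1"
        return 1
    fi
    # A jar is a zip archive; the manifest lives at META-INF/MANIFEST.MF.
    unzip -p "$1" META-INF/MANIFEST.MF | grep '^Main-Class:'
}

show_main_class "$JAR_PATH" || echo "could not read a Main-Class entry from $JAR_PATH"
```

With the settings from the steps above, the expected entry is org.myorg.hadoop.WordCount.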
Running the Jar
Now that the jar has been created, you can run the command below to start the map-reduce job.
- /home/user/hadoop-2.9.1/bin/hadoop jar /path_to/{Project_name}.jar input_file_location output_file_location
- In the above command:
- the first parameter, input_file_location, is the input file on which the word count is performed
- the second parameter, output_file_location, is the location where the final output is stored
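To try the command you need an input file and an output location that does not exist yet. Here is a minimal sketch, assuming a typical WordCount that writes through Hadoop's FileOutputFormat, which rejects an output directory that already exists. The WORK_DIR, INPUT_FILE and OUTPUT_DIR names and the /tmp path are illustrative choices, not part of the original steps.

```shell
#!/bin/sh
# Hypothetical working directory and file names, chosen for illustration.
WORK_DIR="${WORK_DIR:-/tmp/wordcount_demo}"
INPUT_FILE="$WORK_DIR/input.txt"
OUTPUT_DIR="$WORK_DIR/output"

mkdir -p "$WORK_DIR"

# A tiny input file with a few repeated words.
printf '%s\n' "hello hadoop" "hello map reduce" "hadoop word count" > "$INPUT_FILE"

# Clear any output left over from an earlier run, because the job
# is expected to fail if the output directory already exists.
rm -rf "$OUTPUT_DIR"

echo "input:  $INPUT_FILE"
echo "output: $OUTPUT_DIR (to be created by the job)"
```

The two printed paths can then be passed as input_file_location and output_file_location.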
To check the output, use the command below:
/home/user/hadoop-2.9.1/bin/hadoop fs -ls output_file_location
This directory holds the final output of the word count.
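To judge whether the job output is right, it helps to have a reference count computed without Hadoop. The sketch below does that with standard Unix tools. The local_word_count function and the sample file are hypothetical, and it splits on whitespace only, so results may differ if the WordCount code tokenizes differently. The commented line shows how the job output would usually be printed, assuming it lands in files whose names start with "part-".

```shell
#!/bin/sh
# To print the job output itself (assumes output files named part-*):
# /home/user/hadoop-2.9.1/bin/hadoop fs -cat output_file_location/part-*

# Count words locally as a reference for the job output.
# Prints "word<TAB>count" lines sorted by word.
local_word_count() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Hypothetical sample input, created here so the sketch is self-contained.
SAMPLE="$(mktemp)"
printf '%s\n' "hello hadoop" "hello map reduce" > "$SAMPLE"
local_word_count "$SAMPLE"
rm -f "$SAMPLE"
```

Running the function on your real input file and comparing it with the job output gives a quick correctness check for small inputs.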