Posts

Showing posts from August, 2017

HDFS Block Concepts

File system blocks: A file system controls how data is stored and retrieved. Without a file system, information placed on a storage medium would be one large body of data with no way to tell where one piece of information stops and the next begins. A block is the smallest unit of data that can be stored on or retrieved from the disk, and filesystems deal with data in whole blocks. Filesystem blocks are normally a few kilobytes in size. Even if a file's contents are smaller than the block size, it still occupies a full block on disk. Blocks are transparent to the user performing filesystem operations such as read and write. The need for distributed filesystems: When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines. Filesystems that manage the storage across a network of machines are called distributed filesyst...
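To make the block idea concrete in HDFS terms, here is a minimal sketch (not part of the original post) that uses Hadoop's Java FileSystem API to print a file's block size and block locations; the path /user/demo/sample.txt is a hypothetical placeholder for a file in your cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Load the cluster configuration (core-site.xml / hdfs-site.xml on the classpath)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; replace with a file that actually exists in your HDFS
        Path file = new Path("/user/demo/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        System.out.println("Block size : " + status.getBlockSize() + " bytes");
        System.out.println("File length: " + status.getLen() + " bytes");

        // Each BlockLocation describes one block and the datanodes that hold it
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

Run against a file larger than the configured block size (128 MB by default in recent Hadoop releases), this prints one BlockLocation per block, which is exactly the unit HDFS distributes across datanodes.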

Hadoop Maximum Temperature Example Explained

Analyzing the Data with Hadoop Using MapReduce: To take advantage of the parallel processing that Hadoop provides, we need to express our query as a MapReduce job, because the MapReduce framework manages the parallel processing by itself. MapReduce divides the processing into two phases: the map phase and the reduce phase. Each phase takes its input in the form of key-value pairs, and both phases produce their output as key-value pairs. The output generated by the map phase is given to the reduce phase as its input. It is the programmer's responsibility to specify two functions: the map function and the reduce function. Let's take an example where the input to the map phase is the NCDC data at the link below: https://raw.githubusercontent.com/lmsamarawickrama/Hadoop-MapReduce/master/NCDC%20weather%20files/1901 Using this data, we need to calculate the maximum temperature per year. While writing MapReduce code, we choose a text input...
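As a sketch of what the post builds toward, here is the classic mapper/reducer pair for the maximum-temperature job, assuming the NCDC fixed-width record layout used in the standard Hadoop example (year in columns 15-19, signed temperature in tenths of a degree Celsius in columns 87-92, quality code in column 92); class names are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: parses each NCDC record and emits (year, temperature)
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999; // NCDC sentinel for a missing reading

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't accept a leading plus sign
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}

// Reduce phase: receives all temperatures for a year and keeps the maximum
class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
```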

Hadoop Map-Reduce Word Count Java Example

This Hadoop tutorial aims to give developers a great start in the world of Hadoop MapReduce programming by giving them hands-on experience in developing their first Hadoop-based WordCount application. The Hadoop MapReduce WordCount example is the standard example with which Hadoop developers begin their hands-on programming. This tutorial will help Hadoop developers learn how to implement the WordCount example in MapReduce to count the number of occurrences of a given word in an input file. Prerequisites for following this Hadoop WordCount example tutorial: Hadoop must be installed, or you should have a sandbox running on VirtualBox (or VMware); if you have installed Hadoop on your machine, a single-node Hadoop cluster must be configured and running; optionally, an IDE (IntelliJ, Eclipse, or any other) should be installed. Hadoop MapReduce Example - Word Count: How does it work? The Hadoop WordCount operation occurs in 3 stages: the Mapper Phase, the Shuffle Phase, and the Reducer Ph...
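For reference, a minimal sketch of the three stages in code: the mapper emits (word, 1) pairs, the framework's shuffle phase groups them by word, and the reducer sums the counts. This follows the canonical Hadoop WordCount; class names are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper phase: emits (word, 1) for every token in the input line
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer phase: after the shuffle groups values by word, sums the counts
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```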

Hadoop Introduction

Introduction to Big Data and Hadoop: The problem that led to Hadoop. Before getting into the technicalities in this Hadoop tutorial blog, let me begin with an interesting story about how Hadoop came into the picture and why it is so popular in the industry nowadays. It all started with two people, Doug Cutting and Mike Cafarella, who were in the process of building a search engine system that could index 1 billion pages. After their research, they estimated that such a system would cost around half a million dollars in hardware, with a monthly running cost of $30,000, which is quite expensive. However, they soon realised that their architecture would not be capable of scaling to the billions of pages on the web. They then came across a paper, published in 2003, that described the architecture of Google's distributed file system, called GFS, which was being used in production at Google. This paper on GFS proved to be exactly what they were looking for, and...