Posts

Showing posts from September, 2017

Introduction to Sqoop Part 1

Introduction

To use Hadoop for analytics, the data must first be loaded into the Hadoop cluster. It can then be processed with tools such as MapReduce, Hive, or Pig. Sqoop imports data from an RDBMS into the Hadoop Distributed File System (HDFS), and it also exports data from HDFS back to an RDBMS.

Loading gigabytes or terabytes of data into HDFS from production databases, or accessing that data from MapReduce applications, is a challenging task. We have to consider data consistency, the overhead these jobs place on production systems, and whether the process will be efficient in the end. Loading the data with batch scripts is inefficient.

Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:

- Imports individual tables or entire databases to files in HDFS
- Generates Java classes to allow you to interact with your imported data
- Provi...
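
The import and export directions described above can be sketched as two Sqoop command lines. The snippet below is a minimal sketch, not the post's own example: it only builds and prints the commands, and it does not run Sqoop. The JDBC URL, user name, table name, and HDFS paths are hypothetical placeholders; the flags (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--export-dir`, `--num-mappers`) are standard Sqoop 1 options.

```python
import shlex


def build_import_command(jdbc_url, username, table, target_dir, num_mappers=4):
    """Build the argument list for a Sqoop table import (RDBMS -> HDFS)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC URL of the source database
        "--username", username,
        "-P",                         # prompt for the password at run time
        "--table", table,             # table to import
        "--target-dir", target_dir,   # HDFS directory for the imported files
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


def build_export_command(jdbc_url, username, table, export_dir):
    """Build the argument list for a Sqoop export (HDFS -> RDBMS)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,             # destination table in the database
        "--export-dir", export_dir,   # HDFS directory holding the data to export
    ]


# Hypothetical connection details, used only for illustration.
JDBC_URL = "jdbc:mysql://db.example.com/sales"

import_cmd = build_import_command(JDBC_URL, "etl_user", "orders", "/data/sales/orders")
export_cmd = build_export_command(JDBC_URL, "etl_user", "order_summary", "/data/sales/summary")

# Print shell-ready command lines; run them on a host where Sqoop is installed.
print(shlex.join(import_cmd))
print(shlex.join(export_cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so values containing spaces or shell metacharacters are quoted correctly when the command is printed or passed to a process launcher.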