Knowledgebase

Articles

 Distributed Cache in Hadoop

Distributed Cache is a facility provided by the Hadoop MapReduce framework. It caches files when...

 HDFS: Read & Write Commands using Java API

The Hadoop Distributed File System is the most important component of the Hadoop ecosystem. HDFS is...

 Hadoop : OOZIE

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It combines multiple jobs...

 Hadoop MapReduce: Counters & Joins

MapReduce is the core component of Hadoop that provides data processing. MapReduce works by...

 Hadoop PIG

Hadoop Pig is an abstraction over MapReduce. When it comes to analyzing large sets of...

 Hadoop PIG : Installation

Pig Installation: Before we start with the actual process, switch to user 'hduser' (the user used for...

 Hadoop Setup - Installation & Configuration

Requirements: Ubuntu installed and running; Java installed. Perform the following steps: 1)...

 Hadoop: Features, Components, Cluster & Topology

Apache Hadoop is a framework used to develop data-processing applications that are executed in a...

 Hadoop: What is Sqoop and Flume?

Sqoop is a tool designed to transfer data between Hadoop and relational database servers. Sqoop...

 Introduction to BIG DATA

'Big Data' refers to data that is huge in size. It is also described as a collection of data that is huge...

 Limitations of Hadoop

Various limitations of Apache Hadoop are given below, along with their solutions: 1. Issues with...

 Understanding Hadoop High Availability Feature

Objective: This blog describes the Hadoop HDFS High Availability feature. In...

 What is MapReduce? How it Works?

MapReduce is the processing layer of Hadoop. The MapReduce programming model is designed for...