BIG DATA- HADOOP
Hadoop is a complete ecosystem of open-source projects that provides a framework for dealing with big data. Let's start by brainstorming the challenges of dealing with big data on traditional systems, and then look at how a Hadoop solution addresses them.
Following are the challenges I can think of in dealing with big data:
- High capital investment in procuring a server with high processing capacity.
- Enormous time taken to process the data
- In case of a long-running query, imagine an error occurring at the last step. Rerunning the entire query from scratch wastes an enormous amount of time.
- Difficulty in building program queries
Here is how Hadoop solves all of these issues:
- High capital investment in procuring a server with high processing capacity: Hadoop clusters run on ordinary commodity hardware and keep multiple copies of the data to ensure reliability. Up to 4,500 machines can be connected in a single Hadoop cluster.
- Enormous time taken to process the data: The job is broken into pieces and executed in parallel, saving time. Up to 25 petabytes (1 PB = 1,000 TB) of data can be processed using Hadoop.
- In case of a long-running query failing at the last step: Hadoop backs up data sets at every level and executes queries on duplicate data sets, so an individual machine failure does not lose the whole job. These steps make Hadoop processing more reliable.
- Difficulty in building program queries: Queries in Hadoop are as simple as coding in any language; you just need to change your way of thinking about building a query to enable parallel processing.
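The parallel style of query building described above follows the MapReduce pattern. As a rough illustration (this is a toy Python sketch, not Hadoop's actual Java API), the classic word-count job splits the input into chunks, maps each chunk to (word, 1) pairs in parallel, and then reduces the pairs into final counts:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_chunk(chunk):
    """Map step: emit (word, 1) pairs for one chunk of lines."""
    pairs = []
    for line in chunk:
        for word in line.split():
            pairs.append((word.lower(), 1))
    return pairs

def reduce_pairs(mapped):
    """Shuffle + reduce step: group pairs by word and sum the counts."""
    counts = defaultdict(int)
    for pairs in mapped:
        for word, n in pairs:
            counts[word] += n
    return dict(counts)

def word_count(lines, workers=4):
    # Split the input into roughly equal chunks, one per worker,
    # the way Hadoop splits a file into blocks across the cluster.
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_pairs(mapped)

if __name__ == "__main__":
    data = ["big data big", "hadoop handles big data"]
    print(word_count(data))
```

The shift in thinking is that the map step must work on each chunk independently, with no shared state; only the reduce step brings the partial results together. That independence is what lets Hadoop scatter the chunks across thousands of machines.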