What is Hadoop?
I’m sure you’ve heard about Big Data. If not, I recommend my blog post “What is Big Data?” The best-known technology used for Big Data is Hadoop. It is used by Yahoo, eBay, LinkedIn and Facebook, and it was inspired by Google’s publications on MapReduce, GoogleFS and BigTable. Because Hadoop can be hosted on commodity hardware (usually Intel PCs running Linux, with one or two CPUs and a few TB of hard disk, without any RAID replication technology), it lets these companies store huge quantities of data (petabytes or even more) at very low cost compared to SAN storage systems.

Hadoop is an open source suite hosted by the Apache Software Foundation: http://hadoop.apache.org/ . The Hadoop “brand” covers many different tools. Two of them are core parts of Hadoop:

Hadoop Distributed File System (HDFS) is a virtual file system that looks like any other file system, except that when you move a file to HDFS, the file is split into many small files, and each of those files is replicated and stored on (usua...