Hadoop

Apache Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.

Hadoop is designed to be robust: your Big Data applications continue to run even when individual servers, or even whole racks, fail.

Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data

Apache Hadoop has two main subprojects:

  • MapReduce –
    The framework that understands and assigns work to the nodes in a cluster. MapReduce is a programming model for processing large data sets in a batch-oriented manner (a word-count sketch follows this list).
  • HDFS –
    A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes. Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive and ZooKeeper, that extend the value of Hadoop and improve its usability.
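
To make the MapReduce model concrete, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API; the input and output paths passed on the command line are placeholders, and the jar and class names are illustrative rather than part of any product described here.

```java
// Minimal word-count sketch: the mapper emits (word, 1) pairs,
// the reducer sums the counts for each word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: split each input line into words and emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce step: sum all counts received for a given word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (placeholder)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, such a job would typically be launched with "hadoop jar wordcount.jar WordCount <input dir> <output dir>" (the jar name here is hypothetical); the mappers run in parallel on the nodes where the input blocks are stored, and the reducers aggregate the per-word counts.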

What’s the big deal?

Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics.

Hadoop enables a computing solution that is:

  • Scalable
    New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
  • Cost effective
    Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
  • Flexible
    Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.
  • Fault tolerant
    When you lose a node, the system redirects work to another node holding a copy of the data and continues processing without missing a beat (see the replication sketch after this list).
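
To make the fault-tolerance point concrete, the following sketch, which assumes a reachable HDFS cluster and uses a hypothetical file path, shows how the replication factor behind that behavior can be read and adjusted through the HDFS FileSystem API.

```java
// Sketch: inspect and adjust HDFS block replication, the mechanism that
// lets Hadoop keep working when individual nodes fail.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // dfs.replication is a standard HDFS setting; 3 copies means two nodes
    // holding a block can fail without data loss.
    conf.setInt("dfs.replication", 3);

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/data/events.log"); // hypothetical file path

    // Report how many replicas HDFS currently keeps for this file.
    FileStatus status = fs.getFileStatus(file);
    System.out.println("Current replication: " + status.getReplication());

    // Raise the replication factor for a particularly important file.
    fs.setReplication(file, (short) 5);
  }
}
```

Because each block is stored on several nodes, losing a node simply means the scheduler reruns the affected tasks against another replica.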

Is Hadoop right for you?

Eighty percent of the world’s data is unstructured, and most businesses don’t even attempt to use this data to their advantage. Imagine if you could afford to keep all the data your business generates, and had a way to analyze it. At InfoVille Solution we understand this latent intelligence and help organizations get the most out of it. With our experience, you can be confident in interpreting your data in light of your business goals and needs.
