Apache Hadoop:

  • It is a Big Data framework which uses Hadoop Distributed File System to Store the data and MapReduce framework to process that data. Java is used as native language to write MapReduce programs.
  • Apache Hbase and Cassandra are both NoSQL databases.
  • It does not maintain the strict ACID transactions and require better data modelling to be used effectual.
  • Their presentation can be calculated by benchmarking the database using a tool called YCSB(Yahoo Cloud Serving Benchmark).
  • Factors examine in benchmarking can be read, write ,read-modify-write latency in executing queries.
Hbase:
  • It is a Master -Slave NoSQL database dependent upon Hadoop cluster .
  • It is good for Heavy reads and less Write applications
apache hadoop ecosystem

Apache Cassandra:

  • It is a master less and shared-nothing, ring based architecture which does not depend upon Hadoop framework.
  • It is good for Write heavy and less read applications.

Apache Hive:

  • It is a collective processing framework to process the data using a language called Hive Query Language(HQL).
  • HQL is a sql wrapper on top of HDFS which protect writing Mapreduce programs in Java.
  • Instead one can use SQL like language to do their daily tasks.
  • Apache Hive is mainly used for ETL and data warehousing feature in Hadoop.
  • The process of hive is to create tables, joins, unions, aggregates etc. It develop visualization reports can be done by Tableau.

Categorized in:

Tagged in:

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,