Data Hive

  • Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
  • Hive provides an SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop.
Apache Hive data model
Data Processing Task
  • Download the data
  • Upload the data
  • Start the Hive view
    • Explore the Hive user interface (UI)
    • Create the table temp_drivers
    • Create a query to populate the Hive table temp_drivers with the drivers.csv data
    • Create the table drivers
    • Create a query to extract the data from temp_drivers and store it in drivers
    • Create the temp_timesheet and timesheet tables
    • Filter the data (driverid, hours_logged, miles_logged)
    • Join the data (driverid, name, hours_logged, miles_logged)
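The task list above can be tried out locally before running it on a cluster. The following is a minimal sketch, not the Hive queries themselves: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the sample rows, the week column, and the reading of the "filter" step as per-driver totals are all assumptions made for illustration. In Hive itself, the staging tables would typically be filled with a LOAD DATA INPATH statement and the columns split out with a function such as regexp_extract; here the split is done in Python instead.

```python
import sqlite3

# Hypothetical sample lines standing in for drivers.csv and timesheet.csv.
DRIVERS_LINES = ["10,Driver A", "11,Driver B"]
TIMESHEET_LINES = ["10,1,70,3300", "10,2,70,3300", "11,1,50,2500"]

# Filter step (assumed to mean: total hours and miles per driver).
FILTER_QUERY = """
SELECT driverid,
       SUM(hours_logged) AS total_hours,
       SUM(miles_logged) AS total_miles
FROM timesheet
GROUP BY driverid
ORDER BY driverid
"""

# Join step: attach each driver's name to the per-driver totals.
JOIN_QUERY = """
SELECT d.driverid, d.name, t.total_hours, t.total_miles
FROM drivers d
JOIN (SELECT driverid,
             SUM(hours_logged) AS total_hours,
             SUM(miles_logged) AS total_miles
      FROM timesheet
      GROUP BY driverid) t
  ON d.driverid = t.driverid
ORDER BY d.driverid
"""


def build_tables(conn):
    """Stage raw CSV lines, then extract typed columns into the final tables."""
    cur = conn.cursor()

    # Staging tables keep every CSV line in a single string column.
    cur.execute("CREATE TABLE temp_drivers (col_value TEXT)")
    cur.execute("CREATE TABLE temp_timesheet (col_value TEXT)")
    cur.executemany("INSERT INTO temp_drivers VALUES (?)",
                    [(line,) for line in DRIVERS_LINES])
    cur.executemany("INSERT INTO temp_timesheet VALUES (?)",
                    [(line,) for line in TIMESHEET_LINES])

    # Final tables with one typed column per field (column set is assumed).
    cur.execute("CREATE TABLE drivers (driverid INTEGER, name TEXT)")
    cur.execute("CREATE TABLE timesheet (driverid INTEGER, week INTEGER, "
                "hours_logged INTEGER, miles_logged INTEGER)")

    # Extract: split each staged line on commas and store the typed values.
    staged = cur.execute("SELECT col_value FROM temp_drivers").fetchall()
    for (line,) in staged:
        driverid, name = line.split(",")
        cur.execute("INSERT INTO drivers VALUES (?, ?)", (int(driverid), name))

    staged = cur.execute("SELECT col_value FROM temp_timesheet").fetchall()
    for (line,) in staged:
        fields = [int(value) for value in line.split(",")]
        cur.execute("INSERT INTO timesheet VALUES (?, ?, ?, ?)", fields)

    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_tables(connection)
    for row in connection.execute(JOIN_QUERY):
        print(row)
```

The two SELECT statements use only standard SQL (SUM, GROUP BY, JOIN, ORDER BY), so they should carry over to a Hive query editor with little or no change, while the table creation and loading steps would need Hive's own syntax.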
