Best way to back up a Hive partitioned table to disk

  • Partitioning is an optimization technique in Hive that can significantly improve query performance.
  • Apache Hive is a data warehouse built on top of Hadoop that enables ad-hoc analysis of structured and semi-structured data.
  • Apache Hive organizes tables into partitions. Partitioning divides a table into related parts based on the values of specific columns such as date, city, and department.
  • Each table in Hive can have one or more partition keys that identify a particular partition. Partitions make it easy to query slices of the data.
The Hadoop Distributed File System (HDFS) is considered reliable from a technical-failure point of view, but to archive data you need to copy it offline with copyToLocal, or copy it to another cluster or to another location on the same cluster.
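For the "another cluster" option, one commonly used tool is Hadoop DistCp, which the text above does not name. The following is a minimal sketch only: the NameNode addresses are hypothetical placeholders, the warehouse path assumes Hive's default location (check hive.metastore.warehouse.dir on your cluster), and the function names are made up for illustration.

```shell
# Sketch: mirror one Hive table directory to a second cluster with DistCp.
# All hosts and paths below are assumptions, not values from this article.
SRC_NN="hdfs://source-nn:8020"        # hypothetical source NameNode URI
DST_NN="hdfs://backup-nn:8020"        # hypothetical backup NameNode URI
WAREHOUSE_DIR="/user/hive/warehouse"  # Hive default warehouse directory

# Print the full HDFS URI of a table directory.
# $1 = NameNode URI, $2 = table name
table_uri() {
  printf '%s%s/%s\n' "$1" "$WAREHOUSE_DIR" "$2"
}

# Copy a table (all of its partition subdirectories) to the backup cluster.
# -update skips files that already match at the destination.
# $1 = table name
backup_table_to_cluster() {
  hadoop distcp -update \
    "$(table_uri "$SRC_NN" "$1")" \
    "$(table_uri "$DST_NN" "$1")"
}
```

Calling backup_table_to_cluster wikitechy would then copy the whole table directory. Note that this copies only the data files; the table definition in the metastore would need to be recreated on the target cluster separately.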

For copyToLocal the command syntax is:

hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
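Because Hive stores each partition of a managed table as a subdirectory named key=value under the table directory, a single partition can be copied on its own. The sketch below assumes the default warehouse location and a hypothetical local backup directory /backup/hive; the helper names are illustrative only.

```shell
# Sketch: copy one partition directory from HDFS to local disk.
# Both paths are assumptions; adjust them to your environment.
WAREHOUSE_DIR="/user/hive/warehouse"  # Hive default; see hive.metastore.warehouse.dir
BACKUP_ROOT="/backup/hive"            # hypothetical local backup directory

# Print the HDFS path of a partition.
# $1 = table name, $2 = partition spec such as year=2018
partition_path() {
  printf '%s/%s/%s\n' "$WAREHOUSE_DIR" "$1" "$2"
}

# Copy the partition into $BACKUP_ROOT/<table>/, keeping the key=value name.
# $1 = table name, $2 = partition spec
backup_partition() {
  src=$(partition_path "$1" "$2")
  dst="$BACKUP_ROOT/$1"
  mkdir -p "$dst" && hadoop fs -copyToLocal "$src" "$dst/"
}
```

For example, backup_partition wikitechy year=2018 would place the files under /backup/hive/wikitechy/year=2018. Keeping the key=value directory name makes it straightforward to copy the partition back later.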

Additionally, you can use Hive to copy data from a table to a local directory.

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM wikitechy a WHERE a.year = 2018;
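The same export can be run non-interactively from a script with hive -e. This is a rough sketch, assuming the hive CLI is on the PATH and that the table is partitioned by a year column as in the query above; the function names are hypothetical.

```shell
# Sketch: export one year of a Hive table to a local directory via the CLI.

# Build the HiveQL statement.
# $1 = table name, $2 = year value, $3 = local output directory
export_query() {
  printf "INSERT OVERWRITE LOCAL DIRECTORY '%s' SELECT * FROM %s WHERE year = %s" \
    "$3" "$1" "$2"
}

# Run the statement with the hive command-line client.
# OVERWRITE replaces whatever is already in the output directory,
# so point it at a dedicated directory.
export_partition() {
  hive -e "$(export_query "$1" "$2" "$3")"
}
```

A call such as export_partition wikitechy 2018 /tmp/local_out writes the selected rows as plain files in that directory. Unlike copyToLocal, this exports query results rather than the original storage files, so the on-disk format may differ from the table's.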
