Type Conversion in Pig HCatalog



What is type conversion

  • Type conversion is converting data of one type to another type. It is also known as type casting.
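
For example, a minimal Pig sketch of an explicit cast, assuming a hypothetical input file ages.txt with a chararray field age:

a = LOAD 'ages.txt' USING PigStorage(',') AS (name:chararray, age:chararray);
-- explicit cast (type conversion): convert age from chararray to int
b = FOREACH a GENERATE name, (int)age;
DUMP b;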

Set Up

  • The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables. No HCatalog-specific setup is required for these interfaces.
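
As a quick illustration of both interfaces, a minimal sketch assuming two existing HCatalog-managed tables with hypothetical names source_table and target_table:

A = LOAD 'source_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- write the same records into another HCatalog-managed table (the target table must already exist)
STORE A INTO 'target_table' USING org.apache.hive.hcatalog.pig.HCatStorer();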


Stale Content Warning

The fully qualified package name changed from org.apache.hcatalog.pig to org.apache.hive.hcatalog.pig in Pig versions 0.14+. Many older website examples still reference the old syntax, which no longer works.

Previous                              Pig Versions 0.14+
org.apache.hcatalog.pig.HCatLoader    org.apache.hive.hcatalog.pig.HCatLoader
org.apache.hcatalog.pig.HCatStorer    org.apache.hive.hcatalog.pig.HCatStorer
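
For example, a load statement written with the old package name must be updated to the new one (tablename here is a placeholder):

-- old package name (no longer works in Pig 0.14+)
A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();
-- new package name
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();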

HCatLoader

  • HCatLoader is used with Pig scripts to read data from HCatalog-managed tables.

Usage

  • HCatLoader is accessed via a Pig load statement.
  • Using Pig 0.14+
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

Assumptions

  • We must specify the table name in single quotes: LOAD 'tablename'. If we are using a non-default database, we must specify the input as 'dbname.tablename'. If we are using Pig 0.9.2 or earlier, we must create the database and table before running the Pig script.
  • Beginning with Pig 0.10 we can issue these create commands in Pig using the SQL command. The Hive metastore lets us create tables without specifying a database; if we created tables this way, the database name is 'default' and does not need to be specified when giving the table to HCatLoader (see the sketch below).
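
A minimal sketch of the two cases, assuming a hypothetical database mydb containing a table mytable:

-- table in the default database: the database name can be omitted
A = LOAD 'mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- table in a non-default database: prefix the table name with the database name
B = LOAD 'mydb.mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();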

HCatLoader Data Types

  • Restrictions apply to the types of columns HCatLoader can read from HCatalog-managed tables. HCatLoader can read only the Hive data types listed below.
  • Pig interprets each Hive data type as a corresponding Pig data type (see the sketch after the list below).

Types in Hive 0.12.0 and Earlier

Hive 0.12.0 and earlier releases support reading these Hive primitive data types with HCatLoader:

  • boolean
  • int
  • long
  • float
  • double
  • string
  • binary

and these complex data types:

  • map (the key type should be string)
  • array of any type
  • struct of fields of any type
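
As a hedged sketch of how these types surface in Pig, assume a hypothetical Hive table users with columns name string, tags array<string>, and props map<string,string>; Hive string becomes chararray, array becomes a bag, and map becomes a Pig map:

a = LOAD 'users' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- dereference the map with # and flatten the bag produced from the Hive array
b = FOREACH a GENERATE name, props#'country', FLATTEN(tags);
DUMP b;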

Running Pig with HCatalog

  • Pig does not automatically pick up HCatalog jars. To bring in the necessary jars, we can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS as described below.

The -useHCatalog Flag

  • To bring in the appropriate jars for working with HCatalog, simply include the following flag when running Pig from the shell, Hue, or other applications:
  • pig -useHCatalog
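
For example, to run a script with the HCatalog jars on the classpath (myscript.pig refers to the example script used later on this page):

pig -useHCatalog myscript.pig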

Jars and Configuration Files

  • For Pig commands that omit -useHCatalog, we need to tell Pig where to find the HCatalog jars and the Hive jars used by the HCatalog client. To do this, we must define the environment variable PIG_CLASSPATH with the appropriate jars.
  • HCatalog can tell us which jars it needs. In order to do this it needs to know where Hadoop and Hive are installed. We also need to tell Pig the URI of the metastore, in the PIG_OPTS variable.
  • In the case where we have installed Hadoop and Hive via tar, we can do this:
export HADOOP_HOME=<path_to_hadoop_install>

export HIVE_HOME=<path_to_hive_install>

export HCAT_HOME=<path_to_hcat_install>

export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:\
$HIVE_HOME/lib/slf4j-api-*.jar

export PIG_OPTS=-Dhive.metastore.uris=thrift://<hostname>:<port>
Or we can pass the jars on the command line:
<path_to_pig_install>/bin/pig -Dpig.additional.jars=\
$HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/lib/slf4j-api-*.jar  <script.pig>
In each filepath, the * stands for the actual version number. For example, HCatalog release 0.5.0 uses these jars and conf files:

  • $HCAT_HOME/share/hcatalog/hcatalog-core-0.5.0.jar
  • $HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-0.5.0.jar
  • $HIVE_HOME/lib/hive-metastore-0.10.0.jar
  • $HIVE_HOME/lib/libthrift-0.7.0.jar
  • $HIVE_HOME/lib/hive-exec-0.10.0.jar
  • $HIVE_HOME/lib/libfb303-0.7.0.jar
  • $HIVE_HOME/lib/jdo2-api-2.3-ec.jar
  • $HIVE_HOME/conf
  • $HADOOP_HOME/conf
  • $HIVE_HOME/lib/slf4j-api-1.6.1.jar

Load Examples

This load statement will load all partitions of the specified table.
/* myscript.pig */

A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

If only some partitions of the specified table are needed, include a partition filter statement immediately following the load statement in the data flow.
The filter statement can include conditions on partition as well as non-partition columns.
/* myscript.pig */

A = LOAD 'tablename' USING  org.apache.hive.hcatalog.pig.HCatLoader();

-- date is a partition column; age is not
B = filter A by date == '20100819' and age < 30;

-- both date and country are partition columns
C = filter A by date == '20100819' and country == 'US';
...
...


To scan a whole table, for example:
a = load 'student_data' using org.apache.hive.hcatalog.pig.HCatLoader();
b = foreach a generate name, age;

Notice that the schema is automatically provided to Pig; there is no need to declare name and age as fields, as we would if we were loading from a file.
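
For contrast, a minimal sketch of the same load done from a plain file (the file name student_data.txt and its tab delimiter are hypothetical), where the fields must be declared explicitly:

a = load 'student_data.txt' using PigStorage('\t') as (name:chararray, age:int);
b = foreach a generate name, age;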

Filter Operators

A filter can contain the operators 'and', 'or', '()', '==', '!=', '<', '>', '<=' and '>='.

a = load 'web_logs' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by datestamp > '20110924';

A complex filter can have various combinations of operators, such as:

a = load 'web_logs' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by datestamp == '20110924' or datestamp == '20110925';

