Internal architecture of Apache Pig

  • Pig Latin consist of pig to analyze the data from Hadoop.
  • Its a highlevel data processing language it perform various operations like data types and operators.
  • To perform a particular task Pig Script is used and execution mechanisms like(Grunt Shell, UDFs, Embedded).
  • To produce the desired output continuous transformations applied in Pig Framework.
  • Programmer’s job become easier when converting these scripts into a series of MapReduce jobs.
Pig Architecture

Apache Pig Components

  • There are various components in the Apache Pig framework.

Parser

  • The Pig Scripts are handled by the Parser.It checks the syntax,type checking, and other miscellaneous checks.The output will be like a DAG (directed acyclic graph), which represents the Pig Latin statements and logical operators.
  • In the DAG, the logical operators of the script are represented as nodes and therefore the data flows are represented as edges.

Optimizer

  • The logical plan DAG is passed to the logical optimizer,it carries out the logical optimizations,projection and pushdown.

Compiler

  • It compiles the optimized logical plan into a series of MapReduce jobs.

Execution engine

  • Producing the desired results MapReduce jobs are executed on Hadoop.
Pig Latin

Pig Latin Data Model

  • It allows datatypes such as map and tuple and it representation of Pig Latin’s data model.

Atom

  • Any single value in Pig Latin, irrespective of their data, type is known as an Atom.
  • Values of Pig in Atom are number,int,long,float,double,chararray, and bytearray.
  • Field define as a piece of data or a simple atomic value. Example − ‘xxx’ or ‘30’

Tuple

  • A record formed by ordered set of fields is known as a tuple. A tuple is similar to a row in a table of RDBMS. Example − (xxx, 30)

Bag

  • Is an unordered set of tuples.
  • A collection of tuples (non-unique) is known as a bag.Tuple have any number of fields (flexible schema).
  • No need that every tuple contain the same number of fields and the same position (column) have the same type.
    • Example − {(xxx, 30), (yyyy, 45)}
  • A Field contains bags in context its called inner bags.

Map

  • Map is a set of key-value pairs the key define as chararray and its unique.
  • It is represented by ‘[]’ Example − [name#xxx, age#30]

Relation

  • A relation is a bag of tuples.Pig Latin relations are unordered (there is no guarantee that tuples are processed in any particular order).

Categorized in:

Tagged in:

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,