What is Pig Latin - Pig Programming Model: Data

  • Pig operations operate on relations
  • A relation is a bag
  • A bag is a collection of tuples
  • A tuple is an ordered set of fields
  • A field is any type of data

Basic data types:

  • Boolean: True, False
  • Int and Long: 1, 2, 3, 4, 5
  • Float and Double: 2.3, 1.4, 4.5
  • Chararray: 'Hello', 'I am a string'
  • DateTime: 2014-09-11T12:20:14.1234+00:00
  • … and more, but you probably won’t use them very often

Tuple: A catch-all data type


Bag: An unordered collection of tuples
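
As a rough illustration of how tuples and bags look in practice, Pig prints a tuple in parentheses and a bag in curly braces, and both can be declared as field types in a schema. The file name and every field name below are hypothetical.

```pig
-- Notation Pig uses when printing data:
--   tuple: (alice,30)
--   bag:   {(alice,30),(bob,25)}

-- Hypothetical schema mixing simple and complex types.
people = LOAD 'people.txt'
         AS (name:chararray,
             address:tuple(street:chararray, city:chararray),
             phones:bag{t:tuple(number:chararray)});
```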


Working with Data


Loading data?

  • Data source: local file system or HDFS (usually!)
  • LOAD instruction:
    • Data is automatically loaded into a distributed relation
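
A minimal sketch of a LOAD statement, assuming a tab-separated input file; the path `users.tsv` and the field names are made up for illustration.

```pig
-- Assumed input: tab-separated lines with a name, an age and a city.
users = LOAD 'users.tsv' USING PigStorage('\t')
        AS (name:chararray, age:int, city:chararray);

-- Without an AS clause the fields have no names or declared types
-- and are referenced by position: $0, $1, ...
raw = LOAD 'users.tsv';
```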

Checking relations’ content

  • DUMP instruction:
    • Prints the content of a relation to standard output
  • DESCRIBE instruction:
    • Prints the schema of the relation to standard output
  • ILLUSTRATE instruction:
    • Prints the schema of the relation and an example tuple to standard output
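
The three inspection statements above could be tried on a small relation as follows. This is only a sketch: the input file and its fields are assumptions, and the sample tuple in the comment depends entirely on the data.

```pig
users = LOAD 'users.tsv' AS (name:chararray, age:int, city:chararray);

DUMP users;        -- prints every tuple, e.g. (alice,30,Paris)
DESCRIBE users;    -- prints the schema of users (field names and types)
ILLUSTRATE users;  -- prints the schema together with example tuples
```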

Operating on relations

  • FOREACH instruction:
    • Generates a new relation by projecting the data of a relation
  • FOREACH instruction:
    • When we execute the instruction, it seems that nothing happens!
    • Yet LOAD, DUMP, and ILLUSTRATE all produced some tracing output…
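
A small sketch of a FOREACH projection, assuming the same hypothetical `users.tsv` input; the aliases and field names are illustrative only.

```pig
users = LOAD 'users.tsv' AS (name:chararray, age:int, city:chararray);

-- Keep one field as-is and derive two new ones from expressions.
summary = FOREACH users GENERATE
              name,
              UPPER(city) AS city_upper,
              age + 1     AS age_next_year;
```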

  • Pig employs lazy evaluation
  • Computation is triggered only by:
    • DUMP, STORE, ILLUSTRATE (a LOAD by itself is only parsed; its data is read when one of these runs)
  • Pig keeps a DAG of the MapReduce jobs needed to compute relations (and optimizes it!)
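
Lazy evaluation can be observed with a short pipeline like the one below. It is a sketch under the assumption of a hypothetical `users.tsv` file; the aliases are made up.

```pig
-- These statements only extend the plan; no job is launched yet.
users  = LOAD 'users.tsv' AS (name:chararray, age:int, city:chararray);
adults = FILTER users BY age >= 18;
names  = FOREACH adults GENERATE name;

-- Asking for output makes Pig compile and run the whole chain.
DUMP names;
```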

  • FILTER instruction:
    • Generates a new relation by filtering the data of a relation
  • SPLIT instruction:
    • Splits a relation into multiple relations based on conditions
  • GROUP instruction:
    • Creates tuples with the key and a bag of the tuples that share that key value
  • We can also group multiple relations at once; this creates one bag per relation
  • Nested FOREACH:
    • Operate on data in bags inside a relation and then project
  • (inner) JOIN instruction:
    • Our classic database operator for relations!
  • (left) JOIN instruction:
    • Our classic database operator for relations!
  • CROSS instruction:
    • Cartesian product of two or more relations
  • UNION instruction:
    • Merges multiple relations into a single relation
  • DISTINCT instruction:
    • Only preserves unique tuples
  • ORDER BY instruction:
    • Sorts a relation by one or more fields
  • LIMIT instruction:
    • Truncates the relation to a given number of tuples
  • RANK instruction:
    • Adds the position of each tuple in the relation as a new first field
  • We can also sort and rank!
  • SAMPLE instruction:
    • Selects a random sample of the relation!
  • CUBE instruction:
    • Is this really useful? Yes! Many aggregates with just one operation
  • CUBE/ROLLUP instruction:
    • Like the standard CUBE, but null values are introduced from right to left
  • ASSERT instruction:
    • Assert that the whole relation fulfills a condition
    • Useful for debugging
  • STORE instruction:
    • Stores the relation into the local FS or HDFS (usually!)
    • Useful for debugging
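
The relational operators listed above, from FILTER to STORE, can be sketched in a single script that follows the order of the list. This is a minimal illustration rather than a reference: the file names, field names, and aliases are all hypothetical.

```pig
-- Hypothetical inputs; paths and fields are assumptions for illustration.
users  = LOAD 'users.tsv'  AS (name:chararray, age:int, city:chararray);
orders = LOAD 'orders.tsv' AS (name:chararray, amount:double);

-- FILTER: keep only the tuples that satisfy a condition.
adults = FILTER users BY age >= 18;

-- SPLIT: route tuples into several relations based on conditions.
SPLIT users INTO minors IF age < 18, seniors IF age >= 65;

-- GROUP: one tuple per key, holding the key and a bag of matching tuples.
by_city = GROUP users BY city;

-- Grouping several relations (COGROUP): one bag per relation.
by_name = COGROUP users BY name, orders BY name;

-- Nested FOREACH: work on the bag inside each group, then project.
oldest = FOREACH by_city {
    sorted = ORDER users BY age DESC;
    eldest = LIMIT sorted 1;
    GENERATE group AS city, COUNT(users) AS n,
             FLATTEN(eldest.name) AS oldest_name;
};

-- (inner) JOIN: keep only keys present in both relations.
inner_j = JOIN users BY name, orders BY name;

-- (left) JOIN: keep every user; order fields are null when unmatched.
left_j = JOIN users BY name LEFT OUTER, orders BY name;

-- CROSS: Cartesian product (expensive on large inputs).
pairs = CROSS users, orders;

-- UNION: put the tuples of several relations into one relation.
minors_and_seniors = UNION minors, seniors;

-- DISTINCT: remove duplicate tuples.
cities      = FOREACH users GENERATE city;
uniq_cities = DISTINCT cities;

-- ORDER BY and LIMIT: sort, then keep at most 3 tuples.
by_age = ORDER users BY age DESC, name ASC;
top3   = LIMIT by_age 3;

-- RANK: add a position field; RANK ... BY ranks on the given field.
ranked        = RANK users;
ranked_by_age = RANK users BY age DESC;

-- SAMPLE: keep roughly 10% of the tuples, chosen at random.
sampled = SAMPLE users 0.1;

-- CUBE: aggregate over every combination of the listed dimensions.
cubed       = CUBE users BY CUBE(city, age);
cube_counts = FOREACH cubed GENERATE FLATTEN(group), COUNT(cube) AS n;

-- ROLLUP: like CUBE, but nulls are introduced from right to left only.
rolled = CUBE users BY ROLLUP(city, age);

-- ASSERT: make the script fail if a tuple violates the condition.
ASSERT users BY age >= 0, 'age must not be negative';

-- STORE: write a relation out; like DUMP, this triggers execution.
STORE oldest INTO 'oldest_by_city' USING PigStorage('\t');
```

Because of lazy evaluation, only the relations that feed an output statement are computed; in this sketch that means the chain leading to `oldest`, plus the ASSERT check.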

Where to find useful PigLatin scripts?

  • PiggyBank - Pig’s repository of user-contributed functions
    • load/store functions (e.g. from XML)
    • datetime, text, math, and stats functions
  • DataFu - LinkedIn's collection of Pig UDFs
    • statistics functions (quantiles, variance etc.)
    • convenient bag functions (intersection, union etc.)
    • utility functions (assertions, random numbers, MD5, distance between lat/long pair), PageRank

How to develop PigLatin scripts?

  • Eclipse plugins
    • PigEditor
      • syntax/errors highlighting
      • check of alias name existence
      • auto completion of keywords, UDF names
    • PigPen
      • graphical visualization of scripts (box and arrows)
    • Pig-Eclipse
  • Plugins for Vim, Emacs, TextMate
    • Usually provide syntax highlighting and code completion

How to run PigLatin scripts?

  • PigServer Java class, a JDBC-like interface
  • Python and JavaScript programs with embedded PigLatin code
    • adds control flow constructs such as if and for
    • avoids the need to invent a new language
    • uses a JDBC-like compile, bind, run model
