Pig Optimizer Example



Pig Multi-Query Execution

Basic Optimization Rules
- apply filters as early as possible to reduce the amount of data processed
- do not apply a filter if the cost of applying it is high and it filters out only a small amount of data
- remove NULLs before JOIN
- Pig does not (yet) detect that a field is no longer needed and drop it from the record, so project out unneeded fields explicitly
- Pig assumes the type double for numeric computations
- specify the actual type to speed up arithmetic computation (up to a 2x speedup for some queries)
- declaring types also enables early error detection
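
The rules above can be sketched in Pig Latin; the file names and field names here are hypothetical:

```
-- declare real types in the schema instead of letting Pig default to double
users  = LOAD 'users.txt'  USING PigStorage(',') AS (id:int, age:int, name:chararray);
orders = LOAD 'orders.txt' USING PigStorage(',') AS (uid:int, amount:double);

-- apply filters as early as possible: shrink the data before the join
adults = FILTER users BY age >= 18;

-- remove NULLs before JOIN: null keys never match, but would still be shuffled
valid_orders = FILTER orders BY uid IS NOT NULL;

joined = JOIN adults BY id, valid_orders BY uid;
```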

- PARALLEL keyword - sets the number of reducers
- the SET default_parallel command (script level)
- the PARALLEL clause (operator level)
- otherwise, Pig estimates the number of reducers from the size of the input data (assuming the data size does not change)
- by default it allocates one reducer for every 1GB of input data
- (*) some operators force a reduce phase, while others do not
- Example timings for different default_parallel settings:
- SET default_parallel 1 - 3m8.033s
- SET default_parallel 2 - 2m52.972s
- SET default_parallel 6 - 2m42.771s
- SET default_parallel 10 - 2m32.819s
- SET default_parallel 20 - 2m38.023s
- SET default_parallel 50 - 2m48.035s
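
Both levels can be combined in one script; the relation names below are illustrative:

```
-- script level: default number of reducers for all reduce-side operators
SET default_parallel 10;

logs = LOAD 'logs.txt' USING PigStorage(',') AS (user:chararray, bytes:long);

-- operator level: the PARALLEL clause overrides the script-level default
grouped = GROUP logs BY user PARALLEL 20;
```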
- a separate map task is created for each input file
- this may be inefficient if there are many small files
The pig.maxCombinedSplitSize property
- pig.maxCombinedSplitSize – specifies the amount of data to be processed by a single map; smaller files are combined until this size is reached
- default - 2m32.819s
- pig.maxCombinedSplitSize 64MB - 2m42.977s
- pig.maxCombinedSplitSize 128MB - 2m38.076s
- pig.maxCombinedSplitSize 256MB - 3m8.913s
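
The property takes a value in bytes and can be set from the script; for example, to combine small files into roughly 128 MB splits:

```
-- 134217728 bytes = 128 MB per map task
SET pig.maxCombinedSplitSize 134217728;
```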
- Compress the results of intermediate jobs
- Tune MapReduce and Pig parameters
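
A sketch of how intermediate-job compression can be enabled through Pig properties (the codec choice depends on what is installed on the cluster):

```
-- compress the output of intermediate MapReduce jobs
SET pig.tmpfilecompression true;
SET pig.tmpfilecompression.codec gz;
```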