Set the Number of Reducers in Pig:

SET default_parallel XXX;
  • Here XXX is the number of reducers.
  • This command sets the number of reducers at the script level.
  • Place this statement at the top of the Pig script so that it applies to every job the script generates.
  • Alternatively, use the PARALLEL clause to set the number of reducers at the operator level.
  • A value set with the PARALLEL clause overrides any value specified through SET default_parallel. You can include the PARALLEL clause with any operator that starts a reduce phase:
    • COGROUP
    • CROSS
    • DISTINCT
    • GROUP
    • JOIN (inner)
    • JOIN (outer)
    • ORDER BY

For Example:

Here the PARALLEL clause is used with the GROUP operator.

  • A = LOAD 'myfile' AS (t, u, v);
  • B = GROUP A BY t PARALLEL 18;

Here 18 is the number of reducers.
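The two settings can also be combined in one script. The following is a minimal sketch rather than a tuned configuration: the values 10 and 18, the extra ORDER BY step, and the output paths 'grouped_out' and 'sorted_out' are illustrative assumptions, not part of the original example.

```pig
-- Script-level default: reduce-phase operators use 10 reducers
-- unless they carry their own PARALLEL clause.
SET default_parallel 10;

A = LOAD 'myfile' AS (t, u, v);

-- Operator-level override: this GROUP runs with 18 reducers.
B = GROUP A BY t PARALLEL 18;

-- No PARALLEL clause here, so the script-level default of 10 applies.
C = ORDER A BY u;

-- Hypothetical output paths, used only for illustration.
STORE B INTO 'grouped_out';
STORE C INTO 'sorted_out';
```

With this layout, only the operators that need a different degree of parallelism carry a PARALLEL clause, and everything else follows the single value set at the top of the script.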

  • If neither SET default_parallel nor the PARALLEL clause is used, Pig sets the number of reducers based on the size of the input data.

The following properties control this estimate:

  • pig.exec.reducers.bytes.per.reducer – Defines the number of input bytes per reducer; the default value is 1000*1000*1000 (1GB).
  • pig.exec.reducers.max – Defines the upper bound on the number of reducers; the default value is 999.
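As a rough sketch of how these two properties interact, the heuristic described in the Pig documentation can be written as the formula below. The exact rounding is an assumption here and may differ between Pig versions.

```latex
\[
\text{reducers} = \min\!\left(
  \texttt{pig.exec.reducers.max},\;
  \frac{\text{total input size in bytes}}{\texttt{pig.exec.reducers.bytes.per.reducer}}
\right)
\]

% Worked examples with the default values (max = 999, bytes per reducer = 10^9):
\[
\text{50 GB input: } \min\!\left(999,\ \frac{50 \times 10^{9}}{10^{9}}\right) = \min(999,\ 50) = 50
\]
\[
\text{2 TB input: } \min\!\left(999,\ \frac{2000 \times 10^{9}}{10^{9}}\right) = \min(999,\ 2000) = 999
\]
```

So with the defaults, a 50 GB input is estimated at about 50 reducers, while a 2 TB input is capped at 999 by pig.exec.reducers.max.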
