Set the Number of Reducers in Pig:

SET default_parallel XXX;
  • Where XXX is the number of reducers.
  • This command sets the number of reducers at the script level.
  • Write this setting at the top (beginning) of the Pig script.
  • Alternatively, use the PARALLEL clause to set the number of reducers at the operator level.
  • A value set with the PARALLEL clause overrides any value specified through SET default_parallel. You can include the PARALLEL clause with any operator that starts a reduce phase:
    • COGROUP
    • CROSS
    • DISTINCT
    • GROUP
    • JOIN (inner)
    • JOIN (outer)
    • ORDER BY
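The following is a minimal sketch of how the two levels interact in one script. The input paths ('users', 'clicks', 'click_counts'), the field names, and the reducer counts 10 and 20 are hypothetical values chosen only for illustration.

```pig
-- Script-level default: every reduce phase in this script uses 10 reducers
-- unless an operator overrides it (10 is an illustrative value).
SET default_parallel 10;

-- 'users' and 'clicks' are hypothetical input paths.
users  = LOAD 'users'  AS (id:int, name:chararray);
clicks = LOAD 'clicks' AS (id:int, url:chararray);

-- Operator-level override: this join's reduce phase uses 20 reducers.
joined = JOIN users BY id, clicks BY id PARALLEL 20;

-- No PARALLEL clause here, so the script-level default (10) applies.
grouped = GROUP joined BY users::name;

-- Count the joined rows per user name.
counts = FOREACH grouped GENERATE group AS name, COUNT(joined) AS n;
STORE counts INTO 'click_counts';
```

Only the JOIN carries its own PARALLEL clause; the GROUP falls back to the value from SET default_parallel.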

For example:

Here the PARALLEL clause is used with the GROUP operator:

    A = LOAD 'myfile' AS (t, u, v);
    B = GROUP A BY t PARALLEL 18;

Here, 18 is the number of reducers.

  • If neither SET default_parallel nor the PARALLEL clause is used, Pig sets the number of reducers based on the size of the input data.

The following properties control that estimate:

  • pig.exec.reducers.bytes.per.reducer – Defines the number of input bytes per reducer; the default is 1000*1000*1000 bytes (1 GB).
  • pig.exec.reducers.max – Defines the upper bound on the number of reducers; the default is 999.
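As a rough sketch, assuming Pig's default input-size-based estimate, the two properties combine as follows. Here S is the total input size in bytes, B is pig.exec.reducers.bytes.per.reducer, and R_max is pig.exec.reducers.max (these symbols are introduced only for this illustration):

```latex
R = \min\left(R_{\max},\ \max\left(1,\ \left\lceil \frac{S}{B} \right\rceil\right)\right)

% Defaults: B = 10^9 bytes, R_max = 999.
% 25 GB input:  \lceil 25 \times 10^9 / 10^9 \rceil = 25,   so R = \min(999, 25) = 25
% 2 TB input:   \lceil 2 \times 10^{12} / 10^9 \rceil = 2000, so R = \min(999, 2000) = 999
```

With the defaults, each additional 1 GB of input adds one reducer until the 999 cap is reached.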

