What is the default number of reducers when executing a Pig query?
Set the number of reducers in Pig:
- SET default_parallel XXX;
- Where XXX is the number of reducers.
- This command sets the number of reducers at the script level.
- Write this setting at the top of the Pig script.
- Alternatively, use the PARALLEL clause to set the number of reducers at the operator level.
- A value set with the PARALLEL clause overrides any value specified through SET default_parallel. You can include the PARALLEL clause with any operator that starts a reduce phase:
- COGROUP
- CROSS
- DISTINCT
- GROUP
- JOIN (inner)
- JOIN (outer)
- ORDER BY
For example:
In the GROUP operator below, the PARALLEL clause is used.
- A = LOAD 'myfile' AS (t, u, v);
- B = GROUP A BY t PARALLEL 18;
Here 18 is the number of reducers.
- If neither SET default_parallel nor the PARALLEL clause is used, Pig sets the number of reducers based on the size of the input data.
The following properties control this estimate:
- pig.exec.reducers.bytes.per.reducer: defines the number of input bytes per reducer; the default value is 1000*1000*1000 (1GB).
- pig.exec.reducers.max: defines the upper bound on the number of reducers; the default is 999.
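To make the arithmetic concrete, here is a rough Python sketch of how the reducer count could be worked out from the rules above. It is not Pig's own code: the function names are hypothetical, and the rounding up and the minimum of one reducer are assumptions. The Pig documentation describes the estimate as MIN(pig.exec.reducers.max, total input size in bytes / bytes per reducer).

```python
# Defaults quoted in the text above.
BYTES_PER_REDUCER = 1000 * 1000 * 1000  # pig.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                      # pig.exec.reducers.max


def estimate_reducers(input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: approximate the size-based reducer estimate."""
    # Assumption: round the division up and never go below one reducer.
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    wanted = max(1, wanted)
    # Cap the result at the configured upper bound.
    return min(max_reducers, wanted)


def effective_reducers(input_bytes, parallel_clause=None, default_parallel=None):
    """Hypothetical helper: apply the precedence described above."""
    if parallel_clause is not None:    # operator-level PARALLEL wins
        return parallel_clause
    if default_parallel is not None:   # then script-level SET default_parallel
        return default_parallel
    return estimate_reducers(input_bytes)  # otherwise estimate from input size


if __name__ == "__main__":
    print(estimate_reducers(2_500_000_000))       # 2.5 GB of input
    print(estimate_reducers(5 * 10**12))          # 5 TB of input, hits the cap
    print(effective_reducers(2_500_000_000, parallel_clause=18, default_parallel=10))
```

With the default properties, 2.5 GB of input works out to 3 reducers under these assumptions, while 5 TB is capped at 999. Lowering pig.exec.reducers.bytes.per.reducer raises the estimate for the same input.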