[Solved-1 Solution] Records proactively spilled in Hadoop Pig?



Problem:

We are new to Hadoop and were curious about the command-line messages from our Pig script.

Total records written : 7676
Total bytes written : 341396
Spillable Memory Manager spill count : 103
Total bags proactively spilled: 39
Total records proactively spilled: 32389322

The end result is reported as a "Success!", but we are still not sure what the numbers above mean.

Solution 1:

The first two counters show the total number of records and bytes written to HDFS by the MR job.

It can happen that during an MR job not all records fit into memory. The spill counters indicate how many records had to be written to the local disks of your datanodes to avoid running out of memory.

Pig uses two methods to control memory usage and to spill to disk when necessary:

Spillable Memory Manager:

  • This is a central place where all spillable bags are registered. When the JVM reports low memory, the manager walks the list of registered bags, spills their contents to disk, and triggers a garbage collection so the freed heap can actually be reclaimed. Its thresholds can be tuned, as sketched below.
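If the Spillable Memory Manager fires often, its behavior can be tuned through properties in pig.properties (or passed with -D on the pig command line). The snippet below is only an illustrative sketch; the property names come from Apache Pig's pig.properties, but verify the semantics and defaults for your Pig version:

  # Bags smaller than this size (in bytes) are skipped by the memory manager
  pig.spill.size.threshold=5000000
  # Once this many bytes have been spilled, Pig invokes a garbage collection
  pig.spill.gc.activation.size=40000000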

Proactive (self) spilling:

  • Bags can also spill themselves when their own memory limit is reached (see pig.cachedbag.memusage; a tuning sketch follows this list)
  • Going back to the statistics, that gives us:
    • Total bags proactively spilled: the number of bags that have been spilled
    • Total records proactively spilled: the number of records in those bags
  • It is always good to check the spill statistics of a job, since heavy spilling can indicate a serious performance hit that should be avoided.
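
As a minimal sketch of tuning proactive spilling, you can change the fraction of the heap that cached bags may use before they spill themselves. The default for pig.cachedbag.memusage is 0.2 (20% of the heap); raising it gives bags more room before they spill, while lowering it makes them spill earlier to protect the heap. The value 0.3 below is purely illustrative:

  -- Pig script (or Grunt): let cached bags use up to 30% of the heap
  -- before they proactively spill to disk (default is 0.2)
  set pig.cachedbag.memusage 0.3;

The same property can also be set for a single run without editing the script, e.g. pig -Dpig.cachedbag.memusage=0.3 myscript.pig (the script name here is hypothetical); the "proactively spilled" counters in the job statistics then show whether the change had any effect.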
