[Solved-2 Solutions] Renaming part files of Pig output?



What is MapReduce?

  • Hadoop MapReduce is a software framework for writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
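As a rough local analogy (not Hadoop itself), the map, shuffle/sort and reduce phases of a classic word count can be mimicked with a Unix pipeline. The function name word_count is hypothetical and only illustrates the data flow on a single machine.

```shell
# Local analogy of the MapReduce flow for a word count (not Hadoop itself).
# word_count is a hypothetical helper for illustration only.
word_count() {
  tr -s '[:space:]' '\n' |   # map: emit one word (key) per line
    grep -v '^$' |           # drop empty tokens left by leading whitespace
    sort |                   # shuffle/sort: make identical keys adjacent
    uniq -c |                # reduce: count each run of identical keys
    awk '{print $2, $1}'     # print as "word count"
}

printf 'a b a\nc a b\n' | word_count
```

On a real cluster, Hadoop runs the map and reduce steps as many parallel tasks and performs the shuffle across the network; the pipeline only shows the order of the phases.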

Problem:

We need to change the naming convention of the part files produced by our Pig job.

We want part-r-0000 to be userdefinedName-r-0000.

Is there a way to do this without running hadoop fs -cp and hadoop fs -mv afterwards?

Solution 1:

  • These files are created by the MapReduce jobs that Pig generates, so their names are controlled through the MapReduce configuration. The relevant property is mapreduce.output.basename, which sets the prefix that replaces the default "part".
  • You can set any Hadoop property directly in your Pig script:
SET mapreduce.output.basename 'custom-name';
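To see what the property changes, here is a small sketch of how the output file names are assembled, assuming Hadoop's FileOutputFormat convention of basename, task type (m for map output, r for reduce output) and the partition number padded to five digits. Only the prefix changes; the -r-NNNNN suffix stays. The helper expected_part_name is hypothetical, and any compression extension (such as .gz) is left out.

```shell
# Hypothetical helper that mirrors the <basename>-<m|r>-<5-digit partition>
# naming used for Hadoop output files (compression extensions omitted).
#   $1: value of mapreduce.output.basename (Hadoop default: part)
#   $2: task type, m for map-only output or r for reduce output
#   $3: partition number of the task that wrote the file
expected_part_name() {
  printf '%s-%s-%05d\n' "$1" "$2" "$3"
}

expected_part_name part r 0          # default naming
expected_part_name custom-name r 0   # after SET mapreduce.output.basename 'custom-name';
```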

Solution 2:

  • Alternatively, pass the property on the command line when starting Pig:
 pig -Dmapreduce.job.queuename=my-queue -Dmapreduce.output.basename=my-outputfilename
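If you run the same script regularly, a small wrapper can keep the -D property ahead of the script path. This is a sketch that assumes the pig launcher is on PATH; the function name run_pig_with_basename, the script name myscript.pig and the PIG override variable are all hypothetical, introduced here so the command can be dry-run with echo.

```shell
# Hypothetical wrapper: run a Pig script with a custom output basename.
# PIG is an override introduced for this sketch (not a Pig feature);
# it defaults to the pig launcher found on PATH.
run_pig_with_basename() {
  out_base=$1   # desired prefix for the part files
  script=$2     # path to the Pig script to run
  # Java -D properties are passed before the script path.
  "${PIG:-pig}" -Dmapreduce.output.basename="$out_base" "$script"
}

# Dry run: print the command line instead of launching Pig.
PIG=echo run_pig_with_basename my-outputfilename myscript.pig
```

Other properties, such as the queue name in the command above, can be added as further -D arguments in the same position.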
