[Solved-2 Solutions] How do you deal with empty or missing input files in Apache Pig?



Problem:

How do you deal with empty or missing input files in Apache Pig?

Solution 1:

  • To deal with the 0-byte problem, we've found that we can detect the empty file and replace it with a file containing a single newline. The blank line has none of the fields the script expects, so Pig logs a warning like:
Encountered Warning ACCESSING_NON_EXISTENT_FIELD 13 time(s).

Alternatively, we could write a line with the appropriate number of '\t' characters for that file. That would avoid the warning, but it would insert a garbage record into the data that we would then have to filter out.
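
The single-newline workaround could be sketched in shell roughly as follows. This is only an illustration, assuming the input is a file on the local filesystem; the function name ensure_nonempty is hypothetical and not part of Pig. For inputs on HDFS, the same check would have to go through the Hadoop file system commands instead (for example, hdfs dfs -test -z reports whether a file has zero length).

```shell
#!/bin/sh
# Hypothetical helper: if the given local file is missing or 0 bytes,
# overwrite it with a single newline so the Pig load has something to read.
ensure_nonempty() {
    target_file="$1"
    # [ -s FILE ] is true only when FILE exists and is larger than 0 bytes.
    if [ ! -s "$target_file" ]; then
        printf '\n' > "$target_file"
    fi
}
```

Calling ensure_nonempty on each input path before launching Pig leaves files that already contain data untouched and only rewrites the empty ones.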

Solution 2:

  • The approach we have been using is to run Pig scripts from a shell script. We have one job that reads data from six different input directories.
  • The shell script checks whether each input exists and assembles a final Pig script from the fragments.
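
A minimal sketch of that wrapper idea is shown below. It assumes local paths and a layout that is purely illustrative: one fragment file named NAME.pig per input directory NAME. The function name assemble_pig_script and this naming scheme are assumptions, not something the original poster described.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: assemble_pig_script OUT_FILE FRAGMENT_DIR INPUT_ROOT NAME...
# Appends FRAGMENT_DIR/NAME.pig to OUT_FILE for every NAME whose input
# INPUT_ROOT/NAME exists; fragments for missing inputs are skipped.
assemble_pig_script() {
    out_file="$1"
    fragment_dir="$2"
    input_root="$3"
    shift 3
    : > "$out_file"    # start from an empty output script
    for input_name in "$@"; do
        if [ -e "$input_root/$input_name" ]; then
            cat "$fragment_dir/$input_name.pig" >> "$out_file" || return 1
        fi
    done
}
```

The assembled file can then be passed to pig as the script to run. Anything that combines the inputs (a UNION or JOIN, for instance) would need its own handling, since it must refer only to the relations whose fragments were actually included.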
