[Solved - 2 Solutions] Pig non-aggregated warnings output location?



Problem:

Pig: 0.8.1-cdh3u2
Hadoop: 0.20.2-cdh3u0
  • Debugging FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings, but it's difficult to get the individual warnings printed anywhere. Disabling aggregation via the -w switch or aggregate.warnings=false removes the summary messages, but the individual warnings are not printed either.
  • Nothing is written to Pig's log for this run, and there is no obvious place to find logs containing the individual warnings. Where are they written?

Solution 1:

  • Hadoop job logs are recorded locally on each compute node. We therefore first need to set up the Hadoop cluster manager to collect the log files onto the distributed file system so that we can analyze them.
  • If we use Hadoop On Demand (HOD), we need to specify something like the following:
log-destination-uri = hdfs://host123:45678/user/hod/logs

After we have the logs on HDFS, we can run a simple Pig query to find the offending conversion. Something like the following should do the trick:

-- Split each log line on ']' so that $1 is the second field
a1 = LOAD '*.log' USING PigStorage(']');
a2 = FILTER a1 BY ($1 MATCHES ' WARN.*Unable to interpret value.*');
DUMP a2;

Solution 2:

  • It is hard to find exactly which value is causing the issue, but we can at least find which column is responsible. Once we know the column, we can use a Dynamic Invoker to do the type conversion explicitly.

How to use a Dynamic Invoker:

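DEFINE ConvertToDouble InvokeForDouble('java.lang.Double.parseDouble', 'String');
-- Call it inside a FOREACH; A and column_name are placeholders for your relation and field
B = FOREACH A GENERATE ConvertToDouble(column_name);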
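
The invoker above hands each value to java.lang.Double.parseDouble, so it can help to check a sample of the suspect column against that same method outside of Pig and see which raw values it rejects. The following is only a minimal sketch of that idea: the class name ConversionCheck, its two helper methods, and the sample values are all hypothetical and are not part of Pig or of the original solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig.
class ConversionCheck {

    // Returns true when Double.parseDouble accepts the value.
    static boolean isParsableDouble(String value) {
        if (value == null) {
            return false; // parseDouble(null) throws NullPointerException
        }
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException e) {
            return false; // not a valid double literal
        }
    }

    // Collects the values that cannot be converted, in input order.
    static List<String> findBadValues(List<String> values) {
        List<String> bad = new ArrayList<>();
        for (String v : values) {
            if (!isParsableDouble(v)) {
                bad.add(v);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for one column of the input data.
        List<String> sample = Arrays.asList("1.5", "42", "N/A", "", "3,14");
        System.out.println("Values that fail conversion: " + findBadValues(sample));
    }
}
```

Running this over a small extract of the column that triggers the warning should show the kind of values (empty strings, placeholders, locale-specific decimal separators) that need cleaning before the conversion.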
