[Solved-2 Solutions] Manipulating a data structure in Pig/Hive ?



Problem:

We have a data structure like this:

Item Year Jan Feb Mar
A 2000 3.2 4.1 2.8
B 2000 4.3 5.0 3.9
A 2001 7.2 6.8 9.2

We need the data look like this:

Item Year Month Value
A 2000 Jan 3.2
A 2000 Feb 4.1
A 2000 Mar 2.8
B 2000 Jan 4.3
B 2000 Feb 5.0
B 2000 Mar 3.9
A 2001 Jan 7.2
A 2001 Feb 6.8
A 2001 Mar 9.2

Here we would like to be able to do this in Pig, but knowing how to do it in Hive, R, Python, or Excel/LibreCalc ?

Solution 1:

You can try this:

CREATE TABLE myDatabase.newTable STORED AS TEXTFILE AS 
SELECT item, year, 'jan' AS Month, jan AS Value FROM myDatabase.myTable UNION ALL
SELECT item, year, 'feb' AS Month, feb AS Value FROM myDatabase.myTable UNION ALL
SELECT item, year, 'mar' AS Month, mar AS Value FROM myDatabase.myTable;

Solution 2:

This solution will work in Hive. but it is pretty similar to SQL.

select item, year,
'Jan' as Month,
Jan as value
from yourtable
UNION
select item, year,
'Feb' as Month,
Feb as value
from yourtable
UNION
select item, year,
'Mar' as Month,
Mar as value
from yourtable  

Related Searches to Manipulating a data structure in Pig/Hive