[Solved-1 Solution] Apache Pig - Get only the date from a timestamp?



What is a timestamp?

  • A timestamp is the time of an event as recorded by a computer. Through mechanisms such as the Network Time Protocol (NTP), a computer keeps its clock accurate to within small fractions of a second.

Apache Pig provides the following Date and Time functions:

S.N.  Function                                    Description
1     ToDate(milliseconds)                        Returns a date-time object according to the given parameters. Alternative forms: ToDate(isostring), ToDate(userstring, format), ToDate(userstring, format, timezone).
2     CurrentTime()                               Returns the date-time object of the current time.
3     GetDay(datetime)                            Returns the day of a month from the date-time object.
4     GetHour(datetime)                           Returns the hour of a day from the date-time object.
5     GetMilliSecond(datetime)                    Returns the millisecond of a second from the date-time object.
6     GetMinute(datetime)                         Returns the minute of an hour from the date-time object.
7     GetMonth(datetime)                          Returns the month of a year from the date-time object.
8     GetSecond(datetime)                         Returns the second of a minute from the date-time object.
9     GetWeek(datetime)                           Returns the week of a year from the date-time object.
10    GetWeekYear(datetime)                       Returns the week year from the date-time object.
11    GetYear(datetime)                           Returns the year from the date-time object.
12    AddDuration(datetime, duration)             Adds the duration object to the date-time object and returns the result.
13    SubtractDuration(datetime, duration)        Subtracts the duration object from the date-time object and returns the result.
14    DaysBetween(datetime1, datetime2)           Returns the number of days between the two date-time objects.
15    HoursBetween(datetime1, datetime2)          Returns the number of hours between the two date-time objects.
16    MilliSecondsBetween(datetime1, datetime2)   Returns the number of milliseconds between the two date-time objects.
17    MinutesBetween(datetime1, datetime2)        Returns the number of minutes between the two date-time objects.
18    MonthsBetween(datetime1, datetime2)         Returns the number of months between the two date-time objects.
19    SecondsBetween(datetime1, datetime2)        Returns the number of seconds between the two date-time objects.
20    WeeksBetween(datetime1, datetime2)          Returns the number of weeks between the two date-time objects.
21    YearsBetween(datetime1, datetime2)          Returns the number of years between the two date-time objects.
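
As a rough illustration of how a few of these functions combine, here is a minimal sketch. The file name events.txt and the field name ts_ms are hypothetical, and the input is assumed to hold one epoch-millisecond value per line. ToDate interprets the value in the default time zone of the Pig session, so the extracted day can differ between clusters.

```pig
-- Hypothetical input: one epoch-millisecond value per line
events = LOAD 'events.txt' AS (ts_ms:long);

parts = FOREACH events GENERATE
            ToDate(ts_ms)                              AS dt,        -- datetime object
            GetYear(ToDate(ts_ms))                     AS yr,        -- int
            GetMonth(ToDate(ts_ms))                    AS mon,       -- int, 1 to 12
            GetDay(ToDate(ts_ms))                      AS dom,       -- int, day of the month
            DaysBetween(CurrentTime(), ToDate(ts_ms))  AS age_days;  -- long, whole days elapsed since the event

DUMP parts;
```

All of the Get* and *Between functions take datetime arguments, which is why the long value is passed through ToDate first.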

Problem:

We have the following code

Source_Data = load '/user/cloudera/' using PigStorage('\t') 
as
(   ID:chararray, 
    Time_Interval:chararray, 
    Code:chararray); 

transf = foreach Source_Data generate  (int) ID, 
                                   ToString( ToDate((long) Time_Interval), 'yyyy-MM-dd HH:mm:ss') as TimeStamp,
                        (int) Code; 

SPLIT transf INTO       Src25 IF (ToString(TimeStamp, 'yyyy-MM-dd')=='2016-07-25'),
                        Src26 IF (ToString(TimeStamp, 'yyyy-MM-dd')=='2016-07-26');


STORE Src25 INTO '/user/cloudera/2016-07-25' using PigStorage('\t');
STORE Src26 INTO '/user/cloudera/2016-07-26' using PigStorage('\t');

We want to split the data by date, but the conditions we put in the SPLIT statement raise an error.

How can we convert TimeStamp (created in the transf statement) into a date so that the comparison works?

Solution 1:

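  • ToDate returns a datetime object. Call GetYear(), GetMonth() and GetDay() on that object and use CONCAT to build a string that holds only the date.
  • These functions, like ToString, expect a datetime and not a chararray. In the problem code TimeStamp has already been turned into a chararray by ToString, so calling ToString on it again in the SPLIT conditions fails. Keep the datetime object until the date has been extracted.
  • GetYear(), GetMonth() and GetDay() return integers, so they must be cast to chararray before CONCAT, and the result is not zero-padded (July 25, 2016 becomes '2016-7-25').

The code below shows how to use the ToDate function this way.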

transf = foreach Source_Data generate  
                   (int) ID, 
                   ToDate((long) Time_Interval) as TimeStamp, -- keep the datetime object; GetYear() etc. need it
                   (int) Code;

transf_new = foreach transf generate
                     ID,
                     ToString(TimeStamp, 'yyyy-MM-dd HH:mm:ss') as TimeStamp, -- formatted string for output
                     CONCAT(CONCAT(CONCAT(CONCAT((chararray) GetYear(TimeStamp), '-'), (chararray) GetMonth(TimeStamp)), '-'), (chararray) GetDay(TimeStamp)) AS Day, -- not zero-padded, e.g. '2016-7-25'
                     Code;

-- Now use the new Day column to split the data
SPLIT transf_new INTO       Src25 IF (Day == '2016-7-25'),
                            Src26 IF (Day == '2016-7-26');
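
Because GetMonth() and GetDay() return integers, a date built with CONCAT is not zero-padded. A possibly simpler alternative, sketched here under the assumption that Source_Data is loaded as in the problem statement and that Time_Interval holds epoch milliseconds, is to call ToString with a date-only pattern directly on the datetime object. The alias with_day is made up for this example.

```pig
-- Assumes Source_Data was loaded as in the problem statement
transf = FOREACH Source_Data GENERATE
             (int) ID                      AS ID,
             ToDate((long) Time_Interval)  AS TimeStamp,   -- datetime object, not yet a string
             (int) Code                    AS Code;

with_day = FOREACH transf GENERATE
               ID,
               ToString(TimeStamp, 'yyyy-MM-dd HH:mm:ss')  AS TimeStamp,  -- full timestamp as text
               ToString(TimeStamp, 'yyyy-MM-dd')           AS Day,        -- zero-padded date only
               Code;

-- Day is a chararray, so it compares directly against a date literal
SPLIT with_day INTO Src25 IF (Day == '2016-07-25'),
                    Src26 IF (Day == '2016-07-26');
```

With this variant the SPLIT conditions can keep the '2016-07-25' style literals from the original question, and the STORE statements from the problem code can follow unchanged.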
