Fundamentals of Pig: Working with Tuples

In the previous post we uploaded the Windows Event log to the Hadoop environment and started analyzing it with Pig. In this post we will see how to work with tuples.

Filtering Data:

The script below applies no filter, so it returns every tuple in the file, including those with missing values.

Events = LOAD 'MyAppEvents.csv' USING PigStorage(',') AS (Level, DateTime, Source, EventID, TaskCategory, TaskDescription);

Describe Events;

Result = FOREACH Events GENERATE Level,EventID, TaskDescription;

Dump Result;

One such example is highlighted in the picture below.

Tuples can be filtered with Pig's FILTER operator, which keeps only the tuples that satisfy a condition.

Events = LOAD 'MyAppEvents.csv' USING PigStorage(',') AS (Level, DateTime, Source, EventID, TaskCategory, TaskDescription);

Describe Events;

Result = FILTER Events BY EventID IS NOT NULL;

Dump Result;

In the code snippet above, FILTER keeps only the tuples whose EventID is not null and drops the rest, as you can see in the results.
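A FILTER condition can also combine several tests with AND, OR and NOT. The sketch below is only an illustration: it reuses the field names from the script above, but the declared field types and the 'Error' value for Level are assumptions about the exported file, not something taken from the original data.

```pig
-- Load the same file, this time declaring field types
-- (the types are an assumption; the script above leaves the fields untyped).
Events = LOAD 'MyAppEvents.csv' USING PigStorage(',')
         AS (Level:chararray, DateTime:chararray, Source:chararray,
             EventID:int, TaskCategory:chararray, TaskDescription:chararray);

-- Keep only tuples that have an EventID and whose Level is 'Error'.
-- 'Error' is an assumed value; check what your export actually contains.
Errors = FILTER Events BY (EventID IS NOT NULL) AND (Level == 'Error');

-- Project the same three fields as before.
Result = FOREACH Errors GENERATE Level, EventID, TaskDescription;

DUMP Result;
```

Replacing IS NOT NULL with IS NULL in the condition returns the complementary set, which is a quick way to inspect the tuples that the filter would otherwise drop.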

 

More to come..
