Fundamentals of Pig

Introduction:

Pig is a high-level scripting platform for analyzing large data sets. It is an abstraction built on top of Hadoop. It consists of a domain-specific dataflow language, Pig Latin, and a translation engine that converts Pig Latin scripts into MapReduce jobs. Pig Latin uses familiar keywords such as JOIN, GROUP and FILTER. Pig entered the Apache Incubator in 2007 and later became a Hadoop subproject.
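To get a feel for what a dataflow built from keywords like FILTER and GROUP does, here is a minimal sketch in plain Python. It is only an analogy and not Pig Latin: the sample records and the helper names filter_rows and group_rows are made up for illustration, and everything runs in memory on one machine, whereas Pig would translate the equivalent steps into MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical input records: (user, page, seconds_on_page).
visits = [
    ("alice", "home", 12),
    ("bob", "home", 3),
    ("alice", "docs", 40),
    ("carol", "docs", 7),
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true (like FILTER)."""
    return [row for row in rows if predicate(row)]


def group_rows(rows, key):
    """Collect rows into lists that share the same key (like GROUP)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


# Step 1: drop visits shorter than 5 seconds.
long_visits = filter_rows(visits, lambda row: row[2] >= 5)

# Step 2: group the remaining visits by page.
by_page = group_rows(long_visits, key=lambda row: row[1])

# Step 3: count the visits in each group.
counts = {page: len(rows) for page, rows in by_page.items()}

print(counts)  # {'home': 1, 'docs': 2}
```

Each step takes the output of the previous one as its input, which is the idea behind a dataflow language: you describe a chain of transformations and let the engine decide how to execute them.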

What do I need to work with Pig:

You need a Windows or Linux environment with Hadoop and Java 1.6 or above. The easiest way to get started is with the Cloudera or Hortonworks distribution of Hadoop.

Running Pig:

You can run Pig commands or statements in either local mode or MapReduce mode. In local mode, all files are installed and run on the local host and file system. In MapReduce mode, you need access to a Hadoop cluster and an HDFS installation. MapReduce is the default mode of execution.
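The execution mode is chosen with the -x option of the pig launcher (pig -x local or pig -x mapreduce). As a rough sketch, the small Python wrapper below builds and runs that command line. The function names and the script name wordcount.pig are hypothetical, and it assumes the pig launcher is on your PATH.

```python
import shutil
import subprocess

VALID_MODES = ("local", "mapreduce")


def build_pig_command(script_path, mode="mapreduce"):
    """Return the argument list for running a Pig script in the given mode.

    MapReduce is the default, matching Pig's own default mode.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return ["pig", "-x", mode, script_path]


def run_pig(script_path, mode="mapreduce"):
    """Run the script with the pig launcher; fail early if it is missing."""
    if shutil.which("pig") is None:
        raise RuntimeError("pig launcher not found on PATH")
    return subprocess.run(build_pig_command(script_path, mode), check=True)


# Example: the command for a quick test against the local file system.
print(build_pig_command("wordcount.pig", mode="local"))
```

Local mode is handy for trying a script on a small sample file before running it against the cluster.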

Big Picture in a simple way:


Structure of the Pig Latin Script:

