Download Data and Create a Solution on Paper

Week 6 Assignment 1 – Download Data and Create a Solution on Paper

In this assignment, you will complete the steps required before running queries on a cluster. You will save time and money by first understanding the data, the relationships within it, and the information that must be generated from it.

  1. Download the zip file containing the data used for this assignment
  2. Read the “readme59.txt” file to understand the data and the relationships within it
  3. Draft a document outlining your solution to the following questions. The document must contain the Hive Query Language (HQL) commands, the names of the data file(s) used in each query, and the results you expect each query to return. NOTE: HQL commands look similar to SQL queries.

    1. What is the total number of baseball players?
    1. How many players were born before 1960? 
    1. How many players were born after 1960?
    1. How many players were born in the USA? 
    1. How many players were born outside the USA?
    1. Count the players by year of birth, then list only the top 10 years
    1. List the number of players by month of birth
    1. Provide a list of players with the following information:
      1. Player name, city, state, country, and total salary for all years combined
    1. Provide a list of players with each player's age as of today
    1. Provide a list of players that were inducted into the Hall of Fame
    1. Provide a list of the 10 highest-paid players
    1. Provide a list of all players for a given team and year. For example, list the players who played for the Chicago Cubs in 2000.
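
One way to check a drafted query and its expected result before touching the cluster is to dry-run it against a few hand-written rows. The sketch below does this with Python's built-in sqlite3 module. The table name `players`, the column names, and the sample rows are all hypothetical; the real file and column names are described in readme59.txt. SQLite is not Hive, but these simple statements (COUNT, WHERE, GROUP BY, ORDER BY) use syntax that both accept.

```python
import sqlite3

# Hypothetical sample rows standing in for the player data file.
# Real column names come from readme59.txt and may differ.
rows = [
    ("p01", 1955, 4, "USA"),
    ("p02", 1962, 4, "USA"),
    ("p03", 1971, 11, "Dominican Republic"),
    ("p04", 1959, 7, "Cuba"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    "player_id TEXT, birth_year INTEGER, "
    "birth_month INTEGER, birth_country TEXT)"
)
conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Drafted queries, written the way they would appear in the paper solution.
total = scalar("SELECT COUNT(*) FROM players")
before_1960 = scalar("SELECT COUNT(*) FROM players WHERE birth_year < 1960")
born_usa = scalar("SELECT COUNT(*) FROM players WHERE birth_country = 'USA'")
by_month = conn.execute(
    "SELECT birth_month, COUNT(*) AS n FROM players "
    "GROUP BY birth_month ORDER BY birth_month"
).fetchall()

print(total, before_1960, born_usa)  # 4 2 2
print(by_month)                      # [(4, 2), (7, 1), (11, 1)]
```

Because the sample rows are small enough to count by hand, the expected results can be written down first and compared with the output, which mirrors the "expected results" part of the draft document.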

Requirements for the assignment:

  • The assignment file must have a .doc or .docx extension; screenshots should be in .jpg, .gif, or .pdf format
  • Points for this assignment = 10