Python Question

When you finish this class, we hope that you will be able to write and understand Python code to solve unique data analytics tasks on your own.

This is something we feel an employer would expect if this class were on your resume.

You build this skill by teaching yourself through reading and completing the zyBook activities, asking questions, and trying out code.

For each assignment, you are expected to:

  • Go through the program development cycle.
  • Understand the problem task thoroughly. UNDERSTAND
  • Plan your code by producing an algorithm showing all of the steps. ANALYZE
  • Record this plan, for example, as pseudocode comments in the code.
  • Write your code. APPLY
  • Test your code thoroughly. EVALUATE – FINISH CREATION
  • Double check the assignment requirements.
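As an illustration of the planning step, here is one possible way to write the plan as pseudocode comments first and then fill in code under each step. The task (averaging a list of quiz scores) and the scores are made up for this sketch.

```python
# Hypothetical task: report the average of a list of quiz scores.

# UNDERSTAND: the input is a list of numbers; the output is their average.
quiz_scores = [88, 92, 79, 95]

# ANALYZE: plan the algorithm as pseudocode comments before writing code:
#   add up all the scores
#   count how many scores there are
#   divide the total by the count

# APPLY: write the code under each planned step.
total = sum(quiz_scores)     # add up all the scores
count = len(quiz_scores)     # count how many scores there are
average = total / count      # divide the total by the count

# EVALUATE: check against a value worked out by hand: (88 + 92 + 79 + 95) / 4 = 88.5
assert average == 88.5
print("Average score:", average)
```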

Programming Submission Rubric for DAT 535.

80% Assignment Requirements Fulfilled. The program runs correctly.

20% Divided equally among the items in the list below:

  • Only zipped folders or single files should be submitted to Blackboard.
  • All files require the correct extensions: .py for Spyder IDE files or .ipynb for Jupyter Notebook files.
  • Zip multiple files in a folder to submit.
  • All submission folder and file names must include the student’s last name.
  • All individual files should have self-documenting names.
  • All variables should have self-documenting names.
  • Use of comments, including a comment block at the top of each file with your name and other details.
  • Include this sentence in the comment block at the top and type in your name:

I certify, that this computer program submitted by me is all of my own work. Signed: Your Name

  • All sources cited.
  • Correct spelling and grammar.
  • Neat, clearly presented code.
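A minimal sketch of a comment block at the top of a file, assuming the layout shown here; the file name, field labels, and the student_name variable are placeholders, not a required format. In a Jupyter Notebook, the same block would go in the first code cell.

```python
# File:        Lastname_Session2.ipynb  (hypothetical name; use your own last name)
# Author:      Your Name
# Course:      DAT 535
# Assignment:  Session 2, Weeks 3 & 4
# Description: One or two sentences on what this program does.
# Sources:     List any sources you used here.
#
# I certify, that this computer program submitted by me is all of my own work.
# Signed: Your Name

# Program code starts below the header block.
student_name = "Your Name"  # placeholder variable for this sketch
print("Program written by", student_name)
```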

The class is likely to include students with different levels of exposure to computer programming. You are not required to have any prior experience in Python. What matters most is how much you learn during our class.

Session 2 Assignment: Weeks 3 & 4

Please submit your assignment as ONE Jupyter Notebook file. No need to zip to submit.

Use a lot of comments and headings.

PART 1 – 15 Points

The mtcars dataset contains data from the 1974 Motor Trend magazine and includes 10 features of performance and design for a sample of 32 cars.

  • Import the csv file mtcars.csv as a data frame using a pandas module function.
  • Find the mean, median, and mode of the column wt.
  • Print the mean and median.

Below are the steps required for the task.

import pandas as pd

# Read in the file mtcars.csv

cars = # Your code here

# Find the mean of the column wt

mean = # Your code here

# Find the median of the column wt

median = # Your code here

print("mean = {:.5f}, median = {:.3f}".format(mean, median))
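A minimal sketch of the pandas calls involved, run on a tiny made-up wt column instead of the real mtcars.csv so that it is self-contained. Series.mean() and Series.median() each return a single number, while Series.mode() returns a Series, because a column can have more than one most frequent value.

```python
import pandas as pd

# Hypothetical stand-in for mtcars.csv; with the real file this would be
# cars = pd.read_csv("mtcars.csv")
cars = pd.DataFrame({"wt": [2.5, 3.0, 3.0, 3.5, 4.0]})

mean = cars["wt"].mean()      # arithmetic mean of the wt column
median = cars["wt"].median()  # middle value of the sorted wt column
mode = cars["wt"].mode()      # Series holding the most frequent value(s)

print("mean = {:.5f}, median = {:.3f}".format(mean, median))
print("mode =", mode.tolist())
```

With these made-up values the first line prints `mean = 3.20000, median = 3.000` and the second prints `mode = [3.0]`.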

PART 2 – 15 Points

The intelligence quotient (IQ) of a randomly selected person follows a normal distribution with a mean of 100 and a standard deviation of 15. Use the norm distribution object from scipy.stats and user-input values for IQ1 and IQ2 to perform the following tasks:

  • Calculate the probability that a randomly selected person will have an IQ less than or equal to IQ1.
  • Calculate the probability that a randomly selected person will have an IQ between IQ1 and IQ2.
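Both probabilities come from the cumulative distribution function of the normal distribution. Writing Φ for the standard normal CDF (which norm.cdf computes when given loc=100 and scale=15), the calculation and the worked values for IQ1 = 105 and IQ2 = 110 are:

```latex
\begin{aligned}
P(X \le \mathrm{IQ}_1) &= \Phi\left(\frac{\mathrm{IQ}_1 - 100}{15}\right) \\
P(\mathrm{IQ}_1 < X \le \mathrm{IQ}_2) &= \Phi\left(\frac{\mathrm{IQ}_2 - 100}{15}\right) - \Phi\left(\frac{\mathrm{IQ}_1 - 100}{15}\right) \\
\Phi\left(\tfrac{105 - 100}{15}\right) &= \Phi(1/3) \approx 0.631 \\
\Phi(2/3) - \Phi(1/3) &\approx 0.748 - 0.631 = 0.117
\end{aligned}
```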

For example, if the input is:

105
110

the output is:

The probability that a randomly selected person has an IQ less than or equal to 105.0 is 0.631.
The probability that a randomly selected person has an IQ between 105.0 and 110.0 is 0.117.

Below are the steps required for the task.

# Import norm from scipy.stats

from scipy.stats import norm

# Input two IQs, making sure that IQ1 is less than IQ2

# Add appropriate prompts for the user

IQ1 = float(input())

IQ2 = float(input())

# If IQ1 is greater than IQ2, ask the user to enter both values again

while IQ1 > IQ2:
    print("IQ1 should be less than IQ2. Enter numbers again.")
    IQ1 = float(input())
    IQ2 = float(input())

# Calculate the probability that a randomly selected person has an IQ less than or equal to IQ1.

probLT = # Your code here

# Calculate the probability that a randomly selected person has an IQ between IQ1 and IQ2

probBetw = # Your code here

print("The probability that a randomly selected person has an IQ less than or equal to " + str(IQ1) + " is ", end="")

print('%.3f' % probLT + ".")

print("The probability that a randomly selected person has an IQ between " + str(IQ1) + " and " + str(IQ2) + " is ", end="")

print('%.3f' % probBetw + ".")
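A minimal sketch of the probability calculations with norm.cdf, using a hypothetical helper function iq_probabilities and fixed values instead of input() so the results can be checked against the example output given earlier in this part.

```python
from scipy.stats import norm

MEAN_IQ = 100  # mean of the IQ distribution
SD_IQ = 15     # standard deviation of the IQ distribution


def iq_probabilities(iq1, iq2):
    """Hypothetical helper: return P(IQ <= iq1) and P(iq1 < IQ <= iq2)."""
    prob_lt = norm.cdf(iq1, loc=MEAN_IQ, scale=SD_IQ)
    # Area between iq1 and iq2 = CDF at the upper value minus CDF at the lower value
    prob_between = norm.cdf(iq2, loc=MEAN_IQ, scale=SD_IQ) - prob_lt
    return prob_lt, prob_between


prob_lt, prob_between = iq_probabilities(105.0, 110.0)
print('%.3f' % prob_lt)       # 0.631
print('%.3f' % prob_between)  # 0.117
```

Because the normal distribution is continuous, it makes no difference whether the endpoints IQ1 and IQ2 are included in the "between" probability.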

PART 3 – 15 Points

The hmeq_small dataset contains information on 5,960 home equity loans, including 7 features describing the characteristics of each loan.

  • Load the data set hmeq_small.csv as a data frame.
  • Create a new data frame with all the rows with missing data deleted.
  • Create a second data frame with all missing data filled in with the mean value of the column.
  • Find the means of the columns for both new data frames.

Ex: Using only the first hundred rows, found in hmeq_sample.csv, the output is:

Means for hmeqDelete are  LOAN        3208.333333
MORTDUE    67495.958333
VALUE      82529.125000
YOJ            8.500000
CLAGE        144.749455
CLNO          16.583333
DEBTINC       33.052122
dtype: float64
Means for hmeqReplace are  LOAN        3045.918367
MORTDUE    49386.494253
VALUE      64033.483871
YOJ            8.179775
CLAGE        140.209320
CLNO          15.586957
DEBTINC       30.947152
dtype: float64

Below are the steps required for the task.

import pandas as pd

# Read in hmeq_small.csv

hmeq = # Your code here

# Create a new data frame with the rows with missing values dropped

hmeqDelete = # Your code here

# Create a new data frame with the missing values filled in by the mean of the column

hmeqReplace = # Your code here

# Print the means of the columns for each new data frame

print("Means for hmeqDelete are ", # Your code here)

print("Means for hmeqReplace are ", # Your code here)
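A minimal sketch of DataFrame.dropna() and DataFrame.fillna() on a small made-up frame that borrows two of the hmeq column names; the values are hypothetical and stand in for hmeq_small.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for hmeq_small.csv; with the real file this would be
# hmeq = pd.read_csv("hmeq_small.csv")
hmeq = pd.DataFrame({
    "LOAN": [1000.0, 2000.0, np.nan, 4000.0],
    "YOJ": [2.0, np.nan, 6.0, 8.0],
})

# Drop every row that has at least one missing value
hmeqDelete = hmeq.dropna()

# Fill each missing value with the mean of its own column
hmeqReplace = hmeq.fillna(hmeq.mean())

print("Means for hmeqDelete are ", hmeqDelete.mean())
print("Means for hmeqReplace are ", hmeqReplace.mean())
```

Filling with the column mean leaves each column's mean unchanged, whereas dropping rows keeps only complete rows and so changes the means; that is why the two sets of means generally differ.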

PART 4 – 15 Points

The hmeq_small dataset contains information on 5,960 home equity loans, including 7 features describing the characteristics of each loan.

  • Load the hmeq_small.csv data set as a data frame.
  • Standardize the data set as a new data frame.
  • Normalize the data set as a new data frame.
  • Print the means and standard deviations of both the standardized and normalized data.
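The assignment does not define the two rescalings precisely. Assuming "standardize" means converting each column to z-scores and "normalize" means min-max scaling to the range [0, 1], each value x in a column is transformed as follows, where μ and σ are that column's mean and standard deviation and x_min and x_max are its minimum and maximum:

```latex
\begin{aligned}
z &= \frac{x - \mu}{\sigma} && \text{(standardization)} \\
x' &= \frac{x - x_{\min}}{x_{\max} - x_{\min}} && \text{(min-max normalization)}
\end{aligned}
```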

Ex: Using the first 100 rows, found in hmeq_sample.csv, the output is:

The means of hmeqStand are  LOAN      -4.984675e-17
MORTDUE    1.914178e-17
VALUE     -1.790682e-18
YOJ       -7.235161e-17
CLAGE     -4.194176e-17
CLNO      -6.033821e-17
DEBTINC    6.125368e-17
dtype: float64
The standard deviations of hmeqStand are  LOAN       1.005141
MORTDUE    1.005797
VALUE      1.005420
YOJ        1.005666
CLAGE      1.005602
CLNO       1.005479
DEBTINC    1.017700
dtype: float64
The means of hmeqNorm are  LOAN       0.671006
MORTDUE    0.358735
VALUE      0.299044
YOJ        0.292135
CLAGE      0.448986
CLNO       0.346377
DEBTINC    0.624927
dtype: float64
The standard deviations of hmeqNorm are  LOAN       0.269531
MORTDUE    0.247183
VALUE      0.187587
YOJ        0.237945
CLAGE      0.226345
CLNO       0.188681
DEBTINC    0.222946
dtype: float64

Below are the steps required for the task.

import pandas as pd

from sklearn import preprocessing

hmeq = # Read in the file hmeq_small.csv

# Standardize the data

standardized = # Your code here

# Output the standardized data as a data frame

hmeqStand = # Your code here

# Normalize the data

normalized = # Your code here

# Output the normalized data as a data frame

hmeqNorm = # Your code here

# Print the means and standard deviations of hmeqStand and hmeqNorm

print("The means of hmeqStand are ", # Your code here)

print("The standard deviations of hmeqStand are ", # Your code here)

print("The means of hmeqNorm are ", # Your code here)

print("The standard deviations of hmeqNorm are ", # Your code here)
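A minimal sketch, assuming "standardize" means scikit-learn's preprocessing.StandardScaler and "normalize" means preprocessing.MinMaxScaler (min-max scaling to [0, 1]); the assignment does not name the classes, so treat this as one reasonable reading. Note that preprocessing.normalize is a different operation: it rescales each row to unit length. The small frame below is made up and stands in for hmeq_small.csv.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for hmeq_small.csv; with the real file this would be
# hmeq = pd.read_csv("hmeq_small.csv")
hmeq = pd.DataFrame({
    "LOAN": [1000.0, 2000.0, 3000.0, np.nan, 4000.0],
    "VALUE": [50000.0, 60000.0, 90000.0, 80000.0, np.nan],
})

# Standardize: each column gets mean 0 and standard deviation 1.
# Both scalers ignore NaNs when fitting and leave them as NaN in the output.
standardized = preprocessing.StandardScaler().fit_transform(hmeq)
hmeqStand = pd.DataFrame(standardized, columns=hmeq.columns)

# Normalize (assumed here to mean min-max scaling): each column mapped to [0, 1]
normalized = preprocessing.MinMaxScaler().fit_transform(hmeq)
hmeqNorm = pd.DataFrame(normalized, columns=hmeq.columns)

print("The means of hmeqStand are ", hmeqStand.mean())
print("The standard deviations of hmeqStand are ", hmeqStand.std())
print("The means of hmeqNorm are ", hmeqNorm.mean())
print("The standard deviations of hmeqNorm are ", hmeqNorm.std())
```

The standard deviations of hmeqStand print slightly above 1 (as in the expected output above) because DataFrame.std() divides by n - 1 by default, while StandardScaler divides by n. Calling hmeqStand.std(ddof=0) gives values of 1.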