By the end of this class, we hope you will be able to write and understand Python code to solve unique data analytics tasks on your own.
This is something we feel an employer would expect if this class were on your resume.
You will have taught yourself by reading and completing the zyBook activities, asking questions, and trying out code.
For each assignment, you are expected to go through the program development cycle:
- Understand the problem task thoroughly. UNDERSTAND
- Plan your code by producing an algorithm showing all of the steps. ANALYZE
- This can be done by writing pseudocode comments in the code.
- Write your code. APPLY
- Test your code thoroughly. EVALUATE – FINISH CREATION
- Double-check the assignment requirements.
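As a minimal sketch of the planning step, the pseudocode comments can be written first and the code filled in underneath each one. The task, the scores, and every variable name below are made up for illustration and are not part of any assignment.

```python
# Hypothetical task: report the average of a short list of exam scores.

# Step 1: store the scores
exam_scores = [88, 92, 79, 95]

# Step 2: add the scores together
score_total = sum(exam_scores)

# Step 3: divide the total by the number of scores
score_average = score_total / len(exam_scores)

# Step 4: print the result with one decimal place
print("Average exam score: {:.1f}".format(score_average))
```

Writing the comments before the code makes it easier to check each step against the assignment requirements when you test.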
Programming Submission Rubric for DAT 535.
80% Assignment Requirements Fulfilled. The program runs correctly.
20% Divided equally among the items in the list below:
- Only zipped folders or single files should be submitted to Blackboard.
- All files require the correct extensions: .py for Spyder IDE files or .ipynb for Jupyter Notebook files.
- Zip multiple files in a folder to submit.
- All submission folder and file names must include the student’s last name.
- All individual files should have self-documenting names.
- All variables should have self-documenting names.
- Use of comments, including a comment block at the top of each file with your name and other details.
- Include this sentence in the comment block at the top and type in your name:
I certify that this computer program submitted by me is all of my own work. Signed: Your Name
- All sources cited.
- Correct spelling and grammar.
- Neat, clearly presented code.
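One possible way to lay out the top of a file so that it covers the comment block and naming items above is sketched here. The file name, the description, and the data values are hypothetical placeholders; only the certification sentence comes from the rubric.

```python
# File:        smith_vehicle_weights.py   (hypothetical name; include your last name)
# Author:      Your Name
# Course:      DAT 535
# Description: Finds the heaviest vehicle in a small, made-up sample.
#
# I certify that this computer program submitted by me is all of my own work.
# Signed: Your Name

# Self-documenting names say what a value holds, so less explanation is needed.
vehicle_weights = [2.62, 2.875, 2.32]  # made-up weights, in thousands of pounds
heaviest_vehicle_weight = max(vehicle_weights)

print("Heaviest vehicle weight:", heaviest_vehicle_weight)
```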
The class is likely to have students with different levels of exposure to computer programming. You are not required to have any experience in Python. What matters is how much you learn during our class.
Session 2 Assignment Week 3 & 4
Please submit your assignment as ONE Jupyter Notebook file. No need to zip to submit.
Use a lot of comments and headings.
PART 1 – 15 Points
The mtcars dataset contains data from the 1974 Motor Trend magazine, and includes 10 features of performance and design from a sample of 32 cars.
- Import the csv file mtcars.csv as a data frame using a pandas module function.
- Find the mean, median, and mode of the column wt.
- Print the mean and median.
Below are the steps required for the task.
import pandas as pd
# Read in the file mtcars.csv
cars = # Your code here
# Find the mean of the column wt
mean = # Your code here
# Find the median of the column wt
median = # Your code here
print("mean = {:.5f}, median = {:.3f}".format(mean, median))
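The pandas calls involved can be tried on a tiny made-up table before working with the real file. This is only a sketch: the CSV text, the values, and the names toy_csv and toy_cars are invented for illustration, and the in-memory text stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk; these are not the mtcars values.
toy_csv = io.StringIO("model,wt\nA,2.5\nB,3.0\nC,3.0\nD,3.5\nE,4.0\n")

# read_csv accepts a file path or a file-like object and returns a data frame
toy_cars = pd.read_csv(toy_csv)

wt_mean = toy_cars["wt"].mean()
wt_median = toy_cars["wt"].median()
wt_mode = toy_cars["wt"].mode()  # a Series, because several values can tie

print("mean = {:.5f}, median = {:.3f}".format(wt_mean, wt_median))
print("mode(s):", wt_mode.tolist())
```

Note that mode() returns a Series rather than a single number, so it is printed here as a list.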
PART 2 – 15 Points
The intelligence quotient (IQ) of a randomly selected person follows a normal distribution with a mean of 100 and a standard deviation of 15. Use the scipy function norm and user input values for IQ1 and IQ2 to perform the following tasks:
- Calculate the probability that a randomly selected person will have an IQ less than or equal to IQ1.
- Calculate the probability that a randomly selected person will have an IQ between IQ1 and IQ2.
For example, if the input is:
105
110
the output is:
The probability that a randomly selected person has an IQ less than or equal to 105.0 is 0.631.
The probability that a randomly selected person has an IQ between 105.0 and 110.0 is 0.117.
Below are the steps required for the task.
# Import norm from scipy.stats
from scipy.stats import norm
# Input two IQs, making sure that IQ1 is less than IQ2
# Add appropriate prompts for the user
IQ1 = float(input())
IQ2 = float(input())
# Keep asking until IQ1 is not greater than IQ2
# Add appropriate prompts for the user
while IQ1 > IQ2:
    print("IQ1 should be less than IQ2. Enter numbers again.")
    IQ1 = float(input())
    IQ2 = float(input())
# Calculate the probability that a randomly selected person has an IQ less than or equal to IQ1.
probLT = # Your code here
# Calculate the probability that a randomly selected person has an IQ between IQ1 and IQ2
probBetw = # Your code here
print("The probability that a randomly selected person has an IQ less than or equal to " + str(IQ1) + " is ", end="")
print('%.3f' % probLT + ".")
print("The probability that a randomly selected person has an IQ between " + str(IQ1) + " and " + str(IQ2) + " is ", end="")
print('%.3f' % probBetw + ".")
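The two probabilities can be checked against the example output with the cumulative distribution function norm.cdf. The sketch below hard-codes the example inputs 105 and 110 instead of reading user input, and its variable names are illustrative, not required by the assignment.

```python
from scipy.stats import norm

iq_mean = 100  # mean of the IQ distribution
iq_sd = 15     # standard deviation of the IQ distribution

# Example inputs from the text above, fixed here rather than typed by a user
iq1 = 105.0
iq2 = 110.0

# P(X <= iq1): area under the normal curve to the left of iq1
prob_at_most_iq1 = norm.cdf(iq1, loc=iq_mean, scale=iq_sd)

# P(iq1 < X <= iq2): area to the left of iq2 minus area to the left of iq1
prob_between = norm.cdf(iq2, loc=iq_mean, scale=iq_sd) - prob_at_most_iq1

print("{:.3f}".format(prob_at_most_iq1))  # 0.631
print("{:.3f}".format(prob_between))      # 0.117
```

Subtracting the two cumulative probabilities works because the normal distribution is continuous, so whether the endpoints are included does not change the result.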
PART 3 – 15 Points
The hmeq_small dataset contains information on 5960 home equity loans, including 7 features on the characteristics of the loan.
- Load the data set hmeq_small.csv as a data frame.
- Create a new data frame with all the rows with missing data deleted.
- Create a second data frame with all missing data filled in with the mean value of the column.
- Find the means of the columns for both new data frames.
Ex: Using only the first hundred rows, found in hmeq_sample.csv, the output is:
Means for hmeqDelete are  LOAN        3208.333333
MORTDUE    67495.958333
VALUE      82529.125000
YOJ            8.500000
CLAGE        144.749455
CLNO          16.583333
DEBTINC       33.052122
dtype: float64
Means for hmeqReplace are  LOAN        3045.918367
MORTDUE    49386.494253
VALUE      64033.483871
YOJ            8.179775
CLAGE        140.209320
CLNO          15.586957
DEBTINC       30.947152
dtype: float64
Below are the steps required for the task.
import pandas as pd
# Read in hmeq_small.csv
hmeq = # Your code here
# Create a new data frame with the rows with missing values dropped
hmeqDelete = # Your code here
# Create a new data frame with the missing values filled in by the mean of the column
hmeqReplace = # Your code here
# Print the means of the columns for each new data frame
print("Means for hmeqDelete are ", # Your code here)
print("Means for hmeqReplace are ", # Your code here)
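The difference between deleting and filling in missing data is easier to see on a very small table. The sketch below uses a made-up four-row data frame, not the hmeq data, and the names toy_loans, toy_delete, and toy_replace are illustrative.

```python
import pandas as pd

# Made-up values with one missing entry per column; not the hmeq data
toy_loans = pd.DataFrame({
    "LOAN": [1000.0, 2000.0, float("nan"), 4000.0],
    "YOJ": [2.0, float("nan"), 6.0, 10.0],
})

# dropna() removes every row that has at least one missing value
toy_delete = toy_loans.dropna()

# fillna() with a Series of column means fills each column with its own mean
toy_replace = toy_loans.fillna(toy_loans.mean())

print("Means after deleting rows:")
print(toy_delete.mean())
print("Means after filling with column means:")
print(toy_replace.mean())
```

Here only two complete rows survive dropna(), so the LOAN mean changes to 2500, while filling with the column mean keeps the LOAN mean at the original 2333.33. The same effect explains why the two sets of means in the example output differ.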
PART 4 – 15 Points
The hmeq_small dataset contains information on 5960 home equity loans, including 7 features on the characteristics of the loan.
- Load the hmeq_small.csv data set as a data frame.
- Standardize the data set as a new data frame.
- Normalize the data set as a new data frame.
- Print the means and standard deviations of both the standardized and normalized data.
Ex: Using the first 100 rows, found in hmeq_sample.csv, the output is:
The means of hmeqStand are  LOAN       -4.984675e-17
MORTDUE    1.914178e-17
VALUE     -1.790682e-18
YOJ       -7.235161e-17
CLAGE     -4.194176e-17
CLNO      -6.033821e-17
DEBTINC    6.125368e-17
dtype: float64
The standard deviations of hmeqStand are  LOAN       1.005141
MORTDUE    1.005797
VALUE      1.005420
YOJ        1.005666
CLAGE      1.005602
CLNO       1.005479
DEBTINC    1.017700
dtype: float64
The means of hmeqNorm are  LOAN       0.671006
MORTDUE    0.358735
VALUE      0.299044
YOJ        0.292135
CLAGE      0.448986
CLNO       0.346377
DEBTINC    0.624927
dtype: float64
The standard deviations of hmeqNorm are  LOAN       0.269531
MORTDUE    0.247183
VALUE      0.187587
YOJ        0.237945
CLAGE      0.226345
CLNO       0.188681
DEBTINC    0.222946
dtype: float64
Below are the steps required for the task.
import pandas as pd
from sklearn import preprocessing
hmeq = # Read in the file hmeq_small.csv
# Standardize the data
standardized = # Your code here
# Output the standardized data as a data frame
hmeqStand = # Your code here
# Normalize the data
normalized = # Your code here
# Output the normalized data as a data frame
hmeqNorm = # Your code here
# Print the means and standard deviations of hmeqStand and hmeqNorm
print("The means of hmeqStand are ", # Your code here)
print("The standard deviations of hmeqStand are ", # Your code here)
print("The means of hmeqNorm are ", # Your code here)
print("The standard deviations of hmeqNorm are ", # Your code here)
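To see what the two transformations do to the numbers, they can be written out by hand with pandas on a small made-up table. This sketch assumes that "standardize" means z-scores computed with the population standard deviation and that "normalize" means min-max scaling to the range 0 to 1. The assignment itself expects the sklearn preprocessing module, so treat this as a way to check your understanding, not as the required solution. The data and names are illustrative.

```python
import pandas as pd

# Made-up values; not the hmeq data
toy_loans = pd.DataFrame({
    "LOAN": [1000.0, 2000.0, 3000.0, 4000.0],
    "YOJ": [1.0, 3.0, 5.0, 11.0],
})

# Standardize: subtract the column mean, divide by the population
# standard deviation (ddof=0), giving each column a mean of 0
toy_stand = (toy_loans - toy_loans.mean()) / toy_loans.std(ddof=0)

# Normalize: rescale each column so its minimum is 0 and its maximum is 1
column_range = toy_loans.max() - toy_loans.min()
toy_norm = (toy_loans - toy_loans.min()) / column_range

print(toy_stand.mean())  # essentially 0, up to floating-point error
print(toy_stand.std())   # pandas default is the sample std (ddof=1)
print(toy_norm.min())
print(toy_norm.max())
```

Because pandas reports the sample standard deviation by default, a column standardized with the population standard deviation shows a value of sqrt(n / (n - 1)) for n non-missing rows, which is slightly above 1. That is consistent with the values near 1.005 in the example output, and it is also why the standardized means print as tiny numbers such as -4.98e-17 instead of exactly 0.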
