Instructions: The .ipynb file should be submitted as a **searchable PDF** and not as a picture. Read the instructions carefully. Please download your assignment here.
All answers, along with their code, should be submitted as a searchable PDF.
Pictures or snapshots of your work will not be accepted.
All generated csv files and the .ipynb file must be submitted in a zip-folder as a secondary source.
Ensure the zip-folder has four csv files (Your Name, college_data, Height, Sustainability).
You may use Jupyter notebook or Colab as per your convenience.
Note: Reach out to your instructor with any questions about the csv files, code, or zip-folder.
Failure to comply with the instructions will result in a grade of 0 on the relevant portions of the assignment. Your instructor will grade your submission based on what you submitted. Failure to submit an assignment, or submitting an assignment for another class, will result in a grade of 0 without the opportunity to resubmit. Make sure that you submit your original work. Suspected plagiarism cases will be treated as possible academic misconduct and will be reported to the College Academic Integrity Committee for formal investigation. As part of this procedure, your instructor may require you to meet with them for an oral exam on the assignment.
Statistical Intuitions and Applications
Assignment #2
A – Statistical Intuition in Medical Science
Question 1:
The data is sourced from the 2020 annual CDC survey of 400k US adults regarding their health status. The dataset is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to comprehensively collect health-related information from residents across the United States.
The code given below selects a random sample of 300 respondents from the original dataset. Use this sample of 300 respondents for all of your analysis.
Use the Heart Disease dataset to answer the following questions:
a. Is having heart disease independent of whether the respondent is diabetic? Include the two-way table and all your calculations in your answer.
b. Is there a relationship between having heart disease and the general health condition of the respondent? Include the two-way table, stacked bar graph, and all your calculations in your answer.
c. Is there a relationship between having heart disease and the gender of the respondent? Include the two-way table, stacked bar graph, and all your calculations in your answer.
d. Does doing physical activity lower the risk of having heart disease? If so, by how much?
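The calculations behind parts a to d can be sketched in plain Python. This is a minimal sketch, not the required method: it assumes the 300 sampled rows are available as a list of dicts (for example, read with `csv.DictReader`) and that the relevant columns are named `HeartDisease`, `Diabetic`, `GenHealth`, `Sex` and `PhysicalActivity` with `"Yes"`/`"No"` style values. These column names and the helper function names are assumptions, so check them against your own file. Independence (part a) means P(heart disease | diabetic) equals P(heart disease); the chi-square statistic measures how far the observed counts are from the counts expected under independence (parts b and c); the risk difference and relative risk answer "by how much" in part d.

```python
from collections import Counter

def two_way_table(records, row_key, col_key):
    """Count joint occurrences of two categorical columns.

    records: iterable of dicts, e.g. rows from csv.DictReader (assumed format).
    Returns a Counter keyed by (row_value, col_value).
    """
    return Counter((r[row_key], r[col_key]) for r in records)

def conditional_prob(table, outcome, given):
    """P(row == outcome | column == given), estimated from the counts."""
    given_total = sum(n for (_, col), n in table.items() if col == given)
    return table[(outcome, given)] / given_total

def marginal_prob(table, outcome):
    """P(row == outcome), using the grand total of the table."""
    total = sum(table.values())
    return sum(n for (row, _), n in table.items() if row == outcome) / total

def chi_square_statistic(table):
    """Pearson chi-square statistic (no continuity correction) and degrees of freedom."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence: row total * column total / n.
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

def risk_comparison(table, outcome, exposed, unexposed):
    """Risk difference and relative risk of `outcome` between two column groups."""
    p_exposed = conditional_prob(table, outcome, exposed)
    p_unexposed = conditional_prob(table, outcome, unexposed)
    return p_exposed - p_unexposed, p_exposed / p_unexposed
```

For example, with `table = two_way_table(rows, "HeartDisease", "PhysicalActivity")`, the call `risk_comparison(table, "Yes", "Yes", "No")` compares the heart-disease rate of active respondents with that of inactive ones; a negative difference would indicate a lower rate among active respondents. The row percentages from `conditional_prob` are exactly what a 100% stacked bar graph displays. In a notebook with pandas and SciPy, `pd.crosstab(df["GenHealth"], df["HeartDisease"], normalize="index").plot(kind="bar", stacked=True)` draws such a graph, and `scipy.stats.chi2_contingency` returns the statistic with a p-value; note that it applies Yates' continuity correction by default when there is one degree of freedom, so its statistic can differ from the uncorrected one above.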
B – Statistical Intuition in Data Science
Question 2:
a. Compare the probability distributions of any three classes of your choice and identify the key differences. Your answers should have proper justification.
b. Explain how you will determine which class (or classes) has the lowest number of students who fail.
c. In the class (or classes) with the highest number of failing students, which grade is the highest?
d. Explain how you will calculate the number of students with a B grade in Class_2 and Class_4.
e. After comparing students with a C grade in Class_1 and Class_4, Amna concluded that Class_4 has 5% more students with a C grade than Class_1. Provide evidence showing whether she is right or wrong.
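The reasoning in parts a to e reduces to a few operations on discrete distributions. The sketch below is hypothetical: it assumes each class's distribution is a dict mapping a grade label to its probability, that a failing grade is labelled `"F"`, and that class sizes are known. If the class sizes are not provided, only probabilities (not numbers of students) can be compared. The expected number of students with a given grade is the class size times that grade's probability.

```python
def is_valid_distribution(dist, tol=1e-6):
    """True if all probabilities are non-negative and they sum to 1."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1) < tol

def most_likely_grade(dist):
    """Grade with the highest probability (the mode of the distribution)."""
    return max(dist, key=dist.get)

def expected_count(dist, class_size, grade):
    """Expected number of students with `grade`: class_size * P(grade)."""
    return class_size * dist[grade]

def compare_share(p_first, p_second):
    """How much larger p_second is than p_first.

    Returns (difference in percentage points, relative difference in percent).
    """
    points = (p_second - p_first) * 100
    relative = (p_second - p_first) / p_first * 100
    return points, relative
```

For part b, computing `expected_count(dist, n, "F")` for each class and taking the smallest gives the class with the fewest expected failures. For part e, "5% more" is ambiguous: if P(C) were 0.20 in one class and 0.25 in the other, the difference would be 5 percentage points but 25% in relative terms, so state which comparison the evidence supports.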
C – Statistical Intuition in Health Sciences
Question 3:
Dubai Health Authority (DHA) has asked you to analyze the situation at two hospitals, Dubai Hospital and Rashid Hospital, to assess their performance in terms of resource allocation (staffing, medical equipment, space, etc.). DHA has provided the most up-to-date information available to them, which can be obtained by running the code below. It gives the probability distributions of the number of patients at these hospitals who have been diagnosed with a certain medical condition, along with their expected values and standard deviations.
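The expected value and standard deviation reported for each hospital follow directly from its probability distribution. Since the DHA code is not reproduced here, this sketch assumes each distribution is a dict mapping a number of patients to its probability. The coefficient of variation (standard deviation divided by the mean) is an extra measure not requested in the assignment; it is included only as one possible way to compare variability between hospitals with different average loads.

```python
import math

def expected_value(pmf):
    """E[X] = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def standard_deviation(pmf):
    """SD[X] = sqrt(sum over x of (x - E[X])^2 * P(X = x))."""
    mu = expected_value(pmf)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

def coefficient_of_variation(pmf):
    """SD relative to the mean; a unit-free comparison of variability."""
    return standard_deviation(pmf) / expected_value(pmf)
```

A higher mean points to a heavier average patient load, while a higher standard deviation points to less predictable demand, which matters when deciding how much spare staffing and supply capacity to keep.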
a. How do you interpret the differences in both the mean and standard deviation when assessing the patient load and healthcare demands at the two hospitals?
b. How does this difference in expected patient counts affect resource allocation, such as staffing and medical supplies, for the two hospitals?
D – Statistical Intuition in Social Innovation
Here we will consider a simple dataset from social sciences that records the height of individuals.
Question 4:
Consider a dataset from the social sciences on the heights of individuals in a population; such heights are often normally distributed.
a. Explain how you would calculate the probability that a randomly selected individual has a height greater than 185 cm.
b. Explain how you would calculate the probability that a randomly selected individual has a height between 160 cm and 180 cm.
c. Calculate the 90th percentile of the height distribution. What does this value represent in the context of heights?
d. In a random sample of 49 individuals from this population, what is the probability that the sample mean height is within 1 cm of the population mean?
e. Suppose we want to find the height range that contains the middle 95% of individuals in the population. Explain how you would obtain the lower and upper bounds of this range.
f. If you needed to cast a basketball team with the tallest players from this dataset, how many players would you need to ensure they win every game without jumping? Explain.
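Parts a to e can all be answered with the normal CDF and its inverse. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming heights follow a normal distribution with mean `mu` and standard deviation `sigma` in cm; the function name `height_summary` is hypothetical. For part d it uses the fact that the mean of a sample of n observations is normal with standard deviation sigma / sqrt(n), which is sigma / 7 when n = 49.

```python
from statistics import NormalDist

def height_summary(mu, sigma, n=49):
    """Answers for parts a to e, assuming heights ~ Normal(mu, sigma) in cm."""
    heights = NormalDist(mu, sigma)
    # Sampling distribution of the mean of n heights: Normal(mu, sigma / sqrt(n)).
    sample_mean = NormalDist(mu, sigma / n ** 0.5)
    return {
        # a. P(X > 185) = 1 - P(X <= 185)
        "p_above_185": 1 - heights.cdf(185),
        # b. P(160 < X < 180) = F(180) - F(160)
        "p_160_to_180": heights.cdf(180) - heights.cdf(160),
        # c. 90th percentile: the height below which 90% of people fall
        "p90": heights.inv_cdf(0.90),
        # d. P(|sample mean - mu| < 1)
        "p_mean_within_1cm": sample_mean.cdf(mu + 1) - sample_mean.cdf(mu - 1),
        # e. middle 95%: the 2.5th and 97.5th percentiles
        "middle_95": (heights.inv_cdf(0.025), heights.inv_cdf(0.975)),
    }
```

If `mu` and `sigma` are not given, one option is to estimate them from the Height csv with `NormalDist.from_samples(list_of_heights)`, which uses the sample mean and sample standard deviation. Part f has no calculation to perform, so it is not covered here.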
E – Statistical Intuition in Sustainability: COP-28
Imagine you have been invited to participate in the upcoming COP-28, which gives you an opportunity to contribute to the community using your statistical skills. You conducted a survey in your area, observed the recycling behavior of your community, and recorded whether each individual recycles (“1” for yes and “0” for no). You collected 100 records.
Question 5:
The answers to the following questions will provide valuable insights that you can present at COP-28.
a. What is the proportion of individuals in the community who recycle, based on the dataset?
b. Calculate a 95% confidence interval for the proportion of recyclers in the community. What does this interval tell us about the likely range of recycling behavior in the entire community?
c. If the 95% confidence interval for the proportion of recyclers is (0.55, 0.75), how would you interpret this interval? What level of confidence does it represent, and what can we conclude about the community’s recycling behavior?
d. If a separate, earlier survey found that the recycling rate in the community was 70%, how does this rate compare with the point estimate obtained from the dataset, and what additional insights can you provide?
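Parts a, b and d can be computed from the list of 0/1 responses. This is a minimal sketch assuming the 100 records have been loaded into a Python list of integers; the function names are hypothetical. It uses the Wald (normal-approximation) interval, p̂ ± z·sqrt(p̂(1 − p̂)/n), which is the textbook formula but not the only choice (the Wilson interval is a common alternative), and a two-sided one-proportion z-test to compare the estimate with the earlier 70% rate.

```python
from statistics import NormalDist

def proportion_ci(responses, confidence=0.95):
    """Point estimate and Wald confidence interval for a list of 0/1 values."""
    n = len(responses)
    p_hat = sum(responses) / n
    # Critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat, (p_hat - margin, p_hat + margin)

def one_proportion_z_test(responses, p0):
    """Two-sided z-test of H0: p = p0, using the standard error under H0."""
    n = len(responses)
    p_hat = sum(responses) / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

The normal approximation is usually considered reasonable when both n·p̂ and n·(1 − p̂) are at least about 10. For part c, the interval (0.55, 0.75) has midpoint 0.65 and margin of error 0.10, and the earlier 70% rate lies inside it.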
Assignment Information
Weight: 18%
Learning Outcomes
CompProgramDesign: Generate working programs in a computer language that can solve computational problems; find and fix bugs that appear in them.
Distributions: Identify different types of distributions and make inferences based on samples from distributions appropriately.
Visualizations: Interpret, analyze, and create data visualizations.
InferentialStats: Apply and interpret confidence intervals, statistical significance, and regression.
Probability: Apply and interpret fundamental concepts of probability, including conditional and Bayesian probabilities.