Using Excel, organize the data to prepare for analysis. Ensure that each field (Student ID, Name, Major, Grade, Gender, Age, Class Status) is clearly labeled in the dataset.
PART ONE:
PivotTable Creation:
- Create a PivotTable to analyze the distribution of students across different majors.
- Set the ‘Major’ field as the rows of the PivotTable.
- In the PivotTable, calculate the frequency (count) of students in each major. Display this count in a column labeled ‘Frequency’.
- Add a new column in the PivotTable for ‘Relative Frequency’. Calculate the relative frequency for each major by dividing the frequency of each major by the total number of students. Format this column as a percentage.
Interpretation: Based on the frequency distribution table, which major has the highest number of students? Which major has the least? What is the relative frequency of students in the most popular major?
Visualization
- Create a bar chart using the PivotTable data to visually represent the frequency of students in each major.
- Create a pie chart to represent the relative frequency distribution of students across the majors.
- Write a brief analysis (150-200 words) summarizing the distribution of majors among the students. Discuss any patterns you observe and suggest possible reasons for these patterns. Consider the implications for the universitys course offerings and resource allocation.
- Suggest at least one strategy the university could use to better balance the distribution of students across different majors, if needed.
Video: Using Pivot Tables for Categorical Data in Excel
PART TWO:
Creating a Frequency Distribution Table
- Use Excel’s PivotTable feature to construct a frequency distribution table for the “Grade” variable.
-
- Include the following columns in the PivotTable:
- Grade Ranges: Define appropriate grade ranges (bins) for grouping the data.
- (Refer to the instructions in the video below to determine the number of classes, class limits, and class width for the grades)
- Frequency: Count the number of students falling within each grade range.
- Relative Frequency (Percentage): Calculate the relative frequency for each grade range, expressed as a percentage of the total number of students.
- Cumulative Frequency: Calculate the cumulative frequency by summing the frequencies of the current and all preceding grade ranges.
- Percent Cumulative Frequency: Calculate the percent cumulative frequency by dividing the cumulative frequency by the total number of students, then multiplying by 100.
- Grade Ranges: Define appropriate grade ranges (bins) for grouping the data.
- Include the following columns in the PivotTable:
- Data Visualization
- Histogram: Create a histogram using the PivotTable data to visualize the distribution of grades. Ensure that the bins are set appropriately to effectively display the spread of grades.
- Cumulative Frequency Graph (Ogive): Construct a cumulative frequency graph to visualize the cumulative frequency distribution. Plot the cumulative frequencies on the y-axis and the upper boundaries of the grade bins on the x-axis.
- Interpreting the Results:
- Which grade range has the highest frequency of students? What does this indicate about the overall performance level?
- How does the cumulative frequency distribution help in understanding the grade data? For example, how many students scored below a particular grade range?
- What does the relative frequency tell you about the distribution of grades? Are there any grade ranges with notably higher or lower frequencies?
- Analyze the frequency distribution table and the visualizations to answer the following questions:
- Implications and Insights
- Based on the histogram, discuss the central tendency and variability in the grades. Is the distribution skewed in any direction, and what might this suggest about student performance?
- Analyze the cumulative frequency graph to identify any patterns or trends. For example, does a significant number of students score above or below a particular threshold?
- Consider potential academic implications based on your findings. What recommendations could you make to educators or administrators to address observed trends, such as high frequencies in low or high grade ranges?
PART THREE:
Creating the Contingency Table:
- Use Excel’s PivotTable feature to create a contingency table that shows the frequency of each gender within each major. The table should include:
- Rows representing different majors (e.g., Computer Science, Engineering, Biology, Mathematics, Economics).
- Columns representing different genders (e.g., Male, Female, Non-binary).
- Add a filter for the “Class Status” variable. The class statuses include Freshman, Sophomore, Junior, and Senior.
Table Layout:
- Include the following elements in your contingency table:
- Frequency Counts: The number of students for each gender in each major.
- Row Totals: The total number of students in each major.
- Column Totals: The total number of students for each gender.
- Grand Total: The total number of students in the dataset.
PART FOUR:
Constructing a Box Plot:
- Using Excel, create a box plot (also known as a box-and-whisker plot) for the “Grade” variable. This plot should display the following:
- Minimum (excluding outliers): The lowest value within 1.5 times the interquartile range (IQR) from the lower quartile.
- First Quartile (Q1): The 25th percentile of the data.
- Median (Q2): The 50th percentile or middle value of the data.
- Third Quartile (Q3): The 75th percentile of the data.
- Maximum (excluding outliers): The highest value within 1.5 times the IQR from the upper quartile.
- Outliers: Data points that fall outside the “whiskers” (1.5 times the IQR from the quartiles).
- Identifying Outliers:
- From the box plot, identify any outliers in the data. Outliers are typically shown as individual points beyond the whiskers of the box plot.
- Handling Outliers:
- What are potential reasons for these outliers in the grade data?
- How might these outliers affect the analysis of the data set, including measures of central tendency and variability?
- What methods could you use to handle these outliers? Discuss at least two approaches.
- Discuss the implications of the identified outliers.
- Data Analysis:
- After identifying and handling the outliers, provide a summary of how the central tendency (mean, median) and variability (range, standard deviation) of the grades are affected by the presence or removal of outliers.
- Discuss whether it is appropriate to include or exclude outliers in this analysis, considering the context of the data (e.g., academic performance assessment).
- Visualization and Reporting:
- Include a brief report (100-150 words) summarizing your findings, decisions on handling outliers, and the rationale behind your choices.
