Regression

Our academic experts are ready and waiting to assist with any writing project you may have. From simple essay plans, through to full dissertations, you can guarantee we have a service perfectly matched to your needs.

GET A 40% DISCOUNT ON YOU FIRST ORDER

ORDER NOW DISCOUNT CODE >>>> WELCOME40

GB 365 – Business Analytics II Homework Assignment # 1 The purpose of this assignment is to give you practice implementing and interpreting basic regression analyses with real world data. You are expected to submit your own assignment, but please feel free to consult with your classmates. We especially encourage the use of Moodle Forums to discuss the homework. We will monitor these and step in to clarify when confusion seems to be an issue. Submission details  Type your assignment in Word and submit it as a Word or PDF file. This document should include all plots, but regression results can be left in the Excel document.  Include the Excel files showing your work (i.e., formulas, regression results, etc.).  Upload a single zip file containing all of these files by 8:00 pm on Sunday, February 21st. Late homework will not be accepted. Part 1: Getting comfortable with Excel Background You recently graduated and landed your first job as an analyst with Food Fight Group, owner of 17 restaurants and bars here in Madison. The group is thinking of opening an 18th location, a bar located in the Capitol neighborhood. Your boss has heard that reviews can be critical to a bar’s success, especially in that neighborhood, but she wants to better understand to what extend this is true. Specifically she wants to know which aspects of reviews are most impactful. Your boss turns to you, as the newly hired analyst, and asks, “How can we measure the impact of reviews on the company’s success?” Realizing that Yelp’s data are all available online, you assemble a dataset containing some basic information on all businesses that Yelp has identified as “Bars” and being in the “Capitol” neighborhood. Specifically, you collect the following fields: Variable Definition Business_name The name of the restaurant. reviews Number of reviews AvgRating Average rating of the restaurant. AvgReviewLength Average length of the review (in number of words). TotCheckins Number of check‐ins Let’s analyze these data and see what the data tell us about the role of customer reviews!   1. Open the “Homework 1 – Data Part 1.csv” file in Excel. You should see each of the five columns listed above. NOTE: When saving your document for the first time, make sure that you “Save as” an .xls file (rather than a .csv file) in order to preserve your work! GB 365 – Business Analytics II 2. This is the data you have at your disposal. Think about possible relationships between these variables. What relationships could be of interest to you as an analyst? Why? Don’t just list the variables between which you ultimately find relationships. Sometimes, finding out that two variables are unrelated is as informative as finding a strong relationship. In other words, you should be able to answer this before you start plotting anything. 3. One thing you would like to know is whether higher ratings actually bring more customers. So you focus on two variables: AvgRating and TotCheckins. The first step is (and should be, whenever possible):  visualize your data.   a. Make histograms of both AvgRating and TotCheckins. Be sure that the bin sizes are appropriately sized to convey the patterns in the data. b. What do these plots suggest about Yelp reviewers? Does this cause any concern about your sample? c. So, make a scatterplot, with TotCheckins as the Y‐axis (vertical) and AvgRating as the X‐ axis (horizontal). Beware: many a student has submitted graphs with the axes switched! i. Make sure it has a title and both axes are labeled. ii. Increase the size of the points, choose your favorite color and shape. iii. Increase the font size on the axis title, axis labels, and axes.   d. Now, what does the scatterplot tell you about the relationship?   i. Is there a relationship at all? Is it positive or negative? What does it mean to have a positive or negative relationship? Interpret. Use caution: contents may be hot! Correlation does not imply causation!   ii. Is the relationship weak or strong? How do you know?   iii. Let’s make a first attempt to quantify the relationship: add a linear trend line.   1. What information about the variables does this line convey? 2. How many check‐ins are expected for a restaurant with 3 stars? 3. How many check‐ins are expected for a restaurant with 4 stars? 4. How many more check‐ins do we expect for 4 vs. 3 stars? 4. If you did everything correctly, there should be one point on the scatterplot that seems unusual.   a. Which point is it? Specify its coordinates, i.e. the value of TotCheckins and AvgRating. What is unusual about this point?   b. What is the name of the restaurant which is represented by this point?   c. What should you do about this point: throw it out or keep it? Why? Discuss advantages and disadvantages of both approaches.   5. We have analyzed the relationship between the number of check‐ins and the average rating. Can we argue that a higher rating attracts more customers? Why or why not? What are potential issues with that? Discuss.   GB 365 – Business Analytics II Part 2 – Targeting Higher Reviews Background After reviewing your analysis of the relationship between Yelp ratings and check‐ins (from Part 1), your boss has taken some time to read up on how Yelp reviews impact restaurants. She found that a ½ star increase in Yelp ratings is worth 5‐9% of restaurant revenue and could lead to 19% more sellouts (Luca, 2011; Anderson & Magruder, 2012). With this in mind, she is extremely interested in increasing the average Yelp ratings across the Food Fight Group portfolio. However, she isn’tsure exactly how to do this, and she has asked for your insight on what drives average ratings on Yelp. The two of you assume that Yelp ratings generally reflect the quality of the dining experience, and that this is probably driven by food and service quality. However, simultaneously increasing food and service quality across a group of restaurants could be challenging (and expensive), and you’d like to focus on one or the other (to start). Your job is to understand which area the group should focus on first, and why. Importantly, you have not been asked to identify the most cost effective method. You can safely leave it to others to evaluate the associated costs. You only need to identify which is likely to be more effective in terms of raising the Yelp rating and support that decision empirically. Data You start the analysis by collecting data on all 419 bars and restaurants in Madison, WI currently listed on Yelp. You collect a number of attributes, including:   Variable Definition BusinessName The name of the restaurant or bar. Categories A comma separated list of the categories assigned by Yelp (i.e., “Pizza”). PriceRange An indicator of restaurant price tier, categorized into four buckets {1,2,3,4} where 4 is the most expensive. AvgRating The average number of stars assigned by Yelp reviewers. You then start to think carefully about how to quantify food and service quality using only the data at hand. Since most reviews contain a description of the patron’s experience, you decide to mine the text of these reviews for quality indicators. First, you define a set of adjectives (e.g., delicious, tasty, appetizing, scrumptious, yummy, delectable, and mouthwatering) that would be used to compliment positive experiences with food, and then define a similar set (e.g., cheerful, happy, smiling, fast, quick, speedy, rapid, prompt, and immediate) for service. For each restaurant, you calculate the percent of reviews containing these terms, and name these variables PercFoodComp and PercServComp. This data is available in the “Homework 1 – Data Part II.csv” file. Cleaning the Data & Creating New Variables Cleaning the data is an important part of any analysis. Because we are looking at what drives average ratings, you feel that businesses with very few reviews may skew the results. Consequently, drop all businesses with fewer than 10 total reviews.  (Hint: Use the filters in Excel.) GB 365 – Business Analytics II Visualizing Relations
hips As with any good analysis, you start by visualizing the relationships between the key variables. 1) Create histograms of:   a. % of Reviews Complimenting Food b. % of Reviews Complimenting Service c. Average Rating 2) Create the following three scatterplots, add a trendline, the equation for this line, and R2 value, and interpret the results. (Hint: Make the axis scales the same to help when comparing across plots.) a. % of Reviews Complimenting the Food vs. Average Rating b. % of Reviews Complimenting the Service vs. Average Rating c. % of Reviews Complimenting the Food vs. % of Reviews Complimenting the Service   3) Does multicollinearity between PercServComp and PercFoodComp seem to be a concern? Give some argument to justify your response. Running Regressions The plots have given us some valuable insight into the bivariate relationships between service quality, food quality, and average rating, but we want to understand any potential tradeoffs between the two quality measures.   4) Regress AvgRating on the two quality measures. Be sure to include the residual plots. a. Calculate: i. The expected increase in average rating resulting from a 10% increase in the percent of reviews complimenting service. ii. The expected increase in average rating resulting from a 10% increase in the percent of reviews complimenting food. b. What do the p‐values for PercFoodComp and PercServComp tell us? c. What does the R2 value tell you? d. Do the residual plots raise any concerns regarding the potential violations of our regression assumptions? e. At this point, would you recommend that your boss focus on improving food or service quality? Why? As you think about the problem, you realize that whether food or service quality matters more may depend on whether you’re operating a restaurant or a bar. 5) Create an indicator for whether each business is categorized as a restaurant (i.e., whether Categories contains the word “Restaurants”), and interact this variable with the food and service quality variables. All non‐restaurant business in the data are bars. (Hint: Combine the “FIND()” and “ISERR()” functions  to create this indicator). GB 365 – Business Analytics II 6) Regress average ratings on the five quality and restaurant variables (two quality variables, one restaurant indicator, two interactions). a. Calculate: i. The expected increase in average rating for a bar resulting from a 10% increase in the percent of reviews complimenting service. ii. The expected increase in average rating for a bar resulting from a 10% increase in the percent of reviews complimenting food. iii. The expected increase in average rating for a restaurant resulting from a 10% increase in the percent of reviews complimenting service. iv. The expected increase in average rating for a restaurant resulting from a 10% increase in the percent of reviews complimenting food. b. What do the p‐values for each of the five slope coefficients tell us? c. Does this model fit better or worse than our last model? How can you tell? d. How would your recommendation now change based on whether you were trying to improve average ratings for a restaurant or a bar? Prices can also play an important role in the dining experience. While your boss probably won’t change the pricing structure of each restaurant in order to achieve higher ratings, it may still be interesting to understand how average ratings change with price. 7) Regress average ratings on price (and only price) and calculate the following. (Hint 1: price is an ordinal variable. Hint 2: There is a restaurant missing a price that will have to be excluded from the regression.) a. What is the expected average rating for a business in price tier one? b. What is the expected average rating for a business in price tier two? c. What is the expected difference in average rating for a business in tier two vs. tier one? d. What is the expected difference in average rating for a business in tier three vs. tier one? Works Cited Anderson, M., & Magruder, J. (2012). Learning from the Crowd: Regression Discontinuity Estimates of the Effects of an Online Review Database. The Economic Journal, 957‐989. Luca, M. (2011). Reviews, Reputation, and Revenue: The Case of Yelp.com. Working Paper.
=FIND() =ISERR() • Looks for a string (a series of letters or numbers) in a cell. • Returns the number corresponding to the position if the string is in the cell. • Returns an error if the string is not in the cell • This function is case sensitive. That is, “Test” is not the same as “test”. • Returns a value of TRUE if the referenced cell or nested function returns an error. • Returns a value of FALSE if the referenced cell or nested function returns a value (for example, a number). • There are many ways to write a function that accomplishes any task in Excel. Find one that works for you, and don’t worry if it’s not the same as someone else’s. • Remember that you can nest functions. That is, you can put one function inside of another function. • You may also want to use an IF() statement. • You don’t need it, but you can google the LOWER() function. It is usually helpful in this type of situation. • Don’t get your binary variable mixed up! Ask yourself, “if this is a restaurant, do I want a 1 or 0?”

Our academic experts are ready and waiting to assist with any writing project you may have. From simple essay plans, through to full dissertations, you can guarantee we have a service perfectly matched to your needs.

GET A 40% DISCOUNT ON YOU FIRST ORDER

ORDER NOW DISCOUNT CODE >>>> WELCOME40

 

 

Posted in Uncategorized