Please download hotelsat.csv. Hotel management would like to use the insights from your findings to improve its offerings for the right customers.
Q1. Management would like to know whether there is any pattern between how much each customer spend per night (avgRoomSpendPerNight) and overall satisfaction (satOverall) per segment (eliteStatus) as well as per visit purpose (visitPurpose). Please visually explore whether you are observing any pattern. Please briefly interpret the findings.
Q2. Management would like to know whether there is any pattern between how customers rate the city (satCity) and overall satisfaction (satOverall) per segment (eliteStatus) as well as per visit purpose (visitPurpose). Please visually explore whether you are observing any pattern. Please briefly interpret the findings.
Group mean differences (t-test and ANOVA)
Q3. (a) Visualize the distributions of the variables in the hotel satisfaction data with scatterplotMatrix. Based on these distributions, do any variables need to be transformed? If so, which ones? Please create the transformed versions as new variables and add them to the dataset. [Note: if you transform any variables, please use the log-transformed versions wherever those variables are needed in the questions in this block.] (b) Please compute the correlations among the variables (excluding the character variables). Do you see any problem? Please explain.
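As a rough illustration of what a log transform does before correlations are computed, the sketch below uses only the Python standard library. The numbers are made-up toy values, not rows from hotelsat.csv, and the helper name `pearson` is hypothetical. It also assumes every value is strictly positive; a variable containing zeros would need a different transform.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up toy values for illustration only (NOT taken from hotelsat.csv).
distance = [12, 850, 40, 2300, 95, 610, 18, 1500]
room_spend = [110, 180, 95, 240, 120, 160, 100, 210]

# Store the transformed values as new variables next to the originals.
# Assumption: all values are > 0, otherwise math.log raises an error.
log_distance = [math.log(d) for d in distance]
log_room_spend = [math.log(s) for s in room_spend]

r_raw = pearson(distance, room_spend)
r_log = pearson(log_distance, log_room_spend)
print(f"correlation on raw values: {r_raw:.3f}")
print(f"correlation on log values: {r_log:.3f}")
```

Comparing the two coefficients shows how strongly a skewed scale can influence a correlation, which is one reason to inspect the distributions first.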
Q4. Hotel management wants to know whether customers who come from closer locations spend more or less per night on the room (avgRoomSpendPerNight) and on food (avgFoodSpendPerNight), and whether they stay longer or shorter (nightsStayed). Management uses the median of distanceTraveled as the cutoff point: customers whose distanceTraveled is above the median are coded as coming from distant locations, and customers whose distanceTraveled is below the median are coded as coming from close locations. The question is: are there statistically significant differences between customers from close locations (below the median) and customers from distant locations (above the median) in (a) avgRoomSpendPerNight, (b) avgFoodSpendPerNight, and (c) nightsStayed? Please briefly explain. [Hint: you need to convert distanceTraveled into a logical variable.]
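The median split and group comparison described in Q4 can be sketched as follows, again in plain Python with made-up toy values rather than the real data. The helper names `median_split` and `welch_t` are hypothetical. Two assumptions are made that the question does not settle: values exactly equal to the median are placed in the close group, and the groups are compared with Welch's unequal-variance t statistic. The sketch stops at the t statistic and degrees of freedom; a statistics package would normally supply the p-value as well.

```python
import math
import statistics

def median_split(values):
    """Return booleans: True where a value is above the median (distant)."""
    cut = statistics.median(values)
    # Assumption: ties with the median count as "close" (False).
    return [v > cut for v in values]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up toy values for illustration only (NOT taken from hotelsat.csv).
distance = [12, 850, 40, 2300, 95, 610, 18, 1500]
room_spend = [110, 180, 95, 240, 120, 160, 100, 210]

# Logical variable: True = distant location, False = close location.
is_distant = median_split(distance)

far = [s for s, d in zip(room_spend, is_distant) if d]
near = [s for s, d in zip(room_spend, is_distant) if not d]

t, df = welch_t(far, near)
print(f"mean spend (distant): {statistics.mean(far):.2f}")
print(f"mean spend (close):   {statistics.mean(near):.2f}")
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

The same two steps would be repeated with avgFoodSpendPerNight and nightsStayed in place of the room spend values.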