# An exercise on hypothesis testing, factor analysis and clustering analysis

This is an exercise on hypothesis testing, factor analysis and clustering analysis.
Please download the data “Movie Preference.csv”. The variables included are:

• Column 1: StudentId (respondent identification number, 1, 2, 3, …)
• Column 2: Master (=1 if a master student, =0 if an undergraduate student)
• Column 3: Female (=1 if a female, =0 if a male)
• Column 4: Foreign (=1 if from a foreign country, =0 if from US)
• Columns 5-11: importance ratings of the 7 movie attributes (on a 7-point scale, where 1 means not important at all and 7 means highly important)

M1: I can relate to the characters
M2: The movie is visually pleasing
M3: Set and costume-design are an important part of a movie
M4: Movie features major stars
M5: Movie has first-rate special effects
M6: Engaging story-line
M7: I feel “transported” while watching
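Before starting the analysis, it can help to confirm that the file you read in matches the column layout listed above. The sketch below is only an illustration: `check_movie_data` is a hypothetical helper, the column names are assumed to match the list exactly (the header in the real CSV may differ), and the data frame is a randomly generated stand-in so that the code runs without the file.

```r
# Hypothetical helper: check that a data frame follows the layout described above.
# Assumes the first 11 column names are exactly StudentId, Master, Female, Foreign, M1..M7.
check_movie_data <- function(df) {
  rating_cols <- paste0("M", 1:7)
  expected <- c("StudentId", "Master", "Female", "Foreign", rating_cols)
  stopifnot(
    identical(names(df)[1:11], expected),                            # column order and names
    all(unlist(df[c("Master", "Female", "Foreign")]) %in% c(0, 1)),  # dummies are 0/1
    all(unlist(df[rating_cols]) %in% 1:7)                            # ratings on the 7-point scale
  )
  invisible(TRUE)
}

# With the real file you would read it in first, for example:
# movie <- read.csv("Movie Preference.csv")

# Synthetic stand-in (NOT the real data), used only so the check can be run here.
set.seed(12345)
n <- 50
movie <- data.frame(
  StudentId = seq_len(n),
  Master    = rbinom(n, 1, 0.5),
  Female    = rbinom(n, 1, 0.5),
  Foreign   = rbinom(n, 1, 0.3),
  matrix(sample(1:7, n * 7, replace = TRUE), ncol = 7,
         dimnames = list(NULL, paste0("M", 1:7)))
)

check_movie_data(movie)  # stops with an error if the layout is wrong
str(movie)
```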

1. Run the factor analysis on the importance ratings of the 7 movie attributes (M1-M7).
   a. Based on the “eigenvalue >1” criterion, how many factors would you extract? Show the R output to support your choice.
   b. Report the factor loadings. Please name the factors and provide your reasoning.
   c. Save the factor scores as new columns in the movie dataset you read in, and show the first 5 rows of the data (i.e., StudentID 1-5).
   d. Plot all the respondents (i.e., students) based on the factor scores of the two most important factors (i.e., a perceptual map of all the respondents).
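One possible way to work through the factor analysis steps in base R is sketched below. It is not necessarily the method intended by the course: it uses principal-component extraction from the correlation matrix with a varimax rotation, whereas `factanal()` or `psych::principal()` would be reasonable alternatives. The data are a synthetic stand-in, and the way the simulated items are grouped is arbitrary and says nothing about the real data. The column names `Factor1`, `Factor2`, ... are assumptions.

```r
# Synthetic stand-in for "Movie Preference.csv" (NOT the real data); only the columns used here.
set.seed(12345)
n <- 120
visual <- rnorm(n)  # hypothetical latent trait, for illustration only
story  <- rnorm(n)  # hypothetical latent trait, for illustration only
to_rating <- function(x) pmin(7, pmax(1, round(4 + 1.5 * x)))  # map to the 1-7 scale
movie <- data.frame(
  StudentId = seq_len(n),
  M1 = to_rating(0.8 * story  + 0.6 * rnorm(n)),
  M2 = to_rating(0.8 * visual + 0.6 * rnorm(n)),
  M3 = to_rating(0.8 * visual + 0.6 * rnorm(n)),
  M4 = to_rating(0.8 * visual + 0.6 * rnorm(n)),
  M5 = to_rating(0.8 * visual + 0.6 * rnorm(n)),
  M6 = to_rating(0.8 * story  + 0.6 * rnorm(n)),
  M7 = to_rating(0.8 * story  + 0.6 * rnorm(n))
)

items <- paste0("M", 1:7)
R <- cor(movie[items])

# Eigenvalues of the correlation matrix; keep the factors with eigenvalue > 1.
ev <- eigen(R)
n_factors <- sum(ev$values > 1)
print(round(ev$values, 3))
cat("Factors with eigenvalue > 1:", n_factors, "\n")

# Principal-component loadings: eigenvectors scaled by sqrt(eigenvalue).
k <- seq_len(n_factors)
raw_load <- ev$vectors[, k, drop = FALSE] %*% diag(sqrt(ev$values[k]), nrow = n_factors)
dimnames(raw_load) <- list(items, paste0("Factor", k))

# Varimax rotation only makes sense with two or more factors.
loadings_rot <- if (n_factors > 1) unclass(varimax(raw_load)$loadings) else raw_load
print(round(loadings_rot, 2))
print(round(colSums(loadings_rot^2), 2))  # variance accounted for by each rotated factor

# Regression-method factor scores, saved as new columns.
scores <- scale(movie[items]) %*% solve(R, loadings_rot)
movie <- cbind(movie, scores)
print(head(movie, 5))

# Perceptual map of respondents on the first two factors (if at least two were kept).
if (n_factors >= 2) {
  plot(movie$Factor1, movie$Factor2, xlab = "Factor 1", ylab = "Factor 2",
       main = "Respondents by factor scores")
  text(movie$Factor1, movie$Factor2, labels = movie$StudentId, pos = 3, cex = 0.6)
  abline(h = 0, v = 0, lty = 2)
}
```

Naming the factors is a judgment based on which items load highly on each column of the rotated loading matrix, so that part has to come from the real output rather than from this simulated example.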
2. Run the K-means clustering on the factor scores obtained from 1c, to create three clusters.

Note: Set the R random seed with “set.seed(12345)” before you run the clustering analysis. This is simply so that I can replicate your results.

   a. Save the cluster indicator as an extra column to the movie dataset. How many respondents are classified into each of the three clusters?
   b. Which factor(s) are the important ones in forming the three clusters? Provide the R output as your support.
   c. Describe the differences between the three clusters on the important factors identified in the previous step. Please describe the characteristics of each cluster and provide your reasoning. Provide the R output as your support.
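The clustering steps could be approached along the following lines. This is a minimal sketch under stated assumptions: the factor scores are simulated stand-ins (NOT the real scores), the column names `Factor1`, `Factor2` and `Cluster` are hypothetical, and a one-way ANOVA of each factor on the cluster indicator is just one common way to judge which factors separate the clusters.

```r
# Stand-in for the movie data with saved factor scores (simulated, NOT the real data).
set.seed(12345)
n <- 120
movie <- data.frame(
  StudentId = seq_len(n),
  Factor1 = rnorm(n),
  Factor2 = rnorm(n)
)
score_cols <- c("Factor1", "Factor2")

# Set the seed right before clustering, as the note requests, then fit three clusters.
# nstart is left at its default; raising it changes which random starts are used.
set.seed(12345)
km <- kmeans(movie[score_cols], centers = 3)

# Save the cluster indicator and count respondents per cluster.
movie$Cluster <- km$cluster
print(table(movie$Cluster))

# One-way ANOVA of each factor score on cluster membership:
# a large F statistic suggests that factor differs strongly across clusters.
for (f in score_cols) {
  cat("\nANOVA for", f, "\n")
  print(summary(aov(movie[[f]] ~ factor(movie$Cluster))))
}

# Cluster means on each factor, to describe each cluster's profile.
print(aggregate(movie[score_cols], by = list(Cluster = movie$Cluster), FUN = mean))
```

The cluster means in the last table are what the description of each cluster would be based on: a cluster with a high mean on a factor places more importance on the attributes that load on it.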
3. Hypothesis testing

For the important factor(s) identified in 2b, please select a proper test (e.g., correlation test, t-test, chi-square test, ANOVA, …) to test (a) and (b).
For each question, please write down H0 (null hypothesis) and H1 (alternative hypothesis), and report your test results, conclusions, and reasoning.

   a. Whether the cluster indicator is significantly different between undergraduate and master students
   b. Whether the factor scores obtained from 1c are significantly different between female and male students
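As a rough illustration of how the two tests might be set up in R, the sketch below treats (a) as a comparison of two categorical variables and (b) as a comparison of a numeric score between two groups. The choice of a chi-square test and a two-sample t-test is one plausible reading, not the only defensible one. The data are simulated stand-ins (NOT the real data), and the column names `Factor1` and `Cluster` are assumptions.

```r
# Simulated stand-in with only the columns the tests need (NOT the real data).
set.seed(12345)
n <- 120
movie <- data.frame(
  Master  = rbinom(n, 1, 0.5),
  Female  = rbinom(n, 1, 0.5),
  Factor1 = rnorm(n),
  Cluster = sample(1:3, n, replace = TRUE)
)

# (a) Cluster indicator vs. student type: both are categorical.
# H0: cluster membership is independent of being a master student.
# H1: cluster membership depends on being a master student.
tab <- table(Cluster = movie$Cluster, Master = movie$Master)
print(tab)
chi <- chisq.test(tab)
print(chi)

# (b) Factor score vs. gender: numeric outcome, two groups.
# H0: the mean factor score is the same for female and male students.
# H1: the mean factor score differs between female and male students.
tt <- t.test(Factor1 ~ Female, data = movie)  # Welch two-sample t-test by default
print(tt)

# Reject H0 at the 5% level when the p-value is below 0.05.
cat("Chi-square p-value:", round(chi$p.value, 4), "\n")
cat("t-test p-value:", round(tt$p.value, 4), "\n")
```

With several important factors, the t-test in (b) would be repeated for each factor score column.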

