Statistical Report 2


 

Payton McCarthy

Professor Davis

MTH 332

13 February, 2020

 

Analysis of Who Plays Video Games

 

Abstract:  This study examines data from a random survey of students at the University of California, Berkeley, who were enrolled in Statistics 2 during the Fall of 1994.  The survey records the students' reported frequency of video game play, whether the students like to play video games or not and why, and a collection of general information about the students.

 

Intro and Background:   An exam was given the week before the survey was taken.  Of the 314 students who took the exam, 95 were chosen at random for the survey, and 91 of them responded.  The objective of this study is to explore the responses of the students in the survey with the intent of providing useful information to others.

 

Methods:  This study uses the computer program RStudio to make statistical calculations about the dataset.  After downloading the data 'video.data' from the course website, use this code (the dplyr and ggplot2 packages are loaded first, since filter(), pull(), and ggplot() are used below)

library(dplyr)

library(ggplot2)

data <- read.table("video.data", header = TRUE, sep = "")

to label the dataset as ‘data’.  Then I quickly labeled all the constants given to us,

N <- 314

n <- 91

nonrespondents <- 4

Then I separated the data based on the first column, "time": the number of hours each student played in the week prior to the survey.  The time values were split into two lists, one containing the values for the students who did play in the last week, and one containing the values for all students.  This was done so that statistics could be computed for each group separately.  Do this by using

stu_DidPlay <- filter(data, time != 0)

stu_DidPlay_values <- pull(stu_DidPlay, var = "time")

 

values_time <- pull(data, var = "time")

The fraction of students who did play can be calculated by

fraction_DidPlay <- length(stu_DidPlay_values) / n
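The same filter-and-pull step can also be done in base R without dplyr; a minimal sketch on a small hypothetical stand-in for the real data frame:

```r
# Base-R equivalent of the dplyr filter()/pull() step above,
# shown on a hypothetical data frame (not the real video.data):
toy <- data.frame(time = c(0, 2, 0, 5.5))
played_values <- toy$time[toy$time != 0]    # hours for students who did play
frac <- length(played_values) / nrow(toy)   # fraction who played: 0.5 here
```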

Now all the descriptive statistics can be computed directly:

x_DidPlay <- mean(stu_DidPlay_values)

x_time <- mean(values_time)

 

Median_DidPlay  <- median(stu_DidPlay_values)

Median_time <- median(values_time)

 

SD_DidPlay <- sd(stu_DidPlay_values)

SD_time <- sd(values_time)

 

IQR_DidPlay  <- IQR(stu_DidPlay_values)

IQR_time <- IQR(values_time)

 

min_DidPlay  <- min(stu_DidPlay_values)

min_time <- min(values_time)

max_DidPlay  <- max(stu_DidPlay_values)

max_time <- max(values_time)

Next, the z-scores, skewness, and kurtosis are found for both groups.  This is done by calculating the z-score for each value in the two lists (students who did play, and all students):

z_DidPlay  <- (stu_DidPlay_values - mean(stu_DidPlay_values)) / sd(stu_DidPlay_values)

z_time <- (values_time - mean(values_time)) / sd(values_time)

These new lists consist of the z-scores of the time values for both groups.  Now the skewness and kurtosis can be calculated by:

skewness_DidPlay <- sum(z_DidPlay ^ 3) / (length(z_DidPlay) * (sd(stu_DidPlay_values)) ^ 3)

skewness_time <- sum(z_time ^ 3) / (length(z_time) * (sd(values_time)) ^ 3)

kurtosis_DidPlay <- sum(z_DidPlay ^ 4) / (length(z_DidPlay) * (sd(stu_DidPlay_values)) ^ 4)

kurtosis_time <- sum(z_time ^ 4) / (length(z_time) * (sd(values_time)) ^ 4)
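For reference, the conventional moment-based definitions of skewness and kurtosis (the third and fourth standardized moments) divide the summed z-score powers by n alone; a base-R sketch on a hypothetical vector of hours:

```r
# Conventional standardized-moment skewness and kurtosis in base R,
# illustrated on a hypothetical vector (not the survey data):
x <- c(2, 0.5, 4, 1, 14, 30, 2, 1)
z <- (x - mean(x)) / sd(x)
skew <- sum(z^3) / length(z)   # third standardized moment
kurt <- sum(z^4) / length(z)   # fourth standardized moment (about 3 for a Normal)
```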

We now want an approximate 95% confidence interval for the mean of each dataset (mean plus or minus two standard errors).  I did so for both datasets:

lower_2SDinterval_DidPlay <- x_DidPlay - 2 * SD_DidPlay / sqrt(length(stu_DidPlay_values))

upper_2SDinterval_DidPlay <- x_DidPlay + 2 * SD_DidPlay / sqrt(length(stu_DidPlay_values))

 

lower_2SDinterval_time <- x_time - 2 * SD_time / sqrt(n)

upper_2SDinterval_time <- x_time + 2 * SD_time / sqrt(n)
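As a cross-check, base R's t.test() computes a 95% confidence interval for a mean directly from the t distribution.  A sketch on a hypothetical vector of hours (in the report, stu_DidPlay_values or values_time would take its place):

```r
# t-based 95% confidence interval for a mean via base R's t.test();
# the vector here is hypothetical example data, not the survey data:
hours <- c(2, 0.5, 4, 1, 14, 30, 2, 1)
ci <- t.test(hours)$conf.int   # lower and upper bounds of the 95% CI
```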

 

The next step was to display the important statistical information visually, to further aid in understanding the results.  The first visualization overlays the Normal curves of the two datasets so they can be compared easily; this was done using this code:

ggplot(data = data.frame(stu_DidPlay_values = c(0, 30)),

       mapping = aes(x = stu_DidPlay_values)) +

    stat_function(mapping = aes(colour = "Students Who Did Play"),

                  fun = dnorm,

                  args = list(mean = mean(stu_DidPlay_values),

                              sd = sd(stu_DidPlay_values))) +

    stat_function(mapping = aes(colour = "All Students"),

                  fun = dnorm,

                  args = list(mean = mean(values_time),

                              sd = sd(values_time))) +

    scale_colour_manual(values = c("blue", "red")) +

    labs(x = "Hours Played Last Week",

         y = "Density",

         title = "Normal Curves for Hours Played Last Week Between Groups")

 

Next I looked at the second column in 'video.data', labeled "like".  This tells us how much each student likes to play: 1 = never played; 2 = very much; 3 = somewhat; 4 = not really; 5 = not at all.  It was important to filter out the rows where a 99 (the code for a missing response) was recorded, as these would skew the results:

stu_like <- filter(data, like != 99)

stu_like_values <- pull(stu_like, var = "like")

The descriptive stats for this dataset were found by,

x_stu_like_values <- mean(stu_like_values)

Median_stu_like_values <- median(stu_like_values)

SD_stu_like_values <- sd(stu_like_values)

IQR_stu_like_values <- IQR(stu_like_values)

min_stu_like_values <- min(stu_like_values)

max_stu_like_values <- max(stu_like_values)

 

z_stu_like_values <- (stu_like_values - mean(stu_like_values)) / sd(stu_like_values)

 

skewness_stu_like_values <- sum(z_stu_like_values ^ 3) / (length(z_stu_like_values) * (sd(stu_like_values)) ^ 3)

kurtosis_stu_like_values <- sum(z_stu_like_values ^ 4) / (length(z_stu_like_values) * (sd(stu_like_values)) ^ 4)

 

Then a second figure was made: another Normal-curve graph, showing the distribution of how much students like to play:

ggplot(data = data.frame(stu_like_values = c(1, 5)),

       mapping = aes(x = stu_like_values)) +

    stat_function(mapping = aes(colour = "Who Likes to Play"),

                  fun = dnorm,

                  args = list(mean = mean(stu_like_values),

                              sd = sd(stu_like_values))) +

    labs(x = "How Much Students Like to Play",

         y = "Density",

         title = "Normal Curve for How Much Students Like to Play")

This kind of graph is useful for displaying the mean and standard deviation simply, and the skewness and kurtosis can be seen and compared at a glance.

 

The next step was to look at the frequency of play; that is, how often do these students normally play video games?  1 = daily; 2 = weekly; 3 = monthly; 4 = semesterly.  Again the 99 codes are filtered out:

freq_play <- filter(data, freq != 99)

freq_play_values <- pull(freq_play, var = "freq")

Descriptive stats,

x_freq_play_values <- mean(freq_play_values)

Median_freq_play_values <- median(freq_play_values)

SD_freq_play_values <- sd(freq_play_values)

IQR_freq_play_values <- IQR(freq_play_values)

min_freq_play_values <- min(freq_play_values)

max_freq_play_values <- max(freq_play_values)

 

z_freq_play_values <- (freq_play_values - mean(freq_play_values)) / sd(freq_play_values)

 

skewness_freq_play_values <- sum(z_freq_play_values ^ 3) / (length(z_freq_play_values) * (sd(freq_play_values)) ^ 3)

kurtosis_freq_play_values <- sum(z_freq_play_values ^ 4) / (length(z_freq_play_values) * (sd(freq_play_values)) ^ 4)

 

Finally, a Normal curve was made for the students' reported frequency of play,

ggplot(data = data.frame(freq_play_values = c(1, 4)),

       mapping = aes(x = freq_play_values)) +

    stat_function(mapping = aes(colour = "How Often Will Play"),

                  fun = dnorm,

                  args = list(mean = mean(freq_play_values),

                              sd = sd(freq_play_values))) +

    labs(x = "How Often Students Will Play",

         y = "Density",

         title = "Normal Curve for Frequency of Play")

 

Results:  Here is a table of all the relevant descriptive statistics described above.

Table:

Statistic     Hours Played (Did Play)   Hours Played (All Students)   Like to Play (1-5)   Frequency of Play (1-4)
Mean                  3.32647                   1.24286                    3.02222               2.70513
Median                2                         0                          3                     3
Std. Dev.             5.63616                   3.77704                    0.87381               1.02068
IQR                   1                         1.25                       1                     2
Minimum               0.1                       0                          1                     1
Maximum              30                        30                          5                     4
Z-3                   --                        --                         0.40079               --
Z-2                   --                        --                         1.27460               0.66377
Z-1                   --                        --                         2.14841               1.68445
Z0                    3.32647                   1.24286                    3.02222               2.70513
Z1                    8.96263                   5.01990                    3.89603               3.72581
Z2                   14.59879                   8.79694                    4.76984               --
Z3                   20.23495                  12.57398                    --                    --
Skewness              0.01942                   0.10566                    0.83539               0.11051
Kurtosis              0.01539                   0.19561                    5.25736               1.75250

(Z-score rows are left blank where the value mean + k * SD would fall outside the variable's possible range.)

 

The fraction of students who played video games the week before the survey is about 37.4% (34 of 91).

The estimated 95% confidence interval for the mean number of hours played by students who did play the week before is approximately (1.39, 5.26), and for all students it is approximately (0.45, 2.03).  A look at the Normal curves for these two groups helps to paint a clearer picture.

It is impossible for a student to play a negative number of hours, so if an interval endpoint ever fell below zero it would be truncated at zero; here both lower bounds are positive.

We can compare this with the reported frequency of play and see that students lean toward 3 more than 2; that is, they play closer to monthly than weekly.  According to the table, students have an average frequency-of-play code of about 2.7, which corresponds to a sample average of 1.25 hours of play in the week before the survey across all students, and an average of 3.3 hours across the students who actually played.

If we look at the Normal curve for how much students actually like to play, it appears far more standard.  This suggests that the hours played in the week prior to the survey may have been reduced by the exam given that week.

 

Discussion and Conclusion:  In this final part, we can look at the "attitude" responses from the students.  By counting, we can see that 23/90 of the reporting students 'very much' like to play video games, 46/90 'somewhat' like to play, and 21/90 are 'not really' or 'not at all' into playing video games.
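Counts like these can be tallied with base R's table(); a sketch using hypothetical 'like' codes (2 = very much, 3 = somewhat, and so on), not the survey data:

```r
# Tallying Likert-style response codes with table(),
# on hypothetical responses rather than the real survey data:
like_demo <- c(2, 3, 3, 2, 4, 3, 5, 2)
counts <- table(like_demo)            # count of each response code
props <- counts / length(like_demo)   # fraction giving each response
```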

The reasons vary.  Some 72 participants gave reasons why they do like to play video games; most reported playing to relax (66/72 = 91.7%).  Many also reported secondary or even tertiary reasons: 38.9% like the 'feeling of mastery', 37.5% play because they are 'bored', 36.1% play for the 'graphics/realism' that video games offer, and 33.3% like the 'mental challenge' that video games can present.

Some 83 participants also reported why they do not like to play video games.  The two most common reasons were that it takes up 'too much time' (57.8%) and that it 'costs too much' (48.2%), followed by feelings of it being 'pointless' (39.8%) and 'frustrating' (31.3%).
