#### **1.3 Game-based assessment (GBA)**

#### *Commercial-off-the-Shelf (COTS) Games: Exploring the Applications of Games for Instruction… DOI: http://dx.doi.org/10.5772/intechopen.103965*

GBAs are designed to measure knowledge, skills, or abilities (KSAs) within the context of the game [21–23]. They are defined as evaluations that use game elements to immerse the individual in a specific game environment, allowing them to interact with it and demonstrate the desired KSAs [2]. Importantly, GBAs typically use game activities as tasks to generate evidence of complex skills [24]. Using a GBA with strong psychometric evidence (e.g., reliability and validity evidence for the intended use) has the potential to provide a number of benefits. Some researchers have proposed that GBAs might be designed to generate positive outcomes in test-takers, such as reduced test anxiety, greater difficulty in faking, and more behavior-based measurement [3, 7–9, 16, 25, 26]. In the context of work, some researchers have referred to GBA as "an assessment method in which job candidates are players participating in a core gameplay loop while trait information is inferred" [9].

While traditional assessments use participants' responses to textual or graphic prompts to collect data about their knowledge, skills, and abilities, GBAs gather these data from the test-taker's in-game behaviors [27]. These behaviors span a range of information, including overt information such as the choice a player makes when faced with a discrete decision in the game. A theory-driven assessment builds the design and intent of the evaluative components into the assessment around a theoretical research model; these components may take the form of a narrative or a series of decisions around targeted behaviors related to the variable being assessed. An alternative to the theory-driven approach is a data-driven approach, in which minor behaviors during gameplay are used to predict the outcomes of interest. A data-driven GBA is not built with a targeted construct and measurement in mind [28–30] but can empirically measure a construct using *trace data* such as mouse clicks, movements, interactions with objects in games, or time spent on a task [28]. This *trace data* can be collected automatically by a game and has been shown to provide meaningful information for assessing players [28]. Either type of GBA changes how assessments are traditionally administered while retaining the psychometric properties needed to evaluate a variety of KSAs within the game [9, 31].
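As a concrete illustration of the *trace data* idea, the sketch below aggregates a raw event stream into per-player features (click counts, time on task) that a data-driven model could later relate to an outcome of interest. The event format and field names are hypothetical, not drawn from any specific GBA platform:

```python
from dataclasses import dataclass

# Hypothetical trace event: one low-level action logged by the game.
# Field names are illustrative, not from any specific GBA system.
@dataclass
class TraceEvent:
    t: float      # seconds since session start
    kind: str     # e.g., "click", "task_start", "task_end"
    target: str   # object or task the action involved

def extract_features(events: list[TraceEvent]) -> dict[str, float]:
    """Aggregate a raw event stream into per-player features."""
    clicks = [e for e in events if e.kind == "click"]
    starts = {e.target: e.t for e in events if e.kind == "task_start"}
    durations = [e.t - starts[e.target]
                 for e in events if e.kind == "task_end" and e.target in starts]
    return {
        "n_clicks": float(len(clicks)),
        "n_events": float(len(events)),
        "mean_task_time": sum(durations) / len(durations) if durations else 0.0,
    }

# A tiny fabricated session log: one repair task with two clicks.
log = [
    TraceEvent(0.0, "task_start", "repair"),
    TraceEvent(1.2, "click", "wrench"),
    TraceEvent(3.5, "click", "bolt"),
    TraceEvent(9.0, "task_end", "repair"),
]
print(extract_features(log))
```

In a data-driven GBA, feature vectors like these would be fed to a predictive model rather than scored against a predefined construct.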

There are several benefits of GBAs that make them particularly appealing to organizations that might use them for personnel selection. One is the suggestion that GBAs may be able to predict job performance beyond traditional selection methods [16, 32]. Given the longstanding concern of applicant faking, the potential to reduce socially desirable responding is another benefit that GBAs may provide to organizational assessments [16, 32]. Since GBAs can be designed to measure traits and behaviors indirectly, they have the potential to obscure the purpose of the assessment, which could in turn make it more challenging for candidates to identify the variable being assessed and what a good answer would look like [32]. A final benefit that may be of particular interest to organizations is that GBAs have been shown to reduce adverse impact compared with traditional paper-and-pencil tests [27, 33, 34]. Because of these benefits, organizations may want to consider investing in GBAs to evaluate different characteristics in the workplace, such as cognitive ability, individual characteristics, or job skills [27, 35].

#### **1.4 Commercial off-the-shelf (COTS) games**

Given the time, money, and expertise needed to develop game-based interventions, there are contexts in which it is more reasonable for researchers or practitioners to use an existing COTS game for their purpose rather than invest the time and resources to develop their own game [3, 27, 36]. However, a critical consideration when choosing to use a COTS game is that these games are rarely designed for the purpose they will be used for. Although a growing number of vendors are developing COTS games for learning or assessment purposes and building a body of evidence to support those uses, in most contexts researchers and practitioners will not have the option of a COTS game designed for their intended purpose. This makes the consideration and selection of the COTS game a critical step.

If a COTS game is being used for assessment or evaluation, the behaviors displayed by the player must be measurable, either through a metric captured in the game or through an observable behavior that can be recorded by an observer. Additionally, the content needs to present a scenario in which the player has the opportunity to demonstrate behavior relevant to the variable being measured. For example, see the series of studies in [27], where VR COTS games were used to measure spatial recognition variables.

If a COTS game is being used for a learning application, the game needs to present the content the player is meant to learn. This includes presenting the information and, ideally, a context in which the relevant information can be used or the intended skill can be practiced. When selecting a COTS game, it is also valuable to avoid games with excessive information or components that are not relevant to the learning experience, also called *seductive details* [37].

Research on traditional assessments focuses heavily on the psychometric reliability and validity of the instrument [9], but empirical support for the psychometric properties of COTS games is still in its infancy. This lack of research on the efficacy and utility of COTS games for instructional and assessment purposes motivated the current series of studies, which are intended to provide preliminary evidence on the use of COTS game scores. In these studies, we ask the following research question.

**Research question:** *Will scores on COTS games significantly converge with traditional multiple-choice assessment scores?*

Below we detail four correlational studies focused on the application of COTS games. We seek to better understand potential applications of COTS games as learning or assessment interventions by examining the similarity between scores produced in a COTS game and scores on a traditional multiple-choice assessment. Each of the COTS games in the studies below was carefully considered and chosen because it represents content that could be relevant to learners in a GBL context. Since the scores produced by COTS games are not intended for learning or assessment purposes, our question is whether these scores are meaningfully related to a relevant measure in these particular cases: scores on a traditional multiple-choice assessment in the same content area. We caveat that this evidence does not generalize to all COTS games, and individual data would need to be collected and analyzed if a COTS game is being considered for a similar application. However, this series of studies does provide a general proof-of-concept that may aid future applications of COTS games.
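The convergence asked about in the research question is typically quantified as a correlation between each participant's COTS game score and multiple-choice score. A minimal sketch using a hand-rolled Pearson correlation and fabricated scores (not data from the studies below):

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Fabricated example: in-game scores vs. number of correct MC items.
game_scores = [1200.0, 950.0, 1500.0, 700.0, 1100.0]
mc_scores = [18.0, 14.0, 22.0, 11.0, 16.0]
print(round(pearson_r(game_scores, mc_scores), 3))
```

In practice, the significance of such a coefficient would also be tested (e.g., with `scipy.stats.pearsonr`), and sample size determines how large a correlation must be to reach significance.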

### **2. Research series**

Data from the first three studies were collected as part of other projects and analyzed for this research question. All four studies collected samples from undergraduate and graduate students at universities in the Western United States, with approval from the associated Institutional Review Board. Each study reports demographic information, scores from the COTS game used, and scores from the multiple-choice assessment. An image of the COTS game used in each study is shown in **Figure 1**. Each multiple-choice assessment included four response options, one correct answer, and good metrics based on a pilot sample (i.e., reasonable difficulty between 0.30 and 0.90 and item discrimination of *r* > 0.25). Each assessment was developed using research-based principles [37, 38]. Lastly, each study collected the participant's tendency to play videogames using the Video Game Pursuit scale (VGPu) [39]. Scores on this 19-item measure are reported on a 5-point scale from 1 = strongly disagree to 5 = strongly agree. This measure was used because researchers have proposed that a propensity toward playing video games may impact the results of videogame interventions [40].

#### **Figure 1.**

*Images from the commercial-off-the-shelf (COTS) games used in Study 1 (Quintet, top left), Study 2 (Arm Surgery 2, top right), Study 3 (PC Building Simulator, bottom left), and Study 4 (Car Mechanic Simulator, bottom right).*
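The pilot screening criteria for the multiple-choice items (difficulty between 0.30 and 0.90, discrimination *r* > 0.25) can be computed from a matrix of 0/1 scored responses. The sketch below uses the proportion-correct p-value for difficulty and an item-total correlation for discrimination; the exact discrimination index the authors used is an assumption:

```python
import math

def item_difficulty(responses: list[int]) -> float:
    """Proportion of test-takers answering the item correctly (p-value)."""
    return sum(responses) / len(responses)

def item_discrimination(responses: list[int], totals: list[float]) -> float:
    """Correlation between item score (0/1) and total test score."""
    n = len(responses)
    mi, mt = sum(responses) / n, sum(totals) / n
    cov = sum((r - mi) * (t - mt) for r, t in zip(responses, totals))
    si = math.sqrt(sum((r - mi) ** 2 for r in responses))
    st = math.sqrt(sum((t - mt) ** 2 for t in totals))
    return cov / (si * st)

# Fabricated pilot data: rows = test-takers, columns = items (1 = correct).
matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
totals = [float(sum(row)) for row in matrix]
item0 = [row[0] for row in matrix]
print(item_difficulty(item0))  # 4 of 5 correct -> 0.8
```

Items falling outside the stated bounds (too easy, too hard, or weakly related to the total score) would be revised or dropped before the main studies.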

#### **2.1 Study 1**

Participants were 385 students; primarily female (51%) and Caucasian (73%), with an average age of almost 19 years (*SD* = 1.72). In the study, participants completed a consent form, then an initial survey measuring demographic information and video game pursuit. Participants then played the assigned game for 20 minutes, followed by a second questionnaire that included a multiple-choice assessment. The study concluded with a four-minute debriefing video.

#### *2.1.1 COTS game*

In the videogame *Quintet*, participants play as crew members of a spaceship (i.e., captain, helm, tactical, engineer, or scientist) who must work with a team to complete various science missions. Participants must learn their role and how to manage their responsibilities on the ship to earn points and meet the mission objectives.

#### *Computer Game Development*

#### *2.1.2 Multiple-choice assessment*

A 26-item multiple-choice assessment was written for this study, with questions linked to the specific missions, roles, and responsibilities in the game.

#### **2.2 Study 2**

Participants were 140 students; primarily female (69%) and mostly Caucasian (39%) or Hispanic (29%), with an average age of almost 24 years (*SD* = 4.85). In the study, participants completed a consent form, then played the assigned game for 15 minutes, followed by a questionnaire with the multiple-choice assessment, demographic questions, and the measure of videogame pursuit. The study concluded with a short debriefing statement.

#### *2.2.1 COTS game*

The videogame *Arm Surgery 2* takes place in a virtual operating room where players take on the role of a surgeon. The game begins with a tutorial, in which a nurse guides the player through the surgical techniques and medical instruments. Players then must complete a surgery to repair a broken arm as quickly as possible and with as few mistakes as possible.

#### *2.2.2 Multiple-choice assessment*

A 17-item multiple-choice assessment was written for this study. All questions pertained to the surgical terms, tools, and procedures presented during the game.

#### **2.3 Study 3**

Participants were 100 students; mostly female (67%) and of Caucasian (40%) or Hispanic (32%) descent, with an average age of about 23 years (*SD* = 6.28). In the study, participants signed a consent form, then reviewed an 8-page tutorial on how to play the game. Participants then played the assigned game for 30 minutes before completing a survey of demographic questions, videogame pursuit, and a multiple-choice assessment. Participants were then given a debriefing handout and dismissed.

#### *2.3.1 COTS game*

In the game *PC Building Simulator*, players take on the role of a PC repair shop owner. They must manage their shop by diagnosing and repairing computers to make money for their store.

#### *2.3.2 Multiple-choice assessment*

An 18-item multiple-choice assessment was written for this study. All questions related to the tasks and terminology used in the game for repairing PCs.


#### **2.4 Study 4**

Participants were 78 students; mostly female (70%) and of Hispanic (27%) or Asian (28%) descent, with an average age of about 23 years (*SD* = 6.28). In the study, participants completed a consent form, then played the assigned game for 15 minutes. The study ended with a survey of demographic measures, videogame pursuit, and a multiple-choice assessment.

#### *2.4.1 COTS game*

In the game *Car Mechanic Simulator 2*, players take on the role of a mechanic to repair, paint, and tune cars in a garage. Their goal is to make money by repairing as many cars as possible.

#### *2.4.2 Multiple-choice assessment*

A 10-item multiple-choice assessment was written for this study. All questions asked about the tools and types of repairs done in the game.
