Teaching social skills to students with autism spectrum disorder through augmented, virtual and mixed reality

This systematic literature review was conducted to explore the social validity of augmented reality (AR), virtual reality (VR), and mixed reality (MR) as a means of providing social skill instruction to students with autism spectrum disorder (ASD). Forty-one articles met the inclusion criteria, including five studies utilizing AR and the remaining 36 utilizing VR for social skill interventions. No studies implemented MR. The targeted skills of the studies included emotion recognition, relationship skills, social awareness, cooperation, and executive functioning. The intervention was considered effective in 63% of studies, not effective in 10% of studies, and produced mixed results in 27% of studies. The social validity indicators reported by researchers ranged from two to 14 of 17 determined categories. Findings indicate the primary socially valid reasons for utilizing AR/VR for social skill instruction were high student motivation toward the intervention and a positive attitude toward the technology. Findings indicate that increasing the role of parents, educators, and students as both social skill selectors and treatment agents and adding valid and reliable skill measures may improve the effects of an intervention. Sustainability may increase by providing training to both treatment agents and participants. AR has the potential to improve generalization and VR provides a practice environment for performance deficits. Combining these technologies may provide a more effective social skill intervention.


Social validity framework
Determining whether an intervention is appropriate, desired, maintained and generalised is critical to social skill intervention and is known as social validity (SV) (Fox and McEvoy 1993). SV provides a measure to look at an intervention's (1) goals (i.e. importance or justification), (2) procedures (i.e. appropriateness or acceptability) and (3) outcomes (i.e. meaningfulness or importance) (Armstrong et al. 1997). SV is not something an intervention has or lacks but rather a multidimensional process consisting of numerous variables, including intervention acceptability and importance (Finney 1991). SV should be a supplemental measure to the direct measurement targeted by treatment (Callahan et al. 2017).
Even though there are no established criteria for determining what constitutes SV, there are methods for determining whether enough information is present to verify aspects of SV (Callahan et al. 2017). Reichow et al. (2011), when determining quality indicators of evidence-based practices (EBPs) for students with ASD, identified SV as extremely important. For an intervention to be socially valid, the study should include a minimum of four of the following seven indicators: (1) socially important dependent variables, (2) time- and cost-effective interventions, (3) comparisons between individuals with and without disabilities, (4) clinically significant behavioural change, (5) satisfaction with intervention results by consumers, (6) independent variable manipulation by people the participant typically interacts with and (7) taking place in natural contexts (Reichow et al. 2011).
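The four-of-seven rule attributed to Reichow et al. (2011) above can be expressed as a simple check. The sketch below is illustrative only: the function name `meets_sv_threshold` and the shortened indicator labels are hypothetical paraphrases of the seven indicators listed above, not an instrument published by those authors.

```python
# Shortened paraphrases of the seven indicators listed in the text (hypothetical labels).
REICHOW_INDICATORS = (
    "socially important dependent variables",
    "time- and cost-effective intervention",
    "comparison with individuals without disabilities",
    "clinically significant behavioural change",
    "consumer satisfaction with results",
    "implemented by people the participant typically interacts with",
    "natural context",
)
MINIMUM_REQUIRED = 4  # "a minimum of four of the following seven indicators"


def meets_sv_threshold(indicators_present) -> bool:
    """Return True when at least four of the seven indicators are present."""
    present = set(indicators_present)
    unknown = present - set(REICHOW_INDICATORS)
    if unknown:
        raise ValueError(f"unrecognised indicators: {sorted(unknown)}")
    return len(present) >= MINIMUM_REQUIRED


# Hypothetical study reporting the first four indicators.
print(meets_sv_threshold(REICHOW_INDICATORS[:4]))
```

A study reporting only three of the seven indicators would return False under this rule.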
Because this study needed to validate both the technology and the intervention delivered within the technology, we organised SV into nine categories and 17 indicators. The nine categories are as follows: (1) social relevance of technology-dependent variables (i.e. participants have positive feelings towards the technology), (2) social relevance of intervention-dependent variables (i.e. participants have positive feelings towards the intervention), (3) accessibility (i.e. readily available, time- and cost-effective), (4) ease of use, (5) satisfaction with the results by stakeholders (i.e. technology was reported useful by teachers, parents and/or clinicians), (6) a behavioural change that is large enough for practical value (i.e. participant's increase in knowledge or skills as a result of the intervention), (7) continued skill success reported after the intervention, (8) skills generalised into a natural context (i.e. home, school and community), and (9) skills maintained over time.
The extent to which an intervention is considered socially valid significantly influences whether the intervention is adopted and implemented by students, educators and parents (Kern and Manz 2004). Therefore, we expanded the nine categories above into 17 indicators that reportedly influence intervention use. Indicators 1 to 9 correspond to the nine categories. The remaining eight fall under two headings: independent variable manipulation factors by people who typically will implement the intervention, namely (10) number of sessions, (11) session time, (12) time span and (13) application outside technology; and clinically significant behavioural change comparisons, namely (14) norm-referenced, (15) pre- and post-comparison, (16) multiple measures of performance and (17) fidelity and reliability.

Social skill instruction and the role of technology
Social skill deficits negatively influence academic performance (Welsh et al. 2001), interfere with relationship development (Ke et al. 2018), and increase aggression, depression and anxiety (Koegel et al. 2014). If not addressed, these challenges continue into adulthood and are linked to under-employment and unemployment (Tobin et al. 2014). Technology has been used to assist in improving social skills (e.g. video modelling and social narratives) for decades (Chelkowski et al. 2019). However, the use of innovative technologies, such as augmented reality (AR), virtual reality (VR) and mixed reality (MR), for social skill instruction for students with autism spectrum disorder (ASD) is emerging.
Educators report feeling inadequate in providing social skill instruction to students (Dobbins et al. 2010). Technology has the potential to provide this instruction in a systematic manner. However, there are limited research studies on the application of innovative technologies to improve social skills in students. Researchers and government bodies (ASELA 2015) report that students receiving social skill instruction display marked improvements in their (1) motivation to learn, (2) commitment to school, (3) time devoted to schoolwork, (4) mastery of subject matter, (5) school attendance, (6) graduation rates, (7) grades and (8) test scores. Therefore, it is imperative for educators, researchers and learning technologists to have a better understanding of the impact and related outcomes of using AR, VR and MR in order to provide social skill instruction to students with disabilities.

Research on delivering interventions through innovative technology
Recent research reviews have identified evidence for the effectiveness of VR and AR separately. Due to the increasing research base into the effectiveness of MR and the continually changing nature of these technologies (Liu et al. 2017), our review combines these technologies to better understand the aspects that users and implementers find helpful within each delivery platform. It is important to understand the variations in immersive technology. AR provides a digital overlay onto a real environment (e.g. Pokemon Go) through mobile devices (e.g. iPads) or glasses (e.g. Hololens). VR provides digital simulations of a real-world environment through varying computational devices (e.g. laptops, tablets and head-mounted displays [HMDs]). MR combines technologies into a continuous scale of AR and VR, which allows the user to interact with and manipulate physical and virtual elements. VR and MR exist on a continuum from non-immersive to fully immersive (Carreon et al. 2020). For example, HMDs are considered fully immersive when users do not experience outside stimuli, whereas the same situation presented through a computer screen alone is considered non-immersive (NI) technology, as users can still perceive the physical world around them.
There has been a great deal of debate as to whether NI screen-based simulations should be considered VR. Within the technology industry, NI technologies are still considered VR due to the elements within the device (i.e. iPad and Chromebook), which are compatible with VR software (i.e. Unity and Unreal) and provide an aspect (i.e. first-person experience and life-like avatars) of the virtual world (Mosher et al. 2021). For the purposes of including all virtual environments (VEs), both NI and immersive have been included in this study. Studies reporting immersion all used either HMDs, such as an Oculus Rift, or 3D glasses inside a VE. Regardless of immersive qualities, virtual technology is gaining popularity in research as a viable method for intervention delivery.
Numerous reviews have considered the use of VR and AR to instruct students with ASD. Bellani and colleagues (2011) found VR to be a promising method of instruction in a moderate-to-high virtual immersion. The review focused on tolerance of VR equipment rather than specific social skill acquisition and only included two studies involving students with ASD. Radu's (2014) meta-analysis found that AR in educational settings improved content recall, memory retention, collaboration and motivation. The meta-analysis also found that high achieving students did not display the same benefits as lower achieving students but was unable to determine the causes of these differences. Akcayir and Akcayir's (2016) review recognised barriers to the clarity about what makes AR effective. They suggest future research investigate technology specifics, the intervention setting and the participant needs. Mikropoulos and Natsis (2011) examined VR's use in specific content areas and found VR useful in improving higher order thinking skills. Merchant and colleagues (2014) focused on the differences in three forms of desktop-based VR. The study findings revealed that game-based learning environments were most effective; however, there was no statistically significant difference between the three groups in student learning, outcomes or generalisation. Researchers determined that gaming aspects were more suited for acquisition of new knowledge, whereas simulations were more effective for providing feedback. The authors did not examine the in-depth features making VR unique and how the features impact the virtual experience, intervention and ultimately student outcomes. Howard and Gutworth (2020) focused on the effects of VR on social skills for typically developing students and found VR to be more effective than comparison programmes by almost three-fourths of a standard deviation.
Unfortunately, their research study did not examine different effects on participants with different abilities and instructional needs. Vasquez and colleagues (2015) completed a review of 19 studies focused on social skill development for K-12 students with ASD. The review included a broad definition of virtual learning environments (VLE), which allowed for 3D emotion systems, animated television series and other technologies loosely affiliated with VR. Their review reinforced the evolution of virtual technology and how hardware and software have the potential to alter and impact student outcomes. The researchers found that simulations may be more effective for student engagement than non-simulated environments. However, accessibility and usability of the technologies were absent. Carreon and colleagues (2020) investigated the impact of VR on the outcomes of students with disabilities in K-12 environments. They found that a majority (80%) of studies use a form of NI VR and 72% focused on social skill interventions. The results of Carreon and colleagues' study reveal VR to be promising in delivering authentic instruction through various immersive technologies. However, they reinforced the need for understanding participant characteristics and what elements of the technology lead to positive outcomes for students with disabilities. Mosher and colleagues (2021) determined that AR and VR are being used to provide interventions to students with ASD in order to target relationship skills, emotion recognition, social awareness, cooperation and executive functioning. However, there was no discussion as to the presence or absence of SV measures to determine whether these interventions were likely to be useful and maintained over time.
Each of the above reviews considered an aspect of a virtually delivered intervention. Researchers have yet to explore these studies, considering whether the social skills chosen are important and useful and whether the technology methods are motivating and acceptable. This information is necessary to determine whether the intervention is needed, will be maintained and will be effective (Callahan et al. 2017). A systematic review is required, which considers the acceptability and usefulness of virtual technologies and the ability of the virtual intervention to promote the needed social skill acquisition, generalisation and maintenance of students with ASD.

Research purpose and questions
Technology is ubiquitous in education today. As virtual technology continues to gain popularity in education, it is imperative for education stakeholders to understand whether chosen virtual interventions are useful and purposeful. SV measures can assist in understanding if AR, VR and MR are readily accessible and useful in teaching, generalising and maintaining social skills. Therefore, in order to understand the current landscape of SV and students with ASD, a research review is needed to determine the following: RQ1. The SV reported in studies using AR, VR and MR for social skill acquisition. RQ2. The socially valid indicators (i.e. goals, procedures and outcomes) of studies using AR, VR and MR for teaching social skills to students with ASD.

Search and screening procedures
A systematic search was conducted in March 2020 across four databases (Education Resources Information Center, PsycINFO, ScienceDirect and Web of Science) chosen for their extensive scope in education and technology research. All databases were searched for articles published between 2000 and 2020 using the following search terms: 'autis*' and 'social' and 'student' and 'generaliz*' and either 'reality' or 'virtual', to encompass all of the following: autism, autistic, ASD, generalise, generalisation, VE, virtual learning, VR, immersive virtual, AR, MR and extended reality. The search was filtered by language (English) and limited to published peer-reviewed articles. The initial search conducted by the first author returned 2,773 articles (see Figure 1). A search protocol was given to the second author who independently replicated the search and yielded 2,774 articles. After removing duplicate articles, the results were shared and combined into a single database. The search returned 950 articles for screening. A comprehensive hand search of five journals, chosen for their extensive publishing of technology in special education (Journal of Special Education Technology and Computers and Education) and autism research (Journal of Autism and Developmental Disorders, Autism Research, and Focus on Autism and Other Developmental Disabilities), resulted in one additional article. An examination of 19 literature review references produced four additional articles, resulting in 955 articles for screening.
The authors screened the titles and abstracts and excluded articles that did not utilise a research design or target a social skill intervention utilising virtual technology for school-aged children with ASD. Screening resulted in the elimination of 884 articles and the inclusion of 71 articles. Articles were included if they used VR, AR or MR as the independent variable; had at least one school-aged student with a diagnosis of ASD; and were empirically based using single subject, qualitative, quantitative, or mixed methods. Articles were excluded if they examined elements of the virtual technology (e.g. usability) without focus on the application of the technology for teaching or learning, or if they were not subject to peer review.
After applying the inclusion and exclusion criteria to the 71 articles, 40 articles remained. An ancestral review was conducted using references from the 40 articles, resulting in one additional article. Three reviewers independently reviewed all 41 articles and came to 100% inclusion agreement.
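The screening flow reported above can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the variable names are illustrative labels, not terms used by the authors.

```python
# Counts as reported in the search and screening procedures above.
database_records = 950       # combined database results after removing duplicates
hand_search = 1              # additional article from the hand search of five journals
review_references = 4        # additional articles from literature review references

screened = database_records + hand_search + review_references
excluded_at_screening = 884  # eliminated at title and abstract screening
retained_after_screening = screened - excluded_at_screening

after_criteria = 40          # remaining after inclusion and exclusion criteria
ancestral_review = 1         # additional article from the ancestral review
included = after_criteria + ancestral_review

print(screened, retained_after_screening, included)
```

The three printed values correspond to the 955 articles screened, the 71 articles retained after screening and the 41 articles included in the review.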

Coding procedures
The first author coded all 41 articles. References for articles were entered into a database and randomly assigned to two additional researchers for coding. A fourth researcher was trained to code any disagreements. Training of coders involved reviewing the coding criteria, coding three articles, discussing coding and disagreements, and providing feedback until 100% agreement was achieved. The coding form included primary and secondary quality indicators by type of experimental design (Reichow et al. 2011) and the quality indicators of systematic reviews in behavioural disorders (Maggin et al. 2017).
Categories were coded as 'unclear' when the authors did not provide sufficient details to determine the variable. The 17 indicators of SV were coded both for their presence in the study and whether the response from participants was negative, positive or had mixed results. These measures included technology's ease of use, usefulness of the intervention, participant's views towards the intervention, as well as cost and availability of the technology. Maintenance was coded by agent reporting and length.
The specific social skill was coded, as well as whether a single, multiple, or social and other skills (i.e. academic and motor coordination) were implemented. Relationship skills included verbal and non-verbal communication and social engagement. Executive functioning involved the ability to focus on a task, create a plan of action, complete multiple tasks at one time, or any combination of the three. Emotion recognition involved naming a given emotion when shown an image. Studies were coded in the social awareness category if the dependent variable involved understanding the causes of events or behaviours and perspective-taking. Studies were coded in cooperation when the dependent variable included working with others to complete a task (Shih et al. 2015). The use of direct instruction (DI) and observational learning (OL) within the technology delivered intervention was coded. DI was defined as the explicit teaching of each step necessary to learn the targeted skill (Plavnick and Hume 2014). OL was defined as learning that occurs from seeing others' behaviour and the implications for that behaviour (Catania 1998; Plavnick and Hume 2014). Each type of technology and whether outside measures (i.e. prompting) were present within the intervention were coded.
Inter-rater reliability was calculated using the Cochrane Review model (Higgins and Green 2011), in which 52 items of the 126 were considered for each article, resulting in 2,184 total items coded for reliability purposes. Inter-rater reliability was calculated by determining the percentage of agreement. For each response on the coding form, the raters divided the total agreements by agreements plus disagreements and multiplied the result by 100 to calculate the agreement rate. Inter-rater reliability for the 41 articles was calculated at 96.7%. Discrepancies were resolved by a fourth, trained researcher who independently coded all articles in which the coders disagreed. The information was conveyed to coders who reached 100% consensus on the 41 articles.
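The percentage-of-agreement calculation described above (agreements divided by agreements plus disagreements, multiplied by 100) can be sketched in a few lines. This is an illustrative sketch only: the function name `percent_agreement` and the example counts are hypothetical and are not drawn from the review's coding data.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Return agreements / (agreements + disagreements) * 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one coded item is required")
    return agreements / total * 100


# Hypothetical counts for illustration only (not the review's data).
rate = percent_agreement(agreements=45, disagreements=5)
print(f"{rate:.1f}%")
```

Simple percentage agreement does not correct for agreement expected by chance, so it is best read as a descriptive index of coder consistency.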

Results
The 41 studies included 524 males and 87 females whose age ranged from 2 to 20 years. The treatment agent and setting were reported in 34 studies (19 occurred in schools, 10 in a clinic and five in multiple environments), with researchers implementing the technology in 21 studies, teachers in 16, clinicians in nine and parents in three. Thirty studies were conducted to improve multiple social skills, seven taught a single social skill, and the remaining four did not state the targeted skill. The social skills taught using the technology included emotion recognition, relationship skills, social awareness, cooperation and executive functioning. Table 1 provides the 17 indicators of SV and whether the specific SV information was reported. Figure 2 shows whether the nine categories were positive or negative (i.e. useful or not useful) by the type of technology used to present the intervention. Eighteen studies (44%) provided SV measures related to feelings towards the technology and 15 (37%) of the 41 studies reported feelings towards the intervention within the technology.

Goals, importance and justification
All 41 studies stated multiple goals for the study, at least one of which was for the participants to learn a social skill. All studies stated the importance of teaching social skills to students with ASD and reported a parent or teacher documented social skill deficit in the student. Two studies (7%) included the technology's cost or availability to parents and teachers. One of the studies (Yuan and Ip 2018) stated that the Cave Automatic Virtual Environment (CAVE) was not cost-effective or available outside the clinic. Participants in this study became limited to those who had the time and transportation to and from the clinic containing the CAVE technology. Researchers in the other study (Stichter et al. 2014) declared the VR iSocial to be cost-effective and accessible to parents and teachers to implement.

Procedures
All 41 studies stated the specific technology used to implement the social skill. Five studies (12%) used AR, 26 (63%) used NI VR, 10 (24%) used immersive VR and MR was not used in any study. Researchers in 17 studies (41%) conveyed the ease of use of technology reported by the participants, as well as the treatment agents (i.e. teacher). Participants and treatment agents in 11 studies (65%) stated the technology was easy to use. Authors of two studies (12%) expressed that the technology initially was difficult to use but became comfortable with time. Four studies (24%) showed mixed reports regarding the ease of use. Participants whose IQ scores were higher than 70 reported the technology was accessible, while those with IQ scores lower than 70 reported that the technology was difficult to use. AR and NI VR were the primary technologies used in studies where participants stated that the technology was easy to use. Researchers using immersive environments in only one study (Adjorlu and Serafin 2018) indicated the ease of use of technology. Other immersive VR implementors reported that the technology became easy to use only after participants had learned to use it (Lorenzo et al. 2013, 2016). Eighteen studies (44%) included the participant and treatment agents' attitudes and views towards the technology, with 13 studies showing a positive attitude (72%), two studies (11%) showing a negative attitude, and three studies (17%) showing a mix of positive and negative reactions. For example, Tsiopela and Jimoyiannis (2014) used a NI virtual computer game to teach primarily pre-vocational skill speed, the accuracy of vocational skills (e.g. organising and sorting), and self-confidence. Parents and teachers within the study expressed that the technology made a positive impact on student confidence, communication, social awareness, and relationship skills, as well as speed and accuracy of pre-vocational skills.
This finding suggests that students may observe and practise, within the technology, skills beyond the technology's instructional objective. The two studies that reported a negative outlook (Lorenzo et al. 2013, 2016) were also the ones reporting the technology was not initially easy to use. The mixed attitudes were attributed to some participants not liking to wear the 3D glasses (Cai et al. 2013) and one participant with more severe impairments not wanting to interact with the virtual avatar (Mantziou et al. 2015).
Fifteen studies (37%) stated whether participants liked the intervention within the technology. Researchers in 14 of the 15 studies (93%) reported that participants found the intervention to be exciting and rewarding. The researchers from the remaining study (7%) reported participants having mixed feelings towards the intervention (i.e. not enjoying at all or having varying levels of enjoyment throughout the intervention). Participants and treatment agents from 35 studies (85%) stated whether the intervention presented through the technology was useful, with those in 34 of the 35 studies (97%) finding it useful and one finding mixed results (3%). Five studies (15%) in which participants and agents reported the intervention useful did not significantly improve the targeted skill. For example, parents' reports of social competence in Stichter et al. (2014) deemed the technology to be useful despite no significant changes in the children's scores on emotion recognition after the intervention.
The included authors used various measurement approaches to determine intervention success. Researchers used norm-referenced assessments to identify students for the intervention in 16 studies (39%). Norm-referenced measures were primarily used to assess IQ and a specific social skill deficit (i.e. emotion recognition from facial expressions). Of the 15 studies (37%) that reported a control group, eight (53%) studies consisted of typically developing peers and 12 studies (80%) had a control group with matched abilities in either social skill competence, full IQ, performance IQ or verbal IQ. Eighteen studies (44%) implemented pre- and post-assessments. Norm-referenced assessments measured progress pre- and post-intervention in only nine studies (22%). The studies' primary measures of improvement include observation by treatment agent (N = 33, 85%) and rating scales with interviews (N = 29, 74%). Fifteen studies (37%) utilised researcher-developed assessments to determine intervention success.
The intervention duration also varied considerably between studies. The number of sessions the participants received varied from one session on 1 day (Cai et al. 2013) to 80 sessions over 4 months (Modugumudi et al. 2013). The average number of sessions across studies was 14 sessions. Thirty-four (83%) studies reported the intervention period and the number of sessions, and 31 studies (76%) reported the session time. Session times varied from 10 min (Alcorn et al. 2011) to 150 min (Parsons et al. 2004). Most of the intervention sessions were within 20-40 min (N = 15, 48%).

Outcomes
The magnitude of the effect was not mentioned in any of the 41 studies. The significance of intervention was determined in 35 studies (85%) by the effectiveness of teaching the targeted skill and in six studies (15%) by whether the participants were able to use the technology to complete the social task. The intervention led to statistical improvement in 15 of the 41 studies (37%). The intervention was considered effective in 26 studies (63%), not effective in four studies (10%), and 11 studies (27%) reported mixed results. Mixed results were reported because either the technology accurately taught one skill but not the targeted skill or the technology improved targeted skills but did not reach statistical significance. The effectiveness of intervention was listed in all five social skill areas: relationship skills (N = 13, 50%), emotion recognition (N = 9, 35%), social awareness (N = 6, 23%), cooperation (N = 3, 12%) and executive functioning (N = 3, 12%).
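The reported outcome percentages follow directly from the study counts. The sketch below recomputes them from the counts given above; the helper name `percentage` and the dictionary labels are illustrative and are not taken from the review.

```python
def percentage(count: int, total: int) -> int:
    """Share of total, rounded to a whole-number percentage."""
    return round(count / total * 100)


# Study counts as reported in the text above.
outcomes = {"effective": 26, "not effective": 4, "mixed": 11}
total_studies = sum(outcomes.values())

shares = {label: percentage(n, total_studies) for label, n in outcomes.items()}
print(total_studies, shares)
```

Run as written, this gives 41 studies in total and shares of 63%, 10% and 27%, matching the figures reported for effective, not effective and mixed results.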
Researchers from 35 studies (85%) reported whether there was an increase in knowledge, skills or experience from the technology intervention in participants, with 32 studies (91%) stating an increase in knowledge, skills or experience, and three studies (9%) stating no increase in knowledge, skills or experience. Figure 3 shows both reporting of statistical and significant effects of the targeted social skill, as well as whether the minimum requirements to determine study validity and reliability by design type were included (Campbell and Stanley 2015; Ledford and Gast 2014).
The included studies differed in the method used within the technology to teach the targeted social skill. Three studies (7%) taught social skills through DI. Seventeen studies (41%) imparted social skills through OL. A little over half of the studies (51%) utilised DI and OL within the technology to teach the targeted skill. OL alone effectively taught relationship skills and cooperation but not emotion recognition, executive functioning and social awareness. Over half of the studies reporting significant improvements (65%) utilised a combination of DI and OL, with six studies (23%) using only OL and three (12%) using only DI.
Researchers from 20 studies (49%) reported generalisation of the social skill outside of the technology environment. Of these, 19 studies (95%) stated the generalisation environment (i.e. school, home and community) and the person reporting the generalisation (i.e. parent, teacher and student). Fifteen studies (79%) reported that students could generalise skills learned within the technology into real-world environments. Four studies (20%) reported that some students were able to generalise, and some were not, and one study (5%) stated that there was no generalisation. The maintenance of the social skills was reported by 13 studies (32%), with 10 studies (77%) reporting maintenance, two (15%) showing maintenance for a few but not all participants and one study (8%) reporting no maintenance (Mitchell et al. 2007). The maintenance reported in studies varied from 10 days to 720 days. Of the 13 studies (32%) reporting generalisation and maintenance, 12 stated that the skill was both maintained and generalised.

RQ1 studies reporting social validity measures
This review examines the SV of utilising virtual technologies for teaching social skills to school-age children with ASD. Three decades of documentation show that SV is a critical component of social skill interventions (Carter and et al. 1989). However, current research has yet to provide adequate information within studies to determine whether AR, VR or MR are socially valid social skill acquisition modalities. No researcher reported information for all nine categories of SV, revealing that some studies either did not measure SV within the study or did not report measuring these indicators. Of the nine categories of SV, studies identified anywhere from zero to eight categories. The SV indicators reported by researchers ranged from two to 14 of the 17 indicators. Only two studies reported whether the technology was accessible and affordable, an essential aspect of SV. Participants in one NI VR study reported that it was accessible and affordable, while another study utilising immersive VR reported that it was not accessible or affordable. It is important to note that researchers who reported a higher number of SV indicators also tended to report significant improvements in social skills. The session information and measures used varied between studies, but the vast majority of studies reported that the technology was easy to use and the intervention useful. Even though no studies reported all SV indicators, 85% of researchers reported on the usefulness of the technology and whether the intervention within the technology improved social skills. Of studies reporting SV indicators, 87% expressed motivation towards the intervention and 72% reported a positive student attitude towards the technology.
SV measures are necessary to determine whether the skill selected for intervention improves the participant's functioning in daily life requirements and activities. The technology was aligned with the student's specific social skill needs in 15 of the 41 studies (37%). Most studies utilised technology with an already programmed script for teaching specific skills and then sought students with social skill deficits, assuming that the technology would be beneficial. One of the studies reporting higher levels of SV (Adjorlu and Serafin 2018) utilised teachers in the intervention creation. The teachers chose the virtual setting, helped in the intervention design and provided DI to students through a headset during the three scenarios. This study showed higher implementation fidelity than other studies implemented by teachers, which may, in part, be due to the teachers' ability to design and utilise the technology for specific students.
Educational technology needs differ significantly across communities, educational settings and socioeconomic backgrounds (Miller and Bugnariu 2016). Yet, cultural validity is not included in Table 1 because it was not mentioned in any study, despite the studies spanning 11 countries. The expected norms and behaviours of a culture are embedded within any social skill acquisition. For example, Self et al. (2007) considered fire and tornado drill safety a social skill because researchers felt that these skills benefit students' daily wellbeing. A separate researcher may consider these adaptive skills rather than social skills and may find that they are not necessary for social acceptance in everyday life. Providing information on cultural validity in future studies would help determine the perceived usefulness of the technology-delivered intervention for the desired population.
The research findings revealed that there are many socially valid reasons for using AR and VR as a method of social skill instruction for students with ASD. Among studies reporting usefulness, the central element reported as useful was the technology rather than the social skill or the intervention within the technology. It would be helpful to understand how useful the skill taught within the technology is for the participant and those who interact daily with the participant. It would also be helpful to know whether participants felt the intervention methods within the technology were adequate for acquiring a targeted skill. Knowledge of whether the social skills chosen in studies were selected because they were easier to program or measure or whether they addressed a primary skill deficit would help determine the actual usefulness of the intervention. Increasing SV measures within studies would provide a better understanding of the benefit of using virtual and augmented technology compared with other instruction methods. Many SV measures were primarily based on verbal reports from students, parents and teachers. Having a norm-referenced measure to determine skill acquisition, generalisation and maintenance would provide a greater understanding of the technologies' successful implementation.
The high levels of student motivation towards the intervention, positive attitudes towards the technology and perceived usefulness of the intervention suggest that AR and VR may be socially valid instructional methods. Increasing the role of parents, educators and students as both skill selectors and treatment agents within the technology has the potential to increase SV. In addition, providing accurate measures of student progress in skill development has the potential for improving the statistical significance of AR and VR delivered interventions.

RQ2 social validity reported: Goals, justification, procedures and outcomes
Researchers must obtain information from participants on their attitudes towards the intervention and the intervention delivery in order to determine SV. The authors of every study reported a justification for the need for teaching social skills to students with ASD. However, only 15 studies (37%) discussed the feelings of participants and treatment agents towards the intervention. Eighteen studies (45%) determined whether participants or treatment agents had positive or negative attitudes about using the technology. Participants' attitudes are vital, as they play a significant role in an intervention's continued use upon study completion (Carter and Wheeler 2019).
Procedures used within each intervention were not always reported. The primary reporting was on whether students learned through OL or DI. In 41% of studies, students with ASD were not given any DI on the skills, even though researchers stated that the skill was 'taught' to students. DI, the systematic and purposeful teaching of a social skill (Plavnick and Hume 2014), was the primary instructional component in only three studies (7%). Individuals with ASD tend to require one-to-one delivered DI to learn a new skill (Stahmer 2007). DI can be given through AR and VR; however, it is currently under-utilised in interventions delivered virtually. OL was the primary means of teaching in 38 studies (93%). However, research shows that students with ASD do not readily learn prosocial behaviours through OL (Plavnick and Hume 2014). When assessing whether AR and VR are effective means of instruction, future researchers must also consider whether the delivery within these interventions provides an adequate education.
Another critical measure of SV is the acceptability of an intervention. There was a reported correlation between ease of use and the participant's attitude toward the technology (Lorenzo et al. 2013, 2016). As the use of VR became more natural, participants' attitudes improved. Contrary to Howard and Gutworth's findings (2020) and in support of Miller and Bugnariu's conclusions (2016), higher levels of immersion were more conducive to successfully delivering social skill interventions for students with ASD. Even though treatment agents and participants found immersive VR more challenging to use initially, the immersive VR showed a greater ease of use as time went on and more significant improvements compared with NI VR or AR.

The technology's acceptance by participants was only discussed in the studies using eyewear (i.e. HMD). As virtual technology in schools is primarily implemented through screen-based devices, it would help to understand what aspects of these technologies may hinder learning.
We found evidence contrary to Miller and Bugnariu (2016), who reported that the closer the VR match to the real world, the better the outcomes. Environments too closely resembling the student's actual school led to more off-task behaviour and less effective results, as students were distracted when the immersive environment did not match their current physical environment (Adjorlu and Serafin 2018). All 20 studies reporting generalisation included settings where participants reported feeling like they were in a real room talking to real people without the environment matching their classrooms. This aspect of resembling reality may be more effective than resembling specific locations, which may distract students.
The majority of researchers from the 13 studies reporting generalisation and maintenance stated that the skills were generalised and maintained. For example, Cheng and colleagues (2010), utilising NI VR, found a significant improvement in all three students' performance on the Empathy Rating Scale. Through discussions with teachers and observations, they were able to identify students who had increased empathy apart from the VE. For two of the three students, empathy was maintained for 60 days. In another display of generalisation, Chen and Lin (2016) utilised a tablet and a storybook with embedded AR markers. They found that students were able to learn six core emotions and facial expressions, and apply this knowledge in their home and community.
Mixed generalisation results were reported in a few studies where some students generalised skills and some did not, or some skills were generalised but others were not. Although researchers stated the possible reasons for the participant differences, no researcher systematically studied the variance in maintenance and generalisation. For example, Adjorlu and Serafin (2018) found that only two of five students were able to generalise skills of cooperation and sharing into the classroom following VR intervention. Still, they did not complete follow-up testing to determine why skills were not generalised for the three remaining students. Stichter et al. (2014), utilising NI VR, reported improvements in all areas that generalised to the school, home and community, except executive functioning, but did not propose a reason why.
Twelve studies (80%) reported that students could maintain skills learned in the AR and VR environments once intervention was complete. Lorenzo and colleagues (2016), comparing immersive VR and NI VR for students with ASD, found a greater presence of appropriate emotional behaviours in immersive VR. Through observation, questionnaires, interviews and rating scales, researchers were able to determine that students using immersive VR maintained improvements in self-control, empathy and emotion recognition for 2 years.
Interestingly, all studies reporting maintenance had over nine SV indicators, suggesting that a social skill may be more likely to be maintained when parents, teachers and participants find the technology enjoyable, easy to use and useful. We also discovered that studies conducted in schools with extended intervention periods (i.e. months instead of days) resulted in higher levels of social skill generalisation and maintenance. This finding suggests the need for further research into whether interventions implemented in schools improved students' maintenance and generalisation over those in clinics or homes. The increased generalisation in schools may also be due to the teacher having a better understanding of the intervention and, therefore, being better able to apply aspects of the VR intervention in daily classroom routines.
Thirty-two studies (91%) reporting improvement in knowledge, skills or experience stated that the improvement was due to the AR- and VR-delivered intervention, and the interventions effectively taught a targeted social skill in 26 studies. Researchers in 10% of studies showed no significant improvement, while 27% of studies showed mixed statistical results. Despite AR and VR not consistently reaching statistical significance, there is SV evidence supporting their use for social skill instruction. Further research is needed to determine whether these technologies are effective in providing social skill instruction for students with ASD. When authors provided detailed study descriptions to determine intervention success (i.e. reliable measures and clear variables), AR and VR were found to be useful.

Limitations
This research study utilised only peer-reviewed studies from 2000 to 2020 obtained from specific databases. Thus, while we believe that we were thorough in our identification of studies, the quick-paced, evolving nature of technology and the growing number of outlets publishing on virtual technologies mean that we may have missed some current literature. Our focus on school-aged students does not provide enough information to determine the implications of this research for young children and adults with ASD.
While we controlled for ambiguous definitions through agreement from multiple coders, social skill categories, OL and DI may be defined and evaluated differently by different researchers. Effect sizes were also not calculated for these studies. Although not required to determine the evidence base of a strategy (Cook et al. 2014), calculating standardised effect sizes would allow comparisons across studies. Finally, we did not exclude studies with low or no validity and reliability measures because there was insufficient evidence on why a study omitted this information (i.e. word count limitations or insufficient rigour).

Implications for researchers, programmers and practitioners
The SV measures revealed that participants had a positive attitude toward using all forms of technology when they felt the technology was easy to use. Based on the literature, if educators allow students to become comfortable with immersive VR before the intervention, students may show greater learning of skills. NI environments often require little pre-training but may not have the same impact as immersive technologies. For example, Lorenzo et al. (2016) found that students in NI VR showed higher frequencies of adequate behaviours in the initial sessions than those in immersive technology. However, with training and practice, the immersive environment showed greater ease of use over time and greater overall improvements related to students' emotional behaviours and compliance. Therefore, investing time to ensure student comfort before implementing the intervention has a positive impact. Time spent training on technology use also assisted in improving attitudes from participants and treatment agents, which suggests that ease of use may impact feelings about the intervention.

Practitioners may have better results in interventions if they determine targeted intervention elements before implementing virtual technology. These elements include whether the social skill deficit is skill-based or performance-based and whether the intervention within the technology is best suited for the student's specific deficit. Interventions using OL were effective when teaching relationship skills and cooperation but were ineffective when teaching other social skills. Future researchers may need to clarify how to ensure interventions within VEs provide instruction necessary for the specific skill deficit.
Determining the aspects of technology that may hinder students with ASD may provide more productive learning spaces. Sensory needs and thresholds for participants should be considered in selecting both the equipment and the VE. Environments may be too distracting for sensory-seeking individuals (Adjorlu and Serafin 2018). For example, Ke and Im (2013) found that off-task behaviour increased for all students with ASD when the ability to fly and voice chat were enabled. As soon as these non-essential functions were disabled, students participated appropriately. Students who are sensory avoiders may require VR without wearables or haptic controllers (Cai et al. 2013). Practitioners, programmers and technologists may want to consider developing and utilising technology in which distracting features, as well as specific handheld device requirements, can be disabled for better individualisation.
Human developmental factors should be considered to determine an acceptable age to switch to immersive environments so as not to interfere with cognitive and physical development in young children. This would assist educators in determining which VE is best suited for the developing brain. It would appear that beginning with NI VR or AR for young students and moving towards immersion as the student ages is advantageous.
Studies that provided higher levels of SV indicators were more likely to report generalisation and maintenance. This finding could be due to the specific researchers' increased thoroughness in documenting attitudes, motivations, usefulness, generalisation and maintenance. However, it could also be because studies with higher SV measures considered essential social aspects that allowed continued use of the technology. Attitudes towards specific technologies often correlate with the degree to which students and educators are willing to use the technology (Dabholkar and Bagozzi 2002). There is a need for further research on the factors influencing social skill acquisition and generalisation, paying particular attention to treatment agents' attitudes and participants' motivation.
One of the studies (Didehbani et al. 2016) mentioned that once the programme began rewarding appropriate social interactions, teachers no longer needed to provide physical rewards. Playing in the environment with interactive objects became rewarding. This knowledge is helpful to practitioners because having an intervention as a reward may decrease the need for external reinforcers. This knowledge is also beneficial to researchers and programmers as there was no need for additional reinforcement measures (i.e. badges and unlocking additional rooms) within the technology. The technology was motivating in itself, which may allow researchers to focus more on the intervention delivered within the technology and less on providing game-like features.
A cost-benefit analysis comparing AR and VR with existing techniques delivered through technology, as well as human-delivered interventions, would provide researchers with a better justification for investing in virtual technologies. Justifying technology as the instruction mode over other instructional methods is needed before educators invest their time and resources into implementing these media. Understanding the developmental factors, intervention specifics and stakeholder input will not only improve SV but also further highlight the potential of virtual technologies.

Conflict of interest and funding
None. The authors have no conflicts of interest to disclose and no financial support was provided for this review. The opinions expressed are those of the authors and do not represent views of the University of Kansas.

Author Contributions
MAM conceptualised the study, searched and screened articles, coded included articles and drafted the article. ACC replicated the search and screening of articles, coded included articles and assisted with article preparation.