The effect of adding same-language subtitles to recorded lectures for non-native, English speakers in e-learning environments

Research in Learning Technology 2020. © 2020 Gordon Matthew. Research in Learning Technology is the journal of the Association for Learning Technology (ALT), a UK-based professional and scholarly society and membership organisation. ALT is registered charity number 1063519. http://www.alt.ac.uk/. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license. Citation: Research in Learning Technology 2020, 28: 2340 - http://dx.doi.org/10.25304/rlt.v28.2340


Introduction
In recent years, the number of online classrooms and e-learning environments has grown rapidly due to advances in Internet access and other technologies (virtual classrooms, YouTube, etc.). This growth has also sparked research interest in technology's effect on learning and understanding. Incorporating technology into the classroom also implies that students will be exposed to various forms of media (i.e. multimedia) from which they can extract more information. However, in most higher education institutions in South Africa, the academic language of students (the language in which they acquired knowledge at school level) differs from the language of instruction (generally English) at these institutions, especially in e-learning environments (CHE 2010:182).
Although South Africa has a rich multicultural and multilingual language environment, and despite the endorsement of multilingual education by the Language in Education Policy (LiEP) (Department of Education 1997), English is still the preferred medium of instruction in South African schools and universities, and the trend is extending to the intermediate and secondary phases as well (Van der Walt and Klapwijk 2015). This poses a problem, as most of the students enrolling at university (or in online learning environments) in South Africa are not proficient enough in English, which is their second or sometimes their third language, and this puts them at a great disadvantage compared to students with English as their first language (Van Rooy and Coetzee-Van Rooy 2015).
These disadvantaged students are generally described as non-native speakers or foreign-language learners of English, but they do not actually fall into either of these categories. English is spoken in the country they live in (so it is not non-native to them), and they are exposed to English from an early age (so it is not a foreign language). However, because the language of learning and teaching (LoLT) at school level can be any of the 11 official languages of South Africa (see footnote 2), the students would not necessarily have had English as a language of instruction (although this trend is growing), and they are therefore non-natives with regard to English as LoLT.
A recent study found that even if students did well in English as an additional language (L2) in Grade 12 (the last year of school), this did not help their academic performance or achievement at university level (Van Rooy and Coetzee-Van Rooy 2015). Another study, by Roussel et al. (2017), found that learning new content through a foreign or non-native (or less proficient) language without any instructional support may interfere with rather than facilitate learning.
A possible solution is to add same-language subtitles (SLS) to recorded lectures as instructional material. The addition of SLS could not only help students to understand the language (language acquisition) but also give them access to the information they require. The foundation of the proposed solution lies in the dual channel assumption and the modality effect (Mayer 2002). Findings from studies on the dual channel assumption have shown that presenting the same information (or content) in both visual and auditory formats can help students to understand the content better. The modality effect (Mayer 2002) further states that working memory has two processing channels (visual and auditory) and that delivering the same content to both channels reduces the processing required of working memory. Thus, a person learns better from both words and pictures than from words alone (Mayer 2002), and SLS (a visual representation of the dialogue) presented along with the dialogue can act as additional instructional material for the student.
However, there is no conclusive evidence on how subtitles affect learning. Only a few studies in recent years have examined the effects of implementing both native and non-native language subtitles to make information more accessible in South African Higher Education environments (Hefer 2011; Kruger 2013; Kruger, Hefer, and Matthew 2014; Kruger, Szarkowska, and Krejtz 2015; Matthew 2019). None of these studies, however, focussed on the effects of subtitles in an open distance or e-learning environment, and none of them provided conclusive evidence for the effect of subtitles on learning.

Footnote 2: South Africa has 11 official languages as defined by the country's language policies. These languages include: English, Afrikaans, Sesotho, Sepedi, Setswana, isiZulu, IsiXhosa, isiNdebele, Tshivenda, Siswati and Xitsonga.

Subtitling and subtitles
Subtitling can be defined as (Díaz-Cintas and Remael 2007):

[A] translation practice that consists of presenting a written text, generally on the lower part of the screen, that endeavours to recount the original dialogue of the speakers, as well as the discursive elements that appear in the image (letters, inserts, graffiti, inscriptions, placards, etc.) and (in the case of deaf and hard-of-hearing viewers) the information that is contained within the soundtrack (song, voice off). (p. 8)

Zanón (2006) identified three types of subtitling:
1. Bimodal or intralingual (e.g. the dialogue and subtitles are in the same language)
2. Standard or interlingual (e.g. English dialogue and mother tongue subtitles)
3. Reversed (e.g. mother tongue dialogue and English subtitles).
Generally, subtitles are created according to the task they must perform, that is, whether they are used in entertainment or education (Gottlieb 2012). The focus of this study was on subtitles in an educational context, where their goal is to decrease cognitive load and make information presented to students more understandable and, in doing so, facilitate learning. For educational subtitles, the focus is predominantly on intralingual (same-language) subtitling, although interlingual (standard) subtitling is also used in studies focussing on language learning and language acquisition (Bisson et al. 2014; O'Brien 2006; Winke, Gass and Sydorenko 2013).
Because subtitles are generally either translations or transcriptions of speech that have to be presented in sync with the dialogue, subtitles are on screen for a limited time during which they have to be processed. In a multimodal presentation, such as a subtitled video, there is constant competition between the subtitles and the moving background they are presented on.
The unique advantage that subtitling has over other language transfer methods (e.g. dubbing, voice-over and re-speaking) is that 'it allows the viewer to retrieve the original material without destroying valuable aspects of the authenticity of the material' and 'the original speech and dialogue remain intact in the subtitles' (Kilborn 1993:646). Because the authenticity of the dialogue is kept intact, the viewer can extract the mood, personality and intention from the dialogue, even if the subtitles are foreign (Kilborn 1993:647).
In theory, subtitles are part of a multimodal, polysemiotic, audiovisual text. Polysemiotic means that subtitles are part of an array of channels that communicate simultaneously to the viewer. The phrase 'multimodal, polysemiotic, audiovisual text' means that such a text consists of four channels that deliver information simultaneously, which are defined by Gottlieb (1998, 2012) as:
• a visual-verbal channel (e.g. subtitles and captions)
• a verbal-auditory channel (e.g. words uttered by an on- or off-screen character, narrator or presenter)
• a nonverbal-auditory channel (e.g. sound effects and music)
• a nonverbal-visual channel (the speaker or presenter himself or herself, illustrations, diagrams, graphs, etc.).
Because subtitles are a new source of instructional aid, not much research has been conducted on reading text on moving images (e.g. subtitles on video), the focus being more on static reading (books, newspapers, etc.).
By nature, subtitles consist of both visual (on-screen text) and verbal (textual representation of dialogue) modalities at once. When subtitles are added as an extra, third source of information to a recorded lecture, the viewer needs to prioritise the information-processing channel (either visual or verbal) that is needed to process this additional information. This is not an easy task, as all the sources of information in a subtitled video are in constant competition with each other for working memory resources, which may affect the processing of information (i.e. cognitive load). Effectively, cognitive load can be subdivided into two components based on what causes the load: intrinsic cognitive load (ICL) and extraneous cognitive load (ECL). ICL is caused by the learner-task interaction (expertise, prior knowledge and cognitive abilities of the learner) and generally refers to the impact of the difficulty of the task on the learner. ECL, on the other hand, is caused by the presentation of materials in a task; it does not facilitate comprehension and learning but can be altered by external factors.
There are, however, a variety of different ways to measure cognitive load, some of which are more suited to certain experimental designs than others. It must be noted that cognitive load, in itself, cannot be directly measured and must be inferred from behavioural measurements (emotions, respiration, etc.) or measurement of psychological processes (e.g. self-reported task load questionnaires) and physiological processes (e.g. eye movements) (Casali and Wierwille 1982). De Jong (2010) also emphasised that most of the indicators of cognitive overload (too much information to process) are based on the assumptions made due to a decrease in performance or increase in error rate (which was also prominent in the studies mentioned in Table 1).
Over the years, many studies have been conducted to explain the effect of subtitles on performance (considering either learning or comprehension) based on the assumption that a higher measured cognitive load results in a decrease in performance or an increase in errors. These studies usually fall into three categories: language acquisition, vocabulary learning and comprehension (or retention). Some research has found that the addition of an extra source of information may help to lower cognitive load, whereas other studies have found no noticeable effect of adding additional sources of information (see Table 1). Table 1 provides a summary of studies conducted to determine the effects of subtitles on performance (and indirectly, cognitive load). The studies are grouped into the three categories named above and according to whether they used native language or foreign language subtitles. Furthermore, the results of these studies are classified as having a positive (better), negative (worse) or neutral (no difference) effect on student performance. Within these studies, the general assumption is that if the addition of subtitles to a video lowers the performance of the students, then subtitles are deemed ineffective for learning as they create additional cognitive load. On the other hand, the addition of subtitles can also be deemed supplementary and facilitate learning (better performance) by providing an extra resource from which to gather information when, for instance, a student is a better reader than a listener.
These differences in perceived effect of subtitles provide a great amount of uncertainty and inconsistency regarding the possible benefits of subtitles on learning and performance. This is largely due to the fact that most of the results on performance are based on assumptions, a large number of variables and variation in the types of materials used. Therefore, the current study was conducted to determine what effect the addition of subtitles has on the performance and experienced cognitive load for non-native speakers of English at a South African University and also contribute valuable information to the debate on the effects of subtitles on cognitive load and learning.

Methodology
The focus of this study was to determine the effect of different sources of information (audio, video and subtitled video) on performance and cognitive load, and whether performance or cognitive load is significantly affected by the number of sources of information that needed to be processed. The setup for this experiment consisted of one video presented to four groups, each group viewing the video in one of four presentation modes (PMs): audio-only; audio and video; audio, video and verbatim subtitles; and audio, video and edited subtitles. The verbatim subtitles are automated transcripts that have been synchronised with the dialogue of the video, and the edited subtitles were similar to the verbatim subtitles but were corrected according to general subtitle standards (37 characters per line, maximum of two lines, etc.).
After the video, the participants were asked to answer a self-rated cognitive load questionnaire and a comprehension test on the content discussed in the video.

Participants
The participants in this experiment were randomly selected from a population of first-year students enrolled for an Academic Literacy module. All the participants majored in Economics and were from the North-West University's Vaal Triangle Campus in South Africa. The participant sample consisted of 64 students (29 male, 35 female) between the ages of 19 and 26 (mean age = 21). The participants were non-native speakers of English. Because this study was only exploratory, no other personal information on the participants was collected.

Materials
For this experiment, a variety of materials were used. These materials included a video that contained (in some instances) one of the two types of subtitles (verbatim and edited), a self-rated cognitive load questionnaire and a comprehension test. Each of these materials will be discussed in the following subsections.

Video
The video used for this experiment was a recorded lecture (approximately 10 min long) from the Open Courseware website of the Massachusetts Institute of Technology (Leight 2017). The video was then modified to be presented in four different PMs, namely:
1. Audio only (black screen with sound)
2. Audio and video (regular video with sound)
3. Audio, video and automatic subtitles
4. Audio, video and corrected subtitles.
The topic of discussion during this video was an introduction to Consumer Theory and focussed mainly on the workings of income and substitution effects. Although the participants had only been exposed to certain concepts used in Economics for just a few months (their first semester), it is important to note that the content of this video served as an introduction to the basic concepts of Consumer Theory and would therefore be at an appropriate difficulty level for the participants.

Subtitles
As previously mentioned, two types of subtitles were used for this experiment: verbatim and edited subtitles. The verbatim subtitles were extracted directly from the downloaded video (Figure 1). These subtitles are automated transcripts that have been synchronised with the dialogue of the video. There were, however, some inconsistencies with these subtitles, as some of them contained three lines of text (which is against standard subtitle conventions), had inappropriate timing for reading or skipped some important pieces of information. The edited subtitles were similar to the verbatim subtitles (Figure 2) but were corrected according to general subtitle standards. For example, each subtitle contained a maximum of 37 characters per subtitle line, a maximum of two lines of text on the screen at once, the correct presentation speed (a maximum of 6 s of visible time per two-line subtitle) and the correct line divisions of the text. By analysing both types of subtitles, a comparison could be made of the subtitle PM and speed that the participants preferred, as well as how effectively each subtitle was processed.
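The conventions listed above lend themselves to a simple automated check. The following is a minimal sketch in Python, not the procedure used in this study: the `Subtitle` structure and the `check_subtitle` function are hypothetical names introduced for illustration, and the 6 s limit is applied only to two-line subtitles because no limit is stated here for one-line subtitles.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the conventions described in the text.
MAX_CHARS_PER_LINE = 37
MAX_LINES = 2
MAX_SECONDS_TWO_LINES = 6.0


@dataclass
class Subtitle:
    """One subtitle cue (hypothetical structure, not the study's file format)."""
    start: float  # time the subtitle appears, in seconds
    end: float    # time the subtitle disappears, in seconds
    text: str     # subtitle lines separated by "\n"


def check_subtitle(sub: Subtitle) -> List[str]:
    """Return a list of convention violations; an empty list means none found."""
    problems = []
    lines = sub.text.split("\n")

    # Convention: at most two lines on screen at once.
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")

    # Convention: at most 37 characters per line.
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line {number} has {len(line)} characters "
                f"(max {MAX_CHARS_PER_LINE})"
            )

    # Convention: a two-line subtitle is visible for at most 6 s.
    # Assumption: one-line subtitles are not checked, as no limit is given.
    duration = sub.end - sub.start
    if len(lines) == MAX_LINES and duration > MAX_SECONDS_TWO_LINES:
        problems.append(
            f"visible for {duration:.1f} s (max {MAX_SECONDS_TWO_LINES:.0f} s)"
        )
    return problems
```

A check of this kind would flag the three-line subtitles noted in the verbatim set, although judging whether line divisions are 'correct' still requires a human editor.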

Cognitive load questionnaire
After watching the recorded lecture, the participants were asked to complete an eight-item questionnaire (Leppink and Van den Heuvel 2015) on the cognitive load that they experienced during the lecture (see Figure 3). They were also asked to complete a comprehension test on the content of the video. From this questionnaire, the participants' perceived cognitive loads were measured by their answers to specific questions. Questions 1-4 were related to the intrinsic load experienced by the participants, and questions 5-8 were related to the extraneous load experienced (Leppink and Van den Heuvel 2015). The participants had to rate themselves according to the level of complexity and their ability to understand the context and language used in the video. For each group of questions (those related to extraneous and intrinsic load), the mean score and standard deviation were calculated and used to compare the perceived cognitive loads between the different modes of presentation (Leppink and Van den Heuvel 2015). A hard copy of the questionnaire (Figure 3) was presented to the participants during this experiment.
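The scoring described above can be sketched as follows. This is an illustrative Python outline rather than the analysis script used in the study: the function names are hypothetical, and it assumes that each participant's ICL and ECL scores are the means of questions 1-4 and 5-8, respectively, before the mean and standard deviation are taken per presentation mode.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

ICL_ITEMS = (0, 1, 2, 3)  # questions 1-4 (zero-based positions)
ECL_ITEMS = (4, 5, 6, 7)  # questions 5-8 (zero-based positions)


def subscale_scores(responses: List[float]) -> Tuple[float, float]:
    """Return (ICL, ECL) for one participant's eight ratings.

    Assumption: a subscale score is the mean of its four items.
    """
    if len(responses) != 8:
        raise ValueError("expected ratings for exactly eight questions")
    icl = mean(responses[i] for i in ICL_ITEMS)
    ecl = mean(responses[i] for i in ECL_ITEMS)
    return icl, ecl


def summarise(
    groups: Dict[str, List[List[float]]]
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Per presentation mode, return (mean, standard deviation) for ICL and ECL.

    Each group needs at least two participants for a sample standard deviation.
    """
    summary = {}
    for mode, participants in groups.items():
        scores = [subscale_scores(ratings) for ratings in participants]
        icl = [score[0] for score in scores]
        ecl = [score[1] for score in scores]
        summary[mode] = {
            "ICL": (mean(icl), stdev(icl)),
            "ECL": (mean(ecl), stdev(ecl)),
        }
    return summary
```

Keeping the two subscales separate mirrors the distinction drawn in the text between load caused by task difficulty and load caused by presentation format.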

Comprehension test
The duration of the video restricted the number of concepts discussed and therefore the number of questions that could be asked. The comprehension test consequently consisted of only six questions, covering definitions and concepts, true or false answers and multiple-choice questions, based on the content of the video, for a total of 10 marks. The comprehension test also contained a combination of both recall (cued-recall and recognition) and comprehension items. The item-reliability score for the comprehension test was measured at 0.86 by using Winsteps (see Figure 4). Winsteps is Windows-based Rasch analysis and Rasch measurement software used to measure the reliability of persons or items (Linacre 2006). Rasch analysis is a method for obtaining objective, fundamental, additive measures (qualified by standard errors and quality-control fit statistics) from stochastic observations of ordered category responses (Linacre 2006). The low person reliability recorded for the comprehension test (Figure 4) was due to the low number of participants used for this experiment. A reliable, statistically significant score for person reliability can only be achieved for participant sample sizes larger than 150, which was not possible for this study.

Experimental setup
The participants were divided into four groups (G1, G2, G3 and G4) that were exposed to the same video content, each group viewing the video in a different mode. Figure 5 represents the design of the experiment describing the groups and the different PMs they watched. The first group, G1 (audio-only) had 16 participants; G2 (video with audio and verbatim subtitles) had 13 participants; G3 (video with audio) had 23 participants; and G4 (video with audio and edited subtitles) had 13 participants. From Figure 5, it is also clear that each of the groups watched a different presentation of the same video (V_a, V_av, V_avsa, V_avsc). Afterwards, the participants had to complete a comprehension test (C) and a cognitive load questionnaire (Q) on the content of the video.

Results
It should be noted that this experiment was exploratory, with the sole purpose of determining whether the cognitive load induced by subtitles had any effect on performance. This effect was also compared between different PMs [audio only (A); audio and video (AV); audio, video and verbatim subtitles (SA); and audio, video and edited subtitles (SC)]. The descriptive statistics of the two self-reported cognitive load components (ICL and ECL) and the performance measure (Comp) are presented in Table 2. The data for this experiment were analysed using linear mixed effect modelling (LMEM). ICL refers to cognitive load caused by the task difficulty, whereas ECL refers to the cognitive load caused by the presentation format of the task. Given that the PMs of the video were different for each group, a greater difference was expected between the groups for the ECL than for the ICL. Figure 6 gives the distribution of the data points for each variable for the full sample.

Figure 6. Visual representation of distribution of data for variables.
An LMEM was constructed to determine the impact of PM on comprehension scores (Comp) (see Model 1). Table 3 gives the estimates of the LMEM based on Model 1. From Table 3, it is evident that the estimates for the video PM (PMAV) and both the subtitled video modes (PMSA and PMSC) indicated a decrease (negative estimate value) in comprehension compared to the audio-only PM (intercept). However, this did not reach significance.
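The way such estimates are read, with each mode's estimate expressed relative to the audio-only intercept, can be illustrated with a simplified sketch. It assumes treatment (dummy) coding with audio-only as the reference level and ignores any random effects, which are not specified in this section; under those assumptions the fixed-effect estimates reduce to differences between group means. The function name is hypothetical, and the scores in the usage example are invented placeholders, not data from this study.

```python
from statistics import mean
from typing import Dict, List


def treatment_coded_estimates(
    scores: Dict[str, List[float]], reference: str
) -> Dict[str, float]:
    """Fixed-effect estimates for one categorical predictor, treatment-coded.

    Simplification: no random effects, so the intercept is the mean of the
    reference group and each other estimate is that group's mean minus it.
    """
    intercept = mean(scores[reference])
    estimates = {"(Intercept)": intercept}
    for mode, values in scores.items():
        if mode != reference:
            estimates["PM" + mode] = mean(values) - intercept
    return estimates


# Invented placeholder scores (NOT the study's data), one list per mode.
placeholder = {
    "A": [6, 7, 8],
    "AV": [5, 6, 7],
    "SA": [5, 5, 8],
    "SC": [6, 6, 6],
}
estimates = treatment_coded_estimates(placeholder, reference="A")
for name, value in estimates.items():
    print(name, value)
```

In this coding, a negative estimate for a mode means only that its mean score lies below that of the reference mode; whether the difference is significant is a separate question addressed by the model's test statistics.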
These values also indicate a slight reverse modality effect, meaning that the PMs containing more modalities (sources of information) have a greater effect on the participants than those with fewer modalities (Tabbers et al. 2004). In this case, the PMs with no subtitles did slightly better than those with subtitles, but without any statistical significance. Such an effect is generally attributed to the complexity of the material and the fact that the participants did not have control over the presentation of the information (Leahy and Sweller 2011:944).
It should also be noted that these results are unlikely to be due to unreliable questions (items) in the comprehension test, as the item reliability was measured at 0.86 (as shown in Figure 4). This suggests that the difference in PM has no noticeable effect on performance (comprehension).
The second model (Model 2) tested the influence of the PM on the perceived ICL of students. Table 4 provides the estimates of the LMEM based on Model 2. From Table 4, it is evident that all three PMs containing video (PMAV, PMSA and PMSC) were perceived as less difficult (a decrease in the estimate value for ICL) than the audio-only PM, although this difference was not statistically significant. However, given the small difference in ICL between the video and subtitled PMs, it can be argued that tasks are generally perceived to be more difficult when there is more than one source of information involved. This analysis, however, falls outside the scope of this paper.
The third model (Model 3) tested the influence of PM on the perceived ECL of students. Because ECL concerns the cognitive load associated with the presentation format of stimuli, it was assumed that there would be a significant difference between three of the four PMs used in this experiment (audio-only, audio and video, and video with subtitles). This assumption is based on evidence from the literature that the ECL experienced will increase as the number of sources of information is increased (Ayres and Sweller 2005). It also means that little difference was expected between the two subtitled PMs, as they are presented in the same format and therefore contain the same number of sources of information. Table 5 presents the estimates of the LMEM based on Model 3. For Model 3, no significant differences were found between the ECL for each of the PMs. Although there was no significant difference between the modes, the fact that the verbatim subtitles recorded the larger ECL estimate of the two subtitled PMs seems to suggest that the ECL perceived by students is caused by something other than the format. Because these results do not fall within the scope of this article, the cause of the higher ECL for verbatim subtitles will not be discussed here.

Conclusion
Although this study was simplistic and exploratory in nature, the findings suggest that SLS, in an educational context, have no significant effect on either performance or perceived cognitive load for students. They also seem to indicate that students are able to adapt their information-processing ability depending on the number of sources of information (i.e. the different PMs) they encounter.
From an information-accessibility standpoint, specifically on e-learning platforms, this is a valuable finding. It means that students who are non-native speakers of English can access subtitled, content-specific videos without their performance or cognitive ability being affected by the extra source of information that needs to be processed.
Given the ongoing debate about the impact of subtitles as an extra source of information to be processed (i.e. whether they facilitate or are detrimental to learning) and the effect that they have on experienced cognitive load, this study, although very limited, provides evidence suggesting that subtitles have no additional effects on the learning process. Because the evidence provided by other studies on the topic is mostly inconclusive on the effect of subtitles (specifically same-language subtitles), the findings from this study may facilitate the understanding of how subtitles affect learning and information retention.
However, it is clear that the scope of this paper does not sufficiently clarify anomalies and other unanswered questions. It therefore calls for a second, similar experiment in which subtitle-related variables (presentation speed, number of words, composition, etc.) are controlled for and physiological measurements (e.g. eye movements) are used to measure induced cognitive load and the processing of subtitles. Other limitations that were not considered in this experiment, but need to be considered in future experiments, are the influence of English proficiency, prior knowledge and memory capacity on the processing of subtitles.