Empowered learning through microworlds and teaching methods: a text mining and meta-analysis-based systematic review

Research in Learning Technology 2020. © 2020 J.M. Costa et al. Research in Learning Technology is the journal of the Association for Learning Technology (ALT), a UK-based professional and scholarly society and membership organisation. ALT is registered charity number 1063519. http://www.alt.ac.uk/. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.


Introduction
Since the creation of LOGO (Logic Oriented Graphic Oriented) (Papert 1980;Papert et al. 1979), microworlds have been used by teachers in a wide range of areas to enhance students' knowledge acquisition and to promote the development of higher order cognitive skills, such as planning a course of actions or heuristics of problem-solving (e.g. Pea and Kurland 1984). However, the development of cognitive skills and the acquisition of disciplinary knowledge associated with microworlds in schools are influenced by a wide range of variables that many studies do not discriminate. One of these variables is the teaching method, not always measured, because some studies use microworlds in the classroom to study other variables, such as improving students' learning just by using the microworld (e.g. Fargas-Marques and Costa-Castelló 2006;Yurtseven and Buchanan 2012).
We begin our paper with the definition of microworld and the main teaching methods that guide the teaching strategies with microworlds in the classroom. Next, we conduct a systematic review to understand the characteristics of the studies that used microworlds to achieve knowledge and identified the teaching method in their work. Finally, we present a meta-analysis with effect sizes and a p-curve analysis to understand the effectiveness of the teaching methods associated with the use of microworlds on students' knowledge acquisition.

Theoretical framework
Microworlds
Papert (1980) has defined microworlds as simulations in computational environments where students can manipulate their objects and learn from those manipulations. A microworld is also a framework of learning and an incubator of knowledge as the entire learning experience must be able to take place exclusively within the microworld (Papert 1980;Somekh 1996). Sometimes, students also must learn the language of the microworld itself as in cases where its software only accepts a specific syntax (Somekh 1996). In general, semantics of the content, simulations and the interface of microworlds contribute to the learning experience of students (Somekh 1996).
Although authors diverge on the concept and on the characteristics that should be present in a microworld (e.g. Hoyles, Noss, and Adamson 2002;Rieber 1996;Sarama and Clements 2002), we ground our study on the original concept of Papert (1980), because it is the initial concept of microworld, it is still valid, and it is the one from which the different views developed.
Not all digital learning environments can be considered as microworlds. According to Papert (1980), a microworld must have three main characteristics: (1) it should allow the creation of simple examples related to the knowledge that is expected to be acquired; (2) there should be no obstacles in the manipulation of the objects within the microworld and (3) the required concepts for learning must be definable within the microworld.
The first microworld, the Geometric Tortoise from LOGO (Papert 1980), was created a few decades ago, but nowadays there is a large diversity of microworlds that can be applied in several academic areas such as computer programming with Alice (Alice Project 2020; Scratch 2020), neurosciences with BrainExplorer (Schneider et al. 2013) or mathematics with Speedy World (Wang et al. 2018), all having in common the three characteristics pointed out by Papert (1980).
Taking the example of Alice software (Alice Project 2020), we can assume that it is a microworld for programming learning since the construction of simple instructions allows the opportunity of learning the introductory concepts of computer science (criterion (1)), the manipulation of characters and space does not have any limitation and students can create their own instructions and relate them to the chosen characters as well (criterion (2)) and students can define a large set of variables, data types and instruction for each character according to the learning objectives to be achieved (criterion (3)).

Teaching methods
There are several ways to categorize the different teaching methods and instructional design models. The teaching methods, namely, the traditional method, are more comprehensive and older than the instructional design models, since the latter appeared in the 1950s, with Skinner's programmed teaching (Skinner 1954), and are closely related to the scientific theories of learning and human development (cf. Bruner 1966;Gagné 1985). Teaching methods aim to promote knowledge acquisition and optimize student learning. Instructional design also has the same goal, although it is mainly concerned with optimizing learning and performance (cf. Merrill 2002), has a more prescriptive and normative bent and is based on scientific theories and models of human learning.
Regarding the teaching methods, we chose a categorization that can be easily grasped: the direct and indirect teaching methods. The former comprises some teaching strategies with a long history, such as the teacher's exposition or lecture, included in the so-called conventional or traditional methods of teaching. While much of the time spent in the classroom is used by the teacher to present the subject matter, which Bruner (1965) refers to as the main feature of educated societies, each discipline has a typical way of developing and sequencing the classroom activities carried out by the teacher and students (Schofield 1995). These methods can also be defined as teacher centred (Bransford, Brown, and Cocking 2000). The instructional models that may be included in this category are the so-called behaviourist or instructivist models, such as Skinner's model (Skinner 1954) referred to earlier, with the difference that, instead of the teacher teaching the student, the program takes on that role, leading the student to the desired goal through successive stages of increasing difficulty.
The indirect teaching methods include the project method (Kilpatrick 2007), problem-based learning, guided discovery learning (Bruner 1966) and other related methodologies. Constructivist instructional models may also be included in this category. These methods are also called student centred (Bransford, Brown, and Cocking 2000). They advocate that learning is more consistent when it is built by the students, individually but preferably in collaboration with peers or someone more competent, as opposed to that which is transmitted by the teacher. In other words, 'the rationale underlying these so-called discovery approaches is that the material that is generated is better learned than the material that is only received' (De Jong and Lazonder 2014, p. 371).
Cognitive instructional models base their proposals on experimental results on human cognitive architecture, where memory is the mechanism that allows human beings to learn (e.g. Anderson 1993;Baddeley 1997). Examples of these models are Robert Gagné's conditions of learning theory (Gagné 1985) and van Merriënboer's four components instructional design model (van Merriënboer 1997). Regarding the teaching methods, we can include the meaningful learning theory of David Ausubel (2000) and the mastery learning of Benjamin Bloom (1956).

Relationship between microworlds and teaching methods
The integration of microworlds in education in the mid-1980s, after the publication of the book 'Mindstorms: Computers and powerful ideas' (Papert 1980), led many teachers and researchers, influenced by Papert's ideas, to think that it was enough to get children to program without having to teach them, as the 'nuggets of knowledge' integrated into microworlds would be easily learned and transferred. Experimental research has contradicted this idea (e.g. De Corte 1993; Pea and Kurland 1984). Thus, together with the LOGO microworld, the indirect teaching methods were privileged, especially guided discovery learning, which produced no statistically significant effects on knowledge acquisition and development of cognitive skills by students (e.g. Yuen-Kuang and Bright 1991). Positive and significant results were found when teachers and researchers used teaching strategies that guided students (cf. Mayer 2014; Mendelsohn 1991), supporting researchers who criticize constructivist methods for learning complex skills (Kirschner 2019; Littlefield et al. 1988).
The most important conclusion that can be drawn from all the research efforts associated with microworlds in the learning of children and young people is that microworlds have no intrinsic virtues unless teachers and researchers associate them with certain teaching methods. Microworlds can be combined with distinct teaching approaches (McDougall 2002). In cognitive approaches, students have access to correct solutions of problems to be solved inside the microworld, aiming to change their cognitive structures to match scientific understanding. The constructivist or constructionist approach advocated by Papert (1980) has the main intention of guiding the student to build or modify his or her own knowledge even if it involves more time to explore the microworld and the absence of scientific theories in the manipulation of objects (McDougall 2002).
The most promising seem to be the cognitivist or mixed teaching models due to their good balance between direct and indirect teaching strategies (De Corte 1993). However, it remains unclear which combinations are the most efficient. We propose the development of a meta-analysis to answer this question.

The systematic review
We conducted our systematic review in three steps (Cooper and Hedges 2009): (1) searching databases for specific terms related to our research question; (2) applying inclusion and exclusion criteria to the studies resulting from (1); and (3) codification of the studies and meta-analysis.
The teaching method is an important factor in the use of technology in the schools as we mentioned before and recent research has confirmed (Costa and Miranda 2019; Vosinakis, Anastassakis, and Koutsabasis 2018). However, not all studies highlight the association between the use of microworlds and the teaching methods. Therefore, in our systematic review, we focus only on the studies that considered this variable combined with microworlds, even if it was only for control purposes.

Search criteria
During June 2019, we collected studies on the major databases: ACM (Association for Computing Machinery) Digital Library, ERIC (Education Resources Information Center), IEEE (Institute of Electrical and Electronics Engineers) Digital Library, ISI (Institute for Scientific Information) Web of Science, Science Direct, Springer Open and DOAJ (Directory of Open Access Journals). We only searched for the term 'microworld' and did not select any time range because we wanted to reach all the studies about the theme, with no time restrictions.
The search criteria returned a total of 668 articles. Through a subsequent analysis of their corresponding titles, 395 articles were identified as related to the educational field and were selected for a deeper analysis.

Inclusion criteria
The total number of selected articles, 395, is still a large number of manuscripts, which would require careful reading to validate whether each one constitutes a valid contribution on microworlds in a learning context. Furthermore, at least two experts would need to validate each article to mitigate the known subjectivity of human assessment (Santos, Laureano, and Moro 2019). Therefore, an approach based on standard text mining techniques can be an alternative to automatically select those articles that are relevant to the studied subject, and help in defining and executing inclusion criteria. This approach consists in identifying a lexicon that makes it possible to correctly categorize each article, and then in running a text mining script that builds a document-term matrix that quantifies how many times each of the terms from the lexicon occurs within each article (Cortez et al. 2018). Although the lexicon definition is subjective, after it is completed, the computational parsing process of each text is the same regardless of the article. Also, the negligible computational time taken and the fact that the script can be executed any number of times to tune the lexicon are two other advantages of this approach. The lexicon was defined under two key semantic concepts: (1) quantitative research design and (2) teaching strategies, according to the main concepts presented in the 'Teaching methods' section. It should be stated that the list of terms was developed by considering a broader perspective, that is, we preferred selecting a few irrelevant articles to discarding relevant ones. Tables 1 and 2 exemplify the main terms (reduced terms) and some of the corresponding related terms for each case (the full list can be consulted at https://fenix.iscte-iul.pt/homepage/smcmo@iscte.pt/microworlds). An article would be included if it contained at least one term from each dictionary.
For example, if an article contained the 'pre-experimental' term, stated in Table 1, and 'cognitivism', stated in Table 2, it would be added to the results from the application of the inclusion criteria. The application of the defined inclusion criteria returned 105 articles.
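The inclusion rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the script used in the study: the function names and the two mini-dictionaries are hypothetical, and only the terms 'pre-experimental' and 'cognitivism' are taken from the text (the other terms are placeholders).

```python
import re

# Hypothetical mini-dictionaries: only 'pre-experimental' and 'cognitivism'
# are quoted in the text; the remaining terms are placeholders.
DESIGN_TERMS = ["pre-experimental", "quasi-experimental", "control group"]
TEACHING_TERMS = ["cognitivism", "expository", "discovery learning"]


def count_terms(text, terms):
    """Count whole-term occurrences of each lexicon term in one article."""
    lowered = text.lower()
    return {
        term: len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    }


def document_term_matrix(articles):
    """Map each article id to its row of term counts over both dictionaries."""
    return {
        article_id: count_terms(text, DESIGN_TERMS + TEACHING_TERMS)
        for article_id, text in articles.items()
    }


def is_included(row):
    """Inclusion rule: at least one hit in each of the two dictionaries."""
    has_design = any(row[term] > 0 for term in DESIGN_TERMS)
    has_teaching = any(row[term] > 0 for term in TEACHING_TERMS)
    return has_design and has_teaching


# Two made-up abstracts, used only to exercise the rule.
articles = {
    "a": "A pre-experimental design grounded in cognitivism.",
    "b": "We describe a tool inspired by cognitivism.",
}
matrix = document_term_matrix(articles)
selected = [article_id for article_id, row in matrix.items() if is_included(row)]
print(selected)
```

In this toy run, article "a" hits both dictionaries and is kept, while article "b" only hits the teaching dictionary and is dropped. Because the rule is applied mechanically, the dictionaries can be tuned and the script rerun at negligible cost, as noted above.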

Exclusion criteria
The application of text mining did not distinguish whether the studies were only descriptive or whether the words of the dictionaries were used in the right context for our work. So, at this phase, we decided to apply the exclusion criteria manually to introduce a semantic analysis. We defined a set of criteria related to the study characteristics that could undermine the validity of the meta-analysis or bias the results, namely: (1) No presentation of results/descriptive article; (2) Document compilation, not entirely related to our research scope; (3) Study does not use microworlds; (4) Results provided from self-reports; (5) Results provided from time counting or items counting; (6) Results not related to achievement or learning; (7) Results do not allow the calculation of Cohen's d.
The exclusion criteria were applied in the indicated order. A study that failed any one of the criteria was automatically excluded, since it would not be possible to extract all the information required for the subsequent analysis. After the application of the exclusion criteria, a total of 10 articles remained.

Codification of the studies
All descriptors that we used to develop this systematic review are available in Appendix A. Following the approach of Costa and Miranda (2017), we adopted the guidelines of Hedges, Shymansky, and Woodworth (1989) for the codification of studies in a meta-analysis and adapted them to the context of our work.

The meta-analysis
As in most systematic reviews, our 10 studies differ substantially from one another, which could make it difficult to define the variance between them and to understand the effect of microworlds with different teaching methods. However, the meta-analysis process can mitigate those issues (Borenstein et al. 2009). In a first approach, we combined the studies through the effect sizes of each study using Cohen's d, and we converted the results of the studies that did not present this measure. Conducting a meta-analysis with effect sizes to synthesize our studies has several advantages for estimating the effect of microworlds on learning accurately, given its precision and clear method (Borenstein et al. 2009;Hedges and Pigott 2004). This method also enables us to understand the factors and characteristics of the studies that influence the effect size and to explain the differences of the results between them (Coe 2002).
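As an illustration of what computing or converting to Cohen's d can involve, the sketch below uses the standard pooled-standard-deviation definition, the usual conversion from an independent-samples t statistic, and the common large-sample approximation of the variance of d. The text does not state which conversions were actually applied to each study, so the choice of formulas and all function names here are assumptions.

```python
from math import sqrt


def cohens_d(mean_eg, mean_cg, sd_eg, sd_cg, n_eg, n_cg):
    """Standardized mean difference between EG and CG using the pooled SD."""
    pooled_var = ((n_eg - 1) * sd_eg ** 2 + (n_cg - 1) * sd_cg ** 2) / (n_eg + n_cg - 2)
    return (mean_eg - mean_cg) / sqrt(pooled_var)


def d_from_t(t_value, n_eg, n_cg):
    """Convert an independent-samples t statistic into Cohen's d."""
    return t_value * sqrt(1.0 / n_eg + 1.0 / n_cg)


def d_variance(d, n_eg, n_cg):
    """Large-sample approximation of the sampling variance of d."""
    return (n_eg + n_cg) / (n_eg * n_cg) + d ** 2 / (2.0 * (n_eg + n_cg))


# Made-up numbers, used only to show the calls.
d = cohens_d(mean_eg=12.0, mean_cg=10.0, sd_eg=2.0, sd_cg=2.0, n_eg=20, n_cg=20)
print(d, d_variance(d, 20, 20))
```

The variance returned by `d_variance` is what a weighting scheme needs later on, since each study is weighted by the inverse of its variance.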
In a second approach, we combined the studies through the p-values. The p-curve method provides a meta-analysis method for determining the likelihood that an effect can be explained as a result of publication bias or p-hacking (Simonsohn, Nelson, and Simmons 2014a). A p-curve looks at the distribution of significant p-values and attempts to determine whether the distribution has a right-skew, which is diagnostic of evidentiary value (Simonsohn, Nelson, and Simmons 2014a). We followed the more specific and robust procedure described in Simonsohn, Nelson, and Simmons (2014b), as applied in recent meta-analyses in psychological research (Nelson, Simmons, and Simonsohn 2018).
Finally, we analyse the results, their limitations and conclusions.
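The idea behind the right-skew test of the p-curve can be sketched as follows. This is a simplified illustration and not the procedure of the p-curve tool itself: it assumes that, under the null hypothesis of no effect, significant p-values are uniform on (0, 0.05), so that each one can be rescaled to a 'pp-value' by dividing by 0.05, and it assumes a Stouffer combination of the pp-values. The half p-curve and the power estimate are not covered, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_skew_test(p_values, alpha=0.05):
    """Test whether the significant p-values pile up near zero (right skew).

    Returns the Stouffer Z and its one-sided p-value; a strongly negative Z
    (small p-value) suggests evidential value.
    """
    significant = [p for p in p_values if 0.0 < p < alpha]
    if not significant:
        raise ValueError("no significant p-values to analyse")
    # Under the null of no effect, p / alpha is uniform on (0, 1).
    pp_values = [p / alpha for p in significant]
    normal = NormalDist()
    z = sum(normal.inv_cdf(pp) for pp in pp_values) / sqrt(len(pp_values))
    return z, normal.cdf(z)


# Made-up p-values; the non-significant one (0.30) is ignored by the test.
z, p_skew = right_skew_test([0.001, 0.002, 0.003, 0.30])
print(z, p_skew)
```

A flat set of significant p-values gives a Z close to zero, whereas p-values concentrated near zero give a clearly negative Z.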

Results
The main characteristics of the 10 selected studies are available in Appendix B. The first five studies were performed with an expository teaching method and the last five studies with a cognitive method. We present the results of all studies after the experimental treatment. These studies were conducted in three different continents represented by six countries, namely, USA, Brazil, Israel, Belgium, Germany and Portugal. Most of them were conducted in the area of computer science, but the natural sciences and the arts are also represented. All of the studies measured the achieved knowledge in an academic field and most of them used tests as data collection instruments, except for the study by Pfahl, Koval, and Ruhe (2001), which used projects.
Combining these characteristics, we observe in both groups a large effect size in studies conducted in K-12. These studies were implemented in different areas of knowledge (computer science, drama and biology), so the area does not seem to be the cause. On the other hand, the studies implemented in high school present low or negative effect sizes. These studies were only conducted with the expository method. Further research should implement studies in this age range that use cognitive methods.

Effect size analysis
Studies with more than one outcome and treatment have correlated results (Borenstein et al. 2009). Borenstein et al. (2009) suggest two solutions to deal with this issue. The first consists of comparing each experimental group (EG) with the control group (CG) separately, which would increase the power of that study relative to the others and bias the estimate (Van den Noortgate et al. 2013). The alternative solution is to analyse the difference of the effect size between the two EGs. However, our data could not follow this option due to some differences in measurements between studies. For these cases, we followed the suggestion of Van den Noortgate et al. (2013) and selected only one effect size per study. The same authors also recommend the use of a random-effects model to combine studies with these characteristics. Table 3 presents the Cohen's d of each study (column d), its 95% confidence interval (column 95%-CI), the weight of each study using the random-effects model (column Weight) and the teaching method (column Method). Table 4 synthesizes the Cohen's d and the heterogeneity test by method. The results revealed a significant difference between the subgroups (Q = 12.51; p < 0.01). Figure 1 illustrates the forest plot with all studies. TE means the effect size and seTE means the standard error of the effect size. From the forest plot, we verify that there is a difference of less than 10 percentage points between the subgroups and both have five studies. These characteristics allow us to compare fairly the use of the expository and cognitive methods because the distribution is similar. Table 4 and Figure 1 show that the subgroup with the expository methods has very low heterogeneity (only 14.4%) and a combined effect size (d = 0.24) considered small in the educational field (Hattie 2009).
On the other hand, the subgroup with the cognitive methods has moderate heterogeneity (53.8%) and a combined effect size (d = 1.03) considered strong in the educational field (Hattie 2009). These results indicate that a classroom strategy that uses microworlds combined with cognitive methods is not only better than one that combines them with expository methods, but also has a strong impact on knowledge acquisition, since 84% of the subjects of the CG would be expected to fall below the average subject of the EG (Coe 2002). Meanwhile, microworlds combined with expository methods do not indicate a significant effect, since only 58% of the subjects of the CG would be below the average subject of the EG (Coe 2002).
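To make the quantities in this section concrete, the sketch below pools effect sizes with a random-effects model, computes the heterogeneity percentage, and converts a Cohen's d into the share of the CG expected to fall below the average EG subject. It assumes the DerSimonian-Laird estimator of the between-study variance and normally distributed scores with equal variances; the text does not say which estimator was used, the input values are made up, and the function names are hypothetical. The test of the difference between subgroups is not covered.

```python
from math import sqrt
from statistics import NormalDist


def random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (d - fixed) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * d for w, d in zip(re_weights, effects)) / sum(re_weights)
    se = sqrt(1.0 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "I2": i2}


def share_of_cg_below_eg_mean(d):
    """Share of the CG expected below the average EG subject, for a given d."""
    return NormalDist().cdf(d)


# Made-up effect sizes and variances for two hypothetical studies.
result = random_effects([0.0, 1.0], [0.1, 0.1])
print(result["pooled"], result["I2"])
print(round(100 * share_of_cg_below_eg_mean(1.0)), round(100 * share_of_cg_below_eg_mean(0.2)))
```

Under these assumptions, d = 1.0 corresponds to about 84% and d = 0.2 to about 58%, which matches the percentages quoted above if the combined effect sizes are rounded to one decimal place; whether the percentages were obtained with this rounding is an assumption.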

P-curve analysis
Of the 10 studies selected in the analysis, only 6 had significant p-values and were therefore included in the p-curve analysis. The resultant p-curve can be seen in Figure 2. The results of the associated full p-curve and half p-curve statistical hypothesis tests indicate that there is evidential value (p < 0.001 and p < 0.001, respectively); a power analysis gives an estimated power of 92% (CI: 69%-99%). Together, these results indicate that the observed p-values are unlikely to be the result of publication bias or p-hacking and therefore indicate true evidentiary value. We note that all but one of the p-values for the expository analyses are not significant at the 0.05 level, whereas all but one of the p-values for the cognitivism analyses are significant. Running a p-curve analysis for the cognitivism analyses alone returns similar results, with an indication of evidential value (p < 0.001) and an estimated power of 96% (CI: 82%-99%).

Conclusions
Considering the methodology used in this study, we emphasize the combination of the traditional meta-analysis, whose main criterion is the effect size, with the p-curve analysis, an emerging methodology that reinforces the results of the traditional meta-analysis. We suggest that, in future meta-analyses, researchers should focus on combining these two statistical analysis methodologies, as they are complementary. We also point out that using a text mining technique to select the articles to be included in the meta-analysis is very useful, provided that the dictionaries are appropriate. In our study, to test the robustness of our dictionaries, we applied the inclusion and exclusion criteria manually to 30 random studies from the search results, and the outcome matched exactly the result of the process with text mining.
Regarding the results obtained in this work, the most salient is that cognitive methods associated with the microworlds have evident and significant effects on the acquisition of knowledge by students. The traditional method, namely, the expository method, has little impact on student learning, without statistical significance. One of the characteristics of the expository method is that it is centred on the teacher's strategies, reserving a less active role for the students. Today, we know that student activity is important in knowledge acquisition. Cognitive methods attach great importance to student cognitive activity and base their strategies on the way humans process information (Ausubel 2000;Gagné 1985;van Merriënboer 1997). They also consider the results obtained from experimental research on cognitive architecture and the functioning of human memory (Anderson 1993;Baddeley 1997;Mayer 2014;Sweller 2011). We think that it is these characteristics of cognitive methods, associated with the characteristics of the microworlds, that facilitate and optimize student learning. Constructivist teaching methods associated with the microworlds, although widespread, did not enter the meta-analysis because they did not meet the inclusion and exclusion criteria. Many of these studies have no empirical results, and when they do, the results are primarily descriptions of experiences and subjective analyses of student motivation. Another point to note is that the instruments used to collect empirical data are not always reliable, for example, when they simply count programming keywords in code developed by students.

As a suggestion for future work, we think it would be useful to develop experimental work that meets the inclusion and exclusion criteria used in this meta-analysis, associated with constructivist teaching methods. Another idea that we take from this work is the reuse of the dictionaries used as inclusion criteria to study other educational resources, even those not associated with technological tools. We also suggest that studies with a longer duration be developed in the future, for example, a semester or an academic year, as most of those we found are of short and/or medium duration.

Funding
The work by Joana Martinho Costa