Discontent with content analysis of online transcripts

ALT-J, Research in Learning Technology, July 2009. DOI: 10.1080/09687760903033066. ISSN 0968-7769 (print)/1741-1629 (online). Original Article. © 2009 Association for Learning Technology.
Judith Enriquez (judithenriquez@hotmail.com)

Content analysis has dominated computer-mediated communication and educational technology studies for some time, and a review of its practices as applied to online corpora of data or messages is overdue. We are confronted with complexity, given the various foci, nuances and models for theorising learning and applying methods. One common suggestion for dealing with this complexity in content analysis is a call for standardisation through replication or systematic research studies. This article presents its ‘discontent’ with content analysis, discussing the issues and concerns that surround the analysis of online transcripts. It does not attempt to resolve these issues or to provide a definitive answer. Instead, it is an open inquiry into another way of looking at online content. It presents an alternative to, or perhaps an extension of, what we have come to know as content analysis. It argues for the notion of genres as another way of conceptualising online transcripts. It proposes two things: first, that in performing transcript analysis, it is worthwhile to think about how messages relate to a system of interactions that persists even beyond the online environment; secondly, that there is an emergent and recurring metastructuring at work in online environments that is worth exploring, instead of imposing structures (models and frameworks) that do not fit the emerging communicative practices of participants.


Introduction
Content analysis (CA) is a research technique that is not singular. Rather, it is multiple in both its definition and its application, whether qualitative, quantitative or a mix of both. In its application, determinist assertions that online learning is inherently collaborative and community-building have tended to confine CA to the recorded online transcripts (i.e. messages); consequently, the context of learning has been bounded by the design and use of a particular technology.
Consensus-bound discourses of collaboration and community have been reinforced by the research techniques used in transcript analysis. The storage of messages in database files made research data readily available for analysis. In the 'hands' of researchers or data analysis software applications (e.g. NVivo), message transcripts then become a collective of constructed knowledge produced by the participants. The knowledge reified in the transcripts is further investigated by 'chunking' messages and assigning codes. This technique is confronted with issues that are not easily resolved.
Most current educational technology studies emanate from a socio-constructivist background and pedagogical perspective. However, the constructs measured with CA are many. They include knowledge construction, online presence, interaction patterns and learning strategies (De Wever et al. 2006). It is a widely shared view that the available transcripts are the key to understanding online learning. Various models and tools have been developed to facilitate transcript analysis (e.g. Henri 1992; Gunawardena, Lowe, and Anderson 1997; Garrison, Anderson, and Archer 2000).
Commonly, the recorded messages compel researchers and academics alike to ask questions whose answers may be readily found within the message content. Are the participants making statements or asking questions? Is the discussion advancing beyond the sharing stage into negotiation and knowledge building (Gunawardena, Lowe, and Anderson 1997)? Is there evidence of cognitive presence, social presence (Rourke et al. 1999) and/or teaching presence (Anderson et al. 2001)?
Research findings from content analysis of transcripts therefore offer multiple descriptions of the cognitive and interpersonal characteristics of the technology. Inevitably, this diversity blurs the purpose of transcript analysis. Generally, that purpose is to provide proof of what we already believed to be the technology's educational value (e.g. to self-regulate learning, to build a learning community). The context of the transcripts (the technology itself, the physical setting and the conditions in which the transcripts are produced) is usually omitted from the analysis.
Although taxonomies have been developed to analyse the message content of teachers and students in computer-mediated environments, little is yet known about how learners, whether as lurkers or as active participants, communicate or interact online in relation to the messages of their fellow students and of their teacher.
This article presents its 'discontent' with content analysis. It discusses the issues and concerns that surround the analysis of online transcripts. It does not attempt to resolve these issues or to provide a definitive answer. Instead, it is an open inquiry into another way of looking at online content. One may argue that using CA with other techniques and methods would deal with some of its limitations. However, the argument of this article goes beyond the need for mixed methods, which is already well established. It explores the restricted manner in which content analysis itself is carried out. In doing so, it presents an alternative to, or perhaps an extension of, what we have come to know as content analysis. It argues for the notion of genres as another way of conceptualising online transcripts. It proposes two things: first, that in performing transcript analysis, it is worthwhile to think about how messages relate to a system of interactions that persists even beyond the online environment; secondly, that there is an emergent and recurring metastructuring at work in online environments that is worth exploring, instead of imposing structures (models and frameworks) that do not fit the emerging communicative practices of participants. The article first reviews the frameworks and models, and the unit of analysis, on which popular CA is based. It then highlights its methodological issues, particularly in applying CA quantitatively, before offering genre analysis as an alternative way of attending to the communicative practices of participants in online environments.
The idea of using genres to study communication is not new. It has a rich tradition within the field of literary analysis (cf. Bakhtin et al. 1986), and is emerging as a useful way to explain social action in cultural studies (cf. Brown and Duguid 1991).
For more than a decade, it has been applied to organisational communication and specifically to online communication (e.g. Orlikowski and Yates 1994; Yates, Orlikowski, and Okamura 1999) and, most recently, to weblogs (e.g. Herring et al. 2005) and discussion forums (e.g. Enriquez forthcoming).

Current methods and approaches
Content analysis has dominated computer-mediated communication (CMC) and educational technology studies for a while now, and there appears to be an unquestioned link between the socio-constructivist view of learning and the methods and approaches of educational technology research. This link almost implies that the technology is inherently constructivist or collaborative in its design, or that it can be designed to be collaborative.
Furthermore, although the theoretical basis of such studies is considered to be established, the link between theory and methodology or methods is not always clear or present (de Laat and Lally 2004; de Wever et al. 2006; Rourke et al. 2001).

Models of online transcripts
The automatic creation of machine-readable transcripts of interactions makes the use of CMC in performing learning tasks quite unique (Harasim et al. 1995). It also provides researchers with a 'ready-made' source of data. It is a widely shared view that the available transcripts are the key to understanding 'online learning'. Various models and tools have been developed to facilitate transcript analysis. One is Henri's (1992) model, perhaps the most popular, which has been used in other studies, for example Hara, Bonk and Angeli (2000) and McKenzie and Murphy (2000). Henri developed what is considered the most sophisticated cognitive analysis model for online interaction, distinguishing explicit interaction through direct answer or comment, implicit interaction through indirect answer or comment, and independent statement. However, the model has also been criticised, particularly by Gunawardena, Lowe and Anderson (1997), who themselves developed what is considered the next most popular model of transcript analysis. They argue that Henri's model is limited by its focus on a teacher-centred instructional paradigm, which is inappropriate in a constructivist environment based on learner-centred construction of knowledge. They theorised that the active construction of knowledge is a movement through five phases in an interaction analysis model: sharing/comparing information; discovery and exploration of dissonance or inconsistency among participants; negotiation of meaning/co-construction of knowledge; testing and modification of proposed synthesis or co-construction; and agreement statement(s)/application of newly constructed meaning.
Lastly, there is the community of enquiry model developed by Garrison, Anderson and Archer (2000).This is a framework that classifies and identifies 'presence' in online transcripts in three ways: the cognitive, social and teaching presence of both teachers and learners.
Models and frameworks have been useful as 'ordering mechanisms' for representing the nature or characteristics of knowledge construction, for example, in terms of cognition for Henri (1992) and presence for Garrison, Anderson and Archer (2000).
We have been so focused on evaluating the impact of computer-mediated environments on teaching and learning based on the educational value that we ourselves (or perhaps government initiatives and policies) have inscribed into the use of technologies.We have bounded the learning environment within the virtual (i.e. the technology itself) and omitted the 'real' learning environment in our analyses.
Moreover, while (quantitative) CA may provide statistics on the number of messages, on who contributes and how often, and on the coding of message content, these have proved insufficient to indicate the impact of CMC use on meeting teaching and learning objectives.

Methodological issues of (quantitative) content analysis
This section considers the methodological issues of quantitative CA, based on three reviews: the review reported in Rourke et al. (2001) of 19 commonly cited content analysis studies published over the preceding decade; a review of 15 articles by de Wever et al. (2006); and a recent review by Strijbos et al. (2006) of CSCL conference proceedings (2001, 2002 and 2003), which included a total of 31 papers. Rourke et al. (2001) identified six fundamental issues in the content analysis of conference transcripts: criteria of content analysis, research designs, types of content, units of analysis, ethical issues and software to aid analysis. Their paper was introduced with a scenario:
Professor Jones had just completed her first university course delivered entirely online. It was a 13-week course that generated 950 messages, all captured in machine-readable format. Almost instantly, she had data ready for analysis. She could test her hypothesis that online her students engaged in much higher levels of discourse and discussion than in face-to-face instruction. Further, she was interested in investigating the impact of a collaborative learning activity that was assigned in the middle of the course. As she did not have much time to do the analysis herself, she hired two graduate students to do it for her, only to find out two weeks later that the two coders had failed to reach 70% agreement on the categorisations: the first had identified 2032 incidents and the other only 635. To add insult to injury, Professor Jones learned that she could not use the 'ready-made' data from her online course because she had not asked for the students' consent. (Rourke et al. 2001, 8)
This scenario captured more than the six methodological issues identified and discussed in their paper. It also paints a general picture of the arrangements within which CA is used and of the discourses and practices that frame computer conferencing in further and higher education. First, it was an online course. Second, the online task was a 'collaborative learning activity'. Third, the analysis was intended to be, and usually is, confined to the transcripts. Fourth, the transcripts were perceived to be a 'document of learning', in this case of higher-order thinking, in an online environment. Fifth, the course was text-based and presumably asynchronous.
The above scenario presents us with a familiar script of CMC regarding its flexibility: "The capacity of computer conferencing to support interaction among participants while providing for temporal and spatial independence creates a unique and valuable environment for distance, distributed, and lifelong learning applications" (Rourke et al. 2001, 8).
The issues identified by Rourke and his colleagues were quantitative in focus. They discussed four criteria by which the descriptions or relationships derived from content analysis may be considered valid: objectivity, reliability, replicability and systematic coherence. In terms of research design, 18 of the 19 papers they reviewed were descriptive.
They brought to the fore the distinction between manifest and latent content. The former refers to the surface content of transcripts that is easily observable, for example, the number of times students addressed each other by name (Rourke et al. 1999); the latter refers to less obvious constructs, themes or variables, such as those modelled in Henri (1992), Gunawardena, Lowe and Anderson (1997) and Garrison, Anderson and Archer (2000) above.
In Berelson's (1952) definition, CA in communication research is "a research technique for the objective, systematic, quantitative description of the manifest content of communication" (1952, 519, cited in Rourke et al. 2001, 4), and for the sake of scientific objectivity, coding has to be restricted to manifest content (Holsti 1969; Riffe, Lacy, and Fico 1998). Moreover, as Strijbos et al. (2006) acknowledge, latent variables are more difficult to replicate in the first place.
However, manifest content on its own holds little interest for educational researchers and theorists. Their main interest in online transcripts has been in covert processes and latent variables relating to the social construction of knowledge (e.g. Gunawardena, Lowe, and Anderson 1997). Knowledge construction has been the basis for studies of higher-order cognitive processes, for example, critical thinking (e.g. Bullen 1998; Garrison, Anderson, and Archer 2001; Newman, Webb, and Cochrane 1995). It is the most common 'covert' variable in the studies reviewed by Rourke et al. (2001) and de Wever et al. (2006), and the analysis involved is argued to be inherently subjective and interpretative.
For educational researchers, then, a preferred definition of content analysis might be that it aims "… to reveal information that is not situated at the surface [that is, 'manifest'] of the transcripts" (de Wever et al. 2006, 7). It is at this point that the objective and systematic content analysis defined by Berelson (1952) runs into difficulty, as analysts move from manifest to latent content analysis, particularly when the variables under study are latent projective. For example, the variable 'use of humour' (Rourke et al. 1999) does not reside in the surface meaning of the content; rather, it resides in the interpretation of the coder, which draws on his or her cultural background, age and personality type.
The investigation of latent variables has usually been framed within the frameworks and models described in the previous section. The work of Henri (1992), which has been upheld and adapted in many other research studies, has also been one of the most criticised. It was described as subjective by Rourke et al. (2001), and as restricted and inadequate for judging the quality of interaction by Meyer (2004).
Such criticisms lead some researchers, such as Rourke and his colleagues (e.g. Rourke et al. 2001; Rourke and Anderson 2004) and de Wever et al. (2006), to argue that content analysis still lacks the rigour, in terms of reliability, validity and replicability, needed to foster scientific quality and status in educational technology research (e.g. e-learning, computer-supported collaborative learning, asynchronous learning networks). Rourke et al. (2001) identified five types of unit of analysis: sentence, paragraph, message, thematic and illocutionary. They observed that there seems to be a negative correlation between a reliable coding scheme and one that encompasses the construct under investigation. On the one hand, fixed units are objectively recognisable or manifest; however, they do not properly encompass the construct sought. On the other hand, a dynamic unit (e.g. a unit of meaning) properly delimits the construct, but its latency invites subjective and inconsistent identification of the unit.
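The fixed-unit side of this trade-off can be made concrete. The following Python sketch segments one invented forum posting by message, paragraph and sentence; the helper name `split_units`, the sample posting and the naive splitting rules (blank lines for paragraphs, terminal punctuation for sentences) are assumptions for illustration only. Thematic and illocutionary units are left out because, like other dynamic units, they cannot be delimited without the coder's interpretation.

```python
import re


# Hypothetical helper: segments a posting into *fixed* units of analysis.
# Thematic and illocutionary units are deliberately absent, because they
# depend on the coder's interpretation rather than on a mechanical rule.
def split_units(message: str, unit: str) -> list[str]:
    if unit == "message":
        return [message.strip()]
    if unit == "paragraph":
        # Assumption: paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", message) if p.strip()]
    if unit == "sentence":
        # Assumption: a naive split after '.', '?' or '!' followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.?!])\s+", message) if s.strip()]
    raise ValueError(f"unknown unit: {unit}")


# An invented forum posting, used only for illustration.
posting = (
    "I agree with Ana. The reading was hard going.\n\n"
    "Does anyone know where the second article is? I couldn't find it."
)

counts = {u: len(split_units(posting, u)) for u in ("message", "paragraph", "sentence")}
print(counts)  # {'message': 1, 'paragraph': 2, 'sentence': 4}
```

The same posting yields one, two or four units to code, so coders (or studies) working at different granularities can report very different incident counts before any disagreement about meaning arises.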

Unit of analysis reconsidered
Furthermore, according to Strijbos et al. (2006), four contextual constraints affect the applicability of a unit of analysis smaller than a message:
(1) Object of the study: here the difference between manifest and latent variables is the main consideration. Qualitative content analysis, the more common kind in educational technology research, addresses latent variables such as 'knowledge construction' that cannot be directly observed in the transcript; these have to be inferred retrospectively.
(2) Nature of communication: here the differences between oral and written communication, and between synchronous and asynchronous media, become important. In asynchronous text-based exchanges (e.g. forums), messages usually consist of compound sentences, and telephonic and oral styles are intermixed; in chat rooms, by contrast, messages are commonly short and resemble oral communication.
(3) The collaboration setting: the collaboration setting, or the type of task, also influences the applicability of a unit. A task focused on a given topic of discussion leads to less coordinated exchanges, whereas in a project-based task coordination becomes important, so messages have to link to one another and address the different requirements for completing the project.
(4) The technological tool: the tool, too, influences the applicability of a unit. Take, for example, the difference between a forum and a chat: messages in threaded forums tend to be longer and may focus on multiple issues, and the frequency of exchange tends to be lower owing to the forum's asynchronous nature; chat postings, by contrast, evoke short statements usually concerning a single topic.
To take the above contextual constraints into account, we need a level of analysis that captures the affordances of knowledge as they emerge in the system of interaction between the environment and the design of technology-mediated tasks, rather than in terms of predefined attributes. A unit of analysis must not focus on discrete blocks of content alone; emphasis must instead be given to the referential properties of content.
Aside from the lack of consensus on the unit of analysis, there is also no agreement on which index to use to test and report inter-rater reliability between two coders; de Wever et al. (2006) identified at least nine. Nor is there an established standard for judging the level of reliability:
Often a cut-off figure of 0.75-0.80 is used; others state that a value of 0.70 can be considered as reliable (Neuendorf 2002; Rourke et al. 2001). (de Wever et al. 2006, 10)
This further magnifies the concern that CA as applied to online transcripts has failed to adhere to the objectivity, reliability, replicability and systematic coherence that make quantitative research valid (Rourke et al. 2001). This is not to say that quantitative CA is 'doomed', but that it has to be performed with thoughtful consideration. Readers who wish to pursue the validity of quantitative CA are referred to Rourke and Anderson (2004); this article follows a different lead.
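Why the choice of index matters can be illustrated with two widely used inter-rater reliability indices, simple percent agreement and Cohen's kappa, computed for two coders who have coded the same units. The following Python sketch is illustrative only: the function names and the ten codings are invented, and it does not reproduce the full set of indices listed by de Wever et al. (2006).

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Proportion of units to which both coders assigned the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    p_o = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: for each category, the product of the two coders'
    # marginal proportions, summed over all categories either coder used.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Invented codings of ten message units (labels loosely echo the phases of
# Gunawardena, Lowe and Anderson 1997); not data from any study.
coder_a = ["share", "share", "share", "share", "negotiate",
           "negotiate", "negotiate", "agree", "agree", "share"]
coder_b = ["share", "share", "share", "negotiate", "negotiate",
           "negotiate", "agree", "agree", "agree", "share"]

for name, value in (("percent agreement", percent_agreement(coder_a, coder_b)),
                    ("Cohen's kappa", cohens_kappa(coder_a, coder_b))):
    verdict = "meets" if value >= 0.70 else "falls below"
    print(f"{name} = {value:.2f} ({verdict} a 0.70 cut-off)")
```

With these invented codings the coders agree on 8 of 10 units (0.80), but once chance agreement (0.35) is removed, kappa is about 0.69, so the same coding would pass or fail a 0.70 threshold depending on which index is reported.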

Situated content analysis
Instead of breaking messages into blocks of semantic units, it is worthwhile to think about how they relate to a system of interactions, that is, the patterning that persists even beyond the online environment, which includes the histories, constructs and habits of participants.
There are variations in the context in which learning with technology is situated. According to Baym (1995), there are five contextual elements that are rarely addressed. These are:
(1) The outside environment in which the use of CMC is set.
(2) The temporal structure (e.g. synchronous or asynchronous) of the group.
(3) The infrastructure of the computer system.
(4) The purposes for which CMC is used.
(5) The characteristics of the groups and each member.
These contextual elements structure the interaction between the technological environment and the pedagogical design that teachers and students may become involved in.
Before proceeding with CA, it is important to understand that recorded messages are limited in their ability to provide insight into the processes and practices that act upon student learning, whether collaborative or not. Aside from the contextual constraints identified by Strijbos et al. (2006) above, electronic engagements always require both reading and writing. 'Frozen' electronic messages cannot possibly capture the interpretive moment. We rely on the written and the visible because that is what we can monitor and control.

Genre analysis as an alternative
CMC has significantly changed the manner in which we write and talk. Online, we write to speak, and we have written conversations. We have learned, it seems, to write and talk in a single communicative act, and this should be considered in our transcript analysis. It is argued here that we should ask: how do participants 'write' talk in an online environment? What communicative strategies do they adapt?
For example, in a forum, online texts are commonly organised into discussion threads. Each thread is intended to visually depict a particular topic of conversation. However, as we know from oral conversation, talk flows not because a particular topic is sustained but because the interlocutors are able to change the topic or refer to other things, to take turns, and to repair breaks in the exchange.
In doing CA, the focus has always been confined to the online data retained in the electronic environment and to the extent to which 'chunks' of messages fit the models and taxonomies we have developed, given the educational value of knowledge construction, collaboration and community that we ourselves have inscribed in online talk/text. It is time that we analysed content not merely as 'stuff contained' but as situated and 'continued' in other places and technologies. For this reason, it becomes paramount that we take communication itself into account in the way we analyse online content.
In this section, genres are described as an alternative way of analysing online transcripts. Genres are treated as an analytical frame for considering language alongside the communication medium.
A genre is a patterning of communication created by a combination of the individual, social and technical forces implicit in a recurring communicative situation. A genre structures communication by creating shared expectations about the form and content of the interaction, thus easing the burden of production and interpretation. (Erickson 2000, 2)
Genres have identifiable form and purpose (Orlikowski and Yates 1994, 2002). These provide interlocutors with cues for electronic discourse. First of all, genres are socially constructed and shared (e.g. a discussion about assessment). A genre's form refers to its medium (e.g. discussion board), its structural features (e.g. letter format) and its linguistic features (e.g. level of formality, or graphic devices). The heart of the matter is that a genre has a recognisable form, but that form is what best enables a purpose. So in focusing on form, the question is 'what purpose is being fulfilled?' This purpose may be multiple.
In different situations, participants draw on existing genre norms to accomplish a communicative action. The example provided by Yates and Orlikowski (2002) is that of choosing a letter template rather than an informal note genre when composing an e-mail message addressed to an unfamiliar international correspondent. In short, genres provide a template for interaction between members of a community, and a community's particular genre template is an important resource in facilitating efficient communication. In an online environment, individuals may draw on different genre norms out of habit, basing them on previous experience, to facilitate a specific communicative act.
Secondly, genres are context dependent and interdependent. They shape, but do not determine, the relational cues influenced by the technological environment, task design, previously used genres and the social relationships of those involved. People participate in genre usage rather than control it. One genre exists alongside others and is influenced by them; genres are coordinated and combined to accomplish a specific purpose or communicative act (e.g. Mulholland 1999; Orlikowski and Yates 1994). Even though genres are dynamic entities that adapt to changing circumstances, they develop regularities of form and substance. These regularities become established conventions and begin to influence all aspects of communication.
Prior studies in organisation and media science have shown that: (1) participants employ genres that accomplish the task at hand, and the absence of certain genres provides information about their perception of the context of their interaction (e.g. Orlikowski and Yates 1994); (2) the genres used are initially and implicitly imported from communicative practices used in other contexts (e.g. Orlikowski and Yates 1994); and (3) there are 'key' participants who are able to explicitly shape or change the genres initially used (e.g. Yates, Orlikowski, and Okamura 1999).
In short, some genres serve as foundation blocks, while others are specific to particular practice situations. If certain genres form the foundation of the genre templates for communicative interaction in an online environment, then it is important that these be supported from the very start of a course or activity. If other genres are important only in a certain context of use, then it is relevant for the tutor/teacher to know what those contexts are and who the 'key' participants are.
In an educational institution, the genres would include lectures, course programmes, assignments (essays), exams, etc. In this case, then, students would be likely to fall back on education-related genres and on communicative practices involving electronic media (e.g. e-mail, chat, text messaging, blogs).
Since genres are context dependent and interdependent in their social construction, they will always be identified by reference to conventions established in other related social activities, in both physical and virtual settings. As such, they are identified within transcripts but also reflect the wider cultural contexts in which the transcripts were produced. In short, their textuality is relational: always simultaneously 'in here' and 'out there'. Genre patternings become recognisable through their relatedness to other things in a recurring and meaningful fashion. As already mentioned, the use of genre analysis is not new; however, it has not been widely used as a research technique in CMC studies.
Before concluding this paper, 'snapshot' illustrations are briefly described of the genre patternings identified in blogs (see Herring et al. 2005 for details) and of those found in discussion forums (see Enriquez forthcoming for details).
Genre analysis proceeds with two main foci: purpose and form. Herring et al. (2005) used coding categories established in previous content analytic approaches to Web genres (e.g. homepages). They also used a grounded theory approach to allow the unique characteristics of the blog genre to emerge. They coded 203 blogs for a total of 44 elements, which included structural and temporal features. Unsurprisingly, blogs were found to serve multiple purposes: as filters, personal journals, k-logs and a mix of these. Examples of structural features identified include the number of links and images, the presence of a search feature, advertisements, a calendar, archives, etc. At the blog entry level, body features such as images and links (to the blogger's own websites and/or blogs, or to others' websites and/or blogs) were coded.
Contrary to the claim that blogs are link-centred filters of Web content (e.g. Blood 2002), the sample showed a low incidence of links in entries, reflecting the prevalence of personal journal-type blogs. Blogs have features similar to homepages, asynchronous discussion forums and text-based forms of interactive CMC.
In Enriquez (forthcoming), a total of 174 postings from three discussion forums were coded. The initial categories for coding purpose and form in the postings were adapted from Firth (2002). Categories for purpose included individual comment, response or solicitation, and group comment, response or solicitation. Categories for form included the presence or absence of a greeting and/or closing remark. As in Herring et al. (2005), a grounded theory approach was used to keep 'in view' the genre patternings and communicative strategies emerging in the coded postings. For example, regularities and repetitions of individual and/or group comments, responses and solicitations were signalled or invoked by changing the subject heading. The subject heading was used in at least two ways in the forums: (1) as a headline that gave the reader brief information about what the posting was about; and (2) as a turn-taking strategy to interrupt, elaborate on or change the subject or topic 'at hand'. However, there were instances when the subject heading was completely ignored. Participants threaded their postings under a given heading not because they had something to say about the subject 'at hand', nor because they were responding to the posting under which they were threaded. In fact, they were merely writing 'talk' after the 'last' or latest interlocutor in the forum. In short, this temporal structuring was an attempt to adapt a turn-taking mechanism to written 'talk'.
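Once postings have been coded by hand, categories like those described above (purpose, form features such as greetings and closings, and the uses of the subject heading) can be kept as structured records and cross-tabulated to look for recurring patterns. The following Python sketch assumes a hypothetical record type, `CodedPosting`, with invented field names and toy codings; it is not the coding instrument used in Enriquez (forthcoming), and the interpretive act of coding remains with the analyst.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedPosting:
    """One forum posting after manual coding (hypothetical schema)."""
    purpose: str      # e.g. "individual comment", "group solicitation"
    greeting: bool    # form: is a greeting present?
    closing: bool     # form: is a closing remark present?
    heading_use: str  # "headline", "turn-taking" or "ignored"


def cross_tab(postings: list[CodedPosting], row: str, col: str) -> Counter:
    """Count postings for each (row value, column value) pair of coded fields."""
    return Counter((getattr(p, row), getattr(p, col)) for p in postings)


# Invented toy codings for illustration only; not data from any study.
coded = [
    CodedPosting("individual comment", True, False, "headline"),
    CodedPosting("individual response", False, False, "ignored"),
    CodedPosting("group solicitation", True, True, "turn-taking"),
    CodedPosting("individual response", False, True, "ignored"),
    CodedPosting("group response", True, False, "turn-taking"),
]

table = cross_tab(coded, "purpose", "heading_use")
for (purpose, heading), n in sorted(table.items()):
    print(f"{purpose:<22} {heading:<12} {n}")

greeting_share = sum(p.greeting for p in coded) / len(coded)
print(f"postings with a greeting: {greeting_share:.0%}")
```

A tally like this only surfaces candidate regularities, such as purposes that co-occur with an ignored subject heading; whether they amount to a genre patterning still has to be judged against the wider context in which the postings were produced.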

Conclusion
In reviewing the frameworks and models, and the unit of analysis, on which popular content analysis is based, this article has described its discontent with transcript analysis as predominantly practised and applied to electronic content. It has highlighted the methodological issues of CA and has brought communication itself into focus in investigating the communicative practices of participants in online environments.
This article presented the methodological issues surrounding the application of content analysis. It proposed genres as an alternative analytical frame for analysing online transcripts in CMC and educational technology studies. First, it suggested that our written conversations must be understood both within and outside the medium and the activity: the act of 'writing talking' online is a confluence of many streams of activity richly equipped with tools, materials, experiences and purposes. This would allow us to understand learning as arising from a person's historical and multiple relationships and conditions, rather than by magnifying the visible design of technological environments and analysing learning or knowledge construction through recorded messages. Secondly, we must turn our attention to communication, or language, itself. We have to understand what it means to 'speak' in writing. With this intent, the 'discontent' with content analysis led to the identifiable form and purpose of genres as patternings of communication shared by participants in online environments (see Enriquez forthcoming). This allows us to see the communicative strategies and cues used in electronic discourse. The importance of metastructuring technology (cf. Orlikowski et al. 1995) through the interdependent and recurring patterns of genres becomes paramount in understanding the complex layering of online text. This emphasises that communication captured in online transcripts is not a mere carrier of content, but an organising process crucial to what gets said and done, and by whom. Furthermore, the structures that emerge and are maintained themselves become additional resources, strategies and communicative cues for further organisation in communicative actions or practices (cf. Crystal 2001; Davis and Brewer 1997; Orlikowski and Yates 1994).
In future studies, I suggest that we draw upon genres to further understand the communicative practices that develop with the convergence of speech and writing online, in accomplishing a given learning activity or fulfilling an academic purpose or goal.
article. Thanks to Frances Bell for her encouragement and input. Lastly, I have to express my gratitude to Martin Oliver, who has been a great adviser, a clear thinker who has helped me through the muddles of my own 'complex' intellectual meanderings.