Abstract
With the ubiquity of technology, multimedia learning (videos, slide decks, wiki pages, blogs, eLearning modules, and so on) has been integrated into almost every learning program. Meaningful multimedia instruction should be designed with a learner-centered approach, and learner-centered approaches begin with an understanding of how the human mind works and how humans process information. This post, Part 1 of a series on the cognitive theory of multimedia learning, will help you demystify these two questions.
Theoretical Foundations: How the Human Mind Works
The cognitive theory of multimedia learning (CTML) has been popularized by Richard E. Mayer and others (e.g., Mayer & Moreno, 1998; Mayer, 2001; Mayer, 2003; Mayer & Fiorella, 2014; Mayer, 2020). The theory provides evidence-based insight into transforming complex knowledge into multimedia representations that enable learners to process information effectively despite their limited cognitive capacity. Mayer indicates that “Baddeley’s model of working memory, Paivio’s dual coding theory, and Sweller’s theory of cognitive load are integral theories that support the overall theory of multimedia learning” (2020, pp. 85-99). Mayer further explains that CTML “is based on three basic assumptions about how the human mind works – namely, that the human mind is a dual-channel, limited-capacity, active-processing system” (Mayer, 2020, p. 99):
a) Dual-Channel Assumption: Humans process visual/pictorial and auditory/verbal information through separate channels. For instance, when visual materials such as illustrations, animations, video, or onscreen text are presented to the eyes, humans begin by processing that information in the visual channel. When auditory materials such as narration or non-verbal sounds are presented to the ears, that information is processed in the auditory channel (Mayer, 2020). The concept of separate information-processing channels has a long history in cognitive psychology and is most closely associated with Paivio’s dual-coding theory (Clark & Paivio, 1991; Paivio, 1986, 2006) and Baddeley’s model of working memory (Baddeley, 1999; Baddeley et al., 2015). Mayer believes “the verbal and visual channels in our working memory can be used for processing information simultaneously thus enhancing the process of learning” (Mayer, 2020, p. 90).
b) Limited-Capacity Assumption: This assumption holds that the amount of information each processing channel can handle at one time is limited. For example, when an animation is presented, a learner can hold only a few images in the visual channel of working memory at any moment, so what is held reflects only portions of the presented material rather than an exact copy of it. Similarly, when a narration is presented, the learner can hold only a few words in the verbal channel of working memory at any one time (Mayer, 2020). The idea that conscious processing has limited capacity also has a long history in psychology. Modern examples include Baddeley’s (1999; Baddeley et al., 2015) theory of working memory and Sweller’s (1999; Kalyuga, 2011) cognitive load theory.
c) Active-Processing Assumption: Humans do not learn by passively absorbing information. Instead, they actively engage in cognitive processing to construct a coherent mental representation of their learning experiences. This active processing includes selecting relevant incoming information, organizing it into a coherent cognitive structure, and integrating it with prior knowledge (Mayer, 2020). Mayer regards humans as “active processors who seek to make sense of multimedia presentations” (Mayer, 2020, p. 95).
How Humans Process Information
In accordance with the dual-channel assumption, “the sensory memory and working memory is divided into two channels – the one across the top deals with auditory sounds and eventually with verbal representations, whereas the one across the bottom deals with visual images and eventually with pictorial representations” (Mayer, 2020, p. 100). According to the limited-capacity assumption, “working memory is limited in the amount of knowledge it can process at one time so that only a few images can be held in the visual channel of working memory and only a few sounds can be held in the auditory channel of working memory” (Mayer, 2020, p. 100). As the active-processing assumption indicates, learners actively select the material to be processed in working memory, organize it into coherent structures, and integrate the newly acquired knowledge with knowledge stored in long-term memory (Mayer, 2020, p. 101). That stored knowledge can, in turn, facilitate the processing of new information entering working memory.
Based on the three assumptions above, Figure 1 represents multimedia learning within the human information-processing system.
Figure 1
Mayer’s Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2020)

The central work of multimedia learning takes place in working memory. The left side of Figure 1 represents the raw information (i.e., visual images of pictures and sound images of words) that comes into working memory. The arrow from sounds to images and the arrow from images to sounds represent, respectively, the mental conversion of a sound into a visual image and of a visual image into a sound. The right side of Figure 1 represents the long-term knowledge (i.e., pictorial and verbal mental models and the links between them) constructed in working memory. Unlike working memory, long-term memory can store large amounts of knowledge over long periods of time (Mayer, 2020). The red box on the right labeled Prior Knowledge indicates that a learner’s relevant prior knowledge in long-term memory can facilitate the processing of new information entering working memory. When a learner encounters new material, this store of relevant knowledge helps working memory take in the new information and organize it into the mental representations already established in long-term memory.
Where Humans Store Knowledge
Humans store information in three memory stores: a) sensory memory, where we receive stimuli and hold them for a very short time; b) working memory, where we actively process information and build mental models or schemas; and c) long-term memory, where everything we have learned is stored (Mayer, 2020). Table 1 summarizes the characteristics of these three memory stores as described in CTML.
Table 1
Three Types of Memory Stores based on Mayer (2020)

The challenge for instructional design is to align instruction with how people learn. In the next article, I will describe the multimedia principles that can help you design effective multimedia instruction.
⭐️ This article is an original work that I first published as a LinkedIn article. Please cite it when sharing.