Learning English Speaking through Mobile-Based Role-Plays: The Exploration of a Mobile English Language Learning App called Engage

Bowen Yang, Shijun Zhou and Weijie Ju
EF Labs, Education First




Engage is a new form of mobile application that connects students studying English with teachers in real time via their smartphones. Students receive target language through preparation dialogues and then apply it in a role-play with a teacher. The conceptualization and development of Engage followed the user-centred design approach, and the product was built through multiple iterations: in the first iteration, students were invited to try out a paper mock-up; in the second iteration, students tried out a mobile prototype; in the external test, a fully functional application was released to the App Store between October 25 and November 20, 2012, and 326 users downloaded it. The application was well received by these test users, as reflected in the post-study survey, student ratings, and students’ usage records. The external tests showed that the technical environment of the application was feasible for production, and that the operationalization of the teacher service and the cost model was feasible and scalable.

Keywords: Mobile Assisted Language Learning (MALL), user-centred design, role-play.


1. Background

Mobile Assisted Language Learning (MALL) has been widely recognized as providing “portability”, “social interactivity”, “context sensitivity”, “connectivity” and “individuality” for language learners (Miangah & Nezarat, 2012, p. 311). Over the past decade, a variety of mobile devices have been tried in the field of language education. PDAs and QR codes have been used to design English learning systems and task-based language learning courses (Liu, 2009); mobile phones have been used to teach French listening and speaking skills (Demouy & Kukulska-Hulme, 2010); GPS on smartphones has served as a tool for contextual micro language learning of Mandarin vocabulary (Edge, Searle, Chiu, Zhao & Landay, 2011); and the Nintendo DS Lite has been used to carry out TOEIC self-study (Kondo, Ishikawa, Smith, Sakamoto, Shimomura & Wada, 2012). However, most of the mobile devices covered in this research are now relatively out of date. With the advance of technology, smartphones are the dominant mobile device today. Education applications on smartphones are springing up, revolutionizing the way people learn foreign languages. In the App Store chart for the Education category in the Chinese market on Jan. 21, 2013, 39% of the top 100 free apps and 34% of the top 100 paid apps were for language learning.

The booming mobile learning market and the promising trend in MALL have inspired the team in EF Labs (1) to design and develop a new mobile application which can serve as an add-on service for the language learners who study in Englishtown. However, although a large number of language learning apps are available in the smartphone app markets, published research on mobile language learning applications is limited, and publications on the subcategory of teaching spoken language on smartphones are almost non-existent.

To understand the most current developments in the industry, several top-ranked applications from the App Store were reviewed. In summary, SpeakingPal English Tutor has dialogues that students listen to, speak and review through its built-in speech recognition. AutoSpeaking adopts a 4-step teaching method: repeat the dialogue after a recorded sentence is played; shadow the speaker while the audio is playing; practise the dialogue through a role-play; and wrap up by recapping the words and expressions. All recordings can be revisited later on. Speaking Training and Liulishuo also use speech recognition technology to encourage learners to interact with the device; Liulishuo has further introduced a social element in which students upload their recordings and post their rank scores.

Synchronous interaction between the student and a teacher (artificial or real) is arguably crucial in teaching and learning oral language. However, this feature is absent from these top-rated products on the App Store. Synchronous interaction can take two forms: machine-to-human interaction or human-to-human interaction. Real-time machine-to-human voice communication involves expertise in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology (Schafer, 1994), which is outside the scope of this project. For the human-to-human mode, the barriers include scheduling, sound quality, operation, and cost (Kukulska-Hulme & Shield, 2008).

Due to the gap in the literature, the lack of existing solutions and the operation and cost obstacles, an exploratory and iterative approach was adopted here to design and test the functionalities of this new form of mobile application.

2. Methodology

2.1. User-Centred Design

The User-Centred Design (UCD) approach was adopted here in the product development process in order to develop useful and usable products (Kujala, 2003). UCD is particularly suitable for interactive product design and is also called the human-centred design process. The ISO standard for Human-Centred Design Processes for Interactive Systems states: "Human-centred design is an approach to interactive system development that focuses specifically on making systems usable." (ISO 13407, 1999) In our case, the mobile application required intensive interaction between the product and target users, so a UCD approach was suitable.

Iterative design is one of the principles of UCD, which requires the product to be designed, modified and tested repeatedly. It allows for the complete overhaul and rethinking of a design through early testing of conceptual models and design ideas (Rubin, 1994). It is generally agreed that usability is achieved through the involvement of potential users in system design (Karat, 1997). Following this approach, two major phases of iterations were conducted in our research, and in each phase users were involved to test and validate the product. The first phase was the internal test phase, where the concept was developed, refined and validated through fast prototyping; the second phase was the external test phase, where the finalized product was pre-released to the App Store for 10 days.

2.2. Internal tests

The internal tests involved two iterations. In the first iteration, which took place in May 2012, five students from Englishtown were invited to test paper prototypes. Paper prototyping involves creating rough hand-sketched drawings of the interface to use as prototypes, or models, of a design. It saves time and resources by allowing quick modifications before any real code is written (Snyder, 2003). Test students were asked to work on a role-play task, “Renting an apartment in London”, in which a student role-played a tenant talking to a teacher who role-played the landlord. Test students were given, on a piece of paper, the task description and preparation materials with a sample dialogue containing the target language. After preparation, students called the teacher via a landline to carry out the role-play task. Prompts in the form of text and cue images were shown to students during their conversation over the phone.

The second iteration, which occurred on July 24th, 2012, involved another five students from EF English Centres (2), who voluntarily joined the test after receiving an invitation from the service staff in the school. In this iteration, a usable prototype on iPhone was developed, as well as a functioning teacher client (a set of browser-based webpages to serve teacher operations) with no back-end scheduling system. In the tests, preparation dialogues with translations and language points were presented as Step One in the application prototype, and role-plays as Step Two, conducted by connecting the iPhone with a landline phone. During the test, students were given an oral description of the product by the staff, and then they were given the prototype to try out. After the student finished the preparation, a teacher called in to conduct the role-play. Cue pictures were presented on the iPhone under remote control from the teacher client.

A face-to-face interview was conducted with each test student after the tests. Feedback on students’ experience of using the application and the role-play mechanism, together with their ratings of the product, was collected and fed directly into the next iteration.

2.3. External tests

After the internal tests, the learning concept took shape and a fully functional application with backend systems was developed. The application was test-released to the App Store between October 25 and November 20, 2012. Email invitations were sent out to students from Englishtown, and 326 downloaded the application onto their iOS devices. Students were able to click the App Store download link in the email and download the application. After logging in, students could book classes for the role-play sessions through the built-in booking page. Four native English teachers from the EF online school were recruited to teach the classes. Classes were scheduled back to back in 15-minute slots for 3 hours per day during peak hours. In total, 150 classes were delivered, and 46 students attended classes. For data collection, a survey tool was built into the application. Students were asked to rate the preparation dialogue and role-play task on a five-point scale. In addition, questions relating to the usability of the product, perception of the content, the operational side, and rating of the product were covered in the post-study survey. Meanwhile, the students’ attendance records were kept on the cloud server for later analysis.
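The slot arrangement described above (back-to-back 15-minute classes over a 3-hour daily window) can be illustrated with a short sketch. It is not the AXIS scheduling logic, which the paper does not describe; the function name `build_slots` and the 19:00 start time are assumptions made only for illustration.

```python
from datetime import datetime, timedelta


def build_slots(window_start, hours=3, slot_minutes=15):
    """Return back-to-back (start, end) slots filling the daily teaching window."""
    slot = timedelta(minutes=slot_minutes)
    count = (hours * 60) // slot_minutes
    return [(window_start + i * slot, window_start + (i + 1) * slot)
            for i in range(count)]


# Hypothetical window start; the paper only says "peak hours".
slots = build_slots(datetime(2012, 10, 25, 19, 0))
print(len(slots))  # 12 slots per teacher per day
```

Under these assumptions, each teacher offers 12 bookable slots per day, and each slot ends exactly when the next one begins.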

3. Product framework

3.1. System architecture

A client-server model (3) was used in the final design of this application. First, the client side consisted of a student client application and a teacher administration portal. The major functions of the student client application were content fetching and display, pre-recorded dialogues with text scripts, push notifications, real-time messaging from teachers and embedded Voice-over-IP (VoIP). For the teacher administration portal, major functions included a schedule checker, teaching materials downloading, slide control and feedback submission. Second, the server side consisted of a class scheduling system, a Content Management System (CMS) and Application Programming Interface (API) for the client side, and a VoIP infrastructure.

A high-quality VoIP system, real-time messaging across different devices and a robust scheduling system were three crucial components in implementing the system design. In order to quickly build a robust system, existing cloud services were utilized where possible. The CMS and API server was deployed on Amazon Web Services. Twilio and Skype with Skype Number served as VoIP service providers. The AXIS system from Englishtown was used for the backend scheduling. Moreover, UrbanAirship (4) was used to send Apple Push Notifications (5). Real-time messaging was based on Pubnub (6), where both teacher and student clients used JavaScript-based APIs.
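The publish-subscribe pattern behind the teacher-controlled slides can be sketched as follows. This is an in-memory stand-in written for illustration only: it does not use or reproduce the Pubnub API, and the names `MessageBus`, `StudentClient`, the channel name and the message fields are all hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """In-memory stand-in for a cloud publish/subscribe service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every callback registered on this channel.
        for callback in self._subscribers[channel]:
            callback(message)


class StudentClient:
    """Listens on its class channel and shows the slide the teacher selects."""

    def __init__(self, bus, class_id):
        self.current_slide = None
        bus.subscribe(f"class-{class_id}", self._on_message)

    def _on_message(self, message):
        if message.get("type") == "show_slide":
            self.current_slide = message["slide"]


bus = MessageBus()
student = StudentClient(bus, class_id=42)
# The teacher portal publishes a slide change for the same class.
bus.publish("class-42", {"type": "show_slide", "slide": 2})
print(student.current_slide)  # 2
```

The point of the pattern is that the teacher portal and the student client never address each other directly; both only need to agree on a channel name for the class.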


Figure 1. System diagram for Engage.


3.2. App design for Student Client

The student client was designed as an iPhone app. Apple’s iOS platform was ideal for the test because of its high penetration rate among the pool of test students and its high-quality user experience. Moreover, prototyping could be implemented relatively fast with mature development tools such as Xcode (7).

Students’ needs were the foremost consideration in deriving the functionality of this application. The application design included a straightforward content viewing and downloading interface, an easy-to-use class booking and reminder mechanism, a clear role-play interface and an after-class review system. In addition, since native mobile applications were installed on end users’ devices, it was crucial to enable users to synchronize their progress across multiple devices and re-download the content package whenever necessary. All important user data, such as progress, teacher’s feedback and download history, were stored on the server and could be retrieved at any time in the application. Students also received Push Notifications whenever the administrators needed to send update or error notices.

The interaction flow of the app was as follows:

  1. Students entered the main view, where they could view and download the daily topic and book a role-play class. (see Figure 2)
  2. After downloading, students listened to the preparation dialogue. Sentences being played were highlighted synchronously. Students could also play back each individual sentence. (see Figure 3)
  3. Students could view the sentence translation and target language explanation by tapping the sentence. (see Figure 4)
  4. Going back to the main view, by tapping “Book a Role Play,” students were led to the list of available time slots. Tapping one of the slots confirmed and booked the class. (see Figure 5)
  5. Students were informed of the time of the class on the main view. Three local notifications were sent to the students at 1 hour, 5 minutes and 0 minutes prior to the class. (see Figure 6)
  6. Students attended the role-play class by tapping “Role Play” on the main view, and were led to the role-play view (real-time teacher-controlled slide). (see Figure 7)
  7. After the class, students could check the teacher feedback in the feedback view. (see Figure 8)
  8. The feedback view contained: 1) Evaluation of learning outcomes, which consisted of ranking (A, B or C) of each checkpoint. 2) Teacher’s comment. (see Figure 9)
  9. Students could review or delete the content, check their login status or view the helpdesk in the settings view. (see Figure 10)
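The reminder schedule in step 5 (notifications at 1 hour, 5 minutes and 0 minutes before the class) can be sketched as below. The function name `reminder_times` is hypothetical, and skipping reminders whose time has already passed at booking is an assumption; the paper does not say how late bookings were handled.

```python
from datetime import datetime, timedelta

# Offsets before class start, taken from the interaction flow above.
REMINDER_OFFSETS = (timedelta(hours=1), timedelta(minutes=5), timedelta(0))


def reminder_times(class_start, booked_at):
    """Return the notification times that still lie after the booking time."""
    times = [class_start - offset for offset in REMINDER_OFFSETS]
    # Assumption: reminders that would fire before the booking are dropped.
    return [t for t in times if t > booked_at]


class_start = datetime(2012, 11, 1, 19, 0)
early = reminder_times(class_start, booked_at=datetime(2012, 11, 1, 12, 0))
late = reminder_times(class_start, booked_at=datetime(2012, 11, 1, 18, 30))
print(len(early), len(late))  # 3 2
```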

Figure 2. The main view before the content is downloaded.

Figure 3. Sentences for preparation.


Figure 4. Translation and keywords upon tapping the sentence.

Figure 5. The class booking view.


Figure 6. The class information on the main view.

Figure 7. The role-play view.


Figure 8. Students can choose to view the teacher feedback.

Figure 9. The feedback view.


Figure 10. The settings view.


3.3. Content development

The exploration of content proceeded alongside the testing and building of the Engage product. Content development capitalized on what mobile devices can offer, that is, flexibility and lightweight learning, and it evolved into a closely related two-stage learning process: in the offline preparation stage, students received input on target language through preparation dialogues, which they were then asked to apply in the live role-play during the online stage. A total of 16 scenarios were developed, covering travel, business, and everyday topics (see Table 1):

| Travel topics | Business topics | Everyday topics |
| --- | --- | --- |
| Shopping at 5th Avenue | Attending a job interview | Moving to London |
| Dining at Madison Square | Rescheduling a meeting | Family routines |
| Flying to New York | Interview at a career fair | Planning a party |
| | | Choosing a hobby |
| | | Describing people |
| | | Best day of your life |
| | | What were you like |

Table 1. Engage topics.


3.3.1. Preparation contents

The preparation content mainly consisted of dialogues, target language, and cultural tips. In developing the content, several important considerations were taken into account:

  1. All dialogues were made as authentic as possible; for example, the places (hotel, restaurant, airport, location, etc.), brands, TV programmes, etc. mentioned in the dialogues were real-world ones.
  2. Key target language (vocabulary, grammar, function, etc.) was extracted from each dialogue, accompanied by both Chinese and English explanations and sample sentences.
  3. Culture tips relating to the situation were included in each dialogue. Culture tips were short introductions of conventions, knowledge of places or names, etc.
  4. Considering that the dialogue prepared students for the role-play, the key target language was placed in the lines of the interlocutor whose role the student was going to play.

3.3.2. Role-playing content

The validity of using role-plays as a pedagogical strategy has been backed by numerous studies. Role-play is defined as “a simulation activity in which students are expected to take on a personal attitude, opinion, or role of someone else in a set context” (Senf, 2012, p. 3). Burke and Guest (2010, p. 34) describe role-plays as an excellent means of engaging students, one that emphasizes “interactive, inquiry-based scholarship rather than passive learning.” In order for the activities to be successful, several important elements need to be emphasized, including modelling, giving students language support, setting realistic goals, using realistic scenarios, and using realia and visual aids (Parrish, 2004).

The role-playing contents in Engage were designed to be closely aligned with the preparation content. For each topic, the role-playing content consisted of the “Task description,” “Checkpoints,” and “Cue images.”

First, the “Task description” provided essential background information on the topic and set the “task” which students were asked to complete by the end of the role-play. For example, in one topic, “Moving to London,” students were asked to request information from a local agency on apartment rental and make decisions within a budget limit. Second, “Checkpoints,” as the name suggests, were the prescribed “pathways” the role-players were required to follow. The use of checkpoints served two purposes: firstly, as checkpoints corresponded to the conversation flow in the preparation dialogues, they gave the students a layer of support as to what exactly they were expected to talk about; secondly, the checkpoints helped to set standard ‘tracks’ the role-play should follow, thus preventing the conversation from going off topic. The checkpoints for “Moving to London,” for example, were:

  1. Tell the agent your preference (location, room size, environment, etc.);
  2. Tell the agent what facilities and home appliances you require;
  3. Make a reservation with the agent to view the apartment.

Each checkpoint corresponded to one or two “cue images” where the student was shown essential information they were expected to talk about. Most of the information also corresponded to the key target language. For example, in talking about home appliances, images such as a microwave and a refrigerator were shown to the student. Icons and symbols were used to represent abstract concepts such as “location” and “budget.”

Cue images added another layer of support for the students. They helped to establish a link between English words and visual stimuli. The internal tests revealed that students were often focused on phrasing during the live role-play; and therefore they were often unable to pick up vocabulary from text notes provided for them while they were speaking. During the first couple of internal tests where text notes were provided, the students commented that their minds were “too focused on phrasing English sentences,” and therefore they were not able to check the text notes during the session.

In the internal tests where cue images were used to replace the text notes, it was found that cue images were useful not only in guiding students in conversation, but also in increasing students’ retention of key vocabulary learned through the preparation dialogues. The technique of using visual connections for words has been backed by the literature (Sousa, 2006; Buzan, 1989; Hyerle, 2004). Cue images were designed to prompt students’ memory of specific vocabulary. For instance, during the internal tests of the topic “Renting an Apartment in London,” when talking about requirements for home appliances, almost all students could recall the target vocabulary of microwave, refrigerator and cable TV. Students commented that they could recall these words because the images reminded them of what had been taught in the preparation dialogue. Post-study interviews also revealed that cue images which gave “cues” on target language expressions were more useful than “background” images such as a photo of a restaurant or a shopping mall. However, not every piece of target language could be “translated” into cue images; abstract words or expressions such as “agent” and “look for” are examples.

All content was written by native speakers and edited by professional editors with experience in language teaching. Dialogues, key target language, and the sample sentences in the explanations of key target language were all pre-recorded by voice actors. Environmental sound effects such as background noise and telephone ring tones were also added in post-production in order to make the scenarios as real as possible.
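The structure of the role-playing content described in this section (a task description, ordered checkpoints, and one or two cue images per checkpoint) can be sketched as a small data model. The class names, the `slides` helper and the image file names are hypothetical; only the topic, the task and the three checkpoint texts come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Checkpoint:
    prompt: str
    cue_images: List[str] = field(default_factory=list)  # one or two per checkpoint


@dataclass
class RolePlayTask:
    topic: str
    task_description: str
    checkpoints: List[Checkpoint]

    def slides(self) -> List[Tuple[int, str]]:
        """Flatten cue images into (checkpoint number, image) pairs in teaching order."""
        return [(number, image)
                for number, checkpoint in enumerate(self.checkpoints, start=1)
                for image in checkpoint.cue_images]


task = RolePlayTask(
    topic="Moving to London",
    task_description="Request rental information from a local agency "
                     "and decide within a budget limit.",
    checkpoints=[
        # Image file names below are illustrative placeholders.
        Checkpoint("Tell the agent your preference",
                   ["location_icon.png", "budget_icon.png"]),
        Checkpoint("Tell the agent what facilities and home appliances you require",
                   ["microwave.png", "refrigerator.png"]),
        Checkpoint("Make a reservation with the agent to view the apartment",
                   ["calendar_icon.png"]),
    ],
)
print(len(task.slides()))  # 5
```

Keeping the cue images attached to their checkpoint makes the alignment between conversation flow and visual support explicit, which is the property the content design relies on.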

4. Valid learning experience and production feasibility

The designed learning experience proved valid for English learners. During the external test, 46 students successfully completed the entire learning experience: installing the application from the App Store, downloading the content, preparing with the dialogue, conducting the role-play and receiving the teacher’s feedback. The learning concept was well received: 37% (17 out of 46) of the students attended a role-play task more than once; the 21 collected survey responses averaged 9.6 on a 10-point scale (1 to 10) for overall satisfaction; and 57% of surveyed students said they wanted to practise the same task again. Students also showed interest in purchasing this learning experience: only 2 out of 21 respondents stated that they would not buy the full version if the application were released.

Content was also well received. Eleven rated topics received an average rating of 4.52 (out of 5). Role-play tasks received an average rating of 4.63 (see Table 2).



| Role-Play Task |
| --- |
| Attending a job interview |
| Choosing a hobby |
| Dining at Madison Square |
| Family routines |
| Flying to NY |
| Interview at a career fair |
| Moving to London |
| Planning a party |
| Rescheduling a meeting |
| Shopping at 5th Ave |

Table 2. Ratings of each topic.

In terms of technological feasibility, the application and system architecture design were successfully tested. All students were able to access teacher feedback, receive reminder alerts and administrator notifications, and use all of the designed functionality normally. First, for class booking, AXIS streamlined the scheduling process for both teachers and students; students reported no obstacles in booking a class or receiving alert messages. Second, no case of voice delay was reported during the interviews. In the external tests, the delay was less than one second in both the 3G and Wi-Fi environments (China Unicom 3G and China CNC network). However, sound quality issues were reported by a small proportion of the students. Possible causes were problems with network bandwidth and settings, the quality of the internet service providers and other technical interruptions. Third, in interviews and surveys, students reported no problems with viewing visual cues as they were switched.

The usability of the student client application was also successfully tested. Embedding the VoIP proved to be a great improvement in the user experience. In the internal tests, the role-play call was carried by telephone outside the application, and after the call students had trouble switching back to the application manually (8). This problem was resolved in the external tests by embedding VoIP functionality in the application.

A straightforward cost model was derived from the external tests, in which the cost of the service was proportional to students’ usage. Specifically, on the technical side, the Twilio service was charged by minutes used, and each Skype Number was subscribed to individually. AWS was charged by computing power and time consumed; Pubnub and UrbanAirship were charged by messages sent. Since all technical solutions were deployed on mature cloud services and billed according to usage, the service was easy to scale. In terms of the provision of teaching resources, administrators were easily able to allocate teaching resources (measured in teaching hours) in the AXIS scheduling system according to the demand for classes.
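The usage-proportional cost model can be sketched as a simple sum of metered components plus the per-number subscription. Every rate below is a hypothetical placeholder, not a figure from the paper or from any provider's price list; the names `Rates` and `service_cost` are likewise illustrative. The only values taken from the text are the 150 classes of 15 minutes each and the four teachers, and the assumption that each teacher used one Skype Number is ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; real provider pricing is not given in the paper."""
    voip_per_minute: float
    per_message: float
    compute_per_hour: float
    skype_number_per_month: float


def service_cost(rates, voip_minutes, messages, compute_hours, skype_numbers):
    """Usage-proportional costs plus the individually subscribed numbers."""
    usage = (voip_minutes * rates.voip_per_minute
             + messages * rates.per_message
             + compute_hours * rates.compute_per_hour)
    subscriptions = skype_numbers * rates.skype_number_per_month
    return usage + subscriptions


rates = Rates(voip_per_minute=0.02, per_message=0.0001,
              compute_per_hour=0.10, skype_number_per_month=5.0)
# 150 classes x 15 minutes, as in the external test; other volumes are made up.
cost = service_cost(rates, voip_minutes=150 * 15, messages=10_000,
                    compute_hours=100, skype_numbers=4)
print(round(cost, 2))
```

Because every metered term is linear in usage, doubling the number of classes roughly doubles the VoIP and messaging components while leaving the subscription term unchanged, which is what makes the model straightforward to scale.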

5. Conclusion

Engage has proven to be a successful first step toward connecting a real teacher and a student in real-time via mobile devices. The two-step learning experience consisting of the offline preparation and online synchronous role-play was proven to be a valid learning experience, shown by students’ usage, ratings, survey and interview findings. Through the external test, the feasibility of operationalizing this product has also been proven.

The development of this application also served as good testimony to the usefulness of the iterative approach, where users are invited to participate in cyclical tests and developers quickly respond to their feedback and make changes or tweak the system accordingly.

However, the concept of real-time mobile-based role-play for language learning still needs to be further explored. In our current version, students were only allowed to role-play one of the interlocutors; in future versions, students should have the flexibility to choose their roles. If the demand for classes goes up dramatically, a peer role-play model may be a possible low-cost alternative to the current model, in which a real teacher is used. However, in a peer role-play model, issues such as student matching and scheduling need to be addressed.


References

Burke, T. & Guest, A. (2010). Using role playing as a teaching strategy: an interdisciplinary approach to learning. Proceedings of the 2nd Annual Conference on Higher Education Pedagogy, 34-35.

Buzan, T. (1989). Use both sides of your brain. New York: Penguin.

Demouy, V. & Kukulska-Hulme, A. (2010). On the spot: using mobile devices for listening and speaking practice on a French language programme. Open Learning: The Journal of Open, Distance and e-Learning, 25(3), 217-232.

Edge, D., Searle, E., Chiu, K., Zhao, J. & Landay, J.A. (2011, May). Micromandarin: mobile language learning in context. 2011 Annual Conference on Human Factors in Computing Systems. Symposium conducted in Vancouver, BC, Canada.

Hyerle, D. (2004). Student successes with thinking maps: school-based research, results, and models for achievement using visual tools. CA: Corwin Press.

ISO 13407 (1999). Human-centred design processes for interactive systems. London: British Standards Institution.

Karat, C. (1997). Cost-justifying usability engineering in the software life cycle. In M. Helander, T. K. Landauer and P. Prabhu (Eds.), Handbook of Human-Computer Interaction (pp. 653-688). Amsterdam: Elsevier.

Kondo, M., Ishikawa, Y., Smith, C., Sakamoto, K., Shimomura, H., and Wada, N. (2012). Mobile assisted language learning in university EFL courses in Japan: developing attitudes and skills for self-regulated learning. ReCALL, 24, 169-187.

Kukulska-Hulme, A. and Shield, L. (2008). An overview of mobile assisted language learning: from content delivery to supported collaboration and interaction. ReCALL, 20(3), 271-289.

Kujala, S. (2003). User involvement: a review of the benefits and challenges. Behaviour & Information Technology, 22(1), 1-16.

Liu, T.-Y. (2009). A context-aware ubiquitous learning environment for language listening and speaking. Journal of Computer Assisted Learning, 25(6), 515-527.

Miangah, T. M., and Nezarat, A. (2012). Mobile-assisted language learning. Journal of Distributed and Parallel Systems, 3(1), 309-319.

Parrish, B. (2004). Teaching adult ESL: a practical introduction. New York: McGraw-Hill Companies.

Rubin, J. (1994). Handbook of usability testing: how to plan, design, and conduct effective tests. New York: Wiley.

Schafer, R. W. (1994). Scientific Bases of Human-Machine Communication by Voice. In D. B. Roe (Eds.), Voice communication between humans and machines (pp. 34-75). Washington, D.C.: National Academy Press.

Senf, M. (2012, Dec). Role-play, simulations and drama activities. DocumBase. Retrieved from http://en.convdocs.org/docs/index-44311.html

Snyder, C. (2003). Paper prototyping: the fast and easy way to design and refine user interfaces. San Diego, CA: Morgan Kaufmann Pub.

Sousa, D. A. (2006). How the brain learns. CA: Corwin Press.

Traxler, J. (2007). Current state of mobile learning. International Review on Research in Open and Distance Learning, 8(2), 9-24.


[1] EF Labs, the R&D centre associated with Education First.

[2] EF English centres are the affiliated teaching centres of Education First.

[3] The client-server model is a distributed application structure in computing that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients. http://en.wikipedia.org/wiki/Client–server_model.

[4] UrbanAirship is a cloud based push notification provider.

[5] Apple Push Notification Service allows third party applications to send push messages to end users through APIs from Apple Inc.

[6] Pubnub is a real-time message publish and subscription cloud service.

[7] Xcode is a set of tools from Apple to develop and tune iOS and MacOS applications.

[8] In iOS, the switch of application can be triggered by one application calling the other app programmatically or double tapping the Home button manually.