ASR Assessment Used in Modern Chinese Language Classroom

 


Author Note

Abstract

This action research study aims to help meet the developmental needs of Chinese language learners. Research on student engagement, active instruction, and the use of technology in the classroom has grown in recent years; however, research on the use of Automatic Speech Recognition (ASR) for Chinese language learning remains scarce. The investigation in this paper reveals both the advantages and the limitations of using Artificial Intelligence (AI)-based ASR in spoken language assessment. The findings indicate that the use of ASR benefits students’ Chinese language learning outcomes. ASR also gives teachers an additional assessment method for their practice of multiple modalities, so that they can further improve student engagement and better address different learning styles. We used text transcribed from students’ speech as data to support the claim that ASR increases the frequency of teacher-student interaction while also enabling teachers to identify student errors and provide constructive feedback more promptly. This suggests that using ASR as an official summative assessment method in the Chinese language classroom is practical and effective.

Keywords: Automatic Speech Recognition (ASR), Chinese language education, multiple modalities, assessment.

 

 

Introduction

The purpose of this paper is to explore the possibility of assessing students’ oral Chinese ability by analyzing the output text produced by Automatic Speech Recognition (ASR) technology. In our research, ASR technology is used as an assessment instrument for Chinese language learners, and it shows substantial potential for evaluating Chinese pronunciation, including accuracy with the four tones in speech. It even gives learners the opportunity to self-correct. The latest generation of ASR claims to be able to process more authentic dialogue for man-machine interaction than ever before. Traditionally, spoken test performances were analyzed using a range of measures, including grammatical accuracy, complexity, vocabulary, pronunciation, and fluency (Iwashita, 2008). Currently, ASR applications can be used to test speech accuracy and are accessible to owners of an iPhone or any other smartphone, where they can serve as digital assistants that parse Chinese speech. For example, Apple’s Siri (2011), Cortana on Windows Phone (2014), Google Home (2016), Amazon’s Alexa (Echo Dot), and Samsung’s Bixby (Dec. 10, 2017) are all available in the North American market. These Chinese voice applications are widely accessible to American students, which made this research possible. If an ASR device, such as Siri, can understand the questions students produce in Chinese and reply with authentic answers, then the students can improve their interpersonal and presentational communication skills. In addition, Android users have already used Google’s voice-recognition technology to send text messages (Vanian, 2017); students simply need to turn on the language choice function to do so. If Chinese learners can send a teacher their speech transcribed into characters, then their pronunciation score can be derived from the correctness rate of the text message, which is tied to the accuracy of their spoken input.
The series of analyses given in this paper demonstrates the feasibility of using intelligent speech recognition technology to improve the effectiveness of Chinese language learners’ study.
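The idea of deriving a pronunciation score from the correctness rate of transcribed text can be illustrated with a minimal sketch. In the study itself the incorrect characters were counted by hand; the code below is only one hypothetical way to automate that count, using a character-level edit distance between the target sentence and the ASR transcript. The function names (`edit_distance`, `correctness_rate`), the punctuation list, and the sample transcript are assumptions made for illustration and are not part of the original procedure.

```python
# Hypothetical sketch: score an ASR transcript against the sentence the
# student was supposed to say, counting errors character by character.

# Punctuation and spaces are ignored, since ASR output often omits them.
PUNCTUATION = set(",。?!、…,.?! ")


def strip_punctuation(text: str) -> str:
    """Remove punctuation and spaces so only characters are compared."""
    return "".join(ch for ch in text if ch not in PUNCTUATION)


def edit_distance(reference: str, transcript: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions turning `transcript` into `reference`."""
    previous = list(range(len(transcript) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(transcript, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def correctness_rate(target: str, transcript: str) -> float:
    """Share of target characters rendered correctly (1.0 means no errors)."""
    target = strip_punctuation(target)
    transcript = strip_punctuation(transcript)
    if not target:
        raise ValueError("target sentence must contain at least one character")
    errors = min(edit_distance(target, transcript), len(target))
    return 1 - errors / len(target)


# Target sentence from Appendix B; the transcript is an invented example in
# which one character (馆) was recognized as a different character (管).
print(round(correctness_rate("图书馆在哪儿?", "图书管在哪儿"), 3))
```

One wrong character out of six gives a correctness rate of 5/6 in this example. A design choice worth noting is that the error count is capped at the length of the target sentence, so the rate never drops below zero when the transcript is much longer than the target.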

Literature Review

Numerous studies have been conducted on the assessment of foreign language ability, but using AI-based ASR to assess learners’ language development has only become possible in recent years, along with the innovation of modern technology and its broad accessibility among the general population. As such, relevant research has been exceptionally limited. There has been discussion about whether flipped classrooms for learning classical Chinese could be supported by a mobile device-assisted learning system (Wang, 2016), and whether integrating teaching strategies into interactive response system (IRS) activities would be effective in facilitating teaching and learning (Wang, 2017). However, these studies reported no quantitative accuracy scores for such assessment. Recently, the Chinese company iFLYTEK announced the release of six education products using intelligent technologies, including ASR, and the collection of 35 billion samples of learning data (Peng, 2017). This indicates that recent technology has enabled us to use ASR as an assessment instrument for the Chinese language classroom. Moreover, research by the University of Oregon Chinese Flagship found that there was no ASR-based methodology for measuring spoken Chinese (Clark, 2009); it could only offer the traditional four modalities to assess oral output. That research demonstrated both the necessity and the possibility of developing an easily adaptable method for assessing oral Chinese. Two decades ago, the use of speech recognition software as an English language oral assessment instrument was addressed by English language scholars (Coniam, 1998). It could be used in a similar manner for Chinese language assessment. Additionally, quantitative assessment of learners’ fluency in a second language by means of automatic speech recognition technology (Cucchiarini, 2000) needs to be employed.
Furthermore, the nature of speaking proficiency in English as a second language has been investigated in the context of a larger project to develop a rating scale for a new international test of English for academic purposes (Iwashita, 2008). Our research used the same methodology to measure test results in a smaller-scale assessment, which may limit the rigor of the results. Nevertheless, it demonstrates the use of ASR technology as an assessment instrument for Chinese language education, which has substantially benefited Chinese teachers in evaluating students’ pronunciation, including accuracy with the four tones. Oremus (2014) described the historical development of ASR and its current advancement, and the same method was employed in our study: the students’ various mobile devices can all become engaging tools for assessing their proficiency in spoken Chinese. Snyder et al. (2016) evaluated the effectiveness of using "flipped" instruction in a secondary social studies classroom. The discussion in that article provided insight for curricular decision-making when implementing ASR technology in the Chinese language classroom.

Research Questions and Hypotheses

After examination of the literature related to ASR, several research questions remain to be discussed. For example, how accurate is ASR in measuring speech proficiency, and can it be used as an official summative assessment method? To what degree can students be engaged in this type of assessment, and is it appropriate to use this technology in a high school classroom considering that teenagers are easily distracted by electronic devices?

On the basis of the above research questions, two hypotheses were formulated in our research. The first states that implementing ASR technology in the Chinese language classroom is feasible and effective. The second states that the ASR method plays an important role in learners’ motivation and sustained interest in Chinese study, regardless of concerns about distraction.

Methodology

Participants

This research involved thirteen high school students, six boys and seven girls, aged 14 to 18. Eleven of them were first-year Chinese learners at the Novice Low level of Mandarin Chinese according to the American Council on the Teaching of Foreign Languages (ACTFL) speaking proficiency guidelines; the other two students’ Chinese levels were medium high and advanced mid. Four of the students were from an urban school and nine were from a suburban school.

The students came from three classes and were naturally organized into three groups, using WeChat, the Apple text message app, and the Siri app, respectively. Students in Group I had studied Chinese for six months and learned nearly 100 sentences; however, they were able to produce only about 30% of these sentences to make their own dialogues without referring back to the text. Students in Group II had also studied Chinese for six months, but in a distance learning setting; although they had also learned 100 sentences, it remained a major challenge for them to reproduce these sentences in real-life communication or to make coherent dialogues. Students in Group III had over six years of part-time study; they were heritage students who could use AI voice assistants such as Apple’s Siri or Amazon’s Alexa for more complex speech.

None of the students had prior experience with interactive ASR in Chinese. Before the three practice trials, they received two weeks of instruction on how to use different types of ASR technology. The seven students in Group I were instructed to use a Chinese social media app, WeChat. Two of them used Android phones, four of them used iPhones, and one had no phone, but shared a phone with others. The four students in Group II were instructed to use the Apple text message application. The two students in Group III were instructed to use the Siri application. The demographic characteristics of the students who participated in the study and the number of students assigned to each group are shown in Table 1.

Table 1. ASR Participant Demographics

| | Number of Students | % of Students in Overall Study |
|---|---|---|
| Experimental method in each group | | |
| WeChat app (Group I) | 7 | 53.8% |
| Text Message app (Group II) | 4 | 30.8% |
| Siri app (Group III) | 2 | 15.4% |
| Gender across all three groups | | |
| Female | 7 | 53.8% |
| Male | 6 | 46.2% |

 

Procedures of the experiment

Students in all three groups used the latest ASR technology either to convert vocal input to character output or to acquire information through man-machine conversation.

The Group I students were instructed to interact in a group chat setting, with teacher involvement. They spoke short sentences to their devices which automatically converted their speech to text. They shared their text among the group and compared their accuracy in performing the lesson dialogues.

The Group II students were instructed to interact with the teacher individually. They also used their own devices to automatically convert their speech to text, and then they took pictures of the text via phone screenshot and sent them to the teacher through email as a record of assessment.

The Group III students were instructed to hold everyday conversations with Siri to inquire about classical Chinese poetry, the weather in Beijing, and the closest library in town. In this third model, students used spoken Chinese to request information from Siri.

For all three groups, the teacher provided immediate feedback to students so that they would notice their incorrect pronunciations; sometimes they even corrected themselves. After three trials of performances by each group spanning multiple days, the teacher instructed students to count the number of incorrect words to gauge their improvement. After making corrections, students sent their work to the teacher for an official summative assessment.

Measurements

The measurements covered oral proficiency, listening comprehension, and conversational skills. A set of comparative data was gathered in each of the three trials that were performed in the experiment. The three trials included the first, the second, and the third time using ASR throughout a 45-day period.

In order to identify whether students would support the use of ASR technology in the Chinese classroom and what instruction they might need, this research included a survey containing six statements related to ASR. Students indicated whether they agreed or disagreed using a five-level Likert scale, detailed as follows:

  1. Strongly disagree

  2. Disagree

  3. Neither agree nor disagree

  4. Agree

  5. Strongly agree

The survey was anonymous, and students provided voluntary responses through Google Forms.

Results and Discussion

Effectiveness of ASR and Quantitative Assessment

The three trials in our research project presented some interesting preliminary results. Firstly, our research suggests that ASR is an effective way to assess students’ speaking and reading proficiency in both interpretive and presentational communication skills. Secondly, it instantly provides teachers with specific information about imperfections in student pronunciation, so that students receive prompt feedback from the teacher through ASR interaction, which in turn increases their learning cognition. Students also become more attentive to their speech and more motivated to make self-corrections before submitting their work. This resonates with other research on “enhancing the learning in conversation courses designed to develop spontaneous second language (L2) oral proficiency” (Miller, 2013). Lastly, this application of ASR technology in the Chinese classroom supplied the historically missing rapid quantitative measurement method for oral Chinese language learning assessment. This is the first time that a teacher could score students’ oral expression from the ASR text output. It would be a welcome addition to the Computerized Assessment of Proficiency (CAP), which was designed to measure proficiency in Chinese reading, listening, writing, and speaking, based on the underlying principles of the Standards-based Measurement of Proficiency (STAMP) (Clark, 2009).

Our research results supported both of our hypotheses. A statistical data analysis and performance chart was constructed according to the designed test model and data collection, as shown in Table 2. These results suggest that there is a clear distinction between traditional assessment and ASR assessment. Traditionally, teachers had no quantitative method for evaluating student pronunciation other than their own listening judgment. With ASR, however, the performance of both Group I and Group II was measured quantitatively without relying on the teacher’s listening judgment. The text material, converted from speech, was graded quantitatively according to its correctness rate. The research results and students’ achievements showed that ASR ratings of fluency in speech were reliable and effective. Among the 92 sentences produced by the nine students, the correlation with the correctness rate varied between 0.9 and 0.98, as shown in Figure 1. The accuracy of speech performance showed an upward trend from the first trial to the last.



 

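Table 2. All three trial results from a 45-day period by ASR input

Each trial cell gives the number of incorrect characters over the total number of characters produced.

| Students | First Trial | Incorrectness Rate | Second Trial | Incorrectness Rate | Third Trial | Incorrectness Rate | Total number of incorrect characters per student | Incorrectness Rate |
|---|---|---|---|---|---|---|---|---|
| Group I | | | | | | | | |
| Student 1 | 0/13 | | 0/20 | | 0/16 | | 0/49 | |
| Student 2 | 0/0 | | 1/20 | 0.05 | 3/18 | 0.166 | 4/38 | 0.105 |
| Student 3 | 0/33 | | 1/20 | 0.05 | 0/32 | | 1/85 | 0.012 |
| Student 4 | 7/39 | 0.179 | 2/20 | 0.1 | 1/25 | 0.04 | 10/84 | 0.119 |
| Student 5 | 0/49 | | 0/0 | | 0/23 | | 0/72 | |
| Student 6 | 1/11 | 0.09 | 0/20 | | data not available | | 1/33 | 0.03 |
| Student 7 | N/A | | N/A | | N/A | | | |
| Group II | | | | | | | | |
| Student 1 | 8/15 | 0.53 | 8/29 | 0.275 | 0/9 | | 16/53 | 0.301 |
| Student 2 | 2/29 | 0.069 | 1/107 | 0.009 | | | 3/136 | 0.022 |
| Student 3 | 0 | | 0 | | 0 | | 0 | |
| Student 4 | 2/10 | 0.2 | 0/0 | | 0/26 | | 2/36 | 0.056 |
| Total Incorrectness Rate among Students in Groups I & II during Each Trial | 20/189 | 0.106 | 13/236 | 0.055 | 4/123 | 0.033 | 37/586 | 0.063 |

| Group III | Chinese Classical Poem | Weather Inquiry | Library Location Inquiry | Correctness Rate | Degree of Involvement |
|---|---|---|---|---|---|
| Student 1 | 0/4 | 0/2 | 0/1 | 100% | 100% |
| Student 2 | 1/4 | 0/2 | 0/1 | 86% | 100% |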

Figure 1. The Correlation of Trials and Accuracy Rating
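As a rough illustration of how the trend across trials could be quantified, the sketch below recomputes the per-trial incorrectness rates from the totals row of Table 2 and then computes a Pearson correlation between trial number and accuracy. This is an assumption about one possible calculation, not necessarily the one behind Figure 1; the helper names (`incorrectness_rate`, `pearson`) are hypothetical, and with only three data points the coefficient should be read as descriptive rather than as a statistical test.

```python
from math import sqrt

# (incorrect characters, total characters) for Groups I and II combined,
# copied from the totals row of Table 2 for trials 1, 2, and 3.
TRIAL_TOTALS = [(20, 189), (13, 236), (4, 123)]


def incorrectness_rate(incorrect: int, total: int) -> float:
    """Share of transcribed characters that were incorrect."""
    return incorrect / total


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


trial_numbers = [1.0, 2.0, 3.0]
incorrect_rates = [incorrectness_rate(i, t) for i, t in TRIAL_TOTALS]
accuracy = [1 - rate for rate in incorrect_rates]

print([round(rate, 3) for rate in incorrect_rates])  # [0.106, 0.055, 0.033]
print(pearson(trial_numbers, accuracy))  # positive: accuracy rises with trial number
```

The recomputed rates match the totals row of Table 2, and the positive coefficient reflects the upward trend in accuracy from the first trial to the third.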

 

Students’ Motivation and Engaging Learning Environment

The experiment results showed that, with the assistance of ASR, students who had less than one year of Chinese study could convert their relatively complex speech into a text-based format, a performance that, according to the teacher’s observation, surpassed that of students who had studied Chinese for years without ASR assistance in the classroom. This can become positive motivation for students starting a one-year language-study program to produce practical, language-based speech in real-life communication.

Table 3. Likert Scale Survey Result of Using ASR in Chinese Classroom

Each cell gives the number of students who selected that level of the Likert scale.

| Six Survey Questions | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Q1: I like to use ASR because it is fun to use | 1 | 2 | 1 | 4 | 5 |
| Q2: I like to use ASR because it is easy to use | 1 | 2 | 3 | 4 | 4 |
| Q3: I communicate more frequently with the teacher by using ASR | 3 | 3 | 2 | 1 | 4 |
| Q4: It improved my Chinese pronunciation | 3 | 1 | 2 | 1 | 6 |
| Q5: It increased my interest of studying foreign language | 3 | 3 | 1 | 2 | 4 |
| Q6: I will use ASR more often in learning Chinese | 4 | 0 | 0 | 3 | 6 |

 

The survey results showed the degree of enthusiasm of participants for using ASR in their Chinese language classroom (see Table 3). The results supported the beliefs we held prior to the experiment: using mobile devices in the classroom is easier and more efficient than using computers, and ASR enables teachers to provide instant feedback and thus increases teacher-student interaction. According to the survey, students enjoyed using ASR to submit their oral assignments, and the teacher could measure their performance easily, accurately, and quantitatively. ASR applied in the classroom enhanced student engagement through human-machine conversation.

Moreover, over 65% of students responded that ASR is fun and easy to use and that they are engaged when using it, and 69% of students responded that they want to use ASR more often in learning Chinese. In Group III, after the ASR method was introduced to the two students who had learned Chinese part-time for over six years, they voluntarily asked Siri numerous questions in Chinese. This served the purpose of this research well: creating an engaging learning environment and motivating students’ interest in practicing the Chinese language.
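The percentages quoted from the survey can be checked with a short calculation over the counts in Table 3. The sketch below is illustrative only: it treats levels 4 and 5 as agreement, which is an assumption about how the percentages were derived, and the names `SURVEY_COUNTS`, `agreement_share`, and `mean_score` are hypothetical.

```python
# Number of students choosing levels 1..5 for each statement,
# copied from Table 3 as printed.
SURVEY_COUNTS = {
    "Q1": [1, 2, 1, 4, 5],
    "Q2": [1, 2, 3, 4, 4],
    "Q3": [3, 3, 2, 1, 4],
    "Q4": [3, 1, 2, 1, 6],
    "Q5": [3, 3, 1, 2, 4],
    "Q6": [4, 0, 0, 3, 6],
}


def agreement_share(counts: list[int]) -> float:
    """Fraction of respondents who chose 4 (agree) or 5 (strongly agree)."""
    return (counts[3] + counts[4]) / sum(counts)


def mean_score(counts: list[int]) -> float:
    """Average Likert level, weighting each level by its response count."""
    weighted = sum(level * count for level, count in enumerate(counts, start=1))
    return weighted / sum(counts)


for question, counts in SURVEY_COUNTS.items():
    share = round(agreement_share(counts) * 100)
    print(f"{question}: {share}% agree, mean {mean_score(counts):.2f}")
```

Under this reading, nine of the thirteen responses to Q6 are at level 4 or 5, which corresponds to the 69% figure reported for students wanting to use ASR more often.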

Concerns and Considerations

Some concerns arose during the research. Firstly, the amount of time prescribed for using ASR in the classroom must be carefully planned in order to minimize distraction, since teenagers’ attention drifts easily. Secondly, although mobile devices have become widely accessible, what if students do not have one? Can we use alternatives such as Chromebooks, iPads, or computers? Will differences between teaching platforms cause classroom time management issues? Thirdly, as reflected by their responses to survey question #4, students were not sure whether ASR could help them improve their pronunciation. ASR technologies are intelligent enough to correct phonetic errors automatically: even if a student’s pronunciation of the four tones is not sufficiently accurate, ASR is still able to interpret the speech and produce correct text output. This may not be a positive influence, because some students may stop striving for more accurate pronunciation in the long run. Another potential negative effect of using ASR is that it may decrease students’ desire to improve their writing proficiency.

A Surprising Result

One thing that surprised us was that the responses to survey question #3 (“I communicate more frequently with the teacher by using ASR”) were evenly spread, meaning that this statement did not draw strong agreement among students. In reality, however, ASR did increase the quality and quantity of the students’ interactions with the teacher. Throughout the experiment, students actively participated in all exercises and assessments. They were highly motivated, and some even took the initiative to communicate in Chinese, which had not been the case before the ASR exercises. In the past, these students would use Chinese to communicate only when they were required to. During and after this experiment, they began using Chinese to respond whenever they could, even to prompts in English.

Conclusions

Our research indicates that ASR can support real-time evaluation, error identification, and self-correction functions for L2 learners' speech proficiency. In this experiment, 27 authentic sentences were used 92 times among ten students in Group I and Group II (one student did not participate in the three trials because of surgery, but showed high performance in a separate evaluation using ASR afterwards). There were three trials spread out over a 45-day period to demonstrate the correlation between trials and correctness rate. Improvement in pronunciation was detected through the positive relationship between attempts and resulting accuracy. Using ASR as an assessment tool to engage students in developing communicative proficiency in the target language was also successful. The goal of creating an engaging learning environment, and thereby increasing students’ interest in learning Chinese through ASR assessment in the Chinese language classroom, has therefore been achieved. At the present time, iFLYTEK’s ASR technology has been applied in 94% of primary and middle schools in Singapore (Guanchazhe, 2017). In the United States, we tested mobile phones with ASR, used by American students in secondary schools, to increase the effectiveness of Chinese language study. In addition, we can continue to apply ASR technology in the classroom to assist language study in the future through authentic conversation with voice input. In the past, if students did not know certain Chinese characters, their resources for doing research on their own were very limited; now, with the new ASR technology, they can hold a simple conversation with Siri or other ASR applications to do their own research. That will be particularly helpful for first-year Chinese learners, who have limited Chinese proficiency. For future study, voice input technology is expected to become significantly better developed.
To foster 21st-century students, we educators have the obligation to embed the latest technology in our classrooms, so that we may improve the effectiveness of teaching and learning in the era of AI voice commands.

 

References

Clark, M. (2009). Chinese Computerized Assessment of Proficiency (CAP). CASLS Technical Report 2010-1. Retrieved from https://casls.uoregon.edu/cap/TechReport/Chinese.pdf

Coniam, D. (1998). The Use of Speech Recognition Software as an English Language Oral Assessment Instrument: An Exploratory Study. CALICO Journal, 15(4), 7-23. Retrieved from https://www.jstor.org/stable/24147601?seq=1#page_scan_tab_contents

Cucchiarini, C., Strik, H., & Boves, L. (2000). Quantitative assessment of second language learners’ fluency by means of automatic speech recognition technology. The Journal of the Acoustical Society of America, 107, 989. Retrieved from http://asa.scitation.org/doi/abs/10.1121/1.428279

Guanchazhe (Observer). (2017). iFLYTEK announced the release of six education products using intelligent technologies, including ASR, and the collection of 35 billion samples of learning data. Retrieved from http://tech.163.com/17/0309/11/CF36M23100097U7T.html

iFLYTEK official web. (2018). Retrieved from http://www.iflytek.com/en/content/details_10_1681.html

Iwashita, N., Brown, A., & McNamara, T. (2008). Assessed Levels of Second Language Speaking Proficiency: How Distinct? Oxford University Press. Retrieved from https://eclass.uoa.gr/modules/document/file.php/ENL264/testing%20speaking.pdf

Miller, J. S. (2013). Improving oral proficiency by raising metacognitive awareness with recordings. In J. Levis & K. LeVelle (Eds.), Proceedings of the 4th Pronunciation in Second Language Learning and Teaching Conference, Aug. 2012 (pp. 101-111). Retrieved from https://apling.engl.iastate.edu/alt-content/uploads/2015/05/PSLLT_4th_Proceedings_2012.pdf

Oremus, W. (2014). I Didn’t Type This Article. Retrieved from http://www.slate.com/articles/technology/technology/2014/04/the_end_of_typing_speech_recognition_technology_is_getting_better_and_better.html

Peng, Y. (2017). iFLYTEK announced the release of six education products using intelligent technologies, including ASR, and the collection of 35 billion samples of learning data. Retrieved from https://www.jiemodui.com/N/85948.html

Ren, B. (2017, November 11). From Siri to IDF: Speech Recognition is changing who? Car home. Retrieved from https://www.autohome.com.cn/user/201711/909060.html

Snyder, C., Besozzi, D., Lawrence, P., & Oppenlander, J. (2016). Is Flipping Worth the Fuss: A Mixed Methods Case Study of Screencasting in the Social Studies Classroom. American Secondary Education, 45(1).

Snyder, C., Lawrence, M. P., & Besozzi, D. (2014). Cast from the Past: Using Screencasting in the Social Studies Classroom. The Social Studies. DOI: 10.1080/00377996.2014.951472. Retrieved from http://dx.doi.org/10.1080/00377996.2014.951472

Vanian, J. (2017). Google Challenges Apple's Siri in Dictating Messages. Fortune, February 23, 2017. Retrieved from http://fortune.com/2017/02/23/google-iphone-keyboard-voice/

Appendix A

The Response Percentage of Students’ Survey

Forms response chart: “I like to use ASR because it is fun to use.” (13 responses)

Forms response chart: “I like to use ASR because it is easy to use.” (13 responses)

Forms response chart: “I communicate more frequently with the teacher by using ASR.” (13 responses)

Forms response chart: “It improved my pronunciation in learning Chinese.” (13 responses)

Forms response chart: “It increased my interest of studying foreign language.” (13 responses)

Forms response chart: “I will use ASR more often in learning Chinese.” (13 responses)



 

Appendix B

Examples of Chinese Sentences, Dialogue, Classical Poem Used in the Experiment

我叫…, 很高兴认识你。

你在哪儿工作?

图书馆在哪儿?

这个周末你想做什么?看电影还是去跳舞?

请进,请进,快进来!

我来介绍一下。这是我的同学……

你想喝点什么?咖啡还是茶?

 

周末你忙吗?我想请你去看电影。

什么电影?

美国电影。

7:30可以。

 

登鹳雀楼

(唐) 王之涣

白日依山尽,

黄河入海流。

欲穷千里目,

更上一层楼。

 

 

 

 

 