Developing and Validating a Mobile Augmented Reality (MAR)-Mediated English Speaking Assessment for Korean EFL High School Learners

변정희

Korean title: 한국 고등학교 영어 학습자를 위한 모바일 증강현실 기반 말하기 평가 개발과 타당화 연구

Authors: 변정희

Advisor: 이용원

Issue Date: 2023

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: test mode ; technology-mediated language assessment ; Many-Facet Rasch Measurement (MFRM) ; Multitrait-Multimethod (MTMM) ; mobile augmented reality

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : College of Humanities, Department of English Language and Literature, 2023. 8. 이용원.

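Abstract (Korean version): This dissertation describes in detail the effort to develop and validate an MAR-based English speaking test for Korean high school students, with the aim of exploring the feasibility of second language speaking assessment using mobile augmented reality (MAR) technology.
To develop the MAR-based English speaking test (hereafter MARST), a mobile application called "Eco English Test" was built, and the speaking assessment was administered to approximately 200 Korean high school students, including 110 male and 90 female first-year students. The test was administered as a one-semester performance assessment and served as an achievement test, measuring mastery of the language functions and skills learned in class before the assessment. After a unit on the global environment, four semi-direct speaking tasks on that topic were presented.
The validation of MARST was carried out using two validity frameworks from language assessment, the Assessment Use Argument (AUA; Bachman and Palmer, 2010) and the Interpretation/Use Argument (I/UA; Kane, 1992, 2006, 2013), with particular attention to how this innovative assessment mode can be contextualized in the validation of second language assessment.
Four research questions were addressed: (1) To what extent is the underlying structure of the MAR-mediated speaking assessment comparable to that of assessments measuring the same speaking ability and other abilities (reading, writing, listening)? (2) To what extent do the facets of the assessment (e.g., rater, task, rating criteria) affect MARST scores? (3) How do test users perceive the use of MARST, and does that perception differ according to individual characteristics such as gender and general English proficiency? (4) What are the linguistic features of MAR-mediated speech, and how do they inform the validation of MAR-based assessment?
For data analysis, test scores from several measures of speaking ability and other skills were collected, along with questionnaire and interview responses, and were analyzed with a mixed-methods design that combined psychometric approaches, such as Multi-Trait Multi-Method (MTMM) analysis and the Many-Facet Rasch Model (MFRM), with corpus and discourse analyses of test-takers' spoken responses. The MTMM analysis showed positive correlations between MARST scores and other speaking measures and also revealed a unidimensional internal factor structure for MARST. The MFRM analysis provided empirical evidence for the following validity claims: (1) observed MARST scores are reliable estimates of expected scores; (2) the separate analytic rating scales contribute to measuring the targeted construct; (3) the tasks are not redundant and need no revision or deletion; (4) the test can discriminate among test-takers at different levels; (5) interpretations of the test construct are consistent across groups of test-takers.
In addition, the MAR mode effect was not statistically significant in various respects. The bias analysis did find a significant interaction between the scoring behavior of two raters and Task 1 (dialogue completion) and Task 3 (explaining the sequence of events); however, based on criteria suggested in the literature and on follow-up interviews, this interaction did not appear to have a substantial effect on the measurement of test-takers' performance ability. Meanwhile, corpus and discourse analysis of 150 sampled responses to Task 3 revealed features of interpersonal (interactional) communication. These were attributed to test-takers' heightened awareness of the virtual interlocutor's presence, of their own role as speaker, and of the given situation during task performance. The immersive effect that is unique to the MAR mode promoted interaction with the virtual interlocutor, showing the mode's potential as an alternative for English speaking assessment in EFL settings, where direct speaking tests are necessarily limited. In conclusion, these findings suggested that MARST did not restrict the speaking construct intended in Task 3.
Theoretical analysis and insights were then offered on three key issues in the validity argument: (1) the contextualization of MAR technology in second language speaking assessment; (2) the effects of the MAR mode on the construct, the task, and the test-taker; and (3) the control of variability in test conditions. Among the technological and pedagogical implications for MARST, the need for research on test-takers' task performance behaviors and strategies during the speaking assessment was noted. Finally, collaboration between language testers and technology experts to design and develop more authentic language learning and assessment contexts for language learners is proposed for future research.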
This dissertation explores the feasibility of using mobile-based, context-aware augmented reality (MAR) technology as a new mode of second language speaking assessment. It provides a detailed description of the efforts made to develop and validate an MAR-based English speaking test for high school students in the South Korean context.
To that end, a mobile AR-mediated English speaking test (hereafter referred to as MARST) was developed within the mobile application named "Eco English Test". This test was administered to approximately 200 Korean high school students, consisting of 110 males and 90 females aged 16 to 17. The test comprised four semi-direct speaking tasks on topics related to the global environment.
The study validated MARST using both the Assessment Use Argument (AUA) and Interpretation/Use Argument (I/UA) frameworks. The focus was on determining how the innovative testing mode provided by MAR technology could be characterized and evaluated as part of the overall validation process for the entire test.
The research questions posed for this study are as follows: (1) To what extent are the MAR-mediated speaking test scores and the test's underlying structure comparable to those in several other measures of the same and other traits? (2) To what extent do the assessment settings (e.g., rater, task, and rating categories) affect test scores? (3) What are test users' perceptions toward the use of MARST, and do they differ according to individual characteristics, such as gender and general English proficiency? (4) What are the linguistic features of MAR-mediated communication, and how do they inform MAR-mediated test validation?
For data analysis, test scores from several measures of speaking skills and other traits were collected, along with questionnaire and interview responses. These were analyzed using a mixed-methods design that included psychometric approaches, such as the Multi-Trait Multi-Method (MTMM) and the Many-Facet Rasch Model (MFRM), as well as corpus and discourse analyses of test-takers' spoken responses.
The results of the MTMM analysis not only showed positive correlations between the MARST score and other speaking measures, but also revealed the unidimensional internal factor structure of the MARST. These findings provide empirical evidence supporting the validation argument that MARST test scores contribute to a common construct of the target speaking ability.
The MFRM analysis offers empirical evidence to support the following validation arguments: 1) the observed scores of the MARST are reliable estimates of expected scores; 2) the separate analytic rating scales contribute to the target construct; 3) there is no task redundancy, nor is there a need for revision or deletion; 4) test-takers perform significantly differently across various aspects of speaking; 5) interpretations of the test construct are consistent across different groups of test-takers.
The bias (interaction) analysis indicates that raters' behaviors in assigning ratings did not vary due to factors such as gender, test-takers' region of residence, or the rating criteria. Regarding the mode effect, no significant differences were found across test-takers' gender, general English proficiency level, or region of residence. However, a significant interaction was found between the scoring behaviors of two raters and both Task 1 (dialogue completion) and Task 3 (explaining the sequence of events). Yet, following the guidelines suggested in the literature and drawing upon subsequent interviews, it was concluded that this interaction did not appear to have a substantial impact on the measurement of test-takers' ability to perform.
Results from the test-taker questionnaires revealed that the MAR-based testing was perceived as comfortable and engaging. Respondents generally agreed that the test input, presented through the MAR mode, was authentic and provided clear instructions and guidance for crafting their responses. They gave high ratings to the items asking about the suitability of the test tasks for presentation in the MAR mode and their relevance to classroom learning. Test users saw the MARST as a valuable alternative for second language (L2) speaking assessment in English as a Foreign Language (EFL) contexts.
In the subsequent corpus and discourse analysis of 150 sampled responses to Task 3, which required test-takers to describe the procedure of an event to a simulated interlocutor, the immersive effects of the MAR mode on the linguistic features became apparent. These included an increased perception of the interlocutor's presence, heightened awareness of the speaker's role identity, and a sense of urgency given the task situation. MAR technological features appeared to encourage interactions with a simulated interlocutor, revealing the interactive linguistic features of test-takers' responses in tasks typically limited to a monologue. These factors suggest that the MARST did not underrepresent the intended speaking construct in Task 3.
Subsequently, three key issues were addressed in the validation arguments: 1) the contextualization of MAR technology in second language assessment, 2) the effects of the MAR mode on the construct, task, and test-taker, and 3) the control of variability in test conditions.
Among the technological and pedagogical implications for MARST and recommendations for future research is the need to investigate the rating behaviors and strategies involved in the speaking process. Additionally, it was suggested that language testers and technology experts cooperate to design and develop more authentic language learning and testing contexts for language learners.

Language: eng

URI: https://hdl.handle.net/10371/197231

https://dcollection.snu.ac.kr/common/orgView/000000178360

Files in This Item:

000000178360.pdf 12.15 MB

Appears in Collections:

College of Humanities (인문대학)
- English Language and Literature (영어영문학과)
  - Theses (Ph.D. / Sc.D._영어영문학과)