Data-Driven Optimal Control for Linear Systems with Arbitrary Initial Policy and Application to Nonlinear Systems Using Koopman Operators

김성훈

Korean title (translated): Data-Driven Optimal Control of Linear Systems for Arbitrary Initial Policies and Its Application to Nonlinear Systems Using the Koopman Operator

Authors: 김성훈

Advisor: 김유단

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: Reinforcement Learning ; Data-Driven Control ; Learning-Based Control ; Automatic Control System ; Optimal Control ; Algebraic Riccati Equation

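Description: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, August 2023. Advisor: 김유단.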

Abstract: A model-free off-policy reinforcement learning algorithm is proposed for solving optimal control problems for dynamic systems. The algorithm is designed to converge to a policy that is not only optimal but also stabilizing; stability is one of the most critical concerns in designing controllers for safety-critical systems such as unmanned aerial vehicles. Unlike typical approximate dynamic programming methods, the proposed algorithm does not require an initial stabilizing policy, which is its key advantage.
In the first part of the dissertation, a data-driven surrogate Q-learning algorithm is proposed for linear systems, based on the extended Kleinman iteration for solving the algebraic Riccati equation. To allow an unstable initial policy, the value function is redefined implicitly so that the performance index of an unstable policy can be evaluated. Based on this implicit value function, an action-value function called the surrogate Q-function is proposed by augmenting virtual control dynamics in the state space, so that the values of state and control input pairs are properly defined. An off-policy data-driven method called surrogate Q-learning is then derived from the surrogate Q-function; it enables the reuse of data obtained from arbitrary control sources, e.g., trained human experts or fine-tuned PID controllers. Because surrogate Q-learning is equivalent to the extended Kleinman iteration, its convergence follows from that of the extended Kleinman iteration, which is proven using matrix inertia theory to converge to the unique positive definite solution of the algebraic Riccati equation, yielding the optimal stabilizing policy.
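For readers unfamiliar with the Kleinman iteration, the sketch below shows the classical continuous-time version that the extended iteration builds on: alternate a Lyapunov-equation policy evaluation with a gain update until the algebraic Riccati equation is solved. It assumes a model (A, B) and, unlike the thesis's extended iteration, a stabilizing initial gain K0; the function name `kleinman_iteration` and the example matrices are hypothetical illustrations.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov


def kleinman_iteration(A, B, Q, R, K0, iters=20):
    """Classical Kleinman policy iteration for continuous-time LQR.

    Assumes A - B K0 is Hurwitz; the extended iteration in the thesis is
    designed to remove this requirement and is NOT implemented here.
    """
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: Ak^T P + P Ak + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: K = R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
    return P, K


# Hypothetical example: open-loop eigenvalues +1 and -1 (unstable)
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[2.0, 2.0]])  # A - B K0 has a double pole at -1

P, K = kleinman_iteration(A, B, Q, R, K0)
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are))  # True
```

With K0 = 0 in this example, A - B K0 = A has eigenvalues +1 and -1, so the Lyapunov equation in the evaluation step has no unique solution; this is the kind of failure an arbitrary (possibly unstable) initial policy causes for the classical scheme.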
The second part of the dissertation is devoted to applying the proposed reinforcement learning algorithm to nonlinear systems. Koopman operator theory is employed to linearize nonlinear systems in an infinite-dimensional space, which is called Koopman lifting linearization. The controllability and observability of the linearized systems are investigated under the assumption that there exists a finite-dimensional invariant subspace of the Koopman operator spanned by a mapping called the lifting. The equivalence between the optimal control problem for the original nonlinear system and that for the linearized system is shown under several conditions on the lifting. To find a lifting that satisfies all of these conditions, a diffeomorphic lifting approximation using a coupling-flow-based invertible deep neural network is employed. A meta-learning framework is proposed to train the network to approximate a common lifting for a group of systems, so that model-free surrogate Q-learning can be applied to uncertain systems within the group.
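As a concrete picture of a finite-dimensional Koopman-invariant subspace, the sketch below uses a standard textbook-style example (not necessarily one from the thesis): the discrete-time system x1+ = a·x1, x2+ = b·x2 + c·x1² is nonlinear, but the lifting psi(x) = [x1, x2, x1²] evolves exactly linearly. The parameters a, b, c, the unforced setting, and the least-squares (EDMD-style) fit are illustrative assumptions.

```python
import numpy as np

# Hypothetical system parameters (illustration only)
a, b, c = 0.9, 0.5, 0.3


def step(x):
    """One step of x1+ = a*x1, x2+ = b*x2 + c*x1**2 (nonlinear in x)."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])


def lift(x):
    """Lifting psi(x) = [x1, x2, x1^2]; it spans a Koopman-invariant subspace."""
    return np.array([x[0], x[1], x[0] ** 2])


# Exact linear dynamics on the lifted coordinates: psi(x+) = K psi(x)
K_true = np.array([[a, 0.0, 0.0],
                   [0.0, b, c],
                   [0.0, 0.0, a ** 2]])

# Recover K from sampled data by least squares: Psi_next ≈ Psi @ K^T
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
Psi = np.array([lift(x) for x in X])
Psi_next = np.array([lift(step(x)) for x in X])
K_T, *_ = np.linalg.lstsq(Psi, Psi_next, rcond=None)
K_hat = K_T.T
print(np.allclose(K_hat, K_true))

# The state is read back from the lifted coordinates as x = C psi(x)
C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Propagating the lifted linear model reproduces the nonlinear trajectory
x = np.array([1.5, -0.7])
y = lift(x)
for _ in range(10):
    x, y = step(x), K_true @ y
print(np.allclose(C @ y, x))
```

Because the original state appears among the lifted coordinates, the linear output map C recovers it exactly; for general systems such an invariant subspace must be approximated, which is the role of the learned lifting described above.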
Numerical simulations on illustrative nonlinear systems with known optimal controllers demonstrate the feasibility of the proposed framework, and practical considerations and implementation details are discussed.
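Benchmarks with known optimal controllers let a learned policy be checked against ground truth. The sketch below uses one widely used benchmark of this kind from the ADP literature (an assumption; it is not stated which systems the thesis uses): x' = f(x) + g(x)u with cost integrand xᵀx + u², whose optimal value is V*(x) = 0.5·x1² + x2² and optimal control u*(x) = -(cos(2x1) + 2)·x2. The code only verifies that this pair satisfies the Hamilton-Jacobi-Bellman equation; all function names are hypothetical.

```python
import numpy as np


def f(x):
    """Drift term of the benchmark system."""
    x1, x2 = x
    c = np.cos(2 * x1) + 2
    return np.array([-x1 + x2, -0.5 * x1 - 0.5 * x2 * (1 - c ** 2)])


def g(x):
    """Input vector field (single input)."""
    return np.array([0.0, np.cos(2 * x[0]) + 2])


def grad_V(x):
    """Gradient of the known optimal value V*(x) = 0.5*x1^2 + x2^2."""
    return np.array([x[0], 2 * x[1]])


def u_star(x):
    """Optimal control u* = -(1/2) R^{-1} g(x)^T grad V*(x) with R = 1."""
    return -0.5 * g(x) @ grad_V(x)


def hjb_residual(x, Q=np.eye(2), R=1.0):
    """x^T Q x + u^T R u + grad V^T (f + g u); zero at the optimum."""
    u = u_star(x)
    return x @ Q @ x + R * u ** 2 + grad_V(x) @ (f(x) + g(x) * u)


rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(1000, 2))
max_res = max(abs(hjb_residual(x)) for x in pts)
print(max_res < 1e-10)  # True: the residual vanishes up to round-off
```

A learned gain or value function can then be compared pointwise against u_star and V* on the same sample points.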
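Korean abstract (translated): In this dissertation, a model-free reinforcement learning algorithm is proposed to solve optimal control problems. Stability of the control system is an essential consideration in controller design, and the proposed algorithm is designed so that the learned controller converges to one that is not only optimal but also stabilizing. Unlike existing approximate dynamic programming techniques, the proposed algorithm does not require a stabilizing initial controller, which is a major advantage for model-free learning of systems with unstable equilibrium points.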
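The first part of the dissertation proposes a new form of Q-learning algorithm that learns a stabilizing optimal controller for linear systems using only data. To allow an unstable initial control input, the value function used to evaluate the performance index is redefined in implicit form, and its existence and uniqueness are shown for linear systems. A Q-function is defined as this implicit value function in an augmented state space in which virtual control dynamics are appended to the state variables, and a policy-iteration-based Q-learning algorithm is built on it. The algorithm is an off-policy technique that does not need data generated by the controller being learned, so it can use data collected from skilled operators of the system or from experimentally tuned PID controllers. Using matrix inertia theory, it is proven that with the proposed Q-learning algorithm the learned controller becomes stabilizing within a finite number of steps and ultimately converges to the stabilizing optimal solution of the algebraic Riccati equation.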
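To see why off-policy data is enough to learn an optimal gain, the sketch below implements the classical discrete-time Q-learning policy iteration for LQR (in the style of Bradtke et al.), where the Q-function is the quadratic form zᵀHz with z = [x; u]. It is a swapped-in, simpler technique, not the thesis's surrogate Q-learning: it is discrete-time and assumes a stabilizing initial gain K0. The function `q_policy_iteration`, the example system, and the random exploratory inputs are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_are


def q_policy_iteration(X, U, Xn, Q, R, K0, iters=30):
    """Off-policy Q-learning policy iteration for x+ = A x + B u (A, B unknown).

    X, U, Xn hold samples (x_k, u_k, x_{k+1}) from ANY behavior policy.
    Assumes A - B K0 is Schur stable.
    """
    n, m = X.shape[1], U.shape[1]
    N = len(X)
    # Stage cost r_k = x^T Q x + u^T R u, computed once and reused
    r = np.einsum("ki,ij,kj->k", X, Q, X) + np.einsum("ki,ij,kj->k", U, R, U)
    Z = np.hstack([X, U])
    K = K0
    for _ in range(iters):
        # The next action follows the TARGET policy u = -K x, not the data's u
        Zn = np.hstack([Xn, -Xn @ K.T])
        # Bellman equation z^T H z - zn^T H zn = r is linear in vec(H)
        Phi = (np.einsum("ki,kj->kij", Z, Z).reshape(N, -1)
               - np.einsum("ki,kj->kij", Zn, Zn).reshape(N, -1))
        h, *_ = np.linalg.lstsq(Phi, r, rcond=None)
        H = h.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)  # keep the symmetric part
        # Policy improvement: u = -H_uu^{-1} H_ux x
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return H, K


# Hypothetical example system (used only to generate data and check the result)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
U = rng.normal(size=(100, 1))   # exploratory inputs from an arbitrary source
Xn = X @ A.T + U @ B.T          # x_{k+1} = A x_k + B u_k

K0 = np.array([[1.0, 2.0]])     # A - B K0 has a double pole at z = 0.9
H, K = q_policy_iteration(X, U, Xn, Q, R, K0)

P = solve_discrete_are(A, B, Q, R)
K_opt = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(np.allclose(K, K_opt))
```

The key design point is that the data's actions u_k enter only through z, while the successor action is recomputed from the current target gain K, so the same batch is reused at every iteration.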
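The second part of the dissertation addresses applying the proposed reinforcement learning algorithm to nonlinear systems. To this end, Koopman operator theory, which linearizes nonlinear systems in an infinite-dimensional space, is employed. Assuming that there exists a finite-dimensional invariant subspace of the Koopman operator generated by a mapping called the lifting, conditions are established under which the linearized system is controllable and observable, as required for its optimal control. Based on several conditions on the lifting, the equivalence between the optimal control problem of the original nonlinear system and that of the linearized system is proven, which provides the theoretical basis for using the proposed reinforcement learning algorithm. To find a lifting that satisfies all of the conditions, a diffeomorphic lifting approximation method using invertible deep neural networks is proposed. Motivated by the observation that, if a common lifting exists for a group of systems, the proposed model-free reinforcement learning can be applied to uncertain systems within the group, a meta-learning framework that learns the common lifting is developed.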
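Coupling-flow networks are invertible by construction, which is what makes a diffeomorphic lifting possible. The sketch below shows a RealNVP-style affine coupling layer with random (untrained) weights: half of the coordinates pass through unchanged and parameterize a scale and shift of the other half, so the inverse is closed-form and the Jacobian is triangular. The exact architecture used in the thesis, and how the network output is turned into a lifting (e.g., with added dimensions), are not specified here; the class and function names are hypothetical.

```python
import numpy as np


class AffineCoupling:
    """RealNVP-style affine coupling layer on R^d (random, untrained weights).

    y_a = x_a,  y_b = x_b * exp(s(x_a)) + t(x_a).
    s and t are smooth, so the layer is a diffeomorphism with a closed-form inverse.
    """

    def __init__(self, d, hidden, rng):
        self.k = d // 2  # size of the pass-through part x_a
        self.W1 = rng.normal(scale=0.5, size=(hidden, self.k))
        self.b1 = np.zeros(hidden)
        self.Ws = rng.normal(scale=0.5, size=(d - self.k, hidden))
        self.Wt = rng.normal(scale=0.5, size=(d - self.k, hidden))

    def _s_t(self, xa):
        h = np.tanh(self.W1 @ xa + self.b1)
        return np.tanh(self.Ws @ h), self.Wt @ h  # bounded log-scale s, shift t

    def forward(self, x):
        xa, xb = x[: self.k], x[self.k:]
        s, t = self._s_t(xa)
        return np.concatenate([xa, xb * np.exp(s) + t])

    def inverse(self, y):
        ya, yb = y[: self.k], y[self.k:]
        s, t = self._s_t(ya)  # y_a = x_a, so s and t are recomputed exactly
        return np.concatenate([ya, (yb - t) * np.exp(-s)])

    def log_det_jacobian(self, x):
        s, _ = self._s_t(x[: self.k])
        return np.sum(s)  # triangular Jacobian with diagonal [1, ..., exp(s)]


def flow_forward(layers, x):
    for layer in layers:
        x = layer.forward(x)[::-1]  # reverse so the other half is updated next
    return x


def flow_inverse(layers, y):
    for layer in reversed(layers):
        y = layer.inverse(y[::-1])
    return y


rng = np.random.default_rng(0)
layers = [AffineCoupling(4, 16, rng) for _ in range(4)]
x = rng.normal(size=4)
y = flow_forward(layers, x)
x_rec = flow_inverse(layers, y)
print(np.allclose(x_rec, x))  # True: the stacked flow is exactly invertible
```

Training would adjust the weights (for example, so that lifted trajectories evolve linearly), while invertibility holds for any weights.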
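Finally, numerical simulations are performed on nonlinear systems with nonlinear dynamics and known optimal controllers, and the feasibility and implementation details of the proposed framework are examined.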

Language: eng

URI: https://hdl.handle.net/10371/196314

https://dcollection.snu.ac.kr/common/orgView/000000177423

Files in This Item:

000000177423.pdf 4.53 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Mechanical Aerospace Engineering
  - Theses (Ph.D. / Sc.D._Mechanical Aerospace Engineering)
