Life-cycle policies for large engineering systems under complete and partial observability
- Andriotis, Charalampos; Papakonstantinou, Kostas
- 13th International Conference on Applications of Statistics and Probability in Civil Engineering(ICASP13), Seoul, South Korea, May 26-30, 2019
- Management of structures and infrastructure systems has gained significant attention in the pursuit of optimal inspection and maintenance life-cycle policies that can handle diverse deteriorating effects of stochastic nature and satisfy long-term objectives. Such sequential decision problems can be efficiently formulated within the framework of Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), which describe agent-based acting in environments with Markovian dynamics, equipped with rewards, actions, and complete or partial observations. In systems with relatively low-dimensional state and action spaces, MDPs and POMDPs can be satisfactorily solved using different dynamic programming algorithms, such as value iteration with or without synchronous updates and point-based approaches for partial observability cases. However, optimal planning for large systems with multiple components is computationally hard and severely suffers from the curse of dimensionality. Namely, the system states and actions can grow exponentially with the number of components in the most general and adverse case, making the problem intractable for conventional dynamic programming schemes. In this work, Deep Reinforcement Learning (DRL) is implemented, with emphasis on the development and application of deep architectures suitable for large engineering systems. The developed approach leverages component-wise information to prescribe component-wise actions, while maintaining global optimality at the system level. Thereby, the system life-cycle cost functions are efficiently parametrized for large state and action spaces through nonlinear approximations, enabling effective planning in complex decision problems. Results are presented for a multi-component system and evaluated against various condition-based policies.
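To make the dynamic-programming baseline concrete, the following is a minimal sketch of value iteration with synchronous updates for a single-component deterioration MDP. All quantities here are illustrative assumptions, not taken from the paper: four condition states (0 = intact, 3 = failed), three maintenance actions (do-nothing, repair, replace), and hypothetical transition probabilities and costs.

```python
import numpy as np

n_states, n_actions = 4, 3
gamma = 0.95  # discount factor (assumed)

# P[a, s, s'] : transition probabilities under each action (illustrative)
P = np.zeros((n_actions, n_states, n_states))
# do-nothing: stochastic deterioration toward the absorbing failed state
P[0] = [[0.8, 0.2, 0.0, 0.0],
        [0.0, 0.7, 0.3, 0.0],
        [0.0, 0.0, 0.6, 0.4],
        [0.0, 0.0, 0.0, 1.0]]
# repair: imperfect, moves the component roughly one state back
P[1] = [[1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 0.9, 0.1, 0.0],
        [0.0, 0.0, 0.9, 0.1]]
# replace: return to the intact state with certainty
P[2] = np.tile([1.0, 0.0, 0.0, 0.0], (n_states, 1))

# C[a, s] : immediate cost of action a in state s (illustrative)
C = np.array([[0.0, 0.0, 0.0, 50.0],     # do-nothing (cost of operating failed)
              [3.0, 5.0, 8.0, 20.0],     # repair
              [10.0, 10.0, 10.0, 10.0]])  # replace

# Value iteration, synchronous updates, minimizing expected discounted cost
V = np.zeros(n_states)
for _ in range(10_000):
    # Q[a, s] = C[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
    Q = C + gamma * P @ V
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

policy = (C + gamma * P @ V).argmin(axis=0)  # optimal action per condition state
print("V* =", np.round(V, 2))
print("policy =", policy)
```

This is tractable only because the state space is tiny: a system of, say, 10 such components has 4^10 ≈ 10^6 joint states and 3^10 joint actions in the worst case, which is precisely the exponential growth that motivates the parametrized DRL approach described in the abstract.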