Greedy Confidence Bound Techniques for Restless Multi-armed Bandit Based Cognitive Radio

Dong Shuyan

서울대학교 중앙도서관

S-Space 소개

My S-Space

로그인이 필요합니다.

S-Space

Publications

Detailed Information

Greedy Confidence Bound Techniques for Restless Multi-armed Bandit Based Cognitive Radio

DC Field	Value	Language
dc.contributor.advisor	이정우 교수님	-
dc.contributor.author	Dong Shuyan	-
dc.date.accessioned	2017-07-14T02:52:45Z	-
dc.date.available	2017-07-14T02:52:45Z	-
dc.date.issued	2013-08	-
dc.identifier.other	000000013817	-
dc.identifier.uri	https://hdl.handle.net/10371/123005	-
dc.description	학위논문 (석사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 8. 이정우.	-
dc.description.abstract	인지 라디오는 제한된 스펙트럼 리소스를 활용하는 효율적인 기술이다. 이러한 소프트웨어 기반 인지 라디오 시스템은 무선 통신환경이 가지고 있는 통계적인 특성을 인식 하고 그것에 기반하여 스스로를 적응시키는 동작을 수행한다. 인지 라디오 시스템이 추구하는 목표는 크게 신뢰도 높은 통신을 언제 어디서나 구축하는것과 스펙트럼을 효율적으로 사용하는 두가지로 요약할수있다. 연속성이있는 자원의 효율적인 할당문제는 MAB로 알려진 알고리즘적 방식을 통해 모델링하여 반복적인 실험을 통해 해를 구할수있다. MAB 문제에서는 현재의 적절한 보상을 달성하기위한 전략과 현재의 이득을 어느정도 포기하고 미래에 예측가능한 이득을 증진시키는 두가지 전략을 적절히 배합하여 최적의 해를 찾는다. 일반적인 MAB 관련 문제에서는 확률적인 보상을 생성하는 다중 Arm 또는 이에 상응하는 등가의 미리 정의된 동작이 있으며, 플레이어는 예상되는 총 보상을 극대화하기 위한 K개의 Arm을 연속적으로 선택하는 정책을 찾는다. 이 논문에서는 보다 도전적인 문제인 비 베이지안 RMAB 문제를 다룬며 이때 Markov chain의 파라미트들은 미리 알려지지 않았다고 가정된다. 또한 이를 이용해 인지 라디오 시스템을 위한 실용적인 동적 스펙트럼 센싱에에대한 연구를 진행한다. 만약 채널 사용에 있어 우선순위를 갖는 1차 사용자의 채널 사용 패턴이 동일하고 독립적인 39 Markov chain으로 모델링 할때 이를 비 베이지안 RMAB문제로 다룰수있다. 본 논문의 기여도는 통계적 특성이 알려지지 않은 동적 채널에서 2시간 슬롯 Greedy 신뢰도 범위 알고리즘을 (Two slot GCB) 이용해 효율적인 다중 채널 스펙트럼 센싱 알고리즘을 개발한데에 있다. 이는 Markov 파라미터 추정기법과 채널 센싱기법이 결합된 형태를 띄고 있으며 이를 통해 기존의 UCB 알고리즘의 Regret rate이 ln(t) 을 가짐을 보일수있다. 마지막으로 본 연구를 통해 제한조건이 없는 연속 시간 Markov chain 채널 모델을 적용한 인지 라디오 시스템에서의 채널 접근 정책을 제안한다.	-
dc.description.abstract	The electromagnetic radio spectrum is a natural resource, the use of which by transmitters and receivers is licensed by governments. The underutilization of the electromagnetic spectrum leads us to think in terms of spectrum holes, where a spectrum hole is a band of frequencies assigned to a primary user, but, at a particular time and specific geographic location, the band is not being utilized by that user. Spectrum utilization can be improved significantly by making it possible for a secondary user (who is not being serviced) to access a spectrum hole unoccupied by the primary user at the right location and the time in question. Cognitive radio is viewed as a novel approach for improving the utilization of a precious natural resource: the radio electromagnetic spectrum. The cognitive radio, built on a software-defined radio, is defined as an intelligent wireless communication system that is aware of its environment and uses the methodology of understanding-by-building to learn from the environment and adapt to statistical variations in the input stimuli, with two primary objectives in mind: Highly reliable communication whenever and wherever needed	-
dc.description.abstract	Efficient utilization of the radio spectrum. Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems, which has fundamental conflict between a strategy yielding high present reward and a strategy sacrificing present gain for better future reward. In a multi-armed bandit problem, there are multiple (N) arms which generate stochastic reward, and a player seeks a policy to select multiple K≥1 arms in order to maximize the expected total reward over multiple time-slots. A particularly challenging variant of MAB problems is the restless multi-armed bandit problem (RMAB), in which all arms evolve as Markov chains. Even in the Bayesian case, where the parameters of the Markov chains are known, this problem is difficult to solve, and has been proved to be PSPACE hard. We consider more challenging non-Bayesian RMAB problems, in which the parameters of the Markov chain are further assumed to be unknown a priori. We demonstrate our approach on a practical problem related to dynamic spectrum sensing for cognitive radio applications. If the primary user occupancy on each channel is modeled as an identical but independent Markov chain with unknown parameters, we obtain a non-Bayesian RMAB. Our main contribution in this work is that we develop an efficient new multi-channel spectrum sensing algorithm for unknown dynamic channels based on the two-slot greedy confidence bound algorithm (Two-slot GCB), which combines the Markov Chain parameter estimation and the channels sensing simultaneously and provides a new analysis which shows that UCB algorithms have the regret rate of ln(t). And finally we give out another solution for the accessing policy for Cognitive Radio of unconstrained continuous time Markov chain channel model.	-
dc.description.tableofcontents	Contents Abstract .................................................................................................................. i Contents ............................................................................................................... iii List of Figures ......................................................................................................... v Chapter 1 ............................................................................................................... 1 Introduction ........................................................................................................... 1 1.1 Cognitive Radio (CR) .............................................................................. 1 1.2 MAB problem ........................................................................................ 2 1.3 Combination .......................................................................................... 3 Chapter 2 ............................................................................................................... 5 MAB algorithms with a Markov Chain Reward Process ........................................ 5 2.1 The multi-armed bandit problem .......................................................... 5 2.2 UCB algorithm for MAB ......................................................................... 7 2.3 A simple induction for UCBs regret rate of log(t) ................................. 7 2.4 CR with Markov Chain channel model and Greedy algorithm ............ 12 2.4.1 Problem Modeling ........................................................................... 12 2.4.2 Belief Vector based Greedy Algorithm ............................................ 12 2.5 UCB algorithm for CR Markov Chain channel model .......................... 13 2.6 Two and One slot algorithm for CR Markov Chain channel model ..... 14 2.6.1 Two-slot Algorithm ...................................................................... 14 2.6.2 One-slot Algorithm ...................................................................... 15 2.6.3 Combination of Two-slot and One-slot algorithms ..................... 16 2.7 Simulation ............................................................................................ 17 2.7.1 Single User Single Play ..................................................................... 17 2.7.2 Single User Multi play ...................................................................... 18 2.7.3 Decentralized Multi User Single Play ............................................... 21 Chapter 3 ............................................................................................................. 26 Un-slotted channel access policy ........................................................................ 26 3.1 Continuous-time Markov channel model ............................................ 26 iv 3.2 Channel access policy without distribution constraints ....................... 27 3.2.1 Channel access behavior and mathematic modeling ....................... 27 3.2.2 One general channel access policy ..................................................... 31 3.2.3 Another general channel access policy ............................................... 32 3.3 Another model for collision time constraint .............................................. 34 Chapter 4 ............................................................................................................. 36 Conclusions .......................................................................................................... 36 개 요 .................................................................................................................... 38 Bibliography ......................................................................................................... 40	-
dc.format	application/pdf	-
dc.format.extent	1665252 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	서울대학교 대학원	-
dc.subject	Greedy Confidence Bound Techniques for Restless Multi-armed Bandit Based Cognitive Radio	-
dc.subject.ddc	621	-
dc.title	Greedy Confidence Bound Techniques for Restless Multi-armed Bandit Based Cognitive Radio	-
dc.type	Thesis	-
dc.description.degree	Master	-
dc.citation.pages	40	-
dc.contributor.affiliation	공과대학 전기·컴퓨터공학부	-
dc.date.awarded	2013-08	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)

Files in This Item:

000000013817.pdf 1.59 MB

Altmetrics

Item View & Download Count

Show Simple Item Record

Find it @ SNU

트윗하기

SNS Share