Part 7. Reinforcement Learning Basics

Chapter 61. Markov Decision Process (MDP)
Chapter 62. Value Functions and the Bellman Equation
Chapter 63. Policy Gradient Methods
Chapter 64. The PPO (Proximal Policy Optimization) Algorithm
Chapter 65. Reward Model Design
Chapter 66. Combining Reinforcement Learning with Fine-Tuning
Chapter 67. GRPO (Group Relative Policy Optimization)
Chapter 68. Offline Reinforcement Learning (Offline RL) Basics
Chapter 69. Imitation Learning
Chapter 70. Inverse Reinforcement Learning