BIJUNG:6.2.1 Principles of CLIP (Contrastive Language-Image Pre-training): Mapping Images and Text into a Shared Latent Space.
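As a rough illustration of what mapping images and text into one latent space means, the sketch below projects two sets of feature vectors into a common space, normalizes them, and scores every image-text pair by cosine similarity with a symmetric contrastive loss. It is a minimal NumPy sketch under stated assumptions, not CLIP's actual implementation: the function names, the linear projection matrices `w_image` and `w_text`, and the fixed `temperature=0.07` are all illustrative choices, and the real encoders (a vision model and a text Transformer) are replaced by plain arrays.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each vector to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_features, text_features, w_image, w_text,
                          temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    Row i of image_features is assumed to be paired with row i of
    text_features. w_image and w_text are hypothetical linear projections
    into the shared latent space; temperature is fixed here for simplicity.
    """
    # Project both modalities into the same latent space and normalize.
    img = l2_normalize(image_features @ w_image)
    txt = l2_normalize(text_features @ w_text)

    # logits[i, j] is the scaled cosine similarity of image i and text j.
    logits = img @ txt.T / temperature

    # The correct pairs sit on the diagonal of the similarity matrix.
    idx = np.arange(logits.shape[0])
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image), logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_features = rng.normal(size=(4, 16))  # stand-in for image encoder output
    text_features = rng.normal(size=(4, 12))   # stand-in for text encoder output
    w_image = rng.normal(size=(16, 8))         # hypothetical projection to 8 dims
    w_text = rng.normal(size=(12, 8))
    loss, logits = clip_contrastive_loss(image_features, text_features,
                                         w_image, w_text)
    print("loss:", loss, "similarity matrix shape:", logits.shape)
```

Minimizing this loss pulls each matched image-text pair together in the shared space while pushing apart the mismatched pairs in the same batch, which is why embeddings from the two modalities become directly comparable.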