In this session, we will explore some groundbreaking insights into Large Language and Vision Models (LLVMs) with one of the paper's co-authors, Young-Jun Lee. These models have recently shown remarkable capabilities in tasks that require both perception and cognition, making them key players in applications ranging from coding to advanced reasoning.
Young-Jun will present findings from the paper, "Intriguing Properties of Large Language and Vision Models," which examines the impressive yet uneven performance of LLVMs. While these models excel at high-level reasoning, their performance on fundamental perception tasks, as measured by benchmarks such as MMVP (Multimodal Visual Patterns), remains surprisingly limited. The paper evaluates popular LLVM families, such as LLaVA, across 10 benchmarks to shed light on properties like permutation invariance, alignment, and the critical roles that different model layers play in visual understanding.
Young-Jun Lee is a Ph.D. student at KAIST. His research primarily focuses on enhancing social interaction in conversations between humans and LLM-based agents by incorporating empathy, persona, image-sharing behavior, persona commonsense knowledge, and long-term engagement. He is also interested in data-centric AI, particularly in building high-quality and diverse dialogue datasets using LLMs. Recently, his work has expanded to exploring large language and vision models. He completed his Master's degree at KAIST and earned his Bachelor's degree at Sungkyunkwan University (SKKU).