LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token