
sa2va/4b/image

falai

Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video levels.

vision

20 tokens / 1 megapixel

Model Settings

e.g. Could you please give me a brief description of the image? Please respond with a detailed image prompt for re-generation in plain text

Upload an image or enter a URL here
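The settings above amount to two inputs: a text prompt and an image reference. As a rough sketch only, the two could be assembled into a single request payload as shown below. The helper `build_request` and the field names `prompt` and `image_url` are assumptions made for illustration and are not taken from this page; the model's actual API schema may use different names.

```python
from urllib.parse import urlparse


def build_request(prompt: str, image_url: str) -> dict:
    """Assemble the two inputs shown on this page into one payload.

    The field names "prompt" and "image_url" are hypothetical; check the
    model's API schema for the real ones before sending anything.
    """
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")

    # Accept only absolute http(s) URLs, mirroring the "enter a URL" input.
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("image_url must be an absolute http(s) URL")

    return {"prompt": prompt, "image_url": image_url}


payload = build_request(
    "Could you please give me a brief description of the image?",
    "https://example.com/cat.png",  # placeholder URL, not a real asset
)
```

The sketch only validates and packages the inputs; it makes no network call, so it can be adapted to whichever client is used to reach the model.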
