TinySAM: An Effective and Efficient Segment Anything Model

Written By: kinza.sabir
Last Updated On: January 4, 2024

TinySAM is tiny in name only: its impact on image segmentation is anything but small. It is also a zero-shot model, meaning it can segment objects without task-specific training. Researchers from the University of Science and Technology of China and Huawei Noah’s Ark Lab presented TinySAM.

TinySAM takes an image as input and produces an output in which the requested object or region is segmented. The Tiny Segment Anything Model (TinySAM) relies on three techniques:

Online hard prompt sampling
Quantization technique
Hierarchical segmentation

Together, these techniques turn SAM into TinySAM: a model that needs far less computation and fewer resources while keeping segmentation performance and accuracy high.

Why was TinySAM necessary?

SAM delivered outstanding results on vision tasks, but its complex architecture demands heavy computation, which makes it hard to run on systems with limited resources. MobileSAM, a recent approach, was designed to address this issue, but its training process was not efficient, which held back its performance. So there was a need for a model that is faster and uses fewer resources without compromising quality when segmenting a wide range of objects.

The images above are examples of segmented output generated by the model. TinySAM can segment anything while keeping its zero-shot capability intact. It is built by training a compact, portable model (the student model) to inherit knowledge from a larger, more complex model (the teacher model), an approach known as knowledge distillation.
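The student and teacher relationship can be illustrated with a tiny, dependency-free sketch. This is not the TinySAM training code: the function names, the mean-squared-error loss, and the sample numbers are all assumptions chosen for illustration, and the "student" here is just a list of mask probabilities rather than a neural network.

```python
# Toy knowledge-distillation sketch (hypothetical names and values).
# The student is penalized for deviating from the teacher's per-pixel
# mask probabilities; no ground-truth labels are involved.

def distillation_loss(student_probs, teacher_probs):
    """Mean squared error between two flattened masks of equal length."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("masks must have the same number of pixels")
    squared = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(squared) / len(squared)


def distillation_step(student_probs, teacher_probs, lr=0.5):
    """One gradient-descent step on the student's outputs.

    A real student would update network weights; here the mask values
    themselves are nudged toward the teacher to keep the example small.
    """
    n = len(student_probs)
    # The gradient of mean((s - t)^2) with respect to s is 2 * (s - t) / n.
    return [s - lr * 2 * (s - t) / n
            for s, t in zip(student_probs, teacher_probs)]


teacher = [0.9, 0.8, 0.1, 0.0]  # assumed teacher (SAM-like) mask probabilities
student = [0.5, 0.5, 0.5, 0.5]  # assumed untrained student output

loss_before = distillation_loss(student, teacher)
for _ in range(20):
    student = distillation_step(student, teacher)
loss_after = distillation_loss(student, teacher)
```

After the loop, loss_after is much smaller than loss_before, which is the whole point of distillation: the small model learns to reproduce what the large model predicts.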

“Online hard prompt sampling” was used to make this knowledge transfer more efficient and effective. After training, a quantization technique was applied to the model, which reduces the computational cost and resources needed for image segmentation.
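Both steps in this paragraph can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: sample_hard_prompts assumes that a "hard" prompt is a pixel where the student and teacher masks disagree, and quantize_uint8 shows generic uniform 8-bit quantization rather than the specific scheme used for TinySAM. All function names and sample values are hypothetical.

```python
import random


def sample_hard_prompts(student_mask, teacher_mask, k, seed=0):
    """Pick up to k pixel indices where student and teacher disagree.

    Assumption: disagreement marks the regions the student still finds hard,
    so new point prompts are drawn from there.
    """
    hard = [i for i, (s, t) in enumerate(zip(student_mask, teacher_mask))
            if s != t]
    rng = random.Random(seed)
    return sorted(rng.sample(hard, min(k, len(hard))))


def quantize_uint8(weights):
    """Uniform 8-bit quantization: floats -> integer codes in [0, 255]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


teacher_mask = [1, 1, 0, 0, 1, 0]
student_mask = [1, 0, 0, 1, 1, 0]
# Pixels 1 and 3 are the only places where the two masks differ.
hard_prompts = sample_hard_prompts(student_mask, teacher_mask, k=5)

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)
```

Each restored weight differs from the original by at most half a quantization step, while each code fits in a single byte instead of a 32-bit float. That trade is where the savings in memory and computation come from.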

In the images above, an image is given as input and TinySAM generates the segmented output, highlighting different objects in different colors to distinguish them. The green box represents the box prompt.

A hierarchical segmentation technique breaks the task down into smaller parts. Instead of segmenting everything at once, the architecture divides the work into smaller stages, which makes the process faster while maintaining accuracy.
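One way to picture the hierarchical idea is a two-pass sweep over point prompts: a sparse grid first, then a dense grid that skips regions already covered by confident masks. The sketch below is a toy under stated assumptions, not TinySAM's code. segment_at is a hypothetical stand-in for a promptable model, and the stride, confidence values, and threshold are invented for illustration.

```python
def segment_at(labels, row, col):
    """Hypothetical stand-in for a promptable model.

    Returns (mask pixels, confidence). The "mask" is every pixel sharing the
    prompted pixel's label; label 0 (background) gets a low confidence.
    """
    target = labels[row][col]
    mask = {(r, c) for r, line in enumerate(labels)
            for c, value in enumerate(line) if value == target}
    confidence = 0.5 if target == 0 else 0.95
    return mask, confidence


def hierarchical_everything(labels, coarse_stride=4, threshold=0.9):
    """Two-pass "segment everything" over a grid of point prompts."""
    height, width = len(labels), len(labels[0])
    found = {}       # frozenset of pixels -> confidence (deduplicates masks)
    covered = set()  # pixels inside confident masks from the coarse pass
    calls = 0        # number of times the model is prompted

    # Pass 1: sparse prompts; confident masks are accepted as final.
    for r in range(0, height, coarse_stride):
        for c in range(0, width, coarse_stride):
            mask, conf = segment_at(labels, r, c)
            calls += 1
            if conf >= threshold:
                found[frozenset(mask)] = conf
                covered |= mask

    # Pass 2: dense prompts, skipping points already explained by pass 1.
    for r in range(height):
        for c in range(width):
            if (r, c) in covered:
                continue
            mask, conf = segment_at(labels, r, c)
            calls += 1
            found[frozenset(mask)] = conf

    return found, calls


# 8x8 toy image: object 1 on the left, object 2 top right, background below it.
labels = [[1, 1, 1, 1, 2, 2, 2, 2]] * 4 + [[1, 1, 1, 1, 0, 0, 0, 0]] * 4
found, calls = hierarchical_everything(labels)
dense_calls = len(labels) * len(labels[0])  # one prompt per pixel
```

On this toy image the two-pass sweep prompts the model 20 times and still finds all three regions, while a fully dense sweep would need 64 prompts. The saving comes from never re-prompting inside the two confident masks found in the first pass.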

This model generates more precise, accurate, and smooth masks than FastSAM and MobileSAM on segmentation tasks. TinySAM keeps the functionality of SAM while being reported as about 100 times faster than the original SAM. The researchers ran extensive zero-shot experiments, without training TinySAM for the specific tasks, and it outperformed the other methods by a wide margin.

Review of TinySAM’s Demo

The interface of TinySAM presented by the researchers is quite innovative and user friendly compared with the interfaces of other segmentation models, such as Aligning And Prompting Everything (APE). TinySAM’s interface supports uploading images and drawing on them. Users can also crop the image, capture an image with a webcam, copy and paste an image, draw with a pointer, and erase what they have drawn with an eraser.

When I ran the demo, the model was unable to reproduce the results presented by the researchers in the research paper. I gave the model the sample image presented below.

After I highlighted the area to be masked, the model generated an unexpected result. I selected the glass and the bottle of milk to segment, but the model highlighted the arm instead and did not draw the green bounding boxes. The output is presented below.

Wrap Up!

Several smart techniques were used to make TinySAM more effective and efficient: online hard prompt sampling, quantization, and hierarchical segmentation. These techniques help minimize the computational resources needed for masking in image segmentation, but in my test of the demo the output was not satisfactory.