MLNews

StreamDiffusion: Image-to-image generation in real-time

Meet StreamDiffusion, a framework that aims to revolutionize real-time visuals for the Metaverse, video game graphics, live streaming, and broadcasting with the magic of instant creation. Researchers from UC Berkeley, the University of Tsukuba, and other institutes are the innovators behind this creation.

StreamDiffusion is an interactive framework that lets image generation models operate efficiently in a real-time environment. It generates high-quality output while leaving users free to adapt it to the needs of their own applications.

StreamDiffusion

The model takes its input as a batch of images and instantly generates a batch of high-quality output images, making the process more interactive and fluid. The research is available on arXiv, and the code is open source and available on GitHub.

StreamDiffusion Example

The white spot at the top of the image moves downward across frames, and StreamDiffusion generates a beautiful, semantically consistent output showing the sun going down as the frames pass. The frame-to-frame changes are so fluid that they give the effect of a smooth video. I would definitely suggest watching this captivating demo the researchers have released.


Have you ever seen the Statue of Liberty move? Now you can, through this amazing model: by changing the input, its algorithm generates interlinked frames that form a video. Feeding continuous input through the batching technique turns out to be extremely fruitful, producing high-quality output with great fluency.

Diffusion models have shown outstanding performance when driven by text or image prompts, but they face challenges when continuous input and real-time interaction are required. The resulting delay becomes evident in the Metaverse, broadcasting, live video streaming, and video games.

Let's get to know StreamDiffusion

To address these challenges, the researchers present StreamDiffusion, a solution that does not disrupt existing models; instead, it integrates with them without major modifications. It is a new diffusion pipeline built for high throughput. StreamDiffusion combines several strategies: Stream Batch, Residual Classifier-Free Guidance (RCFG), an input-output queue, a Stochastic Similarity Filter, a pre-computation procedure, and model acceleration tools together with a tiny autoencoder.

When the environment shows little change and no active users, identical inputs are fed into the VAE and U-Net over and over, wasting GPU resources. To cut this computational cost, the Stochastic Similarity Filter (SSF) strategy skips redundant passes, reducing GPU usage.
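A minimal pure-Python sketch of the idea behind SSF. The skip-probability schedule and class interface here are assumptions for illustration, not the paper's exact formulation: compare each incoming frame with the previous one and, when they are nearly identical, skip the diffusion pass with a probability that grows with the similarity.

```python
import math
import random

def cosine_similarity(a, b):
    # Similarity between two flattened frames (lists of floats)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class StochasticSimilarityFilter:
    """Skips the diffusion pass for near-duplicate frames.

    Illustrative sketch only: the skip-probability schedule is a
    guess, not the exact rule used by StreamDiffusion.
    """

    def __init__(self, threshold=0.98, seed=0):
        self.threshold = threshold
        self.prev = None
        self.rng = random.Random(seed)

    def should_process(self, frame):
        if self.prev is None:
            self.prev = frame
            return True
        sim = cosine_similarity(frame, self.prev)
        self.prev = frame
        if sim < self.threshold:
            return True  # scene changed: always run the pipeline
        # Static scene: skip with probability approaching 1 as sim -> 1
        skip_prob = (sim - self.threshold) / (1.0 - self.threshold)
        return self.rng.random() >= skip_prob
```

With identical frames the similarity is 1.0, so the pass is always skipped; any meaningful scene change drops the similarity below the threshold and the full pipeline runs.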

The Stream Batch approach restructures N-step denoising, in which an image is iteratively refined over multiple steps to raise quality and reduce noise, so that consecutive frames at different denoising stages are processed together in a single batch. This yields high throughput without compromising responsiveness or speed when generating images in real time from continuous input.
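The batching idea can be sketched in pure Python (the names and the one-step advance function are illustrative stand-ins, not the actual implementation): rather than running all N denoising steps for one frame before starting the next, each model call processes a batch of frames staggered across denoising stages, so in steady state one finished frame leaves the pipeline per call.

```python
from collections import deque

N_STEPS = 4  # denoising steps per frame (illustrative value)

def denoise_step(batch):
    # Stand-in for one batched U-Net call: every frame in the
    # batch advances by one denoising step.
    return [(frame_id, step + 1) for frame_id, step in batch]

def stream_batch(frame_ids):
    """Pipelines frames so each model call covers N staggered stages."""
    stream = iter(frame_ids)
    pipeline = deque()
    outputs = []
    model_calls = 0
    while True:
        nxt = next(stream, None)
        if nxt is not None:
            pipeline.append((nxt, 0))  # new frame enters at step 0
        if not pipeline:
            break
        pipeline = deque(denoise_step(pipeline))
        model_calls += 1
        if pipeline[0][1] == N_STEPS:  # oldest frame fully denoised
            outputs.append(pipeline.popleft()[0])
    return outputs, model_calls
```

For 8 frames and 4 steps this takes 8 + 4 - 1 = 11 batched calls, versus 32 single-frame calls if each frame were denoised to completion sequentially.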

Residual Classifier-Free Guidance (RCFG)

The traditional approach recomputes the negative-condition noise at every step of the diffusion process, which drives up the computational cost. To overcome this, Residual Classifier-Free Guidance (RCFG) removes the redundant computation by approximating the negative condition with a virtual residual noise. This approach maintains the alignment and quality of the generated images.
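For context, standard classifier-free guidance combines a conditioned and a negative-conditioned noise prediction, which normally costs two U-Net evaluations per denoising step. The sketch below shows the standard combination formula and the call-count arithmetic; RCFG's virtual-residual approximation itself is not reproduced here, only the savings it targets.

```python
def cfg_noise(cond, negative, guidance_scale=7.5):
    """Standard classifier-free guidance: steer the prediction
    away from the negative condition toward the prompt."""
    return [n + guidance_scale * (c - n) for c, n in zip(cond, negative)]

# Cost comparison (evaluation counts, not timings):
steps = 50
standard_cfg_unet_calls = 2 * steps  # conditioned + negative per step
rcfg_unet_calls = steps              # negative term approximated by a
                                     # virtual residual noise instead
```

Halving the U-Net evaluations per step is what makes the reported roughly 2x speedup over conventional CFG plausible.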

Performance and Evaluation

The transition from sequential denoising to batch denoising improves speed by approximately 1.5× across different denoising levels. The RCFG strategy is up to 2.05× faster than the conventional classifier-free guidance method, ensuring faster image generation and processing. Combined with mature acceleration tools, these strategies yield an outstanding image-to-image throughput of 91.07 frames per second (fps) on a single RTX 4090 GPU.
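As a quick sanity check on what 91.07 fps means per frame (simple arithmetic, not a benchmark):

```python
FPS = 91.07  # reported image-to-image throughput, single RTX 4090
latency_ms = 1000.0 / FPS  # average time budget per generated frame
print(f"{latency_ms:.2f} ms per frame")  # roughly 11 ms
```

Staying under roughly 11 ms per frame is comfortably faster than the 16.7 ms frame budget of a 60 fps video stream.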

StreamDiffusion also emphasizes energy efficiency, cutting energy consumption by 2.39× on an RTX 3060 GPU and 1.99× on an RTX 4090 GPU when handling static scene inputs.

Conclusion

StreamDiffusion is a pipeline solution for interactive, real-time image-to-image diffusion generation built from several optimization strategies. Stream Batch, Residual Classifier-Free Guidance (RCFG), an input-output queue for parallelization, the Stochastic Similarity Filter, pre-computation, the Tiny AutoEncoder, and model acceleration tools together improve throughput and GPU usage.

This efficiency is extremely useful in a number of applications such as live video streaming, video games, and the Metaverse, delivering high performance while cutting unnecessary energy use.
