Sketch Video Synthesis: From Videos to Artistic Sketched Videos

Written By: kinza.sabir
Last Updated On: December 8, 2023

Sketch Video Synthesis transforms videos into striking, hand-drawn-style sketch videos, pushing sketch generation to new heights in the video domain.

The model was presented by Yudian Zheng from the University of Saarland, Xiaodong Cun and Menghan Xia from Tencent AI Lab, and Chi-Man Pun from the University of Macau.

This model introduces a new, optimization-based approach to sketching videos instead of relying on traditional techniques. The method represents each frame of the video individually with Bezier curves.

Traditional edge detection methods have been used for image sketching. They produce clear and concise output, but the artistic touch is missing. Another technique, video doodling, adds sketches to videos, yet it does not achieve a comprehensive level of abstraction across the entire video.

Exploration of Sketch Video Synthesis

Sketch video synthesis aims to create a sketch depiction of the objects in a video using multiple vector strokes. Each stroke is defined by a four-point (cubic) Bezier curve, and the method emphasizes preserving both semantic precision and temporal consistency.
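To make the stroke representation concrete, here is a minimal sketch of how a four-point (cubic) Bezier curve can be evaluated in plain Python. The function names `cubic_bezier` and `sample_stroke` are illustrative assumptions and do not come from the authors' code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    b0 = u * u * u          # weight of the start point
    b1 = 3.0 * u * u * t    # weight of the first inner control point
    b2 = 3.0 * u * t * t    # weight of the second inner control point
    b3 = t * t * t          # weight of the end point
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_stroke(control_points: Sequence[Point], n: int = 16) -> List[Point]:
    """Sample n evenly spaced parameter values along one four-point stroke."""
    if len(control_points) != 4:
        raise ValueError("a cubic Bezier stroke needs exactly four control points")
    if n < 2:
        raise ValueError("n must be at least 2")
    p0, p1, p2, p3 = control_points
    return [cubic_bezier(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical stroke: an arch from (0, 0) to (1, 0).
    stroke = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for point in sample_stroke(stroke, n=5):
        print(point)
```

The curve always starts at the first control point and ends at the last one, while the two inner points only pull the curve toward them. This is why a small set of control points is enough to describe a smooth stroke.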

Scalable Vector Graphics (SVG) has been used for image sketching, but sketching videos needs more research. In recent studies on image sketching, line drawings are created by refining the control points of Bezier curves so that they accurately depict the scene. Sketch video synthesis is a novel optimization-driven system designed to produce SVG-format sketch videos that show both semantic coherence and temporal consistency.
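As an illustration of what SVG-format output could look like, the following sketch writes a list of four-point strokes for one frame as SVG `path` elements, using the standard cubic Bezier `C` path command. The helper names `stroke_to_path` and `frame_to_svg` and the styling choices (black, round-capped strokes) are assumptions for this example, not the authors' output format.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]  # exactly four control points


def stroke_to_path(stroke: Stroke, stroke_width: float = 1.5) -> str:
    """Render one four-point stroke as an SVG <path> with a cubic 'C' command."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = stroke
    d = (
        f"M {x0:.2f} {y0:.2f} "
        f"C {x1:.2f} {y1:.2f}, {x2:.2f} {y2:.2f}, {x3:.2f} {y3:.2f}"
    )
    return (
        f'<path d="{d}" fill="none" stroke="black" '
        f'stroke-width="{stroke_width}" stroke-linecap="round"/>'
    )


def frame_to_svg(strokes: Sequence[Stroke], width: int, height: int) -> str:
    """Wrap all strokes of a single video frame in one SVG document."""
    body = "\n".join("  " + stroke_to_path(s) for s in strokes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">\n{body}\n</svg>'
    )


if __name__ == "__main__":
    # Two hypothetical strokes for a 64x64 frame.
    frame = [
        [(5, 50), (20, 5), (40, 5), (60, 50)],
        [(10, 55), (25, 60), (40, 60), (55, 55)],
    ]
    print(frame_to_svg(frame, 64, 64))
```

A sketch video can then be stored as one such SVG per frame. Because the strokes stay vector data, the result can be scaled or restyled without re-running the optimization.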

Generating a high-quality video sketch requires careful initialization and ongoing optimization. The researchers used Neural Layered Atlas (NLA), a method for video editing. During initialization, CLIP features together with XDoG edge detection were used to find semantic-aware edges. The positions of these points were then refined under multiple criteria to guarantee both semantic alignment and temporal consistency. The identified points were converted into Bezier curves, and a differentiable rasterizer was applied to render them as frame-by-frame images.
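To give a feel for what a temporal consistency criterion measures, here is a toy penalty that averages the squared movement of each control point between consecutive frames. This is a simplified illustration and not the loss used in the paper. The function name `temporal_penalty` and the nested-list data layout are assumptions made for the example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]      # four control points
Frame = Sequence[Stroke]      # all strokes in one frame


def temporal_penalty(frames: Sequence[Frame]) -> float:
    """Mean squared displacement of matching control points between adjacent frames.

    Assumes every frame holds the same strokes in the same order, so that
    control point k of stroke s can be matched across frames by index.
    """
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for stroke_prev, stroke_curr in zip(prev, curr):
            for (xa, ya), (xb, yb) in zip(stroke_prev, stroke_curr):
                total += (xb - xa) ** 2 + (yb - ya) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    base = [[(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0)]]
    # Hypothetical video: the stroke drifts one pixel to the right per frame.
    drifting = [
        [[(x + f, y) for (x, y) in stroke] for stroke in base] for f in range(3)
    ]
    print(temporal_penalty([base, base, base]))  # static strokes
    print(temporal_penalty(drifting))            # moving strokes
```

A term like this would be small when strokes move smoothly and large when they jitter. In an optimization-based system it would be balanced against terms that keep each frame's sketch faithful to the scene.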

This approach will be useful for video editing applications and for doodle creation that brings creativity to a single frame. It also helps in replacing original content in a scene with sketches. The research paper is available on arXiv, and other resources are available on GitHub.

The researchers also presented a brief video of this model.

Dataset and Evaluation

This method was evaluated on the DAVIS dataset for outlining the objects in a video. Optimizing a video sketch takes approximately 29 minutes and consumes around 19.5 GB of memory on a single NVIDIA GeForce RTX 3090 GPU.

It was also evaluated on a manual dataset, with the first 50 frames of the video used to generate the results. There was no existing video sketching method for comparison, so image sketching and edge detection techniques such as Canny and HED were used instead.

Conclusion

This method faces challenges when handling the motion of non-rigid bodies. It also shows some textured artifacts when there are complex foreground elements in the video. These challenges can be mitigated by dividing the video into small sections and adding more layers, at the cost of segmentation accuracy and computational resources.

This video synthesis method showed extraordinary results, generating coherent and semantically rich videos while maintaining sketch quality and an appropriate level of abstraction. With this method, creative sketch videos can be generated using simple Bezier curves.