MLNews

GeoDream: Text Prompt to High-Fidelity and Consistent 3D Image Generation

GeoDream is an innovative technique that decouples 2D visual priors from 3D geometry, refining precision and consistency when transforming 2D images into 3D models.

Researchers Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang from the Beijing Academy of Artificial Intelligence, BUPT, Tsinghua University, and Peking University presented this model.

GeoDream is an innovative model designed to improve the creation of 3D structures from 2D data. It ensures that the generated 3D assets have clear and consistent geometric shapes, reducing ambiguity in their structure, while also retaining diversity and maintaining accuracy in the generated geometry. GeoDream produces more 3D-consistent textured meshes with high-resolution, realistic renderings (i.e., 1024 × 1024) and adheres more closely to semantic coherence.

The model takes a text prompt as input and outputs consistent, accurate 3D assets, either as rendered images or as textured meshes, built from 2D data.

GeoDream workflow

The field of 3D generation has flourished thanks to deep generative models. Models such as ProlificDreamer, DreamFusion, and Fantasia3D generate photo-realistic and diverse 3D assets from textual prompts. These models are effective, but they also exposed inconsistency issues (the Janus problem) during the 2D-to-3D lifting process, since their outputs rely on 2D diffusion models for supervision. Some works have tried to address this through negative text prompts or by altering the score function, yet despite these efforts multi-view inconsistency remains an open problem in 3D generation.

Exploration of GeoDream

GeoDream has a unique ability to combine explicit 3D priors with 2D cues, allowing the creation of intricate 3D objects while significantly reducing inconsistencies in their structure. It can produce high-resolution rendered images at 1024×1024 resolution and textured meshes with exceptional fidelity, effectively addressing the well-known Janus problem.

GeoDream is applicable in augmented and virtual reality, education, online conferencing, architecture, animation, gaming, and the movie industry. The research paper is available on arXiv, and the code is open source and available on GitHub.

How GeoDream works

This model involves a sequence of steps to generate 3D geometric structures from 2D images while maintaining spatial consistency.

The model first predicts images from different camera perspectives using a multi-view diffusion model, which blends visual information across multiple viewpoints and angles. A cost volume is then constructed from these predicted images: it captures the correlations between the views in 3D space, accounting for variations due to changes in perspective. This cost volume serves as a "native" 3D geometric prior, which ensures spatial consistency in the resulting 3D models.
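To make the cost-volume idea concrete, here is a minimal NumPy sketch of the variance-based aggregation step commonly used in multi-view stereo pipelines. This is an illustrative assumption, not GeoDream's actual implementation: the homography warping of each view's features onto the reference camera's depth planes is omitted, and the function simply measures how much the (already warped) views disagree at each candidate depth.

```python
import numpy as np

def variance_cost_volume(warped_feats):
    """Aggregate per-view feature volumes into a variance-based cost volume.

    warped_feats: array of shape (V, D, H, W, C) -- features from V views,
    each assumed to be already warped onto D fronto-parallel depth planes
    of the reference camera (the warping step is omitted for brevity).

    Returns a cost volume of shape (D, H, W): low cost (low variance) where
    the views agree, indicating a likely surface at that depth hypothesis.
    """
    mean = warped_feats.mean(axis=0)                 # (D, H, W, C) per-plane mean
    var = ((warped_feats - mean) ** 2).mean(axis=0)  # (D, H, W, C) per-channel variance
    return var.mean(axis=-1)                         # (D, H, W) average over channels

# Hypothetical usage: 3 views, 4 depth planes, 8x8 feature maps, 16 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 8, 8, 16))
feats[:, 2] = feats[0, 2]  # force all views to agree at depth plane 2
cost = variance_cost_volume(feats)
depth_index = cost.argmin(axis=0)  # per-pixel most consistent depth plane
```

Taking the argmin of the cost volume along the depth axis recovers the depth plane where the views are most consistent; that per-pixel consistency signal is the kind of "native" 3D geometric prior the article describes.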

Evaluation

GeoDream was evaluated against the latest 3D generation methods, including DreamFusion, ProlificDreamer, MVDream, and Fantasia3D. 35 sample prompts were collected from different sources along with real user inputs. The comparison showed that these 3D priors offer extensive generality across challenging and diverse cases and effectively generate 3D assets with varied lighting and texture styles.

The quantitative comparison with the baselines shows that this method significantly outperforms them in quality, text-image consistency, and 3D consistency. The qualitative comparison shows that the model excels at producing multifaceted 3D outputs, both meshes and rendered images, with realistic textural details.

Conclusion

GeoDream enhances the rendering fidelity of images and textured meshes while mitigating the Janus problem. The comparisons clearly show that this model outperforms previous 3D generation models in both effectiveness and flexibility.
