{"id":2103,"date":"2023-08-24T03:01:53","date_gmt":"2023-08-24T03:01:53","guid":{"rendered":"https:\/\/mlnews.dev\/?p=2103"},"modified":"2023-12-03T14:41:39","modified_gmt":"2023-12-03T14:41:39","slug":"wanjuan-ignitingwith-2tb-of-english-and-chinese-data","status":"publish","type":"post","link":"https:\/\/mlnews.dev\/wanjuan-ignitingwith-2tb-of-english-and-chinese-data\/","title":{"rendered":"WanJuan: Igniting Multimodal Empowerment with 2TB of English and Chinese Data"},"content":{"rendered":"\n
<p>Discover WanJuan, where language meets images and video. The WanJuan dataset is a collaborative effort by researchers from <strong><em>Shanghai AI Laboratory<\/em><\/strong>, including <strong><em>Conghui He<\/em><\/strong> and <em><strong>Zhenjiang Jin<\/strong><\/em>, dedicated to advancing language and multimodal understanding.<\/p>\n\n\n\n