{"id":200,"date":"2023-07-13T11:27:07","date_gmt":"2023-07-13T11:27:07","guid":{"rendered":"https:\/\/34.239.202.173\/?p=200"},"modified":"2024-02-01T15:31:02","modified_gmt":"2024-02-01T15:31:02","slug":"revolutionize-longnet-transformers-to-1000000000-tokens","status":"publish","type":"post","link":"https:\/\/mlnews.dev\/revolutionize-longnet-transformers-to-1000000000-tokens\/","title":{"rendered":"REVOLUTIONIZE: LONGNET – Epic Transformers To 1,000,000,000 Tokens"},"content":{"rendered":"\n

Introducing LONGNET, a Transformer variant that scales sequence length to more than 1,000,000,000 tokens while maintaining performance and quality on shorter sequences. LONGNET makes very long sequences practical through dilated attention, an attention mechanism with linear computational complexity in sequence length. The work comes from Microsoft Research and was authored by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, and Furu Wei. In short, it is a new way to model sequences of billions of tokens.

\"LONGNET
LONGNET : Scaling transformer to 1000,000,000 tokens<\/em><\/figcaption><\/figure>\n\n\n\n

Embrace The Legacy Of Past Research

Before LONGNET, researchers faced real limitations when processing long sequences. RNN-style models can in principle handle long sequences, but their sequential nature prevents parallelization during training. State space models offered some improvement over that approach: they behave like a CNN during training and can be converted to an RNN at test time. In practice, however, their performance on regular-length sequences falls short of Transformers, which deliver superior results. Transformers, on the other hand, suffer from quadratic computational complexity in the sequence length, so in most cases they require many GPUs and long training times to handle long inputs.
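To get a feel for why quadratic attention breaks down at these lengths, here is a rough back-of-the-envelope comparison (my own illustration, not a figure from the paper); the window size of 2,048 is an arbitrary stand-in for a linear-cost attention pattern, not a LONGNET hyperparameter.

```python
# Rough illustration: how many query-key pairs each attention style touches.
# Dense self-attention scales as N^2; a linear-cost pattern scales as N * w,
# where w (here 2,048, an arbitrary choice) is the number of keys per query.
def dense_pairs(n: int) -> int:
    return n * n

def linear_pairs(n: int, w: int = 2048) -> int:
    return n * w

for n in (8_000, 1_000_000, 1_000_000_000):
    print(f"N={n:>13,}  dense={dense_pairs(n):.3e}  linear={linear_pairs(n):.3e}")
```

At one billion tokens the dense count is on the order of 10^18 pairs, which is why the dense route quickly becomes infeasible.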

To overcome these limitations, a new model named LONGNET was proposed. Its dilated attention expands the attentive field exponentially as the distance between tokens grows, so the model can capture far-apart dependencies without giving up computational efficiency. Thanks to its linear computational complexity, it can efficiently handle sequences of more than one billion tokens, and the attention pattern can be partitioned across multiple GPU devices, enabling distributed training and good scalability. It delivers strong performance on both long and short sequences.

[Figure] LONGNET outperforms dense Transformers
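To make the dilated attention idea described above concrete, here is a minimal single-head toy sketch in plain NumPy. It handles only one segment-length/dilation pair (w, r); the function name, shapes, and the zero-filling of skipped positions are my own simplifications, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention(q, k, v, w=8, r=2):
    """Toy single-head dilated attention for one (w, r) pair.

    q, k, v have shape (seq_len, d); seq_len must be a multiple of w.
    Each length-w segment keeps only every r-th position, runs ordinary
    scaled dot-product attention inside the sparsified segment, and
    scatters the result back. Positions skipped by the dilation stay zero
    here; the real model mixes several (w, r) pairs so all are covered.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, n, w):                 # split the sequence into segments
        idx = np.arange(start, start + w)[::r]   # keep every r-th position (dilation)
        scores = q[idx] @ k[idx].T / np.sqrt(d)  # attention within the sparse segment
        out[idx] = softmax(scores) @ v[idx]      # scatter results back
    return out

# tiny smoke test on random data
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((32, 16)) for _ in range(3))
print(dilated_attention(q, k, v, w=8, r=2).shape)  # (32, 16)
```

In the actual model, several (w, r) pairs with geometrically increasing segment lengths and dilation rates are combined so that every position is covered, and that mixture is what keeps the overall cost linear in the sequence length.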

Marvels Of Future Work

In the future, scaling the sequence length will be the focus. The researchers plan to push past one billion tokens while addressing the computation and memory constraints that come with it; in short, handling truly large-scale text. The approach extends easily to other domains and tasks, and the researchers will continue to look for ways to improve language-modeling performance.

LONGNET is also expected to take on multimodal tasks, meaning it could process information from multiple modalities, including text, audio, video, and images, and handle context-rich data. Another direction is improved prompting techniques, to find out how far the context window can be pushed while staying effective, so that responses remain accurate and context-aware.


LONGNET is a strong base model for fine-tuning and transfer learning. It will be analyzed for pretraining on large-scale datasets and fine-tuning on downstream tasks, using its ability to capture long-range dependencies to improve performance. Future research will also explore techniques that improve its effectiveness across multiple scenarios and make it usable in real-world settings.

Availability

The full research paper is available at arxiv.org, the code is on GitHub, and there is a further LONGNET resource page. The LONGNET entry is also open to everyone at paperswithcode.com, where the researchers have linked related datasets for better understanding. Their web page gives access to all of these resources, and thegenerality.com covers the associated pretraining and multimodal work. All of these sources are open for access, and you can follow different models and their progress on the pages above.

[Figure] The LONGNET research paper

The implementation is available on GitHub as open source.

Installation

To try the whole system, you can install it on your own machine. Two installation methods are available: cloning the repository with Git, or installing with pip (a working Python installation is required either way).

With the Git clone method, clone the LONGNET repo from GitHub, change into the cloned directory, and install the dependencies. For the exact steps, check the project page.

With the pip method, install LONGNET directly from PyPI using pip. Again, the project page lists the exact command.

After installing via either method, check the usage instructions listed on the same page, which also documents the system's inputs and outputs.
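As a very rough picture of what usage could look like, here is a hypothetical sketch: the import path, class name, and constructor arguments below are placeholders I have made up for illustration and are not taken from the repository's documented API, so consult its README for the real calls.

```python
import torch  # assumes the implementation is PyTorch-based

# Hypothetical usage sketch - import path, class name, and arguments are
# placeholders, NOT the repository's documented API. Check its README.
from longnet import DilatedAttention  # placeholder import path

attn = DilatedAttention(dim=512, heads=8)  # placeholder constructor
tokens = torch.randn(1, 64_000, 512)       # (batch, long sequence, embedding dim)
out = attn(tokens)                         # output keeps the input shape
print(out.shape)
```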

Potential Applications

It can be applied to real-world problems, such as: