{"id":223,"date":"2023-07-16T03:42:34","date_gmt":"2023-07-16T03:42:34","guid":{"rendered":"https:\/\/cdn.mlnews.dev\/?p=223"},"modified":"2023-10-02T06:06:46","modified_gmt":"2023-10-02T06:06:46","slug":"fot-lama-unlocking-the-potential-of-long-context","status":"publish","type":"post","link":"https:\/\/mlnews.dev\/fot-lama-unlocking-the-potential-of-long-context\/","title":{"rendered":"Is A Long Context Sequence Achievable? FOT-LAMA Unlocking The Potential Of Long Context"},"content":{"rendered":"\n

A new approach, the Focused Transformer (FoT), is here to improve language-model context length across both single and multiple documents. Its main purpose is to expand context length without compromising performance: extending the context beyond its usual limit lets the model absorb and process more information at once. The work comes from Google DeepMind, IDEAS NCBR, the Polish Academy of Sciences, and the University of Warsaw, with contributions from Szymon Tworkowski<\/a>, Konrad Staniszewski<\/a>, Miko\u0142aj Pacek<\/a>, Yuhuai Wu<\/a>, Henryk Michalewski<\/a>, and Piotr Mi\u0142o\u015b<\/a>. <\/p>\n\n\n\n

FoT has achieved good results in real-life scenarios, and it stands apart from other models in how it processes long-range information.<\/p>\n\n\n

\n
\"FOT-LAMA\"
LongLLaMA<\/figcaption><\/figure><\/div>\n\n\n

Explore Challenges Of Long-Context Modeling<\/h2>\n\n\n\n

Previous research struggled with processing longer text. Earlier models work well in some scenarios but fail to sustain a long context: past a certain point they start producing inaccurate results, which makes the whole system less effective. Handling contexts of up to 2,000 tokens was easy, but beyond that limit these systems break down. And when multiple documents are added, the model becomes overloaded with information and starts returning irrelevant results, a problem known as the distraction issue.<\/p>\n\n\n\n
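The distraction issue can be pictured with a toy softmax-attention experiment (a hypothetical NumPy sketch, not the paper's code): as distractor documents are mixed into the context, the share of attention mass landing on the one relevant document's keys shrinks.

```python
import numpy as np

def relevant_attention_mass(n_docs, tokens_per_doc=100, d=64):
    """Fraction of softmax attention mass landing on the single relevant
    document's keys when (n_docs - 1) distractor documents share the context."""
    rng = np.random.default_rng(0)  # fixed seed: same query/keys across calls
    query = rng.normal(size=d)
    relevant = query + 0.5 * rng.normal(size=(tokens_per_doc, d))   # correlated keys
    distract = rng.normal(size=((n_docs - 1) * tokens_per_doc, d))  # unrelated keys
    keys = np.vstack([relevant, distract])
    scores = keys @ query / np.sqrt(d)          # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over all keys
    return weights[:tokens_per_doc].sum()       # mass on the relevant document

# Attention on the relevant document dilutes as distractors pile up.
print(relevant_attention_mass(2), relevant_attention_mass(16))
```

With 2 documents nearly all attention mass stays on the relevant keys; with 16 documents a measurably larger share leaks to distractors, and real models degrade far more sharply than this idealized toy.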

In short, previous language models handle short documents well but struggle with long ones, making it hard to process lengthy inputs and surface the right information.<\/p>\n\n\n\n

How FoT Overcomes Previous Limitations<\/h2>\n\n\n\n

FoT significantly improves how language models handle long contexts. It can easily process well beyond 2,000 tokens and attend over full lengthy passages to produce accurate results. The key new concept is a memory attention layer, which lets the model pull in tokens from a large external context.<\/p>\n\n\n\n
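A minimal sketch of what such a memory attention layer could look like (an illustrative NumPy simplification; the function names and top-k inner-product retrieval here are assumptions, not the authors' implementation):

```python
import numpy as np

def memory_attention(query, local_k, local_v, mem_k, mem_v, top_k=4):
    """Attend over the local context plus the top_k (key, value) pairs
    retrieved from an external memory cache by inner-product kNN."""
    d = query.shape[-1]
    idx = np.argsort(mem_k @ query)[-top_k:]    # kNN lookup into the memory cache
    keys = np.vstack([local_k, mem_k[idx]])     # local + retrieved keys
    values = np.vstack([local_v, mem_v[idx]])
    scores = keys @ query / np.sqrt(d)          # scaled dot-product attention
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values                           # weighted mix of values

rng = np.random.default_rng(0)
d = 8
out = memory_attention(rng.normal(size=d),
                       rng.normal(size=(4, d)), rng.normal(size=(4, d)),
                       rng.normal(size=(1000, d)), rng.normal(size=(1000, d)))
```

The point of the design is that the memory can hold far more tokens than the local window, while the attention itself only ever sees a small retrieved subset.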


<\/strong>LongLLaMA-3B<\/strong><\/a><\/td>
LongLLaMA-7B
(coming soon)<\/em><\/strong><\/td><\/tr>
Source model<\/strong><\/td>OpenLLaMA-3B<\/a><\/td><\/tr>
Source model tokens<\/td>1T<\/td><\/tr>
Fine-tuning tokens<\/td>10B<\/td><\/tr>
Memory layers<\/td>6, 12, 18<\/td><\/tr><\/tbody><\/table>
Models<\/figcaption><\/figure>\n\n\n\n

FoT addresses the distraction issue that arises when attention holds tokens from multiple sources: it filters tokens and learns to tell relevant keys apart from distractors, so it returns well-suited, accurate results even across multi-source documents. This is achieved through cross-batch training, in which the model sees keys and values from both the current document and unrelated ones and learns to distinguish them. As a result, FoT handles distraction well, extrapolates to much longer contexts, and performs strongly on a wide range of multi-source documents.<\/p>\n\n\n
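Conceptually, cross-batch training can be pictured as a contrastive objective: keys from a query's own document are positives, and keys imported from other documents in the batch are negatives. A toy InfoNCE-style version (an illustrative simplification, not the paper's exact loss):

```python
import numpy as np

def crossbatch_loss(queries, keys, doc_ids):
    """Each query should score keys from its own document higher than
    keys mixed in from other documents (cross-entropy over all keys)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                    # (n_q, n_k)
    logp = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    positives = doc_ids[:, None] == doc_ids[None, :]          # same-document mask
    return -(logp * positives).sum() / positives.sum()

rng = np.random.default_rng(1)
ids = np.array([0, 0, 1, 1, 2, 2])            # three documents, two tokens each
base = rng.normal(size=(3, 8))                # one embedding per document
keys = base[ids] + 0.1 * rng.normal(size=(6, 8))
queries = base[ids] + 0.1 * rng.normal(size=(6, 8))

aligned = crossbatch_loss(queries, keys, ids)   # same-doc structure present
shuffled = crossbatch_loss(rng.normal(size=(6, 8)),
                           rng.normal(size=(6, 8)), ids)  # no structure
```

When queries and same-document keys are correlated, the loss is lower than for unstructured embeddings, which is the training signal that teaches the model to ignore distractor keys.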

\n
\"The<\/figure><\/div>\n\n\n

Outstanding Impact on the Real World<\/h2>\n\n\n\n

Its implementation makes a real difference in practice. FoT helps analyze large, comprehensive documents and supports information extraction and question answering, assisting specialists such as content creators and information-retrieval practitioners in getting better results.<\/p>\n\n\n\n

It also enhances chatbots by enabling more engaging and relevant responses, helps generate genuinely summarized storytelling content from only a few inputs, and adapts itself to users' needs. These advanced capabilities make FoT's progress fast and reliable.<\/p>\n\n\n

\n
\"OPEN
OpenLLaMA <\/em>model versions<\/figcaption><\/figure><\/div>\n\n\n

Research Paper and Code<\/h2>\n\n\n\n

The research paper is available on arxiv.org<\/a> and paperswithcode.com<\/a>. To view the source code, head to the GitHub repo<\/a>. The dataset is also publicly available on paperswithcode.com<\/a>. For a better understanding, you can run the code online on Google Colab<\/a>, where the whole notebook is already set up; just run the cells and view the results. <\/p>\n\n\n\n

The researchers list all the main tasks on this webpage<\/a>. For the PyTorch (Python framework) checkpoints, head to Hugging Face<\/a> and explore the Transformers library. Moreover, all models are present on GitHub<\/a>, where you can check each model's implementation details, including FoT. The training source code and dataset<\/a> are open to everyone as well. <\/p>\n\n\n\n
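As a sketch, loading a released checkpoint through the Transformers library might look like this (the checkpoint name `syzymon/long_llama_3b` and the `trust_remote_code` requirement are assumptions; verify both on the Hugging Face model card before use):

```python
def load_longllama(checkpoint="syzymon/long_llama_3b"):
    """Load a LongLLaMA checkpoint via Hugging Face Transformers.
    The checkpoint name and trust_remote_code flag are assumptions;
    check the model card first (this downloads several GB of weights)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.float32,
        trust_remote_code=True,  # the memory-attention code ships with the repo
    )
    return tokenizer, model

# Usage (not run here, requires downloading the weights):
# tokenizer, model = load_longllama()
# ids = tokenizer("A very long context...", return_tensors="pt").input_ids
# logits = model(input_ids=ids).logits
```

The Colab notebook linked above walks through the same steps with the full environment already prepared.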

Everything is open source and available to the public, inviting feedback from real-world use. <\/p>\n\n\n\n

Potential Applications<\/h2>\n\n\n\n

FoT applications include:<\/p>\n\n\n\n