
🏅🏅🏅 What is trending in AI research: JPMorgan AI Research Introduces DocLLM, Meet TinyLlama: An Open-Source Small-Scale Language Model, and many others

This newsletter brings you AI research news that is more technical than most resources, but still digestible and applicable.

Hi there, 

Here are this week's top AI/ML research briefs.

JPMorgan AI Research Introduces DocLLM 🏅
🤔 How can we efficiently understand complex enterprise documents like forms, invoices, and contracts, which blend text with intricate spatial layouts? 📄✨ Meet DocLLM! 🚀 This paper from JPMorgan introduces DocLLM, a lightweight extension to traditional large language models (LLMs). Unlike its multimodal cousins, DocLLM ditches bulky image encoders and instead relies on bounding-box information to capture the spatial structure of documents. 🧩📏 It uses disentangled attention matrices in the transformer to relate text and layout, plus a pre-training objective that infills missing text spans. 🧠💡 This combination handles the messy layouts and varied content of visual documents with flair. After fine-tuning on a hefty instruction dataset, DocLLM flexes its muscles, outperforming state-of-the-art LLMs on 14 of 16 datasets across key document tasks, and it also wows with its adaptability to new, unseen datasets! 🏆📈🔍
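
To make the "disentangled" part concrete, here is a rough, unofficial sketch (not JPMorgan's code) of the general idea: text features and bounding-box features each get their own query/key projections, and their cross terms are added to the attention logits. Class names, dimensions, and the uniform weighting of the terms are our own illustrative assumptions.

```python
# Unofficial sketch of DocLLM-style disentangled spatial attention.
# Text and bounding-box (layout) embeddings get separate projections,
# and the text/spatial cross terms are summed into the attention logits.
import torch
import torch.nn as nn

class DisentangledSpatialAttention(nn.Module):
    def __init__(self, d_model: int, d_spatial: int):
        super().__init__()
        # Standard text projections
        self.q_t = nn.Linear(d_model, d_model)
        self.k_t = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Separate projections for the bounding-box embedding
        self.q_s = nn.Linear(d_spatial, d_model)
        self.k_s = nn.Linear(d_spatial, d_model)

    def forward(self, text_emb, box_emb):
        # text_emb: (batch, seq, d_model), box_emb: (batch, seq, d_spatial)
        qt, kt = self.q_t(text_emb), self.k_t(text_emb)
        qs, ks = self.q_s(box_emb), self.k_s(box_emb)
        scale = text_emb.size(-1) ** 0.5
        # Text-to-text term plus the disentangled text/spatial cross terms
        logits = (qt @ kt.transpose(-2, -1)
                  + qt @ ks.transpose(-2, -1)
                  + qs @ kt.transpose(-2, -1)
                  + qs @ ks.transpose(-2, -1)) / scale
        attn = logits.softmax(dim=-1)
        return attn @ self.v(text_emb)
```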

Meet TinyLlama: An Open-Source Small-Scale Language Model that Pretrains a 1.1B Llama Model on 3 Trillion Tokens 🏅
How can we achieve high performance in language models without the massive computational overhead? Meet TinyLlama, the new compact sensation in the world of AI! 🌟 This pint-sized powerhouse, a 1.1B language model, is like a mini-genius trained on about 1 trillion tokens for roughly 3 epochs. Drawing inspiration from its big sibling, Llama 2, TinyLlama incorporates cool advances from the open-source community, like FlashAttention, making it a speed demon in computational efficiency. 🚀 Despite its "tiny" stature, it packs a punch, dazzling in various downstream tasks and outshining its peers in the same weight class. TinyLlama is a testament to how size isn't everything; it's how you use it that counts! 🧠💪
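
If you want to poke at it yourself, a minimal way to load a TinyLlama checkpoint with Hugging Face transformers looks roughly like the sketch below. The exact checkpoint name is our assumption here; swap in whichever released TinyLlama variant you actually want to try.

```python
# Quick local test of a TinyLlama checkpoint via Hugging Face transformers.
# The model name below refers to a published ~1.1B chat variant; replace it
# with whichever TinyLlama checkpoint you prefer.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)

prompt = "Explain in one sentence why small language models are useful:"
out = generator(prompt, max_new_tokens=60, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```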

New MIT Research Announces a Vision Check-Up for Language Models 🏅
How can large language models (LLMs) be taught about the visual world through modeling relationships between strings? This paper from MIT explores LLMs' capabilities in generating and recognizing a variety of visual concepts, presenting a novel visual representation learning system that leverages text models. Since LLMs cannot inherently process or produce visual information as pixels, the research uses code as a medium for image representation. The study reveals that, although images generated by LLMs diverge from natural images, the models' ability to create and refine these images indicates significant potential for LLMs to learn about the visual world through string modeling. Moreover, the paper highlights exciting possibilities in self-supervised visual representation learning, where vision models trained solely on LLM-generated images can semantically interpret natural images, marking a groundbreaking step in AI research. 🌟🔍🖼️
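
As a toy illustration of the "code as image representation" idea (not the paper's actual pipeline), imagine the LLM is asked to draw a concept and replies with plotting code, which is then rendered into a raster image. The drawing_code string below is hand-written to stand in for such a model response; everything else is just standard matplotlib.

```python
# Illustrative only: the core idea is that an LLM can describe visual
# concepts by emitting drawing code rather than pixels. Here a stand-in
# "LLM response" (drawing_code) is rendered with matplotlib.
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Pretend this string was generated by an LLM asked to "draw a snowman"
drawing_code = """
fig, ax = plt.subplots(figsize=(3, 5))
for y, r in [(1.0, 1.0), (2.4, 0.7), (3.4, 0.45)]:   # body, torso, head
    ax.add_patch(patches.Circle((0, y), r, fill=False))
ax.set_xlim(-2, 2); ax.set_ylim(-0.5, 4.5)
ax.set_aspect("equal"); ax.axis("off")
"""

exec(drawing_code)          # render the "image" the LLM described as code
plt.savefig("snowman.png")  # the raster image used to probe visual knowledge
```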

Colossal-AI Team Open-Sources SwiftInfer: A TensorRT-Based Implementation of the StreamingLLM Algorithm 🏅
How can we enhance the efficiency of Large Language Models (LLMs) in handling multi-round conversations? Meet SwiftInfer, developed by researchers at Colossal-AI! SwiftInfer is a groundbreaking TensorRT-based implementation of the StreamingLLM algorithm. StreamingLLM already made waves by handling a whopping 4 million tokens across multi-round conversations without losing speed or quality, achieving a 22.2x speedup over traditional methods. SwiftInfer takes this up a notch, boosting inference performance by an additional 46% 🚀!

This solution is crafted by integrating StreamingLLM's methodology with TensorRT's inference optimization. The result? A seamless blend that retains all the perks of StreamingLLM while skyrocketing inference efficiency. Using TensorRT-LLM's API, the team re-implemented key components like the KV Cache mechanism and the attention module with position shift, ensuring that as conversations grow, the model stays sharp and responsive. In essence, SwiftInfer cleverly manages token windows so that the model is always focused on the most relevant parts of the conversation, as the rough sketch below illustrates. With SwiftInfer, we're looking at a new era of LLMs: faster, more cost-effective, and ready for real-world multi-round conversations! 🌐🔥
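
Here is a highly simplified, unofficial sketch of the StreamingLLM-style cache policy that SwiftInfer builds on: keep a handful of initial "attention sink" tokens, keep a rolling window of recent tokens, and renumber positions inside the window. The class name, parameter values, and data layout are our own illustrative assumptions; the real TensorRT-LLM implementation is far more involved.

```python
# Unofficial sketch of a StreamingLLM-style KV cache policy:
# a few initial "attention sink" entries are never evicted, a rolling
# window keeps only the most recent entries, and positions are
# re-numbered contiguously (the "position shift" trick).
from collections import deque

class StreamingKVCache:
    def __init__(self, num_sink_tokens: int = 4, window_size: int = 1020):
        self.sinks = []                           # first few tokens, never evicted
        self.window = deque(maxlen=window_size)   # most recent tokens only
        self.num_sink_tokens = num_sink_tokens

    def append(self, kv_entry):
        """Add the key/value entry for one newly generated token."""
        if len(self.sinks) < self.num_sink_tokens:
            self.sinks.append(kv_entry)
        else:
            self.window.append(kv_entry)          # old entries fall off the left

    def current(self):
        """Entries the model attends to, with positions re-numbered
        contiguously so the window always looks like a short context."""
        entries = self.sinks + list(self.window)
        return [(pos, kv) for pos, kv in enumerate(entries)]
```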

🐝 [Partnership and Promotion on Marktechpost] You can now partner with Marktechpost to promote your research paper, GitHub repo, and even add your pro commentary to any trending research article on marktechpost.com. Elevate your and your company's AI research visibility in the tech community... Learn more


Other Trending Papers 🏅🏅🏅

  • Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation [Paper]

  • TOFU: A Task of Fictitious Unlearning for LLMs [Paper]

  • Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models [Paper]

  • The Impact of Reasoning Step Length on Large Language Models [Paper]

  • DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models [Paper]

  • TrustLLM: Trustworthiness in Large Language Models [Paper]

  • Your Research Paper or Github Repo here??? Learn