
🏅🏅🏅 What is trending in AI research: LLMWare Launches SLIMs + Google DeepMind Unveils MusicRL + Can Large Language Models be Trusted for Evaluation? and many more...

This newsletter brings AI research news that is much more technical than most resources but still digestible and applicable.

Hi there, 

I hope you all are doing well!

Here are this week's top AI/ML research briefs.

LLMWare Launches SLIMs: Small Specialized Function-Calling Models for Multi-Step Automation 🏅
As enterprises look to deploy LLMs in more complex production use cases beyond simple knowledge assistants, there is a growing recognition of three interconnected needs:  

  • Agents – complex workflows involve multiple steps and require the orchestration of multiple LLM calls;

  • Function Calls – models need to be able to generate structured output that can be handled programmatically, including key tasks such as classification and clustering, which often serve as the connective tissue in such workflows; and 

  • Private Cloud – models and data pipelines need to be fine-tuned and tightly integrated with sensitive, existing enterprise processes and data stores.  

LLMWare is setting out to address all three of these challenges with the launch of SLIMs (Structured Language Instruction Models), a family of 1B-parameter small language models, together with a new set of capabilities in the LLMWare library for executing multi-model, multi-step agent workflows in a private cloud.
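The brief describes SLIMs only at a high level. As a minimal sketch of the general pattern it names (small models that return structured output which ordinary code parses and routes on), the example below chains two hypothetical stand-in functions, `sentiment_model` and `topic_model`, that fake JSON output with keyword rules. It assumes JSON as the output format and does not use or reflect the LLMWare library's actual API.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-ins for small function-calling models. A real model would
# generate the JSON text; here simple keyword rules fake that output.
def sentiment_model(text: str) -> str:
    negative_words = ("refund", "broken", "late")
    label = "negative" if any(w in text.lower() for w in negative_words) else "positive"
    return json.dumps({"sentiment": [label]})

def topic_model(text: str) -> str:
    topic = "billing" if "refund" in text.lower() else "general"
    return json.dumps({"topic": [topic]})

def call_structured(model: Callable[[str], str], text: str) -> Dict[str, List[str]]:
    """Parse a model's output as JSON so later steps can branch on it."""
    output = json.loads(model(text))
    if not isinstance(output, dict):
        raise ValueError("expected a JSON object")
    return output

def run_agent(text: str) -> Dict[str, object]:
    """Chain several structured calls, then route on their combined result."""
    report: Dict[str, object] = {}
    for model in (sentiment_model, topic_model):
        report.update(call_structured(model, text))
    # Illustrative routing rule: escalate negative billing messages.
    escalate = report["sentiment"] == ["negative"] and report["topic"] == ["billing"]
    report["action"] = "escalate_to_billing" if escalate else "auto_reply"
    return report

if __name__ == "__main__":
    print(run_agent("My order arrived broken and I want a refund."))
```

The point of the structured output is that the orchestration step is plain code: each model call yields a dictionary, so routing needs no further text parsing.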

Google DeepMind Unveils MusicRL: A Pretrained Autoregressive MusicLM Model of Discrete Audio Tokens Finetuned with Reinforcement Learning to Maximise Sequence-Level Rewards 🏅
How can music generation systems be fine-tuned to align with the diverse and subjective preferences of users? To address this challenge, DeepMind researchers propose MusicRL, a framework that refines a pretrained MusicLM model using human feedback. First, they fine-tune MusicLM with reinforcement learning (RL) to optimize rewards for caption adherence and audio quality, derived from evaluations by selected raters. They then incorporate large-scale user feedback, collecting 300,000 pairwise preferences to guide training, marking a pioneering step in adapting text-to-music models to general user preferences through Reinforcement Learning from Human Feedback (RLHF). The combined model, MusicRL-RU, which integrates both rater-informed and user-generated feedback, outperforms the baseline in human evaluations. 🎵🤖
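The brief does not say how the 300,000 pairwise preferences become a reward. A standard RLHF approach, assumed here rather than taken from the paper, is to fit a Bradley-Terry model, where the probability that clip w is preferred over clip l is sigmoid(r_w - r_l). The toy sketch below learns one scalar reward per hypothetical clip by gradient ascent on that log-likelihood; a real reward model would instead be a neural network scoring audio and captions.

```python
import math
from typing import List, Tuple

def train_bradley_terry(n_items: int, prefs: List[Tuple[int, int]],
                        lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit one scalar reward per item from (winner, loser) preference pairs.

    Bradley-Terry: P(winner preferred over loser) = sigmoid(r_w - r_l).
    The log-likelihood of the observed pairs is maximised by gradient ascent.
    """
    r = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for w, l in prefs:
            p = 1.0 / (1.0 + math.exp(-(r[w] - r[l])))
            # d/dr_w of log sigmoid(r_w - r_l) is (1 - p); d/dr_l is -(1 - p).
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    return r

if __name__ == "__main__":
    # Hypothetical (winner, loser) pairs over three clips, with a few reversals
    # so the maximum-likelihood rewards stay finite.
    prefs = ([(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)]
             + [(0, 2)] * 3 + [(2, 0)])
    print(train_bradley_terry(3, prefs))
```

In RLHF, a learned reward like this is then used as the sequence-level reward that the RL fine-tuning step maximises.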

Can Large Language Models be Trusted for Evaluation? Meet SCALEEVAL: An Agent-Debate-Assisted Meta-Evaluation Framework that Leverages the Capabilities of Multiple Communicative LLM Agents 🏅
How can we reliably evaluate Large Language Models (LLMs) across diverse tasks and scenarios, given the challenges of existing benchmarks and the need for extensive human annotation? 🤔 Meet ScaleEval, a novel, scalable agent-debate-assisted meta-evaluation framework. Leveraging multiple communicative LLM agents, ScaleEval facilitates multi-round discussions, aiding human annotators in identifying the most capable LLM evaluators with significantly reduced annotation effort. This innovative approach not only enhances the efficiency of meta-evaluation but also illuminates the reliability, capabilities, and limitations of LLMs as evaluators under various conditions. By making ScaleEval open-source, the researchers aim to spur further advancements in this domain, paving the way for the development of more sophisticated and dependable LLM evaluators. 🚀
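To make the agent-debate idea concrete, here is a minimal sketch under a simplified protocol that is an assumption, not ScaleEval's published implementation: agents give verdicts ("A" or "B", i.e. which of two responses is better) over several rounds while seeing the earlier verdicts, the consensus is accepted if they agree, and a human is asked only when they do not. The agent types `stubborn` and `follower` are hypothetical stand-ins for LLM agents.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

# Hypothetical agent interface: given the question and the transcript of all
# earlier verdicts, return "A" or "B" (which of two responses is better).
Agent = Callable[[str, List[str]], str]

def stubborn(answer: str) -> Agent:
    """An agent that never changes its verdict."""
    return lambda question, transcript: answer

def follower(initial: str) -> Agent:
    """An agent that starts with its own view, then adopts the majority so far."""
    def agent(question: str, transcript: List[str]) -> str:
        if not transcript:
            return initial
        return Counter(transcript).most_common(1)[0][0]
    return agent

def debate(question: str, agents: Sequence[Agent], max_rounds: int = 3) -> Optional[str]:
    """Run up to max_rounds of discussion; return the consensus verdict or None."""
    transcript: List[str] = []
    for _ in range(max_rounds):
        # Every agent in a round sees the same history from earlier rounds.
        verdicts = [agent(question, transcript) for agent in agents]
        if len(set(verdicts)) == 1:
            return verdicts[0]
        transcript.extend(verdicts)
    return None

def meta_evaluate(question: str, agents: Sequence[Agent],
                  ask_human: Callable[[str], str]) -> str:
    """Use the agents' consensus when there is one; otherwise defer to a human."""
    verdict = debate(question, agents)
    return verdict if verdict is not None else ask_human(question)

if __name__ == "__main__":
    panel = [stubborn("A"), stubborn("A"), follower("B")]
    print(meta_evaluate("Which response is more helpful?", panel, lambda q: "human"))
```

The saving in annotation effort comes from the last function: humans are consulted only for the cases the agents cannot settle.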

Stanford Researchers Introduce RAPTOR: A Novel Tree-based Retrieval System that Augments the Parametric Knowledge of LLMs with Contextual Information 🏅
Current retrieval-augmented language models primarily fetch short, contiguous text snippets, which limits their ability to grasp the overall context of a document. To address this, the paper introduces RAPTOR, which recursively embeds, clusters, and summarizes text chunks to build a hierarchical tree whose levels hold progressively more abstract summaries. This structure lets RAPTOR draw on information across long documents at different levels of abstraction during inference. Experiments show that retrieval over these recursive summaries significantly outperforms traditional retrieval-augmented language models on multiple tasks. Notably, RAPTOR combined with GPT-4 improved the best reported performance on the QuALITY benchmark, a question-answering task requiring complex reasoning, by 20% in absolute accuracy, setting a new state of the art. 🌳🔍
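The tree-building loop can be illustrated with a toy sketch. Every component here is a labeled stand-in: bag-of-words vectors replace a neural encoder, a greedy similarity threshold replaces the paper's Gaussian-mixture clustering, and `summarize` (hypothetical) just keeps the first words of each text instead of calling an LLM. Retrieval scores all nodes from every level together, so a summary node can win over individual leaf chunks.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    text: str
    level: int  # 0 = original chunk, higher = more abstract summary
    children: List["Node"] = field(default_factory=list)

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase words (stand-in for a neural encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(nodes: List[Node], threshold: float) -> List[List[Node]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: List[List[Node]] = []
    for node in nodes:
        for group in clusters:
            if cosine(embed(node.text), embed(group[0].text)) >= threshold:
                group.append(node)
                break
        else:
            clusters.append([node])
    return clusters

def summarize(texts: List[str]) -> str:
    """Hypothetical placeholder for an LLM summary: first 8 words of each text."""
    return " ".join(" ".join(t.split()[:8]) for t in texts)

def build_tree(chunks: List[str], threshold: float = 0.2, max_levels: int = 3) -> List[Node]:
    """Recursively cluster and summarize; return every node from all levels."""
    layer = [Node(c, 0) for c in chunks]
    all_nodes = list(layer)
    for level in range(1, max_levels + 1):
        groups = cluster(layer, threshold)
        if len(groups) == len(layer):  # nothing merged, so stop recursing
            break
        layer = [Node(summarize([n.text for n in g]), level, g) for g in groups]
        all_nodes.extend(layer)
        if len(layer) == 1:  # reached a single root
            break
    return all_nodes

def retrieve(all_nodes: List[Node], query: str, k: int = 2) -> List[Node]:
    """Score leaves and summaries together and return the top-k nodes."""
    q = embed(query)
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]

if __name__ == "__main__":
    chunks = ["the cat sat on the mat", "a cat chased the mouse",
              "stock prices rose sharply today",
              "investors bought stock after prices fell"]
    for node in retrieve(build_tree(chunks), "why did stock prices rise"):
        print(node.level, node.text)
```

With these four chunks, the two pet sentences and the two market sentences each merge into one level-1 summary, and the market summary ranks above either market chunk for the query because it combines both.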

Other Trending Papers 🏅🏅🏅

  • More Agents Is All You Need [Paper]

  • SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models [Paper]

  • Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning [Paper]

  • InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [Paper]
