🚀 AI News: Unpacking Transformer LLMs, Introducing PassGPT & CodeTF, and Delving into Multilingual Models in ASR....

This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

How to Keep Scaling Large Language Models when Data Runs Out? New AI research from Hugging Face, Harvard University, and the University of Turku trains 400 models, with up to 9 billion parameters and 900 billion training tokens, to extend the Chinchilla scaling laws to repeated data. The researchers vary the amount of data repetition and the compute budget across these training runs. The results show that, when data is constrained and the compute budget is fixed, training for up to about 4 epochs on repeated data changes the loss only marginally compared to training on unique data. Beyond that, however, the value of adding more compute decays to zero as the amount of repetition grows.
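The practical takeaway is a diminishing-returns curve: the first few repetitions of the same data are worth almost as much as fresh tokens, but the marginal value of extra epochs decays quickly. The sketch below illustrates that shape with a simple exponential-saturation model; the functional form echoes the kind of fit used in data-constrained scaling work, but the `effective_tokens` helper and the `r_star` constant are illustrative assumptions, not the paper's fitted law.

```python
import numpy as np

def effective_tokens(unique_tokens: float, epochs: float, r_star: float = 15.0) -> float:
    """Illustrative diminishing-returns model for repeated data.

    unique_tokens: number of distinct training tokens available
    epochs:        how many times that data is seen (1 = no repetition)
    r_star:        decay constant controlling how fast extra epochs lose value
                   (a hypothetical value chosen purely for illustration)
    """
    repeats = max(epochs - 1, 0)  # repetitions beyond the first pass
    # Each additional epoch contributes exponentially less "new" signal.
    return unique_tokens * (1 + r_star * (1 - np.exp(-repeats / r_star)))

# Up to ~4 epochs, repeated tokens are worth nearly as much as unique ones;
# far beyond that, the marginal value of extra compute approaches zero.
for epochs in [1, 2, 4, 16, 64]:
    seen = 100 * epochs
    useful = effective_tokens(100e9, epochs) / 1e9
    print(f"{epochs:>2} epochs: {seen:>5}B tokens seen, ~{useful:6.1f}B effective")
```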

Fine-Grained RLHF: a framework for training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated, and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). The research team conducted extensive experiments on two text generation tasks and showed, with both automatic and human evaluation, that Fine-Grained RLHF outperforms RLHF with holistic rewards. They also show that large language models can be customized for specific needs by using different combinations of fine-grained reward models.
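To make the two ideas concrete, the sketch below combines several segment-level reward models into one dense reward sequence; the `fine_grained_reward` helper and the toy reward functions are hypothetical stand-ins, not the paper's learned reward models or training loop.

```python
from typing import Callable, List

# A reward model scores one (prompt, segment) pair for one feedback type,
# e.g. factuality or relevance. These callables are hypothetical stand-ins
# for the learned reward models described in the paper.
RewardModel = Callable[[str, str], float]

def fine_grained_reward(prompt: str,
                        segments: List[str],
                        reward_models: List[RewardModel],
                        weights: List[float]) -> List[float]:
    """Return one combined reward per generated segment.

    Instead of a single holistic score for the whole output, every segment
    (e.g. a sentence) receives a weighted sum of rewards from several models,
    giving the policy a dense signal it can attribute to specific text spans.
    """
    return [sum(w * rm(prompt, seg) for rm, w in zip(reward_models, weights))
            for seg in segments]  # fed to a PPO-style update as dense rewards

# Toy usage with stand-in reward functions:
def relevance(prompt, seg):   # crude lexical-overlap proxy for relevance
    return 1.0 if any(tok in seg.lower() for tok in prompt.lower().split()) else -1.0

def brevity(prompt, seg):     # mild penalty for long segments
    return -0.01 * len(seg.split())

print(fine_grained_reward(
    "why is the sky blue",
    ["The sky is blue because of Rayleigh scattering.", "Bananas are yellow."],
    [relevance, brevity],
    [1.0, 0.5],
))
```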

Researchers from ETH Zürich and HKUST propose HQ-SAM, which equips SAM with the ability to accurately segment any object while preserving SAM’s original promptable design, efficiency, and zero-shot generalizability. HQ-SAM significantly boosts the mask prediction quality of SAM, a model trained on 1.1 billion masks. Zero-shot transfer is evaluated on 7 segmentation benchmarks across both image and video tasks, spanning diverse objects and scenes.
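"Promptable" here means the model takes points, boxes, or masks as prompts at inference time. The sketch below shows that workflow with the public `segment_anything` predictor interface, which the text says HQ-SAM preserves; the checkpoint file, image path, and point coordinates are placeholders, and HQ-SAM's own package is assumed to mirror this API rather than shown verbatim.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor  # original SAM package

# Placeholder checkpoint/model key: HQ-SAM ships its own weights, but according
# to the paper it keeps SAM's promptable interface, so usage looks like this.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground point; the model returns candidate masks
# plus quality scores, with no task-specific fine-tuning (zero-shot).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),   # 1 = foreground point
    multimask_output=True,
)
print(masks.shape, scores)
```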

This AI Research Dives Into The Limitations and Capabilities of Transformer Large Language Models (LLMs), Empirically and Theoretically, on Compositional Tasks. In a recently released research paper, a team of researchers highlights the contrast between the impressive performance of LLMs like GPT on complex tasks and their struggles with seemingly simple ones. Diving into the limitations and capabilities of Transformer LLMs, the team conducted experiments on three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. Each task requires breaking a problem into smaller steps and combining those steps to produce an exact solution.
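Multi-digit multiplication is the most concrete of the three tasks: it has an exact solution built by composing small, well-defined sub-steps (partial products shifted by place value, then summed). The helper below makes that decomposition explicit; the function and step format are only illustrative of the task structure, not the paper's prompting setup.

```python
def long_multiplication_steps(a: int, b: int):
    """Decompose a * b into the sub-steps of schoolbook long multiplication:
    one partial product per digit of b, shifted by its place value, then a
    final sum. Getting the end result right requires every sub-step and their
    composition to be right -- the property the paper probes."""
    steps, total = [], 0
    for place, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char) * (10 ** place)
        steps.append(f"{a} * {digit_char} * 10^{place} = {partial}")
        total += partial
    steps.append(f"sum of partial products = {total}")
    return steps, total

steps, result = long_multiplication_steps(123, 456)
print("\n".join(steps))
assert result == 123 * 456  # 56088
```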

Meet PassGPT, an LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GANs) by guessing twice as many previously unseen passwords. The research team also introduces guided password generation, leveraging PassGPT’s sampling procedure to generate passwords that match arbitrary constraints, something current GAN-based strategies cannot do. Overall, the team introduces two autoregressive architectures that model the conditional distribution of each character given the previous ones: PassGPT and PassVQT. PassGPT is generally preferable because it exposes an explicit probability distribution, is simpler, and generates passwords faster. PassVQT, however, can still be useful when the goal is more variability and more complicated passwords that remain close to the training distribution.
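Guided generation can be read as constrained autoregressive sampling: at each position, next-character probabilities are renormalized over only the characters a user-supplied template allows. The sketch below shows that mechanism with a dummy model; the vocabulary, template syntax, and `guided_sample` helper are hypothetical illustrations, not PassGPT's actual code.

```python
import torch
import torch.nn.functional as F

VOCAB = list("abcdefghijklmnopqrstuvwxyz0123456789")
LOWER = set("abcdefghijklmnopqrstuvwxyz")
DIGIT = set("0123456789")

def allowed(template_char: str) -> set:
    # 'l' = lowercase letter, 'd' = digit, anything else = unconstrained
    return LOWER if template_char == "l" else DIGIT if template_char == "d" else set(VOCAB)

def guided_sample(next_char_logits, template: str) -> str:
    """Sample one password matching `template`, masking disallowed characters
    at every step before renormalizing and sampling."""
    generated = []
    for t_char in template:
        logits = next_char_logits(generated)            # (len(VOCAB),) logits from the LM
        mask = torch.tensor([0.0 if c in allowed(t_char) else float("-inf")
                             for c in VOCAB])
        probs = F.softmax(logits + mask, dim=-1)        # zero probability outside the constraint
        generated.append(VOCAB[torch.multinomial(probs, 1).item()])
    return "".join(generated)

# Dummy stand-in for the language model: uniform logits regardless of context.
dummy_lm = lambda prefix: torch.zeros(len(VOCAB))
print(guided_sample(dummy_lm, "lllldd"))  # e.g. four lowercase letters then two digits
```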

The Salesforce AI research team introduces CodeTF, an open-source transformer-based library for state-of-the-art Code LLMs and code intelligence. CodeTF supports both the development and the deployment of Code LLMs: it covers training and serving code LLMs, utilities for processing code data, and popular research benchmarks for evaluating model performance. CodeTF is designed around a modular architecture that keeps the platform user-friendly and extensible, allowing seamless integration of additional programming languages, models, and utilities.
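CodeTF wraps models such as CodeT5 behind a single interface. Rather than guess CodeTF's own API, the sketch below calls an underlying Hugging Face checkpoint directly for code summarization; the checkpoint name is an assumption, so substitute whichever code model you actually have available.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed code-summarization checkpoint; CodeTF provides higher-level
# pipelines around models of this family.
checkpoint = "Salesforce/codet5-base-multi-sum"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

code = """def binary_search(arr, x):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == x:
            return mid
        lo, hi = (mid + 1, hi) if arr[mid] < x else (lo, mid - 1)
    return -1
"""

inputs = tokenizer(code, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```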

How can multilingual models handle increasingly diverse audio as they scale to more languages? Encoder-decoder multilingual models commonly perform both language identification (LID) and ASR. The decoder uses the LID prediction as a condition for ASR, but what about the encoder? CMU researchers propose a CTC-based multi-task approach in which encoder layers also predict the LID. By predicting an LID for each token in an utterance, the model learns to align the LID with the audio. ASR also becomes much easier if the language to transcribe is already known, so the researchers force the model to use the LID as a dependency for ASR by conditioning encoder layers on intermediate LID predictions.
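A minimal way to picture the approach, under stated assumptions: an auxiliary LID head sits after an intermediate block of encoder layers, and its soft predictions are projected back into the hidden states so the remaining layers are conditioned on them, in the spirit of self-conditioned/intermediate CTC. The layer sizes, the two-block split, and the plain TransformerEncoder below are illustrative choices, not the CMU recipe.

```python
import torch
import torch.nn as nn

class LIDConditionedEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=6, n_langs=10, vocab=500):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lower = nn.TransformerEncoder(layer(), n_layers // 2)
        self.upper = nn.TransformerEncoder(layer(), n_layers - n_layers // 2)
        self.lid_head = nn.Linear(d_model, n_langs)    # per-frame LID logits (auxiliary CTC-style task)
        self.lid_embed = nn.Linear(n_langs, d_model)   # project soft LID back into the encoder
        self.asr_head = nn.Linear(d_model, vocab)      # per-frame ASR logits (CTC)

    def forward(self, feats):                          # feats: (batch, time, d_model)
        h = self.lower(feats)
        lid_logits = self.lid_head(h)                  # auxiliary LID loss attaches here
        h = h + self.lid_embed(lid_logits.softmax(-1)) # condition later layers on intermediate LID
        h = self.upper(h)
        return self.asr_head(h), lid_logits

model = LIDConditionedEncoder()
asr_logits, lid_logits = model(torch.randn(2, 120, 256))
print(asr_logits.shape, lid_logits.shape)              # (2, 120, 500) (2, 120, 10)
```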

A Banksy got everyday investors 32% returns?

Note: This section is supported and sponsored by Masterworks

Mm-hmm, sure. So, what’s the catch?

We know it may sound too good to be true. But thousands of investors are already smiling all the way to the bank, thanks to the fine-art investing platform Masterworks.

These results aren’t cherry-picked. This is the whole bushel. Masterworks has built a track record of 13 exits, including net returns of +10.4%, +27.3%, and +35.0%, even while financial markets plummeted.

But art? Really? Okay, skeptics, here are the numbers. Contemporary art prices:

  • outpaced the S&P 500 by 131% over the last 26 years

  • have the lowest correlation to equities of any asset class

  • remained stable through the dot-com bubble and ’08 crisis

Got your attention yet? Marktechpost readers can skip the waitlist with this exclusive link.

See important disclosures at masterworks.com/cd