↗️ AI/ML Research Updates: UC Berkeley Researchers Propose CRATE; Meta Research Introduces System 2 Attention (S2A); When To Use Whisper v2, Whisper v3, and Distilled Whisper?... and many more research trends

This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

Hey Folks!

This newsletter will discuss some cool AI research papers. Happy learning!

👉 What is Trending in AI/ML Research?

How can representation learning effectively compress data distributions, such as sets of tokens, into coherent low-dimensional spaces? This paper introduces an approach centered on transforming data distributions into mixtures of Gaussians supported on incoherent subspaces. Representations are evaluated with a 'sparse rate reduction' objective that trades off information gain against representation sparsity. The authors reinterpret popular deep learning architectures, such as transformers, as iterative optimizers of this measure: each transformer block is reframed as an alternating optimization step, with multi-head self-attention performing compression and the multi-layer perceptron performing sparsification. This perspective leads to 'CRATE', a fully interpretable, transformer-like architecture that handles both encoding and decoding through a connection between denoising and compression. CRATE architectures demonstrate their efficacy on large-scale image and text datasets, closely matching the performance of established models such as ViT, MAE, DINO, BERT, and GPT-2. The work suggests a promising direction for harmonizing deep learning theory and practice through data compression principles.
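For intuition, here is a minimal PyTorch sketch of the alternating structure described above: a self-attention "compression" step followed by a soft-thresholding (ISTA-style) "sparsification" step in place of the usual MLP. The module names, the single learned dictionary, and the step sizes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrateStyleBlock(nn.Module):
    """One block of the alternating scheme sketched above: self-attention as
    a compression step, then a soft-thresholding (ISTA-like) step in place
    of the usual MLP. Names and details are illustrative, not the paper's code."""

    def __init__(self, dim, num_heads, sparsity_lambda=0.1, step_size=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # Learned dictionary D used in the sparsification step.
        self.dictionary = nn.Linear(dim, dim, bias=False)
        self.sparsity_lambda = sparsity_lambda
        self.step_size = step_size

    def forward(self, x):
        # Compression step: multi-head self-attention over normalized tokens.
        z = self.norm1(x)
        attn_out, _ = self.attn(z, z, z, need_weights=False)
        x = x + attn_out

        # Sparsification step: one proximal-gradient (ISTA-style) update on
        # ||z - D x||^2 + lambda * ||x||_1, i.e. a gradient step followed by
        # soft-thresholding, standing in for the usual MLP.
        z = self.norm2(x)
        residual = z - self.dictionary(x)                       # z - D x
        grad_step = x + self.step_size * (residual @ self.dictionary.weight)
        return F.softshrink(grad_step, lambd=self.sparsity_lambda * self.step_size)

# Usage: a small stack of blocks over a batch of token embeddings.
blocks = nn.Sequential(*[CrateStyleBlock(dim=64, num_heads=4) for _ in range(4)])
tokens = torch.randn(2, 16, 64)   # (batch, sequence length, embedding dim)
print(blocks(tokens).shape)       # torch.Size([2, 16, 64])
```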

How can NLP performance be improved without significantly increasing resource consumption? Addressing this, the paper surveys current methods in efficient NLP, acknowledging that scaling up model parameters and training data leads to unsustainable resource use, including data, time, storage, and energy. The focus is on synthesizing efficient NLP methods that achieve comparable results with fewer resources. The survey serves as a guide for conducting NLP with limited resources and suggests promising avenues for future research in developing more efficient NLP methodologies. This approach is crucial in a landscape where resources are limited and unevenly distributed.

How can Transformer-based Large Language Models (LLMs) avoid incorporating irrelevant information into their latent representations? This paper introduces "System 2 Attention (S2A)", a method that addresses this issue by exploiting the natural-language reasoning and instruction-following capabilities of LLMs to focus selectively on pertinent information. S2A first regenerates the input context to filter out irrelevant content, then attends over this refined context when generating the response. Experimentally, S2A surpasses standard attention-based LLMs on tasks laden with opinionated or irrelevant information, such as QA, math word problems, and long-form generation, notably improving factuality and objectivity while reducing undue flattery (sycophancy) in responses.
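As a rough illustration of the two-pass idea, here is a minimal sketch. The `llm` argument stands for any text-completion callable (prompt in, string out), and both prompt templates are illustrative assumptions, not the paper's exact prompts.

```python
def s2a_answer(llm, context: str, question: str) -> str:
    """Two-pass System 2 Attention sketch: rewrite the context, then answer."""
    # Pass 1: regenerate the context, keeping only information that is
    # relevant and objective with respect to the question.
    rewrite_prompt = (
        "Rewrite the following text, keeping only the parts that are "
        "relevant and objective for answering the question. "
        "Remove opinions, flattery, and unrelated details.\n\n"
        f"Text:\n{context}\n\nQuestion:\n{question}\n\nRewritten text:"
    )
    filtered_context = llm(rewrite_prompt)

    # Pass 2: answer the question using only the regenerated context.
    answer_prompt = (
        f"Context:\n{filtered_context}\n\n"
        f"Question:\n{question}\n\nAnswer:"
    )
    return llm(answer_prompt)
```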

This article discusses the Whisper models developed by OpenAI for speech recognition and transcription. These models, part of a series that began in late 2022, are based on a Transformer encoder-decoder architecture. They are trained on extensive datasets, making them capable of generalizing across many languages and domains without fine-tuning. The key variants are Whisper v2, Whisper v3, and Distilled Whisper; the guidance below summarizes when to use each, with a short loading sketch after the list.

  • Whisper v3 is recommended for known languages with reliable language identification.

  • Whisper v2 is more suitable for unknown languages or when v3's language identification is less reliable.

  • Whisper v3 Large is optimal for English audio and when memory or inference performance is not a constraint.

  • Distilled Whisper, being faster and smaller, is preferable when efficiency is crucial, especially for English audio. It performs almost as well as the larger, slower models, although it has some limitations when transcribing successive sentence segments, and it is occasionally better at punctuation insertion.
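For reference, here is a minimal sketch of loading two of these variants with the Hugging Face `transformers` pipeline. The checkpoint names are the public Hugging Face identifiers at the time of writing, and the audio file path is a placeholder.

```python
from transformers import pipeline

# Whisper v3 large: strongest quality when language identification is
# reliable and memory/latency are not a constraint.
whisper_v3 = pipeline("automatic-speech-recognition",
                      model="openai/whisper-large-v3")

# Distilled Whisper: smaller and faster, aimed at English audio.
distil = pipeline("automatic-speech-recognition",
                  model="distil-whisper/distil-large-v2")

# "sample_audio.wav" is a placeholder for your own recording.
print(whisper_v3("sample_audio.wav")["text"])
print(distil("sample_audio.wav")["text"])
```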

How can the tendency of large language models (LLMs) to produce hallucinations be addressed? This survey paper provides an exhaustive overview of the advances in understanding and mitigating hallucinations in LLMs. It introduces a novel taxonomy categorizing different types of LLM hallucinations, explores the factors leading to these inaccuracies, and presents a detailed examination of methods developed for detecting and mitigating such hallucinations. Additionally, the paper reviews various benchmarks in the field. By analyzing existing challenges and posing open questions, the survey aims to guide future research towards addressing the reliability issues posed by hallucinations in LLM applications.

Check Out These FREE Tutorials and Notebooks from Our Partners