
🐝 AI/ML Research Updates: Microsoft AI Releases LLMLingua; Mixtral 8x7b; LLM360; Together AI Introduces StripedHyena-7B .... many more research updates

This newsletter brings AI research news that is more technical than most resources but still digestible and applicable.

Hey Folks!

This newsletter will discuss some cool AI research papers and trending AI Tools. Happy learning!

👉 What is Trending in AI/ML Research?

Mixtral 8x7B, a state-of-the-art large language model released by Mistral AI, outperforms GPT-3.5 on many benchmarks. It is integrated into the Hugging Face ecosystem, supports a 32k-token context length and multiple languages, and is commercially usable under an Apache 2.0 license. Mixtral uses a Mixture of Experts (MoE) architecture that packs eight "expert" networks into one model: certain feed-forward layers are replaced with a sparse MoE layer whose router selects two experts per token. This lets the model decode at roughly the speed of a 12B-parameter dense model despite holding about four times as many parameters in total.

Mixtral's MoE architecture is not a simple aggregation of eight independent 7B models: only some layers are replicated per expert, while others are shared across the model. The model shows strong performance on tasks like coding and on benchmarks such as MT-Bench. The trade-off is memory: all experts must be resident, so the model requires significant VRAM, which makes local setups challenging but suits large-VRAM environments. The Hugging Face integration includes various tools, such as training and inference scripts.
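To make the routing idea concrete, here is a minimal NumPy sketch of a sparse MoE layer with top-2 expert selection. This is our own toy illustration, not Mistral's implementation: the function name `top2_moe_layer` is invented, and each "expert" is reduced to a single ReLU projection rather than the gated MLPs a real model uses.

```python
import numpy as np

def top2_moe_layer(x, gate_w, expert_ws):
    """Sparse MoE feed-forward: route each token to its top-2 experts.

    x:         (tokens, d) input activations
    gate_w:    (d, n_experts) router weights
    expert_ws: list of (d, d) weight matrices, one per expert (toy experts)
    """
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # softmax over just the two selected experts' scores
        sel = logits[t, top2[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()
        for weight, e in zip(w, top2[t]):
            # each expert here is a toy ReLU projection; real experts are gated MLPs
            out[t] += weight * np.maximum(x[t] @ expert_ws[e], 0.0)
    return out
```

Because only two of the eight experts run per token, the per-token compute stays close to that of a much smaller dense model, even though every expert's weights must still sit in memory, which is exactly the VRAM trade-off described above.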

This paper addresses the limited transparency and reproducibility in the training of large language models (LLMs) such as LLaMA, Falcon, and Mistral, whose releases typically include only partial artifacts like final model weights or inference code. The proposed solution, LLM360, is an initiative to fully open-source LLMs, advocating that all training code, data, model checkpoints, and intermediate results be made available to the AI research community. This approach aims to foster open, collaborative research and make the end-to-end LLM training process transparent and reproducible. As a first step, LLM360 has released two 7B-parameter LLMs, Amber and CrystalCoder, along with their comprehensive training resources, and commits to releasing more robust and larger-scale models in the future.

This paper addresses the challenge of managing the increasingly lengthy prompts fed to large language models (LLMs), a result of techniques like chain-of-thought (CoT) prompting and in-context learning (ICL). The proposed solution, LLMLingua, is a coarse-to-fine prompt compression method. It features a budget controller to preserve semantic integrity under high compression ratios, a token-level iterative compression algorithm to handle interdependence between compressed contents, and an instruction-tuning-based method for aligning the distribution of the small compressor model with the target LLM. The effectiveness of LLMLingua is demonstrated on four diverse datasets (GSM8K, BBH, ShareGPT, and Arxiv-March23): it maintains state-of-the-art performance while enabling up to 20x compression with minimal loss in quality.
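The budget-controller idea can be illustrated with a toy sketch: rank tokens by how informative they are and keep only the most informative fraction, preserving order. This is our own simplification, assuming a unigram surprisal score; LLMLingua itself uses a small causal LM's per-token perplexity plus segment-wise iterative compression, and the function name `compress_prompt` is invented for illustration.

```python
from collections import Counter
import math

def compress_prompt(tokens, keep_ratio=0.5):
    """Toy budget-controlled compression: drop the most predictable tokens.

    Scores each token by unigram surprisal (-log p) and keeps the most
    informative `keep_ratio` fraction, preserving the original order.
    """
    counts = Counter(tokens)
    total = len(tokens)
    surprisal = [-math.log(counts[t] / total) for t in tokens]
    budget = max(1, int(total * keep_ratio))
    # indices of the `budget` highest-surprisal tokens, restored to document order
    keep = sorted(sorted(range(total), key=lambda i: -surprisal[i])[:budget])
    return [tokens[i] for i in keep]
```

On a repetitive prompt, the frequent (low-surprisal) tokens are dropped first, which is the intuition behind compressing prompts without destroying the content the LLM actually needs.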

Together AI introduces StripedHyena-7B, an innovative architecture developed with academic collaborators that offers a glimpse of a future beyond traditional Transformer models. StripedHyena is the first such model to compete with the best open-source Transformers in both short- and long-context evaluations, and it stands out for its speed and memory efficiency during long-sequence training, fine-tuning, and generation. This efficiency is partly due to its state-space model (SSM)-inspired layers, which improve upon standard attention-only Transformer architectures.

StripedHyena's architecture offers significant advantages in computational footprint, with cheaper fine-tuning and faster inference. The model demonstrates the potential of alternative architectures to improve the efficiency and performance of language models, marking a shift in computational paradigms. The announcement also outlines plans for future developments, including larger models, multi-modal support, and further performance optimizations.
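The efficiency claim above comes down to how SSM-style layers process sequences. A minimal sketch, assuming the simplest possible linear state-space recurrence (StripedHyena's actual blocks are far more elaborate, and `ssm_scan` is our own name):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a linear state-space recurrence over a 1-D input sequence.

    x_t = A @ x_{t-1} + B * u_t ;  y_t = C @ x_t

    Cost is O(L) in sequence length and the state x is fixed-size, which
    is why such layers can generate with constant memory per step, unlike
    attention, whose KV cache grows with context length.
    """
    d = A.shape[0]
    x = np.zeros(d)
    ys = []
    for u_t in u:
        x = A @ x + B * u_t   # fixed-size state update
        ys.append(C @ x)      # readout
    return np.array(ys)
```

For example, with a decaying state (`A = 0.5 * I`), an impulse input fades geometrically through the output: the whole history is summarized in the fixed-size state rather than recomputed against every past token, which is the source of the cheaper long-sequence training and inference described above.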

  • Taplio*: Taplio is the all-in-one, AI-powered LinkedIn growth tool. It's helped over 6200 pros create awesome AI-powered content, schedule posts with ease, dig into deep analytics, and connect with top LinkedIn creators. The best part? You can try it for free.

  • Julius AI*: With Julius, anyone can analyze datasets, create visualizations, and even train ML models with only a prompt. [Data Analysis]

  • MeetGeek*: Your AI-powered meeting assistant for effortless recording, transcription, and summarization. [Meeting]

  • Decktopus*: AI-powered presentations with captivating designs, no design experience required. [Presentation]

  • Adcreative AI*: Boost your advertising and social media game with AdCreative.ai - the ultimate Artificial Intelligence solution. [Marketing and Sales]

  • Aragon*: Get stunning professional headshots effortlessly with Aragon. Utilize the latest in A.I. technology to create high-quality headshots of yourself in a snap! [Professional]

  • Otter AI*: Get a meeting assistant that records audio, writes notes, automatically captures slides, and generates summaries. [Meeting]

  • Notion*: Notion is an all-in-one workspace for teams and individuals, offering note-taking, task management, project management, and more. [Productivity]

  • Motion*: Motion is an AI-powered daily schedule planner that helps you be more productive. [Productivity and Automation]

*We make a small affiliate profit when you buy these products through the links above.

  • AlphaNotes GPT: Efficiently and enjoyably convert YouTube videos and web articles into customized study guides and learning aids, enhancing your educational experience.

  • HackerNews GPT: Get daily or weekly summaries of top stories and comments from Hacker News, featuring advanced search capabilities.

  • Excel Brother: Answers questions about Excel files, with support for both file uploads and screenshot submissions.