
↗️ AI/ML Research Updates: NVIDIA AI Researchers Propose Tied-Lora; Microsoft Research Introduces Florence-2; LLMWare Launches RAG-Specialized 7B Parameter LLMs; .. and many more research trends

This newsletter delivers AI research news that is more technical than most sources, yet still digestible and applicable.

Hey Folks!

This newsletter will discuss some cool AI research papers. Happy learning!

👉 What is Trending in AI/ML Research?

How can the efficiency of the Low-rank Adaptation (LoRA) method be enhanced for language models? This paper introduces "Tied-LoRA," a paradigm that combines weight tying with selective training to boost LoRA's parameter efficiency. The research explores various combinations of parameter training and freezing, alongside weight tying, to find an optimal balance between performance and the number of trainable parameters. This exploration, conducted across different tasks and two base language models, reveals the trade-offs between efficiency and performance. Notably, a specific Tied-LoRA configuration is highlighted for its exceptional performance, achieving results comparable to standard LoRA while using only about 13% of its parameters.
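The core savings come from sharing one low-rank pair across layers instead of giving each layer its own. Here is a minimal parameter-count sketch; the per-layer scaling vectors and the `diag(u) @ B @ diag(v) @ A` parameterization are illustrative assumptions (the paper explores several tying/freezing configurations), not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, n_layers = 64, 4, 6  # hidden size, LoRA rank, number of layers

# Standard LoRA: each layer gets its own low-rank pair (A, B).
standard_params = n_layers * (d * r + r * d)

# Tied-LoRA (sketch): one (A, B) pair shared across all layers,
# plus small per-layer scaling vectors that stay trainable (assumption).
A = rng.normal(size=(r, d))  # shared down-projection
B = np.zeros((d, r))         # shared up-projection (zero-init, as in LoRA)
u = [np.ones(d) for _ in range(n_layers)]  # per-layer output scales
v = [np.ones(r) for _ in range(n_layers)]  # per-layer rank scales

tied_params = d * r + r * d + n_layers * (d + r)

def delta_w(layer):
    # Per-layer weight update: diag(u_l) @ B @ diag(v_l) @ A
    return (u[layer][:, None] * B) @ (v[layer][:, None] * A)

print(f"standard LoRA params: {standard_params}")
print(f"tied-LoRA params:     {tied_params} "
      f"({100 * tied_params / standard_params:.1f}% of standard)")
```

Even in this toy setting the tied variant needs well under a third of standard LoRA's adapter parameters, and the gap widens as the layer count grows, since the shared pair's cost is paid once.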

How can a unified model handle the complexity of various computer vision and vision-language tasks with simple text prompts? Florence-2 addresses this by introducing a vision foundation model capable of understanding and executing a wide range of tasks based on text-prompt instructions. It operates across diverse domains such as captioning, object detection, grounding, and segmentation. This versatility is powered by the extensive FLD-5B dataset, comprising 5.4 billion annotations on 126 million images. Florence-2 employs a sequence-to-sequence structure, enabling it to translate textual prompts into accurate visual task outputs. Its performance, tested extensively, showcases robust zero-shot learning and fine-tuning abilities, positioning it as a formidable contender among vision foundation models.

How can enterprises efficiently implement Retrieval-Augmented Generation (RAG) systems using Large Language Models (LLMs) for complex workflows? Ai Bloks addresses this with the launch of "llmware", an open-source framework designed for constructing enterprise-grade LLM-based workflow applications. The latest addition to this suite is the DRAGON series, featuring 7-billion-parameter LLMs optimized for business workflows, with a particular focus on fact-based question-answering over intricate business and legal documents. LLMWare caters to the enterprise demand for a unified framework that integrates LLMs with workflow tools, provides high-quality, specialized LLMs for enterprise tasks, and allows private, customizable, and cost-effective deployment. The DRAGON models, available on Hugging Face, are fine-tuned for RAG tasks, ensuring production-grade readiness for diverse enterprise applications.
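The retrieve-then-prompt pattern these models are tuned for can be sketched in a few lines. This is a generic illustration, not llmware's actual API; the toy corpus, bag-of-words retriever, and prompt template are all assumptions:

```python
import math
from collections import Counter

# Toy document store standing in for an enterprise legal corpus.
docs = [
    "The master services agreement terminates on December 31, 2025.",
    "Payment is due within 30 days of invoice receipt.",
    "Either party may terminate with 60 days written notice.",
]

def bow(text):
    # Bag-of-words term counts; a real system would use dense embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=2):
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query):
    # Ground the model strictly in retrieved passages (the RAG step).
    context = "\n".join(retrieve(query))
    return (f"Answer strictly from the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

prompt = build_prompt("When can a party terminate the agreement?")
print(prompt)
```

The fact-based question-answering behavior the DRAGON models target corresponds to the generation step that would consume this prompt; the retrieval half shown here is model-agnostic.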

How can LLMs bridge the gap between code generation benchmarks and practical programming involving pre-existing libraries? This paper introduces "ML-Bench", a benchmark designed to evaluate the effectiveness of LLMs in leveraging open-source libraries for machine learning tasks. It comprises 10,044 samples across 130 tasks from 14 major GitHub repositories. LLMs are tested on generating code for a given machine learning task, requiring them to interpret complex, language-code mixed documents and multi-file code structures. Despite its advanced capabilities, GPT-4 achieves only 39.73% task completion, highlighting significant room for improvement. To address this, "ML-Agent" is proposed, enhancing GPT-4's ability to navigate codebases, locate documentation, retrieve relevant code, and generate executable solutions. This approach marks a substantial improvement over traditional LLM applications in coding.
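Execution-based task completion, the metric behind the 39.73% figure, boils down to running each generated program and counting the ones that succeed. A minimal sketch of that scoring loop (the task names and pass/fail checker here are illustrative, not ML-Bench's actual harness):

```python
# Each candidate is a generated program; it passes if it executes cleanly.
def run_generated_code(code):
    """Execute a candidate solution in an empty namespace; report success."""
    try:
        exec(compile(code, "<candidate>", "exec"), {})
        return True
    except Exception:
        return False

candidates = {
    "train_classifier": "acc = 0.91",                      # runs -> pass
    "export_onnx":      "import nonexistent_pkg",          # ImportError -> fail
    "resize_dataset":   "x = [i * 2 for i in range(10)]",  # runs -> pass
}

passed = sum(run_generated_code(c) for c in candidates.values())
rate = 100 * passed / len(candidates)
print(f"task completion: {passed}/{len(candidates)} = {rate:.2f}%")
```

A real harness would additionally sandbox execution and verify task-specific outputs (arguments, files produced), not just that the code ran without raising.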

How can we create photorealistic, controllable 3D avatars in real-time without the need for dense input images or accurate 3D registrations? This paper introduces Drivable 3D Gaussian Avatars (D3GA), a novel approach utilizing 3D Gaussian Splatting (3DGS) for rendering realistic human figures. Unlike traditional methods that rely on neural radiance fields and suffer from slow performance, D3GA uses dense calibrated multi-view videos for input, rendering at real-time framerates. The framework employs cage deformations instead of linear blend skinning (LBS) for a more efficient deformation of 3D Gaussian primitives. These deformations are driven by joint angles and keypoints, making D3GA ideal for telepresence applications. Tested on nine diverse subjects, D3GA demonstrates superior quality compared to existing methods, using identical training and test datasets.
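The cage idea is that each Gaussian primitive's position is expressed as a weighted combination of a few cage vertices, so posing the cage moves the primitives. A 2D toy version using barycentric coordinates (D3GA works in 3D with pose-driven cages; this simplified setup is only for intuition):

```python
import numpy as np

# Rest cage: a single triangular cell with three control vertices.
cage_rest = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def barycentric(p, tri):
    # Solve for weights w such that w @ tri == p and w.sum() == 1.
    T = np.vstack([tri.T, np.ones(3)])
    return np.linalg.solve(T, np.append(p, 1.0))

mean_rest = np.array([0.25, 0.25])     # a Gaussian's mean inside the cage
w = barycentric(mean_rest, cage_rest)  # weights computed once, at rest

# Drive the cage (in the real system, by joint angles / keypoints)
# and re-evaluate the point: it follows the cage, no LBS weights needed.
cage_posed = cage_rest + np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.3]])
mean_posed = w @ cage_posed

print("weights:", np.round(w, 3))      # -> [0.5 0.25 0.25]
print("deformed mean:", mean_posed)    # -> [0.3 0.325]
```

Because the weights are fixed and the per-frame work is a small matrix product, this style of deformation is cheap enough to feed a real-time renderer, which is part of why cages pair well with 3D Gaussian Splatting.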

Check Out These FREE Tutorials and Notebooks from Our Partners