
🚀 AI News: Can language models help us do a better search? | Researchers from KAIST introduce SelFee | Can (very) simple math inform RLHF for large language models? ...

This newsletter brings AI research news that is much more technical than most resources, but still digestible and applicable.

Researchers from KAIST introduce SelFee: an iterative self-revising LLM empowered by self-feedback generation. SelFee is a LLaMA-based instruction-following model fine-tuned to generate self-feedback on its own responses and revise them until it produces a high-quality answer, all within a single inference. The team fine-tuned LLaMA (7B and 13B) on 178K training instances containing self-feedback and revision data generated by ChatGPT. Given an instruction Q, the model generates not only an initial answer A0 but also a self-feedback sequence F0. By analyzing the content of F0, the model decides whether a revision is necessary; if so, it generates a revised answer A1 based on F0, then feedback F1 for A1, and so on. Importantly, this entire process is completed within a single inference. Even with this straightforward setup, the model demonstrates a significant improvement over existing LLaMA-based models.
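To make the single-pass revision flow concrete, here is a minimal sketch of how SelFee-style inference could be wrapped in code. It assumes a Hugging Face-style generate() interface, a placeholder checkpoint path, and made-up "Answer:"/"Feedback:"/"Revision:" segment markers; the actual SelFee prompt format may differ.

```python
# Minimal sketch of SelFee-style single-pass self-revision (assumptions:
# hypothetical checkpoint path and segment markers, HF-style interface).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/selfee-7b")   # placeholder
model = AutoModelForCausalLM.from_pretrained("path/to/selfee-7b")  # placeholder

def answer_with_self_revision(instruction: str) -> str:
    # The model was fine-tuned to emit the whole chain A0 -> F0 -> A1 -> F1 -> ...
    # autoregressively, so a single generate() call covers every revision round.
    prompt = f"Instruction: {instruction}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=1024)
    text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep the last revision if the feedback triggered one; otherwise the
    # initial answer A0 is already the final answer.
    segments = [s.strip() for s in text.split("Revision:")]
    return segments[-1]
```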

Can language models help us do a better search? 🤔 Researchers from MIT and Meta AI propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with application to open-domain question answering. EAR first applies a query expansion model to generate a diverse set of queries, then uses a query reranker to select the ones most likely to lead to better retrieval results. Motivated by the observation that the best query expansion is often not the one picked by greedy decoding, EAR trains its reranker to predict the rank order of the gold passage when each expanded query is issued to a given retriever. By better connecting the query expansion model and the retriever, EAR significantly enhances a traditional sparse retrieval method, BM25.
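For intuition, here is a rough sketch of an EAR-style expand-then-rerank pipeline over BM25. The `expander` and `reranker_score` components are placeholders standing in for the paper's trained models, not the authors' released code.

```python
# Sketch of expand -> rerank -> sparse retrieval (EAR-style), under stated
# assumptions: `expander` samples diverse query expansions, `reranker_score`
# is a learned scorer predicting how well BM25 would rank the gold passage.
from rank_bm25 import BM25Okapi

def ear_retrieve(question, expander, reranker_score, corpus, top_k=10):
    # 1) Query expansion: sample a diverse set of expanded queries
    #    (the best expansion is often not the greedy one).
    candidates = expander.sample(question, num_return=20)

    # 2) Query reranking: keep the expansion the reranker expects to
    #    retrieve the gold passage best for the target retriever.
    best_query = max(candidates, key=lambda q: reranker_score(question, q))

    # 3) Sparse retrieval with BM25 using the selected expansion.
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores((question + " " + best_query).split())
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:top_k]]
```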

Researchers from UT Austin introduce Ambient Diffusion, a framework for training or fine-tuning diffusion models given only corrupted images as input, which also reduces memorization of the training set. It is the first diffusion-based framework that can learn an unknown distribution from highly corrupted samples alone, a setting that arises in scientific applications where uncorrupted samples are impossible or expensive to acquire. A further benefit is that the resulting generative models are less likely to memorize individual training samples, since they never observe clean training data. The main idea is to introduce additional measurement distortion during the diffusion process and require the model to predict the original corrupted image from the further-corrupted one.
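The core training trick can be sketched as follows, using a simple random-masking corruption. The model interface, the extra-corruption rate, and the loss weighting are illustrative assumptions, and the diffusion noising/timestep conditioning of the full method is omitted for brevity.

```python
# Sketch of the Ambient Diffusion training idea: further corrupt an
# already-corrupted sample and supervise the model to reconstruct the
# original corrupted observation (never the clean image).
import torch

def ambient_training_step(model, x_corrupted, mask, optimizer):
    """x_corrupted: image observed only through corruption mask `mask` (1 = observed)."""
    # Further corrupt: randomly drop an extra ~10% of the observed pixels
    # (rate is an illustrative choice).
    extra_mask = (torch.rand_like(mask) > 0.1).float() * mask
    x_further = x_corrupted * extra_mask

    # The model sees only the further-corrupted image (plus its mask) and must
    # predict the original corrupted image; it never observes clean data,
    # which is what limits memorization.
    pred = model(x_further, extra_mask)
    loss = ((pred - x_corrupted) ** 2 * mask).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```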

Can (very) simple math inform RLHF for large language models? This AI paper introduces an empirical phenomenon called reward collapse, which occurs while training reward models that align language models using human preference rankings: the same reward distribution emerges irrespective of the prompt type. The phenomenon arises because the neural network interpolates the training data in the final phase of training. To address reward collapse, the research group proposes utility functions that take the characteristics of each prompt into account, along with an analytical framework for evaluating the resulting reward distribution that yields closed-form reward expressions. Synthetic experiments support these findings and demonstrate a more effective remedy than early stopping.
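For context, this is a sketch of the standard pairwise (Bradley-Terry-style) reward-model loss that the analysis starts from; the prompt-aware utility functions the authors propose would replace the fixed log-sigmoid utility below, and `reward_model` is a placeholder scalar-output network.

```python
# Standard prompt-agnostic pairwise reward-model loss (illustrative only).
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, prompt, better_response, worse_response):
    r_better = reward_model(prompt, better_response)   # scalar reward
    r_worse = reward_model(prompt, worse_response)
    # A fixed log-sigmoid utility on the reward gap ignores the prompt type;
    # in the interpolation regime this pushes every prompt toward the same
    # reward distribution, i.e. the "reward collapse" the paper describes.
    return -F.logsigmoid(r_better - r_worse).mean()
```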

A Banksy got everyday investors 32% returns?

Note: This section is supported and sponsored by Masterworks

Mm-hmm, sure. So, what's the catch?

We know it may sound too good to be true. But thousands of investors are already smiling all the way to the bank, thanks to the fine-art investing platform Masterworks.

These results aren't cherry-picking. This is the whole bushel. Masterworks has built a track record of 13 exits, including net returns of +10.4%, +27.3%, and +35.0%, even while financial markets plummeted.

But art? Really? Okay, skeptics, here are the numbers. Contemporary art prices:

  • outpaced the S&P 500 by 131% over the last 26 years

  • have the lowest correlation to equities of any asset class

  • remained stable through the dot-com bubble and '08 crisis

Got your attention yet? Marktechpost readers can skip the waitlist with this exclusive link.

See important disclosures at masterworks.com/cd

Bothered by the expensive runs on Auto-GPT and LangChain agents? Check out ReWOO, an AI project that eliminates token redundancy in the prevailing 'thought-action-observation' paradigm, achieving better task completion with 5x less token usage at inference. Given a question, a Planner composes a blueprint of interlinked plans before any tool responds. The blueprint instructs a Worker to call tools and collect evidence. Finally, the plans and evidence are paired and fed to a Solver, so only two calls to the LLM are necessary. Compared to ReAct, which is widely adopted in LangChain and Auto-GPT, ReWOO improves system efficiency by eliminating the 'stacking' of prompt history (so-called long-term memory management), whose token cost otherwise grows quadratically.
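Here is a minimal sketch of the Planner-Worker-Solver flow, assuming a generic llm(prompt) completion function and a simple '#E<n> = Tool[input]' plan format; the exact prompts and parsing in the ReWOO project may differ.

```python
# Sketch of a ReWOO-style pipeline: exactly two LLM calls, tool use in between.
import re

def rewoo(question, llm, tools):
    # LLM call 1 -- Planner: write the full blueprint of interlinked plans
    # up front, with no thought-action-observation loop.
    plan = llm(f"Plan step by step. Use the format '#E<n> = Tool[input]'.\n"
               f"Question: {question}")

    # Worker: execute each planned tool call and collect evidence
    # (no LLM call needed here; tools are assumed to return strings).
    evidence = {}
    for ref, tool, arg in re.findall(r"(#E\d+) = (\w+)\[(.*?)\]", plan):
        for prev, val in evidence.items():       # substitute earlier evidence
            arg = arg.replace(prev, val)
        evidence[ref] = tools[tool](arg)

    # LLM call 2 -- Solver: pair plans with evidence and produce the answer.
    return llm(f"Question: {question}\nPlans:\n{plan}\n"
               f"Evidence:\n{evidence}\nAnswer:")
```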

Have you ever questioned whether RL is really necessary in RLHF? Worried that a deep understanding of PPO is essential? Put your worries to rest: Direct Preference Optimization (DPO) provides a solution. With DPO, you can fine-tune LMs directly from preferences using a straightforward classification loss, eliminating the need for RL. How does it work? The crucial insight is that a language model can already serve as its own reward model (see the sketch below). By applying the binary cross-entropy loss normally used to train a reward model on preference data directly to the language model, we can refine it through ordinary fine-tuning.
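Here is a minimal sketch of the DPO objective, assuming per-sequence log-probabilities of the chosen and rejected responses have already been computed under both the policy being fine-tuned and a frozen reference model; `beta` controls how far the policy may drift from the reference.

```python
# Sketch of the DPO classification loss on a batch of preference pairs.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards: the policy/reference log-ratio plays the role of the
    # reward model, so no separate reward network or RL loop is needed.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Binary cross-entropy on the preference pair (logsigmoid of the margin).
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```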

Featured AI Tools For This Newsletter Issue:

SaneBox

Plask

Find 100s of cool artificial intelligence (AI) tools. Our expert team reviews and provides insights into some of the most cutting-edge AI tools available. Check out AI Tools Club.