AI News: Diffusion models less private than prior generative models such as GANs; Can LLMs extract knowledge graphs from unstructured text?; A transformer model edits a document and regenerates it

Hi there, today we will share some research updates: diffusion models are less private than prior generative models such as GANs; can LLMs extract knowledge graphs from unstructured text?; a transformer model that edits a document and regenerates it; Looped Transformers as Programmable Computers; and many other cool updates. So, let's start...

Are diffusion models less private than prior generative models such as GANs? New research from a group of researchers from Google, DeepMind, ETHZ, Princeton, and UC Berkeley shows that diffusion models memorize individual training images and can regurgitate them at generation time, making them much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
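The extraction idea can be boiled down to: generate many samples and flag those that land suspiciously close to a training image. Here is a minimal sketch of such a near-duplicate check; the distance measure and threshold are illustrative assumptions, not the paper's exact, calibrated definition of memorization:

```python
import numpy as np

def find_memorized(generated, train_set, threshold=0.05):
    """Flag generated images that are near-duplicates of training images.

    generated: array of shape (N, D), train_set: array of shape (M, D),
    both with pixel values in [0, 1]. `threshold` is an illustrative
    normalized-L2 cutoff, not the paper's definition.
    """
    hits = []
    for i, g in enumerate(generated):
        # Normalized L2 distance from this sample to every training image.
        dists = np.linalg.norm(train_set - g, axis=1) / np.sqrt(g.size)
        j = int(np.argmin(dists))
        if dists[j] < threshold:
            hits.append((i, j, float(dists[j])))
    return hits
```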

Can LLMs extract knowledge graphs from unstructured text? Introducing GraphGPT: you can pass in any text (summary of a movie, passage from Wikipedia, etc.) to generate a visualization of entities and their relationships.
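GraphGPT's exact prompt isn't shown here, but the general recipe is to ask the model for (subject, relation, object) triples and render them as a graph. A minimal sketch, assuming a generic `complete(prompt)` LLM call (a hypothetical helper standing in for whichever API you use) and networkx for the graph:

```python
import json
import networkx as nx

PROMPT = (
    "Extract (subject, relation, object) triples from the text below. "
    "Answer only with a JSON list of 3-element lists.\n\nText: {text}"
)

def text_to_graph(text: str, complete) -> nx.DiGraph:
    """Build a knowledge graph from unstructured text.

    `complete` is an assumed LLM completion function (prompt -> str).
    """
    raw = complete(PROMPT.format(text=text))
    triples = json.loads(raw)  # e.g. [["Alice", "works at", "Acme"], ...]
    graph = nx.DiGraph()
    for subj, rel, obj in triples:
        graph.add_edge(subj, obj, label=rel)
    return graph
```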

NoRefER: A multi-language referenceless ASR metric based on a fine-tuned language model, released for public use on HuggingFace. This metric allows for evaluating the outputs of ASR models without needing a reference transcript, making it a valuable tool for A/B testing multiple ASR models or model versions, or even ensembling their outputs. Potential use-cases for NoRefER include (see the sketch after the list):

  1. A/B testing models or their versions to determine the best-performing one.

  2. Picking which production outputs are worth sending for human evaluation or post-editing.

  3. Ensembling the outputs of multiple ASR models to achieve superior quality.
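A minimal sketch of use-cases 1 and 3, assuming `score` is a referenceless scorer (transcript -> float, higher is better) that wraps the NoRefER model; the exact loading code depends on the released HuggingFace checkpoint:

```python
def ab_test(outputs_a, outputs_b, score):
    """Compare two ASR systems utterance-by-utterance with a
    referenceless metric and keep the higher-scoring hypothesis.
    The per-utterance winners also form a simple selection-based
    ensemble of the two systems."""
    winners = []
    wins_a = 0
    for hyp_a, hyp_b in zip(outputs_a, outputs_b):
        if score(hyp_a) >= score(hyp_b):
            winners.append(hyp_a)
            wins_a += 1
        else:
            winners.append(hyp_b)
    print(f"system A preferred on {wins_a}/{len(winners)} utterances")
    return winners
```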

A transformer model edits a document and regenerates it: Researchers from Microsoft and the University of North Carolina at Chapel Hill propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained both on large-scale unlabeled document corpora, using innovative self-supervised objectives, and on diverse labeled data.
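To get a feel for the "prompt-based sequence generation" idea, here is a rough sketch of how a task prompt, OCR text, and discretized layout could be serialized into one input sequence; the `<loc_k>` token format and binning are illustrative assumptions, not UDOP's exact scheme:

```python
def serialize(task_prompt, words, boxes, bins=500):
    """Serialize a document page into one prompt-based input sequence.

    words: OCR tokens; boxes: matching (x0, y0, x1, y1) boxes normalized
    to [0, 1]. Coordinates are discretized into `bins` buckets and
    emitted as special tokens, so text and layout share one vocabulary.
    """
    pieces = [task_prompt]
    for word, (x0, y0, x1, y1) in zip(words, boxes):
        loc = "".join(f"<loc_{int(v * (bins - 1))}>" for v in (x0, y0, x1, y1))
        pieces.append(word + loc)
    return " ".join(pieces)

seq = serialize(
    "question answering. What is the invoice date?",
    ["Invoice", "Date:", "2023-01-17"],
    [(0.10, 0.05, 0.30, 0.09), (0.10, 0.12, 0.22, 0.16),
     (0.24, 0.12, 0.45, 0.16)],
)
```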

Text-davinci-002: Have large language models solved news summarization? A new study shows that text-davinci-002 is comparable to freelance writers. The story is more complicated, though: while on aggregate text-davinci-002 is rated as comparable to freelance writers, the researchers show that annotators have diverse and stable preferences for either text-davinci-002 or the freelance writers. Text-davinci-002 also summarizes very differently from human writers: the freelance writers paraphrase much more frequently, whereas text-davinci-002 copies. That said, both are good at combining sentences in a coherent way.
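One simple way to quantify "copying vs. paraphrasing" is n-gram coverage: the fraction of summary n-grams that appear verbatim in the source. A minimal sketch (the study's own extractiveness analysis is more involved):

```python
def ngram_coverage(source: str, summary: str, n: int = 3) -> float:
    """Fraction of summary n-grams that appear verbatim in the source.
    Higher values indicate a more extractive (copying) summary."""
    def ngrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    src, summ = ngrams(source), ngrams(summary)
    return len(summ & src) / max(len(summ), 1)
```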

Looped Transformers as Programmable Computers: Researchers from Princeton University and the University of Wisconsin-Madison introduce a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Benefits of this approach over something like Tracr are that (1) you can write general-purpose "algebraic" programs (it can emulate algorithms like power iteration, or in-context learning with SGD on neural nets), and (2) the depth is at most 13 and the overall size doesn't scale with the length of the program, because the same network is applied repeatedly: x = transformer(x).
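The core trick is that iterating a fixed function trades depth for loop count. A toy analogue, using power iteration (one of the algorithms the paper emulates) as the fixed function instead of an actual hand-weighted transformer:

```python
import numpy as np

def loop(f, x, steps):
    """Run a fixed function in a loop: x = f(x). In the paper, f is a
    fixed transformer of depth <= 13 with hand-set weights; here a
    single power-iteration step stands in as a toy example."""
    for _ in range(steps):
        x = f(x)
    return x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
step = lambda v: (A @ v) / np.linalg.norm(A @ v)  # one power-iteration step
v = loop(step, np.array([1.0, 0.0]), steps=50)    # converges to A's top eigenvector
```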

SingSong: Google researchers introduce a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. Singing is among the most intuitive ways we engage with music: we already sing along with existing music, but singing may also be useful as a control mechanism for music generation, allowing anyone who can sing to create new music with rich instrumentation. SingSong builds on improvements in source separation and audio generation. It uses the former, specifically MDX-Net (Kim et al., 2021), to create large volumes of parallel (vocals, instrumental) training data. For the latter, it adapts AudioLM (Borsos et al., 2022) to the conditional "audio-to-audio" setting.
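The data-creation step is easy to picture: run a separator over mixed songs and save the two stems as aligned training pairs. A minimal sketch, where `separate` is a placeholder for a source-separation model such as MDX-Net (waveform -> (vocals, instrumental)):

```python
from pathlib import Path

import soundfile as sf

def build_pairs(song_dir, out_dir, separate):
    """Create parallel (vocals, instrumental) training pairs from a
    folder of mixed songs. `separate` is an assumed callable wrapping
    a source-separation model, e.g. MDX-Net."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for song in Path(song_dir).glob("*.wav"):
        mix, sr = sf.read(song)
        vocals, instrumental = separate(mix, sr)
        sf.write(out / f"{song.stem}.vox.wav", vocals, sr)
        sf.write(out / f"{song.stem}.instr.wav", instrumental, sr)
```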
