In a significant move to advance artificial intelligence while promoting responsible development, Meta's Fundamental AI Research (FAIR) team is publicly releasing several cutting-edge AI models. The releases build on more than a decade of FAIR's commitment to open research and collaboration as a way to drive AI innovation.
The models have the potential to transform how we interact with AI, from generating images and music to detecting AI-created speech. By putting this technology in the hands of the global research community, Meta hopes to spur further advancements while encouraging responsible AI development.
One of the most intriguing releases is the Chameleon model, a mixed-modal AI that can understand and generate both images and text simultaneously. Unlike most large language models, which handle either text or images but not both together, Chameleon can process and produce any combination of the two. This opens up possibilities such as generating captions for images or building entirely new scenes from mixed text and image prompts.
Meta is also tackling the efficiency of large language model training. Traditional models are trained to predict just the next word, an approach that requires vast amounts of text. FAIR's new multi-token prediction approach trains models to predict several future words at once, promising faster training and more capable language models.
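To make the idea concrete, here is a minimal, hedged sketch of multi-token prediction in PyTorch: a shared trunk feeds several output heads, and head k is trained to predict the token k steps ahead. The architecture, sizes, and names below are illustrative assumptions, not FAIR's published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Toy model: a shared trunk with one output head per future offset."""

    def __init__(self, vocab_size=1000, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A GRU stands in for the transformer trunk of a real LLM.
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)
        # Head k (0-indexed) predicts the token k+1 steps ahead.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, tokens):
        hidden, _ = self.trunk(self.embed(tokens))      # (batch, seq, d_model)
        return [head(hidden) for head in self.heads]    # one logit tensor per offset


def multi_token_loss(model, tokens):
    """Sum cross-entropy losses over all predicted future offsets."""
    total = 0.0
    for k, logits in enumerate(model(tokens), start=1):
        # Head k predicts the token k positions ahead, so shift targets by k.
        pred = logits[:, :-k, :].reshape(-1, logits.size(-1))
        target = tokens[:, k:].reshape(-1)
        total = total + F.cross_entropy(pred, target)
    return total


# Usage with random token ids: one backward pass trains all heads at once.
tokens = torch.randint(0, 1000, (2, 32))
model = MultiTokenPredictor()
loss = multi_token_loss(model, tokens)
loss.backward()
print(loss.item())
```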
For the musically inclined, Meta's JASCO model offers unprecedented control over text-to-music generation. Unlike existing models that rely mainly on text prompts, JASCO can also accept chords, beats, and other musical inputs to shape the generated music. This allows for greater creativity and versatility in AI-powered music creation.
As generative AI raises concerns about potential misuse, Meta's AudioSeal release aims to provide a solution. This novel audio watermarking technique can pinpoint AI-generated speech within longer audio clips, with detection running up to 485 times faster than previous methods. AudioSeal could be vital for preventing the misuse of AI voice generation in areas like deepfakes.
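AudioSeal's code and models are released on GitHub (github.com/facebookresearch/audioseal). The snippet below is a rough sketch of how watermark embedding and detection fit together; the package import, model card names, and method signatures are assumptions to verify against the repository's README before relying on them.

```python
import torch
from audioseal import AudioSeal  # assumed import; install from the AudioSeal repo

# Load a watermark generator and the matching detector (card names assumed).
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")

# One second of 16 kHz audio stands in for real generated speech;
# shape is (batch, channels, samples).
sample_rate = 16000
audio = torch.zeros(1, 1, sample_rate)

# Embed an imperceptible watermark into the audio.
watermark = generator.get_watermark(audio, sample_rate)
watermarked = audio + watermark

# Later, any clip can be scanned for the watermark; `score` is the
# probability that the audio carries an AudioSeal watermark.
score, message = detector.detect_watermark(watermarked, sample_rate)
print("watermark probability:", score)
```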
Recognizing the importance of diversity in AI representation, FAIR has conducted extensive research into geographic disparities in text-to-image models. They've developed evaluation tools and gathered over 65,000 annotations to help ensure AI-generated images reflect the world's true cultural and geographic diversity.
For more information, see Meta FAIR's full announcement.