Amazon Web Services introduced a new family of multimodal generative AI models, called Nova, at its re:Invent conference on Tuesday.
These models are designed to handle a range of tasks, from text generation to image and video creation.
The Nova family includes four text-generating models: Micro, Lite, Pro, and Premier. Micro, Lite, and Pro are available to AWS customers starting today, while Premier will launch in early 2025.
AWS CEO Andy Jassy highlighted the progress the company has made with these models, noting, “If we are finding value from them, you will likely find value too.”
The Nova models are optimized for 15 languages, with English as the primary language. They vary in size and capability: Micro, the smallest, processes text only but offers the fastest response times.
Lite can handle text, image, and video inputs at reasonable speed, while Pro offers a balanced mix of speed, accuracy, and cost-effectiveness for a variety of tasks. Premier, the most advanced of the four, is designed for complex workloads.
Like Lite, both Pro and Premier can process text, images, and video, making them suitable for tasks such as summarizing documents and analyzing charts. AWS positions Premier as a “teacher” model, ideal for fine-tuning custom models.
Micro supports a context window of up to 128,000 tokens, roughly equivalent to 100,000 words, while Lite and Pro handle up to 300,000 tokens, around 225,000 words or 30 minutes of video. In 2025, some Nova models will support context windows of more than 2 million tokens.
Jassy emphasized that the Nova models are among the fastest and most affordable in their class. Available through Amazon Bedrock, AWS’s AI development platform, the models can be fine-tuned for improved performance across text, image, and video tasks.
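For developers, calling a Nova text model should follow the same pattern as other models hosted on Bedrock. Below is a minimal sketch using boto3’s Converse API; the model ID, region, prompt, and inference settings are illustrative assumptions rather than confirmed values.

```python
# A minimal sketch of invoking a Nova text model through Amazon Bedrock's
# Converse API. The model ID, region, and settings below are assumptions.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed identifier for Nova Lite
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize this earnings report in three bullet points."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.3},
)

# The Converse API returns the model's reply as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```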
In addition to the text models, AWS introduced two new media-generation tools: Nova Canvas and Nova Reel. Canvas lets users create and edit images from text prompts, while Reel generates six-second videos with customizable camera movements, such as pans and zooms. AWS says a version of Reel capable of generating two-minute videos is coming soon.
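To give a sense of how Canvas might be used programmatically, here is a hedged sketch based on Bedrock’s InvokeModel API; the model ID and request schema are assumptions modeled on Bedrock’s existing text-to-image request format.

```python
# A hedged sketch of generating an image with Nova Canvas via Bedrock's
# InvokeModel API. The model ID and JSON schema are assumptions patterned
# on Bedrock's existing text-to-image request format.
import base64
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "taskType": "TEXT_IMAGE",  # assumed task type
    "textToImageParams": {"text": "A container ship entering port at dawn"},
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
})

response = client.invoke_model(modelId="amazon.nova-canvas-v1:0", body=body)
payload = json.loads(response["body"].read())

# Bedrock image models typically return base64-encoded image data.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))
```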
Both models include built-in safeguards to prevent the creation of harmful content, with features like watermarking and content moderation.
AWS also outlined safety measures to address the spread of misinformation and other harmful material, though specific details on these safeguards remain limited.
Looking ahead, Jassy announced that AWS is developing a speech-to-speech model, expected to launch in Q1 2025, that will interpret verbal and nonverbal cues to produce natural-sounding, human-like voices. Additionally, an “any-to-any” model is planned for mid-2025, which will allow users to input text, speech, images, or video and receive output in any of those formats.
“These are the models of the future,” said Jassy, highlighting their potential to revolutionize how we interact with AI across multiple mediums.