
AWS Foundation Model Types


Model Types

Foundation models (FMs) can be grouped into several categories.

Two of the most frequently used models are text-to-text models and text-to-image models.

In this lesson, you learn more about each of these types of models.


Text-to-text models


Text-to-text models are large language models (LLMs) pretrained on vast quantities of textual data to process human language.

These large foundation models can summarize text, extract information, respond to questions, create content (such as blogs or product descriptions), and more.
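
As a rough illustration of a text-to-text task, the following sketch asks a model to summarize a passage through the Amazon Bedrock Converse API using boto3. This is a minimal sketch, not part of the lesson: the region, prompt, and model ID are placeholder assumptions, and Bedrock access must already be enabled in your AWS account.

# Minimal sketch: summarization with a text-to-text model via Amazon Bedrock.
# The model ID and region below are examples only.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Summarize in one sentence: Foundation models are large, pretrained models that can be adapted to many tasks."

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)

print(response["output"]["message"]["content"][0]["text"])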


Natural Language Processing (NLP)


NLP is a machine learning technology that gives machines the ability to interpret and manipulate human language.

NLP does this by analyzing the content, intent, and sentiment of a message and responding to human communication.

Typically, NLP implementation begins by gathering and preparing unstructured text or speech data from different sources and processing the data.

It uses techniques such as tokenization, stemming, lemmatization, stop word removal, part-of-speech tagging, named entity recognition, speech recognition, sentiment analysis, and so on.

However, modern LLMs don't require these intermediate preprocessing steps.
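
To see what a few of these classic preprocessing steps look like in practice, the following sketch runs tokenization, stop word removal, stemming, lemmatization, and part-of-speech tagging with the NLTK library. The sample sentence is illustrative, and the resource names passed to nltk.download can vary between NLTK versions.

# A small sketch of classic NLP preprocessing steps with NLTK.
# Modern LLMs skip most of these steps; this only illustrates the traditional pipeline.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Download required resources once (names may differ slightly by NLTK version).
for pkg in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

text = "The runners were running quickly through the crowded streets."

tokens = nltk.word_tokenize(text)                          # tokenization
tokens = [t for t in tokens if t.isalpha()]                # keep words only
stops = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stops]   # stop word removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in filtered])                 # stemming
print([lemmatizer.lemmatize(t) for t in filtered])         # lemmatization
print(nltk.pos_tag(filtered))                              # part-of-speech tagging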


Recurrent neural network (RNN)


RNNs use a memory mechanism to store and apply data from previous inputs.

This mechanism makes RNNs effective for sequential data and tasks, such as natural language processing, speech recognition, or machine translation.

However, RNNs also have limitations.

They're slow and complex to train, and because they process input sequentially, their training can't be parallelized.
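
A minimal sketch with PyTorch (input and hidden sizes are illustrative) shows the memory mechanism: the hidden state is carried from one time step to the next, which is exactly the sequential dependency that prevents training parallelization.

# Minimal sketch of a recurrent network processing a sequence step by step.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)

batch = torch.randn(4, 10, 16)   # 4 sequences, 10 time steps, 16 features each
output, hidden = rnn(batch)      # the hidden state is updated one step at a time

print(output.shape)  # torch.Size([4, 10, 32]) - one hidden vector per time step
print(hidden.shape)  # torch.Size([1, 4, 32])  - final hidden state per sequence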


Transformer


A transformer is a deep-learning architecture that has an encoder component that converts the input text into embeddings.

It also has a decoder component that consumes the embeddings to emit some output text.

Unlike RNNs, transformers are highly parallelizable: instead of processing words one at a time during the learning cycle, they process the entire input at once.

As a result, transformers take significantly less time to train, although they require more computing power.

The transformer architecture was the key to the development of LLMs.

These days, most LLMs contain a decoder component.
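
As a hedged illustration of a decoder-only transformer, the following sketch generates text with GPT-2 through the Hugging Face transformers library. The model choice and generation settings are assumptions for demonstration only; production LLMs are far larger, but they belong to the same architectural family.

# Minimal sketch: text generation with a small decoder-only transformer (GPT-2).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator("Foundation models are", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])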


Text-to-image models


Text-to-image models take natural language input and produce a high-quality image that matches the input text description.

Some examples of text-to-image models are DALL-E 2 from OpenAI, Imagen from the Google Research Brain Team, Stable Diffusion from Stability AI, and Midjourney.
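
A minimal sketch, assuming the Hugging Face diffusers library, an example Stable Diffusion checkpoint, and a CUDA GPU, shows how such a model turns a prompt into an image.

# Minimal sketch: text-to-image generation with Stable Diffusion via diffusers.
# The checkpoint name and GPU assumption are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a watercolor painting of a lighthouse at sunset").images[0]
image.save("lighthouse.png")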

To learn more about how text-to-image models work, review the diffusion architecture described below.


Diffusion architecture


Diffusion is a deep learning architecture that learns through a two-step process.

The first step is called forward diffusion.

Using forward diffusion, the system gradually adds small amounts of noise to an input image until only noise remains.

A U-Net model tracks and predicts the noise level at each step.

In the subsequent reverse diffusion step, the noisy image is gradually denoised until a new image is generated.

During training, the model is also fed the text description, which is encoded and combined with the image vector.
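
The following toy sketch illustrates only the forward diffusion idea: a small, fixed amount of Gaussian noise is mixed into an image tensor over many steps until almost nothing but noise remains. The step count and noise amount are arbitrary assumptions, and no U-Net or learned noise schedule is involved.

# Toy sketch of forward diffusion: repeatedly mix noise into an image tensor.
import math
import torch

image = torch.rand(3, 64, 64)   # pretend input image, values in [0, 1]
num_steps = 1000
beta = 0.02                     # fixed per-step noise amount (toy schedule)

x = image
for t in range(num_steps):
    noise = torch.randn_like(x)
    # Each step keeps most of the signal and adds a little noise.
    x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * noise

print(x.mean().item(), x.std().item())  # close to pure Gaussian noise after many steps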


Large language models

Large language models are a subset of foundation models.

LLMs are trained on trillions of words across many natural language tasks.

LLMs can understand, learn, and generate text that’s nearly indistinguishable from text produced by humans.

LLMs can also engage in interactive conversations, answer questions, summarize dialogues and documents, and provide recommendations.

Because of their sheer size and the accelerated hardware they run on, LLMs can process vast amounts of textual data.

LLMs have a wide range of capabilities, such as creative writing for marketing, summarizing legal documents, preparing market research for financial teams, simulating clinical trials for healthcare, and writing code for software development.


Neural network layers

Transformer models effectively process natural language because they use neural networks to understand the nuances of human language.

These neural networks are computing systems modeled after the human brain.

Multiple layers of neural networks in a single LLM work together to process your input and generate output.


Embedding layer

The embedding layer converts your input text to vector representations called embeddings.

This layer captures complex relationships between the embeddings, so the model can understand the context of your input text.
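
A minimal sketch with PyTorch's nn.Embedding (the vocabulary size and embedding dimension are illustrative) shows how token IDs become dense vectors.

# Minimal sketch of an embedding layer: token IDs are mapped to dense vectors.
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=256)

token_ids = torch.tensor([[12, 345, 9, 781]])   # one sequence of 4 token IDs
vectors = embedding(token_ids)

print(vectors.shape)  # torch.Size([1, 4, 256]) - one 256-dim vector per token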


Feedforward layer

The feedforward layer contains several connected layers that transform the embeddings into richer, weighted representations.

This layer continues to contextualize the language and helps the model better understand the intent of the input text.
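
A minimal sketch of such a feedforward block in PyTorch, with illustrative dimensions and a GELU nonlinearity chosen as an assumption, shows two linear layers applied independently to each token's embedding.

# Minimal sketch of a position-wise feedforward block inside a transformer layer.
import torch
import torch.nn as nn

d_model, d_ff = 256, 1024
feedforward = nn.Sequential(
    nn.Linear(d_model, d_ff),   # expand each token's embedding
    nn.GELU(),                  # nonlinearity (choice is illustrative)
    nn.Linear(d_ff, d_model),   # project back to the model dimension
)

tokens = torch.randn(1, 4, d_model)   # batch of 1 sequence with 4 tokens
out = feedforward(tokens)
print(out.shape)                      # torch.Size([1, 4, 256])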


Attention mechanism

The attention mechanism lets the model focus on the most relevant parts of the input text.

This mechanism, a central part of the transformer architecture, helps the model produce more accurate output.
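
A minimal sketch of scaled dot-product attention, the standard formulation of this mechanism, with illustrative tensor shapes: each token scores every other token, and the scores become weights over the value vectors.

# Minimal sketch of scaled dot-product attention.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # similarity scores
    weights = torch.softmax(scores, dim=-1)                   # attention weights
    return weights @ v                                        # weighted sum of values

q = k = v = torch.randn(1, 4, 64)   # 1 sequence, 4 tokens, 64-dim vectors
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([1, 4, 64])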


LLM use cases

LLMs have diverse applications:

  • Automate customer service with chatbots and virtual assistants
  • Draft articles, blogs, and marketing materials
  • Generate, review, and explain code for developers
  • Summarize patient records and clinical documentation
  • Translate documents and real-time conversations
  • Analyze and summarize reports, legal documents, and financial statements
  • Support creative writing like poetry and storytelling
  • Enable personalized recommendations in e-commerce, education, and media

The following four categories highlight key areas where LLMs deliver value:


1. Improve customer experiences

  • Chatbots and virtual assistants - Automate responses, provide instant answers and support.
  • Call analytics - Extract insights from contact center calls to boost loyalty.
  • Agent assist - AI tools support human agents in problem solving and decision-making.

2. Boost employee productivity

  • Conversational search - Quickly find and summarize information through natural language.
  • Code generation - Accelerate development with code suggestions.
  • Automated report generation - Generate financial reports and projections automatically.

3. Enhance creativity and content creation

  • Marketing - Create blog posts, social media updates, email newsletters.
  • Sales - Generate personalized emails and sales scripts.
  • Product development - Generate design prototypes and optimize based on feedback.
  • Media and entertainment - Create scripts and dialogues for movies, TV, games.
  • News generation - Generate articles and summaries from raw data.

4. Accelerate process optimization

  • Document processing - Extract and summarize data from documents.
  • Fraud detection - Learn fraud patterns to train robust detection systems.
  • Supply chain optimization - Evaluate scenarios to improve logistics and reduce costs.
