217 changes: 114 additions & 103 deletions 09-building-image-applications/README.md
@@ -48,7 +48,7 @@

Let's start with DALL-E, which is a Generative AI model that generates images from text descriptions.

> [DALL-E is a combination of two models, CLIP and diffused attention](https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e?WT.mc_id=academic-105485-koreyst).

Check failure on line 51 in 09-building-image-applications/README.md (GitHub Actions / Check Broken URLs): link https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e?WT.mc_id=academic-105485-koreyst is broken.

- **CLIP**, is a model that generates embeddings, which are numerical representations of data, from images and text.

@@ -78,14 +78,22 @@
- **pillow**, to work with images in Python.
- **requests**, to help you make HTTP requests.

## Create and deploy an Azure OpenAI model

If you haven't done so already, follow the instructions on the [Microsoft Learn](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/create-resource?pivots=web-portal) page
to create an Azure OpenAI resource and deploy a model. Select DALL-E 3 as the model.

## Create the app

1. Create a file _.env_ with the following content:

```text
AZURE_OPENAI_ENDPOINT=<your endpoint>
AZURE_OPENAI_API_KEY=<your key>
AZURE_OPENAI_DEPLOYMENT="dall-e-3"
```

Locate this information in Azure Portal for your resource in the "Keys and Endpoint" section.
Locate this information in the Azure AI Foundry portal for your resource in the "Deployments" section.

1. Collect the above libraries in a file called _requirements.txt_ like so:

@@ -113,57 +121,54 @@

1. Add the following code in a file called _app.py_:

```python
import openai
import os
import requests
from PIL import Image
import dotenv

# import dotenv
dotenv.load_dotenv()

# Get endpoint and key from environment variables
openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
openai.api_key = os.environ['AZURE_OPENAI_API_KEY']

# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'


try:
    # Create an image by using the image generation API
    generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0,
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the generated image
    image_url = generation_response["data"][0]["url"]  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions
except openai.InvalidRequestError as err:
    print(err)

```python
import openai
import os
import requests
from PIL import Image
import dotenv
from openai import AzureOpenAI

# load environment variables from the .env file
dotenv.load_dotenv()

# configure Azure OpenAI service client
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    api_version="2024-02-01"
)
try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
        size='1024x1024', n=1,
        model=os.environ['AZURE_OPENAI_DEPLOYMENT']
    )

    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the generated image
    image_url = generation_response.data[0].url  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions (the v1 client raises BadRequestError for rejected requests)
except openai.BadRequestError as err:
    print(err)
```

Let's explain this code:
@@ -185,28 +190,26 @@
dotenv.load_dotenv()
```

- After that, we set the endpoint, key for the OpenAI API, version and type.
- After that, we configure the Azure OpenAI service client:

```python
# Get endpoint and key from environment variables
openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
openai.api_key = os.environ['AZURE_OPENAI_API_KEY']

# add version and type, Azure specific
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    api_version="2024-02-01"
)
```

- Next, we generate the image:

```python
# Create an image by using the image generation API
generation_response = openai.Image.create(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here
size='1024x1024',
n=2,
temperature=0,
)
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
    size='1024x1024', n=1,
    model=os.environ['AZURE_OPENAI_DEPLOYMENT']
)
```

The above code responds with a JSON object that contains the URL of the generated image. We can use the URL to download the image and save it to a file.
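
The download-and-save step described above can be sketched as a small helper. This is a minimal sketch rather than part of the lesson's `app.py`: the function name `save_image_bytes` and its parameters are hypothetical, and it accepts bytes that have already been downloaded so that it can run without network access.

```python
import os


def save_image_bytes(image_bytes: bytes, image_dir: str = "images",
                     filename: str = "generated-image.png") -> str:
    """Write downloaded image bytes to image_dir/filename and return the path.

    Hypothetical helper for illustration; the names are assumptions.
    """
    # Create the target directory if it does not exist yet
    os.makedirs(image_dir, exist_ok=True)

    # Build the full path and write the raw bytes in binary mode
    image_path = os.path.join(image_dir, filename)
    with open(image_path, "wb") as image_file:
        image_file.write(image_bytes)
    return image_path
```

In the app, this could be called as `save_image_bytes(requests.get(image_url).content)`, where `image_url` is taken from `generation_response.data[0].url`.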
@@ -222,14 +225,13 @@

Let's look at the code that generates the image in more detail:

```python
generation_response = openai.Image.create(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here
size='1024x1024',
n=2,
temperature=0,
)
```
```python
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
    size='1024x1024', n=1,
    model=os.environ['AZURE_OPENAI_DEPLOYMENT']
)
```

- **prompt**, is the text prompt that is used to generate the image. In this case, we're using the prompt "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils".
- **size**, is the size of the image that is generated. In this case, we're generating an image that is 1024x1024 pixels.
@@ -244,20 +246,29 @@

You can also do the following:

- **Perform edits**. By providing an existing image a mask and a prompt, you can alter an image. For example, you can add something to a portion of an image. Imagine our bunny image, you can add a hat to the bunny. How you would do that is by providing the image, a mask (identifying the part of the area for the change) and a text prompt to say what should be done.
- **Perform edits**. By providing an existing image, a mask, and a prompt, you can alter an image. For example, you can add something to a portion of an image. Imagine our bunny image: you can add a hat to the bunny. To do that, you provide the image, a mask (identifying the area to change), and a text prompt that says what should be done.
> Note: this is not supported in DALL-E 3.

Here is an example using GPT Image:

```python
response = client.images.edit(
    model="gpt-image-1",
    image=open("sunlit_lounge.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="A sunlit indoor lounge area with a pool containing a flamingo"
)
# gpt-image-1 returns base64-encoded image data rather than a URL
image_b64 = response.data[0].b64_json
```

```python
response = openai.Image.create_edit(
image=open("base_image.png", "rb"),
mask=open("mask.png", "rb"),
prompt="An image of a rabbit with a hat on its head.",
n=1,
size="1024x1024"
)
image_url = response['data'][0]['url']
```
The base image contains only the lounge with a pool, but the final image also has a flamingo:

<div style="display: flex; justify-content: space-between; align-items: center; margin: 20px 0;">
<img src="./images/sunlit_lounge.png" style="width: 30%; max-width: 200px; height: auto;">
<img src="./images/mask.png" style="width: 30%; max-width: 200px; height: auto;">
<img src="./images/sunlit_lounge_result.png" style="width: 30%; max-width: 200px; height: auto;">
</div>

The base image would only contain the rabbit but the final image would have the hat on the rabbit.

- **Create variations**. The idea is that you take an existing image and ask that variations are created. To create a variation, you provide an image and a text prompt and code like so:

@@ -280,16 +291,16 @@

> Prompt : "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils"

![Bunny on a horse holding a lollipop, version 1](./images/v1-generated-image.png?WT.mc_id=academic-105485-koreyst)
![Bunny on a horse holding a lollipop, version 1](./images/v1-generated-image.png)

Now let's run that same prompt just to see that we won't get the same image twice:

![Generated image of bunny on horse](./images/v2-generated-image.png?WT.mc_id=academic-105485-koreyst)
![Generated image of bunny on horse](./images/v2-generated-image.png)

As you can see, the images are similar, but not the same. Let's try changing the temperature value to 0.1 and see what happens:

```python
generation_response = openai.Image.create(
generation_response = client.images.generate(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here
size='1024x1024',
n=2
@@ -303,7 +314,7 @@
Let's therefore change our code and set the temperature to 0, like so:

```python
generation_response = openai.Image.create(
generation_response = client.images.generate(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here
size='1024x1024',
n=2,
@@ -313,8 +324,8 @@

Now when you run this code, you get these two images:

- ![Temperature 0, v1](./images/v1-temp-generated-image.png?WT.mc_id=academic-105485-koreyst)
- ![Temperature 0 , v2](./images/v2-temp-generated-image.png?WT.mc_id=academic-105485-koreyst)
- ![Temperature 0, v1](./images/v1-temp-generated-image.png)
- ![Temperature 0 , v2](./images/v2-temp-generated-image.png)

Here you can clearly see how the images resemble each other more.

@@ -394,17 +405,17 @@
import requests
from PIL import Image
import dotenv

from openai import AzureOpenAI
# import dotenv
dotenv.load_dotenv()

# Get endpoint and key from environment variables
openai.api_base = "<replace with endpoint>"
openai.api_key = "<replace with api key>"
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    api_version="2024-02-01"
)

# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'

disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

@@ -419,19 +430,19 @@
The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}"""
{disallow_list}
"""

prompt = f"""{meta_prompt}
Generate a monument of the Arc of Triumph in Paris, France, in the evening light, with a small child holding a teddy looking on.
"""

try:
# Create an image by using the image generation API
generation_response = openai.Image.create(
generation_response = client.images.generate(
prompt=prompt, # Enter your prompt text here
size='1024x1024',
n=2,
temperature=0,
n=1,
)
# Set the directory for the stored image
image_dir = os.path.join(os.curdir, 'images')
@@ -444,7 +455,7 @@
image_path = os.path.join(image_dir, 'generated-image.png')

# Retrieve the generated image
image_url = generation_response["data"][0]["url"] # extract image URL from response
image_url = generation_response.data[0].url # extract image URL from response
generated_image = requests.get(image_url).content # download the image
with open(image_path, "wb") as image_file:
image_file.write(generated_image)
@@ -454,7 +465,7 @@
image.show()

# catch exceptions
except openai.InvalidRequestError as err:
except openai.BadRequestError as err:
print(err)
```
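
The way the meta prompt wraps the user's request can be sketched separately from the API call. This is a minimal sketch under stated assumptions: `build_prompt` and `find_disallowed_terms` are hypothetical helper names, the meta prompt text reuses only the lines visible in this diff, and the local term check is an illustrative addition, not something the lesson's code does.

```python
from typing import List


def build_prompt(user_request: str, disallow_list: str) -> str:
    """Prefix a user request with a meta prompt that states the boundaries.

    Hypothetical helper; the meta prompt wording is taken from the lesson.
    """
    meta_prompt = (
        "The image needs to be in a 16:9 aspect ratio.\n\n"
        "Do not consider any input from the following that is not safe for work "
        "or appropriate for children.\n"
        f"{disallow_list}\n"
    )
    return f"{meta_prompt}\n{user_request.strip()}\n"


def find_disallowed_terms(user_request: str, disallow_list: str) -> List[str]:
    """Return disallowed terms that occur in the request (case-insensitive).

    Assumption: a naive substring match, shown only as an optional local check.
    """
    terms = [term.strip().lower() for term in disallow_list.split(",") if term.strip()]
    text = user_request.lower()
    return [term for term in terms if term in text]
```

A caller could run `find_disallowed_terms` before sending the request and skip the API call when it returns a non-empty list. Substring matching is crude (it would flag "adult" inside "adulthood"), so it complements the meta prompt rather than replacing it.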

Binary file added 09-building-image-applications/images/mask.png