Rice research could make weird AI images a thing of the past: New diffusion model approach solves the aspect ratio problem
Moayed Haji Ali is a Rice University computer science doctoral student. Credit: Photo by Vicente Ordóñez-Román/Rice University
Abstract:
Generative artificial intelligence (AI) has notoriously struggled to create consistent images, often getting details like fingers and facial symmetry wrong. Moreover, these models can completely fail when prompted to generate images at different image sizes and resolutions.
Rice University computer scientists’ new method of generating images with pre-trained diffusion models could help correct such issues. Diffusion models are a class of generative AI models that “learn” by adding layer after layer of random noise to the images they are trained on and then generate new images by removing the added noise.
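The noise-adding and noise-removing idea described above can be illustrated with a minimal Python sketch. It is not taken from the Rice paper: the linear noise schedule, the function names (`make_alpha_bars`, `add_noise`, `remove_noise`) and the four-pixel toy "image" are all assumptions chosen for illustration, and the true noise stands in for what a trained network would have to predict.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, noise, alpha_bar):
    """Forward direction: blend clean pixels with Gaussian noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(pixels, noise)]


def remove_noise(noisy, predicted_noise, alpha_bar):
    """Reverse direction: estimate clean pixels from a noise prediction."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(y - b * e) / a for y, e in zip(noisy, predicted_noise)]


rng = random.Random(0)
pixels = [0.1, 0.5, 0.9, 0.3]  # toy 4-pixel "image" (illustrative only)
noise = [rng.gauss(0.0, 1.0) for _ in pixels]
alpha_bars = make_alpha_bars(num_steps=1000)

noisy = add_noise(pixels, noise, alpha_bars[500])
# A trained model would predict the noise; here the true noise is passed in.
recovered = remove_noise(noisy, noise, alpha_bars[500])
```

With a perfect noise prediction the clean pixels come back exactly; a real model only approximates the noise, which is why generation proceeds over many small denoising steps.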
Moayed Haji Ali, a Rice University computer science doctoral student, described the new approach, called ElasticDiffusion, in a peer-reviewed paper presented at the Institute of Electrical and Electronics Engineers (IEEE) 2024 Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle.
“Diffusion models like Stable Diffusion, Midjourney, and DALL-E create impressive results, generating fairly lifelike and photorealistic images,” Haji Ali said. “But they have a weakness: They can only generate square images. So, in cases where you have different aspect ratios, like on a monitor or a smartwatch … that’s where these models become problematic.”
If you tell a model like Stable Diffusion to create a non-square image, say one with a 16:9 aspect ratio, the elements used to build the generated image get repetitive. That repetition shows up as strange-looking deformities in the image or its subjects, like people with six fingers or a strangely elongated car.
The way these models are trained also contributes to the issue.
“If you train the model on only images that are a certain resolution, they can only generate images with that resolution,” said Vicente Ordóñez-Román, an associate professor of computer science who advised Haji Ali on his work alongside Guha Balakrishnan, assistant professor of electrical and computer engineering.
Ordóñez-Román explained that this is a problem endemic to AI known as overfitting, where an AI model becomes excessively good at generating data similar to what it was trained on, but cannot deviate far outside those parameters.
“You could solve that by training the model on a wider variety of images, but it’s expensive and requires massive amounts of computing power: hundreds, maybe even thousands of graphics processing units,” Ordóñez-Román said.
According to Haji Ali, the digital noise used by diffusion models can be translated into a signal with two data types: local and global. The local signal contains pixel-level detail information like the shape of an eye or the texture of a dog’s fur. The global signal contains more of an overall outline of the image.
“One reason diffusion models need help with non-square aspect ratios is that they usually package local and global information together,” said Haji Ali, who worked on synthesizing motion in AI-generated videos before joining Ordóñez-Román’s research group at Rice for his Ph.D. studies. “When the model tries to duplicate that data to account for the extra space in a non-square image, it results in visual imperfections.”
The ElasticDiffusion method in Haji Ali’s paper takes a different approach to creating an image. Instead of packaging both signals together, ElasticDiffusion separates the local and global signals into conditional and unconditional generation paths. It takes the difference between the conditional and unconditional model outputs, obtaining a score that contains global image information.
After that, the unconditional path with the local pixel-level detail is applied to the image in quadrants, filling in the details one square at a time. The global information (what the image aspect ratio should be and what the image is, such as a dog or a person running) remains separate, so there is no chance of the AI confusing the signals and repeating data. The result is a cleaner image at any aspect ratio, with no additional training needed.
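The separation described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the ElasticDiffusion implementation: `toy_unconditional` and `toy_conditional` are hypothetical stand-ins for a real diffusion model, the global signal is taken as the conditional prediction minus the unconditional one (the convention used in classifier-free guidance), and stretching that signal to the target size with nearest-neighbour upsampling is an assumption made here for simplicity.

```python
def toy_unconditional(patch):
    """Hypothetical stand-in for a model's unconditional noise prediction."""
    return [[0.5 * v for v in row] for row in patch]


def toy_conditional(image, prompt_strength):
    """Hypothetical stand-in for the prompt-conditioned noise prediction."""
    return [[0.5 * v + prompt_strength for v in row] for row in image]


def global_direction(image, prompt_strength):
    """Global signal: conditional prediction minus unconditional prediction."""
    cond = toy_conditional(image, prompt_strength)
    uncond = toy_unconditional(image)
    return [[c - u for c, u in zip(cr, ur)] for cr, ur in zip(cond, uncond)]


def local_score_by_patches(image, size):
    """Local signal: run the unconditional model one square patch at a time."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, size):
        for left in range(0, width, size):
            patch = [row[left:left + size] for row in image[top:top + size]]
            pred = toy_unconditional(patch)
            for dy, pred_row in enumerate(pred):
                out[top + dy][left:left + len(pred_row)] = pred_row
    return out


def upsample_nearest(grid, height, width):
    """Stretch a coarse grid to the target size (assumed nearest neighbour)."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[y * gh // height][x * gw // width] for x in range(width)]
            for y in range(height)]


def combined_score(image, coarse, prompt_strength, size, guidance):
    """Local detail plus guidance-weighted global direction."""
    local = local_score_by_patches(image, size)
    direction = upsample_nearest(
        global_direction(coarse, prompt_strength), len(image), len(image[0]))
    return [[l + guidance * d for l, d in zip(lr, dr)]
            for lr, dr in zip(local, direction)]


# A 2x4 (non-square) target filled in 2x2 squares, with a 2x2 coarse view.
image = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
coarse = [[1.5, 3.5], [5.5, 7.5]]
score = combined_score(image, coarse, prompt_strength=0.25, size=2, guidance=2.0)
```

The point of the sketch is structural: the patch loop only ever sees square inputs, while the global direction is computed once and kept apart, so the non-square target never has to be fed to the model in a shape it was not trained on.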
“This approach is a successful attempt to leverage the intermediate representations of the model to scale them up so that you get global consistency,” Ordóñez-Román said.
The only drawback to ElasticDiffusion relative to other diffusion models is time. Currently, Haji Ali’s method takes six to nine times as long to make an image. The goal is to reduce that to the same inference time as other models like Stable Diffusion or DALL-E.
“Where I’m hoping that this research is going is to define…why diffusion models generate these more repetitive parts and can’t adapt to these changing aspect ratios and come up with a framework that can adapt to exactly any aspect ratio regardless of the training, at the same inference time,” said Haji Ali.
By John Bogna
####
About Rice University
Located on a 300-acre forested campus in Houston, Rice University is consistently ranked among the nation’s top 20 universities by U.S. News & World Report. Rice has highly respected schools of architecture, business, continuing studies, engineering, humanities, music, natural sciences and social sciences and is home to the Baker Institute for Public Policy. With 4,574 undergraduates and 3,982 graduate students, Rice’s undergraduate student-to-faculty ratio is just under 6-to-1. Its residential college system builds close-knit communities and lifelong friendships, just one reason why Rice is ranked No. 1 for lots of race/class interaction, No. 2 for best-run colleges and No. 12 for quality of life by the Princeton Review. Rice is also rated as a best value among private universities by Kiplinger’s Personal Finance.
Contacts:
Silvia Cernea Clark
Rice University
Office: 713-348-6728
Copyright © Rice University