AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
DragAnything can control the motion of any object in videos by letting users draw trajectory lines. It allows for separate motion control of multiple objects, including backgrounds.
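To make the interaction concrete, here is a hypothetical sketch (not the project's actual API) of the kind of input drag-based control implies: one drawn polyline per entity, resampled to one control point per output frame.

```python
import numpy as np

# One trajectory per entity the user wants to move, background included.
trajectories = {
    "car": [(120, 340), (150, 338), (185, 335), (220, 333)],         # drifts right
    "background": [(400, 200), (392, 200), (384, 200), (376, 200)],  # slow pan left
}

def resample(points: list[tuple[float, float]], num_frames: int) -> list[tuple[float, float]]:
    """Linearly resample a drawn polyline to one control point per video frame."""
    pts = np.asarray(points, dtype=float)
    t_src = np.linspace(0.0, 1.0, len(pts))
    t_dst = np.linspace(0.0, 1.0, num_frames)
    return list(zip(np.interp(t_dst, t_src, pts[:, 0]),
                    np.interp(t_dst, t_src, pts[:, 1])))

per_frame_controls = {name: resample(path, num_frames=16)
                      for name, path in trajectories.items()}
```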
DEADiff can synthesize images that combine the style of a reference image with text prompts. It uses a Q-Former mechanism to separate style and meaning.
VideoElevator is a training-free, plug-and-play method that enhances the temporal consistency of text-to-video models and adds more photorealistic detail by leveraging text-to-image models.
ELLA is a lightweight approach that equips existing CLIP-based diffusion models with an LLM, improving prompt understanding and enabling text-to-image models to comprehend long, dense prompts.
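The underlying idea, very roughly: a small learned connector turns frozen-LLM token features into timestep-dependent conditioning for the diffusion U-Net. A hypothetical sketch (module names and shapes are illustrative, not ELLA's actual code):

```python
import torch
import torch.nn as nn

class TimestepAwareConnector(nn.Module):
    """Illustrative stand-in for ELLA's connector: learned queries, shifted by
    the diffusion timestep embedding, attend over frozen LLM token features to
    produce conditioning tokens for the U-Net's cross-attention."""

    def __init__(self, llm_dim: int = 4096, cond_dim: int = 768, num_tokens: int = 64):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, cond_dim))
        self.proj = nn.Linear(llm_dim, cond_dim)
        self.time_mlp = nn.Sequential(nn.Linear(cond_dim, cond_dim), nn.SiLU())
        self.attn = nn.MultiheadAttention(cond_dim, num_heads=8, batch_first=True)

    def forward(self, llm_feats: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # llm_feats: (B, T, llm_dim) hidden states from the frozen LLM.
        # t_emb:     (B, cond_dim) diffusion timestep embedding.
        kv = self.proj(llm_feats)
        q = self.queries.unsqueeze(0) + self.time_mlp(t_emb).unsqueeze(1)
        out, _ = self.attn(q, kv, kv)
        return out  # (B, num_tokens, cond_dim), fed to cross-attention
```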
SplattingAvatar can generate photorealistic real-time human avatars using a mix of Gaussian Splatting and triangle mesh geometry. It achieves over 300 FPS on modern GPUs and 30 FPS on mobile devices, allowing for detailed appearance modeling and various animation techniques.
The PixArt model family got a new addition with PixArt-Σ. The model is capable of directly generating images at 4K resolution. Compared to its predecessor, PixArt-α, it offers images of higher fidelity and improved alignment with text prompts.
UniCtrl can improve the quality and consistency of videos made by text-to-video models. It enhances how frames connect and move together without needing extra training, making videos look better and more diverse in motion.
TripoSR can generate high-quality 3D meshes from a single image in under 0.5 seconds.
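The model is open source; usage looks roughly like the sketch below, assuming the `tsr` package from the public repository (checkpoint and method names are assumptions, check the README):

```python
import torch
from PIL import Image
from tsr.system import TSR  # from the open-source TripoSR repository

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint and file names assumed from the public release.
model = TSR.from_pretrained(
    "stabilityai/TripoSR", config_name="config.yaml", weight_name="model.ckpt"
)
model.to(device)

image = Image.open("chair.png")
scene_codes = model([image], device=device)  # single feed-forward pass
meshes = model.extract_mesh(scene_codes)     # mesh extraction from the implicit field
meshes[0].export("chair_mesh.obj")
```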
ResAdapter can generate images with any resolution and aspect ratio for diffusion models. It works with various personalized models and processes images efficiently, using only 0.5M parameters while keeping the original style.
ViewDiff is a method that can generate high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from a single text prompt or a single posed image.
While LCM and Turbo have unlocked near real-time image diffusion, the quality is still a bit lacking. TCD, on the other hand, manages to generate images with both clarity and detailed intricacy without compromising on speed.
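Since TCD ships as a scheduler plus a distilled LoRA, it plugs straight into an existing pipeline. A minimal sketch using diffusers (checkpoint ids assumed from the public release):

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

# Load SDXL and swap in the TCD scheduler.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

# Distilled TCD LoRA for SDXL (repo id assumed from the project release).
pipe.load_lora_weights("h1t/TCD-SDXL-LoRA")
pipe.fuse_lora()

# Few-step sampling; eta trades off stochasticity against detail at low step counts.
image = pipe(
    "a portrait of a red fox in a library, highly detailed",
    num_inference_steps=4,
    guidance_scale=0.0,
    eta=0.3,
).images[0]
image.save("tcd_sample.png")
```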
OHTA can create detailed and usable hand avatars from just one image. It allows for text-to-avatar conversion and editing of hand textures and shapes, using data-driven hand priors to improve accuracy with limited input.
SongComposer can generate both lyrics and melodies using symbolic song representations. It aligns lyrics and melodies precisely and outperforms advanced models like GPT-4 in creating songs.
GEM3D is a deep, topology-aware generative model of 3D shapes. The method is able to generate diverse and plausible 3D shapes from user-modeled skeletons, making it possible to draw the rough structure of an object and have the model fill in the rest.
Multi-LoRA Composition focuses on the integration of multiple Low-Rank Adaptations (LoRAs) to create highly customized and detailed images. The approach is able to generate images with multiple elements without fine-tuning and without losing detail or image quality.
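For contrast, the standard baseline simply merges adapters with fixed scales, as in the diffusers adapter API below (model path and LoRA repos are placeholders); the paper's LoRA Switch and LoRA Composite instead keep each LoRA intact and combine their influence step by step during denoising.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Two independently trained LoRAs loaded under distinct adapter names
# (repo ids are placeholders, not from the paper).
pipe.load_lora_weights("path/to/character_lora", adapter_name="character")
pipe.load_lora_weights("path/to/style_lora", adapter_name="style")

# Naive composition: both adapters active at once with fixed weights.
pipe.set_adapters(["character", "style"], adapter_weights=[0.8, 0.7])

image = pipe("a knight in watercolor style", num_inference_steps=30).images[0]
```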
MeshFormer can generate high-quality 3D textured meshes from just a few 2D images in seconds.
SPA-RP can create 3D textured meshes and estimate camera positions from one or a few 2D images. It uses 2D diffusion models to quickly understand 3D space, achieving high-quality results in about 20 seconds.
SCG can be used by musicians to compose and improvise new piano pieces. It allows musicians to guide music generation by using rules like following a simple I-V chord progression in C major. Pretty cool.
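As a toy illustration of the rule-guided idea (everything below is hypothetical, not the paper's algorithm): a symbolic rule scores sampled candidate bars, and the best-scoring candidate is kept, steering generation without any gradients.

```python
# Target chords for a simple I-V progression in C major (MIDI pitches).
C_MAJOR_I = {60, 64, 67}  # C4, E4, G4
C_MAJOR_V = {67, 71, 74}  # G4, B4, D5

def rule_score(bar: set[int], chord: set[int]) -> float:
    """Fraction of the target chord's tones present in a generated bar."""
    return len(bar & chord) / len(chord)

def pick_bar(candidates: list[set[int]], chord: set[int]) -> set[int]:
    """Among sampled candidate bars, keep the one that best fits the rule."""
    return max(candidates, key=lambda bar: rule_score(bar, chord))

progression = [C_MAJOR_I, C_MAJOR_V, C_MAJOR_I, C_MAJOR_V]
candidates = [{60, 64, 67, 72}, {61, 66, 70}, {64, 67}]
best = pick_bar(candidates, progression[0])  # -> {60, 64, 67, 72}
```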
[FlashTex](https://flashtex.github.io) can texture an input 3D mesh given a user-provided text prompt. These generated textures can also be relit properly in different lighting environments.
Visual Style Prompting can generate images with a specific style from a reference image. Compared to other methods like IP-Adapter and LoRAs, Visual Style Prompting is better at retaining the style of the reference image while avoiding style leakage from text prompts.
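Mechanically, the method swaps the keys and values of selected self-attention layers with those computed from the reference image. A minimal sketch of that core operation (tensor names and shapes illustrative):

```python
import torch

def style_swapped_self_attention(q_gen: torch.Tensor,
                                 k_ref: torch.Tensor,
                                 v_ref: torch.Tensor) -> torch.Tensor:
    """Queries from the image being generated attend to the reference image's
    keys and values (shapes: (batch, tokens, dim)), so style carries over
    while content stays driven by the prompt."""
    scale = q_gen.shape[-1] ** -0.5
    attn = torch.softmax(q_gen @ k_ref.transpose(-2, -1) * scale, dim=-1)
    return attn @ v_ref
```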