AI Toolbox | AI Art Weekly

Clearer Frames, Anytime

InterpAny-Clearer is a video frame interpolation method that is able to generate clearer and sharper frames compared to existing methods. Additionally, it introduces the ability to manipulate the interpolation of objects in a video independently, which could be useful for video editing tasks.

14.11.23 · Project Page · Code · Video Inpainting · Video Editing

I2VGen-XL

I2VGen-XL can generate high-quality videos from static images using a cascaded diffusion model. It achieves a resolution of 1280x720 and improves the flow of movement in videos through a two-stage process that separates detail enhancement from overall coherence.

07.11.23 · Project Page · Code · Image-to-Video

Consistent4D

Consistent4D is an approach for generating 4D dynamic objects from uncalibrated monocular videos. With the speed we’re progressing, it looks like dynamic 3D scenes from single-cam videos will be here sooner than I’ve expected the last few weeks.

06.11.23 · Project Page · Code · Video-to-4D

Mesh Neural Cellular Automata

Mesh Neural Cellular Automata (MeshNCA) is a method for directly synthesizing dynamic textures on 3D meshes without requiring any UV maps. The model can be trained using different targets such as images, text prompts, and motion vector fields. Additionally, MeshNCA allows several user interactions including texture density/orientation control, a grafting brush, and motion speed/direction control.

06.11.23 · Project Page · Code · Text-to-Texture · Image-to-Texture

VideoDreamer

VideoDreamer is a framework that is able to generate videos that contain the given subjects and simultaneously conform to text prompts.

02.11.23 · Project Page · Code · Text-to-Video

SEINE

SEINE is a short-to-long video diffusion model that focuses on generative transitions and predictions. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of clips. The model can also be used for image-to-video animation and autoregressive video prediction.

31.10.23 · Project Page · Code · Text-to-Video · Video Editing · Video Summarization

ZeroNVS

ZeroNVS is a 3D-aware diffusion model that is able to generate novel 360-degree views of in-the-wild scenes from a single real image.

27.10.23 · Project Page · Code · Image-to-3D

DreamCraft3D

DreamCraft3D can create high-quality 3D objects from a single prompt. It uses a 2D reference image to guide the sculpting of the 3D object and then improves texture fidelity by running it through a fine-tuned Dreambooth model.

25.10.23 · Project Page · Code · 3D Object Generation · 3D Scene Generation

FreeNoise

FreeNoise is a method that can generate longer videos with up to 512 frames from multiple text prompts. That’s about 21 seconds for a 24fps video. The method doesn’t require any additional fine-tuning on the video diffusion model and only takes about 20% more time compared to the original diffusion process.

23.10.23 · Project Page · Code · Text-to-Video

Zero123++

Zero123++ can generate high-quality, 3D-consistent multi-view images from a single input image using an image-conditioned diffusion model. It fixes common problems like blurry textures and misaligned shapes, and includes a ControlNet for better control over the image creation process.

23.10.23 · Code · Image-to-3D

Wonder3D

Wonder3D is able to convert a single image into a high-fidelity 3D model, complete with textured meshes and color. The entire process takes only 2 to 3 minutes.

23.10.23 · Project Page · Code · Image-to-3D

Head360

Head360 can generate a parametric 3D full-head model you can view from any angle! It works from just one picture, letting you change expressions and hairstyles quickly.

20.10.23 · Project Page · Code · 3D Editing · 3D Avatar Generation

OIR-Diffusion

Object-aware Inversion and Reassembly can edit multiple objects in an image by finding the best steps for each edit. It allows for precise changes in shapes, colors, and materials while keeping the rest of the image intact.

18.10.23 · Project Page · Code · Image Editing

Progressive3D

Progressive3D can generate detailed 3D content from complex prompts by breaking the process into smaller editing steps. It lets users focus on specific areas for editing and improves results by highlighting differences in meaning.

18.10.23 · Project Page · Code · Text-to-3D · 3D Editing · 3D Object Generation

HyperHuman

HyperHuman is a text-to-image model that focuses on generating hyper-realistic human images from text prompts and a pose image. The results are pretty impressive and the model is able to generate images in different styles and up to a resolution of 1024x1024.

12.10.23 · Project Page · Code · Image Editing · Image Restoration

MotionDirector

MotionDirector is a method that can train text-to-video diffusion models to generate videos with the desired motions from a reference video.

12.10.23 · Project Page · Code · Text-to-Video · Video Editing

ScaleCrafter

ScaleCrafter can generate ultra-high-resolution images up to 4096x4096 and videos at 2048x1152 using pre-trained diffusion models. It reduces problems like object repetition and allows for custom aspect ratios, achieving excellent texture detail.

11.10.23 · Project Page · Code · Image Upscaling · Image Restoration

Uni-paint

Uni-paint can perform image inpainting using different methods like text, strokes, and examples. It uses a pretrained Stable Diffusion model, allowing it to adapt to new images without extra training.

11.10.23 · Code · Image Inpainting

SadTalker

SadTalker can generate talking head videos from a single image and audio. It creates realistic head movements and expressions by linking audio to 3D motion, improving video quality and coherence.

10.10.23 · Project Page · Code · Talking Head Generation

FLATTEN

FLATTEN can improve the visual flow of edited videos by using optical flow in diffusion models. This method enhances the consistency of video frames without needing extra training.

09.10.23 · Project Page · Code · Text-to-Video · Video Editing