AI Toolbox | AI Art Weekly

Mask²DiT

Mask²DiT can generate long videos with multiple scenes by aligning video segments with text descriptions.

14.10.25 · Project Page · Code · Text-to-Video

PINO

PINO can generate realistic interactions among groups of any size by breaking down complex actions into simple pairwise motions. It uses pretrained diffusion models for two-person interactions and ensures realistic movement with physics-based rules, allowing control over character speed and position.

14.10.25 · Project Page · Code · Text-to-Motion
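
To make the pairwise idea in the PINO entry concrete, here is a minimal sketch of how a group of any size can be listed as two-person pairs. This is not PINO's code. The function name and the choice to enumerate every unordered pair are assumptions made for illustration; the entry does not say which pairs the method actually uses.

```python
from itertools import combinations


def pairwise_decomposition(characters):
    """Return every unordered pair in the group (hypothetical helper).

    Assumption: each pair could then be handed to a two-person
    interaction model. The real method may select pairs differently.
    """
    return list(combinations(characters, 2))


if __name__ == "__main__":
    group = ["A", "B", "C", "D"]
    for first, second in pairwise_decomposition(group):
        print(f"{first} <-> {second}")
```

A group of n characters yields n(n-1)/2 pairs under this assumption, so the number of two-person problems grows quadratically with group size.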

Motion-2-to-3

Motion-2-to-3 can generate realistic 3D human motions from text prompts using 2D motion data from videos. It improves motion diversity and efficiency by predicting consistent joint movements and root dynamics with a multi-view diffusion model.

11.10.25 · Project Page · Code · 3D Motion Generation

OmniPart

OmniPart can generate 3D objects from a single image by planning their structure and then creating them.

09.10.25 · Project Page · Code · Demo · 3D Object Generation · 3D Editing

DiffVSR

DiffVSR can upscale and restore videos by improving their resolution while keeping details clear and stable across frames.

06.10.25 · Project Page · Code · Video Restoration · Video Upscaling

IntrinsiX

IntrinsiX can generate high-quality PBR maps from text descriptions. It helps with re-lighting, material editing, and texture generation, producing detailed and coherent images.

30.09.25 · Project Page · Code · Text-to-Image · Image Editing · 3D Texture Generation

MeshMosaic

MeshMosaic can generate high-resolution 3D meshes with over 100,000 triangles. It breaks shapes into smaller patches for better detail and accuracy, outperforming other methods that usually handle only 8,000 faces.

30.09.25 · Project Page · Code · 3D Mesh Generation

Manipulation by Analogy

Manipulation by Analogy can change audio textures by learning from paired speech examples. It allows users to add, remove, or replace sounds, and it works well in real-world situations beyond just speech.

27.09.25 · Project Page · Code · Audio Editing

Bokeh Diffusion

Bokeh Diffusion can control defocus blur in text-to-image diffusion models by using a physical defocus blur parameter. It allows for flexible blur adjustments while preserving scene structure and supports real image editing through inversion.

27.09.25 · Project Page · Code · Text-to-Image · Image Editing · Controllable Image Generation
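
The Bokeh Diffusion entry does not say how its physical defocus blur parameter is defined. As general optics background only, the sketch below computes the thin-lens circle of confusion, one common physical measure of defocus blur. The function and parameter names are hypothetical and are not taken from the project.

```python
def circle_of_confusion_mm(focal_mm, f_number, focus_mm, subject_mm):
    """Thin-lens blur-circle diameter on the sensor, in millimetres.

    Assumption: standard thin-lens model, all distances measured from
    the lens. This is textbook optics, not the project's parameterization.
    """
    aperture_mm = focal_mm / f_number                 # aperture diameter
    magnification = focal_mm / (focus_mm - focal_mm)  # at the focus plane
    return aperture_mm * magnification * abs(subject_mm - focus_mm) / subject_mm


if __name__ == "__main__":
    # 50 mm lens focused at 2 m, object at 4 m, two aperture settings.
    for n in (2.0, 8.0):
        blur = circle_of_confusion_mm(50.0, n, 2000.0, 4000.0)
        print(f"f/{n:g}: {blur:.3f} mm")
```

In this model an object on the focus plane has zero blur, and a wider aperture (smaller f-number) gives proportionally more blur at the same distances.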

Lyra

Lyra can generate 3D scenes from a single image or video. It uses a method that allows real-time rendering and dynamic scene generation without needing multiple views for training.

24.09.25 · Project Page · Code · Text-to-3D · Image-to-3D · Video-to-3D

RealisMotion

RealisMotion can generate human videos with realistic motions by separating four key elements: the subject, background, movement path, and actions. It uses a 3D world coordinate system for better motion editing and employs text-to-video diffusion models for high-quality results.

23.09.25 · Project Page · Code · Text-to-Video

CapStARE

CapStARE can achieve high accuracy in gaze estimation. It runs in real time at about 8 ms per frame and handles extreme head poses well, making it well suited to interactive systems.

22.09.25 · Code · Video Analysis

Follow-Your-Click

Follow-Your-Click can animate specific regions of an image with a simple user click and a short motion prompt, and lets users control the speed of the animation.

17.09.25 · Project Page · Code · Image-to-Video

Animate-X++

Animate-X++ can animate characters from a single image and a pose sequence while creating dynamic backgrounds.

17.09.25 · Project Page · Code · Image-to-Video

HuMo

HuMo can generate high-quality human-centric videos from text, images, and audio. It ensures that the subjects are preserved and the audio matches the visuals, using advanced training methods for better control.

17.09.25 · Project Page · Code · Text-to-Video · Audio-to-Video · Image-to-Video

Diffuman4D

Diffuman4D can generate high-quality, 4D-consistent videos of human performances from just a few input videos. It uses a spatio-temporal diffusion model to improve the quality of the videos, making them more realistic and consistent than other methods.

11.09.25 · Project Page · Code · Video Inpainting · Video Restoration · Video Editing

InstantRestore

InstantRestore can restore badly damaged face images in near real-time. It uses a single-step image diffusion model and a small set of reference images to keep the person’s identity.

10.09.25 · Project Page · Code · Image Restoration

PeRFlow

PeRFlow is a low-step method from ByteDance that accelerates diffusion models such as Stable Diffusion so they generate images faster. It is compatible with various fine-tuned, stylized SD models as well as SD-based generation and editing pipelines such as ControlNet and Wonder3D.

08.09.25 · Project Page · Code · Controllable Image Generation · Personalized Image Generation

Synthesizing Moving People with 3D Control

3DHM can animate people with 3D camera control from a single image and a given target video motion sequence.

02.09.25 · Project Page · Code · 3D Object Generation · Motion Generation

SemLayoutDiff

SemLayoutDiff can generate diverse 3D indoor scenes by creating detailed semantic maps and placing furniture while considering doors and windows.

29.08.25 · Project Page · Code · 3D Scene Generation