AI Toolbox
A curated collection of 611 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
TC4D can animate 3D scenes generated from text along arbitrary trajectories. I can see this being useful for generating 3D effects for movies or games.
TRAM can reconstruct human motion and camera movement from videos in dynamic settings. It reduces global motion errors by 60% and uses a video transformer model to accurately track body motion.
Attribute Control enables fine-grained control over attributes of specific subjects in text-to-image models. This lets you modify attributes like age, width, makeup, smile and more for each subject independently.
FlashFace can personalize photos by using one or a few reference face images and a text prompt. It keeps important details like scars and tattoos while balancing text and image guidance, making it useful for face swapping and turning virtual characters into real people.
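Balancing text and image guidance like this is usually done by composing classifier-free guidance terms. The sketch below shows a common two-branch scheme with separate weights for the reference-face branch and the text branch; it is illustrative only, and FlashFace's actual formulation and API may differ.

```python
import torch

def dual_guidance(eps_uncond, eps_ref, eps_ref_text, w_ref=2.0, w_text=5.0):
    """Blend three denoising predictions from the same diffusion step:
      eps_uncond   - no conditioning
      eps_ref      - conditioned on the reference face image(s) only
      eps_ref_text - conditioned on both the reference face(s) and the prompt
    Raising w_ref pulls the result toward the identity in the reference photos
    (scars, tattoos, etc.); raising w_text pulls it toward the text prompt."""
    return (eps_uncond
            + w_ref * (eps_ref - eps_uncond)
            + w_text * (eps_ref_text - eps_ref))

# Example: strongly preserve identity, moderately follow the prompt.
preds = [torch.randn(1, 4, 64, 64) for _ in range(3)]  # stand-ins for real UNet outputs
blended = dual_guidance(*preds, w_ref=3.0, w_text=4.5)
```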
TRIP is a new approach to image-to-video generation with better temporal coherence.
Make-It-Vivid generates high-quality texture maps for 3D biped cartoon characters from text instructions, making it possible to dress and animate characters based on prompts.
ThemeStation can generate a variety of 3D assets that match a specific theme from just a few examples. It uses a two-stage process to improve the quality and diversity of the models, allowing users to create 3D assets based on their own text prompts.
Spectral Motion Alignment is a framework that can capture complex and long-range motion patterns within videos and transfer them to video-to-video frameworks like MotionDirector, VMC, Tune-A-Video, and ControlVideo.
StreamingT2V enables long text-to-video generations featuring rich motion dynamics without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Videos can be up to 1200 frames, spanning 2 minutes, and can be extended for even longer durations.
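The general recipe behind long-video generation of this kind is autoregressive chunking: generate a short clip, then keep generating new clips conditioned on the last few frames so motion stays continuous. Here is a minimal, model-agnostic sketch of that loop; `generate_chunk` is a hypothetical stand-in for any short-clip text-to-video model and is not StreamingT2V's actual interface.

```python
from typing import Callable, List
import numpy as np

def generate_long_video(
    prompt: str,
    generate_chunk: Callable[[str, List[np.ndarray]], List[np.ndarray]],
    overlap: int = 8,
    total_frames: int = 1200,
) -> List[np.ndarray]:
    """Autoregressive chunk-wise generation. Each new chunk is conditioned on
    the last `overlap` frames generated so far and is assumed to reproduce
    them at its start, so those duplicates are dropped when appending."""
    video: List[np.ndarray] = list(generate_chunk(prompt, []))   # first chunk, no context
    while len(video) < total_frames:
        context = video[-overlap:]                  # short-term memory of recent motion
        chunk = generate_chunk(prompt, context)     # continue the motion from that context
        video.extend(chunk[overlap:])               # skip frames that duplicate the context
    return video[:total_frames]
```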
ReNoise can invert an input image back into the noise space of a diffusion model, so the reconstructed image can then be edited with text prompts.
AnyV2V can edit videos using prompt-based editing and style transfer without fine-tuning. It modifies the first frame of a video and generates the edited video while keeping high visual quality.
FouriScale can generate high-resolution, high-quality images of arbitrary sizes and aspect ratios from pre-trained diffusion models without additional training.
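FouriScale operates in the frequency domain of the pretrained model's feature maps. As a rough illustration of that kind of operation (not FouriScale's actual operators), the snippet below low-pass filters a feature map via FFT, keeping the coarse structure the model learned at its training resolution while suppressing the high frequencies that tend to cause repeated patterns at larger sizes.

```python
import torch

def fourier_low_pass(feat: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Keep only low spatial frequencies of a (B, C, H, W) feature map."""
    H, W = feat.shape[-2:]
    freq = torch.fft.fftshift(torch.fft.fft2(feat.float()), dim=(-2, -1))
    yy = torch.linspace(-1, 1, H, device=feat.device).view(H, 1)
    xx = torch.linspace(-1, 1, W, device=feat.device).view(1, W)
    mask = ((xx.abs() <= cutoff) & (yy.abs() <= cutoff)).float()  # centered box filter
    out = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return out.to(feat.dtype)

filtered = fourier_low_pass(torch.randn(1, 320, 96, 96))  # e.g. a UNet feature map
```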
FRESCO combines ControlNet with Ebsynth for zero-shot video translation that focuses on preserving the spatial and temporal consistency of the input frames.
You Only Sample Once can quickly create high-quality images from text in one step. It combines diffusion processes with GANs, allows fine-tuning of pre-trained models, and works well at higher resolutions without extra training.
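For context, the difference from standard diffusion sampling is that a distilled one-step generator maps pure noise straight to a clean latent instead of running 20-50 denoising iterations. A minimal sketch of that calling pattern, where `generator` is a hypothetical distilled denoiser rather than YOSO's actual interface:

```python
import torch

@torch.no_grad()
def one_step_sample(generator, text_emb, shape=(1, 4, 64, 64), device="cpu"):
    """Single forward pass from pure noise to a clean latent. `generator` is
    assumed to be a distilled diffusion model trained so that its prediction
    at the highest noise level is already a usable sample."""
    noise = torch.randn(shape, device=device)
    t_max = torch.full((shape[0],), 999, device=device)  # treat the input as pure noise
    return generator(noise, t_max, text_emb)             # one call, no iterative loop
```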
TexDreamer can generate high-quality 3D human textures from text and images. It uses an efficient fine-tuning strategy and a feature translator module to create realistic textures quickly while keeping important details intact.
AnimateDiff-Lightning can generate videos over ten times faster than AnimateDiff. It uses progressive adversarial diffusion distillation to combine multiple diffusion models into one motion module, improving style compatibility and achieving top performance in few-step video generation.
HoloDreamer can generate enclosed 3D scenes from text descriptions. It does so by first creating a high-quality equirectangular panorama and then rapidly reconstructing the 3D scene using 3D Gaussian Splatting.
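The bridge between those two stages is geometric: every pixel of an equirectangular panorama corresponds to a viewing direction on the sphere, so with per-pixel depth it can be lifted into a 3D point cloud that then seeds the Gaussian Splatting reconstruction. A small sketch of that lifting step using the standard spherical projection (not HoloDreamer's specific code):

```python
import numpy as np

def panorama_to_points(depth: np.ndarray, rgb: np.ndarray):
    """Lift an equirectangular panorama with per-pixel depth to a 3D point cloud.

    depth: (H, W) metric depth per pixel
    rgb:   (H, W, 3) colors
    Returns (N, 3) points and (N, 3) colors, e.g. to initialize 3D Gaussians."""
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    lon = (u + 0.5) / W * 2.0 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / H * np.pi      # latitude in [-pi/2, pi/2]
    dirs = np.stack([np.cos(lat) * np.sin(lon),    # unit ray direction per pixel
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    points = dirs * depth[..., None]
    return points.reshape(-1, 3), rgb.reshape(-1, 3)
```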
InTeX enables interactive text-to-texture synthesis for 3D content creation. It allows users to repaint specific areas and edit textures precisely, while a depth-aware inpainting model reduces 3D inconsistencies and speeds up generation.
StyleSketch is a method for extracting high-resolution stylized sketches from a face image. Pretty cool!
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting can create high-quality 3D content from text prompts. It uses edge, depth, normal, and scribble maps in a multi-view diffusion model, enhancing 3D shapes with a unique hybrid guidance method.