AI Toolbox

A curated collection of 849 free cutting edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.

AI Tools

3D Audio Image Image </assigned_modality> <assigned_tasks> Image Upscaling </assigned_tasks> <assigned_modality> Video Text Video

SGEdit

SGEdit can add, remove, replace, and adjust objects in images while keeping the quality of the image consistent.

18.11.24 · Project Page · Code · Image Editing

StyleSplat

StyleSplat can stylize 3D objects in scenes represented by 3D Gaussians from reference style images. The method is able to localize style transfer to specific objects and supports stylization with multiple styles.

17.11.24 · Project Page · Code · 3D Style Transfer

Long-LRM

Long-LRM can reconstruct large 3D scenes from up to 32 input images at 960x540 resolution in just 1.3 seconds on a single A100 80G GPU.

17.11.24 · Project Page · Code · Image-to-3D · 3D Scene Generation

CamI2V

CamI2V is a method which can generate videos from images with precise control over camera movements and text prompts.

16.11.24 · Project Page · Code · Image-to-Video

MagicQuill

MagicQuill enables efficient image editing with a simple interface that lets users easily insert elements and change colors. It uses a large language model to understand editing intentions in real time, improving the quality of the results.

14.11.24 · Project Page · Code · Demo · Image Editing

GarmentDreamer

GarmentDreamer can generate wearable, simulation-ready 3D garment meshes from text prompts. The method is able to generate diverse geometric and texture details, making it possible to create a wide range of different clothing items.

14.11.24 · Project Page · Code · Text-to-3D · 3D Object Generation

JoyVASA

JoyVASA can generate high-quality lip-sync videos of human and animal faces from a single image and speech clip.

14.11.24 · Project Page · Code · Model · Lip Syncing · Audio-to-Video · Image-to-Video

SPARK

SPARK can create high-quality 3D face avatars from regular videos and track expressions and poses in real time. It improves the accuracy of 3D face reconstructions for tasks like aging, face swapping, and digital makeup.

13.11.24 · Project Page · Code · 3D Object Generation · 3D Mesh Generation · 3D Editing

CHANGER

CHANGER can integrate an actor’s head onto a target body in digital content. It uses chroma keying for clear backgrounds and enhances blending quality with Head shape and long Hair augmentation (H2 augmentation) and a Foreground Predictive Attention Transformer (FPAT).

12.11.24 · Project Page · Code · Talking Head Generation

BPT

Scaling Mesh Generation via Compressive Tokenization can generate high-quality meshes with over 8,000 faces.

11.11.24 · Project Page · Code · 3D Mesh Generation

DAWN

DAWN can generate talking head videos from a single portrait and audio clip. It produces lip movements and head poses quickly, making it effective for creating long video sequences.

09.11.24 · Project Page · Code · Talking Head Generation

DimensionX

DimensionX can generate photorealistic 3D and 4D scenes from a single image using controllable video diffusion.

08.11.24 · Project Page · Code · Image-to-3D · Image-to-Video

SG-I2V

SG-I2V can control object and camera motion in image-to-video generation using bounding boxes and trajectories

08.11.24 · Project Page · Code · Image-to-Video · Controllable Video Generation

RayGauss

RayGauss can create realistic new views of 3D scenes, using Gaussian-based ray casting! It produces high-quality images quickly, running at 25 frames per second, and avoids common picture problems that older methods had.

06.11.24 · Project Page · Code · Image-to-Image · Image Restoration

CLoSD

CLoSD can control characters in physics-based simulations using text prompts. It can navigate to goals, strike objects, and switch between sitting and standing, all guided by simple instructions.

06.11.24 · Project Page · Code · Text-to-Motion · Motion Generation

Generalizable Implicit Motion Modeling for Video Frame Interpolation

GIMM is a new video interpolation method that uses motion modelling to predict motion between frames.

06.11.24 · Project Page · Code · Video Analysis

Regional-Prompting-FLUX

Regional-Prompting-FLUX adds regional prompting capabilities to diffusion transformers like FLUX. It effectively manages complex prompts and works well with tools like LoRA and ControlNet.

05.11.24 · Code · Text-to-Image

AutoVFX

AutoVFX can automatically create realistic visual effects in videos from a single image and text instructions.

05.11.24 · Project Page · Code · Video Editing

Adaptive Caching

Adaptive Caching can speed up video generation with Diffusion Transformers by caching important calculations. It can achieve up to 4.7 times faster video creation at 720p without losing quality.

05.11.24 · Project Page · Code · Text-to-Video

ZIM

ZIM can generate precise matte masks from segmentation labels, enabling zero-shot image matting.

05.11.24 · Project Page · Code · Image Segmentation