AI Toolbox
A curated collection of 610 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
LinFusion can generate high-resolution images up to 16K in just one minute using a single GPU. It improves performance on various Stable Diffusion versions and works with pre-trained components like ControlNet and IP-Adapter.
ViewCrafter can generate high-quality 3D views from single or few images using a video diffusion model. It allows for precise camera control and is useful for real-time rendering and turning text into 3D scenes.
CSGO can perform image-driven style transfer and text-driven stylized synthesis. It uses a large dataset with 210k image triplets to improve style control in image generation.
HumanVid can generate videos from a character photo while allowing users to control both human and camera motions. It introduces a large-scale dataset that combines high-quality real-world and synthetic data, achieving state-of-the-art performance in camera-controllable human image animation.
Follow-Your-Canvas can outpaint videos at higher resolutions, from 512x512 to 1152x2048.
LogoMotion can turn logos from layered PDF files into content-aware animated HTML canvas animations. Very cool!
KEEP can enhance video face super-resolution by maintaining consistency across frames. It uses Kalman filtering to improve facial details, working well on both synthetic and real-world videos.
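For readers unfamiliar with Kalman filtering, here is a minimal one-dimensional sketch of the general idea: each new noisy measurement is blended with the running estimate, weighted by how uncertain each one is. This is not KEEP's implementation. The function name `kalman_smooth`, its parameters and the sample values are hypothetical, chosen only for illustration.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.1):
    """Smooth a sequence of noisy scalar measurements with a 1D Kalman filter.

    Illustrative only: assumes a constant underlying state and hand-picked
    noise variances (both are assumptions, not values from the KEEP paper).
    """
    estimate = measurements[0]  # start from the first measurement
    error_var = 1.0             # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed constant, so only uncertainty grows.
        error_var += process_var
        # Update: the gain decides how much to trust the new measurement.
        gain = error_var / (error_var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        error_var = (1.0 - gain) * error_var
        smoothed.append(estimate)
    return smoothed


noisy = [1.0, 1.2, 0.8, 1.1, 0.9]
print(kalman_smooth(noisy))
```

In a video setting the same predict-and-update loop would run over per-frame features rather than single numbers, which is what keeps details stable from one frame to the next.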
tps-inbetween can generate high-quality intermediate frames for animation line art. It effectively connects lines and fills in missing details, even during fast movements, using a method that models keypoint relationships between frames.
STA-V2A can generate high-quality audio from videos by extracting important features and using text for guidance. It uses a Latent Diffusion Model for audio creation and a new metric called Audio-Audio Align to measure how well the audio matches the video timing.
TVG can create smooth transition videos between two images without needing training. It uses diffusion models and Gaussian Process Regression for high-quality results and adds controls for better timing.
Iterative Object Count Optimization can improve object counting accuracy in text-to-image diffusion models.
SparseCraft can reconstruct 3D shapes and appearances from just three colored images. It uses a Signed Distance Function (SDF) and a radiance field, achieving fast training times of under 10 minutes without needing pretrained models.
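As a quick illustration of what a Signed Distance Function is, the sketch below defines the SDF of a sphere: negative inside the surface, zero on it, positive outside. SparseCraft learns its SDF from images. The closed-form `sphere_sdf` here is a hypothetical example, not part of that method.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to the surface of a sphere.

    Negative values are inside the sphere, zero is on the surface,
    positive values are outside.
    """
    return math.dist(point, center) - radius


for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(p, sphere_sdf(p))
```

The surface of a shape is the set of points where the function returns zero, which is why a learned SDF is enough to recover geometry.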
MagicFace can generate high-quality images of people in any style without needing training. It uses special attention methods for precise attribute alignment and feature injection, working for both single and multi-concept customization.
Generative Photomontage can combine parts of multiple AI-generated images using a brush tool. It enables creating new appearance combinations, correcting shapes and artifacts, and improving prompt alignment, outperforming existing image blending methods.
Filtered Guided Diffusion shows that image-to-image translation and editing don't necessarily require additional training. FGD adaptively applies a filter to the input of each diffusion step based on the output of the previous step, which makes the approach easy to implement.
Matryoshka Diffusion Models can generate high-quality images and videos using a NestedUNet architecture that denoises inputs at different resolutions. This method allows for strong performance at resolutions up to 1024x1024 pixels and supports effective training without needing specific examples.
DiffComplete can complete 3D shapes from incomplete scans using a diffusion-based method.
Puppet-Master can create realistic motion in videos from a single image using simple drag controls. It uses a fine-tuned video diffusion model and an all-to-first attention method to make high-quality videos.
Generative Camera Dolly can regenerate a video from any chosen perspective. Still very early, but imagine being able to change any shot or angle in a video after it’s been recorded!