AI Toolbox
A curated collection of 732 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

DiffPortrait360 can create high-quality 360-degree views of human heads from single images.
VACE basically adds ControlNet-style support to video models like Wan and LTX. It handles various video tasks such as generating videos from references, video inpainting, pose control, sketch-to-video, and more.
Perception-as-Control can achieve fine-grained motion control for image animation by creating a 3D motion representation from a reference image.
MVGenMaster can generate up to 100 new views from a single image using a multi-view diffusion model.
DiffuseKronA is a method that personalizes image generation directly from input images, avoiding the need for LoRAs. It generates high-quality images with accurate text-image correspondence and improved color distribution from diverse and complex input images and prompts.
LeX-Art can generate high-quality text-image pairs with better text rendering and design. It uses a prompt enrichment model called LeX-Enhancer and two optimized models, LeX-FLUX and LeX-Lumina, to improve color, position, and font accuracy.
TexGaussian can generate high-quality PBR materials for 3D meshes in one step. It produces albedo, roughness, and metallic maps quickly and with great visual quality, ensuring better consistency with the input geometry.
AccVideo can speed up video diffusion models by reducing the number of steps needed for video creation. It achieves an 8.5x faster generation speed compared to HunyuanVideo, producing high-quality videos at 720x1280 resolution and 24fps, which makes text-to-video generation way more efficient.
InterMimic can learn complex human-object interactions from imperfect motion capture data. It enables realistic simulations of full-body interactions with dynamic objects and works well with kinematic generators for better modeling.
FloVD can generate camera-controllable videos using optical flow maps to show motion.
MotionMatcher can customize text-to-video diffusion models using a reference video to transfer motion and camera framing to different scenes.
DIDiffGes can generate high-quality gestures from speech in just 10 sampling steps.
LayerAnimate can animate single anime frames from text prompts or interpolate between two frames, with or without sketch guidance. It allows users to adjust foreground and background elements separately.
StyleMaster can stylize videos by transferring artistic styles from images while keeping the original content clear.
PP-VCtrl can turn text-to-video models into customizable video generators. It uses control signals like Canny edges and segmentation masks to improve video quality and control without retraining the models, making it great for character animation and video editing.
SISO can generate and edit images using just one subject image without any training. It improves image quality, keeps the subject clear, and preserves the background better than other methods.
MagicMotion can animate objects in videos by controlling their paths with masks, bounding boxes, and sparse boxes.
DeepMesh can generate high-quality 3D meshes from point clouds and images.
ARTalk can generate realistic 3D head motions, including lip synchronization, blinking, and facial expressions, from audio in real-time.
InfiniteYou can generate high-quality images with FLUX and retain a person’s identity.