3D AI Tools
Free 3D AI tools for creating, optimizing, and manipulating 3D assets for games, films, and design projects, boosting your creative process.
MeshFormer can generate high-quality 3D textured meshes from just a few 2D images in seconds.
SpaRP can create 3D textured meshes and estimate camera positions from one or a few 2D images. It uses 2D diffusion models to quickly understand 3D space, achieving high-quality results in about 20 seconds.
[FlashTex](https://flashtex.github.io) can texture an input 3D mesh given a user-provided text prompt. These generated textures can also be relit properly in different lighting environments.
Argus3D can generate 3D meshes from images and text prompts as well as unique textures for its generated shapes. Just imagine composing a 3D scene and filling it with objects by pointing at a space and using natural language to describe what you want to place there.
GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.
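To make that two-stage idea concrete, here is a minimal Python sketch of the pipeline shape: a language model proposes per-object prompts and bounding boxes, and each object is then refined under that layout. Everything here (`LayoutItem`, `propose_layout`, `refine_object`) is hypothetical and hard-coded for illustration, not GALA3D's actual API.

```python
# Sketch of an LLM-layout -> per-object-refinement pipeline (hypothetical names).
from dataclasses import dataclass

@dataclass
class LayoutItem:
    prompt: str                           # text describing one object
    center: tuple[float, float, float]    # placement in scene coordinates
    size: tuple[float, float, float]      # bounding-box extents

def propose_layout(scene_prompt: str) -> list[LayoutItem]:
    """Stand-in for the LLM layout stage: map a scene description to
    per-object prompts and bounding boxes. Hard-coded here; the real
    method would parse an LLM response."""
    return [
        LayoutItem("a wooden dining table", (0.0, 0.0, 0.0), (2.0, 1.0, 0.8)),
        LayoutItem("a ceramic vase with flowers", (0.0, 0.0, 0.9), (0.3, 0.3, 0.5)),
    ]

def refine_object(item: LayoutItem) -> str:
    """Stand-in for layout-conditioned diffusion optimization of one object."""
    return f"optimized 3D asset for '{item.prompt}' in box {item.size} at {item.center}"

for item in propose_layout("a dining room with a table and a vase"):
    print(refine_object(item))
```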
LGM can generate high-resolution 3D models from text prompts or single-view images. It uses a fast multi-view Gaussian representation, producing models in under 5 seconds while maintaining high quality.
InterScene is a novel framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. Another step closer to completely dynamic game worlds and simulations. Check out an impressive demo below.
AToM is a text-to-mesh framework that can generate high-quality textured 3D meshes from text prompts in less than a second. The method is optimized across multiple prompts and is able to create diverse objects it wasn't trained on.
GALA can take a single-layer clothed 3D human mesh and decompose it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create new clothed human avatars in any pose.
GARField can break down 3D scenes into meaningful groups. It improves the accuracy of object clustering and allows for better extraction of individual objects and their parts.
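The grouping idea is easiest to see with a toy example: an affinity function decides whether two 3D points belong together, and the query granularity controls whether you get whole objects or their parts. The distance-based affinity below is a stand-in purely for illustration, not GARField's learned affinity field.

```python
# Toy illustration of granularity-controlled grouping (not GARField's method).
import numpy as np

def affinity(p, q, scale):
    """Hypothetical stand-in: points group together if closer than `scale`."""
    return np.linalg.norm(np.asarray(p) - np.asarray(q)) < scale

points = [(0.0, 0, 0), (0.1, 0, 0), (1.0, 0, 0), (1.05, 0, 0)]

def group(points, scale):
    groups = []
    for p in points:
        for g in groups:
            if affinity(p, g[0], scale):  # join the first compatible group
                g.append(p)
                break
        else:
            groups.append([p])            # otherwise start a new group
    return groups

print(len(group(points, 0.3)))  # small scale -> 2 fine groups ("parts")
print(len(group(points, 2.0)))  # large scale -> 1 coarse group ("object")
```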
RoHM can reconstruct complete, plausible 3D human motions from monocular videos with support for recognizing occluded joints! So, basically motion tracking on steroids but without the need for an expensive setup.
Motion tracking is one thing; generating motion from text is another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
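Conceptually, the input looks like a small multi-track schedule rather than one prompt. The sketch below shows one plausible way to model such a timeline; the field names and the per-body-part tracks are illustrative, not STMC's actual schema.

```python
# Toy model of a multi-track text timeline (illustrative, not STMC's schema).
from dataclasses import dataclass

@dataclass
class TextSpan:
    prompt: str
    start: float  # seconds
    end: float    # seconds

timeline = {
    "legs": [TextSpan("walk forward", 0.0, 4.0)],
    "arms": [TextSpan("wave with the right hand", 1.0, 3.0),
             TextSpan("put hands on hips", 3.0, 5.0)],
}

# Overlapping spans ("walk" + "wave") are blended by the model into one
# coherent motion; here we just list which prompts are active at t = 2 s.
t = 2.0
active = [s.prompt for spans in timeline.values() for s in spans
          if s.start <= t < s.end]
print(active)  # ['walk forward', 'wave with the right hand']
```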
Real3D-Portrait is a one-shot 3D talking portrait generation method. This one is able to generate realistic videos with natural torso movement and switchable backgrounds.
Audio2Photoreal can generate full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, the model is able to output multiple possibilities of gestural motion for an individual, including face, body, and hands. The results are highly photorealistic avatars that can express crucial nuances in gestures such as sneers and smirks.
SIGNeRF is a new approach for fast and controllable NeRF scene editing and scene-integrated object generation. The method can generate new objects in an existing NeRF scene or edit existing objects within it, controlled either by proxy object placement or by shape selection.
En3D can generate high-quality 3D human avatars from 2D images without needing existing assets.
DreamGaussian4D can generate animated 3D meshes from a single image. The method is able to generate diverse motions for the same static model and does so in about 4.5 minutes, compared to the several hours other methods take.
Spacetime Gaussian Feature Splatting is a novel dynamic scene representation that captures static, dynamic, and transient content within a scene and can render it at 8K resolution and 60 FPS on an RTX 4090.
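A simplified way to picture the representation: each Gaussian carries time-dependent parameters, so its position can move and its opacity can fade in and out, which is how transient content comes and goes. The parameterization below (polynomial motion plus a temporal bell curve on opacity) is a rough sketch under those assumptions, not the paper's exact formulation.

```python
# Rough sketch of a time-conditioned Gaussian (illustrative parameterization).
import numpy as np

class SpacetimeGaussian:
    def __init__(self, pos_coeffs, opacity, t_center, t_scale):
        self.pos_coeffs = np.asarray(pos_coeffs)  # (k, 3) polynomial coefficients
        self.opacity = opacity                    # base spatial opacity
        self.t_center = t_center                  # when this Gaussian "exists"
        self.t_scale = t_scale                    # how long it persists

    def position(self, t: float) -> np.ndarray:
        powers = t ** np.arange(len(self.pos_coeffs))  # [1, t, t^2, ...]
        return powers @ self.pos_coeffs               # position at time t

    def temporal_opacity(self, t: float) -> float:
        # Opacity peaks at t_center and decays away from it, so transient
        # content simply fades in and out of the rendered scene.
        return self.opacity * np.exp(-0.5 * ((t - self.t_center) / self.t_scale) ** 2)

g = SpacetimeGaussian(pos_coeffs=[[0, 0, 0], [0.1, 0, 0]], opacity=0.9,
                      t_center=2.0, t_scale=0.5)
print(g.position(2.0), g.temporal_opacity(2.0))  # drifts along x, peak opacity at t=2
```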
RelightableAvatar is another method that can create relightable and animatable neural avatars from monocular video.
HAAR can generate realistic 3D hairstyles from text prompts. It uses 3D hair strands to create detailed hair structures and allows for physics-based rendering and simulation.
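For a sense of what strand-based means, here is a tiny sketch: each strand is a polyline of 3D points rooted on the scalp, and a hairstyle is simply thousands of such strands. The helper below is hypothetical and only illustrates the data layout, not HAAR's actual format.

```python
# Tiny illustration of a strand-based hair representation (hypothetical layout).
import numpy as np

def make_strand(root, direction, n_points=16, segment=0.01):
    """Hypothetical helper: grow a straight strand from a scalp root point."""
    steps = np.arange(n_points)[:, None] * segment
    return root + steps * np.asarray(direction)  # (n_points, 3) polyline

scalp_roots = np.random.rand(1000, 3) * 0.1          # fake scalp sample points
hairstyle = [make_strand(r, (0.0, -1.0, 0.0)) for r in scalp_roots]
print(len(hairstyle), hairstyle[0].shape)            # 1000 strands, (16, 3) each
```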