3D AI Tools
Free 3D AI tools for creating, optimizing, and manipulating 3D assets for games, films, and design projects, boosting your creative process.
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting can create high-quality 3D content from text prompts. It uses edge, depth, normal, and scribble maps in a multi-view diffusion model, enhancing 3D shapes with a unique hybrid guidance method.
StyleGaussian, on the other hand, enables instant transfer of any image's style to a 3D scene at 10fps while preserving strict multi-view consistency.
SplattingAvatar can generate photorealistic real-time human avatars using a mix of Gaussian Splatting and triangle mesh geometry. It achieves over 300 FPS on modern GPUs and 30 FPS on mobile devices, allowing for detailed appearance modeling and various animation techniques.
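Both StyleGaussian and SplattingAvatar build on Gaussian Splatting, which renders each pixel by alpha-compositing depth-sorted Gaussian splats front to back. A minimal sketch of that compositing step (standalone illustration of the general technique, not code from either project):

```python
# Minimal sketch of the front-to-back alpha compositing at the core of
# Gaussian Splatting renderers: each depth-sorted splat contributes its
# color weighted by its opacity and the remaining transmittance.
# Splat values below are illustrative, not from any real scene.

def composite(splats):
    """splats: list of (rgb, alpha) tuples sorted front to back."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for rgb, alpha in splats:
        weight = alpha * transmittance
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= (1.0 - alpha)
        if transmittance < 1e-4:  # early exit once the pixel is nearly opaque
            break
    return color, transmittance

# A half-opaque red splat in front of a fully opaque blue one
# blends to equal parts red and blue.
pixel, t = composite([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)])
```

Real renderers do this per pixel on the GPU over thousands of projected 3D Gaussians, which is what makes the 300 FPS figures above plausible.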
TripoSR can generate high-quality 3D meshes from a single image in under 0.5 seconds.
ViewDiff is a method that can generate high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from a single text prompt or a single posed image.
GEM3D is a deep, topology-aware generative model of 3D shapes. The method is able to generate diverse and plausible 3D shapes from user-modeled skeletons, making it possible to draw the rough structure of an object and have the model fill in the rest.
MeshFormer can generate high-quality 3D textured meshes from just a few 2D images in seconds.
SPA-RP can create 3D textured meshes and estimate camera positions from one or a few 2D images. It uses 2D diffusion models to quickly understand 3D space, achieving high-quality results in about 20 seconds.
[FlashTex](https://flashtex.github.io) can texture an input 3D mesh given a user-provided text prompt. These generated textures can also be relit properly in different lighting environments.
Argus3D can generate 3D meshes from images and text prompts, as well as unique textures for its generated shapes. Just imagine composing a 3D scene and filling it with objects by pointing at a space and describing in natural language what you want to place there.
GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.
LGM can generate high-resolution 3D models from text prompts or single-view images. It uses a fast multi-view Gaussian representation, producing models in under 5 seconds while maintaining high quality.
AToM is a text-to-mesh framework that can generate high-quality textured 3D meshes from text prompts in less than a second. The method is optimized across multiple prompts and is able to create diverse objects it wasn't trained on.
GALA can take a single-layer clothed 3D human mesh and decompose it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create new clothed human avatars in any pose.
GARField can break down 3D scenes into meaningful groups. It improves the accuracy of object clustering and allows for better extraction of individual objects and their parts.
RoHM can reconstruct complete, plausible 3D human motions from monocular videos with support for recognizing occluded joints! So, basically motion tracking on steroids but without the need for an expensive setup.
Motion tracking is one thing, generating motion from text another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
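The multi-track timeline idea can be pictured as timed text prompts whose intervals may overlap, with overlapping prompts driving the motion simultaneously. A hypothetical sketch of such a timeline (the data structure and function names are illustrative, not STMC's actual API):

```python
# Hypothetical sketch of a multi-track prompt timeline in the spirit of
# STMC: each prompt covers a time interval, and prompts whose intervals
# overlap describe actions performed at the same time.
from dataclasses import dataclass

@dataclass
class TimedPrompt:
    text: str
    start: float  # seconds
    end: float

def active_at(timeline, t):
    """Return the prompts that should drive the motion at time t."""
    return [p.text for p in timeline if p.start <= t < p.end]

timeline = [
    TimedPrompt("walk in a circle", 0.0, 6.0),
    TimedPrompt("wave the right hand", 2.0, 4.0),
]
```

At t=3.0 both prompts are active, so the generated motion should combine walking with the wave; at t=5.0 only the walking prompt remains.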
Real3D-Portrait is a one-shot 3D talking portrait generation method. This one is able to generate realistic videos with natural torso movement and switchable backgrounds.
Audio2Photoreal can generate full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, the model is able to output multiple possibilities of gestural motion for an individual, including face, body, and hands. The results are highly photorealistic avatars that can express crucial nuances in gestures such as sneers and smirks.
SIGNeRF is a new approach for fast and controllable NeRF scene editing and scene-integrated object generation. The method is able to generate new objects into an existing NeRF scene or edit existing objects within the scene in a controllable manner by either proxy object placement or shape selection.