Text-to-3D
Free text-to-3D AI tools for quickly generating 3D assets for games, films, and virtual environments, helping to speed up your creative projects.
HoloDreamer can generate enclosed 3D scenes from text descriptions. It does so by first creating a high-quality equirectangular panorama and then rapidly reconstructing the 3D scene using 3D Gaussian Splatting.
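To make that two-stage pipeline concrete, here is a minimal sketch, not HoloDreamer's actual code, of the lifting step such a pipeline relies on: turning an equirectangular panorama plus a per-pixel depth estimate into a 3D point cloud that could seed the Gaussian Splatting stage. The panorama and depth arrays are random placeholders.

```python
# Lift an equirectangular panorama + depth to a 3D point cloud (illustrative sketch).
import numpy as np

def panorama_to_points(rgb: np.ndarray, depth: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Lift an H x W equirectangular panorama to world-space points."""
    h, w, _ = rgb.shape
    v, u = np.mgrid[0:h, 0:w]
    lon = (u / w) * 2 * np.pi - np.pi          # azimuth in [-pi, pi)
    lat = np.pi / 2 - (v / h) * np.pi          # elevation in [-pi/2, pi/2]
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    points = dirs * depth[..., None]           # scale unit rays by per-pixel depth
    return points.reshape(-1, 3), rgb.reshape(-1, 3)

rgb = np.random.rand(512, 1024, 3)             # placeholder panorama
depth = np.random.uniform(1.0, 5.0, (512, 1024))
pts, colors = panorama_to_points(rgb, depth)
print(pts.shape, colors.shape)                 # (524288, 3) (524288, 3)
```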
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting can create high-quality 3D content from text prompts. It uses edge, depth, normal, and scribble maps in a multi-view diffusion model, enhancing 3D shapes with a unique hybrid guidance method.
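As a hedged illustration of one of those control signals, not the paper's own code, the snippet below computes a simple Sobel edge map from a reference image; in the actual method such edge, depth, normal, or scribble maps condition the multi-view diffusion model.

```python
# Sobel edge map as an example conditioning input (illustrative sketch only).
import numpy as np

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map of a grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    for i in range(3):
        for j in range(3):
            patch = pad[i:i + gray.shape[0], j:j + gray.shape[1]]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

gray = np.random.rand(64, 64)   # placeholder reference image
edges = sobel_edges(gray)
print(edges.shape)              # (64, 64)
```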
ViewDiff is a method that can generate high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from a single text prompt or a single posed image.
MeshFormer can generate high-quality 3D textured meshes from just a few 2D images in seconds.
GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.
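Below is a hypothetical sketch of the layout idea, not GALA3D's real interface: a language model proposes a coarse scene layout (per-object prompt, position, scale), and each entry would then be optimized as its own set of Gaussians under a layout-conditioned diffusion prior. Here the "LLM output" is simply hard-coded JSON.

```python
# Parse a coarse scene layout as an LLM might propose it (illustrative sketch).
import json
from dataclasses import dataclass

@dataclass
class LayoutItem:
    prompt: str                          # per-object text prompt
    center: tuple[float, float, float]   # object position in the scene
    scale: tuple[float, float, float]    # object bounding-box extents

llm_layout = json.loads("""
[
  {"prompt": "a wooden dining table", "center": [0.0, 0.0, 0.0], "scale": [1.6, 0.8, 0.9]},
  {"prompt": "a ceramic teapot",      "center": [0.2, 0.85, 0.1], "scale": [0.2, 0.2, 0.2]}
]
""")

layout = [LayoutItem(x["prompt"], tuple(x["center"]), tuple(x["scale"])) for x in llm_layout]
for item in layout:
    # In the real method, each item would become instance-level Gaussians
    # composed into one scene; here we just report the parsed layout.
    print(f"{item.prompt}: center={item.center}, scale={item.scale}")
```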
LGM can generate high-resolution 3D models from text prompts or single-view images. It uses a fast multi-view Gaussian representation, producing models in under 5 seconds while maintaining high quality.
AToM is a text-to-mesh framework that can generate high-quality textured 3D meshes from text prompts in less than a second. The method is optimized across multiple prompts and can generate diverse objects it wasn't trained on.
Motion tracking is one thing; generating motion from text is another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
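The snippet below is an assumed data structure, not STMC's actual interface, showing what such a multi-track timeline could look like: each span carries a prompt, a start time, and a duration, and a query returns every prompt active at a given moment so overlapping actions can be blended.

```python
# A toy multi-track prompt timeline (illustrative sketch).
from dataclasses import dataclass

@dataclass
class Span:
    prompt: str
    start: float      # seconds
    duration: float   # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration

timeline = [
    Span("walk forward", 0.0, 6.0),               # track 1: locomotion
    Span("wave with the right hand", 2.0, 3.0),   # track 2: upper body
    Span("sit down", 6.0, 3.0),
]

def active_prompts(t: float) -> list[str]:
    """Return all prompts whose span covers time t."""
    return [s.prompt for s in timeline if s.start <= t < s.end]

print(active_prompts(3.0))  # ['walk forward', 'wave with the right hand']
print(active_prompts(7.0))  # ['sit down']
```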
En3D can generate high-quality 3D human avatars from 2D images without needing existing assets.
WonderJourney lets you wander through your favourite paintings, poems, and haikus. The method can generate a sequence of diverse yet coherently connected 3D scenes from a single image or text prompt.
4D-fy can generate high-quality 4D scenes from text prompts. It combines the strengths of text-to-image and text-to-video models to create dynamic scenes with great visual quality and realistic motion.
LucidDreamer is a text-to-3D generation framework that is able to generate 3D models with high-quality textures and shapes. Higher quality means longer inference. This one takes 35 minutes on an A100 GPU.
Progressive3D can generate detailed 3D content from complex prompts by breaking the generation into a series of localized editing steps. It lets users constrain each edit to a specific region and improves results by focusing on the semantic differences between the source and target prompts.
HumanNorm is a novel approach to high-quality and realistic 3D human generation that leverages normal maps to enhance the model's 2D perception of 3D geometry. The results are quite impressive and comparable to PS3 games.
TECA can generate realistic 3D avatars from text descriptions. It combines traditional 3D meshes for faces and bodies with neural radiance fields (NeRF) for hair and clothing, allowing for high-quality, editable avatars and easy feature transfer between them.
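Here is a minimal sketch of the hybrid rendering idea, not TECA's code: the face and body come from a textured mesh render, hair and clothing from a NeRF render with an alpha channel, and the two layers are composited per pixel. All images are random placeholders.

```python
# Composite a NeRF layer (hair/clothing) over a mesh layer (face/body) — illustrative sketch.
import numpy as np

def composite(mesh_rgb: np.ndarray, nerf_rgb: np.ndarray, nerf_alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend the NeRF layer over the mesh layer, per pixel."""
    return nerf_alpha[..., None] * nerf_rgb + (1.0 - nerf_alpha[..., None]) * mesh_rgb

mesh_rgb = np.random.rand(256, 256, 3)    # rasterized body/face mesh
nerf_rgb = np.random.rand(256, 256, 3)    # volume-rendered hair/clothing
nerf_alpha = np.random.rand(256, 256)     # accumulated NeRF opacity
frame = composite(mesh_rgb, nerf_rgb, nerf_alpha)
print(frame.shape)  # (256, 256, 3)
```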
Text2NeRF can generate 3D scenes from text descriptions by combining neural radiance fields (NeRF) with a text-to-image diffusion model. It creates high-quality textures and detailed shapes without needing extra training data, achieving better photo-realism and multi-view consistency than other methods.
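For reference, the standard NeRF volume-rendering step that methods like Text2NeRF build on looks like the sketch below (textbook NeRF math, not the paper's own code): densities and colors sampled along a ray are composited into a single pixel color.

```python
# Standard NeRF volume-rendering quadrature along one ray (illustrative sketch).
import numpy as np

def render_ray(sigmas: np.ndarray, colors: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Composite per-sample densities and colors into one pixel color."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance to each sample
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

n = 64
sigmas = np.random.rand(n) * 5.0   # placeholder densities along the ray
colors = np.random.rand(n, 3)      # placeholder per-sample colors
deltas = np.full(n, 0.02)          # spacing between samples
print(render_ray(sigmas, colors, deltas))
```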
DragonDiffusion can edit images by moving, resizing, and changing the appearance of objects without needing to retrain the model. It lets users drag points on images for easy and precise editing.
Shap-E can generate complex 3D assets by producing parameters for implicit functions. It creates both textured meshes and neural radiance fields, and it works faster with better quality than the Point-E model.
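Shap-E is open source, and the snippet below is adapted from the text-to-3D example notebook in the openai/shap-e repository; treat it as a sketch and check the repo for the current API, since function names and arguments may have changed.

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
xm = load_model("transmitter", device=device)      # decodes latents into implicit functions
model = load_model("text300M", device=device)      # text-conditioned latent diffusion model
diffusion = diffusion_from_config(load_config("diffusion"))

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a chair that looks like an avocado"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode the sampled latent into a textured triangle mesh and save it.
mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
with open("avocado_chair.obj", "w") as f:
    mesh.write_obj(f)
```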
AvatarCraft can turn a text prompt into a high-quality 3D human avatar. It allows users to control the avatar’s shape and pose, making it easy to animate and reshape without retraining.
3DFuse can improve 3D scene generation by adding 3D awareness to 2D diffusion models. It builds a rough 3D structure from text prompts and uses depth maps for better realism in reconstructions.
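As a rough illustration of the depth-conditioning idea (assumptions only, not 3DFuse's code), the sketch below projects a coarse point cloud, standing in for the rough 3D structure, into a camera to produce a sparse depth map that could steer a 2D diffusion model toward a consistent viewpoint.

```python
# Project a coarse point cloud into a camera to get a sparse depth map (illustrative sketch).
import numpy as np

def project_depth(points: np.ndarray, f: float, size: int) -> np.ndarray:
    """Project camera-space points (N, 3) with z > 0 into a size x size depth map."""
    depth = np.full((size, size), np.inf)
    z = points[:, 2]
    u = (f * points[:, 0] / z + size / 2).astype(int)
    v = (f * points[:, 1] / z + size / 2).astype(int)
    ok = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    for ui, vi, zi in zip(u[ok], v[ok], z[ok]):
        depth[vi, ui] = min(depth[vi, ui], zi)   # keep the nearest surface per pixel
    depth[np.isinf(depth)] = 0.0                 # empty pixels stay background
    return depth

coarse_cloud = np.random.uniform([-1, -1, 2], [1, 1, 4], size=(5000, 3))
depth_map = project_depth(coarse_cloud, f=128.0, size=128)
print(depth_map.shape, depth_map.max())
```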