Text-to-3D
Free text-to-3D AI tools for quickly generating 3D assets for games, films, and virtual environments, optimizing your creative projects.
X-Oscar can generate high-quality 3D avatars from text prompts. It uses a step-by-step process for geometry, texture, and animation, while addressing issues like low quality and oversaturation through advanced techniques.
GaussianCube is an image-to-3D model that can generate high-quality 3D objects from multi-view images. It uses 3D Gaussian Splatting, converts the unstructured representation into a structured voxel grid, and then trains a 3D diffusion model to generate new objects.
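To make the "unstructured to structured" step more concrete, here is a minimal sketch of placing a loose set of 3D points (for example Gaussian centers) into a regular voxel grid. The blurb above does not say how GaussianCube performs this assignment, so the plain nearest-cell binning used here is only a stand-in, and the names `voxel_index` and `bin_points` are hypothetical.

```python
def voxel_index(point, grid_min, grid_max, resolution):
    """Map a 3D point to the integer (i, j, k) cell of a regular grid.

    Assumption: simple nearest-cell binning, not GaussianCube's own method.
    """
    idx = []
    for p, lo, hi in zip(point, grid_min, grid_max):
        t = (p - lo) / (hi - lo)          # normalise the coordinate to [0, 1]
        i = int(t * resolution)           # scale to a cell index
        idx.append(min(max(i, 0), resolution - 1))  # clamp to the grid
    return tuple(idx)


def bin_points(points, grid_min, grid_max, resolution):
    """Group point indices by the voxel they fall into."""
    grid = {}
    for n, p in enumerate(points):
        cell = voxel_index(p, grid_min, grid_max, resolution)
        grid.setdefault(cell, []).append(n)
    return grid


# Three example points inside the unit cube, binned into a 4x4x4 grid.
points = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.15, 0.05, 0.2)]
grid = bin_points(points, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 4)
print(grid)
```

With naive binning, several points can land in the same cell while other cells stay empty. A fixed-size grid that a diffusion model can train on would need a more careful assignment than this.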
Garment3DGen can stylize the geometry and textures of garments from 2D images and 3D meshes! The results can be fitted on top of parametric bodies and simulated. It could be used for hand-garment interaction in VR or to turn sketches into 3D garments.
TexDreamer can generate high-quality 3D human textures from text and images. It uses a smart fine-tuning method and a unique translator module to create realistic textures quickly while keeping important details intact.
HoloDreamer can generate enclosed 3D scenes from text descriptions. It does so by first creating a high-quality equirectangular panorama and then rapidly reconstructing the 3D scene using 3D Gaussian Splatting.
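For readers unfamiliar with equirectangular panoramas, the sketch below shows the general geometry of turning a panorama pixel plus a depth value into a 3D point, which is the kind of step needed before a scene can be reconstructed from such an image. It is not HoloDreamer's code: the function names and the axis convention (y up, z pointing at the image center) are assumptions made for illustration.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Convert a panorama pixel (u, v) to a unit viewing direction.

    Assumed convention: y is up, +z points at the image center.
    """
    # Longitude spans [-pi, pi) left to right; latitude spans pi/2 (top) to -pi/2 (bottom).
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def unproject(u, v, depth, width, height):
    """Scale the viewing direction by a depth value to get a 3D point."""
    dx, dy, dz = equirect_pixel_to_direction(u, v, width, height)
    return (dx * depth, dy * depth, dz * depth)


# A point two units away, seen at the exact center of a 1024x512 panorama.
print(unproject(511.5, 255.5, 2.0, 1024, 512))
```

Points obtained this way could serve as starting positions for 3D Gaussians, though where the per-pixel depth comes from is outside the scope of this sketch.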
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting can create high-quality 3D content from text prompts. It uses edge, depth, normal, and scribble maps in a multi-view diffusion model, enhancing 3D shapes with a unique hybrid guidance method.
ViewDiff is a method that can generate high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from a single text prompt or a single posed image.
MeshFormer can generate high-quality 3D textured meshes from just a few 2D images in seconds.
GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.
LGM can generate high-resolution 3D models from text prompts or single-view images. It uses a fast multi-view Gaussian representation, producing models in under 5 seconds while maintaining high quality.
AToM is a text-to-mesh framework that can generate high-quality textured 3D meshes from text prompts in less than a second. The method is optimized across multiple prompts and is able to create diverse objects that it wasn’t trained on.
Motion tracking is one thing, generating motion from text another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
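A multi-track timeline like this can be pictured as a list of prompt intervals, each with a track, a start, and an end. The sketch below is only an illustration of that input format, not STMC's actual interface: the `Interval` class, the `active_prompts` helper, and the track names are hypothetical, and it assumes intervals on the same track do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    track: str    # hypothetical track name, e.g. "legs" or "arms"
    prompt: str   # text prompt that applies during this interval
    start: float  # start time in seconds (inclusive)
    end: float    # end time in seconds (exclusive)


def active_prompts(timeline, t):
    """Return {track: prompt} for every interval that covers time t.

    Assumes intervals on the same track do not overlap.
    """
    return {iv.track: iv.prompt for iv in timeline if iv.start <= t < iv.end}


timeline = [
    Interval("legs", "walk forward", 0.0, 4.0),
    Interval("arms", "wave with right hand", 2.0, 5.0),
    Interval("legs", "sit down", 4.0, 6.0),
]

print(active_prompts(timeline, 3.0))
# prints {'legs': 'walk forward', 'arms': 'wave with right hand'}
```

At 3.0 seconds two prompts are active at once on different tracks, which is the kind of overlap a single text prompt cannot express.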
En3D can generate high-quality 3D human avatars from 2D images without needing existing assets.
WonderJourney lets you wander through your favourite paintings, poems and haikus. The method can generate a sequence of diverse yet coherently connected 3D scenes from a single image or text prompt.
4D-fy can generate high-quality 4D scenes from text prompts. It combines the strengths of text-to-image and text-to-video models to create dynamic scenes with great visual quality and realistic motion.
LucidDreamer is a text-to-3D generation framework that is able to generate 3D models with high-quality textures and shapes. Higher quality means longer inference. This one takes 35 minutes on an A100 GPU.
Progressive3D can generate detailed 3D content from complex prompts by breaking the process into smaller editing steps. It lets users focus on specific areas for editing and improves results by highlighting differences in meaning.
HumanNorm is a novel approach for high-quality and realistic 3D human generation. It leverages normal maps, which enhance the 2D perception of 3D geometry. The results are quite impressive and comparable with PS3 games.
TECA can generate realistic 3D avatars from text descriptions. It combines traditional 3D meshes for faces and bodies with neural radiance fields (NeRF) for hair and clothing, allowing for high-quality, editable avatars and easy feature transfer between them.
Text2NeRF can generate 3D scenes from text descriptions by combining neural radiance fields (NeRF) with a text-to-image diffusion model. It creates high-quality textures and detailed shapes without needing extra training data, achieving better photo-realism and multi-view consistency than other methods.