Hello there, my fellow dreamers, and welcome to issue #65 of AI Art Weekly! 👋 and a Happy New Year 🎉🎉🎉
I hope you all had some well-deserved downtime and are ready to kick off 2024 💥
Apparently even ML researchers take holiday breaks, so this issue is a bit shorter than usual, but I still managed to find some cool stuff for you to check out.
Let’s dive in:
- MoonShot is a new video generation model that can condition on both image and text inputs
- VideoDrafter creates content-consistent multi-scene videos
- SIGNeRF can edit NeRF scenes in a controllable manner
- Spacetime Gaussian Feature Splatting renders dynamic 8k scenes at 60fps
- DreamGaussian4D generates animated 3D meshes from a single image
- MACH creates lifelike 3D avatars from text descriptions
- En3D generates 3D animatable humans
- 3D-Fauna turns images of quadruped animals into 3D objects
- Personalized Restoration can restore images and preserve identities
- Auffusion is a new text-to-audio model
- and some gems!
News & Papers
MoonShot: Towards Controllable Video Generation and Editing with Multimodal Conditions
MoonShot is a new video generation model that can condition on both image and text inputs. The model is also able to integrate with pre-trained image ControlNet modules for geometry visual conditions, making it possible to generate videos with specific visual appearances and structures.
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
VideoDrafter is a framework for content-consistent multi-scene video generation. The model is able to convert a text prompt into a multi-scene script, generate reference images for each scene, and finally output a video clip for each scene.
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
SIGNeRF is a new approach for fast and controllable NeRF scene editing and scene-integrated object generation. The method can generate new objects within an existing NeRF scene or edit existing objects in the scene in a controllable manner, via either proxy object placement or shape selection.
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Spacetime Gaussian Feature Splatting is a novel dynamic scene representation that is able to capture static, dynamic, as well as transient content within a scene and can render them at 8K resolution and 60 FPS on an RTX 4090.
DreamGaussian4D: Generative 4D Gaussian Splatting
DreamGaussian4D can generate animated 3D meshes from a single image. The method can generate diverse motions for the same static model in about 4.5 minutes, compared to the several hours other methods require.
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes
MACH can create lifelike 3D avatars from text descriptions. The system generates fully realized 3D characters with detailed facial features, hair, and clothing within 2 minutes. The characters are highly realistic and animatable, making them ready for immediate use in various scenarios.
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
En3D is yet another method for generating 3D humans either from text or 2D images. This one is trained on millions of synthetic 2D images and is capable of producing visually realistic 3D humans that can be seamlessly rigged and animated.
3D-Fauna: Learning the 3D Fauna of the Web
3D-Fauna is able to turn a single image of a quadruped animal into an articulated, textured 3D mesh in a feed-forward manner, ready for animation and rendering.
Personalized Restoration via Dual-Pivot Tuning
Personalized Restoration is a method that can restore degraded images of faces while retaining the identity of the person using reference images. The method is able to edit the restored image using text prompts, enabling modifications like changing the color of the eyes or making the person smile.
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Auffusion is a Text-to-Audio system that generates audio from natural language prompts. The model can control various aspects of the audio, such as acoustic environment, material, pitch, and temporal order. It can also generate audio based on labels or be combined with an LLM to generate descriptive audio prompts.
- TF-T2V: A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
- Image Sculpting: Precise Object Editing with 3D Geometry Control
- ZeroShape: Regression-based Zero-shot Shape Reconstruction
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!