AI Art Weekly #65
Hello there, my fellow dreamers, and welcome to issue #65 of AI Art Weekly! And a Happy New Year!
I hope you all had some well-deserved downtime and are ready to kick off 2024!
Apparently even ML researchers take holiday breaks, so this issue is a bit shorter than usual, but I still managed to find some cool stuff for you to check out.
Let's dive in:
- MoonShot is a new video generation model that can condition on both image and text inputs
- VideoDrafter creates content-consistent multi-scene videos
- SIGNeRF can edit NeRF scenes in a controllable manner
- Spacetime Gaussian Feature Splatting renders dynamic 8k scenes at 60fps
- DreamGaussian4D generates animated 3D meshes from a single image
- MACH creates lifelike 3D avatars from text descriptions
- En3D generates 3D animatable humans
- 3D-Fauna turns images of quadruped animals into 3D objects
- Personalized Restoration can restore images and preserve identities
- Auffusion is a new text-to-audio model
- and some gems!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For the next issue I'm looking for mystery covers. The challenge runs for two weeks, so the reward is $50 and a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here. I'm looking forward to your submissions!
News & Papers
MoonShot: Towards Controllable Video Generation and Editing with Multimodal Conditions
MoonShot is a new video generation model that can condition on both image and text inputs. The model can also integrate with pre-trained image ControlNet modules for geometric visual conditions, making it possible to generate videos with specific visual appearances and structures.
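For the curious, here's a rough sketch of how a model could fuse both condition types: two separate cross-attention passes, one over text tokens and one over image tokens, merged back into the video tokens. This is my own illustration of the idea, not MoonShot's actual code, and all names and shapes are made up:

```python
import torch
import torch.nn as nn

class DualCrossAttention(nn.Module):
    """Hypothetical sketch: attend separately over text and image
    condition tokens, then fuse both results back into the video
    tokens with a residual sum. Not the paper's implementation."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_tokens, text_tokens, image_tokens):
        t, _ = self.text_attn(video_tokens, text_tokens, text_tokens)
        i, _ = self.image_attn(video_tokens, image_tokens, image_tokens)
        return video_tokens + t + i  # both conditions steer the video latents

x = torch.randn(1, 16, 320)    # video latent tokens (illustrative shape)
txt = torch.randn(1, 77, 320)  # text embedding tokens, e.g. from CLIP
img = torch.randn(1, 257, 320) # image embedding tokens
print(DualCrossAttention(320)(x, txt, img).shape)  # torch.Size([1, 16, 320])
```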
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
VideoDrafter is a framework for content-consistent multi-scene video generation. The model is able to convert a text prompt into a multi-scene script, generate reference images for each scene, and finally output a video clip for each scene.
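To make the flow concrete, here's a hypothetical pipeline skeleton mirroring the three stages described above. Every function in it is a stand-in of mine, not VideoDrafter's API:

```python
# Stage stubs for a script -> reference images -> clips pipeline.

def write_script(prompt: str, num_scenes: int = 4) -> list[str]:
    """Stage 1: an LLM would expand the prompt into per-scene descriptions."""
    return [f"{prompt} -- scene {i + 1}" for i in range(num_scenes)]

def generate_reference_image(scene: str):
    """Stage 2: a text-to-image model would render a reference frame
    so recurring entities stay visually consistent across scenes."""
    ...

def generate_clip(scene: str, reference):
    """Stage 3: a video model conditions on the scene text and its
    reference image to produce one clip."""
    ...

def video_drafter(prompt: str) -> list:
    scenes = write_script(prompt)
    references = [generate_reference_image(s) for s in scenes]
    return [generate_clip(s, r) for s, r in zip(scenes, references)]
```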
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
SIGNeRF is a new approach for fast and controllable NeRF scene editing and scene-integrated object generation. The method can insert new objects into an existing NeRF scene or edit existing objects within the scene in a controllable manner, via either proxy object placement or shape selection.
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Spacetime Gaussian Feature Splatting is a novel dynamic scene representation that is able to capture static, dynamic, as well as transient content within a scene and can render them at 8K resolution and 60 FPS on an RTX 4090.
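The core trick, as I understand it, is to make each Gaussian a function of time: its centre follows a motion polynomial and its opacity is a radial basis function of time, so transient content can fade in and out. Here's a minimal numpy sketch under those assumptions; the coefficients and shapes are illustrative, not the paper's:

```python
import numpy as np

def position(t: float, coeffs: np.ndarray) -> np.ndarray:
    """coeffs has shape (degree + 1, 3); evaluate the motion
    polynomial to get the Gaussian's centre at time t."""
    powers = t ** np.arange(coeffs.shape[0])  # [1, t, t^2, ...]
    return powers @ coeffs                    # (3,) centre at time t

def temporal_opacity(t: float, mu_t: float, sigma_t: float, o: float) -> float:
    """Opacity peaks at mu_t and decays away from it, letting a
    Gaussian represent content that appears and then vanishes."""
    return o * np.exp(-0.5 * ((t - mu_t) / sigma_t) ** 2)

coeffs = np.array([[0.0, 0.0, 0.0],    # static offset
                   [1.0, 0.0, 0.0],    # linear velocity along x
                   [0.0, -0.5, 0.0]])  # slight acceleration along y
print(position(0.5, coeffs), temporal_opacity(0.5, mu_t=0.4, sigma_t=0.2, o=1.0))
```

A fully static Gaussian is just the degenerate case: a degree-0 polynomial and a very wide temporal opacity.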
DreamGaussian4D: Generative 4D Gaussian Splatting
DreamGaussian4D can generate animated 3D meshes from a single image. The method can generate diverse motions for the same static model, and does so in 4.5 minutes instead of the several hours other methods require.
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes
MACH can create lifelike 3D avatars from text descriptions. The system generates fully realized 3D characters with detailed facial features, hair, and clothing within 2 minutes. The characters are highly realistic and animatable, making them ready for immediate use in various scenarios.
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
En3D is yet another method for generating 3D humans either from text or 2D images. This one is trained on millions of synthetic 2D images and is capable of producing visually realistic 3D humans that can be seamlessly rigged and animated.
3D-Fauna: Learning the 3D Fauna of the Web
3D-Fauna is able to turn a single image of a quadruped animal into an articulated, textured 3D mesh in a feed-forward manner, ready for animation and rendering.
Personalized Restoration via Dual-Pivot Tuning
Personalized Restoration is a method that can restore degraded images of faces while using reference images to preserve the person's identity. The method can also edit the restored image with text prompts, enabling modifications like changing the color of the eyes or making the person smile.
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Auffusion is a text-to-audio system that generates audio from natural language prompts. The model can control various aspects of the audio, such as acoustic environment, material, pitch, and temporal order. It can also generate audio from bare labels, or be combined with an LLM to produce descriptive audio prompts.
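To show how the LLM pairing might work, here's a bit of hypothetical glue code: a stand-in LLM step turns a bare label into a descriptive prompt, which then goes to a stand-in text-to-audio call. Neither function is Auffusion's actual interface:

```python
def expand_prompt(label: str) -> str:
    """Stand-in for the LLM step: turn a bare label into a richer
    prompt describing environment, dynamics, and timing."""
    return (f"A recording of {label} in a small echoey room, "
            f"starting quietly and growing louder")

class AudioDiffusion:
    """Placeholder for a text-to-audio diffusion model."""
    def generate(self, prompt: str, seconds: float = 10.0):
        ...

model = AudioDiffusion()
audio = model.generate(expand_prompt("rain on a tin roof"))
```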
Also interesting
- TF-T2V: A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
- Image Sculpting: Precise Object Editing with 3D Geometry Control
- ZeroShape: Regression-based Zero-shot Shape Reconstruction
My genesis Solana collection explores the primal connection between humans and nature, focusing on the thin line between our civilized selves and our inherent wildness.
@MatthieuGB has made an ambitious new AI-made short video with RunwayML that pushed him right up against the platform's censorship. Worth a watch!
@dotsimulate has been exploring real-time audio-reactive animations using Stable Diffusion, MusicGen and TouchDesigner.
@minchoi shared a thread on X with a few photorealistic images that look extremely real. The prompt template: phone photo of {subject and location} posted to {some social media} in {some time frame} --style raw --s 0 --ar {some vertical aspect ratio}.
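For illustration, here's the template filled in with example values of my own; any subject, platform, time frame, and vertical aspect ratio should work:

```python
# Hypothetical fill-in of @minchoi's prompt template; the values are mine.
subject_and_location = "a street musician in Lisbon"
social_media = "Instagram"
time_frame = "2018"
aspect_ratio = "9:16"

prompt = (f"phone photo of {subject_and_location} posted to "
          f"{social_media} in {time_frame} --style raw --s 0 "
          f"--ar {aspect_ratio}")
print(prompt)
```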
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
β dreamingtulpa