AI Art Weekly #55

Hello there, my fellow dreamers, and welcome to issue #55 of AI Art Weekly! 👋

AI developments are in full swing and I’ve got another packed issue for you. Let’s jump in. The highlights this week are:

  • New 2k and 4k Midjourney upscalers
  • 3D-GPT: 3D modeling with large language models
  • Progressive3D can do local edits on 3D assets
  • DiffSketcher generates vectorized free-hand sketches
  • Real-time 4D videos at 4K resolution
  • LLMs can control human motion generations
  • DynVideo-E can edit human-centric videos in 3D space
  • PAIR Diffusion: A Comprehensive Multimodal Object-level Image Editor
  • OIR-Diffusion can manipulate multiple objects in an image
  • Separate Anything You Describe
  • Training AI to play Pokémon
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: hope
63 submissions by 40 artists
AI Art Weekly Cover Art Challenge hope submission by UltimAI1138
🏆 1st: @UltimAI1138
AI Art Weekly Cover Art Challenge hope submission by OGFL0W3RS
🥈 2nd: @OGFL0W3RS
AI Art Weekly Cover Art Challenge hope submission by pactalom
🥉 3rd: @pactalom
AI Art Weekly Cover Art Challenge hope submission by kaoru_creation
🥉 3rd: @kaoru_creation

News & Papers

New Midjourney Upscalers

Midjourney released two new image upscalers this week that can upscale images by a factor of 2 or 4. For instance, a square 1024x1024 image can now be upscaled to a resolution of 2048x2048 or 4096x4096.

Midjourney 2x/4x Upscaler example. High resolution version.

3D-GPT: 3D modeling with large language models

So far it has been tough to imagine the benefits of AI agents. Most of what we’ve seen from that domain has focused on NPC simulations or solving text-based goals. 3D-GPT is a new framework that uses LLMs for instruction-driven 3D modeling, breaking 3D modeling tasks down into manageable segments to procedurally generate 3D scenes. I recently started digging into Blender and I pray this gets open-sourced at some point.

3D-GPT example

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Generating 3D assets is one thing, editing them is another. Progressive3D can do both, with a DALL·E 3-like level of prompt understanding. Its editing capabilities in particular look wild: you can select different regions of an object with 2D masks and 3D bounding boxes to define the area that should be edited.

Progressive3D examples

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

DiffSketcher is a tool that can turn words into vectorized free-hand sketches. The method also lets you define the level of abstraction, allowing for more abstract or more concrete generations.

DiffSketcher examples

4K4D: Real-Time 4D View Synthesis at 4K Resolution

If you were impressed with last week’s 4D-GS advancements, you’ll love 4K4D. The method improves upon Gaussian Splatting and is able to render at over 400fps at 1080p resolution and 80fps at 4K resolution using an RTX 4090 GPU on common multi-view video datasets.

4K4D in action

MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations

Just last week, OmniControl showed us that it’s possible to control human motion generation through spatial control signals. This week, MoConVQ shows us that motion frameworks combined with LLMs can follow and complete complex, abstract tasks through text and voice instructions.

MoConVQ example

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

DynVideo-E is an interesting approach utilizing dynamic NeRFs to edit human-centric videos in 3D space and propagate the changes to the entire video. The results look stunning.

DynVideo-E in action

PAIR Diffusion: A Comprehensive Multimodal Object-level Image Editor

PAIR Diffusion is a generic framework that can enable a diffusion model to control the structure and appearance properties of each object in an image. This allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.

PAIR Diffusion examples

OIR-Diffusion: Object-aware Inversion and Reassembly for Image Editing

OIR-Diffusion is yet another image editing method. This one enables object-level fine-grained editing and is able to change the shape, color, material, category and more of multiple objects in a single image.

OIR-Diffusion example turning a lighthouse into a rocket taking off and a blue sky into a sky with sunset

Separate Anything You Describe

As someone who has experimented with audio-reactive music videos in the past, AudioSep might bring me back to it. The model is able to separate audio events and musical instruments, and even enhance speech, using natural language queries, which makes it a versatile tool for different audio tasks. A demo can be found on HuggingFace.
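To make the idea concrete: text-queried separation boils down to a text embedding conditioning a time-frequency mask that gets applied to the mixture. The sketch below is a toy illustration of that concept only — it is not AudioSep’s actual API, and every function name in it (`embed_query`, `separate`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_query(query: str, dim: int = 8) -> np.ndarray:
    """Stand-in text encoder: hashes the query into a fixed vector."""
    seed = abs(hash(query)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def separate(mixture_spec: np.ndarray, query: str) -> np.ndarray:
    """Predict a [0, 1] mask conditioned on the query, apply it to the mix."""
    q = embed_query(query)
    # Stand-in "separation network": project the query vector onto each
    # frequency bin to get a per-bin sigmoid gate.
    freq_proj = rng.standard_normal((mixture_spec.shape[0], q.size))
    gate = 1.0 / (1.0 + np.exp(-(freq_proj @ q)))
    # Broadcast the per-bin gate across all time frames.
    mask = np.repeat(gate[:, None], mixture_spec.shape[1], axis=1)
    return mask * mixture_spec

spec = np.abs(rng.standard_normal((64, 100)))  # fake magnitude spectrogram
out = separate(spec, "a dog barking")
print(out.shape)  # (64, 100)
```

In the real model the text encoder and the mask predictor are trained networks, but the shape of the computation — query in, mask out, mask times mixture — is the same.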

More papers & gems

  • LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
  • SVC: Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion
  • CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation
  • HumanTOMATO: Text-aligned Whole-body Motion Generation
  • Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

Looks like DALL·E 3 is back on the menu. Piece made by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
