AI Art Weekly #108

Hello my fellow dreamers, and welcome to issue #108 of AI Art Weekly! 👋

Whew, what an issue to end the year with! After skimming through 562 papers in the last two weeks, I’ve packed this final newsletter of 2024 with 47 groundbreaking highlights for you. From video models to 3D generation techniques, we’ve come a long way this year. The newsletter is quite long, so if it’s not rendering properly, check out the web version.

I’ll be taking a short break until January to recharge, spend some time with family, and play with my already four-month-old (🤯) Mini-Tulpa.

Thank you all for being part of this journey - your support means the world to me! Have fantastic holidays, and I’ll catch you in 2025 with more AI art breakthroughs! 🎄✨


Cover Challenge 🎨

Theme: ho-ho-ho
36 submissions by 24 artists
AI Art Weekly Cover Art Challenge ho-ho-ho submission by Edztra
🏆 1st: @Edztra
AI Art Weekly Cover Art Challenge ho-ho-ho submission by mahage_studio
🥈 2nd: @mahage_studio
AI Art Weekly Cover Art Challenge ho-ho-ho submission by mamaralic
🥉 3rd: @mamaralic
AI Art Weekly Cover Art Challenge ho-ho-ho submission by pactalom
🧡 4th: @pactalom

News & Papers

Highlights

Veo 2

Google dropped a surprise bomb this week with their new video model Veo 2. From the first previews and user tests, it looks set to be the new state of the art in video generation. The waitlist is open if you want to try it out.

Sadly for us Europeans, the EU AI Act means we can’t play with it yet :(

Veo 2 beekeeper example

Sora

After teasing us since February, OpenAI finally launched Sora. The reviews? Pretty mixed. Its basic text-to-video and image-to-video output isn’t quite hitting the mark of the demos they showed earlier this year. But where it really shines is in the cool extras - blending videos together, making perfect loops, remixing clips, and upscaling them to look even better.

You can get it with your ChatGPT subscription, but EU folks are out of luck for now.

A Sora clip of a polar bear surfing

Pika 2.0

Pika just dropped Pika 2.0, a major new version of their video model. It comes with a new feature called Ingredients that lets you mix and match different images - like a person, an outfit, and a background - and combines them all into one video.

Best part? They’re offering unlimited generations for free until the end of the week, and everyone can join in!

Pearl Girl trying to steal my popcorn

Midjourney Moodboards

While Midjourney’s video model is still in the works, they’ve rolled out something pretty neat: Moodboards. It’s a super simple way to create your own style codes in Midjourney. Keep an eye out for some exciting Promptcache updates coming soon 😉

Midjourney “Blade Runner” moodboard I’m working on

3D

PRM: Photometric Stereo based Large Reconstruction Model

PRM can create high-quality 3D meshes from a single image using photometric stereo techniques. It improves detail and handles changes in lighting and materials, allowing for features like relighting and material editing.

PRM examples

Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation

Tactile DreamFusion can improve 3D asset generation by combining high-resolution tactile sensing with diffusion-based image priors. It supports both text-to-3D and image-to-3D generation.

Tactile DreamFusion examples

MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation

MCMat can generate multi-view consistent and physically accurate PBR (physically-based rendering) materials.

MCMat examples

Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation

Motion-2-to-3 can generate realistic 3D human motions from text prompts using 2D motion data from videos. It improves motion diversity and efficiency by predicting consistent joint movements and root dynamics with a multi-view diffusion model.

Motion-2-to-3 example

Wonderland: Navigating 3D Scenes from a Single Image

Wonderland can generate high-quality 3D scenes from a single image using a camera-guided video diffusion model. It allows for easy navigation and exploration of 3D spaces, performing better than other methods, especially with images it hasn’t seen before.

Wonderland example

MeshArt: Generating Articulated Meshes with Structure-guided Transformers

MeshArt can generate articulated 3D meshes with clean, structured geometry.

MeshArt examples

Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors

Illusion3D can generate 3D multiview illusions from text prompts or images.

Illusion3D example

Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale

Meshtron can generate detailed 3D meshes with up to 64K faces at a high resolution.

Meshtron example

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

SimAvatar can generate 3D human avatars from text prompts, creating realistic motion and detailed textures for hair and clothing.

SimAvatar example

BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

BLADE can recover 3D human meshes from a single image by estimating perspective projection parameters.

BLADE examples

PrEditor3D: Fast and Precise 3D Shape Editing

PrEditor3D can edit 3D shapes quickly and accurately using text prompts and rough masks. It keeps regions outside the edit unchanged and can edit a single shape in minutes without any training.

PrEditor3D example

ShapeCraft: Body-Aware and Semantics-Aware 3D Object Design

ShapeCraft can generate 3D objects that fit the human body from a base mesh using body shapes and guidance from text, images, or sketches. It ensures these objects work well in virtual settings and can be made for real-life use.

ShapeCraft examples

Image

ColorFlow: Retrieval-Augmented Image Sequence Colorization

ColorFlow can colorize black and white line-art and manga panels while keeping characters and objects consistent.

ColorFlow examples

InvSR: Arbitrary-steps Image Super-resolution via Diffusion Inversion

InvSR can upscale images in one to five sampling steps. It achieves strong results even with a single step, making it efficient for real-world super-resolution.

InvSR example
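
If you’re wondering how few-step super-resolution can work at all, here’s a toy sketch of the general idea (my own illustration, not InvSR’s actual code): upscale the image, noise it only part-way instead of starting from pure noise, then run just a few denoising steps. The `TinyDenoiser` is a hypothetical stand-in for a real pretrained diffusion model.

```python
# Conceptual sketch only -- NOT the official InvSR code. It illustrates the
# general "partial diffusion inversion" idea: upsample, noise to an
# intermediate timestep, then denoise for just a few steps.
import torch
import torch.nn.functional as F


class TinyDenoiser(torch.nn.Module):
    """Placeholder for a real pretrained noise-prediction network (ignores t)."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x, t):
        return self.net(x)


def few_step_super_resolution(lr_image, denoiser, num_steps=3, start_t=0.4):
    """Upscale `lr_image` (B,3,H,W in [-1,1]) with a handful of denoising steps."""
    # 1) Naive upscale as the starting point.
    x = F.interpolate(lr_image, scale_factor=4, mode="bicubic", align_corners=False)

    # 2) Partial inversion: inject noise corresponding to timestep `start_t`
    #    instead of starting from pure noise (t = 1).
    noise = torch.randn_like(x)
    x = (1 - start_t) * x + start_t * noise

    # 3) A few denoising steps from start_t down to 0 (simple Euler-style loop).
    timesteps = torch.linspace(start_t, 0.0, num_steps + 1)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        pred_noise = denoiser(x, t)
        x = x - (t - t_next) * pred_noise
    return x.clamp(-1, 1)


if __name__ == "__main__":
    lr = torch.rand(1, 3, 64, 64) * 2 - 1  # dummy low-res input
    sr = few_step_super_resolution(lr, TinyDenoiser(), num_steps=3)
    print(sr.shape)  # torch.Size([1, 3, 256, 256])
```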

Leffa: Learning Flow Fields in Attention for Controllable Person Image Generation

Leffa can generate person images based on reference images, allowing for precise control over appearance and pose.

Leffa examples

TryOffAnyone: Tiled Cloth Generation from a Dressed Person

TryOffAnyone can extract clean, laid-flat clothing images from photos of people wearing them.

TryOffAnyone examples

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

FireFlow is a FLUX-dev based editing method that can perform fast image inversion and semantic editing with just 8 diffusion steps.

FireFlow examples

MV-Adapter: Multi-view Consistent Image Generation Made Easy

MV-Adapter can generate images from multiple views while keeping them consistent across views. It enhances text-to-image models like Stable Diffusion XL, supporting both text and image inputs, and achieves high-resolution outputs at 768x768.

MV-Adapter examples

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

FlowEdit can edit images using only text prompts with Flux and Stable Diffusion 3.

FlowEdit examples

UnZipLoRA: Separating Content and Style from a Single Image

UnZipLoRA can break down an image into its subject and style. This makes it possible to create variations and apply styles to new subjects.

UnZipLoRA examples

FashionComposer: Compositional Fashion Image Generation

FashionComposer can generate fashion images using text prompts, garment images, and human models.

FashionComposer examples

Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy

Pattern Analogies can edit patterns by analogy: show it an example edit and it applies the same programmatic change to a new pattern. Not sure what else to say really 😅

Pattern Analogies examples

InstructMove: Instruction-based Image Manipulation by Watching How Things Move

InstructMove can manipulate images based on instructions from users. It allows for complex edits like changing subject poses and rearranging elements while keeping the content consistent and enabling precise adjustments with masks.

InstructMove example

InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

InstantRestore can restore badly damaged face images in near real-time. It uses a single-step image diffusion model and a small set of reference images to keep the person’s identity.

InstantRestore example

ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration

ReF-LDM can restore low-quality face images by using multiple high-quality reference images.

ReF-LDM example

PanoDreamer: 3D Panorama Synthesis from a Single Image

PanoDreamer can generate 360° 3D scenes from a single image by creating a panoramic image and estimating its depth. It effectively fills in missing parts and projects them into 3D space.

PanoDreamer examples
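
The “projects them into 3D space” step boils down to unprojecting an equirectangular panorama with its depth map into a point cloud. Here’s a generic sketch of that unprojection (my own illustration, not PanoDreamer’s code):

```python
# Hedged sketch of projecting a panorama + depth into 3D: a generic
# equirectangular unprojection, not PanoDreamer's own implementation.
import numpy as np


def panorama_to_points(depth: np.ndarray) -> np.ndarray:
    """Unproject an equirectangular depth map (H, W) to an (H*W, 3) point cloud."""
    h, w = depth.shape
    # Longitude spans [-pi, pi) left to right, latitude [pi/2, -pi/2] top to bottom.
    lon = (np.arange(w) / w - 0.5) * 2 * np.pi
    lat = (0.5 - np.arange(h) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit viewing direction for every pixel, scaled by its depth.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], axis=-1
    )
    return (dirs * depth[..., None]).reshape(-1, 3)


if __name__ == "__main__":
    depth = np.ones((256, 512)) * 2.0   # dummy depth: a sphere of radius 2 m
    points = panorama_to_points(depth)
    print(points.shape)                 # (131072, 3)
```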

LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors

LayerFusion can generate images with dynamic interactions between foreground (RGBA) and background (RGB) layers. It enables seamless blending, preserves transparency, and enhances visual coherence, making it great for graphic design and digital art.

LayerFusion examples

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

SwiftEdit can edit images with text prompts in just 0.23 seconds.

SwiftEdit example

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

AnyDressing can customize characters with any combination of clothes and text prompts.

AnyDressing examples

Video

AniDoc: Animation Creation Made Easier

AniDoc can automate the colorization of line art in videos and create smooth animations from simple sketches.

AniDoc examples

FCVG: Generative Inbetweening through Frame-wise Conditions-Driven Video Generation

FCVG can create smooth video transitions between two key frames. It improves stability by defining clear paths for movement and matching lines from the input frames, ensuring coherent changes even with fast motion.

FCVG example generated from one start and end frame

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

DisPose can generate high-quality human image animations from sparse skeleton pose guidance.

DisPose examples

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

SynCamMaster can generate videos from different viewpoints while keeping the look and shape consistent. It improves text-to-video models for multi-camera use and allows re-rendering from new angles.

SynCamMaster examples

ObjCtrl-2.5D: Training-free Object Control with Camera Poses

ObjCtrl-2.5D enables object control in image-to-video generation using 3D trajectories from 2D inputs with depth information.

ObjCtrl-2.5D example

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

3DTrajMaster can control the 3D motions of multiple objects in videos using user-defined 6DoF pose sequences.

3DTrajMaster examples

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

VividFace can swap faces in videos while keeping the original person’s look and expressions. It handles challenges like keeping the face consistent over time and working well with different angles and lighting.

VividFace example

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

SnapGen-V can generate a 5-second video on an iPhone 16 Pro Max in just 5 seconds. It uses a compact model with 0.6 billion parameters, making it much faster than traditional server-side models that take minutes, while still keeping good quality.

SnapGen-V example generated on an iPhone

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

LinGen can generate high-resolution minute-length videos on a single GPU.

5-second excerpt from a 68-second clip generated with LinGen

CausVid: From Slow Bidirectional to Fast Causal Video Generators

CausVid can generate high-quality videos at 9.4 frames per second on a single GPU. It supports text-to-video, image-to-video, and dynamic prompting while reducing latency with a causal transformer architecture.

CausVid preview
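
The trick behind streaming generators like this is swapping bidirectional attention for causal attention over frames, so each new frame can only look at the past. Here’s a minimal, hypothetical sketch of a frame-level causal mask in PyTorch - an illustration of the concept, not CausVid’s code:

```python
# Minimal sketch of frame-level causal attention -- the general idea behind
# causal video generators, not CausVid's actual implementation.
# Each frame's tokens may attend to their own frame and earlier frames only.
import torch


def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """Boolean mask (True = blocked) for frame-wise causal attention."""
    frame_ids = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    # Token i may attend to token j only if j's frame is not in the future.
    return frame_ids[None, :] > frame_ids[:, None]


if __name__ == "__main__":
    mask = frame_causal_mask(num_frames=3, tokens_per_frame=2)
    seq_len = mask.shape[0]
    q = k = v = torch.randn(1, 4, seq_len, 16)  # (batch, heads, tokens, dim)
    # PyTorch expects True = "may attend", so pass the inverted mask.
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=~mask)
    print(out.shape)  # torch.Size([1, 4, 6, 16])
```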

StyleMaster: Stylize Your Video with Artistic Generation and Translation

StyleMaster can stylize videos by transferring artistic styles from images while keeping the original content clear.

StyleMaster examples

Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training

Latent-Reframe can control cameras in video diffusion models without extra training. It adjusts latent codes during sampling to match camera movements, achieving high-quality video generation similar to methods that require training.

Latent-Reframe examples
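
To get a feel for what “adjusting latent codes during sampling” can look like, here’s a toy sketch that warps a latent with a simple horizontal-pan flow using grid_sample - purely illustrative, not the paper’s actual algorithm:

```python
# Toy sketch of nudging latents to follow a camera move -- NOT Latent-Reframe's
# method. A latent feature map is warped with a horizontal-pan flow field.
import torch
import torch.nn.functional as F


def pan_latent(latent: torch.Tensor, shift_frac: float) -> torch.Tensor:
    """Shift a (B,C,H,W) latent horizontally by `shift_frac` of its width."""
    b, _, h, w = latent.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    # Sample from positions shifted left -> content appears to pan right.
    grid = torch.stack((xs - 2 * shift_frac, ys), dim=-1)
    grid = grid.unsqueeze(0).expand(b, -1, -1, -1)
    return F.grid_sample(latent, grid, padding_mode="border", align_corners=True)


if __name__ == "__main__":
    latent = torch.randn(1, 4, 32, 32)           # e.g. a video-frame latent
    panned = pan_latent(latent, shift_frac=0.1)  # 10% pan to the right
    print(panned.shape)                          # torch.Size([1, 4, 32, 32])
```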

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Mind the Time can generate multi-event videos with precise control over the timing of each event.

Mind the Time example

Diffusion VAS: Using Diffusion Priors for Video Amodal Segmentation

Diffusion VAS can generate masks for hidden parts of objects in videos.

Diffusion VAS example

MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos

MegaSaM can estimate camera parameters and depth maps from casual monocular videos.

MegaSaM example

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

INFP can create interactive agent videos from audio and a single portrait image. It enables lifelike facial expressions and head movements, supporting real-time communication at over 40 fps.

INFP example

IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation

IF-MDM can generate high-quality talking head videos in real-time from a single image and audio input. It achieves a resolution of 512x512 at up to 45 frames per second and allows control over motion intensity and video quality.

IF-MDM example

Audio

Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

Sketch2Sound can generate high-quality sounds using control signals like loudness, brightness, and pitch, along with text prompts. It allows sound artists to create sounds from vocal imitations or reference shapes.

Sketch2Sound preview
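
Those control signals are just time-varying curves. As a rough illustration (not the paper’s pipeline), here’s how you could extract loudness, brightness, and pitch curves from a vocal imitation with librosa - the file name is a placeholder:

```python
# Sketch of extracting the kind of time-varying control signals Sketch2Sound
# conditions on (loudness, brightness, pitch) from a vocal imitation.
# Uses librosa for illustration; this is not the paper's actual pipeline.
import librosa
import numpy as np

# Assumed input: any short mono recording, e.g. a vocal imitation of a sound.
audio, sr = librosa.load("vocal_imitation.wav", sr=22050, mono=True)

hop = 512
# Loudness proxy: frame-wise RMS energy.
loudness = librosa.feature.rms(y=audio, hop_length=hop)[0]
# Brightness proxy: spectral centroid (higher = brighter).
brightness = librosa.feature.spectral_centroid(y=audio, sr=sr, hop_length=hop)[0]
# Pitch: fundamental frequency in Hz via the YIN estimator.
pitch = librosa.yin(audio, fmin=librosa.note_to_hz("C2"),
                    fmax=librosa.note_to_hz("C6"), sr=sr, hop_length=hop)

# Stack into one (frames, 3) control matrix a generator could be conditioned on.
n = min(len(loudness), len(brightness), len(pitch))
controls = np.stack([loudness[:n], brightness[:n], pitch[:n]], axis=-1)
print(controls.shape)
```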

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying access to AI Art Weekly Premium 👑

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa