  • Ex-Google researchers released their own image generator: Ideogram
  • MS-Image2Video: Open-Source Image-to-Video
  • Dysen-VDM: Text-to-Video with better motion
  • MagicAvatar: Multi-modal Avatar Generation and Animation
  • MagicEdit: Video outpainting and a possible Gen-1 competitor 🤯
  • Total Selfie: Helps you generate full-body selfies 😅
  • Interview with artist Rocketgirl 🚀
  • AI Eric Cartman song cover
  • 3D Gaussian Splatting Tutorial
  • and more guides, tools and gems!

News & Papers

Ideogram: Ex-Googlers’ answer to MidJourney, DreamStudio and Dall-E

Ideogram, a new image generation service founded by former Google Brain researchers, launched last week. The service is still very limited but is very good at generating images with text in them. It’s currently free to use, so don’t miss out on this opportunity.

"AI Art Weekly" logo, t-shirt design, typography created with Ideogram

MS-Image2Video & MS-Vid2Vid-XL

The team behind VideoComposer (issue 37) released MS-Image2Video and MS-Vid2Vid-XL this week. Together they form an open-source “alternative” to the image-to-video features of Gen-2 and PikaLabs. MS-Image2Video isn’t a one-to-one image animator though, as it uses the base image more as inspiration for the generated video. Similar to ZeroScope, the Vid2Vid model is used to upscale the lower-res video and remove flickering and artifacts from it. There is a HuggingFace demo and a Google Colab available for you to try it out.

MS-Image2Video example

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models

While ZeroScope, Gen-2, PikaLabs and others have brought us high-resolution text- and image-to-video, they all suffer from choppy transitions, crude motion and actions that occur in the wrong order. The new Dysen-VDM tries to tackle those issues and, while nowhere near perfect, delivers some promising results.

Dysen example: A lady holds an umbrella, walking in the park with her friend.

MagicAvatar: Multimodal Avatar Generation and Animation

MagicAvatar, on the other hand, is a multi-modal framework capable of converting various input modalities (text, video, and audio) into motion signals that are then used to generate and animate an avatar. Just look at these results.

MagicAvatar demo with motion- and text-to-avatar: A boy running, red jacket

MagicEdit: High-Fidelity Temporally Coherent Video Editing

But we aren’t done with video yet. MagicEdit not only does an extremely good job of stylizing and editing videos (imo comparable with Gen-1), but it also supports video outpainting 🤯

MagicEdit outpainting examples

Total Selfie: Generating Full-Body Selfies

Afraid of asking strangers to take a photo of you? No problem. Total Selfie has got you covered, as it’s able to generate full-body selfies that look like a photo someone else would take of you at a given scene. All you need is a pre-recording of yourself in your current outfit and a target pose. After that, all that’s left is to take photos of your face and the scenery during the day to produce a full-body image at each location. Holidays, here I come 😅

Total Selfie examples

