AI Art Tools, Papers and more
A collection of all the research, tools and links shared in the newsletter.
@A_B_E_L_A_R_T put together an AI short movie called Sh*t! It’s the story of a man who wakes up one morning with an empty fridge but a mind full of memories. Beautifully executed. Abel also shared an insightful Tweet about how the short came together.
Reddit user u/corporalcadet put together a cool music video by creating multiple ControlNet images of a square and feeding them to Gen-2. Sweet idea.
@IXITimmyIXI showed us a raw Pika Labs output this week that is crazy good. You can try the Pika Labs Text-to-Video and Image-to-Video model yourself by joining their Discord.
@socalpathy put together a Stable Diffusion XL style study. Perfect to gather some visual prompt inspiration for your next latent explorations.
@camenduru put together a Google Colab based on the new improved AudioGen model Meta open-sourced this week. AudioGen is a model that is able to generate sound effects from text prompts.
@fffiloni has been at it again. Based on the LP-MusicCaps paper above, he built a HuggingFace space that lets you generate a video based on the captions generated from the music.
Another @fffiloni HuggingFace space based on LP-MusicCaps. This one generates images from music captions.
@opticnerd is the winner of the Claire Silver ‘motion’ contest. His piece Inference is an homage to past analogue media as a young woman moves along the radio and TV dial.
@ranetas published a short film created with Pika Labs called “Momentos”, depicting diverse artistic styles going through human emotions. A must watch.
@petravoice made a 12-second short for Claire Silver’s motion contest called ‘bliss’ which is a love letter to her visit to Bali.
NoGPU-Webui is a freemium service by OmniInfer that lets you use the Automatic1111 interface with SDXL 1.0 and a ton of other models for free using their cloud infrastructure in the backend.
ResShift is a new upscaler model that uses residual shifting and can achieve image super-resolution faster compared to other methods.
The image to 3D model One-2-3-45 from issue 40 now has a demo on HuggingFace Spaces.
Yamer, our Stable Diffusion fine-tuning expert within our Discord, has put together a new model for SD 1.5. Can’t wait to see what he comes up with for SDXL.
@AICultureWorld has found a way to create loops using AnimateDiff. Apparently, generations of 4 seconds tend to produce a looping output.
Aaah, merging childhood memories… @ThereIRuinedIt made an AI Johnny Cash cover of the famous Aqua song “Barbie Girl”, and it’s fantastic!
Pinokio is a browser that lets you install, run, and automate AI applications and models effortlessly.
There is now an official HuggingFace space for AnimateDiff, which lets you generate animations with some base DreamBooth models.
Reddit user u/Neostlagic put together a Stable Diffusion guide on how to generate better and more unique people.
A demo for the GenMM paper was recently released and lets you generate variations from different base motions. No in-game emote dance will ever be the same.
Stable Diffusion XL is right around the corner, but before the new SDXL era starts, Reddit user u/Unreal_777 put together a short chronological history of Stable Diffusion to reminisce about the past few months.
My submission LIVE.DIE.REPEAT. for @ClaireSilver12’s 5th AI art contest with the theme “motion” got some love when I published it on Tuesday. So I thought I would put together a “Behind-The-Scenes” explanation of how it was made. Enjoy.
@skirano has been expanding scenes from the movie Raiders of the Lost Ark with generative fill AI.
@ilumine_ai has been exploring AI’s potential in the video game industry and put together a crazy highlight reel.
Stability.AI published a new ClipDrop tool called Stable Doodle that uses the T2I-Adapter behind the scenes and lets you use sketches to guide image generation.
While SDXL support for Automatic1111 is on its way, ComfyUI already supports it. It’s a powerful and modular Stable Diffusion GUI with a graph/nodes interface. Not everyone’s cup of tea, but one you might enjoy.
If you’re new to ZeroScope text-to-video, @fofrAI put together a short and handy guide to cover the basic settings like fps, steps and upscaling.
@madaro_art put together a Twitter thread about Midjourney blending techniques. Interesting read.
@MatthewPStewart put together an interesting summary discussing the Authors Guild v. Google District Court case, which decided that using copyrighted material in a dataset to train a discriminative machine-learning algorithm is perfectly legal – which sets a precedent for training generative AI models.
@ClaireSilver12 is hosting her fifth AI contest. This time with the theme “Motion”. She’s looking for 8-12 second video/animation using AI (all or part).
@veryVANYA put together a music video with the new ZeroScope text-to-video model.
@flngr put together a Hugging Face space that shows a live stream of generative AI videos, all made using Zeroscope V2 576w.
There is a new upscale method in town. The Multidiffusion Upscaler extension enables large image drawing & upscaling with limited VRAM.
FullJourney is currently testing a text-to-video bot which, similar to Midjourney, can generate content directly within Discord by prompting /movie. If you have a potato GPU and want to try text-to-video for free, this is currently the easiest way.
Stumbled upon @midlibrary_io this week, a library of genres, artistic movements, techniques, titles, and artists’ styles for Midjourney.
Looking for a way to turn your AI images into SVGs? Adobe has a free online tool for that.
@robobenjie built a web game that uses Stable Diffusion under the hood to generate the levels. He also wrote an interesting blog post about it.
@robdewinter discovered how to replicate Stable Diffusion’s img2img feature with Generative Fill by using masks with lowered opacity. Great find!
@javilopen created a zoom out animation using the Midjourney Zoom Out feature and the CapCut video editor.
@radamar and @XingangP released the source code for DragGAN, that crazy GAN interface that lets you drag points in an image to manipulate it.
@aisetmefree shared an interesting thread showcasing how formulas and math equations can influence your generations.
@rmranmo published a blog post about his endeavours in building digital companions capable of learning and growth by using multiple layers of LLMs, real time learning by association, and other core systems. An interesting read.
@RenAI_NFT used the new MJv5.2 zoom out feature to create a zoom out animation by interpolating between the different zoom levels.
The opening title sequence for Marvel’s ‘SECRET INVASION’ show has been made with what looks like Midjourney v2, some img2img variations as well as deforum. Personally love the style but opinions are split on this one.
Deforum developer XmYx released a new project called aiNodes Engine, a simple and easy-to-use Python-based AI image / motion picture generator node engine. I didn’t have the time to test it yet, but that’s definitely on my backlog.
@g_prompter has built a collection of prompt generators for Midjourney and LLMs. If you’re looking for inspiration, definitely worth a try.
@aisetmefree shared some texture / bg / wallpaper prompts which should be interesting for everybody embracing blending and img2img.
Meta released a HuggingFace demo of their MMS model that can transcribe and generate speech for 1000+ languages.
If you’re interested in prompt engineering, this guide by @dair_ai might be for you. Its focus is on prompting LLMs rather than diffusion models, but it’s an interesting read nonetheless.
@Martin_Haerlin produced a super cool short with Gen-1, Elevenlabs & Reface. The coherency of the video is phenomenal.
I came across @pactalom’s ultimate prompt generator this week (thank you @aisetmefree 🙏) and had the most fun with its randomization feature. Definitely worth giving this a try.
Matting Anything can estimate the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance. This is useful for separating a subject from a background. @camenduru put together a Google Colab and @fffiloni created a HuggingFace demo for video matting.
I haven’t shared a Stable Diffusion checkpoint in a while. Here is an experimental one by one of our Discord members @YamerOfficial that I found interesting. A merge of 600+ models that can, as he describes it, create “the pinnacle of perfection or a delightful chaos of visuals”.
@huggingface put together a QR Code AI Art Generator that uses a ControlNet model trained by @diontimmermusic on a large dataset of 150,000 QR code and QR code artwork pairs. And yes, 60% of the time, they work every time.
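If you’d rather run something similar locally, here is a minimal diffusers sketch. The checkpoint names, prompt and conditioning scale are my own assumptions, not necessarily what the space itself uses:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Assumed checkpoint names – swap in whatever QR ControlNet / base model you prefer.
controlnet = ControlNetModel.from_pretrained(
    "DionTimmer/controlnet_qrcode-control_v1p_sd15", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

qr = load_image("my_qr_code.png").resize((768, 768))  # your QR code as the control image
image = pipe(
    "a cozy medieval village at dusk, intricate detail",
    image=qr,
    controlnet_conditioning_scale=1.3,  # higher = more scannable, lower = more artistic
    num_inference_steps=30,
).images[0]
image.save("qr_art.png")
```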
This week’s style prompt got inspired by the theme of the cover challenge above: new poster of <subject>, alex katz, hypercolorful, mural painting, óscar domínguez, kay sage.
@0xFramer shared a thread in which he goes through the process of bringing AI pictures to life.
@alon_farchy made a Unity plugin to generate UI for his game. The tool lets him build UI with ChatGPT-like conversations.
@MartinNebelong showcased how he utilized Dreams for PS5 to sculpt and animate characters and then used the scene as a video input to transform it with Gen-1. Looks like fun!
@paultrillo created the most fun Gen-2 short I’ve seen all week. Feels like a TV ad that could play in a @panoscosmatos movie in the background.
@fofrAI built a prompt generator that lets you generate a lot of prompts at once based on precompiled lists.
I’m a big fan of @spiritform’s custom embeddings and he was so kind as to write a great community post on how to train your own using the Stable Diffusion WebUI.
@romero_erzede shared a LoRA training guide for Stable Diffusion 1.5 and 2.1. Although it focuses on training passages, the guide gives a great basic overview of what is important if you want to train a LoRA with your own concepts.
After coming across ColorDiffuser above, I went looking for a solution that is already available and found @DeOldify. The project lets you colorize and restore old images and film footage.
@ArtVenture_ created an Automatic1111/Vladmandic plugin that lets you queue multiple tasks, tweak prompts & models on the fly, monitor tasks in real time, re-prioritize and stop or delete them.
Photoshop’s Generative Fill feature is only about a week old and people are showcasing how easy it is to use. @NathanLands compiled a thread with 7 cool examples.
@BjoernKarmann built the world’s first AI “camera” called Paragraphica! The device generates photos by turning location data like address, time of day and weather into a prompt, which then gets turned into an image via a Stable Diffusion API. Pretty cool concept!
@SuperchiefNFT is hosting an AI surrealism art show in NYC with a stunning 100 AI artists, starting today. Definitely worth checking out.
I played around with another style prompt for this week’s “minimalism” contest and here is what I came up with: Figurative minimalism of <subject>, airy <architectural style> scenes, flat color blocks, minimalist purity and serenity, light <color 1> and <color 2>
@s0md3v built roop, a one-click face-swapping Python app for videos. You only need one image of the desired face. No dataset, no training.
@pejkster has written a guest post for AI Art Weekly in which he describes how to use Segment Anything and inpainting to composite characters into images while retaining the style of the original image.
While Generative Fill is great, its capabilities are not state of the art. I’ve already shared a few Photoshop plugins in the history of this newsletter and Flying Dog is another paid one that uses Stable Diffusion in the background. Take a look at the comparison video with Adobe Firefly by @NicolayMausz.
As a developer I’m always looking for interesting and easy ways to integrate AI into my projects. Gyre.ai is an open-source Stable Diffusion API web server that also powers the Flying Dog Photoshop plugin above. Definitely interesting if you don’t want to rely on the buggy Automatic1111 web interface as your backend server.
@4rtofficial conducted a Midjourney Camera lens experiment in which he put the lenses up against the “photo” prompt in MJ. Part 2 can be found here.
@moelucio was toying around with PhotoMosh for this week’s cover challenge in our Discord. The tool has nothing to do with AI, but it’s still an easy way to apply some cool glitch effects to your images.
blotterstream is an endless audio-reactive music video that is generated in real-time at 24 fps. Chris is working on a Spotify integration so you can run it during your own home parties.
@frantzfries shared an interesting video this week in which Carvana created 1.3 million hyper-personalized videos. Is this the future of marketing?
@joinrealmai is hosting an exclusive exhibit taking place during the 2023 Central Pennsylvania Festival of the Arts! They’re looking for AI-art submissions, so show them what you got!
After exploring this week’s Lovecraftian theme myself, I found a style I’m extremely in love with, so I thought I would start producing weekly style prompts again. This week’s prompt is: in the style of john bauer, alessandro gottardo, wäinö aaltonen, dark black and light beige.
One week after the announcement of DragGAN, developer Zeqiang-Lai put together an unofficial implementation of the paper. Code can be found here.
Open-source voice training is still a bit of a pain. Luckily there is a new WebUI that you can run on your own hardware. Unfortunately the docs are in Chinese 😅 Luckily again, @NerdyRodent put together a YouTube tutorial.
In case you’ve missed the signup link last week, Google’s MusicLM is in closed beta and you can sign up for access. I did so last week and haven’t heard back yet, but according to @infiniteyay it shouldn’t take longer than a few days 🤞
Part 1 of a three-part series about Generative A.I. by The Economist. Interesting read.
Nothing to do with art, but this is a great summary of Sam Altman’s testimony in front of Congress this week in case you’re interested. I highly recommend you check out the rest of “AI Explained”’s channel as well – one of the best YouTubers in this field. Makes this 3-month-old “meme” look like a prediction for the future.
I love what musicians are doing with Grimes’s AI voice. There are some really dope tracks which have been created for the #uberduck contest.
@antoine_caillon shared a demo in which he uses real-time hand tracking, rave and msprior, enabling an intuitive way to steer sound generation with your bare hands. Pretrained models and code will hopefully be released at the end of June.
@StabilityAI open-sourced their DreamStudio WebUI. The backend currently relies on their hosted API endpoints but because it’s open-source, I’m sure the community will soon find a way to make it work with local GPUs as well.
Khroma uses AI to learn which colors you like and creates limitless palettes for you to discover, search, and save. I use it to generate gradients for image blending in Midjourney.
@promptmuse interviewed Sagans about their AI animation workflow this week. Sagans is an anonymous AI collective which has created AI music videos for Lorn, Die Antwoord and Linkin Park.
@terhavlova shared her discoveries on how to age a character with Midjourney.
Basic Pitch, built by @SpotifyEng, is a free audio-to-MIDI converter with pitch bend detection. With AI music on the rise, this one seems useful.
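Basic Pitch also ships with a small Python API. A rough sketch of converting a recording to MIDI might look like this (the file names are placeholders):

```python
from basic_pitch.inference import predict

# predict() returns the raw model output, a pretty_midi.PrettyMIDI object and the note events.
model_output, midi_data, note_events = predict("my_melody.wav")  # placeholder input file
midi_data.write("my_melody.mid")  # save the transcription as a MIDI file
```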
@itspetergabriel is hosting an animation contest together with @StabilityAI. The competition got extended until 26 May at 7pm BST. Time to revisit deforum 🔥
@FluorescentGrey put together a one-hour-plus full-length feature ‘film’ with AI. It’s the 1st part of a 3-part horror anthology homage to the 1982 movie “The Thing” and it premieres in 4K on the 17th of May.
@NyricWorlds announced Nyric this week, an AI text to world-generation platform for digital communities. First trailer looks promising!
I stumbled upon @kevinbparry’s tutorial on his viral “10 Types of Magic” short where he explained how he used AI interpolation to create seamless transitions between stop-and-go frames. Interesting watch and cool to see how AI enables new ways of creating when applied creatively.
There are two new ControlNet models in town which use MediaPipe hand landmarks for guidance.
@yankooliveira released a Photopea extension for Automatic1111 which lets you modify images in a Photoshop-like editor directly within your browser before modifying them further through ControlNet or img2img.
@vizsumit shared his LoRA learnings for style and person/character training this week.
Reddit user u/PrimeFixer developed what feels like an EbSynth web version to apply Stable Diffusion generated style effects to videos. I’m especially impressed with the UI/UX of Flick. The first 45 seconds are free, so give it a try.
I updated my “Midjourney Archive Downloader” Chrome extension. The new version opens the download form in a new tab, so the download process doesn’t get canceled when switching tabs. I want to add prompts as metadata next so we can organize Midjourney images offline.
Reddit user u/Mobile-Traffic2976 converted an old telephone switchboard into a Stable Diffusion Photobooth. A cool project that brings back some retro cyberpunk vibes.
AI movies are picking up steam. Gen-2 is producing some quality footage with some unique weirdness that we’ll probably lose in the future. So, here are 10 of my current favourite examples consisting of shorts, music videos and concept trailers that I found impressive and worth watching!
This week @SnoopDogg put into words what we’re probably all thinking: “Like, what the f**k?” 😂
I got a bit tired of Midjourney’s limited and buggy archive download solution, so I decided to put together a quick and dirty Chrome extension with GPT-4 that only downloads upscaled images. As a next step I want to add the ability to attach prompts as metadata to the downloaded files so we can better organize Midjourney files locally. Code can be found on GitHub.
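The metadata idea itself is simple. Here is a hedged Python sketch of it (not the extension’s actual code, which is JavaScript): stuffing a prompt into a PNG’s text chunk with Pillow, filenames being placeholders.

```python
from PIL import Image, PngImagePlugin

prompt = "a lighthouse in a storm --ar 16:9"      # the prompt you want to keep with the file
img = Image.open("midjourney_upscale.png")        # placeholder filename

meta = PngImagePlugin.PngInfo()
meta.add_text("prompt", prompt)                   # store the prompt as a text chunk
img.save("midjourney_upscale_tagged.png", pnginfo=meta)

# Reading it back later:
print(Image.open("midjourney_upscale_tagged.png").text["prompt"])
```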
@dymokomi open-sourced a tool called dygen which is a python script that can apply painting textures to your images. Worth trying out.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences.
Reddit user u/lazyspock put together a cheatsheet for female haircut styles. The cheatsheet also describes the workflow used to generate the different hairstyle prompts, so it could potentially be reused for other concepts.
A leaked internal Google document that is making the rounds claims that open source AI will outcompete Google and OpenAI. One can only hope!
@ClaireSilver12 shared her “Collaborative” AI Contest finalists. There are some insanely cool projects in that thread worth checking out! Congratulations to the winners @alxalxalx, @dragos_badita, @PhoenixFawkes25, @MutagenSamurai and @bl_artcult!
@Grimezsz announced this week that she’ll split 50% royalties on any successful AI generated song that uses her voice. If you want to give this a shot, take a look at SoftVC VITS which I shared in issue 27 alongside a tutorial from @NerdyRodent on how to train a custom voice.
YouTuber Art from the Machine is working on a Skyrim VR mod that lets you talk to NPCs, powered by ChatGPT and xVASynth. The voices still sound a bit mechanical and it takes a while to generate the responses, but remember, this is the worst this tech will ever be. I, at least, am super excited for the future of gaming!
@Merzmensch created a trippy AI short film called ALICE using Gen-2 by @runwayml. I’m sure we’re going to see a lot of those in the near future and I’m all in for it.
@kenakennedy is hosting an explorative AI “hackathon/festival” in Berlin during May 11th-29th 2023 featuring 19 days of extended coliving with a vibehack finale weekend festival.
This has nothing to do with art (except for the engineering part), but I wanted to share it just for the sheer fun of it. First robot soccer league wen? ⚽️🤖
@bl_artcult won the “Collaborative” AI Contest with his Poe chatbot “MuseAI”. The idea is to build an AI that could act as a ‘Muse’ for artists; to inspire, help create & add value. Absolutely love it. I used it to come up with this week’s cover art challenge. @bl_artcult, if there is a way to get involved, let me know 😉✌️
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem.
Prompt Reader is a simple standalone viewer for reading prompts from Stable Diffusion generated images outside the WebUI.
The demo I’m most hyped about this week comes from @frantzfries. He built a conversational NPC powered by ChatGPT, Whisper and ElevenLabs in a VR environment that I would absolutely love to chat with. Brother Geppetto is 🔥
@jessicard hooked up ChatGPT to a Furby and quote: “I think this may be the start of something bad for humanity” – I think she may be right.
@bobarke built WeatherPainting.com, an impressionist weather report that uses @openAI’s Dall-E API to create generative art from real-time weather data.
@TomLikesRobots is killing it with his ModelScope text-to-video tests and he found that using simple prompts with a strong style can produce consistently interesting animations that can overpower the ShutterStock logo.
@vmandic00 is working on a fork of the popular Automatic1111 repo which is quite alive, fixes a ton of open issues and adds new features. At the time of writing, the fork is 443 commits ahead of the original master branch. Might be worth checking out.
Beta testing for Automatic1111 ControlNet v1.1 has started and apparently it works quite well. But if something broke for you and you want to test the v1.1 models without installing anything, you can use @camenduru’s colab notebook.
TAS is an interactive demo based on Segment-Anything for style transfer which lets different content regions receive different styles.
Facebook open-sourced the code for AnimatedDrawings, a library that lets you turn children’s drawings into animated characters.
After getting inspired by @NathanLands’ meme game, I was thinking I could maybe turn some of my AI art into memes. So I stumbled upon @AndreasRef’s MemeCam which uses BLIP image recognition and GPT-3.5 to generate captions. The only thing it’s missing, imo, is an option for a smaller font size or the option to just generate the text.
@CoffeeVectors added 2D sidescrolling mechanics and collision detection to a realtime NeRF in @UnrealEngine 5 with the @LumaLabsAI plugin and shared the tutorials to make this happen.
@mrjonfinger made another film using @runwayml and @elevenlabs. The script was inspired by a Tweet that GPT-4 wrote: “I just had a dream that I was an AI and woke up in a lab. What does it mean?”
@SuperXStudios added the ability to generate custom 3D character skins for his game “Fields of Battle 2”. The image is generated using ControlNet OpenPose, which creates character textures and is then pulled through a pipeline to create a 3D, rigged, animated character in about 15 seconds. Amazing!
@NathanBoey shared how he created his animated piece “LUNA”. Definitely worth a read.
@reach_vb put together a Google Colab notebook that lets you generate audio from text using the AudioLDM model.
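If you prefer running it locally instead of in Colab, diffusers ships an AudioLDM pipeline. A minimal sketch, with the checkpoint name and settings being assumptions on my part:

```python
import torch
import scipy.io.wavfile
from diffusers import AudioLDMPipeline

pipe = AudioLDMPipeline.from_pretrained(
    "cvssp/audioldm-s-full-v2", torch_dtype=torch.float16  # assumed checkpoint
).to("cuda")

audio = pipe(
    "gentle rain hitting a tin roof, distant thunder",
    num_inference_steps=50,
    audio_length_in_s=10.0,
).audios[0]

scipy.io.wavfile.write("rain.wav", rate=16000, data=audio)  # AudioLDM outputs 16 kHz audio
```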
Last week’s Follow Your Pose paper brought pose guidance to video, but creating pose frames isn’t that straightforward. So @fffiloni put together this helpful HuggingFace space that lets you convert a video or gif to an MMPose sequence.
The SceneDreamer code from issue 19 got released and lets you generate unbounded 3D scenes. There is also a HuggingFace demo.
I already shared a HuggingFace demo which lets you generate an infinite zoom video; this one is usable directly within Automatic1111.
You’re Balenciaga, Harry. @cocktailpeanut put together a website that generates a 100% automated fashion show that keeps on going, forever.
Inspired by @ClaireSilver12’s AI art contest I started utilizing GPT-4 to create a video game called Room of Wonders. Once finished, the game will let you fill a 2D room with AI generated assets and furniture.
@nickfloats received early access and put together a Gen-2 video with prompts from the community. Very cool showcase of Gen-2’s text-to-video capabilities.
@cocktailpeanut is also the author of Breadboard. The browser lets you organize your AI generated Stable Diffusion images in one place by simply linking a folder. Works on Windows, macOS and Linux.
@AxSauer announced the release of the StyleGAN-T code to train your own weights. But more exciting: He’ll be joining @StabilityAI, which hopefully means we’ll soon see large-scale pretrained open-source GAN models!
I wrote a short blog post about intention and how it relates to AI art. I’d love to hear your thoughts on the topic!
@DarthMarkov trained a new ControlNet model based on MediaPipe’s face mesh annotator which helps to control faces when generating images.
It’s finally here. Upload an image with a speech clip to generate an animated video of the character speaking.
@StarkMakesArt shared his process behind creating his cover submissions for this week’s anthropomorphism challenge. Always interesting to see behind the curtains of how other artists work.
There is some sort of hilarious and weird spaghetti fetish going on with the ModelScope Text-To-Video model. Besides Will Smith eating spaghetti, there is also a video of Elon Musk or the Pope eating pasta. Now I’m hungry again.
@nikkiccccc from @lkgglass is working on the first conversational holographic AI being powered by ChatGPT called Uncle Rabbit. Time to unbox my Looking Glass Portrait.
@bensartnoodles showcased a cool Deforum experiment in which he used ControlNet in combination with an input video of a masked silhouette for guidance.
An inspiring (in my opinion) short documentary about AI art involving @katecrawford, @trevorpaglen and @refikanadol, produced by @MuseumModernArt.
The official implementation of Text2Video-Zero is live on HuggingFace and GitHub.
@TREE_Industries is building a GPT-3/4 Blender plugin which lets you modify 3D objects with natural language. It’s still in early development, but it’s already pretty cool. There is also an Unreal plugin.
The SoftVC VITS fork lets you train your own singing voice conversion model. @NerdyRodent created a YT tutorial for it in case you want to try this out for yourself.
It’s now possible to finetune the ModelScope Text-to-Video model. It’s recommended to use at least an RTX 3090, but you should be able to train on GPUs with <= 16GB as well with some additional settings.
This is actually quite a neat trick to achieve better temporal consistency when style transferring videos. Summarized: create a grid of keyframes, run them through img2img together, then split the resulting image back into frames to continue processing, in EbSynth for example.
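A minimal sketch of the grid part with Pillow – filenames and grid size are assumptions, the img2img step itself happens in whatever UI you use:

```python
from PIL import Image

def make_grid(frames, cols=2):
    """Tile the keyframes into one image so img2img denoises them together."""
    w, h = frames[0].size
    rows = (len(frames) + cols - 1) // cols
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * w, (i // cols) * h))
    return grid

def split_grid(grid, frame_size, count, cols=2):
    """Cut the processed grid back into individual frames."""
    w, h = frame_size
    return [
        grid.crop(((i % cols) * w, (i // cols) * h, (i % cols + 1) * w, (i // cols + 1) * h))
        for i in range(count)
    ]

keyframes = [Image.open(f"keyframe_{i:03d}.png") for i in range(4)]  # placeholder filenames
make_grid(keyframes).save("grid_for_img2img.png")
# ...run grid_for_img2img.png through img2img, then:
frames = split_grid(Image.open("grid_stylized.png"), keyframes[0].size, len(keyframes))
for i, frame in enumerate(frames):
    frame.save(f"stylized_{i:03d}.png")  # continue in EbSynth or your tool of choice
```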
@kirkouimet, @scottycoin, @joe_bowers and @mikedenny took on the job of generating an image for each verse of the bible – which apparently took two weeks for the Old Testament on a 4090ti. Ngl, I’m curious about the electricity bill 😅
@mrjonfinger created a cool AI short movie using Gen-1 video-to-video by @runwayml and Text-To-Speech by @elevenlabsio.
OpenAI announced support for ChatGPT plugins this week and @gdb showcased how ChatGPT can modify an uploaded video file using a text command by first writing the code and then executing that code within a sandboxed environment. Wow!
My fine-tuning quest continues. This video by koiboi provides a great overview of the different available Stable Diffusion fine-tuning methods. And to my surprise it contains my last music video. Thanks for the share @FakeSmileNFT.
Stumbled upon this beautiful website full of classical public domain artworks this week. Great for img2img as well as learning about new terms and classical artists.
@luxdav created a nifty little prompt randomizer which lets you generate multiple prompts by templating with different keywords. Love the simplicity of it.
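The core idea is just template filling. A toy Python version (not @luxdav’s actual implementation, keywords made up for illustration) might look like this:

```python
import random

TEMPLATE = "a portrait of {subject}, {style}, {lighting}"
KEYWORDS = {
    "subject": ["an old fisherman", "a cyberpunk librarian", "a clockwork owl"],
    "style": ["oil painting", "risograph print", "35mm photograph"],
    "lighting": ["golden hour", "neon rim light", "soft studio light"],
}

def random_prompt() -> str:
    """Fill each template slot with a randomly chosen keyword."""
    return TEMPLATE.format(**{key: random.choice(options) for key, options in KEYWORDS.items()})

for _ in range(5):
    print(random_prompt())
```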
@javilopen built a super basic Doom prototype (3D maze-like map to walk through) with the help of GPT-4 and shares his learnings in this Twitter thread.
@vibeke_udart shared her animation process of turning images into 3D using normal/depth maps and relighting them with @1null1 (TouchDesigner).
@stine1online self-published a book showcasing the creative potential of Artificial Intelligence by featuring poems, short stories, and recipes generated by AI with matching art. The book is available as an ebook and paperback, check it out!
I feel like each week I’m sharing a new pose plugin 😅 This one seems like the culmination of all that came before. It lets you stage multiple skeletons, edit hands, generate normal/depth/canny maps and adjust body parameters.
This HuggingFace space by @kadirnar_ai lets you create infinite zoom out and zoom in animations.
I’m currently researching how to train LoRAs and I stumbled upon this cool illustrated guide by Reddit user u/UnavailableUsername_ from back in February.
The unprompted extension introduces a templating language which enables shortcuts in your prompts. The new [zoom_enhance] shortcode, for instance, automatically fixes faces and increases details in your images.
If you’re looking into isolating a subject from a background, there are a few tools out there to help you achieve that. This one is an Automatic1111 extension built upon rembg.
If you want to support the newsletter, this week’s cover is available on objkt for 5ꜩ a piece. Thank you for your support 🙏😘
@WonderDynamics introduced their product Wonder Studio – an AI tool that automatically animates, lights and composites CG characters into a live-action scene. You can sign up for the closed beta on their website. Just imagine combining this with a high-quality 3D avatar generator and boom, infinite possibilities.
@alejandroaules published his book about Aztec lore produced using his own fine-tuned Stable Diffusion models (after having been temporarily banned from Midjourney for generating some graphic imagery). The full digital book can be bought on Gumroad.
@_bsprouts made a cool short animation using Gen-1. I got to play around with it this week myself and the current 3-second limit is a bit of a pain, making me appreciate @_bsprouts’ endeavour to push through it and show us what’s possible.
ControlNet only worked with Stable Diffusion 1.5 so far, as the models have to be specifically trained against the same base model. There are now canny, depth and pose models which work with SDv2.
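Whichever base model you use, the control image is prepared the same way. For the canny variant that’s just an edge map, for example (filename and thresholds are assumptions to tune per image):

```python
import cv2
import numpy as np

img = cv2.imread("reference.png")        # placeholder input image
edges = cv2.Canny(img, 100, 200)         # low/high thresholds – tune per image
edges = np.stack([edges] * 3, axis=-1)   # ControlNet expects a 3-channel image
cv2.imwrite("canny_control.png", edges)  # feed this as the ControlNet input
```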
There is a new pose plugin for Automatic1111 in town which lets you stage poses with multiple bodies in 3D directly within the web interface.
One thing Posex can’t do though, is staging hands correctly. This is where the depth map library comes in. In it you find a bunch of prerendered depth map hands which you can combine with the pose images from Posex to generate the hands that you actually want.
If you’re looking for a free-to-use video editor, CapCut seems like a good choice. @superturboyeah created a short music video with it using stills created with Midjourney.
@weird_momma_x shared with me some resources she uses for blending with her work. @smithsonian Open Access is a platform where you can download, share, and reuse millions of the Smithsonian’s images—right now, without asking. Perfect if you’re having concerns about the ethics of using input images when working with AI. There are also a few more resources momma shared in the Discord.
The Corridor Crew released an anime this week, stylized with Stable Diffusion img2img and deflickered with DaVinci Resolve’s deflicker plugin.
I’m gonna be honest, I don’t know exactly what’s going on here @ryunuck, but it looks trippy and fun. Audio / motion reactive AI animated VR music experience? Count me in if I ever get my hands on a VR headset.
@Aidan_Wolf showed us how easy it is to turn a cube into an ancient artifact using Midjourney, Blender and DeepBump.
@toyxyz3 put together a Blender plugin that includes character bones that resemble the OpenPose visualization. The plugin lets you set up a pose (including fake hands and feet) and then export the bones as a pose image and the hands/feet as depth maps / canny images for usage with ControlNet.
The team at @weizmannscience put together another MultiDiffusion @Gradio app, this one lets you try out the spatial control feature of their method by drawing a segmentation mask and assigning a prompt to each color.
If you don’t want to wait for Composer to colorize your grandparents’ photo collection, @hysts12321 created a DDNM @Gradio app which lets you restore old or low-quality images by increasing their resolution and colorizing them. There is also a Google Colab.
@RamAnanth29 put together a HuggingFace demo for the novel ZoeDepth depth map model.
If you want to support the newsletter, this week’s cover is available for collection on objkt as a limited edition of 10 for 3ꜩ a piece. Thank you for your support 🙏😘
This one made me super excited. @PeterHollens is working on a 100% AI music video. If this is real, I didn’t know @D_ID_ was capable of creating singing voices this good! Gotta try.
@Songzi39590361 put together a Blender script which makes it possible to pose characters in Blender and send them through Stable Diffusion + ControlNet with the click of a button.
@bilawalsidhu shared a ControlNet experiment where he’s toggling through different styles of contemporary Indian décor, while keeping a consistent scene layout.
GitHub user AugmentedRealityCat shared their workflow on how to convert UV texture maps into segmentation maps so you can use them with ControlNet to texture your assets.
ArtLine is a model designed to produce amazing line art portraits from an input image and works great with the Canny ControlNet model.
This Automatic1111 plugin lets you create a pose by dragging and dropping the joints within the WebUI, which can then be used with ControlNet.
Another @fffiloni @Gradio app. This one makes it possible to achieve greater consistency between video frames using ControlNet. Although there is still some flicker without temporal guidance between the frames, it’s a step up from previous techniques.
@j_stelzer put together a HuggingFace demo app which enables video transitions with incredible smoothness between prompts. There is also a Google Colab notebook. ControlNet support is apparently coming.
This HuggingFace space lets you train Tune-A-Video on your own videos and then use the trained model to edit your video clip.
@Oranguerillatan put together a music video for the band “Dead Man’s Couch” and the intro is just magic *chefskiss*.
@bensartnoodles tested ControlNet with a set of video frames and although the results still produce a lot of flicker depending on the scene and prompt, this one looks super cool.
@peteromallet is working on Banodoco, an open-source tool that combines multiple AI models to create coherent animated videos. He just showcased a vid2vid example of v0.2 and it looks amazing! You can currently signup to become a beta tester.
Boy, this week keeps on giving. @justLV shared his process behind applying more temporal consistency when modifying videos content with Stable Diffusion.
@ryunuck is working on what looks like a Deforum editor with visualized keyframe graphs for different audio stems. Looks cool.
@ouhenio is working on a vid2vid pipeline called Dreamcatcher as well. Also worth keeping an eye on.
@KaliYuga_ai created a fork of the LoRA-enabled Dreambooth notebook and extended it with BLIP functionality to autocaption your image dataset. I’m gonna use this to train my next fine-tune with LoRA.
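For reference, autocaptioning a folder of training images with BLIP boils down to something like the following hedged sketch (not the notebook’s exact code; the dataset folder is a placeholder):

```python
from pathlib import Path
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for path in Path("dataset").glob("*.png"):  # placeholder dataset folder
    image = Image.open(path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    caption_ids = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(caption_ids[0], skip_special_tokens=True)
    path.with_suffix(".txt").write_text(caption)  # caption file next to each image
```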
Tired of spelling out the same negative prompts over and over again? The EasyNegative embedding has you covered. It was trained on the Counterfeit model, but should also work (more or less) with other models.
@johnowhitaker put together a Google Colab to generate more coherent frame by frame consistency when using the IP2P method. Example 1. Example 2.
If you’re looking to play around with a new fine-tuned SD model or just want to browse a bit, @HuggingFace created a Diffusers Model Gallery to easily do just that.
Sponsored – This week’s issue and challenge is sponsored by @YoupiCamera. They recently released a personalized AI art sharing app for iOS called MetaReal – a mix of Lensa A.I. and MidJourney. Join the “ruins” cover art challenge to get one month of premium membership for free.
@paultrillo put out a stunning animation transforming some plain ol’ video footage into a cloudy dream combining tools like @NVIDIAStudio #instantnerf, #stablediffusion, @runwayml and #aftereffects. All on a @dell laptop. Pretty impressive.
@angrypenguinPNG is building a 360 image generator with #stablediffusion. Try the prompt: medieval castle ruins in sunset forest, golden trees, in the style of elden ring, octane render, wide angle.
@dreamwieber shared his process on how he creates Stable Diffusion animations using ChatGPT and Midjourney.
With last week’s audio models arrived the ability to turn images into music. @DrJimFan shared how, with the combination of image captioners, LLMs and text-to-audio models, we can generate infinite atmospheric background music for images.
Last week I shared an almost flickerless example video created by @CoffeeVectors using Automatic1111 with InstructPix2Pix. @fffiloni now created a simple @Gradio app which lets you do the same thing – minus the hassle of installing anything.
@fffiloni also created an app to generate audio from an image with last week’s AudioLDM model. If you want to take the above a step further, take a frame from your Pix2Pix video, generate audio from it, and combine it with your video. Just take a look at this cute burning doggo.
If you just want to create audio from text instead of an image, @liuhaohe got you covered. There is also a Replicate model which gives you the ability to generate longer audio.
While InstructPix2Pix is good at manipulating styles of an image, @Microsoft’s Instruct-X-Decoder is good at object-centric instructional image inpainting/editing.
If you’re looking to sync Deforum animations to music, framesync is a handy tool to do just that. It’s similar to what I’ve been building but also lets you create keyframes without having to use an audio file.
watchmeforever is broadcasting the Nothing, Forever show, a show about nothing, that happens forever. The show runs 24/7 and is always streaming new content, generated via machine learning and AI algorithms like GPT-3. An early taste of the future.
Flickering is currently the state of the art when it comes to generating videos with Diffusion models. Inspired by @TomLikesRobots, @CoffeeVectors showcased an almost flickerless video this week using the new InstructPix2Pix model.
Tired of updating your local Stable Diffusion or spinning up cloud notebooks and waiting for dependencies to install? The team behind @RunDiffusion has put together a cloud solution that lets you spin-up an Automatic1111 or InvokeAI instance in about 3 minutes and starts at $0.50/hour.
Sampling methods don’t have to be a mystery. In this video @brockwebb unpacks the general behavior of samplers to give you a better understanding of what’s going on under the hood.
Pixera is for all you pixel art lovers out there (me included). Put in an image and the HuggingFace demo turns it into Pixel art.
@tajnft put together a detailed thread, explaining step by step how to style-train your first Stable Diffusion model and migrate from MidJourney.
Want to generate some human motions? This HuggingFace demo lets you do that by utilizing T2M-GPT models.
You might have heard about the lawsuit against Stability AI, DeviantArt and Midjourney. A few tech enthusiasts have put together a well referenced response to the original lawsuit announcement. The Corridor Crew also put together a worthwhile video about the case explained by an actual lawyer.
It took me a while, but I finally managed to produce another AI music video, this time rendering multiple versions using Stable Diffusion v2 and then combining them into a single final product. Even though SDv2 is harder to prompt for due to the reduced dataset, its ability to stay more coherent can be an advantage in certain situations (I haven’t seen a single horizontal split when zooming out so far).
I love generative art but never took the time to sit down and tinker with it. Well, this now changed thanks to @OakOrobic, who came up with the idea to instruct ChatGPT to generate p5js code. Tried this myself recently and it’s quite fun.
For this week’s interview we also did a fun experiment by turning the whole thing into a podcast episode. I used ChatGPT to convert the interview below into a conversation and then used elevenlabs.io to clone our voices and generate speech with their state-of-the-art Text-To-Speech model.
This week’s Style of The Week is a Stable Diffusion exclusive from artist @DGSpitzer who doesn’t fear, but fully embraces AI art by fine-tuning a model on his own paintings and concept artworks and giving it to the community for free. Truly appreciated 🙏
I know I’m a bit late to the game, but I’ve recently started tinkering with fine-tuning my own models. While Dreambooth is great, it’s fairly slow. LoRA aims to improve that and there is now a HuggingFace space that you can duplicate to run your own trainings.
Stable.art is an open-source plugin for Photoshop (v23.3.0+) that allows you to use Stable Diffusion (with Automatic1111 as a backend) to accelerate your art workflow. As an Affinity user, I’m jealous and although I don’t want to switch, I might soon get a Photoshop subscription.
@ShuaiYang1991 created a HuggingFace space for his VToonify implementation which lets you turn portrait images and videos into toonified versions by applying comic, cartoon, illustration and other styles.
Aside from Automatic1111 and InvokeAI, there is yet another Stable Diffusion UI called NMKD. I stumbled upon it this week due to its implementation of InstructPix2Pix, which lets you edit images with pure text prompts (I wrote about it in issue #9). Unfortunately it’s Windows only, but thanks to @Gradio, there is also a HuggingFace demo.
stable-karlo combines the Karlo image generation model with the Stable Diffusion v2 upscaler in a neat little app. There is also a Google Colab notebook for us stone age AI artists without a decent GPU.
@DavidmComfort put together a guide on how to systematically change facial expressions while maintaining a consistent character in Midjourney.
@daveranan is working on a short film titled Daedalus. All visuals for this scene have been generated using MidJourney. Can’t wait to see the final cut.
@nptacek is creating 3D environments with Stable Diffusion by prompting terms like 3d panorama|stereoscopic Stereo Equirectangular. Check out the tutorial thread for more details.
@rainisto is killing it lately with his WarpFusion renderings of popular music videos. WarpFusion is definitely on my list of things to experiment with in 2023.
Each week we share a style that produces some cool results when used in your prompts. @weird_momma_x shared an interesting find this week: adding the words art brut and/or outsider art to your prompts. Technically not a style, but still a fun experiment to explore with your traditional prompts.
A few weeks ago I wrote about the Tune-A-Video paper, a method that is able to generate a video by providing a video-text pair as an input prompt. bryandlee on GitHub now released a promising first unofficial implementation of the paper.
Reddit user u/UnavailableUsername_ put together a visual Automatic1111 WebUI guide for stepping up your Stable Diffusion art generation game. The guide covers model merging, prompt weighting, matrices, prompt transformation and more.
If you’re into Pixel Art, this is by far the best model I’ve found so far. But it comes with some caveats. First of all, it only works on Windows as an Aseprite extension, as there are some additional features built on top of it to make it this good. Second of all, it costs $65. Now, if you’re serious about creating Pixel Art, that price tag shouldn’t be an issue and is certainly a good investment in my opinion.
Now if you don’t want to shell out money for a fine-tuned Pixel Art model, there is PXL8. The generated examples look fantastic as well and PXL8 comes with an extension for the Automatic1111 WebUI.
@ErotemeArt stumbled upon a useful MJv4 artist reference spreadsheet which covers a lot of different topics like characters, landscapes, paintings, anime and more. Best used after duplicating the sheet to your own GDrive.
@enpitsu fine-tuned Stable Diffusion on 10,000 images of Japanese Kanji characters and their meaning. The model came up with “Fake Kanji” for novel concepts like Linux Skyscraper, Pikachu, Elon Musk, Deep Learning, YouTube, Gundam, Singularity and they kind of make sense.
@brockwebb created a short explanatory video about how “super connectors” in a network tie several things together and why certain words and names like “Greg Rutkowski” are becoming shortcuts to produce better and more coherent concepts.
@somewheresy shared a 16 bar jungle loop which was partially created using Riffusion generated samples. I’m pretty excited for generative music to create an ongoing soundtrack of my daily life.
I don’t usually like to promote drama, but this screenshot by @reddit_lies perfectly portrays how silly anti-AI art gatekeeping is. We’ve come full circle 🤡
Each week we share a style that produces some cool results when used in your prompts. This week’s style is based upon a fine-tuned Stable Diffusion model trained on pulp art artists (pulp art by Glen Orbik, Earle Bergey, Robert Maguire or Gil Elvgren also creates a cool style in MidJourney).
If you’re looking for fine-tuned Stable Diffusion models and embeddings, civitai.com is the place. There are over 1,000 models and some pretty neat gems (among all the unholy stuff) with example images and prompts for you to play around with.
Not all models are on Civitai yet though. Cool Japan Diffusion being one of them. As the name says, the model was fine-tuned for generating cool Japan themed anime and manga images.
Then there is Dreamlike Photoreal 2.0, a stunning photorealistic model based on Stable Diffusion 1.5, created by @dreamlike_art.
And last but not least @_jonghochoi created a @Gradio demo on HuggingFace for pop2piano which lets you convert pop audio to a piano cover and download the result as a MIDI file for further processing.
A glimpse into @LynnFusion‘s noise painting process using #stablediffusion. It’s always interesting to see how people use different approaches to creating.
@angrypenguinPNG walks us through the process of creating 3D assets using Stable Diffusion, remove.bg and OpenAI’s Point·E model.
You should know by now, I always love to see AI work in the real world. @emmacatnip produced some mesmerizing animations for a live concert of the band @plaidmusic.
And because I’m a sucker for these, here is another great one by @vibeke_udart for the song “Shanghai Roar” by Who Killed Bambi and @jennylydmor performed at @Musikhuset.
Each week we share a style that produces some cool results when used in your prompts. This week’s style is based upon a Stable Diffusion v2.x embedding which creates, well, double exposure images (double exposure with silhouette also creates a cool style in MidJourney, example below).
Enter a main idea for a prompt, and the model will attempt to add suitable style cues. If you’re suffering from a creative or conceptual block, give it a try – it certainly helped me. There is also another one called Magic Prompt which does something similar but different.
This is a user-friendly plug-in that makes it easy to generate Stable Diffusion images inside Photoshop using Automatic1111-sd-webui as a backend.
The-Jam-Machine is a generative AI trained on text transcriptions of MIDI music, meant to enhance creativity while making computer music. It uses a GPT-2 model architecture and has been trained on about 5,000 MIDI songs.
Vintedois Diffusion is a model trained by @Predogl and @piEsposito. The model was trained on a large amount of high quality images with simple prompts to generate beautiful images without a lot of prompt engineering.
Shameless plug of eCard AI, the festive greeting card generator I’ve been working on this week. It uses text, image and speech models in the background to generate a Christmas wish from Santa Claus and combines it all into one video for you.
@harmonai_org is a community-driven organization that is working on open-source generative audio tools. They are currently hosting an audio-visual 24/7 music stream on YouTube. The sound is super experimental and glitchy but definitely fun to check out.
@DrawGPT prompts GPT-3 to output code that draws images on an HTML5 canvas. The output obviously doesn’t compare to your state-of-the-art image diffusion generator, but it’s still a fun and creative implementation.
This week’s style is based upon a fine-tuned Stable Diffusion model inspired by Kurzgesagt YouTube videos (kurzgesagt style also creates a cool style in MidJourney). Trained by @Fireman4740.
@GuyP created a Prompt Book for DALL·E 2 earlier this year. I currently use it only for in- and outpainting but if you’re looking to dive a bit deeper into it, this guide contains tips and tricks about aesthetics, vibes, emotional prompt language, photography, film styles, illustration styles, art history, 3D art, prompt engineering, outpainting, inpainting, editing, variations, creating landscape and portrait images in DALL·E, merging images, and more!
Remember last week’s RIFFUSION? @DGSpitzer put together a @Gradio app that turns your text prompt into an album cover and a spectrogram, then into music, and in the end compiles it all together into a video.
If you need something to read over the holidays, this blog post by @hardmaru delves into how examining the construction, training, and deployment of neural network systems through the lens of complex systems theory can lead to a deeper understanding of them.
Karlo is a text-conditional image generation model based on OpenAI’s unCLIP architecture, with an improved super-resolution model that goes from 64px to 256px, recovering high-frequency details in only a small number of denoising steps.
fragments is an embedding (not a fine-tuned checkpoint) for Stable Diffusion v2 768 which applies the fragments_v2 style to your image generations.
An interesting short documentary by @kylevorbach about faking his life for a month by Dreambooth’ing his face and fabricating a synthetic life on social media. The mad-lad.
@DrJimFan dives into the @Google Robotics RT-1 announcement. The new robot is equipped with a 7-DOF arm, a 2-fingered gripper, and a wheeled base. It can retrieve lost sunglasses, organize condiments, restock and prepare snacks, and close drawers left open in Google’s kitchens.
@ammaar combined ChatGPT and MidJourney to create an illustrated children’s book trying to introduce the concept of AI to kids. This is certainly not the first of its kind, but it’s always fun to see how people use AI to create physical products.
Each week we share a style that produces some cool results when used in your prompts. This week’s style is based upon a finetuned Stable Diffusion model trained to produce analog film style images (join me in figuring out keywords that produce similar results in MidJourney). Trained with Dreambooth by @wavymulder.
I tried the WebUI several weeks ago, and I think it was even before I started writing this newsletter. It was the first one to run on an Apple M1 Macbook, and I must say it has come a long way. If you’re looking for a more user-friendly and stable alternative to the Automatic1111 WebUI, you might want to give InvokeAI a try.
There is a new Stable Diffusion animation notebook in town created by @RiversHaveWings. It’s not as elaborate as Deforum when it comes to the settings, but the output looks cool.
Seek.art MEGA is a general use “anything” model trained on nearly 10k high-quality digital artworks that significantly improves on Stable Diffusion v1.5 across dozens of styles.
LoRA is a fine-tuning method from the LLM domain, and this repository promises a faster and more lightweight fine-tuning alternative to Dreambooth. There are both a HuggingFace Space and a Google Colab notebook for you to try out.
If you don’t want to think about all the technical aspects of fine tuning and don’t mind shelling out a few bucks to create your checkpoints, this @Gradio HuggingFace space by @abhi1thakur is probably the easiest way to do so (I feel like I say this every week lately).
In The Dream Tapestry installation, museum guests visualize their dreams using text-to-image AI. The AI then combines those dreams together. It’s the first interactive DALL-E experience at a museum. By @CitizenPlain.
@ThoseSixFaces created a cool AR target tracking demo using a Pokémon card. Tilting the card morphs Ditto into various types of Pokémon which got diffused using Stable Diffusion.
It’s always cool to see AI art in the wild. @lerchekatze produced a video input animation for a DJ set at an @ArtBasel party this week.
It’s getting cold outside. Each week we share a style that produces some cool results when used in your prompts. This week’s style is based upon a finetuned Stable Diffusion model that generates neat crocheted wool images (using 3d crocheted wool also creates a cool style in MidJourney, example below and here). Trained with Dreambooth by @plasm0.
ChatGPT plugins are popping up like wildfire. This one is my favourite since it lets me send prompts to ChatGPT through the browser’s Omnibar or by highlighting text.
@multimodalart set up a @huggingface space to train Dreambooth models. This is likely the easiest way to train your custom SDv1 and SDv2 models. Simply duplicate the Space and you’re ready to go.
@camenduru published the Automatic1111 WebUI on a @huggingface space. No need to own a local GPU or set up a cloud hosting service. Simply duplicate the space to run it privately and use your own preferred checkpoints or Dreambooth models.
Another Prompt Book by @learn_prompting, but this one is not exclusively for Stable Diffusion but in general for LLMs like ChatGPT. If you’re having trouble getting the results you want, this might do the trick for you.
All hail MiDaS. MiDaS lets you generate depth maps from your images, which are used, for example, to guide the 3D animation mode in the Deforum Stable Diffusion notebook. This week, @TomLikesRobots shows us another application example for creating 3D animated dioramas.
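Generating a depth map yourself is only a few lines via torch hub. A rough sketch, where the model choice and filenames are assumptions:

```python
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")   # "MiDaS_small" is faster
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
midas.eval()

img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)  # placeholder input
batch = transforms.dpt_transform(img)  # use transforms.small_transform for MiDaS_small

with torch.no_grad():
    prediction = midas(batch)
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze()

depth = (depth - depth.min()) / (depth.max() - depth.min())      # normalize to 0..1
cv2.imwrite("depth.png", (depth.numpy() * 255).astype("uint8"))  # near = bright, far = dark
```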
@MoonWatcher68 created an interesting style transfer showcase using @runwayml and EbSynth. Perfect fit for this issue’s cover challenge.
@GanWeaving inspired me this week to imagine made up stills from made up movies. I even went so far as to create a synopsis using GPT-3. I’ve always wanted to create a movie, AI might help me achieve that someday. What movie would you make?
Each week we share a style that produces some cool results when used in your prompts. This week’s style is based upon a finetuned Stable Diffusion model vaguely inspired by Gorillaz, FLCL and Yoji Shinkawa (combining these names also creates a cool style in MidJourney, example below). Trained with Dreambooth by @envvi_ai.
In case you’re not running with Automatic1111, @hahahahohohe created a super simple Google Colab @Gradio app that lets you play around with SDv2. Support for depth2img is currently missing, but will be added soon. The best thing: it works great on a free T4 GPU.
Upscayl is a free and open source AI Image Upscaler. I’ve been relying on Topaz Gigapixel until now, but this seems like a great free alternative that is available on all major operating systems.
This updated version of CLIP Interrogator is specialized for producing nice prompts for use with Stable Diffusion 2.0 using the ViT-H-14 OpenCLIP model! Code by @pharmapsychotic, Gradio demo by @fffiloni.
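If you prefer running the interrogator locally instead of through the demo, a rough sketch using the clip-interrogator package looks like this; the exact model string and the image path are assumptions on my side.

```python
# Rough sketch using the clip-interrogator package with the ViT-H-14 OpenCLIP
# model (the encoder family that matches SD 2.x). Image path is a placeholder.
from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))
image = Image.open("my_render.png").convert("RGB")
print(ci.interrogate(image))  # prints a prompt-style caption for the image
```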
If you’re having trouble reproducing your old Stable Diffusion v1 prompts, fear not. @fffiloni built a HuggingFace demo that allows you to convert a v1.x Stable Diffusion prompt to a v2.x Stable Diffusion equivalent prompt based on the above CLIP Interrogator 2.1.
If you’re into 2D and 3D game assets you should check out @mirage_ml. They run a Text-To-3D service and started to open source their 2D models built upon Dreambooth on HuggingFace. The one linked here generates lowpoly worlds of your imagination.
@ninklefitz is working on @AlpacaML, a next-generation design platform with new tools & workflows for the upcoming age of generative AI models.
@bioinfolucas built a proof of concept RPG game demo in 5 hours using Godot, Krita, Stable Diffusion and MidJourney.
Each week we share a style that produces some cool results when used in your prompts. This week’s style is surprised anime <subject> reading the news and works especially well with NijiJourney.
As far as I know, SD v2.0 models don’t work with existing user interfaces (yet). So if you want to build your own, @1littlecoder put together a short video tutorial on how to do that using @Gradio.
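This isn’t the tutorial’s exact code, just a minimal sketch of what wiring an SD 2.x diffusers pipeline into a Gradio interface can look like (model id and step count are my own choices).

```python
# Minimal sketch: a Gradio text-to-image UI around a Stable Diffusion 2.x
# diffusers pipeline. Not the tutorial's exact code.
import torch
import gradio as gr
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
).to("cuda")

def generate(prompt: str):
    # Run one text-to-image generation and return the resulting PIL image
    return pipe(prompt, num_inference_steps=30).images[0]

gr.Interface(fn=generate, inputs="text", outputs="image").launch()
```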
Listen up, MTG nerds! HuggingFace user volrath50, the madlad, created a comprehensive fine-tuned Stable Diffusion model trained on all currently available Magic: The Gathering card art (~35k unique pieces of art). Might be a cool experiment to create your own unique proxies.
(Probably) the world’s first fine-tuned Stable Diffusion v2.0 model, trained with Dreambooth by @Nitrosocke.
@HaihaoShen and the team behind Intel® Neural Compressor have made it possible to fine-tune Stable Diffusion on a CPU with a single image in around 3 hours. There is a demo on HuggingFace in case you want to try it out.
@DIGIMONSTER1006 created a short two-part science fiction comic based on the “comic” cover challenge this week. Part 1 and Part 2 can be found on Twitter.
@bioinfolucas showed us this week how to easily create animated 2D game assets using Stable Diffusion and sketch.metademolab.com.
@gd3kr developed an AI bot that answers all of grandma’s technology questions so you don’t have to spend an hour on FaceTime trying to get her to click the right buttons. For anyone who still had any doubts: I asked her if AI art is art and she said yes. You can live in peace now.
Each week we share a style that produces some cool results when used in your prompts. This week’s style is retro scifi art style. There is also a dedicated Stable Diffusion model to produce more fine-tuned results.
DreamArtist looks very similar to MidJourney’s v4 img2img feature. With just one training image DreamArtist learns the content and style in it, generating diverse high-quality images with high controllability.
@Nitrosocke did it again and created the first multi-style model trained from scratch! This is a fine-tuned Stable Diffusion model trained on three art styles (archer style, arcane style and modern disney style) simultaneously while keeping each style separate from the others.
Two weeks ago I shared a fine-tuned Stable Diffusion spritesheet model in this section. Now @ronvoluted created a HuggingFace demo that turns the generated pixel characters into GIFs.
Do you want to acquire a deeper understanding of how diffusion models work? Look no further. In this fantastic hands-on Google Colab notebook by @johnowhitaker you’ll learn about diffusion loops, text embeddings, img2img and arbitrary guidance.
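To give you a feel for what the notebook covers, here is a toy illustration of the reverse-diffusion loop using an untrained UNet from diffusers. It only shows the structure of the sampling loop (the output is noise, since nothing is trained); the sizes and step count are arbitrary.

```python
# Toy sketch of a denoising loop with diffusers. The UNet is untrained, so the
# result is meaningless; the point is the shape of the loop itself.
import torch
from diffusers import UNet2DModel, DDPMScheduler

model = UNet2DModel(sample_size=32, in_channels=3, out_channels=3)
scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)  # use 50 inference steps

x = torch.randn(1, 3, 32, 32)  # start from pure Gaussian noise
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(x, t).sample               # predict the noise at step t
    x = scheduler.step(noise_pred, t, x).prev_sample  # remove a little of that noise
```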
This nifty algorithm optimizes your text-to-image prompts by rating the output based on its aesthetics. Notebook by @Omorfiamorphism.
An “older” guide by @KaliYuga_ai where she goes through the process of building an image dataset and turning it into a custom clip-guided diffusion model.
Apart from our beautiful cover art, this is the most cyberpunk content I’ve seen this week. The madlad @SCPSolver built a home-brewed server with 4x NVIDIA K80 GPUs and a total capacity of 96 GB VRAM. And if that’s not enough, he plans on building a second one and clustering them together to run inference on large language models with a total of 192 GB VRAM. He acquired all the parts on eBay, and you can find his shopping list on Reddit.
@BjoernKarmann created a proof of concept showing how voice command, selection gesture, SD and AR can come together to alter the reality around us.
Maybe not “art” in the traditional sense but still super cool to look at. By @Weiyu_Liu_: “StructDiffusion constructs structures out of a single RGB-D image based on high-level language goals. The model trained in simulation generalizes to real-world objects the robot has never seen before.”
If you want to try out GAN interpolation yourself, @makeitrad1 and @alterbeastlab put together a How-To Guide Twitter thread with some useful links and information that they used for creating their “Majestic Mycology” collection.
Each week we share a style that produces some cool results when used in your prompts. This week’s style is popup-book. There is also a dedicated Stable Diffusion model to produce more fine-tuned results.
If you want to improve your prompt game this book might be for you. Learn more about format, modifiers, magic words, parameters, img2img and other important tips.
@RiversHaveWings has trained a latent diffusion upscaler for the Stable Diffusion autoencoder. Colab written by @nshepperd1.
A HuggingFace Space demo by @Flux9665 that lets you create AI voices for a specific text input. The voices still sound noticeably artificial, but it could be a fun play session to use them in an experiment.
Automatic1111 WebUI finally got a Dreambooth extension. If you haven’t tried Dreambooth yet, this might be the time to do it. I’ll definitely take a look at it and try to run it on Paperspace as soon as I find some time for it.
The (maybe) first AI-generated Twitter game. Every hour Prompto! tweets an image based on GPT-3-generated clues. The goal is to guess the correct word that was used to generate those clues.
An insightful Twitter thread by @sergeyglkn on how he used different (AI) tools to create an animated AR character.
This feels like it could be from a sci-fi movie. “Prompt Battle” is a rap-battle-inspired competition, but with keyboards and AI image generation instead of words. Shared by @alexabruck.
Creating movies with AI is still kind of a mess. But things are evolving. Check out this example by @coqui_ai where they use their own platform to create the voices, Stable Diffusion for the images, Google’s AudioLM model for the music and @AiEleuther‘s GPT-J model to write the script.
Each week we share a style that produces some cool results when used in your prompts. This week’s featured style is in the style of shotaro ishinomori and michael ancher.
@DGSpitzer released a stunning Cyberpunk Anime model to recreate the style of the popular Cyberpunk 2077: Edgerunners anime. There are HuggingFace and Google Colab versions. Submit your creations to this week’s cover art challenge for a chance to win $50.
@kylebrussel created a Stable Diffusion checkpoint which allows you to generate pixel art sprite sheets from four different angles.
With all these fine-tunes getting released, I got curious how this works and stumbled across this tutorial on how to do it using the Automatic1111 web UI. I also found a thread about yet another fine-tune (Naruto anime) by @eoluscvka in which he shared @Buntworthy‘s GitHub guide on fine-tuning Stable Diffusion to create your own style transfer.
My last Deforum music video took me almost 500 computing units on Google Colab, so I’ve decided to make the switch to paperspace.com and recorded a video tutorial to help you do the same.
A HuggingFace Space demo that sends an image to CLIP Interrogator to generate a text prompt, which is then run through Mubert text-to-music to generate music from the input image! By @fffiloni.
This week I came across this gem by @SatsumaAudio and had to share it with you. He’s building a rhythm based platformer and uses AI generated art to create the assets.
It’s cool to see AI art in the real world. SuperChief hosted a gallery by AI artist z_kai in LA last Friday. If you have a project that includes AI art that touches the physical world, reach out to me; I’d love to share more stuff like this in future issues.
My latest Deforum Stable Diffusion experiment. I basically blew an entire month’s worth of computing units on this and created 10 different versions. Key takeaway: apart from the audio-reactive stuff, prompts are still the most important ingredient. So don’t slack on them!
Each week we share a style that produces some cool results when used in your prompts. This week’s featured style is in the style of pascale campion and olivier bonhomme.
This is the fine-tuned Stable Diffusion model trained on images from the TV show Arcane. Use the tokens arcane style in your prompts for the effect.
I came across Arcane Diffusion from a tweet by @rainisto where he explains how he uses Dreambooth to train SD new styles. In case you want to go a bit deeper, Suraj Patil created an analysis on their experiments with DB that might be interesting for you.
MotionDiffuse is a text-to-motion model which lets you generate human animation sequences. There is no export option to your favourite 3D tool yet, but they are working on it.
I’ve put together a simple web utility that lets you easily build and adjust prompt keyframes for animations like this one. Not rocket science, but helpful either way.
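For context, and as an illustration only: Deforum-style animation prompts boil down to a mapping from frame number to prompt text, which the notebook then transitions between over the course of the animation. The prompts below are made up for the example.

```python
# Illustration only: animation prompts as a frame-number -> prompt mapping,
# the structure the keyframe utility helps you assemble.
animation_prompts = {
    0: "a dense foggy forest, volumetric light, highly detailed",
    60: "a neon-lit cyberpunk alley, rain, cinematic lighting",
    120: "an endless desert under a purple sky, ultra wide shot",
}
```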
Last week I shared a Google Colab tutorial which lets you generate audio tracks with Mubert’s Text-To-Audio model. This week they added a HuggingFace Space, which makes generating audio tracks even easier.
This week’s featured style was provided by @weirdmomma. She was so kind as to share her entire prompt with us: A scarecrow in a field at night. Block print, letterpress. Poster art by Mary Blair, cubo-futurism, poster art, somber color palette, concert poster, hatch show print. I played around with some variations of the prompt and found that I quite like the style of mary blair by itself. Have fun!
If you are into pixel art, I got you. @PublicPrompts released an SD model trained on pixel art with Dreambooth. The results look pretty cool and usable as game assets, for instance.
Speaking of custom models: @proximasan trained an SD model with Dreambooth to make some cute icons. The results look pretty cool as well!
@pharmapsychotic published CLIP-Interrogator on HuggingFace, where it got assigned a free T4 GPU. Ever wondered how AI would describe your selfie? Now you can find out for free.
An interesting article about how it’s possible to generate 8 images in 8 seconds using Google Colab and a TPU runtime.
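The trick behind that speed is running one prompt per TPU core in parallel. The sketch below follows the diffusers JAX/Flax Stable Diffusion example as I understand it; the argument names and the bf16 revision are taken from that documented example and may differ between diffusers versions.

```python
# Sketch based on the diffusers JAX/Flax Stable Diffusion example: replicate the
# weights across all TPU cores and shard 8 prompts so each core renders one image.
import jax
import jax.numpy as jnp
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline

pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="bf16", dtype=jnp.bfloat16
)

prompts = ["a watercolor painting of a lighthouse at dusk"] * jax.device_count()
prompt_ids = shard(pipeline.prepare_inputs(prompts))   # split prompts across cores
params = replicate(params)                             # copy weights to every core
rng = jax.random.split(jax.random.PRNGKey(0), jax.device_count())

# jit=True runs the sampling loop in parallel on all devices
images = pipeline(prompt_ids, params, rng, num_inference_steps=50, jit=True).images
```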
Two weeks ago I shared a video from Computerphile on how AI image generators work. This week we go a bit deeper and take a look at AI Image Generation with Stable Diffusion.
Each week we share a style that produces some cool results when used in your prompts. This week’s featured style is in the style of richard burlet and slim aarons.
A PyTorch implementation of text-to-3D DreamFusion, powered by Stable Diffusion. Not of the same quality as the original DreamFusion, but might be fun to play around with.
If you don’t have a GPU and don’t want to use Colab to train SD on a new concept, Astraea has you covered. It’s a bit more expensive ($3/concept) but apparently much easier to use. I haven’t tried it yet though. Credits to @TomLikesRobots for sharing.
Deforum v0.5 lets you use math formulas to modify various configurations when creating animations. Graphtoy lets you easily visualize those math functions with graphs.
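To give you an idea of what those schedules look like: Deforum parameters accept keyframed expressions in a "frame: (expression)" format where t is the current frame. The exact values below are just examples I made up; paste the expressions into Graphtoy to see the curves before you render.

```python
# Example Deforum-style schedule strings ("frame: (expression)", t = frame index).
zoom = "0:(1.02), 120:(1.0 + 0.02*sin(2*3.141*t/60))"   # gentle pulsing zoom
angle = "0:(0), 60:(0.5*t)"                              # start rotating at frame 60
translation_x = "0:(5*sin(2*3.141*t/90))"                # slow horizontal sway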
Prompt Parrot generates text2image prompts from a fine-tuned distilgpt2 model. If you ever need some inspiration for your prompts, Prompt Parrot might help.
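The underlying idea is simple enough to sketch: sample continuations from a GPT-2-style model with the transformers text-generation pipeline. Base distilgpt2 stands in here for Prompt Parrot's fine-tuned checkpoint, which lives in the notebook.

```python
# Sketch of the idea behind Prompt Parrot: sample prompt continuations from a
# small GPT-2 model. Base distilgpt2 is a stand-in for the fine-tuned weights.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
seed = "a hyperdetailed matte painting of"
for out in generator(seed, max_length=40, num_return_sequences=3, do_sample=True):
    print(out["generated_text"])
```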
We’ve heard it all before and I bet most of us don’t have a clue what the “Latent Space” actually is. Ekin Tiu has done a great job explaining the fundamentals.
Each week we share a style that produces some cool results when used in your prompts. This week’s featured style is #inktober.
If you want to expand your stylistic horizon the Stable Diffusion Artist Style Studies is a great resource to do so. The database includes over 1500 artists and their recognition status (whether the model recognized the name) and example images synthesized by SD.
In this tutorial you’ll learn how to generate video input masks to only diffuse the parts of a video frame that you want. Example.
If you’ve created portrait images of humans with AI, you’ve certainly come across weird-looking facial features. GFPGAN is a blind face restoration algorithm for real-world face images that aims to fix that. There is also a Colab notebook.
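If you'd rather run it locally than in the notebook, here is a rough sketch following GFPGAN's inference script; the weights path and image names are placeholders, and argument defaults may differ between releases.

```python
# Rough sketch following GFPGAN's inference script. Download the release
# weights first; "GFPGANv1.4.pth" and the image paths are placeholders.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",
    upscale=2, arch="clean", channel_multiplier=2, bg_upsampler=None,
)

img = cv2.imread("portrait.png", cv2.IMREAD_COLOR)
# enhance() returns cropped faces, restored faces and the full restored image
_, _, restored = restorer.enhance(
    img, has_aligned=False, only_center_face=False, paste_back=True
)
cv2.imwrite("portrait_restored.png", restored)
```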
There are already tons of different color palette generators, some of which use ML behind the scenes. I played around with Huemint this week to pick the AI Art Weekly cover title color for this week’s issue.
AI image generators are massive, but how are they creating such interesting images? A more technical explanation on how image diffusion models work.
Each week we share a style that produces some cool results when used in your prompts. This week’s featured style is ink Dropped in water, splatter drippings.
Point this notebook at a YouTube URL and it’ll make a music video for you. You don’t need a DreamStudio API key for this, just disable that setting and it’ll install Diffusers right within the Colab.
Christian Cantrell is developing a free Stable Diffusion plugin for Photoshop which lets you generate new images with text2img, img2img and now even inpainting by using layer masks.
A browser interface for Stable Diffusion based on the Gradio library, with tons of features. This is the one CoffeeVectors is using to create X/Y plots.
I feel this is a good example of what dynamic character dialogue could look like in future video games. If you sign up, chat to the character Grok I created. It’s a Greek yoghurt that wants to take over the world. Thank you @darkestdollx for sharing.
A toolkit to generate 3D mesh models, videos, NeRF instances and multi-view images of colourful 3D objects from text and image prompt inputs.
Each week we share a prompt that produces some cool results when used in combination with other prompts. This week’s featured prompt is Yoji Shinkawa, a Japanese artist best known as the lead character and mecha designer for the Metal Gear franchise.
First up is a Google Colab notebook for creating video animations. If you haven’t tried this yet, please do, it’s amazing. There are different modes for 2D, 3D, video input and interpolation, each providing different results depending on what you want to achieve.
Last week I created a video input experiment with the Deforum colab above. People asked me how I did it, so I put together a quick Twitter tutorial on how to replicate this style of content.
A world-first one-click installer for Stable Diffusion for Macs with an M1/M2 chip. I haven’t tried this yet, but will next week.
Inspired by @mattdesl, user @radamar created a HuggingFace Space which lets you create color palettes using text prompts.
AI Art Weekly isn’t just about image and video related resources. Dance Diffusion is the first in a suite of generative audio tools for producers and musicians to be released by Harmonai. It lets you generate, regenerate and interpolate audio samples. Might be cool to incorporate and combine with other tools like the Deforum notebook.
Looking for prompt references? Lexica is a search engine that lets you search through millions of images generated by Stable Diffusion. It also offers an API for more tech-savvy people.
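For the tech-savvy among you, querying that API is roughly a one-liner. The sketch below assumes the search endpoint and response fields as they were documented at the time (https://lexica.art/docs); both may change.

```python
# Hedged sketch of the Lexica search API: fetch a few prompts and image URLs
# for a query. Endpoint and response fields are assumptions based on the docs.
import requests

resp = requests.get("https://lexica.art/api/v1/search", params={"q": "isometric castle"})
resp.raise_for_status()
for item in resp.json().get("images", [])[:5]:
    print(item.get("prompt"), "->", item.get("src"))
```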