AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Make-A-Protagonist can edit videos by changing the protagonist, background, and style using text and images. It allows for detailed control over video content, helping users create unique and personalized videos.
CoMoSpeech can synthesize speech and singing voices in a single diffusion sampling step with high audio quality. It runs more than 150 times faster than real time on a single NVIDIA A100 GPU, making it practical for text-to-speech and singing applications.
HumanRF can capture high-quality full-body human motion from multiple video angles. It allows playback from new viewpoints at 12 megapixels and uses a 4D dynamic neural scene representation for smooth and realistic motion, making it great for film and gaming.
What if you could generate images of a new concept from just a few example images, without having to fine-tune a model first? InstantBooth from Adobe might be the answer. The approach builds on pre-trained text-to-image models and enables instant text-guided image personalization without any finetuning. Compared to methods like DreamBooth and Textual Inversion, InstantBooth produces competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation while being 100 times faster. When open source?
Sketching the Future can generate high-quality videos from sketched frames using zero-shot text-to-video generation and ControlNet. It smoothly fills in frames between sketches to create consistent video content that matches the user’s intended motion.
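The sketch-conditioning half of that idea can be approximated with off-the-shelf tooling. Below is a minimal sketch that turns a single drawn frame plus a text prompt into a keyframe using the diffusers ControlNet scribble pipeline; the model IDs, file names, and prompt are illustrative assumptions, not the Sketching the Future codebase.

```python
# Minimal sketch: condition image generation on a hand-drawn frame with ControlNet.
# Model IDs, file names, and the prompt are placeholders, not the paper's code.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

sketch = load_image("first_frame_sketch.png")  # user-drawn scribble of the first frame

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

keyframe = pipe(
    "a sailboat crossing a stormy sea, cinematic lighting",
    image=sketch,
    num_inference_steps=30,
).images[0]
keyframe.save("keyframe_00.png")  # in-between frames would then be interpolated
```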
Shap-E can generate complex 3D assets by producing parameters for implicit functions. It creates both textured meshes and neural radiance fields, and it converges faster than the earlier Point-E model while matching or exceeding its sample quality.
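The diffusers library ships a port of Shap-E, so a quick text-to-3D experiment can look roughly like the sketch below; the model ID and arguments follow the public diffusers pipeline, while the prompt and output path are placeholders.

```python
# Minimal sketch of text-to-3D with the diffusers port of Shap-E.
# The prompt and output path are placeholders.
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

images = pipe(
    "a red toy robot",
    guidance_scale=15.0,
    num_inference_steps=64,
    frame_size=256,
).images

export_to_gif(images[0], "robot.gif")  # turntable renders of the generated 3D asset
```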
Ray Conditioning is a lightweight and geometry-free technique for multi-view image generation. You have that perfect portrait shot of a face, but the angle is not right? No problem, just use that shot as the input image and generate the portrait from another angle. Done.
Patch-based 3D Natural Scene Generation from a Single Example can create high-quality 3D natural scenes from a single example by working at the patch level. It allows users to edit scenes by removing, duplicating, or modifying objects while keeping realistic shapes and appearances.
Total-Recon can reconstruct scenes from monocular RGBD videos and render them from different camera angles, like first-person and third-person views. It creates realistic 3D videos of moving objects and allows for 3D filters that add virtual items to people in the scene.
Improved Diffusion-based Image Colorization via Piggybacked Models can colorize grayscale images using knowledge from pre-trained text-to-image diffusion models. It allows for conditional colorization with user hints and text prompts, achieving high-quality results.
DiFaReli can relight single-view face images by managing lighting effects like shadows and global illumination. It uses a conditional diffusion model to separate lighting information, achieving photorealistic results without needing 3D data.
Expressive Text-to-Image Generation with Rich Text can create detailed images from text by using rich text formatting like font style, size, and color. This method allows for better control over styles and colors, making it easier to generate complex scenes compared to regular text.
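As a toy illustration of the idea (not the paper's actual parser), rich-text spans can be flattened into attribute-tagged region prompts, where font maps to style, color to object color, and size to prompt weight; all names below are hypothetical.

```python
# Toy sketch: map rich-text attributes to per-span generation controls.
# All class and function names here are hypothetical illustrations.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    text: str
    color: Optional[str] = None  # token color -> desired object color
    font: Optional[str] = None   # font family -> artistic style hint
    size: float = 1.0            # font size -> relative prompt weight

spans = [
    Span("a church", font="Garamond"),
    Span("in a field of flowers", color="red"),
    Span("under a dramatic sky", size=1.5),
]

def to_region_prompts(spans: list[Span]) -> list[dict]:
    """Flatten rich-text spans into weighted, attribute-tagged region prompts."""
    prompts = []
    for s in spans:
        prompt = s.text
        if s.color:
            prompt += f", colored {s.color}"
        if s.font:
            prompt += f", in a style evoking the {s.font} typeface"
        prompts.append({"prompt": prompt, "weight": s.size})
    return prompts

print(to_region_prompts(spans))
```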
DreamPose can generate animated fashion videos from a single image and a sequence of human body poses. The method is able to capture both human and fabric motion and supports a variety of clothing styles and poses.
Inst-Inpaint can remove objects from images using natural language instructions, which saves time by not needing binary masks. It uses a new dataset called GQA-Inpaint, improving the quality and accuracy of image inpainting significantly.
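For contrast, the conventional mask-based workflow that Inst-Inpaint does away with looks like the diffusers sketch below: you have to paint a binary mask yourself before the model can remove anything. File names and the prompt are placeholders; this is not Inst-Inpaint's own API.

```python
# Conventional mask-based inpainting with diffusers, shown only for contrast:
# Inst-Inpaint replaces the hand-made mask with an instruction like "remove the dog".
# File names and the prompt are placeholders.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("park.png")
mask = load_image("dog_mask.png")  # white pixels mark the object to remove

result = pipe(
    prompt="an empty park lawn, photorealistic",
    image=image,
    mask_image=mask,
    num_inference_steps=50,
).images[0]
result.save("park_without_dog.png")
```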
Follow Your Pose can generate character videos that match specific poses from text descriptions. It uses a two-stage training process with pre-trained text-to-image models, allowing for continuous pose control and editing.
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model can edit images by filling in missing parts using a reference image and a sketch. This method improves editability and allows for detailed changes in various scenes.
AvatarCraft can turn a text prompt into a high-quality 3D human avatar. It allows users to control the avatar’s shape and pose, making it easy to animate and reshape without retraining.
vid2vid-zero can edit videos without needing extra training on video data. It uses image diffusion models for text-to-video alignment and keeps the original video’s look and feel, allowing for effective changes to scenes and subjects.
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in an image. This allows various object-level editing operations on real images, such as reference-image-based appearance editing, free-form shape editing, adding objects, and object variations.
HyperDiffusion can generate high-quality 3D shapes and 4D mesh animations using a unified diffusion model. This method allows for the creation of complex objects and dynamic scenes from a single framework, making it versatile and efficient.