Create Anime Characters with A.I.!

We all love anime characters and are tempted to create our own, but most of us cannot do that because we are not professional artists. What if anime characters could be generated automatically at a professional level of quality? Imagine that you could just specify attributes (such as blonde/twin tailed/smiling), and have an anime character with your customizations generated without any further intervention!

In the community there are already several pioneering works in anime image generation, such as ChainerDCGAN, Chainerを使ってコンピュータにイラストを描かせる (Making a computer draw illustrations with Chainer), and projects like IllustrationGAN and AnimeGAN whose code is available online. However, the results generated by these models are often blurred and distorted, and generating industry-standard facial images for anime characters remains an unsolved problem.

As a step towards tackling this challenge, we propose a model that produces high quality anime faces with a promising rate of success.

Dataset: A Good-Quality Model Begins with a Clean Dataset

To teach computers to do things requires high quality data, and our case is not an exception. The quality of images on large scale image boards like Danbooru and Safebooru varies wildly, and we think this is at least part of the reason for the quality issues in previous works. So, instead, we use “standing pictures” (立ち絵) from games sold on Getchu, a website for learning about and purchasing Japanese games. Standing pictures are diverse since they are rendered in different styles for different genres of game, yet reasonably consistent since they are all part of the domain of game character images.

We also need categorical metadata (a.k.a. tags/attributes) for the images, such as hair color and whether a face is smiling. Getchu does not provide such metadata, so we use Illustration2Vec, a CNN-based tool for predicting anime image tags.
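To make the tagging step concrete, here is a rough sketch of how predicted tag probabilities can be turned into per-image attribute labels. The `estimate_tag_probabilities` callable, the tag lists, and the threshold below are illustrative stand-ins, not the exact Illustration2Vec interface or the thresholds we used:

```python
from PIL import Image

# Illustrative tag lists; the real attribute set is larger.
HAIR_COLORS = ["blonde hair", "brown hair", "black hair", "blue hair", "pink hair"]
OTHER_TAGS = ["smile", "twintails", "open mouth", "hat", "glasses"]

def extract_attributes(image_path, estimate_tag_probabilities, threshold=0.25):
    """Return a dict of attribute labels for one character image.

    `estimate_tag_probabilities` is a hypothetical stand-in for a tag predictor
    such as Illustration2Vec: it maps an image to {tag: probability}.
    """
    image = Image.open(image_path)
    probs = estimate_tag_probabilities(image)  # e.g. {"blonde hair": 0.91, "smile": 0.67, ...}

    attributes = {}
    # Hair color: keep only the single most likely color so labels stay mutually exclusive.
    hair_scores = {tag: probs.get(tag, 0.0) for tag in HAIR_COLORS}
    attributes["hair_color"] = max(hair_scores, key=hair_scores.get)
    # Other tags: independent binary labels, kept only above a confidence threshold.
    for tag in OTHER_TAGS:
        attributes[tag] = probs.get(tag, 0.0) >= threshold
    return attributes
```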

Model: The Essential Part

A good generative model is also a must-have for our goal. The generator should understand and follow the user’s specified attributes, which we call the prior, and it should also have the freedom to generate varied, detailed visual features, which we model with noise. We use a popular framework called GANs (Generative Adversarial Networks) to accomplish this.

GANs use a generator network G to generate images from the prior and noise inputs, and a discriminator network D which tries to distinguish G’s generated images from real images. We train both networks together, and in the end G should be able to generate images so realistic that D cannot differentiate them from real images with that prior. However, training GANs properly is infamously hard and time-consuming. Luckily, a recent advance named DRAGAN can give plausible results with comparatively little computational power compared to other GANs. We successfully train a DRAGAN whose generator is SRResNet-like.
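For the curious, the key ingredient DRAGAN adds on top of a standard GAN is a gradient penalty computed on points perturbed around the real data, which keeps the discriminator’s gradients well behaved and stabilizes training. Below is a minimal PyTorch-style sketch of that penalty; our implementation is in Chainer, and the coefficient and perturbation scale here are just illustrative defaults:

```python
import torch

def dragan_gradient_penalty(discriminator, x_real, lambda_gp=10.0, perturb=0.5):
    """DRAGAN-style penalty: keep the discriminator's gradient norm near 1
    on points sampled in a local region around the real data."""
    # Perturb real images with noise scaled by the batch standard deviation.
    noisy = x_real + perturb * x_real.std() * torch.rand_like(x_real)
    alpha = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (alpha * x_real + (1.0 - alpha) * noisy).detach().requires_grad_(True)

    d_out = discriminator(x_hat)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=x_hat,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True)[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```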

We also need our generator to know the label information, so that the user’s specifications can be incorporated. Inspired by ACGAN, we feed the labels to the generator G along with the noise, and add a multi-label classifier on top of the discriminator which is asked to predict the assigned tags for the images.
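Schematically, the conditioning looks like the sketch below: the generator consumes noise concatenated with the attribute vector, and the discriminator outputs both a real/fake score and multi-label tag predictions. The layer sizes are placeholders chosen only to keep the example short; the real generator is SRResNet-like and the real discriminator is much deeper:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy stand-in for the generator: noise and labels are concatenated."""
    def __init__(self, noise_dim=128, label_dim=34):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + label_dim, 64 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.Upsample(scale_factor=4),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, labels):
        return self.net(torch.cat([noise, labels], dim=1))


class TaggingDiscriminator(nn.Module):
    """Toy stand-in for the discriminator with an ACGAN-style extra head."""
    def __init__(self, label_dim=34):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.adv_head = nn.Linear(64, 1)          # real vs. generated score
        self.tag_head = nn.Linear(64, label_dim)  # multi-label tag prediction

    def forward(self, images):
        h = self.features(images)
        return self.adv_head(h), self.tag_head(h)
```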

With this dataset and model in place, training is straightforward on GPU-powered machines.

Samples: A Picture is Worth a Thousand Words

For a sense of our model’s quality, take a look at generated images like the following: it handles different attributes and visual features well.

One interesting setting is fixing the random noise part and sampling random priors. The model is now required to generate images with similar major visual features and different attribute combinations, and it does this successfully:

Also, by fixing priors and sampling randomly for the noise, the model can generate images which have the same attributes with different visual features:
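Both experiments simply hold one of the generator’s two inputs fixed while resampling the other. A minimal sketch, assuming a conditional generator like the one sketched earlier (dimensions are illustrative):

```python
import torch

def sample_with_fixed_noise(generator, label_dim=34, noise_dim=128, n=8):
    """Same noise vector, different random attribute combinations:
    similar overall look, different tags."""
    z = torch.randn(1, noise_dim).repeat(n, 1)
    c = torch.randint(0, 2, (n, label_dim)).float()
    with torch.no_grad():
        return generator(z, c)

def sample_with_fixed_prior(generator, labels, noise_dim=128, n=8):
    """Same attribute vector, different noise: same tags, different visual details."""
    z = torch.randn(n, noise_dim)
    c = labels.unsqueeze(0).repeat(n, 1)
    with torch.no_grad():
        return generator(z, c)
```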

Web Interface: Bringing the Neural Generator to your Browser

In order to make our model more accessible, we built a website interface with React.js for open access. The generation runs entirely client-side: we use WebDNN to convert the trained Chainer model into a WebAssembly-based JavaScript model. For a better user experience, we want to keep the generator model small (since users need to download it before generating anything), and our choice of an SRResNet-like generator makes the model several times smaller than the popular DCGAN generator without compromising the quality of the results. Speed-wise, even though all computations are done on the client side, generating an image still takes only a few seconds on average.
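The conversion itself follows WebDNN’s Chainer frontend: trace the generator once with a dummy input, convert the traced graph, and emit a WebAssembly descriptor that the browser downloads. The snippet below is a rough sketch based on WebDNN’s documented interface; the stand-in generator, input dimensionality, and output path are illustrative:

```python
import chainer
import chainer.links as L
import numpy as np
from webdnn.frontend.chainer import ChainerConverter
from webdnn.backend import generate_descriptor

# Stand-in for the trained generator; in practice this would be the trained
# SRResNet-like model loaded from a snapshot. 162 = 128 noise + 34 tag dims is illustrative.
generator = L.Linear(162, 3 * 128 * 128)

# Trace the computation graph once with a dummy input.
x = chainer.Variable(np.zeros((1, 162), dtype=np.float32))
with chainer.using_config("train", False):
    y = generator(x)

# Convert the traced graph and emit a WebAssembly descriptor for the browser.
graph = ChainerConverter().convert([x], [y])
generate_descriptor("webassembly", graph).save("./output")
```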

For more technical details, check out our paper on arXiv, which was initially released as a technical report.