Still Writing Prompts in One Sentence? You Can Also Use AI to Generate What You Want
Currently, Midjourney is Boka's most frequently used AI image generation tool, producing French-style illustrations, 3D-style IP characters, and match-3 game art assets under her guidance.
Your AI and my AI seem different. AI can allow amateurs to create an image with just one sentence, but generating specific, stylized works remains challenging for many people.
There are roughly two reasons: AI is still hard to control, and we may not understand it well enough to grasp how it actually works.
When I asked Boka, "Are there any tricks to writing prompts?", Boka calmly replied, "Actually, knowing English is enough."
While this is certainly a major prerequisite, there are also some specific, practical methods for making AI do exactly what you want.
First, you need to know what you want AI to generate, right?
Take the Western match-3 games that Boka is familiar with (puzzle games in which you clear tiles by matching at least three identical elements). Such a game needs background images, tile patterns to match, and icons for the various rewards. If it also includes decoration gameplay that unlocks different scenes, it may need furniture such as sofas too.
In this case, you can start by simply naming the items you need in the prompt: a treasure chest is "treasure chest", and a key is "key".
Next, how do you determine the art style?
One way is to learn from tutorials and other people's prompts, building up a collection of specific prompt phrases.
To use AI to generate match-3 game interfaces, Boka watched many YouTube videos on using AI for UI icon design.
From these, she learned a key prompt phrase: "multiple item spritesheet". It generates a single sheet containing several related icons, which makes it easier to keep icon style and angle consistent and reduces drift between images.
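As a rough illustration of the idea above, the short Python sketch below only assembles the prompt text; it does not call Midjourney. The function name build_spritesheet_prompt and the style phrase "match-3 game icons" are assumptions made for this example, not part of Boka's workflow.

```python
def build_spritesheet_prompt(items, style=""):
    """Join item names into one prompt that asks for a single sheet of related icons."""
    cleaned = [item.strip() for item in items if item.strip()]
    if not cleaned:
        raise ValueError("at least one item is required")
    # Items first, then the key phrase from the article, then an optional style hint.
    parts = [", ".join(cleaned), "multiple item spritesheet"]
    if style.strip():
        parts.append(style.strip())
    return ", ".join(parts)


# "match-3 game icons" is an illustrative style hint, not a required keyword.
prompt = build_spritesheet_prompt(["treasure chest", "key"], style="match-3 game icons")
print(prompt)
```

The resulting text would then be pasted into Midjourney by hand. Keeping all items in one prompt is what asks for them on a single sheet.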
Another way is to let AI tell us what prompts to use.
Taking the match-3 game as an example again, how do you write a prompt if you need a top-down view effect for the background image?
Boka's approach is not to rush into writing, but to find an image that meets our requirements, upload it to Midjourney, and let its Describe feature provide prompts for this image.
At the same time, we don't need to accept everything the AI suggests. We only need the parts related to perspective, such as "a top-down view of an interior room", which we can then incorporate into our own prompts.
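The selection step described above can be sketched in a few lines of Python. This is only an illustration: the described text is a made-up stand-in for what Describe might return, and the keyword list and the function name pick_perspective_phrases are assumptions for this example.

```python
# Keywords that suggest a phrase is about camera angle or perspective (illustrative list).
PERSPECTIVE_KEYWORDS = ("top-down", "view", "angle", "perspective", "isometric")


def pick_perspective_phrases(described_text, keywords=PERSPECTIVE_KEYWORDS):
    """Keep only the comma-separated phrases that mention a perspective keyword."""
    phrases = [p.strip() for p in described_text.split(",") if p.strip()]
    return [p for p in phrases if any(k in p.lower() for k in keywords)]


# Hypothetical Describe-style text; real output will differ.
described = "a top-down view of an interior room, pastel colors, cozy furniture, soft lighting"
own_prompt = "match-3 game background"

kept = pick_perspective_phrases(described)
final_prompt = ", ".join([own_prompt] + kept)
print(final_prompt)
```

In practice a person would make this choice by reading the suggestions. The sketch only makes explicit that the perspective phrase is kept and the rest is discarded.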
But just writing good prompts is still not enough. Many AI-generated images have a similar style - bright, glossy, lacking personality, easily forgettable.
Boka explains that this follows from the nature of AI models: their output converges on a mainstream, averaged style. When a scene is described with language alone, AI often generates an ordinary, boring image that conforms to Western aesthetics.
At the same time, language is not precise, making it difficult to directly generate the style we want. When it comes to "Chinese style", a thousand people might have a thousand answers in mind, and AI doesn't understand which one we want.
The simplest solution is to use "image prompts", which give AI a clear visual reference. In Midjourney, uploading a relevant image and passing it to the style reference feature "--sref" can anchor the art style.
It can be said that when the text prompts remain unchanged, image prompts directly determine the quality and style of the generated image. The more stylized the image prompt, the less stereotypical the generated image will be.
In Boka's experience, image prompts don't need to be complex; the simpler the image, the more direct the effect. Using a Western cartoon-style chest on a blank background as an image prompt can turn ordinary icons into icons that fit the style of Western match-3 games.
All of this image generation experience comes from the tutorials Boka studied and her own practice.
Boka believes that with more attempts, whether at prompts or workflows, we can quickly develop our own set of AI methods and become proficient, because she feels that "AI actually has a pretty low barrier to entry."
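To make the style anchoring concrete, here is a minimal Python sketch that appends the same "--sref" reference to several icon prompts. It only builds prompt text. The URL is a placeholder, and the function name with_style_reference and the "game icon" wording are assumptions for illustration.

```python
def with_style_reference(text_prompt, style_image_url):
    """Append a Midjourney style reference (--sref) to a text prompt."""
    return f"{text_prompt.strip()} --sref {style_image_url.strip()}"


# Placeholder URL standing in for an uploaded cartoon-style chest on a blank background.
STYLE_URL = "https://example.com/cartoon-chest.png"

# The text changes per icon, while the style reference stays the same.
icon_prompts = [
    with_style_reference(f"{subject}, game icon", STYLE_URL)
    for subject in ["treasure chest", "key"]
]
for line in icon_prompts:
    print(line)
```

Reusing one reference image across every icon prompt is what keeps the set visually consistent.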
No Rush to Embrace AI, But Once You Start, Use Every Feature Well
All in all, Boka has been using generative AI for only eight or nine months, and she relies mainly on three tools: ChatGPT, Midjourney, and KREA (a tool for restoring and upscaling images to high definition), in keeping with her "keep it simple" attitude.
Midjourney was launched in July 2022, but when Boka first saw images generated from a single sentence, they did not make a strong impression on her because the quality wasn't great.
At the beginning of this year, AI image generation tools received several major updates. The technology became more mature and gained more fine-grained controls, and Boka gradually began to see the possibility of commercial use. Only then did she feel it was time to take it seriously.
Regarding new technologies, Boka's attitude is that learning them is definitely necessary, but it's fine to wait a bit for more mature products; otherwise, a lot of effort may be wasted. Once you really start using a tool, you should make the most of it.
Midjourney's features are often talked about, but I think using these features well is also a process that requires a lot of practice.
AI is still not fully controllable, and rerunning generations is a daily occurrence, but there are always ways to keep AI from being too free-spirited.
Boka has done a lot of IP design using Midjourney, and she often uses two of its features: the style reference feature "--sref" to anchor the art style, and the character reference feature "--cref" to anchor the character's appearance.
This way, after multiple generations, the similarity between images can still be maintained at 80 to 90%.
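A small Python sketch of how the two anchors might be reused across several generations follows. It only formats prompt text. The class name ReferenceAnchors, the URLs, and the scene descriptions are all hypothetical, chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceAnchors:
    """Holds the reference images that should stay fixed across generations."""
    style_url: Optional[str] = None      # passed to --sref
    character_url: Optional[str] = None  # passed to --cref

    def apply(self, text_prompt: str) -> str:
        """Return the text prompt with whichever anchors are set appended."""
        parts = [text_prompt.strip()]
        if self.style_url:
            parts.append(f"--sref {self.style_url}")
        if self.character_url:
            parts.append(f"--cref {self.character_url}")
        return " ".join(parts)


# Placeholder URLs; in practice these would point at uploaded reference images.
anchors = ReferenceAnchors(
    style_url="https://example.com/style.png",
    character_url="https://example.com/mascot.png",
)
for scene in ["mascot waving, full body", "mascot holding a key, full body"]:
    print(anchors.apply(scene))
```

Only the scene text changes between runs, so any variation comes from the description rather than from the style or the character.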
Sometimes, the images generated by AI are not complete enough. For example, we hope to get a full-body IP image, but the AI-generated result doesn't include feet.
Boka suggests either trying a few more times or using Midjourney's image expansion feature "Zoom Out" to extend the canvas, allowing AI to generate the previously missing parts.
Additionally, AI often generates stray, unnecessary elements. Boka removes them with "Vary (Region)", Midjourney's feature for editing specific areas of an image.