Last week, Midjourney launched public alpha testing of its V5 algorithm. In this study, we take an in-depth look at its innovative features and challenges and compare it with its predecessor to determine whether V4 is still relevant.
When V4 came out, many of my peers were disturbed by what they defined as "the disappearance of magic" that was present in V3. I can assure all of them now: the magic is back. There is poetry, there is the spark, and these new images are not just artful but alive!
"V5 isn't the final step, but we hope you all feel the progression of something deep and unfathomable in the power of our collective human imagination"
— from the Midjourney team's official announcement
V5 is "much more 'unopinionated' than v3 and v4."
It has a much wider stylistic range.
The model can generate significantly more realistic imagery.
It is more responsive to prompting. And the prompting strategies have changed, too!
V5 renders more detailed images, and details are more likely to be correct (yes, the hands improved greatly, and the model is promised to generate far less unwanted text).
Image prompting performance improved. And the new model supports --iw for weighting image prompts versus text prompts.
Remixes are much better.
In this guide, we will put each of these statements to the test.
Based on multiple tests, here is my interpretation of the "unopinionated" statement: the current model is VERY straightforward. To best illustrate this, I devised a somewhat complex prompt that makes little sense and leaves room for interpretation. And a surprising thing happened...
The "unopinionatedness" becomes especially self-evident when you compare Artistic Techniques in V5 vs. V4. Where V4 rendered the actual results of a technique application, V5 tends to return the images of the process of said application: Haute couture fashion is no longer a dress but a fashion show, Pinhole photography must include a pinhole camera, and Encaustic paint is not a beautiful abstract artwork, but buckets of… paint.
Finally, Midjourney V5 generates WAY more photorealistic images. Whereas before, many styles were depicted as paintings or illustrations, now they are photographs. And that, together with a straightforward, literal approach to prompts, often makes an average V5 generation uninteresting, characterless, and less varied—without additional moves.
Good news: we have those moves! And the first and most powerful one is specifying a style: style references became much more impactful in V5. And many styles themselves became much more precise, detailed, and nuanced.
"Much wider stylistic range" might mean two things: MJ now knows more styles, and existing styles have become more nuanced and varied. We will get to nuances down the road, and now, let's look at how the existing styles have changed.
We are re-checking our backlog of artistic styles that V4 rejected, and will regularly add the styles known to V5 to the Midlibrary catalog. When the list of new styles is long enough, expect an update to this study!
For the first time since Midjourney arrived, I genuinely confused an AI-generated image posted on Instagram with an actual photograph. For a few instants, I couldn't believe my eyes seeing the #MidjourneyV5 hashtag below the picture.
It. Truly. Is. Insane.
Needless to mention, photographers' styles improved dramatically, and some non-photographic techniques acquired a much more photorealistic look.
However, there is a "downside," too. By default, Midjourney V5ɑ tends to render more photorealistic images. Thus, frequently, you must "nudge" it to get less literal and more artistic results.
Well, maybe the Italian mother is a bad example: just look at that baby! <3
And let's remember that photorealism is not just about clean, modern photographs. More "vintage" photographic techniques and classical photographers' styles became more realistic, too!
It is a 'pro' mode of the model tuned to provide a wide diversity of outputs and to be very responsive to your inputs. The tradeoff here is that it may be harder to use. Short prompts may not work as well. You should try to write longer, more explicit text about what you want.
— from the Midjourney team's official announcement
For this experiment, I chose a simple prompt and then gradually complicated it using the same --seed value—to keep the results more consistent.
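The procedure above can be sketched in a few lines of Python. This is only an illustration of the idea, not the exact prompts used in the study: the prompt fragments and the seed value below are hypothetical, and the helper names are made up for this example. The only Midjourney parameters assumed are --seed (mentioned above) and --v for selecting the model version.

```python
from __future__ import annotations


def build_prompt_series(base: str, additions: list[str]) -> list[str]:
    """Return prompts that grow step by step: each one extends the previous."""
    prompts = [base]
    for extra in additions:
        prompts.append(f"{prompts[-1]}, {extra}")
    return prompts


def with_parameters(prompt: str, seed: int, version: int) -> str:
    """Append the same --seed and an explicit --v to a prompt."""
    return f"{prompt} --seed {seed} --v {version}"


SEED = 1234  # hypothetical value; any fixed integer keeps runs comparable

# Hypothetical prompt fragments, from simple to more complex.
series = build_prompt_series(
    "a lighthouse",
    ["at dusk", "stormy sea, dramatic clouds"],
)

# One command per prompt and per model version, all sharing the same seed.
commands = [with_parameters(p, SEED, v) for p in series for v in (4, 5)]

for command in commands:
    print(command)
```

Keeping the seed fixed while changing only the prompt text (or only the version) makes it easier to attribute differences in the output to that one change.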
Apart from improved photorealism... I wouldn't say that the difference is that striking. Okay, let's try with one of the "megaprompts" from our Image-to-Text-to-Image study.
Once again, in these examples the difference doesn't seem to be that big. If anything, V5 is kind of losing this round to V4.
Finally, let's see how two or more styles combine in complex prompts in V5 vs. V4.
Can you spot the Big Difference?
TL;DR: it's true. V5 renders far more subtlety in details, making them more intricate, more refined, and indeed more correct!
The styles that were already detailed in V4 perfectly show the new model's progress.
During the office hours announcement, the Midjourney team mentioned that V5 got much better at rendering groups of people. And there is plenty of evidence of that.
Another statement the team made: V5 generates much less unwanted text and text in general—up to a point where even "infographics may suffer." Is that so?
Seems like infographics lovers can sleep tight. :) And the following samples got me confused—it seems like, for now, V5 returns even more text than V4…
Finally—yes—hands got MUCH better!
And V5 generates more hands by default.
Image Prompts are one of my favorite parts of testing Midjourney's new models. Why? Because I love how MJ sees and reinterprets the portrait of Francis D.! Here is what V5 is capable of, given such powerful source material.
What you immediately notice is the variability of output. Before, in most cases, Midjourney would inherit the original's close-up framing. In V5, the angles, the framing, and the situation constantly change, offering more options for a single-image input.
But Francis' portrait is very characteristic, with dramatic lighting emphasizing facial features. What about a more bland appearance and flatter light? Here's my self-portrait from many years ago.
Obviously, V5 is more intricate, detailed, and varied than everything we've seen before. However, I wouldn't say that the face recognition algorithm improved much. The Image Prompting rules for portraits remained the same: characteristic faces with stand-out features and emphasizing lighting work better. ;)
As we've seen previously, Midjourney V5 is much better at group portraits. How about groups and Image Prompts?
I'd say that group portraits are still challenging for both V4 and V5.
And what about non-portrait images? Let's see how V5's Image Prompts work with still life, landscape, and… weird stuff. :)
As you can see, even with specific prompts (e.g., comic strip by Geoff Darrow), Midjourney sometimes struggles to turn a photorealistic image into an illustration or painting. In Darrow's case, V4 seems to have done a slightly better job with the backdrop.
V5's Remix mode seems to work in roughly the same manner as in V4. You can change details, transition reference style, and even affect the context—to a certain extent. Because the more you remix, the more glitchy and distorted the outcome becomes.
And highly detailed pictures get messed up after the first remix. And it is almost impossible to change significant details: e.g., switch day to night or invert colors—without "breaking" the image.
Undoubtedly, V5 is as revolutionary as V4 was when it first appeared—if not more so. Midjourney's new model is groundbreakingly capable and has magic that was somehow lost after V3!
Of course, Midjourney V5 is not perfect. But we are looking at the early Alpha release! It will surely get better, more understanding, and even more powerful. Until then… let's say there are some challenges.
The main one being "unopinionatedness," responsible for more literal and less "artistic" outcomes, more photorealistic images, and dull defaults.
But V5 truly shines when you apply style modifiers to your prompts! And the styles themselves became much more precise and varied. I was genuinely pleased to see how my favorite styles evolved in V5.
So do we still need V4? V5 is amazing, but it still has a long way to go. For now, V4 might be more "artistic" without additional effort and can still deliver outstanding results (in some cases, more interesting and varied than V5). So don't discard it just yet!
— Andrei Kovalev
All samples are produced by the Midlibrary team using Midjourney AI. Naturally, they are not representative of real artists' works or real-world prototypes.