The “Open-Source Version of GPT-4o” is Here! This 17B Domestic Model Matches GPT-4o in Image Generation and is Commercially Usable
Not long ago, GPT-4o gained massive popularity due to its dramatically improved image generation and editing capabilities, making everyone eager to give it a try. Although OpenAI later announced that free users could also access it, slow image generation and limited usage still troubled those who hadn’t subscribed to ChatGPT.
So, aside from GPT-4o, do we have other options? A look at the text-to-image leaderboard in the Artificial Analysis Arena offers some answers.
In this arena, we found a model that recently ranked second: HiDream-I1, a 17-billion-parameter model whose score is very close to GPT-4o's.

The AI benchmarking and analysis platform Artificial Analysis announced on Twitter that HiDream-I1 has become the new state-of-the-art (SOTA) open-source text-to-image model. The platform uses an arena-style evaluation: two images generated by different models for the same prompt are shown side by side, and human voters pick the one that better matches the prompt.
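Individual pairwise votes only become a leaderboard once they are aggregated by some rating system. The sketch below assumes an Elo-style update, a common choice for pairwise arenas; the K-factor of 32, the starting rating of 1000, and the generic model names are illustrative assumptions, not Artificial Analysis's published settings.

```python
# Minimal Elo-style rating built from pairwise "which image matches the prompt better?" votes.
# ASSUMPTION: the arena aggregates votes with an Elo-like update; K=32 and a
# starting rating of 1000 are illustrative values, not the platform's real ones.


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A wins against model B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one vote in place: the winner gains exactly what the loser gives up."""
    e_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - e_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def rate(votes: list[tuple[str, str]], start: float = 1000.0) -> dict[str, float]:
    """votes: (winner, loser) pairs in the order they were cast."""
    ratings: dict[str, float] = {}
    for winner, loser in votes:
        ratings.setdefault(winner, start)
        ratings.setdefault(loser, start)
        update(ratings, winner, loser)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes for illustration only; not real arena data.
    votes = [("model_a", "model_b"), ("model_a", "model_c"),
             ("model_b", "model_c"), ("model_c", "model_a")]
    for name, score in sorted(rate(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.1f}")
```

Because each update is zero-sum, the ratings always average to the starting value; only their spread reflects how often one model beats another.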
It’s worth noting that within 24 hours of its release, this model climbed to the top of the Artificial Analysis Arena rankings, becoming the first Chinese self-developed generative AI model to achieve this feat.
Several comparison images show that HiDream-I1's generation quality rivals GPT-4o and even surpasses FLUX1.1 [pro], which was previously hailed as “dethroning Midjourney.” More importantly, of these three models, only HiDream-I1 is open-source and commercially usable under the MIT license.


HiDream-I1 Model: https://huggingface.co/HiDream-ai/HiDream-I1-Full
HiDream-I1 Code: https://github.com/HiDream-ai/HiDream-I1
Additionally, ZhiXiang Future, the domestic company behind this model, has just announced an upcoming open-source model, HiDream-E1, which supports interactive image editing. Like GPT-4o, it can transform a provided image into different styles or content. Together, the two models reproduce GPT-4o's ability to turn plain-language instructions directly into generated and edited images, filling the gap for an “open-source version of GPT-4o.”

HiDream-E1 image editing examples (the model will be open-sourced soon).
Now, let’s dive deeper into what makes HiDream-I1 stand out through more examples.
How Does HiDream-I1 Perform in Image Generation?
GPT-4o and FLUX gained traction because they generate realistic, detailed images while accurately following instructions. To test whether HiDream-I1 deserves the title of “open-source GPT-4o,” we reused some of the prompts OpenAI used when showcasing GPT-4o's new capabilities.
Here are the results:
Prompt 1:
“A photorealistic image of a horse galloping from right to left across a vast, calm sea. Accurately depict splashing water, reflections, and subtle ripple patterns beneath its hooves. Exaggerate the horse’s motion, but everything else should remain still and quiet to contrast with the horse’s power. Clean composition, cinematic. Wide panoramic shot showing the distant horizon. Atmospheric perspective creates depth. The horse appears tiny compared to the vast ocean.”

Prompt 2:
“A photo of a fruit platter blending real fruits with miniature planets (Jupiter, Saturn, Mars, Earth). Maintain realistic reflections, lighting, shadows, and texture consistency. Clean composition with crisp details.”

Prompt 3:
“A realistic underwater scene where a dolphin swims into a derelict subway car through a window. Precisely simulate bubbles and water flow details.”

Prompt 4:
“A paparazzi-style candid photo of Einstein rushing through the parking lot of an American mall. He glances back with a surprised expression, trying to avoid being photographed. He carries shiny shopping bags filled with luxury items. His coat flutters in the wind, and one bag swings as if he’s striding forward. The blurred background features cars and a glowing mall entrance, emphasizing motion. Flash overexposure adds a chaotic tabloid feel.”

Overall, HiDream-I1’s generated images are very close to GPT-4o in terms of realism and detail, sometimes even surpassing it. When compared to FLUX, this advantage becomes more apparent.
For example, in the image below, HiDream-I1 generates more intricate elements, including textures, background details, and layers between objects (e.g., individual strands of cat fur illuminated by light, creating a vivid sense of life; the stainless steel coffee pot reflects light perfectly, enhancing realism). While FLUX also produces detailed images, it falls short in material texture compared to HiDream-I1.
Prompt:
“A cute orange cat sitting next to a coffee grinder, slowly turning the handle with its paw. Capture the cat’s focused expression and gentle purring in a cozy, tranquil kitchen. Soft, warm light streams through the window, casting gentle shadows on the cat and grinder, enhancing the serene atmosphere. Rendered in a photorealistic style, emphasizing calmness and intimacy.”

In terms of color rendering, HiDream-I1 performs better, producing layered and varied tones. For instance, look at the wolf's facial fur in the image below: HiDream-I1 and GPT-4o show richer color gradations, while FLUX's colors, though bright, often look flat and lack saturation and depth in certain scenes.
Prompt:
“A 3D wolf dressed in a musician’s tailcoat, standing upright like a human, holding a guitar surrounded by amplifiers and a stage, exuding artistry and elegance.”
Moreover, this realism stems from the model's grasp of physical laws. As seen below, HiDream-I1 handles physical rules precisely, whether in object placement, character poses, or environmental lighting. FLUX, on the other hand, struggles with dynamic scenes and complex physical interactions, often producing unrealistic results.
Prompt:
“A 3D cat dressed in a musician’s tailcoat, standing upright like a human, playing a violin surrounded by swirling musical notes and a grand piano. Spotlight illuminates the scene, creating a dramatic and refined environment.”

Even with complex prompts, HiDream-I1 retains these qualities, showcasing its advanced text comprehension and instruction-following abilities.
Prompt:
“Stone walls of a medieval castle. An armored warrior faces the camera, flames dancing behind him and outlining his rugged face. Sparks fly onto rusted chainmail as his right hand instinctively grips the sword hilt. A dark brown cloak billows violently in the heat. Burning arrows fall continuously from a distant tower, contrasting orange-red firelight against an indigo night sky, illuminating moss-covered battlements and scars on the warrior’s brow.”

These visual strengths are corroborated by benchmark data:
- HPSv2.1: A preference prediction model trained on human preference data. HiDream-I1 excels across multiple styles (anime, concept art, painting, photorealism), indicating its alignment with human aesthetics.
- GenEval: Evaluates image-text alignment by running object detection and color classification on the generated images. HiDream-I1 achieves top scores, demonstrating strong instruction-following capabilities.
- DPG-Bench: Focuses on detecting multiple objects, detailed attributes, and complex relationships in images. HiDream-I1 again ranks highest, especially for long, intricate prompts.
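To make the GenEval item above more concrete: as we understand the benchmark, each generated image receives a pass/fail verdict from its detector-based checks, each task's score is the fraction of images that pass, and the overall score is the unweighted mean over tasks. The sketch below shows only that aggregation step; the function name and the pass/fail flags are hypothetical and are not HiDream-I1's real results.

```python
from statistics import mean


def geneval_style_score(results: dict[str, list[bool]]) -> tuple[dict[str, float], float]:
    """Aggregate per-image pass/fail checks into per-task and overall scores.

    results maps a task name (e.g. "counting", "colors") to one boolean per
    generated image: True if the object detector / color classifier judged
    the image to satisfy its prompt.
    """
    per_task = {task: sum(passes) / len(passes) for task, passes in results.items()}
    overall = mean(per_task.values())  # unweighted mean across tasks
    return per_task, overall


if __name__ == "__main__":
    # Hypothetical pass/fail flags for illustration only.
    demo = {
        "single_object": [True, True, True, True],
        "two_object": [True, True, False, True],
        "counting": [True, False, True, False],
        "colors": [True, True, True, False],
        "position": [False, True, False, False],
        "color_attr": [True, False, True, False],
    }
    per_task, overall = geneval_style_score(demo)
    for task, score in per_task.items():
        print(f"{task}: {score:.2f}")
    print(f"overall: {overall:.3f}")
```

Averaging per task rather than per image keeps a task with many prompts from dominating the overall number.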
What Technical Improvements Did ZhiXiang Future Make to Enhance Image Generation?
The model's robust instruction-following ability and realistic, detailed outputs stem from several technical innovations.
To improve text comprehension, HiDream-I1 adopts a new architecture called Sparse Diffusion Transformer (Sparse DiT), integrating Sparse Mixture-of-Experts (MoE) technology. Different expert models handle specific types of text inputs, specializing in their respective domains.
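The article does not describe HiDream-I1's router design, expert count, or how many experts are active at once. As a generic illustration of sparse MoE, the sketch below routes each token to its top-k experts and mixes their outputs with softmax weights computed over the chosen experts only. NumPy, the random weights, k=2, and the names `moe_layer` and `expert_forward` are all assumptions for illustration, not HiDream-I1's code.

```python
import numpy as np


def expert_forward(h: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """One expert: a small two-layer MLP with ReLU (illustrative choice)."""
    return np.maximum(h @ w1, 0.0) @ w2


def moe_layer(x: np.ndarray, gate_w: np.ndarray, experts: list, k: int = 2):
    """Generic sparse top-k Mixture-of-Experts layer.

    x:       (tokens, d) input features
    gate_w:  (d, num_experts) router weights
    experts: list of (w1, w2) weight pairs, one per expert
    Returns the mixed output (tokens, d) and chosen expert indices (tokens, k).
    """
    logits = x @ gate_w                        # router score for every expert
    topk = np.argsort(logits, axis=1)[:, -k:]  # k highest-scoring experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = topk[t]
        scores = logits[t, chosen]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over the chosen experts only
        for weight, e in zip(weights, chosen):
            w1, w2 = experts[e]
            out[t] += weight * expert_forward(x[t], w1, w2)  # unchosen experts never run
    return out, topk


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, hidden, num_experts, tokens = 8, 16, 4, 5
    experts = [(rng.standard_normal((d, hidden)), rng.standard_normal((hidden, d)))
               for _ in range(num_experts)]
    gate_w = rng.standard_normal((d, num_experts))
    x = rng.standard_normal((tokens, d))
    y, chosen = moe_layer(x, gate_w, experts, k=2)
    print(y.shape, chosen.tolist())
```

Each token only pays for k expert MLPs, while the layer as a whole holds all of them, which is the property the next paragraph relies on.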
This architecture also offers an additional benefit: better performance at a controlled computational cost, making HiDream-I1 cost-effective for individual developers and startups concerned about resource consumption.
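A back-of-the-envelope calculation shows why sparse routing keeps cost under control: the model stores every expert, but each token only runs through k of them, so per-token feed-forward compute scales with k rather than with the total expert count (roughly 2 FLOPs per active weight for a matrix multiply). The sizes below are hypothetical, not HiDream-I1's real configuration, and the router and biases are ignored.

```python
def moe_ffn_params(d_model: int, d_hidden: int, num_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active-per-token) parameter counts for an MoE feed-forward block.

    Each expert is assumed to be a two-matrix MLP (d_model -> d_hidden -> d_model);
    biases and the small router are ignored.
    """
    per_expert = 2 * d_model * d_hidden
    total = num_experts * per_expert  # parameters that must be stored
    active = top_k * per_expert       # parameters a single token actually uses
    return total, active


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    total, active = moe_ffn_params(d_model=1024, d_hidden=4096, num_experts=4, top_k=2)
    print(total, active, active / total)  # 33554432 16777216 0.5
```

Under these assumptions, doubling the number of experts doubles capacity (stored parameters) while per-token compute stays the same.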
Image quality improvements come from incorporating adversarial learning into diffusion model distillation. By leveraging GANs' ability to capture fine details and sharpen edges, HiDream-I1 achieves greater realism and clarity while balancing generation speed and quality.
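The article does not give HiDream-I1's training objective. As a hedged illustration of what "adversarial learning in diffusion distillation" can look like, the sketch below combines a distillation term (the fast student's output should match the slow teacher's) with a non-saturating GAN term that rewards the student for fooling a discriminator. The squared-error distillation loss, the non-saturating GAN form, the 0.1 weight, and the function names are our assumptions; depending on the method, the "real" samples shown to the discriminator may be teacher outputs or training images.

```python
import numpy as np


def softplus(z: np.ndarray) -> np.ndarray:
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)


def student_loss(student_out: np.ndarray, teacher_out: np.ndarray,
                 d_logits_fake: np.ndarray, adv_weight: float = 0.1) -> float:
    """Distillation term plus weighted adversarial term (ASSUMED form)."""
    distill = np.mean((student_out - teacher_out) ** 2)  # match the teacher's output
    adv = np.mean(softplus(-d_logits_fake))              # non-saturating: push D to say "real"
    return float(distill + adv_weight * adv)


def discriminator_loss(d_logits_real: np.ndarray, d_logits_fake: np.ndarray) -> float:
    """Standard logistic GAN loss: score real samples high, student samples low."""
    return float(np.mean(softplus(-d_logits_real)) + np.mean(softplus(d_logits_fake)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((2, 4))
    student = teacher + 0.1 * rng.standard_normal((2, 4))  # hypothetical near-match
    fake_logits = rng.standard_normal(2)
    real_logits = rng.standard_normal(2)
    print(student_loss(student, teacher, fake_logits))
    print(discriminator_loss(real_logits, fake_logits))
```

The distillation term alone tends to produce averaged, slightly blurry outputs; the adversarial term penalizes samples a discriminator can tell apart from sharp ones, which is the intuition behind the detail and edge quality the paragraph describes.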
Notably, HiDream-I1’s scalability allowed ZhiXiang Future to quickly extend it into HiDream-E1, an interactive image editing model, providing an open-source alternative to GPT-4o for image editing tasks.
HiDream Series Models Gain Initial Open-Source Influence
From both practical tests and benchmark results, ZhiXiang Future’s HiDream-I1 closely matches GPT-4o, solidifying its position among the top-tier domestic image generation models.
As an open-source model, its international influence is growing. Shortly after its release, Recraft AI, another model company on the arena leaderboard, announced that it had integrated HiDream-I1 and published tutorials for users.
On the Hugging Face Trending list, HiDream-I1 climbed to second place, reflecting its high download and like counts and its popularity in the community.
For those without local deployment needs, HiDream-I1 can also be experienced on ZhiXiang Future’s official platform, Vivago, which offers a complete workflow, including video creation based on generated images.
In the near future, ZhiXiang Future plans to release a multimodal agent product, enabling users to generate and edit images/videos through conversational dialogue, progressively creating story-driven content.
According to ZhiXiang Future’s CTO, Yao Ting, realism, instruction-following, and narrative capabilities are fundamental to user satisfaction. By excelling in these areas and open-sourcing the model, they’ve removed barriers for developers and companies in this field.
Stay tuned for the next open-source model, HiDream-E1, along with its benchmark data, expected to deliver an exceptional editing experience.
Reproduction without permission is prohibited: AI LAB