Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Lumine Team

We introduce Lumine, a generalist agent trained within Genshin Impact that can perceive, reason, and act in real time, completing hours-long missions across 3D open-world environments.

Lumine can also complete hours-long missions in unseen scenarios, and even in entirely new games!

An Efficient and Scalable Recipe for Building General-Purpose Agents

Built upon Qwen2-VL-7B, Lumine integrates perception, reasoning, and fine-grained control in a human-like manner. It processes raw pixels to generate precise keyboard–mouse actions at 5 Hz and dynamically invokes explicit reasoning only when necessary, enabling a balance between deliberative planning and reactive behavior.
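The interplay between reactive control and on-demand reasoning can be sketched as a simple loop. This is only an illustration under stated assumptions: the names `Action`, `StepOutput`, `run_episode`, and `toy_policy` are hypothetical and do not describe Lumine's actual interface, and the rule for when to reason is a stand-in. Only the 5 Hz action rate comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

CONTROL_HZ = 5                    # from the text: actions are generated at 5 Hz
STEP_SECONDS = 1.0 / CONTROL_HZ   # so each control step covers 0.2 s


@dataclass
class Action:
    """Hypothetical keyboard-mouse action for one control step."""
    mouse_dx: int
    mouse_dy: int
    keys: Tuple[str, ...]


@dataclass
class StepOutput:
    thought: Optional[str]  # None when the policy acts without explicit reasoning
    action: Action


Policy = Callable[[str, str, Optional[str]], StepOutput]


def run_episode(policy: Policy, frames: List[str], instruction: str) -> List[StepOutput]:
    """Call the policy once per frame, carrying the latest thought forward."""
    current_thought: Optional[str] = None
    log: List[StepOutput] = []
    for frame in frames:
        out = policy(frame, instruction, current_thought)
        if out.thought is not None:
            # A fresh thought replaces the old one; otherwise the old one is reused.
            current_thought = out.thought
        log.append(out)
    return log


def toy_policy(frame: str, instruction: str, current_thought: Optional[str]) -> StepOutput:
    """Stand-in policy (assumption): reason only when the current plan does not cover the scene."""
    needs_reasoning = current_thought is None or frame not in current_thought
    thought = f"Scene is {frame}; plan for '{instruction}'." if needs_reasoning else None
    return StepOutput(thought, Action(mouse_dx=0, mouse_dy=0, keys=("w",)))


frames = ["chest_visible", "chest_visible", "enemy_appears", "enemy_appears", "enemy_appears"]
log = run_episode(toy_policy, frames, "open the chest")
reasoning_steps = sum(out.thought is not None for out in log)
simulated_seconds = len(frames) * STEP_SECONDS
```

In this toy run the frames are plain strings standing in for raw pixels. The policy reasons on only two of the five steps (when the episode starts and when the scene changes) and emits an action on every step.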

Lumine makes full use of the collected raw gameplay data, following a curriculum that builds skills incrementally:

  • 1731 hours of human gameplay for pre-training to master action primitives;
  • 200 hours of instruction-following data to ground control in language;
  • 15 hours of reasoning data to enable adaptive thinking.
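
To make the relative scale of the three stages concrete, the short calculation below expresses each stage's hours as a fraction of the pre-training hours. The stage labels and the helper `relative_to_first` are illustrative names, not part of any released tooling. The hours are taken from the list above, and no assumption is made about whether the stages draw on overlapping footage.

```python
# Hours per stage, as listed above. Stage names are illustrative labels.
CURRICULUM = [
    ("pre-training", 1731),
    ("instruction following", 200),
    ("reasoning", 15),
]


def relative_to_first(stages):
    """Return each stage's hours as a fraction of the first stage's hours."""
    base = stages[0][1]
    return {name: hours / base for name, hours in stages}


ratios = relative_to_first(CURRICULUM)
for name, ratio in ratios.items():
    print(f"{name:>22}: {ratio:7.2%} of pre-training hours")
```

The instruction-following stage uses roughly 11.6% as many hours as pre-training, and the reasoning stage uses under 1%.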

The resulting model can not only autonomously complete hours-long missions but also follow diverse instructions to accomplish a broad spectrum of tasks.

Combat

Benefiting from large-scale pre-training, Lumine has mastered the essential combat skills: dynamically tracking enemies, accurately striking distant targets with a bow, seamlessly switching characters to perform combo attacks, and efficiently locating and opening treasure chests unlocked after combat.

Defeat the enemies ahead and collect the chest

Complete the Domain

Complete the Daily Commission: Defeat all enemies

Boss Fight

Besides regular combat, Lumine also shows a strong understanding of boss mechanics and the ability to respond effectively. It can skillfully evade powerful attacks and employ appropriate strategies to defeat each boss.

Defeat the Electro Hypostasis

Defeat the Electro Hypostasis

Defeat the Electro Hypostasis

Defeat Stormterror

Defeat Stormterror

Defeat the Anemo Hypostasis

Puzzle

Lumine can handle various challenges and puzzles in the game, which typically require a thorough understanding of game mechanics, strong spatial reasoning skills, and precise low-level control.

Fly along the Wind Current to collect the Anemoculus

After defeating the floating Anemo Slime, open the chest

Collect the three Wind Anemograna to activate a Wind Current, then enter the Wind Barrier to open the chest

Open the chest wrapped in thorns ahead

Activate the Elemental Monument using the corresponding element

Complete the Time Trial Challenge ahead: Open the chest within the time limit

NPC Interaction

Lumine exhibits reliable instruction-following ability, consistently interacting with designated NPCs within crowds, laying a solid foundation for accomplishing long-term missions.

Talk to NPC Grace

Talk to NPC Monroe

Talk to NPC Sayid

GUI Manipulation

Beyond open-world exploration, Lumine can also perform efficient GUI operations through human-like relative mouse movements, achieving a unified interaction between 2D interfaces and the 3D world, a capability that is crucial for generalist agents.
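
As an illustration of what relative mouse control involves, the sketch below splits the displacement from the current cursor position to a GUI target into a sequence of per-step `(dx, dy)` deltas. The function `relative_moves` and the per-step limit `max_step` are assumptions made for this example. The page does not describe how Lumine encodes or bounds its mouse movements.

```python
from typing import List, Tuple


def relative_moves(start: Tuple[int, int], target: Tuple[int, int],
                   max_step: int) -> List[Tuple[int, int]]:
    """Split the move from start to target into (dx, dy) deltas.

    Each delta is clamped to [-max_step, max_step] per axis (an assumed limit).
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    x, y = start
    tx, ty = target
    moves: List[Tuple[int, int]] = []
    while (x, y) != (tx, ty):
        dx = max(-max_step, min(max_step, tx - x))
        dy = max(-max_step, min(max_step, ty - y))
        moves.append((dx, dy))
        x += dx
        y += dy
    return moves


# Cursor at (100, 100), button at (400, 50), at most 128 pixels per axis per step.
moves = relative_moves((100, 100), (400, 50), max_step=128)
total_dx = sum(dx for dx, _ in moves)
total_dy = sum(dy for _, dy in moves)
```

The same relative representation works whether the deltas move a cursor across a menu or rotate the camera in the 3D world, which is one way to read the "unified interaction" described above.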

Cook Sweet Madame

Teleport using a Teleport Waypoint

Change the character's weapon

In-Context Learning

Lumine also demonstrates strong in-context learning abilities. When the instruction includes prior task information or a more detailed breakdown of the steps, Lumine can successfully complete a range of tasks that it was previously unable to perform.

Climb the stone pillar on the right and, once you reach the top, collect the blue Anemoculus floating in the air on the left

Switch to Kaeya, continuously use his Elemental Skill (E Skill) to freeze the water surface, and collect the Anemoculus floating ahead

Hit the Iron Chunk and collect the dropped Iron Chunk

Lumine's promising results highlight its strong potential for further scaling. Its zero-shot generalization to unseen missions, and even to entirely new games, indicates that the model has learned transferable meta-skills, such as 3D navigation and 2D manipulation, that extend beyond the training environments. These findings underscore the promise of Lumine's approach as a foundation for developing general-purpose decision models and as a starting point for reinforcement learning toward superhuman performance.