Lumine can complete hours-long missions in unseen scenarios and even in entirely new games!
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
We introduce Lumine, a generalist agent trained within Genshin Impact that can perceive, reason, and act in real time, completing hours-long missions across 3D open-world environments.
An Efficient and Scalable Recipe for Building General-Purpose Agents
Built upon Qwen2-VL-7B, Lumine integrates perception, reasoning, and fine-grained control in a human-like manner. It processes raw pixels to generate precise keyboard–mouse actions at 5 Hz and dynamically invokes explicit reasoning only when necessary, enabling a balance between deliberative planning and reactive behavior.
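The interplay between reactive control and on-demand reasoning can be pictured as a fixed-rate loop in which each step may or may not produce a new thought. The sketch below is only an illustration under stated assumptions: `Step`, `run_episode`, and `toy_policy` are hypothetical names, frames and actions are plain strings, and the rule for when to think is a toy stand-in for a decision that, per the description above, the model makes itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

STEP_HZ = 5                     # action rate stated in the text
STEP_PERIOD_S = 1.0 / STEP_HZ   # time budget per step, in seconds


@dataclass
class Step:
    thought: Optional[str]  # None when the agent acts reactively
    action: str             # placeholder for a keyboard-mouse action


# Hypothetical policy signature: (frame, last_thought) -> (new_thought_or_None, action)
Policy = Callable[[str, Optional[str]], Tuple[Optional[str], str]]


def run_episode(frames: Iterable[str], policy: Policy) -> List[Step]:
    """Run one step per frame, keeping the latest thought as context."""
    steps: List[Step] = []
    last_thought: Optional[str] = None
    for frame in frames:
        thought, action = policy(frame, last_thought)
        if thought is not None:
            last_thought = thought  # a new plan replaces the previous one
        steps.append(Step(thought=thought, action=action))
    return steps


def toy_policy(frame: str, last_thought: Optional[str]) -> Tuple[Optional[str], str]:
    """Toy stand-in for the model: think only when the scene is marked as new."""
    if frame.startswith("new:"):
        return f"plan for {frame[4:]}", "W"
    return None, "W"


if __name__ == "__main__":
    for step in run_episode(["new:bridge", "bridge", "bridge", "new:camp"], toy_policy):
        print(step)
```

In this toy run only the first and last frames trigger a thought; the frames in between are handled reactively using the most recent plan.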
Lumine makes full use of the collected raw gameplay data, following a curriculum that builds skills incrementally:
- 1731 hours of human gameplay for pre-training to master action primitives;
- 200 hours of instruction-following data to ground control in language;
- 15 hours of reasoning data to enable adaptive thinking.
 
The resulting model can not only autonomously complete hours-long missions but also follow diverse instructions to accomplish a broad spectrum of tasks.
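To make the proportions of the three-stage curriculum concrete, the short calculation below totals the hours listed above and computes each stage's share. Only the hour counts come from the text; the names `STAGES` and `stage_shares` are illustrative.

```python
# Hours per curriculum stage, as listed in the text.
STAGES = [
    ("pre-training", 1731),
    ("instruction following", 200),
    ("reasoning", 15),
]


def stage_shares(stages):
    """Return each stage's fraction of the total data hours."""
    total = sum(hours for _, hours in stages)
    return {name: hours / total for name, hours in stages}


if __name__ == "__main__":
    total_hours = sum(hours for _, hours in STAGES)
    print(f"total: {total_hours} hours")  # total: 1946 hours
    for name, share in stage_shares(STAGES).items():
        print(f"{name}: {share:.2%}")
```

The reasoning stage accounts for well under one percent of the total hours, while pre-training on raw gameplay makes up the bulk of the data.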
Combat
Benefiting from large-scale pre-training, Lumine has mastered the essential combat skills: dynamically tracking enemies, accurately striking distant targets with a bow, seamlessly switching characters to perform combo attacks, and efficiently locating and opening the treasure chests unlocked after combat.
Defeat the enemies ahead and collect the chest
Complete the Domain
Complete the Daily Commission: Defeat all enemies
Boss Fight
Besides regular combat, Lumine also shows a strong understanding of boss mechanics and the ability to respond effectively. It can skillfully evade powerful attacks and employ appropriate strategies to defeat bosses.
Defeat the Electro Hypostasis
Defeat the Electro Hypostasis
Defeat the Electro Hypostasis
Defeat Stormterror
Defeat Stormterror
Defeat the Anemo Hypostasis
Puzzle
Lumine can handle various challenges and puzzles in the game, which typically require a thorough understanding of game mechanics, strong spatial reasoning skills, and precise low-level control.
Fly along the Wind Current to collect the Anemoculus
After defeating the floating Anemo Slime, open the chest
Collect the three Wind Anemograna to activate a Wind Current, then enter the Wind Barrier to open the chest
Open the chest wrapped in thorns ahead
Activate the Elemental Monument using the corresponding element
Complete the Time Trial Challenge ahead: Open the chest within the time limit
NPC Interaction
Lumine exhibits reliable instruction-following ability, consistently interacting with designated NPCs within crowds, laying a solid foundation for accomplishing long-term missions.
Talk to NPC Grace
Talk to NPC Monroe
Talk to NPC Sayid
GUI Manipulation
Beyond open-world exploration, Lumine can also perform efficient GUI operations through human-like relative mouse movements. This unifies interaction with 2D interfaces and the 3D world, a capability that is crucial for generalist agents.
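One way to picture relative mouse control is as a sequence of bounded displacement steps that carry the cursor from its current position to a target, rather than a single absolute jump. The sketch below illustrates that idea only; it is not Lumine's actual action encoding, and `relative_moves` and the `max_step` limit are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def relative_moves(start: Point, target: Point, max_step: int = 100) -> List[Point]:
    """Split a cursor move into relative (dx, dy) steps, each axis clamped to max_step."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    x, y = start
    tx, ty = target
    moves: List[Point] = []
    while (x, y) != (tx, ty):
        # Clamp the remaining distance on each axis to the per-step limit.
        dx = max(-max_step, min(max_step, tx - x))
        dy = max(-max_step, min(max_step, ty - y))
        moves.append((dx, dy))
        x += dx
        y += dy
    return moves


if __name__ == "__main__":
    print(relative_moves((0, 0), (250, -30)))  # [(100, -30), (100, 0), (50, 0)]
```

Because every step is a displacement rather than a screen coordinate, the same kind of action could in principle drive both a menu cursor and a 3D camera, which is the sort of unification the paragraph above describes.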
Cook Sweet Madame
Teleport using a Teleport Waypoint
Change the character's weapon
In-Context Learning
Lumine also demonstrates strong in-context learning abilities. When the instruction includes prior task information or a more detailed step-by-step decomposition, Lumine can successfully complete a range of tasks that it was previously unable to perform.
Climb the stone pillar on the right and, once you reach the top, collect the blue Anemoculus floating in the air on the left
Switch to Kaeya, continuously use his Elemental Skill (E Skill) to freeze the water surface, and collect the Anemoculus floating ahead
Hit the Iron Chunk and collect the dropped Iron Chunk
Lumine's promising results highlight its strong potential for further scaling. Its impressive zero-shot generalization to unseen missions, and even to entirely new games, indicates that the model has learned transferable meta-skills, such as 3D navigation and 2D manipulation, that extend beyond the training environments. These findings underscore the promise of Lumine's approach as a foundation for developing general-purpose decision models and as an ideal starting point for reinforcement learning to achieve superhuman intelligence.