Training with an automated data pipeline
Voyager builds on Tencent's earlier HunyuanWorld 1.0, launched in July. Voyager is also part of Tencent's broader "Hunyuan" ecosystem, which includes the Hunyuan3D-2 model for text-to-3D generation and the previously covered HunyuanVideo for video synthesis.
To train Voyager, researchers developed software that automatically analyzes existing videos to estimate camera motion and calculate depth for every frame, eliminating the need for humans to manually label thousands of hours of footage. The system processed more than 100,000 video clips drawn from both real-world recordings and the aforementioned Unreal Engine renders.
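Tencent hasn't released this labeling tooling, but the pipeline as described, per-frame depth plus recovered camera poses, has roughly the following shape. Everything below (function names, data layout, the stub estimators) is a hypothetical sketch for illustration, not Tencent's actual code:

```python
# Hypothetical sketch of an automatic video-labeling pipeline of the kind
# described: for each clip, estimate per-frame depth and recover camera
# motion between frames, so no human annotation is required.
# Both estimator functions are stand-ins, not Tencent's tooling.

def estimate_depth(frame):
    """Stand-in for a monocular depth model; returns dummy per-pixel depth."""
    return [[1.0 for _ in row] for row in frame]

def estimate_camera_pose(prev_frame, frame):
    """Stand-in for visual odometry / structure-from-motion between frames."""
    return {"rotation": (0.0, 0.0, 0.0), "translation": (0.0, 0.0, 0.1)}

def label_clip(frames):
    """Produce one (frame, depth, pose) training sample per video frame."""
    samples = []
    prev = frames[0]
    for frame in frames:
        samples.append({
            "frame": frame,
            "depth": estimate_depth(frame),
            "pose": estimate_camera_pose(prev, frame),
        })
        prev = frame
    return samples

# Toy usage: one clip of three 2x2 "frames" yields three labeled samples.
clips = [[[[0, 0], [0, 0]] for _ in range(3)]]
dataset = [sample for clip in clips for sample in label_clip(clip)]
print(len(dataset))  # prints 3
```

Run at the scale the paper describes, a loop like this would churn through the 100,000-plus clips with no annotator in sight.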
A diagram of the Voyager world creation pipeline. Credit: Tencent
The model demands serious computing power to run, requiring at least 60GB of GPU memory for 540p resolution, though Tencent recommends 80GB for better results. Tencent published the model weights on Hugging Face and included code that works with both single- and multi-GPU setups.
The model comes with notable licensing restrictions. Like other Hunyuan models from Tencent, the license prohibits use in the European Union, the United Kingdom, and South Korea. Additionally, commercial deployments serving more than 100 million monthly active users require separate licensing from Tencent.
On the WorldScore benchmark developed by Stanford University researchers, Voyager reportedly achieved the highest overall score of 77.62, compared with 72.69 for WonderWorld and 62.15 for CogVideoX-I2V. The model reportedly excelled in object control (66.92), style consistency (84.89), and subjective quality (71.09), though it placed second in camera control (85.95) behind WonderWorld's 92.98. WorldScore evaluates world generation approaches across multiple criteria, including 3D consistency and content alignment.
While these self-reported benchmark results seem promising, wider deployment still faces challenges due to the computational muscle involved. For developers needing faster processing, the system supports parallel inference across multiple GPUs using the xDiT framework. Running on eight GPUs delivers processing speeds 6.69 times faster than single-GPU setups.
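That quoted scaling figure works out to a parallel efficiency of roughly 84 percent. A quick back-of-the-envelope check (the Amdahl's-law inference is our own arithmetic, not a figure from Tencent):

```python
# Sanity-check the reported xDiT scaling: 8 GPUs, 6.69x speedup over one GPU.
gpus = 8
speedup = 6.69

# Parallel efficiency: achieved speedup as a fraction of ideal linear speedup.
efficiency = speedup / gpus
print(f"Parallel efficiency: {efficiency:.1%}")  # about 84 percent

# Amdahl's law, speedup = 1 / (s + (1 - s) / N), solved for the serial
# fraction s consistent with that measurement.
serial_fraction = (gpus / speedup - 1) / (gpus - 1)
print(f"Implied serial fraction: {serial_fraction:.1%}")  # a few percent
```

In other words, the reported number is consistent with a workload that parallelizes well, with only a small non-parallel remainder.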
Given the processing power required and the limitations in generating long, coherent "worlds," it may be a while before we see real-time interactive experiences using a similar approach. But as we have seen so far with experiments like Google's Genie, we are likely witnessing very early steps into a new interactive, generative art form.