3D Asset Generation: AI for Game Development #3

Published January 20, 2023

This article is also available in Chinese 简体中文.

Welcome to AI for Game Development! In this series, we'll be using AI tools to create a fully functional farming game in just 5 days. By the end of this series, you will have learned how you can incorporate a variety of AI tools into your game development workflow. I will show you how you can use AI tools for:

  1. Art Style
  2. Game Design
  3. 3D Assets
  4. 2D Assets
  5. Story

Want the quick video version? You can watch it here. Otherwise, if you want the technical details, keep reading!

Note: This tutorial is intended for readers who are familiar with Unity development and C#. If you're new to these technologies, check out the Unity for Beginners series before continuing.

Day 3: 3D Assets

In Part 2 of this tutorial series, we used AI for Game Design. More specifically, we used ChatGPT to brainstorm the design for our game.

In this part, we'll talk about how you can use AI to generate 3D assets. The short answer is: you can't. That's because text-to-3D isn't yet at the point where it can be practically applied to game development. However, that's changing very quickly. Keep reading to learn about The Current State of Text-to-3D, Why It Isn't Useful (yet), and The Future of Text-to-3D.

The Current State of Text-to-3D

As discussed in Part 1, text-to-image tools such as Stable Diffusion are incredibly useful in the game development workflow. However, what about text-to-3D, or generating 3D models from text descriptions? There have been many very recent developments in this area, including approaches like DreamFusion, Magic3D, CLIPMatrix, and CLIP-Mesh-SMPLX.

Many of these approaches, excluding CLIPMatrix and CLIP-Mesh-SMPLX, are based on view synthesis, or generating novel views of a subject, as opposed to conventional 3D rendering. This is the idea behind NeRFs, or Neural Radiance Fields, which use neural networks for view synthesis.

Figure: View synthesis using NeRFs.
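
To make the view-synthesis idea a bit more concrete, below is a minimal sketch of the volume-rendering step at the heart of NeRFs: a network is queried for density and color at sample points along each camera ray, and those samples are composited into a pixel. This is an illustrative toy, not code from any actual NeRF implementation; the function names and random sample values are made up, and only numpy is assumed.

```python
# Toy sketch of NeRF-style volume rendering (illustrative names, not a real API).
import numpy as np

def positional_encoding(x, num_freqs=6):
    # Map coordinates to sin/cos features so an MLP can fit high-frequency detail.
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    return np.concatenate([fn(x * k) for k in freqs for fn in (np.sin, np.cos)], axis=-1)

def render_ray(densities, colors, deltas):
    # Composite (density, color) samples along one ray into a single pixel color.
    # densities: (N,) volume density at each sample
    # colors:    (N, 3) RGB at each sample
    # deltas:    (N,) spacing between adjacent samples
    alpha = 1.0 - np.exp(-densities * deltas)                      # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # light surviving to each sample
    weights = alpha * trans
    return (weights[:, None] * colors).sum(axis=0)

# In a real NeRF, an MLP predicts density and color from the encoded position and
# view direction; here we stand in random values for 64 samples along one ray.
n = 64
pixel = render_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.01))
print(pixel)  # a single RGB value in [0, 1]
```

Rendering every pixel this way from novel camera poses is what "view synthesis" means in practice, and it's also why NeRFs don't directly produce the meshes game engines expect.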

What does all of this mean if you're a game developer? Currently, nothing. This technology hasn't reached the point that it's useful in game development yet. Let's talk about why.

Why It Isn't Useful (yet)

Note: This section is intended for readers who are familiar with conventional 3D rendering techniques, such as meshes, UV mapping and photogrammetry.

While view synthesis is impressive, the world of 3D runs on meshes, which are not the same as NeRFs. There is, however, ongoing work on converting NeRFs to meshes. In practice, this is reminiscent of photogrammetry, where multiple photos of real-world objects are combined to author 3D assets.

Figure: NVlabs instant-ngp, which supports NeRF-to-mesh conversion.
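
To give a sense of what NeRF-to-mesh conversion involves, here is a hedged sketch of the most common approach: evaluate the learned density field on a regular 3D grid, then extract an isosurface with marching cubes. The `density_fn` below is a hypothetical placeholder (a real pipeline would query a trained NeRF here); the marching-cubes call is scikit-image's actual function.

```python
# Sketch of NeRF-to-mesh via marching cubes. `density_fn` is a placeholder,
# not part of any real NeRF library.
import numpy as np
from skimage import measure

def density_fn(pts):
    # Placeholder density field: a soft sphere of radius 0.5. A real pipeline
    # would query the trained NeRF's density network instead.
    return np.maximum(0.0, 0.5 - np.linalg.norm(pts, axis=-1))

res = 64
lin = np.linspace(-1, 1, res)
grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1)
volume = density_fn(grid.reshape(-1, 3)).reshape(res, res, res)

# Extract a triangle mesh at a chosen density threshold.
verts, faces, normals, _ = measure.marching_cubes(volume, level=0.25)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```

The mesh that comes out of this step is raw geometry: it still needs retopology, UV unwrapping, and texturing before it can go in a game, which is exactly the manual work described below.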

The practical use of assets generated with the text-to-NeRF-to-mesh pipeline is limited in much the same way as assets produced using photogrammetry: the resulting mesh is not immediately game-ready, and turning it into one requires significant work and expertise. In this sense, NeRF-to-mesh may be a useful tool as-is, but it doesn't yet reach the transformative potential of text-to-3D.

Since NeRF-to-mesh, like photogrammetry, is currently best suited to creating ultra-high-fidelity assets with significant manual post-processing, it doesn't really make sense for creating a farming game in 5 days. So I decided to just use cubes of different colors to represent the crops in the game.


Things are changing rapidly in this area, though, and there may be a viable solution in the near future. Next, I'll talk about some of the directions text-to-3D may be going.

The Future of Text-to-3D

While text-to-3D has come a long way recently, there is still a significant gap between where we are now and something that could have an impact along the lines of text-to-image. I can only speculate on how this gap will be closed, but two directions seem most apparent:

  1. Improvements in NeRF-to-mesh and mesh generation. As we've seen, current generative models are similar to photogrammetry in that they require a lot of work to produce game-ready assets. While this is useful in some scenarios, like creating realistic high-fidelity assets, it's still more time-consuming than making low-poly assets from scratch, especially if you're like me and use an ultra-low-poly art style.
  2. New rendering techniques that allow NeRFs to be rendered directly in-engine. While there have been no official announcements, one could speculate that NVIDIA and Google, among others, may be working on this.

Of course, only time will tell. If you want to keep up with advancements as they come, feel free to follow me on Twitter. If there are new developments I've missed, feel free to reach out!

Click here to read Part 4, where we use AI for 2D Assets.

Attribution

Thanks to Poli @multimodalart for providing info on the latest open-source text-to-3D.