iOS Memory Deep Dive for Unity Developers

This post was first published on Medium.

0x00 Description

People often wonder why the memory data reported by Unity’s Profiler differs from the data reported by native profilers such as Xcode on iOS, and how to analyze the data from those native tools. I recently watched the session “Developing and optimizing a procedural game | The Elder Scrolls Blades” from Unite Copenhagen, which covered some topics related to iOS memory and Unity. There was also an excellent session on the subject at WWDC 2018; you can find some useful links at the end of the article. But what should we, as Unity developers, do to solve iOS memory problems?

Next, I will discuss how a Unity developer can handle iOS memory issues at work. The main content covers how iOS manages memory, how to use Instruments to view the memory status of a Unity game, and how to use command line tools to dig deeper into memory problems in a Unity game.

0x01 iOS memory management — Is Unity Profiler wrong?

First of all, I want to emphasize that the memory data provided by a profiler is only a number (or a set of numbers), and different tools use different strategies to measure memory. An important question, therefore, is how exactly the data we see is obtained.

Depending on the tool we use, the final result may differ. If you are looking for a single number that summarizes all the memory information of an application or game, you are probably oversimplifying the problem or ignoring the complexity of the operating system. For example, different versions of iOS account for memory overhead differently. In the Xcode memory gauge, a Metal app reports more memory on iOS 12 than on iOS 11. This is because Apple changed how memory is accounted for, and a lot of memory that was not counted before is now included in the total.

The accounting for purgeable, nonvolatile memory changed beginning in iOS 12 and tvOS 12. In iOS 11 and tvOS 11, allocations with this memory storage mode — commonly used by Metal apps to store buffers, textures, and state objects — weren’t counted toward an app’s memory limit and weren’t presented in tools like Xcode memory gauge.1

Even on the iOS platform, the data from the Xcode memory gauge and the data from Instruments may not match exactly, and the early Allocations instrument was mainly used to count heap memory. So instead of wasting time comparing data from different tools, it is better to stick to one tool when measuring memory overhead or checking whether a memory optimization is effective.

So, it’s important to understand how the operating system manages memory, and how to interpret the data provided by a profiler tool. Next, let’s discuss the memory management mechanism of the iOS system, and then look at the memory data captured by Xcode and the memory data captured by Unity.

First, each process has its own address space, whose range is determined by the pointer size (32-bit or 64-bit). The address space is divided into multiple regions, and each region is subdivided into pages that are 4 KB (early devices) or 16 KB (after the A7) in size. These pages inherit the attributes of their region, such as read-only or readable and writable. Some pages may hold less data than a full page, and some data may span several pages, so the memory overhead of your app or game equals the number of pages multiplied by the page size.
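The page arithmetic can be made concrete with a small sketch. The function names and the 100 KB request are my own illustrative assumptions; only the 4 KB and 16 KB page sizes come from the text above.

```python
PAGE_SIZE_4K = 4 * 1024    # page size on early devices
PAGE_SIZE_16K = 16 * 1024  # page size on newer devices

def pages_needed(num_bytes: int, page_size: int) -> int:
    """Number of whole pages required to hold num_bytes (rounded up)."""
    return (num_bytes + page_size - 1) // page_size

def footprint_bytes(num_bytes: int, page_size: int) -> int:
    """Memory actually consumed: page count multiplied by page size."""
    return pages_needed(num_bytes, page_size) * page_size

request = 100 * 1024  # a hypothetical 100 KB allocation
print(pages_needed(request, PAGE_SIZE_16K), footprint_bytes(request, PAGE_SIZE_16K))
print(pages_needed(request, PAGE_SIZE_4K), footprint_bytes(request, PAGE_SIZE_4K))
```

With 16 KB pages the same request occupies more memory than with 4 KB pages, because the last page is only partly filled.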

Of course, the system also has real physical memory.

Virtual memory vs Resident memory

ref: WWDC 2013

As you probably know, virtual memory maps this address space onto real physical memory. The mapping is interesting because, from the perspective of each app process, it owns all of the memory (that is, virtual memory), while in fact only part of that virtual memory is mapped to physical memory. The part that is mapped to physical memory is called resident memory.

As shown in the figure above, an app allocates memory and four regions are created in virtual memory; the third region contains 13 pages. At this point, however, only 6 pages are actually mapped to physical memory. The mapping from virtual memory to physical memory is established the first time the memory is used, for example when reading from or writing to it. Resident memory is still virtual memory; it is simply the part that has been mapped to physical memory. You may have seen similar data in Xcode or Instruments. For example, the VM Tracker instrument lists Resident Size and Virtual Size separately.

Dirty memory vs Clean memory

A page may be dirty or clean. How do we tell them apart? Simply put, if your app or game has allocated memory and then modified its contents, the pages involved are dirty. Dirty memory can’t be discarded, because the data has to stay in memory to keep the program running.

In contrast, pages whose contents have not been modified are clean pages, and clean pages can be discarded and recreated by the system. A memory-mapped file is one example: if the operating system needs more memory, it can discard those pages, because it can always reload them from disk through the mapping between the memory range and the file. However, although a memory-mapped file need not occupy physical memory, it still consumes the virtual memory of the process.2

In addition, the __TEXT segments of executable files and some __DATA_CONST segments of frameworks are also classified as clean memory.

At WWDC 2018, Apple’s engineers gave a vivid example: an array of 20,000 integers is allocated, which creates several pages. If only the first and last elements are assigned, the pages containing those two elements become dirty, but the pages in between remain clean.
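To illustrate the memory-mapped file case, here is a minimal sketch in Python rather than iOS code. The temporary file and its contents are made up for the example. Because the mapping is read-only, its pages are backed by the file on disk, which is what allows an operating system to drop and reload them.

```python
import mmap
import os
import tempfile

# Create a small file that stands in for an asset on disk (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"unity-asset-data" * 1024)
    path = f.name

with open(path, "rb") as f:
    # Read-only mapping: the pages are backed by the file, not by private copies.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        first_bytes = view[:16]  # reading faults the first page in on demand
        mapped_size = len(view)

os.remove(path)
print(first_bytes, mapped_size)
```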
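The example can be worked through with a little arithmetic. This sketch assumes 4-byte integers, 16 KB pages, and an array that starts exactly on a page boundary; none of these are guaranteed on a real device, so the real page count may differ.

```python
PAGE_SIZE = 16 * 1024  # assumed 16 KB pages
INT_SIZE = 4           # assumed 4-byte integers

def page_index(element_index: int) -> int:
    """Page that holds the given element, assuming a page-aligned array."""
    return (element_index * INT_SIZE) // PAGE_SIZE

count = 20_000
total_pages = (count * INT_SIZE + PAGE_SIZE - 1) // PAGE_SIZE

# Writing only the first and last elements dirties just the pages they live on.
dirty_pages = {page_index(0), page_index(count - 1)}
clean_pages = total_pages - len(dirty_pages)

print(total_pages, sorted(dirty_pages), clean_pages)
```

Under these assumptions, only two of the pages are dirty, even though the whole array has been allocated.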

ref: WWDC 2018

Compressed memory

When more memory is needed, the system discards clean pages. Dirty pages can’t be discarded, so what happens when there is too much dirty memory? Before iOS 7, if a process had too much dirty memory, the system would simply terminate it. iOS 7 introduced the Compressed Memory mechanism. Since iOS has no traditional disk swap mechanism (macOS does), the Swapped Size shown in Apple’s profiling tools is actually Compressed Memory.

Since iOS 7, the operating system can compress dirty memory with the memory compressor, which compresses dirty pages that have not been accessed for a while. When that memory is accessed again, the memory compressor decompresses it first.

Is Unity Profiler wrong?

As we have seen, from the perspective of operating system memory management, the memory of a process is actually quite complicated. The memory data recorded by Unity, for example “Reserved Total - Unity”, comes mainly from the records of the MemoryManager in the engine.

MemoryManager will call the corresponding Allocator to allocate memory for the engine according to different situations.

For example, we can take the free Unity 3D Game Kit project and use Instruments to check its memory allocations.

You can see that the MemoryManager calls UnityDefaultAllocator, and the figure below shows IphoneNewLabelAllocator being called to allocate memory.

That is to say, Unity records the memory allocated by its own code. But besides that, many frameworks and third-party libraries also allocate memory, and those allocations are not recorded by Unity.
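To make the gap concrete, here is a toy model. It is not Unity’s actual MemoryManager; the class name, the buffer sizes, and the “plugin” allocation are all hypothetical. It only shows why a profiler that counts allocations passing through its own allocator reports less than the process really uses.

```python
class TrackingAllocator:
    """Toy stand-in for an engine-side memory manager that records what it hands out."""

    def __init__(self) -> None:
        self.tracked_bytes = 0

    def allocate(self, size: int) -> bytearray:
        self.tracked_bytes += size  # bookkeeping happens only on this path
        return bytearray(size)

engine = TrackingAllocator()
engine_buffers = [engine.allocate(4096), engine.allocate(8192)]

# A third-party library allocates on its own and bypasses the engine's bookkeeping.
plugin_buffer = bytearray(16384)

process_bytes = sum(len(b) for b in engine_buffers) + len(plugin_buffer)
print(engine.tracked_bytes, process_bytes)
```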

0x02 Use Instruments to review memory used in a Unity game

For this part, I recommend Valentin Simonov’s article “Understanding iOS Memory (WiP)”, which does a great job of showing how to use various tools to review the memory used by a Unity game.

0x03 Use command line tools to dig deeper into memory problems

In addition to using Instruments to investigate memory issues, we can also use the great Xcode memory debugger. In particular, after exporting a memgraph file, you can use a variety of command line tools to dig out more information.

People sometimes complain that the memory data on Xcode’s Memory Report page differs not only from the Unity Profiler but sometimes even from Apple’s own performance tools such as Instruments. As mentioned above, it is normal for different tools to report different data. But we can use the memgraph file and the command line tools to find out what the Memory Report’s number actually covers.

We again use the Unity 3D Game Kit project as a demonstration, and the test device is an iPhone X. Before we start, we need to enable the Scheme -> Run -> Diagnostics -> Malloc Stack option.

After running the game, click Start Game to load the first scene. The Memory Report shows that memory has reached 1.48 GB at this point, yet the memory gauge is still in the green zone. The memory gauge is therefore not a good guide for optimization, because this much memory on an iPhone 7 would get the game terminated by the system.

Animation Leak?

We go directly to the Xcode memory debugger. If you want to check whether there are memory leaks, you can select the corresponding option in the filter. A common “fake leak” shows up here.

If we look at its call stack, it is mostly related to Animation. I consulted a developer from the Animation dev team about this, who confirmed that Xcode reports a false memory leak in this case: the memory block is still referenced by the allocator, and the memory will be freed when the whole block is freed.

Of course, if you encounter other strange engine-related leaks, you can submit a bug report to Unity as described in this article.3

Then you can export the data as a .memgraph file and process it with command line tools.

VMMAP Summary

The first command line tool is vmmap, which allows us to view the current virtual memory data.

Once we have a memgraph file, we can run this command with the --summary flag to print an overview of the current virtual memory.

vmmap --summary Unity3DKit_ipx.memgraph

The output of the terminal is shown below:

We can find something interesting. First of all, the first four columns are what we discussed before: VIRTUAL SIZE, RESIDENT SIZE, DIRTY SIZE, SWAPPED SIZE.

Look at the TOTAL row. This game process has allocated 2.7 GB of virtual memory, of which 1.6 GB is mapped to physical memory. The DIRTY SIZE value is 1.4 GB, which is very close to the value in the Memory Report, and SWAPPED SIZE is 52 MB. Note that SWAPPED SIZE is the size of the memory before compression, not the size it was compressed down to. Therefore, we mainly pay attention to the DIRTY SIZE column.
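As a rough cross-check, the WWDC 2018 session describes an app’s footprint as dirty plus compressed memory, so we can add the two columns. The sketch below is only an approximation: the parser and its K/M/G suffix table are my assumption about how such sizes are written, and the inputs are the rounded figures quoted above rather than exact values.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text: str) -> int:
    """Convert a size such as '52M' or '1.4G' to bytes (suffix table is an assumption)."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))

dirty = parse_size("1.4G")   # DIRTY SIZE from the TOTAL row, rounded
swapped = parse_size("52M")  # SWAPPED SIZE from the TOTAL row, rounded
footprint_mb = (dirty + swapped) / UNITS["M"]
print(f"{footprint_mb:.1f} MB")
```

The result lands in the same range as the 1.48 GB shown in the Memory Report. The remaining difference is within what the rounding of the inputs could explain.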

IOKit

Next, we can see that IOKit has the largest overhead: its virtual size reaches 832.5 MB, and the part mapped to physical memory reaches 750.4 MB. This is mainly GPU-related memory, such as render targets, textures, meshes, and compiled shaders.

MALLOC and Heap

We can also see that the MALLOC_* regions account for a lot of memory. This memory is allocated through malloc, which includes allocations made by Unity’s native C++ code as well as by third-party libraries and the system. It lives in the so-called heap. The output also says “see MALLOC ZONE table below”, meaning that a breakdown by heap zone follows further down. Here we can use the second command line tool, heap, to inspect the contents of heap memory.

heap --sortBySize Unity3DKit_ipx.memgraph

When using the heap command, we can add the --sortBySize flag to sort the data by size; otherwise it is sorted by the number of instances of each type.

In the above figure, you can see that most heap memory is occupied by non-object allocations, nearly 700 MB in total, while the object allocations are small. For example, there are 573 instances of GpuProgramMetal, but they take up only 223 KB.

You might be interested in what non-object contains, but this screenshot doesn’t tell us more. So next we can add the --showSize flag to group the data by size.

heap --showSize --sortBySize Unity3DKit_ipx.memgraph

This is much clearer.

As you can see, in the non-object category the largest allocations are one 30 MB allocation, three 10 MB allocations, and one 8 MB allocation. Next we will profile these allocations.

Of course, the heap command offers more features. For allocations that have a class name, we can get the memory address of each instance of a type by class-name matching; just add the --addresses flag. For example, we can print the addresses of all GpuProgramMetal instances. The instances of this class are not large themselves, but the actual shader resources they refer to may carry a large memory overhead.
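Grouping allocations by size is easy to reproduce on your own data. The sketch below is my own illustration of the idea, not how the heap tool is implemented; the input list simply restates the five allocations named above.

```python
from collections import Counter

MB = 1024 ** 2
# The largest non-object allocations named in the text.
allocations = [30 * MB, 10 * MB, 10 * MB, 10 * MB, 8 * MB]

by_size = Counter(allocations)  # size -> number of allocations of that size

# Sort the groups by the total bytes they account for, breaking ties by size.
groups = sorted(
    ((size, count, size * count) for size, count in by_size.items()),
    key=lambda g: (g[2], g[0]),
    reverse=True,
)

for size, count, total in groups:
    print(f"{count} x {size // MB} MB = {total // MB} MB")
```

Note that the three 10 MB allocations together weigh as much as the single 30 MB one, which is why grouping by size is more informative than looking at individual entries.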

heap -addresses GpuProgramMetal Unity3DKit_ipx.memgraph

With the memory address of each object, we can find out where it came from using the malloc_history command described below. For now, though, we turn our attention to those relatively large allocations.

Let’s go back to the terminal and print the virtual memory information again, but this time we focus only on the allocations in MALLOC_LARGE, so we use grep to filter for our target.

vmmap -verbose Unity3DKit_ipx.memgraph | grep "MALLOC_LARGE"

Now we have the memory data for MALLOC_LARGE, including the address, size, and heap zone of each region. We can find our targets here: one 30 MB allocation, three 10 MB allocations, and one 8 MB allocation.

Let’s take a look at the call stacks that allocated them. Here we use the malloc_history command with the --fullStacks flag to print the stack information.

malloc_history Unity3DKit_ipx.memgraph --fullStacks 0x0000000127c60000

You can see that this 30 MB allocation is a memory pool for FMOD.

The other three 10 MB allocations do similar things. Finally, let’s see where the 8 MB allocation comes from.

malloc_history Unity3DKit_ipx.memgraph --fullStacks 0x0000000113400000

You can see that this memory is allocated by Unity to create a CommandQueue when multithreaded rendering is enabled.

VM_ALLOC == Mono Size?

Next, the vmmap --summary output contains a section called VM_ALLOC. According to Valentin Simonov, VM_ALLOC corresponds to the size of Mono memory, that is, managed memory. Is that true? We can inspect the allocation call stacks in the VM_ALLOC section in the same way as above.

vmmap -verbose Unity3DKit_ipx.memgraph | grep "VM_ALLOC"

The screenshot shows that these allocations are relatively small. Again we start with the largest block.

We first use malloc_history to inspect the 3 MB block.

malloc_history Unity3DKit_ipx.memgraph --fullStacks 0x0000000152bd4000

We can see that this 3 MB of memory is allocated by the SimplFXSynth.RenderAudio method in a C# script, which triggers a GC allocation and causes the managed heap to expand.

Interesting. Now let’s see how the 1 MB block is allocated.

malloc_history Unity3DKit_ipx.memgraph --fullStacks 0x0000000150084000

This time it is Unity’s ScriptingGCHandle::Acquire method that allocates memory on the managed heap.

Therefore, the VM_ALLOC memory does correspond to Unity’s Mono managed heap. Specifically, you can find out which function triggers a GC allocation with the malloc_history command.

Commands Summary

That completes our tour of using command line tools to profile and locate memory problems on iOS. To summarize: after getting the .memgraph file from a Unity game, first view the memory summary with vmmap --summary. The heap, that is, the memory allocated through malloc, can be analyzed further with the heap command. Once you have the memory address of a target object, use the malloc_history command to get the call stack of the allocation. Remember to enable Malloc Stack in Xcode beforehand. After that, you can build an automated analysis tool to process the data and locate memory problems.
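As a starting point for such a tool, here is a minimal sketch that only assembles the command lines used in this article and does not run anything. The function name is my own; the tools and flags are the ones shown above.

```python
from typing import List, Optional

def analysis_commands(memgraph: str, address: Optional[str] = None) -> List[List[str]]:
    """Return argv lists for the commands used in this article (nothing is executed)."""
    commands = [
        ["vmmap", "--summary", memgraph],
        ["heap", "--showSize", "--sortBySize", memgraph],
    ]
    if address is not None:
        # malloc_history needs the address of the allocation to explain.
        commands.append(["malloc_history", memgraph, "--fullStacks", address])
    return commands

for argv in analysis_commands("Unity3DKit_ipx.memgraph", "0x0000000127c60000"):
    print(" ".join(argv))
```

On a Mac with the Xcode command line tools installed, each list could be passed to subprocess.run(argv, capture_output=True, text=True) and the captured output parsed from there.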

0x04 “Post-credits scene”


  1. Reducing the Memory Footprint of Metal Apps

  2. Mapping Files Into Memory

  3. Attaching your project to a bug report


Jiadong Chen
Cloud Architect/Senior Developer
