
Optimizing Memory Monitoring in WeChat to Reduce FOOM Occurrences and Improve App Performance

Instruments' Allocations is currently the primary memory monitoring tool for iOS, but it can only be used during development. This article explains how to build an offline memory monitoring tool that identifies memory-related issues after the app has been released.

Introduction

FOOM (Foreground Out Of Memory) occurs when an app is forcefully terminated by the system due to excessive memory usage while running in the foreground. To users, it appears as if the app has crashed. In August 2015, Facebook proposed a method for detecting FOOM events. The general principle involves eliminating various scenarios, with the remaining cases being considered as FOOM. Here is the specific link for more information: https://code.facebook.com/posts/1146930688654547/reducing-fooms-in-the-facebook-ios-app/

In late 2015, WeChat began reporting FOOM events. Initial data showed that the daily ratio of FOOM occurrences to logged-in users was around 3%, while the crash rate during the same period was below 1%. In early 2016, a senior executive from JD.com reported frequent WeChat crashes. After analyzing over 2GB of logs, it was discovered that excessive logging of KV reports was causing the FOOMs. Then, in August 2016, numerous external users reported that WeChat crashed shortly after launch, but even after a large number of logs had been analyzed, the cause of the FOOMs could not be identified. WeChat urgently needed an effective memory monitoring tool to pinpoint such problems.

Implementation Principle

The initial version of WeChat's memory monitoring used Facebook's FBAllocationTracker tool to monitor Objective-C object allocation and the fishhook tool to hook into malloc/free interfaces to monitor heap memory allocation. Every second, the current number of Objective-C objects, the top 200 largest heap memory allocations, and their allocation stacks were logged as text on the local device. This simple solution was implemented quickly and helped identify a module causing excessive memory usage due to database migration and loading a large number of contacts.

However, this solution had several drawbacks:

1. The monitoring granularity was too coarse: it could not capture memory growth caused by the accumulation of many small allocations. Moreover, fishhook could only hook C interface calls made by the app itself, not those made inside system libraries.

2. Controlling the log interval was challenging. Longer intervals could miss peak situations, while shorter intervals could cause battery drain, frequent IO, and other performance issues.

3. The original log relied on manual analysis and lacked effective tools for display and problem classification.

To address these issues, the second version of the solution, inspired by Instruments' Allocations, focused on optimizing four aspects: data collection, storage, reporting, and presentation.

Data Collection

In September 2016, to solve the iOS 10 nano crash issue, I studied the libmalloc source code and accidentally discovered these interfaces:

When the malloc_logger and __syscall_logger function pointers are non-NULL, memory operations such as malloc/free and vm_allocate/vm_deallocate notify the upper layer through these pointers. This is also how the malloc stack logging debugging tool works. Using these two function pointers, we can easily track the allocation information of all currently live objects, including allocation size and allocation stack.
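The notification pattern can be illustrated without Darwin. The following is a portable simulation, not the libmalloc interface: `alloc_logger`, `tracked_malloc`, `tracked_free`, `monitor_logger`, and the event constants are hypothetical names standing in for the logger pointer, the allocator that calls it, and the monitor that receives the events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event types; libmalloc defines its own flag values. */
enum { EVENT_ALLOC = 1, EVENT_FREE = 2 };

/* Hypothetical hook type standing in for the logger function pointer. */
typedef void (*alloc_logger_fn)(int event, uintptr_t address, size_t size);

/* NULL means "monitoring off"; the allocator only reports when it is set. */
static alloc_logger_fn alloc_logger = NULL;

/* Allocator side: report every allocation and release to the hook. */
static void *tracked_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL && alloc_logger != NULL)
        alloc_logger(EVENT_ALLOC, (uintptr_t)p, size);
    return p;
}

static void tracked_free(void *p) {
    if (p != NULL && alloc_logger != NULL)
        alloc_logger(EVENT_FREE, (uintptr_t)p, 0);
    free(p);
}

/* Monitor side: a small fixed table of live allocations. */
#define MAX_LIVE 1024
static struct { uintptr_t address; size_t size; } live[MAX_LIVE];
static size_t live_count = 0;
static size_t live_bytes = 0;

static void monitor_logger(int event, uintptr_t address, size_t size) {
    if (event == EVENT_ALLOC) {
        assert(live_count < MAX_LIVE);
        live[live_count].address = address;
        live[live_count].size = size;
        live_count++;
        live_bytes += size;
    } else if (event == EVENT_FREE) {
        for (size_t i = 0; i < live_count; i++) {
            if (live[i].address == address) {
                live_bytes -= live[i].size;
                live[i] = live[--live_count]; /* swap-remove the entry */
                break;
            }
        }
    }
}
```

Installing the monitor is a single assignment (`alloc_logger = monitor_logger;`). A real monitor would also capture the allocation stack inside the callback and would use a faster lookup structure than this linear table.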

To capture the allocation stack, we use the backtrace function. However, the captured addresses are runtime virtual memory addresses and cannot be looked up directly in the dSYM symbol table. Therefore, we also record the slide (load offset) of each image when it is loaded, so that the symbol table address can be calculated as the stack address minus the slide.
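The slide arithmetic can be sketched in a few lines. This is a minimal illustration under assumptions: the `image_info` record, its fields, and both helper functions are hypothetical, and the addresses in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept for each loaded image (binary or library). */
typedef struct {
    const char *name;
    uint64_t    load_start; /* first runtime address of the image */
    uint64_t    load_end;   /* one past the last runtime address */
    uint64_t    slide;      /* load offset recorded when the image was loaded */
} image_info;

/* Returns the image that contains a runtime address, or NULL if none does. */
static const image_info *find_image(const image_info *images, size_t count,
                                    uint64_t runtime_address) {
    assert(images != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (runtime_address >= images[i].load_start &&
            runtime_address < images[i].load_end)
            return &images[i];
    }
    return NULL;
}

/* Symbol table address = stack address - slide. */
static uint64_t symbol_address(const image_info *image, uint64_t runtime_address) {
    assert(image != NULL);
    return runtime_address - image->slide;
}
```

For example, with an image loaded at 0x104a30000 with slide 0x4a30000, the captured frame 0x104a31234 maps to 0x100001234 in the symbol table.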

Additionally, to better organize data, each memory object should have its own category, as illustrated in the image above. For heap memory objects, the category name is "Malloc " followed by the allocation size, such as "Malloc 48.00KiB". For virtual memory objects, when calling vm_allocate to create them, the last parameter 'flags' represents the virtual memory type. This 'flags' value corresponds to the first parameter 'type' of the previously mentioned function pointer __syscall_logger. The specific meaning of each flag can be found in the header file <mach/vm_statistics.h>.
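As a small illustration of the heap category naming, the sketch below formats a category name from an allocation size. Only the "Malloc 48.00KiB" form comes from the text; the function names and the "Bytes" and "MiB" forms are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes a heap category name such as "Malloc 48.00KiB" into out. */
static void malloc_category_name(size_t size, char *out, size_t out_len) {
    assert(out != NULL && out_len > 0);
    if (size < 1024)
        snprintf(out, out_len, "Malloc %zu Bytes", size);          /* assumed form */
    else if (size < 1024 * 1024)
        snprintf(out, out_len, "Malloc %.2fKiB", (double)size / 1024.0);
    else
        snprintf(out, out_len, "Malloc %.2fMiB",                    /* assumed form */
                 (double)size / (1024.0 * 1024.0));
}

/* Returns 1 if a category name denotes a heap ("Malloc ...") category. */
static int is_malloc_category(const char *name) {
    assert(name != NULL);
    return strncmp(name, "Malloc ", 7) == 0;
}
```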

For Objective-C objects, the category name is the Objective-C class name. We can obtain this by hooking the Objective-C method +[NSObject alloc]:

Later, we found that the class factory methods that create NSData objects do not call +[NSObject alloc]. Instead, they call the C function NSAllocateObject, which means the class names of Objective-C objects created this way cannot be obtained through the hook. Finally, the answer was found in Apple's open-source code CF-1153.18. When __CFOASafe=true and __CFObjectAllocSetLastAllocEventNameFunction!=NULL, CoreFoundation creates the object and then uses this function pointer to tell the upper layer what type the object is:

With the methods above, our monitoring data source is the same as that of Allocations, although it relies on private APIs. Without enough "tricks", private APIs cannot be shipped to the App Store, so we have to settle for second best. Modifying the malloc, free, and other function pointers in the malloc_zone_t structure returned by the malloc_default_zone function can also monitor heap memory allocation, with the same effect as malloc_logger; virtual memory allocation can only be hooked through fishhook.

Data Storage

Living Object Management

During an app's runtime, a significant amount of memory is allocated and released. For instance, within 10 seconds of launching WeChat, 800,000 objects are created and 500,000 are released, making performance optimization crucial. To minimize memory allocation and release during the storage process, a lightweight balanced binary tree is used instead of sqlite.

A Splay Tree is a type of binary search tree that does not guarantee perfect balance but has an amortized time complexity of O(log N) for its operations, which makes it comparable to a balanced binary tree. Splay trees have a smaller memory footprint than other balanced binary trees, such as red-black trees, because they do not need to store additional balancing information. The idea behind splay trees is the principle of locality: a recently accessed node is likely to be accessed again, and a frequently accessed node is likely to be accessed in the future. To reduce overall search time, commonly queried nodes are moved closer to the root through a "splay" operation. In most cases, memory is allocated and then released shortly afterwards, as with autoreleased objects and temporary variables. In addition, right after memory is allocated for an Objective-C object, the record for that allocation is looked up again to update its category. A splay tree is therefore well suited to managing live objects.

Traditional binary trees are implemented with linked nodes, so memory is allocated or released every time a node is added or removed. To reduce memory operations, the binary tree can be implemented on top of an array: the left and right children of a node change from pointer types to integer types that hold the child's index in the array. When a node is deleted, it stores the array index of the previously released node, so the freed slots form a linked free list that later insertions can reuse.

Stack Storage

Statistics collected while WeChat is running show that there are millions to tens of millions of backtrace stacks, with an average stack depth of 35 when the maximum captured depth is 64. If we use 36 bits to store an address (the maximum virtual memory address of ARMv8 is 48 bits, but 36 bits are actually sufficient), the average storage length of a stack is 157.5 bytes. To store 1 million stacks, we would need 157.5 MB of storage space. However, inspection with breakpoints shows that most stacks share common endings, such as the last 7 addresses being the same in the following two stacks:

For this purpose, a Hash Table can be utilized to store these stacks. The concept involves inserting the entire stack into the table as a linked list, where each linked list node contains the current address and the index of the table where the previous address is located. When inserting an address, first calculate its hash value to determine the index in the table. If the corresponding index does not have any stored data, record the linked list node; if there is stored data and it matches the linked list node, then the hash is a hit, and continue processing the next address. If the data is inconsistent, it indicates a hash conflict, and the hash value must be recalculated until the storage conditions are met. Here's a simplified example of hash calculation:

1) G, F, E, D, C, A of Stack1 are inserted into the Hash Table in sequence. The nodes at indexes 1 to 6 are (G, 0), (F, 1), (E, 2), (D, 3), (C, 4), (A, 5). The entry index of Stack1 is 6.

2) Next, Stack2 is inserted. Since its G, F, E, D, and C nodes are identical to the first 5 nodes of Stack1, the hash hits for each of them; B is inserted at the new position 7 as (B, 5). The entry index of Stack2 is 7.

3) Finally, Stack3 is inserted. The G, F, E, D nodes hit. The previous address of A in Stack3 is D, at index 4, which does not match the existing node (A, 5), so the hash misses; the next blank position, 8, is found and node (A, 4) is inserted. The previous address of B is now A at index 8, which does not match the existing node (B, 5), so the hash misses again; the next blank position, 9, is found and node (B, 8) is inserted. The entry index of Stack3 is 9.
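The insertion procedure can be sketched as follows. This is an illustration under assumptions rather than WeChat's implementation: the hash function (which mixes the address with the parent index), the table size, linear probing, and all names are hypothetical, and the nodes are not packed into 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u /* small for illustration */
#define NO_PARENT 0u   /* index 0 is reserved: "no previous address" */

typedef struct {
    uint64_t address; /* frame address */
    uint32_t parent;  /* table index of the previous address's node */
    uint8_t  used;
} stack_node;

typedef struct {
    stack_node nodes[TABLE_SIZE];
    uint32_t   used_count;
} stack_table;

/* Assumed hash: mixes address and parent index into a slot in 1..TABLE_SIZE-1. */
static uint32_t hash_node(uint64_t address, uint32_t parent) {
    uint64_t h = (address ^ ((uint64_t)parent << 32)) * 0x9E3779B97F4A7C15ull;
    return (uint32_t)(h >> 40) % (TABLE_SIZE - 1) + 1;
}

/* Inserts one (address, parent) node, or returns the index of an equal one. */
static uint32_t insert_frame(stack_table *t, uint64_t address, uint32_t parent) {
    uint32_t i = hash_node(address, parent);
    for (;;) {
        stack_node *n = &t->nodes[i];
        if (!n->used) {
            assert(t->used_count < TABLE_SIZE - 2); /* always keep a blank slot */
            n->address = address;
            n->parent = parent;
            n->used = 1;
            t->used_count++;
            return i;
        }
        if (n->address == address && n->parent == parent)
            return i;                 /* hash hit: node is shared */
        i = i % (TABLE_SIZE - 1) + 1; /* conflict: try the next slot, skipping 0 */
    }
}

/* Inserts a whole stack, outermost frame first; returns its entry index. */
static uint32_t insert_stack(stack_table *t, const uint64_t *frames, size_t count) {
    uint32_t parent = NO_PARENT;
    for (size_t i = 0; i < count; i++)
        parent = insert_frame(t, frames[i], parent);
    return parent;
}

/* Rebuilds a stack from its entry index, innermost frame first. */
static size_t read_stack(const stack_table *t, uint32_t entry,
                         uint64_t *out, size_t max) {
    size_t n = 0;
    while (entry != NO_PARENT && n < max) {
        out[n++] = t->nodes[entry].address;
        entry = t->nodes[entry].parent;
    }
    return n;
}
```

Inserting the three example stacks (18 frames in total) creates 9 nodes, as in the walkthrough, although the slot numbers depend on the hash and will generally differ from 1 to 9. Only the entry index has to be stored with each live object.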

After this suffix compression, the average number of nodes stored per stack drops from 35 to fewer than 5. Each node takes 64 bits (36 bits for the address and 28 bits for the parent index), the hash table's space utilization is above 60%, and the average storage per stack is only 66.7 bytes, which is about 42% of the uncompressed size.
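These figures can be checked with a short calculation. It assumes exactly 5 nodes per stack and exactly 60% table utilization, which the text gives only as bounds, so the results are approximate:

$$
\begin{aligned}
\text{uncompressed size per stack} &= \frac{35 \times 36\ \text{bits}}{8\ \text{bits/byte}} = 157.5\ \text{bytes} \\
\text{compressed size per stack} &\approx \frac{5 \times 8\ \text{bytes}}{0.6} \approx 66.7\ \text{bytes} \\
\text{ratio} &\approx \frac{66.7}{157.5} \approx 0.42
\end{aligned}
$$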

Performance Data

After the above optimizations, the CPU usage of the memory monitoring tool on an iPhone 6 Plus is less than 13%. The figure depends on data volume, so it may be slightly higher for heavy users (for example, those with many groups and frequent messages). The stored data occupies about 20 MB of memory, and the files are mapped into memory with mmap, whose benefits are widely documented.

Data Reporting

Since memory monitoring stores the allocation information of all currently live objects, the amount of data at the time a FOOM occurs is huge. It cannot be reported in full, so it is reported selectively according to certain rules.

First, all objects are classified by Category, and the number of objects and the allocated memory size are counted for each Category. This list is small and can be reported in full. Then all identical stacks under each Category are merged, and the number of objects and memory size are calculated for each stack. Stacks are reported only for certain categories, such as the TOP N categories by allocation size or UI-related ones (UIViewController, UIView, etc.), and for each of these only the TOP M stacks by allocation size are reported.
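The per-category summary can be sketched as below. The record layouts and function names are assumptions, and the linear search is chosen for brevity rather than speed; merging identical stacks under one Category would follow the same pattern with the stack's entry index as the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one live object. */
typedef struct { const char *category; size_t size; } live_object;

/* Hypothetical per-category summary that is small enough to report in full. */
typedef struct { const char *category; size_t count; size_t bytes; } category_stat;

/* Groups objects by category; returns the number of distinct categories. */
static size_t summarize(const live_object *objs, size_t n,
                        category_stat *stats, size_t max_stats) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < used && strcmp(stats[j].category, objs[i].category) != 0)
            j++;
        if (j == used) { /* first object of this category */
            assert(used < max_stats);
            stats[used].category = objs[i].category;
            stats[used].count = 0;
            stats[used].bytes = 0;
            used++;
        }
        stats[j].count++;
        stats[j].bytes += objs[i].size;
    }
    return used;
}

/* qsort comparator: larger total size first. */
static int by_bytes_desc(const void *a, const void *b) {
    const category_stat *x = a, *y = b;
    return (x->bytes < y->bytes) - (x->bytes > y->bytes);
}

/* After sorting, the first N entries are the TOP N categories by size. */
static void sort_by_bytes(category_stat *stats, size_t n) {
    qsort(stats, n, sizeof *stats, by_bytes_desc);
}
```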

Page Display

The page display is modeled on Allocations. It shows the categories, the allocated size and number of objects in each category, and, for some categories, the allocation stacks.

To highlight problems and make troubleshooting more efficient, the backend first identifies the categories that may have caused the FOOM according to a set of rules (such as the Suspect Categories above). The rules are:

● Whether the number of UIViewController is abnormal

● Whether the number of UIViews is abnormal

● Whether the number of UIImages is abnormal

● Whether the allocation size of other categories is abnormal, whether the number of objects is abnormal

Then a feature value, which represents the reason for the OOM, is calculated for each suspicious Category. The feature value is composed of "Caller1", "Caller2", and "Category, Reason". Caller1 is the point where the memory was requested, and Caller2 is the specific scenario or business. Both are extracted from the stack with the largest allocation size under the Category. Caller1 is chosen to be as meaningful as possible, rather than simply the address immediately preceding the allocation function. For example:

After the feature values of all reports have been calculated, the reports can be classified. The first-level classification can be by Caller1 or by Category, and the second-level classification aggregates the features related to that Caller1/Category. The effect is as follows:

First-level Classification

Second-level Classification

Operation Strategy

As mentioned above, memory monitoring incurs a certain performance cost. In addition, each report is about 300K in size, so full reporting would put pressure on the backend. Monitoring is therefore enabled by sampling for production users and is 100% enabled for gray-release (beta) build users, internal users, and whitelisted users. Only the three most recent records are kept locally.

Reduce Misjudgments

Let's first review how Facebook determines whether FOOM occurred during the last startup:

1. The app was not upgraded

2. The app did not call exit() or abort() to exit

3. The app did not crash

4. The user did not force quit the app

5. The system was not upgraded or rebooted

6. The app was not running in the background at the time

7. If none of the above applies, the exit is counted as a FOOM

Items 1, 2, 4, and 5 are relatively easy to judge; item 3 depends on the crash callback of our own CrashReport component; items 6 and 7 depend on ApplicationState and foreground/background switching notifications. Since WeChat began reporting FOOM data, there have been many misjudgments. The main cases are as follows:

ApplicationState is not accurate

Some system versions briefly wake up the app in the background with an ApplicationState of Active, even though the launch is not a BackgroundFetch. The app exits after executing didFinishLaunchingWithOptions; it may also receive a BecomeActive notification and still exit soon afterwards. The entire startup process lasts 5 to 8 seconds. The workaround is to treat a launch as a normal foreground launch only once one second has passed after the BecomeActive notification was received. This reduces the probability of misjudgment but cannot eliminate it.

Group control plug-in

This type of plug-in is software that can remotely control iPhones. Usually, one computer controls multiple phones, and the computer screen and the phone screens are synchronized in real time. It can open WeChat, automatically add friends, post to Moments, and force-quit WeChat, a process that easily produces misjudgments. Such misjudgments can only be reduced through crackdowns by the security backend.

The CrashReport component crashes without a callback to the upper layer

At the end of May 2017, WeChat experienced a large number of GIF crashes caused by out-of-bounds memory access. When the crash signal was received and the crash log was being written, the component could not write the log properly because the memory pool was corrupted, which even caused a secondary crash. The upper layer did not receive the crash notification either, so the exit was misjudged as a FOOM. The logic no longer relies on the crash callback: as long as a crash log from the last run exists locally (whether or not it is complete), the app restart is considered to have been caused by a crash.

A foreground hang causes the system watchdog to kill the app

This is the common 0x8badf00d, which is usually caused by too many foreground threads, a deadlock, or persistently high CPU usage. The app cannot capture this type of forced kill. To handle it, we use the existing lag detection system: if a lag was captured at the last moment the app was running in the foreground, we consider that session to have been killed by the watchdog. We also split a new restart reason out of FOOM, called "app foreground hang leading to restart", and treat it as a key concern.

Achievements

Since WeChat launched its online memory monitoring in March 2017, it has solved more than 30 memory problems, large and small, involving chat, search, Moments, and other businesses. The FOOM rate has dropped from 3% in early 2017 to 0.67% at present, while the foreground hang rate has dropped from 0.6% to 0.3%. The improvement is clear.

Common Problems

UIGraphicsEndImageContext

UIGraphicsBeginImageContext and UIGraphicsEndImageContext must be called in pairs; otherwise, the context will leak. Xcode's Analyze can also detect this kind of problem.

UIWebView

Whether it is opening a web page or executing a simple piece of JavaScript, UIWebView occupies a lot of memory in the app. WKWebView not only has excellent rendering performance but also runs in its own process, so part of the web-related memory consumption moves out of the app's process. This makes it the most suitable replacement for UIWebView.

autoreleasepool

Autoreleased objects are usually released at the end of the current run loop iteration. If a large number of autoreleased objects are generated inside a loop, the memory peak will soar, and an OOM may even occur. Adding an autoreleasepool in the right places releases memory in time and lowers the peak.

Retain cycles

Retain cycles (mutual references) most often occur when self is used in a block and self also holds the block. This can only be avoided through coding conventions. In addition, NSTimer holds a strong reference to its target, and CAAnimation holds a strong reference to its delegate. WeChat currently uses its own MMNoRetainTimer and MMDelegateCenter implementations to avoid such problems.

Large image processing

For example, in the past, the image zoom interface was written like this:

However, OOM is likely to occur when this code handles high-resolution images. The reason is that when -[UIImage drawInRect:] draws, the image is first decoded and a bitmap at the original resolution is generated, which consumes a lot of memory. The solution is to use the lower-level ImageIO interface to avoid generating the intermediate bitmap:

Large View

A large view is a View whose size is very large and that itself contains the content to be rendered. Extremely long text messages are common in WeChat group chats, often running to thousands or even tens of thousands of lines. Drawing such text into a single View consumes a lot of memory and causes serious lag. A better approach is to split the text across multiple Views and use the reuse mechanism of TableView to reduce unnecessary rendering and memory usage.

 
