Enhancing App Performance: A Comprehensive Guide to Java Memory Optimization

In this article, we dive into the world of Java memory optimization, exploring various strategies and techniques to identify and address performance bottlenecks, reduce memory usage, and ultimately improve the overall performance of your app.

Introduction

The article "I Reduced 26.5M Java Memory!" covered the first phase of memory optimization. Its main tasks were to build several tools for analyzing memory issues, locate the various kinds of memory usage in the process, and analyze why creating threads caused OOM (Out of Memory). Most importantly, it reduced the memory usage of the process in its idle state by more than 26M. The second phase builds on the first: it pushes for fixes to the issues discovered in SDKs and, most importantly, optimizes the dynamic Java memory usage of the process.

Generally speaking, no matter what kind of performance optimization is being done, it cannot escape the three-step process of performance optimization:

1. Identify performance bottlenecks

2. Analyze optimization solutions

3. Implementation & optimization

It may seem that the third step matters most for the result, but in the author's experience across several optimization efforts, identifying bottlenecks has the greatest influence:

● Whether or not bottlenecks can be found determines whether optimization can proceed.

● The worse the performance of the identified bottlenecks, the more apparent the optimization effect.

● The more bottlenecks found, the better the optimization effect.

How to find bottlenecks?

In terms of analysis methods, mainly:

● Analyze code logic, check problematic logic, and optimize it accordingly.

● Simulate user operations, dump memory when memory usage is high, and analyze with MAT (Memory Analyzer Tool)

● For the heap dump itself, the main analysis methods are:

1. Look at the Dominator Tree to find the instances that retain the most memory

2. Follow the path to GC roots to trace where the memory usage comes from

3. Quantify memory usage through the Retained Heap size

Dynamic memory optimization is more challenging than static optimization, and the difficulty lies in the word "dynamic." The dynamic nature makes bottlenecks harder to find and makes optimization results harder to compare. Different environments, operation paths, devices, and usage habits can all lead to different memory usage. The bottlenecks found in the lab may not match what real users actually do, in which case the OOM issues seen in production (the "external network") remain unresolved. Therefore, obtaining real data from users' phones is the most effective method.

So, another approach is taken to collect real user data.

● Dump memory when an OOM occurs on the phone and upload it to the backend for subsequent analysis

Measure 1 (active analysis): Optimize the existing code logic, targeting scenarios with excessive or unreasonable memory usage. This is the primary measure.

Measure 2 (OOM dumps from real users): Analyze the OOM scenarios that occur under the usage habits of production users. This makes it relatively easy to find bugs that cause memory usage to spike in an instant.
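The "dump memory when an OOM occurs" step can be sketched as an uncaught-exception handler. This is a minimal sketch, not the app's actual code: `HeapDumper` and `OomDumpHandler` are hypothetical names, and on Android the dumper would presumably wrap `android.os.Debug.dumpHprofData(path)`. Uploading the file to the backend is left out.

```java
import java.io.File;

/** Hypothetical callback that writes a heap dump to the given file. */
interface HeapDumper {
    void dump(File target) throws Exception;
}

/** Dumps the heap once when an uncaught OutOfMemoryError is seen, then delegates. */
final class OomDumpHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous; // may be null
    private final HeapDumper dumper;
    private final File target;
    private boolean dumped;

    OomDumpHandler(Thread.UncaughtExceptionHandler previous, HeapDumper dumper, File target) {
        this.previous = previous;
        this.dumper = dumper;
        this.target = target;
    }

    /** Walks the cause chain, because an OOM is often wrapped in another exception. */
    static boolean isOom(Throwable t) {
        // The depth limit guards against cyclic cause chains.
        for (int depth = 0; t != null && depth < 16; t = t.getCause(), depth++) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    @Override
    public synchronized void uncaughtException(Thread thread, Throwable error) {
        if (!dumped && isOom(error)) {
            dumped = true;
            try {
                dumper.dump(target);
            } catch (Throwable ignored) {
                // Best effort only: the process is already short on memory.
            }
        }
        if (previous != null) {
            previous.uncaughtException(thread, error);
        }
    }
}
```

It would be installed once at startup with `Thread.setDefaultUncaughtExceptionHandler(new OomDumpHandler(Thread.getDefaultUncaughtExceptionHandler(), dumper, file))`, so the original crash handling still runs after the dump.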

What bottlenecks are found?

There are many bottlenecks found. Let's sort them out according to the classification:

1. Data loaded into memory but not used (or not used yet)

1) PullToRefreshListView's Loading and Empty View: lazyLoad. This is a pull-to-refresh component with a frame animation that contains many images and occupies a lot of memory.

2) Minibar PlayListView. Every page has a Minibar, but not every Minibar will open the playlist.

3) AsyncImageView's default and failure images are loaded directly into memory as Drawables.

2. UI-related data not released in time

1) 24-hour live broadcast room data, only useful when switching programs

2) Bullet screen, only useful when displaying bullet screen in the playback page

3) Playback page TransitionBackgroundManager large image memory usage issue. This is a large image used for gradient animation.

3. Unreasonable data structure, excessive memory usage

1) Playback history records up to 600 program information, with each ShowInfo occupying up to 22K of memory (viewed through MAT RetainHeap)

2) Download management stores user-downloaded program information, lyrics, and album information in memory, occupying 12K, 0-10K, and 12K, respectively. There is no quantity limit here.

4. Excessive memory usage by images

1) Operating on the main page of the app, it is found that images (Bitmaps) occupy a lot of memory

2) Gaussian blur images.

5. Bugs causing excessive memory usage

Due to a code logic bug in the playback history, the record quantity limit is not controlled. As a result, the more programs the user listens to, the larger the memory usage. The main issue here was discovered through OOM reporting, with the highest memory usage reported at 50M just for playback history records.

Points 1-4 were found through Measure 1, actively checking memory. The 5th point was "accidentally" discovered while analyzing OOM reports. If it were through Measure 1, it would be almost impossible to know that so many OOMs were caused by this issue.

How to optimize bottlenecks?

After identifying the problems, the remaining tasks are relatively easy to do. Just follow the clues and tackle them one by one!

1. LazyLoad

For 1.1 and 1.2 mentioned above, LazyLoad can be implemented, creating related instances only when pull-to-refresh or displaying the playlist is needed.

For 1.4, the related Bitmaps can be cleared after the animation ends.

1.3 is a bit more complicated. The image loading component can provide a default image, temporarily displayed during the image loading process, and a failed image, displayed after the image loading fails. Both images in AsyncImageView are directly referenced as Drawables. In fact, most scenarios will display successful images. Therefore, the modification here is:
AsyncImageView's default/failed images no longer reference drawables but resource IDs. The ImageLoader loads them into memory when needed, and these images are managed by ImageCache, occupying memory LRU space (previously managed by Resource).

This removes several large images from memory, a saving on the order of several MB.
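The lazy-loading idea in this section can be sketched with a small holder that creates its value on first use and lets the owner drop it again. `LazyRef` is a hypothetical helper for illustration, not a class from the app, and it assumes single-threaded (UI thread) access.

```java
import java.util.function.Supplier;

/** Creates its value on first use and lets the owner release it again. Not thread-safe. */
final class LazyRef<T> {
    private final Supplier<T> factory;
    private T value;

    LazyRef(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the value, creating it if it is not currently held. */
    T get() {
        if (value == null) {
            value = factory.get();
        }
        return value;
    }

    boolean isLoaded() {
        return value != null;
    }

    /** Drops the reference so the value can be garbage collected. */
    void release() {
        value = null;
    }
}
```

A component such as the playlist view or the refresh animation frames would be wrapped in a `LazyRef`, fetched with `get()` only when the user actually opens it, and dropped with `release()` when it is no longer shown.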

2. Timely release

In 2.1 above, the 24-hour live broadcast room data is always in memory, even if the user is not currently listening to the 24-hour live broadcast room. This is clearly unreasonable.

The modification is to cache the business data in the DB and query it from the DB when needed.

2.2's bullet screen is purely UI-related data, which can be released after exiting the playback page.

2.3 is a large image prepared for animation to create a cool animation effect. In fact, it can be released after the animation ends. The image's memory usage is related to the phone's resolution (strictly speaking, density). The higher the resolution, the larger the image size. On mainstream phones with 1080p, it's about 1M.

Here, 287K + 512K + 1M are reduced.

3. Optimize data structure

Both 3.1 and 3.2 store program information, and the related jce structures are quite large. Through MAT, it can be seen that Show: 12K, Album: 10K, and one ShowInfo contains both data structures.

The most reasonable approach should be:

1. Store data in the DB

2. When data is needed, query the specific data from the DB with a single query.

However, since the existing code queries from memory and the interface is synchronous, changing everything to asynchronous would be costly, and we have limited time and testing freedom.

Considering the results of the MAT analysis, one idea is:

Store the minimal program information (ShowMeta) in memory, such as program name, program ID, album ID, etc. Store the actual Show and Album structures in the DB.

This way, the data in memory can be as small as possible, and most existing interfaces can still maintain synchronous calls.
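A minimal sketch of this split, under assumptions: the field choice in `ShowMeta` follows the examples named above, while `ShowDetailStore` and `ShowIndex` are hypothetical names, with the store standing in for the DB that holds the full Show and Album structures.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal per-program data kept in memory (field choice is an assumption). */
final class ShowMeta {
    final String showId;
    final String albumId;
    final String name;

    ShowMeta(String showId, String albumId, String name) {
        this.showId = showId;
        this.albumId = albumId;
        this.name = name;
    }
}

/** Hypothetical stand-in for the DB lookup of the full record. */
interface ShowDetailStore {
    String loadDetail(String showId);
}

/** Keeps only ShowMeta in memory; heavy data is fetched on demand. */
final class ShowIndex {
    private final Map<String, ShowMeta> metas = new HashMap<>();
    private final ShowDetailStore store;

    ShowIndex(ShowDetailStore store) {
        this.store = store;
    }

    void add(ShowMeta meta) {
        metas.put(meta.showId, meta);
    }

    /** Synchronous and cheap: served from memory, so existing callers need not change. */
    ShowMeta meta(String showId) {
        return metas.get(showId);
    }

    /** Goes to the store only for programs that are actually indexed. */
    String detail(String showId) {
        return metas.containsKey(showId) ? store.loadDetail(showId) : null;
    }
}
```

Only the callers that need the full record pay for the store lookup; list screens and lookups by ID keep working against the in-memory metadata.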

Moreover, from the user's perspective, assuming a heavy user downloads 1000 programs, the memory occupied by each ShowMeta will be magnified by 1000 times. So, optimizing ShowMeta to the extreme is not excessive.

Two things were done here:

1. Delete fields: Remove unnecessary fields from ShowMeta.
For example, the URL field is only used to generate a file name through a hash. We can replace it with the showId. A URL can be up to 500 bytes long, so for 1000 ShowMetas, this can save 500K of memory!

Another example: the downloadTaskId field stores the download task ID. Once the program has been downloaded, the field no longer has any meaning, so it can be deleted.

2. Intern: This is inspired by String.intern. Different ShowMetas may have the same fields or parts of the same fields.

For example, ShowMetas in the same album will have the same albumId field. We only need to keep one albumId, and other ShowMetas can use the same instance (Memory optimization phase 1 made the same transformation for ShowList).

Another example: ShowMeta stores the full path of the downloaded file. In fact, all programs are stored in the same file directory. Here, the file path is split into directory + file name for storage, and the path is interned to ensure that there is only one copy in memory.
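The interning idea can be sketched as a small pool that hands back one canonical instance per distinct value. `Interner` and `DownloadedFile` are hypothetical names, and the paths used with them below are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Returns one canonical instance per distinct value. Not thread-safe. */
final class Interner<T> {
    private final Map<T, T> pool = new HashMap<>();

    T intern(T value) {
        if (value == null) {
            return null;
        }
        T existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int size() {
        return pool.size();
    }
}

/** Stores a path as a shared directory instance plus a per-file name. */
final class DownloadedFile {
    final String directory; // shared with every file in the same directory
    final String fileName;

    DownloadedFile(String fullPath, Interner<String> directories) {
        int slash = fullPath.lastIndexOf('/');
        // substring() creates a new String each time, so swap it for the pooled one.
        this.directory = directories.intern(fullPath.substring(0, slash + 1));
        this.fileName = fullPath.substring(slash + 1);
    }

    String fullPath() {
        return directory + fileName;
    }
}
```

Two files such as `/sdcard/fm/shows/1001.mp3` and `/sdcard/fm/shows/1002.mp3` then share a single directory String. The pool holds strong references, so this fits values with few distinct instances (directories, album IDs) rather than unbounded ones.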

Before optimization:

Optimized:

The most intuitive change is that memory usage drops from 14272B to 120B. A closer look reveals that the retained heap of ShowRecordMeta is not equal to the sum of the memory usage of its fields. This is because of the interning mentioned earlier: fields shared with other records are not counted, so the retained heap shown here is not accurate. Calculating through RecordDataManager / countof(records) instead gives an average of 14800 / 60 = 247B per record, a reduction of about 98%.

The modification results here:

Playback history ShowHistoryBiz -> ShowHistoryMeta memory usage reduced from 19k to approximately 216B

Download record ShowRecordBiz -> ShowRecordMeta memory usage reduced from 14k to approximately 100B

Roughly estimating, the modified playback history (each playback adds a record, up to 600 records) can reduce the memory usage by (19256-216) * 600 = 10.9M

And for download records (assuming a light user downloads 100 programs), the total memory reduction is:
(14727-100) * 100 = 1.4M

For heavy users who download 1000 programs, it can be as much as 14M!

It must be said that this is a significant number!
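The estimates above follow one formula: (bytes per record before - bytes per record after) * number of records. A small sketch, assuming "M" means 1024 * 1024 bytes (`MemorySavings` is a hypothetical helper):

```java
/** Estimates memory saved when every record shrinks from one size to another. */
final class MemorySavings {
    static final double MIB = 1024.0 * 1024.0;

    static double savedMiB(long beforeBytes, long afterBytes, long records) {
        return (beforeBytes - afterBytes) * records / MIB;
    }
}
```

With the figures from the text, `savedMiB(19256, 216, 600)` is about 10.9, `savedMiB(14727, 100, 100)` is about 1.4, and `savedMiB(14727, 100, 1000)` is about 13.9.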

Image memory

Starting with Android 3.0 (that is, after Android 2.3), Bitmap changed its implementation, and pixel memory was moved from the native heap to the Java heap. This led to a significant increase in Java heap usage. (In Android 8.0, however, it was moved back to the native heap. The official documentation does not mention the specific reason, which needs further investigation.)

Usually, when analyzing heap dumps, we find that Bitmap memory usage is the most significant part. This memory optimization is no exception.

The idea here is to analyze whether memory usage is reasonable:

1. Are all images used for interface display?

2. Are the image sizes too large?

First, analyze whether memory usage is reasonable. After the first phase of optimization, there are almost no images in memory when MainActivity is not opened. However, after opening MainActivity, dozens of megabytes of image memory will appear in memory.
The image memory is mainly used for display, that is, the part held by AsyncImageView.

Additionally, the image memory cache holds a maximum of 1/8 of the JavaHeap memory as Bitmap cache, using the LRU algorithm to eliminate old data.

Of course, some images are larger than necessary because of improper use, and they can be scaled down to the actual size of the View.

Some full-screen images (those as wide as the screen, mainly banners) could also be scaled down a little (e.g., to 3/4 of each dimension). Since (3/4)^2 = 9/16, this reduces memory usage by about 44% with no particularly noticeable visual difference. (This idea came up while writing this document, so it's marked as TODO).
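One common way to keep a decoded image close to the size of its View is to pick a power-of-two sample size before decoding. The sketch below is plain Java so it can run anywhere. On Android the result would be assigned to `BitmapFactory.Options.inSampleSize`. The class name `SampleSize` and the assumption of 4 bytes per pixel (ARGB_8888) are illustrative, and this is not necessarily how the app's image loader works.

```java
/** Picks a power-of-two sample size so the decoded image still covers the view. */
final class SampleSize {
    static int calculate(int srcW, int srcH, int reqW, int reqH) {
        if (reqW <= 0 || reqH <= 0) {
            throw new IllegalArgumentException("requested size must be positive");
        }
        int sample = 1;
        if (srcW > reqW || srcH > reqH) {
            int halfW = srcW / 2;
            int halfH = srcH / 2;
            // Keep doubling while the next step would still cover the requested size.
            while (halfW / sample >= reqW && halfH / sample >= reqH) {
                sample *= 2;
            }
        }
        return sample;
    }

    /** Approximate decoded size in bytes, assuming 4 bytes per pixel. */
    static long decodedBytes(int srcW, int srcH, int sample) {
        return 4L * (srcW / sample) * (srcH / sample);
    }
}
```

For a 1080x1080 source shown in a 270x270 view, `calculate` returns 4, and the decoded size drops from 4,665,600 bytes to 291,600 bytes.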

Question 1: Regarding the AsyncImageView issue, consider whether all images are displayed to users?
The answer is obviously no, as some images are held by views recycled by ListView. This memory usage is clearly unreasonable.

Question 2: Additionally, for multi-page views like ViewPager, only one page is actually displayed to users, and the other pages are not being displayed. So, can ViewPager be modified?

For the first question, to address the issue of ListView-recycled views still remaining in memory, modify AsyncImageView so that it actively releases Bitmap when the view is detached from the window and attempts to load the image again when attached to the window. Another issue is with multi-image scrolling views, where the images are large and therefore occupy a lot of memory. Due to historical reasons, Gallery was used previously, which had a bug that caused it to hold two additional large images (already invisible). Therefore, RecyclerView was used to modify its implementation, solving the aforementioned problem.
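The attach/detach behavior described for AsyncImageView can be modeled in plain Java as follows. This is a sketch under assumptions: `ReleasableImageHolder` is a hypothetical class, the `loader` function stands in for the image loader and its cache, and in a real Android View the two callbacks would be overrides of `onAttachedToWindow()` and `onDetachedFromWindow()`.

```java
import java.util.function.Function;

/** Model of a view that only holds its bitmap while attached to a window. */
final class ReleasableImageHolder<B> {
    private final Function<String, B> loader; // hypothetical: e.g. a cache-backed loader
    private String url;
    private B bitmap;
    private boolean attached;

    ReleasableImageHolder(Function<String, B> loader) {
        this.loader = loader;
    }

    void setUrl(String url) {
        this.url = url;
        bitmap = null;
        if (attached) {
            load();
        }
    }

    /** Reload on attach, because the bitmap was dropped on detach. */
    void onAttachedToWindow() {
        attached = true;
        if (bitmap == null) {
            load();
        }
    }

    /** Release on detach so recycled list items do not pin bitmaps. */
    void onDetachedFromWindow() {
        attached = false;
        bitmap = null;
    }

    B bitmap() {
        return bitmap;
    }

    private void load() {
        if (url != null) {
            bitmap = loader.apply(url);
        }
    }
}
```

The reload on attach is cheap when the image is still in the memory cache, which is why this pattern pairs well with the LRU cache mentioned earlier.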

For the second question, no effective measures have been taken yet, mainly relying on the Android system to actively reclaim the memory of the Activity. (There is doubt here, and further investigation into the system code is needed. After clarifying the logic, a conclusion can be drawn. The short-term conclusion is that the system's cleaning behavior is unreliable.) If changes are to be made, the memory of ViewPager can be modified simply to ensure that when other pages are not visible, their related Fragments are reclaimed. Leave a TODO for this.

LRU + TTL

For image caching, it only caches images and has an LRU algorithm to ensure that it does not exceed the maximum memory, so theoretically, memory usage is reasonable. However, the LRU algorithm has a problem: once the cache is full, subsequent additions of new Bitmaps can only eliminate old Bitmaps, and at this point, the memory occupied by the cache is still at its maximum. Therefore, the idea here is to use the LRU+TTL algorithm: on the basis of LRU, specify a valid duration for each Bitmap in the cache. After the duration has passed, actively remove it from the cache. This way, we can solve the problem of LRU cache's memory usage not being reducible.
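A minimal sketch of an LRU+TTL cache along these lines, built on an access-ordered `LinkedHashMap`. The class name `LruTtlCache`, the injected clock, and the choice to count the TTL from insertion (not from last access) are assumptions made for illustration, not details from the app.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.LongSupplier;

/** Size-bounded LRU cache whose entries also expire a fixed time after insertion. */
final class LruTtlCache<K, V> {
    private static final class Item<V> {
        final V value;
        final int size;
        final long expiresAt;

        Item(V value, int size, long expiresAt) {
            this.value = value;
            this.size = size;
            this.expiresAt = expiresAt;
        }
    }

    // accessOrder = true keeps the least recently used entry first.
    private final LinkedHashMap<K, Item<V>> map = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes;
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private long usedBytes;

    LruTtlCache(long maxBytes, long ttlMillis, LongSupplier clock) {
        this.maxBytes = maxBytes;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized void put(K key, V value, int sizeBytes) {
        Item<V> old = map.put(key, new Item<>(value, sizeBytes, clock.getAsLong() + ttlMillis));
        if (old != null) {
            usedBytes -= old.size;
        }
        usedBytes += sizeBytes;
        // LRU part: evict least recently used entries until the cache fits again.
        Iterator<Item<V>> it = map.values().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= it.next().size;
            it.remove();
        }
    }

    synchronized V get(K key) {
        Item<V> item = map.get(key);
        if (item == null) {
            return null;
        }
        if (item.expiresAt <= clock.getAsLong()) {
            map.remove(key);
            usedBytes -= item.size;
            return null;
        }
        return item.value;
    }

    /** TTL part: call periodically so idle entries are dropped even if nothing new arrives. */
    synchronized void evictExpired() {
        long now = clock.getAsLong();
        Iterator<Item<V>> it = map.values().iterator();
        while (it.hasNext()) {
            Item<V> item = it.next();
            if (item.expiresAt <= now) {
                usedBytes -= item.size;
                it.remove();
            }
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

To match the 1/8 limit mentioned earlier, `maxBytes` could be set to `Runtime.getRuntime().maxMemory() / 8`. Calling `evictExpired()` from a periodic task is what lets the cache shrink below its maximum when the user stops browsing images.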

Gaussian Blur

Here's an addition regarding the issue of excessive memory usage by Gaussian blur images, which has been optimized in previous versions.

Since Gaussian blur images themselves make the image blurry (obviously...), a large part of the image information is essentially lost. Based on this idea, we can first reduce the size of the image that needs Gaussian blur (e.g., 100x100) and then apply the Gaussian blur. This not only reduces memory usage but also significantly increases the speed of Gaussian blur processing!

For example, the previous playback page cover image was 720*720 in size, occupying memory at 720 * 720 * 4 = 2M. Reducing it to 100x100 occupies memory at 100 * 100 * 4 = 40K, showing a significant memory optimization effect, with almost no visual difference.
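The "shrink first, then blur" step can be sketched on a raw ARGB pixel array. This is a plain-Java illustration with a hypothetical `BlurPrep` class and a nearest-neighbor downscale. On Android the shrinking would more likely be done with `Bitmap.createScaledBitmap`, and the blur itself is left out.

```java
/** Shrinks an image before blurring so the blur works on far fewer pixels. */
final class BlurPrep {
    /** Nearest-neighbor downscale of ARGB pixels stored row by row. */
    static int[] downscale(int[] src, int srcW, int srcH, int dstW, int dstH) {
        if (src.length != srcW * srcH) {
            throw new IllegalArgumentException("pixel count does not match dimensions");
        }
        if (dstW <= 0 || dstH <= 0) {
            throw new IllegalArgumentException("target size must be positive");
        }
        int[] dst = new int[dstW * dstH];
        for (int y = 0; y < dstH; y++) {
            int sy = y * srcH / dstH; // source row sampled for this target row
            for (int x = 0; x < dstW; x++) {
                int sx = x * srcW / dstW; // source column sampled for this target column
                dst[y * dstW + x] = src[sy * srcW + sx];
            }
        }
        return dst;
    }

    /** Memory for a width x height image at 4 bytes per pixel. */
    static long argbBytes(int width, int height) {
        return 4L * width * height;
    }
}
```

`argbBytes(720, 720)` is 2,073,600 and `argbBytes(100, 100)` is 40,000, which matches the 2M versus 40K figures above. The blur then runs over 10,000 pixels instead of 518,400.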

Other optimizations

This mainly targets the top crash in production: the OOM caused by thread creation inside WNS.

The author's approach was to first dig into the system source code based on the crash report (see "Android Thread Creation Source Code and OOM Analysis"), clarify the thread creation logic, and determine that the crash was caused by uncontrolled thread creation. A detailed cause analysis was then prepared, a bug was reported to the WNS team, and the SDK was replaced once the bug was fixed.

Results comparison

The overall effect of memory optimization is good. Two phases of optimization have been carried out, covering dozens of items. Thanks go to the project team for a generous schedule, which allowed time for some deeper changes.

Idle memory

The first phase of optimization resulted in an idle memory optimization of 26.5M tested on Nexus6P@7.1.

The second phase further optimized (sections 3.2 and 3.3 in the text), and now the idle memory dump shows only 3M of memory, with part of it being the playlist and part being the small images held by the playback page.

Through calculations, it can be concluded that the idle memory has been further reduced by:
24-hour live broadcast room singleton: 287K
Bullet screen manager singleton: 512K
Playback page animation large image: 1M
Playback history 600 (upper limit): (19256-216) * 600 = 10.9M
Download record 100 programs downloaded: (14727-100) * 100 = 1.4M

Total reduction: 28M+

Dynamic memory

Dynamic memory is relatively difficult to compare. Here, we decided to use a black-box testing approach:
Open the app, operate each tab of MainActivity, open the playback page, and then compare memory usage. Since the author only has one Nexus6P development machine, in order to control variables, two emulators were created and placed side by side. They separately opened Penguin FM 4.0 and 3.9 versions, ensuring the same operation path.

Conclusion

Another point to note is that although dynamic memory and static memory have been reduced by 52M and 28M, respectively, there is some overlap between them.

The measurement standards for the two are slightly different, and their impact on the application is also different.

Dynamic memory optimization mainly improves the app's performance on low-memory devices and reduces the likelihood of OutOfMemory occurrences.

On the other hand, static memory optimization mainly reduces the memory usage of the app when it is in the background. This not only reduces the chances of the app process being killed by Android's LowMemoryKiller but also leaves more available memory for the user's device, resulting in a better user experience.
