
Fix Memory Issues in Your Java Apps

Chi Wang
Oct 01 - 11 min read

In my previous blog post, Troubleshoot Memory Issues in Your Java Apps, I talked about how to troubleshoot and find memory problems. In this post, I cover how to fix complex memory problems in a production environment.

In this blog post you’ll learn how to:

  1. Mitigate memory pressure for data-intensive applications
  2. Fix a native memory leak
  3. Handle troubleshooting complexity in production

Mitigate Memory Pressure

Normally, if the root cause of the memory issue is a coding mistake such as an unreleased resource, you can fix it right away. But in some complex cases, you may encounter out-of-memory exceptions due to a large number of objects being created in a short amount of time. Here, I’ll talk about how to address this kind of issue.

For data-intensive applications, objects may be created faster than the garbage collector can free up memory, leading to an out-of-memory error. The graphic below shows what the profiler graph looks like when this is happening.

The blue represents the new gen heap space, the green is the total allocated memory, and the red is the old gen heap space. We see that a lot of interim objects are created and pile up in the heap’s young gen space (the blue spike). The downward spike shows the garbage collection freeing up memory. If enough objects are created in the young gen space before garbage collection happens, the spike exceeds the memory limit and you see an out-of-memory error.

When you run into this situation, there’s no easy fix. There are two things you need to do: tune the JVM memory settings and refactor your code. Let’s discuss them one by one.

Tune the JVM

The official Oracle docs Tuning Java Virtual Machines (JVMs) and HotSpot Virtual Machine Garbage Collection Tuning Guide cover almost every detail of JVM tuning. These documents are a good place to start when tuning for memory optimization.

I recommend that you spend a few days tuning the JVM first before you start to refactor your code. Customizing the JVM is a low-cost way to start; you only need to set a few flags for the JVM, but if the tuning works, it stops the fire in your production app and buys you some time to plan for a proper code refactoring.

Here are the JVM variables I like to play with. Try changing them in your environment and verify the performance. But time-box your tuning, because code refactoring is what you want to spend your time on.

-XX:NewSize -XX:MaxNewSize -XX:SurvivorRatio -Xms -Xmx -XX:GCTimeRatio -XX:GCTimeLimit
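For example, a hedged starting point could look like the launch command below (the sizes and ratios are illustrative, and your-app.jar is a placeholder; verify every change against your own workload):

java -Xms4g -Xmx4g \
     -XX:NewSize=1g -XX:MaxNewSize=1g \
     -XX:SurvivorRatio=8 \
     -XX:GCTimeRatio=19 -XX:GCTimeLimit=98 \
     -jar your-app.jar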

Refactor Your Code

In my opinion, JVM tuning is only a temporary fire fighting approach. Code refactoring is the right approach to fix memory issues.

In Java, developers typically tend to create local (temporary) objects arbitrarily and trust the JVM garbage collector to recycle them effectively. This coding pattern might be fine for normal Java applications, but it could be an issue for large-scale, data-heavy applications such as data processing, data transferring, or streaming apps.

For example, this year I was troubleshooting an OOM issue for a dataset management application. This application was consuming almost 10 GB of memory to parse a 2 GB image dataset. When the application was parsing the dataset, it created a lot of interim objects for each image in the dataset. This caused the memory usage to balloon when the application processed three datasets in parallel.

Code refactoring suggestions

Here are some suggestions to refactor your code for mitigating memory pressure.

Always use streams with a fixed-size buffer and avoid copying entire arrays

The example below is a very bad practice: it tries to load the entire stream into a byte array in the heap. If the stream represents a 2 GB data file on disk, this operation costs 2 GB+ of heap memory. A better way is to read the data out of the stream in batches with a fixed-size buffer, so it always consumes the same amount of memory for reading and processing data files, no matter how large the files are.
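Here’s a minimal sketch of both patterns (the loadAll and copyInBatches methods are illustrative names, not from the original application):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Bad practice: materializes the whole stream in the heap.
    // A 2 GB file costs at least 2 GB of heap in one shot.
    static byte[] loadAll(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Better: copy with a fixed-size buffer; the memory cost stays
    // constant (8 KB here) no matter how large the file is.
    static void copyInBatches(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }
}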

Avoid inheriting a large abstract parent class

A cumbersome abstract root class introduces many unnecessary attributes for its child classes to carry. This wastes a lot of memory when many objects of these child classes are created.
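As a rough, hypothetical illustration (the class names are made up): every child instance carries a field slot for each attribute declared in the parent, so millions of child objects pay for parent state they may never use.

// Hypothetical heavyweight abstract parent.
abstract class AbstractDatasetItem {
    protected byte[] rawBytes;                        // payload copy
    protected java.util.Map<String, String> metadata; // rarely used by most children
    protected java.util.List<String> auditTrail;      // rarely used by most children
}

// Every ImageItem instance still carries the parent's field slots,
// which adds up when millions of items are created during parsing.
class ImageItem extends AbstractDatasetItem {
    int width;
    int height;
}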

Limit memory consumption for serving each customer request

Ideally, I would like the application to consume a similar amount of memory/CPU for serving the same type of customer request. For example, whether the user chooses to upload a 2 GB or a 20 GB dataset, the memory cost should be the same. This way, it’s easy to estimate how many resources you need to scale up your application.

Using image dataset parsing as a specific example, instead of creating a new ImageExample object for every image in the dataset, we can pre-allocate a couple of fixed-size image buffers and use them for parsing the entire dataset. This way, the total memory cost doesn’t depend on the size of the dataset (number of images), but on the number of pre-allocated buffers.
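Here’s a minimal sketch of that idea, assuming a simple pool of byte buffers (ImageParserPool, the buffer size, and the parsing details are hypothetical):

import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ImageParserPool {
    private final BlockingQueue<byte[]> pool;

    ImageParserPool(int poolSize, int bufferBytes) {
        pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.offer(new byte[bufferBytes]); // pre-allocate once
        }
    }

    void parse(Iterable<InputStream> images) throws Exception {
        for (InputStream image : images) {
            byte[] buffer = pool.take();       // borrow a pre-allocated buffer
            try {
                int read = image.read(buffer); // read up to buffer.length bytes into the buffer
                // ...parse the first 'read' bytes and keep only the attributes you need...
            } finally {
                pool.offer(buffer);            // return the buffer for the next image
            }
        }
    }
}

The total memory cost is roughly poolSize * bufferBytes, regardless of how many images the dataset contains.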

Another tip is to avoid functions such as toList() and toArray(), which read the entire stream into memory all at once. The standard pattern is to read content from the stream in fixed-size batches.
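For example, with the Java Stream API (process is a hypothetical handler, and dataset.csv is a placeholder file):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Anti-pattern: lines.toList() would pull every line into memory at once.
// Better: iterate the stream lazily so only one line is resident at a time.
try (Stream<String> lines = Files.lines(Path.of("dataset.csv"))) {
    lines.forEach(line -> process(line));
}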

Utilize disk space and only keep necessary information in memory

Using the dataset parsing as an example, I temporarily save all the dataset files (images, labels, schema) to disk and keep only the essential attributes in memory, such as `local directory of dataset files`, `exampleCnt`, `labels`, etc.
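As a rough sketch (the class itself is hypothetical; the fields follow the attributes mentioned above):

import java.nio.file.Path;
import java.util.List;

// Only lightweight metadata stays on the heap; the heavy dataset
// files (images, labels, schema) live on local disk until needed.
class DatasetMetadata {
    Path localDirectory; // local directory of dataset files
    long exampleCnt;     // number of examples in the dataset
    List<String> labels; // label names
}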

Test your dependencies’ memory performance

Be aware that your application’s dependencies (especially I/O libraries) can also leak memory. In one of my past projects, I found that a third-party library (Amazon-S3-FileSystem-NIO2) had a serious memory leak. In my case, the library’s fileDAO.create API consumed 300-400 MB of memory to upload a 26 KB image. If a dependency library has such an issue, there isn’t much you can do. In the end, we implemented our own NIO library to solve the problem.

I want to call out that performance refactoring is a prolonged effort, so be sure to adjust your expectations. There’s no silver bullet that makes your application perform perfectly with one change. Only iterations of different attempts can get you to the destination.

Detect and Solve a Native Memory Leak

If you haven’t heard of a Java native memory leak before, I recommend reading a few introductory articles on the topic first.

Although a Java application lives within its own JVM kingdom, it can also allocate resources in native space through the Java Native Interface (JNI). If you forget to clean up these resources, a native memory leak occurs, but the JVM still appears normal because these large resource leaks don’t happen in the heap. Therefore, none of the tools we’ve talked about so far can detect this type of memory leak, which is why a native memory leak in Java is so tricky.

Detect a Native Memory Leak

To detect a native memory leak, monitor both physical memory and heap memory: if the heap size is stable but the process’s physical memory consumption keeps growing, you’ve found a native memory leak. Native Memory Tracking (NMT) is very helpful for monitoring what the JVM allocates in native space.
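For example, you can enable NMT when the JVM starts and then query it with jcmd (your-app.jar and <pid> are placeholders):

java -XX:NativeMemoryTracking=summary -jar your-app.jar
jcmd <pid> VM.native_memory summary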

If your application runs in Kubernetes and it keeps getting OOMKilled by the Kubernetes controller, but no JVM dump file has been generated, that’s also a sign of a native memory leak.

Another useful tip is to set the flag -XX:MaxDirectMemorySize to a small number like 500 MB or 1 GB. This helps you catch leaks on direct byte buffers; adding the -Djdk.nio.maxCachedBufferSize flag can fix this type of issue.
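For example (the values here are illustrative, and your-app.jar is a placeholder):

java -XX:MaxDirectMemorySize=512m -Djdk.nio.maxCachedBufferSize=262144 -jar your-app.jar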

Troubleshoot the Root Cause of a Native Memory Leak

The general idea of native memory troubleshooting is quite straightforward. You replace the default native memory allocation function (malloc) with another allocator, jemalloc, at the operating system level. Compared to the standard allocator, jemalloc provides introspection, memory management, and tuning features.

Once you set up jemalloc, you start the application and run a test scenario, and jemalloc generates dump files according to your settings. When the test finishes, you can visualize the dump file (named something like jeprof.19678.0.f.heap) to get a memory allocation graph. After you create the graph, it’s usually very easy to tell where the root cause is (see one sample jemalloc report below).

Since everything is containerized now, here’s how to enable jemalloc in an Ubuntu container.

RUN apt update
RUN apt install -y bzip2
RUN apt install -y build-essential
RUN apt install -y autoconf
RUN apt install -y wget
RUN apt install -y graphviz
RUN apt install -y ghostscript
RUN apt install -y unzip

# Download jemalloc source code and build its library
RUN mkdir -p /export/app/jemalloc && cd /export/app/jemalloc \
&& wget https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2 \
&& tar -xvf jemalloc-5.2.1.tar.bz2 && cd jemalloc-5.2.1 \
&& ./configure --enable-prof --enable-stats --enable-debug --enable-fill \
&& make && make install

ENV LD_PRELOAD "/usr/local/lib/libjemalloc.so"
ENV MALLOC_CONF "prof:true,lg_prof_interval:31,lg_prof_sample:17,prof_prefix:/export/app/<your_ap>/logs/jeprof"

Here are the commands to visualize the memory allocation graph.

jeprof --show_bytes `which java` {jemalloc dump, ex:jeprof.19678.0.f.heap} > ps_out.pdf
jeprof --show_bytes --pdf `which java` {jemalloc dump, ex:jeprof.19678.0.f.heap} > ps_out.pdf

Native Memory Leak When Serving a TensorFlow Model in Java

Let’s look at a specific native memory leak example: TensorFlow model serving. The Google TensorFlow team released a very convenient Java JNI library to support hosting TensorFlow models in a Java process. The JNI library helps you load a model and build a compute graph in native space, and you can serve a prediction request with just a few lines of Java code, as shown in the following example.

// Load model in memory (native space)
SavedModelBundle model = SavedModelBundle.load(modelFile.getPath(), "serve");
// Get session of the model
Session session = model.getTFSession();
Runner runner = session.runner();
// Set input
inputTensors.forEach(runner::feed);
// Run the model with input and return result
outputs = runner.run();

There are several places where a severe memory leak could occur in this code example.

Close objects to prevent resource leaks

The inputTensor and outputTensor objects must be closed to prevent resource leaks. Note that there are generally multiple output and input tensors.

inputTensors.forEach((name, tensor) -> tensor.close());
if (Objects.nonNull(outputs)) {
outputs.forEach(Tensor::close);
}

Model loading isn’t reflected in the JVM memory usage

Be aware that model loading consumes a great amount of native memory that isn’t reflected in the JVM memory usage. As seen in the graph below, when loading models, two large objects (Graph and Session) are created in the native space, and only the handles (pointers) are sent back to the JVM.

Since the JVM has only the pointers, it’s blind to these native memory allocations. It’s very easy to run into memory issues if you’re hosting multiple models in the cache and underestimate the actual memory usage of loading a model. In my experience, loading an 80–840 KB language model can cause memory usage of 200–300 MB.

Troubleshoot in Production

Troubleshooting a memory issue in production is a lot different than troubleshooting on your local machine. You can’t take a memory dump (snapshot) from the service in production, since doing so could degrade performance and may crash the service. Also, because there’s real-time customer traffic, there are many troubleshooting obstacles, like restricted access for security reasons, slow and strict deployment processes, and the potential risk of impacting customers. Imagine how difficult it is to debug an OOM issue for a web service in an access-restricted production environment with 40,000 queries per second (QPS) of traffic.

Here, I want to share some tips on handling production troubleshooting. The key idea is not to work on production directly but to spend time identifying the possible conditions that cause the memory leak, replicating them and troubleshooting in your local test environment. When you work in a local test environment, it’s easy to change things. You can iterate and experiment quickly.

Here are the recommended methods for analyzing production issues.

  1. Build comprehensive measurement metrics (traffic, latency, service internal details, and resource consumption, especially the JVM internal details, ex: metrics-jvm) and enable garbage collection logs.
  2. Examine the metrics around each minor/major garbage collection and every memory usage surge. Find out suspicious user scenarios by associating the memory metrics changes with traffic patterns and service logic.
  3. Set up a local test environment. For example, deploy the service to your own Kubernetes namespace and replicate the production settings.
  4. Build perf tests (ex: Apache JMeter) to replicate or amplify the suspicious user scenarios and reproduce the memory issue in test environment.
  5. Create a troubleshooting/test plan and an experimentation log. Use the test plan to list the major work items for evaluating different theories, and use the experimentation log to track the results of each experiment. Memory troubleshooting and refactoring is usually a long-term effort. The experimentation log is extremely useful for keeping a clear mind under pressure and making steady progress when the investigation lasts more than a few weeks.
  6. Follow the test plan, investigate dump files (with the knowledge from this article), make code refactoring changes one at a time, and evaluate the results. Keep doing this in iterations until you find the root cause or mitigate the memory pressure.
  7. Summarize your findings and present them to your team so others can learn and avoid repeating the issue in the future.

A common mistake that many less experienced developers make is to develop a fix and verify it directly in production. Setting up a test environment and reproducing a memory leak is hard, so some developers think it’s not necessary.

To me, it’s totally worth spending 3–5 days to set up a test environment and reproduce the OOM conditions, because fast iteration is the key to solving memory issues. You can experiment with your ideas quickly (in less than a few hours) and be very confident in your findings and fixes. Because you have full control of the test environment, you can do almost anything to verify the results without worrying about breaking anything.

Conclusion

I hope these two posts, Troubleshoot Memory Issues in Your Java Apps and Fix Memory Issues in Your Java Apps, help you explore a bit deeper into the world of Java memory troubleshooting and feel less intimidated when taking on the challenge of a memory issue in an enterprise production application. Working on memory problems is actually quite fun; I learned more about Java from troubleshooting than reading books. I’m sure you’ll enjoy it too, since you just finished reading this long blog post!

I want to thank Dianne Siebold, our tech writer who wrote most of the public developer docs for Einstein Vision and Language. I wouldn’t have finished this blog without her help and guidance!
