Improving CPU Performance in Java

Knowing that you have CPU-related issues in your app is one thing — doing something about it is the next challenge. In some respects, tuning an Android application is a “one-off” job, tied to the particulars of the application and what it is trying to accomplish. That being said, this chapter will outline some general-purpose ways of boosting performance that may counter issues that you are running into.

Prerequisites

Understanding this chapter requires that you have read the core chapters and understand how Android apps are set up and operate. Reading the introductory chapter to this trail is also a good idea.

Reduce CPU Utilization

One class of CPU-related problems comes from purely sluggish code. These are the sorts of things you will see in Traceview, for example: methods or branches of code that seem to take an inordinately long time. These are also some of the most difficult problems to offer general solutions for, as the answer often comes down to what the application is trying to accomplish. However, the following sections provide suggestions for consuming fewer CPU instructions while getting the same work done.

These are presented in no particular order.

Standard Java Optimizations

Most of your algorithm fixes will be standard Java optimizations, no different from those that Java projects have been applying for the past decade and change. This section outlines a few of them. For more, consider reading Effective Java by Joshua Bloch or Java Performance Tuning by Jack Shirazi.

Avoid Excessive Synchronization

Few objects in the java.* namespaces are intrinsically thread-safe, outside of java.util.concurrent. Typically, you need to perform your own synchronization if multiple threads will be accessing non-thread-safe objects. However, sometimes, Java classes have synchronization that you neither expect nor need, and that synchronization adds unnecessary overhead.

The classic example here is StringBuffer and StringBuilder. StringBuffer was part of Java from early on, and, for whatever reason, was written to be thread-safe — two threads that append to the buffer will not cause any problems. However, most of the time, you are only using the StringBuffer from one thread, meaning all that synchronization overhead is a waste. Later on, Java added StringBuilder, with the same basic set of methods as StringBuffer, but without the synchronization.
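
For instance, here is a minimal sketch (not tied to any sample project) of single-threaded string assembly, where StringBuilder skips the per-call locking that StringBuffer would add:

// build a comma-separated string on a single thread; no locking needed
StringBuilder buf=new StringBuilder();

for (String word : new String[] {"alpha", "beta", "gamma"}) {
  if (buf.length()>0) {
    buf.append(',');
  }

  buf.append(word);
}

String csv=buf.toString(); // "alpha,beta,gamma"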

Similarly, in your own code, only synchronize where it is really needed. Do not toss the synchronized keyword around randomly, or use concurrent collections that will only be used by one thread, etc.

Avoid Floating-Point Math

The first generation of Android devices lacked a floating-point coprocessor on the ARM CPU package. As a result, floating-point math speed was atrocious. That is why the Google Maps add-on for Android uses GeoPoint, with latitude and longitude in integer microdegrees, rather than the standard Android Location class, which uses Java double variables holding decimal degrees.

While later Android devices do have floating-point coprocessor support, that does not mean that floating-point math is now as fast as integer math. If you find that your code is spending lots of time on floating-point calculations, consider whether a change in units would allow you to replace the floating-point calculations with integer equivalents. For example, microdegrees for latitude and longitude provide adequate granularity for most maps, yet allow Google Maps to do all of its calculations in integers.

Similarly, consider whether the full decimal accuracy of floating-point values is really needed. While it may be physically possible to perform distance calculations in meters with accuracy to a few decimal places, for example, in many cases the user will not need that degree of accuracy. If so, perhaps changing to fixed-point (integer) math can boost your performance.
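
As a hedged illustration, here is the microdegree trick in isolation: convert to integers once, stay in integer math for the intermediate work, and only convert back at the edges (the coordinate values are arbitrary):

// convert decimal degrees (double) to microdegrees (int) up front
int latE6=(int)(40.7484 * 1E6);
int lonE6=(int)(-73.9857 * 1E6);

// intermediate work stays in integer math, e.g., shifting by 0.001 degrees
int shiftedLatE6=latE6 + 1000;

// convert back to double only where it is actually needed (e.g., display)
double shiftedLat=shiftedLatE6 / 1E6;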

Don’t Assume Built-In Algorithms are Best

Years upon years of work has gone into the implementation of various algorithms that underlie Java methods, like searching for substrings inside of strings.

Somewhat less work has gone into the implementation of the Apache Harmony versions of those methods, simply because the project is younger, and it is a modified version of the Harmony implementation that you will find in Android. While the core Android team has made many improvements to the original Harmony implementation, those improvements may be for optimizations that do not fit your needs (e.g., optimizing to reduce memory consumption at the expense of CPU time).

But beyond that, there are dozens of string-matching algorithms, some of which may be better for you depending on the string being searched and the string being searched for. Hence, you may wish to consider applying your own searching algorithm rather than relying on the built-in one, to boost performance. And, this same concept may hold for other algorithms as well (e.g., sorting).

Of course, this will also increase the complexity of your application, with long-term impacts in terms of maintenance cost. Hence, do not assume the built-in algorithms are the worst, either — optimize those algorithms that Traceview or logging suggest are where you are spending too much time.

Support Hardware-Accelerated Graphics

An easy “win” is to add android:hardwareAccelerated="true" to your <application> element in the manifest. This turns on hardware acceleration for 2D graphics, including much of the stock widget framework. For maximum backwards compatibility, hardware acceleration is off by default, but adding the aforementioned attribute will enable it for all activities in your application.

Note that this attribute is only available starting with Android 3.0. It is safe to have it in the manifest for older Android devices, as they will simply ignore the request.

You also should test your application thoroughly after enabling hardware acceleration, to make sure there are no unexpected issues. For ordinary widget-based applications, you should encounter no problems. Games or other applications that do their own drawing might have issues. If you find that some of your code runs into problems, you can override hardware acceleration on a per-activity basis by putting the android:hardwareAccelerated attribute on <activity> elements in the manifest.
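
If you do your own drawing and want to confirm what you are actually getting at runtime, one option (shown here as a sketch, not taken from any sample project) is to ask the View and the Canvas directly:

package com.example.hwcheck; // hypothetical package, for illustration only

import android.content.Context;
import android.graphics.Canvas;
import android.util.Log;
import android.view.View;

public class AccelCheckView extends View {
  public AccelCheckView(Context ctxt) {
    super(ctxt);
  }

  @Override
  protected void onDraw(Canvas canvas) {
    super.onDraw(canvas);

    // both calls require API Level 11; the Canvas value is the more
    // specific one, since a view is only accelerated if its window is
    Log.d("AccelCheckView", "view accelerated="+isHardwareAccelerated()
          +", canvas accelerated="+canvas.isHardwareAccelerated());
  }
}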

Minimize IPC

Calling a method on an object in your own process is fairly inexpensive. The overhead of the method invocation is fairly minuscule, and so the time involved is simply however long it takes for that method to do its work.

Invoking behaviors in another process, via inter-process communication (IPC), is considerably more expensive. Your request has to be converted into a byte array (e.g., via the Parcelable interface), made available to the other process, converted back into a regular request, then executed. This adds substantial CPU overhead.

There are three basic flavors of IPC in Android:

  1. “Directly” invoking a third-party application’s service’s AIDL-published interface, to which you bound with bindService()
  2. Performing operations on a content provider that is not part of your application (i.e., supplied by the OS or a third-party application)
  3. Performing other operations that, under the covers, trigger IPC

Remote Bound Service

Using a remote service is fairly obvious when you do it — it is difficult to mistake copying the AIDL into your project and such. The proxy object generated from the AIDL converts all your method calls on the interface into IPC operations, and this is relatively expensive.

If you are exposing a service via AIDL, design your API to be coarse-grained. Do not require the client to make 1,000 method invocations to accomplish something that can be done in 1 via slightly more complex arguments and return values.
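
As a hedged illustration, compare a fine-grained interface with a coarse-grained one that moves the same data in a single IPC round trip (plain Java interfaces are shown for brevity; in a real project, these would be declared in AIDL):

import java.util.List;
import java.util.Map;

// fine-grained: the client pays one IPC round trip per key,
// which adds up quickly inside a loop
interface SettingsFine {
  String getValue(String key);
}

// coarse-grained: the same data moves in a single IPC round trip
interface SettingsCoarse {
  Map<String, String> getValues(List<String> keys);
}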

If you are consuming a remote service, try not to get into situations where you have to make lots of calls in a tight loop, or per row of a scrolled AdapterView, or anything else where the overhead may become troublesome.

For example, in the CPU-Java/AIDLOverhead sample project, you will find a pair of projects implementing the same do-nothing method in equivalent services. One uses AIDL and is bound to remotely from a separate client application; the other is a local service in the client application itself. The client then calls the do-nothing method 1 million times for each of the two services. On average, on a Samsung Galaxy Tab 10.1, the 1 million calls take around 170 seconds for the remote service, versus around 170 milliseconds for the local service. Hence, the overhead of an individual remote method invocation is small (~170 microseconds), but doing lots of them in a loop, or as the user flings a ListView, might become noticeable.

Remote Content Provider

Using a content provider can be a somewhat less obvious problem. Using ContentResolver or a CursorLoader looks the same whether it is your own content provider or someone else’s. However, you know which content providers you wrote; anything else is probably running in another process.

As with remote services, try to aggregate operations with remote content providers, such as:

  1. Use bulkInsert() rather than lots of individual insert() calls (see the sketch after this list)
  2. Try to avoid calling update() or delete() in a tight loop – instead, if the content provider supports it, use a more complex “WHERE clause” to update or delete everything at once
  3. Try to get all your data back in a few queries, rather than lots of little ones… though this can then cause you issues in terms of memory consumption
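
For the first item in that list, here is a minimal sketch, assuming a hypothetical Contract.CONTENT_URI for the remote provider and a hypothetical buildRowsFromCsv() helper of your own; it also assumes the code lives in an Activity or Service, for getContentResolver():

// instead of one IPC round trip per row...
//
//   for (ContentValues row : rows) {
//     getContentResolver().insert(Contract.CONTENT_URI, row);
//   }
//
// ...hand the provider the whole batch in a single call:
ContentValues[] rows=buildRowsFromCsv();

getContentResolver().bulkInsert(Contract.CONTENT_URI, rows);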

Remote OS Operation

The content provider scenario is really a subset of the broader case, where you request that Android do something for you, and Android winds up performing IPC as part of fulfilling that request.

Sometimes, this is going to be obvious. If you are sending commands to a third-party service via startService(), by definition, this will involve IPC, since the third-party service will run in a third-party process. Try to avoid calling startService() lots of times in close succession.

However, there are plenty of cases that are less obvious:

  1. All requests to startActivity(), startService(), and sendBroadcast() involve IPC, as it is a separate OS process that does the real work
  2. Registering and unregistering a BroadcastReceiver (e.g., registerReceiver()) involves IPC
  3. All of the “system services”, such as LocationManager, are really rich interfaces to an AIDL-defined remote service, and so most operations on these system services require IPC

Once again, your objective should be to minimize calls that involve IPC, particularly where you are making those calls frequently in close succession, such as in a loop. For example, frequently calling getLastKnownLocation() will be expensive, as that involves IPC to a system process.
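
For instance, rather than asking LocationManager for the last known location once per row or per calculation, a hedged sketch is to make that IPC call once and reuse the result (placesToMeasure is a stand-in for your own data, and the code assumes it has a Context for getSystemService()):

LocationManager mgr=
  (LocationManager)getSystemService(Context.LOCATION_SERVICE);

// one IPC round trip, reused for every distance calculation below
Location here=mgr.getLastKnownLocation(LocationManager.GPS_PROVIDER);

if (here!=null) {
  for (Location place : placesToMeasure) {
    float[] results=new float[1];

    Location.distanceBetween(here.getLatitude(), here.getLongitude(),
                             place.getLatitude(), place.getLongitude(),
                             results);
    // ...do something with results[0]...
  }
}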

Android-Specific Java Optimizations

The way that the Dalvik VM was implemented and operates is subtly different from the way a traditional Java VM works. Therefore, there are some optimizations that are more important on Android than you might find in regular desktop or server Java.

The Android developer documentation has a roster of such optimizations. Some of the highlights include:

  1. Getters and setters, while perhaps useful for encapsulation, are significantly slower than direct field access. For simpler cases, such as ViewHolder objects for optimizing an Adapter, consider skipping the accessor methods and just using the fields directly.
  2. Some popular method calls are replaced by hand-created assembler instructions rather than code generated via the JIT compiler. indexOf() on String and arraycopy() on System are two cited examples. These will run much faster than anything you might create yourself in Java.

Reduce Time on the Main Application Thread

Another class of CPU-related problems arises when your code may be efficient, but it is running on the main application thread, causing your UI to react sluggishly. You might have tuned your decryption algorithm as best as is mathematically possible, but it may be that decrypting data on the main application thread simply takes too much time. Or, perhaps StrictMode complained about some disk or network I/O that you are performing on the main application thread.

The following sections recap some commonly-seen patterns for moving work off the main application thread, plus a few newer options that you may have missed.

Generate Less Garbage

Most developers think of having too many allocations as being solely an issue of heap space. That certainly has an impact, and depending on the nature of the allocations (e.g., bitmaps), it may be the dominant issue.

However, garbage has impacts from a CPU standpoint as well. Every object you create causes its constructor to be executed. Every object that is garbage-collected requires CPU time both to find the object in the heap and to actually clean it up (e.g., execute the finalizer, if any).

Worse still, on older versions of Android (e.g., Android 2.2 and down), the garbage collector interrupts the entire process to do its work, so the more garbage you generate, the more times you “stop the world”. Game developers have had to deal with this since Android’s inception. To maintain a 60 FPS refresh rate, you cannot afford any garbage collections on older devices, as a single GC run could easily take more than the ~16ms you have per drawing pass.

As a result of all of this, game developers have had to carefully manage their own object pools, pre-allocating a bunch of objects before game play begins, then using and recycling those objects themselves, only allowing them to become garbage after game play ends.
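
A minimal sketch of that approach, using invented names, might look like this:

import java.util.ArrayDeque;

// pre-allocates a fixed number of Bullet objects before game play,
// then hands them out and takes them back without creating garbage
class BulletPool {
  static class Bullet {
    float x;
    float y;
  }

  private final ArrayDeque<Bullet> available=new ArrayDeque<Bullet>();

  BulletPool(int size) {
    for (int i=0; i<size; i++) {
      available.push(new Bullet());
    }
  }

  Bullet obtain() {
    // only falls back to a fresh allocation if the pool runs dry
    return(available.isEmpty() ? new Bullet() : available.pop());
  }

  void recycle(Bullet b) {
    available.push(b);
  }
}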

Most non-game Android applications may not have to go to quite that extreme across the board. However, there are cases where excessive allocation may cause you difficulty. For example, avoiding creating too much garbage is one aspect of view recycling with AdapterView, which is covered in greater detail in the next section.

If Traceview indicates that you are spending a lot of time in garbage collection, pay attention to your loops or to things that may be invoked many times in rapid succession (e.g., accessing data from a custom Cursor implementation that is tied to a CursorAdapter). These are the most likely places where your own code might be creating lots of extra objects that are not needed. Examining the heap to see everything that is being created (and eventually garbage collected) will be covered in an upcoming chapter of the book.

View Recycling

Perhaps the best-covered Android-specific optimization is view recycling with AdapterView.

In a nutshell, if you are extending BaseAdapter, or if you are overriding getView() in another adapter, please make use of the View parameter supplied to getView() (referred to here as convertView). If convertView is not null, it is a View that you previously returned from getView(), being offered to you for recycling purposes. Using convertView saves you from inflating or manually constructing a fresh View every time the user scrolls, and both of those operations are relatively expensive.

If you have been ignoring convertView because you have more than one type of View that getView() returns, your Adapter should be overriding getViewTypeCount() and getItemViewType(). These will allow Android to maintain separate object pools for each type of row from your Adapter, so getView() is guaranteed to be passed a convertView that matches the row type you are trying to create.

A somewhat more advanced optimization — caching all those findViewById() lookups — is also possible once your row recycling is in place. In what is often referred to as “the holder pattern”, you do the findViewById() calls when you inflate a new row, then attach the findViewById() results to the row itself via some custom “holder” object and the setTag() method on View. When you recycle the row, you can get your “holder” back via getTag() and skip having to do the findViewById() calls again.
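
Putting recycling and the holder pattern together, here is a hedged sketch of a complete adapter, assuming a row layout of your own (R.layout.row) containing a single TextView (R.id.label):

import android.content.Context;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.ArrayAdapter;
import android.widget.TextView;

public class WordAdapter extends ArrayAdapter<String> {
  public WordAdapter(Context ctxt, String[] words) {
    super(ctxt, R.layout.row, R.id.label, words);
  }

  @Override
  public View getView(int position, View convertView, ViewGroup parent) {
    View row=convertView;
    Holder holder;

    if (row==null) {
      // no row to recycle: inflate one and do the findViewById() work once
      row=LayoutInflater.from(getContext()).inflate(R.layout.row, parent, false);
      holder=new Holder();
      holder.label=(TextView)row.findViewById(R.id.label);
      row.setTag(holder);
    }
    else {
      // recycled row: skip both the inflation and the findViewById() call
      holder=(Holder)row.getTag();
    }

    holder.label.setText(getItem(position));

    return(row);
  }

  // plain field, per the direct-field-access advice earlier in this chapter
  static class Holder {
    TextView label;
  }
}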

Background Threads

Of course, the backbone of any strategy to move work off the main application thread is to use background threads, in one form or another. You will want to apply these in places where StrictMode complains about network or disk I/O, or places where Traceview or logging indicates that you are taking too much time on the main application thread during GUI processing (e.g., converting downloaded bitmap images into Bitmap objects via BitmapFactory).

Sometimes, you will manually dictate where work should be done in the background, either by forking threads yourself or by using AsyncTask. AsyncTask is a nice framework, handling all of the inter-thread communication for you and neatly packaging up the work to be done in readily understood methods. However, AsyncTask does not fit every scenario — it is mostly designed for “transactional” work that is known to take a modest amount of time (milliseconds to seconds) then end. For cases where you need unbounded background processing, such as monitoring a socket for incoming data, forking your own thread will be the better approach.
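
For example, here is a minimal AsyncTask sketch, assuming the image has already been downloaded to a file, that keeps the BitmapFactory work off the main application thread and only touches the ImageView in onPostExecute():

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.AsyncTask;
import android.widget.ImageView;
import java.io.File;

// usage from an activity: new DecodeTask(imageView).execute(downloadedFile);
class DecodeTask extends AsyncTask<File, Void, Bitmap> {
  private final ImageView target;

  DecodeTask(ImageView target) {
    this.target=target;
  }

  @Override
  protected Bitmap doInBackground(File... files) {
    // runs on a background thread
    return(BitmapFactory.decodeFile(files[0].getAbsolutePath()));
  }

  @Override
  protected void onPostExecute(Bitmap bmp) {
    // runs on the main application thread
    if (bmp!=null) {
      target.setImageBitmap(bmp);
    }
  }
}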

Sometimes, you will use facilities supplied by Android to move work to the background. For example, many activities are backed by a Cursor obtained from a database or content provider. Classically, you would manage the cursor (via startManagingCursor()) or otherwise arrange to refresh that Cursor in onResume(), so when your activity returns to the foreground after having been gone for a while, you would have fresh data. However, this pattern tends to lead to database I/O on the main application thread, triggering complaints from StrictMode. Android 3.0 and the Android Compatibility Library offer a Loader framework designed to try to solve the core pattern of refreshing the data, while arranging for the work to be done asynchronously.
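
A hedged sketch of that pattern, using a hypothetical content provider Uri and column name, has the activity implement LoaderManager.LoaderCallbacks<Cursor> and delegate the query to a CursorLoader from the Android Compatibility Library:

import android.database.Cursor;
import android.net.Uri;
import android.os.Bundle;
import android.support.v4.app.FragmentActivity;
import android.support.v4.app.LoaderManager;
import android.support.v4.content.CursorLoader;
import android.support.v4.content.Loader;
import android.support.v4.widget.SimpleCursorAdapter;
import android.widget.ListView;

public class ItemsActivity extends FragmentActivity
  implements LoaderManager.LoaderCallbacks<Cursor> {
  // hypothetical provider authority and "title" column, for illustration only
  private static final Uri CONTENT_URI=
    Uri.parse("content://com.example.items/items");
  private SimpleCursorAdapter adapter;

  @Override
  public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);

    adapter=new SimpleCursorAdapter(this,
                                    android.R.layout.simple_list_item_1, null,
                                    new String[] {"title"},
                                    new int[] {android.R.id.text1}, 0);

    ListView lv=new ListView(this);

    lv.setAdapter(adapter);
    setContentView(lv);

    // the query runs on a background thread; results arrive in
    // onLoadFinished() on the main application thread
    getSupportLoaderManager().initLoader(0, null, this);
  }

  @Override
  public Loader<Cursor> onCreateLoader(int id, Bundle args) {
    return(new CursorLoader(this, CONTENT_URI, null, null, null, null));
  }

  @Override
  public void onLoadFinished(Loader<Cursor> loader, Cursor c) {
    adapter.swapCursor(c);
  }

  @Override
  public void onLoaderReset(Loader<Cursor> loader) {
    adapter.swapCursor(null);
  }
}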

Asynchronous BroadcastReceiver Operations

99.44% of the time (approximately) that Android calls your code in some sort of event handler, you are being called on the main application thread. This includes manifest-registered BroadcastReceiver components — onReceive() is called on the main application thread. So any work you do in onReceive() ties up that thread (possibly impacting an activity of yours in the foreground), and if you take more than 10 seconds, Android will terminate your BroadcastReceiver with extreme prejudice.

Classically, manifest-registered BroadcastReceiver components only live as long as the onReceive() call does, meaning you can do very little work in the BroadcastReceiver itself. The typical pattern is to have it send a command to a service via startService(), where the service “does the heavy lifting”.

Android 3.0 added a goAsync() method on BroadcastReceiver that can help a bit here. While under-documented, it tells Android that you need more time to complete the broadcast work, but that you can do that work on a background thread. This does not eliminate the 10-second rule, but it does mean that the BroadcastReceiver can do some amount of I/O without having to send a command to a service to do it while still not tying up the main application thread.

The CPU-Java/GoAsync sample project demonstrates goAsync() in use, as the project name might suggest.

Our activity’s layout consists of two Button widgets and an EditText widget:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
  android:orientation="vertical" android:layout_width="match_parent"
  android:layout_height="match_parent">
  <EditText android:id="@+id/editText1" android:layout_width="match_parent"
    android:layout_height="wrap_content">
  </EditText>
  <Button android:layout_width="match_parent" android:id="@+id/button1"
    android:layout_height="wrap_content" android:text="@string/nonasync"
    android:onClick="sendNonAsync"></Button>
  <Button android:layout_width="match_parent" android:id="@+id/button2"
    android:layout_height="wrap_content" android:text="@string/async"
    android:onClick="sendAsync"></Button>
</LinearLayout>
(from CPU-Java/GoAsync/app/src/main/res/layout/main.xml)

The activity itself simply has sendAsync() and sendNonAsync() methods, each invoking sendBroadcast() to a different BroadcastReceiver implementation:

package com.commonsware.android.tuning.goasync;

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.view.View;

public class GoAsyncActivity extends Activity {
  @Override
  public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.main);
  }
  
  public void sendAsync(View v) {
    sendBroadcast(new Intent(this, AsyncReceiver.class));
  }
  
  public void sendNonAsync(View v) {
    sendBroadcast(new Intent(this, NonAsyncReceiver.class));
  }
}
(from CPU-Java/GoAsync/app/src/main/java/com/commonsware/android/tuning/goasync/GoAsyncActivity.java)

The NonAsyncReceiver simulates doing time-consuming work in onReceive() itself:

package com.commonsware.android.tuning.goasync;

import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.os.SystemClock;

public class NonAsyncReceiver extends BroadcastReceiver {
  @Override
  public void onReceive(Context arg0, Intent arg1) {
    SystemClock.sleep(7000);
  }
}
(from CPU-Java/GoAsync/app/src/main/java/com/commonsware/android/tuning/goasync/NonAsyncReceiver.java)

Hence, if you click the “Send Non-Async Broadcast” button, not only will the button fail to return to its normal state for seven seconds, but the EditText will not respond to user input either.

The AsyncReceiver, though, uses goAsync():

package com.commonsware.android.tuning.goasync;

import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.os.SystemClock;

public class AsyncReceiver extends BroadcastReceiver {
  @Override
  public void onReceive(Context context, Intent intent) {
    final BroadcastReceiver.PendingResult result=goAsync();
    
    (new Thread() {
      public void run() {
        SystemClock.sleep(7000);
        result.finish();
      }
    }).start();
  }
}
(from CPU-Java/GoAsync/app/src/main/java/com/commonsware/android/tuning/goasync/AsyncReceiver.java)

The goAsync() method returns a PendingResult, which supports a series of methods that you might ordinarily fire on the BroadcastReceiver itself (e.g., abortBroadcast()) but want to do on a background thread. You need your background thread to have access to the PendingResult — in this case, via a final local variable. When you are done with your work, call finish() on the PendingResult.

If you click the “Send Async Broadcast” button, even though we are still sleeping for 7 seconds, we are doing so on a background thread, and so our user interface is still responsive.

Saving SharedPreferences

The classic way to save SharedPreferences.Editor changes was via a call to commit(). This writes the preference information to an XML file on whatever thread you are on — another hidden source of disk I/O you might be doing on the main application thread.

If you are on API Level 9 or higher, and you are willing to blindly try saving the changes, use the new apply() method on SharedPreferences.Editor, which works asynchronously.

If you need to support older versions of Android, or you really want the boolean return value from commit(), consider doing the commit() call in an AsyncTask or background thread.

And, of course, to support both of these, you will need to employ tricks like conditional class loading. You can see that used for saving SharedPreferences in the CPU-Java/PrefsPersist sample project. The activity reads in a preference, puts the current value on the screen, then updates the preference with the help of an AbstractPrefsPersistStrategy class and its persist() method:

package com.commonsware.android.tuning.prefs;

import android.app.Activity;
import android.content.SharedPreferences;
import android.os.Bundle;
import android.preference.PreferenceManager;
import android.widget.TextView;

public class PrefsPersistActivity extends Activity {
  private static final String KEY="counter";
  
  @Override
  public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.main);
    
    SharedPreferences prefs=
      PreferenceManager.getDefaultSharedPreferences(this);
    int counter=prefs.getInt(KEY, 0);
    
    ((TextView)findViewById(R.id.value)).setText(String.valueOf(counter));
    
    AbstractPrefsPersistStrategy.persist(prefs.edit().putInt(KEY, counter+1));
  }
}
(from CPU-Java/PrefsPersist/app/src/main/java/com/commonsware/android/tuning/prefs/PrefsPersistActivity.java)

AbstractPrefsPersistStrategy is an abstract base class that will hold a strategy implementation, depending on Android version. On pre-Honeycomb builds, it uses an implementation that forks a background thread to perform the commit():

package com.commonsware.android.tuning.prefs;

import android.content.SharedPreferences;
import android.os.Build;

abstract public class AbstractPrefsPersistStrategy {
  abstract void persistAsync(SharedPreferences.Editor editor);
  
  private static final AbstractPrefsPersistStrategy INSTANCE=initImpl();
  
  public static void persist(SharedPreferences.Editor editor) {
    INSTANCE.persistAsync(editor);
  }
  
  private static AbstractPrefsPersistStrategy initImpl() {
    int sdk=new Integer(Build.VERSION.SDK).intValue();
    
    if (sdk<Build.VERSION_CODES.HONEYCOMB) {
      return(new CommitAsyncStrategy());
    }
    
    return(new ApplyStrategy());
  }

  static class CommitAsyncStrategy extends AbstractPrefsPersistStrategy {
    @Override
    void persistAsync(final SharedPreferences.Editor editor) {
      (new Thread() {
        @Override
        public void run() {
          editor.commit();
        }
      }).start();
    }
  }
}
(from CPU-Java/PrefsPersist/app/src/main/java/com/commonsware/android/tuning/prefs/AbstractPrefsPersistStrategy.java)

On Honeycomb and higher, it uses a separate strategy class that uses the new apply() method:

package com.commonsware.android.tuning.prefs;

import android.content.SharedPreferences.Editor;

public class ApplyStrategy extends AbstractPrefsPersistStrategy {

  @Override
  void persistAsync(Editor editor) {
    editor.apply();
  }
}
(from CPU-Java/PrefsPersist/app/src/main/java/com/commonsware/android/tuning/prefs/ApplyStrategy.java)

By separating the Honeycomb-specific code out into a separate class, we can avoid loading it on older devices and encountering the dreaded VerifyError.

Whether using the built-in apply() method is worth dealing with multiple strategies, versus simply calling commit() on a background thread, is up to you.

Improve Throughput and Responsiveness

Being efficient and doing work on the proper thread may still not be enough. It could be that your work is not consuming excessive CPU time, but is taking too long in “wall clock time” (e.g., the user sits waiting too long at a ProgressDialog). Or, it could be that your work, while efficient and in the background, is causing difficulty for foreground operations.

The following sections outline some common problems and solutions in this area.

Minimize Disk Writes

Earlier in this book, we emphasized moving disk writes off to background threads.

Even better is to get rid of some of the disk writes entirely.

A big culprit here comes in the form of database operations. By default, each insert(), update(), or delete(), or any execSQL() invocation that modifies data, will occur in its own transaction. Each transaction involves a set of disk writes. Many times, this is not a problem. But, if you are doing a lot of these, such as importing records from a CSV file, hundreds or thousands of transactions will mean thousands of individual disk writes, and that can take some time. You may wish to wrap those operations in your own transaction, using methods like beginTransaction(), simply to reduce the number of transactions and, therefore, disk writes.
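
For example, here is a minimal sketch of a CSV-style import wrapped in a single transaction (the SQLiteDatabase, the rows, and the items table name are all placeholders for your own code):

void importRows(SQLiteDatabase db, List<ContentValues> rows) {
  db.beginTransaction();

  try {
    for (ContentValues row : rows) {
      db.insert("items", null, row);
    }

    db.setTransactionSuccessful();
  }
  finally {
    // commits if setTransactionSuccessful() was called, rolls back otherwise
    db.endTransaction();
  }
}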

If you are doing your own disk I/O beyond databases, you may encounter similar sorts of issues. Overall, it is better to do a few larger writes than lots of little ones.
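
For ordinary file I/O, the same idea often just means adding a buffer, so that many small logical writes turn into a few larger physical ones. A hedged sketch:

void exportLines(File target, List<String> lines) throws IOException {
  // the 16KB BufferedOutputStream batches the many small write() calls
  // below into comparatively few actual disk writes
  Writer out=
    new OutputStreamWriter(
      new BufferedOutputStream(new FileOutputStream(target), 16384), "UTF-8");

  try {
    for (String line : lines) {
      out.write(line);
      out.write('\n');
    }
  }
  finally {
    out.close(); // flushes whatever is still buffered
  }
}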

Set Thread Priority

Threads you fork, by default, run at a default priority: THREAD_PRIORITY_DEFAULT as defined on the Process class. This is a lower priority than the main application thread (THREAD_PRIORITY_DISPLAY).

Threads you use via AsyncTask run at a lower priority (THREAD_PRIORITY_BACKGROUND). If you fork your own threads, then, you might wish to consider moving them to a lower priority as well, to affect how much time they get compared to the main application thread. You can do this via setThreadPriority() on the Process class.
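
A minimal sketch of demoting a forked thread (doLongRunningWork() is a placeholder for your actual work):

// android.os.Process is spelled out to avoid confusion with java.lang.Process
new Thread() {
  @Override
  public void run() {
    android.os.Process
      .setThreadPriority(android.os.Process.THREAD_PRIORITY_BACKGROUND);

    doLongRunningWork();
  }
}.start();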

The lowest possible priority, THREAD_PRIORITY_LOWEST, is described as “only for those who really, really don’t want to run if anything else is happening”. You might use this for “idle-time processing”, but bear in mind that the thread will be paused a lot to allow other threads to run.

Lower-priority threads will help ensure that your background work does not affect your foreground UI. Processes themselves are put in a lower-priority class as they move to the background (e.g., you have no activities visible), which further reduces the amount of CPU time you will be using at any given moment.

Also, note that IntentService uses a thread at default (not background) priority — you may wish to drop the priority of this thread to something that will be lower than your main application thread, to minimize how much CPU time the IntentService steals from your UI.
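
For example, here is a hedged sketch of an IntentService that demotes its worker thread at the start of onHandleIntent() (the class name is just a placeholder):

import android.app.IntentService;
import android.content.Intent;

public class DownloadService extends IntentService {
  public DownloadService() {
    super("DownloadService");
  }

  @Override
  protected void onHandleIntent(Intent intent) {
    // onHandleIntent() runs on the IntentService's worker thread, so this
    // demotes that thread, not the main application thread
    android.os.Process.setThreadPriority(
      android.os.Process.THREAD_PRIORITY_BACKGROUND);

    // ...do the actual work here...
  }
}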

Do the Work Some Other Time

Just because you could do the work now does not mean you should do the work now. Perhaps a better answer is to do the work later, or do part of the work now and part of the work later.

For example, suppose that you have your own database of points of interest for your custom map application. Periodically, you publish a new database on your Web site, which your Android app should download. Odds are decent that the user is not in desperate need of this new database right away. In fact, the CPU time and disk I/O time to download and save the database might incrementally interfere with the foreground application, despite your best efforts.

In this case, not only should you check for and download the database when the user is unlikely to be using the device (e.g., before dawn), but you should check whether the screen is on via isScreenOn() on PowerManager, and delay the work to sometime when the screen is off. For example, you could have AlarmManager set up to have your code check for updates every 24 hours at 4am. If, at 4am, the screen is on, your code could skip the download and wait until tomorrow, or skip the download and add a one-shot alarm to wake you up in 30 minutes, in hopes that the user will no longer be using the device.
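
A hedged sketch of that check, inside whatever component your alarm invokes (it needs a Context for getSystemService()):

PowerManager pm=(PowerManager)getSystemService(Context.POWER_SERVICE);

if (pm.isScreenOn()) {
  // the user appears to be busy: skip this pass, or register a one-shot
  // alarm to try again in roughly 30 minutes
}
else {
  // screen is off: go ahead and download and save the new database
}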

At the same time, you may wish to consider having a “refresh” menu choice somewhere, for when the user specifically wants you to go get the update (if available) now, for whatever reason.