Symbian C++ Performance Tips
What is performance
Performance is a set of measurable characteristics that a device displays, such as boot time, ROM size, RAM usage, the time taken to view a picture, or battery life. The usage and features of a device often dictate desirable values for these characteristics. To satisfy them, the software has to be designed and implemented accordingly.
Why it matters
Often when performance is important, the standard solution is to increase CPU speed, or dedicate large amounts of RAM to caching solutions. Neither of these options is really open to mobile phone manufacturers, as devices have to be built with battery and cost limitations in mind.
Performance killers
Most of the performance problems seen on smartphones fall into one of a small set of categories. It is this set that we will explore in this article.
Too much code, not enough data
Very often during development, an application has some parameters that affect its behavior. These are often stored in a file which is processed during startup.
The problem arises when this configuration becomes static, either through a finalization of requirements or settling on a set of default values. At this point the parameters could be hard-coded, but often that can mean refactoring an application:
void SomeCode(void)
{
// Open file
// New ReadStream
// Read some data members into a struct
// Close file, etc.
}
It can also occur when a data structure that could have been generated at build time is instead created on the heap:
void ConstructL(void)
{
TFuncTable *fns = new (ELeave) TFuncTable;
fns->iFunc1 = ExampleFunc1;
fns->iFunc2 = ExampleFunc2;
fns->iFunc3 = ExampleFunc3;
iFuncTable = fns;
}
In this example it would be better to have a const copy of the initialized TFuncTable, and either have iFuncTable point to it or, better still, use the const version in place of iFuncTable.
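As a minimal sketch of the const-table approach, written in standard C++ rather than against the Symbian headers: the three functions are hypothetical stand-ins for ExampleFunc1 to ExampleFunc3, and the signature `int (*)(int)` is an assumption made only so the example is self-contained.

```cpp
#include <cassert>

// Hypothetical stand-ins for ExampleFunc1..3 from the text.
static int ExampleFunc1(int aValue) { return aValue + 1; }
static int ExampleFunc2(int aValue) { return aValue * 2; }
static int ExampleFunc3(int aValue) { return aValue - 3; }

// A plain aggregate with no constructor, so it can be initialized statically.
struct TFuncTable
{
    int (*iFunc1)(int);
    int (*iFunc2)(int);
    int (*iFunc3)(int);
};

// The table is filled in by the compiler and linker: no heap cell is
// allocated and no assignment code runs during construction.
static const TFuncTable KFuncTable = { ExampleFunc1, ExampleFunc2, ExampleFunc3 };

// Callers use the const table directly instead of a heap-allocated copy.
int CallAll(int aValue)
{
    return KFuncTable.iFunc3(KFuncTable.iFunc2(KFuncTable.iFunc1(aValue)));
}
```

Because the table is const and initialized from constant expressions, a compiler can typically place it in read-only data, which on a ROM-based device also avoids using RAM for it.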
This manifestation of the problem is often hidden, for example when the function table takes the form of an interface class.
In some cases it can be difficult to express the intentions in easy-to-understand C++ terms: a data structure that is read from a file could contain sub-structures. These can often become difficult to present as easily-read const struct or const class declarations.
A developer may even choose to have a ‘generated code’ solution that takes a human-readable data file and converts it into C++ code ready for the compiler.
Repeated code within loops
Redundant calculations very often occur in tandem with the construction of a complex type. Consider the following example:
void ExampleClass::SimpleOperation(SimpleType a, SimpleType b)
{
    // creation of a complex type: this is unnecessary, see text
    ComplexType c = b.MakeComplex();
    // some other code
}
void ExampleClass::DoSomeComplexComputation(...)
{
    SimpleType a, b;
    while (moreToDo)
    {
        // some code
        SimpleOperation(a, b);
        // some other code that does not change variable b
    }
}
In the code above, the SimpleOperation() method is called within a loop. In each iteration, the same complex type is created from a SimpleType, but it is not modified in the further code. This repeated creation of a ComplexType is unnecessary; it wastes resources and could seriously affect performance. Clearly, if a ComplexType could be passed to SimpleOperation(), then its repeated creation could be removed:
void ExampleClass::SimpleOperation(SimpleType a, ComplexType& b)
{
    // some code
}
void ExampleClass::DoSomeComplexComputation(...)
{
    SimpleType a;
    ComplexType b; // create b as a complex type from the outset
    while (moreToDo)
    {
        // pass b as a ComplexType instead of a SimpleType
        SimpleOperation(a, b);
        // some code
    }
}
To summarize, care has to be taken to ensure that repeated calculations or processing are not done in heavily used loops.
Inefficient heap usage
Often on embedded systems, the heap has to be used in place of the stack because stack space is limited. Without care, this can lead to excessive heap calls, as data that would normally live in temporary stack variables is allocated and freed repeatedly:
void LoopWithHeap(void)
{
while(moreToDo)
{
CData *temp = new CData;
GetData(temp);
ProcessData(temp);
delete temp;
}
}
Where possible, any temporary variables should be reused:
void LoopWithHeap(void)
{
CData *temp = new CData;
while(moreToDo)
{
GetData(temp);
ProcessData(temp);
temp->Reset();
}
delete temp;
}
Another cause of this problem is the use of segmented data structures with granularity that is too fine for the amount of data being processed.
Another possible cause of poor heap usage can be over-reliance on realloc. A poorly thought-out design can mean that heap cells are required to increase in size, and this usually involves an alloc, a free and a memcpy call.
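The cost of growing a heap cell repeatedly can be illustrated with a small sketch. It uses std::vector from standard C++ as a stand-in for a growable buffer (an assumption for illustration; Symbian code would use its own containers or descriptors), and the helper name CountReallocations is hypothetical. It counts how often the storage moves with and without requesting the final size up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the vector's storage is reallocated while appending
// aCount bytes. With aReserve set, the final size is requested up front.
std::size_t CountReallocations(std::size_t aCount, bool aReserve)
{
    std::vector<unsigned char> buffer;
    if (aReserve)
    {
        buffer.reserve(aCount); // a single allocation of the final size
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = buffer.capacity();
    for (std::size_t i = 0; i < aCount; ++i)
    {
        buffer.push_back(static_cast<unsigned char>(i));
        if (buffer.capacity() != lastCapacity)
        {
            ++reallocations; // storage grew: allocate, copy, free
            lastCapacity = buffer.capacity();
        }
    }
    return reallocations;
}
```

When the amount of data is known, or can be bounded, before processing starts, sizing the buffer once removes the repeated growth steps altogether.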
Limited understanding of library
API documentation rarely includes any in-depth implementation notes. Coding to an inappropriate/poorly understood API can lead to problems such as duplicated or unnecessary processing and data transformation. Consider a class that provides access to an array and implements bounds checking on the SetElement method:
void ArrayClass::SetSize(int aSize)
{
iMaxLength = aSize;
}
void ArrayClass::SetElement(int aPos, unsigned char aChar)
{
if(aPos >= 0 && aPos < iMaxLength)
{
iRawArray[aPos] = aChar;
}
}
Now consider a program written to use this class. It needs to add a number of elements to the array:
void ExampleClass::FillArray()
{
//some code
myArray.SetSize(bytesToProcess);
for(currentPos = 0; currentPos < bytesToProcess; currentPos++)
{
myArray.SetElement(currentPos, aByte);
}
}
The inefficiency here stems from the fact that the bounds checking done by SetElement is unnecessary: the calling loop is already bounded by the size that was set on the array.
This problem may have arisen for many reasons. Perhaps the developer of ExampleClass doesn’t realize that ArrayClass does bounds checking, or doesn’t know about another API on ArrayClass that may be more appropriate. Or perhaps the implementer of ArrayClass hasn’t provided such an API, and doesn’t expect the class to be used in such a manner.
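One way such an API could look is sketched below. The Fill() and At() methods, the fixed 256-byte storage and the clamping in SetSize() are all assumptions added for illustration; they are not part of the ArrayClass shown above. The point is only that the range is validated once, rather than once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class ArrayClass
{
public:
    ArrayClass() : iMaxLength(0)
    {
        std::memset(iRawArray, 0, sizeof(iRawArray));
    }
    void SetSize(int aSize)
    {
        // Clamp to the fixed storage used by this sketch.
        if (aSize < 0) aSize = 0;
        if (aSize > KCapacity) aSize = KCapacity;
        iMaxLength = aSize;
    }
    // Hypothetical bulk API: checks the whole range once, then fills it.
    bool Fill(int aPos, int aCount, unsigned char aChar)
    {
        if (aPos < 0 || aCount < 0 || aPos > iMaxLength || aCount > iMaxLength - aPos)
        {
            return false; // range does not fit, nothing is written
        }
        std::memset(iRawArray + aPos, aChar, static_cast<std::size_t>(aCount));
        return true;
    }
    // Unchecked read accessor, present only so the sketch can be exercised.
    unsigned char At(int aPos) const { return iRawArray[aPos]; }
private:
    enum { KCapacity = 256 };
    int iMaxLength;
    unsigned char iRawArray[KCapacity];
};
```

With an API of this shape, the loop in FillArray() collapses to a single call such as `myArray.Fill(0, bytesToProcess, aByte)`.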
Type coercion
Where a bad ‘data design’ has been used, processor time is needlessly wasted transforming data from one type to another. This often only involves a static transformation, and is usually done in preparation for passing the data to an external API.
Consider the following code, an example of ‘Chasing the data type’:
TInt intDrive;
TChar ch = ((*drives)[i])[0];
RFs::CharToDrive(ch,intDrive);
TDriveUnit curDrive(intDrive);
The code needs to use a TDriveUnit type, but what it has stored is a string of the drive name, and so it has to process this data three times until it is in a usable form. Now consider that this function may be at the heart of a heavily used loop: this processing could become a significant part of the time taken to execute the function.
It may be worth storing a TDriveUnit along with, or instead of, the drive name, which could then be used directly.
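The idea of storing the converted value next to the name can be sketched in standard C++. Everything here is hypothetical: TDriveEntry, DriveNumberFromName() and SumDriveNumbers() are illustrative names, and a plain int stands in for TDriveUnit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical conversion standing in for the name-to-drive lookup:
// 'A' or 'a' maps to 0, 'Z' or 'z' to 25, anything else to -1.
int DriveNumberFromName(const std::string& aName)
{
    if (aName.empty() || !std::isalpha(static_cast<unsigned char>(aName[0])))
    {
        return -1;
    }
    return std::toupper(static_cast<unsigned char>(aName[0])) - 'A';
}

// The converted value is stored next to the name when the entry is built...
struct TDriveEntry
{
    explicit TDriveEntry(const std::string& aName)
        : iName(aName), iDriveNumber(DriveNumberFromName(aName)) {}
    std::string iName;
    int iDriveNumber;
};

// ...so a heavily used loop reads it directly instead of converting again.
int SumDriveNumbers(const std::vector<TDriveEntry>& aDrives)
{
    int sum = 0;
    for (std::size_t i = 0; i < aDrives.size(); ++i)
    {
        sum += aDrives[i].iDriveNumber;
    }
    return sum;
}
```

The conversion cost is paid once per entry rather than once per use, at the price of a few extra bytes per entry.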
In some cases this problem can cause dummy data objects to be constructed on the stack, purely for the purpose of changing the interface to a particular data object.
Consider this example. Here we see the creation of an object just to get at a method call:
iDllEntry.iName.Des().Zero();
iDllEntry.iName.Des().Append(aPath.Drive());
iDllEntry.iName.Des().Append(KSysBin);
iDllEntry.iName.Des().Append(*resourceName);
iDllEntry.iName.Des().Append(KDllExtension);
This example shows two problems:
The primary problem is that the type coercion carried out by the Des() call constructs a temporary object, and in this instance this is being done five times, when the result could easily have been stored locally and reused:
TPtr des = iDllEntry.iName.Des();
des.Zero();
des.Append(aPath.Drive());
des.Append(KSysBin);
des.Append(*resourceName);
des.Append(KDllExtension);
A less obvious problem stems from the compiler options that are being used. The implementers of Des() have marked it as inline, but because the compiler has been told to optimize for code size, and the function is used often, the compiler may decide not to honor the inline qualifier, so each of the five calls can incur a full function call.
Inefficient file usage
This category covers a number of problems, and some of them apply not only to file misuse, but more widely to any data source that doesn’t offer the same instantaneous access as RAM does, for instance, hardware and network sources.
Inefficient use of the file system can arise when software uses it as if it were a database, where directory structure and filename format are used to define a database structure.
Another problem can come from ‘synchronous’ designs that read and process data from a file or other source in blocks, but do so serially. This can mean that processing of data is held up whilst waiting for the entire block to be read.
In the example below we find another common problem, that of reading files in multiple, small reads.
EXPORT_C CColorList*
ColorUtils::CreateSystemColorListL(RFs& aFs)
{
...
CDirectFileStore* store;
store=CDirectFileStore::OpenL(aFs,
KGulColorSchemeFileName,EFileRead | EFileShareReadersOnly);
...
RStoreReadStream stream;
stream.OpenL(*store,store->Root());
...
CColorList* colorList=CColorList::NewLC();
stream>>*colorList;
return colorList;
}
The problem lies hidden in the implementation of the overloaded C++ >> operator. A small section of the InternalizeL function that is called is shown below:
aStream>>card;
const TInt count(card);
TRgb rgb;
for (TInt ii=0;ii<count;ii++)
{
aStream>>rgb;
iEikColors->AppendL(rgb);
}
We can see that it calls further overloaded >> operator functions for each embedded class. Following through the function calls shows that the structures these functions create in memory are built in 32-bit blocks, with each block causing a new read from the file. Even if the File Server has a read-ahead cache for the file, the depth of function calls for each read will cause a performance problem.
Consider something like the following example instead:
aStream>>card;
const TInt count(card);
aStream.ReadL(iEikColors, count * sizeof(TRgb));
This will be faster; the trade-off is that care has to be taken to ensure the internal format of TRgb hasn’t changed.
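The same single-read idea can be shown as a self-contained sketch in standard C++, using std::istream in place of the Symbian stream classes. The function name ReadColors, the KMaxColors sanity limit, and the use of a 32-bit integer per colour are assumptions for illustration; the sketch also assumes the data was written in the host's own byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sanity limit so a corrupt count cannot trigger a huge allocation.
const std::uint32_t KMaxColors = 4096;

// Reads a 32-bit count followed by that many 32-bit colour values.
// The payload is fetched with one read() call instead of one call per value.
std::vector<std::uint32_t> ReadColors(std::istream& aStream)
{
    std::vector<std::uint32_t> colors;
    std::uint32_t count = 0;
    aStream.read(reinterpret_cast<char*>(&count), sizeof(count));
    if (!aStream || count == 0 || count > KMaxColors)
    {
        return colors;
    }
    colors.resize(count);
    aStream.read(reinterpret_cast<char*>(colors.data()),
                 static_cast<std::streamsize>(count * sizeof(std::uint32_t)));
    if (!aStream)
    {
        colors.clear(); // short read: report nothing rather than partial data
    }
    return colors;
}
```

As with the TRgb case, the block read ties the file format to the in-memory layout, so a version field or size check in the file is worth considering if that layout could ever change.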
Inefficient Database Usage
Closely coupled to bad file usage is the problem of inefficient usage of the Symbian OS database systems. Both of the database systems that Symbian OS supports make heavy usage of the underlying file system and therefore using the APIs provided at this level can lead to problems showing up in the file usage patterns.
The first point to note is in the usage of the Compact() API. For durability reasons the database sub-systems do not ‘edit in place’, that is to say, if you change the database structure in any way, the new parts of the structure are appended to the end of the database file and the various markers in the structure are set to point to these new areas. This clearly has an impact on file size, as a database that is constantly being updated will continue to grow in size, unchecked. It is for this reason that there is a Compact() API. This essentially rebuilds the database, removing the ‘dead’ areas as it progresses. Clearly then, this operation will be slow, especially for large and complex databases, so it is important to ensure it is only called when necessary.
Another important performance consideration when working with databases is the schema of the database itself. The database components make extensive use of caching, to limit the impact of file system usage. Better performance can be achieved if the schema used lends itself to caching by increasing the locality of often read items.
Consider a database that has a single table of records, each record having a few small fields and a large field. For a particular use case, it is the small fields that are accessed more frequently. The performance of the use case can be improved if the database is restructured so that it has two co-indexed tables; one containing just the frequently accessed fields, and the other containing the larger, infrequently accessed fields.
If the performance of your database application is critical, you may wish to measure the performance of alternative implementations. As already mentioned, database application performance is largely governed by the choice of schema and access algorithms so additional effort spent working out the best solution for you can pay large dividends. If you do decide to produce a performance test harness, note that database operations have some natural variation in execution time so a number of iterations of each measured event will be required in order to obtain a true picture.
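A test harness of this kind can be very small. The sketch below is written in standard C++ with std::chrono rather than any Symbian timing API, and the names Median and MedianMicroseconds are hypothetical. It runs an operation several times and reports the median, which is less sensitive to occasional slow iterations than the mean.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Returns the median of the samples (the upper median for an even count),
// or 0.0 if there are no samples.
double Median(std::vector<double> aSamples)
{
    if (aSamples.empty())
    {
        return 0.0;
    }
    std::sort(aSamples.begin(), aSamples.end());
    return aSamples[aSamples.size() / 2];
}

// Runs aOperation aIterations times and returns the median duration
// of one run in microseconds.
template <typename Operation>
double MedianMicroseconds(Operation aOperation, int aIterations)
{
    std::vector<double> samples;
    for (int i = 0; i < aIterations; ++i)
    {
        const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        aOperation();
        const std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::micro>(end - start).count());
    }
    return Median(samples);
}
```

Each alternative schema or access algorithm can then be wrapped in a callable and measured with the same number of iterations, so the results are comparable.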
Bad use of Design Patterns
Design Patterns are a useful way of categorizing software problems into well-known classes. Common, tried and tested solutions to those problems can then be easily implemented. However, the use of Design Patterns should never be a substitute for actually thinking through a problem and deriving a design solution.
Further problems can arise even once a particular design pattern has been chosen. Design Patterns usually imply an implementation that is described in full Object Oriented terms. Such OO abstraction is not always appropriate. Blindly implementing the prescribed solution can often incur a cost to performance, or an increase in code complexity.
Consider the following code, where the design has suggested the use of the State Pattern. One way to implement this pattern is to derive a class for each state from a common base class. In order to initialize the state machine, the implementers have decided to use a Factory Pattern:
CExampleStateFactory* CExampleStateFactory::NewL()
{
CExampleStateFactory* factory= new (ELeave) CExampleStateFactory();
CleanupStack::PushL(factory);
// Create all the new states
factory->iStates[EError] = new (ELeave) TExampleStateError(factory);
factory->iStates[EStarted] = new (ELeave) TExampleStateStarted(factory);
factory->iStates[EStopped] = new (ELeave) TExampleStateStopped(factory);
// etc...
CleanupStack::Pop();
return factory;
}
The state factory then owns each of the state classes, and they each take a pointer to the factory. However, if we look more closely at the state classes:
class TExampleStateBase
{
public:
    TExampleStateBase(CExampleStateFactory* aFactory);
    inline TExampleStateBase* GetState(TStateEnum aState) { return iFactory->GetState(aState); }
private:
    CExampleStateFactory* iFactory;
};
TExampleStateBase::TExampleStateBase(CExampleStateFactory* aFactory) : iFactory(aFactory)
{
}
TExampleStateStarted::TExampleStateStarted(CExampleStateFactory* aFactory) : TExampleStateBase(aFactory)
{
}
We can see that the factory pointer that each state takes is only used for state switching; this means that the way the State Pattern has been implemented has added a great deal of complexity to the code. This can lead to performance problems if the factory is called repeatedly, or it can mean that much of the code only exists for initialization, which leads to an increase in ROM usage.
If the State Pattern had been implemented differently, with the state switching handled outside of the State Machine itself, then the complex construction could be avoided, and much of the Factory Pattern code could be handled by compiler intrinsics.
If each state is simple and contains no member data, then the state machine can be encapsulated entirely by the Virtual Function Pointer Table. In this case we can eliminate the need for the Factory Pattern completely, and simply change state by using casting, though this comes with the trade-off of a decrease in code legibility, and the higher risk of defects.
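One possible reading of "state switching handled outside of the state machine" is sketched below in standard C++. It is not the casting technique just described, and it swaps the state classes for a const table of handler functions, which is a deliberate simplification. The TEvent values, the transition rules and the TStateMachine class are all hypothetical.

```cpp
#include <cassert>

enum TStateEnum { EError, EStarted, EStopped, EStateCount };
enum TEvent { EStart, EStop, EFail };

// Each state is a plain function that maps an event to the next state.
typedef TStateEnum (*TStateHandler)(TEvent);

static TStateEnum HandleError(TEvent)
{
    return EError; // terminal state in this sketch
}
static TStateEnum HandleStarted(TEvent aEvent)
{
    return aEvent == EStop ? EStopped : (aEvent == EFail ? EError : EStarted);
}
static TStateEnum HandleStopped(TEvent aEvent)
{
    return aEvent == EStart ? EStarted : (aEvent == EFail ? EError : EStopped);
}

// Built at compile time and indexed by TStateEnum: no factory, no heap.
static const TStateHandler KStateTable[EStateCount] =
    { HandleError, HandleStarted, HandleStopped };

// The machine owns only the current state; the states hold no back-pointer.
class TStateMachine
{
public:
    TStateMachine() : iState(EStopped) {}
    void Handle(TEvent aEvent) { iState = KStateTable[iState](aEvent); }
    TStateEnum State() const { return iState; }
private:
    TStateEnum iState;
};
```

Because the states carry no member data, nothing needs constructing at start-up, which removes both the initialization code and the per-state heap cells.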
Generic and ‘future proofed’ code
This category of problems comes from an over-reliance on frameworks and plugins. It can also involve cases of trading ease-of-development for software performance.
Consider an application that stores some configuration parameters in a ‘.ini’ file. During development it may be that these parameters are being altered often, and so it makes sense for them to be in an easily edited form. However, once they become static, the overhead of loading and parsing the file can become a performance problem. The ‘Framework and Plugins’ approach is one of the cornerstones of Symbian OS design, but all too often it can be used inappropriately.
The use of a framework carries with it an overhead in terms of code size, as you need administrative code to scan for plugins, check for changes in plugin availability, load and link the plugin DLLs, and select a particular plugin. It also carries with it an overhead in terms of performance, as the framework usually provides a common client interface, which internally forwards requests to the plugin in use.
These are all acceptable tradeoffs when dynamic and in-the-field flexibility is required. However, when the set of plugins discovered and used becomes essentially static for the lifetime of a piece of software, these costs can become unacceptable.
Future proofing code can also be good practice, but again if the techniques are wrongly or inappropriately applied, there is a cost in code size and performance.
Developing and testing on the emulator
The Symbian OS emulator is provided as a development tool, as it allows a quick turnaround from writing code to getting it running. It also allows code to be debugged whilst it is running, without expensive hardware tools.
These facilities are invaluable; however, it is all too easy to forget about real hardware testing, especially when pressures of release deadlines loom. Hardware testing can often end up only being done for proving functionality, or ensuring the code runs on hardware prior to a release. This can lead to a poor understanding of how code behaves on actual devices, and while this doesn’t lead directly to performance problems, it can often mean they aren’t detected until the final stages of development or after the product ships.
This can appear to customers as though insufficient testing was performed or inadequate quality control was upheld. Additionally, it limits the scope for any improvements that can be made, as there will be little time or agreement to make big architectural changes.
You, your compiler and your code
Just as it is important to know the system and the language you are programming for, it is important to understand the basics of the compiler you are using. Many modern compilers contain a number of optimization phases that try to produce code with a set of desired qualities, be they small code size or the fastest execution path. It is important to understand which of these are being chosen in order to appreciate how that affects the code the compiler produces. It’s also important not to assume any ‘tricks’ done by one compiler will be done by another.
Don’t work against the compiler
Know your compiler, and understand how the code it generates relates to the source you write. Don’t write code that is too prescriptive: this can force the compiler to produce code in a certain way, which may not be the most efficient code for the particular case.
Learn a little assembler
Assembler is often thought of as a ‘black art’ and is usually shied away from by most software developers. However, to fully understand how code will run on a particular platform, a passing understanding of assembler is incredibly useful.
Quick tips
Aside from the ‘killers’ mentioned in this booklet there are a number of smaller things that should be remembered when writing code. A lot of these are good common sense, and are usually formalized in a Coding Standard. A few of them are worth mentioning here.
Store results of calls used in loops
Avoid function calls in a loop’s condition statement. Prefer instead to store the value returned from a function call in a local variable, unless the result changes after each iteration.
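A minimal sketch of this tip in standard C++, using std::strlen as the function whose result does not change between iterations (the helper name CountChar is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts occurrences of aChar in a C string.
// std::strlen walks the whole string, so calling it in the loop condition
// could repeat that walk on every iteration; here it is called once.
int CountChar(const char* aText, char aChar)
{
    const std::size_t length = std::strlen(aText); // evaluated once
    int count = 0;
    for (std::size_t i = 0; i < length; ++i)
    {
        if (aText[i] == aChar)
        {
            ++count;
        }
    }
    return count;
}
```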
Use references or pointers where necessary
Passing parameters by reference is usually a good thing, but don’t pass references to integer types if they are only ever read from.
Don’t unroll loops
With modern compilers this type of optimization is no longer necessary, and can even be counterproductive. The compiler can perform this optimization where appropriate.
Avoid long if...else chains
It’s better to use a switch statement, as these can be implemented more efficiently. If the conditions are not constant integers (strings, for example), consider using a lookup table to convert them to integers before a switch.
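The lookup-table-then-switch idea might look like the following sketch in standard C++. The command names, the TCommand enumeration and the return codes are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum TCommand { ECmdUnknown, ECmdStart, ECmdStop, ECmdUnload };

struct TCommandEntry
{
    const char* iName;
    TCommand iCommand;
};

// Const lookup table mapping command strings to integer codes.
static const TCommandEntry KCommands[] =
{
    { "start",  ECmdStart  },
    { "stop",   ECmdStop   },
    { "unload", ECmdUnload },
};

TCommand LookupCommand(const char* aName)
{
    for (std::size_t i = 0; i < sizeof(KCommands) / sizeof(KCommands[0]); ++i)
    {
        if (std::strcmp(KCommands[i].iName, aName) == 0)
        {
            return KCommands[i].iCommand;
        }
    }
    return ECmdUnknown;
}

// Once the string is an integer code, a switch replaces the if...else chain.
int Dispatch(const char* aName)
{
    switch (LookupCommand(aName))
    {
        case ECmdStart:  return 1;
        case ECmdStop:   return 2;
        case ECmdUnload: return 3;
        default:         return 0;
    }
}
```

The string comparisons are confined to one place, and adding a command means adding a table row and a case label rather than another branch in a chain.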
Use the const qualifier appropriately
Marking read-only variables as const allows the compiler to generate more efficient code.
Quick profiling using the Fast Counter
The user library provides access to a system clock that typically has a high resolution, User::FastCounter(). This can be used to measure the time taken for a particular piece of code to execute. The exact nature of the counter is device specific, but its attributes can be discovered using the HAL::Get() API: EFastCounterFrequency returns the frequency of the counter, and EFastCounterCountsUp indicates whether the counter counts up or down.
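Turning two counter readings into a duration takes a little arithmetic, sketched here in standard C++ so that it can be compiled anywhere. The helper names are hypothetical, and the sketch assumes a 32-bit counter that wraps at most once between the two readings and a non-zero frequency. On a device, the inputs would come from User::FastCounter() and from the two HAL attributes named above.

```cpp
#include <cassert>
#include <cstdint>

// Ticks elapsed between two 32-bit counter readings.
// aCountsUp mirrors the EFastCounterCountsUp attribute; unsigned arithmetic
// gives the right answer across a single wrap of the counter.
std::uint32_t ElapsedTicks(std::uint32_t aStart, std::uint32_t aEnd, bool aCountsUp)
{
    return aCountsUp ? static_cast<std::uint32_t>(aEnd - aStart)
                     : static_cast<std::uint32_t>(aStart - aEnd);
}

// Converts ticks to microseconds given the counter frequency in Hz
// (the value reported for EFastCounterFrequency). Requires aFrequencyHz != 0.
std::uint64_t TicksToMicroseconds(std::uint32_t aTicks, std::uint32_t aFrequencyHz)
{
    return (static_cast<std::uint64_t>(aTicks) * 1000000u) / aFrequencyHz;
}
```

Reading the counter immediately before and after the code of interest, and repeating the measurement several times, gives a quick first estimate before reaching for a full profiler.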
Tools: the Sampling Profiler
This section is aimed at developers who have access to licensee prototypes and certain levels of SDK or are using reference boards.
The Sampling Profiler can be used to provide a rough, statistical view of software activity on a device. It does this by logging the thread ID and current Program Counter value at one millisecond intervals. It is accompanied by a command line-based program that can be used to analyse this data on a PC. This information can then be used to investigate performance problems, and inform code inspection of likely bottlenecks.
Build the ROM
To use the profiler, it first has to be added to the ROM. This can be done by adding ‘profiler.iby’ to the buildrom command line:
buildrom h4hrp techview profiler.iby
Start the profiler
The simplest way to control the profiler is from eshell, using a command line such as:
start profiler start
This starts the profiler application in another thread, so you can switch back to other tasks using the <Control><Alt><Shift><T> key combination.
Run the code you wish to profile
At this point the profiler will be running, and gathering samples. A short pause before starting the code to be analyzed can help the thread activity analysis phase by visually separating out the various chunks of processing shown.
Stop the profiler
After you have profiled what you need, switch back to the eshell and stop the profiler:
profiler stop
And then, to close the profiler data file:
profiler unload
Retrieve the profile data
You should have a file, profiler.dat, in the root of the C drive of the reference board. You can copy it to the MMC card and transfer it back to the build machine for analysis.
Analyze the data by activity
You should convert the data to a form suitable to be displayed in Excel in order to generate a graph, so you get an overall picture of the activity of the software you have profiled.
Copy the profile file
Copy it to the ROM folder as you need the symbol table to extract the names. Create the activity format file by running the following command:
analyse -r h4hrp_001.techview.symbol profiler.dat -v -mx > profile.xls
Create the activity graph
Open profile.xls file in Excel. To ensure the graph shows the thread names, delete the first six rows of the data. This is summary data and will mess up the graph if it is included. Similarly, the time stamps in the first column will mess up the graph but you cannot delete them as they are needed to cross reference the areas of the graph that you are interested in to the actual times.
Select all of the data and then click on the ‘chart wizard’. This opens up a four-stage wizard:
- select ‘Area’ from chart type and ‘Stacked’ from the sub-type, select ‘Next’
- adjust the area to miss out the time stamp in the first column. Change the A to a B. E.g., =profile!$A$2:$V$941 gets changed to =profile!$B$2:$V$941, select ‘Next’
- ignore the next pane and press ‘Next’, select ‘As new sheet’ and press ‘Finish’.
Select the active section and threads
By looking at the graph created, you should be able to work out what your program was doing and when, allowing you to locate the area you are interested in. You can hover over the data area with your mouse and a pop-up window will tell you which thread was running at that point. You can then use the row number of the point to find its timestamp by looking at the value in the first column of the same row number in the data sheet. Additionally, you can delete rows you are not interested in. Remember that Excel will renumber the rows so delete the end of the range first. The graph will be redrawn with the new data.
Create a listing by function
Once you know the range of timestamps and within which thread they occurred, you can create a list of the functions ranked in order of the activity. For example, if you were interested in what functions were called between the 51300 and 76900 timestamps in the EFile thread, you would use the following command:
analyse -r h4hrp_001.techview.symbol profiler.dat -bf -p -s51300-76900 -t EFile* > analysis.txt
- the sample range has no spaces after the -s or between the two numbers in the range and the hyphen separating them
- the sample range is the timestamps that come from the first column of the datasheet used to create the activity graph
- the thread name does have a space after the -t, and can include wildcards both at the beginning and the end of the name.
You can read the output file into a text editor (such as Notepad) where you will find the list of functions in the timestamp range selected. Usually, the top five or so functions will be of interest. You can then go to an IDE and inspect the relevant sections of code.


