Array Performance | Download Source code (Apr 2011): perf-array-src.zip |
Heap, Map, MultiMap Comparison | Download Source code (Jan 2015): perf-hash-map-src.zip |
VS2008, VS2010, VS2013 | Compiled code execution speed comparison |
Array Performance | Download Source code (Apr 2011): perf-array-src.zip |
Array Performance measures the impact of bounds checking on std::vector and compares its three common access methods. Visual Studio 2008 has STL bounds checking enabled by default in Release builds, while VS2010 and newer have it disabled. Removing bounds checking from Release builds significantly improves performance.
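For reference, a minimal sketch of how this checking is typically controlled (assuming a VS2008 project; the macros are the compiler's, not part of the benchmark source):

// Minimal sketch, assuming a VS2008 project: disabling STL bounds checking in Release.
// The macro must be set project-wide (or before every standard header) so all
// translation units agree.
#define _SECURE_SCL 0              // VS2008: checked iterators off (default is 1, even in Release)
// VS2010 and later use _ITERATOR_DEBUG_LEVEL instead; it already defaults to 0 in Release.
#include <vector>

int main()
{
    std::vector<int> v(16, 0);
    return v[3];                   // unchecked operator[] access in Release
}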
Two array tests are measured:
Update Performance: | container[i] = container[i] + n; |
Read Access: | total += container[i]; |
The Update Performance test measures the time to read and then write a value in a container object.
The measurements are collected on four different containers: array, DataVector, std::vector, and boost::vector.
The Read Access test measures read access only and uses 12x more iterations than the Update test.
The measurements are collected on three different access methods: operator[], at(), and iterators.
DataVector is a custom container similar to std::vector but without bounds checking, so no bounds-checking measurements are collected for it.
Bounds checking defaults to on with VS2008 and off with the other compilers.
The Elapsed column reports elapsed clock ticks (unitless); the larger the number, the longer the duration and the worse the performance. The Ratio column is the ratio of the elapsed ticks to the fastest item in the test group: a value of 1 means it performs as well as the best in the group, and a value of 2 means it is 2 times slower.
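A minimal sketch of the Read Access measurement is shown below, using std::clock() as a stand-in tick source (the benchmark's own timing harness is in the downloadable source):

#include <cstdio>
#include <ctime>
#include <vector>

int main()
{
    const size_t count = 1000000;
    std::vector<int> data(count, 1);
    size_t total = 0;

    std::clock_t t0 = std::clock();
    for (size_t i = 0; i < count; i++)
        total += data[i];                      // operator[] access
    std::clock_t t1 = std::clock();
    for (size_t i = 0; i < count; i++)
        total += data.at(i);                   // at() access, always range checked
    std::clock_t t2 = std::clock();
    for (std::vector<int>::const_iterator it = data.begin(); it != data.end(); ++it)
        total += *it;                          // iterator access
    std::clock_t t3 = std::clock();

    std::printf("[]=%ld at()=%ld iter=%ld (total %lu)\n",
        (long)(t1 - t0), (long)(t2 - t1), (long)(t3 - t2), (unsigned long)total);
    return 0;
}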
Zoomed-in graph excluding VS2008.
Boost v1.57 Vector Read Access built with Visual Studio
Elapsed #Ticks | Ratio | Container Object | Bounds Checking | Compiler/Target |
Update Performance obj[i] = obj[i] + n | | | | |
7,598 | 8.2 | array[i] | Yes | VS2008/x64 |
7,614 | 8.3 | array[i] | Yes | VS2008/x86 |
925 | 1.0 | array[i] | No | VS2010/x64 |
1,648 | 1.8 | array[i] | No | VS2010/x86 |
939 | 1.0 | array[i] | No | VS2013/x64 |
1,652 | 1.8 | array[i] | No | VS2013/x86 |
15,997 | 17.4 | DataVector[i] | No | VS2008/x64 |
20,378 | 22.1 | DataVector[i] | No | VS2008/x86 |
925 | 1.0 | DataVector[i] | No | VS2010/x64 |
1,888 | 2.0 | DataVector[i] | No | VS2010/x86 |
942 | 1.0 | DataVector[i] | No | VS2013/x64 |
1,767 | 1.9 | DataVector[i] | No | VS2013/x86 |
28,240 | 30.6 | std::vector[i] | Yes | VS2008/x64 |
38,503 | 41.8 | std::vector[i] | Yes | VS2008/x86 |
936 | 1.0 | std::vector[i] | No | VS2010/x64 |
1,915 | 2.1 | std::vector[i] | No | VS2010/x86 |
943 | 1.0 | std::vector[i] | No | VS2013/x64 |
1,804 | 2.0 | std::vector[i] | No | VS2013/x86 |
21,249 | 23.0 | boost::vector[i] | Yes | VS2008/x64 |
33,801 | 36.7 | boost::vector[i] | Yes | VS2008/x86 |
923 | 1.0 | boost::vector[i] | No | VS2010/x64 |
1,839 | 2.0 | boost::vector[i] | No | VS2010/x86 |
922 | 1.0 | boost::vector[i] | No | VS2013/x64 |
1,978 | 2.1 | boost::vector[i] | No | VS2013/x86 |
Elapsed #Ticks | Ratio | Container Object | Bounds Checking | Compiler/Target |
std::vector Read Access | | | | |
21,625 | 15.5 | std::vector[i] | Yes | VS2008/x64 |
26,443 | 18.9 | std::vector[i] | Yes | VS2008/x86 |
1,525 | 1.1 | std::vector[i] | No | VS2010/x64 |
1,611 | 1.2 | std::vector[i] | No | VS2010/x86 |
1,522 | 1.1 | std::vector[i] | No | VS2013/x64 |
1,397 | 1.0 | std::vector[i] | No | VS2013/x86 |
196,639 | 140.8 | std::vector.at(i) | Yes | VS2008/x64 |
105,111 | 75.2 | std::vector.at(i) | Yes | VS2008/x86 |
1,914 | 1.4 | std::vector.at(i) | Yes | VS2010/x64 |
1,846 | 1.3 | std::vector.at(i) | Yes | VS2010/x86 |
1,914 | 1.4 | std::vector.at(i) | Yes | VS2013/x64 |
1,765 | 1.3 | std::vector.at(i) | Yes | VS2013/x86 |
48,673 | 34.8 | std::vector::iterator | Yes | VS2008/x64 |
56,947 | 40.8 | std::vector::iterator | Yes | VS2008/x86 |
1,863 | 1.3 | std::vector::iterator | No | VS2010/x64 |
1,767 | 1.3 | std::vector::iterator | No | VS2010/x86 |
1,870 | 1.3 | std::vector::iterator | No | VS2013/x64 |
1,987 | 1.4 | std::vector::iterator | No | VS2013/x86 |
Elapsed #Ticks | Ratio | Container Object | Bounds Checking | Compiler/Target |
boost::vector Read Access | | | | |
15,753 | 11.3 | boost::vector[i] | No | VS2008/x64 |
23,315 | 16.8 | boost::vector[i] | No | VS2008/x86 |
1,520 | 1.1 | boost::vector[i] | No | VS2010/x64 |
1,607 | 1.2 | boost::vector[i] | No | VS2010/x86 |
1,521 | 1.1 | boost::vector[i] | No | VS2013/x64 |
1,390 | 1.0 | boost::vector[i] | No | VS2013/x86 |
19,151 | 13.8 | boost::vector.at(i) | Yes | VS2008/x64 |
30,230 | 21.7 | boost::vector.at(i) | Yes | VS2008/x86 |
1,912 | 1.4 | boost::vector.at(i) | Yes | VS2010/x64 |
1,756 | 1.3 | boost::vector.at(i) | Yes | VS2010/x86 |
1,913 | 1.4 | boost::vector.at(i) | Yes | VS2013/x64 |
1,770 | 1.3 | boost::vector.at(i) | Yes | VS2013/x86 |
16,863 | 12.1 | boost::vector::iterator | No | VS2008/x64 |
24,542 | 17.7 | boost::vector::iterator | No | VS2008/x86 |
1,914 | 1.4 | boost::vector::iterator | No | VS2010/x64 |
1,766 | 1.3 | boost::vector::iterator | No | VS2010/x86 |
1,869 | 1.3 | boost::vector::iterator | No | VS2013/x64 |
1,987 | 1.4 | boost::vector::iterator | No | VS2013/x86 |
Heap, Map, MultiMap Comparison | Download Source code (Sept 2013): perf-hash-map-src.zip |
Containers covered in this comparison:
Hash Map | stdext::hash_map<std::string, int> |
Map | std::map<std::string, int> |
Hash MultiMap | stdext::hash_multimap<std::string, int> |
Multimap | std::multimap<std::string, int> |
Vector | std::vector< std::pair<std::string, int> > |
Deque | std::deque< std::pair<std::string, int> > |
HashAlloc | stdext::hash_map with custom allocator |
MapAlloc | std::map with custom allocator |
The following Boost 1.57 containers have been added to the test suite; a declaration sketch for the tested types appears after the list.
Map | boost::container::map<std::string, int> |
Flat Map | boost::container::flat_map<std::string, int> |
Multimap | boost::container::multimap<std::string, int> |
Vector | boost::container::vector< std::pair<std::string, int> > |
Deque | boost::container::deque< std::pair<std::string, int> > |
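The container types listed above correspond roughly to the following declarations (a sketch; the headers and typedef names are illustrative, and stdext::hash_map is the VS-specific pre-C++11 hash container):

#include <deque>
#include <hash_map>                                   // stdext::hash_map / hash_multimap (VS)
#include <map>
#include <string>
#include <utility>
#include <vector>
#include <boost/container/deque.hpp>
#include <boost/container/flat_map.hpp>
#include <boost/container/map.hpp>                    // boost map and multimap
#include <boost/container/vector.hpp>

typedef std::pair<std::string, int>                   PairSI;

typedef stdext::hash_map<std::string, int>            HashMap_t;
typedef std::map<std::string, int>                    Map_t;
typedef stdext::hash_multimap<std::string, int>       HashMultiMap_t;
typedef std::multimap<std::string, int>               MultiMap_t;
typedef std::vector<PairSI>                           Vector_t;
typedef std::deque<PairSI>                            Deque_t;

typedef boost::container::map<std::string, int>       BoostMap_t;
typedef boost::container::flat_map<std::string, int>  BoostFlatMap_t;
typedef boost::container::multimap<std::string, int>  BoostMultiMap_t;
typedef boost::container::vector<PairSI>              BoostVector_t;
typedef boost::container::deque<PairSI>               BoostDeque_t;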
///////////////////////////////////////////////////////////////////////////////
/// Exercise ordered containers
///
/// 1. Add words to container until it has 'groupSize' elements.
/// 2. Scan remainder of words counting duplicates.
/// 3. After scanning sLookupScale * groupSize words, clear list and goto #1
template <typename Container_t>
size_t Test(const Words& words, size_t groupSize, const char* objName, size_t& mallocCnt)
{
    Container_t container;
    size_t lookupCnt = 0;
    size_t cnt = words.size();

    // One-shot pre-allocation value used to allocate the next set of data nodes.
    // Afterwards it goes back to sAllocRate, which is 16.
    container.get_allocator().SetPrealloc(groupSize);

    typename Container_t::iterator found;
    for (size_t idx = 0; idx < cnt; idx++)
    {
        const std::string& word = words[idx];
        if (container.size() < groupSize)
        {
            // Fill - find dup and count, else add new word.
            found = container.find(word);
            if (found != container.end())
                found->second++;                        // count duplicates
            else
                container.insert(PairSI(words[idx], 1));
        }
        else
        {
            // Find dup and count, else ignore.
            found = container.find(word);
            if (found != container.end())
                found->second++;                        // only count duplicates

            // Occasionally empty list and repeat fill cycle.
            if (++lookupCnt == groupSize * sLookupScale)
            {
                lookupCnt = 0;
                container.clear();
            }
        }
    }

    mallocCnt = container.get_allocator().GetAllocCnt();

    // The following is diagnostic code to sum the word counts and return the value
    // so it can be compared across containers to make sure all instances return the same result.
    size_t total = 0;
    for (typename Container_t::const_iterator iter = container.begin(); iter != container.end(); ++iter)
    {
        total += iter->second;
    }
    return total;
}
The test engine above is geared toward this usage pattern, periodically clearing and refilling the container.
The custom allocator uses a reference-counted SharedInfo object to track its private data. This shared object allows containers like std::map to create multiple allocator instances for different element sizes and still report statistics to the client code and clean up correctly. A map can create three different allocators, which come and go independently of their allocation requests, so the state information could be lost by the time you are done using the map if it were not shared between the allocators.
Even with the shared information, the allocator is still associated with a single container; it is not a global pool used across multiple STL containers. The custom allocator has no thread protection, and STL containers are not thread safe.
Custom Allocator (for node containers such as Map, MultiMap, HashMap, etc):
///////////////////////////////////////////////////////////////////////////////
// Interface for Allocator
class IAlloc
{
public:
    virtual size_t GetAllocCnt() const = 0;

    typedef size_t size_type;

    // Optionally
    // One-shot pre-allocation value used to allocate the next set of data nodes.
    // Afterwards it goes back to sAllocRate, which is 16.
    virtual void SetPrealloc(size_type numElements) { }
};

///////////////////////////////////////////////////////////////////////////////
// Custom allocator - groups allocations in chunks of 16.
// Recycles deleted nodes by placing them on a free list and using them for allocations.
template <class TT>
class PoolAlloc : public IAlloc
{
    // Rate to allocate extra objects.
    static const unsigned sAllocRate = 16;

public:
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    typedef TT* pointer;
    typedef const TT* const_pointer;
    typedef TT& reference;
    typedef const TT& const_reference;
    typedef TT value_type;

    template <class UU> struct rebind { typedef PoolAlloc<UU> other; };

    // Wrapper to add singly linked list pointers to track the next
    // free object and allocated pointers.
    class Wrapper
    {
    public:
        Wrapper() : pNext(0), pAlloc(0) { }
        Wrapper* pNext;
        Wrapper* pAlloc;

        // Place client data at end since its size can vary because
        // std::map has several internal lists of varying structure.
#if 1
        char value[sizeof(TT)];   // reserve size for TT, avoid its constructor
#else
        TT value;                 // Safer, but slower
#endif
    };

    // Allocator shared state information.
    class SharedInfo
    {
    public:
        SharedInfo() : allocCnt(0), numToAlloc(sAllocRate), refCnt(1), pAlloc(0) { }
        ~SharedInfo() { FreeList(); }

        size_t   allocCnt;    // DEBUG - track number of allocations
        size_t   numToAlloc;  // # of allocations in next malloc, used to preallocate.
        int      refCnt;      // Reference counter, std::map has multiple internal lists which share this info.
        Wrapper* pAlloc;      // List of all allocations

        void FreeList()
        {
            // Free any allocations (follow pAlloc linked list)
            while (pAlloc != NULL)
            {
                Wrapper* pNext = pAlloc->pAlloc;
                free(pAlloc);
                pAlloc = pNext;
            }
        }
    };

    SharedInfo* pSharedInfo;
    Wrapper*    pFree;

    PoolAlloc() throw() : pSharedInfo(new SharedInfo()), pFree(0) { }
    PoolAlloc(const PoolAlloc& other) : pSharedInfo(other.pSharedInfo), pFree(0) { pSharedInfo->refCnt++; }
    template <class U>
    PoolAlloc(const PoolAlloc<U>& other) : pSharedInfo((SharedInfo*)other.pSharedInfo), pFree(0) { pSharedInfo->refCnt++; }

    bool operator==(const PoolAlloc& other) { return pSharedInfo == other.pSharedInfo; }

    ~PoolAlloc() throw() { if (--pSharedInfo->refCnt == 0) delete pSharedInfo; }

    // Return number of calls to malloc executed so far.
    size_t GetAllocCnt() const { return pSharedInfo->allocCnt; }

    // One-shot pre-allocation value used to allocate the next set of data nodes.
    // Afterwards it goes back to sAllocRate, which is 16.
    void SetPrealloc(size_type numToAlloc) { pSharedInfo->numToAlloc = numToAlloc; }

    // Pre-allocate nodes on the free list.
    void Preallocate(size_type numElements)
    {
        pSharedInfo->allocCnt++;
        // malloc is faster than new[], avoids constructors
        Wrapper* pWrapper = (Wrapper*)malloc(numElements * sizeof(Wrapper));
        pWrapper->pNext = 0;

        // Keep track of allocated memory, wire up linked list.
        pWrapper->pAlloc = pSharedInfo->pAlloc;
        pSharedInfo->pAlloc = pWrapper;

        // Split up allocation and store on free list.
        for (unsigned idx = 0; idx != numElements; idx++)
        {
            pWrapper->pNext = pFree;
            pFree = pWrapper;
            pWrapper++;
        }
    }

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }

    pointer allocate(size_type numElements, typename PoolAlloc<TT>::const_pointer hint = 0)
    {
        if (numElements != 1)
        {
            // Pass multiple-element allocations directly to malloc, no special handling.
            pSharedInfo->allocCnt++;
            return static_cast<pointer>(malloc(numElements * sizeof(TT)));
        }

        if (pFree == NULL)
        {
            if (pSharedInfo->allocCnt > 4)
            {
                // Once basic setup is complete, allocate in larger chunks.
                Preallocate(pSharedInfo->numToAlloc);
                pSharedInfo->numToAlloc = sAllocRate;
            }
            else
            {
                // During setup, allocate in small chunks.
                Preallocate(1);
            }
        }

        if (pFree == NULL)
            return 0;   // or should we throw.

        Wrapper* pWrapper = pFree;
        pFree = pWrapper->pNext;
        pWrapper->pNext = (Wrapper*)0;
        return (TT*)&pWrapper->value;
    }

    void deallocate(pointer ptr, size_type numElements)
    {
        if (numElements != 1)
        {
            // Pass multi-element deallocations directly to free.
            free(ptr);
        }
        else
        {
            // Place memory on free list.
            const size_t sToWrapper = offsetof(Wrapper, value);
            Wrapper* pWrapper = (Wrapper*)((char*)ptr - sToWrapper);
            pWrapper->pNext = pFree;
            pFree = pWrapper;
        }
    }

    size_type max_size() const throw() { return size_t(-1) / sizeof(value_type); }

    void construct(pointer p, const TT& val) { new(static_cast<void*>(p)) TT(val); }
    void construct(pointer p)                { new(static_cast<void*>(p)) TT(); }
    void destroy(pointer p)                  { p->~TT(); }
};
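A usage sketch for PoolAlloc (the typedef and function names here are illustrative): the map rebinds the allocator to its internal node type, and because the copies share one SharedInfo, the preallocation hint and the allocation counter remain visible through get_allocator().

#include <functional>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, int, std::less<std::string>,
                 PoolAlloc<std::pair<const std::string, int> > > PoolMap_t;

void PoolMapExample()
{
    PoolMap_t counts;
    counts.get_allocator().SetPrealloc(1000);       // one-shot hint before the fill phase
    counts.insert(std::make_pair(std::string("word"), 1));
    size_t mallocCalls = counts.get_allocator().GetAllocCnt();
    (void)mallocCalls;                              // number of malloc calls so far
}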
///////////////////////////////////////////////////////////////////////////////
// Custom allocator - counts memory allocation calls.
template <class T>
class CountAlloc
{
public:
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef T value_type;

    template <class U> struct rebind { typedef CountAlloc<U> other; };

    struct SharedInfo
    {
        SharedInfo() : allocCnt(0), refCnt(1) { }
        size_t allocCnt;
        int    refCnt;
    };

    SharedInfo* pSharedInfo;

    CountAlloc() throw() : pSharedInfo(new SharedInfo()) { }
    CountAlloc(const CountAlloc& other) : pSharedInfo(other.pSharedInfo) { pSharedInfo->refCnt++; }
    template <class U>
    CountAlloc(const CountAlloc<U>& other) : pSharedInfo((SharedInfo*)other.pSharedInfo) { pSharedInfo->refCnt++; }

    bool operator==(const CountAlloc& other) { return this == &other; }

    ~CountAlloc() { if (--pSharedInfo->refCnt == 0) delete pSharedInfo; }

    size_t GetAllocCnt() const { return pSharedInfo->allocCnt; }

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }

    pointer allocate(size_type size, typename CountAlloc<T>::const_pointer hint = 0)
    {
        pSharedInfo->allocCnt++;
        return static_cast<pointer>(malloc(size * sizeof(T)));
    }

    void deallocate(pointer p, size_type n) { free(p); }

    size_type max_size() const throw() { return size_t(-1) / sizeof(value_type); }

    void construct(pointer p, const T& val) { new(static_cast<void*>(p)) T(val); }
    void construct(pointer p)               { new(static_cast<void*>(p)) T(); }
    void destroy(pointer p)                 { p->~T(); }
};
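And a similar sketch for CountAlloc (typedef name illustrative), which simply counts the calls into malloc:

#include <cstdio>
#include <functional>
#include <map>
#include <string>

typedef std::map<std::string, int, std::less<std::string>,
                 CountAlloc<std::pair<const std::string, int> > > CountedMap_t;

int main()
{
    CountedMap_t counts;
    counts["apple"]++;
    counts["banana"]++;
    counts["apple"]++;
    // The copy returned by get_allocator() shares SharedInfo, so the counter is live.
    std::printf("malloc calls: %lu\n", (unsigned long)counts.get_allocator().GetAllocCnt());
    return 0;
}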
Input Word Analysis: | |
40743 | words |
1409 | unique words |
28 | average #duplicates |
6 | average length |
39 | maximum length |
Group Size is the maximum size to which the container is allowed to grow while counting duplicate words. The test input contains 1409 unique words, so any container smaller than this will fail to count some words. The larger the Group Size, the larger the container and the harder it must work to find duplicates.
The graph and report plot the ratio of the elapsed time of the test divided by the elapsed time of the fastest test.
Smaller ratio values are faster, with 1.00 the fastest. A ratio value of 2 is twice as slow as the fastest.
The following results were collected with 4 background threads running the same test in parallel with the foreground test. The background threads stress the memory subsystem to make memory allocation more expensive. The tests are repeated 4 times and averaged. The tests are built with Visual Studio 2010 and 2013 and run on Windows 7/x64.
Microsoft Standard Template Library
Performance (1.00=fastest) - Ratio of elapsedTicks/fastestTicks(15,634,428):
Group Size | HashMap | HashMapAlloc | HashMmap | Map | MapAlloc | MultiMap | Vector | SortedVec | DequeFront | DequeBack |
100 | 1.45 | 1.00 | 1.35 | 1.98 | 1.66 | 1.94 | 5.85 | 2.74 | 11.87 | 10.55 |
500 | 1.65 | 1.28 | 1.66 | 2.71 | 2.43 | 2.74 | 28.41 | 8.51 | 61.06 | 52.31 |
1000 | 1.90 | 1.78 | 1.81 | 3.41 | 3.50 | 3.87 | 45.22 | 21.39 | 134.60 | 86.83 |
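For example, the HashMap result at Group Size 100 has a ratio of 1.45, which corresponds to roughly 1.45 × 15,634,428 ≈ 22.7 million elapsed ticks.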
Memory - Number of memory allocation calls
HashMapAlloc and MapAlloc use the custom allocator, which reduces the number of memory allocation calls.
Group Size | HashMap | HashMapAlloc | HashMmap | Map | MapAlloc | MultiMap | Vector | SortedVec | DequeFront | DequeBack |
100 | 7204 | 8 | 7204 | 7201 | 6 | 7201 | 1 | 1 | 7561 | 7561 |
500 | 6504 | 8 | 6504 | 6501 | 6 | 6501 | 1 | 1 | 6592 | 6592 |
1000 | 6005 | 9 | 6005 | 6001 | 6 | 6001 | 1 | 1 | 6049 | 6049 |
Boost 1.57 Container Performance
Performance (1.00=fastest) - Ratio of elapsedTicks/fastestTicks(29,787,329):
Group Size | map | flat_map | MultiMap | Vector | DequeFront | DequeBack |
100 | 1.06 | 1.09 | 1.00 | 3.20 | 6.08 | 5.22 |
500 | 1.69 | 2.22 | 1.58 | 16.97 | 27.36 | 24.43 |
1000 | 2.14 | 3.38 | 2.30 | 25.57 | 56.92 | 36.63 |
Memory - Number of memory allocation calls
Group Size | map | flat_map | MultiMap | Vector | DequeFront | DequeBack |
100 | 7200 | 8 | 7200 | 1 | 601 | 577 |
500 | 6500 | 10 | 6500 | 1 | 543 | 534 |
1000 | 6000 | 11 | 6000 | 1 | 501 | 499 |
Compiler | Fastest Tick Count |
vs2013/x64 | 15,634,428 |
vs2010/x64 | 17,069,494 |
vs2008/x64 | 58,195,100 |