Walking the key-value container
The recent discussion in comments to my latest articles (Fun with enumerators - Walking the (integer) list, On bad examples and smelly code) caused a shift in my perspective. I was always treating my TGpIntegerObjectList and TGpInt64ObjectList as lists with some baggage attached, but in practice I'm almost never using them that way. Most of the time I treat them as a background storage for key-value pairs. What's the difference? Most of the time, I don't care about item indices. When I use my lists as containers, I never need to know where in the list some (Key, Value) pair is stored. OK, almost never. When I'm deleting from the list, I sometimes use the index, just for the performance purposes. And when I access the Value part, I have to find the index with IndexOf and then use it to reference the Objects property. There are probably other cases too - but that's something that you just have to take into account if you want to use a list as a storage media. From time to time I'm extending lists in my GpLists unit with small wrappers that help me not to use list indices in some specific situation. Today, I used the new Walk enumerator in some code and asked myself: "Why does it have to return list index? Why couldn't it return a (Key, Value) pair?" Good question. Why couldn't it? It turns out that it could. A little enumerator that couldLet's set up some assumptions first. Assume that I have this pointless list. il := TGpIntegerObjectList.Create;Further assume that I want to walk over it and display both number and text for each item. I can do this with a standard loop. for idx := 0 to il.Count - 1 doOr with my index-returning walker. for idx in il.Walk doBut what I'd really like to do is. for kv in il.WalkKV doOr even better. for kv in il.WalkKV do In other words, I want to return not a single item, but a pair of items from the enumerator. Of course, Delphi expressions can only return a single result and not a tuple so we have to provide a wrapper for enumerated (Key, Value) pairs. We have to pack them in a record, class or interface. Getting all self-destructiveMy first idea was to return an interface to a key-value object from the enumerator. IGpKeyValue = interface That surely works fine, but guess what - it's incredibly slow. I wouldn't expect anything else - after all an object is allocated for every enumerated value, plus all that complications with interface reference counting... I did some testing, of course. Thousand iterations over a list with 10.000 elements. Results are quite interesting. 5 milliseconds for a standard for loop. 50 milliseconds for my Walk enumerator. 5 seconds for interface-based WalkKV. Ouch! Let's return to the drawing board... One allocation is enoughMy next idea was to return not an interface, but an object. When you return an object, you actually return a pointer to the object data, which is quite fast. It would not help much if I would recreate this object every time the GetCurrent is called, but luckily there is no real reason to do that. I can create the object when enumerator is created and destroy it when enumerator is destroyed. TGpKeyValue = class BTW, you can see another trick in this implementation - enumeration by delegation. I'm reusing my Walk enumerator to do the actual walking. That works much faster than the interface-based version - 300 ms for my test case. It's still six times slower than the Walk enumerator, though, and it is not really obvious why the difference is so big. Leave that be, for a moment. The third approach would be to use a record to store the current (Key, Value) pair. Then there is no allocation/deallocation at all, but resulting code is not faster. Record-based enumerator needs about 500 ms to run the test case. This slowdown occurs because record is not returned as a pointer, but as a full copy. In the class-based scenario, GetCurrent returns a pointer to the TGpKeyValue object and that pointer is passed in a register. In the record version, GetCurrent returns not a pointer to the record, but the record itself - it copies full record to the stack so the caller can use this copy - and that is waaaay slower. The speed differenceLet's return to that major speed difference between Walk and WalkKV. I looked at the code, but couldn't find any good reason. Then I turned to the CPU view and it was evident. The problem lied not in the enumerator, but in my poorly designed benchmarking code :( You see, I was timing multiple repetitions of these three loops: for idx := 1 to 10000 do Three empty loops, so I'm timing just the enumeration, yes? No! First loop just runs from 1 to 10000. Trivial job and compiler will generate great code. Second loop does the same, but with more overhead. Third loop does much more than that. It also accesses il[] and il.Objects[] (inside the GetCurrent). In reality, code inside the for statement in the first two cases would have to access il[] and il.Objects[] by itself. The code inside the third for statement has no need for that - it gets data neatly prepared in kv.Key and kv.Value. I changed the loops to: for idx := 0 to il.Count - 1 do begin I'm just accessing and throwing the Items[]/Key information away and then I copy the Objects[]/Value information into the local variable. All loops are now running comparable jobs. Results are now significantly different. Simple for loop needs 300 ms to run the test, Walk version needs 310 ms (really small difference) and WalkKV version needs 470 ms. It is still slower, but by less than the factor of two. If there would be a real code inside the for loop, the difference would be unnoticeable. Morale? You should benchmark, but you should also check that you're benchmarking the right thing! Syntactic sugarThe generic version of WalkKV only supports this kind of code: for kv in il.WalkKV do But there's a really neat trick to change this into for kv in il.WalkKV do without modifying the WalkKV itself. Can you guess it? Class helpers, of course. You just have to declare a simple helper in the WalkKV consumer (no need to change the WalkKV itself) and you can use the StrValue instead of TGpString(kv.Value).Value. TGpKeyStrValue = class helper for TGpKeyValue Conclusion: class helpers are great. Key-value walker is useful. Enumerator-induced performance loss is negligible. Enumerators are fun. --- I've tagged this and all previous enumerator-related articles with the enumerators label so you can now access them all at once via this simple link. Labels: best practices, Delphi, enumerators, programming, source code, utilities |
3 Comments:
Oh dear, class helpers again.
Why is it so important to not have to modify the class you are "helping", if you "own the class yourself.
All you have done it make it harder to use that class (since you presumably place the helper in a separate unit, otherwise the avoidance of modifying the source itself is a bit pointless).
So now code that uses that class has to use two units, instead of just using the unit that contains the class.
And then when someone else want to add some more capabilities to that class without modifying the source, now you have 3 units to use in order to use one class.
And nothing in the actual source of the class is going to tell you which other units you need in order to "complete" the class.
Which is why class helpers (and partial classes, which suffer very similar issues) are a great hack for modifying classes to which you don't have source access, but are a flat out CRAZY idea if you have access to that source.
You just create usage and maintenance headaches for your future.
And as I have commented on before, code spends the majority of it's life being used and maintained, and only a vanishingly small fraction of it's life being created.
Other than that, a GREAT example this time.
:)
No, that's definitely NOT what I had in mind by introducing the class helper.
There is unit GpLists with my lists, enumerators and with the TGpKeyValue class.
Then you use the TGpIntegerObjectList class in some other unit, let's call it unit A. You store some objects inside Objects[] property. In my example I stored TGpString there.
In your code in unit A that uses TGpString stored in Objects[], you can use
for kv in list do
DoSomething(TGpString(kv.Value).Value);
To simplify this, you can introduce the class helper - and you implement it in the same unit A! So it is defined (it even doesn't have to be declared in the interface section) and used in the same unit. It only serves as a kind of strong-typed macro, nothing more.
There is no sense in modifying GpLists unit to achieve the same effect, I think you'll admit to that.
IMO, if you own the class you want to help - derive and override instead of "helping".
A stumbling block for class helpers is that you can only help the class once in the same scope. In other words - you can kiss polymorphism goodbye for that class in that scope.
IMO, class helpers - although nice - should be considered a last resort.
Post a Comment
Links to this post:
Create a Link
<< Home