OFX Library Roadmap

I recently pulled out an old coding project I had started almost 10 years ago involving OFX files.  OFX files are used by Money and Quicken to import transactions from banking and investment websites.  The plan now is as it was then, to create a utility to convert CSV files to OFX files for websites that have given up on OFX support.

Back then, I was at a serious disadvantage because I was reverse-engineering the files.  Despite the name, the Open Financial Exchange organization was not very open about sharing the specifications for the OFX file format.  This time around, I was able to find and download the specification files and have been building my object model in accordance with them.

It’s not an easy spec to digest, for sure.  But, it’s more than what I had and I’m able to make some significant progress.  After a few focused days of coding, I am able to create OFX files for Banking and Credit Card transaction downloads.  Currently, I am building out the Investment download, which is as big as everything I’ve done so far.

So, I wanted to pause for a moment and write out what I want to accomplish in the near and distant future.  Here’s a short list of goals:

  1. Create banking transaction download (X)
  2. Create credit card transaction download (X)
  3. Create investment transaction download
  4. Create validation functions
  5. Create import functions
  6. Support v2.0+ format
  7. Support direct serialization/deserialization

As you can see, it’s not a short path, and there’s still plenty to do.  With that, back to coding.

Results from Phase 1

After completing the goals of downloading the three major transaction lists, I grabbed the broom and cleaned up the mess I had made.  A clean shop will make going forward easier.  So, upon reflection, what’s good and bad?

I started the code with little to no understanding of the model I was coding.  Because of that, I was not able to preemptively design namespaces and so my class names were long and redundant, like InvestmentPositionBuyMutualFund.  This specific case was refactored to OFX.Investment.Position.BuyMutualFund.  Creating namespaces simplified my naming structure greatly.  I was not able to really think big-picture until I had all the classes coded.  Even before creating the namespaces, I had to physically organize the classes into folders.

I started out aggressively trying to reuse classes as much as possible, which produced some class hierarchies that made for a poor developer experience.  The biggest example was when I had a property of a base class type.  Yeah, it was sort of good that it was limited to only the classes that inherited from it, so it was sort of discoverable, but it meant casting it to the specific type every time you wanted to work with it, which really sucked.  I was trying to be clever with automatic instantiation with a factory, but in the end, it was better to drop that property from the base class and move an instance of it to each specific class.

As I refined the code, I was pleased that I took the time to automatically instantiate any objects that were needed through properties.  Having to assign an instance to a property before using it is one less thing you have to worry about.  And as I write more detailed unit tests, I should be able to find any cases where I forgot to do this.
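Here is a minimal sketch of what I mean by automatic instantiation.  The class and member names (BankAccount, StatementRequest, Account) are made up for illustration and are not the library's actual classes; the point is only that the property getter creates the object on first use.

```vb
Imports System

' Hypothetical classes for illustration only.
Public Class BankAccount
    Public AccountId As String
End Class

Public Class StatementRequest
    Private _account As BankAccount

    ' The getter creates the child object the first time it is asked for,
    ' so callers never receive Nothing and never have to assign an instance.
    Public ReadOnly Property Account() As BankAccount
        Get
            If _account Is Nothing Then _account = New BankAccount()
            Return _account
        End Get
    End Property
End Class

Module AutoInstantiateDemo
    Sub Main()
        Dim request As New StatementRequest()
        ' No "New BankAccount" needed before using the property.
        request.Account.AccountId = "12345"
        Console.WriteLine(request.Account.AccountId)   ' prints 12345
    End Sub
End Module
```

A unit test for a forgotten case could be as simple as asserting that each such property is not Nothing on a freshly constructed object.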

The unit tests have been very helpful for me to test out my changes and my refactorings as I go.  In actuality, the unit test code is pretty much what a developer is going to be doing with the library anyway, so it’s also documentation.

So what’s bad?  The OFX schema is pretty ridiculous.  Even in my unit test coding, I was using an existing OFX file to determine what properties I needed to set.  There’s going to need to be a lot of documentation.  I think I can assist by giving classes constructors that take all the required fields for the node.  I know the validate methods will help when I get to them.  So the big bad is that there’s no real way to simplify this library.
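As a rough sketch of the constructor idea: the class below is hypothetical, and both the field names and my choice of which fields count as required are assumptions for illustration, not quotes from the spec or from the library.

```vb
Imports System

' Hypothetical class; names and the set of "required" fields are assumptions.
Public Class StatementTransaction
    Public TransactionType As String
    Public DatePosted As Date
    Public Amount As Decimal
    Public TransactionId As String
    Public Memo As String   ' optional, so it is not a constructor argument

    ' The constructor signature documents what must be supplied:
    ' an instance cannot be created without the required values.
    Public Sub New(ByVal transactionType As String, ByVal datePosted As Date, _
                   ByVal amount As Decimal, ByVal transactionId As String)
        Me.TransactionType = transactionType
        Me.DatePosted = datePosted
        Me.Amount = amount
        Me.TransactionId = transactionId
    End Sub
End Class

Module ConstructorDemo
    Sub Main()
        Dim t As New StatementTransaction("DEBIT", #1/15/2024#, -42.5D, "0001")
        t.Memo = "Coffee"   ' optional values are still set through the field
        Console.WriteLine(t.TransactionType & " " & t.TransactionId & " " & t.Memo)
    End Sub
End Module
```

IntelliSense then shows the required values at the point of construction, which takes some pressure off the documentation.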

The only other thing I’m not happy about is how much more there is to do.  I have a lot more classes to write and I have a lot of classes written that aren’t being used (yet).  I don’t have any practical way of testing these.  Well, I do, but it’s not practical.  The way I have to test them is to create a new file in Money, and create OFX files that simulate transactions that I can verify got imported correctly.  Unit tests, you are going to get a WORKOUT.

Uneventful

After reading the post about events and AddHandler and tracking references and hanging objects, you might be a little concerned about using events at all, maybe because of the added code or the housekeeping.  If so, you can consider another way of doing event-type behavior using Delegates.

Delegates can be used many different ways.  This particular way might be a little different than what you’ve previously seen.  The general concept is to keep a variable holding all the delegates to be called for an event.  This variable (as a delegate itself) can be invoked at any time, and more importantly, can be cleared at any time.

Reusing some old code from the previous post, we still have our HangingObject class and we now have this method to build the objects and call their internal method:

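    Private Sub UsingDelegate()
        Dim o As HangingObject

        ' Start clean: drop any targets left over from a previous call.
        HelloList = [Delegate].RemoveAll(HelloList, HelloList)

        For i As Integer = 1 To 5
            o = New HangingObject
            ' Append this object's SayHello to the invocation list.
            HelloList = [Delegate].Combine(HelloList, New SayHelloDelegate(AddressOf o.SayHello))
        Next

        ' Call every target in the list, in the order they were added.
        HelloList.DynamicInvoke(Nothing)

    End Sub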

And at the form level, we have the delegate sub definition and the delegate variable holding all the calls.

    Public Delegate Sub SayHelloDelegate()
    Dim HelloList As [Delegate] = Nothing

So what’s different?  We’ve changed the event declaration to a Delegate Sub declaration; we’ve added a variable to hold all the event subscriptions; instead of using AddHandler, we use Delegate.Combine to register the event subscription; and instead of RaiseEvent, we use DynamicInvoke.  There’s not much more or less, but everything is different.  The list of subscribers is accessible now through the variable HelloList, which is a huge benefit.

The two interesting parts of this code are the shared methods on the Delegate class: Combine and RemoveAll.  The delegate itself (HelloList) contains a list of invocation targets, similar to subscribers of an event.  The Combine method merges the specific SayHelloDelegate invocation list with the generic HelloList invocation list, resulting in one list of all the targets.  Calling the DynamicInvoke method performs a .Invoke on all the delegates in the invocation list.  Simple and magical.

However, because HelloList is declared at the form level, it persists between calls and can suffer the same issues as the AddHandler method.  The nice thing is that you can clear the invocation list by using RemoveAll, or you can just set the variable to Nothing.  If HelloList were declared within the method instead of at the form level, it would go out of scope at the end of the method and take its invocation list with it, unlike the AddHandler method, where the event is declared at the form level.
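To make the "accessible" and "clearable" points concrete, here is a small console sketch.  It is an assumption-laden rework of the code above: the delegate variable is a local named helloList, and Console.WriteLine stands in for MsgBox so it can run outside a form.

```vb
Imports System

Public Class HangingObject
    Public Sub SayHello()
        Console.WriteLine("Hello from " & Me.GetHashCode().ToString())
    End Sub
End Class

Module DelegateListDemo
    Public Delegate Sub SayHelloDelegate()

    Sub Main()
        ' Local variable: the whole list disappears when Main ends.
        Dim helloList As [Delegate] = Nothing

        For i As Integer = 1 To 5
            Dim o As New HangingObject()
            helloList = [Delegate].Combine(helloList, New SayHelloDelegate(AddressOf o.SayHello))
        Next

        ' The subscriber list can be inspected, which an Event does not
        ' let outside code do.
        Console.WriteLine(helloList.GetInvocationList().Length)   ' prints 5

        helloList.DynamicInvoke(Nothing)

        ' Clearing the list is just an assignment.
        helloList = Nothing
        Console.WriteLine(helloList Is Nothing)   ' prints True
    End Sub
End Module
```

Once the variable is Nothing, nothing else references the five objects, so they become eligible for garbage collection.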

It’s good to know a lot of different ways to do something, just in case.  Another tool in the coding toolbox.

ByVal, ByRef, Structure, Class, ugh.

In another post I referred to this topic as a phenomenon.  That’s pretty over-dramatic.  But there are some important rules and guidelines for types in .NET and until you know why things happen as they do, it seems like voodoo.  So here’s an all-in-one example to clear it all up.  But, be warned that it’s going to be very confusing until it “clicks”.

A brief prelude:  .NET has two kinds of types, value types and reference types.  A value type variable contains its actual data; it is generally small and lives wherever it is declared (on the stack for a local variable, or inside the containing object for a field).  A reference type variable contains a reference (a pointer) to the actual data, which is stored on the heap and can have an unknown or variable size.  .NET also has two definitions that utilize these types: Structures and Classes.  Structures are value types and Classes are reference types.
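Before getting to method calls, the difference already shows up in plain assignment.  This is a minimal sketch with two made-up types (PointStructure and PointClass) that exist only for this illustration.

```vb
Imports System

' Hypothetical types for illustration only.
Public Structure PointStructure
    Public X As Integer
End Structure

Public Class PointClass
    Public X As Integer
End Class

Module ValueVersusReferenceDemo
    Sub Main()
        ' Value type: assignment copies the data itself.
        Dim s1 As PointStructure
        s1.X = 1
        Dim s2 As PointStructure = s1
        s2.X = 2
        Console.WriteLine(s1.X)   ' prints 1, s1 is unaffected by the change to s2

        ' Reference type: assignment copies the reference, not the object.
        Dim c1 As New PointClass()
        c1.X = 1
        Dim c2 As PointClass = c1
        c2.X = 2
        Console.WriteLine(c1.X)   ' prints 2, c1 and c2 refer to the same object
    End Sub
End Module
```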

Method calls support two parameter keywords: ByVal and ByRef.  ByVal passes the parameter data to the method by value; in other words, it sends a copy of the variable's contents to the method.  ByRef passes the parameter data by reference; it sends a reference to the caller's variable.  The thing that is most misunderstood is that if the parameter type is a reference type (a class) and it is sent ByVal, the value is a copy of the reference.  When the parameter type is a value type (a structure) and is sent ByVal, the value is a copy of the value – a real copy.  Here’s a quick real-world summary of method/parameter behavior:

  • A class passed ByVal – the method works on the original (kind of)
  • A structure passed ByVal – the method works on a copy
  • A class passed ByRef – the method works on the original
  • A structure passed ByRef – the method works on the original

If you’re still following, you might notice some vagueness in the most common method call type: classes passed ByVal.  This will be explained after a demo of these different combinations.  So for demonstration, we create a Class and a Structure with a public field:

Public Class NameClass
    Public Name As String

    Public Sub ChangeName()
        Name &= " (changed)"
    End Sub

End Class

Public Structure NameStructure
    Public Name As String

    Public Sub ChangeName()
        Name &= " (changed)"
    End Sub

End Structure

Each has a method that modifies its internal state.  This will prove whether we are working on a copy or the original.  Next, we make a class that uses these objects and modifies them in private methods with parameters passed in various combinations:

Public Class PersonClass
    Public Name1 As New NameClass
    Public Name2 As New NameStructure

    Public Sub UpdateNames()
        Name1.Name = "Class Name"
        Name2.Name = "Structure Name"

        Debug.WriteLine("Name1 before ByVal: " & Name1.Name)
        UpdateClassByVal(Name1)
        Debug.WriteLine("Name1 after ByVal: " & Name1.Name)

        Debug.WriteLine("Name2 before ByVal: " & Name2.Name)
        UpdateStructureByVal(Name2)
        Debug.WriteLine("Name2 after ByVal: " & Name2.Name)

        Debug.WriteLine("Name1 before ByRef: " & Name1.Name)
        UpdateClassByRef(Name1)
        Debug.WriteLine("Name1 after ByRef: " & Name1.Name)

        Debug.WriteLine("Name2 before ByRef: " & Name2.Name)
        UpdateStructureByRef(Name2)
        Debug.WriteLine("Name2 after ByRef: " & Name2.Name)

    End Sub

    Private Sub UpdateClassByVal(ByVal item As NameClass)
        item.ChangeName()
    End Sub

    Private Sub UpdateStructureByVal(ByVal item As NameStructure)
        item.ChangeName()
    End Sub

    Private Sub UpdateClassByRef(ByRef item As NameClass)
        item.ChangeName()
    End Sub

    Private Sub UpdateStructureByRef(ByRef item As NameStructure)
        item.ChangeName()
    End Sub

End Class

So, after instantiating this class and calling UpdateNames, we get the following results:

    Name1 before ByVal: Class Name
    Name1 after ByVal: Class Name (changed)
    Name2 before ByVal: Structure Name
    Name2 after ByVal: Structure Name
    Name1 before ByRef: Class Name (changed)
    Name1 after ByRef: Class Name (changed) (changed)
    Name2 before ByRef: Structure Name
    Name2 after ByRef: Structure Name (changed)

In the same order as the bullet list above, we can see that Name2 (the structure) passed ByVal did not change, showing that the method was working on a copy.  Everything else remained changed after leaving the method call, showing they were working on the original.

Now to add confusion and clarity to the ambiguity of passing classes ByVal…

When you pass a class to a method ByVal, you are sending a copy of the reference.  That copy still points to the same object as the original reference, so when you change a property through it, you are changing the property on the original object.  This essentially is like working on the original.  However, you cannot change which object the caller's variable refers to.  What?

Here’s another bit of code to add to PersonClass to illustrate:

    Public Sub UpdateObjects()
        Name1.Name = "Class Name"

        Debug.WriteLine("Name1 before ByVal: " & Name1.Name)
        UpdateClassObjectByVal(Name1)
        Debug.WriteLine("Name1 after ByVal: " & Name1.Name)

        Debug.WriteLine("Name1 before ByRef: " & Name1.Name)
        UpdateClassObjectByRef(Name1)
        Debug.WriteLine("Name1 after ByRef: " & Name1.Name)

    End Sub

    Private Sub UpdateClassObjectByVal(ByVal item As NameClass)
        item = New NameClass
        item.Name = "Replaced Class"
    End Sub

    Private Sub UpdateClassObjectByRef(ByRef item As NameClass)
        item = New NameClass
        item.Name = "Replaced Class"

    End Sub

What this code does is try to reassign the value of Name1 to a new instance.  When you call UpdateObjects, you will see you can’t change the instance of Name1 when the parameter is passed ByVal, but you can when it is passed ByRef.

    Name1 before ByVal: Class Name
    Name1 after ByVal: Class Name
    Name1 before ByRef: Class Name
    Name1 after ByRef: Replaced Class

Again, this is because ByVal passes a copy of the reference, whereas ByRef passes the actual reference.  If you reassign the value when it is passed ByVal, you’re only reassigning to a copy, which has no effect on the original.

In real-world usage, using ByVal with classes is going to work for you 99% of the time, but you need to understand why and how things work to handle that odd 1% of cases and avoid crazy workarounds.

Why Won’t You Go Away?!

All .NET developers should know that .NET is pretty reference-happy.  There’s an interesting phenomenon with Structures and Classes that I can illustrate later, but the issue I wanted to point out here involves references and garbage collection.  As we know, .NET manages object cleanup through the garbage collector when there are no more references to the object.  This is kind of nice because you can let objects just fall out of scope and the GC will take care of everything.  That is, if nothing else has a reference to those objects.

One thing to pay special attention to is event handlers.  These create references and can keep objects alive MUCH longer than you want; maybe for the running life of the application.  As a potential scenario, you have a form with a form-level event.  That form has a method that creates some objects that listen for that event.  The method does its stuff and finishes.  You call the method again and suddenly you have twice as many responses to the event.  The objects you set up from the first run still exist and listen for that event.

Here’s some demo code to illustrate this.

The class that responds to the event:

Public Class HangingObject

    Public Sub SayHello()
        MsgBox("Hello from " & Me.GetHashCode.ToString)
    End Sub

End Class

The method that creates the objects and raises the event:

    Private Sub ShowHanging()
        Dim o As HangingObject

        For i As Integer = 1 To 5
            o = New HangingObject
            AddHandler Me.SayHelloEvent, AddressOf o.SayHello

        Next

        RaiseEvent SayHelloEvent()

    End Sub

And the form-level event:

Public Event SayHelloEvent()

So if you call ShowHanging, you get 5 Messageboxes.  If you call it again, you get 10, and so on.

The reason for this is the AddHandler statement.  AddHandler creates a reference to the instance of the object “o” and stores it with the form-level event SayHelloEvent.  When do these references get removed?  Only when the handlers are removed, or when the form itself is no longer referenced and can be collected.  If that form is the main form of the application, that will be when the application ends.

Can you get around this?  Maybe by implementing Dispose and disposing the objects?

Public Class HangingDisposableObject
    Implements IDisposable

    Public Sub SayHello()
        MsgBox("Hello from " & Me.GetHashCode.ToString)
    End Sub

    Private disposedValue As Boolean
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If Not Me.disposedValue Then
            If disposing Then
            End If

        End If
        Me.disposedValue = True
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub

End Class

    Private Sub ShowHanging()
        Dim o As HangingDisposableObject

        For i As Integer = 1 To 5
            o = New HangingDisposableObject
            AddHandler Me.SayHelloEvent, AddressOf o.SayHello
            o.Dispose()
        Next

        RaiseEvent SayHelloEvent()
    End Sub

You’d be surprised.  You still get 5 Messageboxes even though all five were Disposed inside the loop.  Ah, but the garbage collector hasn’t run yet.  So let’s force it to do a collection.

    Private Sub ShowHanging()
        Dim o As HangingDisposableObject

        For i As Integer = 1 To 5
            o = New HangingDisposableObject
            AddHandler Me.SayHelloEvent, AddressOf o.SayHello
            o.Dispose()
        Next

        GC.Collect()
        RaiseEvent SayHelloEvent()
    End Sub

Still surprised?  You shouldn’t be.  The GC won’t collect (Finalize) the objects because there is still a reference to that object (the event handler).  So how do you manage this?  You have to call RemoveHandler to remove the reference.  And that means you have to keep your own references to the objects until you’re done with them and you do the cleanup yourself.

    Private Sub ShowHanging()
        Dim oCollection As New Generic.List(Of HangingObject)
        Dim o As HangingObject

        For i As Integer = 1 To 5
            o = New HangingObject
            AddHandler Me.SayHelloEvent, AddressOf o.SayHello
            oCollection.Add(o)

        Next

        RaiseEvent SayHelloEvent()

        For Each o In oCollection
            RemoveHandler Me.SayHelloEvent, AddressOf o.SayHello
        Next

    End Sub

So IDisposable is not the answer.  Keeping track of your objects and the references they hold is the answer.  And that should be the obvious answer anyway.

Delegatorial Rambling

I see in a sister blog a planned post regarding Delegates in .NET. I have used delegates before, and I think I have seen them misused before. But remaining on the positive, I’ll describe what the delegate capability did for me.

In an application I was writing, I made the decision to extract the product search form from the main UI application; my thought was that it would be a common element that could be used in many different applications in the long-range-planned application suite. My choice was a good one, as I did use it in multiple applications, but it did get modified along the way.

The first version was simple: you search for products, you highlight the product(s) you want, and click "Select". The search form would raise an event that the calling form would handle and process. The search form also had a button to view product details. You could highlight a product and click "View Details" to bring up another form with the product information.

This worked great until I wanted to use it in another part of the same application. In this new section, there was no "Select" function needed, only "View Details". One choice I pondered was making the Select button Public instead of Friend (remember this code is in a shared DLL now). Then the calling form could enable or disable it. I also considered making a Boolean property like HideSelect. That seemed pretty tacky. But my overall goal with designing the code was that it do what the programmer tells it and do the most obvious thing by default.

Going with that mindset, I decided the Select button will do nothing unless you give it something to do. Goodbye Event, hello Delegate. I dropped the public event and created a delegate sub that matched the signature of my old event. Then I created a private variable to hold the delegate and created a public property to Get/Set the delegate.

What benefits did this gain? I was then able to see if the calling form needed the Select button. If the private variable was Nothing, I disabled the Select button. If the private variable was a valid instance, I invoked it when the Select button was clicked. When I was using events, I couldn’t tell if anyone was subscribed to the event so I couldn’t take any action on the Select button.

Back in the calling form, the code change was minimal. Instead of doing an AddHandler statement, I created a new instance of the delegate in the search form using AddressOf in the constructor and set it to the public property on the search form. The target method that handled the event never changed.
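Since I don't have the original code, here is a minimal reconstruction of the Select pattern as a sketch.  Every name in it (ProductSearch, ProductsSelectedDelegate, OnSelect, SelectEnabled, ClickSelect) is made up for illustration, and a plain class stands in for the search form so it runs as a console program.

```vb
Imports System

' Delegate matching the signature of the old event (hypothetical signature).
Public Delegate Sub ProductsSelectedDelegate(ByVal productIds As String())

' Stand-in for the search form in the shared DLL.
Public Class ProductSearch
    Private _onSelect As ProductsSelectedDelegate

    Public Property OnSelect() As ProductsSelectedDelegate
        Get
            Return _onSelect
        End Get
        Set(ByVal value As ProductsSelectedDelegate)
            _onSelect = value
        End Set
    End Property

    ' Stand-in for the Select button's Enabled state:
    ' the button is only useful if someone supplied a delegate.
    Public ReadOnly Property SelectEnabled() As Boolean
        Get
            Return _onSelect IsNot Nothing
        End Get
    End Property

    ' Stand-in for the Select button's Click handler.
    Public Sub ClickSelect(ByVal highlighted As String())
        If _onSelect IsNot Nothing Then _onSelect.Invoke(highlighted)
    End Sub
End Class

Module SelectDemo
    ' The calling form's existing handler; its body does not change.
    Private Sub HandleSelected(ByVal productIds As String())
        Console.WriteLine("Selected: " & String.Join(", ", productIds))
    End Sub

    Sub Main()
        Dim search As New ProductSearch()
        Console.WriteLine(search.SelectEnabled)   ' prints False

        ' Replaces the old AddHandler statement.
        search.OnSelect = New ProductsSelectedDelegate(AddressOf HandleSelected)
        Console.WriteLine(search.SelectEnabled)   ' prints True

        search.ClickSelect(New String() {"A100", "B200"})   ' prints Selected: A100, B200
    End Sub
End Module
```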

That’s one way I used a delegate. The other way I used it was with the View Details button. For another application, the View Details was slightly different. In brief, the application needed to handle the display of the product instead of letting the search form do it. Again, I created a delegate sub, private variable and public property. In the click event of the View Details button, I checked to see if the delegate was Nothing. If it was Nothing, I performed the default action and displayed the product in the typical manner. If it was not nothing, I invoked the delegate instead of doing the default display. In this way, I was able to provide an override for default behavior.
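The View Details override can be sketched the same way.  Again, this is a reconstruction under assumptions: the names (SearchWithDetails, ViewDetailsDelegate, ViewDetailsOverride, ClickViewDetails) are hypothetical, and console output stands in for displaying a product form.

```vb
Imports System

Public Delegate Sub ViewDetailsDelegate(ByVal productId As String)

' Stand-in for the search form; only the View Details part is shown.
Public Class SearchWithDetails
    Private _viewDetailsOverride As ViewDetailsDelegate

    Public Property ViewDetailsOverride() As ViewDetailsDelegate
        Get
            Return _viewDetailsOverride
        End Get
        Set(ByVal value As ViewDetailsDelegate)
            _viewDetailsOverride = value
        End Set
    End Property

    ' Stand-in for the View Details button's Click handler.
    Public Sub ClickViewDetails(ByVal productId As String)
        If _viewDetailsOverride Is Nothing Then
            ' Nobody supplied an override, so do the obvious default thing.
            ShowDefaultDetails(productId)
        Else
            ' The caller wants to handle the display itself.
            _viewDetailsOverride.Invoke(productId)
        End If
    End Sub

    Private Sub ShowDefaultDetails(ByVal productId As String)
        Console.WriteLine("Default details for " & productId)
    End Sub
End Class

Module ViewDetailsDemo
    Private Sub ShowCustomDetails(ByVal productId As String)
        Console.WriteLine("Custom details for " & productId)
    End Sub

    Sub Main()
        Dim search As New SearchWithDetails()
        search.ClickViewDetails("A100")   ' prints Default details for A100

        search.ViewDetailsOverride = New ViewDetailsDelegate(AddressOf ShowCustomDetails)
        search.ClickViewDetails("A100")   ' prints Custom details for A100
    End Sub
End Module
```

Callers that are happy with the default never touch the property, which is the "do the most obvious thing by default" goal.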

Without using delegates, how could that have been done as easily? As I was determining how to implement the override feature, I considered using events, but that means that every calling form would have to handle that event in order to implement the default display action. That went against my design philosophy. I could have used a Boolean property, but…bleh. I insisted to myself that I would not have any shared library that required performing steps in a rigid fashion. Like, oh, the code bombed out. I forgot to call InitControls2() after setting the base properties. Yuck.

So that’s my delegate story. In summary, I used Delegates instead of Events because I was able to test to see if they were needed using IsNothing() and they provided me with a way to optionally inject a call to a remote codebase (that codebase being the calling form). Sorry for no code samples. I don’t have the code with me right now.