2 июня 2023, 1:21

Performance Bottlenecks Hiding on the Client Side

As an iOS engineer, for the most time in my career, I have been “transforming JSONs into beautiful UI”. Compared to backend development, handling large amounts of data and doing performance optimizations are not a typical part of our work. Of course, from time to time, performance does matter — especially to make the UI smooth, but the techniques are often different — such as reusing views or offloading expensive work from the main thread. Additionally, if we want the client to be thin, most of the heavy job is delegated to the server, for example, content ranking, search, filtering and so on.

However, sometimes you still have to perform some expensive operations on the client side — for example, because for privacy reasons you don’t want some local data to leave the device. It’s easy to accidentally make those parts of code extremely inefficient — especially if you haven’t built this muscle of quickly spotting potential complexity issues yet. Algorithms and data structures do matter — this is something I only truly realized only several years into my mobile career, and I still see this thing to be often overlooked in the industry. Of course, early optimization is not needed and may even do harm (see premature optimization), but even basic calculations can become performance bottlenecks that severely harm user experience.

There is only one way to solve it — embrace the basics — which means using appropriate algorithms and data structures for the task at hand. One real example that I always recall is a thing I built for one of my projects many years ago. For an invitation flow, I had to implement a contact merge feature where the data would come from three different sources — backend, social account and local iPhone address book. We wanted to combine contacts from these sources into one if they had any overlapping channels (phone numbers or emails). The result would be an array of contacts with all their channels, so there would be no duplicate channels for two different contacts.

At first, my naive approach was just to go one by one and see if in the remainder of the list any contact has overlapping channels with the current one, merge them if yes, repeat. This was needed because, for example, the last channel in the list could have two channels — one that would overlap with the current one, and the second one that could have appeared in the previous contacts which would mean having to go through the list again.

I implemented this, and it worked pretty reliably, here is the pseudocode:

func slowReliableSmartMerge(contacts: [Contact]) -> [Contact]  {
    var mergedContacts = contacts
    var results = [Contact]()
    var merged = true

    while merged {
        merged = false
        results.removeAll()

        while !mergedContacts.isEmpty {
            var commonContact = mergedContacts.first!
            let restContacts = mergedContacts.dropFirst()

            mergedContacts.removeAll()

            for contact in restContacts {
                if contact.hasNoOverlappingChannels(with: commonContact) {
                    mergedContacts.append(contact)
                } else {
                    merged = true
                    commonContact = Contact.mergedContactFrom(contact: commonContact, otherContact: contact)
                }
            }
            results.append(commonContact)
        }

        mergedContacts = results
    }

    return mergedContacts
}

An experienced engineer would quickly spot the issue here, but plese bear with me for a minute. I tested this on my device which had roughly 150 local contacts, 100 friends on social media, and a couple dozen users from the server. It would finish in just a couple of seconds after showing a spinner — “not a huge deal” I thought and moved on to the next feature. Test devices had much fewer contacts, so it worked instantly there. Then a couple of weeks later we started getting some reports from the users that this spinner can take a minute or even longer. Suddenly I realized that the issue was related to complexity, and then I figured that the approach I had taken could actually hit the O(n^2) complexity — similar to the bubble sort.

I quickly discussed that with another engineer on a whiteboard, and we came up with hashmaps to optimize this significantly:

func smartMerge(contacts: [Contact]) -> [Contact] {
    var channelToContact = [String: Contact]()
    var contactToChannels = [Contact: Set<String>]()

    for contact in contacts {
        var mergedContact = contact

        for channel in contact.allChannels {
            if let matchingContact = channelToContact[channel] {
                if mergedContact !== matchingContact {
                    let mergedMatchingContact = Contact.mergedContactFrom(contact: matchingContact, otherContact: mergedContact)
                    contactToChannels[mergedMatchingContact] = (contactToChannels[mergedContact] ?? []).union((contactToChannels[mergedMatchingContact] ?? []))

                    if let channels = contactToChannels[mergedContact] {
                        for c in channels {
                            channelToContact[c] = mergedMatchingContact
                        }
                    }

                    contactToChannels[mergedContact] = nil

                    mergedContact = mergedMatchingContact
                }
            } else {
                channelToContact[channel] = mergedContact

                if contactToChannels[mergedContact] != nil {
                    contactToChannels[mergedContact]!.insert(channel)
                } else {
                    contactToChannels[mergedContact] = [channel]
                }
            }
        }
    }

    return contactToChannels.keys
}

The eventual complexity was linear, the spinner would just flicker for a split second, and the tests were luckily green.

Since then, I’ve always been much more alert when it comes to doing some computation on the client side that potentially can have a variable-sized input. This all seems to be very obvious to me now, but back in the day this didn’t look too important to me. I think having a proper understanding of the complexity that comes with various algorithms and data structures can make you a much better software engineer which will lead to better products you build. After all, this is how the big tech companies hire — they value coding skills more than knowledge of certain frameworks.

These days, it’s also important for new folks who switch to software engineering from other areas — they often start their career with simple projects that involve UI work or simply connecting the stuff that’s built on top of well-known frameworks. I’d encourage them to also master the core things like algorithms in order to excel at this job.

Поделиться