We all try, in our own ways, to be perfect. But everyone makes mistakes.
The past couple of months have been an interesting time at Palringo. A few times a year the offices converge on Newcastle for a “reset meeting” where we discuss our plans for the next 3-6 months, re-evaluate our current commitments, and scrap anything we no longer believe in.
The “Invite Centre” was originally conceived by Martin Rosinski approx. 6 months ago and presented at one of these meetings. I was tasked with making the largest user-facing project of 2013 (so far) a reality. The “Invite Centre” is how we plan to grow the Palringo community – and is heavily related to the Reputation and Achievements system I have detailed before. The “Invite Centre” would be the “big ticket item” for Palringo iOS v6.4 and Palringo Android v5.10.
Palringo iOS v6.4 was released on Monday 2nd Sept, 2013, our previous major release, Palringo iOS v6.2, was released on Monday 22nd July, 2013 – with a follow up bug-fix release, v6.2.1, on Monday 29th July.
Some background; all major releases will have up to 2 big ticket items – these are tasks that require a lot of development and testing time, and may not be apparent to the user. For v6.2 this was a transition to CoreData. A change of this scale will always result in bugs; the simple mantra of “the bigger the change, the more bugs there will be” always holds true. You can never test every combination, or find every bug – this is part and parcel of software development, and shouldn’t be a surprise to anyone reading this – our game is to minimise issues, and begrudgingly accept that you’ll never eradicate every idiosyncrasy, and celebrate when you do finally manage to terminate that one little bastard that has been haunting you for months.
The need for v6.2.1 was apparent early on, both from the reports we were getting back from TestFlight – an iOS crash reporting system – and our in-app Support team. The scope for the bug-fix was defined, fixes found and made, and the update released 7 days later – an impressive feat, especially when taking in to account the Apple Review Process, and the time it takes to get an app reviewed.
For all parties, v6.2.1 was an improvement – but still had more issues than pre-6.2 releases. This is another issue with updating any core functionality for a piece of software. Once you get the big bugs out of the way, you spend years continually polishing the code, ensuring it is running as best as it can. This level of polish is not possible to replicate in a month, or two, when you have to change something significant.
The “Invite Centre” is getting closer, so we barrel on through, deciding not to make another bug-fix release.
Feedback from our Support Dept is that the users are living with the current issues that still arise in v6.2.1, which I incorrectly assume means they are happy. They are not.
This was my first mistake.
We continue, and as the week becomes two, and then three, the complaints from our Support Dept increase. It is already the 20th August, with v6.4 meant to be our “August Release”, there are minimal signs of submission soon.
A meeting is called and we – myself, the devs, and QA – detail the situation, explaining that we aren’t happy with the current state of v6.4 and the fixes we have included. We don’t believe the experience will be any better (“We’ve not so much fixed the bugs, as just moved them to other places.”) and that there are a number of known bugs. One of our external testers continually experiences this bug, but cannot find a way to reproduce, and neither can we when plugged in to the debugger. We diagnose this bug as being “severe but infrequent”, and decide that we can live with it. We agree, in part due to the ever increasingly vocal complaints from our users, that we would fix the high priority bugs, and submit the app in close to its current state, acknowledging it isn’t perfect, but will be better when make these fixes.
This was my second mistake.
Once we have made all the agreed fixes, we submit to Apple.
We spend the week between submission and release attempting to reproduce the “blank messages” bug (as it is now known), to no avail. The App is approved over the weekend – Apple didn’t find it either – and I wait for Support to confirm that nothing has happened over the weekend for them to insist on a delay.
While I’m waiting, we finally reproduce the bug. It only occurs under very specific conditions. Firstly, you had to have minimal memory available – which in Palringo happens if you are in many, many groups, receiving hundreds of messages a minute – and secondly you receive a message (group or PM) when not viewing the messages tab.
My choice is simple;
- Pull the approved app, which we know is an improvement for most, and hope we can find a fix in a short enough window of time to please the community that are increasingly vocal in their complaints about v6.2.1, or,
- We release, and bug fix if required.
It is my decision – and the fact we can now reproduce the bug does not matter. The submission passed our locks with us knowing the bug existed, without a caveat of “if we can reproduce, we’ll pull”. This is different to a bug you suddenly find before a release – that would cause a delay for at least as long as it took to triage. Fixing bugs is an on going aspect of software development, when you find out how to fix them is irrelevant, if you’ve decided that you are happy with said bug being present.
I release the app.
This was my third mistake.
It would ultimately transpire that 99% of our user base would not be affected by this bug, and that v6.4 is a noticeable improvement – but the 1% it does, it is a continually reoccurring nightmare.
We get reports of users experiencing the bug every 4 minutes, that the app is unusable, that they need to downgrade. We cannot roll back the app – Apple don’t let you – and we can’t re-submit v6.2.1 due to the approval delay. We add text to the “What’s New” section of the App Store Description, telling high-usage users to not upgrade to this version of the app, as well as passing out the message in-app.
We start frantically fixing the bug. Either there is a problem in the locks, or, we made a mistake.
“Release early, release often” is how we manage QA, nightly builds are increasingly common as we progress through the release cycle. Our bug-fix process barely differs from a standard cycle, and within a few hours we have a proposed fix and have pushed a beta release to our testers, with steps to re-create the situation. “Negative Testing” is some of the most time consuming software testing you can do – and is strongly related to the concept of proving a negative – and so all you can do is state that “these steps no longer seem to cause the problem(s)”. You can never really explicitly state the bug has be fixed.
I call a meeting Tuesday morning for Technical and QA, for 30 minutes we discuss just how we got here, why we are in the situation, have a full debrief, and discuss what we need to change. We have faith in our process, we admit, understand, and accept that the bug was misdiagnosed – we agree that, if we had to do it again, we would. We get back to work.
Our testers report back, things seem better…but different. We have new bugs. We apply yet more plasters, but it becomes clear that this is a fundamental issue with the framework.
The Technical Lead consults with me, and expresses his desire to rip out the back end, and completely re-implement the view. We decide that this is the best way forward.
A “functional but not pretty” beta is released with the old framework ripped out, and the stability improvements are apparent from the get go. A by product is that, because we removed the view completely, that specific bug can no longer exist.
Many more interim releases are made, and late Friday we make a final RC release to our beta team, and agree to submit to Apple in parallel. At the time of writing, we are awaiting approval.
—
So, what have I learnt?
- The balance between fixing bugs vs. new features is even more delicate that previously imagined; and it must be a balance, you cannot sacrifice one for the other.
- It is always better to delay something slightly, than make an action you cannot undo.
- Confirm your diagnosis of bugs is still true if presented with new evidence or data.
- If a bug is frequently encountered by one person, it will definitely be encountered by the many.
- Everyone makes mistakes – but it is unforgivable to not learn from them.