The excitement of launching a new feature
Are you on the brink of launching a new feature – one that will affect many of your high-value clients? You've worked hard to build it, you're proud of it and you should be!
You can't wait to release it for all your users, but wait! What if you’ve missed something? Something that would ruin all your engineering efforts?
There’s nothing worse than starting the day after a release with a flood of production alerts, spending hours combing through logging and monitoring systems for errors and, ultimately, having to roll back the feature you just launched. You would feel frustrated and unmotivated.
Late-discovered bugs do more than sap the morale of your technical teams: NIST has shown that the longer a bug takes to be detected, the more costly it is to fix. This is illustrated by the following graph:
This is explained by the fact that once the feature has been released and is in production, finding bugs is difficult and risky. In addition to preventing users from being affected by problems, it's critical to ensure service availability.
Are you sure your feature is bug-free?
You might think that this won't happen to you. That your feature is safe and ready to deploy.
History has shown that it can happen to the biggest companies. Let’s name a few examples.
Facebook, May 7, 2020. An update to Facebook's SDK, rolled out to all users at once, shipped with a bug: a server value that was supposed to provide a dictionary was changed to provide a simple YES/NO instead. This tiny change was enough to break Facebook's authentication system and to affect many other apps, including TikTok, Spotify, Pinterest, and Venmo. Even apps that didn't use Facebook's authentication system were hit, because it is extremely common for apps to connect to Facebook regardless of whether they use a Facebook-related feature, mainly for ad attribution. The result was unequivocal: the affected apps simply crashed right after launch. Facebook fixed the problem in a hurry, and it took about two hours for things to get back to normal. But do you have the same resources as Facebook?
Apple, September 19, 2012. Another good example, even though it’s a bit older, is the replacement of Google Maps with Apple Maps in iOS 6. In the eyes of many customers, and especially fans, Apple always handles the rollout of new features carefully, but this time they messed up. Apple didn't want to be tied to Google's app anymore, so they made their own version. However, in their rush to release their map system, Apple made some unforgivable navigational mistakes. Among the many failures of Apple Maps were erased cities, disappearing buildings, flattened landmarks, duplicate islands, distorted graphics, and erroneous location data. A large part of this mess could have been avoided if they had deployed their new map application progressively. They would have been able to spot the bugs and fix them quickly before a massive deployment.
And now, seeing that even big companies are impacted, you're stressed out and may not even want to release your feature anymore.
But don't worry! At AB Tasty, we know that building a feature is only half of the story and that to be truly effective, that feature has to be well deployed.
Flagship, our feature management service, has you covered. It offers a set of useful features, such as progressive rollout, that free you from the fear of a release catastrophe and remove the friction from feature management. You can then focus on value-added tasks, get high-quality features into production, and apply your energy and innovation where they deliver the most value to your customers.
What’s progressive rollout?
So now you’re curious: what’s progressive rollout? How will this help me monitor the release and make sure everything is okay?
A progressive rollout approach lets you test the waters of a new version with a restricted set of clients. You can set percentages of users to whom your feature will be released and gradually update the percentage to safely deploy your feature. You can also do a canary launch by manually targeting several groups of people at various stages of your rollout.
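To make the percentage mechanism concrete, here is a minimal sketch of one common way to implement it: hash each user ID into a stable bucket and compare the bucket with the current rollout percentage. It illustrates the general technique only and is not Flagship's implementation. The names `bucket`, `is_enabled`, `flag_key`, and `allowlist` are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag_key: str) -> float:
    """Map a user to a stable value in [0, 100) for a given flag (hypothetical helper)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to 10,000 slots,
    # which gives buckets from 0.00 to 99.99 in steps of 0.01.
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100.0


def is_enabled(
    user_id: str,
    flag_key: str,
    rollout_percent: float,
    allowlist: frozenset = frozenset(),
) -> bool:
    """Return True if the feature should be shown to this user."""
    # Manually targeted users (for example, a canary group) always get the feature.
    if user_id in allowlist:
        return True
    # Everyone else is included once the rollout percentage passes their bucket.
    return bucket(user_id, flag_key) < rollout_percent


if __name__ == "__main__":
    for percent in (10, 50, 100):
        exposed = sum(is_enabled(f"user-{i}", "new-checkout", percent) for i in range(1000))
        print(f"{percent}% rollout -> {exposed} of 1000 sample users see the feature")
```

Because a user's bucket never changes, raising the percentage only adds users: anyone who saw the feature at 10% still sees it at 50%. Including the flag key in the hash keeps the same users from always being the first exposed to every new feature.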
This is a practice already used by large companies that have realized the significant benefits of a progressive rollout.
Netflix, for example, is one of the most dynamic companies: its developers are constantly releasing updates and new software, yet users rarely experience downtime and encounter very few bugs or issues. The company is able to deliver such a smooth experience thanks to sophisticated deployment strategies, such as canary and progressive deployments, multiple staging environments, blue/green deployments, traffic splitting, and easy rollbacks. Together, these help development teams release software changes with confidence that nothing will break.
Disney is another good example of a company that makes the most of progressive deployment. It has taken the phased deployment approach to a whole new level for its Disney+ and Star streaming services by deploying them regionally rather than globally. This delivery method is driven by the needs of the business: the company makes sure that everything is ready at the regional level, in line with its focus on the most important markets. Prior to launching Disney+ in Europe, it spent a lot of time building the local infrastructure needed to deliver a high-quality experience to consumers, including establishing local colocation facilities and beefing up data centers to cache content regionally. After starting to roll it out in Europe, Disney was able to identify that, for some markets, the launch of Disney+ could create issues that would have resulted in latency and thus a poor experience for affected users. So they took proactive steps to reduce their overall bandwidth usage by at least 25% prior to their March 24 launch and delayed their launch in France by two weeks. Without progressive deployment, they wouldn't have been able to identify these issues. And that’s why the launch of Disney+ was remarkable.
What are the benefits of the progressive rollout?
There are three main benefits to the progressive rollout approach.
Avoiding bugs impacting everyone in production at once
First, by slowly ramping up the load, you can monitor and capture metrics about how the new feature impacts the production environment. If any unanticipated issues come to light, you can pause the full launch, fix the issues, and then smoothly move ahead. This data-driven approach guarantees a release with maximum safety and measurable KPIs.
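The ramp-and-pause loop described here can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product. The step schedule, the `error_rate_at` callback (standing in for a query to your monitoring system), and the `max_error_rate` limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RampResult:
    percent: int   # exposure at which the rollout stopped
    paused: bool   # True if the ramp was halted before reaching the last step
    reason: str


def run_ramp(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> RampResult:
    """Walk through the rollout steps, pausing as soon as the metric degrades."""
    percent = 0
    for percent in steps:
        # In a real system this would wait for enough traffic, then read the
        # metric from monitoring. Here it is a plain callback (an assumption).
        observed = error_rate_at(percent)
        if observed > max_error_rate:
            reason = f"error rate {observed:.2%} at {percent}% exceeds limit {max_error_rate:.2%}"
            return RampResult(percent, True, reason)
    return RampResult(percent, False, "rollout complete")


if __name__ == "__main__":
    # Simulated metric: healthy at low exposure, degraded from 25% upward.
    def simulated(percent: int) -> float:
        return 0.01 if percent < 25 else 0.08

    print(run_ramp([1, 5, 25, 50, 100], simulated, max_error_rate=0.05))
```

The point of the small early steps is that a problem that only shows up under load is caught while most users are still on the previous version.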
Validating the “Viable part” in your MVP
You can effectively measure how well your feature is welcomed by your users. If you launch a new feature to 10% of your client base and notice revenue or engagement taking a dip, you can pause the release and investigate. The other major advantage? Anticipating costs. Since margin, profit and revenue are an important part of sustainability, unexpected costs that blow up your projected budgets at the end of the month are almost as bad as the night sweats that come from an unexpected bug! Monitoring your costs during a progressive rollout and immediately pausing the launch if those costs spike is a phenomenal level of control that you will absolutely want to get in on.
Progressively deploying services based upon business drivers
Finally, deploying a service or product progressively can also be a way of prioritizing specific markets based on data-driven business plans. Disney, for example, decided not to launch in the U.S. when it introduced "Star," its new channel in the Disney+ catalog for international audiences. Star features more mature R-rated movies, FX TV shows, and other shows and movies that Disney owns the rights to but that do not fit the Disney+ family image. Ironically, U.S. customers have to pay extra on top of their Disney+ subscription to access the same content on the other streaming service, Hulu.
The decision was made following a complex matrix of rights agreements and revenue streams. Disney found that subscribers are willing to pay for the separate Hulu and Disney+ libraries in the U.S., but that Star's more limited lineup was enough to justify a standalone paid purchase for international customers, who will have to add $2 to their initial $6.99 subscription to access it. When the content library for Star is enough to justify not going through Hulu anymore, U.S. customers will have access to it by paying just $1 more. This progressive rollout approach has enabled Disney to make sure that once they launch Star in the U.S., everything will be ready and they will achieve good results.
In other words, the progressive rollout approach helps you ensure that your functionality meets the criteria of usability, viability, and desirability in accordance with your business plan.
How to act fast when you identify bugs while progressively deploying a feature?
Now that you know more about the progressive rollout of your features/products, you may be wondering how to take action if you identify bugs or if things aren't going well. Lucky for you, we've thought of that part too. In addition to progressive rollout, you'll also find automatic rollback on KPIs and feature flagging in the Flagship toolkit.
Feature flagging lets you set up flags on your feature that work as simply as an on/off switch. If for any reason you identify threats in your rollout, or if user engagement is not convincing, you can simply toggle your feature off and take time to fix any issues.
This assumes that you are aware of the problem and that someone from the product team is available to turn the feature off. But what if something happens overnight and no one can check on the progress of the deployment? For that eventuality, you can set up automatic rollbacks (also called Rollback Threshold) linked to key performance indicators. Our algorithm will check the performance of your deployment and, based on the KPIs you set, if something goes wrong, it will automatically roll back the deployment and inform you that a problem has occurred. This way, in the morning, your engineers will be able to fix the problems without having to deal with the rollback themselves.
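As a rough illustration of the idea behind a KPI-linked rollback, the sketch below disables a flag when a KPI falls below a configured floor and then notifies the team. It is a generic example, not Flagship's algorithm or API. The `Flag`, `RollbackRule`, and `evaluate` names, the `min_samples` guard, and the "lower is worse" KPI convention are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Flag:
    key: str
    rollout_percent: int
    enabled: bool = True


@dataclass
class RollbackRule:
    kpi: str            # name of the KPI being watched, e.g. a conversion rate
    floor: float        # roll back if the KPI drops below this value
    min_samples: int    # ignore the KPI until enough traffic has been observed


def evaluate(
    flag: Flag,
    rule: RollbackRule,
    kpi_value: float,
    samples: int,
    notify: Callable[[str], None],
) -> bool:
    """Roll the flag back if the KPI breaches its floor. Returns True on rollback."""
    # Do nothing if the flag is already off or the data is too thin to trust.
    if not flag.enabled or samples < rule.min_samples:
        return False
    if kpi_value < rule.floor:
        flag.enabled = False
        flag.rollout_percent = 0
        notify(f"{flag.key} rolled back: {rule.kpi}={kpi_value:.3f} is below {rule.floor:.3f}")
        return True
    return False


if __name__ == "__main__":
    flag = Flag(key="new-checkout", rollout_percent=25)
    rule = RollbackRule(kpi="conversion_rate", floor=0.030, min_samples=500)
    evaluate(flag, rule, kpi_value=0.021, samples=1200, notify=print)
    print(flag)
```

A check like this would typically run on a schedule, so the rollback can happen overnight without anyone watching a dashboard. The minimum sample size keeps a handful of early, noisy data points from triggering a rollback.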
Downtime incidents are stressful for both you and your customers. To resolve them quickly and efficiently, you need to have access to the right tools and make the most of them. Progressive rollout, automatic rollback, and feature flagging are great levers to relieve your product teams of stress and let them focus on innovating your product to create a wonderful experience for your users. Highly effective organizations have already realized the importance of having the right approach to deployment with the right tools. What about your organization?
Flagship minimizes risk and maximizes results to make the lives of Product teams a whole lot easier. Create a free account today!
AB Tasty, we got you covered.