Self-service Performance Engineering
As organizations undergo digital transformation, a key focus area is increasing the velocity, efficiency, and productivity of engineering teams so they can deploy new features at a rapid pace while improving the performance and reliability of their products. User expectations are higher than ever. Poor software performance costs businesses customers and revenue, and organizations want to stay ahead of it. Degraded performance or unavailability at an inconvenient time can cost millions in lost revenue.
Performance needs to be everybody's focus, not just that of the performance engineering team.
In a fast-paced agile environment, catching performance and reliability issues as they are introduced can be achieved by empowering the product engineering teams to own and execute continuous tests themselves.
Traditionally, product engineers have championed unit testing of functionality, but performance has often been overlooked. With advancements in automation and readily available tools and accelerators, it's time for performance frameworks to enable product engineers to run tests early in the software development lifecycle.
How do we empower product engineers to execute performance tests?
One way is to create a self-service performance engineering framework that automates every step of performance evaluation: writing scripts, execution, result analysis, pass/fail decision making, defect reporting, and related notifications.
Product engineering teams should then be able to invoke this framework via a push button, web chatbots such as Slack bots, or voice-controlled assistants such as Amazon Echo/Alexa. Chatbots add a bit of fun to the process as well.
As an example, if a product engineer wants to evaluate the performance of a code change right away, they should be able to invoke the self-service framework through any of the integrations described above. Once invoked, the framework first uses CI/CD tools such as Jenkins to deploy the latest build and execute the test suite. Second, it pulls performance metrics from application performance management tools such as New Relic. Lastly, it leverages a decision-making engine to pass or fail the code change, providing a quick synopsis of the change.
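A minimal sketch of that orchestration flow is below. The Jenkins and New Relic integration points are stubbed out as hypothetical placeholder functions; a real implementation would call their respective REST APIs instead.

```python
# Sketch of the self-service flow: deploy, run tests, collect metrics, decide.
# All integration points here are illustrative stubs, not real API calls.

def deploy_latest_build() -> str:
    """Stub: trigger a Jenkins job that deploys the latest build."""
    return "build-1234"

def run_test_suite(build_id: str) -> None:
    """Stub: kick off the performance test suite against the build."""

def collect_metrics(build_id: str) -> dict:
    """Stub: pull key metrics from an APM tool such as New Relic."""
    return {"p95_latency_ms": 240, "error_rate": 0.002}

def decide(metrics: dict, slas: dict) -> str:
    """Pass/fail engine: every metric must stay within its SLA."""
    violations = [k for k, limit in slas.items() if metrics.get(k, 0) > limit]
    return "FAIL: " + ", ".join(violations) if violations else "PASS"

def evaluate_change(slas: dict) -> str:
    build = deploy_latest_build()
    run_test_suite(build)
    return decide(collect_metrics(build), slas)

print(evaluate_change({"p95_latency_ms": 300, "error_rate": 0.01}))  # PASS
```

The point is the shape of the pipeline: a single entry point the product engineer can trigger from a button or chatbot, with every step behind it automated.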
How do we automate the performance test analysis and decision-making?
Performance tests and monitoring systems generate a ton of metrics. Traditionally, one of the most tiresome tasks is consuming all those metrics, correlating them, and deciding whether performance has degraded. In a self-service model you want to automate that as well: the data collection, the analysis, and the pass/fail decision. The first and most important step is knowing what data to look for in each component of your application and its underlying infrastructure. From there, automate the collection of key performance metrics from the various sources, keep a history of all metrics in a single central repository, and build insights that compare results across builds and intelligently decide pass or fail based on historical performance as well as set SLAs.
You also want the thresholds to be dynamic and adaptive rather than static, so they can catch degradation even within the SLAs.
Build a tool that collects metrics from disparate monitoring sources (load test tools, APM tools, infrastructure monitoring tools, and so on) and acts as the central repository for all of them. It keeps the history of every test result. Because this platform knows the baseline performance of every API, it can quickly compare new results against that baseline and intelligently decide on failures.
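The repository and adaptive-threshold ideas above can be sketched as follows. This is a simplified in-memory model; the 3-standard-deviation rule and the minimum-history cutoff are illustrative choices, not a standard.

```python
import statistics

# Central metrics repository sketch: keeps per-API latency history and flags
# a regression if a result breaks the hard SLA *or* drifts well above the
# historical baseline (an adaptive threshold that fires even within the SLA).

class MetricsRepository:
    def __init__(self):
        self.history = {}  # API name -> list of latency samples (ms)

    def record(self, api: str, latency_ms: float) -> None:
        self.history.setdefault(api, []).append(latency_ms)

    def is_regression(self, api: str, latency_ms: float, sla_ms: float) -> bool:
        if latency_ms > sla_ms:              # hard SLA violation
            return True
        samples = self.history.get(api, [])
        if len(samples) < 5:                 # not enough history for a baseline
            return False
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)
        return latency_ms > mean + 3 * stdev # adaptive threshold within the SLA

repo = MetricsRepository()
for sample in [100, 102, 98, 101, 99, 100]:  # baseline runs, ~100 ms
    repo.record("GET /orders", sample)

print(repo.is_regression("GET /orders", 150, sla_ms=500))  # True: caught under SLA
print(repo.is_regression("GET /orders", 103, sla_ms=500))  # False: within baseline
```

Note that the 150 ms result fails even though it is far under the 500 ms SLA, which is exactly the degradation-within-SLA case a static threshold would miss.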
Defect Management and Notifications
As soon as the pass/fail decision is made for a test, you want to automatically identify the problematic area, whether it is service response times, reliability, or capacity, and notify the product engineering teams with pointers to the reasons for the failure. The decision-making engine knows which key performance indicator was violated and can create automated Jira defects with the details. It can also supplement the defect with the exact time window of the test and permalinks to monitoring dashboards that point to the code path or component that degraded.
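A sketch of turning a failed decision into a Jira defect: the payload below follows the field shape of Jira's REST "create issue" endpoint, but the project key, KPI names, and dashboard URL are illustrative placeholders.

```python
# Build a Jira "create issue" payload from a failed performance gate.
# Project key "PERF" and the dashboard URL are hypothetical examples.

def build_defect_payload(kpi: str, observed: float, threshold: float,
                         test_window: str, dashboard_url: str) -> dict:
    description = (
        f"Automated performance gate failed.\n"
        f"KPI violated: {kpi} (observed {observed}, threshold {threshold})\n"
        f"Test window: {test_window}\n"
        f"Monitoring dashboard: {dashboard_url}"
    )
    return {
        "fields": {
            "project": {"key": "PERF"},
            "issuetype": {"name": "Bug"},
            "summary": f"Performance regression: {kpi} exceeded threshold",
            "description": description,
        }
    }

payload = build_defect_payload(
    kpi="p95_latency_ms", observed=412, threshold=300,
    test_window="10:00-10:15 UTC",
    dashboard_url="https://apm.example.com/dash/checkout",  # placeholder permalink
)
# A real framework would POST this to Jira's /rest/api/2/issue endpoint.
print(payload["fields"]["summary"])
```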
Reduce Time to Prepare and Execute Test
In a typical performance testing cycle, teams execute a variety of performance tests to be fully confident they are not introducing a performance or scalability defect. These tests are usually long, both to prepare and to execute, whereas in a self-service model you need to rely on short, crisp, specific tests that can quickly point out defects and speed up the overall feedback loop. Confidence in such tests comes from a solid benchmarking process that generates a baseline, which can then be used for anomaly detection. Prefer scaled-down load and spike tests; avoid long-running tests such as endurance and volume tests.
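A minimal sketch of such a short, scaled-down spike test: fire a small burst of concurrent requests and check the resulting p95 latency. The `call_endpoint` function here is a simulated stand-in for a real HTTP call, and the request counts and 100 ms threshold are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Scaled-down spike test sketch: burst of concurrent calls, p95 check.

def call_endpoint() -> float:
    """Stand-in for a real HTTP request; returns latency in ms."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulate ~10 ms of service work
    return (time.perf_counter() - start) * 1000

def spike_test(requests: int = 50, concurrency: int = 10) -> float:
    """Run a short burst and return the observed p95 latency in ms."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: call_endpoint(), range(requests)))
    return latencies[int(0.95 * (len(latencies) - 1))]

p95 = spike_test()
print(f"p95 latency: {p95:.1f} ms -> {'PASS' if p95 < 100 else 'FAIL'}")
```

A test like this finishes in seconds, which is what makes it viable as a per-change gate rather than a scheduled multi-hour run.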
Keep the Test Data and Environment Consistent
The quality of the test determines the quality of the product. To maintain the quality of your tests, it is essential to be able to reproduce the exact same test environment over and over, keeping everything constant except the one change you intend to test. Thankfully, with teams moving toward 'infrastructure as code' on the cloud using tools like Puppet, Chef, and Terraform, test environments can be built with the exact same configuration faster and more efficiently than ever before. This helps tremendously in keeping the environment consistent across the SDLC and, of course, between tests. These environments can then easily be expanded and contracted as needed. To keep test data consistent, make your tests self-contained: create and destroy data as part of the test wherever possible. For data that cannot be created during the test, create it as part of the environment build-out or spin up a parallel database pre-seeded with test data.
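The create-and-destroy pattern for self-contained test data can be sketched with a context manager. The in-memory dictionary below stands in for a real data store; the record keys are illustrative.

```python
from contextlib import contextmanager

# Self-contained test data sketch: each test seeds the records it needs and
# removes them on exit, so every run starts from the same clean state.

database = {}  # stand-in for a real data store

@contextmanager
def seeded_test_data(records: dict):
    database.update(records)        # create data before the test
    try:
        yield database
    finally:
        for key in records:         # destroy it afterwards, pass or fail
            database.pop(key, None)

with seeded_test_data({"user:42": {"name": "loadtest-user"}}) as db:
    assert db["user:42"]["name"] == "loadtest-user"  # test sees known data

print("user:42" in database)  # False: state is clean again after the test
```

Because cleanup runs in the `finally` block, the data is removed even when the test fails, which is what keeps back-to-back runs reproducible.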
The good news is that once you have all these ingredients in place for self-service, you can reuse them to embed performance into your continuous integration and continuous delivery pipeline as well, delivering product features to your customers more reliably and more frequently.