A seamless checkout process is essential to any online retailer. It enables a higher conversion rate and a better return on investment. This is especially true during high-traffic periods, such as an end-of-season sale, or unforeseen circumstances that drive more customers online. However, these essential workflows are often not tested under high load and fail when they are needed the most.
We recently had a retail customer in that exact situation. During their primary sale for the year, their payment gateway went down. To make matters worse, their backup payment system wasn’t working either. They had no idea which orders were paid for, and which weren’t. They had to stop the sale, losing hundreds of thousands in potential revenue.
We immediately began to investigate the issue. We initially assessed that their payment gateway was not able to handle the number of orders generated by their sale, and had started rate limiting API calls – resulting in rejected payments for orders that were placed. Our first priority was to find and implement a working gateway that could process large volumes, allowing the client to kick-start their sale again.
After assessing different options, and discussing them with the client, we decided to replace their existing payment gateway with Braintree. This was due to a number of factors, chiefly its ability to process the expected sale traffic. Braintree also supports many payment methods (like credit cards, PayPal, Apple Pay and Google Pay) and includes a pre-built payments UI. This allowed us to focus on necessary business logic rather than spend time testing small visual details of a custom payment form.
After integrating Braintree’s platform into the website, we needed to run extensive tests to understand how the system behaved under high levels of concurrent traffic.
We wrote a custom script that ran concurrent checkout processes against Braintree’s API, aggregating the responses into a CSV file. This allowed us to send 25,000 concurrent requests within a few minutes – replicating, in a short timeframe, the high load that caused the original sale to fail. The data collected included server responses, response times and statistics showing how the client’s endpoints and integrations were holding up as requests progressed.
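The shape of such a script can be sketched as follows. This is a minimal illustration rather than the script we actually ran: `fake_checkout` is a hypothetical stand-in for a real checkout request, the column names are assumptions, and the worker and request counts are placeholders.

```python
import csv
import io
import time
from concurrent.futures import ThreadPoolExecutor

FIELDS = ["order_id", "outcome", "error", "elapsed_ms"]


def fake_checkout(order_id):
    """Hypothetical stand-in for one checkout request against the gateway."""
    if order_id % 10 == 0:
        raise RuntimeError("simulated rate limit")
    return "success"


def timed_checkout(checkout, order_id):
    """Run one checkout and record its outcome and duration."""
    started = time.perf_counter()
    try:
        outcome, error = checkout(order_id), ""
    except Exception as exc:  # record the failure instead of aborting the run
        outcome, error = "exception", f"{type(exc).__name__}: {exc}"
    elapsed_ms = round((time.perf_counter() - started) * 1000, 2)
    return {"order_id": order_id, "outcome": outcome,
            "error": error, "elapsed_ms": elapsed_ms}


def run_load_test(checkout, total_requests, max_workers, out_file):
    """Fire checkouts concurrently and aggregate every result into CSV."""
    order_ids = range(1, total_requests + 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rows = list(pool.map(lambda i: timed_checkout(checkout, i), order_ids))
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return rows


if __name__ == "__main__":
    buffer = io.StringIO()
    results = run_load_test(fake_checkout, total_requests=50,
                            max_workers=8, out_file=buffer)
    failures = [r for r in results if r["outcome"] == "exception"]
    print(f"{len(results)} requests, {len(failures)} failures")
```

Catching every exception inside `timed_checkout` is the important design choice here: a load test that stops at the first failure cannot show how errors are distributed across the run.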
Analysing this data uncovered multiple error states that weren’t being handled by the client’s current code. The test run also triggered an undocumented rate limiting response from Braintree, similar to the issue we had with the original provider. This rate limiting error was technically interesting: we had assumed the error would be returned in the API response object, but the Braintree SDK instead threw an exception that needed to be caught and handled separately.
Understanding this rate limiting error meant we were able to write logic to handle it, along with every other checkout edge case we found, before the new gateway was used for another sale.
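The distinction between a failure reported in the response object and one raised as an exception can be illustrated with a small sketch. The names below (`RateLimitError`, `SaleResult`, `classify_checkout` and the example gateways) are hypothetical stand-ins written for this example, not the Braintree SDK's own classes.

```python
from dataclasses import dataclass


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception an SDK raises when rate limited."""


@dataclass
class SaleResult:
    """Hypothetical stand-in for the response object returned by a sale call."""
    is_success: bool
    message: str = ""


def classify_checkout(sale, amount):
    """Handle both failure channels: raised exceptions and unsuccessful results."""
    try:
        result = sale(amount)
    except RateLimitError:
        # This failure never reaches the response object, so checking
        # result.is_success alone would miss it entirely.
        return "rate_limited"
    return "paid" if result.is_success else "declined"


# Example gateways for illustration only.
def approving_gateway(amount):
    return SaleResult(is_success=True)


def declining_gateway(amount):
    return SaleResult(is_success=False, message="card declined")


def throttled_gateway(amount):
    raise RateLimitError("too many requests")
```

Code that only inspects the returned result would treat the throttled case as an unhandled crash, which is why the two paths need separate handling.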
Running the load testing script again resulted in every single recorded error being caught and handled appropriately, with custom messages shown (based on error type). The difference between this test and the original failed sale was significant. The new payment gateway was able to handle more than 20x the original number of orders.
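One simple way to show a custom message per error type is a lookup table with a safe fallback. The sketch below assumes a handful of illustrative error categories and message texts; the real categories came from the load test data and are not listed here.

```python
from enum import Enum


class CheckoutError(Enum):
    """Illustrative error categories (assumptions, not the client's actual list)."""
    RATE_LIMITED = "rate_limited"
    CARD_DECLINED = "card_declined"
    GATEWAY_UNAVAILABLE = "gateway_unavailable"


MESSAGES = {
    CheckoutError.RATE_LIMITED:
        "We are very busy right now. Please try again in a moment.",
    CheckoutError.CARD_DECLINED:
        "Your card was declined. Please try another payment method.",
    CheckoutError.GATEWAY_UNAVAILABLE:
        "Payments are temporarily unavailable. You have not been charged.",
}

DEFAULT_MESSAGE = "Something went wrong with your payment. Please try again."


def message_for(error):
    """Return the customer-facing message, falling back for unknown errors."""
    return MESSAGES.get(error, DEFAULT_MESSAGE)
```

The fallback matters as much as the mapping: an error type that was never seen in testing still produces a sensible message instead of a blank or broken checkout page.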
All in all, the transition to Braintree was a resounding success. The client was able to migrate to the new system with no downtime, and the next sale ran successfully.
This project highlights the importance of pushing features to their limits before release. The quick feedback, combined with the ability to aggregate and interpret important information (like error states), was invaluable. The testing allowed us to write more robust, performant code and deploy to a critical user path with confidence.