The latest UK payments system crash may have exposed important information about who in the financial system gets their money first.
Introduction
Amidst the chaos of the 19 July 2024 global IT outage, the crash of the UK’s CHAPS payment system a day earlier may seem like a footnote. But the payments failure has raised important questions. And the two IT incidents may even be linked.
The CHAPS payment system went down for a period on 18 July 2024 due to “global payment issues”, according to a Bank of England (‘Bank’) statement.
CHAPS is the UK’s high-value payment system, settling £300-400bn on a typical day.
The Bank said it needed to confer with a third-party supplier, ‘the industry’ and ‘other authorities’, but it did not name any of them or say what it needed to confer about.
This provides little clarity regarding the payments outage. On its website, the Bank says it is ‘the operator of a systemically important payment system’ and ‘is accountable for the end-to-end risk management of the CHAPS payment system’.
However, the use of the word ‘global’ in yesterday’s press release seems to translate into ‘not our fault’, and ‘beyond our control’.
In a systemically important payment system, everything should presumably be within the operator’s control and in a sealed-off, protected information technology (IT) environment. Clearly, this didn’t happen with CHAPS. So what might have gone wrong?
Digging into the links between CHAPS and SWIFT
The reference to ‘global payments’ in yesterday’s Bank release appeared to point the finger at SWIFT, the main messaging network for the global payments system.
On its website, the Bank says that CHAPS payments depend on the Bank’s real-time gross settlement (RTGS) infrastructure and the SWIFT messaging network.
Yesterday’s statement by the Bank suggests there may have been a problem either with the SWIFT network itself, or with the ‘Alliance Access’ gateway through which the Bank receives and transmits its SWIFT messaging traffic. However, SWIFT has in effect denied any problem on its side.
Alliance Access permits the Bank to connect to all of SWIFT’s messaging services using particular structured messaging formats. For example, these could be SWIFTNet FIN for MT messages, SWIFTNet InterAct for individual messages formatted in ISO20022 XML, or SWIFTNet FileAct for files of messages.
Although SWIFT permits different layers of access, it is not credible that the Bank would be connecting to SWIFT other than through Alliance Access, the top-specification ‘on-premises’ gateway.
Behind Alliance Access a SWIFT participant needs to install a message sorting/routing product. This product would also unwrap the files received over FileAct and send them into the processing system as individual messages.
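To make the role of that sorting/routing layer concrete, here is a minimal sketch in Python. It is an illustration only: the `Delivery` class, the `unwrap` and `route` functions and the `---` separator between bundled messages are all hypothetical, and no real SWIFT product or file layout is being reproduced.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Channel labels follow the service names used in the text; the data model
# around them is invented purely for illustration.
FIN, INTERACT, FILEACT = "FIN", "InterAct", "FileAct"


@dataclass(frozen=True)
class Delivery:
    channel: str   # service on which the gateway received the delivery
    payload: str   # a single message, or (for FileAct) a file of messages


def unwrap(delivery: Delivery, separator: str = "\n---\n") -> Iterator[str]:
    """Yield individual messages; only FileAct payloads are split up."""
    if delivery.channel == FILEACT:
        for part in delivery.payload.split(separator):
            if part.strip():
                yield part.strip()
    else:
        yield delivery.payload


def route(deliveries: Iterable[Delivery]) -> list[str]:
    """Flatten everything arriving from the gateway into single messages."""
    messages: list[str] = []
    for delivery in deliveries:
        messages.extend(unwrap(delivery))
    return messages


inbound = [
    Delivery(INTERACT, "<msg id='A'/>"),
    Delivery(FILEACT, "<msg id='B'/>\n---\n<msg id='C'/>"),
]
messages = route(inbound)  # three individual messages for the processing system
```

The point of the sketch is simply that this layer sits between the gateway and the processing system, so a fault in it would stop payments even if SWIFT itself were working normally.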
In the context of yesterday’s payment outage, there might have been a problem with this message sorting/routing product, because these are customarily offered by a ‘third-party supplier’. However, why the Bank would need to consult with other authorities or the industry if that were the case is anyone’s guess.
Where does this all leave us? None the wiser.
We do not know several things, and the Bank is probably not going to enlighten us. However, we can sketch the problem and try to infer where it occurred. To do that, let’s focus on three particularly important issues:
- Whether in the Bank’s terminology, ‘CHAPS’ and ‘RTGS’ (real-time gross settlement) are synonyms
- What the Bank’s IT application architecture is
- How much of that architecture is hosted
Which payments are systemically important?
On its website, the Bank says that its RTGS system offers real-time, final and risk-free settlement between system participants.
It also says that CHAPS uses risk-free real-time gross settlement within the Bank’s RTGS infrastructure.
For practical purposes, CHAPS and RTGS sound the same. However, they are not synonyms. The distinction is to do with the Bank’s stance towards ‘systemically important payments’ and ‘non-systemically important payments’. Let’s call them SIPs and Non-SIPs.
Before we explore why that distinction is important, let’s remember a third UK real-time payment system: Faster Payments. Created in 2008, Faster Payments enables all of us to send money in real time online, by mobile phone or via telephone banking.
However, unlike CHAPS payments, which have no transaction limit, Faster Payments is designed for the retail market and for small businesses. In 2022, the Bank pressed Faster Payments to raise its system limit to £1 million to get real-time payments of below that value off CHAPS. In effect, the Bank was saying that real-time payments below £1m were not systemically important.
In 2014, what was then referred to as ‘CHAPS’ went down. However, all SIPs were processed on time. It then emerged that there was a separate process for non-SIPs and that those payments had not been processed on time. Indeed, it further emerged that these payments were processed through a net settlement system with time lapses in between the netting cycles.
[The whole point of a real-time gross settlement system is that large payments are settled immediately and in full. When netting occurs and payments are only settled periodically (i.e., in cycles), credit risk builds up between the banks conducting the settlements. The failure of one bank could then cause a cascading effect and bring down the whole system.]
In the 2014 case, when customers had paid £35 for a CHAPS payment, which they were led to believe was via an RTGS system, they surely had a right to demand that their payments were processed in real time and irrevocably. However, it turned out that they had paid for one service and been given another. This topic was not addressed at all at the time.
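The credit-risk point about netting cycles can be shown with a few lines of arithmetic. The sketch below uses invented banks and invented amounts (in £m); it assumes a single netting cycle and is not a model of any actual settlement system.

```python
from collections import defaultdict

# Hypothetical payments (payer, payee, amount in £m) made within one cycle.
payments = [
    ("BankA", "BankB", 100),
    ("BankB", "BankC", 80),
    ("BankC", "BankA", 50),
    ("BankA", "BankC", 30),
]


def net_positions(payments):
    """Deferred net settlement: only each bank's net position is settled,
    and only when the cycle closes."""
    position = defaultdict(int)
    for payer, payee, amount in payments:
        position[payer] -= amount
        position[payee] += amount
    return dict(position)


def unsettled_value(payments, mode):
    """Value still awaiting final settlement just before the cycle closes."""
    if mode == "gross":
        return 0  # under RTGS each payment was final when it was made
    return sum(amount for _, _, amount in payments)


def shortfall_if_fails(payments, failed_bank):
    """Net amount the failed bank still owed when the cycle was cut short."""
    return max(0, -net_positions(payments).get(failed_bank, 0))


positions = net_positions(payments)
# BankA: -100 + 50 - 30 = -80, BankB: +100 - 80 = +20, BankC: +80 - 50 + 30 = +60
```

In this made-up cycle, £260m of payments are still provisional until the netting run, and if BankA failed first, the other two banks would be short of the £80m they were expecting. Under gross settlement the unsettled value at any moment is zero.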
Now, ten years later, we have ‘CHAPS’ going down again, and it is said in the media to be affecting football player payments, transactions in art and house purchases. These are non-SIPs and are not the prime payments type in an RTGS system, whose purpose is to process very large interbank, central bank and government payments.
In the press coverage of yesterday’s outage, such systemically important payments have so far not been mentioned. We can conclude that ‘RTGS’ and ‘CHAPS’ are not synonyms, and that in the Bank’s terminology ‘RTGS’ is the mode in which it processes SIPs, and ‘CHAPS’ is the mode in which it processes Non-SIPs.
In effect, there is a two-tier real-time payments system (although the Bank will not wish this to be widely known). There is First-Class Post for SIPs and Second-Class Post for Non-SIPs (but costing the same as First Class).
The distinction is reinforced by the Bank’s issuing an RTGS and CHAPS annual report: there are in effect two modes, depending upon a payment’s size, payer and payee, and purpose.
What is the IT architecture?
The latest outages raise other important questions about the design of the Bank’s information technology infrastructure and its safety.
Above, I used the word ‘mode’ to distinguish between RTGS and CHAPS payments. I chose this word because we have no idea what the Bank’s overall IT application architecture is and whether there is a difference between how they process SIPs and Non-SIPs. Are they using the same application? Same process? Same data centre? Same machines? We don’t know.
Because they are critically important, SIPs should be processed in a Bank-run data centre that is isolated from the public internet and has highly controlled access points. IT changes should go through a full change management process. The Bank would then be fully in control of everything happening within this environment.
All the Bank’s traffic for RTGS/CHAPS arrives on SWIFT’s Alliance Access gateway. But what happens to the traffic after that? The traffic now has to be sent to and received from the participants in the ISO20022 XML message format and not SWIFT FIN, but that does not mean that it remains in that format all the way through the process.
The Bank must – as any SWIFT user must – have a message handling application behind the gateway, and its functions could extend into format conversion. If there are two different modes for processing, why should there not be two different message formats: for example, ISO20022 XML for SIPs, and SWIFT FIN for Non-SIPs? The world is meant to be moving onto ISO20022 XML but that does not mean that all processing applications move to it when the communications channels do.
I can easily see that the Bank might decide on a phased migration, in which communications and processing for SIPs are prioritized for migration, and processing for Non-SIPs comes later. Format translation of ISO20022 to FIN and back is a 100%-standard function in these intermediary, message-handling applications.
Are SIPs and Non-SIPs then processed on two different applications, possibly on two different machines and even in two different locations?
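The two-mode hypothesis in this section can be written down as a small routing rule. Everything in this Python sketch is an assumption made for illustration: the `Payment` and `Route` classes, the purpose labels and the pairing of each mode with a message format are guesses that follow the argument above, not anything the Bank has published.

```python
from dataclasses import dataclass

# Assumption taken from the argument in the text, not from Bank documentation:
# interbank, central bank and government payments are treated as SIPs.
SIP_PURPOSES = {"interbank", "central_bank", "government"}


@dataclass(frozen=True)
class Payment:
    reference: str
    purpose: str
    amount_gbp: float


@dataclass(frozen=True)
class Route:
    mode: str             # "RTGS" for SIPs, "CHAPS" for Non-SIPs (this article's usage)
    internal_format: str  # format used behind the gateway (hypothetical)


def choose_route(payment: Payment) -> Route:
    """Pick a processing mode and internal format for one payment."""
    if payment.purpose in SIP_PURPOSES:
        return Route(mode="RTGS", internal_format="ISO20022")
    # Everything else is assumed to be translated to FIN for an older application.
    return Route(mode="CHAPS", internal_format="FIN")


gilt_settlement = Payment("REF-1", "government", 500_000_000.0)
house_purchase = Payment("REF-2", "house_purchase", 450_000.0)
routes = [choose_route(gilt_settlement), choose_route(house_purchase)]
```

If something like this rule exists, the two branches could lead to different applications, machines or locations, which is exactly the question the Bank has not answered.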
How much of the payments architecture is hosted?
If we accept that SWIFT did not go down yesterday and that SIPs were processed normally, we can conclude that the intermediary, message-handling application did not go down completely (and maybe not at all). Since SIPs’ payment processing is dependent upon this message-handling application (as it is dependent upon Alliance Access), both must sit in a protected IT environment.
But what if the processing of non-SIPs is via an application on a machine that is hosted by an organisation like Amazon Web Services or Microsoft Azure? If the hosted environment went down, messages would be despatched from the protected environment at the Bank to the hosted environment, but would land in a queue.
This would be either an outgoing message queue at the protected environment at the Bank, or an incoming message queue at the hosted environment (depending on whether the interface had gone down as well and the agreements as to where messages would be warehoused during an outage).
If it was the communication channel between the protected environment at the Bank and the hosted environment that went down, the messages would normally sit in the outgoing message queue at the protected environment at the Bank and not be passed on.
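The queueing behaviour described above can be sketched as a simple store-and-forward model. The `Link` class and its attribute names are hypothetical, and the sketch assumes that messages are never dropped, only held in whichever queue they have reached.

```python
from collections import deque


class Link:
    """Hypothetical link from the protected environment to a hosted one."""

    def __init__(self):
        self.channel_up = True    # communication channel between the two sites
        self.hosted_up = True     # processing application at the hosted site
        self.outgoing = deque()   # queue inside the protected environment
        self.incoming = deque()   # queue at the hosted environment
        self.processed = []       # messages that actually got processed

    def send(self, message):
        self.outgoing.append(message)
        self.pump()

    def pump(self):
        """Move messages as far along the path as the current state allows."""
        while self.channel_up and self.outgoing:
            self.incoming.append(self.outgoing.popleft())
        while self.hosted_up and self.incoming:
            self.processed.append(self.incoming.popleft())


link = Link()

link.hosted_up = False   # hosted application down, channel still up
link.send("payment-1")   # waits in the incoming queue at the host

link.channel_up = False  # now the channel is down as well
link.send("payment-2")   # never leaves the protected environment

link.channel_up = link.hosted_up = True
link.pump()              # after recovery both are processed, in order
```

In either failure case the protected environment keeps running and accepting messages; only the payments that depend on the second environment are delayed.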
‘Global payment issues’ can have no effect on the protected environment at the Bank, because that environment is sealed off, unless Alliance Access becomes unavailable, which it did not. SIPs are all processed within that environment, and SIP processing, which the Bank calls its ‘RTGS’ system, did not go down.
To conclude, there must be some other environment for processing ‘CHAPS’ payments (which we have argued are being treated by the Bank as not systemically important). This environment may or may not be hosted externally, but it is configured in some way that allows an avenue for ‘global payment issues’ to interfere.
Given the possible links to external hosting providers, it may also be that the CHAPS outage is somehow connected to the dramatic global IT outage that took place a day later, on 19 July 2024.
The latest breakdown in CHAPS may have provided concrete evidence of a two-tier wholesale payments system in the UK. If this is the case, it has important implications for the overall safety of payments, as well as for the competitive landscape between banks, other financial firms and technology companies.
Bob Lyddon is a consultant with 30 years of experience in payments strategy and regulation, electronic banking, the Single Euro Payments Area and the Payments Services Directive. He was a New Money Review podcast guest in January 2023.