WeChat Pay: How to build a highly available cash register system in the era of mobile payment?

In the era of mobile payment, more and more people go out without cash, and many payments can be completed only with a mobile phone. As a result, the availability of the cash register system matters more and more. How do you build a high-availability cash register system in the mobile payment era? This article shares the experience of the WeChat Pay team and is for reference only.

1. Why emphasize the availability of the cash register system?

With the rapid development of mobile payment, users have grown used to shopping without a wallet. Frequent daily purchases place high demands on the availability of the merchant's cash register system. Even a small failure, such as being unable to collect a payment, a duplicate payment, or a payment timeout, causes real inconvenience to users and merchants. It leads to angry users, complaints, and disputes, and ultimately to losses for the merchant. Building a highly available cash register system for merchants is therefore very important.

So how do you build a high-availability cash register system? We hope this article offers some ideas.

2. High-availability cash register system design

An analysis of the cash register systems on the market shows that the following risks are common:

1. Service latency is unstable:

Cross-city calls and improper DNS configuration make the network unstable;

2. System availability is not considered:

Multiple payment channels (Alipay, WeChat, etc.) are deployed together and affect each other;

Business logic services and data services are deployed together and interfere with each other;

No disaster recovery or automatic switchover capability;

3. Data disaster recovery is not timely:

The DB is a single point of failure, active/standby switchover depends on manual operation, and the fault recovery time (TTR) is uncontrollable;

To help merchants improve service quality and minimize the risks above, the WeChat Pay team proposes a design for a high-availability cash register system. The system architecture diagram is as follows:

[Figure: system architecture diagram]

The design is described below at three levels:

1. Reduce service latency:

The offline stores that use the cash register system are spread across the country, and their networks are complex (China Telecom, China Unicom, China Railcom, China Mobile, and so on), which makes controlling system latency more challenging.

To address this, some cloud service providers offer BGP network access with real-time cross-region switchover. Redundant network egress deployment allows traffic to be scheduled flexibly between regional networks, which provides disaster recovery for network egress.

In addition, Tencent Cloud and WeChat Pay have launched a payment acceleration solution. For services deployed on Tencent Cloud, public-network requests sent to WeChat Pay can be resolved directly to intranet access, reducing latency by 30% and improving the user's payment experience.

WeChat Pay also officially provides two API domain names, api and api2, so that a service provider's system can probe service quality itself and prefer the faster domain name.

Note: keep the following points in mind for dual-domain detection:

Probe both domains concurrently and use whichever responds first, which improves efficiency;

Establish a probe retry mechanism and control the detection frequency to avoid unnecessary probes;

Suggested detection timing: probe when the system starts, or when a request times out;
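As a minimal sketch of the dual-domain detection points above (not WeChat Pay's own tooling), the Python below probes the candidates concurrently, keeps whichever answers first, and throttles how often detection runs. The names pick_fastest and DomainSelector, the probe callback, the 60-second interval, and the short host labels are assumptions made for illustration; a real probe would issue a lightweight HTTPS request to the full api and api2 hostnames.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def pick_fastest(hosts, probe, timeout=2.0):
    """Probe all hosts concurrently and return the first one that succeeds.

    probe(host) is a caller-supplied function (hypothetical) that returns
    normally on success and raises on failure. Returns None if no host
    succeeds within `timeout` seconds.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(hosts)))
    futures = {pool.submit(probe, h): h for h in hosts}
    winner = None
    try:
        for fut in as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                winner = futures[fut]
                break
    except FuturesTimeout:
        pass
    finally:
        pool.shutdown(wait=False)  # do not wait for the slower probe
    return winner


class DomainSelector:
    """Remembers the preferred host and limits how often probing runs."""

    def __init__(self, hosts, probe, min_interval=60.0, clock=time.monotonic):
        self._hosts = list(hosts)
        self._probe = probe
        self._min_interval = min_interval
        self._clock = clock
        self._last_probe = None
        self.current = self._hosts[0]

    def refresh(self):
        """Call at startup and after a request timeout."""
        now = self._clock()
        if self._last_probe is not None and now - self._last_probe < self._min_interval:
            return self.current  # probed recently, skip
        self._last_probe = now
        winner = pick_fastest(self._hosts, self._probe)
        if winner is not None:
            self.current = winner
        return self.current
```

If every probe fails, the selector keeps its previous choice, and the next refresh after the interval acts as the retry.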

2. Use cloud capabilities to improve availability at low cost:

As mentioned at the beginning, users in the era of mobile payment expect higher availability from the cash register system, which forces service providers to consider more factors in system design.

Addressing all of these factors is costly, so building everything in-house is not realistic. The author therefore uses the Tencent Cloud capabilities he is familiar with as examples. Service providers in the cloud era are advised to learn about such capabilities and solve availability problems at low cost.

Factor 1, multi-site deployment, multi-point access:

With Tencent Cloud's infrastructure in more than 20 data centers around the world, multi-site deployment and multi-point access are easy to achieve. High-availability design at the architecture layer tolerates destabilizing factors such as the failure of a network operator in a single region and network jitter, and provides high-quality access for business partners around the world.

When a network fails, Tencent Cloud's interconnected global intranet dispatches service traffic to other regions in time, so that the user experience is not affected.

Factor 2, anti-DDoS attack:

DDoS attacks shut real users out, and cloud service providers now offer services to defend against them. For example, the Tencent Cloud Dayu BGP high-defense system provides 800G of protection bandwidth and 21 BGP lines, schedules network traffic dynamically, and helps users defend effectively against DDoS attacks.

Factor 3, load balancing, fault shielding:

To improve system stability and disaster tolerance, the more mature approach in the industry is stateless application-layer service design. It makes three things possible: monitoring the availability of server nodes in real time, automatically moving failed tasks to other available nodes, and distributing incoming requests across the machine nodes in the cluster.

Service providers in the cloud era can use Tencent Cloud's load balancing (CLB) to solve this problem at low cost. CLB has health checks with a user-configurable check frequency, so that when a back-end cloud server fails, the failure is detected and traffic is cut over immediately. This keeps the service highly available, and front-end applications do not notice the failure.
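The health-check idea can be illustrated with a small generic sketch. This is not how CLB is implemented internally; the class name HealthCheckedPool, the check callback, and the thresholds (3 consecutive failures to remove a node, 2 consecutive successes to restore it) are assumptions chosen for illustration.

```python
class HealthCheckedPool:
    """Round-robin over backends, skipping nodes that fail health checks."""

    def __init__(self, nodes, check, unhealthy_after=3, healthy_after=2):
        self._nodes = list(nodes)
        self._check = check  # check(node) -> truthy if the node is up
        self._unhealthy_after = unhealthy_after
        self._healthy_after = healthy_after
        self._fails = {n: 0 for n in self._nodes}
        self._oks = {n: 0 for n in self._nodes}
        self._healthy = set(self._nodes)
        self._next = 0

    def run_checks(self):
        """Run one round of checks; call this at the configured frequency."""
        for node in self._nodes:
            try:
                ok = bool(self._check(node))
            except Exception:
                ok = False
            if ok:
                self._fails[node] = 0
                self._oks[node] += 1
                if self._oks[node] >= self._healthy_after:
                    self._healthy.add(node)
            else:
                self._oks[node] = 0
                self._fails[node] += 1
                if self._fails[node] >= self._unhealthy_after:
                    self._healthy.discard(node)

    def pick(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next % len(self._nodes)]
            self._next += 1
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy backend available")
```

Requiring several consecutive failures before removing a node avoids flapping on a single lost probe, at the price of slightly slower detection.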

A single CLB cluster consists of four physical servers, supports more than 120 million concurrent connections, and can handle peak traffic of 40 Gbps and 6 million packets per second. Even in the extreme case where only one instance is available, it still supports more than 30 million concurrent connections, so the back end continues to serve normally. High scalability and low cost keep IT spending down.

Factor 4, overload protection:

Mobile payment is currently in a period of rapid growth, and various marketing activities will bring business peaks.

On the one hand, capacity must be expanded in time to reserve redundant service capability. On the other hand, when actual traffic far exceeds the maximum level the system can normally serve, the system must protect itself by quickly rejecting some requests. This keeps service normal for the remaining requests instead of letting the overload drag down every service.
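One simple way to "quickly reject some requests" is to cap the number of requests being processed at once. The sketch below is only an illustration of that idea under assumed names (LoadShedder, Overloaded, max_in_flight); the article does not prescribe a particular mechanism, and rate-based limits are an equally valid choice.

```python
import threading


class Overloaded(Exception):
    """Raised when a request is rejected to protect the service."""


class LoadShedder:
    """Rejects new work once `max_in_flight` requests are being handled."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def call(self, handler, *args, **kwargs):
        # Decide immediately, without queueing, so rejection stays cheap.
        with self._lock:
            if self._in_flight >= self._max:
                raise Overloaded("too many requests in flight")
            self._in_flight += 1
        try:
            return handler(*args, **kwargs)
        finally:
            with self._lock:
                self._in_flight -= 1
```

A caller would catch Overloaded and return a fast "busy, please retry" response to the terminal rather than letting the request wait until it times out.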

It is recommended to use a message queue provided by the cloud service provider. The distributed message queue CMQ on the cloud offers reliable asynchronous communication, which improves system throughput, ensures reliable message delivery, reduces pressure on back-end systems, and prevents a system avalanche.

In addition, Tencent Cloud servers support Auto Scaling. With a few simple scaling rules configured, the cluster expands automatically under high load and shrinks again afterwards, which keeps the business running smoothly, and usage-based billing keeps IT costs to a minimum.

3. "Order hopping" for automatic data disaster recovery at the data layer:

The data of the cash register system falls into two categories. One is order information (mainly the order table and the refund table), characterized by a large data volume and frequent reads and writes. The other is basic information (mainly stores, devices, merchants, and so on), characterized by a small data volume that is read often and written rarely. The efficient database disaster recovery practice described here applies to the order information DB and uses MySQL as the example.

MySQL disaster recovery generally relies on semi-synchronous replication plus an active/standby switchover, triggered automatically or manually, with a service recovery time between 1 minute and tens of minutes. For scenarios with somewhat larger transaction volumes, that recovery time is still too long. To restore business in a shorter time, we designed an "order hopping" solution at the data layer.

Core ideas:

Encapsulate an "order hopping" component in the data access layer that automatically avoids faulty storage, so that order data can land randomly in any of the DB groups.

The overall "order hopping" process is described in detail below:

[Figure: overall "order hopping" process]

To implement the order-hopping logic, we first split the database horizontally into several groups. Each group has one master and two standbys with read/write separation: the master DB handles writes and the standby DBs handle reads. Master-standby synchronization is guaranteed by MySQL's semi-synchronous replication.

The order number carries the group tag. If the original order number is 201609121215432322199, the group identifier is appended as the last digit. For example, for group 2 it becomes 2016091212154323221992.
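The tagging rule can be written down in a few lines. This is a sketch of the example above only; the function names are hypothetical, and the single-digit limit simply reflects the "last digit" wording, so a deployment with more than ten groups would need a wider tag.

```python
def tag_order_no(order_no: str, group: int) -> str:
    """Append the DB group identifier as the last digit of the order number."""
    if not 0 <= group <= 9:
        raise ValueError("a single trailing digit can only encode groups 0-9")
    return f"{order_no}{group}"


def parse_group(tagged_order_no: str) -> int:
    """Recover the DB group from the last digit of a tagged order number."""
    return int(tagged_order_no[-1])


tagged = tag_order_no("201609121215432322199", 2)
print(tagged)               # 2016091212154323221992
print(parse_group(tagged))  # 2
```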

On this basis:

a) Create-order request:

When the cashier terminal sends a request to create an order, the DB selector is first called to pick a group of DBs at random, and the counter is then queried to see whether that group's failure count exceeds the threshold. If it does, another group is selected. Otherwise the probe sends an update statement to check whether the DB is available. If the probe fails, another group is selected; if it succeeds, the group tag is written into the order number and the order is inserted into that group's DB.

b) Update or query request:

The group tag in the order number is parsed directly, and the operation is performed on the corresponding DB. "Order hopping" guarantees that new transactions proceed normally, so the payment is collected first. When a group of DBs fails, operations such as querying or revoking orders stored in that group must wait until the active/standby switchover has restored it.

Notes:

The counter needs a period, for example one minute, so that a DB group is automatically re-enabled once its fault has recovered.

Set a timeout, for example 1 second, when calling MySQL, so that a failing DB group does not drag down the upper-layer service.

The probe uses an update statement because a crashed DB may be read-only; probing with a select cannot guarantee that the DB is writable.
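The create-order flow and the notes above can be sketched as follows. This is an illustration under stated assumptions, not the team's component: GroupSelector, the probe callback, and the default threshold and period are hypothetical names and values. In a real system the probe would run a small UPDATE against the group's master with a short timeout (for example 1 second). Shuffling the groups stands in for "select at random, reselect on failure" so that no group is tried twice in one request.

```python
import random
import time


class GroupSelector:
    """Picks a writable DB group at random, avoiding recently failing ones."""

    def __init__(self, groups, probe, threshold=3, period=60.0,
                 clock=time.monotonic, rng=random):
        self._groups = list(groups)
        self._probe = probe          # probe(group) raises if the write fails
        self._threshold = threshold  # failures tolerated within one period
        self._period = period        # counter window in seconds
        self._clock = clock
        self._rng = rng
        self._failures = {g: [] for g in self._groups}  # failure timestamps

    def _recent_failures(self, group):
        # Dropping old timestamps is what re-enables a recovered group.
        cutoff = self._clock() - self._period
        self._failures[group] = [t for t in self._failures[group] if t > cutoff]
        return len(self._failures[group])

    def select(self):
        candidates = self._groups[:]
        self._rng.shuffle(candidates)
        for group in candidates:
            if self._recent_failures(group) > self._threshold:
                continue  # counter exceeded: skip without probing
            try:
                self._probe(group)
                return group
            except Exception:
                self._failures[group].append(self._clock())
        raise RuntimeError("no order DB group is currently writable")
```

The caller would then write the returned group into the order number and insert the order into that group's DB.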

3. Regular drills after "order hopping"

To verify that the system is truly highly available, regular drills are required. Our routine drill plan is as follows:

Run a drill of a single DB group failure every week.

Run a drill of multiple DB groups failing every quarter.

The monitoring chart below, taken during a drill, shows that when one DB group fails, requests hop to other groups, the overall curve stays smooth, and the service runs normally with no impact.

[Figure: monitoring curves during a DB failure drill]

4. Expansion and shrinking after "order hopping"

Expansion steps:

Deploy a new order DB and assign it a DB number;

Add the new DB's information to the configuration of the DB selection component;

Route service traffic to the new DB;

Watch the monitoring for any abnormality;

Shrinking steps:

Take the number of each DB to be removed modulo the number of DBs remaining after shrinking. For example, with 5 groups of DBs shrunk to 3, DB 5 is to be removed, and 5 mod 3 gives 2.

Migrate the data of DB 5 to DB 2, modify the configuration, and close traffic to DB 5. New orders no longer enter DB 5, and queries for historical orders reach DB 2 via the modulo.

After monitoring confirms that everything is normal, DB 5 is formally removed.
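The routing rule used after shrinking can be sketched as a small function. The name route_group is hypothetical, groups are assumed to be numbered from 1, and the handling of a zero remainder is an assumption, since the article only gives the 5 mod 3 example.

```python
def route_group(tag: int, remaining: int) -> int:
    """Map the group tag stored in an order number to a live DB group.

    `remaining` is the number of groups left after shrinking; groups are
    numbered 1..remaining.
    """
    if tag <= remaining:
        return tag  # the group still exists
    mapped = tag % remaining
    # Assumption: a remainder of 0 is mapped to the highest remaining group.
    return mapped if mapped != 0 else remaining


print(route_group(5, 3))  # 2: historical orders tagged 5 are read from DB 2
print(route_group(2, 3))  # 2: existing groups are unaffected
```

Data must be migrated to the mapped group before the configuration is switched, otherwise historical queries would hit a DB that does not yet hold the orders.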

5. Business-dimension queries after "order hopping"

A common problem with multi-group DB disaster recovery solutions is the efficiency of list queries along a business dimension, because orders are scattered across different DBs. If the query volume is small, all DB groups can be scanned directly, with concurrent calls used to address the efficiency problem. If such queries become a high-frequency operation, consider setting up a separate database that stores the data by merchant dimension, with the two databases kept in sync through a reliable message queue. The PGXZ and MQ components on Tencent Cloud are worth looking into for this purpose.
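For the low-volume case, the concurrent scan across all DB groups might look like the sketch below. The function names, the query_one callback, and the created_at field are hypothetical; a real query_one would run the merchant's list query against one group's read replica.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all_groups(groups, query_one, merchant_id):
    """Query every DB group concurrently and merge the results.

    query_one(group, merchant_id) is a caller-supplied function that
    returns a list of order dicts, each with a "created_at" value.
    """
    groups = list(groups)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        parts = pool.map(lambda g: query_one(g, merchant_id), groups)
        orders = [order for part in parts for order in part]
    # Newest first, as a merchant-facing order list usually expects.
    return sorted(orders, key=lambda o: o["created_at"], reverse=True)
```

In this sketch a failure in any one group raises out of the whole query; whether to return partial results instead is a product decision the article does not address.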

Although "order hopping" introduces this list-query efficiency problem, the core design principle of a cash register system is to "collect the payment whenever possible". The list-query problem must not be allowed to affect the availability of core payment.

6. Security considerations for the cash register system

System security is also a key indicator of a cash register system's availability. Our survey found that offline cash register systems may face the following security risks:

The cashier terminal software is installed without authorization;

An entire POS machine is stolen;

Man-in-the-middle attacks;

Normal transaction orders are refunded illegitimately;

To deal with these risks, we offer the following strategies for reference:

A POS registration and activation mechanism prevents unauthorized installation of the cashier terminal software and allows a stolen POS machine to be blocked directly;

A signature mechanism on request and response parameters prevents client forgery and request tampering;

Use HTTPS and restrict the trusted root certificates to prevent a man-in-the-middle from capturing packets, eavesdropping, and replaying requests;

Restrict refunds to prevent malicious refunds: a same-day order can be refunded only from the POS machine on which it was originally transacted, and an order older than one day can be refunded only through the WeChat Pay merchant system.
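The parameter-signature strategy can be illustrated with a generic HMAC-SHA256 sketch. This is not WeChat Pay's official signing algorithm; the canonical string format, the "sign" field name, and the shared key are assumptions made for illustration only.

```python
import hashlib
import hmac


def sign(params: dict, key: bytes) -> str:
    """Sign all parameters except "sign" with a shared secret key."""
    canonical = "&".join(
        f"{name}={params[name]}" for name in sorted(params) if name != "sign"
    )
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(params: dict, key: bytes) -> bool:
    """Check that the "sign" field matches the other parameters."""
    expected = sign(params, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, str(params.get("sign", "")))


key = b"per-terminal-secret"  # hypothetical key issued at POS activation
request = {"order_no": "2016091212154323221992", "amount": "100"}
request["sign"] = sign(request, key)
print(verify(request, key))   # True
request["amount"] = "1"
print(verify(request, key))   # False: the tampered request is rejected
```

A signature alone does not stop a captured request from being replayed; that still requires a timestamp or nonce in the signed parameters, on top of the HTTPS measures listed above.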


In addition, the official WeChat Pay security team has added "best security practices" to the WeChat Pay developer documentation, which are worth consulting.

7. Use the WeChat Pay network monitoring tool

To better monitor network quality between the merchant server and the WeChat Pay server, the WeChat Pay operations team provides a network monitoring tool. Operations staff can use it to report monitoring data to the WeChat Pay operations system, which helps merchants optimize link quality.

Detailed instructions for the tool can be found in the WeChat Pay developer documentation.

8. Closing remarks

In summary, there are many issues to consider when building a high-availability cash register system from scratch, and the cost of building everything yourself is not low. It is worth paying attention to the basic capabilities that cloud service providers offer (BGP high-defense, BGP network access with real-time cross-region switchover, the distributed message queue CMQ, load balancing CLB, Auto Scaling AS, TDSQL, cloud payment, and so on). Building on the infrastructure of the cloud era for efficient development is the more sensible choice.

Again, what we are pursuing is to "collect the payment whenever possible"!

The WeChat Pay team will continue its technical research on the high-availability cash register system. We hope to keep sharing our experience with the whole industry, help the industry improve service quality, and ultimately let users enjoy better mobile payment services.
