At present, many vendors and experts speak of privacy computing as if it were a "universal antidote" for personal information security compliance. Drawing on their own work experience and insights, three "veterans" of the privacy field have written this reflective article, which is well worth reading.
I clearly remember that in 2018, a former team member built a technical POC combining multi-party computation with SGX to test performance optimizations for MPC. The scenario was image privacy: performing AI classification of images while keeping the original images invisible. One day he excitedly told me that his prototype was 1000+ times faster than using multi-party computation alone. Looking at his eager eyes, I asked him one question: how does the performance of MPC + SGX compare with direct plaintext computation? To this day I still remember his lonely, retreating figure. Federated learning, multi-party computation, fully homomorphic encryption and the like can protect personal information while still achieving business goals, making data "usable but invisible" — what a perfect set of technologies. Yet performance, communication volume, and so on have become the mountains blocking the road to their application. When will they be overcome?
Looking back at 2020, if domestic personal information protection technology had at least one bright spot, it was undoubtedly the rise of privacy computing in China, as evidenced by the emergence of many startups. (There were more bright spots abroad: OneTrust remains strong, Collibra and other companies are rising quickly, synthetic data is emerging, and so on.) Test results from six months ago reportedly showed that secure multi-party computation is now, on average, only about 25 times slower than plaintext computation, and the supported operations have broken through the earlier limits of simple addition, subtraction, and comparison, now covering statistical analysis, logical functions, classification algorithms, and even neural networks. With so many players involved, "usable but invisible" became for a time the most popular phrase in the privacy community, and privacy computing seemed to represent the future direction of privacy and personal information protection.
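The addition mentioned above is the classic starting point for MPC. As a minimal illustrative sketch (not any vendor's actual protocol), additive secret sharing shows why parties can jointly compute a sum without any of them seeing the raw inputs; the field modulus and party count here are arbitrary assumptions:

```python
import random

# Minimal sketch of additive secret sharing over a prime field -- the basic
# building block behind MPC addition. Illustrative only, not a full protocol.

P = 2**61 - 1  # Mersenne prime used as the field modulus (an assumption here)

def share(secret, n=3):
    """Split `secret` into n additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares; any subset smaller than n reveals nothing."""
    return sum(shares) % P

# Each party holds one share of each input. Parties add their shares locally;
# the local sums reconstruct to the sum of the secrets, with no party ever
# seeing a raw input value.
a_shares, b_shares = share(123), share(456)
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 123 + 456
```

Addition is essentially free in this scheme; it is multiplication and comparison that require the interactive rounds responsible for the performance gap discussed above.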
If I were only a technical researcher, then beyond being happy about the progress of privacy computing, I would simply thank the many innovators for their efforts! But today, with a fuller understanding of personal information protection, while still rejoicing at the technical progress, it is necessary to point out that privacy computing is not the whole of personal information protection: it is only one attempt to solve the most urgent problems of the moment, and only one branch of the field's development. So, what problems arise when we place privacy computing in the context of the Personal Information Protection Law?
1. Privacy computing mainly solves the problem of data sharing and flow
Although technologies such as differential privacy can also be applied to anonymized data collection, and federated learning can keep users' personal information from being uploaded to the cloud, the most valuable application scenario of privacy computing today is clearly the attempt to break down "data islands": enabling compliant data sharing, joint processing, and flow, especially in scenarios such as credit risk control and information sharing between hospitals.
However, both the GDPR and China's Personal Information Protection Law cover the entire life cycle of personal information — collection, storage, use, processing, transmission, provision, and disclosure — and the compliance problems enterprises face are clearly not limited to data sharing and circulation. For example, the apps recently called out by the cybersecurity and industry-and-information-technology authorities were cited mostly for problems on the collection side.
The Personal Information Protection Law entered its third reading at the National People's Congress in August and will soon take effect. The legal framework has basically taken shape, which means we are entering the implementation stage, that is, the compliance stage. Several recent major events show that the old habits — widespread non-compliance, "the law does not punish the many," minor infractions shrugged off — belong to a past era of Internet development. The era of strong governance and supervision is coming. In this new stage, enterprises need to "catch up on missed lessons": in the field of personal information protection, that means building, as soon as possible, a compliance management system covering the full life cycle of personal information, including collection, processing, use, storage, deletion or archiving, and sharing and exchange. This system will underpin all of an enterprise's personal-information activities, including proving its own innocence when sharing and exchanging data. More importantly, the law is reshaping the personal-data ecosystem, in particular by clarifying consumers' rights to query, delete, modify, and copy their personal information — effectively arming every consumer. Facing a surge of such requests will be a "compliance nightmare" that every company collecting and using personal information must confront, large or small, without exception.
2. Privacy computing does not completely solve compliance issues
Privacy computing achieves "data usable but invisible": data partners cannot obtain real user data. It seems to solve the compliance problem of personal information use perfectly — but is that really the case?
First, consumer consent is indispensable. Both the GDPR and our own Personal Information Protection Law state clearly that anonymized information is not personal information and is therefore not subject to personal information protection. In theory, if data partners use privacy computing, the data in the process appears anonymized (does it really?) and is never actually transferred, so user authorization and consent might seem unnecessary. In practice, however, the raw data is collected first and privacy computing applied later, so all parties to the cooperation still need the user's consent to collect the data. For example, when using on-device federated learning to model user behavior, it is necessary to collect user travel data, shopping and consumption data, and so on for analysis. Although this raw data never leaves the device, that does not mean it can be collected at will: users still have the right to informed consent or refusal, and enterprises must be able to prove that the actual purpose of processing stays within a reasonable scope.
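The on-device modeling pattern described above can be sketched with a toy federated averaging (FedAvg) loop. This is a hypothetical one-parameter example, not any production system: each device computes an update on its own private data, and only the updated weight leaves the device.

```python
# Toy FedAvg sketch for a 1-D least-squares model y ≈ w * x.
# Hypothetical setup for illustration; real systems ship gradients or
# weight deltas for full models, often with secure aggregation on top.

def local_update(w, local_data, lr=0.1):
    """One gradient step computed on-device; local_data never leaves."""
    grad = sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)
    return w - lr * grad  # only this scalar is shared with the server

def fedavg(w, device_datasets):
    """Server averages locally updated weights, weighted by data size."""
    updates = [local_update(w, d) for d in device_datasets]
    total = sum(len(d) for d in device_datasets)
    return sum(u * len(d) for u, d in zip(updates, device_datasets)) / total

# Two devices, each holding private (x, y) pairs drawn from y = 2x.
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(200):
    w = fedavg(w, devices)
# w converges toward the true slope 2.0, yet the raw (x, y) pairs
# were never sent to the server.
```

Note that the `(x, y)` pairs — the travel or consumption records in the article's example — still had to be collected onto the device in the first place, which is exactly why consent is not waived.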
Second, in the data-use stage, algorithms such as multi-party computation and homomorphic encryption, as well as trusted execution environments, use strong cryptography to ensure the data is not leaked — but this does not change the fact that the data is pseudonymized rather than anonymized. Encrypted data remains reversible (however well the keys are protected), and in some scenarios the results of encrypted computation still reflect characteristics of a single individual; such results are clearly personal information and directly affect users' interests. In these scenarios it is still necessary to determine whether both parties to the cooperation have obtained user consent, whether they have exceeded the scope of that consent, what evidence of consent exists, and so on. In short, even "absolute security" does not mean "compliance with personal information protection": privacy computing can neither waive consent at the collection stage nor absolutely waive it during data use.
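The pseudonymization-versus-anonymization distinction above can be made concrete with a toy example (the record fields and token scheme here are assumptions for illustration): whoever holds the mapping "key" can reverse the transformation, which is precisely why GDPR and the PIPL still treat such data as personal information.

```python
import secrets

# Toy pseudonymization: replace each user ID with a random token while the
# data controller keeps the mapping table (the "key"). Illustrative only;
# real systems use keyed cryptographic schemes rather than a plain dict.

def pseudonymize(records, mapping=None):
    """Replace 'user_id' fields with tokens; return new records + mapping."""
    mapping = {} if mapping is None else mapping
    out = []
    for rec in records:
        uid = rec["user_id"]
        if uid not in mapping:
            mapping[uid] = secrets.token_hex(8)
        out.append({**rec, "user_id": mapping[uid]})
    return out, mapping

def reidentify(records, mapping):
    """Whoever holds the mapping can undo the pseudonymization."""
    inverse = {v: k for k, v in mapping.items()}
    return [{**rec, "user_id": inverse[rec["user_id"]]} for rec in records]

records = [{"user_id": "alice", "spend": 120}, {"user_id": "bob", "spend": 80}]
pseudo, mapping = pseudonymize(records)
restored = reidentify(pseudo, mapping)
assert restored == records  # reversible => still personal information
```

The reversibility demonstrated by the final assertion is the legal crux: as long as re-identification remains possible, "encrypted" does not mean "anonymized".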
Privacy computing makes data "usable but invisible" between partners, but from the perspective of the information subjects (consumers), their data must remain "controllable and visible" to them — a right granted by law. Beyond consent, the use of privacy computing will in some cases face another compliance difficulty: responding to data subject rights requests. Both the GDPR and the Personal Information Protection Law tend overall to strengthen the rights and interests of personal information subjects, making it clear that consumers have the right to query, modify, and copy their personal information. Even though data is secured during computation, as long as users' personal information is collected, both privacy computing partners must still respond correctly to consumer rights requests in most cases (though how to disclose, and which information, still awaits normative guidance). At the same time, each party may have to "expose" the other to consumers (since, as noted above, the results of privacy computation can still constitute personal information), and this exposure brings compliance pressure to the other party.
In short, algorithms such as multi-party computation solve the problem of mutual distrust between data partners and release the value of data, but they do not address the rights of the personal information subjects involved. They satisfy only the "data minimization" and "security assurance" (confidentiality, integrity, etc.) requirements of personal information protection; they cannot guarantee consumers' rights to informed consent, knowledge, and control. Therefore, when using privacy computing, not only do the partners need the support of a compliance management system, but the compliance requirements of the computation itself must still be judged against the specific application scenario — privacy computing can never be an absolute exemption from compliance.
3. Efficiency and performance remain privacy computing's biggest difficulty
Algorithms such as multi-party computation and homomorphic encryption try to solve the "compliance" problem of data sharing with "absolute security". This security is obviously bought with increased computational complexity and multi-party communication volume, which inevitably degrades performance and confines most application scenarios to small amounts of data. Support for large-scale systems still needs to improve; there is a dilemma of big systems, high computing power, and small tasks. Although privacy computing performance has now improved 1000+ times or more, its underlying principles dictate a ceiling on optimization, and this ceiling may not be low. (I remember that when we implemented a differential privacy algorithm a few years ago, one basic requirement was ε < 3 with a performance drop of no more than 3%.) Breaking through these limits will take either a revolutionary algorithmic breakthrough or special-purpose chips such as DPUs, so that the performance impact of privacy computing becomes small enough for it to be widely accepted and applied.
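The ε < 3 requirement mentioned above refers to the differential privacy budget. A minimal sketch of the standard Laplace mechanism (not the author's actual implementation) shows how ε directly sets the noise scale — a tighter budget means proportionally more noise, which is the accuracy cost being traded against the performance cost:

```python
import random

# Minimal sketch of the Laplace mechanism for differential privacy.
# Noise scale = sensitivity / epsilon: a smaller privacy budget epsilon
# (e.g. epsilon < 3, as in the requirement above) means more noise.

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value with noise calibrated to sensitivity / epsilon."""
    return true_value + laplace_noise(sensitivity / epsilon)

# Counting query: one person changes the count by at most 1, so sensitivity = 1.
count = 1000
noisy = laplace_mechanism(count, sensitivity=1.0, epsilon=3.0)
# With epsilon = 3 the noise standard deviation is sqrt(2)/3 ≈ 0.47,
# so the released count is usually within about one unit of the truth.
```

Unlike MPC, the cost of differential privacy is paid mostly in accuracy rather than computation, which is why the 3% performance budget was achievable there while MPC still fights a 25x slowdown.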
Given its performance and computing-power requirements, privacy computing moves data the way an "armed escort" moves valuables: data providers, computing service providers, and data recipients distrust and guard against one another. This model — high security, high cost, low efficiency — is clearly suited only to transporting high-value goods such as cash and gold, not to moving ordinary commodities. For those, a universal, efficient, low-cost "courier company" is king. So is there a "courier company" model for data? What would it look like? A discussion for another time.
Privacy computing attempts to substitute "absolute security" between mutually distrusting parties for compliance — to solve the problem of personal information protection by solving the data provider's own data security and non-disclosure. But is data security the same as personal information protection? Obviously not.
The application of multi-party computation and homomorphic algorithms in privacy computing rests on a basic assumption: that the data provider is entitled to use the data as it sees fit. In the context of the Personal Information Protection Law, this assumption is clearly problematic. Privacy computing partners cannot simply exempt themselves from liability through technology; they still need the support of a compliance management system.
On practicality: a key parameter affecting the performance of privacy computing — multi-party computation in particular — is the security assumption, namely whether the protocol assumes semi-honest participants or defends against malicious adversaries. Setting these parameters obviously requires scenario-specific analysis, which in turn relies on the support of the personal information compliance management system.
In fact, if data sharing and circulation are not treated in isolation but incorporated into life-cycle compliance management of personal information — with a deep understanding of the nature of personal information protection, and with data sharing and flow returned to their original business purpose — you will find that privacy computing is not the only option for data flow and sharing. Depending on the trust system being built, changes in the data-usage and flow ecosystem, data-based value distribution, and so on, the most appropriate sharing and flow strategy will differ from one application scenario to another.
Let us not repeat the old mistake of emphasizing technology while neglecting management. Haven't the lessons in certain fields been painful enough?