Posted 04-14 in Technology

The 2019 "3.15" gala exposed how artificial intelligence is used to place nuisance calls, which showed the public how important protecting personal privacy is in the information age. This article shares best practices for protecting user privacy data in logs.

In contrast to the mentality that "Chinese people are willing to exchange privacy for convenience", Europe and the US started protecting personal privacy earlier and have gone further. Around the time GDPR took effect in May 2018, the demand for privacy protection rose quickly. Even the daily work of an ordinary programmer like me, who develops international products, was affected: we put down our business story cards and worked on GDPR-related security requirements instead. Healthcare and finance have always had strict rules restricting access to customers' sensitive data, and since GDPR was promulgated the consequences of a company leaking personal data have become very serious.

In China, personal privacy protection currently lags behind in both law and awareness, but many people have noticed the trouble that leaked personal information causes them; the increase in harassing calls is the most obvious example. More encouragingly, the release of the Cybersecurity Law and the growing awareness among netizens indicate that personal information protection here is on its way.

For projects serving Europe and the US, our company took a series of actions from the top down, such as reviewing our infrastructure architecture diagrams, data flow diagrams, and API data fields, and also protecting personal information in logs.

What is special about security problems

Personal privacy protection, like other security issues, is a requirement that is never finished. You cannot say that your site is absolutely secure. You can only say "I went through the checklist of all currently known security vulnerabilities and took appropriate defensive measures to be as safe as possible", or that you adopted good security practices such as dynamic passwords or an anti-attack, anti-SQL-injection plug-in installed on Nginx.

Web systems today are generally equipped with a log system for recording access requests and analyzing production incidents, for example the open source ELK stack or SaaS products such as Datadog and Sumo Logic. Recording some private user information during logging is often unavoidable. Developers' awareness of privacy protection matters, but a leak is not always the result of a developer deliberately peeking at user information. A very common case: when an exception is not properly caught, the program tends to output the call stack, and the parameters of some methods in that call stack may contain private personal information.

There is no way to keep personal information out of logs once and for all, but the following practices avoid it as far as possible and can be built into everyday development work. Some are code-level technical practices, some are improvements to team process, and some are testing and operations measures.

First: Determine what privacy data is

Before we discuss in depth how to avoid personal privacy data appearing in the log, let's define what privacy data is:

  • Personally identifiable information (PII): for example Social Security numbers, data combinations (such as name + date of birth, or last name + ZIP code), user-generated data (such as an email address or username, e.g. blog@mail.wangbaiyuan.cn), and mobile phone numbers.
  • Health information
  • Financial data (e.g. credit card numbers)
  • Passwords
  • IP addresses: an IP address may also be personal privacy data, especially when it is bound in some way to personally identifiable data. (The 3.15 gala in 2019 also showed a way to turn a MAC address into PII.)

Defining personal privacy information may require working with security experts who are familiar with GDPR to thoroughly investigate the data in the application and decide, based on the actual situation, what is sensitive.

I. Decouple privacy fields

When processing private data, minimize how often the system uses it. For example, suppose a database design uses the email address or, as an extreme example, the ID number (hereinafter PID) as the primary key of the users table. The system then has to use the Email or PID to establish an association every time it accesses user data. This is easy to do and the system works perfectly well, but it greatly increases the exposure of the sensitive fields: the more places they appear, the greater the chance they are logged. A better approach is to decouple the privacy data and use it only when necessary. A common solution is to use a randomly generated string as the ID of the users table, and to create a "1-to-1" database table that stores the relationship between the sensitive identifier (Email or PID) and that primary key.

All database tables other than the users table should be queried by this random ID, which discloses no personal data even if it is exposed.
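The decoupling described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the author's actual schema: the table names (users, user_identity, orders), the plan and item columns, and the helper functions are all hypothetical.

```python
import sqlite3
import uuid


def create_schema(conn):
    # users is keyed by a random ID and holds no identifying field itself.
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, plan TEXT)")
    # One-to-one table: the only place the sensitive value is stored.
    conn.execute(
        "CREATE TABLE user_identity ("
        " user_id TEXT PRIMARY KEY REFERENCES users(id),"
        " email TEXT NOT NULL UNIQUE)"
    )
    # Every other table refers to the random ID, never to the email.
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " user_id TEXT NOT NULL REFERENCES users(id),"
        " item TEXT)"
    )


def register_user(conn, email, plan):
    # The random ID carries no personal information.
    user_id = uuid.uuid4().hex
    conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, plan))
    conn.execute("INSERT INTO user_identity VALUES (?, ?)", (user_id, email))
    return user_id


def resolve_user_id(conn, email):
    # The single point where the email is touched, for example at login.
    row = conn.execute(
        "SELECT user_id FROM user_identity WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
uid = register_user(conn, "alice@example.com", "free")
conn.execute("INSERT INTO orders (user_id, item) VALUES (?, ?)", (uid, "book"))
```

With this layout, a query or log line about orders only ever mentions the random ID, and the email is read in one clearly identified place.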

II. Avoid personal privacy information in URLs

For example, if you have a RESTful API that looks up user information by Email, it is tempting to create an endpoint such as /user/<email>. The request URL is typically logged by both the reverse proxy and the web server, so the Email ends up in the log. To keep sensitive data out of the URL, you can:

  • Option 1: not use sensitive fields as unique identifiers, and use random IDs instead.
  • Option 2: pass sensitive values as POST data.

As with decoupling privacy fields in the database above, this needs to be considered early in the API or database design, otherwise a lot of refactoring may be needed later. The prerequisite is having correctly determined which data in the system is sensitive.
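To make the difference concrete, here is a small sketch that compares what a typical access log would record for the three request shapes. It assumes the access log records the method and URL but not the request body; the /user/lookup path and all function names are hypothetical, and no real web framework is involved.

```python
import json


def lookup_by_path(email):
    # Anti-pattern: the email becomes part of the URL.
    return ("GET", f"/user/{email}", None)


def lookup_by_random_id(user_id):
    # Option 1: only the random ID appears in the URL.
    return ("GET", f"/user/{user_id}", None)


def lookup_by_post_body(email):
    # Option 2: the email travels in the request body instead of the URL.
    return ("POST", "/user/lookup", json.dumps({"email": email}))


def access_log_line(request):
    # Assumption: the proxy logs method and URL only, never the body.
    method, path, _body = request
    return f'"{method} {path} HTTP/1.1" 200'
```

Only the first shape puts the address into the log line. Option 2 still needs care elsewhere, because application code that logs request bodies would reintroduce the leak.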

III. Override the toString method for object printing

To locate problems or make debugging easier, developers often add debug information to the log. For convenience, it is tempting to write code that prints the whole User object instead of just User.username.

In some programming languages, such as Java and JavaScript, when an object is printed or interpolated into a string, what appears is the string returned by its toString method. We can therefore override the toString method of the object to avoid disclosing personal information when objects are printed.
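The article talks about Java and JavaScript; the sketch below shows the same idea in Python, where __repr__ (which str() falls back to when __str__ is not defined) plays the role of toString. The User class and its fields are hypothetical.

```python
class User:
    def __init__(self, user_id, first_name, last_name, email):
        self.user_id = user_id
        self.first_name = first_name
        self.last_name = last_name
        self.email = email

    def __repr__(self):
        # Expose only the non-identifying random ID when the object is
        # printed, formatted, or passed to a logger.
        return f"User(id={self.user_id})"


user = User("9f2c", "Ada", "Lovelace", "ada@example.com")
message = f"created {user}"  # "created User(id=9f2c)"
```

This protects only whole-object printing. Code that formats individual fields still outputs them unchanged.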

If the developer really insists on asking for trouble, for example by printing the fields of the object directly, overriding toString cannot help, as in: Logger.info("The user's details is: ${user.firstName} ${user.lastName}");

IV. Mask privacy fields in structured log output

To make logs easy to read, we often upload them to the log server as JSON strings, so the key-value structure is clearly visible when viewing them. We can traverse all key-value pairs in the app's log output and, if a key is a field like firstName or a value matches an Email pattern, replace the corresponding value with "<MASKED>".
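A minimal sketch of this traversal, assuming the log event is a nested dict that is serialized to JSON. The list of sensitive key names and the email pattern are illustrative assumptions and would need tuning for a real application.

```python
import json
import re

# Hypothetical key names, compared case-insensitively.
SENSITIVE_KEYS = {"firstname", "lastname", "email", "password", "phone"}
# Deliberately simple email pattern; real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MASK = "<MASKED>"


def mask(value, key=None):
    # Mask by key name first, whatever the value looks like.
    if isinstance(key, str) and key.lower() in SENSITIVE_KEYS:
        return MASK
    # Recurse into nested objects and arrays.
    if isinstance(value, dict):
        return {k: mask(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [mask(item) for item in value]
    # Mask by value pattern: any string containing an email address.
    if isinstance(value, str) and EMAIL_RE.search(value):
        return MASK
    return value


def to_log_line(event):
    return json.dumps(mask(event), sort_keys=True)


event = {
    "event": "signup",
    "firstName": "Ada",
    "contact": "ada@example.com",
    "meta": {"lastName": "Lovelace", "attempts": 1},
}
line = to_log_line(event)
```

Here firstName and lastName are masked because of their keys, while contact is masked because its value looks like an email address.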

V. Include logging code in the code review

Code review is the part of the development process that safeguards code quality; program vulnerabilities, robustness issues, and suggestions for improvement are often raised there. Make checking the logging code one of the things every reviewer looks at. This is not a technical measure but an improvement to the team's code review process. If you merge code through pull requests with a Pull Request Template, you can add a check box to the template to prompt the reviewer to check.

VI. Include personal information leak testing in QA and automated testing

Most companies today do not include personal privacy leak testing in the scope of testers' or QA's work, but this work not only needs to be done by testers, it can even be automated. Take a user registration scenario: a tester can act as a user, fill in a name and Email in the web front-end form, and then check the server log to see whether this information appears. This can be automated by driving the form with an end-to-end test tool such as Selenium or Cypress and then calling the log server's API to search for the information. Automated privacy leak tests can also be incorporated into the CI/CD pipeline.
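The checking half of such a test could look like the sketch below. It assumes the end-to-end tool has already submitted the form with unique synthetic values and that the log lines have been fetched from the log server; both steps are outside the sketch, and find_leaks is a hypothetical helper.

```python
def find_leaks(log_lines, planted_values):
    """Return (line_number, value) pairs where a planted value appears."""
    leaks = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for value in planted_values:
            # Case-insensitive substring match on the synthetic values.
            if value.lower() in lowered:
                leaks.append((number, value))
    return leaks


# Synthetic values typed into the registration form by the test.
planted = ["Zed Testperson", "zed.qa.7431@example.com"]
# Stand-in for lines returned by the log server's search API.
fetched_logs = [
    "INFO signup ok user_id=9f2c",
    'DEBUG payload={"email": "zed.qa.7431@example.com"}',
]
leaks = find_leaks(fetched_logs, planted)
```

In a CI/CD pipeline the test would fail whenever leaks is not empty. Using values that are unique to each test run keeps the search from matching older log entries.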

VII. Mask privacy information before the log collector uploads

In our projects, logs are generally collected in two ways:

  • Push the standard output of the machine instance, or the contents of its log files, to the log server through the log collection process (agent) provided by the log center
  • Forward logs to the log center with serverless code running on AWS Lambda

The log collection tool is the gateway through which logs reach the log center. Masking information at this gate lets you process the logs of all services centrally (in the case of multiple microservices). The Datadog Agent provides direct configuration for masking private data, while with AWS Lambda the code is under your control, so you can implement regular-expression replacement at the code level.
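For the code-level route, the regular-expression replacement might be sketched like this. It is a generic function over raw log lines, not Datadog configuration and not tied to any particular AWS Lambda event format; the patterns and placeholder names are assumptions and will both over-match and under-match on real data.

```python
import re

# (pattern, placeholder) pairs applied in order to every raw log line.
PATTERNS = [
    # Simple email pattern.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<EMAIL>"),
    # 13 to 19 digits, optionally separated by spaces or dashes, which is
    # roughly the shape of a card number. Long numeric IDs also match.
    (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), "<CARD>"),
]


def scrub_line(line):
    for pattern, placeholder in PATTERNS:
        line = pattern.sub(placeholder, line)
    return line


def scrub_batch(lines):
    # Call this on each batch just before it is forwarded to the log center.
    return [scrub_line(line) for line in lines]


cleaned = scrub_batch([
    "login ok for bob@example.com from 10.0.0.5",
    "card=4111 1111 1111 1111 paid",
])
```

Because this runs at the collection gate, it works on plain text from every service and does not depend on any service having structured its logs.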

VIII. Configure monitoring alerts for personal privacy information in the log system

Even with the practices above, we still cannot guarantee that personal privacy information will never appear in the log. On the one hand, we can consciously check for private information whenever we debug or read application logs. On the other hand, we can use technical means to automate this check and have the alerting system notify team members to deal with it.

[Figure: Configure Email alerts in the monitoring system]

This has been practiced in the author's team. We use Datadog for logging and monitoring and have set it up so that when Email information appears in the log, Datadog automatically sends an email notification. One thing to note: Email addresses match well with regular expressions, which many log systems support. Names do not, so detecting them may have to be left to artificial intelligence.


[Figure: PII Protection]

As the description above shows, protecting personal privacy information is not a problem that a security expert alone can simply solve, nor is it the job of one separate role; it needs the cooperation of every role on the team. That is the idea of DevSecOps.



Please credit the original source when reprinting: Baiyuan's Blog >> https://wangbaiyuan.cn/en/protecting-user-privacy-data-in-log-2.html
