10 ways to protect data in the cloud

How to plan for real-time security and compliance monitoring with big data.

Sep 06 166 min read

HDM Staff

Big data is generated by a wide range of devices and sensors, including security devices. The new report from the Cloud Security Alliance, “100 Best Practices in Big Data Security and Privacy,” looks at the best practices that should be implemented for real-time security and compliance monitoring.

1. Apply big data analytics to detect anomalous connections to a cluster

Why? To ensure only authorized connections are allowed on a cluster, as this makes up part of the trusted big data environment.

How? Use solutions like TLS/SSL, Kerberos, Secure European System for Applications in a Multi-Vendor Environment (SESAME), Internet protocol security (IPsec), or secure shell (SSH) to establish trusted connections to and, if needed, within a cluster, so that unauthorized connections are prevented. Use monitoring tools, like a security information and event management (SIEM) solution, to watch for anomalous connections. Detection could be based, for instance, on connection behavior (e.g., seeing a connection from a ‘bad Internet neighborhood’) or on alerts in the logs of the cluster systems that indicate an attempt to establish an unauthorized connection.
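As a rough illustration of flagging connections by where they come from, the sketch below labels source addresses against two network lists. The lists, the function names, and the three labels are assumptions made for this example; a real deployment would draw on a SIEM or a threat-intelligence feed rather than hard-coded ranges.

```python
import ipaddress

# Hypothetical network lists: placeholders, not real threat intelligence.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
# 203.0.113.0/24 is a documentation range, used here as a stand-in
# for a 'bad Internet neighborhood'.
SUSPECT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def classify_connection(source_ip: str) -> str:
    """Return 'suspect', 'trusted', or 'unknown' for a source address."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in SUSPECT_NETWORKS):
        return "suspect"
    if any(addr in net for net in TRUSTED_NETWORKS):
        return "trusted"
    return "unknown"


def flag_connections(source_ips):
    """Return (ip, label) pairs for every connection that is not trusted."""
    flagged = []
    for ip in source_ips:
        label = classify_connection(ip)
        if label != "trusted":
            flagged.append((ip, label))
    return flagged
```

Calling `flag_connections` on the source addresses seen in a time window gives a short list to review or forward as alerts.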

2. Mine logging events

Why? To ensure that the big data infrastructure remains compliant with the assigned risk acceptance profile of the infrastructure.

How?
• Mine the events in log files to monitor for security issues, as a SIEM tool does.
• Apply other algorithms or principles (such as machine learning) to mine events for potential new security insights.
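A minimal sketch of mining log events might count authentication failures per source and report the sources that cross a threshold. The log line format (`event=... src=...`), the event name `auth_failure`, and the default threshold are assumptions for illustration only; real cluster logs will look different.

```python
import re
from collections import Counter

# Assumed log format: "... event=<name> src=<address> ..."
LINE_PATTERN = re.compile(r"event=(?P<event>\S+)\s+src=(?P<src>\S+)")


def count_auth_failures(lines):
    """Count 'auth_failure' events per source address."""
    failures = Counter()
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match and match.group("event") == "auth_failure":
            failures[match.group("src")] += 1
    return failures


def sources_over_threshold(lines, threshold=3):
    """Return the sorted sources with at least `threshold` failures."""
    failures = count_auth_failures(lines)
    return sorted(src for src, n in failures.items() if n >= threshold)
```

A fixed threshold is the simplest possible rule; the same counts could equally feed a statistical or machine learning model.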

3. Implement front-end systems

Why? To parse requests and stop bad ones. Front-end systems are not new to security. Examples are routers, application-level firewalls and database-access firewalls. These systems typically parse the request (based on, for instance, syntax signatures or behavior profiles) and stop bad requests. The same principle can be used to focus on application or data requests in a big data infrastructure environment (e.g., MapReduce messages).

How? Deploy multiple stages of front-end systems. For example, use a router for the network; an application-level firewall to allow or block applications; and a dedicated big data front-end system to analyze typical big data inquiries (like Hadoop requests). Additional technology, such as a software-defined network (SDN), may be helpful for implementation and deployment.
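The staged idea can be sketched as a chain of checks in which a request must pass every stage in order. Everything named here (the `Request` fields, the allow-lists, the blocked tokens) is hypothetical and stands in for what a router, an application firewall, and a big data front end would each enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    source_ip: str
    application: str
    query: str


# Hypothetical policy data for each stage.
ALLOWED_SOURCES = {"10.0.0.5"}
ALLOWED_APPLICATIONS = {"reporting"}
BLOCKED_QUERY_TOKENS = ("drop ", "delete ")


def network_stage(req: Request) -> bool:
    """Stand-in for the router: only known sources get through."""
    return req.source_ip in ALLOWED_SOURCES


def application_stage(req: Request) -> bool:
    """Stand-in for the application-level firewall."""
    return req.application in ALLOWED_APPLICATIONS


def data_stage(req: Request) -> bool:
    """Stand-in for the big data front end: a crude syntax signature."""
    lowered = req.query.lower()
    return not any(token in lowered for token in BLOCKED_QUERY_TOKENS)


STAGES = [
    ("network", network_stage),
    ("application", application_stage),
    ("data", data_stage),
]


def first_failed_stage(req: Request):
    """Return the name of the first stage that rejects the request, or None."""
    for name, check in STAGES:
        if not check(req):
            return name
    return None
```

Returning the name of the rejecting stage, rather than a bare true or false, makes it easier to log where a bad request was stopped.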

4. Consider cloud-level security

Why? To keep the cloud from becoming the “Achilles heel” of the big data infrastructure stack. Big data deployments are moving to the cloud. If such a deployment lives on a public cloud, this cloud becomes part of the big data infrastructure stack.

How?
• Download “CSA Guidance for Critical Areas of Focus in Cloud Computing V3.0.”
• Implement other CSA best practices.
• Encourage Cloud Service Providers to become CSA STAR-certified compliant.

5. Utilize cluster-level security

Why? To ensure that security for the big data infrastructure is approached at multiple levels. Different components make up this infrastructure, the cluster being one of them.

How? Apply, where applicable, best security practices for the cluster. These include:
• Use Kerberos or SESAME in a Hadoop cluster for authentication.
• Secure the Hadoop distributed file system (HDFS) using file and directory permissions.
• Utilize access control lists for access (e.g., role-based, attribute-based).
• Apply information flow control using mandatory access control.

The implementation of security controls also depends heavily on the cluster distribution being used. In case of strict security requirements (e.g., high confidentiality of the data being used), consider looking at solutions like Sqrrl, which provide fine-grained access control at the cell level.
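To make the role-based access control item concrete, here is a small sketch of a permission check over path prefixes. The users, roles, paths, and the `is_allowed` helper are all hypothetical; this is not the HDFS permission model or any distribution's API, only the general shape of a role-based lookup.

```python
# Hypothetical role definitions: role -> {path prefix -> allowed actions}.
ROLE_PERMISSIONS = {
    "analyst": {"/data/reports": {"read"}},
    "engineer": {"/data": {"read", "write"}},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"engineer"},
}


def is_allowed(user: str, path: str, action: str) -> bool:
    """Return True if any of the user's roles grants `action` on `path`."""
    for role in USER_ROLES.get(user, set()):
        for prefix, actions in ROLE_PERMISSIONS.get(role, {}).items():
            # Match the prefix itself or anything below it, so that
            # "/database" is not treated as being under "/data".
            under_prefix = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if under_prefix and action in actions:
                return True
    return False
```

The check denies by default: an unknown user, an unassigned role, or an unlisted path all fall through to `False`.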

6. Apply application-level security

Why? To secure applications in the infrastructure stack. In recent years, attackers have shifted their focus from operating systems to databases to applications.

How?
• Apply secure software development best practices, like OWASP (owasp.org) for Web-based applications.
• Execute vulnerability assessments and application penetration tests on a regular, scheduled basis.

7. Adhere to laws and regulations

Why? To avoid legal issues when collecting and managing data. Because of laws and regulations that exist worldwide, specifically those that relate to privacy rights, individuals who gather data cannot monitor or use every data item collected. While many regulations are in place to protect consumers, they also create a variety of challenges for big data collection that will hopefully be resolved over time.

How? Follow the laws and regulations (e.g., privacy laws) for each step in the data lifecycle. These include:
• Collection of data
• Storage of data
• Transmission of data
• Use of data
• Destruction of data

Physical and virtual locations for each step in the data lifecycle may not be the same.

8. Reflect on ethical considerations

Why? To address both technical and ethical questions that may arise. The fact that one has big data doesn’t necessarily mean that one can just use that data. There is always a fine line between what is technically possible and what is ethically correct. The latter is also shaped by legal regulations and the organization’s culture, among other factors.

How? There are no clear guidelines concerning ethical considerations related to big data usage. At minimum, big data users must take into account all applicable privacy and legal regulations. Additionally, users should consider ethical discussions related to their organizations, regions, businesses, and so forth.

9. Monitor evasion attacks

Why? To avoid potential system attacks and/or unauthorized access. Evasion attacks are meant to circumvent big data infrastructure security measures and avoid detection. It is important to minimize these occurrences as much as possible.

How? Because evasion attacks evolve constantly, it is not always easy to stop them. Following a defense-in-depth approach, consider applying different monitoring algorithms (like machine learning) to mine the data. Look for insights related to potential evasion of monitoring beyond what signature-based, rule-based, anomaly-based, and specification-based detection schemes provide.
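One simple way to layer an extra detector on top of signature-based rules is a robust outlier score. The sketch below uses the median and the median absolute deviation (MAD), a plain statistical technique chosen here as a stand-in for the machine learning methods mentioned above. The function name, the 0.6745 scaling constant, and the 3.5 cutoff are conventional choices for this score, not something prescribed by the report.

```python
from statistics import median


def robust_outliers(values, cutoff=3.5):
    """Return the indexes of values whose robust z-score exceeds `cutoff`.

    The score is 0.6745 * (value - median) / MAD. Using the median and MAD
    means a few extreme points do not distort the baseline the way they
    would with a mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against; report nothing rather than guess.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > cutoff
    ]
```

Fed with, say, per-minute event counts from a monitored system, this highlights intervals that deviate sharply from the typical level even when no signature matched.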

10. Track data-poisoning attacks

Why? To prevent monitoring systems from being misled, crashing, misbehaving or providing misinterpreted data due to malformed data. These types of attacks are aimed at falsifying data so that the monitoring system believes nothing is wrong.

How?
• Consider applying front-end systems and behavioral methods to perform input validation, process the data, and determine right from wrong as much as possible.
• Authenticate sources of data and maintain logs, both to prevent unauthorized data injection and to establish accountability.
• Use the monitoring system to watch for strange behavior, like a spike in central processing unit (CPU) and memory load for prolonged periods of time, or disk space filling up quickly.
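Source authentication and input validation can be combined in a single acceptance check, sketched below with an HMAC over each payload. The source ID, the shared secret, the numeric payload format, and the 0 to 100 range are assumptions made up for this example; key management and logging are deliberately left out.

```python
import hashlib
import hmac

# Hypothetical shared secrets per data source. In practice these would
# come from a secrets store, never from source code.
SOURCE_KEYS = {"sensor-1": b"example-shared-secret"}


def sign(source_id: str, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 a legitimate source would attach."""
    return hmac.new(SOURCE_KEYS[source_id], payload, hashlib.sha256).hexdigest()


def accept_record(source_id: str, payload: bytes, signature: str) -> bool:
    """Accept a record only if its source is known, its signature is valid,
    and its payload is a number within the assumed range 0 to 100."""
    key = SOURCE_KEYS.get(source_id)
    if key is None:
        return False
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, signature):
        return False
    try:
        value = float(payload.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False
    return 0.0 <= value <= 100.0
```

Checking the signature before parsing means malformed data from an unauthenticated sender is never interpreted at all.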
