Data Masking

Data masking or data obfuscation is the process of modifying sensitive data in such a way that it is of no or little value to unauthorized intruders while still being usable by software or authorized personnel.

It is a way to create a fake but realistic version of an organization's data.

The aim is to protect sensitive data while providing a functional alternative when real data is not needed.

The data masking process changes the values of the data while keeping the same format.

The goal is to create a version that cannot be deciphered or reverse engineered.

Several ways to alter data

character shuffling
word or character substitution
encryption
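
As a rough illustration of the first two approaches, the sketch below shuffles the characters of a value and substitutes each letter or digit with a random one of the same kind, so the length and punctuation of the original are kept. The function names and the `seed` parameter are assumptions made for this example, and Python's `random` module is not cryptographically secure, so treat this as a teaching sketch rather than a production masking routine.

```python
import random
import string


def shuffle_characters(value, seed=0):
    """Character shuffling: return the same characters in a pseudo-random order."""
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def substitute_characters(value, seed=0):
    """Character substitution: swap each letter for a random letter and each
    digit for a random digit, leaving punctuation and length unchanged."""
    rng = random.Random(seed)
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            masked.append(ch)  # keep separators such as "-" or "@"
    return "".join(masked)


# Hypothetical sample values
print(shuffle_characters("Jonathan"))
print(substitute_characters("555-12-3456"))
```

Shuffling alone keeps every original character, so short values may still be guessable; substitution hides the characters themselves but keeps the format that downstream software expects.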

Why is Data Masking Important?

Here are several reasons data masking is essential for many organizations:

Data masking mitigates several critical threats: data loss, data exfiltration, insider threats or account compromise, and insecure interfaces with third-party systems.
Reduces data risks associated with cloud adoption.
Makes data useless to an attacker, while maintaining many of its inherent functional properties.
Allows sharing data with authorized users, such as testers and developers, without exposing production data.
Can be used for data sanitization – normal file deletion still leaves traces of data in storage media, while sanitization replaces the old values with masked ones.

Data Masking Types:

Static Data Masking
Deterministic data masking
On-the-fly data masking
Dynamic data masking
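
Deterministic masking means the same input always produces the same masked output. One possible way to sketch this in Python is to derive the replacement from a keyed hash (HMAC) of the real value. The key, the list of replacement surnames, and the function name below are all hypothetical and chosen only for this example.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored outside the code.
SECRET_KEY = b"replace-with-a-secret-key"

# Hypothetical pool of replacement values.
FAKE_SURNAMES = ["Jones", "Brown", "Taylor", "Wilson", "Clark", "Lewis"]


def mask_surname(real_value, key=SECRET_KEY):
    """Map a real surname to a fake one; the same input and key
    always select the same replacement."""
    digest = hmac.new(key, real_value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_SURNAMES)
    return FAKE_SURNAMES[index]


# The two calls print the same replacement surname.
print(mask_surname("Smith"))
print(mask_surname("Smith"))
```

With a pool this small, different real surnames will often share a replacement. A larger pool, or a format-preserving scheme, would be needed where masked values must stay distinct.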

Data Masking Techniques:

Data Encryption
Data Scrambling
Nulling Out
Value Variance
Data Substitution
Data Shuffling
Pseudonymization
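
Three of the techniques listed above (nulling out, value variance, and data shuffling) are simple enough to sketch in a few lines of Python. The record layout, field names, and helper functions are assumptions made for illustration; real tools apply the same ideas to database columns.

```python
import random


def null_out(records, field):
    """Nulling out: replace the field with None in every record."""
    return [{**record, field: None} for record in records]


def vary_value(value, max_fraction, rng):
    """Value variance: move a number by up to +/- max_fraction of itself."""
    factor = 1 + rng.uniform(-max_fraction, max_fraction)
    return round(value * factor, 2)


def shuffle_column(records, field, rng):
    """Data shuffling: redistribute the existing values of one field across records."""
    values = [record[field] for record in records]
    rng.shuffle(values)
    return [{**record, field: value} for record, value in zip(records, values)]


# Hypothetical sample data
employees = [
    {"name": "Ana", "ssn": "111-22-3333", "salary": 52000.0},
    {"name": "Ben", "ssn": "222-33-4444", "salary": 61000.0},
    {"name": "Cho", "ssn": "333-44-5555", "salary": 47500.0},
]

rng = random.Random(42)
without_ssn = null_out(employees, "ssn")
varied = [{**e, "salary": vary_value(e["salary"], 0.10, rng)} for e in employees]
shuffled = shuffle_column(employees, "name", rng)
```

Each helper returns new records and leaves the input untouched, which makes it easy to compare the masked copy against the original while testing.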

Data Masking Best Practices:

Determine Project Scope
Ensure Referential Integrity
Secure the Data Masking Algorithms
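
Referential integrity means that a key masked in one table must be masked to the same value wherever else it appears, so joins still work. A minimal sketch, assuming two hypothetical in-memory tables and a made-up `CUST-0001` pseudonym format, could build one mapping and apply it to both tables:

```python
import random


def build_pseudonym_map(real_ids, seed=None):
    """Give each distinct real ID exactly one pseudonym, assigned in shuffled order."""
    unique_ids = sorted(set(real_ids))
    random.Random(seed).shuffle(unique_ids)
    return {real: f"CUST-{n:04d}" for n, real in enumerate(unique_ids, start=1)}


def mask_column(rows, column, mapping):
    """Return copies of the rows with one column replaced via the mapping."""
    return [{**row, column: mapping[row[column]]} for row in rows]


# Hypothetical tables that share the customer_id key
customers = [
    {"customer_id": "A17", "city": "Leeds"},
    {"customer_id": "B42", "city": "Porto"},
]
orders = [
    {"order_id": 1, "customer_id": "B42"},
    {"order_id": 2, "customer_id": "A17"},
    {"order_id": 3, "customer_id": "B42"},
]

mapping = build_pseudonym_map(c["customer_id"] for c in customers)
masked_customers = mask_column(customers, "customer_id", mapping)
masked_orders = mask_column(orders, "customer_id", mapping)
```

The mapping itself reveals the real IDs, so it needs the same protection as the masking algorithm: it should be stored securely or discarded once masking is complete.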

Data Masking Tools:

DATPROF

Features:

Consistent over multiple applications and databases.
XML and CSV file support.
Built-in synthetic data generators.
HTML audit / GDPR reporting.
Test data automation with REST API.
Web Portal for easy provisioning.

Pros:

High performance on large data sets.
Free trial version available.
Easy to install and use.
Native support for all major relational databases.

Cons:

English documentation only.
Development of templates requires Windows.
Execution of templates can be done on Windows or Linux.

IRI FieldShield

Features:

Multi-source data profiling, discovery (search), and classification.
Broad array of masking functions (including FPE) to de-identify and anonymize PII.
Assures referential integrity across schema and multi-DB/file scenarios.
Built-in re-ID risk scoring and audit trails for GDPR, HIPAA, PCI DSS, etc.

Pros:

High performance, without the need for a central server.
Simple metadata and multiple graphical job design options.
Works with DB subsetting, synthesis, reorg, migration, and ETL jobs in Voracity, plus leading DB cloning, encryption key management, TDM portals, and SIEM environments.
Fast support, and affordable (especially relative to IBM, Oracle, Informatica).

Cons:

1NF structured data support only; DarkShield needed for BLOBs, etc.
Free IRI Workbench IDE is a thick client Eclipse UI (not web-based).
DDM requires FieldShield API call, or premium proxy server option.

Accutive Data Discovery & Masking

Features:

Data Discovery – Enables the efficient identification of sensitive data that need to meet regulatory compliance standards such as GDPR, PCI-DSS, HIPAA, GLBA, OSFI/PIPEDA, and FERPA.
Mask Link Technology – Ability to consistently and repeatedly mask source data to the same value (e.g. Smith will always be masked to Jones) across multiple databases.
Multiple Data Sources and Destinations – Data can be moved from any major source type to any major destination type, such as Oracle, DB2, MySQL, and SQL Server (e.g. data can be moved from a flat file to an Oracle database).
API support – Include data masking in your data processing pipeline.

Pros:

User-friendly, configurable interface.
Cost-effective solution with transparent pricing models.
Performs masking configurations rapidly with a built-in progress display.

Cons:

Groovy scripting to customize application behavior requires some knowledge of programming.
Not currently available in languages other than English, French, Spanish, and German.

IRI DarkShield

Features:

Built-in data classification and simultaneous ability to search, mask, and report.
Multiple search methods and masking functions, including fuzzy match and NER.
Deletion function for GDPR (and similar) right to be forgotten laws.
Integrates with SIEM/DOC environments and multiple logging conventions for audit.

Pros:

High speed, multi-source, no need to mask in cloud or compromise control of data.
Consistent ciphertext assures referential integrity in structured and unstructured data.
Shares data classes, masking functions, engine, and job design GUI with FieldShield.
Proven worldwide, but still affordable (or free with FieldShield in Voracity subscriptions).

Cons:

Standalone and embedded image masking is limited by OCR accuracy and may need tweaking.
API requires custom ‘glue code’ for cloud, DB, and big data sources.
Pricing options may seem complex in mixed data source and use case scenarios.

IRI CellShield EE

Features:

Wide range of ergonomic PII searching and masking methods.
Supports formulas and multi-byte character sets.
Leverages data classes, top masking functions, and search parameters of DarkShield GUI.
Excel charts intelligently display discovered and masked data across multiple sheets.

Pros:

High-performance masking of very large and/or multiple sheets at once.
Consistent ciphertext assures referential integrity in sheets and other data sources.
Search and mask audit column results, plus log exports to email, Splunk, and Datadog.
Documented in-app and online. Easily upgradable from low-cost Personal Edition.

Cons:

Only compatible with MS Excel 2007 or higher (not other sheet apps).
SharePoint and macro support are still in development.
Free trial is only for Enterprise Edition (EE), not low-cost Personal Edition (PE).

Oracle Data Masking and Subsetting

Features:

Automatically discovers sensitive data and its relationships.
Extensive masking format library and enhanced application models.
Comprehensive masking transformations.
Fast, secure, and heterogeneous.

Pros:

Offers various options for masking data.
Supports non-Oracle databases as well.
Takes less time to run.

Cons:

High-cost.
Less secure for development and testing environments.

Delphix

Features:

End-to-end data masking, with reporting on the masking performed.
Masking combined with data virtualization to speed up data delivery.
Easy to use; no training is required to mask data.
Moves data reliably across sites, on-premises, or in the cloud.

Pros:

Easy and timely recovery of records.
Virtualization of databases.
Data refreshing is fast.

Cons:

High cost.
SQL Server databases are slow and limited.
Relies on older NFS protocols.

Informatica Persistent Data Masking

Features:

Supports robust data masking.
Creates and integrates the masking process from a single location.
Features to handle a large volume of databases.
It has wide connectivity and customized Application Support.

Pros:

Reduces the risk of a data breach via a single audit trail.
Improves the quality of development, testing, and training activities.
Easy deployment on workstations.

Cons:

The UI needs more work.

Microsoft SQL Server Data Masking

Features:

Simplifies application design and coding by securing data at the database level.
It doesn’t change or transform the stored data in the database.
It lets the data administrator choose how much sensitive data to expose, with minimal impact on the application.

Pros:

End users are prevented from viewing sensitive data.
Creating a mask on a column doesn’t prevent updates to that column.
Applications do not need to be changed in order to read data.

Cons:

Data is fully accessible when tables are queried by a privileged user.
Masked values can be exposed by running ad-hoc queries that use the CAST command.
Masking cannot be applied to encrypted, FILESTREAM, or COLUMN_SET columns.
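
The behaviour described for this tool, where stored values stay unchanged and a mask is applied to query results for non-privileged users, is the essence of dynamic data masking. The sketch below imitates that idea in plain Python; it is not SQL Server's syntax or API, and the rule table, function names, and sample row are hypothetical.

```python
def partial_mask(value, keep_prefix=0, keep_suffix=0, pad="X"):
    """Hide the middle of a string, keeping a few leading or trailing characters."""
    hidden = len(value) - keep_prefix - keep_suffix
    if hidden <= 0:
        return pad * len(value)
    return value[:keep_prefix] + pad * hidden + value[len(value) - keep_suffix:]


# Hypothetical per-column masking rules
MASKING_RULES = {
    "email": lambda v: partial_mask(v, keep_prefix=1),
    "card_number": lambda v: partial_mask(v, keep_suffix=4),
}


def read_rows(rows, can_unmask):
    """Return copies of the rows; masks are applied only for callers
    without the unmask privilege. The stored rows are never modified."""
    if can_unmask:
        return [dict(row) for row in rows]
    return [
        {col: MASKING_RULES[col](val) if col in MASKING_RULES else val
         for col, val in row.items()}
        for row in rows
    ]


# Hypothetical stored data
stored = [{"id": 1, "email": "ana@example.com", "card_number": "4111111111111111"}]

print(read_rows(stored, can_unmask=False))
print(read_rows(stored, can_unmask=True))
```

Because masking happens only at read time, a privileged caller still sees the real values, which matches the first limitation listed above.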

IBM InfoSphere Optim Data Privacy

Features:

Masks private data on request.
Decreases risk by locking data.
Speeds up the application of data privacy.
Provides a secure environment for application testing.

Pros:

Easily abstracts data with no coding.
Advanced data masking features.
Smart filtering abilities.

Cons:

The UI needs work.
Complex architecture.

CA Test Data Manager

Features:

Creates synthetic test data for testing.
Creates future test scenarios and unexpected outcomes.
Stores data for reuse.
Creates virtual copies of test data.

Pros:

Different filters and templates are present to mask data.
No additional permission is required to access the production data.
Very fast tools to mask data.

Cons:

Works only on Windows.
Complex User Interface.
Automating everything is not easy.

Compuware Test Data Privacy

Features:

Reduces complexity through codeless masking.
Completes data normalization into and out of the masking process.
Dynamic privacy rules for sensitive test data elements such as account numbers and card numbers.
Allows discovering and masking data within a larger field.

Pros:

Easy to use and is fast.
Secures test data against breaches.
Applies privacy rules to test data so that it is more secure.

Cons:

Complex user interface.

NextLabs Data Masking

Features:

Helps in classifying and sorting data.
Monitors data movement and its usage.
Prevents access to specific data.
Notifications on risky actions and irregularities.

Pros:

Can be installed easily on each workstation.
Helps prevent data breaches.
Good data safety across CAD, PLM, and email.

Cons:

Software compatibility problems with PLM software.
Implementation can be difficult at times for suppliers and vendors.

Hush-Hush

Features:

Quick and easy installation.
Flexible and robust; workflows take little time to create.
Easy and robust integration with SQL Server, BizTalk, etc.
Custom SSIS agenda to mask data.

Pros:

Speeds up development.
No learning curve.
Create data by just the “INSERT” command.

Cons:

In startups the growth is fast but the progress slows down in developed industries.
Limited control of data.

Apache Atlas

https://atlas.apache.org/#/Architecture
