Compliance Portal#

Document Fingerprinting#

Supports partial and exact matching. Used for DLP (data loss prevention).

Document fingerprinting converts a standard form into a SIT (Sensitive Information Type). Examples include a document template, a customer feedback form or an invoice template. You then create a policy that watches for documents matching the form and applies actions or controls. You can also notify or warn employees that they may be about to send sensitive information.

The fingerprint only stores the document field hashes, not the data itself.
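To make the "hashes, not data" idea concrete, here is a minimal sketch. It assumes one SHA-256 hash per distinct word of the template; the actual hashing scheme the service uses isn't described here, and `fingerprint` is a made-up helper name.

```python
import hashlib
import re


def fingerprint(template_text: str) -> set[str]:
    """Return one SHA-256 hex digest per distinct word in the template.

    Assumption for illustration only: words are lowercased runs of
    letters and digits. The real service may tokenise differently.
    """
    words = re.findall(r"[a-z0-9]+", template_text.lower())
    return {hashlib.sha256(w.encode("utf-8")).hexdigest() for w in words}


template = "Customer Feedback Form. Name: Order number: Comments:"
fp = fingerprint(template)

# The stored fingerprint holds only digests, never the original words.
print(len(fp))
print("customer" in fp)
```

The point of the design is that anyone who gets hold of the fingerprint sees a set of digests, not the form text.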

Fingerprinting does not support .dotx (Word template) files, which is an odd one. Fingerprinting also doesn't work on:

  • Password protected files (they are encrypted)

  • Images only (there is no text)

  • Documents that don't include all the template text (in exact matching)

  • Files larger than 4 MB (that one is odd too)
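A quick pre-check along these lines can catch two of the limits above before you upload or test a file. It's only a sketch: `precheck` and its constants are made-up names, 4 MB is assumed to mean 4 × 1024 × 1024 bytes, and spotting password protection or image-only content needs format-specific inspection that isn't shown.

```python
from pathlib import Path

MAX_BYTES = 4 * 1024 * 1024  # assumed meaning of the 4 MB limit
UNSUPPORTED_SUFFIXES = {".dotx"}  # Word template files


def precheck(filename: str, size_bytes: int) -> list[str]:
    """Return the reasons (if any) a file falls outside the limits."""
    problems = []
    if Path(filename).suffix.lower() in UNSUPPORTED_SUFFIXES:
        problems.append("unsupported file type")
    if size_bytes > MAX_BYTES:
        problems.append("larger than 4 MB")
    return problems


print(precheck("invoice.docx", 200_000))
print(precheck("Invoice.DOTX", 5 * 1024 * 1024))
```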

Partial matching is also supported. The confidence levels are Low, Medium and High, with matching thresholds ranging from 30% to 90%.
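A rough way to picture partial matching: work out what share of the template's words turn up in a document, then map that percentage onto a confidence level. The sketch below assumes simple word overlap and uses hypothetical thresholds (30, 60, 90) picked from within the 30-90% range; it is not the service's actual algorithm.

```python
import re
from typing import Optional

# Hypothetical thresholds, highest first; real values are set per SIT.
THRESHOLDS = {"High": 90, "Medium": 60, "Low": 30}


def words(text: str) -> set[str]:
    """Lowercased runs of letters and digits, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_percent(template: str, document: str) -> float:
    """Percentage of distinct template words that appear in the document."""
    t = words(template)
    if not t:
        return 0.0
    return 100.0 * len(t & words(document)) / len(t)


def confidence(percent: float) -> Optional[str]:
    """Return the highest level whose threshold is met, or None."""
    for level, minimum in THRESHOLDS.items():
        if percent >= minimum:
            return level
    return None


template = "Invoice Number Date Customer Total Amount Due"
document = "Invoice number 1042, date 3 May, total due 99.50"
pct = match_percent(template, document)
print(round(pct, 1), confidence(pct))
```

In this picture, exact matching is just the case where the percentage has to be 100.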

Creating a fingerprint#

  1. Open the Compliance portal

  2. Data Classification -> Classifiers

  3. Select Sensitive info types -> Create Fingerprint based SIT

  4. Enter the details and provide a template file.

  5. Click Create

You can store about 50 fingerprints per tenant…


Named Entities#

A named entity is a dictionary or pattern match classifier stored as a SIT.

Some common examples of these are

  • Credit card numbers (typically a 16-digit string)

  • Sensitive words such as ‘top secret’, ‘internal’ or industry-specific jargon
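As an illustration of what a pattern match classifier does for something like card numbers, here is a minimal sketch: a regular expression finds candidate digit strings, and the Luhn checksum filters out random numbers. The 14 to 16 digit range and the helper names are assumptions for this example, not the built-in SIT definition.

```python
import re

# Assumed pattern: 14 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate strings that match the pattern and pass Luhn."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


print(find_card_numbers("Card 4111 1111 1111 1111 on file"))
print(find_card_numbers("Ref 4111 1111 1111 1112 on file"))
```

The checksum step is what keeps order numbers and other long digit strings from being flagged.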

Named entities are either bundled or unbundled. Unbundled entities are specific (such as Australian addresses), while bundled entities cover ‘all of a type’, such as addresses from any country.

Keyword Dictionaries#

Keyword dictionaries are large lists of words that need to be flagged and may change often. The maximum size is 1 MB after compression, and dictionaries can include language variants. A keyword dictionary is the easiest way to manage a large list.
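For a feel of how a keyword dictionary behaves, here is a small sketch that matches whole words case-insensitively and estimates the compressed size of a list. The use of zlib, the reading of 1 MB as 1024 × 1024 bytes, and the sample keywords are all assumptions; the service's own compression and matching rules may differ.

```python
import re
import zlib

MAX_COMPRESSED_BYTES = 1024 * 1024  # assumed meaning of the 1 MB limit


def compressed_size(keywords: list[str]) -> int:
    """Size in bytes of the newline-joined list after zlib compression."""
    payload = "\n".join(keywords).encode("utf-8")
    return len(zlib.compress(payload))


def find_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the dictionary entries that appear in text as whole words."""
    found = []
    for kw in keywords:
        if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE):
            found.append(kw)
    return found


keywords = ["top secret", "internal", "Project Falcon"]  # sample entries
print(find_keywords("This INTERNAL memo covers project falcon.", keywords))
print(compressed_size(keywords) <= MAX_COMPRESSED_BYTES)
```

Looping over every keyword is fine for a demo; a real dictionary with many thousands of terms would want a smarter lookup.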