Amazon AWS S3 Integration

Integrate with AWS S3 to run automated content classification on your S3 buckets.

What this integration does:

Scans the objects in an S3 bucket to detect and map the types of data stored in it.

The integration can identify data types inside many different file and document types. For details, see Supported File Types below.

 

Before you start, make sure you have:

  • Access to a Mine PrivacyOps account with a plan that supports Data Mapping.
  • Access to your Amazon AWS account with permissions to manage IAM identities.

 

Setting up

On your AWS account:

  1. Go to IAM -> Users -> Add New Users
  2. Type "mine-privacy-ops" as the name, select "Access Key" and click Next
  3. Select "Attach existing policies directly" and type s3 in the search box
  4. Select AmazonS3ReadOnlyAccess and click Next
  5. On the tags page, leave it empty and click Next
  6. Click Create User
  7. Copy the Access Key ID and Secret Access Key from this page and paste them into Mine PrivacyOps. AWS shows the secret only at creation time, so copy it before leaving the page
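
For teams that prefer to script IAM changes, the console steps above could be approximated with boto3, the AWS SDK for Python. This is a minimal sketch and not part of the official setup: the function name create_scanner_user is hypothetical, and it assumes boto3 is installed and that the caller's own AWS credentials are allowed to manage IAM users. It attaches the same AmazonS3ReadOnlyAccess managed policy that is selected in step 4.

```python
# ARN of the AWS managed policy selected in step 4 of the console flow.
READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"


def create_scanner_user(iam, user_name="mine-privacy-ops"):
    """Create the IAM user, attach read-only S3 access, and return a new key pair.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    iam.create_user(UserName=user_name)
    iam.attach_user_policy(UserName=user_name, PolicyArn=READ_ONLY_POLICY_ARN)
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    # The secret is only returned at creation time; store it securely.
    return key["AccessKeyId"], key["SecretAccessKey"]


def main():
    import boto3  # assumption: boto3 is installed and AWS credentials are configured

    access_key_id, secret_access_key = create_scanner_user(boto3.client("iam"))
    print("Access Key ID:", access_key_id)
    print("Secret Access Key:", secret_access_key)
```

Calling main() would print the two values that step 7 asks you to paste into Mine PrivacyOps.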

 

On your Mine PrivacyOps account:

  1. Log in to the account at https://portal.saymine.com and click Data Inventory -> Data Sources -> Add Data Source
  2. Select Amazon AWS S3 from the list and click Add
  3. Click the system in the list to open the settings page and select "Integration"
  4. Paste the Access Key ID and Secret Access Key from the previous section
  5. Type the name of the bucket you would like to scan under Bucket Name
  6. Click Test your integration. If you get a success message, click Save
  7. To add more buckets, click the "+ Create Instance" link at the bottom and type in another bucket name. You can reuse the same Access Key ID and Secret Access Key
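
If "Test your integration" fails, it can help to check locally that the new key is able to read the bucket at all. The sketch below is an independent sanity check, not a reproduction of what Mine's test does. It assumes boto3 is installed; the names can_read_bucket and main are hypothetical.

```python
def can_read_bucket(s3, bucket_name):
    """Return True if the client can list the first object page of the bucket.

    `s3` is expected to be a boto3 S3 client created with the new access key.
    """
    try:
        # MaxKeys=1 keeps the request small; only success or failure matters.
        s3.list_objects_v2(Bucket=bucket_name, MaxKeys=1)
    except Exception as error:  # with boto3 this is typically a botocore ClientError
        print(f"Cannot read {bucket_name}: {error}")
        return False
    return True


def main(access_key_id, secret_access_key, bucket_name):
    import boto3  # assumption: boto3 is installed

    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    print("readable" if can_read_bucket(s3, bucket_name) else "not readable")
```

A result of "not readable" usually points to a mistyped bucket name, a mistyped key, or a policy that was not attached to the user.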

 

Supported File Types

Mine's content classification supports the following file types by extracting text from the files and performing classification:

  1. Apache Avro (.avro) - limits apply to maximum block size, file size, number of columns, etc.
  2. CSV and TSV (.csv, .tsv)
  3. PDF - File size limit: 30MB
  4. Textual files
  5. Microsoft Word - File size limit: 30MB
  6. Microsoft Excel - File size limit: 30MB
  7. Microsoft PowerPoint - File size limit: 30MB

Note: Encrypted buckets are supported.

File types not listed above are not supported, including:

  1. Archives
  2. Image files (with OCR) - not yet supported, but planned
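
To estimate in advance which objects fall within the list above, a rough pre-filter on object key and size might look like the following. It is only a sketch: the extension sets are assumptions chosen for illustration (the list above names file types, not extensions, for Word, Excel, PowerPoint and textual files), "30MB" is read here as 30 * 1024 * 1024 bytes, and the Avro limits are not checked because their values are not stated.

```python
import os

MB = 1024 * 1024  # assumption: "30MB" is interpreted as 30 * 1024 * 1024 bytes
SIZE_LIMIT_BYTES = 30 * MB

# Assumption: extension-to-type mapping chosen for illustration only.
SIZE_LIMITED = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}
NO_STATED_LIMIT = {".avro", ".csv", ".tsv", ".txt"}


def likely_scannable(key, size_bytes):
    """Guess whether an object falls within the documented file types and limits."""
    extension = os.path.splitext(key)[1].lower()
    if extension in SIZE_LIMITED:
        return size_bytes <= SIZE_LIMIT_BYTES
    # Avro has limits too, but their values are not stated, so none are checked here.
    return extension in NO_STATED_LIMIT
```

For example, likely_scannable("reports/q1.pdf", 31 * MB) returns False because the object exceeds the 30MB limit, while an archive such as "backup.zip" is rejected by extension regardless of size.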
 

Limitations

  1. "Requester Pays" buckets are not supported.
  2. Compressed objects (gzip) are not currently supported.
  3. Scanning multiple buckets is supported, but a very large number of buckets is not currently supported.
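
The first two limitations can be checked before connecting a bucket. The sketch below is illustrative only: find_limitation_issues is a hypothetical helper, it assumes a boto3 S3 client whose credentials may read the bucket's request-payment setting, and it detects gzip purely by key suffix, which will miss compressed objects that are named differently.

```python
def find_limitation_issues(s3, bucket_name, keys):
    """Return human-readable warnings for the documented limitations.

    `s3` is expected to be a boto3 S3 client; `keys` is an iterable of object keys.
    """
    issues = []
    # get_bucket_request_payment reports "Requester" or "BucketOwner" as the payer.
    payer = s3.get_bucket_request_payment(Bucket=bucket_name)["Payer"]
    if payer == "Requester":
        issues.append(f"{bucket_name} is a Requester Pays bucket")
    for key in keys:
        # Heuristic: treat a .gz or .gzip suffix as a sign of gzip compression.
        if key.lower().endswith((".gz", ".gzip")):
            issues.append(f"{key} looks gzip-compressed")
    return issues
```

An empty list means neither limitation was detected by these checks; it does not guarantee that every object in the bucket will be scanned.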