Amazon AWS S3 Integration

Integrate with AWS S3 to perform automated content classification on your data buckets

About the Amazon AWS S3 Integration 

What it does: 

  • Performs content scanning on objects in an S3 Bucket to detect and map the types of data that are stored inside of it.
  • Supports scanning and identifying data types inside many different file and document types. For more information, see Supported File Types below.

Before setting up this integration:

  • Be sure to add Amazon S3 to your Inventory. To learn how to add systems to your Inventory, click here.
  • Make sure your MineOS plan supports automatic integrations.

How to set up

On the system side:

    1. Log into your AWS account
    2. Go to IAM -> Users -> Add New Users
    3. Type "mine-privacy-ops" as the name, select "Access Key" and click Next
    4. Select "Attach existing policies directly" and type s3 in the search box.
    5. Select AmazonS3ReadOnlyAccess and click Next
    6. Leave the tags page empty and click Next
    7. Click Create User
    8. Copy the Access Key ID and Secret Access Key from this page so you can paste them into MineOS
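
If you prefer to script the console steps above, the following is a minimal sketch of the same flow, assuming the boto3 library is installed and that you call it with credentials allowed to manage IAM. The function name `create_scan_user` is hypothetical and is not part of MineOS; the user name and policy are the ones named in the steps.

```python
# Sketch only: mirrors the console steps above; not an official MineOS script.
# Assumes the caller passes a boto3 IAM client with permission to manage users.
AMAZON_S3_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"


def create_scan_user(iam_client, user_name="mine-privacy-ops"):
    """Create the IAM user, attach read-only S3 access, and return its key pair."""
    # Steps 2-3: create the user.
    iam_client.create_user(UserName=user_name)
    # Steps 4-5: attach the AWS-managed AmazonS3ReadOnlyAccess policy.
    iam_client.attach_user_policy(
        UserName=user_name, PolicyArn=AMAZON_S3_READ_ONLY_ARN
    )
    # Step 8: create the access key whose ID and secret you paste into MineOS.
    key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    return key["AccessKeyId"], key["SecretAccessKey"]


# Example call (not run here, because it creates real AWS resources):
#   import boto3
#   access_key_id, secret_access_key = create_scan_user(boto3.client("iam"))
```

The secret access key is only returned once, at creation time, so store it securely before leaving the script or the console page.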

On your Privacy Portal:

    1. Head to your Data Inventory and select Amazon AWS S3
    2. Scroll down to the component titled “Request handling”
    3. Select “Handle this data source in privacy requests”
    4. Select “Integration” as the handling style (see image below).
    5. Paste the Access Key ID and Secret Access Key into the designated fields
    6. Under Bucket Name, type the name of the bucket you want to scan, or a regular expression matching the names of the buckets you want to scan. For example, `*` scans all the buckets the user account has access to.
    7. Click "Test your integration" so Mine can verify your API key(s). 
    8. If successful, click "Test & save" to enable the integration.

If you would like to add more buckets, click the "+ Create Instance" link at the bottom and type in another bucket name. You can reuse the same Access Key ID and Secret Access Key.
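
Before saving a pattern in the Bucket Name field, it can help to preview which buckets it would select. The sketch below is only an approximation: this article does not spell out the exact matching rules, and because the `*` example behaves like a wildcard, the code assumes shell-style wildcard matching through Python's `fnmatch` module. The function name and the bucket names are hypothetical.

```python
from fnmatch import fnmatchcase


def preview_matching_buckets(pattern, bucket_names):
    """Return the bucket names a pattern would select (assumed wildcard semantics)."""
    return [name for name in bucket_names if fnmatchcase(name, pattern)]


# Hypothetical bucket names, for illustration only.
buckets = ["prod-customer-data", "prod-logs", "staging-customer-data"]

print(preview_matching_buckets("*", buckets))          # every bucket in the list
print(preview_matching_buckets("prod-*", buckets))     # only the "prod-" buckets
print(preview_matching_buckets("prod-logs", buckets))  # a single exact name
```

If a pattern selects more buckets than you expected, narrow it or list the buckets individually with "+ Create Instance".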

Supported File Types

Mine's content classification extracts text from the following file types and classifies it:

    1. Apache Avro (.avro) - Limits apply to maximum block size, file size, number of columns, etc.
    2. Apache Parquet (.parquet)
    3. CSV and TSV (.csv, .tsv)
    4. PDF - File size limit: 30 MB
    5. Textual files
    6. Microsoft Word - File size limit: 30 MB
    7. Microsoft Excel - File size limit: 30 MB
    8. Microsoft PowerPoint - File size limit: 30 MB

Note: Encrypted buckets are supported.

File types not listed above are not supported, including:

    1. Archives
    2. Image files (with OCR) - Not yet supported, although support is planned.
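
To estimate ahead of time which objects in a bucket are likely to be scanned, you could apply the list above as a simple filter. This is a rough sketch, not Mine's actual logic: the extension mapping (for example `.docx`, `.xlsx`, `.pptx`, `.txt`) is an assumption, "30 MB" is read as 30 × 1024 × 1024 bytes, and the Avro-specific limits are not modelled because their values are not given here.

```python
from pathlib import PurePosixPath

# Assumption: the 30 MB limit is interpreted as 30 MiB.
SIZE_LIMIT_BYTES = 30 * 1024 * 1024

# Assumed mapping from extension to "does the 30 MB limit apply?".
SUPPORTED_EXTENSIONS = {
    ".avro": False,
    ".parquet": False,
    ".csv": False,
    ".tsv": False,
    ".txt": False,
    ".pdf": True,
    ".docx": True,
    ".xlsx": True,
    ".pptx": True,
}


def likely_scannable(object_key, size_bytes):
    """Guess whether an S3 object falls within the supported types and sizes."""
    extension = PurePosixPath(object_key).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return False  # unlisted types, such as archives and images
    if SUPPORTED_EXTENSIONS[extension] and size_bytes > SIZE_LIMIT_BYTES:
        return False  # PDF and Office files over the size limit
    return True


print(likely_scannable("exports/users.csv", 500_000_000))
print(likely_scannable("contracts/2023/msa.PDF", 45 * 1024 * 1024))
print(likely_scannable("backups/site.zip", 1_000))
```

In this sketch the large CSV passes because no size limit is listed for CSV files, while the oversized PDF and the archive are filtered out.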

Limitations

    1. "Requester Pays" buckets are not supported.
    2. Compressed objects (gzip) are not currently supported.
    3. The system supports scanning multiple buckets, but a very large number of buckets is not currently supported.
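
Because gzip-compressed objects are skipped, you may want to find them before relying on scan results. The following is a minimal sketch, assuming you can read an object's first bytes and, optionally, its Content-Encoding metadata. The function name is hypothetical; the check relies on the fact that gzip streams begin with the two bytes `1f 8b`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzip_compressed(first_bytes, content_encoding=None):
    """Return True if an object appears to be gzip-compressed."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return True
    return first_bytes[:2] == GZIP_MAGIC


# Illustration with in-memory data instead of real S3 objects.
compressed_sample = gzip.compress(b"name,email\n")
plain_sample = b"name,email\n"

print(looks_gzip_compressed(compressed_sample))  # True
print(looks_gzip_compressed(plain_sample))       # False
```

Objects flagged this way would need to be stored uncompressed to be included in a scan.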

     

If you need any help with integrations, talk to us via our chat or at portal@saymine.com, and we'll be happy to assist! 🙂