Amazon S3

Step-by-step guide to ingesting your data from an Amazon S3 bucket into RudderStack.

Amazon S3 (Simple Storage Service) is a cloud-based object storage service that lets businesses securely store their data at scale.

RudderStack supports S3 as a source from which you can ingest data and route it to your desired downstream destinations.

Setting up the S3 source in RudderStack

To set up Amazon S3 as a source in RudderStack, follow these steps:

Naming the source

Log into your RudderStack dashboard.
From the left panel, go to Source > New Source > Reverse ETL. Then, select S3, as shown:

Select S3 source in RudderStack

Assign a name to your source.

Configuring the connection credentials

Enter the relevant settings in the Connection Credentials section, as shown below:
- Account Name - Enter the name you wish to assign to this connection account.
- AWS Access Key ID - Enter your AWS access key ID.
- AWS Secret Access Key - Enter your AWS secret access key.

To get the AWS Access Key ID and the AWS Secret Access Key, sign in to your AWS Management Console as the root user. Then, in the navigation bar in the upper-right corner, choose your account name and select My Security Credentials.

For more information on getting these AWS credentials, refer to the AWS documentation.

If you've already configured S3 as a source before, your existing credentials will automatically appear under Use Existing Credentials.

At a minimum, the following S3 actions must be permitted for the above access keys:

"Action": [
    "s3:GetObject",
    "s3:ListBucket"
],
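
As a rough sketch of how these two actions might fit into a complete IAM policy document, the Python snippet below builds one for a single bucket. The bucket name my-rudderstack-bucket and the helper name build_minimal_policy are hypothetical and used only for illustration. The resource ARNs follow the standard IAM convention in which s3:ListBucket applies to the bucket itself and s3:GetObject applies to the objects inside it.

```python
import json


def build_minimal_policy(bucket_name: str) -> dict:
    """Return an IAM policy document granting read-only access to one bucket.

    Hypothetical helper for illustration; adapt the resources to your setup.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading is an object-level action, so it targets the objects.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-rudderstack-bucket" is a placeholder bucket name.
    print(json.dumps(build_minimal_policy("my-rudderstack-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the user or role whose access keys you enter in RudderStack.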

Schedule settings

Specify the Schedule Settings to schedule the data syncs from your S3 source.

RudderStack lets you schedule data syncs for your Reverse ETL sources and specify how and when the syncs will run. For more information on the Basic, CRON, and Manual schedule types, refer to the Sync Schedule Settings guide.

After specifying the schedule type and run settings, click on Continue to finish the setup.

S3 is now successfully configured as a source in your RudderStack dashboard. You can connect this source to your preferred destination by clicking the Add Destination button, as shown:

Add destination in RudderStack

If you have already configured a destination in RudderStack, choose the Use Existing Destination option, which takes you to the Schema tab in the source settings. To add a new destination from scratch, select the Create New Destination option, which takes you to the destination configuration page.

Specifying the data to import

This section lists the bucket configuration settings needed for RudderStack to import the data and sync it to the connected destination.

S3 Bucket Name: Enter the name of the S3 bucket.
Prefix: Prefix refers to the path within your S3 bucket from where RudderStack will import the data. For example, if Prefix is set to RUDDER, then RudderStack will import the data stored in the location <your_s3_bucket>/RUDDER.

Bucket configuration settings
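
To make the effect of the Prefix setting concrete, here is a minimal sketch that filters a list of object keys the way a folder-like prefix would. It assumes the prefix is treated as a path segment (so RUDDER matches RUDDER/file.parquet but not RUDDER2/file.parquet); how RudderStack matches keys internally is not stated in this guide. The function name keys_under_prefix and the sample keys are hypothetical.

```python
def keys_under_prefix(keys, prefix):
    """Return the object keys located under a folder-like prefix.

    Assumption: the prefix denotes a path inside the bucket, so a trailing
    slash is implied. An empty prefix selects every key.
    """
    normalized = prefix.strip("/")
    if not normalized:
        return list(keys)
    return [key for key in keys if key.startswith(normalized + "/")]


if __name__ == "__main__":
    # Hypothetical object keys in <your_s3_bucket>.
    sample_keys = [
        "RUDDER/users.parquet",
        "RUDDER2/users.parquet",
        "archive/old.parquet",
    ]
    print(keys_under_prefix(sample_keys, "RUDDER"))
```

Running a check like this against a listing of your bucket can help confirm which files a given prefix would select before you save the source configuration.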

Your S3 bucket (with the prefix, if specified above) should contain Apache Parquet files only. Currently, RudderStack can extract only Parquet files.
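
If you want to sanity-check files before uploading them, one lightweight option is to look for the Parquet magic bytes. The sketch below relies on the Parquet file layout, in which an unencrypted file begins and ends with the 4-byte marker PAR1. It does not validate the schema or contents, and the function names are hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Header magic (4 bytes) + footer length (4 bytes) + footer magic (4 bytes).
MIN_PARQUET_SIZE = 12


def looks_like_parquet(data: bytes) -> bool:
    """Return True if the byte string starts and ends with the Parquet magic."""
    return (
        len(data) >= MIN_PARQUET_SIZE
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )


def file_looks_like_parquet(path: str) -> bool:
    """Same check for a local file, reading only its first and last 4 bytes."""
    if os.path.getsize(path) < MIN_PARQUET_SIZE:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This is only a quick filter for catching stray CSV or JSON files; a file that passes can still be corrupt or have an unexpected schema.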

Choose user identifier: Choose at least one user identifier, user_id or anonymous_id, from the dropdown.
Once you specify the above settings, you will be able to preview a snippet of your data, as shown below:

Data snippet preview

Here, you can select all or only specific columns of your choice, search the columns by a keyword, and also edit the JSON Trait Key. You can also preview the resulting JSON on the right.
Add Constant: You can use this option to add a constant key-value pair which is always sent in the JSON payload, as shown:

Add constant option in RudderStack dashboard

For more information on this option, refer to the Add Constant section.
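
The column selection, JSON Trait Key, and Add Constant settings described above can be pictured with a small sketch. It is illustrative only: the flat record it produces, the function name build_record, and the sample column names are all assumptions, and the actual payload that RudderStack sends may be structured differently.

```python
ALLOWED_IDENTIFIERS = ("user_id", "anonymous_id")


def build_record(row, identifier, trait_keys, constants):
    """Combine one source row into a JSON-like record (hypothetical shape).

    row:        column name -> value for a single row of the Parquet file
    identifier: the chosen user identifier column
    trait_keys: selected column name -> JSON trait key to emit it under
    constants:  key-value pairs added to every record
    """
    if identifier not in ALLOWED_IDENTIFIERS:
        raise ValueError(f"identifier must be one of {ALLOWED_IDENTIFIERS}")

    record = {identifier: row[identifier]}
    # Only the selected columns are included, renamed to their trait keys.
    for column, trait_key in trait_keys.items():
        record[trait_key] = row[column]
    # In this sketch, constants are applied last and win on a key clash.
    record.update(constants)
    return record


if __name__ == "__main__":
    sample_row = {"user_id": "u1", "email_address": "a@example.com", "plan": "pro"}
    print(build_record(sample_row, "user_id", {"email_address": "email"}, {"source": "s3"}))
```

In this example the plan column is left out because it was not selected, email_address is emitted under the trait key email, and the constant source is attached to every record.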

As an alternative to JSON mapping, you can map the columns using the Visual Data Mapper feature. However, note that this feature is currently supported only for select destinations.

Updating an existing configuration

To update an existing configuration, follow these steps:

Go to the Schema tab of your configured source.
Click on the Update button on the top right.
Then, update your column selection.

When updating the configuration, you can only change the existing mappings. The S3 Bucket Name, Prefix, and User Identifier fields are not editable.

Finally, click on the Save button.

After updating the configuration, the next sync will be a full sync.

FAQs

Where can I obtain the AWS Access Key ID and the AWS Secret Access Key?

To get the AWS Access Key ID and the Secret Access Key, follow these steps:

Sign in to your AWS Management Console as the root user.
Then, in the navigation bar in the upper-right corner, choose your account name and select My Security Credentials.

For more information on getting these AWS credentials, refer to the AWS documentation.

Contact us

For queries on any of the sections covered in this guide, you can contact us or start a conversation in our Slack community.


Last updated 3 years ago
