LogoLogo
  • Contributing to RudderStack
  • Destination_Name
  • LICENSE
  • RudderStack Docs
  • docs
    • FAQ
    • Identity Resolution
    • Home
    • cloud-extract-sources
      • ActiveCampaign Source
      • Bing Ads
      • Chargebee
      • Common Settings
      • Facebook Ads
      • Freshdesk
      • Google Ads Source
      • Google Analytics
      • Google Search Console
      • Google Sheets
      • Cloud Extract Sources
      • Intercom v2
      • Intercom
      • Mailchimp
      • Marketo
      • Mixpanel
      • NetSuite
      • Pipedrive
      • QuickBooks
      • Salesforce Pardot
      • Sendgrid Source
      • Stripe Source
      • Xero
      • Zendesk Chat
      • Zendesk
      • hubspot
        • HubSpot Data Model and Schema Information
        • HubSpot
      • salesforce
        • Salesforce
        • Schema Comparison: RudderStack vs. Segment
    • connections
      • Connection Modes: Cloud Mode vs. Device Mode
    • data-governance
      • Data Governance
      • RudderTyper
      • Data Governance API
      • RudderTyper
      • tracking-plans
        • Tracking Plans
        • Tracking Plan Spreadsheet
    • data-warehouse-integrations
      • Amazon Redshift
      • Azure Data Lake
      • Azure Synapse
      • ClickHouse
      • Databricks Delta Lake
      • Google Cloud Storage Data Lake
      • Google BigQuery
      • Identity Resolution
      • Warehouse Destinations
      • Microsoft SQL Server
      • PostgreSQL
      • Amazon S3 Data Lake
      • Snowflake
      • FAQ
      • Warehouse Schema
    • destinations
      • Destinations
      • Webhooks
      • advertising
        • Bing Ads
        • Criteo
        • DCM Floodlight
        • Facebook App Events
        • Facebook Custom Audience
        • Facebook Pixel
        • Google Ads (gtag.js)
        • Google AdWords Enhanced Conversions
        • Google Adwords Remarketing Lists (Customer Match)
        • Advertising
        • LinkedIn Insight Tag
        • Lotame
        • Pinterest Tag
        • Reddit Pixel
        • Snap Pixel
        • TikTok Ads
      • analytics
        • Amplitude
        • AWS Personalize
        • Chartbeat
        • Firebase
        • FullStory
        • Google Analytics 360
        • Google Analytics
        • Heap.io
        • Hotjar
        • Analytics
        • Indicative
        • Keen
        • Kissmetrics
        • Kubit
        • Lytics
        • Mixpanel
        • Pendo
        • PostHog
        • Quantum Metric
        • Singular
        • adobe-analytics
          • Adobe Analytics Heartbeat Measurement
          • Mobile Device Mode Settings
          • Web Device Mode Settings
          • E-commerce Events
          • Adobe Analytics
          • Setting Up Adobe Analytics in RudderStack
        • google-analytics-4
          • Cloud Mode
          • Device Mode
          • Google Analytics 4
          • Setting up Google Analytics 4
        • profitwell
          • ProfitWell
          • Cloud Mode
          • Device Mode
      • attribution
        • Adjust
        • AppsFlyer
        • Branch
        • Attribution
        • Kochava
        • TVSquared
      • business-messaging
        • Business Messaging
        • Intercom
        • Kustomer
        • Slack
        • Trengo
      • continuous-integration
        • Visual Studio App Center
        • Continuous Integration
      • crm
        • Delighted
        • HubSpot
        • CRM
        • Salesforce
        • Variance
        • Zendesk
      • customer-data-platform
        • Customer Data Platform
        • Segment
      • error-reporting
        • Bugsnag
        • Error Reporting
        • Sentry
      • marketing
        • ActiveCampaign
        • AdRoll
        • Airship
        • Appcues
        • Autopilot
        • Blueshift
        • Braze
        • CleverTap
        • Customer.io
        • Gainsight PX
        • Gainsight
        • Marketing
        • Iterable
        • Klaviyo
        • Leanplum
        • Mailchimp
        • Marketo Lead Import
        • Marketo
        • MoEngage
        • Ometria
        • Pardot
        • Post Affiliate Pro
        • Qualtrics
        • SendGrid
        • Salesforce Marketing Cloud
        • Userlist
        • drip
          • Cloud Mode
          • Device Mode
          • Drip
          • Setting Up Drip in RudderStack
      • productivity
        • Google Sheets
        • Productivity
      • storage-platforms
        • Amazon S3
        • DigitalOcean Spaces
        • Google Cloud Storage
        • Storage Platforms
        • Azure Blob Storage
        • MinIO
        • Redis
      • streaming-platforms
        • Amazon EventBridge
        • Amazon Kinesis Firehose
        • Amazon Kinesis
        • Azure Event Hubs
        • BigQuery Stream
        • Confluent Cloud
        • Google Pub/Sub
        • Streaming Platforms
        • Apache Kafka
      • tag-managers
        • Google Tag Manager
        • Tag Managers
      • testing-and-personalization
        • Algolia Insights
        • Candu
        • Google Optimize
        • A/B Testing & Personalization
        • LaunchDarkly
        • Monetate
        • Optimizely Full Stack
        • Optimizely Web
        • Split.io
        • Statsig
        • VWO (Visual Website Optimizer)
    • get-started
      • RudderStack Cloud vs. RudderStack Open Source
      • Glossary
      • Get Started
      • RudderStack Architecture
    • reverse-etl
      • Amazon Redshift
      • Amazon S3
      • ClickHouse
      • FAQ
      • Google BigQuery
      • Reverse ETL
      • PostgreSQL
      • Snowflake
      • common-settings
        • Importing Data using Models
        • Importing Data using Tables
        • Common Settings
        • Sync Modes
        • Sync Schedule
      • features
        • Airflow Provider
        • Features
        • Models
        • Visual Data Mapper
    • rudderstack-api
      • Data Regulation API
      • HTTP API
      • RudderStack API
      • Personal Access Tokens
      • Pixel API
      • Test API
      • api-specification
        • Application Lifecycle Events Specification
        • API Specification
        • Video Events Specification
        • rudderstack-ecommerce-events-specification
          • Browsing
          • Coupons
          • E-Commerce Events Specification
          • Ordering
          • Promotions
          • Reviewing
          • Sharing
          • Wishlist
        • rudderstack-spec
          • Alias
          • Common Fields
          • Group
          • Identify
          • RudderStack Event Specification
          • Page
          • Screen
          • Track
    • rudderstack-cloud
      • Audit Logs
      • Dashboard Overview
      • Destinations
      • RudderStack Cloud
      • Live Events
      • Connection Modes: Cloud Mode vs. Device Mode
      • Sources
      • Teammates (User Management)
      • connections
        • Adding a Destination
        • Connections
    • rudderstack-open-source
      • Control Plane Setup
      • RudderStack Open Source
      • installing-and-setting-up-rudderstack
        • Developer Machine Setup
        • Docker
        • Data Plane Setup
        • Kubernetes
        • Sending Test Events
    • stream-sources
      • App Center
      • AppsFlyer
      • Auth0
      • Braze
      • Customer.io
      • Extole
      • Event Stream Sources
      • Iterable
      • Looker
      • PostHog
      • Segment
      • Shopify
      • Webhook Source
      • rudderstack-sdk-integration-guides
        • Client-side Event Filtering
        • SDKs
        • AMP Analytics
        • Cordova
        • .NET
        • Go
        • Java
        • Node.js
        • PHP
        • Python
        • React Native
        • Ruby
        • Rust
        • Unity
        • SDK FAQs
        • rudderstack-android-sdk
          • Adding Application Class
          • Flushing Events Periodically
          • Android
        • rudderstack-flutter-sdk
          • Flutter SDK v1
          • Flutter v2
          • Flutter
        • rudderstack-ios-sdk
          • iOS
          • tvOS
          • watchOS
        • rudderstack-javascript-sdk
          • Data Storage in Cookies
          • Detecting Ad-blocked Pages
          • JavaScript
          • JavaScript SDK Enhancements
          • JavaScript SDK FAQs
          • Querystring API
          • Quick Start Guide
          • Version Migration Guide
          • consent-managers
            • Consent Managers
            • OneTrust
    • transformations
      • Access Token
      • FAQ
      • Transformations
      • Transformations API
    • user-guides
      • User Guides
      • administrators-guide
        • Troubleshooting Guide
        • Alerting Guide
        • Bucket Configuration Settings for Event Backups
        • Configuration Parameters
        • Event Replay
        • High Availability
        • Horizontal Scaling
        • Administrator's Guides
        • Infrastructure Provisioning
        • Monitoring and Metrics
        • Okta SSO Setup
        • OneLogin SSO Setup
        • RudderStack Grafana Dashboard
        • Software Releases
      • how-to-guides
        • How to Use Custom Domains
        • How to Develop Integrations for RudderStack
        • How to Configure a Destination via the Event Payload
        • How to Filter Events using Different Methods
        • How to Filter Selective Destinations
        • How to Submit a Pull Request for a New Integration
        • How-to Guides
        • How to Debug Live Destination Events
        • How to Use AWS Lambda Functions with RudderStack
        • create-a-new-destination-transformer-for-rudder
          • Best Practices for Coding Transformation Functions in JavaScript
          • How to Create a New Destination Transformation for RudderStack
        • implement-native-js-sdk-integration
          • How to Add a Device Mode SDK to RudderStack JavaScript SDK
          • How to Implement a Native JavaScript SDK Integration
        • rudderstack-jamstack-integration
          • How to Integrate RudderStack with Your JAMstack Site
          • How to Integrate Rudderstack with Your Angular App
          • How to Integrate Rudderstack with Your Astro Site
          • How to Integrate Rudderstack with Your Eleventy Site
          • How to Integrate Rudderstack with Your Ember.js App
          • How to Integrate Rudderstack with a Gatsby Website
          • How to Integrate Rudderstack with a Hugo Site
          • How to Integrate Rudderstack with Your Jekyll Site
          • How to Integrate Rudderstack with Your Next.js App
          • How to Integrate Rudderstack with Your Nuxt.js App
          • How to Integrate Rudderstack with Your Svelte App
          • How to Integrate Rudderstack with Your Vue App
      • migration-guides
        • Migrating from Blendo to RudderStack
        • Migrating Your Warehouse Destination from Segment to RudderStack
        • Migration Guides
        • Migrating from Segment to RudderStack
  • src
    • @rocketseat
      • gatsby-theme-docs
        • text
          • Home
Powered by GitBook
On this page
  • Load Test Results
  • Gateway
  • Transformer
  • Batch Router - S3 Destination
  • Database Requirements
  • Estimating Storage
  • Estimating Connections
  • RAM Requirements

Was this helpful?

  1. docs
  2. user-guides
  3. administrators-guide

Infrastructure Provisioning

Detailed overview of the various performance metrics captured by running RudderStack on various AWS configurations, to help you implement effective infrastructure provisioning

PreviousAdministrator's GuidesNextMonitoring and Metrics

Last updated 3 years ago

Was this helpful?

This is a brief overview of the stats that we captured by running Backend on different AWS machine configurations that we hope gives a rough idea for users in making the decisions to provision infrastructure to host Rudder. All numbers capture below are by using metrics described in .

Load Test Results

All tests are done using db.m4.xlarge Amazon RDS instance for hosting the postgres database

Gateway

Machine

Load

Response Time (ms)

gateway.response_time

Throughput

gateway.write_key_requests

Dangling Tables

m4.2xLarge 8Core 32GB

2.5K/s

--

2.5K/s

No

m4.2xLarge 8Core 32GB

5K/s

--

3K/s

Yes

m4.2xLarge 8Core 32GB

3K/s

--

2.7K/s

No

m5.xLarge

4Core 16GB

2.5K/s

3

1.9K/s

No

m5.large

2Core 8GB

2.5K/s

4.2

1.7K/s

No

Backend migrates and drops tables that have a threshold of jobs processed. Gateway tables are backed up to object storage (S3, MinIO etc.) if configured by user. Dangling tables indicate tables are ready for drop at a rate greater than the rate at which tables are backed up to object storage. Concurrent uploads to object storage is in the roadmap for upcoming versions of Backend.

Transformer

Machine

Gateway Throughput

gateway.write_key_requests

Throughput

processor.transformer_received

m4.2xLarge 8Core 32GB

2.7k/s

2.7K/s

m5.xLarge

4Core 16GB

1.9K/s

1.9K/s

m5.large

2Core 8GB

1.7K/s

1.6K/s

Transformer is a NodeJS/Koa server launced as cluster of node processes, processses count equal to the number of cores of the machine. Choosing an instance with lower number of core the number of cores in instance processor might reduce the throughput of transformer.

Batch Router - S3 Destination

Machine

Gateway Throughput

gateway.write_key_requests

Throughput

batch_router.dest_successful_events

m4.2xLarge 8Core 32GB

2.7K/s

2.8K/s

m5.xLarge

4Core 16GB

1.9K/s

1.8K/s

m5.large

2Core 8GB

1.7K/s

1.6K/s

Below is an image captured in CloudWatch Metrics showing the captured stats

Database Requirements

Rudder recommends using a database with at least 1TB allocated storage as there could be downtime to increase storage realtime depending on your database service provider.

Estimating Storage

If you want to dig deeper and figure out the right storage size, go through the following example. Following variables should be considered to come with a right storage size for your use case.

Variable
Description
Production sample data

numSources

Total number of sources

2

numEventsPerSec

Number of events per sec for a given source

2500

avgGwEventSize

Event size that is captured at the gateway by Rudder

2.1 KB

gwEventOverhead

Size of extra metadata that Rudder stores at Gateway to process the event

300 B

numDests

Number of enabled destinations for a given source

3

avgRtEventSize

The payload size that needs to be sent by the router to the destination after applying transformations

1.2 KB

rtEventOverhead

Size of extra metadata that Rudder stores to process the event

300 B

gatewayStorage = numEventsPerSec * (avgGwEventSize + gwEventOverhead) routerStorage = numEventsPerSec * numDests * (avgRtEventSize + rtEventOverhead) totalStoragePerHour = 3600 * \sum_{firstSource}^{lastSource} (gatewayStorage + routerStorage)

In the above production example, after substituting the values, totalStoragePerHour adds up to 120 GB

Sample your peak load in production to estimate the storage requirements and substitute your values to get an estimate of the storage needed per 1 hour of data.

Event data and tables are ephemeral. In a happy path, we would have only a few minutes of event data being stored.

We recommend at least 10 hours worth of event storage computed above to gracefully handle destinations going down for a few hours.

If you want to prepare for a destination going for down for days, accommodate them into your storage capacity.

Estimating Connections

Rudder batches requests efficiently to write data. Under heavy load, backend can be configured (batchTimeoutInMS and maxBatchSize ) to batch more requests to limit concurrent connections to the database. If write latencies to the database are not in permissible thresholds, a new data set needs to be added i.e., backend server and database server.

Rudder reads the data back from the database at a constant rate. A sudden spike in user traffic will not result in more read DB requests.

RAM Requirements

Rudder does not cache aggressively and hence does not need huge amount of memory. Load tests were performed on 4 GB and 8 GB memory instances.

Rudder caches active user events by default to form configurable user sessions server side. The length of any user session can be configured with sessionThresholdInS and sessionThresholdEvents. Once a user's session is formed, that user events are cleared from the cache. If you don't need sessions, this can be disabled by setting processSessions to false.

Variable
Description
Sample data

numActiveUsers

Number of active users during a session (2 min) in your application during peak hours

10000

avgGwEventSize

Event size that is captured at the gateway by Rudder

2.1 KB

userEventsInThreshold

Number of user events in the given threshold i.e., 40 user events in 2 min

40

memoryNeeded = numActiveUsers * userEventsInThreshold * avgGwEventSize

Memory required in the above example would be 840 MB.

The memory estimate does not include the default RAM required for running the OS and the required processes.

Gateway Requests and Batch Router Throughputs

Monitoring and Metrics