Monitoring and Metrics

A look at all the stats/metrics generated by the backend and how to monitor the applications using them.

Last updated 3 years ago

The backend uses a statsd client to log stats. These stats can be collected by any statsd server, such as Graphite or CloudWatch. For example, the CloudWatch Agent can be used to collect the stats into Amazon CloudWatch.

Note that stats collection can be disabled using the enableStats setting in config.yaml.
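The statsd wire format is plain text, so you can inspect what a collector receives without much effort. The following is a minimal sketch of a parser for the basic statsd line format (name:value|type, with an optional |@rate sample rate). It maps the c, g, and ms type codes to the Counter, Gauge, and Timer types used in the tables on this page. It assumes untagged lines. How dimensions such as instanceName are encoded on the wire depends on the statsd client and server in use, so this sketch does not handle tags. The function name and sample values are illustrative and are not part of RudderStack.

```python
from dataclasses import dataclass
from typing import List, Optional

# statsd type codes mapped to the metric types used in this document.
TYPE_NAMES = {"c": "Counter", "g": "Gauge", "ms": "Timer"}


@dataclass
class StatsdSample:
    name: str
    value: float
    kind: str
    sample_rate: Optional[float] = None


def parse_statsd_packet(packet: str) -> List[StatsdSample]:
    """Parse a plain statsd packet (one metric per line) into samples.

    Assumes the untagged format name:value|type[|@rate]; tag extensions
    are not handled by this sketch.
    """
    samples = []
    for line in packet.strip().splitlines():
        if not line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split("|")
        if len(fields) < 2:
            raise ValueError(f"malformed statsd line: {line!r}")
        value, type_code = float(fields[0]), fields[1]
        rate = None
        for extra in fields[2:]:
            if extra.startswith("@"):
                rate = float(extra[1:])
        # Unknown type codes are kept as-is rather than rejected.
        kind = TYPE_NAMES.get(type_code, type_code)
        samples.append(StatsdSample(name, value, kind, rate))
    return samples
```

For example, parse_statsd_packet("recovery.mode_normal:1|g") yields a single Gauge sample named recovery.mode_normal with the value 1.0.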

Every metric has a dimension called instanceName that can be used to filter metrics by backend instance. This is helpful in multi-node deployments.
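If your metrics backend returns samples with their dimensions attached, a few lines of code can split a metric by instanceName. This is a minimal sketch that assumes each sample is a (name, tags, value) tuple. That shape and the function name totals_by_instance are hypothetical and chosen for illustration. Your metrics store will have its own query API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical sample shape: (metric name, dimensions, value).
Sample = Tuple[str, Mapping[str, str], float]


def totals_by_instance(samples: Iterable[Sample], metric: str) -> Dict[str, float]:
    """Sum the values of one metric separately for each instanceName."""
    totals: Dict[str, float] = defaultdict(float)
    for name, tags, value in samples:
        if name != metric:
            continue
        # Samples that lack the dimension are grouped under "unknown".
        totals[tags.get("instanceName", "unknown")] += value
    return dict(totals)


samples = [
    ("router.events_delivered", {"instanceName": "node-1"}, 10.0),
    ("router.events_delivered", {"instanceName": "node-2"}, 7.0),
    ("router.events_delivered", {"instanceName": "node-1"}, 5.0),
    ("gateway.batch_size", {"instanceName": "node-1"}, 3.0),
]
print(totals_by_instance(samples, "router.events_delivered"))
# {'node-1': 15.0, 'node-2': 7.0}
```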

Available metrics

Recovery Mode

The backend usually runs in normal mode. If the backend crashes and restarts multiple times in a short span, it is started in either degraded or maintenance mode. In degraded mode, the backend gateway collects and stores events but does not send them to destinations. In maintenance mode, the existing database is set aside for further inspection and a new database is used. It is therefore important to monitor the recovery mode and take appropriate action when the backend enters degraded or maintenance mode.

This is the most important metric to monitor as it directly indicates the health of the application.

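| Name | Type | Description |
|---|---|---|
| recovery.mode_normal | Gauge | Has a value of 1 when running in normal mode, and 0 when running in degraded or maintenance mode. |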
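Because recovery.mode_normal is a simple 1/0 gauge, an alert check only has to look at its most recent readings. The following is a minimal sketch, assuming you already fetch the gauge's recent samples as a list of numbers. The function name and the min_consecutive parameter are hypothetical. With the default of 1, a single 0 reading triggers the alert.

```python
from typing import Sequence


def recovery_mode_alert(samples: Sequence[float], min_consecutive: int = 1) -> bool:
    """Return True when recovery.mode_normal has read 0 (degraded or
    maintenance mode) for the last `min_consecutive` samples."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    recent = list(samples)[-min_consecutive:]
    # Not enough data yet to decide, so do not alert.
    if len(recent) < min_consecutive:
        return False
    return all(value == 0 for value in recent)
```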

Gateway

| Name | Type | Description | Dimensions |
|---|---|---|---|
| gateway.response_time | Timer | Response time of each request | - |
| gateway.batch_size | Counter | Requests are grouped into batches internally for processing. This metric captures the size of each batch. | - |
| gateway.batch_time | Timer | Time taken to process each batch of requests | - |
| gateway.write_key_requests | Counter | Number of requests received for each write key | writekey |
| gateway.write_key_successful_requests | Counter | Number of successful requests for each write key | writekey |
| gateway.write_key_failed_requests | Counter | Number of failed requests for each write key* | writekey |

* Requests can fail for reasons such as an oversized request, an invalid write key, or malformed events.
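The writekey dimension makes it possible to spot a single misbehaving source. This is a minimal sketch, assuming you have summed gateway.write_key_requests and gateway.write_key_failed_requests over the same time window, keyed by write key. The function name and the 5% default threshold are hypothetical choices.

```python
from typing import Dict, List


def failing_write_keys(requests: Dict[str, float], failed: Dict[str, float],
                       threshold: float = 0.05) -> List[str]:
    """Return write keys whose failed/total request ratio exceeds `threshold`.

    `requests` and `failed` hold gateway.write_key_requests and
    gateway.write_key_failed_requests for the same window, keyed by writekey.
    """
    flagged = []
    for write_key, total in requests.items():
        if total <= 0:
            continue  # No traffic in this window, so no ratio to compute.
        ratio = failed.get(write_key, 0.0) / total
        if ratio > threshold:
            flagged.append(write_key)
    return sorted(flagged)
```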

Processor

| Name | Type | Description |
|---|---|---|
| processor.active_users | Gauge | Number of active users, based on the most recent events received. Useful for monitoring real-time traffic. |
| processor.gateway_db_read | Counter | Number of events read from the gateway database for processing. |
| processor.gateway_db_write | Counter | Number of events whose status is updated in the gateway database after processing. |
| processor.router_db_write | Counter | Number of events written to the router database. |
| processor.batch_router_db_write | Counter | Number of events written to the batch router database. Note that the batch router database is used for handling batch-dumping destinations such as S3 and MinIO. |
| processor.transformer_sent | Counter | Number of events sent to the transformer. |
| processor.transformer_received | Counter | Number of events received from the transformer. Note that this may differ from transformer_sent even when there are no failures. |
| processor.transformer_failed | Counter | Number of events for which the transformer returned an error response. |
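Because transformer_sent and transformer_received can legitimately differ, comparing them directly is not a reliable failure signal. A simple alternative is to relate transformer_failed to transformer_sent over the same window. This is a minimal sketch that treats the result as an approximate error ratio. It assumes the counters are available as a mapping from metric name to value. The function name is hypothetical.

```python
from typing import Mapping


def transformer_error_ratio(counters: Mapping[str, float]) -> float:
    """Approximate share of transformer traffic that returned errors.

    Uses processor.transformer_failed / processor.transformer_sent for one
    window. transformer_received is deliberately not compared with
    transformer_sent, since the two can differ even without failures.
    """
    sent = counters.get("processor.transformer_sent", 0.0)
    failed = counters.get("processor.transformer_failed", 0.0)
    if sent <= 0:
        return 0.0  # Nothing was sent in this window.
    return failed / sent
```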

Router

| Name | Type | Description |
|---|---|---|
| router.[destination_code]_delivery_time* | Timer | Time taken to send each event to a specific destination. |
| router.[destination_code]_batch_time* | Timer | Time taken by a routing worker for each iteration, in which multiple events are sent. Equivalent to the interval at which a worker picks a new batch of events to send.** |
| router.[destination_code]_failed_attempts* | Counter | Number of retries made for a specific destination. |
| router.events_delivered | Counter | Total number of events delivered to all destinations. |

* These metrics are reported for each destination type, such as GA or AMP. For example, all Google Analytics destinations are grouped under a single metric (e.g., router.GA_delivery_time). These metrics are useful for monitoring failures or delays in delivering events to a particular destination.
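The per-destination router metrics carry the destination code inside the metric name. To group or chart them by destination, you first need to split the name. This is a minimal sketch that recognizes only the three per-destination suffixes listed in the Router table. The function name is hypothetical. Destination codes that contain underscores are kept whole.

```python
import re
from typing import Optional, Tuple

# Per-destination suffixes listed in the Router table. The greedy (.+)
# backtracks to the last matching suffix, so codes may contain underscores.
_ROUTER_METRIC = re.compile(
    r"^router\.(?P<code>.+)_(?P<kind>delivery_time|batch_time|failed_attempts)$"
)


def split_router_metric(name: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'router.GA_delivery_time' into ('GA', 'delivery_time').

    Returns None for metrics that are not per-destination, such as
    router.events_delivered.
    """
    match = _ROUTER_METRIC.match(name)
    if match is None:
        return None
    return match.group("code"), match.group("kind")
```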

BatchRouter

Destinations such as S3 and MinIO, where raw events are dumped, are handled by the Batch Router.

| Name | Type | Description | Dimensions |
|---|---|---|---|
| batch_router.dest_successful_events | Counter | Number of events successfully sent to a specific destination | destID |
| batch_router.dest_failed_attempts | Counter | Number of failed attempts for a specific destination. An increase in this metric indicates that the destination cannot be reached, usually because of invalid authorization or an invalid endpoint. | destID |
| batch_router.[destination_code]_dest_upload_time | Timer | Time taken to upload events to a specific destination (S3, MinIO, etc.) | - |
| batch_router.errors | Counter | Total number of errors when sending events to destinations | - |
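Both Batch Router counters carry the destID dimension. You can combine them into a simple reachability check. This is a minimal sketch of one possible heuristic: flag a destination that recorded failed attempts but no successful events in the same window. The inputs are assumed to be per-destID sums over that window. The function name and the heuristic itself are hypothetical choices, not RudderStack behavior.

```python
from typing import Dict, List


def unreachable_destinations(successful: Dict[str, float],
                             failed_attempts: Dict[str, float]) -> List[str]:
    """Flag destIDs with failed attempts but no successful events.

    Inputs are batch_router.dest_successful_events and
    batch_router.dest_failed_attempts summed over the same time window,
    keyed by the destID dimension.
    """
    return sorted(
        dest_id
        for dest_id, attempts in failed_attempts.items()
        if attempts > 0 and successful.get(dest_id, 0) == 0
    )
```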

JobsDB

These are implementation-specific backend metrics that can be used to analyze performance relative to traffic. JobsDB maintains active events and their statuses. To optimize database operations, the backend periodically adds new tables to the database and migrates rows from older tables.

| Name | Type | Description |
|---|---|---|
| jobsdb.gw_tables_count | Gauge | Number of gateway tables in JobsDB |
| jobsdb.rt_tables_count | Gauge | Number of router tables in JobsDB |
| jobsdb.brt_tables_count | Gauge | Number of batch router tables in JobsDB |

Ideally, these table counts should not grow continuously. Ever-growing table counts indicate one of the following:

  • Events are not being processed and delivered in time.
  • The load exceeds what the current setup can handle, and it is time to scale.
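Telling "ever growing" apart from normal fluctuation means looking at a trend rather than a single reading. This is a minimal sketch, assuming you have recent samples of one of the jobsdb.*_tables_count gauges in time order. It flags the gauge when the last window samples never decrease and end higher than they started. The function name and the default window of 6 samples are hypothetical, so tune them to your scrape interval.

```python
from typing import Sequence


def is_ever_growing(counts: Sequence[float], window: int = 6) -> bool:
    """Return True if the last `window` samples of a JobsDB table-count
    gauge never decrease and end higher than they started."""
    if window < 2:
        raise ValueError("window must be at least 2")
    recent = list(counts)[-window:]
    if len(recent) < window:
        return False  # Not enough history to judge a trend.
    non_decreasing = all(b >= a for a, b in zip(recent, recent[1:]))
    return non_decreasing and recent[-1] > recent[0]
```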

JobsDB - Table Dump specific

All the events from gateway tables are periodically dumped to S3/MinIO as a backup and also to facilitate event replay. These stats monitor delays or errors in dumping.

| Name | Type | Description |
|---|---|---|
| jobsdb.table_file_dump_time | Timer | Time taken to dump gateway tables to a JSON file |
| jobsdb.file_upload_time | Timer | Time taken to compress and upload the generated JSON files |
| jobsdb.total_table_dump_time | Timer | Total time taken for the whole process of dumping tables to S3 |

Config Backend Polling

The configuration of sources and their corresponding destinations is polled from the config backend. Any errors in fetching this configuration can be monitored using config_backend.errors.

| Name | Type | Description |
|---|---|---|
| config_backend.errors | Counter | Number of errors in fetching or processing config from the control plane's backend |

** The number of events picked in each router iteration (see router.[destination_code]_batch_time) can be configured using noOfJobsPerChannel in config.yaml.
