FAQ
Answers to the frequently asked questions about RudderStack - including the core functionality, setup, features, and more.
This section aims to address the queries and issues you might encounter while using RudderStack.
If you come across any issue not listed in this guide, feel free to start a conversation in our community.
To quickly , you can . Here, you can configure your sources and destinations and start building your data pipelines in no time.
For routing and processing the events to the RudderStack backend, a data plane URL is required.
Refer to the guide for more information on the RudderStack Data Plane.
You can get the data plane URL depending on your RudderStack plan:
RudderStack Open Source - Set up your own data plane by in your preferred environment.
An open source data plane URL looks like http://localhost:8080, where 8080 is typically the port where your data plane is hosted.
- The data plane URL is provided in the dashboard at the top of the Connections page.
- for the data plane URL with the email ID you used to sign up for RudderStack.
The token (also referred to as the workspace token) is required if you are installing RudderStack in your own environment and wish to use the RudderStack-hosted control plane. It is a unique identifier for your configuration settings, which RudderStack fetches to track your instrumentations.
git submodule update, I get this error:
This can happen if you have changed your workspace token. Also, ensure that the RudderStack server is running on the latest version.
A c4.xlarge or c4.2xlarge machine should work just fine for your setup.
This message indicates that the RudderStack server is waiting on the PostgreSQL database dependency to be up and running. Verify if your PostgreSQL container is up.
To check the status of the data plane, run the following command:
A sample command would look something like:
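The same check can also be scripted. The following is a minimal sketch, not an official client: it assumes that the data plane exposes a /health endpoint returning a JSON body with "server" and "db" fields set to "UP" when healthy. Both the endpoint path and the field names are assumptions here, so verify them against your deployment.

```python
import json
from urllib.request import urlopen


def health_url(data_plane_url: str) -> str:
    """Build the health-check URL (the /health path is an assumption)."""
    return data_plane_url.rstrip("/") + "/health"


def is_up(payload: str) -> bool:
    """Return True if the (assumed) JSON status reports server and db as UP."""
    status = json.loads(payload)
    return status.get("server") == "UP" and status.get("db") == "UP"


def check(data_plane_url: str, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and interpret the response."""
    with urlopen(health_url(data_plane_url), timeout=timeout) as resp:
        return is_up(resp.read().decode("utf-8"))
```

For a local open source setup, you would call check("http://localhost:8080").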
The number of events a single RudderStack node can handle depends on the destinations that you are sending the event data to. It also depends on the transformations that you are running.
Here are some ballpark figures:
Dumping to S3: approx. 1.5K events/sec
Dumping to warehouse: approx. 1K events/sec
Dumping to warehouse and a few cloud destinations: approx. 750 events/sec
These are conservative numbers. A single RudderStack node can absorb close to 5x this event load at peak. The excess events are cached locally and drained at the regular throughput.
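As a rough planning aid, the ballpark figures above can be turned into a back-of-the-envelope estimate. This is only a sketch: the function names are hypothetical, the calculation assumes throughput scales linearly with the number of nodes, and the drain estimate assumes no new traffic arrives while the backlog is processed.

```python
import math

# Ballpark sustained throughput per node, taken from the figures above.
BALLPARK_EVENTS_PER_SEC = {
    "s3": 1500,
    "warehouse": 1000,
    "warehouse_and_cloud": 750,
}


def nodes_needed(sustained_events_per_sec: float, setup: str) -> int:
    """Estimate the node count for a sustained load (assumes linear scaling)."""
    per_node = BALLPARK_EVENTS_PER_SEC[setup]
    return max(1, math.ceil(sustained_events_per_sec / per_node))


def drain_seconds(burst_events_per_sec: float, burst_seconds: float,
                  setup: str, nodes: int = 1) -> float:
    """Estimate how long the locally cached backlog takes to drain after a burst.

    Assumes no new events arrive while the backlog is being drained.
    """
    throughput = BALLPARK_EVENTS_PER_SEC[setup] * nodes
    backlog = max(0.0, burst_events_per_sec - throughput) * burst_seconds
    return backlog / throughput
```

For example, a one-minute burst of 3,000 events/sec on a single node dumping to S3 leaves a backlog of 90,000 events, which drains in about a minute at 1.5K events/sec.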
Go to the Events tab of the destination page to see the event-related metrics.
No, you need not change the URL. As long as your self-hosted data plane has the same workspace token, the RudderStack-hosted control plane will use your data plane for processing events.
Check for the folder /tmp/badgerdbv2 and delete it. This should resolve the issue and you should be able to start rudder-server.
Note that this utility will only generate the source-destination configurations which are required by RudderStack.
workspaceConfig.json file. But when I import this file, I get this error:
This issue can occur when you have some old data left in your browser's local storage. Use the latest version of the Control Plane Lite after clearing your browser cache and local storage.
To use the control plane URL to initialize your SDKs, follow these steps:
Set up the control plane using the Control Plane Lite utility.
Go to the dashboard, configure the source, and export the source configuration by clicking the EXPORT SOURCE CONFIG button.
Host the exported file on your own server such that the configuration is available at <CONTROL_PLANE_URL>/sourceConfig.
This solution assumes that you have already set up the RudderStack data plane (backend) locally.
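The hosting step above can be done with any web server. As an illustration only, here is a minimal sketch using Python's standard library that serves an exported file at /sourceConfig. The file path, port, and the permissive CORS header are assumptions made for this example, and it is not meant for production use.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_source_config_path(path: str) -> bool:
    """Match /sourceConfig, tolerating a trailing slash and a query string."""
    return path.split("?")[0].rstrip("/") == "/sourceConfig"


def make_handler(config_path: str):
    """Create a request handler that serves the exported file as JSON."""

    class SourceConfigHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not is_source_config_path(self.path):
                self.send_error(404)
                return
            with open(config_path, "rb") as f:
                body = f.read()
            json.loads(body)  # fail loudly if the exported file is not valid JSON
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            # Assumption: browser SDKs on another origin need CORS access.
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return SourceConfigHandler


def serve(config_path: str, host: str = "0.0.0.0", port: int = 8000) -> None:
    """Serve the file until interrupted; host and port are placeholders."""
    HTTPServer((host, port), make_handler(config_path)).serve_forever()
```

With this sketch running via serve("exported-config.json") (a hypothetical file name), the control plane URL to pass to the SDK would be the address of this server.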
Unfortunately, your workspace name cannot currently be changed. We are planning to include this feature in our future releases.
The write key is different from your workspace token. The write key is associated with the source, while the workspace token is associated with your RudderStack workspace.
Switching between RudderStack Open Source and RudderStack Cloud is straightforward. Replace the URL of your self-hosted data plane with the RudderStack-hosted data plane URL. You can use the same sources and destinations as before. All you need to change is the URL to which the events are sent.
The personal access token is a unique token associated with your RudderStack account. It is required to access and consume the RudderStack APIs.
batch method in the SDK.
If you encounter this error, it is most likely because of faulty permissions. Try editing the Zendesk Chat source and reauthorizing it.
RudderStack does not persist any data on its own. Rather, it fetches data from the source starting from the timestamp of the last extraction.
Yes, you can use the same destination without any problem.
Message type not supported error. What does this mean?
This error is returned by the RudderStack backend. It means that a particular destination does not support the event you are trying to send.
For example, Salesforce only supports identify events. Therefore, if a track call is sent to Salesforce, the Message type not supported error will be returned. This error does not affect any other events and is harmless. However, you can write a simple user transformation to filter out these events so that you no longer see this error.
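A filtering transformation of this kind could look roughly like the sketch below. It assumes a Python transformation with a transformEvent(event, metadata) entry point where returning None drops the event. Treat both as assumptions and check the transformations documentation for the exact contract. The set of supported types is hard-coded for the Salesforce example above.

```python
# Event types the destination accepts. For the Salesforce example above,
# only identify events are supported.
SUPPORTED_TYPES = {"identify"}


def transformEvent(event, metadata):
    """Drop events the destination cannot handle; pass the rest through.

    Assumption: returning None tells RudderStack to discard the event.
    """
    if event.get("type") not in SUPPORTED_TYPES:
        return None
    return event
```

Because the filter only inspects the type field, the rest of the event payload reaches the destination unchanged.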
By default, RudderStack sends the data to the table / dataset based on the source it is connected to. For example, if the source is Google Tag Manager, RudderStack sets the schema name as gtm_*.
You can override this by setting a Namespace in the destination settings.
warehouseSyncFreqIgnore = true to have a real-time sync with BigQuery but I can't find the config.yaml file. How can I do this using the Docker setup?
You can do so by setting this value via the following .env variables:
RSERVER_WAREHOUSE_WAREHOUSE_SYNC_FREQ_IGNORE
RSERVER_WAREHOUSE_UPLOAD_FREQ_IN_S
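The two variable names above appear to follow a pattern: an RSERVER prefix, the config section, and the camelCase key converted to upper snake case. The sketch below encodes that inferred pattern to render .env lines. The helper names, the "Warehouse" section name, and the sample value 1800 are assumptions made for illustration.

```python
import re


def to_env_var(section: str, key: str, prefix: str = "RSERVER") -> str:
    """Convert a config section and camelCase key into an env variable name."""

    def snake_upper(name: str) -> str:
        # Insert "_" at every lower-to-upper boundary, then upper-case.
        return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).upper()

    return "_".join([prefix, snake_upper(section), snake_upper(key)])


def render_env(section: str, settings: dict) -> str:
    """Render settings as NAME=value lines suitable for a .env file."""
    return "\n".join(
        "%s=%s" % (to_env_var(section, key), value)
        for key, value in settings.items()
    )


# Hypothetical values: ignore the UI-set frequency and upload every 1800 seconds.
print(render_env("Warehouse", {"warehouseSyncFreqIgnore": "true",
                               "uploadFreqInS": "1800"}))
```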
RudderStack lets you fill in the setting values with variable names prefixed with env. (note the trailing dot). You can then populate these secrets as environment variables and run the data plane.
Suppose you are configuring Amazon S3 as a destination but you don't want to enter the AWS access key credentials in the destination settings. Enter a placeholder of the form env.MY_AWS_ACCESS_KEY in the field, then set the value of the environment variable MY_AWS_ACCESS_KEY while running the data plane.
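Conceptually, the placeholder mechanism works like the lookup sketched below. This is an illustration of the idea, not RudderStack's actual implementation. The setting names (accessKeyID, bucketName) are hypothetical.

```python
import os

ENV_PREFIX = "env."


def resolve(value, environ=None):
    """Replace an env.NAME placeholder with the value of environment variable NAME."""
    environ = os.environ if environ is None else environ
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        if name not in environ:
            raise KeyError("environment variable %s is not set" % name)
        return environ[name]
    return value  # literal values pass through unchanged


def resolve_settings(settings: dict, environ=None) -> dict:
    """Resolve every placeholder in a flat destination settings mapping."""
    return {key: resolve(value, environ) for key, value in settings.items()}
```

For example, resolve_settings({"accessKeyID": "env.MY_AWS_ACCESS_KEY", "bucketName": "my-bucket"}) would substitute the first value from the environment and leave the second as-is.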
RudderStack's hosted solution runs on AWS EKS with the cluster spanning 3 availability zones (east-1a, east-1b, east-1c).
At the infrastructure layer, RudderStack runs on a multi-availability zone EKS cluster. So hardware failures, if any, are handled by Kubernetes by relocating pods.
At an application level, RudderStack operates in one of the following 3 modes:
Normal mode, where everything is normal and there are no issues.
If for some reason the system fails (e.g. because of a bug), it enters the Degraded mode, where RudderStack processes incoming requests but doesn't send them to destinations.
If the system continues to fail to process the data (e.g. internal database corruption), it enters the Maintenance mode, where we save the previous state (which can be debugged and processed) and start from scratch, while still receiving requests.
All of RudderStack's SDKs also have failure handling. They can store events in local storage and retry on failure.
RudderStack provides isolation between the data and control planes. For example, if the control plane (where you manage the source and destination configurations) goes offline, the data plane continues to operate.
All this is done to ensure that RudderStack can always receive events and no events are lost.
Adding a new node requires a bit of downtime. However, RudderStack is built in a way that minimizes this downtime as much as possible.
When a new node is added, the users need to be rebalanced across nodes (to keep event ordering). While the rebalancing takes place (which can take a few minutes), RudderStack does not send events to downstream destinations, but continues to receive events so that your SDKs don't see any failures (ignoring the small ELB switchover time).
Also, the SDKs have a built-in local caching and retrying capability. So even if there is a failure, no events are lost.
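To see why adding a node forces a rebalance, consider a simple scheme where each user is pinned to a node by hashing the user ID. This is only an illustrative sketch. The actual partitioning strategy RudderStack uses is not described here, and the modulo scheme below is an assumption chosen for simplicity.

```python
import hashlib


def node_for(user_id: str, node_count: int) -> int:
    """Pin a user to one node so that the user's events stay in order."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def moved_users(user_ids, old_count: int, new_count: int):
    """List the users whose node assignment changes when the cluster is resized."""
    return [
        user for user in user_ids
        if node_for(user, old_count) != node_for(user, new_count)
    ]
```

Any user returned by moved_users has pending events on the old node that must be delivered before the new node takes over, which is why delivery pauses during the rebalance while ingestion continues.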
To enable network access to RudderStack, you will need to whitelist the following RudderStack IPs:
3.216.35.97
34.198.90.241
54.147.40.62
23.20.96.9
18.214.35.254
35.83.226.133
52.41.61.208
44.227.140.138
54.245.141.180
3.66.99.198
3.64.201.167
If you have your deployment in the EU region, you can whitelist only the following two IPs:
3.66.99.198
3.64.201.167
All the outbound traffic is routed through these RudderStack IPs.
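If you maintain the allowlist in application code or generate firewall rules from it, a small helper like the following sketch can keep the lists in one place. The addresses are copied from the lists above. The function name and the eu_only flag are hypothetical.

```python
import ipaddress

# Addresses copied from the lists above.
RUDDERSTACK_IPS = [
    "3.216.35.97", "34.198.90.241", "54.147.40.62", "23.20.96.9",
    "18.214.35.254", "35.83.226.133", "52.41.61.208", "44.227.140.138",
    "54.245.141.180", "3.66.99.198", "3.64.201.167",
]
EU_IPS = ["3.66.99.198", "3.64.201.167"]


def is_rudderstack_ip(addr: str, eu_only: bool = False) -> bool:
    """Return True if addr is one of the listed RudderStack outbound IPs."""
    allowed = EU_IPS if eu_only else RUDDERSTACK_IPS
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    return any(ip == ipaddress.ip_address(entry) for entry in allowed)
```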
For a content security policy, the following URLs should be whitelisted:
Control plane
https://api.rudderstack.com
https://api.rudderlabs.com
Data plane
DATA_PLANE_URL
SDK
https://cdn.rudderlabs.com
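As a sketch of how these URLs might map onto a Content-Security-Policy header: the choice of directives below (script-src for the SDK CDN, connect-src for the control plane and data plane) is an assumption based on how browsers typically load scripts and send requests, so adjust it to your site's existing policy.

```python
def build_csp(data_plane_url: str) -> str:
    """Build a CSP header value that allows the URLs listed above."""
    script_src = ["'self'", "https://cdn.rudderlabs.com"]
    connect_src = [
        "'self'",
        "https://api.rudderstack.com",
        "https://api.rudderlabs.com",
        data_plane_url.rstrip("/"),
    ]
    return "; ".join([
        "script-src " + " ".join(script_src),
        "connect-src " + " ".join(connect_src),
    ])
```

The returned string is intended as the value of the Content-Security-Policy response header, with data_plane_url set to your own DATA_PLANE_URL.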
Sometimes, the downstream destination can be unavailable or send a failure code for a variety of reasons. RudderStack retries sending the events depending on the type of failure:
5XX, 429: Retry for a time window of 3 hours with exponential backoff and a minimum of 3 times.
4XX: Retry for a minimum of 3 times without any backoff.
If a user's event fails, that user's other events are not sent until the failed event is successfully sent or aborted (as per the behavior above). This ensures event ordering for all events of a single user.
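One possible reading of the retry rules above is sketched below. The 3-hour window and the minimum of 3 attempts come from the text. The base delay, the delay cap, and the decision to abort 4XX failures after exactly 3 attempts are assumptions made for illustration.

```python
RETRY_WINDOW_S = 3 * 60 * 60  # 3-hour retry window for 5XX and 429
MIN_ATTEMPTS = 3              # minimum number of attempts in both cases
BASE_DELAY_S = 1.0            # assumption: first backoff delay
MAX_DELAY_S = 300.0           # assumption: cap on a single delay


def uses_backoff(status: int) -> bool:
    """5XX and 429 responses are retried with exponential backoff."""
    return status == 429 or 500 <= status <= 599


def should_retry(status: int, attempts: int, elapsed_s: float) -> bool:
    """Decide whether to retry after `attempts` tries over `elapsed_s` seconds."""
    if uses_backoff(status):
        # Keep going until the window has elapsed AND the minimum is reached.
        return attempts < MIN_ATTEMPTS or elapsed_s < RETRY_WINDOW_S
    if 400 <= status <= 499:
        # Assumption: the event is aborted once the minimum is reached.
        return attempts < MIN_ATTEMPTS
    return False  # success or any other status: nothing to retry


def next_delay(status: int, attempts: int) -> float:
    """Delay before the next attempt: exponential for 5XX/429, none for 4XX."""
    if not uses_backoff(status):
        return 0.0
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempts - 1))
```

Because events of a single user are delivered in order, a queue built on this policy would hold back that user's later events until should_retry returns False for the failed one.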
Some downstream destinations have limits on the number of events they accept at an account or user/device level. RudderStack tries to throttle the API requests as per the destination's limits.
Some examples are:
If you are , this token is not required.
Yes. Use the open source utility to self-host the control plane and configure your sources and destinations. Refer to the Control Plane Lite section below for more information.
Verify if the SSH keys are correctly set in your GitHub account as they are used when cloning using the git protocol. For more information, refer to this .
You can verify your RudderStack installation by sending some test events and checking if they are delivered correctly. For more information, refer to the guide.
There is a to configure the number of workers that send data to destinations. The default value is 64, which itself is an aggressive number. You can increase the number of workers. However, note that some destinations generally throttle the number of requests per account.
Events sent through the are not visible in this option.
For self-hosting the UI, you can use the utility.
The self-hosted control plane set up using Control Plane Lite does not support features like and , which are included in the .
RudderStack lets you implement your own custom transformations that leverage the event data to implement specific use-cases based on your business requirements. Refer to the section to add transformations in RudderStack.
Currently, transformations can only be configured and used for destinations. If you want to write some custom logic specific to the source, you can get the in the transformation function and use it to include the logic. Refer to the section for more information.
To view the data or events that are sent to your destination, you can use the tab in your destination's page.
For more information on generating a personal access token, refer to the guide.
You should use the track method instead. For the JavaScript SDK's track method parameters specific to e-commerce, you can refer to the .
Yes, Shopify is compatible as an event stream data source. For more information, refer to . We also have users that integrate the JavaScript SDK into their Shopify sites. In some cases, they even do it through Google Tag Manager. However, we strongly recommend using the Shopify source integration for better tracking.
To view the data or events that are sent to your destination, you can use the tab on your destination page.
You can use to set custom logic on your events before sending them to Mixpanel.
You can override the UI-set sync frequency by setting warehouseSyncFreqIgnore to true in (or config.toml, in case you have an older RudderStack deployment). You can set your desired frequency by changing the uploadFreqInS parameter.
Firstly, make sure you have set up the required for PostgreSQL.
Then, check the status of the sync in the .
Check if the database is accessible by whitelisting all the RudderStack IPs listed .
Ensure that all the security group policies for S3 are set as specified .
Refer to the guide for details on how RudderStack generates the schemas in the warehouse.
Make sure that both your BigQuery dataset and the bucket have the same region. For more information, refer to the .
Yes, you can set the desired folder name in the Prefix input field while configuring your BigQuery destination. For more information, refer to the .
Refer to this for more information on obtaining your data plane URL.
The above behavior is configurable via config variables in .
For more information on the SDK-specific retry and backoff logic, refer to the guide.
These limits can also be configured using config variables in or using environment variables as described in comments .
For queries on any of the sections covered in this guide, you can or start a conversation in our community.