
Manish Gupta

Founder/CEO Webb.ai

Redshift Advanced Troubleshooting Techniques and Best Practices for SREs and Developers

Greetings, Redshift wranglers! As SREs and developers, we all rely on Redshift to power our data pipelines and fuel data-driven decisions. But even the mightiest data warehouse can hit a snag. That's where these battle-tested troubleshooting techniques come in: they're designed to help you diagnose and resolve complex Redshift issues efficiently.

Level Up Your Redshift Troubleshooting Game

We’ll delve into advanced strategies to identify and conquer performance bottlenecks in your Redshift data warehouse:

  1. Become a CloudWatch Whisperer:

CloudWatch is your window into Redshift’s health. Use its Redshift metrics to track key performance indicators (KPIs) like cluster CPU utilization, query execution times, and disk space usage. Here’s an example CloudFormation snippet to create a CloudWatch alarm for high Redshift CPU utilization:

YAML

Resources:
  HighUtilizationAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "High CPU utilization on Redshift cluster"
      Namespace: AWS/Redshift
      MetricName: CPUUtilization  # Redshift publishes CPUUtilization (there is no "ClusterUtilization" metric)
      Dimensions:
        - Name: ClusterIdentifier
          Value: my-redshift-cluster  # Replace with your cluster identifier
      Statistic: Average
      Period: 300  # Average over 5-minute windows
      EvaluationPeriods: 2  # Trigger if high for two consecutive periods (10 minutes)
      Threshold: 70  # Set your desired utilization threshold (percent)
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - "arn:aws:sns:REGION:ACCOUNT_ID:YourSNSTopic"  # Replace with your SNS topic ARN

This snippet creates an alarm that triggers if average CPU utilization exceeds 70% for two consecutive 5-minute periods (10 minutes), notifying you via SNS for further investigation.
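
CloudWatch covers the cluster-level signals; for the query-latency KPI, Redshift’s own system logs are just as useful. Here’s a minimal sketch that surfaces the slowest queries from the last 24 hours using the STL_QUERY system log (adjust the time window and LIMIT to taste):

SQL

-- Slowest queries in the last day, from Redshift's STL_QUERY system log
SELECT query,
       trim(querytxt) AS sql_text,
       datediff(seconds, starttime, endtime) AS duration_s
FROM stl_query
WHERE starttime > dateadd(day, -1, getdate())
ORDER BY duration_s DESC
LIMIT 10;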

  2. Unmask the Culprit with EXPLAIN:

Redshift’s EXPLAIN command sheds light on query execution plans, revealing potential bottlenecks before a query runs. Note that, unlike PostgreSQL, Redshift does not support EXPLAIN ANALYZE; actual execution statistics are recorded in system views such as SVL_QUERY_SUMMARY. The plan exposes factors like table scans, join types, and data redistribution steps. Here’s a sample SQL statement to view the plan for your next query:

SQL

EXPLAIN SELECT * FROM my_table;

By analyzing the EXPLAIN output, you can identify areas for query optimization, such as defining sort keys on frequently filtered columns or improving join conditions. (Redshift has no traditional indexes; sort keys and distribution keys play that role.)
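
To see how a query actually behaved after it ran, the SVL_QUERY_SUMMARY system view breaks execution down by stream, segment, and step; steps with is_diskbased = 't' spilled to disk and are prime optimization targets. A minimal sketch (the query ID 12345 is a placeholder, taken from STL_QUERY):

SQL

-- Per-step execution metrics for a completed query
SELECT query, stm, seg, step, label, rows, bytes, is_diskbased
FROM svl_query_summary
WHERE query = 12345  -- replace with your query ID from STL_QUERY
ORDER BY stm, seg, step;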

  3. Vacuum Regularly to Maintain Peak Performance:

Over time, deleted and updated rows leave dead space and unsorted regions in Redshift tables, which degrades query performance. Running VACUUM reclaims the wasted storage and restores sort order. Redshift has no in-database event scheduler for SQL, so recurring vacuums are typically driven by the Redshift console’s query scheduler (backed by Amazon EventBridge) or an external cron job; Redshift also vacuums automatically in the background during low-load periods. Here’s the SQL you’d schedule weekly:

SQL

-- Reclaim deleted space and fully re-sort rows in one table
VACUUM FULL my_table;

-- Or reclaim deleted space only, across the whole database
VACUUM DELETE ONLY;

These are the commands to put on a weekly schedule: VACUUM FULL reclaims deleted space and re-sorts rows for a single table, while VACUUM DELETE ONLY reclaims space across the database without re-sorting, which is cheaper when sort order is still good.
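
To target vacuums where they matter, the SVV_TABLE_INFO system view shows how much of each table is unsorted; tables at the top of this list benefit most. A minimal sketch:

SQL

-- Tables with the most unsorted data are the best vacuum candidates
SELECT "table", unsorted, tbl_rows, estimated_visible_rows
FROM svv_table_info
ORDER BY unsorted DESC NULLS LAST
LIMIT 10;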

  4. Embrace Distribution Styles for Optimized Queries:

Redshift distributes each table’s rows across compute nodes according to a chosen distribution style (AUTO, EVEN, KEY, or ALL). Analyze your workload’s access patterns and choose the optimal style. For example, if large tables are frequently joined on a specific column, KEY distribution on that column collocates matching rows on the same node and avoids costly data redistribution at query time. Here’s a sample SQL command to alter the distribution style of a table:

SQL

ALTER TABLE my_table ALTER DISTSTYLE KEY DISTKEY column_name;

This command switches the distribution style of “my_table” to KEY, using “column_name” as the distribution key.
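
After changing the distribution key, it’s worth verifying that rows actually spread evenly, since a low-cardinality or skewed key can overload a single slice. SVV_TABLE_INFO exposes this as skew_rows (values near 1.0 mean even distribution). A minimal sketch:

SQL

-- skew_rows compares the fullest slice to the emptiest; near 1.0 is ideal
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC NULLS LAST;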

  5. Leverage Spectrum for Cost-Effective Cold Data Access:

Redshift Spectrum lets you query data stored in your Amazon S3 data lake alongside your Redshift tables. This is ideal for cost-effectively accessing infrequently used cold data without loading it into the cluster or impacting local storage. Spectrum tables must live in an external schema, created once with CREATE EXTERNAL SCHEMA and an IAM role that can read the S3 data. Here’s a sample SQL script to define and query a table stored in S3:

SQL

-- Assumes an external schema named "spectrum" was created beforehand with
-- CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG ... IAM_ROLE ...
CREATE EXTERNAL TABLE spectrum.my_cold_data (
  column1 int,
  column2 varchar(255)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-data-lake/cold_data/'  -- an S3 prefix (folder), not a single file
TABLE PROPERTIES ('skip.header.line.count' = '1');

SELECT * FROM spectrum.my_cold_data WHERE column1 > 100;

This script defines an external table over comma-delimited files under the S3 prefix, then queries it with a filter; the S3 scan runs in the Spectrum layer rather than on your cluster’s local storage.
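
Since Spectrum charges by the amount of S3 data scanned, it pays to check what each query actually read. The SVL_S3QUERY_SUMMARY system view records per-query Spectrum scan statistics; a minimal sketch:

SQL

-- Bytes and rows scanned in S3 for recent Spectrum queries
SELECT query, elapsed, s3_scanned_rows, s3_scanned_bytes
FROM svl_s3query_summary
ORDER BY query DESC
LIMIT 10;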
