Recommendations for Sizing Vertica Nodes and Clusters

Overview

This hardware planning guide provides hardware recommendations for your Vertica nodes. It also provides information to help you size the hardware components to meet the needs of your environment.

Optimal Hardware Configurations for Vertica

The following hardware configurations provide excellent performance for your Vertica database.

Component Recommendations

Processor

For optimal performance, run the following processors:

  • Two-socket servers with 8- to 14-core CPUs, clocked at or above 2.6 GHz, for clusters over 10 TB
  • Single-socket servers with 8- to 12-core CPUs, clocked at or above 2.6 GHz, for clusters under 10 TB

Memory

Vertica requires a minimum of 8 GB of memory per physical CPU core in each server. For high-performance applications, however, you should provision 12–16 GB of memory per physical core. The memory should be at least DDR3-1600 (preferably DDR4-2133) and should be appropriately distributed across all memory channels in the server.
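The per-core figures above translate into a per-node memory target by simple multiplication. The following is a minimal sketch of that arithmetic; the function name `memory_per_node_gb` and its parameters are hypothetical and are used here only for illustration.

```python
# Per-core memory figures taken from the text above (GB per physical core).
MIN_GB_PER_CORE = 8
HIGH_PERF_GB_PER_CORE = (12, 16)


def memory_per_node_gb(sockets: int, cores_per_socket: int) -> dict:
    """Return minimum and high-performance RAM targets for one node, in GB.

    Hypothetical helper: counts physical cores only (no hyperthreads).
    """
    if sockets <= 0 or cores_per_socket <= 0:
        raise ValueError("sockets and cores_per_socket must be positive")
    cores = sockets * cores_per_socket
    return {
        "physical_cores": cores,
        "minimum_gb": cores * MIN_GB_PER_CORE,
        "high_performance_gb": (
            cores * HIGH_PERF_GB_PER_CORE[0],
            cores * HIGH_PERF_GB_PER_CORE[1],
        ),
    }
```

For example, `memory_per_node_gb(sockets=2, cores_per_socket=12)` gives a minimum of 192 GB and a high-performance range of 288 to 384 GB for a node with 24 physical cores.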

Storage

Vertica requires a minimum read/write speed of 40 MB/s per physical core of the CPU. However, for best performance, you should have 60–80 MB/s per physical core. Each node should have 1–9 TB of storage post RAID. In a production setting, Micro Focus recommends RAID 10. RAID 50 can be a viable alternative.
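The throughput and capacity figures above can be checked against a candidate node in the same way. This sketch assumes you already know the node's physical core count and its post-RAID capacity; the names `io_targets_mbps` and `storage_in_range` are hypothetical.

```python
# Figures taken from the text above.
MIN_MBPS_PER_CORE = 40           # minimum read/write speed per physical core
BEST_MBPS_PER_CORE = (60, 80)    # range recommended for best performance
NODE_STORAGE_TB = (1, 9)         # post-RAID capacity range per node


def io_targets_mbps(physical_cores: int) -> dict:
    """Return the aggregate read/write throughput targets for one node."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return {
        "minimum_mbps": physical_cores * MIN_MBPS_PER_CORE,
        "best_mbps": (
            physical_cores * BEST_MBPS_PER_CORE[0],
            physical_cores * BEST_MBPS_PER_CORE[1],
        ),
    }


def storage_in_range(post_raid_tb: float) -> bool:
    """Check whether a node's post-RAID capacity falls within 1 to 9 TB."""
    low, high = NODE_STORAGE_TB
    return low <= post_raid_tb <= high
```

A node with 24 physical cores would need at least 960 MB/s of aggregate read/write throughput, and 1440 to 1920 MB/s for best performance.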

Due to the heavy compression/encoding that Vertica performs, you do not need to use solid-state drives (SSDs). To satisfy Vertica requirements, a RAID array built from a larger number of less expensive hard disk drives (HDDs) works just as well as a RAID array of fewer SSDs.

Note: If you intend to use RAID 50 for your data partition, keep a spare node in every rack. This allows for manual failover of a Vertica node in the case of a drive failure. Recovering a Vertica node is faster than rebuilding a RAID 50 array. To keep node recovery times acceptable, never put more than 10 TB of compressed data on any node.

Network

Micro Focus recommends 10G networking over 1G networking in almost every situation.

These requirements exceed the recommended requirements.

For recommended requirements, see

Sizing Your Cluster

Consider these factors when sizing your cluster:

  • Data volume (compression): First, look at the total raw data volume for the cluster and then apply a reasonable compression ratio. In most cases, assuming 2:1 compression with high availability is a good starting point. To calculate compression for your data, try one of these approaches:
    • Use a previously attained compression number.
    • Install Vertica on an existing system and run Database Designer (DBD). DBD calculates an excellent compression for all columns.
    • Load about 10% of your expected total data into the database. Make sure that the sample is representative of the total data. After you load the data, you can calculate the compression ratio by running an audit and comparing the audited raw size with the sum of the used bytes in projection storage.

    For more information, see Calculating the Database Size in the Vertica documentation.

  • Data growth: Once you have a good idea of the starting compressed data volume, consider:
    • The rate of data ingest (that is, the amount of data you load into the database each day)
    • Your organization’s retention policy (that is, how long you intend to keep the data)

    You can estimate the total data volume required for your cluster using the following formula:

    ingest_rate × retention_period

  • Workload: Develop an understanding of your total workload. The following factors are part of every Vertica database workload:
    • Concurrency: The number of concurrently running queries. The higher the concurrency, the more cores and memory you need.
    • The amount of data on which the average query operates.
    • How you manage resources using runtime priority and resource pools.
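The data volume and data growth factors above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes that the ingest rate is measured as raw (uncompressed) data, and it applies the compression ratio as a separate step.

```python
def compression_ratio(raw_bytes: float, used_bytes: float) -> float:
    """Ratio of raw (audited) size to compressed size in projection storage."""
    if raw_bytes <= 0 or used_bytes <= 0:
        raise ValueError("sizes must be positive")
    return raw_bytes / used_bytes


def retained_volume_tb(ingest_tb_per_day: float, retention_days: float) -> float:
    """Total volume from the formula above: ingest_rate x retention_period."""
    if ingest_tb_per_day < 0 or retention_days < 0:
        raise ValueError("inputs must be non-negative")
    return ingest_tb_per_day * retention_days


def compressed_volume_tb(raw_tb: float, ratio: float = 2.0) -> float:
    """Apply a compression ratio; 2:1 is the starting assumption in the text."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_tb / ratio
```

For example, loading 0.5 TB of raw data per day with a 90-day retention period yields 45 TB of raw data, or 22.5 TB on disk at the default 2:1 ratio. If a sample load measured 1000 raw bytes against 400 used bytes, the ratio would be 2.5:1 and the same data would occupy 18 TB.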

Server Configuration

For high availability, use a minimum of three nodes.

The following table lists the recommended server configurations. All configurations are based on raw data volumes and assume 2:1 compression. If you are getting better than 2:1 compression, you may need fewer nodes.

For configurations where the raw data size exceeds 10 TB, the best practice is to engage Vertica presales or ask a Vertica technical representative to provide an exact sizing that best fits your needs.

Before purchasing hardware for a production cluster, always run your hardware Bill of Materials (BOM) past the Vertica technical representative for review.

Note: In a high-concurrency or heavy-workload environment, oversize your cluster to achieve better performance.

Raw Data Size, followed by Recommended Server Configuration and Comments

Up to 5 TB

3 nodes each with:

  • At least eight 15K rpm spinning disks for the data partition (total 1–2 TB per node or equivalent SSD)
  • A single 8–12 core processor with clock speed of 2.6 GHz or higher
  • 96–128 GB of RAM

5–10 TB

6 nodes each with:

  • At least eight 15K rpm spinning disks for the data partition (total 1–2 TB per node or equivalent SSD)
  • A single 8–12 core processor with clock speed of 2.6 GHz or higher
  • 128–256 GB of RAM

10–40 TB

3–4 nodes each with:

  • 22 10K rpm drives
  • Dual 12-core processors at 2.6 GHz
  • 256–512 GB of RAM

40 TB–1 PB

4–100 nodes each with:

  • 22 10K rpm drives
  • Dual 12-core processors at 2.6 GHz
  • 256–512 GB of RAM

Larger than 1 PB

In most cases, the same configuration as for 40 TB–1 PB will work for raw data sizes greater than 1 PB. However, contact a Vertica technical representative for recommendations before sizing clusters over 1 PB.
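The table above can be expressed as a simple lookup keyed on raw data size. This is a sketch, not an official sizing tool. It makes two assumptions that the table does not state: sizes that fall exactly on a boundary (for example, 5 TB) are assigned to the smaller tier, and 1 PB is treated as 1000 TB. The `Tier` class and `recommend` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    max_raw_tb: float  # inclusive upper bound (assumption, see lead-in)
    nodes: str
    disks: str
    cpu: str
    ram_gb: str


# Values copied from the table above; 1 PB is assumed to be 1000 TB.
TIERS = [
    Tier(5, "3", "at least eight 15K rpm disks (1-2 TB per node)",
         "single 8-12 core, 2.6 GHz or higher", "96-128"),
    Tier(10, "6", "at least eight 15K rpm disks (1-2 TB per node)",
         "single 8-12 core, 2.6 GHz or higher", "128-256"),
    Tier(40, "3-4", "22 10K rpm drives", "dual 12-core, 2.6 GHz", "256-512"),
    Tier(1000, "4-100", "22 10K rpm drives", "dual 12-core, 2.6 GHz", "256-512"),
]


def recommend(raw_tb: float) -> Optional[Tier]:
    """Return the matching tier, or None for sizes above 1 PB.

    None means the table gives no fixed configuration; the text advises
    contacting a Vertica technical representative in that case.
    """
    if raw_tb <= 0:
        raise ValueError("raw_tb must be positive")
    for tier in TIERS:
        if raw_tb <= tier.max_raw_tb:
            return tier
    return None
```

Calling `recommend(7)` returns the 6-node tier, and `recommend(2000)` returns `None`. Because the table assumes 2:1 compression, treat any result as a starting point and have it reviewed before purchasing hardware.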

For More Information

In this document, Vertica engineers have provided you with sizing information for your database cluster. The following resources provide more detailed configurations, and they explain how to best deploy your Vertica database.