Recommendations for Sizing Vertica Nodes and Clusters
Overview
This hardware planning guide recommends hardware for your Vertica nodes. It also helps you size the hardware components appropriately for the needs of your environment.
Optimal Hardware Configurations for Vertica
The following hardware configurations provide excellent performance for your Vertica database.
Component | Recommendations |
---|---|
Processor |
For optimal performance, run the following processors:
|
Memory |
Vertica requires a minimum of 8 GB of memory per physical CPU core in each server. However, for high-performance applications, you should provision 12–16 GB of memory per physical core. The memory should be at least DDR3-1600 (preferably DDR4-2133) and should be distributed evenly across all memory channels in the server. |
Storage |
Vertica requires a minimum read/write speed of 40 MB/s per physical CPU core. However, for best performance, you should have 60–80 MB/s per physical core. Each node should have 1–9 TB of storage after RAID. In a production setting, Micro Focus recommends RAID 10. RAID 50 can be a viable alternative. Because Vertica compresses and encodes data heavily, you do not need to use solid-state drives (SSDs). To satisfy Vertica requirements, a RAID array with a larger number of less expensive hard disk drives (HDDs) works just as well as a RAID array with fewer SSDs. Note: If you intend to use RAID 50 for your data partition, keep a spare node in every rack. This allows for manual failover of a Vertica node if a drive fails. (Recovering a Vertica node is faster than rebuilding a RAID 50 array. To keep node recovery times acceptable, never put more than 10 TB of compressed data on any node.) |
Network |
Micro Focus recommends 10G networking over 1G networking in almost every situation. |
These recommendations exceed the requirements recommended by the operating system vendors.
For the vendors' recommended requirements, see:
- Red Hat: Red Hat Enterprise Linux technology capabilities and limits
- Debian: System Requirements
- SUSE: Technical Information
- Ubuntu: System Requirements
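The per-core memory and storage figures in the table above translate into per-node targets once you know the number of physical cores. The following is a minimal sketch of that arithmetic, not an official sizing tool. The function name `node_requirements` and its dictionary keys are hypothetical; only the per-core figures (8 GB minimum and 12–16 GB recommended memory, 40 MB/s minimum and 60–80 MB/s recommended read/write speed) come from this guide.

```python
def node_requirements(physical_cores: int) -> dict:
    """Scale the per-core figures from this guide to one node.

    The name and the dictionary keys are illustrative assumptions;
    the per-core numbers are the ones quoted in the table above.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return {
        # Memory: 8 GB minimum, 12-16 GB for high-performance applications.
        "memory_gb_min": 8 * physical_cores,
        "memory_gb_recommended": (12 * physical_cores, 16 * physical_cores),
        # Storage read/write speed: 40 MB/s minimum, 60-80 MB/s for best performance.
        "io_mb_s_min": 40 * physical_cores,
        "io_mb_s_recommended": (60 * physical_cores, 80 * physical_cores),
    }


if __name__ == "__main__":
    # Example: a server with 24 physical cores (an arbitrary illustration).
    for name, value in node_requirements(24).items():
        print(f"{name}: {value}")
```

A result like this is only a starting point for a Bill of Materials; workload and concurrency can push the real requirement higher.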
Sizing Your Cluster
Consider these factors when sizing your cluster:
- Data volume (compression): First, look at the total raw data volume for the cluster, and then apply a reasonable compression ratio. In most cases, assuming 2:1 compression with high availability is a good starting point. To calculate the compression for your data, try one of these approaches:
- Use a previously attained compression number.
- Install Vertica on an existing system and run Database Designer (DBD). DBD calculates an excellent compression for all columns.
- Load about 10% of your expected total data into the database. Make sure that this sample is representative of the total data. After you load the data, you can calculate the compression ratio by running an audit and summing the used bytes in projection storage.
For more information, see Calculating the Database Size in the Vertica documentation.
- Data growth: Once you have a good idea of the starting compressed data volume, consider:
- The rate of data ingest (i.e., the amount of data you load in the database each day)
- Your organization’s retention policy (how long you intend to keep the data)
You can estimate the total data volume required for your cluster using the following formula:
ingest_rate × retention_period
- Workload: Develop an understanding of your total workload. The following factors are part of every Vertica database workload:
- Concurrency: The number of concurrently running queries. The higher the concurrency, the more core memory you need.
- Data per query: The amount of data on which the average query operates.
- Resource management: How you manage resources using runtime priority and resource pools.
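The data volume and data growth estimates above can be sketched as a short calculation. This is an illustration under stated assumptions, not a Vertica utility: the function names are hypothetical, the two byte counts are taken as plain inputs (the audited raw size and the summed used bytes in projection storage, obtained however you run those steps), and the 2:1 default is the starting ratio suggested in this guide.

```python
def compression_ratio(raw_bytes: float, used_bytes: float) -> float:
    """Ratio of audited raw data size to bytes used in projection storage.

    Both arguments are assumed to come from a representative sample load,
    as described in the list above.
    """
    if raw_bytes < 0 or used_bytes <= 0:
        raise ValueError("raw_bytes must be >= 0 and used_bytes must be > 0")
    return raw_bytes / used_bytes


def estimated_volume_tb(ingest_tb_per_day: float,
                        retention_days: float,
                        ratio: float = 2.0) -> tuple:
    """Apply ingest_rate x retention_period, then divide by the compression ratio.

    Returns (raw_tb, compressed_tb). The default ratio of 2.0 is the
    2:1 starting assumption from this guide.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    raw_tb = ingest_tb_per_day * retention_days
    return raw_tb, raw_tb / ratio


if __name__ == "__main__":
    # Hypothetical sample: 400 GB raw that occupies 160 GB in projection storage.
    ratio = compression_ratio(400e9, 160e9)
    # Hypothetical workload: 0.5 TB ingested per day, kept for 90 days.
    raw_tb, compressed_tb = estimated_volume_tb(0.5, 90, ratio)
    print(f"ratio={ratio:.2f} raw={raw_tb:.1f} TB compressed={compressed_tb:.1f} TB")
```

Keeping the ratio as an explicit parameter makes it easy to rerun the estimate once a measured value replaces the 2:1 assumption.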
Server Configuration
For high availability, use a minimum of three nodes.
The following table lists the recommended server configurations. All configurations are based on raw data volumes and assume 2:1 compression. If you are getting better than 2:1 compression, you may need fewer nodes.
For configurations where the raw data size exceeds 10 TB, the best practice is to engage Vertica presales or ask a Vertica technical representative to provide an exact sizing that best fits your needs.
Before purchasing hardware for a production cluster, always have a Vertica technical representative review your hardware Bill of Materials (BOM).
Note In a high-concurrency or heavy-workload environment, oversize your cluster to achieve better performance.
Raw Data Size | Recommended Server Configuration and Comments |
---|---|
Up to 5 TB |
3 nodes each with:
|
5–10 TB |
6 nodes each with:
|
10–40 TB |
3–4 nodes each with:
|
40 TB–1 PB |
4–100 nodes each with:
|
Larger than 1 PB |
In most cases, the same configuration as for 40 TB–1 PB will work for raw data sizes greater than 1 PB. However, first contact a Vertica technical representative for recommendations on sizing clusters over 1 PB. |
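As a rough cross-check on the node counts in the table, you can derive a storage-only lower bound from figures stated elsewhere in this guide: a minimum of three nodes for high availability, 1–9 TB of storage per node, and no more than 10 TB of compressed data on any node. The sketch below is an assumption-based illustration, not the method used to build the table. The function name and its parameters are hypothetical, and the result ignores concurrency and workload, which can call for more nodes.

```python
import math

MIN_NODES_FOR_HA = 3               # this guide: use a minimum of three nodes
MAX_COMPRESSED_TB_PER_NODE = 10.0  # this guide: never more than 10 TB compressed per node


def minimum_node_count(raw_tb: float,
                       compression_ratio: float = 2.0,
                       per_node_tb: float = 9.0) -> int:
    """Storage-only lower bound on the number of nodes (hypothetical helper).

    per_node_tb is the compressed data you plan to place on each node;
    the 9 TB default is the upper end of the 1-9 TB range in this guide.
    """
    if raw_tb < 0 or compression_ratio <= 0:
        raise ValueError("raw_tb must be >= 0 and compression_ratio must be > 0")
    if not 0 < per_node_tb <= MAX_COMPRESSED_TB_PER_NODE:
        raise ValueError("per_node_tb must be in (0, 10] TB")
    compressed_tb = raw_tb / compression_ratio
    nodes_for_storage = math.ceil(compressed_tb / per_node_tb)
    # Never go below the high-availability minimum.
    return max(MIN_NODES_FOR_HA, nodes_for_storage)


if __name__ == "__main__":
    for raw in (5, 40, 1000):  # raw data sizes in TB
        print(f"{raw} TB raw -> at least {minimum_node_count(raw)} nodes")
```

Treat the output as a floor only; the table above and a review by a Vertica technical representative take precedence.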
For More Information
In this document, Vertica engineers have provided you with sizing information for your database cluster. The following resources provide more detailed configurations, and they explain how to best deploy your Vertica database.