Design & deploy defensively

Frequently I’m asked why I always recommend building and testing a defensive design, one that has a chance of surviving first contact.

The answer is simple: I’ve been burned, and seen others burned, by design decisions that made sense at the time but didn’t stand the test of time or the ‘what if?’ test.

A few of my tips and recommendations (your mileage and opinion may vary) are shown below, together with why I like to consider them in a design.

Always double check the information you have for the network switch connectivity from hosts to avoid disappointment later

Running lldpcli show neighbors from an AHV host shows which physical interface is plugged into which physical switch and port number (assuming LLDP is allowed on the switch). This has helped me countless times to discover where cables were inadvertently swapped during cabling, or where enumerated network identifiers vary between onboard and PCIe devices.
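As a rough illustration of that check, here is a minimal Python sketch that compares LLDP neighbour data against the cabling you expected. It assumes the text came from lldpcli -f keyvalue show neighbors and that the keys look like lldp.<interface>.chassis.name and lldp.<interface>.port.ifname; key names can differ between lldpd versions and port ID types, so treat them as assumptions. The sample output, switch names, and expected cabling map are all hypothetical.

```python
def parse_lldp_keyvalue(text):
    """Map local interface -> {"switch": ..., "port": ...} from keyvalue lines.

    Assumes keys shaped like lldp.<interface>.chassis.name (an assumption
    about the keyvalue format; verify against your own output).
    """
    neighbors = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        parts = key.split(".")
        if not sep or len(parts) < 4 or parts[0] != "lldp":
            continue  # skip blank lines and anything that is not lldp.<iface>.<field>
        iface, field = parts[1], ".".join(parts[2:])
        entry = neighbors.setdefault(iface, {})
        if field == "chassis.name":
            entry["switch"] = value
        elif field == "port.ifname":
            entry["port"] = value
    return neighbors


def find_cabling_mismatches(expected, observed):
    """Return (interface, expected, seen) for every interface that differs."""
    mismatches = []
    for iface, want in expected.items():
        got = observed.get(iface, {})
        seen = (got.get("switch"), got.get("port"))
        if seen != want:
            mismatches.append((iface, want, seen))
    return mismatches


# Hypothetical sample data: eth1 was meant to go to tor-b but landed on tor-a.
SAMPLE_OUTPUT = """\
lldp.eth0.chassis.name=tor-a
lldp.eth0.port.ifname=Ethernet1/1
lldp.eth1.chassis.name=tor-a
lldp.eth1.port.ifname=Ethernet1/2
"""

EXPECTED = {
    "eth0": ("tor-a", "Ethernet1/1"),
    "eth1": ("tor-b", "Ethernet1/1"),
}

if __name__ == "__main__":
    observed = parse_lldp_keyvalue(SAMPLE_OUTPUT)
    for iface, want, seen in find_cabling_mismatches(EXPECTED, observed):
        print(f"{iface}: expected {want}, saw {seen}")
```

Running something like this per host before building the cluster is much cheaper than chasing a swapped cable after go-live.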

Always think of the future when deciding subnet sizes

On day 1 your Nutanix cluster subnet needs 2 IPs per physical host (one for the hypervisor and one for the CVM), plus 1 for the cluster VIP and 1 for the Data Services IP. In the future, though, if you plan to deploy Nutanix Unified Storage (NUS) for example, you will require more IPs on the cluster subnet. A small subnet like a /27 (30 usable IPs) may seem sensible enough for a small 8-node cluster needing 16 + 2 = 18 IPs, but you may end up with a problem later when it comes to additional services.
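The arithmetic is easy to sanity-check with a few lines of Python. This is only a sketch: the 10.0.0.0 network is a hypothetical example, the function names are mine, and the reserved parameter is an assumed placeholder for anything else on the subnet (a gateway, for instance) that the day-1 formula above does not count.

```python
import ipaddress


def day_one_ips(hosts):
    """2 IPs per physical host, plus cluster VIP and Data Services IP."""
    return 2 * hosts + 2


def usable_ips(cidr):
    """Usable addresses, excluding network and broadcast (valid up to /30)."""
    return ipaddress.ip_network(cidr).num_addresses - 2


def headroom(cidr, hosts, reserved=0):
    """Addresses left for future services after the day-1 requirement."""
    return usable_ips(cidr) - day_one_ips(hosts) - reserved


if __name__ == "__main__":
    for prefix in (27, 26, 24):
        cidr = f"10.0.0.0/{prefix}"  # hypothetical subnet
        print(f"{cidr}: {usable_ips(cidr)} usable, "
              f"{headroom(cidr, hosts=8)} spare after an 8-node day 1")
```

For the 8-node example this leaves only 12 spare addresses in a /27 before counting a gateway, node expansion, or any additional services.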

I like using /24s for a cluster because it means a consistent layout, for example:

.9 – Cluster DSIP

.10 – Cluster VIP

.11 – .50 – Cluster Hosts

.51 – .101 – Cluster CVMs

This leaves .102 and above for other systems such as FSVMs etc.

That said, I do understand that larger environments may have restrictions. I’ve seen /26s used well for smaller clusters, but I have been burned too many times by a /27 in even a modestly sized cluster.
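A layout like the one above can also be written down as data and checked against a candidate subnet before anything is deployed. The sketch below is an assumption-laden illustration rather than a tool: the 10.0.0.0 network is hypothetical, and the function simply rejects overlapping ranges or ranges that fall outside the usable addresses of the subnet.

```python
import ipaddress

# (role, first host offset, last host offset), taken from the layout above
PLAN = [
    ("Cluster DSIP", 9, 9),
    ("Cluster VIP", 10, 10),
    ("Cluster hosts", 11, 50),
    ("Cluster CVMs", 51, 101),
]


def expand_plan(cidr, plan):
    """Turn host offsets into real addresses, rejecting overlaps and overflow."""
    net = ipaddress.ip_network(cidr)
    base = net.network_address
    last_usable = net.num_addresses - 2  # highest usable host offset
    rows = []
    previous_end = 0
    for role, start, end in sorted(plan, key=lambda row: row[1]):
        if start <= previous_end:
            raise ValueError(f"{role} overlaps the previous range")
        if end < start or end > last_usable:
            raise ValueError(f"{role} does not fit in {cidr}")
        rows.append((role, base + start, base + end))
        previous_end = end
    return rows


if __name__ == "__main__":
    for role, first, last in expand_plan("10.0.0.0/24", PLAN):
        print(f"{role}: {first} to {last}")
```

Calling expand_plan with a /27 and the same plan raises a ValueError, which is the whole point: the consistent layout simply does not fit.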

Consider Foundation by default

I am a huge advocate for using Foundation on every deployment, rather than relying on what the vendor has shipped and upgrading in situ.

This often means having to deploy a Foundation VM to build a cluster, but I find that time spent building directly onto the target version is better spent than taking a cluster as shipped and using LCM to upgrade it anyway.

I am also acutely aware of supply chain attack risks, where security-conscious customers may insist on systems being reimaged anyway, so it has become the standard approach I take.

Harden by default

Nutanix provides simple-to-use ncli commands which can be used to harden CVMs, AHV hosts, Prism Central VMs, and File Server VMs. I strongly recommend using them: in addition to being ‘more secure’, it also helps with compliance conversations when the Prism Central Security Dashboard is showing all green.

Some of the settings can affect performance (for example, mitigations for speculative execution attacks), so those should be discussed before implementing. The traditional ‘Advanced Intrusion Detection Environment (AIDE)’, ‘High strength passwords’, and ‘SNMP v3’ restrictions are usually no-brainers to implement, in addition to ensuring Secure Boot is enabled on hosts.

Network Segmentation by default where appropriate

There has been enough vocal opinion over the years that the Nutanix cluster network reveals ‘too much information’ to clients that sit on it that the ability to physically or virtually segment services has become a reality. This is great both for security improvements and for more predictable performance, since payload workloads can be isolated on separate physical network cards from some backend services.

In cases where this can be done virtually, all you need is a non-routed VLAN on the switching and some IP addresses to achieve a visibly reduced attack surface (in most cases) without any noticeable impact on performance or productivity.

Consider alerting strategy

A lot of folks will only configure alerting on Prism Central, as it automatically takes alerts from Prism Element and sends them onwards. But what if Prism Central is hosted on the same cluster and goes down? No alerts would be received, because the thing sending them is gone. I always recommend a custom alert configuration: in addition to Prism Central alerting, set up Prism Element alerts for ‘Critical’ severity. The trade-off is some duplicate alerts (from both PC and PE), but getting alerts when Prism Central is down is worth it to me.

Use Pulse!

Nutanix Pulse is a very useful feature that surfaces information you might not otherwise be aware of, in the form of Insights in the Nutanix portal. A good example is keeping on top of software end of life, or known field advisory issues that may be relevant to your cluster(s). I’m even seeing secure environments evaluate Pulse, since it supports optional obfuscation of key private information whilst keeping the benefits it brings.

What are some of your top tips for designing and deploying defensively?