VCAP-DCV Deploy Objective 4.3

In the following post we are going to discuss how to troubleshoot vSphere clusters.

The following are the objectives from the blueprint:

  • Analyze and resolve DRS/HA faults
  • Troubleshoot DRS/HA configuration issues
  • Troubleshoot Virtual SAN/HA interoperability
  • Resolve vMotion and storage vMotion issues
  • Troubleshoot VMware Fault Tolerance

Lab Setup:

Using VMware Workstation:

  • Microsoft Windows Server 2012 R2 for services (DNS, DHCP, etc.)
  • Installed esx0
  • Installed esx1
  • Installed VCSA

Documents used:

  • vSphere 6.0 Resource Management Guide
  • vSphere 6.0 Troubleshooting Guide (most used)
  • vSphere 6.0 Availability Guide

Analyze and resolve DRS/HA faults:

When troubleshooting, review the DRS Faults and DRS History views to collect information. Every migration that DRS performs is shown in the history.

Also always check the Monitor tab: it shows information about issues, triggered events, and DRS/HA health.

Troubleshooting vSphere HA Host States:

Issues from VMware vSphere 6.0 Troubleshooting Guide:

Agent Unreachable State: The vSphere HA agent on a host is in the Agent Unreachable state for a minute or more

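  • Solution: Check whether vCenter Server reports the host as Not Responding. If it does, look for a network problem or a host failure; if it does not, reconfigure vSphere HA on the host.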

Agent is in the Uninitialized State: The vSphere HA agent on a host is in the Uninitialized state for a minute or more

  • Solution: Check the events for “vSphere HA Agent for the host has an error”. The cause is usually a datastore access issue or a firewall issue. For a firewall issue, check whether another service on the host is using port 8182; if so, shut down that service and reconfigure vSphere HA.
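The firewall check above comes down to whether something else already holds port 8182. As a generic illustration only (this is not a VMware tool), the Python sketch below tries to bind a port for TCP and for UDP and treats a failed bind as "in use". The function name `port_in_use` is hypothetical; on the ESXi host itself you would normally list open sockets with `esxcli network ip connection list` instead.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if `port` cannot be bound for TCP or for UDP on `host`.

    A failed bind usually means another process already holds the port.
    """
    for sock_type in (socket.SOCK_STREAM, socket.SOCK_DGRAM):
        with socket.socket(socket.AF_INET, sock_type) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Address already in use (or not permitted): report as busy.
                return True
    return False
```

For example, `port_in_use(8182)` returns True if another process already has that port open on the machine where it runs. A bind can also fail for other reasons (for example, missing permissions), so treat True as a hint to investigate rather than proof.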

Agent is in the Initialization Error State: The vSphere HA agent on a host is in the Initialization Error state for a minute or more

Solution:

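  • Host communication errors: check for communication and network issues
  • Timeout errors: apply the Agent Unreachable and Uninitialized solutions above
  • Lack of resources: make sure approximately 75MB of disk space is free; if unreserved memory is the problem, free up memory on the host (for example, by migrating VMs or reducing their reservations)
  • Reboot pending: reboot the host, then reconfigure vSphere HA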
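For the Lack of resources case, the disk-space part is a simple threshold check. This is a minimal Python sketch, assuming you can run it against the filesystem in question; the 75MB constant is the approximate figure from the Lack of resources item, while the function name and the path you pass are hypothetical.

```python
import shutil

REQUIRED_FREE_MB = 75  # approximate free space quoted for the HA agent


def has_free_space(path: str, required_mb: int = REQUIRED_FREE_MB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_mb` MB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * 1024 * 1024
```

`has_free_space("/tmp")` returns True when at least 75MB is free on that filesystem. It says nothing about unreserved memory, which is the other half of that item.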

Agent is in the Uninitialization Error State: The vSphere HA agent on a host is in the Uninitialization Error state

  • Solution: Add the host back to vCenter Server. The host can be added as a stand-alone host or to any cluster.

Agent is in the Host Failed State: The vSphere HA agent on a host is in the Host Failed state.

  • Solution: Check whether the host has actually failed (for example, it crashed or was powered off) and resolve any failure conditions you find.

Agent is in the Network Partitioned State: The vSphere HA agent on a host is in the Network Partitioned state

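  • Solution: Check the management network connectivity between this host and the rest of the cluster; a partitioned host cannot communicate with the vSphere HA master.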

Agent is in the Network Isolated State: The vSphere HA agent on a host is in the Network Isolated state

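  • Solution: Check the host's management network. An isolated host receives no vSphere HA heartbeats and cannot ping its isolation address (by default, the default gateway of the management network).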
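The difference between the Network Partitioned and Network Isolated states is easier to see as a decision. The Python sketch below is a simplified model, not vSphere HA's actual election and isolation logic: a host that still hears the master is fine; one that cannot hear the master but still hears other agents or can ping its isolation address ends up partitioned; one that can do neither is isolated. The enum and function names are hypothetical.

```python
from enum import Enum


class HaNetworkState(Enum):
    CONNECTED = "connected"
    PARTITIONED = "network partitioned"
    ISOLATED = "network isolated"


def classify_host(hears_master: bool,
                  hears_other_agents: bool,
                  isolation_address_reachable: bool) -> HaNetworkState:
    """Simplified model of the Partitioned vs Isolated distinction."""
    if hears_master:
        # Normal case: heartbeats from the HA master are arriving.
        return HaNetworkState.CONNECTED
    if hears_other_agents or isolation_address_reachable:
        # Still on a working network, just not the master's segment.
        return HaNetworkState.PARTITIONED
    # No HA heartbeats at all and the isolation address does not answer.
    return HaNetworkState.ISOLATED
```

In practice this is why a partition usually points at switching or VLAN problems between hosts, while isolation points at the host's own management uplinks.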

Troubleshoot DRS/HA configuration issues:

When troubleshooting configuration issues, always review the implementation prerequisites, such as:

  • VMkernel configuration
  • CPU compatibility and EVC configuration
  • Heartbeat configuration
  • Redundant management network
  • Default Isolation response
  • VM override policies
  • Power management
  • APD/PDL settings

In addition, check “Configuration Issues” under the vSphere HA tab for more details.

Configuration of vSphere HA on Hosts Times Out:

The configuration of a vSphere HA cluster might time out on some of the hosts added to it. To solve this problem, add the vCenter Server advanced option config.vpxd.das.electionWaitTimeSec and set its value to 240.

Logs to look at:

  • /var/log/fdm.log
  • /var/log/vmkernel.log
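When reading fdm.log and vmkernel.log it helps to filter for the few words you care about. The Python sketch below does a case-insensitive keyword search and returns matching line numbers. The keyword list is an assumption (these are not documented FDM message strings), and you would run it against a copy of the log, for example one taken from a support bundle.

```python
import re
from pathlib import Path

# Hypothetical keywords; adjust them to the messages you are chasing.
DEFAULT_PATTERNS = ("isolat", "partition", "election", "error")


def scan_log(path, patterns=DEFAULT_PATTERNS):
    """Return (line_number, line) pairs that contain any pattern (case-insensitive)."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    hits = []
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if regex.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

For example, `scan_log("fdm.log", patterns=("isolat",))` returns the (line number, line) pairs for every line containing that substring.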

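Ports needed by vSphere HA:

  • vSphere 5.x/6.0 HA (FDM agent): TCP and UDP 8182, inbound and outbound
  • Legacy AAM-based HA (vSphere 4.x and earlier): inbound TCP/UDP 8042-8045, outbound TCP/UDP 2050-2250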
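To confirm that HA traffic is not blocked between hosts, a plain TCP connect test is often enough. The Python sketch below is a generic reachability check, not a VMware utility. It only tests TCP (UDP is connectionless and cannot be verified this way), and the host names and port you pass are up to you.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the lab in this post that would be something like `tcp_reachable("esx1", 8182)`, run from a machine on the management network.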

Troubleshoot Virtual SAN/HA interoperability:

You can use vSAN with a vSphere HA cluster only if the following conditions are met:

  • ESXi hosts are version 5.5 or later
  • The cluster has a minimum of three ESXi hosts
  • vSphere HA is disabled before vSAN is enabled on the cluster (you can re-enable vSphere HA afterwards)
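The conditions above can be expressed as a small pre-flight check. The Python sketch assumes you collect host names and ESXi versions yourself; the `Host` class and `vsan_ha_problems` function are hypothetical, and the three-host minimum reflects the vSphere 6.0 documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_ESXI_VERSION = (5, 5)
MIN_HOSTS = 3


@dataclass
class Host:
    """Hypothetical host record: name plus ESXi version as (major, minor)."""
    name: str
    version: Tuple[int, int]


def vsan_ha_problems(hosts: List[Host], ha_enabled: bool,
                     enabling_vsan: bool) -> List[str]:
    """Return a list of unmet vSAN/HA conditions (an empty list means OK)."""
    problems = []
    if len(hosts) < MIN_HOSTS:
        problems.append(f"cluster has {len(hosts)} hosts, needs at least {MIN_HOSTS}")
    for host in hosts:
        # Tuples compare element by element, so (5, 1) < (5, 5) < (6, 0).
        if host.version < MIN_ESXI_VERSION:
            problems.append(
                f"{host.name} runs ESXi {host.version[0]}.{host.version[1]}, needs 5.5 or later")
    if enabling_vsan and ha_enabled:
        problems.append("disable vSphere HA before enabling vSAN, then re-enable it")
    return problems
```

With only the two lab hosts and HA still enabled, the check reports two problems: the host count and the HA ordering.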

Networking differences

Virtual SAN has its own network. When Virtual SAN and vSphere HA are enabled for the same cluster, the HA interagent traffic flows over this storage network rather than the management network. The management network is used by vSphere HA only when Virtual SAN is disabled. vCenter Server chooses the appropriate network when vSphere HA is configured on a host.

To change the vSAN network configuration, follow these steps:

  • Disable Host Monitoring for the vSphere HA cluster
  • Make the changes to the vSAN network
  • Right-click all hosts in the cluster and select Reconfigure for vSphere HA
  • Re-enable Host Monitoring for the vSphere HA cluster
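The ordering of those steps matters, so here is a Python sketch that wraps the vSAN network change in them using a context manager. `cluster.set_host_monitoring()` and `cluster.reconfigure_ha()` are hypothetical placeholders for whatever client or manual step you use; they are not real vSphere API calls.

```python
from contextlib import contextmanager


@contextmanager
def ha_host_monitoring_paused(cluster, hosts):
    """Run a vSAN network change between the HA steps listed above.

    `cluster` is a hypothetical object with set_host_monitoring(enabled)
    and reconfigure_ha(host); it is not a real vSphere API.
    """
    cluster.set_host_monitoring(False)   # step 1: disable Host Monitoring
    yield                                # step 2: caller changes the vSAN network
    # Only reached if the with-block finished without raising.
    for host in hosts:
        cluster.reconfigure_ha(host)     # step 3: Reconfigure for vSphere HA
    cluster.set_host_monitoring(True)    # step 4: re-enable Host Monitoring
```

Usage would look like `with ha_host_monitoring_paused(cluster, ["esx0", "esx1"]):` followed by the network change. Design choice: if the change inside the `with` block raises, the sketch deliberately leaves Host Monitoring disabled, so HA does not react to a half-finished network change; you re-enable it manually once the network is fixed.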

Resolve vMotion and storage vMotion issues:

Issues from VMware vSphere 6.0 Troubleshooting Guide:

Storage DRS is disabled on one or more virtual machine disks in the datastore cluster.

Possible causes:

  • A virtual machine’s swap file is host-local
  • A certain location is specified for a virtual machine’s .vmx swap file
  • The home disk of a virtual machine is protected by vSphere HA and relocating it will cause loss of vSphere HA protection.
  • The disk is a CD-ROM/ISO file
  • The disk is an independent disk
  • The virtual machine has hidden disks
  • The virtual machine is a template.
  • The virtual machine is vSphere Fault Tolerance-enabled
  • The virtual machine is sharing files between its disks
  • The virtual machine is being Storage DRS-placed with manually specified datastores
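Since the list above is a set of independent yes/no conditions, it maps naturally onto a checklist. The Python sketch below uses a hypothetical `VmInfo` record (the field names are illustrative, not vSphere API properties) and returns every cause that applies, in the same order as the list.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class VmInfo:
    """Hypothetical per-VM flags; names are illustrative, not vSphere properties."""
    swap_is_host_local: bool = False
    vmx_swap_location_pinned: bool = False
    home_move_breaks_ha: bool = False
    disk_is_cdrom_or_iso: bool = False
    disk_is_independent: bool = False
    has_hidden_disks: bool = False
    is_template: bool = False
    ft_enabled: bool = False
    disks_share_files: bool = False
    placed_on_manual_datastores: bool = False


# One message per flag, in the same order as the cause list above.
REASONS = {
    "swap_is_host_local": "swap file is host-local",
    "vmx_swap_location_pinned": "a specific location is set for the .vmx swap file",
    "home_move_breaks_ha": "moving the home disk would lose vSphere HA protection",
    "disk_is_cdrom_or_iso": "disk is a CD-ROM/ISO file",
    "disk_is_independent": "disk is an independent disk",
    "has_hidden_disks": "VM has hidden disks",
    "is_template": "VM is a template",
    "ft_enabled": "VM is Fault Tolerance-enabled",
    "disks_share_files": "VM shares files between its disks",
    "placed_on_manual_datastores": "VM is placed with manually specified datastores",
}


def sdrs_disabled_reasons(vm: VmInfo) -> List[str]:
    """Return the message for every flag that is set on `vm`, in field order."""
    return [REASONS[f.name] for f in fields(vm) if getattr(vm, f.name)]
```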

Datastore Cannot Enter Maintenance Mode

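  • Possible solution: check the Storage DRS rules and whether Storage DRS is disabled on any disk on the datastore, since either can prevent its disks from being migrated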

Storage DRS Cannot Operate on a Datastore: Storage DRS generates an alarm to indicate that it cannot operate on the datastore.

  • Possible solutions:
    • The datastore must be visible in only one data center.
    • Enable Storage I/O Control.

Moving Multiple Virtual Machines into a Datastore Cluster Fails: error msg: Insufficient Disk Space on Datastore.

  • Possible solution: Retry the failed migration operations one at a time
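Why would retrying one at a time help? Here is a toy Python model, which is an assumption and not how Storage DRS actually computes recommendations: if several placements are planned against the same starting free space, each looks fine on its own but together they can overcommit a datastore; planning them in sequence updates the free space after each move. All names and figures are hypothetical.

```python
from typing import Dict, List


def plan_batch(free_gb: Dict[str, float], sizes: List[float]) -> List[str]:
    """Place each VM against the *starting* free space, ignoring the other moves."""
    placements = []
    for size in sizes:
        best = max(free_gb, key=free_gb.get)
        if free_gb[best] < size:
            raise ValueError(f"no datastore can hold a {size} GB VM")
        placements.append(best)
    return placements


def plan_sequential(free_gb: Dict[str, float], sizes: List[float]) -> List[str]:
    """Place VMs one at a time, subtracting each move from the free space."""
    free = dict(free_gb)
    placements = []
    for size in sizes:
        best = max(free, key=free.get)
        if free[best] < size:
            raise ValueError(f"no datastore can hold a {size} GB VM")
        free[best] -= size
        placements.append(best)
    return placements


def overcommitted(free_gb: Dict[str, float], sizes: List[float],
                  placements: List[str]) -> List[str]:
    """Return datastores whose planned usage exceeds their starting free space."""
    used: Dict[str, float] = {}
    for ds, size in zip(placements, sizes):
        used[ds] = used.get(ds, 0.0) + size
    return [ds for ds, total in used.items() if total > free_gb[ds]]
```

With 100 GB free on ds1, 80 GB on ds2 and two 60 GB VMs, `plan_batch` puts both on ds1 (120 GB, overcommitted), while `plan_sequential` puts one on each.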

Storage DRS Generates Fault During Virtual Machine Creation: error msg: Operation Not Allowed in the Current State

  • Possible solution: Revise or remove the rules and retry the create or clone virtual machine operation.

Applying Storage DRS Recommendations Fails

  • Possible solutions:
    • Check for a Thin Provisioning Threshold Crossed alarm
    • Check that the target datastore is not in maintenance mode

Troubleshoot VMware Fault Tolerance:

Checking configuration requirements

  • CPU compatibility and EVC configuration
  • VMkernel with FT logging
  • Low latency network (use 10G if possible)
  • vSphere HA enabled on the cluster
  • Shared storage
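The checklist above can be turned into a quick pre-flight check. The Python sketch assumes you gather these facts per host yourself; `FtHost` and `ft_problems` are hypothetical, and the 10 Gbit threshold reflects the recommendation above rather than a hard limit. It also flags a cluster with fewer than two hosts, because the Secondary VM must run on a different host from the Primary VM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FtHost:
    """Hypothetical per-host facts you gather yourself."""
    name: str
    hv_enabled: bool          # hardware virtualization enabled in the BIOS
    has_ft_logging_vmk: bool  # a VMkernel adapter enabled for FT logging
    ft_nic_gbps: int          # speed of the NIC carrying FT logging
    cpu_compatible: bool      # CPU/EVC compatible with the other FT hosts


def ft_problems(hosts: List[FtHost], ha_enabled: bool,
                shared_storage: bool) -> List[str]:
    """Return unmet FT prerequisites from the checklist above (empty list = OK)."""
    problems = []
    if len(hosts) < 2:
        problems.append("need at least two hosts: the Secondary VM runs on a different host")
    if not ha_enabled:
        problems.append("vSphere HA is not enabled on the cluster")
    if not shared_storage:
        problems.append("no shared storage available")
    for h in hosts:
        if not h.hv_enabled:
            problems.append(f"{h.name}: hardware virtualization (HV) is disabled")
        if not h.has_ft_logging_vmk:
            problems.append(f"{h.name}: no VMkernel adapter with FT logging")
        if h.ft_nic_gbps < 10:
            problems.append(f"{h.name}: FT logging NIC is {h.ft_nic_gbps} Gbit, 10 Gbit recommended")
        if not h.cpu_compatible:
            problems.append(f"{h.name}: CPU/EVC not compatible with the other FT hosts")
    return problems
```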

Issues from VMware vSphere 6.0 Troubleshooting Guide:

Hardware Virtualization Not Enabled

  • Possible solution: Enable hardware virtualization (HV) in the host BIOS

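Compatible Hosts Not Available for Secondary VM: error msg: Secondary VM could not be powered on as there are no compatible hosts that can accommodate it.

  • Possible solution: Make sure other hosts in the cluster are available (not in maintenance mode and with enough capacity), have HV enabled, and can access the VM's datastores; add hosts if needed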

Increased Network Latency Observed in FT Virtual Machines

  • Possible solution: Use a 10 Gbit NIC for the FT logging network

Turning On vSphere FT for Powered-On VM Fails: error msg: Unknown error message.

  • Possible solution: Free up memory resources or change to a host with more memory

Recover Orphaned Virtual Machines

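A VM can become orphaned (shown with "(orphaned)" appended to its name) when a host failover is unsuccessful or when the VM was unregistered directly on the host. To recover it:

  • Locate the datastore that holds the VM's .vmx file
  • Remove the VM from the inventory
  • Register the VM again from its .vmx file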
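The first step, finding the .vmx file, can be scripted if the datastores are browsable as a file tree (ESXi exposes them under /vmfs/volumes). The Python sketch assumes the default layout where the .vmx file is named after the VM; `find_vmx` and any VM name you pass are hypothetical, and removing and re-registering the VM is still done in the vSphere Web Client.

```python
from pathlib import Path
from typing import List


def find_vmx(datastore_root: str, vm_name: str) -> List[Path]:
    """Return .vmx files under `datastore_root` named `<vm_name>.vmx`, sorted.

    Assumes the default layout where the .vmx file is named after the VM.
    """
    return sorted(Path(datastore_root).rglob(f"{vm_name}.vmx"))
```

For example, `find_vmx("/vmfs/volumes", "web01")` lists every `web01.vmx` it can find; more than one hit usually means a copy or a leftover from an earlier migration, so check which one the VM was actually using before registering it.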

Thanks for reading

Mordi.