Kea  1.9.9-git
isc::ha::CommunicationState Class Reference
abstract

Holds communication state between the two HA peers. More...

#include <communication_state.h>

Public Member Functions

 CommunicationState (const asiolink::IOServicePtr &io_service, const HAConfigPtr &config)
 Constructor. More...
 
virtual ~CommunicationState ()
 Destructor. More...
 
virtual void analyzeMessage (const boost::shared_ptr< dhcp::Pkt > &message)=0
 Checks if the DHCP message appears to be unanswered. More...
 
bool clockSkewShouldTerminate () const
 Indicates whether the HA service should enter "terminated" state as a result of the clock skew exceeding maximum value. More...
 
bool clockSkewShouldWarn ()
 Issues a warning about high clock skew between the active servers if one is warranted. More...
 
virtual bool failureDetected () const =0
 Checks if the partner failure has been detected based on the DHCP traffic analysis. More...
 
size_t getAnalyzedMessagesCount () const
 Returns the number of messages analyzed while in the communications-interrupted state. More...
 
virtual size_t getConnectingClientsCount () const =0
 Returns the current number of clients which attempted to get a lease from the partner server. More...
 
int64_t getDurationInMillisecs () const
 Returns duration between the poke time and current time. More...
 
std::set< std::string > getPartnerScopes () const
 Returns scopes served by the partner server. More...
 
int getPartnerState () const
 Returns last known state of the partner. More...
 
data::ElementPtr getReport () const
 Returns the report about current communication state. More...
 
virtual size_t getUnackedClientsCount () const =0
 Returns the current number of clients which haven't got the lease from the partner server. More...
 
bool isCommunicationInterrupted () const
 Checks if communication with the partner is interrupted. More...
 
bool isHeartbeatRunning () const
 Checks if recurring heartbeat is running. More...
 
std::string logFormatClockSkew () const
 Returns current clock skew value in the logger friendly format. More...
 
void modifyPokeTime (const long secs)
 Modifies poke time by adding seconds to it. More...
 
void poke ()
 Pokes the communication state. More...
 
void setPartnerScopes (data::ConstElementPtr new_scopes)
 Sets partner scopes. More...
 
void setPartnerState (const std::string &state)
 Sets partner state. More...
 
void setPartnerTime (const std::string &time_text)
 Provide partner's notion of time so the new clock skew can be calculated. More...
 
void startHeartbeat (const long interval, const std::function< void()> &heartbeat_impl)
 Starts recurring heartbeat (public interface). More...
 
void stopHeartbeat ()
 Stops recurring heartbeat. More...
 

Protected Member Functions

virtual void clearConnectingClients ()=0
 Removes information about the clients the partner server should respond to while communication with the partner was interrupted. More...
 
boost::posix_time::time_duration updatePokeTime ()
 Update the poke time and compute the duration. More...
 

Protected Attributes

size_t analyzed_messages_count_
 Total number of analyzed messages that the partner should respond to. More...
 
boost::posix_time::time_duration clock_skew_
 Clock skew between the active servers. More...
 
HAConfigPtr config_
 High availability configuration. More...
 
std::function< void()> heartbeat_impl_
 Pointer to the function providing heartbeat implementation. More...
 
long interval_
 Interval specified for the heartbeat. More...
 
asiolink::IOServicePtr io_service_
 Pointer to the common IO service instance. More...
 
boost::posix_time::ptime last_clock_skew_warn_
 Holds the time when the last warning about excessive clock skew was issued. More...
 
const boost::scoped_ptr< std::mutex > mutex_
 The mutex used to protect internal state. More...
 
boost::posix_time::ptime my_time_at_skew_
 My time when skew was calculated. More...
 
std::set< std::string > partner_scopes_
 Last known set of scopes served by the partner server. More...
 
int partner_state_
 Last known state of the partner server. More...
 
boost::posix_time::ptime partner_time_at_skew_
 Partner reported time when skew was calculated. More...
 
boost::posix_time::ptime poke_time_
 Last poke time. More...
 
asiolink::IntervalTimerPtr timer_
 Interval timer triggering heartbeat commands. More...
 

Detailed Description

Holds communication state between the two HA peers.

The HA service constantly monitors the state of the connection between the two peers. If the connection is lost it is an indicator that the partner server may be down and failover actions should be triggered.

Any command successfully sent over the control channel is an indicator that the connection is healthy. The most common command sent over the control channel is a lease update. If the DHCP traffic is heavy, the number of generated lease updates is sufficient to determine whether the connection is healthy or not. There is no need to send heartbeat commands in this case. However, if the DHCP traffic is low there is a need to send heartbeat commands to the partner at the specified rate to keep up-to-date information about the state of the connection.

This class uses an interval timer to run heartbeat commands over the control channel. The heartbeat implementation is external to this class and is supplied through the CommunicationState::startHeartbeat method. That implementation must call the poke method whenever it receives a successful response to the heartbeat command. The poke method must also be called when a lease update succeeds.

The poke method sets the "last poke time" to the current time, thus indicating that the connection is healthy. The getDurationInMillisecs method reports how long the server has been unable to communicate with the partner. This duration is simply the time elapsed since the last successful poke. If it becomes greater than the configured threshold, the server assumes that communication with the partner is interrupted.
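The poke-time bookkeeping described above can be illustrated with a minimal sketch. It is not the Kea implementation: the class name PokeTracker, its members, and the use of std::chrono instead of boost::posix_time are assumptions made for illustration, and the threshold parameter stands in for the configured value.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the poke-time bookkeeping; not the Kea class.
class PokeTracker {
public:
    using Clock = std::chrono::steady_clock;

    // max_response_delay_ms stands in for the configured threshold.
    explicit PokeTracker(std::int64_t max_response_delay_ms)
        : max_response_delay_ms_(max_response_delay_ms),
          poke_time_(Clock::now()) {
        assert(max_response_delay_ms >= 0);
    }

    // Record a successful exchange with the partner.
    void poke(Clock::time_point now = Clock::now()) { poke_time_ = now; }

    // Milliseconds elapsed since the last poke.
    std::int64_t durationMs(Clock::time_point now = Clock::now()) const {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   now - poke_time_).count();
    }

    // Communication counts as interrupted once the elapsed time
    // is greater than the threshold.
    bool isInterrupted(Clock::time_point now = Clock::now()) const {
        return durationMs(now) > max_response_delay_ms_;
    }

private:
    std::int64_t max_response_delay_ms_;
    Clock::time_point poke_time_;
};
```

Passing the current time as a parameter keeps the sketch deterministic in tests; the documented methods take no such argument.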

The derivations of this class provide DHCPv4 and DHCPv6 specific mechanisms for detecting server failures based on the analysis of the received DHCP messages, i.e. how long the clients have been trying to communicate with the partner and message types they sent. In particular, the increased number of Rebind messages may indicate issues with the DHCP server.

This class is also used to monitor the clock skew between the active servers. Maintaining a reasonably low clock skew is essential for the HA service to function properly. This class calculates the clock skew by comparing the local time of the server with the time returned by the partner in response to a heartbeat command. Depending on whether this value exceeds certain thresholds, CommunicationState::clockSkewShouldWarn and CommunicationState::clockSkewShouldTerminate indicate whether the HA service should continue to operate normally, should start issuing a warning about high clock skew, or should enter the "terminated" state and refuse to operate further until the clocks are synchronized. Leaving the "terminated" state requires administrative intervention and a restart of the HA service.

Definition at line 85 of file communication_state.h.

Constructor & Destructor Documentation

isc::ha::CommunicationState::CommunicationState ( const asiolink::IOServicePtr &  io_service,
const HAConfigPtr &  config 
)

Constructor.

Parameters
io_service  pointer to the common IO service instance.
config      pointer to the HA configuration.

Definition at line 55 of file communication_state.cc.

isc::ha::CommunicationState::~CommunicationState ( )
virtual

Destructor.

Stops scheduled heartbeat.

Definition at line 65 of file communication_state.cc.

References stopHeartbeat().

Member Function Documentation

virtual void isc::ha::CommunicationState::analyzeMessage ( const boost::shared_ptr< dhcp::Pkt > &  message)
pure virtual

Checks if the DHCP message appears to be unanswered.

This method is used to provide the communication state with a received DHCP message directed to the HA partner, to detect if the partner fails to answer DHCP messages directed to it. The DHCPv4 and DHCPv6 specific derivations implement this functionality.

This check is orthogonal to the heartbeat mechanism and is usually triggered after several consecutive heartbeats go unanswered.

The general approach to server failure detection is based on the analysis of the "secs" field value (DHCPv4) and "elapsed time" option value (DHCPv6). They indicate for how long the client has been trying to complete the DHCP transaction. If these values exceed a configured threshold, the client is considered to fail to communicate with the server. This fact is recorded by this object. If the number of distinct clients failing to communicate with the partner exceeds a configured maximum value, this server considers the partner to be offline. In this case, this server will most likely start serving clients which would normally be served by the partner.

All information gathered by this method is cleared when the poke method is invoked.

Parameters
message  DHCP message to be analyzed. This must be a message which belongs to the partner, i.e. the caller must select only the messages belonging to the partner before calling this method.

Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
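The detection approach described for analyzeMessage can be sketched in a protocol-independent way. This is a simplified model, not the DHCPv4 or DHCPv6 derivation: the class PartnerTrafficMonitor, the string client identifier, the millisecond unit, and the parameter name max_ack_delay_ms are all hypothetical. The special case for a zero maximum follows the description of failureDetected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical, simplified model of traffic-based failure detection.
class PartnerTrafficMonitor {
public:
    PartnerTrafficMonitor(std::uint32_t max_ack_delay_ms,
                          std::size_t max_unacked_clients)
        : max_ack_delay_ms_(max_ack_delay_ms),
          max_unacked_clients_(max_unacked_clients) {}

    // Analyze one message that the partner is expected to answer.
    // elapsed_ms models the "secs" field or the "elapsed time" option.
    void analyze(const std::string& client_id, std::uint32_t elapsed_ms) {
        assert(!client_id.empty());
        ++analyzed_messages_;
        connecting_.insert(client_id);
        if (elapsed_ms > max_ack_delay_ms_) {
            unacked_.insert(client_id);  // distinct clients only
        }
    }

    // A maximum of 0 means detection is disabled and the result is true.
    bool failureDetected() const {
        return max_unacked_clients_ == 0 ||
               unacked_.size() > max_unacked_clients_;
    }

    // Successful communication clears everything gathered so far.
    void poke() {
        analyzed_messages_ = 0;
        connecting_.clear();
        unacked_.clear();
    }

    std::size_t analyzedMessages() const { return analyzed_messages_; }
    std::size_t connectingClients() const { return connecting_.size(); }
    std::size_t unackedClients() const { return unacked_.size(); }

private:
    std::uint32_t max_ack_delay_ms_;
    std::size_t max_unacked_clients_;
    std::size_t analyzed_messages_ = 0;
    std::set<std::string> connecting_;
    std::set<std::string> unacked_;
};
```

Using sets keeps retransmissions from the same client from inflating the client counts, while the message counter still includes them.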

virtual void isc::ha::CommunicationState::clearConnectingClients ( )
protected pure virtual

Removes information about the clients the partner server should respond to while communication with the partner was interrupted.

This information is cleared by CommunicationState::poke. The derivations of this class must provide DHCPv4 and DHCPv6 specific implementations of this method. The poke method is called to indicate that the connection has been successfully (re)established. Therefore the client counters are reset and the failure detection procedure starts over.

See CommunicationState::analyzeMessage for details.

Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.

bool isc::ha::CommunicationState::clockSkewShouldTerminate ( ) const

Indicates whether the HA service should enter "terminated" state as a result of the clock skew exceeding maximum value.

If the clocks on the active servers are not synchronized (for example, in response to the warning message triggered by clockSkewShouldWarn) and they drift further, the clock skew may exceed another threshold which should cause the HA service to enter the "terminated" state. In this state the servers still respond to DHCP clients normally, but they send neither lease updates nor heartbeats. In this case, the administrator must correct the problem (synchronize the clocks) and restart the service. This method indicates whether the service should terminate or not.

Currently, the terminal threshold for the clock skew is hardcoded to 60 seconds. In the future it may become configurable.

Returns
true if the HA service should enter "terminated" state.

Definition at line 356 of file communication_state.cc.

References mutex_.

bool isc::ha::CommunicationState::clockSkewShouldWarn ( )

Issues a warning about high clock skew between the active servers if one is warranted.

The HA service monitors the clock skew between the active servers. The clock skew is calculated from the local time and the time returned by the partner in response to a heartbeat. When clock skew exceeds a certain threshold the HA service starts issuing a warning message. This method returns true if the HA service should issue this message.

Currently, the warning threshold for the clock skew is hardcoded to 30 seconds. In the future it may become configurable.

This method is called for each heartbeat. Issuing a warning for each heartbeat could flood the logs with these messages. This method therefore provides a gating mechanism which prevents the HA service from logging the warning more often than every 60 seconds. If the last warning was issued less than 60 seconds ago, this method returns false even if the clock skew exceeds the 30-second threshold. Correcting the clock skew resets the gating counter.

Returns
true if the warning message should be logged because of the clock skew exceeding a warning threshold.

Definition at line 317 of file communication_state.cc.

References mutex_.
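The two clock skew checks can be sketched together as follows. This is an illustration only: the class ClockSkewMonitor and its members are hypothetical, std::chrono replaces boost::posix_time, and comparing the absolute value of the skew is an assumption. The 30-second and 60-second thresholds and the 60-second gating interval come from the descriptions of clockSkewShouldWarn and clockSkewShouldTerminate.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the clock-skew checks; not the Kea class.
class ClockSkewMonitor {
public:
    using Clock = std::chrono::steady_clock;
    using Seconds = std::chrono::seconds;

    // Store the most recently computed skew (may be negative).
    void setSkew(Seconds skew) { skew_ = skew; }

    // True if a warning should be logged now.
    bool shouldWarn(Clock::time_point now) {
        assert(!warned_ || now >= last_warn_);  // time must not go backwards
        if (magnitude() <= Seconds(30)) {
            warned_ = false;  // skew corrected: reset the gate
            return false;
        }
        if (warned_ && now - last_warn_ < Seconds(60)) {
            return false;     // warned less than 60 seconds ago
        }
        warned_ = true;
        last_warn_ = now;
        return true;
    }

    // True if the skew exceeds the terminal threshold.
    bool shouldTerminate() const { return magnitude() > Seconds(60); }

private:
    // Absolute value of the skew (assumed to be what is compared).
    Seconds magnitude() const {
        return skew_ < Seconds::zero() ? -skew_ : skew_;
    }

    Seconds skew_{0};
    bool warned_ = false;
    Clock::time_point last_warn_{};
};
```

The gate is modeled with a flag and a timestamp, so a corrected skew re-enables an immediate warning if the clocks drift apart again.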

virtual bool isc::ha::CommunicationState::failureDetected ( ) const
pure virtual

Checks if the partner failure has been detected based on the DHCP traffic analysis.

In the special case when max-unacked-clients is set to 0 this method always returns true. Note that max-unacked-clients set to 0 means that failure detection is not really performed. Returning true in that case simplifies the code of the HAService which doesn't need to check if the failure detection is enabled or not. It simply calls this method in the 'communications interrupted' situation to check if the server should be transitioned to the 'partner-down' state.

Returns
true if the partner failure has been detected, false otherwise.

Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.

size_t isc::ha::CommunicationState::getAnalyzedMessagesCount ( ) const

Returns the number of messages analyzed while in the communications-interrupted state.

Returns
Number of analyzed messages. It includes retransmissions by the same clients.

Definition at line 312 of file communication_state.cc.

References analyzed_messages_count_.

Referenced by getReport().

virtual size_t isc::ha::CommunicationState::getConnectingClientsCount ( ) const
pure virtual

Returns the current number of clients which attempted to get a lease from the partner server.

The returned number is reset to 0 when the server successfully establishes communication with the partner. The number is incremented only in the communications interrupted case.

Returns
The number of clients including unacked clients.

Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.

Referenced by getReport().

int64_t isc::ha::CommunicationState::getDurationInMillisecs ( ) const

Returns duration between the poke time and current time.

Returns
Duration between the poke time and current time.

Definition at line 290 of file communication_state.cc.

References mutex_.

Referenced by getReport(), and isCommunicationInterrupted().

std::set< std::string > isc::ha::CommunicationState::getPartnerScopes ( ) const

Returns scopes served by the partner server.

Returns
A set of scopes served by the partner.

Definition at line 110 of file communication_state.cc.

References mutex_, and partner_scopes_.

Referenced by getReport().

int isc::ha::CommunicationState::getPartnerState ( ) const

Returns last known state of the partner.

Returns
Partner's state if it is known, or a negative value otherwise.

Definition at line 80 of file communication_state.cc.

References mutex_, and partner_state_.

Referenced by getReport().

ElementPtr isc::ha::CommunicationState::getReport ( ) const

Returns the report about current communication state.

This function returns a JSON map describing the state of communication with a partner. This report is included in the response to the status-get command.

Returns
JSON element holding the report.

Definition at line 439 of file communication_state.cc.

References config_, getAnalyzedMessagesCount(), getConnectingClientsCount(), getDurationInMillisecs(), getPartnerScopes(), getPartnerState(), getUnackedClientsCount(), isCommunicationInterrupted(), and isc::ha::stateToString().

virtual size_t isc::ha::CommunicationState::getUnackedClientsCount ( ) const
pure virtual

Returns the current number of clients which haven't got the lease from the partner server.

The returned number is reset to 0 when the server successfully establishes communication with the partner. The number is incremented only in the communications interrupted case.

Returns
Number of unacked clients.

Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.

Referenced by getReport().

bool isc::ha::CommunicationState::isCommunicationInterrupted ( ) const

Checks if communication with the partner is interrupted.

This method checks if the communication with the partner appears to be interrupted. This is the case when the time since last successful communication is longer than the configured max-response-delay value.

Returns
true if communication is interrupted, false otherwise.

Definition at line 307 of file communication_state.cc.

References config_, and getDurationInMillisecs().

Referenced by getReport().

bool isc::ha::CommunicationState::isHeartbeatRunning ( ) const

Checks if recurring heartbeat is running.

Returns
true if heartbeat is running, false otherwise.

Definition at line 224 of file communication_state.cc.

References mutex_, and timer_.

std::string isc::ha::CommunicationState::logFormatClockSkew ( ) const

Returns current clock skew value in the logger friendly format.

Definition at line 401 of file communication_state.cc.

References mutex_.

void isc::ha::CommunicationState::modifyPokeTime ( const long  secs)

Modifies poke time by adding seconds to it.

Used in unit tests only.

Parameters
secs  number of seconds to be added to the poke time. A negative value moves the poke time into the past relative to its current value.

Definition at line 70 of file communication_state.cc.

References mutex_, and poke_time_.

void isc::ha::CommunicationState::poke ( )

Pokes the communication state.

Sets the last poke time to current time. If the heartbeat timer has been scheduled, it is reset (starts over measuring the time to the next heartbeat).

Definition at line 253 of file communication_state.cc.

References mutex_.

void isc::ha::CommunicationState::setPartnerScopes ( data::ConstElementPtr  new_scopes)

Sets partner scopes.

Parameters
new_scopes  Partner scopes enclosed in a JSON list.

Definition at line 120 of file communication_state.cc.

References mutex_.

void isc::ha::CommunicationState::setPartnerState ( const std::string &  state)

Sets partner state.

Parameters
state  new partner's state in a textual form. Supported values are those returned in response to a ha-heartbeat command.
Exceptions
BadValue  if unsupported state value was provided.

Definition at line 90 of file communication_state.cc.

References mutex_.

void isc::ha::CommunicationState::setPartnerTime ( const std::string &  time_text)

Provide partner's notion of time so the new clock skew can be calculated.

Parameters
time_text  Partner's time received in response to a heartbeat. The time must be provided in the RFC 1123 format. The method stores the current time, the partner's time, and the difference (skew) between them.
Exceptions
isc::http::HttpTimeConversionError  if the time format is invalid.
Todo:
Consider some other time formats which include millisecond precision.

Definition at line 384 of file communication_state.cc.

References mutex_.
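The skew calculation behind setPartnerTime can be sketched with the standard library alone. This is not Kea's parser. The functions daysFromCivil, parseRfc1123 and clockSkewSeconds are hypothetical helpers, std::invalid_argument stands in for isc::http::HttpTimeConversionError, and the sign convention (partner time minus local time) is an assumption. Field ranges such as the day of month are not validated.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Days since 1970-01-01 for a Gregorian date (Howard Hinnant's algorithm).
long long daysFromCivil(long long y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12);
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Parse e.g. "Sun, 06 Nov 1994 08:49:37 GMT" into seconds since the epoch.
long long parseRfc1123(const std::string& text) {
    char wday[4] = {0}, mon[4] = {0}, zone[4] = {0};
    int day = 0, year = 0, hh = 0, mm = 0, ss = 0;
    const int fields = std::sscanf(
        text.c_str(), "%3[A-Za-z], %d %3[A-Za-z] %d %d:%d:%d %3[A-Z]",
        wday, &day, mon, &year, &hh, &mm, &ss, zone);
    if (fields != 8 || std::strcmp(zone, "GMT") != 0) {
        throw std::invalid_argument("invalid RFC 1123 time: " + text);
    }
    // Locate the month abbreviation on a 3-character boundary.
    static const char* const months = "JanFebMarAprMayJunJulAugSepOctNovDec";
    const char* pos = std::strstr(months, mon);
    if (std::strlen(mon) != 3 || !pos || (pos - months) % 3 != 0) {
        throw std::invalid_argument("invalid month in time: " + text);
    }
    const unsigned month = static_cast<unsigned>((pos - months) / 3) + 1;
    return daysFromCivil(year, month, static_cast<unsigned>(day)) * 86400LL +
           hh * 3600LL + mm * 60LL + ss;
}

// Skew in seconds: partner's reported time minus the local time.
long long clockSkewSeconds(long long my_time,
                           const std::string& partner_time_text) {
    return parseRfc1123(partner_time_text) - my_time;
}
```

Because RFC 1123 timestamps carry whole seconds only, a skew computed this way cannot be more precise than one second, which is the limitation the Todo entry refers to.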

void isc::ha::CommunicationState::startHeartbeat ( const long  interval,
const std::function< void()> &  heartbeat_impl 
)

Starts recurring heartbeat (public interface).

Parameters
interval        heartbeat interval in milliseconds.
heartbeat_impl  pointer to the heartbeat implementation function.

Definition at line 152 of file communication_state.cc.

References mutex_.
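The contract between the heartbeat callback and poke can be illustrated with a simulated timer. This sketch does not use asiolink::IntervalTimer. The class HeartbeatScheduler and its advance method are hypothetical, and time is advanced manually instead of being driven by an IO service.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical simulation of the heartbeat scheduling; not the Kea class.
class HeartbeatScheduler {
public:
    // Remember the interval and the externally provided implementation.
    void startHeartbeat(long interval_ms, std::function<void()> impl) {
        assert(interval_ms > 0 && impl);
        interval_ms_ = interval_ms;
        remaining_ms_ = interval_ms;
        impl_ = std::move(impl);
    }

    void stopHeartbeat() { impl_ = nullptr; }

    bool isHeartbeatRunning() const { return static_cast<bool>(impl_); }

    // Successful communication: the countdown to the next heartbeat restarts.
    void poke() {
        ++pokes_;
        if (impl_) {
            remaining_ms_ = interval_ms_;
        }
    }

    // Advance simulated time and run the heartbeat when the countdown expires.
    void advance(long elapsed_ms) {
        if (!impl_) {
            return;
        }
        remaining_ms_ -= elapsed_ms;
        if (remaining_ms_ <= 0) {
            remaining_ms_ = interval_ms_;
            auto impl = impl_;  // copy in case the callback stops the heartbeat
            impl();
        }
    }

    int pokes() const { return pokes_; }

private:
    long interval_ms_ = 0;
    long remaining_ms_ = 0;
    int pokes_ = 0;
    std::function<void()> impl_;
};
```

A caller would pass a callback that sends the heartbeat command and calls poke on a successful response, for example `s.startHeartbeat(1000, [&] { s.poke(); })`. A poke caused by a successful lease update postpones the next heartbeat in the same way.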

void isc::ha::CommunicationState::stopHeartbeat ( )

Stops recurring heartbeat.

Definition at line 204 of file communication_state.cc.

References mutex_.

Referenced by ~CommunicationState().

boost::posix_time::time_duration isc::ha::CommunicationState::updatePokeTime ( )
protected

Update the poke time and compute the duration.

Returns
The time elapsed.

Definition at line 234 of file communication_state.cc.

References mutex_.

Member Data Documentation

size_t isc::ha::CommunicationState::analyzed_messages_count_
protected

Total number of analyzed messages that the partner should respond to.

Definition at line 507 of file communication_state.h.

Referenced by isc::ha::CommunicationState4::analyzeMessageInternal(), isc::ha::CommunicationState6::analyzeMessageInternal(), and getAnalyzedMessagesCount().

boost::posix_time::time_duration isc::ha::CommunicationState::clock_skew_
protected

Clock skew between the active servers.

Definition at line 494 of file communication_state.h.

std::function<void()> isc::ha::CommunicationState::heartbeat_impl_
protected

Pointer to the function providing heartbeat implementation.

Definition at line 483 of file communication_state.h.

long isc::ha::CommunicationState::interval_
protected

Interval specified for the heartbeat.

Definition at line 477 of file communication_state.h.

asiolink::IOServicePtr isc::ha::CommunicationState::io_service_
protected

Pointer to the common IO service instance.

Definition at line 468 of file communication_state.h.

boost::posix_time::ptime isc::ha::CommunicationState::last_clock_skew_warn_
protected

Holds the time when the last warning about excessive clock skew was issued.

Definition at line 498 of file communication_state.h.

boost::posix_time::ptime isc::ha::CommunicationState::my_time_at_skew_
protected

My time when skew was calculated.

Definition at line 501 of file communication_state.h.

std::set<std::string> isc::ha::CommunicationState::partner_scopes_
protected

Last known set of scopes served by the partner server.

Definition at line 491 of file communication_state.h.

Referenced by getPartnerScopes().

int isc::ha::CommunicationState::partner_state_
protected

Last known state of the partner server.

Negative value means that the partner's state is unknown.

Definition at line 488 of file communication_state.h.

Referenced by getPartnerState().

boost::posix_time::ptime isc::ha::CommunicationState::partner_time_at_skew_
protected

Partner reported time when skew was calculated.

Definition at line 504 of file communication_state.h.

boost::posix_time::ptime isc::ha::CommunicationState::poke_time_
protected

Last poke time.

Definition at line 480 of file communication_state.h.

Referenced by modifyPokeTime().

asiolink::IntervalTimerPtr isc::ha::CommunicationState::timer_
protected

Interval timer triggering heartbeat commands.

Definition at line 474 of file communication_state.h.

Referenced by isHeartbeatRunning().


The documentation for this class was generated from the following files: