Kea
1.9.9-git
Holds communication state between the two HA peers.
#include <communication_state.h>
Public Member Functions
CommunicationState (const asiolink::IOServicePtr &io_service, const HAConfigPtr &config)
    Constructor.
virtual ~CommunicationState ()
    Destructor.
virtual void analyzeMessage (const boost::shared_ptr< dhcp::Pkt > &message) = 0
    Checks if the DHCP message appears to be unanswered.
bool clockSkewShouldTerminate () const
    Indicates whether the HA service should enter the "terminated" state as a result of the clock skew exceeding the maximum value.
bool clockSkewShouldWarn ()
    Issues a warning about high clock skew between the active servers if one is warranted.
virtual bool failureDetected () const = 0
    Checks if the partner failure has been detected based on the DHCP traffic analysis.
size_t getAnalyzedMessagesCount () const
    Returns the number of messages analyzed while in the communications interrupted state.
virtual size_t getConnectingClientsCount () const = 0
    Returns the current number of clients which attempted to get a lease from the partner server.
int64_t getDurationInMillisecs () const
    Returns the duration between the poke time and the current time.
std::set< std::string > getPartnerScopes () const
    Returns scopes served by the partner server.
int getPartnerState () const
    Returns the last known state of the partner.
data::ElementPtr getReport () const
    Returns the report about the current communication state.
virtual size_t getUnackedClientsCount () const = 0
    Returns the current number of clients which haven't got a lease from the partner server.
bool isCommunicationInterrupted () const
    Checks if communication with the partner is interrupted.
bool isHeartbeatRunning () const
    Checks if the recurring heartbeat is running.
std::string logFormatClockSkew () const
    Returns the current clock skew value in a logger-friendly format.
void modifyPokeTime (const long secs)
    Modifies the poke time by adding seconds to it.
void poke ()
    Pokes the communication state.
void setPartnerScopes (data::ConstElementPtr new_scopes)
    Sets partner scopes.
void setPartnerState (const std::string &state)
    Sets partner state.
void setPartnerTime (const std::string &time_text)
    Provides the partner's notion of time so the new clock skew can be calculated.
void startHeartbeat (const long interval, const std::function< void()> &heartbeat_impl)
    Starts the recurring heartbeat (public interface).
void stopHeartbeat ()
    Stops the recurring heartbeat.
Protected Member Functions
virtual void clearConnectingClients () = 0
    Removes information about the clients the partner server should respond to while communication with the partner was interrupted.
boost::posix_time::time_duration updatePokeTime ()
    Updates the poke time and computes the duration.
Protected Attributes
size_t analyzed_messages_count_
    Total number of analyzed messages that the partner is expected to respond to.
boost::posix_time::time_duration clock_skew_
    Clock skew between the active servers.
HAConfigPtr config_
    High availability configuration.
std::function< void()> heartbeat_impl_
    Pointer to the function providing the heartbeat implementation.
long interval_
    Interval specified for the heartbeat.
asiolink::IOServicePtr io_service_
    Pointer to the common IO service instance.
boost::posix_time::ptime last_clock_skew_warn_
    Holds the time when the last warning about excessive clock skew was issued.
const boost::scoped_ptr< std::mutex > mutex_
    The mutex used to protect internal state.
boost::posix_time::ptime my_time_at_skew_
    Local time when the skew was calculated.
std::set< std::string > partner_scopes_
    Last known set of scopes served by the partner server.
int partner_state_
    Last known state of the partner server.
boost::posix_time::ptime partner_time_at_skew_
    Partner-reported time when the skew was calculated.
boost::posix_time::ptime poke_time_
    Last poke time.
asiolink::IntervalTimerPtr timer_
    Interval timer triggering heartbeat commands.
Holds communication state between the two HA peers.
The HA service constantly monitors the state of the connection between the two peers. If the connection is lost it is an indicator that the partner server may be down and failover actions should be triggered.
Any command successfully sent over the control channel is an indicator that the connection is healthy. The most common command sent over the control channel is a lease update. If DHCP traffic is heavy, the generated lease updates alone are sufficient to determine whether the connection is healthy, and there is no need to send heartbeat commands. However, if DHCP traffic is low, heartbeat commands must be sent to the partner at the specified rate to keep the information about the state of the connection up to date.
This class uses an interval timer to run heartbeat commands over the control channel. The implementation of the heartbeat is external to this class and is provided via the CommunicationState::startHeartbeat method. This implementation is required to call the poke method when it receives a successful response to the heartbeat command. The poke method must also be called when a lease update succeeds.
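The contract above (the externally supplied heartbeat calls poke on a successful response, and successful lease updates poke as well) can be sketched with a self-contained toy model. `ToyCommState`, `SendCommand`, `makeHeartbeatImpl` and `onLeaseUpdate` are hypothetical names, not Kea's API; the transport is a stand-in callback that simply reports whether the partner answered, and the lease update command string is illustrative only.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical stand-in for CommunicationState; it only records pokes.
class ToyCommState {
public:
    using Clock = std::chrono::steady_clock;

    // Mark the connection as healthy by refreshing the last poke time.
    void poke() {
        poke_time_ = Clock::now();
        ++pokes_;
    }
    int pokeCount() const { return pokes_; }

private:
    Clock::time_point poke_time_ = Clock::now();
    int pokes_ = 0;
};

// Hypothetical transport: returns true when the partner answered the command.
using SendCommand = std::function<bool(const char* command)>;

// Builds a callback with the void() shape that startHeartbeat() accepts.
std::function<void()> makeHeartbeatImpl(ToyCommState& state, SendCommand send) {
    assert(send && "a transport callback is required");
    return [&state, send]() {
        if (send("ha-heartbeat")) {
            state.poke();  // successful response: the connection is healthy
        }
        // No poke on failure, so the time since the last poke keeps growing.
    };
}

// A successful lease update is equally good evidence of a healthy link.
void onLeaseUpdate(ToyCommState& state, const SendCommand& send) {
    if (send("lease4-update")) {  // command name is illustrative only
        state.poke();
    }
}
```

When the partner answers, each heartbeat invocation records one poke; when the transport reports a failure, the callback returns without poking, which is what eventually lets the elapsed time exceed the threshold.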
The poke method sets the "last poke time" to the current time, thus indicating that the connection is healthy. The getDurationInMillisecs method returns how long the server has been unable to communicate with the partner, which is simply the time elapsed since the last successful poke. If this duration exceeds the configured threshold, the server assumes that communication with the partner is interrupted.
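A minimal sketch of this bookkeeping, assuming a monotonic `std::chrono::steady_clock` and a threshold passed in milliseconds that stands in for the configured max-response-delay. `PokeTracker` is a hypothetical class: it reuses the method names documented on this page for readability, but it is not the Kea implementation, which stores `boost::posix_time` values and guards them with a mutex.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch of poke-time tracking; not Kea's code.
class PokeTracker {
public:
    using Clock = std::chrono::steady_clock;

    // max_response_delay_ms stands in for the configured max-response-delay.
    explicit PokeTracker(std::int64_t max_response_delay_ms)
        : max_response_delay_ms_(max_response_delay_ms), poke_time_(Clock::now()) {
        assert(max_response_delay_ms > 0);
    }

    // Successful communication: reset the "last poke time" to now.
    void poke() { poke_time_ = Clock::now(); }

    // Milliseconds elapsed since the last successful poke.
    std::int64_t getDurationInMillisecs() const {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   Clock::now() - poke_time_).count();
    }

    // Interrupted once the elapsed time exceeds the configured threshold.
    bool isCommunicationInterrupted() const {
        return getDurationInMillisecs() > max_response_delay_ms_;
    }

    // Test helper in the spirit of modifyPokeTime(): shift the poke time;
    // a negative value moves it into the past.
    void modifyPokeTime(long secs) { poke_time_ += std::chrono::seconds(secs); }

private:
    std::int64_t max_response_delay_ms_;
    Clock::time_point poke_time_;
};
```

A monotonic clock is used in the sketch so that wall-clock adjustments (which this class separately tracks as clock skew) cannot make the elapsed time jump.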
The derived classes provide DHCPv4- and DHCPv6-specific mechanisms for detecting server failures based on the analysis of the received DHCP messages, i.e. how long the clients have been trying to communicate with the partner and which message types they sent. In particular, an increased number of Rebind messages may indicate issues with the DHCP server.
This class is also used to monitor the clock skew between the active servers. Maintaining a reasonably low clock skew is essential for the HA service to function properly. This class calculates the clock skew by comparing the local time of the server with the time returned by the partner in response to a heartbeat command. Depending on whether this value exceeds certain thresholds, the CommunicationState::clockSkewShouldWarn and CommunicationState::clockSkewShouldTerminate methods indicate whether the HA service should continue to operate normally, start issuing warnings about high clock skew, or enter the "terminated" state and refuse to operate further until the clocks are synchronized. Recovering from the "terminated" state requires administrative intervention and a restart of the HA service.
Definition at line 85 of file communication_state.h.
isc::ha::CommunicationState::CommunicationState (const asiolink::IOServicePtr &io_service, const HAConfigPtr &config)
Constructor.
Parameters
io_service | pointer to the common IO service instance.
config | pointer to the HA configuration.
Definition at line 55 of file communication_state.cc.
virtual isc::ha::CommunicationState::~CommunicationState ()
Destructor.
Stops the scheduled heartbeat.
Definition at line 65 of file communication_state.cc.
References stopHeartbeat().
virtual void isc::ha::CommunicationState::analyzeMessage (const boost::shared_ptr< dhcp::Pkt > &message) = 0
Checks if the DHCP message appears to be unanswered.
This method is used to provide the communication state with a received DHCP message directed to the HA partner, in order to detect whether the partner fails to answer DHCP messages directed to it. The DHCPv4- and DHCPv6-specific derived classes implement this functionality.
This check is orthogonal to the heartbeat mechanism and is usually triggered after several consecutive heartbeats go unanswered.
The general approach to server failure detection is based on the analysis of the "secs" field value (DHCPv4) and the "elapsed time" option value (DHCPv6). They indicate how long the client has been trying to complete the DHCP transaction. If these values exceed a configured threshold, the client is considered to have failed to communicate with the server, and this fact is recorded by this object. If the number of distinct clients failing to communicate with the partner exceeds a configured maximum value, this server considers the partner to be offline. In this case, this server will most likely start serving clients which would normally be served by the partner.
All information gathered by this method is cleared when the poke method is invoked.
Parameters
message | DHCP message to be analyzed. This must be a message which belongs to the partner, i.e. the caller must filter the messages and pass only those belonging to the partner to this method.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
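The counting logic described above can be sketched in a simplified, protocol-agnostic form. This is an assumption-laden model, not the DHCPv4 or DHCPv6 derived class: `ToyFailureDetector` and its `threshold_secs` parameter are hypothetical, the caller passes a client identifier and an already-extracted elapsed time in seconds, and only max-unacked-clients corresponds to a setting named in this documentation. The sketch also folds in the failureDetected rule (0 disables detection) and the reset that poke triggers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical, simplified model of traffic-based failure detection.
// Real derived classes read the DHCPv4 "secs" field or the DHCPv6
// "elapsed time" option; here the caller supplies the elapsed seconds.
class ToyFailureDetector {
public:
    ToyFailureDetector(std::uint32_t threshold_secs, std::size_t max_unacked_clients)
        : threshold_secs_(threshold_secs), max_unacked_clients_(max_unacked_clients) {}

    // Record one message that the partner was supposed to answer.
    void analyzeMessage(const std::string& client_id, std::uint32_t elapsed_secs) {
        assert(!client_id.empty());
        ++analyzed_messages_count_;
        connecting_clients_.insert(client_id);
        if (elapsed_secs > threshold_secs_) {
            // The set keeps clients distinct: retries do not inflate the count.
            unacked_clients_.insert(client_id);
        }
    }

    // max_unacked_clients == 0 disables detection, so always report failure.
    bool failureDetected() const {
        return max_unacked_clients_ == 0 ||
               unacked_clients_.size() > max_unacked_clients_;
    }

    // Communication re-established: forget everything and start over.
    void poke() {
        connecting_clients_.clear();
        unacked_clients_.clear();
        analyzed_messages_count_ = 0;
    }

    std::size_t getConnectingClientsCount() const { return connecting_clients_.size(); }
    std::size_t getUnackedClientsCount() const { return unacked_clients_.size(); }
    std::size_t getAnalyzedMessagesCount() const { return analyzed_messages_count_; }

private:
    std::uint32_t threshold_secs_;
    std::size_t max_unacked_clients_;
    std::set<std::string> connecting_clients_;
    std::set<std::string> unacked_clients_;
    std::size_t analyzed_messages_count_ = 0;
};
```

Keying on distinct clients rather than raw message counts means a single client retrying many times cannot by itself push the server into declaring the partner down.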
virtual void isc::ha::CommunicationState::clearConnectingClients () = 0 [protected]
Removes information about the clients the partner server should respond to while communication with the partner was interrupted.
This information is cleared by CommunicationState::poke. The derived classes must provide DHCPv4- and DHCPv6-specific implementations of this method. The poke method is called to indicate that the connection has been successfully (re)established; therefore the client counters are reset and the failure detection procedure starts over.
See CommunicationState::analyzeMessage for details.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
bool isc::ha::CommunicationState::clockSkewShouldTerminate () const
Indicates whether the HA service should enter the "terminated" state as a result of the clock skew exceeding the maximum value.
If the clocks on the active servers are not synchronized (perhaps despite a warning message caused by clockSkewShouldWarn) and the clocks drift further, the clock skew may exceed another threshold which should cause the HA service to enter the "terminated" state. In this state the servers still respond to DHCP clients normally, but they send neither lease updates nor heartbeats. The administrator must then correct the problem (synchronize the clocks) and restart the service. This method indicates whether the service should terminate or not.
Currently, the terminal threshold for the clock skew is hardcoded to 60 seconds. In the future it may become configurable.
Definition at line 356 of file communication_state.cc.
References mutex_.
bool isc::ha::CommunicationState::clockSkewShouldWarn ()
Issues a warning about high clock skew between the active servers if one is warranted.
The HA service monitors the clock skew between the active servers. The clock skew is calculated from the local time and the time returned by the partner in response to a heartbeat. When the clock skew exceeds a certain threshold, the HA service starts issuing a warning message. This method returns true if the HA service should issue this message.
Currently, the warning threshold for the clock skew is hardcoded to 30 seconds. In the future it may become configurable.
This method is called for each heartbeat. Issuing a warning for every heartbeat could flood the logs, so this method provides a gating mechanism which prevents the HA service from logging the warning more often than once every 60 seconds. If the last warning was issued less than 60 seconds ago, this method returns false even if the clock skew exceeds the 30-second threshold. Correcting the clock skew resets the gating counter.
Definition at line 317 of file communication_state.cc.
References mutex_.
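The two clock-skew checks described above (terminate beyond 60 seconds; warn beyond 30 seconds, at most once per 60 seconds, with the gate reset once the skew is corrected) can be sketched together. `ToyClockSkewMonitor` is hypothetical and is not Kea's implementation; the skew is modelled as a signed number of seconds (partner time minus local time), and the current time is passed in explicitly so the gating is deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical sketch of the warn/terminate clock-skew checks (C++17).
class ToyClockSkewMonitor {
public:
    using Clock = std::chrono::steady_clock;
    static constexpr std::chrono::seconds kWarnSkew{30};
    static constexpr std::chrono::seconds kTerminateSkew{60};
    static constexpr std::chrono::seconds kWarnGate{60};

    // Skew is partner time minus local time; it may be negative.
    void setClockSkew(std::chrono::seconds skew) { clock_skew_ = skew; }

    bool clockSkewShouldTerminate() const { return exceeds(kTerminateSkew); }

    // Returns true at most once per kWarnGate while the skew stays too high.
    bool clockSkewShouldWarn(Clock::time_point now) {
        if (!exceeds(kWarnSkew)) {
            last_warn_.reset();  // skew corrected: reset the gate
            return false;
        }
        if (!last_warn_ || now - *last_warn_ > kWarnGate) {
            last_warn_ = now;
            return true;
        }
        return false;  // warned recently, stay quiet
    }

private:
    // Compare the magnitude of the skew against a limit.
    bool exceeds(std::chrono::seconds limit) const {
        return clock_skew_ > limit || clock_skew_ < -limit;
    }

    std::chrono::seconds clock_skew_{0};
    std::optional<Clock::time_point> last_warn_;
};
```

With a skew of -45 seconds, the first call warns, a call 30 seconds later stays quiet, and a call 61 seconds after the first warns again; the service would only be terminated once the magnitude exceeds 60 seconds.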
virtual bool isc::ha::CommunicationState::failureDetected () const = 0
Checks if the partner failure has been detected based on the DHCP traffic analysis.
In the special case when max-unacked-clients is set to 0, this method always returns true. Note that setting max-unacked-clients to 0 means that failure detection is not really performed. Returning true in that case simplifies the code of the HAService, which doesn't need to check whether failure detection is enabled. It simply calls this method in the 'communications interrupted' situation to check whether the server should transition to the 'partner-down' state.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
size_t isc::ha::CommunicationState::getAnalyzedMessagesCount () const
Returns the number of messages analyzed while in the communications interrupted state.
Definition at line 312 of file communication_state.cc.
References analyzed_messages_count_.
Referenced by getReport().
virtual size_t isc::ha::CommunicationState::getConnectingClientsCount () const = 0
Returns the current number of clients which attempted to get a lease from the partner server.
The returned number is reset to 0 when the server successfully establishes communication with the partner. The number is incremented only in the communications interrupted case.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
Referenced by getReport().
int64_t isc::ha::CommunicationState::getDurationInMillisecs () const
Returns the duration between the poke time and the current time.
Definition at line 290 of file communication_state.cc.
References mutex_.
Referenced by getReport(), and isCommunicationInterrupted().
std::set< std::string > isc::ha::CommunicationState::getPartnerScopes () const
Returns scopes served by the partner server.
Definition at line 110 of file communication_state.cc.
References mutex_, and partner_scopes_.
Referenced by getReport().
int isc::ha::CommunicationState::getPartnerState () const
Returns the last known state of the partner.
Definition at line 80 of file communication_state.cc.
References mutex_, and partner_state_.
Referenced by getReport().
ElementPtr isc::ha::CommunicationState::getReport () const
Returns the report about current communication state.
This function returns a JSON map describing the state of communication with a partner. This report is included in the response to the status-get command.
Definition at line 439 of file communication_state.cc.
References config_, getAnalyzedMessagesCount(), getConnectingClientsCount(), getDurationInMillisecs(), getPartnerScopes(), getPartnerState(), getUnackedClientsCount(), isCommunicationInterrupted(), and isc::ha::stateToString().
virtual size_t isc::ha::CommunicationState::getUnackedClientsCount () const = 0
Returns the current number of clients which haven't got the lease from the partner server.
The returned number is reset to 0 when the server successfully establishes communication with the partner. The number is incremented only in the communications interrupted case.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
Referenced by getReport().
bool isc::ha::CommunicationState::isCommunicationInterrupted () const
Checks if communication with the partner is interrupted.
This method checks if the communication with the partner appears to be interrupted. This is the case when the time since last successful communication is longer than the configured max-response-delay value.
Definition at line 307 of file communication_state.cc.
References config_, and getDurationInMillisecs().
Referenced by getReport().
bool isc::ha::CommunicationState::isHeartbeatRunning () const
Checks if the recurring heartbeat is running.
Definition at line 224 of file communication_state.cc.
std::string isc::ha::CommunicationState::logFormatClockSkew () const
Returns the current clock skew value in a logger-friendly format.
Definition at line 401 of file communication_state.cc.
References mutex_.
void isc::ha::CommunicationState::modifyPokeTime (const long secs)
Modifies the poke time by adding seconds to it.
Used in unit tests only.
Parameters
secs | number of seconds to be added to the poke time. A negative value moves the poke time into the past relative to its current value.
Definition at line 70 of file communication_state.cc.
References mutex_, and poke_time_.
void isc::ha::CommunicationState::poke ()
Pokes the communication state.
Sets the last poke time to current time. If the heartbeat timer has been scheduled, it is reset (starts over measuring the time to the next heartbeat).
Definition at line 253 of file communication_state.cc.
References mutex_.
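The interaction between the heartbeat timer and poke can be sketched with a single-threaded, deterministic model. `ToyHeartbeat` and `advanceTo` are hypothetical: a real service would rely on an asynchronous interval timer driven by the IO service, whereas here time is advanced explicitly and poke takes the current time as an argument so the behaviour can be checked without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical model of the heartbeat timer; time is advanced by the caller.
class ToyHeartbeat {
public:
    using Clock = std::chrono::steady_clock;

    void startHeartbeat(std::chrono::milliseconds interval,
                        std::function<void()> impl, Clock::time_point now) {
        assert(interval.count() > 0 && impl);
        interval_ = interval;
        impl_ = std::move(impl);
        deadline_ = now + interval_;
    }

    void stopHeartbeat() { deadline_.reset(); }
    bool isHeartbeatRunning() const { return deadline_.has_value(); }

    // Successful communication: restart the countdown to the next heartbeat.
    void poke(Clock::time_point now) {
        if (deadline_) {
            deadline_ = now + interval_;
        }
    }

    // Fire the heartbeat if its deadline has passed, then re-arm the timer.
    void advanceTo(Clock::time_point now) {
        if (deadline_ && now >= *deadline_) {
            deadline_ = now + interval_;
            impl_();
        }
    }

private:
    std::chrono::milliseconds interval_{0};
    std::function<void()> impl_;
    std::optional<Clock::time_point> deadline_;
};
```

For example, with a 1000 ms interval started at t0, a poke at t0+800 ms pushes the next heartbeat to t0+1800 ms, so under steady lease-update traffic no heartbeat is sent at all.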
void isc::ha::CommunicationState::setPartnerScopes (data::ConstElementPtr new_scopes)
Sets partner scopes.
Parameters
new_scopes | partner scopes enclosed in a JSON list.
Definition at line 120 of file communication_state.cc.
References mutex_.
void isc::ha::CommunicationState::setPartnerState (const std::string &state)
Sets partner state.
Parameters
state | new state of the partner in textual form. Supported values are those returned in response to an ha-heartbeat command.
Exceptions
BadValue | thrown if an unsupported state value is provided.
Definition at line 90 of file communication_state.cc.
References mutex_.
void isc::ha::CommunicationState::setPartnerTime (const std::string &time_text)
Provides the partner's notion of time so that the new clock skew can be calculated.
Parameters
time_text | partner's time received in response to a heartbeat, in the RFC 1123 format.
The method stores the current local time, the partner's time, and the difference (skew) between them.
Exceptions
isc::http::HttpTimeConversionError | thrown if the time format is invalid.
Definition at line 384 of file communication_state.cc.
References mutex_.
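The skew computation described above can be sketched without Boost: parse the RFC 1123 date (for example `Sun, 06 Nov 1994 08:49:37 GMT`) into seconds since the Unix epoch and subtract the local time. `parseRfc1123` and `computeClockSkew` are hypothetical helpers, and the sketch throws `std::invalid_argument` as a stand-in for `isc::http::HttpTimeConversionError`. The date arithmetic uses Howard Hinnant's public-domain days_from_civil algorithm, which avoids the non-standard `timegm`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date (days_from_civil).
std::int64_t daysFromCivil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Parses e.g. "Sun, 06 Nov 1994 08:49:37 GMT" into seconds since the epoch.
// Throws std::invalid_argument (stand-in for HttpTimeConversionError).
std::int64_t parseRfc1123(const std::string& text) {
    static const char* const kMonths[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wday[4] = {}, mon[4] = {}, zone[4] = {};
    int day = 0, year = 0, hh = 0, mm = 0, ss = 0;
    const int fields = std::sscanf(text.c_str(), "%3s, %d %3s %d %d:%d:%d %3s",
                                   wday, &day, mon, &year, &hh, &mm, &ss, zone);
    unsigned month = 0;
    for (unsigned i = 0; i < 12; ++i) {
        if (std::strcmp(mon, kMonths[i]) == 0) {
            month = i + 1;
            break;
        }
    }
    if (fields != 8 || std::strcmp(zone, "GMT") != 0 || month == 0 ||
        day < 1 || day > 31 || hh < 0 || hh > 23 || mm < 0 || mm > 59 ||
        ss < 0 || ss > 60) {
        throw std::invalid_argument("not an RFC 1123 date: " + text);
    }
    return daysFromCivil(year, month, static_cast<unsigned>(day)) * 86400 +
           hh * 3600 + mm * 60 + ss;
}

// Clock skew in seconds: partner's time minus local time (epoch seconds).
std::int64_t computeClockSkew(std::int64_t my_time, const std::string& partner_time_text) {
    return parseRfc1123(partner_time_text) - my_time;
}
```

A positive result means the partner's clock is ahead of the local clock; the sign matters only for reporting, since the warn and terminate checks compare the magnitude.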
void isc::ha::CommunicationState::startHeartbeat (const long interval, const std::function< void()> &heartbeat_impl)
Starts recurring heartbeat (public interface).
Parameters
interval | heartbeat interval in milliseconds.
heartbeat_impl | pointer to the heartbeat implementation function.
Definition at line 152 of file communication_state.cc.
References mutex_.
void isc::ha::CommunicationState::stopHeartbeat ()
Stops recurring heartbeat.
Definition at line 204 of file communication_state.cc.
References mutex_.
Referenced by ~CommunicationState().
boost::posix_time::time_duration isc::ha::CommunicationState::updatePokeTime () [protected]
Update the poke time and compute the duration.
Definition at line 234 of file communication_state.cc.
References mutex_.
size_t isc::ha::CommunicationState::analyzed_messages_count_ [protected]
Total number of analyzed messages that the partner is expected to respond to.
Definition at line 507 of file communication_state.h.
Referenced by isc::ha::CommunicationState4::analyzeMessageInternal(), isc::ha::CommunicationState6::analyzeMessageInternal(), and getAnalyzedMessagesCount().
boost::posix_time::time_duration isc::ha::CommunicationState::clock_skew_ [protected]
Clock skew between the active servers.
Definition at line 494 of file communication_state.h.
HAConfigPtr isc::ha::CommunicationState::config_ [protected]
High availability configuration.
Definition at line 471 of file communication_state.h.
Referenced by isc::ha::CommunicationState4::analyzeMessageInternal(), isc::ha::CommunicationState6::analyzeMessageInternal(), isc::ha::CommunicationState4::failureDetectedInternal(), isc::ha::CommunicationState6::failureDetectedInternal(), getReport(), and isCommunicationInterrupted().
std::function< void()> isc::ha::CommunicationState::heartbeat_impl_ [protected]
Pointer to the function providing the heartbeat implementation.
Definition at line 483 of file communication_state.h.
long isc::ha::CommunicationState::interval_ [protected]
Interval specified for the heartbeat.
Definition at line 477 of file communication_state.h.
asiolink::IOServicePtr isc::ha::CommunicationState::io_service_ [protected]
Pointer to the common IO service instance.
Definition at line 468 of file communication_state.h.
boost::posix_time::ptime isc::ha::CommunicationState::last_clock_skew_warn_ [protected]
Holds the time when the last warning about excessive clock skew was issued.
Definition at line 498 of file communication_state.h.
const boost::scoped_ptr< std::mutex > isc::ha::CommunicationState::mutex_ [protected]
The mutex used to protect internal state.
Definition at line 510 of file communication_state.h.
Referenced by isc::ha::CommunicationState4::analyzeMessage(), isc::ha::CommunicationState6::analyzeMessage(), clockSkewShouldTerminate(), clockSkewShouldWarn(), isc::ha::CommunicationState4::failureDetected(), isc::ha::CommunicationState6::failureDetected(), isc::ha::CommunicationState4::getConnectingClientsCount(), isc::ha::CommunicationState6::getConnectingClientsCount(), getDurationInMillisecs(), getPartnerScopes(), getPartnerState(), isc::ha::CommunicationState4::getUnackedClientsCount(), isc::ha::CommunicationState6::getUnackedClientsCount(), isHeartbeatRunning(), logFormatClockSkew(), modifyPokeTime(), poke(), setPartnerScopes(), setPartnerState(), setPartnerTime(), startHeartbeat(), stopHeartbeat(), and updatePokeTime().
boost::posix_time::ptime isc::ha::CommunicationState::my_time_at_skew_ [protected]
Local time when the skew was calculated.
Definition at line 501 of file communication_state.h.
std::set< std::string > isc::ha::CommunicationState::partner_scopes_ [protected]
Last known set of scopes served by the partner server.
Definition at line 491 of file communication_state.h.
Referenced by getPartnerScopes().
int isc::ha::CommunicationState::partner_state_ [protected]
Last known state of the partner server.
Negative value means that the partner's state is unknown.
Definition at line 488 of file communication_state.h.
Referenced by getPartnerState().
boost::posix_time::ptime isc::ha::CommunicationState::partner_time_at_skew_ [protected]
Partner-reported time when the skew was calculated.
Definition at line 504 of file communication_state.h.
boost::posix_time::ptime isc::ha::CommunicationState::poke_time_ [protected]
Last poke time.
Definition at line 480 of file communication_state.h.
Referenced by modifyPokeTime().
asiolink::IntervalTimerPtr isc::ha::CommunicationState::timer_ [protected]
Interval timer triggering heartbeat commands.
Definition at line 474 of file communication_state.h.
Referenced by isHeartbeatRunning().