Tutorial: The SA Forum's Hardware Platform Interface
A standard that is gaining broad acceptance and adoption in embedded systems across a variety of markets is the Service Availability Forum’s Hardware Platform Interface (HPI). It is one of three sets of interfaces the Forum has proposed; the other two are the Application Interfaces and the System Management Interfaces.
Embedded system designers may have a passing knowledge of HPI, and may even have heard of customers requiring it in the products they purchase, but they likely still have many questions. What exactly is HPI, and why is it important in the industry and the marketplace? What hardware management capabilities does HPI provide? And how does HPI fit into a typical system architecture and design?
As shown in Figure 1, the Hardware Platform Interface defines a standard interface by which applications can discover, manage, and monitor the hardware resources in the system. The Application Interfaces, as defined by the Application Interface Specification (AIS), define interfaces by which applications can use the availability management and application services provided by the AIS implementer, such as a management middleware component.
The System Management Interfaces, which will be defined by the forthcoming System Management Specification (SMS), cover standard SNMP and CIM based access to network management data related to the HPI and AIS capabilities in the system. In addition, SMS will define a Notification Service (NTF) API for sending and receiving system-level events using ITU X.73x style notifications.
The benefits of using HPI within a system design are numerous and can be grouped into two categories: functional and business.
From a functional perspective, HPI benefits system designers in three primary areas. First, HPI allows the hardware entities in the system to be discovered, which makes it easier to represent those entities in the System Model. This can lead to a cohesive availability management solution that takes into account the relationships between all system resources, including both hardware and software resources. Second, HPI enables more comprehensive system management policies by providing the facilities for fault, alarm, and hot swap management policies and actions. Third, because it is a standardized interface, HPI can be used across multiple platform form factors, such as AdvancedTCA, CompactPCI, and rack-mounted servers.
By providing a standardized interface, HPI reduces the effort required to move platform management components from one HPI-enabled platform to another. This allows the system designers to focus on the information available through HPI rather than having to focus on redesigning platform management components to get the required data through another hardware management interface.
HPI Capabilities
The capabilities of HPI are focused on providing management applications with access to the information required to discover, manage, and monitor the hardware entities in a system from the point at which the system is powered on to when it is powered off. To meet these requirements, HPI is defined to provide nine critical capabilities: resource and entity discovery; reset state management; power state management; managed hot swap; alarm management; event notification; configuration; system and resource event logging; and management instruments associated with HPI entities.
Resource and entity discovery
By allowing users to discover the hardware elements that are available in the system, HPI simplifies the task of creating a model of the system under management for the purposes of defining the appropriate availability and fault management policies for the system.
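As a rough illustration of how discovery results can feed a system model, the Python sketch below arranges discovered entities into a containment tree. It does not call the HPI API; the entity paths, entity type names, and capability labels are invented for the example.

```python
def build_system_model(discovered):
    """Arrange discovered entities into a containment tree.

    `discovered` is an iterable of (entity_path, capabilities) pairs, where
    entity_path is a tuple of (entity_type, instance) steps from the chassis
    down. All names are illustrative, not HPI identifiers.
    """
    root = {"capabilities": set(), "children": {}}
    for entity_path, capabilities in discovered:
        node = root
        for step in entity_path:
            # Create intermediate entities (e.g. slots) on first sight.
            node = node["children"].setdefault(
                step, {"capabilities": set(), "children": {}}
            )
        node["capabilities"].update(capabilities)
    return root


def count_entities(node):
    """Count every entity below `node` in the tree."""
    return sum(1 + count_entities(child) for child in node["children"].values())


discovered = [
    ((("chassis", 1),), {"inventory"}),
    ((("chassis", 1), ("slot", 3), ("blade", 1)), {"power", "reset", "hot_swap"}),
    ((("chassis", 1), ("slot", 4), ("blade", 1)), {"power", "reset"}),
    ((("chassis", 1), ("fan_tray", 1)), {"sensor"}),
]
model = build_system_model(discovered)
```

A management application could then walk this tree to decide which availability and fault management policies apply to each entity, based on the capabilities it reports.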
Managing reset and power states
Similar to reset state management, many HPI-enabled resources support the capability for their power state to be monitored and controlled through HPI. For such resources, an HPI user can get the power state of the resource and can perform a power state action on the resource. Valid power state actions include powering the resource on or off, as well as power cycling the resource. This capability is critical for fault management policies that attempt to repair a fault condition by either restarting the resource by cycling its power, or by using the power control capability as part of a procedure to replace a faulty component.
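The power state actions described above can be sketched as a small state model. This is a simulation in Python, not the HPI C API; the class and enum names are assumptions made for the example.

```python
from enum import Enum


class PowerState(Enum):
    OFF = "off"
    ON = "on"


class PowerAction(Enum):
    POWER_OFF = "power_off"
    POWER_ON = "power_on"
    POWER_CYCLE = "power_cycle"


class Resource:
    """Hypothetical stand-in for a resource whose power can be controlled."""

    def __init__(self, name, state=PowerState.ON):
        self.name = name
        self.state = state
        self.history = [state]  # every state the resource has passed through

    def _set(self, state):
        self.state = state
        self.history.append(state)

    def apply(self, action):
        """Apply a power action and return the resulting state."""
        if action is PowerAction.POWER_OFF:
            self._set(PowerState.OFF)
        elif action is PowerAction.POWER_ON:
            self._set(PowerState.ON)
        elif action is PowerAction.POWER_CYCLE:
            # A cycle always ends powered on, whatever the starting state.
            self._set(PowerState.OFF)
            self._set(PowerState.ON)
        return self.state


def repair_by_power_cycle(resource):
    """Fault policy sketch: try to clear a fault by cycling power."""
    return resource.apply(PowerAction.POWER_CYCLE) is PowerState.ON


blade = Resource("blade-3")
repaired = repair_by_power_cycle(blade)
```

The same `apply` call with `POWER_OFF` would serve the replacement procedure mentioned above, where a faulty component is powered down before it is removed.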
Hot swap management
The second hot swap model in HPI is called the managed hot swap model. FRUs that support this model transition through additional hot swap model states, allowing HPI users greater control over the insertion and extraction of such resources. When a hot swap insertion or extraction sequence begins for a FRU, the HPI service waits a configurable amount of time to see if any user wants to take control of the hot swap sequence. If no user requests control within that timeout period, the HPI service for the platform will perform the default hot swap sequence actions for the resource as it transitions through the managed hot swap model. Examples of actions that are taken during a hot swap sequence include changing the power and reset states of the resource, as well as updating the resource’s hot swap indicator (typically an LED) to signal the operator when the hot swap sequence has been completed.
But more significantly, HPI also allows a user to take control of the hot swap sequence of a managed hot swap FRU, thus giving the user greater control over the hot swap sequence to increase the manageability of the system. For example, if an operator attempts to remove a FRU from a system that has software resources running on it with no associated standby resources, the HPI user can reject the extraction sequence. Or if a system administrator attempts to insert a FRU of the wrong type or version into the system, the HPI user can detect the improper configuration and reject the insertion sequence.
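The arbitration between the HPI service and a user during a managed hot swap sequence can be sketched as follows. This Python model is a simplification under stated assumptions: the function names, the FRU dictionary layout, and the slot-to-type table are all hypothetical, and time is passed in as plain numbers rather than measured.

```python
def run_hot_swap(fru, sequence, timeout_s, claim_after_s=None, grant_policy=None):
    """Return (controller, outcome) for one hot swap sequence.

    claim_after_s: seconds after which a user asks for control, or None
                   if no user ever asks.
    grant_policy:  callable(fru, sequence) -> bool, consulted only when the
                   user took control before the timeout expired.
    """
    user_in_control = claim_after_s is not None and claim_after_s < timeout_s
    if not user_in_control:
        # Nobody claimed the sequence in time: default actions are applied.
        return ("hpi_service", "completed")
    allowed = grant_policy(fru, sequence) if grant_policy else True
    return ("user", "completed" if allowed else "rejected")


# Hypothetical table of which FRU type belongs in which slot.
EXPECTED_TYPE_BY_SLOT = {3: "cpu_blade", 4: "switch_blade"}


def reject_wrong_type(fru, sequence):
    """Grant policy sketch: refuse insertion of a FRU of the wrong type."""
    if sequence == "insertion":
        return EXPECTED_TYPE_BY_SLOT.get(fru["slot"]) == fru["type"]
    return True


wrong_fru = {"slot": 3, "type": "switch_blade"}
unclaimed = run_hot_swap(wrong_fru, "insertion", timeout_s=5.0)
claimed = run_hot_swap(wrong_fru, "insertion", timeout_s=5.0,
                       claim_after_s=1.0, grant_policy=reject_wrong_type)
```

In this sketch the wrong-type FRU is accepted when no user claims the sequence, and rejected when a user takes control in time and applies the policy.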
Managing alarms
An HPI service may optionally update the platform and individual entity alarm annunciation devices, devices such as LEDs, LCD displays, and audible indicators, to reflect the severity of the alarms currently in the active alarm list. When this functionality is provided by the HPI service, this eliminates the need for the user to determine how to annunciate alarm conditions unless they have specific requirements that are not met by the default annunciation policies.
The benefit of the HPI alarm management capabilities is that an HPI user can easily determine the current alarm conditions, e.g., at system startup, and add their own alarm conditions to the list. Because HPI already provides this service, the user does not need to develop a custom alarm management service.
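To make the idea of an active alarm list concrete, here is a minimal Python sketch of a list that accepts user-defined alarms and reports the highest active severity, which is the value an annunciation policy would typically act on. The class, method names, and the three severity labels are assumptions for illustration, not HPI definitions.

```python
SEVERITY_ORDER = ["minor", "major", "critical"]  # lowest to highest


class AlarmList:
    """Hypothetical in-memory active alarm list."""

    def __init__(self):
        self._alarms = {}
        self._next_id = 1

    def add(self, source, severity, description):
        """Add an alarm condition and return its identifier."""
        if severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {severity}")
        alarm_id = self._next_id
        self._next_id += 1
        self._alarms[alarm_id] = {
            "id": alarm_id,
            "source": source,
            "severity": severity,
            "description": description,
            "acknowledged": False,
        }
        return alarm_id

    def acknowledge(self, alarm_id):
        self._alarms[alarm_id]["acknowledged"] = True

    def remove(self, alarm_id):
        del self._alarms[alarm_id]

    def active(self):
        """Return the current alarms in the order they were added."""
        return list(self._alarms.values())

    def highest_severity(self):
        """Return the most severe active level, or None if the list is empty."""
        return max(
            (a["severity"] for a in self._alarms.values()),
            key=SEVERITY_ORDER.index,
            default=None,
        )


alarms = AlarmList()
fan_alarm = alarms.add("fan-tray-1", "minor", "fan speed below nominal")
app_alarm = alarms.add("billing-app", "critical", "user-defined: no standby")
```

A startup routine could call `active()` once to learn the current conditions, then drive the platform indicators from `highest_severity()` as alarms are added and removed.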
Performing event notifications
The listed HPI event types are very useful in triggering system and availability management policies.
Configuration and event logging
Configuration settings have factory default values that can be modified through HPI’s configuration parameter API and then saved to non-volatile memory to override the factory defaults. Additionally, configuration settings can be reset to their factory default values after new configuration values have been stored.
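The relationship between factory defaults, saved overrides, and a restore operation can be sketched in a few lines of Python. The class below is a toy model, with a dictionary standing in for non-volatile memory; none of its names come from the HPI specification.

```python
class ConfigStore:
    """Hypothetical model of settings with factory defaults and saved overrides."""

    def __init__(self, factory_defaults):
        self._factory = dict(factory_defaults)
        self._saved = {}  # stand-in for non-volatile memory
        self._active = dict(factory_defaults)

    def get(self, key):
        return self._active[key]

    def set(self, key, value):
        """Change a setting in the running configuration only."""
        if key not in self._factory:
            raise KeyError(key)
        self._active[key] = value

    def save(self):
        """Persist only the values that differ from the factory defaults."""
        self._saved = {
            k: v for k, v in self._active.items() if v != self._factory[k]
        }

    def reload(self):
        """Simulate a restart: factory defaults overlaid with saved overrides."""
        self._active = {**self._factory, **self._saved}

    def restore_factory_defaults(self):
        """Discard saved overrides and return to the factory values."""
        self._saved = {}
        self._active = dict(self._factory)


config = ConfigStore({"hot_swap_timeout_s": 5, "log_enabled": True})
config.set("hot_swap_timeout_s", 30)
config.save()
config.reload()
```

After `reload()`, the saved value of 30 is still in effect; calling `restore_factory_defaults()` would return the setting to 5.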
HPI-enabled platforms that retain historical HPI events at the system or resource level can expose those event logs through HPI as system or resource-level event logs. These event logs can be used by an operator or developer to analyze a hardware entity failure and see the sequence of HPI events that occurred before and after the failure.
How HPI manages entity instrumentation
Sensor instruments provide information on an HPI entity through the measurement of a critical hardware entity attribute, such as voltage sensors that indicate the voltage level on critical power lines or temperature sensors that indicate the temperature level on different components in the system.
As the state of a sensor changes, HPI also sends event notifications to all interested subscribers identifying the change in sensor state, such as a temperature sensor exceeding a critical temperature threshold. Sensors serve as an essential mechanism for monitoring the health of entities in the system, and sensor-related events can be used to drive fault and alarm management policies.
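The threshold behavior described above can be sketched as a sensor that notifies subscribers only when a reading moves it into a different state. This is a simplified Python model with upper thresholds only; the state names, threshold keys, and callback signature are assumptions for the example.

```python
LEVELS = ("minor", "major", "critical")  # ascending upper thresholds


def classify(reading, thresholds):
    """Return the most severe threshold the reading has reached, else 'ok'."""
    state = "ok"
    for level in LEVELS:
        if reading >= thresholds[level]:
            state = level
    return state


class Sensor:
    """Hypothetical sensor that reports state changes to subscribers."""

    def __init__(self, name, thresholds):
        self.name = name
        self.thresholds = thresholds
        self.state = "ok"
        self._subscribers = []

    def subscribe(self, callback):
        """Register callback(name, old_state, new_state, reading)."""
        self._subscribers.append(callback)

    def update(self, reading):
        new_state = classify(reading, self.thresholds)
        if new_state != self.state:
            old_state, self.state = self.state, new_state
            for callback in self._subscribers:
                callback(self.name, old_state, new_state, reading)


events = []
cpu_temp = Sensor("cpu-temp", {"minor": 60, "major": 70, "critical": 80})
cpu_temp.subscribe(lambda *event: events.append(event))
for reading in (55, 65, 66, 85, 50):
    cpu_temp.update(reading)
```

Only three of the five readings produce an event here, because a notification is sent on a change of state rather than on every sample.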
HPI also makes provisions for control instruments, which provide read and potentially write access to control devices associated with a hardware entity, such as LEDs, dry contact closures, LCD displays, and audible alarm indicators. Controls allow an HPI user to customize the manner in which information such as alarms is communicated to the system administrator.
HPI also makes allowances for inventory data instruments, which provide inventory management information about the hardware entity. This usually includes information such as the manufacturer ID, product name, product version, serial number, and part number for the chassis, product, or an individual entity.
In some HPI-enabled systems, the HPI user can also update or add to the inventory data associated with an entity. Inventory data accessible through HPI improves the manageability of the system by allowing management policies to incorporate inventory data checks to ensure the proper entities and versions of those entities are being used in the system.
Watchdog timers and annunciators
Pre-timer interrupt actions are applied effectively as a warning that the watchdog timer is nearing expiration to allow any additional management actions to be applied. Once the watchdog timer expires the associated action is applied to the entity. Watchdog timers are another means by which management applications can monitor and react to changes in the health of hardware entities in the system.
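The sequence of pre-timer warning followed by expiration can be sketched as a countdown driven by explicit ticks. The Python class below is a simulation, not an HPI watchdog binding; the parameter names and the callback-based actions are assumptions made for the example.

```python
class Watchdog:
    """Hypothetical countdown timer with a pre-timer warning."""

    def __init__(self, timeout, pre_timeout, on_pre_timeout, on_expire):
        if not 0 <= pre_timeout < timeout:
            raise ValueError("pre_timeout must satisfy 0 <= pre_timeout < timeout")
        self.timeout = timeout
        self.pre_timeout = pre_timeout
        self.on_pre_timeout = on_pre_timeout
        self.on_expire = on_expire
        self.expired = False  # expiry is final in this sketch
        self.reset()

    def reset(self):
        """Restart the countdown; a healthy application calls this regularly."""
        self.remaining = self.timeout
        self._warned = False

    def tick(self, elapsed=1):
        """Advance time and fire the warning or expiry action if due."""
        if self.expired:
            return
        self.remaining = max(0, self.remaining - elapsed)
        if self.remaining == 0:
            self.expired = True
            self.on_expire()
        elif self.remaining <= self.pre_timeout and not self._warned:
            self._warned = True
            self.on_pre_timeout()


log = []
watchdog = Watchdog(
    timeout=5,
    pre_timeout=2,
    on_pre_timeout=lambda: log.append("pre-timer warning"),
    on_expire=lambda: log.append("reset entity"),
)
for _ in range(5):  # the monitored application never calls reset()
    watchdog.tick()
```

If the monitored application called `reset()` before the countdown reached the pre-timer window, neither action would fire.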
Under HPI, annunciators are abstract control elements, each of which can have a set of alarm conditions associated with it. Based on the severity of the associated alarm conditions, they ensure that the alarms are properly annunciated through the platform’s and the entity’s alarm indicators. A user therefore does not need to know about the alarm annunciation devices for a platform or entity; the user only needs to add and remove alarm conditions on annunciator management instruments, and the HPI service applies the appropriate alarm annunciation for the system.
In general, the management instruments associated with a resource are related to the primary hardware entity with which the resource is associated. But in some cases, resources will have management instruments that are accessible through the resource that are actually associated with a different HPI entity. For example, a simple hardware entity that cannot be directly accessed by the platform management service, such as a mezzanine card attached to a single-board computer, may have its management instruments made accessible through the HPI resource that contains the simple entity.
Using HPI in an embedded system
As this system design example will illustrate, HPI enables sophisticated system and availability management policies to be easily defined using a standard and portable hardware management interface. In this application, the platform management component actually sits below the level at which HPI would be relevant. It is shown here to illustrate that the platform management logic provided with the platform is typically responsible for managing the critical power and thermal subsystems.
The HPI service component provides an HPI client library that supports the HPI API functions and which can be linked into HPI user applications. The architecture of the HPI service may vary from platform to platform, but since the HPI user applications rely only on the HPI API functions, the HPI service architecture is not relevant.
In this design, the management middleware component plays a number of critical roles. Most importantly it provides a System Model to represent the state of both hardware and software resources along with the dependency and redundancy relationships between the various resources. The System Model is used to drive the availability management policies for the system, such as failing over to standby resources if a resource actively providing a service fails.
This component also represents the hardware entities discovered through HPI in the System Model, including configuration and dependency information for the hardware entities, and updates the state of the hardware entities in the System Model as changes are identified through HPI. It also provides other services useful to embedded system applications, such as cluster management and distributed messaging services.
The Fault Management component is responsible for detecting and managing fault conditions as they occur in the system, whether they relate to hardware or software resources. When it detects or is notified that a fault has occurred, it applies the fault management policies that have been defined for the system. The policies performed by this component involve fault detection, isolation, recovery, repair, and notification.
Initially, its main function is to analyze the HPI and other event system notifications to identify fault conditions and isolate faulty components from the remainder of the system, e.g., by powering off or indefinitely asserting the reset line of failed hardware components. It also drives fault recovery policies by updating the state of the failed resources in the System Model to allow the Management Middleware to implement the defined availability management policies for the resource. It then attempts to repair the failed component (e.g., power cycle or reset a failed hardware entity) and notify the appropriate system entities of the existence of failed components.
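The isolate, recover, repair, and notify steps can be sketched as a single handler. This Python function is an illustration under assumptions: the dictionaries stand in for the System Model and for power control, and the function and parameter names are hypothetical rather than part of any HPI or middleware API.

```python
def handle_fault(resource, system_model, power, repair, notify):
    """Run isolate, recover, repair and notify for one failed resource.

    system_model: dict of resource -> state, read by the management middleware
    power:        dict of resource -> "on"/"off", stand-in for power control
    repair:       callable(resource) -> bool, True if the repair worked
    notify:       callable(message), e.g. an operator notification hook
    """
    # Isolate: keep the faulty component away from the rest of the system.
    power[resource] = "off"
    # Recover: publish the failure so availability policies can fail over.
    system_model[resource] = "failed"
    # Repair: for example a power cycle or reset of the hardware entity.
    repaired = repair(resource)
    if repaired:
        power[resource] = "on"
        system_model[resource] = "in_service"
    # Notify: report the outcome whether or not the repair worked.
    notify(f"{resource}: {'repaired' if repaired else 'still failed'}")
    return repaired


system_model = {"blade-3": "in_service", "blade-4": "in_service"}
power = {"blade-3": "on", "blade-4": "on"}
messages = []
handle_fault("blade-3", system_model, power, lambda r: True, messages.append)
handle_fault("blade-4", system_model, power, lambda r: False, messages.append)
```

A resource whose repair fails stays powered off and marked as failed, so the standby resources selected by the availability policies keep providing the service.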
The Alarm Management component in this application maintains the active alarm list for the system and allows users to add, remove, and acknowledge custom alarm conditions. It also annunciates the current alarm conditions on the platform alarm annunciation devices. The component potentially builds upon the alarm list maintained within HPI to provide much of this functionality. But the component extends beyond the HPI alarm management capabilities by automatically adding alarms for failed software resources to the active alarm list and ensuring that all active alarms are properly annunciated on the platform.
The Hot Swap Management component manages the hot swap sequences of FRUs. As mentioned earlier, HPI allows a user to override the default hot swap sequence actions that would otherwise be performed by the HPI service. But when an HPI user takes control of a hot swap sequence, it is required to perform all of the default actions for the resource as well as any custom actions or policies. Under such conditions, this component will be responsible for three critical operations: hot swap sequence grant policies, system actions, and resource actions.
Hot swap sequence grant policies, generally different for insertion and extraction sequences, determine whether a hot swap sequence should be allowed when it is first initiated. The hot swap management component provides default grant policies, while also allowing the user to provide their own custom policies.
The hot swap management component is designed to perform defined system actions in reaction to the hot swap sequence being granted. A prime example is that the system resources that depend on a FRU being extracted are gracefully switched over to their associated standby resources on other nodes. This allows the software resources running on the FRU being extracted to gracefully switch the service over to other software resources.
Once an HPI user takes control of a hot swap sequence it also needs to perform the appropriate actions against the entity. As a result, the hot swap management component also needs to take these hardware entity actions to complete the hot swap sequence.
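The three operations of the hot swap management component (grant policy, system actions, resource actions) can be sketched for an extraction sequence as follows. This is a Python simulation with hypothetical data structures; the service records, the hardware dictionary, and the indicator value are assumptions made for the example.

```python
def require_standby(fru, services):
    """Grant policy sketch: every service active on the FRU needs a standby."""
    return all(
        svc["standby_on"] is not None
        for svc in services
        if svc["active_on"] == fru
    )


def handle_extraction(fru, services, hardware, grant_policy=require_standby):
    """Run grant policy, system actions, then resource actions for one FRU."""
    # 1. Grant policy: decide whether the sequence may proceed at all.
    if not grant_policy(fru, services):
        return "rejected"
    # 2. System actions: switch dependent services over to their standbys.
    for svc in services:
        if svc["active_on"] == fru:
            svc["active_on"], svc["standby_on"] = svc["standby_on"], None
    # 3. Resource actions: the default hardware steps the user now owns.
    hardware[fru]["power"] = "off"
    hardware[fru]["indicator"] = "ready_to_remove"
    return "completed"


services = [
    {"name": "call-control", "active_on": "blade-3", "standby_on": "blade-4"},
    {"name": "billing", "active_on": "blade-4", "standby_on": None},
]
hardware = {
    "blade-3": {"power": "on", "indicator": "off"},
    "blade-4": {"power": "on", "indicator": "off"},
}
first = handle_extraction("blade-3", services, hardware)
second = handle_extraction("blade-4", services, hardware)
```

In this example the first extraction completes after call-control moves to blade-4, and the second is rejected because blade-4 now hosts services that have no standby.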
More information about HPI and the other interface standards can be obtained at the Service Availability Forum web site. There you can also find information on products that are Service Availability Forum Registered (including HPI and others).
David Fick is a System Architect at GoAhead Software. Copyright 2005 © CMP Media LLC