CHAPTER 1

Introduction to DR on Sun Fire 6800/4810/4800/3800 Systems

The dynamic reconfiguration (DR) features described in this user's guide are specific to Sun™ Fire 6800, 4810, 4800, and 3800 systems using the Solaris™ 8 2/02 or Solaris 9 operating environment.



Note - Performing DR operations requires root access.




Dynamic Reconfiguration

DR software is part of the Solaris operating environment. With the DR software you can dynamically reconfigure system boards and safely remove them or install them into a system while the Solaris operating environment is running and with minimum disruption to user processes running in the domain.

You can use DR to do the following:

Command Line Interface

The DR software has a command line interface (CLI) using the cfgadm command, which is the configuration administration program. The DR agent also provides a remote interface to the Sun™ Management Center 3.0 software.

Graphical User Interface

The optional Sun Management Center 3.0 Update 1 software (and later versions), which is designed for these systems, provides features such as domain management, as well as a graphical user interface (GUI) to the cfgadm DR command line interface (CLI). If you prefer to use a GUI, use the Sun Management Center 3.0 software instead of the command line interfaces of the system controller software and the DR software.

To use the Sun Management Center 3.0 software, you must attach the System Controller board to a network. With a network connection, you can view both the command line interface and the graphical user interface. For instructions on how to use the Sun Management Center 3.0 software, refer to the Sun Management Center 3.0 User's Guide, shipped with the Sun Management Center 3.0 software. For instructions on how to connect the System Controller board to a network, refer to your system's installation documentation.


DR Concepts

This section contains descriptions of general DR concepts that pertain to Sun Fire 6800/4810/4800/3800 domains.

Detachability

For a device to be detachable, it must conform to the following items:

Some boards cannot be detached because their resources cannot be moved. For example, if a domain has only one CPU board, that CPU board cannot be detached. If the boot drive does not have the failover feature implemented, the I/O board connected to it is not detachable.

If there are not multiple pathways for an I/O board, you can:

Quiescence

During the unconfigure operation on a system board with permanent memory (OpenBoot™ PROM or kernel memory), the operating environment is briefly paused, which is known as operating environment quiescence. All operating environment and device activity on the centerplane must cease during a critical phase of the operation.

Before it can achieve quiescence, the operating environment must temporarily suspend all processes, CPUs, and device activities. If the operating environment cannot achieve quiescence, it displays the reasons, which may include the following:

The conditions that cause processes to fail to suspend are generally temporary. Examine the reasons for the failure. If the operating environment encountered a transient condition, such as a failure to suspend a process, you can try the operation again.

Suspend-Safe and Suspend-Unsafe Devices

When DR suspends the operating environment, all of the device drivers that are attached to the operating environment must also be suspended. If a driver cannot be suspended (or subsequently resumed), the DR operation fails.

A suspend-safe device does not access memory or interrupt the system while the operating environment is in quiescence. A driver is suspend-safe if it supports operating environment quiescence (suspend/resume). A suspend-safe driver also guarantees that when a suspend request is successfully completed, the device that the driver manages will not attempt to access memory, even if the device is open when the suspend request is made.

A suspend-unsafe device allows a memory access or a system interruption to occur while the operating environment is in quiescence.

Attachment Points

An attachment point is a collective term for a board and its slot. DR can display the status of the slot, the board, and the attachment point. The DR definition of a board also includes the devices connected to it, so the term "occupant" refers to the combination of board and attached devices.

There are two formats used when referring to attachment points:

  • For system boards: N0.SBx
  • For I/O boards: N0.IBx

where N0 is node 0 (zero), SB is a system board, IB is an I/O board, and x is a slot number. A slot number can range from 0 through 5 for a system board, and from 6 through 9 for an I/O board.
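
The naming rules above can be checked mechanically. The following Python sketch assumes attachment point IDs of the form N0.SBx and N0.IBx (as in the N0.SB1 example used later in this chapter). The function name parse_ap_id is hypothetical and is not part of the DR software.

```python
import re

# Slot ranges taken from the text: system boards 0-5, I/O boards 6-9.
SLOT_RANGES = {"SB": range(0, 6), "IB": range(6, 10)}

# Assumed ID layout: node 0, board type, slot number (for example N0.SB1).
_AP_ID = re.compile(r"N0\.(SB|IB)(\d+)")


def parse_ap_id(ap_id):
    """Return (board_type, slot) for an ID such as 'N0.SB1', or raise ValueError."""
    match = _AP_ID.fullmatch(ap_id)
    if match is None:
        raise ValueError(f"not a board attachment point: {ap_id!r}")
    board_type, slot = match.group(1), int(match.group(2))
    if slot not in SLOT_RANGES[board_type]:
        raise ValueError(f"slot {slot} is out of range for {board_type}")
    return board_type, slot
```

A script could call such a check before passing an ID to cfgadm, so that a mistyped slot number is caught early.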

DR Operations

There are four main types of DR operations.

Operation     Description
Connect       The slot provides power to the board and monitors its temperature. For I/O boards, the connection operation is included in the configuration operation.
Configure     The operating environment assigns functional roles to a board, and loads device drivers for the board and for devices attached to the board.
Unconfigure   The system detaches a board logically from the operating environment and takes the associated device drivers offline. Environmental monitoring continues, but devices on the board are not available for system use.
Disconnect    The system stops monitoring the board, and power to the slot is turned off.


If a system board is in use, stop its use and disconnect it from the domain before you power it off. After a new or upgraded system board is inserted and powered on, connect its attachment point and configure it for use by the operating environment.

The cfgadm(1M) command can connect and configure (or unconfigure and disconnect) in a single command, but if necessary, each operation (connection, configuration, unconfiguration, or disconnection) can be performed separately.
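
The way the four operations move a board between states can be sketched as a small state model. The Python class below is a toy illustration only: the class and method names are hypothetical, it does not call cfgadm, and it assumes that a combined request simply performs the two underlying operations in order.

```python
class Board:
    """Toy model of one attachment point; not an interface to cfgadm."""

    def __init__(self):
        # A newly inserted board starts disconnected and unconfigured.
        self.receptacle = "disconnected"
        self.occupant = "unconfigured"

    def connect(self):
        if self.receptacle != "disconnected":
            raise RuntimeError("connect requires a disconnected board")
        self.receptacle = "connected"

    def configure(self):
        # Assumption: a configure request on a disconnected board connects it first.
        if self.receptacle == "disconnected":
            self.connect()
        self.occupant = "configured"

    def unconfigure(self):
        if self.occupant != "configured":
            raise RuntimeError("unconfigure requires a configured board")
        self.occupant = "unconfigured"

    def disconnect(self):
        if self.receptacle != "connected":
            raise RuntimeError("disconnect requires a connected board")
        # Assumption: a disconnect request on a configured board unconfigures it first.
        if self.occupant == "configured":
            self.unconfigure()
        self.receptacle = "disconnected"
```

The model preserves the rule that a disconnected board is always unconfigured, because disconnect never leaves the occupant in the configured state.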

Hot-Plug Hardware

Hot-plug boards and modules have special connectors that supply electrical power to the board or module before the data pins make contact. Boards and devices that have hot-plug connectors can be inserted or removed while the system is running.

I/O boards and CPU/Memory boards used in the Sun Fire 6800/4810/4800/3800 servers are hot-plug devices. Some devices, such as the peripheral power supply, are not hot-plug modules and cannot be removed while the system is running.


Conditions and States

A state is the operational status of either a receptacle (slot) or an occupant (board). A condition is the operational status of an attachment point.

Before you attempt to perform any DR operation on a board or component in a domain, you must determine its state and condition. Use the cfgadm(1M) command with the -la options to display the type, state, and condition of each component and the state and condition of each board slot in the domain. See the section Component Types for a list of the component types.
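
If you capture the listing in a script, it can be split into fields for checking. The Python sketch below assumes the usual five-column cfgadm listing layout (Ap_Id, Type, Receptacle, Occupant, Condition) with no spaces inside a field. The function name is hypothetical, and the sample text is made up for illustration; it is not output from a real system.

```python
def parse_cfgadm_listing(text):
    """Parse five-column listing text into a list of dictionaries."""
    fields = ("ap_id", "type", "receptacle", "occupant", "condition")
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines and the header row ("Ap_Id ...").
        if len(parts) != len(fields) or parts[0] == "Ap_Id":
            continue
        rows.append(dict(zip(fields, parts)))
    return rows


# Made-up sample in the assumed layout; real output will differ.
SAMPLE = """\
Ap_Id      Type      Receptacle     Occupant       Condition
N0.SB0     CPU       connected      configured     ok
N0.SB2     unknown   empty          unconfigured   unknown
"""

configured = [row["ap_id"] for row in parse_cfgadm_listing(SAMPLE)
              if row["occupant"] == "configured"]
```

With the sample above, configured holds only N0.SB0, the one entry whose occupant state is configured.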


Board States and Conditions

This section contains descriptions of the states and conditions of system boards (also known as system slots).

Board Receptacle States

A board can have one of three receptacle states: empty, disconnected, or connected. Whenever you insert a board, the receptacle state changes from empty to disconnected. Whenever you remove a board, the receptacle state changes from disconnected to empty.



Caution - Physically removing a board that is in the connected state, or that is powered on and in the disconnected state, crashes the operating system and can result in permanent damage to that system board.



Name           Description
empty          A board is not present.
disconnected   The board is disconnected from the system bus. A board can be in the disconnected state without being powered off. However, a board must be powered off and in the disconnected state before you remove it from the slot.
connected      The board is powered on and connected to the system bus. You can view the components on a board only after it is in the connected state.


Board Occupant States

A board can have one of two occupant states: configured or unconfigured. The occupant state of a disconnected board is always unconfigured.

Name           Description
configured     At least one component on the board is configured.
unconfigured   All of the components on the board are unconfigured.


Board Conditions

A board can be in one of four conditions: unknown, ok, failed, or unusable.

Name       Description
unknown    The board has not been tested.
ok         The board is operational.
failed     The board failed testing.
unusable   The board slot is unusable.



Component States and Conditions

This section contains descriptions of the states and conditions for components.

Component Receptacle States

A component cannot be individually connected or disconnected. Thus, components can have only one state: connected.

Component Occupant States

A component can have one of two occupant states: configured or unconfigured.

Name           Description
configured     Component is available for use by the Solaris Operating Environment.
unconfigured   Component is not available for use by the Solaris Operating Environment.


Component Conditions

A component can have one of three conditions: unknown, ok, or failed.

Name      Description
unknown   Component has not been tested.
ok        Component is operational.
failed    Component failed testing.



Component Types

You can use DR to configure or to unconfigure several types of components.

Name     Description
cpu      Individual CPU
memory   All the memory on the board
pci      Any I/O device, controller, or bus


Sun Fire 6800/4810/4800/3800 Domains

The Sun Fire 6800, 4810, 4800, and 3800 servers can be divided into dynamic system domains, referred to as domains in this document. These domains are based on system board slots that are assigned to the domains. Each domain is electrically isolated into hardware partitions, which ensures that an arbitrary stop in one domain does not affect the other domains in the server.

The domain configuration is determined by the domain configuration table in the platform configuration database (PCD), which resides on the system controller (SC). The domain table controls how the system board slots are logically partitioned into domains. The domain configuration includes empty slots and populated slots.

The number of slots available to a given domain is controlled by an available component list that is maintained on the system controller (refer to the System Management Services (SMS) 1.2 Administrator Guide for more information about the available component list). After a slot has been assigned to a domain, it becomes visible to that domain and unavailable and invisible to any other domain. Conversely, you must disconnect and unassign a slot from its domain before you can connect and assign it to another domain.

The logical domain is the set of slots that belong to the domain. The physical domain is the set of boards that are physically interconnected. A slot can be a member of a logical domain without having to be part of a physical domain. After the domain is booted, system boards and empty slots can be assigned to or unassigned from a logical domain; however, they are not allowed to become a part of the physical domain until the operating environment requests it. System boards or slots that are not assigned to a domain are available to all domains if the board is in the available component list for each domain. These boards can be assigned to a domain by the platform administrator. However, an available component list can be set up on the SC to allow users with appropriate privileges to assign available boards to a domain.


DR on I/O Boards

You must use caution when you add or remove system boards with I/O devices. Before you can remove a board with I/O devices, all of its devices must be closed and all of its file systems must be unmounted.

If you need to remove a board with I/O devices from a domain temporarily and then re-add it before any other boards with I/O devices are added or removed, reconfiguration is not necessary. In this case, device paths to the board devices remain unchanged.

Problems With I/O Devices

All I/O devices must be closed before they are unconfigured. If you encounter a problem with an I/O device, the following list can help you to overcome the problem.

Refer to the Solaris 8 2/02 on Sun Hardware Release Notes Supplement for special instructions for I/O devices.


Caution - Unmounting file systems may affect NFS client systems.





Note - If you use the ndd(1M) command to set the configuration parameters for network drivers, the parameters may not persist after a DR operation. Use the /etc/system file or the driver.conf file for a specific driver to set the parameters permanently.




Nonpermanent and Permanent Memory

Before you can delete a board, the operating environment must vacate the memory on that board. Vacating a board means flushing its nonpermanent memory to swap space and copying its permanent memory (that is, kernel and OpenBoot™ PROM memory) to another memory board. To relocate permanent memory, the operating environment on a domain must be temporarily suspended, or quiesced. The length of the suspension depends on the domain I/O configuration and the running workloads.

Detaching a board with permanent memory is the only time when the operating environment is suspended; therefore, you should know where permanent memory resides so that you can avoid significantly impacting the operation of the domain. You can display the permanent memory by using the cfgadm(1M) command with the -v option. When permanent memory is on the board, the operating environment must find another memory component of adequate size to receive the permanent memory.

Target Memory Constraints

When permanent memory is removed, DR chooses a target memory area to receive a copy of the memory. The DR software automatically checks that the target memory constraints are fully met, and it does not allow the DR memory operation to continue if it cannot verify this. For example, a DR memory operation can be disallowed because the domain does not have enough available memory to hold the permanent memory.


An Illustration of DR Concepts

DR lets you disconnect and then reconnect system boards without bringing the system down. You can use DR to add or remove system resources while the system continues to operate.

As an example reconfiguration of system resources, consider the following Sun Fire system configuration, as depicted in the diagram that follows: domain A contains system boards 0 and 2, and I/O board 7. Domain B contains system boards 1 and 3, and I/O board 8.



Note - Before performing DR operations, always make sure that the system complies with the constraints set forth in Limitations.



  FIGURE 1-1 Example Domains Before Reconfiguration

To reassign system board 1 from domain B to domain A, you can use the Sun Management Center software GUI, or you can perform the following steps manually on the CLI in each domain:

1. As superuser, enter the following command on the command line in domain B to disconnect system board 1:

# cfgadm -c disconnect -o unassign N0.SB1

2. Then, enter the following command on the command line in domain A to assign, connect, and configure system board 1 in domain A:

# cfgadm -c configure N0.SB1

The following system configuration is the result. Notice that only the way in which the boards are connected has changed; the physical layout of the boards within the cabinet has not.

  FIGURE 1-2 Example Domains After Configuration
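
The before and after configurations in this example can also be written down as data. The Python sketch below is a bookkeeping illustration only: it assumes the attachment point names N0.SBx and N0.IBx for the boards mentioned in the example, the function name reassign is hypothetical, and nothing here talks to the system controller or to cfgadm.

```python
# Domain membership before reconfiguration, as described in the example.
domains = {
    "A": {"N0.SB0", "N0.SB2", "N0.IB7"},
    "B": {"N0.SB1", "N0.SB3", "N0.IB8"},
}


def reassign(domains, ap_id, source, target):
    """Move ap_id from the source domain to the target domain."""
    if ap_id not in domains[source]:
        raise ValueError(f"{ap_id} is not assigned to domain {source}")
    # Step 1 in the text: disconnect and unassign the board in the source domain.
    domains[source].remove(ap_id)
    # Step 2 in the text: assign, connect, and configure it in the target domain.
    domains[target].add(ap_id)


reassign(domains, "N0.SB1", "B", "A")
```

After the call, domain A holds system boards 0, 1, and 2 and I/O board 7, and domain B holds system board 3 and I/O board 8.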


Sun Enterprise DR Web Site

For late-breaking news and patch information, visit the Solaris 8 web page at:

http://sunsolve2.Sun.COM/sunsolve/Enterprise-dr

The web site is updated periodically.

If you do not have access to this web site, ask your Sun service provider for assistance in obtaining the latest information.


Limitations

Memory Interleaving

System boards cannot be dynamically reconfigured if system memory is interleaved across multiple CPU/Memory boards.



Note - For more information about memory interleaving, refer to the interleave-scope parameter of the setupdomain command, which is described in both the Sun Fire 6800/4810/4800/3800 Systems Platform Administration Manual and the Sun Fire 6800/4810/4800/3800 Systems Controller Command Reference Manual.



Conversely, CompactPCI cards and I/O boards can be dynamically reconfigured whether memory is interleaved or not.

Reconfiguring Permanent Memory

When a CPU/Memory board containing non-relocatable (permanent) memory is dynamically reconfigured out of the system, a short pause in all domain activity is required, which may delay application response. Typically, this condition applies to one CPU/Memory board in the system. The memory on the board is identified by a non-zero permanent memory size in the status display produced by the cfgadm -av command.

DR supports reconfiguration of permanent memory from one system board to another only if one of the following conditions is met:

  • The target system board has the same amount of memory as the source system board;
-OR-
  • The target system board has more memory than the source system board. In this case, the additional memory is added to the pool of available memory.
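
The two conditions above amount to a single comparison. The following Python sketch expresses that rule under the assumption that board memory sizes are known in megabytes; the function name is hypothetical, and the DR software performs its own checks independently of any such script.

```python
def permanent_memory_target(source_mb, target_mb):
    """Return the extra memory (in MB) that joins the available pool,
    or None if the target board is too small to receive the permanent memory.
    """
    if target_mb < source_mb:
        # Neither condition is met: the target has less memory than the source.
        return None
    # Equal sizes give 0; a larger target contributes the difference to the pool.
    return target_mb - source_mb
```

For example, a 4096 MB source and an 8192 MB target would leave 4096 MB for the pool of available memory, while an 8192 MB source and a 4096 MB target would not qualify.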