Bright Cluster Manager can sample and monitor metrics from supported NVIDIA GPU cards and GPU Computing Systems, such as the Tesla C2050 card and the rack-mounted Tesla S2050 system. Supported metrics include GPU temperatures, GPU exclusivity modes, GPU fan speeds, system fan speeds, PSU voltages and currents, system LED states, and GPU ECC memory statistics. Both the metric sampling frequency and the consolidation of metrics data over time are fully configurable. Metrics data is stored in Bright Cluster Manager's central SQL database and can be visualized in value/time graphs, as well as in Bright Cluster Manager's unique Rackview.
Furthermore, Bright Cluster Manager can trigger alerts and actions automatically when GPU metric thresholds are exceeded. These rules are fully configurable to suit your requirements, and any built-in cluster management command, Linux command, or shell script can be used as an action. For example, Bright Cluster Manager can easily be configured to send you an email and shut down a node automatically when its GPU temperature exceeds a set value.
"There are now more than 1,000 NVIDIA GPU-based clusters around the world," said Andy Keane, General Manager of the Tesla business at NVIDIA. "Bright Computing's cluster management software fills a critical need for datacenter managers to reliably monitor and manage the status of their GPU-enabled clusters."
"Bright Cluster Manager's unique GPU management and monitoring capabilities are rapidly making it the cluster management solution of choice for GPU clusters," said Dr Matthijs van Leeuwen, CEO of Bright Computing. "We will continue to work closely with NVIDIA to incorporate new GPU management and monitoring capabilities into Bright Cluster Manager."
SOURCE Bright Computing, Inc.
Bright Computing, Inc., Press Office