
Event Management System for hedge fund

The fund, now one of the world's largest hedge funds, had badly outgrown its existing network management system, which focused primarily on paging systems administrators about individual events, such as disk usage alerts or high CPU usage. Citadel wished to build a network operations center (NOC) with full-time staff, and to move towards proactively managing business processes instead of reactively responding to individual alerts.

The immediate business driver for upgrading the system was Citadel's impending move to become a primary market maker in options trading. This greatly increased Citadel's regulatory responsibilities and, for performance reasons, required Citadel to co-locate production systems in New York, while primary systems administration staff remained in Chicago.

Citadel had made a major investment in HP OpenView software but was unhappy with the volume and relevance of the alerts and pages. Systems administrators had turned off their pagers due to event overload. Collection of system performance and network load data was spotty and not trusted by the systems and network staff, who had no way of assessing whether a performance-related event was unusual for a particular system.

Process and organizational improvements focused on several areas:
  • Creation of a systems engineering group focused on delivering long-term solutions to root-cause issues
  • Creation of a NOC team that responded solely to alerts and user calls
  • Standardization of alerts and elimination of spurious ones, such as link down traps from access-layer switches
  • Configuration of OpenView agents to log a consistent set of performance data, and to make that data accessible to all IT personnel

In parallel, the disaster recovery team worked with the network management group to develop a model of the impact of system and network problems on business processes. This information led to substantial re-prioritization of alerts, as well as the embedding of application-specific information in alerts to help the NOC team take the first steps towards resolving application problems.
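
The alert clean-up and re-prioritization described above can be illustrated with a minimal sketch. It is not the OpenView configuration used on the project. The device names, the impact model entries, the priority scale, and every function name are hypothetical, chosen only to show the idea of dropping spurious access-layer link down traps and attaching business-process context to the alerts that remain.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical inventory: device name -> network layer.
DEVICE_LAYER = {"sw-access-01": "access", "sw-core-01": "core"}

# Hypothetical impact model: host -> (business process, priority, first step).
# Priority 1 is the most urgent; unmapped hosts keep the default priority.
IMPACT_MODEL = {
    "db-opt-01": ("options market making", 1, "Check database replication status"),
}


@dataclass(frozen=True)
class Event:
    source: str                       # device or host that raised the event
    kind: str                         # e.g. "linkDown", "diskUsage"
    priority: int = 3                 # default priority for unmapped sources
    business_process: Optional[str] = None
    first_step: Optional[str] = None  # application-specific hint for the NOC


def is_spurious(event: Event) -> bool:
    """Treat link down traps from access-layer switches as noise."""
    return event.kind == "linkDown" and DEVICE_LAYER.get(event.source) == "access"


def enrich(event: Event) -> Event:
    """Attach business-process context when the impact model knows the source."""
    impact = IMPACT_MODEL.get(event.source)
    if impact is None:
        return event
    process, priority, first_step = impact
    return replace(event, priority=priority,
                   business_process=process, first_step=first_step)


def process_events(events: List[Event]) -> List[Event]:
    """Drop spurious events, enrich the rest, and order by priority."""
    kept = [enrich(e) for e in events if not is_spurious(e)]
    return sorted(kept, key=lambda e: e.priority)
```

In this sketch, a link down trap from an access-layer switch never reaches the NOC queue, while a disk usage alert on a host that the impact model ties to a business process is raised in priority and carries a suggested first step.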

The new network management system was handed off to internal staff approximately one year after the start of the project.
