Monitoring a TrinityCore server




Required Software

Statistic and metric logging in TrinityCore relies on two projects: InfluxDB, a time-series database, and Grafana, a graph and dashboard builder for visualizing time-series metrics.

Installing InfluxDB

  1. Download and install InfluxDB 1.x from https://influxdata.com/downloads/#influxdb for your platform. InfluxDB 2.x is not currently supported.

  2. Start InfluxDB

  3. Create a user and a database in InfluxDB by running the commands below in the Influx CLI:

    CREATE DATABASE worldserver
    CREATE USER grafana WITH PASSWORD 'grafana'
    GRANT READ ON worldserver TO grafana

  4. Edit the default retention policy to ensure the InfluxDB database doesn't grow too much.

    USE worldserver
    SHOW RETENTION POLICIES


    See https://docs.influxdata.com/influxdb/v1.8/query_language/manage-database/ for details on managing retention policies.

Installing Grafana

  1. Download and install Grafana from http://docs.grafana.org/installation/
  2. Open the dashboard at http://localhost:3000
  3. Log in with username admin and password admin (the defaults can be changed in Grafana's .ini files)
  4. Go to Data Sources → + Add Data Source
    Name: Influx
    Type: InfluxDB
    URL: http://localhost:8086
    Access: Server
    Database: worldserver
    User: grafana
    Password: grafana
  5. Click the + sign ("Create") in the menu on the left and select "Dashboard", then import each .json file from TrinityCore's /contrib/grafana directory by clicking "Upload JSON file"

Configuring TrinityCore

  1. Edit the worldserver.conf file
  2. Set Metric.Enable = 1
  3. Set Metric.ConnectionInfo to the connection details in the form "hostname;port;database" (e.g. "127.0.0.1;8086;worldserver")
  4. Start worldserver. The dashboard should now start receiving values.
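
The connection string above packs three semicolon-separated fields into one value. The sketch below shows one way such a string could be split and validated. It is an illustration only, not TrinityCore's own parser: the MetricConnectionInfo struct and the ParseMetricConnectionInfo function are hypothetical names introduced for this example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the three fields of Metric.ConnectionInfo.
struct MetricConnectionInfo
{
    std::string hostname;
    std::uint16_t port = 0;
    std::string database;
};

// Splits "hostname;port;database" and checks that the port is a number in 1..65535.
// Returns false (leaving 'out' untouched) when the string is malformed.
bool ParseMetricConnectionInfo(std::string const& text, MetricConnectionInfo& out)
{
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ';'))
        tokens.push_back(token);

    // Exactly three fields are expected, and host and database must not be empty.
    if (tokens.size() != 3 || tokens[0].empty() || tokens[2].empty())
        return false;

    std::string const& portText = tokens[1];
    if (portText.empty() || portText.size() > 5)
        return false;

    unsigned long port = 0;
    for (char c : portText)
    {
        if (c < '0' || c > '9')
            return false;
        port = port * 10 + static_cast<unsigned long>(c - '0');
    }
    if (port == 0 || port > 65535)
        return false;

    out.hostname = tokens[0];
    out.port = static_cast<std::uint16_t>(port);
    out.database = tokens[2];
    return true;
}
```

With the example value from step 3, this sketch would yield hostname 127.0.0.1, port 8086 and database worldserver. A string with a missing field or a non-numeric port is rejected.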

Implemented and planned metrics

Technical oriented

  • I/O networking traffic:
    • Packets sent (planned)
    • Packets received (implemented)
    • Average ping (planned)
    • Traffic in (planned)
    • Traffic out (planned)
  • World session update time (implemented)
  • Map update time (implemented)
  • Map loads/unloads (implemented)
  • MMap queries (implemented)
  • Database async queries queued count (implemented)
  • Server uptime, through world initialize and world shutdown events (implemented)
  • Active connections (planned)
  • Queued connections (planned)

Game oriented

  • Players online (implemented)
  • Logins per hour, per day, day of week, etc. (implemented)
  • Mails sent (planned)
  • Auction house usage (planned)
  • Character levels (planned)
  • Gold earn/spend (planned)
  • LFG queues (planned)


We'd welcome help implementing these and other metrics; feel free to send us a pull request.


Adding new metrics

There are two kinds of metrics that can be logged: values and events.

Values correspond to measurements of a certain quantity, such as the number of online players or the update diff time.

Events are things that occur at an instant in time, e.g. a player login or a worldserver shutdown.

To log new metrics, call TC_METRIC_EVENT or TC_METRIC_VALUE and add a new graph to the dashboard.

TC_METRIC_EVENT(category, title, description)
  • category: Arbitrary string naming the table where the values and events are stored. By convention, event log categories are suffixed with "_events".
  • title: Name of the event log.
  • description: Additional information about the logged event.

TC_METRIC_VALUE(category, value)
  • category: Same as above.
  • value: A measurement. It can have one of the following types: bool, std::string, float, double or any integral type (int, int32, uint, etc.).


Examples
// Registering player logins: in WorldSession::HandlePlayerLogin(LoginQueryHolder* holder)
TC_METRIC_EVENT("player_events", "Login", pCurrChar->GetName());
 
// Logging the update diff time: in World::Update(uint32 diff)
TC_METRIC_VALUE("update_time_diff", diff);
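
To make the difference between values and events more concrete, the sketch below formats one point of each kind in InfluxDB's line protocol (measurement, fields, then a nanosecond timestamp). It is a simplified illustration, not TrinityCore's actual serialization code: the function names FormatValuePoint, FormatEventPoint and EscapeStringField are hypothetical, the field names "value", "title" and "text" are assumptions, and the category is assumed to contain no spaces or commas.

```cpp
#include <cstdint>
#include <string>

// Escapes double quotes and backslashes, as the line protocol requires
// inside a quoted string field value.
std::string EscapeStringField(std::string const& text)
{
    std::string escaped;
    for (char c : text)
    {
        if (c == '"' || c == '\\')
            escaped += '\\';
        escaped += c;
    }
    return escaped;
}

// A value point: a single integer field (assumed name "value").
// Integer fields carry an 'i' suffix in the line protocol.
std::string FormatValuePoint(std::string const& category, std::int64_t value, std::int64_t timestampNs)
{
    return category + " value=" + std::to_string(value) + "i " + std::to_string(timestampNs);
}

// An event point: two quoted string fields (assumed names "title" and "text").
std::string FormatEventPoint(std::string const& category, std::string const& title,
                             std::string const& description, std::int64_t timestampNs)
{
    return category + " title=\"" + EscapeStringField(title) + "\",text=\""
        + EscapeStringField(description) + "\" " + std::to_string(timestampNs);
}
```

Under these assumptions, FormatEventPoint("player_events", "Login", "Thrall", 1000) produces the line player_events title="Login",text="Thrall" 1000, while FormatValuePoint("update_time_diff", 50, 1000) produces update_time_diff value=50i 1000. A value carries a number to plot over time, whereas an event carries descriptive text that Grafana can display as an annotation.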

Additional visualizations and metrics collection

InfluxDB is part of a bigger set of projects by InfluxData which integrate nicely with the DB:

  • Telegraf can be used to collect system metrics like CPU, I/O, memory usage and other services such as MySQL – to display this info next to the TC metrics.
  • Chronograf is an alternative to Grafana to graph and visualize time-series metrics.
  • Kapacitor is able to process streaming data from InfluxDB to provide alerts, trigger events, detect anomalies or transform data.

Additional Reading

Learn more about InfluxDB and Grafana: