Overview

Stats Collector

Organizations with high availability requirements may build and maintain farms of Aspera Servers to transfer data.  Transfers can be distributed among the servers either through DNS Round Robin or at the application level.  Normally, a transfer job can be submitted to any of the Aspera Servers in the farm.  The downside is that querying the status of transfers, or managing them, can be challenging because you do not know which server is actually handling a given transfer.

Stats Collector is a software component designed to help with this problem.  It lets developers collect session and file transfer statistics from any set of Aspera Nodes and store them in a MySQL database.  Applications can use these statistics for reporting, monitoring, or automation.  Stats Collector installs as a daemon on Linux; it periodically polls the list of configured Aspera Nodes and retrieves new and modified transfer statistics.

Stats Collector can be used with a single Aspera Node, multiple Aspera Nodes, or DNS Round Robin clusters of Aspera Nodes.  In each case, Stats Collector collects the transfer information.

 


Use Cases

Alfresco CMS Integration

An example use case is managing multiple types of data through a Content Management System (CMS) such as Alfresco CMS.  In this case the CMS content is transported using Aspera technologies.  Normally, one or more Aspera Connect Servers are installed and data is uploaded to these servers.  A typical workflow then makes the uploaded content available to the CMS.  For this example, we assume the following systems are in place:

  • Aspera Connect Server (single or multiple) for transferring with the Aspera Connect Plugin
  • Aspera Stats Collector for retrieving the statistics
  • Web Application (an ingestion service) that interfaces with the Aspera Stats Collector to retrieve completed transfers and also with the Alfresco Server to ingest content metadata into the CMS.
  • Alfresco Server that offers an API to import the content metadata from the Web Application

The Ingest Service Web Application

The Ingest Service takes the form of a web archive (WAR file) that can be installed on any Java Servlet Container, such as Tomcat, JBoss, or WebLogic.  It is normally composed of two parts:

  • A Cookie Generating RESTful Interface
  • A Content Ingest Service that interfaces with the Alfresco Server API

When content is uploaded to the Aspera Server, the cookie generating interface is called with the content's metadata (a JSON object made up of the metadata of all the files in the transfer session).  Normally, a transfer session would transfer hundreds of files.  The interface generates a cookie, a text string that is associated with the transfer session and used to identify it.  The cookie and metadata are mapped together, so the cookie can later be used to retrieve the metadata.  An example of metadata for a two-file session is seen below.

 

Example of Cookie Generation Query:

curl -k -i -H "Content-Type: application/json" -X POST http://<server>:<port>/aspera4scripps/getCookie -d @metadata.json

The Response

{"cookie":"ef59c9aa-1610-4e63-b6f3-8aa81e02252d"}
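
The same request can be issued from application code.  The following is a minimal Python sketch of the curl call above, not the Ingest Service's own client.  The function names are hypothetical, the base URL is a placeholder you must supply, and the layout of the metadata object is not specified here, so it is passed through unchanged.  Only the /aspera4scripps/getCookie path, the Content-Type header, and the "cookie" response field come from the example above.

```python
import json
import urllib.request


def build_cookie_request(base_url, metadata):
    """Build the POST request that asks the ingest service for a cookie."""
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/aspera4scripps/getCookie",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_cookie_response(raw):
    """Extract the cookie string from the JSON response body."""
    return json.loads(raw)["cookie"]


def get_cookie(base_url, metadata, timeout=10):
    """Send the request and return the cookie (needs a reachable service)."""
    request = build_cookie_request(base_url, metadata)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_cookie_response(response.read().decode("utf-8"))
```

Keeping request construction and response parsing separate from the network call makes both easy to check without a running server.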

Essentially the normal workflow would be:

  • Aspera Stats Collector logs the transfer statistics in the database
  • The Ingestion Service periodically queries the database to get:
    • Session information related to a cookie.  An example SQL statement for this:
      • SELECT s.session_id, s.cookie FROM fasp_sessions s WHERE s.status = 'completed' AND s.cookie IN ('cookie1', 'cookie2')
    • File information for the matching completed sessions.  An example SQL statement for this:
      • SELECT f.size, f.file_fullpath FROM fasp_files f WHERE f.session_id = 'session1'
  • The Metadata is ingested using the Alfresco Importer API
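
The polling step of this workflow can be sketched in Python as follows.  This is an illustration under stated assumptions, not the Ingestion Service's code.  To stay self-contained it uses an in-memory SQLite database as a stand-in for the MySQL statistics database, with only the columns the two example queries touch; the function names and sample rows are hypothetical.  The queries are parameterized rather than built by string concatenation.  A MySQL driver would typically use %s placeholders instead of ?.

```python
import sqlite3


def completed_sessions(conn, cookies):
    """Return (session_id, cookie) rows for completed sessions with these cookies."""
    if not cookies:
        return []
    placeholders = ", ".join("?" for _ in cookies)
    sql = (
        "SELECT s.session_id, s.cookie FROM fasp_sessions s "
        "WHERE s.status = 'completed' AND s.cookie IN (" + placeholders + ")"
    )
    return conn.execute(sql, list(cookies)).fetchall()


def session_files(conn, session_id):
    """Return (size, file_fullpath) rows for one session."""
    sql = "SELECT f.size, f.file_fullpath FROM fasp_files f WHERE f.session_id = ?"
    return conn.execute(sql, (session_id,)).fetchall()


def demo():
    """Build a tiny stand-in database and run one polling pass over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fasp_sessions (session_id TEXT, status TEXT, cookie TEXT)")
    conn.execute("CREATE TABLE fasp_files (session_id TEXT, size INTEGER, file_fullpath TEXT)")
    conn.executemany(
        "INSERT INTO fasp_sessions VALUES (?, ?, ?)",
        [("s1", "completed", "cookie1"), ("s2", "started", "cookie2")],
    )
    conn.executemany(
        "INSERT INTO fasp_files VALUES (?, ?, ?)",
        [("s1", 10, "/data/a.bin"), ("s1", 20, "/data/b.bin")],
    )
    # Map each cookie of a completed session to the files of that session.
    result = {}
    for session_id, cookie in completed_sessions(conn, ["cookie1", "cookie2"]):
        result[cookie] = session_files(conn, session_id)
    return result
```

In the sample data only the session for cookie1 has completed, so the session for cookie2 is skipped until a later polling pass finds it completed.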

 


Configuration

Targeting Aspera Nodes

To specify which Aspera Nodes to poll, set the target.host property.  This property can hold:

  • A single IP address
  • A single hostname
  • A semicolon-separated list of IP addresses or hostnames (it can contain both)

If you set target.host to a hostname, all IP addresses that the hostname resolves to will be targeted.  For multihomed machines, avoid targeting more than one IP address of the same host, because this can cause its statistics to be counted more than once.  Hostnames are re-resolved at the interval specified in the node.resolution.interval property.  New Nodes will be targeted in the next cycle, and Nodes that have been removed will be excluded.
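
To make the expansion of target.host concrete, here is a small Python sketch of how a semicolon-separated value could be turned into a list of addresses.  It is not Stats Collector's implementation; the function names are hypothetical and the resolver is limited to IPv4 for brevity.  Note that it only drops identical addresses.  It cannot tell that two different addresses belong to the same multihomed machine, which is why such addresses should not be listed together.

```python
import socket


def parse_targets(value):
    """Split a semicolon-separated target.host value into trimmed entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


def resolve_ipv4(host):
    """Return all IPv4 addresses a hostname (or literal IP address) resolves to."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def expand_targets(value, resolver=resolve_ipv4):
    """Expand every entry to its addresses, keeping each address only once."""
    seen = []
    for host in parse_targets(value):
        for address in resolver(host):
            if address not in seen:
                seen.append(address)
    return seen
```

The resolver is a parameter so that the expansion logic can be exercised with a fixed lookup table instead of live DNS.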

Statistic Collection

Normally, Stats Collector retrieves all session statistics on the targeted Nodes and stores them in the database.  However, some deployments only need certain data to be stored.  For example, to collect only sessions that were started using Faspex, set the faspex.session.filtering property to true.  To also collect statistics on file transfers, set the retrieve.filetransferstats property to true.

Remove Old Statistics

In high-traffic environments the statistics database may quickly become large.  You can configure Stats Collector to periodically remove old statistics from the database by setting the purgestats.enabled property to true.  The purgestats.interval property sets how often the check runs, and the purgestats.age property sets how old statistics must be before they are removed.
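
The boolean properties named in this section can be pictured together in one file.  The Python sketch below assumes a key=value layout with # comments, which is a guess at the file format rather than something stated here; only the property names come from this documentation.  Treating a missing flag as false is also an assumption.  The interval and age properties are left out because their units are not given.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def flag(props, name):
    """Read a boolean property; anything other than 'true' counts as false."""
    return props.get(name, "false").lower() == "true"


# Hypothetical file content; the layout is assumed, the names are documented.
EXAMPLE = """
# Poll one address and every address behind one hostname.
target.host=10.0.0.1;nodes.example.com
faspex.session.filtering=true
retrieve.filetransferstats=true
purgestats.enabled=false
"""
```

With this example, flag(parse_properties(EXAMPLE), "purgestats.enabled") evaluates to False, so no purge would run.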

Clustered Stats Collector

Many instances of Stats Collector can be used to collect data in a farm of Aspera nodes. The running instances share the data retrieval tasks. Data collection jobs of stopped or failed instances are reliably redistributed among the healthy instances. The clustered Stats Collectors share the same configuration file and store statistics in the same database.

Installation Guide

For more information on configuring and installing Stats Collector please see the Installation Guide.

 


Database Structure

Table fasp_sessions

Below is information on the different fields that are used in the fasp_sessions table.

| Name | Type | Null | Description |
| --- | --- | --- | --- |
| id | bigint(20) | No | Sequential integer, unique record id, assigned automatically by MySQL |
| logged_from | varchar(255) | No | MySQL user name used by the logger, plus the IP address of the node from the database server's point of view. Example: 'logger'@'192.168.1.101'. Although this field is populated and updated by a query from the database logger, the actual value is determined by MySQL, since the logger may not know what its IP address looks like to the database server. This is done by using MySQL's USER() function in the SQL statement. |
| node_uuid | varchar(36) | No | Foreign key referring to record in the fasp_nodes table for the node that logged this session |
| session_id | varchar(36) | No | fasp session id - for fasp 1.x, a 32 bit unsigned integer. For fasp 2.x, a UUID string of format 12345678-1234-1234-1234-123456789abc. In fasp 1.x, each side of a single transfer has its own session id. In fasp 2.x, both sides (endpoints) of a single transfer use the same session id. |
| status | varchar(16) | No | One of the following values (case sensitive): started, completed, paused, error, orphaned, cancelled, willretry. The 'orphaned' status is assigned when a session is found in the database that has status set to 'started' or 'paused', but the node itself reports that there is no active session with the specified session id. This may happen if the database server was inaccessible to the logger at time of termination. The 'paused' status means that target rate for the session has been set to zero. The 'unknown' status is assigned when a session is found in the database with status of 'started' or 'paused' but the node cannot be reached to query current session status. When connection to the node is restored, status will be updated to either 'started', 'paused', or 'orphaned'. |
| created_at | datetime | No | Time/date record created - maintained by Database Logger, not populated automatically by MySQL, but the time is from the database server's point of view - done by using MySQL's NOW() function in SQL statements |
| started_at | datetime | Yes | Time/date transfer started. Might be null if the database server was inaccessible to the logger at the start of a transfer. Time is from the database server's point of view - done by using MySQL's NOW() function in SQL statements |
| stopped_at | datetime | Yes | Time/date transfer stopped, either successfully or with an error. Null if transfer is active, or if database server was inaccessible to logger at time of termination. Time is from the database server's point of view - done by using MySQL's NOW() function in SQL statements |
| user | varchar(128) | Yes | Ssh user for transfer. Can be null if session failed to start. In the future, this may also be null for servers configured for open access. |
| cookie | text | Yes | Extra user defined data attached to transfer |
| token | text | Yes | Extra application defined data attached to transfer - used by Aspera Faspex to hold authentication token for a specific transfer request |
| initiator | varchar(16) | No | One of the following values (case sensitive): Local, Remote. If the transfer was initiated by the logging node, then initiator=Local, else initiator=Remote. |
| operation | varchar(16) | No | If initiator is local and direction is in, the operation is Download. If initiator is local and direction is out, the operation is Upload. If initiator is remote and direction is in, the operation is Upload. If initiator is remote and direction is out, the operation is Download. |
| server_addr | varchar(255) | Yes | Non-initiator's IP address or network name - blank on server side in fasp 1.x. The transfer initiator is always considered the CLIENT. The non-initiator is always considered the SERVER. |
| server_sshport | int(11) | No | Ssh port used to login to the non-initiator (server). |
| server_faspport | int(11) | No | UDP port used for the transfer on the non-initiator (server) side. |
| client_addr | varchar(255) | Yes | Initiator's (client's) IP address or network name |
| client_faspport | int(11) | No | UDP port used for the transfer on the initiator (client) side. |
| cipher | varchar(16) | No | Encryption cipher used for the transfer. Currently one of the following values (case sensitive): None, AES128. |
| dest_path | text | Yes | Initiator's (client's) destination path for the entire session - full local system path for download, remote path relative to docroot for upload. Blank on the non-initiator's (server's) side. |
| files_complete | int(11) | No | Number of files successfully transferred |
| files_failed | int(11) | No | Number of files that failed to transfer |
| bytes_written | bigint(20) | No | Number of bytes written to disk on inbound side |
| bytes_transferred | bigint(20) | No | Number of bytes actually sent over network |
| bytes_lost | bigint(20) | No | Number of bytes lost (and retransmitted) due to network packet loss |
| usecs | bigint(20) | No | Microseconds elapsed since start of transfer |
| network_delay | int(11) | Yes | Network delay in milliseconds. Null for fasp 1.x, only available for fasp 2.x transfers |
| err_code | int(11) | No | fasp error code for session, 0 if no error |
| err_desc | varchar(255) | Yes | fasp error description for session, null if no error |
| source_paths | text | Yes | Source file paths. Not all source paths are recorded due to buffer limits. |
| bytes_pretransfer | bigint(20) | Yes | Number of bytes calculated during pre-transfer scan. |
| files_pretransfer | int(11) | Yes | Number of files detected during pre-transfer scan. |
| dirs_pretransfer | int(11) | Yes | Number of directories detected during pre-transfer scan. |
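
Two of the fields above are easier to read with a worked example.  The Python sketch below restates the four operation rules as one function and derives an average network rate from bytes_transferred and usecs.  The function names are hypothetical, and the direction value ("in" or "out") is taken from the operation description rather than from a column listed in this table.

```python
def operation(initiator, direction):
    """Derive the operation from who started the transfer and which way data flows."""
    local = initiator.lower() == "local"
    inbound = direction.lower() == "in"
    # Local + in and Remote + out are downloads; the other two cases are uploads.
    return "Download" if local == inbound else "Upload"


def average_rate_mbps(bytes_transferred, usecs):
    """Average network rate in megabits per second, or None if no time has elapsed."""
    if usecs <= 0:
        return None
    # Bits per microsecond is numerically the same as megabits per second.
    return bytes_transferred * 8 / usecs
```

For example, a session with bytes_transferred = 125000000 and usecs = 10000000 (10 seconds) averages 100 megabits per second.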

Table fasp_files

Below is information on the different fields that are used in the fasp_files table.

| Name | MySQL Type | Null | Description |
| --- | --- | --- | --- |
| id | bigint(20) | No | Sequential integer, unique record id, assigned automatically by MySQL |
| logged_from | varchar(255) | No | MySQL user name used by the logger, plus the IP address of the node from MySQL server's point of view. Example: 'logger'@'192.168.1.101'. Although this field is populated and updated by a query from the database logger, the actual value is determined by MySQL, since the logger may not know what its IP address looks like to the database server. This is done by using MySQL's USER() function in the SQL statement. |
| node_uuid | varchar(36) | No | Foreign key referring to record in fasp_nodes for the node that logged this session |
| session_id | varchar(36) | No | fasp session id - for fasp 1.x, a 32 bit unsigned integer. For fasp 2.x, a UUID string of format 12345678-1234-1234-1234-123456789abc. In fasp 1.x, each side of a single transfer has its own session id. In fasp 2.x, both sides (endpoints) of a single transfer use the same session id. |
| status | varchar(16) | No | One of the following values (case sensitive): running, completed, error. |
| created_at | datetime | No | Time/date record created - maintained by Database Logger, not populated automatically by MySQL, but the time is from the database server's point of view - done by using MySQL's NOW() function in SQL statements |
| started_at | datetime | Yes | Time/date file started transferring. Might be null if the database server was inaccessible to the logger at the start of the file transfer. Time is from the database server's point of view - done by using MySQL's NOW() function in SQL statements |
| stopped_at | datetime | Yes | Time/date file stopped transferring, either successfully or with an error. Null if file is still transferring, or if database server was inaccessible to logger at time of file termination. Time is from the database server's point of view - done by using MySQL's NOW() function in SQL statements |
| file_fullpath | text | No | Full, expanded pathname (including file name) from Central's perspective. |
| file_index | varchar(32) | No | A unique identifier for each file in a single transfer. A sequential, numerical identifier is recommended, but any unique identifier will do. This field only needs to be unique within the context of a single node and transfer. If both sides of a transfer are being logged, the file_index only has to be unique to each side of the transfer, which is why node_uuid is part of the unique index. The file_index for a file on one side of the transfer might not be the same as the file_index for the same file on the other side of the transfer (the receiving node might number the files differently than the sending node). |
| file_basename | text | No | Filename only, no path. Not currently implemented |
| source_item | text | Yes | The source item (file or directory) chosen to transfer that caused this particular file to be transferred. If a directory containing more than one file was chosen, then there will be multiple records (one per file in the directory) that all have the same value for this field. Not currently implemented |
| size | bigint(20) | No | Size of the file in bytes |
| start_byte | bigint(20) | No | For resumed transfers, the point in the file at which transfer started. Only applies to files that were previously incompletely transferred |
| bytes_written | bigint(20) | No | Number of file bytes written to disk on the receiving machine |
| bytes_contig | bigint(20) | No | Number of contiguous file bytes transferred |
| bytes_lost | bigint(20) | No | Number of file bytes lost (and retransmitted) due to network packet loss |
| usecs | bigint(20) | No | Number of microseconds elapsed since start of file transfer |
| checksum | varchar(255) | Yes | Optional checksum for file. Not supplied automatically by fasp. |
| checksum_type | varchar(16) | Yes | String description of checksum method |
| err_code | int(11) | No | fasp error code for file, 0 if no error. |
| err_desc | varchar(255) | Yes | fasp error description for file, null if no error. |
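
Because both endpoints of a fasp 2.x transfer share one session_id, a report that groups fasp_files by session_id alone can mix records logged by two different nodes.  The Python sketch below groups by node_uuid as well.  It is an illustration only: an in-memory SQLite database stands in for MySQL, the table holds just the columns the query needs, and the sample rows and function names are hypothetical.

```python
import sqlite3

# Count and total size of completed files, per logging node and session.
TOTALS_SQL = """
SELECT f.node_uuid, f.session_id, COUNT(*) AS files, SUM(f.size) AS total_bytes
FROM fasp_files f
WHERE f.status = 'completed'
GROUP BY f.node_uuid, f.session_id
ORDER BY f.node_uuid, f.session_id
"""


def completed_totals(conn):
    """Return (node_uuid, session_id, files, total_bytes) rows."""
    return conn.execute(TOTALS_SQL).fetchall()


def demo():
    """Load sample rows where two nodes logged the same session, then summarize."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fasp_files "
        "(node_uuid TEXT, session_id TEXT, file_index TEXT, status TEXT, size INTEGER)"
    )
    rows = [
        ("node-a", "s1", "1", "completed", 100),
        ("node-a", "s1", "2", "error", 50),
        # The same session as logged by the other endpoint.
        ("node-b", "s1", "1", "completed", 100),
    ]
    conn.executemany("INSERT INTO fasp_files VALUES (?, ?, ?, ?, ?)", rows)
    return completed_totals(conn)
```

Grouping on both columns reports the 100-byte file once per logging node instead of adding the two records into a single 200-byte total for session s1.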

 


Downloads

Download

To download the Stats Collector Utility, please choose a version below

 

