The Cassandra schema consists of 7 column families, or 8 if you have the legacy row_key_index.
The schema for each of the column families is defined in ClusterConnection.java.
The row key is made up of 3 parts all concatenated together.
The column name differs depending on whether it is from the legacy row_key_index or not.
Legacy Column Name: The column name is 32 bits of data. The first 31 bits are the unsigned time offset from the row key (in milliseconds). The last bit is unused.
Current Column Name: A 32-bit value (unsigned, with no shifting) that represents the number of time units from the row key timestamp. The time unit is either seconds or milliseconds, depending on how the cluster was configured when it was first created.
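The two column name layouts can be illustrated with a minimal sketch. It assumes that "the first 31 bits" means the high-order bits, so the legacy offset is shifted left by one, and that the current format is the raw unsigned value. The class and method names are hypothetical and are not taken from the KairosDB source.

```java
// Hypothetical helper, not part of KairosDB: illustrates the two layouts only.
final class ColumnNameSketch {

    // Legacy: 31-bit millisecond offset in the high bits, low bit unused (assumed to be 0).
    static int encodeLegacy(long offsetMillis) {
        return (int) (offsetMillis << 1);
    }

    // Unsigned shift drops the unused low bit and restores the offset.
    static long decodeLegacy(int columnName) {
        return columnName >>> 1;
    }

    // Current: the offset in time units is stored as an unsigned 32-bit value.
    static int encodeCurrent(long offsetUnits) {
        return (int) offsetUnits;
    }

    // Read the 32 bits back as unsigned so large offsets do not become negative.
    static long decodeCurrent(int columnName) {
        return Integer.toUnsignedLong(columnName);
    }

    // Rebuild an absolute timestamp from the row key timestamp and a decoded offset.
    // unitMillis is 1 for millisecond clusters and 1000 for second clusters.
    static long toTimestampMillis(long rowTimeMillis, long offsetUnits, long unitMillis) {
        return rowTimeMillis + offsetUnits * unitMillis;
    }
}
```

In both layouts the offset has to be read back as unsigned, which is why the sketch uses `>>>` and `Integer.toUnsignedLong` rather than plain casts.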
The value of the column varies depending on the type of value. The exact format is defined by the writeValueToBuffer method on the DataPoint.
The width of the row defaults to exactly three weeks of data, which at millisecond granularity is up to 1,814,400,000 columns. The row width can be changed in the cluster configuration when the cluster is created. It cannot be changed after the schema has been created.
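The column count follows directly from the number of milliseconds in three weeks. The sketch below checks that arithmetic and shows one plausible way a timestamp could be assigned to a row. It assumes rows are aligned to multiples of the row width, which this document does not state. The names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of KairosDB.
final class RowWidthSketch {

    // 21 days * 24 h * 3600 s * 1000 ms = 1,814,400,000 ms.
    static final long DEFAULT_ROW_WIDTH_MS = TimeUnit.DAYS.toMillis(21);

    // Assumption: the row key timestamp is the data point timestamp
    // rounded down to a multiple of the row width.
    static long rowTime(long timestampMs, long rowWidthMs) {
        return timestampMs - Math.floorMod(timestampMs, rowWidthMs);
    }

    // Offset of a data point within its row; always smaller than the row width.
    static long offsetInRow(long timestampMs, long rowWidthMs) {
        return timestampMs - rowTime(timestampMs, rowWidthMs);
    }
}
```

Under this assumption every offset is below 1,814,400,000, which fits in both the 31-bit legacy column name and the 32-bit current one.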
Legacy: This row is primarily used when querying the data. The row key is the name of the metric. The names of the columns are the row keys from the data_points column family. The columns have no values.
The purpose of this table is to index the time frame a metric shows up for. It also provides a future mapping if we ever want to get away from the data points table. This is the first table hit when doing a query.
This is the second table hit when doing a query. The values in this table have all the combinations of tags that show up for the specified metric. Results from this table are filtered if any tags are specified in the query. The results are then used to look up the actual data in the data points table.
When a metric is configured with an additional index, this table is used for queries under certain circumstances. When you configure a metric to be indexed on a tag, say the host tag, data will be written to this table as well as the row keys table. If you do a lookup on the metric and specify a host tag in the query, the query will use this table to speed up the results. If no host tag is specified, the query will use the row keys table as usual.
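The choice between the two index tables can be sketched as a small decision function. This is only an illustration of the rule described above, under the assumption that specifying any indexed tag is enough to select the tag index. The enum constants are placeholder labels, not confirmed column family names, and the class is hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of KairosDB.
final class IndexChoiceSketch {

    // Placeholder labels for the two index tables described in the text.
    enum IndexTable { ROW_KEYS, TAG_INDEXED_ROW_KEYS }

    // indexedTags: tag names the metric was configured to be indexed on.
    // queryTags: tag name to tag value filters given in the query.
    static IndexTable choose(Set<String> indexedTags, Map<String, String> queryTags) {
        for (String tagName : queryTags.keySet()) {
            if (indexedTags.contains(tagName)) {
                return IndexTable.TAG_INDEXED_ROW_KEYS;
            }
        }
        // No indexed tag in the query: fall back to the regular row keys table.
        return IndexTable.ROW_KEYS;
    }
}
```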
This is just an index to look up which metric names, tag names and tag values are in the system. There are three rows, one for each of these.
All of the information stored using the metadata API is stored in this table.
This is an internal table currently only used for storing row width and time granularity. The values in this table are not to be changed by external means.
When a query comes in, a column slice of the row key index is done for the particular metric. This returns the rows that will contain the data. The row keys are then filtered based on any tags specified in the query. A Hector multiget call is made to fetch the data from the various rows. If any row has more data, the remainder is fetched individually using a larger buffer.
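The tag filtering step of this flow can be sketched in memory, without any Cassandra or Hector calls. The `RowKey` class below is a hypothetical stand-in whose fields (metric name, row timestamp, tags) are assumptions about what a row key carries. The rule that a key must match every tag named in the query is likewise an assumption about the filter semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of KairosDB.
final class RowKeyFilterSketch {

    // Stand-in for a row key returned by the index slice (fields are assumed).
    static final class RowKey {
        final String metric;
        final long rowTime;
        final Map<String, String> tags;

        RowKey(String metric, long rowTime, Map<String, String> tags) {
            this.metric = metric;
            this.rowTime = rowTime;
            this.tags = tags;
        }
    }

    // Keep only the row keys whose tags satisfy every tag filter in the query.
    // An empty queryTags map keeps every candidate.
    static List<RowKey> filterByTags(List<RowKey> candidates,
                                     Map<String, Set<String>> queryTags) {
        List<RowKey> matches = new ArrayList<>();
        for (RowKey key : candidates) {
            boolean ok = true;
            for (Map.Entry<String, Set<String>> wanted : queryTags.entrySet()) {
                String actual = key.tags.get(wanted.getKey());
                if (actual == null || !wanted.getValue().contains(actual)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

The surviving row keys are the ones that would then be passed to the multiget call.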
By default, the width of a row of data in the data points table is about 3 weeks and the time granularity of the data is in milliseconds. In the 1.3.0 release, configuration options were introduced to allow the user to change those values during the creation of the schema (see the kairosdb.conf file). The values for row_time_unit and row_width are only read when creating the schema for the write cluster. After the initial creation, the values are stored in the spec table and must not be changed, or data loss will occur.