
Multi-panel time series charts with xts

This is a set of use cases detailed for the revised plot.xts, a GSoC 2014 project.

Attributes in this spec can be renamed to something that makes more sense, except where compatibility is explicitly being matched (such as with par).

Use Cases

Our ultimate goal is for plot.xts to provide:

  1. main/sub panel and layered charts on a single input time-series

    • chart_Series/add_TA type functionality
    • replacements for the chart* and charts.* functions in PerformanceAnalytics and PortfolioAnalytics
    • we’d like to replace the current functions with plot.xts equivalents
  2. lattice-style plots of the columns using one plot specification per panel as an optional behavior

TODO

  • Describe a chart specification - a nested list of attributes and functions to be applied to the data passed in
  • Describe utility functions for manipulating a plot.spec
  • Is there a hierarchy of plot, panel, data series, transformations, series line type?
  • How to dispatch data into different panels? Or how to identify what data goes with what transformation?
  • Remove the small multiples behavior for now

Open Questions

  • Should we allow transformations to be specified without panels? Yes. Or should transformations always be embedded with display code in Chart functions? Yes.
  • Should panels be allowed to overlap an existing panel (à la quantmod::addTA)?
  • At some point, axis coordination breaks down. Where do we draw the line? At the Chart level.
  • Are Charts always regular matrices? Yes.
  • Should Page Layouts include margins and titles at the Page level? Out of scope, but Charts should.
  • How do you add complex calculations and graphics to an existing time series without chob-like structures?
  • What is the default x-axis behavior, as quantmod or as plot.xts?

Definitions (this will be confusing because we haven’t always been consistent about how these terms are used)

  • Device: any R device that works with base graphics, whether X11, Cairo, PDF, etc. plot.xts is to be device independent.
  • Frame, Figure or Page: with device independence, a new frame isn’t always a new window. For example, a new frame in a PDF device is a new page. In an X11 device, a new frame shows a new window, and is created by calling plot.new or creating a new plot.
  • Figure Layout: divides the Frame, Figure or Page into Panel regions, with as many rows and columns as there are in a specified matrix. Each region contains a Chart.

— everything above this line is out of scope —

  • Chart: a Chart, for our purposes, is a single- or multi-panel time series plot. To facilitate different uses, multi-panel Charts will support two distinct behaviors. The first is similar to trellis graphics as implemented in the lattice package, in that a Panel may be drawn once or repeated several times within a Page (in what are effectively linked “small multiples” or “multiples”). The second differs from trellis: the Chart can be divided into several Panels that contain different chart types.
  • Chart Layout: divides the Chart into regions (sometimes referred to as ‘screens’), with as many rows and columns as there are in a specified matrix (or Panel functions passed in), and with widths and heights specified in their respective arguments. Should behave like graphics::split.screen, but should be compatible with graphics::layout.
  • Panel: each Chart consists of one or more Panels. A Panel function usually accepts data, applies a transformation to that data, and displays the result within the specified region of the Chart Layout. Where there are multiple Panels within a Chart, axes may optionally be coordinated.
  • Panel Type: refers to a specific chart type, such as a line chart, bar chart, scatter plot, histogram, etc. This project will focus on line charts, area charts, OHLC charts, bar charts, and stacked bar charts, all of which are time-series oriented.
  • Plot Area: the area given to the drawing of the data; if axes or sub-titles are shown outside of the plot area, the plot area will be smaller than the panel region in the Chart Layout.

Objectives and Scope

This project will focus on developing multi-panel time series Charts that may be created from specified Panels with a Panel Layout. Charts may (eventually) be composed of panels with several different chart types, but the focus here is on time series charts that may be linked via the x- and/or y-axes. We are NOT going to worry about linking axes across multiple Charts, although there are conditions under which it may occur.

One example is small multiples drawn for a multi-column xts object containing five years of monthly returns (60 rows with a date index) for 12 assets. The result would be a single device (PDF) containing two Pages, where each Page contains one Chart with six Panels: equally spaced, sharing an x-axis, and with coordinated y-axes (across Pages and Charts in this case). The Panels are individual bar charts of the returns of each asset, labeled as such.

Multiples don’t have to be small. Another example would be a list of xts objects containing OHLC price, SMA, buy transactions, sell transactions, resulting position, and P&L on the same 12 assets and for the same time period (e.g., last 90 days). That could result in twelve Charts, one on each of twelve Pages, with each Chart composed of three Panels. The first Panel is an OHLC chart overplotted with an SMA line and dots indicating trades. The second Panel is a position bar plot. The third Panel contains a P&L line plot. Each Chart has a coordinated x-axis (time), but the y-axes within a Chart are uncoordinated. The y-axes for the OHLC chart and the position plot are uncoordinated with other Charts; the P&L plot’s y-axis is coordinated across Charts (to show relative contribution). We anticipate that (assuming we keep compatibility with graphics::layout) multiple Charts could be minimally arranged using layout. Calls that generate multiple Charts will simply do so; the user will have to know how to use layout to manage where they go on a Page.

Component Functionality Descriptions

Data dispatch

What data goes into what Panel? A Panel function has to make assumptions about the form of the data coming into it to determine what to draw.

Time series

In the default case of a multi-column object being passed into a Chart with one Panel, the Panel will by default (or may choose to, depending on the Panel function) draw all of the data it receives.

Time series by column

In the case of by-column processing, the Panel will only receive the data from that column. Transformations can be made within the Panel function and derived data can be plotted.

Lists of time series

In the case of data in list elements, the Panel will receive the data from the list element (slot) and will treat each slot in the list as a replication of the same panel.
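
The three dispatch cases above could be sketched as a single helper. This is only an illustration under stated assumptions: dispatch_data is a hypothetical name, not part of xts, and a plain matrix stands in for an xts object here (column subsetting with drop = FALSE behaves the same way on both).

```r
# Hypothetical helper: split the input into one data set per Panel replication.
dispatch_data <- function(x, byColumn = FALSE) {
  if (is.list(x) && !is.data.frame(x)) {
    # List input: each element (slot) feeds one replication of the Panel.
    return(x)
  }
  if (isTRUE(byColumn)) {
    # By-column: each Panel receives a single column, kept as a 1-column object.
    pieces <- lapply(seq_len(ncol(x)), function(j) x[, j, drop = FALSE])
    names(pieces) <- colnames(x)
    return(pieces)
  }
  # Default: one Panel receives everything.
  list(x)
}

x <- matrix(rnorm(60 * 6), ncol = 6,
            dimnames = list(NULL, paste0("asset", 1:6)))
length(dispatch_data(x))                   # one data set holding all six columns
length(dispatch_data(x, byColumn = TRUE))  # six single-column data sets
```

Returning a list in every case means the drawing code can loop over the pieces without caring which dispatch mode produced them.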

Default time series chart behavior

The function will take in a set of time series data and plot it as is, with multiple columns of data being treated as individual data series in the same line chart.
For example:

dim(x.xts)
[1] 500 6
plot(x.xts)

Creates a single panel time series plot with six lines in the same way that plot or chart.TimeSeries works.

Compatibility with par()

The function will allow plot attributes to be specified using the attribute names documented in ?par and passed in through dots (use specific argument matching, though). Where they are not given, use par() to get default values. Can we extend par where needed?
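
One way to read this requirement is sketched below. The helper resolve_par and its wanted argument are hypothetical; the sketch only shows exact-name matching of dots against a fixed set of par() names, with par() supplying anything the caller left out.

```r
# Hypothetical helper: combine user-supplied graphical parameters with par() defaults.
resolve_par <- function(..., wanted = c("col", "lty", "lwd", "cex")) {
  dots <- list(...)
  defaults <- par(wanted)                    # current device values, as a named list
  supplied <- dots[names(dots) %in% wanted]  # exact name matching only
  modifyList(defaults, supplied)
}

pdf(NULL)                                    # null device so par() has something to query
p <- resolve_par(col = "red", lwd = 2, unknown = 1)
invisible(dev.off())
str(p)                                       # col and lwd overridden; lty and cex from par()
```

Arguments that are not recognized par() names (unknown above) are silently dropped in this sketch; a real implementation might instead forward them to the panel function.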

Default x-axis behavior

Open question: as in quantmod or as in chart.TimeSeries?

Default y-axis number behavior

Reduce the number of zeros and specify (append) the units in (to) the axis label. Allow the user to cancel this behavior or to specify the units. Not currently functional anywhere that I’m aware of.
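
A minimal sketch of this behavior, assuming a hypothetical helper scale_y_axis that picks a power-of-1000 divisor from the tick values and appends the unit name to the axis label. The rescale argument stands in for the user's ability to cancel the behavior.

```r
# Hypothetical helper: choose a divisor so tick labels carry fewer zeros,
# and append the unit name to the axis label.
scale_y_axis <- function(y, ylab = "Value", rescale = TRUE) {
  ticks <- pretty(range(y, finite = TRUE))
  units <- c(1, 1e3, 1e6, 1e9)
  unit_names <- c("", "thousands", "millions", "billions")
  k <- if (rescale) max(which(units <= max(abs(ticks), 1))) else 1L
  list(
    at      = ticks,
    labels  = format(ticks / units[k], trim = TRUE),
    ylab    = if (k > 1L) sprintf("%s (%s)", ylab, unit_names[k]) else ylab,
    divisor = units[k]
  )
}

ax <- scale_y_axis(c(0, 2500000), ylab = "Volume")
ax$ylab      # "Volume (millions)"
```

The result could then be applied with plot(..., yaxt = "n", ylab = ax$ylab) followed by axis(2, at = ax$at, labels = ax$labels).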

Log y-axis behavior

The default is FALSE. The user can specify that the y-axis is a log axis. Note the different geometry involved. Functional in chart.TimeSeries.

Specify event lines

The user can mark specific events with date-label pairs, each drawn as a labeled vertical line, e.g., 9/11. Functional in chart.TimeSeries.
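
In base graphics this could look roughly like the following. The helper add_event_lines and its argument names are hypothetical, and the sketch assumes a plot whose x-axis is already in Date units.

```r
# Hypothetical helper: draw one labeled vertical line per event on an existing
# plot whose x-axis is in Date units.
add_event_lines <- function(event.dates, event.labels, col = "darkgray", lty = 2) {
  at <- as.numeric(as.Date(event.dates))
  abline(v = at, col = col, lty = lty)
  # Place each label near the top of the plot region, rotated along the line.
  text(at, par("usr")[4], labels = event.labels, srt = 90,
       adj = c(1.1, -0.3), cex = 0.7, col = col)
  invisible(at)
}

pdf(NULL)
dates <- seq(as.Date("2001-01-01"), by = "month", length.out = 24)
plot(dates, cumsum(rnorm(24)), type = "l", xlab = "", ylab = "Cumulative return")
at <- add_event_lines("2001-09-11", "9/11")
invisible(dev.off())
```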

Specify time-based background areas

Allow the user to pass in start date, end date pairs and a color (or multiple colors) e.g., identifying periods of recessions or bull/bear markets. Functional in chart.TimeSeries, except for multiple color passing.
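
A sketch of the shading, including the multiple-color case, using only base graphics. The helper add_period_areas is hypothetical; it assumes the plot coordinates have already been set up and that the data is drawn afterwards so that it sits on top of the shading.

```r
# Hypothetical helper: shade one rectangle per (start, end) date pair.
# `periods` is a list of length-2 date vectors; colors are recycled as needed.
add_period_areas <- function(periods, col = "lightgray") {
  col <- rep_len(col, length(periods))
  usr <- par("usr")                          # c(x1, x2, y1, y2) of the plot region
  for (i in seq_along(periods)) {
    span <- as.numeric(as.Date(periods[[i]]))
    rect(span[1], usr[3], span[2], usr[4], col = col[i], border = NA)
  }
  invisible(col)
}

pdf(NULL)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 120)
y <- cumsum(rnorm(120))
plot(dates, y, type = "n", xlab = "", ylab = "Level")  # set up coordinates first
used <- add_period_areas(list(c("2001-03-01", "2001-11-01"),
                              c("2007-12-01", "2009-06-01")),
                         col = c("lightblue", "mistyrose"))
lines(dates, y)                                        # draw data over the shading
invisible(dev.off())
```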

Multiple data series behavior

Always plot the data series “backwards”, from the last column or list item to the first. Generally, the first item in the list or column set tends to be the most interesting to the user, so it should appear on top where there are overlaps. Functional in chart.TimeSeries.
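
The drawing order can be illustrated with a short base-graphics loop. The function name draw_series_backwards is hypothetical, and a plain matrix stands in for the time series.

```r
# Draw columns from last to first so that column 1 is painted last (on top).
draw_series_backwards <- function(x, col = seq_len(ncol(x)), ...) {
  plot(NA, xlim = c(1, nrow(x)), ylim = range(x, na.rm = TRUE),
       xlab = "", ylab = "", ...)
  order.drawn <- rev(seq_len(ncol(x)))
  for (j in order.drawn) lines(x[, j], col = col[j])
  invisible(order.drawn)
}

pdf(NULL)
x <- apply(matrix(rnorm(60 * 3), ncol = 3), 2, cumsum)
ord <- draw_series_backwards(x, col = c("black", "gray60", "gray80"))
invisible(dev.off())
ord   # 3 2 1
```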

Specify legend location

Construct a simple legend from the data passed in and place it on the chart. Functional in chart.TimeSeries.

Small multiples behavior

The function takes a set of time series data and plots a separate chart for each column, arranged in a panel. The behavior is the same if the data is presented as a list (or environment) of xts objects.

The following example results in a series of line charts, one for each column of data, arranged in six rows and a single column. By default, chart attributes are repeated in each column such that they are exactly the same in each panel. Attributes may be specified by panel.

dim(x.xts)
[1] 500 6
plot(x.xts, byColumn=TRUE)

In this next case, we get the same behavior as when byColumn is set to TRUE, in that each list element is plotted in a separate chart within a panel with the columns of data in each list element showing as individual data series in its respective chart. That has implications as to how much data appears in the chart, detailed in the next section. In general, this will be referred to as “multiple chart behavior” below.

class(l.xts)
[1] "list"
dim(l.xts)
[1] 
plot(l.xts)

Specifying multiples

The specification of multiples is FALSE by default, and can be set to TRUE or any positive integer. That value is used to determine the geometry of the page and to repeat the chart as a series of graphs on the device. If more data sets are given than chart multiples specified, a new panel (in a new page, window or frame) is generated for the additional data. For example, if 12 columns of data are given and byColumn=6, then six time series are plotted into individual areas on the first device; a new device is then created, and the next six are plotted there using the same panel dimensions.

If byColumn=TRUE, the function will try to fit NCOL(x.xts) panels on the device (which may not work depending on the number of columns). If byColumn=1, each column or list would appear in a new panel on a new device. Multiples are only created for values of two or higher.

Panel Layouts are repeated when creating new panels. For example, if a Panel Layout is given and byColumn=TRUE, then each of the 12 columns of data will appear in all of the panels of the layout given, in a separate device. If byColumn=4, the layout will be repeated four times in a device: the first four columns will each be displayed in their own layout on the first device, and the 12 columns will be spread over 3 devices.

In a case where there are 13 columns and byColumn=5, the remainder of 3 columns will appear in a Panel that is sized for five charts, of which only three are drawn. The axis will be near the data.
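
The paging arithmetic described in this section can be checked with a small helper. The name pages_for_multiples is hypothetical, and the sketch covers only the cases where multiples are requested (byColumn is TRUE or a positive integer).

```r
# Hypothetical helper: group series indices into pages (devices) for multiples.
# byColumn = TRUE puts every series on one page; an integer k puts k per page.
pages_for_multiples <- function(n.series, byColumn = TRUE) {
  idx <- seq_len(n.series)
  per.page <- if (isTRUE(byColumn)) n.series else as.integer(byColumn)
  stopifnot(!is.na(per.page), per.page >= 1L)
  unname(split(idx, ceiling(idx / per.page)))
}

pages_for_multiples(12, byColumn = 6)   # two pages of six series each
pages_for_multiples(13, byColumn = 5)   # three pages: 5, 5, and the remaining 3
```

The last page keeps the geometry of a full page even when it holds fewer series, which matches the 13-column example above.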

x-axis behaviors in small multiples

The x-axis by default is displayed only at the bottom of the chart, and not repeated between the panels. The user can specify that the x-axis is to be repeated in each panel (note that this affects the determination of sizing of each panel, in that the axis is included or excluded from the overall calculation to get the panels optically the same size).

y-axis behaviors in small multiples

By default, the y-axis is the SAME for each panel to make visual comparison easy, e.g., for comparing returns. The user can specify that the y-axis is UNIQUE in each panel, e.g., for multiples of stock prices.
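
The two y-axis modes amount to a choice of ylim per panel, as in this sketch. The helper panel_ylims and its yaxis.same argument are hypothetical names.

```r
# Hypothetical helper: one ylim per panel, either shared or per-series.
panel_ylims <- function(x, yaxis.same = TRUE) {
  if (yaxis.same) {
    shared <- range(x, na.rm = TRUE)
    rep(list(shared), ncol(x))
  } else {
    lapply(seq_len(ncol(x)), function(j) range(x[, j], na.rm = TRUE))
  }
}

x <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
panel_ylims(x)[[1]]                      # 1 30: both panels share the full range
panel_ylims(x, yaxis.same = FALSE)[[1]]  # 1 3: each panel uses its own range
```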

y-axis labels in small multiples

Takes the column label or the list label. Alternatively, allow a text tag as a sub-heading within each panel that takes on those values, and leave the y-axis label blank or with a specified value. For example, if charting a set of prices, the y-axis labels in each panel would be the symbols by default. Alternatively, the symbol could be displayed as a sub-heading within the panel (in the upper left hand corner, perhaps) and the y-axis could be blank or repeat “Price” in each panel.

Panel functions and behaviors

A Panel function can be used to define transformations of the data and its display. For example, panel.CumReturns takes return data, chains together the individual returns, and produces a line chart of cumulative returns through time. A panel.BarVaR function takes returns and plots only the first series in a bar chart overlaid with an expanding window calculation of VaR for that asset. And panel.Drawdowns produces a line chart of drawdowns through time for all data (assumed to be returns, again) passed in.

By default, plot.xts will simply chart the data in the form it is passed in, using a default panel that is a simple line chart. The following will show a single panel line chart with six lines:

dim(x.xts)
[1] 60 6
plot(x.xts)

A single panel can be passed in via an argument. This code will show a single panel as defined in panel.CumReturns with six lines:

dim(x.xts)
[1] 60 6
plot(x.xts, panels=panel.CumReturns)

In the case of byColumn=TRUE or l.xts, the panel is repeated in a small-multiples fashion. This will show six panels as defined in panel.CumReturns, each with one line:

dim(x.xts)
[1] 60 6
plot(x.xts, byColumn=TRUE, panels=panel.CumReturns)

Multiple panels can be passed in. In this case, the layout will simply be divided by the number of panels, for example, divided into thirds in the following:

dim(x.xts)
[1] 60 6
plot(x.xts, panels = c(panel.CumReturns, panel.BarVaR, panel.Drawdowns))

This will result in a three-panel chart, with six data series available to each panel (although the panel function may choose not to draw all of them).
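
To make the idea concrete, here is a rough base-graphics sketch of two panel functions and a driver that divides the device evenly among them. Everything in it is an assumption for illustration: the bodies of panel.CumReturns and panel.Bar, the driver draw_panels, and the use of a plain matrix of returns in place of an xts object. It is not the plot.xts implementation.

```r
# Hypothetical panel function: chain returns and draw cumulative returns as lines.
panel.CumReturns <- function(R, ...) {
  cum <- apply(R, 2, function(r) cumprod(1 + r) - 1)
  matplot(cum, type = "l", lty = 1, xlab = "", ylab = "Cumulative return", ...)
  invisible(cum)
}

# Hypothetical panel function: bar chart of the first series only.
panel.Bar <- function(R, ...) {
  barplot(R[, 1], ylab = "Return", border = NA, ...)
  invisible(R[, 1, drop = FALSE])
}

# Hypothetical driver: divide the device evenly among the panels, call each in turn.
draw_panels <- function(x, panels) {
  old <- par(mar = c(2, 4, 1, 1))
  on.exit(par(old))
  layout(matrix(seq_along(panels), ncol = 1))
  invisible(lapply(panels, function(p) p(x)))
}

pdf(NULL)
R <- matrix(rnorm(60 * 6, sd = 0.02), ncol = 6)
out <- draw_panels(R, panels = c(panel.CumReturns, panel.Bar))
invisible(dev.off())
```

Each panel receives the full data set and decides for itself what to draw, which is why panel.Bar can ignore all but the first column.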

Define a panel function

To define a panel function, a user may use a skeleton function that takes functions and charting parameters as arguments and returns a panel function that can be called in plot.xts. This would mimic quantmod::newTA, in that it allows for highly customizable chart tools with minimal coding. The resulting code could also be changed subsequent to its generation. To create a new panel function with newPanel, certain arguments will need to be specified, such as:

newPanel(FUN,
        preFUN,
        postFUN,
        yrange = NULL,
        legend.name,
        fdots = TRUE,
        cdots = TRUE,
        data.at = 1,
        ...)

This isn’t a fixed specification; it’s just to give an indication of what a function like this might look like. Importantly, modeling on quantmod::newTA suggests breaking the processing into three pieces. The first is a preFUN step that will be called on the data prior to passing it to FUN. This would be a function symbol or a character string of the name of the function to be called.

The FUN argument is a function that is the primary filter to be used on the data being passed into the panel. This could be most of the functions in the package TTR, for example. The result is equal to calling the function on the data passed into the panel. It should be coercible to a matrix object of one or more columns of output. By default all columns of output will be added to the chart, unless suppressed by passing the appropriately positioned type='n' through the ... argument.

The third is the postFUN argument, which is called on the resultant data returned from the FUN filter. This is useful for extracting the relevant data from the returned filter data. Like preFUN it must be a function symbol or a character string of the name of the function to be called.

The yrange argument could be used to provide a custom scale to the y-axis. If NULL the min and max of the data to be plotted will be used for the y-axis range.

While this function may usually be sufficient to construct graphical additions to a chart, it may be necessary to modify the generated code by hand. This can be accomplished by dumping the function to a file, or by using fix on it during an interactive session.

While quantmod::addTA allows for adding transforms to existing charts, that behavior is explicitly scoped out of this project. Panels will accept data and will be drawn individually in the order they are specified.
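
The three-stage pipeline (preFUN, then FUN, then postFUN) could be prototyped as a closure factory, as sketched below. This is a simplified illustration and not the proposed API: it omits fdots, cdots and data.at, uses legend.name as the y-axis label, forwards extra arguments to FUN, and draws with base matplot. The roll_mean function is a made-up stand-in for a TTR-style filter.

```r
# Hypothetical factory, loosely following the argument list above.
newPanel <- function(FUN, preFUN = identity, postFUN = identity,
                     yrange = NULL, legend.name = NULL, ...) {
  # Accept either function objects or function names given as strings.
  FUN <- match.fun(FUN)
  preFUN <- match.fun(preFUN)
  postFUN <- match.fun(postFUN)
  fargs <- list(...)                       # extra arguments forwarded to FUN
  function(x, ...) {
    y <- as.matrix(postFUN(do.call(FUN, c(list(preFUN(x)), fargs))))
    # NULL yrange means: use the min and max of the data to be plotted.
    ylim <- if (is.null(yrange)) range(y, na.rm = TRUE) else yrange
    matplot(y, type = "l", lty = 1, ylim = ylim, xlab = "",
            ylab = if (is.null(legend.name)) "" else legend.name, ...)
    invisible(y)
  }
}

# Example filter: a trailing n-period mean per column, using stats::filter.
roll_mean <- function(x, n) {
  apply(x, 2, function(col) stats::filter(col, rep(1 / n, n), sides = 1))
}
panel.RollMean <- newPanel(roll_mean, legend.name = "12-period mean", n = 12)

pdf(NULL)
R <- matrix(rnorm(60 * 2, sd = 0.02), ncol = 2)
y <- panel.RollMean(R)
invisible(dev.off())
```

Because the factory returns an ordinary function, the result can be inspected or edited afterwards, in the spirit of the dump/fix workflow described above.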

Panel Layout behaviors

The function takes the same arguments as the layout() function, and arranges panel regions accordingly. For example, plot areas can be arranged with specifications for heights, widths, order, etc. passed in as attributes. By default, the entire data object is passed into each panel and plotted. If byColumn = TRUE or the data is a list (l.xts), the data is passed by column or by list slot into each panel - it will be up to the user to make sure that the panels and data match up. If the data is a list, all of the data in the slot is passed into the panel.

The following example would repeat the default panel with all of the data in each of the three slots. If small multiples behavior is indicated, the layout will be repeated for each data set (column or list slot) passed in.

plot(x.xts, layout=as.matrix(c(1,2,3)), heights = c(4,2,1), widths = 1)
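
For reference, this is what the equivalent arrangement looks like with graphics::layout directly: three stacked regions in a single column with relative heights 4:2:1. The data and the three plots are placeholders chosen only to fill the regions.

```r
pdf(NULL)
# Three stacked regions in one column, with relative heights 4:2:1.
n.regions <- layout(matrix(1:3, ncol = 1), heights = c(4, 2, 1), widths = 1)
old <- par(mar = c(2, 4, 1, 1))
x <- cumsum(rnorm(100))
plot(x, type = "l", ylab = "Level")                 # drawn in region 1 (tallest)
barplot(diff(x), ylab = "Change")                   # drawn in region 2
plot(abs(diff(x)), type = "h", ylab = "|Change|")   # drawn in region 3
par(old)
invisible(dev.off())
n.regions   # 3
```

Note that layout() names its arguments heights and widths, and fills the regions in the order given by the matrix entries.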

Plot Area spacing within Panels

Panel regions will by default be entirely covered by the Plot Area, with room for the y-axes (e.g., top and bottom margins set to zero by default). Y-axes may appear on the left or right side of the Plot Area, with or without labels. Plot Areas may shrink to accommodate a visible x-axis within the panel or a panel sub-title above it. Sub-titles will by default appear in the plot area, not above it.

Multi-panel chart titles

The Chart will have a main title and a main sub-title. Each panel may have a sub-title that appears within or above the plot area (if spacing is given between Plot regions) in the panel.

Each x-axis, which is always time, will by default be unlabeled (to save space at the bottom), although a label may be passed by each panel or at the chart level (which should pre-empt the panel level label at the bottom). Each panel’s y-axis is labeled according to the panel function passed in.

Panel legends

Each panel may show a simple legend. (TODO: add to this.)

Chart legend

A chart legend may appear in an otherwise blank panel. For example, in a multi-panel plot where all of the data series are formatted consistently through the panels, a legend could be placed at the bottom of the chart, under the x-axis; or in the top panel under the chart title.

Chart Specifications

  • Chart specifications can be built (passed invisibly) from plot.xts. The chart specification is an accumulation of panel specs, where panel specs are accumulations themselves of transformations and plot attributes.

To make a drawing with existing panel functions:

plot.xts(x.xts, panels=c(CumReturns.panel, BarVaR.panel, Drawdowns.panel))

To make a chart.spec with existing panel functions:

PerfSum.spec <- plot.xts(x.xts, panels=c(CumReturns.panel, BarVaR.panel, Drawdowns.panel))

To make a multi-panel plot with an existing chart.spec:

plot(x.xts, spec=PerfSum.spec)

To add a panel to an existing chart.spec:

plot(x.xts, spec=PerfSum.spec, panels=Alpha.panel) # location? at the bottom by default?

To add a transform to an existing chart.spec:

plot(x.xts, spec=PerfSum.spec, ???=addFUN(StdDev.rolling, panel=2, attributes=list(n=12)))

This dispatches to a utility function that revises the panel.spec within the chart.spec.
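
One possible shape for such a specification is a nested list, with small utility functions to revise it. The sketch below is an assumption throughout: new_chart_spec, add_panel and add_transform are hypothetical names, panels and transforms are stored by name (as strings) because the functions themselves do not exist yet, and appending at the bottom is just one answer to the location question above.

```r
# Hypothetical structure: a chart.spec is a list of panel.specs; each panel.spec
# holds a panel function name, its transformations, and its plot attributes.
new_panel_spec <- function(panel) {
  list(panel = panel, transforms = list(), attributes = list())
}
new_chart_spec <- function(panels) {
  structure(list(panels = lapply(panels, new_panel_spec)), class = "chart.spec")
}

# Append a panel at the bottom of the chart (the default position assumed here).
add_panel <- function(spec, panel) {
  spec$panels[[length(spec$panels) + 1L]] <- new_panel_spec(panel)
  spec
}

# Attach a transformation, with its arguments, to one panel of the spec.
add_transform <- function(spec, FUN, panel, attributes = list()) {
  stopifnot(panel >= 1L, panel <= length(spec$panels))
  tr <- spec$panels[[panel]]$transforms
  spec$panels[[panel]]$transforms <-
    c(tr, list(list(FUN = FUN, attributes = attributes)))
  spec
}

PerfSum.spec <- new_chart_spec(c("CumReturns.panel", "BarVaR.panel", "Drawdowns.panel"))
PerfSum.spec <- add_panel(PerfSum.spec, "Alpha.panel")
PerfSum.spec <- add_transform(PerfSum.spec, "StdDev.rolling", panel = 2,
                              attributes = list(n = 12))
length(PerfSum.spec$panels)   # 4
```

Keeping the spec as plain data means it can be built, stored and modified without drawing anything, and handed to plot later.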


Last modified on 06/13/14 08:05:23