Plotting

The plotting functionality of xyzpy is oriented towards quickly getting an overview of high-dimensional gridded data. This is provided by a simple, single-method interface (with autocorrected kwargs) that allows the dimensions/coordinates to be encoded as various visual properties in either a line-plot (the main focus), scatter-plot, histogram, or heatmap. Once xyzpy is imported, the method is accessed as dataset.xyz.plot(). For this example we’ll first generate a basic dataset to plot, with 5 dimensions and 2 data variables:

%config InlineBackend.figure_formats = ['svg']
import xyzpy as xyz


@xyz.label(["fx", "dfx"])
def foo(x, delta, p, amp=1.0, C=0.0):
    # return two data variables: the function value and its derivative
    return (
        amp * (x - delta) ** p + C,
        amp * p * (x - delta) ** (p - 1),
    )


ds = foo.run_combos(
    combos=dict(
        x=[-2 + i * 0.25 for i in range(17)],
        p=[1, 2, 3],
        delta=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
        C=[-2.0, 1.0, 4.0],
        amp=[-1.0, 1.0],
    ),
)
ds
<xarray.Dataset> Size: 30kB
Dimensions:  (x: 17, p: 3, delta: 6, C: 3, amp: 2)
Coordinates:
  * x        (x) float64 136B -2.0 -1.75 -1.5 -1.25 -1.0 ... 1.25 1.5 1.75 2.0
  * p        (p) int64 24B 1 2 3
  * delta    (delta) float64 48B 0.0 0.2 0.4 0.6 0.8 1.0
  * C        (C) float64 24B -2.0 1.0 4.0
  * amp      (amp) float64 16B -1.0 1.0
Data variables:
    fx       (x, p, delta, C, amp) float64 15kB 0.0 -4.0 3.0 ... 2.0 3.0 5.0
    dfx      (x, p, delta, C, amp) float64 15kB -1.0 1.0 -1.0 ... 3.0 -3.0 3.0

Plot types

Which options are supplied to the x, y and z kwargs determines what type of plot is produced:

  • x coordinate and y data variable: line-plot

  • x data variable only: histogram

  • x data variable and y data variable: scatter-plot (or line-plot if xlink is supplied)

  • x coordinate and y coordinate and z data variable: heatmap
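The rules above can be summarised as a small decision function. This is a sketch of the documented behaviour only, not xyzpy's implementation. The names infer_plot_type, coords and data_vars are hypothetical.

```python
def infer_plot_type(x, y=None, z=None, *, coords=(), data_vars=(), xlink=None):
    """Hypothetical helper mirroring the documented x/y/z rules."""
    if z is not None:
        # x and y coordinates plus a z data variable
        if x in coords and y in coords and z in data_vars:
            return "heatmap"
    elif y is None:
        # a lone data variable is binned
        if x in data_vars:
            return "histogram"
    elif x in coords and y in data_vars:
        return "line"
    elif x in data_vars and y in data_vars:
        # two data variables: scatter, unless xlink joins points into lines
        return "line" if xlink is not None else "scatter"
    raise ValueError(f"unsupported combination: x={x!r}, y={y!r}, z={z!r}")


coords = ("x", "p", "delta", "C", "amp")
data_vars = ("fx", "dfx")
print(infer_plot_type("x", "dfx", coords=coords, data_vars=data_vars))  # line
print(infer_plot_type("fx", coords=coords, data_vars=data_vars))  # histogram
print(infer_plot_type("fx", "dfx", coords=coords, data_vars=data_vars))  # scatter
```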

Line-plots

Line plots are the main focus of the plotting functionality. By default every slice of data is shown as a separate line, with no visual differentiation between them:

fig, axs = ds.xyz.plot(x="x", y="dfx")

Line properties can either be set directly or mapped to a particular dimension/coordinate by supplying its name (note multiple properties can be mapped to the same dimension):

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # mapped:
    color="amp",
    # double mapped:
    marker="p",
    linestyle="p",
    # static:
    markersize=6,
)

Supplying a tuple of dimension/coordinate names to a mapped property will create a combined (fused) dimension, allowing an essentially arbitrary number of dimensions to be mapped:

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    color=("amp", "p"),
)
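A fused dimension has one value for every combination of the original coordinates, like a Cartesian product. The sketch below is plain Python, and fuse is a hypothetical helper. It builds those tuple values and names the result by joining the dimension names with ", ". This naming matches the fused name "p, amp, C, delta" used in the legend_entries example later on this page.

```python
from itertools import product

# coordinate values of the example dataset above
coords = {
    "x": [-2 + i * 0.25 for i in range(17)],
    "p": [1, 2, 3],
    "delta": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "C": [-2.0, 1.0, 4.0],
    "amp": [-1.0, 1.0],
}


def fuse(coords, dims):
    """Sketch: fuse several dimensions into one whose values are tuples."""
    name = ", ".join(dims)
    values = list(product(*(coords[d] for d in dims)))
    return name, values


name, values = fuse(coords, ("amp", "p"))
print(name, len(values))  # amp, p 6
print(values[:3])  # [(-1.0, 1), (-1.0, 2), (-1.0, 3)]
```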

The join_across_missing kwarg controls what happens where data is missing: by default that section of the line is skipped, or alternatively the line can be joined across the gap.

Histograms

Histograms simply bin a single data variable, then use the frequency or probability density as the y variable of a line plot (so they support mostly the same options for mapping visual properties):

fig, axs = ds.xyz.plot(
    x="fx",
    yscale="log",
    ybase=2,
    color="green",
    markeredgecolor="amp",
    marker="amp",
)

The number of bins is calculated automatically. Alternatively, you can set it with the bins kwarg, either as a number of bins (an int) or as the actual bin boundaries (a sequence). By default bin counts (frequency) are shown, but you can show probability density instead with the bins_density: bool kwarg:

fig, axs = ds.xyz.plot(
    x="fx",
    bins=101,
    bins_density=False,
    color="p",
    col="p",
    marker="",
    linewidth=0.5,
)
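The difference between counts and density can be seen in a minimal equal-width binning sketch. The histogram function below is hypothetical and is not how xyzpy chooses its bins. Density divides each count by the total number of points times the bin width, so the histogram integrates to 1.

```python
def histogram(data, bins):
    """Equal-width binning sketch: returns (edges, counts, densities)."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("this sketch needs data spanning a non-zero range")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in data:
        # clamp the index so the maximum value lands in the last bin
        counts[min(int((v - lo) / width), bins - 1)] += 1
    # density = count / (total * width), so the histogram integrates to 1
    densities = [c / (len(data) * width) for c in counts]
    return edges, counts, densities


edges, counts, densities = histogram([0.0, 0.2, 0.4, 1.0, 2.5, 4.0], bins=4)
print(counts)  # [3, 1, 1, 1]
```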

Scatter plots

Scatter plots are triggered when you plot one data variable against another:

fig, axs = ds.xyz.plot(
    x="fx",
    y="dfx",
    xscale="symlog",
    yscale="symlog",
    color="#FF7700",
    markersize="delta",
    markeredgecolor="none",
)

Line plots can also be made from two data variables by using xlink. For example, in the following code each sweep across the x coordinate is connected into a line, but the actual point locations are given by the fx and dfx data variables.

fig, axs = ds.xyz.plot(
    x="fx",
    xscale="symlog",
    xlink="x",
    y="dfx",
    yscale="symlog",
    marker="",
    alpha=0.5,
    color=("amp", "p"),
)
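The following plain-Python sketch shows the points that one xlinked line would visit for a single fixed combination of the other dimensions. It is not xyzpy's implementation. The points are ordered along x, while their positions come from (fx, dfx), so the path need not be monotonic in fx. The foo_values helper is a hypothetical copy of the formulas in foo above.

```python
def foo_values(x, delta, p, amp=1.0, C=0.0):
    # the same two formulas as ``foo`` above, without the xyzpy decorator
    return amp * (x - delta) ** p + C, amp * p * (x - delta) ** (p - 1)


xs = [-2 + i * 0.25 for i in range(17)]

# hold every other dimension fixed (delta=0, p=2, amp=1, C=0), so only the
# xlink dimension x varies; the points are visited in order of increasing x
path = [foo_values(x, delta=0.0, p=2) for x in xs]

fx_along_path = [fx for fx, _ in path]
print(path[0], path[8], path[-1])  # (4.0, -4.0) (0.0, 0.0) (4.0, 4.0)
print(fx_along_path == sorted(fx_along_path))  # False
```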

Hint

This can be very useful when plotting, for example, error vs computational time ‘xlinked’ to some kind of ‘effort’ control knob. See also the text option, which you can use to annotate each point with its xlink dimension value.

Heatmaps

Heatmaps can be drawn by specifying a z data variable in addition to x and y coordinates. Since only a single data trace can be shown in a heatmap, most visual properties are not supported (just row and col for faceting). Instead, unused dimensions must be aggregated over; this happens automatically (with a warning if you don’t explicitly set aggregate).

fig, axs = ds.xyz.plot(
    x="x",
    y="delta",
    z="fx",
    zscale="symlog",
    palette="Spectral",
    col="p",
    row="amp",
    aggregate=["C"],
)
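To see which dimensions are "unused", remove every dimension that is mapped to some property from the dataset's dimensions. The sketch below uses a hypothetical helper, unused_dims, that does this bookkeeping. It is not xyzpy code. For the heatmap above it leaves only C, which is why aggregate=["C"] covers everything that needs reducing.

```python
dims = ("x", "p", "delta", "C", "amp")


def unused_dims(dims, **mapped):
    """Sketch: dimensions not mapped to any property are the ones to aggregate."""
    used = set()
    for value in mapped.values():
        # a property may be mapped to one name or to a tuple/list of fused names
        names = tuple(value) if isinstance(value, (tuple, list)) else (value,)
        # data variables such as "fx" and static values are not dimensions
        used.update(n for n in names if n in dims)
    return [d for d in dims if d not in used]


# the heatmap example above
print(unused_dims(dims, x="x", y="delta", z="fx", col="p", row="amp"))  # ['C']
```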

Aggregation & Errors

By default (apart from heatmaps) data is not aggregated, and every slice is shown. You can use the aggregate=(dima, dimb, ...) option to reduce the data over those dimensions, which also produces bands or error bars (see the aggregate_method, aggregate_err_range, err, err_style and err_kws options).

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # specify a single dimension to aggregate over
    aggregate="delta",
)
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # combine with property mapping
    color=("amp", "p"),
    # aggregate over all remaining dimensions
    aggregate=True,
)

The default aggregation and error method is the median and interquartile range, since these are robust to e.g. exponentially distributed data. You can specify other methods with the aggregate_method and aggregate_err_range options.

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # aggregate over all unused dimensions
    aggregate=True,
    # show mean and standard deviation
    aggregate_method="mean",
    aggregate_err_range="std",
    err_style="bars",
)

You can also supply the error values directly with the err kwarg, and control the style of error bars with err_style and err_kws.
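The robustness argument for the median and interquartile range can be checked on a small skewed group using only the standard library. This is a sketch and not xyzpy's aggregation code. The option strings "iqr" and "std", and the use of the sample standard deviation, are assumptions made for illustration.

```python
import statistics


def aggregate(values, method="median", err_range="iqr"):
    """Sketch: reduce one group of values to (center, lower, upper)."""
    if method == "median":
        center = statistics.median(values)
    elif method == "mean":
        center = statistics.mean(values)
    else:
        raise ValueError(f"unknown method {method!r}")
    if err_range == "iqr":
        # first and third quartiles
        q1, _, q3 = statistics.quantiles(values, n=4)
        return center, q1, q3
    if err_range == "std":
        # one (sample) standard deviation either side of the center
        sd = statistics.stdev(values)
        return center, center - sd, center + sd
    raise ValueError(f"unknown err_range {err_range!r}")


# a skewed group with one large outlier
group = [1, 1, 2, 2, 3, 3, 4, 100]
print(aggregate(group))  # (2.5, 1.25, 3.75)
center, lower, upper = aggregate(group, method="mean", err_range="std")
print(center, lower < min(group))  # 14.5 True
```

With the outlier present, the mean and standard deviation band extends below every observed value. The median and quartiles stay within the bulk of the data.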

List of mappable visual properties

Depending on the type of plot, the following visual properties can be mapped to dimensions/coordinates (or possibly other data variables) of the dataset:

Main variables:

  • x: x-axis

  • y: y-axis, the target quantity; if not supplied, the x data variable is binned into a histogram

  • z: z-axis (for heatmaps only)

  • err: explicit error values

Style:

  • color: color of the line

  • hue: if supplied as well as color, hue selects the palette and color sets the intensity within it (if color is not supplied, hue is equivalent to color)

  • marker: marker type

  • markersize: marker size

  • markeredgecolor: color of the marker edge

  • linestyle: line style

  • linewidth: line width

Subplot:

  • col: subplot column

  • row: subplot row

Other:

  • text: text label next to each point (useful combined with xlink), see also text_formatter and text_opts.

  • hspans: horizontal spans (per row or column if mapped)

  • vspans: vertical spans (per row or column if mapped)

Hint

As mentioned before, multiple dimensions can also be ‘fused’ (stacked) and mapped to the same property using a tuple or list. If the supplied option is not a coordinate of the dataset, it is assumed to be a global/static value for that property. To supply arbitrary options to the underlying matplotlib plot function, use kwargs=dict(...).

fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    yscale="symlog",
    ylabel="$f(x)$",
    col="p",
    row="amp",
    hue="C",
    markeredgecolor="C",
    linestyle="C",
    color="delta",
    marker="delta",
    linewidth="delta",
    markersize="delta",
)

Manual values, orders and labels

Each mappable property can be manually controlled with extra kwargs (here {style} is a placeholder for color, marker, linestyle etc.):

  • {style}_order: list of specific dimension values in the order they should be plotted (this is equivalent to ds.sel(dim=style_order).xyz.plot(...)).

  • {style}s: list of specific styles to use for each value.

  • {style}_label: a custom label for this dimension in the legend.

  • {style}_ticklabels: a list of specific tick labels to use for this dimension.

Here’s an example specifically for the color property:

fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    color=("amp", "p"),
    color_order=[(1.0, 1), (-1.0, 2)],
    colors=("pink", "purple"),
    color_label="$A \\otimes p$",
    color_ticklabels=["$+^1$", "$-^2$"],
)
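The combined effect of color_order and colors in the example above can be pictured as a lookup table. Values listed in the order keep their assigned style, and all others are dropped, which mirrors the ds.sel(...) equivalence. The style_map helper below is hypothetical. Its error checks are choices made for this sketch, not documented xyzpy behaviour.

```python
def style_map(values, order, styles):
    """Sketch: keep only the values listed in `order`, pairing each with a style."""
    missing = [v for v in order if v not in values]
    if missing:
        raise KeyError(f"not present in the data: {missing}")
    if len(styles) < len(order):
        raise ValueError("this sketch needs one style per ordered value")
    return dict(zip(order, styles))


# the fused ("amp", "p") values of the example dataset
fused = [(a, p) for a in (-1.0, 1.0) for p in (1, 2, 3)]
mapping = style_map(fused, order=[(1.0, 1), (-1.0, 2)], styles=("pink", "purple"))
print(mapping)  # {(1.0, 1): 'pink', (-1.0, 2): 'purple'}
```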

Colors and palettes

xyz.plot separates “which palette?” from “where inside that palette?”. If you only supply color or hue, then these act the same way: that dimension is mapped to a sequence of colors or to a single palette.

  • you can specify the sequence of colors as colors (or hues)

  • you can specify a palette using palette=, which can be:

    • a string name of a palette family (e.g. “viridis”, “turbo”, etc.)

    • an explicit matplotlib colormap object

    • a number in [0.0, 1.0], interpreted as a hue from which an automatic palette is generated.

fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    hue=("amp", "delta"),
    palette="turbo",
    marker="",
)

Automatic palettes are generated in the OKLCH color space so as to be perceptually uniform. You can generate and preview them directly with xyzpy.cmoke().

xyz.cmoke(0.12)
# vary hue as well as lightness
xyz.cmoke(0.75, hue_shift=0.5)
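The sketch below illustrates the general idea behind such palettes rather than reproducing xyz.cmoke. It holds an OKLCH hue and chroma fixed and steps the lightness, converting each color to sRGB with the OKLab matrices published by Björn Ottosson. The function names are hypothetical. The mapping of a hue in [0, 1] to an angle in degrees, and the default chroma and lightness range, are also assumptions.

```python
import math


def oklch_to_srgb(L, C, h_deg):
    """Convert an OKLCH color to gamma-encoded sRGB, clipped to [0, 1]."""
    a = C * math.cos(math.radians(h_deg))
    b = C * math.sin(math.radians(h_deg))
    # OKLab -> cube-root LMS, then cube (matrices from Ottosson's OKLab)
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    l, m, s = l_ ** 3, m_ ** 3, s_ ** 3
    # LMS -> linear sRGB
    rgb_linear = (
        +4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
        -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
        -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s,
    )

    def encode(c):
        # clip out-of-gamut values, then apply the sRGB transfer function
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(encode(c) for c in rgb_linear)


def oklch_palette(hue, n=5, chroma=0.1, l_start=0.3, l_stop=0.9):
    """Hypothetical: n colors at a fixed hue in [0, 1], evenly spaced in lightness."""
    step = (l_stop - l_start) / max(n - 1, 1)
    return [oklch_to_srgb(l_start + i * step, chroma, 360 * hue) for i in range(n)]


palette = oklch_palette(0.12)
```

Because lightness in OKLCH is designed to track perceived brightness, equal steps in L should look like roughly equal steps in brightness. Steps in sRGB values would not.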

If both hue and color are supplied, then hue controls which palette and color controls where in that palette (i.e. the intensity).

  • in this case the hues kwarg can manually specify a sequence of palettes, each of which can be str, float or colormap (the same options as palette).

fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    hue="amp",
    hues=(
        # numeric value is interpreted as a hue in [0, 1]
        0.12,
        # or explicitly supply a full palette
        xyz.cmoke(0.75, hue_shift=0.5),
    ),
    color="delta",
    marker="",
    # these can be convenient for using only a subrange of a palette
    colormap_start=0.1,
    colormap_stop=0.9,
)
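One way to picture colormap_start and colormap_stop is to place the values of the color dimension at evenly spaced positions inside that sub-range of each hue's palette. This is a sketch under that assumption. Exactly how xyzpy spaces the positions is not stated here, and palette_positions is a hypothetical name.

```python
def palette_positions(n, colormap_start=0.0, colormap_stop=1.0):
    """Sketch: n evenly spaced positions between start and stop of a palette."""
    step = (colormap_stop - colormap_start) / max(n - 1, 1)
    return [colormap_start + i * step for i in range(n)]


# e.g. the 6 'delta' values placed within each 'amp' palette
print([round(t, 2) for t in palette_positions(6, 0.1, 0.9)])
# [0.1, 0.26, 0.42, 0.58, 0.74, 0.9]
```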

Legend control

By default the legend is ‘split’ (legend_merge=False), which creates a separate legend block for each mapped dimension:

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    color="p",
    linestyle="amp",
    linewidth="C",
    # the default:
    legend_merge=False,
)

If you specify legend_merge=True, then instead a legend will be produced with one entry for every unique combination of styles.

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    color="p",
    linestyle="amp",
    linewidth="C",
    legend_merge=True,
    # manually specify number of columns
    legend_ncol=2,
)
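The difference in legend size between the two modes can be seen by counting entries for the example above. This assumes each mapped dimension contributes one entry per value in a split legend, and one entry per unique combination in a merged legend. It is an illustration, not xyzpy code.

```python
from itertools import product

# mapped dimensions in the example: p -> color, amp -> linestyle, C -> linewidth
mapped = {"p": [1, 2, 3], "amp": [-1.0, 1.0], "C": [-2.0, 1.0, 4.0]}

# split legend: one block per mapped dimension, one entry per value
n_split = sum(len(values) for values in mapped.values())

# merged legend: one entry per unique combination of styles
merged_entries = list(product(*mapped.values()))
n_merged = len(merged_entries)

print(n_split, n_merged)  # 8 18
```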

If you are fusing various dimensions, or using legend_merge=True, then the number of entries can quite quickly become large. legend_entries: dict[dim, values] lets you explicitly specify which values for each dimension to show.

fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # fuse and map 4 dimensions
    color=("p", "amp", "C", "delta"),
    marker=("p", "amp", "C", "delta"),
    linestyle=("p", "amp", "C", "delta"),
    palette="inferno",
    # only show 3 example entries in the legend
    legend_entries={
        # note we need to specify the fused dimension name
        "p, amp, C, delta": [
            (1, -1.0, -2.0, 0.0),
            (2, -1.0, 1.0, 0.6),
            (3, 1.0, 4.0, 1.0),
        ]
    },
    markeredgecolor="none",
    markersize=4,
)

legend_reverse, legend_labels, legend_extras, and legend_opts let you tune ordering, layout, and manual additions.

Figure and axis customization

Most figure options intentionally follow matplotlib naming. ds.xyz.plot() always returns (fig, axs), so the built-in options are a starting point rather than a limit.

Useful controls include:

  • title, height, width, hspace, wspace, sharex, sharey

  • xlabel, ylabel, xlim, ylim, xscale, yscale, xbase, ybase

  • xticks, yticks, xticklabels, yticklabels

  • hspans and vspans, either as plain sequences or as {label: value} dictionaries for annotated reference lines

  • background_color for a solid figure background

  • zscale and zlim for heatmap color normalization

fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    color="delta",
    col="p",
    row="amp",
    marker="",
    title="Custom subplot layout and reference lines",
    height=2.0,
    width=3.2,
    hspace=0.25,
    wspace=0.03,
    xlim=(-2, 2),
    ylim=(-8, 8),
    xticks=[-1, 0, 1],
    # map horizontal span location to 'amp' values
    hspans="amp",
    # map vertical span location to specific values with labels
    vspans={"$x=\\frac{1}{\\pi}$": 0.318309886},
    span_color=(0.0, 0.0, 1.0),
    background_color=(0.9, 0.9, 0.9),
    draw_color=(0, 0, 0),
)

You can also pass existing axes via axs or ax to plot on them. The format_axs option controls whether xyzpy should modify these axes at all, including applying its ‘neutral style’. That style works on both light and dark backgrounds but may not be desired for producing figures; see also use_neutral_style.

Alternative plotting libraries

There are many other libraries for plotting self-labelled data, especially pandas.DataFrame objects, which can easily be generated from an xarray.Dataset using e.g.:

# ds can be any dataset, e.g. harvester.full_ds or the one generated above
df = ds.to_dataframe().reset_index()

Notably:

  • xarray itself has plotting functionality

  • pandas itself has plotting functionality

  • seaborn uses matplotlib for plotting dataframes

  • holoviews uses bokeh for plotting dataframes

  • hvplot builds on holoviews for plotting datasets

  • altair for dataframes