Plotting
The plotting functionality of xyzpy is oriented towards quickly getting an
overview of high-dimensional gridded data. It is provided through a single
method with a simple interface (and autocorrected kwargs), which lets you
encode the dimensions/coordinates as the various visual properties of either
a line-plot (the main focus), scatter-plot, histogram or heatmap. Once xyzpy
is imported, the method is accessed as dataset.xyz.plot(). For this
example we’ll first generate a basic dataset to plot, with 5 dimensions
and 2 data variables:
%config InlineBackend.figure_formats = ['svg']
import xyzpy as xyz
@xyz.label(["fx", "dfx"])
def foo(x, delta, p, amp=1.0, C=0.0):
    # return two data variables: the function value and its derivative
    return (
        amp * (x - delta) ** p + C,
        amp * p * (x - delta) ** (p - 1),
    )
ds = foo.run_combos(
    combos=dict(
        x=[-2 + i * 0.25 for i in range(17)],
        p=[1, 2, 3],
        delta=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
        C=[-2.0, 1.0, 4.0],
        amp=[-1.0, 1.0],
    ),
)
ds
100%|##########| 1836/1836 [00:00<00:00, 1544472.95it/s]
<xarray.Dataset> Size: 30kB
Dimensions: (x: 17, p: 3, delta: 6, C: 3, amp: 2)
Coordinates:
* x (x) float64 136B -2.0 -1.75 -1.5 -1.25 -1.0 ... 1.25 1.5 1.75 2.0
* p (p) int64 24B 1 2 3
* delta (delta) float64 48B 0.0 0.2 0.4 0.6 0.8 1.0
* C (C) float64 24B -2.0 1.0 4.0
* amp (amp) float64 16B -1.0 1.0
Data variables:
fx (x, p, delta, C, amp) float64 15kB 0.0 -4.0 3.0 ... 2.0 3.0 5.0
dfx (x, p, delta, C, amp) float64 15kB -1.0 1.0 -1.0 ... 3.0 -3.0 3.0
Plot types
Which options are supplied to the x, y and z kwargs determines what type
of plot is produced:
- x coordinate and y data variable: line-plot
- x data variable only: histogram
- x data variable and y data variable: scatter-plot (or line-plot if xlink is supplied)
- x coordinate, y coordinate and z data variable: heatmap
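To make these rules concrete, here is a small pure-Python sketch of the dispatch logic. The helper infer_plot_type is hypothetical (it is not part of xyzpy) and only mirrors the four cases listed above; xyzpy's actual checks may differ in detail.

```python
# Hypothetical helper, NOT part of xyzpy: mirrors the four rules listed above.
def infer_plot_type(coords, data_vars, x, y=None, z=None, xlink=None):
    """Return the kind of plot a given x/y/z combination would produce."""
    if z is not None:
        # two coordinates on the axes, one data variable as the color
        if x in coords and y in coords and z in data_vars:
            return "heatmap"
    elif y is None:
        # a lone data variable gets binned
        if x in data_vars:
            return "histogram"
    elif x in coords and y in data_vars:
        return "line"
    elif x in data_vars and y in data_vars:
        # xlink joins the points into lines instead of scattering them
        return "line" if xlink is not None else "scatter"
    raise ValueError(f"unsupported combination x={x!r}, y={y!r}, z={z!r}")


# names from the example dataset generated above
coords = {"x", "p", "delta", "C", "amp"}
data_vars = {"fx", "dfx"}

print(infer_plot_type(coords, data_vars, x="x", y="dfx"))              # line
print(infer_plot_type(coords, data_vars, x="fx"))                      # histogram
print(infer_plot_type(coords, data_vars, x="fx", y="dfx"))             # scatter
print(infer_plot_type(coords, data_vars, x="fx", y="dfx", xlink="x"))  # line
print(infer_plot_type(coords, data_vars, x="x", y="delta", z="fx"))    # heatmap
```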
Line-plots
Line plots are the main focus of the plotting functionality. By default every slice of the data is shown as a separate line, with no particular visual differentiation:
fig, axs = ds.xyz.plot(x="x", y="dfx")
Line properties can either be set directly or mapped to a particular dimension/coordinate by supplying its name (note multiple properties can be mapped to the same dimension):
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # mapped:
    color="amp",
    # double mapped:
    marker="p",
    linestyle="p",
    # static:
    markersize=6,
)
Supplying a tuple of dimension/coordinate names to a mapped property will create a combined (fused) dimension, allowing essentially an arbitrary number of dimensions to be mapped:
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    color=("amp", "p"),
)
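Fusing can be pictured as replacing several dimensions with a single one whose values are all combinations of theirs. The pure-Python sketch below counts how many separate lines the plots above contain, and how many distinct colors color=("amp", "p") produces. The fuse helper is hypothetical; joining the names with ", " mirrors the fused dimension name that legend_entries expects further down this page.

```python
from itertools import product
from math import prod

# coordinate values of the example dataset
coords = {
    "x": [-2 + i * 0.25 for i in range(17)],
    "p": [1, 2, 3],
    "delta": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "C": [-2.0, 1.0, 4.0],
    "amp": [-1.0, 1.0],
}


def fuse(coords, dims):
    """Hypothetical helper: stack several dimensions into one, named by joining
    the dimension names with ", " and taking every combination of values."""
    name = ", ".join(dims)
    values = list(product(*(coords[d] for d in dims)))
    return name, values


# with only x on an axis, every combination of the other dims is its own line
n_lines = prod(len(v) for d, v in coords.items() if d != "x")
name, values = fuse(coords, ("amp", "p"))

print(n_lines)      # 108
print(name)         # amp, p
print(len(values))  # 6 colors, each shared by 108 // 6 = 18 lines
print(values[:3])   # [(-1.0, 1), (-1.0, 2), (-1.0, 3)]
```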
The join_across_missing kwarg controls what happens where data is missing:
by default that section of the line is skipped, or it can be joined across instead.
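The two behaviours can be sketched in pure Python as either splitting a line into separate segments at missing (NaN) values, or simply dropping the missing points so their neighbours get connected. The line_segments helper is hypothetical; in practice matplotlib itself leaves a gap wherever a plotted value is NaN, which matches the default.

```python
import math


def line_segments(xs, ys, join_across_missing=False):
    """Hypothetical helper: split a line into drawable segments around NaNs."""
    points = list(zip(xs, ys))
    if join_across_missing:
        # drop missing points, so the neighbours either side are connected
        return [[(x, y) for x, y in points if not math.isnan(y)]]
    segments, current = [], []
    for x, y in points:
        if math.isnan(y):
            # a gap ends the current segment
            if current:
                segments.append(current)
            current = []
        else:
            current.append((x, y))
    if current:
        segments.append(current)
    return segments


xs = [0, 1, 2, 3, 4]
ys = [0.0, 1.0, float("nan"), 3.0, 4.0]
print(line_segments(xs, ys))
# [[(0, 0.0), (1, 1.0)], [(3, 3.0), (4, 4.0)]]
print(line_segments(xs, ys, join_across_missing=True))
# [[(0, 0.0), (1, 1.0), (3, 3.0), (4, 4.0)]]
```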
Histograms
Histograms simply bin a single data variable, then use the frequency / probability density as the variable supplied to a line plot (as such they support mostly the same set of options for mapping visual properties):
fig, axs = ds.xyz.plot(
    x="fx",
    yscale="log",
    ybase=2,
    color="green",
    markeredgecolor="amp",
    marker="amp",
)
By default the number of bins is calculated automatically, but you can specify either the number of bins (as an int) or the actual bin boundaries (as a sequence) with the bins kwarg. Bin counts (frequency) are shown by default, but you can show the probability density instead with the bins_density: bool kwarg:
fig, axs = ds.xyz.plot(
    x="fx",
    bins=101,
    bins_density=False,
    color="p",
    col="p",
    marker="",
    linewidth=0.5,
)
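As a sketch of the difference between the two bins_density settings: counts are the number of values per bin, while the density divides each count by (total number of values × bin width), so that the histogram integrates to one. That is the same normalization numpy.histogram(..., density=True) uses. The equal-width binning and the histogram helper are assumptions for illustration; xyzpy's automatic choice of bin number is not reproduced here.

```python
def histogram(values, bins, density=False):
    """Equal-width histogram over the range of `values` (hypothetical helper).

    Returns (edges, heights): heights are bin counts, or with density=True
    counts / (total * bin_width), so that sum(heights) * bin_width == 1.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # the maximum value would land in bin `bins`, so clamp it into the last bin
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    if not density:
        return edges, counts
    total = len(values)
    return edges, [c / (total * width) for c in counts]


data = [0, 1, 1, 2, 2, 2, 3, 4]
edges, counts = histogram(data, bins=4)
_, density = histogram(data, bins=4, density=True)
print(counts)   # [1, 2, 3, 2]
print(density)  # [0.125, 0.25, 0.375, 0.25]
```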
Scatter plots
Scatter plots are triggered when you plot one data variable against another:
fig, axs = ds.xyz.plot(
    x="fx",
    y="dfx",
    xscale="symlog",
    yscale="symlog",
    color="#FF7700",
    markersize="delta",
    markeredgecolor="none",
)
Line plots can also be made from two data variables by using xlink.
For example, in the following code each sweep across the x coordinate is
connected into a line, but the points are actually placed at the values of fx (and dfx).
fig, axs = ds.xyz.plot(
    x="fx",
    xscale="symlog",
    xlink="x",
    y="dfx",
    yscale="symlog",
    marker="",
    alpha=0.5,
    color=("amp", "p"),
)
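Conceptually, xlink="x" produces one line per combination of the remaining dimensions, visits its points in order of x, and places each point at (fx, dfx). The sketch below recomputes this by hand for two such lines, using the same function that generated the dataset. The xlinked_lines helper is hypothetical and only illustrates the idea.

```python
from itertools import product


def foo(x, delta, p, amp=1.0, C=0.0):
    # same function as in the dataset example, without the xyzpy decorator
    return (
        amp * (x - delta) ** p + C,
        amp * p * (x - delta) ** (p - 1),
    )


def xlinked_lines(xs, **other_dims):
    """Hypothetical helper: one polyline per combination of `other_dims`,
    with points visited in order of the linking coordinate `xs` but placed
    at the (fx, dfx) values they produce."""
    names = list(other_dims)
    lines = {}
    for combo in product(*other_dims.values()):
        kwargs = dict(zip(names, combo))
        lines[combo] = [foo(x, **kwargs) for x in sorted(xs)]
    return lines


xs = [-2 + i * 0.25 for i in range(17)]
lines = xlinked_lines(xs, delta=[0.0], p=[2, 3], amp=[1.0], C=[0.0])
print(len(lines))                     # 2, one line per value of p
print(lines[(0.0, 2, 1.0, 0.0)][:2])  # [(4.0, -4.0), (3.0625, -3.5)]
```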
Hint
This can be very useful when plotting, for example, error vs computational time
‘xlinked’ to some kind of ‘effort’ control knob. See also the text option, which
you can then use to annotate each point with its xlink dimension value.
Heatmaps
Heatmaps can be drawn by specifying a z data variable in addition to x and
y coordinates. Since only a single data trace can be shown in a heatmap, most
visual properties are not supported (just row and col for faceting).
Instead, unused dimensions must be aggregated over - this will happen
automatically (with a warning if you don’t explicitly set aggregate).
fig, axs = ds.xyz.plot(
    x="x",
    y="delta",
    z="fx",
    zscale="symlog",
    palette="Spectral",
    col="p",
    row="amp",
    aggregate=["C"],
)
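The bookkeeping behind the automatic aggregation can be sketched as follows: any dimension not used by x, y, row or col must be reduced over, and those among them not listed in aggregate are the ones that would be aggregated implicitly (with the warning). The dims_to_aggregate helper is hypothetical, and xyzpy's exact warning logic may differ.

```python
def dims_to_aggregate(all_dims, x, y, row=None, col=None, aggregate=()):
    """Hypothetical helper for a heatmap: return (unused, implicit) where
    `unused` must be reduced over and `implicit` are those not listed in
    `aggregate`, i.e. the ones that would trigger a warning."""
    if aggregate is True:
        # aggregate=True means "all remaining dimensions"
        aggregate = list(all_dims)
    elif isinstance(aggregate, str):
        aggregate = [aggregate]
    used = {x, y, row, col} - {None}
    unused = [d for d in all_dims if d not in used]
    implicit = [d for d in unused if d not in aggregate]
    return unused, implicit


dims = ["x", "p", "delta", "C", "amp"]

# the heatmap call above: nothing is left to aggregate implicitly
print(dims_to_aggregate(dims, "x", "delta", row="amp", col="p", aggregate=["C"]))
# (['C'], [])

# without row/col/aggregate, three dimensions would be reduced with a warning
print(dims_to_aggregate(dims, "x", "delta"))
# (['p', 'C', 'amp'], ['p', 'C', 'amp'])
```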
Aggregation & Errors
By default (apart from heatmaps), data is not aggregated and is simply all
shown. You can use the aggregate=(dima, dimb, ...) option to reduce the
data over those dimensions, which also produces bands or error bars (see
aggregate_method, aggregate_err_range, err, err_style, err_kws
options).
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # specify a single dimension to aggregate over
    aggregate="delta",
)
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # combine with property mapping
    color=("amp", "p"),
    # aggregate over all remaining dimensions
    aggregate=True,
)
The default aggregation and error calculation method is the median and
interquartile range (since these are robust to outliers and to skewed, e.g. exponentially distributed, data), but you can specify other methods with the aggregate_method and aggregate_err_range options.
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # aggregate over all unused dimensions
    aggregate=True,
    # show mean and standard deviation
    aggregate_method="mean",
    aggregate_err_range="std",
    err_style="bars",
)
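To see why the median and interquartile range make the more robust default, we can compare both options on a small sample containing one outlier, using only Python's statistics module. The aggregate helper and its "iqr" option name are hypothetical stand-ins for aggregate_method and aggregate_err_range; note also that quartile conventions vary between libraries (statistics.quantiles defaults to its "exclusive" method).

```python
import statistics


def aggregate(values, method="median", err_range="iqr"):
    """Hypothetical helper: reduce values to (centre, lower, upper)."""
    if method == "median":
        centre = statistics.median(values)
    elif method == "mean":
        centre = statistics.mean(values)
    else:
        raise ValueError(f"unknown method {method!r}")
    if err_range == "iqr":
        # first and third quartiles bound the band
        q1, _, q3 = statistics.quantiles(values, n=4)
        return centre, q1, q3
    if err_range == "std":
        sd = statistics.stdev(values)
        return centre, centre - sd, centre + sd
    raise ValueError(f"unknown err_range {err_range!r}")


# seven well-behaved values and one large outlier
sample = [1, 2, 3, 4, 5, 6, 7, 1000]
print(aggregate(sample))                 # (4.5, 2.25, 6.75)
print(aggregate(sample, "mean", "std"))  # centre 128.5, band roughly -224 to 481
```

The median band stays around the bulk of the data, while the single outlier drags the mean and inflates the standard deviation.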
You can also supply the error values directly with the err kwarg, and control the style of error bars with err_style and err_kws.
List of mappable visual properties
Depending on the type of plot, the following visual properties can be mapped to dimensions/coordinates (or possibly other data variables) of the dataset:
Main variables:
- x: x-axis
- y: y-axis, the target quantity (if not supplied, the data is binned over x)
- z: z-axis (for heatmaps only)
- err: explicit error values
Style:
- color: color of the line
- hue: if supplied as well as color, this controls the hue (palette) and color the intensity (if color is not supplied then hue is equivalent to color)
- marker: marker type
- markersize: marker size
- markeredgecolor: color of the marker edge
- linestyle: line style
- linewidth: line width
Subplot:
- col: subplot column
- row: subplot row
Other:
- text: text label next to each point (useful combined with xlink), see also text_formatter and text_opts
- hspans: horizontal spans (per row or column if mapped)
- vspans: vertical spans (per row or column if mapped)
Hint
As mentioned before, multiple dimensions can also be ‘fused’ (stacked) and
mapped to the same property using a tuple or list. If the supplied option is
not a coordinate of the dataset, it is assumed to be a global/static value for
that property. To supply arbitrary options to the underlying matplotlib plot
function, use kwargs=dict(...).
fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    yscale="symlog",
    ylabel="$f(x)$",
    col="p",
    row="amp",
    hue="C",
    markeredgecolor="C",
    linestyle="C",
    color="delta",
    marker="delta",
    linewidth="delta",
    markersize="delta",
)
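The rule from the hint can be sketched as a small classifier. resolve_option is hypothetical and not xyzpy's code; in this sketch a tuple that is not made of coordinate names, such as an RGB color, is treated as a static value.

```python
def resolve_option(value, coords):
    """Hypothetical helper: classify a kwarg value per the rule in the hint.

    A coordinate name is mapped, a tuple/list of coordinate names is fused
    (named by joining with ", "), and anything else is a static value.
    """
    if isinstance(value, str) and value in coords:
        return "mapped", value
    if (
        isinstance(value, (tuple, list))
        and len(value) > 0
        and all(isinstance(v, str) and v in coords for v in value)
    ):
        return "fused", ", ".join(value)
    return "static", value


coords = {"x", "p", "delta", "C", "amp"}
print(resolve_option("delta", coords))          # ('mapped', 'delta')
print(resolve_option(("amp", "p"), coords))     # ('fused', 'amp, p')
print(resolve_option("green", coords))          # ('static', 'green')
print(resolve_option((0.0, 0.0, 1.0), coords))  # ('static', (0.0, 0.0, 1.0))
```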
Manual values, orders and labels
Each mappable property can be manually controlled with extra kwargs (here {style} is a placeholder for color, marker, linestyle etc.):
- {style}_order: list of specific dimension values in the order they should be plotted (this is equivalent to ds.sel(dim=style_order).xyz.plot(...)).
- {style}s: list of specific styles to use for each value.
- {style}_label: a custom label for this dimension in the legend.
- {style}_ticklabels: a list of specific tick labels to use for this dimension.
Here’s an example specifically for the color property:
fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    color=("amp", "p"),
    color_order=[(1.0, 1), (-1.0, 2)],
    colors=("pink", "purple"),
    color_label="$A \\otimes p$",
    color_ticklabels=["$+^1$", "$-^2$"],
)
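With a fused property, color_order selects and orders combinations of the fused values, and colors and color_ticklabels are then matched to those combinations by position. A pure-Python sketch of that bookkeeping for the example above (illustrative only, not xyzpy's internals):

```python
from itertools import product

# the fused ("amp", "p") dimension takes every (amp, p) combination
fused_values = list(product([-1.0, 1.0], [1, 2, 3]))

color_order = [(1.0, 1), (-1.0, 2)]
colors = ("pink", "purple")
color_ticklabels = ["$+^1$", "$-^2$"]

# only the listed combinations are plotted, in the given order
kept = [v for v in color_order if v in fused_values]
# styles and tick labels are paired with the kept values by position
style_map = {v: (c, t) for v, c, t in zip(kept, colors, color_ticklabels)}

print(len(fused_values), len(kept))  # 6 2
print(style_map[(1.0, 1)])           # ('pink', '$+^1$')
print(style_map[(-1.0, 2)])          # ('purple', '$-^2$')
```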
Colors and palettes
xyz.plot separates “which palette?” from “where inside that palette?”. If you
only supply color or hue, then these act the same way: that dimension is
mapped to a sequence of colors or to a single palette.
- you can specify the sequence of colors with colors (or hues)
- you can specify a palette using palette=, which can be:
  - a string name of a palette family (e.g. “viridis”, “turbo”, etc.)
  - an explicit matplotlib colormap object
  - a number in [0.0, 1.0] to generate an automatic palette from.
fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    hue=("amp", "delta"),
    palette="turbo",
    marker="",
)
Automatic palettes are generated in the OKLCH color space so as to be
perceptually uniform. You can generate and preview them directly with
xyzpy.cmoke().
xyz.cmoke(0.12)
# vary hue as well as lightness
xyz.cmoke(0.75, hue_shift=0.5)
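For the curious, the core of an OKLCH palette is converting OKLCH (lightness, chroma, hue angle) to sRGB, here via Björn Ottosson's published OKLab matrices. The sketch below builds a simple lightness ramp at a fixed hue. It only illustrates the color space: lightness_ramp and its arguments are hypothetical, it does not reproduce xyzpy.cmoke's actual parameters (such as hue_shift) or gamut handling, and out-of-gamut channels are simply clipped. Expressing the hue as a fraction of a full turn mirrors the float-as-hue comment in the hues example that follows.

```python
import math


def oklch_to_srgb(L, C, h_deg):
    """Convert OKLCH (lightness, chroma, hue in degrees) to sRGB in [0, 1].

    Uses Björn Ottosson's published OKLab -> linear sRGB matrices; channels
    outside [0, 1] are simply clipped (no proper gamut mapping).
    """
    # polar (C, h) -> Cartesian (a, b)
    a = C * math.cos(math.radians(h_deg))
    b = C * math.sin(math.radians(h_deg))
    # OKLab -> nonlinear LMS, then undo the cube-root compression
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    l, m, s = l_ ** 3, m_ ** 3, s_ ** 3
    # LMS -> linear sRGB
    linear = (
        4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
        -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
        -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s,
    )

    def encode(c):
        # clip, then apply the standard sRGB transfer function
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(encode(c) for c in linear)


def lightness_ramp(hue, n=5, chroma=0.05, l_min=0.3, l_max=0.9):
    """Hypothetical palette: fixed hue (as a fraction of a turn, in [0, 1])
    and chroma, with OKLCH lightness evenly spaced from l_min to l_max."""
    return [
        oklch_to_srgb(l_min + (l_max - l_min) * i / (n - 1), chroma, 360 * hue)
        for i in range(n)
    ]


palette = lightness_ramp(0.12)
for rgb in palette:
    print(tuple(round(c, 3) for c in rgb))
```

Because OKLCH lightness is designed to track perceived lightness, equal steps in L give roughly equal perceived steps along the palette.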
If both hue and color are supplied, then hue controls which palette and
color controls where in that palette (i.e. the intensity).
In this case the hues kwarg can manually specify a sequence of palettes, each of
which can be a str, float or colormap (the same options as palette).
fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    hue="amp",
    hues=(
        # numeric value is interpreted as a hue in [0, 1]
        0.12,
        # or explicitly supply a full palette
        xyz.cmoke(0.75, hue_shift=0.5),
    ),
    color="delta",
    marker="",
    # these can be convenient for using only a subrange of a palette
    colormap_start=0.1,
    colormap_stop=0.9,
)
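The split between the two roles can be sketched as: hue indexes into the sequence of palettes, while color picks a position within [colormap_start, colormap_stop] of that palette. Both helpers below are hypothetical, and spacing the positions evenly by rank is an assumption made for illustration.

```python
def palette_positions(n, start=0.0, stop=1.0):
    """Evenly spaced positions in [start, stop] for n distinct color values."""
    if n == 1:
        return [(start + stop) / 2]
    # linear interpolation written so the endpoints are exactly start and stop
    return [start * (1 - i / (n - 1)) + stop * (i / (n - 1)) for i in range(n)]


def assign_colors(hue_values, color_values, start=0.0, stop=1.0):
    """Hypothetical helper: map each (hue, color) pair to
    (index of the palette in `hues`, position within that palette)."""
    positions = palette_positions(len(color_values), start, stop)
    return {
        (h, c): (palette_idx, pos)
        for palette_idx, h in enumerate(hue_values)
        for c, pos in zip(color_values, positions)
    }


# hue="amp" selects one of two palettes, color="delta" the position in it
styles = assign_colors(
    [-1.0, 1.0], [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], start=0.1, stop=0.9
)
print(styles[(-1.0, 0.0)])  # (0, 0.1)
print(styles[(1.0, 1.0)])   # (1, 0.9)
```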
Legend control
By default the legend is ‘split’ (legend_merge=False), which creates a separate
legend block for each mapped dimension:
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    color="p",
    linestyle="amp",
    linewidth="C",
    # the default:
    legend_merge=False,
)
If you specify legend_merge=True, then instead a legend will be produced with
one entry for every unique combination of styles.
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    color="p",
    linestyle="amp",
    linewidth="C",
    legend_merge=True,
    # manually specify number of columns
    legend_ncol=2,
)
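The difference in legend size follows directly from the two modes: a split legend needs one entry per value of each mapped dimension, whereas a merged legend needs one entry per combination. For the two examples above (p, amp and C have 3, 2 and 3 values), and for fusing all four non-x dimensions as in the next example:

```python
from math import prod

# number of values of each dimension mapped in the examples above
sizes = {"p": 3, "amp": 2, "C": 3}

split_entries = sum(sizes.values())    # legend_merge=False: one block per dimension
merged_entries = prod(sizes.values())  # legend_merge=True: one entry per combination
print(split_entries, merged_entries)   # 8 18

# fusing p, amp, C and delta (3, 2, 3 and 6 values) into a single dimension
fused_entries = prod([3, 2, 3, 6])
print(fused_entries)                   # 108
```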
If you are fusing various dimensions, or using legend_merge=True, then the
number of entries can quite quickly become large. legend_entries: dict[dim, values]
lets you explicitly specify which values for each dimension to show.
fig, axs = ds.xyz.plot(
    x="x",
    y="dfx",
    # fuse and map 4 dimensions
    color=("p", "amp", "C", "delta"),
    marker=("p", "amp", "C", "delta"),
    linestyle=("p", "amp", "C", "delta"),
    palette="inferno",
    # only show 3 example entries in the legend
    legend_entries={
        # note we need to specify the fused dimension name
        "p, amp, C, delta": [
            (1, -1.0, -2.0, 0.0),
            (2, -1.0, 1.0, 0.6),
            (3, 1.0, 4.0, 1.0),
        ]
    },
    markeredgecolor="none",
    markersize=4,
)
legend_reverse, legend_labels, legend_extras, and legend_opts let you tune ordering, layout, and manual additions.
Figure and axis customization
Most figure options intentionally follow matplotlib naming.
ds.xyz.plot() always returns (fig, axs), so you can keep customizing the result
with matplotlib directly: the built-in options are a starting point rather than a limit.
Useful controls include:
- title, height, width, hspace, wspace, sharex, sharey
- xlabel, ylabel, xlim, ylim, xscale, yscale, xbase, ybase
- xticks, yticks, xticklabels, yticklabels
- hspans and vspans, either as plain sequences or as {label: value} dictionaries for annotated reference lines
- background_color for a solid figure background
- zscale and zlim for heatmap color normalization
fig, axs = ds.xyz.plot(
    x="x",
    y="fx",
    color="delta",
    col="p",
    row="amp",
    marker="",
    title="Custom subplot layout and reference lines",
    height=2.0,
    width=3.2,
    hspace=0.25,
    wspace=0.03,
    xlim=(-2, 2),
    ylim=(-8, 8),
    xticks=[-1, 0, 1],
    # map horizontal span location to 'amp' values
    hspans="amp",
    # map vertical span location to specific values with labels
    vspans={"$x=\\frac{1}{\\pi}$": 0.318309886},
    span_color=(0.0, 0.0, 1.0),
    background_color=(0.9, 0.9, 0.9),
    draw_color=(0, 0, 0),
)
You can also pass in axs or ax to plot on existing axes. The format_axs option
controls whether xyzpy should modify these axes at all, including applying its
‘neutral style’ (which works on both light and dark backgrounds, but may not be
desired when producing final figures; see also use_neutral_style).
Alternative plotting libraries
There are many other libraries for plotting self-labelled data, especially
for pandas.DataFrame objects, which can easily be generated from an
xarray.Dataset using e.g.:
# `harvester` is assumed to be an existing xyz.Harvester; any Dataset works
ds = harvester.full_ds
df = ds.to_dataframe().reset_index()
Notably: