Xarray Dataset and DataArray operations

# initialization
import numpy as np
import pandas as pd
import xarray as xr
# Load the B-SOSE dataset
bsose = xr.open_dataset("data/bsose_monthly_velocities.nc")

Extracting parts of an xarray Dataset

As we have seen, an xarray Dataset consists of both data variables and coordinates. To extract a data variable, use the square brackets [] syntax. The result is an xarray DataArray object.

# extract the data variable `U` as an xarray DataArray
bsose["U"]
<xarray.DataArray 'U' (time: 12, depth: 10, lat: 147, lon: 135)> Size: 10MB
[2381400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03
Attributes:
    units:          m/s
    long_name:      Zonal Component of Velocity (m/s)
    standard_name:  UVEL
    mate:           VVEL

And to extract coordinates, use the .coords attribute, followed by the [] syntax (the coordinates of a Dataset are also DataArrays):

# extract the coordinate `lat` as an xarray DataArray
bsose.coords["lat"]
<xarray.DataArray 'lat' (lat: 147)> Size: 588B
array([-77.930405, -77.79021 , -77.64841 , -77.50499 , -77.359924, -77.21321 ,
       -77.06482 , -76.914734, -76.76295 , -76.60942 , -76.45416 , -76.29712 ,
       -76.138306, -75.97768 , -75.81523 , -75.65094 , -75.48479 , -75.31675 ,
       -75.14681 , -74.97495 , -74.80115 , -74.62538 , -74.44764 , -74.26788 ,
       -74.086105, -73.90229 , -73.7164  , -73.52843 , -73.33835 , -73.146126,
       -72.95175 , -72.75522 , -72.55647 , -72.355515, -72.15231 , -71.94685 ,
       -71.73911 , -71.52905 , -71.316666, -71.10194 , -70.88482 , -70.6653  ,
       -70.44336 , -70.21898 , -69.99213 , -69.76277 , -69.5309  , -69.29649 ,
       -69.059525, -68.819954, -68.577774, -68.33295 , -68.08548 , -67.83532 ,
       -67.58244 , -67.326836, -67.06847 , -66.80732 , -66.543365, -66.276566,
       -66.00693 , -65.7344  , -65.45897 , -65.1806  , -64.89928 , -64.614975,
       -64.327675, -64.03734 , -63.743942, -63.44747 , -63.14791 , -62.845222,
       -62.539368, -62.230347, -61.91813 , -61.602695, -61.284016, -60.96207 ,
       -60.63683 , -60.308277, -59.976387, -59.641136, -59.302505, -58.960464,
       -58.614998, -58.26608 , -57.91369 , -57.557808, -57.198425, -56.835503,
       -56.469032, -56.09899 , -55.725353, -55.348103, -54.967228, -54.58271 ,
       -54.194527, -53.802666, -53.407104, -53.007835, -52.604836, -52.19809 ,
       -51.78759 , -51.37332 , -50.95527 , -50.53342 , -50.107765, -49.6783  ,
       -49.244995, -48.807858, -48.366875, -47.922047, -47.47336 , -47.020805,
       -46.564384, -46.104095, -45.639923, -45.171875, -44.699944, -44.224136,
       -43.744453, -43.260902, -42.773476, -42.282185, -41.787025, -41.288017,
       -40.785156, -40.278465, -39.767952, -39.253616, -38.73549 , -38.21357 ,
       -37.687893, -37.158455, -36.625282, -36.088398, -35.547817, -35.00357 ,
       -34.45568 , -33.904167, -33.349068, -32.790398, -32.2282  , -31.66251 ,
       -31.093353, -30.520763, -29.939266], dtype=float32)
Coordinates:
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
Attributes:
    coordinate:     YC XC
    units:          degrees_north
    standard_name:  latitude
    long_name:      latitude
    axis:           Y

Furthermore, the internal data of a DataArray (usually a NumPy ndarray) can be extracted using the .values attribute. For example, to extract the internal data of U:

# extract the internal data of a DataArray
Uarray = bsose["U"].values

# check that the data is stored as a numpy array
print(type(Uarray))

# check the shape of the numpy array
print(Uarray.shape)
<class 'numpy.ndarray'>
(12, 10, 147, 135)

Similarly, we can extract the internal data of a coordinate using the .values attribute, e.g.:

lat_array = bsose.coords["lat"].values
print(type(lat_array))
print(lat_array.shape)
<class 'numpy.ndarray'>
(147,)
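
The three extraction steps above can be tried without the B-SOSE file. The following is a minimal sketch on a small, made-up dataset (the name `toy` and all its values are hypothetical, chosen only for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset standing in for `bsose`; the numbers are made up.
toy = xr.Dataset(
    data_vars={
        "U": (("depth", "lat"), np.arange(6, dtype="float32").reshape(2, 3)),
    },
    coords={"depth": [2.1, 26.25], "lat": [-70.0, -60.0, -50.0]},
)

u = toy["U"]             # data variable -> DataArray
lat = toy.coords["lat"]  # coordinate -> DataArray
u_np = u.values          # underlying NumPy array of the data variable
lat_np = lat.values      # underlying NumPy array of the coordinate

print(type(u).__name__, type(lat).__name__)
print(type(u_np).__name__, u_np.shape, lat_np.shape)
```

Note that `toy["lat"]` also returns the coordinate as a DataArray, so the `.coords` step is mostly a way of being explicit about what is being extracted.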

Subsetting an xarray Dataset or DataArray

The main tools for subsetting an xarray Dataset (or DataArray) are the .isel() and .sel() methods. Similar to the .iloc[] indexer of a DataFrame, .isel() subsets a Dataset (or DataArray) by the positional indices of its dimensions. For example, to select the shallowest depth from the bsose Dataset, we may do:

bsose.isel(depth=0)
<xarray.Dataset> Size: 3MB
Dimensions:  (time: 12, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
    depth    float32 4B 2.1
Data variables:
    U        (time, lat, lon) float32 953kB 0.0 0.0 0.0 ... 0.01673 0.005073
    V        (time, lat, lon) float64 2MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output

We can also supply a slice as an argument to .isel() and .sel(). However, instead of the shorthand start:stop:step, we need an explicit slice() call. For example, to select the two shallowest depths from bsose, we can do:

bsose.isel(depth=slice(0, 2))
<xarray.Dataset> Size: 6MB
Dimensions:  (time: 12, depth: 2, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 8B 2.1 26.25
Data variables:
    U        (time, depth, lat, lon) float32 2MB 0.0 0.0 0.0 ... 0.03479 0.03013
    V        (time, depth, lat, lon) float64 4MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output

As you should be familiar with by now, the slice is endpoint exclusive when you use .isel().

In addition, we can make selections on multiple dimensions in a single .isel() call, for example:

bsose.isel(depth=slice(0,2), time=5)
<xarray.Dataset> Size: 477kB
Dimensions:  (depth: 2, lat: 147, lon: 135)
Coordinates:
    time     datetime64[ns] 8B 2012-06-30T23:00:00
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 8B 2.1 26.25
Data variables:
    U        (depth, lat, lon) float32 159kB 0.0 0.0 0.0 ... 0.04136 0.03203
    V        (depth, lat, lon) float64 318kB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output
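
The outputs above also show a detail worth spelling out: an integer index removes the dimension (depth becomes a scalar coordinate), while a slice keeps it. Here is a small sketch on a hypothetical toy dataset (names and values are made up) that also tries a negative index and a list of positions:

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset; the values are made up for illustration.
toy = xr.Dataset(
    {"U": (("time", "depth"), np.arange(12.0).reshape(4, 3))},
    coords={"time": [0, 1, 2, 3], "depth": [2.1, 26.25, 65.0]},
)

scalar_pick = toy.isel(depth=0)           # integer index: the "depth" dimension is dropped
slice_pick = toy.isel(depth=slice(0, 1))  # slice: "depth" is kept, with length 1
last_pick = toy.isel(depth=-1)            # negative positions count from the end
list_pick = toy.isel(depth=[0, 2])        # a list of positions also keeps the dimension

print(dict(scalar_pick.sizes))
print(dict(slice_pick.sizes))
print(last_pick["depth"].item())
print(list_pick["depth"].values)
```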

While .isel() can be useful in limited situations, the more useful subsetting method is .sel(), which subsets dimensions by their coordinate values, e.g.:

bsose.sel(depth=2.1)
<xarray.Dataset> Size: 3MB
Dimensions:  (time: 12, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
    depth    float32 4B 2.1
Data variables:
    U        (time, lat, lon) float32 953kB 0.0 0.0 0.0 ... 0.01673 0.005073
    V        (time, lat, lon) float64 2MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output

And just like the .loc[] indexer for DataFrame, slicing with the .sel() method is endpoint inclusive:

bsose.sel(depth=slice(2.1, 26.25))
<xarray.Dataset> Size: 6MB
Dimensions:  (time: 12, depth: 2, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 8B 2.1 26.25
Data variables:
    U        (time, depth, lat, lon) float32 2MB 0.0 0.0 0.0 ... 0.03479 0.03013
    V        (time, depth, lat, lon) float64 4MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output
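
To make the contrast between the two slicing rules concrete, here is a minimal side-by-side sketch on a hypothetical 1-D DataArray (the depth values mirror the first few B-SOSE levels, but the data are made up):

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D DataArray indexed by depth.
depth = [2.1, 26.25, 65.0, 105.0]
da = xr.DataArray(np.arange(4.0), coords={"depth": depth}, dims="depth")

by_position = da.isel(depth=slice(0, 2))   # positions 0 and 1; position 2 is excluded
by_label = da.sel(depth=slice(2.1, 65.0))  # labels 2.1 through 65.0; 65.0 is included

print(by_position["depth"].values)
print(by_label["depth"].values)
```

The positional slice returns two depths, while the label slice returns three because its upper bound is part of the selection.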

Quite often you may not know the exact coordinate values of a dimension you want to subset. Two mechanisms in .sel() can assist you in such cases. First, when selecting a single coordinate value, you can use the method="nearest" argument to select the closest match, e.g.:

bsose.sel(depth=5, method="nearest")
<xarray.Dataset> Size: 3MB
Dimensions:  (time: 12, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
    depth    float32 4B 2.1
Data variables:
    U        (time, lat, lon) float32 953kB 0.0 0.0 0.0 ... 0.01673 0.005073
    V        (time, lat, lon) float64 2MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output
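
A nearest-neighbour lookup always returns something, even if the closest coordinate is far from the requested value. The sketch below, on a hypothetical toy DataArray, shows the optional tolerance argument of .sel(), which raises a KeyError when no coordinate lies within the given distance. The tolerance of 1.0 is an arbitrary choice for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical toy DataArray; the data values are made up.
da = xr.DataArray(
    np.arange(3.0),
    coords={"depth": [2.1, 26.25, 65.0]},
    dims="depth",
)

# Closest match to a single requested value
nearest = da.sel(depth=5, method="nearest")
print(nearest["depth"].item())

# Closest matches to several requested values at once
several = da.sel(depth=[5, 30, 60], method="nearest")
print(several["depth"].values)

# With a tolerance, a request that is too far from every coordinate fails
try:
    da.sel(depth=5, method="nearest", tolerance=1.0)
    within_tolerance = True
except KeyError:
    within_tolerance = False
print(within_tolerance)
```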

Second, when you subset a Dataset (or DataArray) by slice, you can specify bounds that do not correspond to exact coordinate values, e.g.:

bsose.sel(depth=slice(0, 100))
<xarray.Dataset> Size: 9MB
Dimensions:  (time: 12, depth: 3, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 12B 2.1 26.25 65.0
Data variables:
    U        (time, depth, lat, lon) float32 3MB 0.0 0.0 0.0 ... 0.02728 0.02084
    V        (time, depth, lat, lon) float64 6MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output

The same also works for coordinates that are datetimes:

bsose.sel(time=slice(pd.to_datetime("2012-01-01"), pd.to_datetime("2012-03-31")))
<xarray.Dataset> Size: 5MB
Dimensions:  (time: 2, depth: 10, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 16B 2012-01-30T20:00:00 2012-03-01T06:00:00
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03
Data variables:
    U        (time, depth, lat, lon) float32 2MB 0.0 0.0 ... -0.001613 -0.001607
    V        (time, depth, lat, lon) float64 3MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output
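
For datetime coordinates, plain date strings are usually accepted in place of pd.to_datetime() objects. The following is a minimal sketch on a hypothetical monthly time axis (the timestamps are made up and fall on the first of each month, unlike the B-SOSE ones):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly time axis: 2012-01-01, 2012-02-01, ..., 2012-12-01
times = pd.date_range("2012-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), coords={"time": times}, dims="time")

# Date strings as slice bounds (both ends inclusive)
first_quarter = da.sel(time=slice("2012-01-01", "2012-03-31"))
print(first_quarter["time"].values)

# A partial date string selects everything within that month
june = da.sel(time="2012-06")
print(june.values)
```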

Finally, just like the .loc[] indexer for DataFrame, you can also subset in .sel() using a boolean (logical) array, e.g.:

bsose.sel(lat=(bsose.coords["lat"] < -50) & (bsose.coords["lat"] > -70))
<xarray.Dataset> Size: 12MB
Dimensions:  (time: 12, depth: 10, lat: 63, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 252B -69.99 -69.76 -69.53 ... -50.96 -50.53 -50.11
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03
Data variables:
    U        (time, depth, lat, lon) float32 4MB 0.0 0.0 0.0 ... 0.07136 0.06252
    V        (time, depth, lat, lon) float64 8MB ...
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output
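
A boolean condition and a slice can express similar ranges, but they treat the endpoints differently. Below is a small sketch on a hypothetical latitude axis. It uses .where(..., drop=True) as one alternative way to apply a boolean condition; this is a substitute technique, not the call shown above.

```python
import numpy as np
import xarray as xr

# Hypothetical latitude axis in increasing order; the data are made up.
lat = [-75.0, -70.0, -65.0, -60.0, -55.0, -50.0, -45.0]
da = xr.DataArray(np.arange(7.0), coords={"lat": lat}, dims="lat")

# Strict inequalities: the endpoints -70 and -50 are excluded
mask = (da["lat"] > -70) & (da["lat"] < -50)
masked = da.where(mask, drop=True)

# Label slice: the endpoints -70 and -50 are included
sliced = da.sel(lat=slice(-70, -50))

print(masked["lat"].values)
print(sliced["lat"].values)
```

The slice form relies on the coordinate being sorted, whereas a boolean condition works for any ordering and for conditions that are not a single contiguous range.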

DataArray calculations and Dataset modifications

Arithmetic operators (e.g., +, -, *, /) work element-wise on xarray DataArrays, and the result is another DataArray. Furthermore, NumPy functions can be applied to a DataArray to produce a new DataArray. For example, the U and V variables in bsose are the eastward and northward ocean current velocities, respectively. Ignoring the vertical component of the current, the speed is the magnitude of the horizontal velocity vector, which can be obtained as:

speed = np.sqrt(bsose["U"]**2 + bsose["V"]**2)
display(speed)
<xarray.DataArray (time: 12, depth: 10, lat: 147, lon: 135)> Size: 19MB
array([[[[0.        , 0.        , 0.        , ..., 0.06010989,
          0.05197227, 0.04096739],
         [0.        , 0.        , 0.        , ..., 0.14071839,
          0.11713416, 0.09964565],
         [0.        , 0.        , 0.        , ..., 0.04750525,
          0.04322098, 0.05296802],
         ...,
         [0.05226932, 0.0536007 , 0.0527774 , ..., 0.01013198,
          0.00831877, 0.01163159],
         [0.04155295, 0.03667319, 0.03206795, ..., 0.02356772,
          0.02530149, 0.02231216],
         [0.04977216, 0.03858907, 0.026114  , ..., 0.0356411 ,
          0.02995015, 0.02251477]],

        [[0.        , 0.        , 0.        , ..., 0.05217244,
          0.04406487, 0.03266923],
         [0.        , 0.        , 0.        , ..., 0.10693451,
          0.08800546, 0.07192853],
         [0.        , 0.        , 0.        , ..., 0.05761292,
          0.05014921, 0.0393808 ],
...
         [0.01130055, 0.01364487, 0.01592921, ..., 0.00459311,
          0.00633551, 0.00740674],
         [0.01258866, 0.01473226, 0.01583225, ..., 0.00479583,
          0.00627474, 0.00715925],
         [0.01517832, 0.01638868, 0.01647329, ..., 0.00544246,
          0.00621286, 0.00667411]],

        [[0.        , 0.        , 0.        , ..., 0.        ,
          0.        , 0.        ],
         [0.        , 0.        , 0.        , ..., 0.        ,
          0.        , 0.        ],
         [0.        , 0.        , 0.        , ..., 0.        ,
          0.        , 0.        ],
         ...,
         [0.00782566, 0.0087817 , 0.00968277, ..., 0.00242523,
          0.00420563, 0.00568682],
         [0.00768446, 0.00835071, 0.00852652, ..., 0.00387363,
          0.00473127, 0.00590548],
         [0.00762166, 0.00788987, 0.00769931, ..., 0.00566757,
          0.00615353, 0.00707046]]]])
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03

Note that element-wise NumPy functions such as np.sqrt() can be applied directly to an xarray DataArray, and the coordinates are carried over to the result.
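
The speed output above is float64 and has no name, even though U is float32 and both inputs are named. A minimal sketch with hypothetical two-element arrays (values chosen so the answer is easy to check by hand) reproduces both effects:

```python
import numpy as np
import xarray as xr

# Hypothetical velocity components with mixed precision, as in bsose
coords = {"lat": [-60.0, -50.0]}
u = xr.DataArray(np.array([3.0, 0.6], dtype="float32"), coords=coords, dims="lat", name="U")
v = xr.DataArray(np.array([4.0, 0.8], dtype="float64"), coords=coords, dims="lat", name="V")

speed = np.sqrt(u**2 + v**2)

print(type(speed).__name__)  # the result is still a DataArray
print(speed.dtype)           # float32 combined with float64 is promoted to float64
print(speed.name)            # the inputs have different names, so the result has none
print(speed.values)          # approximately [5.0, 1.0]
```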

If we have an xarray Dataset, we can assign new data variables to it using the same [] operator. For example, with our speed calculation, we may do:

bsose["speed"] = np.sqrt(bsose["U"]**2 + bsose["V"]**2)
display(bsose)
<xarray.Dataset> Size: 48MB
Dimensions:  (time: 12, depth: 10, lat: 147, lon: 135)
Coordinates:
  * time     (time) datetime64[ns] 96B 2012-01-30T20:00:00 ... 2012-12-30T12:...
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03
Data variables:
    U        (time, depth, lat, lon) float32 10MB 0.0 0.0 ... -0.005353
    V        (time, depth, lat, lon) float64 19MB 0.0 0.0 ... -0.004619
    speed    (time, depth, lat, lon) float64 19MB 0.0 0.0 ... 0.006154 0.00707
Attributes:
    name:     B-SOSE (Southern Ocean State Estimate) model output

Note that the xarray Dataset is modified in place. Moreover, if a data variable with the same name already exists, it is overwritten.
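
If modifying the Dataset in place is not desired, the .assign() method returns a new Dataset and leaves the original untouched. The sketch below contrasts the two approaches on a hypothetical toy dataset (values are made up):

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with two velocity components
ds = xr.Dataset(
    {
        "U": ("lat", np.array([3.0, 6.0])),
        "V": ("lat", np.array([4.0, 8.0])),
    },
    coords={"lat": [-60.0, -50.0]},
)

# .assign() returns a new Dataset; `ds` itself is unchanged
with_speed = ds.assign(speed=np.sqrt(ds["U"] ** 2 + ds["V"] ** 2))
print("speed" in ds.data_vars)
print("speed" in with_speed.data_vars)

# The [] assignment modifies `ds` in place ...
ds["speed"] = np.sqrt(ds["U"] ** 2 + ds["V"] ** 2)
# ... and assigning to an existing name overwrites it
ds["speed"] = ds["speed"] * 2
print(ds["speed"].values)
```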

Processing missing values

As in the case of pandas, sometimes a special value may be used to encode values that are missing. As an example, here is a bathymetry dataset that originated from NASA Earth Observation and was converted to netCDF (you can download a copy from here):

bathy = xr.open_dataset("data/bathymetry.nc")
display(bathy)
<xarray.Dataset> Size: 22kB
Dimensions:  (lat: 36, lon: 72)
Coordinates:
  * lat      (lat) float64 288B 87.5 82.5 77.5 72.5 ... -72.5 -77.5 -82.5 -87.5
  * lon      (lon) float64 576B -177.5 -172.5 -167.5 ... 167.5 172.5 177.5
Data variables:
    height   (lat, lon) float64 21kB ...

The metadata of the “height” variable indicates that missing (or in this case, inapplicable) values are coded as the number 99999 in this dataset. Indeed, extracting the internal numpy array gives:

bathy["height"].values
array([[-3.84252e+03, -3.84252e+03, -3.65354e+03, ..., -3.84252e+03,
        -3.84252e+03, -3.84252e+03],
       [-2.29921e+03, -3.21260e+03, -3.46457e+03, ..., -2.80315e+03,
        -2.74016e+03, -2.39370e+03],
       [-9.76380e+02, -2.20472e+03, -4.72440e+02, ..., -2.51970e+02,
        -9.13390e+02, -1.10236e+03],
       ...,
       [-5.66930e+02, -5.98430e+02, -4.72440e+02, ...,  9.99990e+04,
         9.99990e+04, -9.44900e+01],
       [ 9.99990e+04,  9.99990e+04,  9.99990e+04, ...,  9.99990e+04,
         9.99990e+04,  9.99990e+04],
       [ 9.99990e+04,  9.99990e+04,  9.99990e+04, ...,  9.99990e+04,
         9.99990e+04,  9.99990e+04]])

Instead of keeping the inapplicable values coded as 99999, which could cause trouble when, e.g., we calculate the mean ocean floor depth, we should recode them as np.nan. One way to do so is through the .where() method of an xarray DataArray, which keeps the values that satisfy the supplied condition and replaces all others with a fill value (np.nan by default). Thus, to recode the 99999 values as np.nan, we may do:

bathy["height"].where(bathy["height"] < 90000)
<xarray.DataArray 'height' (lat: 36, lon: 72)> Size: 21kB
array([[-3842.52, -3842.52, -3653.54, ..., -3842.52, -3842.52, -3842.52],
       [-2299.21, -3212.6 , -3464.57, ..., -2803.15, -2740.16, -2393.7 ],
       [ -976.38, -2204.72,  -472.44, ...,  -251.97,  -913.39, -1102.36],
       ...,
       [ -566.93,  -598.43,  -472.44, ...,      nan,      nan,   -94.49],
       [     nan,      nan,      nan, ...,      nan,      nan,      nan],
       [     nan,      nan,      nan, ...,      nan,      nan,      nan]])
Coordinates:
  * lat      (lat) float64 288B 87.5 82.5 77.5 72.5 ... -72.5 -77.5 -82.5 -87.5
  * lon      (lon) float64 576B -177.5 -172.5 -167.5 ... 167.5 172.5 177.5
Attributes:
    unit:      m
    NA value:  99999

(Note: since bathy["height"] is an array of floats, inequality comparisons with some leeway are generally safer than equality checks.)

In the code block above, we have created a new DataArray that is not assigned to a variable and is thus immediately displayed. To update the original Dataset, we simply reassign this new array to the same data variable of the dataset, i.e.,

bathy["height"] = bathy["height"].where(bathy["height"] < 90000)
display(bathy)
<xarray.Dataset> Size: 22kB
Dimensions:  (lat: 36, lon: 72)
Coordinates:
  * lat      (lat) float64 288B 87.5 82.5 77.5 72.5 ... -72.5 -77.5 -82.5 -87.5
  * lon      (lon) float64 576B -177.5 -172.5 -167.5 ... 167.5 172.5 177.5
Data variables:
    height   (lat, lon) float64 21kB -3.843e+03 -3.843e+03 ... nan nan
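
To see how much a sentinel value can distort a statistic, consider the following sketch on a hypothetical 2x2 height array with a single 99999 entry (all values are made up). It also shows the optional `other` argument of .where(), which substitutes a value other than np.nan.

```python
import numpy as np
import xarray as xr

SENTINEL = 99999.0  # missing-value code, as in the bathymetry file

# Hypothetical 2x2 height field with one missing entry
height = xr.DataArray(
    np.array([[-100.0, -300.0], [SENTINEL, -200.0]]),
    coords={"lat": [10.0, -10.0], "lon": [0.0, 5.0]},
    dims=("lat", "lon"),
)

raw_mean = height.mean().item()          # the sentinel is averaged in
cleaned = height.where(height < 90000)   # sentinel -> NaN
clean_mean = cleaned.mean().item()       # NaN is skipped by default
filled = height.where(height < 90000, other=0.0)  # sentinel -> 0.0 instead of NaN

print(raw_mean)    # 24849.75
print(clean_mean)  # -200.0
print(filled.values)
```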

Xarray statistics functions

In addition to calculations involving arithmetic operators and element-wise NumPy functions, xarray Dataset and DataArray also have a number of statistical methods, which can be applied along specific dimensions. For example, to compute the yearly ocean current (averaged over the 12 months), we may do:

bsose_yearly = bsose.mean("time")
display(bsose_yearly)
<xarray.Dataset> Size: 4MB
Dimensions:  (depth: 10, lat: 147, lon: 135)
Coordinates:
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03
Data variables:
    U        (depth, lat, lon) float32 794kB 0.0 0.0 0.0 ... -0.005053 -0.00524
    V        (depth, lat, lon) float64 2MB 0.0 0.0 0.0 ... -0.0009622 -0.001767
    speed    (depth, lat, lon) float64 2MB 0.0 0.0 0.0 ... 0.006136 0.006688

And similarly (for DataArray):

speed_yearly = speed.mean("time")
display(speed_yearly)
<xarray.DataArray (depth: 10, lat: 147, lon: 135)> Size: 2MB
array([[[0.        , 0.        , 0.        , ..., 0.05396177,
         0.04697288, 0.03941176],
        [0.        , 0.        , 0.        , ..., 0.12847457,
         0.11631608, 0.10786684],
        [0.        , 0.        , 0.        , ..., 0.10092306,
         0.08603653, 0.09591545],
        ...,
        [0.0525897 , 0.04824668, 0.04024953, ..., 0.02923318,
         0.02910917, 0.03795754],
        [0.04545898, 0.0475362 , 0.04286196, ..., 0.03544119,
         0.03534931, 0.0390486 ],
        [0.04179625, 0.04573175, 0.04619415, ..., 0.05025192,
         0.04542044, 0.04403939]],

       [[0.        , 0.        , 0.        , ..., 0.04670928,
         0.03689622, 0.02867834],
        [0.        , 0.        , 0.        , ..., 0.1105175 ,
         0.09622564, 0.08323717],
        [0.        , 0.        , 0.        , ..., 0.09972569,
         0.08335326, 0.072648  ],
...
        [0.00888233, 0.0080989 , 0.00825608, ..., 0.0045786 ,
         0.00564794, 0.00635814],
        [0.00902226, 0.00767792, 0.00720942, ..., 0.00602174,
         0.00700923, 0.00739127],
        [0.00944595, 0.00874795, 0.00834893, ..., 0.00687001,
         0.00730173, 0.00704931]],

       [[0.        , 0.        , 0.        , ..., 0.        ,
         0.        , 0.        ],
        [0.        , 0.        , 0.        , ..., 0.        ,
         0.        , 0.        ],
        [0.        , 0.        , 0.        , ..., 0.        ,
         0.        , 0.        ],
        ...,
        [0.00586737, 0.00581053, 0.00598294, ..., 0.00441925,
         0.00506245, 0.0060725 ],
        [0.00476857, 0.00442139, 0.00426584, ..., 0.00509038,
         0.00580745, 0.00664087],
        [0.0046224 , 0.00411261, 0.00391904, ..., 0.00555926,
         0.00613552, 0.0066876 ]]])
Coordinates:
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
  * depth    (depth) float32 40B 2.1 26.25 65.0 105.0 ... 450.0 700.0 1.1e+03

Other statistics methods include:

  • .median(): calculate the median along given dimension(s)

  • .min(): calculate the minimum along given dimension(s)

  • .max(): calculate the maximum along given dimension(s)

  • .sum(): calculate the sum along given dimension(s)

  • .var(): calculate the variance along given dimension(s)

  • .std(): calculate the standard deviation along given dimension(s)
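
These methods share the same calling pattern. As a minimal sketch on a hypothetical 2x3 DataArray containing one NaN (values made up), the example below reduces over one dimension, over several dimensions at once, and with NaN skipping turned off via skipna=False:

```python
import numpy as np
import xarray as xr

# Hypothetical DataArray with one missing value
da = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, np.nan, 6.0]]),
    coords={"time": [0, 1], "lat": [-60.0, -55.0, -50.0]},
    dims=("time", "lat"),
)

time_mean = da.mean("time")                # NaN is skipped by default
overall_max = da.max(["time", "lat"])      # reduce over several dimensions at once
strict_sum = da.sum("time", skipna=False)  # NaN propagates into the result

print(time_mean.values)    # [2.5 2.  4.5]
print(overall_max.item())  # 6.0
print(strict_sum.values)
```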

Remark: method chaining

Sometimes you want to perform different actions on different dimensions, e.g., extract data at the shallowest depth and also average over time. Since .sel(), .mean(), etc. all return an object of the same type as the input, we can easily perform multiple actions using a coding style known as method chaining. For example, for the specific case above, we may do:

bsose.isel(depth=0).mean("time")
<xarray.Dataset> Size: 398kB
Dimensions:  (lat: 147, lon: 135)
Coordinates:
  * lat      (lat) float32 588B -77.93 -77.79 -77.65 ... -31.09 -30.52 -29.94
  * lon      (lon) float32 540B 90.17 90.83 91.5 92.17 ... 178.2 178.8 179.5
    depth    float32 4B 2.1
Data variables:
    U        (lat, lon) float32 79kB 0.0 0.0 0.0 0.0 ... 0.03736 0.03416 0.02836
    V        (lat, lon) float64 159kB 0.0 0.0 0.0 ... 0.02628 0.00748 0.001332
    speed    (lat, lon) float64 159kB 0.0 0.0 0.0 ... 0.05025 0.04542 0.04404
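
Longer chains are often easier to read when wrapped in parentheses, with one step per line. The sketch below uses a hypothetical toy dataset (values made up) and also checks that, for this particular pair of operations, the order of the two steps does not change the result:

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with dimensions (time, depth, lat)
ds = xr.Dataset(
    {"U": (("time", "depth", "lat"), np.arange(12.0).reshape(2, 3, 2))},
    coords={"time": [0, 1], "depth": [2.1, 26.25, 65.0], "lat": [-60.0, -50.0]},
)

chained = (
    ds.isel(depth=0)  # step 1: keep the shallowest depth
    .mean("time")     # step 2: average over time
)

# Same two steps in the opposite order
swapped = ds.mean("time").isel(depth=0)

print(chained["U"].values)  # [3. 4.]
print(np.allclose(chained["U"].values, swapped["U"].values))
```

Selecting first is usually the cheaper order, since the average is then computed on a smaller array.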