Introduction

Postopus is a project to explore different software designs for POSTprocessing for OctoPUS runs.

In this notebook we explore the basic functionality of the postopus package.

The notebook assumes that you have Octopus installed and is in your (bash) path. The notebook is structured as follows:

  • Benzene Example : We use the Benzene example of Octopus to demonstrate the basic functionality of the postopus package.

    • We run the ground state calculation

    • Explore the data structure of postopus

    • Analyse and plot the data using a package called x-array

    • Use a package called holoviews for plotting

Before we start the analysis, we import the packages we need.

[1]:
# First import all the python modules we might need for the analysis.
import os
from pathlib import Path
import shutil

import matplotlib.pyplot as plt
import numpy as np
from postopus import Run

%matplotlib inline

Benzene Example (ground state computation)

Running the simulation

Octopus takes an input file describing the systems and the simulation parameters. The input file is called inp and is a text file with no extension. The command octopus would then have to be executed in the folder that contains this inp file. Since octopus doesn’t allow custom names for the input file, an ideal project structure would look like so:

├── benzene                    # Project folder
│   ├── benzene.xyz            # Geometry file or other supporting files
│   └── inp                    # Input file
├── h-atom
│   └── inp
├── he
│   └── inp
├── methane                     # Project folder in case of a multi-stage calculation
│   ├── calculation_gs          # Ground state calculation
│   │   └── inp
│   ├── calculation_td          # Time dependent calculation
│   │   └── inp
│   └── inp                     # Input file for the whole calculation (The other files must be placed here one by one in each stage)
└── recipe
    └── inp

To run one such simulation project one would run the command octopus in the root of the folder. To begin with, lets try to run the benzene example.

We now need to run the simulation. As mentioned before, this involves two steps: 1. Change the directory to the project folder ( that contains the inp file ) 2. Run the command octopus( optionally store the octopus output in a log file by calling octopus > out_gs.log 2>&1 ).

The above two steps could theoretically be executed in a separate shell. However, since the Jupyter notebook has terminal capabilities, it is recommended to execute all steps of a workflow (from defining the input file to the final analysis) in a single notebook, as we will do in the following. The notebook then serves as a record of all the steps taken to achieve a particular result. This increases the reproducibility of your work and makes it easier for others to understand and reuse your conclusions. For the latter reason, we also recommend using !pip freeze in the notebook to keep track of the software versions used. To increase the reproducibility even more, we can also print out the Octopus version used to generate the data.

[2]:
!octopus -v
octopus 12.1 (git commit )
[3]:
!pip freeze
alabaster==0.7.13
anyio==4.0.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
ase @ git+https://gitlab.com/ase/ase.git@ed85e1117ce3191399ac05d511ce23648487f9a4
asttokens==2.4.1
async-lru==2.0.4
attrs==23.1.0
Babel==2.13.1
beautifulsoup4==4.12.2
bleach==6.1.0
bokeh==3.3.0
certifi==2023.7.22
cffi==1.16.0
cftime==1.6.3
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorcet==3.0.1
comm==0.1.4
contourpy==1.1.1
cycler==0.12.1
dask==2023.10.1
datashader==0.16.0
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
docutils==0.18.1
exceptiongroup==1.1.3
executing==2.0.1
fastjsonschema==2.18.1
fonttools==4.43.1
fqdn==1.5.1
fsspec==2023.10.0
h11==0.14.0
holoviews==1.18.0
idna==3.4
imagesize==1.4.1
importlib-metadata==6.8.0
importlib-resources==6.1.0
invoke==2.2.0
ipykernel==6.26.0
ipython==8.17.2
ipython-genutils==0.2.0
ipywidgets==8.1.1
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.2
jsonschema-specifications==2023.7.1
jupyter==1.0.0
jupyter-client==8.5.0
jupyter-console==6.6.3
jupyter-core==5.5.0
jupyter-events==0.8.0
jupyter-lsp==2.2.0
jupyter-server==2.9.1
jupyter-server-terminals==0.4.4
jupyterlab==4.0.7
jupyterlab-pygments==0.2.2
jupyterlab-server==2.25.0
jupyterlab-widgets==3.0.9
kiwisolver==1.4.5
linkify-it-py==2.0.2
llvmlite==0.41.1
locket==1.0.0
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.1
matplotlib-inline==0.1.6
mdit-py-plugins==0.4.0
mdurl==0.1.2
mistune==3.0.2
multipledispatch==1.0.0
nbclient==0.8.0
nbconvert==7.10.0
nbformat==5.9.2
nbsphinx==0.9.3
nest-asyncio==1.5.8
netCDF4==1.6.5
notebook==7.0.6
notebook-shim==0.2.3
numba==0.58.1
numpy==1.26.1
outcome==1.3.0.post0
overrides==7.4.0
packaging==23.2
pandas==2.1.2
pandocfilters==1.5.0
panel==1.3.1
param==2.0.0
parso==0.8.3
partd==1.4.1
pexpect==4.8.0
Pillow==10.1.0
platformdirs==3.11.0
pooch==1.8.0
postopus @ file:///builds/octopus-code/postopus
prettytable==3.9.0
prometheus-client==0.18.0
prompt-toolkit==3.0.39
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
pyct==0.5.0
Pygments==2.16.1
pyparsing==3.1.1
PySocks==1.7.1
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3.post1
pyvista==0.42.3
pyviz-comms==3.0.0
PyYAML==6.0.1
pyzmq==25.1.1
qtconsole==5.4.4
QtPy==2.4.1
referencing==0.30.2
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.10.6
scipy==1.11.3
scooby==0.9.2
selenium==4.14.0
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.5
sphinx==7.2.6
sphinx-rtd-theme==1.3.0
sphinxcontrib-applehelp==1.0.7
sphinxcontrib-devhelp==1.0.5
sphinxcontrib-htmlhelp==2.0.4
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.6
sphinxcontrib-serializinghtml==1.1.9
stack-data==0.6.3
terminado==0.17.1
tinycss2==1.2.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.0
tornado==6.3.3
tqdm==4.66.1
traitlets==5.13.0
trio==0.22.2
trio-websocket==0.11.1
types-python-dateutil==2.8.19.14
typing-extensions==4.8.0
tzdata==2023.3
uc-micro-py==1.0.2
uri-template==1.3.0
urllib3==2.0.7
vtk==9.2.6
wcwidth==0.2.9
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.4
widgetsnbextension==4.0.9
wsproto==1.2.0
xarray==2023.10.1
xrft==1.0.1
xyzservices==2023.10.1
zipp==3.17.0
[4]:
mkdir examples && cd examples && mkdir benzene && cd benzene

We would now create the input file (inp). To do this we use the magic command %%writefile inp of jupyter notebook to write the contents of the cell to a file called inp (our input file) in the current directory.

[5]:
%%writefile inp

CalculationMode = gs
UnitsOutput = eV_Angstrom

Radius = 5*angstrom
Spacing = 0.15*angstrom

%Output
  wfs
  density
  elf
  potential
%

OutputFormat = cube

XYZCoordinates = "benzene.xyz"
FromScratch = true

Writing inp

Similarly we define the geometry file (benzene.xyz) and write it to the current directory.

[6]:
%%writefile benzene.xyz
12
Geometry of benzene (in Angstrom)
   C  0.000  1.396  0.000
   C  1.209  0.698  0.000
   C  1.209 -0.698  0.000
   C  0.000 -1.396  0.000
   C -1.209 -0.698  0.000
   C -1.209  0.698  0.000
   H  0.000  2.479  0.000
   H  2.147  1.240  0.000
   H  2.147 -1.240  0.000
   H  0.000 -2.479  0.000
   H -2.147 -1.240  0.000
   H -2.147  1.240  0.000

Writing benzene.xyz
[7]:
!octopus > out_gs.log 2>&1  # Run octopus with benzene example as input and log the output

Note that if you are running a notebook on a dedicated High-Performance Computing (HPC) node, the command mentioned above will be able to exploit all the resources of that node. This makes it feasible to run large calculations directly from the notebook.

We can now look at the log file (if created) to see if the simulation ran successfully. The log file is stored in the example_dir (project) folder.

[8]:
!head -n 20 out_gs.log  # Just to see the first 20 lines of the octopus output
    <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
                                ___
                             .-'   `'.
                            /         \
                            |         ;
                            |         |           ___.--,
                   _.._     |0) ~ (0) |    _.---'`__.-( (_.
            __.--'`_.. '.__.\    '--. \_.-' ,.--'`     `""`
           ( ,.--'`   ',__ /./;   ;, '.__.'`    __
           _`) )  .---.__.' / |   |\   \__..--""  """--.,_
          `---' .'.''-._.-'`_./  /\ '.  \ _.-~~~````~~~-._`-.__.'
                | |  .' _.-' |  |  \  \  '.               `~---`
                 \ \/ .'     \  \   '. '-._)
                  \/ /        \  \    `=.__`~-.
             jgs  / /\         `) )    / / `"".`\
            , _.-'.'\ \        / /    ( (     / /
             `--~`   ) )    .-'.'      '.'.  | (
                    (/`    ( (`          ) )  '-;
                     `      '-;         (-'

[9]:
!tail -n 20 out_gs.log  # Just to see the last 20 lines of the octopus output
             Info: Writing states. 2023/11/01 at 14:12:23


        Info: Finished writing states. 2023/11/01 at 14:12:23

Info: SCF converged in   21 iterations


** Warning:
**   Some of the states are not fully converged!

Info: Number of matrix-vector products:       3506
Info: Writing output to static/
Info: Finished writing information to 'restart/gs'.

             Calculation ended on 2023/11/01 at 14:12:31

                        Walltime:  01m 03.943s

Octopus emitted 1 warning.

One would look at the last 20 lines to check if the job ran successfully or if there was any warning. A typical successful output looks like:

Calculation ended on 2022/10/06 at 15:56:42

                        Walltime:  01.631s

Octopus stores the calculation results in the project directory. If you were to take a look at the benzene project folder, you would see some thing like:

├── benzene.xyz                     # Geometry of the molecules ( input )
├── exec                            # Runtime information
│   ├── initial_coordinates.xyz
│   ├── messages
│   ├── oct-status-finished
│   └── parser.log                  # Full set of variables used for the run ( as parsed from the inp file and the code defaults if not specified )
├── inp                             # Our input file
├── out_gs.log                      # Log file
├── output_iter                     # Output for each iteration in td calculation
├── restart
│   └── gs                          # Ground state calculations
│       ├── 0000000001.obf          # Checkpoint file to restart calculation incase of job abortion
│       ├── 0000000002.obf
│       ├── ...
│       └── wfns
└── static                          # Ground state observables are stored here
    ├── info                        # Result of ground state calculation like (energy eigenvalues and forces)
    ├── convergence
    ├── density.cube
    ├── ...
    └── wf-st0015.z=0

These new files are the results of the simulation. One would now try to analyse the output files to determine their result the postopus package helps in this part.

Postopus project structure

Once the simulation is complete, we can use the postopus package to analyse the data.

We first let postopus know the path to the project folder (stored in the examples/benzene directory), by passing this to a Run constructor (i.e. run = Run("path/to/inpfile")). If no specific path to an inp file is passed to the run object (i.e. run = Run()), then the run object is passed the current working directory as the default value. Postopus scans through the project folder and creates an object called run in our example that contains (nearly) all the information about the simulation:

[10]:
run = Run(".")  # or equivalent run = Run()

System selection

Octopus supports the simulation of multiple types of systems, like e.g. Maxwell. The system type will normally be set in the inp file. If no system is specfically set, postopus will create a default system called default. Note: if the systems are explicitly defined, a postopus object will still have a default system to hold internal variables. To know all the systems our project has we run:

[11]:
run.systems.keys()
[11]:
dict_keys(['default'])

Instead of run.system.keys() one can also use run.systems to get a similar output.

Calculation modes and subsystems

We can take a look at the different types of calculations done in this system by running:

[12]:
run.default.system_data.keys()
[12]:
dict_keys(['scf'])

This tells us that this project had only a self-consistent field simulation (scf).

Autocompletion

We can take a look at the different fields available for this system by the tab completion feature available in Jupyter notebooks and IPython.

Here is an example image of how the tab completion feature works: images/tab_completion_eg.png

Instead of using tab completion and if desired, one could use the following command to create a nicely formatted list of the attributes:

[13]:
# for some reason this breaks the html build. You can execute it in an active notebook session.
# [ attribute for attribute in dir(run.default.scf) if not attribute.startswith('_') ]

(Do not worry if the Python syntax used doesn’t make sense to you - the tab completion is generally the convenient way to explore what data is accesible through postopus.)

Output files

Postopus has many ways to handle octopus output files. The handling depends on the type of file. We can have: - unstructured text files like info, without any extension. These are returned as they are in a string format. - table-like-structured files without extension, like forces. These are returned as pandas Dataframes. - Extension files, e.g. xsf or cube files. These are accessed via get methods.

Unstructured text files

We can take a look at the info file, which stores information about this calculation by running:

[14]:
run.default.scf.info
[14]:
['\n',
 '******************************** Grid ********************************\n',
 'Simulation Box:\n',
 '  Type = minimum\n',
 '  Radius  [A] =   5.000\n',
 'Main mesh:\n',
 '  Spacing [A] = ( 0.150, 0.150, 0.150)    volume/point [A^3] =      0.00337\n',
 '  # inner mesh =     371891\n',
 '  # total mesh =     468887\n',
 '  Grid Cutoff [eV] =  1671.245006    Grid Cutoff [Ry] =   122.834253\n',
 '**********************************************************************\n',
 '\n',
 '\n',
 '***************************** Symmetries *****************************\n',
 'Symmetry elements : (i) 3*(C2) 3*(sigma)\n',
 'Symmetry group    : D2h\n',
 '**********************************************************************\n',
 '\n',
 '\n',
 '**************************** Theory Level ****************************\n',
 'Input: [TheoryLevel = kohn_sham]\n',
 '\n',
 'Exchange-correlation:\n',
 '  Exchange\n',
 '    Slater exchange (LDA)\n',
 '    [1] P. A. M. Dirac, Math. Proc. Cambridge Philos. Soc. 26, 376 (1930)\n',
 '    [2] F. Bloch, Z. Phys. 57, 545 (1929)\n',
 '  Correlation\n',
 '    Perdew & Zunger (Modified) (LDA)\n',
 '    [1] J. P. Perdew and A. Zunger, Phys. Rev. B 23, 5048 (1981), modified to improve the matching between the low- and high-rs\n',
 '\n',
 'Input: [SICCorrection = sic_none]\n',
 '**********************************************************************\n',
 '\n',
 'SCF converged in   21 iterations\n',
 '\n',
 'Some of the states are not fully converged!\n',
 'Eigenvalues [eV]\n',
 ' #st  Spin   Eigenvalue      Occupation\n',
 '   1   --   -21.178470       2.000000\n',
 '   2   --   -18.369571       2.000000\n',
 '   3   --   -18.369080       2.000000\n',
 '   4   --   -14.837372       2.000000\n',
 '   5   --   -14.837251       2.000000\n',
 '   6   --   -13.039694       2.000000\n',
 '   7   --   -11.148511       2.000000\n',
 '   8   --   -11.127117       2.000000\n',
 '   9   --   -10.329307       2.000000\n',
 '  10   --   -10.329088       2.000000\n',
 '  11   --    -9.282375       2.000000\n',
 '  12   --    -8.322699       2.000000\n',
 '  13   --    -8.320891       2.000000\n',
 '  14   --    -6.541142       2.000000\n',
 '  15   --    -6.540950       2.000000\n',
 '\n',
 'Energy [eV]:\n',
 '      Total       =     -1026.51271731\n',
 '      Free        =     -1026.51271731\n',
 '      -----------\n',
 '      Ion-ion     =      2805.76811261\n',
 '      Eigenvalues =      -365.14703650\n',
 '      Hartree     =      3570.35249630\n',
 '      Int[n*v_xc] =      -439.24846086\n',
 '      Exchange    =      -292.59679093\n',
 '      Correlation =       -43.43296706\n',
 '      vanderWaals =         0.00000000\n',
 '      Delta XC    =         0.00000000\n',
 '      Entropy     =         0.00000000\n',
 '      -TS         =        -0.00000000\n',
 '      Photon ex.  =         0.00000000\n',
 '      Kinetic     =       761.43797328\n',
 '      External    =     -7828.04150575\n',
 '      Non-local   =      -298.88711302\n',
 '      Int[n*v_E]  =         0.00000000\n',
 '\n',
 'Dipole:                 [A]          [Debye]\n',
 '      <x> =   -1.85725E-14     -8.92075E-14\n',
 '      <y> =   -9.28123E-14     -4.45797E-13\n',
 '      <z> =    6.86948E-14      3.29955E-13\n',
 '\n',
 'Convergence:\n',
 '      abs_energy =  1.35479045E-04 ( 0.00000000E+00) [eV]\n',
 '      rel_energy =  1.31979882E-07 ( 0.00000000E+00)\n',
 '      abs_dens =  3.63389353E-06 ( 0.00000000E+00)\n',
 '      rel_dens =  1.21129784E-07 ( 1.00000000E-06)\n',
 '      abs_evsum =  6.69414512E-06 ( 0.00000000E+00) [eV]\n',
 '      rel_evsum =  1.83327385E-08 ( 0.00000000E+00)\n',
 '\n',
 'Forces on the ions [eV/A]\n',
 ' Ion                        x              y              z\n',
 '   1         C   1.90863654E-13  -8.79966355E-01   9.50273809E-13\n',
 '   2         C  -7.52272867E-01  -4.31962568E-01  -2.93060308E-12\n',
 '   3         C  -7.52272867E-01   4.31962568E-01   6.02091736E-12\n',
 '   4         C   8.52351441E-14   8.79966355E-01  -3.22375622E-12\n',
 '   5         C   7.52272867E-01   4.31962568E-01   5.22982281E-12\n',
 '   6         C   7.52272867E-01  -4.31962568E-01  -6.54821560E-12\n',
 '   7         H   1.12953250E-13   2.83883456E-01  -1.55952959E-12\n',
 '   8         H   2.37692922E-01   1.35078495E-01   7.59673618E-13\n',
 '   9         H   2.37692922E-01  -1.35078495E-01  -1.34427244E-12\n',
 '  10         H   9.45232009E-14  -2.83883456E-01   3.60029805E-13\n',
 '  11         H  -2.37692922E-01  -1.35078495E-01  -7.58478659E-13\n',
 '  12         H  -2.37692922E-01   1.35078495E-01   1.37531283E-12\n',
 ' ----------------------------------------------------------\n',
 ' Max abs force   7.52272867E-01   8.79966355E-01   6.54821560E-12\n',
 '   Total force   3.79188828E-12  -8.82269557E-12  -1.66882535E-12\n',
 '  Total torque  -8.14607934E-12  -2.75065024E-12  -2.27085519E-13\n']

This is essentially the contents of the file static/info that octopus writes. One can have a look at its output to determine the total energy of the system and the eigenvalues.

Table-like-structured files

These files are usually stored in the static folder for scf calculations or in the td.general folder for time-dependent calculations. These files usually contain physical data that either is one dimensional or without a reference to the coordinate system. Technically, the data provided in rows are presented as a pandas Dataframe object, and we can use all the methods available in pandas to explore them.

We can take a look at the convergence of the systems energy after each scf iteration through:

[15]:
run.default.scf.convergence
[15]:
energy energy_diff abs_dens rel_dens abs_ev rel_ev
#iter
1 -1265.674240 1265.670000 6.571890 2.190630e-01 24.422900 4.647880e-02
2 -1186.146590 79.527600 0.726938 2.423130e-02 71.754200 1.581500e-01
3 -945.600928 240.546000 2.579490 8.598310e-02 134.602000 4.218090e-01
4 -1066.134160 120.533000 1.248540 4.161810e-02 73.159000 1.865040e-01
5 -1059.841830 6.292330 0.171141 5.704690e-03 10.632900 2.786160e-02
6 -1029.855680 29.986100 0.330906 1.103020e-02 15.863000 4.336870e-02
7 -1018.227940 11.627700 0.137260 4.575340e-03 4.744640 1.314210e-02
8 -1024.246120 6.018180 0.067297 2.243230e-03 3.244320 8.906380e-03
9 -1028.792840 4.546720 0.049662 1.655400e-03 2.248620 6.135070e-03
10 -1027.604950 1.187890 0.011515 3.838270e-04 0.837443 2.290090e-03
11 -1026.314760 1.290190 0.013353 4.450930e-04 0.685822 1.878990e-03
12 -1026.402270 0.087512 0.001305 4.350640e-05 0.091265 2.499820e-04
13 -1026.547330 0.145052 0.001486 4.953160e-05 0.086091 2.357530e-04
14 -1026.529550 0.017777 0.000245 8.172110e-06 0.016721 4.579240e-05
15 -1026.519340 0.010211 0.000097 3.231830e-06 0.005739 1.571770e-05
16 -1026.520830 0.001491 0.000023 7.627810e-07 0.001528 4.183750e-06
17 -1026.511360 0.009467 0.000103 3.417460e-06 0.005625 1.540610e-05
18 -1026.511360 0.000003 0.000014 4.799200e-07 0.000644 1.763930e-06
19 -1026.515240 0.003878 0.000043 1.421350e-06 0.002065 5.655260e-06
20 -1026.512850 0.002387 0.000024 7.903380e-07 0.001413 3.868380e-06
21 -1026.512720 0.000135 0.000004 1.211300e-07 0.000007 1.833270e-08

If we wanted to visualize the convergence of the energy, we can run the following:

[16]:
# Plot of energy at each interaction showing its convergence
fig = run.default.scf.convergence["energy"].plot(
    title="Energy convergence",
    marker=".",
    markersize=10,
)
fig.set(xlabel="Iteration", ylabel="Energy (eV)");
../_images/notebooks_Quick_Start_47_0.png

These kind of files always have an .attrs attribute, which holds further metadata that was stored initially on the file. In this case there is not much stored here appart from the filename, that could be used for setting the plot title, e.g. Nonetheless, in other cases, one can find useful information here.

[17]:
run.default.scf.convergence.attrs
[17]:
{'units': {}, 'metadata': {'filename': 'convergence'}}

Extension files

These files store field data that have a reference to the coordinate system. They can be stored in the static folder (normally converged scf iterations), or in the output_iter folder. We will call them Fields from now on. Postopus internally handles them as ScalarFields or VectorFields.

Fields

Fields are quantities that are available for different positions in space, or even time, such as an electric field or an electron density. Often, these are presented as arrays with multiple indices - for example indices for the x, y and z direction for a spatially resolved field.

In our benzene example, we have the following fields: density,elf_rs, v0,vh, vks, wf.

Field data can be accessed with a call to the get() method of the required field object. get() takes two parameters, step for the iteration number(s) you want to access and source for the source of data. For the common scenario that the relevant data is not stored in multiple file formats, we can omit the source argument.

Lets look at density as an example:

[18]:
density_data = run.default.scf.density.get(steps=1)
density_data
[18]:
<xarray.DataArray 'density' (step: 1, x: 95, y: 99, z: 67)>
array([[[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
...
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]]])
Coordinates:
  * step     (step) int64 1
  * x        (x) float64 -13.32 -13.04 -12.76 -12.47 ... 12.47 12.76 13.04 13.32
  * y        (y) float64 -13.89 -13.61 -13.32 -13.04 ... 13.04 13.32 13.61 13.89
  * z        (z) float64 -9.354 -9.071 -8.787 -8.504 ... 8.504 8.787 9.071 9.354
Attributes:
    units:    au

If you want to access the data for all the iterations, you can use get_all() or if you only wanted the converged iteration, you can use get_converged(). In this specific case, both calls would deliver the same output, since in the benzene inp file the OutputDuringSCF parameter is not set to True, so that we only can access to the last converged iteration. For these last two methods, the user doesn’t need to provide a steps method.

The data object is an xarray object containing not only the data values, but also the coordinates for which these values are defined:

[19]:
density_data.coords
[19]:
Coordinates:
  * step     (step) int64 1
  * x        (x) float64 -13.32 -13.04 -12.76 -12.47 ... 12.47 12.76 13.04 13.32
  * y        (y) float64 -13.89 -13.61 -13.32 -13.04 ... 13.04 13.32 13.61 13.89
  * z        (z) float64 -9.354 -9.071 -8.787 -8.504 ... 8.504 8.787 9.071 9.354

For more information on the units take a look at the units tutorial.

The xarray object holds the metadata needed to search for and visualize the data because it is aware of the coordinates (and the names of the coordinates, such as x, y, and z). This can make the plotting of the data easier. It also helps selecting subsets of the data. The following section will examine the plotting workflow.

In summary we can conclude the overall data structure as having the following syntax:

  • Scalar fields: run.systemname.calculationmode.fieldname

  • Vector fields: run.systemname.calculationmode.fieldname(.dimension)

We use the get method (or its variants) to access the data.

Note: In vtk files, we don’t support accessing only one of the components of the vector field. You would need to use the .sel(component=dim) method on the xarray of the whole vector field.

Working with the data: x-array

In the previous sections, we saw how to access any scalar or vector field generated by our calculation. Let’s explore the typical workflow of analyzing the data (slicing and plotting). First, let’s load the density data:

[20]:
# ask postopus for the x-array representation of the data
density = run.default.scf.density
xa = density.get_converged()  # the converged field from the "static" folder is loaded.
xa
[20]:
<xarray.DataArray 'density' (step: 1, x: 95, y: 99, z: 67)>
array([[[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
...
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]]])
Coordinates:
  * step     (step) int64 1
  * x        (x) float64 -13.32 -13.04 -12.76 -12.47 ... 12.47 12.76 13.04 13.32
  * y        (y) float64 -13.89 -13.61 -13.32 -13.04 ... 13.04 13.32 13.61 13.89
  * z        (z) float64 -9.354 -9.071 -8.787 -8.504 ... 8.504 8.787 9.071 9.354
Attributes:
    units:    au

Selecting a slice by index and visualizing it

In the first line of the previous cell, we had an overview of the xarray data inside the density field. The first line tells us the number of sampling points in each direction:

xarray.DataArray ‘density’ (step: 1, x: 95, y: 99, z: 67)

So there are 95 sampling points in the x-direction and similarly 67 for the z-direction. Suppose we want to have a view of density of benzene in the x-y plane (i.e. at \(z=0\)), we can do so by selecting the slice at z=0. One way to do this for this particular simulation is by asking xarray for selecting the slice at the index \(i_z=67/2~\approx~33\):

[21]:
s0 = xa.isel(z=33)  # Viewing the slice at 33rd index of z-axis
s0
[21]:
<xarray.DataArray 'density' (step: 1, x: 95, y: 99)>
array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]])
Coordinates:
  * step     (step) int64 1
  * x        (x) float64 -13.32 -13.04 -12.76 -12.47 ... 12.47 12.76 13.04 13.32
  * y        (y) float64 -13.89 -13.61 -13.32 -13.04 ... 13.04 13.32 13.61 13.89
    z        float64 3e-06
Attributes:
    units:    au

This slice returns another xarray object. Note here that the coordinate value for z is 1.588e-06, which is the value of z coordinate for the 33rd sampling point in the z-direction. We can now plot this slice using the plot() method of the xarray object:

[22]:
s0.plot()
[22]:
<matplotlib.collections.QuadMesh at 0x7f81be0c2e50>
../_images/notebooks_Quick_Start_66_1.png

Note that the plot has the x and the y axis inverted, one can change this by passing the x='x' argument to the plot() method.

[23]:
s0.plot(x="x")
[23]:
<matplotlib.collections.QuadMesh at 0x7f81c1ac5a90>
../_images/notebooks_Quick_Start_68_1.png

Another way to slice the data is by using the sel() method of the xarray object, where you can pass the coordinate value instead of the index.

For example, to get the same slice as above, we can use:

[24]:
s1 = xa.sel(z=0, method="nearest")
s1.plot(x="x")
[24]:
<matplotlib.collections.QuadMesh at 0x7f81c1a1d9d0>
../_images/notebooks_Quick_Start_70_1.png

Note that plots automatically display the value of the position of the slice in the z-direction as well as the step number. This is possible because xarray maintains the metadata of the coordinates even after slicing.

One can have a side view of the molecule by slicing the data in the y-z plane (i.e. at \(x=0\)) in a similar fashion:

xa.sel(x=0, method='nearest')

Using a for loop and some commands from the matplotlib plotting library, we can plot 6 different slices of the data in a 3x2 grid at different values of x.

[25]:
# plot of 6 2D slice of the density along the x axis
fig, axs = plt.subplots(
    2, 3, figsize=(10, 5)
)  # create a subplot with 2 rows, 3 columns
x_positions = np.linspace(-7.5, 5, 6)  # x positions from -7.5 to 5 in 6 steps
for ax, x in zip(axs.flat, x_positions):
    xa.sel(x=x, method="nearest").plot(
        ax=ax, x="y"
    )  # plot the slice nearest to x position

fig.tight_layout()  # avoid overlap of labels
../_images/notebooks_Quick_Start_73_0.png

The sel method can also be used to select a multiple coordinates at once, for example to select the slice at \(x=0\) and \(z=0\) we can do:

xa.sel(x=0, z=0, method='nearest')

We can then plot the variation of density along the y axis. Because we have restricted two (x and z) of the three spatial dimensions, we obtain a ‘line plot’ of the density along the remaining coordinate dimension y.

[26]:
xa.sel(x=0, z=0, method="nearest").plot()
[26]:
[<matplotlib.lines.Line2D at 0x7f81c1a3bfd0>]
../_images/notebooks_Quick_Start_75_1.png

A more detailed exploration of plotting with x-array is available in the postopus plotting tutorial.

Plotting with holoviews

A plotting library that is built for interactive/multidimensional plots is holoviews. Converting xarray objects (the most common output type in postopus) to holoviews objects is simple, as we will see. Holoviews supports many backends for the plotting. In this tutorial we will use matplotlib and bokeh (the default plotting backends for holoviews).

The holoviews approach to plotting may seem unconventional (and perhaps confusing) at first – it is, however, very powerful. In particular, holoviews allows the selective and/or interactive visualization of data that is defined in more dimensions that we can easily visualize. To be able to use that power, we need to provide some additional information to the plotting commands: for example which of the many dimensions we want to plot, and for which we would like to have an interactive slider to select it.

We provide a few examples that can be used as templates to cover typical use cases below.

First, let us import the holoviews library as well as the plotting backends we will use:

[27]:
import holoviews as hv
from holoviews import opts  # For setting defaults

# hv.extension('bokeh', 'matplotlib')  # Allow for interactive plots
hv.extension("bokeh")  # Allow for interactive plots

We then choose the color map we want to use for the plots. We use viridis.

[28]:
# Choose color scheme similar to matplotlib
opts.defaults(
    opts.Image(cmap="viridis", width=400, height=400),
)

We create the object xa, which is an xarray of the converged density of the benzene molecule and which we used already earlier in this notebook.

[29]:
run = Run()
xa = run.default.scf.density.get_converged()

We first convert the xarray object xa to a holoviews object using the hv.Dataset method.

hv.Dataset(xa)

We can now visualise the data as an hv.Image using the to method of the holoviews object:

hv.Dataset(xa).to(hv.Image, ['x', 'y'])

We specify the ‘x’ and ‘y’ dimensions as the Image’s two key dimensions (kdims) (in this case the x and y coordinates of the Dataset) that will serve as the visible axes. The remaining dimensions (in this case z and step) will be used as controls (in the form of slider and a dropdown list) to select the data to display.

The slider for z allows us to select the slice at different values of z interactively and the dropdown for step allows us to select the data at different iterations. (There is only data for one Step in this example.). One can use the opts arguments to customize the plot further, for more details see the holoviews tutorial.

[30]:
hv.Dataset(xa).to(hv.Image, kdims=["x", "y"]).opts(colorbar=True, width=500)
[30]: