Database structure¶
The web interface for the database is written in the Django web framework and the database itself is of SQL type. That is, any structured database should in principle be fine for hosting (mySQL, SQLite), but we recommend using mySQL/MariaDB. This section provides an overview of the Django models used for the website. The presentation focuses on how the models are defined in the Python source code and not the actual SQL tables. For example, even though fields such as the primary key are not listed in the following, it is understood that these are automatically created for the SQL tables.
Most models inherit from a base model which records information of how each entry is created/updated. Since actual polymorphism is not supported in relational databases, Django explicitly copies these fields to any child models.
Base
- created
- date the entry was created
- updated
- date the entry was last modified
- created by
- user that created the entry
- updated by
- user that updated the entry
All properties are stored in a table that contains the name of the property.
Property(Base)
- name
- displayed name of the property
All units are stored in a table that contains the label field.
Unit(Base)
- label
- “nm”, “cm2 V-1 s-1”, …
A solid system is defined by the following properties.
System
- compound name
- displayed name of the material
- formula
- chemical formula for the compound
- group
- alternate names
- organic
- organic component
- inorganic
- inorganic component
- last_update
- date the system was last modified
- description
- description of the compound
Authors and references are stored in the following tables.
Reference
- title
- title of the paper
- journal
- journal name
- vol
- volume
- pages_start
- starting page number
- pages_end
- end page number
- year
- year of publication
- doi_isbn
- DOI/ISBN if applicable
All experimental and theoretical results are contained in data sets. A data set typically refers to a single value, table, or figure found in a reference. The quantity of primary interest is called the “primary property”. For example, given some data where the band gap depends on temperature, the band gap and temperature would be the “primary” and “secondary” properties, respectively (think of these as y- and x-values in a plot).
Dataset(Base)
- caption
- description of the data set
- system
- foreign key for System
- primary_property
- foreign key for Property
- primary_unit
- foreign key for Unit
- primary_property_label
- custom label for the y-axis (typically left empty and the property name is used instead)
- secondary_property
- foreign key for Property
- secondary_unit
- foreign key for Unit
- secondary_property_label
- custom label for the x-axis (typically left empty and the property name is used instead)
- reference
- foreign key for Reference
- visible
- whether the data is visible on the website
- is_figure
- whether the data should be plotted
- plotted
- whether data is plotted by default
- experimental
- whether the data is of experimental origin (theoretical if false)
- dimensionality
- dimensionality of the inorganic component as understood in the HOIP literature (not the dimensionality of the crystal)
- sample_type
- single crystal, powder, ldots
- extraction_method
- short explanation for how the data was obtained
- representative
- in case of multiple entries of the same property for a given material, whether this data set should be shown on the material’s main page.
- linked_to
- a ManyToManyField, used if the numerical values of this data set are somehow directly linked to another data set
- verified_by
- list of users that have verified the correctness of the data set
A data set consists of one or more data subsets. One is always present but there could be several if it is possible to logically group the data somehow. For instance, different curves in a figure would correspond to separate data subsets.
Subset(Base)
- dataset
- foreign key for Dataset
- label
- short description of this subset
- crystal_system
- one of the seven crystal systems
A data subset consists of one or more data points. When describing a single value such as the band gap of a material with no additional dependencies, the whole data set would consist of one subset with only one data point with one numerical value.
Datapoint(Base)
- subset
- foreign key for Subset
Finally, the actual data is stored in the NumericalValue table.
NumericalValue(Base)
- datapoint
- foreign key for Datapoint
- qualifier
- “primary”, “secondary”
- value_type
- “accurate”, “approximate”, “lower/upper bound”
- value
- floating point number
- counter
- counts the number of values attached to a given data point
Any errors (uncertainties) associated with a numerical value are stored in a separate table. In the code, the errors are then retrieved from the database by querying for numerical values with the verb+select_related(‘error’)+ function and checking if a value has an associated error (verb+if hasattr(value, ‘error’)+).
Error(Base)
- numerical_value
- foreign key for NumericalValue
- value
- floating point number
Similarly to errors, when dealing with ranges, the upper bounds are stored in a separate table. The value field is then understood to contain the lower bound of the range
UpperBound(Base)
- numerical_value
- foreign key for NumericalValue
- value
- floating point number
A separate table is used for values that are fixed across a data subset. For instance, if the curves of band gap vs dopant density are measured for different temperatures, then “band gap”, “dopant density”, and “temperature” would be “primary”, “secondary”, and “fixed”, respectively. Unlike regular numerical values, the fixed values are far lesser in number. Thus, we can attach the errors directly to the values without a performance penalty.
NumericalValueFixed(Base)
- dataset
- foreign key for Dataset
- subset
- foreign key for Subset
- physical_property
- foreign key for Property
- unit
- foreign key for Unit
- value
- floating point number
- type
- “accurate”, “approximate”, “lower/upper bound”, “error”
- error
- floating point number (optional)
- upper_bound
- floating point number (optional); if present, then value is understood to be the lower bound for the range
If the dependence of the primary property is on something that cannot be stored as a floating point number, it is stored in the Symbol table. Example: the user enters band gap values a function of phase. The phases are then stored as strings in the following table.
Symbol(Base)
- datapoint
- foreign key for Datapoint
- value
- a string
- counter
- counts the number of symbols attached to a given data point
In case of an experimental study, the details of the synthesis method and the experiment can be stored in the following tables.
SynthesisMethod(Base)
- dataset
- foreign key for Datapoint
- starting_materials
- starting materials of synthesis
- product
- product of synthesis
- description
- detailed description of the synthesis process
ExperimentalDetails(Base)
- dataset
- foreign key for Datapoint
- method
- name of the experimental method
- description
- detailed description of the experiment
Similarly, in case of a theoretical study, the computational details are recorded in a separate table.
ComputationalDetails(Base)
- dataset
- foreign key for Datapoint
- code
- computer code used for calculations
- level_of_theory
- level of theory used in the calculation
- xc_functional
- exchange-correlation functional
- k_point_grid
- details about the K-point grid
- level_of_relativity
- level of relatively (this includes the description of spin-orbit coupling)
- basis_set_definition
- anything related to the basis set used (this includes pseudopotential details, if applicable)
- numerical_accuracy
- information about parameters that control the accuracy of the calculation
Each entry of synthesis method, experimental details, or computational details may have a comment, which is stored in a separate table.
Besides storing all numerical data in a structured database, the data is also stored in the form of files. This way the original user uploaded data is stored without modifications, e.g., preserving any comments that the input file may contain.
InputDataFile(Base)
- dataset
- foreign key for Dataset
- dataset_file
- a file upload field
Any additional files, if present, are stored in DatasetFile (input/output files for a calculation, image of the sample, ldots).
AdditionalFile(Base)
- dataset
- foreign key for Dataset
- dataset_file
- a file upload field
Phase transition properties, such as the phase transition pressure, required special treatment and are stored in PhaseTransition.
PhaseTransition(Base)
- subset
- foreign key for Subset
- crystal_system_final
- final crystal system; crystal_system of the subset is then understood to be the initial crystal system
- space_group_initial
- initial space group
- space_group_final
- final space group
- direction
- direction of the phase transition
- hysteresis
- details about the hysteresis of the phase transition
- value
- floating point number
- value_type
- “accurate”, “approximate”, “lower/upper bound”
- counter
- number of values attached to a given subset
- error
- uncertainty of the value
- upper_bound
- upper bound of the value
All user information is stored in the UserProfile table.
UserProfile
- user
- the default Django user model
- description
- description of the user (e.g., undergraduate)
- institution
- name of the institution
- website
- website of the user
Comment(Base)