descmap.selection

descmap.selection.get_best_combination(X, n_max_combs, comb_df)

Select the descriptor combination with the smallest mean squared error (MSE)

Parameters:
  • X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of the species, read from an Excel file

  • n_max_combs (int) – Maximum number of descriptor combinations to consider

  • comb_df (pandas DataFrame object) – Fitness score for each descriptor combination

Returns:

  • comb_err (pandas DataFrame object) – MSE and MAE for each descriptor combination

  • best_comb (pandas DataFrame object) – Best descriptor combination with corresponding MSE and MAE
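
The selection step can be illustrated with a minimal NumPy sketch. It is not the descmap implementation: the function name best_combination, the candidates argument (column-index tuples assumed to be ordered by fitness already), and the choice to fit every non-descriptor column by ordinary least squares are all assumptions made for illustration.

```python
import numpy as np

def best_combination(X, candidates, n_max_combs):
    """Hypothetical sketch: score the first n_max_combs candidate
    descriptor sets by how well they linearly predict the other columns."""
    X = np.asarray(X, dtype=float)
    all_cols = range(X.shape[1])
    errors = []
    for combo in candidates[:n_max_combs]:
        # Columns that are not descriptors are the fitting targets.
        rest = [j for j in all_cols if j not in combo]
        # Design matrix: a column of ones (intercept) plus the descriptors.
        A = np.column_stack([np.ones(X.shape[0]), X[:, list(combo)]])
        Y = X[:, rest]
        coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
        resid = Y - A @ coef
        mse = float(np.mean(resid ** 2))
        mae = float(np.mean(np.abs(resid)))
        errors.append((tuple(combo), mse, mae))
    # The best combination is the one with the smallest MSE.
    best = min(errors, key=lambda row: row[1])
    return errors, best

# Toy data (made up): column 2 is an exact linear function of column 0.
X_demo = np.array([[0, 1, 1], [1, -1, 3], [2, 2, 5], [3, -2, 7], [4, 0, 9]])
errors_demo, best_demo = best_combination(X_demo, [(1,), (0,)], n_max_combs=2)
```

In this toy case column 0 is selected, because it reproduces column 2 exactly, while column 1 predicts neither of the other columns well.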

descmap.selection.get_combination_score(X, n_components)

Calculate the metrics for each descriptor combination

Parameters:
  • X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of the species, read from an Excel file

  • n_components (int) – Number of principal components for PCA model

Returns:

  • comb_df (pandas Series object) – Fitness score for each descriptor combination

  • complete_df (pandas DataFrame object) – Calculated metrics for each descriptor combination: inv_cond_num, robustness_vec, fitness
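
This page does not define the three metrics. As a rough illustration of the first one only, the sketch below assumes that inv_cond_num is the inverse condition number (smallest singular value divided by largest) of the centered descriptor columns. The function name and the use of n_components as the size of each combination are assumptions, and robustness_vec and fitness are not reproduced here.

```python
import numpy as np
from itertools import combinations

def inverse_condition_numbers(X, n_components):
    """Hypothetical sketch: inverse condition number of every
    n_components-sized set of (centered) adsorbate columns."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each adsorbate column
    scores = {}
    for combo in combinations(range(X.shape[1]), n_components):
        # Singular values come back sorted from largest to smallest.
        s = np.linalg.svd(Xc[:, list(combo)], compute_uv=False)
        scores[combo] = float(s[-1] / s[0]) if s[0] > 0 else 0.0
    return scores

# Toy data (made up): column 2 is exactly twice column 0.
X_demo = np.array([[1, 0, 2], [-1, 0, -2], [0, 1, 0], [0, -1, 0]])
scores_demo = inverse_condition_numbers(X_demo, 2)
```

Under this assumption a value near 1 indicates nearly orthogonal descriptors of similar scale, and a value near 0 indicates nearly collinear ones, as for columns 0 and 2 in the toy data.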

descmap.selection.get_component_number(X, var_explained)

Calculate the number of PCA components needed to reach the explained-variance threshold

Parameters:
  • X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of the species, read from an Excel file

  • var_explained (float) – Defined threshold for explained variance

Returns:

n_components – Required number of components

Return type:

int
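
The calculation can be sketched with plain NumPy, assuming that var_explained is a fraction between 0 and 1 and that the columns are centered but not rescaled before PCA. The function name component_number is hypothetical, and the library's own preprocessing may differ.

```python
import numpy as np

def component_number(X, var_explained):
    """Hypothetical sketch: smallest number of principal components whose
    cumulative explained-variance ratio reaches var_explained."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each adsorbate column
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(ratio)
    # First position where the cumulative ratio meets the threshold;
    # the small tolerance guards against floating-point round-off.
    idx = int(np.searchsorted(cumulative, var_explained - 1e-12))
    return min(idx + 1, len(cumulative))

# Toy data (made up): two uncorrelated columns with equal variance.
X_demo = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
n_demo = component_number(X_demo, 0.9)
```

With two equally important directions, a 0.9 threshold needs both components, whereas rank-one data would need only one.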

descmap.selection.get_lsr_results(X, best_comb)

Get the linear scaling relation (LSR) slopes and intercepts using the best descriptor combination

Parameters:
  • X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of the species, read from an Excel file

  • best_comb (pandas DataFrame object) – Best descriptor combination with the smallest MSE

Returns:

final_df – Intercepts and slopes for each species

Return type:

pandas DataFrame object
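
A fit of this kind can be sketched as an ordinary least-squares problem, assuming each species' adsorption energy is modeled as an intercept plus a linear combination of the descriptor energies. The function name lsr_fit and its array-based interface are hypothetical; the library itself works with DataFrames.

```python
import numpy as np

def lsr_fit(descriptors, targets):
    """Hypothetical sketch: fit targets ~ intercept + descriptors @ slopes.

    descriptors: (m_surfaces, k) descriptor adsorption energies
    targets:     (m_surfaces, n_species) energies to be predicted
    Returns intercepts of shape (n_species,) and slopes of shape (k, n_species).
    """
    D = np.asarray(descriptors, dtype=float)
    Y = np.asarray(targets, dtype=float)
    # A leading column of ones lets lstsq estimate the intercepts.
    A = np.column_stack([np.ones(D.shape[0]), D])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[0], coef[1:]

# Toy data (made up): two species that scale exactly with one descriptor.
d_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.column_stack([2.0 * d_demo[:, 0] + 1.0, -0.5 * d_demo[:, 0] + 3.0])
intercepts_demo, slopes_demo = lsr_fit(d_demo, y_demo)
```

Each column of the returned slopes belongs to one species, so a table with one row per species can be assembled from the intercepts and the transposed slopes.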

descmap.selection.write_lsr_results(final_df)

Write the intercepts and slopes to an Excel file for future use

Parameters:

final_df (pandas DataFrame object) – Intercepts and slopes for each species