descmap.selection
- descmap.selection.get_best_combination(X, n_max_combs, comb_df)
Get the descriptor combination with the smallest mean squared error (MSE)
- Parameters:
X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of species read from Excel
n_max_combs (int) – Maximum number of descriptor combinations to consider
comb_df (pandas DataFrame object) – Fitness score for each descriptor combination
- Returns:
comb_err (pandas DataFrame object) – MSE and MAE for each descriptor combination
best_comb (pandas DataFrame object) – Best descriptor combination with corresponding MSE and MAE
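As a rough illustration of the selection step described above, the sketch below fits every species against each candidate descriptor set by ordinary least squares and keeps the set with the smallest MSE. It uses plain NumPy rather than descmap itself. The names `fit_errors` and `best_combination_sketch`, the representation of a combination as a tuple of column indices, and the use of in-sample errors are all assumptions made for illustration; the library's own procedure may differ (for example, it may use cross-validation).

```python
import numpy as np

def fit_errors(X, idx):
    """Fit all columns of X linearly on the descriptor columns idx.

    Returns (MSE, MAE) of the in-sample residuals (hypothetical helper).
    """
    D = X[:, list(idx)]
    # Design matrix: a column of ones (intercept) plus the descriptor energies
    A = np.column_stack([np.ones(len(D)), D])
    coef, *_ = np.linalg.lstsq(A, X, rcond=None)
    resid = X - A @ coef
    return float(np.mean(resid ** 2)), float(np.mean(np.abs(resid)))

def best_combination_sketch(X, candidates):
    """Score each candidate combination and return (errors, best)."""
    X = np.asarray(X, dtype=float)
    errors = {tuple(c): fit_errors(X, c) for c in candidates}
    # The best combination is the one with the smallest MSE
    best = min(errors, key=lambda c: errors[c][0])
    return errors, best

# Toy data: 4 surfaces x 3 adsorbates; column 2 is exactly 2 * column 0 + 1
X_demo = np.array([[0.0, 1.0, 1.0],
                   [1.0, 0.0, 3.0],
                   [2.0, 0.0, 5.0],
                   [3.0, 1.0, 7.0]])
errors_demo, best_demo = best_combination_sketch(X_demo, [(0,), (1,)])
```

Here the candidate list stands in for the top-ranked combinations that `n_max_combs` and `comb_df` would presumably supply; that mapping is also an assumption.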
- descmap.selection.get_combination_score(X, n_components)
Calculate the metrics for each descriptor combination
- Parameters:
X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of species read from Excel
n_components (int) – Number of principal components for PCA model
- Returns:
comb_df (pandas Series object) – Fitness score for each descriptor combination
complete_df (pandas DataFrame object) – Calculated metrics for each descriptor combination: inv_cond_num, robustness_vec, fitness
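The entry above does not define its metrics, so the following is only a sketch of one of them under stated assumptions. It computes an inverse condition number (smallest singular value divided by the largest) for the sub-matrix of X formed by each combination of `n_descriptors` columns. Which matrix descmap actually uses, and how `robustness_vec` and `fitness` are derived, is not stated here; the function name `inverse_condition_numbers` and the `n_descriptors` parameter are hypothetical.

```python
from itertools import combinations
import numpy as np

def inverse_condition_numbers(X, n_descriptors):
    """Inverse 2-norm condition number for every column combination of X.

    Values near 1 mean the chosen columns are well conditioned;
    values near 0 mean they are almost linearly dependent.
    """
    X = np.asarray(X, dtype=float)
    scores = {}
    for idx in combinations(range(X.shape[1]), n_descriptors):
        # Singular values come back sorted from largest to smallest
        s = np.linalg.svd(X[:, list(idx)], compute_uv=False)
        scores[idx] = float(s[-1] / s[0]) if s[0] > 0 else 0.0
    return scores

# Toy data: column 2 is exactly twice column 0, so (0, 2) is degenerate
X_demo = np.array([[ 3.0,  1.0,  6.0],
                   [ 3.0, -1.0,  6.0],
                   [-3.0,  1.0, -6.0],
                   [-3.0, -1.0, -6.0]])
scores_demo = inverse_condition_numbers(X_demo, 2)
```

A ranking built on such a score would favor combinations whose columns carry independent information, which is one plausible reason to track it alongside the other metrics.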
- descmap.selection.get_component_number(X, var_explained)
Calculate the number of PCA components needed to reach the target explained variance (var_explained)
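The idea can be sketched with NumPy alone: center the data, take the singular values, and count how many components are needed before the cumulative explained-variance ratio reaches the target. This is a minimal sketch, not the library's code. The name `component_number_sketch` is hypothetical, and treating `var_explained` as a fraction between 0 and 1 is an assumption, since the entry above does not document its parameters.

```python
import numpy as np

def component_number_sketch(X, var_explained):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches var_explained (assumed in (0, 1])."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # PCA works on centered data
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    ratio = s ** 2 / np.sum(s ** 2)           # explained-variance ratio
    cum = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the target;
    # the small tolerance guards against floating-point round-off
    n = int(np.searchsorted(cum, var_explained - 1e-12)) + 1
    return min(n, len(cum))

# Toy data: the first direction carries 90% of the variance, the second 10%
X_demo = np.array([[ 3.0,  1.0, 0.0],
                   [ 3.0, -1.0, 0.0],
                   [-3.0,  1.0, 0.0],
                   [-3.0, -1.0, 0.0]])
n_85 = component_number_sketch(X_demo, 0.85)
n_95 = component_number_sketch(X_demo, 0.95)
```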
- descmap.selection.get_lsr_results(X, best_comb)
Get the linear scaling relationship (LSR) slopes and intercepts using the best descriptor combination
- Parameters:
X (pandas DataFrame object) – (m_surfaces, n_adsorbates). Adsorption energies of species read from Excel
best_comb (pandas DataFrame object) – Best descriptor combination with smallest MSE
- Returns:
final_df – Intercepts and slopes for each species
- Return type:
pandas DataFrame object
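To make the slopes and intercepts concrete, the sketch below fits each species' adsorption energy as a linear function of the descriptor species' energies by ordinary least squares. It is an illustration under assumptions, not descmap's implementation: the name `lsr_fit_sketch` is hypothetical, descriptors are given as column indices, and plain NumPy arrays are returned instead of the DataFrame documented above.

```python
import numpy as np

def lsr_fit_sketch(X, descriptor_idx):
    """Least-squares fit of every column of X on the descriptor columns.

    Returns (intercepts, slopes): intercepts has shape (n_adsorbates,),
    slopes has shape (n_descriptors, n_adsorbates).
    """
    X = np.asarray(X, dtype=float)
    D = X[:, list(descriptor_idx)]
    # Design matrix: a column of ones (intercept) plus the descriptor energies
    A = np.column_stack([np.ones(len(D)), D])
    coef, *_ = np.linalg.lstsq(A, X, rcond=None)
    return coef[0], coef[1:]

# Toy data: column 1 = 2 * column 0 + 1, column 2 = -column 0 + 0.5
X_demo = np.array([[0.0, 1.0,  0.5],
                   [1.0, 3.0, -0.5],
                   [2.0, 5.0, -1.5],
                   [3.0, 7.0, -2.5]])
intercepts_demo, slopes_demo = lsr_fit_sketch(X_demo, [0])
```

The descriptor species itself is recovered with slope 1 and intercept 0, which is a quick sanity check on any fit of this kind.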