
Class GridMaster


This class manages the end-to-end multi-model hyperparameter search, evaluation, and export pipeline.


Initialization & Setup


Method .__init__()

Initialize the GridMaster with specified models and training data.

This constructor sets up the internal model configuration for each model using the provided training dataset and optional custom hyperparameters.


Args

Parameter Type Description Default
models list A list of model names (e.g., 'logistic', 'random_forest', 'xgboost').
X_train array-like or DataFrame Training features.
y_train array-like Training labels.
mode str, optional One of 'fast' or 'industrial'. Controls the scale of the coarse search grids for each model. 'fast' is quicker but less exhaustive; 'industrial' is larger and better suited for production-grade tuning. 'fast'
custom_params dict, optional Dictionary of custom coarse-level hyperparameters for specific models. Format: {model_name: param_dict}. None
custom_estimator_params dict, optional Dictionary of custom estimator (model) initialization parameters. Format: {model_name: param_dict} Useful for enabling options like GPU. None
njobs int, optional Number of parallel jobs for GridSearchCV. Defaults to half of the total detected CPU cores (based on system hardware). Use -1 to utilize all CPU cores. half of total CPU cores
verbose int, optional Verbosity level for GridSearchCV. Controls how much logging is printed.
Recommendation:
- 0: beginners or a clean run; suppresses all messages.
- 1: medium-scale tasks where you want to monitor progress, especially large grid searches where you want to know the search is still active or to get a rough idea of how far along it is.
- 2 or higher: advanced users or debugging; shows detailed cross-validation steps and logs from the estimators themselves.
1
refit bool, optional Whether to refit the best estimator on the entire dataset after search. True
return_train_score bool, optional Whether to include training set scores in cv_results_. False

Attributes

Attribute Type Description
model_dict dict Dictionary storing initialized models and their search spaces.
X_train array-like Feature training set.
y_train array-like Label training set.
results dict Stores search results for each model.
best_model_name str Name of the currently best-performing model.
feature_names list List of feature names for plotting and explanation.
njobs int Number of parallel jobs for GridSearchCV.
verbose int Verbosity level for GridSearchCV.
refit bool Whether to refit the best estimator after grid search.
return_train_score bool Whether training scores are included in cv_results_.

⚠️ Warning:
This class sets up internal state; be cautious when modifying results or model_dict manually.
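
Example

A minimal construction sketch. The import path gridmaster, the custom_params override, and the data names X_train / y_train are illustrative assumptions; any array-like features and labels work:

from gridmaster import GridMaster  # import path assumed

gm = GridMaster(
    models=['logistic', 'random_forest', 'xgboost'],
    X_train=X_train,
    y_train=y_train,
    mode='fast',
    custom_params={'xgboost': {'clf__n_estimators': [100, 300]}},  # hypothetical coarse-grid override
    njobs=-1,  # use all CPU cores
)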



Method .coarse_search()

Perform a coarse-level hyperparameter grid search across all models.

This method iterates through all configured models and performs GridSearchCV using their predefined coarse parameter grids.


Args

Parameter Type Description Default
scoring str, optional Evaluation metric to optimize. Must be one of 'accuracy', 'f1', 'roc_auc', 'precision', 'recall', or any valid sklearn scorer string. 'accuracy'
cv int Number of cross-validation folds. 5

Returns

None: Updates the results dictionary with fitted GridSearchCV objects under the 'coarse' key.


⚠️ Warning:
This method modifies the results dictionary in-place.


Notes:

This method internally uses the following advanced GridSearchCV parameters, set during GridMaster initialization:

  • n_jobs: Number of parallel jobs. Defaults to half of the detected CPU cores. Use -1 for all CPU cores.
  • verbose: Verbosity level. Controls logging detail.
  • refit: Whether to refit the best model on the entire dataset after search.
  • return_train_score: Whether to include training set scores in the results.

Example

gm = GridMaster(models=['logistic', 'random_forest', 'xgboost'], X_train=X_train, y_train=y_train)
gm.coarse_search(scoring='f1', cv=5)


Method .fine_search()

Perform fine-level hyperparameter tuning based on coarse search results.

Prerequisite:
Before using .fine_search(), you must first run .coarse_search() to generate the required coarse search results.

This method refines the hyperparameter grid by auto-generating a narrower search space around the best parameters from the coarse search and runs another GridSearchCV.
It supports smart, expert, or custom fine-tuning modes.


Args

Parameter Type Description Default
scoring str, optional Scoring metric to optimize. Must be one of 'accuracy', 'f1', 'roc_auc', 'precision', 'recall', or any valid sklearn scorer string. 'accuracy'
cv int, optional Number of cross-validation folds. 5
auto_scale float, optional Scaling factor for narrowing the search range (e.g., 0.5 = ±50% around the best value). Applies to both linear and log-scale parameters. 0.5
auto_steps int, optional Number of steps/grid points per parameter in fine grid. 5
search_mode str, optional Fine-tuning mode. Choose from 'smart' (auto-select most important params), 'expert' (only adjusts learning rate, max depth), or 'custom' (use provided custom_fine_params). 'smart'
custom_fine_params dict, optional Custom fine-tuning grid when search_mode='custom'. If provided, it overrides auto-generated grids. None

Modes (smart / expert / custom)

smart mode
  • Default behavior: Automatically selects the top-2 most impactful hyperparameters from coarse search results.
  • How it works (see the sketch after this list):
    • Extracts GridSearchCV.cv_results_ and looks at how each hyperparameter’s different values affect mean_test_score.
    • Calculates the performance variation range (max - min average score) for each parameter.
    • Selects the top-N (default 2) parameters with the largest variation as the focus of fine-tuning.
  • Best for: Users who want automated, data-driven refinement without needing to pre-select key parameters.
  • Caution: If the coarse grid was too narrow or the data is insensitive to certain parameters, the generated fine grid may collapse to a single combination and skip fine-tuning.

expert mode
  • Default behavior: Focuses on well-known, domain-recommended hyperparameters that typically have strong influence.
  • Current configuration:
    • LogisticRegression: 'clf__C'
    • RandomForest: 'clf__max_depth', 'clf__min_samples_split'
    • XGBoost, LightGBM, CatBoost: 'clf__learning_rate', 'clf__max_depth'
  • Best for: Users who trust established best practices and want to emphasize proven sensitive parameters.
  • Caution: This mode is not data-adaptive; it may overlook dataset-specific influences that fall outside the “usual suspects.”

custom mode
  • Default behavior: Uses the user-supplied custom_fine_params to directly define the fine-tuning grid.
  • Best for: Advanced users who want full control or have domain-specific knowledge about optimal parameter ranges.
  • Caution: You are responsible for ensuring the parameter grid is meaningful; an overly narrow grid may result in only one combination and skip fine-tuning.
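
The following is a minimal, illustrative sketch of the smart-mode ranking described above, not the library's actual implementation. It assumes a fitted GridSearchCV's cv_results_ as input:

import pandas as pd

def rank_param_impact(cv_results, top_n=2):
    # Score each hyperparameter by the spread (max - min) of the
    # average mean_test_score across its tried values.
    df = pd.DataFrame(cv_results)
    impact = {}
    for col in (c for c in df.columns if c.startswith('param_')):
        per_value = df.groupby(df[col].astype(str))['mean_test_score'].mean()
        impact[col] = per_value.max() - per_value.min()
    # Keep the top_n parameters with the largest score variation.
    return sorted(impact, key=impact.get, reverse=True)[:top_n]

# Hypothetical access pattern; the actual results layout may differ:
# rank_param_impact(gm.results['xgboost']['coarse'].cv_results_)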

Returns

None: Updates the results dictionary with the fine-tuned GridSearchCV objects under the 'fine' key.


⚠️ Warning:
This method modifies the results dictionary in-place.


Notes

This method internally uses the following advanced GridSearchCV parameters, set during GridMaster initialization:

  • n_jobs: Number of parallel jobs. Defaults to half of CPU cores (detected at runtime). Use -1 for all cores.
  • verbose: Verbosity level. Controls logging detail.
  • refit: Whether to refit the best model on the entire dataset after search.
  • return_train_score: Whether to include training set scores in the results.

Example

# Step 1: Run coarse search first
gm.coarse_search(scoring='accuracy', cv=5)

# Step 2: Fine-tune using smart mode
gm.fine_search(
    scoring='roc_auc',
    cv=5,
    auto_scale=0.3,
    auto_steps=7,
    search_mode='smart'
)

# Or using expert mode:
gm.fine_search(
    scoring='accuracy',
    cv=3,
    search_mode='expert'
)

# Or using custom mode:
custom_grid = {
    'clf__n_estimators': [100, 200, 300],
    'clf__max_depth': [3, 5, 7]
}
gm.fine_search(
    scoring='f1',
    cv=4,
    search_mode='custom',
    custom_fine_params=custom_grid
)


Method .multi_stage_search()

Perform a multi-stage grid search consisting of one coarse stage and multiple fine-tuning stages.

This method first performs a coarse search (if not already done), then iteratively refines the hyperparameter space using a list of (scale, steps) tuples.
You can choose smart, expert, or custom mode; the chosen mode applies to all fine stages.


Args

Parameter Type Description Default
model_name str Name of the model to search (must be present in model_dict). Defaults to None, which means all models from initialization are searched. None
cv int, optional Number of cross-validation folds. 5
scoring str, optional Scoring metric to optimize. Must be one of 'accuracy', 'f1', 'roc_auc', 'precision', 'recall', or any valid sklearn scorer string. 'accuracy'
stages list of tuple, optional List of (scale, steps) for each fine-tuning stage. Example: [(0.5, 5), (0.2, 5)] means two rounds: ±50% grid with 5 points, then ±20% with 5 points. [(0.5, 5), (0.2, 5)]
search_mode str, optional Fine-tuning mode for all fine stages. Choose from 'smart', 'expert', or 'custom'. 'smart'
custom_fine_params dict, optional Custom fine-tuning grid if search_mode='custom'. None
verbose bool, optional Whether to print progress messages. True

Modes (smart / expert / custom)

The mode semantics are identical to those of .fine_search(); see the Modes section above for details on 'smart', 'expert', and 'custom'.

Returns

None: Updates the results dictionary with intermediate GridSearchCV results for each stage.


⚠️ Warning:
This method modifies the results dictionary in-place.


Notes

This method internally uses the following advanced GridSearchCV parameters, set during GridMaster initialization:

  • n_jobs: Number of parallel jobs. Defaults to half of CPU cores (detected at runtime). Use -1 for all cores.
  • verbose: Verbosity level. Controls logging detail.
  • refit: Whether to refit the best model on the entire dataset after search.
  • return_train_score: Whether to include training set scores in the results.

Example

gm.multi_stage_search(
    model_name='xgboost',
    scoring='accuracy',
    cv=3,
    stages=[(0.5, 5), (0.2, 5)],
    search_mode='smart'
)

# Or using expert mode:
gm.multi_stage_search(
    model_name='lightgbm',
    scoring='roc_auc',
    search_mode='expert'
)

# Or using custom mode:
custom_grid = {
    'clf__n_estimators': [100, 200, 300],
    'clf__max_depth': [3, 5, 7]
}
gm.multi_stage_search(
    model_name='catboost',
    scoring='f1',
    search_mode='custom',
    custom_fine_params=custom_grid
)

Model Evaluation & Summary


Method .compare_best_models()

Compare all trained models on test data using specified evaluation metrics.

This method selects the best estimator for each model, computes scores on the provided test set, and stores results for later access.


Args

Parameter Type Description Default
X_test array-like Feature test set.
y_test array-like Ground truth labels for the test set.
metrics list of str, optional Evaluation metrics to compute. Valid values: 'accuracy', 'f1', 'roc_auc', 'precision', 'recall'. ['accuracy', 'f1', 'roc_auc']
strategy str, optional Placeholder for future ranking strategies (currently unused). 'rank_sum'
weights dict, optional Placeholder for future weighted metric strategies (currently unused). None

Returns

None: Updates results with 'test_scores' and 'best_model' for each model.


⚠️ Warning:
This method modifies the results dictionary in-place.

Example

gm.compare_best_models(X_test, y_test, metrics=['accuracy', 'f1'])


Method .get_best_model_summary()

Retrieve a summary of the best model's configuration and performance.

This includes the best estimator, parameters, cross-validation score, and test set scores if available.


Args

Parameter Type Description Default
model_name str, optional Name of the model to summarize. If None, uses the current best_model_name set by the user or internal logic. None

Returns

dict: A dictionary with the following keys:
- 'model_name' (str): Name of the model.
- 'best_estimator' (sklearn.BaseEstimator): Best estimator object.
- 'best_params' (dict): Best hyperparameters from search.
- 'cv_best_score' (float): Best cross-validation score.
- 'test_scores' (dict): Optional test metrics if available.


Example

summary = gm.get_best_model_summary('logistic')
print(summary['best_params'])


Method .generate_search_report()

Generate a detailed multi-stage search report across all models, summarizing parameter grids, best parameter sets, and best metric scores.


Returns

str: A formatted multi-line text report summarizing the entire search process.


Example

print(gm.generate_search_report())


Method .get_cv_results()

Retrieve cross-validation results from GridSearchCV for a specific model.

This method allows you to extract detailed performance metrics from previous coarse or fine searches.


Args

Parameter Type Description Default
model_name str Name of the model to retrieve results for.
use_fine bool, optional Whether to retrieve results from fine-tuned search (final) or coarse-level search. True
as_dataframe bool, optional If True, returns results as a pandas DataFrame; if False, returns the raw cv_results_ dictionary. True

Returns

Union[pd.DataFrame, dict]: Cross-validation results in the selected format.


Example

cv_results = gm.get_cv_results('xgboost', use_fine=True, as_dataframe=True)
print(cv_results.head())

Visualization & Plotting


Method .plot_cv_score_curve()

Plot cross-validation scores for each parameter set tried during grid search.

This visualization helps analyze how performance varies across different hyperparameter combinations.


Args

Parameter Type Description Default
model_name str Name of the model whose results will be plotted.
metric str, optional Score metric to plot. Must be one of 'mean_test_score', 'mean_train_score', 'std_test_score', 'std_train_score', 'rank_test_score'. 'mean_test_score'
plot_train bool, optional Whether to include training scores in the plot. True
figsize tuple, optional Size of the plot in inches (width, height). (10, 5)
show_best_point bool, optional Whether to mark the best score point on the plot. True
title str, optional Custom plot title. If None, a default title will be used. None
xlabel str, optional Label for x-axis. 'Parameter Set Index'
ylabel str, optional Label for y-axis. If None, uses the metric. None
save_path str, optional If provided, saves the plot to the specified file path. None

Returns

None: Displays and optionally saves a matplotlib plot.


Example

gm.plot_cv_score_curve('xgboost', metric='mean_test_score')


Method .plot_confusion_matrix()

Plot the confusion matrix for a classification model on test data.

This method visualizes the true vs. predicted labels to evaluate model performance.


Args

Parameter Type Description Default
model_name str Name of the model to use for prediction.
X_test array-like Test feature set.
y_test array-like True labels for the test set.
labels list, optional List of label names to display in the matrix. None
normalize str or None, optional Normalization mode: 'true' (row-wise), 'pred' (column-wise), 'all' (overall), or None (no normalization). See sklearn confusion_matrix docs for details. None
figsize tuple, optional Size of the figure in inches. (6, 5)
cmap str or Colormap, optional Colormap used for the plot. Must be a valid name or Colormap object from Matplotlib colormaps, e.g., 'Blues', 'viridis', 'plasma'. 'Blues'
title str, optional Title of the plot. If None, a default title will be used. None
save_path str, optional If specified, saves the figure to this file path instead of displaying it. None

Returns

None: Displays and optionally saves the confusion matrix plot.


Example

gm.plot_confusion_matrix('logistic', X_test, y_test, normalize='true')


Method .plot_roc_curve()

Plot the ROC (Receiver Operating Characteristic) curve for the specified model on a test set.

This visualization helps assess the tradeoff between true positive rate (TPR) and false positive rate (FPR) across thresholds.


Args

Parameter Type Description Default
model_name str Name of the model whose ROC curve will be plotted.
X_test array-like Test feature set.
y_test array-like True labels for the test set.
figsize tuple, optional Size of the plot in inches (width, height). (8, 6)
save_path str, optional If provided, saves the plot to the specified file path. None

Returns

None: Displays and optionally saves a matplotlib plot.


Example

gm.plot_roc_curve('xgboost', X_test, y_test)


Method .plot_precision_recall_curve()

Plot the Precision-Recall curve for the specified model on a test set.

This visualization is particularly useful for imbalanced datasets, helping assess the balance between precision and recall at different decision thresholds.


Args

Parameter Type Description Default
model_name str Name of the model whose Precision-Recall curve will be plotted.
X_test array-like Test feature set.
y_test array-like True labels for the test set.
figsize tuple, optional Size of the plot in inches (width, height). (8, 6)
save_path str, optional If provided, saves the plot to the specified file path. None

Returns

None: Displays and optionally saves a matplotlib plot.


Example

gm.plot_precision_recall_curve(model_name='xgboost', X_test=X_test, y_test=y_test)


Method .plot_model_coefficients()

Plot the top N coefficients of a linear model for interpretability.

This method visualizes the most important positive or negative coefficients from models like logistic regression.


Args

Parameter Type Description Default
model_name str Name of the model whose coefficients will be plotted.
top_n int, optional Number of most important features to display. 20
sort_descending bool, optional Whether to sort by absolute value in descending order. True
figsize tuple, optional Figure size in inches. (10, 5)
color str, optional Bar color. Must be a valid color name or hex code as accepted by Matplotlib colors, e.g., 'teal', 'red', '#1f77b4'. 'teal'
title str, optional Plot title. If None, uses default. None
xlabel str, optional Label for x-axis. 'Coefficient Value'
save_path str, optional If provided, saves the plot to this file path. None

Returns

None: Displays and optionally saves a matplotlib bar chart of model coefficients.


Example

gm.plot_model_coefficients('logistic', top_n=15, color='purple')


Method .plot_feature_importance()

Plot the top N feature importances from a tree-based model.

This method shows which features contributed most to predictions, based on models like random forest, XGBoost, or LightGBM.


Args

Parameter Type Description Default
model_name str Name of the model to visualize.
top_n int, optional Number of top features to display. 20
sort_descending bool, optional Whether to sort by importance in descending order. True
figsize tuple, optional Figure size in inches. (10, 5)
color str, optional Bar color. Must be a valid color name or hex code as accepted by Matplotlib colors, e.g., 'darkgreen', 'red', '#1f77b4'. 'darkgreen'
title str, optional Plot title. If None, a default will be used. None
xlabel str, optional X-axis label. 'Feature Importance'
save_path str, optional If specified, saves the plot to this path. None

Returns

None: Displays and optionally saves a matplotlib plot.


Example

gm.plot_feature_importance('xgboost', top_n=15, color='orange')

Import & Export

⚠️ Warning: When loading models across environments or machines, ensure that the software dependencies (e.g., sklearn, xgboost versions) are consistent. Mismatched library versions may lead to loading errors or unpredictable behavior.
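
A quick, illustrative way to record the relevant versions on both sides (extend the list to whichever model libraries your export uses):

import sklearn, joblib
print('scikit-learn:', sklearn.__version__)
print('joblib:', joblib.__version__)
# e.g., also xgboost.__version__, lightgbm.__version__, catboost.__version__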


Method .export_model_package()

Export the best model, its summary, and cross-validation results to disk.

This function creates a dedicated subdirectory under the specified folder, containing:
  • The final fitted model (model_final.joblib)
  • A JSON summary of model performance and parameters (best_model_summary.json)
  • CSV files of cross-validation results for all search stages (e.g., coarse, fine)


Args

Parameter Type Description Default
model_name str Name of the model to export.
folder_path str, optional Base folder to store exported model files. If the folder does not exist, it will be created. 'model_exports'

Returns

None: Writes files to disk, no return value.


Example

gm.export_model_package('logistic', folder_path='exports')


Method .export_all_models()

Export all models and their associated results to disk.

This function loops through all models stored in self.results and calls .export_model_package() on each of them.
Optionally, it appends a timestamp to each model’s output directory to avoid overwriting previous exports.


Args

Parameter Type Description Default
folder_path str, optional Base folder where all model subfolders will be exported. If the folder does not exist, it will be created. 'model_exports'
use_timestamp bool, optional If True, appends a timestamp to each model folder name (useful for versioning and avoiding overwrite). True

Returns

None: Writes all models and their associated files to disk, no return value.


Example

gm.export_all_models(folder_path='exports', use_timestamp=True)


Method .load_model_package()

Load a saved model package from disk, including the estimator, summary, and cross-validation results.

This function reads:
  • The saved model (model_final.joblib)
  • The JSON summary (best_model_summary.json)
  • The CSV files of cross-validation results

and loads them back into the self.results dictionary.


Args

Parameter Type Description Default
model_name str Name of the model to load.
folder_path str, optional Base path where the model files are stored. The method looks for a subfolder named after the model inside this directory. 'model_exports'

Returns

None: Updates self.results and sets self.best_model_name.


Example

gm.load_model_package('logistic', folder_path='exports')


Method .load_all_models()

Load all saved model packages from a specified folder.

This function iterates through all subdirectories under the given folder_path,
assumes each contains a model export (created by .export_model_package()),
and calls .load_model_package() to load them into self.results.

Model names are inferred from subdirectory names (before the first underscore if timestamped).


Args

Parameter Type Description Default
folder_path str, optional Directory containing exported model subfolders. Each subfolder should follow the expected naming and file structure. 'model_exports'

Returns

None: Updates self.results with loaded model data for all found models.


Example

gm.load_all_models(folder_path='exports')
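
Putting it all together, a hedged end-to-end sketch of the full pipeline documented above (X_train, y_train, X_test, y_test are placeholder data names):

gm = GridMaster(models=['logistic', 'xgboost'], X_train=X_train, y_train=y_train)
gm.multi_stage_search(scoring='roc_auc', stages=[(0.5, 5), (0.2, 5)])  # coarse + two fine stages, all models
gm.compare_best_models(X_test, y_test, metrics=['accuracy', 'f1', 'roc_auc'])
print(gm.generate_search_report())
gm.export_all_models(folder_path='exports', use_timestamp=True)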