Outlier Detection Support
Overview
sklearn-genetic now includes native support for tuning outlier detection models such as IsolationForest, OneClassSVM, and LocalOutlierFactor using GASearchCV and GAFeatureSelectionCV. These models are recognized automatically, and a default scoring function is applied when scoring=None is passed.
This simplifies hyperparameter optimization for unsupervised anomaly detection problems, where ground-truth labels (y) are not available.
Default Scoring Logic
When scoring=None and an estimator is recognized as an outlier detector, a default scorer is used. This scorer attempts the following, in order:
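1. If the estimator has score_samples, the mean of its per-sample scores is returned.
2. If score_samples is unavailable but decision_function exists, the mean of its values is returned.
3. As a fallback, fit_predict is called on the data passed to the scorer, and the fraction of samples predicted as inliers (label 1) is returned. Note that this path refits the estimator on that data.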
This fallback order lets the default scorer work with any outlier model that exposes at least one of these three methods.
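import numpy as np

def default_outlier_scorer(estimator, X, y=None):
    # Prefer per-sample anomaly scores when the estimator provides them
    if hasattr(estimator, 'score_samples'):
        return np.mean(estimator.score_samples(X))
    # Otherwise use the signed distance to the decision boundary
    elif hasattr(estimator, 'decision_function'):
        return np.mean(estimator.decision_function(X))
    # Last resort: fraction of samples predicted as inliers (label 1)
    else:
        predictions = estimator.fit_predict(X)
        return np.mean(predictions == 1)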
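To make the fallback order concrete, the following minimal sketch restates the scorer using only the standard library (statistics.fmean stands in for np.mean so the snippet runs without NumPy) and applies it to three stub classes. The stub classes and the name default_outlier_scorer_sketch are hypothetical illustrations, not part of sklearn-genetic; each stub exposes exactly one of the three methods so that each branch is exercised once.

```python
from statistics import fmean


def default_outlier_scorer_sketch(estimator, X, y=None):
    """Dependency-free restatement of the fallback order described above."""
    if hasattr(estimator, "score_samples"):
        return fmean(list(estimator.score_samples(X)))
    if hasattr(estimator, "decision_function"):
        return fmean(list(estimator.decision_function(X)))
    predictions = estimator.fit_predict(X)
    # Fraction of samples labeled as inliers (1)
    return fmean([1.0 if p == 1 else 0.0 for p in predictions])


# Hypothetical stand-ins for real detectors; each exposes exactly one method.
class ScoreSamplesOnly:
    def score_samples(self, X):
        return [-0.5 for _ in X]


class DecisionFunctionOnly:
    def decision_function(self, X):
        return [0.25 for _ in X]


class FitPredictOnly:
    def fit_predict(self, X):
        # Label the last sample as an outlier (-1), the rest as inliers (1).
        return [1] * (len(X) - 1) + [-1]


X_demo = [[0.0], [0.1], [0.2], [5.0]]

score_a = default_outlier_scorer_sketch(ScoreSamplesOnly(), X_demo)
score_b = default_outlier_scorer_sketch(DecisionFunctionOnly(), X_demo)
score_c = default_outlier_scorer_sketch(FitPredictOnly(), X_demo)
print(score_a, score_b, score_c)  # -0.5 0.25 0.75
```

The three paths return values on different scales (raw anomaly scores, signed distances, and a fraction between 0 and 1), so scores are only comparable between candidates that take the same path.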
Examples
Using GASearchCV with IsolationForest:
from sklearn.ensemble import IsolationForest
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Integer, Continuous
from sklearn.datasets import make_blobs
import numpy as np
# Create synthetic data with outliers
X_normal, _ = make_blobs(n_samples=200, centers=1, n_features=4, random_state=42)
X_outliers = np.random.default_rng(42).uniform(low=-6, high=6, size=(20, 4))  # seeded for reproducibility
X = np.vstack([X_normal, X_outliers])
estimator = IsolationForest(random_state=42)
param_grid = {
    'contamination': Continuous(0.05, 0.3),
    'n_estimators': Integer(50, 150),
}
search = GASearchCV(
    estimator=estimator,
    param_grid=param_grid,
    scoring=None,  # triggers default_outlier_scorer
    cv=3,
    generations=4,
    population_size=6,
    n_jobs=-1
)
search.fit(X)
Using GAFeatureSelectionCV with outlier detection:
from sklearn_genetic import GAFeatureSelectionCV
from sklearn.ensemble import IsolationForest
# X is the synthetic dataset built in the GASearchCV example above
selector = GAFeatureSelectionCV(
    estimator=IsolationForest(random_state=42),
    scoring=None,  # default_outlier_scorer used
    cv=3,
    generations=4,
    population_size=6,
    n_jobs=-1
)
selector.fit(X)
Custom Scoring
You may override the default logic by passing your own custom scoring function:
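# Reuses np, X, and param_grid from the GASearchCV example above
def custom_score(estimator, X, y=None):
    # Reward a wide spread of anomaly scores across the samples
    return np.std(estimator.score_samples(X))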
search = GASearchCV(
    estimator=IsolationForest(),
    param_grid=param_grid,
    scoring=custom_score,
    cv=3,
    generations=4,
    population_size=6,
    n_jobs=1
)
search.fit(X)
Limitations
Only estimators with fit_predict, decision_function, or score_samples are supported by default.
Models that are not recognized as outlier detectors need an explicit scoring argument; otherwise a ValueError is raised.
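One way to catch an unsupported estimator before starting a search is a small pre-flight check on the three methods named above. The sketch below is a hypothetical helper, not the recognition logic sklearn-genetic uses internally (that logic is not shown here); the names SUPPORTED_METHODS, first_supported_method, and check_scorable, as well as the two stub classes, are assumptions made for illustration.

```python
# Methods the default scoring logic relies on, in priority order
SUPPORTED_METHODS = ("score_samples", "decision_function", "fit_predict")


def first_supported_method(estimator):
    """Return the first supported method the estimator exposes, or None."""
    for name in SUPPORTED_METHODS:
        if callable(getattr(estimator, name, None)):
            return name
    return None


def check_scorable(estimator):
    """Hypothetical pre-check: raise ValueError if no default path applies."""
    method = first_supported_method(estimator)
    if method is None:
        raise ValueError(
            f"{type(estimator).__name__} exposes none of {SUPPORTED_METHODS}; "
            "pass an explicit scoring callable instead."
        )
    return method


# Hypothetical stubs standing in for real estimators
class OnlyFit:
    def fit(self, X):
        return self


class HasFitPredict:
    def fit_predict(self, X):
        return [1 for _ in X]


print(check_scorable(HasFitPredict()))  # fit_predict
try:
    check_scorable(OnlyFit())
except ValueError as err:
    print(err)
```

This check is necessary but not sufficient: many estimators that are not outlier detectors (classifiers with decision_function, clusterers with fit_predict) would pass it, so it should not be read as a substitute for the library's own detection.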