hypermodel.features package

Submodules

hypermodel.features.categorical module

Helper functions for dealing with categorical features.

hypermodel.features.categorical.get_unique_feature_values(dataframe: pandas.core.frame.DataFrame, features: List[str]) → Dict[str, List[str]]

Take a DataFrame and a list of features and, for each feature, find all of its unique values. This is a useful step prior to one-hot encoding, as it gives a list of all the values we can expect to encode.

Parameters:

dataframe (pd.DataFrame) – The DataFrame to collect values from
features (List[str]) – A list of all the features whose unique values we want to find

Returns:

A dictionary keyed by the name of each feature, containing a list of that feature's unique values
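The following is a minimal sketch of how unique values could be collected with pandas; it is not hypermodel's actual implementation. Dropping missing values, converting values to str (to match the List[str] return annotation) and sorting each list are assumptions made here so that the resulting encoded column order is deterministic.

```python
from typing import Dict, List

import pandas as pd


def get_unique_feature_values(dataframe: pd.DataFrame, features: List[str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: map each feature name to its sorted unique values."""
    uniques: Dict[str, List[str]] = {}
    for feature in features:
        # Assumption: missing values are skipped and values are stringified
        values = dataframe[feature].dropna().astype(str).unique()
        # Sorting gives a stable order for the one-hot columns built later
        uniques[feature] = sorted(values)
    return uniques


df = pd.DataFrame({"colour": ["red", "blue", "red"], "size": ["S", "M", "L"]})
print(get_unique_feature_values(df, ["colour", "size"]))
# {'colour': ['blue', 'red'], 'size': ['L', 'M', 'S']}
```

Computing this dictionary once on training data, then reusing it, is what allows later data to be encoded into exactly the same set of columns.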

hypermodel.features.categorical.one_hot_encode(dataframe: pandas.core.frame.DataFrame, uniques: Dict[str, List[str]], throw_on_missing=False) → pandas.core.frame.DataFrame

Create a new DataFrame that one-hot encodes values from the given DataFrame against the known list of unique feature values (as calculated by get_unique_feature_values).

Parameters:

dataframe (pd.DataFrame) – The DataFrame containing the values to encode
uniques (Dict[str, List[str]]) – A dict keyed by feature name, containing a list of unique values
throw_on_missing (bool) – If this parameter is True and a value is found in the DataFrame that is missing from the uniques dict, an Exception is raised to prevent further execution. When encoding unseen data against known data, this can be useful to ensure you are not making predictions from values that were never seen.

Returns:

A new DataFrame with one column per feature/value pair, containing a “1” where the row has that value for the feature and a “0” where it does not
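A minimal sketch of the encoding described above, assuming it produces one integer 0/1 column per feature/value pair. The `<feature>_<value>` column naming and the use of ValueError for throw_on_missing are assumptions of this sketch, not documented hypermodel behaviour. With throw_on_missing=False, an unseen value simply gets 0 in every column for its feature.

```python
from typing import Dict, List

import pandas as pd


def one_hot_encode(
    dataframe: pd.DataFrame,
    uniques: Dict[str, List[str]],
    throw_on_missing: bool = False,
) -> pd.DataFrame:
    """Hypothetical sketch: one 0/1 column per feature/value pair listed in `uniques`."""
    encoded = pd.DataFrame(index=dataframe.index)
    for feature, known_values in uniques.items():
        column = dataframe[feature].astype(str)
        if throw_on_missing:
            # Values never seen when `uniques` was built are treated as an error
            observed = set(dataframe[feature].dropna().astype(str))
            unseen = observed - set(known_values)
            if unseen:
                raise ValueError(f"unseen values for {feature!r}: {sorted(unseen)}")
        for value in known_values:
            # "<feature>_<value>" naming is an assumption made for this sketch
            encoded[f"{feature}_{value}"] = (column == value).astype(int)
    return encoded


train = pd.DataFrame({"colour": ["red", "blue", "red"]})
uniques = {"colour": ["blue", "red"]}
encoded = one_hot_encode(train, uniques)
print(encoded["colour_red"].tolist())  # [1, 0, 1]
```

Iterating over the known values, rather than the values present in the input, is what keeps the output columns identical between training data and unseen data.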

hypermodel.features.numerical module

hypermodel.features.numerical.describe_features(dataframe: pandas.core.frame.DataFrame, features: List[str])

Return a dictionary keyed by the name of each feature, containing that feature's summary statistics.

Parameters:

dataframe (pd.DataFrame) – The DataFrame containing the features to analyze
features (List[str]) – The names of the features (columns in dataframe) to analyze

Returns:

A dictionary keyed by the feature name, containing summary statistics of the values of that feature.
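The documentation does not say which summary statistics are returned. As an assumption, the sketch below uses pandas' Series.describe(), which for a numeric column reports count, mean, std, min, the 25%/50%/75% quantiles and max; hypermodel's own output may differ.

```python
from typing import Dict, List

import pandas as pd


def describe_features(dataframe: pd.DataFrame, features: List[str]) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: per-feature summary statistics from Series.describe()."""
    # For a numeric column, describe() yields count, mean, std, min, 25%, 50%, 75% and max
    return {feature: dataframe[feature].describe().to_dict() for feature in features}


train = pd.DataFrame({"age": [20, 30, 40, 50]})
stats = describe_features(train, ["age"])
print(stats["age"]["mean"])  # 35.0
```

The mean and std entries produced this way are the kind of values one might later pass to scale_by_mean_stdev.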

hypermodel.features.numerical.scale_by_mean_stdev(dataframe: pandas.core.frame.DataFrame, feature: str, mean: float, stdev: float) → pandas.core.frame.DataFrame

Scale all the values in a column in place, using a pre-specified mean and standard deviation.

Parameters:

dataframe (pd.DataFrame) – The DataFrame whose values will be adjusted
feature (str) – The name of the feature column in the DataFrame
mean (float) – The mean to use to scale values
stdev (float) – The standard deviation to use to scale values

Returns:

The same DataFrame that was passed in, with the feature column adjusted
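Assuming "scale" here means standardising to a z-score, each value x becomes (x - mean) / stdev. The sketch below is a hypothetical implementation of that reading, not hypermodel's code; the zero-stdev check is an addition of this sketch. Passing in a mean and standard deviation computed on training data, rather than recomputing them, keeps unseen data on the same scale.

```python
import pandas as pd


def scale_by_mean_stdev(dataframe: pd.DataFrame, feature: str, mean: float, stdev: float) -> pd.DataFrame:
    """Hypothetical sketch: standardise one column in place as (x - mean) / stdev."""
    if stdev == 0:
        # Guard added for this sketch: a zero stdev would divide by zero
        raise ValueError(f"stdev for {feature!r} must be non-zero")
    # Assigning to the column mutates the caller's DataFrame object
    dataframe[feature] = (dataframe[feature] - mean) / stdev
    return dataframe


train = pd.DataFrame({"age": [20.0, 30.0, 40.0]})
scaled = scale_by_mean_stdev(train, "age", mean=30.0, stdev=10.0)
print(scaled["age"].tolist())  # [-1.0, 0.0, 1.0]
```

Because the function returns the same object it was given, the return value is mainly a convenience for chaining calls.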
