hypermodel.features package

Submodules

hypermodel.features.categorical module

Helper functions for dealing with categorical features

hypermodel.features.categorical.get_unique_feature_values(dataframe: pandas.core.frame.DataFrame, features: List[str]) → Dict[str, List[str]]
Take a dataframe and a list of features and, for each feature, find all the unique values of that feature. This is a useful step prior to one-hot encoding, as it gives you a list of all the values you can expect to encode.

Parameters:
  • dataframe (pd.DataFrame) – The DataFrame to use to collect values
  • features (List[str]) – A list of all the Features we want to find the unique values of
Returns:

A dictionary keyed by the name of each feature, containing a list of all that feature's unique values
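
To make the shape of the return value concrete, here is a minimal sketch written in plain pandas. The helper name unique_values_sketch is hypothetical and is not hypermodel's implementation; the ordering of values (first appearance) is also an assumption.

```python
from typing import Dict, List

import pandas as pd


def unique_values_sketch(dataframe: pd.DataFrame, features: List[str]) -> Dict[str, List[str]]:
    # Hypothetical stand-in, not hypermodel's code: collect the distinct
    # values of each requested column, in order of first appearance.
    return {feature: list(dataframe[feature].unique()) for feature in features}


df = pd.DataFrame({"colour": ["red", "blue", "red"], "size": ["S", "M", "S"]})
uniques = unique_values_sketch(df, ["colour", "size"])
print(uniques)  # {'colour': ['red', 'blue'], 'size': ['S', 'M']}
```

A dictionary of this shape is what one_hot_encode expects as its uniques argument.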

hypermodel.features.categorical.one_hot_encode(dataframe: pandas.core.frame.DataFrame, uniques: Dict[str, List[str]], throw_on_missing=False) → pandas.core.frame.DataFrame

Create a new dataframe that one-hot-encodes values from the given dataframe against the known list of unique feature values (calculated using get_unique_feature_values).

Parameters:
  • dataframe (pd.DataFrame) – The DataFrame whose values are to be encoded
  • uniques (Dict[str, List[str]]) – A dict keyed by feature name, containing a list of unique values
  • throw_on_missing (bool) – If True, throw an Exception when the DataFrame contains a value that is missing from the uniques dict, preventing further execution. This is useful when encoding unseen data against known data, as it ensures you are not predicting with values that were never seen before.
Returns:

A new DataFrame with each Feature/Value pair as a new column, with a “1” where the row contains the feature's value and a “0” where it does not
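
The encoding described above can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not hypermodel's code: the helper name one_hot_sketch, the "feature:value" column naming, and the use of ValueError for the thrown exception are all hypothetical.

```python
from typing import Dict, List

import pandas as pd


def one_hot_sketch(
    dataframe: pd.DataFrame,
    uniques: Dict[str, List[str]],
    throw_on_missing: bool = False,
) -> pd.DataFrame:
    # Hypothetical stand-in: build a new frame with one 0/1 column
    # per known (feature, value) pair.
    encoded = pd.DataFrame(index=dataframe.index)
    for feature, values in uniques.items():
        if throw_on_missing:
            # Values present in the data but absent from the known list.
            unseen = set(dataframe[feature].unique()) - set(values)
            if unseen:
                names = sorted(str(value) for value in unseen)
                raise ValueError(f"Unseen values for {feature!r}: {names}")
        for value in values:
            # The "<feature>:<value>" naming scheme is an assumption.
            encoded[f"{feature}:{value}"] = (dataframe[feature] == value).astype(int)
    return encoded


known = {"colour": ["red", "blue"]}
new_data = pd.DataFrame({"colour": ["red", "green"]})

# With throw_on_missing left False, the unseen "green" row is all zeros.
encoded = one_hot_sketch(new_data, known)
print(encoded)
```

Because the columns are derived from the known values rather than from the data being encoded, training and prediction frames end up with the same columns in the same order.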

hypermodel.features.numerical module

hypermodel.features.numerical.describe_features(dataframe: pandas.core.frame.DataFrame, features: List[str])

Return a dictionary keyed with the name of a feature and containing that feature's summary statistics.

Parameters:
  • dataframe (pd.DataFrame) – The dataframe containing the features to analyze
  • features (List[str]) – The names of the features (columns in dataframe) to analyze
Returns:

A dictionary keyed by the feature name, containing summary statistics of the values of that feature.
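
The documentation does not say which statistics are returned. As a rough sketch of the idea, the example below assumes they resemble the output of pandas' Series.describe(); the helper name describe_sketch and the exact keys are assumptions, not hypermodel's behaviour.

```python
from typing import Dict, List

import pandas as pd


def describe_sketch(dataframe: pd.DataFrame, features: List[str]) -> Dict[str, dict]:
    # Hypothetical stand-in: summarise each numeric column with
    # pandas' built-in describe() (count, mean, std, min, quartiles, max).
    return {feature: dataframe[feature].describe().to_dict() for feature in features}


df = pd.DataFrame({"age": [20.0, 30.0, 40.0]})
stats = describe_sketch(df, ["age"])
print(stats["age"]["mean"], stats["age"]["std"])
```

Statistics gathered this way on training data can be stored and reused later, for example as the mean and stdev arguments to scale_by_mean_stdev.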

hypermodel.features.numerical.scale_by_mean_stdev(dataframe: pandas.core.frame.DataFrame, feature: str, mean: float, stdev: float) → pandas.core.frame.DataFrame

Scale all the values in a column using a pre-specified mean / stdev, in place.

Parameters:
  • dataframe (pd.DataFrame) – The dataframe whose values are adjusted
  • feature (str) – The name of the Feature column in the dataframe
  • mean (float) – The mean to use to scale values
  • stdev (float) – The standard deviation to use to scale values
Returns:

The dataframe that was passed in, with the feature column scaled
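
The exact formula is not stated here. A common reading of scaling by mean and standard deviation is the standard score, (value - mean) / stdev, and the sketch below assumes that interpretation. The helper name scale_sketch is hypothetical and this is not hypermodel's implementation.

```python
import pandas as pd


def scale_sketch(dataframe: pd.DataFrame, feature: str, mean: float, stdev: float) -> pd.DataFrame:
    # Hypothetical stand-in: overwrite the column with its standard score,
    # assuming "scale" means (value - mean) / stdev.
    dataframe[feature] = (dataframe[feature] - mean) / stdev
    # Return the same object that was passed in, now modified.
    return dataframe


df = pd.DataFrame({"age": [20.0, 30.0, 40.0]})
scaled = scale_sketch(df, "age", mean=30.0, stdev=10.0)
print(scaled["age"].tolist())  # [-1.0, 0.0, 1.0]
```

Passing the mean and stdev explicitly, rather than computing them from the frame, lets unseen data be scaled with the statistics of the training data.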

Module contents