xb.Schema
- class excelbird.schema.Column(input, output)
The values stored by an xb.Schema. A namedtuple with two values, input and output, that can be accessed by dot notation. It is immutable.
- Attributes:
- input
Alias for field number 0
- output
Alias for field number 1
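The behaviour described for Column can be sketched with the standard library alone. The stand-in below is not excelbird's class; it is a `collections.namedtuple` built with the same two field names, only to show dot access, positional access, and immutability.

```python
from collections import namedtuple

# Stand-in for excelbird.schema.Column (assumption: same field names and order).
Column = namedtuple("Column", ["input", "output"])

col = Column(input="FName", output="First Name")

# Fields are reachable by name or by position.
by_name = (col.input, col.output)
by_index = (col[0], col[1])

# namedtuples are immutable: assigning to a field raises AttributeError.
try:
    col.input = "Forename"
    mutated = True
except AttributeError:
    mutated = False
```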
- class excelbird.schema.Schema(*schemas, **kwargs: tuple[str, str] | tuple[str] | list[str] | str | excelbird.schema.Column)
Defines the state of a dataframe.
- Parameters:
- *schemas : Schema
Existing schemas used to build a composite Schema, which shows the reader where the columns come from.
- **kwargs : tuple[str, str] | tuple[str] | str | Column
A mapping of Python-friendly variable names to their corresponding input and output column names. If a value is a string or a 1-element tuple, it is used as both the input and the output name.
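The way keyword values expand into input/output pairs can be sketched as follows. This is a minimal illustration of the rule stated for **kwargs, not excelbird's implementation: `to_column` is a hypothetical helper, and `Column` here is a namedtuple stand-in.

```python
from collections import namedtuple

Column = namedtuple("Column", ["input", "output"])  # stand-in, not excelbird's class


def to_column(value):
    """Hypothetical helper: expand a keyword value into an (input, output) pair."""
    if isinstance(value, Column):
        return value
    if isinstance(value, str):
        # A bare string serves as both the input and the output name.
        return Column(value, value)
    if len(value) == 1:
        # A 1-element tuple behaves like a bare string.
        return Column(value[0], value[0])
    input_name, output_name = value
    return Column(input_name, output_name)


spec = {
    "first_name": ("FName", "First Name"),
    "last_name": ("LName",),
    "age": "Age",
}
columns = {key: to_column(value) for key, value in spec.items()}
```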
Examples
Define a new schema:
    from excelbird.schema import Schema

    sch_person = Schema(
        first_name=("FName", "First Name"),
        last_name=("LName", "Last Name"),
        age="Age",
    )
Define a composite schema that uses columns from a previous one:
    sch_employee = Schema(
        sch_person[["last_name", "age"]],
        rank="Rank",
    )
Methods
- apply(df[, strict]): Removes columns from a dataframe that aren't in the schema, and re-orders columns according to the schema's order.
- drop(columns): Returns a copy of Self with the specified keys dropped.
- fromkeys(iterable[, value]): Create a new dictionary with keys from iterable and values set to value.
- get(key[, default]): Return the value for key if key is in the dictionary, else default.
- inputs(): The input values for each key in the schema.
- items()
- keys()
- outputs(): The output values for each key in the schema.
- popitem(/): Remove and return a (key, value) pair as a 2-tuple.
- rename([keys, inputs, outputs]): Rename any part of the schema's data (keys, inputs, outputs) using a dictionary.
- rename_inputs_to_vars(df): Calls df.rename on the given dataframe and provides a mapping from the inputs in the current schema to the keys in the current schema.
- rename_vars_to_outputs(df): Calls df.rename on the given dataframe and provides a mapping from the keys in the current schema to the outputs in the current schema.
- reset_inputs(): Replaces all input values with the current output values.
Replaces all output values with current input values.
- select_inputs(df): Renames desired columns to var names, and selects them in the order of the schema.
- select_outputs(df): Renames the current columns to output names, and selects them in the order of the schema.
- setdefault(key[, default]): Insert key with a value of default if key is not in the dictionary.
- update([other]): Just like the normal dict.update(), but if a regular dict or keyword arguments are passed, the arguments are first converted to a Schema before updating.
- values()
Schema Methods
- excelbird.schema.Schema.select_inputs(self, df: DataFrame) -> DataFrame
Renames the desired columns to their variable names (the schema’s keys) and selects them in the order of the schema. If a column isn’t found, an error is raised to force you to correct your schema.
- Parameters:
- df : pd.DataFrame
Target dataframe
- Returns:
pd.DataFrame
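To make the rename-then-select behaviour concrete, here is a plain-Python sketch that uses a dict of column lists as a stand-in for a DataFrame. The function `select_inputs_sketch` and the `KeyError` it raises are assumptions for illustration; the description above only says that an error is raised, not which one.

```python
def select_inputs_sketch(schema, table):
    """Rename input columns to schema keys, keeping the schema's order.

    `schema` maps key -> (input_name, output_name); `table` maps
    column name -> list of values (a stand-in for a DataFrame).
    """
    selected = {}
    for key, (input_name, _output_name) in schema.items():
        if input_name not in table:
            # Assumed error type; the documentation does not name it.
            raise KeyError(f"column {input_name!r} not found")
        selected[key] = table[input_name]
    return selected


schema = {
    "first_name": ("FName", "First Name"),
    "age": ("Age", "Age"),
}
table = {"Age": [31, 45], "Extra": [0, 1], "FName": ["Ada", "Bo"]}

result = select_inputs_sketch(schema, table)
```

In this sketch the unrelated `Extra` column is left out and the remaining columns follow the schema's order rather than the table's.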
- excelbird.schema.Schema.select_outputs(self, df: DataFrame) -> DataFrame
Renames the current columns to their output names and selects them in the order of the schema. If a column isn’t found, an error is raised to force you to correct your schema.
- Parameters:
- df : pd.DataFrame
Target dataframe
- Returns:
pd.DataFrame
- excelbird.schema.Schema.apply(self, df: DataFrame, strict: bool = False) -> DataFrame
Removes columns from a dataframe that aren’t in the schema, and re-orders columns according to the schema’s order. If strict=True, an error is raised when df doesn’t contain all the desired columns.
- Parameters:
- df : pd.DataFrame
Dataframe to apply the changes to
- strict : bool, default False
Whether to enforce that df must contain all columns needed by the schema
- Returns:
pd.DataFrame
The updated dataframe
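The difference between the default and strict=True can be sketched in plain Python, again with a dict of column lists standing in for a DataFrame. Two assumptions are made here that the text above does not confirm: that columns are matched against the schema's keys, and that the error is a `KeyError`. `apply_sketch` is a hypothetical name.

```python
def apply_sketch(schema_keys, table, strict=False):
    """Keep only columns named in `schema_keys`, in that order.

    Assumption: `table` columns are matched against the schema's keys.
    """
    missing = [key for key in schema_keys if key not in table]
    if strict and missing:
        # Assumed error type; the documentation only says an error is raised.
        raise KeyError(f"missing columns: {missing}")
    return {key: table[key] for key in schema_keys if key in table}


keys = ["first_name", "last_name", "age"]
table = {"age": [31], "notes": ["n/a"], "first_name": ["Ada"]}

# Default mode tolerates the missing last_name column.
lenient = apply_sketch(keys, table)

# Strict mode rejects the same table.
try:
    apply_sketch(keys, table, strict=True)
    strict_failed = False
except KeyError:
    strict_failed = True
```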
- excelbird.schema.Schema.__getitem__(self, key)
Called when accessing items with sch[<key>] syntax.
Acts exactly like dict’s __getitem__, unless a list is passed. Pass a list of keys to return a new object with the selected elements, in the desired order, similar to how a pd.DataFrame works.
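List-based selection of this kind can be sketched with a small dict subclass. `MiniSchema` is a hypothetical class written for this illustration and says nothing about how excelbird implements the method internally.

```python
class MiniSchema(dict):
    """Hypothetical dict subclass illustrating list-based selection."""

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list of keys yields a new mapping in the requested order.
            return MiniSchema((k, dict.__getitem__(self, k)) for k in key)
        # Any other key falls through to ordinary dict lookup.
        return dict.__getitem__(self, key)


sch = MiniSchema(
    first_name=("FName", "First Name"),
    last_name=("LName", "Last Name"),
    age=("Age", "Age"),
)

single = sch["age"]                  # ordinary lookup
subset = sch[["age", "last_name"]]   # new object, order taken from the list
```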
- excelbird.schema.Schema.drop(self, columns: list[str] | str) -> Schema
Returns a copy of Self with the specified keys dropped
- Parameters:
- columns : list[str] or str
The items to drop
- Returns:
Self
- excelbird.schema.Schema.rename(self, keys: dict[str, str] | None = None, inputs: dict[str, str] | None = None, outputs: dict[str, str] | None = None) -> Schema
Rename any part of the schema’s data (keys, inputs, outputs) using a dictionary. Pick one of keys, inputs, outputs.
Regardless of which option is chosen, the keys in the provided dictionary must be current keys in the schema.
- Parameters:
- keys : dict[str, str], optional
Mapping to rename the keys in the current schema
- inputs : dict[str, str], optional
Mapping to rename the inputs in the current schema
- outputs : dict[str, str], optional
Mapping to rename the outputs in the current schema
- Returns:
Self
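The point that every mapping is looked up by the schema's current keys is easy to miss, so here is a plain-Python sketch of it. `rename_sketch` is hypothetical and works on an ordinary dict of key -> (input, output); returning a copy rather than modifying in place is an assumption, and unlike the documented method it does not stop you from passing more than one option.

```python
def rename_sketch(schema, keys=None, inputs=None, outputs=None):
    """Return a renamed copy of `schema` (key -> (input, output)).

    Each mapping is looked up by the schema's current keys.
    """
    keys, inputs, outputs = keys or {}, inputs or {}, outputs or {}
    renamed = {}
    for key, (input_name, output_name) in schema.items():
        new_key = keys.get(key, key)
        renamed[new_key] = (
            inputs.get(key, input_name),
            outputs.get(key, output_name),
        )
    return renamed


schema = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}

# The mapping is addressed by the key "first_name", not by the old input "FName".
new_inputs = rename_sketch(schema, inputs={"first_name": "Forename"})
new_keys = rename_sketch(schema, keys={"age": "years"})
```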
- excelbird.schema.Schema.update(self, other: excelbird.schema.Schema | dict | None = None, **kwargs) -> None
Just like the normal dict.update(), but if a regular dict or keyword arguments are passed, the arguments are first converted to a Schema before updating.
- Parameters:
- other : Schema or dict, optional
Mapping to update the current schema with
- **kwargs : str
Used to create a Schema first, which is then used to update the current one.
- Returns:
Self
- excelbird.schema.Schema.rename_inputs_to_vars(self, df: DataFrame) -> DataFrame
Calls df.rename on the given dataframe and provides a mapping from the inputs in the current schema to the keys in the current schema
- Parameters:
- df : pd.DataFrame
Dataframe to update
- Returns:
pd.DataFrame
The updated dataframe
- excelbird.schema.Schema.rename_vars_to_outputs(self, df: DataFrame) -> DataFrame
Calls df.rename on the given dataframe and provides a mapping from the keys in the current schema to the outputs in the current schema
- Parameters:
- df : pd.DataFrame
Dataframe to update
- Returns:
pd.DataFrame
The updated dataframe
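The two renaming directions (inputs to keys, then keys to outputs) amount to two lookup tables derived from the same schema. The sketch below builds them from a plain dict of key -> (input, output) and applies them to a list of header names; it illustrates the mappings only and does not call pandas or excelbird.

```python
schema = {
    "first_name": ("FName", "First Name"),
    "age": ("Age", "Age"),
}

# inputs -> keys, the direction described for rename_inputs_to_vars
inputs_to_vars = {inp: key for key, (inp, _out) in schema.items()}

# keys -> outputs, the direction described for rename_vars_to_outputs
vars_to_outputs = {key: out for key, (_inp, out) in schema.items()}

raw_header = ["FName", "Age"]
# Names without an entry in the mapping are left unchanged.
as_vars = [inputs_to_vars.get(name, name) for name in raw_header]
as_outputs = [vars_to_outputs.get(name, name) for name in as_vars]
```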
- excelbird.schema.Schema.inputs(self) -> list[str]
The input values for each key in the schema
- Returns:
list[str]
- excelbird.schema.Schema.outputs(self) -> list[str]
The output values for each key in the schema
- Returns:
list[str]
- excelbird.schema.Schema.reset_inputs(self) -> Schema
Replaces all input values with the current output values. Use this when you reuse a schema to read in data that was previously output with it.
- Returns:
Self
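The effect of resetting inputs can be sketched on a plain dict of key -> (input, output). `reset_inputs_sketch` is a hypothetical stand-in, and returning a new dict instead of modifying the schema is an assumption.

```python
def reset_inputs_sketch(schema):
    """Return a copy where each input name is replaced by its output name."""
    return {key: (output, output) for key, (_input, output) in schema.items()}


schema = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}

# After writing data out under the output names, the same schema can read it back.
reread = reset_inputs_sketch(schema)
```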