xb.Schema#

class excelbird.schema.Column(input, output)#

The values stored by an xb.Schema.

A namedtuple with two fields, input and output. Fields are accessed with dot notation, and instances are immutable.

Attributes:

input: Alias for field number 0
output: Alias for field number 1
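
The namedtuple behaviour described above can be illustrated with a stand-in built from the standard library. This is a minimal sketch: the Column defined here only mirrors the two documented fields and is not imported from excelbird.

```python
from collections import namedtuple

# Stand-in with the same two fields the text describes; not excelbird's class.
Column = namedtuple("Column", ["input", "output"])

col = Column(input="FName", output="First Name")

# Dot access and positional access refer to the same fields.
by_name = (col.input, col.output)
by_index = (col[0], col[1])

# namedtuples are immutable: assigning to a field raises AttributeError.
try:
    col.input = "Other"
    mutated = True
except AttributeError:
    mutated = False
```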

class excelbird.schema.Schema(*schemas, **kwargs: tuple[str, str] | tuple[str] | list[str] | str | excelbird.schema.Column)#

Defines the state of a dataframe.

Parameters:

*schemas (Schema): Existing schemas to build on. The result is a composite Schema that shows the reader where its columns come from.
**kwargs (tuple[str, str] | tuple[str] | str | Column): A mapping of Python-friendly variable names to their corresponding input and output column names. If a value is a string or a 1-element tuple, it is used as both the input and the output name.
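
The rule for keyword values can be sketched as a small normalisation step. The helper name to_column and the stand-in Column are hypothetical and exist only for illustration; how the library performs this internally is not shown in the text.

```python
from collections import namedtuple

Column = namedtuple("Column", ["input", "output"])  # stand-in, not excelbird's class


def to_column(value):
    """Hypothetical helper: expand one keyword value into (input, output)."""
    if isinstance(value, Column):
        return value
    if isinstance(value, str):
        # A bare string serves as both the input and the output name.
        return Column(value, value)
    if len(value) == 1:
        # A 1-element tuple is treated the same way as a bare string.
        return Column(value[0], value[0])
    return Column(*value)


pair = to_column(("FName", "First Name"))
single = to_column("Age")
one_tuple = to_column(("Rank",))
```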

Examples

Define a new schema

sch_person = Schema(
    first_name=("FName", "First Name"),
    last_name=("LName", "Last Name"),
    age="Age",
)

Define a composite schema that uses columns from a previous one

sch_employee = Schema(
    sch_person[[
        'last_name',
        'age',
    ]],
    rank="Rank"
)

Methods

`apply`(df[, strict])	Removes columns from a dataframe that aren't in the schema, and re-orders columns according to schema's order.
`drop`(columns)	Returns a copy of Self with the specified keys dropped
`fromkeys`(iterable[, value])	Create a new dictionary with keys from iterable and values set to value.
`get`(key[, default])	Return the value for key if key is in the dictionary, else default.
`inputs`()	The input values for each key in the schema
`items`()
`keys`()
`outputs`()	The output values for each key in the schema
`popitem`(/)	Remove and return a (key, value) pair as a 2-tuple.
`rename`([keys, inputs, outputs])	Rename any part of the schema's data (keys, inputs, outputs) using a dictionary.
`rename_inputs_to_vars`(df)	Calls `df.rename` on the given dataframe and provides a mapping from the inputs in the current schema to the keys in the current schema
`rename_vars_to_outputs`(df)	Calls `df.rename` on the given dataframe and provides a mapping from the keys in the current schema to the outputs in the current schema
`reset_inputs`()	Replaces all input values with the current output values.
`reset_outputs`()	Replaces all output values with current input values.
`select_inputs`(df)	Renames desired columns to var names, and selects them in the order of the schema.
`select_outputs`(df)	Renames the current columns to output names, and selects them in the order of the schema.
`setdefault`(key[, default])	Insert key with a value of default if key is not in the dictionary.
`update`([other])	Just like the normal `dict.update()`, but if a regular `dict`, or keyword arguments are passed, the arguments are first converted to a `Schema` before updating.
`values`()

Schema Methods#

excelbird.schema.Schema.select_inputs(self, df: DataFrame) → DataFrame#

Renames desired columns to var names, and selects them in the order of the schema. If a column isn’t found, an error is raised to force you to correct your schema.

Parameters:

df (pd.DataFrame): Target dataframe

Returns:

pd.DataFrame
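
The select-and-rename behaviour can be sketched without pandas by treating a table as a dict of column name to values. Everything here is an assumption made for illustration: the function name, the plain-dict schema, and the KeyError (the text only says that an error is raised).

```python
def select_inputs_sketch(schema, table):
    """schema: {var_name: (input_name, output_name)}; table: {column_name: values}."""
    missing = [inp for inp, _ in schema.values() if inp not in table]
    if missing:
        # Error type is an assumption; the docs only state that an error is raised.
        raise KeyError(f"columns not found: {missing}")
    # Keep only schema columns, keyed by variable name, in schema order.
    return {var: table[inp] for var, (inp, _) in schema.items()}


schema = {"last_name": ("LName", "Last Name"), "age": ("Age", "Age")}
table = {"Age": [41, 29], "Extra": [0, 0], "LName": ["Ng", "Diaz"]}
selected = select_inputs_sketch(schema, table)
```

The extra column is dropped and the remaining columns follow the schema's order rather than the table's.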

excelbird.schema.Schema.select_outputs(self, df: DataFrame) → DataFrame#

Renames the current columns to output names, and selects them in the order of the schema. If a column isn’t found, an error is raised to force you to correct your schema.

Parameters:

df (pd.DataFrame): Target dataframe

Returns:

pd.DataFrame

excelbird.schema.Schema.apply(self, df: DataFrame, strict: bool = False) → DataFrame#

Removes columns from a dataframe that aren’t in the schema, and re-orders the remaining columns according to the schema’s order. If strict=True, an error is raised unless df contains every column the schema requires.

Parameters:

df (pd.DataFrame): Dataframe to apply the changes to
strict (bool, default False): Whether to enforce that df must contain all columns needed by the schema

Returns:

pd.DataFrame: The updated dataframe
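
The difference between the two modes can be sketched on plain lists of column names. The text does not say which names apply matches against, what non-strict mode does with missing columns, or which exception is raised; the sketch assumes missing columns are skipped and uses ValueError as a placeholder.

```python
def apply_sketch(schema_columns, df_columns, strict=False):
    """Return the df columns that survive, in schema order (illustrative only)."""
    missing = [c for c in schema_columns if c not in df_columns]
    if strict and missing:
        # Placeholder error type; the docs only state that an error is raised.
        raise ValueError(f"missing columns: {missing}")
    # Assumption: in non-strict mode, missing columns are simply skipped.
    return [c for c in schema_columns if c in df_columns]


kept = apply_sketch(["last_name", "age", "rank"], ["age", "notes", "last_name"])
```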

excelbird.schema.Schema.__getitem__(self, key)#

Called when accessing items with sch[<key>] syntax.

Behaves exactly like dict’s __getitem__ unless a list is passed. Pass a list of keys to get a new Schema containing only the selected elements, in the order given, similar to selecting columns from a pd.DataFrame.

Parameters:

key (str or int or list[str] or slice): Used to access items

Returns:

xb.Column: If a non-list key is used
xb.Schema: If a list key is used
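
The list-versus-scalar lookup can be sketched with a small dict subclass. MiniSchema is a hypothetical stand-in and not excelbird's implementation; it covers only string and list keys.

```python
class MiniSchema(dict):
    """Illustrative dict subclass: list keys return a new, re-ordered MiniSchema."""

    def __getitem__(self, key):
        if isinstance(key, list):
            # Build a new object holding only the requested keys, in the order given.
            return MiniSchema((k, dict.__getitem__(self, k)) for k in key)
        return dict.__getitem__(self, key)


sch = MiniSchema(
    first_name=("FName", "First Name"),
    last_name=("LName", "Last Name"),
    age=("Age", "Age"),
)
subset = sch[["age", "first_name"]]
single = sch["last_name"]
```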

excelbird.schema.Schema.drop(self, columns: list[str] | str) → Schema#

Returns a copy of the schema with the specified keys dropped.

Parameters:

columns (list[str] or str): The items to drop

Returns:

Self
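
A minimal sketch of the copy-and-drop behaviour on a plain dict, accepting either a single key or a list of keys. The function name is hypothetical, and the handling of keys that are not present (silently ignored here) is an assumption.

```python
def drop_sketch(schema, columns):
    """Return a new dict without the given keys; the original is left untouched."""
    if isinstance(columns, str):
        columns = [columns]
    return {k: v for k, v in schema.items() if k not in columns}


person = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}
without_age = drop_sketch(person, "age")
```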

excelbird.schema.Schema.rename(self, keys: dict[str, str] | None = None, inputs: dict[str, str] | None = None, outputs: dict[str, str] | None = None) → Schema#

Rename any part of the schema’s data (keys, inputs, outputs) using a dictionary. Pass one of keys, inputs, or outputs.

Regardless of which option is chosen, the keys in the provided dictionary must represent current keys in the schema.

Parameters:

keys (dict[str, str], optional): Mapping to rename the keys in the current schema
inputs (dict[str, str], optional): Mapping to rename the inputs in the current schema
outputs (dict[str, str], optional): Mapping to rename the outputs in the current schema

Returns:

Self
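
The note that the dictionary is always keyed by current schema keys is easiest to see for outputs. The sketch below works on a plain dict; the function name, the KeyError for unknown keys, and returning a new object rather than modifying in place are all assumptions.

```python
def rename_outputs_sketch(schema, outputs):
    """outputs maps an existing schema key to its new output name."""
    unknown = [k for k in outputs if k not in schema]
    if unknown:
        # Assumed behaviour: reject keys that are not in the schema.
        raise KeyError(f"not schema keys: {unknown}")
    return {k: (inp, outputs.get(k, out)) for k, (inp, out) in schema.items()}


person = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}
renamed = rename_outputs_sketch(person, {"age": "Age (years)"})
```

The lookup uses the key age, not the current output name Age, even though the output is the part being renamed.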

excelbird.schema.Schema.update(self, other: excelbird.schema.Schema | dict | None = None, **kwargs) → None#

Just like the normal dict.update(), but if a regular dict, or keyword arguments are passed, the arguments are first converted to a Schema before updating.

Parameters:

other (Schema or dict, optional): Mapping to update the current schema with
**kwargs (str): Used to create a Schema first, then update the current one with it.

Returns:

None

excelbird.schema.Schema.rename_inputs_to_vars(self, df: DataFrame) → DataFrame#

Calls df.rename on the given dataframe with a mapping from the schema’s inputs to its keys.

Parameters:

df (pd.DataFrame): Dataframe to update

Returns:

pd.DataFrame: The updated dataframe

excelbird.schema.Schema.rename_vars_to_outputs(self, df: DataFrame) → DataFrame#

Calls df.rename on the given dataframe with a mapping from the schema’s keys to its outputs.

Parameters:

df (pd.DataFrame): Dataframe to update

Returns:

pd.DataFrame: The updated dataframe
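
The two mappings that the rename helpers hand to df.rename can be written out by hand. The sketch uses a plain dict of (input, output) pairs in place of a real Schema, and how exactly the mapping is passed (presumably as the columns argument of df.rename) is an assumption.

```python
schema = {
    "first_name": ("FName", "First Name"),
    "age": ("Age", "Age"),
}

# rename_inputs_to_vars: raw source column name -> Python-friendly key.
inputs_to_vars = {inp: var for var, (inp, _) in schema.items()}

# rename_vars_to_outputs: Python-friendly key -> presentation column name.
vars_to_outputs = {var: out for var, (_, out) in schema.items()}
```

Chaining the two takes a column from its raw name, through the variable name used in code, to the name shown in the final output.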

excelbird.schema.Schema.inputs(self) → list[str]#

The input values for each key in the schema

Returns:

list[str]

excelbird.schema.Schema.outputs(self) → list[str]#

The output values for each key in the schema

Returns:

list[str]

excelbird.schema.Schema.reset_inputs(self) → Schema#

Replaces all input values with the current output values. Use this when you reuse a schema to read back data that was written out with it.

Returns:

Self
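
A minimal sketch of the round trip this supports, again on a plain dict of (input, output) pairs. The function name is hypothetical, and returning a new object instead of modifying in place is an assumption.

```python
def reset_inputs_sketch(schema):
    """Make every input equal to the current output, leaving outputs unchanged."""
    return {var: (out, out) for var, (_, out) in schema.items()}


person = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}
reread = reset_inputs_sketch(person)
```

After the reset, a file whose header row reads First Name, Age can be matched by the schema's inputs directly.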

excelbird.schema.Schema.reset_outputs(self) → Schema#

Replaces all output values with current input values.

Returns:

Self