xb.Schema

class excelbird.schema.Column(input, output)

The values stored by an xb.Schema.

An immutable namedtuple with two fields, input and output, which can be accessed by dot notation.

Attributes:
input

Alias for field number 0

output

Alias for field number 1
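
To make the namedtuple behaviour concrete, here is a minimal sketch that uses a plain `collections.namedtuple` with the same two fields. The `Column` defined below is a stand-in built for illustration and is not excelbird's own class.

```python
from collections import namedtuple

# Stand-in with the same two fields as the documented Column; this is not
# excelbird's own class, just an equivalent plain namedtuple.
Column = namedtuple("Column", ["input", "output"])

col = Column(input="FName", output="First Name")

# Fields are reachable by name or by position (field 0 and field 1).
by_name = (col.input, col.output)
by_index = (col[0], col[1])

# namedtuples are immutable: assigning to a field raises AttributeError.
try:
    col.input = "Other"
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new tuple and leaves the original untouched.
renamed = col._replace(output="Given Name")
```

Because the tuple cannot be changed in place, any "edit" has to produce a new value, as `_replace` does here.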

class excelbird.schema.Schema(*schemas, **kwargs: tuple[str, str] | tuple[str] | list[str] | str | excelbird.schema.Column)

Defines the state of a dataframe.

Parameters:
*schemas : Schema

Existing schemas to build on. Composing a Schema from earlier ones shows the reader where its columns come from.

**kwargs : tuple[str, str] | tuple[str] | str | Column

A mapping of Python-friendly variable names to their corresponding input and output column names. If a value is a string or a 1-element tuple, it is used as both the input and the output name.
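
The rule for keyword values can be sketched in plain Python. The `normalize` helper and the `Column` stand-in below are hypothetical names introduced for illustration; how excelbird performs this conversion internally is not shown in this page.

```python
from collections import namedtuple

Column = namedtuple("Column", ["input", "output"])  # stand-in, not xb's class


def normalize(value):
    """Hypothetical helper: turn one keyword value into an (input, output) pair."""
    if isinstance(value, Column):
        return value
    if isinstance(value, str):
        # A bare string serves as both the input and the output name.
        return Column(value, value)
    items = tuple(value)
    if len(items) == 1:
        # A 1-element tuple behaves like a bare string.
        return Column(items[0], items[0])
    if len(items) == 2:
        return Column(*items)
    raise ValueError(f"expected 1 or 2 names, got {len(items)}")


pairs = {
    key: normalize(value)
    for key, value in {
        "first_name": ("FName", "First Name"),
        "age": "Age",
        "rank": ("Rank",),
    }.items()
}
```

The `ValueError` for other lengths is an assumption of this sketch, not documented behaviour.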

Examples

Define a new schema

from excelbird.schema import Schema

sch_person = Schema(
    first_name=("FName", "First Name"),
    last_name=("LName", "Last Name"),
    age="Age",
)

Define a composite schema that uses columns from a previous one

sch_employee = Schema(
    sch_person[[
        'last_name',
        'age',
    ]],
    rank="Rank"
)

Methods

apply(df[, strict])

Removes columns from a dataframe that aren't in the schema, and reorders the remaining columns according to the schema's order.

drop(columns)

Returns a copy of Self with the specified keys dropped

fromkeys(iterable[, value])

Create a new dictionary with keys from iterable and values set to value.

get(key[, default])

Return the value for key if key is in the dictionary, else default.

inputs()

The input values for each key in the schema

items()

keys()

outputs()

The output values for each key in the schema

popitem(/)

Remove and return a (key, value) pair as a 2-tuple.

rename([keys, inputs, outputs])

Rename any part of the schema's data (keys, inputs, outputs) using a dictionary.

rename_inputs_to_vars(df)

Calls df.rename on the given dataframe and provides a mapping from the inputs in the current schema to the keys in the current schema

rename_vars_to_outputs(df)

Calls df.rename on the given dataframe and provides a mapping from the keys in the current schema to the outputs in the current schema

reset_inputs()

Replaces all input values with the current output values.

reset_outputs()

Replaces all output values with current input values.

select_inputs(df)

Renames the schema's input columns to their variable names and selects them in the order of the schema.

select_outputs(df)

Renames the current columns to output names, and selects them in the order of the schema.

setdefault(key[, default])

Insert key with a value of default if key is not in the dictionary.

update([other])

Just like the normal dict.update(), but if a regular dict or keyword arguments are passed, they are first converted to a Schema before updating.

values()

Schema Methods

excelbird.schema.Schema.select_inputs(self, df: DataFrame) -> DataFrame

Renames the schema’s input columns to their variable names (the schema’s keys) and selects them in the order of the schema. If a column isn’t found, an error is raised to force you to correct your schema.

Parameters:
df : pd.DataFrame

Target dataframe

Returns:
pd.DataFrame
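
The rename-and-select step can be sketched without pandas or excelbird installed. In this illustration a plain dict of column lists stands in for the dataframe, `select_inputs` is a hypothetical free function rather than the real method, and raising `KeyError` is an assumption, since the exception type is not stated here.

```python
# Plain-Python stand-ins: `schema` maps key -> (input, output), and `table`
# maps column name -> list of values (playing the role of a DataFrame).
schema = {
    "first_name": ("FName", "First Name"),
    "last_name": ("LName", "Last Name"),
    "age": ("Age", "Age"),
}


def select_inputs(schema, table):
    """Hypothetical sketch: rename input columns to keys, in schema order."""
    missing = [inp for inp, _ in schema.values() if inp not in table]
    if missing:
        # Mirrors the documented behaviour of failing on a missing column;
        # the KeyError type is an assumption of this sketch.
        raise KeyError(f"columns not found: {missing}")
    return {key: table[inp] for key, (inp, _) in schema.items()}


raw = {"Age": [41], "Extra": ["x"], "LName": ["Lovelace"], "FName": ["Ada"]}
selected = select_inputs(schema, raw)
```

Columns that the schema does not mention, such as `Extra`, are left out of the result, and the remaining columns follow the schema's order rather than the order in `raw`.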


excelbird.schema.Schema.select_outputs(self, df: DataFrame) -> DataFrame

Renames the current columns to output names, and selects them in the order of the schema. If a column isn’t found, an error is raised to force you to correct your schema.

Parameters:
df : pd.DataFrame

Target dataframe

Returns:
pd.DataFrame


excelbird.schema.Schema.apply(self, df: DataFrame, strict: bool = False) -> DataFrame

Removes columns from a dataframe that aren’t in the schema, and reorders the remaining columns according to the schema’s order. If strict=True, an error is raised when df doesn’t contain all of the columns the schema expects.

Parameters:
df : pd.DataFrame

Dataframe to apply the changes to

strict : bool, default False

Whether to enforce that df must contain all columns needed by the schema

Returns:
pd.DataFrame

The updated dataframe
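
The effect of `strict` can be sketched as follows. This is an illustration under stated assumptions: a dict of column lists stands in for the dataframe, the schema is reduced to a flat list of wanted names (this page does not say whether keys, inputs, or outputs are matched), `apply` is a hypothetical free function, and `KeyError` is an assumed exception type.

```python
def apply(wanted, table, strict=False):
    """Hypothetical sketch: keep only `wanted` columns, in `wanted` order."""
    missing = [name for name in wanted if name not in table]
    if strict and missing:
        # Assumed exception type; the page only says "an error".
        raise KeyError(f"missing columns: {missing}")
    # Non-strict mode skips names the table does not have (an assumption).
    return {name: table[name] for name in wanted if name in table}


wanted = ["first_name", "last_name", "age"]
table = {"age": [41], "notes": ["n/a"], "first_name": ["Ada"]}

lenient = apply(wanted, table)
```

With the default `strict=False`, the missing `last_name` is tolerated and `notes` is removed. Passing `strict=True` on the same table raises instead.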


excelbird.schema.Schema.__getitem__(self, key)

Called when accessing items with sch[<key>] syntax.

Acts exactly like dict’s __getitem__, unless a list is passed. Pass a list of keys to return a new Schema with the selected elements, in the desired order, similar to how a pd.DataFrame works.

Parameters:
key : str or int or list[str] or slice

Used to access items

Returns:
xb.Column

If a non-list key is used

xb.Schema

If a list key is used
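
A minimal sketch of list-based selection on a dict subclass is shown below. `MiniSchema` is a hypothetical class written for this illustration. It covers only the string and list cases, not the int or slice keys listed above, and it is not excelbird's implementation.

```python
class MiniSchema(dict):
    """Hypothetical dict subclass illustrating list-based selection."""

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list of keys yields a new mapping in the requested order.
            return MiniSchema((k, dict.__getitem__(self, k)) for k in key)
        # Anything else falls through to ordinary dict lookup.
        return dict.__getitem__(self, key)


sch = MiniSchema(
    first_name=("FName", "First Name"),
    last_name=("LName", "Last Name"),
    age=("Age", "Age"),
)

single = sch["age"]                 # one stored value
subset = sch[["age", "last_name"]]  # a new MiniSchema, reordered
```

The double brackets mirror pandas column selection: the outer pair is the indexing syntax and the inner pair is the list of keys.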


excelbird.schema.Schema.drop(self, columns: list[str] | str) -> Schema

Returns a copy of Self with the specified keys dropped.

Parameters:
columns : list[str] or str

The items to drop

Returns:
Self


excelbird.schema.Schema.rename(self, keys: dict[str, str] | None = None, inputs: dict[str, str] | None = None, outputs: dict[str, str] | None = None) -> Schema

Rename one part of the schema’s data (keys, inputs, or outputs) using a dictionary. Pick one of keys, inputs, or outputs.

Whichever option is chosen, the keys of the provided dictionary must be current keys of the schema.

Parameters:
keys : dict[str, str], optional

Mapping to rename the keys in the current schema

inputs : dict[str, str], optional

Mapping to rename the inputs in the current schema

outputs : dict[str, str], optional

Mapping to rename the outputs in the current schema

Returns:
Self
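
The point that every mapping is looked up by current schema key, even when renaming inputs or outputs, can be sketched like this. The `rename` function below is a hypothetical stand-in operating on a plain dict of `(input, output)` pairs. It assumes the dictionary values are the new names, it does not enforce the pick-one rule, and the `KeyError` for unknown keys is an assumption.

```python
def rename(schema, keys=None, inputs=None, outputs=None):
    """Hypothetical sketch: every mapping is looked up by current schema key."""
    keys, inputs, outputs = keys or {}, inputs or {}, outputs or {}
    unknown = (set(keys) | set(inputs) | set(outputs)) - set(schema)
    if unknown:
        # Assumed behaviour: reject names that are not keys of the schema.
        raise KeyError(f"not keys of the schema: {sorted(unknown)}")
    # Build a new mapping so the original schema is left unchanged.
    return {
        keys.get(key, key): (inputs.get(key, inp), outputs.get(key, out))
        for key, (inp, out) in schema.items()
    }


schema = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}

# Change an output name: the dictionary key is the schema key, not "First Name".
relabelled = rename(schema, outputs={"first_name": "Given Name"})

# Change a schema key while keeping its input and output names.
rekeyed = rename(schema, keys={"age": "years"})
```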


excelbird.schema.Schema.update(self, other: excelbird.schema.Schema | dict | None = None, **kwargs) -> None

Just like the normal dict.update(), but if a regular dict or keyword arguments are passed, they are first converted to a Schema before updating.

Parameters:
other : Schema or dict, optional

Mapping to update the current schema with

**kwargs : str

Used to create a Schema first, then update the current one with it.

Returns:
Self


excelbird.schema.Schema.rename_inputs_to_vars(self, df: DataFrame) -> DataFrame

Calls df.rename on the given dataframe with a mapping from the inputs in the current schema to the keys in the current schema.

Parameters:
df : pd.DataFrame

Dataframe to update

Returns:
pd.DataFrame

The updated dataframe


excelbird.schema.Schema.rename_vars_to_outputs(self, df: DataFrame) -> DataFrame

Calls df.rename on the given dataframe with a mapping from the keys in the current schema to the outputs in the current schema.

Parameters:
df : pd.DataFrame

Dataframe to update

Returns:
pd.DataFrame

The updated dataframe
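
The two mappings described for `rename_inputs_to_vars` and `rename_vars_to_outputs` can be written out by hand. The sketch below is an illustration under assumptions: the schema is a plain dict of `(input, output)` pairs, and a dict of column lists stands in for the dataframe so the example runs without pandas.

```python
schema = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}

# Mapping for rename_inputs_to_vars: input name -> schema key.
inputs_to_vars = {inp: key for key, (inp, _) in schema.items()}

# Mapping for rename_vars_to_outputs: schema key -> output name.
vars_to_outputs = {key: out for key, (_, out) in schema.items()}

# Stand-in for the rename on a dict of columns; names absent from the mapping
# pass through unchanged, as they do with pandas' DataFrame.rename.
table = {"age": [41], "first_name": ["Ada"], "notes": ["n/a"]}
renamed = {vars_to_outputs.get(name, name): values for name, values in table.items()}
```

Unlike the select methods, a rename on its own neither drops unlisted columns such as `notes` nor reorders anything.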


excelbird.schema.Schema.inputs(self) -> list[str]

The input values for each key in the schema.

Returns:
list[str]


excelbird.schema.Schema.outputs(self) -> list[str]

The output values for each key in the schema.

Returns:
list[str]


excelbird.schema.Schema.reset_inputs(self) -> Schema

Replaces all input values with the current output values. Use this when reusing a schema to read back data that was previously written out with it.

Returns:
Self


excelbird.schema.Schema.reset_outputs(self) -> Schema

Replaces all output values with current input values.

Returns:
Self
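
The effect of the two reset methods on the stored pairs can be sketched with plain dicts. This is an illustration only: the schema is modelled as a dict of `(input, output)` tuples, and the comprehensions stand in for the real methods.

```python
schema = {"first_name": ("FName", "First Name"), "age": ("Age", "Age")}

# reset_inputs: each input takes the current output value, so a file written
# with output names ("First Name") can be read back with the same schema.
after_reset_inputs = {key: (out, out) for key, (_, out) in schema.items()}

# reset_outputs: each output takes the current input value.
after_reset_outputs = {key: (inp, inp) for key, (inp, _) in schema.items()}
```

After `reset_inputs`, the column that was originally read as `FName` is expected under the name `First Name`, which matches what a previous export would have written.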