{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Expressions / Cell References\n", "\n", "In Excel, cell references are just written records of which objects are being used by a formula. You don't care that a reference is labeled \"A3\" - that location can change at any time. What you're really doing is referencing an *object* that currently lives at A3, and you should be able to cut/paste that object to any new location in the workbook while maintaining that reference.\n", "\n", "Excelbird brings this behavior to Python." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from excelbird import *\n", "import excelbird as xb\n", "PATH = \"test.xlsx\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A Cell in excelbird represents a real object in the workbook, and therefore it **can only exist once**.\n", "\n", "Let's try to place the **same object** twice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_cell = Cell('hello world')\n", "\n", "Book(\n", " Sheet(\n", " my_cell,\n", " my_cell,\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This was actually invalid code. Excelbird just corrected it for you. It's impossible to have the same thing be in two places at one time.\n", "\n", "What you meant to say is..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_cell = Cell('hello world')\n", "\n", "Book(\n", " Sheet(\n", " my_cell,\n", " my_cell.ref(), # <---\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**We can reference anything from anywhere**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_cell = Cell('hello world')\n", "\n", "Book(\n", " Sheet(\n", " my_cell.ref(),\n", " ),\n", " Sheet(\n", " my_cell,\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ ".. note::\n", " `.ref()` is **only** used when you need to make an exact *copy* of something by reference. It is **not** needed when writing expressions or formula functions" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Python Expressions\n", "\n", "**Please take a minute and observe these first two examples closely**. They demonstrate one of the most important concepts in excelbird." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ten = Cell(10)\n", "\n", "Book(\n", " Sheet(\n", " ten,\n", " ten + 5,\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ten = Cell(10)\n", "\n", "hundred_and_ten = 10 + (ten * 10)\n", "\n", "Book(\n", " Sheet(\n", " hundred_and_ten,\n", " ten,\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Excelbird references are, in a sense, 'smarter' than those in Excel.\n", "\n", "If you \"delete\" the source data for a formula (i.e. the cells it's referencing) by excluding them from the layout, your formula will track down the original value and use it instead.\n", "\n", "For instance, let's try the same code as above, but **exclude Cell** `ten` from the Book" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ten = Cell(10)\n", "\n", "hundred_and_ten = 10 + (ten * 10)\n", "\n", "Book(\n", " Sheet(\n", " hundred_and_ten,\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Excelbird's expression resolver noticed a reference to `ten` was invalid because `ten` wasn't placed in the Book, so it used its value instead.\n", "\n", "Let's try a trickier one" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "one = Cell(1)\n", "two = Cell(2)\n", "\n", "ten = Cell(10)\n", "twenty = Cell(20)\n", "\n", "three = one + two\n", "thirty = ten + twenty\n", "\n", "thirty_three = three + thirty\n", "\n", "Book(\n", " Sheet(\n", " one,\n", " thirty_three,\n", " ),\n", ").write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Expressions with Vectors\n", "\n", "**All** of the previous examples will work with series and frames as well.\n", "\n", "**Col + Col**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = Col(1, 2, 3)\n", "b = Col(10, 20, 30)\n", "\n", "Frame(\n", " a,\n", " b,\n", " a + b,\n", ")," ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Frame + Frame**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "theme = xb.colors.theme\n", "\n", "a_b = Frame([1,2,3], [4,5,6], fill_color=theme.green1)\n", "c_d = Frame([10,20,30], [40,50,60], fill_color=theme.red1)\n", "\n", "Stack(\n", " a_b,\n", " c_d,\n", " a_b + c_d\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Mixed Lengths\n", "\n", "Intuitively, the returned vector will only be as long as the *minimum* of the two inputs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "shorter = Col(1, 2)\n", "longer = Col(1, 2, 3)\n", "\n", "Frame(\n", " shorter,\n", " longer,\n", " shorter + longer,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Mixed Types\n", "\n", "**Any combination** of Cell, Col/Row, Frame/VFrame, or any of those paired with an int, float, list, tuple, pandas Series/DataFrame can be used in an expression together.\n", "\n", "*But how do I know what will be returned?*\n", "\n", "It will take practice, but there are rules.\n", "\n", "We can describe a returned element by **2** characteristics:\n", "\n", "1. *Dimensionality*: Frame is 2-dimensional, Col is 1-dimensional, and Cell is 0-dimensional\n", "2. *Flow Direction*: Col & Frame are column-based, and Row & VFrame are row-based\n", "\n", "The return type of a mixed-type expression is determined by the following rules:\n", "\n", "1. **Dimensionality**: The *maximum* dimensionality between *a* and *b*. (1d + 2d -> 2d, and 2d + 0d -> 2d)\n", "2. **Flow Direction**: Favor the first *dimensional* element in the expression. If not enough information is available (i.e. Cell + list), default to Col or Frame.\n", "\n", "Let's break these down one by one\n", "\n", "#### Dimensionality\n", "\n", "This is the easy one. Just pick the bigger of the two.\n", "\n", "- 2D + 1D -> 2D\n", "- 0D + 2D -> 2D\n", "- 1D + 0D -> 1D" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Cell** (0D) **+ Frame** (2D) **-> Frame**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "frame = Frame([1,2,3], [4,5,6], fill_color=theme.purple1)\n", "\n", "cell = Cell(1000, fill_color=theme.orange1)\n", "\n", "Stack(\n", " frame,\n", " cell,\n", " cell + frame,\n", " fill_color=theme.green1,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Flow Direction\n", "\n", "Give priority to the **first** *dimensional* element (not a Cell) in the expression if possible. Otherwise default to Col and Frame, if a vector must be returned but none of the inputs are directional (i.e. Cell + list)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Col + Row -> Col**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "column = Col(1, 2, 3)\n", "row = Row(10, 20, 30)\n", "\n", "column + row" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Row + Col -> Row**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "row + column" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Col + VFrame -> Frame**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "column = Col(10, 20, 30, 40)\n", "vframe = VFrame([1,2,3,4], [5,6,7,8])\n", "\n", "column + vframe" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**VFrame + Col -> VFrame**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vframe + column" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**list + Cell -> Col** (default, since neither Cell nor list have direction)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[1,2,3,4] + Cell(100)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## A more practical example\n", "\n", "Most people will use pandas DataFrames as input data at some point, and want to access elements of a Frame/VFrame by header. Here's a quick example to show what that looks like.\n", "\n", ".. note::\n", " For more detail on subscripting/slicing elements from a Frame/VFrame, please refer to the \"Your First Book\" lesson." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NumberWord
01one
12two
23three
34four
\n", "
" ], "text/plain": [ " Number Word\n", "0 1 one\n", "1 2 two\n", "2 3 three\n", "3 4 four" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(\n", " zip([1,2,3,4], ['one', 'two', 'three', 'four']),\n", " columns=['Number', 'Word']\n", ")\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NumberWord
Cell(1)Cell(one)
Cell(2)Cell(two)
Cell(3)Cell(three)
Cell(4)Cell(four)
\n" ], "text/plain": [ "Frame([Col([Cell(1), Cell(2), Cell(3), Cell(4)]), Col([Cell(one), Cell(two), Cell(three), Cell(four)])])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fr = Frame(df)\n", "fr" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Add a new column that joins strings from two columns" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NumberWordBig Number
Cell(1)Cell(one)Cell({...})
Cell(2)Cell(two)Cell({...})
Cell(3)Cell(three)Cell({...})
Cell(4)Cell(four)Cell({...})
\n" ], "text/plain": [ "Frame([Col([Cell(1), Cell(2), Cell(3), Cell(4)]), Col([Cell(one), Cell(two), Cell(three), Cell(four)]), Col([Cell({...}), Cell({...}), Cell({...}), Cell({...})])])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fr['Big Number'] = fr['Number'] & \" thousand\"\n", "fr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Book(fr).write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Operators\n", "\n", "Up to this point, you've seen the `+`, `*`, and `&` in use, but every other operator you need should be available.\n", "\n", ".. note::\n", " Please read [the Python Operators page](../operators.ipynb) for a full list of available operators\n", "\n", "We're going to highlight the handful of operators and builtin functions that aren't as obvious or intuitive as the others." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Python's Built-In sum()\n", "\n", "Excelbird objects can handle the Python builtin `sum()` function, with one **important catch**: You must place arguments inside a list.\n", "\n", "So, instead of `sum(x)` you must write `sum([x])`, **even if** `x` is an iterable element, like Col or Frame. If summing multiple things, proceed normally: `sum([a, b, c])`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = Col(1,2,3)\n", "b = Col(4,5,6)\n", "\n", "Sheet(\n", " Frame(a, b),\n", " Row(\n", " sum([a]),\n", " sum([b]),\n", " ),\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### OR(), and NOT()\n", "\n", "In excelbird:\n", "\n", "- `a | b` in Pyhon returns `OR(a, b)` in Excel\n", "- `~ b` in Python returns `NOT(a, b)` in Excel" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = Cell(True)\n", "b = Cell(False)\n", "\n", "Row(\n", " a,\n", " b,\n", " a | b,\n", " ~ a,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### MOD() and %\n", "\n", "In excelbird:\n", "\n", "- `a % b` in Python returns `MOD(a, b)` in Excel.\n", "\n", "There's also a hidden feature. In regular Excel, you can convert a cell to a percent by placing the `%` operator after a cell reference, like `=A4%`. In excelbird, this is achieved by using the **same object** on both sides of the operator: `a % a`. *Not* two equal things, but the *same* thing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ten = Cell(10)\n", "eight = Cell(8)\n", "\n", "Row(\n", " ten,\n", " eight,\n", " ten % eight,\n", " ten % ten,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Range ':'\n", "\n", "In Excel, a range is selected using syntax like `A3:A10`.\n", "\n", "Unfortunately, `my_rng = a:b` by itself is not possible in Python.\n", "\n", "Instead, we've chosen the `>>` operator for the job." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = Cell(1)\n", "b = Cell(2)\n", "\n", "Row(\n", " a,\n", " b,\n", " sum([a >> b]),\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Other ways to reference a range\n", "\n", "The `>>` operator is only needed to specify exactly what a range should be. When working with Frames and series, there are easier ways to reference ranges.\n", "\n", "### .range()\n", "\n", "Every excelbird vector has a `.range()` method, and they all do the same thing: return a **Cell** that references the range from the *first* Cell to the *last* Cell in the vector.\n", "\n", "The main purpose of this on the user's end is to **override the default granularity** for an argument in a formula.\n", "\n", "Here's an ordinary Col function, where `MAX(...)` (when inside Frame, and given Cols as arguments) returns a vector of MAX() functions that each reference individual Cells" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from excelbird.fn import MAX\n", "\n", "a = Col(1,20,300)\n", "b = Col(600,50,4)\n", "\n", "Frame(\n", " a,\n", " b,\n", " MAX(a, b),\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "But if we replace `one` with `one.range()` in the MAX function..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "...\n", "MAX(one.range(), two)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### .loc\n", "\n", "All excelbird vectors have a `.loc` property, which offers an alternative to the default subscripting behavior: When subscripted, it returns a *range* from the elements returned by the slice, instead of the elements themselves.\n", "\n", "`my_frame.loc[2:5]` is equivalent to `my_frame[2] >> my_frame[4]`. As a reminder, both of these options will return a single Col, where each cell holds a cross range" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OneTwoThreeFourFive
Cell(1)Cell(4)Cell(7)Cell(10)Cell(13)
Cell(2)Cell(5)Cell(8)Cell(11)Cell(14)
Cell(3)Cell(6)Cell(9)Cell(12)Cell(15)
\n" ], "text/plain": [ "Frame([Col([Cell(1), Cell(2), Cell(3)]), Col([Cell(4), Cell(5), Cell(6)]), Col([Cell(7), Cell(8), Cell(9)]), Col([Cell(10), Cell(11), Cell(12)]), Col([Cell(13), Cell(14), Cell(15)])])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fr = Frame(pd.DataFrame(\n", " zip([1,2,3], [4,5,6], [7,8,9], [10,11,12], [13,14,15]),\n", " columns=['One', 'Two', 'Three', 'Four', 'Five']\n", "))\n", "fr" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Like normal, we can refer to elements by header instead of index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fr['Two to Four'] = sum([ fr.loc['Two':'Four'] ])\n", "\n", "Book(fr).write(PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "6ee8c39a5e4f5a5ffe92ac0abdca1f10885ded7e28c74450614be57680931831" } } }, "nbformat": 4, "nbformat_minor": 2 }