Expressions / Cell References#

In Excel, cell references are just written records of which objects are being used by a formula. You don’t care that a reference is labeled “A3” - that location can change at any time. What you’re really doing is referencing an object that currently lives at A3, and you should be able to cut/paste that object to any new location in the workbook while maintaining that reference.

Excelbird brings this behavior to Python.

[2]:
import pandas as pd
from excelbird import *
import excelbird as xb
PATH = "test.xlsx"

A Cell in excelbird represents a real object in the workbook, and therefore it can only exist once.

Let’s try to place the same object twice.

[ ]:
my_cell = Cell('hello world')

Book(
    Sheet(
        my_cell,
        my_cell,
    ),
).write(PATH)

bacf6c89075b4964a34db204ede33c48

This was actually invalid code. Excelbird just corrected it for you. It’s impossible to have the same thing be in two places at one time.

What you meant to say is…

[ ]:
my_cell = Cell('hello world')

Book(
    Sheet(
        my_cell,
        my_cell.ref(),  # <---
    ),
).write(PATH)

We can reference anything from anywhere

[ ]:
my_cell = Cell('hello world')

Book(
    Sheet(
        my_cell.ref(),
    ),
    Sheet(
        my_cell,
    ),
).write(PATH)

41e6844f4ec0413f9f43a0758d62d67a

30cd1ecf47f04d1aa4d1444cf03796a7

Note

.ref() is only used when you need to make an exact copy of something by reference. It is not needed when writing expressions or formula functions

Python Expressions#

Please take a minute and observe these first two examples closely. They demonstrate one of the most important concepts in excelbird.

[ ]:
ten = Cell(10)

Book(
    Sheet(
        ten,
        ten + 5,
    ),
).write(PATH)

6925efd6e56b42f885c4c05c0a29da9b

[ ]:
ten = Cell(10)

hundred_and_ten = 10 + (ten * 10)

Book(
    Sheet(
        hundred_and_ten,
        ten,
    ),
).write(PATH)

4714dccdb85c4d31909ece5deb52b867

Excelbird references are, in a sense, ‘smarter’ than those in Excel.

If you “delete” the source data for a formula (i.e. the cells it’s referencing) by excluding them from the layout, your formula will track down the original value and use it instead.

For instance, let’s try the same code as above, but exclude Cell ten from the Book

[ ]:
ten = Cell(10)

hundred_and_ten = 10 + (ten * 10)

Book(
    Sheet(
        hundred_and_ten,
    ),
).write(PATH)

b98c8937b7c74759b335ab2527a2421d

Excelbird’s expression resolver noticed a reference to ten was invalid because ten wasn’t placed in the Book, so it used its value instead.

Let’s try a trickier one

[ ]:
one = Cell(1)
two = Cell(2)

ten = Cell(10)
twenty = Cell(20)

three = one + two
thirty = ten + twenty

thirty_three = three + thirty

Book(
    Sheet(
        one,
        thirty_three,
    ),
).write(PATH)

3aa87a45437c4aca8d61d9e3199ac8be

Expressions with Vectors#

All of the previous examples will work with series and frames as well.

Col + Col

[ ]:
a = Col(1, 2, 3)
b = Col(10, 20, 30)

Frame(
    a,
    b,
    a + b,
),

9903091d74a74f50b5ff706a05900069

Frame + Frame

[ ]:
theme = xb.colors.theme

a_b = Frame([1,2,3], [4,5,6], fill_color=theme.green1)
c_d = Frame([10,20,30], [40,50,60], fill_color=theme.red1)

Stack(
    a_b,
    c_d,
    a_b + c_d
)

91c3098ece8947708e1b4577006884e4

Mixed Lengths#

Intuitively, the returned vector will only be as long as the minimum of the two inputs

[ ]:
shorter = Col(1, 2)
longer = Col(1, 2, 3)

Frame(
    shorter,
    longer,
    shorter + longer,
)

ba0a4bc9beb74d99995e91be3aba5b2a

Mixed Types#

Any combination of Cell, Col/Row, Frame/VFrame, or any of those paired with an int, float, list, tuple, pandas Series/DataFrame can be used in an expression together.

But how do I know what will be returned?

It will take practice, but there are rules.

We can describe a returned element by 2 characteristics:

  1. Dimensionality: Frame is 2-dimensional, Col is 1-dimensional, and Cell is 0-dimensional

  2. Flow Direction: Col & Frame are column-based, and Row & VFrame are row-based

The return type of a mixed-type expression is determined by the following rules:

  1. Dimensionality: The maximum dimensionality between a and b. (1d + 2d -> 2d, and 2d + 0d -> 2d)

  2. Flow Direction: Favor the first dimensional element in the expression. If not enough information is available (i.e. Cell + list), default to Col or Frame.

Let’s break these down one by one

Dimensionality#

This is the easy one. Just pick the bigger of the two.

  • 2D + 1D -> 2D

  • 0D + 2D -> 2D

  • 1D + 0D -> 1D

Cell (0D) + Frame (2D) -> Frame

[ ]:
frame = Frame([1,2,3], [4,5,6], fill_color=theme.purple1)

cell = Cell(1000, fill_color=theme.orange1)

Stack(
    frame,
    cell,
    cell + frame,
    fill_color=theme.green1,
)

8eccbd5389a040ec99e8a9ac7ecec547

Flow Direction#

Give priority to the first dimensional element (not a Cell) in the expression if possible. Otherwise default to Col and Frame, if a vector must be returned but none of the inputs are directional (i.e. Cell + list)

Col + Row -> Col

[ ]:
column = Col(1, 2, 3)
row = Row(10, 20, 30)

column + row

7b693388e1d644729f3f62af2fcdfc9e

Row + Col -> Row

[ ]:
row + column

7d30194be6954a64bd0b09b8300a835c

Col + VFrame -> Frame

[ ]:
column = Col(10, 20, 30, 40)
vframe = VFrame([1,2,3,4], [5,6,7,8])

column + vframe

ef4f137cd4ca4ff38f22365a754de984

VFrame + Col -> VFrame

[ ]:
vframe + column

7b2e2d27ebaf419e87444fdfa07b5857

list + Cell -> Col (default, since neither Cell nor list have direction)

[ ]:
[1,2,3,4] + Cell(100)

2ed960e8417c4ebaa71f7aac0c535cd1

A more practical example#

Most people will use pandas DataFrames as input data at some point, and want to access elements of a Frame/VFrame by header. Here’s a quick example to show what that looks like.

Note

For more detail on subscripting/slicing elements from a Frame/VFrame, please refer to the “Your First Book” lesson.

[2]:
df = pd.DataFrame(
    zip([1,2,3,4], ['one', 'two', 'three', 'four']),
    columns=['Number', 'Word']
)
df
[2]:
Number Word
0 1 one
1 2 two
2 3 three
3 4 four
[3]:
fr = Frame(df)
fr
[3]:
Number Word
Cell(1) Cell(one)
Cell(2) Cell(two)
Cell(3) Cell(three)
Cell(4) Cell(four)

Add a new column that joins strings from two columns

[5]:
fr['Big Number'] = fr['Number'] & " thousand"
fr
[5]:
Number Word Big Number
Cell(1) Cell(one) Cell({...})
Cell(2) Cell(two) Cell({...})
Cell(3) Cell(three) Cell({...})
Cell(4) Cell(four) Cell({...})
[ ]:
Book(fr).write(PATH)

6a1db7595c15427baee194c19d0065e0

Operators#

Up to this point, you’ve seen the +, *, and & in use, but every other operator you need should be available.

Note

Please read the Python Operators page for a full list of available operators

We’re going to highlight the handful of operators and builtin functions that aren’t as obvious or intuitive as the others.

Python’s Built-In sum()#

Excelbird objects can handle the Python builtin sum() function, with one important catch: You must place arguments inside a list.

So, instead of sum(x) you must write sum([x]), even if x is an iterable element, like Col or Frame. If summing multiple things, proceed normally: sum([a, b, c])

[ ]:
a = Col(1,2,3)
b = Col(4,5,6)

Sheet(
    Frame(a, b),
    Row(
        sum([a]),
        sum([b]),
    ),
)

1e3b4e687eb94a6e98ef4d2e7aa9961a

OR(), and NOT()#

In excelbird:

  • a | b in Pyhon returns OR(a, b) in Excel

  • ~ b in Python returns NOT(a, b) in Excel

[ ]:
a = Cell(True)
b = Cell(False)

Row(
    a,
    b,
    a | b,
    ~ a,
)

2fb93c24efcf408aaf1d0bbb3483cae6

MOD() and %#

In excelbird:

  • a % b in Python returns MOD(a, b) in Excel.

There’s also a hidden feature. In regular Excel, you can convert a cell to a percent by placing the % operator after a cell reference, like =A4%. In excelbird, this is achieved by using the same object on both sides of the operator: a % a. Not two equal things, but the same thing.

[ ]:
ten = Cell(10)
eight = Cell(8)

Row(
    ten,
    eight,
    ten % eight,
    ten % ten,
)

fc083e2c8bc6418396cf561288bab416

Range ‘:’#

In Excel, a range is selected using syntax like A3:A10.

Unfortunately, my_rng = a:b by itself is not possible in Python.

Instead, we’ve chosen the >> operator for the job.

[ ]:
a = Cell(1)
b = Cell(2)

Row(
    a,
    b,
    sum([a >> b]),
)

99c345d9fbfc49d6938f14d1d84c2e07

Other ways to reference a range#

The >> operator is only needed to specify exactly what a range should be. When working with Frames and series, there are easier ways to reference ranges.

.range()#

Every excelbird vector has a .range() method, and they all do the same thing: return a Cell that references the range from the first Cell to the last Cell in the vector.

The main purpose of this on the user’s end is to override the default granularity for an argument in a formula.

Here’s an ordinary Col function, where MAX(...) (when inside Frame, and given Cols as arguments) returns a vector of MAX() functions that each reference individual Cells

[ ]:
from excelbird.fn import MAX

a = Col(1,20,300)
b = Col(600,50,4)

Frame(
    a,
    b,
    MAX(a, b),
)

75c79c6ed8be43e3b90d1d85c73174de

But if we replace one with one.range() in the MAX function…

[ ]:
...
MAX(one.range(), two)

5f137cca82e24cea8098dc5840049b24

.loc#

All excelbird vectors have a .loc property, which offers an alternative to the default subscripting behavior: When subscripted, it returns a range from the elements returned by the slice, instead of the elements themselves.

my_frame.loc[2:5] is equivalent to my_frame[2] >> my_frame[4]. As a reminder, both of these options will return a single Col, where each cell holds a cross range

[16]:
fr = Frame(pd.DataFrame(
    zip([1,2,3], [4,5,6], [7,8,9], [10,11,12], [13,14,15]),
    columns=['One', 'Two', 'Three', 'Four', 'Five']
))
fr
[16]:
One Two Three Four Five
Cell(1) Cell(4) Cell(7) Cell(10) Cell(13)
Cell(2) Cell(5) Cell(8) Cell(11) Cell(14)
Cell(3) Cell(6) Cell(9) Cell(12) Cell(15)

Like normal, we can refer to elements by header instead of index

[ ]:
fr['Two to Four'] = sum([ fr.loc['Two':'Four'] ])

Book(fr).write(PATH)

47c50cd683fd460b9079eb07ed40f554