\( \require{mathtools}\newcommand{\fixedarrow}[1]{\xrightarrow{\hspace{1.5cm}\mathclap{\textstyle\text{#1}}\hspace{1.5cm}}} \)

Chapter 2: Thinking Algebraically

Author

Colin Foster

2.1 Introduction

In this chapter, we move from focusing on thinking multiplicatively to focusing on thinking algebraically.

Algebra is clearly a big feature of school mathematics, and this is not restricted to manipulating expressions and solving equations. As we will see, learners may need to think algebraically even when there are no letters or algebraic symbols around.

This chapter follows the pattern of Chapter 1 in setting out what thinking algebraically means and then showing how this unlocks a wide range of aspects of school mathematics across multiple areas of content.

2.2 Fear of algebra

Ask learners at any school what they find most difficult in mathematics and you can be sure lots of them will say ‘algebra’. Whereas numbers may be seen as being real and everyday, algebra is viewed as being like a mysterious ‘code’, which appears like gibberish to those who are not ‘in the know’. Many learners seem to experience algebra as a set of cryptic rules for manipulating symbols, with the teacher as the arbiter of whether the resulting confusion of symbols counts as ‘correct’ or not. There is often a limited sense that algebra is meaningful or useful, let alone illuminating.

Approaches to tackling ‘algebra anxiety’ often entail trying to make processes seem ‘easy’ by giving learners simple rules to follow. The intention is that this will maximise the chance of success, and success breeds confidence. However, if success merely means getting an answer that someone else assures you is correct, the learner may not be developing a deep sense of what they are doing. While they may appreciate being told they are getting things right for a change, they may continue to feel the same sense of being ‘at sea’ with the subject, and they may be unlikely to choose to pursue mathematics for a day longer than they have to. What is needed is a way to help learners make sense of algebra, so that what is correct or incorrect becomes apparent to them, without an external authority having to pass judgment on it. That way, learners develop increasing independence from their teacher’s say-so.

Although algebra is traditionally feared and loathed for being difficult, I think it is always worth trying to persuade learners that doing algebra is, in a sense, easier than operating with numbers.1

For example, compare these two questions:

TASK 2.1

1. What is \(156 + 278\) ?
2. What is \(a + b + a\) ?

The second question involves two operations, whereas the first involves only one. And the second question involves letters, whereas the first doesn’t. Nevertheless, there is no doubt that question 2 is easier than question 1 – provided the learner knows what to do!

We can think of algebra as generalised number – instead of working out numerical answers, we use algebra to capture the structure that would apply to infinitely many numerical cases. It is like a recipe for what you would do, without actually having to do it. When you write down an algebraic equality containing a letter that stands for any possible number, you are effectively doing infinitely many calculations all at once!

People often assume that something ‘abstract’, like algebra, is necessarily harder than more ‘concrete’ things like number. But a number such as \(3\) is quite an abstract object really – it isn’t three oranges, but just ‘three’. You have seen three chairs, three people and three tomatoes, but you have never seen ‘three’ in the abstract.

Concrete things can be difficult too; indeed, people sometimes say ‘There is nothing harder than concrete’! So, there is no reason to assume that algebra must necessarily be experienced as more difficult than anything else.

2.3 The equals sign

When a teacher asks a question like ‘What is …?’, as in the two questions in TASK 2.1, they usually want an answer that is as simplified as possible. So, for question 1, they would want ‘the answer’, which is \(434\). But, mathematically speaking, there are plenty of other things which are equal to \(156 + 278\).

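For example, we could write:

\[156 + 278 = 278 + 156\]

\[156 + 278 = 400 + 34\]

\[156 + 278 = 500 - 66\]

\[156 + 278 = 2 \times 217\]

… along with many other possibilities. These are all true statements, and so are all ‘equal to’ the given sum.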

But are they ‘wrong answers’? Would a teacher be bold enough to place a ‘cross’ next to any of them?

It is unhelpful when questions try to prompt learners to ‘Work out’ or ‘Evaluate’ or ‘Calculate’ merely by ending some expression with an equals sign, as in:

\[156 + 278 = \ldots\]

Mathematically, there are infinitely many possible correct responses a learner could give.

Using the equals sign to mean ‘Work out’ risks confusing learners about what the equals sign is actually supposed to mean in mathematics. It does not mean “Compute”, as when you press the equals button on a calculator. The equals sign simply conveys that two expressions are equal in value.2

It is worth being a bit careful with our language. ‘Equal in value’ does not necessarily mean ‘the same as’.

The number \(10\) is not ‘the same’ as \(9 + 1\), since \(10\) is a single number, and \(9 + 1\) is a sum of two numbers. They are different expressions, but they happen to have the same value, so it is appropriate to write \(10 = 9 + 1\) or \(9 + 1 = 10\). The language of ‘the same as’ or ‘is’ can be confusing, so ‘are equal to’ or ‘equals’ have some advantages.

In the world around us, many people have not got this memo, and you will frequently see things like Figure 2.1, even in schools, where the equals sign is used to indicate ‘leads to’ or ‘results in’, and not ‘is equal to’.

Figure 2.1: Non-mathematical use of the equals sign.

If you are making signs like this, then I think an arrow like \(\rightarrow\) would be preferable. We cannot make the wider world do this, but at least in school, teachers ought to be able to avoid displaying posters that misuse the equals sign.

In school mathematics too, learners will sometimes misuse equals signs by chaining them like this:

\[3 + 7 = 10 \times 4 = 40^{2} = 1600 .\]

It is clear what the learner intends to communicate here, but they are misusing mathematical notation to do so. As it stands, this states that \(3 + 7 = 1600\), which is absurd!

The ‘maximum of one equals sign per line’ rule is intended to prevent this kind of chaining, and would lead to writing three correct statements on three separate lines:

\[3 + 7 = 10\]

\[10 \times 4 = 40\]

\[40^{2} = 1600 .\]

If the learner is really determined to be economical with space, and have it all on one line, they could use brackets to show the priority of the operations, like this:

\[\left( (3 + 7) \times 4 \right)^{2} = 1600.\]

In other circumstances, it can often be convenient to place multiple equals signs on one line, but it is only allowable when ‘equals really means equals’!

Here is an acceptable example:

\[156 + 278 = 300 + 120 + 14 = 420 + 14 = 434 .\]

Here, four expressions are linked by three equals signs, and every expression is equal to every other expression. There is no problem in cases like this with having multiple equals signs on the same line. It can be helpful for learners to compare and contrast valid and invalid use of the equals sign.
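As an aside, Python (used here purely for illustration) reads a chain of `==` comparisons the way mathematicians read a chain of equals signs: `x == y == z` is true only when every adjacent pair is equal. This gives a quick way to check chains like the ones above. A minimal sketch:

```python
# In Python, x == y == z means (x == y) and (y == z),
# which matches the mathematical meaning of a chain of equals signs.

# The learner's chained working: 3 + 7 = 10 × 4 = 40² = 1600
misused_chain = (3 + 7 == 10 * 4 == 40**2 == 1600)

# The same working written as three separate statements
separate_lines = [3 + 7 == 10, 10 * 4 == 40, 40**2 == 1600]

# The bracketed single-line version: ((3 + 7) × 4)² = 1600
bracketed = ((3 + 7) * 4) ** 2 == 1600

# A valid chain, in which every expression equals every other
valid_chain = (156 + 278 == 300 + 120 + 14 == 420 + 14 == 434)

print(misused_chain)        # False: 3 + 7 is not equal to 10 × 4
print(all(separate_lines))  # True
print(bracketed)            # True
print(valid_chain)          # True
```

The chain fails at its very first link, which is exactly the learner's error: the chained line asserts that \(3 + 7\) equals \(10 \times 4\).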

2.4 Algebra as generalised number

Let’s return to the two questions in TASK 2.1, one ‘numerical’ and one ‘algebraic’.

We have discussed the first (numerical) question.

How might someone respond to the second (algebraic) question, which asked ‘What is’ the expression \(a + b + a\) ?

A learner might say, “Well, there are two \(a\)s and one \(b\), so it simplifies to \(2a + b\)”, which is the correct answer.

Writing \(2a + b\) certainly seems easier than computing the sum of \(156\) and \(278\), so maybe it is true that doing algebra is not necessarily more difficult than calculating with numbers.

But is it OK to refer to “two \(a\)s and one \(b\)”? Is this merely informal, like saying ‘times’ rather than ‘is multiplied by’? Or is it an error that treats \(a\) and \(b\) as ‘objects’, rather than as numbers? One response to that could be to ask, “Aren’t numbers objects – mathematical objects – in the same way a triangle is a mathematical object?”

‘Letter as object’ – often pejoratively called ‘fruit salad algebra’, due to the prominence of ‘apples’ being used for \(a\) and ‘bananas’ for \(b\) – gives us the right answers when ‘collecting terms’, such as

\[3a - 2b + 7a + 4b = 10a + 2b .\]

We might say that “three \(a\)s and another seven \(a\)s make ten \(a\)s, and if we take away two \(b\)s but then add on another four \(b\)s we end up with two \(b\)s altogether”.

But this way of talking falls apart as soon as we try to multiply or divide letters, or raise them to powers:

\[3a \times 2b = 6ab\]

\[\dfrac{12ab}{3a} = 4b\]

\[(3a)^{2} = 9a^{2} .\]

However, if we are prepared to treat something like \(a^{2}\) or \(ab\) as an object in its own right, then we can continue to invoke them ‘as objects’ when simplifying something like \(9a^{2} - 2a^{2} + 3a^{2}\).3 (“There are nine ‘\(a\)-squareds’, and we take away two of them and then add on three more, so we must get ten ‘\(a\)-squareds’ at the end.”)

Nevertheless, the safest approach, especially when beginning to introduce algebra, is always to see letters as representing numbers – at least as far as studying mathematics is concerned.4 In science, variables tend to be equated with the physical reality, so a physicist will write ‘\(v = \text{speed}\)’, and will generally include the units as part of \(v\). So, they will write \(v = 10\ \text{m/s}\), where the ‘m/s’ indicates ‘metres per second’, the unit of speed. Keeping the units within the variable can be convenient when doing dimensional analysis to discover the units of related variables. It is also handy if the units happen to change, because the same \(v\) can be equal to 36 km/h (kilometres per hour), and a physicist would have no trouble writing \(v = 10\ \text{m/s} = 36\ \text{km/h}\).

In mathematics, however, it is usually safer to define the letter \(v\) to be a pure number, either ‘the speed in \(\text{m/s}\)’ or ‘the speed in \(\text{km/h}\)’, and therefore to write either \(v = 10\) or \(v = 36\). Here, \(v\) is the number of m/s or the number of km/h. Clearly, \(v\) cannot be equal to both \(10\) and \(36\), because \(10 \neq 36\).

It is important to realise that in mathematics when we write, say, \(v = 10\), we are not ‘forgetting to include the units’, because, in this way of operating, algebraic letters in mathematics contain no units. The units would be specified when \(v\) is defined, and returned to after any algebraic manipulation is completed, to make sense of the final answer. We would interpret the final answer in the context of the question, and the ‘speed’ would be stated to be \(10\) m/s.

There is a simplicity about operating consistently throughout all topics by saying that every letter always stands in for a number, and nothing more. It perhaps takes away a little from the mystery of algebra.

An analogy for this comes from the theatre industry. In the theatre, there are fixed parts, such as ‘Elphaba’ in Wicked, but those parts can be played by numerous different performers. Sometimes a standby will cover for a lead role, often jumping in at short notice. We can think of a letter, such as \(e\), as representing the general part of Elphaba, but different numbers as being particular instances of particular performers playing that role. Perhaps \(e = 7\) for the Wednesday matinee, but \(e = 8\) for the evening performance. We could say the numbers stand in for the letters, or the letters stand in for the numbers, depending on how we look at it. But the letter (i.e. the part of Elphaba) is the general role, and the numbers are the specific, real performers who play it.

What \(a + b + a = 2a + b\) is saying is that, for any two numbers, represented by \(a\) and \(b\), if you add up the first number (\(a\)), the second number (\(b\)) and the first number again (\(a\)), you will always get twice the first number, plus the second number (\(2a + b\)). Algebra is always a generalisation of how numbers behave. This means we are always free to swap out the letters for numbers whenever we want to get our feet back on the ground and return to something more concrete. If \(3 + 5 + 3\), say, does not equal \(2 \times 3 + 5\), then our \(a + b + a = 2a + b\) algebra must be wrong.

Of course, if substituting particular numbers does work, that doesn’t guarantee our algebra is necessarily correct.

For example, if we incorrectly stated that \(a + b + a = 2b + a\), rather than \(2a + b\), we would not discover our error by substituting \(3\) for \(a\) and also \(3\) for \(b\). When \(a = b\), it is the case that \(2b + a = 2a + b\), so we will obtain \(9\) both ways.

However, if we do find numbers which fail to satisfy an equation which is supposed to be true for all numbers, then we have definitely shown it to be false. If the algebra is supposed to work for all numbers, then a single instance where it fails is enough of a counterexample to reject it.
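The substitution test can also be run many times automatically. The sketch below is in Python; the function name `agree_on_trials`, the range of trial values and the number of trials are all illustrative assumptions. Agreement on every trial is evidence rather than proof, but a single disagreement is a genuine counterexample.

```python
import random

def agree_on_trials(lhs, rhs, trials=1000, seed=0):
    """Hypothetical helper: compare two expressions in a and b on random
    integer trial values. Returns (True, None) if they always agree, or
    (False, (a, b)) for the first pair on which they disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(-100, 100)
        b = rng.randint(-100, 100)
        if lhs(a, b) != rhs(a, b):
            return False, (a, b)
    return True, None

# Correct simplification: a + b + a = 2a + b
correct = agree_on_trials(lambda a, b: a + b + a, lambda a, b: 2 * a + b)
print(correct)                  # (True, None)

# Incorrect claim: a + b + a = 2b + a.
# The single trial a = b = 3 gives 9 both ways, so it cannot expose the error.
print(3 + 3 + 3 == 2 * 3 + 3)   # True

# Random trials soon find a pair with a ≠ b, which is a counterexample.
incorrect = agree_on_trials(lambda a, b: a + b + a, lambda a, b: 2 * b + a)
print(incorrect)                # (False, (a, b)) for some pair with a ≠ b
```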

The way we are talking about algebraic equalities means that they are really identities, and it could be argued that we should be writing them as identities, using the special identity symbol \(\equiv\), instead of an ordinary equals sign, \(=\).5 This would mean that we should perhaps write \(a + b + a \equiv 2a + b\). This makes sense, and the only reason I don’t do that is that no one else seems to do it!

Sometimes algebraic equalities are true for not quite all, but almost all, of the values of the letters they contain. For example,

\[\dfrac{7a^{5}}{a^{2}} = 7a^{3}\]

is true only for \(a \neq 0\), because division by zero is undefined.

And something like

\[a^{\frac{1}{3}} = a^{\frac{2}{6}} ,\]

which may look innocent enough, is actually true only for \(a \geq 0\), because, for example, \(( - 8)^{\frac{1}{3}} = - 2\), but \(( - 8)^{\frac{2}{6}}\) could be calculated as \(( - 8)^{2} = 64\), and \((64)^{\frac{1}{6}} = 2\), which is not equal to \(- 2\). But we don’t usually worry about this sort of thing at school level.

We have seen that we can simplify \(a + b + a\) to \(2a + b\), but we can’t ‘work it out’ fully without further information.

Learners are sometimes frustrated by this, because they ask, “So what’s the answer then?” They feel that \(2a + b\) cannot be ‘the answer’ when it still contains a multiplication and an addition that seem to have been left ‘undone’. Learners feel there is more that must be completed, and the question cannot be finished yet.

In a way, they are correct, and the drive to see things as simplified as possible is a good one, but we are just doing the best we can. If we knew, say, that \(a = 4\) and \(b = 10\), then we could simplify \(2a + b\) to \(18\), but without more information \(2a + b\) is as far as we can go.

Learners sometimes feel so pressured to obtain an answer that is a single term that they incorrectly simplify something like \(2a + b\) to \(2ab\). One response to that is to ask them to test whether these two expressions come out with the same value when, say, \(a = 4\) and \(b = 10\). They do not, because \(2a + b = 18 \neq 80 = 2ab\).

However, the two expressions are equal for infinitely many possible pairs \((a,\ \ b)\): rearranging \(2a + b = 2ab\) gives \(b(2a - 1) = 2a\), so the pairs are given in general by \[\left( n,\ \ \dfrac{2n}{2n - 1} \right), \quad n \neq \tfrac{1}{2}.\] Learners could explore this kind of thing, and find that the only two integer solutions are \((0,\ \ 0)\) and \((1,\ \ 2)\). Spending a few minutes working on something like this can leave a lasting memory that \(2a + b\) and \(2ab\) are generally not equal.
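A short search makes this exploration concrete. The Python sketch below lists the integer pairs it finds in a search window (the window \(-50\) to \(50\) is an arbitrary illustrative choice), and then uses exact fractions to check the general pair for a few values of \(n\), including a non-integer one.

```python
from fractions import Fraction

# Search a window of integer pairs (a, b) for which 2a + b = 2ab.
solutions = [
    (a, b)
    for a in range(-50, 51)
    for b in range(-50, 51)
    if 2 * a + b == 2 * a * b
]
print(solutions)  # [(0, 0), (1, 2)]

# The general pair (n, 2n / (2n - 1)) works for any n other than 1/2;
# Fraction keeps the arithmetic exact, so no rounding can hide an error.
for n in [Fraction(3), Fraction(-2), Fraction(5, 4)]:
    b = 2 * n / (2 * n - 1)
    assert 2 * n + b == 2 * n * b
```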

Sometimes I think it is not so much that learners really believe that \(2a + b\) is likely to be equal to \(2ab\) as that they just do not want to stop with an answer not ‘fully worked out’. It can help to point out that an ordinary number like \(24\) could also be viewed as ‘not fully worked out’, since really the notation is a shorthand for \(2 \times 10 + 4\). We can even think of \(4\) as being shorthand for \(1+1+1+1\).

Checking algebra by substituting in easy trial numbers is a very powerful approach, not just for finding errors, but for supporting learners in making sense of their algebraic manipulation. At the end of the day, the ultimate test of algebraic manipulation is not what anybody says about whether it follows some rule or not, but whether the expressions are equal to each other or not, regardless of the particular numbers substituted in place of the letters.

When a learner asks, “Is this right?” about some algebraic manipulation, we do not want them to be reliant on the teacher’s authority in saying ‘yes’ or ‘no’. We want them to be able to determine this for themselves, otherwise they will be forever dependent on someone else to verify everything they do. They will always give their oral answers with a rising inflection, as “Two \(a\) plus \(b\)?” rather than confidently as “Two \(a\) plus \(b\)”.

The other way we could go further with \(2a + b\), even if we did not know the separate values of \(a\) and \(b\), would be if we had an additional piece of information about the relationship between \(a\) and \(b\). For example, we might somehow separately know that \(a = b\).

Learners sometimes think this kind of thing is impossible, since otherwise “they wouldn’t be different letters”. There is a subtle point here. In the absence of further information, the values of \(a\) and \(b\) could be equal, but, unless we are told or can deduce that they are, we treat them as though they might be different. Mathematicians are cautious and conservative, at least in this kind of situation. (In other situations mathematicians might be bold and adventurous!) So, if we know that \(a = b\), then we can write \(2a + b = 2a + a = 3a\) or, equivalently, \(2a + b = 2b + b = 3b\), which is the same relationship, since \(a = b\). Whether things can be simplified or not depends not just on the expression itself, but on what other information might be lying around.

With experience, learners will see that evaluating – working out, or calculating with numbers – is generally a hassle, and the beauty of algebra is that there is no evaluating in algebra!

If we see algebra as generalising the properties of numbers, then young children have been ‘doing algebra’ from the very beginning. For example, when they notice they can reorder when adding numbers (e.g. \(2 + 5 + 8 = (2 + 8) + 5 = 10 + 5 = 15\)), they are making an algebraic observation. They are noticing that terms in an addition can be added in any order, and by changing the order they may be able to make the arithmetic easier. Noticing that the answers come out the same is ‘doing algebra’, just as much as writing \(a + b + a = 2a + b\) is.

For me, it is not so much that algebra is about using symbols, since \(156 + 278\) also consists of symbols. What is ‘\(+\)’, if not a symbol for the operation of addition? And numerals like \(5\) and \(2\) are symbols too, since they represent numbers without being the actual numbers themselves. So, for me it is questionable whether \(a\) is any more ‘an abstract symbol’ than \(6\) is. The power of algebra is not so much that it uses letters as that it represents structure and generalises how numbers behave by going beyond specific cases, such as \(156 + 278\), to making statements about how all combinations of numbers behave.

By writing \(a + b\), we’re including \(156 + 278\), but at the same time including all other possible pairs of numbers added together.

When we write something like \(a + b = b + a\), we are saying that \(156 + 278 = 278 + 156\), but also \(371 + 564 = 564 + 371\), and also \(2,359,201 + 6723 = 6723 + 2,359,201\), and so on, including perhaps numbers which are arbitrarily large or not even integers.

2.5 Making sense of tricks

A nice ‘pre-algebra’ task is to present learners with a ‘trick’ of the following kind.

TASK 2.2

Think of a number.
Add \(4\).
Double.
Subtract \(2\).
Halve.
Subtract the number you first thought of.
Your answer is \(3\).

Why?

If different learners try this with different starting numbers, a sense will develop that the answer always seems to be \(3\), but initially this is only a conjecture.
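Trying many starting numbers can be done quickly in code. The Python sketch below (the function name `think_of_a_number` and the list of starting numbers are illustrative choices) follows the steps of TASK 2.2 literally, using exact fractions so that halving never introduces rounding. Like the learners’ own trials, this only gathers more evidence for the conjecture; it proves nothing about numbers it has not tried.

```python
from fractions import Fraction

def think_of_a_number(start):
    """Hypothetical helper: follow the steps of TASK 2.2 one at a time."""
    x = start
    x = x + 4       # Add 4
    x = 2 * x       # Double
    x = x - 2       # Subtract 2
    x = x / 2       # Halve
    x = x - start   # Subtract the number you first thought of
    return x

# Include zero, a negative number, a fraction and a large number.
starts = [Fraction(7), Fraction(0), Fraction(-15), Fraction(2, 3), Fraction(1000001)]
results = [think_of_a_number(s) for s in starts]
print(results)  # every entry is Fraction(3, 1)
```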

However, it is possible to prove this, initially without needing any algebraic letters. This can convey the essence of algebraic thinking, because all we need is some symbol to represent the unknown number – there is no reason why it has to be a letter, let alone the letter \(x\).

For example, we could use a square, and simply draw a picture for each line of the trick, as in Figure 2.2.

Figure 2.2: Representing the process with pictures.

The teacher could show the pictures in Figure 2.2, then remove them and ask learners to reproduce their own versions.

The key is realising that the \(4\) becomes \(8\) in the doubling step, and reduces to \(6\) when we subtract \(2\), and then to \(3\) when we halve. By subtracting the number we first thought of, we get the same answer, \(3\), regardless of what that number was, even if it were not a positive integer.

This can be clearer if we represent the process using a specific starting number, but without simplifying at each stage (Figure 2.3).

Figure 2.3: Representing the process with \(7\).

We can treat the symbol \(7\) here as a placeholder for any number. The \(7\) enters and later leaves, and we never use any particular property of \(7\), such as that \(7=2 \times 3 + 1\).

So, learners will see that it would be just the same for any other number. Although the total on any line would be different, the \(3\) at the end would be the same. It is a small step from this to replace the \(7\) with a general number \(n\).
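Written out with \(n\) in place of \(7\), the steps of TASK 2.2 become the following algebraic argument (one possible layout):

\[\begin{aligned}
\text{Think of a number:} &\quad n\\
\text{Add } 4: &\quad n + 4\\
\text{Double:} &\quad 2n + 8\\
\text{Subtract } 2: &\quad 2n + 6\\
\text{Halve:} &\quad n + 3\\
\text{Subtract the number you first thought of:} &\quad (n + 3) - n = 3
\end{aligned}\]

The \(n\) enters on the first line and leaves on the last, so the final answer does not depend on it.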

Learners can invent their own similar tricks and try them out on each other.

2.6 Derived facts

A useful way in to the value of algebra is to use it to codify computational shortcuts, often known as derived facts. These are ‘derived’ in the sense that if you know one numerical fact, you can ‘derive’ multiple other ones from it.

For example, suppose you know that

\[2 + 3 = 5 .\]

That is very useful, because you then immediately know many other things, like

\[2000 + 3000 = 5000\]

\[0.2 + 0.3 = 0.5\]

\[2\ \text{kg} + 3\ \text{kg} = 5\ \text{kg}\]

\[\dfrac{2}{7} + \dfrac{3}{7} = \dfrac{5}{7} ,\]

which are all correct facts, derived from the base fact that \(2 + 3 = 5\).

However, we cannot conclude things like

\[2.1 + 3.1 = 5.1\]

\[2^{2} + 3^{2} = 5^{2}\]

\[\dfrac{1}{2} + \dfrac{1}{3} = \dfrac{1}{5}\]

\[\sqrt{2} + \sqrt{3} = \sqrt{5} .\]

These are all false, although they might sound equally plausible when read aloud.6

Algebra helps us distinguish the correct examples (the derived facts) from the incorrect non-examples that could deceive us.

The statement \(2a + 3a = 5a\) tells us that if there is a common multiplier \(a\), then these statements will be true, deriving their truth from the starting fact \(2 + 3 = 5\). If there is no such common multiplier, then the statements will be false, as shown in Figure 2.4.

Figure 2.4: Comparing examples and non-examples of \(2a + 3a = 5a\).
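The comparison can also be checked exactly. The Python sketch below (`fits_pattern` is a hypothetical helper) tests whether each statement has the shape \(2a + 3a = 5a\) for a single common multiplier \(a\), and separately whether it is true, using exact fractions so that decimals such as \(0.2\) are not rounded. The kilogram example is left out because it carries units, and the square-root example is left out because \(\sqrt{2}\), \(\sqrt{3}\) and \(\sqrt{5}\) cannot be represented exactly as fractions.

```python
from fractions import Fraction

def fits_pattern(x, y, z):
    """Hypothetical helper: True if x + y = z has the shape 2a + 3a = 5a
    for a single common multiplier a (taken from the first term)."""
    a = x / 2
    return y == 3 * a and z == 5 * a

# Statements from the text; Fraction reads decimal and fraction strings exactly.
statements = {
    "2000 + 3000 = 5000": ("2000", "3000", "5000"),
    "0.2 + 0.3 = 0.5": ("0.2", "0.3", "0.5"),
    "2/7 + 3/7 = 5/7": ("2/7", "3/7", "5/7"),
    "2.1 + 3.1 = 5.1": ("2.1", "3.1", "5.1"),
    "2^2 + 3^2 = 5^2": ("4", "9", "25"),
    "1/2 + 1/3 = 1/5": ("1/2", "1/3", "1/5"),
}

results = {}
for label, parts in statements.items():
    x, y, z = (Fraction(p) for p in parts)
    fits, holds = fits_pattern(x, y, z), x + y == z
    results[label] = (fits, holds)
    print(f"{label:<20} fits pattern: {fits!s:<6} true: {holds}")
```

Fitting the pattern guarantees that a statement is true, but failing to fit it does not on its own show that a statement is false (for example, \(1 + 4 = 5\) is true but does not have the shape \(2a + 3a = 5a\)), which is why the sketch checks both.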

Here is another kind of derived fact opportunity:

TASK 2.3

Calculate \[99 \times 3782 + 3782 .\]

Starting by working out \(99 \times 3782\) looks a little tricky. However, if we step back and notice that we have \(99\) lots of \(3782\), plus one more lot of \(3782\), then we must have \(100\) lots of \(3782\), which is \(378,200\):

\[99 \times 3782 + 3782 = 99 \times 3782 + 1 \times 3782=100 \times 3782=378,200.\]

Although learners may think that algebra is supposed to make things harder, this pattern is easier to notice if we call \(3782\) by a letter, say \(n\).

Then we can write

\[99n + n = 100n .\]

This will always be true, whatever the value of \(n\).

Learners can make up similar ‘looks-hard-until-you-see-the-trick’ calculations, and they can be very inventive about doing this.

For example, they might create examples like this:

\[85.1 \times 27896 + 14.9 \times 27896 = 2,789,600 .\]

We could generalise this ‘trick’ to a statement involving two letters, say \(a\) and \(n\):

\[an + (100 - a)n = 100n .\]

In cases like this, learners sometimes find it easier to use a third letter, and write two statements:

\[an + bn = 100n, \text{ provided that } a + b = 100 .\]

Now, suppose we happen to know that

\[23 \times 350 = 8050 .\]

Perhaps we worked it out on a calculator, and then lent our calculator to someone else, or the calculator suddenly stopped working, and we don’t have access to another one.

And suppose that we now want to know \(24 \times 350\).

Learners will often begin calculating \(24 \times 350\) from scratch, but actually we can obtain the answer from \(23 \times 350\) if we notice that the difference between \(23\) lots of \(350\) and \(24\) lots of \(350\) must be just one lot of \(350\):

\[24 \times 350 = (23 + 1) \times 350 = 23 \times 350 + 1 \times 350 = 8050 + 350 = 8400.\]

What is the ‘algebraic idea’ behind this computational shortcut?

We could write

\[24a = 23a + a .\]

Twenty-four lots of ‘anything’ (\(a\) stands for ‘anything’) is equal to \(23\) lots of ‘anything’, plus \(1\) more lot of the ‘anything’.

Of course, the ‘anything’ has to be the same ‘anything’ in each term, which is what we assume when we use the same algebraic letter, \(a\).

More generally still,

\[(n + 1)a = na + a.\]

Learners can create many more examples of these kinds of things. The key is to begin with numbers that they can work out and check, and be confident of, and then to use the algebra to express the essence of their ‘trick’ and generate more examples. Algebraic claims always need to be sanity-checked by substituting in actual numbers, otherwise learners can quickly slip into writing nonsense.
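The shortcuts in this section can be sanity-checked in exactly this way. The Python sketch below recovers \(24 \times 350\) from the known fact \(23 \times 350 = 8050\), reproduces the other two examples, and then checks \((n + 1)a = na + a\), \(99n + n = 100n\) and \(an + (100 - a)n = 100n\) on a grid of integer values chosen purely for illustration. Passing these checks is reassuring evidence, not a proof.

```python
from fractions import Fraction

# Known fact (perhaps obtained before the calculator was lent out)
known = 8050                      # 23 × 350

# Derived fact, using (n + 1)a = na + a with n = 23 and a = 350
derived = known + 350
print(derived)                    # 8400

# 99n + n = 100n with n = 3782
print(99 * 3782 + 3782)           # 378200

# an + (100 - a)n = 100n with a = 85.1 and n = 27896; Fraction keeps 85.1 exact
print(Fraction("85.1") * 27896 + Fraction("14.9") * 27896)   # 2789600

# Sanity-check each identity by substituting many integer values
for n in range(-20, 21):
    for a in range(-20, 21):
        assert (n + 1) * a == n * a + a
        assert 99 * n + n == 100 * n
        assert a * n + (100 - a) * n == 100 * n
print("all checks passed")
```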

Non-mathematicians often assume that mathematicians must enjoy calculating things – it’s what they do. While there may sometimes be some satisfaction in performing routine calculations, really mathematicians are usually looking for ways to avoid calculating.7 They look for shortcuts and generalities that mean that they can solve a problem by minimising the amount of direct calculation involved.

Here is an example:

TASK 2.4

Work out \[(156 + 278) \times (156 - 156).\]

A learner might begin by working out \(156 + 278\), and then \(156 - 156\), and then multiply together the answers to these two calculations. That is a workable plan that will lead to the correct answer. But a better approach is to hold back before beginning to calculate \(156 + 278\) and first take in the whole picture. Notice that the second bracket is much easier to calculate, because it happens to come to zero. And notice further that whatever \(156 + 278\) comes to, when we multiply it by zero, the answer will necessarily be zero. So, we can avoid calculating \(156 + 278\) entirely, because the result of that sum will make no difference to the final answer.

For me, this is thinking algebraically, because if some time later you told someone else about working on this problem, you probably would not remember the exact numbers. You would forget that the second bracket was \(156 - 156\). You might tell your friend that it was something like \(391 - 391\). And that wouldn’t matter. The important thing would be the structure of the problem, that it was ‘a number minus itself’. And we might write that algebraically as \(n - n\); it would work just as well whatever \(n\) was. The point of the problem could be captured by saying

\[(a + b)(n - n) = (a + b) \times 0 = 0.\]

By using symbols like this, we can express that \(a\), \(b\) and \(n\) can be any numbers, but the two \(n\)s must be equal to each other. The exact numbers used in the example don’t matter.

Of course, there are more sophisticated ‘facts about how numbers behave’, which we capture in identities such as

\[a^{2} - b^{2} \equiv (a - b)(a + b),\]

the difference of two squares, which can also lead to simple ways of calculating.

TASK 2.5

Find an easy way to work out \(57^{2} - 43^{2}\).

One way to do this would be to first work out \(57^{2}\), and then work out \(43^{2}\), and then find the difference.

Alternatively, we could use the difference of two squares to replace \(57^{2} - 43^{2}\) by the equivalent product, \((57 - 43)(57 + 43)\).

Since the second factor is \(100\), we can just evaluate the first factor, \(57 - 43 = 14\), and multiply the answer by \(100\), to get \(1400\).

This may make the calculation easy enough for a learner to do quickly in their head – an excellent party trick (provided you go to the right kind of parties!).

The ideal task to help learners become familiar with identities like the difference of two squares is for them to devise calculations like this one, that can be done easily by using them. One possibility is to devise calculations too long for learners’ calculators to do with complete accuracy:8

TASK 2.6

Find an easy way to work out
\[{111,111,111,111}^{2} - {111,111,111,110}^{2}\] and \[\dfrac{{222,222,222,222}^{2}}{444,444,444,444} .\]

Learners may say that the ‘easiest way’ is to just use a calculator, but they would be wrong if their calculator cannot handle such long numbers without rounding and going into standard form.

By using the difference of two squares identity,

\[{111,111,111,111}^{2} - {111,111,111,110}^{2} = (111,111,111,111 + 111,111,111,110) \times 1\]

\[= 222,222,222,221.\]

Similarly,

\[\dfrac{{222,222,222,222}^{2}}{444,444,444,444} = \dfrac{(2 \times 111,111,111,111)^{2}}{4 \times 111,111,111,111}\]

\[= \dfrac{4 \times {111,111,111,111}^{2}}{4 \times 111,111,111,111} = 111,111,111,111.\]

Learners can get creative and generate all kinds of similar problems.9
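A programming language with exact whole-number arithmetic can confirm these results. In the Python sketch below (Python’s integers have no fixed size limit), the answers to TASK 2.5 and TASK 2.6 are computed both directly and via the identities. The last lines imitate a calculator by repeating the first calculation of TASK 2.6 in floating-point arithmetic, which, like many calculators, keeps only a limited number of significant figures: the two squares are rounded before the subtraction, and the difference comes out wrong.

```python
# Python integers have arbitrary precision, so these are computed exactly.
a = 111_111_111_111
b = 111_111_111_110

direct = a**2 - b**2
via_identity = (a + b) * (a - b)      # difference of two squares
print(direct, via_identity)           # 222222222221 222222222221

quotient = 222_222_222_222**2 // 444_444_444_444
print(quotient)                       # 111111111111

print(57**2 - 43**2, (57 - 43) * (57 + 43))   # 1400 1400

# Floating-point numbers keep only about 16 significant figures,
# so squaring first and subtracting afterwards loses the exact answer.
approx = float(a)**2 - float(b)**2
print(approx == direct)               # False
```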

2.7 Solving equations

2.7.1 Understanding the objective

There is a perception among many learners that algebra exists to make things more difficult, rather than easier. The teacher may claim that algebra is a powerful tool for solving problems, but learners’ experiences may clash with this.

It must be admitted that sometimes algebra is used unnecessarily in school to make easy questions seem much harder. For example, learners might be asked to solve an equation like \(x + 2 = 6\) by using inverse operations (subtract \(2\) from both sides), when everyone can immediately see that the solution is \(x = 4\).

The physicist Richard Feynman described how as a child he tried to help his older cousin with solving equations:10

I said to my cousin then, “What are you trying to do?” You know, I hear him talking about \(x\). He says, … “\(2x + 7\) is equal to \(15\) … and you’re trying to find out what \(x\) is.” I says, “You mean \(4\).” He says, “Yeah, but you did it with arithmetic, you have to do it by algebra,” and that’s why my cousin was never able to do algebra … there’s no such thing as, you know, you do it by arithmetic, you do it by algebra - that was a false thing that they had invented in school so that the children who have to study algebra can all pass it. They had invented a set of rules which if you followed them without thinking could produce the answer: subtract \(7\) from both sides, if you have a multiplier, divide both sides by the multiplier and so on, and a series of steps by which you could get the answer if you didn’t understand what you were trying to do.

For Feynman, there is no ‘right way’ to find the value of \(x\). The important thing is to understand that \(x\) stands in place of a ‘mystery’ number, and you need to find what that number could be. Any method is fine if it works; if you can solve an equation by inspection, all the better for you!

For this reason, it can be beneficial to begin solving equations by posing problems that are a little too difficult to solve immediately by inspection. It may seem counterintuitive not to begin with the simplest possible examples, but if we want learners to see why algebra is useful, then starting with something slightly more complicated – for example, linear equations with the unknown on both sides, or with non-integer solutions – may be an advantage.11 If the equations learners are presented with are so easy they can immediately do them in their heads, why would they be motivated to learn a method or even to write anything down?

Researchers have classified linear equations into arithmetic equations, in which the unknown appears on only one side of the equation, and non-arithmetic equations, in which the unknown appears on both sides. The former can be solved ‘with arithmetic’, as Feynman’s cousin would say, by ‘doing and undoing’, but the non-arithmetic ones can’t. The phase change between these two types of equations represents a jump in difficulty, and has been termed a didactic cut.12 The key to solving a non-arithmetic equation is often to reduce it to an arithmetic equation.

Indeed, solving equations may not be the logically simplest way to begin working on algebra, and it may make more sense to spend considerable time exploring expressions first.13 But solving equations can be an easy way to motivate learners, by posing equations as being about solving a mystery to find the unknown number.

For example, we could ask learners who have not yet been taught a formal method what sense they can make of an equation written like this:

\[5x - 11 = 3x + 9.\]

If this means nothing to them, we could state something equivalent in words, and ask them to make the connections: “I’m thinking of a mystery number. Five times my number, minus \(11\), is the same as \(3\) times my number, plus \(9\). What could my number be?”

If this is too confusing, we could break it up further:

TASK 2.7

Leillah and Rajib are thinking of the same mystery number.

Leillah multiplies the number by \(5\) and subtracts \(11\).
Rajib multiplies the original mystery number by \(3\) and adds \(9\).
They both get the same answer.

What could their number be?

Of course, this version is longer, but in mathematics longer versions of tasks can often be easier to understand than more concise ones.

Rather than asking, “What is my number?” (or “What is their number?”), I prefer to ask “What could my/their number be?”, because that keeps open the possibility that there could be more than one answer. The teacher knows that this is a linear equation in which the two sides have different coefficients of \(x\), and so it has exactly one solution. But we shouldn’t assume this is obvious to learners.

If the teacher allows learners to stare at this equation for some time, someone may well spot the solution, \(x = 10\), by inspection. Probably most learners will not immediately see this, and perhaps no one will. It is fine if the solution is given publicly – this does not spoil the discussion at all. You could even decide to tell learners at the start that ‘\(x = 10\)’ is a solution. (I would try to avoid saying that ‘\(10\)’ is a solution, because ‘\(10\)’ is just a number. The solution is a statement that ‘\(x\) is equal to \(10\)’, and so I prefer to say that ‘\(x = 10\)’ is the solution.) Somehow, getting the solution out of the way can free everyone’s mind to focus on what is coming, as otherwise learners may be only partially following because they are still trying numbers in their head to see what \(x\) could be.

Sharing the solution publicly really clarifies what the objective is – to find a number which ‘fits’ or ‘works’ or satisfies the equation.

2.7.2 Balancing equations

Now that it is claimed that \(x = 10\) is the (or perhaps merely ‘a’) solution, we can ask learners how they can check this:

\[ \begin{aligned} \text{Left-hand side } &= 5 \times 10 - 11 = 39, \\ \text{Right-hand side } &= 3 \times 10 + 9 = 39. \end{aligned} \]

So, substituting \(x = 10\) turns the equation into the true statement:

\[39 = 39.\]

I would discourage learners from writing the checking all in one go, as

\[5 \times 10 - 11 = 3 \times 10 + 9,\]

because it is easy for them to misunderstand what they are doing here. They may equally happily write an equality like this, with different values substituted, in which the equality does not hold; for example, as

\[5 \times 6 - 11 = 3 \times 6 + 9,\]

without noticing that the equals sign is incorrect here. It is generally a good discipline in proving an equality to handle each side separately, by writing ‘\(\text{LHS }=\)’ and ‘\(\text{RHS }=\)’ for ‘left-hand side’ and ‘right-hand side’ respectively, as above.
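The habit of evaluating each side on its own can be mirrored in a few lines of Python. This is a minimal sketch of my own, with the equation \(5x - 11 = 3x + 9\) hard-coded; the function names `lhs`, `rhs` and `check` are hypothetical.

```python
def lhs(x):
    """Left-hand side of 5x - 11 = 3x + 9."""
    return 5 * x - 11


def rhs(x):
    """Right-hand side of 5x - 11 = 3x + 9."""
    return 3 * x + 9


def check(x):
    """Work out each side separately, then compare them, as with 'LHS =' and 'RHS ='."""
    left, right = lhs(x), rhs(x)
    relation = "=" if left == right else "!="
    print(f"x = {x}: LHS = {left}, RHS = {right}, so LHS {relation} RHS")
    return left == right


check(10)  # prints: x = 10: LHS = 39, RHS = 39, so LHS = RHS
check(6)   # prints: x = 6: LHS = 19, RHS = 27, so LHS != RHS
```

Because each side is computed before any comparison is made, a wrong value such as \(x = 6\) shows up as a mismatch rather than being written down as a false equation.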

Now there are two tasks for the learner to accomplish:

  1. Discover whether there might be any other values of \(x\) which will work.

  2. Discover how to find the solution, \(x = 10\), without depending on a lucky observation, or even a more systematic process of trial and improvement.

We can accomplish both of these tasks together.

The balancing method is usually introduced by imagining a two-pan balance (Figure 2.5).

Figure 2.5: A two-pan balance.

These are less common in daily life these days, as scales are now almost always digital. Even seesaws in children’s playgrounds nowadays tend to operate with springs rather than as a traditional balance. But the imagery of the scales remains in depictions of Lady Justice, such as the one at the Central Criminal Court (the Old Bailey) in London, UK, so it is not completely alien.14

We do not have to use the image of the balance to do ‘balancing’. We can simply say that if we add the same amount to, or subtract the same amount from, two equal things, then they will remain equal.

For example, if Leillah has \(£30\) and Rajib also has \(£30\), and I give each of them another \(£10\), they will both have \(£40\), and so they will both have the same amount as each other, although not the same amount as they began with (Figure 2.6).

Figure 2.6: If Leillah and Rajib have equal amounts to begin with, then if each of them receives \(£10\), they will still have equal amounts.

Even if I don’t know how much money they each began with, provided I know that they were equal beforehand, they must be equal after I give them both another \(£10\). We could imagine that both Leillah and Rajib had a spare \(£5\) in their back pockets that I didn’t know about. Provided they had the same amount extra, the conclusion will still be valid.

To take this a stage further, even if I don’t tell you how much money I give to each of them, provided I give them both the same amount, and they had equal amounts before, then they are guaranteed to have the same amount as each other afterwards.

The same goes, of course, for subtracting amounts from both people. Even if I subtract more than they begin with, we can handle that with negative numbers (debts).

And the same goes for scaling both people’s money up or down by the same multiplier. A simple example of scaling would be to change both people’s currency from GB pounds to US dollars. If they had the same amount of pounds as each other beforehand, they will have the same amount of dollars as each other afterwards.

It is important that learners appreciate this quite abstract point.

So, to solve our equation, and find \(£x\), we can give both sides (both Leillah on the left and Rajib on the right) \(+ 11\) pounds:

\[5x - 11 + 11 = 3x + 9 + 11.\]

Never mind for now why we chose \(+ 11\), rather than some other number. The aim for the moment is just that we agree that Leillah and Rajib remain in balance with each other. The equals sign in \(5x - 11 = 3x + 9\) tells us that they had the same total to begin with. Now that they have both increased by \(11\), they must still have the same total, so the equals sign can stay.

It is worth noting that we don’t know what this total is. We don’t (yet) know what value \(5x - 11\) has, and we don’t know what value \(3x + 9\) has. To know that, we would have to know the value of \(x\), which is the very thing we are trying to find out. But, although we don’t know what those two values are, we do know they are equal. And, since \(11 = 11\), we are adding equals to equals, and so Leillah and Rajib will be just as equal after this addition as they were before.

Quite often, we know that things are equal, without knowing exactly what values they have. We can compare two people, and see that they have the same height, but might not have a tape measure handy to discover how tall they are. If two people of equal height step onto identical chairs, then the new heights (them plus the chair) will also be equal, even though we don’t know what the new heights are either. Similarly, we can fit an object inside a box and know that it fits exactly, because the object and the space in the box have the same dimensions, without measuring either. Or we can observe two people finishing a race at exactly the same time, even though we might not know their race times in seconds.

Occasionally a learner will ask, “If you are giving \(11\) to both of them, does that mean that altogether you are giving them \(22\)?” This question is a bit unhelpful, as it takes the focus away from Leillah and Rajib separately and onto whoever is supplying the money, but I find that learners do sometimes ask about it. It is natural to wonder where all this free money is coming from! I would just say that yes, that is true, but our focus is on how much Leillah and Rajib have each, rather than where any extra money is coming from.

Simplifying what we have written gives

\[5x\ \ \ \ \ \ \ \ \ \ = 3x + 20.\]

I find it useful, at least initially, to keep each term lined up vertically, to make it easier to see what corresponds to what.

Now, we remove \(3x\) from both Leillah and Rajib. Since we don’t know how much \(x\) is, we obviously don’t know how much \(3x\) is either, so we don’t know how much we are removing from both sides. But we do know that, however much it is, it is the same amount being subtracted from both people. And so the equality of the two sides is being preserved.

Simplifying again, we have

\[2x\ \ \ \ \ \ \ \ \ \ = \ \ \ \ \ \ \ \ \ \ \ \ 20.\]

Now that we can see the value of twice \(x\), we can halve both sides to find the value of \(x\) itself:

\[\ \ x\ \ \ \ \ \ \ \ \ \ = \ \ \ \ \ \ \ \ \ \ \ \ 10.\]

So, the number I was thinking of was \(10\), as we already knew, of course. But now we have a method that we can always use to get there. No matter what equation like this we began with, we would always be able to find things to do to both sides to eventually leave us with just \(x\) on one side and just its value on the other. And because all the steps are reversible, we could just as well go back from \(x = 10\) to the original equation. This tells us that \(x = 10\) is not just a solution but the complete solution. It is impossible for any other value of \(x\) to satisfy the equation: for any value greater than \(10\), Leillah’s \(5x - 11\) is larger than Rajib’s \(3x + 9\), and for any value less than \(10\) it is smaller. So we are done.
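The balancing steps can be sketched in Python by storing \(ax + b = cx + d\) as the four numbers \((a, b, c, d)\), so that ‘doing the same thing to both sides’ becomes an operation on that tuple. This representation and the helper names are my own assumptions for illustration. Because every step is reversible, a trial value of \(x\) satisfies the last line exactly when it satisfies the first, which the final loop checks for a few values.

```python
from fractions import Fraction

# The equation a*x + b = c*x + d is stored as the tuple (a, b, c, d).
start = (5, -11, 3, 9)  # 5x - 11 = 3x + 9


def add_to_both_sides(eq, k):
    """Add the number k to both sides."""
    a, b, c, d = eq
    return (a, b + k, c, d + k)


def subtract_x_terms(eq, k):
    """Subtract k*x from both sides."""
    a, b, c, d = eq
    return (a - k, b, c - k, d)


def divide_both_sides(eq, k):
    """Divide both sides by a non-zero number k; Fraction keeps the results exact."""
    if k == 0:
        raise ValueError("cannot divide both sides by zero")
    a, b, c, d = eq
    k = Fraction(k)
    return (a / k, b / k, c / k, d / k)


def holds(eq, x):
    """True if the value x satisfies a*x + b = c*x + d."""
    a, b, c, d = eq
    return a * x + b == c * x + d


def show(eq):
    """Raw coefficients, so 5x - 11 is displayed as 5x + -11."""
    a, b, c, d = eq
    return f"{a}x + {b} = {c}x + {d}"


steps = [start]
steps.append(add_to_both_sides(steps[-1], 11))  # 5x + 0 = 3x + 20
steps.append(subtract_x_terms(steps[-1], 3))    # 2x + 0 = 0x + 20
steps.append(divide_both_sides(steps[-1], 2))   # 1x + 0 = 0x + 10

for eq in steps:
    print(show(eq))

# Every line is satisfied by exactly the same trial values as every other line.
for trial in (6, 10, 14):
    print(trial, [holds(eq, trial) for eq in steps])
```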

Following this, I would get learners to explain each step to their partner, each taking a turn, so everyone has the chance to self-explain the process step by step.15

2.7.3 Developing strategy

Now we still need to go back to the beginning and go through it all again, but this time focused on the strategy, rather than on the preservation of equality. I would generally stay with a single example and try to delve into what it means in detail, rather than rushing learners through solving lots of different equations, without taking the time to really examine what is happening.

Preserving equality is necessary – it is like the rules in chess, which tell you how each of the pieces is allowed to move. If you break those rules, then you are not playing chess!

But to win at chess, it is not enough just to follow the rules and make allowable moves. You also have to try to defeat your opponent! That is the part that is about strategy, and is the bit that makes chess an interesting game.

I have summarised in Figure 2.7 the parallels between solving equations and playing chess, which I think can also be quite useful to share with learners. Any game of skill will do – it doesn’t have to be chess.

Figure 2.7: Three stages in solving equations, by analogy with playing chess.

Here, to solve \(5x - 11 = 3x + 9\), our first step was to add \(11\) to both sides.

Let’s try something else instead. Let’s add \(15\) to both sides, and see what happens.

Learners will end up with

\[5x - 11 + 15 = 3x + 9 + 15.\]

Is equality preserved here? Yes, because we added the same amount to both sides, so we are playing the game we are supposed to be playing. The equals sign deserves to be there.

Let’s simplify now:

\[5x + 4 = 3x + 24.\]

Hopefully learners’ skills with directed numbers (Chapter 1) are up to calculating \(( - 11) + 15 = 4\).

Equality is preserved, but this time we don’t seem to be making progress in solving the equation. The equation we have produced is about as difficult to solve as the one we started with!

Why did adding \(15\) not help, but adding \(11\) did?

Make it the learners’ job to explain this. If they aren’t sure, then repeat the \(+ 11\) step, and then try adding a different number instead, say \(12\), to see the same problem arise as when we added \(15\).

What is special about \(+ 11\) is that it cancels out the \(- 11\), and reduces us from four terms (two each side) to three terms (one on the left-hand side and two on the right-hand side). Leillah’s side gets simpler when we add \(11\) to both sides, and that gets us closer to solving the equation.
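A rough way to see why \(+ 11\) helps where \(+ 12\) and \(+ 15\) do not is to count the non-zero terms left after each possible first move. The sketch below is my own; treating ‘fewer terms’ as ‘easier’ is an assumption that fits this example rather than a general rule for every equation.

```python
# a*x + b = c*x + d is stored as (a, b, c, d); a zero entry means that term has disappeared.
start = (5, -11, 3, 9)  # 5x - 11 = 3x + 9


def add_to_both_sides(eq, k):
    """Add the number k to both sides, which preserves equality whatever k is."""
    a, b, c, d = eq
    return (a, b + k, c, d + k)


def term_count(eq):
    """Count the non-zero terms on both sides together."""
    return sum(1 for coefficient in eq if coefficient != 0)


print(f"start: {start} has {term_count(start)} terms")
for k in (11, 12, 15):
    new = add_to_both_sides(start, k)
    print(f"add {k}: {new} has {term_count(new)} terms")
```

Of these three choices, only \(k = 11\) makes a term vanish, since it is the only one that cancels the \(-11\); all three preserve equality equally well.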

If the teacher is lucky, someone will ask whether subtracting \(9\) from both sides might not also be useful as the first step, and that is a great opportunity to see that there could be other good strategies besides \(+ 11\). There is not one right way to solve an equation. There are more efficient and less efficient ways, and sometimes there are two ways that are about equally efficient. Then, it is personal preference which you choose.

We can continue in this way, interrogating the strategy behind each step. We have to maintain equality, otherwise we don’t have equations any more. We would have to replace our equals signs with \(\neq\), and then we would not be able to conclude much at the end! So, initially, any time a learner successfully preserves equality by what they choose to do, there is cause for celebration. Whatever they have done at least has not made things worse – our equation has not been ruined.

But, increasingly, we want to be strategic, and get closer to finding what \(x\) could be, and there is not really a need for a lot of rules here. The only rule is to ask yourself, “Does this get me an equation that is easier to solve than the previous one?” If so, that constitutes progress, so carry on with more steps. If not, go back to the previous equation and look for something better to do.

This is not only a great approach to solving linear equations; it is actually a great approach to solving all equations: do what helps you get closer to finding what \(x\) could be! This reflects Feynman’s view (see Section 2.7.1) that there need not be any rules other than to do things that help you find \(x\).

Learners can now choose their own \(x\) (or whatever letter they prefer), beginning with integers, but soon extending to non-integers, and make up an equation like ours for their partner to solve. Before handing it over to their partner, they must substitute in their \(x\) value and double-check that it satisfies the equation!

2.7.4 Developing fluency

Once the idea of being strategic is understood, learners need to develop fluency in solving equations of gradually increasing sophistication.

One task for developing fluency in solving equations is to work on Expression Polygons.16,17,18

TASK 2.8

The figure below shows an expression polygon consisting of four algebraic expressions.
Each expression is connected to every other expression, forming six equations.


Solve the six equations, writing each solution next to the corresponding line.
What do you notice?

Learners should discover that the solutions for \(x\) are \(\left\{ 1,\ 2,\ 3,\ 4,\ 5,\ 6 \right\}\); the first six positive integers.

This is a start, but so far this has just been ‘warm up’. The real task is to ask learners to try to invent an expression polygon of their own, with different expressions, that will produce a ‘nice’ set of solutions. What counts as ‘nice’ is for learners to decide. They might try to create an expression polygon that produces the first six prime numbers, or the first six even numbers, or the first six triangle numbers.

This is challenging. Beginning with an expression triangle (three expressions, giving three equations), or with an expression quadrilateral whose diagonal lines are omitted (four expressions, giving just four equations to work with), is much more accessible. Eventually learners might challenge themselves to make an expression pentagon or hexagon.

Sometimes learners begin by adapting the given expression polygon, such as by adding \(5\) to each expression or multiplying each expression by \(3\). They are often surprised that this does not change the solution set, but this makes sense, because they are transforming both sides of each equation in the same linear way.

Occasionally, a learner will realise that they can obtain the first six even numbers as solutions by replacing each \(x\) with \(\dfrac{x}{2}\), followed by perhaps scaling up all the expressions by a multiple of \(2\), to clear fractions. Equivalently, just doubling all the constant terms gives the same result, as shown in Figure 2.8.

Figure 2.8: An expression polygon producing the first six even numbers.
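Learners’ conjectures about adapting a polygon can be checked quickly by computer. In the sketch below, the four expressions \(10\), \(x + 9\), \(2x + 6\) and \(3x + 1\) are a hypothetical example of my own (not the polygon from Task 2.8), each stored as a pair \((p, q)\) meaning \(px + q\). Every pair of expressions gives one equation; the code solves all six, then repeats this after doubling the constant terms, after adding \(5\) to every expression, and after multiplying every expression by \(3\).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical expression quadrilateral: (p, q) stands for p*x + q.
polygon = [(0, 10), (1, 9), (2, 6), (3, 1)]  # 10, x + 9, 2x + 6, 3x + 1


def solve_pair(e1, e2):
    """Solve p1*x + q1 = p2*x + q2; assumes the x-coefficients differ."""
    (p1, q1), (p2, q2) = e1, e2
    return Fraction(q2 - q1, p1 - p2)


def solutions(poly):
    """Sorted solutions, one for each line joining two expressions."""
    return sorted(solve_pair(e1, e2) for e1, e2 in combinations(poly, 2))


def double_constants(poly):
    return [(p, 2 * q) for p, q in poly]


def add_to_all(poly, k):
    return [(p, q + k) for p, q in poly]


def multiply_all(poly, k):
    return [(k * p, k * q) for p, q in poly]


def show(values):
    return ", ".join(str(v) for v in values)


print(show(solutions(polygon)))                    # 1, 2, 3, 3, 4, 5
print(show(solutions(double_constants(polygon))))  # 2, 4, 6, 6, 8, 10
print(show(solutions(add_to_all(polygon, 5))))     # 1, 2, 3, 3, 4, 5
print(show(solutions(multiply_all(polygon, 3))))   # 1, 2, 3, 3, 4, 5
```

This particular polygon repeats the solution \(3\), so learners might not yet count it as ‘nice’; adjusting it is exactly the kind of exploration the task invites.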

A very common way of practising solving equations is to use ‘I am thinking of a number’, followed by a series of operations, such as adding \(5\) or dividing by \(2\), leading to a given number. One learner invents the puzzle and another has to try to solve it to work out the original number.19

2.7.5 Common confusions

It is important for learners to become flexible in accepting unknowns expressed as letters other than \(x\). It may seem like a small point, but a lot of frustration in doing algebra comes from mix-ups caused by learners being unused to writing individual letters clearly. In other school subjects, joined-up handwriting may be the expectation, and learners may be out of practice at writing individual letters clearly. They may even sometimes benefit in other subjects from writing letters ambiguously, so that the teacher cannot be sure that a word is definitely misspelled.

However, in mathematics confusing one letter for another, or for a number, spells disaster. A letter \(x\), if not made curly, can be mistaken for a multiplication sign, a \(z\) can look like a number \(2\), a \(g\) can look like a \(9\), a \(b\) like a \(6\), and so on. Lowercase and capital letters are easily confused, and handwriting has to be very careful if you want to distinguish, say, an \(s\) from an \(S\) or a \(p\) from a \(P\). I sometimes challenge learners to see if they can write an expression like \(5s + S^{5} + 5^{s}\) so that someone else can read it out correctly, and this often highlights the issue. In order to be ‘cruel to be kind’, I sometimes give equations to solve that deliberately contain letters and numbers that could be easily confused, such as \(6b - 66 = 16b - 606.\)

Throughout this section, I have presented the idea of balancing – preserving equality by doing the same operations to both sides of an equation – as fundamental. A useful check on learners’ understanding of this can be to present them with incorrect transformations of an equation containing units, and ask them to explain why they are wrong.

For example, if we begin with the true equation

\[0.1\ \text{m} = 10\ \text{cm},\]

and ‘square both sides’, we seem to obtain

\[{0.1}^{2}\text{ m} = 10^{2}\ \text{cm}\]

\[0.01\ \text{m} = 100\ \text{cm},\]

where the left-hand side has become \(10\) times smaller while the right-hand side has become \(10\) times larger!

The problem is that we have altered the numbers but left the units unchanged. Since squaring is not a linear operation, the units need squaring too (squaring \(0.1\ \text{m}\) means calculating \(0.1\ \text{m} \times 0.1\ \text{m}\)), to give

\[{0.1}^{2}\ \text{m}^{2} = 10^{2}\ \text{cm}^{2}\]

\[0.01\ \text{m}^{2} = 100\ \text{cm}^{2},\]

which is a true statement, since \(1\ \text{m}^{2} = 100\ \text{cm}\ \times 100\ \text{cm} = 10,000\ \text{cm}^{2},\) and \(0.01\ \text{m}^{2}\) is one-hundredth of this. After squaring, our equation about lengths has become one about area.

We can do something similar with other units.

For example, if we begin with the true statement that

\[\dfrac{1}{4}\text{ yard} = 9\ \text{inches},\]

and this time square root both sides, we seem to obtain

\[\dfrac{1}{2}\text{ yard} = 3\ \text{inches}.\]

This time, the left-hand side has doubled, whereas the right-hand side has become a third as much!

Clearly, again, we have not truly done ‘the same thing’ to both sides. As with squaring, square rooting is not a linear operation, and so the units have to change too.

What we actually obtain from square rooting is

\[\dfrac{1}{2}\text{ }\sqrt{\text{yard}} = 3\ \sqrt{\text{inch}},\]

but \(\sqrt{\text{yard}}\) and \(\sqrt{\text{inch}}\) are not familiar enough units for us to appreciate the truth of this statement!

If, instead, we begin with the true area statement that

\[\dfrac{1}{4}\text{ }\text{yard}^{2} = 324\ \text{inch}^{2},\]

and square root both sides of that, we obtain

\[\dfrac{1}{2}\text{ yard} = 18\ \text{inches},\]

which is correct.
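The bookkeeping behind these examples can be sketched by storing a quantity as a number, a unit and a power of that unit, and converting to a base unit to compare the two sides. This is my own illustration: it assumes the conversions \(1\ \text{m} = 100\ \text{cm}\) and \(1\ \text{yard} = 36\ \text{inches}\), the unit labels are hypothetical, and it only handles whole-number powers of units, so the \(\sqrt{\text{yard}}\) case is not represented. Setting `units_too=False` reproduces the faulty move of changing the number but not the unit.

```python
from fractions import Fraction
from math import isqrt

# A quantity number * unit**power is stored as (number, unit, power).
# Conversion factors to a base unit: centimetres for metric, inches for imperial (assumed).
TO_BASE = {"m": 100, "cm": 1, "yd": 36, "in": 1}


def in_base(q):
    """Value of the quantity in base units raised to the same power (e.g. cm^2)."""
    number, unit, power = q
    return number * TO_BASE[unit] ** power


def exact_sqrt(x):
    """Exact square root of a Fraction whose numerator and denominator are perfect squares."""
    x = Fraction(x)
    n, d = isqrt(x.numerator), isqrt(x.denominator)
    if n * n != x.numerator or d * d != x.denominator:
        raise ValueError(f"{x} is not a perfect square")
    return Fraction(n, d)


def square(q, units_too=True):
    """Square a quantity; units_too=False squares only the number (the faulty move)."""
    number, unit, power = q
    return (number ** 2, unit, 2 * power if units_too else power)


def square_root(q, units_too=True):
    """Square-root a quantity; units_too=False roots only the number (the faulty move)."""
    number, unit, power = q
    if units_too and power % 2 != 0:
        raise ValueError("this sketch only handles whole-number powers of units")
    return (exact_sqrt(number), unit, power // 2 if units_too else power)


tenth_m = (Fraction(1, 10), "m", 1)   # 0.1 m
ten_cm = (Fraction(10), "cm", 1)      # 10 cm
print(in_base(tenth_m) == in_base(ten_cm))                                 # True
print(in_base(square(tenth_m, units_too=False)),
      in_base(square(ten_cm, units_too=False)))                            # 1 100 (cm)
print(in_base(square(tenth_m)) == in_base(square(ten_cm)))                 # True

quarter_yd = (Fraction(1, 4), "yd", 1)  # 1/4 yard
nine_in = (Fraction(9), "in", 1)        # 9 inches
print(in_base(square_root(quarter_yd, units_too=False)),
      in_base(square_root(nine_in, units_too=False)))                      # 18 3 (inches)

area_yd = (Fraction(1, 4), "yd", 2)     # 1/4 square yard
area_in = (Fraction(324), "in", 2)      # 324 square inches
print(in_base(area_yd) == in_base(area_in))                                # True
print(in_base(square_root(area_yd)) == in_base(square_root(area_in)))      # True
```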

2.8 Beyond linear equations

Once learners can confidently solve linear equations with the unknown on both sides, like

\[3x - 5 = x + 13,\]

what might be the next logical step in their equation-solving journey?

Schemes of learning will often go in the direction of making the numbers harder, both those that appear in the equation and the eventual numerical solution, offering equations like:

\[ \begin{aligned} 3x - 5 &= x + 14 \\ 73x - 75 &= 7x + 73 \\ 3.2x - 5\dfrac{1}{3} &= 6.1x + 1.35. \end{aligned} \]

While tackling these might be valuable, it can become more about rehearsing arithmetic than learning anything about algebra. The attention has shifted from the solving of the equations to performing operations with fractions and decimals.

At some point, learners will go on to meet simultaneous equations (Chapter 4) and quadratic equations (Section 2.10.1 and Section 2.10.2) as the next levels of equation-solving difficulty. But perhaps there are other directions that learners might go in that build more directly on what they have learned by this point. The elimination method of solving simultaneous equations involves actions like ‘adding two equations together’ (shorthand for ‘adding the left-hand sides and separately adding the right-hand sides’), which can feel like something quite new, although really it is still just adding the same thing to both sides of one of the equations (Chapter 4). Embarking on quadratic equations may mean learning the factorisation method of solution, based on the zero-product property, which again could feel like something quite new (Section 2.10.1). Moving in these directions may seem to some degree disconnected from what has gone before.

It is worth thinking about why we care about teaching learners to solve equations in a world in which nowadays a mobile phone app will not just perform numerical calculations but can capture a photograph of an equation and give, not just the solution, but all the steps. Has solving equations become a redundant skill?

I do not think so. There have always been quicker ways to solve an equation than the balancing method I have outlined above. We could sum up the entire thing by giving learners a formula:

The solution to

\[ax + b = cx + d,\ \ \text{when}\ a \neq c,\ \text{is}\]

\[x = \dfrac{d - b}{a - c}.\]

This would be analogous to the quadratic formula for writing down the solutions to a quadratic equation (see Section 2.10.2).
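For comparison, the formula is easy to turn into code. This sketch (the function name `solve_linear` is mine) uses Python’s `fractions.Fraction` so that answers such as \(\dfrac{74}{33}\) stay exact, and applies the formula to equations from this chapter. It produces answers without any of the understanding of balancing that matters more here.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d via x = (d - b) / (a - c)."""
    if a == c:
        # With equal x-coefficients there is either no solution or every x works.
        raise ValueError("a == c, so there is no unique solution")
    return Fraction(d - b) / (a - c)


examples = {
    "5x - 11 = 3x + 9": (5, -11, 3, 9),
    "3x - 5 = x + 13": (3, -5, 1, 13),
    "3x - 5 = x + 14": (3, -5, 1, 14),
    "73x - 75 = 7x + 73": (73, -75, 7, 73),
    # 3.2x - 5 1/3 = 6.1x + 1.35, entered as exact fractions
    "3.2x - 5 1/3 = 6.1x + 1.35": (Fraction("3.2"), Fraction(-16, 3), Fraction("6.1"), Fraction("1.35")),
}
for label, coefficients in examples.items():
    print(f"{label}  ->  x = {solve_linear(*coefficients)}")
```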

But I have never taught linear equations by providing a formula, because what I care about is not so much that learners can find an answer but that they understand the BIG Idea that performing the same operations to both sides of an equation leaves the equality intact. That seems to me such a powerful idea, with so many applications in mathematics, that it is fundamental to thinking algebraically, regardless of whether learners ultimately end up solving equations by hand very often.

If we want to prioritise the ideas behind solving equations, then an alternative to making the numbers harder, before teaching simultaneous or quadratic equations, might be to use backward chaining20 to gradually increase the level of complexity in the equations learners are asked to solve. Backward chaining involves teaching something new by chaining it on to the beginning of a sequence that is already familiar. The first thing you do is the new thing, and then you are on familiar ground with the rest of the task.

For example, in the case of non-arithmetic equations (with the unknown on both sides), such as \(5x - 11 = 3x + 9\), we have seen that the first thing for learners to do could be to perform an operation on both sides that results in the unknown appearing on only one side (converting a non-arithmetic equation to an arithmetic one), like \(2x - 11 = 9\). From that point on, the learner practises their familiar routine for solving arithmetic equations. When this new development is secure, you are ready to bolt another new thing onto the beginning.

Perhaps next you might offer equations like this:

\[\dfrac{5x - 11}{3} = \dfrac{3x + 9}{2}.\]

Here, the first step could be to clear the fractions by ‘cross multiplying’; i.e., by multiplying both sides of the equation by \(2 \times 3 = 6\):

\[2(5x - 11) = 3(3x + 9).\]

Then, after expanding to give

\[10x - 22 = 9x + 27,\]

we are back to the kind of equation we already know how to solve, so the rest should be routine.

We keep generating useful practice of what has gone before, while adding new aspects to maintain interest and develop further the range of what learners can handle.

Next, they could try something like

\[\dfrac{3}{5x - 11} = \dfrac{2}{3x + 9}\ \ \ \ \ \ \ \ \ \text{or}\ \ \ \ \ \ \ \ \ \ \dfrac{3x + 9}{5x - 11} = \dfrac{2}{3},\]

which turn out to be exactly the same equation in disguise, once we have multiplied up.
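A quick exact check confirms that all three versions share the solution \(x = 49\), which comes from \(10x - 22 = 9x + 27\). This is a minimal sketch of my own using Python’s `fractions.Fraction`; note that at \(x = 49\) none of the denominators (\(5x - 11 = 234\), \(3x + 9 = 156\)) is zero, which is what makes multiplying up legitimate here.

```python
from fractions import Fraction

# After multiplying up, every version becomes 10x - 22 = 9x + 27, i.e. x = (27 + 22) / (10 - 9).
x = Fraction(27 + 22, 10 - 9)

versions = {
    "(5x - 11)/3 = (3x + 9)/2": ((5 * x - 11) / 3, (3 * x + 9) / 2),
    "3/(5x - 11) = 2/(3x + 9)": (3 / (5 * x - 11), 2 / (3 * x + 9)),
    "(3x + 9)/(5x - 11) = 2/3": ((3 * x + 9) / (5 * x - 11), Fraction(2, 3)),
}
print(f"x = {x}")
for label, (left, right) in versions.items():
    print(f"{label}: LHS = {left}, RHS = {right}, equal: {left == right}")
```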

By gradually complexifying in this kind of way, learners will eventually be able to solve complicated equations involving algebraic fractions.

Another approach is to move from linear equations like

\[3x - 5 = x + 13\]

to non-linear equations like

\[3\sqrt{x} - 5 = \sqrt{x} + 13.\]

Solving a radical equation like this is identical up to the point at which learners obtain \(\sqrt{x} = 9\) (i.e. it is ‘linear in \(\sqrt{x}\)’). There then follows a final step, in which both sides must be squared, to give \(x = 81\).

In this approach, the new thing comes at the end, rather than at the beginning, so this is forward chaining, in contrast to backward chaining.

Deliberately using the square number \(9\) for \(\sqrt{x}\) here is intended to offer the possibility for learners to be productively confused, and perhaps mistakenly conclude that \(x = 3\) (or \(\pm 3\)).

If this happens, then this radical equation could be contrasted with the related quadratic equation

\[3x^{2} - 5 = x^{2} + 13,\]

which does lead to \(x^{2} = 9\), and thus \(x=\pm 3\).

With mistakes like this, a productive strategy can be to present the learner with the equation they have accidentally solved and ask them to solve that – their solution was the right solution to this different equation – before going back to the original one and thinking about what they therefore need to do differently.

It is good for learners to encounter a quadratic equation of this kind (with no linear term in \(x\)), and have the opportunity to think about the double solution of \(x = \pm 3\) separately from the other complexities of solving more general quadratic equations, which will come later (Section 2.10.2).

We can also foreshadow completing the square (Section 2.10.2) by getting learners to solve equations like

\[(x - 2)^{2} = 9\ \ \ \ \ \ \ \ \ \ \ \ \ \text{ and}\ \ \ \ \ \ \ \ \ \ \ \ \ \ (x - 2)^{2} + 1 = 10,\]

which are exactly what they will need to handle then.

With non-linear equations, the habit of always checking at the end by back-substituting the final answer is particularly important.

For example, a similar-looking equation such as

\[3\sqrt{x} + 5 = \sqrt{x} - 13,\]

with just the \(+\) and \(-\) signs swapped, may look innocent enough, but this leads to \(\sqrt{x} = - 9\), which is impossible, because the symbol \(\sqrt{x}\) always denotes the non-negative square root.

If the learner ignores this problem, however, and simply squares both sides, they will obtain \(x = 81\), the same as before. But this solution is extraneous, and does not satisfy the original equation:

\[\text{LHS} = 3\sqrt{81} + 5 = 32 \neq - 4 = \sqrt{81} - 13 = \text{RHS.}\]

This means that the equation \(3\sqrt{x} + 5 = \sqrt{x} - 13\) has no real solutions.

With non-linear equations, we must never just assume that the value of \(x\) that we arrive at in the final line of a solution will necessarily satisfy the original equation. This is a misconception which learners who have only encountered linear equations will almost certainly develop unless it is explicitly addressed.
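The sketch below (my own; the function names are hypothetical) treats \(p\sqrt{x} + q = r\sqrt{x} + s\) as linear in \(u = \sqrt{x}\), squares to get a candidate for \(x\), and then back-substitutes into the original equation. Python’s `math.sqrt` returns the non-negative root, matching the meaning of \(\sqrt{x}\), and `math.isclose` allows for floating-point rounding.

```python
import math
from fractions import Fraction


def candidate(p, q, r, s):
    """Solve p*sqrt(x) + q = r*sqrt(x) + s as linear in u = sqrt(x), then square.

    Returns (u, x) with x = u**2. Squaring loses the sign of u, so x is only a candidate.
    """
    u = Fraction(s - q, p - r)
    return u, u ** 2


def satisfies(p, q, r, s, x):
    """Back-substitute x into the original equation, using the non-negative square root."""
    root = math.sqrt(x)
    return math.isclose(p * root + q, r * root + s)


equations = {
    "3 sqrt(x) - 5 = sqrt(x) + 13": (3, -5, 1, 13),
    "3 sqrt(x) + 5 = sqrt(x) - 13": (3, 5, 1, -13),
}
for label, coefficients in equations.items():
    u, x = candidate(*coefficients)
    print(f"{label}: sqrt(x) = {u}, candidate x = {x}, "
          f"satisfies original: {satisfies(*coefficients, x)}")
```

Both equations produce the same candidate, \(x = 81\); only the back-substitution tells them apart.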

Other accessible variations on the equation

\[3x - 5 = x + 13\]

could include even wilder things like

\[3x^{3} - 5 = x^{3} + 11\]

or

\[3x^{1.2} - 120 = 2x^{1.2} + 8\]

or

\[2x^{4} - 5 = x^{4} + 11\]

or

\[\dfrac{3}{x} + 5 = \dfrac{1}{x} + \dfrac{17}{3}.\]

Learners could try to devise equations like these which can be solved using the ‘linear’ method they know, perhaps with the constraint of having integer (although not necessarily positive) solutions.

All these equations are essentially ‘linear’ – linear in \(\sqrt{x}\) or \(x^{2}\) or \(x^{3}\) or \(x^{1.2}\) or \(x^{4}\) or \(\dfrac{1}{x}\), as simple transformations, such as writing \(y = \sqrt{x}\), reveal. So, here we are of course not expecting learners to solve quadratics, cubics and quartics in their general forms. Instead, we can think of all these as being ‘linear with a twist at the end’. The principle in each case is to isolate the awkward term and then ‘undo’ it in the final step.
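The ‘linear with a twist’ idea can be sketched as two stages: solve for \(u\) (standing for \(x^{2}\), \(x^{3}\), \(x^{4}\) or \(\dfrac{1}{x}\)) exactly as for a linear equation, then undo the twist and back-substitute. This is my own illustration; the helper `whole_nth_roots` is hypothetical and deliberately only handles roots that are whole numbers, which is enough for the examples here. The \(\sqrt{x}\) and \(x^{1.2}\) cases work the same way, with the final step being squaring or raising to the power \(\dfrac{1}{1.2}\).

```python
from fractions import Fraction


def solve_for_u(p, q, r, s):
    """The 'linear' stage: solve p*u + q = r*u + s for u (needs p != r)."""
    return Fraction(s - q) / (p - r)


def whole_nth_roots(u, n):
    """Real n-th roots of u >= 0, provided they are whole numbers (enough for these examples)."""
    root = round(float(u) ** (1 / n))
    if root ** n != u:
        raise ValueError(f"{u} has no whole-number {n}th root")
    # Even powers have a positive and a negative real root; odd powers have one.
    return sorted({root, -root}) if n % 2 == 0 else [root]


# Each entry: (label, p, q, r, s, the twist u = t(x), how to undo it).
examples = [
    ("3x^2 - 5 = x^2 + 13", 3, -5, 1, 13, lambda x: x ** 2, lambda u: whole_nth_roots(u, 2)),
    ("3x^3 - 5 = x^3 + 11", 3, -5, 1, 11, lambda x: x ** 3, lambda u: whole_nth_roots(u, 3)),
    ("2x^4 - 5 = x^4 + 11", 2, -5, 1, 11, lambda x: x ** 4, lambda u: whole_nth_roots(u, 4)),
    ("3/x + 5 = 1/x + 17/3", 3, 5, 1, Fraction(17, 3), lambda x: 1 / Fraction(x), lambda u: [1 / u]),
]

for label, p, q, r, s, twist, undo in examples:
    u = solve_for_u(p, q, r, s)
    xs = undo(u)  # the 'twist at the end'
    # Back-substitute every candidate into the original equation.
    ok = all(p * twist(x) + q == r * twist(x) + s for x in xs)
    print(f"{label}: u = {u}, x = {', '.join(str(x) for x in xs)}, checks: {ok}")
```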

A more complicated ‘beyond linear’ avenue for older learners to explore is radical equations of the form

\[\sqrt{x} + \sqrt{x + a} = \sqrt{b}, \text{ where }b \geq 0.\]

Squaring both sides gives

\[x + 2\sqrt{x}\sqrt{x + a} + x + a = b.\]

Isolating the awkward term again,

\[2\sqrt{x}\sqrt{x + a} = b - a - 2x,\]

and, squaring for a second time,

\[4x(x + a) = (b - a)^{2} - 4x(b - a) + 4x^{2}.\]

Simplifying, we find that the \(x^{2}\) terms cancel out (Why is this? When does this happen?), to obtain the (linear in \(x\)) equation

\[0 = (b - a)^{2} - 4bx,\]

meaning that

\[x = \dfrac{(b - a)^{2}}{4b}.\]

This will be undefined if \(b = 0\), but for \(b > 0\) it is perhaps surprising that we obtain just one candidate solution. However, we of course need to check that it satisfies the original equation!

In this solution, we have carried out two non-reversible steps (the two squaring steps), so we may have introduced extraneous solutions that do not satisfy the original equation.

This is analogous to beginning with something like

\[x = - 3\]

and squaring both sides to obtain

\[x^{2} = 9,\]

and then solving this to obtain

\[x = \pm 3,\]

which consists of the original solution (\(- 3\)), plus a spurious one (\(+ 3\)), which is merely an artefact produced by squaring both sides.

Beginning with \(x = 3\), instead of \(x = - 3\), would have led us to an identical result of \(x^{2} = 9\), after squaring, and so we have no way of knowing which of these two values (\(3\) or \(- 3\)) we might have begun with.

This means we need to substitute \(\displaystyle x = \dfrac{(b - a)^{2}}{4b}\) back into our original equation to see under what conditions it will be a solution – we can’t assume that it always will be.

With \(b > 0\), we obtain from

\[\sqrt{x} + \sqrt{x + a} = \sqrt{b}\]

the left-hand side

\[\sqrt{\dfrac{(b - a)^{2}}{4b}} + \sqrt{\dfrac{(b - a)^{2}}{4b} + a},\]

and we want to know if this will be equal to \(\sqrt{b}\).

Simplifying, we obtain

\[\dfrac{|b - a|}{2\sqrt{b}} + \sqrt{\dfrac{(b - a)^{2} + 4ab}{4b}}\]

\[= \dfrac{|b - a|}{2\sqrt{b}} + \sqrt{\dfrac{(a + b)^{2}}{4b}}\]

\[= \dfrac{|b - a|}{2\sqrt{b}} + \dfrac{|b + a|}{2\sqrt{b}}.\]

This will be equal to \(\sqrt{b}\) if and only if

\[|b - a| + |b + a| = 2b.\]

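Since \(b > 0\), this will happen if and only if \(b \geq |a|\). When \(b \geq |a|\), both \(b - a\) and \(b + a\) are non-negative, so the left-hand side is exactly \((b - a) + (b + a) = 2b\). When \(b < |a|\), the left-hand side works out to \(2|a|\), which is greater than \(2b\).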

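For example, the equation \(\sqrt{x} + \sqrt{x - 7} = 7\), where \(a = - 7\) and \(b = 49\), has the solution \(x = 16\), since \(\sqrt{16} + \sqrt{9} = 4 + 3 = 7\). For the equation \(\sqrt{x} + \sqrt{x - 7} = 1\), where \(a = - 7\) and \(b = 1\), the formula again gives \(x = 16\), but this value is extraneous, because \(\sqrt{16} + \sqrt{9} = 7 \neq 1\), so this equation has no solutions.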

Working with the general solution in terms of \(a\) and \(b\) is demanding, but using specific numbers gives equations like \(\sqrt{x} + \sqrt{x - 7} = 7\) that can be valuable for learners to experiment with.
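The substitution check described above can also be automated, which may help when generating examples for learners. This Python sketch (the function names `candidate` and `is_genuine` are hypothetical, chosen for illustration) computes the value \(x = \dfrac{(b - a)^{2}}{4b}\) produced by squaring twice, substitutes it back into \(\sqrt{x} + \sqrt{x + a} = \sqrt{b}\), and compares the outcome with the condition \(b \geq |a|\). It uses floating-point arithmetic, so equality is tested with `math.isclose` rather than `==`.

```python
from math import isclose, sqrt

def candidate(a, b):
    """Value of x produced by squaring twice: (b - a)^2 / (4b), assuming b > 0."""
    return (b - a) ** 2 / (4 * b)

def is_genuine(a, b):
    """Substitute the candidate back into sqrt(x) + sqrt(x + a) = sqrt(b)."""
    x = candidate(a, b)
    if x < 0 or x + a < 0:
        return False  # a square root of a negative number: not a real solution
    # Floating-point values, so compare with a tolerance rather than ==
    return isclose(sqrt(x) + sqrt(x + a), sqrt(b))

# The two worked examples: a = -7 with b = 49 and with b = 1
for a, b in [(-7, 49), (-7, 1)]:
    print(a, b, candidate(a, b), is_genuine(a, b), b >= abs(a))
# -7 49 16.0 True True
# -7 1 16.0 False False
```

Looping over a grid of integer values (for instance \(a\) from \(- 10\) to \(10\) and \(b\) from \(1\) to \(20\)) is one way to test the claim that the candidate is a genuine solution exactly when \(b \geq |a|\).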

2.9 Expanding and factorising

Whenever you can do a process, a very mathematical question to ask is, “Can I go back?”

Inverse processes are usually harder: subtraction is harder than addition; division is harder than multiplication; taking roots is much harder than finding powers.

In this section, we’ll consider the opposite pair of ‘expanding brackets’ and ‘factorising’, and their parallels with the factors and multiples of the positive integers.

Expanding brackets is the forward direction, and factorising is the (harder) inverse process. I think it is helpful to point this out to learners, because the symmetry of ‘doing and undoing’ can make learners think that each direction should be equally easy, and if they find factorising hard then they may feel there must be something wrong with them. Factorising is hard, and finding it more difficult is just normal. The best approach I know to seeing what is going on with expanding and factorising is to use the excellent free software Grid Algebra.21

2.9.1 Expanding

It is natural to begin expanding by using a positive integer multiplier, such as \(5\), and thinking about what \(5(x + 2)\) will make when expanded out.

This is \(5\) lots of \(x + 2\), which we could sum as a vertical list:

\[ \begin{array}{c@{\hspace{1cm}} r c l} & x & + & 2 \\ & x & + & 2 \\ & x & + & 2 \\ & x & + & 2 \\ + & x & + & 2 \\ \hline & 5x & + & 10 \\ \hline \end{array} \]

However many lots of \(x + 2\) we have, even if it is, say, \(y\) lots, and we might not know the value of \(y\), we will obtain that many lots of both \(x\) and of \(2\), so \(y(x + 2) = xy + 2y\), and it is easy to extend this to cases where the ‘\(2\)’ might be negative. It seems reasonable to assume for now that the same thing will apply when \(y\) is not a positive integer.

A useful task for expanding and simplifying brackets is the following:22

TASK 2.9

Use additions and subtractions of multiples of the brackets below to make an expression that simplifies to \(5x + 8y\).
You can use as many or as few of these brackets as you like:
\[(x + y)\ \ \ \ \ \ \ (x + 2y)\ \ \ \ \ \ \ (x - 2y)\ \ \ \ \ \ \ (x + 4y)\ \ \ \ \ \ \ \text{and}\ \ \ \ \ \ \ (2x + 3y)\]
For example, you could choose the brackets \((x + 2y)\) and \((x + 4y)\):
\[\square(x + 2y) \pm \square(x + 4y)\]
What numbers would need to go in the boxes to make the entire expression equal to \(5x + 8y\)?

Learners will initially struggle to find the numbers that they need (e.g. \(6\) and \(- 1\), for the example given), but the task generates lots of practice at expanding and simplifying.
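For a teacher preparing Task 2.9, it may be handy to know in advance which pairs of brackets can be combined to make \(5x + 8y\), and with what multipliers. The Python sketch below is one way to check this. It stores each bracket \(px + qy\) as the pair \((p, q)\) and solves the two simultaneous equations for the multipliers using Cramer's rule, with exact fractions. The bracket labels and function names are illustrative choices, not part of the task.

```python
from fractions import Fraction
from itertools import combinations

# Each bracket (p x + q y) is stored as its coefficient pair (p, q)
BRACKETS = {
    "(x + y)": (1, 1),
    "(x + 2y)": (1, 2),
    "(x - 2y)": (1, -2),
    "(x + 4y)": (1, 4),
    "(2x + 3y)": (2, 3),
}

def multipliers(b1, b2, target):
    """Solve m*b1 + n*b2 == target for (m, n) by Cramer's rule; None if no unique answer."""
    (p1, q1), (p2, q2), (tx, ty) = b1, b2, target
    det = p1 * q2 - p2 * q1
    if det == 0:
        return None  # the two brackets are multiples of each other
    m = Fraction(tx * q2 - p2 * ty, det)
    n = Fraction(p1 * ty - q1 * tx, det)
    return m, n

def all_pairs(target=(5, 8)):
    """Multipliers for every pair of brackets, keyed by the pair of labels."""
    results = {}
    for (name1, b1), (name2, b2) in combinations(BRACKETS.items(), 2):
        mn = multipliers(b1, b2, target)
        if mn is not None:
            results[(name1, name2)] = mn
    return results

for (name1, name2), (m, n) in all_pairs().items():
    print(f"{m} {name1} + {n} {name2}")
```

For the example pair in the task, \((x + 2y)\) and \((x + 4y)\), the multipliers come out as \(6\) and \(- 1\), matching the text. Some pairs, such as \((x + 2y)\) and \((x - 2y)\), need fractional multipliers (\(\tfrac{9}{2}\) and \(\tfrac{1}{2}\)).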

Various mnemonics are in common use for expanding a pair of binomial brackets (i.e. where there are two terms in each bracket). Some of these involve acronyms (e.g. FOIL – first, outer, inner, last) and others are purely visual (e.g. Figure 2.9 and Figure 2.10).

Figure 2.9: Smiley face mnemonic for expanding a product of two binomials.
Figure 2.10: Grid mnemonic for expanding a product of two binomials.

One advantage of the grid approach of Figure 2.10 is that it is readily extended to cases where there are more than two terms in one or more of the brackets, although extending to more than two brackets involves having more than two dimensions. The grid approach also allows links to be made with calculating the area of rectangles.

A good way to judge whether learners have a more than merely procedural understanding of what they are doing is to ask a question like, “Why is there no \(ab\) term in the expansion of \((a + b)(c + d)\)?” We have four letters in total, so if we are pairing them off, shouldn’t we get \({}^{4}C_{2} = 6\) terms, rather than four? Why not \(ab + ac + ad + bc + bd + cd\)?

The key thing for learners to appreciate is that terms in the same bracket, which are added together, have no reason to be multiplied together. Each term in the first bracket must be multiplied by each term in the second bracket, but that is all – there is no multiplying of terms in the same bracket.

So, the product will consist of \(2 \times 2 = 4\) terms. Once learners understand this, they may be able to dispense with the mnemonics, and just be systematic about going through each term in the first bracket and multiplying it by each term in the second bracket. It is just like finding all possible combinations of T-shirts and shorts (Chapter 1). That way, learners will be able to cope with brackets containing more than two terms and even, ultimately, with more than two brackets.
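The ‘each term with each term’ rule can be made concrete with a short Python sketch, in the same spirit as listing combinations of T-shirts and shorts. Here single letters stand in for the terms of each bracket (an illustrative simplification that ignores coefficients and signs), and `itertools.product` picks exactly one term from each bracket, so two terms from the same bracket are never multiplied together.

```python
from itertools import product

def pair_terms(*brackets):
    """List every product made by choosing exactly one term from each bracket.

    Terms inside the same bracket are never multiplied together, so brackets
    with m and n terms give m * n products, and three brackets with m, n and
    k terms give m * n * k products.
    """
    return ["".join(choice) for choice in product(*brackets)]

print(pair_terms(["a", "b"], ["c", "d"]))                    # ['ac', 'ad', 'bc', 'bd']
print(len(pair_terms(["a", "b", "c"], ["d", "e"])))          # 6
print(len(pair_terms(["a", "b"], ["c", "d"], ["e", "f"])))   # 8
```

There is no `ab` in the first list because `a` and `b` come from the same bracket.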

It can sometimes be helpful at the start to expand a pair of brackets one bracket at a time.

For example, \[ \begin{alignedat}{5} &(x + 5)(x - 3) &&= &&x(x - 3) &&+ 5 &&(x - 3) \end{alignedat} \] is equivalent to \[ \begin{alignedat}{5} &\rlap{(x + 5)y}\phantom{(x + 5)(x - 3)} &&= &&\rlap{xy}\phantom{x(x - 3)} &&+ 5 &&\rlap{y,}\phantom{(x - 3)} \end{alignedat} \] where \(y = x - 3\).

From here, we can continue by writing

\[= x^{2} - 3x + 5x - 15 = x^{2} + 2x - 15.\]

We can also do this in the opposite order, expanding the \(x - 3\) factor first:

\[ \begin{alignedat}{5} &(x + 5)(x - 3) &&= &&(x + 5)x &&+ &&(x + 5)( - 3), \end{alignedat} \] which is equivalent to \[ \begin{alignedat}{5} &\phantom{(x + 5)}\llap{y}(x - 3) &&= &&\phantom{(x + 5)}\llap{y}x &&+ &&\phantom{(x + 5)}\llap{y}( - 3), \end{alignedat} \] where, this time, \(y = x + 5\).

We of course complete the expansion in the same way.

Expanding pairs of binomials is analogous to multiplying pairs of two-digit numbers.

To multiply, say, \(23\) and \(59\), we have to work out four products and add them together:

\[ \begin{alignedat}{4} (20 + 3)(50 + 9) &= 20 \times 50 &&{}+ 20 \times 9 &&{}+ 3 \times 50 &&{}+ 3 \times 9 \\ &= 1000 &&{}+ 180 &&{}+ 150 &&{}+ 27 \\ &= 1357. \end{alignedat} \]

We will pick up on this when looking at the standard calculation algorithms later in this chapter (Section 2.10.7).23
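The four partial products in the \(23 \times 59\) calculation above arise in the same way as the four terms of a binomial expansion. This Python sketch (the function name is illustrative) splits each two-digit number into tens and units and multiplies every part of one number by every part of the other:

```python
def grid_multiply(m, n):
    """Multiply two two-digit numbers via (tens + units)(tens + units).

    Returns the four partial products and their sum, mirroring the
    expansion of (20 + 3)(50 + 9).
    """
    m_parts = [m - m % 10, m % 10]  # e.g. 23 -> [20, 3]
    n_parts = [n - n % 10, n % 10]  # e.g. 59 -> [50, 9]
    partials = [a * b for a in m_parts for b in n_parts]
    return partials, sum(partials)

print(grid_multiply(23, 59))  # ([1000, 180, 150, 27], 1357)
```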

2.9.2 Factorising

Learners will need lots of experience with putting brackets in and taking them out, converting backwards and forwards between expanded and factorised forms. As is always the case with inverse processes, the key to being good at dividing, for example, is being really good at multiplying. In the same way, the key to being good at factorising is being really familiar with expanding, so that we can spot potential factors that could have led to a given expansion.

In this context, unlike when working with numbers, a ‘factor’ does not necessarily have to be a positive integer.

For example, if we are very used to the idea that something like \(x(x + 5)\) will expand into \(x^{2} + 5x\), then it is easy to spot the ‘common factor’ of \(x\) and factorise \(x^{2} + 5x\) back into \(x(x + 5)\). This is true regardless of whether \(x\) happens to represent a positive integer or not.

As with all inverse problems, the way to see whether any factorisation is correct or not is always just to expand it and check. The teacher should never need to say, “Yes, that’s right” – if the learner cannot tell for themselves, then they do not know what they are doing.

It is particularly helpful when learners make errors to ask them to expand and see why their proposed factorisation was wrong, rather than just moving them on to what is correct. For example, if a learner writes \(x^{2} + 5x = x(x + 5x)\), instead of the correct factorisation \(x(x + 5)\), the best way for them to see not only that they are wrong, but exactly why they are wrong, is to expand \(x(x + 5x)\) and get \(x^{2} + 5x^{2} = 6x^{2}\), which reveals the consequence of the unnecessary \(x\) in \(5x\). If they cannot correctly expand \(x(x + 5x)\), then expanding is what they need to work on, rather than factorising.

Once learners can confidently expand products such as \((x + 5)(x - 3)\) into \(x^{2} + 2x - 15\), they can be encouraged to think for themselves how they would go back from expanded to factorised form.

In this example, where did the \(2x\) come from? Why did it end up being \(2\) lots of \(x\), rather than some other multiple? Where did the \(- 15\) come from?

If learners think hard about this, they will realise that the coefficient of \(x\) will always be the sum of the two numbers (\(5\) and \(- 3\)) and the constant term will always be the product of the two numbers.

The \(2x\) term is the sum of the two linear terms and the constant term \(- 15\) is the product of the two constants, one in each bracket.

This doesn’t need to be memorised as a rule (which could easily be muddled up and accidentally reversed). It is well worth helping learners to see that if they ever were to forget how to factorise, all they would need to do would be to invent a pair of brackets for themselves, with nice simple terms in them, expand the brackets, and look at what happened. This is a powerful approach in any kind of ‘inverse problem’ situation – “just try something forwards and look carefully at it”. It can relieve a lot of stress if the teacher repeatedly offers strategies like this for ‘in case you forget’.

Learners can generate lots of useful practice by inventing a pair of brackets, expanding them, and then swapping with a partner to see if they can factorise back to the original brackets. The fact that mistakes will be made, both in expanding and in factorising, leads to lots of useful discussion, checking, and becoming aware of common errors.

There are many approaches to factorising non-monic quadratics, which are quadratics in which the coefficient of the squared term is not \(1\).

It can be useful to compare our original monic quadratic, \(x^{2} + x - 6\), with what changes if we stick a ‘\(2\)’ on the front, to obtain \(2x^{2} + x - 6\).24

Learners may realise that to obtain a \(2x^{2}\) term they will need one of their brackets to contain \(x\) and the other \(2x\):

\[2x^{2} + x - 6 = (2x\ldots)(x\ldots).\]

The product of the constant terms must be \(- 6\), but there are twice as many possibilities for us to try this time, since we not only need to figure out which factor pair of \(- 6\) we need to use, but also which of each pair of factors to put in the ‘\(2x\)’ bracket and which in the ‘\(x\)’ bracket.

It is perfectly feasible in this case to go through each possibility to see which one works, but as the numbers become larger (and have more factors) this can become very tedious.

We’ll do it this time, just to see.

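Here, the factor pairs of \(- 6\) are \((1, - 6),\ (2,\ - 3),\ ( - 1,\ 6)\) and \(( - 2,\ 3)\), so the eight possible pairs of brackets are these:

\[ \begin{array}{cc} (2x + 1)(x - 6) & (2x - 6)(x + 1) \\ (2x + 2)(x - 3) & (2x - 3)(x + 2) \\ (2x - 1)(x + 6) & (2x + 6)(x - 1) \\ (2x - 2)(x + 3) & (2x + 3)(x - 2) \end{array} \]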

All of these are guaranteed to give the correct term in \(x^{2}\), as well as the correct constant, but they will give different terms in \(x\). By expanding each pair of brackets, which may be useful variation-theory-style practice for the learners,25 they will discover that the terms in \(x\) come out as:

\[ \begin{array}{cc} - 11x & - 4x \\ - 4x & \bbox[lightgray]{x} \\ 11x & 4x \\ 4x & - x \end{array} \]

The shaded cell, where the linear term is just \(x\), indicates that \((2x - 3)(x + 2)\) is the correct factorisation of \(2x^{2} + x - 6\).
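The exhaustive search just described can be sketched in Python. This illustrative version (the function names are hypothetical) splits \(2x^{2}\) as \((2x)(x)\) and tries every ordered factor pair \((p, q)\) of the constant in \((2x + p)(x + q)\). That covers every case when the coefficient of \(x^{2}\) is \(2\); for a coefficient such as \(12\), splits like \((3x)(4x)\) and \((6x)(2x)\) would also have to be tried, which is exactly why the search becomes tedious.

```python
def factor_pairs(n):
    """All ordered integer pairs (p, q) with p * q == n, for n != 0."""
    return [(p, n // p) for p in range(-abs(n), abs(n) + 1) if p != 0 and n % p == 0]

def linear_terms(c, a=2):
    """For every bracket pair (a x + p)(x + q) with p * q == c, give the coefficient of x.

    Expanding (a x + p)(x + q) gives a x^2 + (a q + p) x + p q, so every
    candidate already has the right x^2 term and constant; only a q + p varies.
    """
    return [(p, q, a * q + p) for p, q in factor_pairs(c)]

def show(k):
    """Format a constant term with its sign, e.g. 3 -> '+ 3', -6 -> '- 6'."""
    return f"+ {k}" if k >= 0 else f"- {-k}"

for p, q, coeff in linear_terms(-6):
    marker = "  <- matches 2x^2 + x - 6" if coeff == 1 else ""
    print(f"(2x {show(p)})(x {show(q)}) has x-coefficient {coeff}{marker}")
```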

If learners have worked through all of this, this should have clarified the issues, but also generated a need for having a better method than trying all these possibilities every time. If learners do not see the need for a better method, they might like to imagine using this approach to factorise, say, \(12x^{2} + 11x - 5\).

My preferred method for non-monics is to reduce them to monics, which we already know how to factorise.

To do this with \(2x^{2} + x - 6\), we need the \(2x^{2}\) to be \((2x)^{2}\), so we have to multiply through by \(2\), and so we must divide through by \(2\) as well, to cancel that out:

\[2x^{2} + x - 6 = \dfrac{(2x)^{2} + (2x) - 12}{2}.\]

This is quite a tricky step, but it is a powerful ‘change of variable’ approach that I think is useful for learners to encounter.

I find that learners are very tempted to simplify it back to \(2x^{2} + x - 6\), but that is going backwards. We have written it this way for a reason, which is that the numerator is now a monic quadratic ‘in \(2x\)’. This is easier to see if you write, say, \(y = 2x\), which gives

\[\dfrac{y^{2} + y - 12}{2}.\]

But it is possible to work it through without changing the variable, just by treating the \(2x\) as a ‘unit’.

We need a pair of numbers that sum to \(1\) (the coefficient of \(y\)) and have a product of \(- 12\), so the numbers must be \(4\) and \(- 3\).

So, factorising, we get

\[\dfrac{(2x)^{2} + (2x) - 12}{2} = \dfrac{((2x) + 4)((2x) - 3)}{2}.\]

The number in the denominator will always be whatever the coefficient of \(x^{2}\) was, and it must always end up being a factor of the numerator, because we began by multiplying through by it.

So, we just have to cancel it against the numerator’s factors; in this case it cancels entirely from the first one:

\[\dfrac{(2x + 4)(2x - 3)}{2} = (x + 2)(2x - 3).\]

This is the same factorisation we obtained above by trying every possibility systematically.

Just to illustrate how this works for the trickier non-monic quadratic I mentioned above, we begin by writing:

\[12x^{2} + 11x - 5 = \dfrac{(12x)^{2} + 11(12x) - 60}{12}.\]

There is never any need to multiply out the coefficient of \(x\) in the numerator, so we just leave it as \(11(12x)\), because we are thinking in terms of \(12x\) as our unit now. All we have to do now is factorise a monic quadratic, by finding a pair of numbers with a product of \(- 60\) and a sum of \(11\). It must be \(15\) and \(- 4\).

So,

\[\dfrac{(12x)^{2} + 11(12x) - 60}{12} = \dfrac{(12x + 15)(12x - 4)}{12}.\]

This time, we finish by cancelling the \(12\) in the denominator with \(3\) from the first bracket and \(4\) from the second bracket, to obtain \((4x + 5)(3x - 1)\).

As always, it is well worth expanding this (perhaps mentally), just to ensure it matches the expression we started with. Establishing a checking culture among learners builds a solid habit, and also incidentally generates lots of useful additional practice. I also think it often clarifies the purpose of what we just did. The task here was to find a pair of brackets that was equal to the given expression; by expanding those brackets, we remind ourselves what we were trying to do all along.
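The ‘reduce to a monic’ method, together with the final expansion check, might be sketched in Python as follows. The search for the pair of numbers simply tries divisors of \(ac\), mirroring the learner’s search for a pair with the right sum and product. The function names and the representation of a bracket \(px + q\) as the pair \((p, q)\) are illustrative assumptions. The cancellation step takes as much of \(a\) as it can from the first bracket and the rest from the second, as in the \(12x^{2} + 11x - 5\) example; for integer coefficients the rest always divides the second bracket (a consequence of Gauss’s lemma on common factors of polynomial products).

```python
from math import gcd

def factorise_non_monic(a, b, c):
    """Factorise a x^2 + b x + c over the integers by treating (a x) as the unit.

    a x^2 + b x + c = ((a x)^2 + b (a x) + a c) / a, so we need m, n with
    m + n == b and m * n == a * c; the numerator is then (a x + m)(a x + n),
    and the denominator a is cancelled against the two brackets.
    Returns ((p1, q1), (p2, q2)), meaning (p1 x + q1)(p2 x + q2), or None.
    Assumes integer coefficients with a != 0 and c != 0.
    """
    target = a * c
    for m in range(-abs(target), abs(target) + 1):
        if m != 0 and target % m == 0 and m + target // m == b:
            n = target // m
            g1 = gcd(a, m)  # cancel as much of a as possible from (a x + m)
            g2 = a // g1    # the rest of a divides (a x + n)
            return (a // g1, m // g1), (a // g2, n // g2)
    return None

def expand(f1, f2):
    """Expand (p1 x + q1)(p2 x + q2) into the coefficients (x^2, x, constant)."""
    (p1, q1), (p2, q2) = f1, f2
    return p1 * p2, p1 * q2 + q1 * p2, q1 * q2

for a, b, c in [(2, 1, -6), (12, 11, -5)]:
    result = factorise_non_monic(a, b, c)
    print(result, expand(*result) == (a, b, c))
# ((2, -3), (1, 2)) True    i.e. (2x - 3)(x + 2)
# ((3, -1), (4, 5)) True    i.e. (3x - 1)(4x + 5)
```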

2.10 What does thinking algebraically get us?

To really see what is going on, sometimes we have to step back from the nitty gritty and take in the big picture. The writer Jorge Luis Borges wrote that “To think is to forget details, generalize, make abstractions”. Similarly with mathematics, to really understand number, we sometimes have to get away from the specificity of particular numbers and try to see more generally how they behave. That is what school algebra is all about.

The ability to manipulate symbols is certainly part of this, but the objective is to capture the structure of number and appreciate how numbers behave. We want learners to understand that letters always stand in for numbers, and to have the confidence to switch letters for numbers to test and check statements. We want learners to appreciate the meaning of the equals sign and the importance of maintaining an equality by doing the same operations to both sides. We want them to have developed some strategic skill in choosing operations that do this, but that also move them closer to achieving their goal. And we want learners to be comfortable converting between expanded and factorised forms of the same expression.

We will now consider some of the payoffs of these abilities to think algebraically. As with thinking multiplicatively, in Chapter 1, if the groundwork has been well set up, with solid foundations, applying the BIG Idea to these different areas, while not necessarily easy, should be satisfying and rewarding, rather than stressful and mystifying. What follows here should appear to learners as examples of the algebraic thinking they have already learned, and which provide practice and purpose.

2.10.1 Solving quadratics by factorisation

The big benefit of writing expressions in factorised form is that we can much more easily see their structure and properties.

The properties of a number such as \(650\) are much easier to discern when it is written as \(2 \times 5^{2} \times 13\). We can see at a glance that it is not a square number (the prime factors \(2\) and \(13\) each appear only once), that it is even, and that it is a multiple of \(10\), \(25\), \(26\) and \(13\), for example. Writing a number in factorised form exposes its structure, as we will see more of later in this chapter (Section 2.10.9.3).
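Finding a factorised form like \(2 \times 5^{2} \times 13\) can be sketched with simple trial division. In this illustrative Python version (the function names are hypothetical), the factorisation is returned as a dictionary from each prime to its exponent, and a number is recognised as square when every exponent is even:

```python
def prime_factors(n):
    """Prime factorisation of an integer n > 1 as {prime: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever is left over is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_square(n):
    """A whole number is square exactly when every prime appears an even number of times."""
    return all(e % 2 == 0 for e in prime_factors(n).values())

f = prime_factors(650)
print(f)                       # {2: 1, 5: 2, 13: 1}
print(is_square(650), 2 in f)  # False True  (not square, but even)
```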

Similarly, an algebraic expression written unfactorised, as \(x^{5} - 5x^{3} + 4x\), conceals the structure, whereas in its factorised form, as \((x - 2)(x - 1)x(x + 1)(x + 2)\), the structure is exposed.

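For example, in its unfactorised, expanded form, we can discover by substitution that it will be equal to zero for five different values of \(x\):

\[ \begin{aligned} ( - 2)^{5} - 5( - 2)^{3} + 4( - 2) &= - 32 + 40 - 8 = 0 \\ ( - 1)^{5} - 5( - 1)^{3} + 4( - 1) &= - 1 + 5 - 4 = 0 \\ 0^{5} - 5 \times 0^{3} + 4 \times 0 &= 0 - 0 + 0 = 0 \\ 1^{5} - 5 \times 1^{3} + 4 \times 1 &= 1 - 5 + 4 = 0 \\ 2^{5} - 5 \times 2^{3} + 4 \times 2 &= 32 - 40 + 8 = 0 \end{aligned} \]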

Substituting into \(x^{5} - 5x^{3} + 4x\) in this way, it seems almost miraculous that all five consecutive integer values of \(x\) give the same zero result!

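By contrast, when substituting into the factorised form, it seems obvious:

\[ \begin{aligned} x = - 2: &\quad ( - 4)( - 3)( - 2)( - 1)\bbox[lightgray]{(0)} = 0 \\ x = - 1: &\quad ( - 3)( - 2)( - 1)\bbox[lightgray]{(0)}(1) = 0 \\ x = 0: &\quad ( - 2)( - 1)\bbox[lightgray]{(0)}(1)(2) = 0 \\ x = 1: &\quad ( - 1)\bbox[lightgray]{(0)}(1)(2)(3) = 0 \\ x = 2: &\quad \bbox[lightgray]{(0)}(1)(2)(3)(4) = 0 \end{aligned} \]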

This time, in each substitution there is one bracket, shown in grey above, that is clearly equal to zero. So, now there is no surprise at all that the product comes out to be zero every time.

We can verify by expanding \((x - 2)(x - 1)x(x + 1)(x + 2)\) that it can be written using less ink (but not necessarily ‘more simply’) as \(x^{5} - 5x^{3} + 4x\). But how on earth could we get from the expanded form back to the factorised form? I mentioned earlier (Section 2.9.2) that factorising is harder than expanding, and this seems a particularly extreme example of that.

Actually, the steps are not too difficult.

First we factor out the common factor \(x\):

\[x^{5} - 5x^{3} + 4x = x(x^{4} - 5x^{2} + 4).\]

Now, the quartic in the bracket is really ‘a quadratic in disguise’, because it is ‘quadratic in \(x^{2}\)’. There are no odd-powered terms to complicate things. This might be plainer to see by writing \(y = x^{2}\), so the bracket becomes \(y^{2} - 5y + 4\), which is clearly quadratic in \(y\). By putting brackets around \(x^{2}\), we might be able to see the \(x^{2}\) as a ‘unit’, and avoid introducing a new variable \(y\), and just write:

\[ \begin{aligned} x(x^{4} - 5x^{2} + 4) &= x\bigl( (x^{2})^{2} - 5(x^{2}) + 4 \bigr) \\ &= x\bigl( (x^{2}) - 4 \bigr)\bigl( (x^{2}) - 1 \bigr). \end{aligned} \]

Now, by using the difference of two squares twice on this, we obtain \(x(x - 2)(x + 2)(x - 1)(x + 1)\).
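The claim that \((x - 2)(x - 1)x(x + 1)(x + 2)\) expands to \(x^{5} - 5x^{3} + 4x\) can also be checked mechanically. In this Python sketch (an illustration, not a method from the text), a polynomial is stored as a list of coefficients, lowest power first, so \(x + 2\) is `[2, 1]`. The function `expand` multiplies every term of one polynomial by every term of the other, and `functools.reduce` applies it across all five factors. Evaluating the result at a few integers then shows the five zeroes.

```python
from functools import reduce

def expand(p, q):
    """Multiply polynomials stored as coefficient lists, lowest power first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b  # x^i times x^j contributes to x^(i + j)
    return result

def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest power first) at x using Horner's rule."""
    total = 0
    for c in reversed(coeffs):
        total = total * x + c
    return total

# The factors (x - 2), (x - 1), x, (x + 1), (x + 2), each as [constant, coefficient of x]
factors = [[-2, 1], [-1, 1], [0, 1], [1, 1], [2, 1]]
expanded = reduce(expand, factors)
print(expanded)                                   # [0, 4, 0, -5, 0, 1], i.e. x^5 - 5x^3 + 4x
print([evaluate(expanded, x) for x in range(-3, 4)])  # [-120, 0, 0, 0, 0, 0, 120]
```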

The important point is that zeroes are extremely powerful when they come in products, because they ‘kill’ the entire product.

For example, consider this task:

TASK 2.10

Simplify
\[(p - a)(p - b)(p - c)\ldots(p - y)(p - z) .\]

Someone who begins by expanding and simplifying \((p - a)(p - b)\), and then multiplying the result by \((p - c)\), and so on, is in for an enormous amount of work!

Alternatively, by stepping back and thinking about the entire expression, we see that something interesting is going to happen in the \(16\)th bracket. Buried within this string of brackets is one special bracket, which will be \(p - p\). And that is the only factor that matters, because it is equal to zero, and therefore, regardless of the values of the other factors, the entire product will be zero.

This ‘kill feature’ of zeroes in products is at the heart of the zero product property that underlies the method of solution of quadratic equations by factorisation.

I usually introduce this by asking if learners can think of two numbers which multiply to make zero. Often they begin by suggesting pairs like \(3\) and \(- 3\), which sum to zero, but have a non-zero product; in this case, \(- 9\). Eventually, someone will suggest a number and zero, such as \(3\) and \(0\), and I find that learners often laugh at this possibility, as though it is some kind of ‘cheat’ answer, and does not count as a proper answer. If you continue asking for more possibilities, you will get other numbers paired with zero, and perhaps \(0\) and \(0\).

Then I would ask, “Can you give me a pair of numbers that multiply to make zero, neither of which is zero?” After some thought, learners will conclude that they can’t, and this is the essence of the zero-product property:

If \(ab = 0\), then either \(a = 0\) or \(b = 0\).

It is important that learners don’t think that both \(a\) and \(b\) have to be zero at the same time. That is ‘overkill’.

We saw with the \(26\) factors in \((p - a)(p - b)(p - c)\ldots(p - y)(p - z)\) that just one factor being zero is sufficient to destroy the entire product. If \(10\) numbers multiplied together make zero, we can’t conclude that all of them must be zero, just that at least one of them must be. ‘Either … or …’ in mathematics always includes ‘…or both’, so if more than one factor in a product is zero, then the zero result is over-determined (true for more than one reason), but more than one zero factor isn’t necessary.

When faced with a quadratic equation, such as \(x^{2} + x = 6\), I find that learners are often reluctant to ‘put everything onto the left-hand side’, to obtain \(x^{2} + x - 6 = 0\). It can feel like a very odd, unbalanced move to shift everything all onto just one side of an equation. A learner might say, “So it’s all equal to nothing? Why are we bothering with it, if it’s just zero? Why don’t we just forget about it?”

I think it can be helpful to begin by seeing that there are other ways to solve quadratics by factorisation that don’t involve using the zero-product property.

For example, with \(x^{2} + x = 6\), we could leave the \(6\) on the right-hand side and just factorise the left-hand side, by taking out an \(x\). This is easy factorisation, because it is of a binomial, not a trinomial:

\[x(x + 1) = 6.\]

How do we ‘read’ an equation like this? Perhaps as follows: “A number, multiplied by a number one higher, is equal to \(6\)”. Alternatively, if we assume that the numbers are integers, we could say, “The product of two consecutive numbers is \(6\)”.

Once learners make sense of this language, they will quickly think of “\(2\) and \(3\)”. So, is the solution ‘\(x = 2\) or \(x = 3\)’? No, because only the \(2\) was equal to \(x\) here; the \(3\) was equal to \(x + 1\). So, we only have one solution from this: \(x = 2\).

However, with a bit more thought, we realise that if we are looking for a pair of consecutive numbers with a product of \(6\), and there is no reason to disallow negative numbers, then \(x\) could just as well be \(- 3\), because one more than that is \(- 2\), and \(( - 3) \times ( - 2)\) is also equal to \(6\).

This always happens in situations like this, because if \(2\) and \(3\) are \(1\) apart, then \(- 3\) and \(- 2\) will also necessarily be \(1\) apart (Figure 2.11).

Figure 2.11: Pairs of consecutive integers that have a product of \(6\).

With experience, learners who know their tables will be able to look at an equation like \(x(x + 1) = 56\) and say, almost immediately, that \(x = 7\) or \(x = - 8\), because they are thinking of the two pairs of numbers \((7,\ \ 8)\) and \(( - 8,\ - 7)\).

If learners have not previously encountered quadratic equations, it may be a shock to them that there could be more than one value of \(x\) which solves a single equation. This is the kind of thing where the ‘curse of knowledge’ of the teacher may prevent them from seeing how strange this could appear.26 The learner may think of mathematics as always having ‘one right answer’, but in many cases a mathematics answer will have multiple components. For example, ‘What are the factors of \(12\)?’ has multiple parts to the answer, just as “Give me a factor of \(12\)” has multiple possible answers.

The notion of a solution set is critical to thinking algebraically. The solution set contains all the values of \(x\) that satisfy the equation.

It might contain nothing; i.e., be an empty set, if the equation has no solutions. For example, the equation \(x + 1 = x + 2\) could have solutions only if somehow \(1 = 2\), which is impossible, so \(x + 1 = x + 2\) has no solutions.

Or a solution set might include just one value (e.g. for linear equations).

For quadratics, the solution set could contain no values, one value or two values.

For inequalities, the solution set frequently includes infinitely many values.

We have obtained our solution to the equation \(x^{2} + x = 6\) as \(x = 2\) or \(x = - 3\), but of course we must always check that these numbers actually satisfy the equation:

\[2^{2} + 2 = 4 + 2 = 6 \quad \text{and} \quad ( - 3)^{2} + ( - 3) = 9 - 3 = 6.\]

It is important that learners realise that checking that the solutions satisfy the original equation is not just a check that we haven’t made an arithmetical blunder. When we go step by step, solving an equation, unless every step is reversible, there is always the possibility that we have introduced additional spurious solutions. We are finding statements that are true if the given equation is true, but that is not the same as saying that the equation we begin with is true if these statements we are deriving are true. Our conclusions about \(x\) need to be checked in the original equation at the end.

While finding solutions by factorisation may feel like ‘guesswork’, and be limited to situations in which the solutions are integers, the same kinds of limitations apply to the conventional solution method, where we ‘put everything onto the left-hand side’.

Starting with \(x(x + 1) = 6\), we would write:

\[ \begin{aligned} x(x + 1) - 6 &= 0 \\ x^{2} + x - 6 &= 0. \end{aligned} \]

At this point in the process, we again have to ‘guess’ a pair of numbers that will sum to \(1\) (the coefficient of \(x\)) and have a product of \(- 6\). This is analogous to having to think of two consecutive numbers with a product of \(6\), so to me it involves an exactly equal amount of ‘guesswork’.

Learners sometimes object to the ‘guessing’ aspect of factorising, but really ‘guessing and checking’ is an extremely powerful tool in mathematics, and nothing to be ashamed of! Provided we do the ‘checking’ part, it can be just as rigorous, because if we know that quadratics have up to two solutions, then if we find two solutions, we know we are done – there is no possibility of any others.

The two numbers we need here to factorise the quadratic are \(- 2\) and \(3\), so,

\[x^{2} + x - 6 = (x - 2)(x + 3) = 0.\]

Then, using the zero-product property, we have that either \(x - 2 = 0\) or \(x + 3 = 0\), giving us \(x = 2\) or \(x = - 3\), as before.

To my mind, there are perhaps more ways to go wrong here; for example, making errors when moving the terms onto the left-hand side or when factorising, or going from \(x - 2 = 0\) to \(x = - 2\), rather than \(x = 2\). These problems don’t exist when factorising \(x(x + 1) = 6\).

It is just as straightforward to use simple factorisation even if the constant term has more factors.

For example, let’s solve

\[x^{2} + 2x - 24 = 0.\]

Using the simple method, we put the \(24\) onto the right-hand side and factorise whatever is left:

\[x(x + 2) = 24.\]

Now, we are looking for two numbers that are \(2\) apart this time, with a product of \(24\). We think of \(4\) and \(6\), so \(x = 4\) is a solution. But then we remember that the two numbers could also be \(- 6\) and \(- 4\), because they also have a difference of \(2\) and a product of \(24\), so \(x = - 6\) is our other solution.

We can also use this kind of approach to solve quadratic equations in the form \((x - a)(x - b) = c\), where \(a\), \(b\) and \(c\) are constants, none of which has to be zero.

For example, we could solve an equation like \((x + 2)(x - 7) = - 20\) by looking for two numbers that have a product of \(- 20\) and are \(9\) apart, because the gap between \(x + 2\) and \(x - 7\) is \(9\), as shown in Figure 2.12.

Figure 2.12: The gap between the factors \(x - 7\) and \(x + 2\) is \(9\).

Because the product is negative this time, one of these numbers (i.e. one of the brackets) must be positive and one negative (i.e. in Figure 2.12 the gap must cross zero), and so the sum of their magnitudes must be \(9\).

So, the pair of factors must be either \(4\) and \(- 5\) or \(5\) and \(- 4\). This means that \(x + 2 = 4\) and \(x - 7 = - 5\), which both imply that \(x = 2\), or else \(x + 2 = 5\) and \(x - 7 = - 4\), which both imply that \(x = 3\). This gives us a solution set of \(x = 2\) or \(x = 3\).
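The ‘gap’ reasoning used in the last few examples can be sketched in Python. For an equation \((x + p)(x + q) = c\) (so \(p = - a\) and \(q = - b\) in the form above), the bracket \(x + p\) always exceeds \(x + q\) by \(p - q\). We therefore look through the factor pairs \((u, v)\) of \(c\) for those with \(u - v = p - q\), and read off \(x = u - p\). The function name is hypothetical, and this sketch only finds integer solutions, assuming \(c \neq 0\); equations whose solutions are not integers need a different method.

```python
def solve_by_gap(p, q, c):
    """Integer solutions of (x + p)(x + q) = c, assuming c != 0.

    Whatever x is, the bracket (x + p) exceeds (x + q) by p - q. So we look for
    factor pairs u * v == c with u - v == p - q, where u plays the role of (x + p).
    """
    gap = p - q
    solutions = set()
    for u in range(-abs(c), abs(c) + 1):
        if u != 0 and c % u == 0 and u - c // u == gap:
            solutions.add(u - p)  # from x + p == u
    return sorted(solutions)

print(solve_by_gap(0, 1, 6))     # x(x + 1) = 6          -> [-3, 2]
print(solve_by_gap(0, 1, 56))    # x(x + 1) = 56         -> [-8, 7]
print(solve_by_gap(0, 2, 24))    # x(x + 2) = 24         -> [-6, 4]
print(solve_by_gap(2, -7, -20))  # (x + 2)(x - 7) = -20  -> [2, 3]
```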

2.10.2 Solving quadratics by completing the square

Many learners experience difficulties making sense of quadratic equations. We have looked at solving quadratics by factorising, but another way to introduce quadratic equations is to work towards the method of completing the square. There may be benefits to teaching this first, before teaching the method of factorisation.27

I would often offer learners the equation \(x^{2} + 2x = 15\) and see what sense they can make of it.28

Perhaps they will interpret the \(x^{2}\) as meaning \(2x\). This is a common difficulty with squares, and I hope that including both \(x^{2}\) and \(2x\) as separate terms might focus learners on how these are different. But if not, they might end up with \(4x = 15\) and so \(x = \dfrac{15}{4}\). If they consistently interpret \(x^{2}\) as meaning \(2x\), then they will not discover their error when they back-substitute to check, and it might be necessary to probe, “What is the difference between the meaning of \(x^{2}\) and \(2x\)?”

The other natural thing that learners do when trying to solve this equation is to look for operations they can perform on both sides to try to isolate \(x\). For example, they might begin by square-rooting both sides, but perhaps simplify incorrectly. Although they are not successfully solving the equation, they are doing sensible things. Even if they obtain incorrect answers, by substituting them back in to check, they are reminding themselves what it means to try to solve an equation: the whole idea is that we want to find out what \(x\) could be.

Often, learners will start to try possible numbers, sometimes systematically, and it doesn’t usually take long before someone finds that \(3^{2} + 2 \times 3 = 15\).

Is that the end? Is \(3\) the only possibility for \(x\)?

It might seem so, since if they try a smaller number, like \(x = 2\), then \(2^{2} + 2 \times 2 < 15\), and if they try a larger number, like \(x = 4\), then \(4^{2} + 2 \times 4 > 15\), so it may well look like \(3\) must be the only value that could work.

Can we be sure that no value greater than \(3\) will be a solution?

This may seem obvious to the learners, but it is good to try to make a convincing argument. As \(x\) increases beyond \(3\), both \(x^{2}\) and \(2x\) increase, so we can say that definitely, for all \(x > 3\), \(x^{2} + 2x > 15\). It is really powerful to be able to draw a conclusion about all numbers greater than \(3\), without having to try all these infinitely many numbers one by one! We now know there is no point trying any number greater than \(3\).

What about if \(x < 3\)? Can we conclude that \(x^{2} + 2x\) will always be less than \(15\)?

This is more difficult, because as \(x\) decreases below \(3\), \(2x\) decreases, but the behaviour of \(x^{2}\) as \(x\) decreases below \(3\) is more complicated. This usually prompts learners to try some smaller numbers, including negative numbers, and discover, perhaps with great surprise, that \(x^{2} + 2x\) first decreases but then starts to increase again!

If we try a large negative number, like \(x = - 100\), then \(x^{2} = 10{,}000\), not \(- 10{,}000\), and \(10{,}000\) is much larger in size than the negative \(2x\) term (\(- 200\)), so the total ends up being large and positive.

If \(2^{2} + 2 \times 2 < 15\) and \(( - 100)^{2} + 2 \times ( - 100) > 15\), then presumably, somewhere in between \(- 100\) and \(2\), there ought to be another value of \(x\) that satisfies our equation \(x^{2} + 2x = 15\). (The implicit assumption here is that the graph of \(y = x^{2} + 2x\) is a continuous curve, with no sudden jumps.)

A bit of searching then leads to the solution \(x = - 5\).
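The systematic trying of values described above can be sketched in Python. This is only an illustration, assuming we restrict the search to the integers from \(-100\) to \(100\) (an arbitrary range). A search like this can find integer solutions, but on its own it cannot show that there are no others, which is why an argument is still needed.

```python
def integer_solutions(lo, hi):
    """Return every integer x with lo <= x <= hi satisfying x^2 + 2x = 15."""
    return [x for x in range(lo, hi + 1) if x**2 + 2 * x == 15]


print(integer_solutions(-100, 100))  # [-5, 3]
```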

Is this the end, or could there be more solutions?

We can sketch the graphs of \(y = x^{2} + 2x\) and \(y = 15\) (or \(y = x^{2}\) and \(y = 15 - 2x\)) and ‘see’ the two solutions, \(x = 3\) and \(x = - 5\), and how the graph reverses direction exactly halfway between them, at \(x = - 1\) (Figure 2.13).

Figure 2.13: The graphs of \(y = x^{2} + 2x\) and \(y = 15\) intersect when \(x^{2} + 2x = 15\).

The shape of the quadratic graph (parabola) suggests that some quadratic equations, like this one, will have two solutions, some will have just one (Figure 2.14(a)), and some will have none (Figure 2.14(b)). By changing the constant on the right-hand side (\(15\)), we can move the horizontal line in Figure 2.13 down so far that it just touches the bottom of the parabola, or further down still, where it doesn’t intersect the parabola at all.

Figure 2.14: (a) One intersection with \(y = - 1\) and (b) none for \(y = c\) where \(c < - 1\).
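The three cases can also be sketched in code. The helper `count_intersections` below is hypothetical, not something from the text. It relies on the fact that \(x^{2} + 2x = (x + 1)^{2} - 1\), so the lowest point of the parabola is \((-1, -1)\), and simply compares the height \(c\) of the horizontal line with \(-1\).

```python
def count_intersections(c):
    """Number of points where y = x^2 + 2x meets the horizontal line y = c.

    Since x^2 + 2x = (x + 1)^2 - 1, the lowest point of the parabola is (-1, -1).
    """
    if c > -1:
        return 2  # line is above the lowest point: it crosses both arms
    if c == -1:
        return 1  # line just touches the lowest point
    return 0      # line is below the parabola: no intersection


for c in (15, -1, -4):
    print(c, count_intersections(c))
```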

In some countries, completing the square may be the first method taught for solving quadratic equations. This has the advantage that it is arguably the most logical thing to try if you are building on your thinking from solving linear equations. It also has the advantage that, unlike factorisation, it works for all quadratic equations. It also doesn’t involve ‘guessing some numbers that will work’, as tends to happen in the factorisation method, which can feel to learners more like finding solutions by trial and improvement than truly solving an equation.

In other countries, completing the square is regarded as a difficult method that might only be taught to the highest-attaining students, and even they may prefer to use ‘the formula’:

The solutions to \(ax^{2} + bx + c = 0\), if they exist, are given by

\[x = \dfrac{- b \pm \sqrt{b^{2} - 4ac}}{2a},\ \ \ \ \ \ \ \ a \neq 0.\]
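As a sketch of the formula applied mechanically (the function name `solve_quadratic` is hypothetical, and only real solutions are returned), note that the equation must first be put in the form \(ax^{2} + bx + c = 0\) before \(a\), \(b\) and \(c\) are read off. For \(x^{2} + 2x = 15\), that means \(a = 1\), \(b = 2\) and \(c = - 15\).

```python
import math


def solve_quadratic(a, b, c):
    """Real solutions of a*x**2 + b*x + c = 0, using the quadratic formula."""
    if a == 0:
        raise ValueError("a must be non-zero")
    disc = b**2 - 4 * a * c  # b**2 is never negative, even when b is
    if disc < 0:
        return []            # no real solutions
    root = math.sqrt(disc)
    # A set removes the duplicate when disc == 0 (one repeated solution).
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


print(solve_quadratic(1, 2, -15))  # [-5.0, 3.0]
```

The parentheses around `2 * a` matter: writing `/ 2 * a` would divide by \(2\) and then multiply by \(a\), which is exactly the kind of priority-of-operations slip that is easy to make by hand.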

Although learners often think that nothing can beat ‘having a formula for it’,29 it is easy to go wrong when substituting into this formula, such as by mixing up the priority of operations (Section 2.10.6), or thinking that \(b^{2}\) is negative when \(b\) is negative.30 It is also easy to misidentify \(a\), \(b\) and \(c\) when an equation is given in a non-standard form, such as \(5 - x^{2} = 3x\). I have even heard learners muddling up the quadratic formula with Pythagoras’ Theorem to give “minus \(b\), plus or minus the square root of \(a\) squared plus \(b\) squared equals \(c\) squared”!

However, if we foreshadow completing the square, for example by including equations such as

\[(x - 1)^{2} = 16 \qquad \text{ and } \qquad (2x - 3)^{2} - 9 = 0,\]

(as discussed in Section 2.8), to be solved before formally working on quadratic equations, then the leap to completing the square may not be so enormous. Another advantage of completing the square is that work done on this can lead to lots of incidental practice of algebra skills that learners need to practise anyway.

If learners are very used to expanding brackets of the form \({(x + a)}^{2}\), then they can be helped to notice that the left-hand side of \(x^{2} + 2x = 15\) is ‘almost’ \({(x + 1)}^{2}\), which we know is equal to \(x^{2} + 2x + 1\).

This is really the only tricky step in the whole process, and is the main thing to spend time on. We need to get the two appearances of our unknown \(x\) (once in \(x^{2}\) and once in \(2x\)) down to one appearance.

This is analogous to what happens when rearranging an equation such as \(x + y = 3x - 4y\) to make \(y\) the subject (Section 2.10.5). Learners will often focus on one of the appearances of \(y\), say the one on the left-hand side, and isolate this letter, without taking account of the fact that there is also a \(y\) on the right-hand side.

So, they will write

\[y = 3x - 4y - x = 2x - 4y.\]

Equality has been preserved, which is good, but we do not have a formula for \(y\) in terms of \(x\) only. If all I know is \(x\), I cannot use the equation in this form to find \(y\), because I would need to plug both \(x\) and \(y\) into the right-hand side to find out what \(y\) is, and \(y\) is the quantity I am trying to find!

Instead, the learner needs to first group all of the \(y\)s together:

\[x + y = 3x - 4y\]

\[x + 5y = 3x\]

\[5y = 2x\]

\[y = \dfrac{2x}{5}.\]

Now, \(y\) is expressed in terms of \(x\) only.
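This rearrangement can be spot-checked by substituting back in. The sketch below uses a hypothetical helper, `y_from_x`, and Python's exact `Fraction` type to avoid rounding. It confirms that, for a range of values of \(x\), the \(y\) given by \(y = \dfrac{2x}{5}\) makes both sides of \(x + y = 3x - 4y\) equal. Checking some values is not a proof, but it is the same habit of substituting back to check.

```python
from fractions import Fraction


def y_from_x(x):
    """y as the subject of x + y = 3x - 4y, i.e. y = 2x/5 (exact arithmetic)."""
    return Fraction(2 * x, 5)


for x in range(-10, 11):
    y = y_from_x(x)
    # Substitute back: both sides of the original equation must agree.
    assert x + y == 3 * x - 4 * y

print(y_from_x(5))  # 2
```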

With completing the square, we are just doing something very similar, in replacing \(x^{2} + 2x\) with \({(x + 1)}^{2} - 1\), in which the letter \(x\) appears only once.

So, \[ \begin{alignedat}{3} & x &&{}^{2} + 2x &&{}= 15 \\ ( & x &&{}+ 1)^{2} - 1 &&{}= 15 \\ ( & x &&{}+ 1)^{2} &&{}= 16 \\ & x &&{}+ 1 &&{}= \pm 4. \end{alignedat} \] So, either \(x = 3\) or \(x = - 5\).

The overall aim is to get a perfect square (i.e. some expression squared) equal to a constant, which is then ready for us to take the square root of both sides.

It was nice in this example that the \(16\) was a square number, but that wasn’t essential. Provided we don’t mind irrational answers, there is no need for the completed square to end up equal to a square number.
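The general pattern for equations of the form \(x^{2} + bx = c\) can be sketched as follows. The function name is hypothetical, and floating-point square roots stand in for exact surds. Writing \(x^{2} + bx\) as \(\left(x + \dfrac{b}{2}\right)^{2} - \left(\dfrac{b}{2}\right)^{2}\) leaves a single appearance of \(x\), and the completed square then equals \(c + \left(\dfrac{b}{2}\right)^{2}\), which need not be a square number.

```python
import math


def solve_by_completing_square(b, c):
    """Solve x**2 + b*x = c by rewriting it as (x + b/2)**2 = c + (b/2)**2."""
    half_b = b / 2
    rhs = c + half_b**2  # the constant that the completed square equals
    if rhs < 0:
        return []        # a square of a real number cannot be negative
    r = math.sqrt(rhs)   # square root both sides: x + b/2 = +r or -r
    return sorted({-half_b - r, -half_b + r})


print(solve_by_completing_square(2, 15))  # [-5.0, 3.0]
print(solve_by_completing_square(2, 1))   # here the completed square equals 2
```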

We can use completing the square to derive the quadratic formula that I quoted earlier.

If we start with \(ax^{2} + bx + c = 0\), the algebra turns out to be neater if we begin by multiplying through by \(4a\), which gives us

\[4a^{2}x^{2} + 4abx + 4ac = 0.\]

This is a convenient ‘trick’, invented with the benefit of hindsight, which learners shouldn’t expect to have thought of doing for themselves.

We can now complete the square by noticing that

\[(2ax + b)^{2} = 4a^{2}x^{2} + 4abx + b^{2},\]

so if we add \(b^{2}\) to both sides of our scaled up equation, we will get

\[ \begin{alignedat}{3} & 4a^{2}x^{2} + 4abx && {}+ 4ac + b^{2} && {}= b^{2} \\ & (2ax + b)^{2} && {}+ 4ac && {}= b^{2} \\ & (2ax + b)^{2} && && {}= b^{2} - 4ac \\ & \phantom{(}2ax + b && && {}= \pm \sqrt{b^{2} - 4ac} \\ & \phantom{(}2ax && && {}= - b \pm \sqrt{b^{2} - 4ac} \\ & \phantom{(2a}x && && {}= \dfrac{- b \pm \sqrt{b^{2} - 4ac}}{2a}. \end{alignedat} \]

I find that learners generally really like seeing this derived, even if they aren’t expected to remember (or even necessarily follow or check) every little detail.
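For anyone who wants reassurance without following every line, the key step can be spot-checked in exact integer arithmetic. The derivation amounts to the identity \(4a\left(ax^{2} + bx + c\right) = (2ax + b)^{2} - \left(b^{2} - 4ac\right)\). The sketch below tests it for many small integer values; this is evidence, not a proof, and the function name is hypothetical.

```python
def key_step_holds(a, b, c, x):
    """Check 4a(ax^2 + bx + c) == (2ax + b)^2 - (b^2 - 4ac) for given integers."""
    lhs = 4 * a * (a * x**2 + b * x + c)
    rhs = (2 * a * x + b) ** 2 - (b**2 - 4 * a * c)
    return lhs == rhs


values = range(-4, 5)
print(all(key_step_holds(a, b, c, x)
          for a in values for b in values for c in values for x in values))  # True
```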

2.10.3 Surds

If learners are comfortable with collecting like terms and with the difference of two squares identity, then there is not too much to the topic of surds.31

They will need to become convinced that \(\sqrt{a} \pm \sqrt{b}\) is not in general equal to \(\sqrt{a \pm b}\).

It is easier to see that this must fail with the subtraction, because if \(b > a\) then \(\sqrt{a - b}\) will be attempting to square root a negative number, which is undefined. (Learners will not yet have met complex numbers.)

So, there is no chance at all that \(\sqrt{12} - \sqrt{27}\), say, could possibly be equal to \(\sqrt{12 - 27} = \sqrt{- 15}\), because \(\sqrt{- 15}\) doesn’t exist, whereas \(\sqrt{12} - \sqrt{27}\) does.

This may cause learners to doubt the positive version too, that \(\sqrt{a} + \sqrt{b} = \sqrt{a + b}\), and one way to approach this is to ask them to test it out with some numbers.

Testing with numbers suggests that, for non-negative \(a\) and \(b\), \(\sqrt{a} + \sqrt{b} = \sqrt{a + b}\) if and only if either \(a = 0\) or \(b = 0\).

We can prove this by squaring both sides:

\[\left( \sqrt{a} + \sqrt{b} \right)^{2} = a + b.\]

Expanding the left-hand side,

\[a + 2\sqrt{a}\sqrt{b} + b = a + b\]

\[2\sqrt{a}\sqrt{b} = 0,\]

which can only be true if either \(a = 0\) or \(b = 0\). Conversely, each of these values does satisfy \(\sqrt{a} + \sqrt{b} = \sqrt{a + b}\); for example, if \(a = 0\), both sides equal \(\sqrt{b}\).

Learners are often resistant to accepting that the square root operation is ‘not distributive’. They get used to \(a(b + c) = ab + ac\), and want other things to behave this way too.

However, in general,

\[\sin{(a + b)} \neq \sin a + \sin b\]

\[3^{a + b} \neq 3^{a} + 3^{b}\]

\[(a + b)^{2} \neq a^{2} + b^{2},\]

and so

\[\sqrt{a + b} \neq \sqrt{a} + \sqrt{b}\]

is just another example of non-linearity.

In fact, we can improve on the \(\neq\) symbol here, and put \(\leq\) instead.

We found that \(2\sqrt{a}\sqrt{b}\) had to be zero for equality, but since both \(\sqrt{a}\) and \(\sqrt{b}\) are non-negative, if neither \(a\) nor \(b\) is zero, then \(2\sqrt{a}\sqrt{b}\) will be positive, meaning that \(\sqrt{a} + \sqrt{b}\) will be greater than \(\sqrt{a + b}\).

The fact that, in general, \(\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}\) is described by saying that the square root operation is sub-additive.32 By making ‘a thing’ out of this, and giving it a name, learners may be less likely to slip into thinking that square roots are additive. We need to teach what ‘isn’t’, just as much as what ‘is’.
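A quick numerical exploration of sub-additivity might look like the sketch below. The helper name is hypothetical, and it uses floating-point square roots, so it illustrates the inequality rather than proving it.

```python
import math


def root_of_sum_and_sum_of_roots(a, b):
    """Return (sqrt(a + b), sqrt(a) + sqrt(b)) for non-negative a and b."""
    return math.sqrt(a + b), math.sqrt(a) + math.sqrt(b)


for a, b in [(9, 16), (12, 27), (0, 5)]:
    # The first number is never larger than the second; they are equal only
    # when a or b is zero.
    print(a, b, root_of_sum_and_sum_of_roots(a, b))
```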

So, if \(\sqrt{12} + \sqrt{27} \neq \sqrt{39}\), then does that mean it is impossible to simplify sums and differences of surds?

Often it is, but sometimes they can be simplified. Here, if we first simplify each surd, by factoring out any square numbers in the radicand (the expression that we are rooting), then we get

\[\sqrt{12} = \sqrt{4 \times 3} = \sqrt{4}\sqrt{3} = 2\sqrt{3}\]

\[\sqrt{27} = \sqrt{9 \times 3} = \sqrt{9}\sqrt{3} = 3\sqrt{3}.\]

So the sequence \(\sqrt{3}\), \(\sqrt{12}\), \(\sqrt{27}\), … is an arithmetic sequence, going up in \(\sqrt{3}\)s (Chapter 4).

Treating \(\sqrt{3}\) as our ‘unit’, we can think of the sum \(\sqrt{12} + \sqrt{27}\) as just being an instance of \(2a + 3a = 5a\), as I discussed earlier in the chapter (Section 2.4). This is effectively collecting like terms, so \(2\sqrt{3} + 3\sqrt{3} = 5\sqrt{3}\).
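The simplification step, pulling square factors out of the radicand, can be sketched as below. The function `simplify_surd` is hypothetical and assumes a positive integer input. It returns \(k\) and \(m\) with \(\sqrt{n} = k\sqrt{m}\), where \(m\) has no square factor other than \(1\); two surds can then be added as like terms when their \(m\) values match.

```python
def simplify_surd(n):
    """Write sqrt(n) as k*sqrt(m), where m has no square factor greater than 1.

    Assumes n is a positive integer. Works by dividing out f*f for f = 2, 3, ...
    """
    k, m = 1, n
    f = 2
    while f * f <= m:
        while m % (f * f) == 0:  # f*f is a factor: move f outside the root
            m //= f * f
            k *= f
        f += 1
    return k, m


k1, m1 = simplify_surd(12)  # (2, 3)
k2, m2 = simplify_surd(27)  # (3, 3)
if m1 == m2:  # same 'unit', so collect like terms
    print(f"sqrt(12) + sqrt(27) = {k1 + k2}*sqrt({m1})")  # 5*sqrt(3)
```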

Learners ought to question the assumption that \(\sqrt{ab} = \sqrt{a}\sqrt{b}\).

Indeed, in general this is not true. For example, if \(a = - 3\) and \(b = - 3\), then \(\sqrt{( - 3) \times ( - 3)} = \sqrt{9} = 3\), but neither \(\sqrt{a}\) nor \(\sqrt{b}\) exists, and so there is no chance of \(\sqrt{ab} = \sqrt{a}\sqrt{b}\).

However, for non-negative \(a\) and \(b\) it is true, which learners can prove by squaring both sides:

\[\begin{aligned} (\text{Left-hand side})^{2} &= \bigl( \sqrt{ab} \bigr)^{2} = \sqrt{ab}\sqrt{ab} = ab \\ (\text{Right-hand side})^{2} &= \bigl( \sqrt{a}\sqrt{b} \bigr)^{2} = \sqrt{a}\sqrt{b}\sqrt{a}\sqrt{b} = \sqrt{a}\sqrt{a}\sqrt{b}\sqrt{b} = ab, \end{aligned}\]

since we can reorder the factors in a multiplication. So, they are equal.
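Both halves of this discussion can be seen numerically. In the sketch below (the helper name is hypothetical), Python's `math.sqrt` raises a `ValueError` for a negative input, which mirrors the fact that \(\sqrt{-3}\) does not exist as a real number. For non-negative inputs the two sides agree, up to floating-point rounding.

```python
import math


def product_rule_holds(a, b):
    """Does sqrt(ab) equal sqrt(a)*sqrt(b)? Returns None if a real root is undefined."""
    try:
        return math.isclose(math.sqrt(a * b), math.sqrt(a) * math.sqrt(b))
    except ValueError:  # math.sqrt raises ValueError for a negative input
        return None


print(product_rule_holds(4, 3))    # True
print(product_rule_holds(-3, -3))  # None: sqrt(9) exists, but sqrt(-3) does not
```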

Incidentally, it might not be obvious why ‘squaring both sides’ is a legitimate thing to do with an equation. Is it multiplying both sides by ‘the same thing’?

In fact, it is, because if we know that, say,

\[x = y,\]

then we could multiply both sides by \(x\), or we could multiply both sides by \(y\), and we would get true equations:

\[x^{2} = xy\ \]

and

\[xy = y^{2}.\]

However, if we wanted to, since \(x\) and \(y\) have the same value, we could multiply the left-hand side by \(x\) and the right-hand side by \(y\), and this would be ‘squaring both sides’:

\[x^{2} = y^{2}.\]

For learners who have met simultaneous equations, we can think of it as multiplying two equations together:

\[ \begin{alignedat}{2} x &= y && \rlap{\qquad \text{①}} \\ x &= y && \rlap{\qquad \text{②}} \\ x^2 &= y^2 && \rlap{\qquad \text{①} \times \text{②}} \end{alignedat} \]

The two equations on this occasion just happen to be identical (\(\text{①} = \text{②}\)).

I think it is important to make sure learners have thought about this kind of thing, otherwise something like ‘squaring both sides of an equation’ comes out of the blue as another thing that algebra mysteriously says is somehow all right.

It is important to remember that the squares of numbers being equal does not imply that the numbers themselves are necessarily equal (e.g. \(( - 3)^{2} = 3^{2}\), but \(( - 3) \neq 3\)). However, in this case, since \(a\) and \(b\) are both \(\geq 0\), both \(\sqrt{ab}\) and \(\sqrt{a}\sqrt{b}\) are non-negative, and two non-negative numbers with equal squares must be equal, so we have no problem.

While learners may agree that \(5\sqrt{3}\) is in some sense ‘simpler’ than \(\sqrt{12} + \sqrt{27}\), because \(5\sqrt{3}\) is quicker to write and uses less ink, some other ‘simplifications’ with surds are less obviously ‘simpler’.

For example, \(\dfrac{1}{\sqrt{2}}\), perhaps as the sine or cosine of \(45{^\circ}\), seems fairly simple. However, learners are told that it is simpler to rationalise the denominator, and so they are expected to transform this into the equivalent fraction \(\dfrac{\sqrt{2}}{2}\):

\[\dfrac{1}{\sqrt{2}} = \dfrac{1}{\sqrt{2}} \times \dfrac{\sqrt{2}}{\sqrt{2}} = \dfrac{\sqrt{2}}{2}.\]

There certainly are situations in which it is simpler to work with \(\dfrac{1}{\sqrt{2}}\), or even \(\sqrt{\dfrac{1}{2}}\), than \(\dfrac{\sqrt{2}}{2}\), but there are also occasions when it is helpful if the denominators of fractions are integers. One example is when comparing the magnitudes of expressions involving surds, where dividing by surds can be awkward.

Finally, the most complicated kind of surd simplification is something like

\[\dfrac{3 - \sqrt{2}}{\sqrt{2} + 1}.\]

By using the difference of two squares, we can remove all the surds from the denominator, if we multiply both numerator and denominator by \(\sqrt{2} - 1\).

In general, if the denominator is \(\sqrt{a} \pm b\), we will want to use a multiplier of \(\sqrt{a} \mp b\).

Here, we get \[ \begin{aligned} \dfrac{3 - \sqrt{2}}{\sqrt{2} + 1} \times \dfrac{\sqrt{2} - 1}{\sqrt{2} - 1} &= \dfrac{\bigl( 3 - \sqrt{2} \bigr)\bigl( \sqrt{2} - 1 \bigr)}{(\sqrt{2})^{2} - 1^{2}} \\ &= \dfrac{3\sqrt{2} - 3 - \sqrt{2}\sqrt{2} + \sqrt{2}}{2 - 1} \\ &= \dfrac{4\sqrt{2} - 3 - 2}{1} \\ &= 4\sqrt{2} - 5. \end{aligned} \]

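The denominator won’t always disappear (i.e. become \(1\)), as it did here, but when \(a\) and \(b\) are integers it will always become an integer, since \(\bigl( \sqrt{a} + b \bigr)\bigl( \sqrt{a} - b \bigr) = a - b^{2}\).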
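The conjugate method can be sketched with exact arithmetic by storing a number \(p + q\sqrt{d}\) as its two rational parts. The function `divide_surds` below is hypothetical. It assumes \(d\) is a positive integer that is not a square number and that the denominator is non-zero.

```python
from fractions import Fraction


def divide_surds(p, q, r, s, d):
    """Return (P, Q) such that (p + q*sqrt(d)) / (r + s*sqrt(d)) == P + Q*sqrt(d).

    Multiplies numerator and denominator by r - s*sqrt(d), so that the
    denominator becomes r**2 - s**2 * d by the difference of two squares.
    Assumes p, q, r, s are integers or Fractions, d is a positive non-square
    integer, and r + s*sqrt(d) is not zero.
    """
    denominator = Fraction(r * r - s * s * d)
    # (p + q*sqrt(d)) * (r - s*sqrt(d)) = (p*r - q*s*d) + (q*r - p*s)*sqrt(d)
    P = (p * r - q * s * d) / denominator
    Q = (q * r - p * s) / denominator
    return P, Q


# (3 - sqrt(2)) / (sqrt(2) + 1): p, q = 3, -1 and r, s = 1, 1, with d = 2
P, Q = divide_surds(3, -1, 1, 1, 2)
print(P, Q)  # -5 4, i.e. 4*sqrt(2) - 5
```

For this example the code multiplies by \(1 - \sqrt{2}\), the negative of the multiplier \(\sqrt{2} - 1\) used above, so its intermediate denominator is \(-1\) rather than \(1\). The final answer is the same.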

There are nice ways to extend this kind of work, such as by rationalising expressions such as \(\displaystyle \dfrac{1}{\sqrt[3]{2} + 1}\), or by exploring nested surds, such as \(\sqrt{2\sqrt{2} + 3}\), which often turn out to have much simpler forms.33

In topics in which there are pairs of expressions that look as though they might be equal but aren’t, and pairs of expressions that look quite different but are equal, card sort activities can be very useful.34

2.10.4 Inequalities

In life generally, there are many more things that are not equal to each other than are equal. In mathematics too, equality, such as in an equation, is a special, unusual case, in which we have two expressions that happen to be equal to each other. Most of the time, pairs of expressions will differ in value, and so we probably ought to give more weight to studying inequalities than is typical in school curricula.

There is a case for beginning algebra with a focus on comparing expressions, rather than equations, and this naturally leads to statements in terms of inequalities.35 On this website, I will explore inequalities further in Chapter 5, when we consider estimation, which is all about things that are not exactly equal, but are close, to a greater or lesser degree.

Sometimes mnemonics involving crocodiles are used to help learners know which way round a \(<\) or \(>\) sign needs to go (the crocodile opens its mouth to eat the greater number).36 Because the symbol itself has a ‘large end’ and a ‘small end’, references to crocodiles should be unnecessary: the greater number simply goes at the larger end of the symbol.

For example,

\[3 < 4 \qquad \qquad 4 > 3 \qquad \qquad 4 > - 3 \qquad \qquad - 4 < 3.\]

I have seen introductory lessons on solving linear inequalities, such as \(2x - 5 < 9\), in which learners are told to solve the inequality just as they would for the equation \(2x - 5 = 9\), just replacing the equals sign with the inequality sign throughout.

It is understandable why learners might be taught this way. As shown in Figure 2.15, the two processes look so similar that if learners are already fluent in solving equations like this, then it might seem that very little more is required to achieve fluency in solving inequalities.

Figure 2.15: Solving equations and inequalities compared.

The teacher may feel obliged to give an additional rule, that if at any stage the learner multiplies or divides both sides of the inequality by a negative number, the inequality sign switches around (\(<\) becomes \(>\), and vice versa, and \(\leq\) becomes \(\geq\), and vice versa). With that proviso, surely that constitutes a complete recipe for tackling all linear inequalities?

For me, this really falls short of helping learners make sense of inequalities.

For example, this recipe will not be enough to enable the learner to handle quadratic inequalities. Following it could lead to a nonsense solution like this:

\[(x - 2)(x + 3) < 0\]

\[x - 2 < 0 \quad \text{ or } \quad x + 3 < 0\]

\[x < 2 \quad \text{ or } \quad x < - 3.\]

The pair of conditions ‘\(x < 2\) or \(x < - 3\)’ is equivalent to \(x < 2\), but there are plenty of values in this interval, such as \(x = - 4\) or \(x = - 10\), that do not satisfy \((x - 2)(x + 3) < 0\).

To solve this quadratic inequality algebraically, we need to realise that the product of \((x - 2)\) and \((x + 3)\) being negative (\(< 0\)) means that one of these factors must be positive and the other negative.

This means we have to consider two possibilities:

\[ \begin{alignedat}{2} &\text{Either } &&x - 2 < 0 \text{ and } x + 3 > 0, \text{ which means } -3 < x < 2 \\ &\text{Or } &&x - 2 > 0 \text{ and } x + 3 < 0, \text{ which have no values in common.} \end{alignedat} \]

So, the correct solution set is \(- 3 < x < 2\).

Algebra is not the easiest approach here. More conveniently, we could instead sketch the graph of \(y = (x - 2)(x + 3)\) and just observe the \(x\) values corresponding to \(y < 0\) (Chapter 4).
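A rough numerical version of reading the graph is to sample many values of \(x\) and record where \((x - 2)(x + 3)\) is negative. The sketch below samples from \(-6\) to \(6\) in steps of \(0.5\), an arbitrary grid. Sampling can only support the solution set, not prove it.

```python
def satisfies(x):
    """Is (x - 2)(x + 3) < 0 true for this value of x?"""
    return (x - 2) * (x + 3) < 0


# Sample x from -6 to 6 in steps of 0.5 (an arbitrary choice of grid).
samples = [k / 2 for k in range(-12, 13)]
inside = [x for x in samples if satisfies(x)]
print(inside[0], inside[-1])  # -2.5 1.5

# Every sampled value agrees with the solution set -3 < x < 2.
assert all(satisfies(x) == (-3 < x < 2) for x in samples)
```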

However, might it not be argued that the ‘Treat it as an equation’ recipe is good enough for linear inequalities?

I think it is not good enough, if we want learners to sense-make what they are doing.37

When we justified ‘doing the same operations to both sides’ when solving an equation, in Section 2.7.2, this was on the basis that the two sides were equal beforehand. Adding equal amounts to, or subtracting equal amounts from, things that are already equal leaves them still equal. We imagined Leillah (on the left-hand side) and Rajib (on the right-hand side) having equal amounts of money beforehand, and each being given \(£10\). Although they both now have more, they still have the same amount as each other; equality has been preserved.

But how does this argument work when we have unequal sides?

It would seem we should now no longer have to be particularly careful about adding equal amounts to both sides, or subtracting equal amounts from both sides. It would seem that we can be much more casual about inequalities than we can be about equations.

For example, if

\[2x - 5 < 9,\]

then we can add whatever we like to the right-hand side, and, if we wish, do nothing at all to the left-hand side, and the \(<\) inequality will still hold. We could add \(100\) to the right-hand side, and leave the left-hand side alone:

\[2x - 5 < 109.\]

Any \(x\) that satisfies the original inequality, such as \(x = 4\), is guaranteed to satisfy the second one:

\[ \begin{aligned} 2 \times 4 - 5 = 3 &< 9 \\ 2 \times 4 - 5 = 3 &< 109. \end{aligned} \]

If they are thinking about it, learners should be puzzled about being asked to be careful to do the same thing to both sides of an inequality. If they don’t object, this could suggest they are not really thinking about what they are doing, and are just ‘going through the motions’.38

So, how do we present solving inequalities in a meaningful way?

For me, the “I’m thinking of a mystery number” approach I suggested for equations works just as well for inequalities too. However, just as then – but even more importantly here – we don’t want to say, “What is my number?” but “What could my number be?” As with equations, where we wanted to leave open the possibility that there could be more than one answer, here we absolutely have to be ready for multiple answers, because that is the norm with inequalities.

I prefer to begin with a ‘less than’ inequality, and to state that my mystery number is a positive integer, because that leads to a finite solution set, which is helpful in the beginning. Of course, we will relax this requirement later, but to begin with it helps with clarifying what we are doing. And initially I will not be expecting answers given as inequalities, but as an exhaustive list of all of the possible values that the mystery number could be. This highlights that we are not merely interested in finding some possible values of the mystery number. The task is to find all the possible values, and to ensure no impossible values sneak in by accident!

So, if we want to solve \(2x - 5 < 9\), where \(x\) is a positive integer, what do we do?

We can certainly compare this with solving the equation \(2x - 5 = 9\), which gives us \(x = 7\), but we can immediately see that \(x = 7\) is not a possible solution to our inequality:

\[2 \times 7 - 5 = 9 \nless 9,\]

where the \(\nless\) symbol means ‘is not less than’.

Nine is not less than \(9\); it’s equal to \(9\). I think it is good to clarify near the beginning that solving inequalities is not ‘just like’ solving equations. The solution to the corresponding equation is not even part of the solution set of the inequality.

So, we know that \(7\) is definitely not my mystery number, as it doesn’t satisfy the given inequality. Can learners find any number that could be the mystery number?

After a bit of experimenting, they will realise that there are \(6\) positive integer solutions: \(x = 1,\ 2,\ 3,\ 4,\ 5\) or \(6\). The solution set is all the positive integers \(x\) such that \(x < 7.\)

Instead of \(x = 7\), the solution is \(x < 7\). If we hadn’t been focused on only the positive integers, a number very near to \(7\), but just slightly below it, would also have done; for example, \(x = 6.99\):

\[2 \times 6.99 - 5 = 8.98 < 9.\]

Learners should keep substituting numbers into the inequality and verifying that they either satisfy or don’t satisfy it.

We can see now that, although a statement like \(2x - 5 < 109\) is true, given \(2x - 5 < 9\), and all of the values in the solution set \(x < 7\) do indeed satisfy \(2x - 5 < 109\) as well, the problem with writing \(2x - 5 < 109\) as part of our solution is that it includes a whole lot of values of \(x\) that don’t satisfy the original inequality. It’s far too generous with the numbers it includes.

For example, \(x = 50\) satisfies \(2x - 5 < 109\), because

\[2 \times 50 - 5 = 95 < 109,\]

but \(x = 50\) doesn’t satisfy \(2x - 5 < 9\), because \(95 \nless 9\).

Doing the same operations to both sides of an inequality matters because it maintains the solution set. We don’t lose any numbers we need, and we also don’t inadvertently gain numbers we don’t want! Both aspects are vital if we are to find what my mystery number could have been, because we don’t want to lose viable possibilities, but nor do we want to expand to include values that my number definitely could not have been.

Another way to say this is that doing the same operations to both sides has to be reversible. It follows from \(2x - 5 < 9\) that \(2x - 5 < 109\), but it does not follow the other way round. If we start with \(2x - 5 < 109\), it does not follow that \(2x - 5 < 9\), because lots of the values that satisfied \(2x - 5 < 109\) do not satisfy \(2x - 5 < 9\).

But, if we do the same linear operations to both sides, our steps are reversible. It follows from \(2x - 5 < 9\) that \(2x < 14\), and it follows from \(2x < 14\) that \(2x - 5 < 9\). They are equivalent statements, corresponding to the same solution set.
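The difference between a reversible step and a merely true one can be made concrete by comparing solution sets directly. The sketch below restricts attention to the positive integers up to \(100\), an arbitrary bound, and uses a hypothetical helper `solution_set`.

```python
def solution_set(condition, limit=100):
    """Positive integers x <= limit for which condition(x) is true."""
    return {x for x in range(1, limit + 1) if condition(x)}


original = solution_set(lambda x: 2 * x - 5 < 9)
step = solution_set(lambda x: 2 * x < 14)           # 5 added to both sides
loosened = solution_set(lambda x: 2 * x - 5 < 109)  # 100 added to one side only

print(original == step)                 # True: the same solution set
print(original <= loosened)             # True: no solutions are lost...
print(sorted(loosened - original)[:5])  # ...but extra values sneak in: [7, 8, 9, 10, 11]
```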

What is different if we want to solve \(2x - 5 \leq 9\)?

Now, by following the same process, we obtain \(x \leq 7\), so we get one extra positive integer solution, of \(x = 7\). So, now there are seven positive integer solutions, rather than six. In this case, the solution we get from solving \(2x - 5 = 9\) is one of the solutions of \(2x - 5 \leq 9\), because \(2x - 5 \leq 9\) means

\[\text{Either } 2x - 5 < 9 \text{ or } 2x - 5 = 9,\]

and so this time the equation is included in the inequality.

We can think of the \(x = 7\) solution as marking the boundary between values of \(x\) that are in, and those that are out. Seven is in (just), but anything above \(7\) is out. We can show that \(7\) is included in the interval by colouring in the dot at \(7\) (Figure 2.16).

Figure 2.16: The solution set \(x \leq 7\).
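The ‘mystery number’ version of this can be sketched as an exhaustive search over positive integers. The search bound of \(100\) is an assumption for illustration; it is safe here because \(2x - 5\) only grows as \(x\) increases. Swapping the comparison shows how \(x = 7\) moves in or out of the solution set.

```python
import operator


def mystery_numbers(compare, limit=100):
    """Positive integers x <= limit for which compare(2x - 5, 9) is true."""
    return [x for x in range(1, limit + 1) if compare(2 * x - 5, 9)]


print(mystery_numbers(operator.lt))  # [1, 2, 3, 4, 5, 6]     (2x - 5 < 9)
print(mystery_numbers(operator.le))  # [1, 2, 3, 4, 5, 6, 7]  (2x - 5 <= 9)
print(mystery_numbers(operator.eq))  # [7]                    (2x - 5 = 9)
```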

I would usually address the switching round of inequality signs when multiplying or dividing by a negative number by asking learners to study carefully and comment on the four solutions from ‘learners in another class’ shown in Figure 2.17.39

Figure 2.17: Four attempted solutions to \(17 - 3x = 2\) and \(17 - 3x < 2\).

Initially, learners may not be able to find anything wrong, and it may not be obvious to them that the solutions to the two inequalities in C and D actually contradict each other. If \(5\) is less than \(x\), then \(x\) cannot be less than \(5\); \(x\) would have to be greater than \(5\).

The error in attempted solution D is in the final step, where if \(- 3x < - 15\), then it does not follow that \(x < 5\). For example, if \(x = 10\), then \(- 3x = ( - 3) \times 10 = - 30 < - 15\). But \(x = 10\) does not satisfy \(x < 5\).

Substituting numbers into different lines of a solution is a good way to discover at which stage something has gone wrong. It is a bit like debugging computer code; if lines \(1\) to \(17\) work, but line \(18\) doesn’t, then something is going wrong in line \(18\).
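The debugging idea can be sketched in code. Figure 2.17 is not reproduced here, so the three lines below are an assumed transcription of attempted solution D, consistent with the text: it starts from \(17 - 3x < 2\), reaches \(-3x < -15\), and ends with \(x < 5\). In a correct solution every line is equivalent to the first, so each line should be true for exactly the same values of \(x\). The helper name is hypothetical.

```python
# Assumed transcription of attempted solution D, one line at a time.
lines_of_d = [
    ("17 - 3x < 2", lambda x: 17 - 3 * x < 2),
    ("-3x < -15", lambda x: -3 * x < -15),
    ("x < 5", lambda x: x < 5),
]


def first_broken_line(lines, x):
    """Return the first line that disagrees with line 1 at this x, or None."""
    first_is_true = lines[0][1](x)
    for text, holds in lines[1:]:
        if holds(x) != first_is_true:
            return text
    return None


print(first_broken_line(lines_of_d, 10))  # x < 5
```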

We could explore what is wrong with attempted solution D by adding \(3x\) to both sides of \(-3x < -15\), to obtain

\[ \begin{alignedat}{3} \phantom{1}0 &< -15 + 3 &&x \end{alignedat} \]

Then we can add \(15\) to both sides and divide by \(3\):

\[ \begin{alignedat}{3} 15 &< \phantom{-15 + {}} 3 &&x \\ \phantom{1}5 &< \phantom{-15 + {}} &&x, \end{alignedat} \]

as in solution C.

Learners can see, by trying with numbers, that if

\[ \phantom{-}2 < \phantom{-}3, \]

say, then multiplying by \(10\), or dividing by \(10\), both lead to true inequalities:

\[ \begin{alignedat}{2} \phantom{-}20 &< \phantom{-}30 \\ \phantom{-}0.2 &< \phantom{-}0.3 \end{alignedat} \]

However, multiplying or dividing by \(-10\) leads to false statements:

\[ \begin{alignedat}{2} -20 &< -30 && \rlap{\quad \text{False}} \\ -0.2 &< -0.3 && \rlap{\quad \text{False.}} \end{alignedat} \]

These are not false in the sense that they have no correspondence at all with reality; they just need their inequality signs turning around, and then they will be true. Any false inequality will become true if the inequality sign is turned around (unless the two sides are equal).

To make the ‘\(2\)’ in \(2 < 3\) negative, we subtract \(2\) from both sides, to get \(0 < 3 - 2\), and then we do the same thing with the \(3\), subtracting it from both sides, to get \(- 3 < - 2\), which is correct, and equivalent to \(- 2 > - 3\), with the ‘less than’ sign turned into a ‘greater than’ sign.

It is worth spending time on this, and doing it thoroughly, because the ‘shock’ of this discovery can make it memorable. If you just tell learners the rule that multiplication or division by a negative number switches around the inequality sign, and they just shrug and add it to a list of numerous other half-understood, half-forgotten mathematics rules, then it is unlikely to make much of an impression.
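The experiment with \(2 < 3\) can also be run as a short sketch. The function name `check` is hypothetical. It applies ‘multiply by \(k\)’ and ‘divide by \(k\)’ to both sides and reports whether the resulting ‘less than’ statements are still true.

```python
def check(k):
    """Multiply, then divide, both sides of 2 < 3 by k; is each '<' still true?"""
    return 2 * k < 3 * k, 2 / k < 3 / k


for k in (10, -10):
    print(k, check(k))  # True for k = 10, False for k = -10
```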

A harder-to-notice instance of this occurs when multiplying or dividing both sides of an inequality by something involving \(x\), whenever there is a possibility that \(x\) could be negative. (By comparison, multiplying or dividing by something like \(x^{2}\) never reverses the inequality, since \(x^{2}\) is never negative, although we do need \(x \neq 0\), so that we are not multiplying or dividing by zero.)

For example, if trying to solve

\[\dfrac{15}{x} < 5,\]

it could be tempting to multiply both sides by \(x\), like we would if solving the corresponding equation, as shown in Figure 2.18.

Figure 2.18: Solving equations and inequalities compared.

It is certainly true that, for all values of \(x > 3\), the inequality \(\displaystyle \dfrac{15}{x} < 5\) is satisfied. But there are infinitely many other numbers that satisfy this inequality as well, which we are missing.

For example, \(x = - 3\) satisfies the original inequality, since

\[\dfrac{15}{( - 3)} = - 5 < 5.\]

Why have we lost solutions like this one?

The mistake was to multiply both sides by \(x\) without worrying about whether \(x\) might be negative. Effectively, we assumed \(x > 0\) when we did this, and so we lost any negative solutions.

If \(x < 0\), then the inequality sign would turn around, to give

\[15 > 5x\]

\[3 > x,\]

telling us that, when \(x\) is negative, the inequality is equivalent to \(x < 3\), which every negative number satisfies; so all values \(x < 0\) are solutions.

An easier way to see this is to avoid multiplying up by \(x\) altogether, and instead subtract \(5\):

\[ \begin{alignedat}{2} & \displaystyle \dfrac{15}{x} && < 5 \\[2ex] & \displaystyle \dfrac{15}{x} - 5 && < 0 \\[2ex] & \displaystyle \dfrac{15 - 5x}{x} && < 0. \end{alignedat} \]

For a fraction to be \(< 0\) (i.e. negative), we need the numerator and denominator to have opposite signs, and so the two possibilities are:

\[\text{Either } 15 - 5x < 0 \text{ and } x > 0\] \[\text{Or } 15 - 5x > 0 \text{ and } x < 0.\]

The first case is \(15 < 5x\) (which is \(3 < x\)) and \(x > 0\), which are both true if \(x > 3\).

The second case is \(15 > 5x\) (which is \(3 > x\)) and \(x < 0\), which are both true if \(x < 0\).

Taken together, the solution set is \(x < 0\) or \(x > 3\), so we get a pair of disjoint intervals; there is an interval from \(0\) to \(3\) in between them, containing values that do not satisfy the inequality.

This solution method, in which we subtract \(5\) from both sides, obtaining a zero on the right-hand side, is reminiscent of solving quadratic equations by factorisation. Having a quotient equal to zero is just as handy as having a product equal to zero (the zero-product property). The zero-quotient property states that a quotient is zero when the numerator is zero and the denominator isn’t.
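Substituting sample values is a quick way to check the solution set \(x < 0\) or \(x > 3\). It also shows what the tempting answer \(x > 3\) misses, such as \(x = -3\). The sample values below are arbitrary, \(x = 0\) is excluded because \(\dfrac{15}{x}\) is undefined there, and the helper name is hypothetical.

```python
def satisfies_15_over_x(x):
    """Is 15/x < 5 true for this (non-zero) value of x?"""
    return 15 / x < 5


samples = [-100, -3, -0.5, 0.5, 1, 3, 3.5, 100]  # x = 0 is excluded
for x in samples:
    # Compare each value with the solution set found above: x < 0 or x > 3.
    print(x, satisfies_15_over_x(x), x < 0 or x > 3)
```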

2.10.5 Rearranging equations

A lot of the mathematics in school science can be reduced to substituting into formulae. Often, when learners are doing this, they are not necessarily thinking about the meaning of what they are substituting. Learners in science have been known to scan through a formula booklet, looking for any formula that contains some of the same letters as the initial letters of the quantities in the question they are tackling, hoping to find a match. For example, \(v\) could be velocity, voltage or volume, perhaps.

In mathematics too, it is easy for geometry topics to end up not really being about shapes, but about number and algebra. A question about finding a supplementary angle becomes about finding number bonds to \(180\), and a question about triangles becomes about rearranging trigonometric equations.

Having said this, being fluent in handling formulae is an important skill within mathematics, and particularly for STEM subjects, and many learners complain about the difficulty of rearranging equations. Often, learners will rearrange an equation and then substitute in the values, when it is usually easier to do it the other way round.

Consider the equation \(v^{2} = u^{2} + 2as\) (from kinematics, but it doesn’t matter where it is from). Suppose the learner is told that \(v = 10\), \(u = 20\) and \(a = - 10\), and is asked to find the value of \(s\).

They might begin by rearranging the equation, but there are many ways in which they could go wrong. It is much easier to substitute the numbers first, because that way you take advantage of anything that simplifies along the way.

Figure 2.19 shows it both ways in parallel.

Figure 2.19: Comparing substituting first with rearranging first.

On the left in Figure 2.19, the numbers combine nicely, and simplify, whereas on the right all this has to be handled algebraically, and the simplification happens all at once at the end.
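For the values given above (\(v = 10\), \(u = 20\), \(a = -10\)), the two routes can be sketched as follows. This is a reconstruction from those values, so the layout in Figure 2.19 itself may differ. Substituting first:

\[ \begin{aligned} 10^{2} &= 20^{2} + 2 \times (-10) \times s \\ 100 &= 400 - 20s \\ 20s &= 300 \\ s &= 15. \end{aligned} \]

Rearranging first:

\[ s = \dfrac{v^{2} - u^{2}}{2a} = \dfrac{10^{2} - 20^{2}}{2 \times (-10)} = \dfrac{-300}{-20} = 15. \]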

Rearranging may save time if you have multiple substitutions to do into the same formula, but this is rare in school mathematics, and nowadays a calculator’s table function would take care of this.

Despite this, rearranging equations is certainly a skill that learners need to master. Rearranging an equation is a little like finding an inverse function (Chapter 4). It is not exactly the same, because often the formulae being rearranged are not functions, or are functions of many variables.

Really, rearranging an equation is very similar to solving an equation, but without the simplifying – so, in a sense, it should be easier. This is how I would normally encourage learners to think about it. The language of ‘subject’ can provide useful analogies with sentences in English which can be framed in active or passive voice, swapping the subject.40

Let’s look at the same equation we solved earlier:

\[5x - 11 = 3x + 9.\]

Suppose we want to make \(x\) the subject. This is what we were doing when we were ‘solving for \(x\)’.

Let’s do the same steps, but this time only simplify on the left-hand side, and not on the right-hand side:

\[ \begin{alignedat}{2} & 5x - 11 + 11 &&= 3x + 9 + 11 \\[2ex] & 5x &&= 3x + 9 + 11 \\[2ex] & 2x &&= \phantom{3x + {}} 9 + 11 \\[2ex] & \phantom{5}x &&= \phantom{3x + {}} \! \dfrac{9 + 11}{2} \end{alignedat} \]

Of course, if we now simplify \(\displaystyle \dfrac{9 + 11}{2}\), then we will get \(x = 10\), as before. Why would we not simplify as we go?

Imagine you had to solve not one equation, but dozens of them, perhaps like these:

\[ \renewcommand{\arraystretch}{2.5} \begin{array}{|l|l|l|l|} \hline 5x - \phantom{0}5 = 3x + 17 & 5x - 13 = 3x + \phantom{0}5 & 5x - \phantom{0}7 = 3x + 57 & 5x - 10 = 3x + 90 \\ \hline 5x - \phantom{0}6 = 3x + 20 & 5x - 45 = 3x + \phantom{0}1 & 5x - \phantom{0}2 = 3x + \phantom{0}5 & 5x - 58 = 3x + \phantom{0}2 \\ \hline 5x - 20 = 3x + \phantom{0}8 & 5x - 63 = 3x + \phantom{0}5 & 5x - 14 = 3x + 30 & 5x - \phantom{0}1 = 3x + \phantom{0}9 \\ \hline \end{array} \]

If we had to summarise what is common about all of these equations, we’d say that they are all of the form:

\[5x - a = 3x + b,\]

where \(a\) and \(b\) are numbers. The \(a\)s and \(b\)s change from equation to equation, but the \(5x\) and \(3x\) are fixed.

If we have lots of these equations to solve, then knowing

\[x = \dfrac{9 + 11}{2}\]

is much more useful than knowing \(x = 10\), even though \(10\) is ‘the answer’.

The equation \(x = \dfrac{9 + 11}{2}\) tells us that we add ‘the \(9\)’ and ‘the \(11\)’, and divide the answer by \(2\), and we will do this regardless of what ‘the \(9\)’ and ‘the \(11\)’ were.

We could say:

\[x = \dfrac{a + b}{2},\]

and that would summarise what we need to do every time.

The point is that we could get this, step by step, from our starting equation.

We could add \(a\) to both sides, even though we don’t know the value of \(a\) (or perhaps even though we know that it is going to take multiple possible values):

\[ \renewcommand{\arraystretch}{2.5} \begin{array}{l} 5x - a + a = 3x + b + a \\ 5x \phantom{{}- a + a} = 3x + b + a \\ 2x \phantom{{}- a + a} = \phantom{3x + {}} b + a \\ \phantom{2}x \phantom{{}- a + a} = \phantom{3x + {\!}} \displaystyle \dfrac{b + a}{2} \end{array} \]

This time, instead of getting a numerical value of \(x\), we get a formula, in which \(x\) is now the subject.
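The ‘rearrange once, substitute many times’ idea can be sketched in a few lines of Python. This is only an illustration: the function name `solve` is hypothetical, and the \((a, b)\) pairs are read off the twelve equations in the table of the form \(5x - a = 3x + b\). Exact fractions are used so that a non-integer solution such as \(x = \tfrac{7}{2}\) is not rounded.

```python
from fractions import Fraction

def solve(a, b):
    """Solve 5x - a = 3x + b using the rearranged formula x = (a + b) / 2."""
    return Fraction(a + b, 2)

# (a, b) pairs read off the twelve equations 5x - a = 3x + b in the table
table = [(5, 17), (13, 5), (7, 57), (10, 90),
         (6, 20), (45, 1), (2, 5), (58, 2),
         (20, 8), (63, 5), (14, 30), (1, 9)]

for a, b in table:
    x = solve(a, b)
    # substitute back into the original equation to check each solution
    assert 5 * x - a == 3 * x + b
    print(f"5x - {a} = 3x + {b}  gives  x = {x}")
```

The rearranging is done once, in `solve`; every subsequent equation only needs a substitution.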

Rearranging equations is very important in science. The whole purpose is to make substitution more efficient by rearranging once and substituting many times, rather than rearranging (i.e. solving and simplifying) every time you substitute. However, as I mentioned at the start, if you are only going to be substituting once, it is usually not worth rearranging.

Learners often need to be able to rearrange a wider variety of equations than the kinds of equations they are expected to be able to solve.

For example, the equation

\[\dfrac{1}{R_{T}} = \dfrac{1}{R_{1}} + \dfrac{1}{R_{2}},\]

for the total resistance \(R_{T}\) of two resistors, \(R_{1}\) and \(R_{2}\), in parallel with each other, is well known in physics. The teacher may say, “It’s just adding fractions”, but learners often struggle to see what to do.

They might try to invert each term, to obtain

\[R_{T} = R_{1} + R_{2}.\]

But that is the equation for resistors in series, not in parallel. The science tells us that two resistors have a lower total resistance when they are connected in parallel than when they are connected in series, because the current has more ways to go, so this attempted rearrangement must be wrong. It is wrong because we have not actually done the same operation to both sides of the equation. If we wanted to find the reciprocal of both sides, we would have to write

\[R_{T} = \dfrac{1}{\dfrac{1}{R_{1}} + \dfrac{1}{R_{2}}},\]

and the right-hand side does not simplify to \(R_{1} + R_{2}.\)

Our compound fraction is the right answer, but it isn’t in a simplified form.

Instead of inverting both sides like this, the easiest way whenever we have fractions is usually to multiply through to clear all the fractions in the first step.

In this case, beginning with \(\displaystyle \dfrac{1}{R_{T}} = \dfrac{1}{R_{1}} + \dfrac{1}{R_{2}}\), we would need to multiply both sides by \(R_{T}R_{1}R_{2}\).

This gives

\[\dfrac{R_{T}R_{1}R_{2}}{R_{T}} = \dfrac{R_{T}R_{1}R_{2}}{R_{1}} + \dfrac{R_{T}R_{1}R_{2}}{R_{2}}.\]

Cancelling down each fraction gives

\[R_{1}R_{2} = R_{T}R_{2} + R_{T}R_{1}.\]

Factorising the right-hand side,

\[R_{1}R_{2} = R_{T}\left( R_{2} + R_{1} \right),\]

and dividing both sides by \(R_{1} + R_{2}\),

\[\dfrac{R_{1}R_{2}}{R_{1} + R_{2}} = R_{T}.\]

But there are plenty of other ways to do this.

For example, we can use a common denominator of \(R_{1}R_{2}\) to add \(\displaystyle \dfrac{1}{R_{1}} + \dfrac{1}{R_{2}}\):

\[\dfrac{1}{R_{T}} = \dfrac{1}{R_{1}} + \dfrac{1}{R_{2}} = \dfrac{R_{2}}{R_{1}R_{2}} + \dfrac{R_{1}}{R_{1}R_{2}} = \dfrac{R_{1} + R_{2}}{R_{1}R_{2}}.\]

Now, inverting both sides gives

\[R_{T} = \dfrac{R_{1}R_{2}}{R_{1} + R_{2}},\]

as before.

We can prove that \(\displaystyle \dfrac{R_{1}R_{2}}{R_{1} + R_{2}}\) (the total resistance in parallel) is always going to be less than \(R_{1} + R_{2}\) (the total resistance in series), because the total resistance in parallel actually has to be less than even the smaller of the two resistances. This makes sense in terms of the science, because adding a second resistor in parallel, even one of very high resistance, provides at least some additional pathway for the current, and so allows a little more current through for the same potential difference.

Let’s suppose (without loss of generality) that \(R_{1} < R_{2}\).

We know that \({R_{1}}^{2} > 0\), and so, adding \(R_{1}R_{2}\) to both sides,

\[{R_{1}}^{2} + R_{1}R_{2} > R_{1}R_{2}.\]

Factorising,

\[R_{1}\left( R_{1} + R_{2} \right) > R_{1}R_{2}.\]

Since \(R_{1}\) and \(R_{2}\) are both \(> 0\), so is \(R_{1} + R_{2}\), and we can safely divide both sides of the inequality by \(R_{1} + R_{2}\), to give

\[R_{1} > \dfrac{R_{1}R_{2}}{R_{1} + R_{2}}.\]

This means that the total resistance of these two resistors connected in parallel is less than the smaller of the two resistances, \(R_{1}\).
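A quick numerical check of both results can be sketched in Python. The resistances below are arbitrary illustrative values (whole numbers of ohms, assumed so that the arithmetic stays exact), and the function names are hypothetical.

```python
from fractions import Fraction

def parallel(r1, r2):
    """Total resistance of two resistors in parallel: R_T = R1*R2 / (R1 + R2)."""
    return Fraction(r1 * r2, r1 + r2)

def series(r1, r2):
    """Total resistance of two resistors in series: R_T = R1 + R2."""
    return r1 + r2

# illustrative pairs of resistances (in ohms)
for r1, r2 in [(1, 1), (2, 3), (10, 1000), (4, 12)]:
    rt = parallel(r1, r2)
    # the rearranged formula still satisfies 1/R_T = 1/R1 + 1/R2
    assert 1 / rt == Fraction(1, r1) + Fraction(1, r2)
    # parallel total is below the smaller resistance, and so below the series total
    assert rt < min(r1, r2) < series(r1, r2)
    print(r1, r2, rt)
```

A check like this is not a proof, of course; the algebraic argument above is what guarantees the inequality for every pair of positive resistances.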

An interesting pure mathematics context for practising substituting numbers into formulae is to test prime-generating formulae to see for which values of integer \(n\) they produce prime numbers. It is not possible to create a simple polynomial formula (other than a constant) that will always give prime numbers for every integer value, but there are several formulae that give primes for many values of \(n\).

For example, \(n^{2} + n + 41\) produces primes for values of \(n\) from \(0\) to \(39\). Clearly, it cannot work for \(n = 41\), because \(41^{2} + 41 + 41 = 41(41 + 2) = 41 \times 43\), which is semiprime (the product of two primes), not prime.

It also, but less obviously, doesn’t work for \(n = 40\), because \(40^{2} + 40 + 41 = 40^{2} + 2 \times 40 + 1^{2} = (40 + 1)^{2} = 41^{2}\).

The similar formula \(n^{2} - n + 41\) produces primes for values of \(n\) from \(0\) to \(40\), but the primes for \(n = 0\) and \(n = 1\) are the same prime (\(41\)). Other formulae that learners could explore, together with the values of \(n\) for which they give primes, are shown in Figure 2.20.

Figure 2.20: Prime-generating formulae.

Learners might like to invent their own formulae and see how many primes (or how many consecutive primes) they can get from them.
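Testing formulae like this lends itself to a short program. The sketch below, with hypothetical function names, counts how many consecutive values of \(n\), starting from \(0\), give a prime; trial division is assumed to be adequate because the values involved are small. The `limit` guard is there because a constant prime ‘formula’ would otherwise never stop.

```python
def is_prime(n):
    """Trial division: adequate for the small values these formulae produce."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def consecutive_primes(formula, limit=10_000):
    """Count how many consecutive n = 0, 1, 2, ... give a prime, up to `limit`."""
    n = 0
    while n < limit and is_prime(formula(n)):
        n += 1
    return n

print(consecutive_primes(lambda n: n**2 + n + 41))  # primes for n = 0 to 39
print(consecutive_primes(lambda n: n**2 - n + 41))  # primes for n = 0 to 40
```

Learners’ own formulae can be passed in the same way, as a `lambda` in `n`.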

2.10.6 Priority of operations

Not everyone agrees, but to me the priority of operations is an arbitrary matter; it could have been otherwise, and is a notational convenience.41

Because we often want to write ‘polynomial’ calculations like

\[4 \times 10^{3} + 5 \times 10^{2} + 3 \times 10^{1} + 2 \times 10^{0} = 4532,\]

it is very convenient to allow powers to take priority over multiplication, and multiplication to take priority over addition. This means, for instance, that we know in this expression that with \(4 \times 10^{3}\) we work out \(10^{3}\) first, and then multiply the answer by \(4\), before adding on the other terms. If we did not have this agreed convention, we would have to put in a lot of brackets, and this would be cumbersome and time consuming:

\[ \bigl( 4 \times ( 10^{3} ) \bigr) + \bigl( 5 \times ( 10^{2} ) \bigr) + \bigl( 3 \times ( 10^{1} ) \bigr) + \bigl( 2 \times ( 10^{0} ) \bigr) = 4532. \]

Brackets force the operations to be carried out in a particular order; without brackets, and without an agreed order, expressions would be ambiguous.

Sometimes our notation provides implied brackets that help us see the priority.

For example, by writing indices as superscripts, as in \(2^{3 + 1}\), rather than using an up arrow or a caret, as in \(2 \uparrow 3 + 1\) or \(2 \textasciicircum 3 + 1\), we can discern that the \(3 + 1\) is to be evaluated first.

So, \(2^{3 + 1} = 2^{4} = 16\), whereas \(2^{3} + 1 = 8 + 1 = 9\).

Without the superscript notation, we would need brackets to indicate this, as \(2 \uparrow (3 + 1)\) or \(2 \textasciicircum (3 + 1)\).

Similarly, by always writing division using a vinculum (the horizontal line in \(\displaystyle \dfrac{6}{4 + 1}\)), rather than with an obelus (e.g. as \(6 \div (4 + 1)\)), the vinculum operates as implied brackets, saving us from having to write actual brackets. By using helpful notation, we can limit the number of brackets needed.

A very common way to introduce to learners the idea that some operations take priority over others is to ask them to calculate something like \(3 + 4 \times 5\).

Unless they are already familiar with the convention, some are likely to give the conventionally correct answer, \(23\) (i.e. \(3 + 20\)), and others the conventionally incorrect answer, \(35\) (i.e. \(7 \times 5\)).

By agreeing that multiplication, as the ‘more powerful’ operation, happens first, we can resolve this ambiguity, so that everyone gets the same answer, and we avoid confusion in our communication. If we want to override the standard order, we can always use brackets to do so: \((3 + 4) \times 5 = 35\).

In a similar way, the more ‘powerful’ powers and roots take precedence over multiplication (and therefore over addition), so \(E = mc^{2} = mcc\), not \((mc)^{2},\) which would be \(mcmc\).

This means we can write Pythagoras’ Theorem without needing any brackets: \(a^{2} + b^{2} = c^{2}\), not \(\left( a^{2} \right) + \left( b^{2} \right) = \left( c^{2} \right)\).
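Programming languages have to adopt a priority convention too. As an illustration (assuming Python, where `**` is the power operator), the examples above come out the same way:

```python
# Python uses the same conventional priorities:
# powers before multiplication, multiplication before addition.
print(3 + 4 * 5)      # multiplication first: 3 + 20
print((3 + 4) * 5)    # brackets override the convention: 7 * 5
print(2 ** (3 + 1))   # like the superscript in 2^{3 + 1}
print(2 ** 3 + 1)     # power first, then add
print(4 * 10**3 + 5 * 10**2 + 3 * 10**1 + 2 * 10**0)  # no brackets needed
```

Learners can predict each output before running the code, and then check.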

However, sometimes we will put in brackets just to ‘be on the safe side’, even if we do not strictly need them.

For example, brackets may be helpful to show that \(2 \times 3 + 4 \times 5\) means \((2 \times 3) + (4 \times 5)\).

People often say that division and multiplication must have equal priority, and the same for addition and subtraction, and that it is a misconception to think that in these pairs one should happen before the other. However, if you go in the order shown in Figure 2.21, you never need to bother about equal priorities.42

Figure 2.21: The conventional priority of operations.

‘Brackets’ is not an operation, and learners are sometimes confused when it appears in a list similar to the one in Figure 2.21. If you wish to use a mnemonic for IDMSA, it could be I Don’t Mind Solving Anything!

To begin with, learners may find it helpful to do only one operation per line, and perhaps use a highlighter pen to indicate the part of the calculation they are doing at each step, as illustrated in Figure 2.22.

Figure 2.22: Evaluating an expression using the conventional priority of operations.

Learners may like to experiment with what would happen if the priority of operations had been decided differently.43

The Number Snakes task44 can support thinking about priority of operations, and a very nice task for working on this topic is ‘Four fours’:45

TASK 2.11

Using addition, subtraction, multiplication and division only, what numbers can you make from four \(4\)s?

It can be useful to extend this to allow other operations, such as square roots.

This task can also be done with other sets of four numbers, such as digits from a relevant date.46
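Once learners have found some numbers by hand, a teacher might want to know which targets are reachable at all. One way to sketch this (hypothetical function names; only \(+\), \(-\), \(\times\) and \(\div\), with no joining of digits such as \(44\) and no square roots) is to split the four \(4\)s into a left group and a right group in every possible way, combine the values from each group, and repeat. Exact fractions are used so that intermediate results such as \(\tfrac{4}{4 + 4}\) are not rounded.

```python
from fractions import Fraction
from functools import lru_cache

def combine(left, right):
    """All results of applying +, -, ×, ÷ to one value from each set."""
    out = set()
    for x in left:
        for y in right:
            out.update({x + y, x - y, x * y})
            if y != 0:                      # skip division by zero
                out.add(x / y)
    return out

@lru_cache(maxsize=None)
def reachable(count, digit=4):
    """Every value made from `count` copies of `digit`, with any bracketing."""
    if count == 1:
        return frozenset({Fraction(digit)})
    results = set()
    for k in range(1, count):               # left group of k, right group of count - k
        results |= combine(reachable(k, digit), reachable(count - k, digit))
    return frozenset(results)

whole_numbers = sorted(int(v) for v in reachable(4) if v.denominator == 1 and 0 <= v <= 20)
print(whole_numbers)
```

Changing `digit` gives the same exploration for other repeated digits; sets of different numbers, such as the digits of a date, would need the groups to track which numbers have been used.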

Another puzzle-type task is this one.

TASK 2.12

Insert operations in the gaps, including brackets where necessary, to make this into an equation:
\[9\ \ \ \ \ \ 9\ \ \ \ \ \ 9\ \ \ \ \ \ 9\ \ \ \ \ \ 9\ \ \ \ \ \ 9 = 162\]
Make up other puzzles like this.

One possible answer is \(9 \times (9 + 9) + (9 - 9) \times 9 = 162\).

2.10.7 Standard algorithms

You might think that the standard algorithms for addition, subtraction, multiplication and division ought to belong in a chapter about numbers, rather than in one about algebra. But the point of algorithms is that they always work – they are not dependent on the specifics of particular numbers, but they work generally, in every case.47 The reason we can be sure of this is that they are built on the algebraic properties of number, such as the distributive law.

If these algorithms are still worth teaching in schools, it is probably no longer because of their everyday computational value but because they give insight into how numbers combine.

2.10.7.1 Addition

From an algebraic point of view, the addition algorithm is rather like collecting like terms, where the like terms are multiples of powers of \(10\).

For example, suppose we want to add \(354\) and \(768\).

We can think of this as:

\[354 = 3 \times 100 + 5 \times 10 + 4 \times 1\]

\[768 = 7 \times 100 + 6 \times 10 + 8 \times 1.\]

When we add these together, we get

\[354 + 768 = (3 + 7) \times 100 + (5 + 6) \times 10 + (4 + 8) \times 1.\]

This is just collecting together the \(1\)s, the \(10\)s and the \(100\)s.

The only tricky part is that sometimes, as here, there are more than \(9\) of some of these collections:

\[(3 + 7) \times 100 = \textcolor{darkgray}{1} \times 1000\]

\[(5 + 6) \times 10 = \textcolor{darkgray}{1} \times 100 + 1 \times 10\]

\[(4 + 8) \times 1 = \textcolor{darkgray}{1} \times 10 + 2 \times 1.\]

The numbers shown in grey are the ‘carries’, which move over into the next column.

When adding two numbers, each carry can never be more than \(1\), because the greatest possible sum in any column will be \(9 + 9 = 18\), which gives a carry of \(1\).

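However, when adding more than two numbers, the carry can be more than \(1\). In general, when adding \(n\) numbers, the maximum possible carry is \(n - 1\). For up to ten numbers, the greatest possible total in the \(1\)s column is \(9n = 10(n - 1) + (10 - n)\), whose \(10\)s digit is \(n - 1\). In later columns, an incoming carry of at most \(n - 1\) raises the greatest possible total to \(9n + (n - 1) = 10n - 1\), which still gives a carry of only \(n - 1\).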

After collecting up the carries, we just collect like terms again:

\[354 + 768 = (1 \times 1000) + (1 \times 100 + 1 \times 10) + (1 \times 10 + 2 \times 1) = 1122.\]

Of course, it is much more usual to write this in vertical form, to save all the unnecessary writing of the powers of \(10\), but it is exactly the same thing, and learners can be quite amazed to see this.
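The ‘collecting like terms’ view of the addition algorithm can be sketched in Python. The function name `column_add` is hypothetical. It works through the powers of \(10\) from the \(1\)s upwards, keeps one digit in each column and passes the rest on as the carry, exactly as in the working above.

```python
def column_add(numbers):
    """Add non-negative integers column by column, as in the written algorithm.

    Returns the total and the carry produced by each column, 1s column first.
    """
    width = max(len(str(n)) for n in numbers)
    padded = [str(n).zfill(width) for n in numbers]   # line up the place values
    carry = 0
    carries = []
    total = 0
    for power in range(width):                        # 1s, then 10s, then 100s, ...
        column = width - 1 - power
        column_sum = carry + sum(int(p[column]) for p in padded)
        total += (column_sum % 10) * 10**power        # digit that stays in this column
        carry = column_sum // 10                      # the rest moves to the next column
        carries.append(carry)
    total += carry * 10**width                        # any final carry leads the answer
    return total, carries

print(column_add([354, 768]))   # (1122, [1, 1, 1])
print(column_add([99] * 11))    # eleven numbers: the carries reach n - 1 = 10
```

The second example shows a carry greater than \(9\), which can only happen when more than ten numbers are added.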

2.10.7.2 Subtraction

The subtraction algorithm is exactly the same as this, but in reverse.

With addition, we ran into the difficulty of getting a total of more than \(9\) in a column, which leads to a ‘carry’. With subtraction, we have the opposite problem: we have to engineer getting more than \(9\) in a column, so we have enough in the minuend (the number we are subtracting from) to subtract the subtrahend from it. This is known as borrowing or exchanging.

Let’s do \(1122 - 354\), expecting to get \(768\), since this is the inverse of the addition calculation we just carried out.

We begin by thinking about how many we have of each power of \(10\):

\[\begin{alignedat}{4} 1122 &= 1 \times 1000 & {}+{} & 1 \times 100 & {}+{} & 2 \times 10 & {}+{} & 2 \times 1 \\ 354 &= & & 3 \times 100 & {}+{} & 5 \times 10 & {}+{} & 4 \times 1 \end{alignedat}\]

When we do the subtraction, we find that, in every column, we are trying to take away more of that power of \(10\) than the minuend has in that column. So, we need to exchange.

Let’s swap a \(10\) for ten \(1\)s:

\[1122 = 1 \times 1000 + 1 \times 100 + 1 \times 10 + 12 \times 1.\]

Now, we can subtract the four \(1\)s in the subtrahend, because we have more than four \(1\)s in the minuend:

\[1122 - 4 = 1 \times 1000 + 1 \times 100 + 1 \times 10 + 8 \times 1\]

Now, we need more \(10\)s than we have, so we have to swap the \(100\) for ten \(10\)s:

\[1122 - 4 = 1 \times 1000 + 11 \times 10 + 8 \times 1.\]

Now, we have enough \(10\)s to subtract the five \(10\)s in the subtrahend:

\[1122 - 54 = 1 \times 1000 + 6 \times 10 + 8 \times 1.\]

Finally, we exchange the \(1000\) for ten \(100\)s:

\[1122 - 54 = 10 \times 100 + 6 \times 10 + 8 \times 1.\]

Now, we have enough \(100\)s to subtract the three \(100\)s in the subtrahend:

\[1122 - 354 = 7 \times 100 + 6 \times 10 + 8 \times 1.\]

And our answer is \(768\).

In the more usual, economical notation, we have:

I am actually not too keen on exchanging, and for learners familiar with negative numbers, these kinds of subtractions are much easier to do using negative numbers.48

Instead of saying, for example, for the \(1\)s, that “two minus four doesn’t go” or “can’t be done”, once learners know about negative numbers, they are likely to protest that there is an answer.

So, we can accept this, and just work it out in directed numbers:

\[\begin{aligned} 2 - 4 &= - 2 \\ 20 - 50 &= - 30 \\ 100 - 300 &= - 200. \end{aligned}\]

In column layout we have:

Instead of little crossings out and exchanged numbers, this time there are little negative signs before the \(2\), \(3\) and \(2\) on the answer line.

Now, the answer, reading left to right, is \(1000 - 200 - 30 - 2\), and the number we get at this point is always easy to work out if we tackle it in this order:

I think this is usually an easier way to do subtractions.
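The directed-number method can be sketched in Python as well. The function name is hypothetical, and the sketch assumes that the minuend is at least as large as the subtrahend and that both are non-negative integers. Each column is allowed to give a negative result, and the signed parts are then combined from left to right.

```python
def subtract_with_negatives(minuend, subtrahend):
    """Column subtraction without exchanging: a column may give a negative result.

    Assumes minuend >= subtrahend >= 0.
    """
    width = len(str(minuend))
    top = str(minuend).zfill(width)
    bottom = str(subtrahend).zfill(width)
    # signed difference in each column, written as a multiple of its power of 10
    parts = [(int(t) - int(b)) * 10 ** (width - 1 - i)
             for i, (t, b) in enumerate(zip(top, bottom))]
    # combine the parts left to right, as in the text
    result = 0
    for part in parts:
        result += part
    return parts, result

print(subtract_with_negatives(1122, 354))   # ([1000, -200, -30, -2], 768)
```

The list of parts corresponds to the answer line with its little negative signs; only the final combination involves any thought about exchanging.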

A rich, well-known task that can be used for practising integer addition and subtraction is ‘\(1089\)’:49

TASK 2.13

Write down a three-digit number.
Now write the number with the digits in the reverse order.
Subtract the smaller number from the larger number and write down the answer.
Reverse this number and add it on.
What do you get?

What happens with different starting numbers?
What happens if you start with a four-digit number instead?

If learners begin with a palindromic number, such as \(313\), they will get zero; otherwise, with one caveat, they should always get the answer \(1089\), which is surprising, interesting and also a convenient check on their calculations.

By unpicking what is going on, learners can see why the answer always comes to \(1089\). It is easiest to do this by following through a generic example, such as \(265\).

First, we subtract \(265\) from \(562\). The \(10\)s digit, \(6\), will disappear (\(60 - 60 = 0\)), and we can see that this will always happen, whatever the \(10\)s digit is, because it will be the \(10\)s digit of both numbers. So, our subtraction will leave us with \(5 - 2 = 3\) hundreds and \(2 - 5 = - 3\) ones.

The \(3\) comes from the difference between the \(100\)s digit and the \(1\)s digit in the original number, and we could call this difference \(d\). In total, for \(265\), our subtraction will give us \(300 - 3\), which is \(99 \times 3\).

In general, the subtraction step is always going to give the \(d\)th multiple of \(99\), because it will always be \(100d - d\), which is \(99d\).

There are only \(9\) possible different numbers you can get here, because there are only \(9\) different \(d\) values (\(1\) to \(9\)). This means that, whatever three-digit number you begin with, there are only \(9\) possible values you can get for your subtraction, which is a surprise to learners. The differences are listed in Figure 2.23.

Figure 2.23: The \(9\) possible results from the subtraction step.

If we look at the multiples of \(99\) from \(2 \times 99 = 198\) to \(9 \times 99 = 891\), we notice that the \(10\)s digit is always \(9\), and the \(100\)s and the \(1\)s digits always sum to \(9\). So, if we reverse and add any one of these values, we must get \(9\) hundreds, \(2 \times 9\) tens and \(9\) ones.

That makes

\[900 + 2 \times 90 + 9 = 1089.\]

What about the \(d = 1\) case? If we reverse \(99\), we get \(99\), and if we add those together we get \(2 \times 99 = 198\), so a starting number in which the difference (\(d\)) between the \(100\)s and the \(1\)s digits is only \(1\) seems to always give us \(198\), rather than \(1089\). However, if we instead write \(99\) as a \(3\)-digit number, \(099\), this makes a reverse number of \(990\), which then fits the pattern, because \(099 + 990 = 1089.\)

The \(d = 0\) case doesn’t work at all, because no difference between the \(100\)s and the \(1\)s digits means we have a palindromic number, and the first subtraction leads to zero.
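Learners who can program may like to confirm the argument by checking every three-digit starting number. In this sketch (hypothetical function names), every number is treated as having exactly three digits, so \(99\) is read as \(099\), which is the convention that makes the \(d = 1\) case fit the pattern.

```python
def reverse3(n):
    """Reverse a number written with exactly three digits (so 99 is read as 099)."""
    return int(str(n).zfill(3)[::-1])

def trick_1089(n):
    """Reverse, subtract the smaller from the larger, then reverse and add."""
    difference = abs(n - reverse3(n))
    return difference + reverse3(difference)

differences = {abs(n - reverse3(n)) for n in range(100, 1000)}
results = {trick_1089(n) for n in range(100, 1000)}
print(sorted(differences))   # 0 and the nine multiples 99d
print(results)               # 0 (palindromes) or 1089
```

Reversing \(99\) without the leading zero would give \(99\) again, and hence \(198\) rather than \(1089\).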

Once learners know about decimal notation, decimal addition and subtraction don’t involve any new ideas - we just have columns to the right of the \(1\)s, but we handle them in exactly the same way, provided we are careful to align the decimal points in all the numbers.

2.10.7.3 Multiplication

Long and short multiplication depend on the distributive law, that \(a(b + c) = ab + ac\).

Suppose we want to work out \(823 \times 15\).

We can write

\[823 \times 15 = (8 \times 100 + 2 \times 10 + 3 \times 1) \times (1 \times 10 + 5 \times 1).\]

This is just like expanding brackets with algebraic letters. Each number in the first bracket needs multiplying by each number in the second bracket, so we will get \(2 \times 3 = 6\) products, which will then need adding together.

Pairing the \(100\)s and the \(1\)s will make \(100\)s, and so will pairing the \(10\)s and the \(10\)s. Pairing the \(10\)s in the first bracket with the \(1\)s in the second bracket will make \(10\)s, as will pairing the \(1\)s in the first bracket with the \(10\)s in the second bracket. It is good to stop and think about how many terms there are going to be and what order of magnitude each will have before plunging in.

Based on this, we might be able to stand back and see that we are going to get

\[\begin{alignedat}{2} 800 \times 10 &= {} & 8000 \\ 800 \times 5 + 20 \times 10 &= {} & 4200 \\ 20 \times 5 + 3 \times 10 &= {} & 130 \\ 3 \times 5 &= {} & 15 \end{alignedat}\]

This makes a total of \(12,345\), which is a nice answer.
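The bracket expansion can be mirrored in Python. This sketch uses hypothetical function names. It splits each number into digits times powers of \(10\), multiplies every term in one bracket by every term in the other, and groups the six products by the power of \(10\) they produce, which recovers the four lines above.

```python
from collections import defaultdict

def place_value_parts(n):
    """Split n into (digit, power) pairs, e.g. 823 -> [(8, 2), (2, 1), (3, 0)]."""
    s = str(n)
    return [(int(d), len(s) - 1 - i) for i, d in enumerate(s)]

def expand_product(x, y):
    """Multiply x and y by expanding the brackets, grouping terms by power of 10."""
    groups = defaultdict(int)
    for dx, px in place_value_parts(x):
        for dy, py in place_value_parts(y):
            groups[px + py] += (dx * 10**px) * (dy * 10**py)
    return dict(sorted(groups.items(), reverse=True))

parts = expand_product(823, 15)
print(parts)                 # {3: 8000, 2: 4200, 1: 130, 0: 15}
print(sum(parts.values()))   # 12345
```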

In the standard layout, we have:

More efficiently,

A nice task for practising multiplication of integers is the one below.50

TASK 2.14

Using the digits \(1\), \(2\), \(3\) and \(4\), make two numbers which multiply together to give the greatest possible answer.

Learners will have to consider whether to multiply a \(3\)-digit number by a \(1\)-digit number, or a \(2\)-digit number by a \(2\)-digit number. They will want the higher digits in the more ‘powerful’ place value positions of each number, so the digits in any number should decrease from left to right.

After working out all of this, learners may be surprised that the product \(43 \times 21\) does not give anywhere near the greatest possible answer. The product \(43 \times 21 = 903\), but \(42 \times 31 = 1302\), which is much higher, and indeed \(41 \times 32 = 1312\), which is in fact the greatest possible product.

Learners will find it tricky to decide which of \(42 \times 31\) and \(41 \times 32\) will be greater, without working them out, but it should be possible to reason this out without calculating. (Working them out is ‘cheating’!)

One way to think about it is to visualise the products in terms of the areas of rectangles. In Figure 2.24, the dark grey (\(40 \times 30\)) and light grey (\(2 \times 1\)) rectangles are common to both of these products. The difference comes from whether the \(2\) is multiplied by the \(40\) and the \(1\) by the \(30\), or the other way round.

Figure 2.24: Visualising \(42 \times 31 < 41 \times 32\) (not drawn to scale).

It is clearly better to multiply the larger \(1\)s digit by the larger \(10\)s digit, so we want the \(2\) and the \(40\) in different numbers, and this is why \(41 \times 32\) wins out over \(42 \times 31\).

Another way to express this is to note that the (semi-)perimeter of the outer rectangle in the two products will be the same, because \(42 + 31 = (40 + 2) + (30 + 1) = (40 + 1) + (30 + 2) = 41 + 32\). And we can think about looking for the maximum area with a fixed perimeter as being a geometrical problem.

Among rectangles with equal perimeter, the one with the largest area is the one that is closest to being a square. Another way to capture ‘closest to being a square’ is to say that the difference between the two numbers being multiplied should be as small as possible. Since \(42 - 31 = 11 > 9 = 41 - 32\), it follows that \(41 \times 32\) is closer to being a square than \(42 \times 31\) is, and therefore the product \(41 \times 32\) is greater.

There is lots for learners to explore here, and it is easy to increase the challenge by extending to more than four digits.
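After learners have reasoned it out, an exhaustive search can confirm the answer, and it extends easily to more digits. The sketch below (hypothetical function name) tries every ordering of the digits and every place to split them into two numbers. It assumes the digits are non-zero, so no number starts with a leading zero.

```python
from itertools import permutations

def best_split(digits):
    """Largest product from splitting the digits into two numbers, in any order."""
    best = None
    for order in permutations(digits):
        for cut in range(1, len(order)):    # first number takes order[:cut]
            a = int("".join(map(str, order[:cut])))
            b = int("".join(map(str, order[cut:])))
            if best is None or a * b > best[0]:
                best = (a * b, a, b)
    return best

print(best_split([1, 2, 3, 4]))
print(best_split([1, 2, 3, 4, 5]))          # a harder version with five digits
```

A search like this tells us what the best split is, but not why; the area argument above supplies the why.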

Multiplying decimals is more complicated than adding and subtracting them.

With addition and subtraction, there is generally a lot of emphasis on being careful to line up the decimal points in the different numbers, and so it is natural for learners to try to do the same thing with multiplication.

If they want to work out \(82.3 \times 1.5\), say, then they might write

where the decimal points are carefully aligned in each number, and they may be surprised that the answer is not correct.

We can immediately see that a number less than \(100\) (\(82.3\)) multiplied by a number less than \(2\) (\(1.5\)) must give an answer less than \(200\), so \(1234.5\) is much too large. We could also reason that when \(82.3\) is multiplied by a number greater than \(1\) (\(1.5\)), we must obtain an answer greater than \(82.3\). If we trust the digits in our calculated answer (\(12345\)), then we can reason that the only number with these digits that lies between \(82.3\) and \(200\) is \(123.45\), so that must be the right answer.

Sometimes learners do decimal multiplications in this kind of way - by omitting all the decimal points, getting the correct digits from the corresponding integer calculation (here, \(823 \times 15 = 12,345\)), and then using this kind of estimation to put the decimal point into the answer in the correct place. This can be a good way to develop estimation skills (see Chapter 5), but for me it is unsatisfying to have to ‘guess’ where the decimal point must go.

An alternative approach is for learners to do the integer calculation and then deduce precisely how many places the digits must be moved:51

This is all about ‘doing the same things to both sides’, since these are equations, and so thinking algebraically is the tool needed for this.
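The ‘integer calculation, then shift’ approach can be sketched in Python. The function name is hypothetical, and the sketch assumes non-negative decimals written as strings. It removes the decimal points, multiplies the resulting integers, and then moves the digits right by the total number of decimal places, which is the same as dividing by the appropriate power of \(10\).

```python
def decimal_multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimals via the corresponding integer calculation."""
    def split(s):
        whole, _, frac = s.partition(".")
        return int(whole + frac), len(frac)   # "82.3" -> (823, 1 decimal place)

    ix, places_x = split(x)
    iy, places_y = split(y)
    product = ix * iy                          # the integer calculation, e.g. 823 * 15
    places = places_x + places_y               # total shift needed
    digits = str(product).zfill(places + 1)    # pad so there is a digit before the point
    if places == 0:
        return digits
    return digits[:-places] + "." + digits[-places:]

print(decimal_multiply("82.3", "1.5"))   # 123.45
print(decimal_multiply("0.3", "0.3"))    # 0.09
```

Trailing zeros are kept rather than removed, so, for example, `decimal_multiply("1.5", "2.0")` gives `3.00`.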

Despite having these alternative methods that give the correct answer, we should expect learners to want to understand why simply tracking the decimal points down into the answer, as we did above, does not work.

When we did the integer calculation

we had to use a zero placeholder (highlighted above) on the second line of our calculation. This was because we were multiplying \(10\)s, rather than \(1\)s. Really, we were just shifting one place left to do our calculation.

All that is happening in decimal multiplication is that we need to shift one place right when multiplying by \(10\)ths, two places right when multiplying by \(100\)ths, and so on. It is exactly analogous to what we do when multiplying by integers greater than \(9\).

For example, to multiply \(82.3\) by \(0.5\) we need

In the first step, when we multiply ‘\(5\)’ by ‘\(3\)’, everything is shifted one place to the right, relative to multiplying by \(5\), because we are multiplying by \(0.5\), not \(5\). This exactly parallels how, if we were multiplying by \(50\), everything would be shifted one place to the left (relative to multiplication by \(5\)).

Provided learners do this shifting, there is no reason why they cannot do decimal multiplication by aligning the decimal points:

Learners can use this approach to explain why, for example, \({0.3}^{2} = 0.09\), and not \(0.9\). (If learners think that \(0.3 \times 0.3 = 0.9\), the teacher can ask them what \(0.3 \times 3\) would be equal to.)

Once learners are comfortable with indices (Chapter 4) and standard form (Chapter 5), they may find decimal multiplications easier to carry out by writing the powers of \(10\) explicitly.

For example, \[(0.3)^2 = (3 \times 10^{-1})^2 = 9 \times 10^{-2} = 0.09.\]

2.10.7.4 Division

Finally, for the division algorithm,52 we can understand what is going on by thinking in terms of fractions.53

To work out \(\displaystyle \dfrac{12,345}{15}\), say, we could divide by \(5\) and then by \(3\), but let’s imagine we want to use long division, with \(15\) as the divisor.

We need to find the largest multiple of \(15\) that is less than \(123\) (the first three digits).

We might reason that \(15 \times 10 = 150\), which is too much, and try \(15 \times 8 = 120\), which is less than \(15\) below \(123\), so must be the closest possible.

This means that \(15 \times 800 = 12000\), so we can split the fraction into two divisions, one of which we now know the answer to:

\[\dfrac{12345}{15} = \dfrac{12000}{15} + \dfrac{345}{15} = 800 + \dfrac{345}{15}.\]

Now, we just repeat this process as many times as necessary.

We know that \(15 \times 2 = 30\), so \(15 \times 20 = 300\), so we can again split our remaining fraction into two fractions, one of which is an integer:

\[800 + \dfrac{345}{15} = 800 + \dfrac{300}{15} + \dfrac{45}{15} = 800 + 20 + \dfrac{45}{15}.\]

We perhaps realise at this point that \(\displaystyle \dfrac{45}{15} = 3\), but if we suppose we didn’t, then we might use \(15 \times 2 = 30\) to split up the final fraction into two:

\[800 + 20 + \dfrac{45}{15} = 800 + 20 + \dfrac{30}{15} + \dfrac{15}{15} = 800 + 20 + 2 + 1.\]

And so we have \(823\).

This is all that long division is doing, just in a different layout, and comparing the two can help learners clarify what is happening:

If we find that we are not left with a zero at the end, then there is a remainder, which is just the numerator of a proper fraction with denominator \(15\).
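The ‘split off a chunk, then repeat’ process can be sketched in Python. The function name is hypothetical. At each power of \(10\), from the largest down, it removes the biggest multiple of \(\text{divisor} \times 10^{k}\) that fits. Unlike the working above, where \(\tfrac{45}{15}\) was split as \(\tfrac{30}{15} + \tfrac{15}{15}\) for illustration, it always takes the largest chunk possible.

```python
def chunked_division(dividend, divisor):
    """Long division as repeatedly splitting off a multiple of divisor * 10**k.

    Returns the chunks of the quotient and the remainder.
    """
    chunks = []
    remaining = dividend
    for k in range(len(str(dividend)) - 1, -1, -1):   # largest power of 10 first
        place = 10 ** k
        q = remaining // (divisor * place)           # how many lots of divisor * place fit
        if q:
            chunks.append(q * place)
            remaining -= divisor * q * place
    return chunks, remaining

print(chunked_division(12345, 15))   # ([800, 20, 3], 0)
print(chunked_division(100, 15))     # remainder 10, i.e. 100/15 = 6 + 10/15
```

The sum of the chunks is the quotient, and a non-zero remainder is the numerator of the leftover proper fraction.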

If the dividend (numerator) is a decimal, this presents no problem.

For example, to work out \(\displaystyle \dfrac{123.45}{15}\), we just include a decimal point in the quotient directly above the one that is in the dividend:

This corresponds to shifting all the digits in the dividend (and therefore the quotient) two places to the right, relative to the integer calculation \(\displaystyle \dfrac{12345}{15}\).

When there is a decimal in the denominator, that is trickier, and it is usually easier just to change the division into one in which the denominator is an integer.

For example, to work out \(\displaystyle \dfrac{123.45}{1.5}\), we could multiply numerator and denominator by \(10\), and instead calculate \(\displaystyle \dfrac{1234.5}{15}\), where the denominator is now an integer.

Learners sometimes get confused when doing this, and think they need to make an adjustment afterwards to ‘put the decimal point back’.

However, since \[\displaystyle\dfrac{123.45}{1.5} = \displaystyle \dfrac{1234.5}{15},\] the two calculations give the same answer, \(82.3\), and no such adjustment is needed.

In some cases, multiplication by a smaller number than \(10\) might be enough to make the denominator an integer.

In our example, we could double the numerator and the denominator, to give

\[\dfrac{123.45}{1.5} = \dfrac{246.9}{3}.\]

We can then divide \(246.9\) by \(3\), to obtain \(82.3\).

It is always good practice with divisions to check the answer by doing the inverse (multiplication) calculation, at least approximately.
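As a quick check of the decimal examples above, Python’s standard fractions module can represent \(123.45\) and \(1.5\) exactly, so we can confirm, without any floating-point rounding, that scaling the numerator and denominator by the same factor leaves the quotient unchanged. The variable names are just for illustration.

```python
from fractions import Fraction

# Fraction("123.45") is exactly 12345/100, so there is no rounding error.
a = Fraction("123.45") / Fraction("1.5")
b = Fraction("1234.5") / 15   # numerator and denominator multiplied by 10
c = Fraction("246.9") / 3     # numerator and denominator doubled

print(a == b == c)            # True
print(float(a))               # 82.3

# Inverse check: multiply the quotient back by the divisor.
print(a * Fraction("1.5") == Fraction("123.45"))  # True
```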

2.10.8 Multiples and factors

2.10.8.1 Why multiples and factors are useful

Why does the school curriculum devote time to factors and multiples of the positive integers?

One reason is that this is fundamental to learning about prime numbers, which are the building blocks of the integers.

Another reason is that they are useful when working with fractions and ratios. Dividing by common factors enables us to ‘cancel down’ and simplify fractions into their lowest terms, ready for multiplication and division. In the other direction, using a multiplier to obtain common multiples enables us to ‘cancel up’, as I sometimes call it (i.e. scale up), to create fractions with common denominators, ready for addition and subtraction (Figure 2.25).

Figure 2.25: (a) Cancelling down and (b) cancelling ‘up’ (scaling up).

This is what I described in Chapter 1 as thinking multiplicatively.

Once the denominators match, we can visualise just sliding right (for addition) or left (for subtraction) along a number line going up in units of whatever the reciprocal of the denominator is (e.g. \(\frac{1}{20}\)s, if the denominator is \(20\)).
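A minimal sketch of ‘cancelling up’, assuming Python 3.9 or later (for math.lcm); add_by_scaling_up is a hypothetical name of my own. It scales each fraction up to the lowest common denominator and returns the two new numerators with it, so that, for \(\frac{3}{4} + \frac{2}{5}\), the sum becomes a count of \(\frac{1}{20}\)s.

```python
from fractions import Fraction
from math import lcm


def add_by_scaling_up(n1: int, d1: int, n2: int, d2: int) -> tuple[int, int, int]:
    """Illustrative helper (hypothetical name): rewrite n1/d1 and n2/d2 over a common denominator."""
    common = lcm(d1, d2)           # lowest common multiple of the denominators
    new_n1 = n1 * (common // d1)   # scale top and bottom of n1/d1 by the same factor
    new_n2 = n2 * (common // d2)   # likewise for n2/d2
    return new_n1, new_n2, common


n1, n2, common = add_by_scaling_up(3, 4, 2, 5)
print(n1, n2, common)                                                # 15 8 20
print(Fraction(n1 + n2, common) == Fraction(3, 4) + Fraction(2, 5))  # True
```

So \(\frac{3}{4} + \frac{2}{5} = \frac{15}{20} + \frac{8}{20}\): start at \(15\) twentieths and slide \(8\) twentieths to the right, reaching \(\frac{23}{20}\).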

For me, the congestion around the arrows in Figure 2.25(b) is undesirable, but hard to avoid when handwriting. Putting each expression on a different line introduces a different problem, in distinguishing transformations to numerators and denominators.

So, my preferred layout is to use ‘neat crossing out’, similar to how exchanging a power of \(10\) is notated in the standard subtraction algorithm. I tend to encourage learners to show ‘cancelling up’ in this way (Figure 2.26(a)), similar to the more common practice of doing this when cancelling down (Figure 2.26(b)).54

Figure 2.26: Using ‘neat crossing out’ to show the method for (a) addition of fractions, (b) multiplication of fractions.

Factors and multiples can be confused,55 but factors are numbers that ‘go into’ another number, and there are only finitely many of them, whereas multiples are the numbers in its multiplication table, and therefore go on forever. It can be helpful to point out that \(6\) is a factor of \(12\) but a multiple of \(3\) (Figure 2.27).

Figure 2.27: The relationship between \(6\) and \(12\).

Eventually, learners will be able to find all the factors of a number by using the prime factorisation. But initially they will do this systematically by finding all the factor pairs.

For example, to find the factors of \(24\), we begin with \(1\) and \(24\), and write them at opposite ends of a line:

\[1 \qquad \cdots\cdots\cdots\cdots\cdots\cdots \qquad 24\]

Then we try \(2\), which goes into \(24\) twelve times, so we write

\[1,\ 2 \qquad \cdots\cdots\cdots\cdots \qquad 12,\ 24\]

It is important for learners to realise at this point that we have just cut our work in half. They may assume we will need to check all the integers between \(1\) and \(24\), and so far we have only checked \(1\) and \(2\), so there is a lot of work left to do. But we have actually already ruled out any number between \(12\) and \(24\).

If there were a number in this interval that was a factor of \(24\), its factor-pair partner would have to lie between \(1\) and \(2\), which is impossible, since there are no integers between \(1\) and \(2\). So, there are far fewer numbers to check than learners might think, and stressing this can encourage them to persevere.

Now, we check \(3\), and then \(4\), and write

\[1,\ 2,\ 3,\ 4 \qquad \cdots\cdots \qquad 6,\ 8,\ 12,\ 24\]

We check \(5\), which is not a factor of \(24\), and now we are finished, because our left-hand numbers have met the right-hand numbers (there is no integer between \(5\) and \(6\)).

This exhaustive approach guarantees we can’t miss any factors, so we can conclude that \(24\) has exactly \(8\) factors (no more, no fewer), which are \(\{ 1,\ 2,\ 3,\ 4,\ 6,\ 8,\ 12,\ 24\}\).
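The factor-pair method can be sketched in Python (the function names are hypothetical). Rather than watching for the left-hand and right-hand numbers to meet, it stops once the candidate multiplied by itself exceeds the number, which is an equivalent condition; a set removes the repeated middle factor of a square number.

```python
def factor_pairs(n: int) -> list[tuple[int, int]]:
    """Illustrative helper (hypothetical name): factor pairs (small, large) of n."""
    pairs = []
    small = 1
    # Once small * small > n, the partner of any further factor would be
    # smaller than small, so that pair has already been found.
    while small * small <= n:
        if n % small == 0:
            pairs.append((small, n // small))
        small += 1
    return pairs


def factors(n: int) -> list[int]:
    """Illustrative helper (hypothetical name): all factors of n, in order."""
    # The set counts the middle factor of a square number (e.g. 6 x 6) only once.
    return sorted({f for pair in factor_pairs(n) for f in pair})


print(factor_pairs(24))  # [(1, 24), (2, 12), (3, 8), (4, 6)]
print(factors(24))       # [1, 2, 3, 4, 6, 8, 12, 24]
print(len(factors(36)))  # 9, an odd number, because 36 is square
```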

It is worth asking learners whether a number could ever have an odd number of factors.

It may seem impossible, because we find the factors in pairs. But with a square number, the final pair of numbers will be a number ‘with itself’, which we count only once. So, square numbers always have an odd number of factors, and are the only numbers which do so.

A task which reveals this is the following.

TASK 2.15

A large hall has a row of \(100\) ceiling lights, numbered \(1\) to \(100\).
Each light has a push-button switch labelled with the light number.
Pushing the button turns the light on, if it is off, or off, if it is on.

One hundred people, numbered from \(1\) to \(100\), line up to push the light switch buttons.
At the start, all the lights are off.
The first person pushes every button, and then leaves.
The second person pushes every second button, starting with light \(2\), and then leaves.
The third person pushes every third button, starting with light \(3\), and then leaves.
The pattern continues, until the \(100\)th person presses only the \(100\)th button.

Once all \(100\) people have passed through, which lights remain on?

Learners will need time to think through what is happening and perhaps make a chart to explore what happens for, say, the first \(10\) people (and the first \(10\) lights, which will be the only ones touched by them).

Since each push on a light switch toggles between on and off, and all the lights begin in the off state, the lights that will be on after all \(100\) people have passed through will be the ones with an odd number of factors. As discussed above, these will be the square numbers, so the lights remaining on will be all the squares up to \(100\); namely, \(1\), \(4\), \(9\), \(16\), \(25\), \(36\), \(49\), \(64\), \(81\) and \(100\).
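Anyone who wants to check their chart can simulate the whole process directly. This is an illustrative sketch; lights_left_on is a hypothetical name, and the simulation simply toggles the lights exactly as the task describes, without using any knowledge about factors.

```python
def lights_left_on(n: int = 100) -> list[int]:
    """Illustrative helper (hypothetical name): simulate n people and n lights."""
    lights = [False] * (n + 1)  # index 0 unused; all lights start off
    for person in range(1, n + 1):
        # Person p presses buttons p, 2p, 3p, ... up to n.
        for light in range(person, n + 1, person):
            lights[light] = not lights[light]
    return [i for i in range(1, n + 1) if lights[i]]


print(lights_left_on())  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```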

2.10.8.2 Lowest common multiples and highest common factors

Learners do not always appreciate that lowest common multiples and highest common factors are intimately related.56 This may not always be apparent from the algorithms commonly taught to find them. There is sometimes a tension between the easiest method for obtaining the right answer and the most transparent method, which will be more memorable and understandable, but which may be a bit slower in practice.

For me, the choice here is easy: I want methods that communicate the essence of what is going on, and if they take a few more seconds to compute with, that is not a big issue. We are not preparing learners for a lifetime of extensive hand computation of factors and multiples. But we do want them to leave their mathematics education with a sense of what it is all about.

For me, the Euclidean Algorithm is the most transparent way to find the highest common factor (HCF) – also known as greatest common divisor (GCD) – of two positive integers, \(a\) and \(b\).57 When we need the lowest common multiple (LCM) of them, instead, we can find that from the relationship

\[\text{LCM}(a,\ b) = \dfrac{ab}{\text{HCF}(a,b)}.\]

It is intuitive that the LCM is just the product of the two numbers, provided the two numbers are co-prime (i.e. have no common factors greater than \(1\)). If the two numbers happen not to be co-prime, all we have to do is divide out from the product the \(\text{HCF} > 1\) that is preventing them from being co-prime.

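To use the Euclidean Algorithm to find the HCF, we just keep subtracting multiples of the smaller number from the larger number. The HCF never changes as we do this, so we can stop as soon as one number is clearly a factor of the other, since that number must then be the HCF (or we can carry on until the two numbers are equal).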

For example, to find the HCF of \(24\) and \(40\), we can write

\[\begin{aligned} & \text{HCF}(24, 40) \\ ={}& \text{HCF}(24, 16) && \text{Subtracting $24$ from $40$} \\ ={}& \text{HCF}(8, 16) && \text{Subtracting $16$ from $24$} \end{aligned}\]

In the first step, the first number was the smaller number and in the second step the second number was the smaller number. This doesn’t matter, since \(\text{HCF}(a,b) = \text{HCF}(b,a)\).

We can see now that the answer must be \(8\), but if we wanted to, we could continue one more step:

\[\begin{aligned} ={}& \text{HCF}(8, 8) && \text{Subtracting $8$ from $16$} \end{aligned}\]

Let’s try an example where one of the numbers is much larger than the other:

\[\begin{aligned} & \text{HCF}(24, 650) \\ ={}& \text{HCF}(24, 2) && \text{Subtracting $27$ lots of $24$ from $650$} \end{aligned}\]

So, the \(\text{HCF}\) is \(2\).

If we don’t spot that \(27\) is the largest number of \(24\)s that will fit into \(650\), it doesn’t matter. We can take away fewer \(24\)s, and we are still making progress; it will just take a couple more steps. In terms of the chess analogy I presented in Section 2.7.3, we are at Stage 2, rather than the maximally efficient Stage 3.

For example, the process could go like this:

\[\begin{aligned} & \text{HCF}(24, 650) \\ ={}& \text{HCF}(24, 170) && \text{Subtracting $20$ lots of $24$ from $650$} \\ ={}& \text{HCF}(24, 26) && \text{Subtracting $6$ lots of $24$ from $170$} \\ ={}& \text{HCF}(24, 2) && \text{Subtracting $24$ from $26$} \end{aligned}\]

We will always arrive at \(\text{HCF} = 2\), however we do it.
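Both versions of the Euclidean Algorithm can be sketched in Python (all three function names are hypothetical). hcf_by_subtraction removes one copy of the smaller number at a time, like the step-by-step working above; hcf_by_remainders removes as many copies as fit in one go, using the remainder operator %, like the ‘\(27\) lots of \(24\)’ step. The LCM then follows from \(\text{LCM}(a,\ b) = \dfrac{ab}{\text{HCF}(a,b)}\).

```python
def hcf_by_subtraction(a: int, b: int) -> int:
    """Illustrative helper (hypothetical name): subtract the smaller from the larger until equal."""
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


def hcf_by_remainders(a: int, b: int) -> int:
    """Illustrative helper (hypothetical name): remove as many copies of the smaller as fit."""
    while b:
        a, b = b, a % b  # a % b is what is left after subtracting all possible b's
    return a


def lcm_from_hcf(a: int, b: int) -> int:
    """Illustrative helper (hypothetical name): LCM(a, b) = ab / HCF(a, b)."""
    return a * b // hcf_by_remainders(a, b)


print(hcf_by_subtraction(24, 40))  # 8
print(hcf_by_remainders(24, 650))  # 2
print(lcm_from_hcf(24, 40))        # 120
```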

Some tasks to provoke thought about the relationship between HCF and LCM are these:58

TASK 2.16

I’m thinking of two positive integers.

What could they be if …
1. \(\text{LCM} = \text{HCF}\)
2. \(\text{LCM} = 2 \times \text{HCF}\)
3. \(\text{LCM} = 10 \times \text{HCF}\)
4. \(\text{LCM} = 12 \times \text{HCF}\)

Try this sort of problem with other multipliers.
What do you find?

A nice task that relates to HCF is the following:

TASK 2.17

Look at the line segment shown below.
How many dots are on this line segment (including the ends)?


Investigate the number of dots on different line segments joining pairs of dots.

Learners will realise that the number of dots depends on the relationship between the horizontal distance, \(5 - 2 = 3\), and the vertical distance, \(7 - 1 = 6\), that form the horizontal and vertical legs of the right-angled triangle whose hypotenuse is the given line segment. Another way to think of these numbers is as the translation vector corresponding to the line segment, \(\begin{pmatrix} 3 \\ 6 \end{pmatrix}\) (Chapter 4).

Because the HCF of \(3\) and \(6\) is \(3\), there will be \(3 + 1\) dots on the line (including the ends).

In general, the line segment from \((a,\ b)\) to \((c,\ d)\), where \(a\), \(b\), \(c\) and \(d\) are integers, will contain

\[\text{HCF}\left( |a - c|,|b - d| \right) + 1 \text{ dots}.\]
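The formula can be checked against a brute-force count. In this sketch (the helper name dots_on_segment is hypothetical) a lattice point \((x,\ y)\) inside the rectangle spanned by the two end points is counted if it lies on the line through them, tested with the condition \((x - a)(d - b) - (y - b)(c - a) = 0\); the standard math.gcd supplies the HCF.

```python
from math import gcd


def dots_on_segment(a: int, b: int, c: int, d: int) -> int:
    """Illustrative helper (hypothetical name): count lattice points from (a, b) to (c, d)."""
    count = 0
    for x in range(min(a, c), max(a, c) + 1):
        for y in range(min(b, d), max(b, d) + 1):
            # (x, y) is on the line through the end points iff this cross product is 0;
            # the loop ranges already keep it between the end points.
            if (x - a) * (d - b) - (y - b) * (c - a) == 0:
                count += 1
    return count


print(dots_on_segment(2, 1, 5, 7))      # 4
print(gcd(abs(2 - 5), abs(1 - 7)) + 1)  # 4
```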

A related question is this task:

TASK 2.18

Draw a square with all its vertices on the lattice points of a square grid.
How many whole unit squares are contained inside it?

Investigate for other squares, including tilted squares.

For squares with edges parallel to the grid lines, an \(n \times n\) square will obviously contain \(n^2\) unit squares.

Tilted squares are more interesting.

For example, the tilted square shown in Figure 2.28 contains \(13\) whole squares, shaded grey. Learners can explore the connection between the size and orientation of the tilted square and the number of whole squares inside it.59

Figure 2.28: A tilted square containing \(13\) whole squares.
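To support this exploration, here is a sketch that counts the whole unit squares inside a tilted square. The set-up is my own assumption: the square has one vertex at the origin and edge vector \(\begin{pmatrix} a \\ b \end{pmatrix}\) with \(a > 0\) and \(b \geq 0\), and whole_squares_inside is a hypothetical name. Because the tilted square is convex, a unit square lies inside it exactly when all four of its corners do.

```python
def whole_squares_inside(a: int, b: int) -> int:
    """Illustrative helper (hypothetical name): whole unit squares inside a tilted square.

    Assumed vertices, listed anticlockwise: (0, 0), (a, b), (a - b, a + b), (-b, a),
    with a > 0 and b >= 0.
    """
    vertices = [(0, 0), (a, b), (a - b, a + b), (-b, a)]

    def inside(px: int, py: int) -> bool:
        # For an anticlockwise convex polygon, a point is inside (or on the boundary)
        # iff it is on the left of, or on, every edge.
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0:
                return False
        return True

    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    count = 0
    # Each unit square is identified by its bottom-left corner (x, y).
    for x in range(min(xs), max(xs)):
        for y in range(min(ys), max(ys)):
            if all(inside(x + dx, y + dy) for dx in (0, 1) for dy in (0, 1)):
                count += 1
    return count


print(whole_squares_inside(3, 0))  # 9
print(whole_squares_inside(2, 1))  # 1
```

For instance, the tilted square with edge vector \(\begin{pmatrix} 2 \\ 1 \end{pmatrix}\) has area \(5\) but contains only \(1\) whole unit square, while the untilted \(3 \times 3\) square contains \(9\).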

Factor puzzles can be a good way to develop fluency with finding factors.60

TASK 2.19

The numbers inside the cells in the square below are the products of the corresponding numbers outside.


Make up another example like this.
Delete the outside numbers.
Can your partner discover what numbers were outside?

What is a good strategy for solving a puzzle like this?
When is there more than one solution?

2.10.9 Prime numbers

2.10.9.1 Identifying primes

Even if learners use the Euclidean Algorithm for finding HCFs, rather than working from prime factorisations, expressing a positive integer as a product of primes is still an important skill, and essential for revealing why the primes are so special.61

Every positive integer greater than \(1\) can be expressed uniquely (apart from order) as a product of prime numbers. Prime numbers are, in a sense, the ‘ultimate’ multipliers, because, starting with \(1\) and multiplying only by primes, we can reach any positive integer greater than \(1\) that we like:

\[ \begin{matrix} 1 & \xrightarrow{\textstyle \times \text{ product of primes}} & \text{any integer} > 1 \\ \end{matrix} \]

I think this is the best way to answer the common question of why \(1\) is not counted as a prime number. Historically, it used to be included. But, with a modern perspective on the primes as the ‘atoms’ of all the positive integers (greater than \(1\)), \(1\) is not part of that story, because multiplication by \(1\) doesn’t have any effect. It is useless as a multiplier (although, of course, very important for other reasons!).62

Learners can use the sieve of Eratosthenes63 to find the first \(25\) (say) prime numbers by shading out the composite (non-prime) numbers on a number grid, working multiple by multiple. A grid with six columns offers more potential insight than a standard \(1-100\) number square.

First, they shade out all the even numbers (except \(2\)), then all the multiples of \(3\) (except \(3\)) that are left (i.e. the odd multiples of \(3\)). Using six columns (effectively ‘numbers \(\text{mod }6\)’, Chapter 4) makes this very easy (Figure 2.29(a)).

Figure 2.29: Finding prime numbers using the sieve of Eratosthenes.

By this point, all the multiples of \(4\) have already gone (why?), so next we have to shade out any multiples of \(5\) that are still left. Multiples of \(6\) have gone already (because we eliminated all the multiples of \(2\) and of \(3\)). Some multiples of \(7\) remain, so we shade those out, which is easy, as they make a nice diagonal pattern in a \(6\)-column arrangement.

All the numbers left unshaded on the grid must now actually be prime, although this is not immediately obvious. The trickiest aspect of the sieve of Eratosthenes is knowing when to stop.

Learners should realise that the multiples of \(8\), \(9\) and \(10\) will already all be gone (because they all have factors that we have already shaded out). But learners might worry that we need to go on next to check the multiples of \(11\). Might there not be a multiple of \(11\) remaining unshaded somewhere?

The reason why this cannot be is that the multiples of \(11\) below \(11^{2}\), apart from \(11\) itself (that is, \(2 \times 11, 3 \times 11, \ldots, 10 \times 11\)), necessarily have factors less than \(11\). For example, \(5 \times 11\) has a factor of \(5\), and we have already shaded out all of these. This means we would only need to begin checking multiples of \(11\) from \(11^{2}\) onwards.

However, since \(11^{2} = 121\), and our grid only goes as far as \(102\), there can’t be any more unshaded multiples of \(11\) on the grid. And the same goes for any larger prime, such as \(13\), since if \(11^{2}\) is off the grid, \(13^{2}\) must be also.

It follows that we only need to check primes up to the square root of the highest number on the grid: in our case, \(\sqrt{102}\), which is just over \(10\). This explains why the multiples of \(7\) were the last ones that needed shading out, because the next prime is \(11 > \sqrt{102}\).

Our final grid looks as in Figure 2.29(b).
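The sieve on the six-column grid can be sketched in Python (sieve_grid is a hypothetical name). Following the reasoning above, it only shades multiples of primes \(p\) with \(p^{2} \leq 102\), and for each such prime it starts at \(p^{2}\), because the smaller multiples have already been shaded. It prints the grid with a dot in each shaded cell (including \(1\), which is not prime) and returns the primes found; up to \(102\) these are the first \(25\) primes plus \(101\).

```python
from math import isqrt


def sieve_grid(limit: int = 102, columns: int = 6) -> list[int]:
    """Illustrative helper (hypothetical name): sieve of Eratosthenes on a grid."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    # Only primes p with p * p <= limit need their multiples shading out.
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            # Multiples below p * p have a smaller factor, so are already shaded.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    # Print the grid row by row, with '.' for each shaded (non-prime) cell.
    for start in range(1, limit + 1, columns):
        row = range(start, min(start + columns, limit + 1))
        print(" ".join(f"{n:3d}" if is_prime[n] else "  ." for n in row))
    return [n for n in range(limit + 1) if is_prime[n]]


primes = sieve_grid()
print(len(primes), primes[-1])  # 26 101
```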

2.10.9.2 Patterns among primes

Learners will observe in Figure 2.29(b) that, after the first row, primes appear only in columns \(1\) and \(5\), meaning that all primes appear to be either \(1\) more than a multiple of \(6\) or \(1\) less than a multiple of \(6\). We can prove that this is true in general by explaining why there can’t be any primes in the other four columns (after the first row).

Obviously we won’t get primes in columns \(2\), \(3\), \(4\) or \(6\) after row \(1\). Every number in column \(6\) is a multiple of \(6\). For column \(2\), we can observe that the numbers begin with \(2\) and then go up in \(6\)s, and, regardless of how many \(6\)s we add to \(2\), because \(6\) is even, we must remain within the even numbers. Similarly, for column \(3\), the starting number \(3\) plus any number of \(6\)s will always be a multiple of \(3\), and, for column \(4\), the number \(4\) plus any number of \(6\)s will have to be even, because we are adding evens to evens. So, columns \(2\), \(3\), \(4\) and \(6\) can’t contain primes after the first row, meaning that any primes that exist will have to fall in columns \(1\) and \(5\). And we do in fact see that there are primes in both of those columns beyond the first row.
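The column argument can be checked numerically with a short sketch (helper names hypothetical). Here column gives a number’s position in the six-column grid, so column \(6\) holds the multiples of \(6\), and every prime from \(7\) up to \(10\,000\) lands in column \(1\) or column \(5\). A check like this is evidence, not proof; the proof is the argument above.

```python
def is_prime(n: int) -> bool:
    """Illustrative helper (hypothetical name): trial division up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def column(n: int) -> int:
    """Illustrative helper (hypothetical name): column of n in a grid 1-6, 7-12, ..."""
    return (n - 1) % 6 + 1


later_primes = [n for n in range(7, 10_000) if is_prime(n)]
print({column(p) for p in later_primes})  # {1, 5}
```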

We can also prove that there are infinitely many primes, without using any algebra, and this can be a highlight in mathematical thought that is well worth sharing with learners. It can be styled as a proof by contradiction:64 we suppose that something is the case, and then see where that leads. If it leads, by valid steps, to something that is obviously false, then we know that the initial supposition must have been flawed.

In this case, we suppose that there are just a finite number of primes. There might be a very large number of them, but if we made a list of all the primes, the list would eventually end. There would be some largest prime number, and no more primes after that one. That is the thing that we suppose to be true, and we will show by a simple argument that it can’t possibly be correct.

The trick is to imagine multiplying together all the prime numbers on our list, and adding \(1\) to the product. If there are a lot of prime numbers, then this would be a lot of computation, resulting in a very large number. But the computation will not go on forever, if there are only a finite number of primes, and so we will eventually get an answer to the product of all of the primes plus \(1\).

Now what kind of number will we have got? Might it be one of the prime numbers on our list? Certainly not: it is much too big, being considerably larger than the largest prime number on our list, which is supposed to be the largest prime number there is. So, this number we have created cannot be prime, if our list contains all the primes.

So, if our big number is composite (i.e. not prime), then it must be a product of primes on our list. But that is impossible too. By the way we constructed that number, it must give a remainder of \(1\) when we divide it by any of the prime numbers on our list. For example, \(7\) cannot be a factor of our big number, because our big number is equal to \(7 \times \left( \text{the product of all the other prime numbers on our list} \right) + 1\). So, our big number would leave a remainder of \(1\) if we divided it by \(7\). And the same applies for all the other prime numbers on our list, because that is how we created this big number.

So, our big number is not a prime number on our list, and is not a product of any of the prime numbers that are on our list. So, it must either be a prime number that is not on our list, or a product of some prime numbers that includes at least one prime number that is not on our list. Either way, there must exist prime numbers that were not on our list. And this will apply however (finitely) long our list of prime numbers gets. It is not just a matter of adding a couple more prime numbers to our list to complete it. We can always run this argument on any finite list of prime numbers, however long, and show that there will always be at least one other prime number that is not on that list. Any list of primes must be incomplete. This is the same thing as saying that there can be no end to the list of prime numbers, or that there are infinitely many of them.
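The construction in the proof can be tried on a short list of primes. This sketch (the helper name is hypothetical) uses the first six primes. It shows that the product plus \(1\) leaves remainder \(1\) on division by every prime on the list, and also that it need not itself be prime: \(2 \times 3 \times 5 \times 7 \times 11 \times 13 + 1 = 30031 = 59 \times 509\), so here the new prime it reveals is a factor, \(59\), rather than the number itself.

```python
def smallest_prime_factor(n: int) -> int:
    """Illustrative helper (hypothetical name): smallest prime factor of n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no factor up to sqrt(n), so n itself is prime


primes = [2, 3, 5, 7, 11, 13]
big = 1
for p in primes:
    big *= p
big += 1                             # product of everything on the list, plus 1

print(big)                           # 30031
print([big % p for p in primes])     # [1, 1, 1, 1, 1, 1]
print(smallest_prime_factor(big))    # 59, a prime that is not on the list
```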

2.10.9.3 Prime factorisation

You can find the prime factorisation of any integer greater than \(1\) by systematically and repeatedly dividing by its smallest prime factor. Less formally, you can just keep splitting into ‘easy’ products that you recognise, until you reach a product containing only primes.

For example,

\[650 = 10 \times 65 = 10 \times (5 \times 13) = (2 \times 5) \times 5 \times 13 = 2 \times 5^{2} \times 13.\]

Sometimes people draw out trees to represent the factors of each number, but trees are also used sometimes to represent partitioning numbers into sums, rather than products. And it can be easy to lose a branch of the tree by mistake, without noticing. So, I usually prefer to just write out the products, as above.
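Written as code, the systematic version is trial division: divide out the smallest prime as many times as possible, then move on. This is a sketch (prime_factorisation is a hypothetical name) that returns a dictionary mapping each prime to its index, so \(650 = 2 \times 5^{2} \times 13\) comes back as {2: 1, 5: 2, 13: 1}.

```python
def prime_factorisation(n: int) -> dict[int, int]:
    """Illustrative helper (hypothetical name): map each prime factor of n >= 2 to its index."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        # Divide out d as many times as possible. Composite values of d never
        # divide n here, because their prime factors have already been removed.
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        # What is left has no factor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


print(prime_factorisation(650))  # {2: 1, 5: 2, 13: 1}
print(prime_factorisation(60))   # {2: 2, 3: 1, 5: 1}
```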

Once a number is in prime factorised form, it is easy to see how many factors it has.

For example, consider the number \(60 = 2^{2} \times 3 \times 5\).

Any factor of \(60\) must be a product of some power of \(2\), some power of \(3\) and some power of \(5\), where we include \(1\) as the zeroth power of each prime number, so we are not requiring every prime number to be present in every factor.

\[ \begin{array}{ccc} 2^0 & 3^0 & 5^0 \\ 2^1 & 3^1 & 5^1 \\ 2^2 & & \end{array} \]

This means that, if we choose one power of \(2\) from the first column (\(2^0\), \(2^1\) or \(2^2\)), one power of \(3\) from the second column (\(3^0\) or \(3^1\)) and one power of \(5\) from the third column (\(5^0\) or \(5^1\)), and multiply them together, we must get a factor of \(60\). For example, \(2^{1} \times 3^{0} \times 5^{1} = 10\), which is a factor of \(60\).

In the case of \(60\), we have three possible powers of \(2\), two possible powers of \(3\) and two possible powers of \(5\), so altogether there are \(3 \times 2 \times 2 = 12\) factors of \(60\). And we can read off the \(3\), \(2\) and \(2\) from the prime factorisation \(2^{2} \times 3^1 \times 5^1\), because those numbers are just \(1\) more than the index on each prime number.

In general, the number \(2^{a} \times 3^{b} \times 5^{c} \times 7^{d} \times \ldots,\) where \(a\), \(b\), \(c\), \(d\), … are non-negative integers (with only finitely many of them non-zero), will have \[(a + 1)(b + 1)(c + 1)(d + 1)\ldots \text{ factors.}\] There are many interesting puzzles that reveal and exploit this property, such as asking learners to find a number that has exactly \(30\) factors.65
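The \((a + 1)(b + 1)(c + 1)\ldots\) rule can be sketched directly from a trial-division factorisation (the name number_of_factors is hypothetical), and compared with a brute-force count for small numbers.

```python
def number_of_factors(n: int) -> int:
    """Illustrative helper (hypothetical name): product of (index + 1) over n's prime factors."""
    count = 1
    d = 2
    while d * d <= n:
        index = 0
        while n % d == 0:  # find the index of d in the factorisation
            n //= d
            index += 1
        count *= index + 1  # index 0 (d not a factor) multiplies by 1
        d += 1
    if n > 1:
        count *= 2  # one remaining prime with index 1, giving a factor of 1 + 1
    return count


print(number_of_factors(60))  # 12
print(number_of_factors(24))  # 8
```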

There is much that is not known about prime numbers, and many open questions and conjectures are quite accessible for learners to understand and explore.66

2.10.10 Divisibility tests

One way to get into divisibility tests is to present learners with a fairly large positive integer, and ask them what they can say about it, just by looking at it.

TASK 2.20

What can you tell me about the number \(4536\)?

Someone might say that it is an integer, or that it is even, or that it is not a multiple of \(5\), that it has \(4\) digits, is greater than \(1000\), has no repeated digits, and so on.

The teacher can steer things towards divisibility by asking about whether it could be a multiple of \(10\). If learners know that multiples of \(10\) always end in a zero, this could be the first ‘divisibility test’ to write down in a list of such tests. Learners may know that multiples of \(5\) have to end in a \(5\) or a zero, and that multiples of \(2\) (even numbers) have to end in \(0\), \(2\), \(4\), \(6\) or \(8\).

There is lots of scope for learners to test similar-sounding conjectures, such as ‘Multiples of \(4\) always end in a \(4\)’, which is false.67

There is also a great opportunity to think about the difference between ‘if’ and ‘if and only if’ (‘iff’). The phrase ‘if and only if’ is not just a pretentious way of saying ‘if’: it means something more.

For example, a multiple of \(4\) is always even, but an even number is not always a multiple of \(4\) (it could be halfway between two multiples of \(4\), like \(30\) or \(6\)).

So, an integer is a multiple of \(10\) if it is a multiple of \(100\), but an integer is a multiple of \(10\) iff it ends in a zero. Not only do all multiples of \(10\) end in a zero, but all numbers that end in a zero are multiples of \(10\). When the ‘if’ works in both directions, we have an ‘iff’.

2.10.10.1 Divisibility by \(\boldsymbol{4}\) and \(\boldsymbol{8}\)

Learners may not know how to test whether a number is a multiple of \(4\) or not.

One way is to divide the number by \(2\) and see if the result is even. If and only if half of a number is even, then the number is a multiple of \(4\).

Another way is to realise that only the final two digits matter, since \(100\) is a multiple of \(4\), and therefore any multiple of \(100\) must also be a multiple of \(4\).

For example, with our number \(4536\), because \(36\) is a multiple of \(4\), the entire number must be, because we can write

\[4536 = 45 \times 100 + 36 = 45 \times 100 + 4 \times 9.\]

Because \(100\) is a multiple of \(4\), \(45 \times 100\) must also be, and therefore \(45 \times 100 + 4 \times 9\) must be, since the sum of two multiples of \(4\) has to be a multiple of \(4\). Even though \(45\) is not a multiple of \(4\), we know that \(4500\) is, because \(100\) is a multiple of \(4\).

We can extend this thinking to multiples of \(8\). If we halve a number twice, and the result is even, then the original number must have been a multiple of \(8\). Or we could halve the number once and test for divisibility by \(4\).

For divisibility by \(8\), looking at the final two digits is not sufficient, because \(100\) is not a multiple of \(8\). We can find examples, such as \(216\), in which the final two digits are a multiple of \(8\), and the entire number is a multiple of \(8\). But we can also find counterexamples, such as \(116\), in which the final two digits are a multiple of \(8\), but the entire number is not a multiple of \(8\). Just one counterexample is enough to demolish a universal claim (a claim about ‘all’ members of some category).

To devise a similar test for divisibility by \(8\), we would need to consider the final three digits, because \(1000\) is a multiple of \(8\). This is not such an easy divisibility test, because we are unlikely to know instantly which three-digit numbers are multiples of \(8\), but for very large numbers it is useful to know that we can ignore everything except the final three digits.
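The ‘last two digits’ and ‘last three digits’ tests can be sketched in Python (names hypothetical), where n % 100 and n % 1000 pick out the final two and final three digits of a non-negative integer n. The second line of output repeats the \(116\) counterexample.

```python
def multiple_of_4(n: int) -> bool:
    """Illustrative helper (hypothetical name): 100 is a multiple of 4, so use the last two digits."""
    return (n % 100) % 4 == 0


def multiple_of_8(n: int) -> bool:
    """Illustrative helper (hypothetical name): 1000 is a multiple of 8, so use the last three digits."""
    return (n % 1000) % 8 == 0


print(multiple_of_4(4536), multiple_of_8(4536))  # True True
# Last two digits of 116 are a multiple of 8, but 116 is not:
print((116 % 100) % 8 == 0, 116 % 8 == 0)        # True False
```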

2.10.10.2 Divisibility by \(\boldsymbol{9}\)

Divisibility by \(9\) is a particularly interesting case that is worth learners knowing about.

If the digital root of a number is \(9\), then the number is divisible by \(9\). To find the digital root of a number, we sum the digits, and then sum the digits of that number, and so on, until we obtain a single digit.

For example, the digit sum of \(4536\) is \(4 + 5 + 3 + 6 = 18\), and the digit sum of \(18\) is \(1 + 8 = 9\), and now that we have obtained a single-digit answer it follows that \(9\) is the digital root of both \(18\) and \(4536\). Since the digital root of \(4536\) is \(9\), we know that \(4536\) is a multiple of \(9\).

To prove this, we need to think algebraically, although not necessarily using any formal symbols.

We can start by writing out \(4536\) in terms of powers of \(10\), like we did when exploring the standard algorithms (Section 2.10.7):

\[4536 = 4 \times 1000 + 5 \times 100 + 3 \times 10 + 6 \times 1.\]

Now, we can use the fact that each power of \(10\) is one more than a string of \(9\)s, to write:

\[4536 = (4 \times 999 + 4) + (5 \times 99 + 5) + (3 \times 9 + 3) + 6.\]

This looks like a very odd thing to do, but since \(9n + n = 10n\), and \(99n + n = 100n\), and \(999n + n = 1000n\), and so on, it is correct.

The point of doing this is that we know that any rep-digit (i.e. repeating digit number) of \(9\)s (i.e. \(9\), \(99\), \(999\), \(9999\), etc.) must be divisible by \(9\). We know that, because dividing rep-\(9\) by \(9\) gives rep-\(1\) (i.e. \(1\), \(11\), \(111\), \(1111\), etc.), if we imagine dividing each digit \(9\) in turn by \(9\) and getting a \(1\).

So, we can rewrite \(4536\) as

\[4536 = \underbrace{(4 \times 999 + 5 \times 99 + 3 \times 9)}_{\textstyle \text{multiple of } 9} + \underbrace{(4 + 5 + 3 + 6)}_{\textstyle \text{digit sum}}.\]

The digit sum might be more than \(9\), especially if the starting number had a lot of digits, but if it is then we can just do the same thing again with the digit sum, to obtain a new, smaller digit sum, and we can continue until the digit sum has just one digit, at which point it is the digital root.

This means that any positive integer at all is always equal to a multiple of \(9\) plus its digital root. Another way to say this is that the digital root is a number’s remainder after dividing by \(9\), unless the digital root is \(9\), as in our example, in which case the remainder is zero. That is why a digital root of \(9\) corresponds to the number being a multiple of \(9\).

So, we can conclude from this that \(4536\) must be a multiple of \(9\), even though we haven’t bothered to work out which multiple of \(9\) it is. Learners can now use this to easily devise large numbers that they can be sure will be multiples of \(9\), and check using a calculator.
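The digit-sum argument can be checked with a short sketch (the helper names are hypothetical), assuming a positive integer input. Comparing digital_root(n) % 9 with n % 9 over many values illustrates the result proved above: a number and its digital root leave the same remainder on division by \(9\).

```python
def digit_sum(n: int) -> int:
    """Illustrative helper (hypothetical name): add up the digits of n."""
    return sum(int(ch) for ch in str(n))


def digital_root(n: int) -> int:
    """Illustrative helper (hypothetical name): repeat the digit sum until one digit is left (n >= 1)."""
    while n >= 10:
        n = digit_sum(n)
    return n


print(digit_sum(4536), digital_root(4536))  # 18 9
print(4536 % 9)                             # 0
```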

2.10.10.3 Divisibility by \(\boldsymbol{3}\)

A multiple of \(9\) is necessarily a multiple of \(3\), so we immediately know that \(4536\) is also a multiple of \(3\).

But the converse is not true: a multiple of \(3\) isn’t necessarily a multiple of \(9\). A multiple of \(3\) could be a multiple of \(9\), but it could also be \(3\) more than a multiple of \(9\), or \(6\) more than a multiple of \(9\) (equivalently, we could say ‘\(3\) less than a multiple of \(9\)’) (Figure 2.30).

Figure 2.30: Three kinds of multiple of \(3\): \(9n\), \(9n + 3\) and \(9n + 6\) (shown shaded differently).

That means that the multiples of \(3\) must have a digital root that is \(3\), \(6\) or \(9\) - in other words, they must have a digital root that is itself a multiple of \(3\).

In practice, we find digit sums until we hit a number we can easily see is a multiple of \(3\). If we get to \(66\), for example, we can see that that must be a multiple of \(3\) (since \(3\) goes into each digit), so there is no need to do \(6 + 6 = 12\) and \(1 + 2 = 3\) to get to the digital root.

It is very powerful to know that any number with a digit sum of \(66\), say, must be a multiple of \(3\), because we can instantly make very large numbers that we know absolutely have to be multiples of \(3\). For example, we could use thirty \(2\)s and six \(1\)s as digits, in any order we like, and we can be sure that, for instance, \(222,221,222,112,222,122,222,221,222,222,221,222\) has to be divisible by \(3\).
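To confirm the claim about this \(36\)-digit example, the following sketch strips the commas, counts the digits, and checks the remainder on dividing by \(3\); the variable names are illustrative only.

```python
# The 36-digit example from the text, with its thousands separators removed.
digits = "222,221,222,112,222,122,222,221,222,222,221,222".replace(",", "")
n = int(digits)

print(len(digits), digits.count("2"), digits.count("1"))  # 36 30 6
print(sum(int(d) for d in digits))                        # 66
print(n % 3)                                              # 0
```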

2.10.10.4 Divisibility by \(\boldsymbol{6}\)

To test for divisibility by \(6\), we can do the test for \(2\) (even number) and the test for \(3\), and the number must pass both tests. This works because \(2\) and \(3\) are co-prime (i.e. they have a HCF of \(1\)).

Combining tests doesn’t work for divisibility by \(8\), for example, because passing the test for divisibility by \(2\) and for divisibility by \(4\) would not guarantee that a number was divisible by \(8\). Divisibility by \(4\) implies divisibility by \(2\), so the test for divisibility by \(2\) would add no additional information, and be redundant. This is because \(2\) and \(4\) are not co-prime.

2.10.10.5 Divisibility by \(\boldsymbol{11}\) and \(\boldsymbol{7}\)

There is lots for learners to explore in the area of divisibility, with plenty of opportunities for ‘tricks’68 and to create and test all kinds of conjectures.69 Learners may find it interesting to explore patterns in divisibility by less common divisors, such as \(11\) and \(7\).

For the case of dividing by \(11\), we can note that even powers of \(10\) (i.e. \(10^{0}\), \(10^{2}\), \(10^{4}\), …) are all \(1\) more than a multiple of \(11\), whereas odd powers of \(10\) (i.e. \(10^{1}\), \(10^{3}\), \(10^{5}\), …) are all \(1\) less than a multiple of \(11\). Because of this, a number is a multiple of \(11\) exactly when its alternating digit sum (i.e. adding one digit, subtracting the next, and so on) is also a multiple of \(11\), and so the alternating digital root of a multiple of \(11\) must be zero.

For example, for the number \(739,586,749\), we compute \(7 - 3 + 9 - 5 + 8 - 6 + 7 - 4 + 9 = 22\), which is a multiple of \(11\). It may actually be easier to compute the sum of the digits in the odd positions (\(7 + 9 + 8 + 7 + 9 = 40\)) and the sum of the digits in the even positions (\(3 + 5 + 6 + 4 = 18\)) separately, and then find the difference, \(40 - 18 = 22\).

If we wanted to, we could go on to calculate the alternating digit sum of \(22\), which would be \(2 - 2 = 0\), which is a multiple of \(11\) (since \(0 \times 11 = 0\)). Either way, we conclude that \(739,586,749\) is a multiple of \(11\).
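Here is a short Python sketch of the alternating digit sum. The name `alternating_digit_sum` is our own. It works from the units digit leftwards, adding the units digit, subtracting the tens digit, and so on. For a number with an even number of digits, this gives the negative of the left-to-right version, which makes no difference to whether the result is a multiple of \(11\).

```python
def alternating_digit_sum(n: int) -> int:
    """Add the units digit, subtract the tens digit, add the hundreds digit, ..."""
    total, sign = 0, 1
    while n > 0:
        n, digit = divmod(n, 10)
        total += sign * digit
        sign = -sign
    return total


print(alternating_digit_sum(739_586_749))   # 22
print(alternating_digit_sum(22))            # 0
print(739_586_749 % 11)                     # 0
```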

For the case of dividing by \(7\), one test is to remove the \(1\)s digit, double it, and subtract it from the rest of the number. If the result is a multiple of \(7\), then so was the original number.

For example, for the number \(12,684\), we take off the \(4\) and double it, to get \(8\), and then work out \(1268 - 8 = 1260\). It isn’t obvious to me whether \(1260\) is a multiple of \(7\) or not, so I will continue the process: \(126 - 2 \times 0 = 126\). If I am still not sure whether \(126\) is a multiple of \(7\), then I can perform the process one final time: \(12 - 2 \times 6 = 0\), which is certainly a multiple of \(7\) (because \(0 \times 7 = 0\)), so \(126\) must also have been a multiple of \(7\), and so must \(1260\), and so must \(12,684\).
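The repeated "remove the \(1\)s digit, double it and subtract" process can be sketched in Python as follows. The function names are hypothetical. One assumption of this sketch is that we drop any minus sign that appears (a number and its negative are either both multiples of \(7\) or neither), and we stop once a single digit remains, which is a multiple of \(7\) only if it is \(0\) or \(7\).

```python
def seven_test_trace(n: int) -> list[int]:
    """Repeatedly replace n = 10a + b by a - 2b, recording each step."""
    trace = [n]
    n = abs(n)
    while n >= 10:
        a, b = divmod(n, 10)   # n = 10a + b, where b is the units digit
        n = abs(a - 2 * b)
        trace.append(n)
    return trace


def divisible_by_7(n: int) -> bool:
    """Apply the test until one digit is left, then check that digit."""
    return seven_test_trace(n)[-1] in (0, 7)


print(seven_test_trace(12_684))   # [12684, 1260, 126, 0]
print(divisible_by_7(12_684))     # True
```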

There is a nice opportunity here to use a little algebra to prove why this test works.

Let our original number be \(10a + b\), where \(b\) is an integer between \(0\) and \(9\), and \(10a\) is the rest of the number. Separating out an integer into its ‘tens’ and its \(1\)s can be very useful when we are interested in the digits. Here, \(b\) is a single digit, but \(a\) doesn’t have to be.

The process for our divisibility by \(7\) test involves making the new number \(a - 2b\).

Now, if our original number \(10a + b\) is a multiple of \(7\), we can write

\[10a + b = 7m,\]

where \(m\) is some integer.

Rearranging, we get \[b = 7m - 10a,\] and substituting this into \(a - 2b\) gives

\[\begin{aligned} a - 2b &= a - 2(7m - 10a) \\ &= a - 14m + 20a \\ &= 21a - 14m. \end{aligned}\]

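Since both \(a\) and \(m\) are integers, and \(21\) and \(14\) are multiples of \(7\), it follows that \(a - 2b\), our new number, is a multiple of \(7\) whenever our previous number, \(10a + b\), was. The converse also holds: if \(a - 2b = 7k\) for some integer \(k\), then \(a = 7k + 2b\), so \(10a + b = 70k + 21b = 7(10k + 3b)\), which is a multiple of \(7\). So \(a - 2b\) is a multiple of \(7\) if and only if \(10a + b\) is, which is exactly what the test needs. This is a nice opportunity to use a bit of algebraic simplification to verify something that isn’t at all obvious.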

A nice extension of this work is to consider how square numbers (and sums and differences of squares) relate to the multiples of \(4\).70 Learners could also explore what happens when numbers ending in a \(5\) are squared.71

2.10.11 Rational and irrational numbers

One nice feature of all this work focused on positive integers, and from there, fractions, is that it leads to the realisation that there must be numbers which cannot be expressed as fractions (i.e. as rational numbers, which are equal to \(\dfrac{a}{b}\), where \(a\) and \(b\) are integers, with \(b \neq 0\)).

The way I like to introduce this is to ask learners if they can think of two square numbers, one of which is twice the other.

The discussion sometimes goes like this:

\[\begin{aligned} &\text{6 and 12} && \text{\textit{Those aren't square numbers.}} \\ &\text{9 and 16} && \text{\textit{Those are square numbers, but 16 isn't twice 9.}} \\ &\text{8 and 16} && \text{\textit{But 8 isn't square.}} \end{aligned}\] … and so on.

Eventually, someone will conjecture that it is impossible. But why should it be impossible?

Usually, someone notices that if we double a square twice, then we do get back to a square number.

For example, \(9\) is square, \(18\) (doubled once) isn’t, but \(36\) (doubled again) is square. In other words, finding a square number which is four times another square number is easy. But, for some reason, finding a square number which is twice another square number seems like it might be impossible.

Our prime factorisations can help us see why it is impossible. If we take a number like \(90\), for example, which isn’t square, we can write it as a product of primes like this:

\[90 = 2 \times 3^{2} \times 5.\]

If we square \(90\), that means squaring \(2\) and \(3^{2}\) and \(5\):

\[90^{2} = \left( 2 \times 3^{2} \times 5 \right)^{2} = 2^{2} \times 3^{4} \times 5^{2}.\]

Whatever the indices of the prime factors were for \(90\), they all have to be doubled in \(90^{2}\), and that means they will all have to end up being even. In particular, the prime factor of \(2\) must have an even index in \(90^{2}\), and, indeed, in the prime factorisation of any square number, all the primes must appear as even powers.

Now we see the problem that comes when you double a square.

For example, if we double \(90^{2}\) we get

\[2 \times 90^{2} = 2^{3} \times 3^{4} \times 5^{2}.\]

Inevitably, the \(2\) now has an odd index, because its even index has increased by \(1\), from the new factor of \(2\) that has been introduced.

This is completely general, and so we can see that if \(b^{2}\) is a square then \(2b^{2}\) can’t be.

We can also see why doubling again made it into a square.

If \(b^{2}\) is a square then \(4b^{2}\) is guaranteed to be also:

\[4 \times 90^{2} = 2^{2} \times \left( 2^{2} \times 3^{4} \times 5^{2} \right) = 2^{4} \times 3^{4} \times 5^{2}.\]

Multiplication by \(4\) is no problem, because \(4\) is an even power of \(2\), and that increases the power of \(2\) in \(90^{2}\) from \(2^{2}\) to \(2^{4}\).

This tells us that \(kb^{2}\) will be a square if and only if \(k\) itself is a square. The value of \(k\) doesn’t have to be \(2^{2}\); it could be \(5^{2}\), because that would do to the index of \(5\) exactly what \(2^{2}\) did to the index of \(2\). It doesn’t even have to be a square of a prime - so long as it is some square, like \((2 \times 5)^{2}\) or \(7^8\), it will be a product of even powers of primes, and so it won’t disturb the even powers of \(b^{2}\).
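The argument about even indices can be explored in a few lines of Python. This is a sketch: `prime_exponents` is our own trial-division helper, and the final two lines only check a finite range of values of \(b\), so they illustrate the result rather than prove it.

```python
from math import isqrt


def prime_exponents(n: int) -> dict[int, int]:
    """Prime factorisation of n >= 1 by trial division, as {prime: index}."""
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] = 1   # whatever is left over is a single prime
    return exponents


def is_square(n: int) -> bool:
    return isqrt(n) ** 2 == n


print(prime_exponents(90))            # {2: 1, 3: 2, 5: 1}
print(prime_exponents(90 ** 2))       # {2: 2, 3: 4, 5: 2}
print(prime_exponents(2 * 90 ** 2))   # {2: 3, 3: 4, 5: 2}

# No b from 1 to 9999 makes 2b^2 a square, but 4b^2 always is one.
print(any(is_square(2 * b * b) for b in range(1, 10_000)))   # False
print(all(is_square(4 * b * b) for b in range(1, 10_000)))   # True
```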

Why is all this important? Because we have discovered that the ratio of two squares can never be equal to \(2\).

In symbols,

\[\dfrac{a^{2}}{b^{2}} = 2\]

is impossible for integers \(a\) and \(b\) (with \(b \neq 0\)), and that means that

\[\dfrac{a}{b} = \sqrt{2}\]

is also impossible, because squaring both sides of this equation would give the previous one. This tells us that \(\sqrt{2}\) is irrational.

This is a shocking result, because we might have thought that all numbers could be expressed as some fraction, with complicated enough integer numerator \(a\) and denominator \(b\), perhaps very large numbers.

But no, there is no such way to represent \(\sqrt{2}\). And \(\sqrt{2}\) is not some freaky exception. We see from the argument above that the square root of any positive integer that is not a square number (e.g. \(\sqrt{3}\), \(\sqrt{6}\), \(\sqrt{21}\)), i.e. any surd, will also be irrational. Indeed, it turns out that, in a sense, ‘most’ numbers are irrational. Up to now, we have quite artificially focused on the rationals, but if you pick a random point on the number line it will almost certainly represent an irrational number, rather than a rational number.

Irrational numbers are sometimes not as surprising to learners as we might expect or hope them to be. I think this is usually because they are not as familiar with the properties of the rational numbers as we might assume they are. For example, when we say that the digits in \(\sqrt{2}\), say, do not form a fixed-length repeating pattern, this might seem unremarkable, unless learners already know that all rational numbers do.

For example, \[\sqrt{2} = 1.414,213,562,373,095,048,801,688,724,209,698,078,\ldots.\] It is impossible to summarise the decimal expansion in terms of a repeating unit, and this is a feature of all irrational numbers.

Notice that this isn’t the same as saying that the decimal expansion of an irrational number has ‘no pattern’. The number \(0.1011011101111011111\ldots\), with steadily increasing numbers of \(1\)s separated by zeroes, certainly has a simple pattern. It is easy to describe the pattern in this decimal expansion. But it is an irrational number, because there is no fixed repeating unit.

Really, there are only two kinds of decimal:

(1) infinite, non-recurring, non-repeating, and

(2) recurring - because a terminating decimal, such as \(0.45\), can be thought of as \(0.450000\ldots = 0.45\dot{0}\), with a recurring zero on the end.

If we think this way, then all rational numbers are equivalent to recurring decimals, and all irrational numbers to infinite, non-recurring, non-repeating decimals.

My favourite application of long division is in converting fractions into decimals and observing the recurring decimals we get with all fractions in which (when simplified) the denominator contains prime factors other than \(2\) and \(5\) (the factors of our base \(10\)).

Doing long division shows why all rational numbers have repeating decimal expansions.

For example, if we divide an integer by \(7\), then the only possible remainders at each step are \(0\), \(1\), \(2\), \(3\), \(4\), \(5\) and \(6\). If at any point we get a remainder of zero, then the decimal expansion terminates (or we could say continues with zeroes forever). If not, then, since there are only \(6\) non-zero remainders available, some remainder must eventually occur again, and from that point on the division repeats exactly as before, cycling through the same remainders (and producing the same digits) in a fixed order. This also shows that the maximum length of the repeating unit will be \(6\).

In general, the simplified fraction \(\dfrac{a}{b}\) will have a repeating unit in its decimal expansion of maximum length \(b - 1.\)

For sevenths (\(b = 7\)), the repeating unit is actually this maximum length of \(6\) (e.g. \(\frac{3}{7} = 0.\dot{4}2857\dot{1}\)), but for thirds (\(b = 3\)), the repeating unit is just of length \(1\), not \(2\) (e.g. \(\frac{2}{3} = 0.\dot{6}\)).
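The remainder-tracking argument translates directly into a small Python sketch of long division. The function name `long_division` and its return format are our own choices. It records the position at which each remainder first appears, and a repeated remainder marks the start of the repeating unit.

```python
def long_division(a: int, b: int):
    """Digits after the decimal point of a/b, found by long division.

    Returns (digits, cycle_start, cycle_length); cycle_start is None and
    cycle_length is 0 when the decimal terminates.
    """
    digits = []
    seen = {}            # remainder -> position at which it first appeared
    r = a % b
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(r // b)
        r %= b
    if r == 0:
        return digits, None, 0
    start = seen[r]      # the digits repeat from this position onwards
    return digits, start, len(digits) - start


print(long_division(3, 7))    # ([4, 2, 8, 5, 7, 1], 0, 6)
print(long_division(2, 3))    # ([6], 0, 1)
print(long_division(9, 20))   # ([4, 5], None, 0)
```

Because only the non-zero remainders \(1, 2, \ldots, b - 1\) can go into `seen`, the cycle length can never exceed \(b - 1\).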

In particular, by discovering that

\[\frac{1}{9} = 0.111,111,111\ldots = 0.\dot{1},\]

\[\dfrac{1}{99} = 0.010,101,010\ldots = 0.\dot{0}\dot{1},\]

\[\dfrac{1}{999} = 0.001,001,001\ldots = 0.\dot{0}0\dot{1},\]

and so on, learners have the building blocks to convert any recurring decimal into a fraction.72

For example,

\[0.\dot{3} = 3 \times 0.\dot{1} = 3 \times \dfrac{1}{9} = \dfrac{1}{3},\]

although we would want learners to recognise and know this one.

A harder example would be \(6.2\dot{8}\):

\[6.2\dot{8} = 6.2 + \dfrac{0.\dot{8}}{10} = 6.2 + \dfrac{\left( \dfrac{8}{9} \right)}{10} = \dfrac{62}{10} + \dfrac{8}{90} = \dfrac{558 + 8}{90} = \dfrac{566}{90} = \dfrac{283}{45}.\]

If the repeating unit is longer than \(1\) digit, we just need to use as many \(9\)s in the denominator as there are digits in the repeating unit.

For example, if we want to convert \[0.345634563456\ldots = 0.\dot{3}45\dot{6},\] then, since \(\dfrac{1}{9999} = 0.\dot{0}00\dot{1}\), it will be \[3456 \times \dfrac{1}{9999} = \dfrac{3456}{9999} = \dfrac{384}{1111}.\]
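The "as many \(9\)s as digits in the repeating unit" method can be sketched in Python using exact fractions from the standard `fractions` module. The function `recurring_to_fraction` and its three-string input format (whole part, non-repeating decimal digits, repeating digits) are our own assumptions for illustration.

```python
from fractions import Fraction


def recurring_to_fraction(whole: str, fixed: str, repeating: str) -> Fraction:
    """Value of whole.fixed followed by `repeating` recurring forever.

    A repeating block of k digits is worth int(repeating) / (10**k - 1),
    using 1/9, 1/99, 1/999, ...; it is then shifted right past the
    non-repeating digits by dividing by 10**len(fixed).
    """
    shift = 10 ** len(fixed)
    terminating_part = Fraction(int(whole + fixed), shift)
    recurring_part = Fraction(int(repeating), (10 ** len(repeating) - 1) * shift)
    return terminating_part + recurring_part


print(recurring_to_fraction("0", "", "3"))      # 1/3
print(recurring_to_fraction("6", "2", "8"))     # 283/45
print(recurring_to_fraction("0", "", "3456"))   # 384/1111
```

Splitting the decimal into a terminating part and a shifted recurring part mirrors the working for \(6.2\dot{8}\) above.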

A question to pose to learners that will create a lot of discussion is the following:

TASK 2.21

Convert \(0.\dot{9}\) to a fraction.

Learners may write

\[0.\dot{9} = 9 \times 0.\dot{1} = 9 \times \dfrac{1}{9} = 1,\]

but they may resist this conclusion, since \(0.\dot{9}\) really looks like it ought to be a bit less than \(1\), rather than exactly equal to \(1\). Appearances are deceptive, however, and we do in fact say that \(0.\dot{9}\) is precisely equal to \(1\).

2.11 Problem solving with thinking algebraically

Algebra is an incredibly powerful tool for unlocking and illuminating numerous problems. It would be easy to add more and more things to this section, so I have just included some of my favourite tasks.

2.11.1 Consecutive integers

There are many rich tasks that relate in some way to patterns involving consecutive integers.

2.11.1.1 Summing the integers

The story is often told of the mathematician Carl Friedrich Gauss when he was a child and was asked, as an arithmetic exercise, to sum the positive integers from \(1\) to \(100\). He found the answer by replacing a long addition with a much quicker multiplication.

This might seem impossible at first glance. When summing identical numbers, it is easy to switch an addition for a multiplication.

For example,

\[3 + 3 + 3 + 3 + 3 = 5 \times 3.\]

But if the numbers being added are not the same as each other, this looks impossible:

\[1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = \ ?\ \times \ ?\]

Gauss’ trick was to write out the sum again, but backwards:

\[8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = \ ?\]

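We don’t know what either of these sums must be equal to, but we do know that the two sums must be equal to each other, so we can call the sum \(S\). The clever insight is to realise that, if we add the numbers vertically, we get \(9\) for each pair of terms:

\[\begin{aligned} 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 &= S \\ 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 &= S \\ 9 + 9 + 9 + 9 + 9 + 9 + 9 + 9 &= 2S \end{aligned}\]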

Each successive term in the first sum is \(1\) greater than the previous term, and each successive term in the second sum is \(1\) less than the previous term. So, if the first vertical pair, \(1 + 8\), is equal to \(9\), then the next vertical pair, \(2 + 7\), is guaranteed to have the same sum of \(9\), and so on all the way through.

This means that in total we end up with \(8\) sums of \(9\), one for each of the \(8\) original numbers in our sum, and so the sum of all these sums will be twice the \(S\) value that we want:

\[8 \times 9 = 2S.\]

It follows that \[S = \dfrac{8 \times 9}{2} = 36,\] which we can verify by directly summing the first \(8\) integers.

If we generalise what we have done, the pairs of terms will always sum to \(1\) more than the last term in our sequence, and there will always be as many pairs as there are terms in our original sum.

So, to sum the integers from \(1\) to \(10\), we would get \(10\) sums of \(11\), which we would have to divide by \(2\), so the total would be \[\dfrac{10 \times 11}{2} = 55.\] In general, the sum of the integers from \(1\) to \(n\) will be \(\dfrac{1}{2}n(n + 1)\). So, the sum of the first \(100\) integers is \(\dfrac{100 \times 101}{2} = 5050\).

It is worth noting here that we must always get an integer answer when we divide by \(2\) at the end, because the sum of integers must be an integer.

Learners may wonder how this happens, and one way to see it is to realise that we can do easier arithmetic by doing the division by \(2\) earlier. For example, with our sum of integers up to \(8\), we had \(S = \displaystyle \dfrac{8 \times 9}{2}\), which we can calculate as \(4 \times 9 = 36\).

We can see that, because \(n(n + 1)\) is the product of two consecutive integers, exactly one of the two factors must be even, and so this cancellation will always be possible. That explains why \(\dfrac{1}{2}n(n + 1)\) is an integer for all \(n\).
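As a small Python sketch (the name `gauss_sum` is our own), the pairing argument gives a formula that we can compare against direct addition.

```python
def gauss_sum(n: int) -> int:
    """1 + 2 + ... + n via Gauss' pairing: n pairs, each worth n + 1, halved."""
    # n * (n + 1) is a product of consecutive integers, so it is always
    # even and the integer division below is exact.
    return n * (n + 1) // 2


print(gauss_sum(8))     # 36
print(gauss_sum(10))    # 55
print(gauss_sum(100))   # 5050
```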

As an extension, learners can consider summing arithmetic sequences of integers that are not necessarily consecutive.

Here is an example:

TASK 2.22

What is the sum of all of the even numbers between \(1\) and \(100\)?

We found above that the sum of all of the integers between \(1\) and \(100\) was \(5050\), so learners may think the answer should be half of this, \(2525\). But that is incorrect because, although among these \(100\) numbers \(50\) are odd and \(50\) are even, each even number is \(1\) more than the odd number just before it.

We can see this if we imagine pairing them up:

\[\begin{alignedat}{4} 1 + 3 + 5 + 7 +{} & \phantom{0}9 + 11 + 13 + \dots + \phantom{0}99 &&= S_{\text{odd}} \\ 2 + 4 + 6 + 8 +{} & 10 + 12 + 14 + \dots + 100 &&= S_{\text{even}} \end{alignedat}\]

Since there are \(50\) numbers in each sum, and every number in the even sum is \(1\) more than the corresponding number in the odd sum, \(S_{\text{even}}\) must be \(50\) more than \(S_{\text{odd}}\). Since they add up to \(5050\), we need to find two numbers which sum to \(5050\), where one number (the one we want) is \(50\) more than the other.

We could use simultaneous equations to solve this (Chapter 4), but we do not really need to do this formally, because we can see that if we subtract \(50\) from \(5050\) we must obtain twice the sum of the odd numbers. So, \(S_{\text{odd}}\) must be \(\dfrac{5000}{2} = 2500\), and therefore \(S_{\text{even}}\) must be \(2550\).

Another way to think about it is that \(50\) is small, relative to \(5050\), and so the two numbers will be close to half of \(5050\). Half of \(5050\) is \(2525\), and, to be \(50\) apart, our two numbers (\(S_{\text{odd}}\) and \(S_{\text{even}}\)) must be \(25\) either side of \(2525\), so

\[S_{\text{odd}} = 2525 - 25 = 2500 \qquad \text{ and } \qquad S_{\text{even}} = 2525 + 25 = 2550.\]
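The same forwards-and-backwards pairing works for any arithmetic sequence of integers: the number of terms, multiplied by (first term + last term), gives twice the sum. Here is a minimal Python sketch, where `arithmetic_sum` is our own name and we assume that `step` divides `last - first` exactly.

```python
def arithmetic_sum(first: int, last: int, step: int) -> int:
    """first + (first + step) + ... + last, assuming step divides last - first.

    Writing the sum forwards and backwards gives one pair per term,
    each pair equal to first + last; halving gives the sum.
    """
    terms = (last - first) // step + 1
    return terms * (first + last) // 2


print(arithmetic_sum(2, 100, 2))   # 2550 (the even numbers)
print(arithmetic_sum(1, 99, 2))    # 2500 (the odd numbers)
```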

2.11.1.2 Triangle numbers

The numbers we obtain from summing the first so-many positive integers, \(1, 3, 6, 10, 15, 21, 28, 36, 45, 55, \ldots\), are the triangle numbers, because they can be represented as triangular arrays of dots, either in an isometric arrangement (Figure 2.31) or a rectangular one (Figure 2.32).

Figure 2.31: The first four triangle numbers represented in isometric arrangements.
Figure 2.32: The first four triangle numbers represented in rectangular arrangements.

Two copies of the same triangle fit together to make a parallelogram with one of its dimensions \(1\) greater than the other (e.g. Figure 2.33).

In general, if \(T_{n}\) is the \(n\)th triangle number, then \(2T_{n} = n(n + 1)\).

Figure 2.33: Two copies of the same triangle make a rectangle.

Two consecutive triangle numbers will sum to a square.

We can see this algebraically, as

\[T_{n} + T_{n - 1} = \dfrac{1}{2}n(n + 1) + \dfrac{1}{2}(n - 1)n = \dfrac{1}{2}n(2n) = n^{2},\]

as well as in terms of dots (e.g. Figure 2.34).

Figure 2.34: A visualisation of \(T_{3} + T_{4} = 4^{2}\).

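If we wish to sum consecutive integers, we do not necessarily have to begin at \(1\). We might wish to find the sum from, say, \(20\) to \(26\), and the same ‘trick’ of writing the sum backwards will work:

\[\begin{aligned} 20 + 21 + 22 + 23 + 24 + 25 + 26 &= S \\ 26 + 25 + 24 + 23 + 22 + 21 + 20 &= S \\ 46 + 46 + 46 + 46 + 46 + 46 + 46 &= 2S \end{aligned}\]

There are \(26 - 19 = 7\) terms, and each vertical pair sums to \(20 + 26 = 46\), so we obtain \(S = \displaystyle \dfrac{(26 - 19)(20 + 26)}{2} = 161\).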

Our \(S\) must be a difference between two triangle numbers, which is called a trapezium number. The sum from \(20\) to \(26\) will be the sum from \(1\) to \(26\) minus the sum from \(1\) to \(19\):

\[T_{26} - T_{19} = \dfrac{26 \times 27}{2} - \dfrac{19 \times 20}{2} = 351 - 190 = 161.\]
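A short Python sketch (with `triangle` as our own name for \(T_n\)) can confirm the triangle numbers listed above, the fact that consecutive triangle numbers sum to a square, and the trapezium number just calculated.

```python
def triangle(n: int) -> int:
    """The nth triangle number T_n = n(n + 1)/2, with T_0 = 0."""
    return n * (n + 1) // 2


print([triangle(n) for n in range(1, 11)])  # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
print(triangle(3) + triangle(4))            # 16, which is 4 squared
print(triangle(26) - triangle(19))          # 161 = 20 + 21 + ... + 26
```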

We can also focus on the number of terms we are summing.

Here is a rich task:

TASK 2.23

What can you say about the sum of three consecutive positive integers?

Learners might represent the three numbers as \(a\), \(b\) and \(c\). But using three different letters does not capture the information that they are consecutive. Although \(a\), \(b\) and \(c\) are consecutive letters of the alphabet, we make no such assumption about the values that those three letters might take in algebra.

Instead, to encode the information about them being consecutive, we would need to call them, for example, \(n\), \(n + 1\) and \(n + 2\).

Now, when we sum them, we get \(3n + 3\).

We can see that this is a multiple of \(3\), because it is a sum of two multiples of \(3\) (both \(3n\) and \(3\) are multiples of \(3\)).

We can also factorise it, to obtain \(3(n + 1)\), which we can see is a multiple of \(3\), because it is \(3\) lots of an integer (\(n + 1\) is an integer if \(n\) is an integer). It is good for learners to see this both ways.

This would work for any three consecutive expressions, even something like \(n + 10\), \(n + 11\) and \(n + 12\).

When we sum these, we get

\[(n + n + n) + (10 + 11 + 12) = 3n + 33 = 3(n + 11).\]

We will always get \(3n\) from the three \(n\)s, and the numerical terms will always be the sum of three consecutive integers. (We can think of the \(3\) that we obtained in \(3n + 3\) above as being the sum of \(0\), \(1\) and \(2\).) Since the sum of three consecutive integers is a multiple of \(3\), this number will also be a multiple of \(3\), so we will have the sum of two multiples of \(3\).

It is actually neater if we label our three integers more symmetrically, as \(n - 1\), \(n\) and \(n + 1\).

Now, when we sum them, we get \(3n\), which is clearly a multiple of \(3\).

We can see that the same thing will happen even if the integers are not consecutive, provided they are evenly spaced (consecutive members of an arithmetic sequence with constant difference \(d\), see Chapter 4), such as \(n - d\), \(n\) and \(n + d\).

The \(+ d\) and the \(- d\) will cancel out when we sum the three numbers, and we will again obtain \(3n\).

Given that three consecutive integers always sum to a multiple of \(3\), learners can extend this to see whether two consecutive numbers will sum to a multiple of \(2\).

They may be surprised that this can never happen, because with two consecutive numbers one has to be odd and the other even, and ‘\(\text{odd} + \text{even} = \text{odd}\)’. So, two consecutive numbers always sum to an odd number, which is never a multiple of \(2\).

Similarly, the sum of \(4\) consecutive integers will never be a multiple of \(4\) (although it will be even). However, the sum of \(n\) consecutive integers will always be a multiple of \(n\) whenever \(n\) is odd.
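These claims can be checked by brute force in Python over a range of starting values, including negative ones. This is a sketch that illustrates the pattern for a few values of \(k\) rather than proving it, and `consecutive_sum` is our own helper.

```python
def consecutive_sum(start: int, k: int) -> int:
    """Sum of the k consecutive integers start, start + 1, ..., start + k - 1."""
    return sum(range(start, start + k))


starts = range(-50, 51)

# Odd k: always a multiple of k (it is k times the middle number).
print(all(consecutive_sum(s, k) % k == 0 for s in starts for k in (3, 5, 7)))  # True
# Two consecutive integers: never a multiple of 2.
print(any(consecutive_sum(s, 2) % 2 == 0 for s in starts))                      # False
# Four consecutive integers: always 2 more than a multiple of 4.
print({consecutive_sum(s, 4) % 4 for s in starts})                              # {2}
```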

A related question to explore is which positive integers we can make by summing consecutive integers.

Learners can experiment with this.

For example, because \(15\) is a multiple of \(3\), we must be able to write it as a sum of three consecutive integers, with \(\dfrac{15}{3} = 5\) as the middle number:

\[15 = 4 + 5 + 6.\]

But, because \(15\) is also a multiple of \(5\) (another odd number), we must also be able to write it as the sum of \(5\) consecutive integers, with \(\dfrac{15}{5} = 3\) as the middle number:

\[15 = 1 + 2 + 3 + 4 + 5.\]

This means that every odd factor (greater than \(1\)) that a number has gives a new way of writing it as the sum of consecutive integers. (We have to exclude the factor of \(1\), because that corresponds to the number itself, not a sum of consecutive integers.) So, the number of ways of writing any number as a sum of consecutive integers will be equal to the number of odd factors greater than \(1\) that the number has. (Sometimes some of these integers will be negative, but that is OK: any negative terms cancel with the matching positive terms, along with any zero. For example, the odd factor \(15\) gives the \(15\) consecutive integers from \(-6\) to \(8\), whose sum reduces to \(7 + 8\).)

We can see from this that if a number has no odd factors greater than \(1\), then it will be impossible to make it as a sum of consecutive integers. And these will be the only numbers that cannot be expressed as the sum of consecutive integers.

Are there any numbers with no odd factors greater than \(1\)? Clearly every positive integer has at least one odd factor, because \(1\) is a factor of every integer. But the powers of \(2\) have no odd factors greater than \(1\), and indeed they are the only such numbers. So, they are the only numbers which cannot be made by summing consecutive integers.
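A brute-force Python sketch can compare the two counts. The function names are our own, and this sketch only counts sums of two or more consecutive positive integers (the form that any sum containing negative terms reduces to once they cancel).

```python
def consecutive_representations(n: int) -> list[list[int]]:
    """All ways to write n as a sum of two or more consecutive positive integers."""
    ways = []
    for start in range(1, n):
        total, end = 0, start
        while total < n:
            total += end
            end += 1
        if total == n and end - start >= 2:
            ways.append(list(range(start, end)))
    return ways


def odd_factors_above_1(n: int) -> list[int]:
    return [d for d in range(3, n + 1, 2) if n % d == 0]


print(consecutive_representations(15))   # [[1, 2, 3, 4, 5], [4, 5, 6], [7, 8]]
print(odd_factors_above_1(15))           # [3, 5, 15]
print([n for n in range(1, 100) if not consecutive_representations(n)])
# [1, 2, 4, 8, 16, 32, 64]
```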

We can also see this algebraically.

As we saw above, a sum of consecutive integers must be the difference between two triangle numbers.

In general, it will be

\[T_{m} - T_{n} = \dfrac{m(m + 1)}{2} - \dfrac{n(n + 1)}{2},\ \text{where}\ m - n > 1.\]

We note that if \(m - n = 1\), then \(T_{m} - T_{n}\) would be a single number, and not a sum, so we require that \(m - n > 1\).

Simplifying,

\[T_{m} - T_{n} = \dfrac{m^{2} + m - n^{2} - n}{2} = \dfrac{\left( m^{2} - n^{2} \right) + (m - n)}{2}\]

\[= \dfrac{(m - n)(m + n + 1)}{2},\]

using the difference of two squares.

Now, when \(m - n\) is even, \(m + n\) will also be even, and so \(m + n + 1\) will be odd.

And, when \(m - n\) is odd, \(m + n\) will also be odd, and so \(m + n + 1\) will be even.

Therefore, the product \((m - n)(m + n + 1)\) will always either be ‘odd \(\times\) even’ or ‘even \(\times\) odd’.

When we divide the even factor in this numerator by \(2\), we may then obtain \(T_{m} - T_{n}\) equal to ‘even \(\times\) odd’ or ‘odd \(\times\) odd’, but either way we are guaranteed to have at least one odd factor in \(T_{m} - T_{n}\). So, it is impossible for \(T_{m} - T_{n}\) to be a number that has no odd factors.

We might worry that the only odd factor could be \(1\), but this cannot happen. Since we required that \(m > n + 1\), the first bracket, \(m - n\), is at least \(2\), so if it is odd it is at least \(3\). And, since \(n \geq 0\) and \(m \geq 2\), the second bracket satisfies \(m + n + 1 \geq 3\). Either way, the odd factor we found in \(T_{m} - T_{n}\) must be greater than \(1\).

2.11.1.3 Handshakes

Many tasks lead to triangle numbers. This is an old favourite:

TASK 2.24

Imagine that everyone in a room shakes hands with everyone else.
How many handshakes will there be altogether?

If there are \(n\) people, we can imagine them arriving in the room one by one.

The first person has no one to shake hands with.

The second person to arrive shakes hands with just the first person.

The third person shakes hands with both the first person and the second person, so that is \(2\) more handshakes.

At each stage, the new person has to shake hands with everyone who is already in the room, but that is all, since all the people already in the room have already shaken hands with each other.

In this way, for \(n\) people in total, we get the sum \[0 + 1 + 2 + 3 + \ldots + (n - 1).\] The process ends with \(n - 1\), rather than with \(n\), because the final (\(n\)th) person only has the \(n - 1\) people who got there before them to shake hands with. They don’t shake hands with themselves. So, for \(n\) people, the total number of handshakes is the \((n - 1)\)th triangle number.

Looked at another way, everyone in the room has to shake hands with everyone else - i.e., with everyone except themselves.

So, each of the \(n\) people has to shake hands with \(n - 1\) people, making a total of \(n(n - 1)\) handshakes. However, this counts every handshake twice - from both ends - and so the correct number is \(\dfrac{1}{2}n(n - 1)\), which is \(T_{n - 1}\).
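The handshake count can be checked by listing every unordered pair of people with `itertools.combinations` from the Python standard library. This is a sketch for small \(n\), and `handshakes` is our own name.

```python
from itertools import combinations


def handshakes(n: int) -> int:
    """Count handshakes among n people by listing every unordered pair."""
    return sum(1 for _ in combinations(range(n), 2))


print([handshakes(n) for n in range(1, 8)])   # [0, 1, 3, 6, 10, 15, 21]
```

The output is the sequence of triangle numbers shifted along by one place, matching \(T_{n - 1} = \dfrac{1}{2}n(n - 1)\).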

In Chapter 4, we will see that the handshakes can be represented visually by a mystic rose.

2.11.1.4 Products of integers

If we can sum the positive integers, then we can also multiply them.

The factorial of a positive integer is the product of all the positive integers less than or equal to that integer.

For example, \(5!\) (‘\(5\) factorial’) is

\[5! = 5 \times 4 \times 3 \times 2 \times 1.\]

The final multiplication by \(1\) does not affect the value, but it perhaps looks neater to leave it in than to omit it. Obviously we must not include zero on the end, as otherwise all factorials would be equal to zero, which would not be very useful!

Factorials get big extremely quickly. Even \(10!\) is \(3,628,800\) (i.e. over \(3\) million). Unlike the triangle number formula for the sums of the integers, there is no neat formula for the products of the integers.73

We might pose a question analogous to the one in TASK 2.23:

TASK 2.25

What can you say about the product of three consecutive positive integers?

Learners might adopt a similar strategy to the one used for TASK 2.23, and approach this algebraically, calling the three integers \(n - 1\), \(n\) and \(n + 1\).

The product is \[(n - 1)n(n + 1) = n\left( n^{2} - 1 \right) = n^{3} - n.\] So, the task could even be posed in this way:

TASK 2.26

What kinds of numbers are \(n^{3} - n\)?

It is not easy to answer this by looking at this expression.

One way to highlight the need for flexibility in using algebra is to contrast these two similar-sounding questions:

  • What can you say about the sum of three consecutive positive integers? (TASK 2.23)

  • What can you say about the product of three consecutive positive integers? (TASK 2.25)

They need different strategies for their solution.

Here, it is helpful to think about how frequently different multiples appear among the integers (see Section 2.10.8). The multiples of \(2\) come every other integer, so with three consecutive integers, either the middle integer is a multiple of \(2\) or both of the two end integers are:

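Either

\[\text{odd} \times \text{even} \times \text{odd}\]

or

\[\text{even} \times \text{odd} \times \text{even}.\]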

In both cases, the product of the three integers will be even, because there is at least one even number involved, and in the second case it will be at least a multiple of \(4\).

What about the multiples of \(3\)?

Multiples of \(3\) come every third integer, and so, because we are multiplying three consecutive integers, exactly one of them must be a multiple of \(3\).

Putting this together, the product must be at least a multiple of \(2 \times 3 = 6\).

We can extend this argument to say that a product of \(n\) consecutive integers must be a multiple of \(n!\).
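A brute-force Python check of this claim, using `math.prod` and `math.factorial` (Python 3.8 or later), is sketched below. It covers only a finite range of starting values and lengths, so it supports the claim rather than proving it; `product_of_consecutive` is our own helper.

```python
from math import factorial, prod


def product_of_consecutive(start: int, k: int) -> int:
    """start * (start + 1) * ... * (start + k - 1)."""
    return prod(range(start, start + k))


print(product_of_consecutive(4, 3))   # 120 = 4 * 5 * 6, a multiple of 3! = 6
print(all(product_of_consecutive(s, k) % factorial(k) == 0
          for k in range(1, 8) for s in range(1, 200)))   # True
print(all((n ** 3 - n) % 6 == 0 for n in range(1, 1000)))   # True
```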

2.11.2 Dots inside rectangles

This is a nice task that can be represented algebraically:74

TASK 2.27

How many dots are there inside a \(4 \times 3\) rectangle?
We aren’t including the dots on the boundary.

Learners can easily count that there are \(6\) dots inside the rectangle.

The teacher may need to ensure that, when drawing the \(4 \times 3\) rectangle, learners make the sides of lengths \(4\) and \(3\), and are not confused by there being \(5\) dots on the horizontal sides and \(4\) dots on the vertical sides.

If the rectangle is \(x\) by \(y\), then there will be \(x - 1\) columns of dots and \(y - 1\) rows of dots inside the rectangle.

So, there must be \((x - 1)(y - 1)\) dots altogether inside an \(x\) by \(y\) rectangle, shown starred for the \(4 \times 3\) case in Figure 2.35.

Figure 2.35: Counting the dots inside.

Depending on how learners think about this, they may have different but equivalent expressions, such as \(xy - x - y + 1\) or \(xy - (x + y) + 1\).

We can also prove this result using Pick’s Theorem.75

According to Pick’s Theorem, the number of interior lattice points of a polygon like this is given by

\[\text{area} - \dfrac{b}{2} + 1,\]

where \(b\) is the number of points on the boundary of the polygon.

In our case, \(b = 2(x + y)\), and the area of our rectangle is \(xy\), so the number of interior points will be \(xy - (x + y) + 1\), as we obtained above.

To extend this task, learners can think about tilted rectangles.

Figure 2.36 shows what we will call a tilted \(4 \times 3\) rectangle, in which the sides have gradient \(\pm 1\). Notice that our unit of length along the sides of the rectangle is no longer \(1\), but is now \(\sqrt{2}\).

Figure 2.36: Dots inside a tilted \(4 \times 3\) rectangle.

It is now harder to count the dots inside. Finding a systematic way to count them is not only more reliable, but is also more likely to reveal the structure that will enable us to generalise to other tilted rectangles.

Learners may discern two interleaved rectangles of dots inside the rectangle, marked differently in Figure 2.37.

Figure 2.37: Counting the lattice points inside a tilted \(4 \times 3\) rectangle.

We have a \(4 \times 3\) rectangle of stars, with a \(3 \times 2\) rectangle of little squares nested within it.

So, the total number of lattice points inside the rectangle is \[4 \times 3 + 3 \times 2 = 12 + 6 = 18.\]

In general, for an \(x\) by \(y\) gradient \(\pm 1\) rectangle, there will be \[xy + (x - 1)(y - 1) \text{ dots.}\] This can also be written in other equivalent ways, such as \(2xy - (x + y) + 1\).

Again, we can see this by using Pick’s Theorem. The number of points on the boundary is still \(2(x + y)\), but the area has increased. The rectangle now has dimensions \(x\sqrt{2}\) and \(y\sqrt{2}\), rather than \(x\) and \(y\), so the area will be the product of these, which is \(2xy\). So, the number of interior points will be

\[\text{area} - \dfrac{b}{2} + 1= 2xy - (x + y) + 1,\]

as we obtained above.

Finally, learners can explore rectangles tilted to other angles.

In general, for an \(x\) by \(y\) rectangle with two of its sides having positive integer gradient \(m\), all that will change is the area of the rectangle.

Using Pythagoras’ Theorem, the dimensions of the tilted rectangle will be \(x\sqrt{1 + m^{2}}\) and \(y\sqrt{1 + m^{2}}\), so the area of the rectangle will now be \(xy\left( 1 + m^{2} \right)\).

Therefore, the number of interior points will be

\[\text{area} - \dfrac{b}{2} + 1= xy\left( 1 + m^{2} \right) - (x + y) + 1.\]

Substituting \(m = 1\) or \(m = 0\) into this shows that this result is consistent with the formulae given above.
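Learners who want to check the general formula against direct counting could use something like the following minimal Python sketch. It assumes (our choice of placement, not the book's) that the rectangle has one corner at the origin and sides along the vectors \((1, m)\) and \((-m, 1)\), so that \(m = 0\) gives an ordinary rectangle and \(m = 1\) the tilted rectangle of Figure 2.36. The function names are illustrative only. It counts the lattice points strictly inside by brute force and compares the result with the Pick's Theorem expression above.

```python
def interior_points(x, y, m):
    """Count lattice points strictly inside an x by y rectangle whose sides run
    along (1, m) and (-m, 1), with one corner at the origin (assumed placement)."""
    corners = [(0, 0), (x, m * x), (x - m * y, m * x + y), (-m * y, y)]
    edges = list(zip(corners, corners[1:] + corners[:1]))
    count = 0
    for px in range(-m * y, x + 1):
        for py in range(0, m * x + y + 1):
            # The corners are listed anticlockwise, so an interior point lies
            # strictly to the left of every edge (positive cross product).
            if all((bx - ax) * (py - ay) - (by - ay) * (px - ax) > 0
                   for (ax, ay), (bx, by) in edges):
                count += 1
    return count

def pick_count(x, y, m):
    """Interior points from Pick's Theorem: area - b/2 + 1, with b = 2(x + y)."""
    return x * y * (1 + m * m) - (x + y) + 1

print(interior_points(4, 3, 1), pick_count(4, 3, 1))  # 18 18
```

Brute-force counting is slow for large rectangles, but its value here is that it does not rely on Pick's Theorem at all, so agreement between the two counts is genuine evidence.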

2.11.3 Pascal’s triangle

The following task is an extremely rich problem:

TASK 2.28

Two moves are allowed:
- Taking \(1\) step to the right
- Taking \(1\) step up

How many moves are needed to get from the origin \((0,\ 0)\) to the point \((3,\ 2)\)?
In how many different ways can you get from the origin \((0,\ 0)\) to the point \((3,\ 2)\)?

It is possible to contextualise this as journeys through a city, such as New York City, that is laid out on a grid pattern.

However someone travels, it will have to entail a total of \(3\) steps to the right and \(2\) steps up, so \(5\) moves are needed altogether. However, these moves can happen in any order. If we label a move to the right with \(x\) and a move up with \(y\), then one possibility would be \(xyxxy\), shown in Figure 2.38.

Figure 2.38: One way of getting from the origin \((0,\ 0)\) to the point \((3,\ 2)\).

It is interesting to ask where on the grid we can get in exactly \(5\) moves.

Learners may be used to the idea that the locus of points at a fixed distance from a fixed point is a circle (Chapter 3). However, where movements are constrained to be horizontal and vertical only, we get a different kind of geometry, called (for obvious reasons) taxicab geometry. The points you can get to in \(5\) moves are the lattice points on the diagonal line from \((0, 5)\) to \((5, 0)\). (If we also allowed movements West and South, the locus of positions \(5\) steps from the origin would be a tilted square centred on \((0, 0)\).)

Learners can experiment with different points, being systematic in ensuring they find all of the possible ways to the point. For example, every point on the axes can only be obtained in one way (a string of \(x\)s for the horizontal axis or a string of \(y\)s for the vertical axis). The point \((1,\ 1)\) can be obtained in two ways: \(xy\) or \(yx\).

The key relationship depends on the fact that you can arrive at any point only by moving \(x\) from the point immediately to the left or \(y\) from the point immediately below. So, the total number of ways of getting to, say, \((3,\ 2)\) has to be the sum of the number of ways of getting to \((2,\ 2)\) and the number of ways of getting to \((3,\ 1)\). This relationship enables you to calculate the number of ways to any point on the grid, provided you work outwards gradually from the origin.
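The "add the two neighbours" relationship lends itself to a short program. Here is a minimal Python sketch, working outwards from the origin exactly as described; the function name `route_counts` is ours, for illustration.

```python
def route_counts(max_x, max_y):
    """ways[x][y] = number of right/up routes from (0, 0) to (x, y)."""
    ways = [[0] * (max_y + 1) for _ in range(max_x + 1)]
    for x in range(max_x + 1):
        for y in range(max_y + 1):
            if x == 0 or y == 0:
                ways[x][y] = 1  # points on the axes: only one route
            else:
                # arrive either from the left, (x - 1, y), or from below, (x, y - 1)
                ways[x][y] = ways[x - 1][y] + ways[x][y - 1]
    return ways

ways = route_counts(3, 2)
print(ways[3][2])  # 10
```

Because the loops visit points in order of increasing \(x\) and then increasing \(y\), both neighbours of a point are always filled in before the point itself.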

It is also possible to think about this problem in terms of combinations (Chapter 1).

Figuring out the total number of ways of getting from the origin \((0,\ 0)\) to the point \((3,\ 2)\) is equivalent to working out the number of permutations of three \(x\)s and two \(y\)s.

We found in Chapter 1 that the number of different possible orderings of the letters in something like \(xxxyy\) would be \(\displaystyle \dfrac{5!}{3!2!}\), or \({_{}^{5}C}_{3}\) (equivalently \({_{}^{5}C}_{2}\), because \({_{}^{n}C}_{n - r} = {_{}^{n}C}_{r}\)). Each of these orderings corresponds to one possible route to \((3,\ 2)\).

So, in general, the number of ways from the origin to each point \((x,\ y)\) will be \({_{}^{x + y}C}_{x}\).76 We can label these on the diagram (Figure 2.39).
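As a sketch of the closed form, this Python snippet computes \(\dfrac{(x + y)!}{x!\,y!}\) directly and prints a small grid of counts in the spirit of Figure 2.39, with \(y\) increasing upwards. The function name and the size of the grid are our own illustrative choices.

```python
from math import comb, factorial

def routes_to(x, y):
    """Number of right/up routes from (0, 0) to (x, y): (x + y)! / (x! y!)."""
    return factorial(x + y) // (factorial(x) * factorial(y))

# Print the counts as a grid, with y increasing upwards as on the diagram.
for y in range(4, -1, -1):
    print(" ".join(f"{routes_to(x, y):3d}" for x in range(5)))

print(routes_to(3, 2), comb(5, 3), comb(5, 2))  # 10 10 10
```

The last line shows the symmetry \({_{}^{5}C}_{3} = {_{}^{5}C}_{2}\): choosing which \(3\) moves are \(x\)s is the same as choosing which \(2\) are \(y\)s.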

Figure 2.39: Labelling the numbers of ways of getting from \((0,\ 0)\) to several points.

It is more usual to present this triangle of numbers rotated and reflected, as shown in Figure 2.40, and it is generally termed Pascal’s triangle, although it was known hundreds of years before Blaise Pascal.

Figure 2.40: The first few rows of Pascal’s triangle, (a) in terms of \({_{}^{n}C}_{r}\) and (b) as numbers.

The numbers in Pascal’s triangle crop up in many situations, and there are many fascinating patterns. One connection that is particularly relevant to this chapter involves the Binomial Theorem, when expanding brackets.

Look at the pattern below:

\[(x + y)^{0} = 1\]

\[(x + y)^{1} = y + x\]

\[(x + y)^{2} = y^{2} + 2xy + x^{2}\]

\[(x + y)^{3} = y^{3} + 3xy^{2} + 3x^{2}y + x^{3}\]

\[(x + y)^{4} = y^{4} + 4xy^{3} + 6x^{2}y^{2} + 4x^{3}y + x^{4}\]

\[(x + y)^{5} = y^{5} + 5xy^{4} + 10x^{2}y^{3} + 10x^{3}y^{2} + 5x^{4}y + x^{5}\]

It may be easier to discern the patterns if we write in all the coefficients, even when they are \(1\), and the indices, even when they are \(0\) or \(1\):

\[(x + y)^{0} = 1x^{0}y^{0}\]

\[(x + y)^{1} = 1x^{0}y^{1} + 1x^{1}y^{0}\]

\[(x + y)^{2} = 1x^{0}y^{2} + 2x^{1}y^{1} + 1x^{2}y^{0}\]

\[(x + y)^{3} = 1x^{0}y^{3} + 3x^{1}y^{2} + 3x^{2}y^{1} + 1x^{3}y^{0}\]

\[(x + y)^{4} = 1x^{0}y^{4} + 4x^{1}y^{3} + 6x^{2}y^{2} + 4x^{3}y^{1} + 1x^{4}y^{0}\]

\[(x + y)^{5} = 1x^{0}y^{5} + 5x^{1}y^{4} + 10x^{2}y^{3} + 10x^{3}y^{2} + 5x^{4}y^{1} + 1x^{5}y^{0}\]

The coefficients are the numbers in Pascal’s triangle. Reading each expansion from left to right, the indices of \(x\) step up from \(0\) to the power, and the indices of \(y\) step down from the power to \(0\), so in every term the two indices add up to the power.

We can see why this happens if we think about what happens when we expand a product of brackets.

Let’s take \((x + y)^{5}\) as an example, and write out the five brackets:

\[(x + y)(x + y)(x + y)(x + y)(x + y).\]

Every individual product in the expansion will come from multiplying together one term, either \(x\) or \(y\), from each bracket. This means that each product will contain \(5\) letters, and each letter will be either an \(x\) or a \(y\). So, we will get products like \(xyxxy\) and \(yyxyx\).

When we simplify, we will find that some of these products are equal.

For example, \(xyxxy\) and \(yxxxy\) are equal, because the order of the factors makes no difference when we multiply. This is exactly what we saw with the routes to \((3,\ 2)\), and the same arguments will follow. To get, say, the \(x^{3}y^{2}\) term, we have to choose \(x\)s from \(3\) of the brackets and \(y\)s from the other \(2\) brackets. The number of products we get that are equal to \(x^{3}y^{2}\) will be the number of ways of choosing which \(3\) of the \(5\) brackets supply an \(x\), and that is exactly what \({_{}^{5}C}_{3}\) counts, so the expansion is going to be

\[(x + y)^{5} = {_{}^{5}C}_{0}x^{0}y^{5} + {_{}^{5}C}_{1}x^{1}y^{4} + {_{}^{5}C}_{2}x^{2}y^{3} + {_{}^{5}C}_{3}x^{3}y^{2} + {_{}^{5}C}_{4}x^{4}y^{1} + {_{}^{5}C}_{5}x^{5}y^{0}.\]
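The argument can be mirrored in a few lines of Python: list every way of picking one letter from each of the five brackets, then group the resulting products by how many \(x\)s they contain. This is a minimal sketch for illustration, not a general polynomial expander.

```python
from collections import Counter
from itertools import product

n = 5
# Each product picks one letter, x or y, from each of the n brackets (x + y).
products = ["".join(p) for p in product("xy", repeat=n)]
# Products with the same number of x's simplify to the same term x^k y^(n - k).
tally = Counter(s.count("x") for s in products)
coefficients = [tally[k] for k in range(n + 1)]

print(len(products))  # 32
print(coefficients)   # [1, 5, 10, 10, 5, 1]
```

Entry \(k\) of `coefficients` is the coefficient of \(x^{k}y^{5 - k}\), matching the row \(1, 5, 10, 10, 5, 1\) of Pascal’s triangle.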

Once we choose the three \(x\)s, the remaining brackets have to supply \(y\)s. Equivalently, we could focus on the \(y\)s, and say that the number of products we will get that are equal to \(x^{3}y^{2}\) will be the number of ways of choosing which \(2\) of the \(5\) brackets supply a \(y\), which is \({_{}^{5}C}_{2}\). This means the expansion would be

\[(x + y)^{5} = {_{}^{5}C}_{5}x^{0}y^{5} + {_{}^{5}C}_{4}x^{1}y^{4} + {_{}^{5}C}_{3}x^{2}y^{3} + {_{}^{5}C}_{2}x^{3}y^{2} + {_{}^{5}C}_{1}x^{4}y^{1} + {_{}^{5}C}_{0}x^{5}y^{0}.\]

Here, the Binomial coefficients go in the opposite order. However, the symmetry of Pascal’s triangle means that

\[{_{}^{5}C}_{0} = {_{}^{5}C}_{5} \qquad \text{ and } \qquad {_{}^{5}C}_{1} = {_{}^{5}C}_{4} \qquad \text{ and }\qquad {_{}^{5}C}_{2} = {_{}^{5}C}_{3},\]

and so the result is the same.

We saw earlier that if each bracket can supply either an \(x\) or a \(y\), then the total number of products from \(5\) brackets must be \(2^{5}\). It follows that the sum of the six Binomial coefficients above must be \(2^{5}\).

One way to see this is to substitute \(x = y = 1\), to get

\[(1 + 1)^{5} = {_{}^{5}C}_{0} + {_{}^{5}C}_{1} + {_{}^{5}C}_{2} + {_{}^{5}C}_{3} + {_{}^{5}C}_{4} + {_{}^{5}C}_{5},\]

and indeed \((1 + 1)^{5} = 2^{5}\), which is equal to \(1 + 5 + 10 + 10 + 5 + 1\).

In general, the \(n\)th row of Pascal’s triangle sums to \(2^{n}\), but to make this work we have to count the top number \({_{}^{0}C}_{0} = 1\) as the zeroth row.
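A quick numerical check of the row sums, as a minimal Python sketch using the standard-library `math.comb`:

```python
from math import comb

for n in range(8):
    row = [comb(n, r) for r in range(n + 1)]  # the single 1 at the top is row 0
    print(n, row, sum(row), 2 ** n)
```

Each printed line shows a row of Pascal’s triangle, its sum, and \(2^{n}\), and the last two always agree.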

An extension to this task is the following.77

TASK 2.29

How many ways are there to read MATHEMATICS in the maze below?
One possible route is shown in blue.

Learners’ initial response may be to answer “a lot” and feel overwhelmed. This is quite a helpful start, because it makes us realise that MATHEMATICS is a long word, and we could get a feel for the problem, and perhaps even solve it, by replacing MATHEMATICS with a shorter word, such as MAT. We could even build up from MAT to MATH to MATHS.

Figure 2.41(a) shows a maze reduced to just MATHS. In Figure 2.41(b), we write in place of each letter the number of ways to reach that position from the centre. For the same reasons we have discussed, these numbers will be the numbers from Pascal’s triangle.

Figure 2.41: (a) A MATHS maze, (b) the number of ways to reach each position.

To find the number of ways to read MATHS, we need to sum all the numbers in the letter S positions.

This is going to be \(4\) times (because of the four sides of the square) the sum of the fourth row of Pascal’s triangle, minus \(4\), because we double-count the four S’s at the vertices of the square.

We know that the \(4\)th row of Pascal’s triangle will sum to \(2^{4}\), so the number of ways of reading MATHS is going to be \[4\left( 2^{4} - 1 \right) = 60.\]

This is for a \(5\)-letter word.

In general, for an \(n\)-letter word, the number of ways of reading it will be \[4\left( 2^{n - 1} - 1 \right),\] so for MATHEMATICS (\(11\) letters), it will be \(4\left( 2^{10} - 1 \right) = 4092\). Was that close to how many you expected?
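For checking the formula, here is a minimal Python sketch that counts readings directly. It assumes a hypothetical layout that matches the description of Figure 2.41: the first letter sits alone at the centre, the letter at grid position \((a, b)\) is the one in position \(|a| + |b|\) of the word, and each move is one step up, down, left or right (no diagonal moves). To avoid assuming where a reading must start, walks may begin on any cell holding the first letter. With this layout the outer ring of Ms in MATHEMATICS leads nowhere, so the count still agrees with \(4\left( 2^{n - 1} - 1 \right)\).

```python
from collections import defaultdict

def diamond_maze(word):
    """Hypothetical layout: cell (a, b) holds word[|a| + |b|], so the first
    letter sits alone at the centre and the word reads outwards."""
    r = len(word) - 1
    return {(a, b): word[abs(a) + abs(b)]
            for a in range(-r, r + 1)
            for b in range(-r, r + 1)
            if abs(a) + abs(b) <= r}

def count_readings(word):
    """Count the walks that spell `word`, starting on any cell holding its
    first letter and stepping only up, down, left or right (no diagonals)."""
    grid = diamond_maze(word)
    # ways[cell] = number of walks spelling the letters read so far, ending at cell
    ways = {cell: 1 for cell, letter in grid.items() if letter == word[0]}
    for letter in word[1:]:
        next_ways = defaultdict(int)
        for (a, b), w in ways.items():
            for nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                if grid.get(nb) == letter:
                    next_ways[nb] += w
        ways = next_ways
    return sum(ways.values())

print(count_readings("MATHS"))        # 60
print(count_readings("MATHEMATICS"))  # 4092
```

Because the sketch follows whatever letters are actually adjacent, learners could adapt the neighbour list (for example, adding diagonal steps) to explore the questions about double letters and palindromes raised below.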

Learners might need to think carefully about whether it would make any difference if the word contained double letters, which it can if we allow diagonal moves! Learners might think that a palindromic word (one that reads the same forwards and backwards) will have twice as many ways to be read, but there are many more than that, because we can begin and end on the edge of the figure and may also be able to follow cycles, in which we start and end at the same position.

2.11.4 Magic squares

Magic squares are often used for arithmetic practice, but they also have great potential for problem solving and simple algebraic thinking.

In a magic square, the sum of the numbers in each row, each column and each of the two main diagonals is the same; this is known as the magic sum. An example is shown in Figure 2.42, which uses the first \(9\) positive integers and has a magic sum of \(15\).

Figure 2.42: A magic square with a magic sum of \(15\).

Since the sum of the first \(9\) positive integers is \(\displaystyle \dfrac{10 \times 9}{2} = 45\) (see Section 2.11.1.1), and this total is shared equally among the three rows (or columns), the magic sum must be one-third of \(45\), which is \(15\).
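A short Python sketch can confirm both facts for a particular square. The arrangement below is one standard magic square using \(1\) to \(9\); it is not necessarily the one shown in Figure 2.42.

```python
def line_sums(square):
    """Sums of the rows, the columns and the two main diagonals."""
    n = len(square)
    rows = [sum(row) for row in square]
    cols = [sum(square[i][j] for i in range(n)) for j in range(n)]
    diagonals = [sum(square[i][i] for i in range(n)),
                 sum(square[i][n - 1 - i] for i in range(n))]
    return rows + cols + diagonals

# One magic arrangement of 1-9 (not necessarily the one in Figure 2.42).
square = [[2, 7, 6],
          [9, 5, 1],
          [4, 3, 8]]
total = sum(sum(row) for row in square)
print(total, total // 3, set(line_sums(square)))  # 45 15 {15}
```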

Puzzle books often contain magic squares with missing numbers to be found. They can be easy or tricky, depending on which numbers are given.

Figure 2.43 shows an easy example.

Figure 2.43: An easy magic square to solve.

Because we have three numbers in a row (or column or diagonal), we get the magic sum immediately: \(3 + 13 + 11 = 27\).

Then, we can work backwards from this on the middle column, to find the middle number: \(27 - 13 - 5 = 9\).

Then we can use the diagonals to find the two bottom corner numbers, and finally the remaining two numbers. The solution is shown in Figure 2.44.

Figure 2.44: The solution.

However, magic square puzzles can be harder than this.

For example, in Figure 2.45 we have the same magic square, but are given a different four numbers. This time, we do not have a complete row, column or diagonal, and we are not told what the magic sum is, so we cannot immediately work this out.

Figure 2.45: A harder magic square to solve.

However, we can observe that in Figure 2.46 the light grey shaded squares must sum to the same as the dark grey shaded squares. We know this because the empty square in the top row must add on to both of those pairs to make the same (unknown) magic sum.

Figure 2.46: Two pairs of squares which must have the same sum.

So, we can deduce that the central square must be \((3 + 11) - 5 = 9\).

We can now focus on the squares shaded in Figure 2.47, because these also must have equal sums, since each of these pairs of squares, added to the bottom left corner square, makes the magic sum.

Figure 2.47: A different two pairs of squares with equal sums.

This tells us that the bottom right square must be \((3 + 17) - 5 = 15\), and this gives us the entire top left to bottom right diagonal (\(3\), \(9\), \(15\)), allowing us to find the magic sum and complete the solution.

There are other ways to arrive at the same solution.

Some configurations are trickier still, such as the one shown in Figure 2.48.

Figure 2.48: An even harder magic square to solve.

Using the same approach as before, we can find two more numbers, shown in grey in Figure 2.49. However, by that point we have exhausted the potential of that trick.

Figure 2.49: Getting stuck.

However, we can finish the puzzle by using a bit of algebra.

Letting \(m\) be the magic sum, we can find expressions for the three remaining squares in terms of \(m\), and these are: \(m - 24\), \(m - 18\) and \(m - 12\).

Summing these, and setting the result equal to \(m\), gives

\[\begin{aligned} 3m - 54 &= m \\ 2m &= 54 \\ m &= 27, \end{aligned}\]

and then the solution follows. (The final three missing numbers must be \(3\), \(9\) and \(15\), as we found previously.)

Another tricky configuration is one in which only the four corner numbers are given. Expressing each value in the middle row or column in terms of the (unknown) magic sum also works here.

Learners can invent magic square puzzles for each other, by beginning with a complete magic square and deleting five of the numbers. When solving them, they should try to see whether symbolic algebra is necessary or not, depending on the configuration of the given numbers.
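When learners invent puzzles, a teacher may want a quick check that the four given numbers really do determine a unique square. One way to do this, sketched below in Python under our own assumptions, is to treat all nine cells and the magic sum \(m\) as unknowns, write the eight "line sum equals \(m\)" equations plus one equation per given number, and solve by exact Gauss–Jordan elimination. This is a swapped-in linear-algebra technique, not the reasoning in the text. The cell numbering and function name are hypothetical. The example givens are our reading of Figure 2.45 from the deductions above: \(3\) top left, \(11\) top right, \(17\) middle left and \(5\) bottom middle.

```python
from fractions import Fraction

# Cells of a 3x3 grid numbered 0-8, row by row:
#   0 1 2
#   3 4 5
#   6 7 8
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals
M = 9     # column index of the unknown magic sum m
RHS = 10  # column index of the right-hand side

def solve_magic_square(givens):
    """givens maps a cell number (0-8) to its known value.
    Returns (cells, m) if the givens fix a unique magic square, else None."""
    rows = []
    for line in LINES:                  # a + b + c - m = 0
        row = [Fraction(0)] * 11
        for cell in line:
            row[cell] = Fraction(1)
        row[M] = Fraction(-1)
        rows.append(row)
    for cell, value in givens.items():  # cell = value
        row = [Fraction(0)] * 11
        row[cell] = Fraction(1)
        row[RHS] = Fraction(value)
        rows.append(row)

    # Gauss-Jordan elimination with exact fractions.
    r = 0
    for col in range(10):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            return None                 # this unknown is not pinned down
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][col]
        rows[r] = [v / p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    if any(row[RHS] != 0 for row in rows[r:]):
        return None                     # the givens contradict each other
    values = [rows[i][RHS] for i in range(10)]
    return values[:9], values[M]

cells, m = solve_magic_square({0: 3, 2: 11, 3: 17, 7: 5})
print([int(c) for c in cells], int(m))  # [3, 13, 11, 17, 9, 1, 7, 5, 15] 27
```

A result of `None` signals that the chosen givens either leave some cell undetermined or are inconsistent. That tells the puzzle-setter to choose again; it says nothing about whether the puzzle can be solved without symbolic algebra.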

2.11.5 Arithmagons

Like magic squares, arithmagons are often used in school as a way to practise arithmetic. But they can also be a valuable source of mathematical reasoning.78

In an arithmagon, such as the one shown in Figure 2.50, the number in each square is the sum of the numbers in the two adjacent circles.

Figure 2.50: An example of an arithmagon.

Beginning with the circle numbers and finding the numbers in the squares is easy, just involving addition.

Being given the numbers in two of the squares and one circle number, such as in Figure 2.51, is slightly more challenging, but just involves some subtraction.

Figure 2.51: An easy arithmagon puzzle.

The real challenge is to be given the three numbers in squares and have to deduce the three circle numbers, as in Figure 2.52.

Figure 2.52: A difficult arithmagon puzzle.

One way to solve these is just to try numbers until you find ones that work. That can be effective, but it is not very interesting, and it can be frustrating: typically, learners will easily find circle numbers that work for two of the squares, but not the third. One way to discourage trial and error is to choose circle numbers that are halfway between integers, so that the numbers in the squares (given) are integers, but the circle numbers (to be found) are not.

If we want to deduce (rather than guess) the circle numbers, one way to do this is to realise that because each square is the sum of the two adjacent circles, the sum of all three squares must be twice the sum of all three circles.

So, for the puzzle in Figure 2.52, if we calculate \(11 + 12 + 15 = 38\), and divide this by \(2\), we discover that the sum of the three circle numbers must be \(19\).

That is the key to the entire puzzle, because we can now choose any of the numbers in the squares, such as the \(11\), and notice that \(11\) is the sum of the two circles adjacent to it, whereas \(19\) is the sum of all three circles, so the circle opposite the \(11\) must be \(19 - 11 = 8\).

We could then use the same reasoning to find the other two circle numbers, but we don’t need to, because once we have one circle number, we can find the others by subtraction from the numbers in the squares (as for the puzzle shown in Figure 2.51).
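The "halve the total" reasoning fits in a few lines of Python. This is a minimal sketch (the function name is ours) that applies the same subtraction to every square at once. It uses exact fractions, so puzzles whose circle numbers lie halfway between integers come out exactly.

```python
from fractions import Fraction

def solve_arithmagon(squares):
    """squares: the three square numbers (each the sum of its two adjacent circles).
    Returns the circle number opposite each square, in the same order."""
    circle_total = Fraction(sum(squares), 2)  # squares add up to twice the circles
    return [circle_total - s for s in squares]

print([str(c) for c in solve_arithmagon([11, 12, 15])])  # ['8', '7', '4']
print([str(c) for c in solve_arithmagon([5, 6, 8])])     # ['9/2', '7/2', '3/2']
```

As a check, each square number should equal the sum of the two circles that are not opposite it: \(7 + 4 = 11\), \(8 + 4 = 12\) and \(8 + 7 = 15\).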

Although thinking this way doesn’t involve writing down any algebraic equations, it is very much the kind of thinking I have argued in this chapter is ‘algebraic’.

Related tasks can include more than three circle numbers.79

2.12 Conclusion

It is difficult to overstate the benefits that come from algebraic thinking in school mathematics, and many of these will continue to appear throughout the subsequent chapters.

In Chapter 3, we will see how problem solving in geometry is often facilitated by using some algebra.

In Chapter 4, I bring together symbols and visuals in considering functions and graphs, where algebra is absolutely central.

Finally, in Chapter 5, we consider applications of mathematics beyond mathematics itself. You might have wondered why word problems were not mentioned in the present chapter; we will consider issues around ‘real-life applications’ in Chapter 5, in the context of mathematical modelling.

As with thinking multiplicatively (Chapter 1), the key to success in algebra is to see the challenge not as a long list of isolated techniques that need to be separately learned and practised to fluency, but rather as a powerful way of thinking. Thinking algebraically gives learners the confidence to introduce a letter in place of a number where it will be helpful, and manipulate it with meaning, in order to draw reliable conclusions.

Learners accustomed to thinking algebraically won’t be asking the teacher, “Am I allowed to change this expression into this expression by doing this?” but will recognise that anything is ‘allowed’ if it leads to true statements, and true statements are those which represent how numbers actually behave. Learners will become much more likely to check their algebra for themselves by ‘testing it’ with some convenient numbers.

Notes

  1. Hodgen, J., & Foster, C. (2018). What’s so hard about algebra? Mathematics in School, 47(5), 6–7. https://www.foster77.co.uk/Hodgen%20&%20Foster,%20Mathematics%20in%20School,%20What’s%20so%20hard%20about%20algebra.pdf

  2. Jones, I., Inglis, M., Gilmore, C., & Evans, R. (2013). Teaching the substitutive conception of the equals sign. Research in Mathematics Education, 15(1), 34–49. https://doi.org/10.1080/14794802.2012.756635

  3. Foster, C. (2021). Questions pupils ask: What are ‘like terms’? Mathematics in School, 50(4), 20–21. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20What%20are%20’like%20terms’.pdf

  4. Leversha, G. (2010). More on the ‘Algebra as Object’ analogy. Mathematics in School, 39(1), 5–6. https://www.jstor.org/stable/20696955

  5. Foster, C. (2021). Identity crisis. Scottish Mathematical Council Journal, 51, 36–37. https://www.foster77.co.uk/Foster,%20Scottish%20Mathematical%20Council%20Journal,%20Identity%20crisis.pdf

  6. Foster, C. (2020). Trusting in patterns. Mathematics in School, 49(3), 17–19. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Trusting%20in%20patterns.pdf

  7. Droujkova, M., Tanton, J., & McManaman, Y. (2016). Avoid Hard Work!: And Other Encouraging Problem-Solving Tips for the Young, the Very Young, and the Young at Heart. Delta Stream Media, an imprint of Natural Math.

  8. Foster, C. (2018). Beat the calculator. Teach Secondary, 7(5), 86–87. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Beat%20the%20calculator.pdf

  9. Foster, C. (2010). Resources for teaching mathematics 14–16. Continuum.

  10. Feynman, R. P. (1999). The pleasure of finding things out. Allen Lane, pp. 5–6.

  11. Ngu, B. H., & Phan, H. P. (2022). Learning linear equations: capitalizing on cognitive load theory and learning by analogy. International Journal of Mathematical Education in Science and Technology, 53(10), 2686–2702. https://doi.org/10.1080/0020739X.2021.1902007

  12. Filloy, E., & Rojano, T. (1989). Solving equations: The transition from arithmetic to algebra. For the Learning of Mathematics, 9(2), 19–25.

  13. Prestage, S., & Perks, P. (2003). Ban the equals sign. Mathematics Teaching, 192, 3–5.

  14. Shore, C., Foster, C., & Francome, T. (2026). Empty number lines for solving linear equations. Mathematics in School, 55(1), 10–17. https://www.foster77.co.uk/Shore%20et%20al.,%20Mathematics%20in%20School,%20Empty%20number%20lines%20for%20solving%20linear%20equations.pdf

  15. Rittle-Johnson, B. (2024). Encouraging students to explain their ideas when learning mathematics: A psychological perspective. The Journal of Mathematical Behavior, 76, 101192. https://doi.org/10.1016/j.jmathb.2024.101192

  16. Foster, C. (2018). Developing mathematical fluency: Comparing exercises and rich tasks. Educational Studies in Mathematics, 97(2), 121–141. https://doi.org/10.1007/s10649-017-9788-x

  17. Foster, C. (2015). Expression polygons. Mathematics Teacher, 109(1), 62–65. https://doi.org/10.5951/mathteacher.109.1.0062

  18. Foster, C. (2012). Connected expressions. Mathematics in School, 41(5), 32–33. https://www.foster77.co.uk/Foster,%20Mathematics%20In%20School,%20Connected%20Expressions.pdf

  19. Foster, C. (2014). The power of puzzles. Teach Secondary, 3(8), 34–35. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20The%20Power%20of%20Puzzles.pdf

  20. van Merriënboer, J. J., & Kirschner, P. A. (2018). Ten steps to complex learning: A systematic approach to four-component instructional design (3rd ed.). Routledge. https://doi.org/10.4324/9781315113210

  21. https://gridalgebra.com/

  22. Foster, C. (2016). The simple life. Teach Secondary, 5(2), 31–33. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20The%20simple%20life.pdf

  23. Foster, C. (2024). Expanding pairs of brackets. Teach Secondary, 13(3), 21. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Expanding%20pairs%20of%20brackets.pdf

  24. Foster, C. (2012). Quadratic doublets. Mathematical Gazette, 96(536), 264–266. https://doi.org/10.1017/S0025557200004514

  25. https://www.cambridgemaths.org/for-teachers-and-practitioners/espresso/view/variation-theory-in-mathematics-education/

  26. Kirschner, P. A., Hendrick, C., & Heal, J. (2025). Instructional illusions. Hachette Learning.

  27. Reid O’Connor, B., & Norton, S. (2024). Exploring the challenges of learning quadratic equations and reflecting upon curriculum structure and implementation. Mathematics Education Research Journal, 36(1), 151–176. https://doi.org/10.1007/s13394-022-00434-w

  28. Foster, C. (2022). Starting with completing the square. Mathematics in School, 51(5), 2–5. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Starting%20with%20completing%20the%20square.pdf

  29. Foster, C. (2014). “Can’t you just tell us the rule?” Teaching procedures relationally. In S. Pope (Ed.), Proceedings of the 8th British Congress of Mathematics Education, Vol. 34, No. 2 (pp. 151–158). University of Nottingham. https://www.bsrlm.org.uk/wp-content/uploads/2016/09/BCME8-20.pdf

  30. Foster, C. (2024). The quadratic formula. Teach Secondary, 13(1), 21. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20The%20quadratic%20formula.pdf

  31. Foster, C. (2022). Adding surds. Teach Secondary, 11(7), 13. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Adding%20surds.pdf

  32. Foster, C. (2024). Questions pupils ask: Why aren’t square roots additive? Mathematics in School, 53(5), 29–31. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20QPA%20Why%20aren’t%20square%20roots%20additive.pdf

  33. Foster, C. (2025). Rationalising all kinds of denominators. Mathematics in School, 54(1), 13–15. https://foster77.co.uk/Foster%2C%20Mathematics%20in%20School%2C%20Rationalising%20all%20kinds%20of%20denominators.pdf

  34. See Malcolm Swan’s ‘Manipulating Surds N11’ task from the Department for Education Standards Unit: https://www.stem.org.uk/resources/library/resource/26731/manipulating-surds-n11

  35. Prestage, S., & Perks, P. (2003). Ban the equals sign. Mathematics Teaching, 192, 3–5.

  36. Foster, C. (2023, March 16). Crocodiles and inequality signs [Blog post]. https://blog.foster77.co.uk/2023/03/crocodiles-and-inequality-signs.html

  37. Prestage, S., & Perks, P. (2005). Inequalities and paper hats. Mathematics Teaching, 193, 31–33.

  38. Foster, C. (2013). Unequal reasoning. Mathematics in School, 42(1), 19–21. https://www.foster77.co.uk/Foster,%20Mathematics%20In%20School,%20Unequal%20Reasoning.pdf

  39. Foster, C. (2024). Inequality signs turning round. Teach Secondary, 13(4), 21. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Inequality%20signs%20turning%20round.pdf

  40. Foster, C. (2012). Changing the subject. Teach Secondary, 1(2), 33–35. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Changing%20the%20Subject.pdf

  41. Foster, C., Francome, T., Shore, C., Hewitt, D., & Sangwin, C. (2024). Priority of operations: Necessary or arbitrary? For the Learning of Mathematics, 44(2), 24–26.

  42. Foster, C. (2008). Higher priorities. Mathematics in School, 37(3), 17. https://www.foster77.co.uk/Foster,%20Mathematics%20In%20School,%20Higher%20Priorities.pdf

  43. See https://aperiodical.com/tag/order-of-operations/

  44. Foster, C. (2020). Number snakes. Teach Secondary, 9(8), 70–71. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Number%20snakes.pdf

  45. Foster, C. (2020). Revisiting ‘Four 4s’. Mathematics in School, 49(3), 22–23. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Revisiting%20Four%204s.pdf

  46. Foster, C. (2017). Just four numbers. Teach Secondary, 6(7), 36–37. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Just%20four%20numbers.pdf

  47. Fry, H. (2018). Hello World: How to be human in the age of the machine. Random House.

  48. Foster, C., & Ollerton, M. (2020). Mathematical white lies. Mathematics Teaching, 272, 24–25. https://www.foster77.co.uk/MT272Foster&OllertonMathematical_white_lies.pdf

  49. Acheson, D. J. (2002). 1089 and all that: A journey into mathematics. Oxford University Press.

  50. Foster, C. (2016). Making products. Teach Secondary, 5(5), 31–33. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Making%20products.pdf

  51. Foster, C. (2019). Missing the point. Teach Secondary, 8(2), 86–87. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Missing%20the%20point.pdf

  52. Foster, C. (2021). Quotative and partitive models of division. Mathematics in School, 50(3), 24–25. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Quotative%20and%20partitive%20models%20of%20division.pdf

  53. Foster, C. (2025). Being flexible about division. Primary Mathematics, 29(3), 3–6. https://foster77.co.uk/Foster,%20Primary%20Mathematics,%20Being%20flexible%20about%20division.pdf

  54. Foster, C. (2022). Crossing out. Mathematics in School, 51(4), 6–7. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Crossing%20out.pdf

  55. Foster, C. (2022). Factors and multiples. Teach Secondary, 11(3), 13. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Factors%20and%20multiples.pdf

  56. Foster, C. (2012). The what factor? Teach Secondary, 1(4), 56–58. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20The%20What%20Factor.pdf

  57. Foster, C. (2024). In favour of the Euclidean Algorithm. Mathematics in School, 53(1), 24–26. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20In%20favour%20of%20the%20Euclidean%20Algorithm.pdf

  58. Foster, C. (2012). HCF and LCM – Beyond procedures. Mathematics in School, 41(3), 30–32. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20HCF%20and%20LCM%20-%20Beyond%20Procedures.pdf

  59. Foster, C. (2012). Squares within squares. Mathematical Gazette, 96(536), 328–331. https://doi.org/10.1017/S002555720000468X

  60. Foster, C. (2022). Factor puzzles. Symmetry Plus, 79, 4–5. https://www.foster77.co.uk/Foster,%20Symmetry%20Plus,%20Factor%20puzzles.pdf

  61. Neale, V. (2017). Closing the gap: The quest to understand prime numbers. Oxford University Press.

  62. Foster, C. (2016). Questions pupils ask: Why isn’t 1 a prime number? Mathematics in School, 45(3), 12–13. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Why%20isn’t%201%20a%20prime%20number.pdf

  63. Foster, C. (2015). Prime suspects. Teach Secondary, 4(2), 41–43. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Prime%20Suspects.pdf

  64. Foster, C. (2021). Making sense of proof by contradiction. Scottish Mathematical Council Journal, 51, 74–77. https://www.foster77.co.uk/Foster,%20Scottish%20Mathematical%20Council%20Journal,%20Making%20sense%20of%20proof%20by%20contradiction.pdf

  65. Foster, C. (2016). Thirty factors. Mathematics in School, 45(2), 25–27. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Thirty%20Factors.pdf

  66. Foster, C. (2022, August 4). Misremembering Goldbach’s conjecture [Blog post]. https://blog.foster77.co.uk/2022/08/misremembering-goldbachs-conjecture.html

  67. Foster, C. (2024). Terminal digits and prime numbers. Mathematics in School, 53(5), 8–9. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Terminal%20digits%20and%20prime%20numbers.pdf

  68. Foster, C. (2017). Surprise, surprise! Teach Secondary, 6(1), 42–44. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Surprise%20surprise.pdf

  69. Foster, C. (2024). Terminal digits and prime numbers. Mathematics in School, 53(5), 8–9. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20Terminal%20digits%20and%20prime%20numbers.pdf

  70. Foster, C. (2020). Combining square numbers. Teach Secondary, 9(7), 94–95. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Combining%20square%20numbers.pdf

  71. Foster, C. (2016). All square. Teach Secondary, 5(3), 33–35. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20All%20Square.pdf

  72. Foster, C. (2022). Converting recurring decimals to fractions. Scottish Mathematical Council Journal, 52, 68–70. https://www.foster77.co.uk/Foster,%20Scottish%20Mathematical%20Council%20Journal,%20Converting%20recurring%20decimals%20to%20fractions.pdf

  73. Foster, C. (2018). Questions pupils ask: What is the formula for factorial? Mathematics in School, 47(4), 40–41. https://www.foster77.co.uk/Foster,%20Mathematics%20in%20School,%20QPA%20What%20is%20the%20formula%20for%20factorial.pdf

  74. Foster, C. (2022). The usual suspects? Teach Secondary, 11(4), 62–63. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20The%20usual%20suspects.pdf

  75. https://nrich.maths.org/problems/proof-picks-theorem

  76. Note that I am being a bit free and easy with notation in this sentence, and using ‘\(x\)’ to represent both ‘a move one unit to the right’ and ‘the total distance to the right’.

  77. Foster, C. (2011). Resources for teaching mathematics 11–14. Continuum.

  78. Foster, C. (2014). Arithmagons. Teach Secondary, 3(2), 57–59. https://www.foster77.co.uk/Foster,%20Teach%20Secondary,%20Arithmagons.pdf

  79. Foster, C. (2016). Sums of pairs. Symmetry Plus, 59, 14–16. https://www.foster77.co.uk/Foster,%20Symmetry%20Plus,%20Sums%20of%20Pairs.pdf