Thinking Functionally with Haskell (30 page)

All very reasonable (except possibly for the last), and we could give some of them mathematical names (
nest i
distributes through concatenation,
nest
is a homomorphism from numerical addition to functional composition and
nest i
commutes with
group
). The third law fails if
nest
were to indent from the beginning of a document; and it would also fail if we allowed text strings to contain newline characters. The last law holds because grouping adds a layout with no line breaks, and nesting has no effect on such a layout. See Exercise D for a more precise argument.

Turning to the properties of
layouts
, we require that

layouts (x <> y)
= layouts x <++> layouts y
layouts nil
= [""]
layouts (text s)
= [s]
layouts line
= ["\n"]
layouts (nest i x)
= map (nestl i) (layouts x)
layouts (group x)
= layouts (flatten x) ++ layouts x

The operation
(<++>)
is lifted concatenation:
xss <++> yss = [xs ++ ys | xs <- xss, ys <- yss]

The function
nestl :: Int -> Layout -> Layout
is defined by

nestl i
= concat (map indent i)
indent i c = if c=='\n' then c:replicate i ' ' else [c]

Finally,
flatten :: Doc -> Doc
is the function that converts a document into one with a single layout in which each newline and its associated indentation is replaced by a single space. This function is not provided in the public interface of our documents library, though it will be needed internally. It is a missing gadget in the sense that we need it to complete the description of the algebraic laws.

We require that
flatten
should satisfy the following conditions:

flatten (x <> y)
= flatten x <> flatten y
flatten nil
= nil
flatten (text s)
= text s
flatten line
= text " "
flatten (nest i x)
= flatten x
flatten (group x)
= flatten x

That makes 24 laws in total (one for
<>
, two each for
nil
and
text
, seven for
nest
and six each for
layouts
and
flatten
). Many of the laws look like constructive Haskell definitions of functions over a data type in which
nil
,
text
and so on are constructors. More on this is in
Section 8.6
.

The eight operations certainly seem reasonable enough, but do they give us sufficient flexibility to describe the layouts we might want? The proof of the pudding is in the eating, so in a moment we will pause to consider three examples. Before doing so, some implementation of documents, however quick and dirty, will be needed to test the examples.

8.3 A direct implementation

One obvious choice of representation is to identify a document with its list of layouts:
type Doc = [Layout]

Such a representation is called a
shallow embedding
. With a shallow embedding, the library functions are implemented directly in terms of the values of interest (here,
layouts
). Later on we will abandon this representation in favour of a more structured alternative, but it is the obvious one to try first.

Here are the definitions of the operations above (we will leave
pretty
until later):

layouts
= id
x <> y
= x <++> y
nil
= [""]
line
= ["\n"]
text s
= [s]
nest i
= map (nestl i)
group x
= flatten x ++ x
flatten x
= [flattenl (head x)]

We have already defined
nestl
, and
flattenl
is defined by

flattenl :: Layout -> Layout
flattenl [] = []

flattenl (c:cs)
| c=='\n' = ' ':flattenl (dropWhile (== ' ') cs)
| otherwise = c:flattenl cs

Do the 24 laws hold for this implementation? Well, let’s go through them. Lifted concatentation
<++>
is associative with
[[]]
as identity element, so the first three laws are okay. The two laws of
text
are easy to check, and the six laws of
layouts
are immediate. All but two laws of
nest
are routine. The remaining two, namely

nest i . nest j = nest (i+j)
nest i . group = group . nest i

involve a bit of work (see Exercises C and D). That leaves the laws of
flatten
. Three are easy, and one can show

flatten . nest i = flatten
flatten . group = flatten

with a bit of work (see Exercises E and F). But the stumbling block is the law
flatten (x <> y) = flatten x <> flatten y
This one is false. Take
x = line
and
y = text " hello"
. Then

flatten (x <> y) = ["hello"]

flatten x <> flatten y = [" hello"]

and the two results are different. The reason is that
flatten
removes the effect of nesting, but does not remove spaces after newlines if they are present in an unnested document. On the other hand,
flattenl
removes spaces after every newline in the document.

Rather than try to fix up this deficiency, we can accept the less than perfect implementation and move on. One can show that all layouts of a document flatten to the same string (see the Answer to Exercise E). The shallow embedding also possesses another property that we will exploit in the definition of
pretty
. To see what it is, consider the function
shape
that returns the shape of a layout:

shape :: Layout -> [Int]

shape = map length . lines

The prelude function
lines
breaks up a string on newline characters, returning a list of strings without newlines. Thus the shape of a layout is the list of lengths of the lines that make up the layout. The crucial property of
layouts
is that the list
of shapes of the layouts of a document is in lexicographically decreasing order. For example, one of the documents described in the following section has 13 possible layouts whose shapes are given by

[[94],[50,43],[50,28,19],[50,15,17,19],[10,39,43],
[10,39,28,19],[10,39,15,17,19],[10,28,15,43],
[10,28,15,28,19],[10,28,15,15,17,19],[10,13,19,15,43],
[10,13,19,15,28,19],[10,13,19,15,15,17,19]]

This list is in decreasing lexicographic order. The reason the property holds is that
layouts (group x)
puts the flattened layout at the head of the list of layouts of document
x
, and a flattened layout consists of a single line. Exercise G goes into more details.

8.4 Examples

Our first example deals with laying out conditional expressions. For present purposes a conditional expression can be represented as an element of the data type
CExpr
, where
data CExpr = Expr String | If String CExpr CExpr
Here is a function
cexpr
that specifies the acceptable layouts described earlier:

cexpr :: CExpr -> Doc
cexpr (Expr p) = text p
cexpr (If p x y)
= group (group (text "if " <> text p <>
line <> text "then " <>
nest 5 (cexpr x)) <>
line <> text "else " <>
nest 5 (cexpr y))

This definition is similar to our previous version, except for the nesting of the subexpressions.

For example, two of the 13 possible layouts for one particular expression are as follows:

if wealthy
then if happy then lucky you else tough
else if in love then content else miserable
if wealthy
then if happy
then lucky you
else tough
else if in love
then content
else miserable

You can see from the last expression why we have chosen an indentation of five spaces. The 13 possible layouts for this particular conditional expression have the shapes displayed in the previous section.

The second example concerns how to lay out general trees, trees with an arbitrary number of subtrees:
data GenTree a = Node a [GenTree a]

Here is an example tree, laid out in two different ways:

Node 1

[Node 2

[Node 7 [],
Node 8 []],
Node 3

[Node 9

[Node 10 [],
Node 11 []]],
Node 4 [],
Node 5

[Node 6 []]]

 

Node 1

[Node 2 [Node 7 [], Node 8 []],
Node 3 [Node 9 [Node 10 [], Node 11 []]],
Node 4 [],
Node 5 [Node 6 []]]

The function
gtree
that produced these trees (coincidentally, also among a total of 13 different ways) was defined as follows:

gtree :: Show a => GenTree a -> Doc
gtree (Node x [])
= text ("Node " ++ show x ++ " []")
gtree (Node x ts)
= text ("Node " ++ show x) <>
group (nest 2 (line <> bracket ts))

The first clause says that a tree with no subtrees is always displayed on a single line; the second clause says that a tree with at least one subtree is displayed either on a single line or has its subtrees each displayed on a new line with an indentation of two units. The function
bracket
is defined by

bracket :: Show a => [GenTree a] -> Doc
bracket ts = text "[" <> nest 1 (gtrees ts) <> text "]"

 

gtrees [t] = gtree t
gtrees (t:ts) = gtree t <> text "," <> line <> gtrees ts

To be honest, it took a little time and experimentation to find the definitions above (for which the function
layouts
proved indispensable), and the result is certainly not the only way to lay out trees. Finally, here is a way of laying out a piece of text (a string of characters containing spaces and newlines, not a document
text
) as a single paragraph:

para :: String -> Doc
para = cvt . map text . words

cvt [] = nil
cvt (x:xs)
= x <> foldr (<>) nil [group (line <> x) | x <- xs]

First, the words of the text are computed using the standard library function
words
, a function we have encountered a number of times before. Then each word is converted into a document using
text
. Finally, each word, apart from the first, is laid out either on the same line or on a new line. If there are
n
+1 words in the text, and so
n
inter-word spaces, the code above describes 2
n
possible layouts. We certainly don’t want to examine all these layouts in computing one that will fit within a given line width.

8.5 The best layout

As we said above, the best layout depends on the maximum permitted line width. That’s a simple decision, but not the only one. In general a pretty layout of a nested document will consist of a ribbon of text snaking across the page, and it is arguable
that the width of the ribbon should also play a part in determining the best layout. After all, is the best layout on an infinitely wide page one in which everything is placed on one line? However, for simplicity we will ignore this very reasonable refinement and take only the line width as the deciding factor.

There is also another decision to be made. Suppose we choose the best layout, according to some criterion, among those layouts all of whose lines fit within the given line width. That’s fine if there is at least one such layout, but what if there isn’t? The two options are either to abandon the formatting process with a suitable error message, or else to do the best we can, accepting that the width may be exceeded.

Psychologically and practically the second option seems the better one, so let us explore what it entails. We can start by comparing the first lines, ℓ
1
and ℓ
2
, of two layouts. We can decide that line ℓ
1
is better than ℓ
2
if: (i) both lines fit into width
w
and ℓ
1
is longer than ℓ
2
; (ii) ℓ
1
fits
w
but ℓ
2
doesn’t; or (iii) neither fits
w
and ℓ
1
is shorter than ℓ
2
. The decision is a reasonable one because it should be capable of being implemented by a
greedy
strategy: fill up the first line as much as possible without exceeding the line width; and if that is not possible, stop as soon as the width is exceeded.

Other books

The Dark Part of Me by Belinda Burns
Cop Job by Chris Knopf
The Middle Kingdom by Andrea Barrett
Notorious Pleasures by Elizabeth Hoyt
Samantha James by His Wicked Promise
The Emerald Cat Killer by Richard A. Lupoff
Se armó la de San Quintín by Nieves Concostrina
Beast of Burden by Ray Banks
Abel Sánchez by Miguel de Unamuno