Essence»Blog
nakst
=== Introduction ===

Modern day secondary storage drives, like SSDs, divide up their storage space into fixed-sized sectors. On floppy disks and old hard disks, each sector was 512 bytes (these days they're 4KB each). When the operating system wants to access the drive, it needs to specify the sector it wants to access. Sectors are numbered from 0 up to X-1, where X is the total number of sectors on the drive. The number of a sector is called its logical block address, or LBA for short. The total storage capacity of a drive is calculated as X * sector_size.

However, it wasn't always this way. Sectors used to be addressed using three numbers, which for now we'll call C, H and S. In this blog post, I'm going to try to explain how this works, and why I hate it. This post will only cover CHS on IBM PCs; it might have worked differently on your favourite system of yore.

=== Converting between CHS and LBA ===

The conversion from CHS to LBA depends on two properties of the "geometry" of the drive, called HPC and SPT. The conversion itself is a fairly simply affair.

To convert from CHS to LBA, we use the formula:

LBA = (C * SPT * HPC) + (H * SPT) + (S - 1)

To convert from LBA to CHS, we use the formulae:

C = (LBA / SPT) / HPC
H = (LBA / SPT) % HPC
S = (LBA % SPT) + 1

We can think of C-H-S as 3 digits, with S ranging from 1 up to SPT, H ranging from 0 up to HPC - 1, and C starting at 0 and going up to whatever is necessary to index all sectors on the drive. As such, it is a fairly standard modular arithmetic affair, with the only complication being that S inexplicably starts at 1.

Why do we want to convert between CHS and LBA? Well, CHS would be a completely bonkers way for the operating system to keep track of where files are located on a drive. Not only would it massively complicate writing file system drivers, it would also have the benefit of making a file system dependent on the drive on which it is created (because it relies on the drive's geometry).

=== HPC and SPT ===

When you ask the BIOS for the geometry of a drive, instead of giving you the values of HPC and SPT, it unhelpfully gives the maximum value H and S can respectively take. The value of HPC is one more than the maximum value of H (since H goes from 0 to HPC - 1). The value of SPT however is the same as the maximum value of S, because S starts at 1.

=== Eating the fruit of the tree whereof I said thou shouldst not eat ===

If you haven't lost all hope in technology yet, don't worry. It's about to get a lot worse. I must now uncover the terrible reality which I have kept hidden from you. I will now explain what CHS, HPC and SPT stand for.

C: Cylinder
H: Head
S: Sector
HPC: Heads per cylinder
SPT: Sectors per track

Oh dear.

=== A floppy disk ===

A floppy disk contains a thin circular disk of a magnetic material. Data is read and written by a read head, part of the floppy drive (please ignore that the "read" head does both reading and writing). The data on the disk is divided in 512 byte sectors. To make efficient use of the space, sectors are placed on both sides of the disk. On a given side, each sector belongs to a track, a group of sectors that all the same distance from the centre of the disk.



As you may see in the above diagram, the number of sectors in each track is kept constant. This means that the sectors on the tracks close to the center of the disk are physically smaller. There is some empty space left in the centre, because if sectors were placed there, the timing of accessing them would be too precise.

Thus, to uniquely identify a sector on a floppy disk, we need precisely 3 pieces of information:
  1. Which side of the disk the sector is on.
  2. How far the sector is from the center of the disk (radius).
  3. The anticlockwise angle, starting from the positive x axis, which we must go to reach the sector.

These are in fact the values we'll use for CHS addressing! (1) corresponds to H, the "head"; (2) corresponds to C, the "cylinder"; and (3) corresponds to S, the "sector" (very aggressive air quotes for this one).

=== Cylindrical coordinates ===

Sectors are arranged on a hard disk in a very similar fashion to a floppy disk. The main difference is that in a hard disk there are multiple magnetic disks stacked on top of each other. So we can extend our above addressing format to deal with hard disks by changing (1) to give the depth of the sector from the top of the stack.

With this, we are identifying each sector with a depth-radius-angle triplet. In mathematics, this is a fairly standard way to identify points on a cylinder, which is a good way to visualize the arrangement of sectors in a hard disk.



=== Sets of sectors ===

If we specify a depth, radius and angle we uniquely identify exactly one sector. However, what happens if we don't specify all three? Then we identify a collection of the sectors on the disk. If we think about the sectors on the disk as forming a mathematical set, then we can think of these underspecified sector collections as subsets of the set of sectors on the disk. Let us consider the geometric interpretation of each possibility.

Case 0 - nothing specified: this gives all the sectors on the disk.

Case 1 - depth specified: this gives a vertical slice of sectors. For example,


Case 2 - radius specified: this gives a sub-cylinder of sectors. For example,


Case 3 - angle specified: this gives a wedge of sectors. For example,


Case 4 - depth and radius specified: this gives a vertical slice of a sub-cylinder of sectors. For example,


Case 5 - radius and angle specified: this gives a vertical stack of sectors. For example,


Case 6 - angle and depth specified: this gives a vertical slice of a wedge of sectors. For example,


Case 7 - depth, radius and angle all specified: this gives exactly one sector.

All this allows us to give a new interpretation of what a CHS address is actually doing. Instead of thinking of it as directly specifying a depth, radius and angle, we can think of as an intersection between 3 sets, one from Case 1, one from Case 2, and the last from Case 3.

For example, the Case 1 set (depth) is coloured in green, the Case 2 set (radius) is coloured in blue, and the Case 3 set (angle) is coloured in red.


=== The explanation ===

With this greater understanding under our metaphorical belt, we are ready to explain what IBM means by "cylinder", "head", "sector", "sectors per track" and "heads per cylinder".

First in a CHS address C, the "cylinder", is specified. This gives the radius at which the sector exists. We now have a Case 2 set of sectors.

Secondly, we specify H, the "head". This gives the depth at which the sector exists. But this is where the terminology becomes complicated. There are two things that we could mean by a "head" on a disk. We could either be referring to a Case 1 set where we have only specified the depth, OR we could be referring to a Case 4 set where both the radius and depth have been specified, since before selecting a "head" we must have already chosen a "cylinder". The name given to Case 4 set is a "track".

Finally, we specify S, the "sector". This gives the angle at which the sector exists. However, instead of thinking of "sector" as the angle itself, we are in Case 7 -- a unique sector -- because we have already specified the "cylinder" and "head" (altogether a "track"). Therefore, although the "sector" digit by itself gives a Case 3 set, along with its previous digits we do indeed specify a sector.

This naming is completely nonsensical. If S is called a "sector", then surely we should call H a "track", because with its prior digit we have identified a track! And going the other way, if H if called a "head", it seems as if we are calling each digit be the most general set it represents when standing alone, and therefore we should be calling S a "wedge" (or something along those lines).

We have further problems when we look into "SPT" and "HPC". "SPT" stands for "sectors per track", which actually makes a reasonable amount of sense. It does indeed tell you the number of sectors in each "track" -- a Case 4 set. However "HPC" stands for "heads per cylinder" which is an absurdity: it is asking for the number of heads (Case 1 sets) in each cylinder (Case 2 sets), and yet Case 1 sets are not subsets of Case 2 sets! A more reasonable name for "HPC" would be "tracks per cylinder", since it tells us the number of Case 4 sets in each Case 2 set. But it could also be called the number of "heads per disk", the number of distinct Case 1 sets, an equivalent definition.

*Yikes.*

=== Addendum A: Virtual geometry ===

Since solid state drives don't actually have any disk-like geometry, they don't have for values HPC and SPT. Unfortunately, because of Backwards Compatibility™, we need to emulate CHS addressing for things like SSDs and USB thumb drives. (In fact, we also have to emulate CHS addressing for optical media like CDs, because although they have disk-like structure, they weren't similar enough to old hard drives for IBM's liking.)

Sadly, we can't just make these drives report SPT to be larger than the total number of sectors, so that we can ignore C and H. No; that would be far to easy. We have the restriction that S must fit within 6 bits, and H within 8 bits. Therefore, HPC can be at most 256, and SPT at most 63. Wait, why can't SPT be 64? Recall, because of S starting at 1, the maximum value of S *is* SPT. Thus, SPT itself must be less than 2^6. So it's at most 63. When selecting virtual geometry for a drive, it is customary to select SPT and HPC to be their maximum values. Then, C will range from 0 up to X / 16128 - 1, where X is the total number of sectors on the drive.

=== Addendum B: Packed like sardines ===

As stated in Addendum A, S is a 6 bit value and H is an 8 bit value. Additionally, C is a 10 bit value. (This limits us to at most 63 * 256 * 1024 = 16515072 sectors (just less than 8GB) that can be addressed using CHS addressing.) This means that we can fit a CHS into a 3 byte value. You might think that the standard encoding of CHS would be something like (C << 14) | (H << 6) | (S << 0). But of course, IBM would not want it to be this simple. Instead, it's ((C & 0xFF) << 16) | (((C & 0x300) >> 8) << 14) | (S << 8) | (H << 0). (Admittedly, this encoding might have made it slightly easier to do on 16-bit processors, but it still annoys me.)

=== Addendum C: The lifetime of the universe ===

English has a similar problem with the word "second" to disks with the word "sector". If I ask you what the "current second" is, what do I mean? I could be asking you to identify for me the exact second that the present is currently occupying in the lifetime of the universe. To answer such a question, you would have to specify a year, month, day, hour, minute and second. But I could also be asking for the current second within the minute; i.e. what a second hand on a clock is currently pointing to. Akin to "sector", "second" refers both to a precise subsection of time, and to a single digit used in expressing times. (It also refers to a period of time equivalent to the length of these subsections of time, starting from an arbitrary point in time. But let's try to not think about that.)

=== Addendum D: Further reading ===

https://en.wikipedia.org/wiki/Logical_block_addressing
https://en.wikipedia.org/wiki/Master_boot_record
https://en.wikipedia.org/wiki/Cylindrical_coordinate_system
nakst
Generating polygon outlines

Recently I wrote a small 2D vector graphics renderer for Essence. It's currently only used for drawing binary SVG icons (previously I used the renderer in nanosvgrast.h), but I plan to use it for a custom animatable icon format, and also expose its functionality through the API directly. For the actual rasterization process, it uses a method closely adapted from stb_truetype.h, which you can read more about in Sean Barrett's article. In this blog post I'm going to discuss how my renderer generates outlines of polygons to draw lines.

The two functions that handle generating outlines are RastShapeCreateContour and RastShapeCreateDashed, which create solid and dashed lines respectively. They both take a RastPath as an argument, which is simply a list of vertices representing the polygon. Curved edges are flattened to straight lines of a given tolerance before outline generation begins. The polygon outline is returned as a RastShape, which consists of a sorted array of edges, and the bounding rectangle of the shape. The shape can then be drawn onto a surface. Outlines should be drawn with a non-zero fill rule, for reasons we'll discuss later.

First attempt

Let's think about the steps needed to generating an outline. First, we have our polygon (convex for now).

For each pair of adjacent vertices, we're going to extrude the edge in both directions. Draw some perpendicular lines:

Then draw the extruded edges.

Now we do this for all pairs of adjacent vertices.

The code looks something like this.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
RastVertex *from = vertices + (i + 0) % vertexCount;
RastVertex *to   = vertices + (i + 1) % vertexCount;
float dx = to->x - from->x;
float dy = to->y - from->y;
float scale = width / sqrtf(dx * dx + dy * dy);
if (internal) scale = -scale; // Extrude in the opposite direction.
float ox = -dy * scale;
flaot oy =  dx * scale;
RastVertex cf = { from->x + ox, from->y + oy };
RastVertex ct = {   to->x + ox,   to->y + oy };
AddVertexToPath(path, cf);
AddVertexToPath(path, ct);

Since the extruded edges don't meet up, we need to connect them somehow. For now, we'll just draw a straight line connecting them.

Finally, we can fill the created shape.

Well... it was only a first try, wasn't it...

Line joins

By directly connecting internal edges of the outline, we find that regions of the desired outline get cut-out. What we actually want to do is find the intersection of the edges (this is done by solving the two linear equations, one in each axis), and then join them at that point. This process is called mitering.

This results in a decent looking outline!

Here's the code.
1
2
3
4
5
6
// from1 and to1 are the ends of the first edge, and from2 and to2 are the ends of the second edge.
float a = to1->x - from1->x, b = from2->x - to2->x, e = from2->x - from1->x;
float c = to1->y - from1->y, d = from2->y - to2->y, f = from2->y - from1->y;
float j = (d * e - b * f) / (d * a - b * c);
RastVertex v = { from1->x + j * a, from1->y + j * c };
AddVertexToPath(path, v);

However, the external edges' bevelled joins still look wrong. We can also apply mitering to them, by extending them out until they meet, in a similar fashion as before.

Pointy.

Unfortunately, mitering the external joins isn't always a good idea. As the angle formed by the three adjacent vertices gets smaller and smaller, the mitered join begins to look hilariously oversized. It's common to place a limit on the distance the lines can be extended and fallback to the bevelled edge when this happens.

An alternative to mitered joins is to round them, which creates a softer, more Microsoft Office-like appearance.

In code, this looks like:
1
2
3
4
5
for (int j = 0; j < divisions; j++) {
	float angle = fromAngle + j * (toAngle - fromAngle) / (divisions - 1 /* don't miss out the final point! */);
	RastVertex v = { point->x + cosf(angle) * width, point->y + sinf(angle) * width };
	AddVertexToPath(path, v);
}


Line caps

As fun as creating outlines for polygons is, in the real world we also need to draw plain lines. This is a nearly identical process, except we now have two ends of the line to deal with, denoted by question marks in the following diagram.

The easiest thing to do is to simply connect the two vertices at either edge with a straight line. This is called a flat line cap.

However, we see that "half" of the vertex at either end of the line is outside the shape we have formed. By imagining an imaginary square with the vertex at its center, and tracing its edges, we form a more pleasing line cap, known as square cap.

Here's the code.
1
2
3
4
5
6
7
float dx = to->x - from->x;
float dy = to->y - from->y;
float scale = width / sqrtf(dx * dx + dy * dy);
RastVertex c0 = { .x = c.x + scale * dx, .y = c.y + scale * dy };
RastVertex d0 = { .x = d.x + scale * dx, .y = d.y + scale * dy };
AddVertexToPath(path, c0);
AddVertexToPath(path, d0);

By re-imagining our imaginary square as an imaginary circle, we can also create rounded line caps.

The code to do this is identical to the rounded line join code given above.

Dashed and dotted lines

To create dashed and dotted lines, we break the path forming the polygon up before sending it to RastShapeCreateContour. In fact, a dashed line is just several evenly spaced lines, going around the polygon's perimeter.

We start by adding the first vertex to the "dash".
1
2
3
4
5
6
RastPath dash = {};
RastVertex from = vertices[0];
RastVertex *to = &vertices[1];

loopStart:
AddVertexToPath(&dash, from);


We then accumulate length going around the polygon until we reach the dash's length. If we stop in the middle of a edge, we linearly interpolate from the start to end vertex to find the stopping position.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
float accumulatedLength = 0;

while (true) {
	float dx = to->x - from.x;
	float dy = to->y - from.y;
	float distance = sqrtf(dx * dx + dy * dy);

	if (accumulatedLength + distance >= dashLength) {
		float fraction = (dashLength - accumulatedLength) / distance;
		RastVertex stop = { from.x + fraction * dx, from.y + fraction * dy };
		AddVertexToPath(&dash, stop);
		from = stop; // We'll pick up from this point.
		break;
	}

	// Go to the next edge of the perimeter.
	accumulatedLength += distance;
	from = *to;
	AddVertexToPath(&dash, from);
	to++;
}


We use the path formed to create an outline. Next, we skip over some length of perimeter given by the gap between the dashes, and then we repeat the process, until we've worked our way around the entire polygon, optionally alternating between different dash lengths to create a variety of patterns. Dots also be constructed by specifying a dash length of 0 and using rounded line caps.

Concave polygons

Let's investigate what happens if we try to use the algorithm we created in the earlier sections to construct the outline of a concave polygon, mitering internal joins and rounding external joins.

Create the lines...

Apply the joins...

Oh no! By rounding the join at the concave section, we get a large arc inside the polygon! For concave sections, we need to force the external edges to the mitered. We can also rounds the interior edge at concave joins, if we want.

Much better. The code the detect the concavity of three adjacent vertices uses the sign of the cross product.
1
float cross = (p2->x - p1->x) * (p3->y - p2->y) - (p2->y - p1->y) * (p3->x - p2->x);

Assuming vertices given anticlockwise: for external edges, if the cross product is greater than zero, we have to use a miter join, and for internal edges, if the cross product is less than zero, we have to use a miter join. Otherwise, we can use a bevelled or rounded join.

Why we use a non-zero fill rule

Let's apply our algorithm to this shape.

When we extrude edges, something bad happens. Edges extruded from non-adjacent vertices are crossing!

Focusing on the interior edges, and after applying mitering, we get:

Unfortunately, I couldn't think of a way these extra bits without searching the resulting outline for self-intersections. So instead, I just draw the outlines with a non-zero fill rule, and these unwanted regions don't shown up.

The end

When writing the code for this renderer, I was surprised how easy most of it was. I always expected outline generation to be complicated, but once I began to break the problem down it wasn't that bad after all. Furthermore, writing the renderer was very rewarding, because of the immediate visual feedback you get. Unfortunately, there are a few edge cases and bugs that need to be ironed out, but currently there's no rush for me to fix them.

You can find the full code for the renderer on the Essence Bitbucket page.

If you'd like to contribute to Essence, or otherwise discuss it, join the Handmade Network Discord guild, where Essence has its own text channel.
nakst

Click to zoom in.

The OS currently uses FreeType for text rendering. Although I originally used stb_truetype, it unfortunately was not able to produce text of comparable quality to other modern operating systems. This lead me to porting FreeType to the OS. In this blog post I'll explain how building and porting FreeType works, which'll hopefully help people implement it into their own programs.

Downloading FreeType

FreeType can be downloaded from their website, www.freetype.org. I haven't yet updated to version 2.9.1 (still using 2.9), but it shouldn't matter which version you download. Once you've got the source, all you need to do is extract it. The OS's build system automatically downloads and extracts the source in these three UNIX commands:

1
2
3
curl https://mirrors.up.pt/pub/nongnu/freetype/freetype-2.9.tar.gz > freetype-2.9.tar.gz
gunzip freetype-2.9.tar.gz
tar -xf freetype-2.9.tar


Configuring FreeType

FreeType is a large library with support for many different features, most of which you probably don't need. Luckily, the main configuration for the library is split into only 3 files (I'm looking at you, GCC!). The build system automatically patches all three of these files after the source has been downloaded.

The first is modules.cfg, which allows you to enable and disable different features. Features are written in the form FONT_MODULES += .... You can disable the ones you don't need by placing a # at the start of the line. Here is the full list of modules we disable, in addition to the disabled-by-default modules:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
type1
cff
cid
pfr
type42
winfonts
pcf
bdf
autohint
pshinter
cache
gzip
lzw
bzip2
psaux
fttype1.c
ftwinfnt.c


You don't have to disable all of these if libary size isn't a concern, but I wanted to keep it as lightweight as possible.

The next file is include/freetype/config/ftoption.h. This is a header file, so you can disable features here using //. From this file, I disable FT_CONFIG_OPTION_ENVIRONMENT_PROPERTIES since the OS doesn't have any notion of environment variables (yay!), FT_CONFIG_OPTION_USE_LZW and FT_CONFIG_OPTION_USE_ZLIB to avoid any dependencies on external decompression libraries, FT_CONFIG_OPTION_POSTSCRIPT_NAMES, FT_CONFIG_OPTION_ADOBE_GLYPH_LIST, and FT_CONFIG_OPTION_MAC_FONTS. I also enable FT_CONFIG_OPTION_DISABLE_STREAM_SUPPORT so there's no dependency on the C standard library.

The final file we patch is ports/freetype/src/include/freetype/config/ftstdlib.h, where FreeType gets the names of C runtime functions from. We have a lightweight set of these functions in the OS API, prefixed with OSCRT-, so if a program doesn't want the C standard library, it won't have to link to it. We redirect all the functions in this file to our versions.

At the start of the file, I include the OS API's header file. We have to declare some additional defines so we can statically link FreeType with the API. Sadly, I can't remember how many of these are still necessary since the API has been through several refactors. If you're building FreeType on another platform you shouldn't have to worry about all this.

1
2
3
4
5
6
7
8
#ifndef IncludedEssenceAPIHeader
#define OS_CRT
#define OS_API
#define OS_FORWARD(x) x
#define OS_DIRECT_API
#define OS_EXTERN_FORWARD extern
#include <os.h>
#endif


After that, I comment out all the C standard library header inclusions, e.g. // #include <stdio.h>, and changes all the defines to use the OSCRT- prefix, e.g. #define ft_strrchr OSCRTstrrchr. If you want to use my lightweight reimplementation of all the C standard library functions needed to port FreeType, look at api/crt.c of the OS's source. I'm doubtful whether you can copyright such simple implementations of such generic functions, but it'd be nice if you credited the project if you used these functions.

Building

Now that we've configured FreeType we can build it. This is a simple configure followed by make, although I have to add a few extra flags because I'm cross-compiling.

1
2
./configure --without-zlib --without-bzip2 --without-png --without-harfbuzz CC=x86_64-essence-gcc CFLAGS="-g -ffreestanding -DARCH_X86_64 -Wno-unused-function" LDFLAGS="-nostdlib -lgcc" --host=x86_64-essence
make ANSIFLAGS="" > /dev/null


When configuring I inform it of the cross compiler and all the no-cstdlib releated flags. I then tell it to not use any dependencies. I pass ANSIFLAGS="" to the makefile, so it doesn't complain that my OS's header isn't compatible with whatever version of C they used 100 years ago. I also pass > /dev/null/ so only errors are reported. Much quieter. If everything built correctly, you now have a new shiny static library in objs/.libs/libfreetype.a, and headers in include/. If it didn't build correctly, please don't ask me why, because I have no idea.

Using FreeType

Once you've build FreeType, linked your program to it, and added its include path, you can now start actually using it.

In your soruce file, first include the FreeType headers. For some reason you have to include ft2build.h, sine it defines the path of the header file you actually want to include. There's probably a good reason why (there isn't).

1
2
#include <ft2build.h>
#include FT_FREETYPE_H


Since we built FreeType without access to C's stdio functions, we can load our fonts ourselves.

1
2
3
void *loadedFontRegular = OSFileReadAll(OSLiteral("/OS/Fonts/Noto Sans Regular.ttf"), (size_t *) &nodeRegular.fileSize);
void *loadedFontBold = OSFileReadAll(OSLiteral("/OS/Fonts/Noto Sans Bold.ttf"), (size_t *) &nodeBold.fileSize);
...


We then can initialise FreeType and load our fonts into it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
FT_Library freetypeLibrary;
FT_Face fontRegular, fontBold, ...;
FT_Error error;

error = FT_Init_FreeType(&freetypeLibrary);
if (error) ...
error = FT_New_Memory_Face(freetypeLibrary, (uint8_t *) loadedFontRegular, nodeRegular.fileSize, 0, &fontRegular);
if (error) ...
error = FT_New_Memory_Face(freetypeLibrary, (uint8_t *) loadedFontBold, nodeBold.fileSize, 0, &fontBold);
if (error) ...
...


Rendering glyphs

Good news! We are finally render to actually render glyphs from our fonts.

1
2
3
4
5
6
7
int size = 9;
int character = 'a';
FT_Face font = fontRegular;

FT_Set_Char_Size(font, 0, size * 64, 100, 100);
FT_Load_Glyph(font, FT_Get_Char_Index(font, character), FT_LOAD_DEFAULT);
FT_Render_Glyph(font->glyph, FT_RENDER_MODE_LCD);	// Render using subpixel antialiasing.


We can then extract some information about the glyph that we need.

1
2
3
4
5
6
7
8
9
int advanceWidth = font->glyph->advance.x >> 6;		// The amount of pixels to advance to draw the next character.
int lineHeight = font->size->metrics.height >> 6; 	// The height of a line.
int ascent = font->size->metrics.ascender >> 6; 	// Where the line starts, vertically. (Used for drawing carets).

FT_Bitmap *bitmap = &font->glyph->bitmap;
int width = bitmap->width / 3;				// The width of the glyph.
int height = bitmap->rows;					// The height of the glyph.
int xoff = font->glyph->bitmap_left;			// The x offset to use when drawing the glyph.
int yoff = -font->glyph->bitmap_top;			// The y offset to use when drawing the glyph.


Then we can store the glyph's data in our cache. The OS caches the last 256 glyphs it sees, storing their size, character and font. If a glyph is already in the cache, it won't go to FreeType to render it. We need to shift the color channels around to get them into our desired format, and when also reduce the effect of the subpixel antialiasing to prevent the appearance of color fringes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
for (int y = 0; y < height; y++) {
	for (int x = 0; x < width; x++) {
		int32_t r = (int32_t) ((uint8_t *) bitmap->buffer)[x * 3 + y * bitmap->pitch + 0];
		int32_t g = (int32_t) ((uint8_t *) bitmap->buffer)[x * 3 + y * bitmap->pitch + 1];
		int32_t b = (int32_t) ((uint8_t *) bitmap->buffer)[x * 3 + y * bitmap->pitch + 2];

		// Reduce how noticible the colour fringes are.
		int32_t average = (r + g + b) / 3;
		r -= (r - average) / 3;
		g -= (g - average) / 3;
		b -= (b - average) / 3;

		output[(x + y * width) * 4 + 0] = (uint8_t) r;	// Put the colors into the order we want. You may want to change this.
		output[(x + y * width) * 4 + 1] = (uint8_t) g;
		output[(x + y * width) * 4 + 2] = (uint8_t) b;
		output[(x + y * width) * 4 + 3] = 0xFF;
	}
}

#ifdef OS_ALT_BACKEND
BackendStoreGlyph(output, width, height, fontCachePosition); 	// If we're using the SDL backend, upload the glyph to the graphics card.
#endif


We can then draw the glyph to the screen. outputPosition contains the position on the screen we're drawing to. After the glyph's drawn, you'll want to add advanceWidth to outputPosition.x. textColor contains the color of our text.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
int xOut = outputPosition.x + xoff;
int yOut = outputPosition.y + yoff;
int xFrom = xOut, xTo = xOut + width;
int yFrom = yOut, yTo = yOut + height;
uint8_t alpha = (textColor & 0xFF000000) >> 24;

for (int oY = yFrom; oY < yTo; oY++) {
	int y = oY - yOut;

	for (int oX = xFrom; oX < xTo; oX++) {
		int x = oX - xOut;

		uint32_t pixel = *((uint32_t *) (output + (x * 4 + y * width * 4)));			// The pixel from the rendered glyph.
		uint32_t *destination = (uint32_t *) ((uint8_t *) bitmap + oX * 4 + oY * stride);	// The destination in our window.
		uint32_t original = *destination;

		uint32_t ra = (((pixel & 0x000000FF) >> 0) * alpha) >> 8;		// Alpha blending.
		uint32_t ga = (((pixel & 0x0000FF00) >> 8) * alpha) >> 8;		// Subpixel antialiasing gives us a seperate alpha value to use for each of the RGB channels.
		uint32_t ba = (((pixel & 0x00FF0000) >> 16) * alpha) >> 8;		
		uint32_t r2 = (255 - ra) * ((original & 0x000000FF) >> 0);
		uint32_t g2 = (255 - ga) * ((original & 0x0000FF00) >> 8);
		uint32_t b2 = (255 - ba) * ((original & 0x00FF0000) >> 16);
		uint32_t r1 = ra * ((textColor & 0x000000FF) >> 0);
		uint32_t g1 = ga * ((textColor & 0x0000FF00) >> 8);
		uint32_t b1 = ba * ((textColor & 0x00FF0000) >> 16);

		uint32_t result = 0xFF000000 | (0x00FF0000 & ((b1 + b2) << 8)) 
			| (0x0000FF00 & ((g1 + g2) << 0)) 
			| (0x000000FF & ((r1 + r2) >> 8));
		*destination = result;
	}
}


Rich text

Instead of implementing HTML5 like other modern UI libraries for rich text, I've started to work on my own simple rich text format. At the moment, the code is fairly work in progress, but let's take a short look at the markup format.

The format uses '\v' (vertical tab) and '\f' (form feed) to open and close a tag by default. Since these characters are non-printable, you don't need to worry about escaping certain characters in your input. And instead of having seperate open and close tags, you put the text your marking inside the tag instead.

For example:
1
2
3
4
5
This word is <em>emphasised</em>.

.. becomes ..

This word is \vem emphasised\f.


A single space after the tag name indicates where the marked text starts. Tags can also be nested, as you'd expect.

1
\vmono This text is monospaced. This word is also \vem emphasised\f\f.


I know I'll definitely want lists, tables and images in the future, but I'm not exactly sure where to draw the line before I end up with something like HTML/CSS. If people have any thoughts about rich text, I'd be interested to here them. I'm fairly happy with how it works already (although the code need a rewrite), and have been using it in the File Manager to change the color of text in list view item labels.



SVG icons

I thought I'd also briefly talk about SVG icons here, since they share the glyph cache with text glyphs. The OS currently uses a mix of bitmap Tango icons, and the SVG elementary icons.

As someone who is morally opposed to XML, I needed a way to render SVGs without introducing an XML parser into the OS. For this reason, I pre-parse all the icons during the build process into a icon_pack file. This is about the 33% of the size of the unzipped SVGs, and 160% of the size of the gzip'd SVGs. (But when gzip'd itself it gets to about 50% the size of the gzip's SVGs). I use nanosvg to parse the files.

The icon_pack file format is incredibly simple. At the start of the file, there's a list of filename hashes and file offsets. After that is the uncompressed pre-parsed data, directly saved from the output of nanosvg. This makes it very quick to load. I can then render the icons using nanosvgrast. You can get nanosvg and nanosvgrast here.

Conclusion

Thank you for reading this blog post! I hope it helps more people get high-quality text into their programs using FreeType. If you have any thoughts about how I should implement rich text, please leave a comment, as I'm interested to here them.
nakst
Hello all. I haven't made a blog post in quite a while since I've been very busy recently, but luckily things are back to normal and hopefully progress can resume at its original pace.

I've drafted a roadmap of all the drivers that I think a functional operating system needs, and marked which ones are already usable.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
Buses
	PCI - Usable
	USB - Usable
Human interface devices
	PS2 - Usable
	USB mouse/keyboard - Usable
Storage
	ATA - Usable
	ATAPI (CD/DVD drives) - Read only; does not yet support ejecting discs
	SATA - Usable
	NVMe - Usable
	USB mass storage devices - Usable
Filesystems
	EsFS - Essence's custom filesystem
	ISO9660 - Usable
	FAT12 - Read only
	FAT32 - Read only
	NTFS - Read only
	ext2 - Read only 
Sound
	Intel HD Audio - Output to speakers only
Graphics (these are too complex to write from scratch, will need to be ported)
	VGA w/ software rendering - Usable
	BIOS mode setting w/ software rendering - Usable
	UEFI mode setting w/ software rendering - Usable
	Nvidia - Started port of kernel platform layer
	Intel
	AMD
Networking
	Network cards - i8254x working
	Protocol stack - Started basic implementation
	Wifi
	Bluetooth
CPU architectures
	AMD64 - The currently only available platform; SMP working
	x86 - Maybe??
	ARM processors
Misc
	ACPI - Finished port of ACPICA but no power management features
	Raspberry Pi, Arduino, etc.


Currently I think the highest priority are SATA, ATAPI, USB mouse/keyboard and USB mass storage devices. These shouldn't be too difficult to implement.

We also need to work out a solution for graphics and network cards. I know relatively little about these topics, but I presume that these drivers will have to be ported from Linux. If you have experience working with the Linux graphics or networking stacks, your help will be much appreciated!

It'd also be nice to have audio drivers, a few more filesystems, and support for other platforms like ARM, but these are low priority.

I'm now working on a rewrite of all the userland code (I have gained a lot of experience since writing the original code), so I don't have much time to work on drivers. This means I really need contributions from other people to help develop Essence. All the drivers I've separated from the main kernel unity build are in this folder https://bitbucket.org/nakst/essence/src/master/drivers/. These should serve a good introduction into working with the kernel and writing drivers for it.

I highly recommend you join the #essence discord channel on the handmade network guild if you are interested in contributing to the project.

Thank you!!
nakst
This is a continuation of part 1.

Flow layout mode

In the previous blog post, we discussed the table layout mode, where the layout element is divided into a grid with a fixed number of columns and rows.

However, this may not be how we want to layout our UI.

Consider a file manager:



This would be difficult to layout with a table. We'd either have to make two tables (a 2x1 table containing the navigation pane and folder contents, and 1x3 table containing the toolbar, the other table and the status bar), or use multi-cell positioning.

This layout would be more naturally achieved as:
  • Toolbar (OS_CELL_H_PUSH | OS_CELL_H_EXPAND)
  • Start new row
  • Navigation pane (OS_CELL_V_PUSH | OS_CELL_V_EXPAND)
  • Folder contents (OS_CELL_H_PUSH | OS_CELL_H_EXPAND | OS_CELL_V_PUSH | OS_CELL_V_EXPAND)
  • Start new row
  • Status bar (OS_CELL_H_PUSH | OS_CELL_H_EXPAND)


(As a reminder, the "PUSH | EXPAND" flags will cause the cell to fill up the remaining space in that axis, and instruct the element to fill its cell. Therefore, when the window is resized horizontally, the navigation pane will remain fixed, while all the other elements grow, and when the window is resized vertically, the toolbar and status bar are fixed, while the navigation pane and folder contents grow.)

The layout we described above is a flow layout! Elements are laid out in rows, where each row has a variable number of cells. There are no columns between rows like in a table layout.

Flow direction

We need to be able to specify the flow direction. There are 4 options.

  • Left to right
  • Right to left
  • Top to bottom
  • Bottom to top




(Note that the boxes in the bottom 2 examples have different sizes than the boxes in the top 2 examples; top to bottom and bottom to top flow directions do not "rotate" their child elements.)

It's also likely that the programmer will want their layout to following the natural reading order of the system, so there are 2 additional flow directions:

  • Reading order
  • Opposite reading order


These will be automatically assigned to left to right and right to left as the system's text direction changes.

From here on we'll assume that the reading order is fixed at left to right for simplicity.

Measuring a flow layout

To measure a flow layout, each row is considered separately, and then the widest row is selected for the width, and the sum of the row's heights is used for the height. To measure each row, the tallest-reporting element is selected for the height, and the sum of the element's reported widths is used for the width.

In this example, a flow layout has 2 rows, each with 2 elements.



The layout's height is obtained by adding the reported heights of elements 1 and 4, which are the tallest elements in their corresponding rows. The layout's width is the same as the first row's, as it it the widest row. In order words, the layout's width is equal to the sum of the reported widths of elements 1 and 2.

Band underflow

If no weighted element exists in a given row of a flow layout, then there may be some unused space at the end of the row. We discussed in the previous blog post overflow strategies, but now let's look at underflow strategies.

There are four underflow strategies.

  • Start - cells are placed at the start of the row
  • End - cells are placed at the end of the row
  • Center - cells are centred in their row
  • Justify - the remaining space is evenly distributed in between the cells




And as before, the vertical position of an element in its row may be controlled using alignment flags discussed in the previous blog post, OS_CELL_V_TOP, OS_CELL_V_CENTER and OS_CELL_V_BOTTOM.

Wrap layout mode

The third and final layout mode is a modified version of the flow layout mode.

When a flow layout is larger than its given bounds, it'll employ one of the overflow strategies discussed in the previous blog post (clip, compress or scroll). Wrap layouts, on the other hand, move the overflowing elements onto a new row automatically. They can still be instructed to always start a new row before a certain element.



Measuring wrapped elements

The introduction of wrapped layouts create an interesting problem: how do you measure them? In fact, how do you measure any wrapped element at all, such as a paragraph of text? The height of the element depends on its width! You can't just ask it to measure itself.

We need to introduce two special ways of measuring a wrapped element:
  • Measure unwrapped - measure the element as if wrapping was disabled
  • Measure with wrap limit - measure the element's height with a given wrap limit (i.e. width)




Unless the OS_CELL_H_COLLAPSE flag if is specified on an wrapped element, it will use its unwrapped measurement for its reported size. This is mostly not the desired behaviour, you can specify the COLLAPSE flag, and make sure that the element is in a weighted column.

The parent element must first work out the widths of each of its columns - only then can it send the measure with wrap limit message to the wrapped element. It can then find out the height of the element.

Let's look at an example.



Here we have a top to bottom flow layout that contains 3 paragraph elements. Between paragraphs 2 and 3 an new column is started. The two columns receive equal weights. All paragraphs have the following flags: OS_CELL_H_EXPAND | OS_CELL_H_COLLAPSE.

First we start with the columns. As per usual, for each column we find the largest reported width. For both columns in our example this is 0px - since all the paragraphs have the H_COLLAPSE flag. We don't need to measure any of them. Let's say we have 300px of remaining space. Since both columns have the same weighting, this'll get divided up equally. Each column is now 150px in size.

Now we can go into each column. For the first column, we measure paragraph 1 giving it the wrap limit of 150px. This returns a height of 60px, let's say. We then measure paragraph 2 with the wrap limit of 150px. This returns a height also of 60px. The column is now 120px in height. For the second column, we measure paragraph 3 giving it the wrap limit of 150px. This returns a height of 100px. Therefore the second column is 100px in height.

The tallest column is the first column at 120px. The sum of the columns' widths is 300px. Therefore the layout element is 300px by 120px.

Try doing that with CSS.

Wrapping axis

So far we've assumed that we always wrap horizontally. That means when we reach a horizontal limit we move down vertically. However this is not always the case. For example, consider a wrap layout with a top to bottom reading order. It will automatically move elements into a new column once a vertical limit is reached. Therefore it wraps vertically.

Therefore we need to inform our parent element which wrapping axis we need, so that it lay out everything properly.

If we used the wrong wrapping axis, the following would occur:
  • The layout element would attempt to measure the wrapping element. This causes it to report the unwrapped size.
  • The other dimension is then decided based purely on a fraction of the available space.
  • This means the element will not wrap correctly at all. It will end up clipped as soon as it reaches its wrap limit.


By setting the wrapping axis in the parent layout element, we can correctly measure columns then rows, or rows then columns. Therefore when the latter is measured, the elements can be provided with the information gathered from the former.

Making a layout

Okay, let's put together everything we've learnt and recreate a page from Windows 7, "Control Panel\All Control Panel Items\Display". (Apologies for awful gif quality.)



I've labelled each element with a number here which we'll used to refer to them.



Here's how we'll refer to the type of each element:

Paragraphs1, 2
Layouts3, 4, 12
Fixed elements5, 6, 7, 8
Radio boxes9, 10, 11


We won't be considered the window frame, toolbars or the sidebars, since they're all fairly simple table layouts.

Make sure you've carefully understood the resizing behaviour shown in the gif, as we're about to attempt to recreate it.

At the root we have the layout 3 element. This is a flow layout with 2 rows. Its x-overflow strategy is compress, and its y-overflow strategy is scroll. In the first row (which uses the end underflow strategy) there is element 8, and in the second row (which uses the start underflow strategy) there is element 12. Element 12 has the OS_CELL_H_SHRINK flag so its minimum width is set to 0px, and has its maximum width explicitly set to 580px. These settings keep element 8 fixed at the right side of the screen, and element 12 at the left side, and when there is less than 580px of space it will start to shrink.

The layout 12 element is also a flow layout, with the x-overflow strategy of compress. It doesn't need a y-overflow strategy since its parent will always scroll to the necessary height. It has 5 rows. Rows 1 and 2 use the start underflow strategy, and row 5 uses the end strategy. In rows 1 and 2 paragraphs 1 and 2 are placed respectively. They have the OS_CELL_H_COLLAPSE and OS_CELL_H_SHRINK flags. They also are weighted. This means they will always take up the entire space of their rows. In row 5 (the last row) element 7 is placed. Its size is fixed. This row uses the end underflow strategy, to keep the element aligned to the right. In row 4 element 6 is placed. It has the OS_CELL_H_EXPAND and OS_CELL_H_SHRINK flags, and its weighted. Therefore it will take up the entire space of the row. Row 3 contains element 4 and 5. It uses the justify underflow strategy, and when it overflows it will use the compress strategy as noted above. Both elements use OS_CELL_H_SHRINK for when they are compressed after the justified space is gone.

Element 4 is the final layout element, also a flow layout. It has 3 rows, in which radio boxes 9, 10, and 11 are placed. They have the OS_CELL_H_SHRINK flags, but not the OS_CELL_H_COLLAPSE flags. This means they will only start word-wrapping after they get compressed by the parent layout's overflow strategy.

And I think that's it!

The end

That's the end of these two blog posts. In the comments, please leave any questions you have - or any difficult layouts you don't think this system would be able to handle.

Thank you for reading!!