What Does Code Look Like?

Have you ever wondered what a program looks like? It’s a reasonable question, but not one we can answer without some clarification. A program looks very different depending on whether we’re talking about the source code that a programmer writes or the assembly that is, roughly, what a computer runs.

To the Programmer

What the programmer sees is called source code. Source code is written in one of many different programming languages. They all look a little different, but most are built on the same basic ideas. Here’s a simple program written in the language ‘C’:

#include <stdio.h>

void main() {
    printf("Hello world!");
}

This is almost the simplest C program you can write. It simply outputs ‘Hello World’.

Here the first line tells the computer to look at the ‘standard library’ to figure out what ‘printf’ means. Imagine you’re asking someone to differentiate some trig functions. To do that they’ll probably need to know some identities. This line is like telling the person (computer) where to find the identities (code).

The next line, void main(), more or less says ‘the program starts here’. Every C program has to have a main function somewhere so the computer knows where to start.

The last important line, printf(…), tells the computer to output some text, ‘Hello World’. printf is a function. A function is just a piece of code which we can use multiple times. We give it information by passing arguments, and we get a result back as a return value. Here the text to print is the only argument. printf does have a return value, but we’ll talk about it a bit later.

To get to the next step we need to compile our source code. Essentially, we’re translating it from something programmers can read to something the computer can read.

To the Computer

Here’s our simple program in assembly, which, if you squint really hard and forget about some technicalities, is kind of what the computer sees. Not all programmers can read assembly. It’s usually only useful for writing low-level software and sometimes for improving performance. If you make it through this section you’ll know something many programmers don’t!

.LC0:
    .string "Hello world!"
main:
    push rbp
    mov rbp, rsp
    mov edi, OFFSET FLAT:.LC0
    mov eax, 0
    call printf
    nop
    pop rbp
    ret

Looks pretty different, huh? There are some parts that you should be able to recognise here, though. We still have ‘Hello world!’ in there, and printf on line 8. Going through this line by line:

.LC0:
This is a label, and doesn’t actually get read by the computer. It’s a placeholder for where in the computer’s memory we’re going to put our data. 

.string "Hello world!"
Here we declare that we have some information we want stored. For our program the only thing we need is the text ‘Hello World!’.

main:
Similar to before, this is where the computer should start running the program. This is another label, but unlike before this part contains code rather than data.

push rbp
mov rbp, rsp
These two lines are called the prologue. Like most prologues this one isn’t important, so we’ll skip it.

mov edi, OFFSET FLAT:.LC0
This line is where our code starts. mov stands for move. It tells the computer to copy some information from one place to another. In this case we’re putting the location of our text, marked by the label .LC0, into a ‘register’ called edi. There’s a bit to unpack here. The text ‘Hello world!’ is stored next to our program code. For the printf function to use it, we need to tell printf where to find it. A register is like short-term memory: each one stores a small piece of information. It happens to be the case that printf expects to find the location of the text to output in the edi register, so that’s where we put it.

mov eax, 0
Can you guess what happens here?
We’re moving the number 0 into the register eax. printf can accept a varying number of arguments, and on this kind of system the code calling it uses eax to say how many of those arguments are sitting in special floating-point registers. We aren’t passing any, so we put 0 there. The eax register also has a second job: it’s where a function leaves the number it gives back to us, called a return value. printf is defined so that it will return the number of characters it outputs if it succeeds, or a negative number if it fails. Our program assumes everything went OK, so we never look at eax after the call.

call printf
This is where the magic happens. call jumps to some code somewhere else, in this case a function which will output some text. Notice that we don’t tell printf what to output here. That’s because it knows to look in the edi register, which we set up two lines ago.

nop
This line does nothing. Literally. It exists to make some things easier for debuggers, but isn’t important to us.

pop rbp
This is called the epilogue. You might notice it’s similar to the push rbp line near the top of the assembly. Again, we won’t cover what this does.

ret
This tells the computer that main has done everything we wanted it to. Control goes back to the code that started our program, which then lets the program exit.

And that’s it! We’ve covered what a simple program looks like both to the programmer who writes it and to the computer that has to run it. Hopefully this gave you a better understanding of what code looks like from each side.


Disclaimer: Many technical details have been omitted where it aids clarity. If you’re a programmer, some of the assumptions here may be misleading. If you want a more in-depth explanation of assembly and other similar topics, take COMPSCI 110.
