Machine Language

Computers think in binary. Yes or no, true or false, on or off, one or zero.

Each binary digit is one bit. A byte is eight bits. Hexadecimal is base 16, so one hexadecimal digit stands for four bits. 32-bit processors were the norm up to a few years ago; 64-bit processors are the standard today. So, the largest word a modern personal computer can hold in a single general-purpose register is 64 bits long.

It is very difficult for us humans to think in binary, so we have developed higher-level languages that we can understand and that the computer can translate into the binary it needs.

The next level up from binary is assembly language. It consists of simple instructions that move data around and process it: add, subtract, multiply, divide, etc. A program called an assembler translates the instructions and the data into binary. Assembly language is specific to a particular machine architecture. Each processor family has its own assembly language.

For x86 processors there are two major syntaxes for assembly language, the Intel and the AT&T versions. Most x86 assemblers accept a variation of one of these two. A compiler built on LLVM, such as Clang, translates C into LLVM IR, then translates the LLVM IR into assembly code, and then the assembly code is translated into the binary code the computer works with.

Some higher-level languages, like C and C++, are compiled; some are interpreted. A compiled program is translated all at once, before it runs. An interpreted program is translated piece by piece while it runs.

Compiling

  • Preprocessing
  • Compiling
  • Assembling
  • Linking

GCC, the GNU Compiler Collection, is a set of tools and libraries that enables you to compile a variety of languages, like C, C++, Fortran and, in older versions, Java, on your computer. GCC preprocesses your C program by expanding any lines that begin with a hash sign (#), like #include <stdio.h> or #include <string.h>, and any others that your main program needs, into plain C code.

The preprocessor takes declarations that are already written in header files and adds them to the program you are working on in the appropriate place. The libraries behind those headers are like modules that you can plug into your program, so you don’t have to keep writing that code over and over again, every time you need that function.

GCC then compiles your preprocessed C source code into assembly code and assembles the assembly code into binary object code. Finally, the linker combines your object code with the libraries it uses, so they all work together, and produces the binary program that your hardware runs.

Computer Architecture

compiling
Transistors and Gates

Digital logic is built from electrical circuits made of transistors, capacitors and resistors. A voltage of less than 0.1 volts may count as zero or off, and a voltage of more than 0.5 volts could count as one or on; the exact thresholds depend on the type of circuit. A transistor lets you use a small voltage to turn a larger voltage on or off.

A DRAM is a Dynamic Random-Access Memory integrated circuit made of MOSFETs (Metal-Oxide-Semiconductor Field-Effect Transistors) and capacitors. These components can be arranged for a variety of applications, including DRAM memory banks, which are arrays of MOSFET and capacitor pairs.

A DRAM bit cell circuit is a MOSFET and a capacitor. Each cell stores one bit. The control unit sends a signal to the appropriate wordline of the 64-bit array. The signal turns the MOSFET on, allowing the capacitor to discharge its voltage, which is either high or low, depending on whether there is a 1 or 0 stored there, onto the bitline.

Integrated Circuits

There is a set of instructions that controls the flow of information, much as syntax and grammar control the meaning of sounds in human language. The instruction set includes addressing modes, instruction categories, interrupt processing and input/output operations.

In modern computers, the electrical circuits, the DRAM bit cells, are arranged in arrays of 64 bit words. The computer uses the instruction set to read and write to these electrical circuits billions of times every second.

A typical integrated circuit has billions of these tiny circuits in it. There are several different kinds of circuits that accomplish different tasks, but integrated circuits are made out of billions of copies of six or seven different electrical circuits. There are logic gates, latches, flip-flops, registers, adders, clocking circuits and sequential logic.

The circuits are arranged into control units, arithmetic logic units and registers. There is usually more than one core on each integrated circuit chip. Many processors use simultaneous multithreading (Intel calls it Hyper-Threading), which lets each physical core work on two threads at a time, so the operating system counts each physical core as two logical cores. So, a chip with four cores will report eight logical cores and schedule work on them accordingly.

Each core will have many registers arranged around the control unit and the ALU, and several layers of cache memory. The L1 layer of cache memory is as close to the ALU as possible. L2 is slightly farther away. The control unit controls the flow of information to and from the registers and cache, which are built from fast flip-flop and static RAM circuits rather than DRAM cells.

The Random Access Memory (RAM) is a module that has several DRAM chips that the CPU can use as memory in its calculations. It is plugged into the motherboard as close to the CPU as possible.

The integrated circuits are all mounted on a motherboard. There is usually some kind of heat sink that draws heat away from the central processor. All that processing generates a lot of heat. There is also usually a fan blowing over the heat sink.

Modern computers usually have a graphics processing unit in addition to the central processing unit. Graphics processing units are designed to run many simple calculations in parallel, which makes them a lot faster than central processing units for graphics and similar workloads.

preprocessor

The preprocessor inserts the contents of the header files into the program before the compiler processes the main function. That way, all the functions declared in the header files will be available to the main function.

compiling

The compiler takes the C programming code you write and converts it into Assembly code.

A byte is eight bits. That’s why there are 16-bit, 32-bit and 64-bit words in computer architecture: each is a whole number of bytes. That’s also why programmers write binary values in hexadecimal instead of decimal. 64-bit words are sometimes two 32-bit words connected together. On a 32-bit CPU, the registers are 32 bits wide, and the assembly code joins two 32-bit words into one 64-bit, long long, word.

ASCII is a system that assigns a number to each of the basic letters, digits and symbols we can use in a computer. The computer converts the numbers, letters and symbols we write into ASCII numbers, which are stored as binary numbers that the CPU can process.

In C source code, the prefix 0x tells the compiler that the number that follows is written in hexadecimal, not decimal. There are other prefixes for other number systems. Hexadecimal is the most important for programmers because it is a compact way to write the binary values the computer uses to calculate.

The reason we use hexadecimal instead of decimal is that 16 is a power of two (2 × 2 × 2 × 2), so each hexadecimal digit stands for exactly four bits. That is not possible with 10, which is not a power of two.

So, the compiler compiles the C programming language into assembly code. Assembly is a very simple and precise code, telling the CPU exactly what to do, such as: move a number from memory into a register, or add two registers together in the ALU, the Arithmetic Logic Unit.

Registers are the circuits in the CPU where data is stored temporarily while the computer processes it. The computer performs billions of operations every second. A laptop with eight cores can process a lot of information fast.

Text is stored as ASCII codes, which are binary numbers. So, the binary number for 64 is 01000000, which the computer treats as the @ symbol when it handles text. 65, or 01000001, is the ASCII code for A. The hexadecimal numbers for those two examples are 40 and 41.

assembling
linking
Loading
Instruction set

The Instruction Set is the vocabulary of the Assembler. x86/x64 computers use a complex instruction set computer (CISC) architecture. Here are a few examples of 32-bit x86 general purpose registers:

  • Register – Name – Function
  • EAX – Accumulator – Arithmetic operations
  • ECX – Counter – Loop counter and shift/rotate counter
  • EDX – Data – Arithmetic and I/O operations
  • EBX – Base – Pointer to data
  • ESP – Stack pointer – Pointer to the top of the stack
  • EBP – Base pointer – Pointer to the base of the stack within a function
  • ESI – Source index – Pointer to the source location within array operations
  • EDI – Destination index – Pointer to the destination location in array operations

Source: Modern Computer Architecture and Organization, Jim Ledin, p 252, 2020

In assembly source text, the name EAX is stored as the three ASCII bytes 01000101 01000001 01011000. The assembler replaces the name with a short numeric register code in the machine code. The numbers flow through the CPU at billions of bits per second.

There is an instruction set that is unique for each processor architecture. The brand of the computer does not matter: an HP laptop and a Dell laptop that both use x86-64 processors share the same assembly language, while a machine with an ARM processor uses a different one. The compiler translates the C programming language into the correct assembly language for the machine it is targeting.

The Assembler then sends the program to the Linker, which links all the pieces of the program together into one executable file. For example, the libraries behind the header files are often large files with functions the current program needs to run. This way you don’t have to keep reinventing the wheel every time you need a function.

Processes and Threads

x86/x64
CISC/RISC

32 bit ARM
64 bit ARM

RISC V

Instruction Set Architecture (ISA)

Motherboard
UEFI
Assembly
Compiler
C/C++

#include <stdio.h>

int main(void)
{
    /* Print a greeting followed by a newline. */
    printf("Hello, world!\n");
    return 0;
}

#include <stdio.h> is a preprocessor directive that brings the stdio (standard input/output) header file into the program. That header declares printf.

Write your program in C. A compiler such as Clang transforms the program from C into LLVM IR, and then from LLVM IR into assembly.

Desktop
Mobile
Network