What happens when you type ls -l in the shell

Photo by Marc Rentschler on Unsplash

By way of introduction, here are a few reminders.

Figure: conceptual diagram of the Linux architecture

· The kernel is the core of your computer’s operating system.

· The Shell is a command-line interpreter: it allows a user to communicate with the kernel by entering commands on the command line.

The command ‘ls’ lists the files and directories in the current directory (entries whose names start with a dot are hidden by default).

There are many options that can be used with ‘ls’. The command ‘man ls’ shows the manual page of ‘ls’. The flag ‘-l’ means “use a long listing format”.

So the full command ‘ls -l’ displays the files and directories in the current working directory, along with their permissions, link count, owner, group, size, and last modification date and time.

When you type ‘ls -l’ and hit <Enter>, the Shell prints one line per file or directory with the details listed above.

Are we done? Is this it then?

No, the fun is only beginning. Let us walk you through what actually happens when you type ‘ls -l’.

In a nutshell, the Shell can be seen as an infinite loop that ends when the user types ‘exit’ or presses Ctrl-D, which signals the end of input (Ctrl-C won’t force the Shell to exit).

That loop is made of three sequential steps: read, parse, and execute.

READ

When you open the terminal, you’ll see the Shell’s prompt, typically ‘$’ for a regular user. The prompt string is stored in the PS1 shell variable (written $PS1 when you reference it).

With the prompt, the Shell indicates that it is waiting for a command. To receive what the user types, the Shell asks the kernel for the input through a system call (read, in this case). System calls are the way for the Shell, like any other program, to communicate with the kernel.

The input command has to be stored in a buffer as a string, which means the Shell needs memory for it. That’s where the function getline comes in handy: it handles the memory management, calling malloc and realloc as needed, provided that you free the allocated buffer once you are done with it.

As you can see, in the command ‘ls -l’, there is a command name ‘ls’, followed by a space, then a flag ‘-l’.

The getline function stores the input command into an array of characters.

Note the ‘\n’ at the end of the command, which we’ll need to handle as well.
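To make the READ step concrete, here is a minimal sketch in C. The helper name read_command is our own invention for illustration, not part of any library; it only assumes a POSIX system where getline is available.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read one line from `in` and strip the trailing '\n'.
 * Returns NULL on end-of-file (for example Ctrl-D) or on a read error.
 * The caller owns the returned buffer and must free() it. */
char *read_command(FILE *in)
{
    char *line = NULL;   /* getline() allocates the buffer for us */
    size_t cap = 0;      /* ...and records its capacity here */
    ssize_t len = getline(&line, &cap, in);

    if (len == -1) {
        free(line);      /* getline() may allocate even when it fails */
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';  /* cut the string at the first '\n' */
    return line;
}
```

Called as read_command(stdin), it would hand back the string ‘ls -l’ without the newline, ready for the parsing step.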


PARSE

In order to get the command ready to run, we need to split it into tokens by removing the delimiters, such as white spaces and ‘\n’.

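The strtok function overwrites the delimiter that ends each token with a null (‘\0’) character and returns a pointer to that token. Each call returns one token; calling it repeatedly (passing NULL after the first call) walks through the whole line. Tokens are basically just individual pieces of a line.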


Unlike getline, strtok allocates nothing. You are in charge of the memory management: allocating an array of char pointers to hold each token, growing it if needed, and last but not least, freeing it when it is no longer needed.
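A possible way to write this parsing step is sketched below. The function name split_line and the starting capacity of 4 are arbitrary choices for the example; the only library calls are strtok, malloc, realloc and free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on spaces, tabs and newlines.
 * Returns a malloc'd, NULL-terminated array of pointers into `line`
 * (the layout execve() expects for its argument vector), or NULL if an
 * allocation fails. The caller frees the array; the tokens themselves
 * live inside `line`. */
char **split_line(char *line, size_t *count)
{
    const char *delims = " \t\n";
    size_t cap = 4, n = 0;
    char **tokens = malloc(cap * sizeof *tokens);

    if (tokens == NULL)
        return NULL;

    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        if (n + 1 == cap) {              /* keep one slot for the final NULL */
            cap *= 2;
            char **bigger = realloc(tokens, cap * sizeof *tokens);
            if (bigger == NULL) {
                free(tokens);
                return NULL;
            }
            tokens = bigger;
        }
        tokens[n++] = tok;
    }
    tokens[n] = NULL;
    if (count != NULL)
        *count = n;
    return tokens;
}
```

For the input ‘ls -l\n’ this yields the two tokens ‘ls’ and ‘-l’ followed by a NULL pointer. Because the tokens point into the original buffer, only the array itself has to be freed.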

The Shell will focus on the first token, ‘ls’. This token is considered to be a command name.

During this phase, ‘ls’ gets checked against aliases, built-ins and the directories listed in the $PATH environment variable. If a match is found in any of these places, the command is passed immediately to the execution phase.

Aliases

Any name can be an alias for another command, in which case the Shell replaces the alias with the full command.

Regarding ‘ls’, we assume that there are no aliases.

There are two types of commands: built-ins and external executables.

Built-in

Built-in commands are implemented inside the Shell itself, as functions compiled into the Shell’s executable file. Since the command is part of the executable, there is no need to search for it on the $PATH.

The Shell starts by checking whether a command is a built-in; if so, it runs it directly. Since ‘ls’ isn’t a built-in, the Shell will look for it on the $PATH.
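The built-in check can be pictured as a simple table lookup. The sketch below is only an illustration: the table holds three names that are built-ins in common shells, whereas a real Shell has many more and maps each name to a function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny built-in table for illustration. */
static const char *const builtins[] = { "cd", "exit", "alias", NULL };

/* Return 1 if `name` is in the built-in table, 0 otherwise. */
int is_builtin(const char *name)
{
    for (size_t i = 0; builtins[i] != NULL; i++)
        if (strcmp(name, builtins[i]) == 0)
            return 1;
    return 0;
}
```

With this table, is_builtin("exit") is true while is_builtin("ls") is false, which is why the search continues on the $PATH.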

Environment

Before we carry on with the $PATH, let’s take a step back to think about how programs are executed.

In Unix-based operating systems a process is created every time a program is executed. Each process has a unique process identification number (PID) and its own environment.

An environment contains variables (NAME=Value) that affect how the user experiences the Shell. You can view your environment by typing ‘env’ and pressing <Enter>.


Some environment variables are set by the system when it is initialized, some come from the user, and some may be set by another program.
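In C, a process receives its environment as a NULL-terminated array of ‘NAME=value’ strings. The sketch below shows how a variable could be looked up in such an array; env_lookup is a made-up name, and the standard library function getenv already does this job for the current process.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up `name` in a NULL-terminated array of
 * "NAME=value" strings. Returns a pointer to the value (the text after
 * the '='), or NULL if the variable is not present. */
const char *env_lookup(char *const envp[], const char *name)
{
    size_t len = strlen(name);

    for (size_t i = 0; envp[i] != NULL; i++) {
        /* The entry matches only if NAME is followed directly by '='. */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

Checking for the ‘=’ right after the name matters: without it, a lookup of ‘PA’ would wrongly match the entry for ‘PATH’.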

$PATH & Executable

$PATH is an environment variable, which contains a colon-separated list of directories.

Each environment entry has the form NAME=value, so the entry for PATH is split at the first ‘=’ to separate the name from the value. Once the value of $PATH is identified, it is tokenized into individual directories using ‘:’ as a delimiter.

The Shell searches each directory in $PATH, using the system call stat, for an executable file named ‘ls’.

It should find the ‘ls’ binary executable file in the ‘/bin’ directory (‘/usr/bin’ on some systems), and will return the absolute path, here ‘/bin/ls’.

If the command is not found in the directories in $PATH, the Shell will display an error message.
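The search described above could look roughly like this. It is a sketch, not how any particular Shell does it: find_in_path is a hypothetical name, the colon-separated value is walked by hand with strcspn rather than strtok so that it is not modified, and the permission test only looks at the execute bits reported by stat.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: search each directory of `path_value` (a
 * colon-separated list such as "/usr/local/bin:/usr/bin:/bin") for an
 * executable regular file named `cmd`.
 * Returns a malloc'd "dir/cmd" string, or NULL if nothing matches. */
char *find_in_path(const char *cmd, const char *path_value)
{
    const char *dir = path_value;

    while (dir != NULL) {
        size_t dir_len = strcspn(dir, ":");       /* length up to ':' or end */
        size_t size = dir_len + strlen(cmd) + 3;  /* room for ".", '/', '\0' */
        char *candidate = malloc(size);
        struct stat st;

        if (candidate == NULL)
            return NULL;
        if (dir_len == 0)                         /* empty entry means "." */
            snprintf(candidate, size, "./%s", cmd);
        else
            snprintf(candidate, size, "%.*s/%s", (int)dir_len, dir, cmd);

        /* stat() succeeds only if the file exists; then check type and mode. */
        if (stat(candidate, &st) == 0 && S_ISREG(st.st_mode) &&
            (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)))
            return candidate;

        free(candidate);
        dir = (dir[dir_len] == ':') ? dir + dir_len + 1 : NULL;
    }
    return NULL;
}
```

The first directory that contains a match wins, which is why the order of the directories in $PATH matters. The caller frees the returned string after use.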


EXECUTE

Now we’re getting to the heart of what the Shell does.

The system call fork splits a process into two. Forking duplicates the calling process, the Shell, and the new process receives its own unique process ID. The new process is known as the “child”, and the original is the “parent”.

Once fork successfully returns, the two processes continue to run the same program, but each with its own copy of the stack, data and heap.

The wait system call suspends execution of the calling parent process until the child process terminates.


In order to actually execute the command ‘ls’ with its parameters the Shell needs to use another system call called ‘execve’ within the child process. The system call ‘execve’ allows a process to execute another program.

The system call ‘execve’ accepts an environment as its third argument. When the Shell passes its own environment there, the child process inherits its parent’s environment variables.

The reason the command is executed by a child process is that ‘execve’ overwrites the calling process with the new program. If the parent process called ‘execve’ itself, the Shell would be replaced by ‘ls’ and would be gone once the command completed.

The results of the command ‘ls -l’ are written to standard output.

Upon success the child process terminates and the flow of the program returns to the parent process.
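The fork, execve and wait sequence described in this section can be sketched as follows. The function name run_command and the exit status 127 for a failed execve are choices made for this example (127 is the status shells conventionally report for a command that could not be run), and waitpid is used as the variant of wait that targets one specific child.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at argv[0] in a child process and
 * wait for it. Returns the child's exit status, or -1 if fork/waitpid
 * fails or the child was killed by a signal. */
int run_command(char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    int status;

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(argv[0], argv, envp);
        perror(argv[0]);   /* reached only if execve failed */
        _exit(127);
    }
    /* Parent: sleep until this particular child terminates. */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv set to { "/bin/ls", "-l", NULL }, the child turns into ‘ls’ and writes its listing to standard output while the parent, still the Shell, waits for it to finish.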

FREE AT LAST

Last but not least, there is a fourth step: Release.

Don’t forget to free some of the resources previously allocated, before getting back to the beginning of the infinite loop.

A command prompt is printed again, the Shell waits for another command to be typed on the command line, and the process starts all over again.

When asked to leave the infinite loop for good (remember: when the user types ‘exit’ or presses Ctrl-D), the Shell releases all remaining resources.
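Putting the steps together, the whole loop might be sketched like this. Several simplifications are assumed to keep it short: shell_loop and MAX_ARGS are invented for the example, the $PATH search is skipped so commands must be typed as absolute paths (for instance ‘/bin/ls -l’), and ‘exit’ is the only built-in.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 64   /* hypothetical fixed limit, to keep the sketch short */

/* Read, parse and execute commands from `in` until "exit" or end-of-file.
 * Returns the number of commands that were launched. */
int shell_loop(FILE *in, char *const envp[])
{
    char *line = NULL;       /* one buffer, reused by getline() every turn */
    size_t cap = 0;
    int launched = 0;

    for (;;) {
        char *args[MAX_ARGS];
        size_t n = 0;

        if (isatty(fileno(in))) {            /* prompt only on a terminal */
            fputs("$ ", stdout);
            fflush(stdout);
        }
        if (getline(&line, &cap, in) == -1)  /* READ: stops on Ctrl-D / EOF */
            break;

        /* PARSE: cut the line into tokens in place. */
        for (char *t = strtok(line, " \t\n"); t != NULL && n < MAX_ARGS - 1;
             t = strtok(NULL, " \t\n"))
            args[n++] = t;
        args[n] = NULL;

        if (n == 0)
            continue;                        /* empty line: nothing to run */
        if (strcmp(args[0], "exit") == 0)
            break;

        /* EXECUTE: the child becomes the command, the parent waits. */
        pid_t pid = fork();
        if (pid == 0) {
            execve(args[0], args, envp);
            perror(args[0]);                 /* reached only if execve failed */
            _exit(127);
        }
        if (pid > 0) {
            waitpid(pid, NULL, 0);
            launched++;
        } else {
            perror("fork");
        }
        /* Nothing to free per turn here: the tokens point into `line`,
         * and `line` is reused by the next getline() call. */
    }
    free(line);   /* RELEASE: the one allocation made by getline() */
    return launched;
}
```

A program would call shell_loop(stdin, environ) from main. Reusing a single getline buffer for every turn means there is exactly one allocation to release when the loop ends.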

See you soon for the next adventures in programming.

Co-author: Yago Martinez-Falero Hein.

Software engineering student at Holberton School. Reach me @huyxuanminh
