Multi-thread and multi-process

Yuta Fujii
Feb 9, 2019

So often we have to think about performance, and sometimes it makes me want to give up on the whole task. There are, of course, ways to improve your app's performance, including multi-threading and multi-processing. But what is the difference between them?

Keywords

  • heap
  • stack
  • registers
  • thread-safety
  • concurrency
  • parallelism

What are processes and threads?

How does a computer execute your commands? Whatever language you choose, your code is eventually translated into machine code: binary instructions (0s and 1s) that the CPU can execute. For compiled languages, this translation is called compiling, and its output is a binary file.

What’s next? The computer runs the binary code using several resources:

  • Memory: storage for the program the CPU is currently handling (located outside the CPU)
  • Registers: small, fast storage inside the CPU that holds the data being worked on and moves it from one component to another
  • Stack: a linear LIFO data structure holding temporary information, e.g. local variables and return addresses
  • Program counter: a register holding the address of the next instruction to execute
  • Heap: memory allocated to a process dynamically at run time

A process is an instance of a running program: its binary code together with the resources above.

A thread is a component of a process. It is a unit of execution with its own program counter, stack, and set of registers.

https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/

Difference between process and thread

The main difference between them is whether they share memory. Processes do not share memory with each other, so each one operates independently. Threads, on the other hand, share the memory allocated to the process they belong to.
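A small Ruby sketch can make this visible. It assumes a Unix-like OS, since fork is not available on Windows. A forked child gets its own copy of the parent's memory, so its change is invisible to the parent, while a thread modifies the very same variable.

```ruby
counter = 0

# Child process: works on a *copy* of the parent's memory.
pid = fork do
  counter += 1 # changes the child's copy only
  exit!(0)     # leave the child immediately, skipping at_exit hooks
end
Process.wait(pid)
after_fork = counter # still 0 in the parent

# Thread: runs inside the same process and shares its memory.
Thread.new { counter += 1 }.join
after_thread = counter # now 1

puts "after fork:   #{after_fork}"   # prints "after fork:   0"
puts "after thread: #{after_thread}" # prints "after thread: 1"
```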

Multi-process vs Multi-thread

So which should you choose for better performance? The answer depends on what your app does.

For example, Google Chrome adopted a multi-process architecture. A user may open several pages that have nothing to do with each other, and some of them may malfunction; running them in separate processes keeps one page's failure from affecting the others. Thanks to this architecture, we can keep watching a YouTube video even when another page crashes.
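The same isolation idea can be sketched in Ruby. This is not how Chrome is implemented; the "pages" are hypothetical lambdas, and a crash is simulated by a child exiting with a non-zero status. The parent keeps running and simply records which child failed (Unix-like OS assumed).

```ruby
# Hypothetical "pages": each lambda stands in for one tab's work.
pages = {
  "video"  => -> { 2 + 2 },
  "broken" => -> { raise "page crashed" }
}

# Start one child process per page; map pid => page name.
pids = pages.map do |name, render|
  pid = fork do
    begin
      render.call
      exit!(0) # finished normally
    rescue StandardError
      exit!(1) # simulate a crash with a non-zero exit status
    end
  end
  [pid, name]
end.to_h

# The parent survives and collects each child's exit status.
results = pids.map { |pid, name| [name, Process.wait2(pid).last.success?] }.to_h

results.each { |name, ok| puts "#{name}: #{ok ? 'ok' : 'crashed'}" }
puts "parent still running"
```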

Processes and threads each have pros and cons. Eqbal Quran summarizes them clearly here:

https://www.toptal.com/ruby/ruby-concurrency-and-parallelism-a-practical-primer

Practical application

If you want to use a multi-process architecture in Ruby, simply wrap your task in a fork block.

# in your app file
fork do
  sometask # your own work; runs in a child process
end
Process.waitall # wait for all child processes to finish
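Because child processes do not share memory, getting results back to the parent needs some form of inter-process communication. A minimal sketch using one IO.pipe per child and Marshal for serialization might look like this; square is a hypothetical stand-in for your real task, and a Unix-like OS is assumed.

```ruby
# Hypothetical unit of work; replace with your own task.
def square(n)
  n * n
end

inputs = [1, 2, 3, 4]

# One child process and one pipe per input.
readers = inputs.map do |n|
  reader, writer = IO.pipe
  fork do
    reader.close
    writer.write(Marshal.dump(square(n))) # serialize the result for the parent
    writer.close
    exit!(0)
  end
  writer.close # the parent only reads from this pipe
  reader
end

# Read every result (read blocks until that child closes its end), then reap the children.
results = readers.map do |reader|
  value = Marshal.load(reader.read)
  reader.close
  value
end
Process.waitall

p results # prints [1, 4, 9, 16]
```

Reading before calling Process.waitall matters: if a child wrote more data than the pipe buffer holds, waiting first could deadlock.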

For multi-threading, I recommend using a gem that implements it for you.
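Before reaching for a gem, it may help to see what Ruby's built-in Thread class does on its own. In this sketch, fetch is a hypothetical slow task where sleep stands in for waiting on an HTTP response; the three calls overlap instead of running one after another.

```ruby
# Hypothetical slow task; sleep stands in for waiting on an HTTP response.
def fetch(keyword)
  sleep 0.1
  "result for #{keyword}"
end

keywords = %w[apple banana cherry]

threads = keywords.map { |kw| Thread.new { fetch(kw) } }
results = threads.map(&:value) # Thread#value joins the thread and returns its block's result

puts results
```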

Thread safety

As mentioned, threads share memory, which can cause unexpected results when several threads change the same data. Making sure such problems cannot occur is called thread safety.
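The classic example is a shared counter. counter += 1 is a read-modify-write, so without a lock two threads may read the same old value and one update gets lost (in CRuby the global VM lock makes this rarer, but it is not a guarantee). A sketch using Ruby's Mutex:

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1000.times do
      # Only one thread at a time may run this block, so no update is lost.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter # prints 10000
```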

For example, you need to use thread-safe data types in multi-threaded code (the Queue class in Ruby).
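A minimal producer/consumer sketch with Queue: several worker threads pop jobs and push results without any explicit locking, because Queue handles synchronization internally. The :done symbol is just a stop signal chosen for this example.

```ruby
jobs    = Queue.new # thread-safe: push/pop can be called from any thread
results = Queue.new

workers = 3.times.map do
  Thread.new do
    # Queue#pop blocks until an item is available.
    while (n = jobs.pop) != :done
      results << n * 2
    end
  end
end

(1..5).each { |n| jobs << n }
workers.size.times { jobs << :done } # one stop signal per worker
workers.each(&:join)

doubled = Array.new(results.size) { results.pop }.sort
p doubled # prints [2, 4, 6, 8, 10]
```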

Thread-pooling

Threads are not unlimited. If you create them without any limit, you’ll eventually get an error like:

can't create Thread: Resource temporarily unavailable (ThreadError)

Thread pooling creates a fixed number of threads up front and reuses them for incoming tasks, so the number of threads never grows past that limit and this error does not occur. I believe most external gems handle this for you.
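To show the idea, here is a minimal fixed-size pool built on Queue. TinyPool is a hypothetical name for this sketch, not a library class; for real projects, gems such as concurrent-ruby provide ready-made pools (for example Concurrent::FixedThreadPool). Only 4 threads ever exist, no matter how many tasks are posted.

```ruby
# A minimal fixed-size thread pool (a sketch, not a production pool).
class TinyPool
  def initialize(size)
    @jobs = Queue.new
    # Create all worker threads once; they are reused for every job.
    @workers = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  def post(&block)
    @jobs << block
  end

  # Send one stop signal per worker, then wait for them to finish.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

pool = TinyPool.new(4)
results = Queue.new
100.times { |i| pool.post { results << i * i } }
pool.shutdown

squares = Array.new(results.size) { results.pop }
puts squares.size # prints 100
```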

Concurrency vs parallelism

There are two related concepts: concurrency and parallelism.

  • Parallelism: two tasks literally run at the same time
  • Concurrency: two tasks can be ‘in progress’ at the same time

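Concurrency is a little trickier. Concurrent tasks overlap in time, but they may simply take turns on a single CPU core, so at any given instant only one of them may actually be running. Parallelism requires multiple cores working at once.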
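In CRuby this distinction is easy to observe: because of the global VM lock, threads running plain Ruby code are concurrent but not parallel, while forked processes can use several cores at once. The sketch below times a hypothetical CPU-bound task both ways. It prints timings only, since the numbers depend on your machine; on a multi-core Unix-like system you would typically expect the process version to finish faster (JRuby or TruffleRuby may behave differently).

```ruby
require "benchmark"

# Hypothetical CPU-bound task: sum of squares.
def busy(n)
  (1..n).sum { |i| i * i }
end

N = 1_000_000
WORKERS = 4

# Threads: concurrent, but CRuby runs only one at a time for Ruby code.
thread_time = Benchmark.realtime do
  WORKERS.times.map { Thread.new { busy(N) } }.each(&:join)
end

# Processes: each has its own interpreter, so they can run in parallel.
process_time = Benchmark.realtime do
  WORKERS.times { fork { busy(N); exit!(0) } }
  Process.waitall
end

puts format("threads:   %.2fs", thread_time)
puts format("processes: %.2fs", process_time)
```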

https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/

A real situation…

Don’t feel safe just because you improved performance on your local machine: configurations differ between your local (development) environment and production.

If you run your product on your own physical server, you have to configure it according to that production server's OS.

And if you’re using a cloud service like AWS, GCP, Azure or Heroku, read their docs, configure accordingly, and test whether the external libraries you use for multi-thread/multi-process execution work well there. Each IaaS and PaaS vendor provides its own operating systems (for instance, AWS offers Amazon Linux, which is not exactly the same as other Linux distributions).

Likewise, if you’re developing in a virtualized environment (Docker, VMware, Vagrant, etc.), you have to follow that environment's OS, not your laptop's.
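One way to avoid hard-coding assumptions about the machine is to ask Ruby at run time. This sketch reports the OS, the number of CPU cores, and whether fork is supported (it is not on Windows). Note that inside a container, Etc.nprocessors may report the host's cores rather than the container's CPU limit, so treat it as an upper bound.

```ruby
require "etc"
require "rbconfig"

os       = RbConfig::CONFIG["host_os"]  # e.g. a linux or darwin string
cores    = Etc.nprocessors              # CPUs visible to this process
can_fork = Process.respond_to?(:fork)   # false where fork is not implemented

puts "OS: #{os}, cores: #{cores}, fork available: #{can_fork}"
```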

Demo: Google Custom Search API

Here is sample code to compare the performance.

I used the Google Custom Search API to get 5 image paths for each food. I first generated 100 random food names and then used them one by one as the keyword when sending HTTP requests.
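The original demo code is not reproduced here, so the following is only a sketch of what one request could look like, not the author's code. It assumes the Custom Search JSON API endpoint with its key, cx, q, searchType and num parameters, and reads credentials from environment variables whose names (GOOGLE_API_KEY, GOOGLE_CX) are hypothetical. Requests are only sent when both variables are set.

```ruby
require "json"
require "net/http"
require "uri"

# Build the request URI; credentials come from the environment, never from source code.
def search_uri(keyword, api_key: ENV.fetch("GOOGLE_API_KEY"), cx: ENV.fetch("GOOGLE_CX"))
  uri = URI("https://www.googleapis.com/customsearch/v1")
  uri.query = URI.encode_www_form(key: api_key, cx: cx, q: keyword, searchType: "image", num: 5)
  uri
end

# Return the image links found for one keyword.
def image_links(keyword)
  body = Net::HTTP.get(search_uri(keyword))
  JSON.parse(body).fetch("items", []).map { |item| item["link"] }
end

if ENV["GOOGLE_API_KEY"] && ENV["GOOGLE_CX"]
  %w[sushi ramen].each do |food|
    p image_links(food)
    sleep 1 # be polite: space out requests
  end
end
```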

NOTE: Be careful when sending HTTP requests: too many requests in a short time can amount to an attack on the target site.

  • Add sleep(1) between requests when you send them to a non-API endpoint.
  • DON’T push your code to GitHub with your credentials hard-coded.

And here are the results:

$ ruby base.rb
# 2.100000 0.170000 2.270000 ( 51.480358)
$ ruby multi_process.rb
# 0.010000 0.050000 10.330000 ( 2.972397)
$ ruby multi_thread.rb
# 1.640000 0.140000 1.780000 ( 4.273130)

…just awesome!
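The output format looks like that of Ruby's Benchmark.measure: user CPU time, system CPU time, total CPU time, and the real (wall-clock) time in parentheses. The total also counts CPU time used by finished child processes, which is likely why multi_process.rb shows a total of 10.33 while its user time is only 0.01. If you want to reproduce the comparison without an API key, this sketch swaps the HTTP request for a hypothetical fake_request that just sleeps:

```ruby
require "benchmark"

# Hypothetical stand-in for one HTTP request: sleep simulates waiting on the network.
def fake_request(_keyword)
  sleep 0.05
end

keywords = (1..20).map { |i| "food#{i}" }

sequential = Benchmark.measure { keywords.each { |k| fake_request(k) } }

forked = Benchmark.measure do
  keywords.each { |k| fork { fake_request(k); exit!(0) } }
  Process.waitall
end

threaded = Benchmark.measure do
  keywords.map { |k| Thread.new { fake_request(k) } }.each(&:join)
end

puts Benchmark::CAPTION
puts sequential, forked, threaded
```

Because the work here is mostly waiting, both threads and processes help; the CPU-bound case is where CRuby threads fall behind.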

I used a multi-threading architecture in my pet project [GitHub source].


Yuta Fujii

Web developer, data analyst, product manager. Ex-investment banker (structured finance). Learn or Die.