Installing FastChat on Ubuntu 22.04 using Docker

Image: a llama with sunglasses next to a computer, generated using Stable Diffusion (Steps: 48, Sampler: LMS Karras, CFG scale: 6, Seed: 1937289325, Face restoration: CodeFormer, Size: 512x512, Model hash: 4711ff4dd2, Model: v2-1_768-nonema-pruned, Version: v1.2.1)

In this blog post, we will walk you through the process of installing FastChat on Ubuntu using Docker. FastChat "is an open platform for training, serving, and evaluating large language model based chatbots".

To follow along with this article, you will need an Ubuntu 22.04 system with 16 GiB of system RAM and an NVIDIA GPU with 12 GiB of GPU RAM (mine is a GeForce RTX 3060). You will also need access to a terminal to install and configure Docker and to interact with FastChat.
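The requirements above can be checked from a terminal before starting. The following is a minimal sketch and not part of FastChat or Docker: the function names `mem_total_gib` and `os_id_version` are made up for illustration, and the GPU query only produces output once the NVIDIA driver from the steps below is installed.

```shell
#!/usr/bin/env bash
# Illustrative helpers; the function names are hypothetical.

# Print total system RAM in GiB, read from a meminfo-style file.
mem_total_gib() {
    local file="${1:-/proc/meminfo}"
    # MemTotal is reported in kB (KiB); 1 GiB = 1048576 KiB.
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "$file"
}

# Print distribution ID and version (e.g. "ubuntu22.04") from an os-release-style file.
os_id_version() {
    local file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into this shell.
    ( . "$file" && echo "${ID}${VERSION_ID}" )
}

if [ -r /etc/os-release ]; then
    echo "OS:  $(os_id_version)"
fi
if [ -r /proc/meminfo ]; then
    echo "RAM: $(mem_total_gib) GiB"
fi
# nvidia-smi ships with the NVIDIA driver packages, so it may not exist yet.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
fi
```

The reported RAM is usually slightly below the nominal size, because the kernel reserves some memory for itself, so a 16 GiB machine will typically show a value a little under 16.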

Here are the steps:

  1. Install Docker (you need Docker 19.03 or higher; at the time of writing it was Docker 22.04) and add your own user to the Docker group:
    sudo apt install && sudo usermod -aG docker $USER
  2. Install the Nvidia drivers and compute API packages. I used driver version 525, so my packages are:
    • nvidia-compute-utils-525
    • nvidia-dkms-525
    • nvidia-driver-525
    • nvidia-kernel-common-525
    • nvidia-kernel-source-525
    • nvidia-prime
    • nvidia-settings
    • nvidia-utils-525
  3. Install the Nvidia container toolkit (CTK), which will allow the app in the container to utilize the GPU:
    • Add the Nvidia package repository and GPG key:
      distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -fsSL | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    • Update the package cache and install the CTK:
      sudo apt update && sudo apt install nvidia-container-toolkit
    • Let the CTK configure Docker to integrate its runtime:
      sudo nvidia-ctk runtime configure --runtime=docker
    • Restart the Docker daemon:
      sudo systemctl restart docker
  4. Log out and back in to apply the group change, so that you can use Docker from your non-root user account.
  5. (optional) Check that the Nvidia integration and Docker itself work by running:
    docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
  6. In a new folder, create the following files, content follows below:
    • .dockerignore
    • Dockerfile
    • Makefile

.dockerignore should contain:


Dockerfile should contain:

FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
RUN /opt/conda/bin/pip install fschat
CMD python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0

Makefile should contain:

.PHONY: all build run debug clean

NAME = fastchat
IMAGE = yourusername/$(NAME)

all: build run

build:
	docker build -t $(IMAGE) .

run:
	docker run -ti --rm --init --runtime=nvidia --gpus all --ipc=host --name $(NAME) --user $$(id -u):$$(id -g) --volume $$(pwd):/workspace --read-only $(IMAGE)

debug:
	docker run -ti --rm --init --runtime=nvidia --gpus all --ipc=host --name $(NAME) --user $$(id -u):$$(id -g) --volume $$(pwd):/workspace --read-only $(IMAGE) /bin/bash

clean:
	docker stop $(NAME)
	docker rm $(NAME)
  7. Install FastChat into a container image and run the application with the default model by running:
    make all
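If `make all` fails, two things from the earlier steps are worth re-checking: whether the group change from step 4 is active in the current session, and whether the Docker daemon is recent enough for GPU support. The following is a rough sketch under stated assumptions: the helper names `version_ge` and `has_group` are hypothetical, the comparison relies on GNU `sort -V`, and the minimum version passed in is an assumed value that should match the requirement in step 1.

```shell
#!/usr/bin/env bash
# Illustrative helpers; the function names are hypothetical.

# Succeed if version $1 is greater than or equal to version $2 (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the space-separated group list in $1 contains the group named $2.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Step 4: the docker group only becomes active after logging out and back in.
if has_group "$(id -nG)" docker; then
    echo "user is in the docker group"
else
    echo "user is not in the docker group yet; log out and back in"
fi

# Step 1: compare the daemon version against a minimum (19.03 is an assumed value).
if command -v docker >/dev/null 2>&1; then
    v=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
    if [ -n "$v" ] && version_ge "$v" "19.03"; then
        echo "Docker $v is new enough"
    else
        echo "Docker version could not be confirmed"
    fi
fi
```

Both checks only read state and change nothing, so they are safe to run repeatedly while working through the steps.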

The first time you launch FastChat, the default model is downloaded; subsequent launches re-use that model from the current directory. You can end your session by pressing [Ctrl]+[D], as in other CLI applications.

Conversations are not stored between sessions, so asking the model for its name or about the last conversation will give varying results, as could be expected.

Training or re-training your model or launching the WebUI is beyond the scope of this article.

This article was produced using the installed FastChat application and feeding it the blog author's notes. Most blocks of prose were re-written by FastChat; the commands are all based on the author's research, and the author re-arranged the FastChat-generated output to their liking.
