If you transcribe audio, interviews, meetings, podcasts, or lectures and have ever wondered "is there a simpler and cheaper way to do this?", the answer is yes. And best of all: you can run it on your own laptop, without sending any file to the cloud.

1. Who this guide is for

You are a journalist, researcher, teacher, lawyer, administrative assistant, psychologist, content creator, graduate student… in short, you deal with many hours of audio each month, but you are not a programmer. You may have even tried online tools and discovered that:

  • They have file size limits (and that 2-hour meeting does not fit).
  • They charge per transcribed minute (and by the end of the month the bill hurts).
  • They do not guarantee privacy (your client’s audio goes to a third-party server).
  • Or they simply crash in the middle of the file.

This article is for you. I’ll show you, step by step, how to install and use OpenAI’s Whisper directly on your computer, even if it does not have a good graphics card, even if you have never opened the terminal in your life.

2. What Whisper is (in 30 seconds, no jargon)

Whisper is an artificial intelligence model created by OpenAI (the same people behind ChatGPT) that listens to audio and turns it into text. It handles Portuguese very well, including accents, slang, and technical terms, and it works offline: once it is installed and the model has been downloaded on the first run, it no longer needs the internet to transcribe.

It is free and open source. You do not pay anything to use it, there is no minute limit, and nobody is watching your files.

There are several ways to use Whisper. Here we will use the openai-whisper version in Python, which is the most stable and the easiest to automate.

3. What you will need (and probably already have)

Item | What it is | Minimum acceptable
Computer | Windows 10/11, macOS 11+ or Linux | Any of the last 6 years
RAM | Your PC’s "short-term memory" | 8 GB (16 GB recommended)
Disk space | Where Whisper will live | 5 GB free
Processor | The PC’s "brain" | 8th-gen Intel i5 / Ryzen 5 2000+ or Apple M1+
Graphics card (GPU) | Speeds things up (optional) | Not needed
Internet connection | Only for installation (one time) | Standard broadband
Python | The language we will use | Version 3.9 to 3.12

No GPU? Relax. Whisper runs 100% on the processor (CPU). It is slower than using a graphics card, but it works the same way — you just get to go have a coffee while it processes very large files.

4. Part 1 — Installing Python (no fear)

If you already have Python installed, skip to Part 2. If you do not, come with me.

On Windows

  1. Go to python.org/downloads.
  2. Click the big "Download Python 3.x.x" button.
  3. Important: when running the installer, check the "Add Python to PATH" box at the bottom. This is the part most people forget and then end up scratching their heads.
  4. Click Install Now and that’s it.

On macOS

Open the terminal (press Cmd + Space, type "terminal" and press Enter) and paste:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python

On Linux (Ubuntu/Debian)

sudo apt update
sudo apt install python3 python3-pip python3-venv

How to know if it worked

Open the terminal (on Windows, search for "cmd" or "PowerShell") and type:

python --version

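If something like Python 3.11.5 (or similar) appears, you are on the right track. ✅ On macOS and Linux the command is often python3 rather than python. If python is not found, try python3 --version, and use python3 in place of python when you create the environment in Part 2.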
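
If you would rather check from inside Python itself, the sketch below compares the running version against the 3.9 to 3.12 range from the table in section 3. The function name version_ok is only an illustrative choice; it is not part of Python or Whisper.

```python
import sys

# Range taken from the requirements table in this guide (3.9 to 3.12).
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def version_ok(version=None):
    """Return True if the (major, minor) version falls inside the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


status = "OK" if version_ok() else "outside the range in this guide"
print("Python", sys.version.split()[0], status)
```

Calling version_ok() with no argument checks the Python that is running the script; passing a tuple such as (3, 8, 10) lets you test any other version.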

5. Part 2 — Creating a safe little space for the project

This is a trick programmers use so they do not mix things up: we create an isolated folder with everything needed inside. That way, if something goes wrong, just delete the folder and start over.

mkdir whisper-local
cd whisper-local
python -m venv venv

Now activate the environment:

Windows (PowerShell)

.\venv\Scripts\Activate.ps1

If you get a red error about "script execution disabled", run this in PowerShell: Set-ExecutionPolicy -Scope CurrentUser RemoteSigned. Because it only changes the setting for your own user, it does not need administrator rights. Then close PowerShell and open it again.

Windows (cmd)

venv\Scripts\activate.bat

macOS / Linux

source venv/bin/activate

You will notice that (venv) appears at the beginning of the line. That is a good sign — it means you are "inside" the little workspace.
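
If you are ever unsure whether the environment is really active (for example, after reopening the terminal), Python can tell you. The following is a minimal sketch; in_virtualenv is a hypothetical helper name, but sys.prefix and sys.base_prefix are standard Python attributes.

```python
import sys


def in_virtualenv():
    """Return True when Python is running from inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the system-wide installation.
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print("Virtual environment active:", sys.prefix)
else:
    print("No virtual environment detected. Activate venv before installing anything.")
```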

6. Part 3 — Installing Whisper (the magical part)

With the environment activated, run:

pip install openai-whisper

A lot will start happening on the screen — downloads, installation, little compilations. That is normal. It may take 2 to 10 minutes depending on your internet connection.

Then install ffmpeg, the audio "decoder" Whisper uses behind the scenes:

Windows (with Chocolatey installed)

choco install ffmpeg

Without Chocolatey: download it from gyan.dev/ffmpeg/builds, extract it, and add the bin folder to the Windows PATH.

macOS

brew install ffmpeg

Linux (Ubuntu/Debian)

sudo apt install ffmpeg
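
Before moving on, it can help to confirm that both pieces are in place. The sketch below looks for the whisper module and for ffmpeg in the PATH without running either of them. The function name missing_requirements is an assumption made for this example.

```python
import importlib.util
import shutil


def missing_requirements(module="whisper", tool="ffmpeg"):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # find_spec looks the module up without actually importing it.
    if importlib.util.find_spec(module) is None:
        problems.append(f"Python module '{module}' is not installed in this environment")
    # shutil.which searches the PATH the same way the terminal does.
    if shutil.which(tool) is None:
        problems.append(f"'{tool}' was not found in the PATH")
    return problems


problems = missing_requirements()
for problem in problems:
    print("Missing:", problem)
if not problems:
    print("Whisper and ffmpeg were both found.")
```

If the first message appears, the environment is probably not activated; if the second appears, repeat the ffmpeg step for your system.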

7. Part 4 — Your first transcription

Create a file called transcrever.py inside the whisper-local folder (you can use Notepad, VS Code, or whatever you prefer) and paste this:

import whisper

# Load the model. The first time, it downloads from the internet (about 460 MB for "small").
# Options: tiny | base | small | medium | large
# tiny and base = faster, lower quality
# small and medium = ideal balance for CPU
# large = best quality, but requires a good machine
modelo = whisper.load_model("small")

# Put your file path here (mp3, m4a, wav, mp4, etc.)
arquivo = "minha_reuniao.mp3"

# The magic happens here
resultado = modelo.transcribe(arquivo, language="portuguese")

# Save into a text file
with open("transcricao.txt", "w", encoding="utf-8") as f:
    f.write(resultado["text"])

print("✅ Done! File saved as transcricao.txt")

How to use it:

  1. Put an audio file (e.g. minha_reuniao.mp3) inside the whisper-local folder.
  2. In the terminal (with (venv) still active), run: python transcrever.py
  3. Go have a coffee. ☕ For a 30-minute audio file, expect between 10 and 25 minutes (depending on your processor).
  4. When you come back, open the transcricao.txt file and see the magic.
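
To get a rough idea of the wait for longer recordings, you can scale the 30-minute figure above. This is only a back-of-the-envelope sketch: it assumes processing time grows in proportion to audio length, which is a simplification, and the numbers apply to the small model on CPU.

```python
# Reference point from this guide: 30 minutes of audio takes
# roughly 10 to 25 minutes on CPU with the small model.
REFERENCE_AUDIO_MIN = 30
REFERENCE_LOW_MIN = 10
REFERENCE_HIGH_MIN = 25


def estimate_minutes(audio_minutes):
    """Scale the reference range linearly (an assumption) to another duration."""
    factor = audio_minutes / REFERENCE_AUDIO_MIN
    return REFERENCE_LOW_MIN * factor, REFERENCE_HIGH_MIN * factor


low, high = estimate_minutes(120)
print(f"A 2-hour recording: roughly {low:.0f} to {high:.0f} minutes")
```

Larger models and older processors will push the real number above this range.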

8. Part 5 — Enhanced version (with timestamps and subtitles)

If you need to know at what minute each sentence was spoken (useful for subtitles, citing excerpts in a thesis, or finding a specific part of the meeting), use this version:

import whisper
from whisper.utils import get_writer

modelo = whisper.load_model("small")
resultado = modelo.transcribe(
    "minha_reuniao.mp3",
    language="portuguese",
    task="transcribe",
    verbose=True  # shows progress in real time
)

# Plain text
with open("transcricao.txt", "w", encoding="utf-8") as f:
    f.write(resultado["text"])

# .srt subtitle (perfect for videos)
escritor = get_writer("srt", ".")
escritor(resultado, "minha_reuniao.mp3")

print("✅ Text and .srt subtitle generated!")

Bonus: change "srt" to "vtt" if you need subtitles for the web, or to "tsv" if you want a spreadsheet with each sentence on a line (with start time, end time, and text).
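
For citing excerpts, a plain text file with a timestamp in front of each sentence is often more practical than a subtitle file. The result returned by transcribe includes a "segments" list in which each item has "start", "end" (in seconds) and "text". The sketch below formats that list; the helper names and the sample sentences are made up for illustration.

```python
def format_timestamp(seconds):
    """Convert seconds (float) into an HH:MM:SS string."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(segments):
    """Build one '[HH:MM:SS] text' line per segment."""
    return [f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments]


# Sample data in the same shape as resultado["segments"]; the text is made up.
sample = [
    {"start": 0.0, "end": 4.2, "text": " Good morning, everyone."},
    {"start": 65.5, "end": 70.0, "text": " Let's move to the second item."},
]
print("\n".join(timestamped_lines(sample)))
```

In the real script you would pass resultado["segments"] instead of sample and write the joined lines to a file.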

9. Practical tips (the part that saves hours)

9.1. Choosing the right model

Model | Size | RAM required | CPU speed | Quality
tiny | 75 MB | ~1 GB | ⚡⚡⚡⚡⚡ | ⭐⭐
base | 140 MB | ~1 GB | ⚡⚡⚡⚡ | ⭐⭐⭐
small | 460 MB | ~2 GB | ⚡⚡⚡ | ⭐⭐⭐⭐
medium | 1.5 GB | ~5 GB | ⚡⚡ | ⭐⭐⭐⭐⭐
large | 3 GB | ~10 GB | ⚡ | ⭐⭐⭐⭐⭐

My honest recommendation for beginners: go with small. It is the best balance between speed and quality on ordinary machines. If you notice it is missing too many proper names or technical terms, move up to medium.
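
If you want a quick rule of thumb based on your machine, the sketch below picks the largest model whose memory need fits in part of your RAM. The memory figures come from the table above; the idea of reserving half of the RAM for the operating system and other programs (usable_fraction=0.5) is an assumption of this example, not an official recommendation.

```python
# Approximate memory needs copied from the table above, smallest to largest.
MODEL_RAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_model(total_ram_gb, usable_fraction=0.5):
    """Return the largest model that fits in the usable share of RAM, or None."""
    budget = total_ram_gb * usable_fraction
    choice = None
    for name, needed_gb in MODEL_RAM_GB:
        if needed_gb <= budget:
            choice = name
    return choice


for ram in (4, 8, 16, 32):
    print(f"{ram} GB of RAM -> {pick_model(ram)}")
```

Memory is only half of the story: a model that fits may still be too slow on CPU, so treat the result as an upper limit and start with small.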

9.2. Transcribing multiple files at once

import whisper
from pathlib import Path

modelo = whisper.load_model("small")
pasta = Path("./audios")  # put your files in this folder

for arquivo in pasta.glob("*"):
    if arquivo.suffix.lower() in [".mp3", ".wav", ".m4a", ".mp4"]:
        print(f"🎙️ Transcribing: {arquivo.name}")
        resultado = modelo.transcribe(str(arquivo), language="portuguese")
        saida = arquivo.with_suffix(".txt")
        saida.write_text(resultado["text"], encoding="utf-8")

print("✅ All files have been transcribed!")
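
When a batch is interrupted (the laptop goes to sleep, the power goes out), it is a waste to start again from the first file. One possible approach, sketched below, is to skip any audio file that already has a .txt next to it. The helper pending_files is a hypothetical name, and the extension list simply mirrors the one in the script above.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4"}


def pending_files(folder):
    """Return audio files in `folder` that do not have a .txt transcript yet."""
    pending = []
    for path in sorted(Path(folder).glob("*")):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        if path.with_suffix(".txt").exists():
            continue  # already transcribed in an earlier run
        pending.append(path)
    return pending


# In the batch script, the loop would become:
#     for arquivo in pending_files("./audios"):
#         ...same transcription code as before...
```

Running the script a second time then picks up only the files that are still missing a transcript.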

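9.3. Translating audio in another language into English

resultado = modelo.transcribe("palestra_espanhol.mp3", task="translate")

With task="translate", Whisper listens to audio in another language and writes the text in English. English is the only target language it supports, so it cannot translate directly into Portuguese. If you need Portuguese text from a foreign-language recording, transcribe it in the original language first and then run the text through a translation tool of your choice.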

9.4. Forcing the transcription to recognize specific terms

resultado = modelo.transcribe(
    "entrevista.mp3",
    language="portuguese",
    initial_prompt="Interview with Dr. Almeida about LGPD, COAF and compliance."
)

The initial_prompt is like "context" — Whisper uses it to better understand what the audio is about and make fewer mistakes with proper names and technical terms.
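
If you keep a list of names and terms per client or project, you can build the prompt from that list instead of typing it each time. The sketch below is one way to do it. Whisper only takes a limited amount of prompt text into account, so the helper keeps the result short; the 400-character cap and the "Glossary:" opener are arbitrary choices for this example, not documented values.

```python
def build_prompt(terms, intro="Glossary:", max_chars=400):
    """Join unique terms into a short prompt, stopping before max_chars."""
    seen = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.append(term)
    prompt = intro
    for i, term in enumerate(seen):
        separator = " " if i == 0 else ", "
        if len(prompt) + len(separator) + len(term) > max_chars:
            break  # keep the prompt short; extra terms are dropped
        prompt += separator + term
    return prompt


termos = ["Dr. Almeida", "LGPD", "COAF", "compliance", "LGPD"]
print(build_prompt(termos))
```

The returned string can be passed as initial_prompt in the transcribe call. Put the most important terms first, since anything past the cap is dropped.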

10. Common problems (and how to fix them)

❌ "ModuleNotFoundError: No module named 'whisper'"
You probably forgot to activate (venv) before running it. Go back and activate it.

❌ "ffmpeg not found"
ffmpeg was not installed or is not in the PATH. Reinstall it following Part 3.

❌ The process dies with a memory error
Try a smaller model (tiny or base) or close your browser, Spotify, and other heavy programs while transcribing.

❌ The transcription quality is bad

  • Check whether the audio has a lot of background noise. Whisper can work miracles, but bad audio becomes bad text.
  • Try a larger model (small → medium).
  • Use the initial_prompt trick with keywords from your context.

❌ It is taking WAY too long

  • Long audio files (over 1 hour) can take several hours on CPU, especially with the medium or large models. That is normal.
  • If you have many files, you can leave the computer on overnight processing them.
  • If this is a constant problem in your work, it may be worth investing in a more robust solution (more on that soon 👇).
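
One thing worth checking before giving up on speed: if the computer does have an NVIDIA graphics card, PyTorch (installed together with Whisper) may be able to use it. The sketch below detects that and sets the fp16 option accordingly, since half precision is only used on GPU. The helper names are assumptions for this example; torch.cuda.is_available(), the device argument of load_model and the fp16 option are existing parts of PyTorch and Whisper.

```python
def detect_device():
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # installed automatically as a dependency of openai-whisper
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe_options(device):
    """Half precision (fp16) only makes sense on GPU; on CPU Whisper uses fp32."""
    return {"language": "portuguese", "fp16": device == "cuda"}


device = detect_device()
print("Device:", device, "| options:", transcribe_options(device))

# With Whisper installed, these would plug in as:
#     modelo = whisper.load_model("small", device=device)
#     resultado = modelo.transcribe("minha_reuniao.mp3", **transcribe_options(device))
```

On a machine without a supported GPU this prints "cpu", and setting fp16 to False also avoids the warning Whisper shows when it falls back to full precision.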

11. And when "do it yourself" starts getting in the way?

Look, I’ll be very honest: local Whisper is great, but it is not always the best choice for day-to-day work. Before you spend the next week setting everything up, ask yourself:

  • 📦 Do you transcribe more than 10 hours of audio per week? It is probably already time to have an automated pipeline (input folder → transcription → spreadsheet or final document ready).
  • 👥 Do you or your team lose hours on repetitive tasks (renaming files, splitting transcription by speaker, generating summaries, formatting)?
  • 🔒 Are the audios sensitive (clients, patients, legal cases) and do you need to guarantee that NOTHING leaves your machine?
  • 📊 Do you need to integrate transcription with other tools (Google Sheets, Notion, CRM, petition generator, etc.)?

If you answered yes to at least one of these questions, it may not make sense to keep doing everything manually.

That is exactly where we come in.

👋 Meet Vem pra Descomplica

We are a team that removes the technical part from your path so you can get back to doing what matters in your work. We help professionals and companies to:

  • 🛠️ Set up a local transcription environment on your machine (yes, we install and configure everything for you, on a video call, without you needing to learn any Python).
  • ⚙️ Create automated workflows that turn audio into text, summaries, spreadsheets, meeting minutes, video subtitles — automatically.
  • 🔐 100% local solutions for those who handle sensitive data (legal, healthcare, academic research).
  • 🧩 Custom integrations with the tools you already use (Google Drive, Notion, ClickUp, spreadsheets, email, etc.).
  • 🧪 POCs (proofs of concept) for companies that want to test transcription AI before investing.

You do not need to become a programmer to use artificial intelligence in your favor. You just need someone to make it simple for you.

💬 Talk to us

The first conversation is free and without commitment. We will look at your case together and honestly tell you whether you can solve it on your own or whether it is worth us building a custom solution.

I want to talk to Descomplica →

12. Summary (for those who just want the cheat sheet)

  1. Install Python 3.9 to 3.12 (do not forget to check "Add Python to PATH" on Windows).
  2. Create a folder, open the terminal inside it, and run python -m venv venv.
  3. Activate the environment (.\venv\Scripts\Activate.ps1 on Windows or source venv/bin/activate on Mac/Linux).
  4. Run pip install openai-whisper and install ffmpeg.
  5. Copy the script from Part 4, change the audio file name, and run python transcrever.py.
  6. Open transcricao.txt and celebrate. 🎉

Done. You just transcribed audio with AI running 100% on your machine, without paying anything, without sending anything to the cloud, without any time limit. And if one day this becomes a bottleneck in your work — you already know who to call. 😉