Disclaimer
The Execute Command node is only supported on self-hosted (local) instances of n8n.Introduction
KOKORO TTS – Kokoro TTS is a compact yet powerful text-to-speech model, currently available on Hugging Face and GitHub. Despite its modest size—trained on less than 100 hours of audio—it delivers impressive results, consistently topping the TTS leaderboard on Hugging Face.
Unlike larger systems, Kokoro TTS offers the advantage of running locally, even on devices without GPUs, making it accessible for a wide range of users.Who will benefit from this integration?
This will be useful for video bloggers, TikTokers, and it will also enable the creation of a free voice chat bot. Currently, TTS models are mostly paid, but this integration will allow for fully free voice generation. The possibilities are limited only by your imagination.Note
Unfortunately, we can’t interact with the KOKORO API via browser URL (GET/POST), but we can run a Python script through n8n and pass any variables to it. In the tutorial, the D drive is used, but you can rewrite this for any paths, including the C drive.Step 1
You need to have Python installed. Also, download and extract the portable version of KOKORO from GitHub. Create a file named voicegen.py with the following code in the KOKORO folder: (C:KOKORO). As you can see, the output path is: (D:output.mp3).python import sys import shutil from gradio_client import Client # Set UTF-8 encoding for stdout sys.stdout.reconfigure(encoding=’utf-8′) # Get arguments from command line text = sys.argv[1] # First argument: input text voice = sys.argv[2] # Second argument: voice speed = float(sys.argv[3]) # Third argument: speed (converted to float) print(f”Received text: {text}”) print(f”Voice: {voice}”) print(f”Speed: {speed}”) # Connect to local Gradio server client = Client(“”) # Generate speech using the API result = client.predict( text=text, voice=voice, speed=speed, api_name=”/generate_speech” ) # Define output path output_path = r”D:\\output.mp3″ # Move the generated file shutil.move(result[1], output_path) # Print output path print(output_path) Step 2
Go to n8n and create the following workflow.Step 3
Edit Field Module.json { “voice”: “af_sarah”, “text”: “Hello world!” } Step 4
We’ll need an Execute Command module with the command:plaintext python C:\KOKORO\voicegen.py “{{ $json.text }}” “{{ $json.voice }}” 1 Step 5
The script is already working, but to listen to it, you can connect a Binary module with the path to the generated MP3 file: D:/output.mp3.Step 6
Click “Text workflow” and enjoy the result. There are more voices and accents than in ChatGPT, plus it’s free.P.S.
If you want, there is a detailed tutorial on my blog.