It has recently become possible to make any audio file sound like a famous artist by altering the original recording with a deep-learning model. The model is trained on large amounts of audio containing the voice of the artist in question. Several free, open-source projects can do this, but the process is not very user-friendly and there are few tutorials online (hence this tutorial).
This article describes the quickest way to set it up and ends with tips on running the whole process locally. To avoid local processing, the method described here runs the code in Google Colab. First, here are the original input audio and the audio generated by a model trained on data from the Weeknd.
Input audio file
"Late nights, the city's playground, we own the streets
Heartbeats racing as darkness and music meet
In the moon's embrace we find our escape
Late night whispering secrets, passion takes shape
Late nights with Paulus, my feline confidant
Whiskers and melodies our secret rendezvous chant"
The text was generated by ChatGPT, asked to write a verse in the style of the Weeknd (with a shout-out to my cat Paulus).
Output audio file (with background music)
Considering the poor quality of the input audio file, this result is very good. Using a real person singing on beat as the input can produce a convincing imitation of the Weeknd.
Generate the input audio file (text to speech)
The end result will be much better if the input audio file is a real person singing instead of an artificial voice. For completeness' sake, here is Python code to quickly generate text-to-speech audio and store it as a file.
# Import the text-to-speech library
import pyttsx4
# Initialise the engine (recording yourself gives better results, though)
engine = pyttsx4.init()
# If multiple voices are installed, list them and pick one:
## voices = engine.getProperty('voices')
## engine.setProperty('voice', voices[0].id)
# Define the text to use (trailing spaces keep the pieces from running together)
lyrics = ("Late nights, the city's playground, we own the streets heartbeats racing "
          "as darkness and music meet in the moon's embrace we find our escape late night "
          "whispering secrets, passion takes shape late nights with my boy Paulus, my "
          "feline confidant whiskers and melodies our secret rendezvous chant")
# Queue the text to be played aloud
engine.say(lyrics)
# Queue saving the speech to a file; both .mp3 and .wav extensions are accepted
engine.save_to_file(lyrics, 'speech.wav')
# Block until all queued commands (playing and saving) have finished
engine.runAndWait()
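Before uploading speech.wav (or your own recording), it can be worth checking that the file was actually written and is readable. Below is a minimal sketch using only Python's standard `wave` module; the helper name `describe_wav` is hypothetical. Note that `wave` only reads uncompressed PCM WAV files, so if it raises `wave.Error`, the engine on your platform probably wrote a different format despite the .wav extension.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration in seconds = number of frames / frames per second
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("speech.wav")` returns the channel count, sample rate and duration, which makes an empty or truncated file easy to spot.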
Generate the output audio file
To generate an output audio file we need to perform three steps:
- Download the voice-model for the artist we want to simulate
- Prepare our Google Drive environment correctly
- Run the code in Google Colab
Note that in this case we run the code in Google Colab. This makes things easier, since we do not need to install the required libraries locally and the heavy processing is done in the cloud. At the end of this article I give some tips on running the whole process locally.
1. Download the voice-model for the artist to simulate
Join this Discord server: https://discord.gg/aihub. This is where the voice models are shared. Once you have access, look for ‘voice-models’ under Models & Datasets, as shown in the image below. Search for the artist you are interested in and download the model.
Unzip the downloaded file; we need the config.json file and the .pth file it contains. These are uploaded to your Google Drive, as explained in the next step.
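If you prefer to script the unzipping, here is a minimal sketch using Python's standard `zipfile` module. It assumes the archive contains one config.json and at least one .pth file, possibly inside a subfolder; the archive layout varies between uploaders, and the helper name `extract_model_files` is hypothetical. Everything else in the zip is skipped.

```python
import shutil
import zipfile
from pathlib import Path


def extract_model_files(zip_path, dest_dir):
    """Extract config.json and the .pth file(s) from a voice-model zip (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = Path(info.filename).name  # drop any folder prefix inside the zip
            if info.is_dir() or not (name == "config.json" or name.endswith(".pth")):
                continue
            # Stream to disk: .pth files can be large, so avoid reading them into memory
            with archive.open(info) as src, open(dest / name, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(name)
    if "config.json" not in extracted or not any(n.endswith(".pth") for n in extracted):
        raise ValueError(f"{zip_path} does not contain both config.json and a .pth file")
    return sorted(extracted)
```

Pointing `dest_dir` at a local `models/theweeknd` folder produces exactly the layout the next step expects, ready to upload.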
2. Prepare the Google Drive environment correctly
- In the root folder of your Google Drive, create a folder called ‘models’.
- Inside the ‘models’ folder, create a new folder named after the artist. In my case the folder is called ‘theweeknd’. Important: the folder name cannot contain spaces!
- Inside this folder, place the config.json and .pth files you downloaded from the Discord server in the previous step.
After these steps, your Google Drive root contains a folder called ‘models’, inside it a folder named after the artist, and inside that the files downloaded from Discord. Your Google Drive should now look like this:
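To catch layout mistakes before opening the notebook, the rules above can be checked with a short script. This is a minimal sketch with a hypothetical helper, `check_model_folder`. It only encodes what this tutorial states: a `models/<artist>` folder without spaces, containing config.json and a .pth file.

```python
from pathlib import Path


def check_model_folder(models_root, artist):
    """Return a list of problems with models/<artist>; an empty list means it looks OK."""
    problems = []
    if " " in artist:
        problems.append(f"folder name {artist!r} contains spaces")
    folder = Path(models_root) / artist
    if not folder.is_dir():
        return problems + [f"{folder} does not exist"]
    if not (folder / "config.json").is_file():
        problems.append("config.json is missing")
    if not list(folder.glob("*.pth")):
        problems.append("no .pth file found")
    return problems
```

You can run it locally on the folder before uploading, or in Colab against the mounted Drive (for example `check_model_folder("/content/drive/MyDrive/models", "theweeknd")`, assuming Drive is mounted at the usual `/content/drive`).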
3. Run the code in Google Colab
If all the steps were performed correctly, we can now generate the audio. Click the following link to open the Google Colab notebook: https://colab.research.google.com/drive/1Nt_3kLUTRLGeZxI7zbGLnkvCDhI8yaVJ?usp=sharing#scrollTo=zNb903TEgJxB
The notebook contains some explanation as well, but it is not very clear. Simply put, there are two code blocks to run. The first can be run right away: it asks for permission to access your Google Drive (grant it). This is needed to access the files we just uploaded.
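We can't tell exactly what the notebook's first cell contains, but in Colab, Drive is normally mounted with `google.colab.drive.mount`. Here is a minimal sketch of that step, not the notebook's own code. It assumes the standard mount point `/content/drive`, where your Drive root appears as `MyDrive`. It also lists the artist folders under `models` as a quick check that the upload worked. The helper names are hypothetical. `mount_drive` simply returns False outside Colab, so the listing function can also be tried on a local folder.

```python
import os

# Assumption: the standard Colab mount point, with the Drive root at MyDrive
DRIVE_MOUNT = "/content/drive"
MODELS_DIR = os.path.join(DRIVE_MOUNT, "MyDrive", "models")


def mount_drive():
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return False
    drive.mount(DRIVE_MOUNT)  # prompts for permission to access your Drive
    return True


def list_artist_models(models_dir=MODELS_DIR):
    """Return the sorted names of the artist folders inside models_dir."""
    if not os.path.isdir(models_dir):
        return []
    return sorted(
        entry for entry in os.listdir(models_dir)
        if os.path.isdir(os.path.join(models_dir, entry))
    )
```

After `mount_drive()`, calling `list_artist_models()` should include your artist folder (for example `theweeknd`). If it returns an empty list, the ‘models’ folder is probably not in the Drive root.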
The second code block generates the audio. Before we can run it, we need to upload our input file, which we can drop into the Colab notebook as follows:
Now run the second code block. After a while, a user interface appears below the code. Click ‘Convert’. After some time it will look like this:
You did it, congrats! You can now play the generated audio and save it to your local machine. Note that the UI also lets you choose between different AI models (in this case, different artists). Simply adding a new folder to the ‘models’ folder in Google Drive will make it show up here.
Running So-VITS-SVC locally
The code in the Colab notebook shows how it is all done, and it can be adapted for local use fairly easily. Here are some additional links with information on setting up the whole process locally:
p3tro, A.i. Vocal Tutorial: LINK
Sam Smyers, How to Make Your Voice Sound Like Drake… : LINK