utils

pytsmod.utils._validate_audio(audio)

Validate the input audio and rearrange its axes into channel-first order.

Parameters

audio : numpy.ndarray [shape=(channel, num_samples) or (num_samples) or (num_samples, channel)]: the input audio sequence to validate.

Returns

audio : numpy.ndarray [shape=(channel, num_samples)]: the validated output audio sequence.
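
The shape handling described above can be sketched as follows. This is a minimal illustration, not pytsmod's implementation: `to_channel_first` is a hypothetical helper, and the rule that the longer axis of a 2-D input is the sample axis is an assumption made for the sketch.

```python
import numpy as np


def to_channel_first(audio):
    """Return audio as a 2-D array of shape (channel, num_samples).

    Hypothetical helper: assumes the longer axis of a 2-D input is the
    sample axis, which may differ from what pytsmod actually checks.
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        # (num_samples,) -> (1, num_samples)
        return audio[np.newaxis, :]
    if audio.ndim == 2:
        # Transpose when the input looks like (num_samples, channel).
        return audio.T if audio.shape[0] > audio.shape[1] else audio
    raise ValueError("audio must be 1-D or 2-D")


mono = to_channel_first(np.zeros(1000))
stereo = to_channel_first(np.zeros((1000, 2)))
```

Both results are 2-D, so code further down can index channels uniformly regardless of how the caller laid out the array.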

pytsmod.utils._validate_scale_factor(audio, s)

Validate the scale factor s and, if it is a fixed scale factor, convert it to anchor points.

Parameters

audio : numpy.ndarray [shape=(num_channels, num_samples) or (num_samples) or (num_samples, num_channels)]: the input audio sequence.
s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points) or (num_points, 2)]: the time stretching factor. Either a constant value (alpha) or a (2 x n) (or (n x 2)) array of anchor points that contains the sample points of the input signal in the first row and the sample points of the output signal in the second row.

Returns

anc_points : numpy.ndarray [shape=(2, num_points)]: anchor points, with the sample points of the input signal in the first row and the sample points of the output signal in the second row.
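
To illustrate how a constant factor can be expressed as anchor points, here is a minimal sketch. `fixed_factor_to_anchors` is a hypothetical name, and mapping the last input sample to `ceil(s * (num_samples - 1))` is an assumed rounding rule, not one taken from pytsmod's source.

```python
import numpy as np


def fixed_factor_to_anchors(num_samples, s):
    """Build a (2, 2) anchor-point array for a constant stretch factor s.

    Row 0 holds input sample positions, row 1 the matching output positions.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    last_in = num_samples - 1
    last_out = int(np.ceil(s * last_in))  # assumed rounding rule
    return np.array([[0, last_in], [0, last_out]])


anc_points = fixed_factor_to_anchors(44100, 1.5)
```

Two anchor points are enough for a constant factor because the mapping between input and output positions is a single straight line.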

pytsmod.utils.win(win_type='hann', win_size=4096, zero_pad=0)

Generate a window function of the given type.

Parameters

win_type : str: the type of window function. Currently, 'hann' and 'sin' are supported.
win_size : int > 0 [scalar]: the size of the window function. It does not include the length of the zero padding.
zero_pad : int >= 0 [scalar]: the total length of the zero padding. Zeros are distributed equally to the left and right of the window.

Returns

win : numpy.ndarray [shape=(win_size)]: the generated window function.
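
The parameters above can be illustrated with a short NumPy sketch. `make_window` is a hypothetical stand-in, not the library function; the symmetric Hann and half-sample-offset sine definitions are assumptions, and the sketch returns `win_size + zero_pad` samples when `zero_pad` is even.

```python
import numpy as np


def make_window(win_type="hann", win_size=4096, zero_pad=0):
    """Hypothetical stand-in for a window generator with zero padding."""
    n = np.arange(win_size)
    if win_type == "hann":
        # Symmetric Hann window (zero at both end points).
        w = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / (win_size - 1))
    elif win_type == "sin":
        # Sine window sampled at half-sample offsets.
        w = np.sin(np.pi * (n + 0.5) / win_size)
    else:
        raise ValueError("win_type must be 'hann' or 'sin'")
    # Split the padding evenly between the left and right edges.
    return np.pad(w, zero_pad // 2, mode="constant")


w = make_window("hann", win_size=8, zero_pad=4)
```

Here `w` has two zeros on each side of an 8-sample Hann window.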

pytsmod.utils.stft(x, ana_hop=2048, win_type='hann', win_size=4096, zero_pad=0, sr=44100, fft_shift=0, time_frequency_out=False)

Short-Time Fourier Transform (STFT) of the audio signal. This function is used by the phase vocoder.

Parameters

x : numpy.ndarray [shape=(num_samples)]: the input audio sequence. Must be single-channel.
ana_hop : int > 0 [scalar] or numpy.ndarray [shape=(num_frames)]: either an analysis hop size (scalar) or the analysis window positions (array).
win_type : str: type of the window function for the STFT. 'hann' and 'sin' are available.
win_size : int > 0 [scalar]: size of the window function.
zero_pad : int >= 0 [scalar]: the size of the zero padding in the window function.
sr : int > 0 [scalar]: the sample rate of the audio sequence. Only used when time_frequency_out is set.
fft_shift : bool: if True, apply a circular shift to the STFT.
time_frequency_out : bool: if True, also return the time and frequency axes, as (spec, t, f).

Returns

spec : numpy.ndarray [shape=(win_size // 2 + 1, num_frames)]: the STFT of the input audio sequence.
t : numpy.ndarray [shape=(num_frames)]: timestamp of each output frame.
f : numpy.ndarray [shape=(win_size // 2 + 1)]: frequency of each frequency bin of the output.
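
The relationship between the inputs and the three outputs can be sketched with plain NumPy. This is an illustrative sketch, not pytsmod's implementation: `stft_sketch` is a hypothetical name, and the half-window padding and fixed Hann window are assumptions chosen to keep the example short.

```python
import numpy as np


def stft_sketch(x, hop=256, win_size=1024, sr=8000):
    """Return (spec, t, f) for a mono signal using Hann-windowed frames."""
    window = np.hanning(win_size)
    # Pad half a window on each side so every frame has win_size samples.
    padded = np.pad(x, win_size // 2, mode="constant")
    starts = np.arange(0, len(x), hop)
    frames = np.stack([padded[s:s + win_size] for s in starts], axis=1)
    # Real FFT along the sample axis keeps win_size // 2 + 1 bins.
    spec = np.fft.rfft(frames * window[:, np.newaxis], axis=0)
    t = starts / sr                             # frame times in seconds
    f = np.fft.rfftfreq(win_size, d=1.0 / sr)   # bin frequencies in Hz
    return spec, t, f


sr = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)  # 1 s of a 1 kHz tone
spec, t, f = stft_sketch(x, sr=sr)
peak_hz = f[np.argmax(np.abs(spec[:, 16]))]  # strongest bin in a middle frame
```

For this test tone the strongest bin of a middle frame lies at the tone frequency, which is a quick sanity check on the `f` axis.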

pytsmod.utils.istft(spec, syn_hop=2048, win_type='hann', win_size=4096, zero_pad=0, num_iter=1, original_length=-1, fft_shift=False, restore_energy=False)

Inverse Short-Time Fourier Transform (ISTFT) to recover the audio signal from the spectrogram. This function is used by the phase vocoder.

Parameters

spec : numpy.ndarray [shape=(num_bins, num_frames)]: the complex spectrogram of the input audio.
syn_hop : int > 0 [scalar]: the hop size of the synthesis window.
win_type : str: type of the window function for the ISTFT. 'hann' and 'sin' are available.
win_size : int > 0 [scalar]: size of the window function.
zero_pad : int >= 0 [scalar]: the size of the zero padding in the window function.
num_iter : int > 0 [scalar]: the number of iterations the algorithm should perform to adapt the phase.
original_length : int > 0 [scalar]: original length of the audio signal.
fft_shift : bool: if True, apply a circular shift to the ISTFT.
restore_energy : bool: if True, try to reverse potential energy loss.

Returns

y : numpy.ndarray [shape=(original_length)]: the output audio sequence.
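
As a rough picture of what an inverse STFT does, the sketch below rebuilds a signal by weighted overlap-add and checks the round trip on a test tone. It is not pytsmod's implementation: `overlap_add` is a hypothetical helper, the forward transform is written inline, and the half-window padding convention is an assumption.

```python
import numpy as np


def overlap_add(spec, hop, window, length):
    """Rebuild a signal from a one-sided spectrogram by weighted overlap-add.

    spec has shape (num_bins, num_frames); frame m is assumed to start at
    sample m * hop of a signal that was padded by half a window on each side.
    """
    win_size = len(window)
    num_frames = spec.shape[1]
    frames = np.fft.irfft(spec, n=win_size, axis=0)
    total = hop * (num_frames - 1) + win_size
    out = np.zeros(total)
    norm = np.zeros(total)
    for m in range(num_frames):
        start = m * hop
        out[start:start + win_size] += frames[:, m] * window
        norm[start:start + win_size] += window ** 2
    # Avoid dividing by (near) zero where no window covers a sample.
    out /= np.maximum(norm, 1e-8)
    half = win_size // 2
    return out[half:half + length]


# Round trip on a short test tone.
win_size, hop = 1024, 256
window = np.hanning(win_size)
x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
padded = np.pad(x, win_size // 2, mode="constant")
starts = range(0, len(x), hop)
spec = np.stack(
    [np.fft.rfft(padded[s:s + win_size] * window) for s in starts], axis=1
)
y = overlap_add(spec, hop, window, len(x))
```

Trimming to `length` plays the role that `original_length` plays in the signature above: it removes the padding added before analysis.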

pytsmod.utils.lsee_mstft(X, syn_hop, win_type, win_size, zero_pad, fft_shift, restore_energy)

Least Squares Error Estimation from the MSTFT (Modified STFT). This is the Griffin-Lim procedure for estimating the audio signal from the modified STFT.

Parameters

X : numpy.ndarray [shape=(num_bins, num_frames)]: the complex spectrogram of the input audio.
syn_hop : int > 0 [scalar]: the hop size of the synthesis window.
win_type : str: type of the window function for the ISTFT. 'hann' and 'sin' are available.
win_size : int > 0 [scalar]: size of the window function.
zero_pad : int >= 0 [scalar]: the size of the zero padding in the window function.
fft_shift : bool: if True, apply a circular shift to the ISTFT.
restore_energy : bool: if True, try to reverse potential energy loss.

Returns

x : numpy.ndarray [shape=(num_samples)]: the audio sequence estimated by LSEE-MSTFT.
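
For reference, the least-squares estimate from Griffin and Lim is usually written as below. The notation is introduced here for illustration and does not come from pytsmod: w is the window, H is the synthesis hop (syn_hop), and y_m(n) is the inverse DFT of frame m of X, placed at offset mH and taken as zero outside the window's support. Whether the library adds further scaling (for example when restore_energy is set) is not covered by this formula.

```latex
x(n) = \frac{\sum_{m} w(n - mH)\, y_m(n)}{\sum_{m} w^{2}(n - mH)}
```

Each inverse-transformed frame is weighted by the window once more, and the sum is normalized by the accumulated squared window, which minimizes the squared error between X and the STFT of x.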

pytsmod.utils._validate_f0(audio, f0)

Validate that the input f0 is suitable for the input audio.

Parameters

audio : numpy.ndarray [shape=(num_channels, num_samples) or (num_samples) or (num_samples, num_channels)]: the input audio sequence.
f0 : numpy.ndarray [shape=(num_channels, num_pitches) or (num_pitches) or (num_pitches, num_channels)]: the f0 sequence used for TD-PSOLA. If f0 is a 1-D array, the same f0 is used for all audio channels.

Returns

f0 : numpy.ndarray [shape=(num_channels, num_pitches)]: the f0 sequence used for TD-PSOLA.
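
The rule for 1-D f0 input can be sketched as follows. `broadcast_f0` is a hypothetical helper written for illustration, and the order in which the two 2-D layouts are checked is an assumption; pytsmod's own validation may behave differently, in particular when the array is square.

```python
import numpy as np


def broadcast_f0(num_channels, f0):
    """Hypothetical helper: return f0 with shape (num_channels, num_pitches)."""
    f0 = np.asarray(f0, dtype=float)
    if f0.ndim == 1:
        # One contour shared by every channel.
        return np.tile(f0, (num_channels, 1))
    if f0.ndim == 2:
        if f0.shape[0] == num_channels:
            return f0
        if f0.shape[1] == num_channels:
            # (num_pitches, num_channels) -> (num_channels, num_pitches)
            return f0.T
    raise ValueError("f0 does not match the number of audio channels")


f0 = broadcast_f0(2, [220.0, 221.5, 223.0])
```

With a stereo signal and a single contour, both rows of the result hold the same three values.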