utils

pytsmod.utils._validate_audio(audio)

Validate the input audio and rearrange its axes so that the channel axis comes first.

Parameters
audio : numpy.ndarray [shape=(channel, num_samples) or (num_samples) or (num_samples, channel)]

the input audio sequence to validate.

Returns
audio : numpy.ndarray [shape=(channel, num_samples)]

the validated output audio sequence.
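
As a rough illustration of the shape handling described above, the following sketch brings a 1-D or 2-D array into a channel-first layout. The helper name `to_channels_first` and the rule "the shorter axis is the channel axis" are assumptions made for this example; they are not taken from pytsmod's implementation.

```python
import numpy as np


def to_channels_first(audio):
    """Return audio with shape (channel, num_samples).

    Hypothetical helper: assumes there are fewer channels than samples.
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        # (num_samples) -> (1, num_samples)
        return audio[np.newaxis, :]
    if audio.ndim == 2:
        # Assumption: the shorter axis is the channel axis.
        if audio.shape[0] > audio.shape[1]:
            return audio.T
        return audio
    raise ValueError("audio must be a 1-D or 2-D array")


mono = to_channels_first(np.zeros(8))
stereo = to_channels_first(np.zeros((8, 2)))
print(mono.shape, stereo.shape)
```

The shape heuristic cannot tell the layouts apart for a square array, so a real validator may need a stricter rule.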

pytsmod.utils._validate_scale_factor(audio, s)

Validate the scale factor s and convert a fixed scale factor to anchor points.

Parameters
audio : numpy.ndarray [shape=(num_channels, num_samples) or (num_samples) or (num_samples, num_channels)]

the input audio sequence.

s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points) or (num_points, 2)]

the time stretching factor. Either a constant value (alpha) or a (2 x n) (or (n x 2)) array of anchor points, which contains the sample points of the input signal in the first row and the sample points of the output signal in the second row.

Returns
anc_points : numpy.ndarray [shape=(2, num_points)]

anchor points, with the sample points of the input signal in the first row and the sample points of the output signal in the second row.
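
To make the scalar-to-anchor-points conversion concrete, here is a minimal sketch. The helper name `scale_to_anchor_points` is hypothetical, and the rounding of the last output sample is an assumption; pytsmod may compute the end point differently.

```python
import numpy as np


def scale_to_anchor_points(num_samples, s):
    """Return anchor points with shape (2, num_points).

    Hypothetical helper; the exact rounding used by pytsmod may differ.
    """
    if np.isscalar(s):
        if s <= 0:
            raise ValueError("the scale factor must be positive")
        # Map the first and last input samples to their stretched positions.
        last_in = num_samples - 1
        last_out = int(np.ceil(s * num_samples)) - 1
        return np.array([[0, last_in], [0, last_out]])
    anc = np.asarray(s)
    if anc.ndim != 2 or 2 not in anc.shape:
        raise ValueError("anchor points must have shape (2, n) or (n, 2)")
    # Accept (n, 2) input and return it as (2, n).
    if anc.shape[0] != 2:
        anc = anc.T
    return anc


fixed = scale_to_anchor_points(100, 1.5)
given = scale_to_anchor_points(100, np.array([[0, 0], [50, 60], [99, 149]]))
print(fixed)
print(given.shape)
```

With a constant factor only two anchor points are needed, because the mapping between input and output positions is a single straight line.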

pytsmod.utils.win(win_type='hann', win_size=4096, zero_pad=0)

Generate a window function of the given type.

Parameters
win_type : str

the type of window function. Currently, hann and sin are supported.

win_size : int > 0 [scalar]

the size of the window function. It does not include the length of the zero padding.

zero_pad : int >= 0 [scalar]

the total length of the zero padding. The zeros are distributed equally to the left and right of the window.

Returns
win : numpy.ndarray [shape=(win_size)]

the generated window function.
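
The sketch below shows one way a Hann or sine window with symmetric zero padding could be built with NumPy. The function name `make_window`, the periodic form of the Hann window, and the assumption that `zero_pad` is even are choices made for this example, not a description of pytsmod's code.

```python
import numpy as np


def make_window(win_type="hann", win_size=4096, zero_pad=0):
    """Hypothetical sketch of a zero-padded analysis window."""
    n = np.arange(win_size)
    if win_type == "hann":
        # Periodic Hann window (an assumption; a symmetric variant also exists).
        w = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_size)
    elif win_type == "sin":
        w = np.sin(np.pi * n / (win_size - 1))
    else:
        raise ValueError("win_type must be 'hann' or 'sin'")
    # Split the padding evenly between the two sides (zero_pad assumed even).
    half = zero_pad // 2
    return np.pad(w, (half, half), mode="constant")


padded = make_window("hann", win_size=8, zero_pad=4)
print(len(padded))
```

In this sketch the returned array is `win_size + zero_pad` samples long, with the window itself centered between the two runs of zeros.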

pytsmod.utils.stft(x, ana_hop=2048, win_type='hann', win_size=4096, zero_pad=0, sr=44100, fft_shift=0, time_frequency_out=False)

Short-Time Fourier Transform (STFT) of the audio signal. This function is used by the phase vocoder.

Parameters
x : numpy.ndarray [shape=(num_samples)]

the input audio sequence. Must be a single channel.

ana_hop : int > 0 [scalar] or numpy.ndarray [shape=(num_frames)]

either an analysis hop size (scalar) or the analysis window positions (array).

win_type : str

type of the window function for the STFT. hann and sin are available.

win_size : int > 0 [scalar]

size of the window function.

zero_pad : int >= 0 [scalar]

the size of the zero pad in the window function.

sr : int > 0 [scalar]

the sample rate of the audio sequence. Only used when time_frequency_out is set.

fft_shift : bool

apply a circular shift to the STFT.

time_frequency_out : bool

if True, also return the time and frequency axes, as (spec, t, f).

Returns
spec : numpy.ndarray [shape=(win_size // 2 + 1, num_frames)]

the STFT result of the input audio sequence.

t : numpy.ndarray [shape=(num_frames)]

timestamp of the output result.

f : numpy.ndarray [shape=(win_size // 2 + 1)]

frequency value for each frequency bin of the output result.
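
For readers who want to see how the output shapes above arise, here is a minimal NumPy sketch of a windowed STFT with a fixed hop size. The function `stft_sketch`, the centered framing, and the periodic Hann window are assumptions for illustration; zero padding, `fft_shift`, and array-valued `ana_hop` are left out.

```python
import numpy as np


def stft_sketch(x, ana_hop=2048, win_size=4096, sr=44100):
    """Hypothetical STFT sketch returning (spec, t, f)."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_size) / win_size)
    # Pad half a window on each side so every frame has win_size samples.
    half = win_size // 2
    x_pad = np.pad(x, (half, half), mode="constant")
    positions = np.arange(0, len(x), ana_hop)
    spec = np.empty((win_size // 2 + 1, len(positions)), dtype=complex)
    for i, pos in enumerate(positions):
        frame = x_pad[pos:pos + win_size] * win
        spec[:, i] = np.fft.rfft(frame)
    t = positions / sr                                  # frame times in seconds
    f = np.arange(win_size // 2 + 1) * sr / win_size    # bin frequencies in Hz
    return spec, t, f


sr = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(1024) / sr)    # 1 kHz test tone
spec, t, f = stft_sketch(x, ana_hop=128, win_size=256, sr=sr)
peak_bin = int(np.argmax(np.abs(spec[:, 3])))
print(spec.shape, f[peak_bin])
```

The number of rows is `win_size // 2 + 1` because `numpy.fft.rfft` keeps only the non-negative frequencies of a real-valued frame.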

pytsmod.utils.istft(spec, syn_hop=2048, win_type='hann', win_size=4096, zero_pad=0, num_iter=1, original_length=-1, fft_shift=False, restore_energy=False)

Inverse Short-Time Fourier Transform (ISTFT) to recover the audio signal from the spectrogram. This function is used by the phase vocoder.

Parameters
spec : numpy.ndarray [shape=(num_bins, num_frames)]

the input complex audio spectrogram.

syn_hop : int > 0 [scalar]

the hop size of the synthesis window.

win_type : str

type of the window function for the ISTFT. hann and sin are available.

win_size : int > 0 [scalar]

size of the window function.

zero_pad : int >= 0 [scalar]

the size of the zero pad in the window function.

num_iter : int > 0 [scalar]

the number of iterations the algorithm should perform to adapt the phase.

original_length : int > 0 [scalar]

original length of the audio signal.

fft_shift : bool

apply a circular shift to the ISTFT.

restore_energy : bool

try to compensate for potential energy loss.

Returns
y : numpy.ndarray [shape=(original_length)]

the output audio sequence.

pytsmod.utils.lsee_mstft(X, syn_hop, win_type, win_size, zero_pad, fft_shift, restore_energy)

Least Squares Error Estimation from the MSTFT (Modified STFT). This is the Griffin-Lim procedure for estimating the audio signal from the modified STFT.

Parameters
X : numpy.ndarray [shape=(num_bins, num_frames)]

the input complex audio spectrogram.

syn_hop : int > 0 [scalar]

the hop size of the synthesis window.

win_type : str

type of the window function for the ISTFT. hann and sin are available.

win_size : int > 0 [scalar]

size of the window function.

zero_pad : int >= 0 [scalar]

the size of the zero pad in the window function.

fft_shift : bool

apply a circular shift to the ISTFT.

restore_energy : bool

try to compensate for potential energy loss.

Returns
x : numpy.ndarray [shape=(num_samples)]

the output audio sequence estimated by LSEE-MSTFT.
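
The least-squares estimate of Griffin and Lim is a windowed overlap-add of the inverse-transformed frames, normalized by the overlap-added squared window. The sketch below illustrates that single step under simplifying assumptions: the name `lsee_overlap_add` is hypothetical, the window is passed in directly, and zero padding, `fft_shift`, and `restore_energy` are not modeled.

```python
import numpy as np


def lsee_overlap_add(spec, syn_hop, win):
    """Hypothetical sketch: x(n) = sum_m w(n - mS) y_m(n) / sum_m w(n - mS)^2."""
    win_size = len(win)
    num_frames = spec.shape[1]
    length = syn_hop * (num_frames - 1) + win_size
    num = np.zeros(length)
    den = np.zeros(length)
    for m in range(num_frames):
        # Inverse FFT of frame m gives the (windowed) time-domain frame y_m.
        frame = np.fft.irfft(spec[:, m], n=win_size)
        start = m * syn_hop
        num[start:start + win_size] += win * frame
        den[start:start + win_size] += win ** 2
    # Avoid dividing by (almost) zero where no window contributes.
    den[den < 1e-8] = 1.0
    return num / den


# Round trip: analyze a random signal, then resynthesize it.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(16) / 16)
hop = 4
frames = [x[s:s + 16] * win for s in range(0, 64 - 16 + 1, hop)]
spec = np.stack([np.fft.rfft(fr) for fr in frames], axis=1)
y = lsee_overlap_add(spec, hop, win)
print(np.allclose(y[1:], x[1:]))
```

When the spectrogram comes from an unmodified STFT, as in this round trip, the estimate reproduces the signal wherever the window sum is non-zero; for a modified spectrogram it yields the signal whose STFT is closest in the least-squares sense.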

pytsmod.utils._validate_f0(audio, f0)

Validate that the input f0 is suitable for the input audio.

Parameters
audio : numpy.ndarray [shape=(num_channels, num_samples) or (num_samples) or (num_samples, num_channels)]

the input audio sequence.

f0 : numpy.ndarray [shape=(num_channels, num_pitches) or (num_pitches) or (num_pitches, num_channels)]

the f0 sequence that is used for TD-PSOLA. If f0 is a 1D array, the same f0 is used for all audio channels.

Returns
f0 : numpy.ndarray [shape=(num_channels, num_pitches)]

the f0 sequence that is used for TD-PSOLA.
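
The 1D case described above amounts to repeating one f0 track for every channel. A minimal sketch, assuming the audio is already in (num_channels, num_samples) layout and using a hypothetical helper name `broadcast_f0`:

```python
import numpy as np


def broadcast_f0(audio, f0):
    """Return f0 with shape (num_channels, num_pitches).

    Hypothetical helper; audio is assumed to be (num_channels, num_samples).
    """
    num_channels = audio.shape[0]
    f0 = np.asarray(f0, dtype=float)
    if f0.ndim == 1:
        # Use the same f0 track for every channel.
        return np.tile(f0, (num_channels, 1))
    if f0.ndim == 2:
        if f0.shape[0] == num_channels:
            return f0
        if f0.shape[1] == num_channels:
            # (num_pitches, num_channels) -> (num_channels, num_pitches)
            return f0.T
    raise ValueError("f0 does not match the number of audio channels")


audio = np.zeros((2, 100))
shared = broadcast_f0(audio, [100.0, 110.0, 120.0])
per_channel = broadcast_f0(audio, np.ones((3, 2)))
print(shared.shape, per_channel.shape)
```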