As a definition of the Nyquist criterion: in order to reconstruct a signal x(t) from its samples, the sampling frequency Fs must be more than twice the bandwidth B of the sampled signal (Fs > 2B).
For example, the human voice usually contains relatively insignificant energy at or above 10 kHz. Sampling such an audio signal at 20k samples/sec or more therefore comes close to meeting the criterion. No problem.
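To see what goes wrong when the criterion is violated, here is a small illustrative sketch (the 8 kHz rate and 7 kHz tone are my own example values, not from the text above): a 7 kHz sine sampled at 8 kHz produces exactly the same sample values as a phase-inverted 1 kHz sine, so the two are indistinguishable after sampling. That 1 kHz tone is the alias.

```python
import math

FS = 8_000              # sample rate (Hz); Nyquist limit is Fs/2 = 4 kHz
F_HIGH = 7_000          # tone above Fs/2, so it will alias
F_ALIAS = FS - F_HIGH   # 1 kHz: the frequency it masquerades as

n = range(16)
high = [math.sin(2 * math.pi * F_HIGH * k / FS) for k in n]
alias = [math.sin(2 * math.pi * F_ALIAS * k / FS) for k in n]

# sin(2*pi*7000*k/8000) = sin(2*pi*k - 2*pi*1000*k/8000)
#                       = -sin(2*pi*1000*k/8000)
# i.e. the samples of the 7 kHz tone equal the negated samples
# of the 1 kHz tone: after sampling, nothing can tell them apart.
matches = all(abs(h + a) < 1e-9 for h, a in zip(high, alias))
print(matches)  # True
```

This is why energy above Fs/2 must be removed *before* sampling: once the samples are taken, the aliased energy cannot be separated from genuine low-frequency content.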
If there is some need (e.g. due to a technical limitation or a standard) to have only an 8 kHz audio signal, the voice should be lowpass-filtered before sampling in order to reduce aliasing. A lowpass filter used for this purpose is called an anti-aliasing filter.
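The same idea applies when reducing the rate of an already-digital signal: lowpass first, then throw samples away. As a minimal sketch (the function name and the boxcar moving-average filter are my own choices; real systems use properly designed FIR/IIR lowpass filters), averaging each block of samples before keeping one per block acts as a crude anti-aliasing step:

```python
def antialias_decimate(x, factor):
    """Crude decimation with anti-aliasing: average each block of
    `factor` samples (a boxcar lowpass) and keep one value per block.
    A moving average is a weak lowpass filter, used here only to
    illustrate the filter-then-downsample order of operations."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

# Example: downsample by 4; each output sample is the mean of 4 inputs.
print(antialias_decimate([1, 1, 1, 1, 3, 3, 3, 3], 4))  # [1.0, 3.0]
```

Skipping the averaging and simply keeping every 4th sample would let any content above the new Nyquist limit fold back into the result, exactly as in the voice example above.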
Aliasing and anti-aliasing also arise in other areas of digital signal processing, e.g. digital image processing.
I'll define aliasing and anti-aliasing later, in another section of this evolving blog.
As a rule of thumb: the sampling frequency needs to be at least double the original audio signal's bandwidth.