Audio, or sound, is a variation of air pressure, caused by the vibration of an object, like vocal cords or a guitar string or a hammer hitting a nail. That vibration causes the surrounding air to vibrate: sound! Much like our larynx can create sound by creating vibrations, our ears can pick up those vibrations.
We can use an electromagnet to create sound from electricity: place a coil of wire near a permanent magnet, attach a paper cone to the coil, and then switch the electricity to the coil on and off very quickly. This is a speaker: it causes the air around it to vibrate. We can also make a microphone in a similar way: vibrations in the surrounding air move the coil relative to the permanent magnet, which creates electricity. We can record that electrical representation of sound to a magnetic tape or a vinyl record, or we can take samples of it tens of thousands of times per second, and record and play back those samples using a computer.
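To make the idea of sampling concrete, here is a minimal Python sketch that "samples" a pure tone. It assumes a rate of 44,100 samples per second (the rate used on audio CDs) and 16-bit samples; the function name sample_tone is made up for this illustration.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (assumed; the audio CD rate)


def sample_tone(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return 16-bit integer samples of a sine tone (hypothetical helper)."""
    count = round(duration_s * sample_rate)
    peak = 32767  # largest value a signed 16-bit sample can hold
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(count)
    ]


# One hundredth of a second of a 440 Hz tone
samples = sample_tone(440.0, 0.01)
print(len(samples))  # 441
```

Each number in the list is one measurement of the wave at one instant. Played back in order at the same rate, the samples reproduce the tone.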
Audio, whether it's sound in the air, or electricity down a wire, can be represented as a wave, with the amplitude of the wave representing how loud it is - how big the vibration is - and the frequency of the wave representing its pitch - whether it's a low thud or hum, or something in the middle like someone saying "ah", or something very high like hissing.
A very pure sound like from a flute or a tuning fork will look like a pure sine wave. A raspy sound like a washing machine or a librarian going "shhhh" will be a very complex wave. In practice, sound captured by a microphone (or our ears) will nearly always be a complex wave - lots of different sounds of differing volumes stacked upon one another. The visual representation of this is called a waveform.
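The "stacking" of sounds described above can be sketched by adding sine waves together. This is only an illustration: the three frequencies and amplitudes below are arbitrary values chosen for the example, and the function names are hypothetical.

```python
import math


def sine(frequency_hz, amplitude, t):
    """Value of a pure sine wave at time t (in seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t)


def complex_wave(components, t):
    """Sum several pure tones; components is a list of (frequency_hz, amplitude)."""
    return sum(sine(f, a, t) for f, a in components)


# A loud low tone, a quieter tone one octave up, and a faint high tone
components = [(220.0, 1.0), (440.0, 0.5), (3000.0, 0.1)]
sample_rate = 8000
waveform = [complex_wave(components, n / sample_rate) for n in range(sample_rate)]
```

Plotting waveform would show a shape that is no longer a clean sine curve, even though it is built only from sine curves.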
Above is a pair of stereo recordings as shown in Audacity - the left and right tracks of a recent song from a CD, followed by the left and right tracks of the same song on a vinyl LP record. You can tell from the vertical size of the waveform how loud it is - its amplitude. Note the top recording is much louder in general than the bottom one.
The voice is a set of complex sounds: long throated sounds (vowels) with shorter, higher sounds in between (consonants). Above is the waveform produced by an English speaker saying the word "Wikipedia". Note how the letters 'K' and 'P' are very high-frequency sounds and stand out a bit from the rest of the word. 'D' and the transition from 'iy' to 'ah' are also noticeable, but to a lesser extent.
Below is a link to the Audacity website, where the latest version is freely available to download and use.
Audacity, like most digital audio workstations (DAWs), shows a waveform of a recording on a track. It shows one or more tracks on a timeline. Time runs from left to right.
Import an audio file by selecting File > Import > Audio....
Or drag and drop the file onto the timeline.
The image above shows a stereo waveform. The left channel is displayed in the top half of the track and the right channel in the bottom half. The track takes its name from the imported audio file ("No Town" in this example). Where the waveform reaches closer to the top and bottom of the track, the audio is louder (and vice versa).
The ruler above the waveform shows you the length of the audio in minutes and seconds.
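The length shown on the ruler follows from simple arithmetic: the number of samples in the recording divided by the number of samples per second. A small sketch, assuming a 44,100 Hz recording and a hypothetical helper name:

```python
def format_duration(frame_count, sample_rate):
    """Convert a number of samples per channel into minutes:seconds."""
    total_seconds = frame_count / sample_rate
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)}:{seconds:05.2f}"


print(format_duration(8_158_500, 44_100))  # 3:05.00
```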
The image above shows the Transport Toolbar.
Click the Play button to listen to the audio. Click the Stop button to stop playback. If you do not hear anything, see Audacity Setup and Configuration.
You can use the Space key on the keyboard as a shortcut for Play or Stop.
Click the Selection Tool, then click on the waveform to choose a place to start, then click the Play button. Click and drag to create a selection; when you then click the Play button, only the selection will play.
Correct adjustment of level before recording is essential to avoid noise or distortion.
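One way to think about "level" is the peak of the recording relative to the loudest value the system can represent (full scale). The sketch below is not how Audacity's meters are implemented; it assumes samples stored as floating-point numbers between -1.0 and 1.0, and the function names are invented for the example.

```python
import math


def peak_dbfs(samples):
    """Peak level in decibels relative to full scale (0 dB = loudest possible)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure silence has no measurable level
    return 20 * math.log10(peak)


def is_clipping(samples, ceiling=1.0):
    """True if any sample reaches the ceiling, which suggests distortion."""
    return any(abs(s) >= ceiling for s in samples)


quiet = [0.05, -0.04, 0.03]  # recorded too low: noise will be more noticeable
hot = [0.7, -1.0, 0.9]       # recorded too high: at least one sample hits full scale

print(round(peak_dbfs(quiet), 1))  # -26.0
print(is_clipping(hot))            # True
```

A recording that peaks comfortably below 0 dB, without sitting far down near the noise, is the target described above.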
Setting the Audio Host
The 'audio host' is the part of the computer's operating system that handles audio. For historical reasons, and for compatibility with older audio hardware, there is often more than one host to choose from.
On Windows, the choice is between the following audio interfaces. If WASAPI works, use it. If it doesn't, use DirectSound. If neither works, try MME.
On Mac the only choice is Core Audio.
On Linux there is often only one option: ALSA. Other options could include JACK (Jack Audio Connection Kit), which is useful for loopback recording.
Choose the built-in or attached sound device that you want to use for recording.
In most cases (for example, the inbuilt computer sound device), each entry for recording device consists of the type of device (such as microphone), followed in parentheses by the name of the sound card manufacturer.
On Mac, the internal sound card recording inputs are usually referred to as "Built-in".
On GNU/Linux, recording is often managed by the PulseAudio sound server. It is normally best to select "default".
Our minds are good at tuning out the unimportant bits. A microphone is limited by its design: it picks up sound either from all directions (an omnidirectional microphone) or from one specific region (a directional microphone). Directional microphones are often named by their 'pickup pattern', the region of the air around them from which they pick up a signal. This region is often heart-shaped, so directional microphones are often some variation of 'cardioid'.
Regardless of whether the microphone is omnidirectional or directional, it should be placed close to the source of sound. If there are multiple sources of sound, then an omnidirectional mic should be placed close to equidistant from all of them.
If you call the sound you want 'the signal', and all the background noise you don't want 'noise', then you're looking for a high signal-to-noise ratio.
With microphones, this can be achieved through careful placement, and with appropriate (loud but not distorting) setting of levels.
If digitizing a tape or vinyl album, or mixing music, your chief tool is appropriate (loud but not distorting) setting of levels.
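Signal-to-noise ratio is usually quoted in decibels. The sketch below shows one common way to estimate it, by comparing the RMS (root mean square) level of a stretch containing the signal with a stretch containing only background noise. The sample values are invented for the example.

```python
import math


def rms(samples):
    """Root-mean-square level: a rough measure of average loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal_samples, noise_samples):
    """Signal-to-noise ratio in decibels; higher is better."""
    return 20 * math.log10(rms(signal_samples) / rms(noise_samples))


signal = [0.5, -0.5, 0.5, -0.5]          # the sound you want
noise = [0.005, -0.005, 0.005, -0.005]   # background hiss, 100 times quieter

print(round(snr_db(signal, noise)))  # 40
```

Moving the microphone closer to the source raises the signal level without raising the noise, which increases this number.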
The easiest way to select a region of audio is to click the left mouse button anywhere inside of an audio track, then drag (in either direction) until the other edge of your selection is made, then release the mouse.
If the Selection Tool is not already selected (it is by default), choose it from the Tools Toolbar, below:
The whole of an individual track can be selected by clicking the Select button in the Track Control Panel to the left of a track.
You can select the entire length of all tracks on screen with Select > All or use the shortcut Ctrl + A (or ⌘ + A on a Mac).
When you record some audio or import audio from a file, you get a single track. In many cases, there are natural gaps in the audio - silence between sentences or pauses between phrases in music. Those are good candidates for splitting the track into multiple clips, allowing you to move or otherwise manipulate those clips independently.
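To illustrate what a "natural gap" looks like in the data, here is a rough sketch that scans a list of samples for runs that stay below a loudness threshold. It is not Audacity's method, just an assumption-laden illustration: the threshold and minimum length are arbitrary, and the function name is hypothetical.

```python
def find_silences(samples, threshold=0.01, min_length=3):
    """Return (start, end) index pairs for runs of samples quieter than threshold."""
    gaps = []
    start = None  # index where the current quiet run began, if any
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                gaps.append((start, i))
            start = None
    # A quiet run may continue to the very end of the audio
    if start is not None and len(samples) - start >= min_length:
        gaps.append((start, len(samples)))
    return gaps


audio = [0.4, -0.3, 0.0, 0.0, 0.0, 0.0, 0.5, -0.2, 0.0, 0.3]
print(find_silences(audio))  # [(2, 6)]
```

The single zero near the end is too short to count as a gap, so only the four-sample pause is reported as a candidate split point.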
To split a clip at the cursor (or region), press Ctrl + I (or ⌘ + I on a Mac), or select from the menu: Edit > Clip Boundaries > Split.
To delete audio, select a clip or a region within a clip, then press Ctrl + K (or ⌘ + K on a Mac), or select from the menu: Edit > Delete.
To move clips around independently, use the Time Shift Tool. Clicking on a clip and dragging it to the left or right is called time-shifting, because you are changing the time at which that audio will be heard.
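In terms of the underlying samples, shifting a clip later in time amounts to putting silence in front of it. A minimal sketch, using a deliberately tiny sample rate of 8 samples per second so the result is easy to read (the function name is made up):

```python
def time_shift(clip, offset_s, sample_rate):
    """Delay a clip by offset_s seconds by padding silence in front of it."""
    if offset_s < 0:
        raise ValueError("this sketch only shifts clips later in time")
    padding = round(offset_s * sample_rate)
    return [0.0] * padding + list(clip)


clip = [0.2, 0.4, -0.1]
shifted = time_shift(clip, 0.5, sample_rate=8)
print(shifted)  # [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, -0.1]
```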
You can cut, copy, and paste selected audio just like words in a word processor, using Ctrl + X, Ctrl + C, and Ctrl + V (⌘ + X, ⌘ + C, and ⌘ + V on a Mac), or via the Edit menu.
When you select the Envelope Tool from the Tools Toolbar, your track, which normally looks like this:
now has a thick blue border at the top and bottom of the waveform, like this:
An amplitude envelope is shaped by a number of control points. Each control point is shown as four handles (the small circles in the image below), by which you can drag the point up or down to control the volume level.
Dragging either the top or bottom handle ensures you can never distort the track by dragging outside its original volume envelope. Dragging an inner handle allows you to amplify a quiet piece of audio beyond the original volume envelope of the track.
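The effect of control points can be sketched as a gain (volume multiplier) that changes over time. The sketch below assumes straight-line interpolation between control points, which may differ from the curve Audacity actually draws, and assumes the points are sorted with strictly increasing times. The function names and control-point values are invented for the example.

```python
def envelope_gain(points, t):
    """Gain at time t, linearly interpolated between (time, gain) control points."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        if t <= t1:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return points[-1][1]  # after the last control point, hold its gain


def apply_envelope(samples, sample_rate, points):
    """Multiply each sample by the envelope gain at that sample's time."""
    return [s * envelope_gain(points, n / sample_rate) for n, s in enumerate(samples)]


# Full volume at 0 s, half volume at 2 s, back to full volume at 3 s
points = [(0.0, 1.0), (2.0, 0.5), (3.0, 1.0)]
print(envelope_gain(points, 1.0))  # 0.75
```

A gain above 1.0 at a control point corresponds to amplifying beyond the original level, as with the inner handles.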
Here's an example of an amplitude envelope applied to the track. The volume is made to diminish slowly and then to grow again much faster (note the much steeper slope of the blue line). The volume is directly proportional to the height of the waveform - the smaller you make the waveform, the quieter it will sound:
In this example we will create a fade that starts at full volume, then starts to fade slowly, then gradually more rapidly to silence, as illustrated here:
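Numerically, a fade of this shape can be approximated by a gain curve that falls slowly at first and faster towards the end. The formula below (one minus the square of the progress through the fade) is just one assumed curve with that shape, not the one Audacity uses.

```python
def fade_out_gain(progress):
    """Gain for a fade that starts slowly and speeds up; progress runs 0.0 to 1.0."""
    progress = min(max(progress, 0.0), 1.0)  # keep progress within the fade
    return 1.0 - progress ** 2


# Gain at the start, quarter, half, three-quarter, and end points of the fade
gains = [fade_out_gain(step / 4) for step in range(5)]
print(gains)  # [1.0, 0.9375, 0.75, 0.4375, 0.0]
```

Note how the first quarter of the fade removes only about 6% of the volume, while the last quarter removes almost 44%.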
When you save an Audacity project with File > Save Project > Save Project you are doing just that - saving an Audacity project. Audacity projects can be opened only by Audacity. If you want other applications (such as Apple Music/iTunes or Windows Media Player) to be able to open this file you need to export it.
Before your first export, let's simplify things a bit. Go to the Import / Export Preferences, and under When exporting tracks to an audio file uncheck "Show Metadata Editor prior to export step". The Metadata Editor adds extra information about the speech or music into the file - see For More Information below to learn more. You can go back to the Import / Export Preferences at any time if you ever want to re-enable the Metadata Editor.
The steps for exporting a file in MP3 format are the same as for a WAV file, except:
If you're informally sharing the file with a friend, export to MP3. This may also make sense for submitting class projects and the like - check with your professor. If you intend to submit the file to a distribution system like iTunes, Anchor, Spotify, Audible, or save it for long term archival, then export to WAV.
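For a sense of what a WAV file contains, the sketch below writes one second of silence as a 16-bit mono WAV using Python's standard wave module, entirely in memory. WAV files normally store every sample uncompressed, so the size grows directly with the length of the recording. The helper name write_wav is made up for this illustration, and the parameters (mono, 16-bit, 44,100 Hz) are assumptions.

```python
import io
import wave


def write_wav(samples, sample_rate=44_100):
    """Encode 16-bit mono samples as WAV bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        frames = b"".join(int(s).to_bytes(2, "little", signed=True) for s in samples)
        wav_file.writeframes(frames)
    return buffer.getvalue()


one_second_of_silence = [0] * 44_100
data = write_wav(one_second_of_silence)
print(len(data))  # 88244
```

That is 88,200 bytes of samples plus a 44-byte header for a single second of mono audio. MP3 discards detail to get a much smaller file, which is why WAV is the safer choice for archiving and distribution.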