AudioWord represents a word in audio transcription.
end represents the end time of the word in seconds.
speaker represents the speaker of the word. Optional.
start represents the start time of the word in seconds.
text represents the text for the word.