Associate teachers

-

ECTS credits

5

Number of hours: Lectures + Seminars + Exercises

30 / 0 / 15

Course objectives

The course covers the fundamentals of digital speech processing and its applications in communications and multimedia.

Digital speech modeling and parametric models. Speech analysis and parameter estimation for the vocal tract model and the excitation model. The most important speech models and their properties. Speech coding and applications. Automatic speech and speaker recognition, language detection. Speech feature vectors, cepstral analysis. Statistical models for speech recognition: Hidden Markov Models, Gaussian Mixture Models, and training procedures for statistical models. Acoustic and lexical models. Speech synthesis based on diphones and triphones. Speech normalization and modification. Examples of systems for speech coding, recognition and synthesis.

Enrolment requirements and/or entry competences required for the course

-

Learning outcomes at the level of the programme to which the course contributes

  • Apply theoretical knowledge of the fundamentals of the six core disciplines and their relationship within cognitive science.
  • Apply specific knowledge and skills from selected disciplines constituting cognitive science.
  • Critically evaluate cognitive science findings and synthesize information to be employed in a collaborative professional environment.
  • Participate in data-driven innovation projects and apply appropriate data science tools.
  • Employ cognitive science insights in developing innovative, human-friendly and sustainable technological solutions.
  • Apply AI tools in concrete tasks and practical contexts.

Course content (syllabus)

  • Lectures (L): Introduction to digital speech processing and its applications, Automatic speech, speaker and language recognition, Basic principles of speech synthesis, Text-to-Speech, Computer dialog systems with applications in virtual reality;
    Lab.exc. (E): Chap.: Survey of digital speech processing applications, Chap.: Fundamentals of speech production, Chap.: Phonetics and Linguistics.
  • Lectures (L): Fundamentals of speech production, Physical model of production;
    Lab.exc. (E): Chap. 1: Recording of speech signals using sound cards.
  • Lectures (L): Acoustic model of vocal tract;
    Lab.exc. (E): Chap. 2: Analysis of speech signals in time domain.
  • Lectures (L): Excitation signal of the vocal tract;
    Lab.exc. (E): Chap. 3: Spectral analysis of speech signals and spectrograms and Chap. Analysis of speech formant structure.
  • Lectures (L): Connected tube model of the vocal tract, Time discrete vocal tract model;
    Lab.exc. (E): Chap. 5: Automatic classification of vowels based on their formant structure.
  • Lectures (L): Linear prediction and its application for speech modeling;
    Lab.exc. (E): Chap. 6: Automatic speaker classification based on formant structure.
  • Lectures (L): Autocorrelation method for LPC model estimation;
    Lab.exc. (E): Chap. 7: Linear prediction methods.
  • Midterm exam
  • Lectures (L): Properties of autocorrelation based LPC model;
    Lab.exc. (E): Chap. 8: Autocorrelation method for speech predictor estimation; and Chap. 9: Levinson-Durbin algorithm; prediction gain analysis.
  • Lectures (L): Covariance method for LPC model estimation, Parametric representations for short-time speech spectral envelope modeling;
    Lab.exc. (E): Chap. 10: Covariance method for speech predictor estimation.
  • Lectures (L): Homomorphic speech processing;
    Lab.exc. (E): Chap. 11: Quantization effects of LPC predictor coefficients.
  • Lectures (L): Applications of homomorphic processing to speech signals;
    Lab.exc. (E): Chap. 12: Homomorphic analysis of speech signals.
  • Lectures (L): Introduction to automatic speech recognition (ASR), Speech analysis for ASR;
    Lab.exc. (E): Chap. 13: Voicing and pitch estimation.
  • Lectures (L): Feature vectors; Statistical models and classification methods for ASR;
    Lab.exc. (E): Chap. 14: Example of the Vocoder.
  • Final exam for continuous assessment

Student responsibilities

Laboratory exercises: During the semester, laboratory exercises are organized in accordance with the week-by-week plan. These exercises prepare students for individual work.
Individual assignments: The total course workload for individual student work amounts to 90 hours, which students use for program exercises and exam preparation. The homework for each of the two semester terms is a report on the individual work on program exercises; this report also covers the laboratory exercises. For individual work, students study the corresponding chapters of the course book and lecture notes cited in the week-by-week plan, perform the required exercises, and prepare a report for each chapter.

Required literature

  • E. Keller, Fundamentals of Speech Synthesis and Speech Recognition, Wiley-Blackwell, 1994, ISBN/ISSN 0471944491
  • Lawrence R. Rabiner, Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993, ISBN/ISSN 0130151572

Optional literature

  • Petrinović, D. (2010.), Uvod u digitalnu obradbu govora korištenjem Matlaba, FER, Udžbenici sveučilišta u Zagrebu
  • Petrinović, D., Digitalna obrada govora, Zavodska skripta, FER, ZESOI, 2010