
Multivariate data analysis

ECTS credits

5

Number of hours: Lectures + Seminars + Exercises

45 / 0 / 15

Course objectives

Multivariate data analysis is one of the basic pillars of data science and generalizes univariate and bivariate statistical methods.

Multivariate analysis is intended for the simultaneous analysis and visualization of complex datasets with a large number of independent and/or dependent variables that are correlated to varying degrees, so that their effects cannot be interpreted separately.

The course content is grouped into three parts. The first covers the basic concepts and techniques that precede multivariate analysis. The second covers advanced regression techniques and their interpretation, with reference to high-dimensional data. The third covers techniques based on matrix decompositions (eigenvalue decomposition and singular value decomposition).
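As a brief illustration of how the two decompositions named above relate, the following sketch uses notation that is an assumption of this example, not taken from the course materials: $X_c$ is the column-centered $n \times p$ data matrix and $S$ is the sample covariance matrix.

```latex
\begin{aligned}
S &= \frac{1}{n-1}\, X_c^{\top} X_c = V \Lambda V^{\top}
  && \text{(eigenvalue decomposition of } S\text{)} \\
X_c &= U D V^{\top}
  && \text{(singular value decomposition of } X_c\text{)} \\
S &= \frac{1}{n-1}\, V D^{\top} U^{\top} U D V^{\top}
   = V \left( \frac{D^{\top} D}{n-1} \right) V^{\top}
  && \text{(since } U^{\top} U = I\text{)} \\
\lambda_i &= \frac{d_i^{2}}{n-1}
  && \text{(eigenvalues of } S \text{ from singular values of } X_c\text{)}
\end{aligned}
```

Under these assumptions, the right singular vectors of the centered data coincide (up to sign) with the eigenvectors of the covariance matrix, which is why techniques such as principal component analysis can be computed by either route.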

Enrolment requirements and/or entry competences required for the course

-

Learning outcomes at the level of the programme to which the course contributes

  • Participate in data-driven innovation projects and apply appropriate data science tools.
  • Initiate and sustain innovation activities in an interdisciplinary team.
  • Apply AI tools in concrete tasks and practical contexts.

Course content (syllabus)

  • Objective of multivariate statistical analysis; Data, objects, variables and scales (Stevens's classification); Classification of multivariate techniques; Summary, description and graphical representation of multivariate data.
  • Data manipulation prior to multivariate analysis (missing data, outlier detection, transformations of data, standardization, normality, linearity, homoscedasticity, homogeneity); Data appropriate for multivariate analysis: data, correlation, variance-covariance, and sum-of-squares and cross-products matrices; residuals; distances (statistical and Mahalanobis).
  • Sample geometry and Random sampling.
  • Applied correlation and regression analysis: interpretation and relation to ANOVA; Canonical correlation analysis.
  • Discriminant analysis.
  • Logistic regression.
  • Principal component analysis.
  • Midterm exam.
  • Exploratory factor analysis.
  • Cluster analysis.
  • Multidimensional scaling.
  • Correspondence analysis.
  • Survival analysis/Failure analysis.
  • The Lasso method for high-dimensional data (Lasso for linear models, generalized linear models and the Lasso, group Lasso).
  • Final exam.
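
To make one of the syllabus items above concrete, here is a minimal sketch of the Mahalanobis distance for bivariate data, written in plain Python. The function names and the small dataset are hypothetical and chosen only for illustration; they do not come from the course materials. The sketch assumes exactly two variables, so that the covariance matrix can be inverted in closed form.

```python
from math import sqrt


def mean_vector(rows):
    """Column means of a list of (x, y) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def covariance_2x2(rows):
    """Sample variance-covariance matrix (divisor n - 1) for two variables."""
    n = len(rows)
    mx, my = mean_vector(rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def mahalanobis(point, mean, cov):
    """Mahalanobis distance sqrt((p - m)^T S^{-1} (p - m)) for a 2x2 matrix S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # S^{-1} = (1 / det) * [[d, -b], [-c, a]], expanded into the quadratic form.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return sqrt(quad)


# Hypothetical observations, used only to exercise the functions.
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = mean_vector(data)
S = covariance_2x2(data)
distance = mahalanobis((3.0, 1.0), center, S)
```

With an identity covariance matrix the result reduces to the ordinary Euclidean distance, which is one simple way to sanity-check such a function; in practice a linear algebra library would be used for more than two variables.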

Student responsibilities

Class attendance. Active engagement in classes. Midterm and final exam.

Required literature

  • R. A. Johnson, D. W. Wichern, Applied Multivariate Statistical Analysis, 5th Edition, Prentice Hall (2002)
  • B. G. Tabachnick, L. S. Fidell, Using Multivariate Statistics, 6th Edition, Pearson (2018)

Optional literature

  • -