Curriculum

 

Core courses:

Probability and Statistics for data analysis (6 units) - Syllabus

Basic principles of probability. Basic theorems in probability, e.g. the law of large numbers and the central limit theorem. Common probability distributions. Principles of statistics. Data summarization. Statistical inference and causality, experimental design and sampling methods, estimation and hypothesis testing. Bootstrap and variants.

Practical Data Science (6 units) - Syllabus

The course gives students a set of practical skills for handling data that comes in a variety of formats and sizes, such as texts, spatial and time series data. These skills cover the data analysis lifecycle from initial access and acquisition, modeling, transformation, integration, querying, application of statistical learning and data mining methods, and presentation of results. (The course is hands-on, using Python in the IPython interactive computing framework.)

Large Scale Data Management (6 units)

Methods and techniques for database design and management, operational data management and transaction processing, data warehouse creation, and information retrieval. New approaches for storage and querying (column stores, NewSQL) will be discussed and experimented with. Management of large scale structured and unstructured data in different information systems environments.

Machine Learning and Computational Statistics (7 units) - Syllabus

Introduction to the basic ideas of statistical learning models (supervised and unsupervised learning). Model selection, feature selection and cross-validation. Linear regression and logistic regression. Generalized linear models. K-nearest neighbor classification, Bayes and naive Bayes classifiers. Kernel Discriminant Analysis and Support Vector Machines. Unsupervised learning methods. Clustering using k-means and mixture models. The EM algorithm. Dimensionality reduction using PCA, probabilistic PCA, factor analysis and independent component analysis.

Numerical optimization and Large Scale Linear Algebra (6 units) - Syllabus

Floating point arithmetic; Stability of numerical algorithms; Norms; Fundamentals of matrix theory; Solution of systems of linear equations: direct methods, error analysis, structured matrices; Iterative methods for linear equations and least squares; Eigenanalysis; important matrix factorizations and their algorithms. Application to case studies.

Data visualization and communication (6 units) - Syllabus

Communicating clearly and effectively about the patterns we find in data is a key skill for a successful data scientist. Visualizations are graphical depictions that can improve comprehension. Visualizations will be paired with verbal analyses and reporting. Different tools will be used to transform data and create visualizations, including Python, Google Charts, Tableau, and Spotfire. Assignments will give students experience with reporting on complex patterns and results with graphics and prose.

Legal, ethical and policy issues in data science (3 units) - Syllabus

Discusses issues of privacy, surveillance, security, classification, discrimination and decisional autonomy from a legal, ethical, and policy perspective (whether business or public policy). Areas of relevance include health, marketing, employment, law enforcement, and education.

 

Electives (indicative list):

Data mining (6 units) - Syllabus

Data-oriented techniques for extracting patterns from data. Association rules, decision trees. Collaborative filtering and recommendation algorithms. Finding similar items and frequent itemsets. Mining data streams. Mining social network graphs. Mining for Web advertising. Implementing machine learning schemes.

Bayesian Statistics and simulation methods (6 units) - Syllabus

Bayesian inference. Simulation and random number generation. Markov models and hidden Markov models. Probabilistic graphical models. Bayesian statistical methods, Markov chain Monte Carlo, Metropolis-Hastings algorithm, Gibbs sampling, sequential Monte Carlo methods, approximate Bayesian computation.

Advanced Large Scale Data Management (5 units)

Distributed and parallel data-oriented computation and transaction processing. Integration and management of large scale structured and unstructured data in different information systems environments.

Big Data Systems and techniques (6 units) - Syllabus

Techniques and best practices for the development of production Big Data systems using Parquet and ORC columnar storage files in Hadoop and the Apache Spark data processing framework with SQL query engines (Spark SQL, Presto). Integration with the latest parallel machine learning frameworks. Cloud service technologies like Amazon EMR. Data visualization technologies. Streaming and real-time processing with Apache Storm and Kafka.

Statistics for Big data (3 units) - Syllabus 

Small-n, large-p problems, regularization, model and variable selection techniques, LASSO, elastic net. Multiplicity. Graphical models. Techniques for sparse matrices and graphical LASSO. Compressed sensing.

Time series and Forecasting methods (3 units) - Syllabus 

Basic principles, autocorrelation and autocovariance, Holt-Winters method, AR, ARMA, ARIMA models. Regression models, ARCH-GARCH, volatility models.

Optimization (5 units) - Syllabus

Linear programming (formulations and algorithms), convex optimization and applications to machine learning (least squares, linear regression, gradient descent, support vector machines), combinatorial optimization (integer programming formulations, branch and bound), local search methods (hill climbing, tabu search, simulated annealing), genetic algorithms.

Text analytics (6 units) - Syllabus

Language models, text normalization. Applying feature extraction, classification, and sequence labeling algorithms (e.g., PCA, naive Bayes, logistic regression, SVMs, HMMs, CRFs) to texts (for document classification, entity recognition, etc.). Parsing (CKY, Earley, probabilistic CFGs). Semantics (logic-based, distributional, word embeddings, sense disambiguation) and discourse analysis (co-reference, rhetorical relations). Machine translation. Information extraction (incl. relation extraction) and sentiment analysis. Question answering. Text summarization. Concept-to-text generation. Speech recognition fundamentals.

Data science and optimization for operations management (5 units)

Overview of basic concepts from operations management: process analysis, queues, inventory management, revenue management. Demand forecasting. Inventory/replenishment optimization. Lead time analysis. MRP/production planning. Fleet allocation. Route optimization.

Marketing and sales analytics (6 units)

Overview of data mining techniques: clustering, classification, dimensionality reduction, sequence modeling. Techniques for customer segmentation. Churn management. Cross-/up-sell campaign targeting. Next best action. Marketing mix optimization. Omni-channel optimization. Loyalty analytics. Basket analysis.

Data Science for medicine (3 units) - Syllabus

Introduction to epidemiological methods: bias, confounding, sample size. Survival analysis: hazard functions, parameter inference. Methods for categorical data. Analysis of contingency tables, risk assessment in retrospective and prospective studies.

Data Science for Biology (3 units) - Syllabus

Reproducible research. Pedigree analysis and relationship matrices. Experimental design with emphasis on replication and confounding. Linear and logistic regression in R with many explanatory variables: Bayesian and classical treatment. Wide/un-stacked data and genome-wide association studies.

Information retrieval (3 units) - Syllabus

Text vocabulary, automatic indexing, inverted files, fast inversion algorithm, index compression. Evaluation of information retrieval systems. Information retrieval models (Boolean model, vector space model, probabilistic retrieval model), latent semantic indexing. Computing scores, result ranking. Crawling. Link analysis. Search engine architecture and systems issues.

Data curation (3 units)

Data lifecycle and value chains. Data provenance, curation and preservation: models, practices and tools. Using ontologies and metadata. Data and metadata aggregators and repositories.

Advanced Econometric Models for Finance (3 units) - Syllabus

Introduction to the theory and empirical analysis of advanced econometric models for financial applications. Optimal portfolio construction, performance evaluation and forecasting of financial time series. Multivariate multifactor models. Multivariate heteroskedastic models. Examples applying these advanced econometric models and techniques to actual financial data using R.

Data Science Challenge (5 units)

This course aims to familiarize students with the integrated workflow of a Data Science (DS) problem. There will be an introduction to DS methods, including data preprocessing, feature selection and engineering, machine learning, graph/text mining and visualization. Next, there will be an introduction to the specific data challenge and the particularities of its domain. The students will have sufficient time to work on the challenge and submit their solutions to a platform (such as Kaggle) that enables automated evaluation of predictions for unclassified data. At the end, the best solutions will be presented to the class.

Introduction to Quantitative Finance and Financial Risk Management (5 units) - Syllabus

The course will provide an integrated overview of the basic financial instruments (securities and derivatives), the models of asset dynamics for different risk types (Equities, Interest Rates, FX & Credit) and the key techniques of identification, measurement and management of financial risk. Basic financial instruments and associated fundamental concepts: time value of money, interest rates and fixed income securities; Simple derivatives: Futures, Forwards and Interest Rate Swaps; Options and the Black-Scholes framework. Statistical measures and error metrics of different distributions. Value at Risk (VaR), Expected Shortfall; Methodologies for VaR calculation; Credit risk and the Basel II capital requirements.

Online Analytical Processing and Big Data Warehouses (3 units) - Syllabus

What is BI? OLAP vs OLTP. Extract-Transform-Load: process and tools. Data cubes: models, operations, algorithms. Data warehouses. Indexing and updating. In-memory databases. Column stores. NoSQL systems.

Social Network Analysis (3 units) - Syllabus

Social network graph models and node metrics. Methods for social network analysis, clustering, classification, partitioning and community detection. Pregel paradigm. Diffusion and information propagation in social networks. Dynamic social networks. Apache Giraph and SNAP graph processing systems. Graph visualization.

Financial Information Systems (3 units) - Syllabus

The aim of the Financial Information Systems (FIS) course is to present the information systems and technologies that the financial industry (banks, investment firms, brokerage houses, exchanges) uses, and how these IT systems operate and evolve.
 

Preparatory courses:

Elements of Statistics and Probability - Syllabus
The course is a brief introduction to the basics of probability, statistics and data analysis. The aim is to refresh graduates' knowledge of the basic notions of statistics, combined with a short introduction to R.
 
Foundations of Computer Science - Syllabus
This preparatory course aims to strengthen students' knowledge of the design and analysis of algorithms for a wide range of practical and theoretical problems.
 
Math for Data Science - Syllabus
The course is a brief overview of the basic tools from Linear Algebra and Multivariable Calculus that will be needed in subsequent courses of the program.