Data Science Specialization


Overview

The Data Science Specializations are designed to fill a critical educational gap at the intersection of Biomedical & Health Informatics (BHI) and Data Science. Offered as electives in our research MS and PhD programs, the specializations introduce students to the world of data science, giving them additional skills and a variety of techniques and tools. Their goal is to give students a strong foundation in data science so they can apply its methods and techniques to current BHI research and advance their careers. The specializations are aligned with the graduate-level educational offerings of the campus-wide eScience Institute.

Find course requirements and descriptions below.

Regular Data Science Specialization

To complete the regular data science option, students must pass:

  • One of the following three Software Development courses:
    • CSE 583: Software Development for Data Scientists
    • CHEM E 546: Software Engineering for Molecular Data Scientists
    • BIOST 544: Introduction to Biomedical Data Science
  • One of the following eight Statistics and Machine Learning courses:
    • CSE 416: Introduction to Statistical Machine Learning
    • CSE 473: Artificial Intelligence
    • CSE 546: Machine Learning
    • STAT 435: Introduction to Statistical Machine Learning
    • STAT 535: Statistical Learning: Modeling, Prediction, and Computing
    • BIOST 545: Biostatistical Methods for Big Omics Data
    • BIOST 546: Machine Learning for Biomedical and Public Health Big Data
    • BIOST 558: Statistical Machine Learning for Data Scientists
  • One of the following eight Data Management and Data Visualization courses:
    • CSE 412: Introduction to Data Visualization
    • CSE 414: Introduction to Database Systems
    • CSE 442: Data Visualization
    • CSE 512: Data Visualization
    • CSE 544: Principles of Database Systems
    • HCDE 411: Information for Visualization
    • HCDE 511: Information Visualization
    • INFO 474: Interactive Information Visualization
  • Two quarters of the eScience Community Seminar (1 credit each quarter)

Advanced Data Science Specialization

For students with a strong computer science background, or for those who want a more intensive experience, we also offer the Advanced Data Science Specialization. In alignment with the eScience Institute, its requirements are:

  • One course from each of the following four categories:
    • CSE 544: Principles of Database Systems
    • BIOST 546: Machine Learning for Biomedical and Public Health Big Data OR STAT 535: Statistical Learning: Modeling, Prediction, and Computing
    • CSE 512: Data Visualization
    • STAT 509: Introduction to Mathematical Statistics: Econometrics I, STAT 512: Statistical Inference, OR STAT 513: Statistical Inference (a more in-depth version of 509)
  • Four quarters of the eScience Community Seminar (1 credit each quarter)

Data Science Course Descriptions, Prerequisites, Quarters Offered, and Number of Credits

Please note: The UW Time Schedule contains the most up-to-date information.

Software Development

CSE 583: Software Development for Data Scientists — Provides students outside of CSE with practical knowledge of software development sufficient for graduate work in their discipline. Modules include Python basics, software version control, software design, and using Python for machine learning and visualization. Autumn, 4 credits

CHEM E 546: Software Engineering for Molecular Data Scientists — Introduces basic principles of scientific software development in Python through project-based work in molecular science and engineering, spanning up to the process scale. Covers command-line tools, Python from the perspective of molecular and process engineering methods, software design, and development and collaboration principles (e.g., version control). Offered jointly with CHEM 546/MSE 546. Winter, 3 credits

BIOST 544: Introduction to Biomedical Data Science — Provides an introduction to biomedical data science with an emphasis on statistical perspectives, including the process of collecting, organizing, and integrating information to extract knowledge from data in public health, biology, and medicine. Prerequisites: either BIOST 511 or equivalent; either BIOST 509 or equivalent; or permission of instructor. Autumn, 4 credits

Statistics and Machine Learning

CSE 416: Introduction to Statistical Machine Learning — Provides a practical introduction to machine learning. Modules include regression, classification, clustering, retrieval, recommender systems, and deep learning, with a focus on an intuitive understanding grounded in real-world applications. Intelligent applications are designed and used to make predictions on large, complex datasets. Offered jointly with STAT 416. Prerequisite: either CSE 123, CSE 143, CSE 160, or CSE 163; and either STAT 311, STAT 390, STAT 391, IND E 315, MATH 394/STAT 394, STAT 395/MATH 395, or Q SCI 381. Autumn, 4 credits

CSE 473: Artificial Intelligence — Principal ideas and developments in artificial intelligence: problem solving and search, game playing, knowledge representation and reasoning, uncertainty, machine learning, and natural language processing. Prerequisite: CSE 312 and CSE 332. Autumn, 3 credits

STAT 435: Introduction to Statistical Machine Learning — Introduces the theory and application of statistical machine learning. Topics may include supervised versus unsupervised learning; cross-validation; the bias-variance trade-off; regression and classification; regularization and shrinkage approaches; non-linear approaches; tree-based methods; and support vector machines. Includes applications in R. Prerequisite: either STAT 341, STAT 390/MATH 390, or STAT 391; recommended: MATH 208. Spring, 4 credits

BIOST 545: Biostatistical Methods for Big Omics Data — This hands-on course introduces statistical methods for high-dimensional omics data, as well as the R programming language and the Bioconductor project as tools to extract, query, integrate, visualize, and analyze real-world omics data sets. Prerequisites: BIOST 512, 514, or 517. Not offered, 3 credits

BIOST 546: Machine Learning for Biomedical and Public Health Big Data — Provides an introduction to statistical learning for biomedical and public health data. Intended for graduate students in SPH/SOM. Winter, 3 credits

BIOST 558: Statistical Machine Learning for Data Scientists — Bias-variance trade-off; training versus test error; overfitting; cross-validation; subset selection methods; regularized approaches for linear/logistic regression: ridge and lasso; non-parametric regression: trees, bagging, random forests; local regression and splines; generalized additive models; support vector machines; k-means and hierarchical clustering; principal components analysis. Offered jointly with DATA 558/STAT 558. Spring, 5 credits

CSE 546: Machine Learning — This choice has a number of prerequisites and is part of the advanced data science option. Explores methods for designing systems that learn from data and improve with experience: supervised learning and predictive modeling; decision trees, rule induction, nearest neighbors, Bayesian methods, neural networks, support vector machines, and model ensembles; unsupervised learning and clustering. Offered jointly with STAT 535. Prerequisites: CSE 312, STAT 341, STAT 391 or equivalent. Autumn, 4 credits

STAT 535: Statistical Learning: Modeling, Prediction, and Computing — Covers statistical learning over discrete multivariate domains, exemplified by graphical probability models. Emphasizes the algorithmic and computational aspects of these models. Includes additional topics in probability and statistics of discrete structures, general-purpose discrete optimization algorithms like dynamic programming and minimum spanning tree, and applications to data analysis. Prerequisite: experience with programming in a high-level language. Autumn, 3 credits

Only for Advanced Data Science 

  • STAT 509: Introduction to Mathematical Statistics: Econometrics I — Examines methods, tools, and theory of mathematical statistics. Covers probability densities, transformations, moment generating functions, and conditional expectation; Bayesian analysis with conjugate priors, hypothesis tests, and the Neyman-Pearson lemma; likelihood ratio tests, confidence intervals, maximum likelihood estimation, the central limit theorem, Slutsky's theorems, and the delta method. Prerequisite: STAT 311; either MATH 126 or MATH 136; and either MATH 208 or MATH 209. Offered jointly with CSSS 509/ECON 580. Autumn, 4 credits
  • STAT 512: Statistical Inference — Review of random variables; transformations, conditional expectation, moment generating functions, convergence, limit theorems, estimation; Cramer-Rao lower bound, maximum likelihood estimation, sufficiency, ancillarity, completeness. Rao-Blackwell theorem. Hypothesis testing: Neyman-Pearson lemma, monotone likelihood ratio, likelihood-ratio tests, large-sample theory. Contingency tables, confidence intervals, invariance. Decision theory. Course overlaps with: BIOST 522 and BIOST 523. Prerequisite: STAT 395 and STAT 421, STAT 423, STAT 504, or BIOST 512 (concurrent registration permitted for these three). Autumn, 4 credits
  • STAT 513: Statistical Inference — Review of random variables; transformations, conditional expectation, moment generating functions, convergence, limit theorems, estimation; Cramer-Rao lower bound, maximum likelihood estimation, sufficiency, ancillarity, completeness. Rao-Blackwell theorem. Hypothesis testing: Neyman-Pearson lemma, monotone likelihood ratio, likelihood-ratio tests, large-sample theory. Contingency tables, confidence intervals, invariance. Decision theory. Course overlaps with: BIOST 522 and BIOST 523. Prerequisite: STAT 512. Winter, 4 credits

Data Management and Data Visualization

CSE 414: Introduction to Database Systems — Introduces database management systems and writing applications that use such systems; data models, query languages, transactions, database tuning, data warehousing, and parallelism. Intended for non-majors. Cannot be taken for credit if credit received for CSE 344. Prerequisites: a minimum grade of 2.5 in either CSE 123, CSE 143, or CSE 163. Autumn, 4 credits

HCDE 411: Information for Visualization — Introduces the design and presentation of digital information. Covers the use of graphics, animation, sound, and other modalities in presenting information to the user; understanding vision and perception; methods of presenting complex information to enhance comprehension and analysis; and the incorporation of visualization techniques into human-computer interfaces. Prerequisite: HCDE 310; and either HCDE 210, HCDE 303, INFO 360, or both HCDE 300 and HCDE 318. Winter, 5 credits

HCDE 511: Information Visualization — Covers the design and presentation of digital information. Uses graphics, animation, sound, and other modalities in presenting information to users. Studies vision and perception. Includes methods of presenting complex information to enhance comprehension and analysis, and the incorporation of visualization techniques into human-computer interfaces. The Summer A-term offering is for HCDE PhD students only. Summer, Winter, 4 credits

INFO 474: Interactive Information Visualization — Techniques and theory for visualizing, analyzing, and supporting interaction with structured data like numbers, text, and relations. Provides practical experience designing and building interactive visualizations for the web. Exposes students to cognitive science, statistics, and perceptual psychology. An empirical approach will be used to design and evaluate visualizations. Prerequisite: INFO 340 or CSE 154; CSE 123, CSE 143, or CSE 163; and either QMETH 201, Q SCI 381, STAT 220, STAT 221/CS&SS 221/SOC 221, STAT 290, STAT 311, or STAT 390. Autumn, 5 credits

CSE 412: Introduction to Data Visualization — Introduction to data visualization design and use for both data exploration and explanation. Methods for creating effective visualizations using principles from graphic design, psychology, and statistics. Topics include data models, visual encoding methods, data preparation, exploratory analysis, uncertainty, cartography, interaction techniques, visual perception, and evaluation methods. Cannot be taken for credit if credit received for CSE 442. Prerequisite: either CSE 123, CSE 143, or CSE 163. Winter, 4 credits

CSE 442: Data Visualization — Techniques for creating effective visualizations of data based on principles from graphic design, perceptual psychology, and statistics. Topics include visual encoding models, exploratory data analysis, visualization software, interaction techniques, graphical perception, color, animation, high-dimensional data, cartography, network visualization, and text visualization. Prerequisite: CSE 332. Autumn, 4 credits

CSE 544: Principles of Database Systems — Data models and query languages (SQL, datalog, OQL). Relational databases, enforcement of integrity constraints. Object-oriented databases and object-relational databases. Principles of data storage and indexing. Query-execution methods and query optimization algorithms. Static analysis of queries and rewriting of queries using views. Data integration. Data mining. Principles of transaction processing. This choice requires a number of prerequisites, including comfort in Java programming. This choice is part of the advanced data science option. Winter, 4 credits

CSE 512: Data Visualization — Covers techniques and algorithms for creating effective visualizations based on principles from graphic design, visual art, perceptual psychology, and cognitive science. Topics include data and image models; visual encoding; graphical perception; color; animation; interaction techniques; graph layout; and automated design. Lectures, reading, and project. This choice is part of the advanced data science option. Spring, 4 credits