
Doctoral Thesis: The Modeling Spectrum of Data-Driven Decision Making

Xianglin Flora Meng

Data-driven decision-making has become an essential part of modern life by virtue of the rapid growth in data, the massive improvements in computing power, and great progress in academic research. The techniques used fall broadly on a spectrum that ranges from model-based to applied, depending on the problem complexity and data availability. This thesis studies three settings that span the modeling spectrum in the contexts of digital agriculture, cell reprogramming, and pandemic policymaking.

First, we investigate the problem of learning good farming practices in the framework of multi-armed bandits with expert advice. We extend the setting from finitely many experts to any countably infinite set and provide algorithms that are provably optimal. Second, we explore optimizing perturbations for cell reprogramming in batched experiments. Building upon multi-armed bandit algorithms, we propose an active learning approach that integrates deep learning and biology-based analysis. We numerically demonstrate the success of our method on gene expression data. Finally, we model the impacts of nonpharmaceutical interventions during the coronavirus disease 2019 (COVID-19) pandemic. We develop an agent-based model in order to overcome the limitations of observational data. We show that the trade-off between COVID-19 deaths and deaths of despair, dependent on the lockdown level, only exists in the socioeconomically disadvantaged population. Our model establishes effective measures for reducing disparities during the pandemic.
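For readers unfamiliar with bandits with expert advice, the following is a minimal sketch of the classical Exp4 algorithm for a finite expert set. It is not the algorithm from the thesis, which extends the setting to countably infinite expert sets. The two experts, the reward function, and the parameter values are hypothetical.

```python
import math
import random

def exp4(expert_advice, reward_fn, n_rounds, gamma=0.1, seed=0):
    """Classical Exp4 for a finite expert set (illustrative stand-in only).

    expert_advice(t) -> list of probability vectors over arms, one per expert.
    reward_fn(t, arm) -> reward in [0, 1] for the arm that was pulled.
    """
    rng = random.Random(seed)
    first = expert_advice(0)
    n_experts, n_arms = len(first), len(first[0])
    weights = [1.0] * n_experts
    total_reward = 0.0
    for t in range(n_rounds):
        advice = expert_advice(t)
        w_sum = sum(weights)
        # Mix the experts' recommendations by weight, then add uniform exploration.
        probs = [
            (1 - gamma) * sum(w * a[k] for w, a in zip(weights, advice)) / w_sum
            + gamma / n_arms
            for k in range(n_arms)
        ]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted reward estimate; it is zero for arms not pulled.
        x_hat = reward / probs[arm]
        weights = [
            w * math.exp(gamma * a[arm] * x_hat / n_arms)
            for w, a in zip(weights, advice)
        ]
        # Rescale so the largest weight is 1; this leaves the mixture unchanged.
        w_max = max(weights)
        weights = [w / w_max for w in weights]
    return weights, total_reward

# Hypothetical setup: expert 0 always recommends arm 0, expert 1 always arm 1.
def advice(t):
    return [[1.0, 0.0], [0.0, 1.0]]

def reward(t, arm):
    return 1.0 if arm == 1 else 0.0   # only arm 1 pays off

weights, total = exp4(advice, reward, n_rounds=200)
```

In this toy run the weight of the expert that recommends the rewarding arm grows relative to the other, so the learner pulls that arm more and more often.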

  • Date: Thursday, July 28
  • Time: 10:30 am - 12:00 pm
  • Category: Thesis Defense
  • Location: 32-D677


Wei, Skylar Xueyao (2024) Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/qpbp-0x81.

This thesis addresses the critical challenge of ensuring safety in autonomous exploration within unknown, unstructured, dynamic environments, a domain filled with various types of uncertainties. These include model uncertainties in system dynamics, localization uncertainties stemming from measurement noises, and the risks of collision in environments with dynamic obstacles. Traditional models for vehicle planning and control are often simplified for computational feasibility, but this simplification without careful analysis can compromise safety and system stability. My research introduces a novel, comprehensive framework to provide probabilistically safe planning and control for robot autonomy, structured around three components:

(1) Probabilistic Uncertainty Quantification for Model Mismatches:

This segment focuses on identifying model discrepancies from closed-loop tracking data in an unstructured environment where a reduced-order robot model is used for planning and control. The disturbance is modeled as a scalar-valued stochastic process given by a norm of the difference between the reduced-order robot model and the actual system evolution. In an online and risk-aware framework, Gaussian Process Regression is employed to extract a probabilistic upper bound on this stochastic process, referred to as the Surface-at-Risk. Theoretical guarantees on the accuracy of the fitted discrepancy surface are analyzed and verified against the data sets collected during system operation.

In an offline setting, conformal prediction, a statistical inference tool, is employed to obtain probabilistic upper bounds of matched and unmatched model disturbance in the system from data, without any assumption of the latent probability distribution governing these discrepancies. Building on these bounds, the robot's nominal ancillary controller is augmented for extending robustness and stability guarantees of the closed-loop system in the face of such discrepancies. Additionally, a maximum tracking error tube is constructed along the planned trajectory using the reduced-order model. Such error tubes describe the maximum permissible deviation in actual trajectory tracking under the augmented ancillary controller and the worst-case matched and unmatched model uncertainties, thereby delineating safe operational boundaries for the system.
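As a rough illustration of how conformal prediction yields a distribution-free upper bound, the sketch below computes a split-conformal quantile from a set of calibration scores. It assumes the scores are exchangeable with future ones. The residual values are made-up numbers, and the thesis's treatment of matched and unmatched disturbances is not reproduced here.

```python
import math

def conformal_upper_bound(calibration_scores, alpha):
    """Upper bound that a new exchangeable score exceeds with probability <= alpha."""
    n = len(calibration_scores)
    # Finite-sample-corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration samples to certify this level
    return sorted(calibration_scores)[rank - 1]

# Hypothetical norms of the model residual observed during calibration runs.
residual_norms = [0.12, 0.05, 0.31, 0.08, 0.22, 0.17, 0.09, 0.27, 0.14]
bound = conformal_upper_bound(residual_norms, alpha=0.2)
```

With nine calibration samples and alpha = 0.2 the bound is the eighth-smallest score. Requesting alpha = 0.05 from the same nine samples returns infinity, which shows how the achievable confidence level depends on the amount of calibration data.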

(2) Data-Driven Unsafe Set Prediction for Dynamic Obstacles:

This thesis topic develops an online, data-driven predictive model for dynamic obstacles, accounting for measurement noise and low-frequency data rates. Inspired by singular spectrum analysis (SSA), a time-series forecasting technique, obstacle models characterized by linear recurrence relationships are extracted from real-time position observables. Using the statistical bootstrap technique, a set of predicted obstacle trajectories is constructed, which in turn is reformulated into deterministic, distributionally robust obstacle avoidance constraints that reflect a user-defined risk tolerance.

Further refining the obstacle predictor for intention-unknown obstacles, a linear, time-varying model is learned from data using time-delay embedding of obstacle position observables. Additive process and measurement noises are anticipated in the learned model, where their intensities are estimated from data. For inferring prediction uncertainties, a companion data-driven Kalman Filter (DDKF) is constructed to forecast obstacle positions and uncertainties. This "heuristic unsafe set" from DDKF is then dynamically calibrated using adaptive conformal prediction, ensuring safety without relying on any distribution assumptions regarding the uncertainties or model accuracy. The calibrated sets, called conformal prediction sets, are then reformulated into convex state constraints.
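The calibration step can be pictured with the standard adaptive conformal update, in which the working miscoverage level is nudged after every observation. The sketch below is a generic version under stated assumptions: the scores stand in for prediction errors of some forecaster (the DDKF is not implemented), and the function name, step size, and synthetic data are hypothetical.

```python
import random

def adaptive_conformal_radii(scores, alpha=0.1, gamma=0.05):
    """Online radii using the update alpha_t += gamma * (alpha - err_t)."""
    alpha_t = alpha
    history, radii = [], []
    for s in scores:
        if not history or alpha_t <= 0.0:
            radius = float("inf")   # no calibration data yet, or full coverage demanded
        elif alpha_t >= 1.0:
            radius = 0.0            # empty set
        else:
            ordered = sorted(history)
            idx = min(len(ordered) - 1, int((1 - alpha_t) * len(ordered)))
            radius = ordered[idx]   # empirical (1 - alpha_t) quantile of past scores
        radii.append(radius)
        err = 1.0 if s > radius else 0.0   # 1 when the observation escaped the set
        alpha_t += gamma * (alpha - err)
        history.append(s)
    return radii

rng = random.Random(0)
# Hypothetical prediction errors whose scale doubles halfway through the run.
scores = [abs(rng.gauss(0.0, 1.0 if t < 250 else 2.0)) + 1e-3 for t in range(500)]
radii = adaptive_conformal_radii(scores, alpha=0.1, gamma=0.05)
miscoverage = sum(s > r for s, r in zip(scores, radii)) / len(scores)
```

Because the level is corrected after each miss or hit, the long-run miscoverage stays close to the target alpha even though the error distribution shifts mid-run, which is the property that removes the need for distributional assumptions.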

(3) Safety-Critical Planning:

The thesis proposes two methods for ensuring safety in planning and navigation under the uncertainties that arise from operating in unknown, unstructured, and dynamic environments: Probabilistic-Safe Model Predictive Control (MPC) and Probabilistic-Safe Model Predictive Path Integral (MPPI). The MPC approach integrates the quantified obstacle avoidance constraints into a convex program, retaining computational tractability while providing probabilistic safety guarantees. In contrast, the MPPI method, a sampling-based strategy, incorporates unsafe sets into a cost map derived from sensory data and optimizes the reference-tracking trajectory while guaranteeing collision avoidance up to a user-defined risk tolerance.

In unknown and cluttered environments, the proposed framework automatically learns an upper bound on model residuals from data and systematically calculates the safety buffers needed to provide the desired probabilistically safe navigation of robotic systems. Additionally, in the presence of dynamic obstacles, the proposed data-driven predictor systematically extracts an obstacle model and forecasts the obstacle-occupied unsafe sets. These features largely eliminate the "hand tuning" of the underlying planner and controller that is normally required in heuristic-based algorithms. The efficacy of the proposed frameworks is empirically validated through Monte Carlo simulations, alongside hardware validations on both ground and aerial vehicles, demonstrating their robustness, versatility, and applicability in real-world scenarios.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Risk-Aware Motion Planning, Model Predictive Control, Collision Avoiding, Data-Driven Methods, Robotics.
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Control and Dynamical Systems
Awards: Outstanding Student Paper Award, CDC 2023.
Thesis Availability: Public (worldwide access)
Research Advisor(s):
Thesis Committee:
Defense Date:6 February 2024
Funders:
Funding Agency | Grant Number
JPL | JPL’s President’s and Director’s Research and Development Fund
DARPA | LINC
Technology Innovation Institute | UNSPECIFIED
URL Type | Description
Publisher | Paper adapted for Chapters 2 and 3.
DOI | Paper adapted for Chapters 2 and 5.
Publisher | Paper adapted for Chapters 2 and 6.
arXiv | Paper adapted for Chapters 2 and 4.
Publisher | Paper adapted for Chapter 2.
Author: Wei, Skylar Xueyao


Title: Data-driven Modeling for Fluid Dynamics and Control
Authors: 
Advisors: 
Contributors: Mechanical and Aerospace Engineering Department
Keywords: 




Subjects: 

Issue Date: 2020
Publisher: Princeton, NJ : Princeton University
Abstract: One of the most critical tasks in fluid dynamics and control is to build simple, low-order, and accurate models. Such models are essential for understanding dynamics and designing control. However, in many cases, the models are either unknown or too complicated to be useful. As an example, fluid flows are governed by the Navier-Stokes equations (NSE), which remain intractable for real-time applications. Meanwhile, with increasing computational power and advances in experimental and numerical methods, researchers have access to much more data about dynamical systems. For instance, computational fluid dynamics (CFD) produces large volumes of data, but the data have not been fully utilized. Data-driven modeling addresses these challenges by learning dynamical system models from data. This thesis focuses on data-driven modeling methods for applications in fluid dynamics and control. First, we propose an evaluation criterion to quantify the accuracy of dynamic mode decomposition (DMD), a data-driven algorithm for extracting spatial and temporal features of dynamical systems from data. DMD is a numerical approximation to the linear Koopman operator associated with a dynamical system. By exploiting this connection, the accuracy criterion is purely data-driven and physically meaningful. It also applies to other variants of DMD algorithms and assists in model selection. Second, fast algorithms are developed for online dynamic mode decomposition (ODMD). Given real-time measurements of a dynamical system, this algorithm efficiently updates an adaptive model upon each new snapshot. It reduces both the computational time and memory requirements by orders of magnitude compared with existing methods. The ODMD algorithm can be modified to gradually forget old data, which enables faster tracking of dynamics. ODMD also extends to both linear and nonlinear system identification, where control is included. Finally, we study the input-output response of a separated flow past a flat plate. The analysis is based on the frequency-domain transfer function of the linearized NSE about the mean flow. The control input is body forcing, and the output is the flow field. This analysis sheds light on the optimal control placement and reveals that the trailing-edge separation bubble is most sensitive to streamwise body force (control) upstream of the separation point.
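To make the DMD step concrete, here is a minimal sketch of the standard (batch) exact DMD algorithm in NumPy, not the online variant or the accuracy criterion developed in the thesis. The function name, the rank argument, and the two-state test system are assumptions chosen for illustration.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD: eigenvalues and modes of the best-fit linear map X -> Y.

    snapshots has one state vector per column, ordered in time.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Project the linear operator onto the leading singular subspace of X.
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Lift the reduced eigenvectors back to the full state space.
    modes = Y @ Vh.conj().T / s @ W
    return eigvals, modes

# Hypothetical linear system with known eigenvalues 0.9 and 0.5.
A_true = np.array([[0.9, 0.2], [0.0, 0.5]])
snaps = np.empty((2, 10))
snaps[:, 0] = [1.0, 1.0]
for k in range(9):
    snaps[:, k + 1] = A_true @ snaps[:, k]
eigvals, modes = dmd(snaps, rank=2)
```

On noise-free data from a linear system the DMD eigenvalues match those of the generating matrix. With real flow data the rank truncation acts as a filter, and its choice is one place where an accuracy criterion becomes useful.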
URI: 
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:
File: Zhang_princeton_0181D_13281.pdf
Size: 11.46 MB
Format: Adobe PDF


Graduate Thesis Or Dissertation

Data-driven model development and identification of dynamical systems


In recent years, data-driven model discovery has become increasingly popular due to rapid advances in computational power and in data processing and storage procedures. This has fostered the development of new algorithms to identify complex systems from data. However, the performance and robustness of present techniques deteriorate significantly when the data are contaminated with noise. This dissertation considers modern sparse regression techniques to robustly recover governing equations of nonlinear dynamical systems from noisy state measurements. Over three main chapters, we investigate convex ℓ1-regularized least squares methods, denoising strategies to enhance the performance and accuracy of identification algorithms, and non-convex optimization procedures for dynamical system identification. We begin by exploring an iteratively reweighted version of ℓ1-regularized least squares to mitigate noise effects on measurements and conclude that a reweighted approach enhances the accuracy of the dynamical identification process. We also propose a method to recover dynamical constraints given by implicit functions of the state variables. Next, we compare and assess local and global measurement denoising strategies, as well as model selection techniques, as a pre-processing step to improve the robustness and performance of sparse identification algorithms. We show empirically that global methods outperform local methods, and that Pareto curves generally yield better regularization parameters than generalized cross-validation. Finally, we present a promising non-convex formulation and suitable optimization algorithms for sparse dynamical system identification that avoid errors arising from numerical differentiation of noisy data. We conclude by discussing potential improvements for non-convex dynamical system identification approaches and provide further research directions.
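The core idea of recovering governing equations by sparse regression can be sketched as an ℓ1-regularized least squares problem over a library of candidate terms. The example below solves it with a plain proximal-gradient (ISTA) loop. It is a simplified illustration, not the dissertation's method: there is no reweighting, no denoising, and the data are noise-free. The candidate library, the regularization weight, and the toy system dx/dt = -2x are assumptions.

```python
import numpy as np

def ista_lasso(Theta, dxdt, lam, n_iter=5000):
    """Minimize 0.5 * ||Theta @ xi - dxdt||^2 + lam * ||xi||_1 by proximal gradient."""
    step = 1.0 / np.linalg.norm(Theta, 2) ** 2   # 1 / Lipschitz constant of the gradient
    xi = np.zeros(Theta.shape[1])
    for _ in range(n_iter):
        z = xi - step * Theta.T @ (Theta @ xi - dxdt)              # gradient step
        xi = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return xi

# Hypothetical data: states sampled on a grid, derivatives from dx/dt = -2 x.
x = np.linspace(-1.0, 1.0, 41)
Theta = np.column_stack([np.ones_like(x), x, x**2])   # candidate terms 1, x, x^2
dxdt = -2.0 * x
xi = ista_lasso(Theta, dxdt, lam=0.1)
```

The recovered coefficient vector is sparse: only the entry for the x term is active, and it is slightly shrunk toward zero by the ℓ1 penalty. A reweighted scheme would re-solve this problem with per-coefficient weights to reduce that shrinkage.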

  • Cortiella, Alexandre
  • Aerospace Engineering
  • Doostan, Alireza
  • Scheeres, Daniel
  • Becker, Stephen
  • Maute, Kurt
  • Hussein, Mahmoud
  • University of Colorado Boulder
  • Machine Learning
  • Applied mathematics
  • Engineering
  • Aerospace engineering
  • Sparse Regression
  • System Identification
  • Data-driven modeling
  • Dissertation
  • In Copyright
  • English [eng]


Research Topics & Ideas: Data Science

50 Topic Ideas To Kickstart Your Research Project

Research topics and ideas about data science and big data analytics

If you’re just starting out exploring data science-related topics for your dissertation, thesis or research project, you’ve come to the right place. In this post, we’ll help kickstart your research by providing a hearty list of data science and analytics-related research ideas, including examples from recent studies.

PS – This is just the start…

We know it’s exciting to run through a list of research topics, but please keep in mind that this list is just a starting point. The topic ideas provided here are intentionally broad and generic, so you will need to develop them further. Nevertheless, they should inspire some ideas for your project.

To develop a suitable research topic, you’ll need to identify a clear and convincing research gap, and a viable plan to fill that gap. If this sounds foreign to you, check out our free research topic webinar that explores how to find and refine a high-quality research topic, from scratch. Alternatively, consider our 1-on-1 coaching service.


Data Science-Related Research Topics

  • Developing machine learning models for real-time fraud detection in online transactions.
  • The use of big data analytics in predicting and managing urban traffic flow.
  • Investigating the effectiveness of data mining techniques in identifying early signs of mental health issues from social media usage.
  • The application of predictive analytics in personalizing cancer treatment plans.
  • Analyzing consumer behavior through big data to enhance retail marketing strategies.
  • The role of data science in optimizing renewable energy generation from wind farms.
  • Developing natural language processing algorithms for real-time news aggregation and summarization.
  • The application of big data in monitoring and predicting epidemic outbreaks.
  • Investigating the use of machine learning in automating credit scoring for microfinance.
  • The role of data analytics in improving patient care in telemedicine.
  • Developing AI-driven models for predictive maintenance in the manufacturing industry.
  • The use of big data analytics in enhancing cybersecurity threat intelligence.
  • Investigating the impact of sentiment analysis on brand reputation management.
  • The application of data science in optimizing logistics and supply chain operations.
  • Developing deep learning techniques for image recognition in medical diagnostics.
  • The role of big data in analyzing climate change impacts on agricultural productivity.
  • Investigating the use of data analytics in optimizing energy consumption in smart buildings.
  • The application of machine learning in detecting plagiarism in academic works.
  • Analyzing social media data for trends in political opinion and electoral predictions.
  • The role of big data in enhancing sports performance analytics.
  • Developing data-driven strategies for effective water resource management.
  • The use of big data in improving customer experience in the banking sector.
  • Investigating the application of data science in fraud detection in insurance claims.
  • The role of predictive analytics in financial market risk assessment.
  • Developing AI models for early detection of network vulnerabilities.


Data Science Research Ideas (Continued)

  • The application of big data in public transportation systems for route optimization.
  • Investigating the impact of big data analytics on e-commerce recommendation systems.
  • The use of data mining techniques in understanding consumer preferences in the entertainment industry.
  • Developing predictive models for real estate pricing and market trends.
  • The role of big data in tracking and managing environmental pollution.
  • Investigating the use of data analytics in improving airline operational efficiency.
  • The application of machine learning in optimizing pharmaceutical drug discovery.
  • Analyzing online customer reviews to inform product development in the tech industry.
  • The role of data science in crime prediction and prevention strategies.
  • Developing models for analyzing financial time series data for investment strategies.
  • The use of big data in assessing the impact of educational policies on student performance.
  • Investigating the effectiveness of data visualization techniques in business reporting.
  • The application of data analytics in human resource management and talent acquisition.
  • Developing algorithms for anomaly detection in network traffic data.
  • The role of machine learning in enhancing personalized online learning experiences.
  • Investigating the use of big data in urban planning and smart city development.
  • The application of predictive analytics in weather forecasting and disaster management.
  • Analyzing consumer data to drive innovations in the automotive industry.
  • The role of data science in optimizing content delivery networks for streaming services.
  • Developing machine learning models for automated text classification in legal documents.
  • The use of big data in tracking global supply chain disruptions.
  • Investigating the application of data analytics in personalized nutrition and fitness.
  • The role of big data in enhancing the accuracy of geological surveying for natural resource exploration.
  • Developing predictive models for customer churn in the telecommunications industry.
  • The application of data science in optimizing advertisement placement and reach.

Recent Data Science-Related Studies

While the ideas we’ve presented above are a decent starting point for finding a research topic, they are fairly generic and non-specific. So, it helps to look at actual studies in the data science and analytics space to see how this all comes together in practice.

Below, we’ve included a selection of recent studies to help refine your thinking. These are actual studies, so they can provide some useful insight as to what a research topic looks like in practice.

  • Data Science in Healthcare: COVID-19 and Beyond (Hulsen, 2022)
  • Auto-ML Web-application for Automated Machine Learning Algorithm Training and evaluation (Mukherjee & Rao, 2022)
  • Survey on Statistics and ML in Data Science and Effect in Businesses (Reddy et al., 2022)
  • Visualization in Data Science VDS @ KDD 2022 (Plant et al., 2022)
  • An Essay on How Data Science Can Strengthen Business (Santos, 2023)
  • A Deep study of Data science related problems, application and machine learning algorithms utilized in Data science (Ranjani et al., 2022)
  • You Teach WHAT in Your Data Science Course?!? (Posner & Kerby-Helm, 2022)
  • Statistical Analysis for the Traffic Police Activity: Nashville, Tennessee, USA (Tufail & Gul, 2022)
  • Data Management and Visual Information Processing in Financial Organization using Machine Learning (Balamurugan et al., 2022)
  • A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis (Pitroda, 2022)
  • Applications of Data Science in Respective Engineering Domains (Rasool & Chaudhary, 2022)
  • Jupyter Notebooks for Introducing Data Science to Novice Users (Fruchart et al., 2022)
  • Towards a Systematic Review of Data Science Programs: Themes, Courses, and Ethics (Nellore & Zimmer, 2022)
  • Application of data science and bioinformatics in healthcare technologies (Veeranki & Varshney, 2022)
  • TAPS Responsibility Matrix: A tool for responsible data science by design (Urovi et al., 2023)
  • Data Detectives: A Data Science Program for Middle Grade Learners (Thompson & Irgens, 2022)
  • Machine Learning for Non-Majors: A White Box Approach (Mike & Hazzan, 2022)
  • Components of Data Science and Its Applications (Paul et al., 2022)
  • Analysis on the Application of Data Science in Business Analytics (Wang, 2022)

As you can see, these research topics are a lot more focused than the generic topic ideas we presented earlier. So, for you to develop a high-quality research topic, you’ll need to get specific and laser-focused on a specific context with specific variables of interest.


Author: 
Title: Data-driven modelling of soil properties and behaviours with geotechnical applications
Advisors: Yin, Zhen-yu (CEE)
Yin, Jian-hua (CEE)
Degree: Ph.D.
Year: 2022
Award: FCE Awards for Outstanding PhD Theses (2022/23)
PolyU PhD Thesis Award - Merit Award (2023)
Subject: Soil mechanics
Soil mechanics -- Data processing
Hong Kong Polytechnic University -- Dissertations
Department: Department of Civil and Environmental Engineering
Pages: xvii, 218 pages : color illustrations
Language: English
Abstract: Understanding soil properties and behaviours is fundamental to geotechnical design. Myriad empirical and analytical models have been proposed for their prediction, but they tend to be site-specific, and constitutive models require an increasing number of parameters to be calibrated. With the increasing data in the geotechnical domain, machine learning (ML) has emerged as a new methodology to learn directly from raw data to identify soil properties and behaviours. Its applicability has proved promising because of its versatility and strong fitting capability. Nevertheless, current ML-based data-driven models still exhibit limitations, including lack of interpretability, dependency on large amounts of high-quality data and poor generalization ability, so they are still far from application to engineering practice. To this end, this study aims to develop data-driven models for predicting soil properties and mechanical behaviours based solely on their micro computed-tomography (µCT) images, as well as to facilitate their applications in geotechnical engineering. First, a set of ML-assisted algorithms is developed for automatically reconstructing three-dimensional real particles from µCT images and subsequently identifying their particle size and morphology. Bayesian inference is incorporated into the ML algorithms to enhance the interpretability of the data-driven model. Then, a multi-fidelity residual neural network incorporating Bayesian uncertainty is proposed to leverage existing knowledge and limited high-quality data for modelling the mechanical behaviours of soils. In this context, a multi-scale data-driven model is proposed, from the identification of particle size and morphology to the prediction of their mechanical responses together with fabric evolution. Finally, the developed data-driven models are integrated with finite element code for modelling boundary value problems, and the results are compared with conventional numerical modelling methods and measurements for validation. The proposed data-driven modelling methods are successfully used to predict various soil properties such as compressibility, creep, strength and permeability, behaviours such as anisotropy and dilatancy, and boundary value problems.
Rights: All rights reserved
Access: open access
File Description: For All Users
Size: 12.14 MB
Format: Adobe PDF



Data-driven decision making in online and offline retail


Purdue University Graduate School

File(s) under embargo until file(s) become available

First Principles and Machine Learning-Based Analyses of Stability and Reactivity Trends for High-Entropy Alloy Catalysts

Since its inception, the field of heterogeneous catalysis has evolved to address the needs of the ever-growing human population. Necessity, after all, fosters innovation. Today, the world faces numerous challenges related to anthropogenic climate change, and that has necessitated, among other things, a search for new catalysts that can enable renewable energy conversion and storage, sustainable food and chemicals production, and a reduction in carbon emissions. This search has led to the emergence of many promising classes of materials, each having a unique set of catalytic properties. Among such candidate materials, high-entropy alloys (HEAs) have very recently shown the potential to be a new catalyst design paradigm. HEAs are multimetallic, disordered alloys containing more than four elements and, as a result, possess a higher configurational entropy, which gives them considerable stability. They have many conceivable benefits over conventional bimetallic alloy catalysts: greater site heterogeneity, larger design space, and higher stability, among others. Consequently, there is a need to explore their application in a wide range of thermal and electrocatalytic reaction systems so that their potential can be realized.

In the past few decades, first principles-based approaches involving Density Functional Theory (DFT) calculations have proven to be effective in probing catalytic mechanisms at the atomic scale. Fundamental insights from first principles studies have also led to a detailed understanding of reactivity and stability trends for bimetallic alloy catalysts. However, the express application of first principles approaches to study HEA catalysts remains a challenge, due to the large computational cost incurred in performing DFT calculations for disordered alloys, which can be represented by millions of different configurations. A combination of first principles approaches and computationally efficient machine learning (ML) approaches can, however, potentially overcome this limitation.

In this thesis, combined workflows involving first principles and machine learning-based approaches are developed. To map catalyst structure to properties, graph convolutional network (GCN) models are developed and trained on DFT-predicted target properties such as formation energies, surface energies, and adsorption energies. Further, the Monte Carlo dropout method is integrated into the GCN models to provide uncertainty quantification, and these models are in turn used in active learning workflows that involve iterative model retraining to both improve model predictions and optimize the target property value. Dimensionality reduction methods, such as principal component analysis (PCA) and Diffusion Maps (DMaps), are used to glean physicochemical insights from the parameterization of the GCN.
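The Monte Carlo dropout idea can be sketched independently of the graph network: dropout is left switched on at prediction time, the model is evaluated many times, and the spread of the outputs serves as an uncertainty estimate. The tiny regressor below is a hypothetical stand-in for a trained GCN, with made-up weights and a scalar input.

```python
import random
import statistics

def toy_forward(x, rng, p_drop=0.2):
    """Hypothetical one-hidden-layer regressor with dropout active at inference."""
    w1 = [0.5, -1.0, 0.8, 0.3]
    w2 = [1.0, 0.5, -0.7, 1.2]
    hidden = [max(0.0, w * x) for w in w1]   # ReLU activations
    # Inverted dropout: drop each unit with probability p_drop, rescale the rest.
    kept = [h / (1 - p_drop) if rng.random() >= p_drop else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(w2, kept))

def mc_dropout_predict(forward, x, n_samples=200, seed=0):
    """Mean and standard deviation over repeated stochastic forward passes."""
    rng = random.Random(seed)
    samples = [forward(x, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

mean, std = mc_dropout_predict(toy_forward, 2.0)
```

In an active learning loop, candidates with a large standard deviation would be natural choices for the next round of expensive reference calculations.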

These workflows are applied to the analysis of binary, ternary, and quaternary alloy catalysts, and a series of fundamental insights regarding their stability are elucidated. In particular, the origin and stability of “Pt skins” that form on Pt-based bimetallic alloys such as Pt₃Ni in the context of the oxygen reduction reaction (ORR) are investigated using a rigorous surface thermodynamic framework. The active learning workflow enables the study of Pt skin formation on stepped facets of Pt₃Ni (with a complex, low-symmetry geometry), and this analysis reveals a hitherto undiscovered relationship between surface coordination and surface segregation. In another study, an active learning workflow is used to identify the most stable bulk composition in the Pd-Pt-Sn ternary alloy system using a combination of exhaustively sampled binary alloy data and prudently sampled ternary alloy data. Lastly, a new GCN model architecture, called SlabGCN, is introduced to predict the sulfur poisoning characteristics of quaternary alloy catalysts, and to find an optimal sulfur tolerant composition.

On another front, the electrocatalytic activity of quinary HEAs towards the ORR is investigated by performing DFT calculations on HEA structures generated using the High-Entropy Alloy Toolbox (HEAT), an in-house code developed for the high-throughput generation and analysis of disordered alloy structures with stability constraints (such as Pt skin formation). DFT-predicted adsorption energies of key ORR intermediates are further deconvoluted into ligand, strain, and surface relaxation effects, and the influence of the number of Pt skins on these effects is expounded. A Sabatier volcano analysis is performed to calculate the ORR activities of selected HEA compositions, and correspondence between theoretical predictions and experimental results is established, to pave the way for rational design of HEA catalysts for oxygen reduction.
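
A Sabatier volcano analysis ranks catalysts by how close a binding-energy descriptor lies to an optimum: binding that is too strong or too weak both lower activity. The sketch below uses a generic two-legged linear volcano. The optimum position, peak height, unit slopes, and candidate values are illustrative placeholders and not results from the thesis.

```python
def volcano_activity(delta_e, e_opt=0.1, peak=0.9):
    """Two-legged Sabatier volcano: activity falls linearly away from e_opt.

    delta_e: descriptor adsorption energy relative to a reference (eV);
             more negative means stronger binding.
    e_opt, peak: illustrative placeholder values (assumptions).
    """
    strong_leg = peak + (delta_e - e_opt)  # binds too strongly
    weak_leg = peak - (delta_e - e_opt)    # binds too weakly
    return min(strong_leg, weak_leg)

# Hypothetical candidates with made-up descriptor values (eV).
candidates = {"A": -0.10, "B": 0.05, "C": 0.12, "D": 0.35}
ranked = sorted(candidates,
                key=lambda name: volcano_activity(candidates[name]),
                reverse=True)
print(ranked)  # ['C', 'B', 'A', 'D']
```

Candidate C ranks first because its descriptor value lies closest to the assumed optimum, while A and D sit far down the strong-binding and weak-binding legs respectively.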

In summary, this thesis examines stability and reactivity trends of a multitude of alloy catalysts, from conventional bimetallic alloys to high-entropy alloys, using a combination of first principles approaches (involving Density Functional Theory calculations) and machine learning approaches comprising graph convolutional network models.

Data Science-Driven Discovery of Multimetallic Oxygen-cycle Electrocatalysts for Enhanced Energy Conversion

Office of Basic Energy Sciences

Degree Type

  • Doctor of Philosophy
  • Chemical Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

  • Chemical engineering not elsewhere classified
  • Catalysis and mechanisms of reactions

CC BY 4.0

IMAGES

  1. Data-driven hypothesis development

  2. Data Driven PowerPoint for Thesis Results

  3. Data Driven software development at large scale Thesis 2018

  4. Data-Driven Approaches for Sparse Reflectance Modeling and Acquisition

  5. Thesis

  6. Master Thesis Data Collection

COMMENTS

  1. PDF Data-driven Decision Making in Operations Management

    decade, this thesis studies the length to which we can push various business operations with new technologies, in our theoretical understanding and practical performance alike. Towards this goal, this thesis develops data-driven decision-making methods for a selection of challenging emerging problems in supply chain and other business operations.

  2. Data-driven decision making : an adoption framework

    Data powers insights, decision and actions, and we are only scratching the surface of the value that can be created, captured and redistributed through data-driven decision making. Description Thesis: S.M. in Management Studies, Massachusetts Institute of Technology, Sloan School of Management, 2017.

  3. Data-Driven Dynamic Decision Making: Algorithms, Structures, and

    Abstract. This thesis aims to advance the theory and practice of data-driven dynamic decision making, by synergizing ideas from machine learning and operations research. Throughout this thesis, we focus on three aspects: (i) developing new, practical algorithms that systematically empower data-driven dynamic decision making, (ii) identifying ...

  4. PDF Data-Driven Dynamic Decision Making: Algorithms, Structures, and

    Data-Driven Dynamic Decision Making: Algorithms, Structures, and Complexity Analysis by Yunzong Xu Submitted to the Institute for Data, Systems, and Society on May 5, 2023 ...

  5. PDF Enhancing Data-Driven Decision Support with Multi-Perspective Solutions

    This thesis takes a multi-perspective view toward several important decision support problems encountered in various application domains and attempts to enhance existing data-driven techniques for better decision support. In summary, all three essays are related to the overarching theme of the thesis.

  6. PDF Data Driven Computing

    Data Driven Computing is a new field of computational analysis which uses provided data to directly produce predictive outcomes. This thesis first establishes definitions of Data-Driven solvers and working examples of static mechanics problems to demonstrate efficacy. Significant extensions are

  7. PDF FROM CHAOS TO ORDER: A study on how data-driven

    Master Thesis project: FROM CHAOS TO ORDER: A study on how data-driven development can help improve decision-making 2 Contact information Author: ... Keywords: Data-driven development, Agile methodologies, Waterfall model, Continuous Integration, Continuous Deployment, A/B Testing, Test Driven Development, Product ...

  8. PDF Data-Driven Decision-Making and Its Impacts on Education Quality in

    Articles report data-driven decision-making with the aid of information and communication technology. Studies report data-driven decision-making and its impact on education quality in developing countries. Exclusion Articles do not discuss data-driven decision-making technologies in countries other than developing ones.

  9. PDF The Role of Big Data in Strategic Decision-making

    (Talouselämä, 2013). Data utilization is shaping how organizations ought to do business and even survive (Hurwitz, 2013). Digital transformation has forced most organizations to operate in data-driven ways and data-driven decisions have widely been argued to lead to higher performance and sometimes even in competitive

  10. Data Driven Computing

    Data Driven Computing is a new field of computational analysis which uses provided data to directly produce predictive outcomes. This thesis first establishes definitions of Data-Driven solvers and working examples of static mechanics problems to demonstrate efficacy. Significant extensions are then explored to both accommodate noisy data sets ...

  11. PDF Linguistic Knowledge in Data-Driven Natural Language Processing

    The central goal of this thesis is to bridge the divide between theoretical linguistics (the scientific inquiry of language) and applied data-driven statistical language processing, to provide deeper insight into data and to build more powerful, robust models. To corroborate the practi-

  12. Doctoral Thesis: The Modeling Spectrum of Data-Driven Decision Making

    Abstract: Data-driven decision-making has become an essential part of modern life by virtue of the rapid growth in data, the massive improvements in computing power, and great progress in academic research. The range of techniques used fall broadly on the spectrum that varies from model-based to applied, depending on the problem complexity and ...

  13. Data-Driven Safety-Critical Autonomy in Unknown, Unstructured, and

    This thesis topic develops an online, data-driven predictive model for dynamic obstacles, accounting for measurement noise and low-frequency data rates. First inspired by singular spectrum analysis (SSA), a time-series forecast technique, obstacle models characterized by linear recurrence relationships are extracted from real-time position ...

  14. Full article: Data-based decision-making for school improvement

    This is often referred to as data-based decision-making, data-driven decision-making or data-informed decision-making, which can be defined as the process of 'systematically analyzing existing data sources within the school, applying the outcomes of analyses in order to innovate teaching, curricula, and school performance, and, implementing ...

  15. DataSpace: Data-driven Modeling for Fluid Dynamics and Control

    Data-driven modeling addresses these challenges by learning dynamical system models from data. This thesis focuses on data-driven modeling methods for applications in fluid dynamics and control. First, we propose an evaluation criterion to quantify the accuracy of dynamic mode decomposition (DMD), a data-driven algorithm for extracting spatial ...

  16. Graduate Thesis Or Dissertation

    In recent years, data-driven model discovery has become increasingly popular due to rapid advances in computational power, and data processing and storage procedures. This has fostered the development of new algorithms to identify complex systems from data.

  17. Research Topics & Ideas: Data Science

    If you're just starting out exploring data science-related topics for your dissertation, thesis or research project, you've come to the right place. In this post, we'll help kickstart your research by providing a hearty list of data science and analytics-related research ideas, including examples from recent studies. PS - This is just the start…

  18. Data-driven decision making : an adoption framework

    Thesis: S.M. in Management Studies, Massachusetts Institute of Technology, Sloan School of Management, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages ix-xii). ... an obligatory first step is to make decisions more data-driven, and less guided by intuition. While the positive effects of data-driven ...

  19. Data-Driven vs. Hypothesis-Driven Research: Making sense of big data

    To explore these questions, we examine several fields and describe an historical progression in knowledge production. We believe that in these contexts large-scale data collection and analysis represent the next step - going beyond the capabilities of today's simulation models with an empirical data-collection driven approach.

  20. Data-Driven Decision Making in Operations Management

    Abstract. Encouraged by the plethora of advances in artificial intelligence (AI) in the past decade, this thesis studies the length to which we can push various business operations with new technologies, in our theoretical understanding and practical performance alike. Towards this goal, this thesis develops data-driven decision-making methods ...

  21. PDF Nonlinear Control of A Ground Vehicle Using Data-driven Dynamic Models

    an extremely large data set is required for training. A good compromise between the analytical and model-free approaches may come in the form of empirical data-fitting. Using data-driven dynamics, collected data is fit using simpler mathematical models such as n-th degree multivariable polynomials. A famous example is Pacejka's empirical ...

  22. PolyU Electronic Theses: Data-driven modelling of soil properties and

    PolyU PhD Thesis Award - Merit Award (2023) Subject: Soil mechanics Soil mechanics -- Data processing Hong Kong Polytechnic University -- Dissertations: ... The proposed data-driven modelling methods are successfully used to predict various soil properties such as compressibility, creep, strength and permeability, behaviours such as anisotropy ...

  23. Data-driven decision making in online and offline retail

    Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020. ... Retail operations have experienced a transformational change in the past decade with the advent and adoption of data-driven approaches to drive decision making. Granular data collection has enabled firms to make ...

  24. First Principles and Machine Learning-Based Analyses of Stability and

    In summary, this thesis examines stability and reactivity trends of a multitude of alloy catalysts, from conventional bimetallic alloys to high-entropy alloys, using a combination of first principles approaches (involving Density Functional Theory calculations) and machine learning approaches comprising graph convolutional network models.