
How to Implement Hypothesis-Driven Development

Remember back to the time when we were in high school science class. Our teachers had a framework for helping us learn – an experimental approach based on the best available evidence at hand. We were asked to make observations about the world around us, then attempt to form an explanation or hypothesis to explain what we had observed. We then tested this hypothesis by predicting an outcome based on our theory that would be achieved in a controlled experiment – if the outcome was achieved, we had proven our theory to be correct.

We could then apply this learning to inform and test other hypotheses by constructing more sophisticated experiments, and tuning, evolving or abandoning any hypothesis as we made further observations from the results we achieved.

Experimentation is the foundation of the scientific method, which is a systematic means of exploring the world around us. Although some experiments take place in laboratories, it is possible to perform an experiment anywhere, at any time, even in software development.

Practicing Hypothesis-Driven Development [1] is thinking about the development of new ideas, products, and services – even organizational change – as a series of experiments to determine whether an expected outcome will be achieved. The process is iterated upon until a desirable outcome is obtained or the idea is determined to be not viable.

We need to change our mindset to view our proposed solution to a problem statement as a hypothesis, especially in new product or service development – the market we are targeting, how a business model will work, how code will execute and even how the customer will use it.

We do not do projects anymore, only experiments. Customer discovery and Lean Startup strategies are designed to test assumptions about customers. Quality Assurance is testing system behavior against defined specifications. The experimental principle also applies in Test-Driven Development – we write the test first, then use the test to validate that our code is correct, and succeed if the code passes the test. Ultimately, product or service development is a process to test a hypothesis about system behavior in the environment or market it is developed for.

The key outcome of an experimental approach is measurable evidence and learning.

Learning is the information we have gained from conducting the experiment. Did what we expect to occur actually happen? If not, what did and how does that inform what we should do next?

In order to learn we need to use the scientific method for investigating phenomena, acquiring new knowledge, and correcting and integrating previous knowledge back into our thinking.

As the software development industry continues to mature, we now have an opportunity to leverage improved capabilities such as Continuous Design and Delivery to maximize our potential to learn quickly what works and what does not. By taking an experimental approach to information discovery, we can more rapidly test our solutions against the problems we have identified in the products or services we are attempting to build. The goal is to optimize our effectiveness at solving the right problems, rather than simply becoming a feature factory that continually builds solutions.

The steps of the scientific method are to:

  • Make observations
  • Formulate a hypothesis
  • Design an experiment to test the hypothesis
  • State the indicators to evaluate if the experiment has succeeded
  • Conduct the experiment
  • Evaluate the results of the experiment
  • Accept or reject the hypothesis
  • If necessary, make and test a new hypothesis

Using an experimentation approach to software development

We need to challenge the concept of having fixed requirements for a product or service. Requirements are valuable when teams execute a well-known or well-understood phase of an initiative and can leverage well-understood practices to achieve the outcome. However, when you are in an exploratory, complex and uncertain phase, you need hypotheses.

Handing teams a set of business requirements reinforces an order-taking approach and mindset that is flawed.

Business does the thinking and ‘knows’ what is right. The purpose of the development team is to implement what they are told. But when operating in an area of uncertainty and complexity, all the members of the development team should be encouraged to think and share insights on the problem and potential solutions. A team simply taking orders from a business owner is not utilizing the full potential, experience and competency that a cross-functional multi-disciplined team offers.

Framing hypotheses

The traditional user story framework is focused on capturing requirements for what we want to build and for whom, to enable the user to receive a specific benefit from the system.

As A… <role>

I Want… <goal/desire>

So That… <receive benefit>

Behaviour Driven Development (BDD) and Feature Injection aim to improve the original framework by supporting communication and collaboration between developers, testers and non-technical participants in a software project.

In Order To… <receive benefit>

As A… <role>

When viewing work as an experiment, the traditional story framework is insufficient. As in our high school science experiment, we need to define the steps we will take to achieve the desired outcome. We then need to state the specific indicators (or signals) we expect to observe that provide evidence that our hypothesis is valid. These need to be stated before conducting the test to reduce biased interpretations of the results. 

If we observe signals that indicate our hypothesis is correct, we can be more confident that we are on the right path and can alter the user story framework to reflect this.

Therefore, a user story structure to support Hypothesis-Driven Development would be:


We believe < this capability >

What functionality will we develop to test our hypothesis? By defining a ‘test’ capability of the product or service that we are attempting to build, we identify the functionality and the hypothesis we want to test.

Will result in < this outcome >

What is the expected outcome of our experiment? What is the specific result we expect to achieve by building the ‘test’ capability?

We will know we have succeeded when < we see a measurable signal >

What signals will indicate that the capability we have built is effective? What key metrics (qualitative or quantitative) will we measure to provide evidence that our experiment has succeeded and give us enough confidence to move to the next stage?

The threshold you use for statistical significance will depend on your understanding of the business and the context you are operating within. Not every company has the user sample size of Amazon or Google to run statistically significant experiments in a short period of time. Limits and controls need to be defined by your organization to determine acceptable evidence thresholds that will allow the team to advance to the next step.

For example, if you are building a rocket ship, you may want your experiments to have a high threshold for statistical significance. If you are deciding between two different flows intended to help increase user sign-up, you may be happy to tolerate a lower significance threshold.
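
To make the threshold discussion concrete, here is a minimal sketch (in Python, with made-up numbers and function names – nothing below comes from the article) of how a team might check whether an observed difference in conversion between two variants clears a chosen significance threshold, using a standard two-proportion z-test:

```python
from math import sqrt, erfc

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                # pooled conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error of the difference
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))                           # convert |z| to a two-sided p-value

# Hypothetical sign-up flow test: 4.0% vs 4.4% conversion on 20,000 visitors per variant.
p_value = two_proportion_p_value(800, 20_000, 880, 20_000)
alpha = 0.05  # the threshold itself is the organizational choice discussed above
print(f"p = {p_value:.3f} -> {'evidence of a real difference' if p_value < alpha else 'not enough evidence yet'}")
```

A rocket-ship decision might demand a much smaller alpha (and far more data); a sign-up-flow decision might reasonably tolerate a looser one.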

The final step is to clearly and visibly state any assumptions made about our hypothesis, to create a feedback loop for the team to provide further input, debate and understanding of the circumstances under which we are performing the test. Are the assumptions valid, and do they make sense from a technical and business perspective?

Hypotheses, when aligned to your MVP, can provide a testing mechanism for your product or service vision. They can test the most uncertain areas of your product or service, in order to gain information and improve confidence.

Examples of Hypothesis-Driven Development user stories are:

Business story

We Believe That increasing the size of hotel images on the booking page

Will Result In improved customer engagement and conversion

We Will Know We Have Succeeded When we see a 5% increase in customers who review hotel images and then proceed to book within 48 hours.
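
Purely as an illustration (none of the names or numbers below come from the article), the same story can be recorded with its success signal made explicit, so the team can check the evidence against the threshold it committed to before running the test:

```python
from dataclasses import dataclass

@dataclass
class HypothesisCard:
    we_believe: str        # the capability we will build
    will_result_in: str    # the expected outcome
    succeeded_when: str    # the measurable signal, stated up front
    target_lift: float     # e.g. 0.05 for a 5% increase
    window_hours: int      # measurement window, e.g. 48 hours

    def signal_observed(self, baseline_rate: float, observed_rate: float) -> bool:
        """True if the measured lift clears the stated threshold (window assumed elapsed)."""
        lift = (observed_rate - baseline_rate) / baseline_rate
        return lift >= self.target_lift

card = HypothesisCard(
    we_believe="increasing the size of hotel images on the booking page",
    will_result_in="improved customer engagement and conversion",
    succeeded_when="a 5% increase in image viewers who book within 48 hours",
    target_lift=0.05,
    window_hours=48,
)
# Hypothetical booking rates among customers who reviewed hotel images.
print("signal observed" if card.signal_observed(baseline_rate=0.120, observed_rate=0.128) else "no signal yet")
```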

It is imperative to have effective monitoring and evaluation tools in place when using an experimental approach to software development in order to measure the impact of our efforts and provide a feedback loop to the team. Otherwise we are essentially blind to the outcomes of our efforts.

In agile software development we define working software as the primary measure of progress.

By combining Continuous Delivery and Hypothesis-Driven Development we can now define working software and validated learning as the primary measures of progress.

Ideally we should not say we are done until we have measured the value of what is being delivered – in other words, gathered data to validate our hypothesis.

One example of how to gather data is to perform A/B testing to test a hypothesis and measure the change in customer behavior. Alternative options include customer surveys, paper prototypes, and user and/or guerrilla testing.

One example of a company we have worked with that uses Hypothesis-Driven Development is lastminute.com. The team formulated a hypothesis that customers are only willing to pay a maximum price for a hotel based on the time of day they book. Tom Klein, CEO and President of Sabre Holdings, shared the story of how they improved conversion by 400% within a week.

Combining practices such as Hypothesis-Driven Development and Continuous Delivery accelerates experimentation and amplifies validated learning. This gives us the opportunity to accelerate the rate at which we innovate while relentlessly reducing costs, leaving our competitors in the dust. Ideally, we can achieve one-piece flow: atomic changes that enable us to identify causal relationships between the changes we make to our products and services and their impact on key metrics.

As Kent Beck said, “Test-Driven Development is a great excuse to think about the problem before you think about the solution”. Hypothesis-Driven Development is a great opportunity to test what you think the problem is, before you work on the solution.


We also run a workshop to help teams implement Hypothesis-Driven Development. Get in touch to run it at your company.

[1] Hypothesis-Driven Development by Jeffrey L. Taylor


Hypothesis-Driven Development (Practitioner’s Guide)

Table of Contents

  • What is hypothesis-driven development (HDD)?
  • How do you know if it’s working?
  • How do you apply HDD to ‘continuous design’?
  • How do you apply HDD to application development?
  • How do you apply HDD to continuous delivery?
  • How does HDD relate to agile, design thinking, Lean Startup, etc.?

Like agile, hypothesis-driven development (HDD) is more a point of view with various associated practices than it is a single, particular practice or process. That said, my goal here is for you to leave with a solid understanding of how to do HDD and a specific set of steps that work for you to get started.

After reading this guide and trying out the related practice you will be able to:

  • Diagnose when and where hypothesis-driven development (HDD) makes sense for your team
  • Apply techniques from HDD to your work in small, success-based batches across your product pipeline
  • Frame and enhance your existing practices (where applicable) with HDD

Does your product program feel like a Netflix show you’d binge watch? Is your team excited to see what happens when you release stuff? If so, congratulations- you’re already doing it and please hit me up on Twitter so we can talk about it! If not, don’t worry- that’s pretty normal, but HDD offers some awesome opportunities to work better.


Building on the scientific method, HDD is a take on how to integrate test-driven approaches across your product development activities- everything from creating a user persona to figuring out which integration tests to automate. Yeah- wow, right?! It is a great way to energize and focus your practice of agile and your work in general.

By product pipeline, I mean the set of processes you and your team undertake to go from a certain set of product priorities to released product. If you’re doing agile, then iteration (sprints) is a big part of making these work.

[Image: the product pipeline, with metrics for each area]

How do you know if it’s working? It wouldn’t be very hypothesis-driven if I didn’t have an answer to that! In the diagram above, you’ll find metrics for each area. For your application of HDD to what we’ll call continuous design, your metric to improve is the ratio of all your release content to the release content that meets or exceeds your target metrics on user behavior. For example, if you developed a new, additional way for users to search for products and set the success threshold at it being used in >10% of user sessions, did that feature succeed or fail by that measure? For application development, the metric you’re working to improve is basically velocity, meaning story points or, generally, release content per sprint. For continuous delivery, it’s how often you can release. Hypothesis testing is, of course, central to HDD and to doing agile with any kind of focus on valuable outcomes, and I think it shares the metric on successful release content with continuous design.

[Image: the formula for the metric ‘F’]

The first component of ‘F’ is team cost, which you would sum up over whatever period you’re measuring. This includes ‘c$’, total compensation as well as loading (benefits, equipment, etc.), and ‘g’, the cost of the gear you use- that might be application infrastructure like AWS, GCP, etc., along with any other infrastructure you buy or share with other teams. For example, using a backend-as-a-service like Heroku or Firebase might push up your value for ‘g’ while deferring the cost of building your own app infrastructure.

The next component is release content, f_e. If you’re already estimating story points somehow, you can use those. If you’re a NoEstimates crew, and, hey, I get it, then you’d need to do some kind of rough proportional sizing of your release content for the period in question. The next term, r_f, is optional, but it is an estimate of the time you’re having to invest in rework, bug fixes, manual testing, manual deployment, and anything else that doesn’t go as planned.

The last term, s_d, is one of the most critical: an estimate of the proportion of your release content that’s successful relative to the success metrics you set for it. For example, if you developed a new, additional way for users to search for products and set the success threshold at it being used in >10% of user sessions, did that feature succeed or fail by that measure? Naturally, if you’re not doing this it will require some work and a change of habits, but it’s hard to deliver value in agile if you don’t know what ‘value’ means and don’t define it against actual user behavior.

Here’s how some of the key terms lay out in the product pipeline:

[Image: key terms of ‘F’ mapped onto the product pipeline]

The example here shows how a team might tabulate this for a given month:

[Image: example tabulation of ‘F’ for one month]

Is the punchline that you should be shooting for a cost of $1,742 per story point? No. First, this is for a single month and would only serve the purpose of the team setting a baseline for itself. Like any agile practice, the interesting part of this is seeing how your value for ‘F’ changes from period to period, using your team retrospectives to talk about how to improve it. Second, this is just a single team and the economic value (ex: revenue) related to a given story point will vary enormously from product to product. There’s a Google Sheets-based calculator that you can use here: Innovation Accounting with ‘F’ .
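
The exact way these terms combine is defined in the image above; the sketch below is only a rough illustration that assumes ‘F’ is tabulated as team cost (compensation plus gear), inflated by the rework fraction, per successful story point. The function name, the assumed formula, and the numbers are all mine, not the author’s:

```python
def innovation_f(comp_cost: float, gear_cost: float, story_points: float,
                 rework_fraction: float, success_proportion: float) -> float:
    """Assumed tabulation of 'F': cost per successful story point for one period.

    comp_cost          c$  - total compensation incl. loading for the period
    gear_cost          g   - infrastructure/tooling spend for the period
    story_points       f_e - release content shipped in the period
    rework_fraction    r_f - share of effort lost to rework, bug fixes, manual steps
    success_proportion s_d - share of release content that met its success metrics
    """
    team_cost = (comp_cost + gear_cost) * (1 + rework_fraction)
    return team_cost / (story_points * success_proportion)

# Hypothetical month: $48k compensation, $4k gear, 40 points shipped,
# 15% rework drag, 55% of release content hitting its success metrics.
print(round(innovation_f(48_000, 4_000, 40, 0.15, 0.55)))  # ~2718 per successful point
```

As with the author’s example, the absolute number matters far less than whether it trends down from one retrospective to the next.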

Like any metric, ‘F’ only matters if you find it workable to get in the habit of measuring it and paying attention to it. As a team, say, evaluates its progress on OKR (objectives and key results), ‘F’ offers a view on the health of the team’s collaboration together in the context of their product and organization. For example, if the team’s accruing technical debt, that will show up as a steady increase in ‘F’. If a team’s invested in test or deploy automation or started testing their release content with users more specifically, that should show up as a steady lowering of ‘F’.

In the next few sections, we’ll step through how to apply HDD to your product pipeline by area, starting with continuous design.


It’s a mistake to ask your designer to explain every little thing they’re doing, but it’s also a mistake to decouple their work from your product’s economics. On the one hand, no one likes someone looking over their shoulder, and you may not have the professional training to reasonably understand what they’re doing hour to hour, even day to day. On the other hand, it’s a mistake to charter a designer’s work without a testable definition of success and without collaborating around that.

Managing this is hard since most of us aren’t designers and because it takes a lot of work and attention to detail to work out what you really want to achieve with a given design.

Beginning with the End in Mind

The difference between art and design is intention- in design we always have one and, in practice, it should be testable. For this, I like the practice of customer experience (CX) mapping. CX mapping is a process for focusing the work of a team on outcomes–day to day, week to week, and quarter to quarter. It’s amenable to both qualitative and quantitative evidence but it is strictly focused on observed customer behaviors, as opposed to less direct, more lagging observations.

CX mapping works to define the CX in testable terms that are amenable to both qualitative and quantitative evidence. Specifically, for each phase of a potential customer’s progression toward behaviors that accrue to your product/market fit (the customer funnel), it answers the following questions:

1. What do we mean by this phase of the customer funnel? 

What do we mean by, say, ‘Acquisition’ for this product or individual feature? How would we know it if we see it?

2. How do we observe this (in quantitative terms)? What’s the DV?

This comes next after we answer the question “What does this mean?”. The goal is to come up with a focal single metric (maybe two), a ‘dependent variable’ (DV) that tells you how a customer has behaved in a given phase of the CX (ex: Acquisition, Onboarding, etc.).

3. What is the cut off for a transition?

Not super exciting, but extremely important in actual practice, the idea here is to establish the cutoff for deciding whether a user has progressed from one phase to the next or abandoned/churned.

4. What is our ‘Line in the Sand’ threshold?

Popularized by the book ‘Lean Analytics’, the idea here is that good metrics are ones that change a team’s behavior (decisions) and for that you need to establish a threshold in advance for decision making.

5. How might we test this? What new IVs are worth testing?

The ‘independent variables’ (IV’s) you might test are basically just ideas for improving the DV (#2 above).

6. What’s tricky? What do we need to watch out for?

Getting this working will take some tuning, but it’s infinitely doable and there aren’t a lot of good substitutes for focusing on what’s a win and what’s a waste of time.

The image below shows a working CX map for a company (HVAC in a Hurry) that services commercial heating, ventilation, and air-conditioning systems. And this particular CX map is for the specific ‘job’/task/problem of how their field technicians get the replacement parts they need.

[Image: CX map for HVAC in a Hurry’s replacement-parts job]

For more on CX mapping, you can also check out its page- Tutorial: Customer Experience (CX) Mapping.
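
As a minimal sketch only (the class and field names are mine, and the HVAC in a Hurry details are paraphrased for illustration rather than read off the map above), a single phase of a CX map can be captured so that each of the six questions has an explicit, testable answer:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CXPhase:
    name: str                 # e.g. "Acquisition", "Onboarding"
    definition: str           # Q1: what do we mean by this phase?
    dependent_variable: str   # Q2: the focal metric (DV) we observe
    transition_cutoff: str    # Q3: when has a user progressed (or churned)?
    line_in_the_sand: float   # Q4: the decision threshold for the DV
    candidate_ivs: List[str] = field(default_factory=list)  # Q5: ideas (IVs) worth testing
    watch_outs: str = ""      # Q6: what's tricky?

acquisition = CXPhase(
    name="Acquisition",
    definition="A field technician reaches the parts-lookup page for the first time",
    dependent_variable="share of first-time visitors who run at least one part search",
    transition_cutoff="a completed search within the first session",
    line_in_the_sand=0.30,
    candidate_ivs=["search by photo of the part label", "recently serviced equipment list"],
    watch_outs="don't count office staff browsing on a technician's behalf",
)
```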

Unpacking Continuous Design for HDD

For unpacking the work of design/Continuous Design with HDD, I like to use the ‘double diamond’ framing of ‘right problem’ vs. ‘right solution’, which I first learned about in Donald Norman’s seminal book, ‘The Design of Everyday Things’.

I’ve organized the balance of this section around three big questions:

How do you test that you’ve found the ‘Right Problem’?

How do you test that you’ve found demand and have the ‘right solution’?

How do you test that you’ve designed the ‘right solution’?


Let’s say it’s an internal project- a ‘digital transformation’ for an HVAC (heating, ventilation, and air conditioning) service company. The digital team thinks it would be cool to organize the documentation for all the different HVAC equipment the company’s technicians service. But, would it be?

The only way to find out is to go out and talk to these technicians! First, you need to test whether you’re talking to someone who is one of these technicians. For example, you might have a screening question like: ‘How many HVACs did you repair last week?’. If it’s <10, you might instead be talking to a handyman or a manager (or someone who’s not an HVAC tech at all).

Second, you need to ask non-leading questions. The evidentiary value of a specific answer to a general question is much higher than that of a specific answer to a specific question. Also, some questions are just leading. For example, if you ask such a subject ‘Would you use a documentation system if we built it?’, they’re going to say yes, just to avoid the awkwardness and sales pitch they expect if they say no.

How do you draft personas? Much more renowned designers than myself (Donald Norman among them) disagree with me about this, but personally I like to draft my personas while I’m creating my interview guide and before I do my first set of interviews. Whether you draft or interview first is also of secondary importance if you’re doing HDD- if you’re not iteratively interviewing and revising your material based on what you’ve found, it’s not going to be very functional anyway.

Really, the persona (and the jobs-to-be-done) is a means to an end- it should be answering some facet of the question ‘Who is our customer, and what’s important to them?’. It’s iterative, with a process that looks something like this:

[Image: iterative persona process: draft, interview, revise, repeat]

How do you draft jobs-to-be-done? Personally- I like to work these in a similar fashion- draft, interview, revise, and then repeat, repeat, repeat.

You’ll use the same interview guide and subjects for these. The template is the same as the personas, but I maintain a separate (though related) tutorial for these–

  • A guide on creating Jobs-to-be-Done (JTBD)
  • A template for drafting jobs-to-be-done (JTBD)

How do you interview subjects? And, action! The #1 place I see teams struggle is at the beginning and it’s with the paradox that to get to a big market you need to nail a series of small markets. Sure, they might have heard something about segmentation in a marketing class, but here you need to apply that from the very beginning.

The fix is to create a screener for each persona. This is a factual question whose job is specifically and only to determine whether a given subject does or does not map to your target persona. For the HVAC in a Hurry technician persona (see above), you might have a screening question like: ‘How many HVACs did you repair last week?’. If it’s <10, you might instead be talking to a handyman or a manager (or someone who’s not an HVAC tech at all).

And this is the point where (if I’ve made them comfortable enough to be candid with me) teams will ask me ‘But we want to go big- be the next Facebook.’ And then we talk about how just about all those success stories- where there’s a product that has, for all intents and purposes, a universal user base- started out by killing it in small, specific segments and learning and growing from there.

Sorry for all that, reader, but I run into this so frequently, and it’s so crucial to what I think is a healthy practice of HDD, that it seemed necessary.

The key with the interview guide is to start with general questions where you’re testing for a specific answer and then progressively get into more specific questions. Here are some resources–

  • An example interview guide related to the previous tutorials
  • A general take on these interviews in the context of a larger customer discovery/design research program
  • A template for drafting an interview guide

To recap, what’s a ‘Right Problem’ hypothesis? The Right Problem (persona and PS/JTBD) hypothesis is the most fundamental, but the hardest to pin down. You should know what kind of shoes your customer wears and when and why they use your product. You should be able to apply factual screeners to identify subjects that map to your persona or personas.

You should know what people who look like/behave like your customer who don’t use your product are doing instead, particularly if you’re in an industry undergoing change. You should be analyzing your quantitative data with strong, specific, emphatic hypotheses.

If you make software for HVAC (heating, ventilation and air conditioning) technicians, you should have a decent idea of what you’re likely to hear if you ask such a person a question like ‘What are the top 5 hardest things about finishing an HVAC repair?’

In summary, HDD here looks something like this:


01 IDEA : The working idea is that you know your customer and you’re solving a problem/doing a job (whatever term feels like it fits for you) that is important to them. If this isn’t the case, everything else you’re going to do isn’t going to matter.

Also, you know the top alternatives, which may or may not be what you see as your direct competitors. This is important as an input into focused testing demand to see if you have the Right Solution.

02 HYPOTHESIS : If you ask non-leading questions (like ‘What are the top 5 hardest things about finishing an HVAC repair?’), then you should generally hear relatively similar responses.

03 EXPERIMENTAL DESIGN : You’ll want an Interview Guide and, critically, a screener. This is a factual question you can use to make sure any given subject maps to your persona. With the HVAC repair example, this would be something like ‘How many HVAC repairs have you done in the last week?’ where you’re expecting an answer >5. This is important because if your screener isn’t tight enough, your interview responses may not converge. (A minimal code sketch of such a screener appears after these steps.)

04 EXPERIMENTATION : Get out and interview some subjects- but with a screener and an interview guide. The resources above have more on this, but one key thing to remember is that the interview guide is a guide, not a questionnaire. Your job is to make the interaction as normal as possible and it’s perfectly OK to skip questions or change them. It’s also 1000% OK to revise your interview guide during the process.

05 PIVOT OR PERSEVERE : What did you learn? Was it consistent? Good results are: a) We didn’t know what was on their A-list and what alternatives they are using, but now we do. b) We knew what was on their A-list and what alternatives they are using- we were pretty much right (doesn’t happen as much as you’d think). c) Our interviews just didn’t work/converge. Let’s try this again with some changes (happens all the time to smart teams and is very healthy).
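
As a small illustration of the screener from step 03 (the cutoff, names, and numbers below are hypothetical), the point is simply that only responses from screened-in subjects should count toward convergence:

```python
def passes_screener(repairs_last_week: int, minimum: int = 6) -> bool:
    """Factual screener for the 'HVAC technician' persona: expect more than 5 repairs a week.

    Subjects below the cutoff are more likely handymen or managers, and pooling their
    answers with the target persona's is what makes interview results fail to converge.
    """
    return repairs_last_week >= minimum

subjects = {"Trent": 9, "Pat (service manager)": 2, "Sam": 7}
screened_in = [name for name, repairs in subjects.items() if passes_screener(repairs)]
print(screened_in)  # only these interviews feed the pivot-or-persevere discussion
```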

How do you test that you’ve found demand and have the ‘right solution’? By this, I mean: How do you test whether you have demand for your proposition? How do you know whether it’s better enough at solving a problem (doing a job, etc.) than the current alternatives your target persona has available to them now?

If an existing team was going to pick one of these areas to start with, I’d pick this one. While they’ll waste time if they haven’t found the right problem to solve and, yes, usability does matter, in practice this area of HDD is a good forcing function for really finding out what the team knows vs. doesn’t. This is why I show it as a kind of fulcrum between Right Problem and Right Solution:

[Image: demand testing as the fulcrum between Right Problem and Right Solution]

This is not about usability and it does not involve showing someone a prototype, asking them if they like it, and checking the box.

Lean Startup offers a body of practice that’s an excellent fit for this. However, it’s widely misused because it’s so much more fun to build stuff than to test whether or not anyone cares about your idea. Yeah, seriously- that is the central challenge of Lean Startup.

Here’s the exciting part: You can massively improve your odds of success. While Lean Startup does not claim to be able to take any idea and make it successful, it does claim to minimize waste- and that matters a lot. Let’s just say that a new product or feature has a 1 in 5 chance of being successful. Using Lean Startup, you can iterate through 5 ideas in the space it would take you to build 1 out (and hope for the best)- this makes the improbable probable, which is pretty much the most you can ask for in the innovation game.

Build, measure, learn, right? Kind of. I’ll harp on this since it’s important and a common failure mode related to Lean Startup: an MVP is not a 1.0. As the Lean Startup folks (and Eric Ries’ book) will tell you, the right order is learn, build, measure. Specifically–

Learn: Who your customer is and what matters to them (see Solving the Right Problem, above). If you don’t do this, you’ll be throwing darts with your eyes closed. Those darts are a lot cheaper than the darts you’d throw if you were building out the solution all the way (to strain the metaphor some), but far from free.

In particular, I see lots of teams run an MVP experiment and get confusing, inconsistent results. Most of the time, this is because they don’t have a screener and they’re putting the MVP in front of an audience that’s too wide ranging. A grandmother is going to respond differently than a millennial to the same thing.

Build : An experiment, not a real product, if at all possible (and it almost always is). Then consider MVP archetypes (see below) that will deliver the best results and try them out. You’ll likely have to iterate on the experiment itself some, particularly if it’s your first go.

Measure : Have metrics and link them to a kill decision. The Lean Startup term is ‘pivot or persevere’, which is great and makes perfect sense, but in practice the pivot/kill decisions are hard, and as you design your experiment you should really think about what metrics and thresholds are actually going to convince you.

How do you code an MVP? You don’t. This MVP is a means to running an experiment to test motivation- so formulate your experiment first and then figure out an MVP that will get you the best results with the least amount of time and money. Just since this is a practitioner’s guide, with regard to ‘time’, that’s both time you’ll have to invest as well as how long the experiment will take to conclude. I’ve seen them both matter.

The most important first step is just to start with a simple hypothesis about your idea, and I like the form of ‘If we [do something] for [a specific customer/persona], then they will [respond in a specific, observable way that we can measure]’. For example, if you’re building an app for parents to manage allowances for their children, it would be something like ‘If we offer parents an app to manage their kids’ allowances, they will download it, try it, make a habit of using it, and pay for a subscription.’

All that said, for getting started here are–
  • A guide on testing with Lean Startup
  • A template for creating motivation/demand experiments

To recap, what’s a Right Solution hypothesis for testing demand? The core hypothesis is that you have a value proposition that’s better enough than the target persona’s current alternatives that you’re going to acquire customers.

As you may notice, this creates a tight linkage with your testing from Solving the Right Problem. This is important because while testing value propositions with Lean Startup is way cheaper than building product, it still takes work and you can only run a finite set of tests. So, before you do this kind of testing I highly recommend you’ve iterated to validated learning on what you see below: a persona, one or more PS/JTBD, the alternatives they’re using, and a testable view of why your VP is going to displace those alternatives. With that, your odds of doing quality work in this area dramatically increase!

[Image: value proposition summary for Trent the Technician]

What’s the testing, then? Well, it looks something like this:


01 IDEA : Most practicing scientists will tell you that the best way to get a good experimental result is to start with a strong hypothesis. Validating that you have the Right Problem and know what alternatives you’re competing against is critical to making investments in this kind of testing yield valuable results.

With that, you have a nice clear view of what alternative you’re trying to see if you’re better than.

02 HYPOTHESIS : I like a cause and effect stated here, like: ‘If we [offer something to said persona], they will [react in some observable way].’ This really helps focus your work on the MVP.

03 EXPERIMENTAL DESIGN : The MVP is a means to enable an experiment. It’s important to have a clear, explicit declaration of that hypothesis and for the MVP to deliver a metric for which you will (in advance) decide on a fail threshold. Most teams find it easier to kill an idea decisively with a kill metric vs. a success metric, even though they’re literally different sides of the same threshold. (A minimal sketch of such a fail-threshold check appears after these steps.)

04 EXPERIMENTATION : It is OK to tweak the parameters some as you run the experiment. For example, if you’re running a Google AdWords test, feel free to try new and different keyword phrases.

05 PIVOT OR PERSEVERE : Did you end up above or below your fail threshold? If below, pivot and focus on something else. If above, great- what is the next step to scaling up this proposition?
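
As a minimal sketch of steps 02–05 (the metric, threshold, and names are hypothetical, not from the article), the essential move is declaring the fail threshold before the results arrive so the pivot-or-persevere call follows mechanically from it:

```python
from dataclasses import dataclass

@dataclass
class DemandExperiment:
    hypothesis: str        # 'If we [offer X to persona], they will [observable response]'
    metric: str            # the single metric the MVP exists to produce
    fail_threshold: float  # decided in advance, before looking at any results

    def pivot_or_persevere(self, observed: float) -> str:
        return "persevere" if observed >= self.fail_threshold else "pivot"

smoke_test = DemandExperiment(
    hypothesis=("If we advertise a parts-availability lookup to HVAC technicians, "
                "they will click through and leave an email for early access"),
    metric="landing-page email sign-up rate",
    fail_threshold=0.03,  # below 3% we agree, in advance, to kill or rework the idea
)
print(smoke_test.pivot_or_persevere(observed=0.021))  # -> "pivot"
```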

How does this relate to usability? What’s usability vs. motivation? You might reasonably wonder: If my MVP has something that’s hard to understand, won’t that affect the results? Yes, sure. Testing for usability and the related tasks of building stuff are much more fun and (short-term) gratifying. I can’t emphasize enough how much harder it is for most founders, etc., to push themselves to focus on motivation.

There’s certainly a relationship and, as we transition to the next section on usability, it seems like a good time to introduce the relationship between motivation and usability. My favorite tool for this is BJ Fogg’s Fogg Curve, which appears below. On the y-axis is motivation and on the x-axis is ‘ability’, the inverse of usability. If you imagine a point in the upper left, that would be, say, a cure for cancer: no matter how hard it is to deal with, you really want it. On the bottom right would be something like checking Facebook- you may not be super motivated, but it’s so easy.

The punchline is that there’s certainly a relationship but beware that for most of us our natural bias is to neglect testing our hypotheses about motivation in favor of testing usability.

[Image: the Fogg Curve: motivation vs. ability]

First and foremost, delivering great usability is a team sport. Without a strong, co-created narrative, your performance is going to be sub-par. This means your developers, testers, analysts should be asking lots of hard, inconvenient (but relevant) questions about the user stories. For more on how these fit into an overall design program, let’s zoom out and we’ll again stand on the shoulders of Donald Norman.

Usability and User Cognition

To unpack usability in a coherent, testable fashion, I like to use Donald Norman’s 7-step model of user cognition:

[Image: Donald Norman’s 7-step model of user cognition]

The process starts with a Goal, and that goal interacts with an object in an environment, the ‘World’. With the concepts we’ve been using here, the Goal is equivalent to a job-to-be-done. The World is your application in whatever circumstances your customer will use it (in a cubicle, on a plane, etc.).

The Reflective layer is where the customer is making a decision about alternatives for their JTBD/PS. In his seminal book, The Design of Everyday Things, Donald Norman’s example is deciding whether to continue reading a book as the sun goes down. In the framings we’ve been using, we looked at understanding your customer’s Goals/JTBD in ‘How do you test that you’ve found the ‘right problem’?’, and we looked at evaluating their alternatives relative to your own (proposition) in ‘How do you test that you’ve found the ‘right solution’?’.

The Behavioral layer is where the user interacts with your application to get what they want- hopefully engaging with interface patterns they know so well they barely have to think about it. This is what we’ll focus on in this section. Critical here is leading with strong narrative (user stories), pairing those with well-understood (by your persona) interface patterns, and then iterating through qualitative and quantitative testing.

The Visceral layer is about the lower-level visual cues that a user gets- in the design world this is a lot about good visual design and even more about visual consistency. We’re not going to look at that in depth here, but if you haven’t already, I’d make sure you have a working style guide to ensure consistency (see Creating a Style Guide).

How do you unpack the UX Stack for Testability? Back to our example company, HVAC in a Hurry, which services commercial heating, ventilation, and A/C systems: let’s say we’ve arrived at a set of tested learnings for Trent the Technician.

As we look at how we’ll iterate to the right solution in terms of usability, let’s say we arrive at the following user story we want to unpack (this would be one of many, even just for the PS/JTBD above):

As Trent the Technician, I know the part number and I want to find it on the system, so that I can find out its price and availability.

Let’s step through the 7 steps above in the context of HDD, with a particular focus on achieving strong usability.

1. Goal This is the PS/JTBD: getting replacement parts to a job site. An HDD-enabled team would have found this out by doing customer discovery interviews with subjects they’ve screened and validated to be relevant to the target persona. They would have asked non-leading questions like ‘What are the top five hardest things about finishing an HVAC repair?’ and consistently heard that one such thing is sorting out replacement parts. This validates the hypothesis that said PS/JTBD matters.

2. Plan For the PS/JTBD/Goal, which alternative are they likely to select? Is our proposition better enough than the alternatives? This is where Lean Startup and demand/motivation testing are critical. This is where we focused in ‘How do you test that you’ve found the ‘right solution’?’, and the HVAC in a Hurry team might have run a series of MVPs both to understand how their subject might interact with a solution (concierge MVP) and to gauge whether they’re likely to engage (Smoke Test MVP).

3. Specify Our first step here is just to think through what the user expects to do and how we can make that as natural as possible. This is where drafting testable user stories, looking at comp’s, and then pairing clickable prototypes with iterative usability testing is critical. Following that, make sure your analytics are answering the same questions but at scale and with the observations available.

4. Perform If you did a good job in Specify and there are not overt visual problems (like ‘Can I click this part of the interface?’), you’ll be fine here.

5. Perceive We’re at the bottom of the stack and looping back up from World: Is the feedback from your application readily apparent to the user? For example, if you turn a switch for a lightbulb, you know if it worked or not. Is your user testing delivering similar clarity on user reactions?

6. Interpret Do they understand what they’re seeing? Does it make sense relative to what they expected to happen? For example, if the user just clicked ‘Save’, do they know that whatever they wanted to save is saved and OK? Or not?

7. Compare Have you delivered your target VP? Did they get what they wanted relative to the Goal/PS/JTBD?

How do you draft relevant, focused, testable user stories? Without these, everything else is on a shaky foundation. Sometimes, things will work out. Other times, they won’t. And it won’t be that clear why/not. Also, getting in the habit of pushing yourself on the relevance and testability of each little detail will make you a much better designer and a much better steward of where and why your team invests in building software.

For getting started, here are–
  • A guide on creating user stories
  • A template for drafting user stories

How do you find the relevant patterns and apply them? Once you’ve got great narrative, it’s time to put the best-understood, most expected, most relevant interface patterns in front of your user. Getting there is a process.

For getting started, here is– A guide on interface patterns and prototyping

How do you run qualitative user testing early and often? Once you’ve got something great to test, it’s time to get that design in front of a user, give them a prompt, and see what happens- then rinse and repeat with your design.

For getting started, here are–
  • A guide on qualitative usability testing
  • A template for testing your user stories

How do you focus your outcomes and instrument actionable observation? Once you release product (features, etc.) into the wild, it’s important to make sure you’re always closing the loop with analytics that are a regular part of your agile cadences. For example, in a high-functioning practice of HDD, the team should be interested in and reviewing focused analytics to see how they pair with the results of their qualitative usability testing.

For getting started, here is– A guide on quantitative usability testing with Google Analytics.

To recap, what’s a Right Solution hypothesis for usability? Essentially, the usability hypothesis is that you’ve arrived at a high-performing UI pattern that minimizes cognitive load and maximizes the user’s ability to act on their motivation to connect with your proposition.


01 IDEA : If you’re writing good user stories, you already have your ideas implemented in the form of testable hypotheses. Stay focused and use these to anchor your testing. You’re not trying to test what color drop-down works best- you’re testing which affordances best deliver on a given user story.

02 HYPOTHESIS : Basically, the hypothesis is that ‘For [x] user story, this interface pattern will perform well, assuming we supply the relevant motivation and have the right assessments in place.’

03 EXPERIMENTAL DESIGN : Really, this means having a test set up that, beyond working, links user stories to prompts and narrative which supply motivation, and has discernible assessments that help you make sure the subject didn’t click in the wrong place by mistake.

04 EXPERIMENTATION : It is OK to iterate on your prototypes and even your test plan in between sessions, particularly at the exploratory stages.

05 PIVOT OR PERSEVERE : Did the patterns perform well, or is it worth reviewing patterns and comparables and giving it another go?

There’s a lot of great material and successful practice on the engineering management part of application development. But should you pair program? Do estimates or go NoEstimates? None of these are the right choice for every team all of the time. In this sense, HDD is the only way to reliably drive up your velocity, or f_e. What I love about agile is that fundamental to its design is the coupling and integration of working out how to make your release content successful while you’re figuring out how to make your team more successful.

What does HDD have to offer application development, then? First, I think it’s useful to consider how well HDD integrates with agile in this sense and what existing habits you can borrow from it to improve your practice of HDD. For example, let’s say your team is used to doing weekly retrospectives about its practice of agile. That’s the obvious place to start introducing a retrospective on how your hypothesis testing went and deciding what that should mean for the next sprint’s backlog.

Second, let’s look at the linkage from continuous design. Primarily, what we’re looking to do is move fewer designs into development through more disciplined experimentation before we invest in development. This leaves the developers to do things better and keep the pipeline healthier (faster and able to produce more content or story points per sprint). We’d do this by making sure we’re dealing with a user that exists, a job/problem that exists for them, and only propositions that we’ve successfully tested with non-product MVPs.

But wait– what does that exactly mean: ‘only propositions that we’ve successfully tested with non-product MVPs’? In practice, there’s no such thing as fully validating a proposition. You’re constantly looking at user behavior and deciding where you’d be best off improving. To create balance and consistency from sprint to sprint, I like to use a ‘UX map’. You can read more about it at that link, but the basic idea is that for a given JTBD:VP pairing you map out the customer experience (CX) arc, broken into progressive stages that each have a description, a dependent variable you’ll observe to assess success, and ideas on things (independent variables or ‘IVs’) to test. For example, here’s what such a UX map might look like for HVAC in a Hurry’s work on the JTBD of ‘getting replacement parts to a job site’.

[Figure: example UX map for HVAC in a Hurry]
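To make the structure concrete, here is a minimal sketch of how one might represent a UX map row in code. The stage descriptions, dependent variables, and IVs below are invented placeholders, not the actual HVAC in a Hurry map.

```python
from dataclasses import dataclass, field

@dataclass
class UXMapStage:
    """One stage of the customer experience (CX) arc for a JTBD:VP pairing.

    All field values used below are illustrative placeholders.
    """
    description: str                 # what happens at this stage
    dependent_variable: str          # the observed metric that signals success
    independent_variables: list[str] = field(default_factory=list)  # ideas (IVs) to test

# A hypothetical slice of a UX map for the JTBD "get replacement parts to a job site"
ux_map = [
    UXMapStage(
        description="Technician identifies the part they need",
        dependent_variable="% of sessions where a part number is found",
        independent_variables=["part photo search", "model-number lookup"],
    ),
    UXMapStage(
        description="Technician places the order from the job site",
        dependent_variable="order completion rate on mobile",
        independent_variables=["saved payment details", "one-tap reorder"],
    ),
]

for stage in ux_map:
    print(stage.description, "->", stage.dependent_variable)
```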

From there, how can we use HDD to bring better, more testable design into the development process? One thing I like to do with user stories and HDD is to make a habit of pairing every single story with a simple, analytical question that would tell me whether the story is ‘done’ from the standpoint of creating the target user behavior or not. From there, I consider focal metrics. Here’s what that might look like at HinH.

[Figure: example user stories paired with analytical questions and focal metrics at HinH]

For the last couple of decades, test and deploy/ops was often treated like a kind of stepchild to development: something that had to happen at the end of development and was the sole responsibility of an outside group of specialists. It didn’t make sense then, and now an integral test capability is table stakes for getting to a continuous product pipeline, which is at the core of HDD itself.

A continuous pipeline means that you release a lot. Getting good at releasing relieves a lot of energy-draining stress on the product team, as well as creating the opportunity for the rapid learning that HDD requires. Interestingly, research by outfits like DORA (now part of Google) and CircleCI shows that teams that are able to do this both release faster and encounter fewer bugs in production.

Amazon famously releases code every 11.6 seconds. What this means is that a developer can push a button to commit code and everything from there to that code showing up in front of a customer is automated. How does that happen? For starters, there are two big (related) areas: Test & Deploy.

While there is some important plumbing that I’ll cover in the next couple of sections, in practice most teams struggle with test coverage. What does that mean? In principle, it means that even though you can’t test everything, you iterate toward test automation coverage that catches most bugs before they end up in front of a user. For most teams, that means a ‘pyramid’ of tests like you see here, where the x-axis is the number of tests and the y-axis is the level of abstraction of the tests.

[Figure: test pyramid]

The reason for the pyramid shape is that the tests are progressively more work to create and maintain, and each one provides less and less isolation of where a bug actually resides. In terms of iteration and retrospectives, what this means is that you’re always asking ‘What’s the lowest-level test that could have caught this bug?’.

Unit tests isolate the operation of a single function and make sure it works as expected. Integration tests span two functions and system tests, as you’d guess, more or less emulate the way a user or endpoint would interact with a system.
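As a minimal illustration of the two lowest layers, here is a hedged sketch written for pytest; `parse_quantity` and `price_order` are hypothetical functions, not from any real codebase.

```python
# A minimal sketch of the lower layers of the test pyramid, using pytest conventions.

def parse_quantity(raw: str) -> int:
    """Parse a user-entered quantity, defaulting to 1 for blank input."""
    raw = raw.strip()
    return int(raw) if raw else 1

def price_order(raw_quantity: str, unit_price: float) -> float:
    """Combines parsing and pricing: the kind of seam an integration test covers."""
    return parse_quantity(raw_quantity) * unit_price

# Unit test: isolates the operation of a single function.
def test_parse_quantity_defaults_to_one():
    assert parse_quantity("") == 1

# Integration test: spans two functions working together.
def test_price_order_uses_parsed_quantity():
    assert price_order(" 3 ", 10.0) == 30.0
```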

Feature Flags: These are a separate but somewhat complementary facility. The basic idea is that as you add new features, each one has a flag that can enable or disable it. Flags start out disabled, and you make sure the new features don’t break anything. Then, for small sets of users, you can enable them and test whether a) the metrics look normal and nothing’s broken and, closer to the core of HDD, b) users are actually interacting with the new feature.
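Here is a minimal, hand-rolled sketch of what a flag check with a small percentage rollout might look like; real teams typically use a feature-flag service, and the flag names and rollout numbers below are purely illustrative.

```python
import hashlib

# Hypothetical flag definitions: deployed code, exposure controlled per flag.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 10},  # enabled for a small slice of users
    "beta_search": {"enabled": False, "rollout_percent": 0},   # deployed but hidden
}

def is_enabled(flag_name: str, user_id: str) -> bool:
    """Deterministically bucket a user so the same user always gets the same answer."""
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    bucket = int(hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]

if is_enabled("new_checkout", user_id="user-42"):
    print("show new checkout")   # then measure whether users actually interact with it
else:
    print("show current checkout")
```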

In the olden days (which is when I last did this kind of thing for work), if you wanted to update a web application, you had to log in to a server, upload the software, and then configure it, maybe with the help of some scripts. Very often, things didn’t go according to plan, for the predictable reason that there was a lot of opportunity for variation between how the update was tested and the machine you were updating, not to mention how you were updating.

Now computers do all that, but you still have to program them. As such, deployment has increasingly become a job where you’re coding solutions on top of platforms like Kubernetes, Chef, and Terraform. The folks doing this are (hopefully) working closely with developers. For example, rather than spending time and money on writing documentation for an upgrade, the team would collaborate on code/config that runs on the kind of platform I mentioned earlier.

Pipeline Automation

Most teams with a continuous pipeline orchestrate something like what you see below with an application made for this, like Jenkins or CircleCI. The Manual Validation step you see is, of course, optional and not a prevalent part of truly continuous delivery. In fact, if you automate up to the point of a staging server or similar before you release, that’s what’s generally called continuous integration.

Finally, the two yellow items you see are where the team centralizes their code (version control) and the build that they’re taking from commit to deploy (artifact repository).

[Figure: continuous delivery pipeline]

To recap, what’s the hypothesis?

Well, you can’t test everything but you can make sure that you’re testing what tends to affect your users and likewise in the deployment process. I’d summarize this area of HDD as follows:

[Figure: CD hypothesis summary]

01 IDEA: You can’t test everything and you can’t foresee everything that might go wrong. This is important for the team to internalize. But you can iteratively, purposefully focus your test investments.

02 HYPOTHESIS: Relative to the test pyramid, you’re looking to get to a place where you’re finding issues with the least expensive, least complex test possible: not an integration test when a unit test could have caught the issue, and so forth.

03 EXPERIMENTAL DESIGN: As you run integrations and deployments, you see what happens! Most teams move from continuous integration (a deploy-ready system that’s not actually in front of customers) to continuous deployment.

04 EXPERIMENTATION: In retrospectives, it’s important to look at the test suite and ask what would have made the most sense and how the current processes were or weren’t facilitating that.

05 PIVOT OR PERSEVERE: It takes work, but teams get there all the time, and research shows they end up both releasing more often and encountering fewer production bugs, believe it or not!

Topline, I would say it’s a way to unify and focus your work across those disciplines. I’ve found that’s a pretty big deal. While none of those practices are hard to understand, practice on the ground is patchy. Usually, the problem is having the confidence that doing things well is going to be worthwhile, and knowing who should be participating when.

My hope is that with this guide and the supporting material (and of course the wider body of practice), that teams will get in the habit of always having a set of hypotheses and that will improve their work and their confidence as a team.

Naturally, these various disciplines have a lot to do with each other, and I’ve summarized some of that here:

[Figure: diagram of how the HDD disciplines relate]

Mostly, I find practitioners learn about this through their work, but I’ll point out a few big points of intersection that I think are particularly notable:

  • Learn by Observing Humans: We all tend to jump on solutions and overinvest in them when we should be observing our users, seeing how they behave, and then iterating. HDD helps reinforce problem-first diagnosis through its connections to relevant practice.
  • Focus on What Users Actually Do: A lot of things might happen, more than we can deal with properly. The good news is that by observing what actually happens, you can make things a lot easier on yourself.
  • Move Fast, but Minimize Blast Radius: Working across so many types of orgs at present (startups, corporations, a university), I can’t overstate how important this is, and yet how big a shift it is for more traditional organizations. The idea of ‘moving fast and breaking things’ is terrifying to these places, and the reality is that with practice you can move fast and rarely break things, or only break them a tiny bit. Without this, you end up stuck waiting for someone else to create the perfect plan or for that next super important hire to fix everything (spoiler: it won’t and they don’t).
  • Minimize Waste: Succeeding at innovation is improbable, and yet it happens all the time. Practices like Lean Startup do not guarantee that by following them you’ll always succeed; however, they do promise that by minimizing waste you can test five ideas in the time/money/energy it would otherwise take you to test one, making the improbable probable.

What I love about Hypothesis-Driven Development is that it solves a really hard problem of practice: all these behaviors are important, and yet you can’t learn to practice them all immediately. What HDD does is give you a foundation where you can see what’s similar across these disciplines and how your practice in one reinforces the others. It’s also a good tool for deciding where you need to focus on any given project or team.

Copyright © 2022 Alex Cowan · All rights reserved.


What is hypothesis-driven development?


Uncertainty is one of the biggest challenges of modern product development. Most often, there are more question marks than answers available.


This fact forces us to work in an environment of ambiguity and unpredictability.

Instead of combatting this, we should embrace the circumstances and use tools and solutions that excel in ambiguity. One of these tools is a hypothesis-driven approach to development.

Hypothesis-driven development in a nutshell

As the name suggests, hypothesis-driven development is an approach that focuses development efforts around, you guessed it, hypotheses.

To make this more tangible, let’s compare it to two other common development approaches: feature-driven and outcome-driven.

In feature-driven development, we prioritize our work and effort based on specific features we planned and decided on upfront. The underlying goal here is predictability.

In outcome-driven development, the priorities are dictated not by specific features but by broader outcomes we want to achieve. This approach helps us maximize the value generated.

When it comes to hypothesis-driven development, the development effort is focused first and foremost on validating the most pressing hypotheses the team has. The goal is to maximize learning speed over all else.

Benefits of hypothesis-driven development

There are numerous benefits of a hypothesis-driven approach to development, but the main ones include:

  • Continuous learning
  • MVP mindset
  • Data-driven decision-making

Hypothesis-driven development maximizes the amount of knowledge the team acquires with each release.

After all, if all you do is test hypotheses, each test must bring you some insight:

[Figure: continuous learning cycle with hypothesis-driven development]

Hypothesis-driven development centers the whole prioritization and development process around learning.

Instead of designing specific features or focusing on big, multi-release outcomes, a hypothesis-driven approach forces you to focus on minimum viable solutions ( MVPs ).

After all, the primary thing you are aiming for is hypothesis validation. It often doesn’t require scalability, a perfect user experience, or fully fledged features.


By definition, hypothesis-driven development forces you to truly focus on MVPs and avoid overcomplicating.

In hypothesis-driven development, each release focuses on testing a particular assumption. That test then brings you new data points, which help you formulate and prioritize the next hypotheses.

That’s truly a data-driven development loop that leaves little room for HiPPOs (the highest-paid person’s opinion).

Guide to hypothesis-driven development

Let’s take a look at what hypothesis-driven development looks like in practice. On a high level, it consists of four steps:

  • Formulate a list of hypotheses and assumptions
  • Prioritize the list
  • Design an MVP
  • Test and repeat

1. Formulate hypotheses

The first step is to list all hypotheses you are interested in.

Everything you wish to know about your users and market, as well as things you believe you know but don’t have tangible evidence to support, is a form of a hypothesis.

At this stage, I’m not a big fan of robust hypotheses such as, “We believe that if <we do something> then <something will happen> because <some user action>.”

To have such robust hypotheses, you need a solid enough understanding of your users, and if you do have it, then odds are you don’t need hypothesis-driven development anymore.

Instead, I prefer simpler statements that are closer to assumptions than hypotheses, such as:

  • “Our users will love the feature X”
  • “The option to do X is very important for student segment”
  • “Exam preparation is an important and underserved need that our users have”

2. Prioritize

The next step in hypothesis-driven development is to prioritize all assumptions and hypotheses you have. This will create your product backlog:

[Figure: prioritization graphic with cards in order of descending priority]

There are various prioritization frameworks and approaches out there, so choose whichever you prefer. I personally prioritize assumptions based on two main criteria:

  • How much will we gain if we positively validate the hypothesis?
  • How much will we learn during the validation process?

Your priorities, however, might differ depending on your current context.
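As an illustration, here is a minimal sketch of ranking a backlog by those two criteria; the assumptions and the 1-5 scores are invented, and the simple additive weighting is just one possible choice.

```python
# Invented assumptions scored 1-5 on the two criteria described above.
assumptions = [
    {"name": "Users will love feature X", "gain_if_valid": 4, "expected_learning": 2},
    {"name": "Option X matters to the student segment", "gain_if_valid": 3, "expected_learning": 5},
    {"name": "Exam preparation is an underserved need", "gain_if_valid": 5, "expected_learning": 4},
]

# Rank by gain and learning together; swap in whatever weighting fits your context.
backlog = sorted(
    assumptions,
    key=lambda a: a["gain_if_valid"] + a["expected_learning"],
    reverse=True,
)

for position, item in enumerate(backlog, start=1):
    print(position, item["name"])
```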

3. Design an MVP

Hypothesis-driven development is centered around the idea of MVPs — that is, the smallest possible releases that will help you gather enough information to validate whether a given hypothesis is true.

User experience, maintainability, and product excellence are secondary.

4. Test and repeat

The last step is to launch the MVP and validate whether the actual impact and consequent user behavior validate or invalidate the initial hypothesis.

The success isn’t measured by whether the hypothesis turned out to be accurate, but by how many new insights and learnings you captured during the process.

Based on the experiment, revisit your current list of assumptions, and, if needed, adjust the priority list.

Challenges of hypothesis-driven development

Although hypothesis-driven development comes with great benefits, it’s not all wine and roses.

Let’s take a look at a few core challenges that come with a hypothesis-focused approach.

Lack of robust product experience

Focusing on validating hypotheses and the underlying MVP mindset comes at a cost. Robust product experience and great UX often require polish, optimization, and iteration, which go against speed-focused hypothesis-driven development.

You can’t optimize for both learning and quality simultaneously.

Unfocused direction

Although hypothesis-driven development is great for gathering initial learnings, eventually, you need to start developing a focused and sustainable long-term product strategy. That’s where outcome-driven development shines.

There’s an infinite amount of explorations you can do, but at some point, you must flip the switch and narrow down your focus around particular outcomes.

Over-emphasis on MVPs

Teams that embrace a hypothesis-driven approach often fall into the trap of an “MVP only” approach. However, shipping an actual prototype is not the only way to validate an assumption or hypothesis.

You can utilize tools such as user interviews, usability tests, market research, or willingness to pay (WTP) experiments to validate most of your doubts.

There’s a thin line between being MVP-focused in development and overusing MVPs as a validation tool.

When to use hypothesis-driven development

As you’ve most likely noticed, hypothesis-driven development isn’t a multi-tool solution that can be used in every context.

On the contrary, its challenges make it an unsuitable development strategy for many companies.

As a rule of thumb, hypothesis-driven development works best in early-stage products with a high dose of ambiguity. Focusing on hypotheses helps bring enough clarity for the product team to understand where even to focus:

[Figure: when to use hypothesis-driven development (grid)]

But once you discover your product-market fit and have a solid idea for your long-term strategy, it’s often better to shift into more outcome-focused development. You should still optimize for learning, but it should no longer be the primary focus of your development effort.

While at it, you might also consider feature-driven development as a next step. However, that works only under particular circumstances where predictability is more important than the impact itself — for example, B2B companies delivering custom solutions for their clients or products focused on compliance.

Hypothesis-driven development can be a powerful learning-maximization tool. Its focus on MVP, continuous learning process, and inherent data-driven approach to decision-making are great tools for reducing uncertainty and discovering a path forward in ambiguous settings.

Honestly, the whole process doesn’t differ much from other development processes. The primary difference is that backlog and priories focus on hypotheses rather than features or outcomes.

Start by listing your assumptions, prioritizing them as you would any other backlog, and working your way top-to-bottom by shipping MVPs and adjusting priorities as you learn more about your market and users.

However, since hypothesis-driven development often lacks long-term cohesiveness, focus, and sustainable product experience, it’s rarely a good long-term approach to product development.

I tend to stick to outcome-driven and feature-driven approaches most of the time and resort to hypothesis-driven development if the ambiguity in a particular area is so hard that it becomes challenging to plan sensibly.




5 steps to a hypothesis-driven design process

March 22, 2018

Say you’re starting a greenfield project, or you’re redesigning a legacy app. The product owner gives you some high-level goals. Lots of ideas and questions are in your mind, and you’re not sure where to start.

Hypothesis-driven design will help you navigate through an unknown space so you can come out at the end of the process with actionable next steps.

Ready? Let’s dive in.

Step 1: Start with questions and assumptions

On the first day of the project, you’re curious about all the different aspects of your product: “How could we increase the engagement on the homepage?” “What features are important for our users?”


To reduce risk, I like to take some time to write down all the unanswered questions and assumptions. So grab some sticky notes and write all your questions down on the notes (one question per note).

I recommend that you use the How Might We technique from IDEO to phrase the questions and turn your assumptions into questions. It’ll help you frame the questions in a more open-ended way and avoid building the solution into the statement prematurely. For example, say you have an idea to make riders feel more comfortable by showing them how many rides the driver has completed. You can rephrase it as “How might we ensure riders feel comfortable when taking a ride?” and leave the solution part for a later step.

“It’s easy to come up with design ideas, but it’s hard to solve the right problem.”

It’s even more valuable to have your team members participate in the question brainstorming session. Having diverse disciplines in the room always brings fresh perspectives and leads to a more productive conversation.

Step 2: Prioritize the questions and assumptions

Now that you have all the questions on sticky notes, organize them into groups to make it easier to review them. It’s especially helpful if you can do the activity with your team so you can have more input from everybody.

When it comes to choosing which question to tackle first, think about what would impact your product the most or what would bring the most value to your users.

If you have a big group, you can Dot Vote to prioritize the questions. Here’s how it works: Everyone has three dots, and each person gets to vote on what they think is the most important question to answer in order to build a successful product. It’s a common prioritization technique that’s also used in the Sprint book by Jake Knapp—he writes, “The prioritization process isn’t perfect, but it leads to pretty good decisions and it happens fast.”


Step 3: Turn them into hypotheses

After the prioritization, you now have a clear question in mind. It’s time to turn the question into a hypothesis. Think about how you would answer the question.

Let’s continue the previous ride-hailing service example. The question you have is “How might we make people feel safe and comfortable when using the service?”

Based on this question, the solutions can be:

  • Sharing the rider’s location with friends and family automatically
  • Displaying more information about the driver
  • Showing feedback from previous riders

Now you can combine the solution and the question, and turn them into a hypothesis. A hypothesis is a framework that can help you clearly define the question and solution, and eliminate assumptions.

From Lean UX:

We believe that [sharing more information about the driver’s experience and stories]
For [the riders]
Will [make riders feel more comfortable and connected throughout the ride]

Step 4: Develop an experiment and test the hypothesis

Develop an experiment so you can test your hypothesis. Our test will follow the scientific method, so it relies on collecting empirical and measurable evidence in order to obtain new knowledge. In other words, it’s crucial to have a measurable outcome for the hypothesis so we can determine whether it has succeeded or failed.

There are different ways you can create an experiment, such as interviews, surveys, landing page validation, usability testing, etc. It could also be something that’s built into the software to get quantitative data from users. Write down what the experiment will be, and define the outcomes that determine whether the hypothesis is valid. A well-defined experiment can validate or invalidate the hypothesis.

In our example, we could define the experiment as “We will run X studies that show more information about a driver (number of rides, years of experience) and ask follow-up questions to identify the rider’s emotions associated with the ride (safe, fun, interesting, etc.). We will know the hypothesis is valid when more than 70% of participants identify the ride as safe or comfortable.”
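As a rough sketch of how that success criterion could be scored, the snippet below tallies made-up responses against the 70% threshold; the response labels and sample size are assumptions for illustration only.

```python
# Invented responses from a small study; "safe" and "comfortable" count as positive.
responses = ["safe", "fun", "comfortable", "safe", "interesting", "safe", "comfortable", "safe"]

positive = {"safe", "comfortable"}
share = sum(1 for r in responses if r in positive) / len(responses)

print(f"{share:.0%} described the ride as safe or comfortable")
print("hypothesis validated" if share > 0.70 else "hypothesis not validated")
```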

After defining the experiment, it’s time to get the design done. You don’t need to have every design detail thought through. You can focus on designing what is needed to be tested.

When the design is ready, you’re ready to run the test. Recruit the users you want to target, set a time frame, and put the design in front of the users.

Step 5: Learn and build

Say you learn that the result was positive and you’re excited to roll out the feature. That’s great! If the hypothesis failed, don’t worry—you’ll still gain some insights from that experiment. Either way, you now have new evidence that you can use to run your next experiment. With each experiment, you’ll learn something new about your product and your customers.

“Design is a never-ending process.”

What other information can you show to make riders feel safe and comfortable? That can be your next hypothesis. You now have a feature that’s ready to be built, and a new hypothesis to be tested.

Principles from The Lean Startup

We often assume that we understand our users and know what they want. It’s important to slow down and take a moment to understand the questions and assumptions we have about our product.

After testing each hypothesis, you’ll get a clearer path of what’s most important to the users and where you need to dig deeper. You’ll have a clear direction for what to do next.

by Sylvia Lai

Sylvia Lai helps startups and enterprises solve complex problems through design thinking and user-centered design methodologies at Pivotal Labs. She is the biggest advocate for users; making sure their voices are heard is her number one priority. Outside of work, she loves mentoring other designers through one-on-one conversations. Connect with her through LinkedIn or Twitter.


6 Steps Of Hypothesis-Driven Development That Works


One of the greatest fears of product managers is to create an app that flops because it’s based on untested assumptions. After successfully launching more than 20 products, we’re convinced that we’ve found the right approach to hypothesis-driven development.

In this guide, I'll show you how we validated the hypotheses to ensure that the apps met the users' expectations and needs.

What is hypothesis-driven development?

Hypothesis-driven development is a prototyping methodology that allows product designers to develop, test, and rebuild a product until it’s acceptable to users. It is an iterative measure that explores assumptions defined during the project and attempts to validate them with user feedback.

What you have assumed during the initial stage of development may not be valid for the users. Even if your assumptions are backed by historical data, user behavior can be affected by specific audiences and other factors. Hypothesis-driven development removes these uncertainties as the project progresses.


Why we use hypothesis-driven development

For us, the hypothesis-driven approach provides a structured way to consolidate ideas and build hypotheses based on objective criteria. It’s also less costly to test the prototype before production.

Using this approach has reliably allowed us to identify what should be tested, how, and in which order. It gives us a deep understanding of how we prioritize features and how they connect to business goals and desired user outcomes.

We’re also able to track and compare the desired and real outcomes of developing the features. 

The process of Prototype Development that we use

Our success in building apps that are well-accepted by users is based on the Lean UX definition of hypothesis. We believe that the business outcome will be achieved if the user’s outcome is fulfilled for the particular feature. 

Here’s the process flow:

How Might We technique → Dot voting (based on estimated/assumptive impact) → converting into a hypothesis → define testing methodology (research method + success/fail criteria) → impact effort scale for prioritizing → test, learn, repeat.

Once the hypothesis is proven right, the feature is escalated into the development track for UI design and development. 


Step 1: List Down Questions And Assumptions

Whether it’s the initial stage of the project or after the launch, there are always uncertainties or ideas to further improve the existing product. In order to move forward, you’ll need to turn the ideas into structured hypotheses where they can be tested prior to production.  

To start with, jot the ideas or assumptions down on paper or a sticky note. 

Then, you’ll want to widen the scope of the questions and assumptions into possible solutions. The How Might We (HMW) technique is handy in rephrasing the statements into questions that facilitate brainstorming.

For example, if you have a social media app with a low number of users, asking, “How might we increase the number of users for the app?” makes brainstorming easier. 

Step 2: Dot Vote to Prioritize Questions and Assumptions

Once you’ve got a list of questions, it’s time to decide which are potentially more impactful for the product. The Dot Vote method, where team members are given dots to place on the questions, helps prioritize the questions and assumptions. 

Our team uses this method when we’re faced with many ideas and need to eliminate some of them. We start by grouping similar ideas and using 3-5 dots to vote. At the end of the process, we’ll have preliminary data on the possible impact and our team’s interest in developing certain features.

This method allows us to prioritize the statements derived from the HMW technique and we’re only converting the top ones. 

Step 3: Develop Hypotheses from Questions

The questions lead to a brainstorming session where the answers become hypotheses for the product. The hypothesis is meant to create a framework that allows the questions and solutions to be defined clearly for validation.

Our team follows a specific format in forming hypotheses. We structure the statement as follows:

We believe we will achieve [business outcome]
If [the persona]
Solves their need in [user outcome] using [feature].

Here’s a hypothesis we’ve created:

We believe we will achieve DAU=100 if Mike (our proto persona) solves his need in recording and sharing videos instantaneously using our camera and cloud storage.


Step 4: Test the Hypothesis with an Experiment

It’s crucial to validate each of the assumptions made on the product features. Based on the hypotheses, experiments in the form of interviews, surveys, usability testing, and so forth are created to determine if the assumptions are aligned with reality. 

Each of the methods provides some level of confidence. Therefore, you don’t want to be 100% reliant on a particular method as it’s based on a sample of users.

It’s important to choose a research method that allows validation to be done with minimal effort. Even though hypothesis validation provides a degree of confidence, not all assumptions can be tested, and there could be a margin of error in the data obtained, since the test is conducted on a sample of people.

The experiments are designed in such a way that feedback can be compared with the predicted outcome. Only validated hypotheses are brought forward for development.

Testing all the hypotheses can be tedious. To be more efficient, you can use the impact effort scale. This method allows you to focus on hypotheses that are potentially high value and easy to validate. 

You can also work on hypotheses that deliver high impact but require high effort. Ignore those that require high effort but deliver low impact, and keep hypotheses with low impact and low effort in the backlog.
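A minimal sketch of that impact/effort sorting might look like the following; the hypothesis names and scores are invented, and "high" is arbitrarily taken to mean above the midpoint of a 1-10 scale.

```python
def classify(impact: int, effort: int) -> str:
    """Place a hypothesis into one of the four impact/effort quadrants."""
    high_impact, high_effort = impact > 5, effort > 5
    if high_impact and not high_effort:
        return "test first"          # high value, easy to validate
    if high_impact and high_effort:
        return "worth the effort"
    if not high_impact and not high_effort:
        return "backlog"             # low impact and low effort
    return "ignore"                  # low impact but high effort

# Invented hypotheses with (impact, effort) scores.
hypotheses = {"instant video sharing": (8, 3), "custom themes": (4, 8), "offline mode": (9, 9)}
for name, (impact, effort) in hypotheses.items():
    print(name, "->", classify(impact, effort))
```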

At Uptech, we assign each hypothesis with clear testing criteria. We rank the hypothesis with a binary ‘task success’ and subjective ‘effort on task’ where the latter is scored from 1 to 10. 

While we’re conducting the test, we also collect qualitative data such as the users’ feedback. We have a habit of segregating the feedback into pros, cons, and neutral with color-coded stickers (red for cons, green for pros, blue for neutral).

The best practice is to test each hypothesis on at least 5 users.
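Here is a minimal sketch of aggregating those per-user results (binary task success, a 1-10 effort score, and color-coded feedback); the five sample sessions are invented for illustration.

```python
# Invented results from five test sessions.
sessions = [
    {"task_success": True,  "effort": 3, "feedback": {"green": ["fast"], "red": [], "blue": ["wants dark mode"]}},
    {"task_success": True,  "effort": 5, "feedback": {"green": ["clear"], "red": ["small buttons"], "blue": []}},
    {"task_success": False, "effort": 8, "feedback": {"green": [], "red": ["got lost"], "blue": []}},
    {"task_success": True,  "effort": 4, "feedback": {"green": ["intuitive"], "red": [], "blue": []}},
    {"task_success": True,  "effort": 2, "feedback": {"green": ["easy"], "red": [], "blue": []}},
]

success_rate = sum(s["task_success"] for s in sessions) / len(sessions)
avg_effort = sum(s["effort"] for s in sessions) / len(sessions)
cons = [note for s in sessions for note in s["feedback"]["red"]]  # the "red sticker" notes

print(f"task success: {success_rate:.0%}, average effort: {avg_effort:.1f}/10")
print("cons to review:", cons)
```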

Step 5: Learn, Build (and Repeat)

The hypothesis-driven approach is not a single-ended process. Often, you’ll find that some of the hypotheses are proven to be false. Rather than be disheartened, you should use the data gathered to finetune the hypothesis and design a better experiment in the next phase.

Treat the entire cycle as a learning process where you’ll better understand the product and the customers. 

We found the process helpful when developing an MVP for Carbon Club, an environmental startup in the UK. The app allows users to donate to charity based on the carbon footprint they produce.

In order to calculate the carbon footprint, we weighed two options:

  • Connecting the app to the users’ bank account to monitor the carbon footprint based on purchases made.
  • Allowing users to take quizzes on their lifestyles.

Upon validation, we found that all of the users opted for the second option, as they were concerned about linking an unknown app to their bank account.

The result made us shelve the first assumption we had made during pre-Sprint research. It also saved our client $50,000 and a few months of work, as connecting the app to a bank account would have required a huge effort.


Step 6: Implement Product and Maintain

Once you’ve got the confidence that the remaining hypotheses are validated, it’s time to develop the product. However, testing must be continued even after the product is launched. 

You should be on your toes as customers’ demands, market trends, local economics, and other conditions may require some features to evolve. 


Our takeaways for hypothesis-driven development

If there’s anything that you could pick from our experience, it’s these 5 points.

1. Should every idea go straight into the backlog? No, not unless it is validated with substantial evidence.

2. While it’s hard to define business outcomes with specific metrics and desired values, you should do it anyway. Try to be as specific as possible, and avoid general terms. Give your best effort and adjust as you receive new data.  

3. Get all product teams involved as the best ideas are born from collaboration.

4. Start with a plan that consists of two main parameters: criteria of success and research methods. Besides qualitative insights, you need to set objective criteria to determine whether a test is successful. Use the Test Card to validate the assumptions strategically.

5. The methodology that we’ve recommended in this article works not only for products. We applied it at the end of 2019 to set the company’s strategic goals and ended up with robust results and an engaged, aligned team.

You'll have a better idea of which features would lead to a successful product with hypothesis-driven development. Rather than vague assumptions, the consolidated data from users will provide a clear direction for your development team. 

As for the hypotheses that don't make the cut, improvise, re-test, and leverage for future upgrades.




Hypothesis-Driven Development

Hypothesis-driven development (HDD), also known as hypothesis-driven product development, is an approach used in software development and product management.

HDD involves creating hypotheses about user behavior, needs, or desired outcomes, and then designing and implementing experiments to validate or invalidate those hypotheses.


Why use a hypothesis-driven approach?

With hypothesis-driven development, instead of making assumptions and building products or features based on those assumptions, teams should formulate hypotheses and conduct experiments to gather data and insights.

This method assists with making informed decisions and reduces the overall risk of building products that do not meet user needs or solve their problems.

How do you implement hypothesis-driven development?

At a high level, here’s a general approach to implementing HDD:

  • Identify the problem or opportunity: Begin by identifying the problem or opportunity that you want to address with your product or feature.
  • Create a hypothesis: Clearly define a hypothesis that describes a specific user behavior, need, or outcome you believe will occur if you implement the solution.
  • Design an experiment: Determine the best way to test your hypothesis. This could involve creating a prototype, conducting user interviews, A/B testing, or other forms of user research.
  • Implement the experiment: Execute the experiment by building the necessary components or conducting the research activities.
  • Collect and analyze data: Gather data from the experiment and analyze the results to determine if the hypothesis is supported or not.
  • If the hypothesis is supported, you can move forward with further development.
  • If the hypothesis is not supported, you may need to pivot, refine the hypothesis, or explore alternative solutions.
  • Rinse and repeat: Continuously repeat the process, iterating and refining your hypotheses and experiments to guide the development of your product or feature.

Hypothesis-driven development emphasizes a data-driven and iterative approach to product development, allowing teams to make more informed decisions, validate assumptions, and ultimately deliver products that better meet user needs.
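As a rough sketch, that loop could be expressed like this; the backlog entries and the `run_experiment` stub are placeholders for whichever research method you actually use.

```python
import random

def run_experiment(hypothesis: dict) -> bool:
    """Stand-in for a prototype test, user interviews, an A/B test, etc.
    Here it just simulates an outcome so the loop is runnable."""
    return random.random() < 0.5

# Invented hypotheses, listed in priority order.
backlog = [
    {"statement": "New users need an onboarding checklist"},
    {"statement": "Teams will pay for usage analytics"},
]

validated, to_revisit = [], []
for hypothesis in backlog:                 # work the backlog in priority order
    if run_experiment(hypothesis):         # collect and analyze data
        validated.append(hypothesis)       # supported: move toward further development
    else:
        to_revisit.append(hypothesis)      # not supported: pivot, refine, or drop

print("build next:", [h["statement"] for h in validated])
print("refine or drop:", [h["statement"] for h in to_revisit])
```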


Why hypothesis-driven development is key to DevOps


The definition of DevOps offered by Donovan Brown is "the union of people, process, and products to enable continuous delivery of value to our customers." It accentuates the importance of continuous delivery of value. Let's discuss how experimentation is at the heart of modern development practices.


Reflecting on the past

Before we get into hypothesis-driven development, let's quickly review how we deliver value using waterfall, agile, deployment rings, and feature flags.

In the days of waterfall, we had predictable and process-driven delivery. However, we only delivered value towards the end of the development lifecycle, often failing late as the solution drifted from the original requirements, or our killer features were outdated by the time we finally shipped.

[Figure: one release (X) with eight features, all deployed and exposed]

Here, we have one release X and eight features, which are all deployed and exposed to the patiently waiting user. We are continuously delivering value—but with a typical release cadence of six months to two years, the value of the features declines as the world continues to move on. It worked well enough when there was time to plan and a lower expectation to react to more immediate needs.

The introduction of agile allowed us to create and respond to change so we could continuously deliver working software, sense, learn, and respond.

[Figure: three releases (X.1, X.2, X.3)]

Now, we have three releases: X.1, X.2, and X.3. After the X.1 release, we improved feature 3 based on feedback and re-deployed it in release X.3. This is a simple example of delivering features more often, focused on working software, and responding to user feedback. We are on the path of continuous delivery, focused on our key stakeholders: our users.

Using deployment rings and/or feature flags, we can decouple release deployment and feature exposure, down to the individual user, to control the exposure—the blast radius—of features. We can conduct experiments; progressively expose, test, enable, and hide features; fine-tune releases; and continuously pivot on learnings and feedback.

When we add feature flags to the previous workflow, we can toggle features to be ON (enabled and exposed) or OFF (hidden).

[Figure: release with feature flags for features 2, 4, and 8 turned OFF]

Here, feature flags for features 2, 4, and 8 are OFF, which results in the user being exposed to fewer of the features. All features have been deployed but are not exposed (yet). We can fine-tune the features (value) of each release after deploying to production.

Ring-based deployment limits the impact (blast) on users while we gradually deploy and evaluate one or more features through observation. Rings allow us to deploy features progressively and have multiple releases (v1, v1.1, and v1.2) running in parallel.

Ring-based deployment

Exposing features in the canary and early-adopter rings enables us to evaluate features without the risk of an all-or-nothing big-bang deployment.

Feature flags decouple release deployment and feature exposure. You "flip the flag" to expose a new feature, perform an emergency rollback by resetting the flag, use rules to hide features, and allow users to toggle preview features.

Toggling feature flags on/off

When you combine deployment rings and feature flags, you can progressively deploy a release through rings and use feature flags to fine-tune the deployed release.

See deploying new releases: feature flags or rings, what's the cost of feature flags, and breaking down walls between people, process, and products for discussions on feature flags, deployment rings, and related topics.

Adding hypothesis-driven development to the mix

Hypothesis-driven development is based on a series of experiments to validate or disprove a hypothesis in a complex problem domain where we have unknown-unknowns. We want to find viable ideas or fail fast. Instead of developing a monolithic solution and performing a big-bang release, we iterate through hypotheses, evaluating how features perform and, most importantly, how and if customers use them.

Template: We believe {customer/business segment} wants {product/feature/service} because {value proposition}.

Example: We believe that users want to be able to select different themes because it will result in improved user satisfaction. We expect 50% or more users to select a non-default theme and to see a 5% increase in user engagement.
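A minimal sketch of checking that example against its stated success criteria might look like this; all of the counts and engagement numbers are invented for illustration.

```python
# Invented results after exposing the theme feature to a slice of users.
users_exposed = 1000
users_non_default_theme = 430
engagement_before = 0.20   # e.g., share of daily users who perform a key action
engagement_after = 0.215

theme_adoption = users_non_default_theme / users_exposed
engagement_lift = (engagement_after - engagement_before) / engagement_before

meets_adoption = theme_adoption >= 0.50      # success criterion 1: >= 50% pick a non-default theme
meets_engagement = engagement_lift >= 0.05   # success criterion 2: ~5% lift in engagement

print(f"theme adoption: {theme_adoption:.0%}, engagement lift: {engagement_lift:.1%}")
print("accept hypothesis" if meets_adoption and meets_engagement else "reject or refine hypothesis")
```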

Every experiment must be based on a hypothesis, have a measurable conclusion, and contribute to feature and overall product learning. For each experiment, consider these steps:

  • Observe your user
  • Define a hypothesis and an experiment to assess the hypothesis
  • Define clear success criteria (e.g., a 5% increase in user engagement)
  • Run the experiment
  • Evaluate the results and either accept or reject the hypothesis

Let's have another look at our sample release with eight hypothetical features.

[Figure: experiment outcomes for the eight features across releases X.1–X.3]

When we deploy each feature, we can observe user behavior and feedback, and prove or disprove the hypothesis that motivated the deployment. As you can see, the experiment fails for features 2 and 6, allowing us to fail fast and remove them from the solution. We do not want to carry waste that is not delivering value or delighting our users! The experiment for feature 3 is inconclusive, so we adapt the feature, repeat the experiment, and perform A/B testing in release X.2. Based on observations, we identify the variant feature 3.2 as the winner and re-deploy it in release X.3. We only expose the features that passed the experiment and satisfy the users.
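As a rough sketch of the A/B comparison between variants 3.1 and 3.2, the snippet below compares engagement rates on made-up counts; a real team would also check statistical significance before declaring a winner.

```python
# Invented exposure and engagement counts for the two variants of feature 3.
variants = {
    "feature_3_1": {"exposed": 5000, "engaged": 410},
    "feature_3_2": {"exposed": 5000, "engaged": 540},
}

rates = {name: v["engaged"] / v["exposed"] for name, v in variants.items()}
winner = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} engagement")
print("re-deploy the winner:", winner)
```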

Hypothesis-driven development lights up progressive exposure

When we combine hypothesis-driven development with progressive exposure strategies, we can vertically slice our solution, incrementally delivering on our long-term vision. With each slice, we progressively expose experiments, enable features that delight our users and hide those that did not make the cut.

But there is more. When we embrace hypothesis-driven development, we can learn how technology works together, or not, and what our customers need and want. We also complement the test-driven development (TDD) principle. TDD encourages us to write the test first (hypothesis), then confirm our features are correct (experiment), and succeed or fail the test (evaluate). It is all about quality and delighting our users, as outlined in principles 1, 3, and 7 of the Agile Manifesto:

  • Our highest priority is to satisfy the customers through early and continuous delivery of value.
  • Deliver software often, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  • Working software is the primary measure of progress.

More importantly, we introduce a new mindset that breaks down the walls between development, business, and operations to view, design, develop, deliver, and observe our solution in an iterative series of experiments, adopting features based on scientific analysis, user behavior, and feedback in production. We can evolve our solutions in thin slices through observation and learning in production, a luxury that other engineering disciplines, such as aerospace or civil engineering, can only dream of.

The good news is that hypothesis-driven development supports empirical process theory and its three pillars: transparency, inspection, and adaptation.


But there is more. Based on lean principles, we must pivot or persevere after we measure and inspect the feedback. Using feature toggles in conjunction with hypothesis-driven development, we get the best of both worlds, as well as the ability to use A/B testing to make decisions on feedback, such as likes/dislikes and value/waste.

Hypothesis-driven development:

  • Is about a series of experiments to confirm or disprove a hypothesis. Identify value!
  • Delivers a measurable conclusion and enables continued learning.
  • Enables continuous feedback from the key stakeholder—the user—to understand the unknown-unknowns!
  • Enables us to understand the evolving landscape into which we progressively expose value.

Progressive exposure:

  • Is not an excuse to hide non-production-ready code. Always ship quality!
  • Is about deploying a release of features through rings in production. Limit blast radius!
  • Is about enabling or disabling features in production. Fine-tune release values!
  • Relies on circuit breakers to protect the infrastructure from implications of progressive exposure. Observe, sense, act!

What have you learned about progressive exposure strategies and hypothesis-driven development? We look forward to your candid feedback.


Hypothesis-driven approach: the definitive guide

Imagine you are walking in one of McKinsey’s offices.

Around you, there are a dozen busy consultants.

The word “hypothesis” would be one of the words you would hear the most.

Along with “MECE” or “what’s the so-what?”.

This would also be true in any BCG, Bain & Company office or other major consulting firms.

Because strategy consultants are trained to use a hypothesis-driven approach to solve problems.

And as a candidate, you must demonstrate your capacity to be hypothesis-driven in your case interviews .

There is no turnaround:

If you want a consulting offer, you MUST know how to use a hypothesis-driven approach .

Like a consultant would be hypothesis-driven on a real project for a real client?

Hell, no! Big mistake!

Because like any (somehow) complex topics in life, the context matters.

What is correct in one context becomes incorrect if the context changes.

And this is exactly what’s happening with using a hypothesis-driven approach in case interviews.

This should be different from the hypothesis-driven approach used by a consultant solving a problem for a real client.

And that’s why many candidates get it wrong (and fail their interviews).

They use a hypothesis-driven approach like they were already a consultant.

Thus, in this article, you’ll learn the correct definition of being hypothesis-driven in the context of case interviews .

Plus, you’ll learn how to use a hypothesis in your case interviews to “crack the case”, and more importantly get the well-deserved offer!

Ready? Let’s go. It will be super interesting!

Table of Contents

The wrong hypothesis-driven approach in case interviews

Let’s start with a definition:

Hypothesis-driven thinking is a problem-solving method whereby you start with the answer and work back to prove or disprove that answer through fact-finding.

Concretely, here is how consultants use a hypothesis-driven approach to solve their clients’ problems:

  • Form an initial hypothesis, which is what they think the answer to the problem is.
  • Craft a logic issue tree, by asking themselves “what needs to be true for the hypothesis to be true?”
  • Walk their way down the issue tree and gather the necessary data to validate (or refute) the hypothesis.
  • Reiterate the process from step 1 – if their first hypothesis was disproved by their analysis – until they get it right.


With this answer-first approach, consultants do not gather data to fish for an answer. They seek to test their hypotheses, which is a very efficient problem-solving process.

The answer-first thinking works well if the initial hypothesis has been carefully formed.

This is why – in top consulting firms like McKinsey, BCG, or Bain & Company – the hypothesis is formed by a Partner with 20+ years of work experience.

And this is why this is NOT the right approach for case interviews.

Imagine a candidate doing a case interview at McKinsey and using answer-first thinking.

At the beginning of a case, this candidate forms a hypothesis (a potential answer to the problem), builds a logic tree, and gathers data to prove the hypothesis.

Here, there are two options:

The initial hypothesis is right

The initial hypothesis is wrong

If the hypothesis is right, what does it mean for the candidate?

That the candidate was lucky.

Nothing else.

And it certainly does not prove the problem-solving skills of this candidate (which is what is tested in case interviews).

Now, if the hypothesis is wrong, what’s happening next?

The candidate reiterates the process.

Imagine how disorganized the discussion with the interviewer can be.

Most of the time, such candidates cannot form another hypothesis, the case stops, and the candidate feels miserable.

This leads us to the right hypothesis-driven approach for case interviews.

The right hypothesis-driven approach in case interviews

To make my point clear between the wrong and right approach, I’ll take a non-business example.

Let’s imagine you want to move from point A to point B.

And for that, you have the choice among a multitude of roads.

[Figure: a map of the many possible roads between point A and point B]

Using the answer-first approach presented in the last section, you’d know which road to take to move from A to B (for instance the red line in the drawing below).

[Figure: a single pre-chosen road (the red line) from A to B]

Again, this would not demonstrate your capacity to find the “best” road to go from A to B.

(regardless of what “best” means. It can be the fastest or the safest for instance.)

Now, a correct hypothesis-driven approach consists of drawing a map with all the potential routes between A and B, and explaining at each intersection why you want to turn left or right (“my hypothesis is that we should turn right”).

[Figure: a map of all potential routes, with a hypothesis stated at each intersection]

And in the context of case interviews?

In the above analogy:

  • A is the problem
  • B is the solution
  • All the potential routes are the issues in your issue tree

And the explanation of why you want to take a certain road instead of another would be your hypothesis.

Is the difference between the wrong and right hypothesis-driven approach clearer?

If not, don’t worry. You’ll find many more examples below in this article.

But, next, let’s address another important question.

Why you must (always) use a hypothesis in your case interviews

You must use a hypothesis in your case interviews for two reasons.

A hypothesis helps you focus on what’s important to solve the case

Using a hypothesis-driven approach is critical to solving a problem efficiently.

In other words:

A hypothesis will limit the number of analyses you need to perform to solve a problem.

Thus, this is a way to apply the 80/20 principle and prioritize the issues (from your MECE issue tree) you want to investigate.

And this is very important because your time with your interviewer is limited (as is your time with your client on a real project).

Let’s take a simple example of a hypothesis:

The profits of your client have dropped.

And your initial analysis shows increasing costs and stagnating revenues.

So your hypothesis can be:

“I think something happened in our cost structure, causing the profit drop. Next, I’d like to understand better the cost structure of our client and which cost items have changed recently.”

Here the candidate is rigorously “cutting” half of his/her issue tree (the revenue side) and will focus the case discussion on the cost side.

And this is a good example of a hypothesis in case interviews.


A hypothesis tells your interviewers why you want to do an analysis

There is a road that you NEVER want to take.

On this road, the purpose of the questions asked by a candidate is not clear.

Here are a few examples:

“What’s the market size? growth?”

“Who are the main competitors? what are their market shares?”

“Have customer preferences changed in this market?”

This list of questions might be relevant to solve the problem at stake.

But how these questions help solve the problem is not addressed.

Or, in other words, the logical connection between these questions and the problem is missing.

So, a better example would be:

“We discovered that our client’s sales have declined for the past three years. I would like to know if this is specific to our client or if the whole market has the same trend. Can you tell me how the market size has changed over the past three years?”

In the above question, the reason why the candidate wants to investigate the market is clear: to narrow down the analysis to an internal root cause or an external root cause.

Yet, I see only a few (great) candidates asking clear and purposeful questions.

You want to be one of these candidates.

How to use a hypothesis-driven approach in your case interviews?

At this stage, you understand the importance of a hypothesis-driven approach in case interviews:

You want to identify the most promising areas to analyze (remember that time is money ).

And there are two (and only two) ways to create a good hypothesis in your case interviews:

  • a quantitative way
  • a qualitative way

Let’s start with the quantitative way to develop a good hypothesis in your case interviews.

The quantitative approach: use the available data

Let’s use an example to understand this data-driven approach:

Interviewer: your client is manufacturing computers. They have been experiencing increasing costs and want to know how to address this issue.

Candidate: to begin with, I want to know the breakdown of their cost structure. Do you have information about the % breakdown of their costs?

Interviewer: their materials costs account for 30% of total costs and their manufacturing costs for 60%. The remaining 10% are SG&A costs.

Candidate: Given the importance of manufacturing costs, I'd like to analyze this part first. Do we know if manufacturing costs have gone up?

Interviewer: yes, manufacturing costs have increased by 20% over the past 2 years.

Candidate: Interesting. Next, I'd like to understand why such an increase happened.

You can notice in this example how the candidate uses data to drive the case discussion and prioritize which analysis to perform.

The candidate made a (correct) hypothesis that the increasing costs were driven by the manufacturing costs (the biggest chunk of the cost structure).

Even if the hypothesis were incorrect, the candidate would have moved closer to the solution by eliminating an issue (manufacturing costs are not causing the overall cost increase).
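To make the prioritization logic concrete, here is a minimal sketch in Python (not part of any interview method; the percentages are simply the ones quoted by the interviewer above): rank the cost buckets by their share of total costs and form the first hypothesis on the largest one.

```python
# Rank cost buckets by their share of total costs and hypothesize on the largest.
cost_breakdown = {"materials": 0.30, "manufacturing": 0.60, "SG&A": 0.10}

ranked = sorted(cost_breakdown.items(), key=lambda item: item[1], reverse=True)
top_bucket, top_share = ranked[0]

print(f"Hypothesis: the cost increase is driven by {top_bucket} "
      f"({top_share:.0%} of the cost structure), so analyze it first.")
```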

That said, there is another way to develop a good hypothesis in your case interviews.

The qualitative approach: use your business acumen

Sometimes you don’t have data (yet) to make a good hypothesis.

Thus, you must use your business judgment and develop a hypothesis.

Again, let’s take an example to illustrate this approach.

Interviewer: your client manufactures computers and has been losing market share to their direct competitors. They hired us to find the root cause of this problem.

Candidate: I can think of several reasons explaining the drop in market share. First, our client may be manufacturing and selling uncompetitive products. Second, we might be pricing our products too high. Third, we might not be using the right distribution channels; for instance, we might sell in brick-and-mortar stores while consumers buy their computers in e-stores like Amazon. Finally, I think of our marketing expenses: they may be too low or not used strategically.

Candidate: I see these products as commodities where consumers use price as the main buying decision criterion. That's why I'd like to explore how our client prices their products. Do you have information about how our prices compare to competitors'?

Interviewer: this is a valid point. Here is the data you want to analyze.

Note how this candidate explains what she/he wants to analyze first (prices) and why (computers are commodities).

In this case interview, the hypothesis-driven approach looks like this:

This is a commodity industry → consumers' buying behavior is driven by pricing → our client's prices are too high.

Again, note how the candidate first listed the potential root causes for this situation and did not use an answer-first approach.

Want to learn more?

In this free training , I explain in detail how to use data or your business acumen to prioritize the issues to analyze and “crack the case.”

Also, you’ll learn what to do if you don’t have data or can’t use your business acumen.

Sign up now for free .

Form a hypothesis in these two critical moments of your case interviews

After you’ve presented your initial structure.

The first moment to form a hypothesis in your case interview?

In the beginning, after you’ve presented your structure.

When you’ve presented your issue tree, mention which issue you want to analyze first.

Also, explain why you want to investigate this first issue.

Make clear how the outcome of the analysis of this issue will help you solve the problem.

After an analysis

The second moment to form a hypothesis in your case interview?

After you’ve derived an insight from data analysis.

This insight has proved (or disproved) your hypothesis.

Either way, after you have developed an insight, you must form a new hypothesis.

This can be the issue you want to analyze next.

Or what you believe the solution to the problem is.

Hypothesis-driven approach in case interviews: a conclusion

Having spent about 10 years coaching candidates through the consulting recruitment process , one commonality of successful candidates is that they truly understand how to be hypothesis-driven and demonstrate efficient problem-solving.

Plus, in my experience coaching candidates, not being able to use a hypothesis is the second most common cause of rejection in case interviews (the first being the lack of MECEness).

This means you can’t afford NOT to master this concept in a case study.

So, sign up now for this free course to learn how to use a hypothesis-driven approach in your case interviews and land your dream consulting job.

More than 7,000 people have already signed up.

Don’t waste one more minute!

See you there.



“A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty until found effective.”

– Edward Teller, Nuclear Physicist

During my first brainstorming meeting on my first project at McKinsey, this very serious partner, who had a PhD in Physics, looked at me and said, “So, Joe, what are your main hypotheses?” I looked back at him, perplexed, and said, “Ummm, my what?” I was used to people simply asking, “What are your best ideas, opinions, thoughts, etc.” Over time, I began to understand the importance of hypotheses and how they play an important role in McKinsey’s problem solving by separating ideas and opinions from facts.

What is a Hypothesis?

“Hypothesis” is probably one of the top 5 words used by McKinsey consultants. And, being hypothesis-driven was required to have any success at McKinsey. A hypothesis is an idea or theory, often based on limited data, which is typically the beginning of a thread of further investigation to prove, disprove or improve the hypothesis through facts and empirical data.

The first step in being hypothesis-driven is to focus on the highest potential ideas and theories of how to solve a problem or realize an opportunity.

Let’s go over an example of being hypothesis-driven.

Let’s say you own a website, and you brainstorm ten ideas to improve web traffic, but you don’t have the budget to execute all ten ideas. The first step in being hypothesis-driven is to prioritize the ten ideas based on how much impact you hypothesize they will create.


The second step in being hypothesis-driven is to apply the scientific method to your hypotheses by creating the fact base to prove or disprove your hypothesis, which then allows you to turn your hypothesis into fact and knowledge. Running with our example, you could prove or disprove your hypothesis on the ideas you think will drive the most impact by executing:

1. An analysis of previous research and the performance of the different ideas
2. A survey where customers rank order the ideas
3. An actual test of the ten ideas to create a fact base on click-through rates and cost

While there are many other ways to validate your prioritization hypothesis, I find most people do not take this critical validation step. Instead, they apply bad logic to many important decisions: an idea pops into their head, and then somehow it just becomes a fact.
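As a small illustration of option 3 above, here is a hedged sketch with entirely invented idea names and click data: the hypothesized ranking is only promoted to a fact if the measured click-through rates confirm it.

```python
# Hypothetical test data: clicks and impressions observed for each traffic idea.
test_results = {
    "seo_refresh":   {"clicks": 420, "impressions": 12000},
    "referral_prog": {"clicks": 310, "impressions":  9500},
    "paid_social":   {"clicks": 150, "impressions": 11000},
}

# Hypothesized ranking before running the test (best first) -- an assumption.
hypothesized_ranking = ["paid_social", "seo_refresh", "referral_prog"]

# Measured ranking by click-through rate.
ctr = {idea: r["clicks"] / r["impressions"] for idea, r in test_results.items()}
measured_ranking = sorted(ctr, key=ctr.get, reverse=True)

print("Measured CTRs:", {k: round(v, 3) for k, v in ctr.items()})
print("Hypothesis confirmed" if measured_ranking == hypothesized_ranking
      else f"Hypothesis disproved: measured order is {measured_ranking}")
```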

One of my favorite lousy logic moments was a CEO who stated,

“I’ve never heard our customers talk about price, so the price doesn’t matter with our products , and I’ve decided we’re going to raise prices.”

Luckily, his management team was able to do a survey to dig deeper into the hypothesis that customers weren’t price-sensitive. Well, of course, they were, and through the survey they built a fantastic fact base that proved and disproved many other important hypotheses.


Why is being hypothesis-driven so important?

Imagine if medicine never actually used the scientific method. We would probably still be living in a world of lobotomies and bleeding people. Many organizations are still stuck in the dark ages, having built a house of cards on opinions disguised as facts, because they don’t prove or disprove their hypotheses. Decisions made on top of decisions, made on top of opinions, steer organizations clear of reality and the facts necessary to objectively evolve their strategic understanding and knowledge. I’ve seen too many leadership teams led solely by gut and opinion. The problem with intuition and gut is if you don’t ever prove or disprove if your gut is right or wrong, you’re never going to improve your intuition. There is a reason why being hypothesis-driven is the cornerstone of problem solving at McKinsey and every other top strategy consulting firm.

How do you become hypothesis-driven?

Most people are idea-driven, and constantly have hypotheses on how the world works and what they or their organization should do to improve. There is often a fatal flaw, though: many people turn their hypotheses into false facts without actually finding or creating the facts to prove or disprove them. These people aren’t hypothesis-driven; they are gut-driven.

The conversation typically goes something like “doing this discount promotion will increase our profits” or “our customers need to have this feature” or “morale is in the toilet because we don’t pay well, so we need to increase pay.” These should all be hypotheses that need the appropriate fact base, but instead, they become false facts, often leading to unintended results and consequences. In each of these cases, to become hypothesis-driven necessitates a different framing.

• Instead of “doing this discount promotion will increase our profits,” a hypothesis-driven approach is to ask “what are the best marketing ideas to increase our profits?” and then conduct a marketing experiment to see which ideas increase profits the most.

• Instead of “our customers need to have this feature,” ask the question, “what features would our customers value most?” And, then conduct a simple survey having customers rank order the features based on value to them.

• Instead of “morale is in the toilet because we don’t pay well, so we need to increase pay,” conduct a survey asking, “What is the level of morale?”, “What are potential issues affecting morale?” and “What are the best ideas to improve morale?”

Beyond watching out for just following your gut, here are some other best practices for being hypothesis-driven:

Listen to Your Intuition

Your mind has taken the collision of your experiences and everything you’ve learned over the years to create your intuition: those ideas that pop into your head and those hunches that come from your gut. Your intuition is your wellspring of hypotheses. So listen to your intuition, build hypotheses from it, and then prove or disprove those hypotheses, which will, in turn, improve your intuition. Intuition without feedback will typically evolve over time into poor intuition, which leads to poor judgment, thinking, and decisions.

Constantly Be Curious

I’m always curious about cause and effect. At Sports Authority, I had a hypothesis that customers who received service and assistance as they shopped were worth more than customers who didn’t receive assistance from an associate. We figured out how to prove or disprove this hypothesis by tying surveys to customers’ transactional data, and we found the hypothesis was true, which led us to a broad initiative around improving service. The key is that you always have to be curious about what you think does or will drive value, create hypotheses, and then prove or disprove those hypotheses.

Validate Hypotheses

You need to validate and prove or disprove hypotheses. Don’t just chalk up an idea as fact. In most cases, you’re going to have to create a fact base utilizing logic, observation, testing (see the section on Experimentation ), surveys, and analysis.

Be a Learning Organization

The foundation of learning organizations is the testing of and learning from hypotheses. I remember my first strategy internship at Mercer Management Consulting when I spent a good part of the summer combing through the results, findings, and insights of thousands of experiments that a banking client had conducted. It was fascinating to see the vastness and depth of their collective knowledge base. And, in today’s world of knowledge portals, it is so easy to disseminate, learn from, and build upon the knowledge created by companies.


J Oral Maxillofac Pathol, 23(2), May-Aug 2019

Hypothesis-driven Research

Umadevi Krishnamohan Rao

Department of Oral and Maxillofacial Pathology, Ragas Dental College and Hospital, Chennai, Tamil Nadu, India


As oral pathologists, we have the responsibility to upgrade our quality of service with an open-minded attitude and gratitude for the contributions made by our professional colleagues. Teaching students is the priority of the faculty, and with equal priority, oral pathologists have the responsibility to contribute to the literature as researchers.

Research is a scientific method of answering a question. This is achieved when the work is done in a representative sample of the population, so that the outcome can be applied to the rest of the population from which the sample is drawn. In this context, the most frequently conducted research is hypothesis-driven research, which is based on scientific theories. Specific aims are listed in this type of research, and the objectives are stated. A well-designed methodology in hypothesis-driven research equips the researcher to state the outcome of the study with confidence.

A provisional statement describing the relationship between two variables is known as a hypothesis. It is very specific and offers the freedom of evaluating a prediction between the stated variables. It allows the researcher to envision and gauge what changes can occur in the listed outcome (dependent) variables when changes are made in a specific predictor (independent) variable. Thus, any given hypothesis should include both these variables, and the primary aim of the study should be focused on demonstrating the association between the variables while maintaining the highest ethical standards.

The other requisites for a hypothesis-based study are that we should state the level of statistical significance and specify the power, which is defined as the probability that a statistical test will indicate a significant difference when it truly exists.[ 1 ] In hypothesis-driven research, a well-specified methodology helps grant reviewers differentiate good science from bad science, and thus hypothesis-driven research is the most funded research.[ 2 ]
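As an illustration of stating significance and power up front, here is a minimal sketch using Python's statsmodels library; the 0.5 effect size is an assumed, purely illustrative value.

```python
# Required sample size per group for a two-sample t-test,
# given an assumed effect size, significance level and power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # assumed standardized difference between groups (Cohen's d)
    alpha=0.05,        # stated level of statistical significance
    power=0.80,        # probability of detecting the difference if it truly exists
)
print(f"Approximately {n_per_group:.0f} subjects are needed per group.")
```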

“Hypotheses aren’t simply useful tools in some potentially outmoded vision of science; they are the whole point.” This was stated by Sean Carroll, from the California Institute of Technology, in response to Editor-In-Chief of “ Wired ” Chris Anderson, who argued that “biology is too complex for hypotheses and models, and he favored working on enormous data by correlative analysis.”[ 3 ]

Research does not stop at stating the hypothesis: we must ensure that it is clear, testable and falsifiable, and it should serve as the fundamental basis for constructing a methodology that will allow either its acceptance (a study favoring the null hypothesis) or its rejection (a study rejecting the null hypothesis in favor of the alternative hypothesis).
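And as a sketch of that accept-or-reject step, a two-sample t-test on simulated measurements (all numbers invented) rejects the null hypothesis of equal means only if the p-value falls below the pre-stated significance level.

```python
# Two-sample t-test on simulated data: reject or retain the null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=5.0, scale=1.0, size=40)    # simulated control group
treated = rng.normal(loc=5.6, scale=1.0, size=40)    # simulated test group

alpha = 0.05                                          # pre-stated significance level
t_stat, p_value = stats.ttest_ind(control, treated)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis of equal means.")
else:
    print(f"p = {p_value:.4f} >= {alpha}: the data do not allow rejecting the null hypothesis.")
```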

It is very worrying to observe that many research projects which require a hypothesis are being done without stating one. The hypothesis is the fundamental backbone of the question to be asked and tested; later, the findings need to be extrapolated in an analytical study addressing the research question.

A good dissertation or thesis submitted in fulfillment of a curriculum, or a submitted manuscript, comprises a thoughtful study that addresses an interesting concept and has to be scientifically designed. Nowadays, evolving academicians are in a competition to prove their point and be academically visible, which is very important for their career graph. In any circumstance, unscientific research or short-cut methodology should never be conducted or encouraged to produce a research finding or to publish the same as a manuscript.

The other type of research is exploratory research, which is a journey of discovery that is not backed by previously established theories and is driven by hope and the chance of a breakthrough. The advantage of using these data is that statistics can be applied to establish predictions without considering the principles of study design, which are the fundamental requirement of a conventional hypothesis. There is a need to set standards of statistical evidence with a much higher cutoff value for acceptance when we consider doing a study without a hypothesis.

In the past few years, there has been an emergence of nonhypothesis-driven research, which does receive encouragement from funding agencies (for example, through initiatives such as innovative molecular analysis technologies). The point to be taken here is that funding of nonhypothesis-driven research does not imply a decrease in support for hypothesis-driven research; rather, the objective is to encourage multidisciplinary research, which depends on the coordinated and cooperative execution of many branches of science and institutions. Thus, translational research is challenging and does carry a risk associated with the lack of preliminary data to establish a hypothesis.[ 4 ]

The merit of hypothesis testing is that it takes the next stride in scientific theory, having already stood the rigors of examination. Hypothesis testing has been in practice for more than five decades and is considered a standard requirement when proposals are submitted for evaluation. Stating a hypothesis is mandatory when we intend to make the study results applicable. Young professionals must be apprised of the merits of hypothesis-based research and must be trained to understand the scope of exploratory research.

Open access | Published: 16 November 2020

Data driven theory for knowledge discovery in the exact sciences with applications to thermonuclear fusion

A. Murari, E. Peluso, M. Lungaroni, P. Gaudio, J. Vega & M. Gelfusa
Scientific Reports, volume 10, Article number: 19858 (2020)


  • Characterization and analytical techniques
  • Experimental nuclear physics
  • Information theory and computation
  • Magnetically confined plasmas

In recent years, the techniques of the exact sciences have been applied to the analysis of increasingly complex and non-linear systems. The related uncertainties and the large amounts of data available have progressively shown the limits of the traditional hypothesis driven methods, based on first principle theories. Therefore, a new approach of data driven theory formulation has been developed. It is based on the manipulation of symbols with genetic computing and it is meant to complement traditional procedures, by exploring large datasets to find the most suitable mathematical models to interpret them. The paper reports on the vast amounts of numerical tests that have shown the potential of the new techniques to provide very useful insights in various studies, ranging from the formulation of scaling laws to the original identification of the most appropriate dimensionless variables to investigate a given system. The application to some of the most complex experiments in physics, in particular thermonuclear plasmas, has proved the capability of the methodology to address real problems, even highly nonlinear and practically important ones such as catastrophic instabilities. The proposed tools are therefore being increasingly used in various fields of science and they constitute a very good set of techniques to bridge the gap between experiments, traditional data analysis and theory formulation.


Introduction

After a brief period, mainly dominated by observational methods, the scientific enterprise has progressed in a hypothesis driven way, particularly in fields such as physics and chemistry. Typically, on the basis of already established theories, new models have been developed mathematically and their predictions have been falsified with specifically designed experiments 1 , 2 . This scientific methodology has been very successful and has produced great results, particularly in the investigation of mostly deterministic linear systems, composed of weakly coupled parts. On the other hand, in the last decades, various technological and cultural changes have contributed to expose the limitations of such an approach to knowledge discovery. First, all scientific disciplines are nowadays required to tackle increasingly challenging, nonlinear problems and systems, some of which, like thermonuclear plasmas, are very difficult, if not impossible, to model with theories based on first principles. Moreover, many newly interesting phenomena, for various reasons ranging from intrinsic randomness to inaccessibility for measurement, are characterised by a high level of uncertainties, limiting the effectiveness of purely deterministic approaches. All this in the context of a data deluge, affecting not only society in general but also the scientific community 3 , 4 . Increasingly many experiments, particularly in Big Physics, tend to produce enormous amounts of data, impossible to properly analyse by hand or with traditional statistical tools, conceived in an era of paucity of information. At CERN, the main detector ATLAS has shown the capability of producing Petabytes of data per year. In its prime, the Hubble space telescope sent to earth Gigabytes of data per day and the data warehouse of the Joint European Torus (JET), the largest operational Tokamak in the world, is approaching 0.5 Petabytes.

The limitations of the hypothesis driven approach to investigating complex physical objects, affected by great uncertainties, have become particularly evident in the case of open systems such as high temperature plasmas 5,6. Thermonuclear plasmas are open, nonlinear, out of equilibrium systems characterised by a very high level of complexity and poor accessibility for measurement in a hostile environment. The consequences are typically a limited experimental characterization of many phenomena and the presence of significant noise in the data. Such complexity and high levels of uncertainty in practice reduce dramatically the effectiveness of formulating theories from first principles. Traditional analysis techniques, such as simple fitting routines or log regression, are also of limited help, due to their rigidity and poor exploratory capability. They indeed assume that a solution of the problem is already known and that only the parameters of the models have to be adjusted. As a consequence, a lot of untapped knowledge remains buried in the large collected databases, with the scientists unable to fully exploit them due to the lack of adequate mathematical tools for data mining. Historically, these difficulties have led to a hierarchy of descriptions (particle, kinetic, fluid) and to a plethora of ad hoc models, aimed at interpreting specific phenomena with poor generalization power and limited applicability 6. On the other hand, modern machine learning techniques, as deployed in commercial applications, are not completely adequate to address scientific questions. This quite unsatisfactory situation motivates the quest for more sophisticated analysis methodologies to draw reliable and sound conclusions.

With regard to the rest of the paper, in the next section the present limitations of traditional machine learning (ML) methods are discussed. In “ Data driven theory with symbolic regression via genetic programming ” section, the main tools developed in the last decade, to overcome the drawbacks of commercial data mining techniques, are described in some detail. Numerical examples, to show the potential of the new approaches for scientific investigations, are the subject of “ Numerical examples ” section. In “ Application to scaling laws: the energy confinement time in Tokamaks ” and “ Application to the extraction of boundary equations: disruptions ” sections, two fundamental applications to Magnetic Confinement Nuclear Fusion are presented: the scaling laws for the energy confinement time and an original derivation of the boundary equation between safe and disruptive regions of the operational space. Conclusions and lines of future investigations are the subject of the last section of the paper.

The limits of traditional statistical and machine learning tools

In the last years, the limitations of the hypothesis driven approach to scientific investigation have motivated the adoption of more data driven techniques. Typically referred to with the collective name of “machine learning”, these methods are explicitly conceived to derive models directly from the data 7,8. The developed tools make a different use of the available computational power. Whereas traditional algorithms implement complete solutions to the problem at hand and mainly help only in carrying out massive calculations, machine learning tools are exploratory in nature, in the sense that they analyse the data to find possible, not already known solutions and models. In the last years, their successes have been astonishing and they are now widely deployed in a great variety of domains, ranging from automatic translators to image and voice recognition, to diagnostic support. On the other hand, even if machine learning tools have found many very useful applications, they are popular in the private sector but their penetration in the sciences has been quite poor and typically confined to “theory-less” applications, where it is not required to devise interpretable mathematical models. This relatively minor acceptance is the consequence of the many limitations of traditional machine learning techniques. Indeed, to be really useful, the knowledge discovery process in the natural sciences has to satisfy specific criteria and requirements, which are not necessarily crucial in other fields. In particular, in addition to the accuracy of prediction, the derived equations must reflect the “physics” reality of the phenomena under investigation. The derived models should also be easily interpretable and guarantee a proper treatment of the uncertainties and a solid quantification of the confidence intervals. Even if the traditional data driven tools provide quite impressive performance in terms of prediction accuracy, they have been found lacking in practically all the other respects. The main problem relates to the structure of their mathematical models, which can be completely unrelated to the physics reality of the investigated phenomena. This poor “physics fidelity” is a major concern, which has badly affected the adoption of many machine-learning tools in various scientific disciplines, particularly in physics and chemistry. A significant dissonance between the mathematical form of the models and the actual physics of the phenomena under study can compromise some of the most important objectives of scientific investigations: it can jeopardize the interpretation of the results in the light of existing mathematical theories, with a consequent reduced contribution to general understanding and limited confidence in the extrapolability of the results. This aspect is particularly problematic for the design of new experiments, which are typically required to investigate the phenomena in previously unexplored regions of the operational space. Purely statistical models, without any relation with the actual dynamics of the systems under study, can be delicate to use in this perspective and can provide misleading indications.

Data driven theory with symbolic regression via genetic programming

To overcome, or at least to alleviate, all the previously mentioned limitations of machine learning for science, a new methodology has been developed in the last decade. It is called Symbolic Regression (SR) via Genetic Programming (GP) 9. This technique consists of a series of tools, which allow a new approach to theory formulation. The mathematical models, describing the phenomena under investigation, are derived directly from the data. The tools can be used either in an exploratory way, by reducing to a minimum the “a priori” assumptions, or steered toward certain classes of solutions, for example for comparison with existing theories. They implement genetic programming but apply it to the symbolic manipulation of mathematical formulas. Genetic Programs (GPs) are designed to emulate the behaviour of living organisms by working with a population of individuals, i.e. mathematical expressions in our case 10,11. The individuals are candidate models of the system to be investigated. These models are represented as trees, which makes the implementation of Genetic Programming, in particular of the three main genetic operators (copy, cross-over and mutation), straightforward. The main mechanism of the methodology consists of traversing the database and checking the behaviour of each candidate model, derived from the initial families of functions selected by the scientists. These initial families, i.e. basic units, are usually arithmetic operations, functions, possibly including saturation terms, and ad hoc operators introduced by the user. The basis functions have to be selected carefully, taking into account the nature of the problem at hand. For example, their combination must include a realistic and physically meaningful model of the phenomena to be analysed. For the sake of completeness, the basis functions implemented to obtain the results presented in the rest of the paper are summarized in Table 1.

A specific metric must be used to evaluate the performance of the candidate equations. This performance indicator, designed to find the appropriate trade-off between complexity and goodness of fit, is usually called the fitness function (FF) and allows selecting the best candidates of each generation. The better the FF of certain individuals, the higher the probability that those individuals are chosen to produce descendants. The genetic operators are applied to these best performing individuals to obtain the following population. The process is iterated for a high number of generations, until convergence on a satisfactory solution. Figure 1 summarizes graphically what has been described in the above paragraphs.

Figure 1. Overview of the proposed methodology to identify the best models directly from the data.

In the end, the proposed technique provides a series of data driven models, whose mathematical structure is the most appropriate to fit the available databases. On the other hand, “a priori” knowledge can be brought to bear on the formulation of the final solutions. Indeed, the extraction of the models can be influenced at least at three different levels: (a) the selection of the basis functions, (b) the structure of the trees and (c) the mathematical form of the fitness function.

Of course, the heart of the method is the FF, the indicator chosen to assess the quality of the solutions. To this end, various model selection criteria have been implemented: the Akaike Information Criterion (AIC), the Takeuchi Information Criterion (TIC) and the Bayesian Information Criterion (BIC) 12. Since all these metrics are conceptually similar, only the AIC is discussed in the following. The aim of the AIC estimator is to minimize the generalisation error by finding the best trade-off between complexity and goodness of fit. The most widely used form is:

\( AIC = n\,\ln(MSE) + 2k \)    (1)

where the Mean Square Error (MSE) is evaluated between the predictions obtained using each model and the target data, while k represents the complexity of the model itself, in terms of the number of nodes of the tree, and n stands for the number of entries (rows) in the database to be analysed. Considering the parameterization of Eq. (1), it can be easily understood why this indicator has to be minimized. The closer the models are to the data (first addend) and the lower the number of nodes required to express them in tree form (second addend), the lower the estimator. Therefore, parsimony is built into the metric to avoid overfitting, contrary to the vast majority of the alternatives (for example the maximum likelihood), which do not consider this issue and therefore do not include terms penalising excessive complexity of the models.
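As a rough illustration of the loop described above, the following sketch uses the DEAP genetic programming library in Python with an AIC-style fitness; DEAP, the toy target function and all parameter values are assumptions made for this example, not the authors' actual implementation.

```python
# Minimal symbolic-regression-via-GP sketch: expression trees evolved against an
# AIC-style fitness (fit term + complexity penalty). Illustrative only.
import operator, random
import numpy as np
from deap import base, creator, gp, tools, algorithms

# Toy dataset: y = 2*x0 + x1**2 (an invented target, standing in for real data).
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 2))
y = 2 * X[:, 0] + X[:, 1] ** 2

# Basis functions ("initial families") available to the trees.
pset = gp.PrimitiveSet("MAIN", 2)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addEphemeralConstant("c", lambda: random.uniform(-1, 1))

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def aic_fitness(individual):
    """AIC-like score: n*ln(MSE) rewards fit, 2k penalises tree complexity."""
    func = toolbox.compile(expr=individual)
    pred = np.array([func(*row) for row in X])
    mse = np.mean((pred - y) ** 2)
    n, k = len(y), len(individual)          # k = number of nodes in the tree
    return (n * np.log(mse + 1e-12) + 2 * k,)

toolbox.register("evaluate", aic_fitness)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=300)
hof = tools.HallOfFame(1)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=30,
                    halloffame=hof, verbose=False)
print(hof[0])  # best symbolic expression found
```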

In terms of computational complexity, it should be considered that the proposed genetic programming approach can be parallelized relatively easily. Therefore, the main difficulties in the deployment of symbolic regression typically reside in the quality and quantity of the data. The databases available are often not sufficiently selective to identify a unique model, clearly superior to all the others in all respects. The most common situation is convergence on a small set of reasonable candidates. The typical approach to select the final model is based on the so-called Pareto Frontier (PF) 13, which consists of the set of best models, each one for a specific value of complexity. When the FF of the models in the PF is plotted versus complexity, the resulting curve typically allows identifying a region with the best trade-off between goodness of fit and complexity; the models in this region are the ones to which a final nonlinear fitting 14 is applied to determine the most appropriate solution. This last stage also allows quantifying the confidence intervals to be associated with the models. More details on the entire procedure can be found in the literature 15,16.
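A minimal sketch of that Pareto-based selection, with invented candidate models: a model is kept only if no other candidate is both simpler and at least as accurate.

```python
# Extract the Pareto frontier from (complexity, error) pairs of candidate models.
candidates = [
    ("model_a", 3, 0.90),   # (name, number of nodes, mean square error) -- invented values
    ("model_b", 5, 0.40),
    ("model_c", 7, 0.45),
    ("model_d", 9, 0.15),
    ("model_e", 15, 0.14),
]

def pareto_frontier(models):
    """Keep a model only if no other model is both simpler and at least as accurate."""
    frontier = []
    for name, k, err in models:
        dominated = any(k2 <= k and e2 <= err and (k2, e2) != (k, err)
                        for _, k2, e2 in models)
        if not dominated:
            frontier.append((name, k, err))
    return frontier

print(pareto_frontier(candidates))  # model_c is dominated by model_b and dropped
```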

Numerical examples

To illustrate the potential of the proposed techniques, in this section some numerical examples are analysed, using synthetic data. It is worth mentioning that the numerical tests reported have been performed in conditions very similar to those of the real-life cases presented in the rest of the paper. The number of generated examples is of the order of a couple of thousand. Zero-mean Gaussian noise with a standard deviation equal to about 10% of the data mean value has been added, again to simulate realistic experimental situations.

The first case involves the identification of scaling laws, mathematical expressions meant to quantify how the properties of a system change with size. Their importance is at least twofold. On the one hand, scaling laws reveal important aspects about the behaviour and dynamics of the systems under investigation. From a more practical perspective, scaling considerations play a fundamental role in the engineering of artefacts and machines. Extracting robust scaling laws directly from available data is essential in the case of the design of new experiments, which cannot be easily modelled theoretically, such as Tokamak devices. Unfortunately, the mathematical methods, available until the development of SR via GP, were basically fitting algorithms, which required the scientist to make strict assumptions about the mathematical form of the scalings. Consequently, for decades the most popular forms of the scaling laws have been power laws in the regressors, mainly because a popular numerical tool, log regression, was available to derive them 17 . Until recently, indeed, all scaling laws in Tokamak physics were basically power laws 18 . Even if very popular, power laws present various drawbacks of high significance for scientific applications. First of all, tens of different physical mechanisms end up providing scalings in power law form. Therefore, power laws are poorly informative about the actual dynamics behind the scalings 19 . Moreover, power laws do not present any saturation mechanism, are monotonic in the regressors (cannot model minima or maxima) and force the interaction between the independent variables to be multiplicative. This application of SR via GP is therefore a paradigmatic example of the much higher flexibility provided by the new developed solutions, compared to previous techniques. To show the potential of the developed tools, a series of systematic tests has been performed, proving the capability of the methodology to derive scaling laws of any mathematical form provided enough data is available. Various functional relations have been implemented for generating synthetic data, to assess how the proposed technique manages to identify the right equation. As a representative example, the results obtained for a model consisting of three parts is discussed in the following. Equation ( 2 ) contains a power law term, a linear term and a squashing factor, covering some of the most important functional dependencies of practical relevance in the sciences. The mathematical expression of this hypothetical scaling is:

It should be mentioned that Eq. ( 2 ) is at least of the same level of complexity as the actual experimental cases analysed in the rest of the paper. The method has been able to easily recover the right expression of the scaling in very reasonable time, whereas log regression is obviously at a loss to identify this type of function.
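Since Eq. (2) itself is not reproduced in this excerpt, the following sketch only illustrates the kind of synthetic test described above: a hypothetical scaling with a power-law term, a linear term and a squashing factor, with zero-mean Gaussian noise whose standard deviation is about 10% of the data mean (all coefficients invented).

```python
# Generate synthetic data for a hypothetical scaling law with a power-law term,
# a linear term and a squashing factor, plus ~10% Gaussian noise (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 2000                                            # "a couple of thousand" examples
x1, x2, x3 = (rng.uniform(0.5, 2.0, n) for _ in range(3))

y_clean = 1.5 * x1**0.8 + 0.7 * x2 + 1.0 / (1.0 + np.exp(-3.0 * (x3 - 1.0)))
noise = rng.normal(0.0, 0.1 * y_clean.mean(), n)    # std = 10% of the data mean value
y = y_clean + noise

# X and y would then be handed to the symbolic regression tool.
X = np.column_stack([x1, x2, x3])
```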

To complement the previous example, a series of additional tests has been performed to verify that the developed tools can identify the most appropriate dimensionless quantities to describe a database. Dimensionless quantities are quantities of dimension 1, which are often utilized to simplify the description of complex systems characterized by multiple interacting phenomena. Their importance is emphasized by the Buckingham π theorem, which states that the validity of the laws of physics does not depend on a specific unit system. Therefore, any physically meaningful equation can always be expressed as an identity involving only dimensionless combinations, obtained by extracting ratios and/or products of the variables linked by the law. Dimensionless quantities have proved to be particularly useful in fluid dynamics. A well-known law connecting dimensionless quantities is the relation between the Péclet, Reynolds and Prandtl numbers:

\( Pe = Re \cdot Pr \)    (3)

The Péclet number Pe is used to evaluate the ratio between heat transferred by advection and heat transferred by diffusion in a fluid. The Prandtl number Pr is defined as the ratio of kinematic and thermal diffusivity; the Reynolds number Re takes into account the relative importance of viscosity for the internal layers of a fluid. The above quantities can be written as:

\( Pr = \frac{c_{p}\,\mu}{k} \)    (4)

\( Re = \frac{\rho\, u\, d}{\mu} \)    (5)

where:

μ is the dynamic viscosity

k is the thermal conductivity

\(c_{p}\) is the specific heat

ρ is the density

u is the velocity of the fluid

d is a characteristic linear dimension of the object in which the fluid moves

To prove the capability of the proposed methodology to identify dimensionless variables, a set of data has been generated using Eq. (3). On the other hand, SR via GP has been provided only with the dimensional quantities, among which, of course, are the ones used to build the database, i.e. the dimensional quantities appearing in Eqs. (4) and (5). The implemented algorithm has been able to derive the dimensionless relationship (3) by grouping the correct dimensional variables for various levels of normally distributed noise. As an example, the following scaling has been obtained with 30% of added noise:

The expression obtained, once rounded out, provides the exact answer, Eq. (3).
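To make the setup concrete, here is an illustrative sketch (with invented fluid parameter ranges) of how such a synthetic dimensionless test can be built: the target obeys Eq. (3), while only the dimensional quantities of Eqs. (4) and (5) are handed to the regression.

```python
# Build a synthetic dataset where the target obeys Pe = Re * Pr, but only the
# dimensional quantities are exposed to the regression (illustrative values only).
import numpy as np

rng = np.random.default_rng(1)
n = 2000
mu  = rng.uniform(1e-3, 5e-3, n)    # dynamic viscosity
k   = rng.uniform(0.1, 0.7, n)      # thermal conductivity
cp  = rng.uniform(1e3, 5e3, n)      # specific heat
rho = rng.uniform(800, 1200, n)     # density
u   = rng.uniform(0.1, 2.0, n)      # fluid velocity
d   = rng.uniform(0.01, 0.1, n)     # characteristic linear dimension

Pr = cp * mu / k                    # Prandtl number, Eq. (4)
Re = rho * u * d / mu               # Reynolds number, Eq. (5)
Pe = Re * Pr                        # target: Peclet number, Eq. (3)

# The regressors handed to symbolic regression are only the dimensional inputs;
# recovering Pe = rho*u*d*cp/k means the tool has re-grouped them into Re and Pr.
X = np.column_stack([mu, k, cp, rho, u, d])
y = Pe
```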

Application to scaling laws: the energy confinement time in Tokamaks

Nuclear fusion, the process of building larger nuclei from the synthesis of smaller ones, is considered a potential solution to humanity's energy needs 20. The most promising technical alternative to bring the nuclei close enough to fuse is magnetic confinement. In this approach, plasmas are confined by magnetic fields in a vacuum chamber and heated to temperatures higher than in the core of the sun. The best performing configuration of the magnetic fields so far is the Tokamak 5. One of the most crucial quantities to assess the reactor relevance of a magnetic configuration is the so called energy confinement time τ E , which quantifies how fast the internal energy of the plasma is lost 21,22,23. Unfortunately, high temperature plasmas are too complex to be modelled, even numerically, to estimate τ E . The range of scales involved spans many orders of magnitude, ranging from microturbulence to macroscopic dimensions comparable to the size of the devices. For these reasons, since the beginning of the 1970s, scientists have spent considerable effort trying to derive the behaviour of τ E from experimental data using robust scaling laws, since τ E is calculated routinely in practically all Tokamaks. Therefore, multi-machine databases, with reliable estimates of this quantity, are available. Moreover, to reduce the adverse effects of the uncertainties in the measurements, very significant efforts have been devoted to formulating the scaling laws of τ E in terms of dimensionless variables, which are believed: (a) to have a stronger physical base with respect to dimensional scalings, and (b) to be more robust, particularly for extrapolation. The most commonly used dimensionless quantities have been derived from the so called Vlasov equation. This model assumes that the plasma behaviour is governed by equations invariant under a certain class of transformations. Consequently, any scaling expressions should also present the same invariances. Assuming the validity of the assumptions previously cited, the choice of specific dimensionless variables is determined a priori from theoretical considerations and not derived from the data 24,25.

Recently, SR has been applied to the task of identifying potential scaling expressions of the confinement time by studying the ITPA DB3v13f 26 international database, which has been built with the purpose of supporting advanced studies and includes validated measurements from all of the most relevant reactors in the world. In line with what is stated above and with the literature 27, the following dimensionless quantities have been considered: \( \beta, \rho, \nu, \varepsilon, \kappa_{a}, M, q_{95} \). In the previous list, \(\beta\) is the normalized plasma pressure, \(\rho\) indicates the normalized ion Larmor radius, \(\nu\) the normalized collision frequency, \(\varepsilon\) the inverse aspect ratio, \(\kappa_{a}\) the volume elongation, M the effective atomic mass in a.m.u. and \(q_{95}\) the plasma safety factor evaluated at the flux surface enclosing 95% of the poloidal flux 25. Consistently, the dimensionless product of the ion cyclotron frequency and the confinement time (τ E ω ci ) has been selected as the dependent quantity to be analysed. Using the same selection criteria as in 27, from which the reference scalings in power law form were derived, the final database is made of 2806 entries.

Table 2 reports the classical scaling expressions 27 and the one obtained with the aforementioned approach, using SR via GP. The best model, derived with symbolic regression, is not in power law monomial form. The non-power law scaling outperforms the other two according to all the aforementioned estimators (TIC, AIC, BIC, MSE and Kullback–Leibler divergence), normally used to assess the quality of models and fits. The selection process, which converges on the best unconstrained empirical model reported in Table 2 on the basis of the Pareto Frontier, should also guarantee that the risk of overfitting is negligible. Since, as mentioned, one of the main purposes of scaling laws consists of providing guidance to the design of new experiments, the predictions for ITER 28, the next generation international Tokamak at present being built in France, have been investigated in detail. Considering the values in Table 3 for the predicted confinement time on ITER, it emerges that, according to the AdNPL scaling, τ E would be about 20% lower than the expected values obtained using the most widely accepted traditional scaling, AdPL1 27. In terms of simple extrapolations, the model derived with SR via GP is much more in line with the AdPL2 scaling, calculated with the more realistic errors-in-variables technique 27. The AdNPL estimate of Table 2 is also confirmed by scaling laws in terms of dimensional quantities 18. With regard to interpretation, the non-power law scaling provides a better fit, because the smaller devices follow a more positive trend than the larger ones and the power laws are not flexible enough to accommodate this fact 18,25.

However, extrapolation is always a delicate matter. The present case is particularly challenging because the parameter space is 7-dimensional and the number of entries less than 3000. The available experimental points are very sparse and this is a well-known difficulty 29 . Therefore, a specific test has been performed. The methodology has been applied on a subset of data (train dataset), without JET’s entries, and then the obtained model has been tested on the previously excluded JET data (test dataset). The traditional scaling laws have been fitted directly to the training dataset and evaluated on the test dataset of JET. The logic behind such an approach resides in the fact that JET parameters cover a middle ground between the smaller devices and ITER. Figure  2 shows the quality of the proposed non power law scaling. The residuals of the AdNPL are better centred on zero and have smaller standard deviation. Extrapolating better, from the smaller devices to JET, gives confidence in the overall quality of the results 30 , 31 , 32 .

Figure 2. Distribution of residuals between the JET data and the model predictions: in green, the residuals for the non-power law model AdNPL; in blue and red, the residuals for the models AdPL1 and AdPL2 reported in the literature.

Application to the extraction of boundary equations: disruptions

Many natural and man-made systems can give the superficial impression of being very stable and resilient to perturbations but, in reality, might be prone to collapse. The consequent catastrophic events may be quite straightforward to interpret, and most do not require sophisticated investigations because, with the proper level of precautionary care, they are relatively easy to avoid. On the other hand, some, such as earthquakes and, in general, accidents due to atmospheric phenomena, can be very sudden and very difficult to forecast. In the last years, increasing attention has been devoted to devising mathematical tools more appropriate for investigating and predicting rare catastrophic events 33. Machine learning tools are new instruments in the arsenal of techniques which can be deployed by analysts. The methodology described in the following is proving to be very useful in the modelling and interpretation of sudden disruptive events.

In Magnetic Confinement Nuclear Fusion, the macroscopic instabilities called disruptions are the most striking example of catastrophic failures difficult to predict 5 . They occur when the plasma crosses one of the major stability limits and cause the sudden loss of confinement and the consequent abrupt extinction of the plasma current. They can have very serious consequences for the integrity of the experimental devices; moreover the electromagnetic forces and the thermal loads that they can generate become more severe the larger the machines. Therefore, disruptions are one of the most severe problems to be faced by the Tokamak magnetic configuration on the route to designing and operating commercial reactors. Unfortunately, physical models of disruptions, based on first principles, are practically unusable for prediction. Consequently, in the last decades, machine learning tools have been increasingly deployed to derive empirical models capable of predicting the approach of a disruption 34 , 35 , 36 , 37 , 38 , 39 , 40 . Unfortunately, these models present all the problems mentioned in the second section and are particularly lacking in physics fidelity. Indeed, the models of most ML tools have nothing to do with the actual dynamics behind disruptions, even if they can be very accurate in terms of predictions 41 , 42 , 43 , 44 . Consequently, the capability of the present models to extrapolate to larger devices is questionable. A method has therefore been developed to combine the predictive capability of the ML tools with the advantages of more physically meaningful equations. The main steps of such an approach are:

  • Training the machine learning tools for classification, i.e. to discriminate between disruptive and non-disruptive examples
  • Determining a sufficient number of points on the boundary between the safe and disruptive regions of the operational space provided by the machine learning tools
  • Deploying Symbolic Regression via Genetic Programming to express the equation of the boundary in a physically meaningful form, using the points identified in the previous step
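A minimal sketch of the first two steps, using scikit-learn's probabilistic SVM on invented stand-in data for the locked mode amplitude and the internal inductance; this is an illustration of the procedure, not the authors' implementation.

```python
# Steps 1-2: train a probabilistic SVM classifier and sample points on the
# boundary between "safe" and "disruptive" regions (synthetic data, for illustration).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n = 1000
locked_mode = rng.uniform(0.0, 10.0, n)       # stand-in for locked mode amplitude
inductance  = rng.uniform(0.7, 1.6, n)        # stand-in for internal inductance
X = np.column_stack([locked_mode, inductance])

# Invented ground truth: a shot is "disruptive" above a curved threshold.
y = (locked_mode > 2.0 + 4.0 * (inductance - 0.7) ** 2).astype(int)

clf = SVC(kernel="rbf", probability=True).fit(X, y)

# Scan a grid and keep the points where the posterior probability is close to 0.5.
lm_grid, li_grid = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0.7, 1.6, 200))
grid = np.column_stack([lm_grid.ravel(), li_grid.ravel()])
proba = clf.predict_proba(grid)[:, 1]
boundary_points = grid[np.abs(proba - 0.5) < 0.01]

# boundary_points would then be passed to symbolic regression (step 3) to obtain
# a compact, physically interpretable equation for the boundary.
print(boundary_points.shape)
```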

The procedure just described has been tested using a variety of ML tools, ranging from clustering to probabilistic classifiers. In detail, the machine learning technology used to obtain the results presented in this section is the probabilistic Support Vector Machine (SVM). The database investigated comprises 187 disruptions and 1200 safe shots, belonging to campaigns C29-C30 at the beginning of JET operation with the new ITER-Like Wall (ILW). The separation between the disruptive and safe regions of the operational space, in the plane of the locked mode amplitude and the internal inductance, is reported in Fig. 3. The following equation has been retained as a good compromise between complexity and accuracy:

where LM is the amplitude of the locked mode expressed in \(10^{-4}\) T, \(l_{i}\) the internal inductance, and the coefficients assume the values:

Figure 3. Safe and disruptive regions of the operational space in JET at the beginning of operation with the ILW. The colour code represents the posterior probability of the classifier; the light blue circles are the non-disruptive shots (10 random time slices for each shot), the blue squares are the disruptive shots at the time slice when the predictor triggers the alarm, the green crosses are the false alarms and the white line represents Eq. (7).

The performance of the previous equation closely reproduces that of the original SVM model, as can be appreciated from Table 4 , where the traditional indicators used to quantify the quality of predictors are reported. In terms of interpretability, Eq. ( 7 ) should be compared with an SVM model consisting of tens of Gaussian functions centred on the support vectors. In this application, therefore, the results are particularly positive, because the dramatic increase in interpretability does not imply a loss of accuracy. From the point of view of the physics, the relation between the locked-mode amplitude and the current profile, identified by SR via GP, is an important aspect that has to be included in any theoretical model of disruption physics 41 .
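As a reminder of what those indicators measure, the toy helper below computes them from raw alarm counts; the counts in the example call are hypothetical and are not the values of Table 4 (only the 187 disruptive and 1200 safe shots come from the text above).

    # Toy helper for the traditional disruption-prediction indicators; the counts
    # passed in the example call are hypothetical placeholders.
    def predictor_indicators(successful_alarms, missed_alarms, false_alarms,
                             n_disruptive, n_safe):
        """Return the standard quality indicators as percentages."""
        return {
            "success_rate": 100.0 * successful_alarms / n_disruptive,
            "missed_alarm_rate": 100.0 * missed_alarms / n_disruptive,
            "false_alarm_rate": 100.0 * false_alarms / n_safe,
        }

    # e.g. a predictor catching 180 of the 187 disruptions in the database,
    # with 25 false alarms over the 1200 safe shots
    print(predictor_indicators(180, 7, 25, n_disruptive=187, n_safe=1200))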

Conclusions

The proposed methodology is meant to support theory formulation starting from the data, when the complexity of the phenomena to be studied is so severe that it is difficult or impossible to devise models from first principles. It is to be considered a complement to traditional hypothesis-driven modelling. Indeed, the developed tools mathematize the derivation of theoretical models directly from the data, in analogy with the process of formulating new models from existing ones in the hypothesis-driven approach. SR via GP therefore at least alleviates a traditional weakness of present-day research, since so far deriving models from data has been more an art than a scientific procedure. The proposed techniques make systematic use of the most advanced machine learning tools, which are proving so successful in society at large. The application of Symbolic Regression via Genetic Programming has proved invaluable for obtaining results in a physically meaningful and interpretable form. More advanced versions of the tools, with a more sophisticated treatment of the errors in the measurements based on the Geodesic Distance, are also available 45 , 46 , 47 . The proposed approach has also found various applications in several branches of physics beyond thermonuclear plasmas, such as atmospheric physics and remote sensing 48 , 49 . In terms of future applications, it is planned to combine SR via GP with neural networks of complex topology, to profit from the great exploratory power and flexibility of deep learning 50 , 51 . Upgrades of the methodology to address time series with memory, recursive functions 52 , 53 , 54 , 55 and distributed quantities are also quite advanced.

References

Bailly, F. & Longo, G. Mathematics and the Natural Sciences. The Physical Singularity of Life (Imperial College Press, London, 2011).


D’Espargnat, B. On Physics and Philosophy (Princeton University Press, Oxford, 2002).


Mainzer, K. Thinking in Complexity (Springer, New York, 2004).

Gray J., A. Szalay. eScience A Transformed Scientific Method. Presentation to the Computer Science and Technology Board of the National Research Council, Mountain View, CA, https://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt (11th January 2007)

Wesson, J. Tokamaks 3rd edn. (Clarendon Press, Oxford, 2004).


Murari, A. & Vega, J. Physics-based optimization of plasma diagnostic information. Plasma Phys. Controll. Fus. https://doi.org/10.1088/0741-3335/56/11/110301 (2014).


Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer, New York, 2001).

Domingos, P. The Master Algorithm (Basic Books, New York, 2015).

Schmid, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324 , 81–85. https://doi.org/10.1126/science.1165893 (2009).


Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, 1992).

Sivanandam, S. N. & Deepa, S. N. Introduction to Genetic Algorithms (Springer, Heidelberg, 2007).

Burnham, K. P. & Anderson, D. R. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach 2nd edn. (Springer, New York, 2002).

Miettinen, K. Nonlinear Multiobjective Optimization (Springer, Berlin, 1998).

Silverman, B. W. Density Estimation for Statistics and Data Analysis (Chapmans & Hall, London, 1986).

Murari, A., Lupelli, I., Gelfusa, G. & Gaudio, P. Non-power law scaling for access to the H-mode in tokamaks via symbolic regression. Nucl. Fus. 53 , 043001. https://doi.org/10.1088/0029-5515/53/4/043001 (2013).

Murari, A. et al. Symbolic regression via genetic programming for data driven derivation of confinement scaling laws without any assumption on their mathematical form. Plasma Phys. Control. Fus. 57 , 014008. https://doi.org/10.1088/0741-3335/57/1/014008 (2015).

Dielman, T. E. Applied Regression Analysis (South Western Cengage Learning, Mason, 2005).

Murari, A., Peluso, E., Gelfusa, M., Lupelli, I. & Gaudio, P. A new approach to the formulation and validation of scaling expressions for plasma confinement in tokamaks. Nucl. Fus 55 , 073009. https://doi.org/10.1088/0029-5515/55/7/073009 (2015).

Sornette, D. Critical Phenomena in Natural Sciences 2nd edn. (Springer, Heidelberg, 2003).

Chen, F. An Indispensable Truth: How Fusion Power Can Save the Planet (Springer, New York, 2011).

Romanelli, F. et al. Overview of JET results. Nucl. Fus. 49 , 104006. https://doi.org/10.1088/0029-5515/49/10/104006 (2009).

Ongena, J. et al. Towards the realization on JET of an integrated H-mode scenario for ITER. Nucl. Fus. 44 , 124–133. https://doi.org/10.1088/0029-5515/44/1/015 (2004).

Fasoli, A. et al. Computational challenges in magnetic-confinement fusion physics. Nat. Phys. 12 , 411–423. https://doi.org/10.1038/NPHYS3744 (2016).


Sonnino, G., Peeter, P., Sonnino, A., Nardone, P. & Steinbrecher, G. Stationary distribution functions for ohmic Tokamak-plasmas in the weak-collisional transport regime by MaxEnt principle. J. Plasma 81 , 905810116. https://doi.org/10.1017/S0022377814000713 (2014).

Murari, A., Peluso, E., Lungaroni, M., Gelfusa, M. & Gaudio, P. Application of symbolic regression to the derivation of scaling laws for tokamak energy confinement time in terms of dimensionless quantities. Nucl. Fus. 56 , 026005. https://doi.org/10.1088/0029-5515/56/2/026005 (2016).

https://efdasql.ipp.mpg.de/hmodepublic/DataDocumentation/Datainfo/DB3v13/db3v13.html

McDonald, D. et al. ELMy H-modes in JET helium-4 plasmas. Plasma Phys. Control. Fus. 46 , 519–534. https://doi.org/10.1088/0741-3335/46/3/007 (2004).

IAEA, ITER Technical Basis . https://www.iaea.org/publications/6492/iter-technical-basis (2002)

Giraud, C. Introduction to High-Dimensional Statistics (Taylor & Francis Group, New York, 2015).

Peluso, E., Gelfusa, M., Murari, A., Lupelli, I. & Gaudio, P. A statistical analysis of the scaling laws for the confinement time distinguishing between core and edge. Phys. Procedia 62 , 113–117. https://doi.org/10.1016/j.phpro.2015.02.020 (2015).


Peluso, E., Murari, A., Gelfusa, M. & Gaudio, P. A statistical method for model extraction and model selection applied to the temperature scaling of the L-H transition. Plasma Phys. Control. Fusion 56 , 114001. https://doi.org/10.1088/0741-3335/56/11/114001 (2014).

Murari, A., Peluso, E., Gaudio, P. & Gelfusa, M. Robust scaling laws for energy confinement time, including radiated fraction, in Tokamaks. Nucl. Fus. 57 , 126017. https://doi.org/10.1088/1741-4326/aa7bb4 (2017).

Hadlock, C. R. Six Causes of Collapse (Mathematical Association of America, Washington, 2012).

Murari, A. et al. Determining the prediction limits of models and classifiers with application to disruption prediction on JET. Nucl. Fus. 57 , 016024. https://doi.org/10.1088/0029-5515/57/1/016024 (2017).

Peluso, E. et al. On determining the prediction limits of mathematical models for time series. J. Instrum. 11 , C07013. https://doi.org/10.1088/1748-0221/11/07/C07013 (2016).

Murari, A. et al. Unbiased and non-supervised learning methods for disruption prediction at JET. Nucl. Fus. 49 , 055028. https://doi.org/10.1088/0029-5515/49/5/055028 (2009).

Murari, A. et al. Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees. Nucl. Fus. 48 , 035010. https://doi.org/10.1088/0029-5515/48/3/035010 (2008).

Rattà, G. et al. An advanced disruption predictor for JET tested in a simulated real-time environment. Nucl. Fus. 50 , 025005. https://doi.org/10.1088/0029-5515/50/2/025005 (2010).

Zhang, Y., Pautasso, G., Kardaun, O., Tardini, G. & Zhang, X. D. Prediction of disruptions on ASDEX Upgrade using discriminant analysis. Nucl. Fus. 51 , 063039. https://doi.org/10.1088/0029-5515/51/6/063039 (2011).

Vega, J. et al. Results of the JET real-time disruption predictor in the ITER-like wall campaigns. Fus. Eng. Des. 88 , 1228–1231. https://doi.org/10.1016/j.fusengdes.2013.03.003 (2013).

Murari, A. et al. Adaptive predictors based on probabilistic SVM for real time disruption mitigation on JET. Nucl. Fus. 58 , 056002. https://doi.org/10.1088/1741-4326/aaaf9c (2018).

Pautasso, G. et al. On-line prediction and mitigation of disruptions in ASDEX Upgrade. Nucl. Fus. 42 , 100. https://doi.org/10.1088/0029-5515/42/1/314 (2002).

Cannas, B. et al. Disruption prediction with adaptive neural networks for ASDEX upgrade. Fus. Eng. Des. 86 , 1039–1104. https://doi.org/10.1016/j.fusengdes.2011.01.069 (2011).

Pautasso, G. et al. Contribution of ASDEX upgrade to disruption studies for ITER. Nucl. Fus. 51 , 103009. https://doi.org/10.1088/0029-5515/51/10/103009 (2011).

Lungaroni, M. et al. On the potential of ruled-based machine learning for disruption prediction on JET. Fus. Eng. Des. 130 , 62–68. https://doi.org/10.1016/j.fusengdes.2018.02.087 (2018).

Murari, A. et al. Clustering based on the geodesic distance on Gaussian manifolds for the automatic classification of disruptions. Nucl. Fus. 53 , 033006. https://doi.org/10.1088/0029-5515/53/3/033006 (2013).

Murari, A. et al. How to handle error bars in symbolic regression for data mining in scientific applications. Stat. Learn. Data Sci. 9047 , 347–355. https://doi.org/10.1007/978-3-319-17091-6_29 (2015).

Bellecci, C. et al. Application of a CO 2 dial system for infrared detection of forest fire and reduction of false alarms. Appl. Phys. B 87 , 373–378. https://doi.org/10.1007/s00340-007-2607-9 (2007).

Bellecci, C. et al. In-cell measurements of smoke backscattering coefficients using a CO 2 laser. Opt. Eng. 49 , 124302. https://doi.org/10.1117/1.3526331 (2010).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444. https://doi.org/10.1038/nature14539 (2015).


Schmidhuber, J. Deep learning in neural network: an overview. Neural Netw. 61 , 85–117. https://doi.org/10.1016/j.neunet.2014.09.003 (2015).


Kos, L., Jelić, N., Tskhakaya, D. D. & Kuhn, S. Introduction to the theory and application of a unified bohm criterion for arbitrary-ion-temperature collision-free plasmas with finite Debye lengths. Phys. Plasmas 25 , 043509. https://doi.org/10.1063/1.5030121 (2018).

Kos, L., Jelić, N., Gyergyek, T., Kuhn, S. & Tskhakaya, D. D. Modeling and simulations of plasma and sheath edges in warm-ion collision-free discharges. AIP Adv. 8 , 105311. https://doi.org/10.1063/1.5044664 (2018).

Tskhakaya, D. D., Kos, L. & Jelić, N. A unified analysis of plasma-sheath transition in the Tonks-Langmuir model with warm ion source. Phys. Plasmas 21 , 073503. https://doi.org/10.1063/1.4885638 (2014).

Robinson, S. Sheath and presheath in plasma with warm ions. Phys. Plasmas 16 , 103503. https://doi.org/10.1063/1.3247874 (2009).


Acknowledgements

This work was partially funded by the Spanish Ministry of Economy and Competitiveness under the Project No. ENE2015-64914-C3-1-R

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Author information

These authors contributed equally: A. Murari and E. Peluso

Authors and Affiliations

Consorzio RFX (CNR, ENEA, INFN, Università di Padova, Acciaierie Venete SpA), Corso Stati Uniti 4, 35127, Padua, Italy

Department of Industrial Engineering, University of Rome “Tor Vergata”, via del Politecnico 1, 00133, Rome, Italy

E. Peluso, M. Lungaroni, P. Gaudio & M. Gelfusa

Laboratorio Nacional de Fusión, CIEMAT, Av. Complutense 40, 28040, Madrid, Spain


Contributions

Data curation, E.P. and M.L.; Formal analysis, A.M., E.P. and J.V.; Investigation, E.P. and M.G.; Methodology, A.M. and E.P.; Project administration, M.G. and P.G.; Software, E.P. and M.L.; Validation, J.V. and M.G.; Writing–original draft, A.M.; Writing–review & editing, A.M. and E.P.

Corresponding author

Correspondence to E. Peluso .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Murari, A., Peluso, E., Lungaroni, M. et al. Data driven theory for knowledge discovery in the exact sciences with applications to thermonuclear fusion. Sci Rep 10 , 19858 (2020). https://doi.org/10.1038/s41598-020-76826-4


Received : 02 January 2020

Accepted : 03 November 2020

Published : 16 November 2020

DOI : https://doi.org/10.1038/s41598-020-76826-4




Hypothesis-Driven Approach: Crack Your Case Like a Consultant

  • Last Updated: June 2023

A hypothesis-driven approach in consulting is a structured method of problem-solving. Consultants formulate a hypothesis for the solution to a business problem, then gather data to support or disprove it. 

Cracking a case interview can be a daunting task, with a wide range of potential solutions and approaches to consider. However, using a hypothesis-driven approach is a systematic and effective problem-solving method. It will impress your interviewer and demonstrate your readiness for a career in consulting.

In this article, we will talk about:

  • The definitions of a hypothesis and a hypothesis-driven approach
  • The differences between a hypothesis-driven approach and a non-hypothesis-driven approach
  • An example of how to solve a case using both approaches
  • Our 5-step process for using a hypothesis-driven approach to solve consulting cases

Let’s get started!

What Is a Hypothesis & a Hypothesis-Driven Approach?


In the realm of science, the term “hypothesis” is used to describe a proposed explanation for a question or phenomenon, based on limited evidence, as a starting point for further investigation. Similarly, consultants act as scientists or as doctors solving their clients’ business problems, constantly forming and testing hypotheses to identify the best solutions. 

The key phrase here is “starting point,” as a hypothesis is an educated guess at the solution, formed from currently available information. As more data is gathered, the hypothesis may be adjusted or even discarded entirely.


Hypothesis-Driven Approach

Consultants are engaged to efficiently and effectively solve their clients’ problems and assist in making critical business decisions. With the vast amount of data available and an array of options to consider, it can be overwhelming to examine everything. Time constraints on projects make it imperative that consultants avoid getting bogged down in excessive analysis and questioning, without making meaningful progress toward a recommendation.

Instead, consultants begin by forming a hypothesis after gaining an understanding of the client’s problem and high-level range of possibilities. Then, they gather data to test the initial hypothesis. If the data disproves the hypothesis, the consultants repeat the process with the next best hypothesis. This method of problem-solving is commonly used by top consulting firms, such as McKinsey.

A non-hypothesis-driven approach is the opposite of a hypothesis-driven approach. Instead of forming a hypothesis, the individual makes a recommendation only after thoroughly evaluating all data and possibilities. This approach may rely on intuition, trial and error, or exhaustively exploring all options to solve the problem. This is not an efficient method for a case interview, where time is limited.

An analogy that illustrates the distinction between the two methods is to look at problem-solving as trying to find a needle in a haystack. A non-hypothesis-driven approach would involve randomly searching through the entire stack without any clear strategy. 

On the other hand, a hypothesis-driven approach would involve dividing the haystack into smaller piles, and systematically searching through one section at a time. The searcher would gather information from the person who lost the needle, such as their location when it was lost, to identify the most likely pile to search first. This not only saves time but also increases the likelihood of finding the needle. If the needle is not found in the initial pile, the search can then move on to the next most probable pile.

Solving a Case Interview Using the Hypothesis-Driven Approach vs. the Non-Hypothesis-Driven Approach

To further illustrate the advantages of a hypothesis-driven approach, let’s examine two different approaches to the same case interview example. We’ll compare and contrast these approaches, highlighting the key distinctions between them. By the end, you’ll have a clear understanding of the benefits of using a hypothesis-driven approach in problem-solving. 

The client is SnackCo, a consumer goods company that manufactures and sells trail mixes in the United States. Over the past decade, SnackCo has seen significant growth following the launch of premium trail mix products, capitalizing on the trend toward healthier snacking options. Despite this success, the company’s operations have remained unchanged for the past decade. SnackCo is asking for your help to improve its bottom line.

Let’s look at how two candidates, Alex and Julie, solve the same case.

The Non-Hypothesis-Driven Approach

After hearing this prompt, Alex jumps right into listing possible questions related to how to improve the bottom line.

Alex: I understand SnackCo wants to improve profitability. Here are some questions I want to look into. Have SnackCo's retail prices remained the same in recent years?

Interviewer: No, SnackCo has adjusted prices quite closely to what competitive products are selling at.

Alex: Oh interesting. Are consumers willing to pay more for premium trail mix? Do we know if we are underpricing?

Interviewer: SnackCo’s Director of Sales strongly believes that they should not change product prices. He believes the consumers love the product and it is priced fairly. 

Alex: Got it. Has the client’s market share decreased?

Interviewer: No, the market share has increased over the years.

Alex: In that case, it seems like our growth is fine. Have the costs increased?

Interviewer: SnackCo has not made many changes to its costs and operations in the last decade. What are some ways we can help them look at their cost savings opportunities?

Although Alex is making progress and may eventually solve the case, his communication style gives the impression that he is randomly guessing at the sources of the problem, rather than using logical reasoning and structure to pinpoint the solution.

The Hypothesis-Driven Approach

Julie has prepared for her case interviews with My Consulting Offer’s coaches so she is well-versed in the hypothesis-driven approach. 

After hearing the same prompt, she takes a moment to write down the key issues she wants to dig into to solve this case and organizes her thoughts. 

Julie: For the goal of improving profitability, we could look at how to improve revenue or decrease costs. For revenue, we could look at if prices or volumes have changed. Since the client said they haven’t made any changes to the business operations in the last decade, I would like to start with a better understanding of their costs. However, before we begin, I want to confirm if there have been any changes to prices or volumes recently.

Interviewer: SnackCo’s Director of Sales strongly believes that they should not change product prices. They also believe the volumes have grown well as SnackCo is one of the market leaders now. 

Julie: Great. That confirms what I was thinking. It’s likely a cost problem. We could look at their variable costs, such as ingredients, or fixed costs, such as manufacturing facilities. Given that this is an established business, I would assume their fixed costs are likely consistent. Therefore, let’s start with their variable costs.

Interviewer: How should we think about variable costs?

Julie: Variable costs for SnackCo likely include ingredients, packaging, and freight. The levers they could pull to reduce these costs would be through supplier relationships or changing the product composition. 

Julie quickly identifies that variable costs are likely the problem and has a structured approach to understanding which opportunities to explore. 

Key Differences

The interviewer is looking for candidates with strong problem-solving and communication skills, which are the qualities of a good consultant. Let’s look at how the two candidates performed.

Problem-Solving

Alex’s approach to solving the client’s problem was haphazard, as he posed a series of seemingly unrelated questions in no particular order. This method felt more like a rapid-fire Q&A session rather than a structured problem-solving approach. 

On the other hand, Julie takes a structured and analytical approach to address profitability concerns. She quickly realizes that while revenue is one factor of profitability, it is likely costs that are the main concern, as they haven’t changed much in the last decade. She then breaks down the major cost categories and concludes that variable costs are the most likely opportunity for cost reduction. Julie is laser-focused on the client’s goal and efficiently gets to a solution.

Communication

Alex is not making a positive initial impression. If this were an actual client interaction, his questioning would appear disorganized and unprofessional. 

On the other hand, Julie appears more organized through her clear communication style. She only considers the most pertinent issues at hand (i.e., the client’s business operations and costs) and avoids going down irrelevant rabbit holes.

Our 5-Step Process for Using the Hypothesis-Driven Mindset to Solve Cases

  • Understand the client's problem; ask clarifying questions if needed.
  • Formulate an issue tree to break down the problem into smaller parts.
  • State the initial hypothesis and key assumptions to be tested.
  • Gather and analyze information to prove or disprove the hypothesis; do not panic if the hypothesis is disproven.
  • Pivot the hypothesis if necessary and repeat step 4. Otherwise, make your recommendation on what the client can do to solve their problem. 

Other helpful tips to remember when using the hypothesis-driven approach:

  • Stay focused on the client’s problem and remember what the end goal is.
  • Think outside the box and consider new perspectives beyond traditional frameworks. The basic case interview frameworks are useful to understand but interviewers expect candidates to tailor to the specific client situation.
  • Clearly communicate assumptions and implications throughout the interview; don’t assume the interviewer can read your mind.

Other Consulting Tools That Will Strengthen Your Problem-Solving

A hypothesis-driven approach is closely tied to other key consulting concepts, such as issue trees, MECE, and the 80/20 rule. Let's take a closer look at these topics and how they relate.

  • Issue Trees

Issue trees, also known as decision trees, are visual tools that break down complex business problems into smaller, more manageable parts. In a consulting interview, candidates use the issue tree to outline key issues and potential factors in the client's problem, demonstrating their understanding of the situation. This structure is then used to guide the case discussion, starting with the candidate's best hypothesis, represented as one branch of the issue tree. For more information and examples of issue trees, check out our issue tree post.

  • MECE

During the interview process, consulting firms look for candidates who can demonstrate a MECE (mutually exclusive and collectively exhaustive) approach to problem-solving, which involves breaking down complex issues into distinct, non-overlapping components.

A MECE approach in case interviews involves identifying all potential paths to solving a client's problem at a high level. This allows the candidate to form an initial hypothesis with confidence that no potential solutions have been overlooked. To gain a deeper understanding, read our comprehensive guide on the MECE case structure.

  • The 80/20 Rule

Consultants use the 80/20 rule, also known as the Pareto principle, to prioritize their efforts and focus on the most important things. This principle states that 80% of effects come from 20% of causes, which means a small number of issues often drive a large portion of the problem. By identifying and focusing on the key issues, consultants can achieve significant results with relatively minimal resources.

By following these tips and developing a solid understanding of the hypothesis-driven approach to case-solving, you will have the necessary tools to excel in your case interview. For more interview resources, check out Our Ultimate Guide to Case Interview Prep . 

– – – – –

In this article, we’ve covered:

  • Explanations of a hypothesis and hypothesis-driven approach
  • Comparison between a hypothesis-driven approach and a non-hypothesis-driven approach
  • Examples of the same case using both approaches and the key differences
  • Practical tips on how to develop a hypothesis-driven mindset to ace the case

Still have questions?

If you have more questions about using a hypothesis-driven approach to crack your case interviews, leave them in the comments below. One of My Consulting Offer's recruiters will answer them.

Other people preparing to apply to consulting firms found the following pages helpful:

  • Our Ultimate Guide to Case Interview Prep
  • Types of Case Interviews
  • Case Frameworks
  • Hypothesis Trees

Help with Your Consulting Application

Thanks for turning to My Consulting Offer for advice on the hypothesis-driven approach. My Consulting Offer has helped 89.6% of the people we've worked with get a job in management consulting. We want you to be successful in your consulting interviews too. For example, here is how Misha was able to get his offer from BCG.


UChicago biophysicist studies locomotion in creatures from all walks of life

Asst. Prof. Jasmine Nirody traces the relationship between habitat and movement

Editor’s note: This story is part of Meet a UChicagoan, a regular series focusing on the people who make UChicago a distinct intellectual community. Read about the others  here .

Where does physics meet biology?

For biophysicist Jasmine Nirody , the intersection lies in locomotion, or how things move. It’s both a direct application of fundamental mechanics from her very first college physics class and a tangible reflection of evolutionary processes.

“Locomotion is something that we think about all the time: we are moving all the time, interacting with our surroundings in physical, mechanical ways,” she said. “So, we intuitively understand it to be important in almost every species.”

But movement is just a specific instance of the broader phenomenon Nirody, who is an Assistant Professor of Organismal Biology and Anatomy at the University of Chicago, seeks to understand.

“I'm really interested in this broad evolutionary question of how interacting with complex environments affects behavior, and how that affects morphology — both on the short timescale and then over long evolutionary timescales,” she said.

“Complex” environments have changing or varied conditions. Animals living in such environments must therefore develop unique adaptations to thrive in such dynamic surroundings. To use a human example, we adapt to changing seasons every few months. We both change our behavior — by changing our wardrobes — and rely on our internal biological mechanisms to maintain a steady body temperature.

But to understand how these adaptations evolve, Nirody sets her sights well beyond just human behavior. For example, in her postdoctoral fellowships at Oxford and Rockefeller University, Nirody studied tardigrades. These microscopic animals (which you may know by their cuddlier name “water bears”) are known for occupying diverse habitats, using their jointed legs both for swimming and for an array of walking patterns. E. coli bacteria, which Nirody studied during her PhD at UC Berkeley, also display discrete movement patterns, using their flagella to both drive themselves forward and to rotate.

Selecting organisms so biologically distinct, separated by hundreds of millions of years of evolution, is key to addressing the broadness of Nirody’s question.

“Math and physics give you an obsession with universality,” Nirody said. “Because of that, I’m not married to any one species. I’m more zoomed out — excited by questions, by principles.”

Two approaches to science

Nirody describes her ideal lab as being split 50/50 between experimentalists and theorists, because she appreciates the interplay between both approaches to science.

Until her postdoctoral fellowships, Nirody herself was largely a theorist — someone who uses mathematical modeling and statistics to predict biological outcomes rather than observe them through experiments. For this reason, she initially found it challenging to wrangle the unpredictable nature of experimental research.

“The thing I really had to come to terms with as a biologist, and as an experimental biologist in particular, is that I can't own all the mistakes that happen,” she said. “I can't always know what went wrong, even if it's something that I did — that’s not something that happens in theory.”

But Nirody finds that straddling theoretical and experimental research has sharpened her skills in both areas. Becoming familiar with the limits of working at the bench allowed her to build models better suited for experimental testing. In turn, her precise models allow her to plan specific, hypothesis-driven experiments, to which she credits her efficiency at the bench.

These conceptual intersections — between physics and biology, theory, and experimentation — form the heart of Nirody’s work. But other intersections appear in her work too, like that between past and future. While exploring the history of how an organism adapted to a dynamic environment, Nirody finds herself thinking about how the principles of that relationship can be applied to fields like robotics.

“We can design synthetic systems that do some things better than biology,” she said. “But for things that synthetic systems don’t handle as well as biology, the inspiration from biological systems can be valuable.”

Nirody also enjoys pondering intersections of science and metaphysics with other philosophically minded biologists. For example, how can you reliably compare behavior across species when the species in question are a snake and a bacterial biofilm?

“When I did work with snakes, it was very easy to define a behavior. But that sense of agency and autonomy is not always granted to bacteria,” she said. “So, it raises questions like what is ‘behavior’? What kind of interactions define sociality, in terms of biofilm formation?”

What’s in store for the Nirody lab

Now heading her own lab, Nirody is excited to dive into new projects and collaborations to explore new ways that animals navigate changing environments.

Haibei Zhang, a PhD student in the lab, is currently studying Vibrio fischeri, a species of bacteria related to the cholera-causing species Vibrio cholerae. These bacteria spend each day swimming in the open ocean and each night in the dense tissues of bobtail squid, giving the squid their signature bioluminescence.

“You typically have species that are purely marine, that only live in liquid, and species that only live within another organism, that are purely intestinal, for example,” Nirody said. “So, Vibrio fischeri , able to be in both of these environments, provides a really nice portal into understanding how generalist versus specialist species might work.”

Erin Brandt, a postdoctoral scholar in the Nirody lab, works on a much larger scale, studying the mechanics of how jumping spiders jump. Jumping spider species can be found in various habitats, from deserts, where they jump off shifting sand granules, to Midwest forests, jumping from leaf litter on the forest floor.

Brandt is interested in comparing the legs of jumping spider species from many different environments to understand how each species is physically adapted to their specific conditions. The researchers can do this using a CT scanner, which produces very detailed images of how the legs are structured. To access the more exotic species not native to the Midwest, the lab obtains specimens from the Field Museum.

Working on such exciting projects with students and postdocs is part of what makes UChicago exciting, Nirody says.

—Adapted from an article by Manasa Prahlad published by the Biological Sciences Division .


Link prediction for hypothesis generation: an active curriculum learning infused temporal graph-based approach

  • Open access
  • Published: 12 August 2024
  • Volume 57, article number 244 (2024)


Uchenna Akujuobi, Priyadarshini Kumari, Jihun Choi, Samy Badreddine, Kana Maruyama, Sucheendra K. Palaniappan & Tarek R. Besold


Over the last few years Literature-based Discovery (LBD) has regained popularity as a means to enhance the scientific research process. The resurgent interest has spurred the development of supervised and semi-supervised machine learning models aimed at making previously implicit connections between scientific concepts/entities within often extensive bodies of literature explicit—i.e., suggesting novel scientific hypotheses. In doing so, understanding the temporally evolving interactions between these entities can provide valuable information for predicting the future development of entity relationships. However, existing methods often underutilize the latent information embedded in the temporal aspects of the interaction data. Motivated by applications in the food domain—where we aim to connect nutritional information with health-related benefits—we address the hypothesis-generation problem using a temporal graph-based approach. Given that hypothesis generation involves predicting future (i.e., still to be discovered) entity connections, in our view the ability to capture the dynamic evolution of connections over time is pivotal for a robust model. To address this, we introduce THiGER , a novel batch contrastive temporal node-pair embedding method. THiGER excels in providing a more expressive node-pair encoding by effectively harnessing node-pair relationships. Furthermore, we present THiGER-A , an incremental training approach that incorporates an active curriculum learning strategy to mitigate label bias arising from unobserved connections. By progressively training on increasingly challenging and high-utility samples, our approach significantly enhances the performance of the embedding model. Empirical validation of our proposed method demonstrates its effectiveness on established temporal-graph benchmark datasets, as well as on real-world datasets within the food domain.


1 Introduction

The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom. — Isaac Asimov. Science is advancing at an increasingly quick pace, as evidenced, for instance, by the exponential growth in the number of published research articles per year (White 2021 ). Effectively navigating this ever-growing body of knowledge is tedious and time-consuming in the best of cases, and more often than not becomes infeasible for individual scientists (Brainard 2020 ). In order to augment the efforts of human scientists in the research process, computational approaches have been introduced to automatically extract hypotheses from the knowledge contained in published resources. Swanson ( 1986 ) systematically used a scientific literature database to find potential connections between previously disjoint bodies of research, as a result hypothesizing a (later confirmed) curative relationship between dietary fish oils and Raynaud’s syndrome . Swanson and Smalheiser then automatized the search and linking process in the ARROWSMITH system (Swanson and Smalheiser 1997 ). Their work and other more recent examples (Fan and Lussier 2017 ; Trautman 2022 ) clearly demonstrate the usefulness of computational methods in extracting latent information from the vast body of scientific publications.

Over time, various methodologies have been proposed to address the Hypothesis Generation (HG) problem. Swanson and Smalheiser (Smalheiser and Swanson 1998 ; Swanson and Smalheiser 1997 ) pioneered the use of a basic ABC model grounded in a stringent interpretation of structural balance theory (Cartwright and Harary 1956 ). In essence, if entities A and B, as well as entities A and C, share connections, then entities B and C should be associated. Subsequent years have seen the exploration of more sophisticated machine learning-based approaches for improved inference. These encompass techniques such as text mining (Spangler et al. 2014 ; Spangler 2015 ), topic modeling (Sybrandt et al. 2017 ; Srihari et al. 2007 ; Baek et al. 2017 ), association rules (Hristovski et al. 2006 ; Gopalakrishnan et al. 2016 ; Weissenborn et al. 2015 ), and others (Jha et al. 2019 ; Xun et al. 2017 ; Shi et al. 2015 ; Sybrandt et al. 2020 ).
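To make the ABC heuristic concrete, here is a minimal sketch over a toy co-occurrence graph; the dict-of-sets representation and the term names are illustrative only and are not taken from any of the cited systems.

    # Minimal sketch of the ABC heuristic: propose unlinked pairs (B, C) that
    # share at least one intermediate term A in a co-occurrence graph.
    from itertools import combinations

    def abc_candidates(cooccurs):
        """cooccurs: {term: set of co-occurring terms}. Returns candidate pairs."""
        candidates = set()
        for a, neighbours in cooccurs.items():
            for b, c in combinations(sorted(neighbours), 2):
                if c not in cooccurs.get(b, set()):   # B and C are not yet linked
                    candidates.add((b, c))
        return candidates

    graph = {
        "fish oil": {"blood viscosity", "platelet aggregation"},
        "Raynaud's syndrome": {"blood viscosity", "platelet aggregation"},
        "blood viscosity": {"fish oil", "Raynaud's syndrome"},
        "platelet aggregation": {"fish oil", "Raynaud's syndrome"},
    }
    # proposes, among others, the pair ("Raynaud's syndrome", "fish oil")
    print(abc_candidates(graph))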

In the context of HG, where the goal is to predict novel relationships between entities extracted from scientific publications, comprehending prior relationships is of paramount importance. For instance, in the domain of social networks, the principles of social theory come into play when assessing the dynamics of connections between individuals. When there is a gradual reduction in the social distance between two distinct individuals, as evidenced by factors such as the establishment of new connections with shared acquaintances and increased geographic proximity, there emerges a heightened likelihood of a subsequent connection between these two individuals (Zhang and Pang 2015 ; Gitmez and Zárate 2022 ). This concept extends beyond social networks and finds relevance in predicting scientific relationships or events through the utilization of temporal information (Crichton et al. 2018 ; Krenn et al. 2023 ; Zhang et al. 2022 ). In both contexts, the principles of proximity and evolving relationships serve as valuable indicators, enabling a deeper understanding of the intricate dynamics governing these complex systems.

Modeling the temporal evolution of these relationships assumes a critical role in constructing an effective and resilient hypothesis generation model. To harness the temporal dynamics, Akujuobi et al. ( 2020b , 2020a ) and Zhou et al. ( 2022 ) conceptualize the HG task as a temporal graph problem. More precisely, given a sequence of graphs \(G = \{G_{0}, G_{1},\ldots , G_{T} \}\) , the objective is to deduce which previously unlinked nodes in \(G_{T}\) ought to be connected. In this framework, nodes denote biomedical entities, and the graphs \(G_{\tau }\) represent temporal graphlets (see Fig.  1 ).

Definition 1

Temporal graphlet : A temporal graphlet \(G_{\tau } = \{V^{\tau },E^{\tau }\}\) is a temporal subgraph at time point \(\tau\) , where \(V^{\tau } \subset V\) and \(E^{\tau } \subset E\) are the temporal set of nodes and edges of the subgraph.
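A small sketch of how such temporal graphlets can be materialized from timestamped co-occurrence edges is given below; it assumes cumulative snapshots (so that each V^tau and E^tau is a subset of V and E), and the data structures and example terms are invented.

    # Sketch of building cumulative temporal graphlets G_0 ... G_T from
    # timestamped edges (tau, u, v); illustrative data structures only.
    from collections import defaultdict

    def build_graphlets(timestamped_edges, T):
        """Return the graphlets G_0 ... G_T as (nodes, edges) pairs."""
        by_time = defaultdict(list)
        for tau, u, v in timestamped_edges:
            by_time[tau].append((u, v))
        graphlets, nodes, edges = [], set(), set()
        for tau in range(T + 1):
            for u, v in by_time[tau]:
                nodes.update((u, v))
                edges.add(frozenset((u, v)))
            graphlets.append((set(nodes), set(edges)))   # snapshot G_tau
        return graphlets

    graphlets = build_graphlets([(0, "curcumin", "inflammation"),
                                 (1, "inflammation", "gut microbiome"),
                                 (2, "curcumin", "type-2 diabetes")], T=2)
    print([len(e) for _, e in graphlets])   # [1, 2, 3] edges across G_0, G_1, G_2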

Their approach tackles the HG problem by introducing a temporal perspective. Instead of relying solely on the final state \(E^{T}\) of a static graph, it considers how node pairs evolve over discrete time steps \(E^{\tau }: \tau = 0 \dots T\) . To model this sequential evolution effectively, Akujuobi et al. and Zhou et al. leverage the power of recurrent neural networks (RNNs) (see Fig.  2 a). However, while RNNs have traditionally been the preferred choice for HG, their sequential nature may hinder the capture of long-range dependencies, degrading performance on lengthy sequences.

Figure 1. Modeling hypothesis generation as a temporal link prediction problem.

Figure 2. Predicting the link probability \(p_{i,j}\) for a node pair \(v_i\) and \(v_j\) using (a) a recurrent neural network approach (Akujuobi et al. 2020b ; Zhou et al. 2022 ) and (b) THiGER, our approach. The recurrent approach aggregates the neighborhood information \({{\mathcal {N}}}^t(v_i)\) and \({{\mathcal {N}}}^t(v_j)\) sequentially, while THiGER aggregates the neighborhood information hierarchically in parallel.

To address these limitations, we propose THiGER (Temporal Hierarchical Graph-based Encoder Representation), a robust transformer-based model designed to capture the evolving relationships between node pairs. THiGER overcomes the constraints of previous methods by representing temporal relationships hierarchically (see Fig.  2 b). The proposed hierarchical layer-wise framework presents an incremental approach to comprehensively model the temporal dynamics among given concepts. It achieves this by progressively extracting the temporal interactions between consecutive time steps, enabling the model to prioritize attention to the informative regions of the temporal evolution. Our method effectively addresses issues arising from imbalanced temporal information (see Sect.  5.2 ). Moreover, it employs a contrastive learning strategy to improve the quality of task-specific node embeddings for node-pair representation and relationship inference tasks.
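The following PyTorch fragment is a rough, assumption-laden reading of the hierarchical idea in Fig. 2b, not the authors' implementation: per-time-step node-pair embeddings are contextualized by a transformer encoder layer and then merged pairwise, layer by layer, so that long timelines are reduced in parallel rather than consumed sequentially. The pairwise-mean merge, the final mean pooling, and all sizes are our simplifications.

    # Rough sketch (our reading) of hierarchical temporal aggregation.
    import torch
    import torch.nn as nn

    class HierarchicalAggregator(nn.Module):
        def __init__(self, dim, nhead=4, num_layers=3):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True)
                for _ in range(num_layers)
            )

        def forward(self, z):
            # z: (batch, T, dim) node-pair embeddings z_{i,j}^tau for tau = 0..T-1
            for layer in self.layers:
                z = layer(z)                                # attend across remaining time steps
                if z.size(1) > 1:
                    if z.size(1) % 2 == 1:                  # pad odd-length sequences
                        z = torch.cat([z, z[:, -1:, :]], dim=1)
                    z = 0.5 * (z[:, 0::2, :] + z[:, 1::2, :])   # merge consecutive steps
            return z.mean(dim=1)                            # pooled pair embedding h_{i,j}

    pair_emb = HierarchicalAggregator(dim=64)(torch.randn(8, 10, 64))
    print(pair_emb.shape)                                   # torch.Size([8, 64])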

An equally significant challenge in HG is the lack of negative-class samples for training. Our dataset provides positive-class samples, which represent established connections between entities, but it lacks negative-class samples denoting non-existent connections (as opposed to undiscovered connections, which could potentially lead to scientific breakthroughs). This situation aligns with the positive-unlabeled (PU) learning problem. Prior approaches have typically either discarded unobserved connections as uninformative or wrongly treated them as negative-class samples. The former approach leads to the loss of valuable information, while the latter introduces label bias during training.

In response to these challenges, we furthermore introduce THiGER-A, an active curriculum learning strategy designed to train the model incrementally. THiGER-A utilizes progressively complex positive samples and highly informative, diverse unobserved connections as negative-class samples. Our experimental results demonstrate that by employing incremental training with THiGER-A, we achieve enhanced convergence and performance for hypothesis-generation models compared to training on the entire dataset in one go. Remarkably, our approach demonstrates strong generalization capabilities, especially in challenging inductive test scenarios where the entities were not part of the seen training dataset.
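A toy sketch of the incremental sampling loop as we read it follows; the exact utility and diversity criteria of THiGER-A are not reproduced here. Positives are admitted in order of increasing difficulty under the current model, while unobserved pairs with the most uncertain predictions are drawn as provisional negatives; the diversity term is omitted for brevity, and all names and constants are illustrative.

    # Toy sketch of incremental positive-curriculum + active negative selection.
    import numpy as np

    def select_batch(scores_pos, scores_unobs, round_idx, n_pos=32, n_neg=32):
        """scores_pos / scores_unobs: current model scores in [0, 1] for known
        positive pairs and for unobserved pairs; round_idx grows over training."""
        # curriculum over positives: early rounds use easy (high-score) edges,
        # later rounds progressively admit harder (low-score) ones
        difficulty = 1.0 - scores_pos
        cutoff = np.quantile(difficulty, min(1.0, 0.2 * (round_idx + 1)))
        eligible = np.where(difficulty <= cutoff)[0]
        pos_idx = np.random.choice(eligible, size=min(n_pos, eligible.size),
                                   replace=False)

        # active selection over unobserved pairs: highest predictive uncertainty
        uncertainty = -(scores_unobs * np.log(scores_unobs + 1e-9)
                        + (1 - scores_unobs) * np.log(1 - scores_unobs + 1e-9))
        neg_idx = np.argsort(-uncertainty)[:n_neg]
        return pos_idx, neg_idx

    p, n = select_batch(np.random.rand(1000), np.random.rand(5000), round_idx=0)
    print(p.shape, n.shape)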

Inspired by Swanson’s pioneering work, we chose the food domain as a promising application area for THiGER. This choice is motivated by the increasing prevalence of diet-related health conditions, such as obesity and type-2 diabetes, alongside the growing recognition and utilization of the health benefits associated with specific food products in wellness and medical contexts.

In summary, our contributions are as follows:

Methodology: We propose a novel temporal hierarchical transformer-based architecture for node pair encoding. In utilizing the temporal batch-contrastive strategy, our architecture differs from existing approaches that learn in conventional static or temporal graphs. In addition, we present a novel incremental training strategy for temporal graph node pair embedding and future relation prediction. This strategy effectively mitigates negative-label bias through active learning and improves generalization by training the model progressively on increasingly complex positive samples using curriculum learning.

Evaluation: We test the model’s efficacy on several real-world graphs of different sizes to give evidence for the model’s strength for temporal graph problems and hypothesis generation. The model is trained end-to-end and shows superior performance on HG tasks.

Application: To the best of our knowledge, this is the first application of temporal hypothesis generation in the health-related food domain. Through case studies, we validate the practical relevance of our findings.

The remaining sections of this paper include a discussion of related work in Sect.  2 , a detailed introduction of the proposed THiGER model and the THiGER-A active curriculum learning strategy in Sect.  3 , an overview of the datasets, the model setup and parameter tuning, and our evaluation approach in Sect. 4 , the results of our experimental evaluations in Sect.  5 , and finally, our conclusions and a discussion of future work in Sect.  6 .

2 Related works

2.1 Hypothesis generation

The development of effective methods for machine-assisted discovery is crucial in pushing scientific research into the next stage (Kitano 2021 ). In recent years, several approaches have been proposed in a bid to augment human abilities relevant to the scientific research process including tools for research design and analysis (Tabachnick and Fidell 2000 ), process modelling and simulation (Klein et al. 2002 ), or scientific hypothesis generation (King et al. 2004 , 2009 ).

The early pioneers of the hypothesis generation domain proposed the so-called ABC model for generating novel scientific hypotheses based on existing knowledge (Swanson 1986 ; Swanson and Smalheiser 1997 ). ABC-based models are simple and efficient, and have been implemented in classical hypothesis generation systems such as ARROWSMITH (Swanson and Smalheiser 1997 ). However, several drawbacks remain, including the need for similarity metrics defined on heuristically determined term lists and significant computational costs that grow with the number of common entities.

More recent approaches have thus aimed to curtail the limitations of the ABC model. Spangler et al. ( 2014 ); Spangler ( 2015 ) proposed text mining techniques to identify entity relationships from unstructured medical texts. AGATHA (Sybrandt et al. 2020 ) used a transformer encoder architecture to learn the ranking criteria between regions of a given semantic graph and the plausibility of new research connections. Srihari et al. ( 2007 ); Baek et al. ( 2017 ) proposed several text mining approaches to detect how concepts are linked within and across multiple text documents. Sybrandt et al. ( 2017 ) proposed incorporating machine learning techniques such as clustering and topical phrase mining. Shi et al. ( 2015 ) modeled the probability that concepts will be linked within a given time window using random walks.

The previously mentioned methods do not consider temporal attributes of the data. More recent works (Jha et al. 2019 ; Akujuobi et al. 2020a ; Zhou et al. 2022 ; Xun et al. 2017 ) argue that capturing the temporal information available in scholarly data can lead to better predictive performance. Jha et al. ( 2019 ) explored the co-evolution of concepts across knowledge bases using a temporal matrix factorization framework. Xun et al. ( 2017 ) modeled concepts’ co-occurrence probability using their temporal embedding. Akujuobi et al. ( 2020a , 2020b ) and Zhou et al. ( 2022 ) captured the temporal information in the scholarly data using RNN techniques.

Our approach captures the dynamic relationship information using a temporal hierarchical transformer encoder model. This strategy alleviates the limitations of the RNN-based models. Furthermore, with the incorporation of active curriculum learning strategies, our model can incrementally learn from the data.

2.2 Temporal graph learning

Learning on temporal graphs has received considerable attention from the research community in recent years. Some works (Hisano 2018 ; Ahmed et al. 2016 ; Milani Fard et al. 2019 ) apply static methods on aggregated graph snapshots. Others, including (Zhou et al. 2018 ; Singer et al. 2019 ), utilize time as a regularizer over consecutive snapshots of the graph to impose a smoothness constraint on the node embeddings. A popular category of approaches for dynamic graphs introduces point processes that are continuous in time. DyRep (Trivedi et al. 2019 ) models the occurrence of an edge as a point process using graph attention on the destination node's neighbors. Dynamic-Triad (Zhou et al. 2018 ) models the evolution patterns in a graph by imposing triadic closure, whereby a triad of three nodes develops from an open triad (i.e., one in which two of the nodes are not connected).

Some recent works on temporal graphs combine GNNs with recurrent architectures (e.g., GRUs). EvolveGCN (Pareja et al. 2020 ) adapts the graph convolutional network (GCN) model along the temporal dimension by using an RNN to evolve the GCN parameters. T-PAIR (Akujuobi et al. 2020b , a ) recurrently learns a node-pair embedding by updating GraphSAGE parameters using gated recurrent units (GRUs). TGN (Rossi et al. 2020 ) introduces a memory-module framework for learning on dynamic graphs. TDE (Zhou et al. 2022 ) captures the local and global changes in the graph structure using hierarchical RNN structures. TNodeEmbed (Singer et al. 2019 ) proposes the use of orthogonal Procrustes alignment on consecutive time-step node embeddings along the time dimension.

However, the limitations of RNNs remain due to their sequential nature, particularly their lack of robustness when working over a long timeline. Since the introduction of transformers, there has been interest in their application to temporal graph data. More closely related to this work, Zhong and Huang ( 2023 ) and Wang et al. ( 2022 ) both propose the use of a transformer architecture to aggregate the node neighborhood information while updating the memory of the nodes using GRUs. TLC (Wang et al. 2021a ) designs a two-stream encoder that independently processes the temporal neighborhoods associated with the two target interaction nodes using a graph-topology-aware Transformer and then integrates them at a semantic level through a co-attentional Transformer.

Our approach utilizes a single hierarchical encoder model to better capture the temporal information in the network while simultaneously updating the node embedding on the task. The model training and node embedding learning is performed end-to-end.

2.3 Active curriculum learning

Active learning (AL) has been well-explored for vision and learning tasks (Settles 2012 ). However, most of the classical techniques rely on single-instance-oracle strategies, wherein, during each training round, a single instance with the highest utility is selected using measures such as uncertainty sampling (Kumari et al. 2020 ), expected gradient length (Ash et al. 2020 ), or query by committee (Gilad-Bachrach et al. 2006 ). The single-instance-oracle approach becomes computationally infeasible with large training datasets such as ours. To address these challenges, several batch-mode active learning methods have been proposed (Priyadarshini et al. 2021 ; Kirsch et al. 2019 ; Pinsler et al. 2019 ). Priyadarshini et al. ( 2021 ) propose a method for batch active metric learning, which enables sampling of informative and diverse triplet data for relative similarity ordering tasks. In order to prevent the selection of correlated samples in a batch, Kirsch et al. ( 2019 ); Pinsler et al. ( 2019 ) develop distinct methods that integrate mutual information into the utility function. All three approaches demonstrate effectiveness in sampling diverse batches of informative samples for metric learning and classification tasks. However, none of these approaches can be readily extended to our specific task of hypothesis prediction on an entity-relationship graph.

Inspired by human learning, Bengio et al. (2009) introduced the concept of progressive training, wherein the model is trained on increasingly difficult training samples. Various prior works have proposed different measures to quantify the difficulty of training examples. Hacohen and Weinshall (2019) introduced curriculum learning by transfer, where they developed a score function based on the prediction confidence of a pre-trained model. Wang et al. (2021b) proposed a curriculum learning approach specifically for graph classification tasks. Another interesting work, relational curriculum learning (RCL) (Zhang et al. 2023), suggests training the model progressively on increasingly complex samples. Unlike most prior work, which typically considers data to be independent, RCL quantifies the difficulty level of an edge by aggregating the embeddings of the neighboring nodes. While their approach utilizes relational data similar to ours, their method does not specifically tackle the challenges inherent to the PU learning setting, which involves sampling both edges and unobserved relationships from the training data. In contrast, our proposed method introduces an incremental training strategy that progressively trains the model by focusing on positive edges of increasing difficulty, as well as incorporating highly informative and diverse negative edges.

Figure 3: Schematic representation of the proposed model for temporal node-pair link prediction. In (a), the hierarchical graph transformer model takes as input the aggregated node pair embeddings obtained at each time step \(\tau\); these temporal node pair embeddings are further encoded and aggregated at each encoder layer. The final output is the generalized node pair embedding across all time steps. In (b), a general overview of the model is given, highlighting the incorporation of the Active Curriculum Learning strategy.

3 Methodology

3.1 Notation

\(G = \{G_0, \dots , G_T\}\) is a temporal graph such that \(G_\tau = \{V^\tau , E^\tau \}\) evolves over time \(\tau =0\dots T\) ,

\(e(v_i, v_j)\) or \(e_{ij}\) is used to denote the edge between nodes \(v_i\) and \(v_j\) , and \((v_i, v_j)\) is used to denote the node pair corresponding to the edge,

\(y_{i,j}\) is the label associated with the edge \(e(v_i,v_j)\) ,

\({{\mathcal {N}}}^{\tau }(v)\) gives the neighborhood of a node v in \(V^\tau\),

\(x_{v}\) is the embedding of a node v and is static across time steps,

\(z_{i,j}^{\tau }\) is the embedding of a node pair \(\langle v_i, v_j \rangle\) . It depends on the neighborhood of the nodes at a time step \(\tau\) ,

\(h_{i,j}^{[\tau _0,\tau _f]}\) is the embedding of a node pair over a time step window \(\tau _0, \dots , \tau _f\) where \(0 \le \tau _0 \le \tau _f \le T\) ,

\(f(.; \theta )\) is a neural network depending on a set of parameters \(\theta\) . For brevity, \(\theta\) can be omitted if it is clear from the context.

\(E^+\) and \(E^-\) are the subsets of positive and negative edges, denoting observed and non-observed connections between biomedical concepts, respectively.

L is the number of encoder layers in the proposed model.

Algorithm 1: Hierarchical Node-Pair Embedding \(h_{i,j}^{[\tau _0,\tau _f]}\)

Algorithm 2: Link Prediction

3.2 Model overview

The whole THiGER(-A) model is shown in Fig. 3b. Let \(v_i, v_j \in V^T\) be nodes denoting two concepts. The pair is assigned a positive label \(y_{i,j} = 1\) if a corresponding edge (i.e., a link) is observed in \(G_T\). That is, \(y_{i,j} = 1\) iff \(e(v_i, v_j) \in E^{T}\), and \(y_{i,j} = 0\) otherwise. The model predicts a score \(p_{i,j}\) that reflects \(y_{i,j}\). The prediction procedure is presented in Algorithm 2.

The link prediction score is given by a neural classifier \(p_{i,j} = f_C(h_{i,j}^{[0,T]}; \theta _C)\) , where \(h_{i,j}^{[0,T]}\) is an embedding vector for the node pair. This embedding is calculated in Algorithm 1 using a hierarchical transformer encoder and illustrated in Fig.  3 a.

The input to the hierarchical encoder layer is the independent local node pair embedding aggregation at each time step, shown in line 3 of Algorithm 1 as

\(z_{i,j}^{\tau } = f_A\big (x_{v_i}, {\textbf{x}}_{{{\mathcal {N}}}^{\tau }(v_i)}, x_{v_j}, {\textbf{x}}_{{{\mathcal {N}}}^{\tau }(v_j)}; \theta _A\big ),\)

where \({\textbf{x}}_{{{\mathcal {N}}}^{{\tau }}(v_{i})} = \{x_{v'}: v' \in {{\mathcal {N}}}^{{\tau }}(v_{i})\}\) and \({\textbf{x}}_{{{\mathcal {N}}}^{{\tau }}(v_{j})} = \{x_{v'}: v' \in {{\mathcal {N}}}^{{\tau }}(v_{j})\}\) are the embeddings of the neighbors of \(x_{v_{i}}\) and \(x_{v_{j}}\) at the given time step.

Subsequently, the local node pair embedding aggregations are processed by the aggregation layer illustrated in Fig. 3a and shown in line 10 of Algorithm 1. At each hierarchical layer, temporal node pair embeddings are calculated for a sub-window using

\(h_{i,j}^{[\tau -n,\tau ]} = f^l_E\big (h_{i,j}^{[\tau -n,\tau -\frac{n}{2}]}, h_{i,j}^{[(\tau -\frac{n}{2}) + 1,\tau ]};\theta ^l_E\big ),\)

where n represents the sub-window size. When necessary, we ensure an even number of leaves to aggregate by adding zero padding values \(H_\textrm{padding} = {\textbf{0}}_d\), where d is the dimension of the leaf embeddings. The entire encoder architecture is denoted as \(f_E = \{ f^l_E: l=1\dots L \}\).

In this work, the classifier \(f_C(.; \theta _C)\) is modeled using a multilayer perceptron network (MLP), \(f_A(.; \theta _A)\) is elaborated in Sect.  3.3 , and \(f_E(.;\theta _E)\) is modeled by a multilayer transformer encoder network, which is detailed in Sect.  3.4 .

3.3 Neighborhood aggregation

The neighborhood aggregation is modeled using GraphSAGE (Hamilton et al. 2017). GraphSAGE uses K layers to iteratively aggregate a node embedding \(x_{v}\) and its neighbor embeddings \({\textbf{x}}_{{{\mathcal {N}}}^{\tau }(v)} = \{x_{v'}: v' \in {{\mathcal {N}}}^{\tau }(v)\}\). \(f_A\) uses the GraphSAGE block to aggregate \((x_{v_{i}}, {\textbf{x}}_{{{\mathcal {N}}}^{\tau }(v_{i})})\) and \((x_{v_{j}}, {\textbf{x}}_{{{\mathcal {N}}}^{\tau }(v_{j})})\) in parallel, then merges the two aggregated representations using an MLP layer. In this paper, we explore three models based on the aggregation technique used at each iterative step of GraphSAGE.
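
To make the structure of \(f_A\) concrete, here is a minimal NumPy sketch, assuming a mean-aggregation GraphSAGE step: both endpoints are aggregated in parallel and the results are merged by an MLP layer. The function names (`aggregate_node`, `aggregate_pair`) and the toy weight matrices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate_node(x_v, neighbor_embs, W_self, W_neigh):
    """One GraphSAGE-style mean-aggregation step for a single node."""
    neigh_mean = neighbor_embs.mean(axis=0) if len(neighbor_embs) else np.zeros_like(x_v)
    return sigmoid(W_self @ x_v + W_neigh @ neigh_mean)

def aggregate_pair(x_i, neigh_i, x_j, neigh_j, params):
    """Illustrative f_A: aggregate both endpoints in parallel, then merge with an MLP layer."""
    b_i = aggregate_node(x_i, neigh_i, params["W_self"], params["W_neigh"])
    b_j = aggregate_node(x_j, neigh_j, params["W_self"], params["W_neigh"])
    merged = np.concatenate([b_i, b_j])
    return sigmoid(params["W_merge"] @ merged)   # plays the role of z_{i,j}^tau

# toy usage with random embeddings (d = 4)
rng = np.random.default_rng(0)
d = 4
params = {"W_self": rng.normal(size=(d, d)),
          "W_neigh": rng.normal(size=(d, d)),
          "W_merge": rng.normal(size=(d, 2 * d))}
z_ij = aggregate_pair(rng.normal(size=d), rng.normal(size=(3, d)),
                      rng.normal(size=d), rng.normal(size=(5, d)), params)
print(z_ij.shape)  # (4,)
```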

Mean Aggregation: This straightforward technique amalgamates neighborhood representations by computing element-wise means of each node's neighbors and subsequently propagating this information iteratively. For all nodes within the specified set:

\(\beta _{v}^{k} = \sigma \big (W^S \beta _{v}^{k-1} + W^N \cdot \textrm{mean}(\{\beta _{v'}^{k-1}: v' \in {{\mathcal {N}}}^{\tau }(v)\})\big )\)

Here, \(\beta _{v}^{k}\) denotes the aggregated vector at iteration k , and \(\beta ^{k-1}_{v}\) at iteration \(k-1\) . \(W^S\) and \(W^N\) represent trainable weights, and \(\sigma\) constitutes a sigmoid activation, collectively forming a conventional MLP layer.

GIN (Graph Isomorphism Networks): Arguing that traditional graph aggregation methods, like mean aggregation, possess limited expressive power, GIN introduces the concept of aggregating neighborhood representations as follows:

\(\beta _{v}^{k} = \textrm{MLP}\big ((1 + \epsilon ^{k}) \cdot \beta _{v}^{k-1} + \textstyle \sum _{v' \in {{\mathcal {N}}}^{\tau }(v)} \beta _{v'}^{k-1}\big )\)

In this formulation, \(\epsilon ^{k}\) governs the relative importance of the node compared to its neighbors at layer k and can be a learnable parameter or a fixed scalar.

Multi-head Attention: We introduce a multi-head attention-based aggregation technique. This method aggregates neighborhood representations by applying multi-head attention to the node and its neighbors at each iteration:

\(\beta _{v}^{k} = \phi \big (\beta _{v}^{k-1}, \{\beta _{v'}^{k-1}: v' \in {{\mathcal {N}}}^{\tau }(v)\}\big )\)

Here, \(\phi\) represents a multi-head attention function, as detailed in Vaswani et al. ( 2017 ).
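
The per-iteration update rules for the GIN and attention variants can be sketched as follows. The single-head attention function stands in for the multi-head function \(\phi\), and all weights, shapes, and function names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def mlp(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)          # small ReLU MLP

def gin_update(beta_v, beta_neighbors, eps, W1, W2):
    """GIN-style update: (1 + eps) * self plus the sum of neighbors, passed through an MLP."""
    return mlp((1.0 + eps) * beta_v + beta_neighbors.sum(axis=0), W1, W2)

def attention_update(beta_v, beta_neighbors, Wq, Wk, Wv):
    """Simplified single-head attention over {self} plus neighbors (stand-in for multi-head phi)."""
    keys = np.vstack([beta_v, beta_neighbors])
    q, K, V = Wq @ beta_v, keys @ Wk.T, keys @ Wv.T
    scores = K @ q / np.sqrt(len(q))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(1)
d = 4
beta_v, beta_neighbors = rng.normal(size=d), rng.normal(size=(3, d))
print(gin_update(beta_v, beta_neighbors, eps=0.1,
                 W1=rng.normal(size=(d, d)), W2=rng.normal(size=(d, d))).shape)
print(attention_update(beta_v, beta_neighbors,
                       *(rng.normal(size=(d, d)) for _ in range(3))).shape)
```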

3.3.1 Neighborhood definition

To balance performance and scalability considerations, we adopt the neighborhood sampling approach utilized in GraphSAGE to maintain a consistent computational footprint for each batch of neighbors. In this context, we employ a uniform sampling method to select a neighborhood node set of fixed size, denoted as \({{\mathcal {N}}}^{'}(v) \subset {{\mathcal {N}}}^{\tau }(v)\), from the original neighbor set at each step. This sampling procedure is essential as, without it, the memory and runtime complexity of a single batch becomes unpredictable and, in the worst-case scenario, reaches a prohibitive \({{\mathcal {O}}}(|V|)\), making it impractical for handling large graphs.
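
A minimal sketch of the fixed-size uniform neighbor sampler described above. Whether to sample with replacement when a neighborhood is smaller than the budget is an assumption made here for the sketch, not a detail taken from the paper.

```python
import numpy as np

def sample_neighborhood(neighbors, size, rng):
    """Uniformly sample a fixed-size neighbor set (with replacement if the
    neighborhood is smaller than `size`), keeping per-batch cost bounded."""
    neighbors = list(neighbors)
    if not neighbors:
        return []
    replace = len(neighbors) < size
    idx = rng.choice(len(neighbors), size=size, replace=replace)
    return [neighbors[i] for i in idx]

rng = np.random.default_rng(0)
print(sample_neighborhood(["a", "b", "c"], size=5, rng=rng))   # drawn with replacement
print(sample_neighborhood(range(100), size=5, rng=rng))        # 5 of 100 neighbors
```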

3.4 Temporal hierarchical multilayer encoder layer

The temporal hierarchical multilayer encoder is the fundamental component of our proposed model, responsible for processing neighborhood representations collected over multiple time steps, specifically \((z_{i,j}^{0}, z_{i,j}^{1}, \dots , z_{i,j}^{T})\) . These neighborhood representations are utilized to construct a hierarchical tree.

At the initial hierarchical layer, we employ an encoder, denoted as \(f_E^1\), to distill adjacent sequential local node-pair embeddings, represented as \((z_{i,j}^{\tau }, z_{i,j}^{\tau + 1})\), combining them into a unified embedding, denoted as \(h_{i,j}^{[\tau ,\tau +1]}\). In cases where the number of time steps is odd, a zero-vector dummy input is appended.

This process repeats at each hierarchical level l within the tree, with \(h_{i,j}^{[\tau -n,\tau ]} = f^l_E(h_{i,j}^{[\tau -n,\tau -\frac{n}{2}]}, h_{i,j}^{[(\tau -\frac{n}{2}) + 1,\tau ]};\theta ^l_E)\) . Each layer \(f_E^l\) consists of a transformer encoder block and may contain \(N - 1\) encoder sublayers, where \(N \ge 1\) . This mechanism can be viewed as an iterative knowledge aggregation process, wherein the model progressively summarizes the information from pairs of local node pair embeddings.

The output of each encoder layer, denoted as \(h_{i,j}^{[\tau _0,\tau _f]}\) , offers a comprehensive summary of temporal node pair information from time step \(\tau _0\) to \(\tau _f\) . Finally, the output of the last layer, \(h_{i,j}^{[0,T]}\) , is utilized for inferring node pair relationships.
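
The tree-shaped reduction can be sketched as below. The `encode_pair` average is only a stand-in for the transformer encoder layer \(f_E^l\), so the sketch illustrates the zero-padding and pairwise aggregation pattern rather than the actual model.

```python
import numpy as np

def encode_pair(left, right):
    """Stand-in for a transformer encoder layer f_E^l; here just an average,
    so only the tree-shaped reduction pattern is illustrated."""
    return (left + right) / 2.0

def hierarchical_reduce(z_steps, d):
    """Reduce per-time-step pair embeddings (z^0 ... z^T) to a single h^{[0,T]}."""
    level = list(z_steps)
    while len(level) > 1:
        if len(level) % 2 == 1:                 # zero-padding to an even number of leaves
            level.append(np.zeros(d))
        level = [encode_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

d, T = 4, 5
z_steps = [np.full(d, float(t)) for t in range(T)]  # toy embeddings for 5 time steps
h = hierarchical_reduce(z_steps, d)
print(h)
```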

3.5 Parameter learning

The trainable parts of the architecture are the weights and parameters of the neighborhood aggregator \(f_A\) , the transformer network \(f_E\) , the classifier \(f_C\) and the embedding representations \(\{x_{v_{}}: v \in V\}\) .

To obtain suitable representations, we employ a combination of supervised and contrastive loss functions on the output of the hierarchical encoder layer \(h_{i,j}^{[0,T]}\) . The contrastive loss function encourages the embeddings of positive (i.e. a link exists in \(E^T\) ) node pairs to be closer while ensuring that the embeddings of negative node pairs are distinct.

We adopt a contrastive learning framework (Chen et al. 2020) to distinguish between positive and negative classes. For brevity, we temporarily denote \(h_{i,j}^{[0,T]}\) as \(h_{i,j}\). Given two positive node pairs with corresponding embeddings \(e(v_i, v_j) \rightarrow h_{i,j}\) and \(e(v_o, v_n) \rightarrow h_{o,n}\), the loss function is defined as follows:

\(\ell (i,j;o,n) = -\log \frac{\exp (\textrm{sim}(h_{i,j}, h_{o,n})/\alpha )}{\sum _{(k,w) \in B} \mathbbm {1}_{(k,w) \ne (i,j)} \exp (\textrm{sim}(h_{i,j}, h_{k,w})/\alpha )},\)

where \(\alpha\) represents a temperature parameter, B is the set of node pairs in a given batch, and \(\mathbbm {1}_{(k,w) \ne (i,j)}\) indicates that the labels of node pairs (k, w) and (i, j) are different. We employ the angular similarity function \(\textrm{sim}(x)=1 - \arccos (x)/\pi\). We do not explicitly sample negative examples, following the methodology outlined in Chen et al. (2020).
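
A NumPy sketch of the angular-similarity contrastive term, assuming an NT-Xent-style denominator over the other pairs in the batch; the exact batching and indicator handling in the paper may differ, so this is an illustration rather than the paper's loss.

```python
import numpy as np

def angular_sim(a, b):
    """sim(x) = 1 - arccos(cosine(a, b)) / pi, as used in the paper."""
    cos = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return 1.0 - np.arccos(cos) / np.pi

def contrastive_loss(anchor, positive, batch, alpha=0.1):
    """SimCLR-style loss for one anchor: pull `positive` close, push the rest of `batch` away."""
    pos = np.exp(angular_sim(anchor, positive) / alpha)
    denom = sum(np.exp(angular_sim(anchor, other) / alpha) for other in batch)
    return -np.log(pos / denom)

rng = np.random.default_rng(0)
h_anchor, h_pos = rng.normal(size=8), rng.normal(size=8)
batch = [h_pos] + [rng.normal(size=8) for _ in range(6)]   # the positive plus other pairs
print(float(contrastive_loss(h_anchor, h_pos, batch)))
```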

The contrastive loss is summed over the positive training data \(E^+\) :

To further improve the discriminative power of the learned features, we also minimize the center loss:

\({{\mathcal {L}}}_{\textrm{center}} = \frac{1}{2} \sum _{e(v_i, v_j) \in E} \Vert h_{i,j} - c_{y_{i,j}} \Vert _2^2,\)

where E is the data of positive and negative edges, \(y_{i,j}\) is the class of the pair (0 or 1), \(c_{y_{{i,j}}} \in R^d\) denotes the corresponding class center. The class centers are updated after each mini-batch step following the method proposed in Wen et al. ( 2016 ).
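
A simplified sketch of the center loss and a mini-batch center update. The update rule here is a plain moving average toward the batch mean, an assumption for illustration rather than the exact rule of Wen et al. (2016).

```python
import numpy as np

def center_loss(embeddings, labels, centers):
    """Mean squared distance between each pair embedding and its class center."""
    diffs = embeddings - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def update_centers(embeddings, labels, centers, lr=0.5):
    """Move each class center toward the mean of its members in the mini-batch."""
    new_centers = centers.copy()
    for c in np.unique(labels):
        members = embeddings[labels == c]
        new_centers[c] += lr * (members.mean(axis=0) - centers[c])
    return new_centers

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
labels = np.array([0, 1, 0, 1, 1, 0])
centers = np.zeros((2, 4))
print(center_loss(emb, labels, centers))
centers = update_centers(emb, labels, centers)
```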

Finally, a good node pair vector \(h_{i,j}^{[0,T]}\) should minimize the binary cross entropy loss of the node pair prediction task:

\({{\mathcal {L}}}_{\textrm{pred}} = -\sum _{e(v_i, v_j) \in E} \big (y_{i,j} \log p_{i,j} + (1 - y_{i,j}) \log (1 - p_{i,j})\big ).\)

We adopt the joint supervision of the prediction loss, contrastive loss, and center loss to jointly train the model for discriminative feature learning and relationship inference:

As is usual, the losses are applied over subsets of the entire dataset. In this case, we have an additional requirement for pairs of nodes in \(E^-\) : at least one of the two nodes needs to appear in \(E^+\) . An elaborate batch sampling strategy is proposed in the following section. The model parameters are trained end to end.
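
As an illustration of the joint objective and of the constraint that a sampled negative pair must share at least one node with a positive edge, consider the sketch below. The loss weights `w_con` and `w_cen`, the node names, and the numeric values are hypothetical; the paper does not state its weighting here.

```python
def joint_loss(pred_loss, contrastive_loss, center_loss, w_con=1.0, w_cen=0.01):
    """Hypothetical weighted sum of the three objectives."""
    return pred_loss + w_con * contrastive_loss + w_cen * center_loss

def valid_negative(edge, positive_nodes):
    """Keep a sampled negative pair only if at least one endpoint appears in a positive edge."""
    v_i, v_j = edge
    return v_i in positive_nodes or v_j in positive_nodes

positive_nodes = {"flaxseed_oil", "root_caries", "soybean_oil"}
print(joint_loss(0.62, 1.35, 0.08))
print(valid_negative(("flaxseed_oil", "benzene"), positive_nodes))   # True
print(valid_negative(("quartz", "benzene"), positive_nodes))         # False
```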

Algorithm 3: Training Procedure in THiGER-A

3.6 Incremental training strategy

This section introduces the incremental training strategy THiGER-A , which extends our base THiGER model. The pseudo-code for THiGER-A is presented in Algorithm 3. We represent the parameters used in the entire architecture as \(\varvec{\theta }= (\theta _A, \theta _E, \theta _C)\) . Let \(P(y \mid e_{i,j}; \varvec{\theta })\) , where \(y\in \{0,1\}\) , denote the link predictor for the nodes \((v_i, v_j)\) . Specifically, in shorthand, we denote \(P(y=1 \mid e_{i,j};\varvec{\theta })\) by \(p_{i,j}\) as in line 3 of Algorithm 2, likewise \(P(y=0\mid e_{i,j}; \varvec{\theta }) = 1 - p_{i,j}\) .

We define \(E^- = (V \times V) \setminus E\) as the set of negative edges representing non-observed connections in the graph. The size of the negative set grows quadratically with the number of nodes, resulting in a computational complexity of \({{\mathcal {O}}}(|V|^2)\) . For large, sparse graphs like ours, the vast number of negative edges makes it impractical to use all of them for model training.

Randomly sampling negative examples may introduce noise and hinder training convergence. To address this challenge, we propose an approach to sample a smaller subset of “informative” negative edges that effectively capture the entity relationships within the graph. Leveraging active learning, a technique for selecting high-utility datasets, we aim to choose a subset \(B^*_N \subset E^-\) that leads to improved model learning.

3.6.1 Negative edge sampling using active learning

Active learning (AL) is an iterative process centered around acquiring a high-utility subset of samples and subsequently retraining the model. The initial step involves selecting a subset of samples with high utility, determined by a specified informativeness measure. Once this subset is identified, it is incorporated into the training data, and the model is subsequently retrained. This iterative cycle, involving sample acquisition and model retraining, aims to improve the model’s performance and generalizability through the learning process.

In this context, we evaluate the informativeness of edges using a score function denoted as \(S_{AL}: (v_{i}^{-}, v_{j}^{-}) \rightarrow {\mathbb {R}}\) . An edge \((v_{i}^{-}, v_{j}^{-})\) is considered more informative than \((v_{k}^{-}, v_{l}^{-})\) if \(S_{AL}(v_{i}^{-}, v_{j}^{-}) > S_{AL}(v_{k}^{-}, v_{l}^{-})\) . The key challenge in AL lies in defining \(S_{AL}\) , which encodes the learning of the model \(P(.;\varvec{\theta })\) trained in the previous iteration.

We gauge the informativeness of an edge based on model uncertainty. An edge is deemed informative when the current model \(P(.;\varvec{\theta })\) exhibits high uncertainty in predicting its label. Uncertainty sampling is one of the most popular choices for the quantification of informativeness due to its simplicity and high effectiveness in selecting samples for which the model lacks sufficient knowledge. Similar to various previous techniques, we use Shannon entropy to approximate informativeness (Priyadarshini et al. 2021; Kirsch et al. 2019). It is important to emphasize that ground truth labels are unavailable for negative edges, which represent unobserved entity connections. Therefore, to estimate the informativeness of negative edges, we calculate the expected Shannon entropy across all possible labels. Consequently, the expected entropy for a negative edge \((v_{i}^{-}, v_{j}^{-})\) at the \(m^{th}\) training round is defined as:

\(S_{AL}(v_{i}^{-}, v_{j}^{-}) = -\sum _{y \in \{0, 1\}} P(y \mid e_{i,j}^{-}; \varvec{\theta }^{m-1}) \log P(y \mid e_{i,j}^{-}; \varvec{\theta }^{m-1}) \qquad (12)\)

Here, \(\varvec{\theta }^{m-1}\) is the base hypothesis predictor model trained at the \((m-1)^{th}\) training round and \(m = 0, 1, \cdots , M\) denotes the AL training round. Selecting a subset of uncertain edges \(B_{U}\) using Eq. 12 unfortunately does not ensure diversity within the selected subset. The diversity metric is crucial in subset selection as it encourages the selection of diverse samples within the embedding space. This, in turn, results in a higher cumulative informativeness for the selected subset, particularly when the edges exhibit overlapping features. The presence of highly correlated edges in the selected subset can lead to a sub-optimal batch with high redundancy. The importance of diversity in selecting informative edges has been emphasized in several prior works (Kirsch et al. 2019; Priyadarshini et al. 2021). To obtain a diverse subset, both approaches aim to maximize the joint entropy (and consequently, minimize mutual information) among the samples in the selected batch. However, maximizing joint entropy is an expensive combinatorial optimization problem and does not scale well for larger datasets, as in our case.
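A small sketch of uncertainty scoring for candidate negative edges, assuming the previous-round model exposes \(P(y=1 \mid e)\) as a probability; the edge names and scores below are toy values.

```python
import numpy as np

def expected_entropy(p_pos):
    """Shannon entropy of the predicted label distribution for a candidate
    negative edge; p_pos = P(y=1 | edge) from the previous-round model."""
    p = np.clip(np.asarray(p_pos, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def top_uncertain_edges(edges, p_pos, k):
    """Return the k candidate edges the current model is most uncertain about."""
    scores = expected_entropy(p_pos)
    order = np.argsort(-scores)[:k]
    return [edges[i] for i in order], scores[order]

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
p_pos = [0.93, 0.48, 0.07, 0.55]           # model scores for the candidate negatives
print(top_uncertain_edges(edges, p_pos, k=2))
```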

We adopt a similar approach to Kumari et al. (2020) and utilize the k-means++ algorithm (Arthur and Vassilvitskii 2006) to cluster the selected batch \(B_U\) into diverse landmark points. While Kumari et al. (2020) is tailored for metric learning tasks with triplet samples as inputs, our adaptation of the k-means++ algorithm is designed for graph datasets, leading to the selection of diverse edges within the gradient space. Although diversity in the gradient space is effective for gradient-based optimizers, a challenge arises due to the high dimensionality of the gradient space, particularly when the model is large. To overcome this challenge, we compute the expected gradient of the loss function with respect to only the penultimate layer of the network, \(\nabla _{\theta _{out}}{{\mathcal {L}}}_{e_{ij}^{-}}\), assuming it captures task-specific features. We begin to construct an optimal subset \(B_{N}^{*} \subset B_{U}\) by initially (say, at \(k=0\)) selecting the two edges with the most distinct gradients. Subsequently, we iteratively select the edge whose gradient is most dissimilar from those of the selected subset using the maxmin optimization objective defined in Eq. 13:

\(B_{N}^{*} \leftarrow B_{N}^{*} \cup \Big \{ \mathop {\mathrm {arg\,max}}\limits _{e_{ij}^{-} \in B_{U} \setminus B_{N}^{*}} \; \min _{e_{kl}^{-} \in B_{N}^{*}} d_{E}\big (\nabla _{\theta _{out}}{{\mathcal {L}}}_{e_{ij}^{-}}, \nabla _{\theta _{out}}{{\mathcal {L}}}_{e_{kl}^{-}}\big ) \Big \} \qquad (13)\)

Here \(d_{E}\) represents the Euclidean distance between two vectors in the gradient space, consisting of \(\nabla _{\theta _{out}}{{\mathcal {L}}}_{e_{ij}^{-}}\), which denotes the gradient of the loss function \({{\mathcal {L}}}\) with respect to the penultimate layer of the network \(\theta _{out}\). The process continues until we reach the allocated incremental training budget, \(|B_{N}^{*}| = K\). The resulting optimal subset of negative edges, \(B_{N}^{*}\), comprises negative edges that are both diverse and informative.
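
A greedy farthest-point (max-min) selection over penultimate-layer gradients can be sketched as follows. It approximates the objective in Eq. 13 but is not necessarily the exact k-means++-based procedure used in the paper; the gradient matrix below is random toy data.

```python
import numpy as np

def maxmin_select(gradients, k):
    """Greedy max-min selection: start from the two most distant gradient vectors,
    then repeatedly add the candidate farthest from its nearest selected point."""
    grads = np.asarray(gradients)
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    selected = [int(i), int(j)]
    while len(selected) < k:
        min_to_sel = dists[:, selected].min(axis=1)
        min_to_sel[selected] = -np.inf          # never re-pick an already selected edge
        selected.append(int(np.argmax(min_to_sel)))
    return selected

rng = np.random.default_rng(0)
penultimate_grads = rng.normal(size=(20, 8))    # loss gradients w.r.t. the penultimate layer
print(maxmin_select(penultimate_grads, k=5))
```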

3.6.2 Positive edge sampling

Inspired by Curriculum Learning (CL), a technique mimicking certain aspects of human learning, we investigate its potential to enhance the performance and generalization of the node pair predictor model. Curriculum Learning involves presenting training data to the model in a purposeful order, starting with easier examples and gradually progressing to more challenging ones. We hypothesize that applying CL principles can benefit our node pair predictor model. By initially emphasizing the learning of simpler connections and leveraging prior knowledge, the model can effectively generalize to more complex connections during later stages of training. Although Active Learning (AL) and CL both involve estimating the utility of training samples, they differ in their approach to label availability. AL operates in scenarios where labels are unknown and estimates sample utility based on expected scores. In contrast, CL uses known labels to assess sample difficulty. For our model, we use one of the common approaches to define a difficulty score \(S_{CL}\) based on the model's prediction confidence; higher prediction confidence indicates an easier sample:

\(S_{CL}(v_{i}, v_{j}) = 1 - P(y = 1 \mid e_{i,j}; \varvec{\theta }^{m-1}).\)

Here, \(S_{CL}(v_{i}, v_{j})\) indicates the predictive uncertainty of an edge \(e_{i,j}\) being positive under the existing model \(\varvec{\theta }^{m-1}\) trained at the \((m-1)^{th}\) iteration. In summary, for hypothesis prediction using a large training dataset, Active Curriculum Learning provides a natural approach to sample an informative and diverse subset of high-quality samples, helping to alleviate the challenges associated with label bias.
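
A minimal sketch of easy-to-hard ordering of positive edges, assuming the difficulty score is one minus the previous-round confidence; the edge names and confidence values below are toy examples.

```python
import numpy as np

def curriculum_order(edges, p_pos):
    """Order positive training edges from easy to hard: higher model confidence
    P(y=1 | edge) from the previous round means an easier example."""
    difficulty = 1.0 - np.asarray(p_pos)        # assumed difficulty score: 1 - confidence
    return [edges[i] for i in np.argsort(difficulty)]

positive_edges = [("flaxseed_oil", "root_caries"),
                  ("benzoxazinoid", "gingelly_oil"),
                  ("senile_osteoporosis", "soybean_oil")]
p_pos = [0.91, 0.42, 0.67]                      # hypothetical confidences from the trained model
print(curriculum_order(positive_edges, p_pos))
```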

4 Experimental setup

In this section, we present the experimental setup for our evaluation. We compare our proposed model, THiGER(-A), against several state-of-the-art (SOTA) methods to provide context for the empirical results on benchmark datasets. To ensure fair comparisons, we utilize publicly available baseline implementations and modify those as needed to align with our model’s configuration and input requirements. All experiments were conducted using Python. For the evaluation of the interaction datasets, we train all models on a single NVIDIA A10G GPU. In the case of the food-related biomedical dataset, we employ 4 NVIDIA V100 GPUs for model training. Notably, all models are trained on single machines. In our experiments, we consider graphs as undirected. The node attribute embedding dimension is set to \(d=128\) for all models evaluated. For baseline methods, we performed a parameter search on the learning rate and training steps, and we report the best results achieved. Our model is implemented in TensorFlow.

4.1 Datasets and model setup

Table 1 shows the statistics of the datasets used in this study. Unless explicitly mentioned, all methods, including our model, share the same initial node attributes provided by pretrained Node2Vec (Grover and Leskovec 2016). The pre-trained Node2Vec embedding effectively captures the structural information of nodes in the training graph. In our proposed framework, the choice of a fixed node embedding is to enable the model to capture the temporal evolution of network relations, given that the node embeddings are in the same vector space. While employing a dynamic node embedding framework may enhance results, it introduces complexities associated with aligning vector spaces across different timestamps. This aspect is deferred to future research. It is important to note that the Node2Vec embeddings serve solely as initializations for the embedding layer, and the embedding vectors undergo fine-tuning during the learning process to further capture the dynamic evolution of node relationships. For models that solely learn embedding vectors for individual nodes, we represent the \(h_{i,j}\) of a given node pair as the concatenation of the embedding vectors for nodes \(\langle x_i, x_j \rangle\).

4.1.1 Interaction datasets

We have restructured the datasets to align with our specific use case. We partition the edges in the temporal graphs into five distinct groups based on their temporal labels. For example, if a dataset is labeled up to 500 time units, we reorganize them as follows: \(\{0, \dots , 100\} \rightarrow 0\) , \(\{101, \dots , 200\} \rightarrow 1\) , \(\{201, \dots , 300\} \rightarrow 2\) , \(\{301, \dots , 400\} \rightarrow 3\) , and \(\{401, \dots , 500\} \rightarrow 4\) . These User-Item based datasets create bipartite graphs. For all inductive evaluations, we assume knowledge of three nearest node neighbors for each of the unseen nodes. Neighborhood information is updated after model training to incorporate this knowledge, with zero vectors assigned to new nodes.
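
The temporal re-binning described above can be expressed as a one-line mapping; the bin width of 100 follows the example in the text.

```python
def rebin_timestamp(t, bin_width=100):
    """Map raw interaction times into coarse temporal groups, e.g. 1-100 -> 0, 101-200 -> 1."""
    return max(0, (int(t) - 1) // bin_width)

print([rebin_timestamp(t) for t in (37, 100, 101, 250, 500)])   # [0, 0, 1, 2, 4]
```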

4.1.2 Food-related biomedical temporal datasets

To construct the relationship graph, we extract sentences containing predefined entities (Genes, Diseases, Chemical Compounds, Nutrition, and Food Ingredients). We establish connections between two concepts that appear in the same sentence within any publication in the dataset. The time step for each relationship between concept pairs corresponds to the publication year when the first mention was identified (i.e., the oldest publication year among all the publications where the concepts are associated). We generate three datasets for evaluation based on concept pair domains: \(\langle\)Ingredient, Disease\(\rangle\) pairs, \(\langle\)Ingredient, Chemical Compound\(\rangle\) pairs, and all pairs (unfiltered). Graph statistics are provided in Table 1. For the training and testing sets, we divide the graph into 10-year intervals starting from 1940 (i.e., {\(\le 1940\)}, {1941–1950}, \(\dots\), {2011–2020}). The splits \(\le\) 2020 are used for training, and the split {2021–2022} is used for testing. In accordance with the problem configuration in the interaction dataset, we update the neighborhood information and also assume knowledge of three nearest node neighbors for each of the unseen nodes in inductive evaluations.

4.1.3 Model setup & parameter tuning

Model Configuration: We employ a hierarchical encoder with \(N\lceil \log _{2} T \rceil\) encoder layers, where N is the number of encoder layers per hierarchical level (each level consists of a transformer encoder block with \(N-1\) additional encoder sublayers) and T represents the number of time steps input to the hierarchical encoder. In our experiments, we set \(N=2\). We use 8 attention heads with 128-dimensional states. For the position-wise feed-forward networks, we use 512-dimensional inner states. For the activation function, we apply the Gaussian Error Linear Unit (GELU, Hendrycks and Gimpel 2016). We apply dropout (Srivastava et al. 2014) to the output of each sub-layer with a rate of \(P_{drop} = 0.1\).

Optimizer: Our models are trained using the AdamW optimizer (Loshchilov and Hutter 2017 ), with the following hyper-parameters: \(\beta _1 = 0.9\) , \(\beta _2 = 0.99\) , and \(\epsilon = 10^{-7}\) . We use a linear decay of the learning rate. We set the number of warmup steps to \(10\%\) of the number of train steps. We vary the learning rate with the size of the training data.
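
A plain-Python sketch of the learning-rate schedule described above (linear warmup over the first 10% of steps, then linear decay to zero); the peak learning rate and the step counts are placeholders, since the paper varies them with the size of the training data.

```python
def lr_schedule(step, total_steps, peak_lr, warmup_frac=0.1):
    """Linear warmup to peak_lr over the first fraction of steps, then linear decay to zero."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return peak_lr * (step + 1) / max(1, warmup_steps)
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, remaining))

total = 1000
print([round(lr_schedule(s, total, peak_lr=1e-4), 7) for s in (0, 50, 100, 500, 999)])
```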

Time Embedding: We use Time2Vec (T2V, Kazemi et al. 2019 ) to generate time-step embeddings which encode the temporal sequence of the time steps. The T2V model is learned and updated during the model training.

Active learning: The size of the subset \(B_U\) is twice the size of the optimal subset \(B_{N}^{*}\). The model undergoes seven training rounds for the Wikipedia, Reddit, and LastFM datasets, while it is trained for only three rounds on the food-related biomedical datasets (All, Ingredient-Disease, Ingredient-Chemical) due to their large size. However, we anticipate that increasing the number of training rounds will lead to further improvements in performance.

4.2 Evaluation metrics

In this study, we assess the efficacy of the models by employing the binary F1 score and the average precision score (AP) as the performance metrics. The binary F1 score is defined as the harmonic mean of precision and recall, represented by the formula:

\(F_1 = \frac{2 \cdot \textrm{precision} \cdot \textrm{recall}}{\textrm{precision} + \textrm{recall}}\)

Here, precision denotes the ratio of true positive predictions to the total predicted positives, while recall signifies the ratio of true positive predictions to the total actual positives.

The average precision score is the weighted mean of precisions achieved at different thresholds, using the incremental change in recall from the previous threshold as the weight:

\(AP = \sum _{k=1}^{N} P_{k} \, \Delta R_{k},\)

where N is the total number of thresholds, \(P_{k}\) is the precision at cut-off k , and \(\Delta R_{k} = R_{k} - R_{k - 1}\) is a sequential change in the recall value. Our emphasis on positive predictions in the evaluations is driven by our preference for models that efficiently forecast future connections between pairs of nodes.
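
Both metrics are available off the shelf, for instance with scikit-learn; the labels and scores below are toy values.

```python
from sklearn.metrics import f1_score, average_precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                      # ground-truth link labels
y_score = [0.9, 0.4, 0.75, 0.3, 0.2, 0.55, 0.8, 0.1]   # predicted link probabilities
y_pred = [int(s >= 0.5) for s in y_score]              # hard decisions at a 0.5 threshold

print("F1:", f1_score(y_true, y_pred))
print("AP:", average_precision_score(y_true, y_score))
```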

4.3 Method categories

We categorize the methods into two main groups based on their handling of temporal information:

Static Methods: These methods treat the graph as static data and do not consider the temporal aspect. The static methods under consideration include the Logistic regression model, GraphSAGE (Hamilton et al. 2017 ), and AGATHA (Sybrandt et al. 2020 ).

Temporal Methods: These state-of-the-art methods leverage temporal information to create more informative node representations. We evaluate the performance of our base model, THiGER, and the final model, THiGER-A, against the following temporal methods: CTDNE (Nguyen et al. 2018 ), TGN (Rossi et al. 2020 ), JODIE (Kumar et al. 2019 ), TNodeEmbed (Singer et al. 2019 ), DyRep (Trivedi et al. 2019 ), T-PAIR (Akujuobi et al. 2020b ), and TDE (Zhou et al. 2022 ).

5 Experiments

The performance of THiGER-A is rigorously assessed across multiple benchmark datasets, as presented in Tables 2 and 3 . The experimental evaluations are primarily geared toward two distinct objectives:

Assessing the model’s effectiveness in handling interaction datasets pertinent to temporal graph problems.

Evaluating the model’s proficiency in dealing with food-related biomedical datasets, specifically for predicting relationships between food-related concepts and other biomedical terms.

In Sects.  4.1.1 and 4.1.2 , a comprehensive overview of the used datasets is provided. Our evaluations encompass two fundamental settings:

Transductive setup: This scenario involves utilizing data from all nodes during model training.

Inductive setup: In this configuration, at least one node in each evaluated node pair has not been encountered during the model’s training phase.

These experiments are designed to rigorously assess THiGER-A’s performance across diverse datasets, offering insights into its capabilities under varying conditions and problem domains.

5.1 Quantitative evaluation: interaction temporal datasets

We assess the performance of our proposed model in the context of future interaction prediction (Rossi et al. 2020 ; Kumar et al. 2019 ). The datasets record interactions between users and items.

We evaluate the performance on three distinct datasets: (i) Reddit, (ii) LastFM, and (iii) Wikipedia, considering both transductive and inductive settings. In the transductive setting, THiGER-A outperforms other models across all datasets, except Wikipedia, where AGATHA exhibits significant superiority. Our analysis reveals that AGATHA’s advantage lies in its utilization of the entire graph for neighborhood and negative sampling, which gives it an edge over models using a subset of the graph due to computational constraints. This advantage is more evident in the transductive setup since AGATHA’s training strategy leans towards seen nodes. Nevertheless, THiGER-A consistently achieves comparable or superior performance even in the presence of AGATHA’s implicit bias. It is imperative to clarify that AGATHA was originally designed for purposes other than node-pair predictions. Nonetheless, we have adapted the algorithm to align with the node-pair configuration specifically for our research evaluations.

In the inductive setup, our method excels in the Wikipedia and Reddit datasets but lags behind some baselines in the LastFM dataset. Striking a balance between inductive and transductive performance, THiGER-A’s significant performance gain over THiGER underscores the effectiveness of the proposed incremental learning strategy. This advantage is particularly pronounced in the challenging inductive test setting.

5.2 Quantitative evaluation: food-related biomedical temporal datasets

This section presents the quantitative evaluation of our proposed model on temporal node pair (or "link") prediction, specifically focusing on food-related concept relationships extracted from scientific publications in the PMC dataset. The evaluation encompasses concept pairs from different domains, including \(\langle\)Ingredient, Disease\(\rangle\) pairs (referred to as F-ID), \(\langle\)Ingredient, Chemical Compound\(\rangle\) pairs (F-IC), and all food-related pairs (F-A). The statistical characteristics of the dataset are summarized in Table 1.

Table  3 demonstrates that our model outperforms the baseline models in both inductive and transductive setups. The second-best performing model is AGATHA, which, as discussed in the previous section, exhibits certain advantages over alternative methods. It is noteworthy that the CTDNE method exhibits scalability issues with larger datasets.

An intriguing observation from this evaluation is that, aside from our proposed model, static methods outperform temporal methods on this dataset. Further investigation revealed that the data is more densely distributed toward the later time steps. Notably, a substantial increase in information occurs during the last time steps. Up to the year 2000, the average number of edges per time step is approximately 100,000. However, this number surges to about 1 million in the time window from 2001 to 2010, followed by another leap to around 4 million in the 2011–2020 time step. This surge indicates a significant influx of knowledge in food-related research in recent years.

We hypothesize that while this influx is advantageous for static methods, it might adversely affect some temporal methods due to limited temporal information. To test this hypothesis, we conduct an incremental evaluation, illustrated in Fig.  4 , using two comparable link prediction methods (Logistic Regression and GraphSAGE) and the two best temporal methods (tNodeEmbed and THiGER). In this evaluation, we incrementally assess the transductive performance on testing pairs up to the year 2000. Specifically, we evaluate the model performance on the food dataset (F-A) in the time intervals 1961–1970 by using all available training data up to 1960, and similarly for subsequent time intervals.

From Fig.  4 , it is evident that temporal methods outperform static methods when the temporal data is more evenly distributed, i.e., when there is an incremental increase in temporal data. The sudden exponential increase in data during the later years biases the dataset towards the last time steps. However, THiGER consistently outperforms the baseline methods in the incremental evaluation, underscoring its robustness and flexibility.

Figure 4: Transductive F1 score of incremental prediction (per year) made by THiGER and three other baselines. The models are incrementally trained with data before the displayed evaluation time window.

5.3 Ablation study

In this section, we conduct an ablation study to assess the impact of various sampling strategies on the base model’s performance. The results are presented in Table  4 , demonstrating the performance improvements achieved by the different versions of the THiGER model (-mean, -gin and -attn) for each dataset. Due to the much larger size of the food-related biomedical dataset, we conduct the ablation study only for the baseline datasets.

First, we investigate the influence of the active learning (AL)-based negative sampler on the base THiGER model. A comparison of the model’s performance with and without the AL-based negative sampler reveals significant improvements across all datasets. Notably, the performance gains are more pronounced in the challenging inductive test cases where at least one node of an edge is unseen in the training data. This underscores the effectiveness and generalizability of the AL-based learner for the hypothesis prediction model in the positive-unlabeled (PU) learning setup.

Next, we integrate curriculum learning (CL) as a positive data sampler, resulting in further enhancements to the base model. Similar to the AL-based sampling, the performance gains are more pronounced in the inductive test setting. The relatively minor performance improvement in the transductive case may be attributed to the limited room for enhancement in that specific context. Nevertheless, both AL alone and AL combined with CL enhance the base model’s performance and generalizability, particularly in the inductive test scenario.

Figure 5: Pair embedding visualization. Blue denotes true negative samples, red points are false negatives, green points are true positives, and purple points are false positives.

5.4 Pair embedding visualization

In this section, we conduct a detailed analysis of the node pair embeddings generated by THiGER using the F-ID dataset. To facilitate visualization, we randomly select 900 pairs and employ t-SNE (Van der Maaten and Hinton 2008 ) to compare these embeddings with those generated by Node2Vec, as shown in Fig.  5 . We employ color-coding to distinguish between the observed labels and the predicted labels. Notably, we observe distinct differences in the learned embeddings. THiGER effectively separates positive and negative node pairs in the embedding space. True positives (denoted in green) and true negatives (denoted in blue) are further apart in the embedding space, while false positives (indicated in red) and false negatives (shown in purple) occupy an intermediate region. This observation aligns with the idea that unknown connections are not unequivocal in our application domain, possibly due to missing data or discoveries yet to be made.

5.5 Case study

To assess the predictive accuracy of our model, we conducted a detailed analysis using the entire available food-related biomedical temporal dataset. We collaborated with biologists to evaluate the correctness of the generated hypotheses. Rather than providing binary predictions (1 or 0), we take a probabilistic approach by assigning a probability score within the range of 0 to 1. This score reflects the likelihood of a connection existing between the predicted node pairs. Consequently, ranking a set of relation predictions associated with a specific node is tantamount to ranking the corresponding predicted probabilities.

Using this methodology, we selected 402 node pairs and presented them to biomedical researchers for evaluation. The researchers sought hypotheses related to specific oils. Subsequently, we generated hypotheses representing potential future connections between the oil nodes and other nodes, resulting in a substantial list. Given the anticipated extensive list, we implemented a filtering process based on the associated probability scores. This enabled us to selectively identify predictions with high probabilities, which were then communicated to the biomedical researchers for evaluation. The evaluation encompassed two distinct approaches.

First, they conducted manual searches for references to the predicted positive node pairs in various biology texts, excluding our dataset. Their findings revealed relationships in 70 percent of the node pairs through literature searches and reviews.

Secondly, to explore cases where no direct relationship was apparent in existing literature, they randomly selected and analyzed three intriguing node pairs: (i) Flaxseed oil and Root caries , (ii) Benzoxazinoid and Gingelly oil , and (iii) Senile osteoporosis and Soybean oil .

5.5.1 Flaxseed oil and root caries

Root caries refers to a dental condition characterized by the decay and demineralization of tooth root surfaces. This occurs when tooth roots become exposed due to gum recession, allowing bacterial invasion and tooth structure erosion. While the scientific literature does not explicitly mention the use of flaxseed oil for root caries, it is well-established that flaxseed oil possesses antibacterial properties (Liu et al. 2022 ). These properties may inhibit bacterial species responsible for root caries. Furthermore, flaxseed oil is a rich source of omega-3 fatty acids and lignans, factors potentially relevant to this context. Interestingly, observational studies are investigating the oil’s effects on gingivitis (Deepika 2018 ).

5.5.2 Benzoxazinoid and gingelly oil

Benzoxazinoids are plant secondary metabolites synthesized in many monocotyledonous species and some dicotyledonous plants (Schullehner et al. 2008 ). Gingelly oil, derived from sesame seeds, originates from a dicotyledonous plant. In the biologists’ opinion, this concurrence suggests a valid basis for the hypothesized connection.

5.5.3 Senile osteoporosis and soybean oil

Senile osteoporosis is a subtype of osteoporosis occurring in older individuals due to age-related bone loss. Soybean oil, a common vegetable oil derived from soybeans, contains phytic acid (Anderson and Wolf 1995). Phytic acid is known to inhibit the absorption of certain minerals, including calcium, which is essential for bone strength (Lönnerdal et al. 1989). Again, in the experts' opinion, this suggests a valid basis for an (unfortunately detrimental) connection between the oil and the health condition.

6 Conclusions

We introduce an innovative approach to tackle the hypothesis generation problem within the context of temporal graphs. We present THiGER, a novel transformer-based model designed for node pair prediction in temporal graphs. THiGER leverages a hierarchical framework to effectively capture and learn from temporal information inherent in such graphs. This framework enables efficient parallel temporal information aggregation. We also introduce THiGER-A, an incremental training strategy that enhances the model’s performance and generalization by training it on high-utility samples selected through active curriculum learning, particularly benefiting the challenging inductive test setting. Quantitative experiments and analyses demonstrate the efficiency and robustness of our proposed method when compared to various state-of-the-art approaches. Qualitative analyses illustrate its practical utility.

For future work, an enticing avenue involves incorporating additional node-pair relationship information from established biomedical and/or food-related knowledge graphs. In scientific research, specific topics often experience sudden exponential growth, leading to temporal data distribution imbalances. Another intriguing research direction, thus, is the study of the relationship between temporal data distribution and the performance of temporal graph neural network models. We plan to analyze the performance of several temporal GNN models across diverse temporal data distributions and propose model enhancement methods tailored to such scenarios.

Due to the vast scale of the publication graph, training the hypothesis predictor with all positive and negative edges is impractical and limits the model’s ability to generalize, especially when the input data is noisy. Thus, it is crucial to train the model selectively on a high-quality subset of the training data. Our work presents active curriculum learning as a promising approach for feasible and robust training for hypothesis predictors. However, a static strategy struggles to generalize well across different scenarios. An exciting direction for future research could be to develop dynamic policies for data sampling that automatically adapt to diverse applications. Furthermore, improving time complexity is a critical challenge, particularly for applications involving large datasets and models.

Ahmed NM, Chen L, Wang Y et al. (2016) Sampling-based algorithm for link prediction in temporal networks. Inform Sci 374:1–14


Akujuobi U, Chen J, Elhoseiny M et al. (2020) Temporal positive-unlabeled learning for biomedical hypothesis generation via risk estimation. Adv Neural Inform Proc Syst 33:4597–4609


Akujuobi U, Spranger M, Palaniappan SK et al. (2020) T-pair: Temporal node-pair embedding for automatic biomedical hypothesis generation. IEEE Trans Knowledge Data Eng 34(6):2988–3001

Anderson RL, Wolf WJ (1995) Compositional changes in trypsin inhibitors, phytic acid, saponins and isoflavones related to soybean processing. J Nutr 125(suppl–3):581S-588S

Arthur D, Vassilvitskii S (2006) k-means++: The advantages of careful seeding. Tech. rep., Stanford University

Ash JT, Zhang C, Krishnamurthy A et al. (2020) Deep batch active learning by diverse, uncertain gradient lower bounds. ICLR, Vienna

Baek SH, Lee D, Kim M et al. (2017) Enriching plausible new hypothesis generation in pubmed. PloS One 12(7):e0180539


Bengio Y, Louradour J, Collobert R, et al. (2009) Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, 41–48

Brainard J (2020) Scientists are drowning in COVID-19 papers. Can new tools keep them afloat? — science.org. https://www.science.org/content/article/scientists-are-drowning-covid-19-papers-can-new-tools-keep-them-afloat , [Accessed 25-May-2023]

Cartwright D, Harary F (1956) Structural balance: a generalization of Heider’s theory. Psychol Rev 63(5):277

Chen T, Kornblith S, Norouzi M, et al. (2020) A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, PMLR, 1597–1607

Crichton G, Guo Y, Pyysalo S et al. (2018) Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches. BMC Bioinform 19(1):1–11

Deepika A (2018) Effect of flaxseed oil in plaque induced gingivitis-a randomized control double-blind study. J Evid Based Med Healthc 5(10):882–5

Fan Jw, Lussier YA (2017) Word-of-mouth innovation: hypothesis generation for supplement repurposing based on consumer reviews. In: AMIA Annual Symposium Proceedings, American Medical Informatics Association, p 689

Gilad-Bachrach R, Navot A, Tishby N (2006) Query by committee made real. NeurIPS, Denver

Gitmez AA, Zárate RA (2022) Proximity, similarity, and friendship formation: Theory and evidence. arXiv preprint arXiv:2210.06611

Gopalakrishnan V, Jha K, Zhang A, et al. (2016) Generating hypothesis: Using global and local features in graph to discover new knowledge from medical literature. In: Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB, 23–30

Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–864

Hacohen G, Weinshall D (2019) On the power of curriculum learning in training deep networks. In: International Conference on Machine Learning, PMLR, 2535–2544

Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural Inform Proc Syst. https://doi.org/10.48550/arXiv.1706.02216

Hendrycks D, Gimpel K (2016) Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. CoRR abs/1606.08415

Hisano R (2018) Semi-supervised graph embedding approach to dynamic link prediction. In: Complex Networks IX: Proceedings of the 9th Conference on Complex Networks CompleNet 2018 9, Springer, 109–121

Hristovski D, Friedman C, Rindflesch TC, et al. (2006) Exploiting semantic relations for literature-based discovery. In: AMIA Annual Symposium Proceedings, 349

Jha K, Xun G, Wang Y, et al. (2019) Hypothesis generation from text based on co-evolution of biomedical concepts. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 843–851

Kazemi SM, Goel R, Eghbali S, et al. (2019) Time2vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321

King RD, Whelan KE, Jones FM et al. (2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427(6971):247–252

King RD, Rowland J, Oliver SG et al. (2009) The automation of science. Science 324(5923):85–89

Kirsch A, van Amersfoort J, Gal Y (2019) BatchBALD: efficient and diverse batch acquisition for deep Bayesian active learning. NeurIPS, Denver

Kitano H (2021) Nobel turing challenge: creating the engine for scientific discovery. npj Syst Biol Appl 7(1):29

Klein MT, Hou G, Quann RJ et al. (2002) Biomol: a computer-assisted biological modeling tool for complex chemical mixtures and biological processes at the molecular level. Environ Health Perspect 110(suppl 6):1025–1029

Krenn M, Buffoni L, Coutinho B et al. (2023) Forecasting the future of artificial intelligence with machine learning-based link prediction in an exponentially growing knowledge network. Nat Machine Intell 5(11):1326–1335

Kumari P, Goru R, Chaudhuri S et al. (2020) Batch decorrelation for active metric learning. IJCAI-PRICAI, Jeju Island


Kumar S, Zhang X, Leskovec J (2019) Predicting dynamic embedding trajectory in temporal interaction networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1269–1278

Liu Y, Liu Y, Li P et al. (2022) Antibacterial properties of cyclolinopeptides from flaxseed oil and their application on beef. Food Chem 385:132715

Lönnerdal B, Sandberg AS, Sandström B et al. (1989) Inhibitory effects of phytic acid and other inositol phosphates on zinc and calcium absorption in suckling rats. J Nutr 119(2):211–214

Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

Milani Fard A, Bagheri E, Wang K (2019) Relationship prediction in dynamic heterogeneous information networks. In: Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part I 41, Springer, 19–34

Nguyen GH, Lee JB, Rossi RA et al. (2018) Continuous-time dynamic network embeddings. Companion Proc Web Conf 2018:969–976

Pareja A, Domeniconi G, Chen J, et al. (2020) Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In: Proceedings of the AAAI conference on artificial intelligence, 5363–5370

Pinsler R, Gordon J, Nalisnick E et al. (2019) Bayesian batch active learning as sparse subset approximation. NeurIPS, Denver

Priyadarshini K, Chaudhuri S, Borkar V, et al. (2021) A unified batch selection policy for active metric learning. In: Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part II 21, Springer, 599–616

Rossi E, Chamberlain B, Frasca F, et al. (2020) Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637

Schullehner K, Dick R, Vitzthum F et al. (2008) Benzoxazinoid biosynthesis in dicot plants. Phytochemistry 69(15):2668–2677

Settles B (2012) Active learning. SLAIML, Shimla

Shi F, Foster JG, Evans JA (2015) Weaving the fabric of science: dynamic network models of science’s unfolding structure. Soc Networks 43:73–85

Singer U, Guy I, Radinsky K (2019) Node embedding over temporal graphs. arXiv preprint arXiv:1903.08889

Smalheiser NR, Swanson DR (1998) Using Arrowsmith: a computer-assisted approach to formulating and assessing scientific hypotheses. Comput Methods Prog Biomed 57(3):149–153

Spangler S (2015) Accelerating discovery: mining unstructured information for hypothesis generation. Chapman and Hall/CRC, Boca Raton

Spangler S, Wilkins AD, Bachman BJ, et al. (2014) Automated hypothesis generation based on mining scientific literature. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 1877–1886

Srihari RK, Xu L, Saxena T (2007) Use of ranked cross document evidence trails for hypothesis generation. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 677–686

Srivastava N, Hinton G, Krizhevsky A et al. (2014) Dropout: a simple way to prevent neural networks from overfitting. J Machine Learn Res 15(1):1929–1958


Swanson DR (1986) Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspect Biol Med 30(1):7–18

Swanson DR, Smalheiser NR (1997) An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif Intell 91(2):183–203

Sybrandt J, Shtutman M, Safro I (2017) Moliere: Automatic biomedical hypothesis generation system. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1633–1642

Sybrandt J, Tyagin I, Shtutman M, et al. (2020) Agatha: automatic graph mining and transformer based hypothesis generation approach. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2757–2764

Tabachnick BG, Fidell LS (2000) Computer-assisted research design and analysis. Allyn & Bacon Inc, Boston

Trautman A (2022) Nutritive knowledge based discovery: Enhancing precision nutrition hypothesis generation. PhD thesis, The University of North Carolina at Charlotte

Trivedi R, Farajtabar M, Biswal P, et al. (2019) Dyrep: Learning representations over dynamic graphs. In: International Conference on Learning Representations

Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Machine Learn Res 9(11):2579–2605

Vaswani A, Shazeer N, Parmar N et al. (2017) Attention is all you need. Adv Neural Inform Proc Syst. https://doi.org/10.48550/arXiv.1706.03762

Wang Y, Wang W, Liang Y et al. (2021) Curgraph: curriculum learning for graph classification. Proc Web Conf 2021:1238–1248

Wang Z, Li Q, Yu D et al. (2022) Temporal graph transformer for dynamic network. In: Part II (ed) Artificial Neural Networks and Machine Learning-ICANN 2022: 31st International Conference on Artificial Neural Networks, Bristol, UK, September 6–9, 2022, Proceedings. Springer, Cham, pp 694–705


Wang L, Chang X, Li S, et al. (2021a) Tcl: Transformer-based dynamic graph modelling via contrastive learning. arXiv preprint arXiv:2105.07944

Weissenborn D, Schroeder M, Tsatsaronis G (2015) Discovering relations between indirectly connected biomedical concepts. J Biomed Semant 6(1):28

Wen Y, Zhang K, Li Z, et al. (2016) A discriminative feature learning approach for deep face recognition. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14, Springer, 499–515

White K (2021) Publications Output: U.S. Trends and International Comparisons | NSF - National Science Foundation — ncses.nsf.gov. https://ncses.nsf.gov/pubs/nsb20214 , [Accessed 25-May-2023]

Xun G, Jha K, Gopalakrishnan V, et al. (2017) Generating medical hypotheses based on evolutionary medical concepts. In: 2017 IEEE International Conference on Data Mining (ICDM), IEEE, 535–544

Zhang R, Wang Q, Yang Q et al. (2022) Temporal link prediction via adjusted sigmoid function and 2-simplex structure. Sci Rep 12(1):16585

Zhang Y, Pang J (2015) Distance and friendship: A distance-based model for link prediction in social networks. In: Asia-Pacific Web Conference, Springer, 55–66

Zhang Z, Wang J, Zhao L (2023) Relational curriculum learning for graph neural networks. https://openreview.net/forum?id=1bLT3dGNS0

Zhong Y, Huang C (2023) A dynamic graph representation learning based on temporal graph transformer. Alexandria Eng J 63:359–369

Zhou H, Jiang H, Yao W et al. (2022) Learning temporal difference embeddings for biomedical hypothesis generation. Bioinformatics 38(23):5253–5261

Zhou L, Yang Y, Ren X, et al. (2018) Dynamic network embedding by modeling triadic closure process. In: Proceedings of the AAAI Conference on Artificial Intelligence


Author information

Uchenna Akujuobi and Priyadarshini Kumari have contributed equally to this work.

Authors and Affiliations

Sony AI, Barcelona, Spain

Uchenna Akujuobi, Samy Badreddine & Tarek R. Besold

Sony AI, Cupertino, USA

Priyadarshini Kumari

Sony AI, Tokyo, Japan

Jihun Choi & Kana Maruyama

The Systems Biology Institute, Tokyo, Japan

Sucheendra K. Palaniappan


Contributions

U.A. and P.K. co-led the reported work and the writing of the manuscript; J.C., S.B., K.M., and S.P. supported the work and the writing of the manuscript. T.B. supervised the work overall. All authors reviewed the manuscript and contributed to the revisions based on the reviewers' feedback.

Corresponding authors

Correspondence to Uchenna Akujuobi or Priyadarshini Kumari .

Ethics declarations

Conflict of interest.

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Akujuobi, U., Kumari, P., Choi, J. et al. Link prediction for hypothesis generation: an active curriculum learning infused temporal graph-based approach. Artif Intell Rev 57 , 244 (2024). https://doi.org/10.1007/s10462-024-10885-1

Download citation

Accepted : 25 July 2024

Published : 12 August 2024

DOI : https://doi.org/10.1007/s10462-024-10885-1

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Temporal graph neural network
  • Active learning
  • Hierarchical transformer
  • Curriculum learning
  • Literature based discovery
  • Edge prediction
  • Find a journal
  • Publish with us
  • Track your research

COMMENTS

  1. How to Implement Hypothesis-Driven Development

    Learn how to use the scientific method to test hypotheses about new ideas, products and services in software development. See examples of user stories that frame experiments and measure outcomes with indicators and assumptions.

  2. What I learned at McKinsey: How to be hypothesis-driven

    There is a repeating cycle of forming and testing hypotheses. McKinsey consultants follow three steps in this cycle: Form a hypothesis about the problem and determine the data needed to test the ...

  3. Hypothesis Driven Problem-Solving Explained: Tactics and Training

    Hypothesis driven problem solving also known as "top-down problem solving" or "hypothesis driven thinking" is a form of problem-solving that starts with the answer and works backward to prove or disprove that answer. Practiced by the biggest consulting firms around the globe for its effectiveness in getting to the heart of the matter ...

  4. How to Implement Hypothesis-Driven Development

    Practicing Hypothesis-Driven Development[1] is thinking about the development of new ideas, products, and services - even organizational change - as a series of experiments to determine whether an expected outcome will be achieved. The process is iterated upon until a desirable outcome is obtained or the idea is determined to be not viable.

  5. Hypothesis-Driven Development (Practitioner's Guide)

    Like agile, hypothesis-driven development (HDD) is more a point of view with various associated practices than it is a single, particular practice or process. That said, my goal here for is you to leave with a solid understanding of how to do HDD and a specific set of steps that work for you to get started. After reading this guide and trying ...

  6. What is hypothesis-driven development?

    Hypothesis-driven development in a nutshell. As the name suggests, hypothesis-driven development is an approach that focuses development efforts around, you guessed it, hypotheses. To make this example more tangible, let's compare it to two other common development approaches: feature-driven and outcome-driven.

  7. Guide for Hypothesis-Driven Development: How to Form a List of

    The hypothesis-driven development management cycle begins with formulating a hypothesis according to the "if" and "then" principles. In the second stage, it is necessary to carry out several works to launch the experiment (Action), then collect data for a given period (Data), and at the end, make an unambiguous conclusion about whether ...

  8. 5 steps to a hypothesis-driven design process

    Recruit the users you want to target, have a time frame, and put the design in front of the users. 5. Learn and build. You just learned that the result was positive and you're excited to roll out the feature. That's great! If the hypothesis failed, don't worry—you'll be able to gain some insights from that experiment.

  9. The 6 Steps that We Use for Hypothesis-Driven Development

    Learn how to use hypothesis-driven development, a prototype methodology that validates assumptions with user feedback. Follow the six steps from idea generation to testing, learning, and repeating.

  10. Hypothesis-driven development: Definition, why and implementation

    Hypothesis-driven development emphasizes a data-driven and iterative approach to product development, allowing teams to make more informed decisions, validate assumptions, and ultimately deliver products that better meet user needs. Hypothesis-driven development (HDD) is an approach used in software development and product management.

  11. Hypothesis-Driven Development

    Hypothesis-driven decisions. Specifically, you need to shift your teammates focus from their natural tendency to focus on their own output to focusing out user outcomes. Easier said than done, but getting everyone excited about results of an experiment is one of the most reliable ways to get there. This week, we'll focus on how you get ...

  12. Lessons from Hypothesis-Driven Development

    The principle of hypothesis-driven development is to apply scientific methods to product development. Defining success criteria and then forming testable hypotheses around how to meet them. Over ...

  13. Hypothesis Driven Research

    Dr. Helene Engler defines and discusses the basic principles of hypothesis driven research. She also discusses the importance of developing a good scientific...

  14. Why hypothesis-driven development is key to DevOps

    Hypothesis-driven development is based on a series of experiments to validate or disprove a hypothesis in a complex problem domain where we have unknown-unknowns. We want to find viable ideas or fail fast. Instead of developing a monolithic solution and performing a big-bang release, we iterate through hypotheses, evaluating how features ...

  15. Hypothesis-driven approach: the definitive guide

    Hypothesis-driven thinking is a problem-solving method whereby you start with the answer and work back to prove or disprove that answer through fact-finding. Concretely, here is how consultants use a hypothesis-driven approach to solve their clients' problems: Form an initial hypothesis, which is what they think the answer to the problem is.

  16. Structure the Problem: Pyramids and Trees

    The Irresistible Appeal of the Hypothesis-Driven Approach. Hypothesis-driven problem solving is efficient when you start from a sound hypothesis. It saves time and energy by focusing your efforts on a candidate solution (or a range of consistent solutions). Experts and senior business people often structure problems in this way.

  17. What Is A Hypothesis?

    Hypothesis Driven Approach. Using a hypothesis driven approach requires the following steps: State a hypothesis based on the provided information. Gather data to test the hypothesis. Revise hypothesis as needed or offer a completely new one if the data proves your original hypothesis wrong. Repeat steps 2-3 for additional buckets in your framework.

  18. How McKinsey uses Hypotheses in Business & Strategy by McKinsey Alum

    And, being hypothesis-driven was required to have any success at McKinsey. A hypothesis is an idea or theory, often based on limited data, which is typically the beginning of a thread of further investigation to prove, disprove or improve the hypothesis through facts and empirical data. The first step in being hypothesis-driven is to focus on ...

  19. Hypothesis-driven Research

    In a hypothesis-driven research, specifications of methodology help the grant reviewers to differentiate good science from bad science, and thus, hypothesis-driven research is the most funded research. "Hypotheses aren't simply useful tools in some potentially outmoded vision of science; they are the whole point."

  20. Using Hypothesis-Driven Thinking in Strategy Consulting

    This technical note describes the process of hypothesis-driven thinking, using examples from strategy consulting, medicine, and architecture. Associated with the scientific method, hypothesis-driven thinking focuses on the creative generation of alternative hypotheses and on their subsequent validation or refutation through the use of data.

  21. Data driven theory for knowledge discovery in the exact ...

    The limitations of the hypothesis driven approach to investigating complex physical objects, affected by great uncertainties, has become particularly evident in the case of open systems such as ...

  22. Hypothesis-Driven Approach: Crack Your Case Like a Consultant

    A hypothesis-driven approach in consulting is a structured method of problem-solving. Consultants formulate a hypothesis for the solution to a business problem, then gather data to support or disprove it. Cracking a case interview can be a daunting task, with a wide range of potential solutions and approaches to consider.

  23. Hypothesis-driven approach: Problem solving in the context of global

    The hypothesis-driven approach is a problem-solving method that is necessary at WHO because the environment around us is changing rapidly. WHO needs a new way of problem-solving to process large amounts of information from different fields and deliver quick, tailored recommendations to meet the needs of Member States. ...

  24. UChicago biophysicist studies locomotion in creatures from all walks of

    In turn, her precise models allow her to plan specific, hypothesis-driven experiments, to which she credits her efficiency at the bench. These conceptual intersections — between physics and biology, theory, and experimentation — form the heart of Nirody's work. But other intersections appear in her work too, like that between past and future.

  25. Link prediction for hypothesis generation: an active curriculum

    Over the last few years Literature-based Discovery (LBD) has regained popularity as a means to enhance the scientific research process. The resurgent interest has spurred the development of supervised and semi-supervised machine learning models aimed at making previously implicit connections between scientific concepts/entities within often extensive bodies of literature explicit—i.e ...

  26. The Next Crash: The 2008 Echo Sounds Like a Commercial Property Crash

    This paper examines the interplay between commercial and residential real estate markets within the United States and assesses their impact on financial stability. Employing Hyman Minsky's Financial Instability Hypothesis alongside a robust Ordinary Least Squares (OLS) regression analysis, this research identifies how fluctuations in the commercial real estate sector can influence the ...