This website describes projects that are part of the Harvard Program in Therapeutic Science (HiTS), at Harvard Medical School.


Our team is developing systems that can accelerate scientific discovery in biomedicine using a combination of text mining, knowledge assembly, mathematical modeling and causal analysis. We are pursuing approaches that use artificial intelligence to increasingly automate the interpretation of scientific literature and large experimental datasets, and also to enable sophisticated human-machine interaction and collaboration. Applications of these tools range from drug discovery for cancer and other diseases to modeling complex processes at the interface of physical and social systems.




INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system that uses natural language processing and structured databases to collect mechanistic and causal assertions, represent them in a standardized form, and assemble them into causal graphs or dynamical models. Internally, INDRA performs knowledge assembly to correct errors, find and resolve redundancies, infer missing information, filter to a scope of interest and assess belief.
Code   Docs   REST API
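The redundancy-resolution and belief-assessment steps described above can be illustrated with a small, self-contained sketch. This is not INDRA's implementation or API: the `Assertion` class, the `assemble` function, the source names and the error rates are all hypothetical, and the score is a simple noisy-OR over distinct sources, chosen only to show the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A toy causal assertion with the source that reported it."""
    subject: str
    relation: str
    obj: str
    source: str


# Hypothetical per-source error rates, made up for illustration only.
ERROR_RATE = {"reader_a": 0.4, "reader_b": 0.3, "curated_db": 0.05}


def assemble(assertions):
    """Merge duplicate assertions and score each unique one.

    The score is 1 minus the probability that every distinct
    supporting source is wrong (a noisy-OR assumption).
    """
    sources_by_key = defaultdict(set)
    for a in assertions:
        sources_by_key[(a.subject, a.relation, a.obj)].add(a.source)

    beliefs = {}
    for key, sources in sources_by_key.items():
        p_all_wrong = 1.0
        for source in sources:
            p_all_wrong *= ERROR_RATE[source]
        beliefs[key] = round(1.0 - p_all_wrong, 3)
    return beliefs


raw = [
    Assertion("MAP2K1", "phosphorylates", "MAPK1", "reader_a"),
    Assertion("MAP2K1", "phosphorylates", "MAPK1", "curated_db"),
    Assertion("MAP2K1", "phosphorylates", "MAPK1", "reader_a"),  # repeat
    Assertion("BRAF", "phosphorylates", "MAP2K1", "reader_b"),
]
beliefs = assemble(raw)
for key, belief in sorted(beliefs.items()):
    print(key, belief)
```

In this sketch the four raw assertions collapse into two unique ones, and the assertion supported by two independent sources receives the higher score.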



EMMAA (Ecosystem of Machine-maintained Models with Automated Analysis) monitors the scientific literature for new findings and automatically updates a set of disease-specific models with new knowledge. It also automatically analyzes these models against a set of test conditions (typically experimental observations) and measures the effect of new knowledge on these results. It then notifies users about relevant new analysis results.
Website   Code   Docs  
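The update, test and notify cycle described above can be sketched in a few lines. This is a toy illustration rather than EMMAA's code: the model is reduced to a set of directed edges, a test passes when a directed path connects its two entities, and the function names and example edges are assumptions made for this sketch.

```python
from collections import deque


def has_path(edges, start, goal):
    """Return True if a directed path leads from start to goal."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def run_tests(edges, tests):
    """Check every (source, target) test against the model."""
    return {test: has_path(edges, *test) for test in tests}


def changed_results(old_edges, new_edges, tests):
    """Report only the tests whose outcome differs after an update."""
    before = run_tests(old_edges, tests)
    after = run_tests(new_edges, tests)
    return {t: (before[t], after[t]) for t in tests if before[t] != after[t]}


# Toy model and toy "new findings"; not taken from any real EMMAA model.
model = {("EGF", "EGFR"), ("EGFR", "KRAS")}
new_findings = {("KRAS", "BRAF"), ("BRAF", "MAPK1")}
tests = [("EGF", "MAPK1"), ("EGF", "KRAS")]

report = changed_results(model, model | new_findings, tests)
for test, (was, now) in report.items():
    print(f"{test}: {was} -> {now}")
```

Only the test whose result changed appears in the report, which mirrors the idea of notifying users about the effect of new knowledge rather than repeating unchanged results.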

INDRA Database

The INDRA Database makes knowledge assembled by INDRA at scale available as a service. It aggregates knowledge extracted by multiple machine-reading systems from all available abstracts and open-access full text articles, and combines this with mechanisms from pathway databases. Queries allow searching for genes, chemicals, biological processes and other concepts of interest, and return a ranked list of relevant interactions.
Website  REST API   Code

INDRA-IPM (Interactive Pathway Map)

The INDRA-IPM lets you build pathway maps from natural language descriptions. You describe the set of mechanisms to include in English, then click a button to assemble and lay out a pathway map. The pathway can be exported to formats such as SBML, SBGN and Kappa.
Website   Code

Bob with Bioagents is a machine assistant you can chat with about molecular biology. Suppose you want to explain an experimental observation or get ideas for a new hypothesis. You can talk with the machine agent in English about topics such as drugs, transcription factors, miRNAs and their targets, and various mechanisms described in the literature and databases. You can also build up a model of a mechanism during the dialogue, and then ask questions about the properties of the model to see if it behaves as expected.

CLARE machine assistant

CLARE is a machine assistant that can be deployed as a Slack application in a workspace and engage in human-machine dialogue in channels and private messages. It can answer questions about mechanisms such as "what phosphorylates ELK1?" or "does RHOA interact with MYL12B?" and connect this information to other resources, while also allowing you to build and discuss models of mechanisms. Please contact us if you'd like to deploy CLARE on your Slack workspace.
Demo video

INDRA World Modeling

In the context of the DARPA World Modelers program, we have generalized INDRA to model complex causal mechanisms governing processes such as agricultural production, food security and migration. We are also part of the DSMT-E project, a large collaboration funded by DARPA and the Gates Foundation to develop AI-driven decision support tools for Ethiopia.
Code   Service   DSMT-E

Network search and DepMap explainer

The INDRA Network search builds on the INDRA assembly of literature extractions and pathway databases to find mechanistic paths between entities of interest. It allows searching for a variety of patterns (paths between, common up/downstreams, etc.) and tuning multiple search parameters to define constraints and context. The DepMap explainer is a specific instance of network search aimed at constructing explanations for correlations between genes involved in CRISPR screens of cancer cell lines found at
DepMap explainer   Network search
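Two of the search patterns mentioned above, paths between two entities and common upstreams, can be sketched on a toy directed graph. The edge list and function names below are assumptions made for illustration and do not reflect the service's actual query interface, data or ranking.

```python
from collections import deque

# Simplified toy edges (regulator, target), for illustration only.
EDGES = [
    ("EGFR", "KRAS"),
    ("KRAS", "BRAF"),
    ("BRAF", "MAP2K1"),
    ("MAP2K1", "MAPK1"),
    ("KRAS", "PIK3CA"),
    ("PIK3CA", "AKT1"),
]


def shortest_path(edges, source, target):
    """Breadth-first search for a shortest directed path, or None."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def common_upstreams(edges, a, b):
    """Return entities that directly regulate both a and b."""
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(dst, set()).add(src)
    return predecessors.get(a, set()) & predecessors.get(b, set())


path = shortest_path(EDGES, "EGFR", "MAPK1")
shared = common_upstreams(EDGES, "BRAF", "PIK3CA")
print(" -> ".join(path))
print(sorted(shared))
```

A shared upstream regulator found this way is one kind of candidate explanation for a correlation between two genes; constraints such as path length or edge type would be added as further filters.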

Application to COVID-19

In the context of the ongoing COVID-19 pandemic, the INDRA team is working on understanding the mechanisms by which SARS-CoV-2 infects cells and the subsequent host response process, with the goal of finding new therapeutics using INDRA.

Application to studying pain and inflammation

In the context of the DARPA Panacea program, the INDRA team is working on understanding the regulation of pain and inflammation with the goal of finding new therapeutics using INDRA.


Adeft (Acromine-based Disambiguation of Entities From Text context) builds machine-learning models to disambiguate acronyms and other abbreviations of biological terms in the scientific literature. A growing number of pretrained disambiguation models are publicly available through the Python package.
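The idea of choosing a sense for an abbreviation from its surrounding text can be shown with a deliberately simple sketch. This is not Adeft's API or model: the example sentences are invented, and a plain word-count profile stands in for a trained classifier, purely to illustrate context-based disambiguation.

```python
import re
from collections import Counter

# Invented example sentences per sense of "ER", for illustration only.
TRAINING = {
    "estrogen receptor": [
        "estrogen receptor ER signaling in breast cancer cells",
        "tamoxifen binds ER in breast tumors",
    ],
    "endoplasmic reticulum": [
        "protein folding stress in the ER lumen",
        "unfolded protein response and ER stress",
    ],
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# One word-count profile per sense, built from its example sentences.
PROFILES = {
    sense: Counter(word for doc in docs for word in tokens(doc))
    for sense, docs in TRAINING.items()
}


def disambiguate(context):
    """Score each sense by word overlap with the context; return the best."""
    words = tokens(context)
    scores = {
        sense: sum(profile[word] for word in words)
        for sense, profile in PROFILES.items()
    }
    return max(scores, key=scores.get), scores


sense_a, _ = disambiguate("ER stress triggers the unfolded protein response")
sense_b, _ = disambiguate("tamoxifen treatment of ER positive breast cancer")
print(sense_a)
print(sense_b)
```

A real disambiguation model would be trained on many labeled contexts and would weight informative words more carefully; the sketch only shows why context around an abbreviation carries enough signal to separate its senses.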


Gilda is a Python package and REST service that grounds (i.e., finds appropriate identifiers in namespaces for) named entities in biomedical text. It also uses a set of machine-learned disambiguation models to choose between different senses of ambiguous synonyms. It can be integrated into applications as a Python package or through the REST service.
Website and REST API   Code
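Grounding, as described above, maps a text string to identifiers in a namespace. The sketch below shows the general idea with a hand-written lexicon; it does not use Gilda's API or resources, and the `EXAMPLE:` identifiers, the scores and the `ground` function are hypothetical.

```python
import re

# Hypothetical lexicon: synonym -> list of (identifier, standard name).
# The EXAMPLE: identifiers are placeholders, not real database entries.
LEXICON = {
    "ERK2": [("EXAMPLE:0001", "MAPK1")],
    "MAPK1": [("EXAMPLE:0001", "MAPK1")],
    "ER": [
        ("EXAMPLE:0002", "ESR1"),
        ("EXAMPLE:0003", "endoplasmic reticulum"),
    ],
}


def normalize(text):
    """Drop whitespace, hyphens and underscores, then lowercase."""
    return re.sub(r"[\s\-_]+", "", text).lower()


# Index the lexicon by normalized synonym for tolerant lookup.
INDEX = {}
for synonym, entries in LEXICON.items():
    INDEX.setdefault(normalize(synonym), []).extend(
        (synonym, ident, name) for ident, name in entries
    )


def ground(text):
    """Return candidate groundings, exact spellings scored highest."""
    matches = []
    for synonym, ident, name in INDEX.get(normalize(text), []):
        score = 1.0 if synonym == text else 0.8  # arbitrary toy scores
        matches.append({"id": ident, "name": name, "score": score})
    return sorted(matches, key=lambda m: -m["score"])


print(ground("Erk-2"))
print(ground("ER"))
```

The string "ER" returns two equally scored candidates in this sketch, which is the situation where a context-based disambiguation model is needed to pick one sense.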

Projects and Funding

Big Mechanism

The DARPA Big Mechanism program set out to automate the reading, assembly and modeling of mechanisms from the scientific literature. We built INDRA, an automated model assembly system which draws on natural language processing systems, and assembles their output into various predictive and explanatory models.
Funded by the Defense Advanced Research Projects Agency under award W911NF-14-1-0397.

Communicating with Computers

The DARPA Communicating with Computers (CwC) program develops technologies for a new generation of human-machine interaction in which machines act as proactive collaborators rather than merely problem solving tools. We are developing an interactive dialogue system which allows scientists to interact with a computer partner, one that is able to harness knowledge extracted from the biomedical literature, to construct and test hypotheses about molecular systems.
Funded by the Defense Advanced Research Projects Agency under award W911NF-15-1-0544.

Automated Scientific Discovery Framework

The DARPA Automated Scientific Discovery Framework program (ASDF) will develop algorithms and software for reasoning about complex mechanisms operating in the natural world, explaining large-scale data, assisting humans in generating actionable, model-based hypotheses and testing these hypotheses empirically.
Funded by the Defense Advanced Research Projects Agency under award W911NF018-1-0124.

World Modelers

The DARPA World Modelers program aims to develop automated information collection and computational modeling techniques to understand the complex dynamics of global processes such as food security, migration and public health. We are developing the INDRA-GEM (Integrated Network and Dynamical Reasoning Assembler for Generalized Ensemble Modeling) automated model assembly system, which integrates information from diverse sources and implements novel probabilistic assembly techniques that can account for the uncertain nature of information in models.
Funded by the Defense Advanced Research Projects Agency under award W911NF-18-1-0014.

Automating Scientific Knowledge Extraction

The DARPA ASKE program is part of DARPA's broader Artificial Intelligence Exploration program with the goal of developing technologies for the "Third Wave" of AI. We are developing EMMAA (Ecosystem of Machine-maintained Models with Automated Analysis), a set of self-updating models of cancer biology that run analyses proactively and report meaningful changes in conclusions to users.
Funded by the Defense Advanced Research Projects Agency under award HR00111990009.

Panacea

The STOP PAIN project, as part of DARPA’s Panacea program, aims to develop novel drugs for the treatment of pain and inflammation using innovative research platforms. Unlike many modern drug discovery campaigns, which are target focused, we combine target-agnostic screening with network inference tools to create causal and mechanistic networks used for identification of previously unknown target-chemical ligand relationships.
Funded by the Defense Advanced Research Projects Agency under award HR00111920022.

DARPA Young Faculty Award

Benjamin M. Gyori received a DARPA Young Faculty Award for 2020-2022 for the project "Collaborative scientific discovery with semantically linked, machine built models". The project aims to automate the construction of dynamical models of complex mechanisms embedded in knowledge graphs, and build human-machine collaboration capabilities around this to enable rapid scientific discovery.
Funded by the Defense Advanced Research Projects Agency under award W911NF2010255.


Gyori BM, Bachman JA, Subramanian K, Muhlich JL, Galescu L, Sorger PK. From word models to executable models of signaling networks using automated assembly. Molecular Systems Biology. 2017 13(11):954.

Bachman JA, Gyori BM, Sorger PK. FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining. BMC Bioinformatics. 2018 19(1):248.
Repository: FamPlex

Todorov PV, Gyori BM, Bachman JA, Sorger PK. INDRA-IPM: interactive pathway modeling using natural language with automated assembly. Bioinformatics. 2019.

Sharp R, Pyarelal A, Gyori BM, Alcock K, Laparra E, Valenzuela-Escárcega MA, Nagesh A, Yadav V, Bachman JA, Tang Z, Lent H, Luo F, Paul M, Bethard S, Barnard K, Morrison C, Surdeanu M. Eidos, INDRA, & Delphi: From Free Text to Executable Causal Models. NAACL. 2019.

Hoyt C, Domingo-Fernández D, Aldisi R, Xu L, Kolpeja K, Spalek S, Wollert E, Bachman J, Gyori BM, Greene P, Hofmann-Apitius M. Re-curation and rational enrichment of knowledge graphs in Biological Expression Language. Database. 2019.

Steppi A, Gyori BM, Bachman JA. Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature. Journal of Open Source Software. 2019.

Ietswaart R, Gyori BM, Bachman JA, Sorger PK, Churchman S. GeneWalk identifies relevant gene functions for a biological context using network representation learning. bioRxiv. 2019.

Moret N, Liu C, Gyori BM, Bachman JA, Steppi A, Taujale R, Huang LC, Hug C, Berginski M, Gomez S, Kannan N, Sorger PK. Exploring the understudied human kinome for research and therapeutic opportunities. bioRxiv. 2020.

Ostaszewski M, Niarakis A, ... Gyori BM, Bachman JA, ..., Balling R, Schneider R. COVID-19 Disease Map, a computational knowledge repository of SARS-CoV-2 virus-host interaction mechanisms. bioRxiv. 2020.

Wong J, Franz M, Siper MC, Fong D, Durupinar F, Dallago C, Luna A, Giorgi JM, Rodchenkov I, Babur O, Bachman JA, Gyori BM, Demir E, Bader G, Sander C. Capturing scientific knowledge in computable form. OSF Preprints. 2020.

Bachman JA, Gyori BM, Sorger PK. Assembling a corpus of phosphoproteomic annotations using ProtMapper to normalize site information from databases and text mining. bioRxiv. 2019.


WIRED: Our research on human-machine collaboration was featured in WIRED UK, in the article "The merging of humans and machines is happening now", written by the then director of DARPA, Arati Prabhakar.

The Guardian: Ben Gyori and John Bachman were interviewed by The Guardian in the tech podcast "Siri of the Cell". Here we introduce our approach to human-machine communication and the assembly of models from the scientific literature.

Harvard Medicine Magazine: Ben Gyori and John Bachman were interviewed for an article in Harvard Medicine Magazine. In "A Closer Read" (see section WALL-E), they talk about natural language processing and the INDRA system.

Harvard News: Ben Gyori was interviewed by the Harvard News to talk about how AI's Next Wave can be applied to scientific discovery in biology.


Our team is part of the Harvard Program in Therapeutic Science and the Laboratory of Systems Pharmacology at Harvard Medical School.

Core members

John Bachman, PhD

Co-Project Lead, Research Associate in Therapeutic Science

Samuel Bunga

Bioinformatics Software Developer

Patrick Greene

Scientific Software Developer

Benjamin Gyori, PhD

Co-Project Lead, Research Associate in Therapeutic Science

Klas Karis

Scientific Software Developer

Diana Kolusheva

Scientific Software Developer

Catherine Luria, PhD

Program Manager

Peter Sorger, PhD

PI, Otto Krayer Professor of Systems Pharmacology

Albert Steppi, PhD

Scientific Software Developer


  • Lily Chylek, PhD
  • Jeremy Muhlich
  • Artem Sokolov, PhD
  • Fabian Froehlich, PhD

Past members and contributors

  • Petar Todorov
  • Isabel Latorre, PhD
  • P.S. Thiagarajan, PhD
  • Daniel Milstein
  • William Chen, PhD
  • Robert Sheehan, PhD
  • Kartik Subramanian, PhD