Rootly AI Labs

Pushing the boundaries of AI in incident management & system reliability

Building The Future of Reliability and Operational Excellence

The Rootly AI Labs is a fellow-led community designed to redefine reliability engineering. We develop innovative prototypes, create open-source tools, and produce research that’s shared to advance the standards of operational excellence.

Some of Our Projects

  • SRE-skills-bench: Can LLMs resolve real-world SRE tasks? A benchmark that tests LLMs on SRE-type tasks, like SWE-bench but for SREs.
  • On-Call Health: Detects potential signs of overwork in incident responders, which could lead to burnout.
  • Rootly MCP server: Resolve production incidents in under a minute without leaving your IDE.
  • IncidentDiagram: Generates a diagram highlighting what happened during an incident by ingesting the retrospective and associated codebase.

About the Rootly AI Labs

Rootly began in 2021 by building a category-defining on-call and incident response platform, trusted by thousands, including Replit, NVIDIA, LinkedIn, and Dropbox.

Now, GenAI is simultaneously introducing new complexities and unlocking opportunities to redefine reliability forever.

Our Fellows

  • Allan Parson – Sr Staff Engineer at Venmo
  • Casey Brown – Head of Infrastructure Engineering at Weights and Biases
  • Kishan Rao – Engineering Manager at Okta
  • Kishore Korathaluri – Staff Site Reliability Engineer at Cribl
  • Laurence Liang – Student Researcher at McGill University
  • Muhammad Hamza – Machine Learning Researcher at University of Toronto
  • Sahil Kumar – Director of AI Product at Twilio
  • Spencer Cheng – Software Engineer at Rivian
  • Sylvain Kalache – Head of Rootly AI Labs

Supported By

Thank you to our partners for supporting us.

  • Anthropic
  • Google Cloud
  • Google DeepMind

Popular repositories

  1. Rootly-MCP-server (Public)

    Rootly MCP server

    Python · 36 stars · 15 forks

  2. logs-dataset (Public)

    A collection of logs used for training AI-powered incident management and SRE automation

    19 stars · 1 fork

  3. IncidentDiagram (Public)

    A tool for creating diagrams from incident reviews and postmortems using LLMs

    Python · 9 stars

  4. GMCQ-benchmark (Public)

    An evaluation benchmark that tests how well language models understand code in order to close pull requests.

    6 stars

  5. EventOrOutage (Public)

    EventOrOutage uses LLMs to help SREs determine whether a drop in traffic is due to an external event (holiday, election, sporting event...) rather than an outage.

    Python · 5 stars · 2 forks

  6. On-Call-Health (Public)

    On-Call Health: identify signs that incident responders are overworked.

    Python · 5 stars · 2 forks
