

San Francisco's source{d} has seized an opportunity to use machine learning (ML) to analyze software code bases, treating all forms of code as data.


  • The source{d} Engine retrieves and stores source code from any code base (for example, all code held on premises) for historical analysis reports or interrogation through an API.
  • The source{d} Engine has a SQL interface and uses distributed computing (via Apache Spark) to generate large datasets of universal abstract syntax trees (ASTs), which can be analyzed directly or used as input to ML models.
  • source{d} provides tools and libraries to train ML models on public or private code bases, as well as models pretrained on big code.
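
To make the idea of "code as data" concrete, the following minimal sketch parses a small snippet into an AST and counts node types, the kind of tabular feature that could be analyzed directly or fed to an ML model. It uses only Python's standard `ast` module and is not source{d}'s API; the function name `node_type_counts` and the sample snippet are illustrative assumptions.

```python
import ast
from collections import Counter

# Hypothetical sample input; any valid Python source would work.
SAMPLE = '''
def add(a, b):
    return a + b
'''


def node_type_counts(source: str) -> Counter:
    """Parse Python source into an AST and count each node type.

    This is a language-specific stand-in for the universal ASTs
    described above, not an implementation of them.
    """
    tree = ast.parse(source)
    # ast.walk visits every node in the tree, in no guaranteed order.
    return Counter(type(node).__name__ for node in ast.walk(tree))


counts = node_type_counts(SAMPLE)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

Counting node types discards structure, so a real pipeline would likely keep richer representations (paths, subtrees, or token sequences); the sketch only shows that source code can be turned into queryable data.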

Features and Benefits

  • Explains how source{d} uses a SQL interface and distributed computing to generate large datasets of universal abstract syntax trees (ASTs).
  • Describes source{d}'s Lookout product for assisted code review and detection of duplicate or similar code.

Key questions answered

  • How does source{d} use machine learning to mine source code as data?
  • Which programming languages are supported by source{d} products?

Table of contents


  • Catalyst
  • Key messages

Recommendations for enterprises

  • Why put source{d} on your radar?


  • Deployment options
  • Background
  • Current position

Data sheet

  • Key facts


  • On the Radar
  • Authors