Gematria - machine learning for machine code

Contains sources of Gematria, a framework for machine learning on machine code. It includes implementations of the GRANITE model and the Ithemal hierarchical LSTM model for learning inverse throughput of basic blocks.

Installation

Requirements and installation

Our models are built on top of TensorFlow 2.x (using the TensorFlow 1.x compatibility layer) in a mix of C++ and Python. Most of the training code is written in Python; we use C++ for the more demanding parts of the code like graph construction. We use pybind11 to make C++ APIs available in Python.

Basic requirements that need to be installed before starting:

Bazel 6.0 or newer.
A C++ compiler supported by Bazel that compiles C++17. Recent versions of GCC and Clang on Linux both fit the bill.
Python 3.10 or newer.
Git.
PIP.

Additional dependencies, including TensorFlow, Protocol buffers, and different Python libraries are installed through PIP and through Bazel's WORKSPACE file. We strongly recommend using virtualenv to install Python packages to avoid dependency version conflicts with other libraries.

# Get the source code.
$ git clone https://github.com/google/gematria.git
$ cd gematria

# Set up virtualenv.
$ pip install virtualenv
$ virtualenv env
$ . env/bin/activate

# Install Python dependencies.
$ pip install -r requirements.in

# On OS X only. The dependencies of tensorflow-ranking are not set up correctly
# and it needs to be installed manually.
$ pip install --no-deps tensorflow-ranking.

# Build the project, run tests, ...
$ bazel build ...
$ bazel test ...

Building with CMake

A subset of the project, consisting of tools and libraries we eventually plan to merge in the LLVM monorepo, are built with cmake. The requirements are inherited from LLVM, as we use LLVM's "external project" mechanism to build.

First, build TFLite. In addition to the requirements above, see also these prerequisites, noting the reference to the buildbot script which lists additional packages.

Then:

mkdir /tmp/tflite && cd /tmp/tflite
curl https://raw.githubusercontent.com/google/ml-compiler-opt/main/buildbot/build_tflite.sh | bash

This should produce a /tmp/tflite/tflite.cmake.

cd ${GEMATRIA_SRC}
mkdir cmake-build && cd cmake-build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release \
  -C /tmp/tflite/tflite.cmake \
  ${LLVM_PROJECT_SRC}/llvm \
  -DLLVM_EXTERNAL_PROJECTS=gematria \
  -DLLVM_EXTERNAL_GEMATRIA_SOURCE_DIR=${GEMATRIA_SRC}
ninja llvm-granite llvm-cm

Where LLVM_PROJECT_SRC is the absolute path to your local llvm repo, and GEMATRIA_SRC the path to this (the gematria) repo.

To run the llvm-cm tests, you can run the following target:

ninja check-llvm-tools-llvm-cm

Platform Support

We develop and test our code on Linux and x86-64, and we test it on Mac OS X and ARM. While we did not test it, we expect it to work with minimal changes also on other architectures and platforms that run TensorFlow.

Using the models

See the training guide and guides for Python inference and C++ inference.

Repository structure

See the separate document.

Get Involved

Issue tracker: https://github.com/google/Gematria/issues

We welcome patches -- see CONTRIBUTING for more information on how to submit a patch.

Cite us

@inproceedings{granite:iiswc:2022,
  author = {O. Sykora and P. Phothilimthana and C. Mendis and A. Yazdanbakhsh},
  booktitle = {2022 IEEE International Symposium on Workload Characterization (IISWC)},
  title = {{GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation}},
  year = {2022},
}

Name		Name	Last commit message	Last commit date
Latest commit History 168 Commits
.github/workflows		.github/workflows
g3doc		g3doc
gematria		gematria
llvm_cm		llvm_cm
tools		tools
.bazelrc		.bazelrc
.clang-format		.clang-format
.gitignore		.gitignore
BUILD.bazel		BUILD.bazel
CMakeLists.txt		CMakeLists.txt
CONTRIBUTING		CONTRIBUTING
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
WORKSPACE		WORKSPACE
benchmark_build.patch		benchmark_build.patch
graph_nets.BUILD		graph_nets.BUILD
proto.bzl		proto.bzl
pyproject.toml		pyproject.toml
python.bzl		python.bzl
requirements.in		requirements.in
requirements.txt		requirements.txt
sonnet.patch		sonnet.patch

License

google/gematria

Folders and files

Latest commit

History

Repository files navigation

Gematria - machine learning for machine code

Installation

Requirements and installation

Building with CMake

Platform Support

Using the models

Repository structure

Get Involved

Cite us

About

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Languages