Relationships API methods
This notebook walks through a basic flow for inspecting relationships between entities using the API provided by ML-Git.
It shows how to use the set of commands provided. For more information, see the API documentation.
Notebook prerequisites
This notebook uses the GitHub API to get data from a repository and performs ML-Git operations on its contents, so before running it, take the following steps:
- Have a GitHub SSH access key so that you can use the repository information retrieval API.
- Have a GitHub repository that the SSH key has access to.
1 - Context
In this notebook we consider an ML-Git project with the following settings:
- A versioned config file in GitHub, pointing to the entities' metadata repositories.
- Each entity type has its own metadata repository.
- One model entity (model-ex), two labels entities (labels-ex and labels-ex2) and one dataset entity (dataset-ex).
- Entities have relationships defined at versioning time.
These settings can be visualized in the diagram below:
2 - Configuring
To use the methods, you need to import the API and define some constants related to the user's credentials.
Below are the constants described in the prerequisites section, where ['removed'] should be replaced by the SSH access key and api_url can be modified if necessary, as described in the GitHub API documentation.
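# MLGitAPI is the entry point used throughout this notebook.
# (The module-level `from ml_git import api` import was dropped because the
# name `api` is rebound to an MLGitAPI instance before it is ever used.)
from ml_git.api import MLGitAPI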
github_token = ['removed']
api_url = 'https://api.github.com'
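Instead of pasting the key into the notebook, the two constants can be read from the environment. The following is a minimal sketch of that idea; the helper name load_github_settings and the variable names GITHUB_TOKEN and GITHUB_API_URL are assumptions made for this example, not names that ML-Git reads on its own.

```python
import os


def load_github_settings(env=os.environ):
    """Return (token, api_url) read from a mapping of environment variables.

    GITHUB_TOKEN and GITHUB_API_URL are hypothetical names chosen for this
    sketch; ML-Git itself does not look them up.
    """
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set")
    # Fall back to the public GitHub API endpoint used in this notebook.
    url = env.get("GITHUB_API_URL", "https://api.github.com")
    return token, url
```

With the variables exported in the shell that starts the notebook, `github_token, api_url = load_github_settings()` would give the same two constants without storing the key in the notebook file.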
After defining these variables, you can start a manager that will be responsible for operating on the GitHub API.
api = MLGitAPI()
manager = api.init_entity_manager(github_token, api_url)
We will use the manager to execute the commands in the next steps.
3 - Methods
3.1 - Get Entities
The get_entities method returns the list of entities being versioned in a project. The user must provide the location of the configuration file, which can be either a local directory path or the name of a Git repository. In this example the location is held in the config_repository_name variable, and the configuration file is in 'user/mlgit-config-repository'.
config_repository_name='user/mlgit-config-repository'
project_entities = manager.get_entities(config_repo_name=config_repository_name)
print("Entities found: {}".format(len(project_entities)))
print("Example of output object:\n{}".format(project_entities[3]))
Entities found: 4
Example of output object:
{
"name": "model-ex",
"entity_type": "model",
"metadata": {
"full_name": "user/mlgit-models",
"git_url": "git@github.com:user/mlgit-models.git",
"html_url": "https://github.com/user/mlgit-models",
"owner_email": "user@gmail.com",
"owner_name": "User Name"
},
"last_spec_version": {
"version": 3,
"tag": "test__model-ex__3",
"mutability": "flexible",
"categories": [
"test"
],
"amount": 3,
"size": "27 Bytes",
"storage": {
"type": "s3h",
"bucket": "mlgit-bucket"
}
}
}
As expected the API found 4 entities in the repository (dataset-ex, model-ex, labels-ex, labels-ex2).
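Once the list is available, a common next step is to group the entities by type. The sketch below uses SimpleNamespace stand-ins rather than real ML-Git objects so it runs without network access; it assumes the returned objects expose name and entity_type as attributes, mirroring the keys in the output above. The helper name group_by_type and the order of the sample list are illustrative.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_type(entities):
    """Map each entity type to the names of the entities of that type."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.entity_type].append(entity.name)
    return dict(groups)


# Stand-ins for the four entities of the example project (assumed attributes).
sample_entities = [
    SimpleNamespace(name="dataset-ex", entity_type="dataset"),
    SimpleNamespace(name="labels-ex", entity_type="labels"),
    SimpleNamespace(name="labels-ex2", entity_type="labels"),
    SimpleNamespace(name="model-ex", entity_type="model"),
]

entities_by_type = group_by_type(sample_entities)
print(entities_by_type)
```

If the real objects behave the same way, `group_by_type(project_entities)` should give the equivalent grouping for the project.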
3.2 - Get Entity Versions
The get_entity_versions method returns the list of spec versions found for a specific entity.
selected_entity = project_entities[3]
entity_versions = manager.get_entity_versions(selected_entity.name, selected_entity.metadata.full_name)
print("Versions found: {}".format(len(entity_versions)))
print("Example of output object:\n{}".format(entity_versions[-1]))
Versions found: 3
Example of output object:
{
"version": 1,
"tag": "test__model-ex__1",
"mutability": "flexible",
"categories": [
"test"
],
"amount": 1,
"size": "9 Bytes",
"storage": {
"type": "s3h",
"bucket": "mlgit-bucket"
}
}
As expected the API found 3 versions for the model-ex entity.
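In the output above the last element of the list is version 1, which suggests the versions come back newest first, but that ordering is not stated anywhere. A small sketch of picking the newest version without relying on list order follows; it uses stand-in objects and assumes each version exposes version and tag attributes, as the printed keys suggest. The helper name latest_version is hypothetical.

```python
from types import SimpleNamespace


def latest_version(versions):
    """Return the item with the highest version number, whatever the list order."""
    return max(versions, key=lambda item: int(item.version))


# Stand-ins for the three versions of model-ex (assumed attributes).
sample_versions = [
    SimpleNamespace(version=2, tag="test__model-ex__2"),
    SimpleNamespace(version=3, tag="test__model-ex__3"),
    SimpleNamespace(version=1, tag="test__model-ex__1"),
]

newest = latest_version(sample_versions)
print(newest.tag)
```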
3.3 - Get Linked Entities
The get_linked_entities method allows the user to get a list of linked entities found for an entity in a specific version.
entity_version = 1
linked_entities_in_version = manager.get_linked_entities(selected_entity.name, entity_version, selected_entity.metadata.full_name)
print("Output: \n{}".format(linked_entities_in_version))
Output:
[{
"tag": "test__dataset-ex__1",
"name": "dataset-ex",
"version": "1",
"entity_type": "dataset"
}, {
"tag": "test__labels-ex__1",
"name": "labels-ex",
"version": "1",
"entity_type": "labels"
}]
If we go back to the diagram, we can see that as shown in the output, version 1 of the model-ex entity is related to dataset-ex in version 1 and labels-ex in version 1.
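The tags in these outputs appear to follow a categories__name__version layout (for example test__dataset-ex__1). Assuming that layout holds, which is inferred only from the examples in this notebook, a tag can be split back into its parts as sketched below. The helper name parse_tag is hypothetical, and an entity name containing a double underscore would not be handled correctly.

```python
def parse_tag(tag):
    """Split a tag such as 'test__dataset-ex__1' into its components.

    Assumes the layout <category>[__<category>...]__<name>__<version>,
    inferred from the example outputs rather than from a specification.
    """
    parts = tag.split("__")
    if len(parts) < 3:
        raise ValueError("unexpected tag layout: {!r}".format(tag))
    # Everything before the last two fields is treated as categories.
    *categories, name, version = parts
    return {"categories": categories, "name": name, "version": int(version)}


parsed = parse_tag("test__dataset-ex__1")
print(parsed)
```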
3.4 - Get Entity Relationships
The get_entity_relationships method returns all relationships that a specific entity has. To do so, it goes through every version of the entity and collects the relationships established in each one.
entity_relationships = manager.get_entity_relationships(selected_entity.name, selected_entity.metadata.full_name)
count_relationships = 0
# Sum the relationships recorded in each version of the entity.
for version in entity_relationships[selected_entity.name]:
    count_relationships += len(version.relationships)
print("Relationships found: {}".format(count_relationships))
print("Example of output object:\n{}".format(entity_relationships[selected_entity.name][0]))
Relationships found: 6
Example of output object:
{
"version": 3,
"tag": "test__model-ex__3",
"relationships": [
{
"tag": "test__dataset-ex__3",
"name": "dataset-ex",
"version": "3",
"entity_type": "dataset"
},
{
"tag": "test__labels-ex2__2",
"name": "labels-ex2",
"version": "2",
"entity_type": "labels"
}
]
}
In addition, this command lets the user choose the output format, which can be JSON (as in the previous example) or CSV. The user can also set export_path to export the data to a file.
An example of how to use the generated CSV is shown below:
import pandas as pd
entity_relationships_csv = manager.get_entity_relationships(selected_entity.name, selected_entity.metadata.full_name, export_type='csv')
df = pd.read_csv(entity_relationships_csv)
df
| | from_tag | from_name | from_version | from_type | to_tag | to_name | to_version | to_type |
|---|---|---|---|---|---|---|---|---|
| 0 | test__model-ex__3 | model-ex | 3 | model | test__dataset-ex__3 | dataset-ex | 3 | dataset |
| 1 | test__model-ex__3 | model-ex | 3 | model | test__labels-ex2__2 | labels-ex2 | 2 | labels |
| 2 | test__model-ex__2 | model-ex | 2 | model | test__dataset-ex__1 | dataset-ex | 1 | dataset |
| 3 | test__model-ex__2 | model-ex | 2 | model | test__labels-ex__2 | labels-ex | 2 | labels |
| 4 | test__model-ex__1 | model-ex | 1 | model | test__dataset-ex__1 | dataset-ex | 1 | dataset |
| 5 | test__model-ex__1 | model-ex | 1 | model | test__labels-ex__1 | labels-ex | 1 | labels |
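The same rows can also be processed with the standard library alone. The sketch below groups the linked tags by source tag; it embeds the six rows shown in the table as a literal string and assumes the exported CSV is comma-separated with the column names displayed there. The helper name links_by_source is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The rows of the table, written as comma-separated text (assumed layout).
CSV_TEXT = """from_tag,from_name,from_version,from_type,to_tag,to_name,to_version,to_type
test__model-ex__3,model-ex,3,model,test__dataset-ex__3,dataset-ex,3,dataset
test__model-ex__3,model-ex,3,model,test__labels-ex2__2,labels-ex2,2,labels
test__model-ex__2,model-ex,2,model,test__dataset-ex__1,dataset-ex,1,dataset
test__model-ex__2,model-ex,2,model,test__labels-ex__2,labels-ex,2,labels
test__model-ex__1,model-ex,1,model,test__dataset-ex__1,dataset-ex,1,dataset
test__model-ex__1,model-ex,1,model,test__labels-ex__1,labels-ex,1,labels
"""


def links_by_source(csv_text):
    """Map each from_tag to the list of to_tag values it is linked to."""
    links = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        links[row["from_tag"]].append(row["to_tag"])
    return dict(links)


model_links = links_by_source(CSV_TEXT)
for source_tag, targets in model_links.items():
    print(source_tag, "->", ", ".join(targets))
```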
3.5 - Get Project Entities Relationships
Like the previous command, get_project_entities_relationships presents entity relationships, but this single command captures the relationships of every entity in the project.
In our case there are 4 versioned entities, so the command checks the relationships of these 4 entities.
project_entities_relationships = manager.get_project_entities_relationships(config_repository_name)
count_relationships = 0
# Walk every entity, then every version of that entity, and sum its relationships.
for entity in project_entities_relationships:
    for version in project_entities_relationships[entity]:
        count_relationships += len(version.relationships)
print("Relationships found: {}".format(count_relationships))
# `entity` still holds the last key visited by the loop above.
print("Example of output object:\n{}".format(project_entities_relationships[entity][0]))
Relationships found: 10
Example of output object:
{
"version": 3,
"tag": "test__model-ex__3",
"relationships": [
{
"tag": "test__dataset-ex__3",
"name": "dataset-ex",
"version": "3",
"entity_type": "dataset"
},
{
"tag": "test__labels-ex2__2",
"name": "labels-ex2",
"version": "2",
"entity_type": "labels"
}
]
}
As with the previous method, the result can be exported as CSV.
project_entities_relationships_csv = manager.get_project_entities_relationships(config_repository_name, export_type='csv')
df = pd.read_csv(project_entities_relationships_csv)
df
| | from_tag | from_name | from_version | from_type | to_tag | to_name | to_version | to_type |
|---|---|---|---|---|---|---|---|---|
| 0 | test__labels-ex2__2 | labels-ex2 | 2 | labels | test__dataset-ex__3 | dataset-ex | 3 | dataset |
| 1 | test__labels-ex2__1 | labels-ex2 | 1 | labels | test__dataset-ex__3 | dataset-ex | 3 | dataset |
| 2 | test__labels-ex__2 | labels-ex | 2 | labels | test__dataset-ex__1 | dataset-ex | 1 | dataset |
| 3 | test__labels-ex__1 | labels-ex | 1 | labels | test__dataset-ex__1 | dataset-ex | 1 | dataset |
| 4 | test__model-ex__3 | model-ex | 3 | model | test__dataset-ex__3 | dataset-ex | 3 | dataset |
| 5 | test__model-ex__3 | model-ex | 3 | model | test__labels-ex2__2 | labels-ex2 | 2 | labels |
| 6 | test__model-ex__2 | model-ex | 2 | model | test__dataset-ex__1 | dataset-ex | 1 | dataset |
| 7 | test__model-ex__2 | model-ex | 2 | model | test__labels-ex__2 | labels-ex | 2 | labels |
| 8 | test__model-ex__1 | model-ex | 1 | model | test__dataset-ex__1 | dataset-ex | 1 | dataset |
| 9 | test__model-ex__1 | model-ex | 1 | model | test__labels-ex__1 | labels-ex | 1 | labels |
As expected, all the relationships that were highlighted in the diagram were captured by the API.
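With the project-wide relationships in hand, one can also ask the reverse question: which entity versions point at a given tag? The sketch below answers it for direct links only, using the ten (from_tag, to_tag) pairs from the table as literal data. The helper name dependents_of is hypothetical and is not part of the ML-Git API.

```python
from collections import defaultdict

# (from_tag, to_tag) pairs copied from the project-wide table.
EDGES = [
    ("test__labels-ex2__2", "test__dataset-ex__3"),
    ("test__labels-ex2__1", "test__dataset-ex__3"),
    ("test__labels-ex__2", "test__dataset-ex__1"),
    ("test__labels-ex__1", "test__dataset-ex__1"),
    ("test__model-ex__3", "test__dataset-ex__3"),
    ("test__model-ex__3", "test__labels-ex2__2"),
    ("test__model-ex__2", "test__dataset-ex__1"),
    ("test__model-ex__2", "test__labels-ex__2"),
    ("test__model-ex__1", "test__dataset-ex__1"),
    ("test__model-ex__1", "test__labels-ex__1"),
]


def dependents_of(edges, target_tag):
    """Return the sorted tags that link directly to target_tag."""
    reverse = defaultdict(set)
    for source, target in edges:
        reverse[target].add(source)
    return sorted(reverse[target_tag])


dataset_v1_users = dependents_of(EDGES, "test__dataset-ex__1")
print(dataset_v1_users)
```

A lookup like this can help judge the impact of changing or removing a dataset version, since it lists every labels and model version that was built against it.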