Cache and Objects
Centralized cache¶
Centralized cache is a configuration mode that allows cached files to be shared between multiple users on the same machine, reducing the total cost of disk space. Currently, this feature works only in Linux and derivative machines.
:warning:Caution:
- We encourage the use of centralized cache just with mutability set as strict.
- It is necessary to deactivate the feature
fs.protected_hardlinks
, because ML-Git uses hardlink to share cache files. Be aware that changing this setting is a risky operation, as malicious people can exploit this (see the extract below). Do this only if you really need to use the Centralized Cache feature. Remember to revert this change if you will stop to use Centralized Cache.
Please read this extract from kernel.org about protected_hardlinks setting:
protected_hardlinks:
A long-standing class of security issues is the hardlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given hardlink (i.e. a root process follows a hardlink created by another user). Additionally, on systems without separated partitions, this stops unauthorized users from "pinning" vulnerable setuid/setgid files against being upgraded by the administrator, or linking to special files.
When set to "0", hardlink creation behavior is unrestricted.
When set to "1" hardlinks cannot be created by users if they do not already own the source file, or do not have read/write access to it.
This protection is based on the restrictions in Openwall and grsecurity.
Changing fs.protected_hardlinks:
- Execute in terminal:
sudo gedit /etc/sysctl.conf
- Search for:
#fs.protected_hardlinks = 0
and uncomment (remove ‘#’). If you didn't find it, add a line with#fs.protected_hardlinks = 0
to this file - Then execute:
sudo sysctl -p
Requirements¶
Machine's root user (administrator).
Configuration steps¶
1 - Create a common directory for each entity with read and write permission for all users:
sudo mkdir -p /srv/mlgit/cache/dataset
sudo mkdir -p /srv/mlgit/cache/labels
sudo mkdir -p /srv/mlgit/cache/model
Change permissions:
sudo chmod -R a+rwX /srv/mlgit/cache/dataset
sudo chmod -R a+rwX /srv/mlgit/cache/labels
sudo chmod -R a+rwX /srv/mlgit/cache/model
2 - With the project ml-git initialized change .ml-git/config.yaml :
datasets:
git: ''
cache_path: 'Cache path directory created on step 1 for dataset entity'
labels:
git: ''
cache_path: 'Path directory created on step 1 for labels entity'
models:
git: ''
cache_path: 'Path directory created on step 1 for model entity'
storages: {}
Centralized objects¶
Centralized objects is a configuration that allow to user share ml-git’s data between machine’s users, avoiding downloading times.
Requirements¶
Machine's root user (administrator).
Configuration steps¶
1 - Create a common directory for each entity with read and write permission for all users:
For Windows users
mkdir \a %ALLUSERSPROFILE%\mlgit\objects\dataset
mkdir \a %ALLUSERSPROFILE%\mlgit\objects\labels
mkdir \a %ALLUSERSPROFILE%\mlgit\objects\model
or
mkgit \a C:\ProgramData\mlgit\objects\dataset
mkgit \a C:\ProgramData\mlgit\objects\labels
mkgit \a C:\ProgramData\mlgit\objects\model
For Linux and derivatives users
sudo mkdir -p /srv/mlgit/objects/dataset
sudo mkdir -p /srv/mlgit/objects/labels
sudo mkdir -p /srv/mlgit/objects/model
Change permissions
sudo chmod -R a+rwX /srv/mlgit/objects/dataset
sudo chmod -R a+rwX /srv/mlgit/objects/labels
sudo chmod -R a+rwX /srv/mlgit/objects/model
2 - With the project ml-git initialized change .ml-git/config.yaml :
datasets:
git: ''
objects_path: 'Path directory created on step 1 for dataset entity'
labels:
git: ''
objects_path: 'Path directory created on step 1 for labels entity'
models:
git: ''
objects_path: 'Path directory created on step 1 for model entity'
storages: {}