Hangar keeps track of data in the form of multidimensional arrays, as well as strings and bytestrings, so that you can rely on a representation that is closest to your AI pipelines. In Hangar, data compression and I/O are optimized according to data type and sparsity, so you can save space and CPU cycles on datasets of any size, from small to huge.
Rather than tracking the contents of a directory over time, Hangar knows about the data you version. There is no need to materialize data in order to access it, or even to fetch it all from a remote. Working on partial clones and accessing different branches from multiple concurrent processes are fully supported.
Hangar has a brand new book-keeping engine, heavily inspired by git and optimized for tracking large collections of numerical data. Under the hood, a Merkle DAG and cryptographic hashing ensure the consistency of history. Structural sharing optimizes tracking of large datasets. Content-addressable storage ensures that space requirements depend on the actual information stored in a repository.
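To make the idea of content-addressable storage and hash-linked history more concrete, here is a minimal Python sketch. It is not Hangar's implementation or API: `ContentStore` and `commit_digest` are hypothetical names invented for illustration, and SHA-256 is an assumed choice of hash. The sketch only shows why identical data is stored once and why each commit digest depends on its parent.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressable store (hypothetical, not Hangar's backend).

    Each blob is keyed by the SHA-256 digest of its bytes, so storing
    the same bytes twice does not take extra space.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the existing blob if this digest is already known
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


def commit_digest(parent: str, refs: dict) -> str:
    """Hash a commit record made of the parent digest and the sample refs.

    Because the parent digest is part of the hashed payload, changing any
    ancestor changes every descendant digest (the Merkle DAG property).
    """
    payload = json.dumps({"parent": parent, "refs": refs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


store = ContentStore()

# First snapshot: two samples with different contents.
v1 = {"a": store.put(b"\x00" * 1024), "b": store.put(b"\x01" * 1024)}
c1 = commit_digest("", v1)

# Second snapshot reuses v1's references (structural sharing) and adds a
# sample whose bytes are identical to "a", so no new blob is stored.
v2 = dict(v1)
v2["c"] = store.put(b"\x00" * 1024)
c2 = commit_digest(c1, v2)
```

In this sketch the store holds only two blobs after both snapshots, because the space used depends on the distinct contents rather than on the number of samples or commits.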
Zero-cost branching to encourage data evolution and collaboration.
Cheap merging to build datasets over time (with multiple collaborators).
Ability to push and pull changes directly to collaborators or a central server.
Data storage is abstracted, modularized and extensible.
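
One way to see why branching can be close to free in a git-inspired design is to treat a branch as nothing more than a name that points at a commit. The Python sketch below assumes that model; the `History` class and its methods are hypothetical and are not part of Hangar's API.

```python
class History:
    """Toy commit history where a branch is just a named pointer.

    Hypothetical illustration of git-style refs, not Hangar's internals.
    """

    def __init__(self):
        self.commits = {}   # commit id -> parent commit id (None for a root)
        self.branches = {}  # branch name -> commit id at the branch head

    def commit(self, branch: str, commit_id: str) -> None:
        # The new commit's parent is the current head of the branch.
        self.commits[commit_id] = self.branches.get(branch)
        self.branches[branch] = commit_id

    def create_branch(self, name: str, start: str) -> None:
        # Creating a branch copies one pointer; no data is duplicated.
        self.branches[name] = self.branches[start]


h = History()
h.commit("master", "c1")
h.create_branch("experiment", "master")
h.commit("experiment", "c2")
```

After these calls, `master` still points at `c1` while `experiment` points at `c2`, whose parent is `c1`. Both branches share the same underlying history.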