
# Intro & FAQ

Introduction and frequently asked questions about Origins.

## What is Origins?

Origins is an open source bi-temporal database for storing and retrieving facts about the *state* of things. It supports "time-travel" queries, aggregate views, and change detection.

## What is the motivation behind Origins?

The fundamental goal of Origins is to make it easy to record and derive value from the provenance of things.

The motivation came from working on complex data integration projects, which consist of many components that change over time at different rates, such as requirements, systems, processes, people, and the data itself. Changing one component, such as a requirement, may require changing several degrees of dependent components, such as specifications, scripts, and/or schemata. Ultimately, however, it is the *artifacts* the system produces that are held accountable for these changes. This becomes apparent when a consumer interacts with the system and questions the validity of the information. To confidently rectify the issue, the state of the system must be known at the time the artifact was produced.

Keeping a record of all things that influence something over time is termed provenance. A valid record of provenance would involve taking a snapshot of the state of all components that were involved in producing the artifact at the time it was produced. Depending on the complexity of the system (both the number of components and the relationships between them) and the granularity of the state information recorded for each component, performing a naive *snapshot* of the system could be computationally expensive and produce a lot of redundant data.

## What problem does Origins solve?

For many use cases, tracking when and how data change over time can be a useful feature or an essential requirement. In this context, *data* are the state of multiple, independent, and heterogeneous *resources* that span multiple business or functional domains.

Origins was designed for collaborative environments, but with an emphasis on *reducing the need* to coordinate efforts to change resources. This allows resources to change at independent rates without information loss, so dependent resources can be notified and updated at their own pace.

## What is the current state of the project?

Origins is in an **alpha stage** of development; however, it is being developed against two production-quality, multi-organization data integration projects.

## What are the design goals?

- **Define a simple interface for recording provenance.** The easier it is to author provenance data, the easier it is for a person to get started and to integrate it into existing toolchains.
- **Provide tools for answering provenance-related questions.** Data is only useful if you can derive value from it, so the tools must deliver immediate value.

When a design or feature is evaluated, it must contribute to one of these two goals.

## What is the data model Origins uses?

The data model is based on the entity-attribute-value (EAV) data model. The *unit of data* is called a **fact**. Facts are [immutable](https://en.wikipedia.org/wiki/Immutable_object) and stored in an append-only format.

## Why is an EAV data model used?

The [original use case](doc:establishing-trust-in-data-integration-projects) that drove the development of Origins required a flexible data model to represent the various types of resources and data. An EAV model was selected because it is essentially the simplest model for representing very granular data. As a result, virtually anything can be represented in this model, including both attributes of data and relationships between data.

## What does "append-only" mean?

Append-only refers to the way facts are written to the database. Most databases can be described as [place-oriented systems](http://www.infoq.com/presentations/Value-Values). These systems maintain only the current state of the data and do so by updating values *in place*. When an update (or delete) like this occurs, the prior data is overwritten or deleted and cannot be accessed again. In an append-only system, data is never updated in place; instead, the new *state* of the data is appended to the data log.

## If a fact is immutable, how can data be *updated*?

In addition to the EAV components, each fact has an **operation** associated with it, either *assert* or *retract*. For a *new* entity/attribute pair, an assertion represents an insertion. If the pair already exists, asserting a fact with a different value represents an update. A retraction represents a deletion.

In other words, an *update* is a logical operation: two assertions with different values for the same entity/attribute pair.

## What does bi-temporal mean?

[Bi-temporal](https://en.wikipedia.org/wiki/Temporal_database) means that data is stored with two times associated with it: *transaction* time and *valid* time. Transaction time is the time the information *becomes known* or available, while valid time is the time the information itself is true or valid in the world.

For example, if you were to learn about an event that occurred on January 1, 1972, the transaction time of that information would be *now* and the valid time would be January 1, 1972.

## How is the bi-temporal model used?

As mentioned above, the Origins data model is *based* on the EAV data model. The two temporal attributes are also part of the data model, which means that the transaction and valid times are known for every fact.

Combining these two models makes it possible to ask questions of the data at *any point in time* and, therefore, to *compare the state of data between two points in time*. This enables the following features:

- Report the provenance and lineage of data over time.
- Derive a history of events (audit log).
- Detect when retrospective data was added.
- Detect outdated or invalid relationships between data.

## What is a resource?
A **resource** is a named set of entities corresponding to something such as a:

- Data model
- Data dictionary
- Vocabulary
- Document
- Service
- Data set

## How do you create a resource?

Many projects start by generating resources from existing systems, services, and sources using one of the built-in generators, including:

- Databases
    - PostgreSQL
    - Oracle
    - MySQL
    - SQLite
    - MongoDB
- Files
    - REDCap Data Dictionary
    - CSV files with a header
- Services
    - GitHub Issues
- Other
    - Git repository

Another method is to **fork** an existing resource to make domain-specific changes. The advantage of forking an existing resource is that its existing history is included.

The last option, of course, is to create a new one from scratch.

## How is a data set represented as a resource?

Data sets are unique in that the data depend on the model they are a record of. Therefore, when a data-based resource is created, a resource for the data model must also exist.

## What is the content of a resource?

As mentioned above, a resource is a named set of **entities**. An entity is just an opaque identifier and has no inherent meaning. **Facts** asserted for an entity associate it with descriptive and structural information, statistics and annotations, and relationships to other entities.

## What does a resource's history look like?

The history is a by-product of the granular, fact-based data model. For each fact asserted for a resource that results in a change, an event is produced.

For example, if we have a resource named "Joe's Diary" and it has a fact stating that Joe lives in New York, the resource would look like this:

operation|entity|attribute|value
----|----|----|----
assert|joe|lives in|New York

If he moves to California, we retract the previous fact and assert the new one:

operation|entity|attribute|value
----|----|----|----
retract|joe|lives in|New York
assert|joe|lives in|California

An event is then emitted noting the final result of the change:

```
"joe" "lives in" changed from "New York" to "California"
```
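The "Joe's Diary" history can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Origins implementation or API: `Fact`, `FactLog`, `transact`, `as_of`, and `changes` are hypothetical names, transaction time is modeled as an incrementing transaction number, and valid time is omitted for brevity. It shows how an append-only log of *assert*/*retract* facts can be replayed to recover the state at any past transaction, and how a change event might be derived by comparing the state before and after a transaction.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical sketch of an append-only fact log; not the Origins API.
@dataclass(frozen=True)  # frozen: a recorded fact cannot be modified
class Fact:
    operation: str  # "assert" or "retract"
    entity: str
    attribute: str
    value: str
    tx: int  # transaction number, standing in for transaction time


class FactLog:
    def __init__(self) -> None:
        self._facts: list[Fact] = []  # facts are only ever appended
        self._tx = 0

    def transact(self, *ops: tuple[str, str, str, str]) -> int:
        """Append (operation, entity, attribute, value) tuples as one transaction."""
        self._tx += 1
        for op, entity, attribute, value in ops:
            if op not in ("assert", "retract"):
                raise ValueError(f"unknown operation: {op}")
            self._facts.append(Fact(op, entity, attribute, value, self._tx))
        return self._tx

    def as_of(self, tx: int) -> dict[tuple[str, str], str]:
        """Replay the log up to transaction `tx` to get the state at that time."""
        state: dict[tuple[str, str], str] = {}
        for fact in self._facts:
            if fact.tx > tx:
                break  # the log is ordered by transaction
            key = (fact.entity, fact.attribute)
            if fact.operation == "assert":
                state[key] = fact.value  # insert, or update an existing pair
            elif state.get(key) == fact.value:
                del state[key]  # a retraction removes the matching value
        return state

    def changes(self, tx: int) -> list[str]:
        """Describe the net effect of transaction `tx` as change events."""
        before, after = self.as_of(tx - 1), self.as_of(tx)
        events = []
        for entity, attribute in sorted(before.keys() | after.keys()):
            old = before.get((entity, attribute))
            new = after.get((entity, attribute))
            if old == new:
                continue
            if old is None:
                events.append(f'"{entity}" "{attribute}" set to "{new}"')
            elif new is None:
                events.append(f'"{entity}" "{attribute}" removed (was "{old}")')
            else:
                events.append(
                    f'"{entity}" "{attribute}" changed from "{old}" to "{new}"'
                )
        return events


log = FactLog()
t1 = log.transact(("assert", "joe", "lives in", "New York"))
t2 = log.transact(
    ("retract", "joe", "lives in", "New York"),
    ("assert", "joe", "lives in", "California"),
)
print(log.changes(t2))  # ['"joe" "lives in" changed from "New York" to "California"']
print(log.as_of(t1))    # {('joe', 'lives in'): 'New York'}
```

Because nothing is overwritten, `log.as_of(t1)` still returns Joe's New York address after the move. The event is computed from the net difference in state rather than from the raw retract/assert pair, which is why a single "changed from ... to ..." event is reported for the transaction.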