The objective of this post is to document getting started with MongoDB, working up to replication (known as replica sets in MongoDB) and eventually sharding, which lets you add storage capacity to the system arbitrarily by adding more shards. For a good general overview of these types of systems, see Tim Berglund's talk "Four Distributed Systems Architectural Patterns".
If we want to reduce the risk of losing data, one way to mitigate it is to keep copies of the data. This is essentially what replication does.
Heads up: make sure you actually need MongoDB (or similar) for your project. If it's just for fun because it's cool, carry on! You're learning! Otherwise, if it's for a business and you're a start-up, trying to learn MongoDB before you even have a customer base is a very bad idea. Start simple. See "Sharding and Scaling your database" by Neha Narula.
It also wouldn't make sense to keep all of your copies in one place. Depending on your risk management, this might mean putting your data on separate VPS instances, on entirely different physical servers, and/or in different geographical regions. MongoDB lets you take further advantage of this with tagging, for instance to direct requests from users in one country to the MongoDB instances closest to them, for speed. How to use MongoDB tagging is described in some detail in Everything you Need To Know About Sharding.
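One form of tagging is giving replica set members tags and letting clients read from members that match a read-preference tag set. The Python below is a simplified model of that matching logic, not driver code: tag sets are tried in order, a member matches if it carries every key/value pair in the set, the first tag set with any match wins, and an empty set {} matches everyone. The hostnames, region tags and latencies are hypothetical, and picking the single lowest-latency member is a simplification (real drivers pick randomly among members within a small latency window).

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str          # hypothetical hostname
    tags: dict         # replica set member tags, e.g. {"region": "eu"}
    latency_ms: float  # round-trip time as measured from this client


def select_member(members, tag_sets):
    """Simplified model of read-preference tag-set matching."""
    for tag_set in tag_sets:
        # A member matches when it has every key/value pair in the tag set.
        matching = [m for m in members
                    if all(m.tags.get(k) == v for k, v in tag_set.items())]
        if matching:
            # Simplification: choose the single fastest matching member.
            return min(matching, key=lambda m: m.latency_ms)
    return None  # no tag set matched any member


members = [
    Member("mongo-uk.example.net:27017", {"region": "uk"}, 12.0),
    Member("mongo-nl.example.net:27017", {"region": "eu"}, 25.0),
    Member("mongo-us.example.net:27017", {"region": "us"}, 90.0),
]

# A client in mainland Europe prefers "eu" members, falling back to any member.
print(select_member(members, [{"region": "eu"}, {}]).host)
```

Ordering the tag sets from most to least specific, ending with {}, is what gives "use the nearby copy if there is one, otherwise anything".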
So, in the spirit of holding on to our data, let's assume we're putting our data onto separate VPS instances hosted by different providers. We'll use Bitfolk, Vultr and Digital Ocean, creating one VPS instance on each. Note that without proper planning and use of tagging, this setup is terrible if you want fast writes, since acknowledging a write can require a round trip to a member hosted by another provider.
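A replica set spanning the three VPS instances is described by a configuration document that you pass to rs.initiate() in the mongo shell, after starting each mongod with the same set name (for example mongod --replSet rs0). The Python below is a minimal sketch that builds such a document; the set name rs0, the hostnames and the "provider" tag values are hypothetical placeholders for your own.

```python
import json


def build_replica_set_config(name, members):
    """Build a replica set config document of the shape rs.initiate() accepts.

    members: list of (host:port, provider) pairs. Each member gets a sequential
    _id and a "provider" tag so that reads can later be targeted by tag.
    """
    return {
        "_id": name,  # must match the --replSet name every mongod was started with
        "members": [
            {"_id": i, "host": host, "tags": {"provider": provider}}
            for i, (host, provider) in enumerate(members)
        ],
    }


config = build_replica_set_config("rs0", [
    ("mongo1.example.net:27017", "bitfolk"),
    ("mongo2.example.net:27017", "vultr"),
    ("mongo3.example.net:27017", "digitalocean"),
])
print(json.dumps(config, indent=2))
```

The printed JSON is also a valid JavaScript object literal, so it can be pasted straight into rs.initiate(...) on one of the members.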
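MongoDB isn't geared toward fast write performance. However, depending on your application, capped collections may be what you're looking for: they are fixed-size collections that keep documents in insertion order, overwrite the oldest documents once full, and "support higher insertion throughput".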
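In the mongo shell a capped collection is created with something like db.createCollection("log", { capped: true, size: 100000 }), where size is a byte budget and an optional max limits the document count. To show the behaviour rather than the API, here is a toy in-memory model in Python; it is not how MongoDB implements capped collections, and using the JSON length as the document size is an assumption standing in for BSON size.

```python
import json
from collections import deque


class ToyCappedCollection:
    """In-memory model of capped-collection semantics, not MongoDB's implementation.

    Documents are kept in insertion order; once the byte budget (size) or the
    optional document limit (max_docs) is exceeded, the oldest are discarded.
    """

    def __init__(self, size, max_docs=None):
        if max_docs is not None and max_docs < 1:
            raise ValueError("max_docs must be at least 1")
        self.size = size
        self.max_docs = max_docs
        self._docs = deque()  # (document, encoded_size) pairs, oldest first
        self._used = 0

    def insert(self, doc):
        n = len(json.dumps(doc).encode("utf-8"))  # JSON length stands in for BSON size
        if n > self.size:
            raise ValueError("document larger than the whole collection")
        self._docs.append((doc, n))
        self._used += n
        # Evict from the front (oldest) until both limits are satisfied again.
        while self._used > self.size or (
            self.max_docs is not None and len(self._docs) > self.max_docs
        ):
            _, old = self._docs.popleft()
            self._used -= old

    def find(self):
        """Return documents in natural (insertion) order."""
        return [doc for doc, _ in self._docs]


log = ToyCappedCollection(size=1_000_000, max_docs=3)
for i in range(5):
    log.insert({"n": i})
print([d["n"] for d in log.find()])  # [2, 3, 4]
```

Because inserts only ever append and evict from the front, there is no free-space bookkeeping, which is the intuition behind the higher insertion throughput.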
The _id field:
In MongoDB, the _id field is the primary key of a document; by default its value is an ObjectId. It is automatically generated for you on insert, but you may also set it manually at creation time, e.g. by calling the ObjectId function:
ObjectId("5099803df3f4948bd2f98391") creates a valid ObjectId from that 24-character hexadecimal string, while ObjectId() with no argument generates a new one.
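A "valid" hex ObjectId is simply 24 hexadecimal characters encoding 12 raw bytes. The Python below is a minimal sketch of that check; the function name object_id_from_hex is hypothetical, and in practice PyMongo's bson.ObjectId class (including ObjectId.is_valid) does this for you.

```python
import re

_HEX_24 = re.compile(r"[0-9a-fA-F]{24}")


def object_id_from_hex(hex_string):
    """Return the 12 raw bytes of an ObjectId written as 24 hex characters."""
    if not isinstance(hex_string, str) or not _HEX_24.fullmatch(hex_string):
        raise ValueError(f"not a valid ObjectId hex string: {hex_string!r}")
    return bytes.fromhex(hex_string)


raw = object_id_from_hex("5099803df3f4948bd2f98391")
print(len(raw))  # 12
```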
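An ObjectId is a 12-byte value made up of:
- a 4-byte value representing the seconds since the Unix epoch,
- a 3-byte machine identifier,
- a 2-byte process id, and
- a 3-byte counter, starting with a random value.
Newer MongoDB versions replace the machine identifier and process id with a single 5-byte random value generated once per process; the total is still 12 bytes.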
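To make the 4/3/2/3-byte layout concrete, here is a minimal Python sketch, not PyMongo's implementation, that splits a hex ObjectId into those fields and generates new ids in the same layout. Assumptions for illustration: the machine identifier is the first 3 bytes of an MD5 hash of the hostname, the process id is reduced modulo 0xFFFF to fit in 2 bytes, and every field is read big-endian (the timestamp and counter are specified that way). For ids created by newer drivers, the "machine" and "pid" fields decoded here are just random bytes.

```python
import datetime
import hashlib
import os
import random
import socket
import struct
import threading
import time


def decode_object_id(hex_string):
    """Split a 24-char hex ObjectId into timestamp, machine, pid and counter."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    timestamp = struct.unpack(">I", raw[0:4])[0]  # big-endian seconds since epoch
    return {
        "timestamp": timestamp,
        "generated_at": datetime.datetime.fromtimestamp(
            timestamp, tz=datetime.timezone.utc),
        "machine": int.from_bytes(raw[4:7], "big"),
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


# Assumption: a 3-byte machine id derived from a hash of the hostname.
_MACHINE = hashlib.md5(socket.gethostname().encode()).digest()[:3]
_counter = random.randint(0, 0xFFFFFF)  # counter starts at a random value
_lock = threading.Lock()


def generate_object_id():
    """Generate a new hex ObjectId using the 4/3/2/3-byte layout."""
    global _counter
    with _lock:
        count = _counter
        _counter = (_counter + 1) % 0x1000000  # wrap within 3 bytes
    raw = (struct.pack(">I", int(time.time()))
           + _MACHINE
           + struct.pack(">H", os.getpid() % 0xFFFF)
           + count.to_bytes(3, "big"))
    return raw.hex()


print(decode_object_id("5099803df3f4948bd2f98391")["generated_at"])
print(generate_object_id())
```

Decoding the example id from above gives a timestamp of 1352237117, i.e. 2012-11-06 21:25:17 UTC, which is why ObjectIds created in the same process sort roughly by creation time.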