MongoDB Schema Design and Common Practices


Exhaustive documentation:

Mongo executables are installed into /usr/bin/, and database files are stored under /data/db/

Log file location: /var/log/mongodb/mongodb.log



Run sudo service mongodb start/stop/restart, or simply issue mongod to start the MongoDB process
Run mongo to enter the MongoDB console
> show dbs  #Show all databases
> db             #Show current db
> db.help()   #View commands related to db

# "Create" a new database named foo_db
> use foo_db  # Mongo creates this DB virtually; it is only really created once we save something into one of its collections.
# Create a new document into a collection (implicitly)
> db.user_profiles.insert({ first_name: "Wayne", last_name: "Ye" })
# Explicitly create a new collection with db.createCollection("<name>")
# Query the collection
> db.user_profiles.find()
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "first_name" : "Wayne", "last_name" : "Ye" }
> db.user_profiles.insert({ first_name: "Wendy", last_name: "Shen", gender: "Female" })
> db.user_profiles.find()
# Update
> db.collection.update( { field: value1 }, { $set: { field1: value2 } } );
# View status
> db.stats()
> db.mycol.stats()
# Query subdocument using Dot Notation
> db.demo.insert({ "Items": [ { "Name": "Milk Powder", "Price": 9.9 }, { "Name": "Toy Car", "Price": 26 } ] })
> db.demo.find({ "Items.Price": { $gt: 20 } })
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "Items" : [  {  "Name" : "Milk Powder",  "Price" : 9.9 },  {  "Name" : "Toy Car",  "Price" : 26 } ] }
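The dot-notation query above can be pictured with a small in-memory sketch. This is plain JavaScript that runs without a database, not MongoDB's actual implementation; resolvePath and matchesGt are hypothetical helper names introduced only for illustration.

```javascript
// In-memory sketch of dot-notation matching; not MongoDB's real matcher.
// resolvePath and matchesGt are hypothetical helper names.

// Collect every value reachable via a dotted path, descending into arrays.
function resolvePath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    // An array is searched element by element, as with "Items.Price".
    return value.flatMap((element) => resolvePath(element, parts));
  }
  if (value !== null && typeof value === "object" && parts[0] in value) {
    return resolvePath(value[parts[0]], parts.slice(1));
  }
  return [];
}

// True when at least one resolved value is greater than the threshold ($gt).
function matchesGt(doc, path, threshold) {
  return resolvePath(doc, path.split(".")).some((v) => v > threshold);
}

const demo = {
  Items: [
    { Name: "Milk Powder", Price: 9.9 },
    { Name: "Toy Car", Price: 26 },
  ],
};

console.log(matchesGt(demo, "Items.Price", 20)); // true: "Toy Car" costs 26
console.log(matchesGt(demo, "Items.Price", 30)); // false: nothing costs more than 30
```

Note that a match on one array element selects the whole document, which is why the query result above contains both items and not only "Toy Car".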
Batch administration from JavaScript  (Mongo shell JS references)

mongo localhost:27017/mydb db_schema.js


load("scripts/myjstest.js") OR load("/data/db/scripts/myjstest.js")

Schema Design

Embedding (de-normalize data)

Store two related pieces of data in a single document.


Use embedding when:

  • There is a "contains" relationship between entities.
  • There is a "one-to-many" relationship, and the "many" objects always appear inline with the "one".

Example 1: Blog with comments

Denormalized blog with comments

{
 _id: 1,
 title: "Investigation on MongoDB",
 content: "some investigation contents",
 permalink: "",
 comments: [
   { content: "Gorgeous post!!!", nickname: "Scott", email: "", timestamp: "1377742184305" },
   { content: "Splendid article!!!", nickname: "Guthrie", email: "", timestamp: "1377742184305" }
 ]
}

Example 2: Dishes and Chefs

Normalized Dishes and Chefs

// A dish document
{
 _id: 1,
 name: "Kong Bao Ji Ding",
 price: 5.5,
 rate: 4.5,
 chefs: [ "Flora Zhang", "Cristina Wang" ]
}

// A chef document
{
 _id: 1,
 name: "Flora Zhang",
 age: 32,
 avatar: "",
 dishes: [ "Kong Bao Ji Ding", "Knight Zhang Beef", "Ma Po Tou Fu" ]
}


Benefits of embedding:

  • Better performance for read operations.
  • Request and retrieve related data in a single database operation.

Referencing (normalize data)

Store references between two documents to indicate a relationship between the data represented in each document.

Use referencing:

  • when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • to represent more complex many-to-many relationships.
  • to model large hierarchical data sets.


  • Separation of Concerns
  • Data model independent of logic

Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server. 

Example 3: Books and publisher

Books and publishers

// A book document
{
 _id: 1,
 name: "MongoDB Applied Design Patterns",
 price: 35,
 rate: 5,
 author: "Rick Copeland",
 ISBN: "1449340040",
 publisher_id: 1,
 reviews: [
   { isUseful: true, content: "Cool book!", reviewer: "Dick", timestamp: "1377742184305" },
   { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen", timestamp: "1377742184305" }
 ]
}

// The publisher document referenced by publisher_id
{
 _id: 1,
 name: "Packtpub INC",
 address: "2nd Floor, Livery Place 35 Livery Street Birmingham",
 telephone: "+44 0121 265 6484"
}
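
The extra roundtrips that referencing costs can be sketched in memory. The two arrays below stand in for the books and publishers collections, and findOne is a hypothetical helper that only simulates a query by counting calls; none of this talks to a real server.

```javascript
// In-memory stand-ins for two collections; the data mirrors Example 3.
const books = [
  { _id: 1, name: "MongoDB Applied Design Patterns", publisher_id: 1 },
];
const publishers = [
  { _id: 1, name: "Packtpub INC", telephone: "+44 0121 265 6484" },
];

let roundTrips = 0; // counts simulated queries to the server

// Hypothetical helper: simulates one query returning the first match or null.
function findOne(collection, predicate) {
  roundTrips += 1;
  return collection.find(predicate) || null;
}

// Resolving a reference takes two queries: the book, then its publisher.
function findBookWithPublisher(bookId) {
  const book = findOne(books, (b) => b._id === bookId);
  if (book === null) return null;
  const publisher = findOne(publishers, (p) => p._id === book.publisher_id);
  return { ...book, publisher };
}

const resolved = findBookWithPublisher(1);
console.log(resolved.publisher.name); // "Packtpub INC"
console.log(roundTrips); // 2
```

With an embedded publisher the same read would need a single query, at the price of repeating the publisher data in every book.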

Advanced Features


Mongo supports indexing a key inside a subdocument. For the "Books and Publishers" collection above, the reviewer key can be indexed as follows (ensureIndex is the helper name in older shells; since MongoDB 3.0 the equivalent call is createIndex):

db.books.ensureIndex({ "reviews.reviewer": 1 })
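
Conceptually, an index on a key inside an array of subdocuments holds one entry per array element. The sketch below imitates that idea with a Map in plain JavaScript. It is only an illustration: the real index is a B-tree maintained by the server, buildReviewerIndex is a hypothetical helper, and the second book is invented sample data.

```javascript
// Conceptual sketch only: a real MongoDB index is a server-side B-tree.
const books = [
  {
    _id: 1,
    name: "MongoDB Applied Design Patterns",
    reviews: [
      { reviewer: "Dick", content: "Cool book!" },
      { reviewer: "Xiaoshen", content: "Cool book!" },
    ],
  },
  // Invented second document so the index has something to distinguish.
  { _id: 2, name: "Hypothetical second book", reviews: [{ reviewer: "Dick", content: "OK" }] },
];

// Build reviewer -> set of book _ids, one index entry per array element.
function buildReviewerIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const review of doc.reviews || []) {
      if (!index.has(review.reviewer)) index.set(review.reviewer, new Set());
      index.get(review.reviewer).add(doc._id);
    }
  }
  return index;
}

const reviewerIndex = buildReviewerIndex(books);
console.log([...reviewerIndex.get("Dick")]); // [ 1, 2 ]
console.log([...reviewerIndex.get("Xiaoshen")]); // [ 1 ]
```

A lookup by reviewer then goes straight to the matching books without scanning every document.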

Aggregation framework

A MongoDB aggregation is a series of special operators applied to a collection. Each operator is a JavaScript object with a single property: the operator name, whose value is an options object. The core of the aggregation framework is the aggregation pipeline, a framework for data aggregation modeled on the concept of data processing pipelines.

Aggregation was introduced in Mongo version 2.2. The table below compares SQL terms with the corresponding MongoDB aggregation operators:

SQL Terms, Functions, and Concepts    MongoDB Aggregation Operators
WHERE                                 $match
GROUP BY                              $group
HAVING                                $match
SELECT                                $project
ORDER BY                              $sort
LIMIT                                 $limit
SUM()                                 $sum
COUNT()                               $sum
join                                  No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document.
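
To make the mapping concrete, in particular why both SUM() and COUNT() correspond to $sum, here is a minimal in-memory sketch. The orders data, its field names, and the groupByCustomer helper are assumptions made up for illustration; the shell pipeline appears only in a comment, and the code imitates it in plain JavaScript.

```javascript
// Hypothetical data; collection and field names are assumptions for illustration.
const orders = [
  { customer: "A", amount: 10 },
  { customer: "B", amount: 5 },
  { customer: "A", amount: 7 },
];

// SQL:   SELECT customer, SUM(amount), COUNT(*) FROM orders
//        WHERE amount > 6 GROUP BY customer
// Shell: db.orders.aggregate([
//          { $match: { amount: { $gt: 6 } } },
//          { $group: { _id: "$customer", total: { $sum: "$amount" }, count: { $sum: 1 } } }
//        ])

// In-memory stand-in for the $group stage above.
function groupByCustomer(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.customer) || { _id: doc.customer, total: 0, count: 0 };
    group.total += doc.amount; // SUM(amount) ~ { $sum: "$amount" }
    group.count += 1; // COUNT(*) ~ { $sum: 1 }
    groups.set(doc.customer, group);
  }
  return [...groups.values()];
}

// The filter plays the role of $match, then the grouping runs on what is left.
const result = groupByCustomer(orders.filter((o) => o.amount > 6));
console.log(result); // [ { _id: 'A', total: 17, count: 2 } ]
```

Counting is just summing the constant 1 per document, which is why the table lists $sum for COUNT().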

For example, still using the "Books and Publishers" example above, imagine I want to query a specific reviewer together with the book(s) he/she reviewed. I can do this:

> db.books.aggregate([ { $unwind: "$reviews" }, { $match: { "reviews.reviewer": "Xiaoshen" } } ])
{
 "result" : [
  {
   "_id" : 1,
   "name" : "MongoDB Applied Design Patterns",
   "price" : 35,
   "rate" : 5,
   "author" : "Rick Copeland",
   "ISBN" : "1449340040",
   "publisher_id" : 1,
   "reviews" : {
    "isUseful" : true,
    "content" : "Cool book!",
    "reviewer" : "Xiaoshen",
    "timestamp" : "1377742184305"
   }
  }
 ],
 "ok" : 1
}

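What the two stages of that pipeline do can be imitated in plain JavaScript. This is a rough sketch under simplifying assumptions, not the server's implementation: unwind and matchEquals are hypothetical helpers, and matchEquals only handles equality on a two-part dotted path.

```javascript
// In-memory copy of the relevant fields of the book document from Example 3.
const books = [
  {
    _id: 1,
    name: "MongoDB Applied Design Patterns",
    reviews: [
      { isUseful: true, content: "Cool book!", reviewer: "Dick" },
      { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen" },
    ],
  },
];

// $unwind: emit one copy of the document per array element,
// with the array field replaced by that single element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// $match, restricted here to equality on a two-part dotted path.
function matchEquals(docs, path, expected) {
  const [outer, inner] = path.split(".");
  return docs.filter((doc) => doc[outer] && doc[outer][inner] === expected);
}

const unwound = unwind(books, "reviews");
const result = matchEquals(unwound, "reviews.reviewer", "Xiaoshen");

console.log(unwound.length); // 2: one document per review
console.log(result.length); // 1
console.log(result[0].reviews.reviewer); // "Xiaoshen"
```

After the unwind step "reviews" is a single object and no longer an array, which matches the shape of the result printed by the shell.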
Aggregation introduction:

One caveat: server-side JavaScript features such as map-reduce and $where run on a JavaScript VM (V8 since MongoDB version 2.4). Although V8 is very fast, it cannot compete with a natively compiled and optimized C/C++ implementation such as the one behind the aggregation pipeline operators. Refer:

Common Practices 

  • Denormalize data when it is frequently read together (one-to-one, one-to-many)
  • Normalize data when both entities are frequently queried separately, or when there is too much data duplication
  • Reduce collection size by always using short field names as a convention. This will help you save memory over time.
  • Avoid using DBRef! Why
  • Always test queries with .explain() to check that you’re hitting the right index.
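
The point about short field names can be estimated with a rough sketch. It uses the JSON length as a stand-in for the stored size, which is an assumption: BSON is encoded differently, but like JSON it repeats the field names in every document. approxSize and the abbreviated names fn and ln are hypothetical.

```javascript
// Rough proxy only: BSON differs from JSON, but both repeat field names per document.
function approxSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const longNames = { first_name: "Wayne", last_name: "Ye" };
const shortNames = { fn: "Wayne", ln: "Ye" }; // hypothetical abbreviations

const savedPerDoc = approxSize(longNames) - approxSize(shortNames);
console.log(savedPerDoc); // 15 bytes per document in this JSON approximation
console.log(savedPerDoc * 1000000); // 15000000 bytes across a million documents
```

The saving has to be weighed against readability, since abbreviated names make documents harder to understand.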

Useful resources

The ultimate manual 

Great article explaining the differences between MongoDB and other well-known relational DBs:

Data Modeling Considerations for MongoDB Applications 

Serialize Documents with the CSharp Driver 

Schema Design --Indexes!! 

Sharding and Mongo DB

MongoDB Operations Best Practices 

Sharding and Replica Sets Illustrated



