- Distribution Models
- Aggregate-Oriented - Aggregate orientation makes the distribution of data easier
- Distributed Memory - Each node has its own local memory; nodes communicate over the network
- Shared Memory - All processors access the same memory
- Sharding - Distributing different parts of the data across multiple servers. No data redundancy (see the routing sketch after this list)
- Replication - The same data is copied across multiple servers
- Master-slave replication
- Peer to peer replication
- Master-slave replication reduces the chance of update conflicts, since all writes go through the master.
- Peer-to-peer replication avoids loading all writes onto a single server
- Master-slave is no code
- Salesforce utilizes both models
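
A minimal Python sketch of the two ideas above, using hypothetical server names: a hash of the key picks the shard (sharding, no redundancy), and inside each shard writes go to the master while reads can be served by slaves (master-slave replication).

```python
import hashlib

# Hypothetical cluster: each shard holds a different slice of the data (sharding),
# and within a shard the master takes writes while slaves serve reads (replication).
SHARDS = [
    {"master": "db-0-master", "slaves": ["db-0-slave-a", "db-0-slave-b"]},
    {"master": "db-1-master", "slaves": ["db-1-slave-a", "db-1-slave-b"]},
    {"master": "db-2-master", "slaves": ["db-2-slave-a", "db-2-slave-b"]},
]

def shard_for(key: str) -> dict:
    """Pick a shard by hashing the key; each key lives on exactly one shard."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

def route_write(key: str) -> str:
    """All writes go to the shard's master, which reduces update conflicts."""
    return shard_for(key)["master"]

def route_read(key: str) -> str:
    """Reads can be served by any replica of the shard (load balancing omitted)."""
    shard = shard_for(key)
    return shard["slaves"][0] if shard["slaves"] else shard["master"]

if __name__ == "__main__":
    print(route_write("user:42"))  # some shard's master
    print(route_read("user:42"))   # a slave of the same shard
```
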
- CAP = Consistency, Availability, Partition Tolerance
- MTTR - MEAN TIME TO REPAIR - How long it takes to repair after a failure
- MTBF - MEAN TIME BETWEEN FAILURES - How long the system stays up between failures
- Availability = MTBF / (MTBF + MTTR): higher MTBF and lower MTTR mean higher availability.
- This is the availability of the cloud service (e.g., MTBF = 999 hours and MTTR = 1 hour gives 99.9% availability).
- Partition Tolerance - The system keeps operating even when the network splits the nodes into separate partitions.
- Three variables: r, w, n
- r = number of nodes that must respond to a read, w = number of nodes that must confirm a write, n = replication factor (number of replicas); reads are strongly consistent when r + w > n (see the sketch below)
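
A small Python sketch of how r, w, and n interact, assuming the standard quorum rule from quorum-based replication: a read set and a write set are guaranteed to overlap on at least one replica when r + w > n.

```python
def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """With n replicas, a write confirmed by w nodes and a read from r nodes
    must overlap in at least one node when r + w > n, so the read always
    sees the latest write."""
    return r + w > n

# Typical configuration: n = 3 replicas, write quorum 2, read quorum 2.
print(is_strongly_consistent(r=2, w=2, n=3))  # True  -> reads see latest write
print(is_strongly_consistent(r=1, w=1, n=3))  # False -> reads may be stale
```
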
- Apache Hadoop & Spark
- Frameworks that can handle the storage and processing of very large, growing data sets
- Hadoop Ecosystem - Scalable, fault tolerant, handles a variety of data
- Hadoop Distributed File System (HDFS) - Uses the master-slave design.
- Master Node (NameNode) - Stores and manages the metadata
- Slave Nodes (DataNodes) - Store the actual data blocks
- HBase
- Horizontally Scalable - Capacity grows by adding more machines (resources) to the pool
- HBase is not a good fit for workloads that need the relational model, such as transactional applications or relational-style data analytics.
- It is also not a good fit for text processing.
- MapReduce - A simple programming model for the Hadoop ecosystem (see the word-count sketch below).
- Runs the code in parallel.
- Spark uses memory/caching instead of disk for data sharing, whereas MapReduce uses hard drive to share the data.
- MapReduce - Because intermediate data is persisted to the hard drive, it is more resilient to failures (at the cost of speed).
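
To make the MapReduce programming model concrete, here is a minimal in-process Python sketch of the classic word count: map emits (word, 1) pairs, the pairs are grouped by word, and reduce sums each group. A real Hadoop job expresses the same two functions but runs them in parallel across the cluster and writes intermediate results to disk; this toy version just runs locally.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum all counts emitted for the same word."""
    return word, sum(counts)

def word_count(lines):
    # Shuffle step: group intermediate pairs by key before reducing.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_phase(line):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

print(word_count(["hadoop stores data on disk", "spark caches data in memory"]))
# {'hadoop': 1, 'stores': 1, 'data': 2, 'on': 1, 'disk': 1,
#  'spark': 1, 'caches': 1, 'in': 1, 'memory': 1}
```
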