- Distributed File System
- When a file is uploaded, it is split into fixed-size chunks (blocks); how many chunks depends on the file size. The default chunk size is 64 MB (newer Hadoop versions default to 128 MB).
- The master keeps track of the nodes in the cluster and decides which node stores each chunk.
- The namespace is the metadata kept by the master: the file and directory tree, including information about how the data is replicated.
- The Google File System (GFS) is distributed by design.
- YARN - Yet Another Resource Negotiator
- MapReduce - Links the master and the slaves (workers) and decides which job to assign to each slave. Once a job is assigned, the slave itself reports the progress of the work back to the master; in Hadoop this happens through periodic heartbeat messages, not MPI.
- Queries are run in Hive on top of Hadoop. Hive's SQL-like query language is called HiveQL (HQL).
- Hadoop is an ecosystem (HDFS, YARN, MapReduce, Hive, and others), not a single tool.
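
The chunking and placement notes above can be made concrete with a small sketch. It assumes a 64 MB chunk size, a replication factor of 3, and a simple round-robin placement; the function names (`chunk_count`, `place_chunks`) and node names are hypothetical, and a real master uses more elaborate placement rules (for example, rack awareness).

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the default chunk size from the notes


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Return how many chunks a file of file_size bytes is split into."""
    if file_size <= 0:
        return 0
    # Integer ceiling division: the last chunk may be only partly full.
    return (file_size + chunk_size - 1) // chunk_size


def place_chunks(file_size, nodes, replication=3, chunk_size=CHUNK_SIZE):
    """Map each chunk index to the nodes holding its replicas.

    Round-robin placement is an assumption made for illustration only;
    it is not how GFS or HDFS actually choose nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i in range(chunk_count(file_size, chunk_size)):
        # Each replica of chunk i goes to a different node.
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
    for chunk, holders in place_chunks(200 * 1024 * 1024, nodes).items():
        print(chunk, holders)
```

With a 200 MB file, this yields four chunks (three full and one partial), each stored on three different nodes. The mapping returned by `place_chunks` is roughly the kind of information the master keeps as metadata.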