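Overview of RookieDB
Tips: This article walks through the RookieDB project and describes its code framework. For anyone studying databases, RookieDB is a rare foundational, low-level project. Although it does not take us through the full development of a relational database, it offers a good learning perspective and path, and it makes key topics such as data types, indexing, caching, backup, and recovery much clearer.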
Course resources
Course website: https://cs186berkeley.net/
Course assignments: 7 projects
Setup
SQL
B+ Trees
Joins and Query Optimization
Concurrency
Recovery
NoSQL
Resource summary
https://csdiy.wiki/%E6%95%B0%E6%8D%AE%E5%BA%93%E7%B3%BB%E7%BB%9F/CS186/
All the resources and assignment implementations used while taking this course are collected in PKUFlyingPig/CS186 - GitHub.
For a better reading experience: https://blog.codesnake.space/tags/rookiedb
RookieDB Overview
RookieDB is a bare-bones database implementation which supports executing simple transactions in series. In the assignments of this class you will be adding support for B+ tree indices, efficient join algorithms, query optimization, multigranularity locking to allow concurrent execution of transactions, and database recovery.
For convenience, the staff will be maintaining a read-only public repo here containing the project skeleton. When starting projects remember to work off of the private repos provided to you through GitHub Classroom rather than the public one.
As you will be working with this codebase for the rest of the semester, it is a good idea to get familiar with it. The code is located in the src/main/java/edu/berkeley/cs186/database directory, while the tests are located in the src/test/java/edu/berkeley/cs186/database directory. The following is a brief overview of each of the major sections of the codebase.

cli (command-line interface)
The cli directory contains all the logic for the database's command-line interface. Running the main method of CommandLineInterface.java creates an instance of the database and a simple text interface in which you can send queries and review their results. The inner workings of this section are beyond the scope of the class (although you're free to look around); you just need to know how to run the command-line interface.
parser
The subdirectory cli/parser contains a lot of scary-looking code! Don't be intimidated: it is all generated automatically from the file RookieParser.jjt in the root directory of the repo. The code here handles the logic to convert user-inputted queries (strings) into a tree of nodes representing the query (a parse tree).
visitor
The subdirectory cli/visitor contains classes that help traverse the trees created by the parser and create objects that the database can work with directly.
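To make the traversal concrete, here is a minimal sketch of the visitor pattern over a tiny parse tree. Every name in it (Node, NodeVisitor, SelectNode, ColumnNode, TableNode, SelectCollector) is hypothetical and chosen for illustration; the nodes generated from RookieParser.jjt and the classes in cli/visitor are assumed to look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parse-tree types, not the generated RookieDB classes.
interface Node {
    void accept(NodeVisitor visitor);
}

interface NodeVisitor {
    void visit(SelectNode node);
    void visit(ColumnNode node);
    void visit(TableNode node);
}

final class ColumnNode implements Node {
    final String name;
    ColumnNode(String name) { this.name = name; }
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

final class TableNode implements Node {
    final String name;
    TableNode(String name) { this.name = name; }
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

final class SelectNode implements Node {
    final List<Node> children = new ArrayList<>();
    SelectNode(Node... kids) {
        for (Node kid : kids) children.add(kid);
    }
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

// Walks the tree and collects the pieces a query object would need.
final class SelectCollector implements NodeVisitor {
    final List<String> columns = new ArrayList<>();
    String table;

    public void visit(SelectNode node) {
        for (Node child : node.children) child.accept(this);
    }
    public void visit(ColumnNode node) { columns.add(node.name); }
    public void visit(TableNode node) { table = node.name; }
}

final class VisitorSketch {
    static String describe(Node root) {
        SelectCollector collector = new SelectCollector();
        root.accept(collector);
        return "columns=" + collector.columns + " table=" + collector.table;
    }

    public static void main(String[] args) {
        Node tree = new SelectNode(
                new ColumnNode("id"), new ColumnNode("name"), new TableNode("students"));
        System.out.println(describe(tree)); // prints columns=[id, name] table=students
    }
}
```

The point of the pattern is that the tree classes stay dumb: adding a new kind of processing means writing another visitor rather than editing every node class.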
common (shared code)
The common directory contains bits of useful code and general interfaces that are not limited to any one part of the codebase.
concurrency
The concurrency directory contains a skeleton for adding multigranularity locking to the database. You will be implementing this in Project 4.
databox (data boxes)
Like most DBMSs, our database has a type system distinct from that of the programming language used to implement it. (Our DBMS doesn't quite provide SQL types either, but its types are modeled on a simplified version of SQL types.)
The databox directory contains classes that represent values stored in a database, as well as their types. The various DataBox classes represent values of certain types, whereas the Type class represents types used in the database.
An example:
DataBox x = new IntDataBox(42); // The integer value '42'.
Type t = Type.intType();        // The type 'int'.
Type xsType = x.type();         // Get x's type, which is Type.intType().
int y = x.getInt();             // Get x's value: 42.
String s = x.getString();       // An exception is thrown, since x is not a string.
index
The index directory contains a skeleton for implementing B+ tree indices. You will be implementing this in Project 2.
memory
The memory directory contains classes for managing the loading of data into and out of memory (in other words, buffer management).
The BufferFrame class represents a single buffer frame (a page in the buffer pool) and supports pinning/unpinning and reading from/writing to the buffer frame. All reads and writes require that the frame be pinned (which is often done via the requireValidFrame method, which reloads data from disk if necessary and then returns a pinned frame for the page).
The BufferManager interface is the public interface for the buffer manager of our DBMS.
The BufferManagerImpl class implements a buffer manager using a write-back buffer cache with a configurable eviction policy. It is responsible for fetching pages (via the disk space manager) into buffer frames, and it returns Page objects to allow for manipulation of data in memory.
The Page class represents a single page. When data in the page is accessed or modified, it delegates reads/writes to the underlying buffer frame containing the page.
The EvictionPolicy interface defines a few methods that determine how the buffer manager evicts pages from memory when necessary. Implementations include LRUEvictionPolicy (for LRU) and ClockEvictionPolicy (for clock).
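As a rough illustration of how pinning and LRU eviction interact, the sketch below tracks a pin count per page and evicts the least recently used page that is not pinned. LruSketch and its method names are hypothetical and written for this example only; they are not the EvictionPolicy interface or LRUEvictionPolicy from the codebase.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of LRU eviction that respects pinned pages.
final class LruSketch {
    // Page number -> pin count, ordered from least to most recently used.
    private final LinkedHashMap<Long, Integer> pinCounts = new LinkedHashMap<>();

    // Record an access by moving the page to the most-recently-used end.
    void access(long page) {
        Integer pins = pinCounts.remove(page);
        pinCounts.put(page, pins == null ? 0 : pins);
    }

    // Pinning counts as an access and protects the page from eviction.
    void pin(long page) {
        access(page);
        pinCounts.put(page, pinCounts.get(page) + 1);
    }

    void unpin(long page) {
        Integer pins = pinCounts.get(page);
        if (pins == null || pins == 0) {
            throw new IllegalStateException("page " + page + " is not pinned");
        }
        pinCounts.put(page, pins - 1);
    }

    // Evicts the least recently used unpinned page; returns -1 if every page is pinned.
    long evict() {
        Iterator<Map.Entry<Long, Integer>> it = pinCounts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Integer> entry = it.next();
            if (entry.getValue() == 0) {
                long victim = entry.getKey();
                it.remove();
                return victim;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        LruSketch lru = new LruSketch();
        lru.access(1);
        lru.access(2);
        lru.access(3);
        lru.access(1);                   // usage order is now 2, 3, 1
        lru.pin(2);                      // usage order is now 3, 1, 2 with page 2 pinned
        System.out.println(lru.evict()); // prints 3
        System.out.println(lru.evict()); // prints 1
        System.out.println(lru.evict()); // prints -1, since page 2 is pinned
    }
}
```

A clock policy would replace the ordered map with a circular scan and a reference bit per frame, trading exact recency for cheaper bookkeeping.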
io (input/output)
The io directory contains classes for managing data on disk (in other words, disk space management).
The DiskSpaceManager interface is the public interface for the disk space manager of our DBMS.
The DiskSpaceManagerImpl class is the implementation of the disk space manager, which maps groups of pages (partitions) to OS-level files, assigns each page a virtual page number, and loads/writes these pages from/to disk.
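One way to give every page a single number is to pack the partition number and the page's index within that partition into one long. The sketch below assumes a decimal packing with a made-up constant; VirtualPageSketch, its method names, and PAGES_PER_PARTITION are hypothetical, and the scheme the disk space manager actually uses may differ.

```java
// Hypothetical sketch of packing (partition, page index) into one virtual page number.
final class VirtualPageSketch {
    // Assumed layout: the low ten decimal digits hold the page index,
    // the digits above them hold the partition number.
    static final long PAGES_PER_PARTITION = 10_000_000_000L;

    static long toVirtual(int partNum, int pageNum) {
        if (partNum < 0 || pageNum < 0) {
            throw new IllegalArgumentException("partition and page numbers must be non-negative");
        }
        // The exact-arithmetic helpers fail loudly instead of silently overflowing.
        return Math.addExact(Math.multiplyExact((long) partNum, PAGES_PER_PARTITION), pageNum);
    }

    static int partNum(long virtualPageNum) {
        return (int) (virtualPageNum / PAGES_PER_PARTITION);
    }

    static int pageNum(long virtualPageNum) {
        return (int) (virtualPageNum % PAGES_PER_PARTITION);
    }

    public static void main(String[] args) {
        long v = toVirtual(3, 42);
        System.out.println(v);          // prints 30000000042
        System.out.println(partNum(v)); // prints 3
        System.out.println(pageNum(v)); // prints 42
    }
}
```

With a packing like this, the partition number selects the OS-level file and the page index selects a position inside it, so callers only ever pass around one number.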
query
The query directory contains classes for managing and manipulating queries.
The various operator classes are query operators (pieces of a query), some of which you will be implementing in Project 3.
The QueryPlan class represents a plan for executing a query (which we will be covering in more detail later in the semester). It currently executes the query as given (it runs things in logical order and performs joins in the order given), but you will be implementing a query optimizer in Project 3 to run the query in a more efficient manner.
recovery
The recovery directory contains a skeleton for implementing database recovery a la ARIES. You will be implementing this in Project 5.
table
The table directory contains classes representing entire tables and records.
The Table class is, as the name suggests, a table in our database. See the comments at the top of this class for information on how table data is laid out on pages.
The Schema class represents the schema of a table (a list of column names and their types).
The Record class represents a record of a table (a single row). Records are made up of multiple DataBoxes (one for each column of the table the record belongs to).
The RecordId class identifies a single record in a table.
The HeapFile interface is the interface for a heap file that the Table class uses to find pages to write data to.
The PageDirectory class is an implementation of HeapFile that uses a page directory.
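The idea behind a page directory can be sketched as a list of free-space counters, one per data page, that is consulted whenever a record needs a home. PageDirectorySketch and its methods are hypothetical names for this illustration; the sketch assumes a fixed page size and ignores page headers, so it is not the PageDirectory class itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a page directory that tracks free bytes per data page.
final class PageDirectorySketch {
    private final int pageSize;
    // Index = data page number, value = bytes still free on that page.
    private final List<Integer> freeBytes = new ArrayList<>();

    PageDirectorySketch(int pageSize) {
        this.pageSize = pageSize;
    }

    // Returns a page with at least `required` free bytes, reserving the space.
    // A new page is allocated when no existing page has enough room.
    int pageWithSpace(int required) {
        if (required <= 0 || required > pageSize) {
            throw new IllegalArgumentException("cannot place " + required + " bytes on a page");
        }
        for (int page = 0; page < freeBytes.size(); page++) {
            if (freeBytes.get(page) >= required) {
                freeBytes.set(page, freeBytes.get(page) - required);
                return page;
            }
        }
        freeBytes.add(pageSize - required);
        return freeBytes.size() - 1;
    }

    int pageCount() {
        return freeBytes.size();
    }

    public static void main(String[] args) {
        PageDirectorySketch dir = new PageDirectorySketch(100);
        System.out.println(dir.pageWithSpace(60)); // prints 0 (new page, 40 bytes left)
        System.out.println(dir.pageWithSpace(60)); // prints 1 (page 0 is too full)
        System.out.println(dir.pageWithSpace(30)); // prints 0 (fits in the remaining 40)
        System.out.println(dir.pageCount());       // prints 2
    }
}
```

The directory lets the table find room for a record by scanning small counters instead of reading every data page.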
stats (statistics)
The table/stats directory contains classes for keeping track of statistics of a table. These are used to compare the costs of different query plans when you implement query optimization in Project 3.
Key classes
Transaction.java
The Transaction interface is the public interface of a transaction: it contains methods that users of the database use to query and manipulate data.
This interface is partially implemented by the AbstractTransaction abstract class, and fully implemented in the Database.Transaction inner class.
TransactionContext.java
The TransactionContext interface is the internal interface of a transaction: it contains methods tied to the current transaction that internal methods (such as a table record fetch) may utilize.
The currently running transaction's context is set at the beginning of a Database.Transaction call (and is available through the static getCurrentTransaction method) and unset at the end of the call.
This interface is partially implemented by the AbstractTransactionContext abstract class, and fully implemented in the Database.TransactionContext inner class.
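The set-at-start, unset-at-end behavior can be sketched with a per-thread slot. This is only an assumption about one way to do it: ContextSketch, withTransaction, and fetchRecord are hypothetical names, the context is reduced to a plain string, and the codebase may track the current context differently.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a per-thread "current transaction" slot.
final class ContextSketch {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Static accessor that internal code can call without being handed the context.
    static String getCurrentTransaction() {
        return CURRENT.get();
    }

    // Sets the context for the duration of one call and always unsets it afterwards.
    static <T> T withTransaction(String transactionName, Supplier<T> call) {
        CURRENT.set(transactionName);
        try {
            return call.get();
        } finally {
            CURRENT.remove();
        }
    }

    // Stand-in for an internal method, such as a table record fetch.
    static String fetchRecord(int id) {
        String txn = getCurrentTransaction();
        if (txn == null) {
            throw new IllegalStateException("no transaction is running on this thread");
        }
        return "record " + id + " read by " + txn;
    }

    public static void main(String[] args) {
        System.out.println(withTransaction("T1", () -> fetchRecord(7))); // prints record 7 read by T1
        System.out.println(getCurrentTransaction());                     // prints null
    }
}
```

The finally block matters: the context is unset even when the call throws, so a failed call cannot leak its transaction into the next one on the same thread.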
Database.java
The Database class represents the entire database. It is the public interface of our database: users can use it like a Java library.
All work is done in transactions, so to use the database, a user would start a transaction with Database#beginTransaction, then call some of Transaction's numerous methods to perform selects, inserts, and updates.
For example:
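A minimal sketch of this usage pattern follows. ToyDatabase and ToyTransaction are hypothetical in-memory stand-ins written for this illustration, not the Database and Transaction classes from the codebase, and their insert/select/commit method names are assumptions; only the begin-work-commit shape mirrors the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a database that is used like a library.
final class ToyDatabase {
    private final Map<String, List<String>> tables = new HashMap<>();

    ToyTransaction beginTransaction() {
        return new ToyTransaction(this);
    }

    // Hypothetical stand-in for a transaction; rows are plain strings.
    static final class ToyTransaction implements AutoCloseable {
        private final ToyDatabase db;
        private final Map<String, List<String>> pending = new HashMap<>();

        ToyTransaction(ToyDatabase db) {
            this.db = db;
        }

        void insert(String table, String row) {
            pending.computeIfAbsent(table, k -> new ArrayList<>()).add(row);
        }

        // Sees committed rows plus this transaction's own uncommitted inserts.
        List<String> select(String table) {
            List<String> rows = new ArrayList<>(db.tables.getOrDefault(table, Collections.emptyList()));
            rows.addAll(pending.getOrDefault(table, Collections.emptyList()));
            return rows;
        }

        void commit() {
            for (Map.Entry<String, List<String>> e : pending.entrySet()) {
                db.tables.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
            pending.clear();
        }

        // Closing discards anything that was never committed.
        @Override
        public void close() {
            pending.clear();
        }
    }
}

final class TransactionSketch {
    static List<String> run() {
        ToyDatabase db = new ToyDatabase();
        try (ToyDatabase.ToyTransaction t1 = db.beginTransaction()) {
            t1.insert("students", "1, Jane");
            t1.insert("students", "2, John");
            t1.commit();
        }
        try (ToyDatabase.ToyTransaction t2 = db.beginTransaction()) {
            t2.insert("students", "3, Ghost"); // closed without commit, so discarded
        }
        try (ToyDatabase.ToyTransaction t3 = db.beginTransaction()) {
            return t3.select("students");
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1, Jane, 2, John]
    }
}
```

Using try-with-resources keeps each transaction's lifetime explicit: work done inside the block either reaches commit or is dropped when the block exits.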
More complex queries can be found in src/test/java/edu/berkeley/cs186/database/TestDatabase.java.