最美情侣中文字幕电影,在线麻豆精品传媒,在线网站高清黄,久久黄色视频

歡迎光臨散文網(wǎng) 會(huì)員登陸 & 注冊(cè)

RookieDB_Project 1: SQL

2023-02-10 14:36 作者:CodeSnake  | 我要投稿

Tips???♂

  • SQL項(xiàng)目主要鍛煉同學(xué)們的SQL語(yǔ)句基礎(chǔ)與編寫(xiě)能力,非常建議大家認(rèn)真完成其中SQL聯(lián)系,會(huì)發(fā)現(xiàn)許多之前沒(méi)發(fā)現(xiàn)的SQL問(wèn)題

  • 原文中作者使用的python3命令來(lái)調(diào)用python,我自己的系統(tǒng)安裝部署的python只允許我使用python命令來(lái)調(diào)用,不過(guò)問(wèn)題不大,只要能調(diào)用python運(yùn)行腳本即可

  • 在開(kāi)始第四部分的tasks時(shí),需要認(rèn)真將第三部分的Testing認(rèn)真熟悉,可以做到事半功倍

一、Getting-started

(一)、Logistics

后勤

This project is due Friday, 9/9/2022 at 11:59PM PDT (GMT-7). It is worth 5% of your overall grade in the class.

該項(xiàng)目應(yīng)于2022年9月9日星期五在太平洋時(shí)間11:59PM(GMT-7)提交。它的價(jià)值是你在課堂上總成績(jī)的5%。

(二)、Prerequisites

先決條件

You should watch the SQL I lecture before beginning this project. Later questions will require material from the SQL II lecture.

在開(kāi)始這個(gè)項(xiàng)目之前,你應(yīng)該觀看SQL I講座。后面的問(wèn)題將需要SQL II講座的材料。

(三)、Fetching the released code

獲取發(fā)布代碼

The GitHub Classroom link for this project is in the Project 1 release post on Edstem. Once your private repo is set up clone the project 1 skeleton code onto your local machine.

這個(gè)項(xiàng)目的GitHub教室鏈接在Edstem上的項(xiàng)目1發(fā)布帖中。一旦你的私人 repo 設(shè)置好了,就把項(xiàng)目1的骨架代碼克隆到你的本地機(jī)器上。

(四)、Required Software

所需軟件

1. SQLite3

Check if you already have sqlite3 instead by opening a terminal and running sqlite3 --version. Any version at 3.8.3 or higher should be fine.

If you don't already have SQLite on your machine, the simplest way to start using it is to download a precompiled binary from the SQLite website.

檢查你是否已經(jīng)有了sqlite3而不是通過(guò)打開(kāi)終端并運(yùn)行sqlite3--版本。任何在3.8.3或更高的版本都應(yīng)該是好的。

如果你的機(jī)器上還沒(méi)有SQLite,開(kāi)始使用它的最簡(jiǎn)單方法是從SQLite網(wǎng)站上下載一個(gè)預(yù)編譯的二進(jìn)制文件。

安裝windows版本的SQLite

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Windows. Click on the link sqlite-tools-win32-x86-*.zip to download the binary.

  2. Unzip the file. There should be a sqlite3.exe file in the directory after extraction.

  3. Navigate to the folder containing the sqlite3.exe file and check that the version is at least 3.8.3: cd path/to/sqlite_folder ./sqlite3 --version

  4. Move the sqlite3.exe executable into your fa22-proj1-yourname directory (the same place as the proj1.sql file)

  1. 訪問(wèn)上面鏈接的下載頁(yè)面,并導(dǎo)航到用于Windows的預(yù)編譯二進(jìn)制文件部分。點(diǎn)擊鏈接sqlite-tools-win32-x86-*.zip來(lái)下載二進(jìn)制文件。

  2. 解壓縮該文件。解壓縮后的目錄中應(yīng)該有一個(gè)sqlite3.exe文件。

  3. 導(dǎo)航到包含sqlite3.exe文件的文件夾,檢查版本是否至少是3.8.3: cd path/to/sqlite_folder ./sqlite3 --version

  4. 將sqlite3.exe可執(zhí)行文件移至fa22-proj1-yourname目錄(與proj1.sql文件的位置相同)。

    fa22-proj1-xx下載

Note:省略SQLite其他系統(tǒng)的安裝


2. Python

You'll need a copy of Python 3.5 or higher to run the tests for this project locally. You can check if you already have an existing copy by running python3 --version in a terminal. If you don't already have a working copy download and install one for your appropriate platform from here.

你需要一個(gè)Python 3.5或更高版本的拷貝來(lái)在本地運(yùn)行這個(gè)項(xiàng)目的測(cè)試。你可以通過(guò)在終端運(yùn)行 python3 --版本來(lái)檢查你是否已經(jīng)有了一個(gè)現(xiàn)有的副本。如果你還沒(méi)有一個(gè)可用的拷貝,請(qǐng)從這里下載并安裝一個(gè)適合你的平臺(tái)的拷貝。

image-20230105161638768


(五)、Download and extract the data set

下載并提取數(shù)據(jù)集

Download the data set for this project from the course's Google Drive here. You should get a file called lahman.db.zip. Unzip the lahman.db.zip file inside your fa22-proj1-yourname directory. You should now have a lahman.db file in your fa22-proj1-yourname directory (the same place as the proj1.sql file)

在這里 從課程的Google Drive下載本項(xiàng)目的數(shù)據(jù)集。你應(yīng)該得到一個(gè)叫做lahman.db.zip的文件。在你的fa22-proj1-yourname目錄下解壓lahman.db.zip文件。你現(xiàn)在應(yīng)該在你的fa22-proj1-yourname目錄下有一個(gè)lahman.db文件(與proj1.sql文件的位置相同)。

image-20230105163923163


(六)、Running the tests

運(yùn)行測(cè)試任務(wù)

If you followed the instructions above you should now be able to test your code. Navigate to your project directory and try using python3 test.py. You should get output similar to the following:

如果你按照上面的說(shuō)明,你現(xiàn)在應(yīng)該可以測(cè)試你的代碼了。導(dǎo)航到你的項(xiàng)目目錄,嘗試使用python3 test.py。你應(yīng)該得到類(lèi)似于以下的輸出。

image-20230105164144960

If so, move on to the next section to start the project. If you see ERRORinstead of FAIL create a followup on Edstem with details from your your_output/ folder.

如果是這樣,就進(jìn)入下一節(jié),開(kāi)始項(xiàng)目。如果你看到的是ERROR而不是FAIL,請(qǐng)?jiān)贓dstem上創(chuàng)建一個(gè)跟帖,并附上你的_output/文件夾中的詳細(xì)信息。

二、SQL vs. SQLite

[Note: You can skip this section for now and come back to it while you're doing project 1.]

注意:你可以暫時(shí)跳過(guò)這一節(jié),在做項(xiàng)目1的時(shí)候再來(lái)討論。

(一)、Why Are We Using SQLite in This Class?

為什么我們要在這門(mén)課上使用SQLite?

As you may have learned mostly SQL synax, it will not be the engine that we use for this project. Instead, we will use a more lightweight variant called SQLite. As noted on the docs of SQLite official website, Client/server SQL database engines strive to implement a shared repository of enterprise data. They emphasize scalability, concurrency, centralization, and control. SQLite strives to provide local data storage for individual applications and devices. As such, SQLite is very easy to set up and run, while a standard SQL engine requires setting up an entire server.

正如你可能已經(jīng)了解到的大多數(shù)SQL synax(語(yǔ)法),它將不會(huì)是我們?cè)谶@個(gè)項(xiàng)目中使用的引擎。相反,我們將使用一個(gè)更輕量級(jí)的變體,叫做SQLite。正如在SQLite官方網(wǎng)站的文檔中指出的,客戶(hù)/服務(wù)器 SQL數(shù)據(jù)庫(kù)引擎努力實(shí)現(xiàn)企業(yè)數(shù)據(jù)的共享庫(kù)。它們強(qiáng)調(diào)可擴(kuò)展性、并發(fā)性、集中化和控制。SQLite努力為單個(gè)應(yīng)用程序和設(shè)備提供本地?cái)?shù)據(jù)存儲(chǔ)。因此,SQLite非常容易設(shè)置和運(yùn)行,而標(biāo)準(zhǔn)SQL引擎需要設(shè)置整個(gè)服務(wù)器。

Now, with downloading an app of several megabytes, you can quickly run SQL-like queries on any database you want!

現(xiàn)在,只要下載一個(gè)幾兆字節(jié)的應(yīng)用程序,你就可以在任何你想要的數(shù)據(jù)庫(kù)上快速運(yùn)行類(lèi)似SQL的查詢(xún)!


(二)、New Autograder

新的自動(dòng)評(píng)分

Starting this semester, we will be using a new autograder integrating Cosette to grade your work. The Cosette SQL Solver will check the equivalence of two SQL queries, and that implies if you are not writing in standard SQL syntax, but somehow SQLite engine understood it, Cosette will complain. And you will be deducted 5% of points for that question even if the output produced by your query matches the output of the official solution.

從本學(xué)期開(kāi)始,我們將使用一個(gè)新的自動(dòng)評(píng)分器集成Cosette來(lái)評(píng)分你的作業(yè)。珂賽特SQL求解器將檢查兩個(gè)SQL查詢(xún)的等價(jià)性,這意味著如果您不是用標(biāo)準(zhǔn)SQL語(yǔ)法編寫(xiě),但不知為何SQLite引擎理解了它,[Cosette將會(huì)抱怨。即使你的查詢(xún)結(jié)果與官方解答的結(jié)果相匹配,你也會(huì)被扣5%的分?jǐn)?shù)。


(三)、SQLite Syntax Difference

SQLite語(yǔ)法差異

SQLite is a much more tolerant language than SQL, so a lot of queries that raise an error in SQL will be inferred and run successfully by SQLite. We do not wish that you utilize this tolerance to write "incorrect" queries. Next, we will go over some most common errors that students make and which Cosette Solver will complain about.

SQLite是一種比SQL更寬容的語(yǔ)言,所以很多在SQL中產(chǎn)生錯(cuò)誤的查詢(xún)都會(huì)被SQLite推斷出來(lái)并成功運(yùn)行。我們不希望你利用這種容忍度來(lái)編寫(xiě) "不正確 "的查詢(xún)。接下來(lái),我們將介紹一些學(xué)生最常犯的錯(cuò)誤,Cosette Solver會(huì)抱怨這些錯(cuò)誤。


(四)、Most Common SQL Errors

最常見(jiàn)的SQL錯(cuò)誤

  • Use the alias directly in WHERE/HAVING clause 在WHERE/HAVING子句中直接使用別名

    SELECT birthyear, AGG(col1) AS foo, ...
    ?FROM 186_TAs
    ?GROUP BY birthyear
    ?HAVING foo > "bar"
    ?...

    The problem here is that SELECT is applied after the TAs are "GROUP BY"ed and filtered by "HAVING". At the stage of "HAVING", the SQL engine doesn't understand the alias "foo" in the SELECT clause yet.

    這里的問(wèn)題是,SELECT是在TAs被 "GROUP BY "編輯并被 "HAVING "過(guò)濾后應(yīng)用的。在 "HAVING "階段,SQL引擎還不能理解SELECT子句中的別名 "foo"。


  • "=="

    Something that you may learn in the first day of a CS class includes that computers start at index 0, and "=" is the assignment operator rather than comparison. This convention will be broken in SQL world, where you should use "=" for direct comparison.

    在CS課程的第一天,你可能會(huì)學(xué)到一些東西,包括計(jì)算機(jī)從索引0開(kāi)始,而"="是賦值運(yùn)算符,而不是比較。這個(gè)慣例在SQL世界中會(huì)被打破,你應(yīng)該使用"="來(lái)直接比較。


  • (INNER | { LEFT | RIGHT | FULL } [OUTER]) JOIN without a join condition (連接條件)

    In SQL, only NATURAL JOIN does not require a join condition as it automatically infers the common column names. It is the language's rule that you are required to give some condition with the ON clause.

    在SQL中,只有NATURAL JOIN不需要連接條件,因?yàn)樗茏詣?dòng)推斷出公共列名。這是語(yǔ)言的規(guī)則,你需要在ON子句中給出一些條件。


  • GROUP BY without aggregate ()

    This is probably one of the most common mistakes made by using SQLite. Let's take a look at the following example, where we are trying to gain insight into the attendance rate of each student in 186, displayed with their sid, number of appearances in sections, along with their names.

    這可能是使用SQLite最容易犯的錯(cuò)誤之一。讓我們看一下下面的例子,我們?cè)噲D深入了解186班每個(gè)學(xué)生的出勤率,用他們的sid、在各科出現(xiàn)的次數(shù)以及他們的名字來(lái)顯示。


    SELECT s.sid, SUM(a.attendance) AS attend_rate, s.name
    ?FROM 186_students s INNER JOIN section_attendance a ON s.sid = a.sid
    ?GROUP BY s.sid
    ?...

    In this SQL query, s.sid will be recognized without any issue, as it's the GROUP BY key; same for SUM(a.attendance), as it is the Aggregate column. But how about s.name? It doesn't fall into either of the categories, so it is invalid to use it here.

    在這個(gè)SQL查詢(xún)中,s.sid將被識(shí)別,沒(méi)有任何問(wèn)題,因?yàn)樗荊ROUP BY鍵;SUM(a.attend)也一樣,因?yàn)樗茿ggregate列。但是s.name呢?它不屬于這兩個(gè)類(lèi)別,所以在這里使用它是無(wú)效的。

(五)、OK, so SQLite Seems Untrustworthy...

好的,所以SQLite似乎不值得信任......

Now, you may be very concerned that some code that gets executed in SQLite engine will fail the autograder check. Don't worry about that, as SQLite is a commercial use database engine, it is quite fault-tolerant, that is to say, it catches a lot of syntax issues. The ones mentioned above are just slightly more demanding syntax rules in SQL. So if your code passes the SQLite check, and it is following the rule taught in class, you should be good to go!

現(xiàn)在,你可能非常擔(dān)心一些在SQLite引擎中執(zhí)行的代碼會(huì)無(wú)法通過(guò)自動(dòng)檢查器的檢查。不要擔(dān)心這個(gè)問(wèn)題,因?yàn)镾QLite是一個(gè)商業(yè)用途的數(shù)據(jù)庫(kù)引擎,它具有相當(dāng)?shù)娜蒎e(cuò)性,也就是說(shuō),它可以捕捉到很多語(yǔ)法問(wèn)題。上面提到的那些只是SQL中要求稍高的語(yǔ)法規(guī)則。所以,如果你的代碼通過(guò)了SQLite的檢查,并且遵循了課堂上講授的規(guī)則,你就應(yīng)該可以順利通過(guò)了!

三、Testing

You can run your answers through SQLite directly by running sqlite3 lahman.db to open the database and then entering .read proj1.sql

你可以通過(guò)SQLite直接運(yùn)行你的答案,運(yùn)行sqlite3 lahman.db來(lái)打開(kāi)數(shù)據(jù)庫(kù),然后輸入.read proj1.sql(執(zhí)行 proj1.sql文件中的 SQL)。

This can help you catch any syntax errors in your SQLite.

這可以幫助你抓住SQLite中的任何語(yǔ)法錯(cuò)誤。

To help debug your logic, we've provided output from each of the views you need to define in questions 1-4 for the data set you've been given. Your views should match ours, but note that your SQL queries should work on ANY data set. We will test your queries on a (set of) different database(s), so it is NOT sufficient to simply return these results in all cases! Please also note that queries that join on extra, unnecessary tables will slow down queries and not receive full credit on the hidden tests.

為了幫助調(diào)試你的邏輯,我們提供了你在問(wèn)題1-4中需要定義的每個(gè)視圖的輸出,用于你所得到的數(shù)據(jù)集。你的視圖應(yīng)該與我們的一致,但請(qǐng)注意,你的SQL查詢(xún)應(yīng)該在任何數(shù)據(jù)集上工作。我們將在一個(gè)(一組)不同的數(shù)據(jù)庫(kù)上測(cè)試你的查詢(xún),所以在所有情況下僅僅返回這些結(jié)果是不夠的。還請(qǐng)注意,在額外的、不必要的表上連接的查詢(xún)將減慢查詢(xún)速度,并且在隱藏測(cè)試中不能得到滿分。

To run the test, from within the fa22-proj1-yourname directory:

要運(yùn)行測(cè)試,從fa22-proj1-yourname目錄中。

image-20230105165932131

Become familiar with the UNIX diff format, if you're not already, because our tests saves a simplified diff for any query executions that don't match in diffs/. As an example, the following output for diffs/q1i.txt::

如果你還不熟悉UNIX的diff格式,因?yàn)槲覀兊臏y(cè)試為任何在diffs/中不匹配的查詢(xún)執(zhí)行保存了一個(gè)簡(jiǎn)化的diff。作為一個(gè)例子,下面是diffs/q1i.txt的輸出:

- 1|1|1
+ Jumbo|Diaz|1984
+ Walter|Young|1980

indicates that your output has an extra 1|1|1 (the - at the beginning means the expected output doesn't include this line but your output has it) and is missing the lines Jumbo|Diaz|1984 and Walter|Young|1980 (the plus at the beginning means the expected output does include those lines but your output is missing it). If there is neither a + nor - at the beginning then it means that the line is in both your output and the expected output (your output is correct for that line).

表示你的輸出有一個(gè)額外的1|1|1(開(kāi)頭的-表示預(yù)期輸出不包括這一行,但你的輸出有),并且缺少Jumbo|Diaz|1984Walter|Young|1980這兩行(開(kāi)頭的+表示預(yù)期輸出包括這兩行,但你的輸出缺少這一行)。如果開(kāi)頭沒(méi)有 "+"或"-",則意味著你的輸出和預(yù)期輸出中都有該行(你的輸出對(duì)該行來(lái)說(shuō)是正確的)。

If you care to look at the query outputs directly, ours are located in the expected_output directory. Your view output should be located in your solution's your_output directory once you run the tests.

如果你想直接看一下查詢(xún)輸出,我們的輸出位于expected_output目錄下。你的視圖輸出應(yīng)該位于你的解決方案的your_output目錄中,一旦你運(yùn)行測(cè)試。

Note: For queries where we don't specify the order, it doesn't matter how you sort your results; we will reorder before comparing. Note, however, that our test query output is sorted for these cases, so if you're trying to compare yours and ours manually line-by-line, make sure you use the proper ORDER BY clause (you can determine this by looking in test.py). Different versions of SQLite handle floating points slightly differently so we also round certain floating point values in our own queries. A full list is specified here for convenience:

注意:對(duì)于我們沒(méi)有指定順序的查詢(xún),你如何對(duì)你的結(jié)果進(jìn)行排序并不重要;我們會(huì)在比較之前重新排序。但是請(qǐng)注意,在這些情況下,我們的測(cè)試查詢(xún)輸出是排序的,所以如果你想手動(dòng)逐行比較你的和我們的,請(qǐng)確保你使用適當(dāng)?shù)腛RDER BY子句(你可以通過(guò)查看test.py來(lái)確定)。不同版本的SQLite處理浮點(diǎn)的方式略有不同,所以我們?cè)谧约旱牟樵?xún)中也對(duì)某些浮點(diǎn)值進(jìn)行了舍入。為了方便起見(jiàn),這里指定了一個(gè)完整的列表。


四、Tasks

In this project we will be working with the commonly-used Lahman baseball statistics database (our friends at the San Francisco Giants tell us they use it!) The database contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2019. It includes data from the two current leagues (American and National), four other "major" leagues (American Association, Union Association, Players League, and Federal League), and the National Association of 1871-1875.

在這個(gè)項(xiàng)目中,我們將使用常用的拉赫曼棒球統(tǒng)計(jì)數(shù)據(jù)庫(kù)(我們?cè)谂f金山巨人隊(duì)的朋友告訴我們他們使用它?。┰摂?shù)據(jù)庫(kù)包含了1871年至2019年美國(guó)職業(yè)棒球大聯(lián)盟的投球、擊球和擊球統(tǒng)計(jì)。它包括目前的兩個(gè)聯(lián)盟(美國(guó)和國(guó)家),其他四個(gè) "主要 "聯(lián)盟(美國(guó)協(xié)會(huì)、聯(lián)盟協(xié)會(huì)、球員聯(lián)盟和聯(lián)邦聯(lián)盟),以及1871-1875年的國(guó)家協(xié)會(huì)的數(shù)據(jù)。

At this point you should be able to run SQLite and view the database using either ./sqlite3 -header lahman.db (if in the previous section you downloaded a precompiled binary) or sqlite3 -header lahman.db otherwise. If you're using windows and you find that the previous command doesn't work, try running winpty ./sqlite3 lahman.db.

在這一點(diǎn)上,你應(yīng)該能夠運(yùn)行SQLite并使用./sqlite3 -header lahman.db(如果在上一節(jié)中你下載了一個(gè)預(yù)編譯的二進(jìn)制文件)或sqlite3 -header lahman.db查看數(shù)據(jù)庫(kù)。如果你使用的是windows系統(tǒng),而你發(fā)現(xiàn)前面的命令不起作用,可以嘗試運(yùn)行winpty ./sqlite3 lahman.db。

$ sqlite3 lahman.db
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
sqlite> .tables

Try running a few sample commands in the SQLite console and see what they do:

試著在SQLite控制臺(tái)運(yùn)行幾個(gè)示例命令,看看它們的作用。

sqlite> .schema people sqlite> ?SELECT playerid, namefirst, namelast FROM people; sqlite> SELECT COUNT(*) FROM fielding;

(一)、Understanding the Schema

了解用例庫(kù)

image-20230105170142926

The database is comprised of the following main tables:

該數(shù)據(jù)庫(kù)由以下主要表組成。

People - Player names, date of birth (DOB), and biographical info # 球員姓名、出生日期(DOB)和傳記信息 Batting - batting statistics # 擊球統(tǒng)計(jì) Pitching - pitching statistics # 投手統(tǒng)計(jì) Fielding - fielding statistics # 外場(chǎng)統(tǒng)計(jì)

It is supplemented by these tables:

它由這些表加以補(bǔ)充。

?AllStarFull - All-Star appearances # 全明星出場(chǎng)次數(shù) ? HallofFame - Hall of Fame voting data # 名人堂投票數(shù)據(jù) ? Managers - managerial statistics # 經(jīng)理人的統(tǒng)計(jì)數(shù)據(jù) ? Teams - yearly stats and standings # 年度統(tǒng)計(jì)和排名 ? BattingPost - post-season batting statistics # 季后擊球統(tǒng)計(jì) ? PitchingPost - post-season pitching statistics # 季后賽投手統(tǒng)計(jì)數(shù)據(jù) ? TeamFranchises - franchise information # 特許經(jīng)營(yíng)信息 ? FieldingOF - outfield position data # 外場(chǎng)位置數(shù)據(jù) ? FieldingPost- post-season fielding data # 季后賽的外野數(shù)據(jù) ? FieldingOFsplit - LF/CF/RF splits # LF/CF/RF的分界線 ? ManagersHalf - split season data for managers # 經(jīng)理人的分賽季數(shù)據(jù) ? TeamsHalf - split season data for teams # 球隊(duì)的分賽季數(shù)據(jù) ? Salaries - player salary data # 球員薪資數(shù)據(jù) ? SeriesPost - post-season series information # 季后賽系列信息 ? AwardsManagers - awards won by managers # 經(jīng)理獲得的獎(jiǎng)項(xiàng) ? AwardsPlayers - awards won by players # 球員獲得的獎(jiǎng)項(xiàng) ? AwardsShareManagers - award voting for manager awards # 經(jīng)理獎(jiǎng)的投票情況 ? AwardsSharePlayers - award voting for player awards # 球員獎(jiǎng)項(xiàng)的投票情況 ? Appearances - details on the positions a player appeared at # 球員出場(chǎng)位置的詳細(xì)信息 ? Schools - list of colleges that players attended # 球員就讀的大學(xué)列表 ? CollegePlaying - list of players and the colleges they attended # 球員名單及其就讀的大學(xué) ? Parks - list of major league ballparks # 大聯(lián)盟的球場(chǎng)列表 ? HomeGames - Number of homegames played by each team in each ballpark # 每支球隊(duì)在每個(gè)球場(chǎng)所進(jìn)行的主場(chǎng)比賽的數(shù)量

For more detailed information, see the docs online.

更詳細(xì)的信息,請(qǐng)見(jiàn)在線文檔。


(二)、Writing Queries

We've provided a skeleton solution file, proj1.sql, to help you get started. In the file, you'll find a CREATE VIEW statement for each part of the first 4 questions below, specifying a particular view name (like q2i) and list of column names (like playerid, lastname). The view name and column names constitute the interface against which we will grade this assignment. In other words, don't change or remove these names. Your job is to fill out the view definitions in a way that populates the views with the right tuples.

我們提供了一個(gè)骨架解決方案文件,proj1.sql,以幫助你開(kāi)始工作。在該文件中,你會(huì)發(fā)現(xiàn)下面前4個(gè)問(wèn)題的每一部分都有一個(gè)CREATE VIEW語(yǔ)句,指定一個(gè)特定的視圖名稱(chēng)(如q2i)和列名列表(如playerid、lastname)。視圖名稱(chēng)和列名構(gòu)成了界面,我們將據(jù)此對(duì)這項(xiàng)作業(yè)進(jìn)行評(píng)分。換句話說(shuō),不要改變或刪除這些名稱(chēng)。你的工作是填寫(xiě)視圖的定義,以便用正確的圖元來(lái)填充視圖。

image-20230105200112189


For example, consider Question 0: "What is the highest era (earned run average) recorded in baseball history?".

In the proj1.sql file we provide:

例如,考慮問(wèn)題0:"在棒球歷史上記錄的最高時(shí)代(自責(zé)分率ERA, earned run average)是什么?"。

在proj1.sql文件中,我們提供:

CREATE VIEW q0(era) AS ? ? SELECT 1 -- replace this line ;

You would edit this with your answer, keeping the schema the same:

你可以用你的答案編輯這個(gè),保持模式不變。

-- solution you provide CREATE VIEW q0(era) AS ?SELECT MAX(era) ?FROM pitching ;

To complete the project, create a view for q0 as above (via copy-paste), and for all of the following queries, which you will need to write yourself.

You can confirm the test is now passing by running python3 test.py -q 0

為了完成這個(gè)項(xiàng)目,為上面的q0創(chuàng)建一個(gè)視圖(通過(guò)復(fù)制粘貼),并為下面所有的查詢(xún)創(chuàng)建一個(gè)視圖,你需要自己編寫(xiě)。

你可以通過(guò)運(yùn)行python3 test.py -q 0來(lái)確認(rèn)測(cè)試是否通過(guò)。

image-20230105200732326

More details on testing can be found in the Testing section.

關(guān)于測(cè)試的更多細(xì)節(jié)可以在測(cè)試部分找到。

1. Changes from Lecture

課件變化

SQLite doesn't support every SQL feature covered in lecture, specifically:

SQLite并不支持講座中所涉及的每一個(gè)SQL特性,特別是:

  • There is support for LEFT OUTER JOIN but not RIGHT OUTER or FULL OUTER.

    • To get equivalent output to RIGHT OUTER you can reverse the order of the tables (i.e. A RIGHT JOIN B is the same as B LEFT JOIN A.

    • While it isn't required to complete this assignment, the equivalent to FULL OUTER JOIN can be done by UNIONing RIGHT OUTER and LEFT OUTER

  • There is no regex match (~) tilde operator. You can use LIKE instead.

  • There is no ANY or ALL operator.

  • 支持LEFT OUTER JOIN,但不支持RIGHT OUTER或FULL OUTER。

  • 為了得到與RIGHT OUTER相當(dāng)?shù)妮敵?,你可以將表的順序顛倒過(guò)來(lái)(即 A RIGHT JOIN B與B LEFT JOIN A相同。

  • 雖然這不是完成這項(xiàng)任務(wù)的必要條件,但可以通過(guò)UNION RIGHT OUTER和LEFT OUTER來(lái)實(shí)現(xiàn)與FULL OUTER JOIN的等價(jià)。

  • 沒(méi)有regex匹配(~)的tilde操作符。你可以用LIKE代替。

  • 沒(méi)有ANY或ALL操作符。


(三)、Tasks

1. Task 1: Basics

任務(wù)1:基礎(chǔ)

i. In the people table, find the namefirst, namelast and birthyear for all players with weight greater than 300 pounds.

在people表中,找出所有體重超過(guò)300磅的球員的 namefirst, namelastbirthyear 。

ii. Find the namefirst, namelast and birthyear of all players whose namefirst field contains a space. Order the results by namefirst, breaking ties with namelast both in ascending order

查找所有 namefirst字段包含空格的球員的 namefirst, namelastbirthyear 。 ?將結(jié)果按namefirst排序,打破與namelast的并列關(guān)系,均按升序排列。

iii. From the people table, group together players with the same birthyear, and report the birthyear, average height, and number of players for each birthyear. Order the results by birthyear in ascending order.

從people表中,將具有相同出生年份的球員分組,并查詢(xún)出每個(gè)出生年份的球員的出生年份、平均身高和人數(shù)。將結(jié)果按出生年份升序排列。

Note: Some birth years have no players; your answer can simply skip those years. In some other years, you may find that all the players have a NULL height value in the dataset (i.e. height IS NULL); your query should return NULL for the height in those years.

注意:有些出生年份沒(méi)有球員;你的答案可以直接跳過(guò)這些年份。在其他一些年份,你可能會(huì)發(fā)現(xiàn)所有的球員在數(shù)據(jù)集中的身高值都是空的(即身高是空的);你的查詢(xún)?cè)谶@些年份的身高應(yīng)該返回空值。

iv. Following the results of part iii, now only include groups with an average height > 70. Again order the results by birthyear in ascending order.

按照第iii部分的結(jié)果,現(xiàn)在只包括平均身高 > 70 的群體。再次將結(jié)果按出生年份以升序排列。

2. Task 2: Hall of Fame Schools

名人堂的學(xué)校

i. Find the namefirst, namelast, playerid and yearid of all people who were successfully inducted into the Hall of Fame in descending order of yearid. Break ties on yearid by playerid (ascending).

找到所有成功入選名人堂的人的namefirst、namelast、playerid 和 yearid ,按yearid降序排列。按playerid(升序)打破與yearid的并列關(guān)系。

ii. Find the people who were successfully inducted into the Hall of Fame and played in college at a school located in the state of California. For each person, return their namefirst, namelast, playerid, schoolid, and yearid in descending order of yearid. Break ties on yearid by schoolid, playerid (ascending). For this question, yearid refers to the year of induction into the Hall of Fame.

  • Note: a player may appear in the results multiple times (once per year in a college in California).

查找成功入選名人堂并在位于加利福尼亞州的學(xué)校打過(guò)大學(xué)的人。對(duì)于每個(gè)人,按年份降序返回他們的名字(namefirst)、名字(namelast)、球員(playerid)、學(xué)校(schoolid)和年份(yearid)。在yearid的基礎(chǔ)上,通過(guò)schoolid、playerid(升序)打破并列關(guān)系。在這個(gè)問(wèn)題上,yearid指的是入選名人堂的年份。

iii. Find the playerid, namefirst, namelast and schoolid of all people who were successfully inducted into the Hall of Fame -- whether or not they played in college. Return people in descending order of playerid. Break ties on playerid by schoolid (ascending). (Note: schoolid should be NULL if they did not play in college.)

查找所有成功進(jìn)入名人堂的人的 "playerid"、"namefirst"、"namelast "和 "schoolid",無(wú)論他們是否在大學(xué)打球。按照 "playerid "的降序排列來(lái)返回人。按照schoolid(升序)打破playerid上的并列。(注意:如果他們沒(méi)有在大學(xué)打球,schoolid應(yīng)該是NULL。)

3. Task 3: SaberMetrics

棒球記錄統(tǒng)計(jì)分析

i. Find the playerid, namefirst, namelast, yearid and single-year slg (Slugging Percentage) of the players with the 10 best annual Slugging Percentage recorded over all time. A player can appear multiple times in the output. For example, if Babe Ruth’s slg in 2000 and 2001 both landed in the top 10 best annual Slugging Percentage of all time, then we should include Babe Ruth twice in the output. For statistical significance, only include players with more than 50 at-bats in the season. Order the results by slg descending, and break ties by yearid, playerid (ascending).

找出有史以來(lái)10個(gè)最佳年度猛擊率記錄的球員的playerid、namefirst、namelast、yearid和單年slg(猛擊率)。一個(gè)球員可以在輸出中出現(xiàn)多次。例如,如果貝比-魯斯在2000年和2001年的擊球率都進(jìn)入了歷史上年度最佳擊球率的前10名,那么我們應(yīng)該在輸出中包括貝比-魯斯兩次。為了統(tǒng)計(jì)學(xué)上的意義,只包括在該賽季有超過(guò)50次擊球的球員。將結(jié)果按slg降序排列,并按yearid、playerid(升序排列)打破并列關(guān)系。

  • Baseball note: Slugging Percentage is not provided in the database; it is computed according to a simple formula you can calculate from the data in the database.

  • SQL note: You should compute slg properly as a floating point number---you'll need to figure out how to convince SQL to do this!

  • Data set note: The online documentation batting mentions two columns 2B and 3B. On your local copy of the data set these have been renamed H2B and H3B respectively (columns starting with numbers are tedious to write queries on).

  • Data set note: The column H o f the batting table represents all hits = (# singles) + (# doubles) + (# triples) + (# home runs), not just (# singles) so you’ll need to account for some double-counting

  • If a player played on multiple teams during the same season (for example anderma02 in 2006) treat their time on each team separately for this calculation

  • 棒球說(shuō)明:數(shù)據(jù)庫(kù)中沒(méi)有提供猛擊率;它是根據(jù)一個(gè)簡(jiǎn)單的公式計(jì)算出來(lái)的,你可以根據(jù)數(shù)據(jù)庫(kù)中的數(shù)據(jù)來(lái)計(jì)算。(XXX在2006年球季535 AB打數(shù)中,擊出94支一壘安打 1B、33支二壘安打 2B、1支三壘安打 3B以及49支本壘打HR。他的Slugging Percentage即為:[(1×94)+(2×33)+(3×1)+(4×49)]÷535=359÷535=0.671)

  • SQL注意:你應(yīng)該把slg正確地計(jì)算成一個(gè)浮點(diǎn)數(shù)--你需要弄清楚如何說(shuō)服SQL來(lái)做這件事。

  • 數(shù)據(jù)集說(shuō)明:在線文檔中提到了兩列2B和3B。在你本地的數(shù)據(jù)集副本中,這兩列已經(jīng)分別被重新命名為H2B和H3B(以數(shù)字開(kāi)頭的列在編寫(xiě)查詢(xún)時(shí)很繁瑣)。

  • 數(shù)據(jù)集說(shuō)明:擊球表的H列代表所有擊球數(shù)=(#單打)+(#雙打)+(#三打)+(#全壘打),而不僅僅是(#單打),所以你需要考慮到一些重復(fù)計(jì)算。

  • 如果一名球員在同一賽季在多支球隊(duì)效力(例如2006年的anderma02),在此計(jì)算中應(yīng)分別處理他們?cè)诿恐蜿?duì)的時(shí)間

ii. Following the results from Part i, find the playerid, namefirst, namelast and lslg (Lifetime Slugging Percentage) for the players with the top 10 Lifetime Slugging Percentage. Lifetime Slugging Percentage (LSLG) uses the same formula as Slugging Percentage (SLG), but it uses the number of singles, doubles, triples, home runs, and at bats each player has over their entire career, rather than just over a single season.

根據(jù)第一部分的結(jié)果,找出終身打擊率排名前10的球員的playerid、namefirst、namelast和lslg(Lifetime Slugging Percentage)。終身重?fù)袈剩↙SLG)使用的公式與重?fù)袈剩⊿LG)相同,但它使用的是每個(gè)球員在整個(gè)職業(yè)生涯中的單打、二打、三打、全壘打和擊球數(shù),而不僅僅是一個(gè)賽季。

Note that the database only gives batting information broken down by year; you will need to convert to total information across all time (from the earliest date recorded up to the last date recorded) to compute lslg. Order the results by lslg (descending) and break ties by playerid (ascending)

  • Note: Make sure that you only include players with more than 50 at-bats across their lifetime.

請(qǐng)注意,數(shù)據(jù)庫(kù)只給出了按年份劃分的擊球信息;你需要轉(zhuǎn)換為所有時(shí)間的總信息(從最早的記錄日期到最后的記錄日期)來(lái)計(jì)算lslg。將結(jié)果按lslg排序(降序),并按球員ID打破平局(升序)。

  • 注意:確保你只包括一生中擁有超過(guò)50次擊球的球員。

iii. Find the namefirst, namelast and Lifetime Slugging Percentage (lslg) of batters whose lifetime slugging percentage is higher than that of San Francisco favorite Willie Mays.

找出終身?yè)羟蚵矢哂谂f金山最受歡迎的威利-梅斯的擊球手的名字(namefirst)、名字(namelast)和終身?yè)羟蚵剩╨slg)。

You may include Willie Mays' playerid in your query (mayswi01), but you may not include his slugging percentage -- you should calculate that as part of the query. (Test your query by replacing mayswi01 with the playerid of another player -- it should work for that player as well! We may do the same in the autograder.)

  • Note: Make sure that you still only include players with more than 50 at-bats across their lifetime.

你可以在你的查詢(xún)中包括威利-梅斯的球員ID(mayswi01),但是你不能包括他的擊球率--你應(yīng)該把它作為查詢(xún)的一部分來(lái)計(jì)算。(將mayswi01替換成另一個(gè)球員的球員ID來(lái)測(cè)試你的查詢(xún)--它對(duì)那個(gè)球員也應(yīng)該有效?。N覀兛梢栽赼utograder中也這樣做)。

  • 注意:確保你仍然只包括在其一生中擁有超過(guò)50次擊球的球員。

Just for fun: For those of you who are baseball buffs, variants of the above queries can be used to find other more detailed SaberMetrics, like Runs Created or Value Over Replacement Player. Wikipedia has a nice page on baseball statistics; most of these can be computed fairly directly in SQL.

Also just for fun: SF Giants VP of Baseball Operations, Yeshayah Goldfarb, suggested the following:

Using the Lahman database as your guide, make an argument for when MLBs “Steroid Era” started and ended. There are a number of different ways to explore this question using the data.

(Please do not include your "just for fun" answers in your solution file! They will break the autograder.)

只是為了好玩。對(duì)于那些愛(ài)好棒球的人來(lái)說(shuō),上述查詢(xún)的變體可以用來(lái)尋找其他更詳細(xì)的SaberMetrics,比如說(shuō)Runs Created或Value Over Replacement Player。維基百科有一個(gè)很好的關(guān)于棒球統(tǒng)計(jì)的頁(yè)面;其中大部分可以在SQL中直接計(jì)算出來(lái)。

也只是為了好玩。舊金山巨人隊(duì)的棒球運(yùn)營(yíng)副總裁Yeshayah Goldfarb建議如下。

以Lahman數(shù)據(jù)庫(kù)為指導(dǎo),為MLB(美國(guó)職業(yè)棒球大聯(lián)盟)的 "類(lèi)固醇時(shí)代 "的開(kāi)始和結(jié)束時(shí)間做一個(gè)論證。有許多不同的方法可以利用這些數(shù)據(jù)來(lái)探討這個(gè)問(wèn)題。

(請(qǐng)不要在你的解決方案文件中包括你的 "只是為了好玩 "的答案! 它們會(huì)玩壞自動(dòng)評(píng)分器)。


4. Task 4: Salaries

薪資

i. Find the yearid, min, max and average of all player salaries for each year recorded, ordered by yearid in ascending order.

找到每一年記錄的所有球員工資的yearid, min, max and average,按yearid升序排列。

ii. For salaries in 2016, compute a histogram. Divide the salary range into 10 equal bins from min to max, with binids 0 through 9, and count the salaries in each bin. Return the binid, low and high boundaries for each bin, as well as the number of salaries in each bin, with results sorted from smallest bin to largest.

對(duì)于2016年的工資,計(jì)算一個(gè)柱狀圖。將工資范圍從最小到最大分為10個(gè)相等的bin,binid為0到9,并計(jì)算每個(gè)bin中的工資。返回每個(gè)binid,每個(gè)bin的低和高邊界,以及每個(gè)bin中的工資數(shù)量,結(jié)果從最小的bin到最大的bin排序。

  • Note: binid 0 corresponds to the lowest salaries, and binid 9 corresponds to the highest. The ranges are left-inclusive (i.e. [low, high)) -- so the high value is excluded. For example, if bin 2 has a high value of 100000, salaries of 100000 belong in bin 3, and bin 3 should have a low value of 100000.

  • Note: The high value for bin 9 may be inclusive).

  • Note: The test for this question is broken into two parts. Use python3 test.py -q 4ii_bins_0_to_8 and python3 test.py -q 4ii_bin_9 to run the tests

  • Hidden testing advice: we will be testing the case where a bin has zero player salaries in it. The correct behavior in this case is to display the correct binid, low and high with a count of zero, NOT just excluding the bin altogether.

注意:binid 0對(duì)應(yīng)的是最低工資,binid 9對(duì)應(yīng)的是最高工資。這些范圍是左邊包含的(即[低,高])--所以高值被排除。例如,如果bin 2的高值是100000,那么100000的工資就屬于bin 3,而bin 3的低值應(yīng)該是100000。

注意:9號(hào)倉(cāng)的高值可能是包括在內(nèi)的)。

注意:本題的測(cè)試分為兩部分。使用python3 test.py -q 4ii_bins_0_to_8和python3 test.py -q 4ii_bin_9來(lái)運(yùn)行測(cè)試

隱藏的測(cè)試建議:我們將測(cè)試一個(gè)bin中的玩家工資為零的情況。在這種情況下,正確的行為是顯示正確的binid,低位和高位,計(jì)數(shù)為0,而不是完全排除這個(gè)bin。

Some useful information:一些有用的信息:

  • In the lahman.db, you may find it helpful to use the provided helper table binids, which contains all the possible binids. Get a feel of what the data looks like by running SELECT * FROM binids; in a sqlite terminal. We'll only be testing with these possible binids (there aren't any hidden tests using say, 100 bins) so using the hardcoded table is fine

  • If you want to take the floor of a positive float value you can do CAST (some_value AS INT)

在lahman.db中,你可能會(huì)發(fā)現(xiàn)使用提供的輔助表binids很有幫助,它包含了所有可能的binids。通過(guò)在sqlite終端運(yùn)行SELECT * FROM binids; 來(lái)感受一下數(shù)據(jù)的樣子。我們只用這些可能的binids進(jìn)行測(cè)試(沒(méi)有任何使用100個(gè)bin的隱藏測(cè)試),所以使用硬編碼表就可以了。

如果你想取一個(gè)正的浮點(diǎn)數(shù)的下限,你可以做CAST (some_value AS INT)

iii. Now let's compute the Year-over-Year change in min, max and average player salary. For each year with recorded salaries after the first, return the yearid, mindiff, maxdiff, and avgdiff with respect to the previous year. Order the output by yearid in ascending order. (You should omit the very first year of recorded salaries from the result.)

現(xiàn)在讓我們來(lái)計(jì)算最小、最大和平均球員工資的年際變化。對(duì)于第一年之后的每一年的工資記錄,返回相對(duì)于前一年的yearid、mindiff、maxdiff和avgdiff。按照yearid的升序來(lái)排列輸出。(你應(yīng)該從結(jié)果中省略第一年的工資記錄)。

iv. In 2001, the max salary went up by over $6 million. Write a query to find the players that had the max salary in 2000 and 2001. Return the playerid, namefirst, namelast, salary and yearid for those two years. If multiple players tied for the max salary in a year, return all of them.

在2001年,最高工資增加了600多萬(wàn)美元。寫(xiě)一個(gè)查詢(xún),找到在2000年和2001年擁有最高工資的球員。返回這兩年的球員ID、姓名第一、姓名第二、工資和年份ID。如果有多名球員在某年獲得了最高工資,則返回所有的球員。

  • Note on notation: you are computing a relational variant of the argmax for each of those two years.

注意符號(hào):你正在計(jì)算這兩年中每一年的argmax的關(guān)系變體。

v. Each team has at least 1 All Star and may have multiple. For each team in the year 2016, give the teamid and diffAvg (the difference between the team's highest paid all-star's salary and the team's lowest paid all-star's salary).

每支球隊(duì)至少有1名全明星,可能有多名。對(duì)于2016年的每支球隊(duì),請(qǐng)給出球隊(duì)名稱(chēng)和diffAvg(球隊(duì)最高薪酬的全明星球員的工資和球隊(duì)最低薪酬的全明星球員的工資之間的差異)。

  • Note: Due to some discrepancies in the database, please draw your team names from the All-Star table (so use allstarfull.teamid in the SELECT statement for this).

注意:由于數(shù)據(jù)庫(kù)中存在一些差異,請(qǐng)從全明星表中抽取你的球隊(duì)名稱(chēng)(所以在SELECT語(yǔ)句中使用allstarfull.teamid來(lái)實(shí)現(xiàn))。

You're done!

Rerun python3 test.py to see if you're passing tests. If so, follow the instructions in the next section to submit your work.

重新運(yùn)行python test.py,看看你是否通過(guò)了測(cè)試。如果是,請(qǐng)按照下一節(jié)的說(shuō)明提交你的工作。

五、My-answer


RookieDB_Project 1: SQL的評(píng)論 (共 條)

分享到微博請(qǐng)遵守國(guó)家法律
黔西| 子洲县| 三门县| 曲麻莱县| 手机| 隆尧县| 盱眙县| 镇平县| 丹凤县| 涿州市| 巴中市| 蚌埠市| 中阳县| 刚察县| 读书| 仁怀市| 赣州市| 七台河市| 花莲市| 滦平县| 万安县| 剑河县| 荔浦县| 六安市| 宁蒗| 花莲市| 扎兰屯市| 青冈县| 洮南市| 新平| 宣汉县| 象州县| 定兴县| 赤峰市| 精河县| 三台县| 盐亭县| 东山县| 福海县| 辽宁省| 栾川县|