Page History: My data is on multiple pages, how can I join it together into one table?
Compare Page Revisions
Page Revision: 2007/06/21 20:18
If data is on different pages of a site, this means you have to use different datapages and therefore separate tables. (in some very rare instances you can write to the same table from several datapages, but this would mean you were scraping the exact same information from different page formats, which is not what would happen in this scenario)
So how do you join the data back together after its scattered to two or more tables? The short answer is by using SQL Joins. Assuming the pages are somehow linked together, we can use the information about those links that we scraped to join the tables together. Imagine the following scenario of three datapages:
Main Page Datapage
SQL Table Name: Main
Fields: f1, f2, f3, Link1, Link2, META_SRC_URI
First Additional Datapage
SQL Table Name: Details1
Fields: f1, f2, META_SRC_URI
Second Additional Datapage
SQL Table Name: Details2
Fields: f1, f2, META_SRC_URI
Link1 and Link2 form the main datapage lead to the two additional pages. The META_SRC_URI field is a META tag that indicates the URL where the data originated from. In the query editor you would do this to join things together:
SELECT Main.f1
, Main.f2
, Main.f3
, Details1.f1 as f4
, Details1.f2 as f5
, Details2.f1 as f6
, Details2.f2 as f7
, Main.META_SRC_URI as MainPageSourceLocation
, Details1.META_SRC_URI as Details1SourceLocation
, Details2.META_SRC_URI as Details2SourceLocation
FROM Main
INNER JOIN Details1 ON Main.Link1 = Details1.META_SRC_URI
INNER JOIN Details2 ON Main.Link2 = Details2.META_SRC_URI