Difference between revisions of "Wellwiki Page Generation Scripts"

Revision as of 11:52, 10 August 2015

Page Generation

Wellwiki uses four primary scripts to generate new pages. All of these scripts are located in ~/installers/wikimanager/.

pre_deployment.sql

The first script updates the violation counts for wells in the well table of wellsdb_dev. To run it, log in to mysql as root, select wellsdb_dev as the database, and run source ~/installers/wikimanager/pre_deployment.sql; in the mysql console.

The script takes about 1 minute to fully execute.

pre_deployment_co.py

This script updates the violation counts for Colorado wells in the well table of wellsdb_dev, since their violation information is stored in a table separate from other states'. Usage: python pre_deployment_co.py

well_updates.py

This script updates the well table based on information in permit, production, spud, municipality, and county. It should be run after new data is added to the database to ensure that well info is fully populated. It is located in ~/scripts/ and is run using python well_updates.py in this directory. If only specific update functions need to be carried out, this can be specified in if __name__=='__main__':.
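As a rough illustration of limiting a run to specific update functions in the main block, the sketch below uses hypothetical function names (update_from_permit and so on). The actual functions in well_updates.py are not listed on this page and may be named and structured differently.

```python
# Hypothetical stand-ins for the update functions in well_updates.py;
# the real names and bodies are not documented on this page.
def update_from_permit():
    return "permit"


def update_from_production():
    return "production"


def update_from_spud():
    return "spud"


def run_updates(updates):
    """Call each update function in order and return what each one reported."""
    return [update() for update in updates]


if __name__ == '__main__':
    # Comment out any entry that does not need to run this time.
    selected = [
        update_from_permit,
        # update_from_production,
        update_from_spud,
    ]
    print(run_updates(selected))
```

Keeping the selection in one list makes it easy to see at a glance which updates a given run will perform.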

UpdateManagerPageTable.py

This script adds any wells in the MySQL wellsdb_dev database to the wikiManagerPages table in the wikidb database. Run it before CreateEntitySources.py so that every page that needs to be generated can be flagged for update.

Flagging Operators for Update

In addition to flagging wells for update, operator pages should also be flagged. This can be done on a state-by-state basis with the following query:


UPDATE wikidb.wikiManagerPages

SET needs_update='1'

WHERE page_name IN

(SELECT page_name FROM

(SELECT u1.* FROM wikidb.wikiManagerPages u1 LEFT JOIN wellsdb_dev.operator u2 ON u1.page_name=u2.operator_name WHERE u2.operator_state='PA') AS p);
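To try the flagging logic without touching the real databases, here is a minimal sketch that runs a simplified, equivalent update against an in-memory SQLite database. The table layouts are reduced to the columns used in the query, the database prefixes (wikidb, wellsdb_dev) are dropped, and the sample rows are made up. The function name flag_operator_pages is hypothetical.

```python
import sqlite3


def flag_operator_pages(conn, state):
    """Set needs_update=1 for pages whose name matches an operator in `state`."""
    conn.execute(
        """
        UPDATE wikiManagerPages
        SET needs_update = 1
        WHERE page_name IN (
            SELECT operator_name FROM operator WHERE operator_state = ?
        )
        """,
        (state,),
    )
    conn.commit()


def demo():
    """Build two tiny tables with invented rows and flag the PA operator."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wikiManagerPages (page_name TEXT, needs_update INTEGER);
        CREATE TABLE operator (operator_name TEXT, operator_state TEXT);
        INSERT INTO wikiManagerPages VALUES ('ACME GAS', 0), ('OTHER OIL', 0);
        INSERT INTO operator VALUES ('ACME GAS', 'PA'), ('OTHER OIL', 'CO');
        """
    )
    flag_operator_pages(conn, "PA")
    return conn.execute(
        "SELECT page_name, needs_update FROM wikiManagerPages ORDER BY page_name"
    ).fetchall()
```

Passing the state as a query parameter rather than editing the SQL text makes it straightforward to repeat the update for each state.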

CreateEntitySources.py

This script reads through all wiki pages in the wikiManagerPages table of the wikidb database that have NEEDS_UPDATE set to 1.

It uses scripts within DBUtil/ to query the mysql database. These return a dictionary for each well, in which the keys are column names from the mysql database and the values are the corresponding entries for that well. Scripts within FileUtil/SourceBuilders/ generate page templates and output parsed files using the well-specific data from the DBUtil results. Note that the *SourceBuilder.py scripts are the ones in use; the *TableTemplateBuilder.py scripts are older versions and are unused.
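The dictionary-to-page step can be pictured with a small sketch. The template text, the column names (well_id, operator_name, county), and the example row are all invented for illustration; the real SourceBuilder templates and the DBUtil result columns are not shown on this page.

```python
from string import Template

# Hypothetical page template; the real SourceBuilder templates differ.
WELL_TEMPLATE = Template(
    "== Well $well_id ==\n"
    "Operator: $operator_name\n"
    "County: $county\n"
)


def build_well_source(row):
    """Fill the page template from a dict keyed by database column name."""
    return WELL_TEMPLATE.substitute(row)


# Made-up example row, shaped like the dict returned for one well.
example_row = {
    "well_id": "37-000-00000",
    "operator_name": "EXAMPLE OPERATOR",
    "county": "Example",
}
```

Calling build_well_source(example_row) returns the filled-in page text. A missing column raises KeyError, which surfaces incomplete rows early.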

Each flagged entry in the wikiManagerPages table is looked up in the wellsdb_dev database, and the newly generated pages are output to ~/PycharmProjects/WikiManager/WikiSources.

Here is some sample SQL to flag all Pennsylvania pages for update:

UPDATE `wikiManagerPages`

SET `NEEDS_UPDATE`='1'

WHERE `PAGE_NAME` LIKE '37%'

This script is currently fully functional for at least the state of Pennsylvania.

commit_staged_sources.py

This file takes a source directory as an argument and adds all wiki pages within that directory to the actual MediaWiki database, making them available for viewing on the website.

Usage: python commit_staged_sources.py ~/PycharmProjects/WikiManager/WikiSources
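For orientation, the sketch below shows only the directory-walking half of such a script: it collects the files found directly inside the source directory. The function name list_staged_sources is hypothetical, and the step that actually writes pages into the wiki is not reproduced because it is not described on this page.

```python
import os


def list_staged_sources(source_dir):
    """Return sorted paths of the regular files directly inside source_dir."""
    source_dir = os.path.expanduser(source_dir)  # allow paths starting with ~
    return sorted(
        os.path.join(source_dir, name)
        for name in os.listdir(source_dir)
        if os.path.isfile(os.path.join(source_dir, name))
    )
```

Listing the staged files before committing gives a quick check of how many pages are about to be added.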

add_county_info_boxes.py

This script adds Wikipedia infoboxes to county pages. It should be run exactly one time after county pages have been flagged for update and CreateEntitySources.py and commit_staged_sources.py have executed. Infoboxes are added to all county pages that exist in both wikiPageUpload and wikiManagerPages.


BC Well Pages

Currently, all BC well pages are generated at once (not selectively by flagging individual pages).

CreateEntitySources.py

Uncomment the lines dbClient = DBClient() (line 57) and CreateBCSources(dbClient) (line 60) and run as python CreateEntitySources.py. Newly generated pages are output to ~/PycharmProjects/WikiManager/WikiSources. Re-comment these lines when finished so that BC pages are not generated every time non-BC pages are generated.

commit_staged_sources.py

See above.

EntityController.py

This file handles the selection of data from the wellwiki_dev database through pre-defined SQL statements that return results as a dictionary or array. It is located in DomainEntities/BC/.


AB Well Pages

Currently, all AB well and operator pages are generated at once (not selectively by flagging individual pages).

CreateEntitySources.py

Uncomment the lines dbClient = DBClient() (line 57) and CreateABSources(dbClient) (line 59) and run as python CreateEntitySources.py. Newly generated pages are output to ~/PycharmProjects/WikiManager/WikiSources. Re-comment these lines when finished so that AB pages are not generated every time non-AB pages are generated.

create_orphan_ab_page.py

Running this script generates the source text file for the AB Orphan Wells page in ~/PycharmProjects/WikiManager/WikiSources.

commit_staged_sources.py

See above.

EntityController.py

This file handles the selection of data from the wellwiki_dev database through pre-defined SQL statements that return results as a dictionary or array. It is located in DomainEntities/AB/.


TX Pages

create_tx_pages.py

This file takes an argument that selects whether to generate pages for Texas wells, operators, or counties. It uses OperatorSourceBuilder, WellSourceBuilder, CountySourceBuilder, and EntityController, located within SourceBuilders/TX/, and reads from the texas database rather than wellwiki_dev.

Generating pages for Texas consists solely of running this script, followed by commit_staged_sources.py on the standard ~/PycharmProjects/WikiManager/WikiSources path.

Creating counties: python create_tx_pages.py county

Creating operators: python create_tx_pages.py operator

Creating wells: python create_tx_pages.py well
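One plausible way to handle the single argument is sketched below with argparse. This is an assumption about structure, not the script's actual code: the BUILDERS mapping and the functions parse_args and builder_for are hypothetical, and the builder names are simply the ones listed above, held here as strings.

```python
import argparse

# Hypothetical mapping from the command-line argument to the builder that
# would handle it; the names come from the builders listed for SourceBuilders/TX/.
BUILDERS = {
    "county": "CountySourceBuilder",
    "operator": "OperatorSourceBuilder",
    "well": "WellSourceBuilder",
}


def parse_args(argv=None):
    """Parse the single positional argument naming the page type to generate."""
    parser = argparse.ArgumentParser(description="Generate Texas page sources (sketch).")
    parser.add_argument("entity", choices=sorted(BUILDERS), help="page type to generate")
    return parser.parse_args(argv)


def builder_for(argv):
    """Return the name of the builder selected by the given argument list."""
    return BUILDERS[parse_args(argv).entity]
```

Restricting the argument with choices means a typo such as "wells" fails immediately with a usage message instead of silently generating nothing.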

After Texas county pages are created, the Wikipedia infoboxes need to be manually re-added.