Alberta Technical Information

General

Tables are backed up using mysqldump to ~/database_backups/ before starting.

CSVs are copied to the server. Their group and ownership are changed to 'mysql' using the chgrp and chown commands. They are then moved to /var/lib/mysql/wellsdb_dev/ such that they can be read by mysql. Ensure that CSVs are free of special characters (e.g. \n and \) and extra quotation marks ".

ab_well

The content for ab_well was obtained from multiple sources, a textfile and shapefile located here: http://www.aer.ca/data-and-publications/statistical-reports/st37

The textfile contains the majority of the information required for the table, excluding FdDate and StatDate which must be extracted from the shapefile. Additionally, longitude and latitude do not seem to be included in the most up to date data. Coordinates were obtained from google drive, within AB/AER/ST37/

This data can be combined into a single csv using excel's index and match functions. Due to the high volume of data, this process can be somewhat time consuming. The database was then updated using the following sql:

Importing:

LOAD DATA INFILE 'abw2.csv' IGNORE

INTO TABLE ab_well FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n' IGNORE 1 ROWS (LOCATION,well_id,Name,Field,Pool,OSArea,OSDep,License,@License_Status,@LicDate_tmp,Licensee,agent,@final_drilling_date_tmp,totalDep,well_status_code,@well_status_Date_tmp,fluid,mode,type,structure,latitude_decimal,longitude_decimal,@statDate_tmp,@FdDate_tmp)

SET LicDate = STR_TO_DATE(@LicDate_tmp, '%m/%d/%Y'), final_drilling_date = STR_TO_DATE(@final_drilling_date_tmp, '%m/%d/%Y'), well_status_Date = STR_TO_DATE(@well_status_Date_tmp, '%m/%d/%Y'), statDate= STR_TO_DATE(@statDate_tmp, '%m/%d/%Y'), FdDate = STR_TO_DATE(@FdDate_tmp, '%m/%d/%Y')

1827 new wells were added.

Setting zero dates to NULL (final_drilling_date and fddate):

update ab_well

set final_drilling_date=NULL

where final_drilling_date='0000-00-00'

Setting blank fields to NULL (name, pool, osarea, agent, and operator):

update ab_well

set name=null

where trim(name)=""

ab_pool_codes

The Reserves Sequence Code column was deleted from Excel document because it is not included in our database table. 524 duplicate rows were then deleted and the file saved as a Windows Comma Separated (.csv).

Importing':'

CREATE TABLE ab_pool_codes_tmp AS (SELECT * FROM ab_pool_codes where 0=1)

LOAD DATA INFILE 'CommingledPoolList.csv'

INTO TABLE ab_pool_codes_tmp FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(field_name, production_pool, field_code, production_pool_code, @dummy, geological_pool_code, geological_pool_name, confidential)

(17000 rows inserted)

INSERT INTO ab_pool_codes

SELECT * FROM ab_pool_codes_tmp t2

WHERE (t2.field_name, t2.production_pool, t2.field_code, t2.production_pool_code, t2.geological_pool_code, t2.geological_pool_name, t2.confidential)

NOT IN (SELECT field_name, production_pool, field_code, production_pool_code, geological_pool_code, geological_pool_name, confidential FROM ab_pool_codes)

(17000 rows added - there were no duplicates between the old and new data)

DROP TABLE ab_pool_codes_tmp

Setting blank fields to NULL:

UPDATE ab_pool_codes

SET geological_pool_name=NULL

WHERE geological_pool_name=

UPDATE ab_pool_codes

SET geological_pool_name=NULL

WHERE LENGTH(geological_pool_name)=1

UPDATE ab_pool_codes

SET confidential=NULL

WHERE LENGTH(confidential)=1

ab_facility

Text files were retrieved from http://www.aer.ca/data-and-publications/statistical-reports/st102, opened in Excel, and saved as CSV.

Importing:

CREATE TABLE ab_facility_tmp AS (SELECT * FROM ab_facility where 0=1)

LOAD DATA INFILE 'ActiveFacility.csv'

INTO TABLE ab_facility_tmp FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(FACILITY_ID, FACILITY_NAME, OPERATOR_CODE, OPERATOR_NAME, SUB_TYPE_CODE, SUB_TYPE, LE, LSD, SEC, TWP, RNG, MER, LICENSE_NUMBER, EDCT_CODE, EDCT_DESCRIPTION, LICENSE_CODE, OPERATIONAL_STATUS)

LOAD DATA INFILE 'InactiveFacility.csv'

INTO TABLE ab_facility_tmp FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(FACILITY_ID, FACILITY_NAME, OPERATOR_CODE, OPERATOR_NAME, SUB_TYPE_CODE, SUB_TYPE, LE, LSD, SEC, TWP, RNG, MER, LICENSE_NUMBER, EDCT_CODE, EDCT_DESCRIPTION, LICENSE_CODE, OPERATIONAL_STATUS)

INSERT INTO ab_facility

SELECT * FROM ab_facility_tmp t2

WHERE (t2.FACILITY_ID, t2.FACILITY_NAME, t2.OPERATOR_CODE, t2.OPERATOR_NAME, t2.SUB_TYPE_CODE, t2.SUB_TYPE, t2.LE, t2.LSD, t2.SEC, t2.TWP, t2.RNG, t2.MER, t2.LICENSE_NUMBER, t2.EDCT_CODE, t2.EDCT_DESCRIPTION, t2.LICENSE_CODE, t2.OPERATIONAL_STATUS)

NOT IN (SELECT FACILITY_ID, FACILITY_NAME, OPERATOR_CODE, OPERATOR_NAME, SUB_TYPE_CODE, SUB_TYPE, LE, LSD, SEC, TWP, RNG, MER, LICENSE_NUMBER, EDCT_CODE, EDCT_DESCRIPTION, LICENSE_CODE, OPERATIONAL_STATUS FROM ab_facility)

DROP TABLE ab_facility_tmp

87923 records added.

Setting blank fields to NULL:

The following query was run to replace blanks with NULLs in all columns except well_id:

UPDATE ab_facility

SET FACILITY_ID=NULL

WHERE LENGTH(FACILITY_ID)=0

Similar operations where performed to replace blank fields with a length of 1 (taking care not to erase any real single digit values). Columns affected were LSD, MER, LICENSE_NUMBER, EDCT_CODE, EDCT_DESCRIPTION, and LICENSE_CODE.

Generating the well location from filled table fields:

The values in the fields SEC, TWP, RNG, and MER often omitted leading zeros, so were fixed as follows:

UPDATE ab_facility

SET SEC=CONCAT('0',SEC)

WHERE LENGTH(SEC)=1

UPDATE ab_facility

SET TWP=CONCAT('00',TWP)

WHERE LENGTH(TWP)=1

UPDATE ab_facility

SET TWP=CONCAT('0',TWP)

WHERE LENGTH(TWP)=2

UPDATE ab_facility

SET RNG=CONCAT('0',RNG)

WHERE LENGTH(RNG)=1

A new column (which will be used to replace well_id) was added to contain the location:

ALTER TABLE ab_facility

ADD LOCATION varchar(20)

UPDATE ab_facility

SET LOCATION=CONCAT(LSD,'-',SEC,'-',TWP,'-',RNG,MER)

WHERE LENGTH(LSD)=2 AND LENGTH(SEC)=2 AND LENGTH(TWP)=3 AND

LENGTH(RNG)=2 AND LENGTH(MER)=2

An index was applied to this column.

ab_operator

The Excel file was obtained from the Business Associate Codes (ST104A) section on http://www.aer.ca/data-and-publications/statistical-reports/st104. Originally, the address was given in a single column and contained the street number, city, province, and postal code. These individual components were extracted into separate fields. Leading and trailing whitespace was removed from cells and the address cells containing ".", "-", "1", "na", "n/a", "none", "unknown", "not found", and "no address found" were cleared, as well as one case of city and postal codes being "unknown".

Excel formulas extracted the address components. The columns in the Excel sheet were as follows:

A: BA Code

B: Company Name

C: Phone

D: Address Long

E: Address (i.e. house/box number)

F: Province

G: Postal Code

H: Country

I: Position of Last Comma in D

J: Last Word Before Province (i.e. last word of city name for multi-word cities, or full city name for single-word cities)

K: City Name Long (only filled for multi-word cities)

L: City

The formulas were:

Last Comma (I): =IF(ISERROR(SEARCH("^^",SUBSTITUTE(D2,",","^^",LEN(D2)-LEN(SUBSTITUTE(D2,",",""))))),"",SEARCH("^^",SUBSTITUTE(D2,",","^^",LEN(D2)-LEN(SUBSTITUTE(D2,",","")))))

Province (F): =IF(ISERROR(MID(D2,I2+2,2)),"",MID(D2,I2+2,2))

Postal Code (G): =IF(ISERROR(MID(D2,I2+5,LEN(D2)-I2-4)),"",MID(D2,I2+5,LEN(D2)-I2-4))

Country (H): =IF(NOT(ISERROR(VLOOKUP($F2, [provinces.xlsx]Sheet1!$B$2:$B$14,1, FALSE))), "Canada", IF(NOT(ISERROR(VLOOKUP($F2, [states.xlsx]Sheet1!$B$2:$B$51,1, FALSE))), "United States", "" ))

Last Word Before Province (J): =IF(ISERROR(RIGHT(LEFT(D2,I2-1),LEN(LEFT(D2,I2-1))-SEARCH("^^",SUBSTITUTE(LEFT(D2,I2-1)," ","^^",LEN(LEFT(D2,I2-1))-LEN(SUBSTITUTE(LEFT(D2,I2-1)," ","")))))),"",RIGHT(LEFT(D2,I2-1),LEN(LEFT(D2,I2-1))-SEARCH("^^",SUBSTITUTE(LEFT(D2,I2-1)," ","^^",LEN(LEFT(D2,I2-1))-LEN(SUBSTITUTE(LEFT(D2,I2-1)," ",""))))))

From here, data was sorted based on province/state and a manual search located multi-word cities, whose proper names were then recorded in K2. The true city name was inserted into L2 by =IF(K2="",J2,K2).

Address (E): =IF(ISERROR(TRIM(SUBSTITUTE(LEFT(D2,I2-1),L2,""))),"",TRIM(SUBSTITUTE(LEFT(D2,I2-1),L2,""))) - removal of all components from the long address to produce the short address

The final Excel file was saved as a CSV.

ab_operator_city_names.xlsx has been added to WellWiki Data on Google Drive to assist with the process of assigning city names in the future.

Importing:

ab_operator_test is an intermediate table used to avoid duplicate key error #1062.

create table ab_operator_test select * from ab_operator where 1=0

LOAD DATA INFILE 'BusinessAssociateCodes_proper_cols.csv'

INTO TABLE ab_operator_test FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(ba_code, company, phone, @dummy, address, province, postal_code, country, @dummy, @dummy, @dummy, city)

INSERT INTO ab_operator (ba_code, company, address, city, province, postal_code, phone, country)

SELECT ba_code, company, address, city, province, postal_code, phone, country FROM ab_operator_test t2

ON DUPLICATE KEY UPDATE ba_code=t2.ba_code, company=t2.company, address=t2.address, city=t2.city, province=t2.province, postal_code=t2.postal_code, phone=t2.phone, country=t2.country

DROP TABLE ab_operator_test

14496 records added.

Setting blank fields to NULL:

UPDATE ab_operator SET city=NULL WHERE city=""

The same operation was performed on province, postal_code, phone, and country.

Some blank strings where still present, so the following were used as well:

UPDATE ab_operator

SET city=NULL

WHERE length(city)<2

UPDATE ab_operator SET phone = REPLACE(REPLACE(phone, '\r', ), '\n', ) (followed by a repeat of the initial update statement)

For consistency of data:

update ab_operator

set country='United States'

where country='US'

ab_violations

The June 2014 data was presented as tables in a PDF document, and extracted by pasting the tables first into Word, then into Excel. The AER Group / Compliance Category column was split into two separate columns to insert into AER_GROUP and COMPLIANCE_CATEGORY in the database. ENFORCEMENT_ACTION_CATEGORY and RISK_CATEGORY were filled out based on the PDF document table headings. The ID # and AER Actions and Licensee Response/Follow-up columns were not used. The Excel document was saved as a CSV.

Importing June 2014:

LOAD DATA INFILE 'June2014_Enforcement_Action_Summary.csv'

INTO TABLE ab_violations FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(@dummy, LICENSEE, @dummy, NONCOMPLIANCE_DESC, ENFORCEMENT_DATE, LOCATION, ENFORCEMENT_ACTION_CATEGORY, RISK_CATEGORY, AER_GROUP, COMPLIANCE_CATEGORY)

57 records added.

The July 2014-June 2015 data was copied from http://www1.aer.ca/compliancedashboard/enforcement.html into Excel and saved in CSV format. The enforcement data here was less extensive than the earlier records.

Importing July 2014-June 2015:

LOAD DATA INFILE 'compliance_june_2015.csv'

INTO TABLE ab_violations FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(@dummy, LICENSEE, NONCOMPLIANCE_DESC, ENFORCEMENT_DATE)

318 records added.

FracFocus Chemical Data

After the creation / updating of Colorado pages, chemical data from the fracfocus registry should be re-added to select pages. This is done by executing python chemicals.py AB from the wikimanager folder. This will generated modified pages within staged_sources. After which, the pages must be committed using python commit_staged_sources.py staged_sources

Track changes made to this page