Commodities patterns data
Andy Gout edited this page Feb 8, 2023 · 4 revisions
commodities-patterns.json is no longer ignored by Git, so the information on this wiki page can now be found in that file.
The first list of 40 commodities used to train the model was as follows:
[
"aluminium",
"Aluminium",
"amber",
"Amber",
"Brent crude",
"cattle",
"Cattle",
"cobalt",
"Cobalt",
"cocoa",
"Cocoa",
"coffee",
"Coffee",
"copper",
"Copper",
"corn",
"Corn",
"cotton",
"Cotton",
"crude oil",
"Crude oil",
"ethanol",
"Ethanol",
"gold",
"Gold",
"grain",
"Grain",
"heating oil",
"Heating oil",
"hogs",
"Hogs",
"iron",
"Iron",
"lead",
"Lead",
"lithium",
"Lithium",
"milk",
"Milk",
"molybdenum",
"Molybdenum",
"natural gas",
"Natural gas",
"nickel",
"Nickel",
"oats",
"Oats",
"palladium",
"Palladium",
"palm oil",
"Palm oil",
"platinum",
"Platinum",
"poultry",
"Poultry",
"propane",
"Propane",
"rapeseed",
"Rapeseed",
"rice",
"Rice",
"rubber",
"Rubber",
"silver",
"Silver",
"soybeans",
"Soybeans",
"soya beans",
"Soya beans",
"sugar",
"Sugar",
"tin",
"Tin",
"wheat",
"Wheat",
"wool",
"Wool",
"zinc",
"Zinc"
]
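Listing most names in both lowercase and capitalised form suggests the strings were matched case-sensitively, although this page does not say how they were used. As a rough illustration only (plain Python rather than spaCy, with a five-item excerpt of the list and hypothetical helper names), lowercasing before comparison makes the duplicate entries unnecessary:

```python
# A small excerpt of the list above, not the full set of strings.
variants = ["gold", "Gold", "crude oil", "Crude oil", "Brent crude"]


def matches_exact(text):
    """Case-sensitive lookup: only the spellings listed verbatim match."""
    return text in variants


# Normalise once so that a single lowercase entry covers every casing.
lowered = {v.lower() for v in variants}


def matches_lower(text):
    """Case-insensitive lookup against the lowercased set."""
    return text.lower() in lowered


print(matches_exact("GOLD"))   # False: "GOLD" is not listed verbatim
print(matches_lower("GOLD"))   # True: compared as "gold"
print(sorted(lowered))         # ['brent crude', 'crude oil', 'gold']
```

The same idea is what a case-insensitive token attribute provides: one entry per commodity instead of one per spelling.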
The next approach reduced that list to 20 commodities, each with a greater number of articles, which was acknowledged to be better for training the NER model (~100 articles per commodity for training and evaluation, plus more for testing). It also adapted the format to use spaCy token attributes to improve precision: "LOWER" matches the lowercased token text, so separate capitalised entries are no longer needed, and "POS" restricts matches to the given part of speech (see the spaCy docs: "Example: Using linguistic annotations" and "Rule-based entity recognition: Entity Patterns"):
[
  [{ "LOWER": "aluminium", "POS": "NOUN" }],
  [{ "LOWER": "cattle", "POS": "NOUN" }],
  [{ "LOWER": "cobalt", "POS": "NOUN" }],
  [{ "LOWER": "cocoa", "POS": "NOUN" }],
  [{ "LOWER": "coffee", "POS": "NOUN" }],
  [{ "LOWER": "copper", "POS": "NOUN" }],
  [{ "LOWER": "corn", "POS": "NOUN" }],
  [{ "LOWER": "cotton", "POS": "NOUN" }],
  [{ "LOWER": "crude", "POS": "ADJ" }, { "LOWER": "oil", "POS": "NOUN" }],
  [{ "LOWER": "gold", "POS": "NOUN" }],
  [{ "LOWER": "iron", "POS": "NOUN" }, { "LOWER": "ore", "POS": "NOUN" }],
  [{ "LOWER": "lithium", "POS": "NOUN" }],
  [{ "LOWER": "natural", "POS": "ADJ" }, { "LOWER": "gas", "POS": "NOUN" }],
  [{ "LOWER": "palm", "POS": "NOUN" }, { "LOWER": "oil", "POS": "NOUN" }],
  [{ "LOWER": "poultry", "POS": "NOUN" }],
  [{ "LOWER": "rice", "POS": "NOUN" }],
  [{ "LOWER": "silver", "POS": "NOUN" }],
  [{ "LOWER": "sugar", "POS": "NOUN" }],
  [{ "LOWER": "wheat", "POS": "NOUN" }],
  [{ "LOWER": "zinc", "POS": "NOUN" }]
]
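The token patterns above follow a regular shape (one dictionary per word, with a lowercased form and a part-of-speech tag), so they could be generated from a plain list of names. The sketch below is one possible way to do that and is not taken from the project: `to_token_pattern`, the `ADJECTIVES` set and the `COMMODITY` label are illustrative assumptions. The last step wraps each pattern in the `{"label": ..., "pattern": ...}` form that spaCy's entity ruler expects.

```python
import json

# Assumption: every word is a noun unless listed here. Only the two
# adjectives that occur in the patterns above are included.
ADJECTIVES = {"crude", "natural"}


def to_token_pattern(name):
    """Build one token pattern (a list of per-token dicts) from a commodity name."""
    pattern = []
    for word in name.split():
        lowered = word.lower()
        pos = "ADJ" if lowered in ADJECTIVES else "NOUN"
        pattern.append({"LOWER": lowered, "POS": pos})
    return pattern


# A few names from the list above; the full list works the same way.
names = ["gold", "Crude oil", "iron ore"]
patterns = [to_token_pattern(name) for name in names]

# Entity ruler entries pair each token pattern with an entity label.
# "COMMODITY" is a hypothetical label chosen for this sketch.
ruler_entries = [{"label": "COMMODITY", "pattern": p} for p in patterns]

print(json.dumps(ruler_entries, indent=2))
```

Note that the "POS" constraints only take effect if the pipeline has assigned part-of-speech tags before matching runs.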