🕷️ Xcrap Factory: Instantiate clients, parsing models, and extractors from configuration objects

Xcrap Factory is a set of utilities for dynamically creating instances of clients, extractors, and parsing models, making it easier to configure and extend scraping and parsing pipelines.

📦 Installation

Install the package with your preferred package manager. For example, with npm:

npm i @xcrap/factory

🛠️ Features

  • createClient: Instantiates clients from a registry of allowed classes.
  • createExtractor: Creates an extractor function from an extractor string (for example, "attribute:ATTRIBUTE_NAME") and a registry of allowed extractors.
  • createParsingModel: Builds validated and nested parsing models with customizable extractors and types.

🚀 Usage

1. Creating a Client

import { GotScrapingClient } from "@xcrap/got-scraping-client"
import { AxiosClient } from "@xcrap/axios-client"
import { createClient } from "@xcrap/factory"

const config = {
	allowedClients: {
		"got-scraping": GotScrapingClient,
		"axios": AxiosClient 
	}
}

const client = createClient({
	config: config,
	type: "...", // Client type
	options: {...} // Client constructor options
})
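
The registry idea behind createClient can be illustrated with a small, self-contained sketch. This is not the library's implementation: createClientSketch, FakeAxiosClient, FakeGotScrapingClient, and the timeout option are hypothetical names used only here, and the behavior (look up the class by key, reject unknown keys, construct with the given options) is an assumption based on the description above.

```typescript
// Hypothetical stand-ins for real client classes; not part of @xcrap/factory.
class FakeAxiosClient {
	readonly kind = "axios"
	constructor(readonly options: Record<string, unknown>) {}
}

class FakeGotScrapingClient {
	readonly kind = "got-scraping"
	constructor(readonly options: Record<string, unknown>) {}
}

type ClientConstructor = new (options: Record<string, unknown>) => { kind: string }
type ClientRegistry = Record<string, ClientConstructor>

// Assumed behavior: resolve the class by key, reject unknown keys, then construct it.
function createClientSketch(
	allowedClients: ClientRegistry,
	type: string,
	options: Record<string, unknown> = {}
) {
	// Own-property check so keys like "constructor" are not resolved by accident.
	if (!Object.prototype.hasOwnProperty.call(allowedClients, type)) {
		throw new Error(`Client type "${type}" is not allowed`)
	}
	const ClientClass = allowedClients[type]
	return new ClientClass(options)
}

const sketchRegistry: ClientRegistry = {
	"axios": FakeAxiosClient,
	"got-scraping": FakeGotScrapingClient
}

// "timeout" is an illustrative option, not a documented one.
const sketchClient = createClientSketch(sketchRegistry, "axios", { timeout: 5000 })
```

Keeping the registry explicit means a configuration file can only name client types that the application has deliberately allowed.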

2. Creating an Extractor

import { extractInnerText, extractSrc, extractHref, extractAttribute } from "@xcrap/parser"
import { createExtractor } from "@xcrap/factory"

const config = {
	allowedExtractors: {
		innerText: extractInnerText,
		src: extractSrc,
		href: extractHref,
		attribute: extractAttribute // extractAttribute(name: string) -> Generates an extractor
	},
	argumentSeparator: ":" // Optional | Usage example -> "attribute:value"
}

const extractor = createExtractor({
	extractorText: "..", // innerText, src, href, attribute:ATTRIBUTE_NAME...
	config: config
})
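
To show how an extractor string such as "attribute:href" might be resolved, here is a minimal sketch that splits the text on the separator and looks the name up in a registry. Everything in it is hypothetical (SketchElement, createExtractorSketch, and the two sample extractors), and the rule "an argument means the registry entry is an extractor factory" is an assumption, not confirmed library behavior.

```typescript
// Hypothetical element shape used only for this sketch.
type SketchElement = { text: string; attributes: Record<string, string> }
type SketchExtractor = (element: SketchElement) => string | undefined
type SketchExtractorFactory = (argument: string) => SketchExtractor

const sketchInnerText: SketchExtractor = (element) => element.text
const sketchAttribute: SketchExtractorFactory = (name) => (element) => element.attributes[name]

const sketchExtractors: Record<string, SketchExtractor | SketchExtractorFactory> = {
	innerText: sketchInnerText,
	attribute: sketchAttribute
}

function createExtractorSketch(
	extractorText: string,
	allowedExtractors: Record<string, SketchExtractor | SketchExtractorFactory>,
	argumentSeparator: string = ":"
): SketchExtractor {
	// Split on the first separator only, so the argument may contain it too.
	const separatorIndex = extractorText.indexOf(argumentSeparator)
	const name = separatorIndex === -1 ? extractorText : extractorText.slice(0, separatorIndex)
	const argument = separatorIndex === -1
		? undefined
		: extractorText.slice(separatorIndex + argumentSeparator.length)

	if (!Object.prototype.hasOwnProperty.call(allowedExtractors, name)) {
		throw new Error(`Extractor "${name}" is not allowed`)
	}
	const entry = allowedExtractors[name]

	// Assumption: with an argument the entry is a factory, without one it is an extractor.
	return argument === undefined
		? (entry as SketchExtractor)
		: (entry as SketchExtractorFactory)(argument)
}

const sketchLink: SketchElement = { text: "Docs", attributes: { href: "/docs" } }
const sketchHref = createExtractorSketch("attribute:href", sketchExtractors)
const sketchText = createExtractorSketch("innerText", sketchExtractors)

console.log(sketchHref(sketchLink)) // "/docs"
console.log(sketchText(sketchLink)) // "Docs"
```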

3. Creating a Parsing Model

import { HtmlParsingModel, JsonParsingModel } from "@xcrap/parser"
import { createParsingModel } from "@xcrap/factory"

const config = {
	allowedExtractors: {...},
	extractorArgumentSeparator: "...", // Optional
	allowedModels: {
		html: HtmlParsingModel,
		json: JsonParsingModel
	}
}

const parsingModel = createParsingModel({
	config: config,
	model: {
		type: "html", // Model type: html, json..
		model: {
			title: {
				query: "title",
				extractor: "innerText",
			},
			bodyData: { // Nested model
				query: "body",
				nested: {
					type: "html",
					model: {
						heading: {
							query: "h1",
							extractor: "innerText"
						}
					}
				}
			}
		}
	}
})
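
Validating a nested model description like the one above can be pictured as a recursive walk. The sketch below is an illustration only: validateModelSketch and its types are hypothetical, and the rules it checks (allowed model type, allowed extractor name, and each field having an extractor or a nested model) are assumptions about what validation could involve, not the library's actual checks.

```typescript
// Hypothetical shapes mirroring the model description used in the example above.
type SketchField = {
	query: string
	extractor?: string
	nested?: SketchModel
}
type SketchModel = {
	type: string
	model: Record<string, SketchField>
}

// Returns a list of human-readable problems; an empty list means the description passed.
function validateModelSketch(
	description: SketchModel,
	allowedModelTypes: string[],
	allowedExtractorNames: string[],
	separator: string = ":",
	path: string = "model"
): string[] {
	const problems: string[] = []
	if (allowedModelTypes.indexOf(description.type) === -1) {
		problems.push(`${path}: model type "${description.type}" is not allowed`)
	}
	for (const key of Object.keys(description.model)) {
		const field = description.model[key]
		const fieldPath = `${path}.${key}`
		if (field.extractor !== undefined) {
			// Only the name before the separator is checked against the registry.
			const [name = ""] = field.extractor.split(separator)
			if (allowedExtractorNames.indexOf(name) === -1) {
				problems.push(`${fieldPath}: extractor "${name}" is not allowed`)
			}
		}
		if (field.nested !== undefined) {
			// Recurse into the nested model, extending the path for error messages.
			const nestedProblems = validateModelSketch(
				field.nested, allowedModelTypes, allowedExtractorNames, separator, fieldPath
			)
			problems.push(...nestedProblems)
		}
		if (field.extractor === undefined && field.nested === undefined) {
			problems.push(`${fieldPath}: needs an extractor or a nested model`)
		}
	}
	return problems
}

const sketchDescription: SketchModel = {
	type: "html",
	model: {
		title: { query: "title", extractor: "innerText" },
		bodyData: {
			query: "body",
			nested: {
				type: "html",
				model: { heading: { query: "h1", extractor: "innerText" } }
			}
		}
	}
}

const sketchProblems = validateModelSketch(sketchDescription, ["html", "json"], ["innerText", "attribute"])
console.log(sketchProblems.length) // 0
```

Collecting problems with their paths, rather than throwing on the first one, makes it easier to report every issue in a configuration file at once.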

🧪 Testing

Automated tests are located in the __tests__ directory. To run them:

npm run test

🤝 Contributing

Want to contribute? Follow these steps:

  • Fork the repository.
  • Create a new branch (git checkout -b feature-new).
  • Commit your changes (git commit -m 'Add new feature').
  • Push to the branch (git push origin feature-new).
  • Open a Pull Request.

📝 License

This project is licensed under the MIT License.
