ScrapeBackend API is a Node.js-based API service designed to scrape and store data from websites. It provides an easy-to-use interface to interact with scraped data and enables both real-time scraping and data retrieval from a MongoDB database.
- Web Scraping: Leverage Puppeteer to scrape data from websites.
- Real-Time Data Fetching: Scrape and serve data directly from websites.
- Database Integration: Store scraped data in a MongoDB database for persistent access.
- Flexible API: Expose endpoints for retrieving and managing scraped data.
- Modular Design: Easily extend and integrate with other systems.
Follow the steps below to get the ScrapeBackend API up and running:
Make sure you have the following installed:
- Node.js (v14+)
- MongoDB (local or cloud, such as MongoDB Atlas)
Clone the repository:

```bash
git clone https://github.com/YuvrajKarna/ScrapeBackend-API.git
cd ScrapeBackend-API
```

Install dependencies:

```bash
npm install
```

Create a `.env` file in the root directory and set the following environment variables:

```env
MONGO_URI=your_mongodb_connection_string
PORT=5000
```

Replace `your_mongodb_connection_string` with your MongoDB connection URI.

Start the development server:

```bash
npm run devStart
```

This will start the server on the specified port (5000 by default) with hot reloading enabled.
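Once the variables are set, the server reads them through `process.env`. A minimal sketch of that startup logic (illustrative only — the actual handling lives in the repo's entry point, which may use the `dotenv` package to load the `.env` file automatically):

```javascript
// Read configuration from the environment, mirroring the .env example above.
// (Sketch only -- see the repo's entry point for the real implementation.)
const PORT = process.env.PORT || 5000; // fall back to the documented default
const MONGO_URI = process.env.MONGO_URI;

if (!MONGO_URI) {
  // Warn early: every database-backed endpoint needs a connection string.
  console.warn("MONGO_URI is not set; database features will not work.");
}
```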
Once the server is running, you can interact with the API via HTTP requests. The API supports both real-time scraping and retrieving stored data.
To get a list of all scraped books from the database:
```
GET http://localhost:5000/api/scrape/books
```

- Description: Fetch all scraped book details from the database.
- Response: Returns an array of book objects.
```json
[
  {
    "title": "A Light in the Attic",
    "price": "£51.77",
    "stock": "In Stock",
    "rating": "Three",
    "link": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "stockInfo": "In stock (22 available)",
    "imageLink": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg"
  },
  ...
]
```

To scrape data from a URL in real time:

- Description: Scrapes data from a provided URL and returns the result.
- Request Body:

```json
{ "url": "https://example.com" }
```

- Response: Returns the scraped data from the provided URL.
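As a quick client-side sketch, both endpoints can be called from Node.js (v18+) with the built-in `fetch`. The `POST /api/scrape` path shown here is an assumption — confirm the exact route against the repo's route definitions:

```javascript
const BASE_URL = "http://localhost:5000";

// Fetch all scraped books stored in MongoDB.
async function getBooks() {
  const res = await fetch(`${BASE_URL}/api/scrape/books`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json(); // resolves to an array of book objects
}

// Trigger a real-time scrape of an arbitrary URL.
// NOTE: the "/api/scrape" path is assumed, not confirmed by the README.
async function scrapeUrl(url) {
  const res = await fetch(`${BASE_URL}/api/scrape`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```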
We welcome contributions to ScrapeBackend API! If you'd like to help improve this project, follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/your-feature`).
- Make your changes and commit them (`git commit -am 'Add new feature'`).
- Push to your branch (`git push origin feature/your-feature`).
- Create a pull request.
Please ensure your code is well-tested and follows the project's coding conventions.
This project is licensed under the MIT License - see the LICENSE file for details.
- Scalability: The project is designed to scale easily. You can extend it to support more complex scraping logic, additional endpoints, or other databases.
- Error Handling: Ensure you add proper error handling in your own code when interacting with the API.
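For example, a defensive wrapper around the books endpoint might look like this (a sketch — the fallback behavior on failure is up to your application):

```javascript
// Fetch the book list without letting a network or server error crash the caller.
async function getBooksSafe() {
  try {
    const res = await fetch("http://localhost:5000/api/scrape/books");
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } catch (err) {
    // Server down, network failure, or bad response: log and degrade gracefully.
    console.error("Failed to fetch books:", err.message);
    return []; // fall back to an empty list
  }
}
```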
- Puppeteer: Scraping performance can depend on the complexity of the target website. Puppeteer is used here for headless browser-based scraping, but you may need to tweak it for certain sites.
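As a starting point for such tweaks, a scrape function might look like the sketch below. It assumes the `puppeteer` package is installed; the launch arguments and wait strategy are common adjustments, not the repo's actual configuration:

```javascript
// Sketch of a tweakable Puppeteer scrape. Options shown are illustrative;
// adjust them per target site.
async function scrapeTitle(url) {
  const puppeteer = require("puppeteer"); // loaded lazily inside the function
  const browser = await puppeteer.launch({
    headless: true,
    args: ["--no-sandbox"], // often required in Docker/CI environments
  });
  try {
    const page = await browser.newPage();
    // "networkidle2" waits for the network to settle -- useful on JS-heavy
    // sites where "load" fires before content is rendered.
    await page.goto(url, { waitUntil: "networkidle2", timeout: 30000 });
    return await page.evaluate(() => document.title);
  } finally {
    await browser.close(); // always release the browser process
  }
}
```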