Featured Projects

Explore my latest work in MLOps, from automated pipelines to scalable deployment solutions

Titanic

Death records from the RMS Titanic incident.

Python, CSV, Parquet, PyTorch
200
30

CodeReasoningPro

CodeReasoningPro is a large-scale synthetic dataset comprising 1,785,725 competitive programming problems in Python, created by XythicK, an MLOps Engineer.

Python, CSV, Parquet, PyTorch
87
1,785,725

Chemistry

A dataset of 62,941,756 chemistry questions covering Organic Chemistry (Alkenes, Nomenclature), Inorganic Chemistry (Oxidation States), and Physical Chemistry (Kinetics). Each row includes a question, two or three answers with explanations, and a difficulty level.

Python, Chemistry, Education, Dataset
88
63,002,273

Orca-Instruct-100K

This repository contains scripts and documentation for generating a synthetic dataset inspired by the Open-Orca dataset. The dataset consists of conversational instruction-response pairs designed for natural language processing tasks.

Python, NLP, Instruction, Dataset
28
100,000