# Single-Cell RNA-Seq Analysis of CTRL vs. C3 Conditions

## Project Overview

This repository contains the R script used to analyze single-cell RNA-sequencing (scRNA-seq) data and compare two experimental conditions: a control (CTRL) and a CDDO-Me treatment (C3).

The analysis pipeline performs the following steps:
*   **Data Loading and Pre-processing:** Loads 10x Genomics filtered feature-barcode matrices.
*   **Doublet Detection and Removal:** Identifies and filters out potential doublets using the `DoubletFinder` package.
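*   **Quality Control (QC):** Filters cells based on the number of UMIs (nUMI), the number of detected genes (nGene), gene complexity (log10GenesPerUMI), and the mitochondrial ratio (the fraction of counts that come from mitochondrial genes).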
*   **Data Integration:** Merges the two conditions and corrects for batch effects using `Harmony`. Cell cycle effects and mitochondrial ratio are regressed out during scaling with `SCTransform`.
*   **Clustering and Visualization:** Performs dimensionality reduction (PCA and UMAP) and identifies cell clusters.
*   **Cell Type Annotation:** Includes code for renaming clusters based on marker gene expression.
*   **Differential Expression (DE) Analysis:** Identifies DE genes between conditions within specific cell clusters using `MAST`.
*   **Cell Signature Scoring:** Calculates gene signature scores using `UCell`.
*   **Cell-Cell Communication Analysis:** Infers and compares intercellular communication networks between the two conditions using `CellChat`.
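
The merging, normalization, integration, and clustering steps listed above can be sketched roughly as follows. This is a minimal sketch, not the repository's script: the function name `integrate_conditions`, the `^MT-` pattern (which assumes human gene symbols), the default `resolution`, and the use of `orig.ident` as the Harmony grouping variable are all assumptions. Doublet removal, QC filtering, and cell cycle regression are omitted for brevity.

```r
# Hypothetical helper: merge two conditions, normalize, integrate, and cluster.
# Calls are namespaced, so defining the function does not attach any package;
# running it requires Seurat, harmony, and hdf5r to be installed.
integrate_conditions <- function(ctrl_h5, c3_h5, dims = 1:20, resolution = 0.5) {
  # Load the 10x filtered feature-barcode matrices
  # (assumes each file holds a single gene expression matrix)
  ctrl <- Seurat::CreateSeuratObject(Seurat::Read10X_h5(ctrl_h5), project = "CTRL")
  c3   <- Seurat::CreateSeuratObject(Seurat::Read10X_h5(c3_h5), project = "C3")

  # Merge the two conditions; barcodes are prefixed so they stay unique
  obj <- merge(ctrl, y = c3, add.cell.ids = c("CTRL", "C3"))

  # Fraction of counts from mitochondrial genes (assumed "MT-" prefix)
  obj <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-", col.name = "percent.mt")
  obj$mitoRatio <- obj$percent.mt / 100

  # Normalize and scale with SCTransform, regressing out the mitochondrial ratio
  obj <- Seurat::SCTransform(obj, vars.to.regress = "mitoRatio", verbose = FALSE)
  obj <- Seurat::RunPCA(obj, verbose = FALSE)

  # Correct for the condition/batch effect in PCA space
  obj <- harmony::RunHarmony(obj, group.by.vars = "orig.ident")

  # Embed and cluster on the Harmony-corrected components
  obj <- Seurat::RunUMAP(obj, reduction = "harmony", dims = dims)
  obj <- Seurat::FindNeighbors(obj, reduction = "harmony", dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj
}
```

Under these assumptions, a call might look like `obj <- integrate_conditions("data/CT_filtered_feature_bc_matrix.h5", "data/C3_filtered_feature_bc_matrix.h5")`.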

## System Requirements & Dependencies

*   **Operating System:** The code has been tested on Windows 10, but it is expected to be compatible with other systems where R is supported.
*   **R Version:** R version 4.0.0 or higher is recommended.
*   **R Packages:** The following packages are required. Where the version used in the original analysis is known, it is given in parentheses; newer versions may work, but compatibility is not guaranteed.

    *   `Seurat` (v4.3)
    *   `hdf5r` (needed by Seurat's `Read10X_h5` to read `.h5` files)
    *   `patchwork`
    *   `ggplot2`
    *   `cowplot`
    *   `magrittr`
    *   `harmony`
    *   `Matrix`
    *   `dplyr`
    *   `tidyverse`
    *   `DoubletFinder`
    *   `Nebulosa`
    *   `BiocFileCache`
    *   `MAST`
    *   `UCell`
    *   `CellChat`
    *   `NMF`
    *   `ggalluvial`
    *   `ComplexHeatmap`

## Installation

1.  **Clone the repository:**
    ```bash
    git clone [URL to GitHub repository]
    cd [repository name]
    ```

2.  **Install R packages:** Open an R session and run the following commands to install the required dependencies.

    ```r
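    # Install from CRAN ("hdf5r" is needed by Seurat's Read10X_h5 to read .h5 files)
    # Note: this installs the latest CRAN release of Seurat, which may be newer than v4.3.
    # To pin the version instead: remotes::install_version("Seurat", version = "4.3.0")
    install.packages(c("Seurat", "hdf5r", "patchwork", "ggplot2", "cowplot", "magrittr", "harmony", "Matrix", "dplyr", "tidyverse", "NMF", "ggalluvial", "remotes"))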

    # Install from Bioconductor
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install(c("MAST", "BiocFileCache", "Nebulosa", "ComplexHeatmap"))

    # Install from GitHub
    remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
    remotes::install_github('carmonalab/UCell')
    remotes::install_github("sqjin/CellChat")
    ```
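
After installation, it can help to confirm that every dependency loads before starting a long run. The following is a minimal sketch using only base R; the helper name `missing_packages` is hypothetical and not part of the repository.

```r
# Packages listed under "System Requirements & Dependencies"
required <- c(
  "Seurat", "patchwork", "ggplot2", "cowplot", "magrittr", "harmony",
  "Matrix", "dplyr", "tidyverse", "DoubletFinder", "Nebulosa",
  "BiocFileCache", "MAST", "UCell", "CellChat", "NMF", "ggalluvial",
  "ComplexHeatmap"
)

# Hypothetical helper: return the subset of `pkgs` whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```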

## Folder Structure

For the script to run correctly, please organize your project directory as follows. You will need to create the `data/` and `output/` directories.

```
project_folder/
│
├── analysis_script.R         # The main R script provided
│
├── data/
│   ├── CT_filtered_feature_bc_matrix.h5  # Input data for CTRL sample
│   ├── C3_filtered_feature_bc_matrix.h5  # Input data for C3 sample
│   ├── cycle.rda                         # R data file with cell cycle genes
│   └── annotation.txt                    # Gene annotations file
│
└── output/                     # All generated files will be saved here
    ├── Harmony_object_CTRLC3.RData
    ├── cellchat_triculture_CTRL.rds
    ├── cellchat_triculture_C3.rds
    ├── DEgenes_*.csv
    ├── Clusters_top100.csv
    └── n_cells.csv
```
**Note:** Download the required input data and place it in the `data/` folder, then update the file paths in the script to match this layout. For example, change `Read10X_h5("CT_filtered_feature_bc_matrix.h5")` to `Read10X_h5("data/CT_filtered_feature_bc_matrix.h5")`.
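
The layout above can be prepared and checked from R before running the analysis. This is a minimal base-R sketch that assumes the working directory is already `project_folder/`; the variable names are illustrative only.

```r
# Assumes the working directory is project_folder/
data_dir   <- "data"
output_dir <- "output"

# Create the directories if they do not exist yet
for (d in c(data_dir, output_dir)) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
}

# Input files named in the folder structure above
input_files <- file.path(data_dir, c(
  "CT_filtered_feature_bc_matrix.h5",
  "C3_filtered_feature_bc_matrix.h5",
  "cycle.rda",
  "annotation.txt"
))

# Report any inputs that still need to be downloaded
absent <- input_files[!file.exists(input_files)]
if (length(absent) > 0) {
  message("Missing input files:\n", paste(" -", absent, collapse = "\n"))
}
```

Building paths with `file.path()` rather than pasting strings keeps the script portable across operating systems.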

## Documentation and Working Example

To execute the analysis pipeline:

1.  **Set Working Directory:** Open `analysis_script.R` in RStudio or your preferred R environment. Change the first line of the script, `setwd("")`, so that it points to `project_folder`.
2.  **Verify Paths:** Ensure all file paths for both inputs (e.g., `.h5` files, `.rda`) and outputs (e.g., `.csv`, `.rds` files) are correct according to the proposed folder structure.
3.  **Run the Script:** Execute the entire script. The analysis is computationally intensive and may require significant time and memory to complete.

The script is divided into sections that correspond to the major steps outlined in the **Project Overview**. Each section can be run sequentially to reproduce the full analysis.

## Limitations

*   The quality control thresholds (e.g., `nCount_RNA >= 500`, `mitoRatio < 0.20`) are dataset-specific and may require adjustment for other datasets.
*   The number of PCA dimensions (`dims = 1:20` or `1:40`) used for downstream analysis was chosen for this specific dataset and should be re-evaluated for others, for example by inspecting an elbow plot (`ElbowPlot()` in Seurat).
*   Cell type annotations (`new.cluster.ids`) were assigned manually based on biological knowledge and marker genes, and may need to be adapted for other experiments.
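
To make the QC thresholds easier to adapt, the filtering logic can be tried out on a small table of per-cell metadata. The sketch below uses base R and toy values. Only `nCount_RNA >= 500` and `mitoRatio < 0.20` are quoted from this README; the gene-count and complexity cutoffs, and the definition of `log10GenesPerUMI` as `log10(nFeature_RNA) / log10(nCount_RNA)`, are assumptions that should be checked against the script.

```r
# Toy per-cell metadata; in Seurat these columns would live in obj@meta.data
meta <- data.frame(
  nCount_RNA   = c(12000, 450, 8000, 3000),
  nFeature_RNA = c(3500, 300, 2500, 400),
  mitoRatio    = c(0.05, 0.10, 0.35, 0.08),
  row.names    = c("cell1", "cell2", "cell3", "cell4")
)

# Assumed definition of gene complexity
meta$log10GenesPerUMI <- log10(meta$nFeature_RNA) / log10(meta$nCount_RNA)

# Thresholds: the first two are quoted in this README, the last two are hypothetical
min_umi        <- 500
max_mito       <- 0.20
min_genes      <- 250
min_complexity <- 0.80

keep <- meta$nCount_RNA >= min_umi &
  meta$mitoRatio < max_mito &
  meta$nFeature_RNA >= min_genes &
  meta$log10GenesPerUMI > min_complexity

kept_cells <- rownames(meta)[keep]
print(kept_cells)
```

With a Seurat object, the resulting barcodes could then be passed to `subset(obj, cells = kept_cells)`.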

## Citation

If you use this code in a publication, please cite this repository and include a link to the relevant publication.
