# Microglia Signature Enrichment Analysis

## Project Overview

This repository contains an R script for analyzing differentially expressed genes (DEGs) from a bulk RNA-seq experiment on microglia. The primary goal is to determine whether the up- and downregulated genes from the experiment are significantly enriched in publicly available microglia gene signatures.

The analysis pipeline performs the following key steps:
*   **Data Loading:** Loads the user's DEG table and a public microglia signature database (`HuMiCa`).
*   **Gene Set Preparation:** Filters for significantly up- and downregulated genes from the user's data and organizes the public signatures into a list of gene sets.
*   **Enrichment Analysis:**
    *   Performs a **Fisher's Exact Test** to calculate the statistical significance of the overlap between the user's DEGs and each public signature.
    *   (Optional) Performs **Gene Set Enrichment Analysis (GSEA)** as a rank-based enrichment test.
*   **Visualization:** Generates several plots to visualize the enrichment results, including:
    *   Lollipop plots showing significantly enriched signatures.
    *   A heatmap displaying the presence/absence of the most frequent overlapping genes across signatures.
    *   Dot plots for GSEA results.
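
As a rough illustration of the gene set preparation and Fisher's exact test steps, the sketch below uses only base R and toy data. The column names (`gene`, `log2FC`, `padj`, `signature`), the thresholds, the choice of background universe, and the helper `fisher_overlap()` are assumptions made for this example and may differ from what `signatures_analysis.Rmd` actually does.

```r
# Toy DEG table. Column names (gene, log2FC, padj) are assumptions.
deg <- data.frame(
  gene   = c("P2RY12", "TMEM119", "CX3CR1", "APOE", "SPP1", "LPL", "GAPDH", "ACTB"),
  log2FC = c(-2.1, -1.8, -1.2, 2.5, 3.0, 1.9, 0.1, -0.05),
  padj   = c(0.001, 0.004, 0.03, 0.0005, 0.0001, 0.02, 0.9, 0.8),
  stringsAsFactors = FALSE
)

# Toy signature table in long format (one row per signature-gene pair).
signatures <- data.frame(
  signature = c("DAM", "DAM", "DAM", "Homeostatic", "Homeostatic", "Homeostatic"),
  gene      = c("APOE", "SPP1", "LPL", "P2RY12", "TMEM119", "CX3CR1"),
  stringsAsFactors = FALSE
)

# Split DEGs by direction. The cutoffs here are illustrative only.
up_genes   <- deg$gene[deg$padj < 0.05 & deg$log2FC > 1]
down_genes <- deg$gene[deg$padj < 0.05 & deg$log2FC < -1]

# Organize the signatures into a named list of gene sets.
gene_sets <- split(signatures$gene, signatures$signature)

# Background universe: assumed here to be every gene in the DEG table.
universe <- deg$gene

# Hypothetical helper: one-sided Fisher's exact test for over-representation.
fisher_overlap <- function(query, gene_set, universe) {
  in_query <- universe %in% query
  in_set   <- universe %in% gene_set
  # 2x2 table with the (in query, in set) cell in the top-left corner.
  tab <- table(
    factor(in_query, levels = c(TRUE, FALSE)),
    factor(in_set,   levels = c(TRUE, FALSE))
  )
  test <- fisher.test(tab, alternative = "greater")
  data.frame(overlap = sum(in_query & in_set), p_value = test$p.value)
}

# Test the upregulated genes against every signature.
up_results <- do.call(
  rbind,
  lapply(gene_sets, fisher_overlap, query = up_genes, universe = universe)
)
up_results$signature <- rownames(up_results)
up_results$p_adj     <- p.adjust(up_results$p_value, method = "BH")
print(up_results)
```

The same call with `query = down_genes` covers the downregulated genes. The choice of universe strongly affects the p-values, so it should match the set of genes that could have been detected in the experiment.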

## System Requirements & Dependencies

*   **Operating System:** Tested on Windows; expected to work on macOS and Linux as well.
*   **R Version:** R version 4.0.0 or higher is recommended.
*   **R Packages:** The following packages are required:

    *   `tidyverse`
    *   `readxl`
    *   `VennDiagram`
    *   `data.table`
    *   `fgsea` (from Bioconductor, installed via `BiocManager`)
    *   `pheatmap`

## Installation

1.  **Clone the repository:**
    ```bash
    git clone [URL to your GitHub repository]
    cd [repository name]
    ```

2.  **Install R packages:** Open an R session and run the following commands to install the required dependencies.

    ```r
    # Install from CRAN
    install.packages(c("tidyverse", "readxl", "VennDiagram", "data.table", "pheatmap"))

    # Install fgsea from Bioconductor via BiocManager
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("fgsea")
    ```
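
    To confirm that the installation worked, something like the following check can be run afterwards. This is a minimal sketch using base R only; the variable names are arbitrary.

    ```r
    # Packages listed under "System Requirements & Dependencies".
    required <- c("tidyverse", "readxl", "VennDiagram", "data.table", "fgsea", "pheatmap")

    # requireNamespace() returns TRUE when a package can be loaded.
    installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

    missing_pkgs <- required[!installed]
    if (length(missing_pkgs) > 0) {
      message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
    } else {
      message("All required packages are installed.")
    }
    ```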

## Folder Structure

To ensure the script runs correctly, please organize your project directory as follows. You will need to create the `data/` and `output/` directories.

```
project_folder/
│
├── signatures_analysis.Rmd     # The main R Markdown script
│
├── data/
│   ├── Table 1_DEG Bulk RNAseq Microglia.csv  # Input DEG table
│   ├── Table Microglia signatures HuMiCa.csv    # Public signatures database
│   └── Table Microglia signatures HuMiCa_FC.csv # Public signatures with Fold Change (for GSEA)
│
└── output/                     # All generated plots will be saved here
    ├── fisher_signatures_plot.png
    ├── fisher_heatmap50_plot.png
    └── ... (other plots)
```
**Note:** If the script still contains absolute paths, update them to the relative paths shown above. For example, change `read_csv("D:/Single cell analysis/...")` to `read_csv("data/Table 1_DEG Bulk RNAseq Microglia.csv")`.
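
One possible way to set up these paths at the top of the script is sketched below. The variable names (`data_dir`, `output_dir`, `deg_path`) are assumptions for illustration, and the sketch uses base R so that it runs without extra packages.

```r
# Relative locations, resolved against the project folder (the working directory).
data_dir   <- "data"
output_dir <- "output"

# Create output/ if it does not exist yet; no warning if it already does.
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

# file.path() joins path components with "/" on every platform.
deg_path <- file.path(data_dir, "Table 1_DEG Bulk RNAseq Microglia.csv")

if (file.exists(deg_path)) {
  message("Found DEG table at: ", deg_path)
} else {
  message("DEG table not found; expected it at: ", deg_path)
}
```

The resulting `deg_path` can then be passed to `read_csv()` in place of a hard-coded absolute path.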

## Documentation and Working Example

To execute the analysis pipeline:

1.  **Organize Files:** Place your input `.csv` files into the `data/` directory.
2.  **Open the Script:** Open the `signatures_analysis.Rmd` file in RStudio.
3.  **Run the Code:** You can run each code chunk sequentially by clicking the "Run" button within each chunk or run the entire script at once. The script will:
    *   Load the required libraries and data.
    *   Perform Fisher's exact test and generate lollipop plots.
    *   Create and save a heatmap of overlapping genes.
    *   (Optional) Perform GSEA and visualize the results.
    *   Save the output plots to the `output/` folder.
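
For the heatmap step, the presence/absence matrix of the most frequent overlapping genes could be built roughly as follows. This is a base R sketch on toy data; the `overlaps` list, the signature names, and the `top_n` value are assumptions and do not come from the script itself.

```r
# Toy input: overlapping genes found for each signature (hypothetical values).
overlaps <- list(
  DAM = c("APOE", "SPP1", "LPL"),
  ARM = c("APOE", "SPP1"),
  IRM = c("APOE", "IFIT3")
)

# Count how many signatures each gene overlaps with, most frequent first.
gene_counts <- sort(table(unlist(overlaps)), decreasing = TRUE)

# Keep the most frequent genes; top_n is an illustrative value.
top_n     <- 2
top_genes <- names(gene_counts)[seq_len(min(top_n, length(gene_counts)))]

# Rows are genes, columns are signatures; 1 = present, 0 = absent.
presence <- vapply(
  overlaps,
  function(genes) as.integer(top_genes %in% genes),
  integer(length(top_genes))
)
presence <- matrix(
  presence,
  nrow = length(top_genes),
  dimnames = list(top_genes, names(overlaps))
)
print(presence)

# With pheatmap installed, the matrix could then be drawn, for example:
# pheatmap::pheatmap(presence, cluster_rows = FALSE, cluster_cols = FALSE)
```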

## Citation

If you use this code for a publication, please cite this repository.

*   [Link to your publication/preprint]

## License

This project is licensed under the [Choose a License, e.g., MIT License]. Please see the `LICENSE.txt` file for details.