Tools for analyzing RNA-seq data


I hope this is a good place to ask this kind of question. I need to do some data analysis on RNA-seq data obtained from human cells, and I am looking for tools that can help with that. In particular, I need tools to analyze gene expression from the data: something that helps plot the expression of selected genes in each fastq file and compare differences in expression, with the ability to export results or a command-line interface for scripting. Basically, I need something that takes a fastq file and perhaps a human genome annotation file as input and gives gene expression as output. I have looked at Bioconductor and its packages and at Wikipedia's list of RNA-Seq bioinformatics tools. I guess some of these tools should be able to do what I need, but I could not work out which one, or how to use it to achieve this. Can someone give me some advice?


You probably need a tool to "map" the reads onto a reference genome. You can find such a reference genome, along with annotations, here: ftp://ussd-ftp.illumina.com/.

Mapping tools such as bowtie2 or bwa take fastq files and a reference genome as input and output the mapping results in a format called SAM.

After that, you have many options for estimating gene expression.

  • You can write your own algorithm to parse the SAM format and estimate a normalized read count for each gene.

  • You can combine more or less low-level tools such as samtools, pysam, or htseq with some scripts to do this.

  • You can use tools that do the counting (for instance bedtools or htseq-count) and the differential expression analysis (for instance DESeq2).

In the last case, I advise starting from the documentation of the final tool, to find out which tools are needed to generate the output of the preceding step.

You will either use some R or Python, or use the Galaxy web platform for some of the steps.
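
As a rough illustration of the first option in the list above (writing your own parser), here is a minimal Python sketch. It assumes plain-text SAM lines and a hand-made gene table; the function names, the gene coordinates, and the rule "assign a read to a gene if its leftmost position falls inside the gene" are simplifications invented for this example, not how htseq-count or any other tool decides.

```python
from collections import Counter


def count_reads_per_gene(sam_lines, genes):
    """Count mapped reads per gene from SAM text lines.

    genes maps a gene name to (chromosome, start, end), 1-based inclusive.
    Simplifying assumption: a read belongs to a gene when its leftmost
    aligned position (SAM column 4) lies inside the gene interval.
    """
    counts = Counter({name: 0 for name in genes})
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:                    # FLAG bit 0x4: read is unmapped
            continue
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return dict(counts)


def counts_per_million(counts):
    """Scale raw counts to counts per million assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}
```

For real data, a dedicated counter that handles strandedness, multi-mapping reads, and exon structure is a safer choice; the sketch only shows the shape of the computation.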

Edits

As @scribaniwannabe points out in this answer, the paper about the Tuxedo suite of tools gives a good example of the steps for performing an RNA-seq analysis with recent tools (as of October 2016).

As @Student T mentions in this answer, RNA-seq data contains reads that may originate from exon-exon junctions, so the read mapper should be configured so that it does not give up on a read that fails to map contiguously over its whole length on the genome. To my knowledge, HISAT2 and CRAC do this by default. Bowtie2 needs special settings.
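
To make the junction issue concrete: in SAM output, a splice-aware aligner reports a read that spans an intron with an N ("skipped region") operation in the CIGAR string. The following is a small Python sketch, with function names made up for this example, that detects such reads and lists the reference blocks they cover.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_ELEMENT.findall(cigar))


def aligned_blocks(pos, cigar):
    """Split an alignment into reference blocks separated by N operations.

    pos is the 1-based leftmost position (SAM column 4). Returned intervals
    are 1-based and inclusive. Deletions (D) are kept inside a block, which
    is a simplification.
    """
    blocks = []
    start = cursor = pos
    for length, op in CIGAR_ELEMENT.findall(cigar):
        length = int(length)
        if op in "M=XD":          # operations that consume the reference
            cursor += length
        elif op == "N":           # intron: close the block, jump ahead
            if cursor > start:
                blocks.append((start, cursor - 1))
            cursor += length
            start = cursor
        # I, S, H and P do not consume reference positions
    if cursor > start:
        blocks.append((start, cursor - 1))
    return blocks
```

A read with CIGAR `30M500N45M` starting at position 100 covers two exon blocks separated by a 500-base gap, which is exactly the kind of alignment a mapper without splice awareness cannot produce.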


I agree with @bli that R and Python (especially Bioconductor) have plenty of packages for comparing gene expression. You should not align your reads with bwa or bowtie, because they do not account for introns. You should use TopHat or STAR.


The answer @bli gave is great. I thought I would point out that Johns Hopkins has also recently updated the Tuxedo suite. It looks promising and comes with great usage instructions.

In addition, I have grown very fond of the GeneTrail 2 tool for my RNA-Seq secondary analysis. It gives great results for enrichment analyses.

I hope this is helpful.


I think STAR is the preferred splice-aware aligner these days. STAR can output counts per gene or per transcript. If you have Illumina data, you could also try the tools in Illumina's BaseSpace. RNA-Seq may be one of the things you can do there for free.


I think HTSeq pretty much does this. Given a fastq sample and an annotation file, it outputs a matrix of read counts per gene.
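
One practical note: htseq-count works on aligned reads (SAM/BAM) plus an annotation file, and it writes one two-column, tab-separated table per sample, ending with a few summary rows whose names start with `__` (for example `__no_feature`). A hedged Python sketch for turning several such tables into a single gene-by-sample matrix could look like this; the function names and the in-memory data layout are assumptions made for the example.

```python
def parse_htseq_counts(text):
    """Parse the text of one htseq-count output into {gene_id: count}.

    Summary rows starting with "__" are skipped.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        gene_id, value = line.split("\t")
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def build_count_matrix(per_sample):
    """Combine per-sample count dicts into a matrix.

    per_sample maps a sample name to the dict from parse_htseq_counts.
    Returns (genes, samples, rows), where rows[i][j] is the count of
    genes[i] in samples[j]; genes missing from a sample get 0.
    """
    samples = sorted(per_sample)
    genes = sorted({g for counts in per_sample.values() for g in counts})
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows
```

The resulting raw-count matrix is the usual input for differential expression packages such as DESeq2 or edgeR.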


PCAGO: an interactive tool to analyze RNA-Seq data with principal component analysis

The initial characterization and clustering of biological samples is a crucial step in the analysis of any transcriptomics study. In many studies, principal component analysis (PCA) is the clustering algorithm of choice to predict the relationship of samples or cells based solely on differential gene expression. Beyond pure quality evaluation of the data, a PCA can provide initial insights into the biological background of an experiment and help researchers interpret the data and design the subsequent computational steps accordingly. However, to avoid misleading clusterings and interpretations, an appropriate selection of the underlying gene sets used to build the PCA and the choice of the most fitting principal components for visualization are crucial. Here, we present PCAGO, an easy-to-use and interactive tool to analyze gene quantification data derived from RNA sequencing experiments with PCA. The tool includes features such as read-count normalization, filtering of read counts by gene annotation, and various visualization options. In addition, PCAGO helps to select appropriate parameters, such as the number of genes and principal components, to create meaningful visualizations.

Availability and implementation: PCAGO is implemented in R and freely available at github.com/hoelzer-lab/pcago. The tool can be run as a web service or locally using a Docker image.

Contact: martin.hoelzer@uni-jena.de


Introduction

The development of high-throughput next-generation sequencing (NGS) technologies has revolutionized the field of transcriptomics, paving the way for large-scale RNA sequencing (RNA-Seq) [1]. RNA-Seq can be used not only to study genome-wide transcription; it also offers the ability to discover novel genes and transcripts [2] or to identify additional elements such as novel non-coding RNAs, small interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), and microRNAs (miRNAs). Recently, a new class of RNAs called circRNAs [3] has been described, characterized by their ability to form circular RNA through covalent bonding of the ends of a single RNA molecule. These circRNAs appear to be involved in regulating gene expression, acting as regulators of miRNAs by binding specifically to them. The emergence of these new regulatory molecules has also led to the development of new tools to identify circRNAs from RNA-Seq experiments [4].

There are two critical aspects of RNA-Seq experiments: the large amount of data generated in this kind of study, and the ability to extract and interpret biologically relevant information. These issues are particularly pressing because transcriptomic data analysis can easily become a significant experimental bottleneck, especially given the additional constraints imposed by RNA-Seq and miRNA-Seq analyses. Indeed, the combination of diverse statistical and bioinformatics tools, each with many customizable parameters, often makes such an analysis difficult for inexperienced researchers. Moreover, using different tools can involve time-consuming installations and usually requires human intervention to move on to the next step. To overcome this problem, several tools have been created for gene expression analysis, such as ExpressionPlot [5], GENE-counter [6], RobiNA [7], TCW [8], Grape RNA-Seq [9], or MAP-RSeq [10]. Another set of tools focuses on the analysis of miRNA expression profiles, such as DSAP [11], miRanalyzer [12], miRExpress [13], miRNAkey [14], iMir [15], CAP-miRSeq [16], mirTools 2.0 [17], or sRNAtool. In addition, a few tools have been implemented to perform both RNA-Seq and miRNA-Seq analyses, such as wapRNA [19], eRNA [20], BioVLAB-MMIA-NGS [21], or Omics Pipe [22]. Other available approaches integrate several pieces of software providing different types of NGS analyses, such as GALAXY (https://galaxyproject.org/), QuasR [23], RAP [24], and Subread/edgeR [25], while others provide a collection of modules for file processing, such as the ViennaNGS suite [26].

Although very valuable, the main drawback of these tools is that, with some exceptions, they often rely on manual installation procedures and on subsequent human input, steps that are hard to automate. Other issues also hinder their widespread adoption: i) some tools are designed to run on web platforms, which results in data upload limits or a restricted choice of parameters (e.g. Galaxy, RAP [24], BioVLAB-MMIA-NGS [21], or DSAP [11]); ii) the implemented analysis pipelines have rigid workflows, so users cannot start an analysis at different stages of the pipeline (e.g. RAP [24], BioVLAB-MMIA-NGS [21]); iii) some of these tools have a long list of prerequisites for local installation, which makes them hard to use for less experienced researchers (e.g. CAP-miRSeq [16], Omics Pipe [22], iMir [15], Galaxy, ExpressionPlot [5]); iv) the analysis is usually restricted to a few selected model organisms (i.e. QuasR [23], ExpressionPlot [5], BioVLAB-MMIA-NGS [21]); and v) some tools use in-house code that has not been widely tested by the NGS community (i.e. Grape RNA-Seq [9] or ExpressionPlot [5]). Furthermore, to our knowledge, none of these tools implements a pipeline for analyzing circRNAs.

Given these limitations, we developed a comprehensive pipeline analysis suite called "miARma-Seq". It is designed for miRNA-Seq and RNA-Seq multiprocess analysis, that is, for identifying mRNAs, miRNAs, and circRNAs, as well as for differential expression, target prediction, and functional analysis. Most importantly, it can be applied to any sequenced organism, and it can be started at any step of the workflow.


How to analyze gene expression with ROSALIND

WHY STUDY GENE EXPRESSION OR RNA-SEQ?

Studying gene expression provides valuable insight into the nature of diseases and the effects of treatment by identifying RNA activity in a biological sample. RNA-seq is a rapidly growing Next Generation Sequencing (NGS) assay for evaluating gene expression, alternative splicing transcripts, and fusions.

Scientists working in oncology, immunology, regenerative medicine, drug discovery, and other research areas often run experiments to identify differentially expressed genes and biological pathways between healthy and disease states in order to find therapeutic targets. Comparisons between these differential patterns reveal unique gene signatures that are valuable for drug and diagnostic development.

OVERVIEW

ROSALIND is a cloud platform that connects researchers from experiment design to quality control, differential expression, and pathway exploration in a real-time collaborative environment.

Scientists of every skill level use ROSALIND because it requires no programming or bioinformatics expertise. ROSALIND accepts raw FASTQ sequencing data as well as processed count data, and provides downstream analysis and insightful visualizations of gene expression datasets. Results for each experiment are available within a day, in an interactive experience built for ease of use and to save valuable time.

HOW TO ANALYZE DIFFERENTIAL GENE EXPRESSION

ROSALIND enables scientists and researchers to analyze and interpret differential gene expression without requiring bioinformatics or programming skills. It requires a basic background in biology and a current subscription or an active trial.

Biological questions can be explored independently or together with uploaded experiment data, because ROSALIND automates the import of public data from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO).

DISCOVERIES

"I can now design and access a sequencing analysis within hours, and I have more confidence in my results."

FIVE STEPS TO SUCCESS WITH RNA-SEQ

ROSALIND simplifies data analysis and acts as a data hub that interconnects every stage of data interpretation. The ROSALIND Gene Expression discovery experience lets researchers visually inspect and explore experiment results on their own, giving them the freedom to adjust cutoffs, add comparisons, apply covariate corrections, and even find patterns across multiple datasets without requiring bioinformatics expertise. There are five easy steps to analyze RNA-seq data in ROSALIND.

1. EXPERIMENT DESIGN

RNA-seq data analysis begins with creating a new experiment and capturing the experiment design. ROSALIND walks through the key aspects of the experiment in a guided experience to record the biological objectives, sample attributes, and analysis parameters. These details become the foundation of the experiment discovery dashboard. Researchers who publish papers and work with NCBI public data know the importance of native support for NCBI data models. ROSALIND fully supports the NCBI BioProject and BioSample models for metadata assignment and sample attribute descriptions. ROSALIND also lets scientists create custom attributes to describe biological behavior in terms relevant to the experiment. Setting up comparisons is simplified by describing and annotating samples with these familiar terms. This methodology reduces the risk of differential expression errors when selecting samples for comparison.

For RNA-seq data analysis, ROSALIND gives scientists a choice: a) start from raw FASTQ files produced by high-throughput sequencing, or b) use processed data files produced by another analysis pipeline. Processed data is imported as normalized or raw counts. This gives scientists the flexibility to use the ROSALIND discovery experience for data visualization and interpretation regardless of the data source. When ROSALIND analyzes raw FASTQ files, it simplifies data analysis with an advanced pipeline that includes intelligent quality control with automatic contamination detection, identification of differentially expressed genes, and deep pathway interpretation. To learn more about the ROSALIND RNA-seq data analysis pipeline and the available reference materials, visit the technical specifications section.

For accurate RNA-seq results, the analysis pipeline must adapt to proprietary differences in the sample preparation and library preparation kits used in the experiment. Kit selection matters not only for targeting and capturing the desired transcriptomic elements; the analysis pipeline must also adapt to and optimize for kit-specific properties such as strandedness, strand orientation, any unique molecular identifiers (UMIs), and the adapters used. ROSALIND integrates and supports an extensive library of sample and library preparation kits, automatically configuring each analysis with the appropriate details. To learn more about supported kits, visit the technical specifications section. Selected kit and instrument partners are also listed below.

2. RNA-SEQ QUALITY CONTROL

Researchers must gain confidence at the quality control stage before drawing insights from an RNA-seq experiment; otherwise the analysis results should not be trusted. The mysteries of biology are elusive and complex. No time should be lost to outliers, contamination, swapped samples, and the many other errors that can occur in the course of even a well-designed experiment.

Some of the most important quality control metrics to check are Q30 scores, alignment rates, ribosomal content, duplication rates, sample correlation, gene body coverage, genomic regions, and multidimensional scaling (MDS) or principal component analysis (PCA) across all samples. When ROSALIND detects low alignment, the unaligned reads are evaluated for possible contamination. If ribosomal content is higher than expected, ROSALIND generates warnings. With Illumina sequencers, results are typically good when Q30 values are above 85% and alignment rates to the target species are above 80%. In addition, duplication rates below 25% with fewer than 10% of reads trimmed are preferred. By marking a sample as an outlier, researchers can exclude bad samples and their harmful effects on the results, and move on with confidence to the discovery and exploration stage of interpreting results.
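
The guideline values above are easy to encode as an automated check. The following Python sketch is only an illustration of that idea and is not ROSALIND's implementation; the metric key names and the choice of strict comparisons are assumptions made for the example.

```python
def qc_warnings(metrics):
    """Return warnings for metrics outside the guideline ranges.

    metrics is a dict of percentages using the hypothetical keys
    "q30_pct", "alignment_pct", "duplication_pct" and "trimmed_pct".
    Keys that are absent are simply not checked.
    """
    checks = [
        ("q30_pct", lambda v: v > 85.0, "Q30 at or below 85%"),
        ("alignment_pct", lambda v: v > 80.0, "alignment rate at or below 80%"),
        ("duplication_pct", lambda v: v < 25.0, "duplication rate at or above 25%"),
        ("trimmed_pct", lambda v: v < 10.0, "10% or more of reads trimmed"),
    ]
    return [message for key, passes, message in checks
            if key in metrics and not passes(metrics[key])]
```

A sample that returns an empty list passes all four guideline checks; any returned message is a prompt to look at that sample more closely, not an automatic verdict.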

ROSALIND Quality Control Intelligence identifies potential data quality problems and analyzes the data before presenting results. This removes the need for researchers to be experts in quality control issues. Learn how researchers gain confidence in their results through Quality Control Intelligence.

3. UNLOCK RESULTS

Once the researcher has reviewed the quality control stage, the interactive presentation of results is ready to launch. The next step is to unlock the experiment. ROSALIND calculates the number of Analysis Units ("AU") required to unlock the results. This is typically 1 AU per sample FASTQ file for RNA-seq experiments, but it can vary with the number of files or other experiment parameters. Account balances and quick links to obtain more AU are accessible directly from the unlock screen. To learn more about Analysis Units, see the FAQ in the section below or visit the ROSALIND store.

4. ANALYSIS AND DISCOVERY

A typical RNA-seq analysis delivers a list of differentially expressed genes, usually as a large and sprawling CSV file. Unfortunately, this often leaves scientists with more questions than answers. Several applications may also be needed to produce this CSV file. Such applications are often highly complex, with non-standard input/output formats, and most of them are command-line tools that require advanced programming knowledge, an exercise well beyond the level of most biologists.

ROSALIND goes beyond the CSV file by providing a comprehensive dashboard for differential expression analysis and interpretation of RNA-seq data. Researchers begin with a list of significant differentially expressed genes identified by a calculated cutoff filter. The filter's default settings start at a fold change of 1.5 up and 1.5 down with an adjusted p-value of 0.05. If needed, ROSALIND makes additional adjustments to arrive at a meaningful set of genes. Researchers can also create an unlimited set of their own custom filters using fold change and p-value parameters. Convenient on-screen controls give easy access to modifying filters, adding covariate corrections, applying gene lists and signatures, and adjusting plot color palettes. The ROSALIND gene expression discovery experience includes deep interpretation of top pathways, gene ontology, diseases, and drug interactions, presented as rich interactive plots that fill the screen and respond to the scientist's interaction, including customizable heatmaps, volcano and MA plots, and box and bar plots.
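
A cutoff filter of this kind can be sketched in a few lines of Python. This is an illustration of the general idea, not ROSALIND's code: it assumes results are given as log2 fold changes with adjusted p-values, and that the cutoffs are inclusive for fold change and strict for the adjusted p-value, none of which the text above specifies.

```python
import math


def filter_degs(results, fold_change=1.5, padj_cutoff=0.05):
    """Split significant genes into up- and down-regulated lists.

    results maps a gene name to (log2_fold_change, adjusted_p_value).
    A fold change of 1.5 up or down corresponds to |log2FC| >= log2(1.5).
    Genes with a missing adjusted p-value (None) are dropped.
    """
    threshold = math.log2(fold_change)
    up, down = [], []
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue
        if log2fc >= threshold:
            up.append(gene)
        elif log2fc <= -threshold:
            down.append(gene)
    return sorted(up), sorted(down)
```

Working on the log2 scale keeps the up and down cutoffs symmetric: log2(1.5) is about 0.585, so a gene at log2FC = -0.7 passes while one at -0.5 does not.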

New comparisons and meta-analyses can be added at any time. Comparisons are created using BioProject attributes. The meta-analyses created can be cross-experiment and multi-omic. Each of these perspectives is available within minutes of being set up, reducing the in-house bioinformatics workload and letting scientists react quickly by focusing directly on the science of the experiment.

5. COLLABORATION AND SHARING RESULTS

The discovery process rarely ends with a single researcher's point of view. ROSALIND enables true scientist-to-scientist collaboration through ROSALIND Spaces, virtual data rooms where scientists and collaborators anywhere in the world can come together on a relevant dataset to explore shared experiments interactively, much like working in Google Docs. Researchers access a consistent version of the data without awkward file transfers or the need to reinterpret source files. All changes are interactive, available instantly, and visible anywhere in the world (as permitted by the organization), with real-time activity feeds and historical reports. Spaces participants can add experiments, explore pathways, change cutoffs, add meta-analyses, and add new comparisons in a shared collaborative environment.

Spaces are virtual meeting rooms where scientists meet with experts, customers, and support teams to increase the discovery value of each experiment and prepare for the next.


A Beginner's Guide to Analysis of RNA Sequencing Data

Since the first publications coining the term RNA-seq (RNA sequencing) appeared in 2008, the number of publications containing RNA-seq data has grown exponentially, reaching an all-time high of 2,808 publications in 2016 (PubMed). With this wealth of RNA-seq data being generated, it is a challenge to extract maximal meaning from these datasets, and without the appropriate skills and background there is a risk of misinterpreting the data. However, a general understanding of the principles underlying each step of RNA-seq data analysis allows investigators without a background in programming and bioinformatics to critically analyze their own datasets as well as published data. Our goals in this review are to break down the steps of a typical RNA-seq analysis and to highlight the pitfalls and checkpoints along the way that are vital for bench scientists and biomedical researchers performing experiments that use RNA-seq.

RNA sequencing (RNA-seq) was first introduced in 2008 (1-4), and over the past decade it has become more widely used owing to decreasing costs and the popularization of shared-resource sequencing cores at many research institutions. The increased popularity of RNA-seq has led to a fast-growing need for bioinformatics expertise and computational resources. For bench scientists to correctly analyze and process large datasets, they need to understand the bioinformatics principles and limitations that come with the complex process of RNA-seq analysis. Although RNA-seq analysis can be incredibly powerful and can uncover many exciting new findings, it differs from the analyses bench scientists are used to in that it comes as a very large dataset that cannot be interpreted without extensive analysis.

The RNA-seq protocol begins with the conversion of RNA, either total, enriched for mRNA, or depleted of rRNA, into cDNA. After fragmentation, adapter ligation, and index ligation, each cDNA fragment is sequenced or "read" using a high-throughput platform. Raw read data are then demultiplexed, aligned, and mapped to genes to generate a raw counts table, at which point the data are often handed over to the bench researcher to start their own analysis. There is still no real consensus on the most appropriate pipeline for processing RNA-seq data, but many online semiautomated tools are available, such as BaseSpace (Illumina), MetaCore (Thomson Reuters), or Bluebee (Lexogen). Although these tools can generate principal component analysis (PCA) plots, display heat maps, and perform differential gene expression analysis without the help of a bioinformatician, they do not allow users to fully assess the quality of their data, determine the accuracy of their analysis, or tailor the analysis to their biological question, which can lead to misinterpretation of the dataset. It is vital that investigators understand how to approach their dataset, appreciate its characteristics, and watch for weaknesses that may limit the conclusions the data can support. Furthermore, it is imperative that each dataset be analyzed de novo, meaning that thresholds and methods must be tailored anew, which cannot be achieved with generic online applications or tools.

For the purposes of this article, we used a sample dataset from an experiment in our research group in which naive murine alveolar macrophages were compared with those isolated from transplanted lungs 2 and 24 hours after reperfusion. We present our analysis of this dataset to outline a user-friendly approach to RNA-seq analysis for the bench scientist.

Male Cx3cr1 gfp/+ mice on a C57BL/6 background and wild-type BALB/c mice, aged 12-14 weeks, were used. All mice were housed in a specific pathogen-free facility. All reagents were certified endotoxin-free by the manufacturer. All studies were conducted in accordance with the guidelines of the Northwestern University Animal Care and Use Committee.

Transplants were performed between allogeneic mismatched donor-recipient pairs as previously described (5). Specifically, donor lungs from Cx3cr1 gfp/+ mice were used as allografts and implanted into wild-type BALB/c recipients. Briefly, donor mice were heparinized and flushed antegrade through the pulmonary artery; after recruitment of the lungs the trachea was ligated, and the heart-lung block was then harvested and stored at 4°C for 2 hours of cold ischemia. For single left lung transplantation, the anastomoses were completed using a cuffed technique through a left thoracotomy; the lung was reperfused and reventilated, and the thoracotomy was then closed in layers. Mice were separated from the ventilator and extubated during recovery once they were ambulatory. At the designated time points after reperfusion, recipient mice were killed and the lung allograft was harvested.

Lungs were processed into single-cell suspensions as previously described (5). Briefly, the right ventricle was flushed with 10 ml of ice-cold Hanks' balanced salt solution, and the lungs were then infiltrated with a tissue digestion mixture containing collagenase D (Roche) and DNase I (Roche). A combination of mechanical dissociation using a GentleMACS (Miltenyi Biotec) and enzymatic digestion for 30 minutes at 37°C was performed. Samples were then enriched using CD45 microbeads (Miltenyi Biotec) and an AutoMACS system (Miltenyi Biotec) before antibody staining.

See Table E1 in the data supplement for the antibodies and dilutions used to stain the single-cell suspension, and Figure E1 for the gating strategy used to sort alveolar macrophages. Cells were sorted at 4°C into magnetic-activated cell sorting buffer using a BD FACSAria II SORP four-laser flow cytometer (BD Biosciences).

Freshly sorted cells were immediately pelleted, resuspended in 100 µl of PicoPure Extraction Buffer (Thermo Fisher Scientific), and then stored at -80°C. RNA isolation was performed using the PicoPure RNA isolation kit (Thermo Fisher Scientific), and samples with high-quality RNA (RNA integrity number, >7.0), as measured using a 4200 TapeStation (Agilent Technologies), were used for library preparation. mRNA was obtained from total RNA using NEBNext Poly(A) mRNA magnetic isolation kits (New England BioLabs), and cDNA libraries were subsequently prepared using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England BioLabs). Libraries were sequenced on a NextSeq 500 platform using a 75-cycle single-end high-output sequencing kit (Illumina). Sequencing yielded libraries with an average of 8 million reads after alignment. RNA-seq analysis was based on uniquely aligned reads.

Reads were demultiplexed (bcl2fastq), and fastq files were aligned to the mm10 mouse genome (TopHat2 [6]) and mapped to genes (HTSeq [7]) using the Ensembl gene annotation. Pairwise comparisons between conditions were performed using a negative binomial generalized log-linear model through the glmLRT fit function in edgeR (8, 9).

The RNA-seq data presented in this article have been deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE116583.

The main goal of RNA-seq analysis is to identify differentially expressed and coregulated genes and to infer biological meaning for further studies. The source material can be cells cultured in vitro, whole-tissue homogenates, or sorted cells. The ability to interpret findings depends on appropriate experimental design, implementation of controls, and correct analysis. Every effort should be made to minimize batch effect, because small and uncontrolled changes in the environment can result in the identification of differentially expressed genes (DEGs) unrelated to the designed experiment. Sources of batch effect can occur during the experiment, during RNA library preparation, or during the sequencing run, and include, but are not limited to, those listed in Table 1. Once a well-designed and controlled experiment has been performed, a systematic approach allows quality control of the dataset followed by unbiased analysis of the data. In this analysis, we use an approach that includes setting a low-count filter, establishing a noise threshold, checking for potential outliers, running appropriate statistical tests to identify DEGs, clustering genes by expression pattern, and testing for gene ontology (GO) enrichment. For each of these analysis components, we aim to highlight important checkpoints and quality controls that streamline and strengthen data analysis, avoid bias, and allow investigators to make the most of their datasets.
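
As an illustration of the first step named above, a low-count filter, here is a minimal Python sketch. It keeps a gene only if its counts per million (CPM) reach a threshold in a minimum number of samples. The function name and the default values (`min_cpm=1.0`, `min_samples=2`) are placeholders chosen for the example, not the thresholds used in this article.

```python
def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Drop genes with low expression across the dataset.

    counts maps a gene name to a list of raw read counts, one per sample,
    with samples in the same order for every gene. A gene is kept when its
    CPM is at least min_cpm in at least min_samples samples.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample: total reads assigned to any gene.
    lib_sizes = [sum(row[i] for row in counts.values())
                 for i in range(n_samples)]
    kept = {}
    for gene, row in counts.items():
        cpm = [c * 1e6 / lib if lib else 0.0
               for c, lib in zip(row, lib_sizes)]
        if sum(value >= min_cpm for value in cpm) >= min_samples:
            kept[gene] = row
    return kept
```

Setting `min_samples` to the size of the smallest experimental group is a common rule of thumb, since a gene expressed in only one condition should still survive the filter.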

Table 1. Sources of batch effect and suggested strategies to mitigate them

For this tutorial, we use a dataset comprising three groups of alveolar macrophages studied in a mouse model of lung transplantation during the first 24 hours of reperfusion. This approach (we make no claim to its originality and refer the reader to the excellent review by Conesa and colleagues, which outlines the key steps of RNA-seq data analysis) allows the investigator to examine the data in an unbiased manner, identify transcriptional signatures, and proceed to downstream analyses.

When assessing variability within a dataset, it is preferable that intergroup variability, which represents differences between experimental conditions compared with control conditions, be greater than intragroup variability, which represents technical or biological variability. A global view of the data makes it possible to characterize the variation between replicates and to see whether the experimental groups defined by the investigator reflect real differences between groups (a group being a set of replicates of the same condition or the same cell type). One way to visualize variation within a dataset is PCA (11). PCA takes a large dataset as input and reduces the number of gene "dimensions" to a minimal set of linearly transformed dimensions that reflect the total variation of the dataset. Results are commonly presented as a two-dimensional plot in which the data are displayed along axes that describe the variation within the dataset, known as principal components (PCs). PC1 describes the most variation within the data, PC2 the second most, and so forth. The variation represented by each PC can be calculated as a percentage of the total variance and displayed in a scree plot. If the first two PCs do not capture most of the variance, it may be useful to generate additional two-dimensional PCA plots showing other PCs. Thus, a PCA plot helps to visualize the grouping among replicates and to identify technical or biological outliers.
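The PCA step described above can be sketched as follows, assuming NumPy is available and that the input is a samples-by-genes matrix of normalized expression values. The function name `pca` and the toy matrix are illustrative only and do not come from the original analysis.

```python
import numpy as np


def pca(samples_by_genes):
    """Return (scores, explained_fraction) for a samples x genes matrix."""
    x = np.asarray(samples_by_genes, dtype=float)
    centered = x - x.mean(axis=0)          # center each gene across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                         # sample coordinates on PC1, PC2, ...
    explained = s ** 2 / np.sum(s ** 2)    # share of total variance per PC
    return scores, explained


# Hypothetical toy data: 4 samples x 3 genes of normalized expression values.
toy_expression = np.array([[10.0, 200.0, 5.0],
                           [12.0, 210.0, 4.0],
                           [50.0, 90.0, 30.0],
                           [55.0, 80.0, 33.0]])
scores, explained = pca(np.log2(toy_expression + 1))
print(scores[:, :2])   # PC1/PC2 coordinates for a two-dimensional plot
print(explained)       # values that a scree plot would display
```

The `explained` vector is what a scree plot shows; if its first two entries are small, plotting further columns of `scores` against each other is the natural next step.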

Another approach to assessing inter- and intragroup variability is to calculate the distance between samples as expressed by their correlation. Two commonly used measures of correlation are the Pearson’s coefficient and the Spearman’s rank correlation coefficient (12–14), which describe the directionality and strength of the relationship between two variables. The Pearson’s correlation reflects the linear relationship between two variables accounting for differences in their mean and SD, whereas the Spearman’s rank correlation is a nonparametric measure using the rank values of the two variables. The more similar the expression profiles for all transcripts are between two samples, the higher the correlation coefficient will be. These correlation coefficients are calculated between all samples and can be visualized as either a table or a heat map, allowing the investigator to assess whether replicates (technical or biological) group together. In addition to allowing an assessment of variability, both PCA and sample correlation analysis can help to identify outliers that were not excluded during upstream steps such as alignment. For instance, a sample that aligned well and demonstrated good read depth might make it to this step of the pipeline; however, a PCA or correlation analysis may identify this library as a mislabeled or contaminated sample, clustering the outlier within another group. It is also possible that a correctly labeled sample will fall out as a biological outlier, for example if it was collected from an animal that was believed to have received a challenge but did not show symptoms. In summary, these analyses provide a global overview of all samples, allow for determination of outliers, and present data in an easy-to-digest format to the investigator and reader.
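To make the two correlation measures concrete, here is a minimal pure-Python sketch. Spearman's coefficient is computed as Pearson's coefficient on average ranks, which handles ties. The function names and toy profiles are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(samples, method=pearson):
    """All-against-all correlations for a list of expression profiles."""
    return [[method(a, b) for b in samples] for a in samples]


# Hypothetical expression profiles of three samples over four genes.
profiles = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [4.0, 1.0, 3.0, 2.0]]
for row in correlation_matrix(profiles):
    print([round(r, 3) for r in row])
```

The resulting matrix is the kind of table that is usually drawn as a heat map.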

Using our alveolar macrophage dataset, we show a PCA plot and a heat map of Pearson’s correlation across alveolar macrophage samples: naive, transplant 2 hours postreperfusion, and transplant 24 hours postreperfusion sample groups (Figure 1A). Both the PCA plot and the Pearson’s correlation heat map were generated using normalized reads per kilobase of transcript per million mapped reads (RPKM) counts (see Normalized Counts box). The PCA demonstrated the expected grouping among replicates within samples, with sample groups spread across the two PCs. PC1 accounts for 68.1% of the variance, and PC2 accounts for an additional 20.3%. The scree plot (Figure E2) confirmed that the majority of the variance within the dataset was described by the first two PCs. Although the PCA plot emphasizes intergroup variability, the Pearson’s correlation analysis (Figure 1B) provides an overview of all the variation between samples, showing a correlation value of r > 0.9 (Table 2), consistent with each group belonging to the same cell type.

Figure 1. Assessing inter- and intragroup variability. (A) Principal component (PC) analysis plot displaying all 12 samples along PC1 and PC2, which describe 68.1% and 20.3% of the variability, respectively, within the expression data set. PC analysis was applied to normalized (reads per kilobase of transcript per million mapped reads) and log-transformed count data. (B) Pearson’s correlation plot visualizing the correlation (r) values between samples. Scale bar represents the range of the correlation coefficients (r) displayed.
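For readers unfamiliar with RPKM, the normalization named in the figure legend can be written in a few lines. This is a sketch of the standard definition; the pseudocount of 1 added before the log transform is an assumption, since the source does not state which one was used.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log_rpkm(read_count, gene_length_bp, total_mapped_reads, pseudocount=1.0):
    """log2-transformed RPKM; the pseudocount avoids log2(0) for silent genes."""
    value = rpkm(read_count, gene_length_bp, total_mapped_reads)
    return math.log2(value + pseudocount)


# Hypothetical gene: 100 reads, 2 kb long, in a library of 1 million mapped reads.
print(rpkm(100, 2000, 1_000_000))
print(log_rpkm(100, 2000, 1_000_000))
```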


Integrate input files into AskOmics

In AskOmics, converting data into RDF is called integration.

On the Files page (link at the top of the page), you will see the files you uploaded from Galaxy. We will now integrate all these files.

Hands-on: Integrate data

  1. Go to the Files page
  2. Select all the input files
  3. Click on the Integrate button

You will land on the Integrate page, which shows a preview of the data present in each selected file, depending on its data type.


Target Audience

Graduates, postgraduates, and PIs working or about to embark on an analysis of RNA-seq data. Attendees may be familiar with some aspect of RNA-seq analysis (e.g. gene expression analysis) or have no direct experience.

Prerequisites:

Basic familiarity with Linux environment and S, R, or Matlab.

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, recent versions of Windows, Mac OS X or Linux (Most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact [email protected] for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.


Tools to analyze RNA-seq data - Biology

A database of software tools for the analysis of single-cell RNA-seq data. To make it into the database, software must be available for download and public use somewhere (CRAN, Bioconductor, PyPI, Conda, GitHub, Bitbucket, a private website, etc.). To view the database, head to https://www.scRNA-tools.org.

This database is designed to be an overview of the currently available scRNA-seq analysis software. It is unlikely to be 100% complete or accurate, but it will be updated as new software becomes available.

We welcome contributions from the scRNA-seq community! If you would like to contribute, please have a look at the wiki or fill in the submission form on our website (https://www.scrna-tools.org/submit). Please be aware that by contributing you are agreeing to abide by the code of conduct.

If you are interested in joining the scRNA-tools team please contact us.

If you find the scRNA-tools database useful for your work please cite our publication:


Reference-based RNA-seq data analysis (Galaxy)

Galaxy is an open source, web-based platform for data-intensive biomedical research. This tutorial is modified from the Reference-based RNA-seq data analysis tutorial on GitHub. In this tutorial, we will use Galaxy to analyze RNA sequencing data using a reference genome and to identify exons that are regulated by a Drosophila melanogaster gene. To achieve these objectives, we will go through the following steps:

Pretreatments
The original data we use are available at the NCBI Gene Expression Omnibus (GEO) under accession number GSE18508.
To conduct a differential expression analysis, we will look at the first 7 samples:

  • 3 treated samples with Drosophila melanogaster gene depletion: GSM461179, GSM461180, GSM461181
  • 4 untreated samples: GSM461176, GSM461177, GSM461178, GSM461182

Each sample constitutes a separate biological replicate of the corresponding condition (treated or untreated). Moreover, two of the treated and two of the untreated samples are from a paired-end sequencing assay, while the remaining samples are from a single-end sequencing experiment.

We have extracted sequences from the Sequence Read Archive (SRA) files to build FASTQ files. All files are available on Zenodo. First we need to create a new history for this RNA-seq exercise. Detailed instructions are shown below:

  1. Click the “History Option” icon at the top of the History section.
  2. Hit “create new”. A new history will be created. You may rename it by editing the name directly.

Then we need to import a FASTQ pair (e.g. GSM461177_untreat_paired_chr4_R1.fastq and GSM461177_untreat_paired_chr4_R2.fastq) from Zenodo and convert the file format to fastqsanger. Detailed instructions are shown below:

  1. Copy the link location
  2. Open the Galaxy Upload Manager
  3. Select “Paste/Fetch Data”
  4. Paste the link into the text field
  5. Press Start (note that Galaxy takes the link as the dataset name; by default, it also does not link the dataset to a database or a reference genome)
  6. Click on the pencil button displayed in your dataset in the history
  7. Rename the datasets according to the samples
  8. Press Save
  9. Choose Datatype on the top
  10. Select fastqsanger
  11. Press Save


Both files contain the reads that belong to chromosome 4 of a paired-end sample. The sequences are raw sequences from the sequencing machine, without any pretreatment. Their quality needs to be checked.

For quality control, we use FastQC and Trim Galore. We first run FastQC on both FASTQ files to check the quality of the reads.

  1. Select the paired-end dataset (e.g. GSM461177_untreat_paired_chr4_R1.fastq and GSM461177_untreat_paired_chr4_R2.fastq)
  2. Find and open FastQC from Tools bar
  3. Press Execute
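To give a feel for what a quality report is built from, the sketch below parses FASTQ records and decodes their quality strings. It assumes four-line records and the Phred+33 encoding used by the fastqsanger datatype. The function names are illustrative; this is not how FastQC itself is implemented.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()                  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Sanger offset 33)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality):
    """Average Phred score of one read."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores)


# Hypothetical two-read FASTQ snippet used instead of a real file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nAC\n+\n!5\n")
for name, sequence, quality in read_fastq(example):
    print(name, len(sequence), mean_quality(quality))
```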

Then trim low-quality sequence by running Trim Galore on the paired-end datasets.

  1. Find and open Trim Galore from Tools
  2. Choose “Paired-end”
  3. Use default values for the other parameters
  4. Select the paired-end dataset
  5. Press “Execute”

Finally, we may re-run FastQC on Trim Galore’s outputs and inspect the differences.
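Trim Galore is a wrapper around Cutadapt, whose documentation describes 3' quality trimming as a running-sum procedure: subtract the cutoff from each quality, sum from the 3' end, and cut where the sum is smallest. The sketch below follows that description. It is an illustration under that assumption, not Trim Galore's code; it ignores adapter removal, and the cutoff of 20 is only an example value.

```python
def quality_trim_index(qualities, cutoff=20):
    """Return the length of the read to keep after 3' quality trimming.

    Walks from the 3' end, accumulating (quality - cutoff); the read is cut
    at the position where this running sum is lowest.
    """
    running = 0
    best_sum = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:          # good bases now outweigh the bad tail: stop
            break
        if running < best_sum:   # new minimum: trimming here removes the most
            best_sum = running
            cut = i
    return cut


def trim_read(sequence, quality, cutoff=20, offset=33):
    """Trim a read and its Phred+33 quality string at the computed position."""
    scores = [ord(ch) - offset for ch in quality]
    cut = quality_trim_index(scores, cutoff)
    return sequence[:cut], quality[:cut]


# Hypothetical read whose last two bases have Phred quality 2 ('#').
print(trim_read("ACGTACGT", "IIIIII##"))
```

Re-running FastQC on the trimmed reads would then show the drop in low-quality 3' bases.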

Mapping
To make sense of the reads, their positions within the Drosophila melanogaster genome must be determined. This process is known as aligning or ‘mapping’ the reads to the reference genome. Here, we will use HISAT2, a successor to TopHat2 that is faster and has low memory requirements. To run the mapping efficiently, HISAT2 needs to know one important parameter of the sequencing library: the library type. This information should usually come with your FASTQ files; ask your sequencing facility. If not, try to find it on the site where you downloaded the data or in the corresponding publication. Another option is to estimate the parameter with a preliminary mapping of a downsampled file and some analysis programs. Afterward, the actual mapping can be redone on the original files with the optimized parameter.

We first run a preliminary mapping to estimate the library type, so that HISAT2 can be run efficiently afterwards. This step is not necessary if you already know the library type of your data. The library type corresponds to the protocol used to generate the data: which strand the RNA fragment is synthesized from.

For example, the dUTP method sequences only the strand from the first-strand synthesis (the second strand, which incorporates dUTP, is degraded).

If you do not know the library type, you can determine it yourself by mapping the reads to the reference genome and inferring the library type from the mapping results, comparing the mapping information of the reads with the annotation of the reference genome.

The sequencer always reads from 5’ to 3’. So, in the First Strand case, all reads from the left-most end of the RNA fragment (always from 5’ to 3’) are mapped to the transcript strand, and (for paired-end sequencing) reads from the right-most end are always mapped to the opposite strand.

We can now try to determine the library type of our data. The first step is to load the Ensembl gene annotation for Drosophila melanogaster (Drosophila_melanogaster.BDGP5.78.gtf) from Zenodo into your current Galaxy history and rename it. Then run HISAT2 on the downsampled reads with the following settings:

  1. “FASTQ” as “Input data format”
  2. “Individual paired reads”
  3. Downsampled “Trimmed reads pair 1” (Trim Galore output) as “Forward reads”
  4. Downsampled “Trimmed reads pair 2” (Trim Galore output) as “Reverse reads”
  5. “dm3” as reference genome
  6. Default values for other parameters

Then run Infer Experiment to determine the library type:

    1. HISAT2 output as “Input BAM/SAM file”
    2. Drosophila annotation as “Reference gene model”
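The idea behind inferring the library type can be sketched as follows: for reads that overlap an annotated gene, compare the strand the first read maps to with the strand of the gene, and look at the fraction that agree. The function name, the labels, and the 0.8 threshold are assumptions for illustration; the Infer Experiment tool reports its own fractions and does not use this code.

```python
def infer_strandedness(pairs, threshold=0.8):
    """Classify a library from (read1_strand, gene_strand) observations.

    pairs: iterable of two-character-string tuples, each value '+' or '-'.
    Returns (label, fraction_of_reads_on_the_transcript_strand).
    """
    same = opposite = 0
    for read_strand, gene_strand in pairs:
        if read_strand == gene_strand:
            same += 1
        else:
            opposite += 1
    total = same + opposite
    if total == 0:
        return "undetermined", 0.0
    fraction_same = same / total
    if fraction_same >= threshold:
        return "read1 on transcript strand", fraction_same
    if fraction_same <= 1 - threshold:
        return "read1 on opposite strand", fraction_same
    return "unstranded", fraction_same


# Hypothetical observations: 9 of 10 first reads map opposite to the gene.
observations = [("+", "-")] * 9 + [("-", "-")]
print(infer_strandedness(observations))
```

A fraction near 0.5 suggests an unstranded library, while a fraction near 0 or 1 indicates one of the two stranded protocols, which then determines the strand setting passed to HISAT2 and to the counting step.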

    Sometimes it is difficult to find out which settings correspond to those of other programs. The following table might be helpful to identify library type:

    We can now map all the RNA sequences on the Drosophila melanogaster genome using HISAT2. HISAT2 will output a BAM file.

        1. “FASTQ” as “Input data format”
        2. “Individual paired reads”
        3. “Trimmed reads pair 1” (Trim Galore output) as “Forward reads”
        4. “Trimmed reads pair 2” (Trim Galore output) as “Reverse reads”
        5. “dm3” as reference genome
        6. Default values for other parameters except “Spliced alignment parameters”
        7. “Specify strand-specific information” to the previously determined value
        8. Drosophila_melanogaster.BDGP5.78.gtf as “GTF file with known splice sites”

        We can inspect the mapping statistics:

        The BAM file contains information about where the reads are mapped on the reference genome. But it is a binary file, and with information for more than 3 million reads it is difficult to inspect directly. We use IGV to visualize the HISAT2 output BAM file, particularly the region on chromosome 4 between 560 kb and 600 kb.

        1. Download and install IGV on your local machine by following the instructions found here
        2. Click on the BAM file
        3. Click “local” under “display with IGV”


        Analysis of the differential gene expression
        To compare the expression of single genes between different conditions (e.g. with or without PS depletion), an essential first step is to quantify the number of reads per gene. HTSeq-count is one of the most popular tools for gene quantification. To quantify the number of reads mapped to a gene, an annotation of the gene positions is needed. We have already uploaded Drosophila_melanogaster.BDGP5.78.gtf, which contains the Ensembl gene annotation for Drosophila melanogaster, to Galaxy.

        In principle, counting reads that overlap genomic features is a fairly simple task. But some details need to be decided, such as how to handle multi-mapping reads. HTSeq-count offers 3 choices (“union”, “intersection_strict” and “intersection_nonempty”) for handling reads that map to multiple locations, reads that overlap introns, and reads that overlap more than one genomic feature:

        The recommended mode is “union”, which counts overlaps even if a read only shares parts of its sequence with a genomic feature and disregards reads that overlap more than one feature.

        1. Drosophila_melanogaster.BDGP5.78.gtf as “GFF file”
        2. The “Union” mode
        3. A “Minimum alignment quality” of 10
        4. Appropriate value for “Stranded” option
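The “union” rule described above can be illustrated with a small sketch: collect every gene that overlaps any part of the read, count the read if exactly one gene is found, and otherwise file it under a no-feature or ambiguous category. The sketch is deliberately simplified: it assumes one chromosome, half-open coordinates, unspliced reads, and no strand or mapping-quality checks. The function names are illustrative and are not the HTSeq API.

```python
from collections import Counter


def overlapping_genes(read_start, read_end, exons):
    """Genes with at least one exon overlapping the read (half-open intervals)."""
    return {gene for gene, start, end in exons
            if start < read_end and read_start < end}


def count_union(reads, exons):
    """Count reads per gene following the 'union' overlap rule."""
    counts = Counter()
    for read_start, read_end in reads:
        genes = overlapping_genes(read_start, read_end, exons)
        if not genes:
            counts["__no_feature"] += 1
        elif len(genes) == 1:
            counts[next(iter(genes))] += 1   # partial overlap is enough
        else:
            counts["__ambiguous"] += 1       # more than one gene: not counted
    return counts


# Hypothetical annotation and reads.
toy_exons = [("geneA", 100, 200), ("geneB", 180, 300)]
toy_reads = [(90, 120), (150, 190), (250, 260), (400, 450)]
print(dict(count_union(toy_reads, toy_exons)))
```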

        To save time and computing resources, we ran the previous steps for you and obtained 7 count files, available on Zenodo. For each Drosophila gene, these files contain the number of reads mapped to it. We could compare the files directly to obtain the differential gene expression, but the number of sequenced reads mapped to a gene also depends on other factors, such as its expression level, its length, and the sequencing depth. For both within-sample and between-sample comparisons, the gene counts need to be normalized. We can then run the Differential Gene Expression (DGE) analysis. Differential expression is estimated from read counts, and variability in the measurements is corrected for using replicates, which are absolutely essential for accurate results. For your own analysis, we advise you to use at least 3, and preferably 5, biological replicates per condition. The number of replicates may differ between conditions. In our example, 2 factors can explain differences in gene expression: treatment and sequencing type. Treatment is the primary factor of interest.

        DESeq2 is a great tool for DGE analysis. It takes the read counts produced by HTSeq-count, combines them into one large table (with genes in the rows and samples in the columns), and applies size factor normalization. To import the read count files and run DESeq2, follow the instructions below:

        1. Create a new history
        2. Import the seven count files from Zenodo
        3. Run DESeq2
        4. Set “Treatment” as first factor with “treated” and “untreated” as levels and select the count files corresponding to each level
        5. Press “insert factor”
        6. Set “Sequencing” as second factor with “PE” and “SE” as levels and select the count files corresponding to each level (keep the CTRL key pressed and click on the files to select several files)
        7. Press “execute”
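The size factor normalization that DESeq2 applies is based on the median-of-ratios method: each sample is compared with a per-gene geometric mean across samples, and the median ratio becomes that sample's size factor. The pure-Python sketch below illustrates the calculation on a toy table. It is not DESeq2's implementation, and the function name is an assumption.

```python
import math
from statistics import median


def size_factors(count_table):
    """Median-of-ratios size factors.

    count_table: list of per-gene rows, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    usable = [row for row in count_table if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(count_table[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g
                      for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical table: sample 2 was sequenced twice as deeply as sample 1.
toy_table = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(toy_table)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_table]
print(factors)
print(normalized[0])
```

Dividing each raw count by its sample's size factor gives the normalized counts that appear in the “mean normalized counts” column of the result table.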

        The first output of DESeq2 is a tabular file. The columns are:

        • Gene identifiers
        • Mean normalized counts, averaged over all samples from both conditions
        • Logarithm (to base 2) of the fold change
        • Standard error estimate for the log2 fold change estimate
        • Wald statistic
        • p-value for the statistical significance of this change
        • p-value adjusted for multiple testing with the Benjamini-Hochberg procedure which controls false discovery rate (FDR)
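The Benjamini-Hochberg adjustment named in the last column can be sketched in a few lines: each p-value is multiplied by the number of tests divided by its rank, and the results are made monotone from the largest p-value downward. This is a minimal illustration of the standard procedure. DESeq2 additionally applies independent filtering, so its adjusted values can differ from this plain calculation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```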

        To extract the genes with the most significant changes (adjusted p-value below 0.05), we use Filter.

        1. Launch Filter
        2. Select the DESeq2 result table as input
        3. Type c7 < 0.05 in “With following condition”
        4. Press “execute”
        5. (Optional) Rename the output file for downstream analysis
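Outside Galaxy, the same filter (column 7 below 0.05) can be reproduced on a tab-separated result table. This sketch assumes the column order listed above and skips header lines, short rows, and rows whose adjusted p-value is “NA”. The function name and toy rows are hypothetical.

```python
def filter_significant(lines, column=7, threshold=0.05):
    """Keep tab-separated rows whose value in `column` (1-based) is below threshold."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            value = float(fields[column - 1])
        except (IndexError, ValueError):
            continue            # header, short row, or non-numeric entry such as "NA"
        if value < threshold:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical result rows: id, mean, log2FC, SE, Wald statistic, p-value, adjusted p-value.
toy_rows = [
    "gene1\t100\t2.0\t0.3\t6.6\t1e-10\t1e-8",
    "gene2\t50\t0.1\t0.3\t0.3\t0.7\t0.9",
    "gene3\t5\t1.0\t0.9\t1.1\t0.2\tNA",
]
print(filter_significant(toy_rows))
```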

        In addition to the list of genes, DESeq2 outputs a graphical summary of the results, useful for evaluating the quality of the experiment: a histogram of p-values for all tests, an MA plot, a principal component analysis (PCA) plot, a heatmap of the sample-to-sample distance matrix, and dispersion estimates.

        The MA plot provides a global view of the relationship between the expression change between conditions (log ratios, M), the average expression strength of the genes (mean average, A), and the ability of the algorithm to detect differential gene expression. The genes that passed the significance threshold (adjusted p-value < 0.25) are colored in red.

        The heatmap provides an overview of the similarities and dissimilarities between samples.

        Dispersion estimates: gene-wise estimates (black), the fitted values (red), and the final maximum a posteriori estimates used in testing (blue)

        Analysis of the functional enrichment among differentially expressed genes

        We have extracted genes that are differentially expressed in treated (with PS gene depletion) samples compared to untreated samples. We would like to know the functional enrichment among the differentially expressed genes.

        The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.

        A query to DAVID can only be run on 100 genes, so we need to select the ones we are most interested in.

        1. Launch the Sort tool
        2. Select the previously filtered file under “Sort Query”
        3. “Column:3” under “on column” and “Descending order” under “everything in” to list the most upregulated genes first
        4. Press “Execute”
        5. Launch the Select first tool
        6. Extract the first 100 lines
        7. Launch DAVID
        8. First column as “Column with identifiers”
        9. “ENSEMBL_GENE_ID” as “Identifier type”
        10. Press “Execute”
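The sort-and-select steps above amount to ordering the filtered table by its third column (log2 fold change) in descending order and keeping the identifiers of the first 100 rows. A minimal sketch, assuming a tab-separated table without a header line; the function name and toy rows are hypothetical.

```python
def top_genes(lines, column=3, n=100):
    """Return the identifiers (first column) of the n rows with the largest
    value in `column` (1-based), sorted in descending order."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda fields: float(fields[column - 1]), reverse=True)
    return [fields[0] for fields in rows[:n]]


# Hypothetical filtered rows: id, mean normalized count, log2 fold change.
toy_rows = ["gene1\t10\t-2.5", "gene2\t10\t3.1", "gene3\t10\t0.4"]
print(top_genes(toy_rows, n=2))
```

Sorting in ascending order instead would select the most strongly downregulated genes.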

        The output of the DAVID tool is an HTML file with a link to the DAVID website.

        Inference of the differential exon usage
        Now, we would like to know the differential exon usage between treated (PS depleted) and untreated samples using RNA-seq exon counts. We will reuse the mapping results we generated previously.

        We will use DEXSeq. DEXSeq detects, with high sensitivity, genes (and in many cases exons) that are subject to differential exon usage. But first, as for the differential gene expression, we need to count the number of reads mapping to the exons.

        This step is similar to counting the number of reads per annotated gene, but instead of HTSeq-count we use DEXSeq-Count.

        1. Transfer Gene annotation file Drosophila_melanogaster.BDGP5.78.gtf from Zenodo to a Galaxy history.
        2. Launch “DEXSeq-Count”
        3. “Prepare annotation” as “Mode of operation”

        The output is again a GTF file that is ready to be used for counting. To count reads using DEXSeq-Count:

        1. “Count reads” as “Mode of operation”
        2. HISAT2 output as “Input bam file”
        3. GTF file from previous step as “DEXSeq compatible GTF file”

        This outputs a flattened GTF file.

        Next, we calculate differential exon usage. As with DESeq2, in the previous step we counted only reads that mapped to exons on chromosome 4, and for only one sample. To identify differential exon usage induced by PS depletion, all datasets (3 treated and 4 untreated) must be analyzed with the same procedure. To save time, we use results available on Zenodo.

        1. Create a new history
        2. Import the seven count files from Zenodo and the gtf file generated from previous step
        3. Launch DEXSeq
        4. “Condition” as first factor with “treated” and “untreated” as levels and selection of count files corresponding to both levels
        5. “Sequencing” as second factor with “PE” and “SE” as levels and selection of count files corresponding to both levels

        Note that unlike DESeq2, DEXSeq does not allow flexible primary factor names. Always name your primary factor “condition”. This step will take a couple of hours to run.

        Similarly to DESeq2, DEXSeq generates a table with:

        • Exon identifiers
        • Gene identifiers
        • Exon identifiers in the Gene
        • Mean normalized counts, averaged over all samples from both conditions
        • Logarithm (to base 2) of the fold change
        • Standard error estimate for the log2 fold change estimate
        • p-value for the statistical significance of this change
        • p-value adjusted for multiple testing with the Benjamini-Hochberg procedure which controls false discovery rate

        Similarly, we also run Filter to extract exons with a significant difference in usage (adjusted p-value equal to or below 0.05) between treated and untreated samples.

        In addition, DEXSeq generates an interactive HTML file that allows users to inspect differentially used exons graphically.

        In this tutorial, we have analyzed real RNA sequencing data to extract useful information, such as which genes are up- or downregulated by depletion of the Drosophila melanogaster gene and which genes are regulated by the Drosophila melanogaster gene. To answer these questions, we analyzed RNA-seq datasets using a reference-based RNA-seq data analysis approach. This approach can be summed up with the following scheme:


        Target Audience

        Graduates, postgraduates, and PIs working or about to embark on an analysis of RNA-seq data. Attendees may be familiar with some aspect of RNA-seq analysis (e.g. gene expression analysis) or have no direct experience.

        Prerequisites: Basic familiarity with Linux environment and S, R, or Matlab. Must be able to complete and understand the following simple Linux and R tutorials (up to and including “Descriptive Statistics”) before attending:

